The Art of R Programming: A Tour of Statistical Software Design

Document information

R is the world’s most popular language for developing statistical software: Archaeologists use it to track the spread of ancient civilizations, drug companies use it to discover which medications are safe and effective, and actuaries use it to assess financial risks and keep markets running smoothly.

The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions. No statistical knowledge is required, and your programming skills can range from hobbyist to pro.

Along the way, you’ll learn about functional and object-oriented programming, running mathematical simulations, and rearranging complex data into simpler, more useful formats. You’ll also learn to:

• Create artful graphs to visualize complex data sets and functions
• Write more efficient code using parallel R and vectorization
• Interface R with C/C++ and Python for increased speed or functionality
• Find new packages for text analysis, image manipulation, and thousands more
• Squash annoying bugs with advanced debugging techniques

Whether you’re designing aircraft, forecasting the weather, or you just need to tame your data, The Art of R Programming is your guide to harnessing the power of statistical computing.

ABOUT THE AUTHOR

Norman Matloff is a professor of computer science (and a former professor of statistics) at the University of California, Davis. His research interests include parallel processing and statistical regression, and he is the author of several widely used web tutorials on software development. He has written articles for the New York Times, the Washington Post, Forbes Magazine, and the Los Angeles Times, and he is the co-author of The Art of Debugging (No Starch Press).

SHELVE IN: COMPUTERS/MATHEMATICAL & STATISTICAL SOFTWARE
$39.95 ($41.95 CDN)
www.nostarch.com
THE FINEST IN GEEK ENTERTAINMENT™

“I LIE FLAT.” This book uses RepKover, a durable binding that won’t snap shut.

THE ART OF R PROGRAMMING
A Tour of Statistical Software Design
by Norman Matloff
San Francisco

THE ART OF R PROGRAMMING. Copyright © 2011 by Norman Matloff.

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

15 14 13 12 11    1 2 3 4 5 6 7 8 9

ISBN-10: 1-59327-384-3
ISBN-13: 978-1-59327-384-2

Publisher: William Pollock
Production Editor: Alison Law
Cover and Interior Design: Octopod Studios
Developmental Editor: Keith Fancher
Technical Reviewer: Hadley Wickham
Copyeditor: Marilyn Smith
Compositors: Alison Law and Serena Yang
Proofreader: Paula L. Fleming
Indexer: BIM Indexing & Proofreading Services

For information on book distributors or translations, please contact No Starch Press, Inc. directly:

No Starch Press, Inc.
38 Ringold Street, San Francisco, CA 94103
phone: 415.863.9900; fax: 415.863.9950; info@nostarch.com; www.nostarch.com

Library of Congress Cataloging-in-Publication Data

Matloff, Norman S.
The art of R programming : tour of statistical software design / by Norman Matloff.
p. cm.
ISBN-13: 978-1-59327-384-2
ISBN-10: 1-59327-384-3
1. Statistics-Data processing. 2. R (Computer program language) I. Title.
QA276.4.M2925 2011
519.50285'5133-dc23
2011025598

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.

BRIEF CONTENTS

Acknowledgments
Introduction
Chapter 1: Getting Started
Chapter 2: Vectors
Chapter 3: Matrices and Arrays
Chapter 4: Lists
Chapter 5: Data Frames
Chapter 6: Factors and Tables
Chapter 7: R Programming Structures
Chapter 8: Doing Math and Simulations in R
Chapter 9: Object-Oriented Programming
Chapter 10: Input/Output
Chapter 11: String Manipulation
Chapter 12: Graphics
Chapter 13: Debugging
Chapter 14: Performance Enhancement: Speed and Memory
Chapter 15: Interfacing R to Other Languages
Chapter 16: Parallel R
Appendix A: Installing R
Appendix B: Installing and Using Packages

CONTENTS IN DETAIL

ACKNOWLEDGMENTS

INTRODUCTION
Why Use R for Your Statistical Work?
  Object-Oriented Programming
  Functional Programming
Whom Is This Book For?
My Own Background

1 GETTING STARTED
1.1 How to Run R
  1.1.1 Interactive Mode
  1.1.2 Batch Mode
1.2 A First R Session
1.3 Introduction to Functions
  1.3.1 Variable Scope
  1.3.2 Default Arguments
1.4 Preview of Some Important R Data Structures
  1.4.1 Vectors, the R Workhorse
  1.4.2 Character Strings
  1.4.3 Matrices
  1.4.4 Lists
  1.4.5 Data Frames
  1.4.6 Classes
1.5 Extended Example: Regression Analysis of Exam Grades
1.6 Startup and Shutdown
1.7 Getting Help
  1.7.1 The help() Function
  1.7.2 The example() Function
  1.7.3 If You Don’t Know Quite What You’re Looking For
  1.7.4 Help for Other Topics
  1.7.5 Help for Batch Mode
  1.7.6 Help on the Internet

2 VECTORS
2.1 Scalars, Vectors, Arrays, and Matrices
  2.1.1 Adding and Deleting Vector Elements
  2.1.2 Obtaining the Length of a Vector
  2.1.3 Matrices and Arrays as Vectors
2.2 Declarations
2.3 Recycling
2.4 Common Vector Operations
  2.4.1 Vector Arithmetic and Logical Operations
  2.4.2 Vector Indexing
  2.4.3 Generating Useful Vectors with the : Operator
  2.4.4 Generating Vector Sequences with seq()
  2.4.5 Repeating Vector Constants with rep()
2.5 Using all() and any()
  2.5.1 Extended Example: Finding Runs of Consecutive Ones
  2.5.2 Extended Example: Predicting Discrete-Valued Time Series
2.6 Vectorized Operations
  2.6.1 Vector In, Vector Out
  2.6.2 Vector In, Matrix Out
2.7 NA and NULL Values
  2.7.1 Using NA
  2.7.2 Using NULL
2.8 Filtering
  2.8.1 Generating Filtering Indices
  2.8.2 Filtering with the subset() Function
  2.8.3 The Selection Function which()
2.9 A Vectorized if-then-else: The ifelse() Function
  2.9.1 Extended Example: A Measure of Association
  2.9.2 Extended Example: Recoding an Abalone Data Set
2.10 Testing Vector Equality
2.11 Vector Element Names
2.12 More on c()

3 MATRICES AND ARRAYS
3.1 Creating Matrices
3.2 General Matrix Operations
  3.2.1 Performing Linear Algebra Operations on Matrices
  3.2.2 Matrix Indexing
  3.2.3 Extended Example: Image Manipulation
  3.2.4 Filtering on Matrices
  3.2.5 Extended Example: Generating a Covariance Matrix

[...]

... worthy project.

BARUG has also benefited from the financial support of Revolution Analytics and countless hours, energy, and ideas from David Smith and Joe Rickert of that firm. Jay Emerson and Mike Kane, authors of the award-winning bigmemory package in CRAN, read through an early draft of Chapter 16 on parallel R programming and made valuable comments. John Chambers (founder of S, the “ancestor” ...

... parallel programming.

Whom Is This Book For?

Many use R mainly in an ad hoc way: to plot a histogram here, perform a regression analysis there, and carry out other discrete tasks involving statistical operations. But this book is for those who wish to develop software in R. The programming skills of our intended readers may range anywhere from those of a professional software developer to “I took a programming ...

... dissertation in abstract probability theory, I spent the early years of my career as a statistics professor: teaching, doing research, and consulting in statistical methodology. I was one of about a dozen professors at the University of California, Davis who founded the Department of Statistics at that university. Later I moved to the Department of Computer Science at the same institution, where I have ...

... daughter Laura, an engineering student, read parts of the early chapters and made some good suggestions that improved the book. My own CRAN projects and other R-related research (parts of which serve as examples in the book) have benefited from the advice, feedback, and/or encouragement of many people, especially Mark Bravington, Stephen Eglen, Dirk Eddelbuettel, Jay Emerson, Mike Kane, Gary King, Duncan ...

... most of my career. I do research in parallel programming, web traffic, data mining, disk system performance, and various other areas. Much of my computer science teaching and research involves statistics. Thus, I have the points of view of both a “hard-core” computer scientist and of a statistician and statistics researcher. I hope this blend enables this book to fill a gap in the literature and enhances ...

... Davis computer science colleague, Sean Davis. Needless to say, there is no implication that they endorse my views in that section of the book, but their comments were quite helpful. Early in the project, I made a very rough (and very partial) draft of the book available for public comment and received helpful feedback from Ramon Diaz-Uriarte, Barbara F. La Scala, Jason Liao, and my old friend Mike Hannon ...

... standard among professional statisticians.
• It is comparable, and often superior, in power to commercial products in most of the significant senses: variety of operations available, programmability, graphics, and so on.
• It is available for the Windows, Mac, and Linux operating systems.
• In addition to providing statistical operations, R is a general-purpose programming language, so you can use it to automate ...

... their standard errors, residuals, and so on. You then pick and choose, programmatically, which parts of that object to extract. You will see that R’s approach makes programming much easier, partly because it offers a certain uniformity of access to data. This uniformity stems from the fact that R is polymorphic, which means that a single function can be applied to different types of inputs, which the ...

... Jim Porzak, cofounder of the Bay Area useR Group (BARUG, http://www.bay-r.org/), for his frequent encouragement as I was writing this book. And while on the subject of BARUG, I must thank Jim and the other cofounder, Mike Driscoll, for establishing that lively and stimulating forum. At BARUG, the speakers on wonderful applications of R have always left me feeling that writing this book was a very worthy ...

... also extended examples.
• There is a separate chapter on how to take advantage of the knowledge of R’s internal behavior and other facilities to speed up R code.
• A chapter discusses the interface of R to other languages, such as C and Python, again with emphasis on extended examples as well as tips on debugging.

My Own Background

I come to the R party through a somewhat unusual route. After writing a ...
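
The introduction excerpt above mentions two ideas that a few lines of R can make concrete: a statistical function returns an object from which you extract only the parts you want, and a single generic function behaves differently depending on the type of its input. The following is a minimal sketch, not code from the book. The built-in `cars` data set, the regression formula, and the variable names `fit`, `coefs`, `resids`, and `std_errs` are all choices made here for illustration.

```r
# Fit a simple linear regression on R's built-in `cars` data set
# (an illustrative choice, not an example taken from the book).
fit <- lm(dist ~ speed, data = cars)

# The result is an object; extract only the pieces you need.
coefs    <- coef(fit)                                   # intercept and slope
resids   <- residuals(fit)                              # one residual per row
std_errs <- summary(fit)$coefficients[, "Std. Error"]   # standard errors

# Polymorphism: summary() is a generic function, so the same call
# dispatches to different methods based on the class of its argument.
print(class(cars$speed))    # a plain numeric vector
print(summary(cars$speed))  # numeric method: quartiles and mean
print(class(fit))           # an object of class "lm"
print(summary(fit))         # lm method: regression table
```

Because `summary()` dispatches on class, the caller does not need to know in advance which kind of object it holds, which is one plausible reading of the "uniformity of access" the excerpt describes.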

Date posted: 23/03/2014, 05:24

Table of contents

  • Copyright

  • Brief Contents

  • Contents in Detail

  • Acknowledgments

  • Introduction

    • Why Use R for Your Statistical Work?

    • Whom Is This Book For?

    • My Own Background

  • 1: Getting Started

    • 1.1 How to Run R

    • 1.2 A First R Session

    • 1.3 Introduction to Functions

    • 1.4 Preview of Some Important R Data Structures

    • 1.5 Extended Example: Regression Analysis of Exam Grades

    • 1.6 Startup and Shutdown

    • 1.7 Getting Help

  • 2: Vectors

    • 2.1 Scalars, Vectors, Arrays, and Matrices

    • 2.2 Declarations

    • 2.3 Recycling

    • 2.4 Common Vector Operations

    • 2.5 Using all() and any()

    • 2.6 Vectorized Operations
