R is the world’s most popular language for developing
statistical software: Archaeologists use it to track the
spread of ancient civilizations, drug companies use it
to discover which medications are safe and effective,
and actuaries use it to assess financial risks and keep
markets running smoothly.
The Art of R Programming takes you on a guided tour
of software development with R, from basic types
and data structures to advanced topics like closures,
recursion, and anonymous functions. No statistical
knowledge is required, and your programming skills
can range from hobbyist to pro.
Along the way, you’ll learn about functional and object-
oriented programming, running mathematical simulations,
and rearranging complex data into simpler, more useful
formats. You’ll also learn to:
• Create artful graphs to visualize complex data sets
and functions
• Write more efficient code using parallel R and
vectorization
TAME YOUR DATA
• Interface R with C/C++ and Python for increased
speed or functionality
• Find new packages for text analysis, image manipulation,
and thousands more
• Squash annoying bugs with advanced debugging
techniques
Whether you’re designing aircraft, forecasting the
weather, or you just need to tame your data, The Art of
R Programming is your guide to harnessing the power
of statistical computing.
ABOUT THE AUTHOR
Norman Matloff is a professor of computer science
(and a former professor of statistics) at the University
of California, Davis. His research interests include
parallel processing and statistical regression, and
he is the author of several widely used web tutorials
on software development. He has written articles for
the New York Times, the Washington Post, Forbes
Magazine, and the Los Angeles Times, and he is the
co-author of The Art of Debugging (No Starch Press).
SHELVE IN:
COMPUTERS/MATHEMATICAL &
STATISTICAL SOFTWARE
$39.95 ($41.95 CDN)
www.nostarch.com
THE FINEST IN GEEK ENTERTAINMENT™
“I LIE FLAT.”
This book uses RepKover — a durable binding that won’t snap shut.
A TOUR OF STATISTICAL SOFTWARE DESIGN
NORMAN MATLOFF
THE ART OF R PROGRAMMING
THE ART OF R
PROGRAMMING
A Tour of Statistical
Software Design
by Norman Matloff
San Francisco
THE ART OF R PROGRAMMING. Copyright © 2011 by Norman Matloff.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the
prior written permission of the copyright owner and the publisher.
15 14 13 12 11   1 2 3 4 5 6 7 8 9
ISBN-10: 1-59327-384-3
ISBN-13: 978-1-59327-384-2
Publisher: William Pollock
Production Editor: Alison Law
Cover and Interior Design: Octopod Studios
Developmental Editor: Keith Fancher
Technical Reviewer: Hadley Wickham
Copyeditor: Marilyn Smith
Compositors: Alison Law and Serena Yang
Proofreader: Paula L. Fleming
Indexer: BIM Indexing & Proofreading Services
For information on book distributors or translations, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
38 Ringold Street, San Francisco, CA 94103
phone: 415.863.9900; fax: 415.863.9950; info@nostarch.com; www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Matloff, Norman S.
The art of R programming : tour of statistical software design / by Norman Matloff.
p. cm.
ISBN-13: 978-1-59327-384-2
ISBN-10: 1-59327-384-3
1. Statistics-Data processing. 2. R (Computer program language) I. Title.
QA276.4.M2925 2011
519.50285'5133-dc23
2011025598
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and
company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark
symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the
benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been
taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any
person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the
information contained in it.
BRIEF CONTENTS
Acknowledgments
Introduction
Chapter 1: Getting Started
Chapter 2: Vectors
Chapter 3: Matrices and Arrays
Chapter 4: Lists
Chapter 5: Data Frames
Chapter 6: Factors and Tables
Chapter 7: R Programming Structures
Chapter 8: Doing Math and Simulations in R
Chapter 9: Object-Oriented Programming
Chapter 10: Input/Output
Chapter 11: String Manipulation
Chapter 12: Graphics
Chapter 13: Debugging
Chapter 14: Performance Enhancement: Speed and Memory
Chapter 15: Interfacing R to Other Languages
Chapter 16: Parallel R
Appendix A: Installing R
Appendix B: Installing and Using Packages
CONTENTS IN DETAIL
ACKNOWLEDGMENTS
INTRODUCTION
Why Use R for Your Statistical Work?
Object-Oriented Programming
Functional Programming
Whom Is This Book For?
My Own Background
1 GETTING STARTED
1.1 How to Run R
1.1.1 Interactive Mode
1.1.2 Batch Mode
1.2 A First R Session
1.3 Introduction to Functions
1.3.1 Variable Scope
1.3.2 Default Arguments
1.4 Preview of Some Important R Data Structures
1.4.1 Vectors, the R Workhorse
1.4.2 Character Strings
1.4.3 Matrices
1.4.4 Lists
1.4.5 Data Frames
1.4.6 Classes
1.5 Extended Example: Regression Analysis of Exam Grades
1.6 Startup and Shutdown
1.7 Getting Help
1.7.1 The help() Function
1.7.2 The example() Function
1.7.3 If You Don’t Know Quite What You’re Looking For
1.7.4 Help for Other Topics
1.7.5 Help for Batch Mode
1.7.6 Help on the Internet
2 VECTORS
2.1 Scalars, Vectors, Arrays, and Matrices
2.1.1 Adding and Deleting Vector Elements
2.1.2 Obtaining the Length of a Vector
2.1.3 Matrices and Arrays as Vectors
2.2 Declarations
2.3 Recycling
2.4 Common Vector Operations
2.4.1 Vector Arithmetic and Logical Operations
2.4.2 Vector Indexing
2.4.3 Generating Useful Vectors with the : Operator
2.4.4 Generating Vector Sequences with seq()
2.4.5 Repeating Vector Constants with rep()
2.5 Using all() and any()
2.5.1 Extended Example: Finding Runs of Consecutive Ones
2.5.2 Extended Example: Predicting Discrete-Valued Time Series
2.6 Vectorized Operations
2.6.1 Vector In, Vector Out
2.6.2 Vector In, Matrix Out
2.7 NA and NULL Values
2.7.1 Using NA
2.7.2 Using NULL
2.8 Filtering
2.8.1 Generating Filtering Indices
2.8.2 Filtering with the subset() Function
2.8.3 The Selection Function which()
2.9 A Vectorized if-then-else: The ifelse() Function
2.9.1 Extended Example: A Measure of Association
2.9.2 Extended Example: Recoding an Abalone Data Set
2.10 Testing Vector Equality
2.11 Vector Element Names
2.12 More on c()
3 MATRICES AND ARRAYS
3.1 Creating Matrices
3.2 General Matrix Operations
3.2.1 Performing Linear Algebra Operations on Matrices
3.2.2 Matrix Indexing
3.2.3 Extended Example: Image Manipulation
3.2.4 Filtering on Matrices
3.2.5 Extended Example: Generating a Covariance Matrix
[...]
... worthy project. BARUG has also benefited from the financial support of Revolution Analytics and countless hours, energy, and ideas from David Smith and Joe Rickert of that firm. Jay Emerson and Mike Kane, authors of the award-winning bigmemory package in CRAN, read through an early draft of Chapter 16 on parallel R programming and made valuable comments. John Chambers (founder of S, the “ancestor”...
... parallel programming.
Whom Is This Book For?
Many use R mainly in an ad hoc way: to plot a histogram here, perform a regression analysis there, and carry out other discrete tasks involving statistical operations. But this book is for those who wish to develop software in R. The programming skills of our intended readers may range anywhere from those of a professional software developer to “I took a programming...
... dissertation in abstract probability theory, I spent the early years of my career as a statistics professor: teaching, doing research, and consulting in statistical methodology. I was one of about a dozen professors at the University of California, Davis who founded the Department of Statistics at that university. Later I moved to the Department of Computer Science at the same institution, where I have...
... daughter Laura, an engineering student, read parts of the early chapters and made some good suggestions that improved the book. My own CRAN projects and other R-related research (parts of which serve as examples in the book) have benefited from the advice, feedback, and/or encouragement of many people, especially Mark Bravington, Stephen Eglen, Dirk Eddelbuettel, Jay Emerson, Mike Kane, Gary King, Duncan...
... most of my career. I do research in parallel programming, web traffic, data mining, disk system performance, and various other areas. Much of my computer science teaching and research involves statistics. Thus, I have the points of view of both a “hard-core” computer scientist and of a statistician and statistics researcher. I hope this blend enables this book to fill a gap in the literature and enhances...
... Davis computer science colleague, Sean Davis. Needless to say, there is no implication that they endorse my views in that section of the book, but their comments were quite helpful. Early in the project, I made a very rough (and very partial) draft of the book available for public comment and received helpful feedback from Ramon Diaz-Uriarte, Barbara F. La Scala, Jason Liao, and my old friend Mike Hannon...
... standard among professional statisticians.
• It is comparable, and often superior, in power to commercial products in most of the significant senses: variety of operations available, programmability, graphics, and so on.
• It is available for the Windows, Mac, and Linux operating systems.
• In addition to providing statistical operations, R is a general-purpose programming language, so you can use it to automate...
... their standard errors, residuals, and so on. You then pick and choose, programmatically, which parts of that object to extract. You will see that R’s approach makes programming much easier, partly because it offers a certain uniformity of access to data. This uniformity stems from the fact that R is polymorphic, which means that a single function can be applied to different types of inputs, which the...
... Jim Porzak, cofounder of the Bay Area useR Group (BARUG, http://www.bay-r.org/), for his frequent encouragement as I was writing this book. And while on the subject of BARUG, I must thank Jim and the other cofounder, Mike Driscoll, for establishing that lively and stimulating forum. At BARUG, the speakers on wonderful applications of R have always left me feeling that writing this book was a very worthy...
... also extended examples.
• There is a separate chapter on how to take advantage of the knowledge of R’s internal behavior and other facilities to speed up R code.
• A chapter discusses the interface of R to other languages, such as C and Python, again with emphasis on extended examples as well as tips on debugging.
My Own Background
I come to the R party through a somewhat unusual route. After writing a...
Date posted: 23/03/2014, 05:24