IT training data mining for dummies brown 2014 09 29

Document information

Data Mining For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2014 by John Wiley & Sons, Inc., Hoboken, New Jersey

Media and software compilation copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. Samsung and Galaxy S are registered trademarks of Samsung Electronics Co. Ltd. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2014935519

ISBN 978-1-118-89317-3 (pbk); ISBN 978-1-118-89316-6 (ebk); ISBN 978-1-118-89319-7 (ebk)

Manufactured in the United States of America

Contents at a Glance
Introduction
Part I: Getting Started with Data Mining
Chapter 1: Catching the Data-Mining Train
Chapter 2: A Day in Your Life as a Data Miner 17
Chapter 3: Teaming Up to Reach Your Goals 49
Part II: Exploring Data-Mining Mantras and Methods 61
Chapter 4: Learning the Laws of Data Mining 63
Chapter 5: Embracing the Data-Mining Process 73
Chapter 6: Planning for Data-Mining Success 89
Chapter 7: Gearing Up with the Right Software 97
Part III: Gathering the Raw Materials 109
Chapter 8: Digging into Your Data 111
Chapter 9: Making New Data 119
Chapter 10: Ferreting Out Public Data Sources 141
Chapter 11: Buying Data 163
Part IV: A Data Miner’s Survival Kit 171
Chapter 12: Getting Familiar with Your Data 173
Chapter 13: Dealing in Graphic Detail 195
Chapter 14: Showing Your Data Who’s Boss 219
Chapter 15: Your Exciting Career in Modeling 245
Part V: More Data-Mining Methods 273
Chapter 16: Data Mining Using Classic Statistical Methods 275
Chapter 17: Mining Data for Clues 295
Chapter 18: Expanding Your Horizons 307
Part VI: The Part of Tens 319
Chapter 19: Ten Great Resources for Data Miners 321
Chapter 20: Ten Useful Kinds of Analysis That Complement Data Mining 325
Appendix A: Glossary 333
Appendix B: Data-Mining Software Sources 339
Appendix C: Major Data Vendors 349
Appendix D: Sources and Citations 357
Index 361

Table of Contents
Introduction
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here
Part I: Getting Started with Data Mining
Chapter 1: Catching the Data-Mining Train
  Getting Real about Data Mining
    Not your professor’s statistics
    The value of data mining
    Working for it
  Doing What Data Miners Do 10
    Focusing on the business 10
    Understanding how data miners spend their time 11
    Getting to know the data-mining process 11
    Making models 12
    Understanding mathematical models 12
    Putting information into action 13
  Discovering Tools and Methods 13
    Visual programming 14
    Working quick and dirty 15
    Testing, testing, and testing some more 16
Chapter 2: A Day in Your Life as a Data Miner 17
  Starting Your Day Off Right 17
    Meeting the team 18
    Exploring with aim 18
    Structuring time with the right process 20
  Understanding Your Business Goals 20
  Understanding Your Data 22
    Describing data 22
    Exploring data 23
    Cleaning data 27
  Preparing Your Data 28
    Taking first steps with the property data 28
    Preparing the ownership change indicator 32
    Merging the datasets 32
    Deriving new variables 34
  Modeling Your Data 40
    Using balanced data 40
    Splitting data 41
    Building a model 43
  Evaluating Your Results 44
    Examining the decision tree 44
    Using a diagnostic chart 46
    Assessing the status of the model 47
  Putting Your Results into Action 48
Chapter 3: Teaming Up to Reach Your Goals 49
  Nothing Could Be Finer Than to Be a Data Miner 49
    You can be a data miner 50
    Using the knowledge you have 51
  Data Miners Play Nicely with Others 51
    Cooperation is a necessity 51
    Oh, the people you’ll meet! 53
  Working with Executives 56
    Greetings and elicitations 57
    Lining up your priorities 58
    Talking data mining with executives 58
Part II: Exploring Data-Mining Mantras and Methods 61
Chapter 4: Learning the Laws of Data Mining 63
  1st Law: Business Goals 63
  2nd Law: Business Knowledge 64
  3rd Law: Data Preparation 65
  4th Law: Right Model 66
  5th Law: Pattern 67
  6th Law: Amplification 68
  7th Law: Prediction 69
  8th Law: Value 70
  9th Law: Change 70
Chapter 5: Embracing the Data-Mining Process 73
  Whose Standard Is It, Anyway? 73
    Approaching the process in phases 74
    Cycling through phases and projects 74
    Documenting your work 75
  Business Understanding 76
  Data Understanding 79
  Data Preparation 82
  Modeling 84
  Evaluation 86
  Deployment 87

Index
IBM, 341–342
KNIME.com AG, 342
KXEN, 342–343
Megaputer, 343
Oracle, 343–344
R Foundation, 344
RapidMiner, 344
Revolution Analytics, 345
sales representatives, engaging with, 106–108
Salford Systems, 345
SAS Institute, 345–346
Statsoft Inc., 346
Tableau Software, 346–347
Teradata, 347
University of Ljubljana, 347
University of Waikato, 348
Wolfram Research, 348
verifying data quality, 81–82
video tutorials for software, 308
visual programming
  defined, 338
  general discussion, 14–15
  importing data into, 28–32
  overview, 102–103, 340
  terminology related to, 191
visualization, 338. See also graphs
vocabulary, data mining, 191
voter research, 127–130, 131
•W•
warehouse clubs, 123–124
Warning! icon,
web analytics, 331–332
web logs, 121
web page testing, 126–127
web scraping, 23
weighting, 243
weights, linear models, 264
Weka
  associations, creating rules for, 300–303
  associations, importing data for, 298–300
  associations, refining results for, 303–306
  chart matrix, 212
  comments, 225
  exporting data, 230, 233
  interactive scatterplots, 209
  supplier information, 348
  text files, opening in, 185–188
wizards, visual programming interface, 29
Wolberg, William H., 249. See also breast tumor diagnosis data
Wolfram Alpha, 348
Wolfram Research, 348
workflow for decision tree creation, 250–258, 263
writing business cases, 94
•X•
XML, importing, 190
xy pairs, 199
•Z•
Zeroth Law (0th Law) of Data Mining, 71

About the Author
Meta S. Brown helps technical professionals communicate with everybody else. She’s the creator of the Storytelling for Data Analysts and Storytelling for Tech workshops.

Dedication
For Marty, who never gives me a hard time about work. Ever.

Author’s Acknowledgments
A number of experts shared their experience and time to contribute to this book. Each of them is named somewhere in the pages that follow.
The researchers who share data make books like this possible. Sources are cited within the book.
Wiley editors Christopher Morris, Leah Michael, John Edwards, and Kyle Looper are models of professionalism. I wonder if they know how exceptional that is.
Tom Khabaza, technical editor and the world’s best data-mining mentor, is a fountain of knowledge and a real mensch.
Laaren Brown and Lenny Hort, authors, editors, and much more, provided excellent advice galore.

Publisher’s Acknowledgments
Acquisitions Editor: Kyle Looper
Senior Project Editor: Christopher Morris
Copy Editor: John Edwards
Technical Editor: Thomas Khabaza
Editorial Assistant: Claire Johnson
Sr. Editorial Assistant: Cherie Case
Project Coordinator: Patrick Redmond
Cover Image: ©iStock.com/Media Mates Oy

... Started with Data Mining, lets you know what data mining really is, and what it’s like to be a data miner. Part II, Exploring Data-Mining Mantras and Methods, takes you deeper to understand how data...

... out about data-mining principles, processes, planning, and tools. And in Part III, Gathering the Raw Materials, you’ll get into the heart of data mining: data itself. You’ll...

... Part I: Getting Started with Data Mining
Visit www.dummies.com for great For Dummies content online.
In this part . . .
✓ Understanding how data miners work
✓ Looking over a data miner’s shoulder

Posted: 05/11/2019, 15:01

Table of contents

  • Title Page

  • Copyright Page

  • Contents at a Glance

  • Table of Contents

  • Introduction

    • About This Book

    • Foolish Assumptions

    • Icons Used in This Book

    • Beyond the Book

    • Where to Go from Here

  • Part I: Getting Started with Data Mining

    • Chapter 1: Catching the Data-Mining Train

      • Getting Real about Data Mining

        • Not your professor’s statistics

        • The value of data mining

        • Working for it

      • Doing What Data Miners Do

        • Focusing on the business

        • Understanding how data miners spend their time

        • Getting to know the data-mining process

        • Making models

        • Understanding mathematical models

        • Putting information into action

      • Discovering Tools and Methods

        • Visual programming

        • Working quick and dirty

        • Testing, testing, and testing some more

    • Chapter 2: A Day in Your Life as a Data Miner

      • Starting Your Day Off Right

        • Meeting the team

        • Exploring with aim

        • Structuring time with the right process

      • Understanding Your Business Goals

      • Understanding Your Data

        • Describing data

        • Exploring data

        • Cleaning data

      • Preparing Your Data

        • Taking first steps with the property data

        • Preparing the ownership change indicator

        • Merging the datasets

        • Deriving new variables

      • Modeling Your Data

        • Using balanced data

        • Splitting data

        • Building a model

      • Evaluating Your Results

        • Examining the decision tree

        • Using a diagnostic chart

        • Assessing the status of the model

      • Putting Your Results into Action

    • Chapter 3: Teaming Up to Reach Your Goals

      • Nothing Could Be Finer Than to Be a Data Miner

        • You can be a data miner

        • Using the knowledge you have

      • Data Miners Play Nicely with Others

        • Cooperation is a necessity

        • Oh, the people you’ll meet!

      • Working with Executives

        • Greetings and elicitations

        • Lining up your priorities

        • Talking data mining with executives

  • Part II: Exploring Data-Mining Mantras and Methods

    • Chapter 4: Learning the Laws of Data Mining

      • 1st Law: Business Goals

      • 2nd Law: Business Knowledge

      • 3rd Law: Data Preparation

      • 4th Law: Right Model

      • 5th Law: Pattern

      • 6th Law: Amplification

      • 7th Law: Prediction

      • 8th Law: Value

      • 9th Law: Change

    • Chapter 5: Embracing the Data-Mining Process

      • Whose Standard Is It, Anyway?

        • Approaching the process in phases

        • Cycling through phases and projects

        • Documenting your work

      • Business Understanding

      • Data Understanding

      • Data Preparation

      • Modeling

      • Evaluation

      • Deployment

    • Chapter 6: Planning for Data-Mining Success

      • Setting the Course with Formal Business Cases

        • Satisfying the boss

        • Minimizing your own risk

      • Building Business Cases

        • Elements of the business case

        • Putting it in writing

        • The basics on benefits

      • Avoiding the Failure Option

    • Chapter 7: Gearing Up with the Right Software

      • Putting Data-Mining Tools in Perspective

        • Avoiding software risks

        • Focusing on business goals, not tools

        • Determining what you need

        • Comparing tools

        • Shopping for software

      • Evaluating Software

        • Don’t fall in love (with your software)

        • Engaging with sales representatives

        • The sales professional’s mantra — BANT

  • Part III: Gathering the Raw Materials

    • Chapter 8: Digging into Your Data

      • Focusing on a Problem

      • Managing Scope

      • Using Your Organization’s Own Data

        • Appreciating your own data

        • Handling data with respect

    • Chapter 9: Making New Data

      • Fathoming Loyalty Programs

        • Grasping the loyalty concept

        • Your data bonanza

        • Putting loyalty data to work

      • Testing, Testing . . .

        • Experimenting in direct marketing

        • Spying test opportunities

        • Testing online

      • Microtargeting to Win Elections

        • Treating voters as individuals

        • Looking at an example

        • Enhancing voter data

        • Gaining an information advantage

        • Developing your own test data

        • Taking discoveries on the campaign trail

      • Surveying the Public Landscape

        • Eliciting information with surveys

        • Using surveys

        • Developing questions

        • Conducting surveys

        • Recognizing limitations

        • Bringing in help

      • Getting into the Field

        • Going where no data miner has gone before

        • Doing more than asking

      • One Challenge, Many Approaches

    • Chapter 10: Ferreting Out Public Data Sources

      • Looking Over the Lay of the Land

      • Exploring Public Data Sources

        • United States federal government

        • Governments around the world

        • United States state and local governments

    • Chapter 11: Buying Data

      • Peeking at Consumer Data

      • Beyond Consumer Data

      • Desperately Seeking Sources

      • Assessing Quality and Suitability

  • Part IV: A Data Miner’s Survival Kit

    • Chapter 12: Getting Familiar with Your Data

      • Organizing Data for Mining

      • Getting Data from There to Here

        • Text files

        • Databases

        • Spreadsheets, XML, and specialty data formats

      • Surveying Your Data

    • Chapter 13: Dealing in Graphic Detail

      • Starting Simple

        • Eyeballing variables with bar charts and histograms

        • Relating one variable to another with scatterplots

      • Building on Basics

        • Making scatterplots say more

        • Interacting with scatterplots

      • Working Fast with Graphs Galore

      • Extending Your Graphics Range

    • Chapter 14: Showing Your Data Who’s Boss

      • Rearranging Data

        • Controlling variable order

        • Formatting data properly

        • Labeling data

        • Controlling case order

        • Getting rows and columns right

        • Putting data where you need it

      • Sifting Out the Data You Need

        • Narrowing the fields

        • Selecting relevant cases

        • Sampling

      • Getting the Data Together

        • Merging

        • Appending

      • Making New Data from Old Data

        • Deriving new variables

        • Aggregation

      • Saving Time

    • Chapter 15: Your Exciting Career in Modeling

      • Grasping Modeling Concepts

      • Cultivating Decision Trees

        • Examining a decision tree

        • Using decision trees to aid communication

        • Constructing a decision tree

        • Getting acquainted with common decision tree types

        • Adapting to your tools

      • Neural Networks for Prediction

        • Looking inside a neural network

        • Issues surrounding neural network models

      • Clustering

        • Supervised and unsupervised learning

        • Clustering to clarify

  • Part V: More Data-Mining Methods

    • Chapter 16: Data Mining Using Classic Statistical Methods

      • Understanding Correlation

        • Picturing correlations

        • Measuring the strength of a correlation

        • Drawing lines in the data

        • Giving correlations a try

      • Understanding Linear Regression

        • Working with straight lines

        • Finding the best line

        • Using linear regression coefficients

        • Interpreting model statistics

        • Applying common sense

      • Understanding Logistic Regression

        • Looking into logistic regression

        • Appreciating the appeal of logistic regression

        • Looking over a logistic regression example

    • Chapter 17: Mining Data for Clues

      • Tracking Combinations

      • Finding Associations in Data

        • Structuring association rules

        • Getting ready

        • Shopping for associations

        • Refining results

        • Understanding the metrics

    • Chapter 18: Expanding Your Horizons

      • Squeezing More Out of What You Have

        • Mastering your data-mining application

        • Fine-tuning your settings

        • Analyzing your analysis

        • Using meta-models (ensemble models)

      • Widening Your Range

        • Tackling text

        • Detecting sequences

        • Working with time series

      • Taking on Big Data

        • Coming to terms with Big Data

        • Conducting predictive analytics with Big Data

      • Blending Methods for Best Results

  • Part VI: The Part of Tens

    • Chapter 19: Ten Great Resources for Data Miners

      • Society of Data Miners

      • KDnuggets

      • All Analytics

      • The New York Times

      • Forbes

      • SmartData Collective

      • CRISP-DM Process Model

      • Nate Silver

      • Meta’s Analytics Articles page

      • First Internet Gallery of Statistics Jokes

    • Chapter 20: Ten Useful Kinds of Analysis That Complement Data Mining

      • Business Analysis

      • Conjoint Analysis

      • Design of Experiments

      • Marketing Mix Modeling

      • Operations Research

      • Reliability Analysis

      • Statistical Process Control

      • Social Network Analysis

      • Structural Equation Modeling

      • Web Analytics

  • Appendix A: Glossary

  • Appendix B: Data-Mining Software Sources

  • Appendix C: Major Data Vendors

  • Appendix D: Sources and Citations

  • Index

  • About the Author
