Ebook An introduction to statistical methods and data analysis (6th edition) Part 1

Document information

Part 1 of the book An Introduction to Statistical Methods and Data Analysis covers: statistics and the scientific method; using surveys and experimental studies to gather data; data description; probability and probability distributions; inferences about population central values; inferences comparing two population central values; and other topics.

An Introduction to Statistical Methods and Data Analysis
Sixth Edition

R. Lyman Ott
Michael Longnecker
Texas A&M University

Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States

An Introduction to Statistical Methods and Data Analysis, Sixth Edition
R. Lyman Ott, Michael Longnecker

Senior Acquiring Sponsoring Editor: Molly Taylor
Assistant Editor: Dan Seibert
Editorial Assistant: Shaylin Walsh
Media Manager: Catie Ronquillo
Marketing Manager: Greta Kleinert
Marketing Assistant: Angela Kim
Marketing Communications Manager: Mary Anne Payumo
Project Manager, Editorial Production: Jennifer Risden
Creative Director: Rob Hugel
Art Director: Vernon Boes
Print Buyer: Judy Inouye
Permissions Editor: Roberta Broyer
Production Service: Macmillan Publishing Solutions
Text Designer: Helen Walden
Copy Editor: Tami Taliferro
Illustrator: Macmillan Publishing Solutions
Cover Designer: Hiroko Chastain/Cuttriss & Hambleton
Cover Images: Professor with medical model of head educating students: Scott Goldsmith/Getty Images; dollar diagram: John Foxx/Getty Images; multi-ethnic business people having meeting: Jon Feingersh/Getty Images; technician working in a laboratory: © istockphoto.com/Rich Legg; physical background with graphics and formulas: © istockphoto.com/Ivan Dinev; students engrossed in their books in the college library: © istockphoto.com/Chris Schmidt; group of colleagues working together on a project: © istockphoto.com/Chris Schmidt; mathematical assignment on a chalkboard: © istockphoto.com/Bart Coenders
Compositor: Macmillan Publishing Solutions

© 2010, 2001 Brooks/Cole, Cengage Learning

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706. For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be e-mailed to permissionrequest@cengage.com.

Library of Congress Control Number: 2008931280
ISBN-13: 978-0-495-01758-5
ISBN-10: 0-495-01758-2

Brooks/Cole
10 Davis Drive
Belmont, CA 94002-3098
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at www.cengage.com/international. Cengage Learning products are represented in Canada by Nelson Education, Ltd. To learn more about Brooks/Cole, visit www.cengage.com/brookscole. Purchase any of our products at your local college store or at our preferred online store www.ichapters.com.

Printed in Canada
12 11 10 09 08

Contents

Preface

PART 1 Introduction

CHAPTER 1 Statistics and the Scientific Method
1.1 Introduction
1.2 Why Study Statistics?
1.3 Some Current Applications of Statistics
1.4 A Note to the Student
1.5 Summary
1.6 Exercises

PART 2 Collecting Data

CHAPTER 2 Using Surveys and Experimental Studies to Gather Data
2.1 Introduction and Abstract of Research Study
2.2 Observational Studies
2.3 Sampling Designs for Surveys
2.4 Experimental Studies
2.5 Designs for Experimental Studies
2.6 Research Study: Exit Polls versus Election Results
2.7 Summary
2.8 Exercises

PART 3 Summarizing Data

CHAPTER 3 Data Description
3.1 Introduction and Abstract of Research Study
3.2 Calculators, Computers, and Software Systems
3.3 Describing Data on a Single Variable: Graphical Methods
3.4 Describing Data on a Single Variable: Measures of Central Tendency
3.5 Describing Data on a Single Variable: Measures of Variability
3.6 The Boxplot
3.7 Summarizing Data from More Than One Variable: Graphs and Correlation
3.8 Research Study: Controlling for Student Background in the Assessment of Teaching
3.9 Summary and Key Formulas
3.10 Exercises

CHAPTER 4 Probability and Probability Distributions
4.1 Introduction and Abstract of Research Study
4.2 Finding the Probability of an Event
4.3 Basic Event Relations and Probability Laws
4.4 Conditional Probability and Independence
4.5 Bayes' Formula
4.6 Variables: Discrete and Continuous
4.7 Probability Distributions for Discrete Random Variables
4.8 Two Discrete Random Variables: The Binomial and the Poisson
4.9 Probability Distributions for Continuous Random Variables
4.10 A Continuous Probability Distribution: The Normal Distribution
4.11 Random Sampling
4.12 Sampling Distributions
4.13 Normal Approximation to the Binomial
4.14 Evaluating Whether or Not a Population Distribution Is Normal
4.15 Research Study: Inferences about Performance-Enhancing Drugs among Athletes
4.16 Minitab Instructions
4.17 Summary and Key Formulas
4.18 Exercises

PART 4 Analyzing Data, Interpreting the Analyses, and Communicating Results

CHAPTER 5 Inferences about Population Central Values
5.1 Introduction and Abstract of Research Study
5.2 Estimation of μ
5.3 Choosing the Sample Size for Estimating μ
5.4 A Statistical Test for μ
5.5 Choosing the Sample Size for Testing μ
5.6 The Level of Significance of a Statistical Test
5.7 Inferences about μ for a Normal Population, σ Unknown
5.8 Inferences about μ When Population Is Nonnormal and n Is Small: Bootstrap Methods
5.9 Inferences about the Median
5.10 Research Study: Percent Calories from Fat
5.11 Summary and Key Formulas
5.12 Exercises

CHAPTER 6 Inferences Comparing Two Population Central Values
6.1 Introduction and Abstract of Research Study
6.2 Inferences about μ1 − μ2: Independent Samples
6.3 A Nonparametric Alternative: The Wilcoxon Rank Sum Test
6.4 Inferences about μ1 − μ2: Paired Data
6.5 A Nonparametric Alternative: The Wilcoxon Signed-Rank Test
6.6 Choosing Sample Sizes for Inferences about μ1 − μ2
6.7 Research Study: Effects of Oil Spill on Plant Growth
6.8 Summary and Key Formulas
6.9 Exercises

CHAPTER 7 Inferences about Population Variances
7.1 Introduction and Abstract of Research Study
7.2 Estimation and Tests for a Population Variance
7.3 Estimation and Tests for Comparing Two Population Variances
7.4 Tests for Comparing t > 2 Population Variances
7.5 Research Study: Evaluation of Method for Detecting E. coli
7.6 Summary and Key Formulas
7.7 Exercises

CHAPTER 8 Inferences about More Than Two Population Central Values
8.1 Introduction and Abstract of Research Study
8.2 A Statistical Test about More Than Two Population Means: An Analysis of Variance
8.3 The Model for Observations in a Completely Randomized Design
8.4 Checking on the AOV Conditions
8.5 An Alternative Analysis: Transformations of the Data
8.6 A Nonparametric Alternative: The Kruskal–Wallis Test
8.7 Research Study: Effect of Timing on the Treatment of Port-Wine Stains with Lasers
8.8 Summary and Key Formulas
8.9 Exercises

CHAPTER 9 Multiple Comparisons
9.1 Introduction and Abstract of Research Study
9.2 Linear Contrasts
9.3 Which Error Rate Is Controlled?
9.4 Fisher's Least Significant Difference
9.5 Tukey's W Procedure
9.6 Student–Newman–Keuls Procedure
9.7 Dunnett's Procedure: Comparison of Treatments to a Control
9.8 Scheffé's S Method
9.9 A Nonparametric Multiple-Comparison Procedure
9.10 Research Study: Are Interviewers' Decisions Affected by Different Handicap Types?
9.11 Summary and Key Formulas
9.12 Exercises

CHAPTER 10 Categorical Data
10.1 Introduction and Abstract of Research Study
10.2 Inferences about a Population Proportion p
10.3 Inferences about the Difference between Two Population Proportions, p1 − p2
10.4 Inferences about Several Proportions: Chi-Square Goodness-of-Fit Test
10.5 Contingency Tables: Tests for Independence and Homogeneity
10.6 Measuring Strength of Relation
10.7 Odds and Odds Ratios
10.8 Combining Sets of 2 × 2 Contingency Tables
10.9 Research Study: Does Gender Bias Exist in the Selection of Students for Vocational Education?
10.10 Summary and Key Formulas
10.11 Exercises

CHAPTER 11 Linear Regression and Correlation
11.1 Introduction and Abstract of Research Study
11.2 Estimating Model Parameters
11.3 Inferences about Regression Parameters
11.4 Predicting New y Values Using Regression
11.5 Examining Lack of Fit in Linear Regression
11.6 The Inverse Regression Problem (Calibration)
11.7 Correlation
11.8 Research Study: Two Methods for Detecting E. coli
11.9 Summary and Key Formulas
11.10 Exercises

CHAPTER 12 Multiple Regression and the General Linear Model
12.1 Introduction and Abstract of Research Study
12.2 The General Linear Model
12.3 Estimating Multiple Regression Coefficients
12.4 Inferences in Multiple Regression
12.5 Testing a Subset of Regression Coefficients
12.6 Forecasting Using Multiple Regression
12.7 Comparing the Slopes of Several Regression Lines
12.8 Logistic Regression
12.9 Some Multiple Regression Theory (Optional)
12.10 Research Study: Evaluation of the Performance of an Electric Drill
12.11 Summary and Key Formulas
12.12 Exercises

CHAPTER 13 Further Regression Topics
13.1 Introduction and Abstract of Research Study
13.2 Selecting the Variables (Step 1)
13.3 Formulating the Model (Step 2)
13.4 Checking Model Assumptions (Step 3)
13.5 Research Study: Construction Costs for Nuclear Power Plants
13.6 Summary and Key Formulas
13.7 Exercises

CHAPTER 14 Analysis of Variance for Completely Randomized Designs
14.1 Introduction and Abstract of Research Study
14.2 Completely Randomized Design with a Single Factor
14.3 Factorial Treatment Structure
14.4 Factorial Treatment Structures with an Unequal Number of Replications
14.5 Estimation of Treatment Differences and Comparisons of Treatment Means
14.6 Determining the Number of Replications
14.7 Research Study: Development of a Low-Fat Processed Meat
14.8 Summary and Key Formulas
14.9 Exercises

CHAPTER 15 Analysis of Variance for Blocked Designs
15.1 Introduction and Abstract of Research Study
15.2 Randomized Complete Block Design
15.3 Latin Square Design
15.4 Factorial Treatment Structure in a Randomized Complete Block Design
15.5 A Nonparametric Alternative: Friedman's Test
15.6 Research Study: Control of Leatherjackets
15.7 Summary and Key Formulas
15.8 Exercises

CHAPTER 16 The Analysis of Covariance
16.1 Introduction and Abstract of Research Study
16.2 A Completely Randomized Design with One Covariate
16.3 The Extrapolation Problem
16.4 Multiple Covariates and More Complicated Designs
16.5 Research Study: Evaluation of Cool-Season Grasses for Putting Greens
16.6 Summary
16.7 Exercises

CHAPTER 17 Analysis of Variance for Some Fixed-, Random-, and Mixed-Effects Models
17.1 Introduction and Abstract of Research Study
17.2 A One-Factor Experiment with Random Treatment Effects
17.3 Extensions of Random-Effects Models
17.4 Mixed-Effects Models
17.5 Rules for Obtaining Expected Mean Squares
17.6 Nested Factors
17.7 Research Study: Factors Affecting Pressure Drops Across Expansion Joints
17.8 Summary
17.9 Exercises

CHAPTER 18 Split-Plot, Repeated Measures, and Crossover Designs
18.1 Introduction and Abstract of Research Study
18.2 Split-Plot Designed Experiments
18.3 Single-Factor Experiments with Repeated Measures
18.4 Two-Factor Experiments with Repeated Measures on One of the Factors
18.5 Crossover Designs
18.6 Research Study: Effects of Oil Spill on Plant Growth
18.7 Summary
18.8 Exercises

CHAPTER 19 Analysis of Variance for Some Unbalanced Designs
19.1 Introduction and Abstract of Research Study
19.2 A Randomized Block Design with One or More Missing Observations
19.3 A Latin Square Design with Missing Data
19.4 Balanced Incomplete Block (BIB) Designs
19.5 Research Study: Evaluation of the Consistency of Property Assessments
19.6 Summary and Key Formulas
19.7 Exercises

Appendix: Statistical Tables
Answers to Selected Exercises
References
Index

Preface

Intended Audience

An Introduction to Statistical Methods and Data Analysis, Sixth Edition, provides a broad overview of statistical methods for advanced undergraduate and graduate students from a variety of disciplines. This book is intended to prepare students to solve problems encountered in research projects, to make decisions based on data in general settings both within and beyond the university setting, and finally to become critical readers of statistical analyses in research papers and in news reports. The book presumes that the students have a minimal mathematical background (high school algebra) and no prior course work in statistics. The first eleven chapters of the textbook present the material typically covered in an introductory statistics course. However, this book provides research studies and examples that connect the statistical concepts to data analysis problems, which are often encountered in undergraduate capstone courses. The remaining chapters of the book cover regression modeling and design of experiments. We develop and illustrate the statistical techniques and thought processes needed to design a research study or experiment and then analyze the data collected using an intuitive and proven four-step approach. This should be especially helpful to graduate students conducting their MS thesis and PhD dissertation research.

Major Features of Textbook

Learning from Data

In this text, we approach the study of statistics by considering a four-step process by which we can learn from data: Designing the
Problem; Collecting the Data; Summarizing the Data; and Analyzing Data, Interpreting the Analyses, and Communicating the Results.

Chapter 10 Categorical Data

10.11 Exercises

                       Age
                Up to 39   40 and Over   Total
Promoted            38          42         80
Not Promoted        82          88        170
Total              120         130        250

The results from Minitab are shown here:

Tabulated statistics: Promoted, Age
Using frequencies in Counts

Rows: Promoted   Columns: Age

        39 or Less   40 and Over      All
No           82           88          170
          48.24        51.76       100.00
          68.33        67.69        68.00
          81.60        88.40       170.00

Yes          38           42           80
          47.50        52.50       100.00
          31.67        32.31        32.00
          38.40        41.60        80.00

All         120          130          250
          48.00        52.00       100.00
         100.00       100.00       100.00
         120.00       130.00       250.00

Cell Contents: Count, % of Row, % of Column, Expected count

Pearson Chi-Square = 0.012, DF = 1, P-Value = 0.914
Likelihood Ratio Chi-Square = 0.012, DF = 1, P-Value = 0.914

a. Is there significant evidence of an association between age and promotion decision?
b. What is the impact of combining the age categories? Compare the answers obtained here to the answers from Exercise 10.42.

Ag 10.44 Integrated Pest Management (IPM) adopters apply significantly less insecticides and fungicides than nonadopters among grape producers. The paper "Environmental and Economic Consequences of Technology Adoption: IPM in Viticulture," Agricultural Economics 18 (2008): 145–155, contained the following adoption rates for the six states that account for most of the U.S. production. A survey of 712 grape-producing growers asked whether or not the growers were using an IPM program on the farms.

                                   State
                  Cal   Mich   New York   Oregon   Penn   Wash   Total
IPM Adopted        39     55       19       22      24     30     189
IPM Not Adopted    92     69      114       88      83     77     523
Total             131    124      133      110     107    107     712

The results from Minitab are shown here:

Tabulated statistics: IPM Adopted, State

Rows: IPM Adopted   Columns: State

          CAL     MICH   NewYork   Oregon     PENN     WASH      ALL
No         92       69      114       88       83       77      523
        17.59    13.19    21.80    16.83    15.87    14.72   100.00
        70.23    55.65    85.71    80.00    77.57    71.96    73.46
        96.23    91.08    97.70    80.80    78.60    78.60   523.00

Yes        39       55       19       22       24       30      189
        20.63    29.10    10.05    11.64    12.70    15.87   100.00
        29.77    44.35    14.29    20.00    22.43    28.04    26.54
        34.77    32.92    35.30    29.20    28.40    28.40   189.00

All       131      124      133      110      107      107      712
        18.40    17.42    18.68    15.45    15.03    15.03   100.00
       100.00   100.00   100.00   100.00   100.00   100.00   100.00
       131.00   124.00   133.00   110.00   107.00   107.00   712.00

Cell Contents: Count, % of Row, % of Column, Expected count

Pearson Chi-Square = 34.590, DF = 5, P-Value = 0.000
Likelihood Ratio Chi-Square = 34.131, DF = 5, P-Value = 0.000

a. Provide a graphical display of the data.
b. Is there significant evidence that the proportion of grape farmers who have adopted IPM is different across the six states?

Ag 10.45 Refer to Exercise 10.44. Suppose that the grape farmers in the states California, Michigan, and Washington were provided with information about the effectiveness of IPM by the county agents, whereas the farmers in the remaining states were not.

a. Is there significant evidence that providing information about IPM is associated with a higher adoption rate?
b. Discuss why or why not your conclusion in part (a) provides justification for expanding the program for county agents to discuss IPM with grape farmers to other states.

Soc 10.46 Social scientists have produced convincing evidence that parental divorce is negatively associated with the educational success of their children. The paper "Maternal Cohabitation and Educational Success" in Sociology of Education 78 (2005): 144–164 describes a study that addresses the impact of cohabiting mothers on the success of their children in graduating from high school. The following table displays the educational outcome by type of family for 1,168 children.

                                    Type of Family
                             Single-Parent           Step-Parent
HighSch Grad   Two Parent   Always   Divorce   No Cohab   With Cohab   Total
Yes               407          61      231        124         193      1,016
No                 45          16       29         11          51        152
Total             452          77      260        135         244      1,168

The results from Minitab are shown here:

Rows: HighSchoolGrad   Columns: Family Type

       2-Parent   NoCohab   Single-Always   Single-Divorce   WithCohab      All
No         45        11           16               29            51         152
        29.61      7.24        10.53            19.08         33.55      100.00
         9.96      8.15        20.78            11.15         20.90       13.01
         58.8      17.6         10.0             33.8          31.8       152.0

Yes       407       124           61              231           193        1016
        40.06     12.20         6.00            22.74         19.00      100.00
        90.04     91.85        79.22            88.85         79.10       86.99
        393.2     117.4         67.0            226.2         212.2      1016.0

All       452       135           77              260           244        1168
        38.70     11.56         6.59            22.26         20.89      100.00
       100.00    100.00       100.00           100.00        100.00      100.00
        452.0     135.0         77.0            260.0         244.0      1168.0

Cell Contents: Count, % of Row, % of Column, Expected count

Pearson Chi-Square = 24.864, DF = 4, P-Value = 0.000
Likelihood Ratio Chi-Square = 23.247, DF = 4, P-Value = 0.000

a. Display the above data in a graph to demonstrate any differences in the proportion of high school graduates across family types.
b. Is there significant evidence that the proportion of students who graduate from high school is different across the various family types?
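The Pearson chi-square statistic that Minitab reports for Exercise 10.46 can be checked by hand: each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The following is a minimal sketch in plain Python, not the textbook's own code; the function name `chi_square_independence` and the variable names are illustrative assumptions.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Counts from the Exercise 10.46 table.
# Rows: graduated (Yes, No). Columns: Two Parent, Single-Always,
# Single-Divorce, Step No Cohab, Step With Cohab.
family_table = [
    [407, 61, 231, 124, 193],
    [45, 16, 29, 11, 51],
]

statistic, df, expected = chi_square_independence(family_table)
print(f"Pearson chi-square = {statistic:.3f}, DF = {df}")
```

The printed statistic and degrees of freedom can be compared with the "Pearson Chi-Square" line of the Minitab output, and the entries of `expected` with the fourth number in each Minitab cell.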
Soc 10.47 Refer to Exercise 10.46. For those students living within a stepparent family, does cohabitation appear to affect high school graduation rates?

10.6 Measuring Strength of Relation

10.48 Refer to Exercise 10.40. Describe the type of relation that exists between the categories of universities and the ratings of recent graduates of the universities.

10.49 Refer to Exercise 10.42. Describe the type of relation that exists between the age of middle managers and the proportion of middle managers who were promoted.

10.50 Refer to Exercise 10.44. Describe the type of relation that exists between the various states and the proportion of farms in which an IPM program was adopted.

10.51 Refer to Exercise 10.46. Describe the type of relation that exists between the family type and the proportion of students who graduated from high school.

10.7 Odds and Odds Ratios

Med 10.52 A food-frequency questionnaire is used to measure dietary intake. The respondent specifies the number of servings of various food items they consumed over the previous week. The dietary cholesterol is then quantified for each respondent. The researchers were interested in assessing if there was an association between dietary cholesterol intake and high blood pressure. In a large sample of individuals who had completed the questionnaire, 250 persons with a high dietary cholesterol intake (greater than 300 mg/day) were selected and 250 persons with a low dietary cholesterol intake (less than 300 mg/day) were selected. The 500 selected participants had their medical history taken and were classified as having normal or high blood pressure. The data are given here.

                         Blood Pressure
Dietary Cholesterol     High    Low    Total
High                     159     91     250
Low                       78    172     250
Total                    237    263     500

a. Compute the difference in the estimated risk of having high blood pressure ($\hat{p}_1 - \hat{p}_2$) for the two groups (low versus high dietary cholesterol intake).
b. Compute the estimated relative risk of having high blood pressure (the ratio of the estimated risks $\hat{p}_1$ and $\hat{p}_2$) for the two groups (low versus high dietary cholesterol intake).
c. Compute the estimated odds ratio of having high blood pressure for the two groups (low versus high dietary cholesterol intake).
d. Based on your results from (a)–(c), how do the two groups compare?

Med 10.53 Refer to Exercise 10.52.

a. Is there a significant difference between the low and high dietary cholesterol intake groups relative to their risk of having high blood pressure? Use α = .05.
b. Place a 95% confidence interval on the odds ratio of having high blood pressure. What can you conclude about the odds of having high blood pressure for the two groups?
c. Are your conclusions in (a) and (b) consistent?

Safety 10.54 The article "Who Wants Airbags" in Chance 18 (2005): –16 discusses whether air bags should be mandatory equipment in all new automobiles. Using data from the National Highway Traffic Safety Administration (NHTSA), they obtain the following information about fatalities and the usage of air bags and seat belts. All passenger cars sold in the U.S. starting in 1998 are required to have air bags. NHTSA estimates that air bags have saved 10,000 lives as of January 2004. The authors examined accidents in which there was a harmful event (personal or property), and from which at least one vehicle was towed. After some screening of the data, they obtained the following results. (The authors detail in their article the types of screening of the data that was done.)

                   Air Bag Installed
              Yes           No            Total
Killed        19,276        27,924        47,200
Survived   5,723,539     4,826,982    10,550,521
Total      5,742,815     4,854,906    10,597,721

a. Calculate the odds of being killed in a harmful event car accident for a vehicle with and without air bags. Interpret the two odds.
b. Calculate the odds ratio of being killed in a harmful event car accident with and without air bags. What does this ratio tell you about the importance of having air bags in a vehicle?
c. Is there significant evidence of a difference between vehicles with and without air bags relative to the proportion of persons killed in a harmful event vehicle accident? Use α = .05.
d. Place a 95% confidence interval on the odds ratio. Interpret this interval.

10.55 Refer to Exercise 10.54. The authors also collected information about accidents concerning seat belt usage. The article compared fatality rates for occupants using seat belts properly with those for occupants not using seat belts. The data are given here.

                    Seat Belt Usage
              Seat Belt    No Seat Belt       Total
Killed           16,001        31,199        47,200
Survived      7,758,634     2,791,887    10,550,521
Total         7,774,635     2,823,086    10,597,721

a. Calculate the odds of being killed in a harmful event car accident for a vehicle in which occupants were using seat belts and those who were not using seat belts. Interpret the two odds.
b. Calculate the odds ratio of being killed in a harmful event car accident with and without seat belts being used properly. What does this ratio tell you about the importance of using seat belts?
c. Is there significant evidence of a difference between vehicles with and without proper seat belt usage relative to the proportion of persons killed in a harmful event vehicle accident? Use α = .05.
d. Place a 95% confidence interval on the odds ratio. Interpret this interval.

10.56 Refer to Exercises 10.54 and 10.55. Which of the two safety devices appears to be more effective in preventing a death during an accident? Justify your answer using the information from the previous two exercises.

10.57 Refer to Exercises 10.54 and 10.55. To obtain a more accurate picture of the impact of air bags on preventing deaths, it is necessary to account for the effect of occupants using both seat belts and air bags. If the occupants of the vehicles in which air bags are installed are more likely to be also wearing seat belts, then it is possible that some of the apparent effectiveness of the air bags is in fact due to the increased usage of seat belts. Thus, one more 2 × 2 table is necessary, the table displaying a comparison of proper seat belt usage for occupants with air bags available with those for occupants without air bags available. That data are given here.

                    Seat Belt Usage
Air Bags      Seat Belt    No Seat Belt       Total
Yes           4,871,940       870,875     5,742,815
No            2,902,694     1,952,211     4,854,905
Total         7,774,634     2,823,086    10,597,720

a. Is there significant evidence of an association between air bag installation and the proper usage of seat belts? Use α = .05.
b. Provide justification for your results in part (a).

10.58 With reference to the information provided in Exercises 10.54, 10.55, and 10.57, there was one more question of interest to the researchers. If people in cars with air bags are more likely to be wearing seat belts, then how much of the improvement in fatality rates with air bags is really due to seat belt usage?
The harmful event fatalities were then classified according to both availability of air bags and seat belt usage. The data are given here.

                        Air Bags
Seat Belt Usage      Yes        No      Total
Seat Belt           8,626     7,374    16,000
No Seat Belt       10,650    20,550    31,200
Total              19,276    27,924    47,200

a. Use the information in the previous table and the data from Exercise 10.57 to compute the fatality rates for the four air bag and seat belt combinations.
b. Describe the confounding effect of seat belt usage on the effect of air bags on reducing fatalities.

Supplementary Exercises

10.59 The following experiment is from the book Small Data Sets. A genetics experiment was run in which the characteristics of tomato plants were recorded for the numbers of offspring expressing four phenotypes.

Phenotype             Frequency
Tall, cut-leaf            926
Dwarf, cut-leaf           293
Tall, potato-leaf         288
Dwarf, potato-leaf        104
Total                   1,611

a. State the null hypothesis that the theoretical occurrence of the phenotypes should be in the proportion 9:3:3:1.
b. Test the null hypothesis in part (a) at the α = .05 level.

10.60 Another study from the book Small Data Sets describes the family structure in the Hutterite Brethren, a religious group that is essentially a closed population with nearly all marriages involving members of the group. The researchers were interested in studying the offspring of such families. The following data list the distribution of sons in families with [...] children.

Number of Sons Frequency 0 14 25 21 22

a. Test the hypothesis that the number of sons in a family of [...] children follows a binomial distribution with p = [...]. Use α = .05.
b. Suppose that p is unspecified. Evaluate the general fit of a binomial distribution. Using the p-value from your test statistic, comment on the adequacy of using a binomial model for this situation.
c. Compare your results from parts (a) and (b).

10.61 The following study is from the book Small Data Sets. Data were collected to determine if a horse's chances of winning a race are affected by its starting position relative to the inside rail of the track. The following data give the starting position of the winning horse in 144 races, where position 1 is closest to the inside rail of the track and position 8 is farthest from the inside rail.

Starting Position         1    2    3    4    5    6    7    8
Frequency of Winners     29   19   18   25   17   10   15   11

a. State the null hypothesis that there is no difference in the chance of winning based on starting position.
b. Test the null hypothesis from (a) at an α = .05 level.

10.62 An entomologist was interested in determining if Colorado potato beetles were randomly distributed over a potato field or if they tended to appear in clusters. The field was gridded into evenly spaced squares and counts of the beetle were conducted. The following data give the number of squares in which 0 beetles, 1 beetle, 2 beetles, etc. were observed. If the appearance of the potato beetle is random, a Poisson model should provide a good fit to the data.

Starting Position 678 Number of Squares 227 56 28 or More 14 Total 1,011

a. The average number of beetles per square is 0.5. Does the Poisson distribution provide a good fit to the data?
b. Based on your results in (a), do the Colorado potato beetles appear randomly across the field?

Soc 10.63 A speaker who advises managers on how to avoid being unionized claims that only 25% of industrial workers favor union membership, 40% are indifferent, and 35% are opposed. In addition, the adviser claims that these opinions are independent of actual union membership. A random sample of 600 industrial workers yields the following data:

               Favor   Indifferent   Opposed   Total
Members         140         42          18      200
Nonmembers       70        198         132      400
Total           210        240         150      600

a. What part of the data is relevant to the 25%, 40%, 35% claims?
b. Test this hypothesis using α = .01.

10.64 What can be said about the p-value in Exercise 10.63?
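The goodness-of-fit test asked for in Exercise 10.63(b) compares the observed column totals with the counts implied by the claimed percentages. The sketch below is one hedged way to set it up in plain Python, not the book's solution; the function names are illustrative assumptions. The p-value helper relies on the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability exp(−x/2), so it applies only to the three-category case.

```python
import math


def chi_square_goodness_of_fit(observed, claimed_proportions):
    """Return (statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in claimed_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, expected


def chi_square_upper_tail_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2)


# Column totals from the Exercise 10.63 table: Favor, Indifferent, Opposed.
opinion_totals = [210, 240, 150]
claimed = [0.25, 0.40, 0.35]

statistic, df, expected = chi_square_goodness_of_fit(opinion_totals, claimed)
p_value = chi_square_upper_tail_df2(statistic)
print(f"chi-square = {statistic:.3f}, DF = {df}, p-value = {p_value:.2e}")
```

Only the column totals enter this test, since the 25%, 40%, 35% claim concerns all workers; the split by membership matters for the independence test in Exercise 10.65.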
10.65 Test the hypothesis of independence in the data of Exercise 10.63. How conclusively is it rejected?

10.66 Calculate (for the data of Exercise 10.63) percentages of workers in favor of unionization, indifferent to it, and opposed to it; do so separately for members and for nonmembers. Do the percentages suggest there is a strong relation between membership and opinion?

Pol Sci 10.67 Three different television commercials are advertising an established product. The commercials are shown separately to theater panels of consumers; each consumer views only one of the possible commercials and then states an opinion of the product. Opinions range from 1 (very favorable) to 5 (very unfavorable). The data are as follows.

                         Opinion
Commercial      1      2      3      4      5    Total
A              32     87     91     46     44     300
B              53    141     76     20     10     300
C              41     93     67     36     63     300
Total         126    321    234    102    117     900

a. Calculate expected frequencies under the null hypothesis of independence.
b. How many degrees of freedom are available for testing this hypothesis?
c. Is there evidence that the opinion distributions are different for the various commercials? Use α = .01.

10.68 State bounds on the p-value for Exercise 10.67.

10.69 In your judgment, is there a strong relation between type of commercial and opinion in the data of Exercise 10.67? Support your judgment with computations of percentages and a λ value.

Bus 10.70 A direct-mail retailer experimented with three different ways of incorporating order forms into its catalog. In type 1 catalogs, the form was at the end of the catalog; in type 2, it was in the middle; and in type 3, there were forms both in the middle and at the end. Each form was sent to a sample of 1,000 potential customers, none of whom had previously bought from the retailer. A code on each form allowed the retailer to determine which type it was; the number of orders received on each type of form was recorded. Minitab was used to calculate expected frequencies and the χ² statistic. Minitab yields the following output.

Tabulated statistics: Received, Type of Form

Rows: Received   Columns: Type of Form

            1         2         3       All
No        944       961       915      2820
        33.48     34.08     32.45    100.00
        94.40     96.10     91.50     94.00
          940       940       940      2820

Yes        56        39        85       180
        31.11     21.67     47.22    100.00
         5.60      3.90      8.50      6.00
           60        60        60       180

All      1000      1000      1000      3000
        33.33     33.33     33.33    100.00
       100.00    100.00    100.00    100.00
         1000      1000      1000      3000

Cell Contents: Count, % of Row, % of Column, Expected count

Pearson Chi-Square = 19.184, DF = 2, P-Value = 0.000
Likelihood Ratio Chi-Square = 19.037, DF = 2, P-Value = 0.000

Is there significant evidence that the proportion of order forms received differs for the three types of forms?
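The Minitab output for Exercise 10.70 reports two statistics, the Pearson chi-square and the likelihood ratio chi-square. Both use the same expected counts: the Pearson statistic sums (O − E)² / E, while the likelihood ratio statistic is 2 Σ O ln(O / E). The sketch below recomputes both from the counts in the table; it is an illustrative plain-Python version with assumed names, not Minitab's implementation. The p-value uses the closed form exp(−x/2), which holds only because this table has 2 degrees of freedom.

```python
import math


def pearson_and_likelihood_ratio(observed):
    """Return (Pearson statistic, likelihood ratio statistic, df) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            pearson += (o - e) ** 2 / e
            if o > 0:  # a zero count contributes nothing to the log term
                likelihood_ratio += 2 * o * math.log(o / e)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return pearson, likelihood_ratio, df


# Counts from Exercise 10.70. Rows: order received (No, Yes); columns: form type 1, 2, 3.
order_forms = [
    [944, 961, 915],
    [56, 39, 85],
]

pearson, likelihood_ratio, df = pearson_and_likelihood_ratio(order_forms)
p_value = math.exp(-pearson / 2)  # valid only for df = 2
print(f"Pearson = {pearson:.3f}, Likelihood Ratio = {likelihood_ratio:.3f}, DF = {df}")
print(f"p-value (Pearson, df = 2) = {p_value:.6f}")
```

With large samples and no small expected counts, as here, the two statistics are typically close, which matches the Minitab lines above.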
Bus 10.71 Describe the strength of the relation between proportion of forms received and type of forms for the data in Exercise 10.70.

Bus 10.72 A programming firm had developed a more elaborate, more complex version of its spreadsheet program. A "beta test" copy of the program was sent to a sample of users of the current program. From information supplied by the users, the firm rated the sophistication of each user; 1 indicated standard, basic applications of the program and 3 indicated the most complex applications. Each user indicated a preference between the current version and the test version, with 1 indicating a strong preference for the current version, 3 indicating no particular preference between the two versions, and 5 indicating a strong preference for the new version. The data were analyzed using JMP IN. Partial output is shown here.

SOPHIST By PREFER
Crosstabs (cell contents: Count, Row %)

                            PREFER
SOPHIST        1        2        3        4        5    Total
1             32       28       17       12        8      97
           32.99    28.87    17.53    12.37     8.25
2             10       24       16        6        4      60
           16.67    40.00    26.67    10.00     6.67
3              2        4        5        8       14      33
            6.06    12.12    15.15    24.24    42.42
Total         44       56       38       26       26     190

Tests
Source       DF    -LogLikelihood    RSquare (U)
Model         8        19.91046         0.1036
Error       180       172.23173
C Total     188       192.14219
Total Count 190

Test                 ChiSquare    Prob>ChiSq
Likelihood Ratio       39.821
Pearson                44.543

Date posted: 18/05/2017, 10:17
