Statistical Analysis: Microsoft Excel 2016


Contents at a Glance

Introduction
1 About Variables and Values
2 How Values Cluster Together
3 Variability: How Values Disperse
4 How Variables Move Jointly: Correlation
5 Charting Statistics
6 How Variables Classify Jointly: Contingency Tables
7 Using Excel with the Normal Distribution
8 Telling the Truth with Statistics
9 Testing Differences Between Means: The Basics
10 Testing Differences Between Means: Further Issues
11 Testing Differences Between Means: The Analysis of Variance
12 Analysis of Variance: Further Issues
13 Experimental Design and ANOVA
14 Statistical Power
15 Multiple Regression Analysis and Effect Coding: The Basics
16 Multiple Regression Analysis and Effect Coding: Further Issues
17 Analysis of Covariance: The Basics
18 Analysis of Covariance: Further Issues
Index

Conrad Carlberg
800 East 96th Street, Indianapolis, Indiana 46240 USA

Statistical Analysis: Microsoft Excel® 2016
Copyright © 2018 by Pearson Education, Inc.

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.

ISBN-13: 978-0-7897-5905-4
ISBN-10: 0-7897-5905-5
Library of Congress Control Number: 2017955944
Printed in the United States of America

Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Que Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Editor-in-Chief: Greg Wiegand
Acquisitions Editor: Trina MacDonald
Development Editor: Charlotte Kughen
Managing Editor: Sandra Schroeder
Project Editor: Mandie Frank
Copy Editor: Chuck Hutchinson
Indexer: Erika Millen
Proofreader: Abigail Manheim
Technical Editor: Michael Turner
Editorial Assistant: Courtney Martin
Designer: Chuti Prasertsith
Compositor: codeMantra

Warning and Disclaimer
Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an “as is” basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.

Special Sales
For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419. For government sales inquiries, please contact governmentsales@pearsoned.com. For questions about sales outside the U.S., please contact intlcs@pearsoned.com.

Microsoft and/or its respective suppliers make no representations about the suitability of the information contained in the documents and related graphics published as part of the services for any purpose. All such
documents and related graphics are provided “as is” without warranty of any kind. Microsoft and/or its respective suppliers hereby disclaim all warranties and conditions with regard to this information, including all warranties and conditions of merchantability, whether express, implied or statutory, fitness for a particular purpose, title and non-infringement. In no event shall Microsoft and/or its respective suppliers be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of information available from the services. The documents and related graphics contained herein could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Microsoft and/or its respective suppliers may make improvements and/or changes in the product(s) and/or the program(s) described herein at any time. Partial screenshots may be viewed in full within the software version specified.

Microsoft® and Windows® are registered trademarks of the Microsoft Corporation in the U.S.A. and other countries. Screenshots and icons reprinted with permission from the Microsoft Corporation. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation.

Contents

Introduction
  Using Excel for Statistical Analysis
  About You and About Excel
  Clearing Up the Terms
  Making Things Easier
  The Wrong Box?
  Wagging the Dog
  What’s in This Book

1 About Variables and Values
  Variables and Values
  Recording Data in Lists
  Making Use of Lists
  Scales of Measurement
  Category Scales
  Numeric Scales
  Telling an Interval Value from a Text Value
  Charting Numeric Variables in Excel
  Charting Two Variables
  Understanding Frequency Distributions
  Using Frequency Distributions
  Building a Frequency Distribution from a Sample
  Building Simulated Frequency Distributions

2 How Values Cluster Together
  Calculating the Mean
  Understanding Functions, Arguments, and Results
  Understanding Formulas, Results, and Formats
  Minimizing the Spread
  Calculating the Median
  Choosing to Use the Median
  Static or Robust?
  Calculating the Mode
  Getting the Mode of Categories with a Formula
  From Central Tendency to Variability

3 Variability: How Values Disperse
  Measuring Variability with the Range
  Sample Size and the Range
  Variations on the Range
  The Concept of a Standard Deviation
  Arranging for a Standard
  Thinking in Terms of Standard Deviations
  Calculating the Standard Deviation and Variance
  Squaring the Deviations
  Population Parameters and Sample Statistics
  Dividing by N − 1
  Bias in the Estimate and Degrees of Freedom
  Excel’s Variability Functions
  Standard Deviation Functions
  Variance Functions

4 How Variables Move Jointly: Correlation
  Understanding Correlation
  The Correlation, Calculated
  Using the CORREL() Function
  Using the Analysis Tools
  Using the Correlation Tool
  Correlation Isn’t Causation
  Using Correlation
  Removing the Effects of the Scale
  Using the Excel Function
  Getting the Predicted Values
  Getting the Regression Formula
  Using TREND() for Multiple Regression
  Combining the Predictors
  Understanding “Best Combination”
  Understanding Shared Variance
  A Technical Note: Matrix Algebra and Multiple Regression in Excel

5 Charting Statistics
  Characteristics of Excel Charts
  Chart Axes
  Date Variables on Category Axes
  Other Numeric Variables on a Category Axis
  Histogram Charts
  Using a Pivot Table to Count the Records
  Using Advanced Filter and FREQUENCY()
  The Data Analysis Add-in’s Histogram
  The Built-in Histogram
  Data Series Addresses
  Box-and-Whisker Plots
  Managing Outliers
  Diagnosing Asymmetry
  Comparing Distributions

6 How Variables Classify Jointly: Contingency Tables
  Understanding One-Way Pivot Tables
  Running the Statistical Test
  Making Assumptions
  Random Selection
  Independent Selections
  The Binomial Distribution Formula
  Using the BINOM.INV() Function
  Understanding Two-Way Pivot Tables
  Probabilities and Independent Events
  Testing the Independence of Classifications
  About Logistic Regression
  The Yule Simpson Effect
  Summarizing the Chi-Square Functions
  Using CHISQ.DIST()
  Using CHISQ.DIST.RT() and CHIDIST()
  Using CHISQ.INV()
  Using CHISQ.INV.RT() and CHIINV()
  Using CHISQ.TEST() and CHITEST()
  Using Mixed and Absolute References to Calculate Expected Frequencies
  Using the Pivot Table’s Index Display

7 Using Excel with the Normal Distribution
  About the Normal Distribution
  Characteristics of the Normal Distribution
  The Unit Normal Distribution
  Excel Functions for the Normal Distribution
  The NORM.DIST() Function
  The NORM.INV() Function
  Confidence Intervals and the Normal Distribution
  The Meaning of a Confidence Interval
  Constructing a Confidence Interval
  Excel Worksheet Functions That Calculate Confidence Intervals
  Using CONFIDENCE.NORM() and CONFIDENCE()
  Using CONFIDENCE.T()
  Using the Data Analysis Add-In for Confidence Intervals
  Confidence Intervals and Hypothesis Testing
  The Central Limit Theorem
  Dealing with a Pivot Table Idiosyncrasy
  Making Things Easier
  Making Things Better

8 Telling the Truth with Statistics
  A Context for Inferential Statistics
  Establishing Internal Validity
  Threats to Internal Validity
  Problems with Excel’s Documentation
  The F-Test Two-Sample for Variances
  Why Run the Test?
  Reproducibility
  A Final Point

9 Testing Differences Between Means: The Basics
  Testing Means: The Rationale
  Using a z-Test
  Using the Standard Error of the Mean
  Creating the Charts
  Using the t-Test Instead of the z-Test
  Defining the Decision Rule
  Understanding Statistical Power

10 Testing Differences Between Means: Further Issues
  Using Excel’s T.DIST() and T.INV() Functions to Test Hypotheses
  Making Directional and Nondirectional Hypotheses
  Using Hypotheses to Guide Excel’s t-Distribution Functions
  Completing the Picture with T.DIST()
  Using the T.TEST() Function
  Degrees of Freedom in Excel Functions
  Equal and Unequal Group Sizes
  The T.TEST() Syntax
  Using the Data Analysis Add-in t-Tests
  Group Variances in t-Tests
  Visualizing Statistical Power
  When to Avoid t-Tests

11 Testing Differences Between Means: The Analysis of Variance
  Why Not t-Tests?
  The Logic of ANOVA
  Partitioning the Scores
  Comparing Variances
  The F-Test
  Using Excel’s F Worksheet Functions
  Using F.DIST() and F.DIST.RT()
  Using F.INV() and FINV()
  The F-Distribution
  Unequal Group Sizes
  Multiple Comparison Procedures
  The Scheffé Procedure
  Planned Orthogonal Contrasts

12 Analysis of Variance: Further Issues
  Factorial ANOVA
  Other Rationales for Multiple Factors
  Using the Two-Factor ANOVA Tool
  The Meaning of Interaction
  The Statistical Significance of an Interaction
  Calculating the Interaction Effect
  The Problem of Unequal Group Sizes
  Repeated Measures: The Two Factor Without Replication Tool
  Excel’s Functions and Tools: Limitations and Solutions
  Mixed Models
  Power of the F-Test

13 Experimental Design and ANOVA
  Crossed Factors and Nested Factors
  Depicting the Design Accurately
  Nuisance Factors
  Fixed Factors and Random Factors
  The Data Analysis Add-In’s ANOVA Tools
  Data Layout
  Calculating the F Ratios
  Adapting the Data Analysis Tool for a Random Factor
  Designing the F-Test
  The Mixed Model: Choosing the Denominator
  Adapting the Data Analysis Tool for a Nested Factor
  Data Layout for a Nested Design
  Getting the Sums of Squares
  Calculating the F Ratio for the Nesting Factor
  Randomized Block Designs
  Interaction Between Factors and Blocks
  Tukey’s Test for Nonadditivity
  Increasing Statistical Power
  Blocks as Fixed or Random
  Split-Plot Factorial Designs
  Assembling a Split-Plot Factorial Design
  Analysis of the Split-Plot Factorial Design

14 Statistical Power
  Controlling the Risk
  Directional and Nondirectional Hypotheses
  Changing the Sample Size
  Visualizing Statistical Power
  The Statistical Power of t-Tests
  Nondirectional Hypotheses
  Making a Directional Hypothesis
  Increasing the Size of the Samples
  The Dependent Groups t-Test
  The Noncentrality Parameter in the F-Distribution
  Variance Estimates
  The Noncentrality Parameter and the Probability Density Function
  Calculating the Power of the F-Test
  Calculating the Cumulative Density Function
  Using Power to Determine Sample Size

15 Multiple Regression Analysis and Effect Coding: The Basics
  Multiple Regression and ANOVA
  Using Effect Coding
  Effect Coding: General Principles
  Other Types of Coding
  Multiple Regression and Proportions of Variance
  Understanding the Segue from ANOVA to Regression
  The Meaning of Effect Coding
  Assigning Effect Codes in Excel
  Using Excel’s Regression Tool with Unequal Group Sizes
  Effect Coding, Regression, and Factorial Designs in Excel
  Exerting Statistical Control with Semipartial Correlations
  Using a Squared Semipartial to Get the Correct Sum of Squares
  Using TREND() to Replace Squared Semipartial Correlations
  Working with the Residuals
  Using Excel’s Absolute and Relative Addressing to Extend the Semipartials

16 Multiple Regression Analysis and Effect Coding: Further Issues
  Solving Unbalanced Factorial Designs Using Multiple Regression
  Variables Are Uncorrelated in a Balanced Design
  Variables Are Correlated in an Unbalanced Design
  Order of Entry Is Irrelevant in the Balanced Design
  Order of Entry Is Important in the Unbalanced Design
  Proportions of Variance Can Fluctuate
... prerequisites.) As to Excel itself, it matters little whether you’re using Excel 97, Excel 2016, or any version in between. Very little statistical functionality changed between Excel 97 and Excel 2003. The ...

Introduction

IN THIS INTRODUCTION
  Using Excel for Statistical Analysis
  What’s in This Book

Using Excel for Statistical Analysis
The problem is that it’s a huge amount of material
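The worksheet functions named throughout the contents above — CORREL(), NORM.DIST(), CONFIDENCE.NORM(), T.TEST(), STDEV.S(), and others — are entered as ordinary Excel formulas. The short sketch below is illustrative only and is not taken from the book; it assumes two samples of 30 values each in cells A2:A31 and B2:B31.

  =STDEV.S(A2:A31)                              sample standard deviation of the first sample
  =CORREL(A2:A31, B2:B31)                       correlation between the two samples
  =NORM.DIST(75, 70, 5, TRUE)                   cumulative probability of 75 in a normal distribution with mean 70 and standard deviation 5
  =CONFIDENCE.NORM(0.05, STDEV.S(A2:A31), 30)   half-width of a 95% confidence interval around the sample mean
  =T.TEST(A2:A31, B2:B31, 2, 2)                 two-tailed probability from an equal-variance t-test on the two samples

Each formula is typed into an empty cell and returns the statistic directly; the Data Analysis add-in covered in the book provides dialog-driven equivalents for the correlation, confidence interval, and t-test calculations.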

Ngày đăng: 03/01/2020, 15:45

Từ khóa liên quan

Mục lục

  • Cover

  • Title Page

  • Copyright Page

  • Contents

  • Introduction

    • Using Excel for Statistical Analysis

      • About You and About Excel

      • Clearing Up the Terms

      • Making Things Easier

      • The Wrong Box?

      • Wagging the Dog

      • What’s in This Book

      • 1 About Variables and Values

        • Variables and Values

          • Recording Data in Lists

          • Making Use of Lists

          • Scales of Measurement

            • Category Scales

            • Numeric Scales

            • Telling an Interval Value from a Text Value

            • Charting Numeric Variables in Excel

              • Charting Two Variables

              • Understanding Frequency Distributions

                • Using Frequency Distributions

                • Building a Frequency Distribution from a Sample

                • Building Simulated Frequency Distributions

                • 2 How Values Cluster Together

                  • Calculating the Mean

                    • Understanding Functions, Arguments, and Results

Tài liệu cùng người dùng

  • Đang cập nhật ...

Tài liệu liên quan