Understanding test and exam results statistically

Document information

Instead of congratulating themselves over percentages, PISA mean scores, and Olympic medal counts, the Singaporeans compiled this book to alert educators (and, by extension, society as a whole) to the risk that statistical figures can deceive, mislead, and push us into wrong decisions. It is a very short book, only 158 pages, but well worth reading. (Quoted from the Facebook page of Namlun Didong.)

Springer Texts in Education

Kaycheng Soh
Understanding Test and Exam Results Statistically
An Essential Guide for Teachers and School Leaders

More information about this series at http://www.springer.com/series/13812

Kaycheng Soh
Singapore, Singapore

ISSN 2366-7672
ISSN 2366-7980 (electronic)
Springer Texts in Education
ISBN 978-981-10-1580-9
ISBN 978-981-10-1581-6 (eBook)
DOI 10.1007/978-981-10-1581-6
Library of Congress Control Number: 2016943820
© Springer Science+Business Media Singapore 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper.
This Springer imprint is published by Springer Nature. The registered company is Springer Science+Business Media Singapore Pte Ltd.

On Good (And Bad) Educational Statistics
In Lieu of a Preface

There are three kinds of lies:
lies, damned lies, and statistics.

We education people are honest people, but we often use test and examination scores in such a way that the effect is the same as lies: without the intention, but not without the falsity.

We count 10 correctly spelt words as if we count 10 apples. We count 10 correctly chosen words in an MCQ test as if we count 10 oranges. We count 10 correctly corrected sentences as if we count 10 pears. Then, we add 10 + 10 + 10 = 30. We then conclude that Ben has 30 fruits, called Language. We do the same for something we call Math (meat), and for something we call Art, or Music, or PE (snacks). We then add fruits, meat, and snacks and call the total Overall (edible or food). We then make important decisions using the Overall.

When doing this honestly, sincerely, and seriously, we also assume that there is no error in counting, be it done by this or another teacher (in fact, all teachers concerned). We also make the assumption, tacitly though, that one apple is as good as one orange, and one cut of meat as good as one piece of moachee.

Right or wrong, life has to go on. After all, this has been done as far back as the long-forgotten days of the little red house, and since this is a tradition, there must be nothing wrong. So, why should we begin to worry now?
“A few of my class scored high, a few low, and most of them somewhere in between,” reported Miss Lim on the recent SA1 performance of her class. A qualitative description like this one fits almost all normal groups of students. After hearing a few more descriptions similar to this one, Miss Lim and her colleagues were not any wiser about their students’ performance.

When dealing with the test or examination scores of a group of students, more specific descriptions are needed. It is here that numbers are more helpful than words. Such numbers, given the high-sounding name statistics, help to summarize the situation and make discussion more focused. Even when looking at one student’s test score, it has to be seen in the context of the scores of other students who have taken the same test for that score to have any meaning.

Thus, statistics are good. But that is not the whole truth: there are bad statistics. That is why there are such interesting titles as these: Huff, D. (1954) How to Lie with Statistics; Runyon, R. P. (1981) How Numbers Lie; Hooke, R. (1983) How to Tell the Liars from the Statisticians; Holmes, C. B. (1990) The Honest Truth about Lying with Statistics; Zuberi, T. (2001) Thicker than Blood: How Racial Statistics Lie; Best, J. (2001) Damned Lies and Statistics; and Best, J. (2004) More Damned Lies and Statistics: How Numbers Confuse Public Issues.

These interesting and skeptical authors wrote about social statistics, that is, statistics used by proponents and opponents to influence social policies. None deals with educational statistics and how they have misled teachers and school leaders into making irreversible decisions that influence the future of the student, the school, and even the nation.

On the other hand, people also say “Statistics don’t lie but liars use statistics.” Obviously, there are good statistics and there are bad statistics, and we need to be able to differentiate between them. Good statistics are the kind of numbers which
simplify a messy mass of numbers to surface the hidden trends, help in understanding them, and facilitate informed discussion and sound policy-making. Bad statistics, on the other hand, do the opposite and make things even murkier or messier than they already are. This latter case may happen unintentionally, due to lack of correct knowledge of statistics. Bad statistics are those unintentionally misused.

A rational approach to statistics, noting that they can be good or bad, is to follow Joel Best’s advice:

Some statistics are bad, but others are pretty good, and we need statistics, good statistics, to talk sensibly about social problems. The solution, then, is not to give up on statistics, but to become better judges of the numbers we encounter. We need to think critically about statistics… (Best 2001; emphasis added)

In the educational context, increasingly more attention is being paid to statistics, which are used for planning, evaluation, and research at different levels, from the classroom to the boardroom. However, as the use of statistics has not been part of professional development in traditional programs, many users of educational statistics pick up ideas here and there on the job. This is practical out of necessity, but it leaves too much to chance, and poor understanding and misuse can be highly contagious.

The notes in this collection have one shared purpose: to rectify misconceptions which have already acquired a life of their own and to prevent those that are yet to be born. The problems, issues, and examples are familiar to teachers and school administrators and hence should be found relevant to the daily handling of numbers in the school office as well as the classroom.

The notes discuss the uses and misuses of descriptive statistics which school administrators and teachers have to use and interpret in the course of their normal day-to-day work. Inferential statistics are mentioned by the way but not covered
extensively because in most cases they are irrelevant to schools, which very seldom, if ever, have numbers collected through a random process.

The more I wrote, the more I realized that many of the misconceptions and misuses were actually caused by misunderstanding of something more fundamental: educational measurement. Taking test scores too literally, obsession with decimals, and seeing too much meaning in small differences are some cases in point.

Because educational statistics is intimately tied up with educational measurement (much more than other social statistics are), misinterpretation of test and examination scores (marks, grades, etc.) may have as its root a lack of awareness of the peculiar nature of educational statistics. The root causes could be one or all of these:

• Taking test scores literally as absolute when they are in fact relative.
• Taking test scores as equivalent when they are not.
• Taking test scores as error-free when error is very much part of them.

(Incidentally, “test score” will mean “test and examination scores” hereafter to avoid the clumsiness.)
These arise from the combination of two conceptual flaws. The first is the lack of understanding of levels of measurement. There is a mix-up of highly fallible educational measurement (e.g., test scores) with highly infallible physical measurement (e.g., weight or height), looking at a test score of 50 as if it were the same as 50 kg or 50 cm. The second is a blind faith in score reliability and validity, a belief that the test scores have perfect consistency and truthfulness. This indicates a need to clarify the several concepts relevant to reliability, validity, item efficiency, and levels of tests. And, above all these, there is the question of the consequences of how test scores are used, especially for students and curriculum, the two most critical elements in schooling: that is, what happens to them.

Statistics can be learned for its own sake as a branch of mathematics. But that is not the reason for teachers and school leaders to familiarize themselves with it. In the school context, statistics are needed for proper understanding of test and examination results (in the form of scores). Hence, statistics and measurement need to go hand in hand so that statistics are meaningful and measurement is understood. In fact, while statistics can stand alone without educational measurement, educational measurement, on which tests and examinations are based, cannot do without statistics.

Most books about tests and examinations begin with concepts of measurement and have an appendix on statistics. In this book, statistical understanding of test scores comes first, followed by more exposition of measurement concepts. The reversed order comes from the belief that without knowing how to interpret test scores first, measurement is void of meaning.

Anyway, statistics is a language for effective communication. To build such a common language among educational practitioners calls for willingness to give up non-functioning notions and patience to acquire new meanings for old labels.

By the way, as the notes are not meant to
be academic discourse, I take the liberty to avoid citing many references to support the arguments (not argumentative statements but just plain statements of ideas) and take for granted the teachers’ and school leaders’ trust in my academic integrity. Of course, I maintain my intellectual honesty as best I can, but I stand to be corrected where I do not intentionally lie.

I would like to record my appreciation for the anonymous reviewers for their perceptive comments on the manuscript and their useful suggestions for its improvement. Beyond this, errors and omissions are mine.

Reference
Best, J. (2001). Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. Berkeley: University of California Press.

Contents

Part I Statistical Interpretation of Test/Exam Results

1 On Average: How Good Are They?
1.1 Average Is Attractive and Powerful
1.2 Is Average a Good Indicator?
1.2.1 Average of Marks
1.2.2 Average of Ratings
1.3 Two Meanings of Average
1.4 Other Averages
1.5 Additional Information Is Needed
1.6 The Painful Truth of Average

2 On Percentage: How Much Are There?
2.1 Predicting with Non-perfect Certainty
2.2 Danger in Combining Percentages
2.3 Watch Out for the Base
2.4 What Is in a Percentage?
2.5 Just Think About This
Reference

3 On Standard Deviation: How Different Are They?
3.1 First, Just Deviation
3.2 Next, Standard
3.3 Discrepancy in Computer Outputs
3.4 Another Use of the SD
3.5 Standardized Scores
3.6 Scores Are not at the Same Type of Measurement
3.7 A Caution
Reference

4 On Difference: Is that Big Enough?
4.1 Meaningless Comparisons
4.2 Meaningful Comparison

Epilogue

Test scores are important: to students, because their future depends on these to a large extent, especially in competitive systems of education; to parents, because their children’s future is at stake; to teachers, because their understanding of students is based on these and their effectiveness is partly reflected by these; and to school leaders, because the schools’ reputation is more often than not influenced by these.

However, training in the understanding and proper use of test scores has not been given as much time and effort as it deserves in the pre-service preparation of teachers; it is cursory at best. Teachers learn these “tricks” on the job and may pick up improper knowledge and skills, and such inappropriateness gets perpetuated and shared. These are important professional knowledge and skills that teachers and school leaders need to acquire in order to understand and use test scores properly and to be fully aware of the subtlety behind test scores and their limitations.

This book begins by trying to explain the subtle statistical concepts but ends up with a discussion of tests and measurement. This is because of the nature of the two fields and their connectedness. Test scores can be properly understood only when they make reference to relevant statistical concepts. In the process of writing, I always bore in mind the teachers and school leaders as my audience and limited myself to the statistical and measurement concepts that are most relevant to them. In this connection, I would like to thank the three anonymous reviewers who read my book proposal and made favourable comments and useful suggestions. And, if there is any important omission, it is due to my limited experience and knowledge. After all, statistics (educational or otherwise) is a living discipline with new ideas and techniques emerging every now and then.

As F. M. Lord, a giant of tests and measurement at the Educational Testing Service, USA, once
said, “the numbers do not know where they came from” in his provocative 1953 article, On the Statistical Treatment of Football Numbers, which appeared in the American Psychologist. Test scores standing alone have apparent or seeming but inaccurate meanings. They appear simple and straightforward, but they have contexts and limitations which govern their proper interpretation and hence proper use. In a sense, test scores are not what they simply look like, as the various chapters of this book try to show, hopefully with some degree of success.

Christmas Eve 2015

Appendix A
A Test Analysis Report

This report demonstrates how a post-hoc analysis of a test or exam can be done by using the statistical and measurement concepts and techniques introduced. In addition to using test results to make decisions on the students, test analysis can be conducted to study the efficacy of the test as an instrument for collecting information on student achievement. This approach of looking into the test will enhance the teachers’ and school leaders’ understanding of how their tests work and help identify areas for improvement where assessment is concerned.

A.1 Students
Three classes of Secondary 3 students (N = 78) were tested with a language test which comprised 10 multiple-choice items (MCQ; scored 1 for right and 0 for wrong) and 10 Essay questions (each carrying a possible maximum score of five).

A.2 Item-Analysis
The first concern of the analysis is how well the 20 items work. Item-analysis was run on the scores, and item indices were calculated as facility (p; proportion of correct answers) and discrimination (r; correlation between item and total scores). The appropriateness of each item was evaluated by the conventional criteria and is shown in the Comments column in Table A1.1. The following are observed:

• Among the MCQ
items, in terms of facility, one item is very easy, three are easy, two adequate, and four difficult. The subtest of MCQ as a whole has an adequate facility, indicating that it is appropriate for the students. In terms of discrimination, all items are adequate.

Table A1.1 Item-indices
Item No | Facility | Discrimination within subtest | Discrimination for whole test | Comments
Multiple-choice subtest
1 | 0.60 | 0.42 | 0.38 | Adequate in both indices
2 | 0.79 | 0.56 | 0.56 | Easy; adequate discrimination
3 | 0.81 | 0.41 | 0.27 | Very easy; adequate discrimination
4 | 0.76 | 0.58 | 0.57 | Easy; adequate discrimination
5 | 0.33 | 0.50 | 0.47 | Difficult; adequate discrimination
6 | 0.37 | 0.40 | 0.22 | Difficult; adequate discrimination
7 | 0.38 | 0.56 | 0.46 | Difficult; adequate discrimination
8 | 0.71 | 0.55 | 0.41 | Easy; adequate discrimination
9 | 0.40 | 0.49 | 0.34 | Difficult; adequate discrimination
10 | 0.46 | 0.46 | 0.26 | Adequate in both indices
Subtest | 0.56 | – | – | Adequate facility
Essay subtest
11 | 2.78 (0.56) | 0.39 | 0.38 | Adequate facility; weak discrimination
12 | 2.86 (0.57) | 0.50 | 0.52 | Adequate in both indices
13 | 1.17 (0.23) | 0.46 | 0.48 | Difficult; adequate discrimination
14 | 1.81 (0.36) | 0.68 | 0.69 | Difficult; strong discrimination
15 | 1.29 (0.26) | 0.66 | 0.62 | Difficult; strong discrimination
16 | 2.54 (0.51) | 0.64 | 0.66 | Adequate facility; strong discrimination
17 | 2.67 (0.53) | 0.66 | 0.62 | Adequate facility; strong discrimination
18 | 2.73 (0.55) | 0.79 | 0.76 | Adequate facility; strong discrimination
19 | 2.23 (0.45) | 0.76 | 0.73 | Adequate facility; strong discrimination
20 | 1.63 (0.33) | 0.67 | 0.66 | Difficult; strong discrimination
Subtest | 21.71 (0.43) | – | – | Adequate facility
Whole test | 27.33 (0.46) | – | – | Adequate facility
Note: Figures in parentheses are facilities calculated as (mean/possible maximum).

• Among the Essays, in terms of facility, six questions are adequate but four are difficult. However, the subtest as
a whole has an adequate facility, indicating that it is suitable for the students. In terms of discrimination, seven have strong discrimination, two are adequate, and one is weak.

It is therefore concluded that the test as a whole is well designed and suits the target students.

A.3 Reliability
The second concern of the analysis is how reliable the subtests and the whole test are. The reliability was estimated in terms of Cronbach’s alpha coefficient, which is a measure of internal consistency. As shown in Table A1.2, for the MCQ subtest the reliability is a moderate 0.65, and for the Essay subtest it is a high 0.82. For the whole test, the reliability of 0.84 is high, close to the expected 0.90 for making decisions on individuals.

Table A1.2 Reliability
Test section | Internal consistency reliability
MCQ | 0.65
Essay | 0.82
Whole test | 0.84

A.4 Comparisons

Table A1.3 Performance by gender
Measure | All | Boys (N = 34) | Girls (N = 44) | Difference | Effect size d
MCQ
Mean | 5.6 | 4.6 | 6.5 | −1.9 | −1.00
SD | 2.3 | 1.9 | 2.2 | – | –
Maximum | 10 | | 10 | −3 | –
Minimum | | | | | –
Essay
Mean | 21.7 | 19.5 | 23.4 | −3.9 | −0.52
SD | 7.5 | 7.5 | 7.0 | – | –
Maximum | 33 | 29 | 33 | −4 | –
Minimum | | | | −6 | –
Whole test
Mean | 27.3 | 24.0 | 29.9 | −5.9 | −0.66
SD | 9.1 | 9.0 | 8.5 | – | –
Maximum | 40 | 34 | 40 | −6 | –
Minimum | | | | −7 | –

Table A1.4 Performance by class
Measure | 3E1 (N = 21) | 3E3 (N = 27) | 3E4 (N = 30) | 3E1-3E3 | 3E1-3E4
MCQ
Mean | 6.3 | 6.0 | 4.8 | 0.3 (d = 1.67) | 1.5 (d = 0.83)
SD | 1.8 | 1.9 | 2.7 | −0.1 | −0.9
Maximum | | 10 | | −1 |
Minimum | | | | −1 |
Essay
Mean | 23.9 | 21.5 | 20.4 | 2.4 (d = 0.38) | 3.5 (d = 0.55)
SD | 6.4 | 5.7 | 9.2 | 0.7 | −2.8
Maximum | 33 | 30 | 31 | |
Minimum | 12 | | | | 12
Whole test
Mean | 30.2 | 27.4 | 25.2 | 2.8 (0.37) | (d = 0.66)
SD | 7.6 | 6.9 | 11.3 | 0.7 | −3.7
Maximum | 40 | 39 | 39 | |
Minimum | 14 | 11 | | | 14

By Gender
The third concern of the analysis is whether there are differences between the boys (N = 34) and girls (N = 44). As Table A1.3 shows, for the MCQ subtest, girls scored 1.9 points (1.9 %) higher than the boys with a large effect size of d = 1.00. For the Essay subtest, the girls scored 3.9
points (39 %) higher than the boys with a medium effect size of d = 0.52. And, for the whole test, the girls scored 5.9 points (59 %) higher than the boys. In sum, the girls generally scored better than the boys.

By Class
The three classes are also compared on their performance, using 3E1 as the benchmark. As shown in Table A1.4, for the MCQ subtest, 3E1 scored higher than the other two classes and the effect sizes are large (compared with 3E3) and very large (compared with 3E4). For the Essay subtest, 3E1 scored higher than the other two classes and the effect sizes are large (compared with 3E3) and very large (compared with 3E4). For the whole test, 3E1 scored higher than the other two classes and the effect sizes are small (compared with 3E3) and medium (compared with 3E4).

A.5 Correlations and Multiple Regression
It is of theoretical and practical significance to understand the relations between the two subtests and how they contribute to the total score. As shown in Table A1.5, the two subtests have a moderate correlation coefficient of 0.67, sharing about 45 % common variance (0.67 squared; i.e., total individual differences). However, both subtests have higher correlations with the whole test, and the correlation coefficients are a high 0.79 (MCQ) and 0.98 (Essay). However, the near-perfect correlation between the Essay subtest and the whole test indicates that the total scores for the whole test are almost totally determined by scores for the Essay subtest. This indicates that the MCQ subtest plays a very limited role in differentiating among the students.

Table A1.5 Correlation coefficients
 | MCQ | Essay | Whole test
MCQ | 1.00 | |
Essay | 0.67 | 1.00 |
Whole test | 0.79 | 0.98 | 1.00

Table A1.6 Multiple regression
 | b-weight | Beta | p
MCQ | 1.000 | 0.262 | 0.01
Essay | 1.000 | 0.823 | 0.01
Intercept | 0.000 | 0.00 | 1.00
R = 1.00, Adjusted R2 = 1.00

Table A1.6 shows the results of the multiple regression where the two sets of subtest scores are used to predict the total scores. According to the results, the
raw score equation is:

Total score = 1.00 × MCQ + 1.00 × Essay + 0.00 (intercept)

That is exactly how the total score is arrived at for each student. However, as shown in Table A1.3, for all students, the MCQ subtest has a standard deviation of only 2.3 and the Essay subtest 7.5. This difference in spread (see Chap. 7, On Multiple Regression) will affect the contributions of the two subtests to the whole test, and the scores have to be standardized. When the standardized scores are used for the multiple regression, the standardized regression coefficients (Betas) are 0.262 for the MCQ subtest and 0.823 for the Essay subtest. Thus, the regression equation using the standardized scores is:

Standardized total score = 0.262 × MCQ + 0.823 × Essay

In this equation, the standardized regression weights (0.262 and 0.823) replace the unstandardized ones (1.00 and 1.00), and the intercept is standardized at 0.00. It is important to note that the ratio of these two Beta-weights is 0.823/0.262 = 3.14. This means that students’ performance on this test as a whole depends much more on their Essay scores than on their MCQ scores.

A.6 Summary and Conclusion
The analysis of the test scores of the 78 Secondary 3 students for the 20-item test shows the following results:

• The MCQ and Essay subtests and the whole test are suitable for the students in terms of difficulty and have adequate discrimination (i.e., being able to distinguish between students with differential achievement).
• Girls did better than boys on both subtests and the whole test.
• 3E1 scored higher than the other two classes, especially 3E4.
• The MCQ subtest has a lower reliability than the Essay subtest. However, the test as a whole has high reliability and can be used for making decisions on individual students.
• The Essay subtest contributes about three times as much to the total scores as the MCQ subtest.

A.7 Recommendations
For future development, the following suggestions are to be
considered:

• The effective items can be kept in the item pool for future use. This will enhance the year-to-year comparability of tests and save the teachers the time and effort of coming up with new items.
• The less adequate items (in terms of facility and discrimination) need to be studied for content and item phrasing so as to inform teachers of the needed instructional changes and improvement in item-writing skills.
• The number of MCQ items needs to be increased so that this subtest contributes more to the total scores and the students’ performance does not rely so much on the Essay subtest. A balance between MCQ items and essay questions in terms of relative contributions to the total score is desirable for assessing skills in different language aspects.

Appendix B
A Note on the Calculation of Statistics

Using statistics to process test and exam results for better understanding inevitably involves calculation. This is the necessary evil.

More than half a century ago, when I started as a primary school teacher, all test results were hand-calculated, and this involved tedious work, rushing for time, boredom and, above all, the risk of inaccuracy. Moreover, calculating to the third or fourth decimal place seemed to be a sign of conscientiousness (or professionalism). Then came the hand-operated but clumsy calculating machine, and later the hand-held but still somewhat clumsy calculator. As time passed, the data got bigger in size but the calculation got easier, although the statistics do not change: a mean is still a mean and does not change its meaning however it is calculated. Now, with the convenience, I can afford to use more statistics which are more complicated to calculate, for example, the SD and the correlation coefficient, even regression and multiple regression. And, not to forget, the chi-square and exact probability.

With the ready availability of computing facilities, teachers and school leaders nowadays can afford the time and energy to use more statistics (and
conceptually more complex ones) for better understanding of test and exam results to benefit the students and the school.

In the school context, sophisticated computing software designed for researchers who always have to handle large amounts of complicated calculation is not necessary. As I work more with class and school data, I have realized that Excel is able to do most if not all of the work that needs to be done. Moreover, it is almost omnipresent.

B.1 Using Excel
• Create a master worksheet to store all data for all variables and have the labels across the very first row, keeping the first column for students’ serial numbers and names. The table is always rows (individuals) by columns (variables).
• For different analyses, create specific worksheets by copying from the master worksheet the data needed for the variables to be analyzed (e.g., correlated).
• Pay attention to the small down arrowhead next to Σ. It leads to the many calculation functions you need. It enables you to find the basics: the total (Sum), the mean (Average), the frequency (Count Numbers), the highest (Max), the lowest (Min), and “More Functions…”.
• More Functions… has many choices, and the one you need is always Statistical, which leads you to many statistical functions, from AVEDEV to Z.TEST. Once you have used some of the functions, you need only Most Recently Used the next time; this lists only those you have used and may need this time.
• Learn to drag: point to the black dot at the bottom right corner of a command box and drag it to the right. This allows you to repeat the calculation across the columns (for the variables).
• Learn to use $ (not your money!).
This fixes a variable against which the others are to be compared, for example when calculating the Var1-Var2, Var1-Var3, etc. correlation coefficients, so that the first variable (Var1) is held constant.
• Decide on the decimal places you need; for educational statistics, this means two or three places and no more. Fix that with the symbols so that you don’t have to do the rounding yourself later. If you don’t, when you divide 22/7 (an approximation of pi), you get 3.1428571428571400000000…, but you need only 3.14 or at most 3.143.

B.2 Excel Functions
Under Statistical in Excel, there are many functions which are relevant to this Guide and will be needed to process the test and exam scores of the students. The commonly used ones are briefly described below.

AVERAGE: Calculates the average or arithmetic mean (or just mean).
CORREL: Calculates the correlation coefficient between two sets of scores.
MAX: Finds the highest score in a set.
MEDIAN: Finds the middle-most score in a set.
MIN: Finds the smallest score in a set.
MODE.SNGL: Finds the most frequently occurring score in a set. When there is more than one mode, the lowest value is shown.
PEARSON: Calculates the Pearson product-moment correlation coefficient. Same as CORREL.
STDEV.P: Calculates the standard deviation based on the entire population.
STDEV.S: Estimates the standard deviation based on a sample.

B.3 Web-Based Calculators
There are many user-friendly statistics calculators on the Internet, and they are free. However, they have different methods of data input: some allow copy-and-paste, others need individual entry with or without spacing or a separator; some calculate the statistics you need from raw data, others use processed data to calculate them. Needless to say, some give you just the statistics you need, and others are more sophisticated, giving you alternatives and choices and even helping you interpret the results. The chi-square and exact probability calculators used above are just two such
web-based tools.

There are also suites of statistics calculators that can be downloaded free and used like many other computing software packages. Of course, for such freeware, you need to learn how to operate them to get your desired statistics. Also, watch out for their idiosyncrasies, which arise from limitations inherent in the programmes and are revealed in error messages.

Appendix C
Interesting and Useful Websites

It is a truism that there is no end to understanding statistics. And the more one knows, the more one wants to know. There are numerous websites on statistics to help in this journey of learning.

For readers who wish to learn more about statistics (more specifically, educational statistics) in a more formal and academic way, the websites listed below should prove useful. Listed later are some web-based statistics calculators which take away the chore of data manipulation (which is the bane of using statistics to deal with test and examination scores).

C.1 Recommended Readings

Martz, E. (15 December, 2015). Approaching Statistics as a Language. The Minitab Blog.
http://blog.minitab.com/blog/understanding-statistics/approaching-statistics-as-a-language
Statistics is indeed a language which facilitates communication in a concrete and objective way to avoid miscommunication and confusion. This webpage helps in understanding the nature of statistics.

Martz, E. (29 July, 2015). 10 Statistical Terms Designed to Confuse Non-Statisticians. The Minitab Blog.
http://blog.minitab.com/blog/understanding-statistics/10-statistical-terms-designed-to-confuse-non-statisticians
Like language, statistics uses words which have meanings. Unfortunately, some words which are commonly used have different meanings when used as statistical terms. This webpage lists the commonly mistaken ones.
Evans, J. W. (n.d.). Basic Statistics Web Site for Nova Southeastern University Educational Leadership Students. The Minitab Blog.
This webpage provides a comprehensive suite of statistical concepts and techniques, many of which are covered in this book. These are explained and expanded to enrich the reader’s statistical understanding.

Remenyi, D., Onofrei, G., & English, J. (2009). An Introduction to Statistics Using Microsoft Excel. Academic Publishing Limited.
http://academic-publishing.org/pdfs/01c-xl-stats_extract.pdf
The webpage is specific to the use of Excel and deals with most of the concepts and techniques covered in this book. It consolidates what the reader has learned and adds more.

Sensky, T. (n.d.). Basic Statistics: A Survival Guide [PPT].
https://education.med.imperial.ac.uk/ext/intercalate11-12/statistics.ppt
Focusing more on understanding than manipulation, this webpage consolidates and expands the reader’s new knowledge of statistics.

C.2 For Calculation

Statistics Calculators
http://www.mathportal.org/calculators/statistics-calculator/
This is a comprehensive web-based calculator which covers a very wide range of mathematical concepts and techniques, many of which are not covered in this book. For statistics, it calculates descriptive statistics, standard deviation, and correlation, among others. Data can be cut from, say, Excel, and pasted into it.

Calculation for the Chi-square Test
http://www.quantpsy.org/chisq/chisq.htm
This calculator is convenient for calculating chi-square for two-way tables of various sizes and is therefore very flexible. Its output includes the simple Pearson’s chi-square and the Yates’ corrected chi-square, together with their corresponding p-values.

Effect Size Calculator
http://www.uccs.edu/~lbecker/
This web-based calculator works out the effect size (Cohen’s d) using the means and standard deviations of the compared groups. It also shows the r, if this is the preferred effect size
indicator.

Effect Size, Cohen’s d Calculator for T Test
https://www.easycalculation.com/statistics/effect-size-t-test.php
Sometimes we read research reports or articles which show group comparisons by the t-test, but the t-value and its corresponding p-value indicate probability and not the magnitude of the effect. We therefore need to know the effect size, and this calculator does just that for us.
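For readers who prefer to check such figures themselves, the arithmetic behind effect size calculators of this kind can be sketched in a few lines of Python. This is only an illustrative sketch using common textbook formulas, not a description of how the calculators listed above are implemented. It assumes two independent groups of roughly equal size; the function names and the sample figures are hypothetical.

```python
import math


def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d from two group means and standard deviations.

    Assumption: groups of roughly equal size, so the pooled SD is the
    root mean square of the two SDs.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


def d_to_r(d):
    """Convert Cohen's d to the correlation-type effect size r.

    Assumption: equal group sizes (the usual textbook conversion).
    """
    return d / math.sqrt(d ** 2 + 4)


def d_from_t(t, n1, n2):
    """Recover Cohen's d from a reported independent-samples t-value
    and the two group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Hypothetical figures: class A averages 65, class B averages 60,
    # and both have a standard deviation of 10 marks.
    d = cohens_d(65, 10, 60, 10)
    print(f"Cohen's d = {d:.2f}")          # Cohen's d = 0.50
    print(f"r = {d_to_r(d):.2f}")          # r = 0.24
    # Hypothetical report: t = 2.0 with 32 students in each group.
    print(f"d from t = {d_from_t(2.0, 32, 32):.2f}")  # d from t = 0.50
```

The last function shows why a t-value alone does not convey magnitude: the same t corresponds to a smaller d when the groups are larger.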

Date posted: 13/08/2016, 09:04


