Statistical Tools for Program Evaluation: Methods and Applications to Economic Policy, Public Health, and Education


Jean-Michel Josselin · Benoît Le Maux

Statistical Tools for Program Evaluation: Methods and Applications to Economic Policy, Public Health, and Education

Jean-Michel Josselin, Faculty of Economics, University of Rennes, Rennes, France
Benoît Le Maux, Faculty of Economics, University of Rennes, Rennes, France

ISBN 978-3-319-52826-7
ISBN 978-3-319-52827-4 (eBook)
DOI 10.1007/978-3-319-52827-4
Library of Congress Control Number: 2017940041

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper. This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Acknowledgments

We would like to express our gratitude to those who helped us and made the completion of this book possible. First of all, we are deeply indebted to the Springer editorial team, and particularly Martina BIHN, whose support and encouragement allowed us to finalize this project. Furthermore, we have benefited from helpful comments by colleagues, and we would like to acknowledge the help of Maurice BASLÉ, Arthur CHARPENTIER, Pauline CHAUVIN, Salah GHABRI, and Christophe TAVÉRA. Of course, any mistake that may remain is our entire responsibility. In addition, we are grateful to our students, who have been testing and experimenting with our lectures for so many years. Parts of the material provided here have been taught at the Bachelor and Master levels, in France and abroad. Several students and former students have been helping us improve the book. We really appreciated their efforts and are very grateful to them: Erwan AUTIN, Benoît CARRÉ, Aude DAILLÈRE, Kristýna DOSTÁLOVÁ, and Adrien VEZIE. Finally, we would like to express our sincere gratefulness to our families for their continuous support and encouragement.

Contents

1 Statistical Tools for Program Evaluation: Introduction and Overview
  1.1 The Challenge of Program Evaluation
  1.2 Identifying the Context of the Program
  1.3 Ex ante Evaluation Methods
  1.4 Ex post Evaluation
  1.5 How to Use the Book?
  Bibliography

Part I Identifying the Context of the Program

2 Sampling and Construction of Variables
  2.1 A Step Not to Be Taken Lightly
  2.2 Choice of Sample
  2.3 Conception of the Questionnaire
  2.4 Data Collection
  2.5 Coding of Variables
  Bibliography

3 Descriptive Statistics and Interval Estimation
  3.1 Types of Variables and Methods
  3.2 Tabular Displays
  3.3 Graphical Representations
  3.4 Measures of Central Tendency and Variability
  3.5 Describing the Shape of Distributions
  3.6 Computing Confidence Intervals
  References

4 Measuring and Visualizing Associations
  4.1 Identifying Relationships Between Variables
  4.2 Testing for Correlation
  4.3 Chi-Square Test of Independence
  4.4 Tests of Difference Between Means
  4.5 Principal Component Analysis
  4.6 Multiple Correspondence Analysis
  References

5 Econometric Analysis
  5.1 Understanding the Basic Regression Model
  5.2 Multiple Regression Analysis
  5.3 Assumptions Underlying the Method of OLS
  5.4 Choice of Relevant Variables
  5.5 Functional Forms of Regression Models
  5.6 Detection and Correction of Estimation Biases
  5.7 Model Selection and Analysis of Regression Results
  5.8 Models for Binary Outcomes
  References

6 Estimation of Welfare Changes
  6.1 Valuing the Consequences of a Project
  6.2 Contingent Valuation
  6.3 Discrete Choice Experiment
  6.4 Hedonic Pricing
  6.5 Travel Cost Method
  6.6 Health-Related Quality of Life
  References

Part II Ex ante Evaluation

7 Financial Appraisal
  7.1 Methodology of Financial Appraisal
  7.2 Time Value of Money
  7.3 Cash Flows and Sustainability
  7.4 Profitability Analysis
  7.5 Real Versus Nominal Values
  7.6 Ranking Investment Strategies
  7.7 Sensitivity Analysis
  References

8 Budget Impact Analysis
  8.1 Introducing a New Intervention Amongst Existing Ones
  8.2 Analytical Framework
  8.3 Budget Impact in a Multiple-Supply Setting
  8.4 Example
  8.5 Sensitivity Analysis with Visual Basic
  References

9 Cost Benefit Analysis
  9.1 Rationale for Cost Benefit Analysis
  9.2 Conceptual Foundations
  9.3 Discount of Benefits and Costs
  9.4 Accounting for Market Distortions
  9.5 Deterministic Sensitivity Analysis
  9.6 Probabilistic Sensitivity Analysis
  9.7 Mean-Variance Analysis
  Bibliography

10 Cost Effectiveness Analysis
  10.1 Appraisal of Projects with Non-monetary Outcomes
  10.2 Cost Effectiveness Indicators
  10.3 The Efficiency Frontier Approach
  10.4 Decision Analytic Modeling
  10.5 Numerical Implementation in R-CRAN
  10.6 Extension to QALYs
  10.7 Uncertainty and Probabilistic Sensitivity Analysis
  10.8 Analyzing Simulation Outputs
  References

11 Multi-criteria Decision Analysis
  11.1 Key Concepts and Steps
  11.2 Problem Structuring
  11.3 Assessing Performance Levels with Scoring
  11.4 Criteria Weighting
  11.5 Construction of a Composite Indicator
  11.6 Non-Compensatory Analysis
  11.7 Examination of Results
  References

Part III Ex post Evaluation

12 Project Follow-Up by Benchmarking
  12.1 Cost Comparisons to a Reference
  12.2 Cost Accounting Framework
  12.3 Effects of Demand Structure and Production Structure on Cost
  12.4 Production Structure Effect: Service-Oriented Approach
  12.5 Production Structure Effect: Input-Oriented Approach
  12.6 Ranking Through Benchmarking
  References
13 Randomized Controlled Experiments
  13.1 From Clinical Trials to Field Experiments
  13.2 Random Allocation of Subjects
  13.3 Statistical Significance of a Treatment Effect
  13.4 Clinical Significance and Statistical Power
  13.5 Sample Size Calculations
  13.6 Indicators of Policy Effects
  13.7 Survival Analysis with Censoring: The Kaplan-Meier Approach
  13.8 Mantel-Haenszel Test for Conditional Independence
  References

14 Quasi-experiments
  14.1 The Rationale for Counterfactual Analysis
  14.2 Difference-in-Differences
  14.3 Propensity Score Matching
  14.4 Regression Discontinuity Design
  14.5 Instrumental Variable Estimation
  References

1 Statistical Tools for Program Evaluation: Introduction and Overview

1.1 The Challenge of Program Evaluation

The past 30 years have seen a convergence of management methods and practices between the public sector and the private sector, not only at the central government level (in particular in Western countries) but also at upper levels (European Commission, OECD, IMF, World Bank) and local levels (municipalities, cantons, regions). This "new public management" intends to rationalize public spending, boost the performance of services, get closer to citizens' expectations, and contain deficits. A key feature of this evolution is that program evaluation is nowadays part of the policy-making process or, at least, on its way to becoming an important step in the design of public policies. Public programs must show evidence of their relevance, financial sustainability and operationality. Although not yet systematically enacted, program evaluation intends to grasp the impact of public projects on citizens, as comprehensively as possible, from economic to social and environmental consequences on individual and collective welfare. As can be deduced, the task is highly challenging, as it is not so easy to put a value on items such as welfare, health, education or changes in the environment. The task is all the more demanding as a significant level of expertise is required for measuring those impacts or for comparing different policy options.

The present chapter offers an introduction to the main concepts that will be used throughout the book. First, we shall start by defining the concept of program evaluation itself. Although there is no consensus in this respect, we may refer to the OECD glossary, which states that evaluation is the "process whereby the activities undertaken by ministries and agencies are assessed against a set of objectives or criteria." According to Michael Quinn Patton, former President of the American Evaluation Association, program evaluation can also be defined as "the systematic collection of information about the activities, characteristics, and outcomes of programs, for use by people to reduce uncertainties, improve effectiveness, and make decisions." We may also propose our own definition of the concept: program evaluation is a process that consists in collecting, analyzing, and using information …

14 Quasi-experiments

14.1 The Rationale for Counterfactual Analysis

Impact evaluation assesses the degree to which changes in a specific outcome or variable of interest, as measured by a pre-specified set of indicators, can be attributed to a program rather than to other factors. Such an evaluation generally requires a counterfactual analysis to assess what the outcome would have looked like in the absence of the intervention.
The main issue is that the counterfactual cannot be observed individually (the same unit cannot be exposed and unexposed at the same time), which means that one cannot directly calculate any individual-level causal effect. Instead, the counterfactual must be approximated with reference to a comparison group. Broadly speaking, one needs to compare a group that received the intervention, the "treatment group", against a similar group, the "comparison group", which did not receive the intervention. The observed difference in mean outcome between the treatment group and the comparison group can then be inferred to be caused by the intervention. What is observed in the comparison group serves as the counterfactual of what would have happened in the absence of the intervention.

Two types of methods can be used to generate the counterfactual: randomized controlled experiments and quasi-experiments. Both approaches rely on the estimation of the average causal effect in a population. In the first case, the treatment group and the comparison group (also termed "control group" in this case) are selected randomly from the same population. Similarly, quasi-experimental evaluation estimates the causal impact of an intervention, the difference being that it does not randomly assign the units between the treatment group and the comparison group. Hence, a key issue with quasi-experimental methods is to find a proper comparison group that resembles the treatment group in everything but the fact of receiving the intervention. The term "comparison group" differs from the narrower term "control group" in that the former is not necessarily selected randomly from the same population as program participants.

To illustrate the many difficulties that occur with a quasi-experimental design, let us consider a four-outcome setting by distinguishing the units that are exposed to the program from those that are not, and whether the outcomes are observed before or after the intervention. More specifically, let $\bar{y}^{SP}$ denote the average outcome in each case, with $S$ being equal to one if the group was selected to receive treatment (and zero otherwise) and $P$ denoting the time period ($P = 0$ before the intervention and $P = 1$ after). The four possible cases are depicted in Table 14.1.

Table 14.1 Treated and non-treated groups in a two-period setting

| | Before intervention (P = 0) | After intervention (P = 1) | Δ after/before |
|---|---|---|---|
| Non-treated group (S = 0) | $\bar{y}^{00}$ | $\bar{y}^{01}$ | |
| Treated group (S = 1) | $\bar{y}^{10}$ | $\bar{y}^{11}$ | $\bar{y}^{11} - \bar{y}^{10}$ |
| Δ treated/non-treated | | $\bar{y}^{11} - \bar{y}^{01}$ | |

How can we assess the impact of the intervention using these four possible outcomes?
Answering that question is not straightforward, since a variation in the variable of interest, when measured by single differences, can always be thought of as the sum of two elements: the true average effect of the intervention ($E$ hereafter) and biases due to the quasi-experimental design itself.

First, should we compare the outcome observed for the treated units after and before they have been exposed to the intervention, the analysis would suffer from an omitted-variable bias. The within-subjects estimate of the treatment effect, which measures the difference over time, is given by:

$$\Delta^{S=1}_{\text{after/before}} = \bar{y}^{11} - \bar{y}^{10} = E + \text{omitted-variable bias}$$

The fundamental problem here is that the observed change through time could be due to the true effect $E$ of the intervention, but also to other changes occurring during the same period. The only case in which after-versus-before comparisons are relevant is when no other factor could plausibly have caused any observed change in outcome.

Second, should we concentrate on the period after the intervention and compare the group that has been exposed to the intervention with the group that has not, then the analysis could suffer from a selection bias. The between-subjects estimate of the treatment effect, which measures the difference between the treated and non-treated groups, is as follows:

$$\Delta^{P=1}_{\text{treated/non-treated}} = \bar{y}^{11} - \bar{y}^{01} = E + \text{selection bias}$$

A selection bias appears when the comparison group is drawn from a different population than the treatment group. The differences that are observed between the treated and non-treated groups could have been generated by the selection process itself and not necessarily caused by the intervention.

Assume for instance that one aims to evaluate the effect of a tutoring program for children at risk of school failure. It consists in lessons that prepare students to pass the examinations of the second semester. Only students who volunteered attend these sessions. To establish the impact of tutoring, we may try to compare the average marks $\bar{y}^{10}$ of those who participated in the program before they were exposed to the intervention (e.g., first semester) with the marks $\bar{y}^{11}$ they obtained after the intervention (second semester). Imagine now that the teachers of the first semester were more prone to give good marks than the teachers of the second semester. Approximating the impact of the intervention using an after-versus-before comparison would then yield an underestimation of the effect. Should we focus on the second semester only, we could compare the marks $\bar{y}^{11}$ of those who benefited from the training sessions with the marks $\bar{y}^{01}$ of those who did not. However, the evaluation may in this case be affected by a selection bias: those who decided to participate in the tutoring program may also be the most motivated, and not necessarily those at risk of failing exams. Using treated-versus-non-treated comparisons would in this case yield an overestimation of the impact.

Basically, the ideal way to eliminate a selection bias is to randomly select the units that belong to the non-treated and treated groups. However, implementing randomized controlled experiments is not always feasible, given the many legal, ethical, logistical, and political constraints that may be associated with them. Another problem with randomization is that other biases may appear, due to the sampling process itself, but also to the fact that the experimental design may demotivate those who have been randomized out, or generate noncompliance among those who have been randomized in.
In those instances, the alternative we are left with is a quasi-experiment, i.e., not to allocate participants randomly.

The key feature of quasi-experimental evaluation is that one needs to identify a comparison group among the non-treated units that is as similar as possible to the treatment group in terms of pre-intervention characteristics. This comparison group is supposed to capture the counterfactual, i.e., what the outcome would have been if the intervention had not been implemented. The average treatment effect is then given by:

$$E = \bar{y}^{11} - \bar{y}^{c}$$

where $\bar{y}^{c}$ denotes the counterfactual outcome. While a quasi-experimental design aims to establish impact in a relevant manner, it does not relate the extent of the effect to the cost of the intervention. Instead, the challenge is to prove causality by using an adequate identification strategy. An identification strategy is the manner in which one uses observational data to approximate a randomized experiment. In practice, the counterfactual $\bar{y}^{c}$ is approximated using quasi-experimental methods such as difference-in-differences, regression discontinuity design, propensity score matching, or instrumental variable estimation. The chapter provides a review of these statistical tools.

14.2 Difference-in-Differences

Difference-in-differences, also known as double differencing, is by far the simplest method to estimate the impact of an intervention, especially as it does not necessarily require a large set of data. The method consists in comparing the changes in outcome over time between the treatment and comparison groups. Unlike single differences (within- and between-subjects estimates), the approach considers both the time period $P$ and the selection process $S$ to estimate the impact. Using the setting developed in the previous section, we have:

$$\hat{E} = \Delta^{P=1}_{\text{treated/non-treated}} - \Delta^{P=0}_{\text{treated/non-treated}} = \left(\bar{y}^{11} - \bar{y}^{01}\right) - \left(\bar{y}^{10} - \bar{y}^{00}\right)$$

The method thus requires that outcome data be available for treated and non-treated units, both before and after the intervention. The assumption underlying the identification strategy is that the selection bias is constant through time:

$$\text{selection bias} \approx \Delta^{P=0}_{\text{treated/non-treated}}$$

In other words, the approach aims to eliminate any potential difference between the treated and comparison groups by using information from the pre-intervention period.

An alternative but equivalent way to explain the difference-in-differences approach is to consider the change observed over time among the treated and non-treated units. As stated in the previous section, this difference cannot be interpreted as the impact of the intervention, because other factors might have caused the observed variation. However, one plausible way to take these dynamics into account is to use the change in outcome observed over time among the non-treated units:

$$\hat{E} = \Delta^{S=1}_{\text{after/before}} - \Delta^{S=0}_{\text{after/before}} = \left(\bar{y}^{11} - \bar{y}^{10}\right) - \left(\bar{y}^{01} - \bar{y}^{00}\right)$$

The assumption underlying the identification strategy is that the trend of the treated group in the absence of the intervention would have been the same as that of the non-treated group:

$$\text{omitted-variable bias} \approx \Delta^{S=0}_{\text{after/before}}$$

This assumption is also known as the parallel-trend assumption.
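The double difference is a one-line computation once the four average outcomes $\bar{y}^{SP}$ are known. The following minimal R sketch is ours, not from the book; the helper name and the illustrative means are made up:

```r
# Double difference of the four average outcomes y^{SP}:
# (after - before) for the treated, net of (after - before) for the
# non-treated; equivalently, the treated/non-treated gap net of its
# pre-intervention level.
did <- function(y00, y01, y10, y11) (y11 - y10) - (y01 - y00)

# Made-up means: a pre-existing gap of 7 and a common downward trend.
did(y00 = 13, y01 = 12, y10 = 20, y11 = 17)  # returns -2
```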
In practice, it is common to present the result of a difference-in-differences evaluation using a framework similar to that of Table 14.2, where single and double differences are elicited.

Table 14.2 Double difference calculations

| | Before intervention (P = 0) | After intervention (P = 1) | Δ after/before |
|---|---|---|---|
| Non-treated group (S = 0) | $\bar{y}^{00}$ | $\bar{y}^{01}$ | $\bar{y}^{01} - \bar{y}^{00}$ |
| Treated group (S = 1) | $\bar{y}^{10}$ | $\bar{y}^{11}$ | $\bar{y}^{11} - \bar{y}^{10}$ |
| Δ treated/non-treated | $\bar{y}^{10} - \bar{y}^{00}$ | $\bar{y}^{11} - \bar{y}^{01}$ | $\left(\bar{y}^{11} + \bar{y}^{00}\right) - \left(\bar{y}^{01} + \bar{y}^{10}\right)$ |

The treated-versus-non-treated and after-versus-before approaches yield the same result:

$$\hat{E} = \left(\bar{y}^{11} + \bar{y}^{00}\right) - \left(\bar{y}^{01} + \bar{y}^{10}\right) = \left(\bar{y}^{11} - \bar{y}^{01}\right) - \left(\bar{y}^{10} - \bar{y}^{00}\right)$$

The counterfactual is thus approximated by:

$$\hat{\bar{y}}^{c} = \bar{y}^{01} + \left(\bar{y}^{10} - \bar{y}^{00}\right)$$

In these expressions, the hat symbol over variable $y$ means that the calculated quantity is only an estimate of the counterfactual $\bar{y}^{c}$, and so is the observed impact $\hat{E}$ of the intervention.

Figure 14.1 illustrates the approach. The treatment group (S = 1) is represented in orange, while the non-treated group (S = 0) is represented in blue. The outcome variable is measured both before and after the intervention takes place. After the intervention (P = 1), the difference observed between the treated group and the non-treated group does not reveal the true effect of the intervention: a difference was already observed before the intervention, at P = 0. The difference-in-differences approach controls for this selection bias by subtracting the difference observed between the two groups before the intervention from the difference observed after the intervention. In other words, we assume that without any intervention, the trend of the treated group would have been similar to that of the non-treated group. Graphically, this is equivalent to drawing a line parallel to the trend observed among non-treated units, but starting where the treated units are at P = 0. The dotted line yields the counterfactual $\hat{\bar{y}}^{c}$, which is depicted by a dotted square at P = 1.

Fig. 14.1 The difference-in-differences approach

The results can also be reproduced through an econometric analysis. The model requires the use of a database with information on treated and non-treated units, both before and after the intervention. Formally, for each unit $i$ and time period $t$, the outcome $y_{it}$ can be modeled via the following equation:

$$y_{it} = \alpha + \beta S_i + \gamma P_t + \delta \left(S_i \times P_t\right) + \varepsilon_{it}$$

where $S_i$ is the group variable, $P_t$ is the dummy that controls for the timing of the treatment, the coefficients $\alpha$, $\beta$, $\gamma$ and $\delta$ are the parameters to be estimated, and $\varepsilon_{it}$ is an error term which contains all the factors the model omits. The product $S_i \times P_t$ is an interaction term that represents the treatment variable, i.e., whether an individual from group $S = 1$ received treatment in period $P = 1$. The estimated counterpart of the equation can be written as:

$$y_{it} = \hat{\alpha} + \hat{\beta} S_i + \hat{\gamma} P_t + \hat{\delta} \left(S_i \times P_t\right) + \hat{\varepsilon}_{it}$$

Ordinary least squares (OLS) regressions are such that the mean of the residuals is exactly equal to zero. Thus, on average, we have:

$$\bar{y}^{SP} = \hat{\alpha} + \hat{\beta} S + \hat{\gamma} P + \hat{\delta} \left(S \times P\right)$$

We thereby obtain the four possible outcomes of Table 14.2. If $S = 0$ and $P = 0$, we have $\bar{y}^{00} = \hat{\alpha}$. Similarly, if $S = 1$ and $P = 0$, we get $\bar{y}^{10} = \hat{\alpha} + \hat{\beta}$. If $S = 0$ and $P = 1$, we obtain $\bar{y}^{01} = \hat{\alpha} + \hat{\gamma}$. Last, if $S = 1$ and $P = 1$, then we have $\bar{y}^{11} = \hat{\alpha} + \hat{\beta} + \hat{\gamma} + \hat{\delta}$. Under this setting, the single differences between subjects are given by:

$$\Delta^{P=0}_{\text{treated/non-treated}} = \bar{y}^{10} - \bar{y}^{00} = \hat{\beta}$$

$$\Delta^{P=1}_{\text{treated/non-treated}} = \bar{y}^{11} - \bar{y}^{01} = \hat{\beta} + \hat{\delta}$$

The estimated impact of the intervention is:

$$\hat{E} = \Delta^{P=1}_{\text{treated/non-treated}} - \Delta^{P=0}_{\text{treated/non-treated}} = \hat{\delta}$$

The regression approach entails a loss in terms of simplicity, but has the advantage of providing a level of significance for the estimated impact. Moreover, to go beyond the assumption that other covariates do not change across time, the regression model can be extended by including additional variables that may affect the outcome of interest.
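The equivalence between the double difference of group means and the OLS interaction coefficient can be checked numerically. The sketch below is ours (simulated data, base R only) and assumes a balanced two-period panel:

```r
set.seed(1)

# Simulated balanced panel: 100 units, half treated, two periods.
n  <- 100
S  <- rep(c(0, 1), each = n / 2)   # group indicator
df <- data.frame(
  S = rep(S, times = 2),
  P = rep(c(0, 1), each = n)       # period indicator
)
# True parameters: alpha = 10, beta = 5 (selection gap), gamma = -1
# (common trend), delta = -2 (treatment effect), plus noise.
df$y <- 10 + 5 * df$S - 1 * df$P - 2 * df$S * df$P + rnorm(2 * n)

# Double difference of the four group means ...
m <- with(df, tapply(y, list(S, P), mean))
(m["1", "1"] - m["0", "1"]) - (m["1", "0"] - m["0", "0"])

# ... equals the OLS interaction coefficient (S*P expands to S + P + S:P).
coef(lm(y ~ S * P, data = df))["S:P"]
```

Both quantities come out at about −2, the treatment effect built into the simulation.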
Let us exemplify the approach through a simple application (example 1). Imagine that we would like to estimate the effects of a community-based health program on newborn mortality. Assume that this program provides primary care through the use of nurse teams intervening at the city level (e.g., counseling and prevention). The data are provided in Table 14.3. Our dataset comprises 20 jurisdictions, among which the nine municipalities numbered from 12 to 20 were selected for the program (S = 1). We also have information about the pre-intervention period (P = 0). The mortality rate is expressed per thousand newborns. The last column provides information about the gross domestic product (GDP) per capita in each city, for both periods. By definition, Table 14.3 forms a panel database, as it contains observations over two periods for each unit.

Table 14.3 Database for example 1

| Municipality | S | P | Mortality rate | Municipal GDP |
|---|---|---|---|---|
| 1 | 0 | 0 | 15 | 9865 |
| 1 | 0 | 1 | 14 | 10,608 |
| 2 | 0 | 0 | 16 | 8698 |
| 2 | 0 | 1 | 15 | 9692 |
| 3 | 0 | 0 | 17 | 9520 |
| 3 | 0 | 1 | 16 | 9820 |
| 4 | 0 | 0 | 15 | 8542 |
| 4 | 0 | 1 | 15 | 9876 |
| 5 | 0 | 0 | 20 | 6200 |
| 5 | 0 | 1 | 19 | 7023 |
| 6 | 0 | 0 | 12 | 12,698 |
| 6 | 0 | 1 | 11 | 13,466 |
| 7 | 0 | 0 | 12 | 13,569 |
| 7 | 0 | 1 | 11 | 14,569 |
| 8 | 0 | 0 | 16 | 7231 |
| 8 | 0 | 1 | 15 | 8965 |
| 9 | 0 | 0 | 10 | 10,236 |
| 9 | 0 | 1 | 8 | 11,598 |
| 10 | 0 | 0 | 8 | 12,589 |
| 10 | 0 | 1 | 8 | 13,569 |
| 11 | 0 | 0 | 10 | 13,202 |
| 11 | 0 | 1 | 9 | 14,598 |
| 12 | 1 | 0 | 19 | 7566 |
| 12 | 1 | 1 | 15 | 7727 |
| 13 | 1 | 0 | 22 | 5640 |
| 13 | 1 | 1 | 18 | 5964 |
| 14 | 1 | 0 | 20 | 6720 |
| 14 | 1 | 1 | 17 | 7023 |
| 15 | 1 | 0 | 20 | 6560 |
| 15 | 1 | 1 | 17 | 6780 |
| 16 | 1 | 0 | 22 | 5201 |
| 16 | 1 | 1 | 19 | 5469 |
| 17 | 1 | 0 | 21 | 5678 |
| 17 | 1 | 1 | 18 | 6521 |
| 18 | 1 | 0 | 19 | 7021 |
| 18 | 1 | 1 | 16 | 7243 |
| 19 | 1 | 0 | 21 | 5023 |
| 19 | 1 | 1 | 18 | 6038 |
| 20 | 1 | 0 | 20 | 6541 |
| 20 | 1 | 1 | 17 | 6456 |

Table 14.4 provides descriptive statistics for each period P ∈ {0, 1} and both groups S ∈ {0, 1}. The program appears to have been implemented in municipalities that were poorer in terms of GDP and had worse mortality rates, which creates a selection bias.

Table 14.4 Summary statistics for example 1

| Group | Period | Mortality rate | Municipal GDP |
|---|---|---|---|
| Non-treated (S = 0) | Before | 13.73 | 10,213.64 |
| Non-treated (S = 0) | After | 12.73 | 11,253.09 |
| Treated (S = 1) | Before | 20.44 | 6,216.67 |
| Treated (S = 1) | After | 17.22 | 6,580.11 |

Table 14.5 provides a more detailed view of the evolution of the groups. The first frame of interpretation is that of single differences. Non-treated municipalities evidence a decrease (−1) in their average mortality rate that by construction cannot be attributed to the policy, but to some yet unspecified variables. The treated municipalities show a relatively larger decrease (−3.22) that must nevertheless be related to the initial gap (6.71) between the two groups, as well as to yet unspecified variables. This gap remains but decreases to 4.49 with the intervention.

Table 14.5 Basic difference-in-differences: example 1

| | Before intervention (P = 0) | After intervention (P = 1) | Δ after/before |
|---|---|---|---|
| Non-treated group (S = 0) | $\bar{y}^{00} = 13.73$ | $\bar{y}^{01} = 12.73$ | $\bar{y}^{01} - \bar{y}^{00} = -1$ |
| Treated group (S = 1) | $\bar{y}^{10} = 20.44$ | $\bar{y}^{11} = 17.22$ | $\bar{y}^{11} - \bar{y}^{10} = -3.22$ |
| Δ treated/non-treated | $\bar{y}^{10} - \bar{y}^{00} = 6.71$ | $\bar{y}^{11} - \bar{y}^{01} = 4.49$ | $\hat{E} = -2.22$ |

To control for the initial differences between the two groups and for their evolution over time, the difference-in-differences approach provides a second and more accurate evaluation of the effect of the intervention:

$$\hat{E} = \left(\bar{y}^{11} + \bar{y}^{00}\right) - \left(\bar{y}^{01} + \bar{y}^{10}\right) = (17.22 + 13.73) - (12.73 + 20.44) = -2.22$$

The actual impact of the health program on the treated group is in fact −2.22, less than the observed reduction of −3.22.
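Tables 14.4 and 14.5 can be reproduced from the raw data in a few lines of base R. This sketch is ours and assumes the data of Table 14.3 sit in a data frame D with columns named S, P, Mortality and GDP (as in the code after Fig. 14.2 below); it also reuses the did() helper defined earlier:

```r
# Mean mortality and GDP by group (S) and period (P): Table 14.4.
aggregate(cbind(Mortality, GDP) ~ S + P, data = D, FUN = mean)

# Plugging the four mortality means into the double difference
# reproduces the bottom-right cell of Table 14.5:
did(y00 = 13.73, y01 = 12.73, y10 = 20.44, y11 = 17.22)  # -2.22
```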
A possible extension of the method is to relax the parallel-trend assumption by including additional variables. The previous interpretation assumes a similar pattern among the treated and non-treated units. We can go beyond this statement through the use of ordinary least squares (OLS) regression. For instance, according to Table 14.4, the increase in per capita GDP observed for the cities not covered by the program might have influenced their mortality rate. However, the increase in GDP observed among the treated units was much smaller, suggesting that their mortality rate would not have followed a similar track in the absence of the intervention. This could mean that we previously underestimated the effect of the intervention. We may relax the parallel-trend assumption by including GDP per capita as an additional variable in the analysis.

Figure 14.2 provides the coding to be used in R-CRAN (our programs only display outputs directly used in the analysis). The command read.table is used to upload the database into R-CRAN using the path C://mydataDID.csv (which denotes the location of the file), saved afterwards under the name "D". The file format is csv, with ";" as a separator, and can be created with Excel. The command head displays the first rows of the dataset. The first regression (reg1) consists in reproducing the double-difference results using OLS (command lm). By default, in R-CRAN, one needs to specify only the interaction term S*P in the lm command; the software will automatically include S and P along with the interaction term. As expected, we obtain the same result as previously. The constant (intercept) amounts to 13.73, which is the average outcome $\bar{y}^{00}$ for the non-treated group before the intervention (see Table 14.5). The second coefficient stands for the single difference between the treated and non-treated units for the pre-intervention period ($\bar{y}^{10} - \bar{y}^{00} = 20.44 - 13.73 = 6.71$). The third, and here non-significant, coefficient represents the decrease in the mortality rate observed for the municipalities not covered by the program ($\bar{y}^{01} - \bar{y}^{00} = 12.73 - 13.73 = -1$). Last, the coefficient on the interaction term yields the effect of the intervention (−2.22), which is also found to be non-significant. The second regression (reg2) extends the model through the inclusion of GDP per capita among the covariates. The coefficient on the interaction term is much larger in absolute value, as anticipated; it is significant and amounts to −3.07. The command confint yields the 95% confidence interval for the interaction term, namely [−5.05; −1.10]. The implementation of the program is therefore associated with a significant reduction in mortality.

Fig. 14.2 Difference-in-differences with R-CRAN: example 1
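The code in Fig. 14.2 is a screenshot and does not survive this extraction. Based on the description above, it plausibly looks like the following sketch; the file path and separator are those mentioned in the text, while the column names Mortality and GDP are assumptions, and the exact original code may differ:

```r
# Upload the panel database of Table 14.3 (csv with ";" separators).
D <- read.table("C://mydataDID.csv", header = TRUE, sep = ";")
head(D)  # first rows of the dataset

# reg1: plain difference-in-differences. Specifying S*P makes R include
# S, P and the interaction S:P, whose coefficient is the estimated
# impact (-2.22, non-significant here).
reg1 <- lm(Mortality ~ S * P, data = D)
summary(reg1)

# reg2: control for GDP per capita to relax the parallel-trend
# assumption; the interaction coefficient becomes -3.07 (significant).
reg2 <- lm(Mortality ~ S * P + GDP, data = D)
summary(reg2)

# 95% confidence interval of the treatment effect: [-5.05; -1.10].
confint(reg2, "S:P", level = 0.95)
```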
14.3 Propensity Score Matching

The idea behind matching is to select and pair units that would be identical in everything but the fact of receiving the intervention. Several matching algorithms exist. One difficulty they share is that the units may differ in more than one variable, which yields a problem known as the curse of dimensionality. To overcome it, the propensity score matching method estimates a model of the probability of participating in the treatment using a set of observed characteristics (overt bias), and then uses the fitted values of this model to match the units. It thus allows the multidimensional problem to be reduced to a single dimension: that of the propensity score. If the score is accurately computed, the outcome observed for the comparison group should provide a satisfactory counterfactual.

Figure 14.3 illustrates the approach. The main focus is on the post-intervention period, although the scores are often estimated from pre-intervention characteristics. The orange dots represent the treated units (S = 1), while the blue ones represent the units that did not receive the intervention (S = 0). The counterfactual is approximated using the units that could have been selected in theory (those with similar propensity scores), but were not. Matched units are indicated with two squares connected by a dotted line. By matching on the propensity score, we are able to construct two comparable groups. The difference in mean outcome between these groups yields the estimated impact of the intervention. As can be seen, one condition for using the method is the existence of a sufficient overlap between the propensity scores. This is known as the common support condition. For example, in Fig. 14.3, the units with very low and very high propensity scores, respectively to the left and to the right, are excluded from the analysis.

Fig. 14.3 The propensity score matching approach

Propensity scores are derived from a qualitative response regression that estimates the probability of a unit's exposure to the intervention, conditional on a set of observable characteristics that may affect participation in the program. For instance, a logit model can be used and specified as:

$$\ln\left(\frac{S_i}{1 - S_i}\right) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki}$$

where $i$ stands for unit $i$, $S_i$ specifies whether unit $i$ belongs to the treatment group, the $x$'s represent the individual characteristics, and the $\beta$'s are the coefficients to be estimated. Once estimated, the model yields the propensity score, defined as the estimated probability $\hat{S}_i$ that unit $i$ receives treatment, given the vector of observed covariates.

An important problem with respect to propensity score matching is to identify the $x$ variables to be included in the model. In general, any variable that is thought to simultaneously influence the exposure $S$ and the outcome variable $y$ should be included. One should not use variables observed after the intervention, as they could themselves be influenced by the intervention. Thus, a crucial issue is the availability of characteristics observed before the intervention takes place.
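In R, such a logit model can be estimated with glm from base R. The short sketch below is ours; the data frame dat and the covariates x1 and x2 are hypothetical, not objects from the book:

```r
# Logit model of treatment status S on pre-intervention covariates.
ps_model <- glm(S ~ x1 + x2, data = dat, family = binomial(link = "logit"))

# Propensity score: fitted probability that each unit receives treatment.
dat$pscore <- predict(ps_model, type = "response")

# A quick look at common support: score distributions by group.
tapply(dat$pscore, dat$S, summary)
```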
unidimensional and exogenous) and a related outcome Figure 14.4 shows the corresponding scatter plot, with blue dots for the control group and orange dots for the treated group Without matching, the difference in outcome means between the two groups is: S¼0 b unmatched ¼ yS¼1 unmatched E ¼ 56:25 À 26:67 ¼ 29:58 unmatched À y The matching principle takes into account the fact that a number of units are relatively distant from the others and that it would be more relevant to compare the outcomes of units with neighboring locations by giving more weights to central observations As was mentioned before, the nearest neighbor matching may link units in three different ways With the ATT method, the focus is on the treated units (4, 5, and 7) and how they differ from their non-treated counterparts (controls 1, and 3) Figure 14.5 displays the corresponding matching All four orange dots are matched The analysis excludes unit as it differs too much from any potential counterpart 14.3 Propensity Score Matching 501 70 Treatment and control groups 60 40 30 Outcome 50 10 20 0.2 0.3 0.4 0.5 0.6 Score Fig 14.4 Scatter plot for example With the ATC method, the focus is reversed and the blue dots of the control group are matched to their nearest neighbor treated units, which leaves treated units and unmatched Table 14.7 shows how the corresponding differences in means are calculated More generally, one would prefer to combine both approaches and compute the average treatment effect (ATE) instead Hand calculations from Table 14.6 give the following differences in outcome means between the two groups: b ATT ị ẳ 45 ỵ 50 ỵ 60 ỵ 70ị 30 ỵ 40 ỵ 40 ỵ 40ị ẳ 18:75 E 4 b ATCị ẳ 45 ỵ 45 ỵ 50ị 10 ỵ 30 ỵ 40ị ẳ 20 E 3 b ATEị ẳ 45 ỵ 50 ỵ 60 ỵ 70ị ỵ 45 ỵ 45 ỵ 50ị E 30 ỵ 40 ỵ 40 þ 40Þ þ ð10 þ 30 þ 40Þ À ¼ 19:286 502 14 60 Outcome 40 50 50 30 20 10 20 10 40 60 30 Outcome 70 ATC 70 ATT Quasi-experiments 0.2 0.3 0.4 0.5 0.6 0.2 0.3 0.4 0.5 0.6 Score Score Fig 14.5 ATT and ATC matching methods: example Table 14.7 ATT, ATC and ATE average outcomes ATT Outcome Matched outcome Matched control unit Unit Treatment group Control group Unit 45 50 60 70 30 40 40 40 3 ATC Outcome Matched outcome Matched treated unit Unit Control group Treatment group Unit 10 30 40 45 45 50 4 Mean outcome differences that take into account matching differ from the b unmatched ¼ 29:58) unmatched result (E Figure 14.6 provides the coding to be used in R-CRAN to run ATT, ATC and ATE analyses The package Matching is uploaded via the library command The first step consists in creating the data to be matched By order of appearance, S is a vector stating which units have been treated or not In our example, the first three units are coded which designates them as controls, code implies that the corresponding unit is treated; Score denotes the variable we wish to match on (in this illustrative case, its values are exogenous); Outcome is our variable of interest; and Units indexes the individuals The command Match is then used to compute the average treatment effect for the treated (mymatch1), for the controls (mymatch2), and the ATE measurement (mymatch3) In each case, we have to 14.3 Propensity Score Matching Fig 14.6 ATT, ATC and ATE with R-CRAN: example 503 .. .Statistical Tools for Program Evaluation Jean-Michel Josselin • Benoıˆt Le Maux Statistical Tools for Program Evaluation Methods and Applications to Economic Policy, Public Health, and Education. .. 
citizens prefer to receive goods and services now rather than later 8 Statistical Tools for Program Evaluation: Introduction and Overview Ex ante evaluation Financial evaluation Economic evaluation. .. question may differ greatly In health, for instance, the outcome can relate to survival In education, it can be Statistical Tools for Program Evaluation: Introduction and Overview school completion
