Research Article

A checklist to guide sensitivity analyses and replications of impact evaluations

Sridevi K. Prasad, Fiona Kastel, Suvarna Pande, Chenxiao Zhang & Douglas M. Glandon
Received 26 Jun 2023, Accepted 08 Feb 2024, Published online: 25 Feb 2024

ABSTRACT

The transparency, reproducibility, and ethics movement has helped promote efforts to improve the rigour and validity of social science research. However, replication research, in which the original data are used to check the validity and robustness of the estimations and recommendations in the original paper, has not been widely used as a tool in this field. This article outlines a checklist and standardised process that can guide replication researchers in creating and justifying their replication plans. To create the checklist, we reviewed 31 resources that were identified through unstructured keyword searches. We also reviewed and coded replication studies from 3ie’s Replication Program, which includes more than 20 3ie-funded replication papers and guidance on our replication process. The checklist comprises five categories of potential validation tests and robustness checks that researchers could include. Its attributes range from testing specific assumptions for selected impact evaluation methods to data transformations and the handling of heterogeneity. We hope that this checklist helps stimulate demand for replication research and promotes the use of replication research as a tool within transparency, reproducibility, and ethics.

1. Introduction

The transparency, reproducibility, and ethics (TRE) movement has grown such that TRE is now a widely accepted and integral part of social science research. Multiple institutions, research organisations and funders alike, have enacted TRE policies to guide their own research. Within the reproducibility component, there have been substantial efforts to promote computational reproducibility (the ability to reproduce the statistical code and results used in a study), such as the creation of the Social Science Reproduction Platform. While computational reproducibility can help identify foundational coding discrepancies, it cannot provide insight into the internal validity, model specifications, or analytical choices used to produce the results of the original paper. Replications are an underutilised tool within TRE that can support transparency and reproducibility by providing additional verification of, and insight into, the results of an original paper.

The term ‘replication’ is frequently used as a catchall to describe multiple different scenarios (Clemens 2015; Vilhuber 2020). Here, we use it to describe studies in which data from the original study are used to check the validity and robustness of the estimations and recommendations (International Initiative for Impact Evaluation (3ie), n.d.). Replication studies benefit the research world because they can be used to increase a study’s credibility (Brown and Wood 2018). The benefits that replication can offer to decision-making were the motivation behind the International Initiative for Impact Evaluation’s (3ie) replication programme. Through this programme, researchers were awarded funding to replicate influential impact evaluations across multiple sectors.

However, replication research is not always viewed positively within the sector. The relationship between replication researchers and original researchers is often affected by perceptions that replication researchers are trying to overturn results rather than to promote science (Gertler, Galiani, and Romero 2018). Part of this perception stems from the lack of standardisation within replication research. For the most part, replication researchers select the analyses they will focus on based on their own experience and knowledge. This ad hoc process of identifying analyses does not support co-creation of the replication with the original authors and does little to mitigate negative perceptions of replication research.

With this paper, we aim to provide a resource for impact evaluation researchers to use, either for replication research or for sensitivity analyses of their own work. Building on the taxonomy created by Brown and Wood (2018), we provide a checklist offering guidance on the specific attributes that should be checked for each impact evaluation method, along with recommended robustness and sensitivity tests. This checklist is not meant to be exhaustive but serves as a starting point that researchers can use when writing their replication plans. We provide guidance on how to integrate the checklist into a transparent process that documents how and why analytical choices are made. By creating a standardised process, we hope that this checklist will encourage conversation between replication and original researchers and ensure that replication becomes a key tool within the TRE movement.

2. Creation of checklist

The initial checklist was informed by the framework provided in Brown and Wood’s (2018) article, which offers a broad taxonomy of tips for replication organised into four groups (validity of assumptions, data transformations, estimation methods, and heterogeneous impacts). We expand on their work by separating assumptions by the identification strategy used and by including a broader set of checks drawn from a review of existing replications. We focused on six common impact evaluation designs: randomised controlled trials, difference-in-differences, instrumental variables, interrupted time series, matching, and regression discontinuity design. To identify additional attributes and tests to include in the checklist, we followed a two-step approach. We first generated a list of foundational resources for the impact evaluation methodologies included in the checklist. We then used an unstructured keyword search within Google Scholar to identify additional resources. We also used backwards citation tracking to assess whether any of the resources cited within an article would be useful to inform the checklist. Thirty-one resources were identified and reviewed by the study team, with the full list provided in the references section. Within the checklist, we cite specific resources where they provide guidance on the test or process used to check the corresponding attribute.

To further refine the checklist framework, we used 3ie’s database of replication studies. These replications were funded by 3ie between 2014 and 2022 as part of the organisation’s efforts to improve replication within the social sciences. Initially, 3ie’s replications were performed in a somewhat ad hoc manner, with general replication guidelines and expectations but no specific instructions outlining what to check in a given paper. 3ie’s guidelines for the replication researchers it funds, originally published in 2012, have been adjusted over time to continually improve transparency in the replication process. The current guidelines specify what type of replication is to be performed (internal replication rather than external validity), which papers may be replicated (impact evaluations within low- and/or middle-income countries), and require a replication plan that keeps researchers to pre-specified robustness checks, as well as participation in an extensive review process. However, the guidelines do not specify which robustness checks should be included in a replication, and researchers can deviate from the pre-specified checks as necessary, provided an explanation is given. Since the replication analyses were primarily determined by the replication researchers’ experience and intuition, we categorised them using Brown and Wood’s (2018) framework. We extracted data from each study and categorised it according to our initial checklist criteria. For analyses that could not be easily categorised, we created additional categories and/or criteria to capture those elements. A summary of the replication study data extraction is presented in the section below.

3. Summary of 3ie’s replication studies

We reviewed 24 3ie-funded replication studies published between 2014 and 2022. Most of the original papers that were replicated were randomised controlled trials (RCTs), though some also employed other identification strategies, sometimes in addition to the RCT: sixteen RCTs, six difference-in-differences, two instrumental variables, one propensity score matching, and three other impact evaluation methods (coarsened exact matching with difference-in-differences, fixed effects with matching, and a natural experiment) (Figure 1).

Figure 1. Original paper identification strategy within replication studies (source: 3ie). Studies categorised as ‘other’ used coarsened exact matching with difference-in-differences, fixed effects with matching, and a natural experiment.

We then categorised the replication researchers’ methods into the replication categories discussed by Brown and Wood (2018) (Figure 2). The replications employed many of the checklist categories, though in an ad hoc way, and they did not always mention checks that produced no new information or issues.

Figure 2. Replication analyses by checklist categories (source: 3ie).

The most common checks were adjustments to the estimation method and checks for heterogeneous outcomes. Checks of the validity of assumptions and conditions, and standard checks, have been less common in 3ie replication studies.

For instance, in the case of the RCTs, a common estimation-method check was to account for clustering with a generalised linear mixed model (GLMM); four of the RCT replications performed this check. For other identification strategies, replication researchers often included propensity score matching or added regressors to check the estimation methods. Heterogeneous outcomes were often examined to see whether there were different outcomes or additional information to be gleaned from sub-groups. Relatively common validity checks for RCTs were the Cox proportional hazards model (three of the 3ie replications performed this check) and confirming that the pre-treatment groups were balanced. There are many other assumptions by identification strategy that should be checked where possible, such as parallel trends for difference-in-differences, which only two of the six difference-in-differences replications mention checking. Based on our review of the 3ie replications, it is evident that there has not been consistency in the types of checks performed, and checks that would be expected for certain identification strategies are often not mentioned in the replication.
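
As an illustration of the clustering check mentioned above, the following minimal sketch (ours, not drawn from any particular 3ie replication) re-estimates a treatment effect with a linear mixed model that adds random intercepts for clusters, one member of the GLMM family, alongside a naive regression that ignores clustering. The data file and column names (outcome, treatment, cluster) are hypothetical.

```python
# Hypothetical illustration: does accounting for clustering change the estimate?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_data.csv")  # assumed columns: outcome, treatment, cluster

# Naive OLS that ignores clustering (for comparison with a simple original specification)
ols_fit = smf.ols("outcome ~ treatment", data=df).fit()

# Linear mixed model with a random intercept for each cluster
mixed_fit = smf.mixedlm("outcome ~ treatment", data=df, groups=df["cluster"]).fit()

print("OLS:  ", ols_fit.params["treatment"], ols_fit.bse["treatment"])
print("Mixed:", mixed_fit.params["treatment"], mixed_fit.bse["treatment"])
```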

Furthermore, even within the same category, there is variation in how a check is performed. With regard to data transformations, many papers change the way variables are constructed and how outliers and missing data are dealt with (for instance, imputing instead of dropping observations), since this can significantly influence the results. However, these data transformations occurred only in those replications where researchers identified a potential issue, rather than being applied as a standard check on the original paper’s methods. About one-third of the replication studies examined variable construction and outliers, which can also affect the results. We have therefore included this as a standard check in the checklist so that researchers check it every time.
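
A minimal sketch of this kind of data-transformation check is given below; it is illustrative only. It contrasts a complete-case estimate with one based on a simple imputation and one based on winsorised outcomes. The file name, column names, and the choice of simple mean imputation are assumptions; a pre-specified replication would normally use a more principled approach such as multiple imputation.

```python
# Hypothetical sketch: does the treatment estimate move when missing covariate
# values are imputed rather than dropped, or when outliers are trimmed?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study_data.csv")  # assumed columns: outcome, treatment, income

# 1. Complete-case analysis (listwise deletion), as in many original papers
cc_fit = smf.ols("outcome ~ treatment + income", data=df.dropna()).fit()

# 2. Simple mean imputation of the covariate (multiple imputation would be preferable)
imp = df.copy()
imp["income"] = imp["income"].fillna(imp["income"].mean())
imp_fit = smf.ols("outcome ~ treatment + income", data=imp).fit()

# 3. Winsorise the outcome at the 1st/99th percentiles to gauge outlier influence
lo, hi = df["outcome"].quantile([0.01, 0.99])
trim_fit = smf.ols("outcome ~ treatment + income",
                   data=df.assign(outcome=df["outcome"].clip(lo, hi)).dropna()).fit()

for label, fit in [("complete case", cc_fit), ("imputed", imp_fit), ("winsorised", trim_fit)]:
    print(label, round(fit.params["treatment"], 3), round(fit.bse["treatment"], 3))
```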

4. Checklist

There are five main categories in the checklist: 1) Validation of Assumptions/Conditions; 2) Data Transformations; 3) Estimation Methods; 4) Heterogeneous Outcomes; 5) Standard Checks. Each section follows the same structure. The first column provides the key attribute/condition that may not have been validated or tested in the original study. The second column provides the recommended test to run if the attribute was not validated in the original study. Replication authors are expected to fill in the third column with comments on whether or not that specific attribute needs to be tested. If the authors determine that the attribute should be tested, they should fill out the action column with details on the specific analysis that they will use. Table 1 provides an illustrative example of how authors could fill out the checklist table. The full checklist, including a guidance-to-authors section, is included in the Online Appendix.

Table 3. Key attributes and recommended tests for the data transformations section of checklist

Table 4. Key attributes and recommended tests for the estimation methods section of checklist

Table 5. Key attributes and recommended tests for the heterogeneous outcomes section of checklist

Table 6. Key attributes and recommended tests for the standard checks section of checklist

Table 1. Example of filled out table

4.1 Validity of assumptions and conditions

This section presents key attributes of identification strategies that should be validated (Table 2). General causal inference attributes were identified from the foundational impact evaluation texts detailed in the methods section. These attributes cover key assumptions such as the Stable Unit Treatment Value Assumption (SUTVA), consistency, and ignorability. These general assumptions are applicable to any impact evaluation study, and authors should comment on whether or not the assumptions were met or adequately considered in the study to be replicated.

Table 2. Key attributes and recommended tests for the validity of assumptions and conditions section of checklist

We then present key assumptions and considerations by the six identification strategies included in this research. For study designs that combine multiple identification strategies, the replication researcher should comment on the assumptions for each included identification strategy to ensure they were met.

In some cases (e.g. SUTVA or the instrumental variable exclusion restriction), the assumption is taken to hold and may not be directly testable. In those cases, we recommend that the replication researcher consider reaching out to the original study authors to ask whether there were any suspected violations and include that explanation in the checklist.
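
For assumptions that are at least partially testable, the checklist entry can point to a concrete diagnostic. The sketch below is an illustrative example only (it is not part of the checklist): it checks for differential pre-intervention trends in a difference-in-differences design by interacting treatment status with time using pre-period observations. The data file and column names (outcome, treated, year, post, unit_id) are hypothetical.

```python
# Hypothetical pre-trend check for a difference-in-differences design:
# using only pre-intervention observations, regress the outcome on a
# group-by-time interaction; a large, significant interaction casts doubt
# on the parallel-trends assumption.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("panel_data.csv")  # assumed columns: outcome, treated, year, post, unit_id

pre = panel[panel["post"] == 0]
pretrend_fit = smf.ols("outcome ~ treated * year", data=pre).fit(
    cov_type="cluster", cov_kwds={"groups": pre["unit_id"]}
)
print(pretrend_fit.summary().tables[1])  # inspect the treated:year coefficient
```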

4.2 Data transformations

This section covers data transformations that should be checked and/or implemented if not included in the original analysis (Table 3). The data transformations were informed by Brown and Wood (2018) as well as by a keyword search of the literature. Key attributes for this section primarily concern the handling of missing data, such as attrition, missing data patterns, and non-compliance. Additional data transformations are included in the standard checks section, as they are recommended to be tested in every replication study.
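
As one illustrative sketch of the attrition checks in this category (ours, with hypothetical file and column names: attrited, treatment, age, female), the code below tests whether dropping out of the endline survey is related to treatment assignment, and whether attrition is selective on baseline characteristics.

```python
# Hypothetical check for differential attrition: is the probability of
# missing the endline survey related to treatment assignment?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("baseline_endline.csv")  # assumed columns: attrited (0/1), treatment, age, female

# Differential attrition: attrition rate by treatment arm
attrition_fit = smf.logit("attrited ~ treatment", data=df).fit()
print(attrition_fit.summary())

# Selective attrition: does *who* leaves differ by arm, not just how many?
selective_fit = smf.logit("attrited ~ treatment * (age + female)", data=df).fit()
print(selective_fit.summary())
```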

4.3 Estimation methods

The following considerations test the appropriateness of the estimation strategy and help determine alternatives (Table 4). This section primarily focuses on the estimation strategy chosen by the original researcher. It includes testing alternative estimation strategies (e.g. using survival modelling to assess incidence ratios instead of generalised estimating equations) or alternative estimation specifications (e.g. using likelihood-based variance estimators instead of bootstrapped estimators). We have not included specific guidelines for each estimation strategy, as there are numerous regression models that could be chosen. Instead, we recommend that the replication researcher provide justification and references explaining why they have chosen that estimation strategy and how the analysis will complement the original results. It is important to note that the aim of choosing a different estimation strategy from the original research should not be to invalidate the original results but to provide insight. For example, if the original research aggregated individual data to the cluster level before conducting the regression, the replication researcher may choose to use individual-level data in the regression as a way of accounting for within-cluster variation.
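
A minimal sketch of the example just described is shown below; it is illustrative, not a prescribed method. It compares a regression on cluster-level means with an individual-level regression using cluster-robust standard errors, assuming a cluster-randomised design in which treatment is constant within clusters; the file and column names are hypothetical.

```python
# Hypothetical sketch: cluster-level means versus individual-level data with
# cluster-robust standard errors for the same treatment-effect regression.
import pandas as pd
import statsmodels.formula.api as smf

ind = pd.read_csv("individual_data.csv")  # assumed columns: outcome, treatment, cluster

# Original-style specification: aggregate to cluster means first
# (treatment is taken as constant within each cluster)
clus = ind.groupby("cluster", as_index=False).agg(outcome=("outcome", "mean"),
                                                  treatment=("treatment", "first"))
agg_fit = smf.ols("outcome ~ treatment", data=clus).fit()

# Alternative specification: individual-level data, cluster-robust standard errors
ind_fit = smf.ols("outcome ~ treatment", data=ind).fit(
    cov_type="cluster", cov_kwds={"groups": ind["cluster"]}
)

print("cluster means:    ", agg_fit.params["treatment"], agg_fit.bse["treatment"])
print("individual level: ", ind_fit.params["treatment"], ind_fit.bse["treatment"])
```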

4.4 Heterogeneous outcomes

The following considerations assess potential heterogeneity (Table 5). A new entry should be used for each separate heterogeneity analysis. The heterogeneous outcomes selected should be justified in the checklist based on the original paper as well as sector-specific references. Replication researchers should note that there may not be sufficient power to fully test these sub-group analyses. These considerations, along with potential issues related to multiple hypothesis testing and imbalance between sub-groups (especially for randomised studies where randomisation may have been implemented at a higher unit of analysis), should be commented upon in the checklist and in the replication paper when interpreting results.
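
An illustrative sketch of such a pre-specified sub-group analysis, including a correction for testing several sub-groups at once, is given below. The file, column names (binary 0/1 indicators female, rural, young), and the choice of the Holm correction are assumptions, not part of the checklist itself.

```python
# Hypothetical sketch: sub-group analyses via interaction terms, with a Holm
# correction for multiple hypothesis testing across sub-groups.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

df = pd.read_csv("study_data.csv")  # assumed columns: outcome, treatment, female, rural, young

pvals, labels = [], []
for subgroup in ["female", "rural", "young"]:  # sub-groups coded as numeric 0/1 indicators
    fit = smf.ols(f"outcome ~ treatment * {subgroup}", data=df).fit()
    term = f"treatment:{subgroup}"
    pvals.append(fit.pvalues[term])
    labels.append(term)

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for label, p, r in zip(labels, p_adj, reject):
    print(label, round(p, 3), "significant after Holm" if r else "not significant")
```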

4.5 Standard checks

The following considerations are recommended to be checked in every replication study (Table 6). They include concordance with the pre-analysis plan, assessing covariate balance, various data transformations (such as the treatment of outliers, variable transformations, and clustered standard errors), and ex-post power calculations that may be relevant in each replication study.
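
As one illustrative sketch of a standard covariate-balance check (not prescribed by the checklist), the code below computes standardised mean differences between arms for a set of baseline covariates; the file, column names, and the commonly used 0.1 flagging threshold are assumptions.

```python
# Hypothetical balance check: standardised mean differences (SMDs) between
# treatment and control arms for baseline covariates.
import numpy as np
import pandas as pd

df = pd.read_csv("baseline_data.csv")  # assumed columns: treatment, age, income, hh_size

def standardised_mean_difference(x_treat, x_ctrl):
    """SMD using the pooled standard deviation of the two arms."""
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_ctrl.var(ddof=1)) / 2)
    return (x_treat.mean() - x_ctrl.mean()) / pooled_sd

treat = df[df["treatment"] == 1]
ctrl = df[df["treatment"] == 0]
for covariate in ["age", "income", "hh_size"]:
    smd = standardised_mean_difference(treat[covariate], ctrl[covariate])
    print(covariate, round(smd, 3))  # |SMD| above ~0.1 is often flagged as imbalance
```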

5. Conclusion

In this paper, we outlined a checklist that can be used to guide both replication research as well as sensitivity analyses within an original study. We recognise that this checklist is not exhaustive and that there are additional assumptions or robustness tests that could be included. Rather than creating an exhaustive list, we aimed to create this guidance document that can serve as the starting point for a replication researcher.

This work complements the broader TRE movement as it continues to support transparency and reproducibility within the social sciences. It also complements other resources, such as the Guide for Accelerating Computational Reproducibility in the Social Sciences (Berkeley Initiative for Transparency in the Social Sciences 2022). Along with these other resources, our checklist can be used to help identify which analyses should be included and to provide a clear justification for why they should be included.

This checklist adds value to the impact evaluation field by bringing specific methods-based assumptions together with guidance on estimation methods, the treatment of heterogeneity, and data transformations in one resource. Each of these attributes is found in other resources, but they have not yet been compiled into one guidance document. The checklist, and the associated process of completing it, provides a transparent way for replication researchers to guide their analyses. It also builds in communication with the original authors to ensure that the replication research is co-created. Applying the checklist can help mitigate the risk that replication researchers deliberately seek to overturn the original paper’s results, or are perceived to be doing so. We recommend that replication researchers include the completed checklist in their pre-analysis plan and as an appendix to their completed study to further promote transparency. We hope that this checklist stimulates demand for replication research and that researchers continue to revise and add to it as a public good.


Acknowledgements

We would like to acknowledge Marie-Eve Augier and Katherine Quant for their support in managing this project. We also would like to thank Sayak Khatua and Carolyn Huang for reviewing the initial drafts of the replication checklist.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19439342.2024.2318695.

Additional information

Funding

This work was supported by the Bill and Melinda Gates Foundation under grant number OPP1034373.

Notes on contributors

Sridevi K. Prasad

Sridevi K Prasad is a Senior Research Associate and Leader of Trainings at the International Initiative for Impact Evaluation (3ie). At 3ie, Sridevi currently leads 3ie’s Virtual Trainings Initiative, which seeks to develop capacity building resources in impact evaluation methodology for researchers and policymakers. Sridevi also led evidence mapping research on the associations between water, sanitation, and hygiene achievements and higher-level outcomes related to prosperity, stability and resilience for USAID’s Center for Water Security, Sanitation, and Hygiene. She also led and conducted two replication studies on HIV combination prevention interventions. She is currently providing technical support to an impact evaluation within the financial sector for the European Bank of Reconstruction and Development. She has a Master’s in Public Health, specialising in Monitoring & Evaluation as well as Epidemiology and Biostatistics, from Boston University and a Bachelor of Arts in Molecular and Cell Biology from the University of California, Berkeley.

Fiona Kastel

Fiona Kastel is a Research Associate at the International Initiative for Impact Evaluation (3ie). At 3ie, Fiona provides research, programme management, and data analytics support for multiple programmes and initiatives on agriculture, education, finance, health, and policy and institutional reform. She leads 3ie’s Data Innovations Group and provides technical support on impact evaluations, evaluability assessments, evidence synthesis products, and virtual trainings. Her primary projects include a geospatial impact evaluation of an agricultural intensification programme in Niger, an impact evaluation of the European Bank for Reconstruction and Development’s COVID-19 Solidarity Package, and the strengthening evidence and economic modelling partnership with Millennium Challenge Corporation. Fiona holds a Master of Public Affairs from Brown University and a Bachelor of Science in Quantitative Analysis of Markets and Organization from the University of Utah.

Suvarna Pande

Suvarna Pande is a Research Associate at the International Initiative for Impact Evaluation (3ie). At 3ie, Suvarna contributes to the production of systematic reviews, replication studies, and evidence gap maps of research on the effectiveness of economic and social interventions in low- and middle-income countries. She provides research support at all stages of the review process, from literature synthesis to data and meta-analysis. She is particularly interested in systematic, critical assessment of behavioural interventions in experimental and quasi-experimental designs. Suvarna is finishing her PhD in Behavioural Development Economics at the University of East Anglia, UK, on the psychology of poverty and its effects on cognitive systems and on risk and social preferences. She holds an MPhil and a Master’s in Economics from Jawaharlal Nehru University and a Bachelor’s in Economics from the University of Delhi.

Chenxiao Zhang

Chenxiao Zhang is a Data Scientist at the World Bank. He interned at the International Initiative for Impact Evaluation (3ie) to identify new robustness checks and approaches to causal inference techniques as well as to conduct research on how remote sensing techniques can be implemented in impact evaluations. Chenxiao holds a Master of Science in Data Science for Public Policy from Georgetown University and a Bachelor of Social Science in Sociology and Political Science from Hong Kong Baptist University.

Douglas M. Glandon

Douglas Glandon is Chief of Strategic Initiatives and Leader of Methods Development at the International Initiative for Impact Evaluation (3ie). At 3ie, Douglas designs, leads, and quality assures projects in evidence generation, synthesis, and knowledge translation, including portfolios of work with USAID and the Millennium Challenge Corporation. As Leader of Methods Development, he also guides the organisation in updating and enhancing our methods and approaches for generating timely, policy-responsive, rigorous evidence to inform international development policy, programming, and investments. Douglas is particularly interested in the intersection of data science and econometrics and measuring the costs and effectiveness of complex and multi-sectoral programmes. Douglas has a PhD in Health Systems from the Johns Hopkins Bloomberg School of Public Health, a Master’s in Public Health from the Tufts University School of Medicine, and a Bachelor of Arts in International Relations and Community Health from Tufts University.

References

  • Bärnighausen, T., C. Oldenburg, P. Tugwell, C. Bommer, C. Ebert, M. Barreto, E. Djimeu, et al. 2017. “Quasi-Experimental Study Designs Series—Paper 7: Assessing the Assumptions.” Journal of Clinical Epidemiology 89 (September): 53–66. https://doi.org/10.1016/j.jclinepi.2017.02.017.
  • Bell, M. L., M. G. Kenward, D. L. Fairclough, and N. J. Horton. 2013. “Differential Dropout and Bias in Randomised Controlled Trials: When It Matters and When It May Not.” BMJ: British Medical Journal 346:e8668. https://doi.org/10.1136/bmj.e8668.
  • Berkeley Initiative for Transparency in the Social Sciences. 2022. “Chapter 5 Checking for Robustness.” Guide for Accelerating Computational Reproducibility in the Social Sciences. September 20, 2022. https://bitss.github.io/ACRE/.
  • Bernal, J. L., S. Cummins, and A. Gasparrini. 2017. “Interrupted Time Series Regression for the Evaluation of Public Health Interventions: A Tutorial.” International Journal of Epidemiology, dyw098. https://doi.org/10.1093/ije/dyw098.
  • Bhaskaran, K., A. Gasparrini, S. Hajat, L. Smeeth, and B. Armstrong. 2013. “Time Series Regression Studies in Environmental Epidemiology.” International Journal of Epidemiology 42 (4): 1187–1195. https://doi.org/10.1093/ije/dyt092.
  • Bilinski, A., and L. A. Hatfield. 2020. “Nothing to See Here? Non-Inferiority Approaches to Parallel Trends and Other Model Assumptions.” arXiv. http://arxiv.org/abs/1805.03273.
  • Branson, Z., and F. Mealli. 2019. “The Local Randomization Framework for Regression Discontinuity Designs: A Review and Some Extensions.” arXiv. http://arxiv.org/abs/1810.02761.
  • Brown, A. N., and B. D. K. Wood. 2018. “Which Tests Not Witch Hunts: A Diagnostic Approach for Conducting Replication Research.” Economics 12 (1): 20180053. https://doi.org/10.5018/economics-ejournal.ja.2018-53.
  • Cinelli, C., and C. Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (1): 39–67. https://doi.org/10.1111/rssb.12348.
  • Cinelli, C., and C. Hazlett. 2022. “An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables.” SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4217915.
  • Clemens, M. 2015. “The Meaning of Failed Replications: A Review and Proposal.” Working Paper 399, CGD Working Papers. Center for Global Development.
  • de Souza, R. J., R. B. Eisen, S. Perera, B. Bantoto, M. Bawor, B. B. Dennis, Z. Samaan, and L. Thabane. 2016. “Best (But Oft-Forgotten) Practices: Sensitivity Analyses in Randomized Controlled Trials.” The American Journal of Clinical Nutrition 103 (1): 5–17. https://doi.org/10.3945/ajcn.115.121848.
  • Dette, H., and M. Schumann. 2020. “Difference-in-differences Estimation under Non-parallel Trends.” Working Paper.
  • Gertler, P., S. Galiani, and M. Romero. 2018. “How to Make Replication the Norm.” Nature 554 (7693): 417–419. https://doi.org/10.1038/d41586-018-02108-9.
  • Glewwe, P., and P. Todd. 2022. Impact Evaluation in International Development: Theory, Methods and Practice. Washington, DC: World Bank. http://hdl.handle.net/10986/37152.
  • Hayes, R. J., and L. H. Moulton. 2017. Cluster Randomised Trials. 2nd ed. https://www.routledge.com/Cluster-Randomised-Trials/Hayes-Moulton/p/book/9781032339580.
  • Heinrich, C., A. Maffioli, and G. Vázquez. 2010. “A Primer for Applying Propensity- Score Matching.” Technical Notes IDB-TN-161. Impact Evaluation Guidelines. Inter-American Development Bank. https://publications.iadb.org/en/primer-applying-propensity-score-matching.
  • Ibrahim, J. G., H. Chu, and M.-H. Chen. 2012. “Missing Data in Clinical Studies: Issues and Methods.” Journal of Clinical Oncology 30 (26): 3297–3303. https://doi.org/10.1200/JCO.2011.38.7589.
  • Imai, K., and M. Ratkovic. 2013. “Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation.” The Annals of Applied Statistics 7 (1). https://doi.org/10.1214/12-AOAS593.
  • International Initiative for Impact Evaluation (3ie). n.d. “Replication Program.” International Initiative for Impact Evaluation (3ie). Accessed September 20, 2022. https://www.3ieimpact.org/research/replication.
  • Jacob, R., P. Zhu, M.-A. Somers, and H. Bloom. 2012. “A Practical Guide to Regression Discontinuity.” Working Paper, Methodological Publication. MDRC. https://www.mdrc.org/sites/default/files/regression_discontinuity_full.pdf.
  • Labrecque, J. A., and S. A. Swanson. 2018. “Understanding the Assumptions Underlying Instrumental Variable Analyses: A Brief Review of Falsification Strategies and Related Tools.” Current Epidemiology Reports 5 (3): 214–20. https://doi.org/10.1007/s40471-018-0152-1.
  • Lall, R. 2016. “How Multiple Imputation Makes a Difference.” Political Analysis 24 (4): 414–433. https://doi.org/10.1093/pan/mpw020.
  • Moundigbaye, M., W. S. Rea, and W. Robert Reed. 2018. “Which Panel Data Estimator Should I Use?: A Corrigendum and Extension.” Economics 12 (1): 20180004. https://doi.org/10.5018/economics-ejournal.ja.2018-4.
  • Romano, J. P., and M. Wolf. 2005. “Stepwise Multiple Testing as Formalized Data Snooping.” Econometrica 73 (4): 1237–1282. https://doi.org/10.1111/j.1468-0262.2005.00615.x.
  • Roth, J., P. H. C. Sant’anna, A. Bilinski, and J. Poe. 2023. “What’s Trending in Difference-In-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics 235 (2): 2218–44. https://doi.org/10.1016/j.jeconom.2023.03.008.
  • Rudolph, K. E., and E. A. Stuart. 2018. “Using Sensitivity Analyses for Unobserved Confounding to Address Covariate Measurement Error in Propensity Score Methods.” American Journal of Epidemiology 187 (3): 604–613. https://doi.org/10.1093/aje/kwx248.
  • Small, D. S. 2007. “Sensitivity Analysis for Instrumental Variables Regression with Overidentifying Restrictions.” Journal of the American Statistical Association 102 (479): 1049–1058. https://doi.org/10.1198/016214507000000608.
  • Smith, L. M., L. E. Lévesque, J. S. Kaufman, and E. C. Strumpf. 2016. “Strategies for Evaluating the Assumptions of the Regression Discontinuity Design: A Case Study Using a Human Papillomavirus Vaccination Programme.” International Journal of Epidemiology, dyw195. https://doi.org/10.1093/ije/dyw195.
  • Sterne, J. A. C., I. R. White, J. B. Carlin, M. Spratt, P. Royston, M. G. Kenward, A. M. Wood, and J. R. Carpenter. 2009. “Multiple Imputation for Missing Data in Epidemiological and Clinical Research: Potential and Pitfalls.” BMJ: British Medical Journal 338:b2393. https://doi.org/10.1136/bmj.b2393.
  • Tian, A., T. Coupé, S. Khatua, W. R. Reed, and B. Wood. 2022. “Power to the Researchers: Calculating Power After Estimation.” Working Paper 17/2022, Christchurch, New Zealand: UC Business School. https://repec.canterbury.ac.nz/cbt/econwp/2217.pdf.
  • Vilhuber, L. 2020. “Reproducibility and Replicability in Economics.” Harvard Data Science Review 2 (4). https://doi.org/10.1162/99608f92.4f6b9e67.
  • Wang, X., Y. Jiang, N. R. Zhang, and D. S. Small. 2018. “Sensitivity Analysis and Power for Instrumental Variable Studies.” Biometrics Bulletin 74 (4): 1150–60. https://doi.org/10.1111/biom.12873.
  • Weuve, J., E. J. Tchetgen Tchetgen, M. Maria Glymour, T. L. Beck, N. T. Aggarwal, R. S. Wilson, D. A. Evans, and C. F. M. de Leon. 2012. “Accounting for Bias Due to Selective Attrition: The Example of Smoking and Cognitive Decline.” Epidemiology 23 (1): 119–28. https://doi.org/10.1097/EDE.0b013e318230e861.
  • Wing, C., K. Simon, and R. A. Bello-Gomez. 2018. “Designing Difference in Difference Studies: Best Practices for Public Health Policy Research.” Annual Review of Public Health 39 (1): 453–69. https://doi.org/10.1146/annurev-publhealth-040617-013507.

Replication Studies

  • Aiken, A. M., C. Davey, J. R. Hargreaves, and R. J. Hayes. 2014. “Reanalysis of Health and Educational Impacts of a School-Based Deworming Program in Western Kenya.” Replication Paper 3. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/reanalysis-health-and-educational-impacts-school-based.
  • Alinaghi, N., and W. Robert Reed. 2019. “Risk Sharing and Transaction Costs: A Replication Study of Evidence from Kenya’s Mobile Money Revolution.” Replication Paper 22. 2019th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0022.
  • Atanda, A. A. 2019. “Biometric Smartcards and Payment Disbursement: A Replication Study of a State Capacity-Building Experiment in India.” Replication Paper 23. 2019th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0023.
  • Basurto, M. P., R. Burga, J. L. Flor Toro, and C. Huaroto. 2015. “Walking on Solid Ground: A Replication Study on Piso Firme’s Impact.” Replication Paper 7. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/walking-solid-ground-replication-study-piso-firmes.
  • Beteta, E., G. Aguilar, O. Elorreaga, J. Pierre Meneses, E. Ventura, and C. Huaroto. 2018. “Mobile Money and Its Impact on Improving Living Conditions in Niger: A Replication Study.” Replication Paper 19. 2018th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0019.
  • Bowser, W. H. 2015. “The Long and Short of Returns to Public Investments in Fifteen Ethiopian Villages.” Replication Paper 4. 2015th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0004.
  • Cameron, D., E. Whitney, and P. Winters. 2015. “The Effects of Land Titling on the Urban Poor: A Replication of Property Rights.” Replication Paper 9. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/effects-land-titling-urban-poor-replication-property.
  • Carvalho, N., and S. Rokicki. 2015. “The Impact of India’s JSY Conditional Cash Transfer Programme: A Replication Study.” Replication Paper 6. Replication Paper, International Initiative for Impact Evaluation (3ie). https://3ieimpact.org/evidence-hub/publications/replication-papers/impact-indias-jsy-conditional-cash-transfer-programme.
  • Chen, B., and M. Alam. 2017. “STRETCHing HIV Treatment: A Replication Study of Task Shifting in South Africa.” Replication Paper 13. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/stretching-hiv-treatment-replication-study-task.
  • Djimeu, E. W. 2018. “When to Start ART? A Replication Study of Timing of Antiretroviral Therapy for HIV-1-Associated Tuberculosis.” Replication Paper 14. 2018th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0014.
  • Djimeu, E. W., and E. G. Dickens. 2020. “Treatment as Prevention: A Replication Study on Early Antiretroviral Therapy Initiation and HIV-1 Transmission.” Replication Paper 24. 2020th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0024.
  • Djimeu, E. W., J. E. Korte, and F. A. Calvo. 2015. “Male Circumcision and HIV Acquisition: Reinvestigating the Evidence from Young Men in Kisumu, Kenya.” Replication Paper 8, Replication Paper. International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/male-circumcision-and-hiv-acquisition-reinvestigating.
  • Donato, K., and A. Garcia Mosqueira. 2016. “Power to the People?: A Replication Study of a Community-Based Monitoring Programme in Uganda.” Replication Paper 11. 2016th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0011.
  • Hein, N. A., D. S. Bagenda, and J. Luo. 2018. “PEPFAR and Adult Mortality: A Replication Study of HIV Development Assistance Effects in Sub-Saharan African Countries.” Replication Paper 15. 2018th ed, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0015.
  • Iversen, V., and R. Palmer-Jones. 2014. “TV, Female Empowerment and Demographic Change in Rural India.” Replication Paper 2. 2014th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0002.
  • Kuecken, M., and M.-A. Valfort. 2015. “Fighting Corruption Does Improve Schooling: A Replication Study of a Newspaper Campaign in Uganda.” Replication Paper 10. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/fighting-corruption-does-improve-schooling-replication.
  • Lhachimi, S. K., and T. Seuring. 2018. “Thou Shalt Be given… but How? A Replication Study of a Randomized Experiment on Food Assistance in Northern Ecuador.” Replication Paper 17. 2018th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0017.
  • Prasad, S. K., E. W. Djimeu, J. E. Korte, and D. M. Glandon. 2022. “Treatment as Prevention: A Replication Study of a Universal Test and Treatment Cluster-Randomized Trial in Zambia and South Africa.” Replication Paper 26. 2022nd ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0026.
  • Prasad, S. K., and D. M. Glandon. 2022. “Treatment as Prevention: A Replication Study on a Universal Test and Treat Cluster-Randomized Trial in South Africa from 2012–2016.” Replication Paper 25. 2022nd ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0025.
  • Reimão, M. E. 2019. “Cash and Change: A Replication Study of a Cash Transfer Experiment in Malawi.” Replication Paper 21. 2019th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0021.
  • Smith, L. M., N. A. Hein, and D. S. Bagenda. 2017. “Cash Transfers and HIV/HSV-2 Prevalence a Replication of a Cluster Randomized Trial in Malawi.” Replication Paper 12. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/cash-transfers-and-hivhsv-2-prevalence-replication.
  • Stage, J., and T. Thangavelu. 2018. “Savings Revisited: A Replication Study of a Savings Intervention in Malawi.” Replication Paper 18. 2018th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0018.
  • Wang, H., F. Qiu, and J. Luo. 2019. “Impact of Unconditional Cash Transfers: A Replication Study of the Short-Term Effects in Kenya.” Replication Paper 20. 2019th ed. Replication Paper, International Initiative for Impact Evaluation (3ie). https://doi.org/10.23846/RPS0020.
  • Wood, B. D. K., and M. Dong. 2015. “Recalling Extra Data: A Replication Study of Finding Missing Markets.” Replication Paper 5. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/recalling-extra-data-replication-study-finding-missing.
  • Yu, F., N. A. Hein, and D. S. Bagenda. 2018. “Preventing HIV and HSV-2 Through Improving Knowledge and Attitudes a Replication Study of a Multicomponent Intervention in Zimbabwe.” Replication Paper 16. Replication Paper, International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/replication-papers/preventing-hiv-and-hsv-2-through-improving-knowledge.