Using routine data for long term impact evaluation: Methodological reflections from a complex health system intervention in a low-income context

Article: 2153448 | Received 15 Jun 2022, Accepted 25 Nov 2022, Published online: 19 Dec 2022

Abstract

There is increased availability of routine secondary data globally in many health facilities due to technological advancement and the growth in electronic records. Such data present an opportunity for use in evaluating health system intervention programs. Unfortunately, progress is rather slow, as many researchers and development partners continue to rely on survey data, with the associated costs and inconveniences. Admittedly, using routine secondary data for impact evaluation can be tedious, and there is a need for a step-by-step guide for early career researchers who may be interested in such an approach. This paper provides a methodological reflection based on a graduate study experience. It highlights the necessary methodological concepts and illustrates them with sample results. It is hoped that the paper will guide early career researchers who wish to carry out similar studies to think through each step carefully and deliberately. It is intended as a methodological guide rather than an empirical work.

1. Introduction

Routine data are increasingly available in many health facilities worldwide due to advances in data storage and processing technologies and the growth in electronic records, which facilitate the collection of institutional routine data (Todd et al., 2020). The history and use of electronic health records (EHRs) have evolved significantly over time. Before the early 1990s, there was much scepticism around EHRs due to high costs, data entry errors and poor acceptance among physicians. By the early 1990s, the development of computer and internet technologies had changed the narrative and initiated web-based EHRs in high-income countries (Evans, 2016). EHRs are now an established part of healthcare delivery systems and have driven the installation and use of routine health information systems (RHIS). They offer public health systems the opportunity to record facility-level data to guide program planning and management. For surveillance purposes, they are universal in scope, less costly, and available at each health system level (Gimbel et al., 2011).

According to the World Health Organization (WHO), RHIS data provide a picture of services delivered in health facilities and of the health status of people using those services (WHO, 2021). RHIS data are reported regularly (monthly or quarterly) and provide information on a wide range of services across the entire country at all health system levels. They can inform sub-national (e.g., district) planning and resource allocation and support the monitoring of geographical inequities in population health targets and outcomes (WHO, 2021). Besides being available, recent studies have shown that the quality of such data, even in low-resource settings, continues to improve in terms of availability, reliability and consistency (Gimbel et al., 2011) and in the frequency of missing data (Muthee et al., 2018).

Despite the potential for using routine data for program impact evaluation, progress is slow. Many researchers and development partners continue to rely on survey data (Brocklehurst et al., 2013; De Allegri et al., 2018), with the associated costs and inconveniences (Hox & Boeije, 2005). Routine data are useful in evaluating long-term program impact (Tung et al., 2015; Zombre et al., 2017). Recent examples are studies that relied on such data to evaluate the long-term effects of complex health system intervention programs in Africa (Chansa et al., 2019; Kuunibe et al., 2020).

The above notwithstanding, using routine secondary data for impact evaluation turns out to be rather tedious, requiring careful thinking through many processes and procedures. For example, it is often hard to know which methods to use to analyze such data (Clarke et al., 2019). Before the analysis, there is an entire process of accessing, checking, cleaning and understanding the data, and possibly dealing with missing values (Curley et al., 2019; Kuunibe et al., 2020; Powell et al., 2003). Unfortunately, it is difficult to find studies that reflect practical experience of these processes. This paper provides a methodological reflection on the processes involved, based on a graduate study experience, from checking data for completeness to final analysis. It aims to improve research practice among early career researchers, especially those in health research who may contemplate using routine data for health system program evaluation. The rest of the paper proceeds as follows: Section 2 presents the methods for evaluating health system interventions using routine data and the processes involved in using routine secondary data; Section 3 presents sample results from a reference graduate study; and Section 4 concludes.

2. Materials and methods

2.1. Evaluating health system interventions using routine data

Health system interventions address barriers and constraints at different levels to improve health outcomes (WHO, 2007). Such interventions may be educational, policy changes or health promotion campaigns, and they can be complex, including multiple independent or interacting components (Clarke et al., 2019). Evaluating health system interventions is quite daunting, especially where such interventions involve numerous stakeholders and components. In general, the basic consideration in impact evaluation is to deal with the problem of selection bias arising from non-random assignment, where it is difficult to find a good comparison group for estimating the so-called counterfactual (Khandker et al., 2010). Consider the basic model in equation (1):

(1)   $H_{pi} = \alpha\theta_i + \pi T_i + \gamma_i$

where $H_{pi}$ is the outcome of interest, $\theta_i$ are observed characteristics of the individual unit and the local environment (with coefficient vector $\alpha$), $T_i$ indicates whether an individual unit participates in the program, $\pi$ is the impact of the program (to be estimated) and $\gamma_i$ is the random error term. Selection bias causes participation ($T_i$) and the error term ($\gamma_i$) to be correlated, leading to biased estimates of the program impact (Khandker et al., 2010).

Based on the data structure and underlying assumptions, selection bias can be accounted for using procedures such as regression discontinuity (Cattaneo & Titiunik, 2021), difference-in-differences (Wing et al., 2018) and propensity score matching (Schulte & Mascha, 2018). In the case of routine data, interrupted time series analysis (ITSA; Lagarde, 2012; Linden, 2015) represents the best option. Even where the programme being evaluated was randomly assigned, an interrupted time series (ITS) design with a control group represents a robust way of estimating long-term impacts (Fretheim et al., 2015; Michielutte et al., 2000). The ITS design, a quasi-experimental design, rests on the assumption that, had the intervention not taken place, the underlying trend of the outcome would have continued unchanged. This hypothetical continuation of the pre-existing trend provides the much-desired counterfactual, against which the impact is evaluated by examining the change occurring in the post-intervention period (Bernal et al., 2017; Shin, 2017).

In the multiple-group case (ITS with a control group), the main identification assumption is that, absent the intervention, the level and trend of the outcome variables would change in the same way for both the intervention and control groups. In that case, confounding omitted variables affect both groups similarly (Linden, 2015); therefore, outcomes in the control group are not expected to diverge from underlying secular trends (McLintock et al., 2014). The ITS design distinguishes the effect of time from that of the intervention, thereby netting out the impact of the trend that existed before the intervention (Devkaran & O'Farrell, 2015; Serumaga et al., 2011). Adding a control group also controls for history, provided the intervention and control groups were exposed to the same non-program influences (Michielutte et al., 2000). The level and trend in the control group before and after the intervention provide valid estimates of the counterfactual for estimating these differences (Jacob et al., 2016).

2.2. Processes in using routine secondary data

2.2.1. Pre-analysis stage

Using secondary data requires one, once a research gap has been identified, to identify useful data sources, retrieve the data and evaluate how well the data meet quality requirements. Most health system intervention programs are implemented over a period of months or years. At the design stage, provisions are made for baseline and endline surveys, which form the basis for estimation. Alternatively, project designers may consider using existing data (e.g., routine HMIS data) collected before and after the intervention for the analysis. Such data are usually available at discrete time intervals (say, monthly), which offers an opportunity to evaluate both immediate or short-term impacts and long-term impacts. Many secondary datasets are maintained by a supporting organisation that welcomes access requests and publishes lists of publications that have used the dataset. A vast quantity of secondary data in public health is available, even though locating such data is not always straightforward (Boslaugh, 2007).

A typical example is the English Longitudinal Study of Ageing (ELSA; Steptoe et al., 2013). However, even when such data are accessible, the issues involved in using them are rarely simple. Once data are obtained, it is crucial to subject them to some pre-analysis processes, including cleaning, editing and, most importantly, identifying missing values, understanding why they are missing and taking steps to “fill” the missing values (Dong & Peng, 2013; Horton & Kleinman, 2007).

Missing values occur for many reasons, which vary by data type. For example, missing data may occur in survey data due to noncoverage, total nonresponse or item nonresponse (Curley et al., 2019). In the case of routine data, missing data may arise from technical errors (equipment breakdown) or human errors (during data collection and/or entry; Pratama et al., 2016). In general, there are two schools of thought regarding the handling of missing data. One school favours complete case analysis (CCA; Hughes et al., 2019), dropping all individuals or units with missing data on any variable included in the analysis. The other favours imputation (Dettori et al., 2018), which substitutes each missing value with a reasonable guess and carries out the analysis as if there were no missing values. Each approach has its advantages and disadvantages. CCA is simple and easy to use and is therefore widely applied; however, it reduces the sample size and the study power, leading to potentially biased estimates. Imputation keeps the sample size statistically large and leads to potentially efficient estimates; however, it is more difficult to implement (Curley et al., 2019; Pratama et al., 2016). Not accounting for missing values leads to biased and inefficient estimates (Hughes et al., 2019). More recently, the advice has been for researchers to impute missing data (Dong & Peng, 2013; Horton & Kleinman, 2007). However, it is important first to understand the reasons for the missingness.

The reasons for missing data, also called the missing mechanism, are classified into threeFootnote1: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). The decision to impute or not depends largely on the missing mechanism. For example, it might be necessary to impute where the mechanism is MAR, whereas for MCAR or MNAR there might be no need to impute. Therefore, when using routine data, which are often prone to missing values, one must check the percentage of missing values and analyse and identify the missing mechanism. While it is fairly straightforward to calculate the missing percentage, understanding the missing mechanism requires understanding the data generation process.
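
For illustration, the percentage of missing values per indicator can be tabulated directly in Stata (the software used later in this paper). A minimal sketch follows; the file and variable names (anc_visits, pnc_visits, deliveries) are hypothetical.

    * Minimal sketch: quantify missingness per service indicator.
    * File and variable names are hypothetical.
    use facility_month_data.dta, clear

    * Report the number of missing values for each indicator
    misstable summarize anc_visits pnc_visits deliveries

    * Percentage missing, computed manually
    foreach v of varlist anc_visits pnc_visits deliveries {
        quietly count if missing(`v')
        display "`v': " %4.1f 100*r(N)/_N "% missing"
    }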

Unfortunately, most researchers who have employed secondary data do not normally report missing values (Schneider et al., 2016) and may not impute them (Stausberg, 2014). As noted earlier, except where data are missing completely at random (MCAR), the results are then very likely to be biased (Arel-Bundock & Pelc, 2018).

Also worth noting is that even when data are MAR and therefore need to be imputed, the procedure used is as important as the decision to impute. Data can be imputed using single or multiple imputation (MI) techniques. Single imputation involves determining a replacement value based on certain rulesFootnote2, such as the simple mean, the last value observed or the worst value observed, and substituting this value for all missing data points of the particular variable. MI, on the other hand, employs a statistical procedure that narrows the uncertainty around missing values by calculating several rounds of imputations and using the pooled imputed data for analysis (Dettori et al., 2018). MI fills in missing data values using a predictive model that reflects the uncertainty about the values to be imputed (Dong & Peng, 2013). The predictive model, however, requires that the indicators (independent variables) used to impute missing values for other indicators (dependent variables) contain no missing values themselves (StataCorp, 2013). While CCA and single imputation are easier and more straightforward to implement than multiple imputation, estimates resulting from the latter are less biased and more efficient (Horton & Kleinman, 2007).
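
To make the contrast concrete, the Stata sketch below shows a single (mean) imputation alongside the standard multiple-imputation workflow. Variable names are hypothetical, and the covariates used to predict the missing values (here month and district) are assumed to be complete, as required above.

    * Single imputation: replace missing values with the series mean.
    * Easy to implement, but it ignores uncertainty in the imputed values.
    egen anc_mean = mean(anc_visits)
    gen anc_single = cond(missing(anc_visits), anc_mean, anc_visits)

    * Multiple imputation: five imputations from a predictive model,
    * with estimates pooled across imputations by mi estimate.
    mi set mlong
    mi register imputed anc_visits
    mi register regular month district
    mi impute regress anc_visits month i.district, add(5) rseed(123)
    mi estimate: regress anc_visits month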

2.2.2. Analysis stage

As in the pre-analysis stage, several steps must be followed once one is satisfied that the data are good enough for analysis. First, it is important to check the data for outliers, autocorrelation, stationarity and seasonality.Footnote3 These issues commonly arise when using routine data. OutliersFootnote4 are values that appear unusually far from the central tendency, because they are either too large or too small, and can disturb the results of analysis methods such as cluster analysis, time series analysis or meta-analysis (Leys et al., 2019). Outliers can be identified by plotting the data on scatter plots and carrying out a visual inspection; once identified, appropriate procedures should be employed to eliminate or manage them. Similarly, one can inspect the data for autocorrelation using the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Finally, the Dickey–Fuller (DF) test can be used to examine the data for trend stationarity, decomposing the trend into systematic, periodic and random components (Shin, 2017). A further examination of the PACF for a pattern is necessary to determine the presence or absence of seasonality (regular patterns).
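
These checks correspond to standard Stata time-series commands. A minimal sketch, assuming a single monthly series indexed by a variable month and a hypothetical outcome anc_visits:

    * Declare the monthly time variable
    tsset month, monthly

    * Visual inspection for outliers
    scatter anc_visits month

    * Autocorrelation and partial autocorrelation plots
    ac anc_visits
    pac anc_visits

    * Dickey-Fuller test for stationarity around a trend
    dfuller anc_visits, trend regress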

As stated in Section 2.1, ITSA is the appropriate method for evaluating the long-term impact of health system interventions. It is usually implemented using a segmented regression model (Bernal et al., 2017; Linden, 2018). The specific representation of the regression modelFootnote5 depends on whether there is a control group (single or multiple groups). In addition, programs implemented in most LMICs usually involve multiple interventions introduced at different times, sometimes overlapping. Accounting for such parallel interventions, which can blur the impact of the intervention being evaluated, alters the multiple-group representation slightly. Equation (2) is a typical multiple-group case that accounts for a parallel intervention:

(2)   $Y_t = \theta_0 + \theta_1 T_t + \theta_2 X_t + \theta_3 X_t T_t + \theta_4 Z + \theta_5 Z T_t + \theta_6 Z X_t + \theta_7 Z X_t T_t + \theta_8 G + \theta_9 G T_t + \theta_{10} G X_t + \theta_{11} G X_t T_t + \theta_{12} T_t^2 + \mu_t$

In equation (2), $Y_t$ is the outcome being evaluated, $T_t$ is time since the start of the observation period, $X_t$ is an indicator for the post-intervention period, $X_t T_t$ is time since the intervention, $Z$ is an indicator for the intervention group, $G$ is an indicator for the parallel intervention and $\mu_t$ is the error term. The first set of coefficients describes the control group: $\theta_0$ is the pre-intervention level, $\theta_1$ the pre-intervention trend, $\theta_2$ the immediate (level) change following the intervention and $\theta_3$ the change in trend after the intervention. The next set describes the intervention group relative to the control group: $\theta_4$ is the difference in pre-intervention level, $\theta_5$ the difference in pre-intervention trend, $\theta_6$ the difference in level in the period immediately following the intervention and $\theta_7$ the difference in post-intervention trend. The remaining coefficients, $\theta_8$ to $\theta_{11}$, capture the corresponding level and trend differences associated with the parallel intervention, whose effects would otherwise blur those of the intervention being evaluated. Before the final analysis, it is also important to decide on the model's functional form (linear, log-linear or quadratic), which can be done using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC; Shin, 2017). If the quadratic functional form is chosen, $\theta_{12}$ is the effect of the quadratic term $T_t^2$. It is also important to carry out the analysis using the imputed data for the main analysis and the original data for a sensitivity analysis.
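
In practice, equation (2) is estimated by constructing the time and interaction variables and fitting the chosen functional form. The sketch below does this for a hypothetical January 2014 intervention, with Z and G as 0/1 group indicators; for simplicity it assumes the parallel intervention shares the same timing variables, which is a simplification of the general case. The user-written itsa command (Linden, 2015) automates the single- and multiple-group cases.

    * Segmented-regression variables for equation (2); dates and
    * variable names are hypothetical.
    gen T  = month - tm(2010m1) + 1        // time since start of series
    gen X  = month >= tm(2014m1)           // post-intervention indicator
    gen XT = max(0, month - tm(2014m1))    // time since intervention
    gen T2 = T^2                           // quadratic term

    * Full interaction specification with the intervention group (Z)
    * and the parallel-intervention group (G)
    regress y c.T c.X c.XT                 ///
        i.Z i.Z#c.T i.Z#c.X i.Z#c.XT       ///
        i.G i.G#c.T i.G#c.X i.G#c.XT c.T2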

3. Example results and reflectionsFootnote6

3.1. Obtaining ethical clearance and accessing data

As alluded to earlier, an important step after conceiving the research idea and defining a research problem is deciding where and how to obtain the appropriate data. This challenge can present a major obstacle, even leading to the abandonment of the research idea altogether, if it proves impossible to identify or obtain the right data source. However, my experience in this regard as a graduate student was much easier than it would usually be, since I had the full support of my host institution.Footnote7 My supervisor was particularly interested in the research idea and the methods we wanted to employ and devoted time to following up bilaterally with the authorities (in a different country) responsible for the data. This assistance made it much easier (I imagine) to obtain official permission to access and use the data and to receive ethical clearance from my university. One important lesson here is that doing research at whatever level (even as a student) is a collaborative endeavour that must take on board stakeholders' interests, especially those with various roles. This is not to say that one should abandon original thinking; the point is that one will need support from various stakeholders and partners to facilitate processes that are not strictly technical, such as ethical issues and access to data. While ethical clearance might be a straightforward matter because it is more of a scientific process, access to a large routine data set, usually held at national level, is more political. It may require bigger voices than that of a (tiny) individual student who is probably unknown to anyone.

3.1.1. Understanding, managing and assessing the quality of your data

This stage is the most important and challenging aspect of working with routine data, yet the effort involved often goes unnoticed.Footnote8 At this stage, it is like attempting a journey without a clear idea of what the road looks like. It is important to be open-minded about the possibility of discontinuing the study if the data quality is poor. Arriving at such a conclusion, or at the desirable alternative of continuing the research journey, means that one must first understand the data thoroughly, identify all gaps, sit back (read around the gaps) and decide what is scientifically possible or not. In my case, it took nearly three months to complete data retrieval due to some practical challenges. I retrieved data from the National Health Information System database in Burkina Faso (Ministère de la Santé Burkina Faso, 2015). The database was in French (and I don't speak French), so I first needed to have the names of the indicators (initially about 23) translated from French to English. This allowed me to identify each indicator correctly and select the correct dialogue boxes so that the data could be merged across all health facilities and months. Here again, I had support from a fellow graduate student with a French background.

After retrieving the data, we double-checked that the names had been captured correctly (in French, to reflect the desired indicators) before renaming them with the corresponding English names. The next step was to “play” with the data as a familiarisation approach. This involved exploring the data and seeking to understand whether the numbers for each indicator were plausible (given reports about such indicators in the literature) in the specific data context. Based on the data's completeness and the research objectives, we selected 10Footnote9 indicators out of the 23 initially downloaded. The data were checked for the level of missingness per indicator (see Table 1 for three of these indicators).

Table 1. Missing data by service indicators

From Table 1, one can see that the percentage missing ranged from 7.7% to 18.0%. Since routine data from low-income contexts such as Burkina Faso are usually not without errors (Ahanhanzo et al., 2014; Gimbel et al., 2011), these missing levels were not surprising. We were transparent about the issue, unlike the several instances in which missing data have not been reported (Tsvetanova et al., 2021). We then sought to understand the possible underlying reasons why the data could be missing.

It is important to consider the data generation process and the missingness patterns when determining the missing data mechanism. According to the staff of the local institution responsible for the database, the data generation process starts at the facility level with a manual process. It involves filling in the register book, counting and aggregating patient numbers every month, and transmitting the counts to the district health directorate via paper forms. The data are then entered into the DHIMS-2 database and digitalized. This process can result in errors affecting data quality. For example, staff at the facility level may fail to enter data for particular patients, days or even longer periods due to workload or forgetfulness, resulting in missing data for that patient or period. Errors may also occur in counting patients for monthly totals and in transcribing those totals into the reporting forms. Also, while data are transported from the facility to the district, reporting forms may be lost. In these scenarios, the missingness can be assumed to be rather random (MAR). It was also observed that, for different facilities, data were missing for different, unrelated months, irrespective of the district. The missing data mechanism was therefore classified as MAR (the missing probability was independent of the service indicators; Pratama et al., 2016), and we proceeded to impute the missing observations. Multiple imputation was used, in line with the earlier argument in Section 2.2. Given that the service volumes are counts, a Poisson model was used to predict values for missing cases. Five rounds of imputation were carried out in Stata version 15 using the mi impute poisson command (StataCorp, 2013), and the average values were used for the analysis.
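
A sketch of this imputation step follows, with hypothetical variable names (district and month are assumed complete). Note that averaging the five completed series, as reported here, differs from the more conventional pooling of estimates via mi estimate.

    * Poisson-based multiple imputation of a count indicator, five rounds
    mi set wide
    mi register imputed anc_visits
    mi impute poisson anc_visits month i.district, add(5) rseed(2015)

    * Average the five imputations for each missing cell, keeping
    * observed values unchanged, as described in the text
    egen anc_avg = rowmean(_1_anc_visits _2_anc_visits _3_anc_visits ///
        _4_anc_visits _5_anc_visits)
    replace anc_avg = anc_visits if !missing(anc_visits)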

In sum, it is important to understand retrospectively how the data were generated and how that process could lead to missing data. This may require discussions with the state agencies responsible for collecting the data.

3.1.2. Preliminary checks for stationarity and autocorrelation

In line with standard procedures for analyzing time series data (Shin, 2017), we conducted preliminary checks on the data before the main analysis. Since I compared the data patterns for intervention and control facilities, the checks were done separately for each facility group (intervention vs control). The autocorrelation test revealed the presence of autocorrelation for all indicators except delivery care. The DF test for stationarity showed no significant trend in the systematic and random components but revealed a statistically significant trend in the periodic component for all indicators (see Table 2). To confirm that the periodic trend was not signalling seasonality, a further examination of the PACF was carried out; there was no regular pattern to support the presence of seasonality.

Table 2. Stationarity test

The above notwithstanding, and to improve the overall stationarity of the periodic component of the data, moving average (MA) smoothingFootnote10 using two leads, the current value and two lags (Shin, 2017) was implemented in Stata 15 using tssmooth ma with the option window(2 1 2). The smoothing was done separately for the periods before and after each intervention so as not to smooth out the intervention effects. Figure 1 presents the trend graphs for each indicator's unadjusted (blue line) and adjusted (red line) series, depicting a relatively more stationary series after the MA smoothing. The subsequent regression analysis was based on the adjusted series (red line), since the non-stationary version (blue line) can produce biased estimates.
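
A sketch of this smoothing step follows; the intervention date (2014m1) and variable names are assumptions. Note that in Stata syntax the window option is written with spaces, window(2 1 2), denoting two lags, the current value and two leads.

    * Moving-average smoothing, applied separately before and after
    * the intervention so that the break itself is not smoothed away.
    tsset month, monthly
    tssmooth ma anc_pre  = anc_visits if month <  tm(2014m1), window(2 1 2)
    tssmooth ma anc_post = anc_visits if month >= tm(2014m1), window(2 1 2)
    gen anc_smooth = cond(month < tm(2014m1), anc_pre, anc_post)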

Figure 1. Trend graphs for service indicators.

Another important consideration in regression analysis is the model's functional form, since choosing the wrong functional form will also lead to biased estimates. Usually, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are employed (when the models are non-nested). The test results (see Table 3), using the minimum values of both the AIC and the BIC, showed that the quadratic functional form fits best, except for delivery care, for which the AIC and BIC detected only a small difference between the linear and quadratic functional forms. Consequently, I employed the quadratic functional form throughout.
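
A sketch of this comparison, fitting the linear and quadratic forms of the segmented regression and comparing their information criteria (lower AIC/BIC values indicate the preferred form; variable names as in the earlier sketches):

    * Fit competing functional forms and store the estimates
    quietly regress anc_visits c.T c.X c.XT
    estimates store linear

    quietly regress anc_visits c.T c.X c.XT c.T2
    estimates store quadratic

    * Report AIC and BIC side by side; choose the form with lower values
    estimates stats linear quadratic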

Table 3. Selection of model functional form

Since autocorrelation was detected, the study used generalized least squares (GLS) to adjust for its presence (Greene, 2003; Shin, 2017; Wooldridge, 2010). The unadjusted and adjusted Durbin–Watson (DW) statistics were then calculated to confirm the correction for autocorrelation. Also, following the test for functional form, the quadratic form was used in analyzing each indicator.
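
One way to implement such a correction in Stata is Prais-Winsten feasible GLS, which reports Durbin-Watson statistics for both the original and the transformed model, mirroring the unadjusted/adjusted comparison described here. A hedged sketch, with the interaction terms generated explicitly and all names hypothetical:

    * Interaction terms for the intervention group (Z is a 0/1 indicator)
    gen ZT  = Z*T
    gen ZX  = Z*X
    gen ZXT = Z*XT

    * Prais-Winsten FGLS correcting for AR(1) errors; the output shows
    * the DW statistic before (original) and after (transformed)
    prais anc_visits T X XT T2 Z ZT ZX ZXT, twostep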

3.1.3. Main results

Tables 4 (imputed data) and 5 (original data) present the results of the ITSA; the corresponding graphs are in Figures 2 and 3. The unadjusted DW statistics indicate the presence of autocorrelation, while the adjusted DW statistics indicate its absence. Results from the imputed data suggest that at baseline (the period before January 2014), there was no significant difference between control and intervention facilities in the level of service provision for any of the three indicators. Similarly, before January 2014, there was no significant difference in pre-intervention trends for any of the three indicators. The baseline situation was similar in the results from the unimputed data.

Table 4. GLS-based estimates (imputed data)

Table 5. GLS-based estimates (unimputed data)

In the post-intervention comparison, the study examined the effect of PBF in the intervention facilities relative to the control facilities. The model estimates for the imputed data indicate that the intervention produced modest increases in the trend of service provision for antenatal and postnatal care but not for normal deliveries. There was no significant immediate (level) change in the intervention facilities relative to the control facilities for any of the three indicators. The results showed that PBF led to an increase of 0.4% in the month-to-month provision of antenatal care services and 0.2% for postnatal care, but had no effect on normal deliveries. In the unimputed data, PBF affected neither deliveries nor antenatal care; however, there was a decline in the level of postnatal care provision immediately following its implementation.

The graphical presentations in Figures 2 and 3 make the picture clearer for the imputed and unimputed results. The data points are more normally distributed for all three indicators in the imputed data (Figure 2) than in the unimputed data (Figure 3), where they seem to cluster around the trend lines. Given that one important requirement for regression analysis is normally distributed errors, the estimates from the imputed data are more likely to be valid, justifying the decision to impute the missing values.

Figure 2. Interrupted time series graphs (imputed data).

Figure 3. Interrupted time series graphs (unimputed data).

4. Concluding remarks

This study contributes to the debate on handling routine data when using them for the long-term evaluation of health program interventions. Routine data quality dimensions such as accuracy, correctness and completeness, among others, have been identified (Smith et al., 2018). What is not clear from an experiential perspective is what to do when there are gaps in these quality dimensions. As a result, many early career researchers (and research students) are usually at a loss regarding what to do when using routine data. In particular, the debate over whether or not to impute missing data (Dettori et al., 2018; Hughes et al., 2019) is inconclusive. Unlike other studies (Hughes et al., 2019), this study proceeds from an experiential point of view and argues that when the missing data mechanism is MAR, multiple imputation produces more valid results than CCA (original data). The experiential approach makes the discussion more reflexive: it sets out step-by-step procedures and examples for early career researchers to follow when using multi-indicator routine data for long-term impact evaluation. It is hoped that the study will help such researchers think through each step carefully and deliberately, without stumbling several times before discovering what to do. The study is therefore intended as a methodological guide rather than an empirical work. The examples provided in the results section are from empirical research (Kuunibe, 2020) but should be read alongside the steps outlined in the earlier sections.

Acknowledgements

I acknowledge the Government of Ghana and the German Academic Exchange Service (DAAD) for awarding me a joint scholarship for my graduate studies in Germany; the guidance and contributions of members of the Health Economics and Health Financing Research Group of the Heidelberg Institute of Global Health during my graduate studies; and the insightful comments and suggestions from Prof Paul Kwame Nkegbe and Prof Haruna Issahaku, both of the University for Development Studies, Ghana.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

Notes

1. For a detailed explanation, see Hughes et al. (2019).

2. For details, see Dettori et al. (2018).

3. For an elaborate explanation, see Shin (2017).

4. For details on how to classify, detect and manage outliers, see Leys et al. (2019).

5. For details, see Linden (2015).

6. (Kuunibe et al., 2020).

7. Health Economics and Health Financing Group, Heidelberg Institute of Global Health, University of Heidelberg, Germany.

8. The information presented here is an oversimplification of the process, for want of space.

9. In this study, I present examples from three of these indicators.

10. A remedial measure employed to attain a more stationary trend before fitting a regression model. For more details, see Shin (2017).

References

  • Ahanhanzo, Y. G., Ouedraogo, L. T., Kpozèhouen, A., Coppieters, Y., Makoutodé, M., & Wilmet-Dramaix, M. (2014). Factors associated with data quality in the routine health information system of Benin. Archives of Public Health, 72(1), 1–14. https://doi.org/10.1186/2049-3258-72-25
  • Arel-Bundock, V., & Pelc, K. J. (2018). When can multiple imputation improve regression estimates? Political Analysis, 26(2), 240–245. https://doi.org/10.1017/pan.2017.43
  • Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: A tutorial. International Journal of Epidemiology, 46(1), 348–355. https://doi.org/10.1093/ije/dyw098
  • Boslaugh, S. (2007). An introduction to secondary data analysis. Secondary data sources for public health: A practical guide (pp. 2–10). Cambridge University Press.
  • Brocklehurst, P., Price, J., Glenny, A. M., Tickle, M., Birch, S., Mertz, E., & Grytten, J. (2013). The effect of different methods of remuneration on the behaviour of primary care dentists. Cochrane Database of Systematic Reviews, (11), CD009853. https://doi.org/10.1002/14651858.CD009853.pub2
  • Cattaneo, M. D., & Titiunik, R. (2021). Regression discontinuity designs. arXiv preprint arXiv:2108.09400. https://doi.org/10.48550/arXiv.2108.09400
  • Chansa, C., Mukanu, M. M., Chama-Chiliba, C. M., Kamanga, M., Chikwenya, N., Bellows, B., & Kuunibe, N. (2019). Looking at the bigger picture: Effect of performance-based contracting of district health services on equity of access to maternal health services in Zambia. Health Policy and Planning, 35(1), 36–46. https://doi.org/10.1093/heapol/czz130
  • Clarke, G. M., Conti, S., Wolters, A. T., & Steventon, A. (2019). Evaluating the impact of healthcare interventions using routine data. BMJ, 365, l2239. https://doi.org/10.1136/bmj.l2239
  • Curley, C., Krause, R. M., Feiock, R., & Hawkins, C. V. (2019). Dealing with missing data: A comparative exploration of approaches using the integrated city sustainability database. Urban Affairs Review, 55(2), 591–615. https://doi.org/10.1177/1078087417726394
  • De Allegri, M., Lohmann, J., & Schleicher, M. (2018). Results-based financing for health: Impact evaluation in Burkina Faso. https://www.rbfhealth.org/sites/rbf/files/documents/Burkina-Faso-Impact-Evaluation-Results-Report.pdf
  • Dettori, J. R., Norvell, D. C., & Chapman, J. R. (2018). The sin of missing data: Is all forgiven by way of imputation? Global Spine Journal, 8(8), 892–894. https://doi.org/10.1177/2192568218811922
  • Devkaran, S., & O’Farrell, P. N. (2015). The impact of hospital accreditation on quality measures: An interrupted time series analysis. BMC Health Services Research, 15(1), 137. https://doi.org/10.1186/s12913-015-0784-5
  • Dong, Y., & Peng, C. Y. (2013). Principled missing data methods for researchers. Springerplus, 2(1), 222. https://doi.org/10.1186/2193-1801-2-222
  • Evans, R. S. (2016). Electronic health records: Then, now, and in the future. Yearbook of Medical Informatics, 25(S 01), S48–S61. https://doi.org/10.15265/IYS-2016-s006
  • Fretheim, A., Zhang, F., Ross-Degnan, D., Oxman, A. D., Cheyne, H., Foy, R., Goodacre, S., Herrin, J., Kerse, N., McKinlay, R. J., Wright, A., & Soumerai, S. B. (2015). A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation. Journal of Clinical Epidemiology, 68(3), 324–333. https://doi.org/10.1016/j.jclinepi.2014.10.003
  • Gimbel, S., Micek, M., Lambdin, B., Lara, J., Karagianis, M., Cuembelo, F., Gloyd, S. S., Pfeiffer, J., & Sherr, K. (2011). An assessment of routine primary care health information system data quality in Sofala Province, Mozambique. Population Health Metrics, 9(1), 1–9. https://doi.org/10.1186/1478-7954-9-12
  • Greene, W. H. (2003). Econometric analysis (5th ed.). Pearson Education.
  • Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61(1), 79–90. https://doi.org/10.1198/000313007x172556
  • Hox, J. J., & Boeije, H. R. (2005). Data collection, primary versus secondary. In Encyclopedia of social measurement. Elsevier. https://doi.org/10.1016/B0-12-369398-5/00041-4
  • Hughes, R. A., Heron, J., Sterne, J. A. C., & Tilling, K. (2019). Accounting for missing data in statistical analyses: Multiple imputation is not always the answer. International Journal of Epidemiology, 48(4), 1294–1304. https://doi.org/10.1093/ije/dyz032
  • Jacob, R., Somers, M. A., Zhu, P., & Bloom, H. (2016). The validity of the comparative interrupted time series design for evaluating the effect of school-level interventions. Evaluation Review, 40(3), 167–198. https://doi.org/10.1177/0193841x16663414
  • Khandker, S. R., Koolwal, G. B., & Samad, H. A. (2010). Handbook on impact evaluation: Quantitative methods and practices. World Bank Publications.
  • Kuunibe, N. (2020). Using routine panel and time series data to assess program impact in low–and middle-income countries: The case of performance-based financing in rural Burkina Faso. Oxford Academic Journals.
  • Kuunibe, N., Lohmann, J., Hillebrecht, M., Nguyen, H. T., Tougri, G., & De Allegri, M. (2020). What happens when performance-based financing meets free healthcare? Evidence from an interrupted time-series analysis. Health Policy and Planning, 35(8), 906–917. https://doi.org/10.1093/heapol/czaa062
  • Lagarde, M. (2012). How to do (or not to do) … Assessing the impact of a policy change with routine longitudinal data. Health Policy and Planning, 27(1), 76–83. https://doi.org/10.1093/heapol/czr004
  • Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to classify, detect, and manage univariate and multivariate outliers, with emphasis on pre-registration. International Review of Social Psychology, 32(1). https://doi.org/10.5334/irsp.289
  • Linden, A. (2015). Conducting interrupted time-series analysis for single- and multiple-group comparisons. The Stata Journal, 15(2), 480–500. https://doi.org/10.1177/1536867X1501500208
  • Linden, A. (2018). Using forecast modelling to evaluate treatment effects in single-group interrupted time series analysis. Journal of Evaluation in Clinical Practice. https://doi.org/10.1111/jep.12946
  • McLintock, K., Russell, A. M., Alderson, S. L., West, R., House, A., Westerman, K., & Foy, R. (2014). The effects of financial incentives for case finding for depression in patients with diabetes and coronary heart disease: Interrupted time series analysis. BMJ Open, 4(8), e005178. https://doi.org/10.1136/bmjopen-2014-005178
  • Michielutte, R., Shelton, B., Paskett, E. D., Tatum, C. M., & Velez, R. (2000). Use of an interrupted time-series design to evaluate a cancer screening program. Health Education Research, 15(5), 615–623. http://her.oxfordjournals.org/content/15/5/615.full.pdf
  • Ministère de la Santé Burkina Faso. (2015). Metadonnees des Indicateurs du systeme National d’information Sanitaire (SNIS). http://www.cns.bf/IMG/Metadonnees/Meta_donnees_SNIS.pdf
  • Muthee, V., Bochner, A. F., Osterman, A., Liku, N., Akhwale, W., Kwach, J., Onyango, F., Odhiambo, J., Onyango, F., Puttkammer, N., & Prachi, M. (2018). The impact of routine data quality assessments on electronic medical record data quality in Kenya. PLoS One, 13(4), e0195362. https://doi.org/10.1371/journal.pone.0195362
  • Powell, A., Davies, H., & Thomson, R. (2003). Using routine comparative data to assess the quality of health care: Understanding and avoiding common pitfalls. Quality and Safety in Health Care, 12(2), 122–128. https://doi.org/10.1136/qhc.12.2.122
  • Pratama, I., Permanasari, A. E., Ardiyanto, I., & Indrayani, R. (2016, October 24–27). A review of missing values handling methods on time-series data. In 2016 International Conference on Information Technology Systems and Innovation (ICITSI) (pp. 1–6). IEEE. https://doi.org/10.1109/ICITSI.2016.7858189
  • Schneider, A., Donnachie, E., Tauscher, M., Gerlach, R., Maier, W., Mielck, A., Linde, K., & Mehring, M. (2016). Costs of coordinated versus uncoordinated care in Germany: Results of a routine data analysis in Bavaria. BMJ Open, 6(6), e011621. https://doi.org/10.1136/bmjopen-2016-011621
  • Schulte, P. J., & Mascha, E. J. (2018). Propensity score methods: Theory and practice for anesthesia research. Anesthesia & Analgesia, 127(4), 1074–1084. https://doi.org/10.1213/ANE.0000000000002920
  • Serumaga, B., Ross-Degnan, D., Avery, A. J., Elliott, R. A., Majumdar, S. R., Zhang, F., & Soumerai, S. B. (2011). Effect of pay for performance on the management and outcomes of hypertension in the United Kingdom: Interrupted time series study. BMJ, 342(jan25 3), d108. https://doi.org/10.1136/bmj.d108
  • Shin, Y. (2017). Time series analysis in the social sciences: The fundamentals (1st ed.). University of California Press.
  • Smith, M., Lix, L. M., Azimaee, M., Enns, J. E., Orr, J., Hong, S., & Roos, L. L. (2018). Assessing the quality of administrative data for research: A framework from the Manitoba centre for health policy. Journal of the American Medical Informatics Association, 25(3), 224–229. https://doi.org/10.1093/jamia/ocx078
  • StataCorp LP. (2013). Stata multiple-imputation reference manual: Release 13. https://www.stata.com/manuals13/mi.pdf
  • Stausberg, J. (2014). International prevalence of adverse drug events in hospitals: An analysis of routine data from England, Germany, and the USA. BMC Health Services Research, 14(1), 1–9. https://doi.org/10.1186/1472-6963-14-125
  • Steptoe, A., Breeze, E., Banks, J., & Nazroo, J. (2013). Cohort profile: The English longitudinal study of ageing. International Journal of Epidemiology, 42(6), 1640–1648. https://doi.org/10.1093/ije/dys168
  • Todd, O. M., Burton, J. K., Dodds, R. M., Hollinghurst, J., Lyons, R. A., Quinn, T. J., … Conroy, S. (2020). New horizons in the use of routine data for ageing research. Age and Ageing, 49(5), 716–722. https://doi.org/10.1093/ageing/afaa018
  • Tsvetanova, A., Sperrin, M., Peek, N., Buchan, I., Hyland, S., & Martin, G. P. (2021). Missing data was handled inconsistently in UK prediction models: A review of method used. Journal of Clinical Epidemiology, 140, 149–158. https://doi.org/10.1016/j.jclinepi.2021.09.008
  • Tung, Y. C., Chang, G. M., & Cheng, S. H. (2015). Long-term effect of fee-for-service-based reimbursement cuts on processes and outcomes of care for stroke: Interrupted time-series study from Taiwan. Circulation: Cardiovascular Quality and Outcomes, 8(1), 30–37. https://doi.org/10.1161/circoutcomes.114.001086
  • WHO. (2007). Health System Strengthening Interventions: Making the Case for Impact Evaluation. https://www.who.int/alliance-hpsr/resources/alliancehpsr_briefingnote2.pdf
  • WHO. (2021). Toolkit for Analysis and Use of Routine Health Facility Data: General Principles. https://www.who.int/data/data-collection-tools/health-service-data/toolkit-for-routine-health-information-system-data/modules
  • Wing, C., Simon, K., & Bello-Gomez, R. A. (2018). Designing difference in difference studies: Best practices for public health policy research. Annual Review of Public Health, 39(1), 453–469. https://doi.org/10.1146/annurev-publhealth-040617-013507
  • Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT press.
  • Zombre, D., De Allegri, M., & Ridde, V. (2017). Immediate and sustained effects of user fee exemption on healthcare utilization among children under five in Burkina Faso: A controlled interrupted time-series analysis. Social Science & Medicine, 179, 27–35. https://doi.org/10.1016/j.socscimed.2017.02.027