Research Articles

Estimating poverty among refugee populations: a cross-survey imputation exercise for Chad

Pages 94-113 | Received 22 Nov 2022, Accepted 04 Jan 2024, Published online: 09 Feb 2024

ABSTRACT

Household consumption surveys do not typically offer poverty estimates for refugees. We test the performance of a recently developed cross-survey imputation method to estimate poverty for a sample of refugees in Chad, combining survey and administrative data collected by the United Nations High Commissioner for Refugees (UNHCR). We find that the imputed poverty rates are not statistically different from the poverty rates obtained directly from the survey consumption data. This result is robust to different model specifications, varying poverty lines, and assumptions on the error terms. Targeting based on the imputed poverty estimates also outperforms common targeting methods, such as proxy means tests and the method currently used by humanitarian organizations in Chad. Replicating this approach in at least some of the 122 other countries currently using UNHCR administrative data could help address data gaps and provide much-needed estimates for responding effectively to forced displacement crises.

JEL CLASSIFICATION:

Introduction

The UN General Assembly’s Sustainable Development Goal (SDG) 1 – to end poverty in all its forms by 2030 – explicitly pledges that ‘no one will be left behind’. To achieve this goal, accurate poverty measurement is essential, which typically requires the availability of high-quality household consumption surveys.Footnote1 It is equally important for these surveys to be inclusive and cover marginalized populations, such as refugees and Internally Displaced Persons (IDPs). Unfortunately, household consumption surveys rarely include forcibly displaced populations, despite the fact that these populations are among the most vulnerable and deprived. They typically lack fundamental rights such as freedom of movement and the right to work, have eroded human and physical capital, and face more frequent shocks than surrounding host communities.

This is a significant and growing challenge, particularly in Sub-Saharan Africa. The global number of forcibly displaced persons almost doubled from 43.3 million in 2009 to 82.4 million in 2020. Among them were 26.4 million refugees, 4.1 million asylum seekers, 48 million IDPs (UNHCR, 2020), and other displaced populations under the protection of the United Nations High Commissioner for Refugees (UNHCR). Almost four out of five refugees live in countries neighboring their country of origin, and some 84% live in developing countries. Sub-Saharan Africa hosts around one-third of the world’s refugee population, half of the 10 countries with the largest refugee populations relative to national population, and six of the 10 countries with the largest numbers of IDPs.

Yet household consumption data for the region are collected infrequently, are often of low quality, and rarely include displaced populations. Several factors may explain this. First, the region has the highest poverty rates in the world, and high poverty is strongly associated with fewer household surveys (Beegle et al., 2016). Second, displaced populations are often scattered across different locations, and this hard-to-reach feature poses technical and logistical challenges for survey implementation. Third, there may be a lack of political will or financial resources to cover non-citizens in national surveys. Consequently, measuring poverty among displaced populations in Sub-Saharan Africa is an important undertaking that is severely hampered by missing household consumption data.Footnote2

Imputation methods offer one way to address this challenge. They have been widely employed in economics to fill data gaps where a variable is missing in one survey but available in another, with both surveys representative of the same population. These methods are known as ‘cross-survey’ imputation methods and have been used to estimate poverty across geographical areas in poverty mapping exercises (e.g. Elbers et al., 2003), or poverty trends across time periods with repeated cross-section or panel data (e.g. Dang et al., 2019). Yet few, if any, studies have applied cross-survey imputation to address missing household consumption data for refugees.

To our knowledge, Altındağ et al. (2021) and Dang and Verme (2023) are the only two exceptions. Studying the welfare of Syrian refugees in Lebanon using administrative data from humanitarian organizations, Altındağ et al. (2021) find that administrative data curated by humanitarian organizations can be used to estimate refugee household welfare accurately for targeting purposes. Dang and Verme (2023), for their part, propose cross-survey imputation methods to estimate poverty for Syrian refugees living in Jordan, using survey and administrative data provided by UNHCR. Their findings suggest that cross-survey imputation methods can provide encouraging results.

We make several new contributions in this paper.Footnote3 Beyond offering the first poverty imputation for refugees in Chad, we extend the nascent literature on measuring poverty for refugees in several respects. First, Chad represents an interesting economic and geographic context for examining refugee welfare. It is a large, landlocked Sub-Saharan country and among the poorest countries in the world. It hosts refugees from several different countries, who live in remote refugee camps. In these respects, Chad offers a context very different from that of the two studies discussed above, which focus on Syrian refugees in the Middle East, who generally have higher income levels and mostly live outside refugee camps.

Second, we exploit a richer and more diverse set of data than previous studies, including registration data, census-type targeting data, and a household consumption survey, all collected at about the same time. Triangulating across these data sets offers an opportunity for a more nuanced validation of the proposed imputation method. The richer data also allow us to examine imputation results rigorously against different poverty lines, including the food poverty line, the national poverty line, the international poverty line, and various other simulated poverty thresholds. To our knowledge, no previous study has examined all these poverty lines.

Finally, in addition to the poverty imputation exercise, we test the performance of the proposed method for targeting purposes. In particular, we compare its targeting performance with that of the method currently used in Chad to administer cash assistance to refugees. This exercise provides clear policy guidance on how to improve targeting for refugees.

The estimation results indicate that the limited set of variables that are provided in the UNHCR registration system predict household consumption (welfare) reasonably well. Estimates from the three sets of data available for the analysis produce similar welfare figures. The current targeting strategy in Chad, which is used jointly by the National Commission on the Welcoming and Resettlement of Refugees (CNARR), UNHCR, and World Food Programme (WFP), is fairly accurate in predicting household welfare. However, our results suggest that this targeting strategy could be further improved by reducing the inclusion and exclusion errors. If these encouraging results are replicated in other contexts, poverty predictions for refugees can be expanded at scale, with good prospects for the improvement of targeted programs.Footnote4

The paper is organized as follows. The second section outlines the country context. The third section presents the data and analytical framework. The estimation results are presented in the fourth section, and the fifth section evaluates the targeting strategy used in Chad and our targeting method in light of the global experience. The final section discusses data limitations and suggests future directions for research before concluding.

Country context

Chad is one of the poorest countries in the world. According to the latest household consumption survey, administered in 2017–18, 42% of the population falls below the national poverty line (World Bank, 2021). The past decade has brought much instability to Chad, with negative consequences for household well-being. Per capita Gross Domestic Product (GDP) decreased by 15% between 2015 and 2017, from an average of US$963 in 2015 to US$823 in 2017 (in 2010 purchasing power parity [PPP]). In terms of overall development, Chad ranks 187th of 189 countries on the Human Development Index (World Bank, 2021). Given these challenges, the country struggled to meet many of the Millennium Development Goals (MDGs) in 2015. Barring unforeseen economic growth or large increases in official development assistance, it appears unlikely to meet many of the SDGs set for 2030.

Despite the economic downturn, Chad continues to host a large number of refugees. It is among the top refugee-hosting countries, ranking 10th in the world and 5th in Sub-Saharan Africa (after Ethiopia, Kenya, Uganda, and the Democratic Republic of Congo; see ). Chad’s refugee population is sizable, representing about 3% of the national population. The number of forcibly displaced persons increased from 474,478 in 2015 to 667,586 as of March 2019, of whom about 69% were refugees or asylum seekers.Footnote5 Refugees are much poorer than the host population and face more severe food insecurity. The poverty rates for Sudanese and Central African Republic (CAR) refugees are estimated at 79.8% and 83.7%, respectively, compared with 70% for host communities (World Bank, 2021).

Of the 459,809 current refugees and asylum seekers, the majority (74%) are Sudanese refugees living in eastern Chad, 21% are CAR refugees living in southern Chad, and a smaller number of Nigerian refugees (about 2%) live in the Lake Chad Basin. The situation is further complicated by the large population of IDPs in the Lake Chad region, estimated at 165,313 at the end of 2018 (UNHCR, 2019). Map 1 shows the locations of the refugee camps in Chad.

Map 1. Refugees and IDPs distribution within Chad.

Source: UNHCR (2019).

Analytical framework and data

In this section, we provide an overview of the analytical framework before describing the data.

Analytical framework

The methodology used in this paper relies on the cross-survey imputation framework first introduced by Elbers et al. (2003) to generate poverty maps.Footnote6 More recently, Dang et al. (2017) built on this literature to propose a model that imposes fewer restrictive assumptions and offers an explicit formula for estimating the poverty rate and its variance. This study makes three new contributions: (i) it offers a simple variance formula, in line with the recent statistical literature; (ii) it can accommodate complex sampling designs; and (iii) the framework remains applicable to two surveys with different designs (such as imputing from a household consumption survey into a labor force survey). Finally, the approach allows for different modeling methods, including the standard linear regression model, its variant with a flexible specification of the empirical distribution of error terms, a logit model, and a probit model.

Formally, let $x_j$ be a vector of characteristics that are commonly observed in both surveys, where $j$ indicates the survey, with $j = 1$ for the base survey (which we impute from) and $j = 2$ for the target survey (which we impute into). The welfare indicator is assumed to be a function of household and individual characteristics ($x_j$):

$$y_j = \beta_j' x_j + \upsilon_{cj} + \varepsilon_j \tag{1}$$

where $y_j$ is the welfare indicator (consumption per capita per month), $\beta_j$ is a vector of parameters, $\upsilon_{cj}$ is a cluster ($c$) random effect, and $\varepsilon_j$ is the idiosyncratic error term. We suppress the household and individual subscripts to simplify notation.

This imputation framework rests on two assumptions. The first (Assumption 1), which is critical for poverty imputation, states that the measurement of household characteristics in each sample is a consistent measure of the characteristics of the whole population. In other words, the surveys considered must be representative of the same target population. In our context, the two surveys represent the same population of refugees and were conducted at approximately the same time, so the first assumption should be satisfied. We nevertheless conduct means difference tests on the overlapping variables observed in both the target and base data to verify that this is the case. The second assumption (Assumption 2) states that changes in $x_j$ between the data collection periods of the two data sets can capture the change in welfare over that period. Since the two data sets we analyze were collected in the same year, Assumption 2 is satisfied by design.

Under these two assumptions, the imputed welfare is

$$y_2^1 = \beta_1' x_2 + \upsilon_{c1} + \varepsilon_1 \tag{2}$$

where $y_2^1$ represents the imputed welfare obtained by applying the estimated parameters ($\beta_1$) and the estimated distributions of the error terms ($\upsilon_{c1}$ and $\varepsilon_1$) from the base survey to the variables ($x_2$) in the target survey.

Equation (1) is typically estimated with the standard cluster-effects linear regression model, and building on it, Dang et al. (2017) propose different imputation methods for poverty estimation. The first method assumes that the two error terms are uncorrelated and normally distributed, with $\upsilon_{cj} \mid x_j \sim N(0, \sigma^2_{\upsilon_{cj}})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$. Hereafter, this method is referred to as the normal linear regression model. An alternative is the empirical error method, which assumes no functional form for these error terms and instead uses their estimated empirical distributions.
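The two error specifications differ only in how the simulated error terms are drawn. The Python sketch below contrasts them, assuming the cluster-effect and idiosyncratic standard deviations (or the estimated residuals) have already been obtained from the base survey. The function names and cluster labels are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import random

def draw_errors_normal(sigma_u, sigma_e, cluster_ids, rng):
    """Normal linear model: one N(0, sigma_u^2) effect per cluster plus an
    N(0, sigma_e^2) idiosyncratic error per household."""
    effect = {c: rng.gauss(0.0, sigma_u) for c in sorted(set(cluster_ids))}
    return [effect[c] + rng.gauss(0.0, sigma_e) for c in cluster_ids]

def draw_errors_empirical(cluster_resids, hh_resids, cluster_ids, rng):
    """Empirical error method: resample estimated base-survey residuals
    (with replacement) instead of assuming a distribution."""
    effect = {c: rng.choice(cluster_resids) for c in sorted(set(cluster_ids))}
    return [effect[c] + rng.choice(hh_resids) for c in cluster_ids]

rng = random.Random(42)
clusters = ["camp_a", "camp_a", "camp_b"]  # hypothetical cluster labels
print(draw_errors_normal(0.3, 0.5, clusters, rng))
print(draw_errors_empirical([-0.2, 0.1, 0.25], [-0.6, 0.0, 0.4], clusters, rng))
```

In both functions, households in the same cluster share one cluster-level draw, which is what makes the $\upsilon_{c}$ term a cluster random effect rather than extra household noise.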

Because the parameters are estimated on a survey other than the target survey, which lacks the welfare variable, we estimate Equation (2) by simulation. For a single draw:

$$\hat{y}_{2,s}^1 = \hat{\beta}_1' x_2 + \tilde{\hat{\upsilon}}_{c1,s} + \tilde{\hat{\varepsilon}}_{1,s} \tag{3}$$

In Equation (3), $\tilde{\hat{\upsilon}}_{c1,s}$ and $\tilde{\hat{\varepsilon}}_{1,s}$ represent the $s$th random draw (simulation) from their distributions estimated on the base survey, for $s = 1, \ldots, S$.

The imputed poverty rate ($\hat{P}_2$) and its variance ($V(\hat{P}_2)$) in the target survey are then estimated as:

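$$\hat{P}_2 = \frac{1}{S}\sum_{s=1}^{S} P\left(\hat{y}_{2,s}^1 \le z_1\right) \tag{4}$$

$$V(\hat{P}_2) = \frac{1}{S}\sum_{s=1}^{S} V\left(\hat{P}_{2,s} \mid x_2\right) + V\left(\frac{1}{S}\sum_{s=1}^{S} \hat{P}_{2,s} \,\middle|\, x_2\right) \tag{5}$$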

where $P(\cdot)$ is the probability function that estimates the poverty rate in the population for each simulation, and $z_1$ is the poverty line in the base survey. In Equation (5), $\hat{P}_{2,s}$ is defined analogously as $\hat{P}_{2,s} = P(\hat{y}_{2,s}^1 \le z_1)$.Footnote7
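A minimal Python sketch of Equations (3) to (5) under the normal linear regression model follows. It rests on several simplifying assumptions that are not from the paper: $\hat{\beta}_1$ and the two error standard deviations are taken as already estimated on the base survey; the target sample is treated as an unweighted simple random sample, so $V(\hat{P}_{2,s} \mid x_2)$ is approximated by $p(1-p)/n$; and the second term of Equation (5) is approximated by the variance of the draw-level rates across simulations. All names are hypothetical.

```python
import random
import statistics

def impute_poverty(beta_hat, sigma_u, sigma_e, x2, cluster_ids, z1,
                   n_sims=100, seed=0):
    """Sketch of Eqs. (3)-(5) under the normal linear regression model.

    beta_hat, sigma_u, sigma_e: estimated on the base survey (taken as given).
    x2: covariate rows of the target survey, each starting with 1 (intercept).
    z1: poverty line on the same scale as the welfare model (e.g. logs).
    """
    rng = random.Random(seed)
    n = len(x2)
    rates, within_vars = [], []
    for _ in range(n_sims):
        # Eq. (3): one cluster effect per cluster and one household error per draw.
        u = {c: rng.gauss(0.0, sigma_u) for c in sorted(set(cluster_ids))}
        poor = 0
        for row, c in zip(x2, cluster_ids):
            xb = sum(b * x for b, x in zip(beta_hat, row))
            y_hat = xb + u[c] + rng.gauss(0.0, sigma_e)
            poor += y_hat <= z1
        p_s = poor / n
        rates.append(p_s)
        # Assumption: simple-random-sampling variance of a proportion.
        within_vars.append(p_s * (1.0 - p_s) / n)
    p_hat = statistics.fmean(rates)  # Eq. (4)
    # Assumption: second term of Eq. (5) taken as the between-draw variance.
    between = statistics.variance(rates) if n_sims > 1 else 0.0
    return p_hat, statistics.fmean(within_vars) + between  # Eq. (5), approximated

# Hypothetical target sample: intercept plus one covariate, two clusters.
x2 = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0], [1.0, 1.0]]
clusters = [1, 1, 2, 2]
print(impute_poverty([1.0, 0.5], 0.2, 0.4, x2, clusters, z1=2.0, n_sims=500))
```

The empirical error method would change only the two draws inside the loop, replacing the normal draws with resampled base-survey residuals.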

These poverty estimators are consistent for the parameters of interest. Furthermore, in terms of prediction accuracy, they outperform the traditional proxy means testing technique, which typically omits the error terms $\upsilon_{c1} + \varepsilon_1$ and results in biased estimates of the welfare indicator (Dang et al., 2019). As a further robustness check, we also employ two alternative modeling methods: the probit model and the logit model. These models impose more restrictive assumptions on the error term but estimate poverty directly (i.e. Equations (4) and (5)), rather than first estimating consumption expenditure and then deriving poverty estimates from the predicted consumption.
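To see why omitting the error terms matters for poverty measurement, the sketch below compares a proxy-means-test style headcount (applying the poverty line to the prediction alone) with a simulation headcount that adds a normal error back. It assumes a hypothetical population in which every household's predicted log welfare lies 0.1 above the poverty line and the error standard deviation is 0.5; the numbers are purely illustrative.

```python
import math
import random

def normal_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def pmt_rate(y_pred, z):
    """PMT-style headcount: apply the poverty line to predictions only."""
    return sum(y <= z for y in y_pred) / len(y_pred)

def imputed_rate(y_pred, z, sigma, n_sims=200, seed=1):
    """Simulation headcount: add an N(0, sigma^2) error to each prediction
    before applying the poverty line, averaged over n_sims draws."""
    rng = random.Random(seed)
    poor = 0
    for _ in range(n_sims):
        poor += sum(y + rng.gauss(0.0, sigma) <= z for y in y_pred)
    return poor / (n_sims * len(y_pred))

# Hypothetical example: predictions 0.1 above the line, error s.d. 0.5.
y_pred = [0.1] * 500
print(pmt_rate(y_pred, 0.0))           # 0.0: everyone classified non-poor
print(imputed_rate(y_pred, 0.0, 0.5))  # close to the analytical value below
print(normal_cdf((0.0 - 0.1) / 0.5))   # P(y + e <= z) under normality, about 0.42
```

With the error term dropped, no household is classified as poor, whereas roughly 42% are expected to fall below the line once welfare is allowed to vary around its prediction.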

Data

As part of its mandate to protect displaced persons in host countries, UNHCR collects data to monitor the welfare of refugees and other populations of interest and to deliver assistance and services. In this study, we use three data sets collected by UNHCR and its partners (Table 1). The first is the ProGres data set, UNHCR’s registration system covering all refugees and asylum seekers requiring assistance. ProGres is a live instrument that is continuously updated as new refugees arrive or as existing refugees contact UNHCR. The data we use were extracted at the end of December 2017. Because it covers all registered refugees, this data set can be considered the ‘census’ of refugees. It contains socioeconomic variables (such as household size, marital status, gender, age, country of origin, and region of residence) but no consumption or expenditure data.

Table 1. Summary of data.

The second data set, the Targeting data, is also a census-like data set for refugees living in Chad. Its main objectives are to fill knowledge gaps on refugee livelihoods and on the levels of and differences in vulnerability across refugee households, and to categorize refugees into different income levels for assistance (including cash and food). The Targeting data also aim to identify factors that can facilitate refugee self-reliance. To serve these objectives, the data set is based on a mixed-methods approach combining qualitative and quantitative methods. The first step involves focus group discussions with refugee leaders, women’s organizations, and youth associations to identify wealth characteristics and key challenges specific to different ages and genders. The second step implements a sample survey across camps to confirm the wealth characteristics identified by refugees in the first step. Based on the outcomes of the first two steps, a detailed quantitative survey designed to capture wealth characteristics is administered to all refugee households.

The Targeting data include all Sudanese, Central African, and Nigerian refugees living in Chad. The data were collected from 17 June to 15 July 2017 and cover 19 refugee sites as well as refugees living in nine host villages. After data collection, a statistical model that takes household welfare into account is used to classify households into four socioeconomic groups (very poor, poor, average, and better off). Among the variables relevant for this study, this data set contains demographic variables (household size, gender, age, country of origin, and region of residence), variables on asset and animal ownership, and variables reflecting shock-coping strategies. Like the ProGres data, the Targeting data do not collect information on consumption or expenditure; however, they do collect information on wealth.

The last data set is the Post-Distribution Monitoring (PDM) data, from a sample survey that covers themes similar to those of the Targeting data. The PDM data set, collected in 2017 by the World Food Programme (WFP), aims to provide a better understanding of how refugees use food assistance and contains data on consumption and expenditure. The PDM has a two-stage stratified random sample design: the first stage selects camps and the second stage selects households. The camps are stratified into three zones: (i) North East (Ourecassoni, Amnaback, Iridimi Touloum); (ii) Centre-East (Goz Amir, Djabal, Gaga, Teguine, Bredjing, Farchana); and (iii) South (Amboko, Dossey, Gondjé, Belom, Moyo) (see ). In addition, the sampling takes into account the type of humanitarian assistance provided to refugees (in-kind, food voucher, or cash).

Importantly, the PDM includes two consumption aggregates measuring monthly total consumption and monthly food consumption, using retrospective questions with varying recall periods depending on the item considered (from seven days to one year). The consumption aggregate is compiled by aggregating the different food and non-food items, including expenditures on education, health, durable assets, and rent. For this study, we consider two welfare indicators from the PDM data set. The first is the household total consumption expenditure per capita per month, and the second is the household food consumption per capita per month.Footnote8

For poverty imputation purposes, we construct three data sets from the ProGres, Targeting, and PDM data. The first, which we refer to as ‘ProGres 2’, is obtained by appending the ProGres data to the PDM data (i.e. pooling the two data sets together). As the ProGres and PDM data share only demographic variables, ProGres 2 contains the demographic variables for all observations, although only the observations from the PDM data have consumption expenditure. The ProGres 2 data set thus allows us to estimate the welfare model using the PDM data and then use this model to impute household consumption for the ProGres observations.

The second constructed data set, ‘Targeting 2’, is obtained by appending the Targeting data to the PDM data. The Targeting 2 data set therefore contains demographic variables, asset and animal ownership, and coping strategy variables, as well as consumption data. The last constructed data set, ‘ProGres Targeting’, is obtained by first merging the ProGres and Targeting data (for which we can match 72% of the observations) and then appending the result to the PDM data. This data set is the most complete in terms of variables.
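Appending (pooling) and merging are different operations, and the distinction determines which variables each constructed data set contains. The sketch below illustrates both on toy records. The field names (`hh_id`, `hh_size`, `owns_livestock`, `consumption`) and the `source` tag are hypothetical; a real merge would use whatever household identifier links ProGres and Targeting records.

```python
def append_rows(pdm_rows, census_rows):
    """Pool two data sets: census rows keep their covariates but have no
    consumption, so they become the imputation targets."""
    pooled = [{**r, "source": "PDM"} for r in pdm_rows]
    pooled += [{**r, "consumption": None, "source": "census"} for r in census_rows]
    return pooled

def merge_on_id(left, right, key="hh_id"):
    """Inner-join two census-type data sets on a household identifier and
    report the share of `left` records that found a match."""
    right_by_id = {r[key]: r for r in right}
    merged = [{**row, **right_by_id[row[key]]} for row in left if row[key] in right_by_id]
    return merged, len(merged) / len(left)

progres = [{"hh_id": i, "hh_size": s} for i, s in [(1, 4), (2, 6), (3, 2), (4, 5)]]
targeting = [{"hh_id": i, "owns_livestock": o} for i, o in [(2, True), (3, False), (9, True)]]
pdm = [{"hh_id": 100, "hh_size": 5, "owns_livestock": True, "consumption": 41.0}]

merged, match_rate = merge_on_id(progres, targeting)  # 2 of 4 ProGres records match
progres_targeting = append_rows(pdm, merged)
print(match_rate, len(progres_targeting))  # 0.5 3
```

Merging widens the records (more variables per household) and can lose unmatched households, while appending lengthens the data set and leaves consumption missing for the rows to be imputed.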

The motivation for constructing these three data sets is to check whether different data sources and different sets of variables generate different poverty figures, so that we can determine which set of variables best predicts poverty. To ensure comparability across the three data sets, we restrict the analysis to the 16 (of 19) refugee sites covered by the PDM data. Consequently, this study covers only refugees in Chad who come from the Central African Republic and Sudan.

Estimation results

In this section, we test the model assumptions and present the estimation results.

Testing model assumptions

As a first step, we check whether our data sets are representative of the same underlying population (Assumption 1) by performing means difference tests on key predictors. Since the PDM data are a subsample of the Targeting or ProGres data sets, we use a statistical test for partially overlapping samples. The results, shown in in the Annex, indicate that none of the variables differ significantly in means, supporting the assumption that the two samples are representative of the same population.Footnote9

To evaluate the performance of the welfare estimation model, we consider three models. Model 1 includes demographic and geographic variables (region of residence and country of origin). It is the most parsimonious model and uses only variables readily available in the ProGres data set. Model 2 adds variables related to animal and asset ownership to Model 1. Model 2 is richer than Model 1 but more demanding in terms of control variables, which may also be less reliable or more likely to be missing in the census data. Model 3 adds variables measuring coping strategies to Model 2. To test for multicollinearity, reports the variance inflation factor (VIF) for the different models. No variable has a VIF above 5, far below the rule-of-thumb threshold of 10 for harmful collinearity given by Kennedy (2008). We conclude that multicollinearity is not an issue for any of the models considered.
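The VIF for regressor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that regressor on all the others (plus an intercept). A dependency-free Python sketch follows. It solves the normal equations directly, which is adequate for illustration but not numerically robust for large or nearly singular designs, where a statistics library would normally be used instead. The toy columns are hypothetical.

```python
def _solve(a, b):
    """Solve a small square system a @ x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, xs):
    """R^2 from an OLS regression of y on the columns in xs plus an intercept."""
    rows = [[1.0] + [x[i] for x in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    return [1.0 / (1.0 - r_squared(col, columns[:j] + columns[j + 1:]))
            for j, col in enumerate(columns)]

x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
x3 = [1.1, -1, 1, -1]   # nearly a copy of x1
print(vif([x1, x2]))     # uncorrelated columns: both VIFs equal 1
print(vif([x1, x2, x3])) # x1 and x3 get very large VIFs
```

Against the rule of thumb cited above, a VIF near 1 signals no collinearity, while values above 10 flag harmful collinearity.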

Next, we test the out-of-sample performance and possible overfitting of the three models using five-fold cross-validation on the PDM data, with the root mean square error (RMSE) and mean absolute error (MAE) as performance measures. The data set is split into five equal folds. In the first iteration, the first fold is used to test the model and the remaining four folds are used to train it. In the second iteration, the second fold serves as the testing set and the rest as the training set. This process is repeated until each of the five folds has been used as the testing set. Each performance measure is then averaged across the five iterations.
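The five-fold procedure can be sketched as follows. As a simplifying assumption, a one-regressor OLS fit stands in for the welfare models, and observations are shuffled and assigned to folds by striding, which gives equal folds when the sample size is divisible by five. All names are hypothetical.

```python
import math
import random

def fit_ols_1d(xs, ys):
    """Stand-in welfare model: OLS of y on a single regressor."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def k_fold_rmse_mae(xs, ys, k=5, seed=0):
    """Average out-of-fold RMSE and MAE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # equal sizes if k divides n
    rmses, maes = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols_1d([xs[i] for i in train], [ys[i] for i in train])
        errs = [ys[i] - (a + b * xs[i]) for i in test]
        rmses.append(math.sqrt(sum(e * e for e in errs) / len(errs)))
        maes.append(sum(abs(e) for e in errs) / len(errs))
    return sum(rmses) / k, sum(maes) / k

# Hypothetical data: a linear trend plus alternating +/-0.3 noise.
xs = [float(i) for i in range(50)]
ys = [2.0 + 0.1 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]
print(k_fold_rmse_mae(xs, ys))
```

Because each observation is scored only by a model that never saw it, a richer model that merely overfits the training folds would show a higher out-of-fold RMSE rather than a lower one.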

For the food consumption aggregate, the three models have similar goodness-of-fit on both indicators (Table 2). Model 1’s RMSE is 0.55, while Models 2 and 3 have an RMSE of 0.54. For the MAE, Models 1 and 3 have a value of 0.42, whereas Model 2 has an MAE of 0.41. For the overall consumption aggregate, the differences between the three models are larger. The RMSE values range from 0.53 to 0.58, with Model 3 and Model 1 having the smallest and largest RMSE, respectively. The MAE is quite similar across the three models, ranging from 0.39 (Model 3) to 0.43 (Model 1). These results suggest that no model consistently outperforms the others.

Table 2. Out of sample model performance, individual level.

Estimation results

Table 3 applies the model to the three constructed data sets described earlier (ProGres 2, Targeting 2, and ProGres Targeting), using the normal linear regression model and the empirical error model. The table shows results for two poverty lines: (i) the US$1.90-a-day poverty line in 2011 PPP, the international poverty line for extreme poverty (panel A); and (ii) the national poverty line, which corresponds to around US$2.6 (World Bank, 2013) (panel B).Footnote10 We use the US$1.90-a-day line in 2011 PPP to be consistent with thinking on global poverty at the time the data were collected; it is equivalent to the current (i.e. 2023) global poverty line of US$2.15 per day in 2017 PPP.

Table 3. Imputed poverty rates using the international and national poverty lines*.

Table 3 shows that the imputed poverty rates are not statistically different from the poverty rates obtained directly from the survey consumption data (henceforth, ‘survey estimates’). All the imputed poverty estimates fall inside the 95% confidence intervals of the survey estimates, and many even fall within one standard error of them. The normal linear regression model and the empirical error model yield quite similar results, which is generally consistent with findings from poverty imputation for the general population in other countries (Dang et al., 2019).

As a further robustness check, we also employ two alternative modeling options (the probit model and the logit model) and show the results in in the Annex. These models yield quite similar estimates. The estimation results using the food poverty line, shown in in the Annex, are qualitatively similar.

Using the ProGres Targeting data, Figure 1 further simulates the estimation results for all poverty lines between the 66th and 99th percentiles of the consumption distribution. Panels A and B show estimation results using the normal linear model and the empirical error model, respectively. The results suggest that Models 1 and 2 predict poverty rates well across poverty lines: the imputed poverty rates fall within the 95% confidence intervals (CIs) for all the arbitrary poverty lines considered, and they are similar across the normal linear regression and empirical error models. Model 3, however, overestimates poverty, with imputed poverty rates outside the 95% CIs of the survey-based rates for the poverty lines considered. Because Model 3 adds variables related to coping strategies, it may suffer from measurement error. For example, households might misreport these strategies, overstating how often they use them in order to receive more assistance from humanitarian organizations.
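The comparison across poverty lines can be illustrated with a simple check: set the line at each percentile of survey consumption, compute the survey headcount and its 95% CI, and ask whether the imputed headcount falls inside it. The sketch below assumes a simple random sample (standard error $\sqrt{p(1-p)/n}$, ignoring the survey design) and uses nearest-rank percentiles; these are simplifications, not the paper's exact procedure, and the data are hypothetical.

```python
import math

def percentile(sorted_vals, q):
    """Nearest-rank percentile for integer q in 1..100 (list already sorted)."""
    rank = max(1, -(-q * len(sorted_vals) // 100))  # integer ceiling
    return sorted_vals[rank - 1]

def headcount(consumption, line):
    """Share of observations at or below the poverty line."""
    return sum(c <= line for c in consumption) / len(consumption)

def coverage_over_lines(survey, imputed, q_lo=66, q_hi=99):
    """For each percentile-based line: survey rate, imputed rate, and whether
    the imputed rate lies in the survey 95% CI (SRS standard error)."""
    vals = sorted(survey)
    out = []
    for q in range(q_lo, q_hi + 1):
        line = percentile(vals, q)
        p_s = headcount(survey, line)
        se = math.sqrt(p_s * (1.0 - p_s) / len(survey))
        p_i = headcount(imputed, line)
        out.append((q, line, p_s, p_i, abs(p_i - p_s) <= 1.96 * se))
    return out

survey = [float(i) for i in range(1, 101)]   # hypothetical survey consumption
shifted = [c - 30.0 for c in survey]         # an imputation that understates welfare
print(coverage_over_lines(survey, survey)[0])
print(coverage_over_lines(survey, shifted)[0])
```

Shifting the imputed consumption downward, as a stand-in for a model that overpredicts poverty, pushes the imputed rates outside the intervals at the lower end of the range of lines.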

Figure 1. Imputed welfare and survey-based welfare for different poverty lines, ProGres targeting.

Note: The (blue) dashed curve presents the actual poverty rates derived from the PDM data in the ProGres Targeting. The (green) solid curve with circle symbol represents the imputed poverty rates from Model 1 with observations from Merged ProGres Targeting (56,830 observations). The (indigo) solid curve with symbol “x” represents the imputed poverty rates from Model 2 with the Merged ProGres Targeting observations (56,829 observations) while the (orange) solid curve with the triangle symbol represents the imputed poverty rates from Model 3 with the Merged ProGres Targeting observations (56,829 observations).

Figure 2 shows the imputed welfare rates across the same set of poverty lines for all three models, but with a focus on food security. In humanitarian settings, food-security-based poverty is defined as the inability to afford the minimum expenditure basket required to purchase a food basket (to satisfy basic needs). In particular, the WFP defines the minimum expenditure basket ‘as what a household requires in order to meet their essential needs, on a regular or seasonal basis, and its average cost’ (WFP, 2018). The results are similar to the overall welfare results displayed in Figure 1. Models 1 and 2 predict the actual food-security-based rates well across poverty lines, with imputed rates within the 95% CIs for all the arbitrary poverty lines considered, and the two estimation models yield similar results. Again, Model 3 overestimates the poverty rates, with imputed rates outside the CIs of the survey-based rates.Footnote11

Figure 2. Imputed welfare and survey-based welfare based on food security for different poverty lines, ProGres targeting.

Note: The (blue) dashed curve presents the actual poverty rates derived from the PDM data in the ProGres Targeting. The (green) solid curve with circle symbol represents the imputed poverty rates from Model 1 with observations from Merged ProGres Targeting (56,830 observations). The (indigo) solid curve with symbol “x” represents the imputed poverty rates from Model 2 with the Merged ProGres Targeting observations (56,829 observations) while the (orange) solid curve with the triangle symbol represents the imputed poverty rates from Model 3 with the Merged ProGres Targeting observations (56,829 observations).

In summary, our results show that while Models 1 and 2 predict total and food poverty reasonably well across arbitrary poverty lines, Model 3 always overestimates poverty at lower poverty lines, with predictions outside the 95% CIs. The results hold regardless of the estimation method (i.e. the normal linear regression model or the empirical error model). Put differently, the variables currently available in UNHCR’s ProGres registration system can be combined with other survey data to predict refugee poverty rates using the proposed imputation methods.Footnote12

Targeting performance

The imputed welfare estimates can be used to evaluate, ex post, the inclusion and exclusion errors of the food assistance programs administered by the government and humanitarian organizations during 2016/17. The targeting strategy for food assistance was agreed upon and implemented by UNHCR, WFP, and CNARR, the Chadian government agency responsible for refugees. The proposed cross-survey imputation method has also been shown to outperform the proxy means testing approach in refugee contexts (Dang & Verme, 2023).Footnote13 The current UNHCR/WFP/CNARR targeting approach relies on the Food Consumption Score (FCS) generated by WFP’s PDM surveys, a composite score based on dietary diversity, food consumption frequency, and the relative nutritional importance of different food items. As with any index, the FCS depends on the choice of food group weights and food item thresholds, which are inherently subjective.

We next show how accurately the current targeting strategy identifies poor households in terms of inclusion (leakage) and exclusion (undercoverage) errors. The inclusion error is defined as the proportion of households considered poor by the targeting method that are not actually poor. This is expressed as $FP/(TP+FP)$, where FP (false poor) is the number of non-poor households that the targeting method incorrectly considers poor and TP (true poor) is the number of poor households that it correctly identifies as poor. The exclusion error is defined as the proportion of poor households that the targeting method considers non-poor, $FN/(TP+FN)$, where FN (false non-poor) is the number of poor households incorrectly considered non-poor by the targeting method.
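The two error rates can be computed directly from household-level flags. The following is a minimal sketch, assuming one boolean per household for the targeting decision and one for poverty status under the reference welfare measure; the function name is hypothetical, and returning 0 when a denominator is empty is our own convention.

```python
from typing import Sequence, Tuple


def targeting_errors(targeted: Sequence[bool], poor: Sequence[bool]) -> Tuple[float, float]:
    """Return (inclusion error, exclusion error).

    targeted[i] is True if the targeting method classifies household i as poor;
    poor[i] is True if household i is poor under the reference welfare measure.
    """
    if len(targeted) != len(poor):
        raise ValueError("targeted and poor must have the same length")
    tp = sum(t and p for t, p in zip(targeted, poor))        # true poor
    fp = sum(t and not p for t, p in zip(targeted, poor))    # false poor (leakage)
    fn = sum(p and not t for t, p in zip(targeted, poor))    # false non-poor (undercoverage)
    inclusion = fp / (tp + fp) if tp + fp else 0.0           # FP / (TP + FP)
    exclusion = fn / (tp + fn) if tp + fn else 0.0           # FN / (TP + FN)
    return inclusion, exclusion


# Four households targeted, two of them non-poor; one poor household missed.
inc, exc = targeting_errors(
    [True, True, True, True, False, False],
    [True, True, False, False, True, False],
)
```

Here half of the targeted households are non-poor (inclusion error 0.5) and one of the three poor households is missed (exclusion error of one third).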

Both types of error matter, but from different perspectives. The inclusion error matters primarily from a budget perspective, as it represents wasted resources. The exclusion error captures the program's failure to cover households in need. Another common targeting indicator is the Coady-Grosh-Hoddinott (CGH) ratio (Coady et al., 2004), obtained by dividing the proportion of beneficiaries falling within the target population by the proportion that would result from a random allocation. For example, if the bottom 40% of the income distribution receives 60% of the funding, the indicator equals 1.5 (= 60/40). The higher the indicator, the better the targeting performance (see the Annex for a summary of these indicators).
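The CGH ratio can be computed from the same kind of household-level flags. This sketch assumes equal transfers per beneficiary, so that the share of funding reaching the target group equals the share of beneficiaries who belong to it; the function name is hypothetical.

```python
from typing import Sequence


def cgh_ratio(selected: Sequence[bool], in_target: Sequence[bool]) -> float:
    """Share of beneficiaries who belong to the target group, divided by the
    target group's population share (the share a random allocation would give)."""
    if len(selected) != len(in_target):
        raise ValueError("selected and in_target must have the same length")
    n_selected = sum(selected)
    n_target = sum(in_target)
    if n_selected == 0 or n_target == 0:
        raise ValueError("need at least one beneficiary and one target household")
    share_to_target = sum(s and t for s, t in zip(selected, in_target)) / n_selected
    target_share = n_target / len(in_target)
    return share_to_target / target_share


# Text example: the bottom 40% of 10 households is the target group, and
# 3 of the 5 beneficiaries come from it, so the ratio is 0.6 / 0.4, about 1.5.
in_target = [True] * 4 + [False] * 6
selected = [True, True, True, False, True, True, False, False, False, False]
ratio = cgh_ratio(selected, in_target)
```

When the target group is the poor, the numerator equals one minus the inclusion error, so the CGH ratio and the leakage rate move together for a given poverty rate.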

shows the undercoverage and leakage rates for the different approaches. The method we propose (panel B) outperforms the targeting method currently used in Chad (panel A) for all the poverty lines except the 25th percentile poverty line. The errors are considerable, with the UNHCR/WFP/CNARR undercoverage rates ranging from 9% to 32% and the leakage rates from 12% to 36%, and our model-based undercoverage rates from 6% to 40% and the leakage rates from 9% to 41%. However, these methods perform relatively well when compared with international evidence. For example, Skoufias et al. (2001) find that the undercoverage and leakage rates for the PROGRESA program in Mexico were 7% and 70%, respectively, for a poverty rate of 25%. These figures represent slightly better performance on the undercoverage rate but much worse performance on the leakage rate compared with those for Chad.

Table 4. Comparison of coverage and leakage rates (%).

In fact, the estimated targeting rates for Chad are also better than the median performance of similar scores for programs across the world. reports the CGH ratio for the 85 programs considered by Coady et al. (2004) (A), the UNHCR/WFP/CNARR targeting program (B), and our proposed method (C). Notably, our methodology outperforms the UNHCR/WFP/CNARR targeting program and the median value of the programs covered by Coady et al. (2004).

For a more general test, we empirically evaluate how the UNHCR/WFP/CNARR targeting strategy performs relative to the proposed targeting method based on imputed consumption for poverty lines ranging from the 38th to the 99th percentile of the consumption distribution. The results suggest that our proposed method outperforms the targeting method currently used in Chad at every poverty line in this range. In other words, for welfare programs targeting poor refugees at any of these poverty lines, our proposed method would identify the intended beneficiaries more accurately than the targeting method currently used in Chad.
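A sweep over poverty lines of this kind could be sketched as follows. The matching rule is our assumption, since it is not spelled out here: at each percentile p, a household counts as poor if its survey-based consumption is at or below the p-th percentile of survey-based consumption, and as targeted if its imputed consumption is at or below the p-th percentile of imputed consumption. The nearest-rank percentile rule, the function names, and the toy data are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def errors_at_percentile(
    survey: Sequence[float], imputed: Sequence[float], p: float
) -> Tuple[float, float]:
    """Return (undercoverage, leakage) with both measures cut at their p-th percentile."""
    z_survey = nearest_rank_percentile(survey, p)
    z_imputed = nearest_rank_percentile(imputed, p)
    poor = [y <= z_survey for y in survey]          # reference poverty status
    targeted = [y <= z_imputed for y in imputed]    # status assigned by the imputation
    tp = sum(t and q for t, q in zip(targeted, poor))
    fp = sum(t and not q for t, q in zip(targeted, poor))
    fn = sum(q and not t for t, q in zip(targeted, poor))
    undercoverage = fn / (tp + fn) if tp + fn else 0.0
    leakage = fp / (tp + fp) if tp + fp else 0.0
    return undercoverage, leakage


def sweep(
    survey: Sequence[float], imputed: Sequence[float], percentiles: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Evaluate (p, undercoverage, leakage) for each percentile poverty line."""
    return [(p, *errors_at_percentile(survey, imputed, p)) for p in percentiles]


survey = [float(v) for v in range(1, 11)]
imputed = [2.0, 1.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0, 10.0, 9.0]
for p, under, leak in sweep(survey, imputed, [38, 40, 60]):
    print(f"p={p}: undercoverage={under:.2f}, leakage={leak:.2f}")
```

Running the same sweep with the current targeting rule in place of the imputed ranking would give the comparison curve.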

Figure 3. Comparison of targeting performances of different targeting methods.

Conclusion

Tracking progress toward SDG 1 of eradicating poverty for all requires the availability of high-quality household consumption surveys. However, the majority of countries across the world, especially developing countries, face challenges in collecting poverty data. High-quality consumption surveys that are comparable for forcibly displaced persons and their hosts are, and will remain, in limited supply, given the cost and challenges associated with these types of surveys. In the meantime, cross-survey imputation methods can provide a second-best alternative that can potentially save time and resources.

We combine survey and census-type data on refugees to estimate the welfare of refugees in Chad. We show how different sets of variables, as well as different sources of data, perform in identifying poor households, and in particular how well the set of variables available in the ProGres database can predict poverty. In a second step, we estimate the accuracy of the current UNHCR/WFP/CNARR targeting strategy and compare it with a targeting strategy based on imputed consumption.

The results suggest that the set of variables available in ProGres accurately predicts the welfare rates for different poverty lines. Adding variables related to asset and animal ownership provides predictions that are very close to the ones with only the variables available in the ProGres data set. These results are robust to different model specifications, varying poverty lines, and assumptions about the error terms. Since the UNHCR ProGres data are available in most refugee locations where the UNHCR runs the registration system – currently more than 122 countries – these methods may be replicable in many settings of forcibly displaced persons.

The current targeting strategy used for food, livelihoods, and cash-based assistance, despite its simplicity, is fairly accurate compared with the existing international evidence. The targeting errors of the current UNHCR/WFP/CNARR targeting strategy for a poverty rate of 25% are in the same range as those of other targeting methods around the world, as reported in Coady et al. (2004). Yet, we also show that the existing targeting method can be improved by using the imputation method proposed in this paper.

Our study has several data limitations. The PDM data measure consumption using fewer variables than the standard household consumption survey (i.e. the Chadian Household Consumption and Informal Sector Surveys [ECOSIT4]), and the data used in this paper cover only a subset of refugees in Chad. Our main objective is to test a cross-survey methodology, and, for this purpose, we used a subsample of UNHCR refugee data that are not nationally representative of the refugee population in Chad. Therefore, the poverty estimates presented in this paper do not reflect the official poverty estimates monitored by the government and the international community, and they could be improved once ECOSIT4 data are available. In addition, the data that we analyze do not cover refugees who live outside camps. As these refugees live in different environments, accurately predicting their welfare may require more detailed variables.

A promising direction for future research is to adapt existing survey instruments to collect data that can further enhance the accuracy of the imputation model. Another direction is to experiment with different imputation models for different types of refugees. Since forced displacement can lead to a reorganization of a family's structure (Beltramo et al., 2023), refugee household composition can be unique, as only a subset of the original members may remain together after the conflict (e.g. due to military conscription of male members, or the death, kidnapping, or separation of certain family members during displacement). As such, cross-survey imputation methods offer an opportunity for heterogeneity analysis within the refugee (or forcibly displaced) community to assess the welfare of especially vulnerable groups, which is typically impossible in data-scarce contexts. Last but not least, the proposed imputation method is being extended to study other poverty outcomes, such as the poverty gap, extreme poverty, or the vulnerability (near-poor) rate, which better capture the consumption distribution (Dang et al., 2023). It would be useful to explore similar applications in the refugee context.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/13600818.2024.2313216.

Additional information

Funding

This work was supported by the Foreign Commonwealth and Development Office, UK Government [P169210 and P175686].

Notes

1 In this paper, we focus on poverty as measured by household consumption. While some richer countries tend to implement income surveys (e.g. in Latin America), consumption-based poverty measurement is the standard practice in poorer countries, particularly in Africa (see, e.g. Beegle et al., 2016).

2 Missing data issues are not a problem limited to displaced populations but can emerge because of lack of survey data on a particular topic of interest, population group, or time period. These issues can also be caused by sampling errors, incomplete data due to unit or item nonresponse, data input errors, or post-survey data manipulations such as top-coding or censoring.

3 This is an expanded version of an early working paper by Beltramo et al. (2021).

4 The poverty estimates used in this paper do not reflect the official poverty estimates monitored by the government and the international community. The aim of this paper is to test a cross-survey imputation methodology, and, for this purpose, we use a subsample of UNHCR refugee data that are not nationally representative of the refugee population in Chad. By contrast, official poverty statistics require national consumption surveys conducted by the national statistical office with nationally representative samples. At the time of writing, these national data were not available; once they are, they will provide another opportunity to validate the cross-survey imputation method proposed in this paper.

5 UNHCR uses the term ‘people of concern’ to describe those who are forcibly uprooted from their homes, including asylum-seekers, refugees, stateless persons, the internally displaced, and returnees.

6 See also Tarozzi (2007) and Mathiassen (2009) for further improvements and adaptations of this approach (e.g. estimating the standard errors in a different way). In addition, Douidich et al. (2016) offer an early study that imputes across types of surveys, such as consumption and labor force surveys. Newhouse et al. (2014), Dang et al. (2019), and Dang (2021) offer recent reviews of previous imputation studies that discuss the main advantages of, and different approaches to, welfare imputation and provide useful insights into the imputation process. See also Little and Rubin (2019) for a recent review of related topics in the statistics literature.

7 Note that we have to use the poverty line $z_1$ from the base survey to be consistent with the other parameters, which are also estimated using the base survey. In addition, to simplify notation, we do not show the equations with sampling weights. However, these features can be straightforwardly incorporated. For example, consider a commonly used stratified two-stage sample design in which the population is divided into $R$ strata, $n_r$ clusters are sampled out of a population total of $N_r$ clusters from stratum $r$ in the first stage, and $m_{rc}$ households are sampled out of a population total of $M_{rc}$ households from cluster $c$ in stratum $r$ in the second stage. Suppose that all individuals within a household share the same poverty status (i.e. poverty is measured at the household level), and household $h$ has $m_{rch}$ members. The formula for estimating the poverty rate at the $s$th random draw given in Equation (4) can be modified accordingly as

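$$P\left(\hat{y}^{1}_{2,s} \le z_1\right) = \sum_{r=1}^{R} \sum_{c=1}^{n_r} \sum_{h=1}^{m_{rc}} w_{rch}\, P\left(\hat{y}^{1}_{2rchi,s} \le z_1\right) \qquad (4')$$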

with the sampling weight $w_{rch} = \frac{N_r}{n_r} \frac{M_{rc}}{m_{rc}} m_{rch}$, and the subscript $i$ indexing household individuals. We refer interested readers to a more detailed discussion of this topic in Dang et al. (2017).
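A minimal sketch of this weighting, assuming household-level inputs: each sampled household contributes its design weight $w_{rch}$ times its imputed probability of being poor, as in Equation (4'). One assumption is ours: the sketch divides by the sum of the weights so that the result is expressed as a rate; if the weights are already normalized to sum to one, this division leaves the result unchanged. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def design_weight(N_r: int, n_r: int, M_rc: int, m_rc: int, m_rch: int) -> float:
    """w_rch = (N_r / n_r) * (M_rc / m_rc) * m_rch for a stratified two-stage design."""
    return (N_r / n_r) * (M_rc / m_rc) * m_rch


def weighted_poverty_rate(households: Iterable[Tuple[float, float]]) -> float:
    """households: (w_rch, P(y_hat <= z_1)) pairs; returns the weighted mean probability."""
    total_weight = 0.0
    weighted_sum = 0.0
    for weight, prob_poor in households:
        total_weight += weight
        weighted_sum += weight * prob_poor
    return weighted_sum / total_weight  # normalization is our assumption (see lead-in)


# 10 of 100 clusters sampled in the stratum, 5 of 50 households in the cluster,
# and a 4-member household: weight = 10 * 10 * 4 = 400.
w = design_weight(100, 10, 50, 5, 4)
rate = weighted_poverty_rate([(w, 1.0), (100.0, 0.0)])
```

Because the weight includes household size $m_{rch}$, the result is a person-level headcount rate even though poverty status is assigned per household.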

8 The aim of the paper is not to measure consumption accurately or to estimate nationally representative poverty figures for refugees in Chad; its purpose is only to test the cross-survey imputation methodology using a sample of refugee data. In this respect, our only concern is that the poverty predictions are close to the poverty rates calculated with the consumption data; whether the consumption data themselves are accurate is less relevant for this purpose. We should expect the cross-survey methodology to produce even better results if the quality of the consumption aggregate improves.

9 We use the test proposed by Verme et al. (2015), who applied it to similar ProGres data on refugees in Jordan.

10 See also the Annex for the full regression results.

11 To check for possible heterogeneity, we split the sample by country of origin. The results (not shown) were similar, except for larger variances of the estimates (less precision), which might be due to the small sample of refugees from the Central African Republic.

12 Our results are also consistent with the findings of previous studies (e.g. Dang & Verme, 2023; Dang et al., 2017; Dang et al., 2019; De Luca et al., 2018) that emphasize the importance of selecting a few key predictor variables rather than too many predictors (to avoid model overfitting). In particular, these studies highlight that adding household assets to a parsimonious model with key demographic variables (such as Model 1) helps improve poverty estimates. Model 2, which adds asset and animal ownership to Model 1, is consistent with this evidence. However, adding more variables may lead to overfitting, resulting in less accurate welfare estimates. The results of Model 3 can be interpreted in this context.

13 On optimal targeting in humanitarian contexts, see also Verme and Gigliarano (2019).

References

  • Altındağ, O., O’Connell, S. D., Şaşmaz, A., Balcıoğlu, Z., Cadoni, P., Jerneck, M., & Foong, A. K. (2021). Targeting humanitarian aid using administrative data: Model design and validation. Journal of Development Economics, 148, 102564. https://doi.org/10.1016/j.jdeveco.2020.102564
  • Beegle, K., Christiaensen, L., Dabalen, A., & Gaddis, I. (2016). Poverty in a rising Africa. World Bank.
  • Beltramo, T. P., Calvi, R., De Giorgi, G., & Sarr, I. (2023). Child poverty among refugees. Discussion Paper No. 17870. CEPR.
  • Beltramo, T., Dang, H. A. H., Sarr, I., & Verme, P. (2021). Estimating poverty among refugee populations: A cross-survey imputation exercise for Chad. Discussion Paper No. 14606. IZA.
  • Coady, D., Grosh, M., & Hoddinott, J. (2004). Targeting of transfers in developing countries: Review of lessons and experience. World Bank.
  • Dang, H.-A. H. (2021). To impute or not to impute, and how? A review of poverty-estimation methods in the absence of consumption data. Development Policy Review, 39(6), 1008–1030.
  • Dang, H.-A. H., Jolliffe, D., & Carletto, C. (2019). Data gaps, data incomparability, and data imputation: A review of poverty measurement methods for data-scarce environments. Journal of Economic Surveys, 33(3), 757–797.
  • Dang, H.-A. H., Kilic, T., Carletto, C., & Abanokova, K. (2023). Imputing poverty indicators without consumption data: An exploratory analysis (Mimeo). World Bank.
  • Dang, H.-A. H., Lanjouw, P. F., & Serajuddin, U. (2017). Updating poverty estimates in the absence of regular and comparable consumption data: Methods and illustration with reference to a middle-income country. Oxford Economic Papers, 69(4), 939–962. https://doi.org/10.1093/oep/gpx020
  • Dang, H.-A. H., & Verme, P. (2023). Estimating poverty for refugees in data-scarce contexts: An application of cross-survey imputation. Journal of Population Economics, 36(2), 653–679. https://doi.org/10.1007/s00148-022-00909-x
  • De Luca, G., Magnus, J. R., & Peracchi, F. (2018). Balanced variable addition in linear models. Journal of Economic Surveys, 32(4), 1183–1200.
  • Douidich, M., Ezzrari, A., Van der Weide, R., & Verme, P. (2016). Estimating quarterly poverty rates using labor force surveys: A primer. The World Bank Economic Review, 30(3), 475–500. https://doi.org/10.1093/wber/lhv062
  • Elbers, C., Lanjouw, J. O., & Lanjouw, P. (2003). Micro–level estimation of poverty and inequality. Econometrica, 71(1), 355–364. https://doi.org/10.1111/1468-0262.00399
  • Kennedy, P. (2008). A guide to econometrics. John Wiley & Sons.
  • Little, R. J., & Rubin, D. B. (2019). Statistical analysis with missing data (Vol. 793). John Wiley & Sons.
  • Mathiassen, A. (2009). A model-based approach for predicting annual poverty rates without expenditure data. Journal of Economic Inequality, 7(2), 117–135. https://doi.org/10.1007/s10888-007-9059-7
  • Newhouse, D., Shivakumaran, S., Takamatsu, S., & Yoshida, N. (2014). How survey-to-survey imputation can fail. The World Bank.
  • Skoufias, E., Davis, B., & de la Vega, S. (2001). Targeting the poor in Mexico: An evaluation of the selection of households into PROGRESA. World Development, 29(10), 1769–1784. https://doi.org/10.1016/S0305-750X(01)00060-2
  • Tarozzi, A. (2007). Calculating comparable statistics from incomparable surveys, with an application to poverty in India. Journal of Business & Economic Statistics, 25(3), 314–336. https://doi.org/10.1198/073500106000000233
  • UNHCR. (2019). Global trends: Forced displacement in 2018.
  • UNHCR. (2020). Global trends: Forced displacement in 2019.
  • Verme, P., & Gigliarano, C. (2019). Optimal targeting under budget constraints in a humanitarian context. World Development, 119, 224–233. https://doi.org/10.1016/j.worlddev.2017.12.012
  • Verme, P., Gigliarano, C., Wieser, C., Hedlund, K., Petzoldt, M., & Santacroce, M. (2015). The welfare of Syrian refugees: Evidence from Jordan and Lebanon. World Bank.
  • WFP. (2018). Minimum expenditure basket: Interim guidance note. World Food Programme. https://docs.wfp.org/api/documents/WFP-0000074198/download/
  • World Bank. (2013). Republic of Chad poverty notes: Dynamics of poverty and inequality following the rise of the oil sector. World Bank. http://hdl.handle.net/10986/19322
  • World Bank. (2021). Chad poverty assessment: Investing in rural income growth, human capital, and resilience to support sustainable poverty reduction. World Bank. http://hdl.handle.net/10986/36443

Appendix

Table A1. Distribution of persons of concern by group in Chad.

Table A2. Means difference tests.

Table A3. Collinearity tests.

Table A4. Estimation model.

Table A5. Imputed poverty rates using the international and national poverty lines, further robustness checks with probit and logit models.

Table A6. Food poverty imputation.

Table A7. Targeting performance measures.

Table A8. Targeting performance of sample of programs, current UNHCR/WFP/CNARR, and our imputed consumption-based targeting.