Abstract
In this study, we are concerned with improved estimation of a sensitive variable when there is non-response and measurement error on the sensitive variable but the auxiliary variable is non-sensitive. For this purpose, we propose an improved estimator in the presence of non-response and measurement error using the optional randomized response technique (ORRT) under simple random sampling without replacement (SRSWOR). The properties of the estimator are studied, and efficiency conditions are obtained in comparison with the mean estimator, the ratio estimator and Zhang’s estimator. A simulation study based on hypothetical populations is carried out to demonstrate the performance of the proposed estimator at its optimum relative to the others. The proposed estimator is found to be more efficient than the other estimators considered, in terms of having a higher Percent Relative Efficiency (PRE).
1 Introduction
Survey researchers often find it difficult to obtain efficient estimates of population parameters because of non-response and measurement errors. If the variable of interest is sensitive, collecting data from respondents becomes even more difficult. Face-to-face interviewing is a more reliable way to collect data, but it costs more than other methods. Hansen and Hurwitz (1946) suggested taking a sub-sample of the non-respondents after the first call and collecting information from them by personal interview. If the variable of interest is sensitive, however, respondents may not answer honestly in a face-to-face interview. To reduce the bias caused by sensitive questions, one can use randomized response technique (RRT) models. Diana et al. (2014), Ahmed et al. (2017) and Makhdum et al. (2020) proposed estimators for a sensitive variable in the presence of non-response using RRT models. According to Collins et al. (2001), combining auxiliary variables with the variable under study helps to achieve more efficient estimators. Gupta et al. (2020) studied variance estimation for a sensitive study variable using a highly correlated but non-sensitive auxiliary variable.
Another important component of non-sampling error is measurement error. Kumar et al. (2011), Khalil et al. (2021) and Singh et al. (2019) proposed different estimators of population parameters in the presence of measurement error. Azeem (2014), Kumar et al. (2015), Kumar et al. (2018), Kumar (2016), Singh and Sharma (2015), Audu et al. (2020), Singh and Vishwakarma (2019) and Kumar and Chowdhary (2021) studied mean estimation in the presence of non-response and measurement error simultaneously.
Further, Khalil et al. (2018) pioneered the estimation procedure for a sensitive variable in the presence of measurement error by using optional and non-optional RRT models.
Building on previous studies, a researcher may wish to estimate the population mean of a sensitive variable in the presence of both measurement error and non-response. This issue has received little attention in the existing literature. The RRT models used in earlier studies (Ahmed et al. 2017; Diana et al. 2014; Makhdum et al. 2020) are non-optional RRT models, in which all respondents must give a scrambled response. A survey question, on the other hand, may be sensitive for one person but not for another. According to Gupta et al. (2002), if respondents are given the option of answering the sensitive question directly or providing a scrambled response, the model will be more efficient while causing no further loss of privacy (Gupta et al. 2018).
2 Sampling procedure
Let U be a finite population of N units, from which a sample of size n is drawn by simple random sampling without replacement (SRSWOR). Let Y be a sensitive study variable that cannot be observed directly, and let X be a non-sensitive auxiliary variable correlated with Y, both with unknown mean and variance. Let T and S be two scrambling variables, each with its own mean and variance. Let W be the probability that a respondent finds the question sensitive. A respondent who considers the question sensitive is asked to report a scrambled response; otherwise, the correct response is reported.
Further, non-response makes it difficult for researchers to collect sensitive information from respondents. To tackle non-response when the variable of interest is sensitive, the Hansen and Hurwitz (1946) technique has been modified by Zhang et al. (2021) and Kumar and Kour (2022). In this modified technique, respondents answer directly in the first phase, and an ORRT model is then used to obtain answers from a sub-group of non-respondents in the second phase.
Therefore, the ORRT model in the second phase is given by (1), together with its mean and variance. The RRT model is given by (2). When W = 1, the randomized response becomes non-optional, and the mean and variance of Z are then given by (3) and (4), respectively.
Consider a transformation of the randomized response whose expectation under the randomization mechanism is the true response yi; it is given by (5).
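The optional scrambling mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scrambled response is taken to be T·y + S (a common choice in the ORRT literature, not recoverable from the text here), T and S are normal with means 1 and 0, and the population values, variances and the name `orrt_response` are hypothetical.

```python
import random
from statistics import fmean


def orrt_response(y, w, t_sd, s_sd, rng):
    """Return one optional-RRT response for the true value y.

    Assumed model (an illustration, not taken from the text): with
    probability w the respondent finds the question sensitive and reports
    T * y + S, otherwise the true value y is reported.
    T ~ Normal(1, t_sd) and S ~ Normal(0, s_sd), so E(Z | y) = y.
    """
    if rng.random() < w:
        t = rng.gauss(1.0, t_sd)   # multiplicative scrambler, mean 1
        s = rng.gauss(0.0, s_sd)   # additive scrambler, mean 0
        return t * y + s
    return y


def variance(values):
    """Population variance, computed directly for speed."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(2024)
# Hypothetical sensitive variable: Normal(50, 5).
y_true = [rng.gauss(50.0, 5.0) for _ in range(100_000)]
z = [orrt_response(y, 0.6, 0.1, 1.0, rng) for y in y_true]

mean_gap = abs(fmean(z) - fmean(y_true))     # scrambling leaves the mean unchanged
var_ratio = variance(z) / variance(y_true)   # but inflates the variance
print(f"mean gap = {mean_gap:.4f}, variance ratio = {var_ratio:.2f}")
```

With scramblers centred at 1 and 0, the reported values keep the mean of the true values while their variance grows with W, which is the price paid for privacy.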
Based on the above discussion, we assume that only n1 units respond on the first call and the remaining n2 units do not. A sub-sample is then taken from the n2 non-responding units. A modified version of the Hansen and Hurwitz estimator is given by (6), which combines the mean of the respondents in the first phase with the mean of the sub-sampled units in the second phase.
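As a rough illustration of the two-phase idea, the classical Hansen and Hurwitz estimator weights the first-phase respondent mean by n1/n and the second-phase sub-sample mean by n2/n. The sketch below uses that classical form with plain (unscrambled) values; in the modified version the second-phase values would be transformed ORRT responses. The split n1 = 600, n2 = 250, the value f = 2 and the function name are hypothetical.

```python
import random
from statistics import fmean


def hansen_hurwitz_mean(first_phase, second_phase_subsample, n2):
    """Classical Hansen-Hurwitz mean:
    (n1 / n) * mean(respondents) + (n2 / n) * mean(sub-sampled non-respondents),
    where n2 is the number of non-respondents in the original sample.
    """
    n1 = len(first_phase)
    n = n1 + n2
    return (n1 * fmean(first_phase) + n2 * fmean(second_phase_subsample)) / n


rng = random.Random(7)
# Hypothetical sample of n = 850 values of the study variable.
sample = [rng.gauss(50.0, 5.0) for _ in range(850)]
respondents, nonrespondents = sample[:600], sample[600:]   # n1 = 600, n2 = 250

f = 2  # inverse sampling rate: follow up n2 / f non-respondents
subsample = rng.sample(nonrespondents, len(nonrespondents) // f)

estimate = hansen_hurwitz_mean(respondents, subsample, n2=len(nonrespondents))
print(f"Hansen-Hurwitz estimate of the mean: {estimate:.3f}")
```

A larger f means fewer costly follow-up interviews but a noisier second-phase mean, which is why the simulation later varies f.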
The mean and variance of this estimator are
Moreover, let (xi, yi, zi) be the observed values and (Xi, Yi, Zi) be the true values of the variables X, Y and Z, respectively. Let u be the measurement error (ME) on Y, v be the measurement error on X and p be the measurement error on Z. The MEs on the ith observed unit are ui = yi – Yi, vi = xi – Xi and pi = zi – Zi; they are assumed to be uncorrelated, each with mean zero and its own variance.
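A short worked identity may help to see how measurement error enters the variance expressions. It is a sketch under an extra assumption not stated above, namely that the error u_i is also uncorrelated with the true value Y_i; the symbols \sigma_Y^2 and \sigma_u^2 are illustrative names for the variances of the true values and of the measurement errors.

```latex
\begin{align*}
y_i &= Y_i + u_i, \qquad E(u_i) = 0, \\
E(y_i) &= E(Y_i), \\
\operatorname{Var}(y_i) &= \operatorname{Var}(Y_i) + \operatorname{Var}(u_i)
                         = \sigma_Y^2 + \sigma_u^2 .
\end{align*}
```

Under this assumption the observed values remain unbiased for the true mean, while their variance is inflated by \sigma_u^2.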
In the presence of non-response and ME, the variance of the modified Hansen and Hurwitz estimator is given by
3 Existing mean estimator
Using the notation of Section 2, suppose that the population mean μx and the variance of the auxiliary variable X are known.
Let the respondent group of size N1 and the non-respondent group of size N2 each have their own population mean and variance. Further, consider the correlation coefficient between the auxiliary variable X and the sensitive variable Y, and similarly the correlation coefficients between X and the sensitive study variable within the respondent group and within the non-respondent group.
We assume that the population mean μx of X is known and that non-response occurs on both Y and X. Some existing mean estimators under the ORRT model are listed below.
The usual mean estimator for a sensitive variable in a finite population under the modified Hansen and Hurwitz (HH) procedure in the presence of measurement error is given by (7).
In the presence of measurement error, the MSE of this estimator is given by (8).
A ratio estimator corresponding to the Gupta et al. (2014) estimator under the modified HH procedure in the presence of measurement error is given by (9), which involves the ordinary mean estimator under the original HH procedure.
The MSE of this ratio estimator is given by (10).
The generalized mean estimator considered in Zhang et al. (2021), adapted to non-response and measurement error, is given by (11).
Here k and ν are suitably chosen constants, a further constant is assumed to be unknown with its value to be determined from optimality considerations, and α and β are assumed to be known parameters of the auxiliary variable X.
The MSE of this estimator is given by (12).
The MSEs of these three estimators without measurement error may be obtained by setting the measurement error variances to zero.
4 Proposed mean estimator
Taking motivation from Zhang et al. (2021), we propose a generalized mean estimator using ORRT models in the presence of non-response and measurement error simultaneously, given by (13). It is built from the mean of the sensitive study variable in the presence of non-response and measurement error, the mean of the auxiliary variable in the presence of non-response and measurement error, and the mean of the auxiliary variable. Here α1 and α2 are suitably chosen constants, and k1 and k2 are assumed to be unknown constants whose values are to be optimized.
To find the MSE of the estimator, we first define the error terms on which the approximation is based.
The bias of the proposed estimator up to the second order of approximation is given by (14).
Without measurement error, the bias can be obtained by setting the measurement error variances to zero in the above equation.
The MSE of the proposed estimator is given by (15).
Differentiating (15) with respect to k1 and k2, we get the optimum values of k1 and k2 given in (16).
Substituting the values of k1 and k2 from (16) into (15), we get the minimum MSE given in (17).
The minimized MSE of the proposed estimator without ME, obtained by setting the measurement error variances to zero in the above expression, is given in (18).
5 Efficiencies comparison
In this section, we compare the minimum MSE of the proposed estimator, given in (17), with the MSEs of the existing estimators given in (8), (10) and (12). The comparisons yield the efficiency conditions (19), (20) and (21), one for each existing estimator.
If conditions (19)–(21) hold, the proposed estimator is more efficient than the other estimators considered.
6 Simulation study
In this section, we use a simulation study to compare the performance of the proposed estimator under SRSWOR with the usual unbiased estimator and the other two estimators considered.
For the simulation study, a data set consisting of a sensitive study variable Y and an auxiliary variable X is generated from a normal distribution, with model parameters that may vary.
An artificial population of size N = 5000 is generated from a normal distribution, and a sample of size n = 850 is drawn under SRSWOR. It is assumed that only n1 units respond and n2 units do not respond in the first phase. In the second phase, we take a sub-sample from the n2 non-respondent units. The results of the simulation study are given in the tables.
Also, the scrambling variables T and S are taken to be normal with means 1 and 0, respectively, and with different variances.
Further, for comparison and to assess the performance of the proposed estimator against the other estimators considered, we use another artificial population, considered by Zhang et al. (2021). This population, of size 5000, is generated from a bivariate normal distribution with a specified mean vector and covariance matrix.
A sample of size 500 is drawn using SRSWOR; in the first phase it consists of n1 respondents and n2 non-respondents. In the second phase, we take another sub-sample from the non-respondents. The simulation results based on the Zhang et al. (2021) population are given in a separate table.
Coding for the simulation was done in R. The Percent Relative Efficiency (PRE) of the proposed estimator is defined with respect to the usual unbiased estimator and the two other estimators considered, based on their MSEs.
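To make the efficiency measure concrete, the sketch below estimates a PRE by Monte Carlo, assuming the usual definition PRE = 100 × MSE(reference estimator) / MSE(estimator). It is a simplified illustration only: it compares the plain sample mean with the classical ratio estimator under SRSWOR, without non-response, scrambling or measurement error, and it is written in Python rather than the R code used for the study. The population parameters and the helper name `pre` are hypothetical.

```python
import random
from statistics import fmean


def pre(mse_reference, mse_estimator):
    """Percent relative efficiency of an estimator against a reference."""
    return 100.0 * mse_reference / mse_estimator


rng = random.Random(1)
N, n, reps = 5000, 500, 1000

# Hypothetical population: X ~ Normal(20, 2), Y = 2.5 * X + noise,
# so that Y and X are strongly correlated.
x_pop = [rng.gauss(20.0, 2.0) for _ in range(N)]
y_pop = [2.5 * x + rng.gauss(0.0, 2.0) for x in x_pop]
mu_x, mu_y = fmean(x_pop), fmean(y_pop)

sq_err_mean, sq_err_ratio = [], []
for _ in range(reps):
    idx = rng.sample(range(N), n)                 # one SRSWOR draw
    y_bar = fmean(y_pop[i] for i in idx)
    x_bar = fmean(x_pop[i] for i in idx)
    sq_err_mean.append((y_bar - mu_y) ** 2)                   # sample mean
    sq_err_ratio.append((y_bar * mu_x / x_bar - mu_y) ** 2)   # ratio estimator

mse_mean, mse_ratio = fmean(sq_err_mean), fmean(sq_err_ratio)
pre_ratio = pre(mse_mean, mse_ratio)
print(f"PRE of the ratio estimator vs the sample mean: {pre_ratio:.1f}")
```

A PRE above 100 means the estimator has a smaller simulated MSE than the reference; the same loop structure extends to other estimators by adding one squared-error list per estimator.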
From the simulation tables, we compare the performance of the proposed estimator with the usual unbiased estimator and the two other estimators considered. In one case, the PRE of the proposed estimator decreases as f and W increase, where f = 2 to 5 and W = 0.2 to 0.8. In the other case, the proposed estimator performs better than the other estimators considered for different values of f and W, except at W = 0.6, where the PRE of the proposed estimator is lower than that of the estimators considered.
It is noted from the tables that the proposed estimator is more efficient than the usual unbiased estimator, the Gupta et al. (2014) ratio estimator under the Hansen and Hurwitz setup and the generalized estimator of Zhang et al. (2021), in terms of having a higher PRE. Also, the ratio estimator does not perform well in one of the cases considered.
The PRE values of the estimators decrease as the value of f increases from 2 to 5.
7 Conclusion
In this paper, we studied improved mean estimation of a sensitive variable by suggesting a generalized mean estimator based on the ORRT model. The properties of the proposed estimator have been studied, and conditions have been obtained under which it is more efficient than the existing estimators. The simulation study supports the theoretical results, except when the probability that the question is sensitive is moderately high (i.e., W = 0.6); in that situation the Zhang et al. (2021) estimator is more efficient. Based on these results, we recommend the suggested estimator to researchers and practitioners.
Acknowledgments
The authors express very sincere gratitude to the reviewers for their constructive suggestions which helped improve the presentation of the paper.
Data availability statement
No real data is used in the paper.
Disclosure statement
The authors declare that they have no conflicts of interest.
References
- Ahmed S, Shabbir J, Gupta S. 2017. Use of scrambled response model in estimating the finite population mean in presence of non-response when coefficient of variation is known. Commun Stat Theory Methods. 46:8435–8449.
- Audu A, Singh R, Khare S, Dauran NS. 2020. Almost unbiased estimators for population mean in the presence of non-response and measurement error. J Stat Manag Syst. 24:573–589.
- Azeem M. 2014. On estimation of population mean in the presence of measurement error and non-response [Unpublished Ph.D. thesis]. Lahore: National College of Business Administration and Economics.
- Collins LM, Schafer JL, Kam CM. 2001. A comparison of inclusive and restrictive strategies in modern missing data procedure. Psychol Methods. 6:330–351.
- Diana G, Riaz S, Shabbir J. 2014. Hansen and Hurwitz estimator with scrambled response on the second call. J Appl Stat. 41:596–611.
- Gupta S, Aloraini B, Qureshi MN, Khalil S. 2020. Variance estimation using randomized response technique. Revstat Stat J. 18:165–176.
- Gupta S, Gupta B, Singh S. 2002. Estimation of sensitivity level of personal interview survey questions. J Stat Plan Inference. 100:239–247.
- Gupta S, Kalucha G, Shabbir J, Dass BK. 2014. Estimation of finite population mean using optional RRT models in the presence of non-sensitive auxiliary information. Am J Math Manag Sci. 33:147–159.
- Gupta S, Mehta S, Shabbir J, Khalil S. 2018. A unified measure of respondent privacy and model efficiency in quantitative RRT models. J Stat Theory Pract. 12:506–511.
- Hansen MH, Hurwitz WN. 1946. The problem of non-response in sample surveys. J Am Stat Assoc. 41:517–529.
- Khalil S, Noor-Ul-Amin M, Hanif M. 2018. Estimation of population mean for a sensitive variable in the presence of measurement error. J Stat Manag Syst. 21:81–91.
- Khalil S, Zhang Q, Gupta S. 2021. Mean estimation of sensitive variables under measurement errors using optional RRT models. Commun Stat Simul Comput. 50:1417–1426.
- Kumar S, Bhogal S, Nataraja NS, Viswanathaiah M. 2015. Estimation of population mean in the presence of non-response and measurement error. Rev Colomb Estad. 38:145–161.
- Kumar S, Chowdhary M. 2021. Estimation of population product in the presence of non-response and measurement error in successive sampling. Math Sci Lett. 10:71–83.
- Kumar S, Kour SP. 2022. The joint influence of estimation of sensitive variable under measurement error and non-response using ORRT models. J Stat Comput Simul. 92:3583–3604.
- Kumar S, Singh HP, Bhougal S, Gupta R. 2011. A class of ratio-cum-product type estimators under double sampling in the presence of non-response. J Math Stat. 40:589–599.
- Kumar S, Trehan M, Joorel JPS. 2018. A simulation study: estimation of population mean using two auxiliary variables in stratified random sampling. J Stat Comput Simul. 88:3694–3707.
- Kumar S. 2016. Improved estimation of population mean in presence of nonresponse and measurement error. J Stat Theory Pract. 10:707–720.
- Makhdum M, Sanaullah A, Hanif M. 2020. A modified regression-cum-ratio estimator of population mean of a sensitive variable in the presence of non-response in simple random sampling. J Stat Manag Syst. 23:495–510.
- Singh N, Vishwakarma GK, Kim JM. 2019. Computing the effect of measurement errors on efficient variant of the product and ratio estimators of mean using auxiliary information. Commun Stat Simul Comput. 51:1–22.
- Singh N, Vishwakarma GK. 2019. A generalized class of estimator of population mean with the combined effect of measurement errors and non-response in sample survey. Rev Investig Oper. 40:275–285.
- Singh SR, Sharma P. 2015. Method of estimation in the presence of non-response and measurement errors simultaneously. J Mod App Stat Meth. 14:107–121.
- Zhang Q, Khalil S, Gupta S. 2021. Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. J Stat Theory Pract. 15:1–15.