Research Article

Improved estimator for the estimation of sensitive variable using ORRT models

Article: 2318383 | Received 09 May 2023, Accepted 08 Feb 2024, Published online: 28 Feb 2024

Abstract

In this study, we are concerned with improved estimation of a sensitive variable when there is non-response and measurement error on the sensitive variable but the auxiliary variable is non-sensitive in nature. For this purpose, we propose an improved estimator in the presence of non-response and measurement error using the optional randomized response technique (ORRT) under simple random sampling without replacement (SRSWOR). The properties of the estimator are studied, and efficiency conditions are obtained in comparison with the mean estimator, the ratio estimator and Zhang's estimator. A simulation study based on hypothetical populations is carried out to demonstrate the performance of the proposed estimator at its optimum relative to the others. The proposed estimator is observed to be more efficient than the other estimators considered, in terms of having a higher Percent Relative Efficiency (PRE).

1 Introduction

Survey researchers find it difficult to obtain efficient estimates of population parameters in the presence of non-response and measurement errors. If the variable of interest is sensitive in nature, collecting data from respondents becomes even more difficult. Face-to-face interviewing is a more reliable way to collect data, but its cost is higher than that of other methods. Hansen and Hurwitz (1946) suggested taking a sub-sample of the non-respondents after the first call and collecting information from them by personal interview. If the variable of interest is sensitive, however, respondents may not give honest answers in a face-to-face interview. To reduce the bias caused by sensitive questions, one can use randomized response technique (RRT) models. Diana et al. (2014), Ahmed et al. (2017) and Makhdum et al. (2020) proposed estimators for a sensitive variable in the presence of non-response using RRT models. According to Collins et al. (2001), auxiliary variables, when combined with the variable under study, help to achieve more efficient estimators. Gupta et al. (2020) studied the estimation of the variance of a sensitive study variable using a highly correlated but non-sensitive auxiliary variable.

Another important component of non-sampling error is measurement error. Kumar et al. (2011), Khalil et al. (2021) and Singh et al. (2019) proposed different estimators for estimating population parameters in the presence of measurement error. Azeem (2014), Kumar et al. (2015), Kumar et al. (2018), Kumar (2016), Singh and Sharma (2015), Audu et al. (2020), Singh and Vishwakarma (2019) and Kumar and Chowdhary (2021) studied the problem of mean estimation in the presence of non-response and measurement error simultaneously.

Further, Khalil et al. (2018) pioneered the estimation procedure for a sensitive variable in the presence of measurement error by using optional and non-optional RRT models.

On the basis of previous studies, a researcher may wish to estimate the population mean of a sensitive variable in the presence of both measurement error and non-response. This issue has received little attention in the existing literature. The RRT models used in earlier studies (Ahmed et al. 2017; Diana et al. 2014; Makhdum et al. 2020) are non-optional RRT models, in which all respondents must give a scrambled response. A survey question, on the other hand, may be sensitive for one person but not for another. According to Gupta et al. (2002), if respondents are given the option of answering the sensitive question directly or providing a scrambled response, the model will be more efficient while causing no further loss of privacy (Gupta et al. 2018).

2 Sampling procedure

Let $U = (U_1, U_2, \ldots, U_N)$ be a finite population of $N$ units, from which a sample of size $n$ is drawn by simple random sampling without replacement (SRSWOR). Let $Y$ be a sensitive study variable that cannot be observed directly and $X$ a non-sensitive auxiliary variable correlated with $Y$, with unknown means $(\mu_x, \mu_y)$ and variances $(\sigma_x^2, \sigma_y^2)$, respectively. Let $T$ and $S$ be two scrambling variables with means $(\mu_T = 1, \mu_S = 0)$ and variances $(\sigma_T^2, \sigma_S^2)$, respectively. Let $W$ be the probability that a respondent finds the question sensitive. A respondent who considers the question sensitive is asked to report a scrambled response; otherwise the true response is reported.

Further, researchers find it difficult to collect sensitive information from respondents because of non-response. When the variable of interest is sensitive, the Hansen and Hurwitz (1946) technique for handling non-response has been modified by Zhang et al. (2021) and Kumar and Kour (2022). In this technique, respondents give direct answers in the first phase, and an ORRT model is then used to obtain answers from a sub-sample of the non-respondents in the second phase.

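Therefore, the ORRT model in the second phase is

$$Z = \begin{cases} Y & \text{with probability } 1 - W, \\ TY + S & \text{with probability } W, \end{cases} \tag{1}$$

with mean $E(Z) = E(Y)$ and variance $\mathrm{Var}(Z) = \sigma_y^2 + \sigma_S^2 W + \sigma_T^2(\sigma_y^2 + \mu_y^2)W$. The RRT model can be written as

$$Z = (TY + S)J + Y(1 - J), \tag{2}$$

where $J \sim \mathrm{Bernoulli}(W)$ with $E(J) = W$ and $\mathrm{Var}(J) = W(1 - W)$. When $W = 1$, the randomized response becomes non-optional. In general, under the randomization mechanism (denoted by the subscript $R$), the mean and variance of $Z$ for a given $Y$ are

$$E_R(Z) = (\mu_T W + 1 - W)Y + \mu_S W, \tag{3}$$

$$V_R(Z) = (Y^2\sigma_T^2 + \sigma_S^2)W. \tag{4}$$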
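The behaviour of the ORRT model can be checked numerically. The following is a minimal Python sketch (the paper's own simulation was done in R) that draws reports from model (2) and compares the simulated mean and variance with the stated $E(Z)$ and $\mathrm{Var}(Z)$. The normal scrambling distributions, the population values (mean 10, variance 16), $W = 0.6$ and the function names are illustrative assumptions, not values from the paper.

```python
import random
import statistics


def orrt_response(y, W, sd_T, sd_S, rng):
    """One ORRT report for a true value y, following Z = (T*y + S)*J + y*(1 - J)."""
    J = 1 if rng.random() < W else 0   # J ~ Bernoulli(W); 1 means the question is felt to be sensitive
    T = rng.gauss(1.0, sd_T)           # multiplicative scrambler with mean 1 (assumed normal)
    S = rng.gauss(0.0, sd_S)           # additive scrambler with mean 0 (assumed normal)
    return (T * y + S) * J + y * (1 - J)


def orrt_variance(var_y, mu_y, var_T, var_S, W):
    """Var(Z) = var_y + var_S*W + var_T*(var_y + mu_y^2)*W, as stated after model (1)."""
    return var_y + var_S * W + var_T * (var_y + mu_y ** 2) * W


rng = random.Random(2024)
# Hypothetical true values: mean 10, variance 16.
true_y = [rng.gauss(10.0, 4.0) for _ in range(100_000)]
reports = [orrt_response(y, 0.6, 0.5 ** 0.5, 0.5 ** 0.5, rng) for y in true_y]

sim_mean = statistics.fmean(reports)
sim_var = statistics.pvariance(reports)
theory_var = orrt_variance(16.0, 10.0, 0.5, 0.5, 0.6)
```

With a large number of draws, `sim_mean` should be close to the true mean and `sim_var` close to `theory_var`, which illustrates that scrambling leaves the mean unchanged but inflates the variance.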

Let $\hat{y}_i$ be a transformation of the randomized response whose expectation under the randomization mechanism is the true response $y_i$, given by

$$\hat{y}_i = \frac{Z_i - \mu_S W}{\mu_T W + 1 - W}, \tag{5}$$

with $E(\hat{y}_i) = y_i$ and $\mathrm{Var}(\hat{y}_i) = \dfrac{(y_i^2\sigma_T^2 + \sigma_S^2)W}{(\mu_T W + 1 - W)^2}$.
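As a small illustration, the transformation can be written as the inverse of the expected report in (3). The Python sketch below is an assumption-labelled illustration: the function names `expected_report` and `unscramble` are hypothetical, and the numerical inputs are arbitrary.

```python
def expected_report(y, W, mu_T=1.0, mu_S=0.0):
    """Expected report under the randomization mechanism, equation (3)."""
    return (mu_T * W + 1.0 - W) * y + mu_S * W


def unscramble(z, W, mu_T=1.0, mu_S=0.0):
    """Transformed response: subtract mu_S*W and divide by (mu_T*W + 1 - W)."""
    return (z - mu_S * W) / (mu_T * W + 1.0 - W)


# Arbitrary illustrative values: the transformation undoes the expected scrambling.
recovered = unscramble(expected_report(7.0, 0.6, mu_T=1.2, mu_S=0.5), 0.6, mu_T=1.2, mu_S=0.5)
```

With the scrambling means used in this paper ($\mu_T = 1$, $\mu_S = 0$), the denominator equals 1 and the transformed response coincides with the report itself.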

Based on the above discussion, we assume that only $n_1$ units respond on the first call and the remaining $n_2 = n - n_1$ units do not respond. A sub-sample of $n_s = n_2/f$ $(f > 1)$ units is then taken from the $n_2$ non-responding units. A modified version of the Hansen and Hurwitz estimator is given by

$$\hat{\bar{y}} = w_1\bar{y}_1 + w_2\hat{\bar{y}}_2, \tag{6}$$

where $\bar{y}_1$ is the mean of the respondents in the first phase and $\hat{\bar{y}}_2 = \frac{1}{n_s}\sum_{i=1}^{n_s}\hat{y}_i$ is the mean of the sub-sampled units in the second phase. Also, $w_1 = n_1/n$ and $w_2 = n_2/n$.

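The mean and variance of $\hat{\bar{y}}$ are $E(\hat{\bar{y}}) = \bar{Y}$ and

$$\mathrm{Var}(\hat{\bar{y}}) = \theta\sigma_y^2 + \lambda\sigma_{y(2)}^2 + G,$$

where

$$\theta = \frac{N - n}{Nn}, \qquad \lambda = \frac{W_2(f - 1)}{n}, \qquad G = \frac{W_2 f}{n}\left[\frac{[(\sigma_{y(2)}^2 + \mu_{y(2)}^2)\sigma_T^2 + \sigma_S^2]W}{(\mu_T W + 1 - W)^2}\right], \qquad W_2 = \frac{N_2}{N}.$$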

Moreover, let $(x_i, y_i, z_i)$ be the observed values and $(X_i, Y_i, Z_i)$ the true values of the variables $X$, $Y$ and $Z$, respectively. Let $u$ be the measurement error (ME) on $Y$, $v$ the measurement error on $X$ and $p$ the measurement error on $Z$. The MEs on the $i$th observed unit are $u_i = y_i - Y_i$, $v_i = x_i - X_i$ and $p_i = z_i - Z_i$; they are assumed to be uncorrelated, with mean zero and variances $\sigma_u^2$, $\sigma_v^2$ and $\sigma_p^2$, respectively.

In the presence of non-response and ME, the variance of $\hat{\bar{y}}$ is given by

$$\mathrm{Var}(\hat{\bar{y}}) = \theta(\sigma_y^2 + \sigma_u^2) + \lambda(\sigma_{y(2)}^2 + \sigma_p^2) + G.$$
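To make the roles of $\theta$, $\lambda$ and $G$ concrete, the variance expression can be evaluated directly. The following Python sketch is only an illustration: the function name `hh_variance`, the non-response proportion $W_2 = 0.4$ and all moment values are hypothetical inputs, not results from the paper.

```python
def hh_variance(N, n, f, W2, var_y, var_y_nr, mu_y_nr, var_T, var_S, W,
                mu_T=1.0, var_u=0.0, var_p=0.0):
    """Variance of the modified Hansen-Hurwitz estimator with measurement error.

    var_y_nr and mu_y_nr are the variance and mean of Y in the non-respondent group;
    var_u and var_p are the measurement-error variances on Y and Z.
    """
    theta = (N - n) / (N * n)            # finite-population sampling term
    lam = W2 * (f - 1) / n               # extra variability from sub-sampling non-respondents
    G = (W2 * f / n) * ((var_y_nr + mu_y_nr ** 2) * var_T + var_S) * W / (mu_T * W + 1 - W) ** 2
    return theta * (var_y + var_u) + lam * (var_y_nr + var_p) + G


# Hypothetical inputs for illustration only.
base = hh_variance(5000, 850, 2, 0.4, 16.0, 16.0, 10.0, 0.5, 0.5, 0.6)
with_me = hh_variance(5000, 850, 2, 0.4, 16.0, 16.0, 10.0, 0.5, 0.5, 0.6, var_u=1.0, var_p=1.0)
larger_f = hh_variance(5000, 850, 3, 0.4, 16.0, 16.0, 10.0, 0.5, 0.5, 0.6)
```

In this sketch, adding measurement error or increasing the inverse sub-sampling rate $f$ both increase the variance, as the formula implies.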

3 Existing mean estimator

Using the notation of Section 2, suppose that the population mean and variance of the auxiliary variable $X$ are known and denoted by $\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2$, respectively.

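Let the population mean and variance of the respondent group of size $N_1$ be given by $\mu_{x(1)} = \frac{1}{N_1}\sum_{i=1}^{N_1} x_i$ and $\sigma_{x(1)}^2 = \frac{1}{N_1 - 1}\sum_{i=1}^{N_1}(x_i - \mu_{x(1)})^2$, respectively, and the population mean and variance of the non-respondent group of size $N_2$ by $\mu_{x(2)} = \frac{1}{N_2}\sum_{i=1}^{N_2} x_i$ and $\sigma_{x(2)}^2 = \frac{1}{N_2 - 1}\sum_{i=1}^{N_2}(x_i - \mu_{x(2)})^2$, respectively. Further, let $\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x\sigma_y}$ be the correlation coefficient between the auxiliary variable $X$ and the sensitive variable $Y$.

Similarly, let $\rho_{xy(1)} = \dfrac{\sigma_{xy(1)}}{\sigma_{x(1)}\sigma_{y(1)}}$ and $\rho_{xy(2)} = \dfrac{\sigma_{xy(2)}}{\sigma_{x(2)}\sigma_{y(2)}}$ be the correlation coefficients between the auxiliary variable $X$ and the sensitive study variable $Y$ for the respondent group and the non-respondent group, respectively.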

Assume that the population mean $\mu_x$ of $X$ is known and that non-response occurs on both $Y$ and $X$. Some existing mean estimators under the ORRT model are listed below:

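  1. A typical mean estimator for a sensitive variable in a finite population, under the modified Hansen and Hurwitz (HH) procedure in the presence of measurement error, is

    $$\hat{\mu}_{HH} = \hat{\bar{y}} = w_1\bar{y}_1 + w_2\bar{y}_2, \tag{7}$$

    where $\bar{y}_2 = \frac{1}{n_s}\sum_{i=1}^{n_s} z_i$.

    In the presence of measurement error, the MSE of $\hat{\mu}_{HH}$ is given by

    $$\mathrm{MSE}(\hat{\mu}_{HH}) = \theta(\sigma_y^2 + \sigma_u^2) + \lambda(\sigma_{y(2)}^2 + \sigma_p^2) + G. \tag{8}$$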

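  2. A ratio estimator corresponding to the Gupta et al. (2014) estimator, under the modified HH procedure in the presence of measurement error, is given by

    $$\hat{\mu}_R = \frac{\hat{\bar{y}}}{\bar{x}}\,\mu_x = \hat{R}_W\,\mu_x, \tag{9}$$

    where $\hat{\bar{y}}$ and $\bar{x}$ are the ordinary mean estimators under the original HH procedure.

    The MSE of $\hat{\mu}_R$ is given by

    $$\mathrm{MSE}(\hat{\mu}_R) = \theta(\sigma_y^2 + R^2\sigma_x^2 - 2R\rho_{yx}\sigma_y\sigma_x) + \lambda(\sigma_{y(2)}^2 + R^2\sigma_{x(2)}^2 - 2R\rho_{zx(2)}\sigma_z\sigma_{x(2)}) + \theta(\sigma_u^2 + R^2\sigma_v^2) + \lambda(\sigma_p^2 + R^2\sigma_v^2) + G, \tag{10}$$

    where $R = \dfrac{\mu_y}{\mu_x}$ and $\rho_{zx(2)} = \dfrac{\rho_{yx(2)}}{\sqrt{1 + \dfrac{[\sigma_S^2 + \sigma_T^2(\sigma_{y(2)}^2 + \mu_{y(2)}^2)]W}{\sigma_{y(2)}^2}}}$.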

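  3. The generalized mean estimator considered in Zhang et al. (2021), but with non-response and measurement error, is given as

    $$\hat{\mu}_{pw} = \left[\hat{\bar{y}} + k(\mu_x - \bar{x})\right]\left(\frac{\bar{D}}{\bar{d}}\right)^{\nu}, \tag{11}$$

    where $\bar{d} = [\phi(\alpha\bar{x} + \beta) + (1 - \phi)(\alpha\mu_x + \beta)]$, $\bar{D} = \alpha\mu_x + \beta$, $\bar{x} = \mu_x(1 + e_1)$ and $\hat{\bar{y}} = \mu_y(1 + e_0)$. Also, $k$ and $\nu$ are suitably chosen constants, $\phi$ is an unknown constant whose value is to be determined from optimality considerations, and $\alpha$ and $\beta$ are known parameters of the auxiliary variable $X$.

    The MSE of $\hat{\mu}_{pw}$ is given by

    $$\mathrm{MSE}(\hat{\mu}_{pw}) = \theta(\sigma_y^2 + P^2\sigma_x^2 - 2P\rho_{yx}\sigma_y\sigma_x) + \lambda(\sigma_{y(2)}^2 + P^2\sigma_{x(2)}^2 - 2P\rho_{zx(2)}\sigma_z\sigma_{x(2)}) + \theta(\sigma_u^2 + P^2\sigma_v^2) + \lambda(\sigma_p^2 + P^2\sigma_v^2) + G, \tag{12}$$

    where $P = \dfrac{\theta\rho_{yx}\sigma_y\sigma_x + \lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}}{\theta(\sigma_x^2 + \sigma_v^2) + \lambda(\sigma_{x(2)}^2 + \sigma_v^2)}$.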

Therefore, the MSEs of $\hat{\mu}_{HH}$, $\hat{\mu}_R$ and $\hat{\mu}_{pw}$ without measurement error may be obtained by putting $\sigma_v^2 = \sigma_u^2 = \sigma_p^2 = 0$.
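The three MSE expressions (8), (10) and (12) share one quadratic form in a slope coefficient $c$: $c = 0$ gives (8), $c = R$ gives (10) and $c = P$ gives (12). The Python sketch below exploits this to evaluate all three. The class and function names and every numerical input are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    theta: float        # (N - n) / (N n)
    lam: float          # W2 (f - 1) / n
    G: float            # scrambling term defined in Section 2
    var_y: float
    var_x: float
    var_y_nr: float     # variance of Y in the non-respondent group
    var_x_nr: float     # variance of X in the non-respondent group
    sd_z: float         # standard deviation of the scrambled response Z
    rho_yx: float
    rho_zx_nr: float    # correlation of Z and X in the non-respondent group
    var_u: float = 0.0  # measurement-error variances (0 means no ME)
    var_v: float = 0.0
    var_p: float = 0.0


def _covariances(m):
    cov_yx = m.rho_yx * math.sqrt(m.var_y * m.var_x)
    cov_zx = m.rho_zx_nr * m.sd_z * math.sqrt(m.var_x_nr)
    return cov_yx, cov_zx


def mse_with_slope(c, m):
    """Common form of (8), (10), (12): c = 0 (HH), c = R (ratio), c = P (pw)."""
    cov_yx, cov_zx = _covariances(m)
    return (m.theta * (m.var_y + c ** 2 * m.var_x - 2 * c * cov_yx)
            + m.lam * (m.var_y_nr + c ** 2 * m.var_x_nr - 2 * c * cov_zx)
            + m.theta * (m.var_u + c ** 2 * m.var_v)
            + m.lam * (m.var_p + c ** 2 * m.var_v)
            + m.G)


def optimal_P(m):
    """Slope P that minimizes the quadratic form, as given after (12)."""
    cov_yx, cov_zx = _covariances(m)
    return ((m.theta * cov_yx + m.lam * cov_zx)
            / (m.theta * (m.var_x + m.var_v) + m.lam * (m.var_x_nr + m.var_v)))


# Hypothetical moments for illustration only.
m = Moments(theta=(5000 - 850) / (5000 * 850), lam=0.4 * (2 - 1) / 850, G=0.01,
            var_y=16.0, var_x=8.0, var_y_nr=16.0, var_x_nr=8.0, sd_z=6.0,
            rho_yx=0.8, rho_zx_nr=0.5, var_u=1.0, var_v=1.0, var_p=1.0)
R = 10.0 / 6.0
mse_hh = mse_with_slope(0.0, m)
mse_ratio = mse_with_slope(R, m)
mse_pw = mse_with_slope(optimal_P(m), m)
```

Because $P$ minimizes the quadratic, `mse_pw` cannot exceed `mse_hh` or `mse_ratio` for any admissible inputs; how large the gain is depends on the moments supplied.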

4 Proposed mean estimator

Taking motivation from Zhang et al. (2021), we propose a generalized mean estimator using ORRT models in the presence of non-response and measurement error simultaneously as

$$\hat{\tau}_{cs} = \left\{\hat{\bar{y}} + k_1(\mu_x - \bar{x}') + k_2(\mu_x - \bar{x})\right\}\left(\frac{\mu_x}{\bar{x}}\right)^{\alpha_1}\left(\frac{\mu_x}{\bar{x}'}\right)^{\alpha_2}, \tag{13}$$

where $\hat{\bar{y}}$ denotes the mean of the sensitive study variable in the presence of non-response and measurement error, $\bar{x}$ is the mean of the auxiliary variable in the presence of non-response and measurement error, $\bar{x}'$ is the mean of the auxiliary variable, $\alpha_1$ and $\alpha_2$ are suitably chosen constants, and $k_1$ and $k_2$ are unknown constants whose values are to be optimized.

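To find the MSE of the estimator, we define $\hat{\bar{y}} = \mu_y(1 + e_0)$, $\bar{x} = \mu_x(1 + e_1)$ and $\bar{x}' = \mu_x(1 + e_1')$ such that $E(e_0) = E(e_1) = E(e_1') = 0$ and

$$E(e_0^2) = \frac{1}{\mu_y^2}\left\{\theta(\sigma_y^2 + \sigma_u^2) + \lambda(\sigma_{y(2)}^2 + \sigma_p^2) + \frac{W_2 f}{n}\left[\frac{[(\sigma_{y(2)}^2 + \mu_{y(2)}^2)\sigma_T^2 + \sigma_S^2]W}{(\mu_T W + 1 - W)^2}\right]\right\},$$

$$E(e_1^2) = \frac{1}{\mu_x^2}\left[\theta(\sigma_x^2 + \sigma_v^2) + \lambda(\sigma_{x(2)}^2 + \sigma_v^2)\right], \qquad E(e_1'^2) = \frac{1}{\mu_x^2}\,\theta\sigma_x^2,$$

$$E(e_0e_1) = \frac{\theta\rho_{yx}\sigma_y\sigma_x}{\mu_y\mu_x} + \frac{\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}}{\mu_z\mu_x}, \qquad E(e_0e_1') = \frac{\theta\rho_{yx}\sigma_y\sigma_x}{\mu_y\mu_x}, \qquad E(e_1e_1') = \frac{\theta\sigma_x^2}{\mu_x^2},$$

where $\rho_{zx(2)} = \dfrac{\rho_{yx(2)}}{\sqrt{1 + \dfrac{[\sigma_S^2 + \sigma_T^2(\sigma_{y(2)}^2 + \mu_{y(2)}^2)]W}{\sigma_{y(2)}^2}}}$.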

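The bias of the proposed estimator up to the second order of approximation is given by

$$\mathrm{Bias}(\hat{\tau}_{cs}) = \theta\left\{\frac{(A_1 + A_2 + A_3)\sigma_x^2}{\mu_x^2} + \frac{A_2\sigma_v^2}{\mu_x^2} - \frac{\rho_{yx}\sigma_y\sigma_x}{\mu_x}(\alpha_1 + \alpha_2)\right\} + \lambda\left\{\frac{A_2(\sigma_{x(2)}^2 + \sigma_v^2)}{\mu_x^2} - \frac{\alpha_1 R^{*}}{\mu_x}\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right\}, \tag{14}$$

where

$$A_1 = \left(\frac{R\alpha_2(\alpha_2 + 1)}{2} + k_1\alpha_2\right)\mu_x, \qquad A_2 = \left(\frac{R\alpha_1(\alpha_1 + 1)}{2} + k_2\alpha_1\right)\mu_x, \qquad A_3 = (k_1\alpha_1 + k_2\alpha_2 + R\alpha_1\alpha_2)\mu_x,$$

$R^{*} = \dfrac{\mu_y}{\mu_z}$ and $R = \dfrac{\mu_y}{\mu_x}$.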

Without measurement error, the bias of $\hat{\tau}_{cs}$ can be obtained by taking $\sigma_v^2 = 0$ in the above equation.

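The MSE of the proposed estimator is

$$\mathrm{MSE}(\hat{\tau}_{cs}) = \theta\left\{[(k_1 + \alpha_2R) + (k_2 + \alpha_1R)]^2\sigma_x^2 + (k_2 + \alpha_1R)^2\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x[(k_1 + k_2) + R(\alpha_1 + \alpha_2)] + (\sigma_y^2 + \sigma_u^2)\right\} + \lambda\left\{(k_2 + \alpha_1R)^2(\sigma_{x(2)}^2 + \sigma_v^2) - 2R^{*}(k_2 + \alpha_1R)\rho_{zx(2)}\sigma_z\sigma_{x(2)} + (\sigma_{y(2)}^2 + \sigma_p^2)\right\} + G, \tag{15}$$

where $R = \mu_y/\mu_x$ and $R^{*} = \mu_y/\mu_z$.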

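Differentiating (15) with respect to $k_1$ and $k_2$, we get the optimum values of $k_1$ and $k_2$ as

$$k_{1(\mathrm{opt})} = D - \alpha_2R \quad\text{and}\quad k_{2(\mathrm{opt})} = J - \alpha_1R, \tag{16}$$

where

$$D = \frac{1}{\theta\sigma_v^2 + \lambda(\sigma_{x(2)}^2 + \sigma_v^2)}\left\{\frac{\rho_{yx}\sigma_y}{\sigma_x}\left[\theta(\sigma_x^2 + \sigma_v^2) + \lambda(\sigma_{x(2)}^2 + \sigma_v^2)\right] - \left[\theta\rho_{yx}\sigma_y\sigma_x + R^{*}\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right]\right\}$$

and

$$J = \frac{R^{*}\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}}{\theta\sigma_v^2 + \lambda(\sigma_{x(2)}^2 + \sigma_v^2)},$$

with $R^{*} = \mu_y/\mu_z$.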
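The step from (15) to (16) can be sketched as follows. The shorthand $a$, $b$, $C$, $E$ and $S$ is introduced here only for this illustration and is not the authors' notation; $R = \mu_y/\mu_x$ and $R^{*} = \mu_y/\mu_z$.

```latex
% Shorthand (illustrative): a = k_1 + \alpha_2 R,  b = k_2 + \alpha_1 R,
% C = \rho_{yx}\sigma_y\sigma_x,  E = \rho_{zx(2)}\sigma_z\sigma_{x(2)},
% S = \sigma_{x(2)}^2 + \sigma_v^2.
\begin{align*}
\frac{\partial\,\mathrm{MSE}}{\partial a}
  &= 2\theta\left[(a+b)\sigma_x^2 - C\right] = 0, \\
\frac{\partial\,\mathrm{MSE}}{\partial b}
  &= 2\theta\left[(a+b)\sigma_x^2 + b\,\sigma_v^2 - C\right]
     + 2\lambda\left[b\,S - R^{*}E\right] = 0. \\
\intertext{Subtracting the first equation from the second gives}
b\left(\theta\sigma_v^2 + \lambda S\right) &= \lambda R^{*}E
  \quad\Longrightarrow\quad
  b = \frac{R^{*}\lambda E}{\theta\sigma_v^2 + \lambda S} = J, \\
a &= \frac{C}{\sigma_x^2} - J = D .
\end{align*}
```

Since $a = k_1 + \alpha_2R$ and $b = k_2 + \alpha_1R$, this yields $k_{1(\mathrm{opt})} = D - \alpha_2R$ and $k_{2(\mathrm{opt})} = J - \alpha_1R$; expanding $C/\sigma_x^2 - J$ over the common denominator $\theta\sigma_v^2 + \lambda S$ reproduces the expression for $D$.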

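Substituting the values of $k_1$ and $k_2$ from (16) into (15), and noting that $R^{*} = \mu_y/\mu_z = 1$ because $E(Z) = E(Y)$ under model (1), we get the minimum MSE as

$$\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) = \theta\left\{(D + J)^2\sigma_x^2 + J^2\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J) + (\sigma_y^2 + \sigma_u^2)\right\} + \lambda\left\{J^2(\sigma_{x(2)}^2 + \sigma_v^2) - 2J\rho_{zx(2)}\sigma_z\sigma_{x(2)} + (\sigma_{y(2)}^2 + \sigma_p^2)\right\} + G. \tag{17}$$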

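The expression for the minimized MSE of the proposed estimator without ME may be obtained by putting $\sigma_v^2 = \sigma_u^2 = \sigma_p^2 = 0$ in the above expression, which gives

$$\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) = \theta\left\{(D + J)^2\sigma_x^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J) + \sigma_y^2\right\} + \lambda\left\{J^2\sigma_{x(2)}^2 - 2J\rho_{zx(2)}\sigma_z\sigma_{x(2)} + \sigma_{y(2)}^2\right\} + G, \tag{18}$$

where

$$D = \frac{1}{\lambda\sigma_{x(2)}^2}\left\{\frac{\rho_{yx}\sigma_y}{\sigma_x}\left[\theta\sigma_x^2 + \lambda\sigma_{x(2)}^2\right] - \left[\theta\rho_{yx}\sigma_y\sigma_x + R^{*}\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right]\right\} \quad\text{and}\quad J = \frac{R^{*}\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}}{\lambda\sigma_{x(2)}^2},$$

with $R^{*} = \mu_y/\mu_z$.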
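The optimum constants and the minimum MSE can be evaluated numerically. The Python sketch below computes $D$, $J$ and the minimum MSE from user-supplied moments. The function name, the argument names and all numerical inputs are hypothetical; the ratio $\mu_y/\mu_z$ is exposed as `R_star` with default 1, which is what $E(Z) = E(Y)$ implies.

```python
import math


def proposed_min_mse(theta, lam, G, var_y, var_x, var_y_nr, var_x_nr, sd_z,
                     rho_yx, rho_zx_nr, var_u=0.0, var_v=0.0, var_p=0.0, R_star=1.0):
    """Return (D, J, minimum MSE) of the proposed estimator.

    Requires theta*var_v + lam*(var_x_nr + var_v) > 0, i.e. some non-response
    sub-sampling (lam > 0) or measurement error on X (var_v > 0).
    """
    cov_yx = rho_yx * math.sqrt(var_y * var_x)
    cov_zx = rho_zx_nr * sd_z * math.sqrt(var_x_nr)
    denom = theta * var_v + lam * (var_x_nr + var_v)
    J = R_star * lam * cov_zx / denom
    D = ((cov_yx / var_x) * (theta * (var_x + var_v) + lam * (var_x_nr + var_v))
         - (theta * cov_yx + R_star * lam * cov_zx)) / denom
    mse = (theta * ((D + J) ** 2 * var_x + J ** 2 * var_v
                    - 2 * cov_yx * (D + J) + (var_y + var_u))
           + lam * (J ** 2 * (var_x_nr + var_v)
                    - 2 * R_star * J * cov_zx + (var_y_nr + var_p))
           + G)
    return D, J, mse


# Hypothetical inputs for illustration only.
theta = (5000 - 850) / (5000 * 850)
lam = 0.4 * (2 - 1) / 850
D, J, mse_min = proposed_min_mse(theta, lam, 0.01, 16.0, 8.0, 16.0, 8.0, 6.0,
                                 0.8, 0.5, var_u=1.0, var_v=1.0, var_p=1.0)
# MSE of the HH mean estimator (8) for the same inputs.
mse_hh = theta * (16.0 + 1.0) + lam * (16.0 + 1.0) + 0.01
```

A useful check on any implementation is that $D + J = \rho_{yx}\sigma_y/\sigma_x$, which follows from the first-order conditions, and that the minimum MSE does not exceed the MSE in (8) when `R_star` is 1.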

5 Efficiencies comparison

In this section, we compare the minimum MSE of the proposed estimator in (17) with the MSEs of the existing estimators given in (8), (10) and (12). The efficiency conditions are as follows:

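  1. $\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) < \mathrm{MSE}(\hat{\mu}_{HH})$ if

    $$\theta\left\{(D + J)^2\sigma_x^2 + J^2\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J)\right\} + \lambda\left\{J^2(\sigma_{x(2)}^2 + \sigma_v^2) - 2J\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right\} < 0. \tag{19}$$

  2. $\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) < \mathrm{MSE}(\hat{\mu}_R)$ if

    $$\theta\left\{[(D + J)^2 - R^2]\sigma_x^2 + (J^2 - R^2)\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J - R)\right\} + \lambda\left\{(J^2 - R^2)(\sigma_{x(2)}^2 + \sigma_v^2) - 2\rho_{zx(2)}\sigma_z\sigma_{x(2)}(J - R)\right\} < 0. \tag{20}$$

  3. $\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) < \mathrm{MSE}(\hat{\mu}_{pw})$ if

    $$\theta\left\{[(D + J)^2 - P^2]\sigma_x^2 + (J^2 - P^2)\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J - P)\right\} + \lambda\left\{(J^2 - P^2)(\sigma_{x(2)}^2 + \sigma_v^2) - 2\rho_{zx(2)}\sigma_z\sigma_{x(2)}(J - P)\right\} < 0. \tag{21}$$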

If conditions (19)–(21) hold, the proposed estimator is more efficient than the other estimators considered.

6 Simulation study

In this section, we use a simulation study to compare the performance of the proposed estimator under SRSWOR with that of the usual unbiased estimator and the two other estimators considered.

For the simulation study, a data set consisting of a sensitive study variable $Y$ and an auxiliary variable $X$ is generated from a normal distribution using the model

$Y = aX + \mathrm{rnorm}(N, \mu_y, \sigma_y^2)$, where $X = \mathrm{rnorm}(N, \mu_x, \sigma_x^2)$, $(\mu_y, \mu_x) = (0, 0)$, $a = 0.25$ and $(\sigma_y^2, \sigma_x^2)$ may vary.

An artificial population of size $N = 5000$ is generated from the normal distribution, and a sample of size $n = 850$ is drawn under SRSWOR. It is assumed that only $n_1 = 450$ units respond and $n_2 = 400$ units do not respond in the first phase. In the second phase, we take a sub-sample of size $n_s = n_2/f$ $(f > 1)$ from the $n_2$ non-respondent units, using $f = 2, 3, 4, 5$, respectively. The results of the simulation study are given in Tables 1 and 2.

Table 1 PRE of the proposed estimators with respect to existing estimators for different values of f and W using ORRT models.

Table 2 PRE of the proposed estimators with respect to existing estimators for different values of f and W using ORRT models.

Also, the scrambling variables $T$ and $S$ are taken to be normal with means 1 and 0, respectively, and with different variances.
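The sampling design described above can be sketched in Python (the paper's own code was written in R). This is a simplified illustration under stated assumptions: the function name is hypothetical, the standard deviations `sd_x` and `sd_e` are arbitrary defaults, and the first $n_1$ sampled units are simply treated as respondents, so response status is unrelated to the values of $Y$ and $X$.

```python
import random


def simulate_two_phase_sample(N=5000, n=850, n1=450, f=2, a=0.25,
                              sd_x=1.0, sd_e=1.0, seed=1):
    """Generate a population from Y = a*X + e and draw a two-phase HH sample."""
    rng = random.Random(seed)
    X = [rng.gauss(0.0, sd_x) for _ in range(N)]
    Y = [a * x + rng.gauss(0.0, sd_e) for x in X]

    sample = rng.sample(range(N), n)          # SRSWOR of unit indices
    respondents = sample[:n1]                 # assumption: first n1 drawn units respond
    nonrespondents = sample[n1:]              # remaining n2 = n - n1 units
    ns = len(nonrespondents) // f             # second-phase sub-sample size n2 / f
    subsample = rng.sample(nonrespondents, ns)
    return X, Y, respondents, nonrespondents, subsample


X, Y, respondents, nonrespondents, subsample = simulate_two_phase_sample()
```

Repeating this draw many times, applying the ORRT scrambling to the sub-sampled units and averaging the squared errors of each estimator would give empirical MSEs for comparison.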

Further, another artificial population, the one considered by Zhang et al. (2021), is used for comparison and to assess the performance of the proposed estimator relative to the other estimators considered. We consider a population of size 5000 generated from a bivariate normal distribution for $(Y, X)$ with mean vector and covariance matrix

$$\mu = \begin{bmatrix} 10 \\ 6 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 16 & 9.051 \\ 9.051 & 8 \end{bmatrix}, \qquad \rho_{yx} = 0.8,$$

with population values $\mu_x = 6.0228$, $\sigma_x^2 = 8.1830$, $\mu_y = 9.9864$, $\sigma_y^2 = 16.1215$ and $\rho_{yx} = 0.8024$.
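A population of this kind can be generated with a Cholesky factor of the covariance matrix. The following Python sketch uses only the standard library; the function name and the seed are hypothetical, and a different seed or generator will give realized moments that differ slightly from those reported above.

```python
import math
import random


def bivariate_normal(n, mu, cov, rng):
    """Draw n pairs (y, x) from a bivariate normal using a 2x2 Cholesky factor."""
    (s11, s12), (_, s22) = cov
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return pairs


rng = random.Random(7)
cov_yx = 0.8 * math.sqrt(16.0 * 8.0)   # covariance implied by rho = 0.8
population = bivariate_normal(5000, (10.0, 6.0), ((16.0, cov_yx), (cov_yx, 8.0)), rng)

ys = [p[0] for p in population]
xs = [p[1] for p in population]
mean_y = sum(ys) / len(ys)
mean_x = sum(xs) / len(xs)
# Realized correlation of the generated population.
sxy = sum((y - mean_y) * (x - mean_x) for y, x in population)
syy = sum((y - mean_y) ** 2 for y in ys)
sxx = sum((x - mean_x) ** 2 for x in xs)
rho_hat = sxy / math.sqrt(syy * sxx)
```

The off-diagonal entry of $\Sigma$ is $0.8\sqrt{16 \times 8} \approx 9.051$, and the realized moments of a finite population of 5000 units will differ slightly from the nominal ones.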

A sample of size $n = 500$ is drawn using SRSWOR; in the first phase, $n_1 = 200$ units respond and $n_2 = 300$ units do not. In the second phase, we take a sub-sample of size $n_s = n_2/f$ $(f > 1)$ from the non-respondents, using $f = 2, 3, 4, 5$. The results of the simulation study based on the Zhang et al. (2021) population are given in Table 3.

Table 3 PRE of the proposed estimators with respect to existing estimators for different values of f and W = 0.8 using ORRT models. Also $\sigma_v^2 = \sigma_u^2 = \sigma_p^2 = 1, 5, 10$.

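Coding for the simulation was done in R. The Percent Relative Efficiency (PRE) of the proposed estimator ($\hat{\tau}_{cs}$) with respect to the usual unbiased estimator ($\hat{\mu}_{HH}$) and the two other estimators considered ($\hat{\mu}_R$, $\hat{\mu}_{pw}$) is defined as

$$\mathrm{PRE} = \frac{\mathrm{MSE}(\hat{\mu}_{HH})}{\mathrm{MSE}(\hat{\mu}_i)} \times 100,$$

where $\hat{\mu}_i = \hat{\mu}_{HH}, \hat{\mu}_R, \hat{\mu}_{pw}$ and $\hat{\tau}_{cs}$.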
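The PRE definition translates directly into code. In this minimal Python sketch, `empirical_mse` is a hypothetical helper that would turn repeated simulation estimates into an MSE; both function names are illustrative.

```python
def empirical_mse(estimates, true_mean):
    """Average squared error of repeated estimates around the true mean."""
    return sum((e - true_mean) ** 2 for e in estimates) / len(estimates)


def pre(mse_hh, mse_i):
    """Percent Relative Efficiency of an estimator relative to the HH mean estimator."""
    return 100.0 * mse_hh / mse_i
```

By construction the PRE of $\hat{\mu}_{HH}$ with respect to itself is 100, so a value above 100 indicates an estimator with smaller MSE than $\hat{\mu}_{HH}$.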

Also $\sigma_T^2 = \sigma_S^2 = 0.5$ (Table 1).

Also $\sigma_T^2 = \sigma_S^2 = 1$ (Table 2).

From Tables 1 and 2, we compare the performance of the proposed estimator with that of the usual unbiased estimator and the two other estimators considered. In Table 1, when $\sigma_T^2 = \sigma_S^2 = 0.5$, the PRE of the proposed estimator decreases as $f$ and $W$ increase, where $f = 2$ to $5$ and $W = 0.2$ to $0.8$. In Table 2, when $\sigma_T^2 = \sigma_S^2 = 1$, the proposed estimator ($\hat{\tau}_{cs}$) performs better than the other estimators considered ($\hat{\mu}_{HH}$, $\hat{\mu}_R$, $\hat{\mu}_{pw}$) for different values of $f$ and $W$, except at $W = 0.6$, where the PRE of the proposed estimator is less than that of the other estimators.

It is noted from Table 3 that the proposed estimator ($\hat{\tau}_{cs}$) is more efficient than the usual unbiased estimator ($\hat{\mu}_{HH}$), the ratio estimator of Gupta et al. (2014) under the Hansen and Hurwitz setup ($\hat{\mu}_R$) and the generalized estimator of Zhang et al. (2021) ($\hat{\mu}_{pw}$), in terms of having a higher PRE. Also, the ratio estimator does not perform well at $\sigma_v^2 = \sigma_u^2 = \sigma_p^2 = 5, 10$.

For $W = 0.8$, the PRE values of the estimators decrease as $f$ increases from 2 to 5.

7 Conclusion

In this paper, we studied improved mean estimation of a sensitive variable through a suggested generalized mean estimator using the ORRT model. The properties of the proposed estimator were studied, and conditions were obtained under which the proposed estimator is more efficient than the existing estimators. The simulation study supports the theoretical results, except when the probability of the question being sensitive is moderately high (i.e., $W = 0.6$); in this situation the Zhang et al. (2021) estimator ($\hat{\mu}_{pw}$) is more efficient. Based on the results obtained, we recommend the suggested estimator to researchers and practitioners for future use.

Acknowledgments

The authors express very sincere gratitude to the reviewers for their constructive suggestions which helped improve the presentation of the paper.

Data availability statement

No real data is used in the paper.

Disclosure statement

The authors declare that they have no conflicts of interest.

References

  • Ahmed S, Shabbir J, Gupta S. 2017. Use of scrambled response model in estimating the finite population mean in presence of non-response when coefficient of variation is known. Commun Stat Theory Methods. 46:8435–8449.
  • Audu A, Singh R, Khare S, Dauran NS. 2020. Almost unbiased estimators for population mean in the presence of non-response and measurement error. J Stat Manag Syst. 24:573–589.
  • Azeem M. 2014. On estimation of population mean in the presence of measurement error and non-response [Unpublished Ph.D. thesis]. Lahore: National College of Business Administration and Economics.
  • Collins LM, Schafer JL, Kam CM. 2001. A comparison of inclusive and restrictive strategies in modern missing data procedure. Psychol Methods. 6:330–351.
  • Diana G, Riaz S, Shabbir J. 2014. Hansen and Hurwitz estimator with scrambled response on the second call. J Appl Stat. 41:596–611.
  • Gupta S, Aloraini B, Qureshi MN, Khalil S. 2020. Variance estimation using randomized response technique. Revstat Stat J. 18:165–176.
  • Gupta S, Gupta B, Singh S. 2002. Estimation of sensitivity level of personal interview survey questions. J Stat Plan Inference. 100:239–247.
  • Gupta S, Kalucha G, Shabbir J, Dass BK. 2014. Estimation of finite population mean using optional RRT models in the presence of non-sensitive auxiliary information. Am J Math Manag Sci. 33:147–159.
  • Gupta S, Mehta S, Shabbir J, Khalil S. 2018. A unified measure of respondent privacy and model efficiency in quantitative RRT models. J Stat Theory Pract. 12:506–511.
  • Hansen MH, Hurwitz WN. 1946. The problem of non-response in sample surveys. J Am Stat Assoc. 41:517–529.
  • Khalil S, Noor-Ul-Amin M, Hanif M. 2018. Estimation of population mean for a sensitive variable in the presence of measurement error. J Stat Manag Syst. 21:81–91.
  • Khalil S, Zhang Q, Gupta S. 2021. Mean estimation of sensitive variables under measurement errors using optional RRT models. Commun Stat Simul Comput. 50:1417–1426.
  • Kumar S, Bhogal S, Nataraja NS, Viswanathaiah M. 2015. Estimation of population mean in the presence of non-response and measurement error. Rev Colomb Estad. 38:145–161.
  • Kumar S, Chowdhary M. 2021. Estimation of population product in the presence of non-response and measurement error in successive sampling. Math Sci Lett. 10:71–83.
  • Kumar S, Kour SP. 2022. The joint influence of estimation of sensitive variable under measurement error and non-response using ORRT models. J Stat Comput Simul. 92:3583–3604.
  • Kumar S, Singh HP, Bhougal S, Gupta R. 2011. A class of ratio-cum-product type estimators under double sampling in the presence of non-response. J Math Stat. 40:589–599.
  • Kumar S, Trehan M, Joorel JPS. 2018. A simulation study: estimation of population mean using two auxiliary variables in stratified random sampling. J Stat Comput Simul. 88:3694–3707.
  • Kumar S. 2016. Improved estimation of population mean in presence of nonresponse and measurement error. J Stat Theory Pract. 10:707–720.
  • Makhdum M, Sanaullah A, Hanif M. 2020. A modified regression-cum-ratio estimator of population mean of a sensitive variable in the presence of non-response in simple random sampling. J Stat Manag Syst. 23:495–510.
  • Singh N, Vishwakarma GK, Kim JM. 2019. Computing the effect of measurement errors on efficient variant of the product and ratio estimators of mean using auxiliary information. Commun Stat Simul Comput. 51:1–22.
  • Singh N, Vishwakarma GK. 2019. A generalized class of estimator of population mean with the combined effect of measurement errors and non-response in sample survey. Rev Investig Oper. 40:275–285.
  • Singh SR, Sharma P. 2015. Method of estimation in the presence of non-response and measurement errors simultaneously. J Mod App Stat Meth. 14:107–121.
  • Zhang Q, Khalil S, Gupta S. 2021. Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. J Stat Theory Pract. 15:1–15.