Improved estimator for the estimation of sensitive variable using ORRT models


Abstract

This study concerns the improved estimation of a sensitive variable when non-response and measurement error affect the sensitive variable and the auxiliary variable is non-sensitive. For this purpose, we propose an improved estimator in the presence of non-response and measurement error using the optional randomized response technique (ORRT) under simple random sampling without replacement (SRSWOR). The properties of the estimator are studied, and efficiency conditions are obtained in comparison with the mean estimator, the ratio estimator and Zhang’s estimator. A simulation study based on hypothetical populations is carried out to demonstrate the performance of the proposed estimator at its optimum relative to the others. The proposed estimator is observed to be more efficient than the other estimators considered, in the sense of having a higher Percent Relative Efficiency (PRE).


1 Introduction

Survey researchers find it difficult to obtain efficient parameter estimates in the presence of non-response and measurement errors. If the variable of interest is sensitive in nature, collecting data from respondents becomes even more difficult. The face-to-face interview is a more reliable method of data collection, but its cost is higher than that of other methods. Hansen and Hurwitz (1946) suggested taking a sub-sample of the non-respondents after the first call and collecting information from them by personal interview. If the variable of interest is sensitive, however, the respondent may not give honest answers in a face-to-face interview. To reduce the bias caused by sensitive questions, one can use randomized response technique (RRT) models. Diana et al. (2014), Ahmed et al. (2017) and Makhdum et al. (2020) proposed estimators for a sensitive variable in the presence of non-response using RRT models. According to Collins et al. (2001), using auxiliary variables together with the study variable helps to achieve more efficient estimators. Gupta et al. (2020) studied the estimation of the variance of a sensitive study variable using a highly correlated but non-sensitive auxiliary variable.

Another important source of non-sampling error is measurement error. Kumar et al. (2011), Khalil et al. (2021) and Singh et al. (2019) proposed different estimators for population parameters in the presence of measurement error. Azeem (2014), Kumar et al. (2015), Kumar et al. (2018), Kumar (2016), Singh and Sharma (2015), Audu et al. (2020), Singh and Vishwakarma (2019) and Kumar and Chowdhary (2021) studied the problem of mean estimation in the presence of non-response and measurement error simultaneously.

Further, Khalil et al. (2018) pioneered the estimation procedure for a sensitive variable in the presence of measurement error by using optional and non-optional RRT models.

On the basis of these studies, a natural next step is to estimate the population mean of a sensitive variable in the presence of both measurement error and non-response. This issue has received little attention in the existing literature. The RRT models used in earlier studies (Ahmed et al. 2017; Diana et al. 2014; Makhdum et al. 2020) are non-optional RRT models, in which all respondents must give a scrambled response. A survey question, on the other hand, may be sensitive for one person but not for another. According to Gupta et al. (2002), if respondents are given the option of answering the sensitive question directly or providing a scrambled response, the model will be more efficient while causing no further loss of privacy (Gupta et al. 2018).

2 Sampling procedure

Let $U = (U_{1}, U_{2}, \dots, U_{N})$ be a finite population of N units, from which a sample of size n is drawn by simple random sampling without replacement (SRSWOR). Let Y be a sensitive study variable that cannot be observed directly and X be a non-sensitive auxiliary variable correlated with Y, with unknown means $(μ_{x}, μ_{y})$ and variances $(σ_{x}^{2}, σ_{y}^{2})$, respectively. Let T and S be two scrambling variables with means $(μ_{T} = 1, μ_{S} = 0)$ and variances $(σ_{T}^{2}, σ_{S}^{2})$, respectively. Let W be the probability that a respondent finds the question sensitive. A respondent who considers the question sensitive is asked to report a scrambled response; otherwise the true response is reported.

Collecting sensitive information from respondents is further complicated by non-response. For a sensitive variable of interest, the Hansen and Hurwitz (1946) technique was modified by Zhang et al. (2021) and Kumar and Kour (2022) to deal with non-response. In this technique, respondents answer directly in the first phase, and an ORRT model is then used to obtain answers from a sub-sample of the non-respondents in the second phase.

Therefore, the ORRT model in the second phase is
$Z = \begin{cases} Y & \text{with probability } (1 - W) \\ TY + S & \text{with probability } W \end{cases}$ (1)
with mean $E (Z) = E (Y)$ and variance $Var (Z) = σ_{y}^{2} + σ_{S}^{2} W + σ_{T}^{2} (σ_{y}^{2} + μ_{y}^{2}) W$. The RRT model can be written as
$Z = (TY + S) J + Y (1 - J)$ (2)
where $J \sim Bernoulli (W)$ with $E (J) = W$ and $Var (J) = W (1 - W)$. When W = 1, the randomized response becomes non-optional. Under the randomization mechanism (subscript R), the mean and variance of Z for a given Y are
$E_{R} (Z) = (μ_{T} W + 1 - W) Y + μ_{S} W$ (3)
$V_{R} (Z) = (Y^{2} σ_{T}^{2} + σ_{S}^{2}) W$ (4)
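The moments stated for model (1) can be checked numerically. The following is a minimal sketch, not the authors' code (the paper reports that its simulations were written in R); it assumes normally distributed Y, T and S, and the function names and parameter values are hypothetical choices made only for illustration.

```python
import random
import statistics


def orrt_response(y, w, sd_t, sd_s, rng):
    """Return one optional-RRT report Z for a true value y (model (1)).

    With probability w the respondent scrambles and reports T*y + S,
    where T ~ N(1, sd_t^2) and S ~ N(0, sd_s^2); otherwise y is reported.
    """
    if rng.random() < w:
        return rng.gauss(1.0, sd_t) * y + rng.gauss(0.0, sd_s)
    return y


def theoretical_var_z(mu_y, var_y, w, var_t, var_s):
    """Var(Z) as stated in the text, for mu_T = 1 and mu_S = 0."""
    return var_y + var_s * w + var_t * (var_y + mu_y ** 2) * w


# Hypothetical settings, chosen only for illustration.
rng = random.Random(2024)
mu_y, sd_y = 10.0, 4.0
w, var_t, var_s = 0.8, 0.5, 0.5

true_values = [rng.gauss(mu_y, sd_y) for _ in range(200_000)]
reports = [orrt_response(y, w, var_t ** 0.5, var_s ** 0.5, rng)
           for y in true_values]

empirical_mean = statistics.fmean(reports)
empirical_var = statistics.pvariance(reports)
expected_var = theoretical_var_z(mu_y, sd_y ** 2, w, var_t, var_s)
print(f"mean of Z: {empirical_mean:.3f} (E(Y) = {mu_y})")
print(f"var of Z:  {empirical_var:.2f} (formula: {expected_var:.2f})")
```

With W = 0 the function returns the true value unchanged, which corresponds to every respondent answering directly.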

Let ${\hat{y}}_{i}$ be a transformation of the randomized response whose expectation under the randomization mechanism is the true response $y_{i}$, given as
${\hat{y}}_{i} = \frac{Z_{i} - μ_{S}}{μ_{T} W + 1 - W}$ (5)
with $E ({\hat{y}}_{i}) = y_{i}$ and $Var ({\hat{y}}_{i}) = \frac{(y_{i}^{2} σ_{T}^{2} + σ_{S}^{2}) W}{{(μ_{T} W + 1 - W)}^{2}}$.

Based on the above discussion, we assume that only $n_{1}$ units respond on the first call and the remaining $n_{2} = n - n_{1}$ units do not. A sub-sample of $n_{s} = \frac{n_{2}}{f}$ $(f > 1)$ units is then taken from the $n_{2}$ non-responding units. A modified version of the Hansen and Hurwitz estimator is given by
$\hat{\bar{y}} = w_{1} {\bar{y}}_{1} + w_{2} {\hat{\bar{y}}}_{2},$ (6)
where ${\bar{y}}_{1}$ is the mean of the respondents in the first phase and ${\hat{\bar{y}}}_{2} = \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} {\hat{y}}_{i}$ is the mean of the sub-sampled units in the second phase. Also, $w_{1} = \frac{n_{1}}{n}$ and $w_{2} = \frac{n_{2}}{n}$.

The mean and variance of $\hat{\bar{y}}$ are
$E (\hat{\bar{y}}) = \bar{Y}$ and $Var (\hat{\bar{y}}) = θ σ_{y}^{2} + λ σ_{y_{(2)}}^{2} + G,$
where $θ = \frac{N - n}{Nn}$, $λ = \frac{W_{2} (f - 1)}{n}$, $G = \frac{W_{2} f}{n} \left[ \frac{[(σ_{y_{(2)}}^{2} + μ_{y_{(2)}}^{2}) σ_{T}^{2} + σ_{S}^{2}] W}{{(μ_{T} W + 1 - W)}^{2}} \right]$ and $W_{2} = \frac{N_{2}}{N}$.

Moreover, let $(x_{i}, y_{i}, z_{i})$ be the observed values and $(X_{i}, Y_{i}, Z_{i})$ be the true values of the variables X, Y and Z, respectively. Let u, v and p be the measurement errors (ME) on Y, X and Z, respectively. The MEs on the i-th unit are $u_{i} = y_{i} - Y_{i}$, $v_{i} = x_{i} - X_{i}$ and $p_{i} = z_{i} - Z_{i}$; they are assumed to be uncorrelated, with mean zero and variances $σ_{u}^{2}$, $σ_{v}^{2}$ and $σ_{p}^{2}$, respectively.

In the presence of non-response and ME, the variance of ${\hat{\bar{y}}}^{*}$ is given by $Var ({\hat{\bar{y}}}^{*}) = θ (σ_{y}^{2} + σ_{u}^{2}) + λ (σ_{y_{(2)}}^{2} + σ_{p}^{2}) + G .$
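The variance expression above is easy to evaluate for given design values. The function below is a minimal sketch that transcribes $θ$, $λ$, $G$ and $W_{2}$ as defined in this section; the function name, argument names and the example values (including $N_{2} = 2000$) are hypothetical and are not taken from the paper.

```python
def hh_orrt_variance(N, n, N2, f, var_y, var_y2, mu_y2, W,
                     var_t, var_s, mu_t=1.0, var_u=0.0, var_p=0.0):
    """Variance of the modified Hansen-Hurwitz estimator under ORRT.

    Computes theta*(var_y + var_u) + lam*(var_y2 + var_p) + G, where
    var_y2 and mu_y2 are the variance and mean of Y in the non-respondent
    group. Setting var_u = var_p = 0 gives the expression without
    measurement error.
    """
    theta = (N - n) / (N * n)
    W2 = N2 / N                                   # share of non-respondents
    lam = W2 * (f - 1) / n
    scramble = ((var_y2 + mu_y2 ** 2) * var_t + var_s) * W
    G = (W2 * f / n) * scramble / (mu_t * W + 1 - W) ** 2
    return theta * (var_y + var_u) + lam * (var_y2 + var_p) + G


# Hypothetical design values, for illustration only.
base = dict(N=5000, n=850, N2=2000, var_y=16.0, var_y2=16.0, mu_y2=10.0,
            var_t=0.5, var_s=0.5)
for f in (2, 3, 4, 5):
    v = hh_orrt_variance(f=f, W=0.8, var_u=1.0, var_p=1.0, **base)
    print(f"f = {f}: variance = {v:.4f}")
```

Both $λ$ and $G$ grow with f, so for fixed W the variance increases as a smaller sub-sample of non-respondents is taken.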

3 Existing mean estimator

Using the notation of Section 2, suppose that the population mean and variance of the auxiliary variable X are known and denoted by $μ_{x} = \frac{1}{N} \sum_{i = 1}^{N} x_{i}$ and $σ_{x}^{2} = \frac{1}{N - 1} \sum_{i = 1}^{N} {(x_{i} - μ_{x})}^{2}$, respectively.

Let the population mean and variance of the respondent group of size $N_{1}$ be

$μ_{x_{(1)}} = \frac{1}{N_{1}} \sum_{i =1}^{N_{1}} x_{i}$ and $σ_{x_{(1)}}^{2} = \frac{1}{N_{1} - 1} \sum_{i =1}^{N_{1}} {(x_{i} - μ_{x_{(1)}})}^{2}$, respectively, and let those of the non-respondent group of size $N_{2}$ be $μ_{x_{(2)}} = \frac{1}{N_{2}} \sum_{i =1}^{N_{2}} x_{i}$ and $σ_{x_{(2)}}^{2} = \frac{1}{N_{2} - 1} \sum_{i =1}^{N_{2}} {(x_{i} - μ_{x_{(2)}})}^{2}$, respectively. Further, let $ρ_{xy} = \frac{σ_{xy}}{σ_{x} σ_{y}}$ be the correlation coefficient between the auxiliary variable X and the sensitive variable Y.

Similarly, let $ρ_{x y_{(1)}} = \frac{σ_{x y_{(1)}}}{σ_{x} σ_{y}}$ and $ρ_{x y_{(2)}} = \frac{σ_{x y_{(2)}}}{σ_{x} σ_{y}}$ be the correlation coefficients between the auxiliary variable X and the sensitive study variable Y for the respondent group and the non-respondent group, respectively.

Assume that the population mean $μ_{x}$ of X is known and that non-response occurs on both Y and X. Some existing mean estimators under the ORRT model are listed below.

A typical mean estimator for a sensitive variable in a finite population, the modified Hansen and Hurwitz (HH) estimator in the presence of measurement error, is
${\hat{μ}}_{HH} = {\hat{\bar{y}}}^{*} = w_{1} {\bar{y}}_{1} + w_{2} {\bar{y}}_{2}^{*},$ (7)
where ${\bar{y}}_{2}^{*} = \frac{1}{n_{s}} \sum_{i =1}^{n_{s}} z_{i}$.
In the presence of measurement error, the MSE of ${\hat{μ}}_{HH}$ is given by
$MSE ({\hat{μ}}_{HH}) = θ (σ_{y}^{2} + σ_{u}^{2}) + λ (σ_{y_{(2)}}^{2} + σ_{p}^{2}) + G$ (8)
A ratio estimator corresponding to the Gupta et al. (2014) estimator under the modified HH procedure in the presence of measurement error is given by
${\hat{μ}}_{R} = \frac{{\hat{\bar{y}}}^{*}}{{\bar{x}}^{*}} μ_{x} = {\hat{R}}_{W}^{*} μ_{x},$ (9)
where ${\hat{\bar{y}}}^{*}$ and ${\bar{x}}^{*}$ are the ordinary mean estimators under the original HH procedure.
The MSE of ${\hat{μ}}_{R}$ is given by
$\begin{aligned} MSE ({\hat{μ}}_{R}) = {} & θ (σ_{y}^{2} + R^{2} σ_{x}^{2} - 2 R ρ_{yx} σ_{y} σ_{x}) \\ & + λ (σ_{y_{(2)}}^{2} + R^{2} σ_{x_{(2)}}^{2} - 2 R ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}) \\ & + θ (σ_{u}^{2} + R^{2} σ_{v}^{2}) + λ (σ_{p}^{2} + R^{2} σ_{v}^{2}) + G \end{aligned}$ (10)
where $R = \frac{μ_{y}}{μ_{x}}$ and $ρ_{z x_{(2)}} = \frac{ρ_{y x_{(2)}}}{\sqrt{1 + \frac{[σ_{S}^{2} + σ_{T}^{2} (σ_{y_{(2)}}^{2} + μ_{y_{(2)}}^{2})] W}{σ_{y_{(2)}}^{2}}}}$.
The generalized mean estimator considered in Zhang et al. (2021), in the presence of non-response and measurement error, is given as
${\hat{μ}}_{pw} = [{\hat{\bar{y}}}^{*} + k (μ_{x} - {\bar{x}}^{*})] {\left( \frac{\bar{D}}{\bar{d}} \right)}^{ν},$ (11)
where $\bar{d} = ϕ (α {\bar{x}}^{*} + β) + (1 - ϕ) (α μ_{x} + β)$, $\bar{D} = α μ_{x} + β$, ${\bar{x}}^{*} = μ_{x} (1 + e_{1}^{*})$ and
${\bar{y}}^{*} = μ_{y} (1 + e_{0}^{*})$. Also, k and ν are suitably chosen constants, $ϕ$ is assumed to be an unknown constant whose value is determined from optimality considerations, and α and β are assumed to be known parameters of the auxiliary variable X.

The MSE of ${\hat{μ}}_{pw}$ is given by
$\begin{aligned} MSE ({\hat{μ}}_{pw}) = {} & θ (σ_{y}^{2} + P^{2} σ_{x}^{2} - 2 P ρ_{yx} σ_{y} σ_{x}) \\ & + λ (σ_{y_{(2)}}^{2} + P^{2} σ_{x_{(2)}}^{2} - 2 P ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}) \\ & + θ (σ_{u}^{2} + P^{2} σ_{v}^{2}) + λ (σ_{p}^{2} + P^{2} σ_{v}^{2}) + G \end{aligned}$ (12)
where $P = \frac{θ ρ_{yx} σ_{y} σ_{x} + λ ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}}{θ (σ_{x}^{2} + σ_{v}^{2}) + λ (σ_{x_{(2)}}^{2} + σ_{v}^{2})}$.

The MSEs of ${\hat{μ}}_{HH}$, ${\hat{μ}}_{R}$ and ${\hat{μ}}_{pw}$ without measurement error may be obtained by putting $σ_{v}^{2} = σ_{u}^{2} = σ_{p}^{2} = 0$.
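Expressions (8), (10) and (12) share one functional form in a slope b: b = 0 gives (8), b = R gives (10) and b = P gives (12). The sketch below uses that observation to evaluate all three; it is an illustration under assumptions, not the authors' code. The class and function names are hypothetical, the numerical values are made up, and the product $ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}$ is passed in directly as a single covariance-type input.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    theta: float
    lam: float
    G: float
    var_y: float
    var_x: float
    var_y2: float        # variance of Y in the non-respondent group
    var_x2: float        # variance of X in the non-respondent group
    cov_yx: float        # rho_yx * sigma_y * sigma_x
    cov_zx2: float       # rho_zx(2) * sigma_z * sigma_x(2)
    var_u: float = 0.0   # measurement-error variances (0 = no ME)
    var_v: float = 0.0
    var_p: float = 0.0


def mse_for_slope(m, b):
    """Common form of (8), (10) and (12) for a slope b."""
    sampling = m.theta * (m.var_y + b * b * m.var_x - 2 * b * m.cov_yx)
    nonresponse = m.lam * (m.var_y2 + b * b * m.var_x2 - 2 * b * m.cov_zx2)
    meas_error = (m.theta * (m.var_u + b * b * m.var_v)
                  + m.lam * (m.var_p + b * b * m.var_v))
    return sampling + nonresponse + meas_error + m.G


def slope_p(m):
    """P of (12); the denominator uses the variance of X in group 2."""
    num = m.theta * m.cov_yx + m.lam * m.cov_zx2
    den = m.theta * (m.var_x + m.var_v) + m.lam * (m.var_x2 + m.var_v)
    return num / den


# Hypothetical values, for illustration only.
m = Moments(theta=0.0018, lam=0.0012, G=0.05, var_y=16.0, var_x=8.0,
            var_y2=16.0, var_x2=8.0, cov_yx=9.051, cov_zx2=9.051,
            var_u=1.0, var_v=1.0, var_p=1.0)
R = 10.0 / 6.0
mse_hh = mse_for_slope(m, 0.0)
mse_r = mse_for_slope(m, R)
mse_pw = mse_for_slope(m, slope_p(m))
print(f"MSE_HH = {mse_hh:.5f}, MSE_R = {mse_r:.5f}, MSE_pw = {mse_pw:.5f}")
```

Because the common form is a convex quadratic in b and P is its stationary point, the value at b = P cannot exceed the values at b = 0 or b = R for the same inputs.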

4 Proposed mean estimator

Motivated by Zhang et al. (2021), we propose a generalized mean estimator using ORRT models in the presence of both non-response and measurement error as
${\hat{τ}}_{cs} = \left\{ {\hat{\bar{y}}}^{*} + k_{1} (μ_{x} - \bar{x}) + k_{2} (μ_{x} - {\bar{x}}^{*}) \right\} {\left( \frac{μ_{x}}{{\bar{x}}^{*}} \right)}^{α_{1}} {\left( \frac{μ_{x}}{\bar{x}} \right)}^{α_{2}},$ (13)
where ${\hat{\bar{y}}}^{*}$ denotes the mean of the sensitive study variable in the presence of non-response and measurement error, ${\bar{x}}^{*}$ is the mean of the auxiliary variable in the presence of non-response and measurement error, $\bar{x}$ is the mean of the auxiliary variable, $α_{1}$ and $α_{2}$ are suitably chosen constants, and $k_{1}$ and $k_{2}$ are unknown constants whose values are to be optimized.

To find the MSE of the estimator, we define ${\bar{y}}^{*} = μ_{y} (1 + e_{0}^{*})$, ${\bar{x}}^{*} = μ_{x} (1 + e_{1}^{*})$ and $\bar{x} = μ_{x} (1 + e_{1})$ such that $E (e_{0}^{*}) = E (e_{1}^{*}) = E (e_{1}) = 0$ and
$\begin{aligned} E (e_{0}^{*2}) & = \frac{1}{μ_{y}^{2}} [θ (σ_{y}^{2} + σ_{u}^{2}) + λ (σ_{y_{(2)}}^{2} + σ_{p}^{2})] + \frac{W_{2} f}{n} \left[ \frac{[(σ_{y_{(2)}}^{2} + μ_{y_{(2)}}^{2}) σ_{T}^{2} + σ_{S}^{2}] W}{{(μ_{T} W + 1 - W)}^{2}} \right], \\ E (e_{1}^{*2}) & = \frac{1}{μ_{x}^{2}} [θ (σ_{x}^{2} + σ_{v}^{2}) + λ (σ_{x_{(2)}}^{2} + σ_{v}^{2})], \\ E (e_{1}^{2}) & = \frac{1}{μ_{x}^{2}} θ σ_{x}^{2}, \\ E (e_{0}^{*} e_{1}^{*}) & = θ ρ_{yx} \frac{σ_{y} σ_{x}}{μ_{y} μ_{x}} + λ ρ_{z x_{(2)}} \frac{σ_{z} σ_{x_{(2)}}}{μ_{z} μ_{x}}, \\ E (e_{0}^{*} e_{1}) & = \frac{1}{μ_{y} μ_{x}} θ ρ_{yx} σ_{y} σ_{x}, \\ E (e_{1}^{*} e_{1}) & = \frac{1}{μ_{x}^{2}} θ σ_{x}^{2}, \end{aligned}$
where $ρ_{z x_{(2)}} = \frac{ρ_{y x_{(2)}}}{\sqrt{1 + \frac{[σ_{S}^{2} + σ_{T}^{2} (σ_{y_{(2)}}^{2} + μ_{y_{(2)}}^{2})] W}{σ_{y_{(2)}}^{2}}}}$.

The bias of the proposed estimator up to the second order of approximation is given by
$\begin{aligned} Bias ({\hat{τ}}_{cs}) = {} & θ \Big\{ \frac{(A_{1} + A_{2} + A_{3})}{μ_{x}^{2}} σ_{x}^{2} + \frac{A_{2}}{μ_{x}^{2}} σ_{v}^{2} - ρ_{yx} \frac{σ_{y} σ_{x}}{μ_{x}} (α_{1} + α_{2}) \Big\} \\ & + λ \Big\{ \frac{A_{2}}{μ_{x}^{2}} (σ_{x_{(2)}}^{2} + σ_{v}^{2}) - \frac{α_{1}}{μ_{x}} ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} \Big\}, \end{aligned}$ (14)
where $A_{1} = \left( R \frac{α_{2} (α_{2} + 1)}{2} + k_{1} α_{2} \right) μ_{x}$, $A_{2} = \left( R \frac{α_{1} (α_{1} + 1)}{2} + k_{2} α_{1} \right) μ_{x}$, $A_{3} = (k_{1} α_{1} + k_{2} α_{2} + R α_{1} α_{2}) μ_{x}$, $R^{'} = \frac{μ_{y}}{μ_{z}}$ and $R = \frac{μ_{y}}{μ_{x}}$.

Without measurement error, the bias of ${\hat{τ}}_{cs}$ can be obtained by taking $σ_{v}^{2} = 0$ in the above equation.

The MSE of the proposed estimator is
$\begin{aligned} MSE ({\hat{τ}}_{cs}) = {} & θ \Big\{ {[(k_{1} + α_{2} R) + (k_{2} + α_{1} R)]}^{2} σ_{x}^{2} + {(k_{2} + α_{1} R)}^{2} σ_{v}^{2} \\ & \quad - 2 ρ_{yx} σ_{y} σ_{x} [(k_{1} + k_{2}) + R (α_{1} + α_{2})] + (σ_{y}^{2} + σ_{u}^{2}) \Big\} \\ & + λ \Big\{ {(k_{2} + α_{1} R)}^{2} (σ_{x_{(2)}}^{2} + σ_{v}^{2}) - 2 R^{'} (k_{2} + α_{1} R) ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} + (σ_{y_{(2)}}^{2} + σ_{p}^{2}) \Big\} + G \end{aligned}$ (15)

Differentiating (15) with respect to $k_{1}$ and $k_{2}$ and equating the derivatives to zero, we get the optimum values of $k_{1}$ and $k_{2}$ as
$k_{1_{(opt)}} = D - α_{2} R$ and $k_{2_{(opt)}} = J - α_{1} R,$ (16)
where $D = \frac{1}{θ σ_{v}^{2} + λ (σ_{x_{(2)}}^{2} + σ_{v}^{2})} \left\{ ρ_{yx} \frac{σ_{y}}{σ_{x}} [θ (σ_{x}^{2} + σ_{v}^{2}) + λ (σ_{x_{(2)}}^{2} + σ_{v}^{2})] - [θ ρ_{yx} σ_{y} σ_{x} + R^{'} λ ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}] \right\}$ and $J = \frac{R^{'} λ ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}}{θ σ_{v}^{2} + λ (σ_{x_{(2)}}^{2} + σ_{v}^{2})}$.
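The optimum in (16) can be traced through a short derivation from (15). The sketch below is a reading of those two equations, not part of the original text; the shorthand $a$, $b$, $C$, $C_{2}$ and $S_{2}$ is introduced here only for compactness.

```latex
% Shorthand used only in this sketch:
%   a = k_1 + \alpha_2 R,   b = k_2 + \alpha_1 R,
%   C = \rho_{yx}\sigma_y\sigma_x,
%   C_2 = \rho_{zx_{(2)}}\sigma_z\sigma_{x_{(2)}},
%   S_2 = \sigma_{x_{(2)}}^2 + \sigma_v^2 .
\begin{align*}
\frac{\partial\, MSE}{\partial a}
  &= 2\theta\left[(a+b)\,\sigma_x^2 - C\right] = 0
  \;\Longrightarrow\;
  a + b = \frac{C}{\sigma_x^2} = \rho_{yx}\frac{\sigma_y}{\sigma_x}, \\
\frac{\partial\, MSE}{\partial b}
  &= 2\theta\left[(a+b)\,\sigma_x^2 + b\,\sigma_v^2 - C\right]
   + 2\lambda\left[b\,S_2 - R' C_2\right] = 0 \\
  &\Longrightarrow\;
  b = \frac{R'\lambda C_2}{\theta\sigma_v^2 + \lambda S_2} = J,
  \qquad
  a = \rho_{yx}\frac{\sigma_y}{\sigma_x} - J = D .
\end{align*}
```

Since $a$ and $b$ differ from $k_{1}$ and $k_{2}$ only by constants, the same stationary point gives $k_{1} = D - α_{2} R$ and $k_{2} = J - α_{1} R$. Expanding the expression for $D$ in (16) confirms that it equals $ρ_{yx} σ_{y} / σ_{x} - J$, so $D + J = ρ_{yx} σ_{y} / σ_{x}$.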

Substituting the values of $k_{1}$ and $k_{2}$ from (16) in (15), we get the minimum MSE as
$\begin{aligned} {MSE}_{min} ({\hat{τ}}_{cs}) = {} & θ \Big\{ {(D + J)}^{2} σ_{x}^{2} + J^{2} σ_{v}^{2} - 2 ρ_{yx} σ_{y} σ_{x} (D + J) + (σ_{y}^{2} + σ_{u}^{2}) \Big\} \\ & + λ \Big\{ J^{2} (σ_{x_{(2)}}^{2} + σ_{v}^{2}) - 2 J ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} + (σ_{y_{(2)}}^{2} + σ_{p}^{2}) \Big\} + G \end{aligned}$ (17)
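As a numerical sanity check, the following sketch evaluates (15) in terms of $a = k_{1} + α_{2} R$ and $b = k_{2} + α_{1} R$ and confirms that $(a, b) = (D, J)$ is a minimum. It is an illustration under assumptions: the function names and all numbers are hypothetical, and $R^{'}$ defaults to 1, which follows from $E (Z) = E (Y)$ stated in Section 2.

```python
def proposed_mse(a, b, *, theta, lam, G, var_y, var_x, var_y2, var_x2,
                 cov_yx, cov_zx2, var_u=0.0, var_v=0.0, var_p=0.0,
                 r_prime=1.0):
    """Eq. (15) with a = k1 + alpha2*R and b = k2 + alpha1*R."""
    first = theta * ((a + b) ** 2 * var_x + b ** 2 * var_v
                     - 2 * cov_yx * (a + b) + (var_y + var_u))
    second = lam * (b ** 2 * (var_x2 + var_v)
                    - 2 * r_prime * b * cov_zx2 + (var_y2 + var_p))
    return first + second + G


def optimum_ab(*, theta, lam, var_x, var_x2, cov_yx, cov_zx2,
               var_v=0.0, r_prime=1.0, **_unused):
    """D and J of Eq. (16); extra keyword arguments are ignored."""
    den = theta * var_v + lam * (var_x2 + var_v)
    J = r_prime * lam * cov_zx2 / den
    D = (cov_yx / var_x * (theta * (var_x + var_v) + lam * (var_x2 + var_v))
         - (theta * cov_yx + r_prime * lam * cov_zx2)) / den
    return D, J


# Hypothetical values, for illustration only.
setup = dict(theta=0.0018, lam=0.0012, G=0.05, var_y=16.0, var_x=8.0,
             var_y2=16.0, var_x2=8.0, cov_yx=9.051, cov_zx2=9.051,
             var_u=1.0, var_v=1.0, var_p=1.0)
D, J = optimum_ab(**setup)
best = proposed_mse(D, J, **setup)
print(f"D = {D:.4f}, J = {J:.4f}, minimum MSE = {best:.5f}")
```

Note that (15) requires $θ σ_{v}^{2} + λ (σ_{x_{(2)}}^{2} + σ_{v}^{2}) > 0$ for the optimum to exist, which the sketch does not check.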

The minimized MSE of the proposed estimator without ME is obtained by putting $σ_{v}^{2} = σ_{u}^{2} = σ_{p}^{2} = 0$ in the above expression:
$\begin{aligned} {MSE}_{min} ({\hat{τ}}_{cs}) = {} & θ \Big\{ {(D^{*} + J^{*})}^{2} σ_{x}^{2} - 2 ρ_{yx} σ_{y} σ_{x} (D^{*} + J^{*}) + σ_{y}^{2} \Big\} \\ & + λ \Big\{ J^{*2} σ_{x_{(2)}}^{2} - 2 J^{*} ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} + σ_{y_{(2)}}^{2} \Big\} + G \end{aligned}$ (18)
where $D^{*} = \frac{1}{λ σ_{x_{(2)}}^{2}} \left\{ ρ_{yx} \frac{σ_{y}}{σ_{x}} [θ σ_{x}^{2} + λ σ_{x_{(2)}}^{2}] - [θ ρ_{yx} σ_{y} σ_{x} + R^{'} λ ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}] \right\}$ and $J^{*} = \frac{R^{'} λ ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}}}{λ σ_{x_{(2)}}^{2}}$.

5 Efficiencies comparison

In this section, we compare the minimum MSE of the proposed estimator in (17) with the MSEs of the existing estimators in (8), (10) and (12). The resulting efficiency conditions are as follows.

${MSE}_{min} ({\hat{τ}}_{cs}) < MSE ({\hat{μ}}_{HH})$ if
$\begin{aligned} & θ \Big\{ {(D + J)}^{2} σ_{x}^{2} + J^{2} σ_{v}^{2} - 2 ρ_{yx} σ_{y} σ_{x} (D + J) \Big\} \\ & + λ \Big\{ J^{2} (σ_{x_{(2)}}^{2} + σ_{v}^{2}) - 2 J ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} \Big\} < 0 . \end{aligned}$ (19)
${MSE}_{min} ({\hat{τ}}_{cs}) < MSE ({\hat{μ}}_{R})$ if
$\begin{aligned} & θ \Big\{ [{(D + J)}^{2} - R^{2}] σ_{x}^{2} + (J^{2} - R^{2}) σ_{v}^{2} - 2 ρ_{yx} σ_{y} σ_{x} (D + J - R) \Big\} \\ & + λ \Big\{ (J^{2} - R^{2}) (σ_{x_{(2)}}^{2} + σ_{v}^{2}) - 2 ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} (J - R) \Big\} < 0 . \end{aligned}$ (20)
${MSE}_{min} ({\hat{τ}}_{cs}) < MSE ({\hat{μ}}_{pw})$ if
$\begin{aligned} & θ \Big\{ [{(D + J)}^{2} - P^{2}] σ_{x}^{2} + (J^{2} - P^{2}) σ_{v}^{2} - 2 ρ_{yx} σ_{y} σ_{x} (D + J - P) \Big\} \\ & + λ \Big\{ (J^{2} - P^{2}) (σ_{x_{(2)}}^{2} + σ_{v}^{2}) - 2 ρ_{z x_{(2)}} σ_{z} σ_{x_{(2)}} (J - P) \Big\} < 0 . \end{aligned}$ (21)

Whenever conditions (19)–(21) hold, the proposed estimator is more efficient than the corresponding estimators considered.
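Condition (19) can be simplified by substituting the optimum values. The following worked step is a sketch under the assumption $R^{'} = 1$ (which follows from $E (Z) = E (Y)$ in Section 2); the shorthand $C$, $C_{2}$ and $S_{2}$ is introduced here and is not the paper's notation.

```latex
% Shorthand used only in this sketch:
%   C = \rho_{yx}\sigma_y\sigma_x,
%   C_2 = \rho_{zx_{(2)}}\sigma_z\sigma_{x_{(2)}},
%   S_2 = \sigma_{x_{(2)}}^2 + \sigma_v^2 .
% From (16) with R' = 1:
%   D + J = C / \sigma_x^2,   J = \lambda C_2 / (\theta\sigma_v^2 + \lambda S_2).
\begin{align*}
\text{LHS of (19)}
 &= \theta\left[\frac{C^2}{\sigma_x^2} - \frac{2C^2}{\sigma_x^2}\right]
  + J^2\left(\theta\sigma_v^2 + \lambda S_2\right) - 2\lambda J C_2 \\
 &= -\,\theta\,\frac{C^2}{\sigma_x^2}
    \;-\; \frac{\lambda^2 C_2^2}{\theta\sigma_v^2 + \lambda S_2}
 \;\le\; 0 .
\end{align*}
```

Under this assumption the left-hand side of (19) is a sum of two non-positive terms, and it is strictly negative unless both covariance-type quantities $C$ and $C_{2}$ are zero.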

6 Simulation study

In this section, we use a simulation study to compare the performance of the proposed estimator under SRSWOR with that of the usual unbiased estimator and the two other estimators considered.

For the simulation study, a data set consisting of a sensitive study variable Y and an auxiliary variable X is generated from a normal distribution using the model

$Y = aX + rnorm (N, μ_{y}, σ_{y}^{2})$, where $X = rnorm (N, μ_{x}, σ_{x}^{2})$, $(μ_{y}, μ_{x}) = (0, 0)$, $a = 0.25$ and $(σ_{y}^{2}, σ_{x}^{2})$ may vary.

An artificial population of size N = 5000 is generated from the normal distribution, and a sample of size n = 850 is drawn under SRSWOR. It is assumed that only $n_{1} = 450$ units respond and $n_{2} = 400$ units do not respond in the first phase. In the second phase, we take a sub-sample of size $n_{s} = \frac{n_{2}}{f}$ $(f > 1)$ from the $n_{2}$ non-respondents, using $f = 2, 3, 4, 5$. The results of the simulation study are given in Tables 1 and 2.
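The design just described can be sketched in code as follows. This is not the authors' R program: it is a simplified Python illustration that assumes the split into respondents and non-respondents is random, takes the integer part of $n_{2} / f$, omits measurement error, and uses hypothetical function names and standard deviations.

```python
import random
import statistics


def make_population(N, a, sd_x, sd_e, rng):
    """Generate X ~ N(0, sd_x^2) and Y = a*X + e with e ~ N(0, sd_e^2)."""
    xs = [rng.gauss(0.0, sd_x) for _ in range(N)]
    ys = [a * x + rng.gauss(0.0, sd_e) for x in xs]
    return xs, ys


def hh_orrt_estimate(ys, n, n1, f, w, sd_t, sd_s, rng):
    """One replication of the two-phase design; returns the HH-type mean."""
    sample = rng.sample(range(len(ys)), n)          # SRSWOR of n units
    respondents, nonrespondents = sample[:n1], sample[n1:]
    n2 = n - n1
    n_s = n2 // f                                   # integer part of n2 / f
    subsample = rng.sample(nonrespondents, n_s)

    ybar1 = statistics.fmean([ys[i] for i in respondents])
    reports = []
    for i in subsample:
        if rng.random() < w:                        # question felt sensitive
            reports.append(rng.gauss(1.0, sd_t) * ys[i]
                           + rng.gauss(0.0, sd_s))
        else:                                       # direct answer
            reports.append(ys[i])
    ybar2 = statistics.fmean(reports)
    return (n1 / n) * ybar1 + (n2 / n) * ybar2


rng = random.Random(7)
xs, ys = make_population(N=5000, a=0.25, sd_x=1.0, sd_e=1.0, rng=rng)
estimates = [hh_orrt_estimate(ys, n=850, n1=450, f=2, w=0.8,
                              sd_t=0.5 ** 0.5, sd_s=0.5 ** 0.5, rng=rng)
             for _ in range(200)]
population_mean = statistics.fmean(ys)
mc_mse = statistics.fmean([(e - population_mean) ** 2 for e in estimates])
print(f"population mean {population_mean:.4f}, Monte Carlo MSE {mc_mse:.5f}")
```

Repeating the loop for $f = 2, 3, 4, 5$ and several values of W gives the grid of settings described above; the other estimators would additionally need the auxiliary variable stored in `xs`.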

Table 1 PRE of the proposed estimators with respect to existing estimators for different values of f and W using ORRT models.


Table 2 PRE of the proposed estimators with respect to existing estimators for different values of f and W using ORRT models.


Also, the scrambling variables T and S are taken to be normal with means 1 and 0, respectively, and with different variances.

Further, for comparison purposes and to assess the performance of the proposed estimator against the other estimators, we use another artificial population, considered by Zhang et al. (2021). The population has size 5000 and is generated from a bivariate normal distribution of $(Y, X)$ with the mean vector and covariance matrix
$μ = \begin{bmatrix} 10 \\ 6 \end{bmatrix}, \quad Σ = \begin{bmatrix} 16 & 9.051 \\ 9.051 & 8 \end{bmatrix}, \quad ρ_{yx} = 0.8 .$
The corresponding population values are $μ_{x} = 6.0228$, $σ_{x}^{2} = 8.1830$, $μ_{y} = 9.9864$, $σ_{y}^{2} = 16.1215$ and $ρ_{yx} = 0.8024$.

A sample of size n = 500 is drawn using SRSWOR; in the first phase, $n_{1} = 200$ units respond and $n_{2} = 300$ units do not. In the second phase, we take a sub-sample of size $n_{s} = \frac{n_{2}}{f}$ $(f > 1)$ from the non-respondents, using $f = 2, 3, 4, 5$. The results of the simulation study based on Zhang et al. (2021) are given in Table 3.

Table 3 PRE of the proposed estimators with respect to existing estimators for different values of f and W = 0.8 using ORRT models. Also $(σ_{v}^{2} = σ_{u}^{2} = σ_{p}^{2} = 1, 5, 10)$ .


Coding for the simulation was done in R. The Percent Relative Efficiency (PRE) of an estimator ${\hat{μ}}_{i}$ with respect to the usual unbiased estimator $({\hat{μ}}_{HH})$ is defined as
$PRE = \frac{{MSE}^{*} ({\hat{μ}}_{HH})}{{MSE}^{*} ({\hat{μ}}_{i})} \times 100,$

where ${\hat{μ}}_{i} = {\hat{μ}}_{HH}, {\hat{μ}}_{R}, {\hat{μ}}_{pw}$ and ${\hat{τ}}_{cs}$.
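The PRE definition translates directly into code. The helper names below are hypothetical, and the MSE values in the example are made-up numbers used only to show the calculation, not results from the paper.

```python
import statistics


def empirical_mse(estimates, true_mean):
    """Monte Carlo MSE: average squared deviation from the true mean."""
    return statistics.fmean([(e - true_mean) ** 2 for e in estimates])


def pre(mse_reference, mse_estimator):
    """Percent relative efficiency with respect to a reference estimator."""
    return mse_reference / mse_estimator * 100.0


# Made-up MSE values, for illustration only.
example_mse = {"HH": 0.10, "R": 0.08, "pw": 0.05, "cs": 0.04}
for name, value in example_mse.items():
    print(f"PRE({name}) = {pre(example_mse['HH'], value):.1f}")
```

By construction the reference estimator always has PRE equal to 100, and a value above 100 means a smaller MSE than the reference.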

Table 1 uses $σ_{T}^{2} = σ_{S}^{2} = 0.5$ and Table 2 uses $σ_{T}^{2} = σ_{S}^{2} = 1$.

Tables 1 and 2 compare the performance of the proposed estimator with that of the usual unbiased estimator and the two other estimators considered. In Table 1, where $σ_{T}^{2} = σ_{S}^{2} = 0.5$, the PRE of the proposed estimator decreases as f and W increase, for f = 2 to 5 and W = 0.2 to 0.8. In Table 2, where $σ_{T}^{2} = σ_{S}^{2} = 1$, the proposed estimator $({\hat{τ}}_{cs})$ performs better than the other estimators $({\hat{μ}}_{HH}, {\hat{μ}}_{R}, {\hat{μ}}_{pw})$ for the different values of f and W, except at W = 0.6, where its PRE is lower than that of the other estimators.

Table 3 shows that the proposed estimator $({\hat{τ}}_{cs})$ is more efficient than the usual unbiased estimator $({\hat{μ}}_{HH})$, the Gupta et al. (2014) ratio estimator $({\hat{μ}}_{R})$ under the Hansen and Hurwitz setup, and the generalized estimator of Zhang et al. (2021) $({\hat{μ}}_{pw})$, in the sense of having a higher PRE. Also, the ratio estimator does not perform well at $(σ_{v}^{2} = σ_{u}^{2} = σ_{p}^{2} = 5, 10)$.

For $W = 0.8$, the PRE values of the estimators decrease as f increases from 2 to 5.

7 Conclusion

In this paper, we studied improved mean estimation of a sensitive variable through a suggested generalized mean estimator using the ORRT model. The properties of the proposed estimator were studied, and conditions were obtained under which the proposed estimator is more efficient than the existing estimators. The simulation study supports the theoretical results, except when the probability that the question is sensitive is moderately high (W = 0.6); in this situation the Zhang et al. (2021) estimator $({\hat{μ}}_{pw})$ is more efficient. Based on these results, we recommend the suggested estimator to researchers and practitioners.

Acknowledgments

The authors express very sincere gratitude to the reviewers for their constructive suggestions which helped improve the presentation of the paper.

Data availability statement

No real data is used in the paper.

Disclosure statement

The authors declare that they have no conflicts of interest.

References

Ahmed S, Shabbir J, Gupta S. 2017. Use of scrambled response model in estimating the finite population mean in presence of non-response when coefficient of variation is known. Commun Stat Theory Methods. 46:8435–8449.
Audu A, Singh R, Khare S, Dauran NS. 2020. Almost unbiased estimators for population mean in the presence of non-response and measurement error. J Stat Manag Syst. 24:573–589.
Azeem M. 2014. On estimation of population mean in the presence of measurement error and non-response [Unpublished Ph.D. thesis]. Lahore: National College of Business Administration and Economics.
Collins LM, Schafer JL, Kam CM. 2001. A comparison of inclusive and restrictive strategies in modern missing data procedure. Psychol Methods. 6:330–351.
Diana G, Riaz S, Shabbir J. 2014. Hansen and Hurwitz estimator with scrambled response on the second call. J Appl Stat. 41:596–611.
Gupta S, Aloraini B, Qureshi MN, Khalil S. 2020. Variance estimation using randomized response technique. Revstat Stat J. 18:165–176.
Gupta S, Gupta B, Singh S. 2002. Estimation of sensitivity level of personal interview survey questions. J Stat Plan Inference. 100:239–247.
Gupta S, Kalucha G, Shabbir J, Dass BK. 2014. Estimation of finite population mean using optional RRT models in the presence of non-sensitive auxiliary information. Am J Math Manag Sci. 33:147–159.
Gupta S, Mehta S, Shabbir J, Khalil S. 2018. A unified measure of respondent privacy and model efficiency in quantitative RRT models. J Stat Theory Pract. 12:506–511.
Hansen MH, Hurwitz WN. 1946. The problem of non-response in sample surveys. J Am Stat Assoc. 41:517–529.
Khalil S, Noor-Ul-Amin M, Hanif M. 2018. Estimation of population mean for a sensitive variable in the presence of measurement error. J Stat Manag Syst. 21:81–91.
Khalil S, Zhang Q, Gupta S. 2021. Mean estimation of sensitive variables under measurement errors using optional RRT models. Commun Stat Simul Comput. 50:1417–1426.
Kumar S, Bhogal S, Nataraja NS, Viswanathaiah M. 2015. Estimation of population mean in the presence of non-response and measurement error. Rev Colomb Estad. 38:145–161.
Kumar S, Chowdhary M. 2021. Estimation of population product in the presence of non-response and measurement error in successive sampling. Math Sci Lett. 10:71–83.
Kumar S, Kour SP. 2022. The joint influence of estimation of sensitive variable under measurement error and non-response using ORRT models. J Stat Comput Simul. 92:3583–3604.
Kumar S, Singh HP, Bhougal S, Gupta R. 2011. A class of ratio-cum-product type estimators under double sampling in the presence of non-response. J Math Stat. 40:589–599.
Kumar S, Trehan M, Joorel JPS. 2018. A simulation study: estimation of population mean using two auxiliary variables in stratified random sampling. J Stat Comput Simul. 88:3694–3707.
Kumar S. 2016. Improved estimation of population mean in presence of nonresponse and measurement error. J Stat Theory Pract. 10:707–720.
Makhdum M, Sanaullah A, Hanif M. 2020. A modified regression-cum-ratio estimator of population mean of a sensitive variable in the presence of non-response in simple random sampling. J Stat Manag Syst. 23:495–510.
Singh N, Vishwakarma GK, Kim JM. 2019. Computing the effect of measurement errors on efficient variant of the product and ratio estimators of mean using auxiliary information. Commun Stat Simul Comput. 51:1–22.
Singh N, Vishwakarma GK. 2019. A generalized class of estimator of population mean with the combined effect of measurement errors and non-response in sample survey. Rev Investig Oper. 40:275–285.
Singh SR, Sharma P. 2015. Method of estimation in the presence of non-response and measurement errors simultaneously. J Mod App Stat Meth. 14:107–121.
Zhang Q, Khalil S, Gupta S. 2021. Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. J Stat Theory Pract. 15:1–15.
