98
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Data augmentation based methods for estimating the parameters of the Feller-Pareto distribution: theory and applications

ORCID Icon
Article: 2318387 | Received 26 Sep 2023, Accepted 07 Feb 2024, Published online: 04 Mar 2024

Abstract

In income and wealth data modeling Pareto distribution and its several variants play an important role. Both univariate and multivariate variations of this model have been extensively used as a suitable model for various non-negative socio-economic variables. In this article, we consider the most general Feller-Pareto (FP, in short) distribution, which subsumes all four types of Pareto distributions and show that it can be represented as a mixture of a conditional generalized gamma and an unconditional gamma distribution. Using this strategy, we consider a data augmentation based method (under the envelope of Bayesian paradigm) to estimate the parameters of the FP distribution. This mixture representation allows us to easily derive conditional Jeffery’s type non informative priors. For illustrative purposes, one data set is considered to exhibit the utility of the proposed method.

1 Introduction

Applications of several Pareto models and its various generalizations in modeling socio-economic phenomena are well established in literature. For an excellent survey on several Pareto models along with its stochastic properties see Arnold (Citation2015) and the references cited therein. In the hierarchy of several Pareto models, Pareto (Type IV) is the most general which subsumes three other models. Feller (Citation1971) came up with a different representation for the Pareto (Type IV) model, expressing it as a ratio of two independent gamma variables, a distribution alternatively known as Beta distribution of the second kind. According to Feller, if YiΓ(δi,1), i=1,2, are independent random variables, if for μR, σ>0, γ>0, we define W=μ+σ(Y2Y1)γ, then W has a Feller-Pareto (FP, henceforth, in short) distribution, and we write WFP(μ,σ,δ1,δ2,γ). The corresponding density can be easily obtained as (1.1) g(w)=(wμσ)δ1/γ1γΓ(δ1)Γ(δ2)Γ(δ1+δ2)(1+(wμσ)1/γ)δ1+δ2I(w>μ).(1.1)

There is another construction of the FP distribution as defined in the next. Let Y1 and Y2 be two independent radon variables having gamma distribution with scale parameter σ and shape parameters δ1,δ2. Then, X=μ+γ(Y2Y1)γ has FP (μ,σ,δ1,δ2,γ). Indeed Arnold (Citation2015) has shown that the FP distribution is a generalization of the Pareto (IV) distribution.

The FP family appears to be an unimodal distribution which subsumes a variety of continuous probability models as special cases. For example, it includes Pareto (type I), Pareto (type II), Pareto (type III), and Pareto (type IV) which are identified as special cases by appropriately selecting model parameters in EquationEq. (1.1) given below:

  • Pareto (I) (σ,α)=FP (σ,σ,1,α,1),

  • Pareto (II) (μ,σ,α)=FP (μ,σ,1,α,1),

  • Pareto(III) (μ,σ,γ)=FP (μ,σ,γ,1,1),

  • Pareto(IV) (μ,σ,γ,α)=FP (μ,σ,γ,α,1),

Another special case of the FP distribution is the transformed beta family which includes several well-known probability models such as Burr, Generalized Pareto, and Inverse Burr among others. Noticeably, a salient feature of these distributions is that they possess relatively high probability in the upper tail. However, it is also interesting to note that there are some distributions that exhibit distinctly non-Paretian behavior in the upper tail. For instance, Log-logistic, Inverse Pareto, and Inverse Paralogistic -each is a special case of Inverse Burr have relatively “light” tails as independently observed by Brazauskas (Citation2002). Application of such probability models covers a wide spectrum of areas ranging from actuarial science, economics, finance to health science domain and telecommunications, for distributions of variables such as sizes of insurance claims, incomes in a population of people, stock price fluctuations, duration of responses to medical treatment, and length of telephone calls. For pertinent details, see Arnold (Citation1983, Citation2015); Johnson et al. (Citation1994), and Klein and Moeschberger (Citation1997). Moreover, some of these distributions are relevant within much broader classes of probability models. For example, a generalized Pareto distribution arises in semiparametric modeling of upper observations in samples from distributions which are regularly varying or in the domain of attraction of extreme value distributions, see Embrechts et al. (1997). This motivates us to consider the study of this distribution from the estimation perspective.

Next, without loss of generality, we consider μ=0. Alternatively, we can estimate μ from a given data as w1:n=min1inwi, which is a consistent estimate of μ. Then, we can subtract from w and set μ equal to zero. Kalbfleisch and Prentice (Citation1980) identified EquationEq. (1.1) as a generalized F density. However, the seemingly unattractive and complicated (although it does have a closed form density) expression of the density function discouraged researchers to investigate further about this model. In the literature attempts have been made to discuss structural properties of the FP distribution as well as methodologies related to methods of estimation under a frequentist approach. A not-exhaustive list of such references are mentioned as follows. Tahmasebi and Behboodian (Citation2010) have discussed and derived the exact analytical expressions of entropy for the FP family and order statistics of FP subfamilies. Dutang, Goulet, and Langevin (Citation2022) developed an R package actuar for implementing FP distribution in actuarial applications numerically. Odubote and Oluyede (Citation2014) discussed a weighted version of a Feller-Pareto distribution and discussed several useful structural properties. Brazauskas (Citation2002) derived the exact form of Fisher Information matrix for the FP distribution. However, none of the above cited references have discussed the estimation of the model parameters under a frequentist approach. Additionally, classical estimation appears to have serious limitations in efficiently estimating the parameters of the FP model that has a total of 5 parameters. In particular, one is faced with the following problems:

  • The classical maximum likelihood method of estimation may not perform satisfactorily well, because we have 4 parameters (assuming μ=0) in total, and the associated likelihood is not a well behaved function as it involves gamma functions. Moreover, even if we estimates for the parameters, it is quite difficult to examine mathematically or otherwise, whether or not the estimated values are global or local maximum.

  • From EquationEq. (1.1), with μ=0, the k-th order moment (k2) of FP distribution will be E(Wk)=σkΓ(kγ+δ2)Γ(δ1kγ)Γ(δ1)Γ(δ2). It is quite obvious from this expression that for the k-th order of moment to exist, we must have δ1>kγ. This is quite a strong assumption, and at the same time we do not know for sure whether for a given data set, this condition will be satisfied. Although, one may use the method of fractional moments (which always exists), but that too, might not yield satisfactory results as several estimation methods under the classical set-up involves selecting a starting value for the parameters under study.

As a remedy, we seek a different approach in this paper. We write the density function in EquationEq. (1.1) as a mixture of one conditional generalized gamma and an unconditional gamma distribution. Later on, we will show that this mixture representation helps us to derive easily conditional Jeffery’s type non informative prior. This is quite fascinating in the sense that we are not assuming any external prior, but by rewriting the model in terms of two known distributions by invoking the argument of data augmentation. Needless to say, similar technique might well be considered for a multivariate FP distribution. The rest of the paper is organized as follows. In Section 2, the maximum likelihood estimation method is used to estimate the model parameters for the FP distribution given in EquationEq. (1.1). In Section 3, we discuss mixture representation of the density given in EquationEq. (1.1). In Section 4, we discuss a simulation study. Section 5 deals with the application of the proposed methodology to a data set having varying complexities. Finally, some concluding remarks are presented in Section 6.

2 Estimation by the method of maximum likelihood

For a random sample of size n the associated log-likelihood function drawn from the probability model in EquationEq. (1.1) will be (2.1) logL(σ,γ,δ1,δ2)=(δ1γ1)i=1nlog(Wiσ)+n(Γ(δ1+δ2))n{Γ(δ1)+Γ(δ2)}nlogγ(δ1+δ2)i=1nlog(1+(Wiσ)1/γ).(2.1)

The corresponding maximum likelihood equations will be by differentiating EquationEq. (2.1) with respect to γ, σ, δ1 and δ2 will be (2.2) logL(σ,γ,δ1,δ2)γ=(δ1+δ2)i=1n(Wiσ)1γlog(Wiσ)γ2((Wiσ)1γ+1)δ1i=1nlog(Wiσ)γ2nγ.(2.2) (2.3) logL(σ,γ,δ1,δ2)σ=(δ1+δ2)i=1nWi(Wiσ)1γ1γσ2((Wiσ)1γ+1)n(δ1γ1)σ.(2.3) (2.4) logL(σ,γ,δ1,δ2)δ1=i=1nlog(Wiσ)γi=1nlog((Wiσ)1γ+1)+nΓ(δ1+δ2)ψ(0)(δ1+δ2)nΓ(δ1)ψ(0)(δ1).(2.4) (2.5) logL(σ,γ,δ1,δ2)δ2=i=1nlog((Wiσ)1γ+1)+nΓ(δ1+δ2)ψ(0)(δ1+δ2)nΓ(δ2)ψ(0)(δ2).(2.5)

The maximum likelihood estimates of the parameters σ,γ,δ1,δ2 are obtained by equating EquationEqs. (2.2)–(2.5) to zero, and ψ(0)(z)=ddzlog(Γ(z)), is the polygamma function.

For the elements of the Fisher information (FIM, in short), an interested reader is referred to the paper by Brazauskas (Citation2002) with γ1 and γ2 need to be replaced by δ1 and δ2 as per the notation utilized in this paper in the probability density function in EquationEq. (1.1).

The asymptotic variance-covariance matrix of the MLE θ¯̂=(σ̂,γ̂,δ̂1,δ̂2) can be obtained from the inverse of the observed FIM as U=I1=def.[u11u12u13u14u21u22u23u24u31u32u33u34u41u42u43u44].

Lehmann and Casella (Citation2006) have discussed the asymptotic normality of the MLE under certain conditions (see, Theorem 3.10, p.449). For the FP distribution given in EquationEq. (1.1), it is straightforward to see that |4σγδ1δ2[logfX¯(x¯|θ¯)]|=0.

In addition, all the remaining necessary and sufficient conditions of Theorem 3.10 of Lehmann and Casella (Citation2006) are satisfied, and therefore, it can be assumed that (σ̂,γ̂,δ̂1,δ̂2)asympN4(σ,γ,δ1,δ2,(U[θ¯])jj1).

Consequently, a 100(1q)% approximate confidence intervals of the parameters θî, will be (2.6) θî±Zq×uii,(2.6)

i=1,2,3, where Zq is the 100q-th upper percentile of the standard normal distribution.

2.1 Simulation study

In this section, we conduct a Monte Carlo simulation study to evaluate the performance of the likelihood inference for the FP distribution given in EquationEq. (1.1). Random samples from the FP distribution are obtained using the actuar package in R. In particular, we consider the sample sizes n = 50, 75, and 100 with the following six sets of choices of the model parameters.

  1. Choice 1: σ=2, γ=1.5 and δ1=0.35, δ2=0.35.

  2. Choice 2: σ=2.5, γ=2 and δ1=0.65, δ2=0.75.

  3. Choice 3: σ=1.75, γ=1.2 and δ1=1.35, δ2=1.40.

  4. Choice 4: σ=1.45, γ=1.6 and δ1=1.40, δ2=1.52.

  5. Choice 5: σ=3, γ=1.8 and δ1=1.65, δ2=1.75.

  6. Choice 6: σ=3.5, γ=2.25 and δ1=0.95, δ2=0.85.

For each setting, 20,000 sets of random samples are generated. Regarding the MLE estimates, we have mimicked the strategy adopted in Dutang, Goulet, and Langevin (Citation2022). For each simulated random sample, we also compute the 95% approximate confidence intervals for the parameters σ, γ, δ1, and δ2 based on EquationEq. (2.6) with the asymptotic variances obtained from inverting the observed Fisher information matrix as well as the approximated variances obtained from a parametric bootstrap method with 400 bootstrap samples, for details on the bootstrapping, see Efron and Tibshirani (Citation1994). The estimated biases and mean squared errors (MSEs) of the MLEs of σ, γ, δ1, and δ2 are presented in . The estimated coverage probabilities and average widths of the confidence intervals are presented in . Since the observed information need not be positive definite which results in negative asymptotic variances, see, for example, Verbeke and Molenberghs (Citation2007). Additionally, we also represent the percentage of cases in which the asymptotic variances are negative and the confidence intervals cannot be computed in . In those cases that the asymptotic variances are negative, we recommend using a parametric bootstrap method as an alternative method to approximate the variances of the MLEs.

Table 1: Simulated biases and MSEs of the MLEs of the parameters in the FP distribution for various choices of σ,γ,δ1,δ2 and n.

Table 2: Simulated coverage probabilities (CP) and average widths (AW) of the MLEs of the parameters in the FP distribution for various choices of σ,γ,δ1,δ2.

One may observe from , that the estimated MSEs for the four parameters σ, γ, δ1 and δ2 decrease as the sample size increases. However, for the estimated biases, there is not a steady decreasing pattern with the increase of sample sizes, and on the contrary, in some cases, it appears that there is a negligible amount (by 0.01–0.05) of increase. Whether this is an anomaly or not will be investigated in a separate article. We also observe that the estimated MSEs of δ2 is larger than the MSEs of σ, γ, and δ1. Next, from , one may also observe the following

  • that the proportions of cases in which negative variance estimates are obtained is negligibly small.

  • the computed approximate confidence intervals based on bootstrap variances performs satisfactorily well. Note that these approximate confidence intervals can be used as an alternative when the asymptotic variances are negative, for pertinent details, see Ghosh and Ng (Citation2019) and the references cited therein.

3 Mixture representation

In this section, we represent the mixture representation of the FP distribution. Suppose, X1Γ(δ1,1) and X2Γ(δ2,1) and they are independent. Let us define Zi=Xiγ, i=1,2, with γ>0. Then, the distribution of Zi will be f(zi)=1γΓ(δi)exp(zi1/γ)ziδi/γ1zi>0.

Note that the above density function has been independently derived and explored by Stacy (Citation1962). The joint density of Z1 and Z2 will be f(z1,z2)=1γ2Γ(δ1)Γ(δ2)exp{(z11/γ+z21/γ)}z1δ1/γ1z2δ2/γ1,zi>0,i=1,2.

Next, let us consider the following transformation. W=σZ1Z2, with σ>0, and U=Z2. We are interested in the distribution of W. The Jacobian of the above transformation is |J|=uσ. Then, the joint distribution of W and U will be f(u,w)=1γ2σΓ(δ1)Γ(δ2)(uwσ)δ1/γ1uδ2/γ1(uσ)exp(((uwσ)1/γ+u1/γ))I(w>0,u>0).

Hence, the marginal distribution of W will be (3.1) g(w)=(wσ)δ1/γ1γ2σΓ(δ1)Γ(δ2)0uδ1+δ2γ1exp(u1/γ((wσ)1/γ+1))du=(wσ)δ1/γ1σγΓ(δ1)Γ(δ2)Γ(δ1+δ2)(1+(wσ)1/γ)δ1+δ2I(w>0).(3.1)

Noticeably, it is the ratio of two independent Stacy random variables with the same shape parameter. Jordanova et al. (Citation2023) discussed and studied the distribution of the ratio of two independent Stacy random variables when both the shape parameters for the numerator and denominator random variables is equal to one. However, it must be noted that Jordanova et al. (Citation2023) did not work on a subset of this current work. Precisely, the authors in that paper work on different sets of the values of the powers in the numerator and denominator, and just have an intersection when these powers are equal to 1.

Next, observe that we can rewrite EquationEq. (3.1) as g(w*)=0f1(w*|u)f2(u)du,where W*=Wσ and

  • W|U=uΓ(δ1γ,u), with

    f1(w|u)=uδ1/γγΓ(δ1)wδ1/γ1exp{uw1/γ}I(w>0).

  • UΓ(δ2γ,1).

In this case, the associated complete data likelihood takes the following form: (3.2) L(δ1,δ2,γ,W1,W2,,Wn,U1,U2,,Un)=1(γσ)n(1Γ(δ1))n(1Γ(δ2))ni=1nuiδ2/γ1exp{i=1nui1/γ}×i=1n(uiwi)δ1/γexp{i=1n(uiwi)1/γ}=(1γσΓ(δ1)Γ(δ2))ni=1nui(δ1+δ2)/γ1i=1n(wiσ)δ1/γ1exp{i=1nui1/γ(1+(wiσ)1/γ)}.(3.2)

Next, we try to derive from EquationEq. (3.2), the full conditionals of (δ1,δ2,γ,σ,U1,U2,,Un), given (W1,W2,,Wn) as follows:

  • For i=1,2,,n,

    Ui|ui,w¯,γ,δ1,δ2,σui(δ1+δ2)/γ1exp{ui1/γ(1+(wiσ)1/γ)}

    Generalized gamma((δ1+δ2)/γ,1/(1+(wiσ)1/γ),γ).

  • γ|u¯,w¯,δ1,δ2,σexp{1γ((δ1+δ2)i=1nlogui+δ1i=1nlogwiσ)i=1nui1/γ(1+(wiσ)1/γ)}(1γσ)n.

  • δ2|u¯,w¯,δ1,γ,σ(1σΓ(δ2))nexp{δ2γi=1nlogui}. Note that this function is log concave.

  • δ1|u¯,w¯,δ2,γ,σ(1Γ(δ1))nexp{δ2γi=1n(logui+logwiσ)}. Note that this function is also log concave.

  • Finally, σ|u¯,w¯,δ1,δ2,γ(1γΓ(δ1)Γ(δ2))n exp(i=1nui1/γ(1+(wiσ)1/γ)).

Therefore, all (n+4) full conditionals can be sampled using Acceptance Rejection (AR) sampling using the R-package ars.

On the choice of hyperparameters for the priors:

In this case, the hyperparameter values for the Gamma priors are obtained via the method of matching first two theoretical moments with the sample moments. Needless to say that there are other available strategies of selecting hyperparameters, such as the procedure adopted by Giannone et al. (Citation2015) in relation to vector autoregression models, Singh and Hellander (Citation2018) in the context of hyperparameter optimization, etc. However, the methodology adopted in such references might be more applicable to spatial-temporal models rather than the model that we have here. Additionally, whether such strategies will be beneficial (in the sense of computational efficiency, computation time) is a subject matter of a future study. Furthermore, it is safe to opine that we are not claiming that this set of priors with these specific choice of hyperparameters is optimum for the associated Bayesian analysis, but, in our cases, we have tried several other prior choices and found minimal changes in the final estimates. Also, the acceptance probability across found to be between (0.36,0.73), which indicates that the chain mixing was satisfactory. A full scale study regarding a wide range of prior choices needs to be done which is beyond the scope of this present paper.

3.1 Comment on the convergence of MCMC procedure

For the convergence diagnostics of the adopted MCMC procedure, the method of Gelman and Rubin (Citation1992) is utilized. It involves two stages. The first step, before sampling begins, involves obtaining an over-dispersed estimate of the target distribution(s) and using these to generate the starting points for the desired number of independent chains (in our case,we consider running a MCMC procedure with m=10 parallel independent chains of length 2k = 40000 each). The second step involves using the last k=1000 iterations to re-estimate the target distribution of the scalar quantity of interest (in our case σ, γ, δ1, and δ2). The convergence of the MCMC convergence is monitored by the following quantity given below: R̂={k1n+(m+1)BmkW}df(df2),where B is the variance between the means from the m=10 parallel chains, W is the average of the m=10 within-chain variances and df is the degrees of freedom of the approximating t density, for details, see pp. 465, Eq. (20) of Gelman and Rubin (Citation1992). Convergence is said to have achieved once the value of R̂ is near 1 for all scalar estimands of interest for a sufficiently large k (k). It is to be noted here that Gelman and Rubin (Citation1992) had proposed the method for Gibbs sampler but here we have applied their procedure on the output of MCMC procedure which is a particular case of Gibbs sampling. For illustrative purposes, we provide the values of R̂ for all of our scalar quantities of interest under the partial dependent conjugate prior set up (provided in in Appendix).

Table 3: Posterior summary for the FP model under the non-informative prior assumption.

4 Simulation study

For the associated Bayesian analysis, we consider the following two different sets of prior choices, namely the non-informative & improper and conjugate prior set up. At first, we consider the non-informative prior set-up and we label it as Choice 1.

Choice 1 (Non-informative prior set-up): Π(σ)1(1+σ)2I(σ>0).Π(γ)1(1+γ)2I(γ>0).Π(δ1)1(1+δ1)2I(δ1>0).Π(δ2)1(1+δ2)2I(δ2>0).

Therefore, the joint prior in this case will be Π(σ)×Π(γ)×Π(δ1)×Π(δ2).

The associated posterior summary is represented in .

Choice 2 (Conjugate prior set-up):

Next, under the conjugate prior set-up, we consider the same prior choices for all the parameters, i.e., (δ1,δ2,γ,σ) each follows a Gamma distribution with shape = 0.2, and rate = 0.2 respectively. We consider using WINBUGS (a software for computing Bayesian analysis) with R interface to conduct the simulation study. We do not claim that the prior choice made here is optimum, but among different choices of the hyperparameters made for this study, this choice appears to be producing satisfactory results for the Bayesian analysis based on a random sample of size n=100. The associated posterior summary is provided in .

Table 4: Posterior summary for the FP model under the conjugate prior assumption.

For both the cases, we consider the following four sets of true parameter choices.

  1. Choice 1: σ=0.07, γ=0.25 and δ1=0.75, δ2=0.82.

  2. Choice 2: σ=0.158, γ=3.10 and δ1=0.65, δ2=0.75.

  3. Choice 3: σ=4.98, γ=6.70 and δ1=0.57, δ2=0.61.

  4. Choice 4: σ=3.22, γ=3.50 and δ1=0.52, δ2=0.53.

Remark.

From the posterior simulation studies as given in and , one may comment the following:

  • Under the non-informative prior set-up the associated 95% HPD intervals are slightly wider.

  • We cannot make a general comment as to whether prior conjugacy in this scenario will perform uniformly better or not. A full scale study involving other possible prior choices and hyperparameter values along with a flat prior of the form Π(θ)1 will be the subject matter of a separate article.

5 Real data application

In this section, we consider a data on fault-trace lengths from Clark, Cox, and Laslett (Citation1999). As per the authors, mimicking their words, “It has often been observed that fault-trace lengths tend to follow a power-law or Pareto distribution, at least for sufficiently large lengths.” The applicability of the FP model under the classical paradigm (using the MLE method) has already been discussed in Clark, Cox, and Laslett (Citation1999). In this paper, we discuss the Bayesian estimation of the FP model parameters. One prominent reason for selecting the FP distribution is that it is the most general model in the hierarchy of Pareto distribution and many other distributions have been assumed to fit this particular data set which assumes a Pareto like behavior (especially mimicking the tail-behavior). Here, we consider the estimates of the parameters using moment matching (i.e., equating the sample moments with the theoretical moments) strategy and treat them as the initial prior choice for the Bayesian analysis. The summary of the Bayesian output is given in . Regarding the convergence of the adopted MCMC, we provide the R̂ values given in the Appendix. For the Bayesian analysis, we consider the following two different sets of priorchoices:

Table 5: Posterior summary for the 4 parameter FP model on fault-trace lengths data.

  • Choice 1 (Non-informative prior set-up):

    Π(σ)1σI(σ>0).Π(γ)1γI(γ>0).Π(δ1)1(1+δ1)2I(δ1>0).Π(δ2)1(1+δ2)2I(δ2>0).

  • Choice 2 (Conjugate prior set-up):

    • Π(σ)Gamma(shape=0.27,rate=0.35).

    • Π(γ)Gamma(shape=1.27,rate=2.09).

    • Π(δ1)Gamma(shape=1.09,rate=2.12).

    • Π(δ2)Gamma(shape=1.45,rate=3.50).

From the 95% HPD intervals for the conjugate prior set-up, it appears that the Bayesian estimation under the non-informative (and improper) prior set-up is less efficient as expected. The summary of the output is given in .

Sensitivity analysis with respect to the hyperparameters:

An anonymous referee has expressed concern that the results of this analysis might be influenced by the choice of hyperparameters. As an initial effort to investigate hyperparameters sensitivity, we re-analyze the data set with two other sets of prior choices that are given as follows (for the conjugate prior set-up):

  1. Choice 3:

    • Π(σ)Gamma(shape=0.32,rate=0.49).

    • Π(γ)Gamma(shape=2.49,rate=3.28).

    • Π(δ1)Gamma(shape=1.53,rate=2.58).

    • Π(δ2)Gamma(shape=1.68,rate=1.39).

  2. Choice 4:

    • Π(σ)Gamma(shape=0.68,rate=0.89).

    • Π(γ)Gamma(shape=2.38,rate=2.73).

    • Π(δ1)Gamma(shape=1.78,rate=3.46).

    • Π(δ2)Gamma(shape=1.95,rate=2.34).

The posterior summaries based on the above two different prior choices (with varying choices of the hyperparameters) are given in . It appears that with these two sets of new choices of hyperparameters for the respective priors the final conclusion remains the same, i.e., the performance of the Bayesian inference under the conjugate prior set-up performs slightly better as compared to that of under the non-informative priorset-up.

Table 6: Posterior summary for the 4 parameter FP model on fault-trace lengths data under different conjugate prior choices (Choice 3 and Choice 4).

6 Concluding remarks

In this paper, we have discussed estimation of the model parameters of the Feller-Pareto distribution under both frequentist and Bayesian paradigm. Under the frequentist approach, we consider the MLE method and also discussed the asymptotic normality of the estimates under certain regularity conditions. Noticeably, as mentioned in Section 1, estimation of the parameters under the classical approach is usually hindered by the fact that the likelihood function is not well behaved, and the parameter space have constraints. Some of these problems can be avoided by using computing environment such as R using specific packages that are designed to handle such types of complicated models in terms of their estimation. Nevertheless, efficient estimation of model parameters for the most general members of the Pareto-family, namely the Feller-Pareto distribution under the classical approach is difficult to achieve, as the existence of moments has some conditions that are hard to satisfy from a practical point of view. However, by rewriting the density function in EquationEq. (1.1) as a mixture of two well-known continuous probability models and by invoking the strategy of data augmentation, we have conducted a Bayesian analysis under both non-informative & improper and conjugate prior set-up. Based on our study, it is difficult to comment as to which of the two procedures (MLE and Bayesian) will be performing uniformly better than the other. Now, in the absence of large samples and in the presence of prior information, one can hope that the inferential aspect under the Bayesian paradigm will be better than any strategy under the classical set-up. The results of a small simulation study is encouraging. We have also discussed the utility of the proposed strategy of estimation of the Feller-Pareto distribution in the context of a real-life data set. A more comprehensive study is required to explore the usefulness of Bayesian estimation from a standard Bayesian analysis point of view.

Several possible works in the future direction can be considered, including but not limited to

  • choice of other types of improper priors as opposed to the ones assumed in this paper and then conduct a Bayesian analysis.

  • selecting a dependent & full conditional priors from the exponential family as opposed to subjective conjugate and proper priors assumed in this paper.

  • estimation of the model parameters by other methods under then classical paradigm, such as the method of maximum product spacing distance, Cramer Von mises, method of ordinary and weighted least square estimators etc., and have a comparison study to see the relative efficiency of each of these estimation strategies in the context of such type of probability models.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

References

  • Feller W. 1971. An introduction to probability theory and its applications. Vol. 2. New York: Willey. p. 766.
  • Arnold BC. 1983. Pareto distributions. Fairland (MD): International Cooperative Publishing House.
  • Johnson NL, Kotz S, Balakrishnan N. 1994. Continuous univariate distributions. Vol. 1, 2nd ed. New York: Wiley.
  • Klein JP, Moeschberger S. 1997. Survival analysis: techniques for censored and truncated data. Berlin: Springer.
  • Emberchts P, Claudia K, Mikosch T. 1997. Modelling extremal events. Stochastic modelling and applied probability. Vol. 33. Berlin: Springer. p. 283–370.
  • Arnold BC. 2015. Pareto distributions. 2nd ed. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Boca Raton, FL: Chapman & Hall/CRC.
  • Brazauskas V. 2002. Fisher information matrix for the Feller–Pareto distribution. Stat Probab Lett. 59(2):159–167.
  • Clark RM, Cox SJD, Laslett GM. 1999. Generalizations of power-law distributions applicable to sampled fault-trace lengths: model choice, parameter estimation and caveats. Geophys J Int. 136(2):357–372.
  • Dutang C, Goulet V, Langevin N. 2022. Feller-Pareto and related distributions: Numerical implementation and actuarial applications. J Stat Softw. 103:1–22.
  • Efron B, Tibshirani RJ. 1994. An introduction to the bootstrap. New York: CRC Press.
  • Ghosh I, Ng HKT. 2019. A class of skewed distributions with applications in environmental data. Commun Stat Case Stud Data Anal Appl. 5(4):346–365.
  • Gelman A, Carlin JB, Stern HS, Rubin DB. 1995. Bayesian data analysis. New York: Chapman and Hall.
  • Gelman A, Rubin DB. 1992. Inference from iterative simulation using multiple sequences (with discussion). Stat Sci. 7:457–511.
  • Giannone D, Lenza M, Primiceri GE. 2015. Prior selection for vector autoregressions. Rev Econ Stat. 97(2):436–451.
  • Kalbfleisch JD, Prentice RL. 1980. The statistical analysis of failure time data. New York: Wiley.
  • Lehmann EL, Casella G. 2006. Theory of point estimation. New York: Springer.
  • Lunn DJ, Thomas A, Best N, Spiegelhalter D. 2000. WinBUGS-a Bayesian modeling framework: concepts, structure, and extensibility. Stat Comput. 10(4):325–337.
  • Odubote O, Oluyede BO. 2014. Theoretical properties of the weighted Feller-Pareto distributions. Asian J Math Appl. 1:1–12.
  • Jordanova PK, Savov M, Tchorbadjieff A, Stehlík M. 2023. Mixed Poisson process with Stacy mixing variable. Stochastic Anal Appl. 42(2):289–305.
  • Robert CP, Casella G. 1999. Monte Carlo statistical methods. New York: Springer.
  • Rubenstein RY. 1981. Simulation and the Monte Carlo method. New York: Wiley.
  • Singh P, Hellander A. 2018. Hyperparameter optimization for approximate bayesian computation. In: 2018 Winter Simulation Conference (WSC). IEEE, p. 1718–1729.
  • Stacy EW. 1962. A generalization of the Gamma distribution. Ann Math Stat 33(3):1187–1192.
  • Tahmasebi S, Behboodian J. 2010. Shannon entropy for the Feller-Pareto (FP) family and order statistics of FP subfamilies. Appl Math Sci 4(10):495–504.
  • Verbeke G, Molenberghs G. 2007. What can go wrong with the score test? Amer Stat 61(4):289–290.

Appendix

In this section, we discuss the proofs of the results mentioned in Section 3.

Log concavity of distributions corresponding to mixture representation in Section 3:

First of all, note that a real valued function g() is said to be log-concave on the interval (a,b) if the function logg() is a concave function on (a,b). Equivalently, one can say,

g()g() is monotonically decreasing on (a,b) or (log"g()<0,) where “stands for the double derivative and stands for the first order derivative respectively.

Result 1.

The kernel of the conditional density of γ|u¯,w¯,δ1,δ2exp{1γ((δ1+δ2)i=1nlogui+δ1i=1nlogwi)i=1nui1/γ(1+wi1/γ)}(1γ)nis log concave and integrable.

Proof.

We need to show that the kernel of this distribution is log concave and integrable. Let us define A1=i=1nui1/γ(1+wi1/γ). Then A1=A1γ=log(1/γ)[A1+i=1n(uiwi)1/γ].

Therefore, (A.1) A1A1=1+log(1/γ){i=1n(uiwi)1/γi=1n(uiwi)1/γ+i=1n(ui)1/γ}=1+log(1/γ){11+i=1n(ui)1/γi=1n(uiwi)1/γ}.(A.1)

Note that, from EquationEq. (A.1), for integer values γ>0 and with (ui,wi)>0,i=1,2,,n, as γ increasing, the quantity log(1/γ) is decreasing. Also, the term inside the second parenthesis is decreasing. Hence A1 is log-concave.

Next, consider A2=1γ((δ1+δ2)i=1nlogui+δ1i=1nlogwi)

For the sake of notational simplicity, we write B=(δ1+δ2)i=1nlogui+δ1i=1nlogwi.

Note that B is positive (based on the argument as mentioned earlier.) Also, A2=Bγ.

Therefore, 2logA2γ2=1γ2>0.

So, A2 is log-convex. Again, since A1 is log-concave A1 is log-convex. Now, our kernel for the conditional density A2+(A1).

This is sum of two log-convex functions, and therefore, it is log-convex.

We must make a note of the following:

  • For integer values of γ, log(1/γ) is a decreasing function.

  • For fractional values of γ (i.e., 0<γ<1) log(1/γ) is increasing.

  • For any values of γ, and with any fractional positive values of (ui,wi) the ratio 11+i=1n(ui)1/γi=1n(uiwi)1/γ is decreasing.

  • But, for integer values of (ui,wi) and for any values of γ, the above ratio is increasing.

However, since the quantity is of the form 11+positive quantity, it will be always <1.

Similarly, one can establish the log-concavity (and/or log-convexity) of all the other conditional distributions.

On the integrability of all the conditional density functions for AR sampling:

Here we begin with the following:

  1. Since the conditional density of Ui given Ui,γ,δ1,δ2 is a generalized gamma, indeed it is integrable.

  2. The conditional density of γ is given by (without the normalizing constant)

    γ|u¯,w¯,δ1,δ2exp{1γ((δ1+δ2)D1+δ1D2)i=1nui1/γ(1+wi1/γ)}(1γ)n,

where D1=i=1nlogui, and D2=i=1nlogwi. Next, if consider the transformation θ=1γ, then the conditional density of θ given u¯,w¯,δ1,δ2 will be (B.2) g(θ|u¯,w¯,δ1,δ2)exp{θ((δ1+δ2)D1+δ1D2)i=1nuiθ(1+wiθ)}θn2=M1M2,(B.2)

where M1=exp(θT1)θn2, M2=exp(i=1nuiθ(1+wiθ)), and T1=(δ1+δ2)D1+δ1D2.

Next, note that 0<γ< will imply 0<θ<. Also, (ui,wi)0. So, the function M2 will exhibit the following:

  • limθ0M2=exp(2n)<, provided n<.

  • limθM20.

Therefore, M2 is a bounded function on the interval [0,).

Next, let us consider the integrability of the quantity M1 0M1dθ=0exp(θT1)θn2dθ=0j=0(θT1)jj!θn2dθ=j=0(T1)jj!0θn2+jdθ.

Table A1: Values of R̂ for n = 100.

Table A2: Values of R̂ for n = 200.

Table A3: Values of R̂ for n = 500.

This integral will be bounded and integrable, provided n+j<2. This appears to be a impossible condition. Hence the proof.