Invited Reviews

Development of next-generation reference interval models to establish reference intervals based on medical data: current status, algorithms and future consideration

Chaochao Ma, Zheng Yu & Ling Qiu
Pages 298-316 | Received 30 Aug 2023, Accepted 30 Nov 2023, Published online: 26 Dec 2023

Abstract

Evidence derived from laboratory medicine plays a pivotal role in the diagnosis, treatment monitoring, and prognosis of various diseases. Reference intervals (RIs) are indispensable tools for assessing test results, and the accuracy of clinical decision-making relies directly on the appropriateness of RIs. With the increase in real-world studies and advances in computational power, there has been growing interest in establishing RIs using big data. This approach has demonstrated cost-effectiveness and applicability across diverse scenarios, thereby enhancing the overall suitability of RIs to a certain extent. However, challenges persist when test results are influenced by age and sex. Reliance on a single RI, or on RIs grouped by age and sex, can lead to erroneous interpretation of results, with significant implications for clinical decision-making. To address this issue, next-generation reference interval models have been developed. Such models establish a curve relationship that yields continuously changing reference intervals for test results across different ages and sexes. By automatically selecting the appropriate RI based on the age and sex of the patient during result interpretation, this approach facilitates clinical decision-making and enhances disease diagnosis and treatment as well as health management practices. Development of next-generation reference interval models uses direct or indirect sampling techniques to select reference individuals and then employs curve-fitting methods such as splines, polynomial regression, and others to establish continuous models. In light of these studies, several observations can be made. First, to date, limited interest has been shown in developing next-generation reference interval models, and only a few models are currently available. Second, there is a wide range of methods and algorithms for constructing such models, and their diversity may lead to confusion. Third, the process of constructing next-generation reference interval models can be complex, particularly when indirect sampling techniques are employed. At present, normative documents pertaining to the development of next-generation reference interval models are lacking. In summary, this review aims to provide an overview of the current state of development of next-generation reference interval models by defining them, highlighting their inherent advantages, and addressing existing challenges. It also describes the model-building process, advanced algorithms for model building, the tools required, and the diagnosis and validation of models. Additionally, the prospects of utilizing big data for developing next-generation reference interval models are discussed. The ultimate objective is to equip clinical laboratories with the theoretical framework and practical tools necessary for developing and optimizing next-generation reference interval models to establish next-generation reference intervals, while enhancing the use of medical data resources to facilitate precision medicine.

Introduction

Laboratory medicine underpins patient care and is used to inform diagnosis, monitor treatment, and predict outcomes. At the heart of this practice are reference intervals (RIs) that are critical for interpreting test results and guiding clinical decisions. Traditional RIs face significant limitations and may lead to inaccurate clinical decisions when demographic variables such as age and sex influence test results.

In recent years, collaborative efforts across disciplines and the emergence of real-world studies have paved the way for advances in the establishment of reference intervals. The advances in computational power have catalyzed a paradigm shift in RI development [Citation1–5], with recent methodologies harnessing the power of these technologies to establish more accurate and demographically tailored RIs [Citation6–11]. Drawing upon the contributions of scholars in the field and our own research, we propose the term “next-generation reference interval models” for the first time. Such advanced models are characterized by a curve function for test results for different age and sex categories and offer dynamically tailored intervals in contrast to traditional static or broad age-range intervals. By auto-selecting the appropriate reference interval based on each patient’s age and sex, they equip clinicians with tools for precise, individualized decision-making. This not only refines the accuracy of disease diagnosis and treatment but also elevates the overall quality of health management.

Despite their potential, the path to widespread adoption of such models has many challenges. First, the process of collecting a robust, representative sample of healthy individuals to establish these intervals is costly, time-consuming, and, for certain types of specimens (e.g. cerebrospinal fluid), practically unfeasible. Moreover, the precursors of such models, known as continuous reference intervals, were tailored predominantly for pediatric cohorts because of variations in their physiological parameters [Citation12,Citation13]. The lack of information system capabilities has hindered the widespread application of these findings in clinical practice. Recognizing these challenges, our team, among others, has leveraged the use of medical data derived from the electronic health record (EHR) to validate the feasibility of indirect sampling techniques for various test parameters and to establish more comprehensive next-generation RIs [Citation14–17].

Zierk et al. [Citation10] established a model to develop RIs for hematology tests using an indirect sampling technique (kosmic algorithm) and smooth spline algorithms, which they referred to as the next-generation reference interval in their publication. Such efforts have set the groundwork for broader adoption and application. Yet, significant hurdles remain: the precise definition of the next-generation reference interval model is still nebulous, there are limited models available, and the methodologies behind their creation and refinement lack clarity and consensus.

This review seeks to demystify the development of next-generation reference interval models, to provide a comprehensive overview of their current status, and to describe their advantages and outline ongoing challenges. We describe the complexities of model development and discuss advanced algorithms and the tools that are integral to their development and optimization. Furthermore, we discuss the potential future of these models in utilizing medical data derived from the EHR with the ultimate aim of providing clinical laboratories with the necessary theoretical and practical knowledge to develop their own next-generation RIs. This would also allow the clinical laboratory to use its own medical data, thereby promoting precision medicine.

Current status of next-generation reference interval models

Definition of next-generation reference interval model

The reference interval (RI) is crucial for clinical diagnosis, treatment monitoring, and prognosis evaluation. Its significance is highlighted by the range of test results observed in healthy individuals, typically represented with a confidence level of 95% (Figure 1). The traditional methodology for setting up RIs is the direct approach, in which individuals are chosen from a reference population based on specific criteria and the samples are collected and analyzed for the selected measurands. This process bifurcates into a priori and a posteriori selection mechanisms. The a priori method involves the selection of individuals who meet pre-defined inclusion criteria, while the a posteriori method includes specimens based on supplementary details such as clinical data or other test results, and not all collected samples may be used for further examination [Citation18,Citation19]. While the ideal scenario for the direct method involves random selection from the reference population, in practice this is not often achieved. The selection of the group for testing is often swayed by factors such as cost and convenience. True randomization, which ensures a wholly representative group, demands substantial planning and resources. Another pivotal aspect of the direct approach is the inclusion of each test result in the statistical assessment, necessitating the exclusion of outliers; this exclusion can influence the intervals that are determined [Citation18,Citation19]. Challenges with direct studies include defining health, the presence of unnoticed diseases, and potential biases from smaller cohorts.

Figure 1. The concept of reference interval.

In comparison, indirect sampling techniques, also termed data mining, tap into existing laboratory data. These methods use statistical tools to parse reference values from large datasets. Critics of this method highlight the limited oversight of pre-analytical conditions and the dependence on statistical tools for filtering out unhealthy subjects. Its supporters appreciate its clinical relevance and its straightforwardness, especially for specific populations such as neonates and the elderly. In recent years, the advent of the digital age has increased the interest in real-world studies to derive RIs from mixed datasets of physiological and pathological test results [Citation11,Citation20–22]. Real-world data is information collected from actual, everyday clinical events and interactions, as opposed to data generated in controlled or experimental settings. This development significantly reduces the economic cost of establishing RIs and addresses the situation in which RIs from other sources are difficult to apply to various test systems and specific populations. However, challenges remain for test parameters heavily influenced by age and sex. Utilizing a single RI or discrete RIs grouped by age results in increased misjudgments that impact clinical decision-making (Figure 2). A model for RIs that offers continuous age- and sex-specific intervals tailored to different age and sex categories has been introduced [Citation10,Citation11].

Figure 2. The limitations of the single reference interval. The blue dots represent individual test results; taking the upper limit of the reference interval as an example, there are a large number of results in the area between the upper limit of a single reference interval and the continuous upper limit calculated by the next-generation reference interval model (the grey shaded part).

Automatic selection of the appropriate reference interval according to the patient's sex and age facilitates the determination of whether test results are normal or abnormal, aids clinical decision-making, and reduces the misjudgment rates associated with static RIs. These models are termed "next-generation reference interval models" in this review.

Advantages of and challenges in developing next-generation reference interval models

A next-generation reference interval model embedded in the clinical laboratory information system can provide clinicians with more accurate decision-making tools and enhance convenience for utilization. In terms of methodology, the subjects used in establishing next-generation reference interval models can be acquired through both direct and indirect sampling techniques. Like the establishment of a single reference interval, the next-generation reference interval model based on reference individuals obtained by a direct sampling technique is accurate and reliable. However, our previous studies have found that establishing a single stable reference interval for tests with relatively large variation, such as thyroid stimulating hormone, would require at least 850 samples [Citation23,Citation24]. Thus, development of a next-generation reference interval model by recruiting healthy individuals free of disease would require a vast number of samples to achieve a relatively stable and accurate model. This method is both costly and time-consuming. Moreover, it is challenging to obtain specimens like aqueous humor and cerebrospinal fluid through a direct approach [Citation18,Citation19]. Compared with this, mining massive medical data generated in clinical laboratories provides a new way for establishing next-generation reference intervals and compensates for the limitations of direct sampling techniques. This approach significantly reduces the cost of establishing next-generation reference intervals and simplifies the procedure. Moreover, this method enables the establishment of next-generation reference intervals for specific types of hard-to-obtain specimens. Next-generation reference interval models can also present results in percentiles or Z-score charts [Citation25], enabling normalization across different systems and significantly enhancing convenience in interpreting test results to assist clinical decision-making.

However, challenges remain. It has not been determined which tests would benefit from having their RIs established using a next-generation reference interval model. Additionally, such models have been developed for only a few tests. Lastly, given the proliferation of modeling algorithms in the field, along with the many parameters they entail, learning about this domain may be challenging.

Process and algorithms for developing next-generation reference interval models and establishing next-generation reference intervals based on medical data

In recent years, there has been a notable surge in interest regarding next-generation reference interval models. The methodology for formulating such models consists of two fundamental stages: the selection of an appropriate reference population followed by the development of the model. Methods for constructing such reference interval models can be classified into three primary categories based on the selection of the reference population and the subsequent model development. In this paper, they are referred to as category I, II and III methods. Specifically, these are:

  • Direct sampling techniques with modeling (category I methods);

  • Indirect sampling techniques (1) with modeling (category II methods); and

  • Indirect sampling techniques (2) with curve fitting (category III methods).

Table 1 provides a concise overview of these three methodological approaches, placing special emphasis on category III methods, which have been the subject of more intensive research compared with the other categories. These studies have centered on biochemical analytes, hematological parameters, and hormones integral to endocrine metabolism. Several studies [Citation39,Citation40] have also addressed tests in pregnancy.

Table 1. Methods and content of previous studies on establishment of next-generation reference intervals models.

While continuous RIs serve as the precursor for next-generation reference interval models, they are not innovative per se. The novel aspect is the application of innovative algorithms specifically tailored to utilize medical data in the EHR, that is, real-world data. It is through these novel computational methods that we can accurately establish and refine continuous reference intervals. The subsequent sections provide detailed descriptions of the methodology and algorithms employed for establishing next generation RIs using next-generation reference interval models.

Direct sampling techniques with modeling (category I methods)

This method utilizes data primarily from multicenter studies originally designed to establish reference intervals using direct sampling. Selected reference subjects are employed to construct an age-test curve model. In these studies, reference subjects are selected using direct sampling, in which individuals are sampled randomly from various centers. Based on when the exclusion criteria and partitioning are applied, methods can be divided into a priori and a posteriori approaches. Exclusion criteria encompass variables such as blood pressure, body mass index, smoking, and alcohol consumption that are sourced from questionnaires and onsite physical examinations. CLSI document EP28-A3c, Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory [Citation51], provides a comprehensive questionnaire detailing these criteria. Once the reference individuals' test results have been obtained, a curve model illustrating the age-test result relationship is developed. Several methodologies exist for this purpose.

Altman method [Citation52–54]

This method, based on Altman et al., starts with a Box-Cox power transformation or log transformation of the data. Weighted polynomial models are then developed for the mean of the test result (y1) and for its standard deviation (y2), each as a function of age. Finally, the models are used to establish reference intervals (y1 ± Z × y2) for every age within the age range; for a central 95% reference interval, Z equals 1.96. The calculated results are back-transformed to the original scale.
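As a rough illustration of this approach, the following R sketch assumes a data frame df with columns age and value (illustrative names), uses a log transformation in place of a full Box-Cox optimization, and estimates the age-dependent SD from the absolute residuals in the spirit of Altman's absolute-residual method.

```r
# Sketch of an Altman-type continuous reference interval (illustrative data
# frame 'df' with columns 'age' and 'value'; a log transformation stands in
# for a full Box-Cox optimization).
df$logv <- log(df$value)

# Step 1: age-dependent mean of the transformed result
fit_mean <- lm(logv ~ poly(age, 3), data = df)

# Step 2: age-dependent SD estimated from the absolute residuals
# (for normal data, SD ~= sqrt(pi/2) * mean absolute residual)
df$absres <- abs(resid(fit_mean))
fit_sd <- lm(absres ~ poly(age, 3), data = df)

# Step 3: limits y1 +/- 1.96 * y2 on the transformed scale, then back-transform
ages <- data.frame(age = seq(min(df$age), max(df$age), length.out = 100))
mu   <- predict(fit_mean, ages)
sdv  <- sqrt(pi / 2) * predict(fit_sd, ages)
ri   <- data.frame(age = ages$age,
                   lower = exp(mu - 1.96 * sdv),
                   upper = exp(mu + 1.96 * sdv))
head(ri)
```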

LMS (lambda-median-sigma) [Citation55–57]

This well-established and widely used technique, developed by Cole and Green, constructs age-related reference percentile curves [Citation55]. In this method, the distribution of the test result at each age is summarized by three numerical parameters, L (degree of skewness), M (central tendency), and S (dispersion), which are modeled as continuous functions of age. As with the Altman method, this method requires a Box-Cox or log transformation of the data, with back-transformation to the original scale after the age-specific distribution and the reference limits of the test have been calculated.
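A minimal LMS-style sketch in R, assuming the same illustrative data frame df: the BCCG (Box-Cox Cole and Green) family in the gamlss package corresponds to the LMS formulation, with L, M, and S modeled as smooth functions of age.

```r
# LMS-type centile curves via the gamlss package: the BCCG family is the
# Box-Cox Cole-Green distribution underlying the LMS method. Illustrative
# data frame 'df' with columns 'age' and 'value' is assumed.
library(gamlss)

fit_lms <- gamlss(value ~ pb(age),            # M: median curve
                  sigma.formula = ~ pb(age),  # S: coefficient of variation
                  nu.formula    = ~ pb(age),  # L: Box-Cox power (skewness)
                  family = BCCG, data = df)

# Age-specific 2.5th and 97.5th centile curves
centiles(fit_lms, xvar = df$age, cent = c(2.5, 97.5))
```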

GAMLSS model [Citation58–60]

GAMLSS stands for generalized additive models for location, scale, and shape. The GAMLSS framework was introduced by Rigby and Stasinopoulos in 2005. In this method, models are developed using distributional regression, in which all parameters of the conditional distribution of the response variable are modeled using the explanatory variables. In addition to the mean (or location) of the distribution, other aspects of the distribution, such as the variance, quantiles, skewness, kurtosis, and tails, may also need to be considered to establish appropriate models. More than 100 continuous, discrete, and mixed distributions are available in GAMLSS for modeling the response variable, which makes modeling flexible and controllable.
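A hedged GAMLSS sketch using a four-parameter family (BCT), again assuming the illustrative df; GAIC can be used to compare candidate distributions, and centiles.pred() returns any requested percentile over a grid of ages.

```r
# GAMLSS with a four-parameter family (BCT: Box-Cox t), modeling location,
# scale, skewness and kurtosis as smooth functions of age; illustrative 'df'
# with 'age' and 'value' assumed.
library(gamlss)

fit <- gamlss(value ~ pb(age),
              sigma.formula = ~ pb(age),
              nu.formula    = ~ pb(age),
              tau.formula   = ~ pb(age),
              family = BCT, data = df)

GAIC(fit, k = 2)   # generalized AIC, for comparison with competing families

# Predicted 2.5th and 97.5th percentiles over a grid of ages
grid_ages <- seq(min(df$age), max(df$age), length.out = 100)
ri <- centiles.pred(fit, xname = "age", xvalues = grid_ages,
                    cent = c(2.5, 97.5))
head(ri)
```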

Quantile regression [Citation61–65]

Quantile regression is a non-parametric method. Unlike linear regression, which estimates the model parameters by the least squares method, quantile regression estimates the model parameters using the weighted least absolute deviation method, minimizing the sum of the absolute residuals across samples. Linear regression also relies on assumptions of normality and homogeneity of variance for the residuals, which are often challenging to meet in practice; it is sensitive to outliers; and it fails to capture the full distribution of the response variable in many scenarios. In contrast, quantile regression addresses these limitations effectively: it makes no assumption about the residual distribution and provides robust results that are less influenced by outliers.
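A minimal quantile-regression sketch in R (quantreg package), assuming the illustrative df; a spline basis provides the flexibility discussed in the list below, although this simple fit does not by itself guarantee non-crossing curves.

```r
# Quantile regression for the 2.5th and 97.5th percentiles as functions of
# age, with a spline basis for flexibility (packages quantreg and splines;
# illustrative 'df' with 'age' and 'value' assumed).
library(quantreg)
library(splines)

fit_q <- rq(value ~ bs(age, df = 5), tau = c(0.025, 0.975), data = df)

newd   <- data.frame(age = seq(min(df$age), max(df$age), length.out = 100))
limits <- predict(fit_q, newdata = newd)   # one column per tau
head(cbind(newd, limits))
```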

The development of next-generation reference interval models through quantile regression often presents challenges:

  1. flexibility: each fitted quantile curve should possess sufficient flexibility to capture the nonlinear patterns observed across ages;

  2. comparison: the absence of an overall (likelihood-based) measure of fit, such as the Generalized Akaike Information Criterion, in the fitted quantile regression model poses challenges when comparing competing models;

  3. non-crossing: ensuring non-crossing behavior is crucial for the fitted quantile curves, particularly when dealing with nonlinear relationships, limited sample sizes, and sparse data; and

  4. instability of edges: the quantile curves near the extremes (i.e. for quantiles close to 0 or 100) vary more (i.e. are more erratic) than those in the center of the distribution of y, because those curves are supported by fewer observations.

Nonparametric estimation with radial smoothing

A kernel function is a function introduced in local regression to describe the local features of the data. A radial basis function is a basis function in a space of functions; it satisfies the condition that, for a fixed center point c, its value depends only on the distance of x from c, so that points equidistant from c receive the same value. There are many common radial functions, and the Gaussian function is one of them. The kernel method maps a data matrix to a new design matrix by using a kernel or a set of M basis functions. In this way, a complex nonlinear model can be fitted indirectly in the original input space. In the study of Wan et al. [Citation66], radial basis functions were used to establish the mean, variance, skewness, and kurtosis models. Then, the Edgeworth-Cornish-Fisher expansion was applied to approximate quantile functions. Finally, rearrangement was used to transform the original quantile estimate into a monotonic quantile estimate.
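The following sketch illustrates only the basis-expansion idea for the age-mean curve using Gaussian radial basis functions; the method of Wan et al. additionally models variance, skewness, and kurtosis and applies the Edgeworth-Cornish-Fisher expansion. The centers and bandwidth below are illustrative choices.

```r
# Gaussian radial-basis smoothing of the age-mean curve only; centers and
# bandwidth are illustrative tuning choices (illustrative 'df' assumed).
gauss_rbf <- function(x, centers, h) {
  sapply(centers, function(ck) exp(-(x - ck)^2 / (2 * h^2)))
}

centers <- quantile(df$age, probs = seq(0.1, 0.9, by = 0.1))  # M = 9 centers
h       <- diff(range(df$age)) / 10                           # bandwidth

B       <- gauss_rbf(df$age, centers, h)      # n x M design matrix
fit_rbf <- lm(df$value ~ B)                   # linear model in the basis

grid    <- seq(min(df$age), max(df$age), length.out = 100)
mu_hat  <- cbind(1, gauss_rbf(grid, centers, h)) %*% coef(fit_rbf)
```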

Double-kernel-based method [Citation67,Citation68]

In 2010, Li et al. developed the double kernel-based method and automatic bandwidth selection procedure [Citation67]. The method is nonparametric and based on the double-kernel percentile estimator proposed by Yu and Jones [Citation68]; it can be obtained using the following steps:

  1. Estimate the conditional distribution function of y given x using local-linear weighting and applying a kernel smoother with a specified bandwidth along the x-axis.

  2. Simultaneously, apply another kernel smoother with its own bandwidth along the y-axis.

  3. Finally, obtain the estimated percentile from the estimated conditional distribution.

The double-kernel estimator was favored over the single-kernel estimator by Yu and Jones because of its improved mean-squared-error properties and smoother appearance. In the method of Li et al., sample weights were incorporated into the double-kernel percentile estimator to fit the data.
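A toy double-kernel percentile estimator in the spirit of Yu and Jones, written directly in R: Gaussian kernel weights along the x-axis (simple Nadaraya-Watson weights here, rather than the local-linear weights described above) are combined with Gaussian smoothing along the y-axis, and the percentile is read off the smoothed conditional distribution function. Bandwidths are illustrative, not automatically selected.

```r
# Toy double-kernel conditional percentile estimator: kernel weighting along
# the x-axis plus Gaussian smoothing along the y-axis; the percentile is the
# root of the smoothed conditional CDF. hx and hy are illustrative bandwidths.
dk_percentile <- function(x, y, x0, p = 0.975, hx, hy) {
  w <- dnorm((x - x0) / hx)
  w <- w / sum(w)                                   # weights along the x-axis
  Fhat <- function(q) sum(w * pnorm((q - y) / hy))  # smoothed conditional CDF
  uniroot(function(q) Fhat(q) - p,
          interval = range(y) + c(-3, 3) * hy)$root
}

# Example: age-specific 97.5th percentile at age 40 (illustrative 'df')
dk_percentile(df$age, df$value, x0 = 40, p = 0.975,
              hx = 2, hy = sd(df$value) / 5)
```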

Indirect sampling techniques (1) with modeling (category II methods)

These methods closely mirror category I methods (direct sampling techniques combined with modeling) in their procedural approach. The primary distinction lies in their use of an indirect sampling technique to ascertain the underlying Gaussian distribution. As summarized in Table 2, data for this approach are extracted from individuals participating in routine physical examinations during periodic health screenings, as well as from blood bank donors, that is, individuals who are mostly healthy. Other data within the EHR are then used to filter out subjects who may exhibit abnormalities [Citation69]. To address outliers, methods such as Tukey's method [Citation70,Citation71] and latent abnormal value exclusion [Citation72] are applied. Once the outliers have been identified and excluded, a reference population is determined. Subsequently, the next-generation reference interval model is constructed following the modeling methodologies delineated under category I methods.
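A minimal sketch of Tukey's fences for one age/sex partition, assuming a vector of results named values; skewed analytes are usually transformed before the fences are applied.

```r
# Tukey's fences for one age/sex partition before modeling; 'values' is an
# assumed vector of results from that partition, log-transformed here because
# many analytes are right-skewed.
tukey_filter <- function(x, k = 1.5) {
  q   <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x[x >= q[1] - k * iqr & x <= q[2] + k * iqr]
}

clean <- tukey_filter(log(values))
```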

Table 2. Comparison of three methodologies for establishing next-generation reference interval models.

Indirect sampling techniques (2) with curve fitting (category III methods)

Over the last decade, there has been growing interest in constructing next-generation reference interval models utilizing outpatient and inpatient data. The methodological flow of these investigations encompasses three principal steps, which are depicted in Figure 3 [Citation10,Citation12]:

Figure 3. Procedure for developing a next-generation reference interval model based on indirect sampling (2) and curve fitting.
  1. Segment the input data into overlapping age categories.

  2. For each age group, establish discrete reference intervals using indirect sampling techniques.

  3. Transition from discrete reference intervals to continuous percentile curves.

Category III methods are generally based on hospitalized patients and use mixed datasets extracted from hospital EHRs, which contain a higher proportion of pathological results than the datasets used for category II methods, which are based on mostly healthy individuals. Initially, category III methods did not incorporate extensive data filtration; as research progressed, more rigorous steps were integrated to refine the data. In category III methods, advanced algorithms are used to extract the Gaussian distribution. This approach is gaining traction because of the maturation of the tools used and the availability of large patient datasets. As indicated in Table 1, this methodology has been used extensively in the development of pediatric reference intervals.

Indirect sampling (2)

The common approaches for indirect sampling (2) used in the establishment of next-generation reference interval models consist of Hoffmann’s approach, the Bhattacharya approach, the expectation-maximization algorithm, the kosmic approach, the refineR algorithm, and the truncated minimum chi-square (TMC) algorithm. Among them, the TMC and kosmic algorithms are used most often. These methods have been summarized in our previous studies [Citation73–75]. The algorithms are briefly described here.

Hoffmann

The Hoffmann method is a graphical algorithm [Citation76,Citation77]. First, the measurement values are arranged in ascending order and the cumulative probability for each value is calculated. The probabilities (y-axis) are plotted against the measurement values (x-axis) on normal probability paper; the y-axis can also be marked with the Z value of the standard normal distribution [Citation77]. In the plot, distinct linear zones correspond to the subgroups representing health and disease. The lower and upper limits of the reference interval are determined by extending the linear region of the healthy subgroup and calculating the x-axis values corresponding to the 2.5th and 97.5th percentiles (Z = -1.96 and 1.96, respectively) from the fitted line. The Hoffmann algorithm's dependence on visual inspection may introduce subjectivity into the results of the analysis and impedes automated analysis. Although two studies [Citation78,Citation79] have enhanced Hoffmann's method to enable automatic selection of the linear region, some scholars [Citation77,Citation80] contend that this modification alters the fundamental essence of the Hoffmann approach and consequently affects the confidence level associated with the 95% reference interval; thus, the improved Hoffmann method remains controversial [Citation81–83]. Consequently, incorporating this algorithm into category III techniques for constructing next-generation reference interval models is challenging.
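A simple R sketch of the Hoffmann idea on the Z scale, assuming a mixed result vector x; the "linear region" is chosen here with a fixed cut purely for illustration, since in the original method it is identified by visual inspection.

```r
# Hoffmann-type estimation on the Z scale; 'x' is an assumed vector of mixed
# results, and the linear-region cut is illustrative only.
x <- sort(x)
p <- (seq_along(x) - 0.5) / length(x)   # cumulative probabilities
z <- qnorm(p)                           # normal-probability-paper scale

plot(x, z, cex = 0.3)                   # inspect the plot, identify the linear zone

lin <- z > -1 & z < 1                   # illustrative choice of the linear zone
fit <- lm(z[lin] ~ x[lin])              # z = a + b * x over that region

lower <- (-1.96 - coef(fit)[1]) / coef(fit)[2]   # x at z = -1.96
upper <- ( 1.96 - coef(fit)[1]) / coef(fit)[2]   # x at z = +1.96
c(lower, upper)
```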

Bhattacharya

This method involves applying a logarithmic transformation to the frequency distribution Y(x) and considering the derivative of the resulting log-density with respect to x; for the Gaussian (healthy) component, this derivative is a linear function of x (Equation 1):

(1) \( \dfrac{d\,\ln Y(x)}{dx} = -\dfrac{1}{\sigma^{2}}\,x + \dfrac{\mu}{\sigma^{2}} \)

To determine the mean and standard deviation of the healthy distribution, a scatter plot of \( d\,\ln Y(x)/dx \) against x is drawn manually and a straight line is fitted to the linear region observed in the plot. The slope and intercept of this line are then used to calculate these parameters, and the calculated mean and standard deviation are used to compute the reference interval [Citation84]. It is important to note that both Hoffmann's and Bhattacharya's methods have been applied successfully to analytes with a normal distribution. However, caution should be exercised when dealing with analytes exhibiting skewed distributions, as they may yield unreliable RIs. Additionally, as with the Hoffmann algorithm, the Bhattacharya algorithm relies on visual inspection, which can introduce subjectivity into the analysis and hinders automated analysis. Therefore, it is difficult to embed this algorithm in category III techniques for building next-generation reference interval models.
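A rough R sketch of the Bhattacharya idea, assuming a mixed result vector x; the bin width and linear region are illustrative choices that would normally be made by inspecting the plot. From Equation 1, the fitted slope equals -1/σ² and the intercept equals μ/σ².

```r
# Bhattacharya-type estimation; 'x' is an assumed vector of mixed results.
h  <- diff(pretty(range(x), n = 40))[1]             # bin width
br <- seq(min(x), max(x) + h, by = h)
counts <- hist(x, breaks = br, plot = FALSE)$counts
mid <- head(br, -1) + h / 2                         # bin midpoints

keep <- counts > 0 & c(counts[-1], 0) > 0
d <- (log(c(counts[-1], NA)) - log(counts)) / h     # approximates d ln Y(x) / dx
plot(mid[keep], d[keep])                            # inspect the linear region

lin <- keep & mid > quantile(x, 0.3) & mid < quantile(x, 0.7)   # illustrative cut
fit <- lm(d[lin] ~ mid[lin])
a <- coef(fit)[1]; b <- coef(fit)[2]
sigma <- sqrt(-1 / b)                               # slope     = -1 / sigma^2
mu    <- -a / b                                     # intercept =  mu / sigma^2
c(mu - 1.96 * sigma, mu + 1.96 * sigma)             # reference limits
```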

Expectation-maximization

The expectation-maximization algorithm is an iterative algorithm that consists of two steps: the expectation (E) step and the maximization (M) step [Citation85]. Given the current model parameters, the E step calculates, for each individual, the probability of belonging to each component. Using these probabilities, the M step then re-estimates the model parameters. The two steps alternate until the convergence conditions are met. Finally, at the end of the process, the RI is established based on the final parameters obtained from the analysis of the data distribution.

Compared to Hoffmann and Bhattacharya, the expectation-maximization algorithm allows a higher level of automation; however, it does necessitate parameter adjustment. If one does not actively engage in the parameter adjustment process, calculated reference intervals may become highly unreasonable. Consequently, incorporating this method into category III techniques for developing next-generation reference interval models poses a challenge.
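A minimal sketch using the normalmixEM() function of the R mixtools package to fit a two-component Gaussian mixture by expectation-maximization; taking the larger-weight component as the non-pathological one is a simplifying assumption that must be verified in practice.

```r
# Two-component Gaussian mixture fitted by expectation-maximization with the
# mixtools package; skewed analytes are transformed first, and selecting the
# larger-weight component as "healthy" is a simplifying assumption.
library(mixtools)

em <- normalmixEM(log(x), k = 2)        # 'x': assumed vector of mixed results
healthy <- which.max(em$lambda)         # crude choice of the "healthy" component
mu <- em$mu[healthy]; s <- em$sigma[healthy]
exp(c(mu - 1.96 * s, mu + 1.96 * s))    # back-transformed reference limits
```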

kosmic

The proposed approach involves estimating the distribution of physiological test results by analyzing a mixture of normal and abnormal distributions [Citation86]. Initially, the data is transformed using the Box-Cox transformation, followed by fitting a Gaussian distribution to a truncated section of the dataset. Subsequently, the Kolmogorov-Smirnov distance between this Gaussian distribution and the observed truncated distribution is calculated. The distribution with the smallest Kolmogorov-Smirnov distance is then chosen as representative data for individuals with good health. Finally, this estimated distribution is utilized to calculate the reference interval. The algorithm allows a high level of automation and is robust, making it one of the prominent methods in establishing next-generation reference interval models.
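The following is a toy illustration of the kosmic idea rather than the published implementation: it uses a fixed log transform, a coarse grid of truncation intervals, truncated-normal maximum likelihood, and a crude Kolmogorov-Smirnov distance, whereas kosmic optimizes a Box-Cox parameter and searches much more finely.

```r
# Toy illustration only: grid-search truncation intervals, fit a truncated
# normal by maximum likelihood, and keep the fit with the smallest KS distance
# to the truncated empirical distribution.
kosmic_like <- function(x) {
  z <- sort(log(x[x > 0]))
  best <- list(ks = Inf)
  nll <- function(par, zt, a, b) {        # truncated-normal negative log-likelihood
    mu <- par[1]; s <- exp(par[2])
    -sum(dnorm(zt, mu, s, log = TRUE)) +
      length(zt) * log(pnorm(b, mu, s) - pnorm(a, mu, s))
  }
  for (p_lo in seq(0.05, 0.30, 0.05)) {
    for (p_hi in seq(0.70, 0.95, 0.05)) {
      ab <- quantile(z, c(p_lo, p_hi))
      zt <- z[z >= ab[1] & z <= ab[2]]
      fit <- optim(c(mean(zt), log(sd(zt))), nll, zt = zt, a = ab[1], b = ab[2])
      mu <- fit$par[1]; s <- exp(fit$par[2])
      Ft <- (pnorm(zt, mu, s) - pnorm(ab[1], mu, s)) /
            (pnorm(ab[2], mu, s) - pnorm(ab[1], mu, s))   # truncated-normal CDF
      ks <- max(abs(seq_along(zt) / length(zt) - Ft))     # KS distance
      if (ks < best$ks) best <- list(ks = ks, mu = mu, s = s)
    }
  }
  exp(c(lower = best$mu - 1.96 * best$s, upper = best$mu + 1.96 * best$s))
}
```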

refineR

The refineR algorithm employs an inverse modeling approach and can be divided into three steps [Citation87]. First, the parameter search region and the main peak are carefully selected. Second, a comprehensive multi-level grid search is performed to determine the optimal model parameters, which include λ (power), μ, σ, and P (scaling factor). Third, the optimal model is used to calculate RIs. This relatively new algorithm exhibits superior stability and accuracy compared with the kosmic algorithm. Despite the limited number of modeling studies utilizing this approach, its potential for practical application in establishing next-generation reference interval models is promising, and it has performed well in several comparative studies [Citation74,Citation75,Citation88]. Recently, Ammer et al. developed a new method for establishing next-generation reference interval models based on refineR and GAMLSS [Citation25]. This novel method can automatically and objectively optimize the smoothing strength. Furthermore, in comparison with other category III methodologies that fit curves using nonparametric methods, this approach employs distribution parameters estimated through an indirect technique. This allows a weighted contribution from all data points and effectively controls the influence of patient results on the estimation of the smooth curves. By utilizing the GAMLSS approach, the resulting semi-parametric model enables computation of any percentile for any given value of the explanatory variable. Because this is a recent model and research using it is limited, this methodology is classified as a special approach within category III methods of establishing next-generation reference intervals.
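A hedged sketch of a refineR call in R; the findRI()/getRI() function names are as recalled from the package and should be checked against its current documentation. The vector values stands for mixed results extracted from the EHR.

```r
# Sketch of a refineR call (function names to be verified against the current
# package documentation); 'values' is an assumed vector of mixed EHR results.
library(refineR)

fit <- findRI(Data = values)             # inverse modeling of the non-pathological distribution
getRI(fit, RIperc = c(0.025, 0.975))     # estimated reference limits
plot(fit)                                # visual check of the estimated model
```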

Truncated minimum chi-square (TMC)

The TMC algorithm [Citation48] is well-suited for analyzing data distributions that exhibit high skewness and contain a substantial proportion of data at or below the detection limit. The approach begins with utilizing a quantile-quantile plot to obtain initial estimates for the parameters (λ, μ, σ) of the assumed power normal distribution (Note that while the parameters λ, μ and σ here in the TMC method serve a similar role to the LMS parameters in that they describe a distribution’s skewness, median, and variation, they are derived and applied differently in the two methods.). These estimates are then refined iteratively using the TMC estimation. The resulting parameters are used to calculate a 95% reference interval. Confidence intervals for the reference limits are determined using an asymptotic formula for quantiles, while tolerance limits are obtained through bootstrapping techniques. The TMC method [Citation46,Citation47] is employed widely in establishing next-generation reference interval models because of its inherent stability and ease of automation.

Fitting percentile curves

Curve fitting, the third step of category III techniques, involves determining the relationship between the reference limits and age. The approaches used can be parametric methods or smoothers; smoothers use nonparametric approaches to determine the relationship between the explanatory variable (x) and the response (y). The commonly used approaches are described below.

Polynomial regression

This is a linear regression model employed to capture nonlinear relationships, thereby overcoming the limitation of simple linear regression, which accommodates only linear associations between the explanatory variable (x) and the response (y). Polynomial regression incorporates additional polynomial features, where the β terms denote the model parameters to be estimated. The model parameters of polynomial regression can be estimated through either the method of least squares or gradient descent (Equation 2):

(2) \( f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \beta_p x^p \)

In particular, when the powers of the terms in the regression are allowed to be negative or non-integer, the model becomes a fractional polynomial. This approach, pioneered by Royston and Altman in 1994 [Citation89], aims to achieve accurate curve fitting with a minimal number of polynomial terms.

However, the use of polynomial regression has several challenges. Firstly, as the number of terms increases, while the data may be increasingly well-fitted, overfitting becomes a concern. In other words, while the current data may exhibit a perfect fit, there is a high likelihood of poor performance when it is applied to new or modified data. Secondly, collinearity among the terms is an issue because correlations between different variables can lead to difficulties in accurately estimating the individual effect of each variable. Lastly, polynomial regression employs a single function to model global data; however, in many cases, distinct patterns within different subsets of the dataset emerge that necessitate the use of diverse functions for accurate fitting.
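A brief R sketch of polynomial regression for the age trend, assuming the illustrative df; poly() uses orthogonal polynomials, which mitigates the collinearity issue noted above, and an F-test comparison gives a simple check against adding unnecessary terms.

```r
# Polynomial regression of the test result on age (illustrative 'df').
fit3 <- lm(value ~ poly(age, 3), data = df)
fit5 <- lm(value ~ poly(age, 5), data = df)
anova(fit3, fit5)    # does the higher-degree model improve the fit materially?

newd <- data.frame(age = seq(min(df$age), max(df$age), length.out = 100))
mean_curve <- predict(fit3, newd)
```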

Regression spline

Data may exhibit sudden changes at specific values. To address such a data modeling scenario, piecewise polynomial regression can be employed. The x value at which adjacent segments meet is referred to as a knot or breakpoint. Piecewise polynomial regression, also known as regression spline according to Smith [Citation90], requires continuity of the fitted function and its derivatives at these knots. Commonly utilized splines include B-splines and natural splines. Splines are defined as in Equation 3, where D is the degree of the polynomial in x and K is the number of knots; when D = 3, the cubic spline is obtained. Cubic splines are particularly useful for smoothing techniques because their first and second derivatives are continuous at the breakpoints, resulting in exceptionally smooth curves. B-splines [Citation91,Citation92], or basis splines, are a set of smooth, piecewise polynomials that are defined over a certain range and are used extensively in numerical analysis and data fitting. They are particularly useful because of their stability and minimal support with respect to a given partition of the domain. B-splines of a given degree are continuous and have continuous derivatives up to a certain order. Each B-spline basis function is non-zero only over a small interval of the domain, which makes them computationally attractive for modeling complex shapes and functions.

(3) \( f(x) = \sum_{j=0}^{D} \beta_{0j}\, x^{j} + \sum_{i=1}^{K} \beta_{i}\, (x - b_i)^{D}\, H(x - b_i) \)

Here \( b_i \) denotes the i-th knot and \( H(x - b_i) \) is the Heaviside step function: \( H(t) = 0 \) for \( t < 0 \) and \( H(t) = 1 \) for \( t \ge 0 \).

The use of B-splines for truncated piecewise polynomials is analogous to that of orthogonal polynomials for polynomial regression. The number of knots has a direct impact on the model’s degrees of freedom and subsequently affects its complexity. A higher number of knots leads to increased model complexity, larger calculations, and reduced curve smoothness. Therefore, careful consideration should be given to selecting the appropriate number of knots, typically ranging from 3 to 5. One limitation with B-splines is that they tend to exhibit high variance in prediction at both ends of the x value range, resulting in wide confidence intervals. To address this issue, natural splines that impose boundary restrictions on regression splines are used; the assumption of linearity in the leftmost and rightmost segments enhances accuracy in predicting boundaries.
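A short R sketch fitting a cubic B-spline and a natural spline to the age trend with ordinary least squares, assuming the illustrative df; the knot positions are illustrative.

```r
# Cubic B-spline and natural-spline fits of the age trend (package splines).
library(splines)

fit_bs <- lm(value ~ bs(age, knots = c(5, 10, 15), degree = 3), data = df)  # B-spline
fit_ns <- lm(value ~ ns(age, df = 4), data = df)                            # natural spline

newd   <- data.frame(age = seq(min(df$age), max(df$age), length.out = 100))
curves <- data.frame(age     = newd$age,
                     bspline = predict(fit_bs, newd),
                     natural = predict(fit_ns, newd))
```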

Smoothing spline

For the development of next-generation models, the main considerations include model goodness of fit and model generalization; that is, the fit should be good and the model should represent the original data as closely as possible. In the best case, the model can accurately predict every data point; however, such a model's ability to predict a new data set may be poor, that is, its generalization ability will be poor. This trade-off also exists in the development of next-generation reference interval models. Thus, smoothing splines [Citation93] have been introduced (Equation 4):

(4) \( \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2 + \lambda \int \left[ f''(x) \right]^2 dx \)

The smaller the first term, \( \sum_{i=1}^{n} [y_i - f(x_i)]^2 \), in Equation 4, the lower the fitting error; however, minimizing it alone increases the risk of overfitting. The second term, \( \lambda \int [f''(x)]^2 dx \), quantifies the unsmoothness of the function, and λ serves as a penalty on model complexity: the penalty becomes more severe and the curve becomes smoother as λ increases. λ is typically determined through cross-validation. The fundamental principle of smoothing splines is to combine a collection of B-spline basis functions with a smoothing penalty that controls the smoothness of the resulting curve. The B-splines in this framework typically have degree three, hence cubic splines are frequently used in smoothing splines. When the degree is set to three, smoothing splines are piecewise cubic polynomials, and linear functions are often utilized for the outermost segments to ensure stability at the boundaries of the data.
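A minimal smoothing-spline sketch in base R, assuming the illustrative df; in the curve-fitting step of category III methods the response would typically be the discrete reference limits estimated per age group rather than raw results.

```r
# Smoothing spline with the penalty parameter chosen by cross-validation.
fit_ss <- smooth.spline(df$age, df$value, cv = TRUE)
fit_ss$lambda                                          # selected smoothing parameter
pred <- predict(fit_ss, x = seq(min(df$age), max(df$age), length.out = 100))
```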

Local regression

This non-parametric curve-fitting method involves selecting a small local neighborhood, also known as the bandwidth, around each value of the independent variable. Points within this neighborhood are used to fit a local zeroth-degree, linear, or quadratic polynomial. When the polynomial fitted in the local neighborhood is of degree zero, the method is referred to as kernel regression [Citation94]; when the local fit involves linear or quadratic polynomials, the approach is called local polynomial regression. Specifically, in local regression, the number and positions of the fitting points are determined first. Subsequently, the m nearest points are identified with x_i as the central point of the fit, and the weights of these m points are computed using weight functions. Finally, linear or quadratic polynomial fitting is performed through weighted regression in the local neighborhood, and the value of the regression function at x_i is taken as the fitted value. The final curve is drawn after repeating the above steps for all of the original data points. In kernel regression, after obtaining the weights around x_i, the fitted value \( \hat{y}_i \) can be calculated directly using Equation 5:

(5) \( \hat{y}_i = w_1 y_1 + w_2 y_2 + \dots + w_m y_m \)

where \( w_j \) is the weight of the j-th of the m raw data points in the neighborhood centered on x_i; the closer a point is to x_i, the greater its weight. The selection of the bandwidth plays a crucial role in model performance: a wider bandwidth results in a smoother curve, potentially leading to underfitting, whereas a narrower bandwidth can lead to a more intricate curve and increases the risk of overfitting.
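A short sketch of local regression with loess() and zeroth-degree kernel regression with ksmooth() in base R, assuming the illustrative df; span and bandwidth control the neighborhood size discussed above.

```r
# Local quadratic regression (loess) and kernel regression (ksmooth) for the
# age trend; 'span' and 'bandwidth' set the neighborhood size.
fit_lo   <- loess(value ~ age, data = df, span = 0.5, degree = 2)
grid     <- seq(min(df$age), max(df$age), length.out = 100)
lo_curve <- predict(fit_lo, newdata = data.frame(age = grid))

kr_curve <- ksmooth(df$age, df$value, kernel = "normal",
                    bandwidth = 2, x.points = grid)
```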

Comparison of the three methods

The three methods for constructing next-generation reference interval models differ primarily in their choice of reference individuals and curve-fitting approaches.

The primary distinction among the three categories of techniques is their approach to data selection and refinement. In category I methods, reference subjects are selected using direct sampling. Category II methods rely primarily on datasets with a comparatively low proportion of pathological results (individuals undergoing routine physical examinations or blood donors), and these methods focus on outlier identification. Category III methods use mixed datasets that have a higher proportion of pathological results and use algorithms to extract the Gaussian distribution. Further distinctions are detailed in Table 2. Compared with the direct sampling technique, the indirect sampling approach presents clear advantages in cost-effectiveness and operational feasibility for model development. Notably, category III techniques, which combine indirect sampling technique (2) with percentile curve fitting, depend on reference limits acquired through indirect sampling, whereas the first two categories use the original data.

It should be emphasized that the pipeline developed by Ammer et al. [Citation25] is a special method based on the use of refineR and GAMLSS, and it has attracted growing attention. It utilizes the refineR algorithm to separate non-pathological distributions from mixed data, and the results of individuals belonging to the non-pathological distribution are used to establish the next-generation reference interval model with GAMLSS. Unlike the other algorithms used in category III techniques, this method does not rely on reference limits obtained from an indirect sampling technique to fit curves; however, it also does not use the original data directly. In summary, considering that this class of methods is relatively new and that an indirect sampling technique (2) is applied, this method is classified as a special subclass of category III methods.

Diagnosis, validation and testing of next-generation reference interval models

The process of model diagnosis and validation is paramount in ensuring the precision and reliability of the finalized model. Diagnostic evaluations are indispensable for certain parametric, semi-parametric, and even non-parametric models. For all models, validation is essential. This segment offers an overview of the methods for diagnosis and validation.

Normalized quantile residuals for diagnostics

The raw residuals, as well as the Pearson residuals, pose challenges when generalizing to distributions beyond the normal. Additionally, some response variables in the models exhibit extreme skewness. To address these issues and enhance parametric model diagnostics, normalized quantile residuals have been introduced [Citation95,Citation96]. Calculating normalized quantile residuals involves determining the probability density function and cumulative distribution function of the observed value y, evaluating the cumulative distribution function at y to obtain F(y), and finally finding the value on the standard normal distribution whose cumulative probability equals F(y); this value is the residual. Fit residuals are computed using a similar approach. In a correct model, both residuals and fit residuals should conform to a standard normal distribution. For visual inspection of normality, the GAMLSS package in R provides several residual-related plots (Figure S1); adherence of the residuals in these plots to a standard normal distribution indicates satisfactory model performance. Worm plots [Citation97,Citation98] (Figure S2), used primarily for assessing whether the fitted distribution is appropriate, show a good fit when the points lie between the two curves and align closely with the central horizontal line. The Q-statistic plot (Figure S3) is used to detect deviations from normality with respect to the mean (z1), variance (z2), skewness (z3), and kurtosis (z4) of the residuals. Squares on this plot indicate absolute Z-values exceeding 2, signaling potential misspecification of certain distributional parameters in the model [Citation99].
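Assuming a gamlss fit such as the one sketched earlier, the standard diagnostic calls are shown below; wp() and Q.stats() produce the worm plot and Q statistics described in this section.

```r
# Residual diagnostics for a fitted gamlss model ('fit' as in the earlier
# GAMLSS sketch): normalized quantile residuals, worm plot and Q statistics.
library(gamlss)

plot(fit)                      # residual plots, incl. QQ plot of quantile residuals
wp(fit, xvar = df$age)         # worm plot, optionally split across age intervals
Q.stats(fit, xvar = df$age)    # Q statistics for mean, variance, skewness, kurtosis
```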

k-fold cross validation

k-fold cross-validation, one of several methods for internal validation, is used primarily for comparing competing models. The dataset is first partitioned into k subsets, with one subset designated as the validation set and the remaining k-1 subsets used for training. The model's goodness of fit is then computed on the validation set, and after repeating this process k times, with each subset serving once as the validation set, the average result across iterations is used as the final evaluation metric for model validation [Citation31,Citation32].
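A generic k-fold cross-validation sketch in R comparing two candidate age-curve models by out-of-fold error, assuming the illustrative df; the same scheme can wrap more complex models.

```r
# Generic 5-fold cross-validation comparing two candidate age-curve models by
# out-of-fold root mean squared error.
library(splines)
set.seed(1)
k    <- 5
fold <- sample(rep(1:k, length.out = nrow(df)))
rmse <- matrix(NA, nrow = k, ncol = 2,
               dimnames = list(NULL, c("poly3", "nspline5")))

for (i in 1:k) {
  train <- df[fold != i, ]
  test  <- df[fold == i, ]
  m1 <- lm(value ~ poly(age, 3), data = train)
  m2 <- lm(value ~ ns(age, df = 5), data = train)
  rmse[i, 1] <- sqrt(mean((test$value - predict(m1, test))^2))
  rmse[i, 2] <- sqrt(mean((test$value - predict(m2, test))^2))
}
colMeans(rmse)   # lower average out-of-fold error indicates the preferred model
```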

Calculating the fraction outside the reference interval

The methodology for determining the fraction outside the reference interval (FOR) [Citation34,Citation38] is illustrated in Figure S4. Data from healthy individuals, distinct from the modeling dataset, are curated to form the test set. From this collection, a random 30% is designated as the test subset. Utilizing the pre-established model, the test results of the individuals in this subset are compared against the reference intervals corresponding to their sex and age, and the percentage of individuals falling outside the corresponding reference interval is computed. This evaluation process is iterated 100 times, yielding multiple FOR values, and a 95% confidence interval is derived from their distribution. For a next-generation reference interval model with a two-sided 95% interval, the expected percentage of observations falling outside the reference interval is 5%, split evenly with 2.5% expected beyond each of the upper and lower limits. Theoretically, the 95% confidence interval of the FOR value for each limit should encompass this 2.5% threshold. If it does not, the input parameters of the modeling approach should be adjusted to account for the observed discrepancies, and the modeling is iteratively refined until the desired specifications are satisfied.
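A schematic of the FOR computation, assuming a held-out data frame test_set with columns age, sex, and value and a hypothetical helper ri_limits() that returns the model's age- and sex-specific limits.

```r
# Schematic FOR check; 'test_set' and ri_limits() are assumptions for
# illustration, not part of any published package.
set.seed(1)
for_values <- t(replicate(100, {
  idx <- sample(nrow(test_set), size = round(0.3 * nrow(test_set)))
  sub <- test_set[idx, ]
  lim <- ri_limits(sub$age, sub$sex)           # data frame with columns lower, upper
  c(low  = mean(sub$value < lim$lower),        # fraction below the lower limit
    high = mean(sub$value > lim$upper))        # fraction above the upper limit
}))
apply(for_values, 2, quantile, probs = c(0.025, 0.975))  # each interval should cover 2.5%
```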

Tools for establishing big data driven next-generation reference interval models

The tools and code sources commonly employed for establishing next-generation reference interval models are summarized in Table 3. Currently, the predominant modeling approach relies on programming, with R [Citation100] and Python [Citation101] being the primary languages utilized because of their strong functionality and easy-to-use packages and libraries. RStudio [Citation102] and PyCharm [Citation103] provide convenient platforms for programming in R and Python, and MedCalc or SPSS can also be used.

Table 3. Tools for establishment of next-generation reference interval model.

Future perspectives of the next-generation reference interval models based on data from clinical laboratories

Novel report pattern of the next-generation reference interval model

Next-generation reference interval models can be represented using quantile plots or normalized Z-score plots, which offer the advantage of eliminating the influence of units when comparing test results across different platforms. In future reports, a single RI may be presented for the "non-elderly adult" (or simply adult) population, that is, those under 60 or 65 years of age, as depicted in Figure 4. This interval will offer a baseline against which the health metrics of "older adults", those over 60 or 65 years, can be normalized. By comparing the laboratory values of older adults with this RI, the extent of their deviation from baseline health status attributable to the aging process can be delineated. This comparison is designed to enrich clinical decision-making by providing a clear benchmark for identifying age-related changes in health parameters. Furthermore, while sex and age are often used in model construction because they are readily available in patient records, other determinants also play a role. The emphasis on age and sex in this review arises from their universal accessibility, e.g. in the EHR, making models based on them particularly practical in everyday clinical work. Genetic background and other variables are also important, but the challenge lies in the lack of availability of such data for all patients, which may limit the implementation of such models in real-world scenarios. Nevertheless, it is our belief that acknowledging the many influences on test results is paramount. In the future, to cater to the intricacies of multi-variable scenarios, one approach might be to construct segmented models; ideally, integrating these variables for model adjustment would be the optimal route and would significantly enhance the precision and dependability of the model.

Figure 4. Novel report pattern of the next-generation reference interval.

Develop an information system that can use next-generation reference interval models

Our team has developed a multi-center big data analysis platform for real-world studies of clinical laboratory data that can be utilized in training, validating, and testing next-generation reference interval models. Additionally, we aim to establish data interfaces between our platform and both the clinical laboratory information system and the hospital information system to transfer quantile values derived from the model. In the future, test reports will include plots of patient results in next-generation reference interval model diagrams, providing clinicians with a visual basis for decision-making. Importantly, upon opening the interface, data will be analyzed and transmitted in real-time, allowing regular re-estimation of next-generation reference interval model parameters using the new data. The dynamic nature of next-generation reference intervals based on next-generation reference interval models will enable more informed evaluation of patient results.

Authors’ contributions

Chaochao Ma wrote and revised this manuscript. Zheng Yu provided guidance on the mathematics. Ling Qiu made suggestions for the revision of the manuscript. All authors reviewed the manuscript and approved the submission.

Consent for publication

All the authors gave their consent for publication.

Abbreviations

EHR = electronic health record
FOR = fraction outside the reference interval
GAMLSS = generalized additive models for location, scale, and shape
RI = reference interval
TMC = truncated minimum chi-square

Supplemental material

Supplemental materials w legends.docx


Disclosure statement

This review is part of the Real-World Study of Big Data Mining in Laboratory Medicine and has been conducted in accordance with the Declaration of Helsinki. This review was approved by the Ethics Committee of Peking Union Medical College & Chinese Academy of Medical Sciences, Peking Union Medical College Hospital (approval number: S-K1192). All methods were performed in accordance with relevant guidelines and regulations.

Data availability statement

The authors declare that all data generated or analyzed during this study are included in this article.

Additional information

Funding

This study was supported by the National Natural Science Foundation of China (72274218).

References

  • Sherman RE, Anderson SA, Dal Pan GJ, et al. Real-world evidence - What is it and what can it tell us? N Engl J Med. 2016;375(23):2293–2297. doi: 10.1056/NEJMsb1609216.
  • Zheng C, Zhou W, Zhou R, et al. The necessity for improving lipid testing reagents: a real world study. Clin Chim Acta. 2023;548:117529. doi: 10.1016/j.cca.2023.117529.
  • Lux MP, Lewis K, Rider A, et al. Treatment patterns, safety, and patient reported outcomes among adult women with human epidermal growth factor receptor 2-negative advanced breast cancer with or without, or with unknown, BRCA1/2 mutation(s): results of a real-world study from the United States, United Kingdom, and four EU countries. Breast Care (Basel). 2022;17(5):460–469. doi: 10.1159/000523970.
  • Zhang W, Wang Y, Li E, et al. Neuropsychiatric adverse events following antiretroviral therapy in people living with HIV: a real-world study of dynamic trends and risk factors in Hangzhou, China. Infect Drug Resist. 2023;16:5007–5019. doi: 10.2147/IDR.S419308.
  • Li XN, Huang Y, Wang W, et al. Effectiveness of inactivated SARS-CoV-2 vaccines against the delta variant infection in Guangzhou: a test-negative case-control real-world study. Emerg Microbes Infect. 2021;10(1):1751–1759. doi: 10.1080/22221751.2021.1969291.
  • Ma C, Wang X, Wu J, et al. Real-world big-data studies in laboratory medicine: current status, application, and future considerations. Clin Biochem. 2020;84:21–30. doi: 10.1016/j.clinbiochem.2020.06.014.
  • Adeli K. Closing the gaps in pediatric reference intervals: the CALIPER initiative. Clin Biochem. 2011;44(7):480–482. doi: 10.1016/j.clinbiochem.2011.02.017.
  • Ceriotti F. Establishing pediatric reference intervals: a challenging task. Clin Chem. 2012;58(5):808–810. doi: 10.1373/clinchem.2012.183483.
  • Hoq M, Canterford L, Matthews S, et al. Statistical methods used in the estimation of age-specific paediatric reference intervals for laboratory blood tests: a systematic review. Clin Biochem. 2020;85:12–19. doi: 10.1016/j.clinbiochem.2020.08.002.
  • Zierk J, Hirschmann J, Toddenroth D, et al. Next-generation reference intervals for pediatric hematology. Clin Chem Lab Med. 2019;57(10):1595–1607. doi: 10.1515/cclm-2018-1236.
  • Ma C, Xia L, Chen X, et al. Establishment of variation source and age-related reference interval models for 22 common biochemical analytes in older people using real-world big data mining. Age Ageing. 2020;49(6):1062–1070. doi: 10.1093/ageing/afaa096.
  • Zierk J, Arzideh F, Haeckel R, et al. Indirect determination of pediatric blood count reference intervals. Clin Chem Lab Med. 2013;51(4):863–872. doi: 10.1515/cclm-2012-0684.
  • Zierk J, Arzideh F, Rechenauer T, et al. Age- and sex-specific dynamics in 22 hematologic and biochemical analytes from birth to adolescence. Clin Chem. 2015;61(7):964–973. doi: 10.1373/clinchem.2015.239731.
  • Wang D, Yu S, Ma C, et al. Reference intervals for thyroid-stimulating hormone, free thyroxine, and free triiodothyronine in elderly Chinese persons. Clin Chem Lab Med. 2019;57(7):1044–1052. doi: 10.1515/cclm-2018-1099.
  • Zou Y, Wang D, Cheng X, et al. Reference intervals for thyroid-associated hormones and the prevalence of thyroid diseases in the Chinese population. Ann Lab Med. 2021;41(1):77–85. doi: 10.3343/alm.2021.41.1.77.
  • Ma C, Li D, Yin Y, et al. Establishing thresholds and effects of gender, age, and season for thyroglobulin and thyroid peroxidase antibodies by mining real-world big data. Clin Biochem. 2019;74:36–41. doi: 10.1016/j.clinbiochem.2019.08.011.
  • Wang D, Ma C, Zou Y, et al. Gender and age-specific reference intervals of common biochemical analytes in Chinese population: derivation using real laboratory data. J Med Biochem. 2020;39(3):384–391.
  • Ozarda Y. Reference intervals: current status, recent developments and future considerations. Biochem Med (Zagreb). 2016;26(1):5–16. doi: 10.11613/BM.2016.001.
  • Jones GRD, Haeckel R, Loh TP, et al. Indirect methods for reference interval determination - review and recommendations. Clin Chem Lab Med. 2018;57(1):20–29. doi: 10.1515/cclm-2018-0073.
  • Ma S, Yu J, Qin X, et al. Current status and challenges in establishing reference intervals based on real-world data. Crit Rev Clin Lab Sci. 2023;60(6):427–441. doi: 10.1080/10408363.2023.2195496.
  • Doyle K, Bunch DR. Reference intervals: past, present, and future. Crit Rev Clin Lab Sci. 2023;60(6):466–482. doi: 10.1080/10408363.2023.2196746.
  • Yang D, Su Z, Zhao M. Big data and reference intervals. Clin Chim Acta. 2022;527:23–32. doi: 10.1016/j.cca.2022.01.001.
  • Ma C, Wang X, Xia L, et al. Effect of sample size and the traditional parametric, nonparametric, and robust methods on the establishment of reference intervals: evidence from real world data. Clin Biochem. 2021;92:67–70. doi: 10.1016/j.clinbiochem.2021.03.006.
  • Ma C, Hou L, Zou Y, et al. An innovative approach based on real-world big data mining for calculating the sample size of the reference interval established using transformed parametric and non-parametric methods. BMC Med Res Methodol. 2022;22(1):275. doi: 10.1186/s12874-022-01751-1.
  • Ammer T, Schützenmeister A, Prokosch HU, et al. A pipeline for the fully automated estimation of continuous reference intervals using real-world data. Sci Rep. 2023;13(1):13440. doi: 10.1038/s41598-023-40561-3.
  • Wilson SM, Bohn MK, Madsen A, et al. LMS-based continuous reference percentiles for 14 laboratory parameters in the CALIPER cohort of healthy children and adolescents. Clin Chem Lab Med. 2023;61(6):1105–1115. doi: 10.1515/cclm-2022-1077.
  • Yan R, Peng Y, Hu L, et al. Continuous reference intervals for 21 biochemical and hematological analytes in healthy Chinese children and adolescents: the PRINCE study. Clin Biochem. 2022;102:9–18. doi: 10.1016/j.clinbiochem.2022.01.004.
  • Tybirk L, Hviid CVB, Knudsen CS, et al. Serum GFAP - reference interval and preanalytical properties in Danish adults. Clin Chem Lab Med. 2022;60(11):1830–1838. doi: 10.1515/cclm-2022-0646.
  • Hall A, Bohn MK, Wilson S, et al. Continuous reference intervals for 19 endocrine, fertility, and immunochemical markers in the CALIPER cohort of healthy children and adolescents. Clin Biochem. 2021;94:35–41. doi: 10.1016/j.clinbiochem.2021.04.014.
  • Cao B, Peng Y, Song W, et al. Pediatric continuous reference intervals of serum insulin-like growth factor 1 levels in a healthy Chinese children population - based on PRINCE study. Endocr Pract. 2022;28(7):696–702. doi: 10.1016/j.eprac.2022.04.004.
  • Wilson S, Bohn MK, Hall A, et al. Continuous reference curves for common hematology markers in the CALIPER cohort of healthy children and adolescents on the Sysmex XN-3000 system. Int J Lab Hematol. 2021;43(6):1394–1402. doi: 10.1111/ijlh.13670.
  • Holmes DT, van der Gugten JG, Jung B, et al. Continuous reference intervals for pediatric testosterone, sex hormone binding globulin and free testosterone using quantile regression. J Mass Spectrom Adv Clin Lab. 2021;22:64–70. doi: 10.1016/j.jmsacl.2021.10.005.
  • Cai T, Karlaftis V, Hearps S, et al. Reference intervals for serum cystatin C in neonates and children 30 days to 18 years old. Pediatr Nephrol. 2020;35(10):1959–1966. doi: 10.1007/s00467-020-04612-5.
  • Li K, Hu L, Peng Y, et al. Comparison of four algorithms on establishing continuous reference intervals for pediatric analytes with age-dependent trend. BMC Med Res Methodol. 2020;20(1):136. doi: 10.1186/s12874-020-01021-y.
  • Asgari S, Higgins V, McCudden C, et al. Continuous reference intervals for 38 biochemical markers in healthy children and adolescents: comparisons to traditionally partitioned reference intervals. Clin Biochem. 2019;73:82–89. doi: 10.1016/j.clinbiochem.2019.08.010.
  • Peitzsch M, Mangelis A, Eisenhofer G, et al. Age-specific pediatric reference intervals for plasma free normetanephrine, metanephrine, 3-methoxytyramine and 3-O-methyldopa: particular importance for early infancy. Clin Chim Acta. 2019;494:100–105. doi: 10.1016/j.cca.2019.03.1620.
  • Vogel M, Kirsten T, Kratzsch J, et al. A combined approach to generate laboratory reference intervals using unbalanced longitudinal data. J Pediatr Endocrinol Metab. 2017;30(7):767–773.
  • Ma C, Li L, Wang X, et al. Establishment of reference interval and aging model of homocysteine using real-world data. Front Cardiovasc Med. 2022;9:846685. doi: 10.3389/fcvm.2022.846685.
  • Ma C, Li X, Liu L, et al. Establishment of early pregnancy related thyroid hormone models and reference intervals for pregnant women in China based on real world data. Horm Metab Res. 2021;53(4):272–279. doi: 10.1055/a-1402-0290.
  • Markus C, Flores C, Saxon B, et al. Pregnancy-specific continuous reference intervals for haematology parameters from an Australian dataset: a step toward dynamic continuous reference intervals. Aust N Z J Obstet Gynaecol. 2021;61(2):223–231. doi: 10.1111/ajo.13260.
  • Zhu XT, Wang KJ, Zhou Q, et al. Establishing reference intervals of thyroid hormone based on a laboratory information system. Zhonghua Nei Ke Za Zhi. 2020;59(2):129–133.
  • Mokhtar KM. TSH continuous reference intervals by indirect methods: a comparison to partitioned reference intervals. Clin Biochem. 2020;85:53–56. doi: 10.1016/j.clinbiochem.2020.08.003.
  • McCudden CR, Brooks J, Figurado P, et al. Cerebrospinal fluid total protein reference intervals derived from 20 years of patient data. Clin Chem. 2017;63(12):1856–1865. doi: 10.1373/clinchem.2017.278267.
  • Zhang GM, Guo XX, Ma XB, et al. Reference intervals of alpha-fetoprotein and carcinoembryonic antigen in the apparently healthy population. Med Sci Monit. 2016;22:4875–4880. doi: 10.12659/msm.901861.
  • Zierk J, Baum H, Bertram A, et al. High-resolution pediatric reference intervals for 15 biochemical analytes described using fractional polynomials. Clin Chem Lab Med. 2021;59(7):1267–1278. doi: 10.1515/cclm-2020-1371.
  • Haeckel R, Wosniok W, Torge A, et al. Reference limits of high-sensitive cardiac troponin T indirectly estimated by a new approach applying data mining. A special example for measurands with a relatively high percentage of values at or below the detection limit. J Lab Med. 2021;45(2):87–94. doi: 10.1515/labmed-2020-0063.
  • Haeckel R, Wosniok W, Torge A, et al. Age- and sex-dependent reference intervals for uric acid estimated by the truncated minimum chi-square (TMC) approach, a new indirect method. J Lab Med. 2020;44(3):157–163. doi: 10.1515/labmed-2019-0164.
  • Wosniok W, Haeckel R. A new indirect estimation of reference intervals: truncated minimum chi-square (TMC) approach. Clin Chem Lab Med. 2019;57(12):1933–1947. doi: 10.1515/cclm-2018-1341.
  • Weidhofer C, Meyer E, Ristl R, et al. Dynamic reference intervals for coagulation parameters from infancy to adolescence. Clin Chim Acta. 2018;482:124–135. doi: 10.1016/j.cca.2018.04.003.
  • Zierk J, Arzideh F, Haeckel R, et al. Pediatric reference intervals for alkaline phosphatase. Clin Chem Lab Med. 2017;55(1):102–110. doi: 10.1515/cclm-2016-0318.
  • CLSI. Defining, establishing, and verifying reference intervals in the clinical laboratory; approved guideline—third edition. CLSI document EP28-A3c. Wayne (PA): Clinical and Laboratory Standards Institute; 2008.
  • Reed AH, Henry RJ, Mason WB. Influence of statistical method used on the resulting estimate of normal range. Clin Chem. 1971;17(4):275–284. doi: 10.1093/clinchem/17.4.275.
  • Altman DG. Construction of age-related reference centiles using absolute residuals. Stat Med. 1993;12(10):917–924. doi: 10.1002/sim.4780121003.
  • Wright EM, Royston P. Simplified estimation of age-specific reference intervals for skewed data. Stat Med. 1997;16(24):2785–2803. doi: 10.1002/(SICI)1097-0258(19971230)16:24<2785::AID-SIM797>3.0.CO;2-Z.
  • Cole TJ, Green PJ. Smoothing reference centile curves: the LMS method and penalized likelihood. Stat Med. 1992;11(10):1305–1319. doi: 10.1002/sim.4780111005.
  • Moussa MA. Estimation of age-specific reference intervals for skewed data. Methods Inf Med. 2002;41(02):147–153. doi: 10.1055/s-0038-1634299.
  • Kuczmarski RJ, Ogden CL, Guo SS, et al. 2000 CDC growth charts for the United States: methods and development. Vital Health Stat 11. 2002;(246):1–190.
  • Rigby RA, Stasinopoulos DM. Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution. Stat Med. 2004;23(19):3053–3076. doi: 10.1002/sim.1861.
  • Rigby RA, Stasinopoulos DM. Using the Box-Cox t distribution in GAMLSS to model skewness and kurtosis. Stat Model. 2006;6(3):209–229. doi: 10.1191/1471082X06st122oa.
  • Rigby RA, Stasinopoulos DM. Automatic smoothing parameter selection in GAMLSS with an application to centile estimation. Stat Methods Med Res. 2014;23(4):318–332. doi: 10.1177/0962280212473302.
  • Gannoun A, Girard S, Guinot C, et al. Reference curves based on non-parametric quantile regression. Stat Med. 2002;21(20):3119–3135. doi: 10.1002/sim.1226.
  • Wei Y, Pere A, Koenker R, et al. Quantile regression methods for reference growth charts. Stat Med. 2006;25(8):1369–1382. doi: 10.1002/sim.2271.
  • Sarkar R. Establishment of biological reference intervals and reference curve for urea by exploratory parametric and non-parametric quantile regression models. EJIFCC. 2013;24(2):61–67.
  • Muggeo VMR, Sciandra M, Tomasello A, et al. Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology. Environ Ecol Stat. 2013;20(4):519–531. doi: 10.1007/s10651-012-0232-1.
  • Muggeo VMR, Torretta F, Eilers PHC, et al. Multiple smoothing parameters selection in additive regression quantiles. Stat Model. 2021;21(5):428–448. doi: 10.1177/1471082X20929802.
  • Wan X, Qu Y, Huang Y, et al. Nonparametric estimation of age-specific reference percentile curves with radial smoothing. Contemp Clin Trials. 2012;33(1):13–22. doi: 10.1016/j.cct.2011.09.002.
  • Li Y, Graubard BI, Korn EL. Application of nonparametric quantile regression to body mass index percentile curves from survey data. Stat Med. 2010;29(5):558–572. doi: 10.1002/sim.3810.
  • Yu K, Jones MC. Local linear quantile regression. J Am Stat Assoc. 1998;93(441):228–237. doi: 10.1080/01621459.1998.10474104.
  • Ma C, Cheng X, Xue F, et al. Validation of an approach using only patient big data from clinical laboratories to establish reference intervals for thyroid hormones based on data mining. Clin Biochem. 2020;80:25–30. doi: 10.1016/j.clinbiochem.2020.03.012.
  • Horn PS, Feng L, Li Y, et al. Effect of outliers and nonhealthy individuals on reference interval estimation. Clin Chem. 2001;47(12):2137–2145. doi: 10.1093/clinchem/47.12.2137.
  • Horn PS, Pesce AJ. Reference intervals: an update. Clin Chim Acta. 2003;334(1–2):5–23. doi: 10.1016/s0009-8981(03)00133-5.
  • Ichihara K, Boyd JC; IFCC Committee on Reference Intervals and Decision Limits (C-RIDL). An appraisal of statistical procedures used in derivation of reference intervals. Clin Chem Lab Med. 2010;48(11):1537–1551. doi: 10.1515/CCLM.2010.319.
  • Li S, Mu D, Ma C, et al. Establishment of a reference interval for total carbon dioxide using indirect methods in Chinese populations living in high-altitude areas: a retrospective real-world analysis. Clin Biochem. 2023;119:110631. doi: 10.1016/j.clinbiochem.2023.110631.
  • Zhong J, Ma C, Hou L, et al. Utilization of five data mining algorithms combined with simplified preprocessing to establish reference intervals of thyroid-related hormones for non-elderly adults. BMC Med Res Methodol. 2023;23(1):108. doi: 10.1186/s12874-023-01898-5.
  • Ma C, Zou Y, Hou L, et al. Validation and comparison of five data mining algorithms using big data from clinical laboratories to establish reference intervals of thyroid hormones for older adults. Clin Biochem. 2022;107:40–49. doi: 10.1016/j.clinbiochem.2022.05.008.
  • Hoffmann RG. Statistics in the practice of medicine. JAMA. 1963;185(11):864–873. doi: 10.1001/jama.1963.03060110068020.
  • Holmes DT, Buhr KA. Widespread incorrect implementation of the Hoffmann method, the correct approach, and modern alternatives. Am J Clin Pathol. 2019;151(3):328–336. doi: 10.1093/ajcp/aqy149.
  • Katayev A, Fleming JK, Luo D, et al. Reference intervals data mining: no longer a probability paper method. Am J Clin Pathol. 2015;143(1):134–142. doi: 10.1309/AJCPQPRNIB54WFKJ.
  • Katayev A, Balciza C, Seccombe DW. Establishing reference intervals for clinical laboratory test results: is there a better way? Am J Clin Pathol. 2010;133(2):180–186. doi: 10.1309/AJCPN5BMTSF1CDYP.
  • Holmes DT. Correct implementation of the Hoffmann method. Clin Biochem. 2019;70:49–50. doi: 10.1016/j.clinbiochem.2019.02.007.
  • Katayev A, Fleming JK, Holmes DT, et al. Widespread implementation of the Hoffmann method: a second opinion. Am J Clin Pathol. 2019;152(1):116–117. doi: 10.1093/ajcp/aqz015.
  • Zhang Y, Ma W, Wang G, et al. Response to the editor: limitations of the Hoffmann method for establishing reference intervals using clinical laboratory data. Clin Biochem. 2019;70:51. doi: 10.1016/j.clinbiochem.2019.06.007.
  • Zhang Y, Ma W, Wang G, et al. Limitations of the Hoffmann method for establishing reference intervals using clinical laboratory data. Clin Biochem. 2019;63:79–84. doi: 10.1016/j.clinbiochem.2018.11.005.
  • Bhattacharya CG. A simple method of resolution of a distribution into Gaussian components. Biometrics. 1967;23(1):115–135. doi: 10.2307/2528285.
  • Concordet D, Geffré A, Braun JP, et al. A new approach for the determination of reference intervals from hospital-based data. Clin Chim Acta. 2009;405(1–2):43–48. doi: 10.1016/j.cca.2009.03.057.
  • Zierk J, Arzideh F, Kapsner LA, et al. Reference interval estimation from mixed distributions using truncation points and the Kolmogorov-Smirnov distance (kosmic). Sci Rep. 2020;10(1):1704. doi: 10.1038/s41598-020-58749-2.
  • Ammer T, Schützenmeister A, Prokosch HU, et al. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep. 2021;11(1):16023. doi: 10.1038/s41598-021-95301-2.
  • Ammer T, Schützenmeister A, Prokosch HU, et al. RIbench: a proposed benchmark for the standardized evaluation of indirect methods for reference interval estimation. Clin Chem. 2022;68(11):1410–1424. doi: 10.1093/clinchem/hvac142.
  • Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. J R Stat Soc Ser C Appl Stat. 1994;43(3):429–453. doi: 10.2307/2986270.
  • Smith PL. Splines as a useful and convenient statistical tool. Am Stat. 1979;33(2):57–62. doi: 10.2307/2683222.
  • De Boor C. A practical guide to splines. Applied mathematical sciences series. Vol. 27. New York (NY): Springer-Verlag; 1978.
  • Wahba G. Spline models for observational data. CBMS-NSF regional conference series in applied mathematics. Philadelphia (PA): SIAM; 1990.
  • Härdle W. Smoothing techniques: with implementation in S. Springer series in statistics. New York (NY): Springer-Verlag; 1991.
  • Wand MP, Jones MC. Kernel smoothing. New York: Chapman and Hall/CRC; 1994.
  • Stasinopoulos D, Rigby R, Heller G, et al. Flexible regression and smoothing: using GAMLSS in R. The R series. New York: Chapman and Hall/CRC; 2017.
  • Rigby RA, Stasinopoulos D, Heller GZ, et al. Distributions for modeling location, scale, and shape: using GAMLSS in R. New York: Chapman and Hall/CRC; 2019. doi: 10.1201/9780429298547.
  • van Buuren S. Worm plot to diagnose fit in quantile regression. Stat Model. 2007;7(4):363–376. doi: 10.1177/1471082X0700700406.
  • van Buuren S, Fredriks M. Worm plot: a simple diagnostic device for modelling growth reference curves. Stat Med. 2001;20(8):1259–1277. doi: 10.1002/sim.746.
  • Metcalfe C. Goodness-of-fit statistics for age-specific reference intervals by P Royston and EM Wright. Comment on: Stat Med 2000;19:2943–2962. Stat Med. 2002;21(23):3749–3750. doi: 10.1002/sim.1356.
  • The R Project for Statistical Computing. The R Foundation. 2023. Available from: https://www.R-project.org
  • Python. Python Software Foundation. 2023. Available from: http://www.python.org
  • RStudio Desktop. Posit Software. 2023. Available from: https://posit.co/download/rstudio-desktop
  • PyCharm. The Python IDE for professional developers. JetBrains s.r.o. 2023. Available from: https://www.jetbrains.com/pycharm