ABSTRACT
Outliers can have a major effect on the estimated transformation of the response in linear regression models, as they can on the estimates of the coefficients of the fitted model. The effect is more extreme in the Generalized Additive Models (GAMs) that are the subject of this article, as the forms of terms in the model can also be affected. We develop, describe and illustrate robust methods for the nonparametric transformation of the response and estimation of the terms in the model. Numerical integration is used to calculate the estimated variance stabilizing transformation. Robust regression provides outlier free input to the polynomial smoothers used in the calculation of the response transformation and in the backfitting algorithm for estimation of the functions of the GAM. Our starting point was the AVAS (Additivity and VAriance Stabilization) algorithm of Tibshirani. Even when robustness is not required, we have made four further optional improvements to AVAS which greatly improve the performance of Tibshirani's original Fortran program.
We provide a publicly available and fully documented interactive program for our procedure which is a robust form of Tibshirani’s AVAS that allows many forms of robust regression. We illustrate the efficacy of our procedure through data analyses. A refinement of the backfitting algorithm has interesting implications for robust model selection. Supplementary materials for this article are available online.
1 Introduction
The nonlinear transformation of response variables is a common practice in regression problems. Two customary goals are the stabilization of error variance and the approximate normalization of the error distribution (Box and Cox 1964). The more comprehensive goals of our article are to find, via smoothing and robustness, those transformations that produce the best-fitting additive model. Hopefully, such knowledge may lead to parsimonious parametric models capable of a physical interpretation.
Generalized Additive Models (GAMs) are a widely employed extension of linear regression models which we use to replace the explanatory variables by nonparametrically estimated functions of the variables (Hastie and Tibshirani 1990). Tibshirani (1988) extended the GAM to include response transformation using an empirical version of the variance stabilizing transformation. His algorithm AVAS (additivity and variance stabilization) provides nonparametric alternatives to the Box-Cox parametric transformation of the response and the Box-Tidwell family of power transformations of explanatory variables (Box and Tidwell 1962). Unfortunately, like these parametric methods, his procedure is not robust; the transformations of both the explanatory variables and the response can be highly distorted by outlying observations (Atkinson, Riani, and Corbellini 2021). The same problem holds for the related ACE algorithm of Breiman and Friedman (1985) and is one topic in the discussion of that article (Fowlkes and Kettenring 1985; Buja and Kass 1985). It is our purpose to provide a robust version of AVAS, robust both for transformations of the response and the explanatory variables. For obvious reasons we call our procedure RAVAS.
Both ACE and AVAS are appreciably cited and applied and both are available in the R package acepack (Spector et al. 2016). Although Breiman and Friedman (1985) present ACE as a method for transformations in multiple regression and correlation, it has some anomalous properties for estimation of response transformations, as noted by Tibshirani (1988). A discussion of the relationship of ACE and AVAS and of both to the Box-Cox transformation is in Hastie and Tibshirani (1990, chap. 7). Buja and Kass (1985) suggest a method for the robustification of ACE, but comment that "it is far easier to make this glib remark than to formulate the problem in such a way that progress [can] be made while retaining such advantages of the current ACE algorithm as the low computational cost." The same remark applies to AVAS. What is surprising is the paucity of references for improving the procedure. Foi (2008) comments that AVAS is non-asymptotic. Both he and Wang, Lyu, and Yu (2021) use function minimization for numerical variance stabilization in an image-enhancement problem in which the errors have a Poisson-Normal distribution. Neither author, nor indeed Tibshirani, mentions robustness. However, Boente, Martínez, and Salibián-Barrera (2017) do provide a numerical procedure for robust backfitting in the GAM. We believe our paper provides a robustification of AVAS in the spirit suggested by Buja and Kass (1985). The application of our methods to robustifying ACE would follow directly from the work presented here.
Alternatives to AVAS include Ramsay (1988), who uses a monotone spline to estimate the response transformation, with the regression parameters estimated by least squares. An advantage is that the Jacobian of the response transformation is found by straightforward differentiation of the spline. Ramsay also uses monotone splines for transformation of the explanatory variables. Breiman (1988) queries the restriction to monotonicity for the transformation of the explanatory variables, but commends the use of splines. Subroutine transace of the R-package Hmisc (Harrell Jr 2019) replaces the supersmoother in ACE and AVAS with restricted cubic smoothing splines, with a controllable number of knots.
The desirable property of AVAS, and of our robust version, is that they provide the flexibility of nonparametric models for both the response and the explanatory variables. Most of the large literature on generalized linear models (GLMs) and GAMs uses smoothing or spline methods on either the response or the explanatory variables, but not both. An exception is Spiegel, Kneib, and Otto-Sobotka (2019), who use spline smoothing both for the link function in a GLM and for the transformation of the explanatory variables in the GAM. Robustness is introduced to fitting the GAM by Alimadad and Salibian-Barrera (2011), who use a soft-trimming method combined with spline smoothing. Their results extend those of Cantoni and Ronchetti (2001) for the robust fitting of GLMs. Boente, Martínez, and Salibián-Barrera (2017) formulate the numerical procedure for robust backfitting in the GAM using polynomial smoothers.
In this article we present a robust and improved version of Tibshirani's algorithm. Our starting point was the original version of his algorithm, written in "classical" Fortran, that is with few comments and uninformative variable names. We first converted this nonrobust algorithm into Matlab. Our extension of AVAS is fully documented and freely available on GitHub. Details are given in the supplementary material. It is programmed to allow the use of a variety of methods of robust regression. These include S-estimation (Rousseeuw and Yohai 1984) and MM-estimation (Yohai and Zamar 1988), in which outliers are downweighted, and the hard trimming methods Least Trimmed Squares (LTS) and the Forward Search (see Section 4.1.3), in which outliers have weight zero. In our numerical examples we use the Forward Search (Atkinson, Riani, and Cerioli 2010) to provide a subset of the observations believed to be outlier free. Because of the potential for an appreciable number of outliers which also need to be transformed for further calculations, we develop a new procedure for the necessary interpolation and extrapolation for these observations. Since the whole procedure involves iteration between response and explanatory variable transformation, the subset of observations treated as outlying may change from iteration to iteration. The backfitting and smoothing procedures for transforming the explanatory variables are estimated using this subset. Likewise, the numerical procedure for the variance stabilizing transformation of the response is only estimated from this subset, although it is applied to all observations.
In summary, the main purposes of our article, in order of increasing generality, are:
To clarify the details of Tibshirani’s algorithm;
To provide a version with improved numerical estimation of variance used in constructing the transformation. Other improvements, introduced in Section 4, are also available as options;
To present a flexible robust version of response and explanatory variable transformations, with graphical interpretation which can be interactive;
Finally, to make these procedures freely available on GitHub.
In addition to robustness we have made four improvements to the performance of AVAS. These, like robustness, are available as options and so can be employed whether or not a robust option is chosen. The program has been written to be highly flexible and can, for example, be run solely for nonparametric response transformation in regression using least squares.
In the next section we provide some background to the AVAS algorithm described in Section 3. Our robust algorithm, RAVAS, is presented in Section 4. Robust regression is introduced in Section 4.1.3, followed by details of the variance stabilizing transformation.
The remainder of the article illustrates the use of RAVAS. In Section 5 we demonstrate the program’s ability to detect outliers and illustrate the effect of our improvements in the algorithm on a nonrobust analysis. Section 6 introduces a novel graphical display (the augmented star plot) to illustrate the effectiveness of our five options on data analyses. It can also be brushed to link to the plot for particular analyses. These properties are illustrated on simulated data. In Section 7 we analyze data on the effectiveness of advertisements using social media. We conclude in Section 8 with brief details of a simulation study comparing RAVAS with traditional nonrobust AVAS, for samples of up to 10,000 observations. Unlike AVAS, RAVAS provided parameter estimates with mean squared error that was virtually constant and tests for outliers, the power of which tended to one as the outliers became more remote. Except for small sample sizes and moderate outliers, RAVAS converged in three iterations, whereas convergence of AVAS required increasingly many iterations as the outliers became more remote.
2 Background
2.1 Introduction
The generalized additive model (GAM) has the form
$$g(Y) = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon. \quad (1)$$
The functions $f_j$ are unknown and are, in general, found by the use of smoothing techniques. A monotonicity constraint can be applied. If the response transformation or link function g is unknown, it is restricted to be monotonic, but scaled to satisfy the technically necessary constraint that $\mathrm{var}\{g(Y)\} = 1$ (otherwise, the zero transformation would be trivially perfect). In the fitting algorithm the transformed responses are scaled to have mean zero; the constant $\alpha$ can therefore be ignored. The observational errors $\varepsilon$ are assumed to be independent and additive with constant variance. The performance of fitted models is compared by use of the coefficient of determination $R^2$. Since the $f_j$ are estimated from the data, the traditional assumption of linearity in the explanatory variables is avoided. However, the GAM retains the assumption that explanatory variable effects are additive. Buja, Hastie, and Tibshirani (1989) describe the background and early development of this model.
In the next section we assume that the response transformation is known and describe the backfitting algorithm for estimating the functions $f_j$.
2.2 Backfitting
2.2.1 Definition
The backfitting algorithm, described in Hastie and Tibshirani (1990, p. 91), is used to fit a GAM. The algorithm proceeds iteratively using residuals when each explanatory variable in turn is dropped from the model. A discussion of convergence of iterative versions is in Schimek and Turlach (2006).
With $ty$ the vector of transformed responses, let $r_j$ be the vector of residuals when $f_j(X_j)$ is removed from the model without any refitting. Then
$$r_j = ty - \sum_{k \ne j} f_k(X_k). \quad (2)$$
The new value of $f_j$ depends on the ordered values of $r_j$ and $X_j$. Let the ordered values of $X_j$ be $x_{(j)}$. The residuals are sorted in the same way to give the new order $r_{(j)}$. Within each iteration each explanatory variable is dropped in turn, $j = 1, \ldots, p$. The iterations continue until the change in the value of $R^2$ is less than a specified tolerance.
For iteration l the vector of sorted residuals for $X_j$ is $r_{(j)}^{l}$. The new estimate of $f_j$ is
$$f_{(j)}^{l} = S(r_{(j)}^{l} \mid x_{(j)}). \quad (3)$$
The required estimates of $f_j$ follow by restoring the elements of $f_{(j)}^{l}$ to the appropriate values of unordered $X_j$.
The function S depends on the constraint imposed on the transformation of variable j.
If the transformation can be non-monotonic, S denotes a smoothing procedure. In order to compare our procedure with that of Tibshirani (1988) we follow him in using the supersmoother (Friedman and Stuetzle 1982), a nonparametric estimator based on local linear regression with adaptive bandwidths.
If the transformation is monotonic, the fitted values come from isotonic regression (Barlow et al. 1972).
If the transformation is linear, $S(r \mid x) = a + bx$, where a and b are the least squares coefficients.
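As a concrete illustration, the backfitting cycle above can be sketched in a few lines. This is a minimal Python sketch, not the authors' Matlab implementation: the smoother S is passed in as a function, and only the linear case from the list above is shown.

```python
import numpy as np

def backfit(ty, X, smoother, n_iter=20, tol=1e-6):
    """Simplified backfitting: cycle over the explanatory variables,
    smoothing the partial residuals of each against its predictor."""
    n, p = X.shape
    f = np.zeros((n, p))               # current estimates of the terms f_j
    last_rss = np.inf
    for _ in range(n_iter):
        for j in range(p):
            # residuals with variable j dropped from the model, no refitting
            r = ty - f.sum(axis=1) + f[:, j]
            f[:, j] = smoother(X[:, j], r)
            f[:, j] -= f[:, j].mean()  # keep each estimated term centered
        rss = np.sum((ty - f.sum(axis=1)) ** 2)
        if abs(last_rss - rss) < tol:  # stop when the fit no longer improves
            break
        last_rss = rss
    return f

def linear_smoother(x, r):
    """The linear case of S: least squares fit of r on x."""
    b, a = np.polyfit(x, r, 1)
    return a + b * x
```

With a linear smoother this is exactly Gauss-Seidel applied to the least squares normal equations, the connection discussed in Section 2.2.2.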
2.2.2 Properties
For a linear regression model with errors that obey second-order assumptions, the least squares estimates of the parameters satisfy the normal equations. The backfitting algorithm is the Gauss-Seidel algorithm for solving this set of equations. Buja, Hastie, and Tibshirani (1989) extend this result for least squares to the backfitting algorithm for linear smoothers; a smoother is linear if the fitted values can be written $\hat{y} = Sy$, where the smoother matrix S does not depend on y. This result assumes that, if necessary, the response has been transformed so that the second-order conditions are satisfied. Unfortunately, the supersmoother that is used for transformation of explanatory variables is not linear, so that convergence of the procedure is not automatically guaranteed.
The backfitting algorithm is not invariant to permutation of the order of the variables inside matrix X, with high collinearity between the explanatory variables causing slow convergence of the algorithm: the residual sum of squares can change very little between iterations (Breiman and Friedman 1985; Hastie and Tibshirani 1988). Our option orderR2 (Section 4.1.4) attempts a solution to this problem by reordering the variables in order of importance.
3 Introduction to AVAS
In this section we first present the structure of the AVAS algorithm of Tibshirani (1988) and then outline the variance stabilizing transformation used to estimate the response transformation. Our RAVAS algorithm has a similar structure, made more elaborate by the requirements of robustness and the presence of options.
3.1 The AVAS Algorithm
In this description of the algorithm ty and tX are transformed values of y and X. In both algorithms the inner loop fits the GAM for a given ty, leading to new values of tX and residuals e. The main (outer) loop uses these to calculate a new transformation ty.
Initialize Data. Standardize the response y so that $\mathrm{mean}(ty) = 0$ and $\mathrm{var}(ty) = 1$, where var is the maximum likelihood (biased) estimator of variance. Center each column of the X matrix so that $\mathrm{mean}(X_j) = 0$.
Initial call to "Inner Loop" to find the initial GAM using ty and tX; this calculates the initial value of $R^2$. Set convergence conditions on the number of iterations and the value of $R^2$.
Main (Outer) Loop. Given values of ty and tX at each iteration, the outer loop provides updated values of the transformed response. Given the newly transformed response, updated transformed explanatory variables are found through the call to the backfitting algorithm (inner loop). In our version iterations continue until a stopping condition on $R^2$ is verified or until a maximum number of iterations has been reached.
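The outer loop just described can be outlined in Python, with the inner loop and the variance stabilizing step injected as functions; fit_gam and vs_step are hypothetical names for these two components, not part of the published code.

```python
import numpy as np

def avas_outer(y, X, fit_gam, vs_step, max_iter=50, tol=1e-4):
    """Outline of the AVAS outer loop (a sketch). fit_gam runs backfitting
    for the current ty and returns fitted values and R^2; vs_step
    re-estimates the response transformation from ty and the fit."""
    ty = (y - y.mean()) / y.std()          # initialize: mean 0, variance 1
    r2_old = -np.inf
    for _ in range(max_iter):
        fitted, r2 = fit_gam(ty, X)        # inner loop (backfitting)
        ty = vs_step(ty, fitted)           # new transformed response
        ty = (ty - ty.mean()) / ty.std()   # re-standardize
        if abs(r2 - r2_old) < tol:         # stopping condition on R^2
            break
        r2_old = r2
    return ty, r2
```

The y.std() call uses the biased (maximum likelihood) estimator, matching the standardization in the initialization step above.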
3.2 The Numerical Variance Stabilizing Transformation
We first consider the case of a random variable Y with known distribution for which $E(Y) = \mu$ and $\mathrm{var}(Y) = s^2(\mu)$. We seek a transformation $h(Y)$ for which the variance is, at least approximately, independent of the mean. Taylor series expansion of $h(Y)$ about $\mu$ leads to $\mathrm{var}\{h(Y)\} \approx \{h'(\mu)\}^2 s^2(\mu)$. For a general distribution $h(\cdot)$ is then a solution of the differential equation $dh(\mu)/d\mu = 1/s(\mu)$. For random variables standardized, as are the values ty, to have unit variance, the variance stabilizing transformation is
$$h(t) = \int^{t} \frac{du}{s(u)}. \quad (4)$$
In the AVAS algorithm for data, the integrand $1/s(\cdot)$ is estimated by the vector of the reciprocals of the absolute values of the smoothed residuals, sorted using the ordering based on the fitted values of the model. There are n integrals, one for each observation. The range of integration goes from the smallest fitted value to the value for observation i. The computation of the n integrals uses the trapezoidal rule. The logged residuals in the estimation of the variance function are smoothed using the running line smoother of Hastie and Tibshirani (1986). Full details of our implementation of the algorithm are in Atkinson et al. (2023, chap. 7).
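A minimal numerical sketch of the data version of (4): here u is a hypothetical sorted grid of fitted values at which an estimate s(u) of the standard deviation is available, and each integral is computed with the trapezoidal rule. Note that np.interp supplies linear interpolation of the integrand and holds it constant beyond the grid, which corresponds to rectangular extrapolation.

```python
import numpy as np

def vs_transform(t, u, s):
    """Evaluate h(t) = integral from u[0] to t of du / s(u) at each point
    of t, with s > 0 known on the sorted grid u (a sketch of Equation (4))."""
    g = 1.0 / np.asarray(s, dtype=float)       # integrand 1/s on the grid
    t = np.asarray(t, dtype=float)
    out = np.empty(len(t))
    for i, ti in enumerate(t):
        grid = np.append(u[u < ti], ti)        # grid nodes up to ti
        vals = np.interp(grid, u, g)           # linear interpolation of 1/s
        # composite trapezoidal rule over the refined grid
        out[i] = 0.5 * np.sum((vals[1:] + vals[:-1]) * np.diff(grid))
    return out
```

For constant s the result is exactly linear in t, as variance stabilization requires; the check below uses s = 2 on a grid over [0, 1] and evaluates one point beyond the grid.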
4 RAVAS: Five Extensions to AVAS
Our RAVAS procedure introduces five improvements to AVAS, programmed as options. These do not have a hierarchical structure, so that there are $2^5 = 32$ possible choices of the options. The augmented star plot of Section 6 provides a method for assessing these choices. We discuss the motivation and implementation for each in the order in which they are applied to the data when all are employed.
4.1 Initial Calculations
The structure of our algorithm is an elaboration of that of AVAS outlined in Section 3.1. Four of the five options can be invoked before the start of the outer loop. They are described here. A detailed flow scheme is in Section 2 of the supplementary material.
4.1.1 Initialization of Data: Option tyinitial
Our numerical experience is that it is often beneficial to start from a parametric transformation of the response. This is optionally found using the automatic robust procedure for power transformations described by Riani, Atkinson, and Corbellini (2023). For $\min(y) > 0$ we use the Box-Cox transformation. For $\min(y) \le 0$ the extended Yeo-Johnson transformation is used (Atkinson, Riani, and Corbellini 2020). This family of transformations has separate Box and Cox transformations for positive and negative observations. In both cases the initial parametric transformations are only useful approximations, found by searching over a coarse grid of parameter values. The final nonparametric transformations sometimes suggest a generalization of the parametric ones.
4.1.2 Ordering Explanatory Variables in Backfitting: Option scail
In a comparison of monotone regression spline estimation with ACE, Ramsay (1988) observed that the fitted model obtained with ACE depends on the order of the explanatory variables (see Section 2.2). One approach is to use an initial regression to remove the effect of the order of the variables through scaling (Breiman 1988). With $b_j$ the coefficient of $tX_j$ in the multiple regression of $ty$ on all the $tX_j$, the option scail provides new transformed values for the explanatory variables: $tX_j := b_j\,tX_j$, $j = 1, \ldots, p$. Option scail is used only in the initialization of the data.
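The scail option amounts to one multiple regression and a column-wise rescaling, roughly as in this Python sketch:

```python
import numpy as np

def scail(ty, tX):
    """Rescale each column of tX by its coefficient in the multiple
    regression of ty on all the columns, reducing the sensitivity of
    later backfitting to the order of the variables (a sketch)."""
    A = np.column_stack([np.ones(len(ty)), tX])
    coef, *_ = np.linalg.lstsq(A, ty, rcond=None)
    return tX * coef[1:]            # column j is multiplied by its b_j
```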
4.1.3 Robust Regression and Robust Outlier Detection: Option rob
1. Robust Regression
We robustify our transformation method through the use of robust regression to replace least squares. There are three main approaches to robust parametric regression.
Hard (0,1) Trimming. In Least Trimmed Squares (LTS; Hampel 1975; Rousseeuw 1984) the amount of trimming is determined by the choice of the trimming parameter h, $n/2 \le h \le n$, which is specified in advance. The LTS estimate is intended to minimize the sum of squares of the residuals of h observations. For least squares (LS), $h = n$.
Adaptive Hard Trimming. In the Forward Search (FS), the observations are again hard trimmed, but the value of h is determined by the data, being found adaptively by the search. Atkinson, Riani, and Cerioli (2010) provide a general survey of the FS, with discussion.
Soft trimming (downweighting). M estimation and derived methods. The intention is that observations near the center of the distribution retain their value. Increasingly remote observations are downweighted by a function of their scaled residuals, giving a weight that decreases with distance from the fitted model. We include S and MM estimation. A popular weight function is Tukey's biweight (Beaton and Tukey 1974).
All three methods provide weights for the observations. Our RAVAS algorithm has been programmed to use any of the three robust methods. However, we prefer the Forward Search for outlier detection, since it depends upon the data to determine the number and identity of the outliers, rather than analyzing the data with the pre-determined value for h of LTS. All examples in this article have been computed using the FS. Comparisons of outlier detection using hard and soft trimming in regression are in Riani, Atkinson, and Perrotta (2014).
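For illustration, Tukey's biweight gives soft-trimming weights as below; c = 4.685 is the usual tuning constant giving 95% efficiency at the normal model, and the scale would in practice come from the robust fit.

```python
import numpy as np

def tukey_biweight_weights(residuals, scale, c=4.685):
    """Weights from Tukey's biweight: w(u) = (1 - (u/c)^2)^2 for |u| <= c
    and 0 otherwise, with u the scaled residual. Observations beyond c
    are fully trimmed; those near zero keep weight close to one."""
    u = np.asarray(residuals, dtype=float) / scale
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) > c] = 0.0
    return w
```

This is the weight function in the sense w(u) = psi(u)/u for the biweight psi; it illustrates how soft trimming interpolates between full weight and the zero weight of hard trimming.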
2. Robust Outlier Detection
Our algorithm works with k observations treated as outliers, providing the subset of the remaining $m = n - k$ observations used in model fitting and parameter estimation. This section describes our outlier detection methods.
The default setting of the forward search uses the multivariate procedure of Riani, Atkinson, and Cerioli (2009) adapted for regression (Torti, Corbellini, and Atkinson 2021) to detect outliers at a simultaneous level of approximately 1% for samples of size up to around 1000. Optionally, a different level can be selected.
In the other two methods of robust regression we again calculate scaled residuals for all n observations and use a Bonferroni inequality to give a simultaneous test for outliers with significance level $\alpha$, the level for detection of individual outliers being $\alpha/n$. The m observations for which the test is not significant then form the subset used in fitting.
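The Bonferroni step can be sketched as follows. This is a simplification: residuals are standardized by their median and MAD and referred to a normal distribution, rather than to the reference distributions appropriate to the particular robust estimator.

```python
import numpy as np
from statistics import NormalDist

def bonferroni_outliers(residuals, alpha=0.01):
    """Flag observation i as an outlier when its robustly scaled residual
    exceeds the two-sided normal quantile at level alpha/n (Bonferroni).
    Returns (indexes of the clean subset, indexes of the outliers)."""
    r = np.asarray(residuals, dtype=float)
    n = len(r)
    med = np.median(r)
    mad = 1.4826 * np.median(np.abs(r - med))   # consistent at the normal
    z = (r - med) / mad
    cut = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n))
    is_out = np.abs(z) > cut
    return np.where(~is_out)[0], np.where(is_out)[0]
```

The factor 1.4826 makes the MAD a consistent estimate of the standard deviation under normality, so that the cut-off is on the usual scale.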
We use this subset in the backfitting algorithm to calculate the transformations at each iteration, the k outliers being ignored, as they are in the calculation of the numerical variance stabilizing transformation in Section 4.2.1. Since different response transformations can indicate different observations as outliers, the identification of outliers occurs repeatedly during our robust algorithm, once per iteration of the outer loop. Examples of the dependence of outliers on the parameter of the Box-Cox transformation are in Atkinson and Riani (2000, chap. 4). Here there is also a relationship between the transformations and the declaration of outliers.
4.1.4 Ordering Predictor Variables: Option orderR2
In order to completely eliminate dependence on the order of the variables, we include an option that, at each iteration, provides an ordering based on the variable producing the highest increment of $R^2$. Let $\mathcal{J}$ be the set of the indexes of the columns of matrix X which have already been updated, and let j be the index of a column of X that has not yet been updated, so that $j \notin \mathcal{J}$. Further, let $R^2_j$ be the coefficient of determination in a multiple linear regression which has g as response and as regressors the columns of X with indexes in $\mathcal{J} \cup \{j\}$. We select, as the next variable to be estimated in the backfitting algorithm, that for which $R^2_j$ is a maximum.
With this criterion the most relevant features are immediately transformed and those that are perhaps irrelevant will be transformed in the final positions. For robust estimation, this procedure is applied solely to the observations in the subset . Option orderR2 is available at each call to the backfitting function. Flowcharts for initialization and iterative use of the outer loop of RAVAS are in the supplementary material.
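The selection step can be sketched as follows; here done holds the indexes of the columns already updated, and the function returns the index of the not-yet-updated column giving the largest coefficient of determination. Names are illustrative, not those of the published code.

```python
import numpy as np

def next_variable(g, X, done):
    """orderR2 selection step: among columns of X not yet updated, pick
    the one whose addition to the already-updated columns gives the
    largest R^2 in a linear regression with g as response (a sketch)."""
    n = X.shape[0]
    tss = np.sum((g - g.mean()) ** 2)
    best_j, best_r2 = None, -np.inf
    for j in range(X.shape[1]):
        if j in done:
            continue
        cols = [X[:, k] for k in done] + [X[:, j]]
        A = np.column_stack([np.ones(n)] + cols)
        coef, *_ = np.linalg.lstsq(A, g, rcond=None)
        rss = np.sum((g - A @ coef) ** 2)
        r2 = 1.0 - rss / tss
        if r2 > best_r2:
            best_j, best_r2 = j, r2
    return best_j
```

Calling this repeatedly, adding the returned index to done each time, yields the full ordering of the variables for one backfitting pass.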
This robust procedure, with ordering, offers interesting suggestions for robust model selection for GAMs. We do not here develop this idea further.
4.2 Main or Outer Loop
The only option remaining to be described is the robust version of the numerical variance stabilizing transformation of Section 4.2.1.
4.2.1 Numerical Variance stabilizing Transformation: Option Trapezoid
Plots of residuals against fitted values are widely used in regression analysis to check the assumption of constant variance (Anscombe and Tukey 1963; Cox and Snell 1981). Here the observations have been transformed, so the fitted values are $t\hat{y}$. To estimate the variance stabilizing transformation, the fitted values have to be sorted, giving a vector of ordered values. The residuals are ordered in the same way and, following the procedure of Section 3.2, provide estimates of the integrand in (4) at the ordered fitted values. Calculation of the variance transformation (4) is however for sorted observed responses, rather than fitted, transformed responses. As did Tibshirani, we use the trapezoidal rule to approximate the integral. Linear interpolation and extrapolation are used in the calculation of the integrand at the sorted responses. We provide an option "trapezoid" for the choice between two methods for the extrapolation of the variance function estimate, the interpolation method remaining unchanged. Our approach leads to trapezoidal summands in the approximation to the integral for the extrapolated elements, whereas Tibshirani's leads to rectangular elements. When we are concerned with robust inference, there are only m members of the outlier-free subset whereas there are n values of ty, so that robustness increases the effect of the difference between the two rules. The option trapezoid = false uses rectangular elements in extrapolation.
For a transformation that stabilizes variance, there is no relationship between $\mathrm{var}(ty)$ and ty. Thus, the smoothed values of the residuals will be constant, apart from random fluctuations.
A more detailed discussion of the calculation of the variance stabilizing transformation, together with the flowchart, is in the supplementary materials. A figure and clarificatory discussion are in Atkinson et al. (2023, chap. 7).
5 Simulated Examples
In this section we start with a simple example to compare the data analyses from RAVAS with the nonrobust AVAS of Tibshirani when there are a few outliers. In Section 5.2 our analysis is of data without introduced outliers to demonstrate the effect of our options on the performance of the nonrobust algorithm.
5.1 Example 1—Robustness
There are 151 observations with x equally spaced over 0, 0.1, …, 15; the linear model generates z as a sinusoidal function of x plus independent errors. Outliers of value one replace the values of z in positions 100 and 105, 106, …, 110; that is, there are seven outliers at x = 9.9 and x = 10.4, …, 10.9. The response is y = exp(z), so a logarithmic transformation of the observed responses is appropriate. We start with a nonrobust analysis excluding our improvements to the algorithm. That is, there is no initial transformation of the response (tyinitial = false) and we use the rectangular extrapolation when calculating the variance stabilizing transformation (trapezoid = false).
The top left-hand panel of the resulting figure shows the transformation of y, which is roughly logarithmic, but with a slight decrease in gradient near the centre of the plot. The top right-hand panel shows the transformed values of y against fitted values. Two horizontal linear structures are apparent: a set of six values of transformed y close to one and several values close to 1.75. The bottom panel shows a very clear distortion of the sinusoid in the range of the outliers, that is 9.9–10.9. The "rug" plot shows the uniform distribution of the values of x.
In our robust analysis we used the initial transformation of y and set trapezoid = true. The top panels of the corresponding figure show plots of the transformed values of y with the seven outliers automatically detected by the robust procedure. The plot against y shows a smoother curve than that for the nonrobust analysis. However, the greater difference is in the plot against fitted values. There is now a regression line with less scatter than in the nonrobust analysis; the linear structure at the bottom of the plot is now absent and the horizontal line formed by the group of six outliers is more clearly distanced from the fitted regression. The plot also shows the isolated outlier (corresponding to x = 9.9). The bottom panel shows that the robust analysis has removed the distortion in the estimation of the sinusoidal form, except where the outliers have been deleted.
In this and other figures the outliers are indicated by filled symbols. In the online .pdf version they are colored red.
5.2 Example of Effects from Options in RAVAS
The preceding example shows that the robustness of RAVAS leads to the identification of the seven outliers and to the virtual recovery of the model from which the data were simulated. In this section we use further simulations to explore how our modifications, not necessarily the robustness itself, lead to improved analyses of data.
Two-variable model. We use an example without outliers to illustrate the importance of some options including the initial transformation of the response. There are 151 observations on two explanatory variables. The linear predictor z combines a sinusoidal function of the first explanatory variable with an exponential function of the second, plus independent errors. The observed responses are $y = z^3$, so that a 1/3rd transformation is expected. As well as plotting residuals and fitted values from some of the fitted models, we also calculate the value of $R^2$ and the significance level of the Durbin-Watson statistic (Durbin and Watson 1950) for the independence of the residuals ordered by the fitted values from the model.
The first figure for this example shows results from using RAVAS on these data without any options. The left-hand panel shows the plot of residuals against fitted values, which has a sinusoidal shape. The plot of transformed y against fitted values in the right-hand panel is logistic in shape, rather than linear. These two plots indicate that the fitted model is not satisfactory, an impression confirmed by the highly significant Durbin-Watson statistic. We do not show the plot of transformed y against y, which is roughly appropriate for a transformation of 1/3.
We now proceed adding options in an ad hoc manner, starting with scail, that is the rescaling of the explanatory variables (Section 4.1.2). The resulting figure shows plots of the same properties as before. The plot of the residuals now shows a series of rough diagonal bands, with an outline for larger fitted values similar to that in the previous plot. Although straighter, the plot of transformed against fitted y is again curved for higher values of y. These improvements lead to a less significant value of the Durbin-Watson statistic. The fit is still far from satisfactory.
In the modeling producing the third pair of plots we have also included the initial transformation of the responses (tyinitial, Section 4.1.1). It is clear from scatterplots of the data that the transformation parameter should be positive but small. We used a coarse grid of parameter values.
The plot of residuals against fitted values in the left-hand panel of the figure is greatly improved from the preceding analyses but is still not random, showing some curvature. This is reflected in the improved, but still significantly small, significance level of the Durbin-Watson test. The plot of transformed y against fitted values is closer to a straight line than before.
These three different sets of options show steady improvement in the normality of the residuals. We have examined them in some detail because we observed plots with structures similar to those in the left-hand panels above when we used AVAS to analyze data for response transformations in Atkinson, Riani, and Corbellini (2020), although we did not publish such plots. We have also failed to find any references to such phenomena. We conclude this progression with the analysis using all five options.
The upper left-hand panel of the final figure shows the smooth curve of the transformed responses and the upper right-hand panel shows the plot of residuals against fitted values. This appears to be a random scatter with, surprisingly, two outliers, an unexpected artifact of the simulation. The significance value for the Durbin-Watson statistic is 0.77, so that independence of the residuals has been achieved, once the outliers are deleted. The bottom left-hand panel shows the linear relationship between transformed response and fitted values, as is expected for a GAM, together with the two outliers. The estimated transformations of the two explanatory variables are in the bottom right-hand panel of the figure. The first explanatory variable shows the shape of a scaled and translated sine function, with the two observations for the largest values of the variable being marked as outlying. The second explanatory variable has the desired exponential form. Plots of these nonparametric functions with superimposed parametric forms can be used to check closeness to the parametric models.
6 A Graphical Representation of the Systematic Evaluation of Options
We have added five options to the original AVAS, so there are 32 combinations of options that could be chosen. It is not obvious that all will be necessary when analyzing any given set of data. Our program provides flexibility in the assessment of these options. One possibility is a list of options ordered by, for example, the value of R² or the significance of the Durbin-Watson or the normality test. In this section we describe the augmented star plot, a graphical method for visualizing interesting combinations of options in a particular data analysis. We later introduce a brushable version of the augmented star plot which links to residual and other plots from the fitted model.
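With five binary options, the candidate analyses can be enumerated directly. A minimal sketch (the option names follow the input arguments listed in the next subsection; the dictionary representation is our own illustrative choice, not the program's):

```python
from itertools import product

# The five optional features, named after the program's input arguments.
OPTIONS = ("tyinitial", "scail", "rob", "orderR2", "trapezoid")

# Every on/off combination of the five options: 2^5 = 32 candidate
# analyses, each represented as a dict of flags.
combinations = [
    dict(zip(OPTIONS, flags))
    for flags in product((False, True), repeat=len(OPTIONS))
]

print(len(combinations))  # 32
```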
We remove all analyses for which the residuals fail the Durbin-Watson and Jarque-Bera normality tests, by default at the 10% level (two-sided for Durbin-Watson). The Jarque-Bera test (Jarque and Bera 1987) uses a combination of estimated skewness and kurtosis to test the distributional shape of the residuals. There is an option for independent choice of the levels of these two tests; in the analyses here we used a threshold of 5% for both.
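The skewness-kurtosis combination behind the Jarque-Bera test is simple to compute. A sketch using the standard moment-based form of the statistic, (n/6)(S² + (K − 3)²/4), which is asymptotically chi-squared on 2 degrees of freedom under normality:

```python
import numpy as np

def jarque_bera(residuals):
    """Jarque-Bera statistic (n/6) * (S^2 + (K - 3)^2 / 4), where S and
    K are the sample skewness and kurtosis of the residuals."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    c = r - r.mean()
    m2 = np.mean(c**2)                 # second central moment
    skew = np.mean(c**3) / m2**1.5     # sample skewness (0 for a normal)
    kurt = np.mean(c**4) / m2**2       # sample kurtosis (3 for a normal)
    return n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
```

Large values of the statistic indicate departure from normality in skewness, kurtosis, or both.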
We order the remaining, admissible, solutions by the Durbin-Watson significance level multiplied by the value of R² and by the number of units not declared as outliers. Other options are available. Within each panel of the plot, rays of equal length represent the features used in that analysis; we enclose them in a polygon. The rays are in identical positions in each panel; for each analysis their length is proportional to the significance level of the Durbin-Watson test. Each ray has a different color.
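One plausible reading of this screening and ordering, sketched in Python (the field names pdw, pjb, R2, and n_clean are our own illustrative labels, not the program's):

```python
def rank_solutions(solutions, level_dw=0.05, level_jb=0.05):
    """Keep solutions whose residuals pass both tests, then order them
    by Durbin-Watson significance level times R^2 times the number of
    units not declared outliers (largest first)."""
    admissible = [s for s in solutions
                  if s["pdw"] > level_dw and s["pjb"] > level_jb]
    return sorted(admissible,
                  key=lambda s: s["pdw"] * s["R2"] * s["n_clean"],
                  reverse=True)

# Illustrative fits from three hypothetical option combinations.
fits = [
    {"name": "rob",           "pdw": 0.40, "pjb": 0.20, "R2": 0.95, "n_clean": 190},
    {"name": "rob+tyinitial", "pdw": 0.70, "pjb": 0.30, "R2": 0.97, "n_clean": 191},
    {"name": "none",          "pdw": 0.01, "pjb": 0.50, "R2": 0.90, "n_clean": 200},  # fails dw
]
best = rank_solutions(fits)[0]["name"]   # "rob+tyinitial"
```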
The order in which the five options are displayed in the plot depends on the frequency of their presence in the set of admissible solutions. For example, if robustness has the highest frequency, its ray is shown on the right. The remaining options are displayed counterclockwise, in order of frequency.
We list the five options, giving a short text description followed by the optional input argument.
Initial robust transformation of the response; tyinitial—Section 4.1.1.
Remove effect of order of explanatory variables by regression; scail—Section 4.1.2.
Robustness; rob—Section 4.1.3.
Ordering variables in the backfitting algorithm using R² values; orderR2—Section 4.1.4.
Trapezoidal or rectangular rule for numerical integration; trapezoid—Section 4.2.1.
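To illustrate the last option: the variance-stabilizing transformation is the integral of the reciprocal of the estimated residual standard deviation, t(y) = ∫ du/s(u), computed numerically on a grid. A sketch comparing the two rules, with an assumed illustrative s(u) = u, for which the integral from 1 is exactly log y:

```python
import numpy as np

def vst(grid, s, rule="trapezoid"):
    """Cumulative numerical approximation to t(y) = int du / s(u)
    over the grid, by the trapezoidal or rectangular rule."""
    f = 1.0 / s(grid)
    h = np.diff(grid)
    if rule == "trapezoid":
        steps = 0.5 * (f[:-1] + f[1:]) * h
    else:                      # rectangular rule, as in the original AVAS
        steps = f[:-1] * h
    return np.concatenate(([0.0], np.cumsum(steps)))

grid = np.linspace(1.0, 4.0, 301)
t_trap = vst(grid, s=lambda u: u)                   # approximates log(grid)
t_rect = vst(grid, s=lambda u: u, rule="rectangle")
err_trap = np.max(np.abs(t_trap - np.log(grid)))
err_rect = np.max(np.abs(t_rect - np.log(grid)))    # larger error
```

On this smooth integrand the trapezoidal rule is markedly more accurate than the rectangular rule for the same grid.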
Example 2.
As an initial example we consider an extension of Example 1, simulated in Section 5.1, in which there are now four explanatory variables. There are again 151 observations with design points equally spaced from 0 to 15 in steps of 0.1. The linear model is an extension of that in Example 1, with independently distributed errors. Eight outliers of value one replace the values of z at selected observations. There are thus four explanatory variables, only one of which requires transformation, as does the response, to log y.
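The design of this simulation can be sketched as follows. The functional form, coefficients, error variance, and outlier positions below are illustrative assumptions, not the paper's exact specification; only the grid of 151 design points and the eight contaminated values of z follow the description above.

```python
import numpy as np

rng = np.random.default_rng(1)

# 151 equally spaced design points 0, 0.1, ..., 15.
x1 = np.round(np.arange(151) * 0.1, 1)

x2, x3, z = rng.standard_normal((3, 151))

# Eight outliers of value one replace values of z at randomly chosen
# positions (the paper fixes specific positions, not reproduced here).
positions = rng.choice(151, size=8, replace=False)
z[positions] = 1.0

# Assumed additive structure: log y is linear in sin(x1) and the other
# terms, so only x1 and the response require transformation.
log_y = np.sin(x1) + x2 + x3 + z + 0.1 * rng.standard_normal(151)
y = np.exp(log_y)
```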
The augmented star plot shows the two combinations of options for which the residuals pass the tests of independence and normality. The first solution uses four options, with the trapezoidal integration rule added for the second solution. The statistical properties of the two solutions are indistinguishable. For both, R² = 0.998; the significance levels for the Durbin-Watson test are 0.088 and 0.090.
We now compare the analysis without options with that suggested by the star plot. The figure shows the nonrobust results from using AVAS. Although the response has been adequately transformed, the residuals are far from random (the significance level for the Durbin-Watson test is very small) and the sinusoid is distorted, having been subjected to a nonlinear transformation.
The fit from RAVAS using all options identifies nine outliers, the eight generated intentionally and one at unit 53. The upper right-hand and lower left-hand panels show, after the outliers have been deleted, that the residuals are structure free when plotted against fitted values and that there is a linear relationship between the transformed values of y and the fitted values. The lower right-hand panel plots the transformed values of the sinusoidal explanatory variable; the sinusoid has been recovered. We do not show the plots of the transformations of the other explanatory variables, which are straight lines, unlike some of the plots in the corresponding panel of the earlier figure.
Example from Wang and Murphy. Wang and Murphy (2005) used ACEPACK (Spector et al. 2016) to illustrate the use of ACE and AVAS in the transformation of data. Their illustrations did not include any robust methods. We now use RAVAS to analyze one of their examples, to which we have introduced some outliers. We also introduce an extended version of the augmented star plot which has advantages when, as here, many statistical options provide satisfactory fits to the data. The extension also allows brushing.
There are 200 observations and four explanatory variables, all independently uniformly distributed. The linear model is given in (5), with independent errors. There are nine outliers, each randomly generated.
The augmented star plot shows the six combinations of options which satisfy the constraints on the significance levels. As expected, these are plotted in a different order from those in the previous example. Solution 4 uses only robustness, which is included in all six combinations of options. The optional bars in the figure, representing the four properties of the fitted models, have been introduced as being easier to interpret than a list of numerical values when the figure displays several solutions. Since the maximum number of detected outliers is 11, the outlier bar is scaled by this maximum. Brushing the figure reveals residual and other plots, depending on which augmented star plot is brushed. Details are in the supplementary material.
The significance levels and the values of R² do not change as much across solutions as the numbers of detected outliers. The correlations between the pairs of fitted values for the various combinations have a minimum value of 0.996. There is virtually no difference between the fitted models, despite four selections of options detecting nine outliers and two detecting 11.
We look at some properties of the best solution, which uses all options apart from the trapezoidal rule. The plot of transformed y against y in the left-hand panel shows the exponential relationship which is the inverse of the logarithmic transformation used to generate the data. The plot of residuals against fitted values in the right-hand panel shows the lack of relationship between the residuals and the fitted values. The nine outliers are clearly identified, although they are not evident in the scatterplots of the data. The transformations for the four explanatory variables recover the model used in simulation (5). Particularly impressive is the plot for the variable with the absolute value transformation, where the sharpness of the transformation is achieved by the smoothing algorithm. Similar results are obtained by Wang and Murphy (2005), who continue their analysis to the important step of testing the fit of parametric versions of these nonparametric forms.
7 Internet Marketing Data
The R package datarium (Kassambara 2019) contains 200 results of an experiment on internet marketing. The explanatory variables are the budgets for advertising on YouTube, Facebook, and newspapers, and the response is sales. These data starkly show the gains in data analysis that come from using RAVAS with options rather than AVAS.
We start with linear regression on the three explanatory variables, which gives an F statistic for regression of 504 and a t statistic for newspaper advertising of 0.658. That newspaper advertising has no effect, while advertising in the two online media has a strong effect, is a constant pattern in all our analyses. We gain more insight into the structure of the data by using AVAS. It makes sense to assume that increased advertising expenditure increases sales, so we include a monotonicity constraint on the transformations of the explanatory variables.
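A monotonicity constraint of this kind can be imposed by projecting a smooth onto the nondecreasing functions, for example by pool-adjacent-violators; RAVAS's actual monotone smoother may differ, so this is a minimal self-contained sketch of the idea:

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: the nondecreasing sequence closest to y
    in least squares.  Adjacent blocks that violate monotonicity are
    merged and replaced by their weighted mean."""
    blocks = []                       # [mean, count] pairs
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, m) for m, n in blocks])
```

For example, pav([1, 3, 2]) merges the last two values into their mean, giving the nondecreasing fit [1, 2.5, 2.5]; an already monotone input is returned unchanged.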
The resulting regression gives an F-value of 472, even smaller than that for linear regression. The plot of residuals against fitted values shows a strongly nonrandom curved pattern, despite the smooth transformation of the response. Use of RAVAS with all options leads to the deletion of 17 observations, giving an F value of 936. Despite the number of observations deleted, the residuals fail both the independence and normality tests, with jb = 0.0095 and a significance level for dw below the threshold.
The plot of residuals against fitted values in the RAVAS analysis with all options shows that the residuals are not yet free of structure, although the deletion of the large number of outliers has reduced the curved pattern evident earlier. The deletion of 17 outliers has also produced relatively smooth curves in the other panels, except for the transformation of newspaper advertising, which has a bizarre stepped shape. Since this variable is not significant, the pattern can be ignored.
The curved pattern that remains in the plot of residuals against fitted values, ignoring outliers, suggests that a second-order model might be appropriate. We drop newspaper advertising from the model and fit a full second-order model in the two remaining variables, including quadratic and interaction terms. The augmented star plot shows that the preferred model includes the options tyinitial, rob, and scail. The significance levels of the two tests are dw = 0.98 and jb = 0.033. This fit detects one outlier and produces an F-statistic for regression of 5579, a more than 10-fold increase from our first fit using AVAS. The results show a smooth transformation of the response and a linear relationship between the transformed response and fitted values, once the outlier is ignored. The transformed explanatory variables all have smoothly changing shapes and all are significant; the least significant t-statistic is 3.01.
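For two retained budget variables, the full second-order model has six columns: intercept, two linear terms, two quadratic terms, and the interaction. A sketch (the function name and argument names are our own placeholders):

```python
import numpy as np

def second_order_design(x1, x2):
    """Design matrix for a full second-order model in two variables:
    intercept, linear, quadratic, and interaction columns."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack(
        (np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2))
```

The five non-constant columns are the five explanatory variables whose transformations are examined in the remaining analyses.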
To conclude our analysis we look at the results of fitting (nonrobust) AVAS with a quadratic model in two variables, that is, excluding newspaper advertising. The results, compared with those from RAVAS, which led to the deletion of one observation, are striking. There are, of course, no deleted outliers. The F-statistic now has a value of 556, hardly better than regression on the first-order model without transformation of the response or of the explanatory variables. The plot (not given here) of tXj against Xj shows that there are smooth transformations of all five explanatory variables. However, the plot of the transformed response against y (left-hand panel of the figure) includes an unexpected kink. The plot of transformed y against fitted values (again not shown) is not a straight line, but is lumpy for low values. Most surprisingly, the plot of residuals against fitted values (right-hand panel) shows the inexplicable diagonal structures that we noticed earlier. Such structures were the original stimulus for our attempts to provide a reliable version of AVAS that produces interpretable data analyses.
8 Discussion
Our examples in this article illustrate the excellent properties of our robust procedure. Even if robustness is not of interest, the results of Section 5.2 show how we improved on the performance of Tibshirani's algorithm. Harrell Jr (2019) claims that AVAS is unstable unless the sample size exceeds 350. We used simulation to explore the properties of our method, comparing RAVAS with traditional AVAS, that is, with no options. We found no numerical instability in either procedure for sample sizes from 200 to 10,000 and values of p from 5 to 20 with 10% additive outliers. The mean squared error of parameter estimates from RAVAS remained virtually constant as the outliers became more remote, whereas that for AVAS increased steadily. The power of the test for the detection of outliers using RAVAS tended to one as the outliers became more remote. The average number of iterations for AVAS increased with sample size and outlier remoteness, reaching the allowed upper limit of 20. The average number of iterations for RAVAS rose above three only for small sample sizes and moderate shift contamination, the situation in which outliers cause significant bias in parameter estimates but are hard to detect. We believe we have provided a robust version of AVAS in the spirit of the remark by Buja and Kass (1985) quoted in Section 1, that is, a procedure that retains the computational simplicity of the original AVAS.
Supplementary Materials
The first sections of the supplementary material provide flow charts for the program we wrote to implement RAVAS. Section 2 is the initialization of the fitting procedure, including implementation of the four non-iterative options; Section 3 describes the iterative part of the algorithm and sets up the environment for the numerical variance stabilizing transformation; Section 4 provides the fitted values and residuals to which the trapezoidal integration approximation is applied; Section 5 provides links, inter alia, to the code used for the calculations and the plotting of graphs.
Acknowledgments
We are very grateful to the editor, associate editor and a referee, whose comments greatly helped us to clarify the presentation of our work. Our research has benefited from the High Performance Computing (HPC) facility of the University of Parma. We acknowledge financial support from the University of Parma project “Robust statistical methods for the detection of frauds and anomalies in complex and heterogeneous data.”
Disclosure Statement
The authors confirm there are no competing interests to be declared.
References
- Alimadad, A., and Salibian-Barrera, M. (2011), “An Outlier-Robust Fit for Generalized Additive Models with Applications to Disease Outbreak Detection,” Journal of the American Statistical Association, 106, 719–731. DOI: 10.1198/jasa.2011.tm09654.
- Anscombe, F. J., and Tukey, J. W. (1963), “The Examination and Analysis of Residuals,” Technometrics, 5, 141–160. DOI: 10.1080/00401706.1963.10490071.
- Atkinson, A. C., and Riani, M. (2000), Robust Diagnostic Regression Analysis, New York: Springer–Verlag.
- Atkinson, A. C., Riani, M., and Cerioli, A. (2010), "The Forward Search: Theory and Data Analysis," (with Discussion), Journal of the Korean Statistical Society, 39, 117–134. DOI: 10.1016/j.jkss.2010.02.007.
- Atkinson, A. C., Riani, M., and Corbellini, A. (2020), “The Analysis of Transformations for Profit-and-Loss Data,” Applied Statistics, 69, 251–275. DOI: 10.1111/rssc.12389.
- Atkinson, A. C., Riani, M., and Corbellini, A. (2021), “The Box-Cox Transformation: Review and Extensions,” Statistical Science, 36, 239–255. DOI: 10.1214/20-STS778.
- Atkinson, A. C., Riani, M., Corbellini, A., Perrotta, D., and Todorov, V. (2023), Applied Robust Statistics through the Monitoring Approach, Heidelberg: Springer Nature (in preparation).
- Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972), Statistical Inference under Order Restrictions, Chichester: Wiley.
- Beaton, A. E., and Tukey, J. W. (1974), “The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data,” Technometrics, 16, 147–185. DOI: 10.1080/00401706.1974.10489171.
- Boente, G., Martínez, A., and Salibián-Barrera, M. (2017), “Robust Estimators for Additive Models Using Backfitting,” Journal of Nonparametric Statistics, 29, 744–767. DOI: 10.1080/10485252.2017.1369077.
- Box, G. E. P., and Cox, D. R. (1964), “An Analysis of Transformations,” (with Discussion), Journal of the Royal Statistical Society, Series B, 26, 211–252. DOI: 10.1111/j.2517-6161.1964.tb00553.x.
- Box, G. E. P., and Tidwell, P. W. (1962), “Transformations of the Independent Variables,” Technometrics, 4, 531–550. DOI: 10.1080/00401706.1962.10490038.
- Breiman, L. (1988), Comment on "Monotone Regression Splines in Action" (Ramsay, 1988), Statistical Science, 3, 442–445. DOI: 10.1214/ss/1177012762.
- Breiman, L., and Friedman, J. H. (1985), "Estimating Optimal Transformations for Multiple Regression and Correlation," (with Discussion), Journal of the American Statistical Association, 80, 580–619. DOI: 10.1080/01621459.1985.10478157.
- Buja, A., and Kass, R. E. (1985), Comment on "Estimating Optimal Transformations for Multiple Regression and Correlation" by Breiman and Friedman, Journal of the American Statistical Association, 80, 602–607. DOI: 10.1080/01621459.1985.10478159.
- Buja, A., Hastie, T., and Tibshirani, R. (1989), “Linear Smoothers and Additive Models,” Annals of Statistics, 17, 453–510.
- Cantoni, E., and Ronchetti, E. (2001), “Robust Inference for Generalized Linear Models,” Journal of the American Statistical Association, 96, 1022–1030. DOI: 10.1198/016214501753209004.
- Cox, D. R., and Snell, E. J. (1981), Applied Statistics: Principles and Examples, London: Chapman and Hall.
- Durbin, J., and Watson, G. S. (1950), “Testing for Serial Correlation in Least Squares Regression: I,” Biometrika, 37, 409–428.
- Foi, A. (2008), “Direct Optimization of Nonparametric Variance-Stabilizing Transformations,” in Meeting on Mathematical Statistics. C.I.R.M. Marseille.
- Fowlkes, E. B., and Kettenring, J. R. (1985), Comment on "Estimating Optimal Transformations for Multiple Regression and Correlation" by Breiman and Friedman, Journal of the American Statistical Association, 80, 607–613. DOI: 10.1080/01621459.1985.10478160.
- Friedman, J., and Stuetzle, W. (1982), “Smoothing of Scatterplots,” Technical Report, Department of Statistics, Stanford University, Technical Report ORION 003.
- Hampel, F. R. (1975), “Beyond Location Parameters: Robust Concepts and Methods,” Bulletin of the International Statistical Institute, 46, 375–382.
- Harrell Jr, F. E. (2019), Hmisc, R package version 4.2-0.
- Hastie, T., and Tibshirani, R. (1986), “Generalized Additive Models,” Statistical Science, 1, 297–318. DOI: 10.1214/ss/1177013604.
- Hastie, T., and Tibshirani, R. (1988), Comment on "Monotone Regression Splines in Action" (Ramsay, 1988), Statistical Science, 3, 451–456.
- Hastie, T. J., and Tibshirani, R. J. (1990), Generalized Additive Models, London: Chapman and Hall.
- Jarque, C. M., and Bera, A. K. (1987), "A Test for Normality of Observations and Regression Residuals," International Statistical Review, 55, 163–172. DOI: 10.2307/1403192.
- Kassambara, A. (2019), datarium: Data Bank for Statistical Analysis, R package version 0.1.0.
- Ramsay, J. O. (1988), “Monotone Regression Splines in Action,” Statistical Science, 3, 425–461. DOI: 10.1214/ss/1177012761.
- Riani, M., Atkinson, A. C., and Cerioli, A. (2009), “Finding an Unknown Number of Multivariate Outliers,” Journal of the Royal Statistical Society, Series B, 71, 447–466. DOI: 10.1111/j.1467-9868.2008.00692.x.
- Riani, M., Atkinson, A. C., and Perrotta, D. (2014), “A Parametric Framework for the Comparison of Methods of Very Robust Regression,” Statistical Science, 29, 128–143. DOI: 10.1214/13-STS437.
- Riani, M., Atkinson, A. C., and Corbellini, A. (2023), “Automatic Robust Box-Cox and Extended Yeo-Johnson Transformations in Regression,” Statistical Methods and Applications, 32, 75–102. DOI: 10.1007/s10260-022-00640-7.
- Rousseeuw, P. J. (1984), “Least Median of Squares Regression,” Journal of the American Statistical Association, 79, 871–880. DOI: 10.1080/01621459.1984.10477105.
- Rousseeuw, P. J., and Yohai, V. J. (1984), “Robust Regression by Means of S-estimators,” in Robust and Nonlinear Time Series Analysis: Lecture Notes in Statistics 26, eds. J. Franke, W. Härdle, and R. D. Martin, pp. 256–272, New York: Springer Verlag.
- Schimek, M. G., and Turlach, B. A. (2006), “Additive and Generalized Additive Models,” Technical report, Sonderforschungsbereich 373, Humboldt University, Berlin.
- Spector, P., Friedman, J., Tibshirani, R., Lumley, T., Garbett, S., and Baron, J. (2016), acepack: ACE and AVAS for Selecting Multiple Regression Transformations, R package version 1.4.1.
- Spiegel, E., Kneib, T., and Otto-Sobotka, F. (2019), "Generalized Additive Models with Flexible Response Functions," Statistics and Computing, 29, 123–138. DOI: 10.1007/s11222-017-9799-6.
- Tibshirani, R. (1988), “Estimating Transformations for Regression via Additivity and Variance Stabilization,” Journal of the American Statistical Association, 83, 394–405. DOI: 10.1080/01621459.1988.10478610.
- Torti, F., Corbellini, A., and Atkinson, A. C. (2021), “fsdaSAS: A Package for Robust Regression for Very Large Datasets Including the Batch Forward Search,” Stats, 4, 327–347. DOI: 10.3390/stats4020022.
- Wang, D., and Murphy, M. (2005), "Identifying Nonlinear Relationships in Regression Using the ACE Algorithm," Journal of Applied Statistics, 32, 243–258. DOI: 10.1080/02664760500054517.
- Wang, M., Lyu, B., and Yu, G. (2021), “ConvexVST: A Convex Optimization Approach to Variance-Stabilizing Transformation,” in Proceedings of the 38th International Conference on Machine Learning (Vol. 139), pp. 10839–10848.
- Yohai, V. J., and Zamar, R. H. (1988), “High Breakdown-Point Estimates of Regression by Means of the Minimization of an Efficient Scale,” Journal of the American Statistical Association, 83, 406–413. DOI: 10.1080/01621459.1988.10478611.