
Functional Additive Models on Manifolds of Planar Shapes and Forms

Pages 1600-1612 | Received 22 Nov 2021, Accepted 24 Dec 2022, Published online: 15 Mar 2023

Abstract

The “shape” of a planar curve and/or landmark configuration is considered its equivalence class under translation, rotation, and scaling; its “form” is its equivalence class under translation and rotation, while scale is preserved. We extend generalized additive regression to models for such shapes/forms as responses, respecting the resulting quotient geometry by employing the squared geodesic distance as loss function and a geodesic response function to map the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian L2-Boosting algorithm well suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel, intuitively interpretable visualizations for (even nonlinear) covariate effects in the shape/form space via suitable tensor-product factorization. The usefulness of the proposed framework is illustrated in an analysis of (a) astragalus shapes of wild and domesticated sheep and (b) cell forms generated in a biophysical model, as well as (c) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines. Supplementary materials for this article are available online.

1 Introduction

In many imaging data problems, the coordinate system of recorded objects is arbitrary or explicitly not of interest. Statistical shape analysis (Dryden and Mardia Citation2016) addresses this point by identifying the ultimate object of analysis as the shape of an observation, reflecting its geometric properties invariant under translation, rotation, and rescaling, or as its form (or size-and-shape), invariant under translation and rotation. This article establishes a flexible additive regression framework for modeling the shape or form of planar (potentially irregularly sampled) curves and/or landmark configurations in dependence on scalar covariates. A rich shape analysis literature has been developed for 2D or 3D landmark configurations—representing, for instance, selected points of a bone or face—which are considered elements of Kendall’s shape space (see, e.g., Dryden and Mardia Citation2016). In many 2D scenarios, however, observed points describe a curve reflecting the outline of an object rather than dedicated landmarks (Adams, Rohlf, and Slice Citation2013). Considering outlines as images of (parameterized) curves shows a direct link to functional data analysis (FDA, Ramsay and Silverman Citation2005) and, in this context, we speak of functional shape/form data analysis. As in FDA, functional shape/form data can be observed on a common and often dense grid (regular/dense design) or on curve-specific, often sparse grids (irregular/sparse design). While in the regular case analysis often simplifies by treating curve evaluations as multivariate data, more general irregular designs gave rise to further developments in sparse FDA (e.g., Yao, Müller, and Wang Citation2005; Greven and Scheipl Citation2017), explicitly considering irregular measurements instead of pre-smoothing curves. To the best of our knowledge, we are the first to consider irregular/sparse designs in the context of functional shape/form analysis.

Shapes and forms are examples of manifold data. Petersen and Müller (Citation2019) propose “Fréchet regression” for random elements in general metric spaces, which requires estimation of a (potentially negatively) weighted Fréchet mean for each covariate combination. Their implicit rather than explicit model formulation renders model interpretation difficult. More explicit model formulations have been developed for the special case of a Riemannian geometry. Besides tangent space models (Kent et al. Citation2001), extrinsic models (Lin et al. Citation2017) and models based on unwrapping (Jupp and Kent Citation1987; Mallasto and Feragen Citation2018), a variety of manifold regression models have been designed based on the intrinsic Riemannian geometry. Starting from geodesic regression (Fletcher Citation2013), which extends linear regression to curved spaces, these include MANOVA (Huckemann, Hotz, and Munk Citation2010), polynomial regression (Hinkle, Fletcher, and Joshi Citation2014), smoothing splines (Kume, Dryden, and Le Citation2007), regression along geodesic paths with nonconstant speed (Hong et al. Citation2014), or kernel regression (Davis et al. Citation2010) and Kriging (Pigoli, Menafoglio, and Secchi Citation2016). However, mostly only one metric covariate or categorical covariates are considered, possibly in hierarchical model extensions for longitudinal data (Muralidharan and Fletcher Citation2012; Schiratti et al. Citation2017). By contrast, Zhu et al. (Citation2009), Shi et al. (Citation2009), and Kim et al. (Citation2014) generalize geodesic regression to regression with multiple covariates focusing on Symmetric Positive-Definite (SPD) matrix responses. Cornea et al. (Citation2017) develop a general Generalized Linear Model (GLM) analogue regression framework for responses in a symmetric manifold and apply it to shape analysis. Recently, Lin, Müller, and Park (Citation2020) proposed a Lie group additive regression model for Riemannian manifolds focusing on SPD matrices rather than shapes.

In FDA, there is a much wider range of developed regression methods (see overviews in Morris Citation2015; Greven and Scheipl Citation2017). Among the most flexible models are Functional Additive Models (FAMs) for (univariate) functional responses (in contrast to FAMs with functional covariates (Ferraty et al. Citation2011)), with either semi- or nonparametric approaches to model (a) response functions and (b) smooth covariate effects. For (a), semi-parametric approaches directly employ finite expansions in spline bases (Brockhaus, Scheipl, and Greven Citation2015), Functional Principal Component (FPC) bases (Morris and Carroll Citation2006), or both (Scheipl, Staicu, and Greven Citation2015), as well as wavelets (Meyer et al. Citation2015), sometimes directly expanding functions to model on coefficients and sometimes expanding only predictions while keeping the raw measurements. Nonparametric approaches, by contrast, formulate estimation problems in infinite-dimensional model spaces and effectively evaluate curves on grids or apply pre-smoothing techniques (e.g., Jeon and Park Citation2020). For (b), again semiparametric penalized spline basis approaches are employed (Scheipl, Staicu, and Greven Citation2015; Brockhaus, Scheipl, and Greven Citation2015), or local linear/polynomial (Müller and Yao Citation2008; Jeon et al. Citation2022) or other nonparametric kernel-based approaches (Jeon and Park Citation2020; Jeon, Park, and Van Keilegom Citation2021). Semi- and nonparametric approaches come with different theoretical and practical advantages, but similarities, for example regarding asymptotic behavior, are also known from scalar nonparametric regression (Li and Ruppert Citation2008). Advantages of the semi-parametric approach summarized in Greven and Scheipl (Citation2017) include its appropriateness for sparse irregular functional data and its modular extensibility to functional mixed models (Scheipl, Staicu, and Greven Citation2015; Meyer et al. Citation2015) and nonstandard response distributions (Brockhaus, Scheipl, and Greven Citation2015; Stöcker et al. Citation2021). For bivariate or multivariate functional responses, which are closest to functional shapes/forms but without invariances, Rosen and Thompson (Citation2009), Zhu, Li, and Kong (Citation2012), and Olsen, Markussen, and Raket (Citation2018) consider linear fixed effects of scalar covariates, the latter also allowing for warping. Zhu et al. (Citation2017) and Backenroth et al. (Citation2018) consider one or more random effects for one grouping variable, linear fixed effects, and common dense grids for all functions. Volkmann et al. (Citation2021) combine the FAM model class of Greven and Scheipl (Citation2017) with multivariate FPC analysis (Happ and Greven Citation2018) to model multivariate (sparse) functional responses.

This article establishes an interpretable FAM framework for modeling the shape or form of planar (potentially irregularly sampled) curves and/or landmark configurations in dependence on scalar covariates, extending L2-Boosting (Bühlmann and Yu Citation2003; Brockhaus, Scheipl, and Greven Citation2015) to Riemannian manifolds for model estimation. The three major contributions of our regression framework are: (i) We introduce additive regression with shapes/forms of planar curves and/or landmarks as response, extending FAMs to nonlinear response spaces or, vice versa, extending GLM-type regression on manifolds for landmark shapes both to functional shape manifolds and to include (nonlinear) additive model effects. (ii) We propose a novel Riemannian L2-Boosting algorithm for estimating regression models for this type of manifold response, and (iii) a visualization technique based on tensor-product factorization yielding intuitive interpretations even of multi-dimensional smooth covariate effects for practitioners. Although related tensor-product model transformations based on higher-order SVD have been used, e.g., in control engineering (Baranyi, Yam, and Várlaki Citation2013), we are not aware of any comparable application for visualization in FAMs or other statistical models for object data. Despite our focus on shapes and forms, transfer of the model, Riemannian L2-Boosting, and factorized visualization to other Riemannian manifold responses is intended in the generality of the formulation and the design of the provided R package manifoldboost (developer version on github.com/Almond-S/manifoldboost). The versatile applicability of the approach is illustrated in three different scenarios: an analysis of the shape of sheep astragali (ankle bones) represented by both regularly sampled curves and landmarks in dependence on categorical “demographic” variables; an analysis of the effects of different metric biophysical model parameters (including smooth interactions) on the form of (irregularly sampled) cell outlines generated from a cellular Potts model; and a simulation study with irregularly sampled functional shape and form responses generated from a dataset of different bottle outlines and including metric and categorical covariates.

In Section 2, we introduce the manifold geometry of irregular curves modulo translation, rotation and potentially rescaling, which underlies the intrinsic additive regression model formulated in Section 3. The Riemannian L2-Boosting algorithm is introduced in Section 4. Section 5 analyzes different data problems, modeling sheep bone shape responses (Section 5.1) and cell outlines (Section 5.2). Section 5.3 summarizes the results of simulation studies with functional shape and form responses. We conclude with a discussion in Section 6.

2 Geometry of Functional Shapes and Forms

Riemannian manifolds of planar shapes (and forms) are discussed in various textbooks at different levels of generality, in finite (Kendall et al. Citation1999; Dryden and Mardia Citation2016) or potentially infinite dimensions (Srivastava and Klassen Citation2016; Klingenberg Citation1995). Starting from the Hilbert space Y of curve representatives y of a single shape or form observation, we successively characterize its quotient space geometry under translation, rotation and rescaling including the respective tangent spaces. Building on that, we introduce Riemannian exponential and logarithmic maps and parallel transports needed for model formulation and fitting, and the sample space of (irregularly observed) functional shapes/forms.

To make use of complex arithmetic, we identify the two-dimensional plane with the complex numbers, $\mathbb{R}^2 \cong \mathbb{C}$, and consider a planar curve to be a function $y: \mathcal{T} \to \mathbb{C}$, element of a separable complex Hilbert space $\mathbb{Y}$ with a complex inner product $\langle\cdot,\cdot\rangle$ and corresponding norm $\|\cdot\|$. This allows simple scalar expressions for the group actions of translation $\mathrm{Trl} = \{y \mapsto \mathrm{Trl}_\gamma\, y = y + \gamma\,\mathbf{1} : \gamma \in \mathbb{C}\}$ with $\mathbf{1} \in \mathbb{Y}$ canonically given by $\mathbf{1}: t \mapsto 1$, the real constant function of unit norm; rescaling $\mathrm{Scl} = \{y \mapsto \mathrm{Scl}_\lambda\, y = \lambda\cdot(y - 0_y) + 0_y : \lambda \in \mathbb{R}_+\}$ around the centroid $0_y = \langle \mathbf{1}, y\rangle\,\mathbf{1}$ (which we consider more natural than using $0$, the zero element of $\mathbb{Y}$, mostly chosen in the literature); and rotation $\mathrm{Rot} = \{y \mapsto \mathrm{Rot}_u\, y = u\cdot(y - 0_y) + 0_y : u \in S^1\}$ around $0_y$ with $S^1 = \{u \in \mathbb{C} : |u| = 1\} = \{\exp(\omega i) : \omega \in \mathbb{R}\}$ reflecting counterclockwise rotations by $\omega$ in radian measure. Concatenation yields combined group actions $G$ as direct products, such as the rigid motions $G = \mathrm{Trl} \times \mathrm{Rot} = \{\mathrm{Trl}_\gamma \circ \mathrm{Rot}_u : \gamma \in \mathbb{C}, u \in S^1\} \cong \mathbb{C} \times S^1$ (see Section S.1.1, supplementary materials, for more details). The two real-valued component functions of $y$ are identified with the real part $\mathrm{Re}(y): \mathcal{T} \to \mathbb{R}$ and imaginary part $\mathrm{Im}(y): \mathcal{T} \to \mathbb{R}$ of $y = \mathrm{Re}(y) + \mathrm{Im}(y)\,i$. While the complex setup is used for convenience, the real part of $\langle\cdot,\cdot\rangle$ constitutes an inner product $\mathrm{Re}\langle y_1, y_2\rangle = \langle \mathrm{Re}(y_1), \mathrm{Re}(y_2)\rangle + \langle \mathrm{Im}(y_1), \mathrm{Im}(y_2)\rangle$ for $y_1, y_2 \in \mathbb{Y}$ on the underlying real vector space of planar curves. Typically, $\mathrm{Re}(y)$ and $\mathrm{Im}(y)$ are assumed square-integrable with respect to a measure $\nu$, and we consider the canonical inner product $\langle y_1, y_2\rangle = \int y_1^\dagger\, y_2\, d\nu$, where $y^\dagger$ denotes the conjugate transpose of $y$; that is, $y^\dagger(t) = \mathrm{Re}(y)(t) - \mathrm{Im}(y)(t)\,i$ is simply the complex conjugate, but for vectors $y \in \mathbb{C}^k$, the vector $y^\dagger$ is also transposed. For curves, we typically assume $\nu$ to be the Lebesgue measure on $\mathcal{T} = [0,1]$; for landmarks, a standard choice is the counting measure on $\mathcal{T} = \{1, \dots, k\}$.

The ultimate response object is given by the orbit $[y]_G = \{g(y) : g \in G\}$ (or short $[y]$) of $y \in \mathbb{Y}$, the equivalence class under the respective combined group actions $G$: with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$, $[y] = [y]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}} = \{\lambda u\, y + \gamma\,\mathbf{1} : \lambda \in \mathbb{R}_+, u \in S^1, \gamma \in \mathbb{C}\}$ is referred to as the shape of $y$ and, for $G = \mathrm{Trl}\times\mathrm{Rot}$, $[y] = [y]_{\mathrm{Trl}\times\mathrm{Rot}} = \{u\,y + \gamma\,\mathbf{1} : u \in S^1, \gamma \in \mathbb{C}\}$ as its form or size-and-shape. $\mathbb{Y}/G = \{[y]_G : y \in \mathbb{Y}\}$ denotes the quotient space of $\mathbb{Y}$ with respect to $G$. The description of the Riemannian geometry of $\mathbb{Y}/G$ involves, in particular, a description of the tangent spaces $T_{[y]}\mathbb{Y}/G$ at points $[y] \in \mathbb{Y}/G$, which can be considered local vector space approximations to $\mathbb{Y}/G$ in a neighborhood of $[y]$. For a point $q$ in a manifold $\mathcal{M}$, the tangent vectors $\beta \in T_q\mathcal{M}$ can, inter alia, be thought of as gradients $\dot{c}(0)$ of paths $c: \mathbb{R} \supset (-\delta, \delta) \to \mathcal{M}$ at $0$, where they pass through $c(0) = q$. Besides their geometric meaning, they will also play an important role in the regression model, as additive model effects are formulated on the tangent space level. Choosing suitable representatives $\tilde{y}_G \in [y]_G \subset \mathbb{Y}$ (or short $\tilde{y}$) of orbits $[y]_G$, we use an identification of tangent spaces with suitable linear subspaces $T_{[y]_G}\mathbb{Y}/G \subset \mathbb{Y}$.

Form geometry: Starting with translation as the simplest invariance, an orbit $[y]_{\mathrm{Trl}}$ can be one-to-one identified with its centered representative $\tilde{y}_{\mathrm{Trl}} = y - \langle \mathbf{1}, y\rangle\,\mathbf{1}$, yielding an identification $\mathbb{Y}/\mathrm{Trl} \cong \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0\}$ with a linear subspace of $\mathbb{Y}$. Hence, also $T_{[y]}\mathbb{Y}/\mathrm{Trl} = \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0\}$. For rotation, by contrast, we can only find local identifications with Hilbert subspaces (i.e., charts) around reference points $[p]_{\mathrm{Trl}\times\mathrm{Rot}}$, which we refer to as “poles”. Moreover, we restrict to $y, p \in \mathbb{Y}^* = \mathbb{Y} \setminus [0]_{\mathrm{Trl}}$, eliminating constant functions as degenerate special cases in the translation orbit of zero. For each $[y]_{\mathrm{Trl}\times\mathrm{Rot}}$ in an open neighborhood around $[p]_{\mathrm{Trl}\times\mathrm{Rot}}$, which can be chosen with $\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle \neq 0$, $y$ can be uniquely rotation aligned to $p$, yielding a one-to-one identification of the form $[y]_{\mathrm{Trl}\times\mathrm{Rot}}$ with the aligned representative given by $\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}} = \frac{\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle}{|\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle|}\, \tilde{y}_{\mathrm{Trl}} = \operatorname{arg\,min}_{y' \in [y]_{\mathrm{Trl}\times\mathrm{Rot}}} \|y' - p\|$ (compare Figure 1). While $\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}}$ depends on $p$, we omit this in the notation for simplicity. All $\tilde{y}_{\mathrm{Trl}}$ rotation aligned to $\tilde{p}_{\mathrm{Trl}}$ lie on the hyper-plane determined by $\mathrm{Im}(\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle) = 0$ (Figure 1), which yields $T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^* = \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0,\ \mathrm{Im}(\langle y, p\rangle) = 0\}$ with normal vectors $\zeta^{(1)} = \mathbf{1}$, $\zeta^{(2)} = i\,\mathbf{1}$, $\zeta^{(3)} = i\,p$. Note that, despite the use of complex arithmetic, $T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^*$ is a real vector space not closed under complex scalar multiplication. The geodesic distance of $[y]_{\mathrm{Trl}\times\mathrm{Rot}}$ to the pole $[p]_{\mathrm{Trl}\times\mathrm{Rot}}$ is given by $d([y]_{\mathrm{Trl}\times\mathrm{Rot}}, [p]_{\mathrm{Trl}\times\mathrm{Rot}}) = \|\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}} - \tilde{p}_{\mathrm{Trl}}\| = \min_{y' \in [y]_{\mathrm{Trl}\times\mathrm{Rot}},\, p' \in [p]_{\mathrm{Trl}\times\mathrm{Rot}}} \|y' - p'\|$. It reflects the length of the shortest path (i.e., the geodesic) between the forms and the minimum distance between the orbits as sets.

Fig. 1 Left: Quotient space geometry: assuming $p$ and $y$ centered, translation invariance is not further considered in the plot; given pole representative $p$, we express $y = \frac{\mathrm{Re}(\langle p, y\rangle)}{\|p\|^2}\,p + \frac{\mathrm{Im}(\langle p, y\rangle)}{\|p\|^2}\,ip + \big(y - \frac{\langle p, y\rangle}{\|p\|^2}\,p\big) \in \mathbb{Y}$ in its coordinates in $p$ and $ip$ direction, subsuming all orthogonal directions in the third dimension. In this coordinate system, the rotation orbit $[y]_{\mathrm{Rot}}$ corresponds to the dotted horizontal circle and is identified with the aligned $\tilde{y} := \tilde{y}_{\mathrm{Rot}}$ in the half-plane of $p$; $[y]_{\mathrm{Rot}\times\mathrm{Scl}}$ is identified with the unit vector $\tilde{y}_{\mathrm{Rot}\times\mathrm{Scl}} = \tilde{y}/\|\tilde{y}\|$ projecting $\tilde{y}$ onto the hemisphere depicted by the vertical semicircle. Form and shape distances between $[p]$ and $[y]$ correspond to the length of the geodesics $c(\tau)$ (thick lines) on the plane and sphere, respectively. Right: Geodesic line $c(\tau)$ between $p = c(0)$ and $p' = c(1)$, Log-map projecting $y$ to $\varepsilon \in T_p\mathcal{M}$, parallel transport $\mathrm{Transp}_{p,p'}$ forwarding $\varepsilon$ to $\varepsilon' \in T_{p'}\mathcal{M}$, and Exp-map projecting $\varepsilon'$ onto $\mathcal{M}$, visualized for a sphere. Tangent spaces, identified with subspaces of the ambient space, are depicted as gray planes above the respective poles. The parallel transport preserves all angles between tangent vectors and identifies $\dot{c}(0) \cong \dot{c}(1)$.

Shape geometry: To account for scale invariance in shapes $[y]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}$, they are identified with normalized representatives $\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}} = \tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}}/\|\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}}\|$. Motivated by the normalization, we borrow the well-known geometry of the sphere $\mathbb{S} = \{y \in \mathbb{Y} : \|y\| = 1\}$, where $T_p\mathbb{S} = \{y \in \mathbb{Y} : \mathrm{Re}(\langle y, p\rangle) = 0\}$ is the tangent space at a point $p \in \mathbb{S}$ and geodesics are great circles. Together with translation and rotation invariance, the shape tangent space is then given by $T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^* = T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^* \cap T_p\mathbb{S} = \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0,\ \langle y, p\rangle = 0\}$ with normal vector $\zeta^{(4)} = p$ in addition to $\zeta^{(1)}, \zeta^{(2)}, \zeta^{(3)}$ above. The geodesic distance $d([p]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}, [y]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}) = \arccos|\langle \tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}, \tilde{p}_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}\rangle|$ corresponds to the arc-length between the representatives. This distance is often referred to as the Procrustes distance in statistical shape analysis.

We may now define the maps needed for the regression model formulation. Let $\tilde{y}$ and $\tilde{p}$ be shape/form representatives of $[y]$ and $[p]$ rotation aligned to the shape/form pole representative $p$. Generalizing straight lines to a Riemannian manifold $\mathcal{M}$, geodesics $c: (-\delta, \delta) \to \mathcal{M}$ can be characterized by their “intercept” $c(0) \in \mathcal{M}$ and “slope” $\dot{c}(0) \in T_{c(0)}\mathcal{M}$. The exponential map $\mathrm{Exp}_q: T_q\mathcal{M} \to \mathcal{M}$ at a point $q \in \mathcal{M}$ is defined to map $\beta \mapsto c(1)$ for $c$ the geodesic with $q = c(0)$ and $\beta = \dot{c}(0)$. It maps $\beta \in T_q\mathcal{M}$ to a point $\mathrm{Exp}_q(\beta) \in \mathcal{M}$ located $d(q, \mathrm{Exp}_q(\beta)) = \|\beta\|$ apart from the pole $q$ in the direction of $\beta$. On the form space $\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}$, the exponential map is simply given by $\mathrm{Exp}_{[p]_{\mathrm{Trl}\times\mathrm{Rot}}}(\beta) = [\tilde{p}_{\mathrm{Trl}\times\mathrm{Rot}} + \beta]_{\mathrm{Trl}\times\mathrm{Rot}}$. On the shape space $\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$, identification with exponential maps on the sphere yields $\mathrm{Exp}_{[p]_G}(\beta) = [\cos(\|\beta\|)\,\tilde{p}_G + \sin(\|\beta\|)\,\frac{\beta}{\|\beta\|}]_G$ with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$. In an open neighborhood $U$, $q \in U \subset \mathcal{M}$, $\mathrm{Exp}_q$ is invertible, yielding the map $\mathrm{Log}_q: U \to T_q\mathcal{M}$ from the manifold to the tangent space at $q$. For forms, it is given by $\mathrm{Log}_{[p]_{\mathrm{Trl}\times\mathrm{Rot}}}([y]_{\mathrm{Trl}\times\mathrm{Rot}}) = \tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}} - \tilde{p}_{\mathrm{Trl}\times\mathrm{Rot}}$ and, for shapes, by $\mathrm{Log}_{[p]_G}([y]_G) = d([p]_G, [y]_G)\, \frac{\tilde{y}_G - \langle \tilde{p}_G, \tilde{y}_G\rangle\,\tilde{p}_G}{\|\tilde{y}_G - \langle \tilde{p}_G, \tilde{y}_G\rangle\,\tilde{p}_G\|}$ with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$. Finally, $\mathrm{Transp}_{q,q'}: T_q\mathcal{M} \to T_{q'}\mathcal{M}$ parallel transports tangent vectors $\varepsilon \mapsto \varepsilon'$ isometrically along a geodesic $c(\tau)$ connecting $q$ and $q' \in \mathcal{M}$, such that the slopes $\mathrm{Transp}_{q,q'}(\dot{c}(0)) = \dot{c}(1)$ are identified and all angles are preserved. For shapes, $\mathrm{Transp}_{[y]_G,[p]_G}(\varepsilon) = \varepsilon - \langle \varepsilon, \tilde{p}_G\rangle\, \frac{\tilde{y}_G + \tilde{p}_G}{1 + \langle \tilde{y}_G, \tilde{p}_G\rangle}$, with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$, takes the form of the parallel transport on a sphere, replacing the real inner product with its complex analogue. For forms, it changes only the $\mathrm{Im}(\langle \varepsilon, \tilde{p}\rangle)$ coordinate orthogonal to the real $\tilde{y}$-$\tilde{p}$-plane as in the shape case, while the remainder of $\varepsilon$ is left unchanged as in a linear space. This yields $\mathrm{Transp}_{[y]_G,[p]_G}(\varepsilon) = \varepsilon - \mathrm{Im}\big(\big\langle \tfrac{\tilde{p}_G}{\|\tilde{p}_G\|}, \varepsilon\big\rangle\big)\, \frac{\tilde{y}_G/\|\tilde{y}_G\| + \tilde{p}_G/\|\tilde{p}_G\|}{1 + \langle \tilde{y}_G/\|\tilde{y}_G\|,\ \tilde{p}_G/\|\tilde{p}_G\|\rangle}\, i$, with $G = \mathrm{Trl}\times\mathrm{Rot}$, for form tangent vectors. While equivalent expressions for the parallel transport in the shape case can be found, for example, in Dryden and Mardia (Citation2016), Huckemann, Hotz, and Munk (Citation2010), a corresponding derivation for the form case is given in Section S.1.2, supplementary materials, including a discussion of the quotient space geometry in differential geometric terms.
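For concreteness, the following minimal R sketch implements these building blocks for discretized curves, with complex evaluation vectors and a weighted inner product as introduced above. It is only an illustration under simplifying assumptions (vectors centered and of unit norm, alignment handled explicitly), not the manifoldboost implementation; all function names are ours.

# Weighted complex inner product <a, b> (conjugate-linear in the first
# argument) for evaluation vectors a, b on a common grid with weights w:
ip  <- function(a, b, w) sum(Conj(a) * b * w)
nrm <- function(a, w) sqrt(Re(ip(a, a, w)))

# Optimal rotation alignment of y to the pole p:
align <- function(y, p, w) {
  u <- ip(y, p, w)
  (u / Mod(u)) * y
}

# Shape-space Exp at pole p (unit-norm representative), tangent vector beta:
exp_shape <- function(p, beta, w) {
  nb <- nrm(beta, w)
  if (nb < 1e-12) return(p)
  cos(nb) * p + sin(nb) * beta / nb
}

# Shape-space Log of [y] at [p], using the aligned unit-norm representative:
log_shape <- function(p, y, w) {
  y <- align(y, p, w) / nrm(y, w)
  r <- y - ip(p, y, w) * p                 # projection onto the tangent space
  nr <- nrm(r, w)
  d <- acos(min(1, Mod(ip(p, y, w))))      # geodesic (Procrustes) distance
  if (nr < 1e-12) 0 * p else d * r / nr
}

# Parallel transport of eps from the tangent space at y (aligned to p)
# to the tangent space at p, following the formula in the text:
transp_shape <- function(y, p, eps, w) {
  eps - ip(eps, p, w) * (y + p) / (1 + ip(y, p, w))
}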

Based on this understanding of the response space, we may now proceed to consider a sample of curves $y_1, \dots, y_n \in \mathbb{Y}$ representing orbits $[y_1], \dots, [y_n]$ with respect to group actions $G$. In the functional case, with the domain $\mathcal{T} = [0,1]$, these curves are usually observed as evaluations $\mathbf{y}_i = (y_i(t_{i1}), \dots, y_i(t_{ik_i}))^\top$ on a finite grid $t_{i1} < \dots < t_{ik_i} \in \mathcal{T}$, which may differ between observations. In contrast to the regular case with common grids, this more general data structure is referred to as irregular functional shape/form data. To handle this setting, we replace the original inner product $\langle\cdot,\cdot\rangle$ on $\mathbb{Y}$ by individual $\langle \mathbf{y}_i, \mathbf{y}_i'\rangle_i = \mathbf{y}_i^\dagger W_i\, \mathbf{y}_i'$ providing inner products on the $k_i$-dimensional space $\mathbb{Y}_i = \mathbb{C}^{k_i}$ of evaluations $\mathbf{y}_i, \mathbf{y}_i'$ on the same grid. The symmetric positive-definite weight matrix $W_i$ can be chosen to implement an approximation to integration w.r.t. the original measure $\nu$ with a numerical integration measure $\nu_i$, such as given by the trapezoidal rule. Alternatively, $W_i = \frac{1}{k_i} I_{k_i}$ with the $k_i \times k_i$ identity matrix $I_{k_i}$ presents a canonical choice that is analogous to the landmark case for $k_i \equiv k$. Moreover, data-driven $W_i$ could also be motivated from the covariance structure estimated for (potentially sparse) $y_1, \dots, y_n$ along the lines of Yao, Müller, and Wang (Citation2005), Stöcker et al. (Citation2022). While this is beyond the scope of this article, potential procedures are sketched in Section S.7, supplementary materials. With the inner products given for $i = 1, \dots, n$, the sample space naturally arises as the Riemannian product $\mathbb{Y}_1/G^* \times \dots \times \mathbb{Y}_n/G^*$ of the orbit spaces, with the individual geometries constructed as described above.
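As a small illustration (our own helper, not part of any package API), diagonal trapezoidal-rule weights for an individual grid can be computed as follows:

# Diagonal trapezoidal-rule integration weights for an irregular grid
# t_1 < ... < t_k; W_i = diag(trapez_weights(t)) then approximates
# integration w.r.t. the Lebesgue measure on [t_1, t_k]:
trapez_weights <- function(t) {
  k <- length(t)
  d <- diff(t)
  c(d[1], d[-(k - 1)] + d[-1], d[k - 1]) / 2
}
trapez_weights(c(0, 0.1, 0.4, 1))  # weights sum to the covered range, here 1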

3 Additive Regression on Riemannian Manifolds

Consider a data scenario with $n$ observations of a random response-covariate tuple $(Y, X)$, where the realizations of $Y$ are planar curves $y_i: \mathcal{T} \to \mathbb{C}$, $i = 1, \dots, n$, belonging to a Hilbert space $\mathbb{Y}$ defined as above and potentially irregularly measured on individual grids $t_{i1} < \dots < t_{ik_i} \in \mathcal{T}$. The response object $[Y]$ is the equivalence class of $Y$ with respect to translation, rotation, and possibly scale, and the sample $[y_1], \dots, [y_n]$ is equipped with the respective Riemannian manifold geometry introduced in the previous section. For $i = 1, \dots, n$, realizations $x_i \in \mathcal{X}$ of a covariate vector $X$ in a covariate space $\mathcal{X}$ are observed. $X$ can contain several categorical and/or metric covariates.

For regressing the mean of $[Y]$ on $X = x$, we model the shape/form $[\mu]$ of $\mu \in \mathbb{Y}$ as
$$[\mu] = \mathrm{Exp}_{[p]}(h(x)) = \mathrm{Exp}_{[p]}\Big(\sum_{j=1}^J h_j(x)\Big), \qquad (1)$$
with an additive predictor $h: \mathcal{X} \to T_{[p]}\mathbb{Y}/G^*$ acting in the tangent space at an “intercept” $[p] \in \mathbb{Y}/G^*$. Generalizing an additive model “$Y = \mu + \epsilon = p + h(x) + \epsilon$” in a linear space, we implicitly define $[\mu]$ as the conditional mean of $[Y]$ given $X = x$ by assuming zero-mean “residuals” $\epsilon$. In their definition, we follow Cornea et al. (Citation2017) but extend to the functional shape/form and additive case. We assume local linearized residuals $\varepsilon_{[\mu]} = \mathrm{Log}_{[\mu]}([Y])$ in $T_{[\mu]}\mathbb{Y}/G^*$ to have mean $\mathbb{E}(\varepsilon_{[\mu]}) = 0$, which corresponds to $\mathbb{E}(\varepsilon_{[\mu]}(t)) = 0$ for ($\nu$-almost) all $t \in \mathcal{T}$. Here, we assume $[Y]$ is sufficiently close to $[\mu]$ with probability 1 such that $\mathrm{Log}_{[\mu]}$ is well-defined, which is the case whenever $\langle \tilde{Y}, \tilde{\mu}\rangle \neq 0$ for centered shape/form representatives $\tilde{Y}$ and $\tilde{\mu}$, an unrestrictive and common assumption (compare also Cornea et al. Citation2017). However, residuals $\varepsilon_{[\mu]}$ for different $[\mu]$ belong to separate tangent spaces. To obtain a formulation in a common linear space instead, local residuals are mapped to residuals $\epsilon = \mathrm{Transp}_{[\mu],[p]}(\varepsilon_{[\mu]})$ by parallel transporting them from $[\mu]$ to the common covariate-independent pole $[p]$. After this isometric mapping into $T_{[p]}\mathbb{Y}/G^*$, we can equivalently define the conditional mean $[\mu]$ via $\mathbb{E}(\epsilon) = 0$ for the transported residuals $\epsilon$.

$\mathrm{Exp}_{[p]}$ maps the additive predictor $h(x) = \sum_{j=1}^J h_j(x) \in T_{[p]}\mathbb{Y}/G^*$ to the response space. It is analogous to a response function in GLMs but depends on $[p]$. Although covariate effects $h_j(x)$ often only depend on an individual covariate in $x$ for each $j$, they might also depend on covariate combinations in general to allow (smooth) interactions. While other response functions could be used, we restrict to the exponential map here, such that the model contains a geodesic model (Fletcher Citation2013)—the direct generalization of simple linear regression—as a special case for $h(x) = \beta\, x_1$ with a single covariate $x_1$ and tangent vector $\beta$. Typically, it is assumed that $h$ is centered such that $\mathbb{E}(h(X)) = 0$, and the pole $[p]$ is the overall mean of $[Y]$ defined, like the conditional mean, via residuals of mean zero.

3.1 Tensor-Product Effect Functions $h_j$

Scheipl, Staicu, and Greven (Citation2015) and other authors employ Tensor-Product (TP) bases for functional additive model terms. This naturally extends to tangent space effects, which we model as
$$h_j(x) = \sum_{r,l} \theta_j^{(r,l)}\, b_j^{(l)}(x)\, e_r$$
with the TP basis given by the pair-wise products of $m$ linearly independent tangent vectors $e_r \in T_{[p]}\mathbb{Y}/G^*$, $r = 1, \dots, m$, and $m_j$ basis functions $b_j^{(l)}: \mathcal{X} \to \mathbb{R}$, $l = 1, \dots, m_j$, for the $j$th covariate effect depending on one or more covariates. The real coefficients can be arranged as a matrix $\{\theta_j^{(r,l)}\}_{r,l} = \Theta_j \in \mathbb{R}^{m \times m_j}$. Also for infinite-dimensional $T_{[p]}\mathbb{Y}/G^*$ and a general nonlinear dependence on $x$, a basis representation approach requires truncation to finite dimensions $m$ and $m_j$ in practice. Choosing the bases to capture the essential variability in the data, their size can be extended with increasing data size and computational resources.

While, in principle, the basis $\{e_r\}_r$ could also vary across effects $j = 1, \dots, J$, we assume a common basis for notational simplicity, which presents the typical choice. Due to the identification of $T_{[p]}\mathbb{Y}/G^*$ with a subspace of the function space $\mathbb{Y}$, the $\{e_r\}_r$ may be specified using a function basis commonly used in additive models: Let $b_0^{(l)}: \mathcal{T} \to \mathbb{R}$, $l = 1, \dots, m_0$, be a basis of real functions, say a B-spline basis (other typical bases used in the literature include wavelet (Meyer et al. Citation2015) or FPC bases (Müller and Yao Citation2008)). Then we construct the tangent space basis as $e_r = \sum_{l=1}^{m_0} (z_p^{(l,r)} + z_p^{(m_0+l,r)}\, i)\, b_0^{(l)}$, employing the same basis for the $1$- and $i$-dimension before transforming it with a basis transformation matrix $Z_p = \{z_p^{(l,r)}\}_{l,r} \in \mathbb{R}^{2m_0 \times m}$ implementing the linear tangent space constraints $\mathrm{Re}(\langle e_l, \zeta^{(r)}\rangle) = 0$ (or the empirical version) for all $l$ and normal vectors $\zeta^{(1)}, \zeta^{(2)}, \zeta^{(3)}$ for forms, and additionally $\zeta^{(4)}$ for shapes, defining $T_{[p]}\mathbb{Y}/G^*$ as described in Section 2. Thus, the tangent space basis dimension is $m = 2m_0 - 3$ for forms or $m = 2m_0 - 4$ for shapes (or could, in principle, be larger if the original basis already meets the constraints). For details on the construction of $Z_p$, see Section S.1.3, supplementary materials. For closed curves, we additionally choose $Z_p$ to enforce periodicity, that is, $e_r(t) = e_r(t + t_0)$ for some period $t_0 \in \mathbb{R}$ (compare Hofner, Kneib, and Hothorn Citation2016).
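To make the construction concrete, the following sketch (our own illustration, under the notational assumptions above, using MASS for the null space computation) derives $Z_p$ for the form case from a B-spline design matrix evaluated on a grid:

library(MASS)  # for Null()

# Constraint transformation Z_p for the form tangent space at pole p.
# B: k x m0 real B-spline design on the grid; p: complex pole evaluations;
# w: integration weights. A coefficient vector z in R^(2*m0) encodes the
# tangent basis function e = B %*% z[1:m0] + 1i * (B %*% z[m0 + 1:m0]).
tangent_transform <- function(B, p, w) {
  m0 <- ncol(B)
  zetas <- list(rep(1 + 0i, length(p)),  # zeta^(1) = 1
                rep(0 + 1i, length(p)),  # zeta^(2) = i * 1
                1i * p)                  # zeta^(3) = i * p
  # each constraint Re(<e, zeta>) = 0 is linear in z, with row:
  C <- t(sapply(zetas, function(zeta)
    c(crossprod(B, Re(zeta * w)), crossprod(B, Im(zeta * w)))))
  Null(t(C))  # 2*m0 x (2*m0 - 3) basis of the constraint null space
}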

Given the tangent space basis, we may now modularly specify the usual additive model basis functions $b_j^{(l)}: \mathcal{X} \to \mathbb{R}$, $l = 1, \dots, m_j$, for the $j$th covariate effect to obtain the full functional additive model “tool box” offered by, for example, Brockhaus, Scheipl, and Greven (Citation2015). Typically, $b_j^{(l)}(x) = b_j^{(l)}(z)$ depends on an individual covariate, say $z$, in $x = (\dots, z, \dots)$. But for a single covariate also multiple different effects can be specified, and a single interaction effect depends on multiple covariates. A linear effect—linear in the tangent space—of the form $h_j(x) = \beta\, z$ with a scalar (typically centered) covariate $z$ and $\beta \in T_{[p]}\mathbb{Y}/G^*$ is simply implemented by a single function $b_j^{(1)}(x) = z$. A smooth effect of the generic form $h_j(x)(t) = f(z, t)$ can be implemented by choosing, for example, a B-spline basis $b_j^{(1)}(z), \dots, b_j^{(m_j)}(z)$ (asymptotic properties of penalized B-splines and connections to kernel estimators are discussed, for example, by Wood, Pya, and Säfken (Citation2016), Li and Ruppert (Citation2008)). For a categorical covariate $\kappa$ in $x$, with effect $h_j(x): \{1, \dots, K\} \to T_{[p]}\mathbb{Y}/G^*$, $\kappa \mapsto \beta_\kappa$, the basis $b_j(x) = (b_j^{(1)}(\kappa), \dots, b_j^{(m_j)}(\kappa))^\top$ maps $\kappa \mapsto e_\kappa$ to a usual contrast vector $e_\kappa$, with the basis being of dimension $m_j = K - 1$ just as in standard linear models. Here, we typically use effect coding to obtain centered effects (see the small example below). Moreover, TP interactions of the model terms described above, as well as group-specific effects and smooth effects with additional constraints (Hofner, Kneib, and Hothorn Citation2016), can be specified in the model formula, relying on the mboost framework introduced by Hothorn et al. (Citation2010), which also allows defining custom effect designs. For identification of an overall mean intercept $[p]$, sum-to-zero constraints yielding $\sum_{i=1}^n h_j(x_i) = 0$ for observed covariates $x_i$ can be specified, and similar constraints can be used to distinguish linear from nonlinear effects and interactions from their marginal effects (Kneib, Hothorn, and Tutz Citation2009). Different quadratic penalties can be specified for the coefficients $\Theta_j$, allowing to regularize high-dimensional effect bases and to balance effects of different complexity in the model fit (see Section 4).
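For instance, effect (sum) coding of a three-level covariate can be set up in base R as follows (a generic illustration, independent of our package):

# Effect (sum) coding for a categorical covariate with K = 3 levels: each
# level kappa is mapped to a contrast vector of dimension m_j = K - 1 = 2,
# so that effects are centered across levels.
kappa <- factor(c("wild", "feral", "domesticated", "feral"),
                levels = c("wild", "feral", "domesticated"))
contrasts(kappa) <- contr.sum(3)
model.matrix(~ kappa)[, -1]  # rows: b_j(x_i); intercept column dropped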

3.2 Tensor-Product Factorization

The multidimensional structure of the response objects makes it challenging to graphically illustrate and interpret additive model terms, in particular when it comes to nonlinear (interaction) effects, or when effect sizes are visually small. To solve this problem, we suggest rewriting estimated TP effects $\hat{h}_j$ with estimated coefficient matrix $\hat{\Theta}_j$ as
$$\hat{h}_j(x) = \sum_{r=1}^{m_j^\star} \xi_j^{(r)}\, \hat{h}_j^{(r)}(x)$$
factorized into $m_j^\star = \min(m_j, m_0)$ components consisting of covariate effects $\hat{h}_j^{(r)}: \mathcal{X} \to \mathbb{R}$, $r = 1, \dots, m_j^\star$, in corresponding orthonormal directions $\xi_j^{(r)} \in T_{[p]}\mathbb{Y}/G^*$ with $\langle \xi_j^{(r)}, \xi_j^{(l)}\rangle = \mathbb{1}(r = l)$, that is, 1 if $r = l$ and 0 otherwise. Assuming $\mathbb{E}(b_j^{(l)}(X)^2) < \infty$, $l = 1, \dots, m_j$, for the underlying effect basis, the $\hat{h}_j^{(r)}$ are specified to achieve decreasing component variances $v_j^{(1)} \geq \dots \geq v_j^{(m_j^\star)} \geq 0$ given by $v_j^{(r)} = \mathbb{E}(\hat{h}_j^{(r)}(X)^2)$. In practice, the expectation over the covariates $X$ and the inner product $\langle\cdot,\cdot\rangle$ are replaced by empirical analogues (compare Corollary 3, supplementary materials). Due to orthonormality of the $\xi_j^{(r)}$, the component variances add up to the total predictor variance $\sum_{r=1}^{m_j^\star} v_j^{(r)} = v_j = \mathbb{E}(\langle \hat{h}_j(X), \hat{h}_j(X)\rangle)$. Moreover, the TP factorization is optimally concentrated in the first components in the sense that for any $l \leq m_j^\star$ there is no sequence of $\xi_*^{(r)} \in \mathbb{Y}$ and $\hat{h}_*^{(r)}: \mathcal{X} \to \mathbb{R}$ such that $\mathbb{E}\big(\|\hat{h}_j(X) - \sum_{r=1}^l \xi_*^{(r)}\, \hat{h}_*^{(r)}(X)\|^2\big) < \mathbb{E}\big(\|\hat{h}_j(X) - \sum_{r=1}^l \xi_j^{(r)}\, \hat{h}_j^{(r)}(X)\|^2\big)$; that is, the series of the first $l$ components yields the best rank-$l$ approximation of $\hat{h}_j$. The factorization relies on an SVD of (a transformed version of) the coefficient matrix $\hat{\Theta}_j$, and the fact that it is well-defined is a variant of the Eckart-Young-Mirsky theorem (proof in Section S.2, supplementary materials).
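The empirical version of this factorization can be sketched in a few lines of R (our own illustration; it assumes tangent basis functions already orthonormalized w.r.t. the empirical inner product, so the SVD can be applied to effect scores directly instead of a transformed coefficient matrix):

# TP factorization of one effect: Theta (m x mj) coefficient matrix,
# E (k x m) complex matrix of orthonormalized tangent basis evaluations,
# Bj (n x mj) covariate basis evaluated at x_1, ..., x_n.
factorize_effect <- function(Theta, E, Bj) {
  n <- nrow(Bj)
  H <- Bj %*% t(Theta)  # n x m: tangent coordinates of h_j(x_i)
  s <- svd(H / sqrt(n))
  D <- diag(s$d, nrow = length(s$d))
  list(xi  = E %*% s$v,         # orthonormal directions xi_j^(r)
       h_r = sqrt(n) * s$u %*% D,  # scalar effects h_j^(r)(x_i) per column r
       v_r = s$d^2)             # decreasing empirical variances v_j^(r)
}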

Particularly when large shares of the predictor variance are explained by the first component(s), the decomposition facilitates graphical illustration and interpretation: choosing a suitable constant $\tau \neq 0$, an effect direction $\xi_j^{(r)}$ can be visualized by plotting the pole representative $p$ together with $\mathrm{Exp}_p(\tau\, \xi_j^{(r)})$ on the level of curves, while the accordingly rescaled $\frac{1}{\tau}\hat{h}_j^{(r)}(x)$ is displayed separately in a standard scalar effect plot. Adjusting $\tau$ offers an important degree of freedom for visualizing $\xi_j^{(r)}$ on an intuitively accessible scale while faithfully depicting $\xi_j^{(r)}\,\hat{h}_j^{(r)}(x)$. When based on the same $\tau$, different covariate effects can be compared across the plots sharing the same scale. We suggest $\tau = \max_j \sqrt{v_j}$, the maximum total predictor standard deviation of an effect, as a good first choice.

Besides factorizing effects separately, it can also be helpful to apply TP factorization to the joint additive predictor, yielding
$$\hat{h}(x) = \sum_{r=1}^{m^\star} \xi^{(r)}\, \hat{h}^{(r)}(x) = \sum_{r=1}^{m^\star} \xi^{(r)}\big(\hat{h}_1^{(r)}(x) + \dots + \hat{h}_J^{(r)}(x)\big),$$
with $m^\star = \min(\sum_j m_j, m)$ and again orthonormal $\xi^{(r)} \in T_{[p]}\mathbb{Y}/G^*$ and the corresponding variance concentration in the first components, but now determined w.r.t. entire additive predictors $\hat{h}^{(r)} = \sum_{j=1}^J \hat{h}_j^{(r)}$ spanned by all covariate basis functions in the predictor. In this representation, the first component yields a geodesic additive model approximation where the predictor moves along a geodesic line $c(\tau) = \mathrm{Exp}_{[p]}(\xi^{(1)}\tau)$ with the signed distance $\tau \in \mathbb{R}$ from $[p]$, modeled by a scalar additive predictor $\hat{h}^{(1)}(x)$ composed of covariate effects analogous to the original model predictor. In Section 5, we illustrate its potential in three different scenarios.

4 Component-Wise Riemannian L2-Boosting

Component-wise gradient boosting (e.g., Hothorn et al. Citation2010) is a step-wise model fitting procedure accumulating predictors from smaller models, so-called base-learners, to build an ensemble predictor aiming at minimizing a mean loss function. To this end, the base-learners are fit (via least squares) to the negative gradient of the loss function in each step, and the best-fitting base-learner is added to the current ensemble predictor. Due to its versatile applicability, inherent model selection, and slow over-fitting behavior, boosting has proven useful in various contexts (Mayr et al. Citation2014). Boosting with respect to the least squares loss function $l(y, \mu) = \frac{1}{2}(y - \mu)^2$, $y, \mu \in \mathbb{R}$, is typically referred to as L2-Boosting and simplifies to repeated refitting of residuals $\varepsilon = y - \mu = -\nabla_\mu\, l(y, \mu)$ corresponding to the negative gradient of the loss function. For L2-Boosting with a single learner, Bühlmann and Yu (Citation2003) show how fast bias decay and slow variance increase over the boosting iterations suggest stopping the algorithm early before approaching the ordinary (penalized) least squares estimator. Lutz and Bühlmann (Citation2006) prove consistency of component-wise L2-Boosting in a high-dimensional multivariate response linear regression setting, and Stöcker et al. (Citation2021) illustrate in extensive simulation studies how stopping the boosting algorithm early based on curve-wise cross-validation applies desired regularization when fitting (even highly autocorrelated) functional responses with parameter-intensive additive model base-learners and, thus, leads to good estimates even in challenging scenarios.

When generalizing to least squares on Riemannian manifolds with the loss $\frac{1}{2}d^2([y], [\mu])$ given by the squared geodesic distance, the negative gradient $-\nabla_{[\mu]}\, \frac{1}{2} d^2([y], [\mu]) = \mathrm{Log}_{[\mu]}([y]) = \varepsilon_{[\mu]}$ (compare, e.g., Pennec Citation2006) corresponds to the local residuals $\varepsilon_{[\mu]}$ defined in Section 3. This analogy to L2-Boosting motivates the presented generalization, where local residuals are further transported to residuals $\epsilon$ in a common linear space.

Consider the pole $[p]$ known and fixed for now. Assuming its existence, we aim to minimize the population mean loss
$$\sigma^2(h) = \mathbb{E}\big(d^2([Y], \mathrm{Exp}_{[p]}(h(X)))\big)$$
with the point-wise minimizer $h(x) = \operatorname{arg\,min}_{h': \mathcal{X} \to T_{[p]}\mathbb{Y}/G^*} \mathbb{E}\big(d^2([Y], \mathrm{Exp}_{[p]}(h'(X))) \mid X = x\big)$ minimizing the conditional expected squared distance. Fixing a covariate constellation $x \in \mathcal{X}$, the prediction $[\mu] = \mathrm{Exp}_{[p]}(h(x))$ corresponds to the Fréchet mean (Karcher Citation1977) of $[Y]$ conditional on $X = x$. In a finite-dimensional context, Pennec (Citation2006) shows that $\mathbb{E}(\varepsilon_{[\mu]}) = 0$ for a Fréchet mean $[\mu]$ if residuals $\varepsilon_{[\mu]}$ are uniquely defined with probability one. This indicates the connection to our residual-based model formulation in Section 3. We fit the model by reducing the empirical mean loss $\hat{\sigma}^2(h) = \frac{1}{n}\sum_{i=1}^n d_i^2\big([y_i], \mathrm{Exp}_{[p]}(h(x_i))\big)$, where we replace the population mean by the sample mean and compute the geodesic distances $d_i$ with respect to the inner products $\langle\cdot,\cdot\rangle_i$ defined for the respective evaluations of $y_i$.

A base-learner corresponds to a covariate effect $h_j(x) = \sum_{r,l} \theta_j^{(r,l)}\, b_j^{(l)}(x)\, e_r$, $\Theta_j = \{\theta_j^{(r,l)}\}_{r,l}$, which is repeatedly fit to the transported residuals $\epsilon_1, \dots, \epsilon_n$ by penalized least-squares (PLS), minimizing $\sum_{i=1}^n \|\epsilon_i - h_j(x_i)\|_i^2 + \lambda_j\, \mathrm{tr}(\Theta_j P_j \Theta_j^\top) + \lambda\, \mathrm{tr}(\Theta_j^\top P\, \Theta_j)$. Via the penalty parameters $\lambda_j, \lambda \geq 0$, the effective degrees of freedom of the base-learners are controlled (Hofner et al. Citation2011) to achieve a balanced “fair” base-learner selection despite the typically large and varying number of coefficients involved in the TP effects. The symmetric penalty matrices $P_j \in \mathbb{R}^{m_j \times m_j}$ and $P \in \mathbb{R}^{m \times m}$ (imposing, e.g., a second-order difference penalty for B-splines in either direction) can equivalently be arranged as an $m_j m \times m_j m$ penalty matrix $R_j = \lambda_j (P_j \otimes I_m) + \lambda (I_{m_j} \otimes P)$ for the vectorized coefficients $\mathrm{vec}(\Theta_j) = (\theta_j^{(1,1)}, \dots, \theta_j^{(m,1)}, \dots, \theta_j^{(m,m_j)})^\top$, where $\otimes$ denotes the Kronecker product. The standard PLS estimator is then given by $\mathrm{vec}(\hat{\Theta}_j) = (\Psi_j + R_j)^{-1}\psi_j$ with $\Psi_j = \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ b_j^{(l')}(x_i)\, e_{r'}\rangle_i)\}_{(r,l),(r',l')} \in \mathbb{R}^{m m_j \times m m_j}$ and $\psi_j = \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ \epsilon_i\rangle_i)\}_{(r,l)} \in \mathbb{R}^{m m_j}$, with indices running over $(r,l) = (1,1), \dots, (m,1), \dots, (m,m_j)$. In a regular design, using the functional linear array model (Brockhaus, Scheipl, and Greven Citation2015) can save memory and computation time by avoiding construction of the complete matrices. The basis construction of $\{e_r\}_r$ via a transformation matrix $Z_p$ (Section 3.1) is reflected in the penalty by setting $P = Z_p^\top (I_2 \otimes P_0)\, Z_p$ with $P_0$ the penalty matrix for the un-transformed basis $\{b_0^{(r)}\}_r$.
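In code, the Kronecker-structured penalty and the PLS solve amount to only a few lines (a schematic sketch with hypothetical inputs, not the package internals):

# PLS fit of one base-learner: Psi (m*mj x m*mj) and psi (m*mj) as in the
# text; marginal penalties P (m x m) and Pj (mj x mj); penalty parameters
# lambda and lambda_j. Returns the m x mj coefficient matrix Theta-hat.
pls_fit <- function(Psi, psi, P, Pj, lambda, lambda_j) {
  m <- nrow(P); mj <- nrow(Pj)
  R <- lambda_j * kronecker(Pj, diag(m)) + lambda * kronecker(diag(mj), P)
  matrix(solve(Psi + R, psi), nrow = m, ncol = mj)
}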

In each iteration of the proposed Algorithm 1, the best-performing base-learner is added to the current ensemble additive predictor $h(x)$ after multiplying it with a step-length parameter $\eta \in (0, 1]$. Due to the additive model structure, this corresponds to a coefficient update of the selected covariate effect. Accordingly, after repeated selection, the effective degrees of freedom of a covariate effect in general exceed the degrees specified for the base-learner; they are successively adjusted to the data. To avoid over-fitting, the algorithm is typically stopped early, before reaching a minimum of the empirical mean loss. The stopping iteration is determined, for example, by resampling strategies such as bootstrapping or cross-validation on the level of shapes/forms.

Algorithm 1: Component-wise Riemannian L2-Boosting

# Initialization:
Geometry: specify geometry (shape/form) and pole representative $p$
Hyper-parameters: step-length $\eta \in (0,1]$, number of boosting iterations
Base-learners: $h_j(x)$ with penalty matrix $R_j$ and initial coefficient matrix $\Theta_j = 0$
for $j = 1$ to $J$ do # prepare penalized least-squares (PLS)
    set up the $m m_j \times m m_j$ matrix $\Psi_j \leftarrow \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ b_j^{(l')}(x_i)\, e_{r'}\rangle_i)\}_{(r,l),(r',l')}$
end
repeat # boosting steps
    for $i = 1, \dots, n$ do # compute current transported residuals
        $[\mu_i] \leftarrow \mathrm{Exp}_{[p]}(h(x_i))$
        $\varepsilon_{[\mu_i]} \leftarrow \mathrm{Log}_{[\mu_i]}([y_i])$
        $\epsilon_i \leftarrow \mathrm{Transp}_{[\mu_i],[p]}(\varepsilon_{[\mu_i]})$
    end
    for $j = 1, \dots, J$ do # PLS fit to residuals
        $m m_j$ vector $\psi_j \leftarrow \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ \epsilon_i\rangle_i)\}_{(r,l)}$
        $\hat{\Theta}_j = \{\hat{\theta}_j^{(r,l)}\}_{r,l} \leftarrow \mathrm{Solve}\big((\Psi_j + R_j)\, \mathrm{vec}(\Theta) = \psi_j\big)$
    end
    $\hat{\jmath} \leftarrow \operatorname{arg\,min}_{j \in \{1,\dots,J\}} \sum_{i=1}^n \big\|\epsilon_i - \sum_{r,l} \hat{\theta}_j^{(r,l)}\, b_j^{(l)}(x_i)\, e_r\big\|_i^2$ # select base-learner
    $\Theta_{\hat{\jmath}} \leftarrow \Theta_{\hat{\jmath}} + \eta\, \hat{\Theta}_{\hat{\jmath}}$ # update selected model coefficients
until stopping criterion met (e.g., minimal cross-validation error)
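To illustrate the flow of Algorithm 1, the following toy sketch runs the boosting loop for shape responses with a single linear base-learner $h(x) = \beta x$ of one scalar covariate, reusing ip(), exp_shape(), log_shape(), and transp_shape() from the sketch in Section 2. It is a deliberately simplified illustration (fixed iteration number, no penalty, no base-learner selection), not the manifoldboost implementation:

# Y: list of complex evaluation vectors (centered, unit-norm representatives);
# x: centered scalar covariate; p: pole representative; w: weights.
boost_geodesic <- function(Y, x, p, w, eta = 0.1, steps = 100) {
  beta <- 0 * p  # coefficient tangent vector at [p]
  for (step in seq_len(steps)) {
    # transported residuals for the current fit mu_i = Exp_[p](x_i * beta):
    eps <- Map(function(y, xi) {
      mu <- exp_shape(p, xi * beta, w)
      transp_shape(mu, p, log_shape(mu, y, w), w)
    }, Y, x)
    # least-squares fit of the linear base-learner to the residuals:
    beta_hat <- Reduce(`+`, Map(`*`, eps, x)) / sum(x^2)
    beta <- beta + eta * beta_hat  # update (only one base-learner here)
  }
  beta
}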

The pole $[p]$ is, in fact, usually not a priori available. Instead, we typically assume $[p] = \operatorname{arg\,min}_{[q],\, q \in \mathbb{Y}^*} \mathbb{E}(d^2([Y], [q]))$ is the overall Fréchet mean, also often referred to as Riemannian center of mass for Riemannian manifolds or as Procrustes mean in shape analysis (Dryden and Mardia Citation2016). Here, we estimate it as $[p] = \mathrm{Exp}_{[p_0]}(h_0)$ in a preceding Riemannian L2-Boosting routine. The constant effect $h_0 \in T_{[p_0]}\mathbb{Y}/G^*$ in the intercept-only special case of our model is estimated with Algorithm 1 based on a preliminary pole $[p_0] \in \mathbb{Y}/G^*$. For shapes and forms, a good candidate for $p_0$ can be obtained as the standard functional mean of a reasonably well-aligned sample $y_1, \dots, y_n \in \mathbb{Y}$ of representatives.

The proposed Riemannian L2-Boosting algorithm is available in the R (R Core Team Citation2018) package manifoldboost (github.com/Almond-S/manifoldboost). The implementation is based on the package FDboost (Brockhaus, Rügamer, and Greven Citation2020), which is in turn based on the model-based boosting package mboost (Hothorn et al. Citation2010).

5 Applications and Simulation

5.1 Shape Differences in Astragali of Wild and Domesticated Sheep

In a geometric morphometric study, Pöllath, Schafberg, and Peters (Citation2019) investigate shapes of sheep astragali (ankle bones) to understand the influence of different living conditions on the micromorphology of the skeleton. Based on a total of $n = 163$ shapes recorded by Pöllath, Schafberg, and Peters (Citation2019), we model the astragalus shape in dependence on different variables, including domestication status (wild/feral/domesticated), sex (female/male/NA), age (juvenile/subadult/adult/NA), and mobility (confined/pastured/free) of the animals as categorical covariates. The sample comprises sheep of four different populations: Asiatic wild sheep (Field Museum, Chicago; Lay Citation1967; Zeder Citation2006), feral Soay sheep (British Natural History Museum, London; Clutton-Brock et al. Citation1990), and domestic sheep of the Karakul and Marsch breed (Museum of Livestock Sciences, Halle (Saale); Schafberg and Wussow Citation2010). Table S1 in Section S.3, supplementary materials, shows the distribution of available covariates within the populations. Each sheep astragalus shape, $i = 1, \dots, n$, is represented by a configuration composed of 11 selected landmarks in a vector $y_i^{lm} \in \mathbb{C}^{11}$ and two vectors of sliding semi-landmarks $y_i^{c1} \in \mathbb{C}^{14}$ and $y_i^{c2} \in \mathbb{C}^{18}$ evaluated along two outline curve segments, marked on a 2D image of the bone (dorsal view). Several example configurations are displayed in Figure S1, supplementary materials. In general, we could separately specify smooth function bases for the outline segments $y_i^{c1}$ and $y_i^{c2}$, respectively. Due to their systematic recording, we assume, however, that not only landmarks but also semi-landmarks are regularly observed on a fixed grid, and refrain from using smooth function bases for simplicity. Accordingly, shape configurations can directly be identified with their evaluation vectors $y_i = (y_i^{lm\top}, y_i^{c1\top}, y_i^{c2\top})^\top \in \mathbb{C}^{43} = \mathbb{Y}$, and the geometry of the response space $\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ widely corresponds to the classic Kendall’s shape space geometry, with the difference that, considering landmarks more descriptive than single semi-landmarks, we choose a weighted inner product $\langle y_i, y_i'\rangle = y_i^\dagger W y_i'$ with diagonal weight matrix $W$ with diagonal $(\mathbf{1}_{11}^\top, \frac{3}{14}\mathbf{1}_{14}^\top, \frac{3}{18}\mathbf{1}_{18}^\top)^\top$, assigning the weight of three landmarks to each outline segment. We model the astragalus shapes $[y_i] \in \mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ as
$$[\mu_i] = \mathrm{Exp}_{[p]}\big(\beta_{\mathrm{status}_i} + \beta_{\mathrm{pop}_i} + \beta_{\mathrm{age}_i} + \beta_{\mathrm{sex}_i} + \beta_{\mathrm{mobility}_i}\big)$$
with the pole $[p] \in \mathbb{Y}/G^*$ specified as overall mean and the conditional mean $[\mu_i] \in \mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ depending on the effect-coded covariate effects $x_{ij} \mapsto \beta_{x_{ij}} \in T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$. For identifiability, the population and mobility effects are centered around the status effect, as we only have data on different populations/mobility levels for domesticated sheep. All base-learners are regularized to one degree of freedom by employing ridge penalties for the coefficients of the covariate bases $\{b_j^{(l)}\}_l$, while the coefficients of the response basis (the standard basis for $\mathbb{C}^{43}$) are left un-penalized. With a step-length of $\eta = 0.1$, 10-fold shape-wise cross-validation suggests early stopping after 89 boosting iterations. Due to the regular design, we can make use of the functional linear array model (Brockhaus, Scheipl, and Greven Citation2015) for saving computation time and memory, which led to 8 sec for the initial model fit followed by 47 sec for cross-validation. To interpret the categorical covariate effects, we rely on TP factorization (Figure 2).
The first component of the status effect explains about 2/3 of the variance of the status effect and over 50% of the cumulative effect variance in the model. In that main direction, the effect of feral is not located between wild and domestic, as might be naively expected. By contrast, the second component of the effect seems to reflect the expected order and still explains a considerable amount of variance. Similar to Pöllath, Schafberg, and Peters (Citation2019), we find little influence of age, sex, and mobility on the astragalus shape. Yet, all covariates were selected by the boosting algorithm.
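The weighting above translates directly into code; a minimal snippet for the diagonal of $W$ (our own illustration):

# Diagonal of the weighted inner product matrix W for the astragalus data:
# 11 landmarks with unit weight, and each outline segment (14 and 18
# semi-landmarks) down-weighted to the total weight of three landmarks.
w_diag <- c(rep(1, 11), rep(3/14, 14), rep(3/18, 18))
sum(w_diag[12:25])  # each outline segment contributes total weight 3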

Fig. 2 Left: Shares of different factorized covariate effects in the total predictor variance. Right: Factorized effect plots showing the two first components of the status effect (rows): in the right column, the two first directions $\xi_1^{(1)}, \xi_1^{(2)} \in T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ are visualized via line segments originating at the overall mean shape (empty circles) and ending in the shape resulting from moving 1 unit into the target direction (solid circles; large: landmarks; small: semi-landmarks along the outline); in the left column, the status effect in the respective direction is depicted. As illustrated in the middle plot, an effect of 1 would correspond to the full extent of the direction shown to the right.

Visually, differences in estimated mean shapes are rather small, which is, in our experience, quite usual for shape data. With differences in size, rotation, and translation excluded by definition, only comparably small variance remains in the observed shapes. Nonetheless, TP factorization provides accessible visualization of the effect directions and allows a partial ordering of the effect levels in each direction.

5.2 Cellular Potts Model Parameter Effects on Cell Form

The stochastic biophysical model proposed by Thüroff et al. (Citation2019), a Cellular Potts Model (CPM), simulates migration dynamics of cells (e.g., wound healing or metastasis) in two dimensions. The progression of simulated cells is the result of many consecutive local elementary events sampled with a Metropolis algorithm according to a Hamiltonian. Different parameters controlling the Hamiltonian have to be calibrated to match real live cell properties (Schaffer Citation2021). Considering whole cells, parameter implications on the cell form are not obvious. To provide additional insights, we model the cell form in dependence on four CPM parameters considered particularly relevant: the bulk stiffness $x_{i1}$, membrane stiffness $x_{i2}$, substrate adhesion $x_{i3}$, and signaling radius $x_{i4}$ are subsumed in a vector $x_i$ of metric covariates for $i = 1, \dots, n$. Corresponding sampled cell outlines $y_i$ were provided by Sophia Schaffer in the context of Schaffer (Citation2021), who ran the underlying CPM simulations and extracted outlines. Deriving the intrinsic orientation of the cells from their movement trajectories, we parameterize $y_i: [0,1] \to \mathbb{C}$ clockwise relative to arc-length such that $y_i(0) = y_i(1)$ points into the movement direction of the barycenter of the cell. With an average of $\bar{k} = \frac{1}{n}\sum_{i=1}^n k_i \approx 43$ samples per curve (after sub-sampling preserving 95% of their inherent variation, as described in Volkmann et al. Citation2021, supplement), the evaluation vectors $y_i \in \mathbb{C}^{k_i}$ are equipped with an inner product implementing trapezoidal-rule integration weights. Example cell outlines are depicted in Figure S4, supplementary materials. The results shown below are based on cell samples obtained from 30 different CPM parameter configurations. For each configuration, 33 out of 10,000 Monte Carlo samples were extracted as approximately independent. This yields a dataset of $n = 990 = 30 \times 33$ cell outlines.

As positioning of the irregularly sampled cell outlines $y_i$, $i = 1, \dots, n$, in the coordinate system is arbitrary, we model the cell forms $[y_i] \in \mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^*$. Their estimated overall form mean $[p]$ serves as pole in the additive model
$$[\mu_i] = \mathrm{Exp}_{[p]}(h(x_i)) = \mathrm{Exp}_{[p]}\Big(\sum_j \beta_j\, x_{ij} + \sum_j f_j(x_{ij}) + \sum_{j \neq j'} f_{jj'}(x_{ij}, x_{ij'})\Big),$$
where the conditional form mean $[\mu_i]$ is modeled in dependence on tangent-space linear effects with coefficients $\beta_j \in T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}$ and nonlinear smooth effects $f_j$ for covariates $j = 1, \dots, 4$, as well as smooth interaction effects $f_{jj'}$ for each pair of covariates $j \neq j'$. All involved (effect) functions are modeled via a cyclic cubic P-spline basis $\{b_0^{(r)}\}_r$ with 7 (inner) knots and a ridge penalty, and quadratic P-splines with 4 knots for the covariates $x_{ij}$, equipped with a second-order difference penalty for the $f_j$ and ridge penalties for interactions. Covariate effects are mean centered, and interaction effects $f_{jj'}(x_j, x_{j'})$ are centered around their marginal effects $f_j(x_j), f_{j'}(x_{j'})$, which are in turn centered around the linear effects $\beta_j x_j$ and $\beta_{j'} x_{j'}$, respectively. Resulting predictor terms involve 69 (linear effect) to 1173 (interaction) basis coefficients but are penalized to a common degree of freedom of 2 to ensure a fair base-learner selection. We fit the model with a step-size of $\eta = 0.25$ and stop after 2000 boosting iterations, observing no further meaningful risk reduction, since no need for early stopping is indicated by 10-fold form-wise cross-validation. Due to the increased number of data points and coefficients, the irregular design, and the increased number of iterations, the model fit takes considerably longer than in Section 5.1, with about 50 min for the initial fit followed by 8 hr of cross-validation. However, as usual in boosting, model updates are large in the beginning and only marginal in later iterations, such that fits after 1000 or 500 iterations would already yield very similar results.

Observing that the most relevant components point into similar directions, we jointly factorize the predictor as $\hat{h}(x_i) = \sum_r \xi^{(r)}\, \hat{h}^{(r)}(x_i)$ with TP factorization. The first component explains about 93% of the total predictor variance (Figure S3, supplementary materials), indicating that, post hoc, a good share of the model can be reduced to the geodesic model $[\hat{\mu}_i] = \mathrm{Exp}_{[p]}(\xi^{(1)}\, \hat{h}^{(1)}(x_i))$ illustrated in Figure 3. A positive effect in the direction $\xi^{(1)}$ makes cells larger and more keratocyte/croissant shaped; a negative effect—pointing into the opposite direction—makes them smaller and more mesenchymal shaped/elongated. The bulk stiffness $x_{i1}$ turns out to present the most important driving factor behind the cell form, explaining over 75% of the cumulative variance of the effects (Figure S2, supplementary materials). Around 80% of its effect is explained by the linear term, reflecting gradual shrinkage at the side of the cells with increasing bulk stiffness.

Fig. 3 Center: the main direction $\xi^{(1)}$ of the model, illustrated as vectors pointing from the overall mean cell form $[p]$ (gray curve) to the form $\mathrm{Exp}_{[p]}(\xi^{(1)})$ (filled dots), both oriented as cells migrating rightwards. Left: Effects of the bulk stiffness $x_{i1}$ into the direction $\xi^{(1)}$. A vertical line from 0, corresponding to $[p]$, to 1, corresponding to the full extent of $\xi^{(1)}$, underlines the connection between the plots and helps to visually assess the amount of change for a given value of $x_{i1}$. Right: The overall effect of $x_{i1}$ and membrane stiffness $x_{i2}$, comprising linear, smooth, and interaction effects, as a 3D surface plot. The heat map plotted on the surface shows only the interaction effect $f_{12}^{(1)}(x_{i1}, x_{i2})$, illustrating deviations from the marginal effects, which are of particular interest for CPM calibration.

5.3 Realistic Shape and form Simulation Studies

To evaluate the proposed approach, we conduct simulation studies for both shape and form regression for irregular curves. We compare sample sizes $n \in \{54, 162\}$ and average grid sizes $\bar{k} = \frac{1}{n}\sum_{i=1}^n k_i \in \{40, 100\}$, as well as an extreme case with $k_i = 3$ for each curve but $n = 720$, that is, where only random triangles are observed (yet, with known parameterization over $[0,1]$). We additionally investigate the influence of nuisance effects and compare different inner product weights. While important results are summarized in the following, comprehensive visualizations can be found in Section S.5, supplementary materials.

Simulation design: We simulate models of the form $[\mu] = \mathrm{Exp}_{[p]}(\beta\,\kappa + f_1(z_1))$ with overall mean $[p]$, a binary effect with levels $\kappa \in \{0, 1\}$, and a smooth effect of $z_1 \in [-60, 60]$. We choose a cyclic cubic B-spline basis with 27 knots for $T_{[p]}\mathbb{Y}/G^*$, placing them irregularly at $1/27$-quantiles of unit-speed parameterization time-points of the curves. Cubic B-splines with four regularly placed knots are used for covariates in smooth effects. True models are based on the bot dataset from the R package Momocs (Bonhomme et al. Citation2014), comprising outlines of 20 beer ($\kappa = 0$) and 20 whiskey ($\kappa = 1$) bottles of different brands. A smooth effect is induced by the 2D viewing transformations resulting from tilting the planar outlines in a 3D coordinate system along their longitudinal axis by an angle of up to 60 degrees toward the viewer ($z_1 = 60$) and away ($z_1 = -60$), that is, in a way not captured by 2D rotation invariance. Establishing ground truth models based on a fit to the bottle data, we simulate new responses $[y_1], \dots, [y_n]$ via residual resampling (Section S.5, supplementary materials) to preserve realistic autocorrelation. Subsequently, we randomly translate, rotate, and scale $y_1, \dots, y_n \in \mathbb{Y}$ somewhat around the aligned shape/form representatives to obtain realistic samples.

The implied residual variance $\frac{1}{n}\sum_{i=1}^n \|\epsilon_i\|_i^2 = \frac{1}{n}\sum_{i=1}^n d_i^2([y_i], [\mu_i])$ on simulated datasets ranges around 105% of the predictor variance $\frac{1}{n}\sum_{i=1}^n \|h(x_i)\|_i^2 = \frac{1}{n}\sum_{i=1}^n d_i^2([\mu_i], [p])$ in the form scenario and around 65% in the shape scenario. All simulations were repeated 100 times, fitting models with the model terms specified above and three additional nuisance effects: a linear effect $\beta\, z_1$ (orthogonal to $f_1(z_1)$), an effect $f_2$ of the same structure as $f_1$ but depending on an independently uniformly drawn variable $z_2$, and a constant effect $h_0 \in T_{[p]}\mathbb{Y}/G^*$ to test centering around $[p]$. Base-learners are regularized to 4 degrees of freedom (step-length $\eta = 0.1$). Early stopping is based on 10-fold cross-validation.

Form scenario: In the form scenario, the smooth covariate effect $f_1$ offers a particularly clear interpretation. TP factorization decomposes the true effect into its two relevant components, where the first (major) component corresponds to the bare projection of the tilted outline in 3D into the 2D image plane and the second to additional perspective transformations (Figure 4). For this effect, we observe a median relative mean squared error $\mathrm{rMSE}(\hat{h}_j) = \sum_{i=1}^n \|\hat{h}_j(x_i) - h_j(x_i)\|_i^2 / \sum_{i=1}^n \|h(x_i)\|_i^2$ of about 3.7% of the total predictor variance for small data settings with $n = 54$ and $\bar{k} = 100$ (5.9% with $\bar{k} = 40$), which reduces to 1.5% for $n = 162$ (for both $\bar{k} = 40$ and $\bar{k} = 100$). It is typical for functional data that, from a certain point, adding more (highly correlated) evaluations per curve leads to distinctly less improvement in the model fit than adding further observations (compare, e.g., also Stöcker et al. Citation2021). In the extreme $k_i = 3$ scenario, we obtain an rMSE of around 15%, which is, unsurprisingly, considerably higher than for the moderate settings above. Even in this extreme setting (Figure 4), the effect directions are captured well, while the size of the effect is underestimated. Rotation alignment based on only three points (which are randomly distributed along the curves) might considerably differ from the full curve alignment, and averaging over these sub-optimal alignments masks the full extent of the effect. Still, results are very good given the sparsity of information in this case. Having a simpler form, the binary effect $\beta\,\kappa$ is also estimated more accurately, with an rMSE of around 1.5% for $n = 54$, $\bar{k} = 100$ (1.9% for $\bar{k} = 40$) and less than 0.8% for $n = 162$ (for both $\bar{k} = 40$ and $\bar{k} = 100$). The pole estimation accuracy varies on a similar scale.
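Computed on evaluation grids, the rMSE above amounts to the following small helper (our own, with curve-wise weighted norms as in Section 2):

# Relative MSE of an estimated effect: h_hat, h_true, h_full are lists of
# complex evaluation vectors (fitted effect, true effect, full predictor);
# w is the list of corresponding integration weights.
rmse_rel <- function(h_hat, h_true, h_full, w) {
  sqn <- function(a, wi) Re(sum(Conj(a) * a * wi))  # squared weighted norm
  sum(mapply(function(a, b, wi) sqn(a - b, wi), h_hat, h_true, w)) /
    sum(mapply(sqn, h_full, w))
}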

Fig. 4 Left: First (row 1) and second (row 2) main components of the smooth effect $f_1(z_1)$ in the form scenario, obtained from TP factorization. Normalized component directions are visualized as bottle outlines after transporting them to the true pole (gray solid outline). The underlying truth (solid lines/shaded areas) is plotted together with five example estimates for n = 162 and k = 100 (black solid lines) and for the extremely sparse $k_i = 3$ setting (gray dashed lines). Center: Conditional means for both bottle types with the metric covariate fixed at $z_1 = 0$ in the shape scenario with n = 54 and k = 40. Five example estimates (black solid outlines) are plotted in front of the underlying truth (shaded areas). Right: rMSE of the shown example estimates (jittered diamonds), contextualized with boxplots of the rMSE distributions observed in the respective simulation scenarios.

Shape scenario: Qualitatively, the shape scenario shows a similar picture. For k = 40, we observe median rMSEs of 2.8% (n = 54) and 2.2% (n = 162) for $f_1(z_1)$, and of 1.5% and 0.6% for the binary effect $\beta\kappa$. For k = 100, accuracy is again slightly higher.

Nuisance effects and integration weights: Nuisance effects were generally rarely selected and, if selected at all, led to only a marginal loss in accuracy. The constant effect is selected only occasionally in the extreme $k_i = 3$ (triangle) scenarios, where pole estimation is difficult. As variable selection is not our main focus here, we refer to Brockhaus et al. (Citation2017), who perform gradient boosting with functional responses, a large number of covariate effects, and stability selection, for simulations with larger numbers of nuisance effects and further discussion in a related context. Finally, simulations indicate that inner product weights implementing a trapezoidal rule for numerical integration are slightly preferable for typical grid sizes (k = 40, 100), whereas equal weights of $1/k_i$ over all grid points within a curve gave slightly better results in the extreme $k_i = 3$ settings.
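For concreteness, trapezoidal inner product weights for a single curve observed at (possibly irregular) time points can be constructed as in the following sketch; for closed outlines a cyclic variant would be natural, and the equal-weight alternative is simply rep(1/k, k).

trapez_weights <- function(t) {
  # composite trapezoidal rule weights for sorted time points t_1 < ... < t_k
  k  <- length(t)
  dt <- diff(t)
  c(dt[1] / 2, (dt[-(k - 1)] + dt[-1]) / 2, dt[k - 1] / 2)
}
trapez_weights(c(0, 0.1, 0.4, 0.5, 1))  # vs. rep(1/5, 5) for equal weights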

All in all, the simulations show that Riemannian L2-Boosting can adequately fit both shape and form models in realistic scenarios and captures effects reasonably well even for comparatively small numbers of sampled outlines or of evaluations per outline.

6 Discussion and Outlook

Compared to existing (landmark) shape regression models, the presented approach extends linear predictors to more general additive predictors, including, for example, smooth nonlinear model terms and interactions, and yields the first regression approach for functional shape as well as form responses. Moreover, we propose novel visualizations based on TP factorization that, similar to FPC analysis, enable a systematic decomposition of the variability explained by an additive effect on the tangent space level. Yielding meaningful coordinates for model effects, this factorization will be useful for visualization also for FAMs in linear spaces and beyond our model framework, as we illustrate for the nonparametric approach of Jeon and Park (Citation2020) in Section S.8, supplementary materials.
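To convey the coordinate-level idea of such a factorization, the toy R snippet below decomposes a hypothetical tensor-product coefficient matrix (rows indexing the covariate basis, columns the tangent space basis) by a plain singular value decomposition, ordering components by explained effect variability; this unweighted version ignores the inner product weighting a faithful implementation would require.

set.seed(3)
Theta <- matrix(rnorm(8 * 27), 8, 27)  # hypothetical effect coefficient matrix
dec   <- svd(Theta)                    # Theta = U diag(d) V'
# j-th component: direction dec$v[, j] in the tangent basis, scaled along the
# covariate by the basis coordinates dec$u[, j] * dec$d[j]
round(dec$d^2 / sum(dec$d^2), 3)       # share of variability per component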

Instead of operating on the original evaluations $y_i \in \mathbb{C}^{k_i}$ of the response curves $y_i$ as in all applications above, another frequently used approach first expands $y_i$, $i = 1, \dots, n$, in a common basis before carrying out statistical analysis on the coefficient vectors (compare Ramsay and Silverman (Citation2005), Morris (Citation2015), and Müller and Yao (Citation2008) for smoothing spline, wavelet, or FPC representations in FDA, or Bonhomme et al. (Citation2014) in shape analysis). Shape/form regression on the coefficients is, in fact, a special case of our approach, where the inner product is evaluated on the coefficients instead of on the evaluations (Section S.6, supplementary materials).
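A small R sketch of this equivalence, with hypothetical basis and weights: for curves $y = Bc$ with a common basis matrix $B$ and diagonal integration weights $W$, the weighted inner product of evaluations equals a Gram-matrix inner product of the coefficients.

k <- 40; p_basis <- 12
tgrid <- seq(0, 1, length.out = k)
B <- splines::bs(tgrid, df = p_basis, intercept = TRUE)  # common basis matrix
W <- diag(rep(1 / k, k))                                 # integration weights
G <- t(B) %*% W %*% B                                    # Gram matrix
c1 <- rnorm(p_basis); c2 <- rnorm(p_basis)
all.equal(drop(t(B %*% c1) %*% W %*% (B %*% c2)),  # inner product of evaluations
          drop(t(c1) %*% G %*% c2))                # = coefficient inner product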

The proposed model is motivated by geodesic regression. In the multiple linear predictor, however, a linear effect of a single covariate does, in general, not describe a geodesic for fixed nonzero values of the other covariate effects. Put differently, $\operatorname{Exp}_{[p]}(h_1 + h_2) \neq \operatorname{Exp}_{\operatorname{Exp}_{[p]}(h_1)}(h_2) \neq \operatorname{Exp}_{\operatorname{Exp}_{[p]}(h_2)}(h_1)$ in general (illustrated numerically below). Thus, hierarchical geodesic effects of the form $\operatorname{Exp}_{\operatorname{Exp}_{[p]}(h_1)}(h_2)$, relevant, i.a., in mixed models for hierarchical/longitudinal study designs (Kim et al. Citation2017), present an interesting future extension of our model. Moreover, an “elastic” extension based on the square-root-velocity framework (Srivastava and Klassen Citation2016) presents a promising direction for future research, as do other manifold responses.
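As a toy numerical check of this noncommutativity, consider the 2-sphere as a stand-in Riemannian manifold; in the R sketch below (all function names hypothetical), $h_2$ is parallel transported to the base point before the second exponential step, and the three constructions yield three distinct points.

exp_sphere <- function(p, v) {    # exponential map on the unit sphere
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(p)
  cos(nv) * p + sin(nv) * v / nv
}
transport <- function(p, v, w) {  # parallel transport of w along Exp_p(tv)
  nv <- sqrt(sum(v^2)); u <- v / nv
  w + sum(w * u) * ((cos(nv) - 1) * u - sin(nv) * p)
}
p  <- c(0, 0, 1)                          # pole
h1 <- c(0.6, 0, 0); h2 <- c(0, 0.8, 0)    # tangent vectors at p
rbind(joint = exp_sphere(p, h1 + h2),
      h1_h2 = exp_sphere(exp_sphere(p, h1), transport(p, h1, h2)),
      h2_h1 = exp_sphere(exp_sphere(p, h2), transport(p, h2, h1)))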

Acknowledgments

We sincerely thank Nadja Pöllath for providing carefully recorded sheep astragalus data and important insights and comments, and Sophia Schaffer for running and discussing cell simulations and providing fully processed cell outlines.

Supplementary Materials

Supplementary material with further details is provided in an online supplement.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

We gratefully acknowledge funding by grant GR 3793/3-1 from the German Research Foundation (DFG) and support by the Open Access Publication Fund of Humboldt-Universität zu Berlin.

References

  • Adams, D., Rohlf, F., and Slice, D. (2013), “A Field Comes of Age: Geometric Morphometrics in the 21st Century,” Hystrix, the Italian Journal of Mammalogy, 24, 7–14.
  • Backenroth, D., Goldsmith, J., Harran, M. D., Cortes, J. C., Krakauer, J. W., and Kitago, T. (2018), “Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis,” Journal of the American Statistical Association, 113, 1003–1015.
  • Baranyi, P., Yam, Y., and Várlaki, P. (2013), Tensor Product Model Transformation in Polytopic Model-based Control, Boca Raton, FL: CRC Press.
  • Bonhomme, V., Picq, S., Gaucherel, C., and Claude, J. (2014), “Momocs: Outline Analysis using R,” Journal of Statistical Software, 56, 1–24.
  • Brockhaus, S., Melcher, M., Leisch, F., and Greven, S. (2017), “Boosting Flexible Functional Regression Models with a High Number of Functional Historical Effects,” Statistics and Computing, 27, 913–926.
  • Brockhaus, S., Rügamer, D., and Greven, S. (2020), “Boosting Functional Regression Models with FDboost,” Journal of Statistical Software, 94, 1–50.
  • Brockhaus, S., Scheipl, F., and Greven, S. (2015), “The Functional Linear Array Model,” Statistical Modelling, 15, 279–300.
  • Bühlmann, P., and Yu, B. (2003), “Boosting with the L2 Loss: Regression and Classification,” Journal of the American Statistical Association, 98, 324–339.
  • Clutton-Brock, J., Dennis-Bryan, K., Armitage, P. L., and Jewell, P. A. (1990), “Osteology of the Soay Sheep,” Bulletin of the British Museum (Natural History), 56, 1–56.
  • Cornea, E., Zhu, H., Kim, P., Ibrahim, J. G., and the Alzheimer’s Disease Neuroimaging Initiative. (2017), “Regression Models on Riemannian Symmetric Spaces,” Journal of the Royal Statistical Society, Series B, 79, 463–482.
  • Davis, B. C., Fletcher, P. T., Bullitt, E., and Joshi, S. (2010), “Population Shape Regression from Random Design Data,” International Journal of Computer Vision, 90, 255–266.
  • Dryden, I. L., and Mardia, K. V. (2016), Statistical Shape Analysis: With Applications in R, Chichester: Wiley.
  • Ferraty, F., Goia, A., Salinelli, E., and Vieu, P. (2011), “Recent Advances on Functional Additive Regression,” in Recent Advances in Functional Data Analysis and Related Topics, ed. F. Ferraty, pp. 97–102, Heidelberg: Springer.
  • Fletcher, P. T. (2013), “Geodesic Regression and the Theory of Least Squares on Riemannian Manifolds,” International Journal of Computer Vision, 105, 171–185.
  • Greven, S., and Scheipl, F. (2017), “A General Framework for Functional Regression Modelling,” (with discussion and rejoinder), Statistical Modelling, 17(1–2), 1–35 and 100–115.
  • Happ, C., and Greven, S. (2018), “Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains,” Journal of the American Statistical Association, 113, 649–659.
  • Hinkle, J., Fletcher, P. T., and Joshi, S. (2014), “Intrinsic Polynomials for Regression on Riemannian Manifolds,” Journal of Mathematical Imaging and Vision, 50, 32–52.
  • Hofner, B., Hothorn, T., Kneib, T., and Schmid, M. (2011), “A Framework for Unbiased Model Selection Based on Boosting,” Journal of Computational and Graphical Statistics, 20, 956–971.
  • Hofner, B., Kneib, T., and Hothorn, T. (2016), “A Unified Framework of Constrained Regression,” Statistics and Computing, 26, 1–14.
  • Hong, Y., Singh, N., Kwitt, R., and Niethammer, M. (2014), “Time-Warped Geodesic Regression,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 105–112, Springer.
  • Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., and Hofner, B. (2010), “Model-based Boosting 2.0,” Journal of Machine Learning Research, 11, 2109–2113.
  • Huckemann, S., Hotz, T., and Munk, A. (2010), “Intrinsic MANOVA for Riemannian Manifolds with an Application to Kendall’s Space of Planar Shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 593–603.
  • Jeon, J. M., and Park, B. U. (2020), “Additive Regression with Hilbertian Responses,” The Annals of Statistics, 48, 2671–2697.
  • Jeon, J. M., Lee, Y. K., Mammen, E., and Park, B. U. (2022), “Locally Polynomial Hilbertian Additive Regression,” Bernoulli, 28, 2034–2066.
  • Jeon, J. M., Park, B. U., and Van Keilegom, I. (2021), “Additive Regression for Non-Euclidean Responses and Predictors,” The Annals of Statistics, 49, 2611–2641.
  • Jupp, P. E., and Kent, J. T. (1987), “Fitting Smooth Paths to Spherical Data,” Journal of the Royal Statistical Society, Series C, 36, 34–46.
  • Karcher, H. (1977), “Riemannian Center of Mass and Mollifier Smoothing,” Communications on Pure and Applied Mathematics, 30, 509–541.
  • Kendall, D. G., Barden, D., Carne, T. K., and Le, H. (1999), Shape and Shape Theory (Vol. 500), Chichester: Wiley.
  • Kent, J. T., Mardia, K. V., Morris, R. J., and Aykroyd, R. G. (2001), “Functional Models of Growth for Landmark Data,” in Proceedings in Functional and Spatial Data Analysis, pp. 109–115.
  • Kim, H. J., Adluru, N., Collins, M. D., Chung, M. K., Bendlin, B. B., Johnson, S. C., Davidson, R. J., and Singh, V. (2014), “Multivariate General Linear Models (mglm) on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2705–2712.
  • Kim, H. J., Adluru, N., Suri, H., Vemuri, B. C., Johnson, S. C., and Singh, V. (2017), “Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5777–5786.
  • Klingenberg, W. (1995), Riemannian Geometry, Berlin: de Gruyter.
  • Kneib, T., Hothorn, T., and Tutz, G. (2009), “Variable Selection and Model Choice in Geoadditive Regression Models,” Biometrics, 65, 626–634. DOI: 10.1111/j.1541-0420.2008.01112.x.
  • Kume, A., Dryden, I. L., and Le, H. (2007), “Shape-Space Smoothing Splines for Planar Landmark Data,” Biometrika, 94, 513–528.
  • Lay, D. M. (1967), “A Study of the Mammals of Iran: Resulting from the Street Expedition of 1962-63,” in Fieldiana: Zoology 54. Field Museum of Natural History.
  • Li, Y., and Ruppert, D. (2008), “On the Asymptotics of Penalized Splines,” Biometrika, 95, 415–436.
  • Lin, L., St. Thomas, B., Zhu, H., and Dunson, D. B. (2017), “Extrinsic Local Regression on Manifold-Valued Data,” Journal of the American Statistical Association, 112, 1261–1273. DOI: 10.1080/01621459.2016.1208615.
  • Lin, Z., Müller, H.-G., and Park, B. U. (2020), “Additive Models for Symmetric Positive-Definite Matrices, Riemannian Manifolds and Lie Groups,” arXiv preprint arXiv:2009.08789.
  • Lutz, R. W., and Bühlmann, P. (2006), “Boosting for High-Multivariate Responses in High-Dimensional Linear Regression,” Statistica Sinica, 16, 471–494.
  • Mallasto, A., and Feragen, A. (2018), “Wrapped Gaussian Process Regression on Riemannian Manifolds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5580–5588.
  • Mayr, A., Binder, H., Gefeller, O., and Schmid, M. (2014), “The Evolution of Boosting Algorithms,” Methods of Information in Medicine, 53, 419–427. DOI: 10.3414/ME13-01-0122.
  • Meyer, M. J., Coull, B. A., Versace, F., Cinciripini, P., and Morris, J. S. (2015), “Bayesian Function-on-Function Regression for Multilevel Functional Data,” Biometrics, 71, 563–574. DOI: 10.1111/biom.12299.
  • Morris, J. S. (2015), “Functional Regression,” Annual Review of Statistics and its Applications, 2, 321–359.
  • Morris, J. S., and Carroll, R. J. (2006), “Wavelet-based Functional Mixed Models,” Journal of the Royal Statistical Society, Series B, 68, 179–199. DOI: 10.1111/j.1467-9868.2006.00539.x.
  • Müller, H.-G., and Yao, F. (2008), “Functional Additive Models,” Journal of the American Statistical Association, 103, 1534–1544.
  • Muralidharan, P., and Fletcher, P. T. (2012), “Sasaki Metrics for Analysis of Longitudinal Data on Manifolds,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1027–1034, IEEE.
  • Olsen, N. L., Markussen, B., and Raket, L. L. (2018), “Simultaneous Inference for Misaligned Multivariate Functional Data,” Journal of the Royal Statistical Society, Series C, 67, 1147–1176.
  • Pennec, X. (2006), “Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements,” Journal of Mathematical Imaging and Vision, 25, 127–154.
  • Petersen, A., and Müller, H.-G. (2019), “Fréchet Regression for Random Objects with Euclidean Predictors,” The Annals of Statistics, 47, 691–719.
  • Pigoli, D., Menafoglio, A., and Secchi, P. (2016), “Kriging Prediction for Manifold-Valued Random Fields,” Journal of Multivariate Analysis, 145, 117–131.
  • Pöllath, N., Schafberg, R., and Peters, J. (2019), “Astragalar Morphology: Approaching the Cultural Trajectories of Wild and Domestic Sheep Applying Geometric Morphometrics,” Journal of Archaeological Science: Reports, 23, 810–821.
  • R Core Team (2018), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
  • Ramsay, J. O., and Silverman, B. W. (2005), Functional Data Analysis, New York: Springer.
  • Rosen, O., and Thompson, W. K. (2009), “A Bayesian Regression Model for Multivariate Functional Data,” Computational Statistics & Data Analysis, 53, 3773–3786. DOI: 10.1016/j.csda.2009.03.026.
  • Schafberg, R., and Wussow, J. (2010), “Julius Kühn. Das Lebenswerk eines agrarwissenschaftlichen Visionärs,” Züchtungskunde, 82, 468–484.
  • Schaffer, S. A. (2021), “Cytoskeletal Dynamics in Confined Cell Migration: Experiment and Modelling,” PhD thesis, LMU Munich. DOI: 10.5282/edoc.28480.
  • Scheipl, F., Staicu, A.-M., and Greven, S. (2015), “Functional Additive Mixed Models,” Journal of Computational and Graphical Statistics, 24, 477–501. DOI: 10.1080/10618600.2014.901914.
  • Schiratti, J.-B., Allassonnière, S., Colliot, O., and Durrleman, S. (2017), “A Bayesian Mixed-Effects Model to Learn Trajectories of Changes from Repeated Manifold-Valued Observations,” The Journal of Machine Learning Research, 18, 4840–4872.
  • Shi, X., Styner, M., Lieberman, J., Ibrahim, J. G., Lin, W., and Zhu, H. (2009), “Intrinsic Regression Models for Manifold-Valued Data,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 192–199, Springer.
  • Srivastava, A., and Klassen, E. P. (2016), Functional and Shape Data Analysis, New York: Springer-Verlag.
  • Stöcker, A., Brockhaus, S., Schaffer, S. A., Bronk, B. v., Opitz, M., and Greven, S. (2021), “Boosting Functional Response Models for Location, Scale and Shape with an Application to Bacterial Competition,” Statistical Modelling, 21, 385–404.
  • Stöcker, A., Pfeuffer, M., Steyer, L., and Greven, S. (2022), “Elastic Full Procrustes Analysis of Plane Curves via Hermitian Covariance Smoothing.” DOI: 10.48550/arXiv.2203.10522.
  • Thüroff, F., Goychuk, A., Reiter, M., and Frey, E. (2019), “Bridging the Gap between Single-Cell Migration and Collective Dynamics,” eLife, 8, e46842.
  • Volkmann, A., Stöcker, A., Scheipl, F., and Greven, S. (2021), “Multivariate Functional Additive Mixed Models,” Statistical Modelling. DOI: 10.1177/1471082X211056158.
  • Wood, S. N., Pya, N., and Säfken, B. (2016), “Smoothing Parameter and Model Selection for General Smooth Models,” Journal of the American Statistical Association, 111, 1548–1563.
  • Yao, F., Müller, H., and Wang, J. (2005), “Functional Data Analysis for Sparse Longitudinal Data,” Journal of the American Statistical Association, 100, 577–590.
  • Zeder, M. A. (2006), “Reconciling Rates of Long Bone Fusion and Tooth Eruption and Wear in Sheep (Ovis) and Goat (Capra),” Recent Advances in Ageing and Sexing Animal Bones, 9, 87–118.
  • Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C., and Lin, W. (2009), “Intrinsic Regression Models for Positive-Definite Matrices with Applications to Diffusion Tensor Imaging,” Journal of the American Statistical Association, 104, 1203–1212.
  • Zhu, H., Li, R., and Kong, L. (2012), “Multivariate Varying Coefficient Model for Functional Responses,” Annals of Statistics, 40, 2634–2666.
  • Zhu, H., Morris, J. S., Wei, F., and Cox, D. D. (2017), “Multivariate Functional Response Regression, with Application to Fluorescence Spectroscopy in a Cervical Pre-cancer Study,” Computational Statistics and Data Analysis, 111, 88–101.