
Functional Additive Models on Manifolds of Planar Shapes and Forms

Pages 1600-1612 | Received 22 Nov 2021, Accepted 24 Dec 2022, Published online: 15 Mar 2023

Abstract

The “shape” of a planar curve and/or landmark configuration is considered its equivalence class under translation, rotation, and scaling; its “form” is its equivalence class under translation and rotation, while scale is preserved. We extend generalized additive regression to models for such shapes/forms as responses, respecting the resulting quotient geometry by employing the squared geodesic distance as loss function and a geodesic response function to map the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian L2-Boosting algorithm well suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel, intuitively interpretable visualizations for (even nonlinear) covariate effects in the shape/form space via suitable tensor-product factorization. The usefulness of the proposed framework is illustrated in an analysis of (a) astragalus shapes of wild and domesticated sheep and (b) cell forms generated in a biophysical model, as well as (c) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines. Supplementary materials for this article are available online.

1 Introduction

In many imaging data problems, the coordinate system of recorded objects is arbitrary or explicitly not of interest. Statistical shape analysis (Dryden and Mardia Citation2016) addresses this point by identifying the ultimate object of analysis as the shape of an observation, reflecting its geometric properties invariant under translation, rotation, and rescaling, or as its form (or size-and-shape), invariant under translation and rotation. This article establishes a flexible additive regression framework for modeling the shape or form of planar (potentially irregularly sampled) curves and/or landmark configurations in dependence on scalar covariates. A rich shape analysis literature has been developed for 2D or 3D landmark configurations—representing, for instance, selected points of a bone or face—which are considered elements of Kendall’s shape space (see, e.g., Dryden and Mardia Citation2016). In many 2D scenarios, however, observed points describe a curve reflecting the outline of an object rather than dedicated landmarks (Adams, Rohlf, and Slice Citation2013). Considering outlines as images of (parameterized) curves shows a direct link to functional data analysis (FDA, Ramsay and Silverman Citation2005) and, in this context, we speak of functional shape/form data analysis. As in FDA, functional shape/form data can be observed on a common and often dense grid (regular/dense design) or on curve-specific, often sparse grids (irregular/sparse design). While in the regular case analysis often simplifies by treating curve evaluations as multivariate data, more general irregular designs gave rise to further developments in sparse FDA (e.g., Yao, Müller, and Wang Citation2005; Greven and Scheipl Citation2017), explicitly considering irregular measurements instead of pre-smoothing curves. To the best of our knowledge, we are the first to consider irregular/sparse designs in the context of functional shape/form analysis.

Shapes and forms are examples of manifold data. Petersen and Müller (Citation2019) propose “Fréchet regression” for random elements in general metric spaces, which requires estimation of a (potentially negatively) weighted Fréchet mean for each covariate combination. Their implicit rather than explicit model formulation renders model interpretation difficult. More explicit model formulations have been developed for the special case of a Riemannian geometry. Besides tangent space models (Kent et al. Citation2001), extrinsic models (Lin et al. Citation2017) and models based on unwrapping (Jupp and Kent Citation1987; Mallasto and Feragen Citation2018), a variety of manifold regression models have been designed based on the intrinsic Riemannian geometry. Starting from geodesic regression (Fletcher Citation2013), which extends linear regression to curved spaces, these include MANOVA (Huckemann, Hotz, and Munk Citation2010), polynomial regression (Hinkle, Fletcher, and Joshi Citation2014), smoothing splines (Kume, Dryden, and Le Citation2007), regression along geodesic paths with nonconstant speed (Hong et al. Citation2014), or kernel regression (Davis et al. Citation2010) and Kriging (Pigoli, Menafoglio, and Secchi Citation2016). However, mostly only one metric covariate or categorical covariates are considered, possibly in hierarchical model extensions for longitudinal data (Muralidharan and Fletcher Citation2012; Schiratti et al. Citation2017). By contrast, Zhu et al. (Citation2009), Shi et al. (Citation2009), and Kim et al. (Citation2014) generalize geodesic regression to regression with multiple covariates focusing on Symmetric Positive-Definite (SPD) matrix responses. Cornea et al. (Citation2017) develop a general Generalized Linear Model (GLM) analogue regression framework for responses in a symmetric manifold and apply it to shape analysis. Recently, Lin, Müller, and Park (Citation2020) proposed a Lie group additive regression model for Riemannian manifolds focusing on SPD matrices rather than shapes.

In FDA, there is a much wider range of developed regression methods (see overviews in Morris Citation2015; Greven and Scheipl Citation2017). Among the most flexible models are Functional Additive Models (FAMs) for (univariate) functional responses (in contrast to FAMs with functional covariates (Ferraty et al. Citation2011)), with either semi- or nonparametric approaches to model (a) response functions and (b) smooth covariate effects. For (a), semi-parametric approaches directly employ finite expansions in spline bases (Brockhaus, Scheipl, and Greven Citation2015), Functional Principal Component (FPC) bases (Morris and Carroll Citation2006), or both (Scheipl, Staicu, and Greven Citation2015), as well as wavelets (Meyer et al. Citation2015), sometimes directly expanding functions to model on coefficients and sometimes expanding only predictions while keeping the raw measurements. Nonparametric approaches, by contrast, formulate estimation problems in infinite-dimensional model spaces and effectively evaluate curves on grids or apply pre-smoothing techniques (e.g., Jeon and Park Citation2020). For (b), again semiparametric penalized spline basis approaches are employed (Scheipl, Staicu, and Greven Citation2015; Brockhaus, Scheipl, and Greven Citation2015), or local linear/polynomial (Müller and Yao Citation2008; Jeon et al. Citation2022) or other nonparametric kernel-based approaches (Jeon and Park Citation2020; Jeon, Park, and Van Keilegom Citation2021). Semi- and nonparametric approaches come with different theoretical and practical advantages, but similarities, for example regarding asymptotic behavior, are also known from scalar nonparametric regression (Li and Ruppert Citation2008). Advantages of the semi-parametric approach summarized in Greven and Scheipl (Citation2017) include its appropriateness for sparse irregular functional data and its modular extensibility to functional mixed models (Scheipl, Staicu, and Greven Citation2015; Meyer et al. Citation2015) and nonstandard response distributions (Brockhaus, Scheipl, and Greven Citation2015; Stöcker et al. Citation2021). For bivariate or multivariate functional responses, which are closest to functional shapes/forms but without invariances, Rosen and Thompson (Citation2009), Zhu, Li, and Kong (Citation2012), and Olsen, Markussen, and Raket (Citation2018) consider linear fixed effects of scalar covariates, the latter also allowing for warping. Zhu et al. (Citation2017) and Backenroth et al. (Citation2018) consider one or more random effects for one grouping variable, linear fixed effects, and common dense grids for all functions. Volkmann et al. (Citation2021) combine the FAM model class of Greven and Scheipl (Citation2017) with multivariate FPC analysis (Happ and Greven Citation2018) to model multivariate (sparse) functional responses.

This article establishes an interpretable FAM framework for modeling the shape or form of planar (potentially irregularly sampled) curves and/or landmark configurations in dependence on scalar covariates, extending L2-Boosting (Bühlmann and Yu Citation2003; Brockhaus, Scheipl, and Greven Citation2015) to Riemannian manifolds for model estimation. The three major contributions of our regression framework are: (i) We introduce additive regression with shapes/forms of planar curves and/or landmarks as response, extending FAMs to nonlinear response spaces or, vice versa, extending GLM-type regression on manifolds for landmark shapes both to functional shape manifolds and to include (nonlinear) additive model effects. (ii) We propose a novel Riemannian L2-Boosting algorithm for estimating regression models for this type of manifold response, and (iii) a visualization technique based on tensor-product factorization yielding intuitive interpretations even of multi-dimensional smooth covariate effects for practitioners. Although related tensor-product model transformations based on higher-order SVD have been used, e.g., in control engineering (Baranyi, Yam, and Várlaki Citation2013), we are not aware of any comparable application for visualization in FAMs or other statistical models for object data. Despite our focus on shapes and forms, transfer of the model, Riemannian L2-Boosting, and factorized visualization to other Riemannian manifold responses is intended in the generality of the formulation and the design of the provided R package manifoldboost (developer version on github.com/Almond-S/manifoldboost). The versatile applicability of the approach is illustrated in three different scenarios: an analysis of the shape of sheep astragali (ankle bones) represented by both regularly sampled curves and landmarks in dependence on categorical “demographic” variables; an analysis of the effects of different metric biophysical model parameters (including smooth interactions) on the form of (irregularly sampled) cell outlines generated from a cellular Potts model; and a simulation study with irregularly sampled functional shape and form responses generated from a dataset of different bottle outlines and including metric and categorical covariates.

In Section 2, we introduce the manifold geometry of irregular curves modulo translation, rotation and potentially rescaling, which underlies the intrinsic additive regression model formulated in Section 3. The Riemannian L2-Boosting algorithm is introduced in Section 4. Section 5 analyzes different data problems, modeling sheep bone shape responses (Section 5.1) and cell outlines (Section 5.2). Section 5.3 summarizes the results of simulation studies with functional shape and form responses. We conclude with a discussion in Section 6.

2 Geometry of Functional Shapes and Forms

Riemannian manifolds of planar shapes (and forms) are discussed in various textbooks at different levels of generality, in finite (Kendall et al. Citation1999; Dryden and Mardia Citation2016) or potentially infinite dimensions (Srivastava and Klassen Citation2016; Klingenberg Citation1995). Starting from the Hilbert space Y of curve representatives y of a single shape or form observation, we successively characterize its quotient space geometry under translation, rotation and rescaling including the respective tangent spaces. Building on that, we introduce Riemannian exponential and logarithmic maps and parallel transports needed for model formulation and fitting, and the sample space of (irregularly observed) functional shapes/forms.

To make use of complex arithmetic, we identify the two-dimensional plane with the complex numbers, $\mathbb{R}^2 \cong \mathbb{C}$, and consider a planar curve to be a function $y: \mathcal{T} \to \mathbb{C}$, element of a separable complex Hilbert space $\mathbb{Y}$ with a complex inner product $\langle\cdot,\cdot\rangle$ and corresponding norm $\|\cdot\|$. This allows simple scalar expressions for the group actions of translation $\mathrm{Trl} = \{y \mapsto \mathrm{Trl}_\gamma\, y = y + \gamma\,\mathbf{1} : \gamma \in \mathbb{C}\}$ with $\mathbf{1} \in \mathbb{Y}$ canonically given by $\mathbf{1}: t \mapsto 1$, the real constant function of unit norm; rescaling $\mathrm{Scl} = \{y \mapsto \mathrm{Scl}_\lambda\, y = \lambda\cdot(y - 0_y) + 0_y : \lambda \in \mathbb{R}_+\}$ around the centroid $0_y = \langle \mathbf{1}, y\rangle\,\mathbf{1}$ (which we consider more natural than using $0$, the zero element of $\mathbb{Y}$, mostly chosen in the literature); and rotation $\mathrm{Rot} = \{y \mapsto \mathrm{Rot}_u\, y = u\cdot(y - 0_y) + 0_y : u \in S^1\}$ around $0_y$ with $S^1 = \{u \in \mathbb{C} : |u| = 1\} = \{\exp(\omega i) : \omega \in \mathbb{R}\}$ reflecting counterclockwise rotations by $\omega$ in radian measure. Concatenation yields combined group actions $G$ as direct products, such as the rigid motions $G = \mathrm{Trl} \times \mathrm{Rot} = \{\mathrm{Trl}_\gamma \circ \mathrm{Rot}_u : \gamma \in \mathbb{C}, u \in S^1\} \cong \mathbb{C} \times S^1$ (see Section S.1.1, supplementary materials, for more details). The two real-valued component functions of $y$ are identified with the real part $\mathrm{Re}(y): \mathcal{T} \to \mathbb{R}$ and imaginary part $\mathrm{Im}(y): \mathcal{T} \to \mathbb{R}$ of $y = \mathrm{Re}(y) + \mathrm{Im}(y)\,i$. While the complex setup is used for convenience, the real part of $\langle\cdot,\cdot\rangle$ constitutes an inner product $\mathrm{Re}\langle y_1, y_2\rangle = \langle \mathrm{Re}(y_1), \mathrm{Re}(y_2)\rangle + \langle \mathrm{Im}(y_1), \mathrm{Im}(y_2)\rangle$ for $y_1, y_2 \in \mathbb{Y}$ on the underlying real vector space of planar curves. Typically, $\mathrm{Re}(y)$ and $\mathrm{Im}(y)$ are assumed square-integrable with respect to a measure $\nu$, and we consider the canonical inner product $\langle y_1, y_2\rangle = \int y_1^\dagger\, y_2\, d\nu$, where $y^\dagger$ denotes the conjugate transpose of $y$; that is, $y^\dagger(t) = \mathrm{Re}(y)(t) - \mathrm{Im}(y)(t)\,i$ is simply the complex conjugate, but for vectors $y \in \mathbb{C}^k$, the vector $y^\dagger$ is also transposed. For curves, we typically assume $\nu$ to be the Lebesgue measure on $\mathcal{T} = [0,1]$; for landmarks, a standard choice is the counting measure on $\mathcal{T} = \{1, \dots, k\}$.

The ultimate response object is given by the orbit $[y]_G = \{g(y) : g \in G\}$ (or short $[y]$) of $y \in \mathbb{Y}$, the equivalence class under the respective combined group actions $G$: with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$, $[y] = [y]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}} = \{\lambda u\, y + \gamma\,\mathbf{1} : \lambda \in \mathbb{R}_+, u \in S^1, \gamma \in \mathbb{C}\}$ is referred to as the shape of $y$ and, for $G = \mathrm{Trl}\times\mathrm{Rot}$, $[y] = [y]_{\mathrm{Trl}\times\mathrm{Rot}} = \{u\,y + \gamma\,\mathbf{1} : u \in S^1, \gamma \in \mathbb{C}\}$ as its form or size-and-shape. $\mathbb{Y}/G = \{[y]_G : y \in \mathbb{Y}\}$ denotes the quotient space of $\mathbb{Y}$ with respect to $G$. The description of the Riemannian geometry of $\mathbb{Y}/G$ involves, in particular, a description of the tangent spaces $T_{[y]}\mathbb{Y}/G$ at points $[y] \in \mathbb{Y}/G$, which can be considered local vector space approximations to $\mathbb{Y}/G$ in a neighborhood of $[y]$. For a point $q$ in a manifold $\mathcal{M}$, the tangent vectors $\beta \in T_q\mathcal{M}$ can, inter alia, be thought of as gradients $\dot{c}(0)$ of paths $c: \mathbb{R} \supset (-\delta, \delta) \to \mathcal{M}$ at $0$, where they pass through $c(0) = q$. Besides their geometric meaning, they will also play an important role in the regression model, as additive model effects are formulated on the tangent space level. Choosing suitable representatives $\tilde{y}_G \in [y]_G \subset \mathbb{Y}$ (or short $\tilde{y}$) of orbits $[y]_G$, we use an identification of tangent spaces with suitable linear subspaces $T_{[y]_G}\mathbb{Y}/G \subset \mathbb{Y}$.

Form geometry: Starting with translation as the simplest invariance, an orbit $[y]_{\mathrm{Trl}}$ can be one-to-one identified with its centered representative $\tilde{y}_{\mathrm{Trl}} = y - \langle \mathbf{1}, y\rangle\,\mathbf{1}$, yielding an identification $\mathbb{Y}/\mathrm{Trl} \cong \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0\}$ with a linear subspace of $\mathbb{Y}$. Hence, also $T_{[y]}\mathbb{Y}/\mathrm{Trl} = \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0\}$. For rotation, by contrast, we can only find local identifications with Hilbert subspaces (i.e., charts) around reference points $[p]_{\mathrm{Trl}\times\mathrm{Rot}}$, which we refer to as “poles”. Moreover, we restrict to $y, p \in \mathbb{Y}^* = \mathbb{Y} \setminus [0]_{\mathrm{Trl}}$, eliminating constant functions as degenerate special cases in the translation orbit of zero. For each $[y]_{\mathrm{Trl}\times\mathrm{Rot}}$ in an open neighborhood around $[p]_{\mathrm{Trl}\times\mathrm{Rot}}$, which can be chosen with $\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle \neq 0$, $y$ can be uniquely rotation aligned to $p$, yielding a one-to-one identification of the form $[y]_{\mathrm{Trl}\times\mathrm{Rot}}$ with the aligned representative given by $\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}} = \frac{\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle}{|\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle|}\, \tilde{y}_{\mathrm{Trl}} = \operatorname{arg\,min}_{y' \in [y]_{\mathrm{Trl}\times\mathrm{Rot}}} \|y' - p\|$ (compare Figure 1). While $\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}}$ depends on $p$, we omit this in the notation for simplicity. All $\tilde{y}_{\mathrm{Trl}}$ rotation aligned to $\tilde{p}_{\mathrm{Trl}}$ lie on the hyper-plane determined by $\mathrm{Im}(\langle \tilde{y}_{\mathrm{Trl}}, \tilde{p}_{\mathrm{Trl}}\rangle) = 0$ (Figure 1), which yields $T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^* = \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0,\ \mathrm{Im}(\langle y, p\rangle) = 0\}$ with normal vectors $\zeta^{(1)} = \mathbf{1}$, $\zeta^{(2)} = i\,\mathbf{1}$, $\zeta^{(3)} = i\,p$. Note that, despite the use of complex arithmetic, $T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^*$ is a real vector space not closed under complex scalar multiplication. The geodesic distance of $[y]_{\mathrm{Trl}\times\mathrm{Rot}}$ to the pole $[p]_{\mathrm{Trl}\times\mathrm{Rot}}$ is given by $d([y]_{\mathrm{Trl}\times\mathrm{Rot}}, [p]_{\mathrm{Trl}\times\mathrm{Rot}}) = \|\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}} - \tilde{p}_{\mathrm{Trl}}\| = \min_{y' \in [y]_{\mathrm{Trl}\times\mathrm{Rot}},\, p' \in [p]_{\mathrm{Trl}\times\mathrm{Rot}}} \|y' - p'\|$. It reflects the length of the shortest path (i.e., the geodesic) between the forms and the minimum distance between the orbits as sets.

Fig. 1 Left: Quotient space geometry: assuming $p$ and $y$ centered, translation invariance is not further considered in the plot; given pole representative $p$, we express $y = \frac{\mathrm{Re}(\langle p, y\rangle)}{\|p\|^2}\,p + \frac{\mathrm{Im}(\langle p, y\rangle)}{\|p\|^2}\,ip + \big(y - \frac{\langle p, y\rangle}{\|p\|^2}\,p\big) \in \mathbb{Y}$ in its coordinates in $p$ and $ip$ direction, subsuming all orthogonal directions in the third dimension. In this coordinate system, the rotation orbit $[y]_{\mathrm{Rot}}$ corresponds to the dotted horizontal circle and is identified with the aligned $\tilde{y} := \tilde{y}_{\mathrm{Rot}}$ in the half-plane of $p$; $[y]_{\mathrm{Rot}\times\mathrm{Scl}}$ is identified with the unit vector $\tilde{y}_{\mathrm{Rot}\times\mathrm{Scl}} = \tilde{y}/\|\tilde{y}\|$ projecting $\tilde{y}$ onto the hemisphere depicted by the vertical semicircle. Form and shape distances between $[p]$ and $[y]$ correspond to the length of the geodesics $c(\tau)$ (thick lines) on the plane and sphere, respectively. Right: Geodesic line $c(\tau)$ between $p = c(0)$ and $p' = c(1)$, Log-map projecting $y$ to $\varepsilon \in T_p\mathcal{M}$, parallel transport $\mathrm{Transp}_{p,p'}$ forwarding $\varepsilon$ to $\varepsilon' \in T_{p'}\mathcal{M}$, and Exp-map projecting $\varepsilon'$ onto $\mathcal{M}$, visualized for a sphere. Tangent spaces, identified with subspaces of the ambient space, are depicted as gray planes above the respective poles. The parallel transport preserves all angles between tangent vectors and identifies $\dot{c}(0) \cong \dot{c}(1)$.

Shape geometry: To account for scale invariance in shapes $[y]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}$, they are identified with normalized representatives $\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}} = \tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}}/\|\tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}}\|$. Motivated by the normalization, we borrow the well-known geometry of the sphere $\mathbb{S} = \{y \in \mathbb{Y} : \|y\| = 1\}$, where $T_p\mathbb{S} = \{y \in \mathbb{Y} : \mathrm{Re}(\langle y, p\rangle) = 0\}$ is the tangent space at a point $p \in \mathbb{S}$ and geodesics are great circles. Together with translation and rotation invariance, the shape tangent space is then given by $T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^* = T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^* \cap T_p\mathbb{S} = \{y \in \mathbb{Y} : \langle y, \mathbf{1}\rangle = 0,\ \langle y, p\rangle = 0\}$ with normal vector $\zeta^{(4)} = p$ in addition to $\zeta^{(1)}, \zeta^{(2)}, \zeta^{(3)}$ above. The geodesic distance $d([p]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}, [y]_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}) = \arccos|\langle \tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}, \tilde{p}_{\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}}\rangle|$ corresponds to the arc-length between the representatives. This distance is often referred to as the Procrustes distance in statistical shape analysis.

We may now define the maps needed for the regression model formulation. Let $\tilde{y}$ and $\tilde{p}$ be shape/form representatives of $[y]$ and $[p]$ rotation aligned to the shape/form pole representative $p$. Generalizing straight lines to a Riemannian manifold $\mathcal{M}$, geodesics $c: (-\delta, \delta) \to \mathcal{M}$ can be characterized by their “intercept” $c(0) \in \mathcal{M}$ and “slope” $\dot{c}(0) \in T_{c(0)}\mathcal{M}$. The exponential map $\mathrm{Exp}_q: T_q\mathcal{M} \to \mathcal{M}$ at a point $q \in \mathcal{M}$ is defined to map $\beta \mapsto c(1)$ for $c$ the geodesic with $q = c(0)$ and $\beta = \dot{c}(0)$. It maps $\beta \in T_q\mathcal{M}$ to a point $\mathrm{Exp}_q(\beta) \in \mathcal{M}$ located $d(q, \mathrm{Exp}_q(\beta)) = \|\beta\|$ apart from the pole $q$ in the direction of $\beta$. On the form space $\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}$, the exponential map is simply given by $\mathrm{Exp}_{[p]_{\mathrm{Trl}\times\mathrm{Rot}}}(\beta) = [\tilde{p}_{\mathrm{Trl}\times\mathrm{Rot}} + \beta]_{\mathrm{Trl}\times\mathrm{Rot}}$. On the shape space $\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$, identification with exponential maps on the sphere yields $\mathrm{Exp}_{[p]_G}(\beta) = [\cos(\|\beta\|)\,\tilde{p}_G + \sin(\|\beta\|)\,\frac{\beta}{\|\beta\|}]_G$ with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$. In an open neighborhood $U$, $q \in U \subset \mathcal{M}$, $\mathrm{Exp}_q$ is invertible, yielding the map $\mathrm{Log}_q: U \to T_q\mathcal{M}$ from the manifold to the tangent space at $q$. For forms, it is given by $\mathrm{Log}_{[p]_{\mathrm{Trl}\times\mathrm{Rot}}}([y]_{\mathrm{Trl}\times\mathrm{Rot}}) = \tilde{y}_{\mathrm{Trl}\times\mathrm{Rot}} - \tilde{p}_{\mathrm{Trl}\times\mathrm{Rot}}$ and, for shapes, by $\mathrm{Log}_{[p]_G}([y]_G) = d([p]_G, [y]_G)\, \frac{\tilde{y}_G - \langle \tilde{p}_G, \tilde{y}_G\rangle\,\tilde{p}_G}{\|\tilde{y}_G - \langle \tilde{p}_G, \tilde{y}_G\rangle\,\tilde{p}_G\|}$ with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$. Finally, $\mathrm{Transp}_{q,q'}: T_q\mathcal{M} \to T_{q'}\mathcal{M}$ parallel transports tangent vectors $\varepsilon \mapsto \varepsilon'$ isometrically along a geodesic $c(\tau)$ connecting $q$ and $q' \in \mathcal{M}$, such that the slopes $\mathrm{Transp}_{q,q'}(\dot{c}(0)) = \dot{c}(1)$ are identified and all angles are preserved. For shapes, $\mathrm{Transp}_{[y]_G,[p]_G}(\varepsilon) = \varepsilon - \langle \varepsilon, \tilde{p}_G\rangle\, \frac{\tilde{y}_G + \tilde{p}_G}{1 + \langle \tilde{y}_G, \tilde{p}_G\rangle}$, with $G = \mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}$, takes the form of the parallel transport on a sphere, replacing the real inner product with its complex analogue. For forms, it changes only the $\mathrm{Im}(\langle \varepsilon, \tilde{p}\rangle)$ coordinate orthogonal to the real $\tilde{y}$-$\tilde{p}$-plane as in the shape case, while the remainder of $\varepsilon$ is left unchanged as in a linear space. This yields $\mathrm{Transp}_{[y]_G,[p]_G}(\varepsilon) = \varepsilon - \mathrm{Im}\big(\big\langle \tfrac{\tilde{p}_G}{\|\tilde{p}_G\|}, \varepsilon\big\rangle\big)\, \frac{\tilde{y}_G/\|\tilde{y}_G\| + \tilde{p}_G/\|\tilde{p}_G\|}{1 + \langle \tilde{y}_G/\|\tilde{y}_G\|,\ \tilde{p}_G/\|\tilde{p}_G\|\rangle}\, i$, with $G = \mathrm{Trl}\times\mathrm{Rot}$, for form tangent vectors. While equivalent expressions for the parallel transport in the shape case can be found, for example, in Dryden and Mardia (Citation2016), Huckemann, Hotz, and Munk (Citation2010), a corresponding derivation for the form case is given in Section S.1.2, supplementary materials, including a discussion of the quotient space geometry in differential geometric terms.
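For concreteness, the following minimal R sketch implements these building blocks for discretized curves, with complex evaluation vectors and a weighted inner product as introduced above. It is only an illustration under simplifying assumptions (vectors centered and of unit norm, alignment handled explicitly), not the manifoldboost implementation; all function names are ours.

# Weighted complex inner product <a, b> (conjugate-linear in the first
# argument) for evaluation vectors a, b on a common grid with weights w:
ip  <- function(a, b, w) sum(Conj(a) * b * w)
nrm <- function(a, w) sqrt(Re(ip(a, a, w)))

# Optimal rotation alignment of y to the pole p:
align <- function(y, p, w) {
  u <- ip(y, p, w)
  (u / Mod(u)) * y
}

# Shape-space Exp at pole p (unit-norm representative), tangent vector beta:
exp_shape <- function(p, beta, w) {
  nb <- nrm(beta, w)
  if (nb < 1e-12) return(p)
  cos(nb) * p + sin(nb) * beta / nb
}

# Shape-space Log of [y] at [p], using the aligned unit-norm representative:
log_shape <- function(p, y, w) {
  y <- align(y, p, w) / nrm(y, w)
  r <- y - ip(p, y, w) * p                 # projection onto the tangent space
  nr <- nrm(r, w)
  d <- acos(min(1, Mod(ip(p, y, w))))      # geodesic (Procrustes) distance
  if (nr < 1e-12) 0 * p else d * r / nr
}

# Parallel transport of eps from the tangent space at y (aligned to p)
# to the tangent space at p, following the formula in the text:
transp_shape <- function(y, p, eps, w) {
  eps - ip(eps, p, w) * (y + p) / (1 + ip(y, p, w))
}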

Based on this understanding of the response space, we may now proceed to consider a sample of curves $y_1, \dots, y_n \in \mathbb{Y}$ representing orbits $[y_1], \dots, [y_n]$ with respect to group actions $G$. In the functional case, with the domain $\mathcal{T} = [0,1]$, these curves are usually observed as evaluations $\mathbf{y}_i = (y_i(t_{i1}), \dots, y_i(t_{ik_i}))^\top$ on a finite grid $t_{i1} < \dots < t_{ik_i} \in \mathcal{T}$, which may differ between observations. In contrast to the regular case with common grids, this more general data structure is referred to as irregular functional shape/form data. To handle this setting, we replace the original inner product $\langle\cdot,\cdot\rangle$ on $\mathbb{Y}$ by individual $\langle \mathbf{y}_i, \mathbf{y}_i'\rangle_i = \mathbf{y}_i^\dagger W_i\, \mathbf{y}_i'$ providing inner products on the $k_i$-dimensional space $\mathbb{Y}_i = \mathbb{C}^{k_i}$ of evaluations $\mathbf{y}_i, \mathbf{y}_i'$ on the same grid. The symmetric positive-definite weight matrix $W_i$ can be chosen to implement an approximation to integration w.r.t. the original measure $\nu$ with a numerical integration measure $\nu_i$, such as given by the trapezoidal rule. Alternatively, $W_i = \frac{1}{k_i} I_{k_i}$ with the $k_i \times k_i$ identity matrix $I_{k_i}$ presents a canonical choice that is analogous to the landmark case for $k_i \equiv k$. Moreover, data-driven $W_i$ could also be motivated from the covariance structure estimated for (potentially sparse) $y_1, \dots, y_n$ along the lines of Yao, Müller, and Wang (Citation2005), Stöcker et al. (Citation2022). While this is beyond the scope of this article, potential procedures are sketched in Section S.7, supplementary materials. With the inner products given for $i = 1, \dots, n$, the sample space naturally arises as the Riemannian product $\mathbb{Y}_1/G^* \times \dots \times \mathbb{Y}_n/G^*$ of the orbit spaces, with the individual geometries constructed as described above.
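As a small illustration (our own helper, not part of any package API), diagonal trapezoidal-rule weights for an individual grid can be computed as follows:

# Diagonal trapezoidal-rule integration weights for an irregular grid
# t_1 < ... < t_k; W_i = diag(trapez_weights(t)) then approximates
# integration w.r.t. the Lebesgue measure on [t_1, t_k]:
trapez_weights <- function(t) {
  k <- length(t)
  d <- diff(t)
  c(d[1], d[-(k - 1)] + d[-1], d[k - 1]) / 2
}
trapez_weights(c(0, 0.1, 0.4, 1))  # weights sum to the covered range, here 1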

3 Additive Regression on Riemannian Manifolds

Consider a data scenario with $n$ observations of a random response-covariate tuple $(Y, X)$, where the realizations of $Y$ are planar curves $y_i: \mathcal{T} \to \mathbb{C}$, $i = 1, \dots, n$, belonging to a Hilbert space $\mathbb{Y}$ defined as above and potentially irregularly measured on individual grids $t_{i1} < \dots < t_{ik_i} \in \mathcal{T}$. The response object $[Y]$ is the equivalence class of $Y$ with respect to translation, rotation, and possibly scale, and the sample $[y_1], \dots, [y_n]$ is equipped with the respective Riemannian manifold geometry introduced in the previous section. For $i = 1, \dots, n$, realizations $x_i \in \mathcal{X}$ of a covariate vector $X$ in a covariate space $\mathcal{X}$ are observed. $X$ can contain several categorical and/or metric covariates.

For regressing the mean of $[Y]$ on $X = x$, we model the shape/form $[\mu]$ of $\mu \in \mathbb{Y}$ as
$$[\mu] = \mathrm{Exp}_{[p]}(h(x)) = \mathrm{Exp}_{[p]}\Big(\sum_{j=1}^J h_j(x)\Big), \qquad (1)$$
with an additive predictor $h: \mathcal{X} \to T_{[p]}\mathbb{Y}/G^*$ acting in the tangent space at an “intercept” $[p] \in \mathbb{Y}/G^*$. Generalizing an additive model “$Y = \mu + \epsilon = p + h(x) + \epsilon$” in a linear space, we implicitly define $[\mu]$ as the conditional mean of $[Y]$ given $X = x$ by assuming zero-mean “residuals” $\epsilon$. In their definition, we follow Cornea et al. (Citation2017) but extend to the functional shape/form and additive case. We assume local linearized residuals $\varepsilon_{[\mu]} = \mathrm{Log}_{[\mu]}([Y])$ in $T_{[\mu]}\mathbb{Y}/G^*$ to have mean $\mathbb{E}(\varepsilon_{[\mu]}) = 0$, which corresponds to $\mathbb{E}(\varepsilon_{[\mu]}(t)) = 0$ for ($\nu$-almost) all $t \in \mathcal{T}$. Here, we assume $[Y]$ is sufficiently close to $[\mu]$ with probability 1 such that $\mathrm{Log}_{[\mu]}$ is well-defined, which is the case whenever $\langle \tilde{Y}, \tilde{\mu}\rangle \neq 0$ for centered shape/form representatives $\tilde{Y}$ and $\tilde{\mu}$, an unrestrictive and common assumption (compare also Cornea et al. Citation2017). However, residuals $\varepsilon_{[\mu]}$ for different $[\mu]$ belong to separate tangent spaces. To obtain a formulation in a common linear space instead, local residuals are mapped to residuals $\epsilon = \mathrm{Transp}_{[\mu],[p]}(\varepsilon_{[\mu]})$ by parallel transporting them from $[\mu]$ to the common covariate-independent pole $[p]$. After this isometric mapping into $T_{[p]}\mathbb{Y}/G^*$, we can equivalently define the conditional mean $[\mu]$ via $\mathbb{E}(\epsilon) = 0$ for the transported residuals $\epsilon$.

$\mathrm{Exp}_{[p]}$ maps the additive predictor $h(x) = \sum_{j=1}^J h_j(x) \in T_{[p]}\mathbb{Y}/G^*$ to the response space. It is analogous to a response function in GLMs but depends on $[p]$. Although covariate effects $h_j(x)$ often only depend on an individual covariate in $x$ for each $j$, they might also depend on covariate combinations in general to allow (smooth) interactions. While other response functions could be used, we restrict to the exponential map here, such that the model contains a geodesic model (Fletcher Citation2013)—the direct generalization of simple linear regression—as a special case for $h(x) = \beta\, x_1$ with a single covariate $x_1$ and tangent vector $\beta$. Typically, it is assumed that $h$ is centered such that $\mathbb{E}(h(X)) = 0$, and the pole $[p]$ is the overall mean of $[Y]$ defined, like the conditional mean, via residuals of mean zero.

3.1 Tensor-Product Effect Functions $h_j$

Scheipl, Staicu, and Greven (Citation2015) and other authors employ Tensor-Product (TP) bases for functional additive model terms. This naturally extends to tangent space effects, which we model as
$$h_j(x) = \sum_{r,l} \theta_j^{(r,l)}\, b_j^{(l)}(x)\, e_r$$
with the TP basis given by the pair-wise products of $m$ linearly independent tangent vectors $e_r \in T_{[p]}\mathbb{Y}/G^*$, $r = 1, \dots, m$, and $m_j$ basis functions $b_j^{(l)}: \mathcal{X} \to \mathbb{R}$, $l = 1, \dots, m_j$, for the $j$th covariate effect depending on one or more covariates. The real coefficients can be arranged as a matrix $\{\theta_j^{(r,l)}\}_{r,l} = \Theta_j \in \mathbb{R}^{m \times m_j}$. Also for infinite-dimensional $T_{[p]}\mathbb{Y}/G^*$ and a general nonlinear dependence on $x$, a basis representation approach requires truncation to finite dimensions $m$ and $m_j$ in practice. Choosing the bases to capture the essential variability in the data, their size can be extended with increasing data size and computational resources.

While, in principle, the basis $\{e_r\}_r$ could also vary across effects $j = 1, \dots, J$, we assume a common basis for notational simplicity, which presents the typical choice. Due to the identification of $T_{[p]}\mathbb{Y}/G^*$ with a subspace of the function space $\mathbb{Y}$, the $\{e_r\}_r$ may be specified using a function basis commonly used in additive models: Let $b_0^{(l)}: \mathcal{T} \to \mathbb{R}$, $l = 1, \dots, m_0$, be a basis of real functions, say a B-spline basis (other typical bases used in the literature include wavelet (Meyer et al. Citation2015) or FPC bases (Müller and Yao Citation2008)). Then we construct the tangent space basis as $e_r = \sum_{l=1}^{m_0} (z_p^{(l,r)} + z_p^{(m_0+l,r)}\, i)\, b_0^{(l)}$, employing the same basis for the $1$- and $i$-dimension before transforming it with a basis transformation matrix $Z_p = \{z_p^{(l,r)}\}_{l,r} \in \mathbb{R}^{2m_0 \times m}$ implementing the linear tangent space constraints $\mathrm{Re}(\langle e_l, \zeta^{(r)}\rangle) = 0$ (or the empirical version) for all $l$ and normal vectors $\zeta^{(1)}, \zeta^{(2)}, \zeta^{(3)}$ for forms, and additionally $\zeta^{(4)}$ for shapes, defining $T_{[p]}\mathbb{Y}/G^*$ as described in Section 2. Thus, the tangent space basis dimension is $m = 2m_0 - 3$ for forms or $m = 2m_0 - 4$ for shapes (or could, in principle, be larger if the original basis already meets the constraints). For details on the construction of $Z_p$, see Section S.1.3, supplementary materials. For closed curves, we additionally choose $Z_p$ to enforce periodicity, that is, $e_r(t) = e_r(t + t_0)$ for some period $t_0 \in \mathbb{R}$ (compare Hofner, Kneib, and Hothorn Citation2016).
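To make the construction concrete, the following sketch (our own illustration, under the notational assumptions above, using MASS for the null space computation) derives $Z_p$ for the form case from a B-spline design matrix evaluated on a grid:

library(MASS)  # for Null()

# Constraint transformation Z_p for the form tangent space at pole p.
# B: k x m0 real B-spline design on the grid; p: complex pole evaluations;
# w: integration weights. A coefficient vector z in R^(2*m0) encodes the
# tangent basis function e = B %*% z[1:m0] + 1i * (B %*% z[m0 + 1:m0]).
tangent_transform <- function(B, p, w) {
  m0 <- ncol(B)
  zetas <- list(rep(1 + 0i, length(p)),  # zeta^(1) = 1
                rep(0 + 1i, length(p)),  # zeta^(2) = i * 1
                1i * p)                  # zeta^(3) = i * p
  # each constraint Re(<e, zeta>) = 0 is linear in z, with row:
  C <- t(sapply(zetas, function(zeta)
    c(crossprod(B, Re(zeta * w)), crossprod(B, Im(zeta * w)))))
  Null(t(C))  # 2*m0 x (2*m0 - 3) basis of the constraint null space
}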

Given the tangent space basis, we may now modularly specify the usual additive model basis functions $b_j^{(l)}: \mathcal{X} \to \mathbb{R}$, $l = 1, \dots, m_j$, for the $j$th covariate effect to obtain the full functional additive model “tool box” offered by, for example, Brockhaus, Scheipl, and Greven (Citation2015). Typically, $b_j^{(l)}(x) = b_j^{(l)}(z)$ depends on an individual covariate, say $z$, in $x = (\dots, z, \dots)$. But for a single covariate also multiple different effects can be specified, and a single interaction effect depends on multiple covariates. A linear effect—linear in the tangent space—of the form $h_j(x) = \beta\, z$ with a scalar (typically centered) covariate $z$ and $\beta \in T_{[p]}\mathbb{Y}/G^*$ is simply implemented by a single function $b_j^{(1)}(x) = z$. A smooth effect of the generic form $h_j(x)(t) = f(z, t)$ can be implemented by choosing, for example, a B-spline basis $b_j^{(1)}(z), \dots, b_j^{(m_j)}(z)$ (asymptotic properties of penalized B-splines and connections to kernel estimators are discussed, for example, by Wood, Pya, and Säfken (Citation2016), Li and Ruppert (Citation2008)). For a categorical covariate $\kappa$ in $x$, with effect $h_j(x): \{1, \dots, K\} \to T_{[p]}\mathbb{Y}/G^*$, $\kappa \mapsto \beta_\kappa$, the basis $b_j(x) = (b_j^{(1)}(\kappa), \dots, b_j^{(m_j)}(\kappa))^\top$ maps $\kappa \mapsto e_\kappa$ to a usual contrast vector $e_\kappa$, with the basis being of dimension $m_j = K - 1$ just as in standard linear models. Here, we typically use effect coding to obtain centered effects (see the small example below). Moreover, TP interactions of the model terms described above, as well as group-specific effects and smooth effects with additional constraints (Hofner, Kneib, and Hothorn Citation2016), can be specified in the model formula, relying on the mboost framework introduced by Hothorn et al. (Citation2010), which also allows defining custom effect designs. For identification of an overall mean intercept $[p]$, sum-to-zero constraints yielding $\sum_{i=1}^n h_j(x_i) = 0$ for observed covariates $x_i$ can be specified, and similar constraints can be used to distinguish linear from nonlinear effects and interactions from their marginal effects (Kneib, Hothorn, and Tutz Citation2009). Different quadratic penalties can be specified for the coefficients $\Theta_j$, allowing to regularize high-dimensional effect bases and to balance effects of different complexity in the model fit (see Section 4).
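For instance, effect (sum) coding of a three-level covariate can be set up in base R as follows (a generic illustration, independent of our package):

# Effect (sum) coding for a categorical covariate with K = 3 levels: each
# level kappa is mapped to a contrast vector of dimension m_j = K - 1 = 2,
# so that effects are centered across levels.
kappa <- factor(c("wild", "feral", "domesticated", "feral"),
                levels = c("wild", "feral", "domesticated"))
contrasts(kappa) <- contr.sum(3)
model.matrix(~ kappa)[, -1]  # rows: b_j(x_i); intercept column dropped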

3.2 Tensor-Product Factorization

The multidimensional structure of the response objects makes it challenging to graphically illustrate and interpret additive model terms, in particular when it comes to nonlinear (interaction) effects, or when effect sizes are visually small. To solve this problem, we suggest rewriting estimated TP effects $\hat{h}_j$ with estimated coefficient matrix $\hat{\Theta}_j$ as
$$\hat{h}_j(x) = \sum_{r=1}^{m_j^\star} \xi_j^{(r)}\, \hat{h}_j^{(r)}(x)$$
factorized into $m_j^\star = \min(m_j, m_0)$ components consisting of covariate effects $\hat{h}_j^{(r)}: \mathcal{X} \to \mathbb{R}$, $r = 1, \dots, m_j^\star$, in corresponding orthonormal directions $\xi_j^{(r)} \in T_{[p]}\mathbb{Y}/G^*$ with $\langle \xi_j^{(r)}, \xi_j^{(l)}\rangle = \mathbb{1}(r = l)$, that is, 1 if $r = l$ and 0 otherwise. Assuming $\mathbb{E}(b_j^{(l)}(X)^2) < \infty$, $l = 1, \dots, m_j$, for the underlying effect basis, the $\hat{h}_j^{(r)}$ are specified to achieve decreasing component variances $v_j^{(1)} \geq \dots \geq v_j^{(m_j^\star)} \geq 0$ given by $v_j^{(r)} = \mathbb{E}(\hat{h}_j^{(r)}(X)^2)$. In practice, the expectation over the covariates $X$ and the inner product $\langle\cdot,\cdot\rangle$ are replaced by empirical analogues (compare Corollary 3, supplementary materials). Due to orthonormality of the $\xi_j^{(r)}$, the component variances add up to the total predictor variance $\sum_{r=1}^{m_j^\star} v_j^{(r)} = v_j = \mathbb{E}(\langle \hat{h}_j(X), \hat{h}_j(X)\rangle)$. Moreover, the TP factorization is optimally concentrated in the first components in the sense that for any $l \leq m_j^\star$ there is no sequence of $\xi_*^{(r)} \in \mathbb{Y}$ and $\hat{h}_*^{(r)}: \mathcal{X} \to \mathbb{R}$ such that $\mathbb{E}\big(\|\hat{h}_j(X) - \sum_{r=1}^l \xi_*^{(r)}\, \hat{h}_*^{(r)}(X)\|^2\big) < \mathbb{E}\big(\|\hat{h}_j(X) - \sum_{r=1}^l \xi_j^{(r)}\, \hat{h}_j^{(r)}(X)\|^2\big)$; that is, the series of the first $l$ components yields the best rank-$l$ approximation of $\hat{h}_j$. The factorization relies on an SVD of (a transformed version of) the coefficient matrix $\hat{\Theta}_j$, and the fact that it is well-defined is a variant of the Eckart-Young-Mirsky theorem (proof in Section S.2, supplementary materials).
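The empirical version of this factorization can be sketched in a few lines of R (our own illustration; it assumes tangent basis functions already orthonormalized w.r.t. the empirical inner product, so the SVD can be applied to effect scores directly instead of a transformed coefficient matrix):

# TP factorization of one effect: Theta (m x mj) coefficient matrix,
# E (k x m) complex matrix of orthonormalized tangent basis evaluations,
# Bj (n x mj) covariate basis evaluated at x_1, ..., x_n.
factorize_effect <- function(Theta, E, Bj) {
  n <- nrow(Bj)
  H <- Bj %*% t(Theta)  # n x m: tangent coordinates of h_j(x_i)
  s <- svd(H / sqrt(n))
  D <- diag(s$d, nrow = length(s$d))
  list(xi  = E %*% s$v,         # orthonormal directions xi_j^(r)
       h_r = sqrt(n) * s$u %*% D,  # scalar effects h_j^(r)(x_i) per column r
       v_r = s$d^2)             # decreasing empirical variances v_j^(r)
}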

Particularly when large shares of the predictor variance are explained by the first component(s), the decomposition facilitates graphical illustration and interpretation: choosing a suitable constant $\tau \neq 0$, an effect direction $\xi_j^{(r)}$ can be visualized by plotting the pole representative $p$ together with $\mathrm{Exp}_p(\tau\, \xi_j^{(r)})$ on the level of curves, while the accordingly rescaled $\frac{1}{\tau}\hat{h}_j^{(r)}(x)$ is displayed separately in a standard scalar effect plot. Adjusting $\tau$ offers an important degree of freedom for visualizing $\xi_j^{(r)}$ on an intuitively accessible scale while faithfully depicting $\xi_j^{(r)}\,\hat{h}_j^{(r)}(x)$. When based on the same $\tau$, different covariate effects can be compared across the plots sharing the same scale. We suggest $\tau = \max_j \sqrt{v_j}$, the maximum total predictor standard deviation of an effect, as a good first choice.

Besides factorizing effects separately, it can also be helpful to apply TP factorization to the joint additive predictor, yielding
$$\hat{h}(x) = \sum_{r=1}^{m^\star} \xi^{(r)}\, \hat{h}^{(r)}(x) = \sum_{r=1}^{m^\star} \xi^{(r)}\big(\hat{h}_1^{(r)}(x) + \dots + \hat{h}_J^{(r)}(x)\big),$$
with $m^\star = \min(\sum_j m_j, m)$ and again orthonormal $\xi^{(r)} \in T_{[p]}\mathbb{Y}/G^*$ and the corresponding variance concentration in the first components, but now determined w.r.t. entire additive predictors $\hat{h}^{(r)} = \sum_{j=1}^J \hat{h}_j^{(r)}$ spanned by all covariate basis functions in the predictor. In this representation, the first component yields a geodesic additive model approximation where the predictor moves along a geodesic line $c(\tau) = \mathrm{Exp}_{[p]}(\xi^{(1)}\tau)$ with the signed distance $\tau \in \mathbb{R}$ from $[p]$, modeled by a scalar additive predictor $\hat{h}^{(1)}(x)$ composed of covariate effects analogous to the original model predictor. In Section 5, we illustrate its potential in three different scenarios.

4 Component-Wise Riemannian L2-Boosting

Component-wise gradient boosting (e.g., Hothorn et al. Citation2010) is a step-wise model fitting procedure accumulating predictors from smaller models, so-called base-learners, to build an ensemble predictor aiming at minimizing a mean loss function. To this end, the base-learners are fit (via least squares) to the negative gradient of the loss function in each step, and the best-fitting base-learner is added to the current ensemble predictor. Due to its versatile applicability, inherent model selection, and slow over-fitting behavior, boosting has proven useful in various contexts (Mayr et al. Citation2014). Boosting with respect to the least squares loss function $l(y, \mu) = \frac{1}{2}(y - \mu)^2$, $y, \mu \in \mathbb{R}$, is typically referred to as L2-Boosting and simplifies to repeated refitting of residuals $\varepsilon = y - \mu = -\nabla_\mu\, l(y, \mu)$ corresponding to the negative gradient of the loss function. For L2-Boosting with a single learner, Bühlmann and Yu (Citation2003) show how fast bias decay and slow variance increase over the boosting iterations suggest stopping the algorithm early before approaching the ordinary (penalized) least squares estimator. Lutz and Bühlmann (Citation2006) prove consistency of component-wise L2-Boosting in a high-dimensional multivariate response linear regression setting, and Stöcker et al. (Citation2021) illustrate in extensive simulation studies how stopping the boosting algorithm early based on curve-wise cross-validation applies desired regularization when fitting (even highly autocorrelated) functional responses with parameter-intensive additive model base-learners and, thus, leads to good estimates even in challenging scenarios.

When generalizing to least squares on Riemannian manifolds with the loss $\frac{1}{2}d^2([y], [\mu])$ given by the squared geodesic distance, the negative gradient $-\nabla_{[\mu]}\, \frac{1}{2} d^2([y], [\mu]) = \mathrm{Log}_{[\mu]}([y]) = \varepsilon_{[\mu]}$ (compare, e.g., Pennec Citation2006) corresponds to the local residuals $\varepsilon_{[\mu]}$ defined in Section 3. This analogy to L2-Boosting motivates the presented generalization, where local residuals are further transported to residuals $\epsilon$ in a common linear space.

Consider the pole $[p]$ known and fixed for now. Assuming its existence, we aim to minimize the population mean loss
$$\sigma^2(h) = \mathbb{E}\big(d^2([Y], \mathrm{Exp}_{[p]}(h(X)))\big)$$
with the point-wise minimizer $h(x) = \operatorname{arg\,min}_{h': \mathcal{X} \to T_{[p]}\mathbb{Y}/G^*} \mathbb{E}\big(d^2([Y], \mathrm{Exp}_{[p]}(h'(X))) \mid X = x\big)$ minimizing the conditional expected squared distance. Fixing a covariate constellation $x \in \mathcal{X}$, the prediction $[\mu] = \mathrm{Exp}_{[p]}(h(x))$ corresponds to the Fréchet mean (Karcher Citation1977) of $[Y]$ conditional on $X = x$. In a finite-dimensional context, Pennec (Citation2006) shows that $\mathbb{E}(\varepsilon_{[\mu]}) = 0$ for a Fréchet mean $[\mu]$ if residuals $\varepsilon_{[\mu]}$ are uniquely defined with probability one. This indicates the connection to our residual-based model formulation in Section 3. We fit the model by reducing the empirical mean loss $\hat{\sigma}^2(h) = \frac{1}{n}\sum_{i=1}^n d_i^2\big([y_i], \mathrm{Exp}_{[p]}(h(x_i))\big)$, where we replace the population mean by the sample mean and compute the geodesic distances $d_i$ with respect to the inner products $\langle\cdot,\cdot\rangle_i$ defined for the respective evaluations of $y_i$.

A base-learner corresponds to a covariate effect $h_j(x) = \sum_{r,l} \theta_j^{(r,l)}\, b_j^{(l)}(x)\, e_r$, $\Theta_j = \{\theta_j^{(r,l)}\}_{r,l}$, which is repeatedly fit to the transported residuals $\epsilon_1, \dots, \epsilon_n$ by penalized least-squares (PLS), minimizing $\sum_{i=1}^n \|\epsilon_i - h_j(x_i)\|_i^2 + \lambda_j\, \mathrm{tr}(\Theta_j P_j \Theta_j^\top) + \lambda\, \mathrm{tr}(\Theta_j^\top P\, \Theta_j)$. Via the penalty parameters $\lambda_j, \lambda \geq 0$, the effective degrees of freedom of the base-learners are controlled (Hofner et al. Citation2011) to achieve a balanced “fair” base-learner selection despite the typically large and varying number of coefficients involved in the TP effects. The symmetric penalty matrices $P_j \in \mathbb{R}^{m_j \times m_j}$ and $P \in \mathbb{R}^{m \times m}$ (imposing, e.g., a second-order difference penalty for B-splines in either direction) can equivalently be arranged as an $m_j m \times m_j m$ penalty matrix $R_j = \lambda_j (P_j \otimes I_m) + \lambda (I_{m_j} \otimes P)$ for the vectorized coefficients $\mathrm{vec}(\Theta_j) = (\theta_j^{(1,1)}, \dots, \theta_j^{(m,1)}, \dots, \theta_j^{(m,m_j)})^\top$, where $\otimes$ denotes the Kronecker product. The standard PLS estimator is then given by $\mathrm{vec}(\hat{\Theta}_j) = (\Psi_j + R_j)^{-1}\psi_j$ with $\Psi_j = \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ b_j^{(l')}(x_i)\, e_{r'}\rangle_i)\}_{(r,l),(r',l')} \in \mathbb{R}^{m m_j \times m m_j}$ and $\psi_j = \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ \epsilon_i\rangle_i)\}_{(r,l)} \in \mathbb{R}^{m m_j}$, with indices running over $(r,l) = (1,1), \dots, (m,1), \dots, (m,m_j)$. In a regular design, using the functional linear array model (Brockhaus, Scheipl, and Greven Citation2015) can save memory and computation time by avoiding construction of the complete matrices. The basis construction of $\{e_r\}_r$ via a transformation matrix $Z_p$ (Section 3.1) is reflected in the penalty by setting $P = Z_p^\top (I_2 \otimes P_0)\, Z_p$ with $P_0$ the penalty matrix for the un-transformed basis $\{b_0^{(r)}\}_r$.
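In code, the Kronecker-structured penalty and the PLS solve amount to only a few lines (a schematic sketch with hypothetical inputs, not the package internals):

# PLS fit of one base-learner: Psi (m*mj x m*mj) and psi (m*mj) as in the
# text; marginal penalties P (m x m) and Pj (mj x mj); penalty parameters
# lambda and lambda_j. Returns the m x mj coefficient matrix Theta-hat.
pls_fit <- function(Psi, psi, P, Pj, lambda, lambda_j) {
  m <- nrow(P); mj <- nrow(Pj)
  R <- lambda_j * kronecker(Pj, diag(m)) + lambda * kronecker(diag(mj), P)
  matrix(solve(Psi + R, psi), nrow = m, ncol = mj)
}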

In each iteration of the proposed Algorithm 1, the best-performing base-learner is added to the current ensemble additive predictor $h(x)$ after multiplying it with a step-length parameter $\eta \in (0, 1]$. Due to the additive model structure, this corresponds to a coefficient update of the selected covariate effect. Accordingly, after repeated selection, the effective degrees of freedom of a covariate effect in general exceed the degrees specified for the base-learner; they are successively adjusted to the data. To avoid over-fitting, the algorithm is typically stopped early, before reaching a minimum of the empirical mean loss. The stopping iteration is determined, for example, by resampling strategies such as bootstrapping or cross-validation on the level of shapes/forms.

Algorithm 1: Component-wise Riemannian L2-Boosting

# Initialization:
Geometry: specify geometry (shape/form) and pole representative $p$
Hyper-parameters: step-length $\eta \in (0,1]$, number of boosting iterations
Base-learners: $h_j(x)$ with penalty matrix $R_j$ and initial coefficient matrix $\Theta_j = 0$
for $j = 1$ to $J$ do # prepare penalized least-squares (PLS)
    set up the $m m_j \times m m_j$ matrix $\Psi_j \leftarrow \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ b_j^{(l')}(x_i)\, e_{r'}\rangle_i)\}_{(r,l),(r',l')}$
end
repeat # boosting steps
    for $i = 1, \dots, n$ do # compute current transported residuals
        $[\mu_i] \leftarrow \mathrm{Exp}_{[p]}(h(x_i))$
        $\varepsilon_{[\mu_i]} \leftarrow \mathrm{Log}_{[\mu_i]}([y_i])$
        $\epsilon_i \leftarrow \mathrm{Transp}_{[\mu_i],[p]}(\varepsilon_{[\mu_i]})$
    end
    for $j = 1, \dots, J$ do # PLS fit to residuals
        $m m_j$ vector $\psi_j \leftarrow \sum_{i=1}^n \{\mathrm{Re}(\langle b_j^{(l)}(x_i)\, e_r,\ \epsilon_i\rangle_i)\}_{(r,l)}$
        $\hat{\Theta}_j = \{\hat{\theta}_j^{(r,l)}\}_{r,l} \leftarrow \mathrm{Solve}\big((\Psi_j + R_j)\, \mathrm{vec}(\Theta) = \psi_j\big)$
    end
    $\hat{\jmath} \leftarrow \operatorname{arg\,min}_{j \in \{1,\dots,J\}} \sum_{i=1}^n \big\|\epsilon_i - \sum_{r,l} \hat{\theta}_j^{(r,l)}\, b_j^{(l)}(x_i)\, e_r\big\|_i^2$ # select base-learner
    $\Theta_{\hat{\jmath}} \leftarrow \Theta_{\hat{\jmath}} + \eta\, \hat{\Theta}_{\hat{\jmath}}$ # update selected model coefficients
until stopping criterion met (e.g., minimal cross-validation error)
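To illustrate the flow of Algorithm 1, the following toy sketch runs the boosting loop for shape responses with a single linear base-learner $h(x) = \beta x$ of one scalar covariate, reusing ip(), exp_shape(), log_shape(), and transp_shape() from the sketch in Section 2. It is a deliberately simplified illustration (fixed iteration number, no penalty, no base-learner selection), not the manifoldboost implementation:

# Y: list of complex evaluation vectors (centered, unit-norm representatives);
# x: centered scalar covariate; p: pole representative; w: weights.
boost_geodesic <- function(Y, x, p, w, eta = 0.1, steps = 100) {
  beta <- 0 * p  # coefficient tangent vector at [p]
  for (step in seq_len(steps)) {
    # transported residuals for the current fit mu_i = Exp_[p](x_i * beta):
    eps <- Map(function(y, xi) {
      mu <- exp_shape(p, xi * beta, w)
      transp_shape(mu, p, log_shape(mu, y, w), w)
    }, Y, x)
    # least-squares fit of the linear base-learner to the residuals:
    beta_hat <- Reduce(`+`, Map(`*`, eps, x)) / sum(x^2)
    beta <- beta + eta * beta_hat  # update (only one base-learner here)
  }
  beta
}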

The pole $[p]$ is, in fact, usually not a priori available. Instead, we typically assume $[p] = \operatorname{arg\,min}_{[q],\, q \in \mathbb{Y}^*} \mathbb{E}(d^2([Y], [q]))$ is the overall Fréchet mean, also often referred to as Riemannian center of mass for Riemannian manifolds or as Procrustes mean in shape analysis (Dryden and Mardia Citation2016). Here, we estimate it as $[p] = \mathrm{Exp}_{[p_0]}(h_0)$ in a preceding Riemannian L2-Boosting routine. The constant effect $h_0 \in T_{[p_0]}\mathbb{Y}/G^*$ in the intercept-only special case of our model is estimated with Algorithm 1 based on a preliminary pole $[p_0] \in \mathbb{Y}/G^*$. For shapes and forms, a good candidate for $p_0$ can be obtained as the standard functional mean of a reasonably well-aligned sample $y_1, \dots, y_n \in \mathbb{Y}$ of representatives.

The proposed Riemannian L2-Boosting algorithm is available in the R (R Core Team Citation2018) package manifoldboost (github.com/Almond-S/manifoldboost). The implementation is based on the package FDboost (Brockhaus, Rügamer, and Greven Citation2020), which is in turn based on the model-based boosting package mboost (Hothorn et al. Citation2010).

5 Applications and Simulation

5.1 Shape Differences in Astragali of Wild and Domesticated Sheep

In a geometric morphometric study, Pöllath, Schafberg, and Peters (Citation2019) investigate shapes of sheep astragali (ankle bones) to understand the influence of different living conditions on the micromorphology of the skeleton. Based on a total of $n = 163$ shapes recorded by Pöllath, Schafberg, and Peters (Citation2019), we model the astragalus shape in dependence on different variables, including domestication status (wild/feral/domesticated), sex (female/male/NA), age (juvenile/subadult/adult/NA), and mobility (confined/pastured/free) of the animals as categorical covariates. The sample comprises sheep of four different populations: Asiatic wild sheep (Field Museum, Chicago; Lay Citation1967; Zeder Citation2006), feral Soay sheep (British Natural History Museum, London; Clutton-Brock et al. Citation1990), and domestic sheep of the Karakul and Marsch breed (Museum of Livestock Sciences, Halle (Saale); Schafberg and Wussow Citation2010). Table S1 in Section S.3, supplementary materials, shows the distribution of available covariates within the populations. Each sheep astragalus shape, $i = 1, \dots, n$, is represented by a configuration composed of 11 selected landmarks in a vector $y_i^{lm} \in \mathbb{C}^{11}$ and two vectors of sliding semi-landmarks $y_i^{c1} \in \mathbb{C}^{14}$ and $y_i^{c2} \in \mathbb{C}^{18}$ evaluated along two outline curve segments, marked on a 2D image of the bone (dorsal view). Several example configurations are displayed in Figure S1, supplementary materials. In general, we could separately specify smooth function bases for the outline segments $y_i^{c1}$ and $y_i^{c2}$, respectively. Due to their systematic recording, we assume, however, that not only landmarks but also semi-landmarks are regularly observed on a fixed grid, and refrain from using smooth function bases for simplicity. Accordingly, shape configurations can directly be identified with their evaluation vectors $y_i = (y_i^{lm\top}, y_i^{c1\top}, y_i^{c2\top})^\top \in \mathbb{C}^{43} = \mathbb{Y}$, and the geometry of the response space $\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ widely corresponds to the classic Kendall’s shape space geometry, with the difference that, considering landmarks more descriptive than single semi-landmarks, we choose a weighted inner product $\langle y_i, y_i'\rangle = y_i^\dagger W y_i'$ with diagonal weight matrix $W$ with diagonal $(\mathbf{1}_{11}^\top, \frac{3}{14}\mathbf{1}_{14}^\top, \frac{3}{18}\mathbf{1}_{18}^\top)^\top$, assigning the weight of three landmarks to each outline segment. We model the astragalus shapes $[y_i] \in \mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ as
$$[\mu_i] = \mathrm{Exp}_{[p]}\big(\beta_{\mathrm{status}_i} + \beta_{\mathrm{pop}_i} + \beta_{\mathrm{age}_i} + \beta_{\mathrm{sex}_i} + \beta_{\mathrm{mobility}_i}\big)$$
with the pole $[p] \in \mathbb{Y}/G^*$ specified as overall mean and the conditional mean $[\mu_i] \in \mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ depending on the effect-coded covariate effects $x_{ij} \mapsto \beta_{x_{ij}} \in T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$. For identifiability, the population and mobility effects are centered around the status effect, as we only have data on different populations/mobility levels for domesticated sheep. All base-learners are regularized to one degree of freedom by employing ridge penalties for the coefficients of the covariate bases $\{b_j^{(l)}\}_l$, while the coefficients of the response basis (the standard basis for $\mathbb{C}^{43}$) are left un-penalized. With a step-length of $\eta = 0.1$, 10-fold shape-wise cross-validation suggests early stopping after 89 boosting iterations. Due to the regular design, we can make use of the functional linear array model (Brockhaus, Scheipl, and Greven Citation2015) for saving computation time and memory, which led to 8 sec for the initial model fit followed by 47 sec for cross-validation. To interpret the categorical covariate effects, we rely on TP factorization (Figure 2).
The first component of the status effect explains about 2/3 of the variance of the status effect and over 50% of the cumulative effect variance in the model. In that main direction, the effect of feral is not located between wild and domestic, as might be naively expected. By contrast, the second component of the effect seems to reflect the expected order and still explains a considerable amount of variance. Similar to Pöllath, Schafberg, and Peters (Citation2019), we find little influence of age, sex, and mobility on the astragalus shape. Yet, all covariates were selected by the boosting algorithm.
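The weighting above translates directly into code; a minimal snippet for the diagonal of $W$ (our own illustration):

# Diagonal of the weighted inner product matrix W for the astragalus data:
# 11 landmarks with unit weight, and each outline segment (14 and 18
# semi-landmarks) down-weighted to the total weight of three landmarks.
w_diag <- c(rep(1, 11), rep(3/14, 14), rep(3/18, 18))
sum(w_diag[12:25])  # each outline segment contributes total weight 3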

Fig. 2 Left: Shares of different factorized covariate effects in the total predictor variance. Right: Factorized effect plots showing the two first components of the status effect (rows): in the right column, the two first directions $\xi_1^{(1)}, \xi_1^{(2)} \in T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}\times\mathrm{Scl}^*$ are visualized via line segments originating at the overall mean shape (empty circles) and ending in the shape resulting from moving 1 unit into the target direction (solid circles; large: landmarks; small: semi-landmarks along the outline); in the left column, the status effect in the respective direction is depicted. As illustrated in the middle plot, an effect of 1 would correspond to the full extent of the direction shown to the right.

Visually, differences in estimated mean shapes are rather small, which is, in our experience, quite usual for shape data. With differences in size, rotation, and translation excluded by definition, only comparably small variance remains in the observed shapes. Nonetheless, TP factorization provides accessible visualization of the effect directions and allows a partial ordering of the effect levels in each direction.

5.2 Cellular Potts Model Parameter Effects on Cell Form

The stochastic biophysical model proposed by Thüroff et al. (Citation2019), a Cellular Potts Model (CPM), simulates migration dynamics of cells (e.g., wound healing or metastasis) in two dimensions. The progression of simulated cells is the result of many consecutive local elementary events sampled with a Metropolis algorithm according to a Hamiltonian. Different parameters controlling the Hamiltonian have to be calibrated to match real live cell properties (Schaffer Citation2021). Considering whole cells, parameter implications on the cell form are not obvious. To provide additional insights, we model the cell form in dependence on four CPM parameters considered particularly relevant: the bulk stiffness $x_{i1}$, membrane stiffness $x_{i2}$, substrate adhesion $x_{i3}$, and signaling radius $x_{i4}$ are subsumed in a vector $x_i$ of metric covariates for $i = 1, \dots, n$. Corresponding sampled cell outlines $y_i$ were provided by Sophia Schaffer in the context of Schaffer (Citation2021), who ran the underlying CPM simulations and extracted outlines. Deriving the intrinsic orientation of the cells from their movement trajectories, we parameterize $y_i: [0,1] \to \mathbb{C}$ clockwise relative to arc-length such that $y_i(0) = y_i(1)$ points into the movement direction of the barycenter of the cell. With an average of $\bar{k} = \frac{1}{n}\sum_{i=1}^n k_i \approx 43$ samples per curve (after sub-sampling preserving 95% of their inherent variation, as described in Volkmann et al. Citation2021, supplement), the evaluation vectors $y_i \in \mathbb{C}^{k_i}$ are equipped with an inner product implementing trapezoidal-rule integration weights. Example cell outlines are depicted in Figure S4, supplementary materials. The results shown below are based on cell samples obtained from 30 different CPM parameter configurations. For each configuration, 33 out of 10,000 Monte Carlo samples were extracted as approximately independent. This yields a dataset of $n = 990 = 30 \times 33$ cell outlines.

As positioning of the irregularly sampled cell outlines $y_i$, $i = 1, \dots, n$, in the coordinate system is arbitrary, we model the cell forms $[y_i] \in \mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}^*$. Their estimated overall form mean $[p]$ serves as pole in the additive model
$$[\mu_i] = \mathrm{Exp}_{[p]}(h(x_i)) = \mathrm{Exp}_{[p]}\Big(\sum_j \beta_j\, x_{ij} + \sum_j f_j(x_{ij}) + \sum_{j \neq j'} f_{jj'}(x_{ij}, x_{ij'})\Big),$$
where the conditional form mean $[\mu_i]$ is modeled in dependence on tangent-space linear effects with coefficients $\beta_j \in T_{[p]}\mathbb{Y}/\mathrm{Trl}\times\mathrm{Rot}$ and nonlinear smooth effects $f_j$ for covariates $j = 1, \dots, 4$, as well as smooth interaction effects $f_{jj'}$ for each pair of covariates $j \neq j'$. All involved (effect) functions are modeled via a cyclic cubic P-spline basis $\{b_0^{(r)}\}_r$ with 7 (inner) knots and a ridge penalty, and quadratic P-splines with 4 knots for the covariates $x_{ij}$, equipped with a second-order difference penalty for the $f_j$ and ridge penalties for interactions. Covariate effects are mean centered, and interaction effects $f_{jj'}(x_j, x_{j'})$ are centered around their marginal effects $f_j(x_j), f_{j'}(x_{j'})$, which are in turn centered around the linear effects $\beta_j x_j$ and $\beta_{j'} x_{j'}$, respectively. Resulting predictor terms involve 69 (linear effect) to 1173 (interaction) basis coefficients but are penalized to a common degree of freedom of 2 to ensure a fair base-learner selection. We fit the model with a step-size of $\eta = 0.25$ and stop after 2000 boosting iterations, observing no further meaningful risk reduction, since no need for early stopping is indicated by 10-fold form-wise cross-validation. Due to the increased number of data points and coefficients, the irregular design, and the increased number of iterations, the model fit takes considerably longer than in Section 5.1, with about 50 min for the initial fit followed by 8 hr of cross-validation. However, as usual in boosting, model updates are large in the beginning and only marginal in later iterations, such that fits after 1000 or 500 iterations would already yield very similar results.

Observing that the most relevant components point into similar directions, we jointly factorize the predictor as $\hat{h}(x_i) = \sum_r \xi^{(r)}\, \hat{h}^{(r)}(x_i)$ with TP factorization. The first component explains about 93% of the total predictor variance (Figure S3, supplementary materials), indicating that, post hoc, a good share of the model can be reduced to the geodesic model $[\hat{\mu}_i] = \mathrm{Exp}_{[p]}(\xi^{(1)}\, \hat{h}^{(1)}(x_i))$ illustrated in Figure 3. A positive effect in the direction $\xi^{(1)}$ makes cells larger and more keratocyte/croissant shaped; a negative effect—pointing into the opposite direction—makes them smaller and more mesenchymal shaped/elongated. The bulk stiffness $x_{i1}$ turns out to present the most important driving factor behind the cell form, explaining over 75% of the cumulative variance of the effects (Figure S2, supplementary materials). Around 80% of its effect is explained by the linear term, reflecting gradual shrinkage at the side of the cells with increasing bulk stiffness.

Fig. 3 Center: the main direction $\xi^{(1)}$ of the model, illustrated as vectors pointing from the overall mean cell form $[p]$ (gray curve) to the form $\mathrm{Exp}_{[p]}(\xi^{(1)})$ (filled dots), both oriented as cells migrating rightwards. Left: Effects of the bulk stiffness $x_{i1}$ into the direction $\xi^{(1)}$. A vertical line from 0, corresponding to $[p]$, to 1, corresponding to the full extent of $\xi^{(1)}$, underlines the connection between the plots and helps to visually assess the amount of change for a given value of $x_{i1}$. Right: The overall effect of $x_{i1}$ and membrane stiffness $x_{i2}$, comprising linear, smooth, and interaction effects, as a 3D surface plot. The heat map plotted on the surface shows only the interaction effect $f_{12}^{(1)}(x_{i1}, x_{i2})$, illustrating deviations from the marginal effects, which are of particular interest for CPM calibration.

5.3 Realistic Shape and form Simulation Studies

To evaluate the proposed approach, we conduct simulation studies for both shape and form regression for irregular curves. We compare sample sizes $n \in \{54, 162\}$ and average grid sizes $\bar{k} = \frac{1}{n}\sum_{i=1}^n k_i \in \{40, 100\}$, as well as an extreme case with $k_i = 3$ for each curve but $n = 720$, that is, where only random triangles are observed (yet, with known parameterization over $[0,1]$). We additionally investigate the influence of nuisance effects and compare different inner product weights. While important results are summarized in the following, comprehensive visualizations can be found in Section S.5, supplementary materials.

Simulation design: We simulate models of the form $[\mu] = \mathrm{Exp}_{[p]}(\beta\,\kappa + f_1(z_1))$ with overall mean $[p]$, a binary effect with levels $\kappa \in \{0, 1\}$, and a smooth effect of $z_1 \in [-60, 60]$. We choose a cyclic cubic B-spline basis with 27 knots for $T_{[p]}\mathbb{Y}/G^*$, placing them irregularly at $1/27$-quantiles of unit-speed parameterization time-points of the curves. Cubic B-splines with four regularly placed knots are used for covariates in smooth effects. True models are based on the bot dataset from the R package Momocs (Bonhomme et al. Citation2014), comprising outlines of 20 beer ($\kappa = 0$) and 20 whiskey ($\kappa = 1$) bottles of different brands. A smooth effect is induced by the 2D viewing transformations resulting from tilting the planar outlines in a 3D coordinate system along their longitudinal axis by an angle of up to 60 degrees toward the viewer ($z_1 = 60$) and away ($z_1 = -60$), that is, in a way not captured by 2D rotation invariance. Establishing ground truth models based on a fit to the bottle data, we simulate new responses $[y_1], \dots, [y_n]$ via residual resampling (Section S.5, supplementary materials) to preserve realistic autocorrelation. Subsequently, we randomly translate, rotate, and scale $y_1, \dots, y_n \in \mathbb{Y}$ somewhat around the aligned shape/form representatives to obtain realistic samples.

The implied residual variance $\frac{1}{n}\sum_{i=1}^n \|\epsilon_i\|_i^2 = \frac{1}{n}\sum_{i=1}^n d_i^2([y_i], [\mu_i])$ on simulated datasets ranges around 105% of the predictor variance $\frac{1}{n}\sum_{i=1}^n \|h(x_i)\|_i^2 = \frac{1}{n}\sum_{i=1}^n d_i^2([\mu_i], [p])$ in the form scenario and around 65% in the shape scenario. All simulations were repeated 100 times, fitting models with the model terms specified above and three additional nuisance effects: a linear effect $\beta\, z_1$ (orthogonal to $f_1(z_1)$), an effect $f_2$ of the same structure as $f_1$ but depending on an independently uniformly drawn variable $z_2$, and a constant effect $h_0 \in T_{[p]}\mathbb{Y}/G^*$ to test centering around $[p]$. Base-learners are regularized to 4 degrees of freedom (step-length $\eta = 0.1$). Early stopping is based on 10-fold cross-validation.

Form scenario: In the form scenario, the smooth covariate effect $f_1$ offers a particularly clear interpretation. TP factorization decomposes the true effect into its two relevant components, where the first (major) component corresponds to the bare projection of the tilted outline in 3D into the 2D image plane and the second to additional perspective transformations (Figure 4). For this effect, we observe a median relative mean squared error $\mathrm{rMSE}(\hat{h}_j) = \sum_{i=1}^n \|\hat{h}_j(x_i) - h_j(x_i)\|_i^2 / \sum_{i=1}^n \|h(x_i)\|_i^2$ of about 3.7% of the total predictor variance for small data settings with $n = 54$ and $\bar{k} = 100$ (5.9% with $\bar{k} = 40$), which reduces to 1.5% for $n = 162$ (for both $\bar{k} = 40$ and $\bar{k} = 100$). It is typical for functional data that, from a certain point, adding more (highly correlated) evaluations per curve leads to distinctly less improvement in the model fit than adding further observations (compare, e.g., also Stöcker et al. Citation2021). In the extreme $k_i = 3$ scenario, we obtain an rMSE of around 15%, which is, unsurprisingly, considerably higher than for the moderate settings above. Even in this extreme setting (Figure 4), the effect directions are captured well, while the size of the effect is underestimated. Rotation alignment based on only three points (which are randomly distributed along the curves) might considerably differ from the full curve alignment, and averaging over these sub-optimal alignments masks the full extent of the effect. Still, results are very good given the sparsity of information in this case. Having a simpler form, the binary effect $\beta\,\kappa$ is also estimated more accurately, with an rMSE of around 1.5% for $n = 54$, $\bar{k} = 100$ (1.9% for $\bar{k} = 40$) and less than 0.8% for $n = 162$ (for both $\bar{k} = 40$ and $\bar{k} = 100$). The pole estimation accuracy varies on a similar scale.
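Computed on evaluation grids, the rMSE above amounts to the following small helper (our own, with curve-wise weighted norms as in Section 2):

# Relative MSE of an estimated effect: h_hat, h_true, h_full are lists of
# complex evaluation vectors (fitted effect, true effect, full predictor);
# w is the list of corresponding integration weights.
rmse_rel <- function(h_hat, h_true, h_full, w) {
  sqn <- function(a, wi) Re(sum(Conj(a) * a * wi))  # squared weighted norm
  sum(mapply(function(a, b, wi) sqn(a - b, wi), h_hat, h_true, w)) /
    sum(mapply(sqn, h_full, w))
}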

Fig. 4 Left: First (row 1) and second (row 2) main components of the smooth effect $f_1(z_1)$ in the form scenario, obtained from TP factorization. Normalized component directions are visualized as bottle outlines after transporting them to the true pole (gray solid outline). The underlying truth (solid lines/shaded areas) is plotted together with five example estimates for n = 162 and k = 100 (black solid lines) and for the extremely sparse $k_i = 3$ setting (gray dashed lines). Center: Conditional means for both bottle types with the metric covariate fixed at $z_1 = 0$ in the shape scenario with n = 54 and k = 40. Five example estimates (black solid outlines) are plotted in front of the underlying truth (shaded areas). Right: rMSE of the shown example estimates (jittered diamonds), contextualized with boxplots of the rMSE distributions observed in the respective simulation scenarios.

Shape scenario: Qualitatively, the shape scenario shows a similar picture. For k = 40, we observe median rMSEs of 2.8% (n = 54) and 2.2% (n = 162) for $f_1(z_1)$, and of 1.5% and 0.6% for the binary effect $\beta\kappa$. For k = 100, accuracy is again slightly higher.

Nuisance effects and integration weights: Nuisance effects were generally rarely selected and, if selected at all, led to only a marginal loss in accuracy. The constant effect is selected only occasionally in the extreme $k_i = 3$ (triangle) scenarios, where pole estimation is difficult. As variable selection is not our main focus here, we refer to Brockhaus et al. (Citation2017), who perform gradient boosting with functional responses, a large number of covariate effects, and stability selection, for simulations with larger numbers of nuisance effects and further discussion in a related context. Finally, simulations indicate that inner product weights implementing a trapezoidal rule for numerical integration are slightly preferable for typical grid sizes (k = 40, 100), whereas equal weights of $1/k_i$ over all grid points within a curve gave slightly better results in the extreme $k_i = 3$ settings.
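For concreteness, trapezoidal inner product weights for a single curve observed at (possibly irregular) time points can be constructed as in the following sketch; for closed outlines a cyclic variant would be natural, and the equal-weight alternative is simply rep(1/k, k).

trapez_weights <- function(t) {
  # composite trapezoidal rule weights for sorted time points t_1 < ... < t_k
  k  <- length(t)
  dt <- diff(t)
  c(dt[1] / 2, (dt[-(k - 1)] + dt[-1]) / 2, dt[k - 1] / 2)
}
trapez_weights(c(0, 0.1, 0.4, 0.5, 1))  # vs. rep(1/5, 5) for equal weights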

All in all, the simulations show that Riemannian L2-Boosting can adequately fit both shape and form models in realistic scenarios and captures effects reasonably well even for comparatively small numbers of sampled outlines or of evaluations per outline.

6 Discussion and Outlook

Compared to existing (landmark) shape regression models, the presented approach extends linear predictors to more general additive predictors, including, for example, smooth nonlinear model terms and interactions, and yields the first regression approach for functional shape as well as form responses. Moreover, we propose novel visualizations based on TP factorization that, similar to FPC analysis, enable a systematic decomposition of the variability explained by an additive effect on the tangent space level. Yielding meaningful coordinates for model effects, this factorization will be useful for visualization also for FAMs in linear spaces and beyond our model framework, as we illustrate for the nonparametric approach of Jeon and Park (Citation2020) in Section S.8, supplementary materials.
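To convey the coordinate-level idea of such a factorization, the toy R snippet below decomposes a hypothetical tensor-product coefficient matrix (rows indexing the covariate basis, columns the tangent space basis) by a plain singular value decomposition, ordering components by explained effect variability; this unweighted version ignores the inner product weighting a faithful implementation would require.

set.seed(3)
Theta <- matrix(rnorm(8 * 27), 8, 27)  # hypothetical effect coefficient matrix
dec   <- svd(Theta)                    # Theta = U diag(d) V'
# j-th component: direction dec$v[, j] in the tangent basis, scaled along the
# covariate by the basis coordinates dec$u[, j] * dec$d[j]
round(dec$d^2 / sum(dec$d^2), 3)       # share of variability per component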

Instead of operating on the original evaluations $y_i \in \mathbb{C}^{k_i}$ of the response curves $y_i$ as in all applications above, another frequently used approach first expands $y_i$, $i = 1, \dots, n$, in a common basis before carrying out statistical analysis on the coefficient vectors (compare Ramsay and Silverman (Citation2005), Morris (Citation2015), and Müller and Yao (Citation2008) for smoothing spline, wavelet, or FPC representations in FDA, or Bonhomme et al. (Citation2014) in shape analysis). Shape/form regression on the coefficients is, in fact, a special case of our approach, where the inner product is evaluated on the coefficients instead of on the evaluations (Section S.6, supplementary materials).
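A small R sketch of this equivalence, with hypothetical basis and weights: for curves $y = Bc$ with a common basis matrix $B$ and diagonal integration weights $W$, the weighted inner product of evaluations equals a Gram-matrix inner product of the coefficients.

k <- 40; p_basis <- 12
tgrid <- seq(0, 1, length.out = k)
B <- splines::bs(tgrid, df = p_basis, intercept = TRUE)  # common basis matrix
W <- diag(rep(1 / k, k))                                 # integration weights
G <- t(B) %*% W %*% B                                    # Gram matrix
c1 <- rnorm(p_basis); c2 <- rnorm(p_basis)
all.equal(drop(t(B %*% c1) %*% W %*% (B %*% c2)),  # inner product of evaluations
          drop(t(c1) %*% G %*% c2))                # = coefficient inner product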

The proposed model is motivated by geodesic regression. In the multiple linear predictor, however, a linear effect of a single covariate does, in general, not describe a geodesic for fixed nonzero values of the other covariate effects. Put differently, $\operatorname{Exp}_{[p]}(h_1 + h_2) \neq \operatorname{Exp}_{\operatorname{Exp}_{[p]}(h_1)}(h_2) \neq \operatorname{Exp}_{\operatorname{Exp}_{[p]}(h_2)}(h_1)$ in general (illustrated numerically below). Thus, hierarchical geodesic effects of the form $\operatorname{Exp}_{\operatorname{Exp}_{[p]}(h_1)}(h_2)$, relevant, i.a., in mixed models for hierarchical/longitudinal study designs (Kim et al. Citation2017), present an interesting future extension of our model. Moreover, an “elastic” extension based on the square-root-velocity framework (Srivastava and Klassen Citation2016) presents a promising direction for future research, as do other manifold responses.
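As a toy numerical check of this noncommutativity, consider the 2-sphere as a stand-in Riemannian manifold; in the R sketch below (all function names hypothetical), $h_2$ is parallel transported to the base point before the second exponential step, and the three constructions yield three distinct points.

exp_sphere <- function(p, v) {    # exponential map on the unit sphere
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(p)
  cos(nv) * p + sin(nv) * v / nv
}
transport <- function(p, v, w) {  # parallel transport of w along Exp_p(tv)
  nv <- sqrt(sum(v^2)); u <- v / nv
  w + sum(w * u) * ((cos(nv) - 1) * u - sin(nv) * p)
}
p  <- c(0, 0, 1)                          # pole
h1 <- c(0.6, 0, 0); h2 <- c(0, 0.8, 0)    # tangent vectors at p
rbind(joint = exp_sphere(p, h1 + h2),
      h1_h2 = exp_sphere(exp_sphere(p, h1), transport(p, h1, h2)),
      h2_h1 = exp_sphere(exp_sphere(p, h2), transport(p, h2, h1)))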

Acknowledgments

We sincerely thank Nadja Pöllath for providing carefully recorded sheep astragalus data and important insights and comments, and Sophia Schaffer for running and discussing cell simulations and providing fully processed cell outlines.

Supplementary Materials

Supplementary material with further details is provided in an online supplement.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

We gratefully acknowledge funding by grant GR 3793/3-1 from the German Research Foundation (DFG) and support by the Open Access Publication Fund of Humboldt-Universität zu Berlin.

References

  • Adams, D., Rohlf, F., and Slice, D. (2013), “A Field Comes of Age: Geometric Morphometrics in the 21st Century,” Hystrix, the Italian Journal of Mammalogy, 24, 7–14.
  • Backenroth, D., Goldsmith, J., Harran, M. D., Cortes, J. C., Krakauer, J. W., and Kitago, T. (2018), “Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis,” Journal of the American Statistical Association, 113, 1003–1015.
  • Baranyi, P., Yam, Y., and Várlaki, P. (2013), Tensor Product Model Transformation in Polytopic Model-based Control, Boca Raton, FL: CRC Press.
  • Bonhomme, V., Picq, S., Gaucherel, C., and Claude, J. (2014), “Momocs: Outline Analysis using R,” Journal of Statistical Software, 56, 1–24.
  • Brockhaus, S., Melcher, M., Leisch, F., and Greven, S. (2017), “Boosting Flexible Functional Regression Models with a High Number of Functional Historical Effects,” Statistics and Computing, 27, 913–926.
  • Brockhaus, S., Rügamer, D., and Greven, S. (2020), “Boosting Functional Regression Models with FDboost,” Journal of Statistical Software, 94, 1–50.
  • Brockhaus, S., Scheipl, F., and Greven, S. (2015), “The Functional Linear Array Model,” Statistical Modelling, 15, 279–300.
  • Bühlmann, P., and Yu, B. (2003), “Boosting with the L2 Loss: Regression and Classification,” Journal of the American Statistical Association, 98, 324–339.
  • Clutton-Brock, J., Dennis-Bryan, K., Armitage, P. L., and Jewell, P. A. (1990), “Osteology of the Soay Sheep,” Bulletin of the British Museum (Natural History), 56, 1–56.
  • Cornea, E., Zhu, H., Kim, P., Ibrahim, J. G., and the Alzheimer’s Disease Neuroimaging Initiative. (2017), “Regression Models on Riemannian Symmetric Spaces,” Journal of the Royal Statistical Society, Series B, 79, 463–482.
  • Davis, B. C., Fletcher, P. T., Bullitt, E., and Joshi, S. (2010), “Population Shape Regression from Random Design Data,” International Journal of Computer Vision, 90, 255–266.
  • Dryden, I. L., and Mardia, K. V. (2016), Statistical Shape Analysis: With Applications in R, Chichester: Wiley.
  • Ferraty, F., Goia, A., Salinelli, E., and Vieu, P. (2011), “Recent Advances on Functional Additive Regression,” in Recent Advances in Functional Data Analysis and Related Topics, ed. F. Ferraty, pp. 97–102, Heidelberg: Springer.
  • Fletcher, P. T. (2013), “Geodesic Regression and the Theory of Least Squares on Riemannian Manifolds,” International Journal of Computer Vision, 105, 171–185.
  • Greven, S., and Scheipl, F. (2017), “A General Framework for Functional Regression Modelling,” (with discussion and rejoinder), Statistical Modelling, 17(1–2), 1–35 and 100–115.
  • Happ, C., and Greven, S. (2018), “Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains,” Journal of the American Statistical Association, 113, 649–659.
  • Hinkle, J., Fletcher, P. T., and Joshi, S. (2014), “Intrinsic Polynomials for Regression on Riemannian Manifolds,” Journal of Mathematical Imaging and Vision, 50, 32–52.
  • Hofner, B., Hothorn, T., Kneib, T., and Schmid, M. (2011), “A Framework for Unbiased Model Selection Based on Boosting,” Journal of Computational and Graphical Statistics, 20, 956–971.
  • Hofner, B., Kneib, T., and Hothorn, T. (2016), “A Unified Framework of Constrained Regression,” Statistics and Computing, 26, 1–14.
  • Hong, Y., Singh, N., Kwitt, R., and Niethammer, M. (2014), “Time-Warped Geodesic Regression,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 105–112, Springer.
  • Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., and Hofner, B. (2010), “Model-based Boosting 2.0,” Journal of Machine Learning Research, 11, 2109–2113.
  • Huckemann, S., Hotz, T., and Munk, A. (2010), “Intrinsic MANOVA for Riemannian Manifolds with an Application to Kendall’s Space of Planar Shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 593–603.
  • Jeon, J. M., and Park, B. U. (2020), “Additive Regression with Hilbertian Responses,” The Annals of Statistics, 48, 2671–2697.
  • Jeon, J. M., Lee, Y. K., Mammen, E., and Park, B. U. (2022), “Locally Polynomial Hilbertian Additive Regression,” Bernoulli, 28, 2034–2066.
  • Jeon, J. M., Park, B. U., and Van Keilegom, I. (2021), “Additive Regression for Non-Euclidean Responses and Predictors,” The Annals of Statistics, 49, 2611–2641.
  • Jupp, P. E., and Kent, J. T. (1987), “Fitting Smooth Paths to Spherical Data,” Journal of the Royal Statistical Society, Series C, 36, 34–46.
  • Karcher, H. (1977), “Riemannian Center of Mass and Mollifier Smoothing,” Communications on Pure and Applied Mathematics, 30, 509–541.
  • Kendall, D. G., Barden, D., Carne, T. K., and Le, H. (1999), Shape and Shape Theory (Vol. 500), Chichester: Wiley.
  • Kent, J. T., Mardia, K. V., Morris, R. J., and Aykroyd, R. G. (2001), “Functional Models of Growth for Landmark Data,” in Proceedings in Functional and Spatial Data Analysis, pp. 109–115.
  • Kim, H. J., Adluru, N., Collins, M. D., Chung, M. K., Bendlin, B. B., Johnson, S. C., Davidson, R. J., and Singh, V. (2014), “Multivariate General Linear Models (mglm) on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2705–2712.
  • Kim, H. J., Adluru, N., Suri, H., Vemuri, B. C., Johnson, S. C., and Singh, V. (2017), “Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5777–5786.
  • Klingenberg, W. (1995), Riemannian Geometry, Berlin: de Gruyter.
  • Kneib, T., Hothorn, T., and Tutz, G. (2009), “Variable Selection and Model Choice in Geoadditive Regression Models,” Biometrics, 65, 626–634. DOI: 10.1111/j.1541-0420.2008.01112.x.
  • Kume, A., Dryden, I. L., and Le, H. (2007), “Shape-Space Smoothing Splines for Planar Landmark Data,” Biometrika, 94, 513–528.
  • Lay, D. M. (1967), “A Study of the Mammals of Iran: Resulting from the Street Expedition of 1962-63,” in Fieldiana: Zoology 54. Field Museum of Natural History.
  • Li, Y., and Ruppert, D. (2008), “On the Asymptotics of Penalized Splines,” Biometrika, 95, 415–436.
  • Lin, L., St. Thomas, B., Zhu, H., and Dunson, D. B. (2017), “Extrinsic Local Regression on Manifold-Valued Data,” Journal of the American Statistical Association, 112, 1261–1273. DOI: 10.1080/01621459.2016.1208615.
  • Lin, Z., Müller, H.-G., and Park, B. U. (2020), “Additive Models for Symmetric Positive-Definite Matrices, Riemannian Manifolds and Lie Groups,” arXiv preprint arXiv:2009.08789.
  • Lutz, R. W., and Bühlmann, P. (2006), “Boosting for High-Multivariate Responses in High-Dimensional Linear Regression,” Statistica Sinica, 16, 471–494.
  • Mallasto, A., and Feragen, A. (2018), “Wrapped Gaussian Process Regression on Riemannian Manifolds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5580–5588.
  • Mayr, A., Binder, H., Gefeller, O., and Schmid, M. (2014), “The Evolution of Boosting Algorithms,” Methods of Information in Medicine, 53, 419–427. DOI: 10.3414/ME13-01-0122.
  • Meyer, M. J., Coull, B. A., Versace, F., Cinciripini, P., and Morris, J. S. (2015), “Bayesian Function-on-Function Regression for Multilevel Functional Data,” Biometrics, 71, 563–574. DOI: 10.1111/biom.12299.
  • Morris, J. S. (2015), “Functional Regression,” Annual Review of Statistics and its Applications, 2, 321–359.
  • Morris, J. S., and Carroll, R. J. (2006), “Wavelet-based Functional Mixed Models,” Journal of the Royal Statistical Society, Series B, 68, 179–199. DOI: 10.1111/j.1467-9868.2006.00539.x.
  • Müller, H.-G., and Yao, F. (2008), “Functional Additive Models,” Journal of the American Statistical Association, 103, 1534–1544.
  • Muralidharan, P., and Fletcher, P. T. (2012), “Sasaki Metrics for Analysis of Longitudinal Data on Manifolds,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1027–1034, IEEE.
  • Olsen, N. L., Markussen, B., and Raket, L. L. (2018), “Simultaneous Inference for Misaligned Multivariate Functional Data,” Journal of the Royal Statistical Society, Series C, 67, 1147–1176.
  • Pennec, X. (2006), “Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements,” Journal of Mathematical Imaging and Vision, 25, 127–154.
  • Petersen, A., and Müller, H.-G. (2019), “Fréchet Regression for Random Objects with Euclidean Predictors,” The Annals of Statistics, 47, 691–719.
  • Pigoli, D., Menafoglio, A., and Secchi, P. (2016), “Kriging Prediction for Manifold-Valued Random Fields,” Journal of Multivariate Analysis, 145, 117–131.
  • Pöllath, N., Schafberg, R., and Peters, J. (2019), “Astragalar Morphology: Approaching the Cultural Trajectories of Wild and Domestic Sheep Applying Geometric Morphometrics,” Journal of Archaeological Science: Reports, 23, 810–821.
  • R Core Team (2018), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
  • Ramsay, J. O., and Silverman, B. W. (2005), Functional Data Analysis, New York: Springer.
  • Rosen, O., and Thompson, W. K. (2009), “A Bayesian Regression Model for Multivariate Functional Data,” Computational Statistics & Data Analysis, 53, 3773–3786. DOI: 10.1016/j.csda.2009.03.026.
  • Schafberg, R., and Wussow, J. (2010), “Julius Kühn. Das Lebenswerk eines agrarwissenschaftlichen Visionärs,” Züchtungskunde, 82, 468–484.
  • Schaffer, S. A. (2021), “Cytoskeletal Dynamics in Confined Cell Migration: Experiment and Modelling,” PhD thesis, LMU Munich. DOI: 10.5282/edoc.28480.
  • Scheipl, F., Staicu, A.-M., and Greven, S. (2015), “Functional Additive Mixed Models,” Journal of Computational and Graphical Statistics, 24, 477–501. DOI: 10.1080/10618600.2014.901914.
  • Schiratti, J.-B., Allassonnière, S., Colliot, O., and Durrleman, S. (2017), “A Bayesian Mixed-Effects Model to Learn Trajectories of Changes from Repeated Manifold-Valued Observations,” The Journal of Machine Learning Research, 18, 4840–4872.
  • Shi, X., Styner, M., Lieberman, J., Ibrahim, J. G., Lin, W., and Zhu, H. (2009), “Intrinsic Regression Models for Manifold-Valued Data,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 192–199, Springer.
  • Srivastava, A., and Klassen, E. P. (2016), Functional and Shape Data Analysis, New York: Springer-Verlag.
  • Stöcker, A., Brockhaus, S., Schaffer, S. A., Bronk, B. v., Opitz, M., and Greven, S. (2021), “Boosting Functional Response Models for Location, Scale and Shape with an Application to Bacterial Competition,” Statistical Modelling, 21, 385–404.
  • Stöcker, A., Pfeuffer, M., Steyer, L., and Greven, S. (2022), “Elastic Full Procrustes Analysis of Plane Curves via Hermitian Covariance Smoothing.” DOI: 10.48550/arXiv.2203.10522.
  • Thüroff, F., Goychuk, A., Reiter, M., and Frey, E. (2019), “Bridging the Gap between Single-Cell Migration and Collective Dynamics,” eLife, 8, e46842.
  • Volkmann, A., Stöcker, A., Scheipl, F., and Greven, S. (2021), “Multivariate Functional Additive Mixed Models,” Statistical Modelling. DOI: 10.1177/1471082X211056158.
  • Wood, S. N., Pya, N., and Säfken, B. (2016), “Smoothing Parameter and Model Selection for General Smooth Models,” Journal of the American Statistical Association, 111, 1548–1563.
  • Yao, F., Müller, H., and Wang, J. (2005), “Functional Data Analysis for Sparse Longitudinal Data,” Journal of the American Statistical Association, 100, 577–590.
  • Zeder, M. A. (2006), “Reconciling Rates of Long Bone Fusion and Tooth Eruption and Wear in Sheep (Ovis) and Goat (Capra),” Recent Advances in Ageing and Sexing Animal Bones, 9, 87–118.
  • Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C., and Lin, W. (2009), “Intrinsic Regression Models for Positive-Definite Matrices with Applications to Diffusion Tensor Imaging,” Journal of the American Statistical Association, 104, 1203–1212.
  • Zhu, H., Li, R., and Kong, L. (2012), “Multivariate Varying Coefficient Model for Functional Responses,” Annals of Statistics, 40, 2634–2666.
  • Zhu, H., Morris, J. S., Wei, F., and Cox, D. D. (2017), “Multivariate Functional Response Regression, with Application to Fluorescence Spectroscopy in a Cervical Pre-cancer Study,” Computational Statistics and Data Analysis, 111, 88–101.