Abstract
In this article, we establish the mathematical foundations for modeling the randomness of shapes and conducting statistical inference on shapes using the smooth Euler characteristic transform. Based on these foundations, we propose two Chi-squared statistic-based algorithms for testing hypotheses on random shapes. Simulation studies are presented to validate our mathematical derivations and to compare our algorithms with state-of-the-art methods to demonstrate the utility of our proposed framework. As a real application, we analyze a dataset of mandibular molars from four genera of primates and show that our algorithms have the power to detect significant shape differences that recapitulate known morphological variation across suborders. Altogether, our discussions bridge the following fields: algebraic and computational topology, probability theory and stochastic processes, Sobolev spaces and functional analysis, analysis of variance for functional data, and geometric morphometrics. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
1 Introduction
The quantification of shapes has become an important research direction. It has brought advances to many fields including geometric morphometrics (Boyer et al. Citation2011; Gao, Kovalsky, and Daubechies Citation2019; Gao et al. Citation2019), biophysics and structural biology (Wang et al. Citation2021; Tang et al. Citation2022), and radiogenomics (Crawford et al. Citation2020). When shapes are considered as random variables, their corresponding quantitative summaries are also random, implying that such summaries of random shapes are statistics. The statistical inference on shapes based on these quantitative summaries has been of particular interest (Fasy et al. Citation2014; Roycraft, Krebs, and Polonik Citation2023).
In this article, we bring together mathematical and statistical approaches to make three significant contributions to shape statistics: (i) we provide mathematical foundations for the randomness of shapes encountered in applications, bridging algebraic topology (Hatcher Citation2002) and stochastic processes (Hairer Citation2009); (ii) we connect the statistical inference on shape-valued data to the well-studied analysis of variance for functional data (fdANOVA, Zhang Citation2013), bridging topological data analysis (TDA, Edelsbrunner and Harer Citation2010) and functional data analysis (FDA, Hsing and Eubank Citation2015); and (iii) our framework does not rely on any assumptions about diffeomorphisms or pre-specified landmarks.
1.1 A Motivating Scientific Question
Through modeling the randomness of shapes, we aim to address the following statistical inference question: Is the observed difference between two groups of shapes statistically significant? For example, the mandibular molars in Figure 1 are from four genera of primates. A pertinent question from a morphological perspective is: In Figure 1, do the molars from the genus Tarsius exhibit significant differences from those from the other genera?
The primary objective of this article is to propose a powerful approach for testing hypotheses on random shapes. This would help address morphology-motivated statistical inference questions like the one raised above. In achieving this objective, we lay down the mathematical foundations that justify our hypothesis testing methods. We take two key steps: In Step 1, we find appropriate representations of shapes; and in Step 2, we test hypotheses on shapes using these representations. In Section 1.2, we provide a literature review on shape representations and introduce the topological summary employed in this article. Section 1.3 begins by presenting the main theme of our hypothesis testing approach, followed by an overview of our contributions. Since the molars in Figure 1 are diffeomorphic to the two-dimensional unit sphere, some existing diffeomorphism-related methods can be considered for representing the molars (e.g., parameterized surfaces; Kurtek et al. Citation2011). In contrast, we aim to propose an approach that does not rely on any diffeomorphic assumptions, allowing for a wider range of applications.
1.2 Overview of Shape and Topological Data Analysis
In classical geometric morphometrics, shapes are represented using prespecified points called landmarks (Kendall Citation1989). The manual landmarking of a collection of shapes requires domain knowledge, can be very labor intensive, and is subject to bias (Boyer et al. Citation2011). Furthermore, an equal number of landmarks must be selected for each shape in a study in order to make comparisons (e.g., the Procrustes framework discussed in sec. 2.1 of Gao et al. Citation2019). This necessitates comprehensive information about entire collections of shapes for consistency, which can be difficult to obtain (e.g., landmarking cancer tumors, which can have very different morphology across a population of patients). Unfortunately, many datasets do not come with prespecified landmarks (e.g., Goswami Citation2015). Although many algorithms can automatically sample reasonable landmarks on shapes when their parameters are fine-tuned (e.g., Gao et al. Citation2019; Gao, Kovalsky, and Daubechies Citation2019), using a finite number of landmarks extracted from a continuum inevitably results in the loss of information. Diffeomorphism-based approaches (Dupuis, Grenander, and Miller Citation1998; Gao et al. Citation2019) are part of the “computational anatomy” that was historically studied by the “pattern theory school” pioneered by Ulf Grenander (Grenander and Miller Citation1998). They enable the comparison of (dis-)similarity between shapes with the benefit of bypassing the need for landmarks. However, these approaches are based on the assumption that the shapes being compared are diffeomorphic to one another, making them unsuitable for many datasets (e.g., fruit fly wings in Miller Citation2015). Finally, parameterized curves and surfaces (PCS) provide a toolbox for assessing the heterogeneity of shapes with summary statistics that are invariant to reparameterizations (Kurtek et al. Citation2010, Citation2011, Citation2012).
Despite their effectiveness in analyzing real data (e.g., DT-MRI brain fibers; Kurtek et al. Citation2012), PCS are based on assumptions about the diffeomorphism types of the shapes of interest. For example, Kurtek et al. (Citation2011) focuses on surfaces that are diffeomorphic to the two-dimensional unit sphere.
TDA opens the door for landmark-free and diffeomorphism-free representations of shapes. Motivated by differential topology, Turner, Mukherjee, and Boyer (Citation2014) proposed the persistent homology transform (PHT) with the capability to sufficiently encode all information within shapes (Ghrist, Levanger, and Mai Citation2018). To describe the PHT, we briefly provide some basics of TDA. One common statistical invariant in TDA is the persistence diagram (PD, Edelsbrunner and Harer Citation2010). When equipped with the Wasserstein distance, the collection of PDs, denoted as $\mathcal{D}$, is a Polish space (Mileyko, Mukherjee, and Harer Citation2011). Thus, probability measures can be applied, and the randomness of shapes can be represented using the probability measures on $\mathcal{D}$. The PHT takes values in $C(S^{d-1}; \mathcal{D}^d)$, where $S^{d-1}$ denotes the sphere and $\mathcal{D}^d$ is the $d$-fold Cartesian product of $\mathcal{D}$ (Turner, Mukherjee, and Boyer Citation2014, Lemma 2.1 and Definition 2.1). A single PD does not preserve all information of a shape (Crawford et al. Citation2020). In contrast, the PHT is injective, which means it preserves all the information of a shape. However, since $\mathcal{D}$ is not a vector space and the distances on $\mathcal{D}$ are abstract (e.g., the Wasserstein and bottleneck distances, Cohen-Steiner et al. Citation2007), many fundamental statistical concepts do not easily apply to summaries resulting from the PHT. For example, the definition of moments corresponding to probability measures on $\mathcal{D}$ (e.g., means) is highly nontrivial (Mileyko, Mukherjee, and Harer Citation2011). The difficulty in defining these concepts hinders the application of PHT-based statistical methods in $C(S^{d-1}; \mathcal{D}^d)$.
The smooth Euler characteristic transform (SECT, Crawford et al. Citation2020) offers an alternative summary statistic for shapes. The SECT not only preserves the information of shapes (Ghrist, Levanger, and Mai Citation2018, Corollary 1) but also represents shapes using continuous functions instead of PDs. More precisely, the values of the SECT are maps from the sphere $S^{d-1}$ to a separable Banach space $C([0,T])$, the collection of continuous functions on a compact interval $[0,T]$ (the value of T will be given in (3.1)). Hence, for any shape K, its SECT, denoted as $SECT(K)$, lies in $C(S^{d-1}; C([0,T]))$. Specifically, $SECT(K)(\nu)$ belongs to $C([0,T])$ for each direction $\nu \in S^{d-1}$. As a result, the randomness of shapes K is represented via the SECT by a collection of $C([0,T])$-valued random variables. Probability theory in separable Banach spaces is better developed than in $\mathcal{D}$ (e.g., Hairer Citation2009). In particular, a $C([0,T])$-valued random variable is a stochastic process with its sample paths in $C([0,T])$. As we will demonstrate in Section 3, $C([0,T])$ here can be replaced with a reproducing kernel Hilbert space (RKHS). The theory of stochastic processes has evolved over a century and FDA is a well-developed branch of statistics. Consequently, a myriad of tools are available to underpin both the randomness of shapes and the statistical inference on shapes.
From an application perspective, Crawford et al. (Citation2020) applied the SECT to magnetic resonance images taken from tumors in a cohort of glioblastoma multiforme (GBM) patients. Using summary statistics derived from the SECT as predictors within Gaussian process regression, the authors demonstrated that the SECT can predict clinical outcomes more effectively than existing tumor shape quantification approaches and common molecular assays. The relative performance of the SECT in the GBM study suggests a promising future for its utility in medical imaging and broader statistical applications related to shape analyses. Similarly, Wang et al. (Citation2021) used derivatives of the Euler characteristic transform (ECT) as predictors in statistical models for subimage analysis. This analysis is akin to variable selection, aiming to identify physical features that are important for distinguishing between two classes of shapes. Lastly, Marsh et al. (Citation2022) highlighted that the SECT outperforms the standard measures employed in organoid morphology.
1.3 Overview of Contributions and Article Organization
Our goal is to address the hypothesis testing question posed in Section 1.1 by employing a landmark-free and diffeomorphism-free approach, which opens up possibilities for further applications in the future. We formulate the question more generically here. Let $\mathbb{P}^{(1)}$ and $\mathbb{P}^{(2)}$ be two distributions that generate two collections of random shapes, $\{K_i^{(1)}\}_i$ and $\{K_j^{(2)}\}_j$. Detecting whether there is a significant difference between $\mathbb{P}^{(1)}$ and $\mathbb{P}^{(2)}$ is equivalent to rejecting the hypothesis $\mathbb{P}^{(1)} = \mathbb{P}^{(2)}$. Since each shape $K^{(i)}$ is random, $SECT(K^{(i)})$ is a random variable taking values in a vector space (as discussed in Section 1.2) and can be decomposed as follows (see Theorem 5.1 for a rigorous version) (1.1) $SECT(K^{(i)}) = m^{(i)} + (\text{a mean-zero random term})$ for $i \in \{1, 2\}$, where $m^{(i)}$ denotes the mean of $SECT(K^{(i)})$ with respect to the distribution $\mathbb{P}^{(i)}$. The random terms in (1.1) can be characterized by the Karhunen–Loève (KL) expansion (Hsing and Eubank Citation2015, sec. 7.3). To reject $\mathbb{P}^{(1)} = \mathbb{P}^{(2)}$, it suffices to reject $m^{(1)} = m^{(2)}$. That is, the question posed in Section 1.1 can be addressed by testing for the equality of two means. The important component of the test is the variance represented by the random terms in (1.1). In Section 5, we formulate this test as a two-sample problem in the fdANOVA literature (Zhang Citation2013, sec. 5.2). In addition, using the KL expansion, we provide a $\chi^2$-statistic in Section 5 to test the hypothesis. Throughout the article, our focus is on the two-sample problem. However, one may also consider employing the one-way fdANOVA to compare the means of three or more groups of shapes. The theoretical foundation and numerical experiments for this aspect are left for future research.
To develop our framework, we have to address the following questions concerning mathematical foundations: (i) What underlying probability spaces allow for the randomness of shapes and their corresponding SECT? and (ii) Are the conditions of the KL expansion satisfied in our setting? We answer these questions in Sections 3 and 4: we model the randomness of shapes via the SECT using RKHS-valued random fields. The “theory of random sets” is a well-established framework for characterizing set-valued random variables (Molchanov Citation2005). However, its application to persistent homology-based statistics (e.g., the SECT) remains underexplored. In this article, we introduce a new probability space to characterize the randomness of shapes in a manner compatible with the SECT.
We first propose a collection of shapes as our sample space on which the SECT is well-defined. We then demonstrate that every shape in this collection has its SECT in $C(S^{d-1}; H^1(0,T))$, where $H^1(0,T)$ is not only a Sobolev space (Brezis Citation2011) but also an RKHS (the reasons for our specific choice of function space are given in Appendix A.1). Importantly, $C(S^{d-1}; H^1(0,T))$ is a separable Banach space (Theorem C.1) and, hence, a Polish space. It helps construct a probability space to characterize the distributions of shapes. Building on this probability space, we define the mean and covariance of the SECT. Using the Sobolev embedding, we present some properties of the mean and covariance, which pave the way for the KL expansion of the SECT.
Traditionally, the statistical inference on shapes in TDA is conducted in the persistence diagram space $\mathcal{D}$, which is unsuitable for exponential family-based distributions and requires any corresponding statistical inference to be highly nonparametric (Fasy et al. Citation2014; Robinson and Turner Citation2017). The PHT-based statistical inference in $C(S^{d-1}; \mathcal{D}^d)$ is even more difficult. With the KL expansion of the SECT, we propose a $\chi^2$-statistic for testing hypotheses on shapes. Beyond the mathematical foundations, we also provide simulation studies to illustrate the performance of our proposed hypothesis testing method. Lastly, we apply our proposed framework to answer the motivating question raised in Section 1.1.
We organize this article as follows. In Section 2, we provide the mathematical preparations. In Section 3, we define the SECT for a specific collection of shapes, highlighting its properties. In Section 4, we construct a probability space to model shape distributions. In Section 5, we propose the KL expansion of the SECT, leading to a statistic for hypothesis testing. In Section 6, we conduct simulation studies to evaluate our method. In Section 7, we apply our method to real data. In Section 8, we conclude the article. The Appendix provides the proofs of theorems, further data analysis, and future research topics.
2 Notations and Mathematical Preparations
To model the shapes discussed in our motivating question from Section 1.1, we need certain preparations regarding (i) topology and (ii) function spaces.
Topology
The first question we must address is: What are the “shapes” in our framework? Ghrist, Levanger, and Mai (Citation2018) and Curry, Mukherjee, and Turner (Citation2022) applied o-minimal structures (van den Dries Citation1998) to prove the injectivity of the PHT. Subsequent to this, o-minimal structures have been applied in many TDA studies to model shapes (e.g., Jiang, Kurtek, and Needham Citation2020; Kirveslahti and Mukherjee Citation2023). To stay consistent with the existing literature, we also model shapes using o-minimal structures. An o-minimal structure is a sequence $\{\mathcal{S}_n\}_{n \ge 1}$ of subset collections $\mathcal{S}_n \subseteq \mathcal{P}(\mathbb{R}^n)$ satisfying six set-theoretical axioms, where $\mathcal{P}(\mathbb{R}^n)$ denotes the power set of $\mathbb{R}^n$. The precise definition of o-minimal structures is available in van den Dries (Citation1998) and is provided in Appendix A.3 for the reader’s convenience.
A typical example of o-minimal structures is the collection of semialgebraic sets. Specifically, a set $X \subseteq \mathbb{R}^n$ is semialgebraic if it can be expressed as a finite union of sets of the form $\{x \in \mathbb{R}^n : p(x) = 0 \text{ and } q_1(x) > 0, \ldots, q_k(x) > 0\}$, where $p, q_1, \ldots, q_k$ are polynomial functions on $\mathbb{R}^n$. If we define $\mathcal{S}_n$ as the collection of semialgebraic subsets of $\mathbb{R}^n$, then $\{\mathcal{S}_n\}_{n \ge 1}$ is an o-minimal structure (van den Dries Citation1998, chap. 2). The unit sphere $S^{d-1}$, the open ball $B(0,R)$ for any R > 0, and all polyhedra (e.g., polygon meshes in computer graphics) are semialgebraic, given that they can be represented using either polynomial or affine functions. We assume the following:
Assumption 1.
The o-minimal structure of interest contains all semialgebraic sets.
Hereafter, a “shape” refers to a compact set $K \in \mathcal{S}_d$ for a prespecified o-minimal structure $\{\mathcal{S}_n\}_{n \ge 1}$ satisfying Assumption 1. Assumption 1 incorporates many common shapes (e.g., balls and polyhedra) into our framework. More importantly, it implies the subsequent Theorem 2.1 through the “triangulation theorem” (van den Dries Citation1998, chap. 8). Although the definition of an o-minimal structure is highly abstract (see Appendix A.3), each compact set in $\mathcal{S}_d$ resembles a polyhedron, which is precisely stated as follows.
Theorem 2.1.
Suppose $\{\mathcal{S}_n\}_{n \ge 1}$ is an o-minimal structure satisfying Assumption 1 and $K \in \mathcal{S}_d$. If K is compact, there exists a finite simplicial complex S such that the polyhedron $|S| = \bigcup_{\sigma \in S} \sigma$ is homeomorphic to K, where each $\sigma$ denotes a simplex.
Herein, a finite simplicial complex S is a finite collection of simplexes. Each face of a simplex in S also belongs to S (i.e., S is a so-called “closed complex” referred to in chap. 8 of van den Dries Citation1998). Theorem 2.1 directly results from the “triangulation theorem” (van den Dries Citation1998); hence, its proof is omitted. For the dth component $\mathcal{S}_d$ of $\{\mathcal{S}_n\}_{n \ge 1}$, Theorem 2.1 indicates that the compact sets $K \in \mathcal{S}_d$ are subsets of $\mathbb{R}^d$ that are homeomorphic to polyhedra. Theorem 2.1 also implies that the homology groups of each compact $K \in \mathcal{S}_d$ are well-defined and finitely generated; hence, the Betti numbers and Euler characteristic of K are well-defined and finite (Hatcher Citation2002, chap. 2).
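For intuition, the Euler characteristic of a finite simplicial complex is the alternating sum $\chi = \sum_{k} (-1)^k \cdot (\text{number of } k\text{-simplexes})$. The following minimal Python sketch (the function name is ours, not from the article's software) computes this quantity after closing the complex under faces, reflecting the "closed complex" convention above:

```python
from itertools import combinations

def euler_characteristic(simplices):
    """Euler characteristic of a finite simplicial complex.

    `simplices` is a collection of simplexes, each a tuple of vertex ids.
    We first close the collection under taking faces, then form the
    alternating sum over all simplexes (a k-simplex has k + 1 vertices).
    """
    complex_ = set()
    for s in simplices:
        for k in range(1, len(s) + 1):
            complex_.update(combinations(sorted(s), k))
    return sum((-1) ** (len(s) - 1) for s in complex_)

# Boundary of a triangle (homeomorphic to a circle): chi = 0.
print(euler_characteristic([(0, 1), (1, 2), (0, 2)]))  # 0
# Filled triangle (homeomorphic to a disk): chi = 1.
print(euler_characteristic([(0, 1, 2)]))  # 1
```

The two printed values match the well-known Euler characteristics of the circle and the disk.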
Function Spaces
We apply the following notations throughout this article:
For any normed space $(B, \|\cdot\|_B)$, let $\|\cdot\|_B$ denote its norm. When no confusion arises, we suppress the subscript and write $\|\cdot\|$ for succinctness.
Let X be a compact metric space equipped with metric $d_X$, and let $(B, \|\cdot\|_B)$ denote a normed space. $C(X; B)$ is the collection of continuous maps from X to B. Furthermore, $C(X; B)$ is a normed space equipped with $\|f\|_{C(X;B)} = \sup_{x \in X} \|f(x)\|_B$. The Hölder space $C^{0,\alpha}(X; B)$, for $\alpha \in (0, 1]$, is defined as $C^{0,\alpha}(X; B) = \{f \in C(X; B) : \sup_{x \ne x'} \|f(x) - f(x')\|_B / d_X(x, x')^{\alpha} < \infty\}$. Here, $C^{0,\alpha}(X; B)$ is a normed space equipped with $\|f\|_{C^{0,\alpha}(X;B)} = \|f\|_{C(X;B)} + \sup_{x \ne x'} \|f(x) - f(x')\|_B / d_X(x, x')^{\alpha}$. Obviously, $C^{0,\alpha}(X; B) \subseteq C(X; B)$. For simplicity, we denote $C(X) = C(X; \mathbb{R})$ and $C^{0,\alpha}(X) = C^{0,\alpha}(X; \mathbb{R})$. For a given T > 0 (e.g., see (3.1)), we denote $C([0,T]; \mathbb{R})$ as $C([0,T])$.
The inner product of the Sobolev space $H^1(0,T)$ is defined as $\langle f, g \rangle_{H^1(0,T)} = \int_0^T \{f(t) g(t) + f'(t) g'(t)\}\, dt$ (Brezis Citation2011, chap. 8.3, Remark 17).
Suppose $(Y, d_Y)$ is a metric space (not necessarily compact). Both $\mathcal{B}(Y)$ and $\mathcal{B}(Y, d_Y)$ denote the Borel σ-algebra generated by the metric topology corresponding to $d_Y$.
$\{F(z)\}_{z \in Z}$ denotes a function F defined on the set Z.
The following inequalities are useful for deriving many results presented in this article: (2.1) $\|f\|_{C([0,T])} \le \|f\|_{C^{0,1/2}([0,T])} \le C_T \|f\|_{H^1(0,T)}$ for all $f \in H^1(0,T)$, where $C_T$ is a constant depending only on T. The first inequality in (2.1) results from the definition of $\|\cdot\|_{C^{0,1/2}([0,T])}$, while the second inequality is from Brezis (Citation2011) (Corollary 9.14; also see Appendix L.2). Equation (2.1) implies the following Sobolev embedding: (2.2) $H^1(0,T) \hookrightarrow C^{0,1/2}([0,T]) \hookrightarrow C([0,T])$.
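The first inequality in (2.1) can be illustrated numerically on a discretized function. The sketch below (illustrative only; all variable names are ours) approximates the sup norm, the 1/2-Hölder norm, and the $H^1$ norm for $f(t) = \sin(2\pi t)$ on $[0, 1]$:

```python
import numpy as np

T = 1.0
t = np.linspace(0.0, T, 201)
f = np.sin(2 * np.pi * t)
dt = t[1] - t[0]

sup_norm = np.abs(f).max()
# 1/2-Holder seminorm, approximated over all grid pairs
i, j = np.triu_indices(len(t), k=1)
holder_semi = (np.abs(f[j] - f[i]) / np.sqrt(t[j] - t[i])).max()
holder_norm = sup_norm + holder_semi
# discretized H^1(0,T) norm: L2 norms of f and its derivative
df = np.gradient(f, t)
h1_norm = np.sqrt(((f ** 2).sum() + (df ** 2).sum()) * dt)

print(sup_norm <= holder_norm)  # True, by the definition of the Holder norm
```

The second inequality in (2.1) also holds here, but verifying it numerically would require the (unspecified) constant $C_T$.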
3 Smooth Euler Characteristic Transform
In this section, we give the background on the SECT and propose corresponding mathematical foundations. Notably, we specify the “sample space”—a collection of shapes on which the SECT is well-defined. The SECT of the shapes in this sample space has properties that are suitable for the probability theory developed in Section 4. The molars in the motivating question from Section 1.1 will be modeled as elements of the sample space.
Suppose an o-minimal structure satisfying Assumption 1 is given, and we focus on shapes in the d-dimensional space $\mathbb{R}^d$. We assume the shape $K \in \mathcal{S}_d$ is compact and contained in a ball B(0, R) of prespecified radius R > 0; examples include the surfaces of the mandibular molars in Figure 1. For each direction $\nu \in S^{d-1}$, we define a filtration $\{K_t^{\nu}\}_{t \in [0,T]}$ of sublevel sets by the following (3.1) $K_t^{\nu} = \{x \in K : x \cdot \nu \le t - R\}$ for $t \in [0,T]$, with $T = 2R$.
We then have the following Euler characteristic curve (ECC, denoted as $\chi_t^{\nu}$) in direction ν: (3.2) $\chi_t^{\nu} = \chi(K_t^{\nu}) = \sum_{k=0}^{d-1} (-1)^k \beta_k(K_t^{\nu})$ for $t \in [0,T]$, where $\beta_k(K_t^{\nu})$ is the kth Betti number of $K_t^{\nu}$. The sum in (3.2) ends at d − 1 because higher homology groups are trivial (Curry, Mukherjee, and Turner Citation2022, sec. 4). If $K_t^{\nu}$ is a triangle mesh, $\chi_t^{\nu} = V_t^{\nu} - E_t^{\nu} + F_t^{\nu}$, where $V_t^{\nu}$, $E_t^{\nu}$, and $F_t^{\nu}$ denote the number of vertices, edges, and faces of the mesh, respectively. Due to Theorem 2.1, the compactness of K guarantees that the Betti numbers in (3.2) are well-defined and finite.
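For triangle meshes, the vertex-edge-face count above makes the ECC straightforward to compute: a simplex enters the sublevel set once all of its vertices have entered. The following Python sketch (hypothetical helper names; not the authors' implementation) illustrates this:

```python
import numpy as np

def ecc_triangle_mesh(vertices, edges, faces, nu, R, levels):
    """Euler characteristic curve of a mesh in direction `nu`.

    A simplex enters the sublevel set {x in K : x . nu <= t - R} once all
    of its vertices have entered, so chi_t = V_t - E_t + F_t with counts
    restricted to the sublevel set.
    """
    h = vertices @ nu  # heights of the vertices in direction nu
    def chi(t):
        inside = h <= t - R
        V = int(inside.sum())
        E = sum(1 for (i, j) in edges if inside[i] and inside[j])
        F = sum(1 for (i, j, k) in faces if inside[i] and inside[j] and inside[k])
        return V - E + F
    return np.array([chi(t) for t in levels])

# Hollow triangle in R^2 (chi of the full shape is 0), swept bottom-up.
verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
edges = [(0, 1), (1, 2), (0, 2)]
curve = ecc_triangle_mesh(verts, edges, [], np.array([0.0, 1.0]),
                          R=2.0, levels=np.linspace(0.0, 4.0, 9))
print(curve)  # [0 0 0 0 1 1 0 0 0]
```

The curve is 1 while only the bottom edge is inside (a contractible set) and returns to 0 once the hollow triangle is complete.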
The Euler characteristic transform (ECT), defined as $ECT(K): \nu \mapsto \{\chi_t^{\nu}\}_{t \in [0,T]}$, was proposed by Turner, Mukherjee, and Boyer (Citation2014) as an alternative to the PHT. Based on the ECT, Crawford et al. (Citation2020) further proposed the SECT as follows: (3.3) $SECT(K)(\nu)(t) = \int_0^t \chi_s^{\nu}\, ds - \frac{t}{T} \int_0^T \chi_s^{\nu}\, ds$ for $t \in [0,T]$.
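Numerically, the SECT in one direction is the cumulative integral of the ECC, centered so that it vanishes at both endpoints. A sketch under the assumption that the ECC has been evaluated on a uniform grid (the function name is ours):

```python
import numpy as np

def sect_from_ecc(ecc, T):
    """Smooth Euler characteristic transform in one direction, as in (3.3).

    Approximates int_0^t chi ds by a trapezoidal rule on a uniform grid,
    then subtracts (t / T) * int_0^T chi ds so the result is 0 at t = 0
    and t = T. (Exact integration would track the jump locations of the
    piecewise constant ECC; the trapezoidal rule is an approximation.)
    """
    n = len(ecc)
    t = np.linspace(0.0, T, n)
    cum = np.concatenate([[0.0],
                          np.cumsum((ecc[1:] + ecc[:-1]) / 2.0) * (T / (n - 1))])
    return cum - (t / T) * cum[-1]

ecc = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0], dtype=float)
s = sect_from_ecc(ecc, T=4.0)
print(np.round(s, 3))  # starts and ends at 0
```

Centering by the linear term makes each $SECT(K)(\nu)$ a continuous function vanishing at the endpoints, which is what allows the functional treatment developed below.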
A visualization of the function $t \mapsto SECT(K)(\nu)(t)$ is presented in the accompanying figure. The following lemma implies that the Lebesgue integrals in (3.3) are well-defined.
Lemma 3.1.
For any fixed compact $K \in \mathcal{S}_d$ and $\nu \in S^{d-1}$, the function $t \mapsto \chi_t^{\nu}$ is piecewise constant with only finitely many discontinuities.
Through the “cell decomposition theorem” (van den Dries Citation1998, chap. 3), Lemma 3.1 directly follows from either Lemma 3.4 of Curry, Mukherjee, and Turner (Citation2022) or “(2.10) Proposition” in Chapter 4 of van den Dries (Citation1998). Hence, the proof of Lemma 3.1 is omitted.
To investigate the distribution of $SECT(K)$ over different shapes K, we introduce the following condition to restrict our attention to a subset of $\mathcal{S}_d$.
Condition 3.1.
Let $K \in \mathcal{S}_d$ be compact. The condition is that K satisfies the following inequality: (3.4) $\sup_{\nu \in S^{d-1}} \#\{\xi \in \mathrm{PD}(K; \nu) : \mathrm{pers}(\xi) > 0\} \le M$, where $\mathrm{PD}(K; \nu)$ is the PD of K associated with the function $x \mapsto x \cdot \nu$, $\mathrm{pers}(\xi)$ is the persistence of the homology feature ξ, $\#$ denotes the cardinality of a multiset, and M > 0 is a sufficiently large prespecified number.
Condition 3.1 involves technicalities from computational topology (Edelsbrunner and Harer Citation2010). To maintain the flow of the article, we relegate the details of this condition, as well as the definitions of $\mathrm{PD}(K; \nu)$ and $\mathrm{pers}(\xi)$, to Appendix B. Heuristically, Condition 3.1 implies the existence of a uniform upper bound on the number of nontrivial homology features of K across all directions ν. Hereafter, we focus on shapes in the following collection: $\mathcal{K} = \{K \in \mathcal{S}_d : K \text{ is compact}, K \subseteq B(0, R), \text{ and } K \text{ satisfies Condition 3.1}\}$.
Our proposed collection $\mathcal{K}$ is suitable for modeling shapes in many applications. For example, the surfaces of the molars in Figure 1 are compact subsets of $\mathbb{R}^3$, bounded by a common open ball, and can be approximately represented by triangle meshes (hence, modeled by an o-minimal structure satisfying Assumption 1). In addition, the four genera of primates in Figure 1 share a phylogenetic relationship, which implies that their molars have common baseline features and satisfy Condition 3.1 with a shared upper bound M. In each application, the dimension d and radius R of the ball B(0, R) can easily be determined based on the observed shapes. Although our mathematical framework requires the existence of such an M in (3.4), the value of M is not needed for our statistical methodology (see Section 5). Thus, Condition 3.1 does not hinder our proposed statistical methodology.
Lemma 3.1 implies that the function $t \mapsto \chi_t^{\nu}$ belongs to $L^{\infty}(0,T)$. Therefore, the function $t \mapsto SECT(K)(\nu)(t)$ is absolutely continuous on $[0,T]$. Furthermore, we have the following regularity result of the Sobolev type.
Lemma 3.2.
For any $K \in \mathcal{K}$ and $\nu \in S^{d-1}$, the function $t \mapsto SECT(K)(\nu)(t)$ belongs to $H^1(0,T)$.
Lemma 3.2 is a special case of Lemma C.3. It indicates $SECT(K)(\nu) \in H^1(0,T)$ for each direction ν, which is enhanced by the following result.
Theorem 3.2.
For each $K \in \mathcal{K}$, we have: (i) There exists a constant $C_{M,R,d}$ depending only on M, R, and d such that the following inequality holds for any directions $\nu_1, \nu_2 \in S^{d-1}$: (3.5) $\|SECT(K)(\nu_1) - SECT(K)(\nu_2)\|_{H^1(0,T)} \le C_{M,R,d} \cdot \{d_{S^{d-1}}(\nu_1, \nu_2)\}^{1/2}$; and (ii) $SECT(K) \in C^{0,1/2}(S^{d-1}; H^1(0,T))$, where $S^{d-1}$ is equipped with the geodesic distance $d_{S^{d-1}}$.
Results complementary to Theorem 3.2 can be found in Theorem C.3, which imply further regularity of the function $\nu \mapsto SECT(K)(\nu)$. Theorem 3.2(i) is an analog of Lemma 2.1 in Turner, Mukherjee, and Boyer (Citation2014). Theorem 3.2(ii) implies $SECT(K) \in C(S^{d-1}; H^1(0,T))$. As a result, (3.3) defines the following map: (3.6) $SECT: \mathcal{K} \to C(S^{d-1}; H^1(0,T))$, $K \mapsto SECT(K)$.
In Appendix D.1, we provide detailed proof-of-concept examples to visually illustrate the SECT and support the regularity results in Theorems 3.2 and C.3.
Corollary 1 of Ghrist, Levanger, and Mai (Citation2018) implies the following result, which shows that the SECT preserves all the information of shapes $K \in \mathcal{K}$.
Theorem 3.3.
The map $SECT$ defined in (3.6) is injective for all dimensions d.
The map $SECT$ is injective, but not surjective. Specifically, Theorem 3.2 suggests that the image of $SECT$ does not lie outside of $C^{0,1/2}(S^{d-1}; H^1(0,T))$. An explicit characterization of the image $SECT(\mathcal{K})$ remains a topic for future research.
Inspired by Theorem 3.3, one may consider reconstructing a shape from either the ECT or the SECT. From a theoretical standpoint, a shape can be reconstructed using Schapira’s inversion formula (Schapira Citation1995). Further details are available in Ghrist, Levanger, and Mai (Citation2018). From an algorithmic perspective, the proof of Theorem 3.1 in Turner, Mukherjee, and Boyer (Citation2014) offers an algorithm to reconstruct low-dimensional meshes from their ECT. Nevertheless, effective algorithmic approaches to reconstructing shapes are still underdeveloped. Challenges in reconstructing shapes are extensively discussed in Fasy et al. (Citation2018). A comprehensive exploration of reconstruction using the SECT is also left for future research.
Together with (3.6), Theorem 3.3 allows us to represent each shape $K \in \mathcal{K}$ by $SECT(K)$. This perspective aids us in modeling the randomness of shapes using probability measures on the separable Banach space $C(S^{d-1}; H^1(0,T))$. Here, we prefer $C(S^{d-1}; H^1(0,T))$ over the $1/2$-Hölder space $C^{0,1/2}(S^{d-1}; H^1(0,T))$. This is because α-Hölder spaces are typically not separable (Hairer Citation2009, Remark 4.21). The separability condition is essential for probability measures on Banach spaces to exhibit non-pathological behavior (Hairer Citation2009, sec. 4).
4 Probabilistic Distributions over the SECT
To address the motivating question outlined in Section 1.1 using hypothesis testing, we need to view the observed shapes (e.g., the molars in Figure 1) as shape-valued random variables. In this section, we construct a probability space to model the randomness of shapes and make the SECT a random variable (in the measurable sense) taking values in $C(S^{d-1}; H^1(0,T))$. More importantly, this probability space helps justify the KL expansion of the SECT, which lays the foundations for our hypothesis testing method in Section 5.
Probability Space
Suppose $\mathcal{K}$ is equipped with a σ-algebra $\mathcal{F}$. A distribution of shapes K across $\mathcal{K}$ is represented by a probability measure $\mathbb{P}$ on $(\mathcal{K}, \mathcal{F})$. Then, $(\mathcal{K}, \mathcal{F}, \mathbb{P})$ is a probability space. For each fixed pair $(t, \nu) \in [0,T] \times S^{d-1}$, the integer-valued map $K \mapsto \chi_t^{\nu}(K)$ is defined on $\mathcal{K}$. Hereafter, we assume the following:
Assumption 2.
For each fixed $(t, \nu) \in [0,T] \times S^{d-1}$, the map $K \mapsto \chi_t^{\nu}(K)$ is a measurable function and, hence, a real-valued random variable.
A σ-algebra $\mathcal{F}$ satisfying Assumption 2 exists. Here, we construct a metric ρ on $\mathcal{K}$ and show that the Borel σ-algebra induced by ρ satisfies Assumption 2. We define the metric ρ in (4.1).
Theorem 4.1.
The map ρ is a metric on $\mathcal{K}$. Assumption 2 is satisfied if $\mathcal{F} = \mathcal{B}(\mathcal{K}, \rho)$.
Under Assumption 2, the ECC $\{\chi_t^{\nu}\}_{t \in [0,T]}$, for each $\nu \in S^{d-1}$, is a stochastic process defined on the probability space $(\mathcal{K}, \mathcal{F}, \mathbb{P})$. Since each sample path $t \mapsto \chi_t^{\nu}$ has finitely many discontinuities (Lemma 3.1), the integral $\int_0^t \chi_s^{\nu}\, ds$ for each $t \in [0,T]$ is a Riemann integral, which is equal to the limit of the Riemann sum $\frac{t}{n} \sum_{i=1}^{n} \chi_{it/n}^{\nu}$ as $n \to \infty$. Given that each $\chi_s^{\nu}$ is a random variable under Assumption 2, the limit of the Riemann sum for each t is a random variable as well. Therefore, $\{\int_0^t \chi_s^{\nu}\, ds\}_{t \in [0,T]}$, for each $\nu \in S^{d-1}$, is a stochastic process. Then, under Assumption 2, (3.3) defines the following stochastic process on $(\mathcal{K}, \mathcal{F}, \mathbb{P})$ for each $\nu \in S^{d-1}$: (4.2) $\{SECT(K)(\nu)(t)\}_{t \in [0,T]}$.
Precisely, for each fixed ν, we have the stochastic process $\{SECT(K)(\nu)(t)\}_{t \in [0,T]}$ defined on $(\mathcal{K}, \mathcal{F}, \mathbb{P})$; and, for each fixed pair $(t, \nu)$, we have the real-valued random variable $K \mapsto SECT(K)(\nu)(t)$ defined on $(\mathcal{K}, \mathcal{F}, \mathbb{P})$.
Since $H^1(0,T)$ is an RKHS (Appendix A.1), Lemma 3.2 and Theorem 4.1, together with Theorem 7.1.2 of Hsing and Eubank (Citation2015), imply the following result. Its proof is omitted.
Theorem 4.2.
(i) For each $\nu \in S^{d-1}$, $\{SECT(K)(\nu)(t)\}_{t \in [0,T]}$ is a real-valued stochastic process with sample paths in $H^1(0,T)$. Equivalently, $SECT(K)(\nu)$ is a random variable taking values in $H^1(0,T)$. (ii) The map $K \mapsto SECT(K)$ defined in (3.6) is a random variable taking values in $C(S^{d-1}; H^1(0,T))$.
Using Theorem 4.2 in conjunction with Theorem 3.3, we can represent random shapes (which model, e.g., the surfaces of the mandibular molars in Figure 1) as $C(S^{d-1}; H^1(0,T))$-valued random variables. This representation through the SECT has no loss of information.
In Appendix D.2, we provide proof-of-concept examples to illustrate random shapes and their SECT representations visually. These examples relate the SECT to Fréchet regression (Petersen and Müller Citation2019), Wasserstein regression (Chen et al. Citation2023), and manifold learning (Dunson and Wu Citation2021; Meng and Eloyan Citation2021; Li, Mukhopadhyay, and Dunson Citation2022).
Mean and Covariance of the SECT
To derive the KL expansion in Section 5, we define the mean and covariance of the SECT. To do so, we need the following lemma.
Lemma 4.1.
For any probability measure $\mathbb{P}$ defined on the measurable space $(\mathcal{K}, \mathcal{F})$, we have $\mathbb{E} \|SECT(K)(\nu)\|_{H^1(0,T)}^2 < \infty$ for all $\nu \in S^{d-1}$.
Lemma 4.1, together with (2.1), implies that $\mathbb{E} |SECT(K)(\nu)(t)|^2 < \infty$ for all $(t, \nu) \in [0,T] \times S^{d-1}$. Then, we define the mean and covariance functions as follows: (4.3) $m_{\nu}(t) = \mathbb{E} \{SECT(K)(\nu)(t)\}$ and $\kappa_{\nu}(s, t) = \mathrm{Cov} \{SECT(K)(\nu)(s), SECT(K)(\nu)(t)\}$.
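In practice, the mean and covariance in (4.3) are estimated by their empirical counterparts from a sample of SECT curves evaluated on a common grid. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def empirical_mean_cov(samples):
    """Empirical versions of the mean and covariance functions in (4.3).

    `samples` has shape (n, m): n realizations of SECT(K)(nu) evaluated
    at m common levels t. Returns the pointwise sample mean and the
    m x m sample covariance matrix (unbiased, divisor n - 1).
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / (samples.shape[0] - 1)
    return mean, cov

rng = np.random.default_rng(0)
samples = rng.standard_normal((100, 50))  # toy stand-in for 100 SECT curves
mean, cov = empirical_mean_cov(samples)
print(mean.shape, cov.shape)  # (50,) (50, 50)
```

The eigendecomposition of the matrix `cov` is what drives the discretized KL expansion used in Section 5.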
Lemma C.4 provides several properties of the mean and covariance that validate our KL expansion of the SECT in Section 5. Additionally, Lemma C.4 demonstrates that the mean of the SECT belongs to $H^1(0,T)$. A tentative discussion on the “pseudo-inverse” is provided after Lemma C.4 in Appendix C.
In most shape analysis studies, data are preprocessed by alignment. In Appendix E, we introduce the “ECT alignment” as a preprocessing step before any statistical inference. Throughout the article, we assume that the data have been aligned using this method. The ECT alignment exploits rigid motions, does not rely on landmarks, and is equivalent to the alignment approach outlined in Wang et al. (Citation2021) (Supplementary Section 4). The primary objective of the ECT alignment is to minimize the differences between two shapes that arise from rigid motions. For instance, the molars in Figure 1 were aligned using the ECT alignment. Furthermore, the ECT alignment is compatible with our SECT framework. Appendix E demonstrates that the ECT alignment does not alter the qualitative properties of the SECT (e.g., the measurability, Sobolev regularity, and $1/2$-Hölder continuity).
In applications, it is infeasible to sample infinitely many directions ν and levels t. For given shapes K, we compute $\chi_t^{\nu}$ for finitely many directions $\{\nu_j\}_{j=1}^{\Gamma}$ and levels $\{t_i\}_{i=1}^{\Delta}$. To retain information about shapes K, one needs to properly set the numbers of directions and levels (i.e., Γ and Δ). From a theoretical viewpoint, Curry, Mukherjee, and Turner (Citation2022) comprehensively discussed the number Γ of directions needed to recover shapes K from their ECT when K are “piecewise linearly embedded shapes with plausible geometric bounds.” From the numerical perspective, we note the following: (i) Wang et al. (Citation2021) provided detailed simulation studies on the choices of Γ and Δ in sub-image analysis, and a general guideline for setting Γ and Δ in practice was presented in Supplementary Table 1 therein; and (ii) in our Appendix K, we provide detailed numerical experiments on the tradeoffs between the choices of Γ and Δ, the statistical power of our proposed algorithms (Algorithms 1 and 2), and computational cost.
5 Testing Hypotheses on Shapes
In this section, we apply the probabilistic formulation from Section 4 and Lemma C.4 to test hypotheses on shapes. Suppose $\mathbb{P}^{(1)}$ and $\mathbb{P}^{(2)}$ are two distributions on the measurable space $(\mathcal{K}, \mathcal{F})$. Let $\mathbb{P}^{(1)} \otimes \mathbb{P}^{(2)}$ be the product probability measure defined on the product σ-algebra $\mathcal{F} \otimes \mathcal{F}$, satisfying $(\mathbb{P}^{(1)} \otimes \mathbb{P}^{(2)})(A_1 \times A_2) = \mathbb{P}^{(1)}(A_1) \cdot \mathbb{P}^{(2)}(A_2)$ for all $A_1, A_2 \in \mathcal{F}$. To address the motivating question from Section 1.1, we test the following hypotheses: (5.1) $H_0: \mathbb{P}^{(1)} = \mathbb{P}^{(2)}$ versus $H_1: \mathbb{P}^{(1)} \ne \mathbb{P}^{(2)}$. For example, suppose $\mathbb{P}^{(1)}$ and $\mathbb{P}^{(2)}$ model the distributions of molars from two genera of primates (Figure 1). Rejecting the $H_0$ in (5.1) helps distinguish the two genera of primates.
Define $m_{\nu}^{(i)}$ for $i \in \{1, 2\}$ as the mean functions corresponding to $\mathbb{P}^{(1)}$ and $\mathbb{P}^{(2)}$ (see (4.3)). To reject the null in (5.1) (equivalently, distinguish two collections of shapes), it suffices to reject the null hypothesis H0 in the following: (5.2) $H_0: m_{\nu}^{(1)} = m_{\nu}^{(2)}$ for all $\nu \in S^{d-1}$, versus $H_1: m_{\nu}^{(1)} \ne m_{\nu}^{(2)}$ for some $\nu \in S^{d-1}$.
Analysis of Variance for Functional Data (fdANOVA)
Testing the hypotheses in (5.2) separately across all directions results in simultaneous multiple comparisons and inflation of the Type I error. To address this issue, we focus on a specific direction, motivated by the observation that the null hypothesis H0 in (5.2) is equivalent to $\max_{\nu \in S^{d-1}} \|m_{\nu}^{(1)} - m_{\nu}^{(2)}\|_{L^2(0,T)} = 0$. Hence, the direction of interest is defined as (5.3) $\nu^* \in \arg\max_{\nu \in S^{d-1}} \|m_{\nu}^{(1)} - m_{\nu}^{(2)}\|_{L^2(0,T)}$.
Lemma C.4 and (2.2) imply for all ν. Lemma C.4, together with (2.1), confirms the existence of a maximizer in (5.3). The maximizer in (5.3) may not be unique. If there are multiple maximizers, we arbitrarily choose one, as this choice does not influence our framework. The null hypothesis in (5.2) is then equivalent to , where the defined in (5.3) is called a distinguishing direction. Hereafter, we investigate the distribution of .
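Empirically, the distinguishing direction in (5.3) can be estimated by maximizing the L2 distance between the two sample mean SECT curves over the sampled directions. The following sketch (function and variable names are our own illustration, not the paper's implementation) assumes the discretized SECT curves are stored as three-dimensional arrays:

```python
import numpy as np

def distinguishing_direction(sect_a, sect_b, dt):
    """Estimate the distinguishing direction in the spirit of (5.3).

    sect_a, sect_b : arrays of shape (n_shapes, n_directions, n_levels)
        holding discretized SECT curves for the two shape collections.
    dt : float, spacing of the sublevel-set grid.

    Returns the index of the direction maximizing the (approximate)
    squared L2 distance between the two sample mean functions.
    """
    mean_a = sect_a.mean(axis=0)  # (n_directions, n_levels)
    mean_b = sect_b.mean(axis=0)
    # Squared L2 distance per direction, approximated by a Riemann sum.
    dist_sq = ((mean_a - mean_b) ** 2).sum(axis=1) * dt
    return int(np.argmax(dist_sq))
```

If several directions attain the maximum, `np.argmax` returns the first, matching the arbitrary choice among maximizers described above.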
Based on the discussion above, testing the hypotheses in (5.2) is equivalent to testing for using , which is a fdANOVA problem that has been well studied in the literature (e.g., Zhang 2013, sec. 5.2). However, many state-of-the-art fdANOVA approaches are incompatible with the SECT. For example, the Gaussianity of is not guaranteed (Remark C.1), and the “two-sample problem assumptions” in Section 5.2 of Zhang (2013) may not be satisfied. Moreover, the L2-norm-based test (Zhang and Chen 2007) and the F-type test (Shen and Faraway 2004) are not preferred when the functional data are not Gaussian (Zhang 2013, chap. 5). Additionally, many fdANOVA methods are time-consuming. For example, tests based on random projections (TRP, Cuesta-Albertos and Febrero-Bande 2010) require the computation of (at least 30) L2-projections for each observed function, followed by the application of appropriate ANOVA tests to these projections. To address the Gaussianity issue and achieve computational efficiency, we propose a method for fdANOVA using the KL expansion. Our test has a foundation that aligns with the probabilistic framework in Section 4; it is comparable to existing methods in terms of size and power (see Appendix J); and it is computationally efficient, which makes the permutation test used with our method feasible.
Karhunen–Loève Expansion
Let be the covariance function of the stochastic process corresponding to , for (see (4.3)). Hereafter, we assume the following, which is true under the null hypothesis in (5.1).
Assumption 3
(Homoscedasticity). , where is defined in (5.3).
This is a standard assumption in the fdANOVA literature (e.g., Zhang 2013, sec. 5.2) and can be tested using the methods proposed by Guo, Zhou, and Zhang (2019).
We define an integral operator on as . This operator is compact and self-adjoint (Hsing and Eubank 2015, Theorem 4.6.2 and Example 3.3.4). Moreover, the Hilbert-Schmidt theorem (Reed and Simon 1972, Theorem VI.16) implies that there is a complete orthonormal basis for such that (i) each is an eigenfunction with eigenvalue λl, (ii) , and (iii) . Lemma C.4 and Theorem 7.3.5 of Hsing and Eubank (2015) imply the following KL expansion:
Theorem 5.1
(Karhunen–Loève expansion). (i) For each fixed , we have (5.4) where for , and is the expectation associated with . For each , the random variables are defined on the probability space , are mutually uncorrelated, and have mean 0 and variance 1. (ii) There exists so that and (5.5) for any and each fixed . The null set is allowed to be empty.
The KL expansion in (5.4) also suggests a way to randomly sample shapes; this possibility is discussed in Appendix M.1 and left for future research.
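In practice, the KL expansion is computed from the eigendecomposition of a discretized sample covariance operator. The following is a minimal sketch of that computation (our own illustration; the paper's estimator is detailed in Appendix F). The Riemann-sum weight `dt` makes the matrix eigenvalues approximate those of the integral operator, and each resulting score is scaled to have unit sample variance.

```python
import numpy as np

def kl_scores(curves, dt):
    """Empirical KL decomposition of SECT curves in one direction.

    curves : (n, n_levels) array of discretized curves.
    dt : float, spacing of the sublevel-set grid.

    Returns eigenvalues (descending), discretized eigenfunctions
    (columns), and the standardized KL scores.
    """
    n, m = curves.shape
    centered = curves - curves.mean(axis=0)
    # Discretized covariance operator: C dt acts like the integral operator.
    cov = centered.T @ centered / n * dt
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rescale eigenvectors to unit L2 norm on the grid (integral = 1).
    eigfuns = eigvecs / np.sqrt(dt)
    # Standardized scores: project, weight by dt, divide by sqrt(lambda_l)
    # (floored to avoid dividing by a numerically zero eigenvalue).
    scores = (centered @ eigfuns) * dt / np.sqrt(np.maximum(eigvals, 1e-12))
    return eigvals, eigfuns, scores
```

Each score column has sample variance exactly one by construction, mirroring the unit-variance normalization of the KL coefficients in Theorem 5.1.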
Our Approach
Consider two independent collections of random shapes , for (i.e., ). The pairing in is arbitrary for the following reasons: (i) pairs and with have the same distribution , and (ii) numerical experiments in Sections 6 and 7 demonstrate that the performance of our proposed algorithms is numerically invariant to shuffling the index i within each collection . Without loss of generality, we assume that all the shapes have been aligned using the “ECT alignment” (Appendix E). Here, we present the theoretical foundation for employing to test the hypotheses in (5.2). This foundation helps address the motivating question from Section 1.1.
Without loss of generality, we assume , for all , where is the null set in Theorem 5.1 satisfying . Then, we have (5.6) (5.6) where δl and θl are defined in (5.5). Theorem 5.1 implies that, for each fixed l, the random variables are iid across with mean θl and variance 1; for each fixed i, the random variables are mutually uncorrelated across . The following lemma represents the null H0 in (5.2) using the means .
Lemma 5.1.
The null H0 in (5.2) is equivalent to for all positive integers l.
Recall that . When the eigenvalues λl in the denominator of (5.5) are close to zero for large l, the estimated θl becomes unstable: even if , an extremely small λl can move the corresponding estimate of θl far away from zero. Following the standard approach in principal component analysis, we focus on with (5.7)
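A PCA-style truncation rule can be sketched as follows (an illustrative stand-in; the precise criterion is given in (5.7)): keep the leading eigenvalues that explain a prescribed fraction of the total variance.

```python
import numpy as np

def choose_num_components(eigvals, ratio=0.99):
    """Truncation level L via a cumulative-variance rule (illustrative;
    the specific criterion in (5.7) may differ).

    eigvals : eigenvalues in descending order.
    ratio : target fraction of total variance to retain.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    cum = np.cumsum(eigvals) / eigvals.sum()
    # Smallest L whose leading eigenvalues reach the target fraction.
    return int(np.searchsorted(cum, ratio) + 1)
```

With eigenvalues (4, 3, 2, 1) and a 60% target, the rule keeps L = 2 components.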
Hence, to test the hypotheses in (5.2) via Lemma 5.1, we test the following: (5.8)
Under the null in (5.8), for each , the central limit theorem indicates that is asymptotically N(0, 1) when n is large. The mutual uncorrelatedness in Theorem 5.1 and the asymptotic normality of yield the asymptotic independence of across . Then, is asymptotically χ²-distributed with L degrees of freedom under the null in (5.8). At the asymptotic significance level α, we reject the null if (5.9)
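In summary, the test sums the squares of L asymptotically independent standard normal statistics and compares the sum with a χ² quantile with L degrees of freedom. The sketch below illustrates this idea only; the exact statistic of Algorithm 1 is built from the quantities in (5.6), whereas here we assume paired collections of equal size and a variance of two for each paired score difference under the null.

```python
import numpy as np

def chi2_shape_test(scores_1, scores_2, crit):
    """Chi-square-style two-sample test on the leading KL scores.

    scores_1, scores_2 : (n, L) arrays of standardized KL scores for
        paired shape collections of equal size (an illustrative setup).
    crit : critical value, e.g., the chi^2 quantile with L degrees of
        freedom at level 1 - alpha.
    """
    n, L = scores_1.shape
    diff = scores_1 - scores_2
    # Each score has variance ~ 1, so each paired difference has
    # variance ~ 2 under the null; normalize accordingly.
    z = np.sqrt(n) * diff.mean(axis=0) / np.sqrt(2.0)
    stat = float((z ** 2).sum())  # asymptotically chi^2 with L dof
    return stat, stat > crit
```

For L = 4 and α = 0.05, `crit` would be the 95% χ² quantile with 4 degrees of freedom (about 9.49).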
In applications, neither the mean nor the covariance is known. Hence, the KL expansions in (5.5) cannot be used directly and must be estimated. In Appendix F, we provide a numerical foundation for conducting the asymptotic χ²-test in (5.9) and encapsulate the numerical procedures for the test in Algorithm 1. In all our analyses in Sections 6 and 7, the numerical estimates (see (F.4) in Appendix F) of the L in (5.7) are smaller than 10. When the estimated values of L are large (e.g., several hundred), one may also consider applying the adaptive Neyman test proposed by Fan (1996).
In the simulation studies presented in Appendix J, our Algorithm 1 has performance comparable to more than ten existing state-of-the-art fdANOVA methods. Nonetheless, both Algorithm 1 and the existing methods exhibit Type I error inflation (e.g., the rejection rate of Algorithm 1 is 0.118, whereas the significance level is 0.05). To mitigate this inflation, we may apply the permutation test using one of these methods that is computationally efficient. For example, Górecki and Smaga (2015) proposed a permutation test based on an F-type statistic (FP). Specifically, they approximated each observed function by basis functions selected via information criteria, and the F-type statistic was approximated by a form conducive to efficiently computing permutation-based p-values. However, the FP also exhibits Type I error inflation (see Appendix J). Motivated by the FP, we apply the permutation test to the χ²-statistic defined in (5.9) in the following way: we first apply Algorithm 1 to our original shapes and then repeatedly reapply Algorithm 1 to the shapes with shuffled group labels j. The χ²-test statistic derived from the original data is then compared to those from the shuffled data. A detailed description of our permutation-based approach is presented in Algorithm 2 in Appendix F. Simulations in Section 6 demonstrate that our permutation-based approach eliminates the Type I error inflation encountered by Algorithm 1. The permutation nature of Algorithm 2 is also advantageous for small sample sizes. Note, however, that the power of Algorithm 2 under the alternative is moderately weaker than that of Algorithm 1. Lastly, the runtimes of Algorithms 1 and 2 on simulated data are studied in Appendix K, and their runtimes on real data are reported in Section 7.
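The label-shuffling scheme behind Algorithm 2 can be sketched generically: compute the statistic on the original grouping, recompute it after permuting the group labels, and report the proportion of permuted statistics at least as extreme. This is a minimal illustration (equal group sizes, a user-supplied statistic), not the exact Algorithm 2.

```python
import numpy as np

def permutation_pvalue(scores_1, scores_2, stat_fn, n_perm=1000, seed=0):
    """Permutation p-value for a generic two-sample statistic.

    scores_1, scores_2 : (n, L) arrays for the two groups.
    stat_fn : callable taking two (n, L) arrays and returning a float,
        larger values indicating stronger evidence against the null.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([scores_1, scores_2], axis=0)
    n = scores_1.shape[0]
    observed = stat_fn(scores_1, scores_2)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[0])   # shuffle group labels j
        if stat_fn(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

With `stat_fn` set to a χ²-style statistic, this reproduces the shuffle-and-compare logic described above.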
6 Experiments Using Simulations
We present simulations showing the performance of our Algorithms 1 and 2. In addition, we compare our algorithms with the “randomization-style null hypothesis significance test (NHST)” (Robinson and Turner 2017), the TRP using the Wald-type permutation statistic (TRP-WTPS, Cuesta-Albertos and Febrero-Bande 2010; Pauly, Brunner, and Konietschke 2014), and the FP. Details of the randomization-style NHST are given in Appendix G, and the method is referred to as Algorithm 3. The application of the FP and TRP to the SECT is described in Section 5. We implement the FP and TRP-WTPS using the R package fdANOVA with its default parameters, as recommended by Górecki and Smaga (2019). Additional simulations comparing our proposed algorithms with other existing fdANOVA methods are presented in Appendix J.
We focus on a family of distributions with shapes via (6.1)
and . The parameter ε denotes the dissimilarity between and . For each , following the discussion in Section 5, we test the following hypotheses via fdANOVA methods (i.e., FP, TRP-WTPS, and Algorithms 1 and 2), where the mean , and the null hypothesis H0 is true when ε = 0. We also test using Algorithm 3.
We set T = 3, directions for , levels for (i.e., Γ = 4 and Δ = 50 in Algorithms 1, 2, and 3), the confidence level 95% (i.e., α = 0.05), and the number of permutations Π = 1000. For each ε ∈ {0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.1}, we independently generate two collections and through (6.1), with the number of shape pairs set to n = 100, and we compute the SECT of each generated shape in the chosen directions and at the chosen levels. We then apply the fdANOVA methods and Algorithm 3 to these computed SECT statistics and record the corresponding Accept/Reject outputs. We repeat this procedure 1000 times and report the rejection rates across all 1000 replicates for each ε. The rejection rates are also presented visually. We choose Γ = 4 as the number of directions in our simulations based on the following observation: in Appendix K, we experiment with all combinations of Γ, Δ, and n under consideration. When Δ = 50 and n = 100, the number Γ = 4 is sufficiently large for our Algorithms 1 and 2 to distinguish from with at the significance level α = 0.05. Moreover, this choice allows us to demonstrate that even a relatively small number of directions (e.g., Γ = 4) is sufficient for implementing our Algorithms 1 and 2.
The results demonstrate that our proposed algorithms are effective at detecting the difference between and in terms of distinguishing their mean functions. Notably, our algorithms (especially Algorithm 2) tend to avoid falsely detecting differences between shape-generating distributions under the null hypothesis (i.e., ε = 0). As ε increases, deviates from , and the power of our algorithms in detecting the deviation increases. For the larger values of ε, the power of Algorithms 1 and 2 exceeds 0.99. For all the ε considered, the deviation of from is difficult to see visually. For instance, by merely observing the generated shapes, one might find it hard to differentiate between the shape collections generated by (blue) and (pink). However, in more than 99% of the simulations, our algorithms detect the difference between the two distributions. We also randomly shuffle the index i within each collection and apply Algorithms 1 and 2 to the shuffled collections. The results obtained from the unshuffled and shuffled shape collections are nearly identical. Algorithm 3 also performs well in detecting the discrepancy between and . However, its power under the alternative hypotheses (i.e., ε > 0) is weaker than that of our Algorithms 1 and 2. Moreover, Algorithms 1 and 2 exhibit performance comparable to twelve existing state-of-the-art fdANOVA methods (see Appendix J).
7 Applications
We first apply our proposed Algorithms 1 and 2 to the MPEG-7 shape silhouette database (Sikora 2001) as a toy example; details are provided in Appendix I. This analysis shows that our proposed algorithms can distinguish between shape classes in the silhouette database and do not falsely identify signals when there are no differences between groups.
In this section, we apply our algorithms to address the motivating question in Section 1.1. Specifically, we use Algorithms 1 and 2 to distinguish between four categories of mandibular molars from four genera of primates. The shapes come from two suborders of primates: Haplorhini and Strepsirrhini. In the haplorhine collection, 29 molars came from the genus Tarsius (yellow panels) and 9 molars came from the genus Saimiri (grey panels). In the strepsirrhine collection, 11 molars came from the genus Microcebus (blue panels) and 6 molars came from the genus Mirza (green panels).
Before applying Algorithms 1 and 2, we preprocess the raw triangle mesh data of the molar surfaces by aligning them through the ECT alignment approach detailed in Appendix E; we then apply our algorithms to the aligned molars. For each aligned molar, we compute its SECT for 2918 directions; in each direction, we use 200 sublevel sets. To compare any pair of molar groups, as a proof of concept, we select the smaller of the two group sizes as the sample size input n in our algorithms. For example, when comparing the Tarsius and Microcebus groups, we choose n = 11; that is, we compare the first 11 molars of the Tarsius group to all the molars in the Microcebus group. We apply our algorithms to the four groups of molars and present the results. The reported p-values are either χ²-test p-values (Algorithm 1) or permutation-test p-values (Algorithm 2 with 1000 permutations). The small p-values (P < 0.05) show that our proposed algorithms can distinguish the four genera of primates. Since the genera Microcebus and Mirza belong to the same suborder Strepsirrhini, the p-value from Algorithm 2 is comparatively large when comparing molars from these two groups. In contrast, although Tarsius and Saimiri both belong to the suborder Haplorhini, the molars of the two genera look different. Specifically, the paraconids (i.e., the cusps highlighted in red) are retained only by the genus Tarsius and are thus a key reason for the small p-values when comparing Tarsius molars with those from Saimiri. The other small p-values in our analyses result from the corresponding genera belonging to different suborders.
In addition to testing differences between genera, we apply our algorithms within the genus Tarsius. Specifically, we focus on the first 28 molars in the Tarsius group, randomly split them into two halves, and apply Algorithms 1 and 2 to test the difference between the two halves. We repeat the random splitting procedure 100 times and report the corresponding p-values, summarized by their mean and standard deviation (in parentheses). These p-values show that our proposed Algorithm 2 tends to avoid Type I errors for the molars from the genus Tarsius.
Landmark methods are widely used in geometric morphometrics. One state-of-the-art approach is the “Gaussian process landmarking (GPL)” algorithm (Gao et al. 2019; Gao, Kovalsky, and Daubechies 2019), which can automatically sample landmarks on the surfaces of the molars. Gao et al. (2019) showed that these sampled landmarks could induce a continuous Procrustes distance measuring the dissimilarity between molars. A permutation test can be derived using the Procrustes distance induced by the GPL algorithm; this test is detailed in Appendix H and encapsulated in Algorithm 4. We use the GPL-based Algorithm 4 to differentiate the four collections of molars, computing the Procrustes distance with the MATLAB code from the GitHub repository provided by Gao et al. (2019). The results show that the GPL-based method and our Algorithm 2 have comparable performance. However, repeatedly computing the Procrustes distance is time-consuming. Hence, Algorithm 2 is more computationally efficient than Algorithm 4 while achieving similar performance.
We note that, in addition to the GPL algorithm, many other existing methods can be applied to measure dissimilarity between molars, including parameterized surfaces (Kurtek et al. 2010, 2011) and approaches from computational anatomy (Grenander and Miller 1998). Similarly, parameterized curves (Kurtek et al. 2012) can also be used to analyze the silhouette database in Appendix I. A more comprehensive comparison of our algorithms with the full range of existing methods is left for future research.
8 Conclusions and Discussions
In this article, we established the mathematical foundations for the randomness of shapes via the SECT. Specifically, (i) was constructed as the underlying probability space, and (ii) the SECT was modeled as a -valued random variable. We further demonstrated several properties of the SECT ensuring its KL expansion, which led to a χ²-statistic for testing hypotheses on random shapes. In doing so, we bridged fdANOVA and TDA. Simulation studies corroborated our mathematical derivations and demonstrated the performance of our hypothesis testing algorithms. Our approach was shown to be powerful in detecting differences between two shape-generating distributions. We applied our proposed algorithms to silhouette and primate molar datasets. Importantly, our simulations under the null hypothesis, together with the applications to the molars and the silhouette database, indicate that our algorithms tend to avoid falsely detecting differences between shape-generating distributions when there are none. Using the molar dataset, we compared the performance of our algorithms to a permutation test based on a state-of-the-art landmarking algorithm (Gao et al. 2019; Gao, Kovalsky, and Daubechies 2019), underscoring the efficiency of our algorithms. We enumerate potential future research areas in Appendix M; for example, fdANOVA methods can be used for brain connectivity analysis (Chen et al. 2024; Meng and Eloyan 2024) via topological summaries.
Acknowledgments
We are grateful to the Editor, Associate Editor, and three Referees for their thorough review of our article and the insightful suggestions that have tremendously improved its quality. We want to thank Dr. Matthew T. Harrison from the Division of Applied Mathematics at Brown University for useful comments and suggestions. KM wants to thank Mattie Ji from the Department of Mathematics at Brown University for her insightful comments. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Supplementary Materials
The supplementary materials provide the proofs of theorems, further data analysis, and future research topics.
Data Availability Statement
The source code for implementing the simulation studies and applications is publicly available online at https://github.com/JinyuWang123/TDA.git.
Disclosure Statement
The authors report there are no competing interests to declare.
References
- Boyer, D. M., Lipman, Y., Clair, E. S., Puente, J., Patel, B. A., Funkhouser, T., Jernvall, J., and Daubechies, I. (2011), “Algorithms to Automatically Quantify the Geometric Similarity of Anatomical Surfaces,” Proceedings of the National Academy of Sciences, 108, 18221–18226. Available at https://www.pnas.org/doi/abs/10.1073/pnas.1112822108. DOI: 10.1073/pnas.1112822108.
- Brezis, H. (2011), Functional Analysis, Sobolev Spaces and Partial Differential Equations (Vol. 2), New York: Springer.
- Chen, Y., Lin, Z., and Müller, H.-G. (2023), “Wasserstein Regression,” Journal of the American Statistical Association, 118, 869–882. DOI: 10.1080/01621459.2021.1956937.
- Chen, Y., Lin, S.-C., Zhou, Y., Carmichael, O., Müller, H.-G., Wang, J.-L., and Alzheimer’s Disease Neuroimaging Initiative. (2024), “Gradient Synchronization for Multivariate Functional Data, with Application to Brain Connectivity,” Journal of the Royal Statistical Society, Series B. DOI: 10.1093/jrsssb/qkad140.
- Cohen-Steiner, D., Edelsbrunner, H., and Harer, J. (2007), “Stability of Persistence Diagrams,” Discrete & Computational Geometry, 37, 103–120. DOI: 10.1007/s00454-006-1276-5.
- Crawford, L., Monod, A., Chen, A. X., Mukherjee, S., and Rabadán, R. (2020), “Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis,” Journal of the American Statistical Association, 115, 1139–1150. DOI: 10.1080/01621459.2019.1671198.
- Cuesta-Albertos, J., and Febrero-Bande, M. (2010), “A Simple Multiway ANOVA for Functional Data,” Test, 19, 537–557. DOI: 10.1007/s11749-010-0185-3.
- Curry, J., Mukherjee, S., and Turner, K. (2022), “How Many Directions Determine a Shape and Other Sufficiency Results for Two Topological Transforms,” Transactions of the American Mathematical Society, Series B, 9, 1006–1043. DOI: 10.1090/btran/122.
- Dunson, D. B., and Wu, N. (2021), “Inferring Manifolds from Noisy Data Using Gaussian Processes,” arXiv preprint arXiv:2110.07478.
- Dupuis, P., Grenander, U., and Miller, M. I. (1998), “Variational Problems on Flows of Diffeomorphisms for Image Matching,” Quarterly of Applied Mathematics, LVI, 587–600. DOI: 10.1090/qam/1632326.
- Edelsbrunner, H., and Harer, J. (2010), Computational Topology: An Introduction, Providence, RI: American Mathematical Society.
- Fan, J. (1996), “Test of Significance Based on Wavelet Thresholding and Neyman’s Truncation,” Journal of the American Statistical Association, 91, 674–688. DOI: 10.1080/01621459.1996.10476936.
- Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., and Singh, A. (2014), “Confidence Sets for Persistence Diagrams,” The Annals of Statistics, 42, 2301–2339. DOI: 10.1214/14-AOS1252.
- Fasy, B. T., Micka, S., Millman, D. L., Schenfisch, A., and Williams, L. (2018), “Challenges in Reconstructing Shapes from Euler Characteristic Curves,” arXiv preprint arXiv:1811.11337.
- Gao, T., Kovalsky, S. Z., Boyer, D. M., and Daubechies, I. (2019), “Gaussian Process Landmarking for Three-Dimensional Geometric Morphometrics,” SIAM Journal on Mathematics of Data Science, 1, 237–267. DOI: 10.1137/18M1203481.
- Gao, T., Kovalsky, S. Z., and Daubechies, I. (2019), “Gaussian Process Landmarking on Manifolds,” SIAM Journal on Mathematics of Data Science, 1, 208–236. DOI: 10.1137/18M1184035.
- Ghrist, R., Levanger, R., and Mai, H. (2018), “Persistent Homology and Euler Integral Transforms,” Journal of Applied and Computational Topology, 2, 55–60. DOI: 10.1007/s41468-018-0017-1.
- Górecki, T., and Smaga, Ł. (2015), “A Comparison of Tests for the One-Way ANOVA Problem for Functional Data,” Computational Statistics, 30, 987–1010. DOI: 10.1007/s00180-015-0555-0.
- ——— (2019), “fdANOVA: An R Software Package for Analysis of Variance for Univariate and Multivariate Functional Data,” Computational Statistics, 34, 571–597.
- Goswami, A. (2015), “Phenome10K: A Free Online Repository for 3-D Scans of Biological and Palaeontological Specimens.”
- Grenander, U., and Miller, M. I. (1998), “Computational Anatomy: An Emerging Discipline,” Quarterly of Applied Mathematics, 56, 617–694. DOI: 10.1090/qam/1668732.
- Hairer, M. (2009), “An Introduction to Stochastic PDEs,” arXiv preprint arXiv:0907.4178.
- Hatcher, A. (2002), Algebraic Topology, New York: Cambridge University Press.
- Hsing, T., and Eubank, R. (2015), Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators (Vol. 997), Chichester: Wiley.
- Guo, J., Zhou, B., and Zhang, J.-T. (2019), “New Tests for Equality of Several Covariance Functions for Functional Data,” Journal of the American Statistical Association, 114, 1251–1263. DOI: 10.1080/01621459.2018.1483827.
- Jiang, Q., Kurtek, S., and Needham, T. (2020), “The Weighted Euler Curve Transform for Shape and Image Analysis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 844–845.
- Kendall, D. G. (1989), “A Survey of the Statistical Theory of Shape,” Statistical Science, 4, 87–99. DOI: 10.1214/ss/1177012582.
- Kirveslahti, H., and Mukherjee, S. (2023), “Representing Fields Without Correspondences: The Lifted Euler Characteristic Transform,” Journal of Applied and Computational Topology, 8, 1–34. DOI: 10.1007/s41468-023-00133-w.
- Kurtek, S., Klassen, E., Ding, Z., Jacobson, S. W., Jacobson, J. L., Avison, M. J., and Srivastava, A. (2010), “Parameterization-Invariant Shape Comparisons of Anatomical Surfaces,” IEEE Transactions on Medical Imaging, 30, 849–858. DOI: 10.1109/TMI.2010.2099130.
- Kurtek, S., Klassen, E., Gore, J. C., Ding, Z., and Srivastava, A. (2011), “Elastic Geodesic Paths in Shape Space of Parameterized Surfaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 34, 1717–1730. DOI: 10.1109/TPAMI.2011.233.
- Kurtek, S., Srivastava, A., Klassen, E., and Ding, Z. (2012), “Statistical Modeling of Curves Using Shapes and Related Features,” Journal of the American Statistical Association, 107, 1152–1165. DOI: 10.1080/01621459.2012.699770.
- Li, D., Mukhopadhyay, M., and Dunson, D. B. (2022), “Efficient Manifold Approximation with Spherelets,” Journal of the Royal Statistical Society, Series B, 84, 1129–1149. DOI: 10.1111/rssb.12508.
- Marsh, L., Zhou, F. Y., Quin, X., Lu, X., Byrne, H. M., and Harrington, H. A. (2022), “Detecting Temporal Shape Changes with the Euler Characteristic Transform,” arXiv preprint arXiv:2212.10883.
- Meng, K., and Eloyan, A. (2021), “Principal Manifold Estimation Via Model Complexity Selection,” Journal of the Royal Statistical Society, Series B, 83, 369–394. DOI: 10.1111/rssb.12416.
- Meng, K., and Eloyan, A. (2024), “Population-Level Task-Evoked Functional Connectivity via Fourier Analysis,” Journal of the Royal Statistical Society, Series C. DOI: 10.1093/jrsssc/qlae015.
- Mileyko, Y., Mukherjee, S., and Harer, J. (2011), “Probability Measures on the Space of Persistence Diagrams,” Inverse Problems, 27, 124007. DOI: 10.1088/0266-5611/27/12/124007.
- Miller, E. (2015), “Fruit Flies and Moduli: Interactions between Biology and Mathematics,” Notices of the AMS, 62, 1178–1184.
- Molchanov, I. (2005), Theory of Random Sets (Vol. 19), Dordrecht: Springer.
- Pauly, M., Brunner, E., and Konietschke, F. (2014), “Asymptotic Permutation Tests in General Factorial Designs,” Journal of the Royal Statistical Society, Series B, 77, 461–473. DOI: 10.1111/rssb.12073.
- Petersen, A., and Müller, H.-G. (2019), “Fréchet Regression for Random Objects with Euclidean Predictors,” The Annals of Statistics, 47, 691–719. DOI: 10.1214/17-AOS1624.
- Reed, M., and Simon, B. (1972), Methods of Modern Mathematical Physics: Functional Analysis, New York: Academic Press.
- Robinson, A., and Turner, K. (2017), “Hypothesis Testing for Topological Data Analysis,” Journal of Applied and Computational Topology, 1, 241–261. DOI: 10.1007/s41468-017-0008-7.
- Roycraft, B., Krebs, J., and Polonik, W. (2023), “Bootstrapping Persistent Betti Numbers and Other Stabilizing Statistics,” The Annals of Statistics, 51, 1484–1509. DOI: 10.1214/23-AOS2277.
- Schapira, P. (1995), “Tomography of Constructible Functions,” in International Symposium on Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, pp. 427–435, Springer.
- Shen, Q., and Faraway, J. (2004), “An F Test for Linear Models with Functional Responses,” Statistica Sinica, 14, 1239–1257.
- Sikora, T. (2001), “The MPEG-7 Visual Standard for Content Description – an Overview,” IEEE Transactions on Circuits and Systems for Video Technology, 11, 696–702. DOI: 10.1109/76.927422.
- Tang, W. S., da Silva, G. M., Kirveslahti, H., Skeens, E., Feng, B., Sudijono, T., Yang, K. K., Mukherjee, S., Rubenstein, B., and Crawford, L. (2022), “A Topological Data Analytic Approach for Discovering Biophysical Signatures in Protein Dynamics,” PLoS Computational Biology, 18, e1010045. DOI: 10.1371/journal.pcbi.1010045.
- Turner, K., Mukherjee, S., and Boyer, D. M. (2014), “Persistent Homology Transform for Modeling Shapes and Surfaces,” Information and Inference: A Journal of the IMA, 3, 310–344. DOI: 10.1093/imaiai/iau011.
- van den Dries, L. (1998), Tame Topology and o-Minimal Structures (Vol. 248), Cambridge: Cambridge University Press.
- Wang, B., Sudijono, T., Kirveslahti, H., Gao, T., Boyer, D. M., Mukherjee, S., and Crawford, L. (2021), “A Statistical Pipeline for Identifying Physical Features that Differentiate Classes of 3D Shapes,” The Annals of Applied Statistics, 15, 638–661. DOI: 10.1214/20-AOAS1430.
- Zhang, J.-T. (2013), Analysis of Variance for Functional Data, Boca Raton, FL: CRC Press.
- Zhang, J.-T., and Chen, J. (2007), “Statistical Inferences for Functional Data,” The Annals of Statistics, 35, 1052–1079. DOI: 10.1214/009053606000001505.