Statistical Analysis on Manifolds: A Nonparametric Approach for Inference on Shape Spaces

Abhishek Bhattacharya
Department of Statistical Science, Duke University

Abstract. This article concerns nonparametric statistics on manifolds with special emphasis on landmark based shape spaces, in which a k-ad, i.e., a set of k points or landmarks on an object or a scene, is observed in 2D or 3D, for purposes of identification, discrimination, or diagnostics. Two different notions of shape are considered: reflection shape, invariant under all translations, scaling and orthogonal transformations, and affine shape, invariant under all affine transformations. A computation of the extrinsic mean reflection shape, which has remained unresolved in earlier works, is given in arbitrary dimensions, enabling one to extend nonparametric inference on Kendall type shape manifolds from 2D to higher dimensions. For both reflection and affine shapes, two sample test statistics are constructed based on an appropriate choice of orthonormal frames on the tangent bundle and computations of differentials of projection maps with respect to them at the sample extrinsic mean. The samples in consideration can either be independent or be the outcome of a matched pair experiment. Examples are included to illustrate the theory.

Contents
1. Introduction
2. Fréchet Mean and Variation on Metric Spaces
3. Extrinsic Analysis on Manifolds
4. Geometry of Shape Manifolds
5. Kendall's (Direct Similarity) Shape Spaces Σ^k_m
6. Reflection (Similarity) Shape Spaces RΣ^k_m
7. Affine Shape Spaces AΣ^k_m
8. Applications to Shape data
Acknowledgment
References

2000 Mathematics Subject Classification. Primary 62H35; Secondary 62G10, 62F40.
Key words and phrases. manifold, equivariant embedding, shape space of k-ads, Fréchet function, extrinsic mean and variation, nonparametric analysis.
This research is partially supported by NSF Grant DMS.

1. Introduction

Statistical analysis of a probability measure Q on a differentiable manifold M has diverse applications in directional and axial statistics, morphometrics, medical diagnostics and machine vision. In this article, we are concerned with the analysis of shapes of landmark based data, in which each observation consists of k > m points in m dimensions, called landmarks, which represent k locations on an object. The configuration of k landmarks is called a k-ad. The choice of landmarks is generally made with expert help in the particular field of application. Depending on the way the data are collected or recorded, the appropriate shape of a k-ad is its orbit under a group of transformations.

For example, one may look at k-ads modulo size and Euclidean rigid body motions of translation and rotation. The analysis of shapes under this invariance was pioneered by Kendall (1977, 1984) and Bookstein (1978). Bookstein's approach is primarily registration-based, requiring two or three landmarks to be brought into a standard position by translation, rotation and scaling of the k-ad. For these shapes, we would prefer Kendall's more invariant view of a shape identified with the orbit under rotation (in m dimensions) of the k-ad centered at the origin and scaled to have unit size. The resulting shape spaces are called similarity shape spaces and denoted by Σ^k_m. A fairly comprehensive account of parametric inference on these spaces, with many references to the literature, may be found in Dryden and Mardia (1998). Once we consider the orbits under all orthogonal transformations and scaling, we get the reflection shape spaces RΣ^k_m. It is possible to embed RΣ^k_m into a Euclidean space and carry out extrinsic analysis. Such an embedding was first considered by Bandulasiri and Patrangenaru (2005) and later independently by Dryden et al. (2008). However the correct computation of the extrinsic mean reflection shape seems to have eluded earlier authors. It is computed in Corollary 6.3 in Section 6. This is one of the major results of this article.

Recently there has been much emphasis on the statistical analysis of other notions of shapes of k-ads, namely, affine shapes, invariant under affine transformations, and projective shapes, invariant under projective transformations. Reconstruction of a scene from two (or more) aerial photographs taken from a plane is one of the research problems in affine shape analysis. Potential applications of projective shape analysis include face recognition and robotics, for robots to visually recognize a scene (Mardia and Patrangenaru (2005)). In this article, we will mainly focus on the reflection and affine shape spaces.

We define the notions of extrinsic means and variations of probability distributions on general manifolds and compute them for the shape spaces. We develop nonparametric two sample tests to distinguish between two distributions by comparing the sample extrinsic means and variations. The nonparametric methodology pursued here, along with the geometric and other mathematical issues that accompany it, stems from the earlier work of Bhattacharya and Patrangenaru (2002, 2003, 2005) and Bhattacharya and Bhattacharya (2008a, 2008b, 2008c). Examples of analysis with real data in Bhattacharya and Bhattacharya (2008a) suggest that appropriate nonparametric methods detect differences in shape distributions better than their

parametric counterparts in the literature, for distributions that occur in applications.

The article is organized as follows. Section 2 introduces the notions of Fréchet mean and variation of a probability distribution Q on a metric space (M, ρ). Section 3 outlines the theory of extrinsic analysis on general manifolds, where ρ is the distance inherited by the manifold from an embedding J into some Euclidean space E^N. The image of M under J is a differentiable submanifold of E^N. An H-equivariant embedding (see Definition 3.1) with a relatively large group H preserves a corresponding group of symmetries of the manifold M and is therefore preferred in our analysis. The image of the Fréchet mean under the embedding J is the projection of the Euclidean mean of the push forward Q ◦ J^{-1} of Q onto J(M) (see Proposition 3.1). This makes the Fréchet mean, or extrinsic mean, easy to compute in a number of important examples. In Section 3.1, we deduce the asymptotic distribution of the sample extrinsic mean. By the delta method, we linearly approximate the projection map by its differential into the tangent space of J(M) at the embedding of the extrinsic mean of Q. With a suitable choice of orthonormal basis for the tangent space, we derive coordinates for the difference between the embeddings of the sample extrinsic mean and the extrinsic mean of Q which have an asymptotically Gaussian distribution. This is used to construct confidence regions for the extrinsic mean of Q, both by an asymptotic chi-squared statistic and by pivotal bootstrap methods. In Section 3.2, we deduce the asymptotic distribution of the sample extrinsic variation and construct confidence intervals for the extrinsic variation of Q. The asymptotic theory is used in Section 3.3 to carry out two sample tests to identify differences between two probability distributions on the manifold. Appropriate tests are constructed for independent as well as matched pair samples. Matched pair samples arise when, for example, we have two sets of observations from the same subject (see Section 8.1). Hence the paired sample can be viewed as one sample in the product manifold M × M. To do inference on the marginals of a probability distribution on M × M, we apply the methods of Sections 3.1 and 3.2 to the tangent space of J(M) × J(M) at (J(µ_{1E}), J(µ_{2E})), where µ_{iE}, i = 1, 2, are the extrinsic means of the marginal distributions.

Section 4 provides a brief expository description of the geometries of the manifolds that arise in shape analysis. Section 5 outlines the geometry of the planar similarity shape spaces and the general similarity shape spaces. The notions of a k-ad and how to represent its shape by an orbit under a group of transformations are introduced. In Section 6, we derive an expression for the extrinsic mean on the reflection shape spaces under an equivariant embedding (see Corollary 6.3). We derive expressions for the tangent and normal spaces to the embedded submanifold through Proposition 6.1. We construct suitable orthonormal frames for the tangent space in Section 6.1 and use them to get asymptotic coordinates for the sample extrinsic mean. We require perturbation theory arguments for eigenvalues and eigenvectors to prove that the projection map of Section 3.1 is well defined and smooth. The methods of Section 3.3 are applied in Section 6.2 to carry out nonparametric inference on the reflection shape spaces. In Section 7, analogous results are obtained for the affine shape spaces.

Finally, Section 8 illustrates the theory with two applications to real data. The data considered in Section 8.1 is a matched pair sample of 3D reflection shapes. We compare the extrinsic mean shapes and extrinsic variations in shape of the marginals by appropriate two sample tests. In the example in Section 8.2, we have an independent random sample of 2D affine shapes. After removing some outliers, we construct confidence regions for the mean shape and confidence intervals for the variation in shape. When there are too many landmarks on the sample k-ads, making the dimension of the shape space close to the sample size, it becomes difficult to carry out inference on the mean shapes using pivotal bootstrap methods. That is because, in many simulations, the bootstrap covariance matrix is singular or close to being singular. Then one may compare only the first few principal scores of the coordinates of the sample extrinsic means, or use a nonpivotal bootstrap statistic, where one replaces the bootstrap covariance matrix by the sample covariance matrix. We try both these approaches for the examples in Section 8. Our analysis shows that the results obtained through appropriate bootstrap methods are consistent with those obtained by chi-squared approximation.

[Note: Henceforth, BP (...) stands for Bhattacharya and Patrangenaru (...) and BB (...) stands for Bhattacharya and Bhattacharya (...).]

2. Fréchet Mean and Variation on Metric Spaces

Let (M, ρ) be a metric space, ρ being a distance metrizing the topology of M. For a given probability distribution Q on (the Borel sigma-field of) M, define the Fréchet function of Q as

(2.1)  F(p) = ∫_M ρ²(p, x) Q(dx),  p ∈ M.

Now we define the Fréchet mean and Fréchet variation of Q. A general notion of a mean of a probability distribution on a metric space was first defined by Fréchet (1948). The concept of variation was introduced in BP (2002), where it has been referred to as the total variance.

Definition 2.1. Suppose F(p) < ∞ for some p ∈ M. Then the set of all p for which F(p) is the minimum value of F on M is called the Fréchet mean set of Q, denoted by C_Q. If this set is a singleton, say {µ_F}, then µ_F is called the Fréchet mean of Q. The minimum value of F on M is called the Fréchet variation of Q and denoted by V. If X_1, X_2, ..., X_n are independent and identically distributed (iid) M-valued random variables defined on some probability space (Ω, F, P) with common distribution Q, and Q_n = (1/n) Σ_{j=1}^n δ_{X_j} is the corresponding empirical distribution, then the Fréchet mean set of Q_n is called the sample Fréchet mean set, denoted by C_{Q_n}. The Fréchet variation of Q_n is called the sample Fréchet variation and denoted by V_n.

Proposition 2.1 shows that under mild assumptions the minimum value of F on M is attained, thereby proving that the Fréchet mean set is nonempty, as proved in Theorem 2.1, BP (2003).
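Before turning to Proposition 2.1, here is a concrete numerical illustration of the above definitions (an added sketch, not part of the original article): it approximates the sample Fréchet mean set and the sample Fréchet variation V_n of an empirical distribution on the unit circle, with ρ taken to be the arc-length distance. The grid search, the function names and the simulated von Mises sample are illustrative assumptions only.

```python
import numpy as np

def arc_distance(p, q):
    """Geodesic (arc-length) distance between angles on the unit circle."""
    d = np.abs(p - q) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def frechet_function(p, sample):
    """F_n(p) = (1/n) * sum_j rho^2(p, X_j) for the empirical distribution."""
    return np.mean(arc_distance(p, sample) ** 2)

def sample_frechet_mean_and_variation(sample, grid_size=3600):
    """Grid-search approximation of the sample Frechet mean set and variation V_n."""
    grid = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)
    values = np.array([frechet_function(p, sample) for p in grid])
    v_n = values.min()
    # All grid points (numerically) attaining the minimum: the mean set need not be a singleton.
    mean_set = grid[np.isclose(values, v_n, atol=1e-6)]
    return mean_set, v_n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.vonmises(mu=1.0, kappa=4.0, size=200) % (2 * np.pi)
    mean_set, v_n = sample_frechet_mean_and_variation(sample)
    print("approximate sample Frechet mean(s):", mean_set[:5])
    print("sample Frechet variation V_n:", v_n)
```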

Proposition 2.1. Suppose every closed and bounded subset of M is compact. If the Fréchet function F of Q is finite for some p ∈ M, then C_Q is nonempty and compact.

Proposition 2.2 below proves the strong consistency of the sample Fréchet mean as an estimator of the Fréchet mean of Q. Here we define the sample Fréchet mean as a measurable selection from the sample Fréchet mean set C_{Q_n}.

Proposition 2.2. Suppose every closed and bounded subset of M is compact and the Fréchet function of Q is finite. Then given any ǫ > 0, there exists an integer-valued random variable N = N(ω, ǫ) and a P-null set A(ω, ǫ) such that

(2.2)  C_{Q_n} ⊆ C_Q^ǫ := {p ∈ M : ρ(p, C_Q) < ǫ},  ∀ n ≥ N

outside of A(ω, ǫ). In particular, if C_Q = {µ_F}, then the sample Fréchet mean µ_{Fn} (any measurable selection from C_{Q_n}) is a strongly consistent estimator of µ_F.

Proof. See Theorem 2.3, BP (2003).

From Proposition 2.1 it follows that if the Fréchet function F(p) is finite for some p, then the Fréchet variation V is finite and equals F(p) for all p in the Fréchet mean set C_Q. The sample Fréchet variation V_n is the value of F_n on the sample Fréchet mean set C_{Q_n}. Proposition 2.3 establishes the strong consistency of V_n as an estimator of V. For a proof, see BB (2008a).

Proposition 2.3. Suppose every closed and bounded subset of M is compact, and F is finite on M. Then V_n is a strongly consistent estimator of V.

Remark 2.1. It is known that a Riemannian manifold M which is complete (in its geodesic distance) satisfies the topological hypothesis of Propositions 2.1, 2.2 and 2.3: every closed and bounded subset of M is compact (see Theorem 2.8, Do Carmo (1992)). The affine shape spaces considered in Section 7 are compact Riemannian manifolds and hence complete. When a manifold is not complete (similarity and reflection shape spaces, for example), we deduce other ways to establish consistency of the sample Fréchet mean and variation (e.g. Section 6, Remark 6.1).

Remark 2.2. Proposition 2.2 requires the Fréchet mean of Q to exist for the sample Fréchet mean to be a consistent estimator. However the sample Fréchet variation is a consistent estimator of the Fréchet variation of Q even when the Fréchet function F does not have a unique minimizer. We will investigate sufficient conditions for the existence of the Fréchet mean on shape spaces in the subsequent sections.

3. Extrinsic Analysis on Manifolds

From now on, we assume that M is a differentiable manifold of dimension d. To carry out nonparametric inference on M, one may use the Fréchet mean and variation to identify a probability distribution. Using the sample Fréchet mean and variation from a random sample on M, we can construct confidence regions for the population parameters, or, given two such samples, we can distinguish between the underlying probability distributions by comparing the sample means and variations. The natural approach for nonparametric inference on a Riemannian manifold would be to use the geodesic distance in the definition of the Fréchet function in (2.1) and

derive expressions for the Fréchet mean and variation. However it is simpler, both mathematically and computationally, to carry out an extrinsic analysis on M, by embedding it into some Euclidean space E^N ≈ R^N via a map J : M → E^N such that both J and its derivative are injective, and for which J(M) has the topology induced from E^N. Then J induces the metric

(3.1)  ρ(x, y) = ||J(x) − J(y)||

on M, where ||·|| denotes the Euclidean norm (||u||² = Σ_{i=1}^N u_i², u = (u_1, u_2, ..., u_N)). This is called the extrinsic distance on M. Among the possible embeddings, one seeks out equivariant embeddings which preserve many of the geometric features of M.

Definition 3.1. For a Lie group H acting on a manifold M, an embedding J : M → R^N is H-equivariant if there exists a group homomorphism φ : H → GL(N, R) such that

(3.2)  J(hp) = φ(h) J(p)  ∀ p ∈ M, ∀ h ∈ H.

Here GL(N, R) is the general linear group of all N × N non-singular matrices. For all our applications, H is compact.

In case J(M) = M̃ is a closed subset of E^N, for every u ∈ E^N there exists a compact set of points in M̃ whose distance from u is the smallest among all points in M̃. We define this set to be the set of projections of u on M̃ and denote it by

(3.3)  P_{M̃}(u) = {x ∈ M̃ : ||x − u|| ≤ ||y − u|| ∀ y ∈ M̃}.

If this set is a singleton, u is said to be a nonfocal point of E^N (w.r.t. M̃); otherwise it is said to be a focal point of E^N. Definition 3.2 below defines the extrinsic mean and variation of a probability distribution Q corresponding to the embedding J. The notion of extrinsic mean on a manifold was introduced independently by Hendricks and Landsman (1998) and Patrangenaru (1998), and later considered in detail in BP (2003, 2005).

Definition 3.2. Let (M, ρ), J be as above. Let Q be a probability measure on M such that the Fréchet function

(3.4)  F(x) = ∫ ρ²(x, y) Q(dy)

is finite. The Fréchet mean set of Q is called the extrinsic mean set of Q and the Fréchet variation of Q is called the extrinsic variation of Q. If X_i, i = 1, ..., n, are iid observations from Q and Q_n = (1/n) Σ_{i=1}^n δ_{X_i}, then the Fréchet mean set of Q_n is called the sample extrinsic mean set and the Fréchet variation of Q_n is called the sample extrinsic variation. We define the sample extrinsic mean µ_{nE} to be a measurable selection from the sample extrinsic mean set.

We say that Q has an extrinsic mean µ_E if the extrinsic mean set of Q is a singleton. Proposition 3.1 below gives a necessary and sufficient condition for that to hold.

Proposition 3.1. Let Q̃ = Q ◦ J^{-1} be the image of Q in E^N. (a) If µ̃ = ∫_{E^N} u Q̃(du) is the mean of Q̃, then the extrinsic mean set of Q is given by J^{-1}(P_{M̃}(µ̃)).

(b) The extrinsic variation of Q equals

(3.5)  V = ∫_{E^N} ||x − µ̃||² Q̃(dx) + ||µ̃ − µ||²

where µ ∈ P_{M̃}(µ̃). (c) If µ̃ is a nonfocal point of E^N, then the extrinsic mean of Q exists (as the unique minimizer of F).

Proof. See Proposition 3.1, BP (2003).

3.1. Asymptotic Distribution of the Sample Extrinsic Mean. In this section, we assume that the extrinsic mean µ_E of Q is uniquely defined. Then the mean µ̃ of Q̃ is a nonfocal point of E^N and hence the projection set in (3.3) defines a map in a neighborhood of µ̃. Let us call that map P (P : E^N → M̃). Also, in a neighborhood of a nonfocal point such as µ̃, P(·) is smooth. Let Ȳ = (1/n) Σ_{j=1}^n Y_j be the sample mean of Y_j = J(X_j), j = 1, 2, ..., n. Since Ȳ converges to µ̃ almost surely, for sample size large enough Ȳ is nonfocal, and it can be shown that

(3.6)  √n [P(Ȳ) − P(µ̃)] = √n (d_µ̃ P)(Ȳ − µ̃) + o_P(1)

where d_µ̃ P is the differential (map) of the projection P(·), which takes vectors in the tangent space of E^N at µ̃ to tangent vectors of M̃ at P(µ̃) (see BP (2005)). Since √n (Ȳ − µ̃) has an asymptotic Gaussian distribution and d_µ̃ P is a linear map, from (3.6) it follows that √n [P(Ȳ) − P(µ̃)] has an asymptotic mean zero Gaussian distribution on the tangent space of J(M) at P(µ̃). Hence if we denote by T_j the coordinates of (d_µ̃ P)(Y_j − µ̃), j = 1, 2, ..., n, with respect to some orthonormal basis for T_{P(µ̃)} M̃, then

(3.7)  √n T̄ →_L N(0, Σ)

where Σ denotes the covariance matrix of T_1. Let L_µ̃ : E^N → T_{P(µ̃)} M̃ denote the linear projection onto T_{P(µ̃)} M̃. Then from (3.6) and (3.7), it follows that

(3.8)  n (L_µ̃[P(Ȳ) − P(µ̃)])' Σ^{-1} L_µ̃[P(Ȳ) − P(µ̃)]  →_L  X²_d.

Using (3.8), we can construct the following confidence region for µ_E:

(3.9)  {µ_E = J^{-1}[P(µ̃)] : n (L[P(µ̃) − P(Ȳ)])' Σ̂^{-1} L[P(µ̃) − P(Ȳ)] ≤ X²_d(1 − α)}

with asymptotic confidence level (1 − α). Here L : E^N → T_{P(Ȳ)} M̃ denotes the linear projection onto T_{P(Ȳ)} M̃, Σ̂ is the sample estimate of Σ, and X²_d(1 − α) is the (1 − α)-quantile of the chi-squared distribution with d degrees of freedom (X²_d). The corresponding pivotal bootstrap confidence region for µ_E is given by

(3.10)  {µ_E : n (L[P(µ̃) − P(Ȳ)])' Σ̂^{-1} L[P(µ̃) − P(Ȳ)] ≤ c*(1 − α)}.

Here c*(1 − α) denotes the (1 − α)-quantile of the bootstrap distribution of n (L*[P(Ȳ*) − P(Ȳ)])' (Σ*)^{-1} L*[P(Ȳ*) − P(Ȳ)], where {Y*_j} is the bootstrap resample from {Y_j}, j = 1, ..., n, Ȳ* and Σ* are the bootstrap analogues of Ȳ and Σ̂ respectively, and L* denotes the linear projection onto T_{P(Ȳ*)} M̃. In the example from Section 8.2, bootstrap methods yield a much smaller confidence region for µ_E than the one obtained by chi-squared approximation.
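To make the projection recipe behind (3.6)-(3.9) concrete, the following sketch (an added illustration, not the author's code; helper names such as tangent_basis are assumptions) treats the simplest case M = S², embedded in R³ by inclusion, where the projection P of a nonzero (nonfocal) point is normalization and d_ȲP(v) = (I − pp')v/||Ȳ|| with p = P(Ȳ). It computes the sample extrinsic mean and tests whether a candidate point lies in the asymptotic chi-squared confidence region (3.9).

```python
import numpy as np
from scipy.stats import chi2

def tangent_basis(p):
    """Orthonormal 3 x 2 frame for the tangent space of S^2 at the unit vector p."""
    q, _ = np.linalg.qr(np.column_stack([p, np.eye(3)]))
    return q[:, 1:3]

def in_confidence_region(X, candidate, alpha=0.05):
    """Chi-squared confidence region (3.9) for the extrinsic mean on S^2 (d = 2).
    X: n x 3 array of unit vectors; candidate: unit vector to be tested."""
    n = X.shape[0]
    Ybar = X.mean(axis=0)
    if np.linalg.norm(Ybar) == 0.0:
        raise ValueError("Euclidean mean is a focal point (the origin).")
    p_hat = Ybar / np.linalg.norm(Ybar)               # sample extrinsic mean P(Ybar)
    B = tangent_basis(p_hat)                          # basis of T_{P(Ybar)} S^2
    dP = (np.eye(3) - np.outer(p_hat, p_hat)) / np.linalg.norm(Ybar)  # differential of P at Ybar
    T = (X - Ybar) @ dP.T @ B                         # n x 2 tangent coordinates T_j
    Sigma_hat = np.cov(T, rowvar=False)               # sample estimate of Sigma
    diff = B.T @ (candidate - p_hat)                  # linear projection L of the difference
    stat = n * diff @ np.linalg.solve(Sigma_hat, diff)
    return stat <= chi2.ppf(1 - alpha, df=2), stat, p_hat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    raw = rng.normal(size=(100, 3)) + np.array([3.0, 0.0, 0.0])
    X = raw / np.linalg.norm(raw, axis=1, keepdims=True)   # a sample on the sphere
    inside, stat, p_hat = in_confidence_region(X, np.array([1.0, 0.0, 0.0]))
    print("extrinsic mean:", p_hat)
    print("statistic:", stat, "candidate inside region:", inside)
```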

3.2. Asymptotic Distribution of the Sample Extrinsic Variation. Let V and V_n denote the extrinsic variation of Q and Q_n respectively. Let ρ be the extrinsic distance of (3.1).

Proposition 3.2. If Q has extrinsic mean µ_E and if E ρ⁴(X_1, µ_E) < ∞, then

(3.11)  √n (V_n − V) →_L N(0, Var(ρ²(X_1, µ_E))).

Proof. From the definition of V_n and V, it follows that

(3.12)  V_n − V = (1/n) Σ_{j=1}^n ρ²(X_j, µ_{nE}) − ∫_M ρ²(x, µ_E) Q(dx)
               = (1/n) Σ_{j=1}^n ρ²(X_j, µ_{nE}) − (1/n) Σ_{j=1}^n ρ²(X_j, µ_E)
                 + (1/n) Σ_{j=1}^n ρ²(X_j, µ_E) − E[ρ²(X_1, µ_E)]

where µ_{nE} is the sample extrinsic mean, i.e. some measurable selection from the sample extrinsic mean set. Note that

(3.13)  (1/n) Σ_{j=1}^n ρ²(X_j, µ_{nE}) = (1/n) Σ_{j=1}^n ||Y_j − P(Ȳ)||²
        = (1/n) Σ_{j=1}^n ||Y_j − P(µ̃)||² + ||P(µ̃) − P(Ȳ)||² − 2⟨Ȳ − P(µ̃), P(Ȳ) − P(µ̃)⟩.

Substitute (3.13) in (3.12) to get

(3.14)  V_n − V = ||P(Ȳ) − P(µ̃)||² − 2⟨Ȳ − P(µ̃), P(Ȳ) − P(µ̃)⟩
                 + (1/n) Σ_{j=1}^n ρ²(X_j, µ_E) − E[ρ²(X_1, µ_E)]

which implies that

(3.15)  √n (V_n − V) = T_1 + T_2

where

(3.16)  T_1 = √n ||P(Ȳ) − P(µ̃)||² − 2√n ⟨Ȳ − P(µ̃), P(Ȳ) − P(µ̃)⟩  and
(3.17)  T_2 = √n { (1/n) Σ_{j=1}^n ρ²(X_j, µ_E) − E[ρ²(X_1, µ_E)] }.

From the classical CLT, it follows that if E ρ⁴(X_1, µ_E) < ∞, then

(3.18)  T_2 →_L N(0, Var[ρ²(X_1, µ_E)]).

Compare the expression of T_1 with (3.6) to get

(3.19)  T_1 = −2⟨d_µ̃ P(√n (Ȳ − µ̃)), µ̃ − P(µ̃)⟩ + o_P(1).

From the definition of P, it follows that P(µ̃) = argmin_{p ∈ M̃} ||µ̃ − p||². Hence the Euclidean derivative of ||µ̃ − p||² at p = P(µ̃) must be orthogonal to T_{P(µ̃)} M̃, that is, µ̃ − P(µ̃) ∈ (T_{P(µ̃)} M̃)^⊥.

Since d_µ̃ P(Ȳ − µ̃) ∈ T_{P(µ̃)} M̃, the first term in the expression of T_1 in (3.19) is 0, and hence T_1 = o_P(1). From (3.15) and (3.18), we conclude that

(3.20)  √n (V_n − V) = (1/√n) Σ_{j=1}^n { ρ²(X_j, µ_E) − E[ρ²(X_1, µ_E)] } + o_P(1)
(3.21)               →_L N(0, Var[ρ²(X_1, µ_E)]).

This completes the proof.

Remark 3.1. Although Proposition 2.3 does not require the uniqueness of the extrinsic mean of Q for V_n to be a consistent estimator of V, Proposition 3.2 breaks down in the case of non-uniqueness (see BB (2008a)).

Using Proposition 3.2, we can construct the following confidence interval I for V:

(3.22)  I = {V : V ∈ [V_n − (s/√n) Z(1 − α/2), V_n + (s/√n) Z(1 − α/2)]}.

The interval I has asymptotic confidence level (1 − α). Here s² is the sample variance of ρ²(X_j, µ_{nE}), j = 1, ..., n, and Z(1 − α/2) denotes the (1 − α/2)-quantile of the N(0, 1) distribution. From (3.21), we can also construct a pivotal bootstrap confidence interval for V, the details of which are left to the reader.

3.3. Two Sample Tests. In this section, we will use the asymptotic distribution of the sample extrinsic mean and variation to construct nonparametric tests to compare two probability distributions Q_1 and Q_2 on M.

3.3.1. Independent Samples. Let X_1, ..., X_{n_1} and Y_1, ..., Y_{n_2} be two iid samples from Q_1 and Q_2 respectively that are mutually independent. Let µ_{iE} and V_i denote the extrinsic means and variations of Q_i, i = 1, 2, respectively. Similarly denote by µ̂_{iE} and V̂_i the sample extrinsic means and variations. We want to test the hypothesis H_0 : Q_1 = Q_2.

We start by comparing the sample extrinsic means. Let X̃_j = J(X_j), Ỹ_j = J(Y_j) be the embeddings of the sample points into E^N. Let µ_i be the mean of Q̃_i = Q_i ◦ J^{-1}, i = 1, 2. Then under H_0, µ_1 = µ_2 = µ (say). Let µ̂_i, i = 1, 2, be the sample means of {X̃_j} and {Ỹ_j} respectively. Then from (3.6), it follows that

(3.23)  √n_i [P(µ̂_i) − P(µ)] = √n_i (d_µ P)(µ̂_i − µ) + o_P(1),  i = 1, 2.

Hence, if n_i → ∞ such that n_i/(n_1 + n_2) → p_i, 0 < p_i < 1, p_1 + p_2 = 1, then

(3.24)  √n (P(µ̂_1) − P(µ̂_2)) = √n d_µ P(µ̂_1 − µ) − √n d_µ P(µ̂_2 − µ) + o_P(1)
                              →_L N(0, (1/p_1) Σ_1 + (1/p_2) Σ_2).

Here n = n_1 + n_2 is the pooled sample size, and Σ_i, i = 1, 2, are the covariance matrices of the coordinates of d_µ P(X̃_1 − µ) and d_µ P(Ỹ_1 − µ) with respect to some chosen basis for T_{P(µ)} M̃. We estimate µ by the pooled sample mean µ̂ = (1/n)(n_1 µ̂_1 + n_2 µ̂_2). Denote the coordinates of {d_µ̂ P(X̃_j − µ̂)}_{j=1}^{n_1} and {d_µ̂ P(Ỹ_j − µ̂)}_{j=1}^{n_2} by {S_j^1}_{j=1}^{n_1}

and {S_j^2}_{j=1}^{n_2} respectively. Let Σ̂_i, i = 1, 2, denote the sample covariance matrices of {S_j^i}_{j=1}^{n_i}, i = 1, 2, respectively. Then if H_0 is true, the statistic

(3.25)  T_1 = (S̄^1 − S̄^2)' ((1/n_1) Σ̂_1 + (1/n_2) Σ̂_2)^{-1} (S̄^1 − S̄^2)

converges in distribution to the X² distribution with d degrees of freedom, where d is the dimension of M. Hence we reject H_0 at asymptotic level α if T_1 > X²_d(1 − α).

Alternatively, under the null hypothesis H_0 : µ_{1E} = µ_{2E}, from (3.6) it follows that

(3.26)  √n [P(µ̂_1) − P(µ̂_2)] = √n d_{µ_1} P(µ̂_1 − µ_1) − √n d_{µ_2} P(µ̂_2 − µ_2) + o_P(1)

which implies that, for µ ∈ M̃,

(3.27)  L_µ[√n {P(µ̂_1) − P(µ̂_2)}] = L_µ[√n d_{µ_1} P(µ̂_1 − µ_1)] − L_µ[√n d_{µ_2} P(µ̂_2 − µ_2)] + o_P(1)
        →_L N(0, (1/p_1) L^1_µ Σ_1 (L^1_µ)' + (1/p_2) L^2_µ Σ_2 (L^2_µ)').

In (3.27), L_µ denotes the linear projection from E^N into T_µ M̃, identified with R^d. Similarly L^i_µ, i = 1, 2, denote the linear projections from T_{P(µ_i)} M̃ into T_µ M̃, or their associated matrices with respect to some chosen bases for the tangent spaces; for the sake of simplicity we use the same notations. Finally Σ_i, i = 1, 2, are the covariance matrices of the coordinates of d_{µ_1} P(X̃_1 − µ_1) and d_{µ_2} P(Ỹ_1 − µ_2) respectively. Note that µ can be any point on M̃ for (3.27) to hold. Using this one can construct the test statistic

(3.28)  T_2 = (L[P(µ̂_1) − P(µ̂_2)])' ((1/n_1) L^1 Σ̂_1 (L^1)' + (1/n_2) L^2 Σ̂_2 (L^2)')^{-1} L[P(µ̂_1) − P(µ̂_2)]

to test if H_0 is true. In the statistic T_2, L is the linear projection from E^N into T_p M̃, where p ∈ M̃, and L^i, i = 1, 2, are the matrices of linear projection from T_{P(µ̂_i)} M̃ into T_p M̃. The tangent spaces T_p M̃ and T_{P(µ̂_i)} M̃, i = 1, 2, are identified with R^d with respect to convenient basis frames. Again, p can be any point on M̃, but the tangent space analysis is expected to provide a better approximation to the asymptotic limit if we choose p = P(µ̂), µ̂ being the pooled sample mean. Finally Σ̂_i, i = 1, 2, denote the sample covariance matrices of the coordinates of {d_{µ̂_1} P(X̃_j − µ̂_1)}_{j=1}^{n_1} and {d_{µ̂_2} P(Ỹ_j − µ̂_2)}_{j=1}^{n_2} respectively, with respect to the chosen bases for T_{P(µ̂_1)} M̃ and T_{P(µ̂_2)} M̃. Under H_0, T_2 →_L X²_d. Hence we reject H_0 at asymptotic level α if T_2 > X²_d(1 − α).

When the sample sizes are not too large, it is more efficient to construct a bootstrap confidence region for P(µ_1) − P(µ_2) using the test statistic T_2 in (3.28), and use that to test if H_0 is true. Let {X*_j, j = 1, ..., n_1} and {Y*_j, j = 1, ..., n_2} denote the bootstrap resamples from the original samples in M. Denote by µ*_i, i = 1, 2, the bootstrap sample means, and by µ*, the pooled sample mean. Let L*_i, i = 1, 2, be the matrices of linear projections from T_{P(µ*_i)} M̃ into T_{P(µ*)} M̃, and let L* be the linear projection from E^N into T_{P(µ*)} M̃, with the tangent spaces identified with

R^d, and let Σ*_i, i = 1, 2, be the bootstrap sample covariance matrices of the coordinates of {d_{µ*} P(X*_j − µ*)} and {d_{µ*} P(Y*_j − µ*)} respectively. Then the bootstrap version of T_2 in (3.28) is

(3.29)  T*_2 = v*' (Σ*)^{-1} v*,  where
        v* = L*[{P(µ*_1) − P(µ̂_1)} − {P(µ*_2) − P(µ̂_2)}]  and
        Σ* = (1/n_1) L*_1 Σ*_1 (L*_1)' + (1/n_2) L*_2 Σ*_2 (L*_2)'.

We reject H_0 at level α if T_2 > c*(1 − α), where c*(1 − α) is the (1 − α)-quantile of the bootstrap distribution of T*_2.

Next we test if Q_1 and Q_2 have the same extrinsic variations, i.e. H_0 : V_1 = V_2. From Proposition 3.2 and the fact that the samples are independent, we get that, under H_0,

(3.30)  √n (V̂_1 − V̂_2) →_L N(0, σ_1²/p_1 + σ_2²/p_2),  i.e.
(3.31)  (V̂_1 − V̂_2) / √(s_1²/n_1 + s_2²/n_2) →_L N(0, 1)

where σ_1² = Var[ρ²(X_1, µ_{1E})], σ_2² = Var[ρ²(Y_1, µ_{2E})] and s_1², s_2² are their sample estimates. Hence to test if H_0 is true, we can use the test statistic

(3.32)  T_3 = (V̂_1 − V̂_2) / √(s_1²/n_1 + s_2²/n_2).

For a test of asymptotic size α, we reject H_0 if |T_3| > Z(1 − α/2). We can also construct a bootstrap confidence interval for V_1 − V_2 and use that to test if V_1 − V_2 = 0. The details of that are left to the reader.

3.3.2. Matched Pair Samples. Next consider the case when (X_1, Y_1), ..., (X_n, Y_n) is an iid sample from some distribution Q on M̄ = M × M. Such samples arise when, for example, two different treatments are applied to each subject in the sample (see Section 8.1). Let the X_j's have distribution Q_1 while the Y_j's come from some distribution Q_2 on M. Our objective is to distinguish Q_1 and Q_2 by comparing the sample extrinsic means and variations. Since the X and Y samples are not independent, we cannot apply the methods of the earlier section. Instead we do our analysis on M̄. Note that M̄ is a differentiable manifold which can be embedded into E^N × E^N via the map J̄ : M̄ → E^N × E^N, J̄(x, y) = (J(x), J(y)). Let Q̄ = Q ◦ J̄^{-1}. Then if Q̃_i has mean µ_i, i = 1, 2, then Q̄ has mean µ = (µ_1, µ_2). The projection of µ onto J̄(M̄) = M̃ × M̃ is given by P̄(µ) = (P(µ_1), P(µ_2)). Hence if Q_i has extrinsic mean µ_{iE}, i = 1, 2, then Q has extrinsic mean µ_E = (µ_{1E}, µ_{2E}). Denote the paired sample as Z_j = (X_j, Y_j), j = 1, ..., n, and let µ̂ = (µ̂_1, µ̂_2), µ̂_E = (µ̂_{1E}, µ̂_{2E}) be the sample estimates of µ and µ_E respectively. From (3.6), it

follows that

        √n (P̄(µ̂) − P̄(µ)) = √n d_µ P̄(µ̂ − µ) + o_P(1),

which can be written as

(3.33)  √n (P(µ̂_1) − P(µ_1), P(µ̂_2) − P(µ_2)) = √n (d_{µ_1} P(µ̂_1 − µ_1), d_{µ_2} P(µ̂_2 − µ_2)) + o_P(1).

Hence if H_0 : µ_1 = µ_2 = µ, then under H_0,

(3.34)  √n (P(µ̂_1) − P(µ), P(µ̂_2) − P(µ)) = √n (d_µ P(µ̂_1 − µ), d_µ P(µ̂_2 − µ)) + o_P(1)
        →_L N(0, Σ),  Σ = ( Σ_1  Σ_{12} ; Σ_{21}  Σ_2 ).

In (3.34), Σ_i, i = 1, 2, are the same as in (3.24) and Σ_{12} = (Σ_{21})' is the covariance between the coordinates of d_µ P(X̃_1 − µ) and d_µ P(Ỹ_1 − µ). From (3.34), it follows that

(3.35)  √n d_µ P(µ̂_1 − µ̂_2) →_L N(0, Σ_1 + Σ_2 − Σ_{12} − Σ_{21}).

This gives rise to the test statistic

(3.36)  T_{1p} = n (S̄^1 − S̄^2)' (Σ̂_1 + Σ̂_2 − Σ̂_{12} − Σ̂_{21})^{-1} (S̄^1 − S̄^2)

where S̄^1, S̄^2, Σ̂_1 and Σ̂_2 are as in (3.25) and Σ̂_{12} = (Σ̂_{21})' is the sample covariance between {S_j^1}_{j=1}^n and {S_j^2}_{j=1}^n. If H_0 is true, T_{1p} converges in distribution to the X²_d distribution. Hence we reject H_0 at asymptotic level α if T_{1p} > X²_d(1 − α).

If the null hypothesis were H_0 : µ_{1E} = µ_{2E}, then from (3.33) it follows that under H_0,

(3.37)  √n [P(µ̂_1) − P(µ̂_2)] = √n d_{µ_1} P(µ̂_1 − µ_1) − √n d_{µ_2} P(µ̂_2 − µ_2) + o_P(1)

which implies that, for any µ ∈ M̃,

(3.38)  L_µ[√n {P(µ̂_1) − P(µ̂_2)}] = L_µ[√n d_{µ_1} P(µ̂_1 − µ_1)] − L_µ[√n d_{µ_2} P(µ̂_2 − µ_2)] + o_P(1)
        →_L N(0, Σ)

where Σ = L^1_µ Σ_1 (L^1_µ)' + L^2_µ Σ_2 (L^2_µ)' − L^1_µ Σ_{12} (L^2_µ)' − L^2_µ Σ_{21} (L^1_µ)'. In (3.38), L_µ, L^i_µ and Σ_i, i = 1, 2, are the same as in (3.27), and Σ_{12} = (Σ_{21})' denotes the covariance between the coordinates of d_{µ_1} P(X̃_1 − µ_1) and d_{µ_2} P(Ỹ_1 − µ_2). Hence to test if H_0 is true, one can use the test statistic

(3.39)  T_{2p} = n L[P(µ̂_1) − P(µ̂_2)]' Σ̂^{-1} L[P(µ̂_1) − P(µ̂_2)]
(3.40)  where  Σ̂ = L^1 Σ̂_1 (L^1)' + L^2 Σ̂_2 (L^2)' − L^1 Σ̂_{12} (L^2)' − L^2 Σ̂_{21} (L^1)'.

In the statistic T_{2p}, L, L^i and Σ̂_i, i = 1, 2, are as in (3.28), and Σ̂_{12} = (Σ̂_{21})' denotes the sample covariance between the coordinates of {d_{µ̂_1} P(X̃_j − µ̂_1)}_{j=1}^n and {d_{µ̂_2} P(Ỹ_j − µ̂_2)}_{j=1}^n. Under H_0, T_{2p} →_L X²_d. Hence we reject H_0 at asymptotic level α if T_{2p} > X²_d(1 − α). In our example in Section 8.1, the two statistics T_{1p} and T_{2p} yield values which are quite close to each other.
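As an added illustration (again with M = S² and hypothetical function names, not the author's code), the sketch below implements the matched pair statistic T_{1p} of (3.36): both measurements of each subject are embedded, the tangent coordinates S_j^1, S_j^2 are computed at the projection of the pooled mean, and the covariance Σ̂_1 + Σ̂_2 − Σ̂_{12} − Σ̂_{21} is estimated from the paired coordinates.

```python
import numpy as np
from scipy.stats import chi2

def tangent_frame(p):
    """Orthonormal 3 x 2 frame for the tangent space of S^2 at the unit vector p."""
    q, _ = np.linalg.qr(np.column_stack([p, np.eye(3)]))
    return q[:, 1:3]

def matched_pair_mean_test(X, Y, alpha=0.05):
    """Paired test of H_0: mu_1 = mu_2 on S^2 (d = 2) via the statistic T_1p of (3.36),
    with the pooled Euclidean mean used as base point."""
    n = X.shape[0]
    mu_hat = 0.5 * (X.mean(axis=0) + Y.mean(axis=0))     # pooled Euclidean mean
    p_hat = mu_hat / np.linalg.norm(mu_hat)              # P(mu_hat)
    B = tangent_frame(p_hat)
    dP = (np.eye(3) - np.outer(p_hat, p_hat)) / np.linalg.norm(mu_hat)
    S1 = (X - mu_hat) @ dP.T @ B                         # coordinates S_j^1
    S2 = (Y - mu_hat) @ dP.T @ B                         # coordinates S_j^2
    diff = S1.mean(axis=0) - S2.mean(axis=0)
    # Joint covariance of the paired coordinate vectors (S_j^1, S_j^2).
    C = np.cov(np.hstack([S1, S2]), rowvar=False)
    Sigma = C[:2, :2] + C[2:, 2:] - C[:2, 2:] - C[2:, :2]
    T1p = n * diff @ np.linalg.solve(Sigma, diff)
    return T1p, T1p > chi2.ppf(1 - alpha, df=2)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    base = rng.normal(size=(60, 3)) + np.array([0.0, 0.0, 4.0])
    X = base / np.linalg.norm(base, axis=1, keepdims=True)
    noisy = base + 0.2 * rng.normal(size=base.shape)      # second measurement of each subject
    Y = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
    T1p, reject = matched_pair_mean_test(X, Y)
    print("T_1p =", T1p, "reject H_0:", reject)
```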

One can also find a bootstrap confidence region for P(µ_1) − P(µ_2) as in Section 3.3.1 and use that to test if H_0 is true. The details are left to the reader.

Let V_1 and V_2 denote the extrinsic variations of Q_1 and Q_2 and let V̂_1, V̂_2 be their sample analogues. Suppose we want to test the hypothesis H_0 : V_1 = V_2. From (3.20), we get that

(3.41)  (√n (V̂_1 − V_1), √n (V̂_2 − V_2))
        = ((1/√n) Σ_{j=1}^n [ρ²(X_j, µ_{1E}) − E ρ²(X_1, µ_{1E})], (1/√n) Σ_{j=1}^n [ρ²(Y_j, µ_{2E}) − E ρ²(Y_1, µ_{2E})]) + o_P(1)
(3.42)  →_L N(0, ( σ_1²  σ_{12} ; σ_{12}  σ_2² ))

where σ_{12} = Cov(ρ²(X_1, µ_{1E}), ρ²(Y_1, µ_{2E})), and σ_1², σ_2² are as in (3.30). Hence if H_0 is true,

(3.43)  √n (V̂_1 − V̂_2) →_L N(0, σ_1² + σ_2² − 2σ_{12}).

This gives rise to the test statistic

(3.44)  T_{3p} = √n (V̂_1 − V̂_2) / √(s_1² + s_2² − 2 s_{12})

where s_1², s_2², s_{12} are sample estimates of σ_1², σ_2², σ_{12} respectively. We reject H_0 at asymptotic level α if |T_{3p}| > Z(1 − α/2). We can also get a (1 − α) level confidence interval for V_1 − V_2 using bootstrap resamples and use that to test if H_0 is true.

4. Geometry of Shape Manifolds

Many differentiable manifolds M naturally occur as submanifolds, or surfaces or hypersurfaces, of a Euclidean space. One example of this is the sphere S^d = {p ∈ R^{d+1} : ||p|| = 1}. The shape spaces of interest here are not of this type. They are generally quotients of a Riemannian manifold N under the action of a transformation group G, i.e., M = N/G. A number of them are quotient spaces of N = S^d under the action of a compact group G, i.e., the elements of the space are orbits in S^d traced out by the application of G. Among important examples of this kind are Kendall's shape spaces and the reflection shape spaces.

In some cases the action of the group is free, i.e., gp = p only holds for the identity element g = e. Then the elements of the orbit O_p = {gp : g ∈ G} are in one-one correspondence with elements of G, and one can identify the orbit with the group. The orbit inherits the differential structure of the Lie group G. The tangent space T_p N at a point p may then be decomposed into a vertical subspace V_p, of dimension equal to that of the group G, along the orbit to which p belongs, and a horizontal subspace H_p which is orthogonal to it. The vertical subspace is isomorphic to the tangent space of G and the horizontal one can be identified with the tangent space of M at the orbit O_p. With this identification, M is a differentiable manifold of dimension equal to that of N minus the dimension of G.

For carrying out an extrinsic analysis on M, we use a smooth map J from N into some Euclidean space E which is an embedding of M into that Euclidean

space. Then the image J(M) is a differentiable submanifold of E. If π denotes the projection map, π : N → M, π(p) = O_p, then the tangent space of J(M) at J(π(p)) is dJ(H_p), where dJ denotes the differential of the map J : N → E. Among all possible embeddings, we choose J to be equivariant under the action of a large group H on M. In most cases, H is compact.

5. Kendall's (Direct Similarity) Shape Spaces Σ^k_m

Kendall's shape spaces are quotient spaces S^d/G, under the action of the special orthogonal group G = SO(m) of m × m orthogonal matrices with determinant +1. Important cases include m = 2, 3. For the case m = 2, consider the space of all planar k-ads (z_1, z_2, ..., z_k) (z_j = (x_j, y_j)), k > 2, excluding those whose k points are all identical. The set of all centered and normed k-ads, say u = (u_1, u_2, ..., u_k), comprises a unit sphere in a (2k − 2)-dimensional vector space and is, therefore, a (2k − 3)-dimensional sphere S^{2k−3}, called the preshape sphere. The group G = SO(2) acts on the sphere by rotating each landmark by the same angle. The orbit under G of a point u in the preshape sphere can thus be seen to be a circle S^1, so that Kendall's planar shape space Σ^k_2 can be viewed as the quotient space S^{2k−3}/G ≈ S^{2k−3}/S^1, a (2k − 4)-dimensional compact manifold. An algebraically simpler representation of Σ^k_2 is given by the complex projective space CP^{k−2}. For nonparametric extrinsic analysis on Σ^k_2, see BP (2003, 2005), BB (2008a). For many applications in archeology, astronomy, morphometrics, medical diagnosis, etc., see Bookstein (1986, 1997), Kendall (1989), Dryden and Mardia (1998), BP (2003, 2005), BB (2008a, 2008b, 2008c) and Small (1996).

When m > 2, consider a set of k points in R^m, not all points being the same. Such a set is called a k-ad or a configuration of k landmarks. We will denote a k-ad by the m × k matrix x = (x_1, ..., x_k), where x_i, i = 1, ..., k, are the k landmarks from the object of interest. Assume k > m. The direct similarity shape of this k-ad is what remains after we remove the effects of translation, rotation and scaling. To remove translation, we subtract the mean x̄ = (1/k) Σ_{i=1}^k x_i from each landmark to get the centered k-ad u = (x_1 − x̄, ..., x_k − x̄). We remove the effect of scaling by dividing u by its Euclidean norm to get

(5.1)  z = ((x_1 − x̄)/||u||, ..., (x_k − x̄)/||u||) = (z_1, z_2, ..., z_k).

This z is called the preshape of the k-ad x and it lies in the unit sphere S^k_m in the hyperplane H^k_m = {z ∈ R^{m×k} : z 1_k = 0}. Hence

(5.2)  S^k_m = {z ∈ R^{m×k} : Trace(zz') = 1, z 1_k = 0}.

Here 1_k denotes the k × 1 vector of all ones. Thus the preshape sphere S^k_m may be identified with the sphere S^{km−m−1}. Then the shape of the k-ad x is the orbit of z under left multiplication by m × m rotation matrices. In other words, Σ^k_m = S^k_m/SO(m). One can also remove the effect of translation from the original k-ad x by postmultiplying the centered k-ad u by a Helmert matrix H, which is a k × (k − 1) matrix satisfying H'H = I_{k−1} and 1_k'H = 0. The resulting k-ad ũ = uH

lies in R^{m×(k−1)}. Then the preshape of x is z = ũ/||ũ|| and the preshape sphere can be represented as

(5.3)  S^k_m = {z ∈ R^{m×(k−1)} : Trace(zz') = 1}.

The advantage of using this representation of S^k_m is that there is no linear constraint on the coordinates of z and hence analysis becomes simpler. However, now the choice of the preshape depends on the choice of H, which can vary. In most cases, including applications, we will represent the preshape of x as in (5.1) and the preshape sphere by (5.2).

For m > 2, the direct similarity shape space Σ^k_m fails to be a manifold. That is because the action of SO(m) is not in general free. Indeed, the orbits of preshapes under SO(m) have different dimensions in different regions (see, e.g., Kendall et al. (1999) and Small (1996)). To avoid that, one may consider the shapes of only those k-ads whose preshapes have rank at least m − 1. This subset is a manifold but is not complete (in its geodesic distance).

6. Reflection (Similarity) Shape Spaces RΣ^k_m

Consider now the reflection shape of a k-ad as defined in Section 5, but with SO(m) replaced by the larger orthogonal group O(m) of all m × m orthogonal matrices (with determinants either +1 or −1). Then the reflection (similarity) shape of a k-ad x is given by the orbit

(6.1)  σ(x) = σ(z) = {Az : A ∈ O(m)}

where z is the preshape of x in S^k_m. For the action of O(m) on S^k_m to be free and the reflection shape space to be a Riemannian manifold, we consider only those shapes for which the columns of z span R^m. The set of all such z is called the nonsingular part of S^k_m and denoted by NS^k_m. Then the reflection (similarity) shape space is defined as

(6.2)  RΣ^k_m = {σ(z) : z ∈ S^k_m, rank(z) = m} = NS^k_m/O(m)

which is a Riemannian manifold of dimension km − m − 1 − m(m − 1)/2. It has been shown that the map

(6.3)  J : RΣ^k_m → S(k, R),  J(σ(z)) = z'z

is an embedding of the reflection shape space into S(k, R) (see Bandulasiri and Patrangenaru (2005), Bandulasiri et al. (2007) and Dryden et al. (2008)). It is H-equivariant where H = O(k) acts on the right: A σ(z) := σ(zA'), A ∈ O(k). Indeed, then J(A σ(z)) = φ(A) J(σ(z)), where φ : O(k) → GL(k², R), φ(A) : S(k, R) → S(k, R), φ(A)B = ABA'. It is easy to show that φ(A) is an isometry and φ is a group homomorphism.

Define M^k_m as the set of all k × k positive semi-definite matrices of rank m and trace 1. Then the image of RΣ^k_m under the embedding J in (6.3) is

(6.4)  J(RΣ^k_m) = {A ∈ M^k_m : A 1_k = 0}.

If we represent the preshape sphere S^k_m as in (5.3), then M^k_m = J(RΣ^{k+1}_m). Hence M^k_m is a submanifold (not complete) of S(k, R) of dimension km − 1 − m(m − 1)/2.
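The reflection shape representation above is straightforward to compute. The following added sketch (not from the article; the function names are hypothetical) forms the preshape (5.1) of an m × k configuration and the embedded reflection shape J(σ(z)) = z'z of (6.3), and verifies numerically that the embedding is unchanged by translation, scaling and an orthogonal transformation of the k-ad.

```python
import numpy as np

def preshape(x):
    """Preshape (5.1) of an m x k configuration: center the landmarks and scale to unit norm."""
    u = x - x.mean(axis=1, keepdims=True)      # remove translation
    return u / np.linalg.norm(u)               # remove scale (Frobenius norm)

def embed_reflection_shape(x):
    """Embedding (6.3) of the reflection shape: J(sigma(z)) = z' z, a k x k PSD matrix of trace 1."""
    z = preshape(x)
    return z.T @ z

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    m, k = 3, 6
    x = rng.normal(size=(m, k))                           # a k-ad in R^3
    O, _ = np.linalg.qr(rng.normal(size=(m, m)))          # a random orthogonal matrix
    A1 = embed_reflection_shape(x)
    A2 = embed_reflection_shape(2.5 * O @ x + rng.normal(size=(m, 1)))  # transformed k-ad
    print("trace of embedded shape:", np.trace(A1))        # equals 1
    print("invariant under O(m), scaling, translation:", np.allclose(A1, A2))
```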

The main results of this section are Theorems 6.2, 6.4 and their corollaries. Together with Corollary 6.3, Theorem 6.2 derives the extrinsic mean of a probability distribution on the reflection shape space under the embedding (6.3). A recent computation of this given in Dryden et al. (2008) is incorrect (see Remark 6.2). Proposition 6.1 below identifies the tangent and normal spaces of M^k_m. The expression for the tangent space has also been derived in Dryden et al. (2008). The derivation of the tangent space in the proof of Proposition 6.1 is different from the one in there and is included here for the sake of readability.

Proposition 6.1. Let A ∈ M^k_m. (a) The tangent space of M^k_m at A is given by

(6.5)  T_A M^k_m = { U ( T  S ; S'  0 ) U' : T ∈ S(m, R), trace(T) = 0 }

(blocks written row by row, S an arbitrary m × (k − m) matrix), where A = UDU' is a singular value decomposition (s.v.d.) of A, U ∈ SO(k) and D = diag(λ_1, ..., λ_k), λ_1 ≥ ... ≥ λ_k ≥ 0. (b) The orthocomplement of the tangent space in S(k, R), or the normal space, is given by

(6.6)  (T_A M^k_m)^⊥ = { U ( λ I_m  0 ; 0  T ) U' : λ ∈ R, T ∈ S(k − m, R) }.

Proof. Represent the preshape of a (k+1)-ad x by the m × k matrix z, where ||z||² = Trace(zz') = 1, and let S^{k+1}_m be the preshape sphere, S^{k+1}_m = {z ∈ R^{m×k} : ||z|| = 1}. Let NS^{k+1}_m be the nonsingular part of S^{k+1}_m, i.e., NS^{k+1}_m = {z ∈ S^{k+1}_m : rank(z) = m}. Then RΣ^{k+1}_m = NS^{k+1}_m/O(m) and M^k_m = J(RΣ^{k+1}_m). The map

        J : RΣ^{k+1}_m → S(k, R),  J(σ(z)) = z'z = A

is an embedding. Hence

(6.7)  T_A M^k_m = dJ_{σ(z)}(T_{σ(z)} RΣ^{k+1}_m).

Note that T_{σ(z)} RΣ^{k+1}_m can be identified with the horizontal subspace H_z of T_z S^{k+1}_m, which is

(6.8)  H_z = {v ∈ R^{m×k} : trace(zv') = 0, zv' = vz'}

(see Kendall et al. (1999)). Consider the map

(6.9)  J̃ : NS^{k+1}_m → S(k, R),  J̃(z) = z'z.

Its derivative is an isomorphism between the horizontal subspace of T NS^{k+1}_m and T M^k_m. The derivative is given by

(6.10)  dJ̃ : T_z S^{k+1}_m → S(k, R),  dJ̃_z(v) = z'v + v'z.

Hence

(6.11)  T_A M^k_m = dJ̃_z(H_z) = {z'v + v'z : v ∈ H_z}.

From the description of H_z in (6.8), and using the fact that z has full row rank, it follows that

(6.12)  H_z = {zv : v ∈ R^{k×k}, trace(z'zv) = 0, zvz' ∈ S(m, R)}.

From (6.11) and (6.12), we get that

(6.13)  T_A M^k_m = {Av + v'A : AvA ∈ S(k, R), trace(Av) = 0}.

Let A = UDU' be an s.v.d. of A as in the statement of the proposition. Using the fact that A has rank m, (6.13) can be written as

(6.14)  T_A M^k_m = {U(Dv + v'D)U' : DvD ∈ S(k, R), trace(Dv) = 0}
               = { U ( T  S ; S'  0 ) U' : T ∈ S(m, R), Trace(T) = 0 }.

This proves part (a). From the definition of orthocomplement and (6.14), we get that

(6.15)  (T_A M^k_m)^⊥ = {v ∈ S(k, R) : trace(v'w) = 0 ∀ w ∈ T_A M^k_m}
                     = { U ( λ I_m  0 ; 0  R ) U' : λ ∈ R, R ∈ S(k − m, R) }

where I_m is the m × m identity matrix. This proves (b) and completes the proof.

For a k × k positive semi-definite matrix µ with rank at least m, its projection onto M^k_m is defined as

(6.16)  P(µ) = {A ∈ M^k_m : ||µ − A|| = min_{x ∈ M^k_m} ||µ − x||},

if this set is nonempty. The following theorem shows that the projection set is nonempty and derives a formula for the projection matrices.

Theorem 6.2. P(µ) is nonempty and consists of

(6.17)  A = Σ_{j=1}^m (λ_j − λ̄ + 1/m) U_j U_j'

where λ_1 ≥ λ_2 ≥ ... ≥ λ_k are the ordered eigenvalues of µ; U_1, U_2, ..., U_k are some corresponding orthonormal eigenvectors, and λ̄ = (1/m) Σ_{j=1}^m λ_j.

Proof. Let

(6.18)  f(x) = ||µ − x||²,  x ∈ S(k, R).

If f has a minimizer A in M^k_m then (grad f)(A) ∈ (T_A M^k_m)^⊥, where grad denotes the Euclidean derivative operator. But (grad f)(A) = 2(A − µ). Hence if A minimizes f, then

(6.19)  A − µ = U^A ( λ I_m  0 ; 0  T ) (U^A)'

where U^A = (U^A_1, U^A_2, ..., U^A_k) is a k × k matrix consisting of an orthonormal basis of eigenvectors of A corresponding to its ordered eigenvalues λ^A_1 ≥ λ^A_2 ≥ ... ≥ λ^A_m > λ^A_{m+1} = ... = λ^A_k = 0. From (6.19) it follows that

(6.20)  µ U^A_j = (λ^A_j − λ) U^A_j,  j = 1, 2, ..., m.

Hence {λ^A_j − λ}_{j=1}^m are eigenvalues of µ with {U^A_j}_{j=1}^m as corresponding eigenvectors. Since these eigenvalues are ordered, this implies that there exists a

singular value decomposition of µ, µ = Σ_{j=1}^k λ_j U_j U_j', and a set of indices S = {i_1, i_2, ..., i_m}, 1 ≤ i_1 < i_2 < ... < i_m ≤ k, such that

(6.21)  λ^A_j − λ = λ_{i_j}  and
(6.22)  U^A_j = U_{i_j},  j = 1, ..., m.

Add the equations in (6.21) to get λ = 1/m − λ̄, where λ̄ = (1/m) Σ_{j ∈ S} λ_j. Hence

(6.23)  A = Σ_{j ∈ S} (λ_j − λ̄ + 1/m) U_j U_j'.

Since Σ_{j=1}^k λ_j = 1, therefore λ̄ ≤ 1/m and λ_j − λ̄ + 1/m > 0 ∀ j ∈ S. Hence A is positive semi-definite of rank m. It is easy to check that trace(A) = 1 and hence A ∈ M^k_m. It can be shown that among the matrices A of the form (6.23), the function f defined in (6.18) is minimized when

(6.24)  S = {1, 2, ..., m}.

Define M̄^k_m as the set of all k × k positive semi-definite matrices of rank at most m and trace 1. This is a compact subset of S(k, R). Hence f restricted to M̄^k_m attains a minimum value. Let A_0 be a corresponding minimizer. If rank(A_0) < m, say rank(A_0) = m_1, then A_0 minimizes f restricted to M^k_{m_1}, and M^k_{m_1} is a Riemannian manifold (it is J(RΣ^{k+1}_{m_1})). Hence A_0 must have the form

(6.25)  A_0 = Σ_{j=1}^{m_1} (λ_j − λ̄_1 + 1/m_1) U_j U_j'

where λ̄_1 = (1/m_1) Σ_{j=1}^{m_1} λ_j. But if one defines

(6.26)  A = Σ_{j=1}^m (λ_j − λ̄ + 1/m) U_j U_j'

with λ̄ = (1/m) Σ_{j=1}^m λ_j, then it is easy to check that f(A) < f(A_0). Hence A_0 cannot be a minimizer of f over M̄^k_m; that is, a minimizer must have rank m. Then it lies in M^k_m and, from (6.23) and (6.24), it follows that it has the form as in (6.17). This completes the proof.

Let Q be a probability distribution on RΣ^k_m and let µ̃ be the mean of Q̃ = Q ◦ J^{-1} in S(k, R). Then µ̃ is positive semi-definite of rank at least m and µ̃ 1_k = 0. Theorem 6.2 can be used to get the formula for the extrinsic mean set of Q. This is obtained in Corollary 6.3.

Corollary 6.3. (a) The projection of µ̃ into J(RΣ^k_m) is given by

(6.27)  P_{J(RΣ^k_m)}(µ̃) = { A : A = Σ_{j=1}^m (λ_j − λ̄ + 1/m) U_j U_j' }

where λ_1 ≥ ... ≥ λ_k are the ordered eigenvalues of µ̃, U_1, ..., U_k are corresponding orthonormal eigenvectors, and λ̄ = (1/m) Σ_{j=1}^m λ_j. (b) The projection set in (6.27) is a

singleton, and Q has a unique extrinsic mean µ_E, iff λ_m > λ_{m+1}. Then µ_E = σ(F) where F = (F_1, ..., F_m)', F_j = √(λ_j − λ̄ + 1/m) U_j.

Proof. Since µ̃ 1_k = 0, therefore U_j' 1_k = 0, 1 ≤ j ≤ m. Hence any A in (6.27) lies in J(RΣ^k_m). Now part (a) follows from Theorem 6.2, using the fact that J(RΣ^k_m) ⊆ M^k_m.

For simplicity, let us denote λ_j − λ̄ + 1/m, j = 1, ..., m, by λ*_j. To prove part (b), note that if λ_m = λ_{m+1}, clearly A_1 = Σ_{j=1}^m λ*_j U_j U_j' and A_2 = Σ_{j=1}^{m−1} λ*_j U_j U_j' + λ*_m U_{m+1} U_{m+1}' are two distinct elements in the projection set of (6.27). Consider next the case λ_m > λ_{m+1}. Let µ̃ = UΛU' = VΛV' be two different s.v.d.'s of µ̃. Then U'V consists of orthonormal eigenvectors of Λ = diag(λ_1, ..., λ_k). The fact that λ_m > λ_{m+1} implies that

(6.28)  U'V = ( V_1  0 ; 0  V_2 )

where V_1 ∈ SO(m) and V_2 ∈ SO(k − m). Write Λ = ( Λ_1  0 ; 0  Λ_2 ). Then Λ U'V = U'V Λ implies Λ_1 V_1 = V_1 Λ_1 and Λ_2 V_2 = V_2 Λ_2. Hence

        Σ_{j=1}^m λ*_j V_j V_j' = U ( V_1 (Λ_1 − (λ̄ − 1/m) I_m) V_1'  0 ; 0  0 ) U'
                               = U ( Λ_1 − (λ̄ − 1/m) I_m  0 ; 0  0 ) U'
                               = Σ_{j=1}^m λ*_j U_j U_j'.

This proves that the projection set in (6.27) is a singleton when λ_m > λ_{m+1}. Then for any F as in part (b) and A as in (6.27), A = F'F = J(σ(F)). This proves part (b) and completes the proof.

From Proposition 3.1 and Corollary 6.3, it follows that the extrinsic variation of Q has the following expression:

(6.29)  V = ∫_{J(RΣ^k_m)} ||x − µ̃||² Q̃(dx) + ||µ̃ − A||²,  A ∈ P_{J(RΣ^k_m)}(µ̃)
          = ∫_{J(RΣ^k_m)} ||x||² Q̃(dx) + m (1/m − λ̄)² − Σ_{j=1}^m λ_j².

Remark 6.1. From the proof of Theorem 6.2 and Corollary 6.3, it follows that the extrinsic mean set C_Q of Q is also the extrinsic mean set of Q restricted to M̄^k_m. Since M̄^k_m is a compact metric space, from Proposition 2.1, it follows that C_Q is compact. Let X_1, X_2, ..., X_n be an iid sample from Q and let µ_{nE} and V_n be the sample extrinsic mean and variation respectively. Then from Proposition 2.3, it follows that V_n is a consistent estimator of V. From Proposition 2.2, it follows that if Q has a unique extrinsic mean µ_E, then µ_{nE} is a consistent estimator of µ_E.
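Corollary 6.3 translates directly into a procedure for the sample extrinsic mean reflection shape: average the embedded sample points in S(k, R), eigendecompose, and shift the top m eigenvalues by 1/m − λ̄. The sketch below (an added illustration, not the author's code; the simulated sample and function name are assumptions) does this and also evaluates the sample extrinsic variation via Proposition 3.1(b).

```python
import numpy as np

def extrinsic_mean_reflection_shape(A_sample, m):
    """Sample extrinsic mean reflection shape via Corollary 6.3: eigendecompose the
    Euclidean mean of the embedded sample and shift the top m eigenvalues by 1/m - lambda_bar."""
    A_sample = np.asarray(A_sample)                      # n x k x k embedded shapes z'z
    mu_tilde = A_sample.mean(axis=0)                     # mean of the empirical distribution in S(k, R)
    evals, evecs = np.linalg.eigh(mu_tilde)              # ascending eigenvalues
    idx = np.argsort(evals)[::-1]
    lam, U = evals[idx], evecs[:, idx]                   # descending order
    if lam[m - 1] <= lam[m]:
        raise ValueError("lambda_m <= lambda_{m+1}: extrinsic mean not unique.")
    weights = lam[:m] - lam[:m].mean() + 1.0 / m         # eigenvalues of the projection
    P_mu = (U[:, :m] * weights) @ U[:, :m].T             # projection P(mu_tilde), Corollary 6.3(a)
    F = np.sqrt(weights)[:, None] * U[:, :m].T           # an m x k preshape with F'F = P_mu
    variation = np.mean(np.sum((A_sample - mu_tilde) ** 2, axis=(1, 2))) \
        + np.sum((mu_tilde - P_mu) ** 2)                 # extrinsic variation, Proposition 3.1(b)
    return P_mu, F, variation

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    m, k, n = 3, 6, 50
    base = rng.normal(size=(m, k))
    A_sample = []
    for _ in range(n):
        x = base + 0.1 * rng.normal(size=(m, k))         # noisy copies of one configuration
        u = x - x.mean(axis=1, keepdims=True)
        z = u / np.linalg.norm(u)
        A_sample.append(z.T @ z)
    P_mu, F, V_n = extrinsic_mean_reflection_shape(A_sample, m)
    print("trace of P(mu_tilde):", np.trace(P_mu))        # equals 1
    print("rank of P(mu_tilde):", np.linalg.matrix_rank(P_mu))
    print("sample extrinsic variation:", V_n)
```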

Remark 6.2. In Dryden et al. (2008), the mean φ-shape is defined as

        φ(µ̃) = Σ_{j=1}^m λ_j U_j U_j' / Σ_{j=1}^m λ_j.

The article states that this is the natural projection of µ̃ onto J(RΣ^k_m). However one can easily check that the distance between φ(µ̃) and µ̃ is in general greater than the distance of the latter from the correct expression of the projection P_{J(RΣ^k_m)}(µ̃) given in Corollary 6.3. The article also states that "The mean φ-shape so defined is the extrinsic mean reflection shape ... in the sense of Bhattacharya & Patrangenaru (2003, 2005) and Hendriks & Landsman (1998)", which is clearly incorrect.

6.1. Asymptotic Distribution of the Sample Mean Reflection Shape. Let X_1, ..., X_n be an iid sample from some probability distribution Q on RΣ^k_m and let µ_{nE} be the sample extrinsic mean (any measurable selection from the sample extrinsic mean set). In the last section, we saw that if Q has a unique extrinsic mean µ_E, that is, if the mean µ̃ of Q̃ = Q ◦ J^{-1} is a nonfocal point of S(k, R), then µ_{nE} converges a.s. to µ_E as n → ∞. Also, from the calculations of Section 3.1, it follows that if the projection map P ≡ P_{J(RΣ^k_m)} is continuously differentiable at µ̃, then √n [J(µ_{nE}) − J(µ_E)] has an asymptotic mean zero Gaussian distribution on T_{J(µ_E)} J(RΣ^k_m). To find the asymptotic coordinates and the asymptotic dispersion matrix, we need to compute the differential of P at µ̃ (if it exists).

Consider first the map P : N(µ̃) → S(k, R), P(µ) = Σ_{j=1}^m (λ_j(µ) − λ̄(µ) + 1/m) U_j(µ) U_j(µ)', as in Theorem 6.2. Here N(µ̃) is an open neighborhood of µ̃ in S(k, R) where P is defined; hence for µ ∈ N(µ̃), λ_m(µ) > λ_{m+1}(µ). It can be shown that P is smooth on N(µ̃) (see Theorem 6.4). Let γ(t) = µ̃ + tv be a curve in N(µ̃) with γ(0) = µ̃ and γ̇(0) = v ∈ S(k, R). Let µ̃ = UΛU', U = (U_1, ..., U_k), Λ = diag(λ_1, ..., λ_k), be an s.v.d. of µ̃ as in Corollary 6.3. Then

(6.30)  γ(t) = U(Λ + tU'vU)U' = U γ̃(t) U'

where γ̃(t) = Λ + tU'vU. Then γ̃(t) is a curve in S(k, R) starting at Λ. Say ṽ = γ̃'(0) = U'vU. From (6.30) and the definition of P, we get that

(6.31)  P[γ(t)] = U P[γ̃(t)] U'.

Differentiate (6.31) at t = 0, noting that (d/dt) P[γ(t)]|_{t=0} = d_µ̃ P(v) and (d/dt) P[γ̃(t)]|_{t=0} = d_Λ P(ṽ), to get

(6.32)  d_µ̃ P(v) = U d_Λ P(ṽ) U'.

Let us find (d/dt) P[γ̃(t)]|_{t=0}. For that, without loss of generality, we may assume that λ_1 > λ_2 > ... > λ_k. That is because the set of all such matrices forms an open dense subset of S(k, R). Then we can choose an s.v.d. for γ̃(t), γ̃(t) = Σ_{j=1}^k λ_j(t) e_j(t) e_j(t)', such that {e_j(t), λ_j(t)}_{j=1}^k are smooth functions of t satisfying e_j(0) = e_j and λ_j(0) = λ_j, where {e_j}_{j=1}^k is the canonical basis for R^k. Since e_j(t)' e_j(t) = 1, we get by differentiating that

(6.33)  e_j' ė_j(0) = 0,  j = 1, ..., k.

Also, since γ̃(t) e_j(t) = λ_j(t) e_j(t), we get that

(6.34)  ṽ e_j + Λ ė_j(0) = λ_j ė_j(0) + λ̇_j(0) e_j,  j = 1, ..., k.

Consider the orthonormal basis (frame) for S(k, R), {E_ab : 1 ≤ a ≤ b ≤ k}, defined as

(6.35)  E_ab = 2^{-1/2} (e_a e_b' + e_b e_a')  if a < b,   E_aa = e_a e_a'.

Let ṽ = E_ab, 1 ≤ a ≤ b ≤ k. From equations (6.33) and (6.34), we get that

(6.36)  ė_j(0) = 0                               if a = b or j ∉ {a, b},
        ė_j(0) = 2^{-1/2} (λ_a − λ_b)^{-1} e_b   if j = a < b,
        ė_j(0) = 2^{-1/2} (λ_b − λ_a)^{-1} e_a   if j = b > a,

and

(6.37)  λ̇_j(0) = 1 if j = a = b, and λ̇_j(0) = 0 otherwise.

Since P[γ̃(t)] = Σ_{j=1}^m [λ_j(t) − λ̄(t) + 1/m] e_j(t) e_j(t)', where λ̄(t) = (1/m) Σ_{j=1}^m λ_j(t), therefore

(6.38)  (d/dt) P[γ̃(t)]|_{t=0} = Σ_{j=1}^m [λ̇_j(0) − λ̄̇(0)] e_j e_j'
                               + Σ_{j=1}^m [λ_j − λ̄ + 1/m] [e_j ė_j(0)' + ė_j(0) e_j'],
        where λ̄̇(0) = (1/m) Σ_{j=1}^m λ̇_j(0).

Take γ̃'(0) = ṽ = E_ab, 1 ≤ a ≤ b ≤ k, in (6.38). From equations (6.36) and (6.37), we get that

(6.39)  (d/dt) P[γ̃(t)]|_{t=0} = d_Λ P(E_ab) =
            E_ab                                         if 1 ≤ a < b ≤ m,
            E_aa − (1/m) Σ_{j=1}^m E_jj                  if a = b ≤ m,
            (λ_a − λ̄ + 1/m)(λ_a − λ_b)^{-1} E_ab         if 1 ≤ a ≤ m < b ≤ k,
            0                                            if m < a ≤ b ≤ k.

Then from (6.32) and (6.39), we get that

(6.40)  d_µ̃ P(U E_ab U') =
            U E_ab U'                                    if 1 ≤ a < b ≤ m,
            U (E_aa − (1/m) Σ_{j=1}^m E_jj) U'           if a = b ≤ m,
            (λ_a − λ̄ + 1/m)(λ_a − λ_b)^{-1} U E_ab U'    if 1 ≤ a ≤ m < b ≤ k,
            0                                            if m < a ≤ b ≤ k.

From the description of the tangent space T_{P(µ̃)} M^k_m in (6.5), it is clear that d_µ̃ P(U E_ab U') ∈ T_{P(µ̃)} M^k_m for all a ≤ b.

Let us denote

(6.41)  F_ab = U E_ab U',  1 ≤ a ≤ m, a < b ≤ k,
(6.42)  F_a = U E_aa U',  1 ≤ a ≤ m.

Then from (6.40), we get that

(6.43)  d_µ̃ P(U E_ab U') =
            F_ab                                         if 1 ≤ a < b ≤ m,
            F_a − F̄                                      if a = b ≤ m,
            (λ_a − λ̄ + 1/m)(λ_a − λ_b)^{-1} F_ab         if 1 ≤ a ≤ m < b ≤ k,
            0                                            otherwise,

where F̄ = (1/m) Σ_{a=1}^m F_a. Note that the vectors {F_ab, F_a} in (6.41) and (6.42) are orthonormal and Σ_{a=1}^m (F_a − F̄) = 0. Hence from (6.43), we conclude that the subspace spanned by {d_µ̃ P(U E_ab U')} has dimension

        m(m − 1)/2 + (m − 1) + m(k − m) = km − 1 − m(m − 1)/2 = dim(M^k_m).

This proves that

        T_{P(µ̃)} M^k_m = Span{d_µ̃ P(U E_ab U')}_{a ≤ b}.

Consider the orthonormal basis {U E_ab U' : 1 ≤ a ≤ b ≤ k} of S(k, R). Define

(6.44)  F̃_a = Σ_{j=1}^m H_{aj} F_j,  1 ≤ a ≤ m − 1,

where H is an (m − 1) × m Helmert matrix, that is, HH' = I_{m−1} and H 1_m = 0. Then the vectors {F_ab} defined in (6.41) and {F̃_a} defined in (6.44) together form an orthonormal basis of T_{P(µ̃)} M^k_m. This is proved in Theorem 6.4.

Theorem 6.4. Let µ̃ be a nonfocal point in S(k, R) and let µ̃ = UΛU' be an s.v.d. of µ̃. (a) The projection map P : N(µ̃) → S(k, R) is smooth and its derivative dP : S(k, R) → T M^k_m is given by (6.40). (b) The vectors (matrices) {F_ab : 1 ≤ a ≤ m, a < b ≤ k} defined in (6.41) and {F̃_a : 1 ≤ a ≤ m − 1} defined in (6.44) together form an orthonormal basis of T_{P(µ̃)} M^k_m. (c) Let A ∈ S(k, R) ≡ T_µ̃ S(k, R) have coordinates ((a_ij))_{1 ≤ i ≤ j ≤ k} with respect to the orthonormal basis {U E_ij U'} of S(k, R); that is,

        A = Σ_{1 ≤ i ≤ j ≤ k} a_ij U E_ij U',
        a_ij = ⟨A, U E_ij U'⟩ = √2 U_i' A U_j if i < j, and a_ii = U_i' A U_i.

Then d_µ̃ P(A) has coordinates

        a_ij,  1 ≤ i < j ≤ m,
        ã_i,  1 ≤ i ≤ m − 1,
        (λ_i − λ̄ + 1/m)(λ_i − λ_j)^{-1} a_ij,  1 ≤ i ≤ m < j ≤ k


More information

ON THE SINGULAR DECOMPOSITION OF MATRICES

ON THE SINGULAR DECOMPOSITION OF MATRICES An. Şt. Univ. Ovidius Constanţa Vol. 8, 00, 55 6 ON THE SINGULAR DECOMPOSITION OF MATRICES Alina PETRESCU-NIŢǍ Abstract This paper is an original presentation of the algorithm of the singular decomposition

More information

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations. Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,

More information

CHAPTER 3. Gauss map. In this chapter we will study the Gauss map of surfaces in R 3.

CHAPTER 3. Gauss map. In this chapter we will study the Gauss map of surfaces in R 3. CHAPTER 3 Gauss map In this chapter we will study the Gauss map of surfaces in R 3. 3.1. Surfaces in R 3 Let S R 3 be a submanifold of dimension 2. Let {U i, ϕ i } be a DS on S. For any p U i we have a

More information

1 Last time: least-squares problems

1 Last time: least-squares problems MATH Linear algebra (Fall 07) Lecture Last time: least-squares problems Definition. If A is an m n matrix and b R m, then a least-squares solution to the linear system Ax = b is a vector x R n such that

More information

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course Spectral Algorithms I Slides based on Spectral Mesh Processing Siggraph 2010 course Why Spectral? A different way to look at functions on a domain Why Spectral? Better representations lead to simpler solutions

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Problems in Linear Algebra and Representation Theory

Problems in Linear Algebra and Representation Theory Problems in Linear Algebra and Representation Theory (Most of these were provided by Victor Ginzburg) The problems appearing below have varying level of difficulty. They are not listed in any specific

More information

DIFFERENTIAL GEOMETRY CLASS NOTES INSTRUCTOR: F. MARQUES. September 25, 2015

DIFFERENTIAL GEOMETRY CLASS NOTES INSTRUCTOR: F. MARQUES. September 25, 2015 DIFFERENTIAL GEOMETRY CLASS NOTES INSTRUCTOR: F. MARQUES MAGGIE MILLER September 25, 2015 1. 09/16/2015 1.1. Textbooks. Textbooks relevant to this class are Riemannian Geometry by do Carmo Riemannian Geometry

More information

ALMOST COMPLEX PROJECTIVE STRUCTURES AND THEIR MORPHISMS

ALMOST COMPLEX PROJECTIVE STRUCTURES AND THEIR MORPHISMS ARCHIVUM MATHEMATICUM BRNO Tomus 45 2009, 255 264 ALMOST COMPLEX PROJECTIVE STRUCTURES AND THEIR MORPHISMS Jaroslav Hrdina Abstract We discuss almost complex projective geometry and the relations to a

More information

B 1 = {B(x, r) x = (x 1, x 2 ) H, 0 < r < x 2 }. (a) Show that B = B 1 B 2 is a basis for a topology on X.

B 1 = {B(x, r) x = (x 1, x 2 ) H, 0 < r < x 2 }. (a) Show that B = B 1 B 2 is a basis for a topology on X. Math 6342/7350: Topology and Geometry Sample Preliminary Exam Questions 1. For each of the following topological spaces X i, determine whether X i and X i X i are homeomorphic. (a) X 1 = [0, 1] (b) X 2

More information

Gaussian automorphisms whose ergodic self-joinings are Gaussian

Gaussian automorphisms whose ergodic self-joinings are Gaussian F U N D A M E N T A MATHEMATICAE 164 (2000) Gaussian automorphisms whose ergodic self-joinings are Gaussian by M. L e m a ńc z y k (Toruń), F. P a r r e a u (Paris) and J.-P. T h o u v e n o t (Paris)

More information

A CONSTRUCTION OF TRANSVERSE SUBMANIFOLDS

A CONSTRUCTION OF TRANSVERSE SUBMANIFOLDS UNIVERSITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XLI 2003 A CONSTRUCTION OF TRANSVERSE SUBMANIFOLDS by J. Szenthe Abstract. In case of Riemannian manifolds isometric actions admitting submanifolds

More information

Metrics and Holonomy

Metrics and Holonomy Metrics and Holonomy Jonathan Herman The goal of this paper is to understand the following definitions of Kähler and Calabi-Yau manifolds: Definition. A Riemannian manifold is Kähler if and only if it

More information

(x, y) = d(x, y) = x y.

(x, y) = d(x, y) = x y. 1 Euclidean geometry 1.1 Euclidean space Our story begins with a geometry which will be familiar to all readers, namely the geometry of Euclidean space. In this first chapter we study the Euclidean distance

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

LECTURE: KOBORDISMENTHEORIE, WINTER TERM 2011/12; SUMMARY AND LITERATURE

LECTURE: KOBORDISMENTHEORIE, WINTER TERM 2011/12; SUMMARY AND LITERATURE LECTURE: KOBORDISMENTHEORIE, WINTER TERM 2011/12; SUMMARY AND LITERATURE JOHANNES EBERT 1.1. October 11th. 1. Recapitulation from differential topology Definition 1.1. Let M m, N n, be two smooth manifolds

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

LECTURE 2: SYMPLECTIC VECTOR BUNDLES

LECTURE 2: SYMPLECTIC VECTOR BUNDLES LECTURE 2: SYMPLECTIC VECTOR BUNDLES WEIMIN CHEN, UMASS, SPRING 07 1. Symplectic Vector Spaces Definition 1.1. A symplectic vector space is a pair (V, ω) where V is a finite dimensional vector space (over

More information

Lecture 1: Review of linear algebra

Lecture 1: Review of linear algebra Lecture 1: Review of linear algebra Linear functions and linearization Inverse matrix, least-squares and least-norm solutions Subspaces, basis, and dimension Change of basis and similarity transformations

More information

William P. Thurston. The Geometry and Topology of Three-Manifolds

William P. Thurston. The Geometry and Topology of Three-Manifolds William P. Thurston The Geometry and Topology of Three-Manifolds Electronic version 1.1 - March 00 http://www.msri.org/publications/books/gt3m/ This is an electronic edition of the 1980 notes distributed

More information

DIFFERENTIAL GEOMETRY, LECTURE 16-17, JULY 14-17

DIFFERENTIAL GEOMETRY, LECTURE 16-17, JULY 14-17 DIFFERENTIAL GEOMETRY, LECTURE 16-17, JULY 14-17 6. Geodesics A parametrized line γ : [a, b] R n in R n is straight (and the parametrization is uniform) if the vector γ (t) does not depend on t. Thus,

More information

Linear Algebra & Geometry why is linear algebra useful in computer vision?

Linear Algebra & Geometry why is linear algebra useful in computer vision? Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia

More information

Reconstruction and Higher Dimensional Geometry

Reconstruction and Higher Dimensional Geometry Reconstruction and Higher Dimensional Geometry Hongyu He Department of Mathematics Louisiana State University email: hongyu@math.lsu.edu Abstract Tutte proved that, if two graphs, both with more than two

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

Smooth Dynamics 2. Problem Set Nr. 1. Instructor: Submitted by: Prof. Wilkinson Clark Butler. University of Chicago Winter 2013

Smooth Dynamics 2. Problem Set Nr. 1. Instructor: Submitted by: Prof. Wilkinson Clark Butler. University of Chicago Winter 2013 Smooth Dynamics 2 Problem Set Nr. 1 University of Chicago Winter 2013 Instructor: Submitted by: Prof. Wilkinson Clark Butler Problem 1 Let M be a Riemannian manifold with metric, and Levi-Civita connection.

More information

Detecting submanifolds of minimum volume with calibrations

Detecting submanifolds of minimum volume with calibrations Detecting submanifolds of minimum volume with calibrations Marcos Salvai, CIEM - FaMAF, Córdoba, Argentina http://www.famaf.unc.edu.ar/ salvai/ EGEO 2016, Córdoba, August 2016 It is not easy to choose

More information

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016 Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen

More information

2. LINEAR ALGEBRA. 1. Definitions. 2. Linear least squares problem. 3. QR factorization. 4. Singular value decomposition (SVD) 5.

2. LINEAR ALGEBRA. 1. Definitions. 2. Linear least squares problem. 3. QR factorization. 4. Singular value decomposition (SVD) 5. 2. LINEAR ALGEBRA Outline 1. Definitions 2. Linear least squares problem 3. QR factorization 4. Singular value decomposition (SVD) 5. Pseudo-inverse 6. Eigenvalue decomposition (EVD) 1 Definitions Vector

More information

Affine Connections: Part 2

Affine Connections: Part 2 Affine Connections: Part 2 Manuscript for Machine Learning Reading Group Talk R. Simon Fong Abstract Note for online manuscript: This is the manuscript of a one hour introductory talk on (affine) connections.

More information

SUBTANGENT-LIKE STATISTICAL MANIFOLDS. 1. Introduction

SUBTANGENT-LIKE STATISTICAL MANIFOLDS. 1. Introduction SUBTANGENT-LIKE STATISTICAL MANIFOLDS A. M. BLAGA Abstract. Subtangent-like statistical manifolds are introduced and characterization theorems for them are given. The special case when the conjugate connections

More information

Lecture 6: Principal bundles

Lecture 6: Principal bundles Lecture 6: Principal bundles Jonathan Evans 6th October 2010 Jonathan Evans () Lecture 6: Principal bundles 6th October 2010 1 / 12 Jonathan Evans () Lecture 6: Principal bundles 6th October 2010 2 / 12

More information

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations Linear Algebra in Computer Vision CSED441:Introduction to Computer Vision (2017F Lecture2: Basic Linear Algebra & Probability Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Mathematics in vector space Linear

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

Determinant lines and determinant line bundles

Determinant lines and determinant line bundles CHAPTER Determinant lines and determinant line bundles This appendix is an exposition of G. Segal s work sketched in [?] on determinant line bundles over the moduli spaces of Riemann surfaces with parametrized

More information

MATH Linear Algebra

MATH Linear Algebra MATH 304 - Linear Algebra In the previous note we learned an important algorithm to produce orthogonal sequences of vectors called the Gramm-Schmidt orthogonalization process. Gramm-Schmidt orthogonalization

More information

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Math 408 Advanced Linear Algebra

Math 408 Advanced Linear Algebra Math 408 Advanced Linear Algebra Chi-Kwong Li Chapter 4 Hermitian and symmetric matrices Basic properties Theorem Let A M n. The following are equivalent. Remark (a) A is Hermitian, i.e., A = A. (b) x

More information

Nonparametric Bayes Inference on Manifolds with Applications

Nonparametric Bayes Inference on Manifolds with Applications Nonparametric Bayes Inference on Manifolds with Applications Abhishek Bhattacharya Indian Statistical Institute Based on the book Nonparametric Statistics On Manifolds With Applications To Shape Spaces

More information

ARITHMETICITY OF TOTALLY GEODESIC LIE FOLIATIONS WITH LOCALLY SYMMETRIC LEAVES

ARITHMETICITY OF TOTALLY GEODESIC LIE FOLIATIONS WITH LOCALLY SYMMETRIC LEAVES ASIAN J. MATH. c 2008 International Press Vol. 12, No. 3, pp. 289 298, September 2008 002 ARITHMETICITY OF TOTALLY GEODESIC LIE FOLIATIONS WITH LOCALLY SYMMETRIC LEAVES RAUL QUIROGA-BARRANCO Abstract.

More information

. = V c = V [x]v (5.1) c 1. c k

. = V c = V [x]v (5.1) c 1. c k Chapter 5 Linear Algebra It can be argued that all of linear algebra can be understood using the four fundamental subspaces associated with a matrix Because they form the foundation on which we later work,

More information

Tangent bundles, vector fields

Tangent bundles, vector fields Location Department of Mathematical Sciences,, G5-109. Main Reference: [Lee]: J.M. Lee, Introduction to Smooth Manifolds, Graduate Texts in Mathematics 218, Springer-Verlag, 2002. Homepage for the book,

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information

Lecture 11: Clifford algebras

Lecture 11: Clifford algebras Lecture 11: Clifford algebras In this lecture we introduce Clifford algebras, which will play an important role in the rest of the class. The link with K-theory is the Atiyah-Bott-Shapiro construction

More information

Chapter 2 Linear Transformations

Chapter 2 Linear Transformations Chapter 2 Linear Transformations Linear Transformations Loosely speaking, a linear transformation is a function from one vector space to another that preserves the vector space operations. Let us be more

More information

Throughout these notes we assume V, W are finite dimensional inner product spaces over C.

Throughout these notes we assume V, W are finite dimensional inner product spaces over C. Math 342 - Linear Algebra II Notes Throughout these notes we assume V, W are finite dimensional inner product spaces over C 1 Upper Triangular Representation Proposition: Let T L(V ) There exists an orthonormal

More information

Part IB Geometry. Theorems. Based on lectures by A. G. Kovalev Notes taken by Dexter Chua. Lent 2016

Part IB Geometry. Theorems. Based on lectures by A. G. Kovalev Notes taken by Dexter Chua. Lent 2016 Part IB Geometry Theorems Based on lectures by A. G. Kovalev Notes taken by Dexter Chua Lent 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Transparent connections

Transparent connections The abelian case A definition (M, g) is a closed Riemannian manifold, d = dim M. E M is a rank n complex vector bundle with a Hermitian metric (i.e. a U(n)-bundle). is a Hermitian (i.e. metric) connection

More information

CS 143 Linear Algebra Review

CS 143 Linear Algebra Review CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see

More information

Matrix Lie groups. and their Lie algebras. Mahmood Alaghmandan. A project in fulfillment of the requirement for the Lie algebra course

Matrix Lie groups. and their Lie algebras. Mahmood Alaghmandan. A project in fulfillment of the requirement for the Lie algebra course Matrix Lie groups and their Lie algebras Mahmood Alaghmandan A project in fulfillment of the requirement for the Lie algebra course Department of Mathematics and Statistics University of Saskatchewan March

More information

Self-intersections of Closed Parametrized Minimal Surfaces in Generic Riemannian Manifolds

Self-intersections of Closed Parametrized Minimal Surfaces in Generic Riemannian Manifolds Self-intersections of Closed Parametrized Minimal Surfaces in Generic Riemannian Manifolds John Douglas Moore Department of Mathematics University of California Santa Barbara, CA, USA 93106 e-mail: moore@math.ucsb.edu

More information

1 Hermitian symmetric spaces: examples and basic properties

1 Hermitian symmetric spaces: examples and basic properties Contents 1 Hermitian symmetric spaces: examples and basic properties 1 1.1 Almost complex manifolds............................................ 1 1.2 Hermitian manifolds................................................

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

COMP 558 lecture 18 Nov. 15, 2010

COMP 558 lecture 18 Nov. 15, 2010 Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to

More information

Clustering VS Classification

Clustering VS Classification MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:

More information

The Geometry of Matrix Rigidity

The Geometry of Matrix Rigidity The Geometry of Matrix Rigidity Joseph M. Landsberg Jacob Taylor Nisheeth K. Vishnoi November 26, 2003 Abstract Consider the following problem: Given an n n matrix A and an input x, compute Ax. This problem

More information

Then A n W n, A n is compact and W n is open. (We take W 0 =(

Then A n W n, A n is compact and W n is open. (We take W 0 =( 9. Mon, Nov. 4 Another related concept is that of paracompactness. This is especially important in the theory of manifolds and vector bundles. We make a couple of preliminary definitions first. Definition

More information

CHARACTERISTIC CLASSES

CHARACTERISTIC CLASSES 1 CHARACTERISTIC CLASSES Andrew Ranicki Index theory seminar 14th February, 2011 2 The Index Theorem identifies Introduction analytic index = topological index for a differential operator on a compact

More information

STA6557 Object Data Analysis SPRING 2017

STA6557 Object Data Analysis SPRING 2017 STA6557 Object Data Analysis SPRING 2017 Days/Time/Room: TT 9:30 AM -10:45 AM OSB 205 Instructor: Vic Patrangenaru Office: 208 OSB E-mail: vic@stat.fsu.edu Office hours: TT 11:00 AM - 11:50 AM Textbook:

More information

arxiv: v1 [math.gr] 8 Nov 2008

arxiv: v1 [math.gr] 8 Nov 2008 SUBSPACES OF 7 7 SKEW-SYMMETRIC MATRICES RELATED TO THE GROUP G 2 arxiv:0811.1298v1 [math.gr] 8 Nov 2008 ROD GOW Abstract. Let K be a field of characteristic different from 2 and let C be an octonion algebra

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Means and Antimeans. Vic Patrangenaru 1,K. David Yao 2, Ruite Guo 1 Florida State University 1 Department of Statistics, 2 Department of Mathematics

Means and Antimeans. Vic Patrangenaru 1,K. David Yao 2, Ruite Guo 1 Florida State University 1 Department of Statistics, 2 Department of Mathematics Means and Antimeans Vic Patrangenaru 1,K. David Yao 2, Ruite Guo 1 Florida State University 1 Department of Statistics, 2 Department of Mathematics May 10, 2015 1 Introduction Fréchet (1948) noticed that

More information

Changing coordinates to adapt to a map of constant rank

Changing coordinates to adapt to a map of constant rank Introduction to Submanifolds Most manifolds of interest appear as submanifolds of others e.g. of R n. For instance S 2 is a submanifold of R 3. It can be obtained in two ways: 1 as the image of a map into

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Pose estimation from point and line correspondences

Pose estimation from point and line correspondences Pose estimation from point and line correspondences Giorgio Panin October 17, 008 1 Problem formulation Estimate (in a LSE sense) the pose of an object from N correspondences between known object points

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University September 22, 2005 0 Preface This collection of ten

More information

Euler Characteristic of Two-Dimensional Manifolds

Euler Characteristic of Two-Dimensional Manifolds Euler Characteristic of Two-Dimensional Manifolds M. Hafiz Khusyairi August 2008 In this work we will discuss an important notion from topology, namely Euler Characteristic and we will discuss several

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

Differential Geometry MTG 6257 Spring 2018 Problem Set 4 Due-date: Wednesday, 4/25/18

Differential Geometry MTG 6257 Spring 2018 Problem Set 4 Due-date: Wednesday, 4/25/18 Differential Geometry MTG 6257 Spring 2018 Problem Set 4 Due-date: Wednesday, 4/25/18 Required problems (to be handed in): 2bc, 3, 5c, 5d(i). In doing any of these problems, you may assume the results

More information

10. Smooth Varieties. 82 Andreas Gathmann

10. Smooth Varieties. 82 Andreas Gathmann 82 Andreas Gathmann 10. Smooth Varieties Let a be a point on a variety X. In the last chapter we have introduced the tangent cone C a X as a way to study X locally around a (see Construction 9.20). It

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors /88 Chia-Ping Chen Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Eigenvalue Problem /88 Eigenvalue Equation By definition, the eigenvalue equation for matrix

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Acta Mathematica Academiae Paedagogicae Nyíregyháziensis 26 (2010), ISSN

Acta Mathematica Academiae Paedagogicae Nyíregyháziensis 26 (2010), ISSN Acta Mathematica Academiae Paedagogicae Nyíregyháziensis 6 (010), 359 375 www.emis.de/journals ISSN 1786-0091 EXAMPLES AND NOTES ON GENERALIZED CONICS AND THEIR APPLICATIONS Á NAGY AND CS. VINCZE Abstract.

More information