Curve alignment and functional PCA

Curve alignment and functional PCA Juhyun Par* Department of Mathematics and Statistics, Lancaster University, Lancaster, U.K. juhyun.par@lancaster.ac.u Abstract When dealing with multiple curves as functional data, it is a common practice to apply functional PCA to summarize and characterize random variation in finite dimension. Often functional data however exhibits additional variability that distorts the assumed common structure. This is recognized as the problem of curve registration. While the registration step is routinely employed, this is considered as a preprocessing step prior to any serious analysis. Consequently, the effect of alignment is mostly ignored in subsequent analyses and is not well understood. We revisit the issue by particularly focusing on the effect of variability on the FPCA and illustrate the phenomena from a borrowed perturbation viewpoint. This allows us to quantify bias in estimating eigenvalues and eigenfunctions when curves are not aligned. Some discussion on statistical implication is given. Keywords functional data analysis; curve alignment; registration; functional principal component analysis. 1 Introduction Repeated measurements in the form of curves are increasingly common in various scientific applications including biomedicine, physical sciences and econometrics. Usually the sample of curves is assumed to have some homogeneous structure in functional shape, while allowed for individual variability. Summarising and characterising variability in parsimonious manner is one of the aims of functional data analysis (Ramsay and Silverman, 2002, 2005). It is much

desirable that a few common components are able to extract most variability and are easy to interpret (Par et al. 2007). Functional PCA utilises the well-nown Karhunen-Loève expansion to provide an optimal representation of the function with a small number of common components. This is based on the assumption that the underlying random function shares the common mean and covariance function. Functional PCA also provides a useful representation when functional linear model is employed with functional explanatory variables. Figure 1 about here. Often functional data exhibits additional variability. Figure 1 shows a typical example of functional data, taen from the famous Zürich longitudinal growth studies. (Gasser et al., 1984). The left hand side plot shows height measurements of boys taen over years since childhood, with measurements points mared by circles. An observation that these are monotonically increasing does not seem so informative but simply confirms that children grow over! The plot on the right shows estimated velocity curves. Now distinctive features are apparent, which is often the case for functional data that consideration of the 1st or 2nd derivatives allows more meaningful comparison. In particular with children growth processes, it is well nown that each individual reaches its maximum growth at different, while the overall process shares common features between children. The presence of bumps appearing in the velocity curves was considered as an important characteristic of growth processes and not surprisingly the occurrence of such bumps varies between child to child. It is natural to allow variability but the consequence may sound severe in that a simple averaging would not produce a proper mean and most common statistics cannot be pursued in a usual manner. In the longitudinal data framewor, this phenomenon may be described by mixed effect models but there seems no notion of phase variability itself. Alternative approach would be to tae shape invariant models, which is a semi-parametric approach where a parametric form is assumed for phase variability. If no parametric form is assumed, the model is equivalent to those for curve alignments in functional data. This is treated as registration problem in functional data analysis, for which several methods have been developed. Basically when the functions exhibit identifiable features, curves can be aligned to match those features or landmars (Gasser and Kneip, 1995). This wors well as long as features are correctly identified. Several other methods have been developed to automate the procedure when the features are less prominent, see Ramsay and Silverman (2005) and references therein. Although the issue of phase variability has been rightly acnowledged, most analyses treat registration as a preprocessing step and thus do not consider any carry-on effects on later analyses. We point out that in practice, lie any other statistical procedure, the

registration step requires careful human intervention and experiences as well as additional estimation and smoothing step to mae it right. What happens then if registration was not carried out or was done improperly? This question is also relevant when, in modern applied sciences with abundant data, often PCA or a type of factor analysis is carried out to find a low-dimensional representation, ignoring phase variability. Most theoretical analyses view the goal of registration as finding a correct mean function and do not go beyond it. It may sound absurd to worry about the second moment property when the first moment is already in trouble. But as will be explained later, this is not necessarily a linear process and there is a systematic relationship between processes with and without phase variability. For example, Figure 2 demonstrates the effect of additional variability on the number of significant components in function PCA. The top row in Figure 2 shows an example of data with (right) and without (left) variability and the bottom row is the result of functional PCA. When additional variability is present, PCA produces many more uncorrelated principal components scores than anticipated (bottom right). Some issues with interpretability in PCA may be attributed to an improper registration. Figure 2 about here. A recent wor of Kneip and Ramsay (2007) addresses a similar issue (and more) and propose a new procedure to combine registration to fit functional PCA models, extending the convex averaging idea of registration (Liu and Müller, 2004). We also revisit the issue but our focus lies on understanding and quantifying the effect of imperfect registration on functional PCA. This is illustrated based on perturbation theory and numerical examples. Some discussion is given for a simple diagnostic chec for registration methods and a correction for functional PCA. 2 fpca with variability 2.1 Notation for fpca To fix the idea, consider a stochastic process X L 2 (T ) with compact support T = [0, T ], with the mean function µ(t) = E[X(t)] and the covariance function γ(s, t) = Cov(X(s), X(t)). Assume that T E[X(t)2 ] <. Let λ 1 λ 2... be the ordered eigenvalues of the covariance operator defined through γ with the corresponding eigenfunctions φ 1, φ 2,.... We assume that λ <. Then X(t) = µ(t) + ξ φ (t), (1) =1

where E[ξ] = 0 and E[ξ j ξ ] = λ j I(j = ). Thus in our analysis we will automatically assume that X admits the representation of (1). With a sample of curves available, these quantities are replaced by their estimates and a finite number of components {φ } are usually considered sufficient to extract significant observed variation. Theoretical properties of estimators are studied in Dauxois et al. (1982), Rice and Silverman (1991), Kenip (1994) and Hall et al. (2006). 2.2 Time variability and warping function When variability is present, we observe a sample from Y ( ) = X(η( )), where η( ) is called a warping function to describe transformation on axis. The warping function η( ) is assumed to satisfy the following conditions: (A1) η is a smooth monotone function; (A2) E[η(t)] = t for all t T ; (A3) T Var[η(t)] <. (A4) η is independent of amplitude variability (ξs). These are standard assumptions when dealing with registration (Gasser and Kneip, 1995). 2.3 Covariance function estimation without registration Suppose that {Y 1,, Y n } are independent random functions from Y. An empirical covariance function estimator is ˆγ Y (s, t) = 1 n {Y i (s) n Ȳ (s)}{y (t) Ȳ (t)}, i=1 where Ȳ (t) = 1 n n i=1 Y i(s). To study statistical properties of the estimator, this may be written as ˆγ Y = γ Y + n 1/2 {n 1/2 (ˆγ Y γ Y )}, (2) where γ Y (s, t) = Cov(Y (s), Y (t)). The usual asymptotic properties of functional PCA rely on the convergence of the latter term {n 1/2 (ˆγ Y γ Y )}, see Hall and Hosseini-Nasab (2006). However, closeness to γ Y is not so much of interest when additional variability is present. Instead we view the estimator as an approximation to γ X and thus decompose the estimator as ˆγ Y = γ X + (γ Y γ X ) + (ˆγ Y γ Y ).

The term ˆγ Y γ Y follows the usual asymptotics discussed with (2) and is negligible compared to the bias term γ Y γ X. Hence, from now on we will focus on the bias term only. 2.4 Approximation to covariance function We introduce the following notations to describe the observed process Y. µ(s) = E[Y (s)] φ(s) = φ(η(s)) Also let us define γ η (s, t) = E[{η(s) s}{η(t) t}]. Replacing t by η(t) in (1), we obtain a representation of Y. This is not any longer orthonomal decomposition but provides a useful starting point to study ˆγ Y. Combined with independence between η and ξ in (A4) we obtain E[ˆγ Y (s, t)] = E[{ µ(s) µ η (s)}{ µ(s) µ η (t)}] + n 1 λ E[φ (η i (s)) n φ (s) φ (t)]. Under further smoothness assumptions, the expression can be simplified further, which we summarise in the following proposition. Proposition 1. Assume that γ(s, t) = λ φ (s)φ (t) and has continuous second derivatives. Then, we have γ Y (s, t) = γ(s, t) + γ η v(s, t) + o( γ η ) where v(s, t) = γ η(s, t) γ η µ (s)µ (t) + γ η(s, t) γ η + γ η(s, s) γ η λ φ (s)φ (t) λ 2 φ (s)φ (t) + γ η(t, t) γ η λ 2 φ (s)φ (t). Proof of Proposition 1 To simplify the expression, observe that from (A2) any smooth function g of η can be approximated using Taylor expansion as g(η(s)) = g(s) + g (s)(η(s) s) + g (s) 2 (η(s) s)2 + o p (1).

Applying this approximation to µ and φ leads to γ Y (s, t) = γ(s, t) + γ η (s, t)µ (s)µ (t) +γ η (s, t) + γ η(t, t) 2 λ φ (s)φ (t) + γ η(s, s) 2 λ φ (s)φ (t) + o( γ η ) λ φ (s)φ (t) where γ η = sup γ η (s, t). Note that ˆγ Y is not an unbiased estimator and γ Y = E[ n 1 ˆγ n Y ]. Though not neglible itself, phase variability is still assumed to be small relative to amplitude variability.. This opens up the possibility of viewing the bias as additive perturbation. If we assume that the process is well approximated by a finite number of basis functions, it is equivalent to wor with a finite dimensional operator. Then the problem is reduced to finding eigenvalues and eigenfunctions for a perturbed covariance operator. 2.5 Perturbation theory of eigenvalue problem Consider the covariance function of the form γ ε (s, t) = γ(s, t) + εv(s, t) for some function v. Define the corresponding operator by Γ ε φ = γ ε φ for φ L 2. We write the corresponding covariance operator by Γ ε = Γ + εv. The perturbation theory formulates the problem of finding eigenvalues and eigenfunctions as finding approximation terms in order of ε. where Γφ = λ φ = 0, 1, 2, Γ ε φ (ε) = λ (ε)φ (ε) λ (ε) = λ + ελ (1) + ε 2 λ (2) + φ (ε) = φ + εφ (1) + ε 2 φ (2) +

Basically, since the {φ l }s are orthonomal bases, any function can be written in terms of φ l. If φ (r) = l c lφ l and the constants can be determined inductively. If we require that φ (ε) be normalised, then 1 =< φ (ε), φ (ε) >. For illustration, below are given first and second order approximation, which quantify the bias. The expansion is not limited to any order but we only list first two order approximation results for simplification. As a reference, general rth order approximation is provided in Appendix. Here the eigenfunction expansions assume that eigenvalues are distinct. First order approximation: λ (1) = < V φ, φ > φ (1) = l < V φ, φ l > φ l λ λ l Therefore we may write λ (ε) = λ + ε < φ, V φ > +O(ε 2 ) φ (ε) = φ + ε < φ, V φ l > φ l + O(ε 2 ). λ λ l l Second order approximation: λ (2) = < V φ, φ (1) >= l φ (2) = c φ + m c m φ m < V φ, φ l > 2 λ λ l c = 1 2 < φ(1), φ(1) >= 1 < V φ, φ l > 2 2 (λ λ l ) 2 l c m = l < V φ, φ l >< V φ m, φ l > (λ λ l )(λ λ m ) < V φ, φ >< V φ, φ m > (λ λ m ) 2 Note that this framewor is not limited to the case of registration error. Additional smoothing bias, sampling error or discrete approximation error can be formulated in a similar

manner. For example Silverman (1996) used a similar argument to show benefit of smoothing in functional PCA. In the case of sampling error, as in (2), the expression is given with ε = n 1/2, which was the basis of asymptotic expansions of eigenvalues and eigenfunctions in Hall and Hosseini-Nasab (2006). However note that unlie their results, bias here does not diminish as sample size grows. 2.6 Numerical example To illustrate the approximation numerically, we revisit one component model, shown in Figure 2 and defined by x i (s) = a i f(s) a i i.i.d. Normal(µ a, σ 2 a) y i (s) = a i f(s β i ) β i i.i.d. Normal(0, σ 2 β). It follows that µ(s) = µ a f(s), µ(η i (s)) = µ a f(s β i ). For this model, we have only one significant component with λ = σ 2 a f 2, φ(s) = f(s) f. Although it is possible to derive the exact covariance function, we find it does not help much gaining insight into the problem. Instead we use the approximation to covariance function as where γ η (s, t) = σβ 2 γ y (s, t) = γ x (s, t) + σβv(s, 2 t) + o(σβ) 2, v(s, t) = (µ 2 a + σ 2 a)f (s)f (t) + σ2 a 2 {f (s)f(t) + f(s)f (t)} The first order approximation results gives λ ε λ + σβ 2 < f, V f >. f 2

For a concrete example, we consider a special case with f(s) = 1 2π exp ( ) s2 2. With some elementary manipulation, it can be shown that v(s, t) = σ2 a 2 f(s)f(t){(s + t)2 2} λ ε = σ2 a 2 π ( σ2 βσa 2 1 1 2 ) + o(σ 2 β ) 2 = 0.2881σa 2 0.6464σβσ 2 a 2 + o(σβ) 2. Thus, without registration, we would underestimate the eigenvalue. Depending on the magnitude of σ β and σ α, the error term may be significant. Figure 3 about here. For numerical comparison, we simulate 50 curves truncated on [-6,6] with σ a = 2, σ β = 0.5 and estimate the eigenvalues and eigenfunctions of the empirical covariance function estimator. A landmar registration is used to align the curves based on the pea. Results are based on 100 simulations. Figure 3 shows box plots for 4 leading eigenvalues estimated based on x i (0), y i (1), ˆx i (2) respectively and the first order correction (4) suggested in Section 3.1. As expected, the eigenvalues for unregistered curves show spurious variation. Effect of registration is reflected in (2). The bias corrected ones are more variable but can improve the estimation. However there is no guarantee that the ordering will be preserved. We also find that improvements in corrected eigenfunction estimations are rather disappointing. But we have not tried to improve estimation of eigenfunctions, as we consider this beyond the scope of this paper. In addition, for our theoretical derivation and bias approximation, we could ignore all other factors such as smoothing, discrete approximation, different registration methods and possibly higher order approximation errors. In contrast, numerical results will be affected by all these additional factors beyond control. This is particularly relevant to estimating eigenfunctions. 3 Discussion and statistical considerations We focus on the first order approximation. As Γ and Γ ε are exchangeable, we may rewrite the approximation results in terms of λ and φ as λ = λ (ε) ε < φ (ε), V φ (ε) > +O(ε 2 ) = < φ (ε), Γφ (ε) > +O(ε 2 ) φ = φ (ε) < φ (ε), Γφ l (ε) > φ l (ε) + O(ε 2 ). λ (ε) λ l (ε) l

Note that even if λ may have ties, it is unliely that the estimates of λ (ε) have ties. To some extent, this formulation suggests a procedure to recover the true values of λ and φ but this requires the full nowledge of Γ which is unnown. Nevertheless, these have two usage. Firstly combined with our imperfect registration, we can devise a simple bias correction procedure. 3.1 Bias correction based on registration 1. Functional PCA with the original data {y i } produces (ˆΓ ε, ˆλ (ε), ˆφ (ε)); 2. Applying registration method gives (ˆx i, ˆη i ) and thus (ˆΓ, ˆλ, ˆφ ); 3. Bias corrected estimators; ˆλ,2 = < ˆφ (ε), ˆΓ ˆφ (ε) > ˆφ,2 = ˆφ (ε) l < ˆφ (ε), ˆΓ ˆφ l (ε) > ˆλ (ε) ˆλ l (ε). Apparently, when the registration is immaculate, there will be no need of bias correction step and it is best to use (ˆλ, ˆφ ) for further inference. To chec if bias correction is necessary, we can apply a simple diagnostic chec as explined below. 3.2 Diagnostics for registration There is no standard diagnostic for checing the registration step, other than heuristic inspection. Most registration methods however produce not only the registered curves but also the warping functions η. The warping functions are relatively simple functions but we have shown that these are directly related to the performance of the estimators. So the first step would be to examine these functions. Recall that ε = sup γ η. If registration is properly done, ε should be small. Moreover, if curves are already aligned, an additional registration should not alter the curves too much. Thus, we may employ the following diagnostic procedure: 1. Register {y i } and estimate (ˆx i, ˆη i ); 2. Calculate ˆγ η. sup ˆγ η is the order of magnitude in error; 3. Register {ˆx i } and estimate (ˆx (2) i, ˆφ (2) i );

4. Estimate ˆγ η (2). If it is not negligible, then correct bias. However, It may not be practical to correct all eigenfunctions, as these are much more subtle and the recovery relies on the full eigenspace. Nevertheless, one way of avoiding the pitfall with selection of eigenfunctions, as demonstrated in Figure 2, is at least to correct eigenvalues and determine the importance of eigenfunctions in relation to the corresponding eigenvalues. In our discussion, we have not specified any particular registration method and these are suggested as a general principle. Comparison of γ η may be useful when different registration methods are employed. Alternatively, the moment based registration by James (2008) incorporates this idea by directly penalizing the deviation of η in estimation. More generally, the method developed by Kneip and Ramsay (2007) is based on a template matching and can be viewed as an elaborate implementation of our proposal, by iteratively updating registration based on functional PCA. Appendix General rth order approximation: Aφ (0) Assume that A is real symmetric and satisfies = λ (0) φ(0) = 0, 1,,. For identifiability we ssume that (φ, φ ) = 1 and (φ j, φ ) = 0 for j. Consider where (A + εb)φ (ε) = λ (ε)φ (ε) λ (ε) = λ (0) + ελ (1) + ε 2 λ (2) + φ (ε) = φ (0) + εφ (1) + ε 2 φ (2) +. Here λ = λ (0) and φ = φ (0). The equation may be expressed as (A + εb)(φ + εφ (1) + ε 2 φ (2) + ) = (λ + ελ (1) + ε 2 λ (2) + )(φ + εφ (1) + ε 2 φ (2) + ). Assuming the existence of infinite sum series, we have for r 1 Aφ (r) λ φ (r) = λ (r) φ + λ (r 1) φ (1) Denote the right hand side term by f so that f = λ (r) φ + λ (r 1) φ (1) + + λ (1) + + λ (1) φ(r 1) φ(r 1) Bφ (r 1). Bφ (r 1). (3)

and write Aφ (r) Observe that the lefthand side term Aφ (r) < φ l, Aφ (r) In other words, we have λ φ (r) λ φ (r) = f satisifies the following: λ φ (r) > = < (A λ )φ l, φ (r) > { 0 if l = = (λ l λ ) < φ l, φ (r) > if l < φ, f >= 0 < φ l, f >= (λ l λ ) < φ l, φ (r) > l. (4) Equating this with f in place generates λ (r) for r = 1 leads to results stated in Section 2.5. approximation, let us define α (r),l =< φ(r), φ l > so that φ (r) first order approximation results can be summarised as λ (1) = < Bφ, φ > α (1), = 0, α (1),l = < Bφ, φ l > λ λ l l. and φ (r) sequentially. In particular solving Before proceeding this for higher order = α (r), φ + l α(r),l φ l. Then the Consider r 2. For eigenvalue approximation, from (4), the rth order term λ (r) obtained from λ (r) = l α (r 1),l < Bφ, φ l > λ (r 1) α (1), λ(1) α(r 1),. can be For eigenfunction approximation, we need to derive coefficients α (r),l. It can be seen that the coefficient α (r), is related to the normalising condition as 2α (2), = j {α (1),j }2 2α (r), = j α (1),j α(r 1),j j α (2),j α(r 2),j j α (r 1),j α (1),j r 2. The remaining coefficients α (r),l are obtained from (λ λ l )α (r),l = j α (r 1),l < Bφ j, φ l > λ (r 1) α (1),l λ(1) α(r 1),l. To complete calculation, it is necessary to iterate between λ (r) λ (r 1) α (r 1),l ( or φ (r 1) ) λ (r) α (r),l ( or φ(r) ). and φ (r) sequentially, that is,

References [1] Dauxois, J. Pousse, A. and Romain, Y. (1982) Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. Journal of Multivariate Analysis, 12, 136-154. [2] Liu, X. and Müller, H. G. (2004) Functional convex averaging and synchronization for -warped random curves. Journal of the American Statistical Association, 99, 687-699. [3] Gasser, T. and Kneip, A. (1995) Searching for structure in curve samples. Journal of the American Statistical Association, 90, 1179-1188. [4] Gasser, T. Müller, H. G., Köhler, W., Molinari, L. and Prader, A. (1984), Nonparametric regression analysis of growth curves, Annals of Statistics, 12, 210-229. [5] Hall, P, Müller, H. G. and Wang, J. L. (2006). Properties of principal componenet methods for functional and longitudinal data analysis. Annals of Statistics, 34, 1493-1517. [6] Kenip, A. (1994) Nonparametric estimation of common regressors for similar curve data. Annals of Statistics, 22, 1386-1427. [7] Kneip, A. and Ramsay, J. O. (2007) Combining registration and fitting for functional models. technical report. [8] Par, J. Gasser, T. and Rousson, V. (2007) Structural components in functional data. technical report. [9] Ramsay, J. O. and Silverman, B. W. (2002) Applied functional data analysis, New Yor: Springer. [10] Ramsay, J. O. and Silverman, B. W. (2005) Functional data analysis, New Yor: Springer. [11] Rice, J. W. and Silverman, B. W. (1991) Estimating the mean and the covariance structure nonparametrically when the data are curves. Journal of Royal Statistical Society, B, 53, 233-243. [12] Silverman, B. W. (1996) Smoothed functional principal components analysis by choice of norm. Annals of Statistics, 24, 1-24.

180 15 height 150 120 velocity 10 5 90 2 7 12 17 21 age (years) 0 2 7 12 17 21 age (years) Figure 1: Example of height measurements (left) with estimated velocity curves (right) from Zürich growth studies. ]

0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 pc 1 pc 2 pc 1 pc 2 1.0 0.0 0.5 1.0 1.0 0.0 0.5 1.0 pc 3 1.0 0.0 0.5 1.0 1.0 0.0 0.5 1.0 1.0 0.0 0.5 1.0 1.0 0.0 0.5 1.0 pc 4 pc 3 pc 4 1.0 0.0 0.5 1.0 1.0 0.0 0.5 1.0 Figure 2: Functional PCA with (right) and without (left) variability. Additional variation produces many more components than anticipated.

la(1) la(2) 0.2 0.4 0.6 0.00 0.10 0.20 0 1 2 3 0 1 2 3 la(3) la(4) 0.000 0.010 0.020 0 1 2 3 0.0000 0.0010 0.0020 0 1 2 3 Figure 3: Comparison of eigenvalues.