Georgia State University. Georgia State University. Jing Wang Georgia State University. Fall


Georgia State University
ScholarWorks @ Georgia State University
Mathematics Dissertations, Department of Mathematics and Statistics
Fall 12-2016

Functional Principal Component Analysis for Discretely Observed Functional Data and Sparse Fisher's Discriminant Analysis with Thresholded Linear Constraints
Jing Wang, Georgia State University

Follow this and additional works at: http://scholarworks.gsu.edu/math_diss

Recommended Citation
Wang, Jing, "Functional Principal Component Analysis for Discretely Observed Functional Data and Sparse Fisher's Discriminant Analysis with Thresholded Linear Constraints." Dissertation, Georgia State University, 2016. http://scholarworks.gsu.edu/math_diss/35

This Dissertation is brought to you for free and open access by the Department of Mathematics and Statistics at ScholarWorks @ Georgia State University. It has been accepted for inclusion in Mathematics Dissertations by an authorized administrator of ScholarWorks @ Georgia State University. For more information, please contact scholarworks@gsu.edu.

Functional Principal Component Analysis for Discretely Observed Functional Data and Sparse Fisher's Discriminant Analysis with Thresholded Linear Constraints
by Jing Wang
Under the Direction of Xin Qi, PhD

ABSTRACT

We propose a new method to perform functional principal component analysis (FPCA) for discretely observed functional data by solving successive optimization problems. The new framework can be applied to both regularly and irregularly observed data, and to both dense and sparse data. Our method does not require estimates of the individual sample functions or the covariance functions. Hence, it can be used to analyze functional data with multidimensional arguments (e.g. random surfaces). Furthermore, it can be applied to many processes and models with complicated or nonsmooth covariance functions. In our method, smoothness of eigenfunctions is controlled by directly imposing roughness penalties on eigenfunctions, which makes it more efficient and flexible to tune the smoothness. Efficient algorithms for solving the successive optimization problems are proposed. We provide the existence and characterization of the solutions to the successive optimization problems. The consistency of our method is also proved. Through simulations, we demonstrate that our method performs well in the cases with smooth sample curves, with discontinuous sample curves and nonsmooth covariance, and with sample functions having two-dimensional arguments (random surfaces), respectively. We apply our method to classification problems of retinal pigment epithelial cells in eyes of mice and to longitudinal CD4 counts data.

In the second part of this dissertation, we propose a sparse Fisher's discriminant analysis method with thresholded linear constraints. Various regularized linear discriminant analysis (LDA) methods have been proposed to address the problems of the LDA in high-dimensional settings. Asymptotic optimality has been established for some of these methods when there are only two classes. A difficulty in the asymptotic study for multiclass classification is that for two-class classification, the classification boundary is a hyperplane and an explicit formula for the classification error exists; in the multiclass case, however, the boundary is usually complicated and no explicit formula for the error generally exists. Another difficulty in proving the asymptotic consistency and optimality for sparse Fisher's discriminant analysis is that the covariance matrix is involved in the constraints of the optimization problems for high order components. It is not easy to estimate a general high-dimensional covariance matrix. Thus, we propose a sparse Fisher's discriminant analysis method which avoids the estimation of the covariance matrix, and we provide asymptotic consistency results and the corresponding convergence rates for all components. To prove the asymptotic optimality, we provide an asymptotic upper bound for a general linear classification rule in the multiclass case, which is applied to our method to obtain the asymptotic optimality and the corresponding convergence rate. In the special case of two classes, our method achieves the same as or better convergence rates compared to the existing method. The proposed method is applied to multivariate functional data with wavelet transformations.

INDEX WORDS: Functional PCA, discretely observed functional data, successive optimization problems, roughness penalty, consistency, sparse Fisher's discriminant analysis, thresholded linear constraints, asymptotic consistency, asymptotic optimality, convergence rate

Functional Principal Component Analysis for Discretely Observed Functional Data and Sparse Fisher's Discriminant Analysis with Thresholded Linear Constraints
by Jing Wang

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the College of Arts and Sciences
Georgia State University
2016

Copyright by Jing Wang 2016

Committee Chair: Xin Qi
Committee: Gengsheng Qin, Yichuan Zhao, Ruiyan Luo
Electronic Version Approved: Office of Graduate Studies, College of Arts and Sciences, Georgia State University, October 2016

DEDICATION

This dissertation is dedicated to my dear parents, my advisor, and all my dear friends.

ACKNOWLEDGEMENTS

I would like to give my deep thanks and sincere gratitude to everyone for helping me to complete this dissertation. First and foremost I want to express my sincere gratitude to my thesis advisor, Xin Qi, for the continuous support of my Ph.D study and related research, for his patience, motivation, and immense knowledge. I have been amazingly fortunate to have an advisor who gave me the freedom to explore on my own, and at the same time the guidance to recover when my steps faltered. His guidance helped me throughout the research and writing of this dissertation. I could not have imagined having a better advisor and mentor for my Ph.D study. He has been my advisor, mentor, collaborator and friend.

I appreciate Professors Gengsheng Qin, Yichuan Zhao and Ruiyan Luo, who kindly agreed to serve on my dissertation committee and have given me continuous help throughout my graduate study. I am also thankful to them for reading my reports, commenting on my views and helping me understand and enrich my ideas. Without their precious support it would not have been possible to conduct this research. Besides, I would like to thank other professors from our department for educating me in various courses.

Finally, I want to thank everyone who has helped me throughout my graduate life, especially my immediate family, to whom this dissertation is dedicated. None of this would have been possible without the love and patience of my family. I am also grateful to my friends. Their support and care helped me overcome setbacks and stay focused on my graduate study. I greatly value their friendship and I deeply appreciate their belief in me.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

Chapter 1 INTRODUCTION

Chapter 2 FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS FOR DISCRETELY OBSERVED FUNCTIONAL DATA
2.1 Background and Notations
2.2 Silverman's approach to smoothed functional PCA
2.3 Functional PCA for discretely observed functional data
2.3.1 Regular case
2.3.2 Irregular case
2.3.3 Computational issues
2.3.4 Consistency
2.3.5 Extensions to FPCA for functional data with multidimensional arguments
2.4 Simulation studies
2.4.1 Smooth random curves with 2 PC curves
2.4.2 Smooth random curves with 3 PC curves
2.4.3 Random surface
2.5 Applications
2.5.1 Retinal pigment epithelium (RPE) data
2.5.2 Longitudinal CD4 counts data

Chapter 3 SPARSE FISHER'S DISCRIMINANT ANALYSIS WITH THRESHOLDED LINEAR CONSTRAINTS
3.1 Fisher's discriminant analysis
3.2 Sparse Fisher's discriminant analysis with thresholded linear constraints
3.2.1 The case of K = 2
3.2.2 The case of K > 2
3.2.3 Computation
3.3 Asymptotic consistency and asymptotic optimality
3.3.1 The case of K = 2
3.3.2 The case of K > 2
3.4 Simulation studies
3.5 Application to multivariate functional data
3.5.1 Daily and sports activities data
3.5.2 Australian sign language data

Chapter 4 PROOFS OF THEOREMS
4.1 Proof of Theorem 2.3.1
4.2 Proof of Theorem 2.3.2
4.3 Proof of Theorem 2.3.4
4.4 Proof of Theorem 2.3.5
4.5 Proof of Theorem 3.3.1
4.6 Proof of Theorem 3.3.4
4.7 Proof of Theorem 3.3.5
4.8 Proof of Theorem 3.3.7
4.9 Proof of Theorem 3.3.8
4.10 Proof of Theorem 3.3.9
4.11 Proof of Lemmas
4.11.1 Proof of Lemma 3
4.11.2 Proof of Lemma 4
4.11.3 Proof of Lemma 6
4.11.4 Proof of Lemma 7
4.11.5 Proof of Lemma 1
4.11.6 Proof of Lemma 2
4.11.7 Proof of Lemma 8
4.11.8 Proof of Lemma 9
4.11.9 Proof of Lemma 10
4.11.10 Proof of Lemma 11
4.11.11 Proof of Lemma 12

Bibliography

LIST OF TABLES

Table 2.1 The averages and standard deviations of cumulative variance of selected principal component scores for the simulations in Section 2.4.1: Regular Case
Table 2.2 The averages and standard deviations of cumulative variance of selected principal component scores for the simulations in Section 2.4.1: Irregular Case
Table 2.3 Selected smoothing parameter with the usual cross-validation procedure for the simulations in Section 2.4.2: Regular Case
Table 2.4 The averages and standard deviations of cumulative variance of selected principal component scores for the simulations in Section 2.4.2: Regular Case
Table 2.5 Selected smoothing parameter with the usual cross-validation procedure for the simulations in Section 2.4.2: Irregular Case
Table 2.6 The averages and standard deviations of cumulative variance of selected principal component scores for the simulations in Section 2.4.2: Irregular Case
Table 3.1 The averages and standard deviations of misclassification rates (%) for the simulations in Section 3.4
Table 3.2 The averages and standard deviations of the misclassification rates (%) for the daily and sports activities data
Table 3.3 The averages and standard deviations of the misclassification rates (%) for the Australian sign language data

LIST OF FIGURES

Figure 2.1 The First Two Principal Component Curves
Figure 2.2 Simulated Sample Curves for the simulations in Section 2.4.1
Figure 2.3 The First Three Principal Component Curves
Figure 2.4 Simulated Sample Curves for the simulations in Section 2.4.2
Figure 2.5 Eigenfunctions and their estimates in one simulation
Figure 2.6 Local regions of two RPE cells
Figure 2.7 Mean joint densities of four categories
Figure 2.8 Plots of the first four estimated principal component functions for RPE data
Figure 2.9 Sample curves from CD4 data
Figure 2.10 Estimates of the first three eigenfunctions for CD4 data

LIST OF ABBREVIATIONS

GSU - Georgia State University
PCA - Principal Component Analysis
FPCA - Functional Principal Component Analysis
LDA - Linear Discriminant Analysis
CV - Cross-validation

Chapter 1 INTRODUCTION

Principal component analysis (PCA) is one of the best known techniques in both multivariate analysis and functional data analysis. Different from classical PCA, functional principal component analysis (FPCA) requires smoothing or regularizing of the estimated principal component curves (see Chapter 9 in Ramsay and Silverman [33]). Readers can find a general overview of many methods for computing the smoothed functional principal components when the sample curves are fully observed in Ramsay and Silverman [33]. Ferraty and Vieu [16] provide more discussion of nonparametric methods and developments for functional data analysis. However, in practice, the sample functions are usually observed at discrete points with measurement errors. The observation points might be irregular or sparse.

Several FPCA methods for discretely sampled functional data or longitudinal data have been developed. Shi, Weiss and Taylor [38], Rice and Wu [35] and James, Hastie and Sugar [23] proposed mixed effects approaches in which individual sample curves or eigenfunctions of the covariance function are represented by basis function expansions. Staniswalis and Lee [44] and Yao, Müller and Wang [54] used nonparametric methods to estimate covariance functions and then obtained the eigenfunctions. Huang, Shen and Buja [22] proposed an FPCA method for regularly observed discrete functional data based on a penalized rank one approximation to the data matrix. Peng and Paul [29] assumed a finite rank model for the covariance function, represented the eigenfunctions as basis function expansions, and proposed the restricted maximum likelihood method to estimate the parameters.

Other nonparametric approaches to this problem tend to fall into two classes. The approaches in the first class smooth each individual curve by the smoothing spline method or other methods (see Section 9.5 in Ramsay and Silverman [33]). Then the smoothed principal component curves can be obtained by the usual functional PCA or other methods. For example, motivated by the duality relation between row and column spaces of a data matrix, Benko et al. [6] proposed an FPCA method for regularly observed discrete functional data. The methods in the second class assume that covariance functions are smooth. Smoothing methods such as kernel methods and free-knot spline smoothing are used to obtain smoothed estimates of mean functions and covariance functions. Then the principal component curves can be estimated by the eigenfunctions of the smoothed covariance function.

However, some of these methods (Huang et al. [22] and Benko et al. [6]) cannot be applied to discrete functional data in which the observation points are irregular and sparse. Some of them (Staniswalis and Lee [44] and Yao et al. [54]) need to estimate the covariance functions; hence it is hard to apply these methods to functional data with two or three dimensional arguments, since we would have to estimate four or six dimensional covariance functions.

In this dissertation, we first propose a new method to perform FPCA for discretely observed functional data by solving successive optimization problems. The new framework can be applied to both regularly and irregularly observed data, and to both dense and sparse data.

First, our method does not need to estimate the individual sample functions or the covariance functions, and we do not assume that they are smooth. Hence, it can be easily applied to discretely observed functional data with two or three dimensional arguments and to processes and models with complicated or nonsmooth covariance functions. Most of the current methods assume, explicitly or implicitly, that either the sample functions or the covariance functions are smooth. Some of them need to obtain smoothed estimates of the sample functions or the covariance functions. However, there are many important processes and models with nonsmooth sample functions and nonsmooth covariance functions but with smooth eigenfunctions. Our methods can be applied to these processes and models. Some real functional data have complicated covariance functions in which we are not interested. In this case, our methods avoid estimating the complicated covariance functions.

Second, our method controls the smoothness of eigenfunctions by directly imposing roughness penalties on eigenfunctions and can use different smoothing parameters for different eigenfunctions. Hence, it is efficient and flexible to tune the smoothness of eigenfunctions in this method. Our methods can also be easily extended to analyze discretely observed functions defined on high-dimensional spaces, e.g. random surfaces. Section 5 in Müller (2005) listed some open problems concerning the application of FDA methods, including analysis of random surfaces and higher-dimensional functions. We applied our methods to simulated discretely observed random surface data. For high-dimensional data, the covariance functions are defined on a higher dimensional space whose dimension is twice that of the domain of the sample functions. Hence, it is very hard to obtain good estimates of covariance functions in this case. Efficient algorithms for solving the successive optimization problems are proposed. We provide the existence and characterization of the solutions to the successive optimization problems. The consistency of our method is also proved. The following real example is used to motivate and illustrate the method developed in this dissertation.

Linear discriminant analysis (LDA) has been a favored tool for supervised classification in the settings of small $p$ and large $n$. However, it faces major problems for high-dimensional data. In theory, Bickel and Levina [7] and Shao et al. [37] showed that the usual LDA can be as bad as random guessing when $p > n$. In practice, the classic LDA methods have bad predictive performance in high-dimensional settings. To address these problems, various regularized discriminant analysis methods have been proposed, including Friedman [17], Krzanowski et al. [26], Dudoit et al. [13], Bickel and Levina [7], Guo et al. [19], Xu et al. [53], Tibshirani et al. [47], Witten and Tibshirani [52], Clemmensen et al. [11], Shao et al. [37], Cai and Liu [9], Fan et al. [15], Qi et al. [32] and many others.

Asymptotic optimality has been established in some of these papers when there are two classes. Shao et al. [37] made sparsity assumptions on both the difference $\delta = \mu_2 - \mu_1$, where $\mu_1$ and $\mu_2$ are the population means of the two classes, and the within-class covariance matrix $\Sigma$. Then thresholding procedures were applied to both the difference between the two sample class means and the sample within-class covariance matrix. The asymptotic optimality and the corresponding convergence rate for their classification rule were obtained. Cai and Liu [9] observed that in the case of two classes, the optimal classification rule depends on $\Sigma$ only through $\Sigma^{-1}\delta$. Hence, they assumed $\ell_1$ sparsity for $\Sigma^{-1}\delta$, proposed a sparse estimate of it through minimizing its $\ell_1$ norm with an $\ell_\infty$ constraint, and provided asymptotic optimality of their classification rule. Fan et al. [15] imposed an $\ell_0$ sparsity assumption on $\Sigma^{-1}\delta$, estimated it through a minimization problem with an $\ell_1$ constraint and derived the asymptotic optimality.

A major difficulty preventing the derivation of asymptotic optimality of the linear classification rules for multiple classes is that for two-class classification, the classification boundary of LDA is a hyperplane and an explicit formula for the classification error exists; for multiclass classification, however, the classification boundary is usually complicated and no explicit formula for the classification error generally exists.

As a special case of LDA, Fisher's discriminant analysis projects the original variables $X$ to a low dimensional subspace to generate new predictor variables $X\alpha_1, X\alpha_2, \ldots, X\alpha_{K-1}$, where the coefficient vectors $\alpha_1, \alpha_2, \ldots, \alpha_{K-1}$ are sequentially calculated and $K$ is the number of classes. The coefficient vectors are found by maximizing the between-class variation of the new predictor variables relative to their within-class variation, and the new predictors are orthogonal to each other, that is, the linear constraints $\alpha_i^T \Sigma \alpha_j = 0$ for any $1 \le j < i < K$ are satisfied. Once the coefficients are determined, the classification rule is to assign a new observation to the class with the sample class mean closest to this observation in the projection subspace.

Besides the complicated classification boundary for multiclass problems, the linear constraint $\alpha_i^T \Sigma \alpha_j = 0$ poses an additional difficulty in studying the asymptotic consistency and optimality of Fisher's discriminant analysis in the high-dimensional setting for $K > 2$, because the covariance matrix $\Sigma$ is involved. It is not easy to find a consistent estimate for a general $\Sigma$ in high-dimensional settings. Qi et al. [32] introduced a sparse Fisher's discriminant analysis method, an advantage of which is that the proposed algorithm is applicable to any linear constraints imposed on the higher order components.

In the second part of this dissertation, instead of aiming to find a consistent estimate of $\Sigma$, we apply a soft-thresholding procedure to obtain a consistent estimate of the subspace spanned by $\{\Sigma\alpha_1, \ldots, \Sigma\alpha_{i-1}\}$, which defines the linear constraints for $\alpha_i$, for any $1 < i \le K-1$. Then, taking advantage of the algorithm in the paper above, we propose the estimates of $\alpha_i$, for all $1 \le i \le K-1$, and a classification rule. We study the theoretical properties of this method in high-dimensional settings, including the asymptotic consistency of the estimates of $\alpha_i$ and of the subspaces defining the orthogonality constraints, the asymptotic optimality, and the corresponding convergence rates, where the number $K$ of classes can be any fixed positive integer. In the special case of $K = 2$, the asymptotic optimality of our method is compared to that of the existing method, and our method has the same or a better convergence rate. We apply our method to classification problems for multivariate functional data through wavelet transformations.

The remainder of the dissertation is organized as follows. In Chapter 2, we present our new method to perform FPCA for discretely observed functional data by solving successive optimization problems. We first give some background, basic notations and our main assumptions. The classic Silverman's method to perform smoothed FPCA is also introduced. We then present our method along with its theoretical properties, and an algorithm for solving the successive optimization problems in practice. Simulation results with comparison to other established methods are reported to illustrate the effectiveness of our method. Finally, we apply our method to two real data sets: the RPE data set and the longitudinal CD4 counts data set. In Chapter 3, we propose a sparse Fisher's discriminant analysis method with thresholded linear constraints which avoids the estimation of the covariance matrix. We first introduce notations and briefly review the classic Fisher's discriminant analysis. Then our sparse Fisher's LDA method with thresholded linear constraints is introduced. We also present the main theoretical results along with simulation studies and applications. All proofs of our theorems can be found in Chapter 4 of the dissertation.

Chapter 2 FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS FOR DISCRETELY OBSERVED FUNCTIONAL DATA

In this chapter, we present our new method to perform functional principal component analysis (FPCA) for discretely observed functional data by solving successive optimization problems. We first give some background, basic notations and our main assumptions. The classic Silverman's method to perform smoothed FPCA is introduced in Section 2.2. We then present our method along with its theoretical properties, and an algorithm for solving the successive optimization problems in practice. Simulation results are reported to illustrate the effectiveness of our method. Finally, we apply our method to the RPE data set and the longitudinal CD4 counts data set in Section 2.5. The proofs of all theorems are provided in Chapter 4.

2.1 Background and Notations

First, we introduce notations and definitions used in this chapter. Let $\mathbb{N}$ denote the collection of all the positive integers. In this chapter, we will mainly consider functions defined on a finite interval $[a, b]$ in the following two spaces: the space $L^2([a, b])$ of square integrable functions,

\[ L^2([a, b]) = \Big\{ f : f \text{ is a measurable function on } [a, b] \text{ and } \int_a^b f(t)^2\,dt < \infty \Big\}, \]

and the Sobolev space $W_2^2([a, b])$ of functions with square integrable second derivatives,

\[ W_2^2([a, b]) = \{ f : f, f' \text{ are absolutely continuous on } [a, b] \text{ and } f'' \in L^2([a, b]) \}, \]

where $f'$ and $f''$ denote the first and second derivatives of $f$, respectively. For any $f, g \in L^2([a, b])$, define the usual inner product

\[ \langle f, g \rangle = \int_a^b f(t) g(t)\,dt \]

with corresponding squared norm $\|f\|^2 = \langle f, f \rangle$. Given a smoothing parameter $\alpha \ge 0$, for any $f, g \in W_2^2([a, b])$, define

\[ [f, g] = \int_a^b f''(t) g''(t)\,dt \]

and the inner product $\langle f, g \rangle_\alpha = \langle f, g \rangle + \alpha [f, g]$ with corresponding squared norm $\|f\|_\alpha^2 = \langle f, f \rangle_\alpha$. Here we use the same notations as those in Silverman [41].

Let $X(t)$, $a \le t \le b$, be a measurable stochastic process (random function) on $[a, b]$ and let $X_1(t), X_2(t), \ldots, X_n(t)$ be i.i.d. sample functions from the distribution of $X(t)$. Below we give three basic assumptions on $X(t)$ which are essentially the same as those in Silverman [41].

Assumption 1. $E\big[\|X\|^4\big] = E\Big[\Big(\int_a^b X(t)^2\,dt\Big)^2\Big] < \infty$.

Under Assumption 1, $X(t) \in L^2([a, b])$ a.s. Assume that the mean function is $EX(t) = \mu(t)$. Define the covariance function of $X(t)$ by

\[ \Gamma(s, t) = E\big[(X(s) - \mu(s))(X(t) - \mu(t))\big], \quad s, t \in [a, b]. \tag{2.1} \]

Under Assumption 1, $\Gamma$ has a sequence of nonnegative eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ and corresponding eigenfunctions $\gamma_1, \gamma_2, \ldots$. Every eigenfunction has been scaled to have $L^2$-norm 1. The set of all the eigenfunctions forms an orthonormal basis of $L^2([a, b])$. Furthermore, we have the decomposition

\[ \Gamma(s, t) = \sum_{j=1}^{\infty} \lambda_j \gamma_j(s) \gamma_j(t). \]

Suppose that we are interested in estimating the first $K$ eigenvalues and eigenfunctions of the covariance function $\Gamma$.
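The inner products of this section can be approximated on a grid. The following is a minimal sketch, not part of the original method: it assumes a one-dimensional grid, a trapezoidal quadrature rule, and a finite-difference second derivative, and all function names are hypothetical.

```python
import numpy as np

def trapezoid(y, t):
    """Composite trapezoidal rule for the integral of y over the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def second_derivative(f, t):
    """Finite-difference approximation of f'' (an illustrative choice)."""
    return np.gradient(np.gradient(f, t, edge_order=2), t, edge_order=2)

def inner(f, g, t):
    """<f, g> = int_a^b f(t) g(t) dt."""
    return trapezoid(f * g, t)

def bracket(f, g, t):
    """[f, g] = int_a^b f''(t) g''(t) dt."""
    return trapezoid(second_derivative(f, t) * second_derivative(g, t), t)

def inner_alpha(f, g, t, alpha):
    """<f, g>_alpha = <f, g> + alpha * [f, g]."""
    return inner(f, g, t) + alpha * bracket(f, g, t)

t = np.linspace(0.0, 1.0, 2001)
f = np.sin(np.pi * t)
print(inner(f, f, t))               # close to 1/2
print(inner_alpha(f, f, t, 0.01))   # close to 1/2 + 0.01 * pi**4 / 2
```

For $f(t) = \sin(\pi t)$ on $[0, 1]$ we have $f'' = -\pi^2 f$, so $[f, f] = \pi^4/2$; the penalty term therefore dominates $\|f\|_\alpha^2$ once $\alpha$ is much larger than $\pi^{-4}$.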

Assumption 2. Any eigenvalue $\lambda_j$, $1 \le j \le K$, has multiplicity 1, so that $\lambda_1 > \lambda_2 > \cdots > \lambda_K > \lambda_{K+1}$.

This assumption is just the third assumption in Section 5.2 of Silverman [41]. If an eigenvalue has multiplicity 1, then the corresponding eigenfunction is uniquely determined up to sign. If the multiplicity is larger than 1, the eigenfunctions cannot be uniquely determined up to sign (Qi and Zhao [31]).

Assumption 3. The eigenfunctions $\gamma_j$, $1 \le j \le K$, belong to $W_2^2([a, b])$.

If the covariance function $\Gamma$ is smooth, then Assumption 3 holds. However, there are many important random processes whose covariance functions are nonsmooth but whose eigenfunctions belong to $W_2^2([a, b])$ (Qi and Zhao [31]). Examples include the continuous parameter AR(1) model in time series, Brownian motion, the Poisson process and the stochastic differential equation models driven by them.

Example 1. (Brownian motion and Poisson process). Consider the standard Brownian motion and the Poisson process with rate 1 on the time interval $[0, 1]$. Their covariance functions are the same and equal to $\min(s, t)$, $0 \le s, t \le 1$ (see page 89 in Glasserman [18]), which is nonsmooth. The eigenvalues and eigenfunctions are

\[ \lambda_j = \Big( \frac{2}{(2j-1)\pi} \Big)^2, \quad \gamma_j(t) = \sqrt{2} \sin\Big( \frac{(2j-1)\pi t}{2} \Big), \quad j = 1, 2, \ldots \tag{2.2} \]

Example 2. (Stochastic differential equation models). Stochastic differential equations (SDEs) are widely used to model random processes in many areas. One example is the famous Black-Scholes model in finance. Let $S_t$ denote the price of a stock at time $t$. Then $S_t$ satisfies the following SDE,

\[ dS_t = \nu S_t\,dt + \sigma S_t\,dW_t, \]

where $\nu$ is the instantaneous mean return, $\sigma$ is the instantaneous return volatility and $W_t$ is a Brownian motion. Another example is the counting process model in survival analysis. Let $N_t$ be the number of occurrences of the event in $[0, t]$. Then $N_t$ satisfies

\[ dN_t = \lambda(t)\,dt + dM_t, \]

where $\lambda(t)$ is a smooth intensity function and $M_t$ is a martingale (Qi and Zhao [31]).

Example 3. (Continuous parameter models in time series). Consider the continuous parameter AR(1) model in time series. Its covariance function is

\[ \Gamma(s, t) = \frac{e^{-\alpha |s - t|}}{\alpha}, \]

where $\alpha$ is a positive number (see Section 3.7 in Priestley [30]). This covariance function is nonsmooth.

For these models, the covariance functions are nonsmooth but the eigenfunctions are smooth. In addition to these processes and models, some real functional data have covariance functions with complicated patterns.

2.2 Silverman's approach to smoothed functional PCA

In this section, the independent sample curves from the distribution of $X(t)$, $\{X_1(t), X_2(t), \ldots, X_n(t) : a \le t \le b\}$, are assumed to be entirely observed, and $t$ could range over a continuum in $[a, b]$, over a two-dimensional region $[a, b] \times [c, d]$, or over higher-dimensional regions. The covariance function is defined as in (2.1), and the covariance operator is

\[ (\Gamma\gamma)(t) = \int_a^b \Gamma(t, s) \gamma(s)\,ds. \]

For any $\beta, \gamma \in L^2([a, b])$,

\[ \mathrm{Cov}\big[ \langle \beta, X \rangle, \langle \gamma, X \rangle \big] = \langle \beta, \Gamma\gamma \rangle. \]

FPCA is one of the key techniques in functional data analysis for pattern discovery and dimension reduction in data sets. The first population functional principal component is a one-dimensional projection of $X$, $\langle\gamma_1, X\rangle = \int_a^b \gamma_1(t)X(t)\,dt$ with $\|\gamma_1\| = 1$, which maximizes the variance of the principal component scores,

$$\mathrm{Var}(\langle\gamma_1, X\rangle) = \max_{\|\gamma\|=1} \mathrm{Var}(\langle\gamma, X\rangle) = \max_{\|\gamma\|=1} \langle\gamma, \Gamma\gamma\rangle = \max_{\|\gamma\|=1} \frac{\langle\gamma, \Gamma\gamma\rangle}{\|\gamma\|^2}, \tag{2.3}$$

over all $\gamma \in L^2([a, b])$ with norm $\|\gamma\| = 1$. $\gamma_1$ is called the first principal component weight function or the first PC curve. Let $\lambda_1$ be the maximum value of (2.3). The pair $(\lambda_1, \gamma_1)$ is the first eigenvalue and eigenfunction of $\Gamma$ (see Section 2, Chapter 3 in Weinberger [51]),

$$\Gamma\gamma_1 = \lambda_1\gamma_1.$$

For the second functional principal component $\langle\gamma_2, X\rangle$, $\gamma_2$ is the solution to

$$\max_{\|\gamma\|=1,\ \langle\gamma, \gamma_1\rangle=0} \frac{\langle\gamma, \Gamma\gamma\rangle}{\|\gamma\|^2}. \tag{2.4}$$

Let $\lambda_2$ be the maximum value of (2.4). The pair $(\lambda_2, \gamma_2)$ is the second eigenvalue and eigenfunction of $\Gamma$,

$$\Gamma\gamma_2 = \lambda_2\gamma_2.$$

The successive population functional principal components are defined similarly.

However, we usually do not know the true covariance function $\Gamma$, so the population principal component weight functions cannot be obtained directly. We can use the sample covariance function $\hat\Gamma_n$ to estimate $\Gamma$ and use the eigenvalues and eigenfunctions of $\hat\Gamma_n$ to estimate those of $\Gamma$; these are called non-smooth estimators. It has been pointed out that the non-smooth principal component curves can show substantial variability (see Chapter 9 in Ramsay and Silverman [33]). Therefore, smoothing of the estimated principal component weight functions is necessary.

Silverman [41] (see also Chapter 9 in Ramsay and Silverman [33]) proposed an important method which incorporates smoothing by replacing the usual $L^2$ norm with a norm that takes the roughness of the functions into account. Qi and Zhao [31] summarize the theoretical and practical advantages of Silverman's approach as follows:

First, the weak assumptions underlying this method make it applicable to data from many fields. Silverman [41] did not make any assumptions on the mean curves and sample curves. Hence, in addition to data with smooth random curves, this method can be applied to data where the sample curves are unsmooth or even discontinuous, such as those encountered in financial engineering, survival analysis and other fields. For covariance functions, Silverman [41] only assumed that they have series expansions by their eigenfunctions, without imposing smoothing constraints. This is attractive because the covariance functions are continuous but unsmooth in many important models, such as stochastic differential equation models in financial engineering and counting process models in survival analysis.

Second, Silverman's method controls the smoothness of the eigenfunction curves by directly imposing roughness penalties on these functions instead of on the sample curves or covariance functions. Furthermore, this approach changes the eigenvalue and eigenfunction problems in the usual $L^2$ space to problems in another Hilbert space, the Sobolev space (with a norm different from the usual norm in the Sobolev space). Therefore, many powerful tools from the theory of Hilbert spaces can be employed to study the properties of this method.

Third, this approach incorporates the smoothing step into the step for computing eigenvalues and eigenfunctions. Therefore, this method is computationally efficient, with the same computational load as the usual unsmoothed functional PCA.

Fourth, the estimates produced by this method are invariant under scale transformations. As pointed out by Huang et al. [22], the invariance property under scale transformations should be a guiding principle in introducing roughness penalties to functional PCA.

Let $\alpha$ be a nonnegative smoothing parameter. Silverman defines the smoothed estimators $\{(\hat\lambda_j^{[\alpha]}, \hat\gamma_j^{[\alpha]}) : j \in \mathbb{N}\}$ of $\{(\lambda_j, \gamma_j) : j \in \mathbb{N}\}$ to be the solutions of the following successive optimization problems. First, $\hat\gamma_1^{[\alpha]}$ is the solution of the optimization problem

$$\max_{\|\gamma\|=1} \frac{\langle\gamma, \hat\Gamma_n\gamma\rangle}{\langle\gamma, \gamma\rangle + \alpha[\gamma, \gamma]} = \max_{\|\gamma\|=1} \frac{\langle\gamma, \hat\Gamma_n\gamma\rangle}{\|\gamma\|_\alpha^2}. \tag{2.5}$$

Let $\hat\lambda_1^{[\alpha]}$ be the maximum value of (2.5). For any $k \in \mathbb{N}$, if we have obtained $\{\hat\gamma_j^{[\alpha]}, j = 1, 2, \dots, k-1\}$ and $\{\hat\lambda_j^{[\alpha]}, j = 1, 2, \dots, k-1\}$, then $\hat\gamma_k^{[\alpha]}$ is the solution of the optimization problem

$$\max_{\|\gamma\|=1,\ \langle\gamma, \hat\gamma_j^{[\alpha]}\rangle_\alpha=0,\ 1\le j\le k-1} \frac{\langle\gamma, \hat\Gamma_n\gamma\rangle}{\|\gamma\|_\alpha^2} \tag{2.6}$$

and $\hat\lambda_k^{[\alpha]}$ is the maximum value of (2.6). Note that $\{(\hat\lambda_j^{[\alpha]}, \hat\gamma_j^{[\alpha]}) : j \in \mathbb{N}\}$ depends on both the sample size $n$ and the smoothing parameter $\alpha$ (Qi and Zhao [31]).

2.3 Functional PCA for discretely observed functional data

We consider two sampling scenarios for sample functions observed at discrete points: the regular case and the irregular case.

2.3.1 Regular case

In this case, we assume that the sample functions are observed with measurement errors at the same set $\{t_1, t_2, \dots, t_m\}$ of discrete observation points across all the subjects, where $m$ is the total number of observation points for each sample function. After sorting the observation points from the smallest to the largest, we get $a = t_{(1)} < t_{(2)} < \cdots < t_{(m-1)} < t_{(m)} = b$. Let us consider the following model:

$$Y_{pq} = X_p(t_{(q)}) + \epsilon_{pq}, \quad p = 1, \dots, n, \quad q = 1, \dots, m, \tag{2.7}$$

where $Y_{pq}$ is the observation of the sample function $X_p$ at point $t_{(q)}$ with measurement error $\epsilon_{pq}$, and $n$ is the total number of sample curves. Our estimates $\{\hat\lambda_k, \hat\gamma_k\}_{k\ge1}$ of $\{\lambda_k, \gamma_k\}_{k\ge1}$ are the solutions to the following successive optimization problems. The first pair of estimates

$\{\hat\lambda_1, \hat\gamma_1\}$ are the maximum value and the solution to the following optimization problem:

$$\max_{\gamma\in W_2^2([a,b]),\ \|\gamma\|=1} \frac{\sum_{q=1}^m\sum_{l=1}^m \hat\Sigma_{ql}\,\gamma(t_{(q)})\gamma(t_{(l)})\,w_q w_l}{\|\gamma\|^2 + \alpha_1[\gamma, \gamma]}, \tag{2.8}$$

where $\alpha_1 > 0$ is a smoothing parameter,

$$\hat\Sigma_{ql} = \frac1n\sum_{p=1}^n (Y_{pq} - \bar Y_q)(Y_{pl} - \bar Y_l), \quad 1 \le q, l \le m, \qquad \bar Y_q = \frac1n(Y_{1q} + \cdots + Y_{nq}),$$

and

$$w_q = \begin{cases} (t_{(2)} - t_{(1)})/2, & q = 1, \\ (t_{(q+1)} - t_{(q-1)})/2, & 1 < q < m, \\ (t_{(m)} - t_{(m-1)})/2, & q = m. \end{cases} \tag{2.9}$$

The higher order estimates $\{\hat\lambda_k, \hat\gamma_k\}$, $k \ge 2$, are the solutions to the following optimization problems:

$$\max_{\|\gamma\|=1,\ \langle\gamma, \hat\gamma_j\rangle=0,\ j=1,\dots,k-1} \frac{\sum_{q=1}^m\sum_{l=1}^m \hat\Sigma_{ql}\,\gamma(t_{(q)})\gamma(t_{(l)})\,w_q w_l}{\|\gamma\|^2 + \alpha_k[\gamma, \gamma]}, \tag{2.10}$$

where $\alpha_k$ is the smoothing parameter for the $k$-th estimates and the estimates of the eigenfunctions are orthogonal to each other. We can choose different smoothing parameters for different principal components.

The idea behind our method is as follows. The true eigenvalues and eigenfunctions are the solutions to the following successive optimization problems:

$$\max_{\|\gamma\|=1,\ \langle\gamma, \gamma_j\rangle=0,\ j=1,\dots,k-1} \frac{\langle\gamma, \Gamma\gamma\rangle}{\|\gamma\|^2},$$

where $\Gamma\gamma$ is the function defined by $(\Gamma\gamma)(t) = \int_a^b \Gamma(t, s)\gamma(s)\,ds$. These optimization problems depend on the covariance function $\Gamma$ only through the inner product $\langle\gamma, \Gamma\gamma\rangle$. Hence, we use the numerators in (2.8) and (2.10) to approximate $\langle\gamma, \Gamma\gamma\rangle$. However, if there are no penalty terms in the denominators in (2.8) and (2.10), the maximum values of (2.8) and (2.10) are infinite. Since tuning $\alpha_k$ does not affect the first $k-1$ estimates, we can tune the parameters one by one.

Our method solves the optimization problems in the Sobolev space, hence many powerful tools from the theory of Hilbert spaces can be used to study the asymptotic consistency of our method. We give a theorem on the existence and characterization of solutions of the successive optimization problems (2.8) and (2.10).

Theorem 2.3.1. The solutions $\{\hat\lambda_k, \hat\gamma_k : k \ge 1\}$ of the successive optimization problems (2.8) and (2.10) exist for any $\{\alpha_k > 0, k \ge 1\}$. Moreover, for each $k$, $\hat\gamma_k$ has continuous second derivatives on $[a, b]$ and, on any subinterval $[t_{(q)}, t_{(q+1)}]$, $1 \le q \le m-1$, it can be written as a linear combination of the following at most $4k$ functions, where $1 \le j \le k$:

$$\exp\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\sin\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right), \quad \exp\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\cos\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right),$$

$$\exp\left(-\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\sin\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right), \quad \exp\left(-\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\cos\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right).$$

Hence, the first solution $\hat\gamma_1$ is similar to a smoothing spline, except that the solutions to the optimization problems in smoothing spline methods are cubic polynomials between any two adjacent observation points.

2.3.2 Irregular case

In this case, we assume the observation time points are $\{t_{pq} : p = 1, \dots, n,\ q = 1, \dots, N_p\}$, where $n$ is the number of sample curves and $N_p$ is the number of observation points of the $p$-th sample function $X_p$. The model is

$$Y_{pq} = X_p(t_{pq}) + \epsilon_{pq}, \quad p = 1, \dots, n, \quad q = 1, \dots, N_p, \tag{2.11}$$

where $Y_{pq}$ is the observation of the random function $X_p$ at time $t_{pq}$ and $\epsilon_{pq}$ is the measurement error. For the irregular case, we assume that the mean function $\mu(t)$ is smooth and that the observation points $t_{pq}$ are random variables with density function $h(t)$, which is bounded below away from zero on $[a, b]$.

Our FPCA procedure for the irregular case has three steps. In the first step, we estimate the mean function $\mu(t)$ based on the pooled data from all individuals by a local linear smoother. This step is the same as the first step of the procedure in Yao et al. [54]. We define the estimate $\hat\mu(t)$ of $\mu(t)$ by solving the following optimization problem

$$\min_{a, b} \sum_{p=1}^n\sum_{q=1}^{N_p} \kappa\left(\frac{t_{pq} - t}{\eta_\mu}\right)\{Y_{pq} - a - b(t - t_{pq})\}^2, \tag{2.12}$$

where $\kappa$ is the kernel and $\eta_\mu$ is the bandwidth. Let $\hat a(t)$ and $\hat b(t)$ be the minimizers; then $\hat\mu(t) = \hat a(t)$.

In the second step, we estimate the density function $h(t)$ based on the pooled observation time points by the maximum penalized likelihood estimation method (see Silverman [40], Silverman [42] and Chapter 6 in Ramsay and Silverman [33]). Let $\hat g(t)$ be the minimizer of the functional

$$-\frac1N\sum_{p=1}^n\sum_{q=1}^{N_p} g(t_{pq}) + \int_a^b e^{g(t)}\,dt + \eta_g[g, g], \tag{2.13}$$

where $N = \sum_{p=1}^n N_p$ and $\eta_g$ is a smoothing parameter; then the estimate is $\hat h(t) = e^{\hat g(t)}$. Here we use the maximum penalized likelihood estimation method instead of the kernel density estimation method because the density estimate in this step will appear in the denominators in the third step. Hence, the density estimate must be positive. In maximum penalized likelihood estimation, the log density is estimated first, and its exponential is then calculated as the density estimate. Hence, the maximum penalized likelihood density estimate is strictly positive.

The third step is to solve the following successive optimization problems. The first pair of estimates $\{\hat\lambda_1, \hat\gamma_1\}$ of $\{\lambda_1, \gamma_1\}$ are the maximum value and the solution to the optimization problem:

$$\max_{\gamma\in W_2^2([a,b]),\ \|\gamma\|=1} \frac{\dfrac{1}{\tilde n}\displaystyle\sum_{p=1}^n \frac{\chi_{[N_p>1]}}{N_p(N_p-1)} \sum_{1\le l\ne q\le N_p} U_{ql}^{(p)}}{\|\gamma\|^2 + \alpha_1[\gamma, \gamma]}, \tag{2.14}$$

where $\alpha_1 > 0$ is a smoothing parameter, $\chi_{[N_p>1]}$ is the indicator function of $N_p > 1$, $\tilde n = \sum_{p=1}^n \chi_{[N_p>1]}$ is the total number of sample functions with at least two observation points, and

$$U_{ql}^{(p)} = \frac{\gamma(t_{pq})(Y_{pq} - \hat\mu(t_{pq}))}{\hat h(t_{pq})} \cdot \frac{\gamma(t_{pl})(Y_{pl} - \hat\mu(t_{pl}))}{\hat h(t_{pl})}.$$

The higher order estimates $\{\hat\lambda_k, \hat\gamma_k\}$, $k \ge 2$, are the solutions to the following optimization problems:

$$\max_{\|\gamma\|=1,\ \langle\gamma, \hat\gamma_j\rangle=0,\ j=1,\dots,k-1} \frac{\dfrac{1}{\tilde n}\displaystyle\sum_{p=1}^n \frac{\chi_{[N_p>1]}}{N_p(N_p-1)} \sum_{1\le l\ne q\le N_p} U_{ql}^{(p)}}{\|\gamma\|^2 + \alpha_k[\gamma, \gamma]}, \tag{2.15}$$

where $\alpha_k$ is the positive smoothing parameter for the $k$-th estimates.

Now we explain (2.14) and (2.15) intuitively. For each $1 \le p \le n$, if $N_p > 1$, then

$$\frac{1}{N_p(N_p-1)}\sum_{1\le l\ne q\le N_p} U_{ql}^{(p)}$$

is an approximation to the U-statistic

$$\frac{1}{N_p(N_p-1)}\sum_{1\le l\ne q\le N_p} \frac{\gamma(t_{pq})(Y_{pq} - \mu(t_{pq}))}{h(t_{pq})} \cdot \frac{\gamma(t_{pl})(Y_{pl} - \mu(t_{pl}))}{h(t_{pl})}. \tag{2.16}$$

For different $p$'s with $N_p > 1$, the quantities (2.16) are independent and identically distributed random variables if we assume that $N_p$ is a random variable independent of $t_{pq}$ and the random function $X$. Therefore, by the law of large numbers, the numerators in (2.14) and (2.15) are approximations to $\langle\gamma, \Gamma\gamma\rangle$.

We give a theorem similar to Theorem 2.3.1 for the irregular case.

Theorem 2.3.2. The solutions $\{(\hat\lambda_k, \hat\gamma_k) : k \ge 1\}$ of the successive optimization problems (2.14) and (2.15) exist for any $\{\alpha_k > 0, k \ge 1\}$. Moreover, for each $k$, $\hat\gamma_k$ has continuous second derivatives on $[a, b]$ and, on the subinterval between any two adjacent pooled observation points, it can be written as a linear combination of the following at most $4k$ functions, where $1 \le j \le k$:

$$\exp\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\sin\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right), \quad \exp\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\cos\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right),$$

$$\exp\left(-\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\sin\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right), \quad \exp\left(-\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right)\cos\left(\frac{t}{\sqrt2\,\alpha_j^{1/4}}\right).$$

2.3.3 Computational issues

Although Theorems 2.3.1 and 2.3.2 give the forms of the solutions to the successive optimization problems in our FPCA procedure, it is not convenient to compute the exact solutions in practice. Instead, we choose an appropriate basis and use basis expansions to approximate the solutions to the successive optimization problems, as in Section 9.4 of [33], and we develop algorithms similar to those in that section. We first choose an appropriate basis $\{\phi_\nu\}_{\nu=1}^M$, where $M$ is the number of basis functions. For example, we can choose the Fourier basis for the periodic case and the B-spline basis for the nonperiodic case. Let $\tilde\gamma_k = \sum_{\nu=1}^M c_{k\nu}\phi_\nu$, $k \ge 1$, be the solutions to (2.10) or (2.15) restricted to the linear space spanned by the basis functions. They are the approximations to $\{\hat\gamma_k\}_{k\ge1}$. The coefficients $c_k = (c_{k1}, \dots, c_{kM})^T$ are solutions to the following successive optimization problems,

$$\max_{c\in\mathbb{R}^M,\ c^TJc=1,\ c_j^TJc=0,\ j=1,\dots,k-1} \frac{c^TVc}{c^TJc + \alpha_k c^TKc}. \tag{2.17}$$

$J$ and $K$ are $M \times M$ matrices with elements $J_{\nu\nu'} = \int_a^b \phi_\nu(t)\phi_{\nu'}(t)\,dt$ and $K_{\nu\nu'} = \int_a^b \phi_\nu''(t)\phi_{\nu'}''(t)\,dt$, $\nu, \nu' = 1, \dots, M$, where $\phi_\nu''$ is the second derivative of $\phi_\nu$. $V$ is an $M \times M$ matrix with elements

$$V_{\nu\nu'} = \sum_{q=1}^m\sum_{l=1}^m \hat\Sigma_{ql}\,\phi_\nu(t_{(q)})\phi_{\nu'}(t_{(l)})\,w_q w_l \tag{2.18}$$

in the regular case and

$$V_{\nu\nu'} = \frac{1}{\tilde n}\sum_{p=1}^n \frac{\chi_{[N_p>1]}}{N_p(N_p-1)} \sum_{1\le l\ne q\le N_p} \frac{\phi_\nu(t_{pq})(Y_{pq} - \hat\mu(t_{pq}))}{\hat h(t_{pq})} \cdot \frac{\phi_{\nu'}(t_{pl})(Y_{pl} - \hat\mu(t_{pl}))}{\hat h(t_{pl})} \tag{2.19}$$

in the irregular case, where $\tilde n = \sum_{p=1}^n \chi_{[N_p>1]}$.

The algorithm for solving (2.17) is as follows:

Perform the Cholesky factorization $L_1^TL_1 = J + \alpha_1K$ and calculate the inverse matrix $L_1^{-1}$ of $L_1$. Let $B_1 = (L_1^{-1})^TVL_1^{-1}$ and compute the first eigenvector $d_1$ of $B_1$. Then

$$c_1 = L_1^{-1}d_1r_1,$$

where $r_1$ is a real number chosen such that $c_1^TJc_1 = 1$.

For $k > 1$, suppose that we have obtained $c_1, \dots, c_{k-1}$. Perform the Cholesky factorization $L_k^TL_k = J + \alpha_kK$ and calculate the inverse matrix $L_k^{-1}$ of $L_k$. Let $C_{k-1} = [c_1, \dots, c_{k-1}]$, that is, $C_{k-1}$ is an $M \times (k-1)$ matrix with the $j$-th column equal to $c_j$. Perform the QR decomposition $Q_kR_k = (L_k^{-1})^TJC_{k-1}$, where $Q_k$ is an $M \times (k-1)$ matrix whose columns have norm 1 and are orthogonal to each other, and $R_k$ is an upper triangular matrix. Calculate the projection matrix $P_k = I - Q_kQ_k^T$ onto the linear space orthogonal to the linear space spanned by the columns of $(L_k^{-1})^TJC_{k-1}$, where $I$ is the $M$-dimensional identity matrix. Let $B_k = P_k(L_k^{-1})^TVL_k^{-1}P_k$ and compute the first eigenvector $d_k$ of $B_k$. Then

$$c_k = L_k^{-1}d_kr_k,$$

where $r_k$ is a real number chosen such that $c_k^TJc_k = 1$.
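The Cholesky and QR steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results in this chapter: the function name `smoothed_pc_coefficients`, the random test matrices and the choice of smoothing parameters are hypothetical, and "first eigenvector" is read as the eigenvector belonging to the largest eigenvalue. Note that `np.linalg.cholesky` returns a lower-triangular factor $G$ with $GG^T = A$, so the matrix $L_k$ of the text corresponds to $G^T$.

```python
import numpy as np

def smoothed_pc_coefficients(V, J, K, alphas):
    """Solve the successive problems (2.17), one smoothing parameter per component.

    Returns the maximum values and an M x len(alphas) matrix whose k-th column is c_k.
    """
    M = J.shape[0]
    coefs, values = [], []
    for alpha in alphas:
        # L^T L = J + alpha K, with L upper triangular (transpose of NumPy's factor).
        L = np.linalg.cholesky(J + alpha * K).T
        L_inv = np.linalg.inv(L)
        B = L_inv.T @ V @ L_inv
        if coefs:
            # Project onto the complement of span((L^{-1})^T J C_{k-1}).
            C_prev = np.column_stack(coefs)
            Q, _ = np.linalg.qr(L_inv.T @ J @ C_prev)   # reduced QR, Q is M x (k-1)
            P = np.eye(M) - Q @ Q.T
            B = P @ B @ P
        B = (B + B.T) / 2                      # remove round-off asymmetry
        evals, evecs = np.linalg.eigh(B)       # ascending eigenvalues
        d = evecs[:, -1]                       # eigenvector of the largest eigenvalue
        c = L_inv @ d
        c = c / np.sqrt(c @ J @ c)             # rescale so that c^T J c = 1
        coefs.append(c)
        values.append(evals[-1])
    return np.array(values), np.column_stack(coefs)

# Hypothetical inputs: random symmetric positive (semi)definite matrices.
rng = np.random.default_rng(0)
M = 6
A = rng.normal(size=(M, M)); J = A @ A.T + M * np.eye(M)
D = rng.normal(size=(M, M)); K = D @ D.T
E = rng.normal(size=(M, M)); V = E @ E.T
alphas = [1e-2, 1e-3, 1e-4]
values, C = smoothed_pc_coefficients(V, J, K, alphas)
```

The returned `values[k]` is the maximum of the ratio in (2.17) for the $(k+1)$-th component, and the columns of `C` satisfy the $J$-orthonormality constraints by construction.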

2.3.4 Consistency

We assume throughout this section that we want to estimate the first $K$ principal component curves, where $K$ is any fixed positive integer. First, we consider the regular model (2.7). For this model, we consider the following two cases for the distributions of $t_q$:

Case 1 (Nonrandom Case). $\{t_q, 1 \le q \le m\}$ are nonrandom. Define

$$\delta_m = \max_{2\le q\le m}\left(t_{(q)} - t_{(q-1)}\right). \tag{2.20}$$

Case 2 (Random Case). $\{t_q, 1 \le q \le m\}$ are i.i.d. random variables having density function $h(t)$ on $[a, b]$ with respect to Lebesgue measure and are independent of the random functions $X_p$, $1 \le p \le n$. Furthermore, $h(t)$ has a positive lower bound $c$.

In order to give the consistency result for the regular model (2.7), we need the following two additional assumptions:

Assumption 4. The measurement errors $\epsilon_{pq}$, $1 \le p \le n$, $1 \le q \le m$, are independent random variables and are independent of the random functions $X_p$, $1 \le p \le n$, and the observation times $t_q$, $1 \le q \le m$. For each $q$, $\{\epsilon_{1q}, \dots, \epsilon_{nq}\}$ have the same distribution with mean 0 and variance $\sigma_q^2$. Furthermore,

$$\sup_q \sigma_q^2 \le \sigma^2, \qquad \sup_{q,l} E|\epsilon_{ql}|^3 \le \rho,$$

where $\sigma$ and $\rho$ are some positive numbers that do not depend on $m$.

Remark 2.3.3. We do not assume that all the measurement errors have the same distribution. Instead we only assume that the errors arising at the same observation time have the same distribution, which is more general than the former.

Assumption 5. The covariance function $\Gamma(s, t)$ is a continuous function on $[a, b] \times [a, b]$.

Define a function

$$\varpi(\delta) = \sup_{s,t\in[a,b],\ |s-t|\le\delta} \left[\Gamma(t, t) - 2\Gamma(s, t) + \Gamma(s, s)\right], \tag{2.21}$$

where $0 < \delta \le b - a$. Note that $\Gamma(t, t) - 2\Gamma(s, t) + \Gamma(s, s) = E\big[((X(s) - \mu(s)) - (X(t) - \mu(t)))^2\big]$. Under Assumption 5, we have $\lim_{\delta\to0}\varpi(\delta) = 0$ and $\Gamma$ is bounded. If $\Gamma$ is smooth, then $\varpi(\delta) = O(\delta)$. Although the covariance functions of Brownian motion and the Poisson process with rate 1 are not smooth, for both of them we have $E\big[((X(s) - \mu(s)) - (X(t) - \mu(t)))^2\big] = |t - s|$, and therefore $\varpi(\delta) = \delta$.

Theorem 2.3.4. Under Assumptions 1-5, suppose that $m \to \infty$, $n \to \infty$, $\max_{1\le k\le K}\alpha_k \to 0$ and

$$\frac{\max_{1\le k\le K}\alpha_k}{\min_{1\le k\le K}\alpha_k} = O_p(1). \tag{2.22}$$

If, for Case 1,

$$\frac{1}{\min_{1\le k\le K}\alpha_k}\left[\varpi(\delta_m) + \delta_m + \frac{\delta_m}{n}\right] \to 0,$$

and, for Case 2,

$$\frac{1}{\min_{1\le k\le K}\alpha_k}\left[\varpi\left(\frac{3\log m}{cm}\right) + \frac{\log m}{m} + \frac{\log m}{nm}\right] \to 0,$$

then the estimators $\{(\hat\lambda_k, \hat\gamma_k) : 1 \le k \le K\}$ are consistent.

Second, we consider the irregular model (2.11). For this model, we make the following assumptions on the number of observation points, measurement errors, mean functions and density functions. They are parts of the assumptions in Yao et al. [54] and Hall et al. [20].

Assumption 6. The numbers of observation points $N_p$, $1 \le p \le n$, are i.i.d. random variables taking positive integer values with $EN_p < \infty$ and $P(N_p > 1) > 0$. The measurement errors $\epsilon_{pq}$, $1 \le p \le n$, $1 \le q \le N_p$, are i.i.d. random variables with mean zero and finite variance. The random functions, the observation points, the numbers of observation points and the measurement errors are independent.

Assumption 7. Both the mean function and the density function have square integrable second derivatives, that is, $\mu(t), h(t) \in W_2^2([a, b])$. The kernel $\kappa$ in (2.12) is compactly supported, symmetric and Hölder continuous. The smoothing parameter $\eta_\mu$ in (2.12) satisfies

$$n^{\rho_1 - 1/2}\,\eta_\mu^{-1} = o(1), \qquad \eta_\mu = o(n^{-1/4}),$$

where $\rho_1 > 0$ is some constant. There are two positive constants $c < C$ such that $c \le h(t) \le C$ for $a \le t \le b$, and the smoothing parameter $\eta_g$ satisfies

$$\eta_g \to 0, \qquad n^{1-\rho_2}\,\eta_g \to \infty,$$

where $\rho_2 > 0$ is some constant.

Now we present the consistency result for the irregular model.

Theorem 2.3.5. Under Assumptions 1-3 and 6-7, suppose that $n \to \infty$, $\max_{1\le k\le K}\alpha_k \to 0$ and

$$\frac{\max_{1\le k\le K}\alpha_k}{\min_{1\le k\le K}\alpha_k} = O_p(1).$$

If

$$\frac{1}{\min_{1\le k\le K}\alpha_k}\left[n^{-1/2}\left(\eta_\mu^{-1} + \eta_g^{-1/2-\epsilon}\right) + \eta_g^{3/4-\epsilon}\right] \to 0$$

for some $\epsilon > 0$, then the estimators $\{(\hat\lambda_k, \hat\gamma_k) : 1 \le k \le K\}$ are consistent.

2.3.5 Extensions to FPCA for functional data with multidimensional arguments

Functional data with multidimensional arguments are collected in a growing number of fields. For example, in spatial data analysis, data are collected from different places

and at different times. Such data can be viewed as discretely observed functional data which are functions of both space and time. Analysis of such data is considered an important direction of functional data analysis (see Section 22.2 in Ramsay and Silverman [33] and Section 5 in Müller [28]). Our method can be easily extended to this context by defining similar successive optimization problems in multidimensional spaces. The numerators in the successive optimization problems (2.14) and (2.15) can be straightforwardly extended to the multidimensional case, and the penalty terms in the denominators can be replaced with

$$J_m^d(\gamma) = \sum_{j_1+\cdots+j_d=m} \frac{m!}{j_1!\cdots j_d!} \int_\Omega \left(\frac{\partial^m\gamma}{\partial t_1^{j_1}\cdots\partial t_d^{j_d}}\right)^2 dt_1\cdots dt_d,$$

where $d$ is the dimension of the space of the arguments, $\Omega \subset \mathbb{R}^d$ is the region in which the function is defined, and we assume that the eigenfunctions have square integrable $m$-th derivatives. Our method avoids estimating covariance functions, which have $2d$ arguments and are not easy to estimate when $d \ge 2$.

2.4 Simulation studies

To illustrate the performance of our method, we conduct three simulation studies. In the first study, the sample curves are smooth, with both equally and unequally spaced observation time points, and we compare our method with an alternative method (Method II) which first obtains smooth estimates of the mean curve and covariance function and then computes the eigenfunctions of the smoothed covariance function as the estimates of the PC curves. For the second method we use the software package PACE, which was developed by Yao et al. and downloaded from http://www.stat.ucdavis.edu/pace/download. In the second study, the sample curves are simulated with 3 true principal curves and we compare our method with Method II. In the third study, we simulate random surfaces and perform FPCA in two-dimensional space with our method.

2.4.1 Smooth random curves with 2 PC curves

We first simulate 200 curves from the following random curve on $[0, 1]$,

$$X(t) = 2\sin(2\pi U)\sin\left(\frac{\pi t}{2}\right) + \cos(2\pi U)\sin\left(\frac{3\pi t}{2}\right),$$

where $U$ is a random variable with uniform distribution on $[0, 1]$. The covariance function of $X(t)$ has two nonzero eigenvalues and the corresponding eigenfunctions are $\sqrt2\sin(\pi t/2)$ and $\sqrt2\sin(3\pi t/2)$. Figure 2.1 shows the plot of the first two principal component curves.

[Figure 2.1 The First Two Principal Component Curves. Plot of $\gamma(t)$ against $t$ on $[0, 1]$ for the 1st and 2nd PC curves.]
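The simulation setup above can be sketched as follows. The function names, the random seed and the 101-point grid are assumptions made for illustration. The sketch also checks the stated eigenfunctions against the population covariance, using the fact that $\sin(2\pi U)$ and $\cos(2\pi U)$ are uncorrelated with variance $1/2$ each, which gives eigenvalues 1 and 1/4.

```python
import numpy as np

def simulate_curves(n_curves, t, rng):
    """Draw curves X(t) = 2 sin(2 pi U) sin(pi t / 2) + cos(2 pi U) sin(3 pi t / 2)."""
    u = rng.uniform(0.0, 1.0, size=(n_curves, 1))
    return (2 * np.sin(2 * np.pi * u) * np.sin(np.pi * t / 2)
            + np.cos(2 * np.pi * u) * np.sin(3 * np.pi * t / 2))

def population_cov(s, t):
    """Covariance of X, from Var(sin(2 pi U)) = Var(cos(2 pi U)) = 1/2."""
    return (2 * np.sin(np.pi * s / 2) * np.sin(np.pi * t / 2)
            + 0.5 * np.sin(3 * np.pi * s / 2) * np.sin(3 * np.pi * t / 2))

rng = np.random.default_rng(2024)          # arbitrary seed
t = np.linspace(0.0, 1.0, 101)             # illustrative grid
X = simulate_curves(200, t, rng)           # 200 x 101 matrix of curve values

# Trapezoidal quadrature weights on the grid.
w = np.full(t.size, t[1] - t[0])
w[[0, -1]] /= 2

G = population_cov(t[:, None], t[None, :])
g1 = np.sqrt(2) * np.sin(np.pi * t / 2)
g2 = np.sqrt(2) * np.sin(3 * np.pi * t / 2)

# Quadrature approximations of <g, Gamma g>, i.e. the two nonzero eigenvalues.
lam1 = (g1 * w) @ G @ (g1 * w)             # close to 1
lam2 = (g2 * w) @ G @ (g2 * w)             # close to 0.25
```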

[Figure 2.2 Simulated Sample Curves for the simulations in Section 2.4.1. Panels: one sample curve with discrete and noisy observations (one sample curve, 101 noisy observation points); 200 sample curves without noise; 200 discretely observed sample curves with noise. Axes: $t$ and $X(t)$.]

For discrete observations, we will consider two cases.

Regular case

The observed data are generated from the following model

$$Y_{pq} = X_p(t_q) + \epsilon_{pq}, \quad t_q = \frac{q-1}{100}, \quad q = 1, \dots, n, \quad p = 1, \dots, 200,$$

where the measurement errors $\epsilon_{pq} \sim N(0, 3)$ and the observation points $\{t_{pq}\}$ are equally spaced on $[0, 1]$. We consider $n = 101$, 51 and 21, that is, we sample different numbers of observation points: 101 equally spaced with measurement errors; 51 equally spaced with measurement errors; 21 equally spaced with measurement errors. Plots of the simulated sample curves are shown in Figure 2.2. We estimate the first two eigenfunctions by our method and Method II respectively. We conducted 200 simulations, and in each simulation 200 observations are generated as the training sample and 500 observations as the test sample. For our method, we use the usual cross-validation procedure to select the smoothing parameter $\alpha$ from {1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 1e-03, 1e-02, 1e-01}, such that the total variance accounted for by all the principal components on the test data is maximized. We obtain the smoothing parameter $\alpha$ = 1e-04. The parameters for Method II can be chosen by the generalized cross-validation (GCV) method. Table 2.1 lists the cumulative variance of selected principal component scores. Under different settings, the first two estimated principal component curves obtained by our method explain larger total variation in the data.

Table 2.1 The averages and standard deviations of cumulative variance of selected principal component scores for the simulations in Section 2.4.1: Regular Case. For each sampling strategy shown in column 1, the first row is the average and standard deviation of the first estimated PC score variance; the second row is for the second estimated PC score variance.

                                   Our Method                       Method II (PACE)
Sampling            Selected PCs   Var.PC.Score   Var.PC.Score      Var.PC.Score   Var.PC.Score
                                   (Mean)         (Variance)        (Mean)         (Variance)
101 Equally spaced  1st PC         0.005264516    0.0002149985      0.0053         0.0035
                    2nd PC         0.008040198    0.0001687864      0.0056         0.0035
51 Equally spaced   1st PC         0.01088641     0.0004329838      0.0108         0.0075
                    2nd PC         0.01695650     0.0004169770      0.0119         0.0075
21 Equally spaced   1st PC         0.03002970     0.001443941       0.0322         0.0164
                    2nd PC         0.04807245     0.001607120       0.0391         0.0164
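For the regular case, the quadrature weights (2.9) and the numerator of (2.8) can be computed directly from the observation matrix. The sketch below is an illustration under stated assumptions: the helper names `quadrature_weights` and `rayleigh_numerator` are hypothetical, the seed is arbitrary, and $N(0, 3)$ is read as a normal distribution with variance 3.

```python
import numpy as np

def quadrature_weights(t):
    """Weights w_q of (2.9) for sorted observation points t_(1) < ... < t_(m)."""
    t = np.asarray(t, dtype=float)
    w = np.empty_like(t)
    w[0] = (t[1] - t[0]) / 2
    w[-1] = (t[-1] - t[-2]) / 2
    w[1:-1] = (t[2:] - t[:-2]) / 2
    return w

def rayleigh_numerator(Y, t, gamma):
    """Numerator of (2.8): sum_{q,l} Sigma_hat[q,l] gamma(t_q) gamma(t_l) w_q w_l."""
    w = quadrature_weights(t)
    Yc = Y - Y.mean(axis=0)                  # subtract the pointwise sample mean
    Sigma_hat = Yc.T @ Yc / Y.shape[0]       # m x m sample covariance, divisor n
    g = gamma(t) * w
    return g @ Sigma_hat @ g

def gamma1(x):
    """First true eigenfunction of Section 2.4.1."""
    return np.sqrt(2) * np.sin(np.pi * x / 2)

rng = np.random.default_rng(1)               # arbitrary seed
t = np.linspace(0.0, 1.0, 101)               # t_q = (q - 1) / 100
u = rng.uniform(size=(200, 1))
X = (2 * np.sin(2 * np.pi * u) * np.sin(np.pi * t / 2)
     + np.cos(2 * np.pi * u) * np.sin(3 * np.pi * t / 2))
# Assumption: N(0, 3) denotes variance 3, so the standard deviation is sqrt(3).
Y = X + rng.normal(0.0, np.sqrt(3.0), size=X.shape)

value = rayleigh_numerator(Y, t, gamma1)     # noisy approximation to <gamma_1, Gamma gamma_1>
```

Because the measurement errors enter the diagonal of $\hat\Sigma$, `value` also contains a noise contribution that shrinks as the grid becomes finer.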

Irregular case

In this case, the observed data are generated from the following model

$$Y_{pq} = X_p(t_{pq}) + \epsilon_{pq}, \quad q = 1, \dots, n, \quad p = 1, \dots, 200,$$

where the measurement errors $\epsilon_{pq} \sim N(0, 3)$ and the observation points $\{t_{pq}\}$ are i.i.d. random variables from Uniform[0, 1]. That is, we sample 200 curves with different numbers of observation points: 101 unequally spaced with measurement errors; 51 unequally spaced with measurement errors; 21 unequally spaced with measurement errors. We use the principal component curves estimated from both methods to approximate the true principal component curves. We conducted 200 simulations. As in the regular case, we use the usual cross-validation procedure to obtain the parameters for our method such that the total variance accounted for by all the principal components on the test data is maximized. We obtain the smoothing parameter $\alpha$ = 1e-04, and the cumulative variances of selected PC scores are listed in Table 2.2. Under different settings, the estimated PC curves obtained by our method explain larger total variation in the data.

Table 2.2 The averages and standard deviations of cumulative variance of selected PC scores for the simulations in Section 2.4.1: Irregular Case. For each sampling strategy shown in column 1, the first row is the average and standard deviation of the first estimated PC score variance; the second row is for the second estimated PC score variance.

                                     Our Method                       Method II (PACE)
Sampling              Selected PCs   Var.PC.Score   Var.PC.Score      Var.PC.Score   Var.PC.Score
                                     (Avg.)         (Std.)            (Avg.)         (Std.)
101 Unequally spaced  1st PC         0.005266157    0.0001979994      0.0053         0.0037
                      2nd PC         0.007900791    0.0001649976      0.0056         0.0037
51 Unequally spaced   1st PC         0.01037529     0.0004865600      0.0112         0.0061
                      2nd PC         0.01672025     0.0004334033      0.0123         0.0061
21 Unequally spaced   1st PC         0.02997724     0.001536284       0.0288         0.0171
                      2nd PC         0.04821568     0.001759098       0.0359         0.0170

2.4.2 Smooth random curves with 3 PC curves

We consider the following random curve on $[0, 1]$,

$$X(t) = 3\beta_1\sin\left(\frac{\pi t}{2}\right) + 2\beta_2\sin\left(\frac{3\pi t}{2}\right) + \beta_3\sin\left(\frac{5\pi t}{2}\right),$$

where $\beta_i$, $i = 1, 2, 3$, are random variables with normal distribution on [0, 1]. The covariance function of $X(t)$ has three nonzero eigenvalues and the corresponding eigenfunctions are $\sqrt2\sin(\pi t/2)$, $\sqrt2\sin(3\pi t/2)$ and $\sqrt2\sin(5\pi t/2)$. Figure 2.3 shows the plot of the first three principal component curves.

[Figure 2.3 The First Three Principal Component Curves. Plot of $\gamma(t)$ against $t$ on $[0, 1]$ for the 1st, 2nd and 3rd PC curves.]

[Figure 2.4 Simulated Sample Curves for the simulations in Section 2.4.2. Panels: one sample curve with discrete and noisy observations; 200 sample curves without noise; 200 discretely observed sample curves with noise. Axes: $t$ and $X(t)$.]

Similar to the previous simulation, we will consider two cases.

Regular case

The observed data are generated from the following model

$$Y_{pq} = X_p(t_q) + \epsilon_{pq}, \quad t_q = \frac{q-1}{100}, \quad q = 1, \dots, n, \quad p = 1, \dots, 200,$$

where the measurement errors $\epsilon_{pq} \sim N(0, 3)$ and the observation points $\{t_{pq}\}$ are equally spaced on $[0, 1]$. Plots of the simulated sample curves from one simulation are shown in Figure 2.4. We conducted 300 simulations with different numbers of observation points sampled: 101 equally spaced measurements; 51 equally spaced measurements; 21 equally spaced measurements. In each simulation, 200 observations are generated as the training sample and 500 observations as the test sample. For our method, we use the usual cross-validation to select the smoothing parameter from {1e-10, 1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 1e-03, 1e-02, 1e-01}, such that the total variance accounted for by all the principal components on the test data is maximized. The smoothing parameters chosen under different sampling strategies are listed in Table 2.3. We use the principal component curves estimated from both methods to approximate the true principal component curves. The cumulative variances of selected PC scores are listed in Table 2.4. We can see that the first three estimated principal component curves obtained by our method account for larger variation in the data.

Table 2.3 Selected smoothing parameter with the usual cross-validation procedure for the simulations in Section 2.4.2: Regular Case

Smoothing Parameter   101 Equally Spaced   51 Equally Spaced   21 Equally Spaced
α                     1e-09                1e-06               1e-04

Table 2.4 The averages and standard deviations of cumulative variance of selected PC scores for the simulations in Section 2.4.2: Regular Case. For each sampling strategy shown in column 1, the first row is the average and standard deviation of the first estimated PC score variance; the second row is for the second estimated PC score variance.

                                   Our Method                       Method II (PACE)
Sampling            Selected PCs   Var.PC.Score   Var.PC.Score      Var.PC.Score   Var.PC.Score
                                   (Avg.)         (Std.)            (Avg.)         (Std.)
101 Equally spaced  1st PC         0.04248567     0.003085132       0.0452         0.0030
                    2nd PC         0.06662123     0.003577294       0.0456         0.0030
51 Equally spaced   1st PC         0.08836079     0.005708103       0.0894         0.0058
                    2nd PC         0.13893059     0.006298565       0.0909         0.0058
21 Equally spaced   1st PC         0.2178350      0.01486907        0.2221         0.0152
                    2nd PC         0.3435396      0.01646455        0.2313         0.0153
                    3rd PC         0.3506672      0.01646290        0.2344         0.0157

Irregular case

In this case, the observed data are generated from the following model

$$Y_{pq} = X_p(t_{pq}) + \epsilon_{pq}, \quad q = 1, \dots, n, \quad p = 1, \dots, 200,$$

where the measurement errors $\epsilon_{pq} \sim N(0, 3)$ and the observation points $\{t_{pq}\}$ are i.i.d. random variables from Uniform[0, 1]. We use the principal component curves estimated by our method and Method II to approximate the true principal component curves. As in the regular case, we conducted 300 simulations with different numbers of observation points sampled: 101 unequally spaced measurements; 51 unequally spaced measurements; 21 unequally spaced measurements. We use the usual cross-validation procedure to obtain the parameters for our method, such that the total variance accounted for by all the principal components on the test data is maximized. The smoothing parameters chosen under different sampling strategies are listed in Table 2.5 and the cumulative variances of selected PC scores are listed in Table 2.6.

Table 2.5 Selected smoothing parameter with the usual cross-validation procedure for the simulations in Section 2.4.2: Irregular Case

Smoothing Parameter   101 Unequally Spaced   51 Unequally Spaced   21 Unequally Spaced
α                     1e-06                  1e-04                 1e-03

Table 2.6 The averages and standard deviations of cumulative variance of selected PC scores for the simulations in Section 2.4.2: Irregular Case. For each sampling strategy shown in column 1, the first row is the average and standard deviation of the first estimated PC score variance; the second row is for the second estimated PC score variance.

                                     Our Method                       Method II (PACE)
Sampling              Selected PCs   Var.PC.Score   Var.PC.Score      Var.PC.Score   Var.PC.Score
                                     (Avg.)         (Std.)            (Avg.)         (Std.)
101 Unequally spaced  1st PC         0.04388102     0.002888812       0.0436         0.0029
                      2nd PC         0.06803005     0.003254693       0.0440         0.0029
                      3rd PC         0.06900475     0.003255002       0.0440         0.0029
51 Unequally spaced   1st PC         0.08348801     0.005615962       0.0757         0.0051
                      2nd PC         0.12794842     0.006420639       0.0772         0.0051
                      3rd PC         0.12948339     0.006429075       0.0772         0.0051
21 Unequally spaced   1st PC         0.2062827      0.01513302        0.2206         0.0140
                      2nd PC         0.3182653      0.01546205        0.2304         0.0140
                      3rd PC         0.3275431      0.01565964

2.4.3 Random surface

We sample 200 surfaces from the distribution of

$$X(s, t) = 1 + e^s\cos(t) + (s-1)^2t + \xi_1\sin\left(\frac{\pi(s-t)}{2}\right) + \xi_2\sin\left(\frac{\pi(s+t)}{2}\right),$$

where $0 \le s, t \le 1$, $\xi_1 \sim N(0, 2)$ and $\xi_2 \sim N(0, 1)$. The sum of the first three terms is the mean function. The covariance function of $X(s, t)$ has two nonzero eigenvalues with eigenfunctions

$$\left(\frac12 - \frac{2}{\pi^2}\right)^{-1/2}\sin\left(\frac{\pi(s-t)}{2}\right), \qquad \left(\frac12 + \frac{2}{\pi^2}\right)^{-1/2}\sin\left(\frac{\pi(s+t)}{2}\right), \qquad 0 \le s, t \le 1.$$

For each sampled surface, we make 10 to 30 observations (irregular case) from a distribution with truncated bivariate normal density with mean (0.4, 0.6) and covariance matrix $I$, restricted to the region $[0, 1] \times [0, 1]$. The number of observations for each surface is a random variable with discrete uniform distribution on {10, 11, ..., 30}. The measurement error is $\epsilon \sim N(0, 0.2^2)$. The eigenfunctions and their estimates in one simulation are plotted in Figure 2.5. The smallest MISEs of our method are 0.022 and 0.026 for the first two eigenfunctions respectively.

[Figure 2.5 Eigenfunctions and their estimates in one simulation: the top left is the first true eigenfunction; the top right is the estimate of the first eigenfunction; the bottom left is the second eigenfunction; the bottom right is the estimate of the second eigenfunction. Axes: $s$, $t$ and $f(s, t)$.]

2.5 Applications

2.5.1 Retinal pigment epithelium (RPE) data

The retinal pigment epithelium (RPE) is the pigmented cell layer between the choroid and the photoreceptor cell layer of the eye. The RPE is essential for visual function (see Strauss [45]). It provides multiple functions that support normal photoreceptor function, such as shielding the retina from excess incoming light, transporting water, nutrients and metabolic end products between the subretinal space and the blood, and secreting a variety of growth factors and signaling molecules (Zinn and Marmor [55]). The RPE is a key site of pathogenesis of age-related macular degeneration (AMD), which is a main source of vision loss, and even blindness, in the elderly (Spaide et al. [43]). The data are a collection of images of RPE cells from 88 mouse eyes provided by the L. F. Montgomery Lab of the Emory Eye Center at Emory University (Jiang et al. [24]). The purpose of the study is to examine the relationship between the morphology of the RPE layer and the age and disease status of the eye. Specifically, it is desirable to construct a classification rule based on the data so that the RPE morphology of eyes with different genotypes and in different age groups can be separated. There are two genotypes in the data, wild and mutated, and two age groups, young (age ≤ 60 days) and elderly (age > 60 days). Hence, we have four classes (that is, the four combinations of genotype and age group). Each image contains several thousand cells. Several characteristics of each cell were measured, including area, perimeter, aspect ratio, and so on. Local regions of two images with different genotypes but the same age of 60 days are shown in Figure 2.6 (Chrenek et al. [10]). It can be seen that the distributions of the area and the shape of cells in the two images are quite different. Hence, we use the distributions of the area and of the aspect ratio (a measure of shape) of cells as classifiers, respectively.
The density curves of the area and the aspect ratio for each eye are estimated using the penalized likelihood method (see Section 5.4.3 in Ramsay et al. [34]), and the principal component scores are calculated and used to construct the classification rules. The eyes in different age groups can be separated using the distribution of the area of cells, but this distribution cannot distinguish eyes with different genotypes in the same age group. Conversely, the distribution of the aspect ratio of cells can separate eyes with different genotypes, but cannot distinguish those with the same genotype in different age groups. Hence, we combine the information on the area and the aspect ratio of cells and apply our method to the joint density functions.

Figure 2.6 Local regions of two images with different genotypes, but the same age equal to 60 days: (Left) RPE cells of the wild type and age 60 days; (Right) RPE cells of the mutated type and age 60 days.

The data contain 88 images of mouse eyes: 27 are in the young age group with the wild genotype, 13 are in the elderly age group with the wild genotype, 27 are in the young age group with the mutated genotype and 21 are in the elderly age group with the mutated genotype. We first estimate the joint density function of the area and the aspect ratio of cells in each image using the kernel method (see Section 5.6 in Venables and Ripley [49]). The values of the density functions are calculated on a grid of 731 × 21 equally spaced points in the two-dimensional space (cell areas range between 0 and 730 µm² and aspect ratios between 0 and 1). The mean joint densities of the four categories are plotted in Figure 2.7, which indicates that the joint density is a good classifier of the genotype and the age group. We apply our method to the 88 joint density functions. Most of the variation in the data is accounted for by the first four principal components, which are plotted in Figure 2.8. We then calculate four PC scores for each eye image, so that the PC scores form an 88 × 4 matrix, which is used to construct classification rules. We apply three classification methods, LDA (linear discriminant analysis), QDA (quadratic discriminant analysis) and SVM (support vector machine), to the matrix. Leave-one-out cross validation is used to