An Introduction to Functional Data Analysis

Size: px

Start display at page:

Download "An Introduction to Functional Data Analysis"

Nigel Taylor
6 years ago
Views:

1 An Introduction to Functional Data Analysis Chongzhi Di Fred Hutchinson Cancer Research Center Biotat 578A: Special Topics in (Genetic) Epidemiology November 10, 2015

2 Textbook Ramsay and Silverman (2005), Functional Data Analysis, 2nd edition, Springer. Ruppert, Wand and Carroll (2003), Semiparametric Regression, Cambridge University Press. Software: R packages fda: Functional Data Analysis refund: Regression with Functional Data SemiPar: Semiparametric Regression Acknowledgements Prof. Giles Hooker, Cornell University hooker 1

3 Introduction What Is Functional Data? Functional data is multivariate data with an ordering on the dimensions. (Müller, (2006)) Key assumption is smoothness: y ij = x i (t ij ) + ɛ ij with t in a continuum (usually time), and x i (t) smooth Functional data = the functions x i (t). 2 Highest quality data from monitoring equipment Optical tracking equipment (eg handwriting data, but also for physiology, motor control,...) Electrical measurements (EKG, EEG and others) Spectral measurements (astronomy, materials sciences) But, noisier and less frequent data can also be used. 8 /181

4 Introduction Canadian Weather Data Average daily temperature and precipitation records in 35 weather stations across Canada (classical and much over-used) Temperature Precipitation 3 Interest is in variation in and relationships between smooth, underlying processes. 10 / 181

5 Introduction Weather In Vancouver Measure of climate: daily precipitation and temperature in Vancouver, BC averaged over 40 years. 4 9 / 181

6 Introduction Medfly Data Records of number of eggs laid by Mediterranean Fruit Fly (Ceratitis capitata) in each of 25 days (courtesy of H.-G. Müller). Total of 50 flies Assume eggcount measurements relate to smooth process governing fertility Also record total lifespan of each fly. Would like to understand how fecundity at each part of lifetime influences lifespan / 181

7 Multilevel Functional Principal Component Analysis Introduction SHHS: Sleep Heart Health Study More than 3, 000 subjects, two visits per subject Y ij (t): normalized EEG δ power series subject 1 visit 1 subject 1 visit subject 2 visit 1 subject 2 visit subject 3 visit 1 subject 3 visit time (hours) time (hours) / 27

8 FDA for accelerometry data Introduction Accelerometry data Activity Count Data: three dimensional time series per subject 1-minute resolution: time points in 7 days P10002 day P10002 day P10002 day P10002 day P10002 day P10002 day P10002 day / 21

9 Introduction What Are We Interested In? Representations of distribution of functions mean variation covariation Relationships of functional data to covariates responses other functions Relationships between derivatives of functions. Timing of events in functions / 181

10 Introduction What Are The Challenges? Estimation of functional data from noisy, discrete observations. Numerical representation of infinite-dimensional objects Representation of variation in infinite dimensions. Description of statistical relationships between infinite dimensional objects. n < p =, and use of smoothness. Measures of variation in estimates / 181

11 Representing Functional Data Representing Functional Data / 181

12 Representing Functional Data From Discrete to Functional Data Represent data recorded at discrete times as a continuous function in order to Medfly record 1 Allow evaluation of record at any time point (especially if observation times are not the same across records). Evaluate rates of change. Reduce noise. Allow registration onto a common time-scale / 181

13 Representing Functional Data From Discrete to Functional Data Two problems/two methods 1 Representing non-parametric continuous-time functions. Basis-expansion methods: x(t) = K φ i (t)c i for pre-defined φ i (t) and coefficients c i. Several basis systems available: focus on Fourier and B-splines 2 Reducing noise in measurements Smoothing penalties: n c = argmin (y i x(t i )) 2 + λ [Lx(t)] 2 dt i=1 Lx(t) measures roughness of x λ a smoothing parameter that trades-off fit to the y i and roughness; must be chosen. i= / 181

14 Representing Functional Data: Basis Expansions Basis Expansions Consider only one record represent x(t) as x(t) = y i = x(t i ) + ɛ i K c j φ j (t) = Φ(t)c j=1 We say Φ(t) is a basis system for x. Terms for curvature in linear regression 13 implies y i = β 0 + β 1 t i + β 2 t 2 i + β 3 t 3 i + + ɛ i x(t) = β 0 + β 1 t + β 2 t 2 + β 3 t 3 + Polynomials are unstable; Fourier bases and B-splines will be more useful. 18 / 181

15 Representing Functional Data: Basis Expansions The Fourier Basis basis functions are sine and cosine functions of increasing frequency: 1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt),... sin(mωt), cos(mωt),... constant ω = 2π/P defines the period P of oscillation of the first sine/cosine pair / 181

16 Representing Functional Data: Basis Expansions B-spline Bases Splines are polynomial segments joined end-to-end. Segments are constrained to be smooth at the joins. The points at which the segments join are called knots. System defined by The order m (order = degree+1) of the polynomial the location of the knots. Bsplines are a particularly useful means of incorporating the constraints. See de Boor, 2001, A Practical Guide to Splines, Springer / 181

17 Representing Functional Data: Basis Expansions Splines Medfly data with knots every 3 days. Splines of order 2: piecewise linear, continuous / 181

18 Representing Functional Data: Basis Expansions Splines Medfly data with knots every 3 days. Splines of order 3: piecewise quadratic, continuous derivatives / 181

19 Representing Functional Data: Basis Expansions Splines Medfly data with knots every 3 days. Splines of order 4: piecewise cubic, continuous 2nd derivatives / 181

20 Representing Functional Data: Basis Expansions An illustration of basis expansions for B-splines 19 Sum of scaled basis functions results in fit. 26 / 181

21 Representing Functional Data: Smoothing Penalties Ordinary Least-Squares Estimates Assume we have observations for a single curve y i = x(t i ) + ɛ and we want to estimate K x(t) c j φ j (t) j=1 Minimize the sum of squared errors: n SSE = (y i x(t i )) 2 = i=1 This is just linear regression! n (y i Φ(t i )c) 2 i= / 181

22 Representing Functional Data: Smoothing Penalties Linear Regression on Basis Functions If the N by K matrix Φ contains the values φ j (t k ), and y is the vector (y 1,...,y N ), we can write SSE(c) = (y Φc) T (y Φc) The error sum of squares is minimized by the ordinary least squares estimate ĉ = Then we have the estimate ( Φ T Φ) 1 Φ T y ( 1 ˆx(t) = Φ(t)ĉ = Φ(t) Φ Φ) T Φ T y / 181

23 Representing Functional Data: Smoothing Penalties Smoothing Penalties Problem: how to choose a basis? Large affect on results. Finesse this by specifying a very rich basis, but then imposing smoothness. In particular, add a penalty to the least-squares criterion: PENSSE = n (y i x(t i )) 2 + λj[x] i=1 J[x] measures roughness of x. λ represents a continuous tuning parameter (to be chosen): λ : roughness increasingly penalized,x(t) becomes smooth. λ 0: penalty reduces, x(t) fits data better /181

24 Representing Functional Data: Smoothing Penalties What do we mean by smoothness? Some things are fairly clearly smooth: constants straight lines What we really want to do is eliminate small wiggles in the data while retaining the right shape Too smooth Too rough Just right / 181

25 Representing Functional Data: Smoothing Penalties The D Operator We use the notation that for a function x(t), Dx(t) = d dt x(t) We can also define further derivatives in terms of powers of D: D 2 x(t) = d2 dt 2x(t),...,Dk x(t) = dk x(t),... dtk Dx(t) is the instantaneous slope of x(t); D 2 x(t) is its curvature. We measure the size of the curvature for all of x by J 2 [x] = [D 2 x(t) ] 2 dt / 181

26 Representing Functional Data: Smoothing Penalties Calculating the Penalized Fit When x(t) = Φ(t)c, we have that [D 2 x(t) ] 2 dt = c T [ D 2 Φ(t) ] [ D 2 Φ(t) ] T cdt = c T R 2 c [R 2 ] jk = [D 2 φ j (t)][d 2 φ k (t)]dt is the penalty matrix. The penalized least squares estimate for c is n ĉ = This is still a linear smoother: [ Φ T Φ + λr 2 ] 1 Φ T y [ ] 1 ŷ = Φ Φ T Φ + λr 2 Φ T y = S(λ)y / 181

27 Representing Functional Data: Smoothing Penalties Linear Smooths and Degrees of Freedom In least squares fitting, the degrees of freedom used to smooth the data is exactly K, the number of basis functions In penalized smoothing, we can have K > n. The smoothing penalty reduces the flexibility of the smooth The degrees of freedom are controlled by λ. A natural measure turns out to be [ ] 1 df (λ) = trace[s(λ)], S(λ) = Φ Φ T Φ + λr L Φ T Medfly data fit with 25 basis functions, λ = e 4 resulting in df = / 181

28 Representing Functional Data: Smoothing Penalties Choosing Smoothing Parameters: Cross Validation There are a number of data-driven methods for choosing smoothing parameters. Ordinary Cross Validation: leave one point out and see how well you can predict it: OCV(λ) = 1 n ( ) 2 y i x i λ (t 1 (yi x λ (t i )) 2 i) = n (1 S(λ) ii ) 2 Generalized Cross Validation tends to smooth more: GCV(λ) = (yi x λ (t i )) 2 [trace(i S(λ))] 2 will be used here. Other possibilities: AIC, BIC, / 181

29 Representing Functional Data: Smoothing Penalties Generalized Cross Validation Use a grid search, best to do this for log(λ) Smooth Rough Right GCV / 181

30 Representing Functional Data: Smoothing Penalties Alternatives: Smoothing and Mixed Models Connection between the smoothing criterion for c: PENSSE(λ) = n (y i c T Φ(t i )) 2 + λc T Rc i=1 and negative log likelihood if c N(0, τ 2 R 1 ): log L(c y) = 1 2σ 2 n i=1 (y i c T Φ(t i )) τ 2cT Rc (note that R is singular must use generalized inverse). Suggests using ReML estimates for σ 2 and τ 2 in place of λ. This can be carried further in FDA; see references / 181

31 Representing Functional Data: Smoothing Penalties Summary 1 Basis Expansions x i (t) = Φ(t)c i 30 Good basis systems approximate any (sufficiently smooth) function arbitrarily well. Fourier bases useful for periodic data. B-splines make efficient, flexible generic choice. 2 Smoothing Penalties used to penalize roughness of result Lx(t) = 0 defines what is smooth. Commonly Lx = D 2 x straight lines are smooth. Alternative: Lx = D 3 x + wdx sinusoids are smooth. Departures from smoothness traded off against fit to data. GCV used to decide on trade off; other possibilities available. These tools will be used throughout the rest of FDA. Once estimated, we will treat smooths as fixed, observed data (but see comments at end). 44 / 181

32 Exploratory Data Analysis Exploratory Data Analysis / 181

33 Exploratory Data Analysis Mean and Variance Summary statistics: mean x(t) = 1 n xi (t) covariance σ(s, t) = cov(x(s), x(t)) = 1 n (xi (s) x(s))(x i (t) x(t)) Medfly Data: / 181

34 Exploratory Data Analysis Correlation ρ(s, t) = σ(s, t) σ(s, s) σ(t, t) From multivariate to functional data: turn subscripts j, k into arguments s, t / 181

Exploratory Data Analysis Functional PCA Instead of covariance matrix Σ, we have a surface σ(s, t). Would like a low-dimensional summary/interpretation.

35 Exploratory Data Analysis Functional PCA Instead of covariance matrix Σ, we have a surface σ(s, t). Would like a low-dimensional summary/interpretation. Multivariate PCA, use Eigen-decomposition: Σ = U T DU = p d j u j uj T j=1 and ui T u j = I(i = j). For functions: use Karhunen-Loève decomposition: σ(s, t) = d j ξ j (s)ξ j (t) j=1 for ξ i (t)ξ j (t)dt = I(i = j) / 181

36 Exploratory Data Analysis PCA and Karhunen-Loève σ(s, t) = d i ξ i (s)ξ i (t) i=1 The ξ i (t) maximize Var [ ξ i (t)x j (t)dt ]. d i = Var [ ξ i (t)x j (t)dt ] d i / d i is proportion of variance explained Principal component scores are f ij = ξ j (t)[x i (t) x(t)]dt Reconstruction of x i (t): x i (t) = x(t) + f ij ξ j (t) j= / 181

37 Exploratory Data Analysis functional Principal Components Analysis fpca of Medfly data Scree Plot Components Usual multivariate methods: choose # components based on percent variance explained, screeplot, or information criterion / 181

38 Exploratory Data Analysis functional Principal Components Analysis Interpretation often aided by plotting x(t) ± 2 d i ξ i (t) 37 PC1 = overall fecundity PC2 = beginning versus end PC3 = middle versus ends 51 / 181

39 Exploratory Data Analysis Derivatives Derivatives Component 1 PCs Component 2 Often useful to examine a rate of change. Examine first derivative of medfly data. Variation divides into fast or slow either early or late / 181

Alternatives. The D Operator

Alternatives. The D Operator Using Smoothness Alternatives Text: Chapter 5 Some disadvantages of basis expansions Discrete choice of number of basis functions additional variability. Non-hierarchical bases (eg B-splines) make life