FUNCTIONAL DATA ANALYSIS. Contribution to the International Handbook (Encyclopedia) of Statistical Sciences. July 28, 2009. Hans-Georg Müller


FUNCTIONAL DATA ANALYSIS

Contribution to the International Handbook (Encyclopedia) of Statistical Sciences

July 28, 2009

Hans-Georg Müller 1
Department of Statistics
University of California, Davis
One Shields Ave., Davis, CA 95616, USA
e-mail: mueller@wald.ucdavis.edu

1 Research partially supported by NSF Grant DMS-0806199

Functional data analysis (FDA) refers to the statistical analysis of data samples consisting of random functions or surfaces, where each function is viewed as one sample element. Typically, the random functions contained in the sample are considered to be independent and smooth. FDA methodology is essentially nonparametric, utilizes smoothing methods, and allows for flexible modeling. The underlying random processes generating the data are sometimes assumed to be (non-stationary) Gaussian processes.

Functional data are ubiquitous and may involve samples of density functions (Kneip and Utikal, 2001) or hazard functions (Chiou and Müller, 2009). Application areas include growth curves, econometrics, evolutionary biology, genetics and general kinds of longitudinal data. FDA methodology features functional principal component analysis (Rice and Silverman, 1991), warping and curve registration (Gervini and Gasser, 2004) and functional regression (Ramsay and Dalzell, 1991). Theoretical foundations and asymptotic analysis of FDA are closely tied to perturbation theory of linear operators in Hilbert space (Bosq, 2000). Finite sample implementations often require addressing ill-posed problems with suitable regularization. A broad overview of applied aspects of FDA can be found in the textbook by Ramsay and Silverman (2005).

The basic statistical methodologies of ANOVA, regression, correlation, classification and clustering that are available for scalar and vector data have spurred analogous developments for functional data. An additional aspect is that the time axis itself may be subject to random distortions, and adequate functional models sometimes need to reflect such time-warping. Another issue is that the random trajectories are often not directly observed. Instead, for each sample function one has available measurements on a time grid that may range from very dense to extremely sparse.
Sparse and randomly distributed measurement times are frequently encountered in longitudinal studies. Additional contamination of the measurements of the trajectory levels by errors is also common. These situations require careful modeling of the relationship between the recorded observations and the assumed underlying functional trajectories (Rice and Wu, 2001; James and Sugar, 2003; Yao et al., 2005).

Initial analysis of functional data includes exploratory plotting of the observed functions in a spaghetti plot to obtain an initial idea of functional shapes, check for outliers and identify landmarks. Preprocessing may include outlier removal and curve alignment (registration) to adjust for time-warping.

Basic objects in FDA are the mean function $\mu$ and the covariance function $G$. For square integrable random functions $X(t)$,

$$\mu(t) = E(X(t)), \qquad G(s, t) = \mathrm{cov}\{X(s), X(t)\}, \qquad s, t \in T, \tag{1}$$

with auto-covariance operator $(Af)(t) = \int_T f(s)\, G(s, t)\, ds$. This linear operator of Hilbert-Schmidt type has orthonormal eigenfunctions $\phi_k$, $k = 1, 2, \ldots$, with associated ordered eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots$, such that $A\phi_k = \lambda_k \phi_k$. The foundation for functional principal component analysis is the Karhunen-Loève representation of random functions

$$X(t) = \mu(t) + \sum_{k=1}^{\infty} A_k \phi_k(t),$$

where $A_k = \int_T (X(t) - \mu(t))\, \phi_k(t)\, dt$ are uncorrelated centered random variables with $\mathrm{var}(A_k) = \lambda_k$. Estimators employing smoothing methods (local least squares or splines) have been developed for various sampling schemes (sparse, dense, with errors) to obtain a data-based version of this representation, where one regularizes by truncating at a finite number $K$ of included components. The idea is to borrow strength from the entire sample of functions rather than estimating each function separately. The functional data are then represented by the subject-specific vectors of score estimates $\hat{A}_k$, $k = 1, \ldots, K$, which can be used to represent individual trajectories and for subsequent statistical analysis. Useful representations are alternatively obtained with pre-specified fixed basis functions, notably B-splines and wavelets.

Functional regression models may include one or several functions among the predictors, responses, or both. For pairs $(X, Y)$ with centered random predictor functions $X$ and scalar responses $Y$, the linear model is

$$E(Y \mid X) = \int_T X(s)\, \beta(s)\, ds.$$

The regression parameter function $\beta$ is usually represented in a suitable basis, for example the eigenbasis, with coefficient estimates determined by least squares or similar criteria.
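As a rough illustration of the two steps just described, the following Python sketch estimates a truncated Karhunen-Loève representation from curves recorded on a dense, equally spaced grid without measurement error, and then fits the functional linear model by least squares on the estimated scores. It is a minimal sketch under stated assumptions, not the estimators developed in the cited literature: the simulated mean function, the two sine and cosine eigenfunctions, the eigenvalues 4 and 1, the noise level, and all function names (`trapezoid_weights`, `simulate`, `fpca`, `functional_linear_fit`) are hypothetical choices made for this example. Integrals are approximated by the trapezoid rule, and no smoothing is applied, so the sketch does not cover sparse or noisy designs.

```python
import numpy as np


def trapezoid_weights(t):
    """Trapezoid-rule quadrature weights for an equally spaced grid t."""
    w = np.full(t.size, t[1] - t[0])
    w[0] = w[-1] = (t[1] - t[0]) / 2
    return w


def simulate(n=200, m=101, seed=0):
    """Hypothetical toy data: X_i(t) = mu(t) + A_i1 phi_1(t) + A_i2 phi_2(t)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    mu = t + np.sin(2 * np.pi * t)                   # assumed mean function
    phi = np.sqrt(2) * np.vstack([np.sin(2 * np.pi * t),
                                  np.cos(2 * np.pi * t)])  # orthonormal on [0, 1]
    lam = np.array([4.0, 1.0])                       # var(A_k) = lambda_k
    A = rng.normal(size=(n, 2)) * np.sqrt(lam)       # uncorrelated centered scores
    X = mu + A @ phi                                 # (n, m) curves on the grid
    beta = 1.0 * phi[0] - 0.5 * phi[1]               # assumed regression function
    # Scalar responses: integral of (X - mu) * beta plus Gaussian noise.
    Y = ((A @ phi) * trapezoid_weights(t)) @ beta + rng.normal(scale=0.1, size=n)
    return t, X, Y, beta


def fpca(t, X, K):
    """Estimate mean, top-K eigenvalues, eigenfunctions and scores."""
    w = trapezoid_weights(t)
    mu_hat = X.mean(axis=0)
    Xc = X - mu_hat
    G_hat = Xc.T @ Xc / (X.shape[0] - 1)             # sample covariance surface
    # Symmetrized discretization of the covariance operator A.
    sw = np.sqrt(w)
    evals, evecs = np.linalg.eigh(sw[:, None] * G_hat * sw[None, :])
    top = np.argsort(evals)[::-1][:K]                # indices of the K largest
    lam_hat = evals[top]
    phi_hat = (evecs[:, top] / sw[:, None]).T        # (K, m), orthonormal in L2
    A_hat = (Xc * w) @ phi_hat.T                     # scores: integral of Xc * phi_k
    return mu_hat, lam_hat, phi_hat, A_hat


def functional_linear_fit(Y, A_hat, phi_hat):
    """Least squares of Y on the scores; beta_hat(s) = sum_k b_k phi_hat_k(s)."""
    design = np.column_stack([np.ones(len(Y)), A_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[0], coef[1:] @ phi_hat


t, X, Y, beta = simulate()
mu_hat, lam_hat, phi_hat, A_hat = fpca(t, X, K=2)
intercept, beta_hat = functional_linear_fit(Y, A_hat, phi_hat)
print("estimated eigenvalues:", np.round(lam_hat, 2))
print("max |beta_hat - beta|:", float(np.max(np.abs(beta_hat - beta))))
```

The estimated eigenfunctions are only determined up to sign, but the reconstructed regression function is unaffected because each sign flip is absorbed by the corresponding least squares coefficient. The intercept column is included because the predictors are centered at the sample mean rather than the true mean.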
A variant, which is also applicable for classification purposes, is the generalized functional linear model $E(Y \mid X) = g\{\mu + \int_T X(s)\, \beta(s)\, ds\}$ with link function $g$. The link function (and an additional variance function if applicable) is adapted to the (often discrete) distribution of $Y$; the components of the model can be estimated by quasi-likelihood.

The class of useful functional regression models is large. A flexible extension of the functional linear model is the functional additive model. Writing centered predictors as $X = \sum_{k=1}^{\infty} A_k \phi_k$, it is given by

$$E(Y \mid X) = \sum_{k=1}^{\infty} f_k(A_k)\, \phi_k$$

for smooth functions $f_k$ with $E(f_k(A_k)) = 0$. Of practical relevance are models with varying domains, with more than one predictor function, and functional (autoregressive) time series models. In addition to the functional trajectories themselves, their derivatives are of interest to study the dynamics of the underlying processes.

References

Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Springer-Verlag, New York.

Chiou, J.-M. and Müller, H.-G. (2009). Modeling hazard rates as functional data for the analysis of cohort lifetables and mortality forecasting. Journal of the American Statistical Association 104, 572-585.

Gervini, D. and Gasser, T. (2004). Self-modeling warping functions. Journal of the Royal Statistical Society: Series B 66, 959-971.

James, G. M. and Sugar, C. A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association 98, 397-408.

Kneip, A. and Utikal, K. J. (2001). Inference for density families using functional principal component analysis. Journal of the American Statistical Association 96, 519-542.

Ramsay, J. O. and Dalzell, C. J. (1991). Some tools for functional data analysis. Journal of the Royal Statistical Society: Series B 53, 539-572.

Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. 2nd ed. Springer Series in Statistics, Springer, New York.

Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society: Series B 53, 233-243.

Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57, 253-259.

Yao, F., Müller, H.-G. and Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100, 577-590.