An Introduction to Functional Data Analysis

1 An Introduction to Functional Data Analysis
Chongzhi Di, Fred Hutchinson Cancer Research Center
Biostat 578A: Special Topics in (Genetic) Epidemiology
November 10, 2015

2 Textbook
- Ramsay and Silverman (2005), Functional Data Analysis, 2nd edition, Springer.
- Ruppert, Wand and Carroll (2003), Semiparametric Regression, Cambridge University Press.
Software: R packages
- fda: Functional Data Analysis
- refund: Regression with Functional Data
- SemiPar: Semiparametric Regression
Acknowledgements
- Prof. Giles Hooker, Cornell University

3 Introduction
What Is Functional Data?
Functional data is multivariate data with an ordering on the dimensions (Müller, 2006).
The key assumption is smoothness:
$$y_{ij} = x_i(t_{ij}) + \epsilon_{ij}$$
with $t$ in a continuum (usually time) and $x_i(t)$ smooth. The functional data are the functions $x_i(t)$.
The highest-quality data come from monitoring equipment:
- Optical tracking equipment (e.g. handwriting data, but also physiology, motor control, ...)
- Electrical measurements (EKG, EEG and others)
- Spectral measurements (astronomy, materials sciences)
Noisier and less frequent data can also be used.

4 Introduction
Canadian Weather Data
Average daily temperature and precipitation records at 35 weather stations across Canada (a classical and much over-used example).
[Figure: two panels, Temperature and Precipitation]
Interest is in variation in, and relationships between, smooth underlying processes.

5 Introduction
Weather In Vancouver
Measure of climate: daily precipitation and temperature in Vancouver, BC, averaged over 40 years.

6 Introduction
Medfly Data
Records of the number of eggs laid by Mediterranean fruit flies (Ceratitis capitata) on each of 25 days (courtesy of H.-G. Müller).
- Total of 50 flies.
- Assume the egg-count measurements relate to a smooth process governing fertility.
- The total lifespan of each fly is also recorded.
- We would like to understand how fecundity at each part of the lifetime influences lifespan.

7 Multilevel Functional Principal Component Analysis: Introduction
SHHS: Sleep Heart Health Study
- More than 3,000 subjects, two visits per subject.
- $Y_{ij}(t)$: normalized EEG δ power series.
[Figure: normalized EEG δ power against time (hours) for subjects 1 to 3, one panel per subject and visit]

8 FDA for accelerometry data: Introduction
Accelerometry data
- Activity count data: three-dimensional time series per subject.
- 1-minute resolution over 7 days.
[Figure: activity counts for subject P10002, one panel per day]

9 Introduction
What Are We Interested In?
- Representations of the distribution of functions: mean, variation, covariation.
- Relationships of functional data to covariates, responses, and other functions.
- Relationships between derivatives of functions.
- Timing of events in functions.

10 Introduction
What Are The Challenges?
- Estimation of functional data from noisy, discrete observations.
- Numerical representation of infinite-dimensional objects.
- Representation of variation in infinite dimensions.
- Description of statistical relationships between infinite-dimensional objects.
- $n < p = \infty$, and use of smoothness.
- Measures of variation in estimates.

11 Representing Functional Data

12 Representing Functional Data
From Discrete to Functional Data
Represent data recorded at discrete times as a continuous function in order to:
- Allow evaluation of the record at any time point (especially if observation times are not the same across records).
- Evaluate rates of change.
- Reduce noise.
- Allow registration onto a common time scale.
[Figure: Medfly record 1]

13 Representing Functional Data
From Discrete to Functional Data
Two problems, two methods:
1. Representing non-parametric continuous-time functions. Basis-expansion methods:
$$x(t) = \sum_{i=1}^{K} \phi_i(t)\, c_i$$
for pre-defined $\phi_i(t)$ and coefficients $c_i$. Several basis systems are available; the focus here is on Fourier bases and B-splines.
2. Reducing noise in measurements. Smoothing penalties:
$$\hat{c} = \operatorname*{argmin}_{c} \sum_{i=1}^{n} (y_i - x(t_i))^2 + \lambda \int [Lx(t)]^2\, dt$$
$Lx(t)$ measures the roughness of $x$; $\lambda$ is a smoothing parameter that trades off fit to the $y_i$ against roughness, and must be chosen.

14 Representing Functional Data: Basis Expansions
Basis Expansions
Consider only one record,
$$y_i = x(t_i) + \epsilon_i,$$
and represent $x(t)$ as
$$x(t) = \sum_{j=1}^{K} c_j \phi_j(t) = \Phi(t)\, c.$$
We say $\Phi(t)$ is a basis system for $x$.
Terms for curvature in linear regression,
$$y_i = \beta_0 + \beta_1 t_i + \beta_2 t_i^2 + \beta_3 t_i^3 + \cdots + \epsilon_i,$$
imply
$$x(t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \cdots$$
Polynomials are unstable; Fourier bases and B-splines will be more useful.

15 Representing Functional Data: Basis Expansions
The Fourier Basis
The basis functions are sine and cosine functions of increasing frequency:
$$1,\ \sin(\omega t),\ \cos(\omega t),\ \sin(2\omega t),\ \cos(2\omega t),\ \ldots,\ \sin(m\omega t),\ \cos(m\omega t),\ \ldots$$
The constant $\omega = 2\pi/P$ defines the period $P$ of oscillation of the first sine/cosine pair.
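As a minimal sketch of the Fourier basis listed above, the following evaluates the constant and the first few sine/cosine pairs at a single time point. The function name `fourier_basis`, the number of pairs, and the period of 365 are illustrative assumptions, not taken from the slides.

```python
import math

def fourier_basis(t, n_pairs, period):
    """Return [1, sin(wt), cos(wt), ..., sin(m w t), cos(m w t)] with w = 2*pi/period.

    The name and argument layout are hypothetical, chosen for this sketch.
    """
    omega = 2.0 * math.pi / period
    values = [1.0]
    for m in range(1, n_pairs + 1):
        values.append(math.sin(m * omega * t))
        values.append(math.cos(m * omega * t))
    return values

# Evaluate 1 constant + 2 sine/cosine pairs a quarter of the way through the period.
row = fourier_basis(t=91.25, n_pairs=2, period=365.0)
print(len(row), [round(v, 6) for v in row])
```

Stacking such rows for all observation times gives the matrix of basis values used in the regression slides that follow.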

16 Representing Functional Data: Basis Expansions
B-spline Bases
- Splines are polynomial segments joined end-to-end.
- Segments are constrained to be smooth at the joins.
- The points at which the segments join are called knots.
- The system is defined by the order $m$ of the polynomial (order = degree + 1) and the location of the knots.
- B-splines are a particularly useful means of incorporating the constraints.
See de Boor (2001), A Practical Guide to Splines, Springer.

17 Representing Functional Data: Basis Expansions
Splines
Medfly data with knots every 3 days. Splines of order 2: piecewise linear, continuous.

18 Representing Functional Data: Basis Expansions
Splines
Medfly data with knots every 3 days. Splines of order 3: piecewise quadratic, continuous derivatives.

19 Representing Functional Data: Basis Expansions
Splines
Medfly data with knots every 3 days. Splines of order 4: piecewise cubic, continuous 2nd derivatives.
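The B-spline systems described above can be evaluated with the Cox-de Boor recursion, sketched here in plain Python. The interval [0, 24], the helper name `bspline_basis`, and the convention of repeating each boundary knot `order` times are assumptions made for illustration; the slides only state "knots every 3 days".

```python
def bspline_basis(t, knots, order):
    """Values at t of all B-spline basis functions of the given order.

    `knots` is the full nondecreasing knot vector (boundary knots repeated
    `order` times). Hypothetical helper using the Cox-de Boor recursion.
    """
    n_knots = len(knots)
    # Order 1: indicator functions of the half-open knot intervals.
    b = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(n_knots - 1)]
    if t == knots[-1]:
        # Assign the right endpoint to the last non-empty interval.
        for i in range(n_knots - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Raise the order one step at a time.
    for k in range(2, order + 1):
        new = []
        for i in range(n_knots - k):
            left_den = knots[i + k - 1] - knots[i]
            right_den = knots[i + k] - knots[i + 1]
            left = (t - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (knots[i + k] - t) / right_den * b[i + 1] if right_den > 0 else 0.0
            new.append(left + right)
        b = new
    return b

order = 4  # cubic splines
knots = [0.0] * order + [3.0 * k for k in range(1, 8)] + [24.0] * order
values = bspline_basis(10.7, knots, order)
print(len(values), sum(values))
```

With seven interior knots and order 4 this gives 11 basis functions; at any point in the interval at most four of them are nonzero and they sum to one, which is part of what makes B-splines numerically convenient.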

20 Representing Functional Data: Basis Expansions
An illustration of basis expansions for B-splines
The sum of scaled basis functions results in the fit.

21 Representing Functional Data: Smoothing Penalties
Ordinary Least-Squares Estimates
Assume we have observations for a single curve,
$$y_i = x(t_i) + \epsilon_i,$$
and we want to estimate
$$x(t) \approx \sum_{j=1}^{K} c_j \phi_j(t).$$
Minimize the sum of squared errors:
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - x(t_i))^2 = \sum_{i=1}^{n} (y_i - \Phi(t_i)\, c)^2.$$
This is just linear regression!

22 Representing Functional Data: Smoothing Penalties
Linear Regression on Basis Functions
If the $n \times K$ matrix $\Phi$ contains the values $\phi_j(t_i)$, and $y$ is the vector $(y_1, \ldots, y_n)$, we can write
$$\mathrm{SSE}(c) = (y - \Phi c)^T (y - \Phi c).$$
The error sum of squares is minimized by the ordinary least squares estimate
$$\hat{c} = (\Phi^T \Phi)^{-1} \Phi^T y.$$
Then we have the estimate
$$\hat{x}(t) = \Phi(t)\,\hat{c} = \Phi(t)\,(\Phi^T \Phi)^{-1} \Phi^T y.$$
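A minimal sketch of this least-squares fit, assuming NumPy is available. The simulated data, the Fourier basis with two sine/cosine pairs, and the helper name `fourier_design` are illustrative choices, not the Medfly analysis from the slides.

```python
import numpy as np

def fourier_design(t, n_pairs, period):
    """n-by-K matrix of Fourier basis values (hypothetical helper)."""
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for m in range(1, n_pairs + 1):
        cols.append(np.sin(m * omega * t))
        cols.append(np.cos(m * omega * t))
    return np.column_stack(cols)

# Simulated single curve: known coefficients plus noise (assumed values).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50, endpoint=False)
Phi = fourier_design(t, n_pairs=2, period=1.0)
c_true = np.array([1.0, 2.0, -1.0, 0.5, 0.0])
y = Phi @ c_true + 0.1 * rng.standard_normal(t.size)

# Ordinary least squares: minimizes (y - Phi c)^T (y - Phi c).
c_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
x_hat = Phi @ c_hat  # fitted curve at the observation times
print(np.round(c_hat, 3))
```

`np.linalg.lstsq` is used instead of forming the inverse of the cross-product matrix explicitly; it solves the same problem and is typically better behaved numerically.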

23 Representing Functional Data: Smoothing Penalties
Smoothing Penalties
Problem: how to choose a basis? The choice has a large effect on results.
Finesse this by specifying a very rich basis, but then imposing smoothness. In particular, add a penalty to the least-squares criterion:
$$\mathrm{PENSSE} = \sum_{i=1}^{n} (y_i - x(t_i))^2 + \lambda J[x]$$
$J[x]$ measures the roughness of $x$. $\lambda$ is a continuous tuning parameter (to be chosen):
- $\lambda \to \infty$: roughness is increasingly penalized, $x(t)$ becomes smooth.
- $\lambda \to 0$: the penalty reduces, $x(t)$ fits the data better.

24 Representing Functional Data: Smoothing Penalties
What do we mean by smoothness?
Some things are fairly clearly smooth:
- constants
- straight lines
What we really want to do is eliminate small wiggles in the data while retaining the right shape.
[Figure: three fits labelled Too smooth, Too rough, Just right]

25 Representing Functional Data: Smoothing Penalties
The D Operator
We use the notation that for a function $x(t)$,
$$Dx(t) = \frac{d}{dt} x(t).$$
We can also define further derivatives in terms of powers of $D$:
$$D^2 x(t) = \frac{d^2}{dt^2} x(t),\ \ldots,\ D^k x(t) = \frac{d^k}{dt^k} x(t),\ \ldots$$
$Dx(t)$ is the instantaneous slope of $x(t)$; $D^2 x(t)$ is its curvature.
We measure the size of the curvature for all of $x$ by
$$J_2[x] = \int [D^2 x(t)]^2\, dt.$$
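To make the curvature measure concrete, here is a rough numerical sketch that approximates $J_2[x]$ with central second differences and a midpoint rule. The helper name `roughness`, the step count, and the two test functions are assumptions for illustration; in practice the penalty is computed from the basis functions rather than by finite differences.

```python
import math

def roughness(x, a, b, n=1000):
    """Approximate J_2[x], the integral over [a, b] of [D^2 x(t)]^2.

    Hypothetical helper: central second differences at interval midpoints
    (x is evaluated up to half a step outside [a, b]).
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        d2 = (x(t + h) - 2.0 * x(t) + x(t - h)) / h ** 2
        total += d2 * d2 * h
    return total

j_line = roughness(lambda t: 2.0 + 3.0 * t, 0.0, 1.0)          # straight line
j_sine = roughness(lambda t: math.sin(2.0 * math.pi * t), 0.0, 1.0)
print(j_line, j_sine, (2.0 * math.pi) ** 4 / 2.0)
```

A straight line has essentially zero roughness, while one period of a unit sine wave has $J_2 = (2\pi)^4/2$, which the approximation should reproduce closely.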

26 Representing Functional Data: Smoothing Penalties
Calculating the Penalized Fit
When $x(t) = \Phi(t)\, c$, we have that
$$\int [D^2 x(t)]^2\, dt = \int c^T [D^2 \Phi(t)]^T [D^2 \Phi(t)]\, c\, dt = c^T R_2\, c,$$
where
$$[R_2]_{jk} = \int [D^2 \phi_j(t)][D^2 \phi_k(t)]\, dt$$
is the penalty matrix. The penalized least squares estimate for $c$ is
$$\hat{c} = [\Phi^T \Phi + \lambda R_2]^{-1} \Phi^T y.$$
This is still a linear smoother:
$$\hat{y} = \Phi\, [\Phi^T \Phi + \lambda R_2]^{-1} \Phi^T y = S(\lambda)\, y.$$
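The penalized estimate can be sketched as follows, assuming NumPy and a Fourier basis on one full period. For that particular basis the penalty matrix $R_2$ is diagonal, with entry $(m\omega)^4 P/2$ for each sine or cosine of frequency $m\omega$ and zero for the constant. The helper names, simulated data, and the values of λ are illustrative assumptions.

```python
import numpy as np

def fourier_system(t, n_pairs, period):
    """Basis matrix Phi and penalty R_2 for a Fourier basis on [0, period].

    Hypothetical helper; R_2 is diagonal only because the basis is
    orthogonal over one full period.
    """
    omega = 2.0 * np.pi / period
    cols, pen = [np.ones_like(t)], [0.0]
    for m in range(1, n_pairs + 1):
        cols += [np.sin(m * omega * t), np.cos(m * omega * t)]
        pen += [(m * omega) ** 4 * period / 2.0] * 2
    return np.column_stack(cols), np.diag(pen)

def penalized_fit(Phi, R, y, lam):
    """Solve (Phi^T Phi + lam R) c = Phi^T y for the coefficients c."""
    return np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ y)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100, endpoint=False)
Phi, R = fourier_system(t, n_pairs=10, period=1.0)
y = np.sin(2.0 * np.pi * t) + 0.3 * rng.standard_normal(t.size)

for lam in (0.0, 1e-6, 1e-4, 1e-2):
    c = penalized_fit(Phi, R, y, lam)
    sse = np.sum((y - Phi @ c) ** 2)
    print(f"lambda={lam:g}  SSE={sse:.3f}  roughness={c @ R @ c:.1f}")
```

As λ grows, the printed roughness falls and the SSE rises, which is the trade-off the penalty is designed to control.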

27 Representing Functional Data: Smoothing Penalties
Linear Smooths and Degrees of Freedom
- In least squares fitting, the degrees of freedom used to smooth the data is exactly $K$, the number of basis functions.
- In penalized smoothing, we can have $K > n$. The smoothing penalty reduces the flexibility of the smooth.
- The degrees of freedom are controlled by $\lambda$. A natural measure turns out to be
$$df(\lambda) = \operatorname{trace}[S(\lambda)], \qquad S(\lambda) = \Phi\, [\Phi^T \Phi + \lambda R_L]^{-1} \Phi^T,$$
where $R_L$ is the penalty matrix for the roughness operator $L$.
Medfly data fit with 25 basis functions, λ = e 4 resulting in df =
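A short sketch of the degrees-of-freedom calculation, assuming NumPy and the same kind of Fourier basis with a second-derivative penalty (so $R_L = R_2$ here). The helper names and the grid of λ values are illustrative assumptions.

```python
import numpy as np

def fourier_system(t, n_pairs, period):
    """Basis matrix Phi and second-derivative penalty R (hypothetical helper)."""
    omega = 2.0 * np.pi / period
    cols, pen = [np.ones_like(t)], [0.0]
    for m in range(1, n_pairs + 1):
        cols += [np.sin(m * omega * t), np.cos(m * omega * t)]
        pen += [(m * omega) ** 4 * period / 2.0] * 2
    return np.column_stack(cols), np.diag(pen)

def degrees_of_freedom(Phi, R, lam):
    """df(lambda) = trace of the smoother matrix S(lambda)."""
    S = Phi @ np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T)
    return float(np.trace(S))

t = np.linspace(0.0, 1.0, 100, endpoint=False)
Phi, R = fourier_system(t, n_pairs=10, period=1.0)

for lam in (0.0, 1e-6, 1e-4, 1e-2, 1e6):
    print(f"lambda={lam:g}  df={degrees_of_freedom(Phi, R, lam):.3f}")
```

With no penalty the degrees of freedom equal the number of basis functions (21 in this sketch); for very large λ they approach 1, because only the constant function is left unpenalized in this basis.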

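28 Representing Functional Data: Smoothing Penalties
Choosing Smoothing Parameters: Cross Validation
There are a number of data-driven methods for choosing smoothing parameters.
- Ordinary Cross Validation: leave one point out and see how well you can predict it:
$$\mathrm{OCV}(\lambda) = \frac{1}{n} \sum_{i} \left( y_i - x_\lambda^{-i}(t_i) \right)^2 = \frac{1}{n} \sum_{i} \frac{(y_i - x_\lambda(t_i))^2}{(1 - S(\lambda)_{ii})^2},$$
where $x_\lambda^{-i}$ is the fit with the $i$-th observation left out.
- Generalized Cross Validation tends to smooth more:
$$\mathrm{GCV}(\lambda) = \frac{\sum_{i} (y_i - x_\lambda(t_i))^2}{[\operatorname{trace}(I - S(\lambda))]^2}$$
GCV will be used here.
- Other possibilities: AIC, BIC, ...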
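The second equality in the OCV formula means no refitting is needed. The sketch below checks it numerically by comparing a brute-force leave-one-out loop with the shortcut based on the diagonal of $S(\lambda)$. It assumes NumPy; the simulated data, basis size, and λ are arbitrary illustrative choices.

```python
import numpy as np

def fourier_system(t, n_pairs, period):
    """Basis matrix Phi and second-derivative penalty R (hypothetical helper)."""
    omega = 2.0 * np.pi / period
    cols, pen = [np.ones_like(t)], [0.0]
    for m in range(1, n_pairs + 1):
        cols += [np.sin(m * omega * t), np.cos(m * omega * t)]
        pen += [(m * omega) ** 4 * period / 2.0] * 2
    return np.column_stack(cols), np.diag(pen)

def ocv_shortcut(Phi, R, y, lam):
    """OCV from a single fit, using the diagonal of the smoother matrix."""
    S = Phi @ np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T)
    resid = y - S @ y
    return float(np.mean((resid / (1.0 - np.diag(S))) ** 2))

def ocv_brute_force(Phi, R, y, lam):
    """OCV by refitting n times, each time leaving one observation out."""
    n = y.size
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = Phi[keep].T @ Phi[keep] + lam * R
        c = np.linalg.solve(A, Phi[keep].T @ y[keep])
        errs[i] = y[i] - Phi[i] @ c
    return float(np.mean(errs ** 2))

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 40, endpoint=False)
Phi, R = fourier_system(t, n_pairs=5, period=1.0)
y = np.cos(2.0 * np.pi * t) + 0.2 * rng.standard_normal(t.size)

print(ocv_shortcut(Phi, R, y, 1e-4), ocv_brute_force(Phi, R, y, 1e-4))
```

The two printed values should agree up to rounding, which is why OCV is cheap for linear smoothers of this form.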

29 Representing Functional Data: Smoothing Penalties
Generalized Cross Validation
Use a grid search; it is best to do this over $\log(\lambda)$.
[Figure: panels labelled Smooth, Rough, Right, and GCV]
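A grid search over $\log(\lambda)$ might look like the following sketch, which uses the GCV criterion in the form given on the previous slide. NumPy, the simulated data, the helper names, and the grid limits are all assumptions for illustration.

```python
import numpy as np

def fourier_system(t, n_pairs, period):
    """Basis matrix Phi and second-derivative penalty R (hypothetical helper)."""
    omega = 2.0 * np.pi / period
    cols, pen = [np.ones_like(t)], [0.0]
    for m in range(1, n_pairs + 1):
        cols += [np.sin(m * omega * t), np.cos(m * omega * t)]
        pen += [(m * omega) ** 4 * period / 2.0] * 2
    return np.column_stack(cols), np.diag(pen)

def gcv(Phi, R, y, lam):
    """GCV(lambda) = SSE / [trace(I - S(lambda))]^2."""
    S = Phi @ np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T)
    sse = np.sum((y - S @ y) ** 2)
    return float(sse / np.trace(np.eye(y.size) - S) ** 2)

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 100, endpoint=False)
Phi, R = fourier_system(t, n_pairs=10, period=1.0)
y = np.sin(2.0 * np.pi * t) + 0.3 * rng.standard_normal(t.size)

# Equally spaced grid on the natural-log scale (limits chosen arbitrarily).
log_lams = np.linspace(-14.0, 0.0, 29)
scores = np.array([gcv(Phi, R, y, np.exp(ll)) for ll in log_lams])
best_log_lam = float(log_lams[np.argmin(scores)])
print("chosen log(lambda):", best_log_lam)
```

Searching on the log scale matters because the fit changes over several orders of magnitude of λ; a linear grid would waste most of its points at one end.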

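30 Representing Functional Data: Smoothing Penalties
Alternatives: Smoothing and Mixed Models
There is a connection between the smoothing criterion for $c$,
$$\mathrm{PENSSE}(\lambda) = \sum_{i=1}^{n} (y_i - \Phi(t_i)\, c)^2 + \lambda\, c^T R\, c,$$
and the negative log likelihood if $c \sim N(0, \tau^2 R^{-1})$, which up to an additive constant is
$$-\log L(c \mid y) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \Phi(t_i)\, c)^2 + \frac{1}{2\tau^2}\, c^T R\, c$$
(note that $R$ is singular, so a generalized inverse must be used). Multiplying by $2\sigma^2$ shows that the two criteria agree when $\lambda = \sigma^2/\tau^2$.
This suggests using ReML estimates for $\sigma^2$ and $\tau^2$ in place of $\lambda$. This can be carried further in FDA; see references.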

31 Representing Functional Data: Smoothing Penalties
Summary
1. Basis Expansions: $x_i(t) = \Phi(t)\, c_i$
- Good basis systems approximate any (sufficiently smooth) function arbitrarily well.
- Fourier bases are useful for periodic data.
- B-splines make an efficient, flexible generic choice.
2. Smoothing Penalties are used to penalize roughness of the result.
- $Lx(t) = 0$ defines what is smooth.
- Commonly $Lx = D^2 x$, so straight lines are smooth.
- Alternative: $Lx = D^3 x + w\,Dx$, so sinusoids are smooth.
- Departures from smoothness are traded off against fit to the data.
- GCV is used to decide on the trade-off; other possibilities are available.
These tools will be used throughout the rest of FDA. Once estimated, we will treat smooths as fixed, observed data (but see comments at end).

32 Exploratory Data Analysis

33 Exploratory Data Analysis
Mean and Variance
Summary statistics:
- mean: $\bar{x}(t) = \frac{1}{n} \sum_{i} x_i(t)$
- covariance: $\sigma(s, t) = \operatorname{cov}(x(s), x(t)) = \frac{1}{n} \sum_{i} (x_i(s) - \bar{x}(s))(x_i(t) - \bar{x}(t))$
[Figure: Medfly data]

34 Exploratory Data Analysis
Correlation
$$\rho(s, t) = \frac{\sigma(s, t)}{\sqrt{\sigma(s, s)}\,\sqrt{\sigma(t, t)}}$$
From multivariate to functional data: turn subscripts $j, k$ into arguments $s, t$.
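On a common grid of time points, the mean, covariance, and correlation surfaces reduce to ordinary matrix operations. This sketch assumes NumPy and uses simulated curves (one random amplitude times a sine shape, plus noise) purely for illustration; in practice the curves would be evaluations of the smoothed $x_i(t)$.

```python
import numpy as np

# Simulated curves on a shared grid: rows are curves, columns are time points.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 25)
X = rng.standard_normal((40, 1)) * np.sin(np.pi * t) + 0.2 * rng.standard_normal((40, t.size))

x_bar = X.mean(axis=0)            # mean function on the grid
Xc = X - x_bar                    # centered curves
sigma = Xc.T @ Xc / X.shape[0]    # covariance surface sigma(s, t), 1/n scaling

sd = np.sqrt(np.diag(sigma))      # pointwise standard deviations
rho = sigma / np.outer(sd, sd)    # correlation surface rho(s, t)
print(sigma.shape, float(rho.min()), float(rho.max()))
```

The covariance uses the $1/n$ scaling from the slide rather than $1/(n-1)$; each row and column index of `sigma` and `rho` corresponds to a grid time, which is the "subscripts become arguments" step.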

35 Exploratory Data Analysis
Functional PCA
Instead of a covariance matrix $\Sigma$, we have a surface $\sigma(s, t)$. We would like a low-dimensional summary/interpretation.
Multivariate PCA uses the eigen-decomposition
$$\Sigma = U^T D U = \sum_{j=1}^{p} d_j u_j u_j^T, \qquad u_i^T u_j = I(i = j).$$
For functions, use the Karhunen-Loève decomposition
$$\sigma(s, t) = \sum_{j=1}^{\infty} d_j \xi_j(s) \xi_j(t), \qquad \int \xi_i(t) \xi_j(t)\, dt = I(i = j).$$

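36 Exploratory Data Analysis
PCA and Karhunen-Loève
$$\sigma(s, t) = \sum_{i=1}^{\infty} d_i \xi_i(s) \xi_i(t)$$
- The $\xi_i(t)$ maximize $\operatorname{Var}\left[ \int \xi_i(t) x_j(t)\, dt \right]$.
- $d_i = \operatorname{Var}\left[ \int \xi_i(t) x_j(t)\, dt \right]$.
- $d_i / \sum_{k} d_k$ is the proportion of variance explained.
- Principal component scores are $f_{ij} = \int \xi_j(t) [x_i(t) - \bar{x}(t)]\, dt$.
- Reconstruction of $x_i(t)$: $x_i(t) = \bar{x}(t) + \sum_{j=1}^{\infty} f_{ij} \xi_j(t)$.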
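One simple way to approximate these quantities is to discretize the covariance surface on an equally spaced grid and take an eigen-decomposition, sketched below with NumPy. The simulated curves and the plain Riemann-sum quadrature (weight $h$ at every grid point) are assumptions for illustration; packages such as fda work from the basis coefficients instead.

```python
import numpy as np

# Simulated curves: random mixtures of two smooth shapes plus small noise.
rng = np.random.default_rng(3)
m, n = 50, 30
t = np.linspace(0.0, 1.0, m)
h = t[1] - t[0]                                   # grid spacing (quadrature weight)
shapes = np.vstack([np.sin(np.pi * t), np.cos(2.0 * np.pi * t)])
X = (rng.standard_normal((n, 2)) * np.array([2.0, 0.5])) @ shapes
X = X + 0.05 * rng.standard_normal((n, m))

x_bar = X.mean(axis=0)
Xc = X - x_bar
cov = Xc.T @ Xc / n                               # sigma(s, t) on the grid

# Discretized Karhunen-Loeve: eigen-decomposition of the scaled covariance.
evals, evecs = np.linalg.eigh(cov * h)
order = np.argsort(evals)[::-1]
d = evals[order]                                  # variances d_j, largest first
xi = evecs[:, order] / np.sqrt(h)                 # columns xi_j, integral of xi_j^2 ~ 1

f = h * (Xc @ xi)                                 # scores f_ij ~ integral xi_j (x_i - x_bar)
prop = d / d.sum()                                # proportion of variance explained
X_rebuilt = x_bar + f @ xi.T                      # reconstruction from all components
print(np.round(prop[:3], 4))
```

Because the simulated curves are built from two shapes, the first two proportions should account for nearly all the variance; truncating `f` and `xi` to those columns gives the low-dimensional summary.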

37 Exploratory Data Analysis
Functional Principal Components Analysis
fPCA of Medfly data
[Figure: scree plot and principal components]
As with the usual multivariate methods, choose the number of components based on percent variance explained, a scree plot, or an information criterion.

38 Exploratory Data Analysis
Functional Principal Components Analysis
Interpretation is often aided by plotting
$$\bar{x}(t) \pm 2\sqrt{d_i}\, \xi_i(t)$$
- PC1 = overall fecundity
- PC2 = beginning versus end
- PC3 = middle versus ends

39 Exploratory Data Analysis
Derivatives
It is often useful to examine a rate of change.
- Examine the first derivative of the medfly data.
- Variation divides into fast or slow, either early or late.
[Figure: principal components of the derivatives, Component 1 and Component 2]

Alternatives. The D Operator

Alternatives. The D Operator Using Smoothness Alternatives Text: Chapter 5 Some disadvantages of basis expansions Discrete choice of number of basis functions additional variability. Non-hierarchical bases (eg B-splines) make life

More information

From Data To Functions Howdowegofrom. Basis Expansions From multiple linear regression: The Monomial Basis. The Monomial Basis

From Data To Functions Howdowegofrom. Basis Expansions From multiple linear regression: The Monomial Basis. The Monomial Basis From Data To Functions Howdowegofrom Basis Expansions From multiple linear regression: data to functions? Or if there is curvature: y i = β 0 + x 1i β 1 + x 2i β 2 + + ɛ i y i = β 0 + x i β 1 + xi 2 β

More information

Introduction to Functional Data Analysis A CSCU Workshop. Giles Hooker Biological Statistics and Computational Biology

Introduction to Functional Data Analysis A CSCU Workshop. Giles Hooker Biological Statistics and Computational Biology Introduction to Functional Data Analysis A CSCU Workshop Giles Hooker Biological Statistics and Computational Biology gjh27@cornell.edu www.bscb.cornell.edu/ hooker/fdaworkshop 1 / 26 Agenda What is Functional

More information

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods Alternatives to Basis Expansions Basis expansions require either choice of a discrete set of basis or choice of smoothing penalty and smoothing parameter Both of which impose prior beliefs on data. Alternatives

More information

Functional Data Analysis

Functional Data Analysis Functional Data Analysis Functional Data Analysis A Short Course Giles Hooker 11/10/2017 1 / 184 Functional Data Analysis Table of Contents 1 Introduction 2 Representing Functional Data 3 Exploratory Data

More information

Functional Data Analysis

Functional Data Analysis FDA 1-1 Functional Data Analysis Michal Benko Institut für Statistik und Ökonometrie Humboldt-Universität zu Berlin email:benko@wiwi.hu-berlin.de FDA 1-2 Outline of this talk: Introduction Turning discrete

More information

Functional Linear Models for Scalar Responses

Functional Linear Models for Scalar Responses Chapter 9 Functional Linear Models for Scalar Responses This is the first of two chapters on the functional linear model. Here we have a dependent or response variable whose value is to be predicted or

More information

Functional Latent Feature Models. With Single-Index Interaction

Functional Latent Feature Models. With Single-Index Interaction Generalized With Single-Index Interaction Department of Statistics Center for Statistical Bioinformatics Institute for Applied Mathematics and Computational Science Texas A&M University Naisyin Wang and

More information

Statistical Models. Functional Linear Regression. A First Idea. Models for Scalar Responses

Statistical Models. Functional Linear Regression. A First Idea. Models for Scalar Responses Statistical Models Functional Linear Regression So far we have focussed on exploratory data analysis Smoothing Functional covariance Functional PCA/CCA Now we wish to examine predictive relationships generalization

More information

Smooth Common Principal Component Analysis

Smooth Common Principal Component Analysis 1 Smooth Common Principal Component Analysis Michal Benko Wolfgang Härdle Center for Applied Statistics and Economics benko@wiwi.hu-berlin.de Humboldt-Universität zu Berlin Motivation 1-1 Volatility Surface

More information

Functional Data Analysis

Functional Data Analysis FDA p. 1/42 Functional Data Analysis An introduction David Wooff d.a.wooff@durham.ac.uk University of Durham FDA p. 2/42 Main steps in FDA Collect, clean, and organize the raw data. Convert the data to

More information

Tutorial on Functional Data Analysis

Tutorial on Functional Data Analysis Tutorial on Functional Data Analysis Ana-Maria Staicu Department of Statistics, North Carolina State University astaicu@ncsu.edu SAMSI April 5, 2017 A-M Staicu Tutorial on Functional Data Analysis April

More information

On the Identifiability of the Functional Convolution. Model

On the Identifiability of the Functional Convolution. Model On the Identifiability of the Functional Convolution Model Giles Hooker Abstract This report details conditions under which the Functional Convolution Model described in Asencio et al. (2013) can be identified

More information

Functional Data Analysis & Variable Selection

Functional Data Analysis & Variable Selection Auburn University Department of Mathematics and Statistics Universidad Nacional de Colombia Medellin, Colombia March 14, 2016 Functional Data Analysis Data Types Univariate - Contains numbers as its observations

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Supplementary Material to General Functional Concurrent Model

Supplementary Material to General Functional Concurrent Model Supplementary Material to General Functional Concurrent Model Janet S. Kim Arnab Maity Ana-Maria Staicu June 17, 2016 This Supplementary Material contains six sections. Appendix A discusses modifications

More information

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Classification Logistic regression models for classification problem We consider two class problem: Y {0, 1}. The Bayes rule for the classification is I(P(Y = 1 X = x) > 1/2) so

More information

Generalised Smoothing in Functional Data Analysis

Generalised Smoothing in Functional Data Analysis Generalised Smoothing in Functional Data Analysis Michelle Carey Department of Mathematics and Statistics University of Limerick A thesis submitted for the degree of Doctor of Philosophy (PhD) Supervised

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Recap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis:

Recap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis: 1 / 23 Recap HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis: Pr(G = k X) Pr(X G = k)pr(g = k) Theory: LDA more

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

FUNCTIONAL DATA ANALYSIS. Contribution to the. International Handbook (Encyclopedia) of Statistical Sciences. July 28, Hans-Georg Müller 1

FUNCTIONAL DATA ANALYSIS. Contribution to the. International Handbook (Encyclopedia) of Statistical Sciences. July 28, Hans-Georg Müller 1 FUNCTIONAL DATA ANALYSIS Contribution to the International Handbook (Encyclopedia) of Statistical Sciences July 28, 2009 Hans-Georg Müller 1 Department of Statistics University of California, Davis One

More information

Spatially Adaptive Smoothing Splines

Spatially Adaptive Smoothing Splines Spatially Adaptive Smoothing Splines Paul Speckman University of Missouri-Columbia speckman@statmissouriedu September 11, 23 Banff 9/7/3 Ordinary Simple Spline Smoothing Observe y i = f(t i ) + ε i, =

More information

Fundamental concepts of functional data analysis

Fundamental concepts of functional data analysis Fundamental concepts of functional data analysis Department of Statistics, Colorado State University Examples of functional data 0 1440 2880 4320 5760 7200 8640 10080 Time in minutes The horizontal component

More information

Functional principal component analysis of vocal tract area functions

Functional principal component analysis of vocal tract area functions INTERSPEECH 17 August, 17, Stockholm, Sweden Functional principal component analysis of vocal tract area functions Jorge C. Lucero Dept. Computer Science, University of Brasília, Brasília DF 791-9, Brazil

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Generalized Functional Concurrent Model

Generalized Functional Concurrent Model Biometrics, 1 25 2015 Generalized Functional Concurrent Model Janet S. Kim, Arnab Maity, and Ana-Maria Staicu Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, U.S.A.

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

FuncICA for time series pattern discovery

FuncICA for time series pattern discovery FuncICA for time series pattern discovery Nishant Mehta and Alexander Gray Georgia Institute of Technology The problem Given a set of inherently continuous time series (e.g. EEG) Find a set of patterns

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions in Multivariable Analysis: Current Issues and Future Directions Frank E Harrell Jr Department of Biostatistics Vanderbilt University School of Medicine STRATOS Banff Alberta 2016-07-04 Fractional polynomials,

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Flexible Spatio-temporal smoothing with array methods

Flexible Spatio-temporal smoothing with array methods Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session IPS046) p.849 Flexible Spatio-temporal smoothing with array methods Dae-Jin Lee CSIRO, Mathematics, Informatics and

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

Testing the Equality of Covariance Operators in Functional Samples

Testing the Equality of Covariance Operators in Functional Samples Scandinavian Journal of Statistics, Vol. 4: 38 5, 3 doi:./j.467-9469..796.x Board of the Foundation of the Scandinavian Journal of Statistics. Published by Blackwell Publishing Ltd. Testing the Equality

More information

Functional SVD for Big Data

Functional SVD for Big Data Functional SVD for Big Data Pan Chao April 23, 2014 Pan Chao Functional SVD for Big Data April 23, 2014 1 / 24 Outline 1 One-Way Functional SVD a) Interpretation b) Robustness c) CV/GCV 2 Two-Way Problem

More information

Multidimensional Density Smoothing with P-splines

Multidimensional Density Smoothing with P-splines Multidimensional Density Smoothing with P-splines Paul H.C. Eilers, Brian D. Marx 1 Department of Medical Statistics, Leiden University Medical Center, 300 RC, Leiden, The Netherlands (p.eilers@lumc.nl)

More information

Regularized principal components analysis

Regularized principal components analysis 9 Regularized principal components analysis 9.1 Introduction In this chapter, we discuss the application of smoothing to functional principal components analysis. In Chapter 5 we have already seen that

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

Regularization Methods for Additive Models

Regularization Methods for Additive Models Regularization Methods for Additive Models Marta Avalos, Yves Grandvalet, and Christophe Ambroise HEUDIASYC Laboratory UMR CNRS 6599 Compiègne University of Technology BP 20529 / 60205 Compiègne, France

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Functional responses and functional covariates: The general case

Functional responses and functional covariates: The general case Functional responses and functional covariates: The general case Page 1 of 27 Overview The fundamental question is this: If we have a covariate z(s), how much of its variation over s should we use to fit

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Canonical correlation analysis

Canonical correlation analysis Canonical correlation analysis Exploring correlations between two sets of functions. Page 1 of 17 1. Let the two sets of functions be x i (t) and y i (t), i = 1,..., N. The correlation surface r(s, t)

More information

Functional responses, functional covariates and the concurrent model

Functional responses, functional covariates and the concurrent model 14 Functional responses, functional covariates and the concurrent model 14.1 Introduction We now consider a model for a functional response involving one or more functional covariates. In this chapter

More information


