Modelling with smooth functions. Simon Wood University of Bath, EPSRC funded


1 Modelling with smooth functions Simon Wood University of Bath, EPSRC funded

2 Some data... Daily respiratory deaths, temperature and ozone in Chicago (NMMAPS). [Time series plots of daily deaths, temperature and ozone.] Apparently daily respiratory deaths are significantly negatively correlated with ozone and temperature. But obviously there is confounding here - we need a model.

3 A model...
death_i ~ Poi(µ_i)
log(µ_i) = α + f_1(t_i) + f_2(o3_i, tmp_i) + f_3(pm10_i)
The f_j are smooth functions to estimate.
1. α + f_1 is the (log) background respiratory mortality rate.
2. f_2 is modification by ozone and temperature (interacting).
3. f_3 is the modification from particulates.
Actually the predictors are aggregated over the 3 days preceding death. Fit in R (mgcv package):
gam(death ~ s(time,k=200) + te(o3,tmp) + s(pm10), family=poisson)

4 Air pollution model estimates... [Plots of the estimated smooths: s(time,99.75), te(o3,tmp,31.6) and s(pm10,3.09).] High ozone and temp associated with increased risk.

5 Some more data... Can we predict the octane rating of fuel from its near infra-red spectrum? 60 octane/spectrum pairs available (from pls package). [Plot: spectra, log(1/r) against wavelength (nm).] Try a signal regression model. k_i is the i-th spectrum, f is a smooth function...
octane_i = ∫ f(ν) k_i(ν) dν + ɛ_i

6 Signal regression model fit... If each row of matrix NIR contains a spectral intensity, and each row of nm the corresponding wavelengths, then
gam(octane ~ s(nm,by=NIR))
estimates the model. [Plots of the estimated f(ν), s(nm,9.16):NIR, and fitted vs. observed octane.]

7 And one more dataset. [Pairs plot of ret, bmi, dur and gly.] How does the probability of diabetic retinopathy relate to previous disease duration, body mass index and % glycosylated hemoglobin? Wisconsin study (see gss package). Model: ret_i ~ Bernoulli,
logit{E(ret_i)} = f_1(dur_i) + f_2(bmi_i) + f_3(gly_i) + f_4(dur_i, bmi_i) + f_5(dur_i, gly_i) + f_6(gly_i, bmi_i)

8 Retinopathy model estimates
gam(ret ~ s(dur) + s(gly) + s(bmi) + ti(dur,gly) + ti(dur,bmi) + ti(gly,bmi), family=binomial)
[Plots of the estimated terms: s(dur,3.26), s(gly,1), s(bmi,2.67), te(gly,bmi,2.5), te(dur,gly,0), te(dur,bmi,0).]

9 Additive smooth models: structured flexibility. The preceding are examples of generalized additive models (GAMs). A GAM is a GLM in which the linear predictor depends linearly on unknown smooth functions of covariates...
y_i ~ EF(µ_i, φ), g(µ_i) = f_1(x_1i) + f_2(x_2i) + ...
Hastie and Tibshirani (1990) and Wahba (1990) laid the foundations for these models as discussed here. If we can work with GAMs at all we can easily generalize:
g(µ_i) = A_i θ + Σ_j L_ij f_j(x_j) + Z_i b
The L_ij are linear functionals; A & Z are model matrices; θ and b are parameters and Gaussian random effects.

10 Additive smooth models & practical computation. Making GAMs work in practice requires at least 3 things:
1. A way of representing the smooth functions f_j.
2. A way of estimating the f_j.
3. Some means of deciding how smooth the f_j should be.
3 is the awkward part of the enterprise, and strongly influences 1 and 2. As well as point estimates we also need further inferential tools, such as interval estimates, model selection methods, AIC, p-values etc. We'd also like to go beyond the univariate exponential family. This talk will cover these things in order.

11 Representing smooth functions: splines. To motivate how to represent several smooth terms in a model, first consider a simpler smoothing problem: estimating the smooth function f in the model y_i = f(x_i) + ɛ_i from x_i, y_i data using smoothing splines... [Scatterplot of wear against size with the fitted curve.] The red curve is the function minimizing
Σ_i (y_i − f(x_i))² + λ ∫ f''(x)² dx.

12 Splines and the smoothing parameter. The smoothing parameter λ controls the stiffness of the spline. [Fits of wear against size for four values of λ.] But the spline can be written f̂(x) = Σ_i β_i b_i(x), where the basis functions b_i(x) do not depend on λ.

13 General spline theory background. Consider a Hilbert space of real valued functions, f, on some domain τ (e.g. [0, 1]). It is a reproducing kernel Hilbert space, H, if evaluation is bounded, i.e. ∃ M s.t. |f(t)| ≤ M‖f‖. Then the Riesz representation thm says that there is a function R_t ∈ H s.t. f(t) = ⟨R_t, f⟩. Now consider R_t(u) as a function of t: R(t, u). ⟨R_t, R_s⟩ = R(t, s), so R(t, s) is known as the reproducing kernel of H. Actually, to every positive definite function R(t, s) there corresponds a unique r.k.h.s.

14 Spline smoothing problem. RKHSs are quite useful for constructing smooth models. To see why, consider finding f̂ to minimize
Σ_i {y_i − f(t_i)}² + λ ∫ f''(t)² dt.
Let H have ⟨f, g⟩ = ∫ f''(t) g''(t) dt. Let H_0 denote the RKHS of functions for which ∫ f''(t)² dt = 0, with basis φ_1(t) = 1, φ_2(t) = t, say. The spline problem seeks f̂ ∈ H_0 ⊕ H to minimize
Σ_i {y_i − f(t_i)}² + λ‖Pf‖²,
where P is the orthogonal projection onto H.

15 Spline smoothing solution.
f̂(t) = Σ_{i=1}^n c_i R_{t_i}(t) + Σ_{i=1}^M d_i φ_i(t)
is the basis representation of f̂. Why? Suppose the minimizer were f = f̂ + η, where η ∈ H and η ⊥ f̂:
1. η(t_i) = ⟨R_{t_i}, η⟩ = 0, so the data fit is unchanged;
2. ‖Pf‖² = ‖Pf̂‖² + ‖η‖², which is minimized when η = 0.
Obviously this argument is rather general. So if E_ij = ⟨R_{t_i}, R_{t_j}⟩ and T_ij = φ_j(t_i), then we seek ĉ and d̂ to minimize
‖y − Td − Ec‖² + λcᵀEc
... straightforward to compute (but at O(n³) cost).

16 Reduced rank smoothers. Can obtain an efficient reduced rank basis (and penalty) by
1. using a spline basis for a representative subset of the data, or
2. using Lanczos methods to find a low order eigenbasis.
In either case we end up representing the smoother as f(x) = Σ_j β_j b_j(x); the basis functions b_j(x) are known, the coefficients β are not. The corresponding smoothing penalty is βᵀSβ, where S is known. So y_i = f(x_i) + ɛ_i becomes y = Xβ + ɛ, where X_ij = b_j(x_i), and
β̂ = argmin_β ‖y − Xβ‖² + λβᵀSβ.
Examples follow an asymptotic justification of rank reduction.
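The penalized least squares problem above has a closed-form solution. A minimal numerical sketch in Python, using a toy Legendre polynomial basis and a second-difference penalty as stand-ins (these are assumptions of the sketch, not mgcv's actual bases):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reduced rank smoother: k basis functions evaluated at n points.
n, k, lam = 100, 8, 1.0
x = np.linspace(-1.0, 1.0, n)
X = np.polynomial.legendre.legvander(x, k - 1)   # X[i, j] = b_j(x_i)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, n)

D = np.diff(np.eye(k), n=2, axis=0)              # second differences
S = D.T @ D                                      # known penalty matrix

# beta_hat = argmin ||y - X beta||^2 + lam * beta' S beta
# solves the linear system (X'X + lam*S) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X + lam * S, X.T @ y)
fitted = X @ beta_hat
```

Because the objective is quadratic in β, its gradient vanishes exactly at β̂, which is an easy correctness check.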

17 What rank? Consider f, a rank k cubic regression spline (i.e. λ = 0), parameterized in terms of function values at evenly spaced knots. From basic regression theory, the average sampling standard error of f̂ is O(√(k/n)). From approximation theory for cubic splines, the asymptotic bias of f̂ is bounded by O(k⁻⁴). So if we let k = O(n^{1/9}) we minimize the asymptotic MSE, at O(n^{−8/9}). Actually in practice we would want to penalize, and then k = O(n^{1/5}) is appropriate. The point is that asymptotically k ≪ n is appropriate.

18 P-splines: B-spline basis & approximate penalty. [B-spline basis functions, and example fits labelled with the value of the difference penalty Σ_i (β_{i−1} − 2β_i + β_{i+1})², e.g. 16.4.]
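The P-spline penalty is just a sum of squared second differences of adjacent basis coefficients, which can be written as a quadratic form βᵀDᵀDβ. A small Python check of that identity (D here is the standard second-difference matrix, an illustrative choice, not anything mgcv-specific):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 10
beta = rng.normal(size=k)

# Second-difference matrix: each row of D is (..., 1, -2, 1, ...),
# picking out beta_{i-1}, beta_i, beta_{i+1}.
D = np.diff(np.eye(k), n=2, axis=0)
S = D.T @ D                      # so the P-spline penalty is beta' S beta

quad_form = beta @ S @ beta
explicit = sum((beta[i - 1] - 2.0 * beta[i] + beta[i + 1]) ** 2
               for i in range(1, k - 1))
```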

19 Example: Thin plate splines. One way of generalizing splines from 1D to several dimensions is to turn the flexible strip into a flexible sheet: f̂ minimizing e.g.
Σ_i {y_i − f(x_i, z_i)}² + λ ∫ f_xx² + 2f_xz² + f_zz² dx dz.
This results in a thin plate spline. It is an isotropic smooth. Isotropy may be appropriate when different covariates are naturally on the same scale. [Perspective plots of the linear predictor over x and z.]

20 Cheaper thin plate splines. RKHS or similar theory gives an explicit basis (for any dimension), with coefficients (c and d) minimizing
‖y − Td − Ec‖² + λcᵀEc s.t. Tᵀc = 0.
Can reduce the O(n³) cost to O(k³) by replacing E by its rank k truncated eigen-decomposition (computed by Lanczos methods). Cheap and somewhat optimal. [Comparison plots: truth, t.p.r.s. with 16 d.f., and a 16 knot basis.]

21 Non-isotropic tensor product smooths. A different construction is needed if covariates have different scales. Start with a marginal spline f_z(z) and let its coefficients vary smoothly with x. [Illustration: the marginal f(z) and the resulting f(x,z).]

22 The complete tensor product smooth. Let each coefficient of f_z be a spline of x. The construction is symmetric in x and z (see right). [Perspective plots of f(x,z).]

23 Tensor product penalties - one per dimension. x-wiggliness: sum marginal x penalties over the red curves. z-wiggliness: sum marginal z penalties over the green curves. [Perspective plots of f(x,z) showing the curves being penalized.]

24 Tensor product expressions. Suppose the basis expansions for smoothing w.r.t. x and z marginally are Σ_i α_i a_i(x) and Σ_j B_j b_j(z), and the marginal penalties are αᵀS_x α and BᵀS_z B. The tensor product basis construction gives:
f(x, z) = Σ_ij β_ij b_j(z) a_i(x)
with two penalties (requiring 2 smoothing parameters)
J_z(f) = βᵀ(I_I ⊗ S_z)β and J_x(f) = βᵀ(S_x ⊗ I_J)β.
The construction generalizes to any number of marginals and to multi-dimensional marginals.
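The two Kronecker-product penalties are exactly the "sum the marginal penalty over the coefficient curves" recipe of the previous slide. A Python sketch verifying this, with second-difference marginal penalties as stand-ins for S_x and S_z, and a row-major coefficient ordering (both assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
I, J = 5, 6                      # marginal basis dimensions for x and z

def diff_penalty(k):
    """Second-difference marginal penalty matrix (stand-in for S_x, S_z)."""
    D = np.diff(np.eye(k), n=2, axis=0)
    return D.T @ D

S_x, S_z = diff_penalty(I), diff_penalty(J)
B = rng.normal(size=(I, J))      # coefficients beta_ij of the tensor smooth
beta = B.ravel()                 # row-major vec: index i varies slowest

# z-wiggliness: the Kronecker form of the penalty...
J_z = beta @ np.kron(np.eye(I), S_z) @ beta
# ...equals summing the marginal z penalty over the I coefficient curves:
J_z_sum = sum(B[i] @ S_z @ B[i] for i in range(I))

# x-wiggliness works the same way with the roles swapped:
J_x = beta @ np.kron(S_x, np.eye(J)) @ beta
J_x_sum = sum(B[:, j] @ S_x @ B[:, j] for j in range(J))
```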

25 Isotropic vs. tensor product comparison. [Contour plots: isotropic thin plate spline fits vs. tensor product spline fits.] ... each figure smooths the same data. The only modification is that x has been divided by 5 in the bottom row.

26 Smooth model components. A rich range of basis-penalty smoothers is possible... [Examples include s(times,10.9), a tensor product f(x,z), and a spatial smooth s(region,63.04) over lon and lat.] When combining several smooths in one model, we often need an identifiability constraint Σ_i f(x_i) = 0. Re-write this as 1ᵀXβ = 0 and absorb it by reparameterization.

27 Representing a GAM. Consider the GAM
y_i ~ EF(µ_i, φ), g(µ_i) = β_0 + Σ_j f_j(x_ji).
Each f_j(x) has a basis-penalty representation, say X_j β_j with penalty β_jᵀ S̄_j β_j (with constraints absorbed). So the GAM becomes g(µ) = Xβ, where X = [1 : X_1 : X_2 : ...] and βᵀ = [β_0, β_1ᵀ, β_2ᵀ, ...]. The penalty is now Σ_j λ_j βᵀS_j β, where S_j is S̄_j padded with zeros such that βᵀS_j β ≡ β_jᵀ S̄_j β_j. Can easily include parametric components and linear functionals of smooths.

28 β̂ given λ. Let l be the model log-likelihood and use penalized MLE:
L(β) = l(β) − (1/2) Σ_j λ_j βᵀS_j β, β̂ = argmax_β L(β).
Optimize by Newton's method (penalized IRLS for the EF case). Bayes motivation. Prior: π(β) = N(0, (Σ_j λ_j S_j)⁻) (a pseudoinverse: the prior is improper in the penalty null space); likelihood, π(y|β): as is. Then the MAP estimate is β̂. Further, in the large sample limit, if I is the information matrix at β̂,
β|y ∼ N(β̂, (I + Σ_j λ_j S_j)⁻¹),
which can be used to construct well calibrated CIs. Note the link to Gaussian random effects!
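For an exponential family likelihood the Newton optimization is penalized IRLS: each step solves a weighted, penalized least squares problem. A minimal Python sketch for a Poisson/log model with a single ridge penalty (the design, penalty and sizes are assumptions of this toy, not mgcv's fitting code):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy Poisson regression with log link:
# maximize L(beta) = l(beta) - 0.5 * lam * beta' S beta.
n, k, lam = 200, 6, 1.0
X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(k - 1)])
beta_true = rng.normal(0.0, 0.3, size=k)
y = rng.poisson(np.exp(X @ beta_true))
S = np.eye(k)                               # simple ridge penalty

beta = np.zeros(k)
for _ in range(100):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu                 # working response for log link
    W = mu                                  # IRLS weights for Poisson/log
    # Penalized weighted least squares step:
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X) + lam * S,
                               X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-12:
        beta = beta_new
        break
    beta = beta_new
```

At convergence the penalized score Xᵀ(y − µ) − λSβ is zero, which is what the check below verifies.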

29 AIC, effective degrees of freedom, p-values. Following through the derivation of AIC in the penalized case yields
AIC = −2l(β̂) + 2tr(F), where F = (I + Σ_j λ_j S_j)⁻¹ I.
Better still, F = V′_β I, where V′_β is an approximate posterior covariance matrix for β, corrected for λ uncertainty (see Greven and Kneib, 2010, Biometrika for why). tr(F) is the effective degrees of freedom of the model. p-values for testing smooth terms or variance components for equality to zero can also be obtained (see Wood 2013a,b Biometrika). But we still need smoothing parameter (λ) estimates.
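tr(F) interpolates between the dimension of the penalty null space and the full basis dimension as λ varies. A Python illustration for the Gaussian case, where the information matrix is I = XᵀX (with σ² = 1); the Legendre basis and difference penalty are assumptions of this sketch:

```python
import numpy as np

# Effective degrees of freedom of a Gaussian penalized smoother:
# F = (X'X + lam*S)^{-1} X'X and edf = tr(F).
n, k = 100, 8
t = np.linspace(-1.0, 1.0, n)
X = np.polynomial.legendre.legvander(t, k - 1)
D = np.diff(np.eye(k), n=2, axis=0)
S = D.T @ D                      # penalty null space (unpenalized part) has dim 2

def edf(lam):
    F = np.linalg.solve(X.T @ X + lam * S, X.T @ X)
    return np.trace(F)
```

lam = 0 gives an unpenalized fit (edf = k = 8), while very large lam shrinks the fit into the penalty null space (edf → 2 here), with edf decreasing monotonically in between.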

30 Smoothing parameter estimates. Let ρ = log λ and π(ρ) = constant; then the marginal likelihood is
π(ρ|y) ∝ ∫ π(y|β)π(β|ρ)dβ.
Empirical Bayes: ρ̂ = argmax_ρ π(ρ|y)... the same as REML in the Gaussian random effects context. In practice use a Laplace approximation for the integral:
log π(ρ|y) ≈ L(β̂) + (1/2) log|Σ_j λ_j S_j|₊ − (1/2) log|H| + c.
H is the negative Hessian of L at β̂. The log determinants require numerical care.

31 How marginal likelihood works. [Simulated curves: λ too low, prior variance too high; λ and prior variance about right; λ too high, prior variance too low.] Draw β from the prior implied by λ. Find the average value of the likelihood for these draws. Choose λ to maximize this average likelihood, i.e. formally, maximize ∫ π(y|β)π(β|λ)dβ w.r.t. λ. Cross validation is an alternative.
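The recipe on this slide can be checked by brute force in a tiny Gaussian model, where the marginal likelihood is also available in closed form. A Python sketch (the model, prior and sizes are assumptions of this illustration, not the slides' examples):

```python
import numpy as np

rng = np.random.default_rng(4)

# y = X beta + e, e ~ N(0, I); prior beta ~ N(0, I/lam).
n, k, lam = 10, 2, 1.0
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

# 1. Draw beta from the prior implied by lam.
draws = rng.normal(scale=1.0 / np.sqrt(lam), size=(200_000, k))
# 2. Average the likelihood over the draws (log-sum-exp for stability).
resid = y[None, :] - draws @ X.T
ll = -0.5 * np.sum(resid**2, axis=1) - 0.5 * n * np.log(2 * np.pi)
mc_log_marginal = np.log(np.mean(np.exp(ll - ll.max()))) + ll.max()

# Closed form for comparison: marginally y ~ N(0, X X'/lam + I).
V = X @ X.T / lam + np.eye(n)
_, logdetV = np.linalg.slogdet(V)
exact = -0.5 * (y @ np.linalg.solve(V, y) + logdetV + n * np.log(2 * np.pi))
```

Maximizing the Monte Carlo average over a grid of λ values would reproduce the empirical Bayes estimate; in practice the Laplace approximation of the previous slide replaces the simulation.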

32 Numerical fitting strategy. Optimize REML w.r.t. ρ = log(λ) by Newton's method. For each trial ρ:
1. Re-parameterize β so that the log|Σ_j λ_j S_j|₊ computation is stable.
2. Find β̂ by an inner Newton optimization of L(β).
3. Use implicit differentiation to find the first two derivatives of β̂ w.r.t. ρ.
4. Compute the log determinant terms and their derivatives.
5. Hence compute the REML score and its first two derivatives, as required for the next Newton step.
For large datasets an alternative is often possible: find β̂ by iterative fitting of working linear models, and estimate ρ for the working model at each iteration step.

33 Where's the exponential family assumption? ... the single parameter independent EF assumption of GAMs is barely used. It just simplifies some of the numerical computations (and adds to numerical robustness). Actually we can use most of the apparatus just described with almost any regular likelihood, provided it's practical to compute with, and the first three or four derivatives w.r.t. the model coefficients can also be computed...

34 Example: predicting prostate status. [Protein mass spectra for healthy, enlarged and cancer groups: intensity against Daltons.] Model: ordered category (benign/enlarged/cancer) predicted by a logistic latent random variable with mean
µ_i = ∫ f(D)ν_i(D)dD, where ν_i(D) is the i-th spectrum.
gam(ds ~ s(D,by=k), family=ocat(R=3))
[Plots of the estimated f(D) and of deviance residuals against theoretical quantiles.]

35 Fuel efficiency of cars. [Pairs plot of city.mpg, hw.mpg, width, height, weight and hp.]

36 Multivariate additive model. Correlated bivariate normal response (hw.mpg, city.mpg). Component means given by smooth additive predictors. The best model is very simple (and somewhat unexpected). [Estimated effects: s(weight,6.71) and s(hp,4.1) for one component, s.1(weight,5.71) and s.1(hp,5.69) for the other, with Gaussian quantile plots.]

37 Scale location models. Can additively model the mean and variance (and skew and ...). Simple example:
y_i ~ N(µ_i, σ_i), µ_i = Σ_j f_j(x_ji), log σ_i = Σ_j g_j(z_ji).
Here is a simple 1-D smoothing example of this... [Plots of accel and the fitted σ against times.]

38 R packages. There are many alternative R packages available:
1. gam for the original backfitting approach¹.
2. VGAM for vector GAMs and more².
3. gamlss for GAMs for location scale and shape².
4. mboost for GAMs via boosting.
5. gss for smoothing spline ANOVA.
6. scam for shape constrained additive models.
7. gamm4 for GAMMs using lme4.
8. BayesX for MCMC and likelihood based GAMs³.
The methods discussed here are in the R recommended package mgcv...
¹ No smoothing parameter selection. ² Limited smoothing parameter selection. ³ See also mgcv::jagam.

39 mgcv package in R


More information

Computation fundamentals of discrete GMRF representations of continuous domain spatial models

Computation fundamentals of discrete GMRF representations of continuous domain spatial models Computation fundamentals of discrete GMRF representations of continuous domain spatial models Finn Lindgren September 23 2015 v0.2.2 Abstract The fundamental formulas and algorithms for Bayesian spatial

More information

Model Selection for Gaussian Processes

Model Selection for Gaussian Processes Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance The Statistician (1997) 46, No. 1, pp. 49 56 Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance models By YUEDONG WANG{ University of Michigan, Ann Arbor, USA [Received June 1995.

More information

Support Vector Machines for Classification: A Statistical Portrait

Support Vector Machines for Classification: A Statistical Portrait Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Alan Gelfand 1 and Andrew O. Finley 2 1 Department of Statistical Science, Duke University, Durham, North

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

On Gaussian Process Models for High-Dimensional Geostatistical Datasets On Gaussian Process Models for High-Dimensional Geostatistical Datasets Sudipto Banerjee Joint work with Abhirup Datta, Andrew O. Finley and Alan E. Gelfand University of California, Los Angeles, USA May

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Andrew O. Finley Department of Forestry & Department of Geography, Michigan State University, Lansing

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

gamboostlss: boosting generalized additive models for location, scale and shape

gamboostlss: boosting generalized additive models for location, scale and shape gamboostlss: boosting generalized additive models for location, scale and shape Benjamin Hofner Joint work with Andreas Mayr, Nora Fenske, Thomas Kneib and Matthias Schmid Department of Medical Informatics,

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota,

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Proteomics and Variable Selection

Proteomics and Variable Selection Proteomics and Variable Selection p. 1/55 Proteomics and Variable Selection Alex Lewin With thanks to Paul Kirk for some graphs Department of Epidemiology and Biostatistics, School of Public Health, Imperial

More information

STATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS

STATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS STATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS Eric Gilleland Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Supported

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Lecture 3: Pattern Classification. Pattern classification

Lecture 3: Pattern Classification. Pattern classification EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

APTS course: 20th August 24th August 2018

APTS course: 20th August 24th August 2018 APTS course: 20th August 24th August 2018 Flexible Regression Preliminary Material Claire Miller & Tereza Neocleous The term flexible regression refers to a wide range of methods which provide flexibility

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks. Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, N.C.,

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Multinomial Regression Models

Multinomial Regression Models Multinomial Regression Models Objectives: Multinomial distribution and likelihood Ordinal data: Cumulative link models (POM). Ordinal data: Continuation models (CRM). 84 Heagerty, Bio/Stat 571 Models for

More information

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods Alternatives to Basis Expansions Basis expansions require either choice of a discrete set of basis or choice of smoothing penalty and smoothing parameter Both of which impose prior beliefs on data. Alternatives

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Reduced-rank hazard regression

Reduced-rank hazard regression Chapter 2 Reduced-rank hazard regression Abstract The Cox proportional hazards model is the most common method to analyze survival data. However, the proportional hazards assumption might not hold. The

More information

arxiv: v1 [stat.me] 25 Sep 2007

arxiv: v1 [stat.me] 25 Sep 2007 Fast stable direct fitting and smoothness selection arxiv:0709.3906v1 [stat.me] 25 Sep 2007 for Generalized Additive Models Simon N. Wood Mathematical Sciences, University of Bath, Bath BA2 7AY U.K. s.wood@bath.ac.uk

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information