Modelling with smooth functions. Simon Wood University of Bath, EPSRC funded
1 Modelling with smooth functions Simon Wood University of Bath, EPSRC funded
2 Some data... Daily respiratory deaths, temperature and ozone in Chicago (NMMAPS). [Figure: time series of daily deaths, temperature and ozone against time.] Apparently daily respiratory deaths are significantly negatively correlated with ozone and temperature. But obviously there is confounding here - we need a model.
3 A model... death_i ~ Poi(µ_i), log(µ_i) = α + f_1(t_i) + f_2(o3_i, tmp_i) + f_3(pm10_i). The f_j are smooth functions to estimate. 1. α + f_1 is the (log) background respiratory mortality rate. 2. f_2 is modification by ozone and temperature (interacting). 3. f_3 is the modification from particulates. Actually the predictors are aggregated over the 3 days preceding death. Fit in R (mgcv package): gam(death ~ s(time,k=200) + te(o3,tmp) + s(pm10), family=poisson)
4 Air pollution model estimates... [Figure: estimated smooths s(time,99.75), te(o3,tmp,31.6) and s(pm10,3.09).] High ozone and temp associated with increased risk.
5 Some more data... Can we predict octane rating of fuel from its near infra-red spectrum? 60 octane/spectrum pairs available (from pls package). [Figure: spectra, log(1/r) against wavelength (nm), one per fuel sample.] Try a signal regression model. k_i is the i-th spectrum, f is a smooth function... octane_i = ∫ f(ν) k_i(ν) dν + ε_i
6 Signal regression model fit... If each row of matrix NIR contains a spectral intensity, and each row of nm the corresponding wavelengths, then gam(octane ~ s(nm, by=NIR)) estimates the model. [Figure: estimated f(ν), s(nm,9.16):NIR, and observed vs. predicted octane.]
7 And one more dataset. How does the probability of diabetic retinopathy relate to previous disease duration (dur), body mass index (bmi) and % glycosylated hemoglobin (gly)? Wisconsin study (see gss package). Model: ret_i ~ Bernoulli, logit{E(ret_i)} = f_1(dur) + f_2(bmi) + f_3(gly) + f_4(dur, bmi) + f_5(dur, gly) + f_6(gly, bmi)
8 Retinopathy model estimates. gam(ret ~ s(dur) + s(gly) + s(bmi) + ti(dur,gly) + ti(dur,bmi) + ti(gly,bmi), family=binomial) [Figure: estimated smooths s(dur,3.26), s(gly,1), s(bmi,2.67) and interactions te(gly,bmi,2.5), te(dur,gly,0), te(dur,bmi,0).]
9 Additive smooth models: structured flexibility. The preceding are examples of generalized additive models (GAMs). A GAM is a GLM in which the linear predictor depends linearly on unknown smooth functions of covariates... y_i ~ EF(µ_i, φ), g(µ_i) = f_1(x_1i) + f_2(x_2i) + ... Hastie and Tibshirani (1990) and Wahba (1990) laid the foundations for these models as discussed here. If we can work with GAMs at all we can easily generalize: g(µ_i) = A_i θ + Σ_j L_ij f_j(x_j) + Z_i b. The L_ij are linear functionals; A and Z are model matrices; θ and b are parameters and Gaussian random effects.
10 Additive smooth models & practical computation. Making GAMs work in practice requires at least 3 things: 1. A way of representing the smooth functions f_j. 2. A way of estimating the f_j. 3. Some means of deciding how smooth the f_j should be. Point 3 is the awkward part of the enterprise, and strongly influences 1 and 2. As well as point estimates we also need further inferential tools, such as interval estimates, model selection methods, AIC, p-values etc. We'd also like to go beyond the univariate exponential family. This talk will cover these things in order.
11 Representing smooth functions: splines. To motivate how to represent several smooth terms in a model, first consider a simpler smoothing problem: estimating the smooth function f in the model y_i = f(x_i) + ε_i from x_i, y_i data using smoothing splines... [Figure: wear against size, with fitted curve.] The red curve is the function minimizing Σ_i (y_i − f(x_i))² + λ ∫ f''(x)² dx.
12 Splines and the smoothing parameter. Smoothing parameter λ controls the stiffness of the spline. [Figure: four fits of wear against size for increasing λ.] But the spline can be written f̂(x) = Σ_i β_i b_i(x), where the basis functions b_i(x) do not depend on λ.
13 General spline theory background. Consider a Hilbert space of real valued functions, f, on some domain τ (e.g. [0, 1]). It is a reproducing kernel Hilbert space, H, if evaluation is bounded, i.e. ∃ M s.t. |f(t)| ≤ M‖f‖. Then the Riesz representation theorem says that there is a function R_t ∈ H s.t. f(t) = ⟨R_t, f⟩. Now consider R_t(u) as a function of t: R(t, u). Since ⟨R_t, R_s⟩ = R(t, s), R(t, s) is known as the reproducing kernel of H. Actually, to every positive definite function R(t, s) corresponds a unique RKHS.
14 Spline smoothing problem. RKHSs are quite useful for constructing smooth models. To see why, consider finding f̂ to minimize Σ_i {y_i − f(t_i)}² + λ ∫ f''(t)² dt. Let H have ⟨f, g⟩ = ∫ f''(t) g''(t) dt. Let H_0 denote the space of functions for which ∫ f''(t)² dt = 0, with basis φ_1(t) = 1, φ_2(t) = t, say. The spline problem seeks f̂ ∈ H_0 ⊕ H to minimize Σ_i {y_i − f(t_i)}² + λ‖Pf‖², where P is the orthogonal projection onto H.
15 Spline smoothing solution. f̂(t) = Σ_{i=1}^n c_i R_{t_i}(t) + Σ_{i=1}^M d_i φ_i(t) is the basis representation of f̂. Why? Suppose the minimizer were f = f̂ + η where η ∈ H and η is orthogonal to the span of the R_{t_i}: 1. η(t_i) = ⟨R_{t_i}, η⟩ = 0, so the data fit is unchanged; 2. ‖Pf‖² = ‖Pf̂‖² + ‖η‖², which is minimized when η = 0. Obviously this argument is rather general. So if E_ij = ⟨R_{t_i}, R_{t_j}⟩ and T_ij = φ_j(t_i) then we seek ĉ and d̂ to minimize ‖y − Td − Ec‖² + λcᵀEc... straightforward to compute (but at O(n³) cost).
16 Reduced rank smoothers. Can obtain an efficient reduced rank basis (and penalty) by 1. using a spline basis for a representative subset of the data, or 2. using Lanczos methods to find a low order eigenbasis. In either case we end up representing the smoother as f(x) = Σ_j β_j b_j(x): basis functions b_j(x) known, coefficients β not. The corresponding smoothing penalty is βᵀSβ, where S is known. So y_i = f(x_i) + ε_i becomes y = Xβ + ε, where X_ij = b_j(x_i), and β̂ = argmin_β ‖y − Xβ‖² + λβᵀSβ. Examples follow asymptotic justification of rank reduction.
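The penalized least squares fit β̂ = argmin_β ‖y − Xβ‖² + λβᵀSβ is just the solution of (XᵀX + λS)β = Xᵀy. A minimal numpy sketch, using an arbitrary illustrative basis (truncated power functions with a ridge-type penalty on the knot coefficients) rather than any particular spline construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth function on [0, 1].
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

# Hypothetical basis b_j(x): 1, x, x^2, x^3 plus cubic truncated
# power functions (x - k_j)_+^3 at 15 interior knots.
knots = np.linspace(0.05, 0.95, 15)
X = np.column_stack([x**p for p in range(4)] +
                    [np.clip(x - k, 0, None)**3 for k in knots])

# Simple penalty: shrink the knot coefficients, leave the cubic
# polynomial (the null space of the wiggliness penalty) unpenalized.
S = np.diag([0.0] * 4 + [1.0] * knots.size)

def fit(lam):
    # beta_hat = argmin ||y - X beta||^2 + lam * beta' S beta
    return np.linalg.solve(X.T @ X + lam * S, X.T @ y)

fhat = X @ fit(1e-4)  # fitted smooth at the data points
```

The choice of basis, knots and λ here is arbitrary; the point is only the form of the solve.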
17 What rank? Consider f, a rank k cubic regression spline (i.e. λ = 0), parameterized in terms of function values at evenly spaced knots. From basic regression theory, the average sampling standard error of f̂ is O(√(k/n)). From approximation theory for cubic splines, the asymptotic bias of f̂ is bounded by O(k⁻⁴). So if we let k = O(n^{1/9}) we minimize the asymptotic MSE at O(n^{−8/9}). Actually in practice we would want to penalize, and then k = O(n^{1/5}) is appropriate. The point is that asymptotically k ≪ n is appropriate.
18 P-splines: B-spline basis & approximate penalty. [Figure: B-spline basis functions, and fits for three values of the second-difference penalty Σ_i (β_{i−1} − 2β_i + β_{i+1})², with the penalty value shown for each fit (e.g. 16.4).]
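The P-spline penalty Σ_i (β_{i−1} − 2β_i + β_{i+1})² is βᵀDᵀDβ, where D is the second-difference matrix acting on the coefficient vector. A small numpy check (the basis dimension k is arbitrary):

```python
import numpy as np

k = 8  # number of B-spline coefficients (illustrative)
# Second-difference matrix: row i of D is e_i - 2 e_{i+1} + e_{i+2}.
D = np.diff(np.eye(k), n=2, axis=0)   # shape (k-2, k)
S = D.T @ D                            # penalty matrix, beta' S beta

beta = np.arange(k, dtype=float)       # a linear coefficient sequence
# Second differences of a linear sequence vanish, so the penalty is 0:
print(beta @ S @ beta)                 # 0.0
```

So straight-line coefficient sequences are unpenalized, which is why the penalty's null space contains linear functions.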
19 Example: thin plate splines. One way of generalizing splines from 1D to several dimensions is to turn the flexible strip into a flexible sheet: f̂ minimizing e.g. Σ_i {y_i − f(x_i, z_i)}² + λ ∫∫ f_xx² + 2f_xz² + f_zz² dx dz. This results in a thin plate spline. It is an isotropic smooth. Isotropy may be appropriate when different covariates are naturally on the same scale. [Figure: three perspective plots of the linear predictor over x and z.]
20 Cheaper thin plate splines. RKHS or similar theory gives an explicit basis (for any dimension), with coefficients (c and d) minimizing ‖y − Td − Ec‖² + λcᵀEc s.t. Tᵀc = 0. Can reduce the O(n³) cost to O(k³) by replacing E by its rank k truncated eigen-decomposition (computed by Lanczos methods). Cheap and somewhat optimal. [Figure: truth vs. 16 d.f. thin plate regression spline vs. knot-based basis.]
21 Non-isotropic tensor product smooths. A different construction is needed if covariates have different scales. Start with a marginal spline f(z) and let its coefficients vary smoothly with x. [Figure: marginal smooth f(z) and resulting surface f(x,z).]
22 The complete tensor product smooth. Let each coefficient of f(z) be a spline of x. The construction is symmetric (see right). [Figure: two views of the tensor product surface f(x,z).]
23 Tensor product penalties - one per dimension. x-wiggliness: sum marginal x penalties over the red curves. z-wiggliness: sum marginal z penalties over the green curves. [Figure: surface f(x,z) with the marginal curves highlighted.]
24 Tensor product expressions. Suppose the basis expansions for smoothing w.r.t. x and z marginally are Σ_i α_i a_i(x) and Σ_j β_j b_j(z)... and the marginal penalties are αᵀS_x α and βᵀS_z β. The tensor product basis construction gives f(x, z) = Σ_ij β_ij b_j(z) a_i(x), with two penalties (requiring 2 smoothing parameters): J_z(f) = βᵀ(I_I ⊗ S_z)β and J_x(f) = βᵀ(S_x ⊗ I_J)β. The construction generalizes to any number of marginals and multi-dimensional marginals.
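The Kronecker-product form of the two penalties can be sketched directly with np.kron. The marginal penalties below are arbitrary difference penalties chosen for illustration, with the I·J coefficients β_ij flattened so that i varies slowest:

```python
import numpy as np

# Assume I basis functions a_i(x) with penalty S_x (I x I), and
# J basis functions b_j(z) with penalty S_z (J x J).
I, J = 4, 3
Dx = np.diff(np.eye(I), n=2, axis=0); Sx = Dx.T @ Dx  # 2nd-diff penalty in x
Dz = np.diff(np.eye(J), n=1, axis=0); Sz = Dz.T @ Dz  # 1st-diff penalty in z

Jz = np.kron(np.eye(I), Sz)   # z-wiggliness: beta' (I_I (x) S_z) beta
Jx = np.kron(Sx, np.eye(J))   # x-wiggliness: beta' (S_x (x) I_J) beta

beta = np.ones(I * J)          # coefficients of a constant function
print(beta @ Jz @ beta, beta @ Jx @ beta)   # both 0: constants unpenalized
```

A function varying only in z picks up only the J_z penalty, and vice versa, which is what makes separate smoothing parameters for the two directions possible.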
25 Isotropic vs. tensor product comparison. [Figure: isotropic thin plate spline fits vs. tensor product spline fits; each panel smooths the same data. The only modification is that x has been divided by 5 in the bottom row.]
26 Smooth model components. A rich range of basis-penalty smoothers is possible... [Figure: examples including s(times,10.9), a tensor product smooth f(x,z), and a spatial smooth s(region,63.04) over lon and lat.] When combining several smooths in one model, we often need an identifiability constraint Σ_i f(x_i) = 0. Re-write this as 1ᵀXβ = 0 and absorb it by reparameterization.
27 Representing a GAM. Consider the GAM y_i ~ EF(µ_i, φ), g(µ_i) = β_0 + Σ_j f_j(x_ji). Each f_j(x) has a basis-penalty representation, say X^j β^j, β^{jT} S^j β^j (with constraints absorbed). So the GAM becomes g(µ) = Xβ, where X = [1 : X¹ : X² : ...] and βᵀ = [β_0, β^{1T}, β^{2T}, ...]. The penalty is now Σ_j λ_j βᵀS_j β, where S_j is such that βᵀS_j β ≡ β^{jT} S^j β^j. Can easily include parametric components and linear functionals of smooths.
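The assembly step — column-binding the term-wise bases into X and padding each term's penalty S^j into a full-size S_j — can be sketched in numpy. The two "smooth terms" below are random matrices standing in for real basis evaluations:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Two hypothetical smooth terms, each with its own basis and penalty.
X1 = rng.normal(size=(n, 5)); S1 = np.eye(5)
X2 = rng.normal(size=(n, 7)); S2 = np.eye(7)

# Combined model matrix X = [1 : X1 : X2].
X = np.column_stack([np.ones(n), X1, X2])
p = X.shape[1]

def embed(Sj, start):
    """Pad a term-wise penalty S^j to a full p x p matrix S_j so that
    beta' S_j beta equals beta_j' S^j beta_j."""
    S = np.zeros((p, p))
    k = Sj.shape[0]
    S[start:start + k, start:start + k] = Sj
    return S

S1f = embed(S1, 1)        # after the intercept column
S2f = embed(S2, 1 + 5)    # after intercept + first term
total_penalty = 0.5 * S1f + 2.0 * S2f   # sum_j lambda_j S_j, example lambdas
```

The intercept (and any parametric columns) simply get zero rows and columns in every S_j, so they are never shrunk.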
28 β̂ given λ. Let l be the model log-likelihood and use penalized MLE: L(β) = l(β) − (1/2) Σ_j λ_j βᵀS_j β, β̂ = argmax_β L(β). Optimize by Newton's method (penalized IRLS for the EF case). Bayes motivation. Prior: π(β) = N(0, (Σ_j λ_j S_j)⁻), a generalized inverse since the penalty null space is unpenalized; likelihood π(y|β) as is. Then the MAP estimate is β̂. Further, in the large sample limit, if I is the information matrix at β̂, β|y ~ N(β̂, (I + Σ_j λ_j S_j)⁻¹), which can be used to construct well calibrated CIs. Note the link to Gaussian random effects!
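Penalized IRLS for an exponential family model is an ordinary IRLS loop with λS added to the weighted normal equations. A sketch for a Poisson/log model on simulated data (design, true coefficients and λ are all illustrative choices, not mgcv's defaults):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 6

# Simulated Poisson data with log link (hypothetical design).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 0.5, -0.3, 0.2, 0.0, 0.1])
y = rng.poisson(np.exp(X @ beta_true))

S = np.eye(p); S[0, 0] = 0.0   # penalize everything but the intercept
lam = 0.1

beta = np.zeros(p)
beta[0] = np.log(y.mean() + 0.5)   # crude but sensible starting value
for _ in range(50):
    eta = X @ beta
    mu = np.exp(eta)
    # Working response and weights for Poisson with canonical log link:
    z = eta + (y - mu) / mu
    W = mu
    # Penalized weighted least squares step: (X'WX + lam*S) beta = X'Wz.
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X) + lam * S,
                               X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new
```

At the fixed point the penalized score Xᵀ(y − µ) − λSβ is zero, which the test below checks.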
29 AIC, effective degrees of freedom, p-values. Following through the derivation of AIC in the penalized case yields AIC = −2l(β̂) + 2tr(F), where F = (I + Σ_j λ_j S_j)⁻¹ I. Better still, F = V_β I, where V_β is an approximate posterior covariance matrix for β, corrected for λ uncertainty (see Greven and Kneib, 2010, Biometrika for why). tr(F) is the effective degrees of freedom of the model. p-values for testing smooth terms or variance components for equality to zero can also be obtained (see Wood 2013a,b Biometrika). But we still need smoothing parameter (λ) estimates.
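In the Gaussian identity-link case the information matrix is XᵀX (up to σ²), so the effective degrees of freedom tr(F) can be computed directly; a small sketch with an arbitrary design and a full-rank penalty:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 10
X = rng.normal(size=(n, p))
S = np.eye(p)   # illustrative full-rank penalty

def edf(lam):
    # Gaussian case: F = (X'X + lam*S)^{-1} X'X, effective df = tr(F).
    XtX = X.T @ X
    return np.trace(np.linalg.solve(XtX + lam * S, XtX))

print(edf(0.0))    # no penalty: one df per coefficient (== p)
print(edf(1e12))   # heavy penalty: coefficients suppressed, edf near 0
```

As λ runs from 0 to infinity the edf shrinks monotonically from p towards the penalty null space dimension (0 here), matching the intuition that smoothing "uses up" fewer parameters than it estimates.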
30 Smoothing parameter estimates. Let ρ = log λ and π(ρ) = constant; then π(ρ|y) ∝ ∫ π(y|β) π(β|ρ) dβ, the marginal likelihood. Empirical Bayes: ρ̂ = argmax_ρ π(ρ|y)... the same as REML in the Gaussian random effects context. In practice use a Laplace approximation for the integral: log π(ρ|y) ≈ L(β̂) + (1/2) log|Σ_j λ_j S_j|_+ − (1/2) log|H| + c, where H is the Hessian of −L at β̂ and |·|_+ is a generalized determinant. The log determinants require numerical care.
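For a Gaussian model the Laplace approximation is exact, so the REML/marginal-likelihood score can be computed and maximized over a grid of ρ = log λ in a few lines. The design, full-rank penalty S = I and σ² = 1 below are simplifying assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 8
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p) * 0.5
y = X @ beta_true + rng.normal(size=n)

S = np.eye(p)   # full-rank penalty, so log|lam*S| = p*log(lam)

def score(lam, sigma2=1.0):
    """Marginal-likelihood score (exact Laplace form for Gaussian y):
    L(beta_hat) + (1/2) log|lam S| - (1/2) log|H| + const."""
    H = X.T @ X / sigma2 + lam * S          # negative Hessian of L
    b = np.linalg.solve(H, X.T @ y / sigma2)  # beta_hat given lam
    L = -np.sum((y - X @ b)**2) / (2 * sigma2) - lam * (b @ S @ b) / 2
    return L + 0.5 * p * np.log(lam) - 0.5 * np.linalg.slogdet(H)[1]

# Empirical Bayes: maximize over a grid of rho = log(lambda).
rhos = np.linspace(-8, 8, 81)
best = rhos[np.argmax([score(np.exp(r)) for r in rhos])]
print(best)
```

The test checks that this score agrees (up to the dropped constant) with the marginal density of y ~ N(0, XXᵀ/λ + I) computed directly, confirming the Laplace form is exact here.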
31 How marginal likelihood works. [Figure: three panels — λ too low, prior variance too high; λ and prior variance about right; λ too high, prior variance too low.] Draw β from the prior implied by λ. Find the average value of the likelihood for these draws. Choose λ to maximize this average likelihood; i.e. formally, maximize ∫ π(y|β) π(β|λ) dβ w.r.t. λ. Cross validation is an alternative.
32 Numerical fitting strategy. Optimize REML w.r.t. ρ = log(λ) by Newton's method. For each trial ρ: 1. Re-parameterize β so that the log|Σ_j λ_j S_j|_+ computation is stable. 2. Find β̂ by an inner Newton optimization of L(β). 3. Use implicit differentiation to find the first two derivatives of β̂ w.r.t. ρ. 4. Compute the log determinant terms and their derivatives. 5. Hence compute the REML score and its first two derivatives, as required for the next Newton step. For large datasets an alternative is often possible: find β̂ by iterative fitting of working linear models, and estimate ρ for the working model at each iteration step.
33 Where's the exponential family assumption?... the single parameter independent EF assumption of GAMs is barely used. It just simplifies some of the numerical computations (and adds to numerical robustness). Actually we can use most of the apparatus just described with almost any regular likelihood, provided it's practical to compute with, and the first three or four derivatives w.r.t. the model coefficients can also be computed...
34 Example: predicting prostate status. [Figure: protein mass spectra (intensity against Daltons) for healthy, enlarged and cancerous prostate.] Model: ordered category (benign/enlarged/cancer) predicted by a logistic latent random variable with mean µ_i = ∫ f(D) ν_i(D) dD, where ν_i(D) is the i-th spectrum. gam(ds ~ s(D, by=k), family=ocat(R=3)) [Figure: estimated f(D), and QQ-plot of deviance residuals against theoretical quantiles.]
35 Fuel efficiency of cars. [Figure: pairs plot of city.mpg, hw.mpg, width, height, weight and hp.]
36 Multivariate additive model. Correlated bivariate normal response (hw.mpg, city.mpg). Component means given by smooth additive predictors. The best model is very simple (and somewhat unexpected). [Figure: estimated smooths s(weight,6.71), s(hp,4.1), s.1(weight,5.71), s.1(hp,5.69) for the two response components, with Gaussian QQ-plots of residuals.]
37 Scale location models. Can additively model the mean and variance (and skew and...). Simple example: y_i ~ N(µ_i, σ_i²), µ_i = Σ_j f_j(x_ji), log σ_i = Σ_j g_j(z_ji). [Figure: a simple 1-D smoothing example — accel against times, with fitted mean and σ̂ against times.]
38 R packages. There are many alternative R packages available:
1. gam for the original backfitting approach.¹
2. vgam for vector GAMs and more.²
3. gamlss: GAMs for location, scale and shape.
4. mboost: GAMs via boosting.
5. gss: smoothing spline ANOVA.
6. scam: shape constrained additive models.
7. gamm4: GAMMs using lme4.
8. bayesx: MCMC and likelihood based GAMs.³
The methods discussed here are in the R recommended package mgcv...
¹ No smoothing parameter selection. ² Limited smoothing parameter selection. ³ See also mgcv::jagam.
39 mgcv package in R
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More informationgamboostlss: boosting generalized additive models for location, scale and shape
gamboostlss: boosting generalized additive models for location, scale and shape Benjamin Hofner Joint work with Andreas Mayr, Nora Fenske, Thomas Kneib and Matthias Schmid Department of Medical Informatics,
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationBayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes
Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota,
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationProteomics and Variable Selection
Proteomics and Variable Selection p. 1/55 Proteomics and Variable Selection Alex Lewin With thanks to Paul Kirk for some graphs Department of Epidemiology and Biostatistics, School of Public Health, Imperial
More informationSTATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS
STATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS Eric Gilleland Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Supported
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationGeneralized Linear Models
Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n
More informationAPTS course: 20th August 24th August 2018
APTS course: 20th August 24th August 2018 Flexible Regression Preliminary Material Claire Miller & Tereza Neocleous The term flexible regression refers to a wide range of methods which provide flexibility
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationLinear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.
Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think
More informationRegression: Lecture 2
Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and
More informationESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS
ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, N.C.,
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationSpatial Process Estimates as Smoothers: A Review
Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed
More informationMultinomial Regression Models
Multinomial Regression Models Objectives: Multinomial distribution and likelihood Ordinal data: Cumulative link models (POM). Ordinal data: Continuation models (CRM). 84 Heagerty, Bio/Stat 571 Models for
More informationAlternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods
Alternatives to Basis Expansions Basis expansions require either choice of a discrete set of basis or choice of smoothing penalty and smoothing parameter Both of which impose prior beliefs on data. Alternatives
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationReduced-rank hazard regression
Chapter 2 Reduced-rank hazard regression Abstract The Cox proportional hazards model is the most common method to analyze survival data. However, the proportional hazards assumption might not hold. The
More informationarxiv: v1 [stat.me] 25 Sep 2007
Fast stable direct fitting and smoothness selection arxiv:0709.3906v1 [stat.me] 25 Sep 2007 for Generalized Additive Models Simon N. Wood Mathematical Sciences, University of Bath, Bath BA2 7AY U.K. s.wood@bath.ac.uk
More informationLikelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science
1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information