Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation


1 Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation
Curtis B. Storlie, Los Alamos National Laboratory, storlie@lanl.gov

2 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

3 Motivating Example
Computational model from the Yucca Mountain certification: 150 input variables (several of which are discrete in nature) and dozens of time-dependent responses.
Response variable (for this illustration) ESIC239C.10K: cumulative release of Ic (i.e., glass) colloid of 239Pu (Plutonium-239) out of the Engineered Barrier System into the Unsaturated Zone at 10,000 years.
The model is very expensive to run; we have a Latin Hypercube sample of size n = 300 at which the model is evaluated.
How do we perform sensitivity/uncertainty analysis?

4 Computer Model Emulation
An emulator is a simpler model that mimics a larger physical model; evaluations of an emulator are much faster.
Nonparametric regression: we have n observations from the model
$y_i = f(x_i) + \varepsilon_i$, $i = 1,\dots,n$,
where $x_i = (x_{i1},\dots,x_{ip})$ and f is the physical model. Usually only weak assumptions are made about f (e.g., f belongs to a smooth class of functions).
Methods of estimation: orthogonal series/wavelets, kernel smoothing/local regression, penalization methods (smoothing splines, Gaussian processes), machine learning/algorithmic approaches.
With a limited number of model evaluations and a large number of inputs, we need to reduce emulator complexity.
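
A minimal sketch of this emulation workflow, assuming a cheap stand-in function in place of the expensive simulator (the function `expensive_model`, the input dimension, and the design size are illustrative, not the Yucca Mountain model): draw a Latin hypercube design, fit a Gaussian-process surrogate, then evaluate the surrogate cheaply for uncertainty propagation.

```python
# Illustrative emulation sketch (not the Yucca Mountain model): fit a
# Gaussian-process surrogate to a Latin hypercube design of a cheap stand-in
# function, then evaluate the surrogate in place of the expensive model.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
p, n = 5, 300                                # input dimension and design size (illustrative)

def expensive_model(X):
    """Hypothetical stand-in for the real simulator."""
    return np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]

X = qmc.LatinHypercube(d=p, seed=0).random(n)        # LHS design on [0, 1]^p
y = expensive_model(X)

emulator = GaussianProcessRegressor(
    kernel=RBF(length_scale=np.ones(p)),
    alpha=1e-6,                                      # small jitter on the diagonal
    normalize_y=True,
)
emulator.fit(X, y)

# The emulator is cheap to evaluate, e.g. for uncertainty propagation.
X_new = rng.uniform(size=(10_000, p))
y_pred, y_sd = emulator.predict(X_new, return_std=True)
print(y_pred.mean(), y_pred.std())
```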

5 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

6 Variable Selection in Regression Models
Focus for now on variable selection for the linear model
$y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon$.
Stepwise/best-subsets model fitting can produce unstable estimates.
More recently: continuous shrinkage using an L1 penalty (LASSO), Tibshirani 1996; Stochastic Search Variable Selection (SSVS), George & McCulloch 1993, 1997.

7 Shrinkage, aka Penalized Regression
Ridge regression: find the minimizing $\beta_j$'s for
$\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{i,j}\Big)^2 + \lambda \sum_{j=1}^{p}\beta_j^2.$
Note: all the x's must first be standardized.
Improved MSE estimation via the bias-variance trade-off.
Ridge regression is equivalent to minimizing
$\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{i,j}\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}\beta_j^2 < t^2$
for some $t(\lambda)$.
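
The ridge criterion above can be minimized in closed form; the sketch below (synthetic data and illustrative λ values, not anything from the talk) solves the normal equations for the standardized-predictor objective and shows the coefficients shrinking as λ grows.

```python
# Ridge sketch matching the objective on this slide:
# (1/n) * sum_i (y_i - b0 - x_i' b)^2 + lambda * sum_j b_j^2, with standardized x's.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Standardize the predictors and center the response (so b0 = mean(y)).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

def ridge(Xs, yc, lam):
    n, p = Xs.shape
    # Minimizer of (1/n)||yc - Xs b||^2 + lam ||b||^2  =>  (Xs'Xs/n + lam I) b = Xs'yc/n
    return np.linalg.solve(Xs.T @ Xs / n + lam * np.eye(p), Xs.T @ yc / n)

for lam in [0.0, 0.1, 1.0, 10.0]:
    b = ridge(Xs, yc, lam)
    print(f"lambda={lam:5.1f}  ||b||^2={np.sum(b**2):6.2f}  b[:3]={np.round(b[:3], 2)}")
```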

8 Shrinkage, aka Penalized Regression
LASSO: find the minimizing $\beta_j$'s for
$\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{i,j}\Big)^2 + \lambda \sum_{j=1}^{p}\big(\beta_j^2\big)^{1/2}.$
This is equivalent to minimizing
$\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{i,j}\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| < t$
for some $t(\lambda)$.
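
A LASSO counterpart to the ridge sketch, using scikit-learn's `Lasso` (whose objective matches the slide's criterion up to a factor of 2 in front of λ; data and penalty values are again illustrative): unlike ridge, some coefficients are set exactly to zero, which is the variable-selection behavior exploited later.

```python
# LASSO sketch: same data shape as the ridge example, but with an L1 penalty.
# scikit-learn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize first, as with ridge

for alpha in [0.05, 0.1, 0.5]:
    fit = Lasso(alpha=alpha, max_iter=10_000).fit(Xs, y)
    nonzero = np.flatnonzero(fit.coef_)
    print(f"alpha={alpha:4.2f}  selected variables: {nonzero}  "
          f"coefs: {np.round(fit.coef_[nonzero], 2)}")
```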

9 Geometry of Ridge Regression and the LASSO

10 Stochastic Search Variable Selection (SSVS)
Linear regression: $y = \beta_0 + \sum_{j=1}^{p}\beta_j x_j + \varepsilon$, where
$\beta_j = \gamma_j \alpha_j$, $\gamma_j \sim \mathrm{Bern}(\pi_j)$, $\alpha_j \sim N(0, \tau_j^2)$.
$(\gamma_1,\dots,\gamma_p)$ is the model, and is treated as an unknown random variable.
The prior probability that $x_j$ is included in the model is $P(\beta_j \neq 0) = \pi_j$.
Inference is based on the posterior probability that $x_j$ is included in the model, $P(\beta_j \neq 0 \mid y)$.
It is common to determine the best model as the one that includes the variables that have $P(\beta_j \neq 0 \mid y) > 0.5$.
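
The sketch below is one way to implement the SSVS idea; it uses the continuous spike-and-slab Gibbs sampler of George & McCulloch (1993) rather than the point-mass formulation written on this slide, and all hyperparameter values (τ0, τ1, π, the inverse-gamma prior) are illustrative assumptions.

```python
# SSVS sketch. The slide writes the point-mass form beta_j = gamma_j * alpha_j;
# for a compact runnable example this uses the closely related continuous
# spike-and-slab of George & McCulloch (1993): gamma_j switches the prior sd of
# beta_j between a tiny spike tau0 and a diffuse slab tau1.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

tau0, tau1, pi_incl = 0.01, 3.0, 0.5       # spike sd, slab sd, prior P(gamma_j = 1)
a0, b0 = 2.0, 2.0                          # inverse-gamma prior on sigma^2
n_iter, burn = 2000, 500

gamma = np.ones(p, dtype=int)
sigma2 = 1.0
incl = np.zeros((n_iter, p))
XtX, Xty = X.T @ X, X.T @ y

for it in range(n_iter):
    # 1) beta | gamma, sigma^2, y  ~  Normal (conjugate update)
    prior_prec = 1.0 / np.where(gamma == 1, tau1**2, tau0**2)
    cov = np.linalg.inv(XtX / sigma2 + np.diag(prior_prec))
    cov = (cov + cov.T) / 2.0                         # symmetrize before sampling
    beta = rng.multivariate_normal(cov @ (Xty / sigma2), cov)

    # 2) gamma_j | beta_j  ~  Bernoulli, comparing slab vs. spike densities
    w1 = pi_incl * norm.pdf(beta, scale=tau1)
    w0 = (1.0 - pi_incl) * norm.pdf(beta, scale=tau0)
    gamma = (rng.uniform(size=p) < w1 / (w0 + w1)).astype(int)

    # 3) sigma^2 | beta, y  ~  Inverse-Gamma
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * resid @ resid))

    incl[it] = gamma

post_incl = incl[burn:].mean(axis=0)                  # posterior inclusion probabilities
print("P(x_j in model | y):", np.round(post_incl, 2))
print("selected (prob > 0.5):", np.flatnonzero(post_incl > 0.5))
```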

11 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

12 Functional ANOVA Decomposition
Any function f(x) can be decomposed into main effects and interactions,
$f(x) = \mu_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j<k}^{p} f_{j,k}(x_j, x_k) + \cdots,$
where $\mu_0$ is the mean, the $f_j$ are the main effects, the $f_{j,k}$ are the two-way interactions, and $(\cdots)$ are the higher-order interactions.
The functional components $(f_j, f_{j,k}, \dots)$ are an orthogonal decomposition of the space, which implies the constraints $\int_0^1 f_j(x_j)\,dx_j = 0$ for all j, $\int_0^1 f_{j,k}(x_j, x_k)\,dx_j = 0$ for all j, k, and similar relations for the higher-order interactions. This ensures identifiability of the functional components.
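
A quick numerical illustration of the decomposition and its constraints, assuming a simple made-up test function with independent Unif(0,1) inputs: the overall mean μ0 and a centered main effect f_j are estimated by Monte Carlo averaging, and the zero-integral constraint is checked on a grid.

```python
# Monte Carlo illustration of the functional ANOVA pieces for a made-up test
# function: mu_0 = E[f(X)], and the centered main effect
# f_j(x_j) = E[f(X) | X_j = x_j] - mu_0 integrates to (approximately) zero.
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    # hypothetical test function of three inputs
    return np.sin(2 * np.pi * x[:, 0]) + x[:, 1] ** 2 + x[:, 0] * x[:, 2]

N = 100_000
X = rng.uniform(size=(N, 3))
mu0 = f(X).mean()                                   # overall mean

def main_effect(j, grid):
    """f_j(x_j) = E[f(X) | X_j = x_j] - mu_0, averaging over the other inputs."""
    out = []
    for xj in grid:
        Xc = X.copy()
        Xc[:, j] = xj
        out.append(f(Xc).mean() - mu0)
    return np.array(out)

grid = np.linspace(0, 1, 51)
f1 = main_effect(0, grid)
print("mu_0 ≈", round(mu0, 3))
print("average of f_1 over the grid ≈", round(f1.mean(), 4))   # ≈ 0: the constraint holds
```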

13 Functional ANOVA Decomposition
A convenient way to treat the high-order interactions is to let
$f(x) = \mu_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j<k}^{p} f_{j,k}(x_j, x_k) + f_R(x),$
where $f_R$ is a high-order-interaction (catch-all) remainder.
In general we can say the function f(x) lies in some space $\mathcal{F}$,
$\mathcal{F} = \{1\} \oplus \bigoplus_{j=1}^{q} \mathcal{F}_j,$
where $\{1\}, \mathcal{F}_1, \dots, \mathcal{F}_q$ form an orthogonal decomposition of the space. For the example above, we would have $f_1 \in \mathcal{F}_1, \dots, f_p \in \mathcal{F}_p, f_{1,2} \in \mathcal{F}_{p+1}, \dots$
Continuity assumptions on f, such as the number of continuous derivatives, can be built in through the choice of the $\mathcal{F}_j$.

14 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

15 The General Smoothing Spline
The L-spline estimate $\hat f$ is given by the minimizer over $f \in \mathcal{F}$ of
$\frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^{q} \|P_j f\|^2_{\mathcal{F}},$
where $P_j f$ is the orthogonal projection of f onto $\mathcal{F}_j$, $j = 1,\dots,q$.
For the additive model with each component function in $\mathcal{S}^2 = \{g : g, g' \text{ absolutely continuous and } g'' \in L^2[0,1]\}$, $\hat f$ is given by the minimizer of
$\frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^{p}\Big\{ \big[f_j(1) - f_j(0)\big]^2 + \int_0^1 \big[f_j''(x_j)\big]^2\,dx_j \Big\}.$
The solution can be obtained conveniently with tools from reproducing kernel Hilbert space theory (see Wahba 1990).
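
For intuition in one dimension, the sketch below fits penalized cubic splines with SciPy's `UnivariateSpline`; its smoothing factor s plays the role of λ (larger s gives a smoother fit with fewer knots), although the software's parameterization is not literally the L-spline criterion above. Data are synthetic.

```python
# One-dimensional smoothing-spline illustration (illustrative, not the L-spline).
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(size=80))
y = np.sin(4 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

for s in [1.0, 3.0, 20.0]:
    spl = UnivariateSpline(x, y, k=3, s=s)          # cubic smoothing spline
    resid = y - spl(x)
    print(f"s={s:5.1f}  knots={len(spl.get_knots()):3d}  residual SS={resid @ resid:6.2f}")
```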

16 Adaptive COmponent Selection and Smoothing Operator (ACOSSO)
LASSO is to ridge regression as ACOSSO is to the smoothing spline. Find the minimizer over $f \in \mathcal{F}$ of
$\frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^{q} w_j \|P_j f\|_{\mathcal{F}}.$
For the additive model, with each component function in $\mathcal{S}^2$ as before, the minimization becomes
$\frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^{p} w_j \Big\{ \big[f_j(1) - f_j(0)\big]^2 + \int_0^1 \big[f_j''(x_j)\big]^2\,dx_j \Big\}^{1/2}.$
This estimator sets some of the functional components ($f_j$'s) equal to exactly zero (i.e., $x_j$ is removed from the model).
We want $w_j$ to allow prominent functional components to enjoy the benefit of a smaller penalty. Use a weight based on the $L_2$ norm of an initial estimate $\tilde f$:
$w_j = \|\tilde f_j\|_{L_2}^{-\gamma} = \Big( \int_0^1 \big(\tilde f_j(x_j)\big)^2\,dx_j \Big)^{-\gamma/2}.$
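
The adaptive-weight idea is easiest to see in its linear (adaptive LASSO) form, sketched below: weights built from an initial estimate give prominent coefficients a smaller penalty, implemented here by rescaling columns before a plain LASSO fit. This is only an analogy to ACOSSO, which applies the same weighting to whole functional components; the data and tuning values are illustrative.

```python
# Adaptive-weight analogy in the linear setting (adaptive LASSO), not ACOSSO
# itself. The weighted L1 penalty sum_j w_j|beta_j| is a plain lasso on rescaled
# columns X_j / w_j, with coefficients transformed back afterwards.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(5)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 0.5] + [0.0] * (p - 5))
y = X @ beta_true + rng.normal(size=n)

gamma = 1.0                                     # weight exponent (illustrative)
beta_init = LinearRegression().fit(X, y).coef_  # initial (unpenalized) estimate
w = np.abs(beta_init) ** (-gamma)               # prominent coefficients -> small weight

fit = Lasso(alpha=0.1, max_iter=10_000).fit(X / w, y)
beta_hat = fit.coef_ / w                        # back-transform to the original scale
print("selected:", np.flatnonzero(np.abs(beta_hat) > 1e-8))
print("coefficients:", np.round(beta_hat, 2))
```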

17 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

18 Bayesian Smoothing Spline ANOVA (BSS-ANOVA)
Assume
$f(x) = \mu_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j<k}^{p} f_{j,k}(x_j, x_k) + f_R(x).$
Model the mean as $\mu_0 \sim N(0, \tau_0^2)$.
Model $f_j \sim \mathrm{GP}(0, \tau_j^2 K_1)$, $f_{j,k} \sim \mathrm{GP}(0, \tau_{j,k}^2 K_2)$, and $f_R \sim \mathrm{GP}(0, \tau_R^2 K_R)$.
The covariance functions $K_1, K_2, K_R$ are such that the functions $\mu_0, f_j, f_{j,k}, f_R$ obey the functional ANOVA constraints almost surely. They can also be chosen for the desired level of continuity.
Lastly, apply SSVS to the variance parameters $\tau_j^2$, $\tau_{j,k}^2$, $j,k = 1,2,\dots,p$, and $\tau_R^2$ to accomplish variable selection.
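
A plain-NumPy sketch of the additive "sum of component GPs" structure: with covariance Σ_j τ_j² K_j, the posterior mean of each main-effect component can be read off separately. The squared-exponential K_j and the fixed τ_j², σ² used here are assumptions for illustration; they do not enforce the BSS-ANOVA constraints, and no SSVS is applied to the variances.

```python
# Additive GP sketch: one 1-d kernel per input, summed, main effects read off
# component-by-component from the posterior mean (illustrative only).
import numpy as np

rng = np.random.default_rng(6)
n, p = 120, 3
X = rng.uniform(size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

tau2 = np.array([1.0, 1.0, 1.0])                 # component variances tau_j^2
sigma2, ell = 0.01, 0.3                          # noise variance and length-scale

def k1d(a, b):
    """Squared-exponential kernel between two 1-d input vectors."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

K = sum(tau2[j] * k1d(X[:, j], X[:, j]) for j in range(p))   # additive covariance
alpha = np.linalg.solve(K + sigma2 * np.eye(n), y - y.mean())

# Posterior mean of the j-th main-effect component on a grid of x_j values.
grid = np.linspace(0, 1, 5)
for j in range(p):
    fj = tau2[j] * k1d(grid, X[:, j]) @ alpha
    print(f"f_{j+1} at {np.round(grid, 2)}: {np.round(fj, 2)}")
```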

19 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

20 Treating Discrete Inputs
Discrete inputs can be thought of as having a graphical structure. Two examples where the j-th predictor $x_j \in \{0,1,2,3,4\}$:

21 Treating Discrete Inputs
Use the functional ANOVA framework to allow for these discrete predictors. The restriction implied on the discrete-input main effect component is $\sum_c f_j(c) = 0$, and similarly for interactions.
The norm (penalty) used is $f' L f$, where $L = D - A$ is the graph Laplacian matrix. It can be shown that
$f' L f = \sum_{l<m} A_{l,m}\big[f(l) - f(m)\big]^2,$
i.e., the penalty is the sum (weighted by the adjacency) of all of the squared differences between neighboring nodes.
There is also a corresponding covariance function $K_1$ which enforces the ANOVA constraints for $f_j$ in the BSS-ANOVA framework as well. It behaves something like a harmonic expansion over the graph domain, with variance decreasing with frequency.
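
A small check of the Laplacian identity, assuming a path graph on five levels as one possible adjacency structure (other structures encode a different notion of "neighboring" levels):

```python
# Check the Laplacian penalty identity on a small example: a path graph on the
# five levels of a discrete input.
import numpy as np

A = np.zeros((5, 5))
for l in range(4):
    A[l, l + 1] = A[l + 1, l] = 1.0              # edges 0-1, 1-2, 2-3, 3-4

D = np.diag(A.sum(axis=1))                       # degree matrix
L = D - A                                        # graph Laplacian

f = np.array([0.3, -0.1, 0.4, 0.0, -0.6])        # candidate main effect; sums to zero
quad = f @ L @ f
edge_sum = sum(A[l, m] * (f[l] - f[m]) ** 2
               for l in range(5) for m in range(l + 1, 5))
print(quad, edge_sum)                            # equal: penalty = sum of squared jumps
```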

22 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

23 Simulation Study
$x_j \stackrel{iid}{\sim} \mathrm{Unif}\{1,2,3,4,5,6\}$ for $j = 1,\dots,4$; $x_j \stackrel{iid}{\sim} \mathrm{Unif}(0,1)$ for $j = 5,\dots,15$.
$x_1,\dots,x_4$ are unordered qualitative factors.
The test function used here is a function of only 3 inputs (2 of which are qualitative), so 12 of the 15 inputs are completely uninformative.
Collect a sample of size n = 100 from $y_i = f(x_i) + \varepsilon_i$, where $\varepsilon_i \stackrel{iid}{\sim} N(0,1)$, giving SNR $\approx$ 100:1 for the 2 test cases.
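
A sketch of generating this input design (the test function itself appears only as a figure in the talk, so the response values are omitted here):

```python
# Simulation-study input design: 4 unordered qualitative inputs on {1,...,6},
# 11 continuous inputs on (0,1), n = 100 runs.
import numpy as np

rng = np.random.default_rng(7)
n = 100
X_disc = rng.integers(1, 7, size=(n, 4))     # x_1,...,x_4  ~ Unif{1,...,6} (qualitative)
X_cont = rng.uniform(size=(n, 11))           # x_5,...,x_15 ~ Unif(0,1)
X = np.column_stack([X_disc, X_cont])
print(X.shape)                               # (100, 15); only 3 of the 15 inputs matter
```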

24 Test function

25 Simulation Results

Estimator    Pred MSE       Pred 99%       CDF ISE
ACOSSO       0.28 (0.03)    3.98 (0.60)    (0.000)
BSS-ANOVA    0.18 (0.01)    3.09 (0.46)    (0.000)
GP           1.09 (0.06)    (1.73)         (0.001)

Pred MSE: average over the 100 realizations of the mean squared error for prediction of new observations.
Pred 99%: average over the 100 realizations of the 99th percentile of the squared error for prediction of a new observation.
CDF ISE: average over the 100 realizations of the integrated squared error from the true CDF curve to the CDF estimated via the emulator.

26 Outline
- Reduction of Emulator Complexity
- Variable Selection
- Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
- Adaptive Component Selection and Smoothing Operator
- Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work

27 Yucca Mountain Certification
Response variable (for this illustration) ESIC239C.10K: cumulative release of Ic (i.e., glass) colloid of 239Pu (Plutonium-239) out of the Engineered Barrier System into the Unsaturated Zone at 10,000 years.
Predictor variables (that appear in the plots below):
- TH.INFIL: categorical variable describing different scenarios for infiltration and thermal conductivity in the region surrounding the drifts. high relative humidity ( 85%).
- CPUCOLWF: concentration of irreversibly attached plutonium on glass/waste-form colloids when colloids are stable (mol/l).

28 Yucca Mountain Certification

29 Yucca Mountain Certification

30 Yucca Mountain Certification
Below is a sensitivity analysis for ESIC239C.10K. Let $T_j$ denote the total variance index for the j-th input (i.e., $T_j$ is the proportion of the total variance of the output that can be attributed to the j-th input and its interactions).
Meta-model: ACOSSO. Model summary: $R^2 = 0.960$, model df = 92.

Input        T_j hat    95% CI for T_j    p-val
CPUCOLWF                (0.473, 0.621)    < 0.01
TH.INFIL                (0.360, 0.518)    < 0.01
RHMUNO                  (0.052, 0.126)    < 0.01
FHHISSCS                (0.041, 0.106)    < 0.01
SEEPUNC                 (0.000, 0.040)    0.10
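
For completeness, a sketch of how a total index T_j can be estimated by Monte Carlo once a cheap emulator is available, using the standard Jansen/Saltelli estimator; the stand-in function below is hypothetical and is not the ACOSSO emulator behind the table above.

```python
# Total sensitivity indices by Monte Carlo (Jansen/Saltelli estimator):
#   T_j ≈ mean((f(A) - f(AB_j))^2) / (2 * Var(f)),
# where AB_j is the sample matrix A with column j replaced by the matching
# column of an independent matrix B. In practice f is the fitted emulator.
import numpy as np

rng = np.random.default_rng(8)
p, N = 5, 50_000

def f(X):
    # hypothetical stand-in for an emulator of the response
    return np.sin(2 * np.pi * X[:, 0]) + 2.0 * X[:, 1] ** 2 + 0.2 * X[:, 0] * X[:, 2]

A = rng.uniform(size=(N, p))
B = rng.uniform(size=(N, p))
fA = f(A)
var_y = np.var(np.concatenate([fA, f(B)]))

T = np.empty(p)
for j in range(p):
    ABj = A.copy()
    ABj[:, j] = B[:, j]                      # resample only input j
    T[j] = np.mean((fA - f(ABj)) ** 2) / (2.0 * var_y)

print(np.round(T, 3))                        # ≈ 0 for the inactive inputs
```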

31 Conclusions and Further Work
- The functional ANOVA construction and variable selection can help to increase efficiency in function estimation.
- A general treatment of graphical inputs easily allows for ordinal and qualitative inputs as special cases.
- When using the functional ANOVA construction, the main effect and interaction functions are immediately available (i.e., there is no need to numerically integrate).
- The functional ANOVA construction also lends itself well to allowing for nonstationarity in function estimation: the overall function (which is potentially quite complex) is composed of fairly simple functions (i.e., main effects or two-way interactions), so the extension is much easier than for a general function of p inputs.

32 References
1. Tibshirani, R. (1996), "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society: Series B.
2. George, E. & McCulloch, R. (1993), "Variable selection via Gibbs sampling," Journal of the American Statistical Association.
3. Wahba, G. (1990), Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics.
4. Storlie, C., Bondell, H., Reich, B. & Zhang, H. (2009a), "Surface estimation, variable selection, and the nonparametric oracle property," Statistica Sinica.
5. Reich, B., Storlie, C. & Bondell, H. D. (2009), "Variable selection in Bayesian smoothing spline ANOVA models: Application to deterministic computer codes," Technometrics.
6. Smola, A. & Kondor, R. (2003), "Kernels and regularization on graphs," in Learning Theory and Kernel Machines.
