Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation
1 Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation
Curtis B. Storlie, Los Alamos National Laboratory (storlie@lanl.gov)
2 Outline
- Reduction of Emulator Complexity
  - Variable Selection
  - Functional ANOVA
- Emulation using Functional ANOVA and Variable Selection
  - Adaptive Component Selection and Smoothing Operator
  - Bayesian Smoothing Spline ANOVA Models
- Discrete Inputs
- Simulation Study
- Example from the Yucca Mountain Analysis
- Conclusions and Further Work
3 Motivating Example
Computational model from the Yucca Mountain certification: 150 input variables (several of which are discrete in nature) and dozens of time-dependent responses.
Response variable (for this illustration) ESIC239C.10K: cumulative release of Ic (i.e., glass) colloid of 239Pu (Plutonium-239) out of the Engineered Barrier System into the Unsaturated Zone at 10,000 years.
The model is very expensive to run; we have a Latin hypercube sample of size $n = 300$ at which the model is evaluated.
How do we perform sensitivity/uncertainty analysis?
4 Computer Model Emulation
An emulator is a simpler model that mimics a larger physical model. Evaluations of an emulator are much faster.
Nonparametric regression: we have $n$ observations from the model,
$$y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where $x_i = (x_{i1}, \ldots, x_{ip})$ and $f$ is the physical model. Usually some weak assumptions are made about $f$ (e.g., $f$ belongs to a smooth class of functions).
Methods of estimation: orthogonal series/wavelets, kernel smoothing/local regression, penalization methods (smoothing splines, Gaussian processes), machine learning/algorithmic approaches.
With a limited number of model evaluations and a high number of inputs, we need to reduce emulator complexity.
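As a concrete (hypothetical) illustration of this workflow, the sketch below fits a Gaussian process emulator to a Latin hypercube sample of a cheap stand-in function. The function `expensive_model`, the dimension, and the kernel settings are all placeholders for illustration, not the Yucca Mountain code.

```python
# Minimal emulation sketch: fit a GP emulator to a Latin hypercube sample of a
# stand-in simulator, then predict cheaply at new inputs.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_model(X):            # placeholder for the real simulator
    return np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2

p, n = 2, 300
X = qmc.LatinHypercube(d=p, seed=0).random(n)   # LHS design on [0, 1]^p
y = expensive_model(X)

emulator = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(length_scale=[0.2] * p),
    normalize_y=True).fit(X, y)

X_new = qmc.LatinHypercube(d=p, seed=1).random(5)
y_hat, y_sd = emulator.predict(X_new, return_std=True)  # fast surrogate calls
```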
6 Variable Selection in Regression Models
Focus for now on variable selection for the linear model
$$y = \beta_0 + \sum_{j=1}^p \beta_j x_j + \varepsilon.$$
Stepwise/best-subsets model fitting can produce unstable estimates.
More recently: continuous shrinkage using the $L_1$ penalty (LASSO), Tibshirani 1996; Stochastic Search Variable Selection (SSVS), George & McCulloch 1993, 1997.
7 Shrinkage, aka Penalized Regression
Ridge regression: find the minimizing $\beta_j$'s of
$$\frac{1}{n}\sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{i,j}\Big)^2 + \lambda \sum_{j=1}^p \beta_j^2.$$
Note: all the $x$'s must first be standardized. Ridge gives improved MSE estimation via the bias-variance trade-off.
Ridge regression is equivalent to minimizing $\frac{1}{n}\sum_{i=1}^n \big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{i,j}\big)^2$ subject to $\sum_{j=1}^p \beta_j^2 < t^2$ for some $t(\lambda)$.
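For reference (a standard fact not stated on the slide), after centering $y$ and standardizing the columns of $X$ so the intercept drops out, the ridge criterion has the closed-form minimizer
$$\hat\beta^{\mathrm{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y,$$
so ridge shrinks every coefficient toward zero but, unlike the LASSO below, essentially never sets one exactly to zero.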
8 Shrinkage, aka Penalized Regression
LASSO: find the minimizing $\beta_j$'s of
$$\frac{1}{n}\sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{i,j}\Big)^2 + \lambda \sum_{j=1}^p \big(\beta_j^2\big)^{1/2}.$$
This is equivalent to minimizing $\frac{1}{n}\sum_{i=1}^n \big(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{i,j}\big)^2$ subject to $\sum_{j=1}^p |\beta_j| < t$ for some $t(\lambda)$.
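The key practical difference is that the $L_1$ penalty produces exact zeros. A minimal sketch on assumed synthetic data (all settings illustrative):

```python
# The L1 (LASSO) penalty zeroes out coefficients; the L2 (ridge) penalty
# only shrinks them. Synthetic data: 2 real signals, 8 noise predictors.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))                  # standardized-scale columns
beta = np.array([3.0, -2.0] + [0.0] * (p - 2))
y = X @ beta + rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print("lasso exact zeros:", np.sum(lasso.coef_ == 0))  # most of the 8 noise terms
print("ridge exact zeros:", np.sum(ridge.coef_ == 0))  # typically 0
```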
9 Geometry of Ridge Regression and the LASSO
10 Stochastic Search Variable Selection (SSVS)
Linear regression: $y = \beta_0 + \sum_{j=1}^p \beta_j x_j + \varepsilon$, where
$$\beta_j = \gamma_j \alpha_j, \qquad \gamma_j \sim \mathrm{Bern}(\pi_j), \qquad \alpha_j \sim N(0, \tau_j^2).$$
$(\gamma_1, \ldots, \gamma_p)$ is the model, and is treated as an unknown random variable.
The prior probability that $x_j$ is included in the model is $P(\beta_j \neq 0) = \pi_j$. Inference is based on the posterior probability that $x_j$ is included in the model, $P(\beta_j \neq 0 \mid y)$. It is common to take the best model to be the one that includes the variables with $P(\beta_j \neq 0 \mid y) > 0.5$.
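To make the spike-and-slab prior concrete, here is a minimal simulation of coefficient draws under it. The hyperparameter values are assumptions for illustration; full SSVS inference would additionally require a Gibbs sampler over $\gamma$ given the data.

```python
# Draw regression coefficients from the SSVS spike-and-slab prior:
# beta_j = gamma_j * alpha_j, gamma_j ~ Bern(pi), alpha_j ~ N(0, tau^2).
import numpy as np

rng = np.random.default_rng(1)
p, pi_j, tau = 10, 0.3, 2.0                 # illustrative hyperparameters
gamma = rng.binomial(1, pi_j, size=p)       # inclusion indicators (the "model")
alpha = rng.normal(0.0, tau, size=p)        # slab draws
beta = gamma * alpha                        # exact zeros where gamma_j = 0
print(beta)
```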
12 Functional ANOVA Decomposition
Any function $f(x)$ can be decomposed into main effects and interactions,
$$f(x) = \mu_0 + \sum_{j=1}^p f_j(x_j) + \sum_{j<k} f_{j,k}(x_j, x_k) + \cdots,$$
where $\mu_0$ is the mean, the $f_j$ are the main effects, the $f_{j,k}$ are the two-way interactions, and $(\cdots)$ are the higher-order interactions.
The functional components $(f_j, f_{j,k}, \ldots)$ form an orthogonal decomposition of the space, which implies the constraints $\int_0^1 f_j(x_j)\,dx_j = 0$ for all $j$, $\int_0^1 f_{j,k}(x_j, x_k)\,dx_j = 0$ for all $j, k$, and similar relations for higher-order interactions. This ensures identifiability of the functional components.
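As a small worked example (not on the slide), take $f(x_1, x_2) = x_1 x_2$ on $[0,1]^2$. Its functional ANOVA components are
$$\mu_0 = \tfrac14, \quad f_1(x_1) = \tfrac12\big(x_1 - \tfrac12\big), \quad f_2(x_2) = \tfrac12\big(x_2 - \tfrac12\big), \quad f_{1,2}(x_1, x_2) = \big(x_1 - \tfrac12\big)\big(x_2 - \tfrac12\big),$$
each of which integrates to zero over $[0,1]$ in each of its arguments, and which sum back to $x_1 x_2$.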
13 Functional ANOVA Decomposition
A convenient way to treat the high-order interactions is to let
$$f(x) = \mu_0 + \sum_{j=1}^p f_j(x_j) + \sum_{j<k} f_{j,k}(x_j, x_k) + f_R(x),$$
where $f_R$ is a high-order-interaction (catch-all) remainder.
In general we can say the function $f(x)$ lies in some space $\mathcal{F}$,
$$\mathcal{F} = \{1\} \oplus \bigoplus_{j=1}^q \mathcal{F}_j, \tag{1}$$
where $\{1\}, \mathcal{F}_1, \ldots, \mathcal{F}_q$ is an orthogonal decomposition of the space. For the example above, we would have $f_1 \in \mathcal{F}_1, \ldots, f_p \in \mathcal{F}_p, f_{1,2} \in \mathcal{F}_{p+1}, \ldots$
Continuity assumptions on $f$, such as the number of continuous derivatives, can be built in through the choice of the $\mathcal{F}_j$.
15 The General Smoothing Spline
The L-spline estimate $\hat f$ is given by the minimizer over $f \in \mathcal{F}$ of
$$\frac{1}{n}\sum_{i=1}^n \big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^q \|P^j f\|_{\mathcal{F}}^2,$$
where $P^j f$ is the orthogonal projection of $f$ onto $\mathcal{F}_j$, $j = 1, \ldots, q$.
For the additive model with each component function in $\mathcal{S}^2 = \{g : g, g' \text{ are absolutely continuous and } g'' \in L^2[0,1]\}$, $\hat f$ is given by the minimizer of
$$\frac{1}{n}\sum_{i=1}^n \big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^p \Big\{ \big[f_j(1) - f_j(0)\big]^2 + \int \big[f_j''(x_j)\big]^2 \, dx_j \Big\}.$$
The solution can be obtained conveniently with tools from reproducing kernel Hilbert space theory (see Wahba 1990).
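For a one-dimensional feel of this estimator, the sketch below fits a cubic smoothing spline with SciPy. Note that `UnivariateSpline` parameterizes smoothness through a residual budget `s` rather than the penalty weight $\lambda$, so this is an analogue of the criterion above, not the L-spline itself.

```python
# One-dimensional smoothing-spline fit (an analogue of the penalized
# criterion above; scipy controls smoothness via `s`, not lambda directly).
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)

f_hat = UnivariateSpline(x, y, k=3, s=len(x) * 0.01)   # cubic, smoothed fit
print(f_hat(np.array([0.25, 0.5, 0.75])))
```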
16 Adaptive COmponent Selection and Smoothing Operator (ACOSSO)
LASSO is to ridge regression as ACOSSO is to the smoothing spline. Find the minimizer over $f \in \mathcal{F}$ of
$$\frac{1}{n}\sum_{i=1}^n \big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^q w_j \|P^j f\|_{\mathcal{F}}.$$
For the additive model (each component in $\mathcal{S}^2$) the minimization becomes
$$\frac{1}{n}\sum_{i=1}^n \big(y_i - f(x_i)\big)^2 + \lambda \sum_{j=1}^p w_j \Big\{ \big[f_j(1) - f_j(0)\big]^2 + \int_0^1 \big[f_j''(x_j)\big]^2 \, dx_j \Big\}^{1/2}.$$
This estimator sets some of the functional components ($f_j$'s) equal to exactly zero (i.e., $x_j$ is removed from the model).
We want $w_j$ to allow prominent functional components to enjoy the benefit of a smaller penalty. Use a weight based on the $L_2$ norm of an initial estimate $\tilde f$:
$$w_j = \|\tilde f_j\|_{L_2}^{-\gamma} = \Big( \int_0^1 \big(\tilde f_j(x_j)\big)^2 \, dx_j \Big)^{-\gamma/2}.$$
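A minimal sketch of computing these adaptive weights, assuming initial component estimates are available on a uniform grid over $[0,1]$ (the initial estimates and the value of $\gamma$ below are made-up placeholders):

```python
# ACOSSO-style adaptive weights w_j = ||f_tilde_j||_{L2}^{-gamma} from initial
# component estimates on a uniform grid; on [0,1], mean(f^2) ~= int f^2 dx.
import numpy as np

grid = np.linspace(0.0, 1.0, 201)
gamma = 2.0                                   # illustrative choice
f_tilde = [np.sin(2 * np.pi * grid),          # placeholder initial estimates
           0.1 * (grid - 0.5),
           np.zeros_like(grid)]

l2_norms = [np.sqrt(np.mean(fj ** 2)) for fj in f_tilde]
w = np.array([ln ** (-gamma) if ln > 0 else np.inf for ln in l2_norms])
print(w)   # large components get small penalties; dead ones get infinite penalty
```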
18 Bayesian Smoothing Spline ANOVA (BSS-ANOVA)
Assume
$$f(x) = \mu_0 + \sum_{j=1}^p f_j(x_j) + \sum_{j<k} f_{j,k}(x_j, x_k) + f_R(x).$$
Model the mean as $\mu_0 \sim N(0, \tau_0^2)$.
Model $f_j \sim \mathrm{GP}(0, \tau_j^2 K_1)$, $f_{j,k} \sim \mathrm{GP}(0, \tau_{j,k}^2 K_2)$, and $f_R \sim \mathrm{GP}(0, \tau_R^2 K_R)$.
The covariance functions $K_1, K_2, K_R$ are such that the functions $\mu_0, f_j, f_{j,k}, f_R$ obey the functional ANOVA constraints almost surely. They can also be chosen for the desired level of continuity.
Lastly, apply SSVS to the variance parameters $\tau_j^2$, $\tau_{j,k}^2$, $j, k = 1, 2, \ldots, p$, and $\tau_R^2$ to accomplish variable selection.
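To visualize a main-effect prior draw, the sketch below samples from a GP whose kernel is built from Bernoulli polynomials, assuming the first-order covariance takes the smoothing-spline ANOVA form $K_1(s,t) = k_1(s)k_1(t) + k_2(s)k_2(t) - k_4(|s-t|)$ with $k_r = B_r/r!$ (see Wahba 1990); the exact constants used in BSS-ANOVA should be checked against Reich, Storlie & Bondell (2009). Every sample path then integrates to (approximately) zero over $[0,1]$, matching the ANOVA constraint.

```python
# Sample main-effect functions from GP(0, K_1) with an assumed
# Bernoulli-polynomial ANOVA kernel that enforces zero mean over [0, 1].
import numpy as np
from math import factorial

def bernoulli_poly(r, x):
    if r == 1: return x - 0.5
    if r == 2: return x ** 2 - x + 1.0 / 6.0
    if r == 4: return x ** 4 - 2 * x ** 3 + x ** 2 - 1.0 / 30.0

def k(r, x):
    return bernoulli_poly(r, x) / factorial(r)

s = np.linspace(0, 1, 200)
S, T = np.meshgrid(s, s, indexing="ij")
K1 = k(1, S) * k(1, T) + k(2, S) * k(2, T) - k(4, np.abs(S - T))

L = np.linalg.cholesky(K1 + 1e-8 * np.eye(len(s)))   # jitter for stability
draws = L @ np.random.default_rng(3).standard_normal((len(s), 3))
print(draws.mean(axis=0))   # each column ~ 0: the ANOVA constraint holds
```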
20 Treating Discrete Inputs
Discrete inputs can be thought of as having a graphical structure. Two examples where the $j$-th predictor $x_j \in \{0,1,2,3,4\}$: (graph figures omitted).
21 Treating Discrete Inputs
Use the functional ANOVA framework to allow for these discrete predictors. The restriction implied on the discrete-input main-effect component is $\sum_c f_j(c) = 0$, and similarly for interactions.
The norm (penalty) used is $f^\top L f$, where $L = D - A$ is the graph Laplacian matrix. It can be shown that
$$f^\top L f = \sum_{l<m} A_{l,m} \big[f(l) - f(m)\big]^2,$$
i.e., the penalty is the sum (weighted by the adjacency) of all of the squared distances between neighboring nodes.
There is also a corresponding covariance function $K_1$ which enforces the ANOVA constraints for $f_j$ in the BSS-ANOVA framework as well. It behaves something like a harmonic expansion over the graph domain, with variance decreasing with frequency.
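The Laplacian identity is easy to verify numerically. Below is a minimal check on a small made-up graph, a 5-node path, which is one natural structure for an ordinal input:

```python
# Verify f' L f == sum over node pairs of A[l,m] * (f[l] - f[m])^2
# on a 5-node path graph (a natural structure for an ordinal input).
import numpy as np

A = np.zeros((5, 5))
for l in range(4):                   # path edges: 0-1, 1-2, 2-3, 3-4
    A[l, l + 1] = A[l + 1, l] = 1.0
L = np.diag(A.sum(axis=1)) - A       # graph Laplacian, L = D - A

f = np.array([1.0, -0.5, 0.2, 0.1, -0.8])
quad = f @ L @ f
edge_sum = sum(A[l, m] * (f[l] - f[m]) ** 2
               for l in range(5) for m in range(l + 1, 5))
print(np.isclose(quad, edge_sum))    # True
```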
23 Simulation Study
$x_j \stackrel{iid}{\sim} \mathrm{Unif}\{1,2,3,4,5,6\}$ for $j = 1, \ldots, 4$; $x_j \stackrel{iid}{\sim} \mathrm{Unif}(0,1)$ for $j = 5, \ldots, 15$.
$x_1, \ldots, x_4$ are unordered qualitative factors.
The test function used here is a function of only 3 inputs (2 of which are qualitative), so 12 of the 15 inputs are completely uninformative.
Collect a sample of size $n = 100$ from $y_i = f(x_i) + \varepsilon_i$, where $\varepsilon_i \stackrel{iid}{\sim} N(0,1)$, giving an SNR of roughly 100:1 for the 2 test cases.
24 Test function
25 Simulation Results

Estimator    Pred MSE     Pred 99%     CDF ISE
ACOSSO       0.28 (0.03)  3.98 (0.60)  — (0.000)
BSS-ANOVA    0.18 (0.01)  3.09 (0.46)  — (0.000)
GP           1.09 (0.06)  — (1.73)     — (0.001)

Standard errors are given in parentheses.

Pred MSE: average over the 100 realizations of the mean squared error for prediction of new observations.
Pred 99%: average over the 100 realizations of the 99th percentile of the squared error for prediction of a new observation.
CDF ISE: average over the 100 realizations of the integrated squared error from the true CDF curve to the CDF estimated via the emulator.
27 Yucca Mountain Certification
Response variable (for this illustration) ESIC239C.10K: cumulative release of Ic (i.e., glass) colloid of 239Pu (Plutonium-239) out of the Engineered Barrier System into the Unsaturated Zone at 10,000 years.
Predictor variables (that appear in the plots below):
TH.INFIL: categorical variable describing different scenarios for infiltration and thermal conductivity in the region surrounding the drifts.
... high relative humidity (≥ 85%).
CPUCOLWF: concentration of irreversibly attached plutonium on glass/waste-form colloids when colloids are stable (mol/L).
28-29 Yucca Mountain Certification (plot slides; figures not reproduced)
30 Yucca Mountain Certification
Below is a sensitivity analysis for ESIC239C.10K. Let $T_j$ denote the total variance index for the $j$-th input (i.e., $T_j$ is the proportion of the total variance of the output that can be attributed to the $j$-th input and its interactions).
Meta-model: ACOSSO. Model summary: $R^2 = 0.960$, model df = 92.

Input       T_j (est.)   95% CI for T_j    p-value
CPUCOLWF    —            (0.473, 0.621)    < 0.01
TH.INFIL    —            (0.360, 0.518)    < 0.01
RHMUNO      —            (0.052, 0.126)    < 0.01
FHHISSCS    —            (0.041, 0.106)    < 0.01
SEEPUNC     —            (0.000, 0.040)    0.10
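For readers wanting to reproduce this style of analysis, here is a hedged sketch of how total-effect indices can be estimated from any cheap emulator by Monte Carlo, using the Jansen-type estimator $T_j \approx \frac{1}{2N}\sum_i \big(f(A)_i - f(A_B^{(j)})_i\big)^2 / \mathrm{Var}(Y)$. The emulator and dimensions below are placeholders, not the ACOSSO meta-model from the slide.

```python
# Estimate total sensitivity indices T_j from an emulator via the Jansen
# estimator: T_j ~= mean((f(A) - f(AB_j))^2) / (2 * Var(Y)).
import numpy as np

def emulator(X):                      # placeholder for a fitted emulator
    return np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * X[:, 0] * X[:, 2]

rng = np.random.default_rng(4)
N, p = 10000, 3
A, B = rng.random((N, p)), rng.random((N, p))   # two independent input samples
fA = emulator(A)
var_y = np.var(np.concatenate([fA, emulator(B)]))

T = np.empty(p)
for j in range(p):
    ABj = A.copy()
    ABj[:, j] = B[:, j]               # replace only column j with B's values
    T[j] = np.mean((fA - emulator(ABj)) ** 2) / (2 * var_y)
print(T.round(3))                     # total-effect index for each input
```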
31 Conclusions and Further Work
Functional ANOVA construction and variable selection can help to increase efficiency in function estimation.
A general treatment of graphical inputs easily allows for ordinal and qualitative inputs as special cases.
When using the functional ANOVA construction, the main-effect and interaction functions are immediately available (i.e., no need to numerically integrate).
The functional ANOVA construction also lends itself well to allowing for nonstationarity in function estimation: the overall function (which is potentially quite complex) is composed of fairly simple functions (i.e., main effects or two-way interactions), so the extension is much easier than for a general function of p inputs.
32 References
1. Tibshirani, R. (1996), Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B.
2. George, E. & McCulloch, R. (1993), Variable selection via Gibbs sampling, Journal of the American Statistical Association.
3. Wahba, G. (1990), Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics.
4. Storlie, C., Bondell, H., Reich, B. & Zhang, H. (2009a), Surface estimation, variable selection, and the nonparametric oracle property, Statistica Sinica.
5. Reich, B., Storlie, C. & Bondell, H. (2009), Variable selection in Bayesian smoothing spline ANOVA models: Application to deterministic computer codes, Technometrics.
6. Smola, A. & Kondor, R. (2003), Kernels and regularization on graphs, in Learning Theory and Kernel Machines.