Lasso & Bayesian Lasso


1 Lasso & Bayesian Lasso
Readings: Christensen, Chapter 15
Merlise Clyde, October 6, 2015

2 Lasso
Tibshirani (JRSS B 1996) proposed estimating coefficients through L1-constrained least squares: the Least Absolute Shrinkage and Selection Operator. The constraint controls how large the coefficients may grow:

\min_{\beta^c} (Y_c - X_c \beta^c)^T (Y_c - X_c \beta^c) \quad \text{subject to} \quad \sum_j |\beta_j^c| \le t

Equivalent quadratic programming problem for the penalized likelihood:

\min_{\beta^c} \|Y_c - X_c \beta^c\|^2 + \lambda \|\beta^c\|_1

Posterior mode:

\max_{\beta^c} \; -\frac{\phi}{2} \left\{ \|Y_c - X_c \beta^c\|^2 + \lambda \|\beta^c\|_1 \right\}
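In the special case of an orthonormal design (X_c^T X_c = I), the penalized problem above has a closed-form solution: each OLS coefficient is soft-thresholded at λ/2, which is where the exact zeros come from. A minimal sketch in Python (the slides use R; this orthonormal special case is for illustration only, since a general X needs coordinate descent or LARS):

```python
import numpy as np

def soft_threshold(b, thr):
    """Soft-thresholding operator: sign(b) * max(|b| - thr, 0)."""
    return np.sign(b) * np.maximum(np.abs(b) - thr, 0.0)

def lasso_orthonormal(X, y, lam):
    """Lasso solution when X'X = I: soft-threshold each OLS coefficient at lam/2."""
    b_ols = X.T @ y              # OLS estimate in the orthonormal case
    return soft_threshold(b_ols, lam / 2.0)
```

Coefficients smaller in magnitude than λ/2 are set exactly to zero, while larger ones are shrunk toward zero by λ/2: shrinkage and selection in one operation.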

7 Picture
[figure: not recoverable from the transcription]

8 R Code
> library(lars)
> longley.lars = lars(as.matrix(longley[,-7]), longley[,7], type="lasso")
> plot(longley.lars)
[figure: LASSO coefficient paths, standardized coefficients plotted against |beta|/max|beta|]

9 Solutions
> round(coef(longley.lars),5)
[table: coefficient matrix, rows [1,] through [11,], columns GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year; numeric values lost in transcription]

10 Cp Solution
Choose the step that minimizes

C_p = \frac{SSE_p}{\hat{\sigma}^2_F} - n + 2p

> summary(longley.lars)
LARS/LASSO
Call: lars(x = as.matrix(longley[, -7]), y = longley[, 7], type = "lasso")
[table: Df, Rss, and Cp for each step; numeric values lost in transcription]
[table: coefficients of the Cp-minimizing solution, row [7,], columns GNP.deflator, GNP, Unemployed, Armed.Forces, Population, Year; values lost in transcription]
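A useful sanity check on the C_p formula above is the standard fact that the full model's C_p equals its number of parameters, since then SSE_p/(n - p) is exactly the variance estimate. A minimal sketch (the numbers here are hypothetical, since the transcribed longley output lost its values):

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """Mallows' Cp = SSE_p / sigma^2_F - n + 2p, where sigma^2_F is the
    error-variance estimate from the full model."""
    return sse_p / sigma2_full - n + 2 * p

# Full model: sigma^2_F = SSE_F / (n - p_F), so Cp reduces to p_F exactly.
n, p_full = 16, 7                      # hypothetical sizes
sse_full = 3.5                         # hypothetical full-model SSE
sigma2 = sse_full / (n - p_full)
print(mallows_cp(sse_full, sigma2, n, p_full))   # equals p_full = 7
```

Submodels with C_p well above their own p are flagged as underfitting; the C_p-minimizing step is the one `summary(longley.lars)` would pick.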

13 Features
Combines shrinkage (like ridge regression) with selection (like stepwise selection).
Uncertainty? Interval estimates?

15 Bayesian Lasso
Park & Casella (JASA 2008) and Hans (Biometrika 2010) propose Bayesian versions of the Lasso:

Y | α, β, φ ~ N(1_n α + X_c β, I_n/φ)
β | α, φ, τ ~ N(0, diag(τ²)/φ)
τ_1², …, τ_p² | α, φ ~ iid Exp(λ²/2)
p(α, φ) ∝ 1/φ

Can show that β_j | φ, λ ~ iid DE(λ√φ):

\int_0^\infty \sqrt{\frac{\phi}{2\pi s}}\, e^{-\frac{1}{2}\frac{\phi \beta^2}{s}} \; \frac{\lambda^2}{2} e^{-\frac{\lambda^2 s}{2}} \, ds = \frac{\lambda \phi^{1/2}}{2} e^{-\lambda \phi^{1/2} |\beta|}

a scale mixture of normals (Andrews and Mallows 1974).
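The mixture identity above is easy to check by simulation: drawing s ~ Exp(λ²/2) and then β | s ~ N(0, s/φ) should reproduce a double exponential with rate λ√φ, which has E|β| = 1/(λ√φ) and Var(β) = 2/(λ²φ). A sketch (numpy only; illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, phi, m = 2.0, 1.5, 400_000

# s ~ Exp(rate = lam^2 / 2); numpy parametrizes by the mean = 2 / lam^2
s = rng.exponential(scale=2.0 / lam**2, size=m)
# beta | s ~ N(0, s / phi)
beta = rng.normal(0.0, np.sqrt(s / phi))

rate = lam * np.sqrt(phi)              # DE(lam * sqrt(phi)) rate
print(np.mean(np.abs(beta)), 1.0 / rate)   # the two should agree closely
print(np.var(beta), 2.0 / rate**2)
```

The same construction is what makes Gibbs sampling possible: conditioning on the latent scales returns the model to conjugate normal-gamma form.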

22 Gibbs Sampling
Integrate out α: α | Y, φ ~ N(ȳ, 1/(nφ))

β | τ, φ, λ, Y ~ N(·, ·)
φ | τ, β, λ, Y ~ G(·, ·)
1/τ_j² | β, φ, λ, Y ~ InvGaussian(·, ·)

where X ~ InvGaussian(μ, λ) has density

f(x) = \sqrt{\frac{\lambda}{2\pi}}\, x^{-3/2}\, e^{-\frac{1}{2}\frac{\lambda (x-\mu)^2}{\mu^2 x}}, \quad x > 0

Homework: derive the full conditionals for β, φ, and 1/τ².
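For reference, the whole sampler fits in a few lines once the full conditionals are known. The sketch below assumes the Park & Casella (2008) forms, rewritten for the precision parametrization φ = 1/σ² used on these slides, with α handled by centering; it is illustrative, not a substitute for deriving the conditionals in the homework:

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, seed=0):
    """Gibbs sampler for the Bayesian lasso (sketch). Full conditionals
    follow Park & Casella (2008), adapted to phi = 1/sigma^2; the
    intercept alpha is integrated out by centering X and y."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # centered predictors X_c
    yc = y - y.mean()                        # centered response
    beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    phi, tau2 = 1.0, np.ones(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # beta | tau, phi, Y ~ N(A^{-1} Xc'yc, A^{-1}/phi), A = Xc'Xc + diag(1/tau2)
        A_inv = np.linalg.inv(Xc.T @ Xc + np.diag(1.0 / tau2))
        beta = rng.multivariate_normal(A_inv @ Xc.T @ yc, A_inv / phi)
        # phi | beta, tau, Y ~ Gamma((n - 1 + p)/2, rate)
        resid = yc - Xc @ beta
        rate = 0.5 * (resid @ resid + beta @ (beta / tau2))
        phi = rng.gamma(0.5 * (n - 1 + p), 1.0 / rate)
        # 1/tau2_j | beta, phi ~ InvGaussian(lam / (sqrt(phi) |beta_j|), lam^2)
        mu = lam / (np.sqrt(phi) * np.abs(beta))
        tau2 = 1.0 / rng.wald(mu, lam**2)
        draws[t] = beta
    return draws
```

`numpy.random.Generator.wald` draws the inverse-Gaussian variates directly, so no custom sampler is needed for the 1/τ_j² step.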

28 Other Options
Range of other scale mixtures used:
Horseshoe (Carvalho, Polson & Scott)
Generalized Double Pareto (Armagan, Dunson & Lee)
Normal-Exponential-Gamma (Griffin & Brown)
Bridge: Power Exponential priors
Properties of the prior?

35 Horseshoe
Carvalho, Polson & Scott propose the prior distribution

β | φ ~ N(0_p, diag(τ²)/φ)
τ_j | λ ~ iid C⁺(0, λ)
λ | φ ~ C⁺(0, 1/φ)
p(α, φ) ∝ 1/φ

In the case λ = φ = 1 and with X^T X = I, let y* = X^T Y. Then

E[\beta_i \mid Y] = \int_0^1 (1 - \kappa_i)\, y_i^*\; p(\kappa_i \mid Y)\, d\kappa_i = (1 - E[\kappa_i \mid y_i^*])\, y_i^*

where κ_i = 1/(1 + τ_i²) is the shrinkage factor. The half-Cauchy prior induces a Beta(1/2, 1/2) distribution on κ_i a priori.
42 Horseshoe
[figure: Beta(1/2, 1/2) density of the shrinkage factor κ, U-shaped with infinite spikes at 0 and 1]
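The Beta(1/2, 1/2) claim is easy to verify by simulation: with τ_i ~ C⁺(0, 1), the shrinkage factor κ_i = 1/(1 + τ_i²) follows the arcsine law, piling mass near 0 (signals survive almost unshrunk) and near 1 (noise is shrunk essentially to zero). A sketch (numpy only; illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500_000
tau = np.abs(rng.standard_cauchy(m))     # half-Cauchy C+(0, 1)
kappa = 1.0 / (1.0 + tau**2)             # shrinkage factor

# Beta(1/2, 1/2) has mean 1/2 and variance 1/8
print(kappa.mean(), kappa.var())
# Its CDF is P(kappa <= x) = (2/pi) arcsin(sqrt(x))
x = 0.3
print(np.mean(kappa <= x), 2.0 / np.pi * np.arcsin(np.sqrt(x)))
```

The U-shape is exactly the "horseshoe" of the method's name, and it is what distinguishes this prior from the lasso's double exponential, whose induced shrinkage profile has no spike at κ = 0.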

43 Simulation Study with Diabetes Data
[table: RMSE for OLS, LASSO, and HORSESHOE; numeric values lost in transcription]

44 Other Options
Range of other scale mixtures used:
Generalized Double Pareto (Armagan, Dunson & Lee): if λ ~ Gamma(α, η), then β_j ~ GDP(ξ = η/α, α), with density

f(\beta_j) = \frac{1}{2\xi} \left(1 + \frac{|\beta_j|}{\xi \alpha}\right)^{-(1+\alpha)}

Normal-Exponential-Gamma (Griffin & Brown 2005): λ² ~ Gamma(α, η)
Bridge: Power Exponential priors (stable mixing density)
See the monomvn package on CRAN.
Choice of prior? Properties? Fan & Li (JASA 2001) discuss variable selection via nonconcave penalties and oracle properties.
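As a sanity check on the GDP density above, it is a proper density (it integrates to 1) with polynomial, Student-t-like tails, which is what gives the prior its robustness to large signals. A numerical sketch (numpy only; the truncation range is an assumption that works because α > 1 tails decay fast enough):

```python
import numpy as np

def gdp_pdf(beta, xi, alpha):
    """GDP(xi, alpha) density: f(beta) = 1/(2 xi) (1 + |beta|/(xi alpha))^-(1+alpha)."""
    return (1.0 / (2.0 * xi)) * (1.0 + np.abs(beta) / (xi * alpha)) ** (-(1.0 + alpha))

# Riemann-sum check that the density integrates to ~1; with alpha = 3 the
# mass beyond |beta| = 3000 is on the order of 1e-10
beta = np.linspace(-3000.0, 3000.0, 2_000_001)
dx = beta[1] - beta[0]
total = gdp_pdf(beta, xi=1.0, alpha=3.0).sum() * dx
print(total)   # close to 1
```

Smaller α gives heavier tails (slower polynomial decay), so α trades off shrinkage of noise against bias on large coefficients, the tension Fan & Li's oracle discussion formalizes.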

52 Choice of Estimator & Selection?
Posterior mode (may set some coefficients to zero)
Posterior mean (no selection)
The Bayesian posterior does not assign any probability to β_j = 0, so selection based on the posterior mode is an ad hoc rule (e.g., select if κ_i < 0.5).
See the article by Datta & Ghosh: http://ba.stat.cmu.edu/journal/forthcoming/datta.pdf
Selection can be solved as a post-analysis decision problem.
Alternatively, make selection part of model uncertainty: add prior probability that β_j = 0 and combine with the decision problem.


More information

Lecture 5: Soft-Thresholding and Lasso

Lecture 5: Soft-Thresholding and Lasso High Dimensional Data and Statistical Learning Lecture 5: Soft-Thresholding and Lasso Weixing Song Department of Statistics Kansas State University Weixing Song STAT 905 October 23, 2014 1/54 Outline Penalized

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Hierarchical Bayesian Modeling with Approximate Bayesian Computation

Hierarchical Bayesian Modeling with Approximate Bayesian Computation Hierarchical Bayesian Modeling with Approximate Bayesian Computation Jessi Cisewski Carnegie Mellon University June 2014 Goal of lecture Bring the ideas of Hierarchical Bayesian Modeling (HBM), Approximate

More information

MSG500/MVE190 Linear Models - Lecture 15

MSG500/MVE190 Linear Models - Lecture 15 MSG500/MVE190 Linear Models - Lecture 15 Rebecka Jörnsten Mathematical Statistics University of Gothenburg/Chalmers University of Technology December 13, 2012 1 Regularized regression In ordinary least

More information

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL)

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL) GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL) Paulino Pérez 1 José Crossa 2 1 ColPos-México 2 CIMMyT-México September, 2014. SLU,Sweden GENOMIC SELECTION WORKSHOP:Hands on Practical Sessions

More information

Robust Bayesian Regression

Robust Bayesian Regression Readings: Hoff Chapter 9, West JRSSB 1984, Fúquene, Pérez & Pericchi 2015 Duke University November 17, 2016 Body Fat Data: Intervals w/ All Data Response % Body Fat and Predictor Waist Circumference 95%

More information

High-dimensional regression with unknown variance

High-dimensional regression with unknown variance High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f

More information

A Hybrid Bayesian Approach for Genome-Wide Association Studies on Related Individuals

A Hybrid Bayesian Approach for Genome-Wide Association Studies on Related Individuals Bioinformatics Advance Access published August 30 015 A Hybrid Bayesian Approach for Genome-Wide Association Studies on Related Individuals A. Yazdani 1 D. B. Dunson 1 Human Genetic Center University of

More information

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,

More information

A sparse factor analysis model for high dimensional latent spaces

A sparse factor analysis model for high dimensional latent spaces A sparse factor analysis model for high dimensional latent spaces Chuan Gao Institute of Genome Sciences & Policy Duke University cg48@duke.edu Barbara E Engelhardt Department of Biostatistics & Bioinformatics

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

Computationally Efficient MCMC for Hierarchical Bayesian Inverse Problems

Computationally Efficient MCMC for Hierarchical Bayesian Inverse Problems Computationally Efficient MCMC for Hierarchical Bayesian Inverse Problems Andrew Brown 1, Arvind Saibaba 2, Sarah Vallélian 2,3 SAMSI January 28, 2017 Supported by SAMSI Visiting Research Fellowship NSF

More information

A Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)

A Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013) A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical

More information

Sparse additive Gaussian process with soft interactions. Title

Sparse additive Gaussian process with soft interactions. Title Sparse additive Gaussian process with soft interactions arxiv:1607.02670v1 [stat.ml] 9 Jul 2016 Garret Vo Department of Industrial and Manufacturing Engineering, Florida State University, Tallahassee,

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

IEOR165 Discussion Week 5

IEOR165 Discussion Week 5 IEOR165 Discussion Week 5 Sheng Liu University of California, Berkeley Feb 19, 2016 Outline 1 1st Homework 2 Revisit Maximum A Posterior 3 Regularization IEOR165 Discussion Sheng Liu 2 About 1st Homework

More information

Scalable MCMC for the horseshoe prior

Scalable MCMC for the horseshoe prior Scalable MCMC for the horseshoe prior Anirban Bhattacharya Department of Statistics, Texas A&M University Joint work with James Johndrow and Paolo Orenstein September 7, 2018 Cornell Day of Statistics

More information

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)

More information

The RXshrink Package

The RXshrink Package The RXshrink Package October 20, 2005 Title Maximum Likelihood Shrinkage via Ridge or Least Angle Regression Version 1.0-2 Date 2005-10-19 Author Maintainer Depends R (>= 1.8.0), lars Identify and display

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Weakly informative priors

Weakly informative priors Department of Statistics and Department of Political Science Columbia University 23 Apr 2014 Collaborators (in order of appearance): Gary King, Frederic Bois, Aleks Jakulin, Vince Dorie, Sophia Rabe-Hesketh,

More information

Homework 1: Solutions

Homework 1: Solutions Homework 1: Solutions Statistics 413 Fall 2017 Data Analysis: Note: All data analysis results are provided by Michael Rodgers 1. Baseball Data: (a) What are the most important features for predicting players

More information

Least Squares Regression

Least Squares Regression E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10 COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

STAT 462-Computational Data Analysis

STAT 462-Computational Data Analysis STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods

More information

Scalable Bayesian shrinkage and uncertainty quantification for high-dimensional regression

Scalable Bayesian shrinkage and uncertainty quantification for high-dimensional regression Scalable Bayesian shrinkage and uncertainty quantification for high-dimensional regression arxiv:1509.03697v [stat.me] 19 Apr 017 Bala Rajaratnam Department of Statistics, University of California, Davis

More information

ABSTRACT. POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell.

ABSTRACT. POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell. ABSTRACT POST, JUSTIN BLAISE. Methods to Improve Prediction Accuracy under Structural Constraints. (Under the direction of Howard Bondell.) Statisticians are often faced with the difficult task of model

More information

Frequentist Accuracy of Bayesian Estimates

Frequentist Accuracy of Bayesian Estimates Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University RSS Journal Webinar Objective Bayesian Inference Probability family F = {f µ (x), µ Ω} Parameter of interest: θ = t(µ) Prior

More information