Prior Distributions for the Variable Selection Problem


Sujit K. Ghosh
Department of Statistics, North Carolina State University
ghosh@stat.ncsu.edu

Bayesian Statistics Working Group, NCSU
Disclaimer: This talk is not entirely based on my own research work.

Overview

The Variable Selection Problem (VSP)
A Bayesian Framework
Choice of Prior Distributions
Illustrative Examples
Conclusions

The Variable Selection Problem

Consider the following canonical linear model:

y = Xβ + ε,  (1)

where ε ~ N_n(0, σ²I) and β = (β_1, ..., β_p)^T (X is an n × p matrix).

Under the above model, suppose also that only an unknown subset of the coefficients β_j is nonzero. The problem of variable selection is to identify this unknown subset. Notice that the above canonical framework can be used to address many other problems of interest, including multivariate polynomial regression and nonparametric function estimation.

Suppose the true data generating process (DGP) is given by

y = X_0 β_0 + ε,  (2)

where β_0 = (β_1^0, ..., β_{p_0}^0)^T, X_0 is n × p_0 and, WLOG, X = (X_0, X_1) with p ≥ p_0 ≥ 1 (i.e., X_1 is n × (p − p_0)).

The LSE of β and σ² are given by

β̂ = (X^T X)^− X^T y,  (3)
σ̂² = y^T (I − P_X) y / (n − r),

where r = rank(X) ≤ min(n, p), P_X = X(X^T X)^− X^T is the projection matrix and (X^T X)^− is a g-inverse of X^T X. Then

Lemma: E[β̂] = ((X_0^T X_0)^− X_0^T X_0 β_0, 0)^T and E[X β̂] = X_0 β_0. Further, E[σ̂²] = σ² for any g-inverse of X^T X. In particular, if rank(X_0) = p_0, then E[β̂] = (β_0, 0)^T.
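
To see the lemma in action, here is a small NumPy sketch (the simulated design and true coefficients are made up for illustration, not taken from the talk) computing the LSE (3) with a Moore-Penrose g-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p0, p = 50, 2, 4
X0 = rng.normal(size=(n, p0))          # columns of the true DGP
X1 = rng.normal(size=(n, p - p0))      # superfluous columns
X = np.hstack([X0, X1])
beta0 = np.array([1.5, -2.0])
sigma = 1.0
y = X0 @ beta0 + sigma * rng.normal(size=n)

XtX_ginv = np.linalg.pinv(X.T @ X)     # a g-inverse of X'X (Moore-Penrose)
beta_hat = XtX_ginv @ X.T @ y          # LSE (3)
P_X = X @ XtX_ginv @ X.T               # projection onto the column space of X
r = np.linalg.matrix_rank(X)
sigma2_hat = y @ (np.eye(n) - P_X) @ y / (n - r)

print(beta_hat)       # first p0 entries estimate beta0, the rest are near 0
print(sigma2_hat)     # unbiased for sigma^2 = 1
```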

VSP, contd.

The variable selection problem is a special case of the model selection problem. Each model under consideration corresponds to a distinct subset of x_1, ..., x_p (Geweke, 1996).

The model (1) can be generalized to include discrete responses in terms of the first two moments:

E[y | X] = g(Xβ),  V[y | X] = Σ(X, β, φ),  (4)

where g(·) is a suitable link function and Σ(·) is an n × n covariance matrix which may depend on additional parameters φ.

Typically, a single model class is simply applied to all possible subsets, so that all reduced models are nested under the full model.

VSP, contd.

A common strategy for the VSP has been to select a model that minimizes a penalized sum of squares (PSS) criterion by a constrained optimization method (but why?).

More specifically, if δ = (δ_1, ..., δ_p)^T denotes the indicator of inclusion (δ_j = 1) or exclusion (δ_j = 0) of the variable x_j for j = 1, ..., p, then a PSS criterion picks a δ ∈ {0, 1}^p (and also β ∈ R^p) that minimizes

PSS(β, δ) = ||y − XD(δ)β||² / (nσ²) + J(β, δ),  (5)

where D(δ) is the diagonal matrix with diagonal δ and J(·) denotes a suitable penalty function. The choice of the penalty J(·) is crucial and can be shown to be equivalent to the choice of a prior distribution.

VSP, contd.

A number of popular criteria correspond to (5) with different choices of J(·):

J(β, δ) = λ(p, n) Σ_{j=1}^p δ_j (notice Σ_{j=1}^p δ_j = # of nonzero β_j's):
λ(p, n) = 2 yields C_p (Mallows, 1973) and AIC (Akaike, 1973);
λ(p, n) = log n yields BIC (Schwarz, 1978);
λ(p, n) = 2 log p yields RIC (Foster and George, 1994).
The C_MML criterion (George and Foster, 2000) estimates λ(p, n) by marginal maximum likelihood (MML) using an empirical Bayes framework.
J(β, δ) = 2 Σ_{j=1}^p δ_j log(p/j): Benjamini and Hochberg (1995).

Notice that none of the above penalties involves β; they are generally functions of δ (and n) only. Recent attempts have been to define penalties in terms of β:

J(β, δ) = λ Σ_{j=1}^p |β_j|^q, q ≥ 1, yields bridge regression (Frank and Friedman, 1993); only q = 1 yields a sparse solution among all q ≥ 1 (Fan and Li, 2001);
q = 2 yields ridge regression;
q = 1 yields the LASSO (Tibshirani, 1996).
J(β, δ) = λ_1 Σ_{j=1}^p |β_j| + λ_2 Σ_{j=1}^p β_j² yields the Elastic Net (Zou and Hastie, 2005).
J(β, δ) = λ_1 (Σ_{j=1}^p |β_j| + λ_2 Σ_{j<k} max{|β_j|, |β_k|}) yields OSCAR (Bondell and Reich, 2006).

Thus, a general strategy would be to define a penalty function that involves both δ and β. We will consider this as a prior: π(β, δ) ∝ exp{−J(β, δ)}.
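
To make the criteria concrete, the following sketch (Python/NumPy, with simulated data not from the talk) enumerates all 2^p subsets and scores each by RSS/σ² + λ(p, n)·Σ_j δ_j, which has the same minimizer as (5) with this J up to the 1/n scaling; σ² is treated as known for simplicity:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)   # only the first two variables are active
sigma2 = 1.0                                              # treat sigma^2 as known for the sketch

# lambda(p, n) for the three criteria mentioned on the slide
penalties = {"AIC/Cp": 2.0, "BIC": np.log(n), "RIC": 2.0 * np.log(p)}
best = {name: (np.inf, None) for name in penalties}

for delta in product([0, 1], repeat=p):                   # all 2^p inclusion patterns
    idx = [j for j in range(p) if delta[j]]
    if idx:
        bhat, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        resid = y - X[:, idx] @ bhat
    else:
        resid = y
    rss = float(resid @ resid)
    for name, lam in penalties.items():
        score = rss / sigma2 + lam * len(idx)             # RSS/sigma^2 + lambda * (# nonzero beta_j)
        if score < best[name][0]:
            best[name] = (score, delta)

for name, (score, delta) in best.items():
    print(f"{name}: selected delta = {delta}")
```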

A Bayesian Framework

The full hierarchical Bayes model:

y | β, δ, σ² ~ N_n(XD(δ)β, σ²I_n),
(β, δ) | σ² ~ π(β, δ | σ²) ∝ exp{−J(β, δ)/σ²},  (6)
σ² ~ π_0(σ²) (e.g., IG(a_0, b_0)).

Given a loss function L(θ, a), we can obtain (in theory) the Bayes estimator by minimizing the posterior expected loss

E[L(θ, a) | y, X] = ∫ L(θ, a) π(θ | y, X) dθ  (7)

with respect to a = a(y, X), where θ = (β^T, δ^T, σ²)^T.

Which prior distributions? Which loss functions? Can we even carry out the optimization for a given prior distribution and loss function?

Bayesian Framework, contd.

A purely subjective point of view on prior selection is problematic for the VSP. It is rather unrealistic to assume that uncertainty can be meaningfully described given the huge number (2^p − 1) and complexity of unknown model parameters.

A common and practical approach has been to construct a noninformative, semi-automatic formulation in this context. Roughly speaking, the goal is to specify priors that allow the posterior model probabilities to accumulate near the true model (via some form of sparseness and smoothing).

Unfortunately, there is no universally preferred method to construct such semi-automatic priors! (isn't that nice?)

Bayesian Framework, contd.

The choice of loss function, although mostly overlooked, is also crucial (different loss functions lead to different estimates).

In general, suppose the true DGP is (y, x) ~ m_0(y | x) g_0(x), and consider a model (y, x) ~ m(y | x) g(x), where m(y | x) = ∫ f(y | x, θ) π(θ) dθ with sampling density f and prior π.

The Kullback-Leibler discrepancy between the DGP and the model can be written as

K(m_0 g_0, m g) = ∫ K(m_0, m | x) g_0(x) dx + K(g_0, g),

where

K(m_0, m | x) = ∫ m_0(y | x) log [m_0(y | x) / m(y | x)] dy,  (8)

and the first term can be approximated by const. − (1/n) Σ_{i=1}^n log ( ∫ f(y_i | x_i, θ) π(θ) dθ ).

Bayesian Framework, contd.

Notice that if θ̂ denotes a MAP estimator, then

(1/n) Σ_{i=1}^n ( − log ∫ f(y_i | x_i, θ) π(θ) dθ ) ≈ (1/n) [ Σ_{i=1}^n ( − log f(y_i | x_i, θ̂) ) + ( − log π(θ̂) ) ].

When y | x follows a canonical normal linear model, the above criterion is equivalent to the PSS criterion (5). Thus J(θ) = − log π(θ) emerges as a choice of the penalty function, up to a multiplicative constant (see slide 8). Hence the choice of a penalty function is equivalent to the choice of a prior distribution (including improper distributions in some cases).
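
As a small numerical illustration of the penalty-prior correspondence (a sketch with simulated data; the ridge/normal-prior pairing below is just the simplest instance, not the talk's recommendation): under the normal linear model with a N(0, (σ²/λ)I) prior on β, the posterior mode is exactly the penalized least squares solution with J(β) ∝ λ‖β‖².

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0])
sigma2 = 1.0
y = X @ beta_true + rng.normal(size=n) * np.sqrt(sigma2)

lam = 5.0   # penalty weight; corresponds to the prior beta ~ N(0, (sigma^2/lam) I)

# (a) penalized least squares: argmin ||y - X b||^2 + lam * ||b||^2
beta_pls = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# (b) the negative log posterior under the normal prior; -log prior plays the role of J(beta)
def neg_log_posterior(b):
    return ((y - X @ b) @ (y - X @ b) + lam * b @ b) / (2.0 * sigma2)

print(beta_pls)
print(neg_log_posterior(beta_pls) <= neg_log_posterior(beta_pls + 0.05))  # MAP coincides with the PSS minimizer
```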

Choice of Prior Distributions

We are not generally confident about any given set of predictors, and hence little prior information on D(δ)β and σ² can be expected for each δ. For each δ it is desirable to have some default priors for D(δ)β and σ². Unfortunately, default priors for normal linear models are generally improper.

Non-objective (conjugate) priors for β are typically centered at 0, making the model with no predictors the null model within a hypothesis-testing setup.

The goal is to select a prior (and hence a penalty function) that is criterion-based and fully automatic. More generally, we can think of constructing priors (and hence penalties) that may also depend on the design matrix X.

Zellner's prior: β | δ, σ², X ~ N_p(β_0, (σ²/g)(X^T X)^−), σ² ~ IG(a_0, b_0) and δ = 1 with probability 1. Here β_0, g, a_0 and b_0 need to be specified by the user (or estimated using either an EB or an HB procedure).

Extensions of Zellner priors: β | δ, σ², X ~ N_p(β_0, (σ²/g)(X(δ)^T X(δ))^−), σ² ~ IG(a_0, b_0) and δ ~ Unif({0, 1}^p), where X(δ) = XD(δ). Almost the same as above, but with δ ~ q^{Σ_j δ_j} (1 − q)^{p − Σ_j δ_j}.

The advantage of Zellner-type priors is the closed form, suitable for rapid computations over the large parameter space for δ (see the sketch below for an illustration).

In general we may consider the following independence prior:

π_I(δ) = Π_{j=1}^p q_j^{δ_j} (1 − q_j)^{1 − δ_j},  (9)

where q_j = Pr[δ_j = 1] is the inclusion probability of the j-th variable. A small q_j can be used to downweight the j-th variable. Notice that when q_j ≡ 0.5, models of size near p/2 get more weight.

Alternatively, assuming q_j = q for all j, one may place a prior q ~ Beta(a, b) to obtain the exchangeable prior

π_E(δ) = B(a + Σ_j δ_j, b + p − Σ_j δ_j) / B(a, b),

where B(a, b) is the Beta function. Notice that the components of δ are exchangeable but not independent under this prior.

Independence and exchangeable priors on δ may be less satisfactory when the models contain dependent components (e.g., interactions, polynomials, lagged or indicator variables). Consider 3 variables with main effects x_1, x_2, x_3 and three two-factor interactions x_1x_2, x_2x_3, x_1x_3. The importance of an interaction such as x_1x_2 will often depend on whether the main effects x_1 and x_2 are included in the model.

This can be expressed by a prior for δ = (δ_1, δ_2, δ_3, δ_12, δ_13, δ_23) of the form

π(δ) = Π_{j=1}^{3} π(δ_j) Π_{j<k} π(δ_{jk} | δ_j, δ_k),

where π(δ_{jk} | δ_j, δ_k) requires specifying four probabilities, one for each pair (δ_j, δ_k). E.g.,

π(δ_12 | 0, 0) < π(δ_12 | 0, 1), π(δ_12 | 1, 0) < π(δ_12 | 1, 1).
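
A sketch of the "closed form" remark above: combining the independence prior (9) on δ with the marginal likelihood of each submodel under a Zellner-type g-prior yields posterior model probabilities by direct enumeration. The marginal used below is the standard closed form for the variant with an intercept, p(α, σ²) ∝ 1/σ² and β_δ | σ² ~ N(0, gσ²(X_δ^T X_δ)^{-1}) (as in, e.g., Liang et al., 2008); the parametrization on the slide may differ, and the data and the choices of g and q are illustrative only.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n, p, g, q = 50, 4, 50.0, 0.5          # g and the inclusion probability q are illustrative choices
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

yc = y - y.mean()                      # center so the intercept is handled implicitly
Xc = X - X.mean(axis=0)
tss = float(yc @ yc)

def log_marginal(idx):
    """Log marginal likelihood of model delta, up to a constant common to all models."""
    k = len(idx)
    if k == 0:
        return 0.0                     # intercept-only (null) model is the reference
    b, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
    rss = float((yc - Xc[:, idx] @ b) @ (yc - Xc[:, idx] @ b))
    r2 = 1.0 - rss / tss
    return 0.5 * (n - 1 - k) * np.log(1 + g) - 0.5 * (n - 1) * np.log(1 + g * (1 - r2))

models, logpost = [], []
for delta in product([0, 1], repeat=p):
    idx = [j for j in range(p) if delta[j]]
    log_prior = sum(delta) * np.log(q) + (p - sum(delta)) * np.log(1 - q)   # independence prior (9)
    models.append(delta)
    logpost.append(log_marginal(idx) + log_prior)

logpost = np.array(logpost)
post = np.exp(logpost - logpost.max())
post /= post.sum()
for delta, pr in sorted(zip(models, post), key=lambda t: -t[1])[:5]:
    print(delta, round(float(pr), 4))
```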

The number of possible models grows exponentially as the number of interactions, polynomials and lagged variables increases. In contrast to independence priors of the form (9), priors for models with dependent components concentrate mass on plausible models, which matters when the number of possible models is huge. This can be crucial in applications such as screening designs, where p >> n (see Chipman, Hamada and Wu, 1997).

Another limitation of independence priors on δ is their failure to account for covariate collinearity. This problem can be resolved by using the so-called dilution priors (George, 1999). A general form of dilution prior can be written as

π_D(δ) = h(det(X(δ)^T X(δ))) π_I(δ)

(a small numerical sketch is given below).

Having little prior information on the variables, objective model selection methods are necessary.

Spiegelhalter and Smith (1982): improper priors — used conventional improper priors for β and a pseudo-Bayes factor for inference.

Mitchell and Beauchamp (1988): spike-and-slab priors — β_j | δ_j ~ (1 − δ_j) δ_0 + δ_j Unif(−a_j, a_j); the variable selection problem is solved as an estimation problem.

Berger and Pericchi (1996): intrinsic priors — developed a fully automatic prior and used the intrinsic Bayes factor for inference, based on a model-encompassing approach.

Yuan and Lin (2006) have recently proposed the use of the following dilution prior:

π(δ) ∝ q^{Σ_j δ_j} (1 − q)^{p − Σ_j δ_j} det(X(δ)^T X(δ)).

The main idea behind this prior is to replace a set of highly correlated variables by one variable from that set.

Suppose β_j | δ_j ~ (1 − δ_j) δ_0 + δ_j DE(0, τ), where DE(0, τ) denotes the double-exponential distribution with density (τ/2) exp{−τ|β|} and δ_0 is the distribution with point mass at 0. Yuan and Lin (2005) have shown that if one sets q = (1 + τσ √(π/2))^{−1}, then the model with the highest posterior probability is approximately equivalent to the LASSO with λ = 2σ²τ. A Gibbs sampling scheme is also presented (seminar on Oct 31st!).

Another recent attempt to construct automatic priors has been made by Casella and Moreno (2006). Their proposed methodology is:

Criterion based: provides a clear understanding of the properties of the selected models.
Automatic: no tuning parameter (hyperparameter) selection is required.
Formally carries out the hypothesis tests H_0: δ = δ* vs. H_a: δ = 1_p, where δ* ∈ {0, 1}^p but δ* ≠ 1_p, i.e., it tests the null hypothesis of a reduced model versus the full model (this is in sharp contrast to other conjugate-prior approaches).
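
The numerical sketch promised above for dilution priors (simulated, deliberately collinear design; taking h to be the identity, as in the Yuan-Lin form): the determinant factor pulls prior mass away from models that contain a set of nearly redundant columns.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n, p, q = 50, 3, 0.5
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.05 * rng.normal(size=n), rng.normal(size=n)])  # x1 and x2 nearly collinear
X = X / np.linalg.norm(X, axis=0)        # unit-norm columns so determinants are comparable

def prior_mass(dilute):
    mass = {}
    for delta in product([0, 1], repeat=p):
        idx = [j for j in range(p) if delta[j]]
        w = q ** sum(delta) * (1 - q) ** (p - sum(delta))   # independence prior pi_I
        if dilute and idx:
            Xd = X[:, idx]
            w *= np.linalg.det(Xd.T @ Xd)                   # dilution factor with h = identity
        mass[delta] = w
    tot = sum(mass.values())
    return {d: w / tot for d, w in mass.items()}

pi_I = prior_mass(dilute=False)
pi_D = prior_mass(dilute=True)
both = [d for d in pi_I if d[0] == 1 and d[1] == 1]         # models containing both collinear columns
print("pi_I mass on models with x1 and x2:", round(sum(pi_I[d] for d in both), 3))
print("pi_D mass on models with x1 and x2:", round(sum(pi_D[d] for d in both), 3))
```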

The test of hypothesis is carried out using posterior model probabilities:

Pr[δ | y, X] = m(y | δ, X) / [ m(y | 1_p, X) + Σ_{δ' ≠ 1_p} m(y | δ', X) ]
            = BF_{δ,1_p}(y, X) / [ 1 + Σ_{δ' ≠ 1_p} BF_{δ',1_p}(y, X) ],

where BF_{δ,1_p}(y, X) denotes the Bayes factor for testing model δ against the full model 1_p. The fact that every posterior model probability has the same denominator facilitates rapid computation. The use of intrinsic priors overcomes the problem of using improper priors when computing the Bayes factors.

Illustrative Examples

A simulation study adapted from Casella and Moreno (2006):

Full model: y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1² + β_4 x_2² + ε, where ε ~ N(0, σ²), and

True DGP1: y = 1 + x_1 + ε, with x_1, x_2 ~ Unif(0, 10), σ = 2.
True DGP2: y = 1 + x_1 + 2x_2 + ε, with x_1, x_2 ~ Unif(0, 10), σ = 2.

A sample of n = 10 is generated and posterior model probabilities are computed for all 2^4 = 16 models. The procedure was repeated 1000 times and compared with Mallows' C_p.

Examples, contd.

Consider the ancient Hald data (Casella and Moreno, 2006), which measure the effect of heat on the composition of cement. n = 13 observations on heat (y) and four cement compositions (x_j, j = 1, ..., 4) are available (2^4 = 16 models). Historically, it is known that the subset {x_1, x_2} is most preferred by earlier analyses.

[Table: posterior model probabilities Pr[δ | y, X] for the eight subsets with the largest posterior mass, including {x_1, x_2}, {x_1, x_4}, {x_1, x_2, x_3}, {x_1, x_2, x_4}, {x_1, x_3, x_4}, {x_2, x_3, x_4}, {x_1, x_2, x_3, x_4} and {x_3, x_4}; the numerical values are not recoverable from the transcription.]

All other models have posterior probabilities < 10^{-5}.
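
For reference, a sketch of the frequentist benchmark mentioned in the simulation study: data are generated from DGP1 and all 2^4 = 16 submodels of the full model are ranked by Mallows' C_p (the intrinsic-prior posterior probabilities themselves require the Casella-Moreno machinery and are not reproduced here).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
n = 10
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = 1.0 + x1 + 2.0 * rng.normal(size=n)            # DGP1: y = 1 + x1 + eps, sigma = 2

cols = {"x1": x1, "x2": x2, "x1^2": x1 ** 2, "x2^2": x2 ** 2}
names = list(cols)
Z = np.column_stack([cols[c] for c in names])

# MSE of the full model (intercept + all four predictors)
Xfull = np.column_stack([np.ones(n), Z])
rss_full = float(np.sum((y - Xfull @ np.linalg.lstsq(Xfull, y, rcond=None)[0]) ** 2))
s2 = rss_full / (n - Xfull.shape[1])

results = []
for delta in product([0, 1], repeat=4):
    idx = [j for j in range(4) if delta[j]]
    Xd = np.column_stack([np.ones(n)] + [Z[:, j] for j in idx])
    rss = float(np.sum((y - Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]) ** 2))
    cp = rss / s2 - n + 2 * (len(idx) + 1)         # Mallows' Cp, counting the intercept
    results.append((cp, [names[j] for j in idx]))

for cp, model in sorted(results)[:5]:
    print(round(cp, 2), model or ["intercept only"])
```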

Examples, contd.

Based on R², Draper and Smith (1981, Sec. 6.1) also concluded in favour of the top two models, with a preference for {x_1, x_4} since x_4 is the single best predictor. Although {x_1, x_2, x_4} had a high R², the variable x_4 was excluded because x_2 and x_4 are highly correlated.

Interestingly, George and McCulloch (1993) analyzed these data and favored the model with no predictors (δ = 0_4), followed by the model with one predictor. The stochastic search algorithm of George and McCulloch (1992) visited the model {x_1, x_2} less than 7% of the time! This could be because the model with no predictors is taken as the null model in all comparisons. Their methods were sensitive to the choice of priors for β and δ.

Examples, contd.

C & M (2006) also considered the 10-predictor model

y = β_0 + Σ_{j=1}^{3} β_j x_j + Σ_{j=1}^{3} η_j x_j² + Σ_{j<k} η_{jk} x_j x_k + η_{123} x_1 x_2 x_3 + ε.

The true DGP2 was used to simulate the data, and a stochastic search with the intrinsic prior was used to estimate posterior model probabilities. A total of 10^4 MCMC samples were generated. Exact posterior model probabilities for all 2^10 = 1,024 models were also computed. The entire procedure was repeated 1,000 times with n = 10. Two values σ = 2, 5 were used.

Examples, contd.

[Table: exact posterior model probabilities Pr[δ | y] and numbers of stochastic-search (SSVS) visits for the leading models under σ = 2 and σ = 5; the numerical entries are not recoverable from the transcription.]

Conclusions

Variable selection can be considered as a multiple testing problem in which we test whether any reduction in complexity of the full model is plausible.

Default priors typically used for model parameters are improper, and thus they are not suitable for computing posterior model probabilities.

The commonly used vague priors (as limits of conjugate priors) are typically an ill-defined notion.

Intrinsic priors are well defined, depend on the sampling density and do not require the choice of tuning parameters. The intrinsic prior for the full-model parameters is centered at the reduced model and has heavy tails.

Conclusions, contd.

The role of SSVS is different from that of estimating a posterior distribution: the goal is to find good models rather than to estimate the modes accurately. However, determining how many MCMC runs to carry out is a complex issue. Rigorous evaluation of SSVS in terms of convergence and mixing is very difficult and might be worth more exploration.

Open problems:
Given two priors (or, equivalently, penalty functions), how would one rigorously choose a model/method for the VSP?
Can the computational cost be factored into the loss function?

THANKS!

All references mentioned in this talk, and many more, are available online.
