Horseshoe, Lasso and Related Shrinkage Methods


Readings: Chapter 15, Christensen
Merlise Clyde
October 15, 2015

Bayesian Lasso

Park & Casella (JASA 2008) and Hans (Biometrika 2010) propose Bayesian versions of the Lasso:

$$Y \mid \alpha, \beta^s, \phi \sim N(1_n \alpha + X^s \beta^s, \; I_n/\phi)$$

$$\beta^s \mid \alpha, \phi, \tau, \lambda \sim N(0, \; \mathrm{diag}(\tau^2)/\phi)$$

$$\tau_1^2, \ldots, \tau_p^2 \mid \alpha, \phi, \lambda \overset{iid}{\sim} \mathrm{Exp}(\lambda^2/2)$$

$$p(\alpha, \phi) \propto 1/\phi$$

Can show that $\beta_j \mid \phi, \lambda \overset{iid}{\sim} DE(\lambda \sqrt{\phi})$:

$$\int_0^\infty \sqrt{\frac{\phi}{2\pi s}} \, e^{-\frac{1}{2} \phi \frac{\beta^2}{s}} \, \frac{\lambda^2}{2} \, e^{-\frac{\lambda^2 s}{2}} \, ds = \frac{\lambda \phi^{1/2}}{2} \, e^{-\lambda \phi^{1/2} |\beta|}$$

Scale Mixture of Normals (Andrews and Mallows 1974)
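The scale-mixture identity on this slide can be checked numerically. The sketch below is an illustration, not part of the original slides: it assumes the normal density $N(0, s/\phi)$ includes its $\sqrt{\phi}$ normalizing factor, integrates over $s$ with the substitution $s = e^u$ and a midpoint rule, and compares the result with the double-exponential density. The function names and the integration limits are choices made here.

```python
import math


def de_density(beta, lam, phi):
    """Double-exponential (Laplace) density with rate lam * sqrt(phi)."""
    rate = lam * math.sqrt(phi)
    return 0.5 * rate * math.exp(-rate * abs(beta))


def mixture_density(beta, lam, phi, n_steps=20000, lo=-40.0, hi=10.0):
    """Integrate N(beta | 0, s/phi) * Exp(s | lam^2 / 2) over s > 0.

    Uses the substitution s = exp(u) and a midpoint rule for u in [lo, hi].
    """
    h = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps):
        u = lo + (k + 0.5) * h
        s = math.exp(u)
        normal = math.sqrt(phi / (2.0 * math.pi * s)) * math.exp(
            -0.5 * phi * beta * beta / s
        )
        expo = 0.5 * lam * lam * math.exp(-0.5 * lam * lam * s)
        total += normal * expo * s  # the extra factor s is the Jacobian ds = s du
    return total * h


if __name__ == "__main__":
    for b in (0.2, 0.7, 2.0):
        print(b, mixture_density(b, 1.5, 2.0), de_density(b, 1.5, 2.0))
```

The two columns printed for each value of beta should agree closely, which is what the Andrews and Mallows representation asserts.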

Gibbs Sampling

Prior: $\lambda^2 \sim \mathrm{Gamma}(r, \delta)$

Integrate out $\alpha$: $\alpha \mid Y, \phi \sim N(\bar{y}, \; 1/(n\phi))$

Full Conditionals:

$$\beta^s \mid \tau, \phi, \lambda, Y \sim N(\,\cdot\,, \,\cdot\,)$$

$$\phi \mid \tau, \beta^s, \lambda, Y \sim G(\,\cdot\,, \,\cdot\,)$$

$$\lambda^2 \mid \beta^s, \phi, \tau^2, Y \sim G(\,\cdot\,, \,\cdot\,)$$

$$1/\tau_j^2 \mid \beta^s, \phi, \lambda, Y \sim \mathrm{InvGaussian}(\,\cdot\,, \,\cdot\,)$$

$X \sim \mathrm{InvGaussian}(\mu, \lambda)$:

$$f(x) = \sqrt{\frac{\lambda^2}{2\pi}} \, x^{-3/2} \, e^{-\frac{1}{2} \frac{\lambda^2 (x - \mu)^2}{\mu^2 x}}, \quad x > 0$$

Homework for next week: derive the full conditionals for $\beta^s$, $\phi$, $1/\tau^2$; see
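The update for $1/\tau_j^2$ needs draws from an inverse-Gaussian distribution. The following is a minimal sketch of one standard way to generate them, the transformation method of Michael, Schucany & Haas (1976); it is not taken from the slides. The names `inv_gaussian_pdf` and `sample_inv_gaussian` are hypothetical, and `shape` plays the role of $\lambda^2$ in the density above. The parameters of the full conditional itself are left to the homework.

```python
import math
import random


def inv_gaussian_pdf(x, mu, shape):
    """Inverse-Gaussian density with mean mu and shape parameter `shape`."""
    if x <= 0:
        return 0.0
    return math.sqrt(shape / (2.0 * math.pi)) * x ** -1.5 * math.exp(
        -0.5 * shape * (x - mu) ** 2 / (mu * mu * x)
    )


def sample_inv_gaussian(mu, shape, rng):
    """Draw one InvGaussian(mu, shape) variate by the transformation method."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    # Smaller root of the quadratic that links a chi-square(1) draw to x.
    x = mu + mu * mu * y / (2.0 * shape) - (mu / (2.0 * shape)) * math.sqrt(
        4.0 * mu * shape * y + mu * mu * y * y
    )
    # Keep the smaller root with probability mu / (mu + x), else use mu^2 / x.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_inv_gaussian(2.0, 3.0, rng) for _ in range(20000)]
    print("sample mean:", sum(draws) / len(draws), "(theoretical mean: 2.0)")
```

Inside a Gibbs sweep, a call like this would be made once per coefficient, with `mu` and `shape` set from the current values of the other parameters.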

Horseshoe

Carvalho, Polson & Scott propose the following.

Prior distribution on $\beta^s$:

$$\beta^s \mid \phi, \tau \sim N\!\left(0_p, \; \frac{\mathrm{diag}(\tau^2)}{\phi}\right)$$

$$\tau_j \mid \lambda \overset{iid}{\sim} C^+(0, \lambda^2) \quad \text{(notation differs from CPS)}$$

$$\lambda \sim C^+(0, 1)$$

$$p(\alpha, \phi) \propto 1/\phi$$

In the case $\lambda = \phi = 1$ and with canonical representation $Y = I\beta + \epsilon$:

$$E[\beta_i \mid Y] = \int_0^1 (1 - \kappa_i) \, y_i \, p(\kappa_i \mid Y) \, d\kappa_i = (1 - E[\kappa_i \mid y_i]) \, y_i$$

where $\kappa_i = 1/(1 + \tau_i^2)$ is the shrinkage factor.

Half-Cauchy prior induces a Beta(1/2, 1/2) distribution on $\kappa_i$ a priori.
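The claim that a half-Cauchy prior on $\tau_i$ induces a Beta(1/2, 1/2) prior on $\kappa_i$ can be checked by simulation. This is a small sketch under the slide's $\lambda = 1$ case; the helper names are hypothetical. It draws $\tau$ by inverting the half-Cauchy CDF and compares the empirical CDF of $\kappa$ with the Beta(1/2, 1/2) (arcsine) CDF.

```python
import math
import random


def sample_kappa(rng):
    """Draw tau ~ C+(0, 1) by inverse CDF and return kappa = 1 / (1 + tau^2)."""
    tau = math.tan(0.5 * math.pi * rng.random())
    return 1.0 / (1.0 + tau * tau)


def beta_half_half_cdf(x):
    """CDF of Beta(1/2, 1/2), the arcsine law: (2 / pi) * asin(sqrt(x))."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))


def empirical_cdf(draws, x):
    """Fraction of draws that are less than or equal to x."""
    return sum(1 for d in draws if d <= x) / len(draws)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_kappa(rng) for _ in range(50000)]
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, empirical_cdf(draws, x), beta_half_half_cdf(x))
```

The mass piles up near $\kappa = 0$ (no shrinkage) and $\kappa = 1$ (total shrinkage), which is the horseshoe shape that gives the prior its name.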

Horseshoe

[Figure: Beta(1/2, 1/2) density plotted against x]

Prior Comparison (from PSC)

Bounded Influence

Normal means case: $Y_i \mid \beta_i \overset{ind}{\sim} N(\beta_i, 1)$ (equivalent to the canonical case)

Posterior mean:

$$E[\beta \mid y] = y + \frac{d}{dy} \log m(y)$$

where $m(y)$ is the predictive density under the prior (known $\lambda$).

HS has bounded influence:

$$\lim_{|y| \to \infty} \frac{d}{dy} \log m(y) = 0$$

so $E[\beta \mid y] \to y$ (the MLE) as $|y| \to \infty$.

DE is also bounded influence, but the bound does not decay to zero in the tails.
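The contrast between the two priors can be seen by evaluating $\frac{d}{dy} \log m(y)$ directly. The sketch below is an illustration under stated assumptions, not code from the slides: the horseshoe case takes $\lambda = 1$ and integrates over $\tau = \tan\theta$ with a midpoint rule, and the double-exponential case uses a closed form for the normal-Laplace convolution derived here with an assumed rate `a`. Function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hs_score(y, n_steps=20000):
    """d/dy log m(y) under the horseshoe prior with lambda = 1.

    With tau = tan(theta), kappa = cos(theta)^2 and the half-Cauchy prior on
    tau is uniform in theta on (0, pi/2). The score equals -y * E[kappa | y].
    """
    h = 0.5 * math.pi / n_steps
    num = den = 0.0
    for k in range(n_steps):
        c = math.cos((k + 0.5) * h)
        w = c * math.exp(-0.5 * y * y * c * c)  # proportional to N(y | 0, 1 + tau^2)
        num += c * c * w
        den += w
    return -y * num / den


def de_score(y, a=1.0):
    """d/dy log m(y) under a double-exponential prior with rate a."""
    right = math.exp(-a * y) * norm_cdf(y - a)  # contribution of beta > 0
    left = math.exp(a * y) * norm_cdf(-y - a)   # contribution of beta < 0
    return a * (left - right) / (left + right)


if __name__ == "__main__":
    for y in (1.0, 3.0, 10.0, 20.0):
        print(y, "HS mean:", y + hs_score(y), "DE mean:", y + de_score(y))
```

For large $y$ the horseshoe posterior mean approaches $y$, while the double-exponential posterior mean stays roughly a constant `a` below it.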

R packages

The monomvn package in R includes the functions blasso and bhs.
See the Diabetes.R code.

Simulation Study with Diabetes Data

[Figure: RMSE for OLS, LASSO, and HORSESHOE]

Other Options

Range of other scale mixtures used:

Generalized Double Pareto (Armagan, Dunson & Lee): $\lambda \sim \mathrm{Gamma}(\alpha, \eta)$, then $\beta_j^s \sim GDP(\xi = \eta/\alpha, \alpha)$

$$f(\beta_j^s) = \frac{1}{2\xi} \left(1 + \frac{|\beta_j^s|}{\xi \alpha}\right)^{-(1 + \alpha)}$$

see

Normal-Exponential-Gamma (Griffin & Brown 2005): $\lambda^2 \sim \mathrm{Gamma}(\alpha, \eta)$

Bridge: Power Exponential Priors (Stable mixing density)

See the monomvn package on CRAN.

Choice of prior? Properties? Fan & Li (JASA 2001) discuss variable selection via nonconcave penalties and oracle properties.
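The Generalized Double Pareto density can be recovered by mixing a double-exponential density over its rate. The derivation below is a sketch under two assumptions that the slide does not spell out: $\beta \mid \lambda$ has density $\frac{\lambda}{2} e^{-\lambda|\beta|}$, and $\mathrm{Gamma}(\alpha, \eta)$ uses the shape-rate parameterization.

```latex
\begin{aligned}
f(\beta)
  &= \int_0^\infty \frac{\lambda}{2} e^{-\lambda |\beta|}
     \, \frac{\eta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\eta \lambda} \, d\lambda \\
  &= \frac{\eta^{\alpha}}{2 \, \Gamma(\alpha)}
     \cdot \frac{\Gamma(\alpha + 1)}{(\eta + |\beta|)^{\alpha + 1}} \\
  &= \frac{\alpha}{2 \eta} \left(1 + \frac{|\beta|}{\eta}\right)^{-(1 + \alpha)}
   = \frac{1}{2 \xi} \left(1 + \frac{|\beta|}{\xi \alpha}\right)^{-(1 + \alpha)},
   \qquad \xi = \eta / \alpha .
\end{aligned}
```

The tails are polynomial rather than exponential, so large coefficients are shrunk less than under the double-exponential prior.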

Choice of Estimator & Selection?

Posterior Mode (may set some coefficients to zero)

Posterior Mean (no selection, just shrinkage)

Bayesian Posterior does not assign any probability to $\beta_j^s = 0$

Selection based on posterior mode

Ad hoc rule: select if $\kappa_i < 0.5$

See article by Datta & Ghosh: http://ba.stat.cmu.edu/journal/forthcoming/datta.pdf

Selection solved as a post-analysis decision problem

Selection as part of model uncertainty: add prior probability that $\beta_j^s = 0$ and combine with decision problem

Remember all models are wrong, but some may be useful!
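The ad hoc rule can be sketched in the normal means case. The code below assumes the rule is applied to the posterior mean $E[\kappa_i \mid y_i]$ with $\lambda = \phi = 1$; that reading, the helper names, and the example data are assumptions made for illustration, not details given on the slide.

```python
import math


def posterior_mean_kappa(y, n_steps=20000):
    """E[kappa | y] for y | beta ~ N(beta, 1), beta | tau ~ N(0, tau^2), tau ~ C+(0, 1)."""
    h = 0.5 * math.pi / n_steps
    num = den = 0.0
    for k in range(n_steps):
        c = math.cos((k + 0.5) * h)             # c^2 = kappa = 1 / (1 + tau^2)
        w = c * math.exp(-0.5 * y * y * c * c)  # marginal likelihood of y given tau
        num += c * c * w
        den += w
    return num / den


def select(ys, threshold=0.5):
    """Indices i whose posterior mean shrinkage factor is below the threshold."""
    return [i for i, y in enumerate(ys) if posterior_mean_kappa(y) < threshold]


if __name__ == "__main__":
    ys = [0.0, 0.5, 4.0, 10.0]  # made-up observations
    for y in ys:
        print(y, posterior_mean_kappa(y))
    print("selected indices:", select(ys))
```

Observations near zero keep a shrinkage factor above 0.5 and are dropped, while large observations are barely shrunk and are selected.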

Lasso & Bayesian Lasso

Lasso & Bayesian Lasso Readings Chapter 15 Christensen Merlise Clyde October 6, 2015 Lasso Tibshirani (JRSS B 1996) proposed estimating coefficients through L 1 constrained least squares Least Absolute Shrinkage and Selection

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Bayesian Sparse Linear Regression with Unknown Symmetric Error Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of

More information

Bayesian shrinkage approach in variable selection for mixed

Bayesian shrinkage approach in variable selection for mixed Bayesian shrinkage approach in variable selection for mixed effects s GGI Statistics Conference, Florence, 2015 Bayesian Variable Selection June 22-26, 2015 Outline 1 Introduction 2 3 4 Outline Introduction

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,

More information

Hierarchical sparsity priors for regression models

Hierarchical sparsity priors for regression models Hierarchical sparsity priors for regression models arxiv:37.53v [stat.me] Jul 4 J.E. Griffin and P.J. Brown School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury CT 7NF,

More information

Partial factor modeling: predictor-dependent shrinkage for linear regression

Partial factor modeling: predictor-dependent shrinkage for linear regression modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework

More information

Efficient sampling for Gaussian linear regression with arbitrary priors

Efficient sampling for Gaussian linear regression with arbitrary priors Efficient sampling for Gaussian linear regression with arbitrary priors P. Richard Hahn Arizona State University and Jingyu He Booth School of Business, University of Chicago and Hedibert F. Lopes INSPER

More information

On the Half-Cauchy Prior for a Global Scale Parameter

On the Half-Cauchy Prior for a Global Scale Parameter Bayesian Analysis (2012) 7, Number 2, pp. 1 16 On the Half-Cauchy Prior for a Global Scale Parameter Nicholas G. Polson and James G. Scott Abstract. This paper argues that the half-cauchy distribution

More information

Integrated Anlaysis of Genomics Data

Integrated Anlaysis of Genomics Data Integrated Anlaysis of Genomics Data Elizabeth Jennings July 3, 01 Abstract In this project, we integrate data from several genomic platforms in a model that incorporates the biological relationships between

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

arxiv: v2 [stat.me] 27 Oct 2012

arxiv: v2 [stat.me] 27 Oct 2012 The Bayesian Bridge Nicholas G. Polson University of Chicago arxiv:119.2279v2 [stat.me] 27 Oct 212 James G. Scott Jesse Windle University of Texas at Austin First Version: July 211 This Version: October

More information

On the half-cauchy prior for a global scale parameter

On the half-cauchy prior for a global scale parameter On the half-cauchy prior for a global scale parameter Nicholas G. Polson University of Chicago arxiv:1104.4937v2 [stat.me] 25 Sep 2011 James G. Scott The University of Texas at Austin First draft: June

More information

Bayesian variable selection and classification with control of predictive values

Bayesian variable selection and classification with control of predictive values Bayesian variable selection and classification with control of predictive values Eleni Vradi 1, Thomas Jaki 2, Richardus Vonk 1, Werner Brannath 3 1 Bayer AG, Germany, 2 Lancaster University, UK, 3 University

More information

Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage

Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Daniel F. Schmidt Centre for Biostatistics and Epidemiology The University of Melbourne Monash University May 11, 2017 Outline

More information

Geometric ergodicity of the Bayesian lasso

Geometric ergodicity of the Bayesian lasso Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components

More information

arxiv: v2 [stat.me] 16 Aug 2018

arxiv: v2 [stat.me] 16 Aug 2018 arxiv:1807.06539v [stat.me] 16 Aug 018 On the Beta Prime Prior for Scale Parameters in High-Dimensional Bayesian Regression Models Ray Bai Malay Ghosh August 0, 018 Abstract We study high-dimensional Bayesian

More information

arxiv: v1 [stat.me] 6 Jul 2017

arxiv: v1 [stat.me] 6 Jul 2017 Sparsity information and regularization in the horseshoe and other shrinkage priors arxiv:77.694v [stat.me] 6 Jul 7 Juho Piironen and Aki Vehtari Helsinki Institute for Information Technology, HIIT Department

More information

Package horseshoe. November 8, 2016

Package horseshoe. November 8, 2016 Title Implementation of the Horseshoe Prior Version 0.1.0 Package horseshoe November 8, 2016 Description Contains functions for applying the horseshoe prior to highdimensional linear regression, yielding

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

Feature selection with high-dimensional data: criteria and Proc. Procedures

Feature selection with high-dimensional data: criteria and Proc. Procedures Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June

More information

arxiv: v3 [stat.me] 10 Aug 2018

arxiv: v3 [stat.me] 10 Aug 2018 Lasso Meets Horseshoe: A Survey arxiv:1706.10179v3 [stat.me] 10 Aug 2018 Anindya Bhadra 250 N. University St., West Lafayette, IN 47907. e-mail: bhadra@purdue.edu Jyotishka Datta 1 University of Arkansas,

More information

Scalable MCMC for the horseshoe prior

Scalable MCMC for the horseshoe prior Scalable MCMC for the horseshoe prior Anirban Bhattacharya Department of Statistics, Texas A&M University Joint work with James Johndrow and Paolo Orenstein September 7, 2018 Cornell Day of Statistics

More information

arxiv: v2 [stat.me] 24 Feb 2018

arxiv: v2 [stat.me] 24 Feb 2018 Dynamic Shrinkage Processes Daniel R. Kowal, David S. Matteson, and David Ruppert arxiv:1707.00763v2 [stat.me] 24 Feb 2018 February 27, 2018 Abstract We propose a novel class of dynamic shrinkage processes

More information

Sparse Estimation and Uncertainty with Application to Subgroup Analysis

Sparse Estimation and Uncertainty with Application to Subgroup Analysis Sparse Estimation and Uncertainty with Application to Subgroup Analysis Marc Ratkovic Dustin Tingley First Draft: March 2015 This Draft: October 20, 2016 Abstract We introduce a Bayesian method, LASSOplus,

More information

Handling Sparsity via the Horseshoe

Handling Sparsity via the Horseshoe Handling Sparsity via the Carlos M. Carvalho Booth School of Business The University of Chicago Chicago, IL 60637 Nicholas G. Polson Booth School of Business The University of Chicago Chicago, IL 60637

More information

arxiv: v3 [stat.me] 26 Feb 2012

arxiv: v3 [stat.me] 26 Feb 2012 Data augmentation for non-gaussian regression models using variance-mean mixtures arxiv:113.547v3 [stat.me] 26 Feb 212 Nicholas G. Polson Booth School of Business University of Chicago James G. Scott University

More information

Extended Bayesian Information Criteria for Model Selection with Large Model Spaces

Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Extended Bayesian Information Criteria for Model Selection with Large Model Spaces Jiahua Chen, University of British Columbia Zehua Chen, National University of Singapore (Biometrika, 2008) 1 / 18 Variable

More information

Robust Bayesian Regression

Robust Bayesian Regression Readings: Hoff Chapter 9, West JRSSB 1984, Fúquene, Pérez & Pericchi 2015 Duke University November 17, 2016 Body Fat Data: Intervals w/ All Data Response % Body Fat and Predictor Waist Circumference 95%

More information

Bayesian Regression with Undirected Network Predictors with an Application to Brain Connectome Data

Bayesian Regression with Undirected Network Predictors with an Application to Brain Connectome Data Bayesian Regression with Undirected Network Predictors with an Application to Brain Connectome Data Sharmistha Guha and Abel Rodriguez January 24, 208 Abstract This article proposes a Bayesian approach

More information

Robust Bayesian Simple Linear Regression

Robust Bayesian Simple Linear Regression Robust Bayesian Simple Linear Regression October 1, 2008 Readings: GIll 4 Robust Bayesian Simple Linear Regression p.1/11 Body Fat Data: Intervals w/ All Data 95% confidence and prediction intervals for

More information

Hierarchical Bayesian Modeling with Approximate Bayesian Computation

Hierarchical Bayesian Modeling with Approximate Bayesian Computation Hierarchical Bayesian Modeling with Approximate Bayesian Computation Jessi Cisewski Carnegie Mellon University June 2014 Goal of lecture Bring the ideas of Hierarchical Bayesian Modeling (HBM), Approximate

More information

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31 Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,

More information

Lecture 16: Mixtures of Generalized Linear Models

Lecture 16: Mixtures of Generalized Linear Models Lecture 16: Mixtures of Generalized Linear Models October 26, 2006 Setting Outline Often, a single GLM may be insufficiently flexible to characterize the data Setting Often, a single GLM may be insufficiently

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Regularization with variance-mean mixtures

Regularization with variance-mean mixtures Regularization with variance-mean mixtures Nick Polson University of Chicago James Scott University of Texas at Austin Workshop on Sensing and Analysis of High-Dimensional Data Duke University July 2011

More information

A sparse factor analysis model for high dimensional latent spaces

A sparse factor analysis model for high dimensional latent spaces A sparse factor analysis model for high dimensional latent spaces Chuan Gao Institute of Genome Sciences & Policy Duke University cg48@duke.edu Barbara E Engelhardt Department of Biostatistics & Bioinformatics

More information

Bayesian Models for Structured Sparse Estimation via Set Cover Prior

Bayesian Models for Structured Sparse Estimation via Set Cover Prior Bayesian Models for Structured Sparse Estimation via Set Cover Prior Xianghang Liu,, Xinhua Zhang,3, Tibério Caetano,,3 Machine Learning Research Group, National ICT Australia, Sydney and Canberra, Australia

More information

Time Series and Dynamic Models

Time Series and Dynamic Models Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The

More information

Model Choice. Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci. October 27, 2015

Model Choice. Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci. October 27, 2015 Model Choice Hoff Chapter 9, Clyde & George Model Uncertainty StatSci, Hoeting et al BMA StatSci October 27, 2015 Topics Variable Selection / Model Choice Stepwise Methods Model Selection Criteria Model

More information

Self-adaptive Lasso and its Bayesian Estimation

Self-adaptive Lasso and its Bayesian Estimation Self-adaptive Lasso and its Bayesian Estimation Jian Kang 1 and Jian Guo 2 1. Department of Biostatistics, University of Michigan 2. Department of Statistics, University of Michigan Abstract In this paper,

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Flexible Learning on the Sphere Via Adaptive Needlet Shrinkage and Selection

Flexible Learning on the Sphere Via Adaptive Needlet Shrinkage and Selection Flexible Learning on the Sphere Via Adaptive Needlet Shrinkage and Selection James G. Scott McCombs School of Business University of Texas at Austin Austin, TX 78712 James.Scott@mccombs.utexas.edu Original:

More information

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Nonparametric Bayes Uncertainty Quantification

Nonparametric Bayes Uncertainty Quantification Nonparametric Bayes Uncertainty Quantification David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & ONR Review of Bayes Intro to Nonparametric Bayes

More information

A Hybrid Bayesian Approach for Genome-Wide Association Studies on Related Individuals

A Hybrid Bayesian Approach for Genome-Wide Association Studies on Related Individuals Bioinformatics Advance Access published August 30 015 A Hybrid Bayesian Approach for Genome-Wide Association Studies on Related Individuals A. Yazdani 1 D. B. Dunson 1 Human Genetic Center University of

More information

Bayes Estimators & Ridge Regression

Bayes Estimators & Ridge Regression Readings Chapter 14 Christensen Merlise Clyde September 29, 2015 How Good are Estimators? Quadratic loss for estimating β using estimator a L(β, a) = (β a) T (β a) How Good are Estimators? Quadratic loss

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Lecture 13 Fundamentals of Bayesian Inference

Lecture 13 Fundamentals of Bayesian Inference Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Classical and Bayesian inference

Classical and Bayesian inference Classical and Bayesian inference AMS 132 January 18, 2018 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 18, 2018 1 / 9 Sampling from a Bernoulli Distribution Theorem (Beta-Bernoulli

More information

Aspects of Feature Selection in Mass Spec Proteomic Functional Data

Aspects of Feature Selection in Mass Spec Proteomic Functional Data Aspects of Feature Selection in Mass Spec Proteomic Functional Data Phil Brown University of Kent Newton Institute 11th December 2006 Based on work with Jeff Morris, and others at MD Anderson, Houston

More information

ST 740: Model Selection

ST 740: Model Selection ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 2 2012 Article 5 COMPUTATIONAL STATISTICAL METHODS FOR GENOMICS AND SYSTEMS BIOLOGY Bayesian Sparsity-Path-Analysis of Genetic

More information

Luminosity Functions

Luminosity Functions Department of Statistics, Harvard University May 15th, 2012 Introduction: What is a Luminosity Function? Figure: A galaxy cluster. Introduction: Project Goal The Luminosity Function specifies the relative

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

arxiv: v4 [stat.co] 12 May 2018

arxiv: v4 [stat.co] 12 May 2018 Fully Bayesian Logistic Regression with Hyper-LASSO Priors for High-dimensional Feature Selection Longhai Li and Weixin Yao arxiv:405.339v4 [stat.co] 2 May 208 May 5, 208 Abstract High-dimensional feature

More information

NONPARAMETRIC BAYESIAN DENSITY ESTIMATION ON A CIRCLE

NONPARAMETRIC BAYESIAN DENSITY ESTIMATION ON A CIRCLE NONPARAMETRIC BAYESIAN DENSITY ESTIMATION ON A CIRCLE MARK FISHER Incomplete. Abstract. This note describes nonparametric Bayesian density estimation for circular data using a Dirichlet Process Mixture

More information

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

Regularized Skew-Normal Regression. K. Shutes & C.J. Adcock. March 31, 2014

Regularized Skew-Normal Regression. K. Shutes & C.J. Adcock. March 31, 2014 Regularized Skew-Normal Regression K. Shutes & C.J. Adcock March 31, 2014 Abstract This paper considers the impact of using the regularisation techniques for the analysis of the (extended) skew-normal distribution.

Computationally Efficient MCMC for Hierarchical Bayesian Inverse Problems

Computationally Efficient MCMC for Hierarchical Bayesian Inverse Problems Computationally Efficient MCMC for Hierarchical Bayesian Inverse Problems Andrew Brown 1, Arvind Saibaba 2, Sarah Vallélian 2,3 SAMSI January 28, 2017 Supported by SAMSI Visiting Research Fellowship NSF

Model Checking and Improvement

Model Checking and Improvement Model Checking and Improvement Statistics 220 Spring 2005 Copyright © 2005 by Mark E. Irwin Model Checking "All models are wrong but some models are useful" (George E. P. Box) So far we have looked at a number

Regularized Skew-Normal Regression

Regularized Skew-Normal Regression MPRA Munich Personal RePEc Archive Regularized Skew-Normal Regression Karl Shutes and Chris Adcock Coventry University, University of Sheffield 24. November 2013 Online at http://mpra.ub.uni-muenchen.de/52367/

Sparsity and the truncated $\ell_2$-norm

Sparsity and the truncated $\ell_2$-norm Lee H. Dicker Department of Statistics and Biostatistics Rutgers University ldicker@stat.rutgers.edu Abstract Sparsity is a fundamental topic in high-dimensional data analysis. Perhaps the most common measures

Bayesian methods in economics and finance

Bayesian methods in economics and finance 1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression

On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression Bayesian Analysis (2018) 13, Number 2, pp. 359-383 On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression Joyee Ghosh, Yingbo Li, and Robin Mitra Abstract. In logistic regression, separation

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology

Nearest Neighbor Gaussian Processes for Large Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns

arxiv: v1 [stat.me] 3 Aug 2014

arxiv: v1 [stat.me] 3 Aug 2014 DECOUPLING SHRINKAGE AND SELECTION IN BAYESIAN LINEAR MODELS: A POSTERIOR SUMMARY PERSPECTIVE By P. Richard Hahn and Carlos M. Carvalho Booth School of Business and McCombs School of Business arxiv:1408.0464v1

On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression

On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression Joyee Ghosh, Yingbo Li, Robin Mitra arXiv:1507.07170v2 [stat.ME] 9 Feb 2017 Abstract In logistic regression, separation occurs when

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: ntzoufras@aueb.gr.

Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems

Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems Andrew Brown 1,2, Arvind Saibaba 3, Sarah Vallélian 2,3 CCNS Transition Workshop SAMSI May 5, 2016 Supported by SAMSI Visiting Research

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

Sparse Factor-Analytic Probit Models

Sparse Factor-Analytic Probit Models Sparse Factor-Analytic Probit Models By JAMES G. SCOTT Department of Statistical Science, Duke University, Durham, North Carolina 27708-0251, U.S.A. james@stat.duke.edu PAUL R. HAHN Department of Statistical

Simulating Random Variables

Simulating Random Variables Simulating Random Variables Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 23 R has many built-in random number generators... Beta, gamma (also

Sparse additive Gaussian process with soft interactions. Title

Sparse additive Gaussian process with soft interactions. Title Sparse additive Gaussian process with soft interactions arxiv:1607.02670v1 [stat.ml] 9 Jul 2016 Garret Vo Department of Industrial and Manufacturing Engineering, Florida State University, Tallahassee,

One-parameter models

One-parameter models One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked

A Confidence Region Approach to Tuning for Variable Selection

A Confidence Region Approach to Tuning for Variable Selection A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized

Dirichlet-Laplace priors for optimal shrinkage

Dirichlet-Laplace priors for optimal shrinkage Dirichlet-Laplace priors for optimal shrinkage Anirban Bhattacharya, Debdeep Pati, Natesh S. Pillai, David B. Dunson January 22, 2014 arxiv:1401.5398v1 [math.st] 21 Jan 2014 Abstract Penalized regression

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which

Consistent change point estimation. Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University

Consistent change point estimation. Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University Consistent change point estimation Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University Outline of presentation Change point problem = variable selection

Mixtures of Prior Distributions

Mixtures of Prior Distributions Mixtures of Prior Distributions Hoff Chapter 9, Liang et al 2007, Hoeting et al (1999), Clyde & George (2004) November 10, 2016 Bartlett's Paradox The Bayes factor for comparing $M_\gamma$ to the null model:

IEOR165 Discussion Week 5

IEOR165 Discussion Week 5 IEOR165 Discussion Week 5 Sheng Liu University of California, Berkeley Feb 19, 2016 Outline 1 1st Homework 2 Revisit Maximum A Posterior 3 Regularization IEOR165 Discussion Sheng Liu 2 About 1st Homework

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged $\ell_1$ Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

Bayesian Lasso Binary Quantile Regression

Bayesian Lasso Binary Quantile Regression Bayesian Lasso Binary Quantile Regression Dries F Benoit, Rahim Alhamzawi, Keming Yu Abstract

Mixtures of Prior Distributions

Mixtures of Prior Distributions Mixtures of Prior Distributions Hoff Chapter 9, Liang et al 2007, Hoeting et al (1999), Clyde & George (2004) November 9, 2017 Bartlett's Paradox The Bayes factor for comparing $M_\gamma$ to the null model: BF

arxiv: v1 [math.st] 13 Feb 2017

arxiv: v1 [math.st] 13 Feb 2017 Adaptive posterior contraction rates for the horseshoe arXiv:1702.03698v1 [math.ST] 13 Feb 2017 Stéphanie van der Pas, Botond Szabó, and Aad van der Vaart, Leiden University and Budapest University of Technology

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER

New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER New Statistical Methods That Improve on MLE and GLM Including for Reserve Modeling GARY G VENTER MLE Going the Way of the Buggy Whip Used to be gold standard of statistical estimation Minimum variance

The Horseshoe Estimator for Sparse Signals

The Horseshoe Estimator for Sparse Signals The Horseshoe Estimator for Sparse Signals Carlos M. Carvalho Nicholas G. Polson University of Chicago Booth School of Business James G. Scott Duke University Department of Statistical Science October

Homework 1: Solutions

Homework 1: Solutions Homework 1: Solutions Statistics 413 Fall 2017 Data Analysis: Note: All data analysis results are provided by Michael Rodgers 1. Baseball Data: (a) What are the most important features for predicting players
