Partial factor modeling: predictor-dependent shrinkage for linear regression

Richard Hahn, Carlos Carvalho and Sayan Mukherjee, JASA 2013. Review by Esther Salazar, Duke University, December 2013.

Factor framework. The factor framework may be written in two parts. A linear regression for a scalar response $Y_i$:
\[
(Y_i \mid X_i, \beta, \sigma^2) \sim N(X_i^\top \beta, \sigma^2),
\]
and a marginal model for a $p$-dimensional vector of predictor variables $X_i$:
\[
X_i = B f_i + \nu_i, \qquad \nu_i \sim N(0, \Psi), \qquad f_i \sim N(0, I_k),
\]
where $B \in \mathbb{R}^{p \times k}$ and $\Psi$ is a diagonal matrix. This work considers a modification of the Gaussian factor model suited for regression and variable selection. It differs from previous work on Bayesian variable selection in that it explicitly accounts for the predictor correlation structure.

Gaussian factor model. With the linear regression $(Y_i \mid X_i, \beta, \sigma^2) \sim N(X_i^\top \beta, \sigma^2)$ and the marginal model $X_i = B f_i + \nu_i$, $\nu_i \sim N(0, \Psi)$, $f_i \sim N(0, I_k)$, this paper asks: how should the prior on $\beta$ depend on $B$ and $\Psi$?
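As a concrete illustration, here is a minimal simulation from this two-part model; the dimensions $n$, $p$, $k$ and all parameter values below are arbitrary choices, not taken from the paper:

```python
# Minimal sketch (assumed dimensions and parameters): simulate from the
# factor framework Y_i ~ N(X_i' beta, sigma^2), X_i = B f_i + nu_i.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 10, 2

B = rng.normal(size=(p, k))          # factor loadings, B in R^{p x k}
psi = rng.uniform(0.5, 1.5, size=p)  # diagonal of Psi (idiosyncratic variances)
beta = rng.normal(size=p)            # regression coefficients
sigma = 1.0

f = rng.normal(size=(n, k))                           # f_i ~ N(0, I_k)
X = f @ B.T + rng.normal(size=(n, p)) * np.sqrt(psi)  # X_i = B f_i + nu_i
Y = X @ beta + sigma * rng.normal(size=n)             # Y_i ~ N(X_i' beta, sigma^2)

# Check: the sample covariance of X should be close to B B^T + diag(psi)
Sigma_X = B @ B.T + np.diag(psi)
print(np.round(np.cov(X, rowvar=False) - Sigma_X, 1))
```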

Two extreme answers: (1) pure linear regression, which ignores the marginal distribution of the predictors, $\pi(X)$, so that $\pi(\beta \mid B, \Psi) = \pi(\beta)$; and (2) a pure factor model, where $Y_i$ depends linearly on the same $k$ latent factors that capture the covariation in $X_i$, so that $\pi(Y_i \mid X_i, f_i, \theta) = \pi(Y_i \mid \theta, f_i)$ with $E(Y_i \mid \theta, f_i) = \theta f_i$. Intuition: with $Y_i \sim N(\theta f_i, \sigma^2)$, $X_i \sim N(B f_i, \Psi)$ and $f_i \sim N(0, I_k)$,
\[
\begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} B B^\top + \Psi & B \theta^\top \\ \theta B^\top & \theta \theta^\top + \sigma^2 \end{bmatrix} \right), \qquad E(Y_i \mid X_i) = \theta B^\top (B B^\top + \Psi)^{-1} X_i.
\]
This entails that $\beta$ is a deterministic function of $(\theta, B, \Psi)$: $\beta^\top = \theta B^\top (B B^\top + \Psi)^{-1}$. Also, $E(f_i \mid X_i) = B^\top (B B^\top + \Psi)^{-1} X_i$, the projection of $X_i$ onto a $k$-dimensional subspace.
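A quick numerical check of the identity $\beta^\top = \theta B^\top (B B^\top + \Psi)^{-1}$, under an assumed simulation setup (all dimensions and parameter values are arbitrary):

```python
# Minimal sketch: under the pure factor model, the implied regression
# vector beta^T = theta B^T (B B^T + Psi)^{-1} should match OLS on a
# large simulated sample, up to Monte Carlo error.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 200_000, 6, 2
B = rng.normal(size=(p, k))
psi = rng.uniform(0.5, 1.5, size=p)
theta = rng.normal(size=k)
sigma = 0.5

f = rng.normal(size=(n, k))
X = f @ B.T + rng.normal(size=(n, p)) * np.sqrt(psi)  # X_i ~ N(B f_i, Psi)
Y = f @ theta + sigma * rng.normal(size=n)            # Y_i ~ N(theta f_i, sigma^2)

beta_implied = theta @ B.T @ np.linalg.inv(B @ B.T + np.diag(psi))
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_implied, 3))
print(np.round(beta_ols, 3))   # should agree up to Monte Carlo error
```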

Bayesian linear factor model. Factor model for the predictors: $X_i \sim N(B f_i, \Psi)$, $f_i \sim N(0, I_k)$. Integrating over $f_i$ gives $\mathrm{cov}(X_i) \equiv \Sigma_X = B B^\top + \Psi$. The model assumes that the $p$ predictors influence $Y_i$ only through the $k$-dimensional vector $f_i$.

Effects of misspecifying $k$. By the likelihood criterion the two models are nearly identical; in terms of predicting $X_{10}$, the two-factor model is nearly always the best.

Idea: relax the assumption that the latent factors capturing the predictor covariance $\Sigma_X$ are sufficient for predicting the response $Y_i$. This is achieved by modifying the joint covariance structure so that the cross-covariance between $Y_i$ and $X_i$ is a $1 \times p$ row vector $V = (v_1, \ldots, v_p)$ that is not exactly equal to $\theta B^\top$. Novelty: a prior for $V$, conditional on $\theta$, $B$ and $\Psi$,
\[
v_j \sim N\!\left( \{\theta B^\top\}_j,\; \omega^2 w_j^2 \psi_j^2 \right),
\]
where $\omega^2$ is a global variance, $w_j^2$ is a predictor-specific variance (Carvalho, Polson and Scott, 2010), and $\psi_j^2$ is the $j$-th diagonal element of $\Psi$.
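A sketch of a single draw from this conditional prior, assuming standard half-Cauchy draws for $\omega$ and the $w_j$ (the hyperprior choices appear on a later slide; all parameter values here are illustrative):

```python
# Minimal sketch: draw V from v_j ~ N({theta B^T}_j, omega^2 w_j^2 psi_j^2),
# with half-Cauchy global and local scales in the style of the horseshoe
# (Carvalho, Polson and Scott, 2010). Parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(2)
p, k = 10, 2
B = rng.normal(size=(p, k))
theta = rng.normal(size=k)
psi = rng.uniform(0.5, 1.5, size=p)   # diagonal of Psi, i.e. psi_j^2
omega = abs(rng.standard_cauchy())    # global scale, half-Cauchy
w = np.abs(rng.standard_cauchy(p))    # local scales, half-Cauchy

mean_V = theta @ B.T                  # prior centered at theta B^T
V = mean_V + omega * w * np.sqrt(psi) * rng.normal(size=p)
print(np.round(V - mean_V, 2))        # deviations from the pure factor model
```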

Hierarchical specification. Let $\Lambda = (V - \theta B^\top) \Psi^{-1/2}$. Under the prior $v_j \sim N(\{\theta B^\top\}_j, \omega^2 w_j^2 \psi_j^2)$, this reparameterization gives $\lambda_j \sim N(0, \omega^2 w_j^2)$.
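The reparameterization is a one-line calculation: since $\Psi$ is diagonal with entries $\psi_j^2$, $\Psi^{-1/2}$ rescales the $j$-th deviation by $1/\psi_j$, which removes the factor $\psi_j^2$ from the variance:
\[
\lambda_j = \frac{v_j - \{\theta B^\top\}_j}{\psi_j}, \qquad v_j - \{\theta B^\top\}_j \sim N(0,\ \omega^2 w_j^2 \psi_j^2) \;\Longrightarrow\; \lambda_j \sim N(0,\ \omega^2 w_j^2).
\]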

Hierarchical specification (continued). The priors for $\tau$, $\omega$, and the individual elements of $w$, $q$ and $t$ are half-Cauchy. This corresponds to horseshoe priors (Carvalho et al., 2010) over the elements of $B$, $\theta$ and $\Lambda$. Posterior inference: the model can be fit using a Gibbs sampler.
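One standard way to handle half-Cauchy scales inside a Gibbs sampler is the inverse-gamma mixture representation (e.g. Makalic and Schmidt, 2016); this is a common device, not necessarily the authors' exact sampler:

```python
# Minimal sketch: w^2 | a ~ IG(1/2, 1/a) with a ~ IG(1/2, 1) marginally
# gives w ~ half-Cauchy(0, 1), so each conditional update is conjugate.
import numpy as np

rng = np.random.default_rng(3)
n_draws = 100_000
a = 1.0 / rng.gamma(0.5, 1.0, size=n_draws)  # a ~ IG(1/2, 1)
w2 = 1.0 / rng.gamma(0.5, a)                 # w^2 | a ~ IG(1/2, 1/a)
w = np.sqrt(w2)
print(np.median(w))   # ~1.0, the median of a standard half-Cauchy
```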

Out-of-sample prediction (applied example). They compare the partial factor model to five other methods: (1) ridge regression, (2) partial least squares, (3) the lasso, (4) principal component regression, and (5) a Bayesian factor model using the prior of Bhattacharya and Dunson (2011).

Variable selection. Problem: under the assumption that the predictors $X$ and the response $Y$ come from a joint normal distribution, the variable selection problem amounts to inferring exactly-zero entries of the precision matrix $\Sigma_{X,Y}^{-1}$. Starting from the partial factor model, we can use a spike-and-slab prior over $\Lambda$ such that each $\lambda_j$ is exactly zero with positive probability. Analogous priors are placed on the elements of $B$ and $\theta$.
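A sketch of what such a spike-and-slab prior over $\Lambda$ could look like; the inclusion probability `pi_incl` and the slab scale below are illustrative assumptions, not the paper's exact specification:

```python
# Minimal sketch (assumed form): spike-and-slab draw for the entries of
# Lambda, mixing a point mass at zero with a horseshoe-style slab.
import numpy as np

rng = np.random.default_rng(4)
p = 10
pi_incl = 0.2                            # prior inclusion probability (assumed)
omega = 1.0                              # global slab scale (assumed)
w = np.abs(rng.standard_cauchy(p))       # local slab scales, half-Cauchy

gamma = rng.random(p) < pi_incl          # inclusion indicators
lam = np.where(gamma, omega * w * rng.normal(size=p), 0.0)
print(lam)                               # exact zeros encode excluded predictors
```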

Extensions: beyond the linear model. It is straightforward to extend the method to a binary or categorical response variable $Z_i$ by treating the continuous response $Y_i$ as an additional latent variable. For instance, if $Z_i$ is binary, set $Z_i = 1(Y_i < 0)$, where $Y_i$ follows the partial factor model; this is called the partial factor probit model.
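This is the classic probit data-augmentation step (in the spirit of Albert and Chib, 1993). A minimal sketch of imputing the latent $Y_i$ given $Z_i$ and the current conditional mean; not necessarily the paper's exact sampler, and the values below are illustrative:

```python
# Minimal sketch: given Z_i = 1(Y_i < 0), impute the latent Y_i from a
# normal truncated to the region consistent with the observed Z_i.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)
n = 5
mu = rng.normal(size=n)                  # E(Y_i | X_i) under the current draw
Z = rng.integers(0, 2, size=n)           # observed binary responses

# Z_i = 1  =>  Y_i < 0 (truncate above at 0);  Z_i = 0  =>  Y_i >= 0
lo = np.where(Z == 1, -np.inf, 0.0)
hi = np.where(Z == 1, 0.0, np.inf)
Y = truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0, random_state=rng)
print(np.c_[Z, np.round(Y, 2)])          # signs of Y agree with Z
```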