Partial factor modeling: predictor-dependent shrinkage for linear regression

Richard Hahn, Carlos Carvalho and Sayan Mukherjee, JASA 2013. Review by Esther Salazar, Duke University, December 2013.

Factor framework. The factor framework may be written in two parts. A linear regression for a scalar response $Y_i$:
\[
(Y_i \mid X_i, \beta, \sigma^2) \sim N(X_i^\top \beta, \sigma^2),
\]
and a marginal model for a $p$-dimensional vector of predictor variables $X_i$:
\[
X_i = B f_i + \nu_i, \qquad \nu_i \sim N(0, \Psi), \qquad f_i \sim N(0, I_k),
\]
where $B \in \mathbb{R}^{p \times k}$ and $\Psi$ is a diagonal matrix. This work considers a modification of the Gaussian factor model suited for regression and variable selection. It differs from previous work on Bayesian variable selection in that it explicitly accounts for the predictor correlation structure.

Gaussian factor model. With the linear regression $(Y_i \mid X_i, \beta, \sigma^2) \sim N(X_i^\top \beta, \sigma^2)$ and the marginal model $X_i = B f_i + \nu_i$, $\nu_i \sim N(0, \Psi)$, $f_i \sim N(0, I_k)$, this paper asks: how should the prior on $\beta$ depend on $B$ and $\Psi$?
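As a concrete illustration, here is a minimal simulation from this two-part model; the dimensions $n$, $p$, $k$ and all parameter values below are arbitrary choices, not taken from the paper:

```python
# Minimal sketch (assumed dimensions and parameters): simulate from the
# factor framework Y_i ~ N(X_i' beta, sigma^2), X_i = B f_i + nu_i.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 10, 2

B = rng.normal(size=(p, k))          # factor loadings, B in R^{p x k}
psi = rng.uniform(0.5, 1.5, size=p)  # diagonal of Psi (idiosyncratic variances)
beta = rng.normal(size=p)            # regression coefficients
sigma = 1.0

f = rng.normal(size=(n, k))                           # f_i ~ N(0, I_k)
X = f @ B.T + rng.normal(size=(n, p)) * np.sqrt(psi)  # X_i = B f_i + nu_i
Y = X @ beta + sigma * rng.normal(size=n)             # Y_i ~ N(X_i' beta, sigma^2)

# Check: the sample covariance of X should be close to B B^T + diag(psi)
Sigma_X = B @ B.T + np.diag(psi)
print(np.round(np.cov(X, rowvar=False) - Sigma_X, 1))
```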

Two extreme answers: (1) pure linear regression, which ignores the marginal distribution of the predictors, $\pi(X)$, so that $\pi(\beta \mid B, \Psi) = \pi(\beta)$; and (2) a pure factor model, where $Y_i$ depends linearly on the same $k$ latent factors that capture the covariation in $X_i$, so that $\pi(Y_i \mid X_i, f_i, \theta) = \pi(Y_i \mid \theta, f_i)$ with $E(Y_i \mid \theta, f_i) = \theta f_i$. Intuition: with $Y_i \sim N(\theta f_i, \sigma^2)$, $X_i \sim N(B f_i, \Psi)$ and $f_i \sim N(0, I_k)$,
\[
\begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} B B^\top + \Psi & B \theta^\top \\ \theta B^\top & \theta \theta^\top + \sigma^2 \end{bmatrix} \right), \qquad E(Y_i \mid X_i) = \theta B^\top (B B^\top + \Psi)^{-1} X_i.
\]
This entails that $\beta$ is a deterministic function of $(\theta, B, \Psi)$: $\beta^\top = \theta B^\top (B B^\top + \Psi)^{-1}$. Also, $E(f_i \mid X_i) = B^\top (B B^\top + \Psi)^{-1} X_i$, the projection of $X_i$ onto a $k$-dimensional subspace.
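A quick numerical check of the identity $\beta^\top = \theta B^\top (B B^\top + \Psi)^{-1}$, under an assumed simulation setup (all dimensions and parameter values are arbitrary):

```python
# Minimal sketch: under the pure factor model, the implied regression
# vector beta^T = theta B^T (B B^T + Psi)^{-1} should match OLS on a
# large simulated sample, up to Monte Carlo error.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 200_000, 6, 2
B = rng.normal(size=(p, k))
psi = rng.uniform(0.5, 1.5, size=p)
theta = rng.normal(size=k)
sigma = 0.5

f = rng.normal(size=(n, k))
X = f @ B.T + rng.normal(size=(n, p)) * np.sqrt(psi)  # X_i ~ N(B f_i, Psi)
Y = f @ theta + sigma * rng.normal(size=n)            # Y_i ~ N(theta f_i, sigma^2)

beta_implied = theta @ B.T @ np.linalg.inv(B @ B.T + np.diag(psi))
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_implied, 3))
print(np.round(beta_ols, 3))   # should agree up to Monte Carlo error
```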

Bayesian linear factor model. Factor model for the predictors: $X_i \sim N(B f_i, \Psi)$, $f_i \sim N(0, I_k)$. Integrating over $f_i$ gives $\mathrm{cov}(X_i) \equiv \Sigma_X = B B^\top + \Psi$. The model assumes that the $p$ predictors influence $Y_i$ only through the $k$-dimensional vector $f_i$.

Effects of misspecifying $k$. By the likelihood criterion the two models are nearly identical; in terms of predicting $X_{10}$, the two-factor model is nearly always the best.

Idea: relax the assumption that the latent factors capturing the predictor covariance $\Sigma_X$ are sufficient for predicting the response $Y_i$. This is achieved by modifying the joint covariance structure so that the cross-covariance between $Y_i$ and $X_i$ is a $1 \times p$ row vector $V = (v_1, \ldots, v_p)$ that is not exactly equal to $\theta B^\top$. Novelty: a prior for $V$, conditional on $\theta$, $B$ and $\Psi$,
\[
v_j \sim N\!\left( \{\theta B^\top\}_j,\; \omega^2 w_j^2 \psi_j^2 \right),
\]
where $\omega^2$ is a global variance, $w_j^2$ is a predictor-specific variance (Carvalho, Polson and Scott, 2010), and $\psi_j^2$ is the $j$-th diagonal element of $\Psi$.
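A sketch of a single draw from this conditional prior, assuming standard half-Cauchy draws for $\omega$ and the $w_j$ (the hyperprior choices appear on a later slide; all parameter values here are illustrative):

```python
# Minimal sketch: draw V from v_j ~ N({theta B^T}_j, omega^2 w_j^2 psi_j^2),
# with half-Cauchy global and local scales in the style of the horseshoe
# (Carvalho, Polson and Scott, 2010). Parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(2)
p, k = 10, 2
B = rng.normal(size=(p, k))
theta = rng.normal(size=k)
psi = rng.uniform(0.5, 1.5, size=p)   # diagonal of Psi, i.e. psi_j^2
omega = abs(rng.standard_cauchy())    # global scale, half-Cauchy
w = np.abs(rng.standard_cauchy(p))    # local scales, half-Cauchy

mean_V = theta @ B.T                  # prior centered at theta B^T
V = mean_V + omega * w * np.sqrt(psi) * rng.normal(size=p)
print(np.round(V - mean_V, 2))        # deviations from the pure factor model
```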

Hierarchical specification. Let $\Lambda = (V - \theta B^\top) \Psi^{-1/2}$. Under the prior $v_j \sim N(\{\theta B^\top\}_j, \omega^2 w_j^2 \psi_j^2)$, this reparameterization gives $\lambda_j \sim N(0, \omega^2 w_j^2)$.
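The reparameterization is a one-line calculation: since $\Psi$ is diagonal with entries $\psi_j^2$, $\Psi^{-1/2}$ rescales the $j$-th deviation by $1/\psi_j$, which removes the factor $\psi_j^2$ from the variance:
\[
\lambda_j = \frac{v_j - \{\theta B^\top\}_j}{\psi_j}, \qquad v_j - \{\theta B^\top\}_j \sim N(0,\ \omega^2 w_j^2 \psi_j^2) \;\Longrightarrow\; \lambda_j \sim N(0,\ \omega^2 w_j^2).
\]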

Hierarchical specification (continued). The priors for $\tau$, $\omega$, and the individual elements of $w$, $q$ and $t$ are half-Cauchy. This corresponds to horseshoe priors (Carvalho et al., 2010) over the elements of $B$, $\theta$ and $\Lambda$. Posterior inference: the model can be fit using a Gibbs sampler.
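One standard way to handle half-Cauchy scales inside a Gibbs sampler is the inverse-gamma mixture representation (e.g. Makalic and Schmidt, 2016); this is a common device, not necessarily the authors' exact sampler:

```python
# Minimal sketch: w^2 | a ~ IG(1/2, 1/a) with a ~ IG(1/2, 1) marginally
# gives w ~ half-Cauchy(0, 1), so each conditional update is conjugate.
import numpy as np

rng = np.random.default_rng(3)
n_draws = 100_000
a = 1.0 / rng.gamma(0.5, 1.0, size=n_draws)  # a ~ IG(1/2, 1)
w2 = 1.0 / rng.gamma(0.5, a)                 # w^2 | a ~ IG(1/2, 1/a)
w = np.sqrt(w2)
print(np.median(w))   # ~1.0, the median of a standard half-Cauchy
```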

Out-of-sample prediction (applied example). They compare the partial factor model to five other methods: (1) ridge regression, (2) partial least squares, (3) the lasso, (4) principal component regression, and (5) a Bayesian factor model using the prior of Bhattacharya and Dunson (2011).

Variable selection. Problem: under the assumption that the predictors $X$ and the response $Y$ come from a joint normal distribution, the variable selection problem amounts to inferring exactly-zero entries of the precision matrix $\Sigma_{X,Y}^{-1}$. Starting from the partial factor model, we can use a spike-and-slab prior over $\Lambda$ such that each $\lambda_j$ is exactly zero with positive probability. Analogous priors are placed on the elements of $B$ and $\theta$.
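A sketch of what such a spike-and-slab prior over $\Lambda$ could look like; the inclusion probability `pi_incl` and the slab scale below are illustrative assumptions, not the paper's exact specification:

```python
# Minimal sketch (assumed form): spike-and-slab draw for the entries of
# Lambda, mixing a point mass at zero with a horseshoe-style slab.
import numpy as np

rng = np.random.default_rng(4)
p = 10
pi_incl = 0.2                            # prior inclusion probability (assumed)
omega = 1.0                              # global slab scale (assumed)
w = np.abs(rng.standard_cauchy(p))       # local slab scales, half-Cauchy

gamma = rng.random(p) < pi_incl          # inclusion indicators
lam = np.where(gamma, omega * w * rng.normal(size=p), 0.0)
print(lam)                               # exact zeros encode excluded predictors
```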

Extensions: beyond the linear model. It is straightforward to extend the method to a binary or categorical response variable $Z_i$ by treating the continuous response $Y_i$ as an additional latent variable. For instance, if $Z_i$ is binary, set $Z_i = 1(Y_i < 0)$, where $Y_i$ follows the partial factor model; this is called the partial factor probit model.
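This is the classic probit data-augmentation step (in the spirit of Albert and Chib, 1993). A minimal sketch of imputing the latent $Y_i$ given $Z_i$ and the current conditional mean; not necessarily the paper's exact sampler, and the values below are illustrative:

```python
# Minimal sketch: given Z_i = 1(Y_i < 0), impute the latent Y_i from a
# normal truncated to the region consistent with the observed Z_i.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)
n = 5
mu = rng.normal(size=n)                  # E(Y_i | X_i) under the current draw
Z = rng.integers(0, 2, size=n)           # observed binary responses

# Z_i = 1  =>  Y_i < 0 (truncate above at 0);  Z_i = 0  =>  Y_i >= 0
lo = np.where(Z == 1, -np.inf, 0.0)
hi = np.where(Z == 1, 0.0, np.inf)
Y = truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0, random_state=rng)
print(np.c_[Z, np.round(Y, 2)])          # signs of Y agree with Z
```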