STA 216 (Generalized Linear Models), Lecture 16, October 29, 2007
Outline:
- Efficient posterior computation in factor models
- Underlying normal models
- Generalized latent trait models: formulation and a genetic epidemiology illustration
- Structural equation models
How can we do efficient computation?
- Efficient posterior computation in factor analysis models is very challenging
- The typical Gibbs sampler can be subject to extremely slow mixing
- Centering does not provide a complete solution: one can only center one measurement for each latent variable, and conjugacy is lost unless the prior is non-exchangeable
- What to do?
Parameter Expansion (PX)
- Originally proposed as a method for speeding up convergence of the EM algorithm
- Redundant parameters are carefully introduced to yield faster EM convergence and better-mixing Gibbs samplers
- The idea is to introduce the redundant parameters in such a way that the target distribution of the MCMC algorithm is unchanged
- Hence, the posterior is not changed, but autocorrelation is reduced
PX in Hierarchical Models
- It is very difficult to obtain a PX-accelerated Gibbs sampler in general cases
- Hard to avoid changing the target distribution
- Gelman (2005, Bayesian Analysis) proposes using PX to induce a better prior, while also speeding up mixing, in the setting of variance component models
- Ghosh & Dunson (2007) extend this approach to factor analysis models
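To make the idea concrete, here is a minimal sketch of a PX Gibbs sampler for the one-way variance component model y_ij = mu + eta_i + eps_ij, the setting considered by Gelman (2005). The expanded parameterization eta_i = alpha * xi_i introduces a redundant scale alpha; only |alpha| * sd(xi_i) is identified, and mapping back to it each iteration leaves the posterior of interest unchanged. The priors (flat for mu and alpha, vague inverse-gamma for the variances) and all dimensions are illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated one-way random effects data: y_ij = mu + eta_i + eps_ij
I, J = 30, 5
eta_true = rng.normal(0, 0.5, I)                      # true sd of eta = 0.5
y = 1.0 + eta_true[:, None] + rng.normal(0, 1.0, (I, J))

# PX parameterization: eta_i = alpha * xi_i, with redundant scale alpha.
# Sampling in the expanded space mixes better; |alpha|*sd(xi) is identified.
mu, alpha, s2_xi, s2_e = 0.0, 1.0, 1.0, 1.0
xi = np.zeros(I)
sd_eta_draws = []
for t in range(2000):
    # xi_i | rest ~ normal (conjugate update)
    prec = 1.0 / s2_xi + J * alpha**2 / s2_e
    mean = (alpha / s2_e) * (y - mu).sum(axis=1) / prec
    xi = mean + rng.normal(0, np.sqrt(1.0 / prec), I)
    # alpha | rest ~ normal (flat working prior)
    ss = J * (xi**2).sum()
    alpha = rng.normal((xi[:, None] * (y - mu)).sum() / ss, np.sqrt(s2_e / ss))
    # mu | rest ~ normal (flat prior)
    resid = y - alpha * xi[:, None]
    mu = rng.normal(resid.mean(), np.sqrt(s2_e / y.size))
    # variances | rest ~ inverse-gamma (vague IG(1,1) priors, illustrative)
    s2_xi = 1.0 / rng.gamma(I / 2 + 1, 1.0 / (0.5 * (xi**2).sum() + 1))
    e = y - mu - alpha * xi[:, None]
    s2_e = 1.0 / rng.gamma(y.size / 2 + 1, 1.0 / (0.5 * (e**2).sum() + 1))
    # map back to the identified parameter each iteration
    sd_eta_draws.append(abs(alpha) * np.sqrt(s2_xi))

sd_eta_hat = float(np.mean(sd_eta_draws[500:]))
```

The random sign of alpha means its marginal chain is bimodal, but the identified quantity |alpha|*sqrt(s2_xi) mixes well; this is exactly the mechanism exploited in the factor-model extension.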
Homework Exercise
- Propose a PX Gibbs sampler for the sperm concentration latent factor regression model from lecture 15
- Simulate data under the model and compare the PX Gibbs sampler to a typical Gibbs sampler without PX
- Due: next Friday
What if our data are categorical?
- The above models assume that the different elements of y_i are continuous and normally distributed
- In most settings in which factor analysis models are used, at least some of the elements of y_i are instead ordered categorical
- It is appealing to have a general framework for modeling correlated measurements having different scales (continuous, binary, ordinal)
Underlying Normal Models
- To solve this problem, we can consider the following modification to the measurement model:

    y_ij = g_j(y*_ij; τ_j),  j = 1, ..., p,
    y*_i = μ + Λ η_i + ε_i,  ε_i ~ N_p(0, Σ)

- Here, y_i are the observed variables and y*_i are normal latent variables underlying y_i
- g_j(·; τ_j) = link function, possibly involving threshold parameters τ_j
Link functions in underlying normal models
- For continuous items (i.e., y_ij is continuous), g_j is chosen as the identity link
- For binary items (y_ij ∈ {0, 1}), we choose a threshold link: y_ij = 1(y*_ij > 0)
- For ordered categorical items (y_ij ∈ {1, ..., c_j}), we generalize the binary case to let

    y_ij = Σ_{l=1}^{c_j} l · 1(τ_{j,l-1} < y*_ij ≤ τ_{j,l}),

  with τ_{j,0} = -∞, τ_{j,1} = 0, and τ_{j,c_j} = ∞.
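The three link functions above can be sketched in a few lines. The helper `g` below is a hypothetical illustration (not from the lecture): it maps an underlying normal draw y*_ij to the observed scale for each item type, using the threshold conventions just stated.

```python
import numpy as np

def g(y_star, scale, tau=None):
    """Map an underlying normal y* to the observed scale (illustrative helper)."""
    if scale == "continuous":
        return y_star                      # identity link
    if scale == "binary":
        return int(y_star > 0)             # threshold at zero: y = 1(y* > 0)
    if scale == "ordinal":
        # tau = (tau_0, ..., tau_{c_j}) with tau_0 = -inf, tau_1 = 0, tau_{c_j} = inf;
        # category l is returned when tau_{l-1} < y* <= tau_l
        return int(np.searchsorted(tau, y_star))
    raise ValueError(f"unknown scale: {scale}")
```

For example, with thresholds tau = (-inf, 0, 1.5, inf) defining c_j = 3 categories, an underlying draw y* = 0.7 falls in category 2.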
Some Comments
- Note that we are using an underlying multivariate normal model to characterize dependence in observations having a variety of scales
- In factor analysis, the dependence is induced through shared dependence on the latent factors
- Posterior computation is straightforward using a data augmentation Gibbs sampler, which imputes the y*_ij from their truncated normal full conditional distributions
- The other sampling steps proceed as if the y*_ij were observed data
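The imputation step can be sketched as follows for a binary item. This is an illustrative fragment (names and the simple rejection sampler are my own choices, not the lecture's): given the current mean of y*_ij under the factor model, we draw from a normal truncated to (0, ∞) when y_ij = 1 and to (-∞, 0] when y_ij = 0.

```python
import numpy as np

def trunc_norm(mean, sd, lower, upper, rng):
    """Rejection sampler for N(mean, sd^2) truncated to (lower, upper].
    Fine when the region has non-negligible mass; production code would
    use an inverse-CDF or tailored sampler instead."""
    while True:
        z = rng.normal(mean, sd)
        if lower < z <= upper:
            return z

def impute_ystar(y_ij, mean_ij, sd_ij, rng):
    """Data-augmentation step: draw y*_ij from its truncated normal
    full conditional for a binary item with y_ij = 1(y*_ij > 0)."""
    if y_ij == 1:
        return trunc_norm(mean_ij, sd_ij, 0.0, np.inf, rng)
    return trunc_norm(mean_ij, sd_ij, -np.inf, 0.0, rng)
```

With all y*_ij imputed, the remaining Gibbs updates (loadings, factors, variances) are exactly those of the normal factor model.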
Generalized Latent Trait Models
- The underlying normal specification induces normal linear models on the continuous items and probit-type models on the categorical items
- This structure is restrictive: one may prefer to use a different GLM for each item, while allowing dependence
- Replace the underlying normal measurement model with a generalized latent trait model (GLTM):

    η_ij = μ_j + λ_j′ ξ_i,

  where η_ij = linear predictor in the GLM for outcome type j, and ξ_i = (ξ_i1, ..., ξ_ir)′ = vector of latent traits
Comments on GLTMs
- GLTMs also allow modeling of count outcomes, and more flexible models for the individual items (e.g., logistic, complementary log-log, etc., instead of just probit)
- Important: the latent traits impact both the dependence among the different elements of y_i and the lack of fit in the individual item GLMs
- Harder to fit such models routinely, though adaptive rejection sampling and other tricks are possible
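A small simulation illustrates how a GLTM induces dependence across items of different types. Here a single shared latent trait ξ_i enters a logistic model for a binary item and a log-linear model for a count item; the intercepts and loadings are illustrative values, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
xi = rng.normal(0, 1, n)                      # shared latent trait, r = 1

# Item 1: binary outcome with a logistic link
eta1 = -0.5 + 1.0 * xi                        # eta_i1 = mu_1 + lambda_1 * xi_i
y1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta1)))

# Item 2: count outcome with a log link
eta2 = 0.2 + 0.6 * xi                         # eta_i2 = mu_2 + lambda_2 * xi_i
y2 = rng.poisson(np.exp(eta2))

# Marginally, the shared trait induces positive dependence between the items
corr = float(np.corrcoef(y1, y2)[0, 1])
```

Both loadings are positive, so marginalizing over ξ_i yields positively correlated binary and count outcomes, even though the two items use entirely different GLMs.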
Dangers of GLTMs
- The dual role of the latent variable component, accommodating both dependence and lack of fit in the individual item links, creates problems in interpretation
- Consider the case in which y_ij is a 0/1 indicator of a disease, with i indexing family and j indexing individual within a family
- The following model is commonly used to assess within-family dependence in the probability of disease:

    logit{ Pr(y_ij = 1 | x_i, β, ξ_i) } = x_i′β + ξ_i,  ξ_i ~ N(0, ψ)
Application to Genetic Epidemiology Studies
- ξ_i = difference in the log-odds of disease for family i relative to the population average
- Such differences among families are commonly attributed to genetic effects
- The estimated value of ψ is used to infer the magnitude of the genetic component
- Diseases having small ψ will exhibit limited within-family dependence and hence should have a small genetic component
Genetic Epidemiology Example (continued)
- Anything wrong with this interpretation?
- It has been shown that one can identify the fixed effects, β, and the random effects variance, ψ, even if data are only available for a single individual per family
- How can this be?? The random effect was included to allow within-family dependence; with a single individual per family there is seemingly no need for a random effect
Punch Line
- We have identifiability even with a single individual per family because the induced link function is no longer logistic
- In particular, we have a logistic-normal link function:

    Pr(y_i = 1 | x_i, β, ψ) = g_ψ(x_i′β)
      = ∫ {1 + exp(-x_i′β - ξ_i)}^{-1} (2πψ)^{-1/2} exp( -ξ_i² / (2ψ) ) dξ_i

- The shape of the link function varies as ψ varies, so we can estimate β and ψ even with a single subject per family
Some Further Comments
- Is ψ interpretable as genetic heterogeneity in this case?
- What if we have a few families with multiple individuals, and many with a single individual?
- Answer: ψ measures both lack of fit in the logistic link and heterogeneity among families
- To obtain reliable inferences on genetic heterogeneity, you should use a flexible link function
Some General Comments about GLTMs
- We used the simple random intercept genetic epidemiology example as an illustration
- One needs to worry about these issues just as much in more complex settings involving multivariate outcomes having different scales
- Normal and underlying normal models are more robust to such issues, since one does not change the link in marginalizing out the latent variables
- To allow additional flexibility, one can work within the underlying normal family, e.g., using scale mixtures of normals
Structural Equation Models (Bollen, 1989)
- When interest focuses on modeling relationships among latent traits, factor analysis needs to be extended
- Structural Equation Models (SEMs) provide a broader framework
- Specified in two components: (1) a measurement model relating observed variables to latent traits; (2) a structural model characterizing the joint distribution of the latent traits
Structural Equation Models (Bollen, 1989)
- The measured data consist of a vector of response variables, y_i = (y_i1, ..., y_ip)′, and a vector of predictor variables, x_i = (x_i1, ..., x_iq)′
- Measurement model:

    y_i = μ_y + Λ_y η_i + ε_i^y,  ε_i^y ~ N_p(0, Σ_y)
    x_i = μ_x + Λ_x ξ_i + ε_i^x,  ε_i^x ~ N_q(0, Σ_x)

- This is just two separate factor analysis models, one for y_i and one for x_i
Linear Structural Relations (LISREL) model
- LISREL model:

    η_i = B η_i + Γ ξ_i + δ_i,  ξ_i ~ N_s(0, I_s),  δ_i ~ N_r(0, I_r)

- Describes the joint distribution of, and association among, the latent variables
- B has zeros along the diagonal, following standard notation in the LISREL model (typically, B also has upper triangular elements = 0)
- Γ = often the parameters of primary interest; they characterize the association among the latent predictor and response variables
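A short simulation shows how the two components fit together: the structural model generates the latent traits, and the two measurement models then generate the observed y_i and x_i blocks. All dimensions, loadings, and error scales below are illustrative values, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
r, s = 1, 1                                    # one latent response, one latent predictor

# Structural model: eta_i = B eta_i + Gamma xi_i + delta_i, with B = 0 here
Gamma = np.array([[0.8]])
xi = rng.normal(0, 1, (n, s))                  # xi_i ~ N_s(0, I_s)
eta = xi @ Gamma.T + rng.normal(0, 1, (n, r))  # delta_i ~ N_r(0, I_r)

# Measurement models (loadings and error sds are illustrative)
Lam_y = np.array([[1.0], [0.7], [0.5], [0.9]])
Lam_x = np.array([[1.0], [0.6], [0.8]])
y = eta @ Lam_y.T + rng.normal(0, 0.5, (n, 4))
x = xi @ Lam_x.T + rng.normal(0, 0.5, (n, 3))

# Gamma induces dependence between the y and x blocks through the latent traits
corr = float(np.corrcoef(y[:, 0], x[:, 0])[0, 1])
```

Every observed y-x correlation is channeled through Γ: setting Γ = 0 in the structural model would make the two measurement blocks independent, which is why Γ carries the substantive interpretation.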