Multiple Group Analysis. Structural Equation Models. Interactions in SEMs. Multiple Group Analysis


Structural Equation Models: Multiple Group Analysis
Klaus Kähler Holst, Esben Budtz-Jørgensen
Department of Biostatistics, University of Copenhagen
November 5, 0

Multiple Group Analysis

Multiple group analysis extends the structural equation model framework to easily handle:
- Comparing structural equation models across groups (interactions)
- Combining data sets
- Regression modeling with externally estimated parameters

Interactions in SEMs

- Between covariates: include product terms between covariates.
- Between latent variables: $\eta_3 = \beta_1\eta_1 + \beta_2\eta_2 + \beta_3\eta_1\eta_2 + \zeta_3$. The model is not linear in the variables; non-linear SEMs are not available in standard software.
- Between categorical covariates and latent variables: a special case of multiple group analysis.
- Between continuous covariates and latent variables: random slopes (growth models).

Multiple Group Analysis

$x_i = (x_{i1},\dots,x_{iq})^t$: covariates of subject $i$.
$y_i = (y_{i1},\dots,y_{ip})^t$: response variables of subject $i$.
Parameters may depend on a group variable $g = 1,\dots,G$.

The measurement part:
$$y_i = \nu_g + \Lambda_g\eta_i + K_g x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \Omega_g)$$
The structural part:
$$\eta_i = \alpha_g + B_g\eta_i + \Gamma_g x_i + \zeta_i, \qquad \zeta_i \sim N(0, \Psi_g)$$
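Solving the structural part for $\eta_i$ gives the reduced form, which shows the group-specific mean and covariance that the likelihood is built from. The derivation below is a sketch under two assumptions that the slide does not spell out: $I - B_g$ is invertible, and $\epsilon_i$ and $\zeta_i$ are independent.

```latex
\begin{align*}
\eta_i &= (I - B_g)^{-1}\,(\alpha_g + \Gamma_g x_i + \zeta_i),\\
E(y_i \mid x_i) &= \nu_g + \Lambda_g (I - B_g)^{-1}(\alpha_g + \Gamma_g x_i) + K_g x_i,\\
\operatorname{Var}(y_i \mid x_i) &= \Lambda_g (I - B_g)^{-1}\,\Psi_g\,(I - B_g)^{-t}\,\Lambda_g^{t} + \Omega_g .
\end{align*}
```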

Multiple Group Analysis

[Figure: path diagrams for two groups, Females and Males. In each group the latent variable u (residual ζ) is measured by y1, y2, y3 with loadings λ2, λ3 and residual variances v; the regression coefficients β1 and β2 are group-specific.]

Likelihood

For fixed covariates the outcomes are assumed to follow a multivariate normal distribution. The distribution is completely characterized by its density, $f_\theta$, which in turn is completely characterized by its mean and variance.

The MLE principle: we find the parameters for which the observed data are most likely to be observed,
$$\hat\theta_{\mathrm{MLE}} = \arg\max_\theta L(\theta; y, x), \qquad L(\theta; y, x) = \prod_{i=1}^{n} \int_{\mathbb{R}^l} f_\theta(y_i, \eta_i \mid x_i)\, d\eta_i$$

Likelihood

In a simple linear regression model we have $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$.

Density:
$$f(y \mid x; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \beta_0 - \beta_1 x)^2}{2\sigma^2}\right)$$
Likelihood:
$$L(\theta; y, x) = \prod_{i=1}^{n} f(y_i \mid x_i; \theta) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$

[Figure: normal density with mean µ and standard deviation σ.]

Multiple Group Analysis

The model is defined additively from G different (structural equation) models with likelihood functions $L_g$, $g = 1,\dots,G$:
$$\log L(\theta; y, x) = \sum_{g=1}^{G} \log L_g(\theta_g; y_g, x_g)$$
with (possibly non-linear) parameter constraints across the different parameters $\theta_g$, $g = 1,\dots,G$.
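The additive structure of the multiple group log-likelihood can be illustrated with the simple linear regression model. The following is a minimal base R sketch, not lava's implementation; the function names `loglik_group` and `loglik_multigroup` and the toy data are hypothetical.

```r
# Gaussian log-likelihood of a simple linear regression for one group.
# theta = c(beta0, beta1, sigma2)
loglik_group <- function(theta, y, x) {
  sum(dnorm(y, mean = theta[1] + theta[2] * x, sd = sqrt(theta[3]), log = TRUE))
}

# Multiple group log-likelihood: the sum of the group contributions.
# thetas, ys, xs are lists with one element per group.
loglik_multigroup <- function(thetas, ys, xs) {
  sum(mapply(loglik_group, thetas, ys, xs))
}

# Hypothetical toy data for two groups
set.seed(1)
x1 <- rnorm(50); y1 <- 1 + 2.0 * x1 + rnorm(50)
x2 <- rnorm(80); y2 <- 1 + 0.5 * x2 + rnorm(80)

theta <- c(1, 1, 1)
# With identical parameters in both groups, the multigroup log-likelihood
# equals the log-likelihood of the pooled data.
ll_multi  <- loglik_multigroup(list(theta, theta), list(y1, y2), list(x1, x2))
ll_pooled <- loglik_group(theta, c(y1, y2), c(x1, x2))
```

Passing a different `theta` per group corresponds to free (group-specific) parameters; sharing one `theta` corresponds to parameters constrained to be equal across groups.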

Multiple Group Analysis, lava syntax

> e <- estimate(list(m1,m2,m3,...),list(d1,d2,d3,...))
> e <- estimate(rep(list(m),5), split(d,d$x))

- Free parameters are unique for each group
- Named parameters are fixed across groups
- m <- baptize(m): label all parameters
- parameter(m) <- ~alpha+beta: add parameters to the model
- If data are repeated across groups, the cluster argument must be used (GEE type variance estimates).

Multiple Group Analysis, Example

> m <- lvm()
> regression(m) <- c(y1,y2,y3) ~ u
> regression(m) <- u ~ x
> latent(m) <- ~u
> plot(m)

[Figure: path diagram with x pointing to the latent variable u, and u pointing to y1, y2, y3.]

Examining the model

- plot: shows the path diagram
- regression, covariance, intercept, constrain: show parameter restrictions
- summary: prints an overview of adjacency and covariance matrices
- coef(m): show parameter names
- subset(m, ~y1+y2+x): extract sub-model
- m1%+%m2%+%m3: merge models
- path(m, y1~x): extract (directed) pathways between variables
- parents(m, ~y1+y2); children(m, +): parents and children of nodes (union)
- vars(m), exogenous(m), endogenous(m), latent, manifest: extract variable names

> plot(m,labels=TRUE,diag=TRUE)

[Figure: the same path diagram with parameter labels p1,...,p8 on the edges and variance terms.]

Examining the model

> summary(m)
Latent Variable Model with: 5 variables.
Npar=8+4
Regression parameters:
   y1 y2 y3 u  x
y1
y2
y3
u  *  *  *
x           *
Covariance parameters:
   y1 y2 y3 u
y1 *
y2    *
y3       *
u           *
Intercept parameters:
y1 y2 y3 u
*  *  *  *

Multiple Group Analysis, lava syntax

NB: Labeled parameters will not be altered to guarantee identification (lava.options()$param)!

Automatic solution:
> m <- baptize(fixsome(m))
Manual solution:
> regression(m, y~u[0]) <-

Multiple Group Analysis, Example

> m <- baptize(m)
> summary(m)
...
Regression parameters:
   y1    y2    y3    u    x
y1
y2
y3
u  y1<-u y2<-u y3<-u
x                    u<-x
Covariance parameters:
   y1      y2      y3      u
y1 y1<->y1
y2         y2<->y2
y3                 y3<->y3
u                          u<->u
Intercept parameters:
y1 y2 y3 u
y1 y2 y3 u

Multiple Group Analysis, example

We continue the example by adding some additional constraints and freeing some parameters for the multigroup analysis:

> intercept(m,endogenous(m)) <- 0
> covariance(m,endogenous(m)) <- 0
> regression(m,u~x) <- NA
> covariance(m,~u) <- NA

> summary(m)
...
Regression parameters:
   y1 y2    y3    u x
y1
y2
y3
u     y2<-u y3<-u
x                 *
Covariance parameters:
   y1 y2 y3 u
y1 v
y2    v
y3       v
u           *
Intercept parameters:
y1 y2 y3 u
         u

Multiple Group Analysis, example

[Figure: path diagrams for the Females and Males groups. The latent variable u (residual ζ) is measured by y1, y2, y3 with loadings λ2, λ3 and residual variances v shared across groups; the regression coefficients β1 and β2 are group-specific.]

Multiple Group Analysis, estimation

> e <- estimate(list(males=m,females=m),list(d1,d2))
> e

Available methods: summary, coef, vcov, confint, compare, plot, score, logLik, ...

Note that a meaningful test of equality of β1 and β2 requires measurement invariance (the latent variables should measure the same thing, i.e. equal factor loadings).

Multiple Group Analysis, estimation

Multigroup analysis
Group 1: Males (n=00)
                    Estimate Std. Error Z value Pr(>|z|)
Measurements:
   y2<-u                                         <e-
   y3<-u                                         <e-
Regressions:
   u<-x                                          <e-
Intercepts:
   u
Residual Variances:
   y1
   u
Group 2: Females (n=50)
                    Estimate Std. Error Z value Pr(>|z|)
Measurements:
   y2<-u                                         <e-
   y3<-u                                         <e-
Regressions:
   u<-x                                          <e-
Intercepts:
   u
Residual Variances:
   y1
   u

Wald test and LRT via compare:

> e <- estimate(m,rbind(d1,d2))
> compare(e,e)

   Likelihood ratio test
data:
chisq = .8685, df = , p-value = .08e-05
sample estimates:
log likelihood (model 1) log likelihood (model 2)

Omnibus χ²-test: compare(e)

Non-linear constraints are possible. However, parameters must be added across all groups with the parameter method.
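The likelihood ratio test reported by compare rests on a simple calculation: twice the difference in maximized log-likelihoods, referred to a χ² distribution with degrees of freedom equal to the number of constraints. A minimal base R sketch follows; the helper `lrt` and the log-likelihood values are hypothetical and are not taken from the slide output.

```r
# Likelihood ratio test from two maximized log-likelihoods.
# ll0: restricted model, ll1: larger model,
# df: difference in the number of free parameters.
lrt <- function(ll0, ll1, df) {
  stat <- 2 * (ll1 - ll0)
  c(chisq = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Hypothetical log-likelihood values, for illustration only
res <- lrt(ll0 = -1510.2, ll1 = -1505.0, df = 2)
res
```

Here the restricted model would be the single-group fit on the stacked data and the larger model the multigroup fit with group-specific parameters.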

Combining different models

Stacking different models and datasets together...
- $M_1(\theta, \psi)$: primary model with parameters $(\theta, \psi)$, where $\psi$ is a nuisance parameter (may be unidentified) and $\theta$ is the parameter of interest.
- $M_2(\theta, \psi)$: model used to estimate the nuisance parameter $\psi$.

Example: measurement error model
True model: $Y = \beta_0 + \beta_1 X + \epsilon$, but instead of $X$ we observe $W_1 = X + U_1$.
In an independent dataset we in addition observe $W_1 = X + U_1$ and $W_2 = X + U_2$.
Solution: multiple group analysis (a two-stage approach that ignores the uncertainty in the first-stage estimates leads to standard errors that are too small!)
NB: Same principle as missing data analysis in SEM.

Combining different models

> m <- lvm(c(y,w1,w2)~x)
> d1 <- sim(m,000,p=c("y<-x"=-))
> d2 <- sim(m,000)

> estimate(y~x,d1)
                    Estimate Std. Error Z-value P-value
Regressions:
   y<-x                                         <e-
Intercepts:
   y
Residual Variances:
   y

> estimate(y~w1,d1)
                    Estimate Std. Error Z-value P-value
Regressions:
   y<-w1                                        <e-
Intercepts:
   y
Residual Variances:
   y

Measurement error

$Y$: response, $X$: true exposure, $W$: measured exposure.

[Figure: scatter plot of Y against the true exposure X and the measured exposure W.]

Measurement error

True data generating mechanism:
$$Y = \beta_0 + \beta_1 X + \epsilon$$
Assume $X$ has finite variance $\sigma_x^2$ and $X \perp \epsilon \sim N(0, \sigma_\epsilon^2)$.
$$W = X + U, \qquad U \sim N(0, \sigma_u^2), \qquad U \perp X$$

Naïve analysis
- Naïve analysis replacing $X$ with $W$ attenuates the effect.
- Bias increases with the degree of imprecision.

$$Y = \beta_0 + \beta_1 W + \xi, \qquad \xi = \epsilon - \beta_1 U$$
Violation of linear model assumptions?
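The attenuation caused by the naïve analysis can be checked numerically. The simulation below is a base R sketch with hypothetical parameter values (they are not the values used on the slides); it compares the naïve slope from regressing Y on W with the attenuated target $\beta_1\sigma_x^2/(\sigma_x^2 + \sigma_u^2)$.

```r
set.seed(42)
n       <- 1e5
beta0   <- 1; beta1 <- 2      # hypothetical regression parameters
sigma_x <- 1; sigma_u <- 1    # SDs of true exposure and measurement error

x <- rnorm(n, sd = sigma_x)          # true exposure (unobserved in practice)
w <- x + rnorm(n, sd = sigma_u)      # measured exposure W = X + U
y <- beta0 + beta1 * x + rnorm(n)    # response

naive  <- unname(coef(lm(y ~ w))["w"])          # slope from the naive analysis
lambda <- sigma_x^2 / (sigma_x^2 + sigma_u^2)   # attenuation factor
c(naive = naive, expected = lambda * beta1)
```

With equal variances for exposure and measurement error the attenuation factor is one half, so the naïve slope should land near half of `beta1`.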

Measurement error

The MLE obtained by regressing $Y$ on $W$ is an unbiased estimate of
$$\lambda\beta_1 = \operatorname{Cov}(Y, W)/\operatorname{Var}(W), \qquad \text{where } \lambda = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} < 1.$$

Introducing confounder $Z$:
$$Y = \beta_0 + \beta_1 X + \beta_z Z + \epsilon, \qquad W = X + U$$
Naïve analysis consistently estimates
$$\beta_1 \frac{\operatorname{Var}(X \mid Z)}{\operatorname{Var}(X \mid Z) + \operatorname{Var}(U)}$$
The bias is large when the variance of the exposure is low (for fixed levels of the confounders) and the imprecision is high.

Combining different models

m1: primary model
m2: model used to estimate the nuisance parameter (from an independent data set)

> m2 <- lvm()
> regression(m2, c(w1,w2)~u) <- 1
> intercept(m2,~w1+w2) <- "m"
> covariance(m2,~w1+w2) <- "v"
> covariance(m2,~u) <- "vu"
> intercept(m2,~u) <
> m1 <- kill(m2,~w2)
> regression(m1) <- y~u
> estimate(list(m1,m2),list(d1,d2))

Combining different models

Group 1 (n=000)
                    Estimate Std. Error Z value Pr(>|z|)
Measurements:
   y<-u                                         <e-
Intercepts:
   w1
   y
Residual Variances:
   w1
   u
   y
Group 2 (n=000)
                    Estimate Std. Error Z value Pr(>|z|)
Intercepts:
   w1
Residual Variances:
   w1
   u

Mixture Models

Multiple group analysis: known groups
Mixture SEM: known number of unknown groups

> library(lava.mixture)
> mixture(list(m1,m2,m3),data=d)
> mixture(m,data=d,k=3)

- Modelling of heterogeneity
- Estimation by EM-algorithm (slower than multiple group)
- Controversy in choosing number of components
- Technical problems (convergence and boundedness of likelihood)
- Impact of model misspecification?

Twin Studies

There is considerable interest in finding out how much of specific traits and diseases is inherited. Family and twin studies can be used to shed light on the genetic and environmental influence.

Twin studies
- Include both monozygotic (MZ) and dizygotic (DZ) twin pairs.
- DZ pairs on average share half of their genes.
- MZ pairs are natural copies.
- A difference in similarity between DZ and MZ twins may indicate genetic influence!

[Figure: scatter plots of birth weight of twin against birth weight of cotwin, in separate panels for DZ and MZ pairs.]

Twin similarity

Similarity: the difference in (product-moment) correlation within pairs of MZ and DZ twins is our measure of similarity, i.e. the difference in the amount of between-pair variance relative to the total variance of the phenotype. Higher correlation in MZ pairs indicates genetic influence.

[Figure: twin similarity, panels DZ and MZ.]

Decomposition

What is the contribution of genetic and environmental factors to the variation in the outcome?

The phenotype is the sum of genetic and environmental effects:
$$Y = G + E$$
Idea: decompose the variance into genetic and environmental components:
$$\Sigma_Y = \Sigma_G + \Sigma_E$$

[Figure: densities of Y.]

Polygenic model for continuous trait

ACDE model: decompose the outcome into
$$Y_i = A_i + D_i + C + E_i, \qquad i = 1, 2$$
- A: additive genetic effects of alleles
- D: dominant genetic effects of alleles
- C: shared environmental effects
- E: unique environmental effects

Dissimilarity of MZ twins arises from unshared environmental effects only! $\operatorname{Cor}(E_1, E_2) = 0$ and
$$\operatorname{Cor}(A_1^{MZ}, A_2^{MZ}) = 1, \quad \operatorname{Cor}(D_1^{MZ}, D_2^{MZ}) = 1, \quad \operatorname{Cor}(A_1^{DZ}, A_2^{DZ}) = 0.5, \quad \operatorname{Cor}(D_1^{DZ}, D_2^{DZ}) = 0.25$$

Assumptions
- No gene-environment interaction
- No gene-gene interaction
- Same marginals of twin 1 and twin 2, and of MZ and DZ
- Equal environmental effects for MZ and DZ

Polygenic model

$$Y_i = A_i + C_i + D_i + E_i$$
$$A_i \sim N(0, \sigma_A^2), \quad C_i \sim N(0, \sigma_C^2), \quad D_i \sim N(0, \sigma_D^2), \quad E_i \sim N(0, \sigma_E^2)$$

$$\operatorname{Cov}(Y_1, Y_2) = \begin{pmatrix} \sigma_A^2 & Z_A\sigma_A^2 \\ Z_A\sigma_A^2 & \sigma_A^2 \end{pmatrix} + \begin{pmatrix} \sigma_C^2 & \sigma_C^2 \\ \sigma_C^2 & \sigma_C^2 \end{pmatrix} + \begin{pmatrix} \sigma_D^2 & Z_D\sigma_D^2 \\ Z_D\sigma_D^2 & \sigma_D^2 \end{pmatrix} + \begin{pmatrix} \sigma_E^2 & 0 \\ 0 & \sigma_E^2 \end{pmatrix}$$
where
$$Z_A = \begin{cases} 1, & \text{MZ} \\ 0.5, & \text{DZ} \end{cases} \qquad \text{and} \qquad Z_D = \begin{cases} 1, & \text{MZ} \\ 0.25, & \text{DZ} \end{cases}$$

[Figure: path diagram. The latent components A, D, E, C of each twin point to Y1 and Y2 with loadings λA, λD, λE, λC; the A components of the two twins are correlated 0.5 (DZ) or 1 (MZ), and the D components 0.25 (DZ) or 1 (MZ).]
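The covariance decomposition above can be written as a small function that returns the model-implied covariance matrix of a twin pair for each zygosity. This is a base R sketch; the function name `acde_cov` and the variance components are hypothetical illustration values.

```r
# Model-implied covariance matrix of (Y1, Y2) under the ACDE model.
# va, vc, vd, ve: variance components; zyg: "MZ" or "DZ".
acde_cov <- function(va, vc, vd, ve, zyg = c("MZ", "DZ")) {
  zyg <- match.arg(zyg)
  za <- if (zyg == "MZ") 1 else 0.5    # correlation of additive genetic effects
  zd <- if (zyg == "MZ") 1 else 0.25   # correlation of dominance effects
  between <- za * va + vc + zd * vd    # covariance between the two twins
  total   <- va + vc + vd + ve         # variance of each twin
  matrix(c(total, between, between, total), 2, 2)
}

# Hypothetical variance components (an ACE model, vd = 0)
S_mz <- acde_cov(0.4, 0.3, 0, 0.3, "MZ")
S_dz <- acde_cov(0.4, 0.3, 0, 0.3, "DZ")
cov2cor(S_mz)[1, 2]  # within-pair correlation, MZ
cov2cor(S_dz)[1, 2]  # within-pair correlation, DZ
```

The two matrices differ only in their off-diagonal element, which is why the MZ and DZ groups together carry the information needed to separate genetic from shared environmental variance.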

Polygenic model

[Figure: path diagram of the polygenic model, as above, with loadings λA, λD, λE, λC on Y1 and Y2.]

Obviously this is a structural equation model, which is easily implemented in lava:

> m <- lvm()
> regression(m, c(y1,y2) ~ A1+A2) <- c("a","a")
> ...

The differences in covariance between zygosities can be defined using a multigroup analysis...

> mz <- dz <- m
> covariance(mz,A1~A2) <- 1
> covariance(dz,A1~A2) <- 0.5
> estimate(list(mz,dz),split(twinwide,twinwide$zyg.))

... or you could cheat and use the mets package.

Polygenic model

With only MZ and DZ twins we can identify only three of the variance components. Often the following strategy is applied:
1. Estimate the ACE model
2. Compare with the AE model
3. If C can be omitted, estimate the ADE model

Identification is possible by extending the design to include adoptive siblings or additional family members.

Heritability

Heritability:
$$H_Y^2 = \frac{\operatorname{Var}(G)}{\operatorname{Var}(Y)} = \frac{\sigma_A^2 + \sigma_D^2}{\sigma_A^2 + \sigma_C^2 + \sigma_D^2 + \sigma_E^2}$$
Narrow-sense heritability:
$$h_Y^2 = \frac{\operatorname{Var}(A)}{\operatorname{Var}(Y)}$$
Shared environmental effect:
$$c_Y^2 = \frac{\sigma_C^2}{\sigma_A^2 + \sigma_C^2 + \sigma_D^2 + \sigma_E^2}$$
In the ACE model the heritability is given by
$$h^2 = 2(\rho_{MZ} - \rho_{DZ})$$
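The ACE formula for $h^2$ follows directly from the within-pair correlations implied by the model. A short derivation, writing $\sigma_Y^2 = \sigma_A^2 + \sigma_C^2 + \sigma_E^2$ (no dominance component) and $e^2$ for the unique environmental share, a symbol not used on the slide:

```latex
\begin{align*}
\rho_{MZ} &= \frac{\sigma_A^2 + \sigma_C^2}{\sigma_Y^2} = h^2 + c^2,
&
\rho_{DZ} &= \frac{\tfrac{1}{2}\sigma_A^2 + \sigma_C^2}{\sigma_Y^2} = \tfrac{1}{2}h^2 + c^2,\\
\rho_{MZ} - \rho_{DZ} &= \tfrac{1}{2}h^2
&
\Rightarrow\quad h^2 &= 2(\rho_{MZ} - \rho_{DZ}),\\
c^2 &= 2\rho_{DZ} - \rho_{MZ},
&
e^2 &= 1 - \rho_{MZ}.
\end{align*}
```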

Gene-Environment Interactions

Covariate $X$ modifying $h^2$:
$$\log(\sigma_A^2) := \alpha_A + \gamma_A X$$
$$\log(\sigma_C^2) := \alpha_C + \gamma_C X$$
$$\log(\sigma_E^2) := \alpha_E + \gamma_E X$$

For categorical $X$, this is easily implemented in lava using multigroup analysis. For continuous $X$ (much slower):

> constrain(m, va ~ x+alpha+gamma) <- function(x) x[2]+x[1]*x[3]

or using random slopes, i.e. $\lambda_A := \alpha_A + \gamma_A X$:

> regression(m) <- c(y1,y2) ~ f(a,x)

Multivariate Analysis of Twin Data

How should we proceed with multiple traits?

Why?
- Models for comorbidity
- Gain in efficiency
- Heritability pathways

[Figure: path diagrams for bivariate twin models, with A, C and E components for the two traits Y and Z in each twin.]
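To see how a covariate can modify heritability under the log-linear variance model, the narrow-sense heritability can be evaluated as a function of x. This is a base R sketch for the ACE case; the function name `h2_given_x` and all parameter values are hypothetical.

```r
# Narrow-sense heritability as a function of a covariate x when the
# log-variances are linear in x (ACE model, no dominance component).
# alpha, gamma: named vectors with elements "A", "C", "E".
h2_given_x <- function(x, alpha, gamma) {
  va <- exp(alpha[["A"]] + gamma[["A"]] * x)
  vc <- exp(alpha[["C"]] + gamma[["C"]] * x)
  ve <- exp(alpha[["E"]] + gamma[["E"]] * x)
  va / (va + vc + ve)
}

# Hypothetical parameter values: only the additive genetic variance
# depends on x.
alpha <- c(A = 0, C = -1, E = -0.5)
gamma <- c(A = 0.3, C = 0, E = 0)
h2_given_x(c(-1, 0, 1), alpha, gamma)
```

With a positive slope for the additive genetic variance and zero slopes for the other components, heritability increases with x; setting all slopes to zero recovers a constant heritability.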

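Cholesky Factor Model

[Figure: path diagram of the Cholesky factor model, with components A1, A2, A3, C1, C2, C3 and E1, E2, E3 for the traits Y1, Y2, Y3.]

Independent pathways (biometric common factors)

[Figure: path diagram with common A, C, E factors and trait-specific A, C, E components for Y1, Y2, Y3.]

Common pathways (psychometric common factors)

[Figure: path diagram with A, C, E acting on a common latent factor η measured by Y1, Y2, Y3, plus trait-specific A, C, E components.]

Hjelmborg et al., Obesity 2008

Longitudinal biometric analysis

[Figure: growth model with latent intercept I and slope S, each with A, C, E components, measured by Y1, ..., Yp at times t1, ..., tp.]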

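Longitudinal biometric analysis

[Figure: growth model with latent intercept I and slope S, each with A, C, E components, measured by Y1, ..., Yp at times t1, ..., tp.]

Non-normal traits

- Dichotomous: probit model / threshold model
- Censoring: Tobit model / threshold model
- Inverse probability weights

Available with the packages lava.tobit and mets.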
