Bayesian Mixture Modeling


University of California, Merced July 21, 2014 Mplus Users Meeting, Utrecht

Organization of the Talk

- The Bayesian estimation framework
- Motivating examples: introducing the basic LCA model, illustrated using the Adult California Tobacco Survey
- Priors for LCA and portions of Mplus code, illustrated using the Youth Risk Behavior Survey
- Simulation findings: estimating mixture models in Mplus
- Benefits of Bayes for mixture models, and cautions when using Bayes with mixture models
- Concluding remarks

Introduction to Mixture Modeling

A large part of so-called second-generation SEM is the ability to model different unobserved groups of individuals. These unobserved groups can be captured through mixture models, and substantive differences across the groups can be identified. Mixture modeling has proved to be a useful tool for accounting for heterogeneity within a population, and the flexibility of mixture models has allowed for some innovative modeling techniques. 1 For detailed information about mixture modeling, see: McLachlan, G., & Peel, D. (2004). Finite mixture models. John Wiley & Sons.

Introduction to Mixture Modeling cont.

Mixture models are particularly useful when there is unobserved heterogeneity in the data. In this case, the variables that cause heterogeneity in the data are not known prior to data analysis. The researcher hypothesizes that the population comprises an unknown number of substantively distinct subpopulations (or classes); e.g., depression (subpopulations: none, mild, depressed) or reading achievement (subpopulations: low, average, high). Typically, it is known that the population is heterogeneous, but there is no known indicator of class membership: class membership must be determined based on observed data patterns.

The Bayesian Estimation Framework

The Bayesian estimation framework is a system for estimating models. 2 One of the key differences between frequentist (e.g., ML/EM) and Bayesian estimation is the use of prior distributions.

Posterior ∝ Likelihood × Prior (1)

The prior distribution is moderated by the data, and this relationship produces the posterior distribution (the estimate). 2 For more details about Bayesian estimation, see: Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
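The posterior-prior relation on this slide can be made concrete with a toy grid approximation (not part of the talk; the coin-flip data and grid are invented for illustration):

```python
import numpy as np

# Grid approximation for a coin's heads probability, illustrating
# Posterior ∝ Likelihood × Prior with a flat (diffuse) prior.
grid = np.linspace(0.001, 0.999, 999)   # candidate parameter values
prior = np.ones_like(grid)              # flat prior: every value equally plausible
heads, flips = 7, 10                    # toy data: 7 heads in 10 flips
likelihood = grid**heads * (1 - grid)**(flips - heads)

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()   # normalize to a distribution

posterior_mean = float((grid * posterior).sum())
print(round(posterior_mean, 3))
```

With a flat prior this reproduces the Beta(8, 4) posterior mean of 8/12, showing how the data alone drive the estimate when the prior carries no information.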

Bayesian Estimation cont.

Prior distributions are placed on every model parameter that we wish to estimate. Think of priors as similar to a bet: these distributions represent the amount of uncertainty that we have surrounding the parameters in our model. Specifically, priors represent our opinions about each model parameter.

Bayesian Estimation cont.

Priors allow us to incorporate our (un)certainty about model parameters using probability distributions; for example, Intercept ∼ N(μ, σ²). The μ and σ² terms are called hyperparameters. In addition, we can make an assumption about the particular values the intercept can take on:

- Diffuse: having no idea about the parameter value.
- Informative: having a very strong idea about the parameter value.
- Weak: using less information than is available for the prior.

The specification of these prior distributions is an integral part of using the Bayesian estimation framework.
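The three levels of informativeness can be sketched numerically; the hyperparameter choices below are invented for illustration, not values from the talk:

```python
import numpy as np

# Hypothetical hyperparameter choices for an intercept prior,
# Intercept ~ N(mu, sigma^2), at three levels of informativeness.
priors = {
    "diffuse":     (0.0, 1e10),  # essentially no idea about the value
    "weak":        (0.0, 10.0),  # some information, deliberately vague
    "informative": (2.0, 0.1),   # a strong opinion centered near 2
}

# Width of the central 95% interval implied by each prior: the more
# informative the prior, the narrower the range of plausible values.
widths = {name: 2 * 1.96 * np.sqrt(var) for name, (_, var) in priors.items()}
for name, w in widths.items():
    print(f"{name:12s} 95% prior interval width: {w:.2f}")
```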

A Contrived Example of the Bayesian Framework (figure slide)

Priors: Different Levels of Informativeness (figure slide)

Posterior Chain and Corresponding Density (figure slide)

Motivating Example: LCA

Latent class analysis (LCA) is a method used to capture different response patterns for discrete observed variables. With these types of variables, there can be a very large number of response patterns (e.g., 5 binary items: 2^5 = 32 response patterns). LCA summarizes these response patterns into a few substantively meaningful latent classes.
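The response-pattern count on this slide (5 binary items giving 2^5 = 32 patterns) can be verified by direct enumeration:

```python
from itertools import product

# Enumerate every possible response pattern for 5 binary items.
patterns = list(product([0, 1], repeat=5))
print(len(patterns))  # 32
```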

Latent Class Analysis cont.

Each individual has a probability of belonging to each class, and the individual is assigned to the class corresponding with the highest probability of membership. The assigned class consists of individuals with similar response patterns. LCA provides a succinct description of the latent classes through the different patterns of responses on the observed variables.

LCA: Classes of Former Smokers 3

(Path diagram: latent class variable c with indicators YRSREG, QUITTIME, SMOK6MOS, THINKSMK, STARTAGN, OTHERTOB, CANCER, ADDICTIV, EXPOSURE, CANQUIT, MARRIED.)

Using 2,500 cases from the Adult California Tobacco Survey to identify possible latent groups of former smokers. 3 Clifton, J., Depaoli, S., and Song, A. (under revision). Are all former smokers alike? A latent class analysis.

LCA: Classes of Former Smokers cont.

(Bar chart of estimated prevalence (%): Short-term daily smokers, 17%; Long-term daily smokers, 33%; Stable realists, 36%; Long-term nondaily smokers, 9%; At risk, 5%.)

Priors for a Basic LCA

(Path diagram: latent class variable C with indicators Item 1 through Item 6 and residuals e1 through e6.)

- Response probabilities: Normal prior, N(0, 10^10)
- Latent class proportions: Dirichlet prior, D(10, 10)
- Residual variances: Inverse-Gamma prior, IG(-1, 0)

General Mplus Code for Bayesian Estimation

ANALYSIS:
TYPE = MIXTURE;
ESTIMATOR = BAYES;
CHAINS = 1;
DISTRIBUTION = 50000;
POINT = MODE;
ALGORITHM = GIBBS(PX1); 4
BCONVERGENCE = .05;
BITERATIONS = 50000 (0);
FBITERATIONS = 50000;
THIN = 1;

4 More information about samplers in: Asparouhov, T., and Muthén, B. (2010). Bayesian analysis using Mplus: Technical implementation. Technical Report. Los Angeles: Muthén & Muthén.

Mplus Code for LCA (Collins & Lanza; Youth Risk Behavior Survey 2007, priors from 2005)

MODEL PRIORS:
d1 ~ D(8917, 720);
d2 ~ D(1211, 720);
! c#1
j11 ~ N(-3.132, 0.084);
j12 ~ N(-3.892, 0.112);
j13 ~ N(-5.141, 0.317);
! c#2
j21 ~ N(1.150, 0.174);
j22 ~ N(-0.770, 0.121);
j23 ~ N(-1.746, 0.158);
! c#3
j31 ~ N(0.611, 0.134);
j32 ~ N(0.674, 0.098);
j33 ~ N(-0.163, 0.093);

MODEL:
%OVERALL%
[c#1*2.595] (d1);
[c#2*0.587] (d2);
%c#1%
[cig13$1*3.132] (j11);
[cig30$1*3.892] (j12);
[drive$1*5.141] (j13);
%c#2%
[cig13$1*1.150] (j21);
[cig30$1*0.770] (j22);
[drive$1*1.746] (j23);
%c#3%
[cig13$1*0.611] (j31);
[cig30$1*0.674] (j32);
[drive$1*0.163] (j33);

LCA Simulation Results 5

(Bar chart: average absolute bias for LCA with class proportions .33/.33/.33 and high class likeness, across estimation levels: ML, Diffuse, Informative, Partial, Weak.)

5 Depaoli, S. and Clifton, J. (in preparation). The specification and impact of prior distributions for categorical latent variable models.

LCA Simulation Results cont.

(Bar chart: average absolute bias for LCA with class proportions .60/.30/.10 and high class likeness, across estimation levels: ML, Diffuse, Informative, Partial, Weak.)

Findings: General

In general, the Bayesian approach can produce a drastic improvement in the accuracy of parameter estimates and mixture class proportions, especially when more informative (or weak) priors are specified. This approach can also help identify small but substantively real latent classes. 6 See, e.g., Depaoli (2013); and van de Schoot, Depaoli, van Loey, and Sijbrandij (under review). Integrating background knowledge about traumatic stress experienced after trauma into latent growth mixture models.

Findings: Accurate, Informative Priors

Informative priors perform quite well in simulation, in that they are largely able to uncover small but substantively different mixture classes. 7 Latent class proportions are also well recovered with the use of (weakly) informative Dirichlet priors on the class proportions. 8

7 Depaoli, S. (2013). 8 Depaoli, S. and Clifton, J. (in preparation).

Findings: Inaccurate Priors

Mixture modeling is relatively robust to inaccuracies in prior distributions. 9 This is an important finding: even if the location of a prior distribution is very wrong, the parameter value can still be accurately recovered by moderately increasing the variance hyperparameter of the prior. One area not yet examined is inaccuracy in the Dirichlet prior and the impact this would have on substantive findings. 9 Depaoli, S. (2014).
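The idea that inflating the variance hyperparameter can rescue a badly located prior can be illustrated with a conjugate normal-normal sketch; all numbers below are invented, and this is not the simulation design from the cited studies:

```python
import numpy as np

def post_mean(prior_mu, prior_var, data, data_var=1.0):
    # Conjugate normal-normal posterior mean for an unknown mean
    # with known data variance: a precision-weighted average.
    n = len(data)
    w = (n / data_var) / (n / data_var + 1.0 / prior_var)
    return w * float(np.mean(data)) + (1 - w) * prior_mu

rng = np.random.default_rng(7)
data = rng.normal(loc=5.0, scale=1.0, size=30)   # true mean = 5

# Same wrong prior location (0 instead of 5), two variance settings.
wrong_tight = post_mean(prior_mu=0.0, prior_var=0.1, data=data)
wrong_loose = post_mean(prior_mu=0.0, prior_var=10.0, data=data)

print(wrong_tight, wrong_loose)  # loosening the prior pulls the estimate back toward 5
```

With the tight wrong prior the estimate is dragged well below the true mean; merely widening the same prior lets the data dominate again.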

Findings: Diffuse Priors

Findings have suggested that more informative priors are necessary in the context of mixture modeling. 10 Diffuse priors (e.g., N(0, 10^10)) may have a more harmful impact on parameter estimates than inaccurate priors in the case of mixture modeling. 10 Depaoli (2012); Depaoli (2013).

Concluding Remarks about Bayesian Mixture Modeling in Mplus

The Bayesian estimation framework shows promise in accurately estimating mixture models under various modeling conditions. Recognizing that our priors will undoubtedly contain some level of inaccuracy with respect to the unknown population, it is important to conduct a sensitivity analysis to assess how much impact different levels of the prior have on model results. Openness and transparency are vital when implementing any statistical tool, but this is especially the case for Bayesian tools.

Thank You! Questions or Comments: sdepaoli@ucmerced.edu

Specification of Latent Class Analysis

The probability of membership in latent class c is represented by \gamma_c, with

\sum_{c=1}^{C} \gamma_c = 1. \quad (2)

For a given observed item j, the probability of response r_j given membership in class c is given by an item-response probability \rho_{j, r_j \mid c}. Note that the vector of item-response probabilities for item j conditional on latent class c always sums to 1.0 across all possible responses to item j:

\sum_{r_j = 1}^{R_j} \rho_{j, r_j \mid c} = 1 \quad (3)

for all J observed items.

Specification of Latent Class Analysis cont.

In order to define the LCA model, the probability of a given pattern of responses must be computed. Let y_j represent the j-th element of the observed response pattern, denoted as vector y. Next, let I(y_j = r_j) represent an indicator variable that equals 1 when y_j = r_j and 0 otherwise. Then, the probability of observing a particular set of item responses y can be written as

P(Y = y) = \sum_{c=1}^{C} \gamma_c \prod_{j=1}^{J} \prod_{r_j=1}^{R_j} \rho_{j, r_j \mid c}^{I(y_j = r_j)}. \quad (4)

Specification of Latent Class Analysis cont.

Essentially, Equation 4 indicates that the probability of observing a particular response pattern y is a function of the probability of membership in each of the C latent classes, given by the \gamma_c term, and the probability of each response conditional on latent class membership, denoted by \rho_{j, r_j \mid c}. To make the notation more concrete, Equation 4 can be expanded for observed categorical items j = 1, \dots, 4:

P_{j=1,\dots,4} = \sum_{c=1}^{C} \gamma_c \, \rho_{j=1 \mid c} \, \rho_{j=2 \mid c} \, \rho_{j=3 \mid c} \, \rho_{j=4 \mid c}, \quad (5)

where the probability of a given response pattern for items j = 1, \dots, 4 is a product of the proportion of individuals in latent class c and the response probabilities for observed items j = 1, \dots, 4 conditioned on class membership.
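Equation 4 can be sketched in Python for binary items; the class proportions and response probabilities below are invented purely for illustration:

```python
import numpy as np

# gamma[c] = class proportion; rho[c, j] = P(item j endorsed | class c).
gamma = np.array([0.5, 0.3, 0.2])            # sums to 1, as Equation 2 requires
rho = np.array([[0.9, 0.8, 0.7, 0.9],
                [0.5, 0.4, 0.6, 0.5],
                [0.1, 0.2, 0.1, 0.2]])       # 3 classes x 4 binary items

def pattern_prob(y):
    """P(Y = y): class-weighted product of conditional response probabilities."""
    y = np.asarray(y)
    per_class = np.prod(rho**y * (1 - rho)**(1 - y), axis=1)
    return float(np.dot(gamma, per_class))

# Probabilities over all 2^4 = 16 response patterns must sum to 1.
total = sum(pattern_prob([a, b, c, d])
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(round(total, 6))
```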

Bayesian Estimation

We first set up the joint probability distribution:

P(y, \Theta) = P(y \mid \Theta) P(\Theta). \quad (6)

Bayes' theorem uses this product to compute the probability of \Theta given the observed data y:

P(\Theta \mid y) = \frac{P(y \mid \Theta) P(\Theta)}{\int_\Theta P(y \mid \Theta) P(\Theta) \, d\Theta}, \quad (7)

where P(y \mid \Theta) represents the likelihood (the observed data given the distributional parameters), P(\Theta) represents the prior distribution that is coupled with the likelihood, and P(\Theta \mid y) is the posterior distribution of \Theta.
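For intuition about how the posterior in Equation 7 is explored in practice, here is a minimal Gibbs sampler for a two-class LCA with binary items and conjugate Beta/Dirichlet priors. This is a toy sketch with simulated data, not the sampler Mplus implements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data: two classes with clearly separated endorsement probabilities.
N, J = 200, 4
true_rho = np.array([[0.9, 0.9, 0.8, 0.8],
                     [0.2, 0.1, 0.2, 0.3]])
z_true = rng.integers(0, 2, size=N)
y = (rng.random((N, J)) < true_rho[z_true]).astype(int)

pi = np.array([0.5, 0.5])        # class proportions
rho = np.full((2, J), 0.5)       # conditional response probabilities
draws = []
for it in range(500):
    # 1. Sample class memberships given pi and rho.
    like = np.stack([np.prod(rho[c]**y * (1 - rho[c])**(1 - y), axis=1)
                     for c in range(2)], axis=1) * pi
    p1 = like[:, 1] / like.sum(axis=1)
    z = (rng.random(N) < p1).astype(int)
    # 2. Sample pi from its Dirichlet full conditional (prior D(1, 1)).
    n1 = z.sum()
    pi = rng.dirichlet([1 + N - n1, 1 + n1])
    # 3. Sample each rho[c, j] from its Beta full conditional (prior Beta(1, 1)).
    for c in range(2):
        yc = y[z == c]
        rho[c] = rng.beta(1 + yc.sum(axis=0), 1 + (len(yc) - yc.sum(axis=0)))
    if it >= 250:                # discard burn-in
        draws.append(pi.copy())

print(np.mean(draws, axis=0))    # posterior mean class proportions
```

With well-separated classes the sampler recovers class proportions near the simulated 50/50 split; in real mixture applications label switching and class separation make this step considerably harder, which is part of the talk's motivation for informative priors.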

Inverse-Wishart Specification in Mplus

Three specifications of the inverse Wishart are discussed as non-informative in Asparouhov and Muthén (2010). 12 The first specification is IW(0, -p-1), which is the current default setting in Mplus version 7.2 for covariance matrices and mimics a uniform prior bounded at (-∞, ∞). The second specification is IW(0, 0). The last specification is IW(I, p+1), where this prior mimics the case in which off-diagonal elements (covariances) of the covariance matrix have uniform priors bounded at [-1, 1] and diagonal elements (variances or residual variances) are distributed IG(1, .5).

12 Page 35 of: Bayesian analysis using Mplus: Technical implementation. Technical Report. Version 3.
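Of the three specifications, only IW(I, p+1) is a proper distribution that can be sampled directly (the other two are improper). A quick SciPy sketch checks the property stated above, that its implied marginal correlations spread over [-1, 1]:

```python
import numpy as np
from scipy.stats import invwishart

# Draw covariance matrices from IW(I, p+1) and inspect one implied correlation.
p = 3
draws = invwishart.rvs(df=p + 1, scale=np.eye(p), size=20000,
                       random_state=np.random.default_rng(0))

# Convert each draw's (1,2) element to a correlation.
d = np.sqrt(np.diagonal(draws, axis1=1, axis2=2))
corr01 = draws[:, 0, 1] / (d[:, 0] * d[:, 1])
print(round(float(corr01.mean()), 2),
      round(float(corr01.min()), 2),
      round(float(corr01.max()), 2))
```

The sampled correlations center near 0 and reach close to both -1 and 1, consistent with the uniform-on-[-1, 1] behavior described on this slide.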

Findings: Inverse Wishart Priors

Although not discussed in detail here, it is important to address some points about the inverse Wishart prior, which is commonly implemented with mixture models (for covariances). Changing the default Mplus settings of this prior may create a multivariate prior that is non-positive definite. When the default is changed, you have univariate inverse gamma priors on the diagonals and univariate uniform or normal priors on the off-diagonals. 13 When in doubt, seek advice from a statistician!

13 For more details see: Depaoli and van de Schoot (under review).

References

Asparouhov, T., and Muthén, B. (2010). Bayesian analysis using Mplus: Technical implementation. Technical Report. Los Angeles: Muthén & Muthén.
Clifton, J., Depaoli, S., and Song, A. (under revision). Are all former smokers alike? A latent class analysis.
Depaoli, S. (2014). The impact of inaccurate informative priors for growth parameters in growth mixture modeling. Structural Equation Modeling: A Multidisciplinary Journal, 21, 239-252.
Depaoli, S. (2013). Mixture class recovery in GMM under varying degrees of class separation: Frequentist versus Bayesian estimation. Psychological Methods, 18, 186-219.

References cont.

Depaoli, S. (2012). Measurement and structural model class separation in mixture-CFA: ML/EM versus MCMC. Structural Equation Modeling: A Multidisciplinary Journal, 19, 178-203.
Depaoli, S. and Clifton, J. (in preparation). The specification and impact of prior distributions for categorical latent variable models.
Depaoli, S. and van de Schoot, R. (under review). The WAMBS-checklist: When to worry, and how to avoid the misuse of Bayesian statistics.
Diebolt, J., and Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society, Series B (Methodological), 363-375.

References cont.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
McLachlan, G., and Peel, D. (2004). Finite mixture models. John Wiley & Sons.
Natarajan, R., and McCulloch, C. E. (1998). Gibbs sampling with diffuse proper priors: A valid approach to data-driven inference? Journal of Computational and Graphical Statistics, 7(3), 267-277.
Richardson, S., and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology), 59(4), 731-792.

References cont.

Roeder, K., and Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. Journal of the American Statistical Association, 92(439), 894-902.
van de Schoot, R., Depaoli, S., van Loey, N., and Sijbrandij, M. (under review). Integrating background knowledge about traumatic stress experienced after trauma into latent growth mixture models.