PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA

Francesco Bartolucci 1, Silvia Pandolfi 1, and Fulvia Pennoni 2

1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it, silvia.pandolfi@unipg.it)
2 Department of Statistics and Quantitative Methods, University of Milano-Bicocca (e-mail: fulvia.pennoni@unimib.it)

ABSTRACT: Latent Markov (LM) models are an important tool for the analysis of longitudinal data. We illustrate the main functions of the R package LMest, which is tailored to fitting the basic LM model, and some of its extended formulations, to longitudinal categorical data. The illustration is based on empirical analyses of datasets from a socio-economic perspective.

KEYWORDS: Expectation-Maximization algorithm, forward-backward recursions, mixed models, time-varying unobserved heterogeneity.

1 Introduction

We illustrate the R package LMest (v2.3, available from http://cran.R-project.org/package=LMest), which provides a collection of functions for estimating latent Markov (LM) models for longitudinal categorical data. The package is closely related to the book by Bartolucci et al. (2013), where these models are presented from a methodological point of view, and it is described in detail in the devoted paper by Bartolucci et al. (2017), to which we refer the reader for a more comprehensive overview.

The LMest package has several distinguishing features over existing R packages for similar models. In particular, it is designed for longitudinal data, that is, for (possibly many) i.i.d. replicates of (usually short) sequences of observations, and it handles both univariate and multivariate categorical outcomes. It also accommodates missing responses, including drop-out and non-monotone missingness, under the missing-at-random assumption. Moreover, standard errors for the parameter estimates are obtained by exact computation of the information matrix or through reliable numerical approximations of this matrix. Finally, computationally efficient algorithms for estimation and for prediction of the latent states are implemented via suitable Fortran routines.
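The package is installed from CRAN and loaded in the usual way; the sketches in the following sections assume this session setup.

    # Install once from CRAN, then load the package
    install.packages("LMest")
    library(LMest)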

In the next sections we show how the main functions of the LMest package can be used to estimate the basic LM model and LM models with individual covariates, where the covariates are included in the model through suitable parameterizations. We also briefly show how to perform model selection and local and global decoding. For reasons of space we only mention that additional discrete random effects can be used to formulate mixed LM models, which are estimable through the R function est_lm_mixed. In this case, the initial and transition probabilities of the latent process are allowed to vary across latent subpopulations defined by an additional discrete latent variable.

2 The general latent Markov model formulation

In the following we briefly review the main assumptions of LM models for categorical longitudinal data. For a generic sample unit we consider a vector $Y^{(t)}$ of $r$ categorical response variables at $T$ occasions, so that $t = 1, \ldots, T$. Each response variable $Y^{(t)}_j$ has $c_j$ categories, labeled from $0$ to $c_j - 1$, with $j = 1, \ldots, r$. Also let $\tilde{Y}$ be the vector obtained by stacking $Y^{(t)}$ for $t = 1, \ldots, T$. When available, we denote by $X^{(t)}$ the vector of individual covariates at the $t$-th time occasion and by $\tilde{X}$ the vector of all individual covariates. As usual, capital letters denote random variables or vectors and small letters their realizations.

The general LM model formulation assumes the existence of a latent process $U = (U^{(1)}, \ldots, U^{(T)})$ that affects the distribution of the response variables. This process is assumed to follow a first-order Markov chain with state space $\{1, \ldots, k\}$, where $k$ is the number of latent states. Under the local independence assumption, the response vectors $Y^{(1)}, \ldots, Y^{(T)}$ are conditionally independent given the latent process $U$; moreover, the elements $Y^{(t)}_j$ of $Y^{(t)}$ are conditionally independent given $U^{(t)}$. The parameters of the measurement model are the conditional response probabilities

$$\phi^{(t)}_{jy|ux} = p(Y^{(t)}_j = y \mid U^{(t)} = u, X^{(t)} = x),$$

whereas the parameters of the structural model are the initial and transition probabilities of the latent process:

$$\pi_{u|x} = p(U^{(1)} = u \mid X^{(1)} = x), \qquad \pi^{(t)}_{u|\bar{u}x} = p(U^{(t)} = u \mid U^{(t-1)} = \bar{u}, X^{(t)} = x).$$

Maximum likelihood estimation of LM models is performed through the Expectation-Maximization (EM) algorithm based on forward-backward recursions (Baum et al., 1970; Dempster et al., 1977). The algorithm alternates two steps: obtaining the posterior distribution of the latent states given the observed data (E-step) and updating the parameters by maximizing the expected value of the complete-data log-likelihood (M-step).
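Combining these assumptions gives the manifest distribution of the responses, the quantity that the forward recursion evaluates; this worked-out form is not stated explicitly above, but it follows directly from the model assumptions:

$$p(\tilde{y} \mid \tilde{x}) = \sum_{u^{(1)}, \ldots, u^{(T)}} \pi_{u^{(1)}|x^{(1)}} \prod_{t=2}^{T} \pi^{(t)}_{u^{(t)}|u^{(t-1)}x^{(t)}} \prod_{t=1}^{T} \prod_{j=1}^{r} \phi^{(t)}_{j y^{(t)}_j | u^{(t)} x^{(t)}}.$$

The forward-backward recursions reduce the cost of this sum over $k^T$ latent paths to order $T k^2$, which is what makes EM estimation practical even for moderately large $T$ and $k$.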

3 Basic latent Markov model

The basic LM model rules out individual covariates and assumes that the conditional response probabilities are time homogeneous. In symbols, we have $\phi^{(t)}_{jy|ux} = \phi_{jy|u}$, $\pi_{u|x} = \pi_u$, and $\pi^{(t)}_{u|\bar{u}x} = \pi^{(t)}_{u|\bar{u}}$. The model is fitted by function est_lm_basic, which requires the following main input arguments:

- S: array of response configurations of dimension n × TT × r, where TT is the number of time occasions; missing responses are indicated by NA;
- yv: vector of frequencies of the response configurations;
- k: number of latent states;
- mod: model for the transition probabilities; mod = 0 when these probabilities depend on time, mod = 1 when they are independent of time (i.e., the latent Markov chain is time homogeneous), and mod from 2 to TT when the Markov chain is partially time homogeneous;
- start: equal to 0 for deterministic starting values of the model parameters (the default), 1 for random starting values, and 2 for initial values provided as input arguments.

The output may be shown through the usual print and summary commands, which display, among other quantities, the maximum log-likelihood, the estimated conditional response probabilities (Psi), and the estimated initial (piv) and transition (Pi) probabilities.

The illustration of this function is based on survey data from the Russia Longitudinal Monitoring Survey (for more details on the study see http://www.cpc.unc.edu/projects/rlms-hse and http://www.hse.ru/org/hse/rlms), considering an ordinal response variable related to job satisfaction, measured on a scale ranging from 1 ("absolutely satisfied") to 5 ("absolutely not satisfied").

The function search.model.LM allows us to select the value of k on the basis of the observed data by considering different initializations of the EM algorithm used to maximize the log-likelihood function. In this way, model selection and the multimodality of the likelihood function are addressed jointly. This function can also be applied to the models illustrated in the following two sections; a usage sketch is given below.
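A minimal sketch of a typical session follows. The response matrix S and frequency vector yv are hypothetical stand-ins for data in the format just described (here a univariate response with categories recoded to 0-4); the decoding call at the end anticipates the function presented in Section 6.

    # Hypothetical data: n = 1000 units, TT = 7 occasions, one ordinal
    # response with five categories coded 0, ..., 4
    n  <- 1000; TT <- 7
    S  <- matrix(sample(0:4, n * TT, replace = TRUE), n, TT)
    yv <- rep(1, n)   # each response configuration observed once

    # Basic LM model with k = 3 states and a time-homogeneous
    # transition matrix (mod = 1), deterministic starting values
    out <- est_lm_basic(S, yv, k = 3, mod = 1, start = 0)
    summary(out)
    out$Psi   # estimated conditional response probabilities
    out$piv   # estimated initial probabilities
    out$Pi    # estimated transition probabilities

    # Joint selection of k with multiple EM initializations
    res <- search.model.LM("basic", kv = 1:4, S, yv, mod = 1)

    # Local and global decoding of the latent states (Section 6)
    dec <- decoding(est = out, Y = S)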

4 Covariates in the measurement model

When the individual covariates are included in the measurement model, the latent variables account for the unobserved heterogeneity, that is, the heterogeneity between individuals that we cannot explain on the basis of the observable covariates. In this case, the conditional distribution of the response variables given the latent states may be parameterized by generalized logits. In formulating the model we can rely on the following parameterization, based on global logits, for a single ordinal response variable with $c$ categories:

$$\log \frac{\phi^{(t)}_{y|ux} + \cdots + \phi^{(t)}_{c-1|ux}}{\phi^{(t)}_{0|ux} + \cdots + \phi^{(t)}_{y-1|ux}} = \mu_y + \alpha_u + x' \beta, \quad u = 1, \ldots, k, \; y = 1, \ldots, c-1. \quad (1)$$

In the above expression, the $\mu_y$ are cut-points, the $\alpha_u$ are the support points corresponding to the latent states, and $\beta$ is the vector of regression parameters for the covariates. The model is estimated by function est_lm_cov_manifest, which requires the following main input arguments:

- S: matrix of the observed response configurations (of dimension n × TT) with categories starting from 0;
- X: array of covariates of dimension n × TT × nc, where nc is the number of covariates;
- k: number of latent states;
- mod: type of model to be estimated, coded as mod = "LM" for the model based on parameterization (1); in this case the latent process is of first order, with initial probabilities equal to those of the stationary distribution of the chain. When mod = "FM", the function estimates a model relying on the assumption that the distribution of the latent process is a mixture of AR(1) processes with common variance $\sigma^2$ and state-specific correlation coefficients $\rho_u$ (Bartolucci et al., 2014);
- q: number of support points of the AR(1) structure mentioned above.

We illustrate this function through the analysis of survey data from the Health and Retirement Study conducted by the University of Michigan (for more details on the study see http://hrsonline.isr.umich.edu/). The main response of interest is an ordinal variable concerning self-evaluation of health status, measured on a scale ranging from 1 ("excellent") to 5 ("poor").
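A hedged sketch of corresponding calls, with hypothetical data in place of the Health and Retirement Study responses (recoded to 0-4) and two generic covariates; the value chosen for q is purely illustrative:

    # Hypothetical data: n = 500 units, TT = 8 occasions,
    # responses coded 0, ..., 4, nc = 2 covariates per occasion
    n <- 500; TT <- 8
    S <- matrix(sample(0:4, n * TT, replace = TRUE), n, TT)
    X <- array(rnorm(n * TT * 2), c(n, TT, 2))

    # Global-logit model (1) with k = 2 latent states
    out <- est_lm_cov_manifest(S, X, k = 2, mod = "LM")

    # Mixture of AR(1) processes, with q support points
    # for the AR(1) structure (illustrative choice)
    out_fm <- est_lm_cov_manifest(S, X, k = 2, q = 51, mod = "FM")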

5 Covariates in the latent model

When the covariates are included in the latent model, we suppose that the response variables measure a certain individual characteristic of interest (e.g., well-being), whose evolution is represented by the latent Markov process. This characteristic is not directly observable and is assumed to evolve over time. In such a case, the main research interest is in measuring the effect of the covariates on the latent distribution. In particular, the individual covariates are assumed to affect the initial and transition probabilities of the LM chain through the following multinomial logit parameterization:

$$\log(\pi_{u|x} / \pi_{1|x}) = \beta_{0u} + x' \beta_{1u}, \quad u = 2, \ldots, k,$$

$$\log(\pi^{(t)}_{u|\bar{u}x} / \pi^{(t)}_{\bar{u}|\bar{u}x}) = \gamma_{0\bar{u}u} + x' \gamma_{1\bar{u}u}, \quad t = 2, \ldots, T, \; \bar{u}, u = 1, \ldots, k, \; \bar{u} \neq u,$$

where $\beta_u = (\beta_{0u}, \beta_{1u}')'$ and $\gamma_{\bar{u}u} = (\gamma_{0\bar{u}u}, \gamma_{1\bar{u}u}')'$ are parameter vectors to be estimated, collected in the matrices $\beta$ and $\Gamma$. A more parsimonious model for the transition probabilities is also allowed, based on the difference between two sets of parameters:

$$\log(\pi^{(t)}_{u|\bar{u}x} / \pi^{(t)}_{\bar{u}|\bar{u}x}) = \gamma_{0\bar{u}u} + x' (\gamma_{1u} - \gamma_{1\bar{u}}). \quad (2)$$

In the present case, the covariates are excluded from the measurement model and we adopt the constraint $\phi^{(t)}_{jy|ux} = \phi_{jy|u}$. The above parameterizations are implemented in the R function est_lm_cov_latent, which is based on the following main input arguments:

- S: array of observed response configurations (of dimension n × TT × r) with categories starting from 0; missing responses are coded as NA;
- X1: matrix of covariates affecting the initial probabilities, of dimension n × nc1, where nc1 is the number of corresponding covariates;
- X2: array of covariates affecting the transition probabilities, of dimension n × (TT-1) × nc2, where nc2 is the number of corresponding covariates;
- k: number of latent states;
- param: type of parameterization for the transition probabilities, coded as param = "multilogit" (the default) for the multinomial logit parameterization and as param = "difflogit" for the parameterization based on the difference between two sets of parameters, as in (2).
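Again a hedged sketch with hypothetical data; X1 collects baseline covariates for the initial probabilities and X2 the time-varying covariates for the transition probabilities, in the formats documented above:

    # Hypothetical data: n = 800 units, TT = 5 occasions,
    # r = 1 response with three categories coded 0, 1, 2
    n  <- 800; TT <- 5
    S  <- array(sample(0:2, n * TT, replace = TRUE), c(n, TT, 1))
    X1 <- matrix(rnorm(n * 2), n, 2)                        # n x nc1
    X2 <- array(rnorm(n * (TT - 1) * 2), c(n, TT - 1, 2))   # n x (TT-1) x nc2

    # Multinomial logit parameterization (the default), k = 3 states
    out  <- est_lm_cov_latent(S, X1, X2, k = 3, param = "multilogit")

    # More parsimonious parameterization (2)
    out2 <- est_lm_cov_latent(S, X1, X2, k = 3, param = "difflogit")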

6 Local and global decoding

The prediction of the sequence of latent states for a given sample unit, on the basis of the data observed for that unit, can be performed with function decoding. In particular, the EM algorithm directly provides the estimated posterior probabilities of $U^{(t)}$, namely $p(U^{(t)} = u \mid \tilde{X} = \tilde{x}, \tilde{Y} = \tilde{y})$, for every covariate and response configuration $(\tilde{x}, \tilde{y})$ observed at least once. Maximizing these probabilities yields a prediction of the latent state of every subject at each time occasion $t$; this is the so-called local decoding. In order to track the latent state of a subject across time, the a posteriori most likely sequence of states must be obtained through the so-called global decoding, which is based on an adaptation of the Viterbi algorithm (Viterbi, 1967). Function decoding requires the following input arguments:

- est: object containing the output of one of the following functions: est_lm_basic, est_lm_cov_latent, est_lm_cov_manifest, or est_lm_mixed;
- Y: vector or matrix of responses;
- X1: matrix of covariates affecting the initial probabilities (for est_lm_cov_latent) or affecting the distribution of the responses (for est_lm_cov_manifest);
- X2: array of covariates affecting the transition probabilities (for est_lm_cov_latent).

References

BARTOLUCCI, F., FARCOMENI, A., & PENNONI, F. 2013. Latent Markov Models for Longitudinal Data. Boca Raton: Chapman and Hall/CRC Press.

BARTOLUCCI, F., BACCI, S., & PENNONI, F. 2014. Longitudinal Analysis of Self-Reported Health Status by Mixture Latent Auto-regressive Models. Journal of the Royal Statistical Society: Series C, 63, 267-288.

BARTOLUCCI, F., PANDOLFI, S., & PENNONI, F. 2017. LMest: An R Package for Latent Markov Models for Longitudinal Categorical Data. Journal of Statistical Software, accepted.

BAUM, L. E., PETRIE, T., SOULES, G., & WEISS, N. 1970. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics, 41, 164-171.

DEMPSTER, A. P., LAIRD, N. M., & RUBIN, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion). Journal of the Royal Statistical Society: Series B, 39, 1-38.

VITERBI, A. J. 1967. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Transactions on Information Theory, 13, 260-269.