Trajectory analysis. Jan Helmdag University of Greifswald. September 26, Preliminaries Why? How? What to do? Application example

Similar documents
A Stata Plugin for Estimating Group-Based Trajectory Models

2-2: Evaluate and Graph Polynomial Functions

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Marginal Effects in Multiply Imputed Datasets

Polynomial Functions

Title. Description. stata.com. Special-interest postestimation commands. asmprobit postestimation Postestimation tools for asmprobit

9.5. Polynomial and Rational Inequalities. Objectives. Solve quadratic inequalities. Solve polynomial inequalities of degree 3 or greater.

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

Addition to PGLR Chap 6

Latent class analysis and finite mixture models with Stata

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Chapter 4E - Combinations of Functions

Analyzing criminal trajectory profiles: Bridging multilevel and group-based approaches using growth mixture modeling

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

2. We care about proportion for categorical variable, but average for numerical one.

Analyzing Criminal Trajectory Profiles: Bridging Multilevel and Group-based Approaches Using Growth Mixture Modeling

The degree of a function is the highest exponent in the expression

ISQS 5349 Spring 2013 Final Exam

Step 2: Select Analyze, Mixed Models, and Linear.

Computer Vision Group Prof. Daniel Cremers. 3. Regression

Lecture 3 Linear random intercept models

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Implementing contrasts using SAS Proc GLM is a relatively straightforward process. A SAS Proc GLM contrast statement has the following form:

Exploring Marginal Treatment Effects

Higher-Degree Polynomial Functions. Polynomials. Polynomials

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Subject CS1 Actuarial Statistics 1 Core Principles

Lecture notes to Stock and Watson chapter 8

Basic Regressions and Panel Data in Stata

Model Selection for Galaxy Shapes

Estimating Markov-switching regression models in Stata

DEEP, University of Lausanne Lectures on Econometric Analysis of Count Data Pravin K. Trivedi May 2005

Categorical and Zero Inflated Growth Models

Jeffrey M. Wooldridge Michigan State University

Complete the following table using the equation and graphs given:

Model and Working Correlation Structure Selection in GEE Analyses of Longitudinal Data

TABLE OF CONTENTS INTRODUCTION TO MIXED-EFFECTS MODELS...3

STA 4273H: Statistical Machine Learning

Daniel J. Bauer & Patrick J. Curran

Growth Mixture Model

Introduction. A rational function is a quotient of polynomial functions. It can be written in the form

NAME DATE PERIOD. Operations with Polynomials. Review Vocabulary Evaluate each expression. (Lesson 1-1) 3a 2 b 4, given a = 3, b = 2

GUIDED NOTES 4.1 LINEAR FUNCTIONS

Multilevel regression mixture analysis

Generalized Additive Models

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

Index. Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables.

ECON 594: Lecture #6

Analysing repeated measurements whilst accounting for derivative tracking, varying within-subject variance and autocorrelation: the xtiou command

Estimation of pre and post treatment Average Treatment Effects (ATEs) with binary time-varying treatment using Stata

Describing Nonlinear Change Over Time

(b)complete the table to show where the function is positive (above the x axis) or negative (below the x axis) for each interval.

polynomial function polynomial function of degree n leading coefficient leading-term test quartic function turning point

crreg: A New command for Generalized Continuation Ratio Models

Final Exam, Machine Learning, Spring 2009

Math 120 Winter Handout 3: Finding a Formula for a Polynomial Using Roots and Multiplicities

Longitudinal and Multilevel Methods for Multinomial Logit David K. Guilkey

Generalized linear models

GUIDED NOTES 5.6 RATIONAL FUNCTIONS

Section September 6, If n = 3, 4, 5,..., the polynomial is called a cubic, quartic, quintic, etc.

Introduction to Within-Person Analysis and RM ANOVA

STA414/2104 Statistical Methods for Machine Learning II

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

The Simulation Extrapolation Method for Fitting Generalized Linear Models with Additive Measurement Error

A monomial is measured by its degree To find its degree, we add up the exponents of all the variables of the monomial.

Graphs of Polynomials: Polynomial functions of degree 2 or higher are smooth and continuous. (No sharp corners or breaks).

dqd: A command for treatment effect estimation under alternative assumptions

Precalculus. How to do with no calculator 1a)

Data Preprocessing. Cluster Similarity

Modelling the Covariance

Mixed-effects Polynomial Regression Models chapter 5

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Discrete Choice Modeling

4th Quarter Pacing Guide LESSON PLANNING

Multilevel/Mixed Models and Longitudinal Analysis Using Stata

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

PreCalculus Notes. MAT 129 Chapter 5: Polynomial and Rational Functions. David J. Gisch. Department of Mathematics Des Moines Area Community College

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis

Chapter 1- Polynomial Functions

Outline. Clustering. Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models

Warm Up answers. 1. x 2. 3x²y 8xy² 3. 7x² - 3x 4. 1 term 5. 2 terms 6. 3 terms

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

A class of latent marginal models for capture-recapture data with continuous covariates

Describing Within-Person Change over Time

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

7. Estimation and hypothesis testing. Objective. Recommended reading

Robust covariance estimation for quantile regression

Package LCAextend. August 29, 2013

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Multilevel Regression Mixture Analysis

Algebra I Unit Report Summary

Stat 587: Key points and formulae Week 15

Massive Experiments and Observational Studies: A Linearithmic Algorithm for Blocking/Matching/Clustering

Transcription:

Jan Helmdag University of Greifswald September 26, 2017

1 Preliminaries 2 Why? 3 How? 4 What to do? 5 Application example

Nagin, D. (2005). Group-based modeling of development. Harvard University Press.

Preliminaries Prominent procedures for descriptive and exploratory analysis Description of parameters Graphical inspection of time series graphs Multi-dimensional scaling Principal component analysis Cluster analysis Problem: None of these procedures can truly capture the charateristics of time-series data and is built on inferential processes

Why? Finding subgroups/identifying common trajectories in time-series data Growth curve for comparison indivdual vs sample It makes no sense to assume that everyone is increasing (or decreasing) in depression... many persons will never be high in depression, others will always be high, while others become increasingly [or decreasingly] depressed Raudenbush (2001, p. 59) Applications: Modelling path dependency, identifying turning points (recession, child-birth, EU withdrawal) Guiding question: How to best model population heterogeneity of individual-level trajectories?

How? Finite mixture modeling based on maximum likelihood estimation Estimating probabilities for group membership Choice of parametrical form (normal, logit, poisson, zip) Choice of link (polynomials) 1 x 1 linear (constant, increase, decrease) 2 x 2 square ([inverse]-u-shape, single peak) 3 x 3 cubic (two turning points, steep increase or decrease) 4 x 4 (quartic) and x 5 (quintic) are also possible, but seldomly used

What to do? Six steps for conducting a trajectory analysis 1 Reshape dataset 2 Conduct analysis and find highest BIC 3 Reduce the number of parameters and re-estimate model (lowering the degrees of polynomials) 4 Create graphs and inspect trajectories 5 Conduct a multinomial logit analysis with predicted groups as dependent variable 6 Calculate marginal effects and predicted probabilities

1 st step: Reshape dataset Wide format is needed for trajectory analysis To reshape an NxT dataset into wide format type: reshape wide varlist, i(iso) j(year)

2 nd step: Find highest BIC Every model comes with model fit AIC & BIC Choose model with highest BIC (Kass and Raftery 1995; Raftery 1995; Schwarz 1978) BIC rewards parsimony BIC = log(l) 0.5 k log(n) Advanced: determine Bayes factor with e BIC i BIC j (see Nagin 2005, pp. 68 70)

2 nd step: Find highest BIC Example 1: Loop for finding lowest BIC: local best_bic -10^99 if e(bic_n_data) > best_bic { local best_bic e(bic_n_data) } Example 2: Calculating probability for correct model gen nominator = exp(bicb-bicb[_n]) quietly sum nominator gen denominator = r(sum) gen prob_correct = nominator/denominator

3 rd step: Determine # of polynomials Determine shape of polynomials t-tests and graphical inspection The lower the number of polynomials the better In the majority of analyses consistently low and consistently high groups can be modelled with a constant (and no slope/polynomial)

4 th step: Graphs traj.ado comes with command for creating graphs: trajplot, ci ytitle("yvar") xtitle("time") Currently work in progress: ado for creating trajectory graphs (Helmdag 2017)

5 th step: Multinomial logit Why mlogit? Easily performed Standard errors are properly computed Correct estimation of parameter variance and covariance (see Nagin 2005, pp. 115 6) Example with clustered standard errors on unit-level: mlogit depvar indepvars, vce(cluster iso) rrr

6 th step: mlogit post-estimation Loop over margins and combine them into a single plot forvalues i = 1/4 { margins, dydx(indepvar) atmeans predict(outcome( i )) /// saving(marg i, replace) } *ssc install combomarginsplot combomarginsplot marg1 marg2 marg3 marg4 ///, recast(scatter) yline(0)

Application Simple example with randomly generated dataset (n = 200,t = 20) Continuous dependent variable with four groups 1 consistently low at 0 2 linearly increasing from 1 to 20 3 linearly decreasing from 20 to 1 4 consistently high at 20 Every group has a standard deviation of 2

Application traj, var(y1_*) indep(time*) model(cnorm) /// min(-10) max(30) order(3 3 3 3) detail

Maximum Likelihood Estimates Model: Censored Normal (CNORM) Standard T for H0: Group Parameter Estimate Error Parameter=0 Prob > T 1 Intercept 0.01008 0.30999 0.033 0.9741 Linear -0.00790 0.12473-0.063 0.9495 Quadratic 0.00221 0.01363 0.162 0.8714 Cubic -0.00013 0.00043-0.293 0.7698 2 Intercept -0.19149 0.30999-0.618 0.5368 Linear 1.05909 0.12473 8.491 0.0000 Quadratic -0.00573 0.01363-0.420 0.6743 Cubic 0.00015 0.00043 0.362 0.7173 3 Intercept 19.91765 0.30999 64.254 0.0000 Linear -1.03359 0.12473-8.287 0.0000 Quadratic 0.00447 0.01363 0.328 0.7427 Cubic -0.00011 0.00043-0.255 0.7991 4 Intercept 20.01140 0.30999 64.556 0.0000 Linear -0.02211 0.12473-0.177 0.8593 Quadratic 0.00254 0.01363 0.186 0.8521 Cubic -0.00005 0.00043-0.114 0.9094 Sigma 2.00188 0.02244 89.230 0.0000 1 Group membership (%) 24.99987 3.06914 8.146 0.0000 2 (%) 25.00001 3.06913 8.146 0.0000 3 (%) 25.00008 3.06918 8.146 0.0000 4 (%) 25.00003 3.06913 8.146 0.0000 BIC= -8812.30 (N=4000) BIC= -8782.34 (N=200) AIC= -8749.36 L= -8729.36

Application Reducing the number of parameters traj, var(y1_*) indep(time*) model(cnorm) /// min(-10) max(30) order(0 1 1 0) detail

Maximum Likelihood Estimates Model: Censored Normal (CNORM) Standard T for H0: Group Parameter Estimate Error Parameter=0 Prob > T 1 Intercept -0.03201 0.06341-0.505 0.6137 2 Intercept -0.04348 0.13172-0.330 0.7413 Linear 0.99920 0.01100 90.870 0.0000 3 Intercept 19.77917 0.13172 150.159 0.0000 Linear -0.98210 0.01100-89.315 0.0000 4 Intercept 20.03656 0.06341 316.006 0.0000 Sigma 2.00281 0.02242 89.342 0.0000 1 Group membership (%) 24.99938 3.06524 8.156 0.0000 2 (%) 25.00259 3.06542 8.156 0.0000 3 (%) 24.99922 3.06529 8.156 0.0000 4 (%) 24.99881 3.06528 8.155 0.0000 BIC= -8772.69 (N=4000) BIC= -8757.71 (N=200) AIC= -8741.22 L= -8731.22

Application trajplot, ytitle("yvar") xtitle("time") yvar 0 5 10 15 20 0 5 10 15 20 time 1 25.0% 2 25.0%

Application title 30 20 10 0 10 Group 1 0 5 10 15 20 Size: 25.00% No. of obs.: 1,000 Avg. prob.: 100.00% 30 20 10 0 10 Group 2 0 5 10 15 20 Size: 25.00% No. of obs.: 1,000 Avg. prob.: 100.00% 30 20 10 0 10 Group 3 0 5 10 15 20 Size: 25.00% No. of obs.: 1,000 Avg. prob.: 100.00% 30 20 10 0 10 Group 4 0 5 10 15 20 Size: 25.00% No. of obs.: 1,000 Avg. prob.: 100.00% y1_ Median bands Fitted values time