Monday 7th February 2005

Analysis of Pigs data

Data: body weights of 48 pigs at 9 successive follow-up visits, so the measurements are equally spaced. It is always a good habit to reshape the data so that we can easily switch from wide to long, or long to wide, depending on the required analysis. The data are in wide format; let's reshape them into long format.

. set memory 40m
(40960k)

. set matsize 1

. reshape long week, i(id) j(time)
(note: j = 1 2 3 4 5 6 7 8 9)

Data                                   wide   ->   long
-----------------------------------------------------------------
Number of obs.                           48   ->   432
Number of variables                      10   ->   3
j variable (9 values)                         ->   time
xij variables:
                     week1 week2 ... week9    ->   week
-----------------------------------------------------------------

. tsset Id time
       panel variable:  Id, 1 to 48
        time variable:  time, 1 to 9

. xtdes

    Id:  1, 2, ..., 48                                       n =      48
  time:  1, 2, ..., 9                                        T =       9
         Delta(time) = 1; (9-1)+1 = 9
         (Id*time uniquely identifies each observation)

Distribution of T_i:   min     5%    25%    50%    75%    95%    max
                         9      9      9      9      9      9      9

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-----------
        48   100.00  100.00 |  111111111
 ---------------------------+-----------
        48   100.00         |  XXXXXXXXX
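As an aside (not in the original session), in Stata 10 and later the panel declaration and description above are usually done with xtset and xtdescribe; a minimal sketch, assuming the same long-format data:

. ** declare Id as the panel identifier and time as the time variable **
. xtset Id time

. ** describe the participation pattern (here: 48 pigs, each observed at all 9 visits) **
. xtdescribe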

Exploratory Data Analysis

. sort time
. graph week, by(time) box
  ** STATA 8 command : graph box week, by(time) **

[Box plots of week (body weight) by time, visits 1 to 9]

. ** mean trend plot **
. xtgraph week, ti("Mean trend vs time") bar(se)

[Plot of mean weight, with standard-error bars, against time; titled "Mean trend vs time"]

We can see that the mean trend is quite linear across time. Now we make scatter plots of the data: pairwise scatter plots of the data at each pair of time points. For this we require the data in wide format.

. reshape wide week, i(id) j(time)
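As an aside (not part of the original session), xtgraph is a user-written command; a similar mean-trend display can be pieced together from built-in commands. A minimal sketch, to be run while the data are still in long format (i.e. before the reshape wide just above), assuming Stata 8 or later for twoway; the variable names mweek, seweek, hi and lo are hypothetical:

. preserve
. ** mean weight and the standard error of the mean at each visit **
. collapse (mean) mweek=week (semean) seweek=week, by(time)
. gen hi = mweek + seweek
. gen lo = mweek - seweek
. ** connected mean trend with +/- 1 SE bars, roughly what xtgraph, bar(se) shows **
. twoway (rcap hi lo time) (connected mweek time), title("Mean trend vs time")
. restore

preserve/restore keeps the collapse from overwriting the pig-level data.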

(note: j = 1 2 3 4 5 6 7 8 9)

Data                                   long   ->   wide
-----------------------------------------------------------------
Number of obs.                          432   ->   48
Number of variables                       3   ->   10
j variable (9 values)                  time   ->   (dropped)
xij variables:
                                       week   ->   week1 week2 ... week9
-----------------------------------------------------------------

. graph week1 week2 week3 week4 week5 week6 week7 week8 week9, matrix half

[Scatter plot matrix (lower half) of weight at weeks 1 to 9]

What do you conclude from the above scatter plot matrix? A relatively constant, positive linear association in all of the pairwise scatter plots indicates that heavier pigs tend to remain heavier across all time points.

To see whether the pigs gained weight over time, let's draw the line (spaghetti) plot. For this we need the data in long format.

. sort Id time
. graph week time, c(l) s(i)
  ** STATA 8 command : twoway line week time, c(l) s(i) **
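As an aside (not in the original notes), the spaghetti plot shown next can also be produced with Stata's built-in panel plotting command; a minimal sketch, assuming Stata 8 or later and the long-format data already declared with tsset:

. ** one profile per pig, all overlaid in a single panel **
. xtline week, overlay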

[Spaghetti plot: weight against time (weeks 1 to 9), one line per pig]

What do you conclude from the graph? A linear relationship between the outcome and time is shown. Also, there is not much cross-over of the lines, which indicates that the relative order of the pigs, ordered by their weights, remains unchanged over time; this confirms the conclusion we drew from the scatter plot matrix.

The above figure is enough to explore the growth data, but it is hard to pick out individual response profiles. We can add a second display, obtained by first standardizing each observation. This is achieved by subtracting the mean and dividing by the standard deviation of the 48 observations at each time (week). For this we need the data in wide format.

. sort time
. by time: sum week

-> time = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        week |        48       2.283    2.468866          2         31
  --more--

. reshape wide week, i(id) j(time)

. gen sweek1 = (week1-2.2)/2.47
. gen sweek2 = (week2-31.78)/2.79
. gen sweek3 = (week3-38.86)/3.4
. gen sweek4 = (week4-44.39)/3.73
. gen sweek5 = (week5-.16)/4.3

. gen sweek6 = (week6-6.4)/4.4
. gen sweek7 = (week7-62.46)/4.97
. gen sweek8 = (week8-69.3)/.42
. gen sweek9 = (week9-7.22)/6.34

. reshape long week sweek, i(id) j(time)
. sort Id time
. graph sweek time, c(l) s(i)
  ** STATA 8 command : twoway line sweek time, c(l) s(i) **

[Spaghetti plot of the standardized weights (sweek) against time, weeks 1 to 9]

The plot is able to highlight the degree of tracking: animals tend to maintain their relative size over time.

Exploring the correlation structure

Auto-correlation function. For this we require the data in long format.

. autocor week time Id

Table 1. Correlation matrix of week across the nine time points

          |  time1   time2   time3   time4   time5   time6   time7   time8   time9
----------+-------------------------------------------------------------------------
    time1 |     1.
    time2 |   .916      1.
    time3 |    .81    .9118      1.
    time4 |   .798     .984     .982      1.
    time5 |  .7494     .889     .928    .9621      1.
    time6 |    .71     .833      .98    .9327    .9219      1.
    time7 |    .61     .779     .843    .8681     .846    .9633      1.
    time8 |    .62    .7133    .8167    .8293     .814     .928     .986      1.
    time9 |    .81    .6638    .7689     .786     .786    .8893     .917     .969      1.
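As an aside (autocor is a course-provided program), a similar correlation matrix can be obtained from built-in commands by correlating the wide-format weight variables; a minimal sketch, assuming the variables week1-week9 created by the reshape above:

. preserve
. reshape wide week sweek, i(id) j(time)
. ** pairwise correlations of weight across the nine visits **
. corr week1 week2 week3 week4 week5 week6 week7 week8 week9
. restore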

[Autocorrelation scatter plot matrix of week, panels time1 to time9]

Table 2. Autocorrelation function of week by lag (tau)

   lag      acf
     1      .942781
     2      .88716
     3      .8462396
     4      .796276
     5      .772416
     6      .7121489
     7      .6479
     8      .812

[Plot of the autocorrelation function (acf) against lag (tau)]

What is the correlation structure after taking away the covariate's effect?

. regress week time
. predict weekrs, resid
. autocor weekrs time Id

[Autocorrelation scatter plot matrix and ACF plot of the residuals weekrs]

Notice from the main diagonal of the scatter plot matrix that there is positive correlation between repeated observations on the same animal taken 1 week apart. The degree of correlation decreases as the observations move farther from the diagonal. Also, the correlation is reasonably consistent along each diagonal of the matrix. This indicates that the correlation depends more on the time between observations than on their absolute times.

The estimated correlation matrix for these data is given in Table 1. The correlations show some tendency to decrease with increasing time lag. Assuming stationarity, a single correlation estimate can be obtained for each distinct value of the time separation, or lag, tij - tik. This corresponds to pooling observation pairs along the diagonals of the scatter plot matrix. The autocorrelation function takes the values given in Table 2.

. variogram weekrs

Variogram of weekrs (3 percent of v_ijk's excluded)

[Variogram plot: v_ijk against u_ijk]

We can conclude four things from this graph:

1. The variogram starts at 0, which means there is no measurement error.
2. It does not reach the total variance at the end, which means we need to model a random intercept for the correlation.
3. We observe an ascending line, which means there is serial correlation in the data: the correlation between measurements from the same pig gets smaller as the time interval becomes larger.
4. There is no need to model a random slope, since we do not observe multiple lines.

Before we proceed with the analysis, let's look at some theory.

Let yij, j = 1, 2, ..., n, be the sequence of observed measurements on the ith of the m subjects, and let tj, j = 1, 2, ..., n, be the corresponding times at which the measurements are taken on each unit. Associated with each yij are the values xijk, k = 1, 2, ..., p, of p explanatory variables. We assume that the yij are realizations of random variables Yij which follow the regression model

    Yij = β1 xij1 + ... + βp xijp + εij.

In the classical linear model we assume the errors to be mutually independent normal random variables. In our context, the longitudinal structure of the data means that we expect the errors to be correlated within subjects. Let yi = (yi1, yi2, ..., yin) be the observed sequence of measurements on the ith subject and y = (y1, y2, ..., ym) be the complete set of N = nm observations. Let X be the matrix of explanatory variables. Then

    Y ~ MVN(Xβ, σ²V).

The Uniform Correlation Model. In this model we assume that there is a positive correlation between any two measurements on the same unit.

The Exponential Correlation Model. The correlation between a pair of measurements on the same unit decays towards zero as the time separation between the measurements increases. The exponential correlation model is sometimes called the first-order autoregressive model.
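To make the two structures concrete, a standard parameterization (added here for clarity; the symbols rho and phi are not in the original notes) is

    Uniform (exchangeable):   Corr(Yij, Yik) = rho                     for all j != k
    Exponential (AR(1)):      Corr(Yij, Yik) = exp(-phi |tj - tk|),
                              which for equally spaced times reduces to rho^|j-k| with rho = exp(-phi).

Under a random-intercept model the uniform correlation is rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), the quantity reported as rho by xtreg below. These two structures correspond to the corr(exchangeable) and corr(ar1) working correlation options used in the xtgee fits that follow.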

For the data on pigs we fit several models. We require the data in long format.

1. Ordinary least squares, ignoring correlation

. regress week time

      Source |       SS       df       MS              Number of obs =     432
-------------+------------------------------           F(  1,   430) =   77.41
       Model |     1116.882     1    1116.882          Prob > F      =   .
    Residual |   8294.72677   430    19.29622          R-squared     =   .93
-------------+------------------------------           Adj R-squared =   .933
       Total |      1193.69   431   276.927167         Root MSE      =   4.392

        week |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |    6.29896     .81849    7.88     .         6.4938      6.3774
       _cons |     19.361     .46447   42.3      .         18.441      2.2681

2. Independent correlation model

. xtgee week time, i(id) corr(ind)

Iteration 1: tolerance = 2.217e-1

GEE population-averaged model                   Number of obs      =       432
Group variable:                 Id              Number of groups   =        48
Link:                     identity              Obs per group: min =         9
Family:                   Gaussian                             avg =       9.0
Correlation:           independent                             max =         9
                                                Wald chi2(1)       =    784.19
Scale parameter:            19.276              Prob > chi2        =    .

Pearson chi2(432):         8294.73              Deviance           =   8294.73
Dispersion (Pearson):       19.276              Dispersion         =    19.276

        time      6.29896    .81613    76.      .     6.49862    6.369929
       _cons       19.361   .494773   42.13     .       18.4       2.2617

. xtcorr

Estimated within-id correlation matrix R:

          c1      c2      c3      c4      c5      c6      c7      c8      c9
 r1   1.0000
 r2   0.0000  1.0000
 r3   0.0000  0.0000  1.0000
 r4   0.0000  0.0000  0.0000  1.0000

 r5   0.0000  0.0000  0.0000  0.0000  1.0000
 r6   0.0000  0.0000  0.0000  0.0000  0.0000  1.0000
 r7   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000
 r8   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000
 r9   0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000

. xtgls week time, i(id) corr(ind)

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        homoscedastic
Correlation:   no autocorrelation

Estimated covariances      =         1          Number of obs      =       432
Estimated autocorrelations =         0          Number of groups   =        48
Estimated coefficients     =         2          No. of time periods=         9
                                                Wald chi2(1)       =    784.19
Log likelihood             =   -121.21          Prob > chi2        =    .

        time      6.29896    .81613    76.      .     6.49862    6.369929
       _cons       19.361   .494773   42.13     .       18.4       2.2617

3. Uniform correlation model

. xtgee week time, i(id) corr(exc)

Iteration 1: tolerance = .934e-1

GEE population-averaged model                   Number of obs      =       432
Group variable:                 Id              Number of groups   =        48
Link:                     identity              Obs per group: min =         9
Family:                   Gaussian                             avg =       9.0
Correlation:          exchangeable                             max =         9
                                                Wald chi2(1)       =   2337.48
Scale parameter:            19.276              Prob > chi2        =    .

        time      6.29896    .39124   19.18     .    6.133433     6.28639
       _cons       19.361      .974   32.4      .    18.18472       2.261

. xtcorr

Estimated within-id correlation matrix R:

          c1      c2      c3      c4      c5      c6      c7      c8      c9
 r1   1.0000
 r2   0.7717  1.0000
 r3   0.7717  0.7717  1.0000
 r4   0.7717  0.7717  0.7717  1.0000
 r5   0.7717  0.7717  0.7717  0.7717  1.0000
 r6   0.7717  0.7717  0.7717  0.7717  0.7717  1.0000
 r7   0.7717  0.7717  0.7717  0.7717  0.7717  0.7717  1.0000
 r8   0.7717  0.7717  0.7717  0.7717  0.7717  0.7717  0.7717  1.0000
 r9   0.7717  0.7717  0.7717  0.7717  0.7717  0.7717  0.7717  0.7717  1.0000

. xtreg week time, i(id) pa
  (equivalent to the previous one)

Iteration 1: tolerance = 6.283e-1

GEE population-averaged model                   Number of obs      =       432
Group variable:                 Id              Number of groups   =        48
Link:                     identity              Obs per group: min =         9
Family:                   Gaussian                             avg =       9.0
Correlation:          exchangeable                             max =         9
                                                Wald chi2(1)       =   2337.48
Scale parameter:            19.276              Prob > chi2        =    .

        time      6.29896    .39124   19.18     .    6.133433     6.28639
       _cons       19.361      .974   32.4      .    18.18472       2.261

5. Random effects model with random intercept

. xtreg week time, re i(id)

Random-effects GLS regression                   Number of obs      =       432
Group variable (i) : Id                         Number of groups   =        48

R-sq:  within  = .981                           Obs per group: min =         9
       between = .                                             avg =       9.0
       overall = .93                                           max =         9

Random effects u_i ~ Gaussian                   Wald chi2(1)       =     2271.
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    .

        time      6.29896    .39633   18.97     .    6.133333     6.28648
       _cons       19.361     .63139   32.9     .    18.17348      2.3774

     sigma_u     3.891228
     sigma_e      2.96361
         rho        .7723   (fraction of variance due to u_i)

. xtreg week time, i(id) mle

Fitting constant-only model:
Iteration 0:   log likelihood = -13399.842
Iteration 1:   log likelihood = -84.14
Iteration 2:   log likelihood = -8.329
Iteration 3:   log likelihood = -349.929
Iteration 4:   log likelihood = -293.189
Iteration 5:   log likelihood = -2148.9623
Iteration 6:   log likelihood = -1938.3661
Iteration 7:   log likelihood = -183.2441
Iteration 8:   log likelihood = -183.3843
Iteration 9:   log likelihood = -1827.312
Iteration 10:  log likelihood = -1827.212
Iteration 11:  log likelihood = -1827.2118

Fitting full model:
Iteration 0:   log likelihood = -114.977
Iteration 1:   log likelihood = -114.9268
Iteration 2:   log likelihood = -114.9268

Random-effects ML regression                    Number of obs      =       432
Group variable (i) : Id                         Number of groups   =        48

Random effects u_i ~ Gaussian                   Obs per group: min =         9
                                                               avg =       9.0
                                                               max =         9

                                                LR chi2(1)         =    1624.7
Log likelihood = -114.9268                      Prob > chi2        =    .

        time      6.29896    .39124   19.18     .    6.133433     6.28639
       _cons       19.361      .974   32.4      .    18.18472       2.261

    /sigma_u       3.8493    .48114    9.49     .      3.3974     4.64472
    /sigma_e       2.9362     .7471   27.71     .        1.94    2.241694

         rho      .771714    .39399                   .687633    .8413114

Likelihood ratio test of sigma_u=0:  chibar2(1) = 472.6   Prob >= chibar2 = .

6. Exponential correlation model

. xtgls week time, igls corr(ar1) i(id) force

Iteration 1: tolerance =

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        homoscedastic
Correlation:   common AR(1) coefficient for all panels  (.9161)

Estimated covariances      =         1          Number of obs      =       432
Estimated autocorrelations =         1          Number of groups   =        48
Estimated coefficients     =         2          No. of time periods=         9
                                                Wald chi2(1)       =   7867.89
Log likelihood             =   -86.921          Prob > chi2        =    .

        time       6.2727      .771   88.7      .    6.133467     6.41646
       _cons     18.84279    .61148   31.4      .     17.6668      2.1899

. xtgee week time, i(id) corr(ar1) t(time)

Iteration 1: tolerance =
Iteration 2: tolerance = .9237
Iteration 3: tolerance = 4.366e-7

GEE population-averaged model                   Number of obs      =       432
Group and time vars:       Id time              Number of groups   =        48
Link:                     identity              Obs per group: min =         9
Family:                   Gaussian                             avg =       9.0
Correlation:                 AR(1)                             max =         9
                                                Wald chi2(1)       =    624.91
Scale parameter:           19.2674              Prob > chi2        =    .

        time      6.27289     .7932   79.9      .     6.11664     6.42724
       _cons     18.84218    .67471   27.93     .       17.24     2.16431
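As a closing aside (not part of the original lab), in later Stata releases the random-intercept and AR(1) models above can also be fitted with the newer mixed-model and panel commands. A minimal sketch, assuming Stata 9 or later and the long-format data; the session uses both Id and id for the pig identifier, so use whichever name matches your dataset:

. ** random intercept by maximum likelihood (xtmixed in Stata 9/10; the command is called mixed in Stata 13+) **
. xtmixed week time || Id: , mle

. ** random-effects model with AR(1) errors within pig **
. xtregar week time, re

. ** GEE with AR(1) working correlation and robust (sandwich) standard errors **
. xtgee week time, i(id) t(time) corr(ar1) robust

The xtmixed fit should essentially reproduce the xtreg, mle results above, and the robust option gives standard errors that remain valid even if the AR(1) working correlation is misspecified.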