Lab 4: Two-level Random Intercept Model

BIO 656 Lab4 2009

Data: Peak expiratory flow rate (pefr) measured twice, using two different instruments, for 17 subjects (from Chapter 1 of Multilevel and Longitudinal Modeling Using Stata).

Goals:
1. Review how to fit a random intercept model using xtreg, xtmixed and gllamm.
2. Interpret parameters in a random intercept model.
3. Model measurement error with a random intercept model.
4. Obtain predictions from a multilevel model.

PART I Exploratory Data Analysis

Data Structure:

     +----------------------------+
     | id   wp1   wp2   wm1   wm2 |
     |----------------------------|
  1. |  1   494   490   512   525 |
  2. |  2   395   397   430   415 |
  3. |  3   516   512   520   508 |
  4. |  4   434   401   428   444 |
  5. |  5   476   470   500   500 |

Variables
id: subject id
wp1: Wright peak, occasion 1
wp2: Wright peak, occasion 2
wm1: Mini Wright, occasion 1
wm2: Mini Wright, occasion 2

The dataset is in wide format. Repeated measurements of wp and wm are nested within subject. There are no missing data.

Exploratory Analysis (we will only work with wm for now):

First, calculate the overall mean lung function and store it in a local macro, wm_mean.

. generate mean_wm = (wm1+wm2)/2
. summarize mean_wm

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     mean_wm |        17    453.9118    111.2912      243.5        650

. local wm_mean = r(mean)

Let's display the values of the repeated Mini Wright meter measures of lung function for each subject, together with the overall mean lung function.

. twoway (scatter wm1 id, msymbol(circle)) (scatter wm2 id, msymbol(circle_hollow)), xtitle(Subject Id) ytitle(Mini Wright Measurements) legend(order(1 "Occasion 1" 2 "Occasion 2")) yline(`wm_mean')

[Figure: scatter plot of Mini Wright Measurements (y-axis, 200 to 700) against Subject Id (x-axis, 0 to 20), with separate markers for Occasion 1 and Occasion 2 and a horizontal line at the overall mean.]

Measurements taken from the same person are clustered together. The means of the two observations for each individual appear to be scattered roughly normally around the overall mean. Might this suggest a subject-level random intercept model?

(1) For an individual $i$, the two repeated Mini Wright values ($y_{i1}$ and $y_{i2}$) are trying to capture the same true peak expiratory flow rate ($\beta_i$), which is unobservable.

(2) Let's assume that what we actually measured is the true value ($\beta_i$) plus some random (measurement) error ($\epsilon_{ij}$). So $y_{ij} = \beta_i + \epsilon_{ij}$.

(3) Note that this looks like our typical random intercept model: $y_{ij} = \beta + v_i + \epsilon_{ij}$, where $\beta_i = \beta + v_i$. By writing $\beta_i$ this way, we also allow the model to accommodate pefr from different people.

(4) Now let's include the random components of our model:

A measurement error distribution that is identical for each individual: $\epsilon_{ij} \sim \mathrm{Normal}(0, \sigma^2)$

A distribution describing the variation in the true pefr in the population: $v_i \sim \mathrm{Normal}(0, \tau^2)$

(5) Our final model:

$y_{ij} = \beta + v_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim \mathrm{Normal}(0, \sigma^2), \quad v_i \sim \mathrm{Normal}(0, \tau^2)$

Note that here $\beta$ can be interpreted as the average true pefr in the population (similar to the red line in the above graph). How would you describe the presence of the other model parameters in the scatter plot above?

Reshape Data

We need to reshape the data to long format for the data analysis.

. reshape long wm wp, i(id) j(occasion)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       17   ->      34
Number of variables                   5   ->       4
j variable (2 values)                     ->   occasion
xij variables:
                                wm1 wm2   ->   wm
                                wp1 wp2   ->   wp
-----------------------------------------------------------------------------

     +---------------------+
     | id   occas~n    wm  |
     |---------------------|
  1. |  1         1   512  |   (i = 1, j = 1)
  2. |  1         2   525  |   (i = 1, j = 2)
  3. |  2         1   430  |   (i = 2, j = 1)
  4. |  2         2   415  |   (i = 2, j = 2)
  5. |  3         1   520  |

More Exploratory Analysis:

Let's check some of the distributional assumptions (note that we only have 17 people).

(1) Check $v_i \sim \mathrm{Normal}(0, \tau^2)$:

. sort id
. by id: egen mean_wm = mean(wm)
. hist mean_wm, norm

(2) Check $\epsilon_{ij} \sim \mathrm{Normal}(0, \sigma^2)$:

. gen wm_resid = wm-mean_wm
. hist wm_resid, norm

[Figure: two histograms with normal density overlays. Left: mean_wm (x-axis, 200 to 700). Right: wm_resid (x-axis, -50 to 50).]

PART II Fitting the Model and Interpretation

Fitting the random intercept model with xtreg

. xtreg wm, i(id) mle

Iteration 0: log likelihood = -187.89003
Iteration 1: log likelihood = -184.95979
Iteration 2: log likelihood = -184.76189
Iteration 3: log likelihood = -184.5855
Iteration 4: log likelihood = -184.5784
Iteration 5: log likelihood = -184.57839

Random-effects ML regression                    Number of obs      =        34
Group variable (i): id                          Number of groups   =        17

Random effects u_i ~ Gaussian                   Obs per group: min =         2
                                                               avg =       2.0
                                                               max =         2

                                                Wald chi2(0)       =      0.00
Log likelihood  = -184.57839                    Prob > chi2        =         .

------------------------------------------------------------------------------
          wm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   453.9118   26.18616    17.33   0.000     402.5878    505.2357
-------------+----------------------------------------------------------------
    /sigma_u |   107.0464   18.67858                       76.0406    150.6949
    /sigma_e |   19.91083   3.414659                       14.2269     27.8656
         rho |   .9665602   .0159494                       .910943    .9878545
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 46.27 Prob>=chibar2 = 0.000

Does the estimate of $\beta$ (_cons) = 453.9118 look familiar?

In the output above, $\rho$ (rho) can be interpreted as either

the proportion of the total variance that is between subjects (or due to subjects):

$\rho = \dfrac{\text{variance between}}{\text{total variance}} = \dfrac{\mathrm{Var}(v_i)}{\mathrm{Var}(y_{ij})} = \dfrac{\tau^2}{\tau^2 + \sigma^2}$

or the correlation between the measurements on different occasions for the same subject (intra-class correlation):

$\rho = \mathrm{Corr}(y_{ij}, y_{ij'}) = \dfrac{\mathrm{Cov}(y_{ij}, y_{ij'})}{\sqrt{\mathrm{Var}(y_{ij})}\,\sqrt{\mathrm{Var}(y_{ij'})}} = \dfrac{\tau^2}{\sqrt{\tau^2 + \sigma^2}\,\sqrt{\tau^2 + \sigma^2}} = \dfrac{\tau^2}{\tau^2 + \sigma^2}$

It can be a little confusing because the covariance between measurements on different occasions for the same subject is $\tau^2$, which is also the between-subject variance.

Interpretations

Notice that $\rho$ = .966 is very high! The repeated observations within individuals are highly correlated, and the proportion of the total variance that is between subjects is very large.

/sigma_u is 107.05, the estimate of the standard deviation of the random intercepts. Hence we expect about 95% of the random intercepts to fall within 200 (approximately 107.05*2) units on either side of the estimated overall mean, 453.91, or in other words, between 250 and 650.

The estimated within-subject standard deviation is /sigma_e = 19.9. Hence we expect 95% of the repeated observations on an individual to fall within 40 (approximately 19.9*2) units of the subject-specific mean.

The results from xtreg, mle are equivalent to those from xtmixed, mle. The difference between xtreg and xtmixed is that xtreg is designed more for cross-sectional time-series linear regression and can only be used to fit a random intercept. On the other hand, xtmixed is designed for multilevel mixed-effects linear regression and can be used to fit random coefficients and several levels of random effects.

Fitting the random intercept model with xtmixed

. xtmixed wm || id:, mle

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -184.57839
Iteration 1: log likelihood = -184.57839

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =        34
Group variable: id                              Number of groups   =        17

                                                Obs per group: min =         2
                                                               avg =       2.0
                                                               max =         2

                                                Wald chi2(0)       =         .
Log likelihood = -184.57839                     Prob > chi2        =         .

------------------------------------------------------------------------------
          wm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   453.9118   26.18616    17.33   0.000     402.5878    505.2357
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   107.0464   18.67857       76.0406    150.6949
-----------------------------+------------------------------------------------
                sd(Residual) |   19.91083   3.414679      14.22688    27.86565
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 46.27 Prob >= chibar2 = 0.0000

Fitting the random intercept model with gllamm

. gllamm wm, i(id) nip(12) adapt

Running adaptive quadrature
Iteration 0: log likelihood = -07.70
Iteration 1: log likelihood = -05.79654
Iteration 2: log likelihood = -185.7467
Iteration 3: log likelihood = -184.63453
Iteration 4: log likelihood = -184.57846
Iteration 5: log likelihood = -184.5784

Adaptive quadrature has converged, running Newton-Raphson
Iteration 0: log likelihood = -184.5784
Iteration 1: log likelihood = -184.57839

number of level 1 units = 34
number of level 2 units = 17

Condition Number = 15.64774

gllamm model

log likelihood = -184.57839

------------------------------------------------------------------------------
          wm |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   453.9116   26.18394    17.34   0.000      402.592    505.2312
------------------------------------------------------------------------------

Variance at level 1
  396.70879 (136.11609)

Variances and covariances of random effects
***level 2 (id)
    var(1): 11456.88 (3997.7689)
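As a quick arithmetic check on the interpretation of rho, the intra-class correlation can be recomputed by hand from the standard deviations reported by xtreg and xtmixed. This is only a sketch using the rounded estimates printed above, so the last digit may differ from Stata's own value:

```latex
\hat\rho
  = \frac{\hat\tau^2}{\hat\tau^2 + \hat\sigma^2}
  = \frac{107.0464^2}{107.0464^2 + 19.91083^2}
  \approx \frac{11458.93}{11458.93 + 396.44}
  \approx 0.9666
```

This agrees with the rho row of the xtreg output, and shows that rho is driven almost entirely by the between-subject variance being about 29 times the within-subject variance.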

Note that gllamm returns variances and not standard deviations.

PART III Prediction

Goal 1: What is our best estimate of each subject's true peak expiratory flow rate?

Recall our model:

$y_{ij} = \beta + v_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim \mathrm{Normal}(0, \sigma^2), \quad v_i \sim \mathrm{Normal}(0, \tau^2)$

So we'd like to obtain the estimated value of $\beta + v_i$ for each individual. $\beta$ is given in the output, so we need to extract the $v_i$'s.

Estimating the random intercepts using empirical Bayes and gllamm

. gllapred eb, u
(means and standard deviations will be stored in ebm1 ebs1)
Non-adaptive log-likelihood: -0.5846 -45.1480 -5.1857 -11.35 -199.5193 -190.8173 -186.50 -184.7457 -184.5784 -184.5784
log-likelihood:-184.57839

Empirical Bayes estimate of the subject-specific mean, i.e. $\beta + v_i$

. gllapred eb, linpred
(linear predictor will be stored in eb)
Non-adaptive log-likelihood: -0.5846 -45.1480 -5.1857 -11.35 -199.5193 -190.8173 -186.50 -184.7457 -184.5784 -184.5784
log-likelihood:-184.57839

. reshape wide wm wp eb ebm1 ebs1, i(id) j(occasion)
(note: j = 1 2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       34   ->      17
Number of variables                   8   ->      12
j variable (2 values)          occasion   ->   (dropped)
xij variables:
                                     wm   ->   wm1 wm2
                                     wp   ->   wp1 wp2
                                     eb   ->   eb1 eb2
                                   ebm1   ->   ebm11 ebm12
                                   ebs1   ->   ebs11 ebs12
-----------------------------------------------------------------------------

Let's plot the estimated peak expiratory flow rate:

. twoway (scatter wm1 id, msymbol(circle)) (scatter wm2 id, msymbol(circle_hollow)) (scatter eb1 id, msymbol(x)), xtitle(Subject Id) ytitle(Mini Wright Measurements) legend(order(1 "Occasion 1" 2 "Occasion 2" 3 "EB Subject-Spec Intercept")) yline(`wm_mean')

[Figure: scatter plot of Mini Wright Measurements (y-axis, 200 to 700) against Subject Id (x-axis, 0 to 20), with markers for Occasion 1, Occasion 2 and the EB Subject-Spec Intercept (x).]

Note that the estimated peak expiratory flow rates (x) do not always fall in between the measurements at occasion 1 and occasion 2! Why? (Hint: look at subjects 6 and 13.)

Let's check our model assumptions again with the estimated intercepts and residuals:

. hist eb, norm
. gen eb_resid = wm-eb
. hist eb_resid, norm

[Figure: two histograms with normal density overlays. Left: eb (x-axis, 200 to 700). Right: eb_resid (x-axis, -50 to 50).]
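One way to approach the "Why?" question above is through the shrinkage form of the empirical Bayes predictor. The sketch below assumes the standard result for a random intercept model with $n_i = 2$ measurements per subject and plugs in the rounded xtreg estimates; the symbols $R$ (shrinkage factor), $\tilde v_i$ (predicted random intercept) and $\bar y_{i\cdot}$ (subject mean) are notation introduced here for illustration and are not used elsewhere in the lab:

```latex
\tilde v_i = R\,(\bar y_{i\cdot} - \hat\beta),
\qquad
R = \frac{\hat\tau^2}{\hat\tau^2 + \hat\sigma^2 / n_i}
  = \frac{107.0464^2}{107.0464^2 + 19.91083^2 / 2}
  \approx \frac{11458.93}{11458.93 + 198.22}
  \approx 0.983

\hat\beta + \tilde v_i
  = \hat\beta + R\,(\bar y_{i\cdot} - \hat\beta)
  \quad\text{e.g.}\quad
  453.9118 + 0.983\,(243.5 - 453.9118) \approx 247.1
```

Under these assumptions each subject's prediction is the subject mean pulled about 1.7% of the way toward the overall mean. In the example, which uses the smallest subject mean from the summarize output (243.5), the pull is about 3.6 units. A prediction lands outside a subject's two measurements whenever this pull exceeds half the gap between them, which is most likely for subjects far from the overall mean whose two readings are close together.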

Goal 2: Based on our model, can we make a prediction about a future observation, either a new measurement taken from an existing subject or a new measurement from a new subject?

Extra

The random effect model above is motivated by measurement error. It is similar to the usual LDA setting, where we can view the data as:

[Figure: wm (y-axis, 200 to 700) plotted against occasion (x-axis, 1 to 2).]

To incorporate both wp and wm measurements in a model, we can use a three-level random effect model:

Subject (level 3)
Method (level 2)
Repeated measurements (level 1)

See textbook.
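For orientation, one possible way to write such a three-level model is sketched below. The notation is an assumption made here for illustration ($i$ indexes subjects, $j$ methods within subject, $k$ repeated measurements; $\zeta^{(3)}_i$, $\zeta^{(2)}_{ij}$, $\psi^{(3)}$ and $\psi^{(2)}$ are hypothetical symbols), and the textbook's own specification may differ, for example by adding a fixed effect for method:

```latex
y_{ijk} = \beta + \zeta^{(3)}_{i} + \zeta^{(2)}_{ij} + \epsilon_{ijk},
\qquad
\zeta^{(3)}_{i} \sim \mathrm{Normal}(0, \psi^{(3)}),\quad
\zeta^{(2)}_{ij} \sim \mathrm{Normal}(0, \psi^{(2)}),\quad
\epsilon_{ijk} \sim \mathrm{Normal}(0, \sigma^2)
```

In this sketch $\zeta^{(3)}_i$ plays the role of $v_i$ in the two-level model (a subject's true pefr relative to the population mean), $\zeta^{(2)}_{ij}$ captures a subject-specific deviation for each instrument, and $\epsilon_{ijk}$ is the occasion-level measurement error.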