BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data


Purpose: Introduce basic two-level models for normally distributed responses using STATA. In particular, we discuss:
- Random intercept models without covariates
- Random intercept models with covariates
- Random intercept models with covariates and random slopes
- Cross-level interactions

We will use the following 5 variables in the popular.dta dataset:
- pupil: pupil identification number
- school: school identification number
- popular: the outcome variable popularity (Y), measured by a self-rating scale that ranges from 0 (very unpopular) to 10 (very popular)
- sex: the pupil's sex, 0 = boy, 1 = girl
- texp: teacher experience in years

The data are from 2000 pupils from 100 classes (each at a different school); the average number of students per school (class) is 20. Therefore, we have pupils nested within schools, and we need to account for the possible correlation between pupils in the same school in our model.

Initial EDA: Our outcome variable of interest is the self-reported popularity score:

. hist popular, freq
(bin=33, start=2, width=.21212121)

[Figure: histogram (frequency) of popular; x-axis: popularity according to sociometric score]

Q1. What is the average self-rating score?

Two-stage model of the popularity score of pupil (j) in school (i)

MODEL 1: The intercept-only (variance components) model:

$$\text{popular}_{ij} \mid U_i = \beta_0 + U_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \qquad U_i \sim N(0, \tau^2)$$

Interpretations of model parameters:
- $\beta_0 + U_i$: average score for pupils in school i
- $\beta_0$: average score for pupils in a typical school ($U_i = 0$)
- $U_i$: school-level random intercept (random effect). Represents the difference between the average popularity score for a specific school (i) and the average popularity score of a typical ($U_i = 0$) school.
- $\tau^2$: variance of the random intercept. Represents the variability (or amount of dispersion) in specific schools' average scores around the average score of the typical school.
- $\varepsilon_{ij}$: difference between the popularity score for child (j) and the average popularity score in school (i)
- $\sigma^2$: variance of the error. Represents the variability of individual children's scores in school (i) around the average score for school (i).
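Model 1 also fixes the correlation between two pupils in the same school, the intraclass correlation (ICC) that Stata reports as rho. A short derivation follows, assuming that $U_i$ and $\varepsilon_{ij}$ are independent of each other (the usual assumption for this model, though it is not stated explicitly above).

```latex
\begin{align*}
\operatorname{Var}(\text{popular}_{ij})
  &= \operatorname{Var}(U_i) + \operatorname{Var}(\varepsilon_{ij})
   = \tau^2 + \sigma^2 \\
\operatorname{Cov}(\text{popular}_{ij}, \text{popular}_{ik})
  &= \operatorname{Var}(U_i) = \tau^2 \qquad (j \neq k) \\
\rho = \operatorname{Corr}(\text{popular}_{ij}, \text{popular}_{ik})
  &= \frac{\tau^2}{\tau^2 + \sigma^2}
\end{align*}
```

So the ICC is the share of the total variance in popularity scores that lies between schools.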

(At least) three ways to fit this multilevel model for a continuous outcome using STATA, in order of generality:

1. xtreg, re mle
- estimates model parameters by finding the maximum of a closed form solution for the likelihood (fast)
- can only do two-level models
- can only do a random intercept, not random slopes
- only for continuous outcomes
- using the pa option for xtreg is equivalent to using xtgee, so with xtreg you can easily switch from doing a cluster-specific random intercept model to doing a marginal GEE model
- automatically gives you the ICC (rho) in the output
- by default, returns the sd of the random intercept

2. xtmixed, mle
- estimates model parameters by finding the maximum of a closed form solution for the likelihood (fast)
- very close to being equivalent to xtreg, re mle
- as many levels as you want (within reason!!)
- can do random intercepts and slopes
- only for continuous outcomes
- by default returns the sd of the random effects

3. gllamm
- estimates model parameters using quadrature methods to approximate the likelihood function, and then finds the maximum of the approximated likelihood (slow!)
- results won't be exactly the same as xtmixed, but they should be close. You can increase nip() and use the adapt option to get better estimates in gllamm (even slower!)
- can do many levels
- can do random intercepts and slopes
- for many types of outcomes (continuous, binary, Poisson, etc.)
- by default returns the variance of the random effects

For continuous outcomes, xtmixed is preferred because it uses a closed form solution for the likelihood and hence produces accurate results relatively quickly.
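The switch from a cluster-specific model to a marginal model mentioned under xtreg can be sketched as below. This is an illustration only, not output from the lab: it assumes the popular.dta variables described earlier and an exchangeable working correlation, which is the default for xtreg with the pa option.

```stata
* Cluster-specific (random intercept) model, as fitted in this lab
xtreg popular, re i(school) mle

* Population-averaged (marginal) version of the same model
xtreg popular, pa i(school)

* The corresponding GEE call: Gaussian family, identity link,
* exchangeable working correlation
xtgee popular, i(school) family(gaussian) link(identity) corr(exchangeable)
```

The last two commands should give the same estimates; only the first one estimates a school-level random intercept.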

Syntax for fitting the intercept-only model

xtreg

. xtreg popular, re i(school) mle

Random-effects ML regression            Number of obs      =    2000
Group variable: school                  Number of groups   =     100
Random effects u_i ~ Gaussian           Obs per group: min =      16
                                                       avg =    20.0
                                                       max =      26
                                        Wald chi2(0)       =    0.00
Log likelihood = -2556.3612             Prob > chi2        =       .

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
       _cons |  5.307603   .0950194    55.86   0.000     5.121369    5.493838
    /sigma_u |  .9331052   .0684368                      .8081665    1.077359
    /sigma_e |  .7991726   .0129644                      .7741625    .8249907
         rho |  .5768565   .0367346                      .5039739    .6471936

Likelihood-ratio test of sigma_u=0: chibar2(01) = 1376.81   Prob >= chibar2 = 0.000

xtmixed

. xtmixed popular || school:, mle

Mixed-effects ML regression             Number of obs      =    2000
Group variable: school                  Number of groups   =     100
                                        Obs per group: min =      16
                                                       avg =    20.0
                                                       max =      26
                                        Wald chi2(0)       =       .
Log likelihood = -2556.3612             Prob > chi2        =       .

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
       _cons |  5.307603   .0950231    55.86   0.000     5.121361    5.493845

  Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
  school: Identity          |
                  sd(_cons) |  .9331053   .0684433     .8081556    1.077374
               sd(Residual) |  .7991726   .0129645     .7741624    .8249907

LR test vs. linear regression: chibar2(01) = 1376.81   Prob >= chibar2 = 0.0000

gllamm

. gllamm popular, i(school) adapt

Running adaptive quadrature
(iteration log omitted)
Adaptive quadrature has converged, running Newton-Raphson
(iteration log omitted)

number of level 1 units = 2000
number of level 2 units = 100
Condition Number = 5.8577667

gllamm model
log likelihood = -2556.3612

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
       _cons |  5.307603   .0950232    55.86   0.000     5.121361    5.493845

Variance at level 1
  .63867684 (.02072167)

Variances and covariances of random effects
***level 2 (school)
    var(1): .87068733 (.12772979)

xtmixed can also report the variances of the error and the random intercept instead of the sds:

. xtmixed popular || school:, mle var

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
       _cons |  5.307603   .0950231    55.86   0.000     5.121361    5.493845

  Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
  school: Identity          |
                 var(_cons) |  .8706855   .1277297     .6531154    1.160734
              var(Residual) |  .6386768   .0207217     .5993275    .6806097

LR test vs. linear regression: chibar2(01) = 1376.81   Prob >= chibar2 = 0.0000
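As a check on the three fits, the ICC that xtreg reports as rho can be recovered by hand from the variance estimates. The calculation below uses the estimates rounded to four decimals, so the last digit is approximate.

```latex
\[
\hat{\rho}
  = \frac{\hat{\tau}^2}{\hat{\tau}^2 + \hat{\sigma}^2}
  = \frac{0.8707}{0.8707 + 0.6387}
  = \frac{0.8707}{1.5094}
  \approx 0.577
\]
```

This agrees with rho = .5768565 in the xtreg output: roughly 58% of the variability in popularity scores is between schools rather than between pupils within a school.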

Q2. Do gender and teaching experience affect the self-rating score?

Gender is a level-1 covariate (it varies for each child (j) nested in school (i)). Teaching experience is a level-2 covariate since it is fixed for all students in a given class (school).

Exploratory analysis of the relationship between texp and popularity, ignoring clustering:

. lowess popular texp, bw(0.5) jitter(3)

[Figure: lowess smoother of popularity according to sociometric score against teacher experience in years, bandwidth = .5]

But what we'll really be modeling is how the average popularity scores of children in a given school vary according to the covariate values of student gender and teacher's experience, since we have the model:

MODEL 2:

$$\text{popular}_{ij} \mid U_i, \text{girl}_{ij}, \text{texp}_i = (\beta_0 + U_i) + \beta_1 \text{girl}_{ij} + \beta_2 \text{texp}_i + \varepsilon_{ij}$$

Or, in other words,

$$E(\text{popular}_{ij} \mid U_i, \text{girl}_{ij}, \text{texp}_i) = (\beta_0 + U_i) + \beta_1 \text{girl}_{ij} + \beta_2 \text{texp}_i$$

A more informative plot might look at the relationship between the mean popularity score at each school and teacher experience.

. sort school
. by school: egen mscore_sch=mean(popular)
. lowess mscore_sch texp, ytitle("School Average Popularity score") xtitle("Teacher Experience") bw(0.5)

[Figure: lowess smoother of School Average Popularity score against Teacher Experience, bandwidth = .5]

We find a positive association between the teachers' experience and the average popularity score for each school.

Next we explore the relationship between the mean popularity score at each school for girls versus boys.

. sort school girl
. by school girl: egen mscore_sch_girl = mean(popular)
. sort girl
. graph box mscore_sch_girl, over(girl) ytitle("School Average Score")

[Figure: box plots of School Average Score for boys and girls]

We find the average reported scores for girls are higher than for boys.

Do we see evidence of an interaction?

. lowess mscore_sch_girl texp, by(girl)

[Figure: lowess smoothers of mscore_sch_girl against teacher experience in years, in separate panels for boys and girls, bandwidth = .8]

The slopes are relatively similar, but the two curves definitely start at different points, so for this example we'll continue by just including a linear term for teacher's experience and an indicator variable for pupil gender, but not an interaction term (an interaction term would allow the slope on the relation between average popularity score and texp to vary according to gender).

Random intercept model with fixed effects for texp and girl

. xtmixed popular texp girl || school:, mle

Mixed-effects ML regression             Number of obs      =    2000
Group variable: school                  Number of groups   =     100
                                        Obs per group: min =      16
                                                       avg =    20.0
                                                       max =      26
                                        Wald chi2(2)       =  820.79
Log likelihood = -2214.2871             Prob > chi2        =  0.0000

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
        texp |  .0934455   .0107449     8.70   0.000     .0723858    .1145052
        girl |  .8447125   .0309418    27.30   0.000     .7840676    .9053574
       _cons |  3.560661   .1697652    20.97   0.000     3.227927    3.893394

  Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
  school: Identity          |
                  sd(_cons) |  .6897604   .0511783     .5964053    .7977284
               sd(Residual) |  .6779987   .0109988     .6567804    .6999024

LR test vs. linear regression: chibar2(01) = 1107.01   Prob >= chibar2 = 0.0000

Interpretation of the coefficient on texp: For a given school, we estimate that increasing the teacher's experience by one year would result in average popularity scores that are .09 points higher for both male and female students.

Interpretation of the coefficient on girl: Within a school, we estimate that the average popularity scores are .845 points higher for girls than for boys, controlling for teacher's experience.

Q3. Does the difference in average self-rating score between female and male students vary across schools?

To address this question using EDA, we'll first create a variable that contains the mean score for boys for each school and then another variable that contains the mean score for girls for each school.

. sort school girl
. by school: gen meanpop_boys = mscore_sch_girl[1]
. gen boy = 1-girl
. sort school boy
. by school: gen meanpop_girls = mscore_sch_girl[1]

We'll subtract the average boys' score from the average girls' score to get the difference in scores across genders for each school.

. gen genddiff = meanpop_girls - meanpop_boys
. replace genddiff = . if pupil!=1

We then make a histogram of the gender differences for each school to see how much heterogeneity (or variability) we have between schools in our gender difference.

. hist genddiff, norm freq xtitle(School-specific difference between female and male average scores)

[Figure: histogram (frequency, with normal density overlay) of the school-specific difference between female and male average scores]

We do see quite a bit of heterogeneity in the gender differences across the different schools. We can build the heterogeneity of the gender effect into our model:

Random intercept model with fixed effects for texp and girl and a random coefficient on pupil gender

MODEL 3:

$$\text{popular}_{ij} \mid U_{0i}, U_{1i}, \text{girl}_{ij}, \text{texp}_i = (\beta_0 + U_{0i}) + (\beta_1 + U_{1i})\,\text{girl}_{ij} + \beta_2 \text{texp}_i + \varepsilon_{ij}$$

$$\varepsilon_{ij} \sim N(0, \sigma^2), \qquad \begin{pmatrix} U_{0i} \\ U_{1i} \end{pmatrix} \sim MVN(0, \Sigma)$$

In the above model, the popularity score is a function of pupil gender and teaching experience. We allow for different baseline scores for different schools by using a random intercept, and we allow different gender effects for different schools by using a random slope for gender.

Note: We could NOT fit a model that includes a random slope on teaching experience because teaching experience does not vary within school.
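One consequence of the random slope in Model 3 is that the total variance of the popularity score is no longer the same for boys and girls. The sketch below writes out the two variances; the labels $\tau_0^2$, $\tau_1^2$ and $\tau_{01}$ for the entries of $\Sigma$ are notation introduced here for illustration, not taken from the lab.

```latex
\[
\Sigma =
\begin{pmatrix}
\tau_0^2 & \tau_{01} \\
\tau_{01} & \tau_1^2
\end{pmatrix}
\]
\begin{align*}
\operatorname{Var}(\text{popular}_{ij} \mid \text{girl}_{ij} = 0)
  &= \tau_0^2 + \sigma^2 \\
\operatorname{Var}(\text{popular}_{ij} \mid \text{girl}_{ij} = 1)
  &= \tau_0^2 + 2\tau_{01} + \tau_1^2 + \sigma^2
\end{align*}
```

The covariance $\tau_{01}$ enters the variance for girls directly, which is one reason to leave the correlation between the random intercept and the random slope unrestricted when fitting the model.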

. xtmixed popular texp girl || school: girl, cov(unstr) mle

Mixed-effects ML regression             Number of obs      =    2000
Group variable: school                  Number of groups   =     100
                                        Obs per group: min =      16
                                                       avg =    20.0
                                                       max =      26
Log likelihood = -2130.5877             Prob > chi2        =  0.0000

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
        texp |  .1083526    .010112    10.72   0.000     .0885334    .1281718
        girl |  .8431752   .0593856    14.20   0.000     .7267815    .9595688
       _cons |  3.339973   .1591614    20.98   0.000     3.028022    3.651923

  Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
  school: Unstructured      |
                   sd(girl) |  .5193272   .0483111     .4327695    .6231966
                  sd(_cons) |  .6344229   .0495562     .5443643    .7393807
           corr(girl,_cons) |  .0640675   .1309317    -.1911435    .3111648
               sd(Residual) |  .6264869   .0104455     .6063449     .647298

LR test vs. linear regression: chi2(3) = 1274.41   Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

Important! Always allow the random slope and random intercept to be correlated.

Q4. Does teaching experience explain the between-school heterogeneity in the gender effect?

MODEL 4:

$$\text{popular}_{ij} \mid U_{0i}, U_{1i}, \text{girl}_{ij}, \text{texp}_i = (\beta_0 + U_{0i}) + (\beta_1 + \beta_3 \text{texp}_i + U_{1i})\,\text{girl}_{ij} + \beta_2 \text{texp}_i + \varepsilon_{ij}$$

$$\varepsilon_{ij} \sim N(0, \sigma^2), \qquad \begin{pmatrix} U_{0i} \\ U_{1i} \end{pmatrix} \sim MVN(0, \Sigma)$$

We can answer this question by fitting the above model, which includes a cross-level interaction of the variables sex and texp. It is called cross-level since gender is a level-1 covariate (varies at the pupil level) while teacher's experience is a level-2 covariate (varies at the class (school) level).
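To see what the cross-level interaction does, it helps to write out the gender difference implied by Model 4 for a given school. The lines below follow directly from the model equation by setting $\text{girl}_{ij}$ to 1 and to 0 and subtracting.

```latex
\begin{align*}
E(\text{popular}_{ij} \mid \text{girl}_{ij} = 1, \text{texp}_i, U_{0i}, U_{1i})
 - E(\text{popular}_{ij} \mid \text{girl}_{ij} = 0, \text{texp}_i, U_{0i}, U_{1i})
  &= \beta_1 + \beta_3 \text{texp}_i + U_{1i} \\
\text{and, for a typical school } (U_{1i} = 0): \qquad
  &= \beta_1 + \beta_3 \text{texp}_i
\end{align*}
```

So $\beta_3$ is the change in the girl-boy difference per additional year of teacher experience, and $U_{1i}$ captures whatever between-school variation in that difference teacher experience does not account for.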

. gen girlxtexp = girl*texp
. xtmixed popular texp girl girlxtexp || school: girl, cov(unstr) mle

Mixed-effects ML regression             Number of obs      =    2000
Group variable: school                  Number of groups   =     100
                                        Obs per group: min =      16
                                                       avg =    20.0
                                                       max =      26
                                        Wald chi2(3)       =  365.74
                                        Prob > chi2        =  0.0000

     popular |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
        texp |  .1102293   .0101287    10.88   0.000     .0903774    .1300811
        girl |  1.329479   .1317029    10.09   0.000     1.071346    1.587612
   girlxtexp | -.0340251   .0083716    -4.06   0.000    -.0504331   -.0176172
       _cons |  3.313651   .1593869    20.79   0.000     3.001258    3.626044

  Random-effects Parameters |  Estimate   Std. Err.    [95% Conf. Interval]
  school: Unstructured      |
                   sd(girl) |  .4692521   .0458652     .3874439    .5683341
                  sd(_cons) |  .6347378   .0495438     .5446967    .7396631
           corr(girl,_cons) |  .0798403   .1247735    -.1645989    .3150401
               sd(Residual) |   .626432   .0104426     .6062956    .6472371

LR test vs. linear regression: Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

The estimate of the fixed coefficient on teacher's experience is similar in Model 3 and Model 4. However, the regression slope for pupil gender is considerably larger in Model 4 (the model with the cross-level interaction between gender and teacher's experience), and this coefficient now has the interpretation of being the school-specific difference in average scores between genders when the teacher's experience is zero, for a typical school.

The coefficient on the interaction between gender and teacher experience is estimated as -0.03, which is statistically significant. The negative value means the difference between girls and boys is smaller with more experienced teachers.

The standard deviation of the random slope on gender decreases from 0.52 to 0.47, which means that the variation in teacher experience between schools explains some of the variability between schools in the estimated gender difference in self-reported popularity scores.
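The Model 4 estimates can be turned into concrete numbers. The sketch below uses rounded values: a girl coefficient of about 1.33 (which is 10.09 times its standard error of about 0.132), an interaction coefficient of about -0.034, and random-slope standard deviations of about 0.519 (Model 3) and 0.469 (Model 4). The teacher experience values 0, 10 and 20 years are chosen for illustration only.

```latex
\begin{align*}
\text{girl-boy difference in a typical school} &\approx 1.33 - 0.034 \times \text{texp}_i \\
\text{texp}_i = 0:  &\quad 1.33 \\
\text{texp}_i = 10: &\quad 1.33 - 0.34 = 0.99 \\
\text{texp}_i = 20: &\quad 1.33 - 0.68 = 0.65 \\[6pt]
\text{share of slope variance explained by texp}
  &\approx 1 - \frac{0.469^2}{0.519^2}
   = 1 - \frac{0.220}{0.269}
   \approx 0.18
\end{align*}
```

On these rounded figures, the estimated gender gap roughly halves between a new teacher and one with 20 years of experience, and teacher experience accounts for about 18% of the between-school variance in the gender effect.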