A multivariate multilevel model for the analysis of TIMMS & PIRLS data

Similar documents
Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Perugia, 13 maggio 2011

PIRLS 2016 Achievement Scaling Methodology 1

Plausible Values for Latent Variables Using Mplus

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

Chapter 11. Scaling the PIRLS 2006 Reading Assessment Data Overview

A multilevel multinomial logit model for the analysis of graduates skills

Ron Heck, Fall Week 3: Notes Building a Two-Level Model

The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Hierarchical Linear Modeling. Lesson Two

Multilevel Analysis of Assessment Data

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research

An Introduction to Mplus and Path Analysis

Multilevel Modeling: A Second Course

PACKAGE LMest FOR LATENT MARKOV ANALYSIS


Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

1 Introduction. 2 Example

Estimation and Centering

Bayesian Analysis of Multivariate Normal Models when Dimensions are Absent

Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java)

An Introduction to Path Analysis

Introduction and Background to Multilevel Analysis

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters

More on Roy Model of Self-Selection

AIM HIGH SCHOOL. Curriculum Map W. 12 Mile Road Farmington Hills, MI (248)

Technical Appendix C: Methods

Modeling differences in itemposition effects in the PISA 2009 reading assessment within and between schools

Machine Learning Linear Regression. Prof. Matteo Matteucci

Advising on Research Methods: A consultant's companion. Herman J. Ader Gideon J. Mellenbergh with contributions by David J. Hand

MULTILEVEL IMPUTATION 1

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Multiple Linear Regression

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

Technical Appendix C: Methods. Multilevel Regression Models

Specifying Latent Curve and Other Growth Models Using Mplus. (Revised )

A dynamic perspective to evaluate multiple treatments through a causal latent Markov model

Application of Plausible Values of Latent Variables to Analyzing BSI-18 Factors. Jichuan Wang, Ph.D

Multilevel Analysis, with Extensions

Citation for published version (APA): Jak, S. (2013). Cluster bias: Testing measurement invariance in multilevel data

Impact Evaluation of Mindspark Centres

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Discussing Effects of Different MAR-Settings

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011

Don t be Fancy. Impute Your Dependent Variables!

Stochastic Approximation Methods for Latent Regression Item Response Models

The Factorial Survey

Random Intercept Models

Extension of the NAEP BGROUP Program to Higher Dimensions

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E

WISE International Masters

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Applied Statistics and Econometrics

Outline. Selection Bias in Multilevel Models. The selection problem. The selection problem. Scope of analysis. The selection problem

Erasmus Teaching staff mobility INTRODUCTION TO MULTILEVEL MODELLING

Making sense of Econometrics: Basics

PIRLS International Report PIRLS. Chapter 2

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Selection on Observables: Propensity Score Matching.

Pooling multiple imputations when the sample happens to be the population.

International Student Achievement in Mathematics

Lecture 12: Effect modification, and confounding in logistic regression

Answer Key: Problem Set 6

Multilevel/Mixed Models and Longitudinal Analysis Using Stata

Categorical and Zero Inflated Growth Models

Assessing the relation between language comprehension and performance in general chemistry. Appendices

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

A Multilevel Analysis of Graduates Job Satisfaction

4. Nonlinear regression functions

Simple Linear Regression: One Quantitative IV

Using Power Tables to Compute Statistical Power in Multilevel Experimental Designs

Model fit evaluation in multilevel structural equation models

General Principles Within-Cases Factors Only Within and Between. Within Cases ANOVA. Part One

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Inference in Regression Analysis

Describing Change over Time: Adding Linear Trends

Introducing Generalized Linear Models: Logistic Regression

In Class Review Exercises Vartanian: SW 540

Spring RMC Professional Development Series January 14, Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations

Nonlinear Econometric Analysis (ECO 722) : Homework 2 Answers. (1 θ) if y i = 0. which can be written in an analytically more convenient way as

Multilevel Structural Equation Modeling

Review of the General Linear Model

Additional Notes: Investigating a Random Slope. When we have fixed level-1 predictors at level 2 we show them like this:

Basics of Modern Missing Data Analysis

Experimental Design and Data Analysis for Biologists

Chapter 9: The Regression Model with Qualitative Information: Binary Variables (Dummies)

Some methods for handling missing values in outcome variables. Roderick J. Little

Joint Modeling of Longitudinal Item Response Data and Survival

Review of CLDP 944: Multilevel Models for Longitudinal Data

arxiv: v1 [stat.ap] 11 Aug 2014

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars)

Introduction to GSEM in Stata

Econometrics Problem Set 4

Question 1 carries a weight of 25%; Question 2 carries 20%; Question 3 carries 20%; Question 4 carries 35%.

Regression tree-based diagnostics for linear multilevel models

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Transcription:

A multivariate multilevel model for the analysis of TIMMS & PIRLS data European Congress of Methodology July 23-25, 2014 - Utrecht Leonardo Grilli 1, Fulvia Pennoni 2, Carla Rampichini 1, Isabella Romeo 2 1 Department of Statistics, Informatics, Applications G. Parenti - University of Florence, e-mail: grilli@disia.unifi.it, rampichini@disia.unifi.it 2 Department of Statistics and Quantitative Methods - University of Milano-Bicocca, e-mail: fulvia.pennoni@unimib.it, isabella.romeo@unimib.it July 24 th, 2014

TIMSS AND PIRLS SURVEYS TIMSS and PIRLS are large scale assessment surveys held by the International Association for the Evaluation of Educational Achievement (IEA). Rutkowski, L., Gonzalez, E., Joncas, M. von Davier, M. (2010). International Large-Scale Assessment Data: Issues in Secondary Analysis and Reporting. Educational Researcher TIMSS (Trends in International Mathematics and Science Study): at fourth and eighth grades every four years since 1995; PIRLS (Progress in International Reading Literacy Study): at fourth grade every five years since 2001. In 2011 - for the first time - TIMSS and PIRLS cycles coincided. The TIMSS&PIRLS 2011 Combined International Database concerns fourth grade students and collects data from questionnaires administrated to students, parents, teachers, and school principals. 1 / 23

2 / 23 TIMSS & PIRLS 2011 DATA Sample design Two stage (according to the hierarchical structure): schools are first sampled proportionally to their size (number of students) then 1 or 2 classes are randomly sampled and all students are interviewed Martin, M. O., Mullis, I. V. S. (2012). Methods and procedures in TIMSS and PIRLS 2011. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College. Italian TIMSS&PIRLS 2011 sample: 4125 students nested in 239 classes nested in 202 schools 483 teachers (each class may have from one to three teachers)

PLAUSIBLE VALUES Rotating scheme of item administration each student answers a subset of items in order to minimize testing burden ensure accurate population estimates For any student, the total score is missing and replaced by five multiple imputations (Plausible Values) Plausible Values (PVs) WHAT are PVs?: they are random draws from the distribution of the total score derived from an IRT model. Mislevy (1991) Randomization-based inference about latent variables from complex samples, Psychometrika. HOW using PVs?: run separate analyses with each PV and combine the results through multiple imputation procedures. Rubin (1987) Multiple imputation for nonresponse in sample surveys. 3 / 23

4 / 23 OBJECTIVES OF THE ANALYSIS Using TIMSS&PIRLS 2011 data for Italy, we aim to explore the relationships among performances in the three subjects: Reading, Math and Science analyse the determinants of the achievement at different hierarchical levels (students and classes) perform effectiveness analysis at class level We need a model that is both multilevel (students in classes) and multivariate (Reading, Math and Science) Remark: to the best of our knowledge, all reports and papers exploit multilevel models for a single outcome - no multivariate modelling!

5 / 23 THE MULTIVARIATE MULTILEVEL MODEL Features the three scores on Reading, Math and Science are a joint outcome the level 2 is represented by classes (instead of schools) since several factors act at the class level (e.g. peer effects) the school is not added as level 3 since in most schools only one class was sampled (however, cluster-robust standard errors are used) Advantages estimating the (residual) correlations between pairs of outcomes at both hierarchical levels testing whether the effects of the covariates are identical across outcomes (e.g. differences between males and females are the same in Reading and Math?)

6 / 23 MODEL EQUATION We specify the following multivariate two-level model: Y mij = α m + β m x mij + γ m w mj + u mj + e mij outcome m with m = 1, 2, 3 (1: Reading, 2: Math, 3: Science) student i with i = 1,..., n j class j with (j = 1,..., 239) x mij vector of student-level covariates w mj vector of class-level covariates (also including covariates at higher level, e.g. school or province) u mj class-level errors e mij student-level errors Remark: the model allows for outcome-specific covariates, e.g. the experience of the teacher

MODEL ERRORS: COVARIANCE MATRICES Student-level errors: e ij = (e 1ij, e 2ij, e 3ij ) Class-level errors: u j = (u 1j, u 2j, u 3j ) e mij indep. across students, u mj indep. across classes e mij independent from u mj e mij and u mj multivariate normal with zero means Covariance matrix at student level Covariance matrix at class level σ1 2 σ 12 σ 13 τ Var(e ij ) = Σ = σ2 2 1 2 τ 12 τ 13 σ 23 Var(u j ) = T = τ 2 σ3 2 2 τ 23 τ3 2 Y ij = (Y 1ij, Y 2ij, Y 3ij ) has residual covariance matrix Σ + T. We tried several alternative specifications (e.g. heteroscedastic class-level errors) but with no significant improvement of the fit Grilli L., Rampichini C. (2014) Specification of random effects in multilevel models: a review. Quality & Quantity (to appear) 7 / 23

8 / 23 MODEL FITTING Estimation sample: 3741 students in 237 classes (indeed, 284 students and 2 classes have been excluded due to missing values in the covariates) Estimation method: maximum likelihood Plausible values: estimation is performed separately for each of the five plausible values and then results are combined using Multiple Imputation (MI) formulas (Rubin, 1987) Next steps Software: mixed and mi commands of Stata 13 results from the null model model selection results from the final model

9 / 23 RESULTS FROM THE NULL MODEL Decomposition of the correlation matrix: Correlations % Between class Within class Between class Total of (co)variances Subject Read Math Scie Read Math Scie Read Math Scie Read Math Scie Read 1.00 1.00 1.00 19.8 Math 0.71 1.00 0.93 1.00 0.76 1.00 29.5 28.8 Science 0.81 0.74 1.00 0.97 0.98 1.00 0.85 0.81 1.00 28.2 35.0 29.4 Correlations among outcomes are higher between classes rather than within classes

10 / 23 RESULTS FROM THE NULL MODEL (CONT.) Decomposition of the correlation matrix: Correlations % Between class Within class Between class Total of (co)variances Subject Read Math Scie Read Math Scie Read Math Scie Read Math Scie Read 1.00 1.00 1.00 19.8 Math 0.71 1.00 0.93 1.00 0.76 1.00 29.5 28.8 Science 0.81 0.74 1.00 0.97 0.98 1.00 0.85 0.81 1.00 28.2 35.0 29.4 Reading has the lowest percentage of class-level variance, maybe because it is the subject more influenced by the student background characteristics

11 / 23 MODEL SELECTION STRATEGY The selection process in principle requires fitting the multivariate model repeatedly, each time combining the estimates with MI To speed up the process, we adopt two simplifications: the outcomes are analyzed separately by means of univariate multilevel models, retaining covariates being significant in at least one of the univariate models the estimation is carried out using only the first plausible value (underestimated standard errors conservative selection of the covariates) Covariates are added in the following hierarchical order: student, teacher, class, school, province Remark: we center continuous covariates at their sample grand means, and we do not center student-level covariates at their class-level means

GROSS VALUE ADDED (GVA) We control for differences in wealth across Italy by means of the per capita Gross Value Added (GVA) at market prices in 2011. The GVA is measured for each of the 110 Italian provinces, ranging from 45 to 142 (national average = 100). The relationships between the achievement scores and the GVA are explored through local polynomial regression (see the plot for Math) Average Math score 400 450 500 550 600 60 80 100 120 140 Gross Value Added The line for GVA< 100 (national average) has a significant positive slope, the line for GVA> 100 is nearly flat and the slope is not significantly different from zero. We constrain to zero the slope of the second line of the spline (i.e. GVA> 100). 12 / 23

SELECTED COVARIATES Student characteristics Gender Language spoken at home Pre-school Home resources for learning 1 Early literacy/numeracy tasks 2 1 It is derived from items on the number of books and study supports available at home and parents levels of education and occupation (Martin & Mullis, 2013). 2 It is derived from the parents responses to how well their child could do some early literacy and numeracy activities when he/she began primary school (Martin & Mullis, 2013). Teacher characteristics Gender Years been teaching Degree Class and School characteristics % Students attended pre-school % Language spoken at home is not Italian Average of home resources for learning Average of Early literacy/numeracy tasks Average of home resources for learning Average of Early literacy/numeracy tasks School is safe and orderly School with Italian students >90% 1 < 10% of students has a low SES 1 School is located in a big area 1 Area with more than 50.000 inhabitants 1 6 days of school per week Adequate environment and resources 1 GVA 2 1 Declared by the school principal 2 per capita Gross Value Added (GVA) at market prices in 2011 (proxy of the school socio-economic context) 13 / 23

14 / 23 ESTIMATED REGRESSION COEFFICIENTS Estimates and robust standard errors of the selected multivariate multilevel model (MI combined results) Read Math Science Test F Coef. s.e. Coef. s.e. Coef. s.e. p-value Intercept 531.73 3.57 514.99 4.25 531.47 3.92 0.0006 Student covariates Female 2.92 2.41-11.96 3.05-10.64 2.28 0.0000 Language at home is not Italian -22.57 3.12-14.94 3.27-23.74 3.53 0.0161 Pre-school 8.85 3.01 8.46 2.51 10.91 3.15 0.6386 Home resources for learning 14.04 0.84 10.64 0.84 13.23 0.93 0.0009 Early literacy/numeracy tasks 7.24 0.77 10.07 0.76 6.53 0.83 0.0051 School covariates Adequate environment & resources 5.28 1.92 8.61 3.19 7.00 2.96 0.1950 Province covariates GVA (below 100) 0.45 0.15 0.48 0.21 0.55 0.20 0.3983 Joint test F Test F for the equality of regression coefficients among the three outcomes: H 0 : β Read = β Math = β Science Except for Pre-school, student-level covariates have significantly different effects

15 / 23 ESTIMATED REGRESSION COEFFICIENTS (CONT.) Read Math Science Test F Coef. s.e. Coef. s.e. Coef. s.e. p-value Intercept 531.73 3.57 514.99 4.25 531.47 3.92 0.0006 Student covariates Female 2.92 2.41-11.96 3.05-10.64 2.28 0.0000 Language at home is not Italian -22.57 3.12-14.94 3.27-23.74 3.53 0.0161 Pre-school 8.85 3.01 8.46 2.51 10.91 3.15 0.6386 Home resources for learning 14.04 0.84 10.64 0.84 13.23 0.93 0.0009 Early literacy/numeracy tasks 7.24 0.77 10.07 0.76 6.53 0.83 0.0051 School covariates Adequate environment & resources 5.28 1.92 8.61 3.19 7.00 2.96 0.1950 Province covariates GVA (below 100) 0.45 0.15 0.48 0.21 0.55 0.20 0.3983 Gender Females have a significantly lower performance in Math and Science, but not in Reading.

16 / 23 ESTIMATED REGRESSION COEFFICIENTS (CONT.) Read Math Science Test F Coef. s.e. Coef. s.e. Coef. s.e. p-value Intercept 531.73 3.57 514.99 4.25 531.47 3.92 0.0006 Student covariates Female 2.92 2.41-11.96 3.05-10.64 2.28 0.0000 Language at home is not Italian -22.57 3.12-14.94 3.27-23.74 3.53 0.0161 Pre-school 8.85 3.01 8.46 2.51 10.91 3.15 0.6386 Home resources for learning 14.04 0.84 10.64 0.84 13.23 0.93 0.0009 Early literacy/numeracy tasks 7.24 0.77 10.07 0.76 6.53 0.83 0.0051 School covariates Adequate environment & resources 5.28 1.92 8.61 3.19 7.00 2.96 0.1950 Province covariates GVA (below 100) 0.45 0.15 0.48 0.21 0.55 0.20 0.3983 Read&Science vs Math Family background covariates have a similar effect on Read and Science, as opposed to Math the abilities required for Science seem to be closer to those for Read Likely, this is a consequence of the way Science is taught in Italian primary schools.

17 / 23 ESTIMATED REGRESSION COEFFICIENTS (CONT.) Read Math Science Test F Coef. s.e. Coef. s.e. Coef. s.e. p-value Intercept 531.73 3.57 514.99 4.25 531.47 3.92 0.0006 Student covariates Female 2.92 2.41-11.96 3.05-10.64 2.28 0.0000 Language at home is not Italian -22.57 3.12-14.94 3.27-23.74 3.53 0.0161 Pre-school 8.85 3.01 8.46 2.51 10.91 3.15 0.6386 Home resources for learning 14.04 0.84 10.64 0.84 13.23 0.93 0.0009 Early literacy/numeracy tasks 7.24 0.77 10.07 0.76 6.53 0.83 0.0051 School covariates Adequate environment & resources 5.28 1.92 8.61 3.19 7.00 2.96 0.1950 Province covariates GVA (below 100) 0.45 0.15 0.48 0.21 0.55 0.20 0.3983 Gross Value Added (GVA) The effect of GVA is modelled by a linear spline with a single knot in 100 (the national average) GVA has a significant effect only for provinces below the national average, with no significant difference across outcomes. For the province with the lowest value of GVA (55) the effect is minus 22.5 points.

18 / 23 EXPLAINED VARIANCES AND RESIDUAL ICC S The proportions of variance explained by the final model with respect to the null model are higher at class level: the within-class variances reduce by 15% for the three outcomes the between-class variances reduce by 33% for Reading, 20% for Math and 26% for Science compositional and contextual effects are more relevant for the achievement in Reading The residual ICC s are quite high: 16% for Reading, 28% for Math and 27% for Science relevant unobserved class-level factors The correlations among outcomes are similar to those in the null model.

EMPIRICAL BAYES RESIDUALS The level 2 error (class random effect) u mj is the contribution of class j to the achievement of students in outcome m (it may be interpreted in terms of effectiveness) Random effect prediction 200 100 0 100 200 0 50 100 150 200 250 Rank Empirical Bayes residuals for Math with 95% confidence intervals good classes (CI above 0): students on average achieve substantially more than expected on the basis of the covariates poor classes (CI below 0): students on average achieve substantially less than expected The residuals give an indication or further territorial differences not captured by GVA: in North-West good classes prevail on poor classes, while in the Centre the pattern is reversed (we tried to add geographical dummies in the fixed part of the model not significant) in the South there are high percentages of both good and poor classes greater variability of achievement (we tried to specify heteroscedastic random effects not significant) 19 / 23

20 / 23 FINAL REMARKS Outcomes in Reading, Math and Science from large-scale assessment surveys are usually studied one by one (univariate multilevel models) Using the Italian subset of the TIMSS&PIRLS 2011 combined dataset, we performed a joint analysis of achievement in Reading, Math and Science by means of a multivariate multilevel model the multivariate approach allowed us to obtain the following findings: estimating correlations among outcomes: we found that correlations at class level are higher than correlations at student level (so high that the three outcomes yield the same results of school/class effectiveness) testing for differential effects of covariates on the outcomes: we found that background covariates have similar effects on Reading and Science, as opposed to Math; moreover, females have a lower performance in Math and Science, but not in Reading

21 / 23 FINAL REMARKS (CONT.) We accounted for territorial differences in wealth through the Gross Value Added (GVA) at province level (instead of adding dummy variables for geographical areas more interesting interpretation) The class-level Empirical Bayes residuals allowed us to identify good and poor classes and to point out further territorial patterns concerning both the mean and the variance (e.g. greater variability of achievement in the South of Italy, not included in the model due to lack of statistical significance) thanks for your attention :-)