Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Second meeting of the FIRB 2012 project "Mixture and latent variable models for causal-inference and analysis of socio-economic data"
Rome, January 23-24, 2015

Leonardo Grilli (1), Fulvia Pennoni (2), Carla Rampichini (1), Isabella Romeo (2)

(1) Department of Statistics, Informatics, Applications "G. Parenti", University of Florence, e-mail: grilli@disia.unifi.it, rampichini@disia.unifi.it
(2) Department of Statistics and Quantitative Methods, University of Milano-Bicocca, e-mail: fulvia.pennoni@unimib.it, isabella.romeo@unimib.it

TIMSS AND PIRLS SURVEYS

TIMSS and PIRLS are large-scale assessment surveys conducted by the International Association for the Evaluation of Educational Achievement (IEA).

Rutkowski, L., Gonzalez, E., Joncas, M., von Davier, M. (2010). International Large-Scale Assessment Data: Issues in Secondary Analysis and Reporting. Educational Researcher.

- TIMSS (Trends in International Mathematics and Science Study): at fourth and eighth grades, every four years since 1995
- PIRLS (Progress in International Reading Literacy Study): at fourth grade, every five years since 2001

In 2011, for the first time, the TIMSS and PIRLS cycles coincided. The TIMSS&PIRLS 2011 Combined International Database concerns fourth-grade students and collects data from questionnaires administered to students, parents, teachers, and school principals.

TIMSS & PIRLS 2011 DATA

Sample design: two stages, following the hierarchical structure:
- schools are first sampled with probability proportional to their size (number of students)
- then 1 or 2 classes are randomly sampled within each school, and all their students are assessed

Martin, M. O., Mullis, I. V. S. (2012). Methods and procedures in TIMSS and PIRLS 2011. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.

We analyse the TIMSS&PIRLS 2011 sample for Italy.

PLAUSIBLE VALUES

Rotating scheme of item administration: each student answers only a subset of items, in order to
- minimize the testing burden
- ensure accurate population estimates

For any student, the total score is therefore missing and is replaced by five Plausible Values (PVs).

PVs are random draws (imputed values) from the distribution of the total score derived from an IRT model.
Mislevy (1991). Randomization-based inference about latent variables from complex samples. Psychometrika.

PVs are handled by running a separate analysis with each PV and combining the results through multiple imputation procedures.
Rubin (1987). Multiple Imputation for Nonresponse in Surveys.
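The combination step can be sketched with Rubin's (1987) rules for a single scalar parameter. This is only an illustration, not the authors' Stata workflow: the function name `combine_rubin` and the five estimates and variances below are hypothetical.

```python
import numpy as np

def combine_rubin(estimates, variances):
    """Combine M per-imputation results for one scalar parameter.

    estimates: point estimates, one per plausible value
    variances: squared standard errors, one per plausible value
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    pooled = q.mean()                    # average of the M estimates
    within = u.mean()                    # average sampling variance
    between = q.var(ddof=1)              # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    # Rubin's large-sample degrees of freedom (requires between > 0)
    df = (m - 1) * (1.0 + within / ((1.0 + 1.0 / m) * between)) ** 2
    return pooled, total, df

# Hypothetical results from five separate fits, one per plausible value
pooled, total_var, df = combine_rubin([10, 12, 11, 13, 9], [4, 4, 4, 4, 4])
print(pooled, np.sqrt(total_var), df)
```

The pooled standard error is the square root of the total variance, so it exceeds the average per-fit standard error whenever the estimates differ across plausible values.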

OBJECTIVES OF THE ANALYSIS

Using TIMSS&PIRLS 2011 data for Italy, we aim to:
- explore the relationships among performances in the three subjects: Reading, Math and Science
- analyse the determinants of achievement at different hierarchical levels (students and classes)
- perform effectiveness analysis at class level

We need a model that is both multilevel (students in classes) and multivariate (Reading, Math and Science).

To the best of our knowledge, all reports and papers exploit multilevel models for a single outcome: no multivariate modelling!

THE MULTIVARIATE MULTILEVEL MODEL

Features:
- the three scores on Reading, Math and Science are treated as a joint outcome
- level 2 is represented by classes (instead of schools), since several factors act at the class level (e.g. peer effects)
- the school is not added as level 3, since in most schools only one class was sampled (however, cluster-robust standard errors are used)

Advantages:
- estimating the (residual) correlations between pairs of outcomes at both hierarchical levels
- testing whether the effects of the covariates are identical across outcomes (e.g. is the difference between males and females the same in Reading and Math?)

MODEL EQUATION

We specify the following multivariate two-level model for outcome m (1: Reading, 2: Math, 3: Science) of student i in class j:

$$Y_{mij} = \left[\alpha_m + \beta_m' x_{mij} + \gamma_m' w_{mj}\right] + u_{mj} + e_{mij}$$

where
- $x_{mij}$: vector of student-level covariates
- $w_{mj}$: vector of class-level covariates (also including covariates at higher levels, e.g. school or province)
- $u_{mj}$: class-level errors
- $e_{mij}$: student-level errors

Remark: the model allows for outcome-specific covariates, e.g. the experience of the teacher.

MODEL ERRORS: COVARIANCE MATRICES

Student-level errors: $e_{ij} = (e_{1ij}, e_{2ij}, e_{3ij})'$
Class-level errors: $u_j = (u_{1j}, u_{2j}, u_{3j})'$

- $e_{mij}$ independent across students, $u_{mj}$ independent across classes
- $e_{mij}$ independent of $u_{mj}$
- $e_{ij}$ and $u_j$ multivariate normal with zero means

Covariance matrix at student level:

$$\mathrm{Var}(e_{ij}) = \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ & \sigma_2^2 & \sigma_{23} \\ & & \sigma_3^2 \end{pmatrix}$$

Covariance matrix at class level:

$$\mathrm{Var}(u_j) = T = \begin{pmatrix} \tau_1^2 & \tau_{12} & \tau_{13} \\ & \tau_2^2 & \tau_{23} \\ & & \tau_3^2 \end{pmatrix}$$

$Y_{ij} = (Y_{1ij}, Y_{2ij}, Y_{3ij})'$ has residual covariance matrix $\Sigma + T$.

We tried several alternative specifications (e.g. heteroscedastic class-level errors), but with no significant improvement of the fit.

Grilli L., Rampichini C. (2014). Specification of random effects in multilevel models: a review. Quality & Quantity (to appear; available on L. Grilli's page on Research Gate).

MODEL FITTING

- Estimation sample: 3741 students in 237 classes
- Estimation method: maximum likelihood
- Plausible values: estimation is performed separately for each of the five plausible values, and the results are then combined using Multiple Imputation (MI) formulas (Rubin, 1987)
- Software: mixed and mi commands of Stata 13

Next steps:
- results from the null model
- model selection
- results from the final model

RESULTS FROM THE NULL MODEL

Decomposition of the correlation matrix:

           Correlations                                                         % Between class
           Within class           Between class          Total                  of (co)variances
Subject    Read   Math   Scie     Read   Math   Scie     Read   Math   Scie     Read   Math   Scie
Read       1.00                   1.00                   1.00                   19.8
Math       0.71   1.00            0.93   1.00            0.76   1.00            29.5   28.8
Science    0.81   0.74   1.00     0.97   0.98   1.00     0.85   0.81   1.00     28.2   35.0   29.4

- Correlations among outcomes are higher between classes than within classes.
- Reading has the lowest percentage of class-level variance (Intraclass Correlation Coefficient).
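The link between the within-class, between-class and total panels can be sketched numerically. The snippet below is an illustration under a simplifying assumption: total variances are standardized to 1, so the class-level variance of each outcome equals its ICC. Because it uses the rounded figures reported for the null model, it reproduces the total correlations only approximately. The helper name `cov_from_corr` is hypothetical.

```python
import numpy as np

# Rounded null-model figures (order: Reading, Math, Science)
icc = np.array([0.198, 0.288, 0.294])          # share of class-level variance
r_within = np.array([[1.00, 0.71, 0.81],
                     [0.71, 1.00, 0.74],
                     [0.81, 0.74, 1.00]])
r_between = np.array([[1.00, 0.93, 0.97],
                      [0.93, 1.00, 0.98],
                      [0.97, 0.98, 1.00]])

def cov_from_corr(corr, variances):
    """Turn a correlation matrix into a covariance matrix."""
    sd = np.sqrt(variances)
    return corr * np.outer(sd, sd)

# Assumption: each total variance is scaled to 1
T = cov_from_corr(r_between, icc)              # class-level covariance matrix
Sigma = cov_from_corr(r_within, 1.0 - icc)     # student-level covariance matrix

total_cov = Sigma + T                          # residual covariance of Y_ij
sd_total = np.sqrt(np.diag(total_cov))
r_total = total_cov / np.outer(sd_total, sd_total)
pct_between = 100.0 * T / total_cov            # % between class of (co)variances

print(np.round(r_total, 2))
print(np.round(pct_between, 1))
```

The off-diagonal percentages show why the total correlations sit between the within-class and between-class ones: each total covariance is the sum of a student-level and a class-level component.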

SELECTED COVARIATES

Covariates are added in the following hierarchical order: student, teacher, class, school, province.

Student covariates:
- Gender
- Language spoken at home
- Pre-school
- Home resources for learning: derived from items on the number of books and study supports available at home and on the parents' levels of education and occupation (Martin & Mullis, 2013)
- Early literacy/numeracy tasks: derived from parents' responses on how well their child could do some early literacy/numeracy activities when beginning primary school (Martin & Mullis, 2013)

Teacher covariates:
- Gender
- Years of teaching

Class and school covariates:
- % of students who attended pre-school
- % of students whose language spoken at home is not Italian
- Average of home resources for learning
- Average of early literacy/numeracy tasks
- School is safe and orderly
- School with more than 90% Italian students (declared by the school principal)
- Less than 10% of students have a low SES (declared by the school principal)
- School is located in a big area (declared by the school principal)
- Adequate environment and resources (declared by the school principal)
- GVA: per capita Gross Value Added at market prices in 2011 (proxy of the school socio-economic context)

GROSS VALUE ADDED (GVA)

We control for differences in wealth across Italy by means of the per capita Gross Value Added (GVA) at market prices in 2011. The GVA is measured for each of the 110 Italian provinces, ranging from 45 to 142 (national average = 100).

The relationships between the achievement scores and the GVA are explored through local polynomial regression (see the plot for Math).

[Figure: local polynomial regression of the average Math score (y-axis, 400 to 600) on Gross Value Added (x-axis, 60 to 140)]

The line for GVA < 100 (national average) has a significant positive slope; the line for GVA > 100 is nearly flat and its slope is not significantly different from zero. We therefore constrain to zero the slope of the second segment of the spline (i.e. GVA > 100).
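A linear spline with a knot at 100 and a zero slope above the knot reduces to a single regressor. The sketch below shows one way to build it; the function name `gva_below_knot`, the slope of 0.5 and the GVA values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gva_below_knot(gva, knot=100.0):
    """Spline term: (GVA - knot) below the knot, 0 at or above it."""
    return np.minimum(np.asarray(gva, dtype=float), knot) - knot

slope = 0.5                                   # hypothetical slope per GVA point
gva = np.array([55.0, 80.0, 100.0, 120.0])    # hypothetical provinces
effect = slope * gva_below_knot(gva)          # contribution to the predicted score
print(effect)
```

With this coding the intercept refers to provinces at or above the national average, and the coefficient measures the score change per GVA point below it.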

ESTIMATED REGRESSION COEFFICIENTS

Estimates and robust standard errors of the selected multivariate multilevel model (MI combined results):

                                      Read             Math             Science          Test F
                                      Coef.    s.e.    Coef.    s.e.    Coef.    s.e.    p-value
Intercept                             531.73   3.57    514.99   4.25    531.47   3.92    0.0006
Student covariates
  Female                                2.92   2.41    -11.96   3.05    -10.64   2.28    0.0000
  Language at home is not Italian     -22.57   3.12    -14.94   3.27    -23.74   3.53    0.0161
  Pre-school                            8.85   3.01      8.46   2.51     10.91   3.15    0.6386
  Home resources for learning          14.04   0.84     10.64   0.84     13.23   0.93    0.0009
  Early literacy/numeracy tasks         7.24   0.77     10.07   0.76      6.53   0.83    0.0051
School covariates
  Adequate environment & resources      5.28   1.92      8.61   3.19      7.00   2.96    0.1950
Province covariates
  GVA (below 100)                       0.45   0.15      0.48   0.21      0.55   0.20    0.3983

Joint test F: test for the equality of regression coefficients among the three outcomes, $H_0: \beta_{Read} = \beta_{Math} = \beta_{Science}$.

Except for Pre-school, student-level covariates have significantly different effects across outcomes.

ESTIMATED REGRESSION COEFFICIENTS (CONT.)

Gender: females have a significantly lower performance in Math and Science, but not in Reading.

ESTIMATED REGRESSION COEFFICIENTS (CONT.)

Read & Science vs Math: family background covariates have a similar effect on Reading and Science, as opposed to Math. This suggests that the abilities required for Science are closer to those required for Reading. Likely, this is a consequence of the way Science is taught in Italian primary schools.

EMPIRICAL BAYES RESIDUALS

The level 2 error (class random effect) $u_{mj}$ is the contribution of class j to the achievement of students in outcome m (it may be interpreted in terms of effectiveness).

[Figure: Empirical Bayes residuals for Math with 95% confidence intervals. y-axis: random effect prediction; x-axis: rank of the class]

- good classes (CI above 0): students on average achieve substantially more than expected on the basis of the covariates
- poor classes (CI below 0): students on average achieve substantially less than expected

Closer inspection of the residuals reveals further territorial differences not captured by GVA:
- in the North-West good classes prevail over poor classes, while in the Centre the pattern is reversed (we tried adding geographical dummies to the fixed part of the model: not significant)
- in the South there are high percentages of both good and poor classes, i.e. greater variability of achievement (we tried specifying heteroscedastic random effects: not significant)
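The idea behind Empirical Bayes residuals can be sketched for the simpler case of one outcome and a random intercept, treating the variance components as known. This is a deliberately simplified stand-in for the multivariate model used here (where the prediction borrows strength across the three outcomes); the function name `eb_class_effects` and all numbers are hypothetical.

```python
import numpy as np

def eb_class_effects(residuals, class_ids, tau2, sigma2, z=1.96):
    """Shrunken class effects for a univariate random-intercept model.

    residuals: y minus the fixed-part prediction, one per student
    tau2, sigma2: class-level and student-level variances (assumed known)
    Returns (classes, predictions, lower CI bound, upper CI bound).
    """
    residuals = np.asarray(residuals, dtype=float)
    classes, inverse, n = np.unique(class_ids, return_inverse=True,
                                    return_counts=True)
    mean_resid = np.bincount(inverse, weights=residuals) / n
    shrink = tau2 / (tau2 + sigma2 / n)      # reliability of each class mean
    u_hat = shrink * mean_resid              # prediction shrunk towards zero
    se = np.sqrt(tau2 * (1.0 - shrink))      # posterior standard deviation
    return classes, u_hat, u_hat - z * se, u_hat + z * se

# Hypothetical residuals: four students in class 1, one student in class 2
classes, u_hat, lower, upper = eb_class_effects(
    residuals=[10, 10, 10, 10, -5], class_ids=[1, 1, 1, 1, 2],
    tau2=100.0, sigma2=400.0)
good = lower > 0    # classes whose interval lies entirely above zero
poor = upper < 0    # classes whose interval lies entirely below zero
print(u_hat, good, poor)
```

Smaller classes are shrunk more strongly towards zero and get wider intervals, so a class is flagged as good or poor only when the evidence is strong relative to its size.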

FINAL REMARKS

- Outcomes in Reading, Math and Science from large-scale assessment surveys are usually studied one by one (univariate multilevel models).
- Using the Italian subset of the TIMSS&PIRLS 2011 combined dataset, we performed a joint analysis of achievement in Reading, Math and Science by means of a multivariate multilevel model.
- The multivariate approach allowed us to obtain the following findings:
  - estimating correlations among outcomes: we found that correlations at class level are higher than correlations at student level (so high that the three outcomes yield the same results in terms of school/class effectiveness)
  - testing for differential effects of covariates on the outcomes: we found that background covariates have similar effects on Reading and Science, as opposed to Math; moreover, females have a lower performance in Math and Science, but not in Reading

FINAL REMARKS (CONT.)

- We accounted for territorial differences in wealth through the Gross Value Added (GVA) at province level, instead of adding dummy variables for geographical areas, which gives a more interesting interpretation.
- The class-level Empirical Bayes residuals allowed us to identify good and poor classes and to point out further territorial patterns concerning both the mean and the variance (e.g. greater variability of achievement in the South of Italy, not included in the model due to lack of statistical significance).

FINAL REMARKS (CONT.)

Thanks for your attention.

A draft of our paper is available on arXiv (http://arxiv.org/abs/1409.2642v1) and on Research Gate:
Leonardo Grilli, Fulvia Pennoni, Carla Rampichini and Isabella Romeo (2014). Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement.

REFERENCES

Foy, P.: TIMSS and PIRLS 2011 user guide for the fourth grade Combined International Database. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College (2013)

Foy, P., O'Dwyer, L.M.: Technical Appendix B. School Effectiveness Models and Analyses. In: TIMSS and PIRLS 2011 Relationships report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College (2013)

Martin, M.O., Mullis, I.V.S.: TIMSS and PIRLS 2011: Relationships Among Reading, Mathematics, and Science Achievement at the Fourth Grade - Implications for Early Learning. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College (2013)

Rabe-Hesketh, S., Skrondal, A.: Multilevel and Longitudinal Modeling using Stata (3rd ed). Stata Press, College Station, TX (2012)

Raudenbush, S.W., Bryk, A.S.: Hierarchical linear models: Applications and data analysis methods. Sage, Thousand Oaks, CA (2002)

Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, New York (1987)

Snijders, T.A.B., Bosker, R.J.: Multilevel analysis: an introduction to basic and advanced multilevel modelling (2nd ed). SAGE Publications, Inc (2011)

Wu, M.: The role of plausible values in large-scale surveys. Studies in Educational Evaluation 31, 114-128 (2005)

WEIGHTS

At each hierarchical level, the weight is defined as the product of:
- the sampling weight (i.e. the reciprocal of the conditional sampling probability);
- the adjustment weight, which accounts for non-participation of sampled units.

Overall weights are obtained by multiplying the weights across the hierarchical levels (i: student; j: class; k: school):
- the student weight is obtained as $w_{ijk} = w_{i|jk} \, w_{j|k} \, w_k$;
- the class weight is obtained as $w_{jk} = w_{j|k} \, w_k$.

In order to perform weighted estimation in a multilevel model, the weights must refer to the relevant hierarchical levels:
- conditional student weight: $w_{i|jk}$;
- unconditional class weight: $w_{jk} = w_{j|k} \, w_k$.
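The products above can be written out as a small helper. This is only an illustrative sketch: the function name `overall_weights` and the weight values are hypothetical, and each input is assumed to already combine the sampling weight and the adjustment weight for its level.

```python
def overall_weights(w_student_given_class, w_class_given_school, w_school):
    """Return (overall student weight w_ijk, unconditional class weight w_jk)."""
    w_class = w_class_given_school * w_school      # w_jk = w_{j|k} * w_k
    w_student = w_student_given_class * w_class    # w_ijk = w_{i|jk} * w_jk
    return w_student, w_class

# Hypothetical weights for one student
w_ijk, w_jk = overall_weights(w_student_given_class=2.0,
                              w_class_given_school=1.5,
                              w_school=10.0)
print(w_ijk, w_jk)
```

For weighted multilevel estimation, the pair to pass on is the conditional student weight (the first argument) and the unconditional class weight `w_jk`, not the overall student weight `w_ijk`.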

ESTIMATED REGRESSION COEFFICIENTS (CONT.)

Gross Value Added (GVA):
- The effect of GVA is modelled by a linear spline with a single knot at 100 (the national average).
- GVA has a significant effect only for provinces below the national average, with no significant difference across outcomes.
- For the province with the lowest value of GVA (55), the effect is minus 22.5 points.

EXPLAINED VARIANCES AND RESIDUAL ICCs

The proportions of variance explained by the final model with respect to the null model are higher at class level:
- the within-class variances reduce by 15% for the three outcomes
- the between-class variances reduce by 33% for Reading, 20% for Math and 26% for Science: compositional and contextual effects are more relevant for achievement in Reading

The residual ICCs are quite high (16% for Reading, 28% for Math and 27% for Science), which points to relevant unobserved class-level factors.

The correlations among outcomes are similar to those in the null model.
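As a consistency check, the residual ICCs can be recovered from the null-model ICCs (19.8%, 28.8% and 29.4%) and the reported variance reductions. The sketch below assumes each null-model total variance is scaled to 1 and works with the rounded percentages, so it matches the reported values only up to rounding; the function name `residual_icc` is hypothetical.

```python
def residual_icc(null_icc, between_reduction, within_reduction):
    """Residual ICC after covariates shrink the two variance components."""
    between = null_icc * (1.0 - between_reduction)         # residual class-level variance
    within = (1.0 - null_icc) * (1.0 - within_reduction)   # residual student-level variance
    return between / (between + within)

icc_reading = residual_icc(0.198, between_reduction=0.33, within_reduction=0.15)
icc_math = residual_icc(0.288, between_reduction=0.20, within_reduction=0.15)
icc_science = residual_icc(0.294, between_reduction=0.26, within_reduction=0.15)
print(round(icc_reading, 2), round(icc_math, 2), round(icc_science, 2))
```

Reading ends up with the lowest residual ICC both because it starts from the lowest null-model ICC and because its between-class variance is reduced the most.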

MODEL SELECTION STRATEGY

The selection process in principle requires fitting the multivariate model repeatedly, each time combining the estimates with MI.

To speed up the process, we adopt two simplifications:
- the outcomes are analysed separately by means of univariate multilevel models, retaining the covariates that are significant in at least one of the univariate models
- the estimation is carried out using only the first plausible value (standard errors are underestimated, which leads to a conservative selection of the covariates)

Covariates are added in the following hierarchical order: student, teacher, class, school, province.

Remark: we center continuous covariates at their sample grand means, and we do not center student-level covariates at their class-level means.