Multilevel Models: Pooled and Clustered Data


1 Multilevel Models: Pooled and Clustered Data
ICPSR, June 1-5, 2015
Tom Carsey

2 In the Beginning...

We start with a simple linear model, where $i$ subscripts individual observations:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$  (1)

We assume that the observations $Y_i$ (conditional on the $X$'s) are independently and identically distributed (iid). In other words, the $\varepsilon_i$ are assumed to be iid.
- Independent means that each $\varepsilon_i$ is unrelated/uncorrelated with the other $\varepsilon_i$'s and with anything else - most importantly, the $X$'s.
- Identically means that they can be considered drawn from a common probability distribution.

3 What Does iid Look Like?

If we have iid, we can write $\varepsilon_i \sim N(0, \sigma^2)$, which means normally distributed with mean 0 and a constant variance $\sigma^2$. Another way to write this is with a variance-covariance matrix for the residuals:

$E[\varepsilon\varepsilon'] = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$  (2)

The main diagonal shows a constant variance; the off-diagonals show no correlation among residuals. If we extract $\sigma^2$ from the matrix, we are left with $\sigma^2$ times an identity matrix.

4 If All is Well...

If this is a normal model, then we can estimate the parameters via OLS as follows:

$\hat{\beta} = (X'X)^{-1}X'Y$  (3)

And we can estimate proper standard errors as follows:

$VCV(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$  (4)

where

$\hat{\sigma}^2 = \frac{\sum \hat{\varepsilon}_i^2}{n - p} = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n - p}$  (5)

with $n$ = sample size and $p$ = number of parameters in the regression model. Estimates of the parameters will be unbiased and efficient, and estimates of the standard errors will also be correct... and Life is Good!
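These formulas are easy to check numerically. Below is a minimal sketch (not from the slides; all object names and values are illustrative) that computes $\hat{\beta}$ and its variance-covariance matrix by matrix algebra on simulated data; the results should match lm().

# Minimal sketch: OLS by matrix algebra (illustrative, not from the slides)
set.seed(42)
n <- 100
X <- cbind(1, rnorm(n))                      # design matrix with an intercept column
y <- X %*% c(0.3, 0.5) + rnorm(n)
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y # (X'X)^{-1} X'Y
e <- y - X %*% beta.hat                      # residuals
sigma2.hat <- sum(e^2) / (n - ncol(X))       # e'e / (n - p)
vcv <- sigma2.hat * solve(t(X) %*% X)        # sigma^2 (X'X)^{-1}
cbind(est = beta.hat, se = sqrt(diag(vcv)))  # matches the coef()/SE output of lm(y ~ X[, 2])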

5 What if We Violate iid?

Violation of the independently distributed part of iid, due to correlation among residuals, takes the form of serial, spatial, or cluster correlation. If that is our only problem, it will leave coefficient estimates unbiased, but our standard errors will be wrong.

Violation of the identically distributed part of iid usually takes the form of non-constant variance. If that is our only problem, it will also leave coefficient estimates unbiased, but our standard errors will be wrong.

Let's look at the math.

6 Violation of iid

I presented the formula for the standard errors of a normal regression model as:

$VCV(\hat{\beta}) = \sigma^2 (X'X)^{-1}$  (6)

However, the formula really is:

$VAR(\hat{\beta}) = (X'X)^{-1} X' E[\varepsilon\varepsilon'] X (X'X)^{-1}$  (7)

Assume that $E[\varepsilon\varepsilon']$ equals $\sigma^2$ times some $n \times n$ matrix called $V$. We can write:

$VAR(\hat{\beta}) = (X'X)^{-1} X' \sigma^2 V X (X'X)^{-1} = \sigma^2 (X'X)^{-1} X' V X (X'X)^{-1}$

7 Violation of iid (cont.)

Carried forward from the previous slide:

$VAR(\hat{\beta}) = (X'X)^{-1} X' \sigma^2 V X (X'X)^{-1} = \sigma^2 (X'X)^{-1} X' V X (X'X)^{-1}$

If $V$ is an identity matrix, the whole thing simplifies down to what we have seen before, because $X'X(X'X)^{-1} = I$. However, if we violate iid, then $V$ will not be an identity matrix. Thus, when we violate iid in the ways shown here, our default formula for estimating standard errors will not be correct. Pooled or clustered data almost certainly violate iid.
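A hedged sketch of the same sandwich formula in R (illustrative names and values, not from the slides): with $V = I$, the sandwich collapses to the familiar $\sigma^2(X'X)^{-1}$.

sandwich.vcv <- function(X, V, sigma2) {
  bread <- solve(t(X) %*% X)                     # (X'X)^{-1}
  sigma2 * bread %*% t(X) %*% V %*% X %*% bread  # the sandwich
}
X <- cbind(1, rnorm(50))
all.equal(sandwich.vcv(X, diag(50), 2),          # V = identity...
          2 * solve(t(X) %*% X))                 # ...reproduces the default formula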

8 Standard Heteroscedasticity

Standard heteroscedasticity violates the identically distributed part of iid, and looks like this:

$E[\varepsilon\varepsilon'] = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$  (8)

The major difference now is that $\sigma^2$ is subscripted by $i$ because it is no longer a single constant value.

9 Standard Serial Correlation (AR1)

Standard first-order serial correlation violates the independently distributed part of iid, and looks like this:

$E[\varepsilon\varepsilon'] = \frac{\sigma^2_\nu}{1 - \rho^2} \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}$  (9)

Note that $\sigma^2_\nu$ is back to being constant here, so there is no heteroscedasticity to worry about.
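As a small illustration (not from the slides; values are illustrative), this covariance matrix is easy to build in R from its $\rho^{|i-j|}$ pattern:

n <- 4; rho <- 0.5; sigma2.nu <- 1               # illustrative values
R <- rho^abs(outer(1:n, 1:n, "-"))               # entries rho^|i - j|
Sigma <- (sigma2.nu / (1 - rho^2)) * R           # the AR(1) matrix in equation (9)
round(Sigma, 3)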

10 Pooled/Clustered Data and iid

Pooled or clustered data can violate one or both i's of iid simply because of the clustered structure of the data. Remember, regression models (LMs and GLMs alike) assume no structure to the residuals - they are supposed to be iid. The clustered nature of the data can leave such structure in the residuals if we do not account for it intentionally.

The model changes from:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$  (10)

to something like:

$Y_{ij} = X_{ij}\beta + e_{ij}$  (11)

with $i$ and $j$ subscripting individuals and clusters, respectively (Level 1 and Level 2), and where (if nested)

$e_{ij} = \alpha_j + \varepsilon_{ij}$  (12)

11 The Basic Multi-Level Model

Multi-level models are written in many ways. For example, we can write Level 1 as follows:

$Y_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \varepsilon_{ij}$  (13)

And then two Level 2 equations like this:

$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + \mu_{0j}$  (14)
$\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + \mu_{1j}$  (15)

In multi-level models, we are generally concerned with the potential for a Level-2 variable to affect $Y$ through affecting either the intercept or the slope of the Level-1 equation. Note that we could put $Z$ in the first equation directly if we wanted.

12 Expressing the Multi-level Model

If we do some substitutions from the previous slide, we get:

$Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + \gamma_{11} X_{ij} Z_j + \mu_{1j} X_{ij} + \mu_{0j} + \varepsilon_{ij}$  (16)

This is just one $X$ and one $Z$, showing that multi-level models get complex quickly. Notice that the basic form of the multi-level model implies an interaction between $X$ and $Z$ if the slopes operating on $X$ are allowed to vary (either randomly across $Z$ or directly as a function of $Z$). This does NOT preclude the explicit inclusion of interaction terms, however.

What does clustered data mean for iid?

13 The Matrix Reloaded

Remember again the formula for standard errors:

$VAR(\hat{\beta}) = (X'X)^{-1} X' \sigma^2 V X (X'X)^{-1} = \sigma^2 (X'X)^{-1} X' V X (X'X)^{-1}$

Pooled or clustered data can be thought of as sets of Level 1 observations stacked on top of each other by clusters (e.g. Level 2). If there are $N$ observations per cluster and $M$ clusters, then $V$ will be an $M \times M$ block matrix:

$V = \begin{pmatrix} P & 0 & \cdots & 0 \\ 0 & P & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & P \end{pmatrix}$  (17)

where each $0$ and $P$ is itself an $N \times N$ submatrix for each cluster. If each $P$ matrix can be written as the same $\sigma^2$ times an identity matrix, we're fine.

14 The Matrix Reloaded (cont.)

If each $P$ submatrix does not have the same $\sigma^2$ for each cluster (even if it is constant within clusters), then we violate the identically distributed "i" again. If each cluster shares something in common about the average level of $Y$ (conditional on $X$), then the residuals within each $P$ matrix will be correlated with each other (even if not across clusters). That violates the independently distributed "i".

Cluster correlation might look like this:

$P_j = \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}$  (18)

where there is constant correlation within clusters. That correlation could be the same for each cluster, or we could subscript $\rho$ by $j$ to let it vary across clusters. A small sketch of this block structure follows.
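Here is that sketch in R (illustrative values and names, not from the slides): an exchangeable within-cluster correlation block $P$ and the block-diagonal $V$ built from it.

N <- 3; M <- 2; rho <- 0.4                 # cluster size, number of clusters, correlation
P <- matrix(rho, N, N); diag(P) <- 1       # constant correlation within a cluster
V <- kronecker(diag(M), P)                 # P blocks on the diagonal, 0 blocks elsewhere
V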

15 Multilevel Data

Classical multilevel data: observations ($i$) in groups ($j$); can have covariates at the individual or group level.
- Students in classes
- Voters in states
- Events in countries

Repeated measures: repeated measures on units lead to clustering within units. Predictors can be available at the measurement level (e.g. how you feel today) or the unit level (e.g. your gender).

Time series cross-sectional data (TSCS): e.g., state-year data (clustered within states and also within years).

Inter-cluster correlation, often labeled $\rho$, measures the degree of clustering:

$\rho = \frac{s^2_{between}}{s^2_{between} + s^2_{within}}$  (19)

where $s^2_{between}$ = variance between clusters and $s^2_{within}$ = variance within clusters.
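In practice this quantity can be computed from the variance components of a fitted random-intercept model. A hedged sketch (illustrative data and names, not from the slides):

library(lme4)
set.seed(1)
dat <- data.frame(cluster = rep(1:10, each = 20))
dat$Y <- rnorm(10)[dat$cluster] + rnorm(200)   # between sd = 1, within sd = 1
m <- lmer(Y ~ 1 + (1 | cluster), data = dat)
vc <- as.data.frame(VarCorr(m))
vc$vcov[1] / sum(vc$vcov)                      # rho = between / (between + within); ~0.5 here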

16 Some Options for Dealing with Multilevel Data

- Complete pooling: a single regression completely ignoring the group information.
- No pooling:
  - Run a single regression that includes group indicators (e.g. a set of dummy variables) but no group-level predictors (what TSCS often calls "fixed effects").
  - Run a different regression for every group.
  - Measuring context with replicated values artificially deflates standard errors by failing to acknowledge the proper sample size at Level 2.
- Multilevel models: a flexible compromise between no pooling and complete pooling (equivalent to what TSCS often calls "random effects").
  - An example of shrinkage models: outlying groups contribute some information to parameter estimation, but are shrunk toward the overall mean.

17 Another Look at the Model

[slide content not transcribed]

18 Another Look at the Model (cont.)

[slide content not transcribed]

19 What is Fixed and What is Random?

From the previous slide, all of those gamma ($\gamma$) parameters are called "fixed" because they do not vary. They may have uncertainty about them, but they do not vary.

The $\mu$'s are "random" because they represent the stochastic components of the intercept and the slopes on the $X$'s. This is NOT the same as estimating a standard error for those parameters. Most attention for the random components is devoted to their variances (and possible covariances).

20 Linear Model Visually

[figure not transcribed]

21 Multi-Level, Random Slopes

[figure not transcribed]

22 Multi-Level, Random Slopes and Intercepts

[figure not transcribed]

23 Reasons to Use Multilevel Models

- Accounting for individual and group level variation in estimating group-level coefficients. Interest may lie in group-level independent variables, but a regression can't have group-level independent variables AND a set of group-level dummies.
- Modeling variation among individual-level coefficients. Convenient when we want to model the variation in coefficients across groups. Letting coefficients vary randomly across levels is NOT the same as modeling an interaction between two independent variables. However, you can include multiplicative interaction terms (at the same level or across levels) in the model if you want.
- Estimating coefficients for particular groups. Can get reasonable estimates even for groups with small sample sizes, because those estimates borrow strength from the estimates for other groups in the analysis. That would be hard with standard regression.

24 Limits of Multilevel Models

All the assumptions of a single-level model are still there. Plus, each level must meet the basic assumptions of the model at that level. Plus, nothing at one level can confound the model at another level. Importantly, the model assumes:
- That mean unit effects are uncorrelated with the means of Level-1 independent variables or with the actual values of Level-2 variables (bias).
- That the variance of unit effects is constant across units (efficiency).

The model reduces to complete pooling or no pooling in extreme cases. It generally requires more than 5 groups to make it worth moving beyond no pooling (e.g. just use group dummies if the number of groups is less than 5).

25 Basic Concepts: Varying Intercept Models (G&H notation)

$y_i = \alpha_{j[i]} + \beta x_i + \varepsilon_i$

- All groups have the same slope.
- Each group has a different intercept.
- Notation: $i$ indexes observations and $j$ indexes groups.
- Common to see the $i$ dropped and the model written with $\alpha_j$.

[figure: Y plotted against X]

26 Basic Concepts: Varying Slope Models

$y_i = \alpha + \beta_{j[i]} x_i + \varepsilon_i$

- All groups have the same intercept.
- Each group has a different slope.
- Usually a very poor model to estimate.

[figure: Y plotted against X]

27 Basic Concepts: Varying Intercept, Varying Slope Models

$y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \varepsilon_i$

- All groups have different slopes.
- Each group has a different intercept.
- Provides interactions between $X$ and the group designations.
- Also easy to extend to group-level variables (later).

[figure: Y plotted against X]

28 The Core of Multilevel Models

The main point of multilevel models is to provide a statistical model for the group level. At minimum, group intercepts are given a common distribution. Additionally, group slopes could be given a common distribution, or slopes and intercepts can both be estimated for each group. Group-level parameters can also be modeled as functions of group-level predictors. Again, the point is to give a model to these group-level parameters.

29 Terminology Problems

Multilevel varying coefficients ($\alpha_j$ or $\beta_j$) are often called "random effects," in reference to the randomness in the probability model of the group-level coefficients (details to come).

"Fixed effects" are usually defined as varying coefficients that are not themselves modeled (i.e. $J - 1$ group-level indicators). However, "fixed effects" sometimes refers to models where the coefficients do not vary by group (so they are fixed, not random). In other words, sometimes "fixed effects" refers to models that include a set of dummy variables, but other times it refers to parameters ($\beta$'s) that are estimated with only one parameter operating on a given variable - for example, a single $\beta_1$ operating on $X_1$ rather than a $\beta_{1j}$ for $X_1$ that varies across the $j$ groups.

30 Simulation in Action

Simulations are valuable in statistics because they let you define a DGP and then evaluate whether a particular statistical method accurately recovers that DGP from a sample of data. The researcher defines "Truth" and uses a computer as a laboratory to evaluate how well sample estimates, and the methods on which they are based, map onto that Truth.

We'll start with a simple OLS regression:

$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$  (20)

The DGP has an exogenous $X$, a deterministic model linking $X$ to $Y$, and a random stochastic component. How can we represent this in R?

31 DGP in R for OLS

set.seed(74392)
N <- 1000        # sample size (the 998 residual df in the output imply N = 1000)
B0 <- .3
B1 <- .5
X <- rnorm(N, 0, 1)
error <- rnorm(N, 0, 1)
Y <- B0 + B1*X + error
M1 <- lm(Y ~ X)
summary(M1)

32 Produces these results

Call:
lm(formula = Y ~ X)

Residuals: [values lost in transcription]

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  [values lost in transcription]  <2e-16 ***
X            [values lost in transcription]  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: [lost] on 998 degrees of freedom
Multiple R-squared: [lost], Adjusted R-squared: [lost]
F-statistic: [lost] on 1 and 998 DF, p-value: < 2.2e-16

33 Results (cont.)

The intercept is close to, but not exactly, 0.3. The slope is close to, but not exactly, 0.5. The following shows that a 95% confidence interval around each does include the true values:

> Results <- cbind(confint(M1), c(B0, B1))
> Results
              2.5 %   97.5 %
(Intercept)  [values lost in transcription]
X            [values lost in transcription]

But this is just one cut. Simulations should include lots (often thousands) of repetitions. Let's look.

34 Setting the Parameters for the OLS Simulation

reps <- 1000     # number of repetitions (value lost in transcription; thousands are typical)
Sim.Slopes <- matrix(NA, nrow = reps, ncol = 1)
set.seed(74392)
N <- 1000
B0 <- .3
B1 <- .5
X <- rnorm(N, 0, 1)

35 Code for the for() loop

for(i in 1:reps){
  error <- rnorm(N, 0, 1)
  Y <- B0 + B1*X + error
  m1 <- lm(Y ~ X)
  Sim.Slopes[i] <- coef(m1)[2]   # store the estimated slope from this repetition
}

36 Density Plot for $\beta_1$ from Prior Simulation

[figure: density of the simulated slope estimates]
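A hedged sketch of how a plot like this could be drawn from the loop above (not the original plotting code):

plot(density(Sim.Slopes), main = "Simulated estimates of B1")
abline(v = B1, lty = 2)   # dashed vertical line at the true slope, 0.5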

37 How does OLS Compare When Data is Clustered?

Run a simulation several times, each time estimating three models:
- Simple OLS
- Unit dummy variables
- Multi-level with random intercepts

Each run will include hundreds of repetitions. (Look at the R code for illustration.)

38 No Clustering; Densities of $\beta_1$

[figure not transcribed]

39 No Clustering; Coverage Probability for $\beta_1$

[figure not transcribed]

40 Results with No Clustering

All three, including OLS, appear to be unbiased and efficient. OLS with fixed effects (e.g. no pooling) appears to be less efficient. Robust cluster standard errors (RCSE) fall short on the coverage probability (coverage of roughly 85 to 91 percent). In short, plain OLS works as well as the other two models, and robust cluster standard errors do not perform well.

Now, let's add a random unit effect.

41 Now let's generate clustered data

We'll look at the code in more detail later; a hedged sketch of the general idea appears below. The DGP still generates Y as a function of a single X measured at Level 1, with true $\beta_0 = 0.3$ and true $\beta_1 = 0.5$.

We estimate three models:
- Simple OLS
- Unit dummy variables
- Multi-level with random intercepts

We use three scenarios: no clustering, clustering only, and clustering correlated with the mean of X.
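Here is that sketch (all names, the seed, and the values are illustrative, not the original code). It makes 10 clusters of 40 observations, matching the n = 400 and 10 groups in the output that follows. Setting the sd of alpha to 0 gives the no-clustering scenario; shifting X by cluster would induce the correlation-with-X scenario.

set.seed(99)                              # illustrative seed
M <- 10; n.per <- 40                      # 10 clusters of 40 = 400 observations
c.label <- rep(1:M, each = n.per)         # cluster labels
B0 <- .3; B1 <- .5                        # the true parameters
alpha <- rnorm(M, 0, 0.7)                 # random unit effects (sd illustrative)
X <- rnorm(M * n.per)
Y <- B0 + alpha[c.label] + B1 * X + rnorm(M * n.per)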

42 No Clustering, OLS

lm(formula = Y ~ X)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
X            [values lost in transcription]
n = 400, k = 2
residual sd = 0.96, R-Squared = [lost]

43 No Clustering, Unit Dummies

lm(formula = Y ~ X + factor(c.label) - 1)
                    coef.est  coef.se
X                   [values lost in transcription]
factor(c.label)1 ... factor(c.label)10  [estimates lost in transcription]
n = 400, k = 11
residual sd = 0.96, R-Squared = [lost]

44 No Clustering, MLM with Random Intercepts

lmer(formula = Y ~ X + (1 | c.label))
             coef.est  coef.se
(Intercept)  [values lost in transcription]
X            [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.
 c.label   (Intercept)  0.00
 Residual               [lost]
number of obs: 400, groups: c.label, 10
AIC, DIC, and deviance values lost in transcription.

45 Unit-level Coefficients from MLM

[table of per-unit (Intercept) and X columns; values lost in transcription]

46 Simulation Study, No Unit Effects

- Simulation study with 500 repetitions
- True $\beta_0 = 0.3$
- True $\beta_1 = 0.5$
- Intercluster correlation ($\rho$) = 0
- Correlation between unit effects and $\bar{X}_j$ = 0

47 Density Plots for $\beta_1$, No Unit Effects

[figure not transcribed]

48 Coverage Probs for $\beta_1$, No Unit Effects

[figure not transcribed]
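A hedged sketch of one way a coverage probability like the one plotted here could be computed (illustrative; it assumes each repetition also stored the 95% CI for the slope, e.g. Sim.CI[i, ] <- confint(m1)["X", ] inside the loop, where Sim.CI is a hypothetical reps x 2 matrix):

coverage <- mean(Sim.CI[, 1] <= B1 & Sim.CI[, 2] >= B1)  # fraction of CIs covering the truth
coverage   # should be near 0.95 for a well-calibrated estimator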

49 Unit Effects, OLS

lm(formula = Y ~ X)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
X            [values lost in transcription]
n = 400, k = 2
residual sd = 0.97, R-Squared = [lost]

50 Unit Effects, Unit Dummies

lm(formula = Y ~ X + factor(c.label) - 1)
                    coef.est  coef.se
X                   [values lost in transcription]
factor(c.label)1 ... factor(c.label)10  [estimates lost in transcription]
n = 400, k = 11
residual sd = 0.68, R-Squared = [lost]

51 Unit Effects, MLM

lmer(formula = Y ~ X + (1 | c.label))
             coef.est  coef.se
(Intercept)  [values lost in transcription]
X            [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.
 c.label   (Intercept)  0.74
 Residual               [lost]
number of obs: 400, groups: c.label, 10
AIC = 874.3, DIC = 854, deviance = [lost]

52 Unit-level Coefficients from MLM

[table of per-unit (Intercept) and X columns; values lost in transcription]

53 Comparison of FE and MLM Intercepts

[table of the ten estimated intercepts from the fixed-effects (FE) and MLM models; values lost in transcription]

> var(fe.mlm.intercepts[,1])
[value lost in transcription]
> var(fe.mlm.intercepts[,2])
[value lost in transcription]

54 Simulation Study, Unit Effects

- Simulation study with 500 repetitions
- True $\beta_0 = 0.3$
- True $\beta_1 = 0.5$
- Intercluster correlation ($\rho$) = 0.5
- Correlation between unit effects and $\bar{X}_j$ = 0

55 Density Plots for $\beta_1$, Unit Effects

[figure not transcribed]

56 Coverage Probs for $\beta_1$, Unit Effects

[figure not transcribed]

57 Unit Effects Correlated with Mean of X, OLS

lm(formula = Y ~ X)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
X            [values lost in transcription]
n = 400, k = 2
residual sd = 0.93, R-Squared = [lost]

58 Unit Effects Correlated with Mean of X, FE

lm(formula = Y ~ X + factor(c.label) - 1)
                    coef.est  coef.se
X                   [values lost in transcription]
factor(c.label)1 ... factor(c.label)10  [estimates lost in transcription]
n = 400, k = 11
residual sd = 0.65, R-Squared = [lost]

59 Unit Effects Correlated with Mean of X, MLM

lmer(formula = Y ~ X + (1 | c.label))
             coef.est  coef.se
(Intercept)  [values lost in transcription]
X            [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.
 c.label   (Intercept)  0.71
 Residual               [lost]
number of obs: 400, groups: c.label, 10
AIC = 835.7, DIC = [lost], deviance = [lost]

60 Simulation Study, Unit Effects Corr. with $\bar{X}_j$

- Simulation study with 500 repetitions
- True $\beta_0 = 0.3$
- True $\beta_1 = 0.5$
- Intercluster correlation ($\rho$) = 0.5
- Correlation between unit effects and $\bar{X}_j$ = [value lost in transcription; nonzero in this scenario]

61 Density Plots for $\beta_1$, Unit Effects Corr. with X

[figure not transcribed]

62 Challengers in State Legislative Races (Hogan 2008, AJPS)

Dependent variables:
- challenged = whether a challenger emerged or not (0,1)
- challe_a = challenger spending (in thousands of dollars, I believe)

Main IV: partisan, a measure of the extremity of the incumbent's voting record.
Level 2 IV: legislat, legislative professionalism.
Observations are clustered in states.

63 Complete Pooling

Complete pooling: $y_i = \alpha + \beta x_i + \varepsilon_i$

library(arm)   # for display(); assumed to be loaded at this point
m1 <- lm(challe_a ~ partisan, data = hogan)
display(m1)    # could also do summary(m1)

lm(formula = challe_a ~ partisan, data = hogan)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
partisan     [values lost in transcription]
n = 1280, k = 2
residual sd = 64.66, R-Squared = [lost]

64 No Pooling

No pooling: $y_i = \alpha_{j[i]} + \beta x_i + \varepsilon_i$

m2 <- lm(challe_a ~ -1 + partisan + factor(state), data = hogan)
display(m2)

lm(formula = challe_a ~ -1 + partisan + factor(state), data = hogan)
                  coef.est  coef.se
partisan          [values lost in transcription]
factor(state)AK ... factor(state)UT  [estimates lost in transcription]
n = 1280, k = 13
residual sd = 60.36, R-Squared = [lost]

65 Separate Models

states <- unique(hogan$state)
results <- list()
coef.sm <- numeric(length(states))
se.sm <- numeric(length(states))
for(i in 1:length(states)){
  results[[i]] <- lm(challe_a ~ partisan,
                     data = subset(hogan, state == states[i]))
  coef.sm[i] <- results[[i]]$coef[2]           # slope on partisan for this state
  se.sm[i] <- sqrt(vcov(results[[i]])[2, 2])   # its standard error
}

66 Complete Pooling/No Pooling/Separate Models

[figure: the coefficient on partisan ($\hat{\beta}_{partisan}$) by state (ME, AK, UT, ID, MN, KY, OR, MI, IL, OH, FL, CA), comparing complete pooling, no pooling, and separate models]

67 Basic Multilevel Analysis

$y_i \sim N(\alpha_{j[i]} + \beta x_i, \sigma^2_y)$, for $i = 1, \ldots, n$

This looks kind of like the no pooling model:
- In no pooling, the $\alpha_j$'s are set to the standard OLS estimates, which correspond to the fitted intercept shifts for each state.
- Estimates of intercept shifts could be way off if the sample size in a state is small.

It also looks kind of like the complete pooling model:
- In complete pooling, the $\alpha_j$'s are given a hard constraint (all fixed at a common $\alpha$).
- That is probably not a realistic constraint if we think there is a group effect in the data.

68 Basic Multilevel Analysis

Multilevel models apply a soft constraint to the $\alpha_j$'s: they are assigned a probability distribution

$\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$, for $j = 1, \ldots, J$

with the mean $\mu_\alpha$ and the standard deviation $\sigma_\alpha$ estimated from the data. The result is a weighted average between the complete pooling and no pooling models. This has the effect of pulling the estimates of $\alpha_j$ toward the mean $\mu_\alpha$, but not all the way. Thus, we have partial pooling.
- As $\sigma_\alpha \to \infty$, the soft constraint does nothing and there is no pooling.
- As $\sigma_\alpha \to 0$, the estimates are pulled all the way to a single common mean: complete pooling.
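To make the weighting explicit (a standard approximation from Gelman and Hill's treatment, not spelled out on the slide): in the simplest case with no predictors, the partially pooled intercept for group $j$ with $n_j$ observations is approximately

$\hat{\alpha}_j \approx \dfrac{\frac{n_j}{\sigma^2_y}\,\bar{y}_j + \frac{1}{\sigma^2_\alpha}\,\hat{\mu}_\alpha}{\frac{n_j}{\sigma^2_y} + \frac{1}{\sigma^2_\alpha}}$

so groups with small $n_j$ are pulled more strongly toward $\hat{\mu}_\alpha$, while large groups stay close to their own means.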

69 Varying Intercept Model

m4 <- lmer(challe_a ~ partisan + (1 | state), data = hogan)
display(m4)

lmer(formula = challe_a ~ partisan + (1 | state), data = hogan)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
partisan     [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.
 state     (Intercept)  [lost]
 Residual               [lost]
number of obs: 1280, groups: state, 12
AIC = 14158, DIC = [lost], deviance = [lost]

The 1 in the parentheses tells the lmer() function that you want the intercept to vary by state.

70 Estimated Coefficients

To see the estimated model within each state:

> round(coef(m4)$state, digits = 3)
[table of per-state (Intercept) and partisan columns for AK ... UT; values lost in transcription]

In Alaska the estimated line is y = (intercept) + (slope)(partisan), with the numeric values lost in transcription. The slopes are all identical because only the intercepts were allowed to vary.

71 Compare Unit Effects Estimates

[table comparing the no-pooling (dummy variable) and MLM intercept estimates, factor(state)AK ... factor(state)UT; values lost in transcription]

> var(np.mlm.intercepts[,1])
[value lost in transcription]
> var(np.mlm.intercepts[,2])
[value lost in transcription]

72 Fixed and Random Effects

Alternatively, we can look separately at the estimated model averaging over the states (fixed effects) and the state-level errors (random effects):

> round(fixef(m4), digits = 3)
(Intercept)  partisan
[values lost in transcription]

In an average state the estimated line is y = (intercept) + (slope)(partisan), with the numeric values lost in transcription.

73 Fixed and Random Effects

The state-level errors:

> round(ranef(m4)$state, digits = 3)
[table of (Intercept) deviations for AK ... UT; values lost in transcription]

These tell us how much the intercept shifts up or down in particular states. Example: for Alaska, the intercept is above average.

74 Uncertainties in the Estimated Coefficients

The arm package has functions to compute the standard errors from a lmer() object:

> round(se.fixef(m4), digits = 3)
(Intercept)  partisan
[values lost in transcription]

> data.frame(se.ranef = round(se.ranef(m4)$state, digits = 3),
             state.n = as.vector(table(hogan$state)))  # completion assumed: the original call was truncated; per-state sample sizes fit the state.n column
[table of se.ranef and state.n for AK ... UT; values lost in transcription]

State intercept standard errors differ according to the sample size in each state.

75 Complete Pooling, No Pooling, and Multilevel Model

[figure not transcribed]

76 Complete Pooling, No Pooling, and Multilevel Model

What do we see in the previous figure?
- The intercepts and slopes are identical for every state with complete pooling.
- The intercepts vary for no pooling (dummy variables), but the slopes are the same.
- The intercepts vary for the multi-level model, but the black lines are all a bit closer to the red one than the blue lines are. Still, the slopes are identical.

NOTE: the figure set the limits on the Y and X axes to be the same to facilitate comparison, but some actual data points don't appear for a few states as a result.

77 Comparing the Intercepts

[figure not transcribed]

78 Formulating/Reformulating the Varying Intercept Model

We have just seen that the varying intercept model can be written as

$y_i = \alpha_{j[i]} + \beta_1 X_{i1} + \varepsilon_i$

Or, in matrix notation,

$y_i = \alpha_{j[i]} + X_i \beta + \varepsilon_i$

where $X$ includes all independent variables but not the constant. $X$ can even include group-level variables.

79 Formulating/Reformulating the Varying Intercept Model

The second level of the model is

$\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$

which can also be written as

$\alpha_j = \mu_\alpha + \eta_j$, with $\eta_j \sim N(0, \sigma^2_\alpha)$

where the $\eta_j$ are group-level errors.

80 Group-Level Predictors

Let's add a state-level predictor, legislative professionalism. We use the formulation

$y_i \sim N(\alpha_{j[i]} + \beta x_i, \sigma^2_y)$, for $i = 1, \ldots, n$
$\alpha_j \sim N(\gamma_0 + \gamma_1 u_j, \sigma^2_\alpha)$, for $j = 1, \ldots, J$

where $x_i$ is the variable partisan and $u_j$ is legislative professionalism (legislat).

81 Group-Level Predictors

m5 <- lmer(challe_a ~ partisan + legislat + (1 | state), data = hogan)
display(m5)

lmer(formula = challe_a ~ partisan + legislat + (1 | state), data = hogan)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
partisan     [values lost in transcription]
legislat     [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.
 state     (Intercept)  [lost]
 Residual               [lost]
number of obs: 1280, groups: state, 12
AIC, DIC, and deviance values lost in transcription.

82 Coefficient Results

> round(coef(m5)$state, digits = 3)
[table of (Intercept), partisan, and legislat columns for AK ... UT; values lost in transcription]

83 Intercepts as a Function of Leg. Prof. (m5)

[figure: estimated state intercepts plotted against legislative professionalism, with the state-level regression line $\alpha_j = \gamma_0 + \gamma_1 u_j$]

84 Varying Intercepts and Varying Slopes

Next step: allow more than one coefficient to vary by group. Let's start simple by (temporarily) dropping the state-level predictor. The model is

$y_i \sim N(\alpha_{j[i]} + \beta_{j[i]} x_i, \sigma^2_y)$, for $i = 1, \ldots, n$

$\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma^2_\alpha & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma^2_\beta \end{pmatrix} \right)$, for $j = 1, \ldots, J$

which allows variation in both the $\alpha_j$'s and the $\beta_j$'s, as well as a between-group correlation parameter $\rho$.

85 Fitting in R

> m6 <- lmer(challe_a ~ partisan + (1 + partisan | state), data = hogan)
> display(m6)
lmer(formula = challe_a ~ partisan + (1 + partisan | state), data = hogan)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
partisan     [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.  Corr
 state     (Intercept)  [lost]
           partisan     [lost]    [lost]
 Residual               [lost]
number of obs: 1280, groups: state, 12
AIC = 14161, DIC = [lost], deviance = [lost]

The unexplained within-state variation has an estimated SD of $\hat{\sigma}_y$ = [value lost in transcription]. The estimated SD of the state intercepts is $\hat{\sigma}_\alpha$ = [value lost]. The estimated SD of the state slopes is 2.69. The intercepts and slopes correlate at [value lost]. AIC increased.

86 Intercepts and Slopes by State

> round(coef(m6)$state, digits = 3)
[table of (Intercept) and partisan columns for AK ... UT; values lost in transcription]

87 Including Group-Level Predictors

We can expand the basic model by including group-level predictors:

$\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\left( \begin{pmatrix} \gamma^\alpha_0 + \gamma^\alpha_1 u_j \\ \gamma^\beta_0 + \gamma^\beta_1 u_j \end{pmatrix}, \begin{pmatrix} \sigma^2_\alpha & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma^2_\beta \end{pmatrix} \right)$, for $j = 1, \ldots, J$

88 Fitting in R

m7 <- lmer(challe_a ~ partisan + legislat + (1 + partisan | state), data = hogan)
display(m7)

lmer(formula = challe_a ~ partisan + legislat + (1 + partisan | state), data = hogan)
             coef.est  coef.se
(Intercept)  [values lost in transcription]
partisan     [values lost in transcription]
legislat     [values lost in transcription]
Error terms:
 Groups    Name         Std.Dev.  Corr
 state     (Intercept)  [lost]
           partisan     [lost]    [lost]
 Residual               [lost]
number of obs: 1280, groups: state, 12
AIC, DIC, and deviance values lost in transcription.

89 Slope and Intercept Estimates by State

> round(coef(m7)$state, digits = 3)
[table of (Intercept), partisan, and legislat columns for AK ... UT; values lost in transcription]

90 Slopes and Intercepts, Deviated from State Means

> round(ranef(m7)$state, digits = 3)
[table of (Intercept) and partisan deviations for AK ... UT; values lost in transcription]

91 Intercepts as a Function of Leg. Prof. (m7)

[figure not transcribed]

92 Slopes on X as a Function of Leg. Prof. (m7)

[figure not transcribed]

93 State Intercepts and State Slopes

[figure not transcribed]

94 Some Practical Questions

How many groups?
- When $J$ is small (e.g. less than 5), it is difficult to estimate between-group variation.
- The result is that multilevel models add little beyond standard no pooling models.
- Basic problem: when $\sigma_\alpha$ can't be estimated well, it tends to be overestimated, and so the partially pooled estimates are close to no pooling.
- However, a multilevel model should not do worse than no pooling.

95 Some Practical Questions

One or two groups?
- With only one or two groups, MLM reduces to standard regression.
- Typically, in this situation you would run a standard regression with an indicator (i.e. an indicator for gender rather than a multilevel model with male and female groups).

96 Some Practical Questions

How many observations per group?
- Even two observations per group is enough to fit a MLM.
- Technically, it is acceptable to have only one observation in a group.
- Groups with small n will have imprecisely estimated coefficients, but may still contain useful information.
