Introduction to the Analysis of Hierarchical and Longitudinal Data


Georges Monette, York University, with Ye Sun. SPIDA, June 7.

Overview

- Graphical overview of selected concepts
- Nature of hierarchical models
- Interpretation of components of the models: data space and beta space
- Understanding T
- BLUPs
- Mixed model: marginal or conditional effects
- Contextual and compositional effects

We use one data example throughout: MathAch ~ SES in a sample of Public and Catholic schools used by Bryk and Raudenbush (1992) from the 1982 High School and Beyond survey. The data comprise 7,185 U.S. high-school students from 160 schools: 90 public and 70 Catholic.

WARNING: a Babel of notation. Early bilingualism is an asset.

                               HLM notation   SAS notation
  Level-1 true effects         β              b
  Population fixed effects     γ              β
  Random-effects variance      T              G matrix
  Within-cluster variance      σ²             σ²

Looking at a single school: Public School P4458
Relationship between MathAch and SES

Regression output:

> summary(lm(MathAch ~ SES, zz1))
Call: lm(formula = MathAch ~ SES, data = zz1)
Residuals:
     Min  1Q  Median  3Q  Max
Coefficients:
             Value  Std. Error  t value  Pr(>|t|)
(Intercept)
SES
Residual standard error: on 46 degrees of freedom
Multiple R-Squared:
F-statistic: on 1 and 46 degrees of freedom
Correlation of Coefficients:
     (Intercept)
SES

Model:  Y_i = β_0 + β_1 X_i + ε_i,   ε_i iid N(0, σ²)
Fit:    Ŷ_i = b_0 + b_1 X_i
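As a sketch of what lm() computes here, the within-school least-squares fit can be reproduced from the design matrix and the normal equations. A minimal Python translation with simulated stand-in data (the actual P4458 values are not reproduced in these notes; the sample size and coefficients below are made up for illustration):

```python
# Hedged sketch: fit Y_i = b0 + b1 * X_i by least squares.
# The data are simulated stand-ins, not the School P4458 records.
import numpy as np

rng = np.random.default_rng(0)
ses = rng.normal(0.0, 1.0, size=48)            # assumed sample of 48 students
mathach = 12.0 + 2.5 * ses + rng.normal(0.0, 5.0, size=48)

X = np.column_stack([np.ones_like(ses), ses])  # design matrix [1, SES]
beta_hat, *_ = np.linalg.lstsq(X, mathach, rcond=None)
b0, b1 = beta_hat                              # intercept and slope estimates
```

The residuals of a least-squares fit are orthogonal to the columns of the design matrix, which is a useful sanity check on any hand-rolled fit.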

Figure 1: Data in one school (Public School 4458, MathAch vs. SES).

Figure 2: Data with least squares line.

Figure 3: Data with least squares line and vertical axis at SES = 0.

Figure 4: Beta space (axes b(SES) and b0): every line in data space is represented by a point in beta space. The least squares line corresponds to the point β̂.

Figure 5: Beta space: every line in data space is represented by a point in beta space. The slope of the least squares line, β̂_SES, is obtained by projecting the point β̂ onto the horizontal axis.

Figure 6: Beta space: every line in data space is represented by a point in beta space. The intercept of the least squares line, β̂_0, is obtained by projecting the point β̂ onto the vertical axis.

Figure 7: 95% Scheffé simultaneous confidence ellipse for the intercept and slope of the true line.

Figure 8: The 95% 2-dimensional Scheffé confidence ellipse rejects the hypothesis with 2 constraints: H_0: β_0 = β_1 = 0.

Figure 9: 95% confidence interval ellipse (in red). Its shadow on any line is a 95% confidence interval for the projection of the intercept and slope onto that line.

Figure 10: Shadows are ordinary 95% confidence intervals. The shadow on the horizontal axis is a 95% confidence interval for the true slope. Note that the interval includes β_1 = 0, so we accept H_0: β_1 = 0.

Figure 11: Shadows of the 2-dim CE are Scheffé confidence intervals adjusted for fishing in 2-D space (but we don't need that now).

Figure 12: Another approach to testing H_0: β_1 = 0: does the line β_1 = 0 cross the ellipse?

Comparing 2 schools: P4458 and C1433

Regression output:

> summary(lm(MathAch ~ SES * Sector, zz))
Call: lm(formula = MathAch ~ SES * Sector, data = zz)
Residuals:
     Min  1Q  Median  3Q  Max
Coefficients:
             Value  Std. Error  t value  Pr(>|t|)
(Intercept)
SES
Sector
SES:Sector
Residual standard error: on 79 degrees of freedom
Multiple R-Squared:
F-statistic: on 3 and 79 degrees of freedom, the p-value is 0
Correlation of Coefficients:
            (Intercept)  SES  Sector
SES
Sector
SES:Sector

> fit$contrasts
$Sector:
         Catholic
  Public        0
Catholic        1

Figure 13: Data space (P4458 and C1433): Public school = o, Catholic school = +.

Figure 14: Data space: Public school = o, Catholic school = triangles.

Figure 15: Data with least squares lines.

Figure 16: Interpreting the least squares estimated coefficients.

Testing no school (= sector) effect

Two ways.

Method 1: Compare 2 models, with and without the Sector factor.

With Sector (full model):
             Value  Std. Error  t value  Pr(>|t|)
(Intercept)
SES
Sector
SES:Sector
Residual standard error: on 79 degrees of freedom
Multiple R-Squared:
F-statistic: on 3 and 79 degrees of freedom, the p-value is 0

Without Sector:
             Value  Std. Error  t value  Pr(>|t|)
(Intercept)
SES
Residual standard error: 5.3 on 81 degrees of freedom
Multiple R-Squared:
F-statistic: 110 on 1 and 81 degrees of freedom, the p-value is 1.11e

Comparison:

Analysis of Variance Table
Response: MathAch
  Terms         Res.Df  RSS  Test                Df  Sum of Sq  F Value  Pr(F)
1 SES
2 SES * Sector               +Sector+SES:Sector                          e-009

Method 2: Linear hypothesis H_0: β_Sector = β_Sector:SES = 0.

With Sector (full model):
             Value  Std. Error  t value  Pr(>|t|)
(Intercept)
SES
Sector
SES:Sector
Residual standard error: on 79 degrees of freedom
Multiple R-Squared:
F-statistic: on 3 and 79 degrees of freedom, the p-value is 0

Test of linear hypothesis:

> library(car)  # John Fox's Companion to Applied Regression
> L <- diag(4)[c(3, 4), ]
> L
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    0
[2,]    0    0    0    1
> coef(fitfull)
(Intercept)  SES  Sector  SES:Sector
> linear.hypothesis(fitfull, L)
F-Test
SS =   SSE =   F =   Df = 2 and 79
p =
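The F statistic that linear.hypothesis() reports can be written directly from the fitted model: F = (Lb̂)' [L (X'X)⁻¹ L']⁻¹ (Lb̂) / (q s²), with q = 2 rows in L and s² on 79 residual degrees of freedom. A hedged Python sketch of this computation on simulated stand-in data (the two-school MathAch data are not reproduced here; all numbers below are illustrative):

```python
# Hedged sketch of the general linear hypothesis H0: L @ beta = 0.
# Simulated stand-in data: n = 83 gives 83 - 4 = 79 residual df as in the output.
import numpy as np

rng = np.random.default_rng(1)
n = 83
ses = rng.normal(size=n)
sector = (rng.random(n) < 0.5).astype(float)    # 0 = Public, 1 = Catholic
y = 12 + 2 * ses + 3 * sector + rng.normal(scale=5, size=n)

X = np.column_stack([np.ones(n), ses, sector, ses * sector])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])           # s^2 on n - 4 df

L = np.eye(4)[[2, 3], :]                        # pick out Sector and SES:Sector
XtX_inv = np.linalg.inv(X.T @ X)
q = L.shape[0]
Lb = L @ b
F = Lb @ np.linalg.solve(L @ XtX_inv @ L.T, Lb) / (q * s2)  # F on (q, n - 4) df
```

Comparing F to the F(2, 79) distribution reproduces the p-value that car reports.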

The dual in Beta space

Figure 17: Least squares lines in data space.

Figure 18: Least squares lines in beta space (axes b(SES) and b0).

Figure 19: Least squares lines in beta space.

Figure 20: Least squares lines in beta space with 95% confidence ellipses.

Code in S-Plus (even easier using library car in R):

> plot(0, 0, xlim = c(-5, 10), ylim = c(-5, 25), xlab = "b(ses)", ylab = "b0",
+   main = "P 4458 and C1433: Estimates of int. and slope", type = "n")
> summary(fitfull <- lm(MathAch ~ SES * Sector, zz))
Call: lm(formula = MathAch ~ SES * Sector, data = zz)
Residuals:
     Min  1Q  Median  3Q  Max
Coefficients:
             Value  Std. Error  t value  Pr(>|t|)
(Intercept)
SES
Sector
SES:Sector
Residual standard error: on 79 degrees of freedom
Multiple R-Squared:
F-statistic: on 3 and 79 degrees of freedom, the p-value is 0
Correlation of Coefficients:
            (Intercept)  SES  Sector
SES
Sector
SES:Sector
> coef(fitfull)

(Intercept)  SES  Sector  SES:Sector

> L.Public <- diag(4)[c(2, 1), ]
> L.Public
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    1    0    0    0
> L.Catholic <- diag(4)[c(2, 1), ] + diag(4)[c(4, 3), ]
> L.Catholic
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    1    0
> lines(ell(h.pub <- glh(fitfull, L.Public)), col = 6)
> lines(ell(h.cat <- glh(fitfull, L.Catholic)), col = 8)
> abline(v = h.pub[[1]][1], col = 6)
> abline(v = h.pub[[2]][1], col = 6)
> abline(h = h.pub[[1]][2], col = 8)
> abline(h = h.pub[[2]][2], col = 8)

Comparing 2 sectors

Some approaches. Two-stage, or "derived variables":

  Stage 1: Perform a regression on each school to get β̂_i for each school.
  Stage 2: Analyze the β̂_i's from each school using multivariate methods to estimate the mean line in each sector, and use multivariate methods to compare the two lines.

Doing this graphically:

Figure 21: Public school regressions in data space.

Figure 22: Mean public school regressions.

Figure 23: Catholic school regressions in data space.

Figure 24: All schools: least squares lines in data space.

Figure 25: Mean regression in each sector in data space.

Figure 26: All schools: least squares lines in beta space.

Figure 27: Standard dispersion ellipse for each sector.

Figure 28: Standard dispersion ellipse and 95% confidence ellipse for each sector.

What's wrong with this?

If every school sample had:
1. the same number of subjects,
2. the same set of values of SES, and
3. the same σ² (we've been tacitly assuming this anyway),
then the shape of the confidence ellipses in beta space would be identical for every school, and the two-stage procedure would give the same result as a classical repeated-measures analysis.

With the present data, with different N's and varying distributions of SES in each school, the two-stage process ignores the fact that some β̂_i's are measured with greater precision than others (e.g. if the ith school has a large N or a large spread in SES). We could attempt to correct this problem by using weights based on Var(β̂_i) = σ² (X_i'X_i)⁻¹, but this is the variance of β̂_i as an estimator of the true β_i for the ith school, NOT of the underlying β for the Sector. Thus using σ² (X_i'X_i)⁻¹ would provide weights that are too variable from school to school.

Note: To the extent that the schools are similar with respect to predictors, the approach above is plausible, but we'll see that mixed models are often easier.

Another approach: a fixed effects model. Use School as a categorical predictor and construct contrasts with 1 degree of freedom for between-Sector comparisons.

Drawback: inferences generalize to new students from the same schools, not to new schools from a population of schools.

The Hierarchical Model

Basic structure:
1. Each school has a true regression line that is not directly observed.
2. The observations from each school are generated by taking random observations around the school's true regression line.
3. The true regression lines come from a population of regression lines.

Within-school model, for school i (for now we suppose all schools come from the same population, e.g. only one Sector):

1) The true but unknown β_i = (β_0i, β_1i)' for each school.
2) The data are generated as

   Y_ij = β_0i + β_1i X_ij + ε_ij,   ε_ij ~ N(0, σ²) independent of the β_i's

Between-school model: the β_i are sampled from a population of schools:

β_i = γ + u_i,   u_i ~ N(0, T)

where

T = [ τ_00  τ_01 ]
    [ τ_10  τ_11 ]

is a variance matrix. Note:

Var(β_0i) = τ_00,   Var(β_1i) = τ_11,   Cov(β_0i, β_1i) = τ_10

Example: a simulated sample

To generate an example we need to do something with SES, although its distribution is not part of the model: in the model the values of SES are taken as given constants. We will take:

γ = (12, 2)',   T = [ 16   8 ]   σ² =
                    [  8  25 ],

Once we have generated β_i, we generate N_i ~ Poisson(30) and SES ~ N(0, 1).

Here's our first simulated school in detail. For i = 1:

u_1 = (6.756, · )',   β_1 = γ + u_1

N_1 = 23

SES_ij: …
ε_ij: …
Y_ij = β_0i + β_1i SES_ij + ε_ij: …
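The generative steps above can be sketched in a few lines. This is an illustrative Python translation, not the original S-Plus code: γ = (12, 2)' and the τ's are read off the surrounding slides, while the within-school σ is an assumed placeholder, since the original value did not survive transcription:

```python
# Hedged sketch of the two-level generative model:
#   beta_i = gamma + u_i,  u_i ~ N(0, T);  Y_ij = beta_0i + beta_1i X_ij + eps_ij
import numpy as np

rng = np.random.default_rng(2025)
gamma = np.array([12.0, 2.0])                  # population mean intercept/slope
T = np.array([[16.0, 8.0], [8.0, 25.0]])       # Var(beta_i), from the slides
sigma = 5.0                                    # assumed within-school SD (placeholder)

def simulate_school():
    beta = gamma + rng.multivariate_normal(np.zeros(2), T)    # true school line
    n = rng.poisson(30)                                       # N_i ~ Poisson(30)
    ses = rng.normal(0.0, 1.0, size=n)                        # SES ~ N(0, 1)
    y = beta[0] + beta[1] * ses + rng.normal(0.0, sigma, n)   # level-1 model
    return beta, ses, y

beta_1, ses_1, y_1 = simulate_school()         # our "first simulated school"
```

Calling simulate_school() 50 times reproduces the kind of sample shown in the figures that follow.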

Figure 29: Simulation: the population mean regression line γ.

Figure 30: Simulated school: the true regression line in School 1, β_1 = γ + u_1.

Figure 31: Simulated school: true regression line and data.

Figure 32: Simulated school with true line (dashed), data, and least-squares line (solid).

Figure 33: Simulated school: population mean line in beta space.

Figure 34: Simulated school: population mean 'line' and standard dispersion ellipse (shadows are means +/- 1 SD).

Figure 35: Simulated school: population mean line and 'true' line for School 1.

Figure 36: Simulated school: true line with the dispersion ellipse of the least-squares line around the true line.

Figure 37: Simulated school: true line with the dispersion ellipse of the least-squares line around the true line, and the least-squares line from School 1.

Figure 38: Simulated school: dispersion ellipse of least-squares lines around the population regression line, which is almost coincident with the dispersion ellipse of the true lines around the population line.

Note that with a smaller N, a larger σ², or a smaller dispersion of SES, the dispersion ellipse for the true β_i (with matrix T) and the dispersion ellipse for β̂_i (with matrix T + σ²(X'X)⁻¹) could differ much more than they do here.

What does T mean? Subtitle: T and the variance of Y.

It's fairly clear what T = Var(β_i) means in beta space. What does it mean in data space? The following graphs show 50 true lines in beta space and then in data space.

Figure 39: 50 simulated β_i's in beta space.

Figure 40: 50 simulated β_i's in beta space.

Figure 41: 50 simulated β_i's in beta space and their mean.

Figure 42: Standard data ellipse (blue).

Figure 43: 95% confidence ellipse for the mean regression line (we happen to be lucky to include the true value this time).

What does this look like in data space? That is, what does T really mean? Frequently, theory is easier in beta space and interpretation is easier in data space.

Figure 44: Population mean line γ = (12, 2)'.

Figure 45: Population mean line γ = (12, 2)' and 50 values of β_i = γ + u_i with Var(u_i) = T = [[16, 8], [8, 25]].

Figure 46: Standard dispersion bands: over any value of SES, approximately 68% of sampled lines should lie between the bands.

Figure 47: Standard dispersion bands with 50 sampled true lines.

Figure 48: Width of the standard dispersion bands at 3 values of SES.

Figure 49: Point of minimum variance: x_min = -τ_01/τ_11 = -8/25 = -0.32.

Figure 50: SD at x = 0: √τ_00 = √16 = 4. SD at x = x_min: √(τ_00 - τ_01²/τ_11) = √(16 - 8²/25) = √13.44 ≈ 3.67.

To complete the simulation we would still need to generate X's and Y's for the remaining 49 schools and analyze the results, but let's analyze the actual data instead.

Recap: 50 simulated schools, data space

Level 1:  Y_ij = β_0i + β_1i X_ij + ε_ij, with ε_ij independent N(0, σ²)

Level 2:
  β_0i = γ_00 + γ_01 Sector_i + u_0i
  β_1i = γ_10 + γ_11 Sector_i + u_1i

with (u_0i, u_1i)' independently ~ N(0, T), T = [[τ_00, τ_01], [τ_10, τ_11]], and independent of the ε_ij's.

So

E(Y_ij) = E(β_0i + β_1i X_ij + ε_ij)
        = E(β_0i) + E(β_1i) X_ij + E(ε_ij)
        = γ_0 + γ_1 X_ij

and

Var(Y_ij) = Var(β_0i + β_1i X_ij + ε_ij)
          = Var(β_0i + β_1i X_ij) + Var(ε_ij)
          = [1  X_ij] T [1  X_ij]' + σ²
          = τ_00 + 2 τ_01 X_ij + τ_11 X_ij² + σ²

The variance of the height of the true line is

Var(β_0i + β_1i X_ij) = [1  X_ij] T [1  X_ij]'
                      = τ_00 + 2 τ_01 X_ij + τ_11 X_ij²

This is a quadratic in X with minimum value τ_00 - τ_01²/τ_11 at X_min = -τ_01/τ_11.
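A quick numeric check of this quadratic, using the τ values from the simulation (τ_00 = 16, τ_01 = 8, τ_11 = 25):

```python
# Var(height at x) = tau00 + 2*tau01*x + tau11*x**2.  Setting the derivative
# 2*tau01 + 2*tau11*x to zero gives the point of minimum variance.
tau00, tau01, tau11 = 16.0, 8.0, 25.0

x_min = -tau01 / tau11                 # -8/25 = -0.32
var_at_0 = tau00                       # variance of the line's height at SES = 0
var_at_min = tau00 - tau01**2 / tau11  # 16 - 64/25 = 13.44, so the SD is about 3.67
```

These are exactly the quantities annotated in Figures 49 and 50.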

Analysis of data (Public Schools only)

Model:

Level 1:  Y_ij = β_0i + β_1i X_ij + ε_ij, with ε_ij independent N(0, σ²)

Level 2:
  β_0i = γ_00 + γ_01 Sector_i + u_0i
  β_1i = γ_10 + γ_11 Sector_i + u_1i

with (u_0i, u_1i)' independently ~ N(0, T) and independent of the ε_ij's.

Reduced model:

Y_ij = β_0i + β_1i X_ij + ε_ij
     = (γ_00 + γ_01 Sector_i + u_0i) + (γ_10 + γ_11 Sector_i + u_1i) X_ij + ε_ij
     = γ_00 + γ_10 X_ij + γ_01 Sector_i + γ_11 Sector_i X_ij   [fixed effects]
       + u_0i + u_1i X_ij                                      [random effects]
       + ε_ij                                                  [micro-level error]

If we analyse only one Sector, with Sector = 0, the model is:

Level 1: same.

Level 2:
  β_0i = γ_00 + u_0i
  β_1i = γ_10 + u_1i

with (u_0i, u_1i)' independently ~ N(0, T), T = [[τ_00, τ_01], [τ_10, τ_11]], and independent of the ε_ij's.

Reduced model:

Y_ij = β_0i + β_1i X_ij + ε_ij
     = (γ_00 + u_0i) + (γ_10 + u_1i) X_ij + ε_ij
     = γ_00 + γ_10 X_ij    [fixed effects]
       + u_0i + u_1i X_ij  [random effects]
       + ε_ij              [micro-level error]

A look at the data:

Figure 51: Uniform quantile plots and bar charts for School, MEANSES, Minority, Sex, SES, MathAch, Size, Sector, PRACAD, DISCLIM, HIMINTY, and PropFemale.

Figure 52: Uniform quantile plots and bar charts for SecSchool, PropMinority, and MeanSES.

Figure 53: Trellis plot of the data in each school, ordered by mean SES (lowest 30 Public schools).

Figure 54: Trellis plot of the data in each school, ordered by mean SES (middle 30 Public schools).

Figure 55: Trellis plot of the data in each school, ordered by mean SES (highest 30 Public schools).


Figure 56: Trellis plot of the data in each school, ordered by mean SES (lowest 30 Public schools).

Figure 57: Trellis plot of the data in each school, ordered by mean SES (middle 30 Public schools).

Figure 58: Trellis plot of the data in each school, ordered by mean SES (highest 30 Public schools).

Figure 59: Least squares estimates of intercept and slope in beta space.

Figure 60: Least-squares lines from each school in data space.

Each point β̂_i has variance:

Var(β̂_i) = T + σ² (X_i'X_i)⁻¹

If the X_i's are all the same, then all the β̂_i's have the same variance and each gets the same weight in estimating γ = (γ_00, γ_10)'. We can use classical methods.

Otherwise… mixed models!!! Here even N_i differs from school to school, so we use a mixed-model analysis.

S-Plus library(nlme) output:

> fitp <- lme(MathAch ~ SES, dsp, random = ~ 1 + SES | School)
> summary(fitp)
Linear mixed-effects model fit by REML
Data: dsp
  AIC  BIC  logLik
Random effects:
  Formula: ~ 1 + SES | School
  Structure: General positive-definite
              StdDev           Corr
  (Intercept) √τ̂_00
  SES         √τ̂_11           τ̂_01/√(τ̂_00 τ̂_11)
  Residual    σ̂
Fixed effects: MathAch ~ SES
              Value   Std.Error  DF  t-value  p-value
  (Intercept) γ̂_00                            <.0001
  SES         γ̂_10                            <.0001

Figure 61: Least-squares estimates (open circles) and the mixed model estimate (solid circle).

Figure 62: Least-squares estimates, the mixed model estimate, and the unweighted mean of the least-squares estimates.

Figure 63: Least-squares estimates from each school (data space).

Figure 64: Least-squares estimates and the mixed model overall estimate.

Figure 65: Least-squares estimates, the mixed model overall estimate, and the unweighted average of the least-squares lines.

Interpreting estimated variances: T̂, the estimated variance of the true β_i.

Figure 66: Least-squares estimates and the mixed model overall estimate.

Figure 67: The above, plus the estimated dispersion ellipse for T̂: note that it has almost collapsed to a line.

T̂ has almost collapsed to a line. What does this mean? The variability in β̂_i has two components: T and σ²(X_i'X_i)⁻¹. The REML process finds that it can explain the variability in one dimension entirely with σ²(X_i'X_i)⁻¹ and no contribution from T. In these data, the departure from parallelism of the β̂_i's can be explained by the random variation within schools. There is little evidence of differences in true slope from school to school. We can test this hypothesis by fitting a model with only a random intercept and no random slope, but we leave that discussion to later.

Estimating β_i: is least squares all there is?

If we had only one school, the data from that school would be the only source of information (except to a Bayesian) about the true β_i for that school. With a sample of many schools we have two sources of information about the ith school:

1. the data from the school;
2. the data from the other schools.

The data from the school are generated from the true line; the error of estimation of the true line by β̂_i has variance σ²(X_i'X_i)⁻¹. But γ̂ is also an estimate of β_i, with variance T. We can combine them into an optimal estimator by using inverse-variance weights.

If we knew the variances and population parameters:

  Source          Estimator   Variance
  ith school      β̂_i        σ²(X_i'X_i)⁻¹
  Other schools   γ           T

Optimal combination: the Best Linear Unbiased Predictor (contrast with the Best Linear Unbiased Estimator).

BLUP:

β̃_i = [ (σ²(X_i'X_i)⁻¹)⁻¹ + T⁻¹ ]⁻¹ [ (σ²(X_i'X_i)⁻¹)⁻¹ β̂_i + T⁻¹ γ ]
     = [ σ⁻² X_i'X_i + T⁻¹ ]⁻¹ [ σ⁻² X_i'X_i β̂_i + T⁻¹ γ ]

EBLUP (Empirical BLUP): the same, but using σ̂², T̂, γ̂.

What does this look like in practice?

Note: a small T implies a big T⁻¹, so γ gets a large weight.
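A hedged numeric sketch of the BLUP combination above, in Python. The matrices are illustrative placeholders: T and γ echo the earlier simulation, and V_i stands in for σ²(X_i'X_i)⁻¹, whose actual value depends on the school's design:

```python
# Precision-weighted combination of the within-school estimate beta_hat_i
# (variance V_i = sigma^2 (X_i'X_i)^-1) and the population mean gamma
# (variance T).  All numbers are illustrative placeholders.
import numpy as np

T = np.array([[16.0, 8.0], [8.0, 25.0]])       # between-school variance (from slides)
V_i = np.array([[4.0, 1.0], [1.0, 2.0]])       # assumed sigma^2 (X_i'X_i)^-1
beta_hat_i = np.array([10.0, 1.0])             # school's least-squares estimate
gamma = np.array([12.0, 2.0])                  # population fixed effects

W_i = np.linalg.inv(V_i)                       # precision of beta_hat_i
W_g = np.linalg.inv(T)                         # precision of gamma as an estimate
blup = np.linalg.solve(W_i + W_g, W_i @ beta_hat_i + W_g @ gamma)
# blup is a matrix-weighted average of beta_hat_i and gamma: the smaller T is,
# the larger W_g and the harder beta_hat_i is shrunk toward gamma.
```

Replacing σ², T, and γ with their estimates gives the EBLUP that lme() reports.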


More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

STAT3401: Advanced data analysis Week 10: Models for Clustered Longitudinal Data

STAT3401: Advanced data analysis Week 10: Models for Clustered Longitudinal Data STAT3401: Advanced data analysis Week 10: Models for Clustered Longitudinal Data Berwin Turlach School of Mathematics and Statistics Berwin.Turlach@gmail.com The University of Western Australia Models

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

A discussion on multiple regression models

A discussion on multiple regression models A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Stat 587 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2017 Outline Notation NELS88 data Fixed Effects ANOVA

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Multivariate Analysis of Variance

Multivariate Analysis of Variance Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)

More information

22s:152 Applied Linear Regression. Returning to a continuous response variable Y...

22s:152 Applied Linear Regression. Returning to a continuous response variable Y... 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y... Ordinary Least Squares Estimation The classical models we have fit so far with a continuous

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ)

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ) 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y Ordinary Least Squares Estimation The classical models we have fit so far with a continuous response

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Longitudinal Data Analysis. with Mixed Models

Longitudinal Data Analysis. with Mixed Models SPIDA 09 Mixed Models with R Longitudinal Data Analysis with Mixed Models Georges Monette 1 June 09 e-mail: georges@yorku.ca web page: http://wiki.math.yorku.ca/spida_09 1 with thanks to many contributors:

More information

Mixed Models II - Behind the Scenes Report Revised June 11, 2002 by G. Monette

Mixed Models II - Behind the Scenes Report Revised June 11, 2002 by G. Monette Mixed Models II - Behind the Scenes Report Revised June 11, 2002 by G. Monette What is a mixed model "really" estimating? Paradox lost - paradox regained - paradox lost again. "Simple example": 4 patients

More information

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars)

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) STAT:5201 Applied Statistic II Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) School math achievement scores The data file consists of 7185 students

More information

36-463/663: Hierarchical Linear Models

36-463/663: Hierarchical Linear Models 36-463/663: Hierarchical Linear Models Lmer model selection and residuals Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 Outline The London Schools Data (again!) A nice random-intercepts, random-slopes

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

STK4900/ Lecture 10. Program

STK4900/ Lecture 10. Program STK4900/9900 - Lecture 10 Program 1. Repeated measures and longitudinal data 2. Simple analysis approaches 3. Random effects models 4. Generalized estimating equations (GEE) 5. GEE for binary data (and

More information

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or.

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or. Chapter Simple Linear Regression : comparing means across groups : presenting relationships among numeric variables. Probabilistic Model : The model hypothesizes an relationship between the variables.

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

R Output for Linear Models using functions lm(), gls() & glm()

R Output for Linear Models using functions lm(), gls() & glm() LM 04 lm(), gls() &glm() 1 R Output for Linear Models using functions lm(), gls() & glm() Different kinds of output related to linear models can be obtained in R using function lm() {stats} in the base

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

III. Inferential Tools

III. Inferential Tools III. Inferential Tools A. Introduction to Bat Echolocation Data (10.1.1) 1. Q: Do echolocating bats expend more enery than non-echolocating bats and birds, after accounting for mass? 2. Strategy: (i) Explore

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data Outline Mixed models in R using the lme4 package Part 3: Longitudinal data Douglas Bates Longitudinal data: sleepstudy A model with random effects for intercept and slope University of Wisconsin - Madison

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

Handout 4: Simple Linear Regression

Handout 4: Simple Linear Regression Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Lecture 6: Linear Regression

Lecture 6: Linear Regression Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 1: Simple Linear Regression Introduction and Estimation

22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 1: Simple Linear Regression Introduction and Estimation 22s:152 Applied Linear Regression Chapter 5: Ordinary Least Squares Regression Part 1: Simple Linear Regression Introduction and Estimation Methods for studying the relationship of two or more quantitative

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

22s:152 Applied Linear Regression. 1-way ANOVA visual:

22s:152 Applied Linear Regression. 1-way ANOVA visual: 22s:152 Applied Linear Regression 1-way ANOVA visual: Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Y We now consider an analysis

More information

ISQS 5349 Final Exam, Spring 2017.

ISQS 5349 Final Exam, Spring 2017. ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC

More information

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects Covariance Models (*) Mixed Models Laird & Ware (1982) Y i = X i β + Z i b i + e i Y i : (n i 1) response vector X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Chapter 14 Simple Linear Regression (A)

Chapter 14 Simple Linear Regression (A) Chapter 14 Simple Linear Regression (A) 1. Characteristics Managerial decisions often are based on the relationship between two or more variables. can be used to develop an equation showing how the variables

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept lme4 Luke Chang Last Revised July 16, 2010 1 Using lme4 1.1 Fitting Linear Mixed Models with a Varying Intercept We will now work through the same Ultimatum Game example from the regression section and

More information

Consider fitting a model using ordinary least squares (OLS) regression:

Consider fitting a model using ordinary least squares (OLS) regression: Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

More information