Descriptive 1/76 - Simulating MLM
Paul E. Johnson
Department of Political Science
Center for Research Methods and Data Analysis, University of Kansas
2015
Descriptive 2/76 - Outline
1. Orientation: MLM
2. Explore Simulated Data
3. Case 2: A Larger Variance Component
Descriptive 3/76 - Introduction
- Manufacture example data sets with random intercepts
- Explore visual manifestations of the related problems
Descriptive 4/76 - Orientation: MLM - Outline
1. Orientation: MLM
2. Explore Simulated Data
3. Case 2: A Larger Variance Component
Descriptive 5/76 - Orientation: MLM - Ordinary Regression
An ordinary regression model:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \vdots \\ \beta_0 \end{bmatrix}
+
\begin{bmatrix} X1_{1} \beta_1 \\ X1_{2} \beta_1 \\ X1_{3} \beta_1 \\ X1_{4} \beta_1 \\ \vdots \\ X1_{N} \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \vdots \\ \varepsilon_N \end{bmatrix}
\quad (1)
Descriptive 6/76 - Orientation: MLM - Shocks b_1, b_2, ... affect sets of rows

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \vdots \\ \beta_0 \end{bmatrix}
+
\begin{bmatrix} X1_{1} \beta_1 \\ X1_{2} \beta_1 \\ X1_{3} \beta_1 \\ X1_{4} \beta_1 \\ X1_{5} \beta_1 \\ \vdots \\ X1_{N} \beta_1 \end{bmatrix}
+
Z \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \vdots \\ \varepsilon_N \end{bmatrix}
\quad (2)-(3)

Z is a matrix of contrast variables:

y = X \beta + Z b + \varepsilon \quad (4)
Descriptive 7/76 - Orientation: MLM - The ij-Subscripted Version (i = level 2, j = level 1)

\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{M n_i} \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \vdots \\ \beta_0 \end{bmatrix}
+
\begin{bmatrix} X1_{11} \beta_1 \\ X1_{12} \beta_1 \\ X1_{13} \beta_1 \\ X1_{21} \beta_1 \\ X1_{22} \beta_1 \\ \vdots \\ X1_{M n_i} \beta_1 \end{bmatrix}
+
Z \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{M n_i} \end{bmatrix}
\quad (5)-(6)
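The compact form y = Xβ + Zb + ε in equation (4) can be made concrete by building Z explicitly. A minimal numpy sketch (the slides use R and keep Z abstract; the sizes M = 3, n = 2 and the parameter values are my own illustrative assumptions): Z has one 0/1 indicator column per group, so Zb copies each group's shock onto that group's rows.

```python
import numpy as np

# Illustrative sketch of Z from y = X beta + Z b + eps (equation (4)).
# M = 3 groups with n = 2 observations each; values are assumptions.
rng = np.random.default_rng(0)
M, n = 3, 2
N = M * n
groups = np.repeat(np.arange(M), n)        # group index for each row: 0,0,1,1,2,2
Z = np.zeros((N, M))
Z[np.arange(N), groups] = 1.0              # one 0/1 indicator column per group
beta = np.array([3.0, 0.5])                # (beta0, beta1), as in the slides
X = np.column_stack([np.ones(N), rng.normal(25, 5, N)])  # intercept column + X1
b = rng.normal(0, 2.0, M)                  # random intercepts b_1, ..., b_M
eps = rng.normal(0, 4.0, N)
y = X @ beta + Z @ b + eps
# Z @ b just copies each group's shock onto that group's rows:
assert np.allclose(Z @ b, b[groups])
```

The assertion at the end is the whole point: multiplying by Z is bookkeeping that assigns the level-2 shock b_i to every row in group i.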
Descriptive 8/76 - Explore Simulated Data - Outline
1. Orientation: MLM
2. Explore Simulated Data
3. Case 2: A Larger Variance Component
Descriptive 9/76 - Explore Simulated Data - Generate Data - Data Generating Process
I can't understand any statistical problem until I can make up data to represent it. On GitHub, I started a rather large simulation exercise called mlmsim. It is in a repository called R-crmda. It is not done yet, but I'll share some ideas here.

I *think* the point of the simulations included here is the following:
- If we have lots of groups, they all have the same number of observations, and the distribution of X is the same in each group, then:
  - Analysis with ordinary one-level regression is not so horribly dangerous
  - Scatterplots are not entirely deceptive
- However, we can see situations where:
  - We might be badly deceived by the data
  - The MLM will fix the problem only sometimes, under conditions we can state
Descriptive 10/76 - Explore Simulated Data - Generate Data - Simple simulation code
This generates data for M = 10 sets of level-2 groupings, with n = 3 observations per group.

    gen1 <- function(beta = c(3, 0.5), xbari = 25, xsdi = 5, xsd = 4,
                     M = 10, n = 3, bsd = 2, esd = 4) {
        xbars <- rnorm(M, m = xbari, sd = xsdi)
        dat <- data.frame(i = rep(1:M, each = n))
        dat$x <- rep(xbars, each = n) + rnorm(M*n, m = 0, sd = xsd)
        b <- rnorm(M, m = 0, bsd)
        dat$b <- rep(b, each = n)
        error <- rnorm(M*n, m = 0, sd = esd)
        dat$ynob <- beta[1] + beta[2] * dat$x + error
        dat$ynoe <- beta[1] + beta[2] * dat$x + dat$b
        dat$y <- dat$ynoe + rnorm(M*n, m = 0, sd = esd)
        list(dat = dat, b = b)
    }

In case you saw a previous version of these notes, I've converted a script into a function so I can draw samples over and over later (if I want to).
Descriptive 11/76 - Explore Simulated Data - Generate Data - Simple simulation code...
Note: it is necessary to specify the mean & std. dev. of x for each group (xbari, xsdi), as well as the value of b_i.

Summary of arguments:
- beta: vector of fixed-effect coefficients (intercept & slope)
- xbari: the expected value of xbar_i
- xsdi: the standard deviation of xbar_i
- xsd: the standard deviation of x_ij (diversity within groups)
- M: number of level-2 groupings
- n: number of observations within each group
- bsd: the standard deviation of b_i (diversity of group intercepts)
- esd: the standard deviation of the error term ε_ij

If we want all of the level-2 groups to have the same distribution on x, then we set xsdi = 0.
Descriptive 12/76 - Explore Simulated Data - Generate Data - Simple simulation code...
I drew the b_i separately from a normal, b_i ~ N(0, σ_b²), but we could have made this more elaborate by writing out a covariance matrix and using an MVN random draw. We don't need to do that here because the univariate draw already gives b_i ~ N(0, bsd²).
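For the record, the "more elaborate" multivariate option can be sketched as follows (Python/numpy rather than the slides' R; the standard deviations and the correlation rho are assumed values for illustration only). Correlated random intercepts and slopes come from a 2x2 covariance matrix:

```python
import numpy as np

# Sketch of drawing correlated random effects (intercept b0_i, slope b1_i)
# from a multivariate normal. sd_b0, sd_b1, rho are illustrative assumptions.
rng = np.random.default_rng(0)
M = 10000
sd_b0, sd_b1, rho = 2.0, 0.4, 0.5
Sigma = np.array([[sd_b0**2,            rho * sd_b0 * sd_b1],
                  [rho * sd_b0 * sd_b1, sd_b1**2]])
bmat = rng.multivariate_normal([0.0, 0.0], Sigma, size=M)  # columns: b0_i, b1_i
r = np.corrcoef(bmat[:, 0], bmat[:, 1])[0, 1]
print(round(r, 2))   # sample correlation, close to rho = 0.5
```

With rho = 0, each column reduces to the independent univariate draws the slides actually use.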
Descriptive 13/76 - Outline
1. Orientation: MLM
2. Explore Simulated Data
3. Case 2: A Larger Variance Component
Descriptive 14/76 - Draw One Simulated Sample
Let's suppose there are 10 groups of 3 each.

    M <- 10; n <- 3; beta0 <- 3; beta1 <- 0.5
    simdata <- gen1(M = M, n = 3, beta = c(beta0, beta1))
    dat <- simdata[["dat"]]
    b <- simdata[["b"]]

That uses the other default settings of gen1, so we expect the means of x to vary across groups, as xbar_i ~ N(25, 5²). Thus, the groups are not identically distributed. The observed values x_ij are N(xbar_i, 4²).
Descriptive 15/76 - Here's how it appears to a Naive Researcher
[Figure: pooled scatterplot of y against x]
Descriptive 16-25/76 - One Group At A Time
[Figures: ten scatterplots of y against x, highlighting each group's observations one group at a time]
Descriptive 26/76 - Fitted OLS regression (recall the true slope β1 = 0.5)

    m1 <- lm(y ~ x, data = dat)
    outreg(list("True Slope 0.5" = m1), tight = FALSE)

                 True Slope 0.5
                 Estimate (S.E.)
    (Intercept)          (3.520)
    x            0.398** (0.124)
    N            30
    RMSE
    R^2
    * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001
Descriptive 27/76 - Fitted OLS regression (recall the true slope β1 = 0.5)
[Figure: scatterplot of y against x with predicted values and 0.95 confidence interval]
Descriptive 28/76 - Stop and Think About Regression Assumptions
The OLS error term is a blend of ε_ij and b_i:

y_ij = β0 + β1 X1_ij + (ε_ij + b_i)

Let e_ij = ε_ij + b_i.
- In OLS, we assert E[e_ij] = 0 (the expected value of the error is 0).
  - In an unconditional sense, that is correct: E[ε_ij] = 0 and E[b_i] = 0, as long as ε_ij and b_i are uncorrelated with x.
  - And remember: given b_i, E[e_ij | b_i] = b_i.
- Homoskedasticity: Var(e_ij) = σ²_ε (the variance of each error draw is the same for every ij), which is obviously untrue, since Var(e_ij) = Var(b_i) + σ²_ε.
- The rows are uncorrelated: Cov(e_ij, e_kl) = 0 for all i ≠ k or j ≠ l, which is obviously untrue when i = k (two observations from the same group share b_i).
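The broken no-correlation assumption is easy to verify by simulation. A Python sketch (my own check, not the slides' R code, using gen1's default bsd = 2 and esd = 4): two composite errors from the same group share b_i, so their covariance is about Var(b_i) = 4, while errors from different groups are uncorrelated.

```python
import numpy as np

# Check: composite errors e_ij = b_i + eps_ij covary within a group by
# Var(b_i) = bsd**2 = 4, and ~0 across groups. The implied intraclass
# correlation is 4 / (4 + 16) = 0.2. Parameter values mirror gen1's defaults.
rng = np.random.default_rng(0)
M = 5000                                   # many groups, two observations each
bsd, esd = 2.0, 4.0
b = rng.normal(0, bsd, M)
e1 = b + rng.normal(0, esd, M)             # error of observation 1 in each group
e2 = b + rng.normal(0, esd, M)             # error of observation 2, same group
within_cov = np.cov(e1, e2)[0, 1]          # estimates Var(b_i) = 4
across_cov = np.cov(e1, np.roll(e2, 1))[0, 1]   # pairs from different groups
print(round(within_cov, 2), round(across_cov, 2))
```

The within-group covariance lands near 4 and the across-group covariance near 0, which is exactly the Cov(e_ij, e_ik) ≠ 0 violation described above.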
Descriptive 29/76 - Parallel True Lines [Figure: the true group lines]
Descriptive 30/76 - The blue line is β0 + β1 x (i.e., no b_i) [Figure]
Descriptive 31/76 - Parallel True Lines with observations, but no ε_ij [Figure]
Descriptive 32/76 - Parallel True Lines with errors ε_ij [Figure]
Descriptive 33/76 - Parallel True Lines with observations and predicted OLS estimate [Figure]
Descriptive 34/76 - OLS estimate of slope [Figure]
Descriptive 35/76 - Let's add the MLM estimate, for comparison [Figure: OLS and MLM fitted lines]
Descriptive 36/76 - Repeat with fresh sample

    simdata <- gen1(M = M, n = n, beta = c(beta0, beta1))
    b <- simdata[["b"]]
    dat <- simdata[["dat"]]
Descriptive 37/76 - Repeat with fresh sample

    m1 <- lm(y ~ x, data = simdata[["dat"]])
    outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                 Run 2, True Slope 0.5
                 Estimate (S.E.)
    (Intercept)           (3.309)
    x            0.561*** (0.134)
    N            30
    RMSE
    R^2
    * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001
Descriptive 38/76 - Repeat with fresh sample
[Figure: scatterplot with OLS and MLM fitted lines]
Descriptive 39/76 - Repeat with another fresh sample

    simdata <- gen1(M = M, n = n, beta = c(beta0, beta1))
    b <- simdata[["b"]]
    dat <- simdata[["dat"]]
Descriptive 40/76 - Repeat with another fresh sample

    m1 <- lm(y ~ x, data = simdata[["dat"]])
    outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                 Run 2, True Slope 0.5
                 Estimate (S.E.)
    (Intercept)          (3.632)
    x            0.442** (0.144)
    N            30
    RMSE
    R^2
    * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001
Descriptive 41/76 - Repeat with another fresh sample
[Figure: scatterplot with OLS and MLM fitted lines]
Descriptive 42/76 - Repeat with 1000 fresh samples

    if (runsim) {
        betahet <- matrix(NA, nrow = 1000, ncol = 2)
        betanob <- matrix(NA, nrow = 1000, ncol = 2)
        betamlm <- matrix(NA, nrow = 1000, ncol = 2)
        set.seed(234234)
        for (i in 1:1000) {
            simdata <- gen1(M = M, beta = c(beta0, beta1))
            m1 <- lm(y ~ x, data = simdata[["dat"]])
            betahet[i, ] <- coef(m1)
            m2 <- lm(ynob ~ x, data = simdata[["dat"]])
            betanob[i, ] <- coef(m2)
            m3 <- lmer(y ~ x + (1 | i), data = simdata[["dat"]])
            betamlm[i, ] <- fixef(m3)
        }
        saveRDS(betahet, paste0(workingdata, "betahet.rds"))
        saveRDS(betanob, paste0(workingdata, "betanob.rds"))
        saveRDS(betamlm, paste0(workingdata, "betamlm.rds"))
    } else {
        betahet <- readRDS(paste0(workingdata, "betahet.rds"))
        betanob <- readRDS(paste0(workingdata, "betanob.rds"))
        betamlm <- readRDS(paste0(workingdata, "betamlm.rds"))
    }
Descriptive 43/76 - OLS Estimates: 1000 fresh samples [Figure: fitted OLS lines from the 1000 samples]
Descriptive 44/76 - Histograms of Intercept & Slope Estimates [Figure: histograms of betahet[, 1] and betahet[, 2]]
Descriptive 45/76 - Scatterplot of Intercept & Slope Estimates [Figure: estimates of intercept against estimates of slope]
Descriptive 46/76 - Wondered what that looks like with True OLS Data [Figure: intercept and slope estimates from the no-b data]
Descriptive 47/76 - Wondered how the MLM Compared [Figure] (Display not great)
Descriptive 48/76 - OLS versus MLM Slope Estimates [Figure: histograms of betahet[, 2] and betamlm[, 2]] (Display not great)
Descriptive 49/76 - OLS versus MLM Slope Estimates [Figure: overlaid OLS and MLM densities of the slope estimates]
Descriptive 50/76 - OLS versus MLM Intercept Estimates [Figure: overlaid OLS and MLM densities of the intercept estimates]
Descriptive 51/76 - What I Think I Learned From That
If the variance among the intercepts is not huge compared to the variance of the error term, then OLS gives pretty good estimates of the slope of the fixed effect.
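That claim can be checked with a quick Monte Carlo. A Python sketch of gen1's data-generating process (the parameter values mirror gen1's defaults; the helper name is mine, not from the slides): across 1000 samples, the pooled OLS slope averages out very close to the true β1 = 0.5.

```python
import numpy as np

# Monte Carlo sketch of gen1's DGP: beta = (3, 0.5), M = 10 groups, n = 3,
# bsd = 2, esd = 4 (an assumed translation of the slides' R defaults).
rng = np.random.default_rng(1)
beta0, beta1 = 3.0, 0.5
M, n, xbari, xsdi, xsd, bsd, esd = 10, 3, 25.0, 5.0, 4.0, 2.0, 4.0

def ols_slope():
    """One simulated sample -> the pooled OLS slope estimate."""
    xbars = rng.normal(xbari, xsdi, M)
    x = np.repeat(xbars, n) + rng.normal(0, xsd, M * n)
    b = np.repeat(rng.normal(0, bsd, M), n)        # random intercepts b_i
    y = beta0 + beta1 * x + b + rng.normal(0, esd, M * n)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope = cov(x,y)/var(x)

slopes = np.array([ols_slope() for _ in range(1000)])
print(round(slopes.mean(), 3))   # averages close to the true slope 0.5
```

With bsd small relative to esd, the b_i just fatten the error term; they do not tilt the slope, which is the lesson of this section.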
Descriptive 52/76 - Case 2: A Larger Variance Component - Repeat with fresh sample

    simdata <- gen1(M = M, n = n, beta = c(beta0, beta1), bsd = 20)
    b <- simdata[["b"]]
    dat <- simdata[["dat"]]
Descriptive 53/76 - Case 2: A Larger Variance Component - Repeat with fresh sample

    m1 <- lm(y ~ x, data = simdata[["dat"]])
    outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                 Run 2, True Slope 0.5
                 Estimate (S.E.)
    (Intercept)          (10.005)
    x                    (0.391)
    N            30
    RMSE
    R^2
    * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001
Descriptive 54/76 - Case 2: A Larger Variance Component - Repeat with fresh sample
[Figure: scatterplot with OLS and MLM fitted lines]
Descriptive 55/76 - Case 2: A Larger Variance Component - Repeat with another fresh sample

    simdata <- gen1(M = M, n = n, beta = c(beta0, beta1), bsd = 20)
    b <- simdata[["b"]]
    dat <- simdata[["dat"]]
Descriptive 56/76 - Case 2: A Larger Variance Component - Repeat with another fresh sample

    m1 <- lm(y ~ x, data = simdata[["dat"]])
    outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                 Run 2, True Slope 0.5
                 Estimate (S.E.)
    (Intercept)          (8.133)
    x                    (0.341)
    N            30
    RMSE
    R^2
    * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001
Descriptive 57/76 - Case 2: A Larger Variance Component - Repeat with another fresh sample
[Figure: scatterplot with OLS and MLM fitted lines]
Descriptive 58/76 - Case 2: A Larger Variance Component - Repeat with 1000 fresh samples

    if (runsim) {
        betahet <- matrix(NA, nrow = 1000, ncol = 2)
        betanob <- matrix(NA, nrow = 1000, ncol = 2)
        betamlm <- matrix(NA, nrow = 1000, ncol = 2)
        set.seed(234234)
        for (i in 1:1000) {
            simdata <- gen1(M = M, beta = c(beta0, beta1), bsd = 20)
            m1 <- lm(y ~ x, data = simdata[["dat"]])
            betahet[i, ] <- coef(m1)
            m2 <- lm(ynob ~ x, data = simdata[["dat"]])
            betanob[i, ] <- coef(m2)
            m3 <- lmer(y ~ x + (1 | i), data = simdata[["dat"]])
            betamlm[i, ] <- fixef(m3)
        }
        saveRDS(betahet, paste0(workingdata, "vbetahet.rds"))
        saveRDS(betanob, paste0(workingdata, "vbetanob.rds"))
        saveRDS(betamlm, paste0(workingdata, "vbetamlm.rds"))
    } else {
        betahet <- readRDS(paste0(workingdata, "vbetahet.rds"))
        betanob <- readRDS(paste0(workingdata, "vbetanob.rds"))
        betamlm <- readRDS(paste0(workingdata, "vbetamlm.rds"))
    }
Descriptive 59/76 - Case 2: A Larger Variance Component - Wondered how the MLM Compared [Figure] (Hmm. Blue and Red are different.)
Descriptive 60/76 - Case 2: A Larger Variance Component - OLS versus MLM Slope Estimates [Figure: histograms of betahet[, 2] and betamlm[, 2]] (I see a difference there.)
Descriptive 61/76 - Case 2: A Larger Variance Component - OLS versus MLM Slope Estimates [Figure: overlaid OLS and MLM densities of the slope estimates]
Descriptive 62/76 - Case 2: A Larger Variance Component - Unfair to compare Intercept Estimates (but still will)
[Figure: histograms of OLS intercept estimates (betahet[, 1]) and MLM intercept estimates (betamlm[, 1])]
Seems unfair: OLS is the wrong model (it supposed β0 was the intercept for all data points).
Descriptive 63/76 - Deceptive Data Generators
If b_i is uncorrelated with the average of x in the groups, then we expect the OLS estimate to be unbiased.
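And conversely, when b_i is correlated with the group means of x, pooled OLS is biased. A Python sketch (my own construction, simpler than the slides' gen2: here b_i is set exactly equal to xbar_i - 25, so the correlation is perfect): the pooled slope converges to roughly β1 + xsdi²/(xsdi² + xsd²) = 0.5 + 25/41 ≈ 1.11 instead of 0.5.

```python
import numpy as np

# Evil-genie sketch (an assumption-labeled illustration, not the slides' R):
# the group shock b_i moves one-for-one with the group mean xbar_i, so
# pooled OLS absorbs the shock into the slope estimate.
rng = np.random.default_rng(2)
beta0, beta1 = 3.0, 0.5
M, n, xbari, xsdi, xsd, esd = 200, 3, 25.0, 5.0, 4.0, 4.0

def ols_slope():
    xbars = rng.normal(xbari, xsdi, M)
    b = xbars - xbari                      # b_i perfectly correlated with xbar_i
    x = np.repeat(xbars, n) + rng.normal(0, xsd, M * n)
    y = beta0 + beta1 * x + np.repeat(b, n) + rng.normal(0, esd, M * n)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slopes = np.array([ols_slope() for _ in range(200)])
# plim of the slope: beta1 + xsdi**2 / (xsdi**2 + xsd**2) = 0.5 + 25/41
print(round(slopes.mean(), 2))
```

The average slope sits near 1.11, far from 0.5, which is the deception this section is about: the between-group variation in x carries the shock along with it.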
Descriptive 64/76 - Deceptive Data Generators - Recall the Parallel True Lines
The true lines vary due to the random intercept.
[Figure: parallel true group lines]
Descriptive 65/76 - Deceptive Data Generators - Put the true values of y on there
This is still the pleasant scenario, where σ²_ε = 0.
[Figure: points lying exactly on the parallel true lines]
That makes regression easy!
Descriptive 66/76 - Deceptive Data Generators - Suppose the Data-Generating Genie is Evil
Correlate the mean of observed X with the values of b. I'm doing this by first drawing the values b_i and then creating x_ij = b_i + N(0, xsd).

    gen2 <- function(beta = c(3, 0.5), xbari = 25, xsdi = 5, xsd = 4,
                     M = 10, n = 3, bsd = 2, esd = 4) {
        b <- rnorm(M, m = 0, bsd)
        xbars <- rnorm(M, m = xbari, sd = xsdi)
        dat <- data.frame(i = rep(1:M, each = n))
        dat$x <- rep(xbars, each = n) + rnorm(M*n, m = 0, sd = xsd)
        dat$b <- rep(b, each = n)
        dat$ynoe <- beta[1] + beta[2] * dat$x + dat$b
        dat$y <- dat$ynoe + rnorm(M*n, m = 0, sd = esd)
        list(dat = dat, b = b)
    }
Descriptive 67/76 - Deceptive Data Generators - Suppose the Data-Generating Genie is Evil...

    beta0 <- 3; beta1 <- 0.5; xbari <- 5; xsdi <- 5; M <- 4; n <- 3; bsd <- 2; esd <- 4
    xbars <- 20 + rnorm(M, m = xbari, sd = xsdi)
    b <- rnorm(M, m = 0, bsd)
    dat <- data.frame(i = rep(1:M, each = n))
    dat$b <- rep(b, each = n)
    dat$x <- unlist(lapply(xbars, function(xbar) { xbar + rnorm(n, m = 0, sd = 4) }))
    dat$y <- beta0 + beta1 * dat$x + rnorm(M*n, m = 0, sd = esd) + dat$b

Now we draw a sample:

    M <- 10; beta0 <- 3; beta1 <- 0.5
    simdata <- gen2(M = M, beta = c(beta0, beta1), bsd = 20, xsdi = 10)
    dat <- simdata[["dat"]]
    b <- simdata[["b"]]
Descriptive 68/76 - Deceptive Data Generators - The Parallel True Lines [Figure: parallel true group lines]
Descriptive 69/76 - Deceptive Data Generators - Put the points on the lines: no ε_ij [Figure: observations lying exactly on the true lines]
Descriptive 70/76 - Deceptive Data Generators - Recall the Parallel True Lines
Especially when we have a lot of groups, the data cloud is filled out well and the OLS estimate of β1 won't be horrible.

    M <- 10; beta0 <- 3; beta1 <- 0.5
    simdata <- gen1(M = M, beta = c(beta0, beta1), xsdi = 0)
    dat <- simdata[["dat"]]
    b <- simdata[["b"]]
    m15 <- lm(y ~ x, data = dat)
    summary(m15)
Descriptive 71/76 - Deceptive Data Generators - Recall the Parallel True Lines...

    Call:
    lm(formula = y ~ x, data = dat)

    Residuals:
         Min       1Q   Median       3Q      Max

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                     *
    x
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error:  on 28 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic:  on 1 and 28 DF, p-value:
Descriptive 72/76 - Deceptive Data Generators - Recall the Parallel True Lines...
[Figure: scatterplot of y against x with fitted line]
Descriptive 73/76 - Deceptive Data Generators - Parallel True Lines
If b_i and the mean of x within each sub-group fall into order, then the data generating process may not be so misleading.

    gen2 <- function(beta = c(3, 0.5), xbari = 25, xsdi = 5, xsd = 4,
                     M = 10, n = 3, bsd = 2, esd = 4) {
        b <- rnorm(M, m = 0, bsd)
        xbars <- rnorm(M, m = xbari, sd = xsdi)
        dat <- data.frame(i = rep(1:M, each = n))
        dat$x <- rep(xbars, each = n) + rnorm(M*n, m = 0, sd = xsd)
        dat$b <- rep(b, each = n)
        dat$ynoe <- beta[1] + beta[2] * dat$x + dat$b
        dat$y <- dat$ynoe + rnorm(M*n, m = 0, sd = esd)
        list(dat = dat, b = b)
    }
Descriptive 74/76 - Deceptive Data Generators - Parallel True Lines...

    beta0 <- 3; beta1 <- 0.5; xbari <- 5; xsdi <- 5; M <- 4; n <- 3; bsd <- 2; esd <- 4
    xbars <- 20 + rnorm(M, m = xbari, sd = xsdi)
    b <- rnorm(M, m = 0, bsd)
    dat <- data.frame(i = rep(1:M, each = n))
    dat$b <- rep(b, each = n)
    dat$x <- unlist(lapply(xbars, function(xbar) { xbar + rnorm(n, m = 0, sd = 4) }))
    dat$y <- beta0 + beta1 * dat$x + rnorm(M*n, m = 0, sd = esd) + dat$b
Descriptive 75/76 - Deceptive Data Generators - Parallel True Lines

    beta0 <- 3; beta1 <- 0.5; xbari <- 25; xsdi <- 5; M <- 4; n <- 3; bsd <- 2; esd <- 4
    xbars <- rnorm(M, m = xbari, sd = xsdi)
    b <- rnorm(M, m = (xbari - xbars), bsd)
    dat <- data.frame(i = rep(1:M, each = n))
    dat$b <- rep(b, each = n)
    dat$x <- unlist(lapply(xbars, function(xbar) { xbar + rnorm(n, m = 0, sd = 4) }))
    dat$y <- beta0 + beta1 * dat$x + rnorm(M*n, m = 0, sd = esd) + dat$b
Descriptive 76/76 - Deceptive Data Generators - Special Tool: Dotplot
Suggested graphical representation in Pinheiro & Bates.
[Figure: dotplot of the observed outcome by group]
More informationPOL 681 Lecture Notes: Statistical Interactions
POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship
More informationCoping with Additional Sources of Variation: ANCOVA and Random Effects
Coping with Additional Sources of Variation: ANCOVA and Random Effects 1/49 More Noise in Experiments & Observations Your fixed coefficients are not always so fixed Continuous variation between samples
More informationMultiple Regression Analysis
Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationLecture 24: Weighted and Generalized Least Squares
Lecture 24: Weighted and Generalized Least Squares 1 Weighted Least Squares When we use ordinary least squares to estimate linear regression, we minimize the mean squared error: MSE(b) = 1 n (Y i X i β)
More informationRegression Analysis in R
Regression Analysis in R 1 Purpose The purpose of this activity is to provide you with an understanding of regression analysis and to both develop and apply that knowledge to the use of the R statistical
More informationSTAT 420: Methods of Applied Statistics
STAT 420: Methods of Applied Statistics Model Diagnostics Transformation Shiwei Lan, Ph.D. Course website: http://shiwei.stat.illinois.edu/lectures/stat420.html August 15, 2018 Department
More informationINFERENCE FOR REGRESSION
CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We
More informationNature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.
Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences
More informationIntroduction and Background to Multilevel Analysis
Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and
More informationThe linear model. Our models so far are linear. Change in Y due to change in X? See plots for: o age vs. ahe o carats vs.
8 Nonlinear effects Lots of effects in economics are nonlinear Examples Deal with these in two (sort of three) ways: o Polynomials o Logarithms o Interaction terms (sort of) 1 The linear model Our models
More informationIntroduction to Simple Linear Regression
Introduction to Simple Linear Regression 1. Regression Equation A simple linear regression (also known as a bivariate regression) is a linear equation describing the relationship between an explanatory
More informationChapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals
Chapter 8 Linear Regression Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide 8-1 Copyright 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Fat Versus
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationLeverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.
Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response
More informationIntroduction to LMER. Andrew Zieffler
Introduction to LMER Andrew Zieffler Traditional Regression General form of the linear model where y i = 0 + 1 (x 1i )+ 2 (x 2i )+...+ p (x pi )+ i yi is the response for the i th individual (i = 1,...,
More informationTopic 16 Interval Estimation
Topic 16 Interval Estimation Additional Topics 1 / 9 Outline Linear Regression Interpretation of the Confidence Interval 2 / 9 Linear Regression For ordinary linear regression, we have given least squares
More informationWEIGHTED LEAST SQUARES. Model Assumptions for Weighted Least Squares: Recall: We can fit least squares estimates just assuming a linear mean function.
1 2 WEIGHTED LEAST SQUARES Recall: We can fit least squares estimates just assuming a linear mean function. Without the constant variance assumption, we can still conclude that the coefficient estimators
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationChapter 8 Handout: Interval Estimates and Hypothesis Testing
Chapter 8 Handout: Interval Estimates and Hypothesis esting Preview Clint s Assignment: aking Stock General Properties of the Ordinary Least Squares (OLS) Estimation Procedure Estimate Reliability: Interval
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationThe Classical Linear Regression Model
The Classical Linear Regression Model ME104: Linear Regression Analysis Kenneth Benoit August 14, 2012 CLRM: Basic Assumptions 1. Specification: Relationship between X and Y in the population is linear:
More informationWeek 3: Simple Linear Regression
Week 3: Simple Linear Regression Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationProperties of the least squares estimates
Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares
More informationRegression in R. Seth Margolis GradQuant May 31,
Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence
More informationSimple Linear Regression for the MPG Data
Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory
More informationHeteroskedasticity. Part VII. Heteroskedasticity
Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least
More informationProbability Distributions & Sampling Distributions
GOV 2000 Section 4: Probability Distributions & Sampling Distributions Konstantin Kashin 1 Harvard University September 26, 2012 1 These notes and accompanying code draw on the notes from Molly Roberts,
More information7.0 Lesson Plan. Regression. Residuals
7.0 Lesson Plan Regression Residuals 1 7.1 More About Regression Recall the regression assumptions: 1. Each point (X i, Y i ) in the scatterplot satisfies: Y i = ax i + b + ɛ i where the ɛ i have a normal
More informationBig Data Analysis with Apache Spark UC#BERKELEY
Big Data Analysis with Apache Spark UC#BERKELEY This Lecture: Relation between Variables An association A trend» Positive association or Negative association A pattern» Could be any discernible shape»
More informationMultiple Regression and Regression Model Adequacy
Multiple Regression and Regression Model Adequacy Joseph J. Luczkovich, PhD February 14, 2014 Introduction Regression is a technique to mathematically model the linear association between two or more variables,
More informationGenerating OLS Results Manually via R
Generating OLS Results Manually via R Sujan Bandyopadhyay Statistical softwares and packages have made it extremely easy for people to run regression analyses. Packages like lm in R or the reg command
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationMatrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =
Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write
More informationEconomics 113. Simple Regression Assumptions. Simple Regression Derivation. Changing Units of Measurement. Nonlinear effects
Economics 113 Simple Regression Models Simple Regression Assumptions Simple Regression Derivation Changing Units of Measurement Nonlinear effects OLS and unbiased estimates Variance of the OLS estimates
More informationHypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =
Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,
More information