
Simulating MLM

Paul E. Johnson

Department of Political Science
Center for Research Methods and Data Analysis, University of Kansas

2015

Outline

1. Orientation: MLM
2. Explore Simulated Data
3. Case 2: A Larger Variance Component

Introduction

- Manufacture example data sets with random intercepts.
- Explore the visual manifestations of the related problems.

Orientation: MLM

Ordinary Regression

An ordinary regression model:

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \vdots \\ \beta_0 \end{bmatrix}
+
\begin{bmatrix} X_{1,1}\beta_1 \\ X_{1,2}\beta_1 \\ X_{1,3}\beta_1 \\ X_{1,4}\beta_1 \\ \vdots \\ X_{1,N}\beta_1 \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \vdots \\ \epsilon_N \end{bmatrix}
\tag{1}
\]

Shocks b_1, b_2, ..., b_M Affect Sets of Rows

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \vdots \\ \beta_0 \end{bmatrix}
+
\begin{bmatrix} X_{1,1}\beta_1 \\ X_{1,2}\beta_1 \\ X_{1,3}\beta_1 \\ X_{1,4}\beta_1 \\ X_{1,5}\beta_1 \\ \vdots \\ X_{1,N}\beta_1 \end{bmatrix}
+
Z \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \vdots \\ \epsilon_N \end{bmatrix}
\tag{2}
\]

where Z is a matrix of contrast (indicator) variables assigning each row to its group. In matrix form,

\[
y = X\beta + Zb + \epsilon \tag{3}
\]

The ij-Subscripted Version (i = level 2, j = level 1)

\[
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{Mn_i} \end{bmatrix}
=
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \vdots \\ \beta_0 \end{bmatrix}
+
\begin{bmatrix} X_{1,11}\beta_1 \\ X_{1,12}\beta_1 \\ X_{1,13}\beta_1 \\ X_{1,21}\beta_1 \\ X_{1,22}\beta_1 \\ \vdots \\ X_{1,Mn_i}\beta_1 \end{bmatrix}
+
Z \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{bmatrix}
+
\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \epsilon_{21} \\ \epsilon_{22} \\ \vdots \\ \epsilon_{Mn_i} \end{bmatrix}
\tag{4}
\]
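To make the role of Z concrete, here is a minimal sketch (not from the slides) that builds the indicator matrix for M = 3 hypothetical groups of n = 2 rows each; Z %*% b copies each group's shock into all of that group's rows:

# Hypothetical illustration: Z has one 0/1 column per group
i <- rep(1:3, each = 2)             # group index for 6 rows
Z <- model.matrix(~ 0 + factor(i))  # 6 x 3 indicator matrix
b <- c(-1, 0, 2)                    # one shock per group
drop(Z %*% b)                       # -1 -1  0  0  2  2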

Explore Simulated Data

Data Generating Process

I can't understand any statistical problem until I can make up data to represent it. On GitHub, I started a rather large simulation exercise called mlmsim, in a repository called R-crmda. It is not done yet, but I'll share some ideas here.

I *think* the point of the simulations included here is the following:

- If we have lots of groups, they all have the same number of observations, and the distribution of X is the same in each group, then
  - analysis with an ordinary one-level regression is not so horribly dangerous, and
  - scatterplots are not entirely deceptive.
- However, we can see situations where
  - we might be badly deceived by the data, and
  - the MLM will fix the problem only sometimes, under conditions we can state.

Simple simulation code

This generates data for M = 10 level-2 groups, with n = 3 observations per group.

gen1 <- function(beta = c(3, 0.5), xbari = 25, xsdi = 5, xsd = 4,
                 M = 10, n = 3, bsd = 2, esd = 4) {
    xbars <- rnorm(M, mean = xbari, sd = xsdi)        # group means of x
    dat <- data.frame(i = rep(1:M, each = n))          # group index
    dat$x <- rep(xbars, each = n) + rnorm(M * n, mean = 0, sd = xsd)
    b <- rnorm(M, mean = 0, sd = bsd)                  # random intercepts
    dat$b <- rep(b, each = n)
    error <- rnorm(M * n, mean = 0, sd = esd)
    dat$ynob <- beta[1] + beta[2] * dat$x + error      # no b_i: pure OLS data
    dat$ynoe <- beta[1] + beta[2] * dat$x + dat$b      # no epsilon
    dat$y <- dat$ynoe + rnorm(M * n, mean = 0, sd = esd)
    list(dat = dat, b = b)
}

In case you saw a previous version of these notes: I've converted a script into a function so I can draw samples over and over later (if I want to).

Simple simulation code ...

Note that it is necessary to specify the mean and standard deviation of x for each group (xbari, xsdi), as well as the value of b_i.

Summary of arguments:

- beta: vector of fixed-effect coefficients (intercept & slope)
- xbari: the expected value of x̄_i
- xsdi: the standard deviation of x̄_i
- xsd: the standard deviation of x_ij (diversity within groups)
- M: number of level-2 groups
- n: number of observations within each group
- bsd: the standard deviation of b_i (diversity of group intercepts)
- esd: the standard deviation of the error term ε_ij

If we want all of the level-2 groups to have the same distribution on x, we set xsdi = 0, as in the sketch below.
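For example, a hypothetical call (not on the slides) that gives every group the same x distribution:

samedist <- gen1(M = 20, n = 5, xsdi = 0)  # every group mean of x equals xbari = 25
head(samedist$dat)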

Simple simulation code ...

I drew the b_i separately from a normal distribution, so b_i ∼ N(0, bsd²). We could have made this more elaborate by writing out a covariance matrix and using a multivariate-normal (MVN) random draw, but we don't need that here.
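For the record, here is a minimal sketch of that more elaborate approach, with made-up covariance values, for a model that had both a random intercept b0_i and a random slope b1_i:

library(MASS)
# Hypothetical covariance of (b0_i, b1_i): sds 2 and 0.5, correlation 0.4
Sigma <- matrix(c(4.0, 0.4,
                  0.4, 0.25), nrow = 2, byrow = TRUE)
bmat <- mvrnorm(10, mu = c(0, 0), Sigma = Sigma)  # one row of (b0_i, b1_i) per group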


Draw One Simulated Sample

Let's suppose there are 10 groups of 3 observations each.

M <- 10; n <- 3; beta0 <- 3; beta1 <- 0.5
simdata <- gen1(M = M, n = 3, beta = c(beta0, beta1))
dat <- simdata[["dat"]]
b   <- simdata[["b"]]

That uses the other default settings of gen1, so we expect the means of x to vary across groups: x̄_i ∼ N(25, 5²). Thus, the groups are not identically distributed. The observed values x_ij are N(x̄_i, 4²).
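A quick check, not on the slides, that the group means of x really do vary:

tapply(dat$x, dat$i, mean)  # one mean per group, scattered around 25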

Here's how it appears to a Naive Researcher

[Scatterplot of y against x for the pooled sample.]

One Group At A Time

[A sequence of ten scatterplots of y against x, each highlighting a different group's observations in turn.]

Fitted OLS regression

(Recall the true slope β₁ = 0.5.)

library(rockchalk)  # provides outreg()
m1 <- lm(y ~ x, data = dat)
outreg(list("True Slope 0.5" = m1), tight = FALSE)

                True Slope 0.5
                Estimate (S.E.)
(Intercept)             (3.520)
x               0.398** (0.124)
N               30
RMSE
R-squared

* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001

Fitted OLS regression

(Recall the true slope β₁ = 0.5.)

[Scatterplot of y against x with the fitted line: predicted values and 0.95 confidence interval.]

Stop and Think About Regression Assumptions

The OLS error term is a blend of ε_ij and b_i:

\[
y_{ij} = \beta_0 + \beta_1 X_{1,ij} + (\epsilon_{ij} + b_i)
\]

Let e_ij = ε_ij + b_i.

- In OLS, we assert E[e_ij] = 0 (the expected value of the error is 0). In an unconditional sense, that is correct: E[ε_ij] = 0 and E[b_i] = 0, as long as ε_ij and b_i are uncorrelated with x. But remember: given b_i, E[e_ij | b_i] = b_i.
- Homoskedasticity: Var(e_ij) = σ²_ε (the variance of each error draw is the same for every ij). That is obviously untrue here, since Var(e_ij) = Var(b_i) + σ²_ε.
- The rows are uncorrelated: Cov(e_ij, e_kl) = 0 for all i ≠ k or j ≠ l. That is obviously untrue when i = k, because both errors share the same b_i.
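To spell out that last point (a step the slides leave implicit): for two observations j ≠ k in the same group i,

\[
\operatorname{Cov}(e_{ij}, e_{ik}) = \operatorname{Cov}(b_i + \epsilon_{ij},\; b_i + \epsilon_{ik}) = \operatorname{Var}(b_i) = \sigma_b^2,
\]

so the within-group (intraclass) correlation is ρ = σ²_b / (σ²_b + σ²_ε). With the gen1 defaults bsd = 2 and esd = 4, that is 4 / (4 + 16) = 0.2.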

Parallel True Lines

[Plot of the ten true group lines β₀ + b_i + β₁x against x.]

The Blue line is β₀ + β₁x (i.e., no b_i)

[Same plot, with the overall fixed-effect line β₀ + β₁x added in blue.]

Parallel True Lines with observations, but no ε_ij

[Plot: the observations fall exactly on their group lines.]

Parallel True Lines with errors, ε_ij

[Scatterplot: the observations now scatter around their group lines.]

Parallel True Lines with observations and predicted OLS estimate

[Scatterplot with the fitted OLS line overlaid.]

OLS estimate of slope

[Scatterplot of y against x showing only the OLS fitted line.]

Let's add the MLM estimate, for comparison

[Scatterplot with both fitted lines: OLS and MLM.]

Repeat with fresh sample

simdata <- gen1(M = M, n = n, beta = c(beta0, beta1))
b   <- simdata[["b"]]
dat <- simdata[["dat"]]

Repeat with fresh sample

m1 <- lm(y ~ x, data = simdata[["dat"]])
outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                Run 2, True Slope 0.5
                Estimate (S.E.)
(Intercept)              (3.309)
x               0.561*** (0.134)
N               30
RMSE
R-squared

* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001

Repeat with fresh sample

[Scatterplot with OLS and MLM fitted lines.]

Repeat with another fresh sample

simdata <- gen1(M = M, n = n, beta = c(beta0, beta1))
b   <- simdata[["b"]]
dat <- simdata[["dat"]]

Repeat with another fresh sample

m1 <- lm(y ~ x, data = simdata[["dat"]])
outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                Run 2, True Slope 0.5
                Estimate (S.E.)
(Intercept)              (3.632)
x                0.442** (0.144)
N               30
RMSE
R-squared

* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001

Repeat with another fresh sample

[Scatterplot with OLS and MLM fitted lines.]

Repeat with 1000 fresh samples

library(lme4)  # provides lmer() and fixef()

if (runsim) {
    betahet <- matrix(NA, nrow = 1000, ncol = 2)
    betanob <- matrix(NA, nrow = 1000, ncol = 2)
    betamlm <- matrix(NA, nrow = 1000, ncol = 2)
    set.seed(234234)
    for (i in 1:1000) {
        simdata <- gen1(M = M, beta = c(beta0, beta1))
        m1 <- lm(y ~ x, data = simdata[["dat"]])
        betahet[i, ] <- coef(m1)
        m2 <- lm(ynob ~ x, data = simdata[["dat"]])  # data generated with no b_i
        betanob[i, ] <- coef(m2)
        m3 <- lmer(y ~ x + (1 | i), data = simdata[["dat"]])
        betamlm[i, ] <- fixef(m3)
    }
    saveRDS(betahet, paste0(workingdata, "betahet.rds"))
    saveRDS(betanob, paste0(workingdata, "betanob.rds"))
    saveRDS(betamlm, paste0(workingdata, "betamlm.rds"))
} else {
    betahet <- readRDS(paste0(workingdata, "betahet.rds"))
    betanob <- readRDS(paste0(workingdata, "betanob.rds"))
    betamlm <- readRDS(paste0(workingdata, "betamlm.rds"))
}
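Once those matrices are filled, a natural follow-up (hypothetical, not on the slides) is to compare the Monte Carlo means and spreads of the estimates against the true values beta0 = 3 and beta1 = 0.5:

# Column 1 holds intercept estimates, column 2 slope estimates
colMeans(betahet); apply(betahet, 2, sd)  # pooled OLS on y
colMeans(betamlm); apply(betamlm, 2, sd)  # MLM fixed effects
colMeans(betanob); apply(betanob, 2, sd)  # OLS on ynob (no b_i in the data)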

OLS Estimates: 1000 fresh samples

[Plot in the x-y plane of the fitted lines from the 1000 samples.]

Histograms of Intercept & Slope Estimates

[Two density-scaled histograms: betahet[, 1] (intercept estimates) and betahet[, 2] (slope estimates).]

Scatterplot of Intercept & Slope Estimates

[Scatterplot: estimates of the intercept against estimates of the slope.]

Wondered what that looks like with True OLS Data

[The same scatterplot of intercept against slope estimates, for the data generated with no b_i.]

Wondered how the MLM Compared

[Overlaid plot of the OLS and MLM estimates in the x-y plane. Display not great.]

OLS versus MLM Slope Estimates

[Side-by-side density-scaled histograms: betahet[, 2] (OLS slope estimates) and betamlm[, 2] (MLM slope estimates). Display not great.]

OLS versus MLM Slope Estimates

[Overlaid density histogram of betahet[, 2], comparing OLS and MLM slope estimates.]

OLS versus MLM Intercept Estimates

[Overlaid density histogram of betahet[, 1], comparing OLS and MLM intercept estimates.]

What I Think I Learned From That

If the variance among the intercepts is not huge compared to the variance of the error term, then OLS gives pretty good estimates of the slope of the fixed effect.

Case 2: A Larger Variance Component

Repeat with fresh sample

Now set bsd = 20 (the default was 2):

simdata <- gen1(M = M, n = n, beta = c(beta0, beta1), bsd = 20)
b   <- simdata[["b"]]
dat <- simdata[["dat"]]

Repeat with fresh sample

m1 <- lm(y ~ x, data = simdata[["dat"]])
outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                Run 2, True Slope 0.5
                Estimate (S.E.)
(Intercept)             (10.005)
x                        (0.391)
N               30
RMSE
R-squared

* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001

Repeat with fresh sample

[Scatterplot with OLS and MLM fitted lines.]

Repeat with another fresh sample

simdata <- gen1(M = M, n = n, beta = c(beta0, beta1), bsd = 20)
b   <- simdata[["b"]]
dat <- simdata[["dat"]]

Repeat with another fresh sample

m1 <- lm(y ~ x, data = simdata[["dat"]])
outreg(list("Run 2, True Slope 0.5" = m1), tight = FALSE)

                Run 2, True Slope 0.5
                Estimate (S.E.)
(Intercept)              (8.133)
x                        (0.341)
N               30
RMSE
R-squared

* p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001

Repeat with another fresh sample

[Scatterplot with OLS and MLM fitted lines.]

Repeat with 1000 fresh samples

if (runsim) {
    betahet <- matrix(NA, nrow = 1000, ncol = 2)
    betanob <- matrix(NA, nrow = 1000, ncol = 2)
    betamlm <- matrix(NA, nrow = 1000, ncol = 2)
    set.seed(234234)
    for (i in 1:1000) {
        simdata <- gen1(M = M, beta = c(beta0, beta1), bsd = 20)
        m1 <- lm(y ~ x, data = simdata[["dat"]])
        betahet[i, ] <- coef(m1)
        m2 <- lm(ynob ~ x, data = simdata[["dat"]])
        betanob[i, ] <- coef(m2)
        m3 <- lmer(y ~ x + (1 | i), data = simdata[["dat"]])
        betamlm[i, ] <- fixef(m3)
    }
    saveRDS(betahet, paste0(workingdata, "vbetahet.rds"))
    saveRDS(betanob, paste0(workingdata, "vbetanob.rds"))
    saveRDS(betamlm, paste0(workingdata, "vbetamlm.rds"))
} else {
    betahet <- readRDS(paste0(workingdata, "vbetahet.rds"))
    betanob <- readRDS(paste0(workingdata, "vbetanob.rds"))
    betamlm <- readRDS(paste0(workingdata, "vbetamlm.rds"))
}

Wondered how the MLM Compared

[Overlaid plot of the OLS and MLM estimates in the x-y plane. Hmm. Blue and red are different.]

OLS versus MLM Slope Estimates

[Side-by-side density-scaled histograms: betahet[, 2] (OLS) and betamlm[, 2] (MLM). I see a difference there.]

OLS versus MLM Slope Estimates

[Overlaid density plot of the OLS and MLM slope estimates.]

Unfair to compare Intercept Estimates (but still will)

[Side-by-side density-scaled histograms: betahet[, 1] (OLS intercept estimates) and betamlm[, 1] (MLM intercept estimates).]

Seems unfair: OLS is the wrong model (it supposed β₀ was the intercept for all data points).

Deceptive Data Generators

If b_i is uncorrelated with the average of x in the groups, then we expect the OLS estimate to be unbiased.
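One way to see why, a step the slides leave implicit: treating b_i as an omitted variable in the pooled regression, the OLS slope converges to

\[
\hat{\beta}_1 \xrightarrow{p} \beta_1 + \frac{\operatorname{Cov}(x_{ij}, b_i)}{\operatorname{Var}(x_{ij})},
\]

so the bias disappears exactly when b_i is uncorrelated with x, and grows with the correlation between the group means of x and the random intercepts.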

Recall the Parallel True Lines

The true lines vary due to the random intercept.

[Plot of the parallel true group lines against x.]

Put the true values of y on there

This is still the pleasant scenario, where σ²_ε = 0.

[Plot: the points fall exactly on the parallel lines.]

That makes regression easy!

Suppose the Data-Generating Genie is Evil

Correlate the mean of observed X with the values of b. I'm doing this by first drawing the values b_i and then creating x_ij = b_i + N(0, xsd).

gen2 <- function(beta = c(3, 0.5), xbari = 25, xsdi = 5, xsd = 4,
                 M = 10, n = 3, bsd = 2, esd = 4) {
    b <- rnorm(M, mean = 0, sd = bsd)   # draw the random intercepts first
    xbars <- rnorm(M, mean = xbari, sd = xsdi)
    dat <- data.frame(i = rep(1:M, each = n))
    dat$x <- rep(xbars, each = n) + rnorm(M * n, mean = 0, sd = xsd)
    dat$b <- rep(b, each = n)
    dat$ynoe <- beta[1] + beta[2] * dat$x + dat$b
    dat$y <- dat$ynoe + rnorm(M * n, mean = 0, sd = esd)
    list(dat = dat, b = b)
}

Suppose the Data-Generating Genie is Evil ...

beta0 <- 3; beta1 <- 0.5; xbari <- 5; xsdi <- 5
M <- 4; n <- 3; bsd <- 2; esd <- 4
xbars <- 20 + rnorm(M, mean = xbari, sd = xsdi)
b <- rnorm(M, mean = 0, sd = bsd)
dat <- data.frame(i = rep(1:M, each = n))
dat$b <- rep(b, each = n)
dat$x <- unlist(lapply(xbars, function(xbar) {
    x <- xbar + rnorm(n, sd = 4)
}))
dat$y <- beta0 + beta1 * dat$x + rnorm(M * n, sd = esd) + dat$b

Now we draw a sample:

M <- 10; beta0 <- 3; beta1 <- 0.5
simdata <- gen2(M = M, beta = c(beta0, beta1), bsd = 20, xsdi = 10)
dat <- simdata[["dat"]]
b   <- simdata[["b"]]

The Parallel True Lines

[Plot of the true group lines for the gen2 sample.]

Put the points on the lines: No ε_ij

[Plot: the observations placed exactly on their group lines.]

Recall the Parallel True Lines

Especially when we have a lot of groups, the data cloud is filled out well and the OLS estimate of β₁ won't be horrible.

M <- 10; beta0 <- 3; beta1 <- 0.5
simdata <- gen1(M = M, beta = c(beta0, beta1), xsdi = 0)
dat <- simdata[["dat"]]
b   <- simdata[["b"]]
m15 <- lm(y ~ x, data = dat)
summary(m15)

Recall the Parallel True Lines ...

Call:
lm(formula = y ~ x, data = dat)

[summary() output: residual quantiles, a coefficient table for (Intercept) and x with the intercept significant at the 0.05 level, residual standard error on 28 degrees of freedom, R-squared values, and the F statistic on 1 and 28 DF; the numeric values did not survive transcription.]

Recall the Parallel True Lines ...

[Scatterplot of y against x for this sample.]

Parallel True Lines

If b_i and the mean of x within each sub-group fall into order, then the data-generating process may not be so misleading. (The gen2 function is the same as shown above.)


Parallel True Lines

beta0 <- 3; beta1 <- 0.5; xbari <- 25; xsdi <- 5
M <- 4; n <- 3; bsd <- 2; esd <- 4
xbars <- rnorm(M, mean = xbari, sd = xsdi)
b <- rnorm(M, mean = (xbari - xbars), sd = bsd)  # b_i depends on the group mean of x
dat <- data.frame(i = rep(1:M, each = n))
dat$b <- rep(b, each = n)
dat$x <- unlist(lapply(xbars, function(xbar) {
    x <- xbar + rnorm(n, sd = 4)
}))
dat$y <- beta0 + beta1 * dat$x + rnorm(M * n, sd = esd) + dat$b
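A quick hypothetical check, not on the slides, that this construction really does correlate the intercepts with the group means of x:

set.seed(42)
xbars <- rnorm(1000, mean = 25, sd = 5)
b <- rnorm(1000, mean = 25 - xbars, sd = 2)
cor(xbars, b)  # strongly negative, so pooled OLS will understate the slope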

Special Tool: Dotplot

Suggested graphical representation in Pinheiro & Bates.

[Dotplot of the observed outcome, arranged by group i.]
