STK 2100 Oblig 1. Zhou Siyu. February 15, 2017
Question 1

a) Make a scatter plot for the data set.

Answer: Here is the code I used to produce the scatter-plot matrix in R.

library(MASS)
pairs(Boston)

Figure 1: Scatter-plot matrix of the Boston data set.
Here we get a general idea of the relationships between the response and each of the predictor variables, as well as between any two predictor variables.

b) Divide the data set into training and test data sets. Discuss the advantages and shortcomings of doing so.

Answer: The R code is as the question suggests:

#b)
set.seed(345)
ind <- sample(1:nrow(Boston), 250, replace = FALSE)
Boston.train <- Boston[ind, ]
Boston.test <- Boston[-ind, ]

In splitting the data set, we end up with fewer observations to train the model, which is a shortcoming in terms of model accuracy and efficiency. On the other hand, we gain information about the model's predictive performance by testing it on the test data: it is with the test data that we can see how much the model's predictions deviate from the real values, and so evaluate the model's accuracy.

c) Explain the important assumptions of the linear model. Use crim as the response variable, fit the model to the training data and discuss the result.

Answer: For linear regression models, some important conditions should be satisfied. First and foremost, the error terms ε_i should be independent of each other. This is arguably the most important assumption, since otherwise we would see correlation in the errors of the response variable. Another condition is that the errors have zero mean and constant variance, E[ε_i] = 0 and V[ε_i] = σ²; this can also be a strong condition. Last but not least, we usually require the ε_i to be normally distributed.

To fit the model to the training data, we run the following code in R:

#c)
> fit.lim = lm(crim ~ ., data = Boston.train)
> summary(fit.lim)

Call:
lm(formula = crim ~ ., data = Boston.train)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
zn
indus
chas
nox
rm
age
dis                                            **
rad                                    e-06    ***
tax
ptratio
black
lstat
medv                                           ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 236 degrees of freedom
Multiple R-squared: 0.45, Adjusted R-squared:
F-statistic: 14.92 on 13 and 236 DF, p-value: < 2.2e-16

Here we see that the program generated a linear regression model based on the training data set. The problem with this model is that many of the coefficients have very large p-values, which suggests they might not belong in the model. At the same time, the R² as well as the adjusted R² are too small to convince us that this is a reliable model.

d) Remove the predictor variable with the biggest p-value and fit a new model. Explain why this is a reasonable procedure. Explain the new p-values in terms of correlation between predictor variables.

Answer: We can sort the coefficient table with the following code in R, and then see which variable has the largest p-value.

> newmodel = summary(fit.lim)$coefficients
> newmodel = newmodel[order(-newmodel[, "Pr(>|t|)"]), ]
> newmodel
            Estimate Std. Error t value Pr(>|t|)
chas
age
black
tax
indus
lstat
ptratio
rm
(Intercept)
nox
zn
dis
medv
rad

I decided to use the update function to remove the predictor variable with the largest p-value, namely chas, and then fit a new model with the following code:
> fit.lim = update(fit.lim, ~ . - chas)
> summary(fit.lim)

Call:
lm(formula = crim ~ zn + indus + nox + rm + age + dis + rad +
    tax + ptratio + black + lstat + medv, data = Boston.train)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
zn
indus
nox
rm
age
dis                                            **
rad                                    e-06    ***
tax
ptratio
black
lstat
medv                                           ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.52 on 237 degrees of freedom
Multiple R-squared: 0.45, Adjusted R-squared:
F-statistic: 16.23 on 12 and 237 DF, p-value: < 2.2e-16

> newmodel = summary(fit.lim)$coefficients
> newmodel = newmodel[order(-newmodel[, "Pr(>|t|)"]), ]
> newmodel
            Estimate Std. Error t value Pr(>|t|)
age
black
tax
indus
lstat
ptratio
rm
(Intercept)
zn
nox
dis
medv
rad

Comparing the two results, we see that in removing chas we have actually obtained a better model, in that the adjusted R² has slightly increased
while R² remains unchanged. This means that the new model explains the data set at least as well as the previous one. In the meantime, we see that the p-values and standard errors of the remaining predictor variables do not change much after we have removed chas, suggesting that these predictors are close to mutually independent. If any predictor variable were collinear with chas, we would be likely to see a dramatic increase in the standard error of its least squares estimate.

e) Keep improving the model in this way until you get a reasonable model. Make different plots to show that this selection is reasonable.

Answer: We could remove each variable by hand, but the following R code makes it easier.

> y.name <- "crim"
> alpha <- 0.1
> fit.lim = lm(crim ~ ., data = Boston.train)
> beta <- max(max(summary(fit.lim)$coefficients[, 4]), alpha)
> tokeep <- which(summary(fit.lim)$coefficients[, 4] < beta)
> print(length(tokeep))
[1] 13
> while (beta > alpha)
+ {
+   if (length(tokeep) == 0)
+   {
+     warning("Nothing is significant")
+     break
+   }
+   if (names(tokeep)[1] == "(Intercept)")
+   {
+     names(tokeep)[1] <- "1"
+   } else
+   {
+     names(tokeep)[1] <- "-1"
+   }
+   form <- as.formula(paste(y.name, "~", paste(names(tokeep), collapse = "+")))
+   fit.lim = lm(formula = form, data = Boston.train)
+   beta <- max(max(summary(fit.lim)$coefficients[, 4]), alpha)
+   newmodel = summary(fit.lim)$coefficients
+   tokeep <- which(summary(fit.lim)$coefficients[, 4] < beta)
+   if (length(tokeep) == 1)
+   {
+     names(tokeep) <- row.names(summary(fit.lim)$coefficients)[1]
+   }
+   print(names(tokeep))
+   print(length(tokeep))
+ }
[1] "(Intercept)" "zn"      "indus"
[4] "nox"         "rm"      "dis"
[7] "rad"         "tax"     "ptratio"
[10] "black"      "lstat"   "medv"
[1] 12
[1] "(Intercept)" "zn"      "indus"
[4] "nox"         "rm"      "dis"
[7] "rad"         "tax"     "ptratio"
[10] "lstat"      "medv"
[1] 11
[1] "(Intercept)" "zn"      "indus"
[4] "nox"         "rm"      "dis"
[7] "rad"         "ptratio" "lstat"
[10] "medv"
[1] 10
[1] "(Intercept)" "zn"      "nox"
[4] "rm"          "dis"     "rad"
[7] "ptratio"     "lstat"   "medv"
[1] 9
[1] "(Intercept)" "zn"      "nox"
[4] "rm"          "dis"     "rad"
[7] "ptratio"     "medv"
[1] 8
[1] "(Intercept)" "zn"      "nox"
[4] "dis"         "rad"     "ptratio"
[7] "medv"
[1] 7
[1] "(Intercept)" "zn"      "nox"
[4] "dis"         "rad"     "medv"
[1] 6
[1] "(Intercept)" "zn"      "dis"
[4] "rad"         "medv"
[1] 5
[1] "(Intercept)" "zn"      "dis"
[4] "rad"         "medv"
[1] 5
> summary(fit.lim)

Call:
lm(formula = form, data = Boston.train)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                    **
zn                                             *
dis                                            *
rad                                    e-15    ***
medv                                   e-05    ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 245 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 245 DF, p-value: < 2.2e-16

We can also run the backward model selection with the following code in R.
> library(leaps)
> fit.backward = regsubsets(crim ~ ., data = Boston.train, nvmax = 13, method = "backward")
> summary.backward = summary(fit.backward)
> summary.backward

Selection Algorithm: backward
    zn  indus chas nox rm  age dis rad tax ptratio black lstat medv
1   " " " "   " " " " " " " " " " "*" " " " "     " "   " "   " "
2   " " " "   " " " " " " " " " " "*" " " " "     " "   " "   "*"
3   " " " "   " " " " " " " " "*" "*" " " " "     " "   " "   "*"
4   "*" " "   " " " " " " " " "*" "*" " " " "     " "   " "   "*"
5   "*" " "   " " "*" " " " " "*" "*" " " " "     " "   " "   "*"
6   "*" " "   " " "*" " " " " "*" "*" " " "*"     " "   " "   "*"
7   "*" " "   " " "*" "*" " " "*" "*" " " "*"     " "   " "   "*"
8   "*" " "   " " "*" "*" " " "*" "*" " " "*"     " "   "*"   "*"
9   "*" "*"   " " "*" "*" " " "*" "*" " " "*"     " "   "*"   "*"
10  "*" "*"   " " "*" "*" " " "*" "*" "*" "*"     " "   "*"   "*"
11  "*" "*"   " " "*" "*" " " "*" "*" "*" "*"     "*"   "*"   "*"
12  "*" "*"   " " "*" "*" "*" "*" "*" "*" "*"     "*"   "*"   "*"
13  "*" "*"   "*" "*" "*" "*" "*" "*" "*" "*"     "*"   "*"   "*"

> summary.backward$adjr2
[1]
[8]
> summary.backward$cp
[1]
[8]

In the above result we see that the adjusted R² increases only slightly once the model takes up more than four predictor variables. In the meantime, Mallows' C_p is only slightly bigger than m + 1 with four variables. Therefore we decide to take four variables:

> coef(fit.backward, 4)
(Intercept)  zn  dis  rad  medv

And if we fit the model with the selected predictor variables:

> fit.lim2 = lm(crim ~ zn + dis + rad + medv, data = Boston.train)
> summary(fit.lim2)

Call:
lm(formula = crim ~ zn + dis + rad + medv, data = Boston.train)

Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                    **
zn                                             *
dis                                            *
rad                                    e-15    ***
medv                                   e-05    ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 245 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 245 DF, p-value: < 2.2e-16

Figure 2: Scatter plots of the model with four predictor variables.

I have also plotted the standardised residuals against each predictor variable in the model. As we can see in the following figures, there seems to exist a pattern: as the predictor variable gets large, the standardised residual tends to shrink. Based on the plots, as well as the low adjusted R², I tend to think we need to consider interactions between predictor variables or models with polynomial terms.
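The residual plots just described can be reproduced with a few lines of R; this is a minimal sketch, assuming the same seed and training split as in (b):

```r
# Sketch: standardised residuals against each predictor of the
# four-variable model (assumes the split from part (b))
library(MASS)
set.seed(345)
ind <- sample(1:nrow(Boston), 250, replace = FALSE)
Boston.train <- Boston[ind, ]
fit.lim2 <- lm(crim ~ zn + dis + rad + medv, data = Boston.train)

r.std <- rstandard(fit.lim2)            # internally studentised residuals
par(mfrow = c(2, 2))
for (v in c("zn", "dis", "rad", "medv")) {
  plot(Boston.train[[v]], r.std, xlab = v, ylab = "standardised residual")
  abline(h = 0, lty = 2)                # reference line at zero
}
```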
Figure 3: Standardised residuals plotted against each predictor variable.

f) Use the averaged square error to see how good the model is.

Answer: To calculate the required error term, we run the following code in R:

> fit.lim3 = lm(crim ~ zn + dis + rad + medv, data = Boston.train)
> sum((Boston.test$crim - predict(fit.lim3, data.frame(Boston.test)))^2) / n
[1]

In the meantime, if we run the following R code, we can see how the error changes with the number of predictor variables used in the model:

> n = nrow(Boston.test)
> k = ncol(Boston.test) - 1
> mat = model.matrix(crim ~ ., data = Boston.test)
> fit.lm = regsubsets(crim ~ ., data = Boston.train, nvmax = k, method = "backward")
> cv = rep(0, k)
> for (m in 1:k)
+ {
+   coef.m = coef(fit.lm, m)
+   for (i in 1:n)
+   {
+     pred = sum(mat[i, names(coef.m)] * coef.m)
+     diff = (Boston.test$crim[i] - pred)^2
+     cv[m] = cv[m] + diff
+   }
+ }
> cv / n
[1]
[6]
[11]
> cv[4] / n
[1]

We see that the model with four predictor variables has an averaged square error only slightly larger than the model using one variable. This is consistent with the previous result, supporting our model selection.

g) Repeat the model selection procedure using the whole data set. Discuss the difference.

Answer: With a slight change of the code, we see that the model also changes with the different data set.

> y.name <- "crim"
> alpha <- 0.1
> fit.lim = lm(crim ~ ., data = Boston)
> beta <- max(max(summary(fit.lim)$coefficients[, 4]), alpha)
> tokeep <- which(summary(fit.lim)$coefficients[, 4] < beta)
> print(length(tokeep))
[1] 13
> while (beta > alpha)
+ {
+   if (length(tokeep) == 0)
+   {
+     warning("Nothing is significant")
+     break
+   }
+   if (names(tokeep)[1] == "(Intercept)")
+   {
+     names(tokeep)[1] <- "1"
+   } else
+   {
+     names(tokeep)[1] <- "-1"
+   }
+   form <- as.formula(paste(y.name, "~", paste(names(tokeep), collapse = "+")))
+   fit.lim = lm(formula = form, data = Boston)
+   beta <- max(max(summary(fit.lim)$coefficients[, 4]), alpha)
+   newmodel = summary(fit.lim)$coefficients
+   tokeep <- which(summary(fit.lim)$coefficients[, 4] < beta)
+   if (length(tokeep) == 1)
+   {
+     names(tokeep) <- row.names(summary(fit.lim)$coefficients)[1]
+   }
+   print(names(tokeep))
+   print(length(tokeep))
+ }
[1] "(Intercept)" "zn"      "indus"
[4] "nox"         "rm"      "dis"
[7] "rad"         "tax"     "ptratio"
[10] "black"      "lstat"   "medv"
[1] 12
[1] "(Intercept)" "zn"      "indus"
[4] "nox"         "rm"      "dis"
[7] "rad"         "ptratio" "black"
[10] "lstat"      "medv"
[1] 11
[1] "(Intercept)" "zn"      "indus"
[4] "nox"         "dis"     "rad"
[7] "ptratio"     "black"   "lstat"
[10] "medv"
[1] 10
[1] "(Intercept)" "zn"      "nox"
[4] "dis"         "rad"     "ptratio"
[7] "black"       "lstat"   "medv"
[1] 9
[1] "(Intercept)" "zn"      "nox"
[4] "dis"         "rad"     "ptratio"
[7] "black"       "medv"
[1] 8
[1] "(Intercept)" "zn"      "nox"
[4] "dis"         "rad"     "black"
[7] "medv"
[1] 7
[1] "(Intercept)" "zn"      "nox"
[4] "dis"         "rad"     "black"
[7] "medv"
[1] 7
> summary(fit.lim)

Call:
lm(formula = form, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            e-05    ***
zn                                             **
nox                                            *
dis                                            ***
rad                                  < 2e-16   ***
black                                          *
medv                                   e-07    ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 499 degrees of freedom
Multiple R-squared: 0.444, Adjusted R-squared:
F-statistic: on 6 and 499 DF, p-value: < 2.2e-16

Now the model keeps six predictor variables, compared with four from before. The averaged square error:

> n = nrow(Boston)
> fit.lim4 = lm(crim ~ zn + dis + rad + medv + black + nox, data = Boston)
> sum((Boston$crim - predict(fit.lim4, data.frame(Boston)))^2) / n
[1]

We see that the error seems to get bigger. In the meantime, the biggest problem is that we do not have unused data with which to test the accuracy of the predictions.

> n = nrow(Boston)
> k = ncol(Boston) - 1
> mat = model.matrix(crim ~ ., data = Boston)
> fit.lm = regsubsets(crim ~ ., data = Boston, nvmax = k, method = "backward")
> cv = rep(0, k)
> for (m in 1:k)
+ {
+   coef.m = coef(fit.lm, m)
+   for (i in 1:n)
+   {
+     pred = sum(mat[i, names(coef.m)] * coef.m)
+     diff = (Boston$crim[i] - pred)^2
+     cv[m] = cv[m] + diff
+   }
+ }
> cv / n
[1]
[6]
[11]
> cv[6] / n
[1]

Another problem with my solution is that, because we do not have any fresh data to use, the error terms will decrease as the number of predictor variables increases. (The error term, obtained in either way, is the same.)
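One standard remedy for this problem, not used in the assignment itself, is k-fold cross-validation: every observation is used for fitting, yet each fold is evaluated on data the model has not seen. A minimal sketch with 5 folds (the fold count and seed are arbitrary choices):

```r
# Sketch: 5-fold cross-validation of the six-variable model on the
# full Boston data (fold count and seed are illustrative assumptions)
library(MASS)
set.seed(345)
folds <- sample(rep(1:5, length.out = nrow(Boston)))  # random fold labels
cv.err <- 0
for (f in 1:5) {
  fit <- lm(crim ~ zn + dis + rad + medv + black + nox,
            data = Boston[folds != f, ])              # fit without fold f
  pred <- predict(fit, Boston[folds == f, ])          # predict fold f
  cv.err <- cv.err + sum((Boston$crim[folds == f] - pred)^2)
}
cv.err / nrow(Boston)   # averaged square error over held-out folds
```

Unlike the in-sample error above, this estimate does not automatically shrink as more predictors are added.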
Question 2

a) Show that the two models are equivalent.

Answer: To show equivalence, note that each observation i belongs to exactly one category c_i, so Σ_{k=1}^{K} x_ik = 1. Then:

Y_i = β_0 + β_2 x_i2 + ... + β_K x_iK + ε_i
    = β_0 Σ_{k=1}^{K} x_ik + β_2 x_i2 + ... + β_K x_iK + ε_i
    = β_0 x_i1 + (β_0 + β_2) x_i2 + ... + (β_0 + β_K) x_iK + ε_i

Then we can see the correspondence between α and β:

α_j = { β_0,        j = 1
      { β_0 + β_j,  j = 2, 3, ..., K

Each α_j represents the mean of the population in category j, and it corresponds to a certain β_j according to the rule above. In the first model, β_0 serves as the baseline (the mean of category 1), while the difference between categories is captured by the β_j for j = 2, 3, ..., K: each β_j represents the difference between the mean of the population in category j and that in category 1.

b) Show that the matrix X^T X has certain properties, and some other results.

Answer: The properties are straightforward. Multiplying out,

X^T X = [ Σ_i x_i1²      Σ_i x_i1 x_i2  ...  Σ_i x_i1 x_iK ]
        [ Σ_i x_i1 x_i2  Σ_i x_i2²      ...                ]
        [ ...                                              ]
        [ Σ_i x_i1 x_iK  ...            ...  Σ_i x_iK²     ]

Since each observation lies in exactly one category, x_ij x_il = 0 for j ≠ l, so the resulting matrix is diagonal with Σ_{i=1}^{n} x_ij² as the j-th element on the main diagonal. It is also easy to see that Σ_i x_ij² = n_j, since the other terms in the sum are 0; this counts how many observations fall into category j.

For X^T y:
X^T y = ( Σ_i x_i1 y_i, Σ_i x_i2 y_i, ..., Σ_i x_iK y_i )^T

The j-th element of this vector is Σ_{i: c_i = j} y_i, namely the sum of the response values that fall into category j.

To find the least squares estimator, we use matrix differentiation:

RSS = (y - Xα)^T (y - Xα)
    = y^T y - y^T Xα - α^T X^T y + α^T X^T X α

Differentiating RSS with respect to α and setting the derivative to 0:

∂RSS/∂α = -2 X^T y + 2 X^T X α = 0

Then we have the estimator:

α̂ = (X^T X)^{-1} X^T y
  = ( (Σ_i x_i1 y_i)/n_1, (Σ_i x_i2 y_i)/n_2, ..., (Σ_i x_iK y_i)/n_K )^T

The last equality follows from the diagonal form of X^T X obtained above. In this way we see that α̂_j is, reasonably, the mean of the response values that fall into category j.

c) Construct a least squares estimator β̂ from α̂. Show that it is indeed a least squares estimator.

Answer: According to the one-to-one correspondence established above, we can formulate β̂ as:

β̂ = ( β̂_0, β̂_2, β̂_3, ..., β̂_K )^T
  = ( α̂_1, α̂_2 - α̂_1, α̂_3 - α̂_1, ..., α̂_K - α̂_1 )^T
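The diagonal form of X^T X from (b), and the fact that α̂ recovers the category means, are easy to check numerically. A small sketch with a hypothetical 3-level factor (the values are arbitrary and not from the assignment):

```r
# Hypothetical check: dummy design matrix for a 3-level factor, no intercept
f <- factor(c(1, 1, 2, 3, 3, 3))   # n_1 = 2, n_2 = 1, n_3 = 3
X <- model.matrix(~ f + 0)         # columns are the indicators x_ij
XtX <- t(X) %*% X
XtX                                # diagonal matrix diag(2, 1, 3)

y <- c(4, 6, 10, 1, 2, 3)
solve(XtX, t(X) %*% y)             # alpha-hat = category means: 5, 10, 2
```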
I intend to prove that β̂ is the least squares estimator by contradiction. Assume for the moment that the β̂ thus obtained is not the least squares estimator, and that there exists a least squares estimator β̂_L with smaller residual sum of squares. Then we can construct an α̂_L by the rule above, and:

RSS = ||y - X β̂_L||² = ||y - X α̂_L||²

The last equality holds because the one-to-one correspondence between α̂_L and β̂_L gives the same fitted values. Therefore α̂_L would also be a least squares estimator with smaller RSS than α̂, which contradicts the definition of α̂.

d) Show that model 3 is equivalent to models 1 and 2. Give an interpretation of the coefficients.

Answer: First I show equivalence between model 3 and model 2:

Y_i = γ_0 + γ_1 x_i1 + ... + γ_K x_iK + ε_i
    = γ_0 Σ_{k=1}^{K} x_ik + γ_1 x_i1 + ... + γ_K x_iK + ε_i
    = (γ_0 + γ_1) x_i1 + (γ_0 + γ_2) x_i2 + ... + (γ_0 + γ_K) x_iK + ε_i

Together with the condition Σ_{j=1}^{K} γ_j = 0, we can see the correspondence between γ and α:

γ_j = { (1/K) Σ_{k=1}^{K} α_k,  j = 0
      { α_j - γ_0,              j = 1, 2, 3, ..., K

Here γ_0 represents the mean of the whole population regardless of category, in other words the overall mean, while γ_j for j = 1, 2, ..., K represents the effect of category j on the response, i.e. its deviation from the overall mean.

e) Try the following code in R and discuss its meaning.

Answer: First we see how the code is executed in R, and whether something goes amiss.

> #e)
> Fe = read.table("http://www.uio.no/studier/emner/matnat/math/STK2100/v17/fe.txt", header = T, sep = ",")
> fit = lm(Fe ~ form + 0, data = Fe)
> summary(fit)

Call:
lm(formula = Fe ~ form + 0, data = Fe)

Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
form                                  <2e-16   ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 39 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 33.6 on 1 and 39 DF, p-value: < 2.2e-16

We see that the problem here is that the different forms are not treated as different categories, but rather as a single numerical predictor variable. In addition, since the command corresponds to model 2, there is no intercept in the linear model. But if we add one extra command, as the question suggests, the model works as intended:

> Fe$form = as.factor(Fe$form)
> fit = lm(Fe ~ form + 0, data = Fe)
> summary(fit)

Call:
lm(formula = Fe ~ form + 0, data = Fe)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
form1                                 <2e-16   ***
form2                                 <2e-16   ***
form3                                 <2e-16   ***
form4                                 <2e-16   ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 36 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 4 and 36 DF, p-value: < 2.2e-16

This model clearly corresponds to model 2, with α_0 = 0 being the restriction, and each form coefficient in the table above corresponds to one categorical coefficient α_j in model 2.

f) Try the following code in R, determine which model it corresponds to, and list all the coefficients.

Answer: As can be seen in the following result, the first code snippet
corresponds to model 1, namely with β_1 = 0 as the restriction, while the second corresponds to model 3, where Σ_j γ_j = 0 is the restriction. The first is R's default setting, while the restriction for model 3 is imposed through the command options(contrasts = c("contr.sum", "contr.sum")).

> options()$contrasts
[1] "contr.treatment" "contr.poly"
> fit2 = lm(Fe ~ form, data = Fe)
> summary(fit2)

Call:
lm(formula = Fe ~ form, data = Fe)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                          < 2e-16   ***
form2
form3                                          *
form4                                  e-05    ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 36 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 10.85 on 3 and 36 DF, p-value: 3.99e-05

> options(contrasts = c("contr.sum", "contr.sum"))
> options()$contrasts
[1] "contr.sum" "contr.sum"
> fit3 = lm(Fe ~ form, data = Fe)
> summary(fit3)

Call:
lm(formula = Fe ~ form, data = Fe)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                          < 2e-16   ***
form1                                          *
form2                                          ***
form3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 36 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 10.85 on 3 and 36 DF, p-value: 3.99e-05

The three sets of regression coefficients correspond respectively to α̂, β̂ and γ̂. The models are equivalent to each other, but the exact coefficients differ; we have already established the rules connecting them to each other in the earlier questions.

g) Do an analysis of variance on the iron types. Make use of the results from the previous questions.

Answer: To tell whether there is any difference between the four types of iron, we ask whether the β_j for j = 2, 3, 4 (equivalently the γ_j for j = 1, 2, 3, 4) are all equal to 0 at the same time. In other words, we test the hypothesis

H_0: β_2 = β_3 = β_4 = 0

against the alternative that at least one of them is not 0:

H_a: at least one β_j ≠ 0.

We can run the F-test:

F = ((TSS - RSS)/p) / (RSS/(n - p - 1))

Fortunately, R has already run this test and printed the result in the last line of the summary table, where we see that the p-value of the F-test is 3.99e-05. Therefore we can safely reject H_0. We can also run an ANOVA for the model, and the result is the same for both parameterisations:

> anova(fit2)
Analysis of Variance Table

Response: Fe
          Df Sum Sq Mean Sq F value   Pr(>F)
form       3                        3.99e-05 ***
Residuals 36
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(fit3)
Analysis of Variance Table

Response: Fe
          Df Sum Sq Mean Sq F value   Pr(>F)
form       3                        3.99e-05 ***
Residuals 36
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

h) Suggest a reasonable way to further simplify the model.

Answer: We can use the F-test to see whether we can leave out q of the predictor variables:

H_0: β_2 = β_3 = ... = β_{q+1} = 0

Then

F = ((RSS_0 - RSS)/q) / (RSS/(n - p - 1))

Here RSS_0 is the residual sum of squares under H_0, while RSS is for the full model. In this way we can hopefully find some categories that are not very different from each other, and then combine those categories. We can also run Tukey's procedure on this question and look at the mean differences among the four types:

aov.fit = aov(Fe ~ factor(form), data = Fe)
summary(aov.fit)
tukey.fit = TukeyHSD(aov.fit, ordered = T)
plot(tukey.fit)

Figure 4: Tukey plot for the iron types.

We see here that types 1 and 2 are very close to each other, while 3 and 4 form another pair. Therefore there is a good chance of further simplifying the model.
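The suggested simplification can be tested directly with the partial F-test described above, comparing the four-type model against a model that merges types 1 & 2 and types 3 & 4. A sketch with simulated stand-in data (the values are hypothetical; the real data are the Fe measurements loaded earlier):

```r
# Simulated stand-in for the iron data: four types, with types 1&2 and
# 3&4 sharing a mean (hypothetical values, for illustration only)
set.seed(1)
Fe <- data.frame(form = factor(rep(1:4, each = 10)),
                 Fe   = rnorm(40, mean = rep(c(26, 26, 21, 21), each = 10)))

# Reduced model: merge types 1&2 and 3&4 into two groups
Fe$group <- factor(ifelse(Fe$form %in% c("1", "2"), "12", "34"))
fit.full    <- lm(Fe ~ form,  data = Fe)
fit.reduced <- lm(Fe ~ group, data = Fe)
anova(fit.reduced, fit.full)   # partial F-test of H0: merged model suffices
```

A large p-value in this comparison would mean the two-group model explains the data about as well as the four-group model, justifying the merge.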
Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering
More informationMODELS WITHOUT AN INTERCEPT
Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level
More informationGeneral Linear Model (Chapter 4)
General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients
More informationWorkshop 7.4a: Single factor ANOVA
-1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.
More informationStat588 Homework 1 (Due in class on Oct 04) Fall 2011
Stat588 Homework 1 (Due in class on Oct 04) Fall 2011 Notes. There are three sections of the homework. Section 1 and Section 2 are required for all students. While Section 3 is only required for Ph.D.
More informationBiostatistics 380 Multiple Regression 1. Multiple Regression
Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)
More informationDealing with Heteroskedasticity
Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationRegression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur
Regression Analysis and Forecasting Prof. Shalabh Department of Mathematics and Statistics Indian Institute of Technology-Kanpur Lecture 10 Software Implementation in Simple Linear Regression Model using
More informationInference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58
Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationVariance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf
More informationAnalytics 512: Homework # 2 Tim Ahn February 9, 2016
Analytics 512: Homework # 2 Tim Ahn February 9, 2016 Chapter 3 Problem 1 (# 3) Suppose we have a data set with five predictors, X 1 = GP A, X 2 = IQ, X 3 = Gender (1 for Female and 0 for Male), X 4 = Interaction
More informationLinear Modelling in Stata Session 6: Further Topics in Linear Modelling
Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical
More information1 Multiple Regression
1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More informationFACTORIAL DESIGNS and NESTED DESIGNS
Experimental Design and Statistical Methods Workshop FACTORIAL DESIGNS and NESTED DESIGNS Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Factorial
More informationEmpirical Application of Simple Regression (Chapter 2)
Empirical Application of Simple Regression (Chapter 2) 1. The data file is House Data, which can be downloaded from my webpage. 2. Use stata menu File Import Excel Spreadsheet to read the data. Don t forget
More informationLinear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.
Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach
More informationMultiple Regression: Example
Multiple Regression: Example Cobb-Douglas Production Function The Cobb-Douglas production function for observed economic data i = 1,..., n may be expressed as where O i is output l i is labour input c
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationCOMPARING SEVERAL MEANS: ANOVA
LAST UPDATED: November 15, 2012 COMPARING SEVERAL MEANS: ANOVA Objectives 2 Basic principles of ANOVA Equations underlying one-way ANOVA Doing a one-way ANOVA in R Following up an ANOVA: Planned contrasts/comparisons
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationST430 Exam 1 with Answers
ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial
More informationExtensions of One-Way ANOVA.
Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More informationMultiple Linear Regression. Chapter 12
13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationNotes on Maxwell & Delaney
Notes on Maxwell & Delaney PSCH 710 6 Chapter 6 - Trend Analysis Previously, we discussed how to use linear contrasts, or comparisons, to test specific hypotheses about differences among means. Those discussions
More informationSLR output RLS. Refer to slr (code) on the Lecture Page of the class website.
SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association
More informationSTAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis
STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO
More informationIntroduction to Regression
Regression Introduction to Regression If two variables covary, we should be able to predict the value of one variable from another. Correlation only tells us how much two variables covary. In regression,
More informationStat 401B Final Exam Fall 2015
Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More information22S39: Class Notes / November 14, 2000 back to start 1
Model diagnostics Interpretation of fitted regression model 22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics 22S39: Class Notes / November 14, 2000 back to start 2 Model diagnostics
More informationLinear Model Specification in R
Linear Model Specification in R How to deal with overparameterisation? Paul Janssen 1 Luc Duchateau 2 1 Center for Statistics Hasselt University, Belgium 2 Faculty of Veterinary Medicine Ghent University,
More informationRecall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:
1 Joint hypotheses The null and alternative hypotheses can usually be interpreted as a restricted model ( ) and an model ( ). In our example: Note that if the model fits significantly better than the restricted
More informationLecture 19 Multiple (Linear) Regression
Lecture 19 Multiple (Linear) Regression Thais Paiva STA 111 - Summer 2013 Term II August 1, 2013 1 / 30 Thais Paiva STA 111 - Summer 2013 Term II Lecture 19, 08/01/2013 Lecture Plan 1 Multiple regression
More informationCHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS
CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 5.1. (a) In a log-log model the dependent and all explanatory variables are in the logarithmic form. (b) In the log-lin model the dependent variable
More informationChapter 3: Multiple Regression. August 14, 2018
Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ
More informationPrincipal components
Principal components Principal components is a general analysis technique that has some application within regression, but has a much wider use as well. Technical Stuff We have yet to define the term covariance,
More informationIII. Inferential Tools
III. Inferential Tools A. Introduction to Bat Echolocation Data (10.1.1) 1. Q: Do echolocating bats expend more enery than non-echolocating bats and birds, after accounting for mass? 2. Strategy: (i) Explore
More informationLAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the
More information> nrow(hmwk1) # check that the number of observations is correct [1] 36 > attach(hmwk1) # I like to attach the data to avoid the '$' addressing
Homework #1 Key Spring 2014 Psyx 501, Montana State University Prof. Colleen F Moore Preliminary comments: The design is a 4x3 factorial between-groups. Non-athletes do aerobic training for 6, 4 or 2 weeks,
More informationLinear Regression Model. Badr Missaoui
Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus
More informationSimple, Marginal, and Interaction Effects in General Linear Models
Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationSKLearn Tutorial: DNN on Boston Data
SKLearn Tutorial: DNN on Boston Data This tutorial follows very closely two other good tutorials and merges elements from both: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/boston.py
More information36-707: Regression Analysis Homework Solutions. Homework 3
36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx
More informationAdvanced Regression Summer Statistics Institute. Day 3: Transformations and Non-Linear Models
Advanced Regression Summer Statistics Institute Day 3: Transformations and Non-Linear Models 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions of our linear regression model:
More informationBiostatistics for physicists fall Correlation Linear regression Analysis of variance
Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody
More informationAnalysis of Variance (ANOVA)
Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA
More informationIES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc
IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared
More informationFigure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim
0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#
More informationSwarthmore Honors Exam 2012: Statistics
Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may
More informationStatistics Lab #6 Factorial ANOVA
Statistics Lab #6 Factorial ANOVA PSYCH 710 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")
More information