STOR 664 Homework 2 Solution

Part A

Exercise (Faraway book) Ch2 Ex1

> data(teengamb)
> attach(teengamb)
> tgl <- lm(gamble ~ sex + status + income + verbal)
> summary(tgl)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  22.55565   17.19680   1.312   0.1968
sex         -22.11833    8.21111  -2.694   0.0101 *
status        0.05223    0.28111   0.186   0.8535
income        4.96198    1.02539   4.839 1.79e-05 ***
verbal       -2.95949    2.17215  -1.362   0.1803
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.69 on 42 degrees of freedom
Multiple R-squared: 0.5267, Adjusted R-squared: 0.4816
F-statistic: 11.69 on 4 and 42 DF, p-value: 1.815e-06

(a) The percentage of variation in the response explained by the regression is given by the Multiple R-squared, which is 52.67%.

(b) The 24th case has the largest residual:
> which.max(tgl$residuals)

(c) The mean of the residuals is essentially 0 (zero up to floating-point error, as it must be when an intercept is included) and the median is -1.451.
> mean(tgl$residuals)
> median(tgl$residuals)

(d) The correlation of the residuals with the fitted values is zero up to rounding error; least squares residuals are exactly orthogonal to the fitted values.
> cor(tgl$residuals, tgl$fitted.values)

(e) The correlation of the residuals with income is likewise zero up to rounding error, since income is one of the regressors.
> cor(tgl$residuals, income)

(f) Based on the summary, the fitted model can be written explicitly as

gamble = 22.55565 - 22.11833 sex + 0.05223 status + 4.96198 income - 2.95949 verbal.

If all the predictors except sex are held constant, the difference in predicted expenditure on gambling between male (sex=0) and female (sex=1) equals the regression coefficient of sex, i.e., -22.11833. Therefore whenever sex changes from male (sex=0) to female (sex=1), the predicted value of gamble decreases by 22.12. In other words, according to the current regression model, a female spends $22.12 less than a comparable (i.e., other predictors being held constant) male on gambling.
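The quantities in (a)-(f) can be reproduced end to end; the following is a minimal R sketch, assuming the faraway package (which ships the teengamb data) is installed:

> library(faraway)                        ## assumed available; provides teengamb
> data(teengamb)
> tgl <- lm(gamble ~ sex + status + income + verbal, data = teengamb)
> summary(tgl)$r.squared                  ## (a) proportion of variation explained
> which.max(residuals(tgl))               ## (b) index of the largest residual
> c(mean(residuals(tgl)), median(residuals(tgl)))   ## (c)
> cor(residuals(tgl), fitted(tgl))        ## (d) numerically zero
> cor(residuals(tgl), teengamb$income)    ## (e) numerically zero
> coef(tgl)["sex"]                        ## (f) female-minus-male difference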

Part B

Ch3 Ex2
The model is $Y = X\beta + \epsilon$, where

$$Y = (y_1, y_2, \dots, y_n)', \qquad X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{pmatrix}, \qquad \beta = (\beta_1, \beta_2)', \qquad \epsilon = (\epsilon_1, \epsilon_2, \dots, \epsilon_n)'.$$

Direct calculation gives

$$(X'X)^{-1} = \frac{1}{A} \begin{pmatrix} \sum x_{i2}^2 & -\sum x_{i1}x_{i2} \\ -\sum x_{i1}x_{i2} & \sum x_{i1}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum x_{i1}y_i \\ \sum x_{i2}y_i \end{pmatrix},$$

where $A = \sum x_{i1}^2 \sum x_{i2}^2 - (\sum x_{i1}x_{i2})^2$. The least squares estimate of $\beta$ is given by the normal equations:

$$\hat\beta = (X'X)^{-1}X'Y = \frac{1}{A} \begin{pmatrix} \sum x_{i2}^2 \sum x_{i1}y_i - \sum x_{i1}x_{i2} \sum x_{i2}y_i \\ -\sum x_{i1}x_{i2} \sum x_{i1}y_i + \sum x_{i1}^2 \sum x_{i2}y_i \end{pmatrix}.$$

Clearly the estimates are unbiased, and $\operatorname{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$:

$$\operatorname{Var}(\hat\beta_1) = \sigma^2 \sum x_{i2}^2 / A, \qquad \operatorname{Var}(\hat\beta_2) = \sigma^2 \sum x_{i1}^2 / A, \qquad \operatorname{Cov}(\hat\beta_1, \hat\beta_2) = -\sigma^2 \sum x_{i1}x_{i2} / A.$$
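The closed-form expressions in Ex2 are easy to get wrong by a sign; as a quick numerical sanity check (not part of the required solution; the simulated data below are arbitrary), they can be compared against lm:

set.seed(1)                              ## arbitrary simulated data for the check
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1.5*x1 - 0.7*x2 + rnorm(n)
A <- sum(x1^2)*sum(x2^2) - sum(x1*x2)^2
b1 <- ( sum(x2^2)*sum(x1*y) - sum(x1*x2)*sum(x2*y)) / A
b2 <- (-sum(x1*x2)*sum(x1*y) + sum(x1^2)*sum(x2*y)) / A
c(b1, b2)                                ## closed-form least squares estimates
coef(lm(y ~ x1 + x2 - 1))                ## no-intercept fit; agrees with (b1, b2)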

Ch3 Ex4
(a) Write the model under consideration as $Y = X\beta + \epsilon$. The statement that $\hat\theta = c'\hat\beta$ is the BLUE for $\theta = c'\beta$ is equivalent to saying that for any linear unbiased estimator $\tilde\theta = b'Y$, $\operatorname{Var}(\hat\theta) \le \operatorname{Var}(\tilde\theta)$. Now, unbiasedness of $\tilde\theta$ gives $E\,b'Y = b'X\beta = c'\beta$ for all $\beta$, i.e. $X'b = c$, and

$$\operatorname{Var}(c'\hat\beta) = \sigma^2 c'(X'X)^{-1}c = \sigma^2 c'(X'X)^{-1}X'X(X'X)^{-1}c, \qquad \operatorname{Var}(b'Y) = \sigma^2 b'b.$$

Thus the BLUE property is equivalent to

$$c'(X'X)^{-1}X'X(X'X)^{-1}c \le b'b \quad \text{for all } b \text{ such that } X'b = c,$$

i.e. to $X(X'X)^{-1}c = \operatorname{argmin}_b\, b'b$ subject to $X'b = c$. This proves the claim.

(b) Let $P = \{X\beta \in \mathbb{R}^n : \beta \in \mathbb{R}^p\}$ be as in the book. Define two further sets, $P_c = \{b \in \mathbb{R}^n : X'b = c\}$ and $P_0 = \{b \in \mathbb{R}^n : X'b = 0\}$. First, note that for any $X\beta \in P$ and any $b \in P_0$ the inner product is 0 (i.e. $(X\beta)'b = \beta'X'b = 0$), so $P_0 \perp P$. The problem in (a) is equivalent to finding a vector $a \in P_c$ such that $a'a$ is minimized. With some geometric understanding, it is enough to find a vector $a$ that is orthogonal to $b - a$ for all $b \in P_c$; since $b - a \in P_0$ whenever $a, b \in P_c$, and $P_0 \perp P$, it suffices that $a$ also lies in $P$, i.e. $a \in P \cap P_c$. Then

$$a = X\tilde\beta \text{ for some } \tilde\beta \in \mathbb{R}^p, \quad a \in P_c \Rightarrow X'a = X'X\tilde\beta = c \Rightarrow \tilde\beta = (X'X)^{-1}c \Rightarrow a = X\tilde\beta = X(X'X)^{-1}c \in P_c.$$

(Optional) Alternatively, the result is easily proved from the fact that $I - H = I - X(X'X)^{-1}X'$ is symmetric and idempotent, hence nonnegative definite, so that $\operatorname{Var}(\tilde\theta) - \operatorname{Var}(\hat\theta) \ge 0$. Or use $\operatorname{Cov}(\hat\theta, \tilde\theta - \hat\theta) = 0$, so that $\operatorname{Var}(\tilde\theta) \ge \operatorname{Var}(\hat\theta)$.

Ch3 Ex6
(a) Following scheme (ii), set

$$Y = X\beta + \epsilon, \qquad \beta = (\beta_1, \dots, \beta_4)', \qquad \epsilon = (\epsilon_1, \dots, \epsilon_{12})', \qquad Y = (y_1, \dots, y_{12})',$$

where $X$ is the $12 \times 4$ design matrix in which each row has 1's in the positions of the two items weighed together (each of the six pairs is weighed twice). Then $X'X = 4I_4 + 2J_4$, and using the identity $(aI_p + bJ_p)^{-1} = a^{-1}I_p - \frac{b}{a(a+pb)}J_p$,

$$\hat\beta = (X'X)^{-1}X'Y = \frac{1}{24}\begin{pmatrix} 5 & -1 & -1 & -1 \\ -1 & 5 & -1 & -1 \\ -1 & -1 & 5 & -1 \\ -1 & -1 & -1 & 5 \end{pmatrix} X'Y.$$

(b) Write $z_i$ for the weighings of scheme (i), while $y_i$ is for scheme (ii). Suppose $\operatorname{sd}(z_i) = \sigma$; then $\operatorname{sd}(y_i) = 2\sigma$. Then

$$\operatorname{Var}(\hat\beta_1^{(i)}) = \tfrac{1}{3}\sigma^2 \qquad \text{and} \qquad \operatorname{Var}(\hat\beta_1^{(ii)}) = \tfrac{5}{24}(2\sigma)^2 = \tfrac{5}{6}\sigma^2.$$

Scheme (i) is superior in this case.

(c) There are now six items and 60 weighings, each weighing having standard deviation $\sigma$.

Scheme (i): weigh each item 10 times. With $Y = (y_1, \dots, y_{60})'$, $X = I_6 \otimes 1_{10}$ and $\beta = (\beta_1, \dots, \beta_6)'$,

$$X'X = (I_6 \otimes 1_{10})'(I_6 \otimes 1_{10}) = 10 I_6, \qquad \hat\beta = (X'X)^{-1}X'Y = \tfrac{1}{10}\Big(\textstyle\sum_{i=1}^{10} y_i, \dots, \sum_{i=51}^{60} y_i\Big)',$$

and $\operatorname{Var}(\hat\beta_1) = \tfrac{1}{10}\sigma^2$.

Scheme (ii): weigh each pair 4 times ($\binom{6}{2} = 15$ pairs, so 60 weighings; each row of $X$ has two 1's). Then

$$X'X = 16I_6 + 4J_6, \qquad (X'X)^{-1} = \tfrac{1}{16}I_6 - \tfrac{4}{16(16 + 4 \cdot 6)}J_6 = \tfrac{1}{16}I_6 - \tfrac{1}{160}J_6,$$

so $\hat\beta = (X'X)^{-1}X'Y$ and $\operatorname{Var}(\hat\beta_1) = \big(\tfrac{1}{16} - \tfrac{1}{160}\big)\sigma^2 = \tfrac{9}{160}\sigma^2$.

Scheme (iii): weigh each triple 3 times ($\binom{6}{3} = 20$ triples, so 60 weighings; each row of $X$ has three 1's). Then

$$X'X = 18I_6 + 12J_6, \qquad (X'X)^{-1} = \tfrac{1}{18}I_6 - \tfrac{12}{18(18 + 12 \cdot 6)}J_6 = \tfrac{1}{18}I_6 - \tfrac{1}{135}J_6,$$

so $\hat\beta = (X'X)^{-1}X'Y$ and $\operatorname{Var}(\hat\beta_1) = \big(\tfrac{1}{18} - \tfrac{1}{135}\big)\sigma^2 = \tfrac{13}{270}\sigma^2$.

Since $\operatorname{Var}^{(iii)}(\hat\beta_1) < \operatorname{Var}^{(ii)}(\hat\beta_1) < \operatorname{Var}^{(i)}(\hat\beta_1)$, scheme (iii) is the best.
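The three variances in (c) can also be verified by building each design matrix explicitly; the following is a small sketch (the helper design() is ad hoc, written only for this check):

design <- function(size, reps, items = 6) {
  sets <- combn(items, size)                       ## all item subsets of the given size
  X <- t(apply(sets, 2, function(s) replace(numeric(items), s, 1)))
  X[rep(seq_len(nrow(X)), each = reps), ]          ## repeat each weighing 'reps' times
}
for (X in list(design(1, 10), design(2, 4), design(3, 3)))
  print(solve(crossprod(X))[1, 1])                 ## 1/10, 9/160, 13/270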

Ch3 Ex18
In general, the question of what the best functional form of the relationship is should be answered carefully, and there is no single correct (or wrong) answer. In this problem, since the strength of yarn would grow in proportion to fiber length, tensile strength of the fibers, and fiber fineness, a multiplicative relationship is natural: $X_1 = a X_2^b X_3^c X_4^d$. This can be fitted by ordinary least squares after a log transformation; the model we regress is

$$\log x_{i1} = \beta_1 + \beta_2 \log x_{i2} + \beta_3 \log x_{i3} + \beta_4 \log x_{i4} + \epsilon_i.$$

Since $\hat\beta_3$ is not significant (0.09126, with standard error 0.39551), after omitting $\log x_3$ the fitted result is:

lm(formula = lx1 ~ lx2 + lx4)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                        **
lx2                                        ...e-05 ***
lx4                                                **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: 0.7175, Adjusted R-squared: 0.6843
F-statistic: 21.58 on 2 and 17 DF, p-value: 2.159e-05

Write $s^2$ for the residual variance estimate of this fit. For (a)-(c), with confidence level $\alpha = 0.05$, number of points $K = 4$, $p = 3$ and $n = 20$, the confidence intervals are of the form $[\hat y \pm t \cdot \operatorname{se}(\hat y)]$, where $t = t_{1-\alpha/2,\, n-3}$ for (a), $t = t_{1-\alpha/(2K),\, n-3}$ for (b, Bonferroni), and $t = \sqrt{3 F_{1-\alpha;\, 3,\, n-3}}$ for (c, Scheffé).

num  ŷ  CI for each  SCI (Bonferroni)  SCI (Scheffé)

If you prefer to fit the model without the logarithm transformation, the following is the answer.

lm(formula = fiber$X1 ~ X2 + X4)
Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
X2                                        ...e-05 ***
X4                                                **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.41 on 17 degrees of freedom
Multiple R-squared: 0.7453, Adjusted R-squared: 0.7153
F-statistic: 24.88 on 2 and 17 DF, p-value: 8.931e-06

and the confidence intervals are given by:

num  ŷ  CI for each  SCI (Bonferroni)  SCI (Scheffé)

(d) Each Bonferroni interval is narrower than the corresponding Scheffé interval; thus the Bonferroni method is preferred here.
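The comparison in (d) comes down to the three critical multipliers, which can be computed directly (a small check, independent of the data; here α = 0.05, K = 4 and 17 residual degrees of freedom):

a <- 0.05; K <- 4; df <- 17
qt(1 - a/2, df)                ## per-interval t, approx. 2.11
qt(1 - a/(2*K), df)            ## Bonferroni t, approx. 2.79
sqrt(3 * qf(1 - a, 3, df))     ## Scheffe multiplier (p = 3), approx. 3.10

Since the Bonferroni multiplier is the smaller of the two simultaneous multipliers here, its intervals are narrower, matching the conclusion in (d).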

Ch3 Ex19
(a)
Call: lm(formula = protein ~ L1 + L2 + L3 + L4 + L5 + L6)
Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                       *
L1
L2
L3                                                **
L4                                                **
L5
L6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 17 degrees of freedom
Multiple R-squared: 0.9821, Adjusted R-squared: 0.9758
F-statistic: 155.9 on 6 and 17 DF, p-value: 6.654e-14

The residual sum of squares is $17 s^2$.

(b) The regression coefficients of (L1, L2, L5, L6) are smaller in magnitude than twice their respective standard errors, so we keep L3 and L4 only.

lm(formula = protein ~ L3 + L4)
Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
L3                                        < 2e-16 ***
L4                                        ...e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.77 on 21 degrees of freedom
Multiple R-squared: 0.9721, Adjusted R-squared: 0.9694
F-statistic: 366.2 on 2 and 21 DF, p-value: < 2.2e-16

The residual sum of squares is $21 s^2$.

(c) Set $H_0$: model in (b) vs $H_1$: model in (a).

Model 1: protein ~ L3 + L4
Model 2: protein ~ L1 + L2 + L3 + L4 + L5 + L6
  Res.Df   RSS   Df   Sum of Sq   F   Pr(>F)

The p-value is 0.0918, not quite small enough to reject $H_0$; the model in (b) is adequate.
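The anova(rm, fm) comparison reported above can also be computed by hand from the two residual sums of squares; a short sketch of the underlying partial F statistic:

## Partial F test by hand; equivalent to anova(rm, fm)
partialF <- function(rm, fm) {
  rss0 <- deviance(rm); df0 <- df.residual(rm)   ## reduced model
  rss1 <- deviance(fm); df1 <- df.residual(fm)   ## full model
  F <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
  c(F = F, p = pf(F, df0 - df1, df1, lower.tail = FALSE))
}
## partialF(rm, fm)   ## p = 0.0918, matching the ANOVA table above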

(d) With confidence level $\alpha = 0.10$, number of points $K = 6$, $p = 3$ and $n = 24$, the prediction intervals are of the form $[\hat y \pm t \cdot \operatorname{se}_{\mathrm{pred}}(\hat y)]$, where $\operatorname{se}_{\mathrm{pred}}(\hat y)^2 = s^2 + \operatorname{se}(\hat y)^2$, and $t = t_{1-\alpha/2,\, n-3}$ for (i), $t = t_{1-\alpha/(2K),\, n-3}$ for (ii, Bonferroni), and $t = \sqrt{6 F_{1-\alpha;\, 6,\, n-3}}$ for (iii, Scheffé).

num  ŷ  PI for each  SPI (Bonferroni)  SPI (Scheffé)

Bonferroni is again preferable (narrower intervals).

Part C

Ch3 Ex7
(a) The given level-$\alpha$ test in the ANOVA model is to reject $H_0$ if $F \ge F_{I-1,\, n-I,\, 1-\alpha}$. The power function is given by

$$\beta(\delta) = P_\delta(\text{test rejects } H_0) = P(F' \ge F_{I-1,\, n-I,\, 1-\alpha}), \qquad F' \sim \text{noncentral } F_{I-1,\, n-I,\, \delta}.$$

To get $\delta$, use the substitution rule:

$$\delta^2 \sigma^2 = \sum_i n_i (\bar y_i - \bar y)^2 \Big|_{\bar y_i = \theta_i,\ \bar y = \bar\theta} = \sum_i n_i (\theta_i - \bar\theta)^2.$$

By the following claim, conclude that the power is maximized when $\delta$ is the greatest possible, i.e. when $\sum n_i (\theta_i - \bar\theta)^2$ subject to $\sum n_i = n$ is maximized.

Claim: Let $F_\delta \sim F_{m,n,\delta}$ for some $m, n$. Then $P(F_{\delta_1} \ge a) \ge P(F_{\delta_2} \ge a)$ whenever $\delta_1 \ge \delta_2 \ge 0$. (Heuristic argument: observe that the mass of $F_{m,n,\delta}$ tends to move to the right as $\delta$ increases.)

Proof: Let $Z \sim N(0,1)$, $X \sim \chi^2_{m-1}$, $Y \sim \chi^2_n$ be mutually independent, and write their densities as $f_Z$, $f_X$ and $f_Y$ respectively. Since $F_\delta$ is distributed as $\big(((Z+\delta)^2 + X)/m\big) \big/ (Y/n)$, if $P((Z+\delta)^2 \ge a)$ decreases as $\delta$ decreases, then

$$P(F_{\delta_1} \ge a) = P\big((Z+\delta_1)^2 + X \ge a' Y\big) = \iint P\big((Z+\delta_1)^2 \ge a' y - x\big) f_X(x) f_Y(y)\, dx\, dy$$
$$\ge \iint P\big((Z+\delta_2)^2 \ge a' y - x\big) f_X(x) f_Y(y)\, dx\, dy = P(F_{\delta_2} \ge a),$$

where $a' = a m / n$. Thus it is left to prove that $P((Z+\delta)^2 \ge a)$ decreases as $\delta$ decreases. Equivalently, $P((Z+\delta)^2 \le a)$ increases as $\delta$ decreases, since the event $\{(Z+\delta)^2 \le a\} = \{-\sqrt a - \delta \le Z \le \sqrt a - \delta\}$ tends to cover more of the neighborhood of 0 (where most of the $N(0,1)$ mass lies) as $\delta$ decreases. One can prove this argument more rigorously.

(b) When $I = 2$ and $n_i = n a_i$ with $a_1 + a_2 = 1$, write

$$f(a_1) = \sum a_i (\theta_i - \bar\theta)^2 = a_1\big(\theta_1 - (a_1\theta_1 + (1-a_1)\theta_2)\big)^2 + (1-a_1)\big(\theta_2 - (a_1\theta_1 + (1-a_1)\theta_2)\big)^2 = a_1(1-a_1)(\theta_1 - \theta_2)^2,$$

so that $\sum n_i(\theta_i - \bar\theta)^2 = n f(a_1)$. The fact that $f(a_1)$ is concave with its peak at $a_1 = 1/2$ proves the claim.

(c) Write $v = \theta_3 - \theta_2 = \theta_2 - \theta_1$. Let $B_j = \sum a_i (\theta_i - \bar\theta)^2$, where $j$ indicates allocation scheme (i) or (ii). Since $\delta^2 = n B_j / \sigma^2$, the scheme with the larger $B_j$ gives the more powerful test. Calculation gives $B_1 = \tfrac{2}{3} v^2$ and $B_2 = v^2$. Thus scheme (ii) gives more power, and if we had $\tfrac{3}{2} n$ instead of $n$ for the total number of samples, the power would be approximately the same (only approximately, because the second degrees of freedom of the $F$ distributions would then differ due to the different sample sizes).
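The power calculation in (a) translates directly into R via the noncentral F distribution; the following is a small sketch (the function name and the numerical inputs are illustrative, not from the problem):

anova_power <- function(theta, n_i, sigma, alpha = 0.05) {
  I <- length(theta); n <- sum(n_i)
  ncp <- sum(n_i * (theta - weighted.mean(theta, n_i))^2) / sigma^2   ## delta^2
  crit <- qf(1 - alpha, I - 1, n - I)
  pf(crit, I - 1, n - I, ncp = ncp, lower.tail = FALSE)               ## power
}
## Ex7(c) flavor: equally spaced means; weighting the extremes raises delta^2
anova_power(c(0, 1, 2), c(20, 20, 20), sigma = 2)   ## equal allocation, B = (2/3) v^2
anova_power(c(0, 1, 2), c(27, 6, 27), sigma = 2)    ## more weight at the extremes: higher power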

Appendix: Sample code for Ex18 and Ex19

#### 3.18
fiber <- read.table("d:/2006-fall/stat664/hw2/fiber2.dat")
names(fiber) <- c("no","X1","X2","X3","X4")
lfiber <- log(fiber)
names(lfiber) <- c("no","lx1","lx2","lx3","lx4")
## center the untransformed predictors
X2 <- fiber$X2 - mean(fiber$X2)
X3 <- fiber$X3 - mean(fiber$X3)
X4 <- fiber$X4 - mean(fiber$X4)
rm <- lm(fiber$X1 ~ X2 + X4)                 ## untransformed model
logrm <- lm(lx1 ~ lx2 + lx4, data = lfiber)  ## log-log model
summary(rm)
summary(logrm)

#### confidence intervals for mean X1
xc <- t(matrix(ncol = 4, nrow = 3, c(
  75 - mean(fiber$X2), 70 - mean(fiber$X3), 45 - mean(fiber$X4),
  80 - mean(fiber$X2), 70 - mean(fiber$X3), 45 - mean(fiber$X4),
  80 - mean(fiber$X2), 75 - mean(fiber$X3), 42 - mean(fiber$X4),
  65 - mean(fiber$X2), 80 - mean(fiber$X3), 40 - mean(fiber$X4))))
xc <- data.frame(xc[, c(1, 3)])   ## keep the centered X2 and X4 columns
names(xc) <- c("X2", "X4")
a <- 0.05   ## 1-a confidence intervals
K <- 4      ## number of points
p <- predict(rm, xc, se.fit = TRUE)
CI_each <- cbind(p$fit - qt(1 - a/2, rm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/2, rm$df.residual) * p$se.fit)
CI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), rm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/(2*K), rm$df.residual) * p$se.fit)
CI_Sche <- cbind(p$fit - sqrt(3 * qf(1 - a, 3, rm$df.residual)) * p$se.fit,
                 p$fit + sqrt(3 * qf(1 - a, 3, rm$df.residual)) * p$se.fit)

#### confidence intervals for mean log X1
xc <- t(matrix(ncol = 4, nrow = 3, c(75, 70, 45,
                                     80, 70, 45,
                                     80, 75, 42,
                                     65, 80, 40)))
logxc <- log(xc)
xc <- data.frame(logxc[, c(1, 3)])
names(xc) <- c("lx2", "lx4")   ## must match the variable names used in logrm
a <- 0.05
K <- 4
p <- predict(logrm, xc, se.fit = TRUE)
CI_each <- cbind(p$fit - qt(1 - a/2, logrm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/2, logrm$df.residual) * p$se.fit)
CI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), logrm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/(2*K), logrm$df.residual) * p$se.fit)
CI_Sche <- cbind(p$fit - sqrt(3 * qf(1 - a, 3, logrm$df.residual)) * p$se.fit,
                 p$fit + sqrt(3 * qf(1 - a, 3, logrm$df.residual)) * p$se.fit)

#### 3.19
gw <- read.table("d:/2007-fall/664 - solution/hw2/protein.dat")
names(gw) <- c("no","protein","L1","L2","L3","L4","L5","L6")
attach(gw)
#par(mfrow=c(2,3))

#### (a) fit the linear model with all covariates
fm <- lm(protein ~ L1 + L2 + L3 + L4 + L5 + L6)
summary(fm)

#### (b)
## (L1, L2, L5, L6) have regression coefficients smaller in magnitude
## than twice their standard errors, respectively
rm <- lm(protein ~ L3 + L4)
summary(rm)

#### (c)
anova(rm, fm)   ## H0: rm, H1: fm
## p-value is 0.0918; not quite small enough to reject H0 (rm),
## so the model in (b) (rm) is good

#### prediction intervals for rm in (b)
a <- 0.10   ## 1-a prediction intervals
K <- 6      ## number of points
pt <- read.table("d:/2007-fall/664 - solution/hw2/reflect.dat")
pt_rm <- pt[, 4:5]
names(pt_rm) <- c("L3", "L4")

#### (d)-i
p <- predict(rm, data.frame(pt_rm), se.fit = TRUE)
sep <- p$residual.scale * sqrt(1 + (p$se.fit / p$residual.scale)^2)  ## se for prediction
p$fit
PI_each <- cbind(p$fit - qt(1 - a/2, rm$df.residual) * sep,
                 p$fit + qt(1 - a/2, rm$df.residual) * sep)
PI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), rm$df.residual) * sep,
                 p$fit + qt(1 - a/(2*K), rm$df.residual) * sep)
PI_Sche <- cbind(p$fit - sqrt(6 * qf(1 - a, 6, rm$df.residual)) * sep,
                 p$fit + sqrt(6 * qf(1 - a, 6, rm$df.residual)) * sep)
