STOR 664 Homework 2 Solution

Part A

Exercise (Faraway book) Ch2 Ex1

> data(teengamb)
> attach(teengamb)
> tgl <- lm(gamble ~ sex + status + income + verbal)
> summary(tgl)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  22.55565   17.19680   1.312   0.1968
sex         -22.11833    8.21111  -2.694   0.0101 *
status        0.05223    0.28111   0.186   0.8535
income        4.96198    1.02539   4.839 1.79e-05 ***
verbal       -2.95949    2.17215  -1.362   0.1803
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.69 on 42 degrees of freedom
Multiple R-squared: 0.5267,  Adjusted R-squared: 0.4816
F-statistic: 11.69 on 4 and 42 DF,  p-value: 1.815e-06

(a) The percentage of variation in the response explained by the predictors is given by the Multiple R-squared, which is 52.67%.

(b) The 24th case has the largest residual:

> which.max(tgl$residuals)
24

(c) The mean of the residuals is essentially 0 (of order 1e-17, i.e. zero up to floating-point error) and the median is -1.451.

> mean(tgl$residuals)
> median(tgl$residuals)

(d) The correlation of the residuals with the fitted values is 0, up to floating-point error, as least squares guarantees.

> cor(tgl$residuals, tgl$fitted.values)

(e) The correlation of the residuals with income is likewise 0 up to floating-point error, since income is one of the predictors.

> cor(tgl$residuals, income)

(f) Based on the summary, the fitted model can be written explicitly as:

  gamble = 22.556 - 22.118 sex + 0.052 status + 4.962 income - 2.959 verbal

If all the predictors except sex are held constant, the difference in predicted expenditure on gambling between male (sex = 0) and female (sex = 1) equals the regression coefficient of sex, i.e. -22.118. Therefore, whenever sex changes from male (sex = 0) to female (sex = 1), the predicted value of gamble decreases by 22.118. In other words, according to the current regression model, a female spends $22.12 less than a comparable (i.e., other predictors being held constant) male on gambling.
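The interpretation in (f) can be checked numerically. A minimal sketch on synthetic data (simulated here, NOT the teengamb data, so the coefficients below are made up): for two cases that are identical except for a 0/1 predictor, the gap in predictions equals that predictor's regression coefficient.

```r
# Synthetic illustration: the coefficient of a 0/1 predictor is exactly the
# difference in predictions between two otherwise identical cases.
set.seed(1)
n <- 47
sex    <- rbinom(n, 1, 0.5)                 # hypothetical 0/1 predictor
income <- runif(n, 0, 15)                   # hypothetical covariate
gamble <- 20 - 22 * sex + 5 * income + rnorm(n, sd = 10)
fit <- lm(gamble ~ sex + income)

new_m <- data.frame(sex = 0, income = 8)    # "male", income held fixed
new_f <- data.frame(sex = 1, income = 8)    # "female", same income
gap <- predict(fit, new_m) - predict(fit, new_f)

# the prediction gap reproduces minus the fitted sex coefficient
isTRUE(all.equal(unname(gap), -unname(coef(fit)["sex"])))
```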
Part B

Ch3 Ex2

The model is Y = Xβ + ε, where

  Y = (y_1, y_2, ..., y_n)',  X = [x_{11} x_{12}; x_{21} x_{22}; ...; x_{n1} x_{n2}],  β = (β_1, β_2)',  ε = (ε_1, ε_2, ..., ε_n)'.

Direct calculation gives

  (X'X)^{-1} = (1/A) [ Σ x_{i2}^2   -Σ x_{i1}x_{i2} ; -Σ x_{i1}x_{i2}   Σ x_{i1}^2 ],   X'Y = ( Σ x_{i1}y_i , Σ x_{i2}y_i )',

where A = Σ x_{i1}^2 Σ x_{i2}^2 - (Σ x_{i1}x_{i2})^2. The least squares estimate of β is given by the normal equations:

  β̂ = (X'X)^{-1} X'Y = (1/A) ( Σ x_{i2}^2 Σ x_{i1}y_i - Σ x_{i1}x_{i2} Σ x_{i2}y_i ,  -Σ x_{i1}x_{i2} Σ x_{i1}y_i + Σ x_{i1}^2 Σ x_{i2}y_i )'.

Clearly the estimates are unbiased, and Cov(β̂) = σ^2 (X'X)^{-1}:

  Var(β̂_1) = σ^2 Σ x_{i2}^2 / A,  Var(β̂_2) = σ^2 Σ x_{i1}^2 / A,  Cov(β̂_1, β̂_2) = -σ^2 Σ x_{i1}x_{i2} / A.

Ch3 Ex4

(a) Write the considered model as Y = Xβ + ε. The statement that θ̂ = c'β̂ is the BLUE for θ = c'β is equivalent to saying that for any linear unbiased estimator θ̃ = b'Y, Var(θ̂) ≤ Var(θ̃). Now, unbiasedness of θ̃ gives E[b'Y] = b'Xβ = c'β for all β, so X'b = c, and

  Var(c'β̂) = σ^2 c'(X'X)^{-1}c = σ^2 c'(X'X)^{-1}X'X(X'X)^{-1}c,   Var(b'Y) = σ^2 b'b.

Thus the statement is also equivalent to

  c'(X'X)^{-1}X'X(X'X)^{-1}c ≤ b'b for all b such that X'b = c,

i.e.

  X(X'X)^{-1}c = argmin_b b'b subject to X'b = c.

This proves the claim.

(b) Let P = {Xβ ∈ R^n : β ∈ R^p} be as in the book. Define two more sets: P_c = {b ∈ R^n : X'b = c} and P_0 = {b ∈ R^n : X'b = 0}. First, note that for any Xβ ∈ P and any b ∈ P_0, the inner product is 0 (i.e. (Xβ)'b = β'X'b = 0), so P_0 ⊥ P. The problem in (a) is equivalent to finding a vector a ∈ P_c such that a'a is minimized. With some geometric understanding, it is enough to find a vector a ∈ P_c which is orthogonal to b - a for all b ∈ P_c. Since b - a ∈ P_0 and P_0 ⊥ P, such an a can be found in P, i.e. a ∈ P ∩ P_c:

  a = Xβ̃ for some β̃ ∈ R^p, and a ∈ P_c ⇒ X'a = X'Xβ̃ = c ⇒ β̃ = (X'X)^{-1}c ⇒ a = Xβ̃ = X(X'X)^{-1}c ∈ P_c.

(Optional) Alternatively, the claim is easily proved from the fact that I - H = I - X(X'X)^{-1}X' is symmetric and idempotent, hence nonnegative definite, so that Var(θ̃) - Var(θ̂) ≥ 0. Or use Cov(θ̂, θ̃ - θ̂) = 0, so that Var(θ̃) = Var(θ̂) + Var(θ̃ - θ̂) ≥ Var(θ̂).

Ch3 Ex6

(a) Follow scheme (ii): weigh the four items in pairs, each of the six pairs twice, for 12 weighings in all. Set X to be the 12 x 4 design matrix whose rows are the 0/1 indicators of the pairs, and β = (β_1, ..., β_4)', ε = (ε_1, ..., ε_12)', Y = (y_1, ..., y_12)'.
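The closed form for β̂ in Ex2 is easy to verify numerically. A sketch on synthetic data (sizes and coefficients are arbitrary): the explicit solution of the normal equations matches lm() with the intercept suppressed, which is the two-regressor model of this exercise.

```r
# Verify the explicit two-regressor least squares formulas against lm().
set.seed(2)
n  <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1.5 * x1 - 0.7 * x2 + rnorm(n)

A  <- sum(x1^2) * sum(x2^2) - sum(x1 * x2)^2        # determinant of X'X
b1 <- ( sum(x2^2) * sum(x1 * y) - sum(x1 * x2) * sum(x2 * y)) / A
b2 <- (-sum(x1 * x2) * sum(x1 * y) + sum(x1^2) * sum(x2 * y)) / A

fit <- lm(y ~ 0 + x1 + x2)   # no intercept, as in the exercise
rbind(closed_form = c(b1, b2), lm = unname(coef(fit)))
```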
Then X'X = 4 I_4 + 2 J_4, and

  β̂ = (X'X)^{-1}X'Y = (1/24)(6 I_4 - J_4) X'Y.

(b) Write z_i for the weighings of scheme (i), while y_i is for scheme (ii). Suppose sd(z_i) = σ; then sd(y_i) = 2σ. Then

  Var(β̂^(i)) = (1/3) σ^2  and  Var(β̂^(ii)) = (5/24)(2σ)^2 = (5/6) σ^2.

Scheme (i) is superior in this case.

(c) Scheme (i): weigh each item 10 times. Stacking the 60 weighings,

  X = I_6 ⊗ 1_10 (each item's indicator repeated 10 times),  X'X = 10 I_6,
  β̂ = (X'X)^{-1}X'Y = (1/10) ( Σ_{i=1}^{10} y_i, ..., Σ_{i=51}^{60} y_i )',  and  Var(β̂_1) = (1/10) σ^2.

Scheme (ii): weigh each of the 15 pairs 4 times (60 weighings). With X the matrix of pair indicators,

  X'X = 16 I_6 + 4 J_6,  (X'X)^{-1} = (1/16) I_6 - [4 / (16(16 + 4·6))] J_6,
  β̂ = (X'X)^{-1}X'Y,  Var(β̂_1) = (1/16 - 1/160) σ^2 = (9/160) σ^2.

Scheme (iii): weigh each of the 20 triples 3 times (60 weighings). With X the matrix of triple indicators,

  X'X = 18 I_6 + 12 J_6,  (X'X)^{-1} = (1/18) I_6 - [12 / (18(18 + 12·6))] J_6,
  β̂ = (X'X)^{-1}X'Y,  Var(β̂_1) = (1/18 - 1/135) σ^2 = (13/270) σ^2.

Since Var(iii) < Var(ii) < Var(i), scheme (iii) is the best.

Ch3 Ex18

In general, the question "what is the best functional form of the relationship?" should be answered carefully, and there is no single correct (or wrong) answer. In this problem, since the strength of yarn would grow in proportion to fiber length, tensile strength of the fibers, and fiber fineness, their relationship would be of multiplicative form, X_1 = a X_2^b X_3^c X_4^d. Fit this by simple linear regression on the logarithms; the model we regress is

  log x_{i1} = β_1 + β_2 log x_{i2} + β_3 log x_{i3} + β_4 log x_{i4} + ε_i.

Since β̂_3 is not significant (0.09126 (0.39551)), after omitting log x_3 the fitted result is:

lm(formula = lx1 ~ lx2 + lx4)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                     **
lx2                                       e-05 ***
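The Var(β̂_1) comparison in Ex6(c) can be reproduced directly from the three design matrices. A sketch; the helper ind_rows below is my own, not from the original code. It builds one 0/1 row per weighing, indicating which of the 6 items are on the scale.

```r
# Reproduce Var(beta_1)/sigma^2 = [(X'X)^{-1}]_{11} for the three schemes.
ind_rows <- function(sets, times) {
  X <- t(apply(sets, 2, function(s) { r <- numeric(6); r[s] <- 1; r }))
  X[rep(seq_len(nrow(X)), each = times), ]
}
X1 <- diag(6)[rep(1:6, each = 10), ]   # (i)   each item 10 times
X2 <- ind_rows(combn(6, 2), 4)         # (ii)  each of 15 pairs 4 times
X3 <- ind_rows(combn(6, 3), 3)         # (iii) each of 20 triples 3 times

v <- sapply(list(X1, X2, X3),
            function(X) solve(t(X) %*% X)[1, 1])   # Var(beta_1)/sigma^2
round(v, 5)   # 0.10000, 0.05625 (= 9/160), 0.04815 (= 13/270)
```

Each design uses 60 weighings, so the comparison is fair; the ordering v[3] < v[2] < v[1] matches the conclusion that scheme (iii) is best.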
lx4                                             **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 17 degrees of freedom
Multiple R-Squared: 0.7175,  Adjusted R-squared: 0.6843
F-statistic: 21.58 on 2 and 17 DF,  p-value: 2.159e-05

Write s^2 for the residual mean square of this fit. For (a)-(c), with confidence level α = 0.05, number of points K = 4, and p = 3 regression coefficients (n = 20 observations), the confidence intervals are of the form [ŷ ± t se(ŷ)], where t = t_{1-α/2, n-3} for (a), t = t_{1-α/(2K), n-3} for (b, Bonferroni), and t = sqrt(3 F_{1-α, 3, n-3}) for (c, Scheffé).

num  ŷ  CI for each  sci(Bonferroni)  sci(Scheffé)

If you want to fit the model without the logarithm transformation, the following should be the answer.

lm(formula = fiber$x1 ~ X2 + X4)

Residuals:
     Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            < 2e-16 ***
X2                                        e-05 ***
X4                                              **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.41 on 17 degrees of freedom
Multiple R-Squared: 0.7453,  Adjusted R-squared: 0.7153
F-statistic: 24.88 on 2 and 17 DF,  p-value: 8.931e-06

and the confidence intervals are given by:

num  ŷ  CI for each  sci(Bonferroni)  sci(Scheffé)

(d) Each Bonferroni interval is narrower than the corresponding Scheffé interval. Thus, the Bonferroni method is preferred.

Ch3 Ex19

(a)
Call:
lm(formula = protein ~ L1 + L2 + L3 + L4 + L5 + L6)

Residuals:
     Min      1Q  Median      3Q     Max
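The Bonferroni-versus-Scheffé conclusion in (d) reduces to comparing the half-width multipliers, which can be computed directly. A sketch using the α, K and residual degrees of freedom quoted above:

```r
# Interval half-width multipliers: per-interval t, Bonferroni t, Scheffe sqrt(pF).
a <- 0.05; K <- 4; df <- 17
t_each <- qt(1 - a/2, df)              # one interval at a time
t_bonf <- qt(1 - a/(2 * K), df)        # Bonferroni, K simultaneous intervals
t_sche <- sqrt(3 * qf(1 - a, 3, df))   # Scheffe, p = 3 parameters
c(each = t_each, bonferroni = t_bonf, scheffe = t_sche)
```

Since t_bonf < t_sche here, every Bonferroni interval is narrower than the corresponding Scheffé interval, as stated in (d).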
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                     *
L1
L2
L3                                              **
L4                                              **
L5
L6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:       on 17 degrees of freedom
Multiple R-Squared: 0.9821,  Adjusted R-squared: 0.9758
F-statistic: 155.9 on 6 and 17 DF,  p-value: 6.654e-14

The residual sum of squares is 17 s^2 (the residual degrees of freedom times the squared residual standard error).

(b) L1, L2, L5 and L6 have regression coefficients smaller in magnitude than twice their respective standard errors. Thus we shall keep L3 and L4 only.

lm(formula = protein ~ L3 + L4)

Residuals:
     Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                            < 2e-16 ***
L3                                     < 2e-16 ***
L4                                        e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.77 on 21 degrees of freedom
Multiple R-Squared: 0.9721,  Adjusted R-squared: 0.9694
F-statistic: 366.2 on 2 and 21 DF,  p-value: < 2.2e-16

The residual sum of squares is 21 s^2 for this fit.

(c) Set H_0: model in (b) vs H_1: model in (a).

Model 1: protein ~ L3 + L4
Model 2: protein ~ L1 + L2 + L3 + L4 + L5 + L6
  Res.Df   RSS   Df   Sum of Sq   F   Pr(>F)

The p-value is 0.0918, not quite small enough to reject H_0. The model in (b) is good.

(d) With confidence level α = 0.10, number of points K = 6, p = 3 and n = 24, the prediction intervals are of the form [ŷ ± t se(ŷ)], where t = t_{1-α/2, n-3} for (i), t = t_{1-α/(2K), n-3} for (ii, Bonferroni), and t = sqrt(6 F_{1-α, 6, n-3}) for (iii, Scheffé).
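The nested-model F test in (c) can be sketched from first principles. On synthetic data (simulated below, NOT the protein data), the F statistic built from the two residual sums of squares agrees with anova():

```r
# Partial F test: F = ((RSS0 - RSS1)/q) / (RSS1/df1) for nested models.
set.seed(3)
n <- 24
L <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("L", 1:6)))
protein <- 10 + 2 * L[, 3] - 1.5 * L[, 4] + rnorm(n)
d <- data.frame(protein, L)

small <- lm(protein ~ L3 + L4, data = d)                       # H0
full  <- lm(protein ~ L1 + L2 + L3 + L4 + L5 + L6, data = d)   # H1
rss0 <- sum(resid(small)^2); rss1 <- sum(resid(full)^2)
Fstat <- ((rss0 - rss1) / 4) / (rss1 / full$df.residual)       # 4 extra terms

isTRUE(all.equal(Fstat, anova(small, full)$F[2]))
```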
num  ŷ  PI for each  spi(Bonferroni)  spi(Scheffé)

Bonferroni is preferable here as well.

Part C

Ch3 Ex7

(a) The given level-α test in the ANOVA model rejects H_0 if F ≥ F_{I-1, n-I, 1-α}. The power function is given by

  β(δ) = P_δ(test rejects H_0) = P(F' ≥ F_{I-1, n-I, 1-α}),  where F' ~ noncentral F_{I-1, n-I, δ}.

To get δ, use the substitution rule:

  δ^2 σ^2 = Σ n_i (ȳ_i - ȳ)^2 evaluated at ȳ_i = θ_i, ȳ = θ̄, i.e. δ^2 σ^2 = Σ n_i (θ_i - θ̄)^2.

Now, by the following claim, conclude that the power is maximized when δ is the greatest possible, i.e. when Σ n_i (θ_i - θ̄)^2 subject to Σ n_i = n is maximized.

Claim: Let F_δ ~ F_{m,n,δ} for some m, n. Then P(F_{δ_1} ≥ a) ≥ P(F_{δ_2} ≥ a) if δ_1 ≥ δ_2 ≥ 0. (Heuristic argument: the mass of F_{m,n,δ} tends to move to the right as δ increases.)

Proof: Let Z ~ N(0,1), X ~ χ^2_{m-1}, Y ~ χ^2_n, independent of each other, with densities f_Z, f_X and f_Y respectively. Then, provided P((Z+δ)^2 ≥ c) decreases as δ decreases,

  P(F_{δ_1} ≥ a) = P( ((Z+δ_1)^2 + X)/m ≥ a Y/n ) = P( (Z+δ_1)^2 + X ≥ a* Y )
                = ∫∫ P((Z+δ_1)^2 ≥ a* y - x) f_X(x) f_Y(y) dx dy
                ≥ ∫∫ P((Z+δ_2)^2 ≥ a* y - x) f_X(x) f_Y(y) dx dy = P(F_{δ_2} ≥ a),

where a* = a m/n. Thus it is left to prove that P((Z+δ)^2 ≥ c) decreases as δ decreases, or equivalently that P((Z+δ)^2 ≤ c) increases as δ decreases: since {(Z+δ)^2 ≤ c} = {-√c - δ ≤ Z ≤ √c - δ}, this interval tends to cover more of the neighborhood of 0 (where most of the N(0,1) mass lies) as δ decreases. One can prove this argument more rigorously.

(b) When I = 2, n_i = n a_i with a_1 + a_2 = 1, write

  f(a_1) = Σ a_i (θ_i - θ̄)^2 = a_1 (θ_1 - (a_1 θ_1 + (1-a_1) θ_2))^2 + (1-a_1)(θ_2 - (a_1 θ_1 + (1-a_1) θ_2))^2 = a_1 (1-a_1)(θ_1 - θ_2)^2.

The fact that f(a_1) is concave with its peak at a_1 = 1/2 proves the claim.

(c) Write v = θ_3 - θ_2 = θ_2 - θ_1. Let B_j = Σ a_i (θ_i - θ̄)^2, where j indicates allocation scheme (i) or (ii). Since δ^2 = n B_j / σ^2, the scheme (j) with larger B_j gives the more powerful test (j = i, ii). Calculation gives B_i = (2/3) v^2 and B_ii = v^2. Thus, scheme (ii) gives more power, and if we had (3/2) n instead of n for the total number of samples under scheme (i), the power would be approximately the same (note that the second degrees-of-freedom of the F distributions would then differ due to the different sample sizes).
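The power calculation in Ex7(a) can be done in R with the noncentral F distribution. A sketch (the choice of df1 = 2, df2 = 27 is an arbitrary example, corresponding to I = 3 groups and n = 30), illustrating that power is increasing in the noncentrality δ^2 = Σ n_i (θ_i - θ̄)^2 / σ^2:

```r
# Power of the level-alpha F test as a function of the noncentrality parameter.
power_F <- function(ncp, df1, df2, alpha = 0.05) {
  crit <- qf(1 - alpha, df1, df2)     # rejection threshold under H0
  1 - pf(crit, df1, df2, ncp = ncp)   # P(noncentral F exceeds the threshold)
}
pw <- sapply(c(0, 1, 4, 9), power_F, df1 = 2, df2 = 27)
pw   # increases with the noncentrality; equals alpha at ncp = 0
```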
Appendix: Sample code for Ex18 and Ex19

#### 3.18
fiber <- read.table("d:/2006-fall/stat664/hw2/fiber2.dat")
names(fiber) <- c("no","x1","x2","x3","x4")
lfiber <- log(fiber)
names(lfiber) <- c("no","lx1","lx2","lx3","lx4")
X2 <- fiber$x2 - mean(fiber$x2)
X3 <- fiber$x3 - mean(fiber$x3)
X4 <- fiber$x4 - mean(fiber$x4)
rm <- lm(fiber$x1 ~ X2 + X4)
logrm <- lm(lx1 ~ lx2 + lx4, data = lfiber)
summary(rm)
summary(logrm)

#### confidence interval for mean X1
xc <- t(matrix(ncol = 4, nrow = 3,
               c(75 - mean(fiber$x2), 70 - mean(fiber$x3), 45 - mean(fiber$x4),
                 80 - mean(fiber$x2), 70 - mean(fiber$x3), 45 - mean(fiber$x4),
                 80 - mean(fiber$x2), 75 - mean(fiber$x3), 42 - mean(fiber$x4),
                 65 - mean(fiber$x2), 80 - mean(fiber$x3), 40 - mean(fiber$x4))))
xc <- data.frame(xc[, c(1, 3)])
names(xc) <- c("X2", "X4")
a <- 0.05 ## 1-a confidence intervals
K <- 4    ## number of points
p <- predict(rm, xc, se.fit = T) ## prediction
PI_each <- cbind(p$fit - qt(1 - a/2, rm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/2, rm$df.residual) * p$se.fit)
PI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), rm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/(2*K), rm$df.residual) * p$se.fit)
PI_Sche <- cbind(p$fit - sqrt(3 * qf(1 - a, 3, rm$df.residual)) * p$se.fit,
                 p$fit + sqrt(3 * qf(1 - a, 3, rm$df.residual)) * p$se.fit)

#### confidence interval for log X1
xc <- t(matrix(ncol = 4, nrow = 3, c(75, 70, 45,
                                     80, 70, 45,
                                     80, 75, 42,
                                     65, 80, 40)))
logxc <- log(xc)
xc <- data.frame(logxc[, c(1, 3)])
names(xc) <- c("lx2", "lx4")
a <- 0.05 ## 1-a confidence intervals
K <- 4    ## number of points
p <- predict(logrm, xc, se.fit = T) ## prediction
PI_each <- cbind(p$fit - qt(1 - a/2, logrm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/2, logrm$df.residual) * p$se.fit)
PI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), logrm$df.residual) * p$se.fit,
                 p$fit + qt(1 - a/(2*K), logrm$df.residual) * p$se.fit)
PI_Sche <- cbind(p$fit - sqrt(3 * qf(1 - a, 3, logrm$df.residual)) * p$se.fit,
                 p$fit + sqrt(3 * qf(1 - a, 3, logrm$df.residual)) * p$se.fit)
#### 3.19
gw <- read.table("d:/2007-fall/664 - solution/hw2/protein.dat")
names(gw) <- c("no","protein","L1","L2","L3","L4","L5","L6")
attach(gw)
# par(mfrow=c(2,3))

#### (a) Fit linear model with all covariates
fm <- lm(protein ~ L1 + L2 + L3 + L4 + L5 + L6)
summary(fm)

#### (b)
## (L1, L2, L5, L6) have regression coefficients smaller in magnitude
## than twice their standard errors, respectively
rm <- lm(protein ~ L3 + L4)
summary(rm)

#### (c)
anova(rm, fm)
## H0 : rm
## H1 : fm
## p-value is 0.0918; not quite small enough to reject H0 (rm)
## model in (b) (rm) is good

#### prediction intervals for rm in (b)
a <- 0.10 ## 1-a prediction intervals
K <- 6    ## number of points
pt <- read.table("d:/2007-fall/664 - solution/hw2/reflect.dat")
pt_rm <- pt[, 4:5]
names(pt_rm) <- c("L3", "L4")

#### (d)-i
p <- predict(rm, data.frame(pt_rm), se.fit = T) ## prediction
sep <- p$residual.scale * sqrt(1 + (p$se.fit / p$residual.scale)^2) ## se for PI
p$fit
PI_each <- cbind(p$fit - qt(1 - a/2, rm$df.residual) * sep,
                 p$fit + qt(1 - a/2, rm$df.residual) * sep)
PI_Bonf <- cbind(p$fit - qt(1 - a/(2*K), rm$df.residual) * sep,
                 p$fit + qt(1 - a/(2*K), rm$df.residual) * sep)
PI_Sche <- cbind(p$fit - sqrt(6 * qf(1 - a, 6, rm$df.residual)) * sep,
                 p$fit + sqrt(6 * qf(1 - a, 6, rm$df.residual)) * sep)
More informationSTAT 215 Confidence and Prediction Intervals in Regression
STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:
More informationCHAPTER 2: Assumptions and Properties of Ordinary Least Squares, and Inference in the Linear Regression Model
CHAPTER 2: Assumptions and Properties of Ordinary Least Squares, and Inference in the Linear Regression Model Prof. Alan Wan 1 / 57 Table of contents 1. Assumptions in the Linear Regression Model 2 / 57
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationLecture 1: Linear Models and Applications
Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation
More informationCoefficient of Determination
Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationLinear Models Review
Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign
More informationGeneral Linear Statistical Models - Part III
General Linear Statistical Models - Part III Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Interaction Models Lets examine two models involving Weight and Domestic in the cars93 dataset.
More informationMultiple Linear Regression
Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there
More informationMarcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design
Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling ETH Zürich, October 22, 2012 1 What is Regression?
More informationIntroduction and Single Predictor Regression. Correlation
Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation
More informationThe Distribution of F
The Distribution of F It can be shown that F = SS Treat/(t 1) SS E /(N t) F t 1,N t,λ a noncentral F-distribution with t 1 and N t degrees of freedom and noncentrality parameter λ = t i=1 n i(µ i µ) 2
More informationAMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression
AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number
More informationRegression Analysis lab 3. 1 Multiple linear regression. 1.1 Import data. 1.2 Scatterplot matrix
Regression Analysis lab 3 1 Multiple linear regression 1.1 Import data delivery
More informationBIOS 2083 Linear Models c Abdus S. Wahed
Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationSTATISTICS 479 Exam II (100 points)
Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the
More informationDiagnostics and Transformations Part 2
Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics
More information