CAS MA575 Linear Models
Boston University, Fall 2013
Midterm Exam (Correction)
Instructor: Cedric Ginestet
Date: 22 Oct 2013. Maximum Score: 200pts.
Please Note: You will only be graded on work and answers found in your Blue Book(s).

1. Simple Linear Regression [60pts]

In this section, we consider a standard linear regression model with intercept on pairs of data points $(y_i, x_i)$, such that the mean and variance functions are
$$E[Y_i \mid X = x_i] = \beta_0 + \beta_1 x_i, \qquad \text{Var}[Y_i \mid X = x_i] = \sigma^2,$$
for every $i = 1, \ldots, n$.

1. [10pts] In the following list, identify the quantities that are treated as not random: $Y_i$, $x_i$, $\beta_1$, $\hat{\beta}_1$, $e_i$, $\hat{e}_i$, $\hat{y}_i$, $n$.

Here, $x_i$, $\beta_1$ and $n$ are the only quantities that are unambiguously non-random. Three points should be deducted if any of them is not included. Moreover, $Y_i$ is random, so three points should also be deducted if it is included. However, no points should be deducted for including $\hat{\beta}_1$, $e_i$, $\hat{e}_i$ and $\hat{y}_i$, which are written in lower case, and could therefore be read as non-random.

2. [10pts] What is the relationship between the OLS estimator, $\hat{\beta}_1 := \text{SXY}/\text{SXX}$, and the estimated correlation coefficient $r_{xy}$ between the $y_i$'s and the $x_i$'s?

It suffices to show that
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\widehat{\text{Cov}}[X, Y]}{\widehat{\text{Var}}[X]} = r_{xy}\,\frac{\hat{\sigma}_y}{\hat{\sigma}_x},$$
using the definition of the correlation coefficient,
$$r_{xy} := \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x \hat{\sigma}_y} = \frac{\widehat{\text{Cov}}[X, Y]}{\sqrt{\widehat{\text{Var}}[X]\,\widehat{\text{Var}}[Y]}}.$$
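This identity is easy to verify numerically. Below is a minimal sketch in R on simulated data (the simulated variables are illustrative, not part of the exam):

## Check that the OLS slope equals r_xy * sd(y)/sd(x)
set.seed(575)
x <- rnorm(50)
y <- 2 + 1.5 * x + rnorm(50)

beta1_ols <- cov(x, y) / var(x)           # SXY/SXX
beta1_cor <- cor(x, y) * sd(y) / sd(x)    # r_xy * sigma_y-hat / sigma_x-hat

all.equal(beta1_ols, beta1_cor)                   # TRUE
all.equal(beta1_ols, unname(coef(lm(y ~ x))[2]))  # TRUE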

3. [20pts] How is the $F$-test testing the null hypothesis, $H_0: E[Y \mid X = x] = \beta_0$, versus the alternative hypothesis, $H_1: E[Y \mid X = x] = \beta_0 + \beta_1 x$, related to the $R^2$ for this model?

The $F$-statistic and $R^2$ have identical numerators, but different denominators:
$$F = \frac{\text{SSreg}}{\text{RSS}(\hat{\beta}_0, \hat{\beta}_1)/(n-2)} = \frac{\text{SSreg}}{\hat{\sigma}^2}, \qquad R^2 = \frac{\text{SSreg}}{\text{RSS}(\hat{\beta}_0)} = \frac{\text{SSreg}}{\text{SYY}}.$$

4. [20pts] Show that the $F$-statistic for testing the null hypothesis $H_0: E[Y \mid X = x] = \beta_0$ is equal to the square of the $t$-statistic for testing $H_0: \beta_1 = 0$, using the fact that
$$\text{SSreg} = \text{RSS}_1(\hat{\beta}_0) - \text{RSS}_2(\hat{\beta}_0, \hat{\beta}_1) = \frac{\text{SXY}^2}{\text{SXX}}.$$
Taking the square of the $t$-statistic for $\hat{\beta}_1$, we have
$$t_1^2 = \left(\frac{\hat{\beta}_1}{\widehat{\text{se}}(\hat{\beta}_1)}\right)^2 = \frac{\hat{\beta}_1^2}{\hat{\sigma}^2/\text{SXX}} = \frac{\text{SXY}^2/\text{SXX}^2}{\hat{\sigma}^2/\text{SXX}} = \frac{\text{SXY}^2}{\hat{\sigma}^2\,\text{SXX}} = F.$$
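The identity $F = t_1^2$ can also be confirmed directly from any simple-regression fit in R; a minimal sketch on simulated data:

## The overall F-statistic equals the squared t-statistic for the slope
set.seed(575)
x <- rnorm(50)
y <- 2 + 1.5 * x + rnorm(50)
fit <- lm(y ~ x)

t_slope <- coef(summary(fit))["x", "t value"]
F_model <- summary(fit)$fstatistic["value"]
all.equal(unname(F_model), t_slope^2)             # TRUE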

2. Multiple Linear Regression [40pts]

Here, we consider a multiple regression model with intercept on pairs of data points $(y_i, \mathbf{x}_i)$, where $\mathbf{x}_i := [x_{i0}, \ldots, x_{ip}]^T$, such that the mean and variance functions are respectively
$$E[\mathbf{y} \mid X] = X\boldsymbol{\beta}, \qquad \text{Var}[\mathbf{y} \mid X] = \sigma^2 I_n.$$
Throughout this section, the design matrix $X$ of order $(n \times p')$ is assumed to be full-rank, with $p' := p + 1$.

1. [20pts] Compute the variance of the random vector of OLS estimators, $\hat{\boldsymbol{\beta}} := (X^T X)^{-1} X^T \mathbf{y}$.

The variance of this vector of estimators can be derived using the now familiar formula for the covariance matrix of a linear transformation of a random vector, $\text{Var}[A\mathbf{y}] = A\,\text{Var}[\mathbf{y}]\,A^T$, and recalling that $(X^T X)^{-1}$ is symmetric (equal to its own transpose), such that we obtain
$$\begin{aligned}
\text{Var}[\hat{\boldsymbol{\beta}} \mid X] &= \text{Var}[(X^T X)^{-1} X^T \mathbf{y} \mid X] = (X^T X)^{-1} X^T \,\text{Var}[\mathbf{y} \mid X]\, \big((X^T X)^{-1} X^T\big)^T \\
&= (X^T X)^{-1} X^T (\sigma^2 I_n) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1} (X^T X) (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}.
\end{aligned}$$

2. [20pts] Show that the vector of residuals $\hat{\mathbf{e}}$ and the vector of fitted values $\hat{\mathbf{y}}$ are orthogonal to each other, in the sense that $\hat{\mathbf{e}}^T \hat{\mathbf{y}} = 0$.

Using the hat matrix, $H := X(X^T X)^{-1} X^T$, we have $\hat{\mathbf{e}} := (I - H)\mathbf{y}$ and $\hat{\mathbf{y}} := H\mathbf{y}$, and therefore, since $H$ is symmetric and idempotent,
$$\hat{\mathbf{e}}^T \hat{\mathbf{y}} = [(I - H)\mathbf{y}]^T H\mathbf{y} = \mathbf{y}^T (I - H)^T H\mathbf{y} = \mathbf{y}^T (I - H) H\mathbf{y} = \mathbf{y}^T (H - HH)\mathbf{y} = \mathbf{y}^T (H - H)\mathbf{y} = 0.$$
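Both results of this section can be checked numerically. The sketch below, again on simulated data, compares $\hat{\sigma}^2 (X^T X)^{-1}$ with the covariance matrix returned by vcov(), and verifies that the residuals and fitted values are orthogonal:

## Verify Var[beta-hat | X] = sigma2-hat * (X'X)^{-1} and e-hat' y-hat = 0
set.seed(575)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X      <- model.matrix(fit)                   # n x p' design matrix
sigma2 <- sum(resid(fit)^2) / (n - ncol(X))   # RSS / (n - p')
all.equal(unname(sigma2 * solve(crossprod(X))), unname(vcov(fit)))  # TRUE

sum(resid(fit) * fitted(fit))                 # zero, up to rounding error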

3. Maximum Likelihood [40pts]

In this section, the multiple linear regression model is identical to the one in the previous section. In addition, we also assume that
$$y_i \overset{\text{ind}}{\sim} N(\mathbf{x}_i^T \boldsymbol{\beta}, \sigma^2), \qquad i = 1, \ldots, n.$$
This gives the following likelihood function,
$$L(\boldsymbol{\beta}, \sigma^2; \mathbf{y}, X) := \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2\right\}. \qquad (1)$$

1. [20pts] Show that the OLS and MLE estimators for the vector $\boldsymbol{\beta}$ are identical, such that
$$\hat{\boldsymbol{\beta}}_{\text{MLE}} := \operatorname*{argmax}_{\boldsymbol{\beta} \in \mathbb{R}^{p'}} L(\boldsymbol{\beta}, \sigma^2; \mathbf{y}, X) = \operatorname*{argmin}_{\boldsymbol{\beta} \in \mathbb{R}^{p'}} \text{RSS}(\boldsymbol{\beta}) =: \hat{\boldsymbol{\beta}}_{\text{OLS}}.$$

First, we take the log of the likelihood function,
$$\log L(\boldsymbol{\beta}, \sigma^2; \mathbf{y}, X) = \sum_{i=1}^n \log\left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2\right\}\right) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2.$$
The first term does not depend on $\boldsymbol{\beta}$, and may therefore be ignored when maximizing over $\boldsymbol{\beta}$. In matrix form,
$$\log L(\boldsymbol{\beta}, \sigma^2; \mathbf{y}, X) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - X\boldsymbol{\beta})^T(\mathbf{y} - X\boldsymbol{\beta}).$$
Clearly, the second term is proportional to $-\text{RSS}(\boldsymbol{\beta})$, so maximizing the log-likelihood over $\boldsymbol{\beta}$ is equivalent to minimizing the RSS, and the two estimators coincide.

2. [20pts] Given that you already know $\hat{\boldsymbol{\beta}}_{\text{MLE}}$, maximize the likelihood function in equation (1) with respect to $\sigma^2$.

We have seen that we can exploit the orthogonality of $\boldsymbol{\beta}$ and $\sigma^2$ in a Normal model, in order to maximize the likelihood by selecting these two sets of parameters independently of each other. Thus, once we have chosen $\hat{\boldsymbol{\beta}}_{\text{MLE}}$, it suffices to select
$$\hat{\sigma}^2_{\text{MLE}} := \operatorname*{argmax}_{\sigma^2 \in \mathbb{R}^+} \log L(\hat{\boldsymbol{\beta}}_{\text{MLE}}, \sigma^2; \mathbf{y}, X),$$
which gives the first-order condition
$$\frac{\partial}{\partial \sigma^2}\left(\frac{n}{2}\log(2\pi) + \frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mathbf{x}_i^T \hat{\boldsymbol{\beta}}_{\text{MLE}})^2\right) = 0.$$
Then, straightforwardly, we have
$$\frac{n}{2}\frac{1}{\sigma^2} = \frac{1}{2\sigma^4}\text{RSS}(\hat{\boldsymbol{\beta}}_{\text{MLE}}) \;\Longrightarrow\; \frac{n}{\sigma^2} = \frac{1}{\sigma^4}\text{RSS}(\hat{\boldsymbol{\beta}}_{\text{MLE}}) \;\Longrightarrow\; \hat{\sigma}^2 = \frac{1}{n}\text{RSS}(\hat{\boldsymbol{\beta}}_{\text{MLE}}).$$
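As a numerical check, minimizing the negative log-likelihood with optim() recovers the OLS coefficients, and the resulting estimate of $\sigma^2$ is $\text{RSS}/n$ rather than the unbiased $\text{RSS}/(n - p')$. A minimal sketch on simulated data:

## MLE for the Normal linear model agrees with OLS; sigma2_MLE = RSS/n
set.seed(575)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)

negloglik <- function(par) {
  beta   <- par[1:2]
  sigma2 <- exp(par[3])    # log-parametrization keeps sigma2 > 0
  -sum(dnorm(y, mean = X %*% beta, sd = sqrt(sigma2), log = TRUE))
}
opt <- optim(c(0, 0, 0), negloglik)

fit <- lm(y ~ x)
rbind(mle = opt$par[1:2], ols = unname(coef(fit)))       # nearly identical
c(mle = exp(opt$par[3]), rss_n = sum(resid(fit)^2) / n)  # nearly identical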

4. Data Analysis [60pts]

You have been asked to re-analyze a data set originally published by Ericksen, Kadane and Tukey in 1989. These authors were interested in the 1980 Census of Population and Housing. The data set represents 66 geographical areas in the United States. In each of these areas, three variables were collected during the census:

Crime: rate of serious crimes per 1000 inhabitants in that area.
Poverty: percentage of inhabitants living below the poverty line.
Language: percentage of inhabitants having difficulty speaking or writing English.

The purpose of this particular study is to predict crime on the basis of the two other variables. A matrix scatterplot showing the marginal distributions of these variables and their pairwise relationships is provided in Figure 1. Moreover, the correlation matrix between these variables is given below:

             crime   poverty  language
crime    1.0000000 0.3691061 0.5116460
poverty  0.3691061 1.0000000 0.1515658
language 0.5116460 0.1515658 1.0000000

[Figure 1. Scatterplot matrix for the three variables in the Census of Population and Housing.]

1. [30pts] We fit the following regression model in R,

    Crime ~ Poverty,                (2)

and obtain this summary output:

Call:
lm(formula = crime ~ poverty, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-50.449 -13.583  -3.182  16.691  62.857

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  35.4471     9.1527   3.873 0.000255 ***
poverty       2.0503     0.6453   3.177 0.002290 **

Residual standard error: 23.31 on 64 degrees of freedom
Multiple R-squared: 0.1362,    Adjusted R-squared: 0.1227
F-statistic: 10.09 on 1 and 64 DF,  p-value: 0.00229

Next, we fit the following multiple regression model in R,

    Crime ~ Poverty + Language,     (3)

and produce a new summary output for this model. Can you anticipate how (a) the estimate for β, (b) the t-statistic, and (c) the p-value for poverty will differ, and explain why they will differ?

Here is the summary output in R, after including Language in the model:

Call:
lm(formula = crime ~ poverty + language, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-38.188 -10.638  -1.675   8.426  72.874

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  31.6251     8.0542   3.927 0.000216 ***
poverty       1.6576     0.5713   2.901 0.005114 **
language      4.7310     1.0433   4.535 2.64e-05 ***

Residual standard error: 20.4 on 63 degrees of freedom
Multiple R-squared: 0.3488,    Adjusted R-squared: 0.3281
F-statistic: 16.87 on 2 and 63 DF,  p-value: 1.356e-06

Since poverty and language are only weakly correlated (r is approximately 0.15), the estimate of β for poverty will not be substantially affected by the introduction of this new variable. However, because language has the highest correlation with crime, it will account for a substantial amount of variability in the response, crime, thereby slightly decreasing the share of that variability attributed to poverty. Altogether, (a) [10pts] the estimate of β for poverty will slightly decrease, (b) [10pts] its t-value will also slightly decrease, and (c) [10pts] its p-value will consequently increase.
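This comparison can be made explicit in R by extracting the poverty row of each coefficient table (assuming the same data frame, data, used in the outputs above):

## Compare the poverty coefficient across the two nested models
fit2 <- lm(crime ~ poverty, data = data)
fit3 <- lm(crime ~ poverty + language, data = data)
rbind(model_2 = coef(summary(fit2))["poverty", ],
      model_3 = coef(summary(fit3))["poverty", ])
## The estimate drops from about 2.05 to 1.66, the t-value from 3.18
## to 2.90, and the p-value rises from 0.0023 to 0.0051, as anticipated.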

2. [30pts] We now consider the ANOVA table for the model described in equation (3):

Analysis of Variance Table

Response: crime
          Df  Sum Sq Mean Sq F value    Pr(>F)
poverty    1  5486.6  5486.6  13.180  0.000569 ***
language   1  8559.6  8559.6  20.562 2.645e-05 ***
Residuals 63 26225.5   416.3

This model is compared to another one in which we have changed the ordering of the variables, such that we fit

    Crime ~ Language + Poverty,     (4)

and produce a new ANOVA table for this model. Can you anticipate how (a) the sum of squares, (b) the F-statistic, and (c) the p-value for language will change, and justify your answers? That is, which of these quantities is likely to increase or decrease?

Here is the ANOVA output in R, after changing the order of the variables, as shown in equation (4):

Analysis of Variance Table

Response: crime
          Df  Sum Sq Mean Sq F value    Pr(>F)
language   1 10542.4 10542.4 25.3254 4.306e-06 ***
poverty    1  3503.8  3503.8  8.4171  0.005114 **
Residuals 63 26225.5   416.3

Since language and poverty are correlated, albeit weakly, changing the ordering of the variables in the ANOVA table will modify the R output. In particular, some of the variance previously explained by poverty will now be accounted for by language, which enters the model first. Therefore, (a) [10pts] the sum of squares for language will slightly increase, (b) [10pts] its F-statistic will also slightly increase, and (c) [10pts] its p-value will consequently decrease.
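The order dependence of these sequential (Type I) sums of squares is easy to demonstrate directly (again assuming the data frame, data, from above):

## Sequential sums of squares depend on the order in which terms enter
anova(lm(crime ~ poverty + language, data = data))  # language adjusted for poverty
anova(lm(crime ~ language + poverty, data = data))  # language enters first
## The residual line (63 df, SS = 26225.5) is identical in both tables;
## only the split of the regression sum of squares between the two
## predictors changes with the ordering.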