MATH 644: Regression Analysis Methods


FINAL EXAM, Fall 2012

INSTRUCTIONS TO STUDENTS:

1. This test contains SIX questions. It comprises ELEVEN printed pages.
2. Answer ALL questions for a total of 100 marks.
3. This is an open-book and open-note test; you may use any materials you have.
4. Write your name on the front of your answer booklet and on any additional sheets you write on.

1. True/False. Read each statement and write T (True) or F (False) at the beginning. Note: the standard least squares estimator is applied whenever needed in the statements. Define the standard multiple linear regression model as follows:

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$.

(a) A coefficient of determination of zero indicates that $X$ and $Y$ are not related.

(b) A high coefficient of determination indicates that the estimated regression line is a good fit.

(c) If two multiple linear regression models have the same mean squared error (MSE), we prefer the model with fewer variables.

(d) For any F-test associated with the multiple linear regression model, we can find an equivalent t-test.

(e) In a standard multiple linear regression model, the variance of the prediction becomes larger as $X_j$ deviates from the sample mean $\bar{X}_j$.

(f) In a standard multiple linear regression model, define the residuals to be $e_i = Y_i - \hat{Y}_i$; then $\sum_{i=1}^n e_i X_{ij} = 0$ for all $j = 1, \ldots, p-1$.

(g) In a standard multiple linear regression model, the prediction for a new observation with predictors $X^{(\text{new})} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p)$ is $\bar{Y} = n^{-1} \sum_{i=1}^n Y_i$, where $\bar{X}_j = n^{-1} \sum_{i=1}^n X_{ji}$, $j = 1, \ldots, p$, is the sample mean.

2. Yes/No. Suppose you have four possible predictor variables $X_1$, $X_2$, $X_3$, and $X_4$ that could be used in a regression analysis. You run a forward selection procedure, and the variables are entered as follows:

Step 1: $X_2$
Step 2: $X_4$
Step 3: $X_1$
Step 4: $X_3$

In other words, after Step 1 the model is $E\{Y\} = \beta_0 + \beta_1 X_2$; after Step 2 the model is $E\{Y\} = \beta_0 + \beta_1 X_2 + \beta_2 X_4$.

And so on. You also run an all-subsets regression analysis using $R^2$ as the criterion for the best model for each possible number of predictors. Would the same models result from this analysis as from the forward selection procedure? In other words, would all-subsets regression definitely identify the following as the best models for 1, 2, 3, and 4 variables? Choose Yes or No in each case.

(a) For $\beta_0$ + 1 variable, the best model would be $E\{Y\} = \beta_0 + \beta_1 X_2$.

(b) For $\beta_0$ + 2 variables, the best model would be $E\{Y\} = \beta_0 + \beta_1 X_2 + \beta_2 X_4$.

(c) For $\beta_0$ + 3 variables, the best model would be $E\{Y\} = \beta_0 + \beta_1 X_2 + \beta_2 X_4 + \beta_3 X_1$.

(d) For $\beta_0$ + 4 variables, the best model would be $E\{Y\} = \beta_0 + \beta_1 X_2 + \beta_2 X_4 + \beta_3 X_1 + \beta_4 X_3$.

3. Given data pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, we fit the simple linear regression $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. Suppose, in addition, that the $\varepsilon_i$ are independent and normally distributed with mean 0 and variance $\sigma^2$. For each of the following three scenarios, how are $b_0$, $b_1$, $\hat{\sigma}^2$, $R^2$, and the t-test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ affected? Answer accordingly and give the necessary explanations.

(a) $X_i$ is replaced by $2X_i$ and $Y_i$ remains the same.

(b) $Y_i$ is replaced by $2Y_i$ and $X_i$ remains the same.

(c) $X_i$ is replaced by $2X_i$ and $Y_i$ is replaced by $2Y_i$.
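Remark (not part of the original question): the effects asked about in Question 3 can be checked empirically. Below is a minimal R sketch using simulated data; the sample size, true coefficients, and noise level are arbitrary assumptions made only for illustration.

# Simulate a simple linear regression data set (arbitrary choices).
set.seed(1)
n <- 50
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)

# Fit the original model and the three rescaled versions in (a)-(c).
fit   <- lm(Y ~ X)
fit_a <- lm(Y ~ I(2 * X))        # (a) X replaced by 2X
fit_b <- lm(I(2 * Y) ~ X)        # (b) Y replaced by 2Y
fit_c <- lm(I(2 * Y) ~ I(2 * X)) # (c) both rescaled

# Compare b0, b1, the estimate of sigma^2, R^2, and the slope t-statistic.
for (f in list(fit, fit_a, fit_b, fit_c)) {
  s <- summary(f)
  print(c(coef(f), sigma2 = s$sigma^2, R2 = s$r.squared,
          t_slope = s$coefficients[2, "t value"]))
}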

(Extra Space for Answers)

4. Suppose we have the following two multiple linear regression models:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$  (1)

and

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$,  (2)

where the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$. We first perform the analysis for model (1) in R:

> fit12 = lm(Y ~ X1 + X2)
> summary(fit12)

Call:
lm(formula = Y ~ X1 + X2)

Residuals:
     Min       1Q   Median       3Q      Max
-1.72610 -0.71385  0.03204  0.62244  3.04545

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.002956   0.094429  -0.031    0.975
X1           2.171693   0.108222  20.067   <2e-16
X2           2.949736   0.098936  29.814   <2e-16

Residual standard error: 0.9428 on 97 degrees of freedom
Multiple R-squared: 0.938, Adjusted R-squared: ???
F-statistic: 733.5 on 2 and 97 DF, p-value: < 2.2e-16

(a) Calculate the adjusted R-squared value from the output.

(b) Calculate the SSR (regression sum of squares) from the output.

(c) Perform the hypothesis test $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1$: not both $\beta_1$ and $\beta_2$ equal zero. Write down the test method and calculate the test statistic.
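Remark (not part of the original question): parts (a)-(c) can be double-checked from the printed summary alone. A minimal R sketch follows; here n = 100 is inferred from the 97 residual degrees of freedom plus the 3 estimated coefficients.

# Quantities read off the summary of model (1).
n  <- 100     # 97 residual df + 3 estimated coefficients
p  <- 3       # intercept, X1, X2
R2 <- 0.938   # Multiple R-squared
s  <- 0.9428  # residual standard error

SSE    <- s^2 * (n - p)                      # error sum of squares
SSR    <- SSE * R2 / (1 - R2)                # since R^2 = SSR / (SSR + SSE)
R2_adj <- 1 - (1 - R2) * (n - 1) / (n - p)   # adjusted R-squared, part (a)
Fstar  <- (SSR / (p - 1)) / (SSE / (n - p))  # F statistic for part (c)

As a consistency check, Fstar should reproduce the printed F-statistic of 733.5 up to rounding.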

Now, we perform the analysis for model (2) in R:

> fit = lm(Y ~ X1 + X2 + X3)
> summary(fit)

Call:
lm(formula = Y ~ X1 + X2 + X3)

Residuals:
     Min       1Q   Median       3Q      Max
-1.72110 -0.71459  0.02617  0.62992  3.04839

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.002549   0.095098  -0.027    0.979
X1           2.184818   0.217837  10.030   <2e-16
X2           2.968544   0.288143  10.302   <2e-16
X3          -0.063097   0.907274  -0.070    0.945

Residual standard error: 0.9476 on 96 degrees of freedom
Multiple R-squared: 0.938, Adjusted R-squared: 0.936
F-statistic: 484 on 3 and 96 DF, p-value: < 2.2e-16

(d) Calculate the extra sum of squares $SSR(X_3 \mid X_1, X_2)$ and the coefficient of partial determination $R^2_{Y3|12}$.

(e) Compare model (1) and model (2): which one do you prefer, and why?
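Remark (not part of the original question): if the raw data were available, part (d) could be verified directly from the two fits above. A minimal R sketch, reusing the fit12 and fit objects:

# Extra sum of squares SSR(X3 | X1, X2) = SSE(X1, X2) - SSE(X1, X2, X3).
SSE_reduced <- sum(resid(fit12)^2)  # model (1): Y ~ X1 + X2
SSE_full    <- sum(resid(fit)^2)    # model (2): Y ~ X1 + X2 + X3
SSR_extra   <- SSE_reduced - SSE_full

# Coefficient of partial determination R^2_{Y3|12}: the fraction of the
# variation left unexplained by X1 and X2 that X3 then accounts for.
R2_partial <- SSR_extra / SSE_reduced

# The nested-model anova reports the same decomposition and partial F-test.
anova(fit12, fit)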

(Extra Space for Answers)

5. An analyst decided to fit the multiple regression model

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1} X_{i2} + \beta_5 X_{i1} X_{i3} + \beta_6 X_{i2} X_{i3} + \varepsilon_i$,

where $\varepsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, 20$. To reduce correlation between the covariates in this model, the centered variables $x_{i1} = X_{i1} - \bar{X}_1 = X_{i1} - 25.305$, $x_{i2} = X_{i2} - \bar{X}_2 = X_{i2} - 51.170$, and $x_{i3} = X_{i3} - \bar{X}_3 = X_{i3} - 27.620$ are used. The fitted regression equation is given by

$\hat{Y} = 20.53 + 3.43 x_1 - 2.095 x_2 - 1.616 x_3 + 0.00888 x_1 x_2 - 0.08479 x_1 x_3 + 0.09042 x_2 x_3$, MSE = 6.745,

where the true model is $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1} x_{i2} + \beta_5 x_{i1} x_{i3} + \beta_6 x_{i2} x_{i3} + \varepsilon_i$.

One would like to test whether the interaction terms between the three predictor variables should be included in the regression model. Use the above information and the following table to conduct an F-test at the 5% significance level. Clearly state the null and alternative hypotheses, the test statistic, the decision rule, and the conclusion.

Variable      Extra Sum of Squares
$x_1$         $SSR(x_1) = 352.270$
$x_2$         $SSR(x_2 \mid x_1) = 33.169$
$x_3$         $SSR(x_3 \mid x_1, x_2) = 11.546$
$x_1 x_2$     $SSR(x_1 x_2 \mid x_1, x_2, x_3) = 1.496$
$x_1 x_3$     $SSR(x_1 x_3 \mid x_1, x_2, x_3, x_1 x_2) = 2.704$
$x_2 x_3$     $SSR(x_2 x_3 \mid x_1, x_2, x_3, x_1 x_2, x_1 x_3) = 6.515$

$F(0.975, 3, 13) = 4.3472$, $F(0.95, 3, 13) = 3.4105$, $F(0.975, 7, 19) = 3.0509$, $F(0.95, 7, 19) = 2.5435$, $F(0.95, 4, 13) = 3.1791$, $F(0.975, 4, 13) = 3.9959$, $F(0.95, 4, 19) = 2.8951$, $F(0.975, 4, 19) = 3.5587$, $F(0.975, 3, 19) = 3.9034$, $F(0.95, 3, 19) = 3.1274$.
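Remark (not part of the original question): the arithmetic of the test statistic can be checked in R. A minimal sketch, taking the interaction extra sums of squares and the MSE directly from the information above; stating the hypotheses, decision rule, and conclusion is still left to you.

# Sequential extra sums of squares for the three interaction terms; their
# sum telescopes to SSR(x1x2, x1x3, x2x3 | x1, x2, x3).
ssr_inter <- c(1.496, 2.704, 6.515)
q   <- length(ssr_inter)  # 3 restrictions: beta4 = beta5 = beta6 = 0
MSE <- 6.745              # full-model MSE with 20 - 7 = 13 df

Fstar <- (sum(ssr_inter) / q) / MSE  # F* for H0: no interaction terms
Fcrit <- qf(0.95, q, 13)             # matches the tabled F(0.95, 3, 13) = 3.4105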

(Extra Space for Answers)

6. Suppose we have the following two multiple linear regression models:

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$  (3)

and

$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \beta_p X_{i,p} + \varepsilon_i$,  (4)

where the $\varepsilon_i$ are i.i.d. $N(0, \sigma^2)$.

(a) Denote the $R^2$ (the coefficient of multiple determination) for the two models (3) and (4) by $R^2(3)$ and $R^2(4)$. Is it true that $R^2(3) \leq R^2(4)$ always holds? If yes, prove it. If not, give a counterexample. (If you are providing a counterexample, write down the design matrix $X$ and the response vector $Y$ explicitly; the reasoning for $R^2(3) > R^2(4)$ is required.)

(b) Denote the $R_a^2$ (the adjusted coefficient of multiple determination) for the two models by $R_a^2(3)$ and $R_a^2(4)$. Is it true that $R_a^2(3) \leq R_a^2(4)$ always holds? If yes, prove it. If not, give a counterexample. (If you are providing a counterexample, write down the design matrix $X$ and the response vector $Y$ explicitly; the reasoning for $R_a^2(3) > R_a^2(4)$ is required.)
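Remark (not part of the original question): a small R simulation illustrates the contrast between the two parts; the seed, sample size, and coefficients are arbitrary assumptions. Adding a pure-noise predictor plays the role of moving from model (3) to model (4).

# Compare R^2 and adjusted R^2 before and after adding a useless predictor.
set.seed(2)
n  <- 30
X1 <- rnorm(n)
Xp <- rnorm(n)                 # pure noise, unrelated to Y
Y  <- 1 + 2 * X1 + rnorm(n)

s3 <- summary(lm(Y ~ X1))       # model (3)
s4 <- summary(lm(Y ~ X1 + Xp))  # model (4)

c(s3$r.squared, s4$r.squared)          # R^2 of the larger model is never smaller
c(s3$adj.r.squared, s4$adj.r.squared)  # adjusted R^2 need not increase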

(Extra Space for Answers)