MS&E 226: Small Data. In-Class Midterm Examination Solutions. October 20, 2015.

PROBLEM 1. Alice uses ordinary least squares to fit a linear regression model on a dataset containing outcome data Y and covariates X (assume all covariates are numeric). She shares her results with Bob. Bob wants to replicate the results, and also uses ordinary least squares to fit a linear regression model, but does so after standardizing each column of data (the outcome as well as all covariates). When they compare the sum of squared residuals, they notice that the two values are wildly different. This catches Alice and Bob by surprise, because they were taught that standardizing doesn't change anything for linear regression. Why was the sum of squared residuals so different in their respective fitted models?

(a) Because the intercept is not scaled.
(b) Because the outcome is measured on a different scale.
(c) Because they should have compared the square root of the sum of squared residuals, instead of just the sum of squared residuals.
(d) One of them must have made a coding mistake, because the sum of squared residuals should have been the same.

Solution: (b). When the outcomes are not measured in the same units, the sums of squared residuals cannot be compared directly: standardizing the outcome rescales every residual, and therefore the sum of squared residuals, even though the fit is otherwise equivalent.

PROBLEM 2. Suppose we have data with covariates X and outcome Y, and we build a linear regression model of Y against the covariates X. Let A be the resulting $R^2$ value. Now suppose we add new covariates to X. However, assume these covariates are just random noise (e.g., they might be i.i.d. $N(0, 1)$ random variables), without any relationship to X or Y. We now build another linear regression model using all the original and new covariates, and compute the resulting $R^2$ value; let this be B. What can you say about how A and B are related to each other?

(a) $A \ge B$.
(b) $A = B$.
(c) $A \le B$.

Solution: (c). $R^2$ never decreases when we add new covariates, because the larger model can always reproduce the smaller model's fit by setting the new coefficients to zero; with continuous noise covariates it will typically increase slightly.
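
A quick R sanity check of the Problem 2 answer, using simulated data (all variable names here are illustrative, not from the exam): adding pure-noise covariates does not lower $R^2$.

# Adding pure-noise covariates never lowers R^2.
set.seed(226)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

fit_small <- lm(y ~ x)                    # original model: R^2 = A
noise <- matrix(rnorm(n * 5), nrow = n)   # five i.i.d. N(0, 1) noise covariates
fit_big <- lm(y ~ x + noise)              # augmented model: R^2 = B

summary(fit_small)$r.squared              # A
summary(fit_big)$r.squared                # B, slightly larger than A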

PROBLEM 3. You are given data with covariates X and outcome Y, and fit three different models: one by ordinary least squares (OLS), one by ridge regression with $\lambda > 0$, and one by lasso with $\lambda > 0$. How does the sum of squared residuals compare across these methods?

(a) The sum of squared residuals is smallest for OLS.
(b) The sum of squared residuals is smallest for ridge regression.
(c) The sum of squared residuals is smallest for lasso.

Solution: (a). OLS minimizes the sum of squared residuals directly, while ridge and lasso minimize the sum of squared residuals plus a penalty term. Any move away from the OLS coefficients can only increase the sum of squared residuals, so OLS must have the lowest value.

PROBLEM 4. Suppose we are given data with covariates X and outcome Y, and fit a linear regression model by ordinary least squares; let $\hat\beta$ be the resulting coefficient vector. We send the data to a friend, so that he can also analyze the data. By mistake, before running his analysis, our friend duplicates a few (but not all) rows of the data. He then computes the ordinary least squares solution, and finds a vector of coefficients $\tilde\beta$. Are $\hat\beta$ and $\tilde\beta$ equal?

(a) Yes, they are always equal.
(b) They are equal, but only if the data was centered.
(c) They are equal, but only if the data contained more rows of data than covariates.
(d) In general they are not equal.

Solution: (d). Duplicating rows will generally change the coefficients. The duplicated dataset can be viewed as a weighted linear regression on the original data: the duplicated rows appear multiple times in the sum of squared residuals that OLS minimizes, so they are effectively treated as more important.
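
A minimal R sketch of the Problem 4 scenario on simulated data (variable names are illustrative): duplicating rows changes the OLS coefficients, and the result matches a weighted least squares fit on the original data.

# Duplicating rows changes OLS, and is equivalent to a weighted fit.
set.seed(1)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)

d_dup <- rbind(d, d[1:5, ])             # the friend accidentally duplicates 5 rows

coef(lm(y ~ x, data = d))               # Alice's beta-hat
coef(lm(y ~ x, data = d_dup))           # the friend's coefficients: generally different

# Equivalently, give the duplicated rows weight 2 in a weighted fit:
w <- c(rep(2, 5), rep(1, n - 5))
coef(lm(y ~ x, data = d, weights = w))  # matches the fit on d_dup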

PROBLEM 5. Suppose you are given n data points $(X_i, Y_i)$, $i = 1, \ldots, n$. You fit a simple linear regression model of Y on X. Suppose the resulting regression line is $y = \hat\beta_0 + \hat\beta_1 x$. You also fit a simple linear regression model of X against Y. Suppose the resulting regression line is $x = \tilde\beta_0 + \tilde\beta_1 y$. Which of the following are true?

(a) The intercepts are equal: $\hat\beta_0 = \tilde\beta_0$.
(b) The slopes are inverses of each other: $\hat\beta_1 = 1/\tilde\beta_1$.
(c) Both (a) and (b).
(d) Neither (a) nor (b).

Solution: (d). The slope from regressing Y on X is $\hat\beta_1 = r_{XY}\, s_Y/s_X$, while the slope from regressing X on Y is $\tilde\beta_1 = r_{XY}\, s_X/s_Y$, where $r_{XY}$ is the sample correlation and $s_X$, $s_Y$ are the sample standard deviations. Their product is $r_{XY}^2$, which is not 1 in general, so (b) is false. The intercepts are $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$ and $\tilde\beta_0 = \bar{X} - \tilde\beta_1 \bar{Y}$, which are also not equal in general, so (a) is false.
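
A quick R check of the Problem 5 answer on simulated data (names illustrative): the two fitted slopes are not inverses of each other; their product is the squared sample correlation.

# The product of the two slopes is the squared correlation, not 1.
set.seed(5)
n <- 200
x <- rnorm(n)
y <- 3 + 0.5 * x + rnorm(n)

b_yx <- coef(lm(y ~ x))[2]    # slope of Y on X
b_xy <- coef(lm(x ~ y))[2]    # slope of X on Y

b_yx * b_xy                   # equals cor(x, y)^2, well below 1 here
cor(x, y)^2
1 / b_xy                      # not equal to b_yx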

PROBLEM 6. You have a dataset consisting of heights (measured in inches) and weights (measured in pounds) of n individuals. You fit a linear regression model of log(weight) on height by ordinary least squares, and find the following fitted model:

log(weight) = -2.5 + 0.02 * height

What is the meaning of the coefficient on height?

(a) A 1% increase in height will cause a 2% increase in weight.
(b) A 1% increase in height will cause a 0.02 pound increase in weight.
(c) A 1 inch increase in height will cause a 0.02 pound increase in weight.
(d) A 1 inch increase in height will cause a 2% increase in weight.

Solution: (d). On the original scale, $\widehat{w} = e^{-2.5 + 0.02(h+1)} = e^{-2.5 + 0.02h}\, e^{0.02} \approx 1.02\, e^{-2.5 + 0.02h}$, so a 1 inch increase in height multiplies the predicted weight by $e^{0.02} \approx 1.02$, i.e., increases it by about 2%.

PROBLEM 7. Consider the kidiq dataset we have seen in class. The first few rows look like:

  kid_score mom_hs    mom_iq
1        65      1 121.11753
2        98      1  89.36188
3        85      1 115.44316
4        83      1  99.44964
5       115      1  92.74571
6        98      0 107.90184
...

We fit two different regression models. First, we fit the following model using all the data:

kid_score ~ 1 + mom_hs + mom_iq + mom_hs:mom_iq

Let A be the coefficient on mom_iq in the resulting model. Next, we keep only those rows of the data where mom_hs is zero, and we fit the following model using only this data:

kid_score ~ 1 + mom_iq

Let B be the coefficient on mom_iq in the resulting model. How do A and B compare to each other?

(a) A > B.
(b) B > A.
(c) A = B.

Solution: (c). Because of the interaction term, the coefficient on mom_iq in the full model measures the slope of mom_iq for moms with mom_hs = 0, which is exactly what the second model estimates from the mom_hs = 0 rows. Hence the coefficients are the same.
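
The Problem 7 claim can be checked in R. The real kidiq data is not reproduced here, so the sketch below simulates a stand-in data frame with the same column names (all generating parameters are made up for illustration); the equality A = B holds regardless.

# With an interaction term, the mom_iq coefficient from the full model equals
# the mom_iq coefficient from fitting only the mom_hs == 0 rows.
set.seed(7)
n <- 400
mom_hs <- rbinom(n, 1, 0.7)
mom_iq <- rnorm(n, 100, 15)
kid_score <- 20 + 10 * mom_hs + 0.6 * mom_iq - 0.3 * mom_hs * mom_iq + rnorm(n, 0, 18)
kids <- data.frame(kid_score, mom_hs, mom_iq)

full_fit   <- lm(kid_score ~ 1 + mom_hs + mom_iq + mom_hs:mom_iq, data = kids)
subset_fit <- lm(kid_score ~ 1 + mom_iq, data = kids[kids$mom_hs == 0, ])

coef(full_fit)["mom_iq"]      # A
coef(subset_fit)["mom_iq"]    # B, identical to A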

PROBLEM 8. Suppose you are given n data points $(X_i, Y_i)$, $i = 1, \ldots, n$. You fit a regression model $Y_i \approx \hat\beta_0 + \hat\beta_1 X_i$ by ordinary least squares. Let $\hat\beta$ be the resulting vector of coefficients. Define the respective sample means as follows:

$\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i; \quad \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i.$

Which of the following is true?

(a) $\bar{Y} = \hat\beta_0$.
(b) $\bar{X} = \hat\beta_0$.
(c) $\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}$.
(d) None of the above.

Solution: (c). The OLS regression line always passes through the point of means $(\bar{X}, \bar{Y})$. In particular, $\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}$.

PROBLEM 9. You are given a dataset that you split into a training set A and test set B. You train a linear regression model (call this Model 1) on the training set A, and then compute its mean squared prediction error $E_1$ on the test set B. After you do so, inspection of the results suggests that you might have been better off including an interaction term in the original regression; so you go back and train a new model (call this Model 2) on the training set A with this interaction term added, and test it again on your test set B. Let $E_2$ be the resulting mean squared prediction error. Tomorrow a colleague is going to give you a new test set C, coming from the same data generating process as your original data. Which of the following are true in general?

(a) $E_1$ is unbiased as an estimate of the prediction error of Model 1 on test set C.
(b) $E_2$ is unbiased as an estimate of the prediction error of Model 2 on test set C.
(c) Both (a) and (b).
(d) Neither (a) nor (b).

Solution: (a). $E_1$ is an unbiased estimate, because test set B was used for the first time to evaluate Model 1, which was built without looking at B. $E_2$ is not unbiased, because Model 2 was chosen using information derived from test set B. Therefore, in general, $E_2$ underestimates the prediction error of Model 2 on the test set C.

PROBLEM 10. Suppose you fit two linear regression models, Model 1 and Model 2, using the same data (and in particular the same outcome variable), but different subsets of the available covariates. Each model is fit using ordinary least squares. Model 1 has a lower $C_p$ score than Model 2, and a lower $R^2$ than Model 2. How does the number of covariates compare across the models?

(a) Model 1 uses a smaller number of covariates than Model 2.
(b) Model 1 uses a larger number of covariates than Model 2.
(c) Model 1 uses the same number of covariates as Model 2.

Solution: (a). Because its $R^2$ is lower, the sum of squared residuals of Model 1 must be larger than that of Model 2. For Model 1 nevertheless to have a lower $C_p$ score, it must have fewer covariates (the number of data points n and the estimate $\hat\sigma^2$ of the residual variance from the fit with all covariates are common to both models).
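
For reference, writing out one common form of the $C_p$ statistic makes the Problem 10 trade-off explicit (the exact scaling convention varies by textbook; the form below is an assumption, not quoted from the exam):

$C_p = \frac{1}{n}\left(\mathrm{SSE} + 2\, p\, \hat\sigma^2\right),$

where SSE is the model's sum of squared residuals, p is its number of covariates, and $\hat\sigma^2$ is the residual variance estimated from the fit with all covariates. A larger SSE (lower $R^2$) can only be offset by a smaller p.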

PROBLEM 11. I generate training data as follows: for $i = 1, \ldots, 1000$, the $X_i$ are i.i.d. $N(0, 1)$ random variables; and for $i = 1, \ldots, 1000$,

$Y_i = 1 + X_i + X_i^2 + X_i^3 + X_i^4 + \varepsilon_i,$

where the $\varepsilon_i$ are i.i.d. $N(0, 2)$ random variables. You take the training data X and Y, and produce a predictive model that always predicts the sample mean of Y, i.e., for any new X, $\hat{f}(X) = \bar{Y}$. Which of the following is true of this predictive model?

(a) It has no bias.
(b) It has low variance.
(c) It has no variance.
(d) None of the above.

Solution: (b). The model has low variance: the sample mean $\bar{Y}$ is an average of 1000 points, so it will not change much from one training set to another. However, it does have some variance, since a different training set leads to a (slightly) different prediction. It is also biased, because the constant prediction cannot match the non-constant regression function $1 + x + x^2 + x^3 + x^4$.

PROBLEM 12. Last week, a friend of mine gave me a dataset with outcomes Y and design matrix X (with an intercept column). In addition, he gave me the coefficients $\hat\beta$ he claimed to have computed by ordinary least squares. (The columns of X were linearly independent.) However, after a quick check, I concluded that my friend had made a mistake in his calculation of $\hat\beta$. Which one of the following could be the reason?

(a) The $R^2$ of the fit was close to 1.
(b) The data was not centered, but the intercept was zero.
(c) The residuals did not add up to zero.
(d) One of the coefficients $\hat\beta_j$ was zero.

Solution: (c). Because the design matrix includes an intercept column, the OLS solution must have residuals that sum to zero: the vector of residuals is orthogonal to every column of X, and in particular to the column of ones.
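
A quick R check of the fact used in Problem 12, on simulated data (names illustrative): when an intercept is included, the OLS residuals sum to zero up to floating-point error.

# With an intercept, the OLS residuals sum to (numerically) zero.
set.seed(12)
n <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + x1 - 2 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)    # intercept included by default
sum(residuals(fit))       # essentially zero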

PROBLEM 13. A dataset X, Y with n rows and p covariates is generated according to a linear model $Y = X\beta + \varepsilon$, where the entries of $\varepsilon$ are i.i.d. $N(0, \sigma^2)$. Following what you learned in MS&E 226, you fit the OLS solution and obtain $\hat\beta$. In addition, it is your lucky day: your favorite fortune-teller happens to be around, and she tells you the value of the true $\beta$. Now you are given a test set $\tilde{X}, \tilde{Y}$ with m rows. By using your models wisely, what is the mean squared prediction error you expect to obtain?

(a) $\sigma^2 (1 + p/n)$.
(b) $\sigma^2 (1 + p/m)$.
(c) $\sigma^2$.
(d) Zero.

Solution: (c). The wise thing to do is to use the true $\beta$ for prediction. Still, we don't get perfect predictions, due to the noise in the population model. In this case, the noise $\varepsilon$ has variance $\sigma^2$, and that is exactly the mean squared prediction error we expect; it is also known as the irreducible error.
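
A small R simulation in the spirit of Problem 13 (all parameter values below are illustrative): predicting with the true $\beta$ leaves a test mean squared error of about $\sigma^2$.

# Predicting with the true beta leaves only the irreducible error sigma^2.
set.seed(13)
m <- 1e5
p <- 3
sigma <- 2
beta <- c(1, 0.5, -1, 2)                        # true intercept plus p coefficients

X_test <- cbind(1, matrix(rnorm(m * p), m, p))  # test design matrix with intercept column
y_test <- as.vector(X_test %*% beta) + rnorm(m, sd = sigma)

pred <- as.vector(X_test %*% beta)              # predictions using the true beta
mean((y_test - pred)^2)                         # approximately sigma^2 = 4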