UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

Similar documents
ST430 Exam 1 with Answers

Lecture 4 Multiple linear regression

Lecture 6 Multiple Linear Regression, cont.

Inference for Regression

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

Introduction to Linear Regression Rebecca C. Steorts September 15, 2015

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

Lecture 1: Linear Models and Applications

Ch 3: Multiple Linear Regression

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Stat 5102 Final Exam May 14, 2015

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

ST430 Exam 2 Solutions

14 Multiple Linear Regression

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Simple Linear Regression

MATH 644: Regression Analysis Methods

CAS MA575 Linear Models

Density Temp vs Ratio. temp

AMS-207: Bayesian Statistics

Simple Linear Regression

Lecture 4: Regression Analysis

Coefficient of Determination

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Ch 2: Simple Linear Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Simple and Multiple Linear Regression

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

Multiple Linear Regression

Lecture 2. Simple linear regression

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 18: Simple Linear Regression

Scatter plot of data from the study. Linear Regression

Introduction to Linear Regression

Introduction and Single Predictor Regression. Correlation

MS&E 226: Small Data

Math 2311 Written Homework 6 (Sections )

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

Weighted Least Squares

Simple Linear Regression

5. Linear Regression

Comparing Nested Models

MLR Model Checking. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

Scatter plot of data from the study. Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

Linear Regression Model. Badr Missaoui

Master s Written Examination - Solution

Lectures on Simple Linear Regression Stat 431, Summer 2012

Exam Applied Statistical Regression. Good Luck!

Applied Regression Analysis

Unit 6 - Introduction to linear regression

Appendix A: Review of the General Linear Model

Problem Selected Scores

STATISTICS 110/201 PRACTICE FINAL EXAM

Linear models and their mathematical foundations: Simple linear regression

Chapter 1. Linear Regression with One Predictor Variable

STAT 350: Summer Semester Midterm 1: Solutions

Statistics 135 Fall 2008 Final Exam

STA 2101/442 Assignment 3 1

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website.

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

36-707: Regression Analysis Homework Solutions. Homework 3

BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1

STAT 215 Confidence and Prediction Intervals in Regression

Regression Analysis Chapter 2 Simple Linear Regression

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Dealing with Heteroskedasticity

Simple Linear Regression

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Weighted Least Squares

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam

Beam Example: Identifying Influential Observations using the Hat Matrix

Regression diagnostics

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Analytics 512: Homework # 2 Tim Ahn February 9, 2016

Tests of Linear Restrictions

13 Simple Linear Regression

A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE

Ch 13 & 14 - Regression Analysis

Regression. Marc H. Mehlman University of New Haven

1 Least Squares Estimation - multiple regression.

Unit 6 - Simple linear regression

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Simple linear regression

Inferences for Regression

Statistics 112 Simple Linear Regression Fuel Consumption Example March 1, 2004 E. Bura

The Simple Regression Model. Part II. The Simple Regression Model

Statistics 135: Fall 2004 Final Exam

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Multiple Linear Regression

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

Matrix Approach to Simple Linear Regression: An Overview

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11

Can you tell the relationship between students SAT scores and their college grades?

Math 3339 Homework 2 (Chapter 2, 9.1 & 9.2)

Transcription:

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75 to pass at the PhD level 1 This question is about the relationship between a baby s age and the length of its foot Figure 1 shows data for n = 39 babies 26 length 24 22 05 10 age Figure 1: foot length (cm) versus age (yr) Linear regression output is shown below Call: lm(formula = length ~ age, data = feet) Residuals: Min 1Q Median 3Q Max -35355-06058 -00159 05841 28037 Coefficients: 1

Estimate Std Error t value Pr(> t ) (Intercept) 235970 04828 48880 <2e-16 *** age 16783 06568 2555 00149 * --- Signif codes: 0 *** 0001 ** 001 * 005 01 1 Residual standard error: 1231 on 37 degrees of freedom Multiple R-squared: 015,Adjusted R-squared: 0127 F-statistic: 6529 on 1 and 37 DF, p-value: 001485 (a) (5pts) The parameters are the intercept β 0, the slope β 1, and the error SD σ From the regression output, what are the estimates ˆβ 0, ˆβ 1, and ˆσ? (b) (5pts) The X matrix for this regression is 1 age 1 1 age 2 X = 1 age 39 Using matrix notation in terms of X and ˆσ 2, what is the estimated covariance matrix for the vector ( ˆβ 0, ˆβ 1 ) t? (c) (5pts) We re interested in estimating θ, the average footlength of babies who are 075 years old Write a simple formula for ˆθ in terms of ( ˆβ 0, ˆβ 1 ) t For these data it turns out that ˆθ 2486 (d) (5pts) Using your answers to parts (b) and (c) write a formula for the estimated SD of ˆθ For these data it turns out that the estimated SD of ˆθ, according to this formula, is about 020 (e) (5pts) Another way to estimate θ, call it ˆθ 2, is to calculate the mean footlength of the eight babies who are 075 years old It turns out ˆθ 2 2514 Assuming the SD of the sample is the same as the residual SD from the regression, show how to derive that the estimated SD of ˆθ 2 043 (f) (7pts) Suppose that we have also recorded gender of the 39 babies It s possible that the relationship between age and length is different for boys than for girls If so, then we should include a second predictor in the linear model: boy, a variable that would be 1 for a boy and 0 for a girl 2

(i) Write a linear model, using notation similar to length ~ age, but expanded to include boy, that allows for two parallel lines, one for boys and one for girls (ii) Write a linear model using similar notation that allows for two nonparallel lines, one for boys and one for girls Denote this linear model as model ( ) (g) (7pts) Consider the following statements (a) For every age, there is no difference in the mean foot length for boys and girls (b) The effect of age on the foot length does not depend on gender Translate (a) and (b) into two hypotheses on the parameters of model ( ) (h) (7pts) Let (y i, x 1i, x 2i ), i = 1,, 39, be the 39 observations on foot length, age and gender Assume that the error term is normally distributed with a mean of zero and a variance of σ 2 How would you test the hypotheses for (a) and (b) in the previous item? What are the test statistics and their null distributions? 2 (15pts) Consider the figure below The solid line shows the regression of y on x based on only the scatter of data points in the left-hand portion of the plot The dotted line shows the same regression as the first, but with the point in the upper right-hand corner included too Similarly, the dashed line shows the same regression as the first, but with the point in the lower right-hand corner included too Comment on the degree of (i) outlying-ness, (ii) leverage, and (iii) influence of the point in the upper right-hand corner Do the same for the point in the lower right-hand corner Justify your answer through appropriate description of the likely values of the statistics t i, h ii, and D i (That is, the studentized residual value, the hat-matrix entry, and Cook s distance) 3

3 Suppose that in an experimental study you suspect that many observations were tainted by a technician and now you want to test them jointly for being outliers To this end, you organize the suspected observations as the last q observations from a total of n and adopt a mean shift outlier model (MSOM) on these last observations: y 1 = x T 1 β + ɛ 1 This model can be specified in matrix form by y n q = x T n qβ + ɛ n q y n q+1 = x T n q+1β + δ 1 + ɛ n q+1 y n = x T n β + δ q + ɛ n y 1 y 2 = X 1 0 X 2 I q β δ + ɛ where E(ɛ X) = 0 and Var(ɛ X) = σ 2 I n, X = we can show that (X T X) 1 = X 1 0 X 2 I q (X T 1 X 1 ) 1 (X T 1 X 1 ) 1 X T 2, and δ = (δ 1,, δ q ) T After some algebra, X 2 (X T 1 X 1 ) 1 I q + X 2 (X T 1 X 1 ) 1 X T 2 Now consider ˆβ and ˆδ, the Least Square Estimator (LSE) for β and δ under this model, and ˆβ 1, the LSE for β when regressing only y 1 on X 1, that is, when ignoring the last q observations (a) (6pts) Show that (i) ˆβ = ˆβ 1 and (ii) ˆδ = y 2 X 2 ˆβ1, that is, the LSE for δ is the difference between the (removed) observed values and the fitted values for X 2 in the model without the last q suspected observations Hint: The general formula for the LSE of β in the regression model y = Xβ + ɛ is ˆβ = (X T X) 1 Xy (b) (6pts) Show that the last q observations are perfectly fit by the MSOM: ŷ 2 = X 2 ˆβ + ˆδ = y2 What can you say about the relation between the LSE ˆσ 2 for σ 2 under the MSOM and the LSE ˆσ 1 2 for σ 2 under the model without the last q observations? (c) (6pts) Find the hat matrix for the MSOM and comment on the leverage for the suspected data 4

points in light of the results from the previous item Hint: The hat matrix H for a general regression model y = Xβ +ɛ is the matrix such that ŷ = Hy Because ŷ = X ˆβ = X(X T X) 1 Xy, we have H = X(X T X) 1 X (d) (6pts) Conduct a joint outlier test by testing δ 1 = = δ q = 0 State the test statistic and its distribution under the null 4 (15pts) In the previous question, assume we have observations (y i, x i ), i = 1,, n and we suspect the last q = 2 observations might be tainted Write R code to perform the hypothesis test in part (d) of the question 3 5