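Question 1

Suppose that we have data $(x_1, y_1), \ldots, (x_n, y_n)$. Let

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s_X^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Throughout this question, we assume that the simple linear model is correct. We also assume that the noise variables $\epsilon_i$ are Gaussian.

1. Show that the least squares estimators can be written as

$$\hat{\beta}_0 = \beta_0 + \sum_{i=1}^{n} \left( \frac{1}{n} - \bar{x}\,\frac{x_i - \bar{x}}{n s_X^2} \right) \epsilon_i, \qquad \hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} \frac{x_i - \bar{x}}{n s_X^2}\, \epsilon_i.$$

2. Let $m(x) = \beta_0 + \beta_1 x$ and $\hat{m}(x) = \hat{\beta}_0 + \hat{\beta}_1 x$. Show that

$$\hat{m}(x) = m(x) + \sum_{i=1}^{n} \left( \frac{1}{n} + (x - \bar{x})\,\frac{x_i - \bar{x}}{n s_X^2} \right) \epsilon_i.$$

3. Define $\delta_{ij}$ to be 1 when $i = j$ and 0 otherwise. Show that the $i$-th residual (i.e. the residual at $x = x_i$) can be written as

$$e_i = \sum_{j=1}^{n} \left( \delta_{ij} - \frac{1}{n} - (x_i - \bar{x})\,\frac{x_j - \bar{x}}{n s_X^2} \right) \epsilon_j.$$

4. Given that

$$E[e_i^2] = \sigma^2 \left( 1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{n s_X^2} \right),$$

show that

$$E[\hat{\sigma}^2] = \frac{n-2}{n}\,\sigma^2, \qquad \text{where } \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2.$$

Explain why the estimator $s^2 = \frac{n}{n-2}\,\hat{\sigma}^2$ is most commonly used, rather than $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$.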
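The identities in parts 1 and 3 and the expectation in part 4 can be checked numerically before attempting a proof. The R sketch below is not part of the original question: the sample size, coefficients, noise level, design points and number of replications are all arbitrary assumptions chosen for illustration, and a numerical check does not replace the algebraic argument the question asks for.

```r
# Illustrative settings (assumptions, not taken from the question).
set.seed(1)
n <- 10; beta0 <- 2; beta1 <- 0.5; sigma <- 1.5
x    <- seq(1, 10, length.out = n)   # fixed design points
xbar <- mean(x)
s2X  <- mean((x - xbar)^2)           # s_X^2 with divisor n, as in the question

# Parts 1 and 3: one simulated data set, compare lm() with the noise representations.
eps <- rnorm(n, sd = sigma)          # Gaussian noise
y   <- beta0 + beta1 * x + eps
fit <- lm(y ~ x)

b0_formula <- beta0 + sum((1 / n - xbar * (x - xbar) / (n * s2X)) * eps)
b1_formula <- beta1 + sum((x - xbar) / (n * s2X) * eps)
ok_part1 <- isTRUE(all.equal(unname(coef(fit)), c(b0_formula, b1_formula)))

# Matrix with entries delta_ij - 1/n - (x_i - xbar)(x_j - xbar) / (n s_X^2).
W <- diag(n) - 1 / n - outer(x - xbar, x - xbar) / (n * s2X)
ok_part3 <- isTRUE(all.equal(unname(resid(fit)), as.vector(W %*% eps)))

# Part 4: Monte Carlo estimate of E[sigma_hat^2] versus (n - 2) / n * sigma^2.
sigma2_hat <- replicate(4000, {
  y_sim <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  mean(resid(lm(y_sim ~ x))^2)       # (1/n) * sum of squared residuals
})
mc_mean <- mean(sigma2_hat)
theory  <- (n - 2) / n * sigma^2

print(c(part1 = ok_part1, part3 = ok_part3))
print(c(mc_mean = mc_mean, theory = theory, rescaled = n / (n - 2) * mc_mean))
```

The last printed value rescales the Monte Carlo mean by $n/(n-2)$, which is the correction that turns $\hat{\sigma}^2$ into $s^2$; it should land close to $\sigma^2$ if the claim in part 4 holds.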

Question 2

In a study of the impact of poverty on the standardized test scores of students in US schools, data on 22 schools were collected. For each school the data comprised an average standardized math score (MATH), an average standardized reading score (READING), and a measure of poverty (POVERTY, the percentage of households below the national poverty line). The data were analyzed in R and produced the following output:

     1 > head(Fcat)  # List first six elements
     2    MATH READING POVERTY
     3 1 166.4   165.0    91.7
     4 2 159.6   157.2    90.2
     5 3 159.1   164.4    86.0
     6 4 155.5   162.4    83.9
     7 5 164.3   162.5    80.4
     8 6 169.8   164.9    76.5
     9 ....
    10 ....
    11 > summary(lm(MATH ~ POVERTY, data = Fcat))
    12 Coefficients:
    13              Estimate Std. Error t value Pr(>|t|)
    14 (Intercept) 189.81582    3.02148  62.822  < 2e-16 ***
    15 POVERTY      -0.30544    0.04759  -6.418 2.93e-06 ***
    16 ---
    17 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    18
    19 Residual standard error: 5.366 on 20 degrees of freedom
    20 Multiple R-squared: 0.6731, Adjusted R-squared: 0.6568
    21 F-statistic: 41.19 on 1 and 20 DF, p-value: 2.927e-06
    22 > summary(lm(READING ~ POVERTY, data = Fcat))
    23 Coefficients:
    24              Estimate Std. Error t value Pr(>|t|)
    25 (Intercept) 187.01262    1.92762  97.017  < 2e-16 ***
    26 POVERTY      -0.27081    0.03036  -8.919 2.09e-08 ***
    27 ---
    28 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    29
    30 Residual standard error: 3.423 on 20 degrees of freedom
    31 Multiple R-squared: 0.7991, Adjusted R-squared: 0.789
    32 F-statistic: 79.55 on 1 and 20 DF, p-value: 2.088e-08
    33
    34 > summary(lm(MATH ~ READING, data = Fcat))
    35 Coefficients:
    36             Estimate Std. Error t value Pr(>|t|)
    37 (Intercept)  -21.152     18.663  -1.133     0.27
    38 READING        1.128      0.109  10.352 1.76e-09 ***
    39 ---
    40 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    41
    42 Residual standard error: 3.722 on 20 degrees of freedom
    43 Multiple R-squared: 0.8427, Adjusted R-squared: 0.8349
    44 F-statistic: 107.2 on 1 and 20 DF, p-value: 1.763e-09

1. Three separate analyses have been carried out. Summarize the results of the analyses in terms of the regression relationships between the variables in each analysis, assuming that all the simple linear regression model assumptions are correct.

2. If, for MATH, $s_X^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 83.88$, compute from the output the sample correlation coefficient for the relationship between
   - POVERTY and MATH
   - READING and MATH

Question 3

The data in the plot below record the results of a survey of 147 US university students who were asked to select the height (in inches) of their ideal spouse or life partner. The specified value is recorded as IdealHt. The students' own height was then recorded as Height, as well as their gender, recorded in the study using a binary indicator (Gender) taking the values F (recorded female) and M (recorded male). The data were stored in the R data frame idh and analyzed.

[Figure: plot of Ideal height (in) against Height (in), both axes running from 55 to 80, with legend entries Female and Male.]

    45 > summary(lm(IdealHt ~ Height, data = idh))
    46 Coefficients:
    47             Estimate Std. Error t value Pr(>|t|)
    48 (Intercept) 86.01770    4.78629  XXXXXX  < 2e-16 ***
    49 Height      -0.26046    0.07035  -3.702 0.000303 ***
    50 ---
    51 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    52

    53 Residual standard error: 3.643 on XXX degrees of freedom
    54 Multiple R-squared: 0.08636, Adjusted R-squared: 0.08006
    55 F-statistic: 13.71 on 1 and 145 DF, p-value: 0.0003031
    56
    57
    58
    59
    60 > summary(lm(IdealHt ~ Height, subset = (Gender == 'F'), data = idh))
    61 # Females only
    62 Coefficients:
    63             Estimate Std. Error t value Pr(>|t|)
    64 (Intercept) 39.30353    6.20615   6.333 2.00e-08 ***
    65 Height       0.49261    0.09602   5.130 2.47e-06 ***
    66 ---
    67 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    68
    69 Residual standard error: 2.322 on 70 degrees of freedom
    70 Multiple R-squared: 0.2732, Adjusted R-squared: 0.2629
    71 F-statistic: 26.32 on 1 and 70 DF, p-value: 2.474e-06
    72
    73
    74
    75
    76 > summary(lm(IdealHt ~ Height, subset = (Gender == 'M'), data = idh))
    77 # Males only
    78 Coefficients:
    79             Estimate Std. Error t value Pr(>|t|)
    80 (Intercept) 23.27364    6.33336   3.675 0.000451 ***
    81 Height       0.59630    0.08902   6.698 XXXXXXXX ***
    82 ---
    83 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    84
    85 Residual standard error: 2.057 on 73 degrees of freedom
    86 Multiple R-squared: 0.3807, Adjusted R-squared: 0.3722
    87 F-statistic: 44.87 on 1 and 73 DF, p-value: 3.758e-09
    88
    89
    90
    91
    92 > confint(lm(IdealHt ~ Height, subset = (Gender == 'F'), data = idh))  # Compute CIs
    93                  2.5 %    97.5 %
    94 (Intercept) 26.9257619 51.681305
    95 Height       0.3010997  0.684121
    96 > confint(lm(IdealHt ~ Height, subset = (Gender == 'M'), data = idh))  # Compute CIs
    97                  2.5 %     97.5 %
    98 (Intercept) 10.6512597 35.8960136
    99 Height       0.4188794  0.7737227

Note that the R command subset=(Gender=='M') means "perform the analysis for those data for which Gender takes the value M" (that is, analyze the males separately).

1. How many males were in the study?

2. There is an entry XXXXXX on line 48. What is the correct numerical value to enter in that part of the coefficients table?

3. What is the value XXX on line 53?

4. What is the value of the sum of squares for the residuals, $\sum_{i=1}^{n} e_i^2$, in the analysis of the Females only subset? What is the value of $\hat{\sigma}^2$?

5. What is the missing p-value marked XXXXXXXX on line 81?

6. Explain how the results on lines 49, 65 and 81 should be interpreted.

7. For the dataset subset=(Gender=='M'), based on the estimated coefficients, can you give an estimate of $E[Y \mid X = 0]$? If yes, what is it (show your work); if not, explain why not.

8. Note that the confint results between lines 92 and 99 report the 95% confidence intervals for the slope and intercept parameters for the relevant data subsets. On the basis of these results, what can an analyst conclude about the relationship between height of subject and selected ideal height of partner in females and males? Justify your answer.

9. For the dataset subset=(Gender=='M'), give a 95% confidence interval for $\beta_1$ (assuming all the model assumptions hold), using one of the following results:

    qt(0.975,df=70)=1.994437
    qt(0.975,df=73)=1.992997
    qt(0.975,df=145)=1.97646

10. On the basis of these results, predict (showing working) the selected ideal height of partner for a female of height 62 inches and for a male of height 70 inches.
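Several of the numerical parts of Question 3 rest on standard relations in `summary(lm(...))` output for simple linear regression: a t value is the estimate divided by its standard error, the residual degrees of freedom are $n - 2$, and the squared residual standard error is the residual sum of squares divided by those degrees of freedom. The R sketch below applies these relations to figures copied from the listing. The variable names are hypothetical, the `idh` data frame itself is not available here, and the results are rounded-input arithmetic to compare against your own working rather than an official solution.

```r
# Line 48: t value = Estimate / Std. Error for the intercept (all students).
t_int_all <- 86.01770 / 4.78629

# Line 53: residual degrees of freedom = n - 2 with 147 students.
df_all <- 147 - 2

# Group sizes recovered from the residual degrees of freedom (lines 69 and 85).
n_f <- 70 + 2
n_m <- 73 + 2

# Females: RSE^2 = RSS / df, so RSS = RSE^2 * df; sigma_hat^2 uses divisor n.
rss_f        <- 2.322^2 * 70
sigma2_hat_f <- rss_f / n_f

# Line 81: two-sided p-value for the males' slope, t = 6.698 on 73 df.
p_m <- 2 * pt(6.698, df = 73, lower.tail = FALSE)

# 95% confidence interval for the males' slope: estimate +/- qt * SE.
ci_m <- 0.59630 + c(-1, 1) * qt(0.975, df = 73) * 0.08902

# Fitted values from the subset-specific lines.
pred_f <- 39.30353 + 0.49261 * 62   # female, height 62 in
pred_m <- 23.27364 + 0.59630 * 70   # male, height 70 in

print(c(t_int_all = t_int_all, df_all = df_all, n_f = n_f, n_m = n_m))
print(c(rss_f = rss_f, sigma2_hat_f = sigma2_hat_f, p_m = p_m))
print(ci_m)
print(c(pred_f = pred_f, pred_m = pred_m))
```

Two cross-checks are available in the listing itself: the interval `ci_m` should agree with the `confint` row on line 99 up to rounding, and `p_m` should agree with the F-statistic p-value on line 87, since with a single predictor the F test and the slope t test coincide.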