
10 LINEAR REGRESSION

In Chapter 6 we discussed the concepts of covariance and correlation, two ways of measuring the extent to which two random variables, X and Y, are related to each other. In many cases we would like to take this a step further and try to use information from one variable to make predictions about the outcome of the other. For instance...

10.1 sample covariance and correlation

We have so far considered summarizing a set of observations where one measurement is made on each individual or unit, but often in real-life random experiments we make multiple measurements on each individual. For example, during a health check-up a doctor might record the height, weight, age, sex, pulse rate, and blood pressure. Just as we did for single measurements, we can represent the observed data by their empirical distribution, which is now a function of multiple arguments. For example, if we measure two random variables $(X_i, Y_i)$ for the $i$th individual (say weight and blood pressure), then the empirical distribution function is given by

$$f(t, s) = \frac{1}{n}\,\#\{i : X_i \le t,\ Y_i \le s\}.$$

We can now use this to estimate population features by the corresponding feature of the empirical distribution. For example, the population covariance

$$\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$$

gives a measure of how $X$ and $Y$ relate to each other. The sample version of this is the sample covariance

$$S_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}\right]. \tag{10.1.1}$$

The sample correlation coefficient is defined similarly to the population correlation coefficient $\rho[X, Y]$ as

$$r[X, Y] = \frac{S_{XY}}{S_X S_Y}, \tag{10.1.2}$$

where $S_X$ and $S_Y$ are the sample standard deviations of $X$ and $Y$ respectively. As with $\rho[X, Y]$, $r[X, Y]$ is bounded between $-1$ and $1$, and is invariant to positive scale and location transformations; that is, for real numbers $a, b, c, d$ with $a, c > 0$,

$$r[aX + b,\ cY + d] = r[X, Y].$$

10.2 simple linear model

We will assume that the variable $Y$ depends on $X$ in a linear fashion, but that it is also affected by random factors. Specifically, we will assume there is a regression line $y = \alpha + \beta x$ and that for given $x$-values $X_1, X_2, \dots, X_n$ the corresponding $y$-values $Y_1, Y_2, \dots, Y_n$ are given by

$$Y_j = \alpha + \beta X_j + \epsilon_j, \tag{10.2.1}$$

for $j = 1, 2, \dots, n$, where the $\epsilon_j$ are independent random variables with $\epsilon_j \sim \mathrm{Normal}(0, \sigma^2)$. Equation (10.2.1) is referred to as the simple linear model. In particular, $\epsilon_j$ is the (random) vertical distance of the point $(X_j, Y_j)$ from the regression line. For all results below we assume $\sigma^2 > 0$ is the variance of the errors, assumed to be the same for every data point. We also assume that not all of the $X_j$ quantities are the same, so that the variance of these quantities is non-zero. In particular this means $n \ge 2$.
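Both of these quantities are easy to experiment with numerically. The following R sketch (R is also used later in this chapter; the values of $\alpha$, $\beta$, $\sigma$ and the $x$-grid here are arbitrary illustration choices, not data from the text) simulates one data set from the simple linear model (10.2.1) and evaluates (10.1.1) and (10.1.2) directly, checking them against R's built-in cov and cor, which use the same $n-1$ convention.

    # Simulate data from the simple linear model (10.2.1) and compute the
    # sample covariance (10.1.1) and sample correlation (10.1.2) by hand.
    # alpha, beta, sigma and the x-grid are arbitrary illustration values.
    set.seed(1)
    alpha <- 2; beta <- 0.5; sigma <- 1
    x <- seq(0, 10, length.out = 25)
    y <- alpha + beta * x + rnorm(length(x), 0, sigma)
    n <- length(x)

    Sxy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)   # (10.1.1)
    r   <- Sxy / (sd(x) * sd(y))                          # (10.1.2)

    c(Sxy - cov(x, y), r - cor(x, y))   # both essentially zero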

10.3 the least squares line

The values of $(X_1, Y_1), \dots, (X_n, Y_n)$ are collected data. Though we assume that this data is produced via the simple linear model, we typically do not know the actual values of the slope $\beta$ or the $y$-intercept $\alpha$. The goal of this section is to illustrate a way to estimate these values from the data.

For a line $y = a + bx$, the residual of a data point $(X_j, Y_j)$ is defined to be the quantity $Y_j - (a + bX_j)$. This is the difference between the actual $y$-value of the data point and the location where the line predicts the $y$-value should be. In other words, it may be viewed as the error of the line when attempting to predict the $y$-value corresponding to the $X_j$ data point. Among all possible lines through the data, there is one which minimizes the sum of these squared residual errors. This is called the least squares line.

Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be points in the plane. Suppose we wish to find a line that minimizes the sum of squared residual errors. That is, let $g : \mathbb{R}^2 \to \mathbb{R}$ be defined as

$$g(a, b) = \sum_{j=1}^{n}\big[Y_j - (a + bX_j)\big]^2.$$

The objective is to minimize $g$. So, using calculus,

$$0 = \frac{\partial g}{\partial a} = -2\sum_{j=1}^{n}\big[Y_j - a - bX_j\big] \tag{10.3.1}$$

and

$$0 = \frac{\partial g}{\partial b} = -2\sum_{j=1}^{n}X_j\big[Y_j - a - bX_j\big]. \tag{10.3.2}$$

From equation (10.3.1) we have

$$0 = \sum_{j=1}^{n}\big[Y_j - a - bX_j\big] = n\bar{Y} - na - nb\bar{X} = n\big(\bar{Y} - (a + b\bar{X})\big).$$

Therefore¹

$$\bar{Y} = a + b\bar{X}, \tag{10.3.3}$$

which shows that the point $(\bar{X}, \bar{Y})$ must lie on the least squares line. The point $(\bar{X}, \bar{Y})$ is known as the point of averages. Similarly, from equation (10.3.2),

$$0 = \sum_{j=1}^{n}X_j\big[Y_j - a - bX_j\big] = \sum_{j=1}^{n}\big(X_j Y_j - aX_j - bX_j^2\big),$$

so that

$$\sum_{j=1}^{n}X_j Y_j = an\bar{X} + b\sum_{j=1}^{n}X_j^2. \tag{10.3.4}$$

We now use the system of two equations (given by (10.3.3) and (10.3.4)) to solve for $a$ and $b$, to get

$$b = \frac{\sum_{j=1}^{n}X_j Y_j - n\bar{X}\bar{Y}}{\sum_{j=1}^{n}X_j^2 - n\bar{X}^2} \tag{10.3.5}$$

and

$$a = \bar{Y} - b\bar{X}. \tag{10.3.6}$$

¹ We shall use the notation $\bar{X}$, $\bar{Y}$, $S_X$, $S_Y$, $r[X, Y]$ (below), even though they are not necessarily random quantities. This is to simplify notation and will allow us to use known properties, in the event they are random.
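As a quick numerical check of (10.3.5) and (10.3.6), the following R sketch evaluates both formulas on the five data points of Example 10.3.2 below and compares the result with the coefficients reported by R's lm.

    # Evaluate the normal-equation solution (10.3.5)-(10.3.6) directly and
    # compare with lm(); data taken from Example 10.3.2 below.
    x <- c(3, 4, 5, 6, 7)
    y <- c(6, 5, 6, 4, 2)
    n <- length(x)

    b <- (sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)  # (10.3.5)
    a <- mean(y) - b * mean(x)                                              # (10.3.6)
    c(a = a, b = b)   # 9.1 and -0.9
    coef(lm(y ~ x))   # identical intercept and slope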

Recall that the sample variance of $X_1, X_2, \dots, X_n$ is

$$\begin{aligned}
S_X^2 &= \frac{1}{n-1}\sum_{j=1}^{n}(X_j - \bar{X})^2
= \frac{1}{n-1}\sum_{j=1}^{n}\big[X_j^2 - 2X_j\bar{X} + \bar{X}^2\big]
= \frac{1}{n-1}\left[\Big(\sum_{j=1}^{n}X_j^2\Big) - 2\bar{X}\Big(\sum_{j=1}^{n}X_j\Big) + n\bar{X}^2\right]\\
&= \frac{1}{n-1}\left[\Big(\sum_{j=1}^{n}X_j^2\Big) - 2n\bar{X}^2 + n\bar{X}^2\right]
= \frac{1}{n-1}\left[\Big(\sum_{j=1}^{n}X_j^2\Big) - n\bar{X}^2\right].
\end{aligned}$$

Therefore, the denominator of (10.3.5) is simply $(n-1)S_X^2$. The numerator may be written more simply by using the notation of sample covariance and correlation defined in (10.1.1) and (10.1.2). So from (10.3.5) we have

$$b = \frac{\sum_{j=1}^{n}X_j Y_j - n\bar{X}\bar{Y}}{\sum_{j=1}^{n}X_j^2 - n\bar{X}^2} = \frac{(n-1)S_{XY}}{(n-1)S_X^2} = \frac{r[X, Y]\,S_Y}{S_X}.$$

Using the above and (10.3.3), we can now also write a nice formula for $a$, which is

$$a = \bar{Y} - \frac{r[X, Y]\,S_Y}{S_X}\,\bar{X}. \tag{10.3.7}$$

By the above calculation we have shown that the least squares line minimizing the sum of the squared residual errors is the line passing through the point of averages $(\bar{X}, \bar{Y})$ and having slope equal to $b = r[X, Y]\,S_Y / S_X$. We state this precisely in the theorem below.

Theorem 10.3.1. Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be given data points. Then the least squares line passes through $(\bar{X}, \bar{Y})$ and has slope given by $r[X, Y]\,S_Y / S_X$.

We illustrate the use of these formulas with two examples given below.

Example 10.3.2. Consider the following five data points:

    X | Y
    --+--
    3 | 6
    4 | 5
    5 | 6
    6 | 4
    7 | 2

These points are not collinear, but suppose we wish to find a line that most closely approximates their trend in the least squares sense described above. Viewing these as samples, it is routine to calculate that the formulas above yield $a = 9.1$ and $b = -0.9$. Of all of the lines in the plane, the one that minimizes the sum of squared residual errors for the data set above is the line $y = 9.1 - 0.9x$.

The R software also has a feature to perform a regression directly. To obtain this result using R we could first create vectors that represent the data:

> x <- c(3,4,5,6,7)
> y <- c(6,5,6,4,2)

And then instruct R to perform the regression using the command lm, indicating the linear model:

> lm(y ~ x)

The order of the variables in this command is important, with y ~ x indicating that the y variable is being predicted using the x variable as input. The resulting output from R is

    (Intercept)            x
            9.1         -0.9

the values of the intercept and slope of the least squares line, respectively.

Example 10.3.3. Suppose as part of a health study, a researcher collects data on the weights and heights of sixty adult men in a population. The average height of the men is 174 cm with a sample standard deviation of 8.0 cm. The average weight of the men is 78 kg with a sample standard deviation of 10 kg. The correlation between the variables in the sample was 0.55. This information alone is enough to find the least squares line for predicting weight from height. The reader may use the formulas above to verify that $b = 0.6875$ and $a = -41.625$. Therefore, among all lines, $y = -41.625 + 0.6875x$ is the one which minimizes the sum of squared residuals. This does not necessarily mean this line would be appropriate for predicting new data points. To make such a declaration, we would want to have some evidence that the two variables had a linear relationship to begin with; but regardless of whether or not the data was produced from a simple linear model, the line above minimizes error in the least squares sense.

exercises

Ex. 10.3.1. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be data produced via the simple linear model and suppose $y = a + bx$ is the least squares line for the data. Recall from above that the residual for any given data point is $Y_j - (a + bX_j)$, the error the line makes in predicting the correct $y$-value from the given $x$-value. Show that the sum of the residuals over all data points must be zero.

Ex. 10.3.2. Suppose that instead of using the simple linear model, we assume the regression line is known to pass through the origin. That is, the regression line has the form $y = \beta x$, and for given $x$-values $X_1, X_2, \dots, X_n$ the corresponding $y$-values $Y_1, Y_2, \dots, Y_n$ are given by

$$Y_j = \beta X_j + \epsilon_j, \tag{10.3.8}$$

for $j = 1, 2, \dots, n$. As with the simple linear model, we assume each of the $\epsilon_j$ are independent random variables with $\epsilon_j \sim \mathrm{Normal}(0, \sigma^2)$. (We will refer to this as the linear model through the origin and will have several exercises investigating how several formulas from this chapter would need to be modified for such a model.) Assuming data $(X_1, Y_1), \dots, (X_n, Y_n)$ was produced from the linear model through the origin, find the least squares line through the origin. That is, find a formula for $b$ such that the line $y = bx$ minimizes the sum of squared residual errors.

10.4 a and b as random variables

In this section (and the remainder of this chapter) we will assume that $(X_1, Y_1), \dots, (X_n, Y_n)$ follow the simple linear model (10.2.1). In other words, there is a regression line $y = \alpha + \beta x$ and for given $x$-values $X_1, X_2, \dots, X_n$ the corresponding $y$-values $Y_1, Y_2, \dots, Y_n$ are given by (10.2.1). In the previous section this data was used to produce a mean squared error-minimizing least squares line $y = a + bx$. In this section we investigate how well the random quantities $a$ and $b$ approximate the (unknown) values $\alpha$ and $\beta$.
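A small simulation makes this randomness concrete: each fresh set of errors $\epsilon_j$ yields a different fitted line, so $a$ and $b$ scatter around $\alpha$ and $\beta$. A minimal R sketch, with $\alpha = 2$, $\beta = 0.5$ and $\sigma = 1$ chosen arbitrarily for illustration:

    # Each simulated data set from the model (10.2.1) yields its own least
    # squares estimates; alpha, beta, sigma are arbitrary illustration values.
    set.seed(1)
    alpha <- 2; beta <- 0.5; sigma <- 1
    x <- seq(0, 10, length.out = 25)   # fixed x-values, treated as constants

    for (k in 1:3) {
      y <- alpha + beta * x + rnorm(length(x), 0, sigma)
      print(coef(lm(y ~ x)))           # estimates scatter around (2, 0.5)
    }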

Theorem 10.4.1. Under the assumptions of the simple linear model (10.2.1), the slope $b$ of the least squares line is a linear combination of the $Y_j$ variables. Further, it has a normal distribution with mean $\beta$ and variance $\dfrac{\sigma^2}{(n-1)S_X^2}$.

Proof - First recall that $X_1, X_2, X_3, \dots, X_n$ are assumed to be deterministic, so they will be treated as known constants. The data points $Y_1, Y_2, \dots, Y_n$ are assumed to follow the simple linear model (10.2.1). So for $j = 1, \dots, n$,

$$E[Y_j] = E[\alpha + \beta X_j + \epsilon_j] = \alpha + \beta X_j + E[\epsilon_j] = \alpha + \beta X_j$$

and

$$\mathrm{Var}[Y_j] = \mathrm{Var}[\alpha + \beta X_j + \epsilon_j] = \mathrm{Var}[\epsilon_j] = \sigma^2.$$

Using the formula (10.3.5) we derived for $b$, together with the above, we have

$$\begin{aligned}
E[b] &= E\left[\frac{\sum_{j=1}^{n} X_j Y_j - n\bar{X}\bar{Y}}{(n-1)S_X^2}\right]
      = \frac{1}{(n-1)S_X^2}\left(\sum_{j=1}^{n} X_j E[Y_j] - n\bar{X}E[\bar{Y}]\right)\\
     &= \frac{1}{(n-1)S_X^2}\left(\sum_{j=1}^{n} X_j(\alpha + \beta X_j) - n\bar{X}(\alpha + \beta\bar{X})\right)\\
     &= \frac{1}{(n-1)S_X^2}\left(n\alpha\bar{X} + \beta\sum_{j=1}^{n} X_j^2 - n\alpha\bar{X} - n\beta\bar{X}^2\right)
      = \beta\,\frac{\sum_{j=1}^{n} X_j^2 - n\bar{X}^2}{(n-1)S_X^2} = \beta.
\end{aligned}$$

The following algebra justifies that $b$ is a linear combination of the $Y_j$ variables:

$$b = \frac{\sum_{j=1}^{n} X_j Y_j - n\bar{X}\bar{Y}}{(n-1)S_X^2}
   = \frac{1}{(n-1)S_X^2}\left[\sum_{j=1}^{n} X_j Y_j - \sum_{j=1}^{n}\bar{X}Y_j\right]
   = \sum_{j=1}^{n}\frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j,$$

where we used $n\bar{X}\bar{Y} = \bar{X}\sum_{j} Y_j$. Since the $Y_j$ are independent, the variance of this linear combination is the corresponding weighted sum of their variances:

$$\mathrm{Var}[b] = \sum_{j=1}^{n}\left(\frac{X_j - \bar{X}}{(n-1)S_X^2}\right)^2\mathrm{Var}[Y_j]
= \frac{\sigma^2\sum_{j=1}^{n}(X_j - \bar{X})^2}{\big[(n-1)S_X^2\big]^2}
= \frac{\sigma^2\,(n-1)S_X^2}{\big[(n-1)S_X^2\big]^2}
= \frac{\sigma^2}{(n-1)S_X^2}.$$

Finally, since $b$ is a linear combination of the independent normal random variables $Y_j$, $b$ itself is also a normal random variable (Theorem 6.3.13).
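Theorem 10.4.1 is easy to check by simulation. In the R sketch below (parameter values again arbitrary illustration choices), the empirical mean and variance of the slope across many simulated data sets should match $\beta$ and $\sigma^2/\big((n-1)S_X^2\big)$.

    # Numerical check of Theorem 10.4.1: the simulated slopes average to
    # beta, with variance close to sigma^2 / ((n-1) * S_X^2).
    set.seed(2)
    alpha <- 2; beta <- 0.5; sigma <- 1
    x <- seq(0, 10, length.out = 25); n <- length(x)

    b <- replicate(20000, {
      y <- alpha + beta * x + rnorm(n, 0, sigma)
      coef(lm(y ~ x))[2]               # the fitted slope
    })
    c(mean(b), var(b))                 # ~0.5 and ~ the theoretical variance
    sigma^2 / ((n - 1) * var(x))       # var(x) is S_X^2 (n-1 convention)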

As noted above, the least squares line can be defined as the line of slope $b$ passing through the point of averages. The following lemma is a useful fact about how these quantities relate to each other.

Lemma 10.4.2. Let $b$ be the slope of the least squares line and let $\bar{Y}$ be the sample average of the $Y_j$ variables. Then $b$ and $\bar{Y}$ are independent.

Proof - By Theorem 6.3.13, $\bar{Y}$ has a normal distribution, and so does $b$ by Theorem 10.4.1. By Theorem 6.4.3, all we have to show is that $\bar{Y}$ and $b$ are uncorrelated. Note that the $Y_j$ variables are all independent of each other, and so $\mathrm{Cov}[Y_j, Y_k]$ will be zero if $j \ne k$ and will equal the variance $\sigma^2$ otherwise. So,

$$\begin{aligned}
\mathrm{Cov}[b, \bar{Y}] &= \mathrm{Cov}\left[\sum_{j=1}^{n}\frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j,\ \frac{1}{n}\sum_{k=1}^{n}Y_k\right]
= \sum_{j=1}^{n}\sum_{k=1}^{n}\frac{X_j - \bar{X}}{n(n-1)S_X^2}\,\mathrm{Cov}[Y_j, Y_k]\\
&= \sum_{j=1}^{n}\frac{X_j - \bar{X}}{n(n-1)S_X^2}\,\sigma^2
= \frac{\sigma^2}{n(n-1)S_X^2}\sum_{j=1}^{n}(X_j - \bar{X}) = 0.
\end{aligned}$$

We conclude this section with a result on the distribution of $a$.

Theorem 10.4.3. Under the assumptions of the simple linear model (10.2.1), the $y$-intercept $a$ (given by (10.3.7)) of the least squares line is a linear combination of the $Y_j$ variables. Further, it has a normal distribution with mean $\alpha$ and variance

$$\sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S_X^2}\right).$$

Proof - See Exercise 10.4.1.

exercises

Ex. 10.4.1. Prove Theorem 10.4.3. (Hint: Make use of the fact that $\bar{Y} = a + b\bar{X}$ and what has previously been proven about $\bar{Y}$ and $b$.)

Ex. 10.4.2. Show that, generally speaking, $a$ and $b$ are not independent. Find necessary and sufficient conditions for when the two variables are independent.

Ex. 10.4.3. Show that $a$ and $\bar{Y}$ are never independent.

Ex. 10.4.4. Continuing from Exercise 10.3.2, assuming the regression line $y = \beta x$ passes through the origin and $y = bx$ is the least squares line through the origin, do the following:

(a) Find the expected value of $b$.

(b) Find the variance of $b$.

(c) Determine whether or not $b$ has a normal distribution.

(d) Determine whether or not $b$ and $\bar{Y}$ are independent.

10.5 predicting new data when σ² is known

In this section we return to the question of using data for prediction. We continue to assume the simple linear model (10.2.1). We further assume that $\alpha$ and $\beta$ are estimated by $a$ and $b$ (as calculated from the data $(X_1, Y_1), \dots, (X_n, Y_n)$), and that the parameter $\sigma^2$ describing the variability of the data around the regression line is a known quantity.

First suppose that for a particular deterministic $x$-value $X^*$ we want to use the data to estimate the corresponding $y$-value $Y^* = \alpha + \beta X^*$ on the regression line by $\hat{Y}^* = a + bX^*$.

Theorem 10.5.1. The quantity $\hat{Y}^* = a + bX^*$ has a normal distribution with mean $Y^* = \alpha + \beta X^*$ and variance

$$\sigma^2\left(\frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Proof - Recall from Theorem 10.4.3 and Theorem 10.4.1 that $a$ and $b$ are both linear combinations of the random variables $Y_j$, each of which has a normal distribution. So $\hat{Y}^*$ has a normal distribution by Theorem 6.3.13. We need only calculate its mean and variance.

The expected value is simple to calculate:

$$E[\hat{Y}^*] = E[a + bX^*] = E[a] + E[b]X^* = \alpha + \beta X^* = Y^*.$$

If $a$ and $b$ were independent, then calculating the variance of $\hat{Y}^*$ would also be a simple task, but typically this is not the case. However, from Lemma 10.4.2, we know that $b$ and $\bar{Y}$ are independent. To make use of this, using (10.3.3), we may rewrite the line in point-slope form around the point of averages: $\hat{Y}^* = \bar{Y} + b(X^* - \bar{X})$. From this we have

$$\mathrm{Var}[\hat{Y}^*] = \mathrm{Var}[\bar{Y} + b(X^* - \bar{X})] = \mathrm{Var}[\bar{Y}] + \mathrm{Var}[b]\,(X^* - \bar{X})^2
= \frac{\sigma^2}{n} + \frac{\sigma^2}{(n-1)S_X^2}(X^* - \bar{X})^2
= \sigma^2\left(\frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Note that over the various values of $X^*$ this variance is minimal when $X^*$ is $\bar{X}$, the average value of the $x$-data. In this case $\mathrm{Var}[\hat{Y}^*] = \sigma^2/n = \mathrm{Var}[\bar{Y}]$, as expected. The further $X^*$ is from the average of the $x$-values, the more variance there is in estimating the point on the regression line.

Next suppose that, instead of trying to estimate a point on the regression line, we are trying to predict a new data point produced from the linear model. Let $X^*$ now represent the $x$-value of some new data point and let $Y^* = \alpha + \beta X^* + \epsilon^*$, where $\epsilon^* \sim \mathrm{Normal}(0, \sigma^2)$ and the random variable $\epsilon^*$ is assumed to be independent of all prior $\epsilon_j$ which produced the original data set. The following theorem addresses the distribution of the predictive error made when estimating $Y^*$ by the quantity $\hat{Y}^* = a + bX^*$.

Theorem 10.5.2. If $(X^*, Y^*)$ is a new data point, as described in the previous paragraph, then the predictive error in estimating $Y^*$ using the least squares line is $(a + bX^*) - Y^*$, which is normally distributed with mean $0$ and variance

$$\sigma^2\left(1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Proof - The expected value of the predictive error is zero since

$$E[(a + bX^*) - Y^*] = E[a] + E[b]X^* - E[\alpha + \beta X^* + \epsilon^*] = \alpha + \beta X^* - \alpha - \beta X^* - E[\epsilon^*] = 0.$$

Both quantities $a$ and $b$ are linear combinations of the $Y_j$ variables, and so

$$(a + bX^*) - Y^* = a + bX^* - \alpha - \beta X^* - \epsilon^* = (-\alpha - \beta X^*) + \big(\text{a linear combination of } Y_1, Y_2, \dots, Y_n, \epsilon^*\big).$$

All $(n+1)$ of the variables $Y_1, Y_2, \dots, Y_n, \epsilon^*$ are independent and have normal distributions. As $(-\alpha - \beta X^*)$ is a constant, it follows from the above that $(a + bX^*) - Y^*$ has a normal distribution. Finally, to calculate the variance, we again rewrite $a + bX^*$ in point-slope form and exploit independence:

$$\begin{aligned}
\mathrm{Var}[(a + bX^*) - Y^*] &= \mathrm{Var}\big[\bar{Y} + b(X^* - \bar{X}) - (\alpha + \beta X^* + \epsilon^*)\big]
= \mathrm{Var}[\bar{Y}] + \mathrm{Var}[b]\,(X^* - \bar{X})^2 + \mathrm{Var}[\epsilon^*]\\
&= \frac{\sigma^2}{n} + \frac{\sigma^2}{(n-1)S_X^2}(X^* - \bar{X})^2 + \sigma^2
= \sigma^2\left(1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).
\end{aligned}$$

Example 10.5.3. A mathematics professor at a large university is studying the relationship between scores on a preparation assessment quiz students take on the first day of class and their actual percentage score at the end of the class. Assuming the simple linear model with $\sigma = 6$, he takes a random sample of 30 students and discovers their average score on the quiz is $\bar{X} = 54$ with a sample standard deviation of $S_X = 12$, while the average percentage score in the class is $\bar{Y} = 68$ with a sample standard deviation of $S_Y = 10$. The sample correlation is $r[X, Y] = 0.6$. So, according to the results above, the least squares line for predicting the course percentage from the preliminary quiz will be $y = 0.5x + 41$.

If we wish to use the line to predict the course percentage for someone who scores a 54 on the preliminary quiz, we would find $y = 0.5(54) + 41 = 68$, as expected, since someone who gets an average score on the quiz is likely to get around the average percentage in the class. Similarly, if we wish to use the line to predict the course percentage for someone who scores an 80 on the preliminary quiz, we would find $y = 0.5(80) + 41 = 81$. Also not surprising: due to the positive correlation, a student scoring above average on the quiz is also likely to score higher in the course as well.

The previous theorem allows us to go further and calculate a standard deviation associated with these estimates. For the student who scores a 54 on the preliminary quiz, let $Y^*$ be the actual course percentage and let $a + bX^* = 68$ be the least squares line estimate we made above. Then

$$\mathrm{Var}[a + bX^* - Y^*] = 36\left(1 + \frac{1}{30} + 0\right) = 37.2,$$

and so the standard deviation in the predictive error is $\mathrm{SD}[a + bX^* - Y^*] \approx 6.1$. This means that students who make an average score of 54 on the preliminary quiz will have a range of percentages in the course. This range will have a normal distribution with mean 68 and standard deviation 6.1. We could then use normal curve computations to make further predictions about how likely such a student may be to reach a certain benchmark.

Next take the example of a student who scores 80 on the preliminary quiz. The least squares line predicts the course percentage for such a student will be $a + bX^* = 81$, but now

$$\mathrm{Var}[a + bX^* - Y^*] = 36\left(1 + \frac{1}{30} + \frac{(80 - 54)^2}{29 \cdot 12^2}\right) \approx 43.0,$$

and so $\mathrm{SD}[a + bX^* - Y^*] \approx 6.6$. Students who score an 80 on the preliminary exam will have a range of course percentages with a normal distribution of mean 81 and standard deviation 6.6.

Thinking of the standard deviation as the likely error associated with a prediction, this example suggests that predictions for data further from the mean will tend to have less accuracy than predictions near the mean. This is true in the simple linear model and will be explored in the exercises.
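The two variance formulas of Theorems 10.5.1 and 10.5.2 are easy to package as small helper functions. In the R sketch below the function names are our own, and the inputs are the numbers of Example 10.5.3; it reproduces the values 37.2 and roughly 43.0 computed above.

    # Variance formulas from Theorems 10.5.1 and 10.5.2, applied to the
    # numbers of Example 10.5.3. Function names are ours, for illustration.
    var_line_est <- function(xstar, xbar, Sx2, n, sigma2) {
      # Var of a + b*xstar as an estimate of the point ON the regression line
      sigma2 * (1/n + (xstar - xbar)^2 / ((n - 1) * Sx2))
    }
    var_pred_err <- function(xstar, xbar, Sx2, n, sigma2) {
      # Var of the predictive error (a + b*xstar) - Y* for a NEW data point
      sigma2 * (1 + 1/n + (xstar - xbar)^2 / ((n - 1) * Sx2))
    }

    n <- 30; xbar <- 54; Sx2 <- 12^2; sigma2 <- 36
    var_line_est(54, xbar, Sx2, n, sigma2)        # 1.2 = sigma^2/n at the mean
    sqrt(var_pred_err(54, xbar, Sx2, n, sigma2))  # ~6.1
    sqrt(var_pred_err(80, xbar, Sx2, n, sigma2))  # ~6.6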

exercises

Ex. 10.5.1. Using the figures from Example 10.5.3, do the following. Two students are selected independently at random. The first scored a 50 on the preliminary quiz while the second scored 60. Determine how likely it is that the student who scored the lower grade on the quiz will score a higher percentage in the course.

Ex. 10.5.2. Explain why $\mathrm{Var}[a + bX^* - Y^*]$ is minimized when $X^* = \bar{X}$.

10.6 hypothesis testing and regression

As $a$ and $b$ both have a normal distribution under the assumption of the simple linear model, it is possible to perform tests of significance concerning the values of $\alpha$ and $\beta$. Of particular importance is a test with a null hypothesis that $\beta = 0$ and an alternate hypothesis $\beta \ne 0$. This is commonly called a test of utility. The reason for this name is that if $\beta = 0$, then the simple linear model produces output values $Y_j = \alpha + \epsilon_j$ which do not depend on the corresponding input $X_j$. Therefore, knowing the value of $X_j$ should not be at all helpful in predicting the corresponding $Y_j$ result. However, if $\beta \ne 0$, then knowing $X_j$ should be at least somewhat useful in predicting the $Y_j$ value.

Example 10.6.1. Suppose $(X_1, Y_1), \dots, (X_{16}, Y_{16})$ follows the simple linear model with $\sigma = 5$ and produces a least squares line $y = 0.3 + 1.1x$. Suppose the sample average of the $X_j$ data is 20 and the sample variance is $S_X^2 = 10$. What is the conclusion of a test of utility at a significance level of $\alpha = 0.05$?

From the given least squares line, $b = 1.1$. As noted above, a test of utility compares a null hypothesis that $\beta = 0$ to an alternate hypothesis $\beta \ne 0$, so this will be a two-tailed test. If the null were true, then $E[b] = 0$ and we can use the normal distribution to determine whether the 1.1 value is so far from zero that the null seems unreasonable. Using the same sample-mimicking idea introduced in Chapter 9, we let $Z_1, \dots, Z_{16}$ be random variables produced from $X_1, \dots, X_{16}$ via the simple linear model. From Theorem 10.4.1, the slope of the least squares line for the $(X_1, Z_1), \dots, (X_{16}, Z_{16})$ data has a normal distribution with mean $\beta = 0$ and variance $\sigma^2/\big((n-1)S_X^2\big) = 25/(15 \cdot 10) = 1/6$. Therefore we can calculate

$$P\big(|\text{slope of the least squares line}| \ge 1.1\big) = P\left(|Z| \ge \frac{1.1}{\sqrt{1/6}}\right) = 2P\left(Z < -\frac{1.1}{\sqrt{1/6}}\right) \approx 0.007,$$

where $Z \sim \mathrm{Normal}(0, 1)$. As this P-value is less than the significance level, the test rejects the null hypothesis. That is, the test concludes that the slope of 1.1 is far enough from 0 that it demonstrates a true relationship between the $X_j$ input values and the $Y_j$ output values.

exercises

Ex. 10.6.1. Continuing with Example 10.6.1, use Theorem 10.4.3 to devise a hypothesis test for determining whether or not the regression line goes through the origin. That is, determine whether or not $\alpha = 0$ is a plausible assumption.
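The P-value computation of Example 10.6.1 above is a one-liner in R; the sketch below simply restates the example's arithmetic.

    # Two-sided P-value for the test of utility in Example 10.6.1:
    # under H0: beta = 0, the slope b is Normal(0, sigma^2 / ((n-1) * Sx2)).
    n <- 16; sigma <- 5; Sx2 <- 10; b <- 1.1
    se_b <- sqrt(sigma^2 / ((n - 1) * Sx2))   # sqrt(1/6)
    2 * pnorm(-abs(b) / se_b)                 # ~0.007 < 0.05, so reject H0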

10.7 estimating an unknown σ²

In many cases the variance $\sigma^2$ of the points around the regression line will be an unknown quantity and so, like $\alpha$ and $\beta$, it too will need to be approximated using the $(X_1, Y_1), \dots, (X_n, Y_n)$ data. The following theorem provides an unbiased estimator for $\sigma^2$ using the data.

Theorem 10.7.1. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be data following the simple linear model with $n > 2$. Let

$$S^2 = \frac{1}{n-2}\sum_{j=1}^{n}\big(Y_j - (a + bX_j)\big)^2.$$

Then $S^2$ is an unbiased estimator for $\sigma^2$. (That is, $E[S^2] = \sigma^2$.)

Proof - Before looking at $E[S^2]$ in its entirety, we look at three quantities that will be helpful in computing this expected value. First note that

$$\begin{aligned}
\mathrm{Var}[Y_j - \bar{Y}] &= \mathrm{Var}\left[Y_j - \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n)\right]
= \frac{1}{n^2}\,\mathrm{Var}\left[(n-1)Y_j - \sum_{i=1, i\ne j}^{n} Y_i\right]\\
&= \frac{1}{n^2}\left[(n-1)^2\sigma^2 + \sum_{i=1, i\ne j}^{n}\sigma^2\right]
= \frac{1}{n^2}\big[(n-1)^2\sigma^2 + (n-1)\sigma^2\big]
= \frac{n-1}{n}\,\sigma^2,
\end{aligned}$$

and therefore

$$E[(Y_j - \bar{Y})^2] = \mathrm{Var}[Y_j - \bar{Y}] + \big(E[Y_j - \bar{Y}]\big)^2
= \frac{n-1}{n}\sigma^2 + \big((\alpha + \beta X_j) - (\alpha + \beta\bar{X})\big)^2
= \frac{n-1}{n}\sigma^2 + \beta^2(X_j - \bar{X})^2.$$

Summing over $j$ gives

$$E\left[\sum_{j=1}^{n}(Y_j - \bar{Y})^2\right] = (n-1)\sigma^2 + \beta^2\sum_{j=1}^{n}(X_j - \bar{X})^2 = (n-1)\sigma^2 + \beta^2(n-1)S_X^2. \tag{10.7.1}$$

Next,

$$E\left[b^2\sum_{j=1}^{n}(X_j - \bar{X})^2\right] = E[b^2]\sum_{j=1}^{n}(X_j - \bar{X})^2
= \big(\mathrm{Var}[b] + (E[b])^2\big)(n-1)S_X^2
= \left(\frac{\sigma^2}{(n-1)S_X^2} + \beta^2\right)(n-1)S_X^2
= \sigma^2 + \beta^2(n-1)S_X^2. \tag{10.7.2}$$

Also,

$$E[bY_j] = \mathrm{Cov}[b, Y_j] + E[b]E[Y_j]
= \mathrm{Cov}\left[\sum_{i=1}^{n}\frac{X_i - \bar{X}}{(n-1)S_X^2}\,Y_i,\ Y_j\right] + \beta(\alpha + \beta X_j)
= \frac{X_j - \bar{X}}{(n-1)S_X^2}\,\sigma^2 + \beta(\alpha + \beta X_j),$$

since $\mathrm{Cov}[Y_i, Y_j] = 0$ for $i \ne j$ and $\mathrm{Cov}[Y_j, Y_j] = \mathrm{Var}[Y_j] = \sigma^2$. From this, together with $E[b\bar{Y}] = E[b]E[\bar{Y}] = \beta(\alpha + \beta\bar{X})$ (using the independence of $b$ and $\bar{Y}$ from Lemma 10.4.2), we may determine that

$$\begin{aligned}
E\left[\sum_{j=1}^{n}(Y_j - \bar{Y})\,b\,(X_j - \bar{X})\right]
&= \sum_{j=1}^{n}(X_j - \bar{X})E[bY_j] - \sum_{j=1}^{n}(X_j - \bar{X})E[b\bar{Y}]\\
&= \sum_{j=1}^{n}(X_j - \bar{X})\left(\frac{X_j - \bar{X}}{(n-1)S_X^2}\,\sigma^2 + \beta(\alpha + \beta X_j)\right) - \sum_{j=1}^{n}(X_j - \bar{X})\,\beta(\alpha + \beta\bar{X})\\
&= \frac{\sigma^2}{(n-1)S_X^2}\sum_{j=1}^{n}(X_j - \bar{X})^2 + \beta^2\sum_{j=1}^{n}(X_j - \bar{X})^2
= \sigma^2 + \beta^2(n-1)S_X^2. \tag{10.7.3}
\end{aligned}$$

Finally, putting together the results from equations (10.7.1), (10.7.2), and (10.7.3), and using $a + bX_j = \bar{Y} + b(X_j - \bar{X})$ (which follows from (10.3.3)), we find

$$\begin{aligned}
E\left[\sum_{j=1}^{n}\big(Y_j - (a + bX_j)\big)^2\right]
&= E\left[\sum_{j=1}^{n}\big((Y_j - \bar{Y}) - b(X_j - \bar{X})\big)^2\right]\\
&= E\left[\sum_{j=1}^{n}(Y_j - \bar{Y})^2\right] - 2E\left[\sum_{j=1}^{n}(Y_j - \bar{Y})\,b\,(X_j - \bar{X})\right] + E\left[b^2\sum_{j=1}^{n}(X_j - \bar{X})^2\right]\\
&= \big((n-1)\sigma^2 + \beta^2(n-1)S_X^2\big) - 2\big(\sigma^2 + \beta^2(n-1)S_X^2\big) + \big(\sigma^2 + \beta^2(n-1)S_X^2\big)\\
&= (n-2)\sigma^2.
\end{aligned}$$

Hence

$$E[S^2] = E\left[\frac{1}{n-2}\sum_{j=1}^{n}\big(Y_j - (a + bX_j)\big)^2\right] = \sigma^2,$$

as desired.
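Theorem 10.7.1 can also be checked by simulation: averaging $S^2$ over many simulated data sets should land close to $\sigma^2$. In the R sketch below the parameter values are arbitrary illustration choices.

    # Monte Carlo check of Theorem 10.7.1: the average of S^2 over many
    # simulated data sets should be close to sigma^2.
    set.seed(3)
    alpha <- 1; beta <- 2; sigma <- 3
    x <- runif(20, 0, 5); n <- length(x)

    S2 <- replicate(10000, {
      y   <- alpha + beta * x + rnorm(n, 0, sigma)
      fit <- lm(y ~ x)
      sum(resid(fit)^2) / (n - 2)    # S^2 as defined in Theorem 10.7.1
    })
    mean(S2)                         # close to sigma^2 = 9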
