Chapter 1 Linear Regression with One Predictor


1 STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang

2 Goals of Regression Analysis
Regression serves three purposes:
- Describe an association between X and Y
  - In some applications, the choice of which variable is X and which is Y can be arbitrary
  - Association generally does not imply causality
- In experimental settings, help select X to control Y at the desired level
- Predict a future value of Y at a specific value of X
  - Always consider the scope of the model

3 Example: Leaning Tower of Pisa
- Annual measurements of its lean are available
- Measured in tenths of a mm in excess of 2.9 meters
- Prior to recent repairs, its lean was increasing over time
- Goals:
  - To characterize lean over time
  - To predict future observations

4 The Data Set
Obs   year   lean
Data taken from Exercise 10.8, p. 698 in Moore and McCabe, Introduction to the Practice of Statistics, 3rd ed.

5 The Data and Relationship
- Response/Dependent variable: lean (Y)
- Explanatory/Independent variable: year (X)
- Observe lean from [...]
- Is there a relationship between Y and X?

6 To Generate a Scatterplot in SAS
    DATA a1;
      INPUT year lean;
      CARDS;
      [data lines not preserved]
    ;
    PROC PRINT DATA=a1;
      WHERE lean NE .;
    RUN;
    SYMBOL1 V=CIRCLE I=SM70;
    PROC GPLOT DATA=a1;
      PLOT lean*year / FRAME;
      WHERE lean NE .;
    RUN;

7 What is the Trend?
Always plot the data first!

8 Linear Trend?
    SYMBOL1 V=CIRCLE I=rl;
    PROC GPLOT DATA=a1;
      PLOT lean*year / FRAME;
      WHERE lean NE .;
    RUN;
    QUIT;

9 Straight Line Equation
- A straight line describes the trend well
- Formula for a straight line: $E[Y] = \beta_0 + \beta_1 X$
  - $\beta_0$ is the intercept
  - $\beta_1$ is the slope
- Need to estimate $\beta_0$ and $\beta_1$, i.e. determine their plausible values from the data
- Will use the method of least squares

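10 Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- $\beta_0$ is the intercept
- $\beta_1$ is the slope
- $\varepsilon_i$ is the $i$th random error term
  - Mean 0: $E(\varepsilon_i) = 0$
  - Variance $\sigma^2$: $\mathrm{Var}(\varepsilon_i) = \sigma^2$
  - Uncorrelated: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, $i \neq j$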

11 Features of the Model
- $Y_i$ = deterministic term + random term
  - The deterministic term is $\beta_0 + \beta_1 X_i$
  - The random term is $\varepsilon_i$
- Implies $Y_i$ is a random variable
  - $E(Y_i) = \beta_0 + \beta_1 X_i + 0$, so $E(Y) = \beta_0 + \beta_1 X$ (underlying relationship)
  - $\mathrm{Var}(Y_i) = 0 + \sigma^2$, so the variance is the same regardless of $X_i$
  - $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$, $i \neq j$

12 Estimation of Regression Function
- Consider the deviation of $Y_i$ from $E(Y_i)$: $Y_i - (\beta_0 + \beta_1 X_i)$
- Method of least squares: find estimators of $\beta_0, \beta_1$ which minimize
  $Q = \sum_{i=1}^{n} [Y_i - (\beta_0 + \beta_1 X_i)]^2$
- Deviations can be positive or negative
- Square the deviations so every contribution is positive
- Calculus of solutions shown on pages [...]
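
The calculus step referred to on this slide can be sketched as follows. This is the standard derivation of the normal equations, written out here for convenience rather than copied from the slides; $b_0$ and $b_1$ denote the minimizing values of $\beta_0$ and $\beta_1$.

```latex
\begin{aligned}
\frac{\partial Q}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \left[ Y_i - (\beta_0 + \beta_1 X_i) \right], \\
\frac{\partial Q}{\partial \beta_1} &= -2 \sum_{i=1}^{n} X_i \left[ Y_i - (\beta_0 + \beta_1 X_i) \right].
\end{aligned}
% Setting both partial derivatives to zero at (b_0, b_1) gives the normal equations:
\begin{aligned}
\sum_{i=1}^{n} Y_i &= n\, b_0 + b_1 \sum_{i=1}^{n} X_i, \\
\sum_{i=1}^{n} X_i Y_i &= b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 .
\end{aligned}
```

Solving the first equation for $b_0$ gives $b_0 = \bar{Y} - b_1 \bar{X}$; substituting into the second gives the slope as a ratio of centered sums.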

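13 Estimating the Slope
- $\beta_1$ is the true unknown slope
  - Defines the change in $E(Y)$ for a change in $X$
  - $\beta_1 = \dfrac{\Delta E(Y)}{\Delta X}$, equivalently $\Delta E(Y) = \beta_1 \Delta X$
- $b_1$ is the least squares estimate of $\beta_1$
  $b_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
- When will $b_1$ be negative?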

14 Estimating the Intercept
- $\beta_0$ is the true unknown intercept
  - Defines $E(Y)$ when $X = 0$: $E(Y) = \beta_0 + \beta_1 \cdot 0 = \beta_0$
  - Usually not of interest (scope of model)
- $b_0$ is the least squares estimate of $\beta_0$
  $b_0 = \bar{Y} - b_1 \bar{X}$
- The fitted line goes through $(\bar{X}, \bar{Y})$
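
As a small illustration of the slope and intercept formulas, the sketch below computes $b_1$ and $b_0$ directly from the centered sums. It is written in Python rather than the SAS used in the course, and the five data points are hypothetical toy values, not the Pisa measurements; the function name `least_squares` is likewise just an illustrative choice.

```python
# Hypothetical toy data (NOT the Leaning Tower of Pisa measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
# The fitted line evaluated at x_bar reproduces y_bar.
print(f"fitted value at x_bar = {b0 + b1 * x_bar:.4f}, y_bar = {y_bar:.4f}")
```

The sign of `b1` is the sign of the numerator `sxy`, so the slope estimate is negative exactly when above-average X values tend to pair with below-average Y values.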

15 Properties of Estimates
Under the Gauss-Markov theorem, these least squares estimators
- Are unbiased: $E(b_l) = \beta_l$, $l = 0, 1$
- Have minimum variance among all unbiased linear estimators
In other words, these estimates are the most precise among all estimators of the form $b_l = \sum k_i Y_i$ with $E(b_l) = \beta_l$
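
To see why $b_1$ counts as a linear estimator and why it is unbiased, the following sketch writes it in the form $\sum k_i Y_i$. The weights $k_i$ are the standard ones for the least squares slope; the derivation is added here for illustration and treats the $X_i$ as fixed constants.

```latex
b_1 = \sum_{i=1}^{n} k_i Y_i,
\qquad
k_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2},
\qquad
\sum_{i=1}^{n} k_i = 0,
\quad
\sum_{i=1}^{n} k_i X_i = 1.

% Unbiasedness uses only E(Y_i) = \beta_0 + \beta_1 X_i:
E(b_1) = \sum_{i=1}^{n} k_i \,(\beta_0 + \beta_1 X_i)
       = \beta_0 \sum_{i=1}^{n} k_i + \beta_1 \sum_{i=1}^{n} k_i X_i
       = \beta_1 .

% The variance uses constant variance and uncorrelated errors:
\mathrm{Var}(b_1) = \sigma^2 \sum_{i=1}^{n} k_i^2
                  = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} .
```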

16 Estimated Regression Line
- Using the estimated parameters, the fitted regression line is
  $\hat{Y}_i = b_0 + b_1 X_i$
  where $\hat{Y}_i$ is the estimated value at $X_i$
- The fitted value $\hat{Y}_i$ is also an estimate of the mean response $E[Y_i]$
- Extension of the Gauss-Markov theorem:
  - $E(\hat{Y}_i) = E(Y_i)$
  - $\hat{Y}_i$ has minimum variance among linear unbiased estimators

17 Example: Leaning Tower of Pisa
Based on the following table
1. Obtain the least squares estimates of $\beta_0$ and $\beta_1$
2. State the regression function
3. Obtain a point estimate for the year 2002 (X = 102)
4. State the expected change in lean over two years

18
$X$   $Y$   $X - \bar{X}$   $Y - \bar{Y}$   $(X - \bar{X})(Y - \bar{Y})$   $(X - \bar{X})^2$

19 Answers
1. Obtain the least squares estimates of $\beta_0$ and $\beta_1$.
   $b_1$ = [...] = [...]
   $b_0$ = [...] = [...]
2. State the regression function
   $\hat{Y}_i$ = [...] + [...] $X_i$
3. Obtain a point estimate for the year 2002 (X = 102)
   $(\hat{Y} \mid X = 102)$ = [...] + [...](102) = [...]
4. State the expected change in lean over two years
   Since the slope is [...], a two-unit increase in X results in a [...] = [...] increase in lean

20 Residuals
- The residual is the difference between the observed and fitted value
  $e_i = Y_i - \hat{Y}_i$
- This is not the error term
  $\varepsilon_i = Y_i - E(Y_i)$
- The $e_i$ is observable while $\varepsilon_i$ is not
- Residuals are highly useful in assessing the appropriateness of the model

21 Properties of Residuals
(1) $\sum e_i = 0$
(2) $\sum e_i^2$ is minimized
(3) $\sum Y_i = \sum \hat{Y}_i$
(4) $\sum X_i e_i = 0$
(5) $\sum \hat{Y}_i e_i = 0$
These properties follow directly from the least squares criterion and normal equations (pg 23-24)
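
The five properties can be checked numerically. The sketch below does so in Python (not the course's SAS) on hypothetical toy data, not the Pisa measurements; the helper name `sse` is an illustrative choice. Sums that are zero in theory come out as tiny floating-point values.

```python
# Hypothetical toy data (NOT the Leaning Tower of Pisa measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Least squares fit from the centered-sum formulas.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]


def sse(a0, a1):
    """Sum of squared deviations of y from the line a0 + a1 * x."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))


sum_e = sum(resid)                                      # property (1)
sum_xe = sum(xi * ei for xi, ei in zip(x, resid))       # property (4)
sum_fe = sum(fi * ei for fi, ei in zip(fitted, resid))  # property (5)

print("sum of residuals:", sum_e)
print("sum of Y minus sum of fitted:", sum(y) - sum(fitted))  # property (3)
print("sum of X * e:", sum_xe)
print("sum of fitted * e:", sum_fe)
# Property (2): moving away from (b0, b1) can only increase the criterion.
print("SSE at (b0, b1):", sse(b0, b1))
print("SSE at a perturbed line:", sse(b0 + 0.1, b1 - 0.05))
```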

22 Estimation of Error Variance
- In a single population (i.e., ignoring X)
  $s^2 = \dfrac{\sum (Y_i - \bar{Y})^2}{n - 1}$
  - Unbiased estimate of $\sigma^2$
  - One df lost by using $\bar{Y}$ in place of $\mu$
- In the regression model
  $s^2 = \dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}$
  - Unbiased estimate of $\sigma^2$
  - Two df lost by using $(b_0, b_1)$ in place of $(\beta_0, \beta_1)$
  - Also known as the mean square error (MSE)
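
The two variance estimates can be compared side by side. The sketch below is in Python (not the course's SAS) and uses hypothetical toy data rather than the Pisa measurements; it computes the single-population $s^2$ with $n - 1$ degrees of freedom and the regression MSE with $n - 2$.

```python
# Hypothetical toy data (NOT the Leaning Tower of Pisa measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Single-population estimate: deviations from the sample mean, n - 1 df.
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Regression estimate: deviations from the fitted line, n - 2 df.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)

print(f"s^2 ignoring X   = {s2_y:.4f}")
print(f"SSE              = {sse:.4f}")
print(f"MSE (regression) = {mse:.4f}")
print(f"Root MSE         = {mse ** 0.5:.4f}")
```

When X explains much of the variation in Y, as in this toy example, the MSE is far smaller than the variance computed while ignoring X.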

23 PROC REG in SAS: Leaning Tower of Pisa
    PROC REG DATA=a1;
      MODEL lean=year / CLB P R;
      OUTPUT OUT=a2 P=pred R=resid;
      ID year;
    RUN;

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total

Root MSE          R-Square
Dependent Mean    Adj R-Sq
Coeff Var

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept
year                                                             <

24 Output Statistics
Obs   year   Dep Var lean   Predicted Value   Std Error Mean Predict   Residual   Std Error Residual

25
    PROC GPLOT DATA=a2;
      PLOT resid*year / FRAME VREF=0;
      WHERE lean NE .;
    RUN;
    QUIT;

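26 Normal Error Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
- $\beta_0$ is the intercept
- $\beta_1$ is the slope
- $\varepsilon_i$ is the $i$th random error term
  - $\varepsilon_i \sim N(0, \sigma^2)$ (NEW)
  - Uncorrelated $\rightarrow$ independent error terms
- Defines the distribution of the random variable Y
  $Y_i \overset{ind}{\sim} N(\beta_0 + \beta_1 X_i, \sigma^2)$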
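
One way to get a feel for this model is to simulate from it and refit the line many times. The sketch below is in Python (not the course's SAS); the "true" parameter values, the design points, the seed, and the number of replications are all arbitrary assumptions chosen for illustration. Averaged over replications, the least squares estimates should land close to the parameters used to generate the data.

```python
import random

random.seed(525)  # arbitrary seed so the run is reproducible

# Hypothetical "true" parameters and fixed design points (illustration only).
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = [float(i) for i in range(1, 11)]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)


def fit(y):
    """Least squares (b0, b1) for responses y at the fixed design points x."""
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


n_sims = 2000
b0_draws, b1_draws = [], []
for _ in range(n_sims):
    # Y_i ~ N(beta0 + beta1 * X_i, sigma^2), independently across i.
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    b0, b1 = fit(y)
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / n_sims
mean_b1 = sum(b1_draws) / n_sims
print(f"average b0 over {n_sims} simulated data sets: {mean_b0:.3f}")
print(f"average b1 over {n_sims} simulated data sets: {mean_b1:.3f}")
```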

27 Comments
- The least squares estimates are unbiased without the normality assumption
- The normality assumption greatly simplifies the theory of analysis
- The normality assumption makes it easy to construct confidence intervals / perform hypothesis tests
- Most inferences are only sensitive to large departures from normality
- See pages [...] for more details

28 Maximum Likelihood Estimation
- The assumption of normality gives us more choices of methods for parameter estimation
- $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$
  $f_i = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\dfrac{1}{2\sigma^2} (Y_i - \beta_0 - \beta_1 X_i)^2 \right\}$
- Likelihood function $L = f_1 \times f_2 \times \cdots \times f_n$ (i.e. the joint probability distribution of the observations, viewed as a function of the parameters)
- Find the $\beta_0$, $\beta_1$ and $\sigma^2$ which maximize $L$
- Obtain the same estimators $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$, but a slightly different estimator for $\sigma^2$ (see HW#1)
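
The link between maximum likelihood and least squares is easiest to see on the log scale. The following is the standard derivation under the normal error model, sketched here for reference rather than taken from the slides; $Q$ is the least squares criterion and $\hat{\sigma}^2$ denotes the maximum likelihood estimator of $\sigma^2$.

```latex
\log L = -\frac{n}{2} \log(2\pi\sigma^2)
         - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2
       = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{Q}{2\sigma^2}.

% For any fixed \sigma^2, maximizing \log L over (\beta_0, \beta_1) is the same as
% minimizing Q, so the maximizers are the least squares estimates b_0 and b_1.

% Setting the derivative with respect to \sigma^2 to zero at (b_0, b_1):
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
               = \frac{n - 2}{n}\, \mathrm{MSE}.
```

The divisor $n$ in place of $n - 2$ makes $\hat{\sigma}^2$ biased downward, although the difference from the MSE shrinks as $n$ grows.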

29 Chapter Review
- Description of Linear Regression Model
- Least Squares & Parameter Estimation
- Fitted Regression Line
- Normality Assumption
- PROC REG in SAS: First Touch

Chapter 1: Linear Regression with One Predictor Variable
Also known as: Simple Linear Regression, Bivariate Linear Regression

BSTT523: Kutner et al., Chapter 1
Introduction: Functional relation between