Chapter 1 Linear Regression with One Predictor

STAT 525, Fall 2018
Chapter 1: Linear Regression with One Predictor
Professor Min Zhang

Goals of Regression Analysis
Regression analysis serves three purposes:
- Describe an association between X and Y
  - In some applications, the choice of which variable is X and which is Y can be arbitrary
  - Association generally does not imply causality
- In experimental settings, help select X to control Y at the desired level
- Predict a future value of Y at a specific value of X
  - Always need to consider the scope of the model

Example: Leaning Tower of Pisa
- Annual measurements of its lean are available
- Lean is recorded in tenths of a millimeter in excess of 2.9 meters
- Prior to recent repairs, its lean was increasing over time
- Goals: characterize the lean over time and predict future observations

The Data Set

 Obs  year  lean
   1    75   642
   2    76   644
   3    77   656
   4    78   667
   5    79   673
   6    80   688
   7    81   696
   8    82   698
   9    83   713
  10    84   717
  11    85   725
  12    86   742
  13    87   757

Data taken from Exercise 10.8, p. 698, in Moore and McCabe, Introduction to the Practice of Statistics, 3rd ed.

The Data and Relationship
- Response/dependent variable: lean (Y)
- Explanatory/independent variable: year (X)
- Lean is observed from 1975 to 1987
- Is there a relationship between Y and X?

To Generate a Scatterplot in SAS
The extra input pair (year 102, lean missing) is included so that a prediction for the year 2002 can be obtained later.

DATA a1;
  INPUT year lean @@;
  CARDS;
75 642  76 644  77 656  78 667  79 673  80 688  81 696
82 698  83 713  84 717  85 725  86 742  87 757  102 .
;

PROC PRINT DATA=a1;
  WHERE lean NE .;
RUN;

SYMBOL1 V=CIRCLE I=SM70;
PROC GPLOT DATA=a1;
  PLOT lean*year / FRAME;
  WHERE lean NE .;
RUN;

What is the Trend?
You should always plot the data first!

Linear Trend?

SYMBOL1 V=CIRCLE I=rl;
PROC GPLOT DATA=a1;
  PLOT lean*year / FRAME;
  WHERE lean NE .;
RUN;
QUIT;
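On SAS installations with ODS Graphics, essentially the same two plots can be drawn with PROC SGPLOT instead of SYMBOL/GPLOT. This is only an alternative sketch, not part of the original slides:

* Alternative sketch (not from the slides): ODS Graphics versions of the two plots;
PROC SGPLOT DATA=a1;
  WHERE lean NE .;
  SCATTER X=year Y=lean;   * raw data points, analogous to V=CIRCLE;
  LOESS X=year Y=lean;     * smoothed trend, analogous to I=SM70;
RUN;

PROC SGPLOT DATA=a1;
  WHERE lean NE .;
  REG X=year Y=lean;       * points plus least squares line, analogous to I=rl;
RUN;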

Straight Line Equation
- A straight line appears to describe the trend well
- Formula for a straight line: E[Y] = β0 + β1X
  - β0 is the intercept
  - β1 is the slope
- Need to estimate β0 and β1, i.e., determine their plausible values from the data
- Will use the method of least squares

Simple Linear Regression Model

Yi = β0 + β1Xi + εi

- β0 is the intercept
- β1 is the slope
- εi is the ith random error term
  - Mean 0: E(εi) = 0
  - Variance σ²: Var(εi) = σ²
  - Uncorrelated: Cov(εi, εj) = 0 for i ≠ j

Features of the Model
- Yi = deterministic term + random term
  - deterministic term is β0 + β1Xi
  - random term is εi
- Implies Yi is a random variable
  - E(Yi) = β0 + β1Xi + 0, so E(Y) = β0 + β1X (the underlying relationship)
  - Var(Yi) = 0 + σ², so the variance is the same regardless of Xi
  - Cov(Yi, Yj) = Cov(εi, εj) = 0 for i ≠ j

Estimation of Regression Function
- Consider the deviation of Yi from E(Yi): Yi - (β0 + β1Xi)
- Method of least squares: find estimators of β0, β1 which minimize

  Q = Σ [Yi - (β0 + β1Xi)]²   (sum over i = 1, ..., n)

- Deviations can be positive or negative; squaring them makes every contribution positive
- The calculus of the solution is shown on pages 17-18
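For reference, the calculus step the slide defers to pages 17-18 consists of setting the two partial derivatives of Q to zero; in LaTeX notation (a standard derivation, not reproduced on the slide):

\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\bigl(Y_i-\beta_0-\beta_1 X_i\bigr)=0,
\qquad
\frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n}X_i\bigl(Y_i-\beta_0-\beta_1 X_i\bigr)=0,

which rearrange into the normal equations

\sum_i Y_i = n\,b_0 + b_1\sum_i X_i,
\qquad
\sum_i X_i Y_i = b_0\sum_i X_i + b_1\sum_i X_i^2,

whose solution is the (b0, b1) given on the next two slides.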

Estimating the Slope
- β1 is the true, unknown slope; it defines the change in E(Y) for a change in X:
  β1 = ΔE(Y)/ΔX, i.e., ΔE(Y) = β1 ΔX
- b1 is the least squares estimate of β1:

  b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²   (sums over i = 1, ..., n)

- When will b1 be negative?

Estimating the Intercept
- β0 is the true, unknown intercept; it defines E(Y) when X = 0: E(Y) = β0 + β1·0 = β0
- Usually not of interest in itself (scope of the model)
- b0 is the least squares estimate of β0:

  b0 = Ȳ - b1X̄

- The fitted line passes through (X̄, Ȳ)

Properties of Estimates
By the Gauss-Markov theorem, these least squares estimators
- are unbiased: E(bl) = βl, l = 0, 1
- have minimum variance among all unbiased linear estimators
In other words, these estimates are the most precise among all estimators of the form bl = Σ kiYi with E(bl) = βl.

Estimated Regression Line
- Using the estimated parameters, the fitted regression line is Ŷi = b0 + b1Xi, where Ŷi is the estimated value at Xi
- The fitted value Ŷi is also an estimate of the mean response E[Yi]
- By an extension of the Gauss-Markov theorem, E(Ŷi) = E(Yi) and Ŷi has minimum variance among linear unbiased estimators

Example: Leaning Tower of Pisa
Based on the following table:
1. Obtain the least squares estimates of β0 and β1
2. State the regression function
3. Obtain a point estimate of the lean for the year 2002 (X = 102)
4. State the expected change in lean over two years

    X     Y   X - X̄    Y - Ȳ    (X - X̄)(Y - Ȳ)   (X - X̄)²
   75   642     -6   -51.6923        310.1538         36
   76   644     -5   -49.6923        248.4615         25
   77   656     -4   -37.6923        150.7692         16
   78   667     -3   -26.6923         80.0769          9
   79   673     -2   -20.6923         41.3846          4
   80   688     -1    -5.6923          5.6923          1
   81   696      0     2.3077          0.0000          0
   82   698      1     4.3077          4.3077          1
   83   713      2    19.3077         38.6154          4
   84   717      3    23.3077         69.9231          9
   85   725      4    31.3077        125.2308         16
   86   742      5    48.3077        241.5385         25
   87   757      6    63.3077        379.8462         36
 1053  9018      0     0.0000       1696.0000        182

Answers
1. Obtain the least squares estimates of β0 and β1:
   b1 = 1696/182 = 9.3187
   b0 = 9018/13 - 9.3187 × (1053/13) = 693.6923 - 754.8147 = -61.1224
2. State the regression function:
   Ŷi = -61.1224 + 9.3187Xi
3. Obtain a point estimate for the year 2002 (X = 102):
   Ŷ at X = 102 is -61.1224 + 9.3187(102) = 889.3850
4. State the expected change in lean over two years:
   Since the slope is 9.3187, a two-unit increase in X results in a 2 × 9.3187 = 18.6374 increase in lean (in tenths of a millimeter)
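The hand computation above can also be reproduced in SAS without PROC REG. The following is only an illustrative sketch (the data set names means, devs, sums, estimates and the variables xbar, ybar, sxy, sxx, ssxy, ssxx, b1, b0 are not from the slides); it rebuilds the deviation sums from the table and then forms the estimates:

* Illustrative check of the hand computation (not from the slides);
PROC MEANS DATA=a1 NOPRINT;
  WHERE lean NE .;
  VAR year lean;
  OUTPUT OUT=means MEAN(year)=xbar MEAN(lean)=ybar;   * xbar = 81, ybar = 693.6923;
RUN;

DATA devs;
  IF _N_ = 1 THEN SET means;                 * carry xbar and ybar along each row;
  SET a1(WHERE=(lean NE .));
  sxy = (year - xbar)*(lean - ybar);         * (Xi - Xbar)(Yi - Ybar);
  sxx = (year - xbar)**2;                    * (Xi - Xbar) squared;
RUN;

PROC MEANS DATA=devs NOPRINT;
  VAR sxy sxx;
  OUTPUT OUT=sums SUM(sxy)=ssxy SUM(sxx)=ssxx;        * should give 1696 and 182;
RUN;

DATA estimates;
  MERGE sums means;
  b1 = ssxy/ssxx;             * slope estimate, about 9.3187;
  b0 = ybar - b1*xbar;        * intercept estimate, about -61.1224;
RUN;

PROC PRINT DATA=estimates;
  VAR b1 b0;
RUN;

The printed b1 and b0 should match both the hand computation and the PROC REG output shown later in the chapter.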

Residuals
- The residual is the difference between the observed and fitted value: ei = Yi - Ŷi
- This is not the error term εi = Yi - E(Yi)
- The residual ei is observable, while εi is not
- Residuals are highly useful for assessing the appropriateness of the model

Properties of Residuals
(1) Σ ei = 0
(2) Σ ei² is minimized
(3) Σ Yi = Σ Ŷi
(4) Σ Xiei = 0
(5) Σ Ŷiei = 0
These properties follow directly from the least squares criterion and the normal equations (pp. 23-24).
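To make the link explicit (this step is not spelled out on the slide), properties (1) and (4) are the two normal equations evaluated at (b0, b1), and the others follow from them; in LaTeX notation,

\sum_i e_i = \sum_i \bigl(Y_i - b_0 - b_1 X_i\bigr) = 0,
\qquad
\sum_i \hat{Y}_i e_i = b_0\sum_i e_i + b_1\sum_i X_i e_i = 0,

and property (3) follows from (1) because Σ Yi - Σ Ŷi = Σ ei = 0.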

Estimation of Error Variance
- In a single population (i.e., ignoring X):
  s² = Σ(Yi - Ȳ)² / (n - 1)
  - Unbiased estimate of σ²
  - One df is lost by using Ȳ in place of µ
- In the regression model:
  s² = Σ(Yi - Ŷi)² / (n - 2)
  - Unbiased estimate of σ²
  - Two df are lost by using (b0, b1) in place of (β0, β1)
  - Also known as the mean square error (MSE)

PROC REG in SAS: Leaning Tower of Pisa

PROC REG DATA=a1;
  MODEL lean=year / CLB P R;
  OUTPUT OUT=a2 P=pred R=resid;
  ID year;
RUN;

Analysis of Variance

                          Sum of        Mean
Source            DF     Squares      Square    F Value    Pr > F
Model              1       15804       15804     904.12    <.0001
Error             11   192.28571    17.48052
Corrected Total   12       15997

Root MSE          4.18097    R-Square    0.9880
Dependent Mean  693.69231    Adj R-Sq    0.9869
Coeff Var         0.60271

Parameter Estimates

                     Parameter    Standard
Variable     DF       Estimate       Error   t Value   Pr > |t|      95% Confidence Limits
Intercept     1      -61.12088    25.12982     -2.43     0.0333    -116.43124     -5.81052
year          1        9.31868     0.30991     30.07     <.0001       8.63656     10.00080
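The error variance estimate from the previous slide can be read directly off this ANOVA table; in LaTeX notation,

s^2 = \mathrm{MSE} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n-2} = \frac{192.28571}{11} = 17.48052,
\qquad
\text{Root MSE} = \sqrt{17.48052} = 4.18097,

so Root MSE is the estimated standard deviation of the error terms.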

Output Statistics

                  Dep Var   Predicted     Std Error                Std Error
Obs   year           lean       Value  Mean Predict   Residual     Residual
  1     75       642.0000    637.7802        2.1914     4.2198        3.561
  2     76       644.0000    647.0989        1.9354    -3.0989        3.706
  3     77       656.0000    656.4176        1.6975    -0.4176        3.821
  4     78       667.0000    665.7363        1.4863     1.2637        3.908
  5     79       673.0000    675.0549        1.3149    -2.0549        3.969
  6     80       688.0000    684.3736        1.2003     3.6264        4.005
  7     81       696.0000    693.6923        1.1596     2.3077        4.017
  8     82       698.0000    703.0110        1.2003    -5.0110        4.005
  9     83       713.0000    712.3297        1.3149     0.6703        3.969
 10     84       717.0000    721.6484        1.4863    -4.6484        3.908
 11     85       725.0000    730.9670        1.6975    -5.9670        3.821
 12     86       742.0000    740.2857        1.9354     1.7143        3.706
 13     87       757.0000    749.6044        2.1914     7.3956        3.561
 14    102              .    889.3846        6.6107          .            .

Plot of the residuals against year:

PROC GPLOT DATA=a2;
  PLOT resid*year / FRAME VREF=0;
  WHERE lean NE .;
RUN;
QUIT;
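The residual properties listed earlier can also be checked numerically from the OUTPUT data set a2 created by PROC REG above. This is only an illustrative sketch (the data set name check and the variables xe and ye are not from the slides):

* Illustrative check (not from the slides) of the least squares residual properties;
DATA check;
  SET a2(WHERE=(lean NE .));    * drop the year 102 row, which has no residual;
  xe = year*resid;              * Xi times ei;
  ye = pred*resid;              * fitted value times ei;
RUN;

PROC MEANS DATA=check SUM;
  VAR resid xe ye;              * each sum should be zero up to rounding;
RUN;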

Normal Error Regression Model

Yi = β0 + β1Xi + εi,   εi iid N(0, σ²)

- β0 is the intercept
- β1 is the slope
- εi is the ith random error term
  - εi ~ N(0, σ²)  (NEW assumption)
  - Uncorrelated error terms are strengthened to independent error terms
- Defines the distribution of the random variable Y: the Yi are independently N(β0 + β1Xi, σ²)

Comments
- The least squares estimates are unbiased even without the normality assumption
- The normality assumption greatly simplifies the theory of the analysis
- The normality assumption makes it easy to construct confidence intervals and perform hypothesis tests
- Most inferences are sensitive only to large departures from normality
- See pages 26-27 for more details

Maximum Likelihood Estimation
- The normality assumption gives us more choices of methods for parameter estimation
- Yi ~ N(β0 + β1Xi, σ²) with density

  fi = (1/√(2πσ²)) exp{ -(Yi - β0 - β1Xi)² / (2σ²) }

- Likelihood function: L = f1·f2···fn (i.e., the joint probability distribution of the observations, viewed as a function of the parameters)
- Find the β0, β1, and σ² which maximize L
- Obtain similar estimators b0 and b1 for β0 and β1, but a slightly different estimator for σ² (see HW#1)
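A brief sketch of why the ML and least squares estimates of β0 and β1 agree (a standard argument, not shown on the slide): the log-likelihood is

\log L(\beta_0,\beta_1,\sigma^2)
= -\frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(Y_i-\beta_0-\beta_1 X_i\bigr)^2 ,

so for any fixed σ², maximizing log L over (β0, β1) is the same as minimizing the least squares criterion Q, and the ML estimates of the regression coefficients equal b0 and b1. Only the estimator of σ² differs from the MSE; the comparison is the subject of HW#1.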

Chapter Review
- Description of the linear regression model
- Least squares and parameter estimation
- Fitted regression line
- Normality assumption
- PROC REG in SAS: first touch