Simple Linear Regression: A Model for the Mean. Chap 7

An Intermediate Model (if the groups are defined by values of a numeric variable): between the Separate Means Model and the Equal Means Model sits a model in which the means fall on a straight-line function of the group values.

Meaning of the Word Regression: the statistical meaning is not related to the usual English definition of regression; the name is a misnomer, but there's a historical reason for it. It refers to finding the best-fitting (straight-line) relationship between the mean of Y and the values of the variable defining the groups (X).

[Figure: Galton's data, a scatterplot of height of son versus height of father, both axes running from about 60 to 75.]

Case Study 1: Hubble data (from the 1920's). Observational data; each point represents a nebula. Y = distance of the nebula; X = recession velocity of the nebula.

Case Study 1: don't worry about the geometry, but the Big Bang theory implies that Y = const. * X (i.e., a straight-line relationship between Y and X, with intercept 0); furthermore, the constant should be the age of the universe.

[Figure: Case Study 1, scatterplot of DISTANCE versus VELOCITY for the nebulae.]

Case Study 1: regression methods can help answer big cosmological questions. Is the Big Bang theory correct? (Is the intercept of the straight-line relationship 0?) How old is the universe? (What is the slope of the straight-line relationship?)

Case Study 2: a designed experiment. Response variable (Y): pH of steer carcass. Explanatory variable (X): log time (hours) after slaughter. What is the approximate pH of a particular steer carcass 3 hours after slaughter? About how long do you have to wait after slaughter for the pH to reach 6.0?

[Figure: Case Study 2, two scatterplots of PH (about 5.0 to 7.0), one against TIME in hours and one against log(time).]

Regression Analysis: statistical methods based on describing the distribution of values of one variable (Y, the response variable) as a function of the other variable (X, the explanatory variable). Simple linear regression: the function is a straight-line function.

Regression Analysis, specifically: the mean of Y is a straight-line function of X, µ{Y|X} = β0 + β1 X, where β0 is the intercept and β1 is the slope; the standard deviation of Y is constant, σ{Y|X} = σ; and the distribution of values of Y for any value of X is normal.

Simple Linear Regression Model. ASSUMPTIONS:
LINEARITY: the mean response has a straight-line relationship with X, µ{Y|X} = β0 + β1 X.
NORMALITY: for any given X, Y is normally distributed around µ{Y|X}.
EQUAL STANDARD DEVIATION: the standard deviation (σ) of Y is the same for all values of X.
INDEPENDENCE: observations are independent of each other.
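To make these four assumptions concrete, here is a minimal simulation sketch in Python; the parameter values and the X range are hypothetical, chosen only for illustration (they loosely echo the pH example).

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical parameters, for illustration only.
    beta0, beta1, sigma = 6.98, -0.73, 0.08   # intercept, slope, constant SD
    n = 50

    x = rng.uniform(0.0, 2.0, size=n)          # arbitrary explanatory values
    # LINEARITY: the mean of Y is beta0 + beta1*x.
    # NORMALITY and EQUAL SD: errors are draws from N(0, sigma^2) at every x.
    # INDEPENDENCE: each observation gets its own independent error draw.
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)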

Simple Linear Regression Model: in µ{Y|X} = β0 + β1 X, Y is the response, X is the explanatory variable, and β0 and β1 are the regression coefficients.

Simple Linear Regression Model: in µ{Y|X} = β0 + β1 X, Y is the response, X is the regressor, β0 is the intercept, and β1 is the slope.

Interpretation of coefficients. Slope: for a 1-unit change in X, Y changes by β1 units on average. Intercept: when X is 0, the mean value of Y is β0.

Simple Linear Regression Model. Advantage relative to the separate means model: interpolation, i.e., making inferences about the distribution of Y for X values not actually included in the data set but within the range of the observed X values (e.g., estimating the mean pH of carcasses 2.5 hours after slaughter). Danger relative to the separate means model: extrapolation, i.e., making inferences about the distribution of Y for X values outside the range of the observed X values (e.g., estimating the mean pH of carcasses 20 hours after slaughter).

Estimating β0, β1, and σ: we can't just calculate means to estimate β0 and β1 (like we did in ANOVA). Strategy: use least squares to find the best-fitting line, then use the slope and intercept of the best-fitting line as estimates of the true slope and intercept.

Least squares: find the straight line µ{Y|X} = β0 + β1 X that minimizes Σ (res_i)², where fit_i is the i-th fitted value and res_i = Y_i - fit_i.

Formulas for estimators (the least squares principle is just for background):
β̂1 = Σ (X_i - X̄)(Y_i - Ȳ) / Σ (X_i - X̄)², with sums over i = 1, ..., n
β̂0 = Ȳ - β̂1 X̄
σ̂² = Σ (res_i)² / (n - 2)
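As a sketch, these formulas translate directly into a few lines of Python (numpy assumed; the helper name ls_estimates is ours, not from the slides):

    import numpy as np

    def ls_estimates(x, y):
        """Least squares slope, intercept, and sigma^2-hat (df = n - 2)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b0 = y.mean() - b1 * x.mean()
        res = y - (b0 + b1 * x)                 # res_i = Y_i - fit_i
        sigma2_hat = np.sum(res ** 2) / (n - 2)
        return b0, b1, sigma2_hat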

Degrees of Freedom: n - 2 here; n - I in one-way ANOVA (I = number of groups); n - 1 in one-sample t tools. In general, df = (number of observations) - (number of parameters in the model for the means).

Standard Errors:
SE(β̂1) = σ̂ / (s_X √(n - 1))
SE(β̂0) = σ̂ √(1/n + X̄² / ((n - 1) s_X²))
where s_X² = Σ (X_i - X̄)² / (n - 1).
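A matching sketch for the two standard errors (helper name ours; sigma_hat is the square root of σ̂² from the previous slide):

    import numpy as np

    def slr_standard_errors(x, sigma_hat):
        """Standard errors of the least squares slope and intercept."""
        x = np.asarray(x, float)
        n = len(x)
        sx2 = np.sum((x - x.mean()) ** 2) / (n - 1)   # sample variance of X
        se_b1 = sigma_hat / np.sqrt((n - 1) * sx2)
        se_b0 = sigma_hat * np.sqrt(1.0 / n + x.mean() ** 2 / ((n - 1) * sx2))
        return se_b0, se_b1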

Inferences: with these estimators and standard errors, you can test hypotheses about, and compute confidence intervals for, β0 and β1 in the usual way, e.g., est. ± t(df)(1 - α/2) × SE(est.), based on the t-distribution with n - 2 degrees of freedom.
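For example, a sketch of the usual t-based interval and two-sided test for a single coefficient, using scipy's t distribution (the helper name and the default null value of 0 are ours):

    from scipy import stats

    def t_interval_and_test(est, se, n, null_value=0.0, conf=0.95):
        """CI and two-sided t test for one regression coefficient (df = n - 2)."""
        df = n - 2
        tcrit = stats.t.ppf(1 - (1 - conf) / 2, df)
        ci = (est - tcrit * se, est + tcrit * se)
        tstat = (est - null_value) / se
        pval = 2 * stats.t.sf(abs(tstat), df)
        return ci, tstat, pval

Called with the Hubble slope and its standard error, this reproduces the hand-calculated interval shown a few slides below.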

Regression Software: in practice you never need to hand-calculate these estimators or their standard errors, but you might have to hand-calculate tests or confidence intervals. MINITAB and S-Plus, Hubble data: test the null hypothesis that β0 = 0; estimate β1.


Hubble data. Regression Analysis: DISTANCE versus VELOCITY (the slide annotates the output with β̂0, β̂1, σ̂, SE(β̂0), and SE(β̂1)):

The regression equation is DISTANCE = 0.399 + 0.00137 VELOCITY

Predictor   Coef       SE Coef    T      P
Constant    0.3991     0.1185     3.37   0.003
VELOCITY    0.0013729  0.0002274  6.04   0.000

S = 0.405   R-Sq = 62.4%   R-Sq(adj) = 60.6%

Hubble data: the same output, with the intercept's p-value highlighted. Since p = 0.003, the simple Big Bang theory (intercept equal to 0) is not correct.

Hubble data: the same output again, now highlighting the slope, β̂1 = 0.0013729, and its standard error, SE(β̂1) = 0.0002274 (the VELOCITY row).

Hubble data: hand-calculate a confidence interval for β1: 0.0013729 ± 2.07 × (0.0002274) = (0.00090, 0.00184), or roughly (0.9, 1.8) billion years.
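A quick check of this hand calculation, assuming n = 24 nebulae (so 22 degrees of freedom) and that distance is in megaparsecs and velocity in km/sec; the conversion constants are standard (1 megaparsec ≈ 3.086e19 km, 1 billion years ≈ 3.156e16 seconds):

    from scipy import stats

    b1, se_b1, n = 0.0013729, 0.0002274, 24       # slope, SE, sample size
    tcrit = stats.t.ppf(0.975, n - 2)             # about 2.07 with 22 df
    lo, hi = b1 - tcrit * se_b1, b1 + tcrit * se_b1
    print(lo, hi)                                 # about (0.00090, 0.00184)
    # Slope units are megaparsec-seconds per km; convert to billions of years.
    to_byr = 3.086e19 / 3.156e16
    print(lo * to_byr, hi * to_byr)               # about (0.9, 1.8) billion years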

pH data. Regression Analysis: PH versus log(time)

The regression equation is PH = 6.98 - 0.726 log(time)

Predictor   Coef       SE Coef   T        P
Constant    6.98363    0.04853   143.90   0.000
log(time)   -0.72566   0.03443   -21.08   0.000

S = 0.08226   R-Sq = 98.2%   R-Sq(adj) = 98.0%

(But these statistics don't answer the questions of interest to us.)

Three New Kinds of Inference. 1. Estimating the mean value of Y at some specified value X = X0 (e.g., what is the mean pH of steer carcasses 3 hours after slaughter?):
µ̂{Y|X0} = β̂0 + β̂1 X0
SE(µ̂{Y|X0}) = σ̂ √(1/n + (X0 - X̄)² / ((n - 1) s_X²))

Three New Kinds of Inference. 2. Predicting the value of Y for a specific individual at some specified value X = X0 (e.g., what is the pH of a particular steer's, Garth's, carcass 3 hours after slaughter?):
pred{Y|X0} = β̂0 + β̂1 X0
SE(pred{Y|X0}) = σ̂ √(1 + 1/n + (X0 - X̄)² / ((n - 1) s_X²)) = √(σ̂² + SE(µ̂{Y|X0})²)
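Combining the formulas from inferences 1 and 2, a sketch that returns the estimate at X0 along with a confidence interval for the mean and a prediction interval for a new individual (helper name ours; numpy and scipy assumed):

    import numpy as np
    from scipy import stats

    def mean_ci_and_pred_pi(x, b0, b1, sigma_hat, x0, conf=0.95):
        """CI for the mean of Y at x0 and PI for a new individual Y at x0."""
        x = np.asarray(x, float)
        n = len(x)
        sx2 = np.sum((x - x.mean()) ** 2) / (n - 1)       # sample variance of X
        fit = b0 + b1 * x0
        se_mean = sigma_hat * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / ((n - 1) * sx2))
        se_pred = np.sqrt(sigma_hat ** 2 + se_mean ** 2)  # extra sigma^2 for a new Y
        tcrit = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
        ci = (fit - tcrit * se_mean, fit + tcrit * se_mean)
        pi = (fit - tcrit * se_pred, fit + tcrit * se_pred)
        return fit, ci, pi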

Three New Kinds of Inference. 3. Estimating the value of X that results in Y = Y0 (inverse prediction or calibration) (e.g., about how long do you have to wait after slaughter for the mean pH to reach 6.0?):
X̂0 = (Y0 - β̂0) / β̂1
SE(X̂0) = SE(µ̂{Y|X̂0}) / |β̂1|, or SE(pred{Y|X̂0}) / |β̂1| for an individual Y
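And a sketch of the calibration step (helper name ours; pass SE(pred{Y|X̂0}) instead of SE(µ̂{Y|X̂0}) when the target is an individual Y rather than the mean):

    def inverse_estimate(y0, b0, b1, se_mean_at_xhat):
        """Calibration: the X giving mean response y0, with its approximate SE."""
        x_hat = (y0 - b0) / b1
        se_x_hat = se_mean_at_xhat / abs(b1)
        return x_hat, se_x_hat

For the pH data, inverse_estimate(6.0, 6.98, -0.726, 0.0266) gives about (1.35, 0.037), matching the hand calculation shown below.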

Software for the first two inferences: MINITAB is easy; S-Plus is a little harder (for me).


MINITAB output for the pH data at X0 = ln(3.0) = 1.099:

Predicted Values for New Observations
New Obs   Fit      SE Fit    95.0% CI            95.0% PI
1         6.1861   0.0262    (6.1257, 6.2465)    (5.9870, 6.3852)

Values of Predictors for New Observations
New Obs   log(time)
1         1.10

Here Fit = µ̂{Y|X0} and SE Fit = SE(µ̂{Y|X0}); the CI is µ̂{Y|X0} ± t(n - 2)(0.975) SE(µ̂{Y|X0}), and the PI is pred{Y|X0} ± t(n - 2)(0.975) SE(pred{Y|X0}).

Hand calculations for inverse estimation, for Y0 = 6.0:
X̂0 = (Y0 - β̂0) / β̂1 = (6.0 - 6.98) / (-0.726) = 1.35, and exp(1.35) = 3.86 hours
SE(X̂0) = SE(µ̂{Y|X̂0}) / |β̂1| = 0.0266 / 0.726 = 0.037 (on the log scale)

Related ideas: the computer centering trick; confidence and prediction bands; multiple comparison issues.

[Figure: Regression Plot of PH (about 5 to 7) versus log(time), showing the fitted line with 95% CI and 95% PI bands.]

Standard Errors: we have discussed six standard errors: the SE of the slope; the SE of the intercept; the SE of the estimated mean of Y at X0; the SE of the predicted value of Y (a new individual) at X0; the SE of the inverse estimate of X for a mean Y; and the SE of the inverse estimate of X for an individual Y.

Correlation: correlation is a measure of linear association between two variables (formula on page 94). It is between -1 and +1; zero correlation implies no linear relation; +1 implies the points fall exactly on a straight line with positive slope.
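As a sketch, the sample correlation in the same numpy style (equivalent to np.corrcoef(x, y)[0, 1]):

    import numpy as np

    def correlation(x, y):
        """Sample correlation coefficient; always between -1 and +1."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        dx, dy = x - x.mean(), y - y.mean()
        return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))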