Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Nov 20, 2015. Charlotte Wickham. stat511.cwick.co.nz

Quiz #4 this weekend, don't forget. Usual format.

Assumptions. Display 7.5, p. 180: the ideal normal, simple linear regression model. [Figure: Response Variable (Y) plotted against Explanatory Variable (X), with normal subpopulations centered on the line µ{Y|X}.]

Assumptions
1. The responses at each value of the explanatory variable form a normal subpopulation.
2. The means of the subpopulations fall on a straight-line function of the explanatory variable.
3. The subpopulations all have the same standard deviation, σ (constant spread).
4. Observations are independent.
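
If it helps to see the assumptions concretely, here is a minimal R sketch that simulates data satisfying all four; the intercept 0.4, slope 0.0014, and sigma = 0.4 are made-up values chosen to echo the case study.

set.seed(1)
n <- 24
x <- runif(n, 0, 1100)          # explanatory values
mu <- 0.4 + 0.0014 * x          # (2) means fall on a straight line
y <- mu + rnorm(n, sd = 0.4)    # (1) normal, (3) constant sigma, (4) independent draws
plot(x, y)                      # linear trend with even vertical spread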

Is linear regression appropriate? Your turn

Put a hat on it! In statistics it is common to distinguish a parameter from its estimate by putting a hat on it. β̂₀ is the estimate of the intercept β₀. β̂₁ is the estimate of the slope β₁. μ̂{Y|X} is the estimate of the mean response as a function of the explanatory variable.

Fitted values. Once we have estimated the slope and intercept, our estimate of the mean function is the line defined by these estimates: μ̂{Y|X} = β̂₀ + β̂₁X. The fitted value describes the estimated mean for an observation. For the iᵗʰ observation the fitted value is fittedᵢ = μ̂{Yᵢ|Xᵢ} = β̂₀ + β̂₁Xᵢ.

Residuals. The residual is the difference between the observed response and its fitted value: residualᵢ = Yᵢ − fittedᵢ = Yᵢ − (β̂₀ + β̂₁Xᵢ). This is the same definition we used in ANOVA: there, the fitted value is the group average in the full model and the overall average in the equal-means model.

Big Bang Case Study Your turn Label the fitted value for the observed nebula with a velocity of 650 km/sec and distance of 0.9 parsecs. Draw in the residual for the same observation.

Your turn: calculate the fitted value and residual for the same observation (velocity = 650 km/sec, distance = 0.9 parsecs), using β̂₀ = 0.399 and β̂₁ = 0.0014.
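
If you want to check your answer afterwards, the calculation is just a plug-in; in R:

beta0_hat <- 0.399
beta1_hat <- 0.0014
fitted_650 <- beta0_hat + beta1_hat * 650   # estimated mean distance: 1.309
0.9 - fitted_650                            # residual = observed - fitted = -0.409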

Least squares: the line that gives the least possible sum of squared residuals. One approach to estimating the intercept and slope is to choose the values that minimize the sum of squared residuals. It turns out this method is "optimal" in a statistical sense under our assumptions: of all linear unbiased estimates, the least squares estimates have the lowest variance.
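
To see the criterion at work, you can hand the sum of squared residuals to a general-purpose optimizer and check that it lands on the same line as lm. A sketch, assuming case0701 comes from the Sleuth3 package:

library(Sleuth3)   # assumed source of case0701 (Velocity, Distance)
ssr <- function(b) {
  # sum of squared residuals for the line with intercept b[1], slope b[2]
  sum((case0701$Distance - (b[1] + b[2] * case0701$Velocity))^2)
}
# parscale tells optim the two parameters live on very different scales
optim(c(0, 0), ssr, control = list(parscale = c(1, 0.001)))$par
coef(lm(Distance ~ Velocity, data = case0701))   # same line, via lm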

Least squares estimates. The least squares estimates can be found using calculus. They are:

β̂₁ = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ (Xᵢ − X̄)²
β̂₀ = Ȳ − β̂₁ X̄

where X̄ and Ȳ are the sample averages of the explanatory and response variables.
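
A sketch that applies these formulas directly to the case study data (again assuming the Sleuth3 package) and recovers the lm coefficients:

library(Sleuth3)
x <- case0701$Velocity
y <- case0701$Distance
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)
c(b0_hat, b1_hat)   # matches coef(lm(Distance ~ Velocity, data = case0701))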

In R:

> lm(Distance ~ Velocity, data = case0701)

Call:
lm(formula = Distance ~ Velocity, data = case0701)

Coefficients:
(Intercept)     Velocity
   0.399098     0.001373
         β̂₀           β̂₁

It's also easy to get residuals and fitted values in R:

> fit <- lm(Distance ~ Velocity, data = case0701)
> residuals(fit)
           1            2            3            4            5            6
-0.600497351 -0.763249684 -0.006616517 -0.039992674  0.129894974  0.177947738
           7            8            9           10           11           12
-0.223685447 -0.297249686 -0.269790963 -0.043685440 -0.010979035  0.542089847
          13           14           15           16           17           18
-0.391506710  0.294961346 -0.185566293 -0.662199437  0.083080560  0.014433755
          19           20           21           22           23           24
 0.314433707 -0.017116833  0.914433731  0.433906091  0.502552897  0.104401424
> fitted(fit)
        1         2         3         4         5         6         7
0.63249735 0.79724969 0.22061652 0.30299269 0.14510503 0.09705227 0.67368544
        8         9        10        11        12        13        14
0.79724969 0.76979096 0.67368544 0.81097905 0.35791013 1.29150669 0.60503863
       15        16        17        18        19        20        21
1.08556627 1.66219944 1.01691946 1.08556627 1.08556627 1.71711688 1.08556627
       22        23        24
1.56609391 1.49744710 1.89559858

Display 7.7, p. 183. Facts about the sampling distributions of the least squares estimates of slope and intercept in the ideal normal model (from statistical theory):

Sampling distribution of β̂₁ (slope)
1. Shape: normal.
2. Center: the mean of the sampling distribution is β₁.
3. Spread: SD(β̂₁) = σ √( 1 / ((n − 1) s²_X) )

Sampling distribution of β̂₀ (intercept)
1. Shape: normal.
2. Center: the mean of the sampling distribution is β₀.
3. Spread: SD(β̂₀) = σ √( 1/n + X̄² / ((n − 1) s²_X) )

where s²_X is the sample variance of the X's (s_X = sample standard deviation of the X's).

In words: both estimates are normal and centered around their respective parameters (also known as unbiased). The SDs depend on σ and the variation in the explanatory variable, and get smaller with bigger samples.
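
These facts can be checked by simulation: hold the X's fixed, draw many samples from the ideal model, refit the line each time, and compare the spread of the slope estimates with the formula. A sketch with made-up parameter values:

set.seed(2)
n <- 24; beta1 <- 0.0014; sigma <- 0.4          # made-up parameter values
x <- runif(n, 0, 1100)                          # explanatory values, held fixed
slopes <- replicate(5000, {
  y <- 0.4 + beta1 * x + rnorm(n, sd = sigma)   # draw a new sample
  coef(lm(y ~ x))[2]                            # refit, keep the slope estimate
})
mean(slopes)                                    # close to beta1 (unbiased)
sd(slopes)                                      # close to the formula below
sigma * sqrt(1 / ((n - 1) * var(x)))            # SD of the slope, from Display 7.7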

Need to estimate σ. Remember σ is the standard deviation of the subpopulation at each value of the explanatory variable (a parameter). It measures the variation of the response around its mean. The residuals provide an estimate of this variation:

σ̂ = √( (sum of squared residuals) / (degrees of freedom) )

Degrees of freedom. General rule: the degrees of freedom associated with a set of residuals is the number of observations minus the number of parameters for the mean. In simple linear regression, μ{Y|X} = β₀ + β₁X has two parameters, so degrees of freedom = n − 2.

> sum_sq_residuals <- sum(residuals(fit)^2)
> df <- length(residuals(fit)) - 2
> df
[1] 22
> sqrt(sum_sq_residuals/df)
[1] 0.4049588

OR

> summary(fit)

Call:
lm(formula = Distance ~ Velocity, data = case0701)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7632 -0.2352 -0.0088  0.2072  0.9144

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3990982  0.1184697   3.369  0.00277 **
Velocity    0.0013729  0.0002274   6.036 4.48e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.405 on 22 degrees of freedom
Multiple R-squared: 0.6235, Adjusted R-squared: 0.6064
F-statistic: 36.44 on 1 and 22 DF, p-value: 4.477e-06

Standard errors. Plug the estimates into the formulas for the standard deviations:

SE(β̂₀) = σ̂ √( 1/n + X̄² / ((n − 1) s²_X) )
SE(β̂₁) = σ̂ √( 1 / ((n − 1) s²_X) )

d.f. = n − 2
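
A sketch that plugs σ̂ into these formulas by hand and reproduces the Std. Error column of the summary(fit) output below (assuming Sleuth3 again):

library(Sleuth3)
fit <- lm(Distance ~ Velocity, data = case0701)
x <- case0701$Velocity
n <- length(x)
sigma_hat <- sqrt(sum(residuals(fit)^2) / (n - 2))       # 0.405, as before
sigma_hat * sqrt(1 / ((n - 1) * var(x)))                 # SE of slope: ~0.0002274
sigma_hat * sqrt(1/n + mean(x)^2 / ((n - 1) * var(x)))   # SE of intercept: ~0.1185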

> summary(fit)

Call:
lm(formula = Distance ~ Velocity, data = case0701)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7632 -0.2352 -0.0088  0.2072  0.9144

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3990982  0.1184697   3.369  0.00277 **
Velocity    0.0013729  0.0002274   6.036 4.48e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.405 on 22 degrees of freedom
Multiple R-squared: 0.6235, Adjusted R-squared: 0.6064
F-statistic: 36.44 on 1 and 22 DF, p-value: 4.477e-06

Your turn: We are less certain about our estimate of the slope when we have a larger standard error. Will we be less or more certain about the slope for:
- a larger sample size, n?
- larger subpopulation variation, σ?
- larger variation in the observed explanatory values, s_X?

Three types of inference:
1. Inference on the slope or intercept: uncertainty comes from sampling error in a single parameter.
2. Inference about the mean response (at a given explanatory value): uncertainty comes from sampling error in both parameters.
3. Prediction of a new response (at a given explanatory value): uncertainty comes from sampling error in both parameters and variability in the subpopulations.
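
In R these three correspond to confint for the parameters, and predict with interval = "confidence" or interval = "prediction" for the mean response and a new response; the prediction interval is the widest because it adds the subpopulation variability. A sketch, assuming Sleuth3 once more:

library(Sleuth3)
fit <- lm(Distance ~ Velocity, data = case0701)
confint(fit)                                 # 1. CIs for intercept and slope
new <- data.frame(Velocity = 650)
predict(fit, new, interval = "confidence")   # 2. CI for the mean distance at 650 km/sec
predict(fit, new, interval = "prediction")   # 3. wider interval for a single new nebula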