Simple Linear Regression (ST 370)

Regression models are used to study the relationship between a response variable and one or more predictors. The response is also called the dependent variable, and the predictors are called independent variables. In the simple linear regression model, there is only one predictor.

Empirical Models

Example: oxygen and hydrocarbon levels in the production of oxygen. Response: Y, the purity of the produced oxygen (%); predictor: x, the level of hydrocarbons in part of the system (%).

In R:

oxygen <- read.csv("data/table-11-01.csv")
# scatter plot:
plot(purity ~ HC, oxygen)

The relationship between x and y is roughly linear, so we assume that

Y = β₀ + β₁x + ε

for some coefficients β₀ (the intercept) and β₁ (the slope), where ε is a random noise term (or random error). The noise term ε is needed in the model because without it the data points would have to fall exactly along a straight line, which they don't. This is an empirical model, rather than a mechanistic model, because we have no physical or chemical mechanism to justify it.

What can we say about β₀ and β₁? By eye, the slope β₁ appears to be around (96 − 90)/(1.5 − 1.0) = 12, and β₀ appears to be around 90 − 1.0 × β₁ = 78. In R:

abline(a = 78, b = 12)

This line over-predicts most of the data points with lower HC percentages, so perhaps it can be improved.

For any candidate values b₀ and b₁, the predicted value at x = xᵢ is b₀ + b₁xᵢ, and the observed value is yᵢ, for i = 1, 2, …, n. The residual is

eᵢ = observed − predicted = yᵢ − (b₀ + b₁xᵢ),  i = 1, 2, …, n.

The candidate values b₀ and b₁ are good if the residuals are generally small, and in particular if

L(b₀, b₁) = Σ eᵢ² = Σ [yᵢ − (b₀ + b₁xᵢ)]²

is small.

Least Squares Estimates

The best candidates (in this sense) are the values of b₀ and b₁ that give the lowest value of L(b₀, b₁). They are the least squares estimates β̂₀ and β̂₁, and can be shown to be

β̂₁ = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)²

β̂₀ = ȳ − β̂₁x̄.
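As a quick check on these formulas, the estimates can be computed directly in R (a sketch, assuming the oxygen data frame loaded earlier):

x <- oxygen$HC
y <- oxygen$purity
# least squares estimates computed from the formulas above
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)  # should match the lm() output below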

In R, the function lm() calculates least squares estimates:

summary(lm(purity ~ HC, oxygen))

The line labeled (Intercept) shows that β̂₀ = 74.283, and the line labeled HC shows that β̂₁ = 14.947. This line does indeed fit better than our initial candidate:

abline(a = 74.283, b = 14.947, col = "red")

# check the sums of squared residuals:
L <- function(b0, b1) with(oxygen, sum((purity - (b0 + b1 * HC))^2))
L(b0 = 78, b1 = 12)          # 27.8985
L(b0 = 74.283, b1 = 14.947)  # 21.24983

Estimating σ

The regression equation also involves a third parameter, σ, the standard deviation of the noise term ε. The least squares residuals are

eᵢ = yᵢ − (β̂₀ + β̂₁xᵢ),

so the residual sum of squares is SS_E = Σ eᵢ². Because two parameters were estimated in finding the residuals, the residual degrees of freedom are n − 2, and the estimate of σ² is

σ̂² = MS_E = SS_E / (n − 2).
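In R, σ̂² can be computed from the residuals of the fitted model, or read off from the summary output (a sketch, reusing the fit above):

fit <- lm(purity ~ HC, oxygen)
SSE <- sum(resid(fit)^2)     # 21.24983, as computed earlier
SSE / df.residual(fit)       # sigma-hat squared = MS_E, with n - 2 = 18 df
summary(fit)$sigma^2         # the same value, via the residual standard error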

Other Estimates

The sum of squared residuals is not the only criterion that could be used to measure the overall size of the residuals. One alternative is

L₁(b₀, b₁) = Σ |eᵢ| = Σ |yᵢ − (b₀ + b₁xᵢ)|.

Estimates that minimize L₁(b₀, b₁) have no closed-form representation, but may be found by linear programming methods. They are used occasionally, but least squares estimates are generally preferred.
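As an illustration, the L₁ criterion can also be minimized numerically with base R's general-purpose optimizer (a sketch; a dedicated routine, such as those in the quantreg package, would normally be used):

# numerical minimization of the L1 criterion, starting from the least squares fit
L1 <- function(b) with(oxygen, sum(abs(purity - (b[1] + b[2] * HC))))
optim(c(74.283, 14.947), L1)$par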

Sampling Variability

The 20 observations in the oxygen data set are only one sample, and if other sets of 20 observations were made, the values would be at least a little different. The least squares estimates β̂₀ and β̂₁ would therefore also vary from sample to sample. We assume that there are true parameter values β₀ and β₁, and that if we carried out many experiments at a given level x, the responses Y would average out to β₀ + β₁x.
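A small simulation can make this concrete. Treating the fitted values as the truth (an assumption made purely for illustration), we can generate many synthetic samples at the same x values and watch β̂₁ vary:

set.seed(1)
b1.sim <- replicate(1000, {
  # simulate responses from the assumed true line, with noise sd = sigma-hat
  y <- 74.283 + 14.947 * oxygen$HC + rnorm(nrow(oxygen), sd = sqrt(21.24983 / 18))
  coef(lm(y ~ oxygen$HC))[2]
})
sd(b1.sim)  # close to the standard error of beta1-hat reported by summary()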

The standard error measures how far an estimate might typically deviate from the true parameter value. Estimated standard errors may be calculated for β̂₀ and β̂₁, and are shown in the R output. They are used to set up confidence intervals:

β̂₁ ± t(α/2, ν) × (estimated standard error)

is a 100(1 − α)% confidence interval for β₁, where ν = n − 2 is the degrees of freedom for residuals.
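In R, confidence intervals for both coefficients come from confint(); the same interval can be assembled by hand from the pieces above (a sketch):

fit <- lm(purity ~ HC, oxygen)
confint(fit, level = 0.95)   # intervals for beta0 and beta1
# by hand, for beta1:
est <- coef(summary(fit))["HC", "Estimate"]
se  <- coef(summary(fit))["HC", "Std. Error"]
est + c(-1, 1) * qt(0.975, df.residual(fit)) * se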

We can also test hypotheses: for example, there is no relationship between x and y if β₁ = 0. To test H₀: β₁ = 0, we use

t_obs = β̂₁ / (estimated standard error)

and find the P-value as usual: the probability, if H₀ were true, of observing a value of T at least as extreme as t_obs.
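This is the test that summary() reports by default: the "t value" and "Pr(>|t|)" columns in the HC row give t_obs and the two-sided P-value.

coef(summary(lm(purity ~ HC, oxygen)))["HC", c("t value", "Pr(>|t|)")]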

The null hypothesis is not always β₁ = 0. For instance, previous operations might have suggested that β₁ = 12. To test H₀: β₁ = 12, we use

t_obs = (β̂₁ − 12) / (estimated standard error).

In this case t_obs = 2.238, and the probability that |T| ≥ 2.238 is 0.038. So we would reject this hypothesis at the α = 0.05 level, but not at the α = 0.01 level.
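A sketch of this test carried out by hand in R, reusing the fit from earlier:

fit <- lm(purity ~ HC, oxygen)
est <- coef(summary(fit))["HC", "Estimate"]
se  <- coef(summary(fit))["HC", "Std. Error"]
t_obs <- (est - 12) / se                    # about 2.238
2 * pt(-abs(t_obs), df = df.residual(fit))  # two-sided P-value, about 0.038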