Simple Linear Regression Using Ordinary Least Squares


Purpose: To approximate a linear relationship with a line.

Reason: We want to be able to predict Y using X.

Definition: The Least Squares Regression (LSR) line is the line whose sum of squared residuals is smaller than that of any other line. That is, we measure the closeness of the line to the points, and the LSR line does so using the vertical distance from each point to the line. A residual is the vertical distance from a point to the line.

Equations: The true line for the population (the equation we are trying to estimate):

y = α + βx + ε

We obtain a sample and estimate an approximation:

ŷ = a + bx

where:
ŷ = the predicted value of y
a = the estimate of α, the y-intercept
b = the estimate of β, the slope
x = the independent variable
ε = epsilon, the error term: random errors of measurement

Check the assumptions:
1. No outliers.
2. Residuals follow a normal distribution with mean = 0.
3. Residuals should be randomly scattered.
Note: For #2, produce a histogram of the standardized residuals to see whether they are normal.

Hypotheses for the Correlation Coefficient (ρ), a measure of how close the points are to the regression line:
H0: ρ = 0
H1: ρ ≠ 0 (or < 0, or > 0)

Coefficient of Determination = R². Range: 0 ≤ R² ≤ 1. R² is an effect size measure and yields the percentage of the variation in the Y values explained by X.
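As a minimal sketch of how the LSR line can be computed, assuming NumPy is available (the helper name least_squares_line and its variables are illustrative, not from the notes):

```python
import numpy as np

def least_squares_line(x, y):
    """Fit y-hat = a + b*x by ordinary least squares."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: a = y_bar - b * x_bar, so the line passes through the means
    a = y_bar - b * x_bar
    # Residuals: vertical distances from the points to the line
    residuals = y - (a + b * x)
    return a, b, residuals
```

Any other line would produce a larger sum of squared residuals, which is exactly what "least squares" means.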

Adjusted R²: Adjusts for R²'s upward bias and is a variance-accounted-for effect size measure. For example, Adj. R² = .40 means 40% of the variation in Y is explained by the regression line and depends on X.

R² = SSR / SST: what is explained by the regression line (SSRegression) out of the total variability in the y values, SST = Σ(y − ȳ)².

SEE: The square root of the Mean Square Residual is the same as the Standard Error of the Estimate, which is the amount of error in the model measured in DV units. The higher the Adj. R² value, the smaller the amount of error in the model (i.e., the smaller the value of the SEE) and the more stability the model will have upon replication.

F-Test Hypotheses:
H0: The regression does not explain a significant proportion of the variance in Y.
H1: The regression does explain a significant proportion of the variance in Y.

ANOVA Results: How close the points are to the line:
1. SSE = Sum of Squared Errors (residuals): variation attributed to factors other than the relationship between X and Y.
2. SST = Sum of Squares Total: a measure of the total variability of the Y values around their mean (i.e., how much the Y values vary); SST = SSR + SSE.
3. SSR = Sum of Squares Regression: the explained variation attributed to the relationship between X and Y.
Note: We want a large SSRegression and a small SSE. That is, the points are close to the line and the line does a good job of predicting.

Hypotheses for the Slope:
H0: β = 0 (no linear relationship)
H1: β ≠ 0 (or < 0, or > 0) (linear relationship)
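A sketch of this decomposition, reusing the hypothetical least_squares_line helper above:

```python
import numpy as np

def anova_summary(x, y):
    """Decompose the variability of y: SST = SSR + SSE."""
    a, b, resid = least_squares_line(x, y)
    y = np.asarray(y, float)
    y_hat = y - resid                          # predicted values
    sst = np.sum((y - y.mean()) ** 2)          # total variability of y around its mean
    ssr = np.sum((y_hat - y.mean()) ** 2)      # explained by the regression line
    sse = np.sum(resid ** 2)                   # unexplained: sum of squared errors
    n = len(y)
    r_sq = ssr / sst                           # coefficient of determination
    adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - 2)  # one predictor, so n - 2
    see = np.sqrt(sse / (n - 2))               # standard error of the estimate
    return sst, ssr, sse, r_sq, adj_r_sq, see
```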

Test statistic: the sampling distribution of b (the estimated slope)
1. The distribution of b is normal.
2. Mean = β.
3. S.D. (standard error) = Se / √Σ(x − x̄)²

Se = a measure of the variation of the points around the line:

Se = √( SSE / (n − 2) )

(n − 2 because two parameters, the slope and the intercept, are the things being estimated.)

t = b / SE(b)

Hypotheses for the Intercept:
H0: α (y-intercept) = 0
H1: α ≠ 0 (or < 0, or > 0)

t = a / SE(a)

Confidence Intervals for B (Unstandardized Coefficient):

B ± z(α/2) × SE(b), where z(α/2) = 1.645, 1.96, or 2.58 for 90%, 95%, or 99% confidence.
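A sketch of the slope test and interval under these formulas, again assuming the hypothetical helpers above (z = 1.96 gives a 95% interval):

```python
import numpy as np

def slope_inference(x, y, z=1.96):
    """t statistic and confidence interval for the estimated slope b."""
    x = np.asarray(x, float)
    a, b, resid = least_squares_line(x, y)
    n = len(x)
    se = np.sqrt(np.sum(resid ** 2) / (n - 2))         # Se: variation around the line
    se_b = se / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of the slope
    t = b / se_b                                       # test of H0: beta = 0
    ci = (b - z * se_b, b + z * se_b)                  # confidence interval for B
    return t, ci
```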

Simple Linear Regression: Example 1 The linear model assumes that the relations between two variables can be summarized by a straight line. The X variable is often called the predictor and Y is often called the criterion. We often talk about the regression of Y on X, so that if we were predicting GPA from SAT we would talk about the regression of GPA on SAT. The regression problems that we deal with will use a line to transform values of X to predict values of Y. In general, not all of the points will fall on the line, but we will choose our regression line so as to best summarize the relations between X and Y. Suppose we measured the height and weight of a random sample of 10 adults in DeKalb. We want to predict weight from height in the population.

Ht     Wt
61     105
62     120
63     120
65     160
65     120
68     145
69     175
70     160
72     185
75     210

                 Ht       Wt
N                10       10
Mean             67       150
Variance (s²)    20.89    1155.5
SD (s)           4.57     33.99

Correlation (r) = .94
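Feeding this data to the hypothetical least_squares_line sketch above reproduces the fit reported next (equivalently, b = r × s_y / s_x and a = ȳ − b × x̄):

```python
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

a, b, resid = least_squares_line(ht, wt)
print(round(a, 2), round(b, 2))   # -316.86 6.97, matching the notes
```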

For the regression of weight on height, we found: Y = -316.86 + 6.97(x), where -316.86 is the intercept (a, the estimate of α) and 6.97 is the slope (b, the estimate of β). We could also write that weight = -316.86 + 6.97(height). The slope value means that for each inch we increase in height, we expect to increase approximately 7 pounds in weight. The intercept is the value of Y that we expect when X is zero. So if we had a person 0 inches tall, they should weigh -316.86 pounds (i.e., 6.97 × 0 = 0; -316.86 + 0 = -316.86). Of course we do not find people who are zero inches tall, and we do not find people with negative weight. Sometimes, as in educational research, the value of the intercept will have no meaningful interpretation.

Simple Linear Regression: Example 2

A. Predicted: Self-destructiveness = -108.92 + 22.33(Alcohol)

Unstandardized regression coefficient, 22.33 (Alcohol): We predict a 22.33-point increase in self-destructiveness for a one-point increase in Alcohol when all other variables are held constant.

B. Predicted: Self-destructiveness = 0.49(Alcohol)

Standardized regression coefficient, 0.49 (Alcohol): We predict a 0.49 standard deviation increase in self-destructiveness for a one standard deviation increase in Alcohol when all other variables are held constant.
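A standardized coefficient rescales an unstandardized one by the two standard deviations, and in simple regression it equals the correlation r. A quick check using the height/weight numbers above (the Alcohol example's standard deviations are not given in the notes, so those values cannot be verified here):

```python
b, s_x, s_y = 6.97, 4.57, 33.99   # slope and SDs from the height/weight example

beta_std = b * s_x / s_y          # standardized slope
print(round(beta_std, 2))         # 0.94, which is the correlation r
```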

Simple Linear Regression: Example 3

The linear model tells us that each observed Y is composed of two parts: (1) a linear function of X, and (2) an error. We can use the regression line to predict values of Y given values of X. For any given value of X, we go straight up to the line and then move horizontally to the left to find the predicted value of Y, denoted Y'. The difference between the observed Y and the predicted Y (Y − Y') is called a residual. The predicted part is the linear part; the residual is the error.

N         Ht      Wt       Y'        Residual
1         61      105      108.19    -3.19
2         62      120      115.16     4.84
3         63      120      122.13    -2.13
4         65      160      136.06    23.94
5         65      120      136.06   -16.06
6         68      145      156.97   -11.97
7         69      175      163.94    11.06
8         70      160      170.91   -10.91
9         72      185      184.84     0.16
10        75      210      205.75     4.25

Mean      67      150      150.00     0.00
SD        4.57    33.99    31.85     11.89
Variance  20.89   1155.56  1014.37   141.32
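A sketch that reproduces the Y' and residual columns and checks the decomposition (the variance of Y equals the variance of Y' plus the variance of the residuals, up to rounding):

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], float)

a, b, resid = least_squares_line(ht, wt)   # hypothetical helper from above
y_prime = a + b * ht                       # the Y' column

print(np.var(wt, ddof=1))                                # about 1155.56
print(np.var(y_prime, ddof=1) + np.var(resid, ddof=1))   # the same total
```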

Compare the numbers in the table for person 5 (height = 65, weight = 120) to the same person on the graph. The value on the regression line at X = 65 is 136.06. The difference between the mean of Y (150) and 136.06 is the part of Y due to the linear function of X. The difference between the line and the observed Y is 120 − 136.06 = -16.06. This is the error part of Y, the residual.