Simple Linear Regression

Statistical Methods I (EXST 7005), James P. Geaghan

Simple regression applications are used to fit a model describing a linear relationship between two variables. The aspects of least squares regression and correlation were developed by Sir Francis Galton in the late 1800s. The application can be used to test for a statistically significant correlation between the variables. Finding a relationship does not prove a cause and effect relationship, but the model can be used to quantify a relationship where one is known to exist. The model provides a measure of the rate of change of one variable relative to another variable. There is a potential change in the value of variable Y as the value of variable X changes. Variable values will always be paired, one termed a dependent variable (often referred to as the Y variable) and an independent variable (termed an X variable). For each value of X there is assumed to be a normally distributed population of values for the Y variable.

The linear model which describes the relationship between the two variables is given as

$\mu_{Y.X} = \beta_0 + \beta_1 X_i$

The variable Y is called the dependent variable or response variable (vertical axis). $\mu_{Y.X} = \beta_0 + \beta_1 X_i$ is the population equation for a straight line. No error is needed in this equation because it describes the line itself. The term $\mu_{Y.X}$, the true population mean of Y at each value of X, is estimated at each value of $X_i$ with $\hat{Y}_i$.

The variable X is called the independent variable or predictor variable (horizontal axis).

$\beta_0$ = the true value of the intercept (the value of Y when X = 0)
$\beta_1$ = the true value of the slope, the amount of change in Y for each unit change in X (i.e. if X changes by 1 unit, Y changes by $\beta_1$ units)

The two population parameters to be estimated, $\beta_0$ and $\beta_1$, are also referred to as the regression coefficients.

All variability in the model is assumed to be due to Y, so variance is measured vertically. The variability in Y is assumed to be normally distributed at each value of X. The X variable is assumed to have no variance since all variability is in Y (this is a new assumption).

The values $\beta_0$ and $\beta_1$ ($b_0$ and $b_1$ for a sample) are called the regression coefficients. The $\beta_0$ value is the value of Y at the point where the line crosses the Y axis. This value is called the intercept. If this value is zero the line crosses at the origin of the X and Y axes, and the linear equation reduces from $Y_i = b_0 + b_1 X_i$ to $Y_i = b_1 X_i$ and is said to have "no intercept", even though the regression line does cross the Y axis. The units on $b_0$ are the same units as for Y.
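To make the model concrete, here is a minimal Python sketch (not from the notes; the parameter values and data are hypothetical) that simulates a population in which Y is normally distributed at each value of X around the line $\mu_{Y.X} = \beta_0 + \beta_1 X$.

# A minimal sketch, assuming hypothetical parameter values beta0 = 2, beta1 = 0.5
# and sigma = 1: Y is normally distributed around the population line at each X.
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0                 # hypothetical population parameters
x = np.repeat(np.arange(1, 6), 20)                  # 20 Y observations at each X value 1..5
mu_y_given_x = beta0 + beta1 * x                    # the population line: mean of Y at each X
y = mu_y_given_x + rng.normal(0.0, sigma, x.size)   # add normally distributed variability

# The sample mean of Y at each X should be close to the point on the population line.
for xv in np.unique(x):
    print(xv, round(y[x == xv].mean(), 2), round(beta0 + beta1 * xv, 2))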

The $\beta_1$ value is called the slope. It determines the incline or angle of the regression line. If the slope is 0, the line is horizontal. At this point the linear model is reduced to $Y_i = b_0$, and the regression is said to have "no slope". The slope gives the change in Y per unit of X. The units on the slope are the Y units per X unit.

The population equation for the line describes a perfect line with no variation. In practice there is always variation about the line. We include an additional term to represent this variation.

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$  for a population
$Y_i = b_0 + b_1 X_i + e_i$  for a sample

When we put this term in the model, we are describing individual points as their position on the line plus or minus some deviation. The sum of squares of deviations from the line will form the basis of a variance for the regression line. When we leave the $e_i$ off the sample model we are describing a point on the regression line, predicted from the sample estimates. To indicate this we put a "hat" on the $Y_i$ value, $\hat{Y}_i = b_0 + b_1 X_i$.

Characteristics of a Regression Line
- The line will pass through the point $(\bar{X}, \bar{Y})$ (also the point $(0, b_0)$).
- The sum of squared deviations (measured vertically) of the points from the regression line will be a minimum.
- Values on the line for any value of $X_i$ can be described by the equation $\hat{Y}_i = b_0 + b_1 X_i$.
(The first two characteristics are checked numerically in the sketch after the objectives below.)

Common objectives in Regression: there are a number of possible objectives.
- Determine if there is a relationship between Y and X. This would be determined by some hypothesis test. The strength of the relationship is, to some extent, reflected in the correlation or $R^2$ value.
- Determine the value of the rate of change of Y relative to X. This is measured by the slope of the regression line. This objective would usually be accompanied by a test of the slope against 0 (or some other value) and/or a confidence interval on the slope.
- Establish and employ a predictive equation for Y from X. This objective would usually be preceded by Objective 1 above to show that a relationship exists. The predicted values would usually be given with their confidence interval, or the regression with its confidence band.
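A quick numerical check of the first two characteristics listed above. This is a minimal sketch; the five data pairs are hypothetical.

# A minimal sketch with hypothetical data: fit the least squares line and verify
# that it passes through (X-bar, Y-bar) and that the vertical residuals sum to zero.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 7.0, 7.0])

b1, b0 = np.polyfit(x, y, deg=1)        # least squares slope and intercept
y_hat = b0 + b1 * x                     # values on the fitted line
residuals = y - y_hat                   # vertical deviations from the line

print(np.isclose(b0 + b1 * x.mean(), y.mean()))   # True: line passes through the means
print(np.isclose(residuals.sum(), 0.0))           # True: residuals sum to zero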

Assumptions in Regression Analysis
- Independence. The best guarantee of this assumption is random sampling. This is a difficult assumption to check. This assumption is made for all tests we will see in this course.
- Normality of the observations at each value of X (or the pooled deviations from the regression line). This is relatively easy to test if the appropriate values are tested (e.g. residuals in ANOVA or Regression, not the raw Y values). This can be tested with the Shapiro-Wilk W statistic in PROC UNIVARIATE. This assumption is made for all tests we have seen this semester except the Chi square tests of Goodness of Fit and Independence.
- Homogeneity of error (homogeneous variances or homoscedasticity). This is easy to check for and to test in analysis of variance (with variance tests such as Bartlett's). In Regression the simplest way to check is by examining the residual plot. This assumption is made for ANOVA (for the pooled variance) and Regression. Recall that in two-sample t-tests the equality of the variances need not be assumed; it can be readily tested.
- X measured without error. This must be assumed in ordinary least squares regressions, since all error is measured in a vertical direction and occurs in Y.

Assumptions: general assumptions
- The Y variable is normally distributed at each value of X.
- The variance is homogeneous (across X).
- Observations are independent of each other and $e_i$ is independent of the rest of the model.

Special assumption for regression
- Assume that all of the variation is attributable to the dependent variable (Y), and that the X variable is measured without error. Note that the deviations are measured vertically, not horizontally or perpendicular to the line.
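The notes point to the Shapiro-Wilk W statistic (PROC UNIVARIATE in SAS) and to the residual plot for checking these assumptions. A rough Python analogue, as a minimal sketch with hypothetical data (not the SAS procedure itself):

# A minimal sketch: test normality of the *residuals* with the Shapiro-Wilk W
# statistic and list the residuals against X to judge whether the spread is
# roughly constant. The data below are hypothetical.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 5.2, 6.8, 7.1, 9.0, 9.8, 12.1])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

w_stat, p_value = stats.shapiro(residuals)        # Shapiro-Wilk test on residuals
print("Shapiro-Wilk W =", round(w_stat, 3), " p =", round(p_value, 3))

for xi, ei in zip(x, residuals):                  # a crude text residual plot
    print(f"X = {xi:4.1f}   residual = {ei:+.3f}")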

Fitting the line
Fitting the line starts with a corrected SSDeviation; this is the SSDeviation of the observations from a horizontal line through the mean. The line will pass through the point $(\bar{X}, \bar{Y})$. The fitted line is pivoted on this point until the SSDeviations are a minimum.

How do we know the SSDeviations are a minimum? Actually, we solve the equation for $e_i$, and use calculus to determine the solution that has a minimum of the sum of squared deviations.

$Y_i = b_0 + b_1 X_i + e_i$
$e_i = Y_i - (b_0 + b_1 X_i) = Y_i - \hat{Y}_i$
$\sum e_i^2 = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2$

The line has some desirable properties: $E(b_0) = \beta_0$, $E(b_1) = \beta_1$, $E(\hat{Y}_i) = \mu_{Y.X}$. Therefore, the parameter estimates and predicted values are unbiased estimates.

Derivation of the formulas
You do not need to learn this derivation for this class! However you should be aware of the process and its objectives.

Any observation $Y_i$ from a sample can be written as $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ is a deviation of the observed point from the regression line. The idea of regression is to minimize the deviation of the observations from the regression line; this is called a Least Squares Fit. The simple sum of the deviations is zero, $\sum e_i = 0$, so minimizing will require a square or an absolute value to remove the sign. The sum of the squared deviations is $\sum e_i^2 = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2$.

The objective is to select $b_0$ and $b_1$ such that $\sum e_i^2$ is a minimum, by using some techniques from calculus. We have previously defined the uncorrected sum of squares and corrected sum of squares of a variable.
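Before working through the corrected sums of squares, a quick numerical illustration that the calculus solution really does minimize $\sum e_i^2$. This is a minimal sketch with hypothetical data; a general-purpose numerical minimizer lands on the same estimates as the least squares fit.

# A minimal sketch: minimize the sum of squared deviations numerically and
# compare with the least squares fit. Data are hypothetical.
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 7.0, 7.0])

def sse(b):
    b0, b1 = b
    return np.sum((y - (b0 + b1 * x)) ** 2)     # sum of squared vertical deviations

numeric_b0, numeric_b1 = minimize(sse, x0=[0.0, 0.0]).x   # numerical minimum of SSE
b1, b0 = np.polyfit(x, y, deg=1)                          # least squares fit

print(round(numeric_b0, 4), round(numeric_b1, 4))   # approximately 1.1 and 1.3
print(round(b0, 4), round(b1, 4))                   # 1.1 and 1.3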

The corrected sum of squares of X
- The uncorrected SS is $\sum X_i^2$.
- The correction factor is $(\sum X_i)^2 / n$.
- The corrected SS is $\mathrm{CSS}_X = S_{XX} = \sum X_i^2 - (\sum X_i)^2 / n$.
- We will call this corrected sum of squares $S_{XX}$ and the correction factor $C_X$.

The corrected sum of squares of Y
- We could define the exact same series of calculations for Y, and call it $S_{YY}$.

The corrected cross products of X and Y
- We need a cross product for regression, and a corrected cross product.
- The cross product is $X_i Y_i$.
- The uncorrected sum of cross products is $\sum X_i Y_i$.
- The correction factor for the cross products is $C_{XY} = (\sum X_i)(\sum Y_i) / n$.
- The corrected cross product is $\mathrm{CCP}_{XY} = S_{XY} = \sum X_i Y_i - (\sum X_i)(\sum Y_i) / n$.

The formulas for calculating the slope and intercept can be derived as follows. Take the partial derivative with respect to each of the parameter estimates, $b_0$ and $b_1$.

For $b_0$: $\frac{\partial \sum e_i^2}{\partial b_0} = \frac{\partial \sum [Y_i - (b_0 + b_1 X_i)]^2}{\partial b_0}$, which is set equal to 0 and solved for $b_0$.

$2 \sum (Y_i - b_0 - b_1 X_i)(-1) = 0$
$\sum Y_i = n b_0 + b_1 \sum X_i$  (this is the first "normal equation")

Likewise, for $b_1$ we obtain the partial derivative, set it equal to 0 and solve for $b_1$.

$\frac{\partial \sum e_i^2}{\partial b_1} = 2 \sum (Y_i - b_0 - b_1 X_i)(-X_i) = 0$
$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$  (second "normal equation")

The normal equations can be written as
$\sum Y_i = n b_0 + b_1 \sum X_i$
$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$

At this point we have two equations and two unknowns, so we can solve for the unknown regression coefficient values $b_0$ and $b_1$.
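As a small worked example (the five data pairs are hypothetical), take X = 1, 2, 3, 4, 5 and Y = 2, 4, 5, 7, 7, so n = 5:

$\sum X_i = 15$, $\sum Y_i = 25$, $\sum X_i^2 = 55$, $\sum Y_i^2 = 143$, $\sum X_i Y_i = 88$
$C_X = 15^2 / 5 = 45$, so $S_{XX} = 55 - 45 = 10$
$C_Y = 25^2 / 5 = 125$, so $S_{YY} = 143 - 125 = 18$
$C_{XY} = (15)(25) / 5 = 75$, so $S_{XY} = 88 - 75 = 13$

These intermediate statistics are all that is needed to solve the normal equations below.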

Review
For $b_0$ the solution is: $\sum Y_i = n b_0 + b_1 \sum X_i$, so $b_0 = \bar{Y} - b_1 \bar{X}$. Note that estimating $b_0$ requires a prior estimate of $b_1$ and the means of the variables X and Y.

For $b_1$, given that $\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$, then

$\sum X_i Y_i = (\bar{Y} - b_1 \bar{X}) \sum X_i + b_1 \sum X_i^2$
$\sum X_i Y_i = \frac{\sum X_i \sum Y_i}{n} - b_1 \frac{(\sum X_i)^2}{n} + b_1 \sum X_i^2$
$\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = b_1 \left[ \sum X_i^2 - \frac{(\sum X_i)^2}{n} \right]$
$S_{XY} = b_1 S_{XX}$

so $b_1 = \dfrac{S_{XY}}{S_{XX}}$ is the corrected cross product over the corrected sum of squares of X.

The intermediate statistics needed to solve all elements of an SLR are $\sum X_i$, $\sum Y_i$, $\sum X_i Y_i$, $\sum X_i^2$ and $\sum Y_i^2$. We have not seen $\sum Y_i^2$ used in the calculations yet, but we will need it later to calculate variance.

We want to fit the best possible line through some observed data points. We define this as the line that minimizes the vertically measured distances from the observed values to the fitted line. The line that achieves this is defined by the equations

$b_0 = \bar{Y} - b_1 \bar{X}$
$b_1 = \dfrac{S_{XY}}{S_{XX}}$

These calculations provide us with two parameter estimates that we can then use to get the equation for the fitted line, $\hat{Y}_i = b_0 + b_1 X_i$.
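A minimal sketch applying these two formulas to the same hypothetical data as the worked example above, and checking the result against scipy.stats.linregress:

# A minimal sketch: compute b1 = S_XY / S_XX and b0 = Y-bar - b1 * X-bar from the
# intermediate statistics, then confirm against scipy. Data are hypothetical.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 7.0, 7.0])
n = x.size

s_xx = np.sum(x**2) - np.sum(x)**2 / n            # corrected SS of X (= 10)
s_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n  # corrected cross product (= 13)

b1 = s_xy / s_xx                                  # slope
b0 = y.mean() - b1 * x.mean()                     # intercept
print(b0, b1)

fit = stats.linregress(x, y)
print(np.isclose(fit.intercept, b0), np.isclose(fit.slope, b1))   # True True

For these data the printout gives $b_0 = 1.1$ and $b_1 = 1.3$, matching the hand calculation in the worked example.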

Testing hypotheses about regressions
The total variation about a regression is exactly the same calculation as the total in Analysis of Variance: SSTotal = SSDeviations from the mean = Uncorrected SSTotal - Correction factor. The simple regression analysis will produce two sources of variation.
- SSRegression: the variation explained by the regression.
- SSError: the remaining, unexplained variation about the regression line.

These sources of variation are expressed in an ANOVA source table.

Source        d.f.
Regression    1        d.f. used to fit the slope
Error         n - 2    error d.f.
Total         n - 1    1 d.f. lost adjusting for ("correcting for") the mean

Note that one degree of freedom is lost from the total for the correction for the mean, which actually fits the intercept. The single regression d.f. is for fitting the slope. The correction fits a flat line through the mean $\bar{Y}$; the regression actually fits the slope. The difference between these two models is that one has no slope, or a slope equal to zero ($b_1 = 0$), and the other has a slope fitted. Testing for a difference between these two cases is the common hypothesis test of interest in regression, and it is expressed as $H_0: \beta_1 = 0$.

The results of a regression are expressed in an ANOVA table. The regression is tested with an F test, formed by dividing the MSRegression by the MSError. This is a one-tailed F test, as it was with ANOVA, and it has 1 and n - 2 d.f. It tests the null hypothesis $H_0: \beta_1 = 0$ versus the alternative $H_1: \beta_1 \neq 0$.

Source        df       SS              MS              F
Regression    1        SSRegression    MSRegression    MSRegression / MSError
Error         n - 2    SSError         MSError
Total         n - 1    SSTotal

The $R^2$ statistic
This is a popular statistic for interpretation. The concept is that we want to know what proportion of the corrected total sum of squares is explained by the regression line.

Source        d.f.     SS
Regression    1        SSRegression
Error         n - 2    SSError
Total         n - 1    SSTotal

In the process of fitting the regression, the SSTotal is divided into two parts: the sum of squares explained by the regression (SSRegression) and the remaining, unexplained variation about the regression line (SSError).
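A minimal sketch assembling this ANOVA partition, the F test of $H_0: \beta_1 = 0$, and the $R^2$ statistic for the same hypothetical data used above:

# A minimal sketch: build the regression ANOVA quantities from the corrected sums
# of squares and test H0: beta1 = 0 with a one-tailed F test on 1 and n-2 d.f.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 7.0, 7.0])
n = x.size

s_xx = np.sum(x**2) - np.sum(x)**2 / n
s_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
s_yy = np.sum(y**2) - np.sum(y)**2 / n            # corrected SSTotal

ss_reg = s_xy**2 / s_xx                           # SS explained by the regression
ss_err = s_yy - ss_reg                            # remaining, unexplained SS
df_reg, df_err = 1, n - 2
ms_reg, ms_err = ss_reg / df_reg, ss_err / df_err
f_stat = ms_reg / ms_err
p_value = stats.f.sf(f_stat, df_reg, df_err)      # upper-tail F probability
r_square = ss_reg / s_yy                          # proportion of SSTotal explained

print(f"SSReg = {ss_reg:.2f}, SSError = {ss_err:.2f}, F = {f_stat:.2f}, "
      f"p = {p_value:.4f}, R^2 = {r_square:.3f}")

For these data the partition is SSRegression = 16.9 and SSError = 1.1 out of SSTotal = 18, so $R^2 \approx 0.94$.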