Regression

What is a Model?
1. Often describe relationship between variables
2. Types
- Deterministic models (no randomness)
- Probabilistic models (with randomness)
EPI 809/Spring 2008

Deterministic Models
1. Hypothesize exact relationships
2. Suitable when prediction error is negligible
3. Example: Body mass index (BMI) is a measure of body fat based on
\[ \text{BMI} = \frac{\text{weight in kilograms}}{(\text{height in meters})^2} \]

Probabilistic Models
1. Hypothesize 2 components: deterministic and random error
2. Example: Systolic blood pressure (SBP) of newborns is 6 times the age in days + random error:
\[ \text{SBP} = 6 \times \text{age}(d) + \varepsilon \]
The random error may be due to factors other than age in days (e.g., birthweight).
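A minimal Python sketch contrasting the two kinds of model. The BMI formula is deterministic, so identical inputs always give an identical output. The newborn SBP model adds a random error term to the deterministic part 6 x age. The normal error with standard deviation 5 is a hypothetical choice made only for illustration. It echoes the N(0, σ²) error assumption used later, but the value 5 does not come from the slides.

```python
import random

def bmi(weight_kg: float, height_m: float) -> float:
    """Deterministic model: identical inputs always give an identical output."""
    return weight_kg / height_m ** 2

def sbp_mean(age_days: float) -> float:
    """Deterministic component of the newborn SBP model: 6 times age in days."""
    return 6 * age_days

def sbp_observed(age_days: float, rng: random.Random, error_sd: float = 5.0) -> float:
    """Probabilistic model: deterministic component plus a normal random error.
    error_sd = 5.0 is a hypothetical value chosen only for illustration."""
    return sbp_mean(age_days) + rng.gauss(0.0, error_sd)

rng = random.Random(809)  # fixed seed so the run is reproducible
print(round(bmi(70, 1.75), 1))  # the same answer on every run
for age in (1, 2, 3):
    # the mean part is fixed; the simulated observation varies with the error term
    print(age, sbp_mean(age), round(sbp_observed(age, rng), 1))
```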

Simple Regression
Simple regression analysis is a statistical tool that lets us estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x). The dependent variable is the variable for which we want to make a prediction. Various non-linear forms may be used, but simple linear regression models are the most common.

Introduction
The primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior. Current information is usually in the form of a set of data. In a simple case, when the data form a set of pairs of numbers, we may interpret them as the observed values of an independent (or predictor or explanatory) variable X and a dependent (or response or outcome) variable Y.

Lot size   Man-hours
30         73
20         50
60         128
80         170
40         87
50         108
60         135
30         69
70         148
60         132

Introduction
The goal of the analyst who studies the data is to find a functional relation y = f(x) between the response variable y and the predictor variable x.
[Figure: Statistical relation between lot size and man-hours; scatter plot of man-hours (y) against lot size (x)]

Pictorial Presentation of Linear Regression Model

Linear Regression Model

Assumptions
Linear regression assumes that:
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity of variances)
4. The observations are independent

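Linear Equations
\[ Y = mX + b \]
where \( m = \frac{\text{change in } Y}{\text{change in } X} \) is the slope and b is the Y-intercept.
[Figure: a straight line illustrating slope and Y-intercept; clip art 1984-1994 T/Maker Co.]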

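Linear Regression Model
1. The relationship between variables is a linear function:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where \( \beta_0 \) is the population Y-intercept, \( \beta_1 \) is the population slope, \( \varepsilon_i \) is the random error, Y is the dependent (response) variable (e.g., CD+ c.) and X is the independent (explanatory) variable (e.g., Years s. seroco.).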

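Meaning of Regression Coefficients
General regression model \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \):
1. \( \beta_0 \) and \( \beta_1 \) are parameters
2. \( X_i \) is a known constant
3. Deviations \( \varepsilon_i \) are independent \( N(0, \sigma^2) \)
The values of the regression parameters \( \beta_0 \) and \( \beta_1 \) are not known. We estimate them from data. \( \beta_1 \) indicates the change in the mean response per unit increase in X.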

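Population Linear Regression Model
[Figure: an observed value \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \) plotted against the population line \( E(Y) = \beta_0 + \beta_1 X \); the random error \( \varepsilon_i \) is the vertical distance between them at the observed value \( X_i \)]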

Estimating Parameters: Least Squares Method

Scatter plot
1. Plot of all \( (X_i, Y_i) \) pairs
2. Suggests how well the model will fit
[Figure: scatter plot of Y against X, both axes from 0 to 60]

Thinking Challenge
How would you draw a line through the points? How do you determine which line fits best?
[Figure: the same scatter plot with candidate lines: intercept unchanged, slope changed; intercept changed, slope unchanged; intercept changed, slope changed]

What is the best fitting line?

Prediction Error

Least Squares
1. "Best fit" means the differences between actual Y values and predicted Y values are a minimum. But positive differences offset negative ones, so square the errors:
\[ \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 \]
2. LS minimizes the sum of the squared differences (errors) (SSE).
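The "which line fits best?" question can be made concrete by computing SSE for a few candidate lines. The sketch below uses the lot size / man-hours data from the introduction. For these data the least squares formulas give b0 = 10 and b1 = 2. The nudged alternative lines are arbitrary choices for comparison.

```python
# Lot size (x) and man-hours (y) from the introduction's table
lot_size = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
man_hours = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

def sse(b0, b1, xs, ys):
    """Sum of squared errors for the candidate line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Least squares line for these data vs. some arbitrarily nudged alternatives
candidates = {
    "least squares (10, 2)": (10.0, 2.0),
    "slope changed (10, 2.1)": (10.0, 2.1),
    "intercept changed (15, 2)": (15.0, 2.0),
    "both changed (0, 2.2)": (0.0, 2.2),
}
for name, (b0, b1) in candidates.items():
    print(f"{name:28s} SSE = {sse(b0, b1, lot_size, man_hours):8.1f}")
```

For these data the least squares line gives SSE = 60, and each nudged line gives a larger value.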

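Least Squares Graphically
LS minimizes
\[ \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2 \]
[Figure: four points around the fitted line \( \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \), with residuals \( \hat{\varepsilon}_1, \dots, \hat{\varepsilon}_4 \) drawn as vertical distances; e.g. \( Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2 \)]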

How to estimate parameters

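Estimating the intercept and slope: least squares estimation
** Least Squares Estimation
A little calculus. What are we trying to estimate? β, the slope, in the predicted value \( \beta x + \alpha \).
What is the constraint? We are trying to minimize the squared distance (hence "least squares") between the observations themselves and the predicted values. These differences are also called the residuals, or the left-over unexplained variability:
\[ \text{Difference}_i = y_i - (\beta x_i + \alpha), \qquad \text{Difference}_i^2 = \left(y_i - (\beta x_i + \alpha)\right)^2 \]
Find the β that gives the minimum sum of the squared differences. How do you minimize a function? Take the derivative, set it equal to zero, and solve. This is a typical max/min problem from calculus:
\[ \frac{d}{d\beta} \sum_{i=1}^{n} \left(y_i - (\beta x_i + \alpha)\right)^2 = 2 \sum_{i=1}^{n} \left(y_i - (\beta x_i + \alpha)\right)(-x_i) = -2 \sum_{i=1}^{n} x_i \left(y_i - \beta x_i - \alpha\right) = 0 \]
From here it takes a little math trickery to solve for β.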
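The "math trickery" amounts to solving two linear equations, one from each partial derivative (with respect to α and to β). These are the normal equations. A sketch with NumPy follows. Applying it to the lot size / man-hours data from the introduction is an illustrative choice; the slides do not pair this derivation with a dataset.

```python
import numpy as np

x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)    # lot size
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)  # man-hours
n = len(x)

# Setting dSSE/dalpha = 0 and dSSE/dbeta = 0 gives two linear "normal equations":
#   n*alpha      + beta*sum(x)    = sum(y)
#   alpha*sum(x) + beta*sum(x**2) = sum(x*y)
A = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
alpha, beta = np.linalg.solve(A, rhs)
print(alpha, beta)

# Check: both partial derivatives of SSE vanish at the solution
resid = y - (alpha + beta * x)
print(-2 * resid.sum(), -2 * (x * resid).sum())
```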

The standard error of Y given X, \( S_{y/x} \), is the average variability around the regression line at any given value of X. It is assumed to be equal at all values of X.
[Figure: equal spread \( S_{y/x} \) around the regression line at several values of X]

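Regression Picture
[Figure: for each observation, A is the distance from \( y_i \) to the naïve mean \( \bar{y} \), B is the distance from the fitted value \( \hat{y}_i \) to \( \bar{y} \), and C is the distance from \( y_i \) to \( \hat{y}_i \)]
\[ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (A^2 = B^2 + C^2 \text{ summed over observations}) \]
- SS_total (\( A^2 \)): total squared distance of observations from the naïve mean of y (total variation)
- SS_reg (\( B^2 \)): distance from the regression line to the naïve mean of y (variability due to x, i.e. the regression)
- SS_residual (\( C^2 \)): variance around the regression line (additional variability not explained by x; what the least squares method aims to minimize)
*Least squares estimation gave us the line (β) that minimized \( C^2 \).
\[ R^2 = \frac{SS_{reg}}{SS_{total}} \]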
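The decomposition SS_total = SS_reg + SS_residual can be checked numerically. This plain-Python sketch fits the least squares line to the lot size / man-hours data from the introduction, computes the three sums of squares and checks that they add up. It then forms R² from them.

```python
# Lot size (x) and man-hours (y) from the introduction's table
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares slope and intercept
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                # A^2: around the naive mean
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)              # B^2: explained by the line
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # C^2: left unexplained
r_squared = ss_reg / ss_total

print(ss_total, ss_reg, ss_res, round(r_squared, 4))
print(abs(ss_total - (ss_reg + ss_res)) < 1e-9)
```

For these data SS_total = 13660 splits into SS_reg = 13600 and SS_residual = 60, so R² is about 0.996.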

Regression Line
If the scatter plot of our sample data suggests a linear relationship between two variables, i.e. \( y = \beta_0 + \beta_1 x \), we can summarize the relationship by drawing a straight line on the plot. The least squares method gives us the best estimated line for our set of sample data.

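Regression Line
We will write an estimated regression line based on sample data as \( \hat{y} = b_0 + b_1 x \). The method of least squares chooses the values for \( b_0 \) and \( b_1 \) to minimize the sum of squared errors
\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]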

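Regression Line
Using calculus, we obtain the estimating formulas:
\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \quad \text{or} \quad b_1 = r \frac{S_y}{S_x} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]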
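The three expressions for \( b_1 \) are algebraically equivalent. A quick way to see this is to code each one and compare the results. This Python sketch applies them to the lot size / man-hours data from the introduction. The function names are made up for illustration.

```python
import math

def slope_deviation(x, y):
    """b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

def slope_computational(x, y):
    """b1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)"""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x2 - sum(x) ** 2)

def slope_from_r(x, y):
    """b1 = r * S_y / S_x, with sample standard deviations (n - 1 divisor)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    r = cov / (s_x * s_y)
    return r * s_y / s_x

# Lot size (x) and man-hours (y) from the introduction's table
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

for formula in (slope_deviation, slope_computational, slope_from_r):
    print(formula.__name__, round(formula(x, y), 6))

b1 = slope_deviation(x, y)
b0 = sum(y) / len(y) - b1 * sum(x) / len(x)  # b0 = y_bar - b1 * x_bar
print("y-hat =", round(b0, 6), "+", round(b1, 6), "x")
```

All three forms give \( b_1 = 2 \), and \( b_0 = 10 \), so the fitted line for these data is \( \hat{y} = 10 + 2x \).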

Estimation of Mean Response
The fitted regression line can be used to estimate the mean value of y for a given value of x.
Example: The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

y (sales, $)   x (advertising, $)
1250           41
1380           54
1425           63
1425           54
1450           48
1300           46
1400           62
1510           61
1575           64
1650           71

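Point Estimation of Mean Response
From the previous table we have:
\[ n = 10, \quad \sum x = 564, \quad \sum x^2 = 32604, \quad \sum y = 14365, \quad \sum xy = 818755 \]
The least squares estimates of the regression coefficients are:
\[ b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = \frac{85690}{7944} \approx 10.787 \approx 10.8 \]
\[ b_0 = \bar{y} - b_1 \bar{x} = 1436.5 - 10.787(56.4) \approx 828 \]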

Point Estimation of Mean Response
The estimated regression function is:
\[ \hat{y} = 828 + 10.8x \qquad (\text{Sales} = 828 + 10.8 \times \text{Expenditure}) \]
This means that if the weekly advertising expenditure is increased by $1, we would expect the weekly sales to increase by $10.8.

Point Estimation of Mean Response
Fitted values for the sample data are obtained by substituting the x value into the estimated regression function. For example, if the advertising expenditure is $50, then the estimated sales is:
\[ \text{Sales} = 828 + 10.8(50) = 1368 \]
This is called the point estimate (forecast) of the mean response (sales).
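The advertising calculations can be reproduced in a few lines of plain Python. The sketch recomputes the column sums and then the coefficients using the computational formula. It finishes with the point estimate at x = 50, using the rounded line \( \hat{y} = 828 + 10.8x \) as the slides do.

```python
# Advertising example: weekly advertising expenditure x, weekly sales y
expenditure = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
sales = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
n = len(sales)

sum_x = sum(expenditure)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in expenditure)
sum_xy = sum(x * y for x, y in zip(expenditure, sales))
print(n, sum_x, sum_x2, sum_y, sum_xy)  # 10 564 32604 14365 818755

# Computational form of the least squares slope, then b0 = y_bar - b1 * x_bar
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 85690 / 7944
b0 = sum_y / n - b1 * sum_x / n
print(round(b1, 3), round(b0, 1))  # 10.787 828.1

# Point estimate of mean sales at x = 50, using the rounded line y-hat = 828 + 10.8x
point_estimate = 828 + 10.8 * 50
print(round(point_estimate, 1))  # 1368.0
```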

Linear correlation and linear regression

Covariance
\[ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

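Interpreting Covariance
- cov(x,y) > 0: X and Y are positively correlated
- cov(x,y) < 0: X and Y are inversely correlated
- cov(x,y) = 0: X and Y are uncorrelated (no linear relationship). Independence implies zero covariance, but zero covariance does not by itself imply independence.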
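A small plain-Python sketch of the three cases, using the sample covariance formula with the n - 1 divisor. The data are made up for illustration. The last case shows why zero covariance is weaker than independence: y = x² depends completely on x, yet its covariance with a symmetric x is 0.

```python
def sample_cov(x, y):
    """cov(x, y) = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

x = [-2, -1, 0, 1, 2]                       # hypothetical, symmetric around 0
print(sample_cov(x, [2 * v + 1 for v in x]))   # 5.0: y rises with x
print(sample_cov(x, [-3 * v for v in x]))      # -7.5: y falls as x rises
print(sample_cov(x, [v * v for v in x]))       # 0.0, yet y = x**2 depends on x
```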

Correlation coefficient
Pearson's correlation coefficient is standardized covariance (unitless):
\[ r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} \]

Correlation
- Measures the relative strength of the linear relationship between two variables
- Unit-less
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker any linear relationship

Scatter Plots of Data with Various Correlation Coefficients
[Figure: scatter plots with r = -1, r = -0.6, r = 0, r = +1, r = +0.3 and another with r = 0. Slide from: Statistics for Managers Using Microsoft Excel, 4th Edition, 2004, Prentice-Hall]

Linear Correlation
[Figure: linear relationships vs. curvilinear relationships]

Linear Correlation
[Figure: strong relationships vs. weak relationships]

Linear Correlation
[Figure: no relationship]

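Calculating by hand
\[ \hat{r} = \frac{\widehat{\operatorname{cov}}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} = \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \cdot \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}} \]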

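Simpler calculation formula
The \( n - 1 \) factors cancel, leaving
\[ \hat{r} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_x\, SS_y}} \]
where \( SS_{xy} \) is the numerator of the covariance and \( SS_x \), \( SS_y \) are the numerators of the variances.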

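Least Squares Estimation
Slope (beta coefficient): \( \hat{\beta} = \dfrac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} \)
Intercept: calculate \( \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \)
The regression line always goes through the point \( (\bar{x}, \bar{y}) \).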

Relationship with correlation
\[ \hat{r} = \hat{\beta} \, \frac{SD_x}{SD_y} \]
In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).
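The correlation formulas and their link to the slope can be checked on the advertising data. This plain-Python sketch computes \( \hat{r} \) with the SS formula and the slope as Cov/Var. It then confirms that \( \hat{r} = \hat{\beta}\,SD_x/SD_y \) and that the fitted line passes through \( (\bar{x}, \bar{y}) \).

```python
import math

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]                        # advertising expenditure
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]    # sales
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # numerator of covariance
ss_x = sum((a - x_bar) ** 2 for a in x)                        # numerator of var(x)
ss_y = sum((b - y_bar) ** 2 for b in y)                        # numerator of var(y)

r = ss_xy / math.sqrt(ss_x * ss_y)   # simpler calculation formula
beta = ss_xy / ss_x                  # Cov(x, y) / Var(x); the (n - 1) factors cancel
alpha = y_bar - beta * x_bar
sd_x = math.sqrt(ss_x / (n - 1))
sd_y = math.sqrt(ss_y / (n - 1))

print(round(r, 3))
print(math.isclose(r, beta * sd_x / sd_y))          # r = beta * SD_x / SD_y
print(math.isclose(alpha + beta * x_bar, y_bar))    # line passes through (x_bar, y_bar)
```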

Residual Analysis: check assumptions
\[ e_i = Y_i - \hat{Y}_i \]
The residual for observation i, \( e_i \), is the difference between its observed and predicted value. Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand. Check the assumptions of regression by examining the residuals:
- Examine for the linearity assumption
- Examine for constant variance for all levels of X (homoscedasticity)
- Evaluate the normal distribution assumption
- Evaluate the independence assumption
Graphical Analysis of Residuals

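Residual = observed - predicted
[Figure: at X = 95 nmol/l the observed value is y = 48 and the predicted value is \( \hat{y} = 34 \), so the residual is \( y - \hat{y} = 48 - 34 = 14 \)]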

Residual Analysis for Linearity
[Figure: Y vs. X and residuals vs. X for a relationship that is not linear and one that is linear]

Residual Analysis for Homoscedasticity
[Figure: Y vs. X and residuals vs. X for non-constant variance and for constant variance]

Residual Analysis for Independence
[Figure: residuals vs. X for observations that are not independent and for observations that are independent]

Example: weekly advertising expenditure

y      x    y-hat    Residual (e)
1250   41   1270.8   -20.8
1380   54   1411.2   -31.2
1425   63   1508.4   -83.4
1425   54   1411.2    13.8
1450   48   1346.4   103.6
1300   46   1324.8   -24.8
1400   62   1497.6   -97.6
1510   61   1486.8    23.2
1575   64   1519.2    55.8
1650   71   1594.8    55.2

Estimation of the variance of the error terms, \( \sigma^2 \)
The variance \( \sigma^2 \) of the error terms \( \varepsilon_i \) in the regression model needs to be estimated for a variety of purposes. It gives an indication of the variability of the probability distributions of y. It is needed for making inferences concerning the regression function and the prediction of y.

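Regression Standard Error
To estimate σ, we work with the variance and take the square root to obtain the standard deviation. For simple linear regression, the estimate of \( \sigma^2 \) is the average squared residual. The average divides by n - 2 rather than n because two parameters, \( b_0 \) and \( b_1 \), were estimated:
\[ s_{y \cdot x}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} e_i^2 = \frac{1}{n - 2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
To estimate σ, use \( s_{y \cdot x} = \sqrt{s_{y \cdot x}^2} \). \( s_{y \cdot x} \) estimates the standard deviation σ of the error term ε in the statistical model for simple linear regression.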

Regression Standard Error

y      x    y-hat    Residual (e)   e^2
1250   41   1270.8   -20.8            432.64
1380   54   1411.2   -31.2            973.44
1425   63   1508.4   -83.4           6955.56
1425   54   1411.2    13.8            190.44
1450   48   1346.4   103.6          10732.96
1300   46   1324.8   -24.8            615.04
1400   62   1497.6   -97.6           9525.76
1510   61   1486.8    23.2            538.24
1575   64   1519.2    55.8           3113.64
1650   71   1594.8    55.2           3047.04
Total                                36124.76

Fitted line: \( \hat{y} = 828 + 10.8x \)
\[ s_{y \cdot x} = \sqrt{\frac{36124.76}{10 - 2}} = 67.19818 \]
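A sketch that reproduces the residual table and the regression standard error. It uses the same rounded coefficients as the slides (b0 = 828, b1 = 10.8), so the residuals match the table rather than the exact least squares fit.

```python
import math

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
b0, b1 = 828, 10.8  # rounded coefficients, as used on the slides

y_hat = [b0 + b1 * xi for xi in x]
resid = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e * e for e in resid)
s = math.sqrt(sse / (len(x) - 2))  # divide by n - 2: two parameters were estimated

for xi, yi, yh, e in zip(x, y, y_hat, resid):
    print(f"{xi:3d} {yi:5d} {yh:7.1f} {e:7.1f}")
print(round(sse, 2), round(s, 5))
```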

Residual plots
The points in this residual plot have a curved pattern, so a straight line fits poorly.

Residual plots
The points in this plot show more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.

Variable transformations
If the residual plot suggests that the variance is not constant, a transformation can be used to stabilize the variance. If the residual plot suggests a non-linear relationship between x and y, a transformation may reduce it to one that is approximately linear.
Common linearizing transformations are: \( \frac{1}{x},\ \log(x) \)
Variance stabilizing transformations are: \( \frac{1}{y},\ \log(y),\ \sqrt{y},\ y^2 \)
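A sketch of a linearizing transformation in action. The data are hypothetical and noise-free, generated from the curved relation y = 2 + 3 log(x). A straight line fitted on raw x leaves a large, patterned SSE. The same least squares fit on log(x) is exact and recovers the intercept and slope.

```python
import math

# Hypothetical noise-free data following a curved relation y = 2 + 3*log(x)
x = [1, 2, 4, 8, 16, 32]
y = [2 + 3 * math.log(v) for v in x]

def ls_line(xs, ys):
    """Least squares intercept and slope (deviation formulas)."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    b1 = sum((a - xb) * (b - yb) for a, b in zip(xs, ys)) / sum((a - xb) ** 2 for a in xs)
    return yb - b1 * xb, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals around the line b0 + b1*x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys))

# Straight line on the raw x: the curvature leaves large, patterned residuals
b0_raw, b1_raw = ls_line(x, y)
sse_raw = sse(x, y, b0_raw, b1_raw)

# Straight line on log(x): the relation becomes exactly linear
log_x = [math.log(v) for v in x]
b0_log, b1_log = ls_line(log_x, y)
sse_log = sse(log_x, y, b0_log, b1_log)

print(round(sse_raw, 2), round(sse_log, 10))
print(round(b0_log, 6), round(b1_log, 6))  # recovers intercept 2 and slope 3
```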

Two predictors: age and vitamin D
[Figure: 3D plot with age and vitamin D as predictors]

Different 3D view

Fit a plane rather than a line
On the plane, the slope for vitamin D is the same at every age; thus, the slope for vitamin D represents the effect of vitamin D when age is held constant.
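A minimal NumPy sketch of fitting a plane with two predictors by least squares. The slides do not give the data or the outcome variable. The ages, vitamin D values and the generating equation y = 20 + 0.5·age + 0.2·vitD below are hypothetical and noise-free, so the fit recovers the coefficients exactly. The final loop illustrates the slide's point: the change in ŷ for a given change in vitamin D is the same at every age.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 20 + 0.5*age + 0.2*vit_d
age = np.array([30, 40, 50, 60, 35, 45, 55, 65], dtype=float)
vit_d = np.array([40, 90, 60, 30, 75, 50, 95, 70], dtype=float)
y = 20 + 0.5 * age + 0.2 * vit_d

# Design matrix: a column of ones (intercept), then the two predictors
X = np.column_stack([np.ones_like(age), age, vit_d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b_age, b_vitd = coef
print(np.round(coef, 6))

# Same vitamin D slope at every age: raising vit D from 50 to 60 shifts y-hat by 10*b_vitd
for a in (30.0, 60.0):
    low = b0 + b_age * a + b_vitd * 50
    high = b0 + b_age * a + b_vitd * 60
    print(a, round(high - low, 6))
```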