ENGI 3423 Simple Linear Regression Page 12-01


Simple Linear Regression

Sometimes an experiment is set up where the experimenter has control over the values of one or more variables X and measures the resulting values of another variable Y, producing a field of observations. The question then arises: What is the best line (or curve) to draw through this field of points?

Values of X are controlled by the experimenter, so the non-random variable x is called the controlled variable or the independent variable or the regressor. Values of Y are random, but are influenced by the value of x. Thus Y is called the dependent variable or the response variable. We want a line of best fit so that, given a value of x, we can predict the value of Y for that value of x.

The simple linear regression model is that the predicted value of y is

    ŷ = β₀ + β₁ x

and that the observed value of Y is

    Y = β₀ + β₁ x + ε

where ε is the error. It is assumed that the errors are normally distributed as ε ~ N(0, σ²), with a constant variance σ². The point estimates of the errors ε are the residuals e = y − ŷ.

With the assumptions

    (1)  Y = β₀ + β₁ x + ε
    (2)  ∀x :  (Y | x) ~ N(β₀ + β₁ x, σ²)    [which includes (3): V[Y] is independent of x]

in place, it then follows that β̂₀ and β̂₁ are unbiased estimators of the coefficients β₀ and β₁:

    E[β̂₀ + β̂₁ x] = β₀ + β₁ x   ∀x    (note the lower case x)

Methods for dealing with non-linear regression are available in the course text, but are beyond the scope of this course.

ENGI 3423 Simple Linear Regression Page 12-02

Examples illustrating violations of the assumptions in the simple linear regression model:

    1. [scatter plot]  (1) violated: not linear
    2. [scatter plot]  (3) violated: variance not constant
    3. [scatter plot]  OK
    4. [scatter plot]  (1) & (3) violated

If the assumptions are true, then the probability distribution of (Y | x) is N(β₀ + β₁ x, σ²).

ENGI 3423 Simple Linear Regression Page 12-03

Example 12.1

Given that Y = 10 − 0.5 x + ε, where ε ~ N(0, 2), find the probability that the observed value of y at x = 8 will exceed the observed value of y at x = 7.

    Y ~ N(10 − 0.5 x, 2)

Let Y₇ = the observed value of y at x = 7 and Y₈ = the observed value of y at x = 8; then

    Y₇ ~ N(6.5, 2)   and   Y₈ ~ N(6, 2)
    Y₈ − Y₇ ~ N(6 − 6.5, 2 + 2)   ⟹   μ = −0.5 ,  σ² = 4

    P[Y₈ − Y₇ > 0] = P[Z > (0 − (−0.5)) / 2] = 1 − Φ(0.25) = 0.4013

Despite β₁ < 0, P[Y₈ > Y₇] > 40% !

For any x in the range of the regression model, more than 95% of all Y will lie within 2σ (either side) of the regression line.
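As a quick numerical check of Example 12.1 (not part of the original notes), the probability can be reproduced in a few lines of Python; the sketch below assumes scipy is available.

    # Numerical check of Example 12.1:
    # Y = 10 - 0.5 x + eps, eps ~ N(0, 2), independent observations at x = 7 and x = 8.
    from math import sqrt
    from scipy.stats import norm

    beta0, beta1, var_eps = 10.0, -0.5, 2.0

    mu7 = beta0 + beta1 * 7              # E[Y | x = 7] = 6.5
    mu8 = beta0 + beta1 * 8              # E[Y | x = 8] = 6.0

    # Difference of two independent normals: Y8 - Y7 ~ N(mu8 - mu7, var + var)
    mu_diff = mu8 - mu7                  # -0.5
    sd_diff = sqrt(var_eps + var_eps)    # sqrt(4) = 2

    p = 1 - norm.cdf(0, loc=mu_diff, scale=sd_diff)   # P[Y8 - Y7 > 0]
    print(round(p, 4))                   # approximately 0.4013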

ENGI 3423 Simple Linear Regression Page 12-04

Derivation of the coefficients β̂₀ and β̂₁ of the regression line ŷ = β̂₀ + β̂₁ x :

We need to minimize the errors. Each error is estimated by the observed residual e = y − ŷ.

Minimize Σ eᵢ ?   NO (positive and negative residuals would cancel).

Use the SSE (sum of squares due to errors):

    SSE = Σ eᵢ² = Σ (yᵢ − β̂₀ − β̂₁ xᵢ)² = f(β̂₀, β̂₁)

Find β̂₀ and β̂₁ such that ∂(SSE)/∂β̂₀ = 0 and ∂(SSE)/∂β̂₁ = 0.
[Note: β̂₀, β̂₁ are the variables here, while the xᵢ, yᵢ are constants.]

    ∂(SSE)/∂β̂₀ = Σ 2(yᵢ − β̂₀ − β̂₁ xᵢ)(−1) = 0   ⟹   n β̂₀ + (Σ xᵢ) β̂₁ = Σ yᵢ            (1)

and

    ∂(SSE)/∂β̂₁ = Σ 2(yᵢ − β̂₀ − β̂₁ xᵢ)(−xᵢ) = 0  ⟹   (Σ xᵢ) β̂₀ + (Σ xᵢ²) β̂₁ = Σ xᵢ yᵢ    (2)

or, equivalently, in matrix form,

    [ n       Σ xᵢ  ] [β̂₀]     [ Σ yᵢ    ]
    [ Σ xᵢ    Σ xᵢ² ] [β̂₁]  =  [ Σ xᵢ yᵢ ]                                              (3)

and, after row reduction,

    [ n    Σ xᵢ               ] [β̂₀]     [ Σ yᵢ                   ]
    [ 0    n Σ xᵢ² − (Σ xᵢ)²  ] [β̂₁]  =  [ n Σ xᵢ yᵢ − Σ xᵢ Σ yᵢ  ]                      (4)

ENGI 3423 Simple Linear Regression Page 12-05

The solution to the linear system of the two normal equations (1) and (2) is:

from the lower row of matrix equation (4),

    β̂₁ = S_xy / S_xx ,   where
    S_xy = n Σ xᵢ yᵢ − (Σ xᵢ)(Σ yᵢ)   and   S_xx = n Σ xᵢ² − (Σ xᵢ)²

or, equivalently,

    β̂₁ = (sample covariance of x, y) / (sample variance of x)

and, from equation (1),

    β̂₀ = (1/n)(Σ yᵢ − β̂₁ Σ xᵢ) = ȳ − β̂₁ x̄ .

A form that is less susceptible to round-off errors (but less convenient for manual computations) is

    β̂₁ = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)²   and   β̂₀ = ȳ − β̂₁ x̄ .

The regression line of Y on x is   y − ȳ = β̂₁ (x − x̄).

Equation (1) guarantees that all simple linear regression lines pass through the centroid (x̄, ȳ) of the data.

It turns out that the simple linear regression method remains valid even if the values of the regressor x are also random. However, note that interchanging x with y (so that Y is the regressor and X is the response) results in a different regression line (unless X and Y are perfectly correlated).
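The formulas above translate directly into a short computation. The following sketch is an illustration, not part of the original notes; the function name, variable names and the small data set are my own. It fits β̂₀ and β̂₁ using the S_xy and S_xx definitions given here and confirms that the fitted line passes through the centroid (x̄, ȳ).

    # Least-squares slope and intercept using the Sxy / Sxx formulas above
    # (illustrative sketch; names and sample data are not from the notes).

    def fit_simple_linear_regression(x, y):
        n = len(x)
        sum_x, sum_y = sum(x), sum(y)
        sum_xx = sum(xi * xi for xi in x)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))

        s_xy = n * sum_xy - sum_x * sum_y      # Sxy = n*sum(xy) - sum(x)*sum(y)
        s_xx = n * sum_xx - sum_x ** 2         # Sxx = n*sum(x^2) - (sum(x))^2

        beta1 = s_xy / s_xx                    # slope
        beta0 = (sum_y - beta1 * sum_x) / n    # intercept = y_bar - beta1 * x_bar
        return beta0, beta1

    # The fitted line always passes through the centroid (x_bar, y_bar):
    x = [0, 1, 2, 3, 4]
    y = [1.1, 2.9, 5.2, 6.8, 9.1]              # made-up illustrative data
    b0, b1 = fit_simple_linear_regression(x, y)
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    print(abs((b0 + b1 * x_bar) - y_bar) < 1e-12)   # True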

ENGI 3423 Simple Linear Regression Page 12-06

Example 12.2   (the same data set as the earlier paired two-sample t-test example)

Nine volunteers are tested before and after a training programme. Find the line of best fit for the posterior (after training) scores as a function of the prior (before training) scores.

    Volunteer:        1   2   3   4   5   6   7   8   9
    After training:  75  66  69  45  54  85  58  91  62
    Before training: 72  65  64  39  51  85  52  92  58

Let Y = score after training and X = score before training.

In order to use the simple linear regression model, the assumptions

    Y = β₀ + β₁ x + ε   and   ∀x : (Y | x) ~ N(β₀ + β₁ x, σ²)

must hold. From a plot of the data (see http://www.engr.mun.ca/~ggeorge/3423/demos/regress.xls and http://www.engr.mun.ca/~ggeorge/3423/demos/ex.mpj), one can see that the assumptions are reasonable.

    [Scatter plot]    [Normal probability plot of residuals]

ENGI 3423 Simple Linear Regression Page 12-07

Calculations:

     i     x     y      x²      xy      y²
     1    72    75    5184    5400    5625
     2    65    66    4225    4290    4356
     3    64    69    4096    4416    4761
     4    39    45    1521    1755    2025
     5    51    54    2601    2754    2916
     6    85    85    7225    7225    7225
     7    52    58    2704    3016    3364
     8    92    91    8464    8372    8281
     9    58    62    3364    3596    3844
   Sum:  578   605   39384   40824   42397

    S_xy = n Σ xᵢ yᵢ − (Σ xᵢ)(Σ yᵢ) = 9 × 40824 − 578 × 605 = 17726
        [Note: S_xy = n(n−1) × sample covariance of (X, Y)]

    S_xx = n Σ xᵢ² − (Σ xᵢ)² = 9 × 39384 − 578² = 20372
        [Note: S_xx = n(n−1) × sample variance of X]

    β̂₁ = S_xy / S_xx = 17726 / 20372 = 0.870116

    and   β̂₀ = (1/n)(Σ yᵢ − β̂₁ Σ xᵢ) = (605 − 0.870116 × 578) / 9 = 11.34145

Each predicted value ŷ of Y is then estimated using ŷ = β̂₀ + β̂₁ x = 11.34 + 0.87 x, and the point estimates of the unknown errors ε are the observed residuals e = y − ŷ.
[Use un-rounded values 11.341... and 0.870... to find the residuals.]

A measure of the degree to which the regression line fails to explain the variation in Y is the sum of squares due to error, SSE = Σ e² = Σ (y − β̂₀ − β̂₁ x)², which is given in the adjoining table.

      x     y       ŷ          e         e²
     72    75    73.98979    1.01021   1.02052
     65    66    67.89898   −1.89898   3.60612
     64    69    67.02886    1.97114   3.88538
     39    45    45.27597   −0.27597   0.07616
     51    54    55.71736   −1.71736   2.94932
     85    85    85.30130   −0.30130   0.09078
     52    58    56.58747    1.41253   1.99523
     92    91    91.39211   −0.39211   0.15375
     58    62    61.80817    0.19183   0.03680

    SSE = 13.814
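The whole calculation for this example can be checked with a few lines of code. The following Python sketch (illustrative only, not part of the original notes; it assumes the data exactly as tabulated above) reproduces β̂₁, β̂₀ and SSE.

    # Reproduce the Example 12.2 fit and SSE from the tabulated data
    # (illustrative sketch).
    x = [72, 65, 64, 39, 51, 85, 52, 92, 58]   # before training
    y = [75, 66, 69, 45, 54, 85, 58, 91, 62]   # after training
    n = len(x)

    s_xy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)   # 17726
    s_xx = n * sum(a * a for a in x) - sum(x) ** 2                  # 20372

    beta1 = s_xy / s_xx                          # ~0.8701
    beta0 = (sum(y) - beta1 * sum(x)) / n        # ~11.3414

    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    print(round(beta1, 4), round(beta0, 4), round(sse, 2))   # 0.8701 11.3414 13.81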

ENGI 3423 Simple Linear Regression Page 12-08

An Alternative Formula for SSE:

Using β̂₀ = ȳ − β̂₁ x̄,

    SSE = Σ (yᵢ − β̂₀ − β̂₁ xᵢ)² = Σ ( (yᵢ − ȳ) − β̂₁ (xᵢ − x̄) )²
        = Σ (yᵢ − ȳ)² − 2 β̂₁ Σ (xᵢ − x̄)(yᵢ − ȳ) + β̂₁² Σ (xᵢ − x̄)²
        = ( S_yy − 2 β̂₁ S_xy + β̂₁² S_xx ) / n ,   where S_yy = n Σ yᵢ² − (Σ yᵢ)²

But β̂₁ = S_xy / S_xx   ⟹

    SSE = (S_yy − β̂₁ S_xy) / n   or   SSE = (S_yy − S_xy² / S_xx) / n   or   SSE = (S_xx S_yy − S_xy²) / (n S_xx)

In this example, S_yy = 9 × 42397 − 605² = 15548 and

    SSE = (20372 × 15548 − 17726²) / (9 × 20372) = 13.814

However, this formula is very sensitive to round-off errors: if all terms are rounded off prematurely to three significant figures, then

    SSE ≈ (20400 × 15500 − 17700²) / (9 × 20400) = 15.85   (2 d.p.)

    [Sketch: a data point y, its predicted value ŷ and the mean ȳ, illustrating SSE = Σ (y − ŷ)² = Σ e² and SST = Σ (y − ȳ)².]
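The round-off warning above is easy to demonstrate numerically. The following sketch (illustrative, not from the notes) evaluates the SSE formula once with the exact sums of squares for this example and once with the same quantities rounded to three significant figures.

    # Demonstrate the round-off sensitivity of SSE = (Sxx*Syy - Sxy**2) / (n*Sxx)
    # for Example 12.2 (illustrative sketch).
    n = 9
    s_xx, s_yy, s_xy = 20372, 15548, 17726        # exact values
    s_xx3, s_yy3, s_xy3 = 20400, 15500, 17700     # rounded to 3 s.f.

    sse_exact = (s_xx * s_yy - s_xy ** 2) / (n * s_xx)
    sse_rounded = (s_xx3 * s_yy3 - s_xy3 ** 2) / (n * s_xx3)

    print(round(sse_exact, 2))     # 13.81
    print(round(sse_rounded, 2))   # 15.85 -- badly wrong after premature rounding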

ENGI 3423 Simple Linear Regression Page 12-09

The total variation in Y is SST (the sum of squares - total):

    SST = Σ (yᵢ − ȳ)² = S_yy / n

(which is (n−1) × the sample variance of y).

In this example,   SST = 15548 / 9 = 1727.555...

The total variation (SST) can be partitioned into the variation that can be explained by the regression line,

    SSR = Σ (ŷᵢ − ȳ)²

and the variation that remains unexplained by the regression line (SSE):

    SST = SSR + SSE

The proportion of the variation in Y that is explained by the regression line is known as the coefficient of determination,

    r² = SSR / SST = 1 − SSE / SST

In this example,   r² = 1 − (13.81...) / (1727.555...) = 0.99200...

Therefore the regression model in this example explains 99.2% of the total variation in y.

Note:   SSR = β̂₁ S_xy / n   and   SST = S_yy / n ,   and

    r = S_xy / √(S_xx S_yy) = Σ (xᵢ − x̄)(yᵢ − ȳ) / √( Σ (xᵢ − x̄)² Σ (yᵢ − ȳ)² )

The coefficient of determination is just the square of the sample correlation coefficient r. Thus r = √(r²) = 0.996. It is no surprise that the two sets of test scores in this example are very strongly correlated. Most of the points on the graph are very close to the regression line y = 0.87 x + 11.34.
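Continuing the same numerical check (illustrative only, not part of the notes), the coefficient of determination can be computed both from the sum-of-squares partition and as the square of the sample correlation coefficient; the two routes agree.

    # r^2 two ways for Example 12.2 (illustrative sketch).
    from math import sqrt

    n = 9
    s_xx, s_yy, s_xy = 20372, 15548, 17726

    sst = s_yy / n                                   # total variation, ~1727.56
    sse = (s_xx * s_yy - s_xy ** 2) / (n * s_xx)     # ~13.81
    r_squared = 1 - sse / sst                        # ~0.992

    r = s_xy / sqrt(s_xx * s_yy)                     # sample correlation, ~0.996
    print(round(r_squared, 4), round(r * r, 4))      # both ~0.992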

ENGI 3423 Simple Linear Regression Page 12-10

A point estimate of the unknown population variance σ² of the errors ε is the sample variance, or mean square error,

    s² = MSE = SSE / (number of degrees of freedom)

But the calculation of s² includes two parameters that are estimated from the data: β̂₀ and β̂₁. Therefore two degrees of freedom are lost and

    MSE = SSE / (n − 2)

In this example, MSE = 13.814 / 7 = 1.973.

A concise method of displaying some of this information is the ANOVA table (used in the analysis-of-variance chapters of Devore). The f value in the top right corner of the table is the square of a t value that can be used in a hypothesis test on the value of the slope coefficient β₁.

Sequence of manual calculations:

    {n, Σx, Σy, Σx², Σxy, Σy²} → {S_xx, S_xy, S_yy} → {β̂₀, β̂₁, SSR, SST} → {SSE = SST − SSR} → {MSR, MSE} → f → t

    Source       Degrees of Freedom   Sums of Squares     Mean Squares                 f
    Regression   1                    SSR = 1713.74...    MSR = SSR / 1 = 1713.74...   MSR/MSE = 868.4...
    Error        n − 2 = 7            SSE = 13.81...      MSE = SSE / (n−2) = 1.973...
    Total        n − 1 = 8            SST = 1727.555...

To test H₀: β₁ = 0 (no useful linear association) against Hₐ: β₁ ≠ 0 (a useful linear association exists), we compare |t| = √f with t_{α/2, n−2}.
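A sketch of this hypothesis test on β₁ for Example 12.2 follows (illustrative only; it assumes scipy.stats is available for the t distribution). It reproduces the f and t values of the ANOVA table above and shows how small the p-value is.

    # Test H0: beta1 = 0 for Example 12.2 using the ANOVA quantities above
    # (illustrative sketch; assumes scipy is available).
    from math import sqrt
    from scipy.stats import t as t_dist

    n = 9
    s_xx, s_yy, s_xy = 20372, 15548, 17726

    sst = s_yy / n
    ssr = (s_xy ** 2) / (n * s_xx)
    sse = sst - ssr

    msr = ssr / 1
    mse = sse / (n - 2)
    f = msr / mse                          # ~868.4
    t = sqrt(f)                            # ~29.47

    p_value = 2 * t_dist.sf(t, df=n - 2)   # two-sided p-value, well below 1e-7
    print(round(f, 1), round(t, 2), p_value)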

ENGI 3423 Simple Linear Regression Page 12-11

In this example,

    t = √(868.4...) = 29.47... >> t_{.025, 7}

(the p-value is < 10⁻⁷), so we reject H₀ in favour of Hₐ at any reasonable level of significance α.

The standard error s_b of β̂₁ is s / √(S_xx / n), so the t value is also equal to

    t = β̂₁ / s_b = β̂₁ / √( MSE / (S_xx / n) )

Yet another alternative test of the significance of the linear association is a hypothesis test on the population correlation coefficient ρ (H₀: ρ = 0 vs. Hₐ: ρ ≠ 0), using the test statistic

    t = r √(n − 2) / √(1 − r²)

which is entirely equivalent to the other two t statistics above.

Example 12.3

(a) Find the line of best fit to the data

    x:   0    0    1    1    1    2    2    2    3    4
    y:  6.1  5.3  4.1  5.1  4.4  3.4  2.6  3.1  1.8  2.1

(b) Estimate the value of y when x = 2.
(c) Why can't the regression line be used to estimate y when x = 10?
(d) Find the sample correlation coefficient.
(e) Does a useful linear relationship between Y and x exist?

(a) A plot of these data follows. The Excel spreadsheet file for these data can be found at http://www.engr.mun.ca/~ggeorge/3423/demos/regress3.xls.

    [Scatter plot of the data]

ENGI 3423 Simple Linear Regression Page 12-12

The summary statistics are

    n = 10 ,  Σx = 16 ,  Σy = 38 ,  Σx² = 40 ,  Σxy = 45.6 ,  Σy² = 163.06

from which

    S_xy = n Σxy − (Σx)(Σy) = 10 × 45.6 − 16 × 38 = −152
    S_xx = n Σx² − (Σx)² = 10 × 40 − 16² = 144
    S_yy = n Σy² − (Σy)² = 10 × 163.06 − 38² = 186.6

    β̂₁ = S_xy / S_xx = −152 / 144 = −1.0556

    and   β̂₀ = ȳ − β̂₁ x̄ = (38 − (−1.0556)(16)) / 10 = 5.4889

so the regression line is

    y = 5.489 − 1.056 x   (3 d.p.)

(b)  x = 2   ⟹   y = 5.488... − 1.055... × 2 = 3.38   (2 d.p.)

(c)  x = 10  ⟹   y = 5.488... − 1.055... × 10 = −5.07 < 0 !   (2 d.p.)

Problem: x = 10 is outside the sample range for x. The SLR model may be invalid. In one word: EXTRAPOLATION.

(d)  r = S_xy / √(S_xx S_yy) = −152 / √(144 × 186.6) = −0.927... ≈ −0.93

(e)  SSR = S_xy² / (n S_xx) = (−152)² / (10 × 144) = 16.04
     SST = S_yy / n = 186.6 / 10 = 18.66
     and   SSE = SST − SSR = 18.66 − 16.04... = 2.615...

ENGI 3423 Simple Linear Regression Page 12-13

The ANOVA table is then:

    Source       d.f.   SS            MS            f
    Regression   1      16.0444...    16.0444...    49.07...
    Error        8       2.6155...     0.3269...
    Total        9      18.66

from which

    t = √f = 7.005   (3 d.p.)

But t_{.0005, 8} = 5.041...   ⟹   |t| > t_{.0005, 8}

Therefore reject H₀: β₁ = 0 in favour of Hₐ: β₁ ≠ 0 at any reasonable level of significance α.   [p-value ≈ 0.0001]

OR

    t = r √(n − 2) / √(1 − r²) = (−0.927...) √8 / √(1 − 0.85983...) = −7.005

⟹ reject H₀: ρ = 0 in favour of Hₐ: ρ ≠ 0 (a significant linear association exists).

[Also, from the ANOVA table, r² = SSR / SST = 16.04 / 18.66 = 0.8598. Therefore the regression line explains ~86% of the variation in y. r = −√(r²) = −0.927, as before.]

Confidence and Prediction Intervals

The simple linear regression model Y = β₀ + β₁ x + ε leads to a line of best fit in the least squares sense, which provides an expected value of Y given a value for x:

    ŷ = β̂₀ + β̂₁ x ,   an estimate of E[Y | x] = μ_{Y|x}

The uncertainty in this expected value has two components: the square of the standard error of the scatter of the observed points about the regression line (σ²/n), and the uncertainty in the position of the regression line itself, which increases with the distance of the chosen x from the centroid of the data but decreases with increasing spread of the full set of x values:

    σ² (x − x̄)² / (S_xx / n)

The unknown variance σ² of individual points about the true regression line is estimated by the mean square error s² = MSE.

ENGI 3423 Simple Linear Regression Page 12-14

Thus a 100(1 − α)% confidence interval for the expected value of Y at x = x₀ has endpoints at

    (β̂₀ + β̂₁ x₀) ± t_{α/2, n−2} · s · √( 1/n + (x₀ − x̄)² / (S_xx / n) )

The prediction error for a single point is the residual E = Y − ŷ, which can be treated as the difference of two independent random variables. The variance of the prediction error is then

    V[E] = V[Y] + V[ŷ] = σ² + σ² ( 1/n + (x₀ − x̄)² / (S_xx / n) )

Thus a 100(1 − α)% prediction interval for a single future observation of Y at x = x₀ has endpoints at

    (β̂₀ + β̂₁ x₀) ± t_{α/2, n−2} · s · √( 1 + 1/n + (x₀ − x̄)² / (S_xx / n) )

The prediction interval is always wider than the confidence interval.

Example 12.3 (continued)

(f) Find the 95% confidence interval for the expected value of Y at x = 2 and x = 5.
(g) Find the 95% prediction interval for a future value of Y at x = 2 and at x = 5.

(f) α = 5%   ⟹   α/2 = 0.025

Using the various values from parts (a) and (e):

    t_{.025, 8} = 2.306... ,   s = √MSE = 0.5718... ,   x̄ = 1.6 ,   S_xx / n = 14.4 ,
    β̂₀ = 5.4888... ,   β̂₁ = −1.0555...

x₀ = 2   ⟹   the 95% CI for μ_{Y|2} is

    (β̂₀ + β̂₁ x₀) ± t_{α/2, n−2} · s · √( 1/n + (x₀ − x̄)² / (S_xx / n) )
    = 3.3777... ± 1.3185... × √(0.1111...)
    = 3.3777... ± 0.4395...

    ⟹   2.94 < E[Y | x = 2] < 3.82   (to 3 s.f.)

ENGI 3423 Simple Linear Regression Page 12-15

x₀ = 5   ⟹   the 95% CI for μ_{Y|5} is

    (β̂₀ + β̂₁ x₀) ± t_{α/2, n−2} · s · √( 1/n + (x₀ − x̄)² / (S_xx / n) )
    = 0.2111... ± 1.3185... × √(0.90277...)
    = 0.2111... ± 1.2528...

    ⟹   −1.04 < E[Y | x = 5] < 1.46   (to 3 s.f.)

(g) x₀ = 2   ⟹   the 95% PI for Y is

    (β̂₀ + β̂₁ x₀) ± t_{α/2, n−2} · s · √( 1 + 1/n + (x₀ − x̄)² / (S_xx / n) )
    = 3.3777... ± 1.3185... × √(1.1111...)
    = 3.3777... ± 1.3898...

    ⟹   1.99 < Y < 4.77   (to 3 s.f.) at x = 2

x₀ = 5   ⟹   the 95% PI for Y is

    (β̂₀ + β̂₁ x₀) ± t_{α/2, n−2} · s · √( 1 + 1/n + (x₀ − x̄)² / (S_xx / n) )
    = 0.2111... ± 1.3185... × √(1.90277...)
    = 0.2111... ± 1.8188...

    ⟹   −1.61 < Y < 2.03   (to 3 s.f.) at x = 5

Note how the confidence and prediction intervals both become wider the further away from the centroid the value of x₀ is. The two intervals at x = 5 are wide enough to cross the x-axis, which is an illustration of the dangers of extrapolation beyond the range of x for which data exist.

Sketch of confidence and prediction intervals for Example 12.3 (f) and (g):

    [(f) 95% Confidence Intervals]    [(g) 95% Prediction Intervals]
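The intervals of parts (f) and (g) can be reproduced with a short script. The sketch below is illustrative only (it assumes scipy.stats for the t critical value and the data as tabulated in Example 12.3); it prints the CI and PI endpoints at x₀ = 2 and x₀ = 5.

    # Confidence and prediction intervals for Example 12.3 at x0 = 2 and x0 = 5
    # (illustrative sketch; assumes scipy is available).
    from math import sqrt
    from scipy.stats import t as t_dist

    x = [0, 0, 1, 1, 1, 2, 2, 2, 3, 4]
    y = [6.1, 5.3, 4.1, 5.1, 4.4, 3.4, 2.6, 3.1, 1.8, 2.1]
    n = len(x)

    s_xy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)   # -152
    s_xx = n * sum(a * a for a in x) - sum(x) ** 2                  # 144
    s_yy = n * sum(b * b for b in y) - sum(y) ** 2                  # 186.6

    beta1 = s_xy / s_xx
    beta0 = (sum(y) - beta1 * sum(x)) / n
    x_bar = sum(x) / n

    sse = (s_xx * s_yy - s_xy ** 2) / (n * s_xx)
    s = sqrt(sse / (n - 2))                      # ~0.5718
    t_crit = t_dist.ppf(0.975, df=n - 2)         # t_{.025, 8} ~ 2.306

    for x0 in (2, 5):
        y_hat = beta0 + beta1 * x0
        lever = 1 / n + (x0 - x_bar) ** 2 / (s_xx / n)
        ci = t_crit * s * sqrt(lever)            # half-width, confidence interval
        pi = t_crit * s * sqrt(1 + lever)        # half-width, prediction interval
        print(x0, round(y_hat - ci, 2), round(y_hat + ci, 2),
                  round(y_hat - pi, 2), round(y_hat + pi, 2))

    # Expected output (approximately):
    # 2 2.94 3.82 1.99 4.77
    # 5 -1.04 1.46 -1.61 2.03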