Statistics for Business and Economics

Chapter 11: Simple Regression
Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall

11.1 Overview of Linear Models
- An equation can be fit to show the best linear relationship between two variables:
\[ Y = \beta_0 + \beta_1 X \]
where Y is the dependent variable and X is the independent variable, \( \beta_0 \) is the Y-intercept, and \( \beta_1 \) is the slope.

Least Squares Regression
- Estimates for the coefficients \( \beta_0 \) and \( \beta_1 \) are found using a least squares regression technique.
- The least-squares regression line, based on sample data, is
\[ \hat{y} = b_0 + b_1 x \]
where \( b_1 \) is the slope of the line and \( b_0 \) is the y-intercept:
\[ b_1 = \frac{\operatorname{Cov}(x, y)}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]

Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain (also called the endogenous variable)
Independent variable: the variable used to explain the dependent variable (also called the exogenous variable)

11.2 Linear Regression Model
- The relationship between X and Y is described by a linear function.
- Changes in Y are assumed to be caused by changes in X.
- Linear regression population equation model:
\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
where \( \beta_0 \) and \( \beta_1 \) are the population model coefficients and \( \varepsilon \) is a random error term.

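Simple Linear Regression Model
The population regression model:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where \( Y_i \) is the dependent variable, \( \beta_0 \) is the population Y-intercept, \( \beta_1 \) is the population slope coefficient, \( X_i \) is the independent variable, and \( \varepsilon_i \) is the random error term. \( \beta_0 + \beta_1 X_i \) is the linear component and \( \varepsilon_i \) is the random error component.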

Simple Linear Regression Model (continued)
[Figure: the population regression line \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \) with intercept \( \beta_0 \) and slope \( \beta_1 \); at a given \( X_i \), the random error \( \varepsilon_i \) is the vertical distance between the observed value of Y and the predicted value of Y on the line.]

Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{y}_i = b_0 + b_1 x_i \]
where \( \hat{y}_i \) is the estimated (or predicted) y value for observation i, \( b_0 \) is the estimate of the regression intercept, \( b_1 \) is the estimate of the regression slope, and \( x_i \) is the value of x for observation i.
The individual random error terms \( e_i \) have a mean of zero:
\[ e_i = (y_i - \hat{y}_i) = y_i - (b_0 + b_1 x_i) \]

11.3 Least Squares Estimators
- \( b_0 \) and \( b_1 \) are obtained by finding the values of \( b_0 \) and \( b_1 \) that minimize the sum of the squared differences between y and \( \hat{y} \):
\[ \min \text{SSE} = \min \sum e_i^2 = \min \sum (y_i - \hat{y}_i)^2 = \min \sum \left[ y_i - (b_0 + b_1 x_i) \right]^2 \]
- Differential calculus is used to obtain the coefficient estimators \( b_0 \) and \( b_1 \) that minimize SSE.
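The calculus step is the standard one: set the partial derivatives of SSE with respect to \( b_0 \) and \( b_1 \) to zero (the normal equations) and solve. The second line substitutes \( b_0 = \bar{y} - b_1 \bar{x} \). The final step uses \( \sum x_i (y_i - \bar{y}) = \sum (x_i - \bar{x})(y_i - \bar{y}) \) and \( \sum x_i (x_i - \bar{x}) = \sum (x_i - \bar{x})^2 \). Both identities hold because deviations from a mean sum to zero.

```latex
\begin{aligned}
\frac{\partial\,\text{SSE}}{\partial b_0} &= -2\sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  &&\Longrightarrow\quad b_0 = \bar{y} - b_1\bar{x} \\
\frac{\partial\,\text{SSE}}{\partial b_1} &= -2\sum_{i=1}^{n} x_i\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  &&\Longrightarrow\quad \sum_{i=1}^{n} x_i\bigl[(y_i - \bar{y}) - b_1(x_i - \bar{x})\bigr] = 0 \\
& &&\Longrightarrow\quad b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\end{aligned}
```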

Least Squares Estimators (continued)
- The slope coefficient estimator is
\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\operatorname{Cov}(x, y)}{s_x^2} = r_{xy} \frac{s_y}{s_x} \]
- and the constant or y-intercept is
\[ b_0 = \bar{y} - b_1 \bar{x} \]
- The regression line always goes through the mean \( (\bar{x}, \bar{y}) \).

Finding the Least Squares Equation
- The coefficients \( b_0 \) and \( b_1 \), and other regression results in this chapter, will be found using a computer.
  - Hand calculations are tedious.
  - Statistical routines are built into Excel.
  - Other statistical analysis software can be used.

Linear Regression Model Assumptions
- The true relationship form is linear (Y is a linear function of X, plus random error).
- The error terms, \( \varepsilon_i \), are independent of the x values.
- The error terms are random variables with mean 0 and constant variance, \( \sigma^2 \) (the constant variance property is called homoscedasticity):
\[ E[\varepsilon_i] = 0 \quad \text{and} \quad E[\varepsilon_i^2] = \sigma^2 \quad \text{for } (i = 1, \ldots, n) \]
- The random error terms, \( \varepsilon_i \), are not correlated with one another, so that \( E[\varepsilon_i \varepsilon_j] = 0 \) for all \( i \neq j \).

Interpretation of the Slope and the Intercept
- \( b_0 \) is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values).
- \( b_1 \) is the estimated change in the average value of y as a result of a one-unit change in x.

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
- A random sample of 10 houses is selected.
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Sample Data for House Price Model
House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Graphical Presentation
- House price model: scatter plot
[Figure: scatter plot of house price ($1000s, vertical axis, 0 to 450) against square feet (horizontal axis, 0 to 3000).]

Graphical Presentation
- House price model: scatter plot and regression line
[Figure: the house price scatter plot with the fitted regression line; intercept = 98.248, slope = 0.10977.]
house price = 98.24833 + 0.10977 (square feet)

Interpretation of the Intercept, \( b_0 \)
house price = 98.24833 + 0.10977 (square feet)
- \( b_0 \) is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
- Here, no house in the sample has 0 square feet (sizes range from 1,100 to 2,450 square feet), so \( b_0 = 98.24833 \) should not be read as the price of a house of zero size.

Interpretation of the Slope Coefficient, \( b_1 \)
house price = 98.24833 + 0.10977 (square feet)
- \( b_1 \) measures the estimated change in the average value of Y as a result of a one-unit change in X.
- Here, \( b_1 = 0.10977 \) tells us that the average value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size.
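The arithmetic behind the $109.77 figure is simple. The difference between the predicted prices for two sizes one square foot apart equals \( b_1 \), in $1000s. In this sketch, the helper `predicted_price` is a hypothetical name, and the sizes 2,000 and 2,001 square feet are arbitrary choices within the observed range.

```python
# Fitted equation from the example: price in $1000s, size in square feet
B0, B1 = 98.24833, 0.10977

def predicted_price(square_feet):
    """Estimated average house price (in $1000s) for a given size."""
    return B0 + B1 * square_feet

# A one-unit (one square foot) increase in x changes the prediction by b1
change_thousands = predicted_price(2001) - predicted_price(2000)
change_dollars = change_thousands * 1000
print(f"${change_dollars:.2f} per additional square foot")  # $109.77 per additional square foot
```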

11.4 Measures of Variation
- Total variation is made up of two parts:
\[ \text{SST} = \text{SSR} + \text{SSE} \]
(total sum of squares = regression sum of squares + error sum of squares), where
\[ \text{SST} = \sum (y_i - \bar{y})^2, \qquad \text{SSR} = \sum (\hat{y}_i - \bar{y})^2, \qquad \text{SSE} = \sum (y_i - \hat{y}_i)^2 \]
and:
\( \bar{y} \) = average value of the dependent variable
\( y_i \) = observed values of the dependent variable
\( \hat{y}_i \) = predicted value of y for the given \( x_i \) value

Measures of Variation (continued)
- SST = total sum of squares
  - Measures the variation of the \( y_i \) values around their mean, \( \bar{y} \)
- SSR = regression sum of squares
  - Explained variation attributable to the linear relationship between x and y
- SSE = error sum of squares
  - Variation attributable to factors other than the linear relationship between x and y

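Measures of Variation (continued)
[Figure: at a data point \( (x_i, y_i) \), the total deviation \( y_i - \bar{y} \) splits into the explained part \( \hat{y}_i - \bar{y} \) and the residual \( y_i - \hat{y}_i \); squaring and summing each over all points gives \( \text{SST} = \sum (y_i - \bar{y})^2 \), \( \text{SSR} = \sum (\hat{y}_i - \bar{y})^2 \), and \( \text{SSE} = \sum (y_i - \hat{y}_i)^2 \).]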

Coefficient of Determination, \( R^2 \)
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
- The coefficient of determination is also called R-squared and is denoted as \( R^2 \):
\[ R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le R^2 \le 1 \]

Examples of Approximate \( r^2 \) Values
[Figure: two scatter plots of Y against X in which every point lies exactly on a straight line; \( r^2 = 1 \) in both.]
\( r^2 = 1 \): Perfect linear relationship between X and Y. 100% of the variation in Y is explained by variation in X.

Examples of Approximate \( r^2 \) Values (continued)
[Figure: two scatter plots of Y against X in which the points scatter around a line.]
\( 0 < r^2 < 1 \): Weaker linear relationships between X and Y. Some but not all of the variation in Y is explained by variation in X.

Examples of Approximate \( r^2 \) Values (continued)
[Figure: scatter plot of Y against X with \( r^2 = 0 \).]
\( r^2 = 0 \): No linear relationship between X and Y. The value of Y does not depend linearly on X. (None of the variation in Y is explained by variation in X.)

Correlation and \( R^2 \)
- The coefficient of determination, \( R^2 \), for a simple regression is equal to the simple correlation squared:
\[ R^2 = r_{xy}^2 \]

Estimation of Model Error Variance
- An estimator for the variance of the population model error is
\[ \hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\text{SSE}}{n - 2} \]
- Division by \( n - 2 \) instead of \( n - 1 \) is because the simple regression model uses two estimated parameters, \( b_0 \) and \( b_1 \), instead of one.
- \( s_e = \sqrt{s_e^2} \) is called the standard error of the estimate.

Comparing Standard Errors
\( s_e \) is a measure of the variation of observed y values from the regression line.
[Figure: two scatter plots of Y against X, one with points close to the regression line (small \( s_e \)) and one with points widely scattered around it (large \( s_e \)).]
The magnitude of \( s_e \) should always be judged relative to the size of the y values in the sample data. For example, \( s_e \) = $41.33K is moderately small relative to house prices in the $200K to $300K range.

11.5 Inferences About the Regression Model
- The variance of the regression slope coefficient (\( b_1 \)) is estimated by
\[ s_{b_1}^2 = \frac{s_e^2}{\sum (x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1) s_x^2} \]
where:
\( s_{b_1} \) = estimate of the standard error of the least squares slope
\( s_e = \sqrt{\dfrac{\text{SSE}}{n - 2}} \) = standard error of the estimate
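Both forms of the formula give the same \( s_{b_1} \). This sketch takes \( s_e = 41.33032 \) (in $1000s), the standard error of the estimate for this sample, together with the house sizes from the example, and evaluates each form with the Python standard library.

```python
from math import fsum, sqrt
from statistics import fmean, variance

sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # x, square feet
n = len(sqft)
s_e = 41.33032  # standard error of the estimate for this sample, in $1000s

x_bar = fmean(sqft)
# s_b1 = s_e / sqrt(sum of squared deviations of x)
s_b1 = s_e / sqrt(fsum((x - x_bar) ** 2 for x in sqft))
# Equivalent form using the sample variance of x
s_b1_alt = s_e / sqrt((n - 1) * variance(sqft))

print(f"s_b1 = {s_b1:.5f}")  # s_b1 = 0.03297
```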

Comparing Standard Errors of the Slope
\( s_{b_1} \) is a measure of the variation in the slope of regression lines from different possible samples.
[Figure: two panels of Y against X, labeled small \( s_{b_1} \) and large \( s_{b_1} \).]

Inference about the Slope: t Test
- t test for a population slope
  - Is there a linear relationship between X and Y?
- Null and alternative hypotheses
  \( H_0: \beta_1 = 0 \) (no linear relationship)
  \( H_1: \beta_1 \neq 0 \) (linear relationship does exist)
- Test statistic
\[ t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2 \]
where:
\( b_1 \) = regression slope coefficient
\( \beta_1 \) = hypothesized slope
\( s_{b_1} \) = standard error of the slope

Inference about the Slope: t Test (continued)
Using the house price sample data (Y = house price in $1000s, X = square feet), the estimated regression equation is:
house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098.
Does the square footage of the house affect its sales price?

Inferences about the Slope: t Test Example
\( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 \neq 0 \)
From the Excel output:
                Coefficients   Standard Error   t Stat    P-value
Intercept       98.24833       58.03348         1.69296   0.12892
Square Feet     0.10977        0.03297          3.32938   0.01039
With \( b_1 = 0.10977 \) and \( s_{b_1} = 0.03297 \):
\[ t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{0.10977 - 0}{0.03297} \approx 3.329 \]

Inferences about the Slope: t Test Example (continued)
\( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 \neq 0 \)
d.f. = 10 − 2 = 8
\( \alpha/2 = .025 \), so the critical values are \( \pm t_{8,.025} = \pm 2.3060 \)
Test statistic: t = 3.329 (Square Feet row of the Excel output: \( b_1 = 0.10977 \), \( s_{b_1} = 0.03297 \), t Stat = 3.32938)
[Figure: t distribution with 8 d.f.; "Reject \( H_0 \)" regions of area \( \alpha/2 = .025 \) in each tail beyond −2.3060 and 2.3060, "Do not reject \( H_0 \)" between them; the test statistic 3.329 falls in the upper rejection region.]
Decision: Reject \( H_0 \).
Conclusion: There is sufficient evidence that square footage affects house price.
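The decision rule for this two-tailed test fits in a few lines: reject \( H_0 \) when \( |t| \) exceeds the critical value. This is a minimal sketch. The helper name `two_tailed_t_decision` is hypothetical, and the critical value 2.3060 is the table value \( t_{8,.025} \) for this example.

```python
def two_tailed_t_decision(t_stat, t_crit):
    """Reject H0: beta1 = 0 when the statistic falls beyond +/- t_crit."""
    return "Reject H0" if abs(t_stat) > t_crit else "Do not reject H0"

t_stat = 3.32938   # Square Feet t Stat from the Excel output
t_crit = 2.3060    # t_{8, .025}: 8 d.f., alpha/2 = .025 in each tail
decision = two_tailed_t_decision(t_stat, t_crit)
print(decision)    # Reject H0
```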

Inferences about the Slope: t Test Example (continued)
\( H_0: \beta_1 = 0 \)
\( H_1: \beta_1 \neq 0 \)
P-value = 0.01039 (Square Feet row of the Excel output)
This is a two-tailed test, so the p-value is P(t > 3.329) + P(t < −3.329) = 0.01039 (for 8 d.f.).
Decision: P-value = 0.01039 < α = .05, so reject \( H_0 \).
Conclusion: There is sufficient evidence that square footage affects house price.
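Excel reports the p-value directly. To show where it comes from, the sketch below integrates the Student t density with 8 d.f. numerically on \( [0, |t|] \) and uses symmetry to obtain the two tail areas beyond \( \pm|t| \). Simpson's rule is a choice made here to keep the example dependency-free; in practice a statistics library's t distribution CDF would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t_stat, df, steps=10_000):
    """P(T > |t|) + P(T < -|t|), using symmetry and Simpson's rule on [0, |t|]."""
    if steps % 2:                 # Simpson's rule needs an even number of intervals
        steps += 1
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_0_to_b = total * h / 3   # P(0 < T < |t|)
    return 1 - 2 * area_0_to_b    # area in both tails beyond |t|

p = two_tailed_p_value(3.32938, df=8)
print(f"p-value = {p:.4f}")       # p-value = 0.0104
```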

Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope:
\[ b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}, \qquad \text{d.f.} = n - 2 \]
Excel output for the house price model:
                Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept       98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet     0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
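The Lower 95% and Upper 95% columns follow directly from the formula. This sketch takes \( b_1 \) and \( s_{b_1} \) from the Excel output and \( t_{8,.025} = 2.3060 \) from the t table. It then converts the interval from $1000s to dollars per square foot.

```python
b1 = 0.10977     # slope estimate (house price in $1000s per square foot)
s_b1 = 0.03297   # standard error of the slope
t_crit = 2.3060  # t_{n-2, alpha/2} with n - 2 = 8 d.f. and alpha = .05

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"({lower:.5f}, {upper:.5f})")                   # (0.03374, 0.18580)
print(f"${lower * 1000:.2f} to ${upper * 1000:.2f}")   # $33.74 to $185.80
```

Because the whole interval lies above 0, it leads to the same conclusion as the two-tailed t test at the .05 level.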

Confidence Interval Estimate for the Slope (continued)
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size.
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance.