Statistics for Economics & Business
Simple Linear Regression

Learning Objectives
In this chapter, you learn:
- How to use regression analysis to predict the value of a dependent variable based on an independent variable
- The meaning of the regression coefficients b₀ and b₁
- How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
- To make inferences about the slope and correlation coefficient
- To estimate mean values and predict individual values

Correlation vs. Regression
- A scatter plot can be used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  - Correlation is only concerned with the strength of the relationship
  - No causal effect is implied with correlation
- Scatter plots were first presented in Ch. 2
- Correlation was first presented in Ch. 3

Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
- Dependent variable: the variable we wish to predict or explain
- Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, X
- The relationship between X and Y is described by a linear function
- Changes in Y are assumed to be related to changes in X

Types of Relationships
[Figure: scatter plots of Y versus X illustrating linear relationships and curvilinear relationships]
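
Types of Relationships (continued)
[Figure: scatter plots of Y versus X illustrating strong relationships and weak relationships]

Types of Relationships (continued)
[Figure: scatter plots of Y versus X illustrating no relationship]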
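
Simple Linear Regression Model

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where:
- Yᵢ = dependent variable
- β₀ = population Y intercept
- β₁ = population slope coefficient
- Xᵢ = independent variable
- εᵢ = random error term
- β₀ + β₁Xᵢ is the linear component; εᵢ is the random error component

Simple Linear Regression Model (continued)
[Figure: the line Yᵢ = β₀ + β₁Xᵢ + εᵢ with intercept β₀ and slope β₁, marking the observed value of Y for Xᵢ, the predicted value of Y for Xᵢ, and the random error εᵢ for this Xᵢ value]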

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:

\[ \hat{Y}_i = b_0 + b_1 X_i \]

where:
- Ŷᵢ = estimated (or predicted) Y value for observation i
- b₀ = estimate of the regression intercept
- b₁ = estimate of the regression slope
- Xᵢ = value of X for observation i

The Least Squares Method
b₀ and b₁ are obtained by finding the values that minimize the sum of the squared differences between Y and Ŷ:

\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 \]
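
For readers who want the formulas behind the software output, the minimization above has a standard closed-form solution. The following is a sketch of that textbook result, not something shown on these slides. The symbols SSXY and \bar{X}, \bar{Y} (the sample means) are introduced here for illustration; SSX matches the quantity used later for the standard error of the slope.

```latex
\[
b_1 = \frac{SSXY}{SSX}
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
\]
```

Both expressions follow from setting the partial derivatives of \(\sum (Y_i - b_0 - b_1 X_i)^2\) with respect to b₀ and b₁ equal to zero.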

Finding the Least Squares Equation
- The coefficients b₀ and b₁, and other regression results in this chapter, will be found using Excel or Minitab
- Formulas are shown in the text for those who are interested

Interpretation of the Slope and the Intercept
- b₀ is the estimated mean value of Y when the value of X is zero
- b₁ is the estimated change in the mean value of Y as a result of a one-unit increase in X

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Simple Linear Regression Example: Data

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Simple Linear Regression Example: Scatter Plot
[Figure: house price model scatter plot, House Price ($1000s) versus Square Feet]

Simple Linear Regression Example: Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
              df   SS            MS            F         Significance F
Regression    1    18934.9348    18934.9348    11.0848   0.01039
Residual      8    13665.5652    1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
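
The slides obtain the coefficients from Excel or Minitab. As a cross-check, the sketch below recomputes b₀ and b₁ directly from the ten observations using the usual closed-form least-squares formulas. The function name `least_squares` and the variable names are illustrative choices, not part of the original material.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of X around their means.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(square_feet, price)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

The printed equation should agree with the Excel coefficients to the five decimal places shown in the output.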

Simple Linear Regression Example: Graphical Representation
[Figure: house price model scatter plot and prediction line, House Price ($1000s) versus Square Feet, with intercept = 98.248 and slope = 0.10977]

house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Interpretation of b₀
house price = 98.24833 + 0.10977 (square feet)
- b₀ is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
- Because a house cannot have a square footage of 0, b₀ has no practical application

Simple Linear Regression Example: Interpreting b₁
house price = 98.24833 + 0.10977 (square feet)
- b₁ estimates the change in the mean value of Y as a result of a one-unit increase in X
- Here, b₁ = 0.10977 tells us that the mean value of a house increases by 0.10977($1000) = $109.77, on average, for each additional one square foot of size

Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:

house price = 98.24833 + 0.10977 (sq.ft.)
            = 98.24833 + 0.10977(2000)
            = 317.78

The predicted price for a house with 2000 square feet is 317.78 ($1000s) = $317,780
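
The prediction step can be sketched in a few lines. This example refits the line from the raw data so that the unrounded slope is used; the helper name `predict_price` is hypothetical.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar


def predict_price(size):
    """Predicted mean house price in $1000s for a house of `size` square feet."""
    return b0 + b1 * size


price_2000 = predict_price(2000)
print(f"predicted price: {price_2000:.2f} ($1000s)")
```

Plugging the rounded coefficients into the equation by hand gives 98.24833 + 0.10977(2000) = 317.79, one cent-unit above the slide's 317.78; the difference comes only from rounding the slope to five decimals.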

Simple Linear Regression Example: Making Predictions
- When using a regression model for prediction, only predict within the relevant range of data
- Do not try to extrapolate beyond the range of observed X values
[Figure: House Price ($1000s) versus Square Feet, with the relevant range for interpolation marked]

Measures of Variation
Total variation is made up of two parts:

\[ SST = SSR + SSE \]

- Total sum of squares: \( SST = \sum (Y_i - \bar{Y})^2 \)
- Regression sum of squares: \( SSR = \sum (\hat{Y}_i - \bar{Y})^2 \)
- Error sum of squares: \( SSE = \sum (Y_i - \hat{Y}_i)^2 \)

where:
- Ȳ = mean value of the dependent variable
- Yᵢ = observed value of the dependent variable
- Ŷᵢ = predicted value of Y for the given Xᵢ value

Measures of Variation (continued)
- SST = total sum of squares (Total Variation)
  - Measures the variation of the Yᵢ values around their mean Ȳ
- SSR = regression sum of squares (Explained Variation)
  - Variation attributable to the relationship between X and Y
- SSE = error sum of squares (Unexplained Variation)
  - Variation in Y attributable to factors other than X

Measures of Variation (continued)
[Figure: for a single observation at Xᵢ, the deviations behind SST = Σ(Yᵢ − Ȳ)², SSE = Σ(Yᵢ − Ŷᵢ)², and SSR = Σ(Ŷᵢ − Ȳ)²]

Coefficient of Determination, r²
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
- The coefficient of determination is also called r-squared and is denoted as r²

\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]

note: 0 ≤ r² ≤ 1

Examples of Approximate r² Values
r² = 1
- Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X
[Figure: two scatter plots with all points on a line, each with r² = 1]

Examples of Approximate r² Values
0 < r² < 1
- Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X

Examples of Approximate r² Values
r² = 0
- No linear relationship between X and Y: the value of Y does not depend on X (none of the variation in Y is explained by variation in X)

Simple Linear Regression Example: Coefficient of Determination, r²

\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]

58.08% of the variation in house prices is explained by variation in square feet

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df   SS            MS            F         Significance F
Regression    1    18934.9348    18934.9348    11.0848   0.01039
Residual      8    13665.5652    1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by

\[ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} \]

where:
- SSE = error sum of squares
- n = sample size
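
The sums of squares, r², and the standard error of the estimate defined above can be reproduced from the raw data. The sketch below is one way to do it in plain Python; variable names such as `s_yx` are illustrative.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in price)                # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)               # explained variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))    # unexplained variation

r_squared = ssr / sst
s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate

print(f"SST={sst:.4f} SSR={ssr:.4f} SSE={sse:.4f}")
print(f"r^2={r_squared:.5f} S_YX={s_yx:.5f}")
```

The identity SST = SSR + SSE holds up to floating-point rounding, which is a quick sanity check on any hand calculation.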

Simple Linear Regression Example: Standard Error of Estimate

S_YX = 41.33032

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df   SS            MS            F         Significance F
Regression    1    18934.9348    18934.9348    11.0848   0.01039
Residual      8    13665.5652    1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

Comparing Standard Errors
- S_YX is a measure of the variation of observed Y values from the regression line
[Figure: two scatter plots around a regression line, one with small S_YX and one with large S_YX]
- The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data
  - e.g., S_YX = $41.33K is moderately small relative to house prices in the $200K - $400K range

Assumptions of Regression: L.I.N.E.
- Linearity
  - The relationship between X and Y is linear
- Independence of Errors
  - Error values are statistically independent
- Normality of Error
  - Error values are normally distributed for any given value of X
- Equal Variance (also called homoscedasticity)
  - The probability distribution of the errors has constant variance

Residual Analysis
- The residual for observation i, eᵢ, is the difference between its observed and predicted value:

\[ e_i = Y_i - \hat{Y}_i \]

- Check the assumptions of regression by examining the residuals
  - Examine for linearity assumption
  - Evaluate independence assumption
  - Evaluate normal distribution assumption
  - Examine for constant variance for all levels of X (homoscedasticity)
- Graphical Analysis of Residuals
  - Can plot residuals vs. X
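
Residual Analysis for Linearity
[Figure: Y versus X and residuals versus X for two cases: Not Linear (curved residual pattern) and Linear (no pattern)]

Residual Analysis for Independence
[Figure: residual plots for two cases: Not Independent (systematic pattern in the residuals) and Independent (no pattern)]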

Checking for Normality
- Examine the Stem-and-Leaf Display of the Residuals
- Examine the Boxplot of the Residuals
- Examine the Histogram of the Residuals
- Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line
[Figure: normal probability plot, Percent versus Residual]

Residual Analysis for Equal Variance
[Figure: Y versus X and residuals versus X for two cases: Non-constant variance (spread of residuals changes with X) and Constant variance]

Simple Linear Regression Example: Residual Output

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348

[Figure: House Price Model Residual Plot, Residuals versus Square Feet]

Does not appear to violate any regression assumptions
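
The residual table above can be regenerated from the data with eᵢ = Yᵢ − Ŷᵢ. The following sketch prints the same three columns; it is a plain-Python illustration, not the Excel procedure used for the slides.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - p for y, p in zip(price, predicted)]  # e_i = Y_i - Yhat_i

for i, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>2}  {p:10.5f}  {e:10.5f}")
```

Because the fitted line includes an intercept, the least-squares residuals sum to zero (up to rounding), so a clearly non-zero sum signals a calculation error.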

Simple Linear Regression Example: Residual Output
[Figure: Residual Plots for House Price (Y), four panels: Normal Probability Plot (Percent versus Residual), Versus Fits (Residual versus Fitted Value), Histogram (Frequency versus Residual), Versus Order (Residual versus Observation Order)]

Measuring Autocorrelation: The Durbin-Watson Statistic
- Used when data are collected over time to detect if autocorrelation is present
- Autocorrelation exists if residuals in one time period are related to residuals in another period

Autocorrelation
- Autocorrelation is correlation of the errors (residuals) over time
[Figure: Time (t) Residual Plot, Residuals versus Time (t)]
- Here, residuals show a cyclic pattern (not random). Cyclical patterns are a sign of positive autocorrelation
- Violates the regression assumption that residuals are random and independent

The Durbin-Watson Statistic
- The Durbin-Watson statistic is used to test for autocorrelation
  - H₀: residuals are not correlated
  - H₁: positive autocorrelation is present

\[ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} \]

- The possible range is 0 ≤ D ≤ 4
- D should be close to 2 if H₀ is true
- D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation

Testing for Positive Autocorrelation
- H₀: positive autocorrelation does not exist
- H₁: positive autocorrelation is present
- Calculate the Durbin-Watson test statistic D (the Durbin-Watson statistic can be found using Excel or Minitab)
- Find the values d_L and d_U from the Durbin-Watson table (for sample size n and number of independent variables k)
- Decision rule: reject H₀ if D < d_L
  - 0 ≤ D < d_L: reject H₀
  - d_L ≤ D ≤ d_U: inconclusive
  - d_U < D ≤ 2: do not reject H₀

Testing for Positive Autocorrelation (continued)
Suppose we have the following time series data:
[Figure: Sales versus Time with fitted line y = 30.65 + 4.7038x, R² = 0.8976]
Is there autocorrelation?

Testing for Positive Autocorrelation (continued)
Example with n = 25:
[Figure: Sales versus Time with fitted line y = 30.65 + 4.7038x, R² = 0.8976]

Excel/PHStat output:

Durbin-Watson Calculations
Sum of Squared Difference of Residuals   3296.18
Sum of Squared Residuals                 3279.98
Durbin-Watson Statistic                  1.00494

\[ D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494 \]

Testing for Positive Autocorrelation (continued)
- Here, n = 25 and there is k = 1 independent variable
- Using the Durbin-Watson table, d_L = 1.29 and d_U = 1.45
- D = 1.00494 < d_L = 1.29, so reject H₀ and conclude that significant positive autocorrelation exists
- Decision: reject H₀ since D = 1.00494 < d_L
  - 0 ≤ D < d_L = 1.29: reject H₀
  - 1.29 ≤ D ≤ d_U = 1.45: inconclusive
  - 1.45 < D ≤ 2: do not reject H₀
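
A minimal sketch of the Durbin-Watson calculation and the decision rule follows. The 25 sales residuals are not listed in the slides, so the example applies the function to a short made-up residual series (purely illustrative) and then reuses the two sums reported in the Excel/PHStat output. The function names are hypothetical, and d_L and d_U still have to be looked up in a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def positive_autocorrelation_decision(d, d_lower, d_upper):
    """Apply the one-sided decision rule for positive autocorrelation."""
    if d < d_lower:
        return "reject H0"
    if d <= d_upper:
        return "inconclusive"
    return "do not reject H0"


# Illustrative (made-up) residuals that alternate in sign: D comes out above 2.
d_alternating = durbin_watson([1, -1, 1, -1])

# Sums reported in the slide's Excel/PHStat output (n = 25, k = 1).
d_sales = 3296.18 / 3279.98
decision = positive_autocorrelation_decision(d_sales, d_lower=1.29, d_upper=1.45)
print(f"D = {d_sales:.5f}: {decision}")
```

Residuals that barely change from one period to the next push D toward 0, while residuals that flip sign every period push it toward 4, which matches the interpretation of the 0 to 4 range given earlier.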

Inferences About the Slope
The standard error of the regression slope coefficient (b₁) is estimated by

\[ S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \]

where:
- S_b1 = estimate of the standard error of the slope
- \( S_{YX} = \sqrt{\dfrac{SSE}{n-2}} \) = standard error of the estimate

Inferences About the Slope: t Test
- t test for a population slope
  - Is there a linear relationship between X and Y?
- Null and alternative hypotheses
  - H₀: β₁ = 0 (no linear relationship)
  - H₁: β₁ ≠ 0 (linear relationship does exist)
- Test statistic

\[ t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2 \]

where:
- b₁ = regression slope coefficient
- β₁ = hypothesized slope
- S_b1 = standard error of the slope

Inferences About the Slope: t Test Example

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Estimated regression equation:
house price = 98.25 + 0.1098 (sq.ft.)

The slope of this model is 0.1098. Is there a relationship between the square footage of the house and its sales price?

Inferences About the Slope: t Test Example
H₀: β₁ = 0
H₁: β₁ ≠ 0

              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

Here b₁ = 0.10977 and S_b1 = 0.03297, so

\[ t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938 \]

Inferences About the Slope: t Test Example
H₀: β₁ = 0
H₁: β₁ ≠ 0
- Test statistic: t_STAT = 3.329
- d.f. = 10 − 2 = 8
- α/2 = 0.025 in each tail, so the critical values are −t_α/2 = −2.3060 and t_α/2 = 2.3060
  - Reject H₀ if t_STAT < −2.3060 or t_STAT > 2.3060; otherwise do not reject H₀
- Decision: Reject H₀, since t_STAT = 3.329 > 2.3060
- There is sufficient evidence that square footage affects house price

Inferences About the Slope: t Test Example
H₀: β₁ = 0
H₁: β₁ ≠ 0

From Excel output:

              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

From Minitab output:

Predictor     Coef      SE Coef   T      P
Constant      98.25     58.03     1.69   0.129
Square Feet   0.10977   0.03297   3.33   0.010

- The p-value for the slope is 0.01039
- Decision: Reject H₀, since p-value < α
- There is sufficient evidence that square footage affects house price
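
The slope test can be sketched end to end from the raw data. The critical value 2.3060 is taken from the slides (a t table with 8 degrees of freedom and α = 0.05, two-tailed) rather than computed, because plain Python has no t distribution; the variable names are illustrative.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ss_x = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / ss_x
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_yx / math.sqrt(ss_x)     # standard error of the slope

hypothesized_slope = 0.0          # H0: beta_1 = 0
t_stat = (b1 - hypothesized_slope) / s_b1

T_CRITICAL = 2.3060               # table value from the slides: df = 8, alpha/2 = 0.025
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"S_b1 = {s_b1:.5f}, t_STAT = {t_stat:.5f}, reject H0: {reject_h0}")
```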

F Test for Significance
F test statistic:

\[ F_{STAT} = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1} \]

where F_STAT follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom (k = the number of independent variables in the regression model)

F Test for Significance: Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df   SS            MS            F         Significance F
Regression    1    18934.9348    18934.9348    11.0848   0.01039
Residual      8    13665.5652    1708.1957
Total         9    32600.5000

\[ F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848 \]

with 1 and 8 degrees of freedom. The p-value for the F test is the Significance F value, 0.01039.

F Test for Significance (continued)
H₀: β₁ = 0
H₁: β₁ ≠ 0
- α = 0.05, df₁ = 1, df₂ = 8
- Critical value: F_0.05 = 5.32
  - Reject H₀ if F_STAT > 5.32; otherwise do not reject H₀
- Test statistic: F_STAT = MSR / MSE = 11.08
- Decision: Reject H₀ at α = 0.05
- Conclusion: There is sufficient evidence that house size affects selling price

Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope:

\[ b_1 \pm t_{\alpha/2} S_{b_1}, \qquad \text{d.f.} = n - 2 \]

Excel printout for house prices:

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
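
Both calculations above are short enough to check by hand. The sketch below plugs the figures reported in the Excel output into the two formulas; the critical values 5.32 and 2.3060 are the table values quoted on the slides, and the constant names are illustrative.

```python
# Figures reported in the Excel output for the house price example.
SSR = 18934.9348      # regression sum of squares
SSE = 13665.5652      # error sum of squares
N = 10                # sample size
K = 1                 # number of independent variables
B1 = 0.10977          # estimated slope
S_B1 = 0.03297        # standard error of the slope

# Table values quoted on the slides (alpha = 0.05).
F_CRITICAL = 5.32     # F distribution with 1 and 8 degrees of freedom
T_CRITICAL = 2.3060   # t distribution with 8 degrees of freedom, two-tailed

msr = SSR / K
mse = SSE / (N - K - 1)
f_stat = msr / mse
reject_h0 = f_stat > F_CRITICAL

# 95% confidence interval for the slope: b1 +/- t * S_b1.
half_width = T_CRITICAL * S_B1
ci_lower, ci_upper = B1 - half_width, B1 + half_width

print(f"F_STAT = {f_stat:.4f}, reject H0: {reject_h0}")
print(f"95% CI for slope: ({ci_lower:.4f}, {ci_upper:.4f})")
```

In simple linear regression the F statistic equals the square of the slope's t statistic (3.32938² ≈ 11.0848), which is why the F test and the t test for the slope report the same p-value.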

Confidence Interval Estimate for the Slope (continued)

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

- Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size
- This 95% confidence interval does not include 0
- Conclusion: There is a significant relationship between house price and square feet at the 0.05 level of significance

t Test for a Correlation Coefficient
- Hypotheses
  - H₀: ρ = 0 (no correlation between X and Y)
  - H₁: ρ ≠ 0 (correlation exists)
- Test statistic

\[ t_{STAT} = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \qquad \text{(with } n - 2 \text{ degrees of freedom)} \]

where

\[ r = +\sqrt{r^2} \ \text{ if } b_1 > 0, \qquad r = -\sqrt{r^2} \ \text{ if } b_1 < 0 \]

t Test for a Correlation Coefficient (continued)
Is there evidence of a linear relationship between square feet and house price at the 0.05 level of significance?
- H₀: ρ = 0 (no correlation)
- H₁: ρ ≠ 0 (correlation exists)
- α = 0.05, d.f. = 10 − 2 = 8

\[ t_{STAT} = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.762 - 0}{\sqrt{\dfrac{1 - 0.762^2}{10 - 2}}} = 3.329 \]

t Test for a Correlation Coefficient (continued)
- d.f. = 10 − 2 = 8, α/2 = 0.025 in each tail
- Critical values: −t_α/2 = −2.3060 and t_α/2 = 2.3060
  - Reject H₀ if t_STAT < −2.3060 or t_STAT > 2.3060; otherwise do not reject H₀
- Decision: Reject H₀, since t_STAT = 3.329 > 2.3060
- Conclusion: There is evidence of a linear association at the 5% level of significance
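
The correlation test above can be sketched from the raw data as follows. Here r is computed directly from the cross-products, which gives it the same sign as b₁; the critical value 2.3060 is again the table value from the slides, and the names are illustrative.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
ss_x = sum((x - x_bar) ** 2 for x in square_feet)
ss_y = sum((y - y_bar) ** 2 for y in price)

# Sample correlation coefficient; its sign matches the sign of the slope b1.
r = ss_xy / math.sqrt(ss_x * ss_y)

rho_0 = 0.0  # hypothesized population correlation under H0
t_stat = (r - rho_0) / math.sqrt((1 - r ** 2) / (n - 2))

T_CRITICAL = 2.3060  # table value from the slides: df = 8, alpha/2 = 0.025
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"r = {r:.5f}, t_STAT = {t_stat:.5f}, reject H0: {reject_h0}")
```

The resulting t statistic matches the one from the slope test, as expected in simple linear regression where the two tests are equivalent.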
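
Estimating Mean Values and Predicting Individual Values
Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xᵢ
- Confidence interval for the mean of Y, given Xᵢ
- Prediction interval for an individual Y, given Xᵢ
[Figure: the prediction line Ŷ = b₀ + b₁Xᵢ with both intervals drawn around Ŷ at Xᵢ]

Confidence Interval for the Average Y, Given X
Confidence interval estimate for the mean value of Y given a particular Xᵢ:

\[ \text{Confidence interval for } \mu_{Y|X=X_i}: \quad \hat{Y} \pm t_{\alpha/2}\, S_{YX} \sqrt{h_i} \]

The size of the interval varies according to the distance away from the mean, X̄:

\[ h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2} \]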

Prediction Interval for an Individual Y, Given X
Prediction interval estimate for an individual value of Y given a particular Xᵢ:

\[ \text{Prediction interval for } Y_{X=X_i}: \quad \hat{Y} \pm t_{\alpha/2}\, S_{YX} \sqrt{1 + h_i} \]

The extra term (the 1 under the square root) adds to the interval width to reflect the added uncertainty for an individual case

Estimation of Mean Values: Example
Confidence interval estimate for μ_{Y|X=Xᵢ}
Find the 95% confidence interval for the mean price of 2,000 square-foot houses
Predicted price Ŷᵢ = 317.78 ($1,000s)

\[ \hat{Y} \pm t_{0.025}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.78 \pm 37.12 \]

The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900
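
The two interval formulas (for the mean of Y and for an individual Y) differ only by the extra 1 under the square root, which the sketch below makes explicit. It evaluates both at Xᵢ = 2000 square feet. The function name `interval_half_width` is hypothetical, and the critical value 2.3060 is the table value used on the slides (8 degrees of freedom, 95% confidence).

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ss_x = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / ss_x
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))

T_CRITICAL = 2.3060  # table value from the slides: df = 8, 95% confidence


def interval_half_width(x_i, individual):
    """Half-width of the interval at x_i.

    individual=False: confidence interval for the mean of Y.
    individual=True:  prediction interval for a single Y (adds 1 under the root).
    """
    h_i = 1 / n + (x_i - x_bar) ** 2 / ss_x
    extra = 1.0 if individual else 0.0
    return T_CRITICAL * s_yx * math.sqrt(extra + h_i)


y_hat = b0 + b1 * 2000
ci_half = interval_half_width(2000, individual=False)
pi_half = interval_half_width(2000, individual=True)
print(f"mean price:       {y_hat:.2f} +/- {ci_half:.2f}")
print(f"individual price: {y_hat:.2f} +/- {pi_half:.2f}")
```

Because hᵢ grows with (Xᵢ − X̄)², both intervals are narrowest at X̄ = 1715 square feet and widen as Xᵢ moves away from it.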

Estimation of Individual Values: Example
Prediction interval estimate for Y_{X=Xᵢ}
Find the 95% prediction interval for an individual house with 2,000 square feet
Predicted price Ŷᵢ = 317.78 ($1,000s)

\[ \hat{Y} \pm t_{0.025}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.78 \pm 102.28 \]

The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070

Pitfalls of Regression Analysis
- Lacking an awareness of the assumptions underlying least-squares regression
- Not knowing how to evaluate the assumptions
- Not knowing the alternatives to least-squares regression if a particular assumption is violated
- Using a regression model without knowledge of the subject matter
- Extrapolating outside the relevant range

Strategies for Avoiding the Pitfalls of Regression
- Start with a scatter plot of X vs. Y to observe a possible relationship
- Perform residual analysis to check the assumptions
  - Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
  - Use a histogram, stem-and-leaf display, boxplot, or normal probability plot of the residuals to uncover possible non-normality

Strategies for Avoiding the Pitfalls of Regression (continued)
- If there is violation of any assumption, use alternative methods or models
- If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
- Avoid making predictions or forecasts outside the relevant range

Summary
- Introduced types of regression models
- Reviewed assumptions of regression and correlation
- Discussed determining the simple linear regression equation
- Described measures of variation
- Discussed residual analysis
- Addressed measuring autocorrelation

Summary (continued)
- Described inference about the slope
- Discussed correlation: measuring the strength of the association
- Addressed estimation of mean values and prediction of individual values
- Discussed possible pitfalls in regression and recommended strategies to avoid them