Correlation and Regression
Notes prepared by Pamela Peterson Drake

Index
   Basic terms and concepts
   Simple regression
   Multiple regression
   Regression terminology
   Regression formulas

Basic terms and concepts

1. A scatter plot is a graphical representation of the relation between two or more variables. In the scatter plot of two variables x and y, each point on the plot is an x-y pair.

Example 1: Home sale prices and square footage
[Figure: scatter plot of home sale prices (vertical axis) vs. square footage for a sample of 34 home sales in September 2005 in St. Lucie County.]

2. We use regression and correlation to describe the variation in one or more variables.
A. The variation is the sum of the squared deviations of a variable about its mean:

   Variation = Σ(x_i - x̄)²

B. The variation is the numerator of the variance of a sample:

   Variance = Σ(x_i - x̄)² / (n - 1)

C. Both the variation and the variance are measures of the dispersion of a sample.

3. The covariance between two random variables is a statistical measure of the degree to which the two variables move together.
A. The covariance captures how one variable is different from its mean as the other variable is different from its mean.
B. A positive covariance indicates that the variables tend to move together; a negative covariance indicates that the variables tend to move in opposite directions.
C. The covariance is calculated as the ratio of the covariation to the sample size less one:

   Covariance = Σ(x_i - x̄)(y_i - ȳ) / (n - 1)

where
   n is the sample size,
   x_i is the ith observation on variable x,
   x̄ is the mean of the variable x observations,
   y_i is the ith observation on variable y, and
   ȳ is the mean of the variable y observations.
D. The actual value of the covariance is not meaningful because it is affected by the scale of the two variables; that is why we calculate the correlation coefficient, to make something interpretable from the covariance information.
E. The correlation coefficient, r, is a measure of the strength of the relationship between or among variables:

   r = (covariance between x and y) / [(standard deviation of x)(standard deviation of y)]

   r = [Σ(x_i - x̄)(y_i - ȳ) / (n - 1)] / { √[Σ(x_i - x̄)² / (n - 1)] √[Σ(y_i - ȳ)² / (n - 1)] }

Note: Correlation does not imply causation. We may say that two variables X and Y are correlated, but that does not mean that X causes Y or that Y causes X; they simply are related or associated with one another.

Example 2: Calculating the correlation coefficient

Obs     x      y     x-x̄    (x-x̄)²    y-ȳ      (y-ȳ)²    (x-x̄)(y-ȳ)
 1     12     50    -1.50     2.25     8.40      70.56      -12.60
 2     13     54    -0.50     0.25    12.40     153.76       -6.20
 3     10     48    -3.50    12.25     6.40      40.96      -22.40
 4      9     47    -4.50    20.25     5.40      29.16      -24.30
 5     20     70     6.50    42.25    28.40     806.56      184.60
 6      7     20    -6.50    42.25   -21.60     466.56      140.40
 7      4     15    -9.50    90.25   -26.60     707.56      252.70
 8     22     40     8.50    72.25    -1.60       2.56      -13.60
 9     15     35     1.50     2.25    -6.60      43.56       -9.90
10     23     37     9.50    90.25    -4.60      21.16      -43.70
Sum   135    416     0.00   374.50     0.00   2,342.40      445.00

Calculations:
   x̄ = 135/10 = 13.5
   ȳ = 416/10 = 41.6
   s_x² = 374.5/9 = 41.61, so s_x = 6.45
   s_y² = 2,342.4/9 = 260.27, so s_y = 16.13
   r = (445/9) / [(6.45)(16.13)] = 49.444/104.06 = 0.475

i. The type of relationship is represented by the correlation coefficient:
   r = +1       perfect positive correlation
   +1 > r > 0   positive relationship
   r = 0        no relationship
   0 > r > -1   negative relationship
   r = -1       perfect negative correlation
ii. You can determine the direction of correlation by looking at the scatter graphs: if the relation is upward-sloping, there is positive correlation; if it is downward-sloping, there is negative correlation.

[Figure: two scatter plots, one with an upward-sloping cloud of points (0 < r < 1.0) and one with a downward-sloping cloud (-1.0 < r < 0).]

iii. The correlation coefficient is bounded by -1 and +1. The closer the coefficient is to -1 or +1, the stronger the correlation.
iv. With the exception of the extremes (that is, r = 1.0 or r = -1.0), we cannot really talk about the strength of a relationship indicated by the correlation coefficient without a statistical test of significance.
v. The hypotheses of interest regarding the population correlation, ρ, are:
   Null hypothesis, H0: ρ = 0 (in other words, there is no correlation between the two variables)
   Alternative hypothesis, Ha: ρ ≠ 0 (in other words, there is a correlation between the two variables)
vi. The test statistic is t-distributed with n - 2 degrees of freedom:

   t = r √(n - 2) / √(1 - r²)

Example 2, continued
In the previous example, r = 0.475 and n = 10:
   t = 0.475 √8 / √(1 - 0.475²) = 1.3435/0.88 = 1.527

vii. To make a decision, compare the calculated t-statistic with the critical t-statistic for the appropriate degrees of freedom and level of significance.
Problem
Suppose the correlation coefficient is 0.2 and the number of observations is 32. What is the calculated test statistic? Is this significant correlation using a 5% level of significance?

Solution
Hypotheses: H0: ρ = 0; Ha: ρ ≠ 0
Calculated t-statistic:
   t = 0.2 √(32 - 2) / √(1 - 0.04) = 0.2 √30 / √0.96 = 1.118
Degrees of freedom = 32 - 2 = 30
The critical t-value for a 5% level of significance and 30 degrees of freedom is 2.042. Therefore, there is no significant correlation (1.118 falls between the two critical values of -2.042 and +2.042).

Problem
Suppose the correlation coefficient is 0.80 and the number of observations is 52. What is the calculated test statistic? Is this significant correlation using a 1% level of significance?

Solution
Hypotheses: H0: ρ = 0; Ha: ρ ≠ 0
Calculated t-statistic:
   t = 0.80 √(52 - 2) / √(1 - 0.64) = 0.80 √50 / √0.36 = 5.657/0.6 = 9.428
Degrees of freedom = 52 - 2 = 50
The critical t-value for a 1% level of significance and 50 degrees of freedom is approximately 2.68. Therefore, the null hypothesis is rejected and we conclude that there is significant correlation.

F. An outlier is an extreme value of a variable. The outlier may be quite large or small (where large and small are defined relative to the rest of the sample).
i. An outlier may affect the sample statistics, such as a correlation coefficient. It is possible for an outlier to affect the result such that, for example, we conclude that there is a significant relation when in fact there is none, or that there is no relation when in fact there is a relation.
ii. The researcher must exercise judgment (and caution) when deciding whether to include or exclude an observation.

G. Spurious correlation is the appearance of a relationship when in fact there is no relation. Outliers may result in spurious correlation.
i. The correlation coefficient does not indicate a causal relationship. Certain data items may be highly correlated, but not necessarily as a result of a causal relationship.
ii. A good example of a spurious correlation is snowfall and stock prices in January. If we regress historical stock prices on snowfall totals in Minnesota, we would get a statistically significant relationship, especially for the month of January.
Since there is no economic reason for this relationship, this would be an example of spurious correlation.
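The correlation calculation and its significance test can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the data are the ten x-y observations from the correlation example (Example 2), and the variable names are mine, not from the notes:

```python
# Correlation coefficient and its t-test, computed from first principles
# for the n = 10 observations of the correlation example.
import math

x = [12, 13, 10, 9, 20, 7, 4, 22, 15, 23]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]
n = len(x)

x_bar = sum(x) / n                      # 13.5
y_bar = sum(y) / n                      # 41.6

# Covariance = sum of cross-deviations / (n - 1)
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = cov_xy / (s_x * s_y)
print(round(r, 3))                      # 0.475, as in the example

# Test H0: rho = 0 with t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 3))                      # 1.527; |t| < 2.306 (5%, df = 8),
                                        # so we fail to reject H0
```

Because |t| is below the 5% critical value for 8 degrees of freedom, the sample correlation of 0.475 is not statistically significant, matching the worked example.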
Simple regression

1. Regression is the analysis of the relation between one variable and some other variable(s), assuming a linear relation. Also referred to as least squares regression and ordinary least squares (OLS).
A. The purpose is to explain the variation in a variable (that is, how a variable differs from its mean value) using the variation in one or more other variables.
B. Suppose we want to describe, explain, or predict why a variable differs from its mean. Let the ith observation on this variable be represented as y_i, and let n indicate the number of observations. The variation in the y_i's (what we want to explain) is:

   Variation of y = SS_Total = Σ(y_i - ȳ)²

C. The least squares principle is that the regression line is determined by minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.

[Figure: a line fit through the x-y points such that the sum of the squared residuals (that is, the sum of the squared vertical distances between the observations and the line) is minimized.]

2. The variables in a regression relation consist of dependent and independent variables.
A. The dependent variable is the variable whose variation is being explained by the other variable(s). Also referred to as the explained variable, the endogenous variable, or the predicted variable.
B. The independent variable is the variable whose variation is used to explain that of the dependent variable. Also referred to as the explanatory variable, the exogenous variable, or the predicting variable.
C. The parameters in a simple regression equation are the slope (b1) and the intercept (b0):

   y_i = b0 + b1 x_i + ε_i

where
   y_i is the ith observation on the dependent variable,
   x_i is the ith observation on the independent variable,
   b0 is the intercept,
   b1 is the slope coefficient, and
   ε_i is the residual for the ith observation.
[Figure: the estimated regression line, intersecting the Y-axis at the intercept b0 and rising with slope b1.]

D. The slope, b1, is the change in Y for a given one-unit change in X. The slope can be positive, negative, or zero. It is calculated as:

   b1 = cov(X,Y) / var(X) = [Σ(y_i - ȳ)(x_i - x̄) / (n - 1)] / [Σ(x_i - x̄)² / (n - 1)]

Hint: Think of the regression line as the average of the relationship between the independent variable(s) and the dependent variable. The residual represents the distance an observed value of the dependent variable (i.e., Y) is away from the average relationship as depicted by the regression line.

Suppose that:
   Σ(y_i - ȳ)(x_i - x̄) = 1,000
   Σ(x_i - x̄)² = 450
   n = 30
Then:
   b̂1 = (1,000/29) / (450/29) = 34.4828/15.5172 = 2.2222

A short-cut formula for the slope coefficient:

   b1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² = [Σ x_i y_i - (Σ x_i)(Σ y_i)/n] / [Σ x_i² - (Σ x_i)²/n]

Whether this is truly a short-cut or not depends on the method of performing the calculations: by hand, using Microsoft Excel, or using a calculator.

E. The intercept, b0, is the line's intersection with the Y-axis at X = 0. The intercept can be positive, negative, or zero. The intercept is calculated as:

   b̂0 = ȳ - b̂1 x̄
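The slope and intercept formulas can be applied directly to the data of Example 2, the notes' running data set. A minimal sketch; the variable names are mine:

```python
# Least-squares slope and intercept via b1 = cov(X,Y)/var(X) and
# b0 = y_bar - b1 * x_bar, for the Example 2 data.
x = [12, 13, 10, 9, 20, 7, 4, 22, 15, 23]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 445.0
s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 374.5

b1 = s_xy / s_xx            # slope (the (n - 1) factors cancel)
b0 = y_bar - b1 * x_bar     # intercept
print(round(b1, 3), round(b0, 3))   # 1.188 25.559
```

These values match the estimated regression line ŷ = 25.559 + 1.188x used in the later examples.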
3. Linear regression assumes the following:
A. A linear relationship exists between the dependent and independent variable. Note: if the relation is not linear, it may be possible to transform one or both variables so that there is a linear relation.
B. The independent variable is uncorrelated with the residuals; that is, the independent variable is not random.
C. The expected value of the disturbance term is zero; that is, E(ε_i) = 0.
D. There is a constant variance of the disturbance term; that is, the disturbance or residual terms are all drawn from a distribution with an identical variance. In other words, the disturbance terms are homoskedastic. [A violation of this is referred to as heteroskedasticity.]
E. The residuals are independently distributed; that is, the residual or disturbance for one observation is not correlated with that of another observation. [A violation of this is referred to as autocorrelation.]
F. The disturbance term (a.k.a. residual, a.k.a. error term) is normally distributed.

4. The standard error of the estimate, SEE (also referred to as the standard error of the residual or standard error of the regression, and often indicated as s_e), is the standard deviation of predicted dependent variable values about the estimated regression line.

5. The standard error of the estimate (SEE) is:

   SEE = √[Σ(y_i - b̂0 - b̂1 x_i)² / (n - 2)] = √[Σ(y_i - ŷ_i)² / (n - 2)] = √[Σ ê_i² / (n - 2)] = √[SS_Residual / (n - 2)]

where SS_Residual is the sum of squared errors; ^ indicates the predicted or estimated value of the variable or parameter; and ŷ_i = b̂0 + b̂1 x_i is a point on the regression line corresponding to x_i, a value of the independent variable; that is, the expected value of y given the estimated mean relation between x and y.

Example 1, continued:
[Figure: home sale prices (vertical axis) vs. square footage for the sample of 34 home sales in September 2005 in St. Lucie County, with the fitted regression line.]
A. The standard error of the estimate helps us gauge the "fit" of the regression line; that is, how well we have described the variation in the dependent variable.
i. The smaller the standard error, the better the fit.
ii. The standard error of the estimate is a measure of how close the estimated values (using the estimated regression), the ŷ's, are to the actual values, the y's.
iii. The ε_i's (a.k.a. the disturbance terms; a.k.a. the residuals) are the vertical distances between the observed values of y and those predicted by the equation, the ŷ's.
iv. The ε_i's are in the same terms (unit of measure) as the y's (e.g., dollars, pounds, billions).

6. The coefficient of determination, R², is the percentage of variation in the dependent variable (variation of the y_i's, or the sum of squares total, SS_Total) explained by the independent variable(s).
A. The coefficient of determination is calculated as:

   R² = Explained variation / Total variation = (Total variation - Unexplained variation) / Total variation = (SS_Total - SS_Residual) / SS_Total

B. An R² of 0.49 indicates that the independent variables explain 49% of the variation in the dependent variable.

Example 2, continued
Consider again the ten observations on x and y:

Obs     x      y
 1     12     50
 2     13     54
 3     10     48
 4      9     47
 5     20     70
 6      7     20
 7      4     15
 8     22     40
 9     15     35
10     23     37
Sum   135    416

The estimated regression line is:
   ŷ = 25.559 + 1.188x
and the residuals are calculated as:

Obs     x      y      ŷ        y-ŷ        ε²
 1     12     50     39.82     10.18     103.68
 2     13     54     41.01     12.99     168.85
 3     10     48     37.44     10.56     111.49
 4      9     47     36.25     10.75     115.50
 5     20     70     49.32     20.68     427.51
 6      7     20     33.88    -13.88     192.55
 7      4     15     30.31    -15.31     234.45
 8     22     40     51.70    -11.70     136.89
 9     15     35     43.38     -8.38      70.26
10     23     37     52.89    -15.89     252.44
Total                           0.00   1,813.63

Therefore,
   SEE² = SS_Residual / (n - 2) = 1,813.63/8 = 226.70
   SEE = √226.70 = 15.06
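The SEE and R² worked in the example boxes can be reproduced directly. A minimal sketch on the Example 2 data, using only the standard library; the variable names are mine:

```python
# Fit the least-squares line to the Example 2 data, then compute the
# standard error of the estimate and the coefficient of determination.
import math

x = [12, 13, 10, 9, 20, 7, 4, 22, 15, 23]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_total = sum((yi - y_bar) ** 2 for yi in y)

see = math.sqrt(ss_residual / (n - 2))      # standard error of the estimate
r_squared = 1 - ss_residual / ss_total      # = SS_Regression / SS_Total
print(round(see, 2), round(r_squared, 4))   # 15.06 0.2257
```

Note that R² = 0.2257 is the square of the correlation r = 0.475 computed earlier, as it must be in a simple regression.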
Example 2, continued
Continuing the previous regression example, we can calculate the R²:

Obs     x      y     (y-ȳ)²      ŷ        y-ŷ      (ŷ-ȳ)²       ε²
 1     12     50      70.56     39.82     10.18      3.18      103.68
 2     13     54     153.76     41.01     12.99      0.35      168.85
 3     10     48      40.96     37.44     10.56     17.30      111.49
 4      9     47      29.16     36.25     10.75     28.59      115.50
 5     20     70     806.56     49.32     20.68     59.65      427.51
 6      7     20     466.56     33.88    -13.88     59.65      192.55
 7      4     15     707.56     30.31    -15.31    127.43      234.45
 8     22     40       2.56     51.70    -11.70    102.01      136.89
 9     15     35      43.56     43.38     -8.38      3.18       70.26
10     23     37      21.16     52.89    -15.89    127.43      252.44
Total 135    416   2,342.40               0.00     528.77    1,813.63

   R² = 528.77/2,342.40 = 22.57%
or
   R² = 1 - (1,813.63/2,342.40) = 1 - 0.7743 = 22.57%

7. A confidence interval is the range of regression coefficient values for a given value estimate of the coefficient and a given level of probability.
A. The confidence interval for a regression coefficient b̂1 is calculated as:

   b̂1 ± t_c s_b̂1
or
   b̂1 - t_c s_b̂1 < b1 < b̂1 + t_c s_b̂1

where t_c is the critical t-value for the selected confidence level. If there are 30 degrees of freedom and a 95% confidence level, t_c is 2.042 [taken from a t-table].
B. The interpretation of the confidence interval is that this is an interval that we believe will include the true parameter (b1 in the case above) with the specified level of confidence.

8. As the standard error of the estimate (the variability of the data about the regression line) rises, the confidence interval widens. In other words, the more variable the data, the less confident you will be when you are using the regression model to estimate the coefficient.

9. The standard error of the coefficient is the square root of the ratio of the variance of the regression to the variation in the independent variable:

   s_b̂1 = √[s_e² / Σ(x_i - x̄)²]

A. Hypothesis testing: an individual explanatory variable
i. To test a hypothesis about the slope coefficient (that is, to see whether the estimated slope is equal to a hypothesized value, b1*), H0: b1 = b1*, we calculate a t-distributed statistic:

   t_b = (b̂1 - b1*) / s_b̂1

ii. The test statistic is t-distributed with n - k - 1 degrees of freedom (the number of observations (n), less the number of independent variables (k), less one).
B. If the t-statistic is greater than the critical t-value for the appropriate degrees of freedom (or less than the negative critical t-value for a negative slope), we can say that the slope coefficient is different from the hypothesized value, b1*.
C. If there is no relation between the dependent and an independent variable, the slope coefficient, b1, would be zero.

Note: The formula for the standard error of the coefficient has the variation of the independent variable in the denominator, not the variance. The variance = variation / (n - 1).

[Figure: a flat regression line at height b0. A zero slope indicates that there is no change in Y for a given change in X; that is, there is no relationship between Y and X.]

D. To test whether an independent variable explains the variation in the dependent variable, the hypothesis that is tested is whether the slope is zero:
   H0: b1 = 0
versus the alternative (what you conclude if you reject the null):
   Ha: b1 ≠ 0
This alternative hypothesis is referred to as a two-sided hypothesis. This means that we reject the null if the observed slope is different from zero in either direction (positive or negative).
E. There are hypotheses in economics that refer to the sign of the relation between the dependent and the independent variables. In this case, the alternative is directional (> or <) and the t-test is one-sided (uses only one tail of the t-distribution). In the case of a one-sided alternative, there is only one critical t-value.
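For the running Example 2 data, the standard error of the slope and the two-sided test of H0: b1 = 0 can be sketched as follows (standard library only; variable names are mine). In a simple regression this t-statistic equals the t from the correlation test, as it must:

```python
# Test H0: b1 = 0 for the Example 2 data: fit the slope, compute its
# standard error, and form the t-statistic with n - 2 = 8 df.
import math

x = [12, 13, 10, 9, 20, 7, 4, 22, 15, 23]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))        # standard error of the estimate
s_b1 = math.sqrt(se ** 2 / s_xx)     # standard error of the slope
t_b = (b1 - 0) / s_b1
print(round(t_b, 3))                 # 1.527: below the 5% critical value
                                     # of 2.306 (df = 8), fail to reject H0
```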
Example 3: Testing the significance of a slope coefficient
Suppose the estimated slope coefficient is 0.78, the sample size is 26, the standard error of the coefficient is 0.32, and the level of significance is 5%. Is the slope different from zero?
The calculated test statistic is:
   t_b = (b̂1 - b1*) / s_b̂1 = (0.78 - 0)/0.32 = 2.4375
The critical t-values are ±2.060:
   Reject H0 if t < -2.060 | Fail to reject H0 if -2.060 ≤ t ≤ +2.060 | Reject H0 if t > +2.060
Therefore, we reject the null hypothesis, concluding that the slope is different from zero.

10. Interpretation of coefficients.
A. The estimated intercept is interpreted as the value of the dependent variable (the Y) if the independent variable (the X) takes on a value of zero.
B. The estimated slope coefficient is interpreted as the change in the dependent variable for a given one-unit change in the independent variable.
C. Any conclusion regarding the importance of an independent variable in explaining a dependent variable requires determining the statistical significance of the slope coefficient. Simply looking at the magnitude of the slope coefficient does not address the issue of the importance of the variable.

11. Forecasting using regression involves making predictions about the dependent variable based on average relationships observed in the estimated regression.
A. Predicted values are values of the dependent variable based on the estimated regression coefficients and a prediction about the values of the independent variables.
B. For a simple regression, the value of y is predicted as:

Example 4
Suppose you estimate a regression model with the following estimates:
   ŷ = 1.50 + 2.5 x
In addition, you have a forecasted value for the independent variable, x_p = 20. The forecasted value for y is 51.50:
   ŷ = 1.50 + 2.50(20) = 1.50 + 50 = 51.50
   ŷ = b̂0 + b̂1 x_p

where
   ŷ is the predicted value of the dependent variable, and
   x_p is the predicted value of the independent variable (the input).

12. An analysis of variance (ANOVA) table is a summary of the explanation of the variation in the dependent variable. The basic form of the ANOVA table is as follows:

Source of variation       Degrees of freedom   Sum of squares                              Mean square
Regression (explained)    1                    Sum of squares regression (SS_Regression)   MSR = SS_Regression / 1
Error (unexplained)       n - 2                Sum of squares residual (SS_Residual)       MSE = SS_Residual / (n - 2)
Total                     n - 1                Sum of squares total (SS_Total)

Example 5
Source of variation       Degrees of freedom   Sum of squares   Mean square
Regression (explained)    1                    5,050            5,050
Error (unexplained)       28                   600.49           21.45
Total                     29                   5,650

   R² = 5,050/5,650 = 0.8938 or 89.38%
   SEE = √(600.49/28) = 4.63
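The quantities in Example 5 follow mechanically from the ANOVA sums of squares. A minimal sketch (variable names are mine; the degrees of freedom are those shown in the table):

```python
# Derive R^2, SEE, and the mean squares from an ANOVA table's entries.
import math

ss_regression, ss_residual, ss_total = 5050.0, 600.49, 5650.0
df_regression, df_error = 1, 28

msr = ss_regression / df_regression   # mean square regression
mse = ss_residual / df_error          # mean square error, about 21.45
r_squared = ss_regression / ss_total
see = math.sqrt(mse)                  # SEE is the square root of MSE
f_stat = msr / mse                    # F-statistic, discussed under multiple regression

print(round(r_squared, 4), round(see, 2))   # 0.8938 4.63
```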
Multiple Regression

1. Multiple regression is regression analysis with more than one independent variable.
A. The concept of multiple regression is identical to that of simple regression analysis except that two or more independent variables are used simultaneously to explain variations in the dependent variable.
B. In a multiple regression, the goal is to minimize the sum of the squared errors. Each slope coefficient is estimated while holding the other variables constant:

   y = b0 + b1 x1 + b2 x2 + b3 x3 + b4 x4 + ...

2. The intercept in the regression equation has the same interpretation as it did under the simple linear case: the intercept is the value of the dependent variable when all independent variables are equal to zero.

3. The slope coefficient is the parameter that reflects the change in the dependent variable for a one-unit change in the independent variable.
A. The slope coefficients (the betas) are described as the movement in the dependent variable for a one-unit change in the independent variable, holding all other independent variables constant.
B. For this reason, beta coefficients in a multiple linear regression are sometimes called partial betas or partial regression coefficients.

Note: We do not represent the multiple regression graphically because it would require graphs in more than two dimensions.

A slope by any other name: the slope coefficient is the elasticity of the dependent variable with respect to the independent variable; in other words, it is the first derivative of the dependent variable with respect to the independent variable.

4. Regression model:

   y_i = b0 + b1 x_1i + b2 x_2i + ε_i

where
   b_j is the slope coefficient on the jth independent variable, and
   x_ji is the ith observation on the jth variable.

A. The degrees of freedom for the test of a slope coefficient are n - k - 1, where n is the number of observations in the sample and k is the number of independent variables.
B. In multiple regression, the independent variables may be correlated with one another, resulting in less reliable estimates. This problem is referred to as multicollinearity.

5.
A confidence interval for a population regression slope in a multiple regression is an interval centered on the estimated slope:

   b̂_j ± t_c s_b̂j
or
   b̂_j - t_c s_b̂j < b_j < b̂_j + t_c s_b̂j

A. This is the same interval as used in simple regression for the interval of a slope coefficient.
B. If this interval contains zero, we conclude that the slope is not statistically different from zero.
6. The assumptions of the multiple regression model are as follows:
A. A linear relationship exists between the dependent and independent variables.
B. The independent variables are uncorrelated with the residuals; that is, the independent variables are not random. In addition, there is no exact linear relation between two or more of the independent variables. [Note: this is modified slightly from the assumptions of the simple regression model.]
C. The expected value of the disturbance term is zero; that is, E(ε_i) = 0.
D. There is a constant variance of the disturbance term; that is, the disturbance or residual terms are all drawn from a distribution with an identical variance. In other words, the disturbance terms are homoskedastic. [A violation of this is referred to as heteroskedasticity.]
E. The residuals are independently distributed; that is, the residual or disturbance for one observation is not correlated with that of another observation. [A violation of this is referred to as autocorrelation.]
F. The disturbance term (a.k.a. residual, a.k.a. error term) is normally distributed.
G. The residual (a.k.a. disturbance term, a.k.a. error term) is what is not explained by the independent variables.

7. In a regression with two independent variables, the residual for the ith observation is:

   ε_i = y_i - (b̂0 + b̂1 x_1i + b̂2 x_2i)

8. The standard error of the estimate (SEE) is the standard error of the residual:

   SEE = s_e = √[Σ ê_i² / (n - k - 1)] = √[SSE / (n - k - 1)]

9. The degrees of freedom, df, are calculated as:

   df = (number of observations) - (number of independent variables) - 1 = n - k - 1 = n - (k + 1)

A. The degrees of freedom are the number of independent pieces of information that are used to estimate the regression parameters. In calculating the regression parameters, we use the following pieces of information:
   - The mean of the dependent variable.
   - The mean of each of the independent variables.
B. Therefore, if the regression is a simple regression, we use two degrees of freedom in estimating the regression line. If the regression is a multiple regression with four independent variables, we use five degrees of freedom in the estimation of the regression line.
10. Forecasting using regression involves making predictions about the dependent variable based on average relationships observed in the estimated regression.
A. Predicted values are values of the dependent variable based on the estimated regression coefficients and a prediction about the values of the independent variables.
B. For a multiple regression, the value of y is predicted as:

   ŷ = b̂0 + b̂1 x̂1 + b̂2 x̂2 + ...

where
   ŷ is the predicted value of the dependent variable,
   b̂ is an estimated parameter, and
   x̂ is the predicted value of the corresponding independent variable.

C. The better the fit of the regression (that is, the smaller the SEE), the more confident we are in our predictions.

Caution: The estimated intercept and all the estimated slopes are used in the prediction of the dependent variable value, even if a slope is not statistically significantly different from zero.

Example 6: Using analysis of variance information
Suppose we estimate a multiple regression model that has five independent variables using a sample of 65 observations. If the sum of squared residuals is 789, what is the standard error of the estimate?
Solution
Given SS_Residual = 789, n = 65, and k = 5:
   SEE = √[789 / (65 - 5 - 1)] = √(789/59) = √13.373 = 3.66

Example 7: Calculating a forecasted value
Suppose you estimate a regression model with the following estimates:
   ŷ = 1.50 + 2.5 x1 - 0.2 x2 + 1.25 x3
In addition, you have forecasted values for the independent variables:
   x1 = 20, x2 = 120, x3 = 50
What is the forecasted value of y?
Solution
The forecasted value for y is 90:
   ŷ = 1.50 + 2.50(20) - 0.20(120) + 1.25(50) = 1.50 + 50 - 24 + 62.50 = 90

11. The F-statistic is a measure of how well a set of independent variables, as a group, explains the variation in the dependent variable.
A. The F-statistic is calculated as:
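Examples 6 and 7 can be checked in a few lines. The coefficients and inputs are those given in the examples; the variable names are mine:

```python
# Example 6: SEE from the sum of squared residuals, with n = 65
# observations and k = 5 independent variables.
import math

n, k, ss_residual = 65, 5, 789.0
see = math.sqrt(ss_residual / (n - k - 1))
print(round(see, 2))        # 3.66

# Example 7: forecast y from the estimated model and forecasted x's.
b0, b1, b2, b3 = 1.50, 2.5, -0.2, 1.25
x1, x2, x3 = 20, 120, 50
y_hat = b0 + b1 * x1 + b2 * x2 + b3 * x3
print(round(y_hat, 2))      # 90.0
```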
   F = MSR / MSE = [SS_Regression / k] / [SS_Residual / (n - k - 1)] = [Σ(ŷ_i - ȳ)² / k] / [Σ(y_i - ŷ_i)² / (n - k - 1)]

B. The F-statistic can be formulated to test all independent variables as a group (the most common application). For example, if there are four independent variables in the model, the hypotheses are:
   H0: b1 = b2 = b3 = b4 = 0
   Ha: at least one b_j ≠ 0
C. The F-statistic can be formulated to test subsets of independent variables (to see whether they have incremental explanatory power). For example, if there are four independent variables in the model, a subset could be examined:
   H0: b2 = b4 = 0
   Ha: b2 ≠ 0 or b4 ≠ 0

12. The coefficient of determination, R², is the percentage of variation in the dependent variable explained by the independent variables:

   R² = Explained variation / Total variation = (Total variation - Unexplained variation) / Total variation = Σ(ŷ_i - ȳ)² / Σ(y_i - ȳ)²,   0 ≤ R² ≤ 1

A. By construction, R² ranges from 0 to 1.0.
B. The adjusted R² is an alternative to R²:

   adjusted R² = 1 - [(n - 1)/(n - k - 1)] (1 - R²)

i. The adjusted R² is less than or equal to R² (equal only when R² = 1).
ii. Adding independent variables to the model will increase R². Adding independent variables to the model may increase or decrease the adjusted R². (Note: the adjusted R² can even be negative.)
iii. The adjusted R² does not have the clean explanation of explanatory power that the R² has.

13. The purpose of the analysis of variance (ANOVA) table is to attribute the total variation of the dependent variable to the regression model (the regression source) and the residuals (the error source).
A. SS_Total is the total variation of y about its mean or average value (a.k.a. total sum of squares) and is computed as:
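As a concrete check on the adjusted-R² formula, here is the earlier simple-regression example (R² = 0.2257, n = 10, k = 1) worked in Python; the variable names are mine:

```python
# Adjusted R^2 = 1 - [(n - 1)/(n - k - 1)](1 - R^2). Because the factor
# (n - 1)/(n - k - 1) exceeds 1 whenever k >= 1, the adjusted value is
# pulled below the raw R^2.
n, k, r_squared = 10, 1, 0.2257     # the simple regression of Example 2
adj_r_squared = 1 - (n - 1) / (n - k - 1) * (1 - r_squared)
print(round(adj_r_squared, 4))      # 0.1289, well below 0.2257
```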
   SS_Total = Σ_{i=1..n} (y_i - ȳ)²

where ȳ is the mean of y.
B. SS_Residual (a.k.a. SSE) is the variability that is unexplained by the regression and is computed as:

   SS_Residual = SSE = Σ_{i=1..n} (y_i - ŷ_i)² = Σ ê_i²

where ŷ_i is the value of the dependent variable using the regression equation.
C. SS_Regression (a.k.a. SS_Explained) is the variability that is explained by the regression equation, computed as SS_Total - SS_Residual, or:

   SS_Regression = Σ(ŷ_i - ȳ)²

D. MSE is the mean square error: MSE = SS_Residual / (n - k - 1), where k is the number of independent variables in the regression.
E. MSR is the mean square regression: MSR = SS_Regression / k.

Analysis of Variance Table (ANOVA)
Source       df (Degrees of Freedom)   SS (Sum of Squares)   Mean Square (SS/df)
Regression   k                         SS_Regression         MSR
Error        n - k - 1                 SS_Residual           MSE
Total        n - 1                     SS_Total

   R² = SS_Regression / SS_Total = 1 - SS_Residual / SS_Total
   F = MSR / MSE

14. Dummy variables are qualitative variables that take on a value of zero or one.
A. Most independent variables represent a continuous flow of values. However, sometimes the independent variable is of a binary nature (it is either ON or OFF).
B. These types of variables are called dummy variables, and the data are assigned a value of "0" or "1". In many cases, you apply the dummy variable concept to quantify the impact of a qualitative variable. A dummy variable is a dichotomous variable; that is, it takes on a value of one or zero.
C. Use one dummy variable less than the number of classes (e.g., if you have three classes, use two dummy variables); otherwise, you fall into the dummy variable "trap" (perfect multicollinearity, violating the assumption that there is no exact linear relation between the independent variables).
D. An interactive dummy variable is a dummy variable (0,1) multiplied by another variable to create a new variable. The slope on this new variable tells us the incremental slope.

15. Heteroskedasticity is the situation in which the variance of the residuals is not constant across all observations.
A. An assumption of the regression methodology is that the sample is drawn from the same population, and that the variance of residuals is constant across observations; in other words, the residuals are homoskedastic.
B. Heteroskedasticity is a problem because the estimators do not have the smallest possible variance, and therefore the standard errors of the coefficients would not be correct.

16. Autocorrelation is the situation in which the residual terms are correlated with one another. This occurs frequently in time-series analysis.
A. Autocorrelation usually appears in time-series data. If last year's earnings were high, this year's earnings may have a greater probability of being high than being low. This is an example of positive autocorrelation. When a good year is always followed by a bad year, this is negative autocorrelation.
B. Autocorrelation is a problem because the estimators do not have the smallest possible variance, and therefore the standard errors of the coefficients would not be correct.

17. Multicollinearity is the problem of high correlation between or among two or more independent variables.
A. Multicollinearity is a problem because:
i. the presence of multicollinearity can cause distortions in the standard errors and may lead to problems with significance testing of individual coefficients, and
ii. estimates are sensitive to changes in the sample observations or the model specification.
B. If there is multicollinearity, we are more likely to conclude that a variable is not important.
C. Multicollinearity is likely present to some degree in most economic models.
Perfect multicollinearity would prohibit us from estimating the regression parameters. The issue, then, is really one of degree.

18. The economic meaning of the results of a regression estimation focuses primarily on the slope coefficients.
A. The slope coefficients indicate the change in the dependent variable for a one-unit change in the independent variable. This slope can then be interpreted as an elasticity measure; that is, the change in one variable corresponding to a change in another variable.
B. It is possible to have statistical significance yet not have economic significance (e.g., significant abnormal returns associated with an announcement, but returns that are not sufficient to cover transactions costs).
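The dummy-variable setup of item 14 can be sketched as follows. This is a hypothetical illustration (the class labels and names are mine): for a qualitative variable with three classes, two 0/1 dummies are built, with the omitted class serving as the base:

```python
# Encode a three-class qualitative variable with two dummies, leaving
# class "A" as the omitted (base) class to avoid the dummy-variable trap.
observations = ["A", "B", "C", "A", "C"]
included = ["B", "C"]                      # one fewer dummy than classes
dummies = [[1 if obs == cls else 0 for cls in included]
           for obs in observations]
print(dummies)   # [[0, 0], [1, 0], [0, 1], [0, 0], [0, 1]]
```

Adding a third dummy for "A" would make the dummies sum to one for every observation, an exact linear relation with the intercept, i.e., perfect multicollinearity.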
To test the role of a single variable in explaining the variation in the dependent variable, use the t-statistic.
To test the role of all variables in explaining the variation in the dependent variable, use the F-statistic.
To estimate the change in the dependent variable for a one-unit change in the independent variable, use the slope coefficient.
To estimate the dependent variable if all of the independent variables take on a value of zero, use the intercept.
To estimate the percentage of the dependent variable's variation explained by the independent variables, use the R².
To forecast the value of the dependent variable given the estimated values of the independent variable(s), use the regression equation, substituting the estimated values of the independent variable(s) into the equation.
Regression terminology

Analysis of variance (ANOVA), Autocorrelation, Coefficient of determination, Confidence interval, Correlation coefficient, Covariance, Covariation, Cross-sectional, Degrees of freedom, Dependent variable, Explained variable, Explanatory variable, Forecast, F-statistic, Heteroskedasticity, Homoskedasticity, Independent variable, Intercept, Least squares regression, Mean square error, Mean square regression, Multicollinearity, Multiple regression, Negative correlation, Ordinary least squares, Perfect negative correlation, Perfect positive correlation, Positive correlation, Predicted value, R², Regression, Residual, Scatterplot, s_e, SEE, Simple regression, Slope, Slope coefficient, Spurious correlation, SS_Residual, SS_Regression, SS_Total, Standard error of the estimate, Sum of squares error, Sum of squares regression, Sum of squares total, Time-series, t-statistic, Variance, Variation
Regression formulas

Variances
   Variation = Σ(x_i - x̄)²
   Variance = Σ(x_i - x̄)² / (n - 1)
   Covariance = Σ(x_i - x̄)(y_i - ȳ) / (n - 1)

Correlation
   r = [Σ(x_i - x̄)(y_i - ȳ) / (n - 1)] / { √[Σ(x_i - x̄)² / (n - 1)] √[Σ(y_i - ȳ)² / (n - 1)] }
   t = r √(n - 2) / √(1 - r²)

Regression
   y_i = b0 + b1 x_i + ε_i
   y_i = b0 + b1 x_1i + b2 x_2i + b3 x_3i + b4 x_4i + ε_i
   b1 = cov(X,Y) / var(X) = Σ(y_i - ȳ)(x_i - x̄) / Σ(x_i - x̄)²
   b̂0 = ȳ - b̂1 x̄

Tests and confidence intervals
   s_e = √[Σ(y_i - b̂0 - b̂1 x_i)² / (n - 2)] = √[Σ(y_i - ŷ_i)² / (n - 2)] = √[Σ ê_i² / (n - 2)]
   s_b̂1 = √[s_e² / Σ(x_i - x̄)²]
   t_b = (b̂1 - b1) / s_b̂1
   b̂1 - t_c s_b̂1 < b1 < b̂1 + t_c s_b̂1
   F = MSR / MSE = [SS_Regression / k] / [SS_Residual / (n - k - 1)] = [Σ(ŷ_i - ȳ)² / k] / [Σ(y_i - ŷ_i)² / (n - k - 1)]
Forecasting
   ŷ = b̂0 + b̂1 x_1p + b̂2 x_2p + b̂3 x_3p + ... + b̂K x_Kp

Analysis of variance
   SS_Total = Σ_{i=1..n} (y_i - ȳ)²
   SS_Residual = SSE = Σ_{i=1..n} (y_i - ŷ_i)² = Σ ê_i²
   SS_Regression = SS_Total - SS_Residual = Σ(ŷ_i - ȳ)²
   R² = SS_Regression / SS_Total = 1 - SS_Residual / SS_Total
   F = MSR / MSE = [SS_Regression / k] / [SS_Residual / (n - k - 1)]