Correlation and Regression


Notes prepared by Pamela Peterson Drake

Index
   Basic terms and concepts
   Simple regression
   Multiple regression
   Regression terminology
   Regression formulas

Basic terms and concepts

1. A scatter plot is a graphical representation of the relation between two or more variables. In the scatter plot of two variables x and y, each point on the plot is an x-y pair.

2. We use regression and correlation to describe the variation in one or more variables.
   A. The variation is the sum of the squared deviations of a variable:

         Variation = Σ(x − x̄)²

   B. The variation is the numerator of the variance of a sample:

         Variance = Σ(x − x̄)² / (n − 1)

   C. Both the variation and the variance are measures of the dispersion of a sample.

3. The covariance between two random variables is a statistical measure of the degree to which the two variables move together.
   A. The covariance captures how one variable differs from its mean as the other variable differs from its mean.
   B. A positive covariance indicates that the variables tend to move together; a negative covariance indicates that the variables tend to move in opposite directions.
   C. The covariance is calculated as the ratio of the covariation to the sample size less one:

         Covariance = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

      where n is the sample size, xᵢ is the ith observation on variable x, x̄ is the mean of the variable x observations, yᵢ is the ith observation on variable y, and

[Figure: Example: Home sale prices and square footage. Home sales prices (vertical axis, $0 to $800,000) v. square footage (0 to 3,000) for a sample of 34 home sales in September 2005 in St. Lucie County.]

      ȳ is the mean of the variable y observations.

   D. The actual value of the covariance is not meaningful because it is affected by the scale of the two variables. That is why we calculate the correlation coefficient: to make something interpretable from the covariance information.
   E. The correlation coefficient, r, is a measure of the strength of the relationship between or among variables:

         r = covariance between x and y / [(standard deviation of x)(standard deviation of y)]

         r = [Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)] / (sx sy)

      Note: Correlation does not imply causation. We may say that two variables X and Y are correlated, but that does not mean that X causes Y or that Y causes X; they simply are related or associated with one another.

Example 2: Calculating the correlation coefficient

Observation    x     y    x − x̄   (x − x̄)²   y − ȳ    (y − ȳ)²   (x − x̄)(y − ȳ)
     1        22    50    −1.50      2.25      8.40      70.56        −12.60
     2        23    54    −0.50      0.25     12.40     153.76         −6.20
     3        20    48    −3.50     12.25      6.40      40.96        −22.40
     4        19    47    −4.50     20.25      5.40      29.16        −24.30
     5        30    70     6.50     42.25     28.40     806.56        184.60
     6        17    20    −6.50     42.25    −21.60     466.56        140.40
     7        14    15    −9.50     90.25    −26.60     707.56        252.70
     8        32    40     8.50     72.25     −1.60       2.56        −13.60
     9        25    35     1.50      2.25     −6.60      43.56         −9.90
    10        33    37     9.50     90.25     −4.60      21.16        −43.70
   Sum       235   416     0.00    374.50      0.00   2,342.40        445.00

Calculations:
   x̄ = 235 / 10 = 23.5
   ȳ = 416 / 10 = 41.6
   sx² = 374.5 / 9 = 41.61
   sy² = 2,342.4 / 9 = 260.27
   r = (445 / 9) / [(6.45)(16.13)] = 49.444 / 104.07 = 0.475

      i. The type of relationship is represented by the correlation coefficient:
            r = +1      perfect positive correlation
            +1 > r > 0  positive relationship

            r = 0       no relationship
            0 > r > −1  negative relationship
            r = −1      perfect negative correlation

      ii. You can determine the direction of correlation by looking at a scatter plot. If the relation slopes upward, there is positive correlation; if it slopes downward, there is negative correlation.

[Figure: two scatter plots, Y against X. The left panel shows an upward-sloping cloud of points (0 < r < 1.0); the right panel shows a downward-sloping cloud (−1.0 < r < 0).]

      iii. The correlation coefficient is bounded by −1 and +1. The closer the coefficient is to −1 or +1, the stronger the correlation.

      iv. With the exception of the extremes (that is, r = 1.0 or r = −1), we cannot really talk about the strength of a relationship indicated by the correlation coefficient without a statistical test of significance.

      v. The hypotheses of interest regarding the population correlation, ρ, are:
            Null hypothesis         H0: ρ = 0   (there is no correlation between the two variables)
            Alternative hypothesis  Ha: ρ ≠ 0   (there is a correlation between the two variables)

      vi. The test statistic is t-distributed with n − 2 degrees of freedom:

            t = r √(n − 2) / √(1 − r²)

Example, continued

   In the previous example, r = 0.475 and n = 10:

            t = 0.475 √8 / √(1 − 0.475²) = 1.3435 / 0.8800 = 1.527

      vii. To make a decision, compare the calculated t-statistic with the critical t-statistic for the appropriate degrees of freedom and level of significance.
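The correlation arithmetic above can be checked with a few lines of Python. This is an illustrative sketch only (the function names and data layout are mine, not part of the notes), using the ten x-y pairs from the correlation example:

```python
from math import sqrt

x = [22, 23, 20, 19, 30, 17, 14, 32, 25, 33]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]

def covariance(a, b):
    # ratio of the covariation to the sample size less one
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def correlation(a, b):
    # r = cov(a, b) / (s_a * s_b); cov(a, a) is the sample variance
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))
```

Here correlation(x, y) returns roughly 0.475, matching the hand calculation.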

Problem 1

Suppose the correlation coefficient is 0.2 and the number of observations is 32. What is the calculated test statistic? Is this significant correlation using a 5% level of significance?

Solution
   Hypotheses: H0: ρ = 0; Ha: ρ ≠ 0
   Calculated t-statistic: t = 0.2 √(32 − 2) / √(1 − 0.04) = 1.0954 / 0.9798 = 1.118
   Degrees of freedom = 32 − 2 = 30
   The critical t-value for a 5% level of significance and 30 degrees of freedom is 2.042. Therefore, there is no significant correlation (1.118 falls between the two critical values of −2.042 and +2.042).

Problem 2

Suppose the correlation coefficient is 0.80 and the number of observations is 52. What is the calculated test statistic? Is this significant correlation using a 1% level of significance?

Solution
   Hypotheses: H0: ρ = 0; Ha: ρ ≠ 0
   Calculated t-statistic: t = 0.80 √(52 − 2) / √(1 − 0.64) = 5.65685 / 0.6 = 9.428
   Degrees of freedom = 52 − 2 = 50
   The critical t-value for a 1% level of significance and 50 degrees of freedom is approximately 2.68. Therefore, the null hypothesis is rejected and we conclude that there is significant correlation.

   F. An outlier is an extreme value of a variable. The outlier may be quite large or small (where large and small are defined relative to the rest of the sample).
      i. An outlier may affect the sample statistics, such as a correlation coefficient. It is possible for an outlier to affect the result such that we conclude that there is a significant relation when in fact there is none, or that there is no relation when in fact there is one.
      ii. The researcher must exercise judgment (and caution) when deciding whether to include or exclude an observation.
   G. Spurious correlation is the appearance of a relationship when in fact there is no relation. Outliers may result in spurious correlation.
      i. The correlation coefficient does not indicate a causal relationship. Certain data items may be highly correlated, but not necessarily as a result of a causal relationship.
      ii. A good example of a spurious correlation is snowfall and stock prices in January. If we regress historical stock prices on snowfall totals in Minnesota, we would get a statistically significant relationship, especially for the month of January. Since there is no economic reason for this relationship, this would be an example of spurious correlation.
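Both practice problems apply the same test statistic. A small Python helper (illustrative only; the name is mine) makes the pattern explicit:

```python
from math import sqrt

def corr_t(r, n):
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)
```

For instance, corr_t(0.2, 32) is about 1.12, which falls inside the 5% critical values of ±2.042, while corr_t(0.80, 52) is about 9.43, far beyond any conventional critical value.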

Simple regression

1. Regression is the analysis of the relation between one variable and some other variable(s), assuming a linear relation. Also referred to as least squares regression and ordinary least squares (OLS).
   A. The purpose is to explain the variation in a variable (that is, how a variable differs from its mean value) using the variation in one or more other variables.
   B. Suppose we want to describe, explain, or predict why a variable differs from its mean. Let the ith observation on this variable be represented as Yᵢ, and let n indicate the number of observations. The variation in the Yᵢ's (what we want to explain) is:

         Variation of Y = SS Total = Σ(yᵢ − ȳ)²

   C. The least squares principle is that the regression line is determined by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.

[Figure: a line fit through the X-Y points such that the sum of the squared residuals (that is, the sum of the squared vertical distances between the observations and the line) is minimized.]

2. The variables in a regression relation consist of dependent and independent variables.
   A. The dependent variable is the variable whose variation is being explained by the other variable(s). Also referred to as the explained variable, the endogenous variable, or the predicted variable.
   B. The independent variable is the variable whose variation is used to explain that of the dependent variable. Also referred to as the explanatory variable, the exogenous variable, or the predicting variable.
   C. The parameters in a simple regression equation are the slope (b1) and the intercept (b0):

         yᵢ = b0 + b1 xᵢ + εᵢ

      where
         yᵢ is the ith observation on the dependent variable,
         xᵢ is the ith observation on the independent variable,
         b0 is the intercept,
         b1 is the slope coefficient, and
         εᵢ is the residual for the ith observation.

   D. The slope, b1, is the change in Y for a given one-unit change in X. The slope can be positive, negative, or zero, and is calculated as:

         b1 = cov(X, Y) / var(X) = [Σ(yᵢ − ȳ)(xᵢ − x̄) / (n − 1)] / [Σ(xᵢ − x̄)² / (n − 1)]

      Hint: Think of the regression line as the average of the relationship between the independent variable(s) and the dependent variable. The residual represents the distance an observed value of the dependent variable (i.e., Y) is away from the average relationship as depicted by the regression line.

      Suppose that:

         Σ(yᵢ − ȳ)(xᵢ − x̄) = 1,000,  Σ(xᵢ − x̄)² = 450,  n = 30

      Then

         b̂1 = (1,000 / 29) / (450 / 29) = 34.4828 / 15.5172 = 2.2222

      A short-cut formula for the slope coefficient:

         b1 = Σ(yᵢ − ȳ)(xᵢ − x̄) / Σ(xᵢ − x̄)² = (Σ xᵢyᵢ − n x̄ ȳ) / (Σ xᵢ² − n x̄²)

      Whether this is truly a short-cut or not depends on the method of performing the calculations: by hand, using Microsoft Excel, or using a calculator.

   E. The intercept, b0, is the line's intersection with the Y-axis at X = 0. The intercept can be positive, negative, or zero, and is calculated as:

         b̂0 = ȳ − b̂1 x̄
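The slope and intercept formulas translate directly into code. A sketch (function name is mine), reusing the ten observations from the correlation example:

```python
x = [22, 23, 20, 19, 30, 17, 14, 32, 25, 33]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]

def fit(x, y):
    # least squares slope and intercept:
    #   b1 = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2)
    #   b0 = mean y - b1 * mean x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1
```

For this sample, fit(x, y) returns an intercept near 13.68 and a slope near 1.19.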

3. Linear regression assumes the following:
   A. A linear relationship exists between the dependent and independent variable. Note: if the relation is not linear, it may be possible to transform one or both variables so that there is a linear relation.
   B. The independent variable is uncorrelated with the residuals; that is, the independent variable is not random.
   C. The expected value of the disturbance term is zero; that is, E(εᵢ) = 0.
   D. There is a constant variance of the disturbance term; that is, the disturbance or residual terms are all drawn from a distribution with an identical variance. In other words, the disturbance terms are homoskedastic. [A violation of this is referred to as heteroskedasticity.]
   E. The residuals are independently distributed; that is, the residual or disturbance for one observation is not correlated with that of another observation. [A violation of this is referred to as autocorrelation.]
   F. The disturbance term (a.k.a. residual, a.k.a. error term) is normally distributed.

4. The standard error of the estimate, SEE (also referred to as the standard error of the residual or standard error of the regression, and often indicated as se), is the standard deviation of predicted dependent variable values about the estimated regression line.

5. The standard error of the estimate (SEE) is:

      SEE = √[Σ(yᵢ − b̂0 − b̂1 xᵢ)² / (n − 2)] = √[Σ(yᵢ − ŷᵢ)² / (n − 2)] = √[Σ êᵢ² / (n − 2)] = √[SS Residual / (n − 2)]

   where SS Residual is the sum of squared errors; ^ indicates the predicted or estimated value of the variable or parameter; and ŷᵢ = b̂0 + b̂1 xᵢ is a point on the regression line corresponding to a value of the independent variable xᵢ, the expected value of y given the estimated mean relation between x and y.

[Figure: Example, continued. Home sales prices (vertical axis) v. square footage for the sample of 34 home sales in September 2005 in St. Lucie County, with the fitted regression line drawn through the scatter.]

   A. The standard error of the estimate helps us gauge the "fit" of the regression line; that is, how well we have described the variation in the dependent variable.
      i. The smaller the standard error, the better the fit.
      ii. The standard error of the estimate is a measure of how close the estimated values (using the estimated regression), the ŷ's, are to the actual values, the Y's.
      iii. The εᵢ's (a.k.a. the disturbance terms; a.k.a. the residuals) are the vertical distances between the observed values of Y and those predicted by the equation, the ŷ's.
      iv. The εᵢ's are in the same terms (unit of measure) as the Yᵢ's (e.g., dollars, pounds, billions).

6. The coefficient of determination, R², is the percentage of variation in the dependent variable (variation of Yᵢ's or the sum of squares total, SST) explained by the independent variable(s).
   A. The coefficient of determination is calculated as:

         R² = Explained variation / Total variation
            = (Total variation − Unexplained variation) / Total variation
            = (SS Total − SS Residual) / SS Total

   B. An R² of 0.49 indicates that the independent variables explain 49% of the variation in the dependent variable.

Example, continued

Consider again the ten observations on X and Y. The estimated regression line is:

      ŷ = 13.676 + 1.1882 x

and the residuals are calculated as:

Observation    x     y      ŷ      y − ŷ = ε     ε²
     1        22    50    39.82      10.18     103.68
     2        23    54    41.01      12.99     168.85
     3        20    48    37.44      10.56     111.49
     4        19    47    36.25      10.75     115.50
     5        30    70    49.32      20.68     427.51
     6        17    20    33.88     −13.88     192.55
     7        14    15    30.31     −15.31     234.45
     8        32    40    51.70     −11.70     136.89
     9        25    35    43.38      −8.38      70.26
    10        33    37    52.89     −15.89     252.44
  Total                               0.00   1,813.63

Therefore,
      SS Residual = 1,813.63
      SEE² = 1,813.63 / 8 = 226.70
      SEE = √226.70 = 15.06
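Both the SEE and the R² come from the residual sum of squares, so they can be computed together. A compact Python sketch (names mine), using the same ten observations and fitted line:

```python
from math import sqrt

x = [22, 23, 20, 19, 30, 17, 14, 32, 25, 33]
y = [50, 54, 48, 47, 70, 20, 15, 40, 35, 37]

def see_and_r2(x, y, b0, b1):
    # SEE = sqrt(SS Residual / (n - 2));  R^2 = 1 - SS Residual / SS Total
    n = len(y)
    my = sum(y) / n
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - my) ** 2 for yi in y)
    return sqrt(ss_res / (n - 2)), 1 - ss_res / ss_total
```

With the fitted values b0 = 13.676 and b1 = 1.1882, this gives an SEE near 15.06 and an R² near 0.226, matching the worked example.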

Example, continued

Continuing the previous regression example, we can calculate the R²:

Observation    x     y    (y − ȳ)²     ŷ      y − ŷ    (ŷ − ȳ)²     ε²
     1        22    50      70.56    39.82     10.18      3.18     103.68
     2        23    54     153.76    41.01     12.99      0.35     168.85
     3        20    48      40.96    37.44     10.56     17.30     111.49
     4        19    47      29.16    36.25     10.75     28.59     115.50
     5        30    70     806.56    49.32     20.68     59.65     427.51
     6        17    20     466.56    33.88    −13.88     59.65     192.55
     7        14    15     707.56    30.31    −15.31    127.43     234.45
     8        32    40       2.56    51.70    −11.70    102.01     136.89
     9        25    35      43.56    43.38     −8.38      3.18      70.26
    10        33    37      21.16    52.89    −15.89    127.43     252.44
  Total      235   416   2,342.40   416.00      0.00    528.77   1,813.63

      R² = 528.77 / 2,342.40 = 22.57%   or   R² = 1 − (1,813.63 / 2,342.40) = 1 − 0.7743 = 22.57%

7. A confidence interval is the range of regression coefficient values for a given value estimate of the coefficient and a given level of probability.
   A. The confidence interval for a regression coefficient b̂1 is calculated as:

         b̂1 ± tc s_b̂1   or   b̂1 − tc s_b̂1 < b1 < b̂1 + tc s_b̂1

      where tc is the critical t-value for the selected confidence level. If there are 30 degrees of freedom and a 95% confidence level, tc is 2.042 [taken from a t-table].
   B. The interpretation of the confidence interval is that this is an interval that we believe will include the true parameter (b1 in the case above) with the specified level of confidence.

8. As the standard error of the estimate (the variability of the data about the regression line) rises, the confidence interval widens. In other words, the more variable the data, the less confident you will be when you are using the regression model to estimate the coefficient.

9. The standard error of the coefficient is the square root of the ratio of the variance of the regression to the variation in the independent variable:

         s_b̂1 = se / √[Σᵢ (xᵢ − x̄)²]

   A. Hypothesis testing: an individual explanatory variable

      i. To test the hypothesis of the slope coefficient (that is, to see whether the estimated slope is equal to a hypothesized value b1*), H0: b1 = b1*, we calculate a t-distributed statistic:

            t_b = (b̂1 − b1*) / s_b̂1

      ii. The test statistic is t-distributed with n − k − 1 degrees of freedom (the number of observations (n), less the number of independent variables (k), less one).

   B. If the t-statistic is greater than the critical t-value for the appropriate degrees of freedom (or less than the negative critical t-value for a negative slope), we can say that the slope coefficient is different from the hypothesized value, b1*.

   C. If there is no relation between the dependent and an independent variable, the slope coefficient, b1, would be zero.

      Note: The formula for the standard error of the coefficient has the variation of the independent variable in the denominator, not the variance. The variance = variation / (n − 1).

[Figure: a flat regression line at height b0. A zero slope indicates that there is no change in Y for a given change in X; that is, there is no relationship between Y and X.]

   D. To test whether an independent variable explains the variation in the dependent variable, the hypothesis that is tested is whether the slope is zero:

         H0: b1 = 0

      versus the alternative (what you conclude if you reject the null, H0):

         Ha: b1 ≠ 0

      This alternative hypothesis is referred to as a two-sided hypothesis. This means that we reject the null if the observed slope is different from zero in either direction (positive or negative).

   E. There are hypotheses in economics that refer to the sign of the relation between the dependent and the independent variables. In this case, the alternative is directional (> or <) and the t-test is one-sided (uses only one tail of the t-distribution). In the case of a one-sided alternative, there is only one critical t-value.
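The slope test reduces to one line of arithmetic. As a minimal sketch (the name is mine, not from the notes):

```python
def slope_t(b_hat, b_hypothesized, se_b):
    # t = (estimated slope - hypothesized slope) / standard error of the coefficient,
    # compared against critical t with n - k - 1 degrees of freedom
    return (b_hat - b_hypothesized) / se_b
```

Most commonly the hypothesized value is zero, so the statistic is simply the estimated slope divided by its standard error.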

Example 3: Testing the significance of a slope coefficient

Suppose the estimated slope coefficient is 0.78, the sample size is 26, the standard error of the coefficient is 0.32, and the level of significance is 5%. Is the slope different from zero?

The calculated test statistic is:

      t_b = (b̂1 − b1*) / s_b̂1 = (0.78 − 0) / 0.32 = 2.4375

The critical t-values are ±2.06:

      [Reject H0 if t < −2.06]   [Fail to reject H0]   [Reject H0 if t > +2.06]

Therefore, we reject the null hypothesis, concluding that the slope is different from zero.

10. Interpretation of coefficients.
   A. The estimated intercept is interpreted as the value of the dependent variable (the Y) if the independent variable (the X) takes on a value of zero.
   B. The estimated slope coefficient is interpreted as the change in the dependent variable for a given one-unit change in the independent variable.
   C. Any conclusion regarding the importance of an independent variable in explaining a dependent variable requires determining the statistical significance of the slope coefficient. Simply looking at the magnitude of the slope coefficient does not address the issue of the importance of the variable.

11. Forecasting using regression involves making predictions about the dependent variable based on average relationships observed in the estimated regression.
   A. Predicted values are values of the dependent variable based on the estimated regression coefficients and a prediction about the values of the independent variables.
   B. For a simple regression, the value of Y is predicted from the estimated coefficients and the forecasted value of X.

Example 4

Suppose you estimate a regression model with the following estimates:

      ŷ = 1.50 + 2.5 X

In addition, you have a forecasted value for the independent variable, X = 20. The forecasted value for y is 51.5:

      ŷ = 1.50 + 2.5 (20) = 1.50 + 50 = 51.5
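Example 4's forecast is a direct evaluation of the fitted line. Sketched (name mine):

```python
def predict(b0, b1, x_forecast):
    # y-hat = b0 + b1 * x, evaluated at the forecasted x
    return b0 + b1 * x_forecast
```

Here predict(1.50, 2.5, 20) evaluates to 51.5, as in the example.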

      ŷ = b̂0 + b̂1 x_p

   where ŷ is the predicted value of the dependent variable and x_p is the predicted value of the independent variable (the input).

12. An analysis of variance (ANOVA) table is a summary of the explanation of the variation in the dependent variable. The basic form of the ANOVA table is as follows:

Source of variation       Degrees of freedom   Sum of squares                   Mean square
Regression (explained)           1             SS Regression                    MSR = SS Regression / 1
Error (unexplained)            n − 2           SS Residual                      MSE = SS Residual / (n − 2)
Total                          n − 1           SS Total

Example 5

Source of variation       Degrees of freedom   Sum of squares   Mean square
Regression (explained)           1                 5,050           5,050
Error (unexplained)             28                   600.49           21.45
Total                           29                 5,650

      R² = 5,050 / 5,650 = 0.8938 or 89.38%
      SEE = √(600.49 / 28) = 4.63
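Everything in Example 5 follows from the two sums of squares plus n and k. A sketch (names mine) that also computes the F-ratio, which the multiple-regression notes use for the same table:

```python
from math import sqrt

def anova_summary(ss_regression, ss_residual, n, k):
    # R^2 = SS Regression / SS Total;  SEE = sqrt(MSE);  F = MSR / MSE
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    mse = ss_residual / (n - k - 1)
    msr = ss_regression / k
    return r2, sqrt(mse), msr / mse
```

With SS Regression = 5,050, SS Residual = 600.49, n = 30, and k = 1, this returns an R² near 0.894 and an SEE near 4.63, matching the rounded figures above.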

Multiple Regression

1. Multiple regression is regression analysis with more than one independent variable.
   A. The concept of multiple regression is identical to that of simple regression analysis except that two or more independent variables are used simultaneously to explain variations in the dependent variable.
   B. In a multiple regression, the goal is to minimize the sum of the squared errors. Each slope coefficient is estimated while holding the other variables constant:

         y = b0 + b1 x1 + b2 x2 + b3 x3 + b4 x4 + ...

2. The intercept in the regression equation has the same interpretation as it did under the simple linear case: the intercept is the value of the dependent variable when all independent variables are equal to zero.

3. The slope coefficient is the parameter that reflects the change in the dependent variable for a one-unit change in the independent variable.
   A. The slope coefficients (the betas) are described as the movement in the dependent variable for a one-unit change in the independent variable, holding all other independent variables constant.
   B. For this reason, beta coefficients in a multiple linear regression are sometimes called partial betas or partial regression coefficients.

      A slope by any other name: the slope coefficient is the sensitivity of the dependent variable to the independent variable; in other words, it is the first derivative of the dependent variable with respect to the independent variable.

4. Regression model:

         yᵢ = b0 + b1 x1ᵢ + b2 x2ᵢ + εᵢ

   where bⱼ is the slope coefficient on the jth independent variable and xⱼᵢ is the ith observation on the jth variable.

   Note: we do not represent the multiple regression graphically because it would require graphs in more than two dimensions.

   A. The degrees of freedom for the test of a slope coefficient are n − k − 1, where n is the number of observations in the sample and k is the number of independent variables.
   B. In multiple regression, the independent variables may be correlated with one another, resulting in less reliable estimates. This problem is referred to as multicollinearity.

5. A confidence interval for a population regression slope in a multiple regression is an interval centered on the estimated slope:

         b̂ⱼ ± tc s_b̂ⱼ   or   b̂ⱼ − tc s_b̂ⱼ < bⱼ < b̂ⱼ + tc s_b̂ⱼ

   A. This is the same interval used in simple regression for the interval of a slope coefficient.
   B. If this interval contains zero, we conclude that the slope is not statistically different from zero.
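The interval is the same computation in the simple and multiple cases. A minimal sketch (name mine):

```python
def slope_interval(b_hat, t_crit, se_b):
    # b-hat +/- t_crit * se_b: the range believed to contain the true slope
    return b_hat - t_crit * se_b, b_hat + t_crit * se_b
```

For instance, a slope of 0.78 with standard error 0.32 and critical value 2.06 gives roughly (0.12, 1.44); since zero is outside the interval, that slope is statistically different from zero.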

6. The assumptions of the multiple regression model are as follows:
   A. A linear relationship exists between the dependent and independent variables.
   B. The independent variables are uncorrelated with the residuals; that is, the independent variables are not random. In addition, there is no exact linear relation between two or more independent variables. [Note: this is modified slightly from the assumptions of the simple regression model.]
   C. The expected value of the disturbance term is zero; that is, E(εᵢ) = 0.
   D. There is a constant variance of the disturbance term; that is, the disturbance or residual terms are all drawn from a distribution with an identical variance. In other words, the disturbance terms are homoskedastic. [A violation of this is referred to as heteroskedasticity.]
   E. The residuals are independently distributed; that is, the residual or disturbance for one observation is not correlated with that of another observation. [A violation of this is referred to as autocorrelation.]
   F. The disturbance term (a.k.a. residual, a.k.a. error term) is normally distributed.
   G. The residual (a.k.a. disturbance term, a.k.a. error term) is what is not explained by the independent variables.

7. In a regression with two independent variables, the residual for the ith observation is:

         εᵢ = Yᵢ − (b̂0 + b̂1 x1ᵢ + b̂2 x2ᵢ)

8. The standard error of the estimate (SEE) is the standard error of the residual:

         SEE = se = √[Σ êᵢ² / (n − k − 1)] = √[SSE / (n − k − 1)]

9. The degrees of freedom, df, are calculated as:

         df = n − k − 1 = (number of observations) − (number of independent variables) − 1

   A. The degrees of freedom are the number of independent pieces of information that are used to estimate the regression parameters. In calculating the regression parameters, we use the following pieces of information:
      - The mean of the dependent variable.
      - The mean of each of the independent variables.
   B. Therefore, if the regression is a simple regression, we use two degrees of freedom in estimating the regression line. If the regression is a multiple regression with four independent variables, we use five degrees of freedom in the estimation of the regression line.
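The n − k − 1 bookkeeping is easy to get wrong by hand, so two small helpers (names mine, illustrative only) for the multiple-regression SEE and for fitted values with several slopes:

```python
from math import sqrt

def see_multiple(ss_residual, n, k):
    # SEE = sqrt(SS Residual / (n - k - 1))
    return sqrt(ss_residual / (n - k - 1))

def predict_multiple(b0, slopes, xs):
    # y-hat = b0 + b1*x1 + ... + bk*xk
    return b0 + sum(b * xv for b, xv in zip(slopes, xs))
```

For instance, with a sum of squared residuals of 789 from 65 observations and five independent variables, see_multiple gives about 3.66.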

10. Forecasting using regression involves making predictions about the dependent variable based on average relationships observed in the estimated regression.
   A. Predicted values are values of the dependent variable based on the estimated regression coefficients and a prediction about the values of the independent variables.
   B. For a simple regression, the value of y is predicted as:

         ŷ = b̂0 + b̂1 x̂

      where ŷ is the predicted value of the dependent variable, b̂ is the estimated parameter, and x̂ is the predicted value of the independent variable.
   C. The better the fit of the regression (that is, the smaller the SEE), the more confident we are in our predictions.

   Caution: the estimated intercept and all the estimated slopes are used in the prediction of the dependent variable value, even if a slope is not statistically significantly different from zero.

Example 6: Using analysis of variance information

Suppose we estimate a multiple regression model that has five independent variables using a sample of 65 observations. If the sum of squared residuals is 789, what is the standard error of the estimate?

Solution

      SEE = √[SS Residual / (n − k − 1)] = √[789 / (65 − 5 − 1)] = √(789 / 59) = √13.373 = 3.66

Example 7: Calculating a forecasted value

Suppose you estimate a regression model with the following estimates:

      Ŷ = 1.50 + 2.5 X1 − 1.20 X2 + 1.25 X3

In addition, you have forecasted values for the independent variables:

      X1 = 20,  X2 = 20,  X3 = 50

What is the forecasted value of Y?

Solution

   The forecasted value for Y is 90:

      Ŷ = 1.50 + 2.5 (20) − 1.20 (20) + 1.25 (50) = 1.50 + 50 − 24 + 62.50 = 90

11. The F-statistic is a measure of how well a set of independent variables, as a group, explains the variation in the dependent variable.
   A. The F-statistic is calculated as:

         F = Mean squared regression / Mean squared error = MSR / MSE
           = [SS Regression / k] / [SS Residual / (n − k − 1)]
           = [Σ(ŷᵢ − ȳ)² / k] / [Σ(yᵢ − ŷᵢ)² / (n − k − 1)]

   B. The F-statistic can be formulated to test all independent variables as a group (the most common application). For example, if there are four independent variables in the model, the hypotheses are:

         H0: b1 = b2 = b3 = b4 = 0
         Ha: at least one bⱼ ≠ 0

   C. The F-statistic can be formulated to test subsets of independent variables (to see whether they have incremental explanatory power). For example, if there are four independent variables in the model, a subset could be examined:

         H0: b1 = b4 = 0
         Ha: b1 ≠ 0 or b4 ≠ 0

12. The coefficient of determination, R², is the percentage of variation in the dependent variable explained by the independent variables:

         R² = Explained variation / Total variation = (Total variation − Unexplained variation) / Total variation = Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)²,   0 ≤ R² ≤ 1

   A. By construction, R² ranges from 0 to 1.0.
   B. The adjusted R² is an alternative to R²:

         Adjusted R² = 1 − (1 − R²) (n − 1) / (n − k − 1)

      i. The adjusted R² is less than or equal to R² (equal only when k = 0).
      ii. Adding independent variables to the model will increase R². Adding independent variables to the model may increase or decrease the adjusted R². (Note: the adjusted R² can even be negative.)
      iii. The adjusted R² does not have the clean explanation of explanatory power that the R² has.

13. The purpose of the analysis of variance (ANOVA) table is to attribute the total variation of the dependent variable to the regression model (the regression source) and the residuals (the error source).
   A. SS Total is the total variation of Y about its mean or average value (a.k.a. total sum of squares) and is computed as:

SS_Total = Σ (yi - ȳ)²,  i = 1, ..., n
where ȳ is the mean of Y.
B. SS_Residual (a.k.a. SSE) is the variability that is unexplained by the regression and is computed as:
SS_Residual = SSE = Σ (yi - ŷi)² = Σ êi²
where ŷi is the value of the dependent variable using the regression equation.
C. SS_Regression (a.k.a. SS_Explained) is the variability that is explained by the regression equation and is computed as SS_Total - SS_Residual:
SS_Regression = Σ (ŷi - ȳ)²
D. MSE is the mean square error: MSE = SS_Residual / (n - k - 1), where k is the number of independent variables in the regression.
E. MSR is the mean square regression: MSR = SS_Regression / k.

Analysis of Variance Table (ANOVA)
Source       df (Degrees of Freedom)   SS (Sum of Squares)   Mean Square (SS/df)
Regression   k                         SS_Regression         MSR
Error        n - k - 1                 SS_Residual           MSE
Total        n - 1                     SS_Total

R² = SS_Regression / SS_Total = 1 - SS_Residual / SS_Total
F = MSR / MSE

14. Dummy variables are qualitative variables that take on a value of zero or one.
A. Most independent variables represent a continuous flow of values. However, sometimes the independent variable is of a binary nature (it is either ON or OFF).
B. These types of variables are called dummy variables, and the data is assigned a value of "0" or "1". In many cases, you apply the dummy variable concept to quantify the impact of a qualitative variable. A dummy variable is a dichotomous variable; that is, it takes on a value of one or zero.

C. Use one dummy variable less than the number of classes (e.g., if there are three classes, use two dummy variables); otherwise you fall into the dummy variable "trap" (perfect multicollinearity, which violates the regression assumptions).
D. An interactive dummy variable is a dummy variable (0,1) multiplied by another variable to create a new variable. The slope on this new variable tells us the incremental slope.

15. Heteroskedasticity is the situation in which the variance of the residuals is not constant across all observations.
A. An assumption of the regression methodology is that the sample is drawn from the same population, and that the variance of residuals is constant across observations; in other words, the residuals are homoskedastic.
B. Heteroskedasticity is a problem because the estimators do not have the smallest possible variance, and therefore the standard errors of the coefficients would not be correct.

16. Autocorrelation is the situation in which the residual terms are correlated with one another. This occurs frequently in time-series analysis.
A. Autocorrelation usually appears in time-series data. If last year's earnings were high, this year's earnings may have a greater probability of being high than being low. This is an example of positive autocorrelation. When a good year is always followed by a bad year, this is negative autocorrelation.
B. Autocorrelation is a problem because the estimators do not have the smallest possible variance, and therefore the standard errors of the coefficients would not be correct.

17. Multicollinearity is the problem of high correlation between or among two or more independent variables.
A. Multicollinearity is a problem because:
1. The presence of multicollinearity can cause distortions in the standard error and may lead to problems with significance testing of individual coefficients, and
2. Estimates are sensitive to changes in the sample observations or the model specification.
B. If there is multicollinearity, we are more likely to conclude a variable is not important.
C. Multicollinearity is likely present to some degree in most economic models.
Perfect multicollinearity would prohibit us from estimating the regression parameters. The issue, then, is really one of degree.

18. The economic meaning of the results of a regression estimation focuses primarily on the slope coefficients.
A. The slope coefficients indicate the change in the dependent variable for a one-unit change in the independent variable. This slope can then be interpreted as an elasticity measure; that is, the change in one variable corresponding to a change in another variable.
B. It is possible to have statistical significance, yet not have economic significance (e.g., significant abnormal returns associated with an announcement, but these returns are not sufficient to cover transactions costs).
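The dummy-variable coding described above, and the way the dummy-variable "trap" produces perfect multicollinearity with the intercept, can be sketched in Python. This is a minimal illustration; the three classes ("small", "medium", "large") and the data values are invented for the example.

```python
# Sketch: coding a three-class qualitative variable with two dummies.
# The classes and data are invented for illustration; "small" is the
# (arbitrary) base class, so it gets no dummy of its own.
sizes = ["small", "large", "medium", "small", "large"]

d_medium = [1 if s == "medium" else 0 for s in sizes]
d_large = [1 if s == "large" else 0 for s in sizes]
print(d_medium)  # [0, 0, 1, 0, 0]
print(d_large)   # [0, 1, 0, 0, 1]

# The "trap": one dummy per class sums to a column of ones -- identical
# to the intercept column, i.e. perfect multicollinearity.
d_small = [1 if s == "small" else 0 for s in sizes]
trap = [a + b + c for a, b, c in zip(d_small, d_medium, d_large)]
print(trap)      # [1, 1, 1, 1, 1]

# An interactive dummy: the dummy multiplied by a continuous variable x;
# its slope measures the incremental slope for the "large" class.
x = [10.0, 12.0, 9.0, 11.0, 8.0]
x_large = [d * xi for d, xi in zip(d_large, x)]
print(x_large)   # [0.0, 12.0, 0.0, 0.0, 8.0]
```

Using two dummies plus the intercept, the intercept measures the base ("small") class and each dummy coefficient measures the shift relative to that base.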

To test the role of a single variable in explaining the variation in the dependent variable, use the t-statistic.
To test the role of all variables in explaining the variation in the dependent variable, use the F-statistic.
To estimate the change in the dependent variable for a one-unit change in the independent variable, use the slope coefficient.
To estimate the dependent variable if all of the independent variables take on a value of zero, use the intercept.
To estimate the percentage of the dependent variable's variation explained by the independent variables, use the R².
To forecast the value of the dependent variable given the estimated values of the independent variable(s), use the regression equation, substituting the estimated values of the independent variable(s) in the equation.
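Each quantity in the checklist above can be computed directly from a fitted simple regression. The following is a minimal Python sketch on invented data (the x and y values, and the forecast point x = 6, are assumptions for illustration):

```python
import math

# Invented data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1
x_bar, y_bar = sum(x) / n, sum(y) / n

# Slope and intercept from the least-squares formulas.
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar          # value of y when x is zero

# Sums of squares for R-squared and the F-statistic.
y_hat = [intercept + slope * xi for xi in x]
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_regression = ss_total - ss_residual

r_squared = ss_regression / ss_total                          # share explained
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))    # all slopes zero?

# t-statistic for the single slope (H0: slope = 0).
s_e = math.sqrt(ss_residual / (n - 2))
t_stat = slope / (s_e / math.sqrt(sxx))

# Forecast: substitute a predicted x value into the regression equation.
forecast = intercept + slope * 6
print(round(r_squared, 2), round(f_stat, 2), round(t_stat, 2), round(forecast, 1))
```

Note that for a simple regression (k = 1) the F-statistic equals the square of the slope's t-statistic, so the two tests agree.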

Regression terminology
Analysis of variance (ANOVA)
Autocorrelation
Coefficient of determination
Confidence interval
Correlation coefficient
Covariance
Covariation
Cross-sectional
Degrees of freedom
Dependent variable
Explained variable
Explanatory variable
Forecast
F-statistic
Heteroskedasticity
Homoskedasticity
Independent variable
Intercept
Least squares regression
Mean square error
Mean square regression
Multicollinearity
Multiple regression
Negative correlation
Ordinary least squares
Perfect negative correlation
Perfect positive correlation
Positive correlation
Predicted value
R²
Regression
Residual
Scatterplot
s_e
SEE
Simple regression
Slope
Slope coefficient
Spurious correlation
SS_Residual
SS_Regression
SS_Total
Standard error of the estimate
Sum of squares error
Sum of squares regression
Sum of squares total
Time-series
t-statistic
Variance
Variation

Regression formulas

Variances
Variation = Σ(xi - x̄)²
Variance = Σ(xi - x̄)² / (n - 1)
Covariance = Σ(xi - x̄)(yi - ȳ) / (n - 1)
Correlation: r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² Σ(yi - ȳ)²]
t = r√(n - 2) / √(1 - r²)

Regression
y = b0 + b1x + ε
y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + ε
b1 = cov(X, Y) / var(X) = Σ(yi - ȳ)(xi - x̄) / Σ(xi - x̄)²
b̂0 = ȳ - b̂1x̄

Tests and confidence intervals
s_e² = Σ(yi - b̂0 - b̂1xi)² / (n - 2) = Σ(yi - ŷi)² / (n - 2) = Σ êi² / (n - 2)
s_b̂1 = s_e / √Σ(xi - x̄)²
t = (b̂1 - b1) / s_b̂1
b̂1 - t_c s_b̂1 < b1 < b̂1 + t_c s_b̂1
F = MSR / MSE = [SS_Regression / k] / [SS_Residual / (n - k - 1)] = [Σ(ŷi - ȳ)² / k] / [Σ(yi - ŷi)² / (n - k - 1)]

Simple Regression, prepared by Pamela Peterson Drake
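The test and confidence-interval formulas above can be applied step by step. This is a minimal Python sketch on invented data; the critical value t_c = 3.182 is the two-tailed 95% t value for n - 2 = 3 degrees of freedom:

```python
import math

# Invented data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Slope and intercept: b1 = cov(X, Y) / var(X), b0 = y_bar - b1 * x_bar.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate: s_e^2 = sum of squared residuals / (n - 2).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard error of the slope and t-statistic for H0: b1 = 0.
s_b1 = s_e / math.sqrt(sxx)
t_stat = b1 / s_b1

# 95% confidence interval for the slope.
t_c = 3.182   # critical t, 3 degrees of freedom, two-tailed 5%
ci = (b1 - t_c * s_b1, b1 + t_c * s_b1)
print(round(b1, 2), round(b0, 2), round(t_stat, 3))
```

Since the interval here contains zero, the slope would not be judged statistically significant at the 5% level for this toy sample.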

Forecasting
ŷ = b̂0 + b̂1x1p + b̂2x2p + b̂3x3p + ... + b̂K xKp

Analysis of Variance
SS_Total = Σ (yi - ȳ)²,  i = 1, ..., n
SS_Residual = SSE = Σ (yi - ŷi)² = Σ êi²
SS_Regression = Σ (ŷi - ȳ)² = SS_Total - SS_Residual
R² = SS_Regression / SS_Total = 1 - SS_Residual / SS_Total
F = MSR / MSE = [SS_Regression / k] / [SS_Residual / (n - k - 1)] = [Σ(ŷi - ȳ)² / k] / [Σ(yi - ŷi)² / (n - k - 1)]
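As a numerical check on the forecasting and SEE formulas, the two worked examples from earlier in the notes (the five-variable SEE calculation and the three-variable forecast) can be reproduced in a few lines of Python:

```python
import math

# SEE example: 65 observations, 5 independent variables, SS_Residual = 789.
ss_residual, n, k = 789.0, 65, 5
see = math.sqrt(ss_residual / (n - k - 1))   # sqrt(789 / 59)
print(round(see, 3))  # 3.657

# Forecasting example: Y-hat = b0 + b1*X1p + b2*X2p + b3*X3p.
b = [1.50, 2.50, -0.20, 1.25]   # estimated coefficients, intercept first
x_p = [20, 120, 50]             # forecasted values of the independent variables
y_hat = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x_p))
print(round(y_hat, 2))  # 90.0
```

Note that every estimated coefficient enters the forecast, matching the caution in the notes that slopes are used in prediction even when they are not statistically significant.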