Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis


Reading: Regression Analysis (ECON 8010 Coursepak, Page 3)

Definitions and Concepts:

Regression Analysis: statistical techniques for modeling and analyzing the relationship between multiple variables.
  - In regression analysis, we examine the relationship between a single variable ($Y$) and a set of variables ($X$).
  - $Y$ is the dependent variable.
  - $X$ are the independent variables.
  - When specifying a relationship between $X$ and $Y$ for a regression, we are implicitly assuming causality (as opposed to just correlation), i.e., assuming that changes in the values of $X$ cause changes in the value of $Y$.

Inputs into a Regression Analysis:
1. Identify the dependent variable (i.e., the variable whose value we want to estimate or forecast)
2. Specify the explanatory or independent variables (i.e., the variables whose values determine the value taken by the dependent variable)
3. Determine the relevant group of observations for the analysis
4. Define the mathematical relation between the dependent variable and the independent variables
5. Provide the data for the analysis

Outputs from a Regression Analysis:
1. Regression coefficients (which yield an estimated functional relationship between the independent variables and the dependent variable)
2. Measures of Goodness of Fit (which provide information on the degree of reliability of the analysis)
3. Estimates or Forecasts (which allow us to gain insights useful for decision-making)

Sample Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Sample Variance: $\mathrm{var}(y) = s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$

Sample Standard Deviation: $s_y = \sqrt{s_y^2}$

Sample Covariance: $\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

Correlation: an observation that the values of two variables tend to move together.
  - The direction and magnitude of correlation between two variables is measured by the Correlation Coefficient, defined as $\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{s_X s_Y}$
  - $\rho_{XY}$ can take on values between $(-1)$ and $(+1)$
  - positive value: positive relationship
  - negative value: inverse or negative relationship
  - value close to zero: low degree of correlation
  - absolute value close to one: high degree of correlation

Causation: a relationship between the values of two variables such that the value of one variable changes as a consequence of changes in the value of the other variable.

Residual: the difference between the actual value of $y$ and the estimated value of $y$.

Linear Regression Model (with one independent variable):
  - assumes a linear relationship between the two variables: $y = b_0 + b_1 x$
  - Denote our estimated parameter values by $\hat{b}_0$ and $\hat{b}_1$ => for an observed value of $x_i$, this yields an estimated value of $y_i$ given by $\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i$
  - For such an estimate, we have a residual of $y_i - \hat{y}_i$ for each observation => the squared residual for each observation is $(y_i - \hat{y}_i)^2$
  - Statisticians have shown that the best estimates of the coefficients of the equation above are the values of the parameters that minimize the summation of squared residuals.
  - These best estimates are specified by the following formulas:

    $\hat{b}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$

    and

    $\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}$
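The formulas above translate almost line for line into code. The following is a minimal sketch in Python (a language choice made here, not one used in the course); the function names and the four-point dataset are hypothetical and serve only to illustrate the sample statistics and the least-squares estimates.

```python
from math import sqrt

def sample_mean(values):
    """Sample mean: sum of the values divided by n."""
    return sum(values) / len(values)

def sample_cov(x, y):
    """Sample covariance with the (n - 1) divisor."""
    x_bar, y_bar = sample_mean(x), sample_mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def sample_var(values):
    """Sample variance, i.e. the covariance of a variable with itself."""
    return sample_cov(values, values)

def correlation(x, y):
    """Correlation coefficient: cov(x, y) / (s_x * s_y)."""
    return sample_cov(x, y) / (sqrt(sample_var(x)) * sqrt(sample_var(y)))

def ols_fit(x, y):
    """Least-squares estimates: b1 = cov(x, y) / var(x), b0 = y_bar - b1 * x_bar."""
    b1 = sample_cov(x, y) / sample_var(x)
    b0 = sample_mean(y) - b1 * sample_mean(x)
    return b0, b1

# Hypothetical data, chosen only so the arithmetic is easy to follow by hand.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = ols_fit(x, y)
rho = correlation(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, correlation = {rho:.4f}")
```

Because the same (n - 1) divisor appears in both cov(x, y) and var(x), it cancels in the slope formula, which is why the slope can equally be written as a ratio of sums.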

Many computer programs (including Excel) can perform the necessary calculations. To run a regression in Excel:
1. Enable the Data Analysis function (see: http://www.addictivetips.com/windows-tips/excel-2010-data-analysis/)
2. Under the Data tab, click on Data Analysis (which should appear near the top right of the screen)
   a. select Regression
   b. identify Y-range (i.e., specify the dependent variable)
   c. identify X-range (i.e., specify the independent variables)
   d. click OK
   (see: http://www.addictivetips.com/windows-tips/excel-2010-regression-analysis/)

Total Sum of Squares (TSS): the sum total of the squared difference between each observed value of the dependent variable and its mean value (i.e., $TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2$)

Residual Sum of Squares (RSS): the sum total of the squared difference between each observed value of the dependent variable and its estimated value (i.e., $RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$)

Coefficient of Determination (denoted $R^2$): a measure of the goodness of fit of the regression, calculated as $R^2 = \frac{TSS - RSS}{TSS}$
  - mathematically, the value of $R^2$ must be between 0 and 1
  - a larger value (i.e., closer to 1) implies that the regression results explain a greater amount of the variation in the value of the dependent variable

When using the results of a regression to conduct hypothesis testing, we typically specify:
  Null Hypothesis, $H_0: b = 0$
  Alternative Hypothesis, $H_1: b \neq 0$

We can NEVER PROVE that the true value of the coefficient is not equal to zero. So, we state a Null Hypothesis of $b = 0$, and intend to find support for a claim that this null hypothesis is NOT true (i.e., support for a claim that the true coefficient value is NOT equal to zero). That is, we might conclude that the estimated coefficient value is statistically different from zero at a specified confidence level.

Type I Error: an error which occurs when the null hypothesis is actually true but it is incorrectly rejected (i.e., a true statement is rejected)

Type II Error: an error which occurs when the null hypothesis is false but it is incorrectly not rejected (i.e., a false statement is not rejected)

P-Value:
  - reports the level at which we can claim our estimate to be statistically significant
  - states the smallest significance level for which the null hypothesis (of the coefficient value actually being equal to zero) can be rejected
  - essentially tells us the probability of Type I Error if we reject the null hypothesis
  - the lower the p-value, the stronger the evidence that the true coefficient is different from zero
  - Typically we have a threshold of either 5%, or 1%, or 0.1% in mind for statistical significance => for each of these claims, the p-value would respectively have to be less than .05, less than .01, or less than .001 for our estimated parameter value to be statistically significant

Multiple Regression Model (still assuming a linear relationship):
  - with more than one independent variable, our equation becomes: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
  - estimated parameter values of $\hat{b}_0$, $\hat{b}_1$, $\hat{b}_2$, $\hat{b}_3$, ..., and $\hat{b}_k$ yield an estimated equation of: $\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2 + \dots + \hat{b}_k x_k$
  - the interpretation of each coefficient is that it reveals the impact of a change in the value of one specific independent variable, holding all other variables constant
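The threshold rule for p-values described above can be written as a small helper. This is only an illustrative sketch: the function name `strictest_level` and the sample p-values are hypothetical, and the three thresholds are the ones named in the notes (0.1%, 1%, 5%).

```python
def strictest_level(p_value, levels=(0.001, 0.01, 0.05)):
    """Return the smallest threshold the p-value falls below, or None.

    A returned value of 0.01 means "significant at the 1% level";
    None means the estimate is not significant at any listed level.
    """
    for level in sorted(levels):
        if p_value < level:
            return level
    return None

# Hypothetical p-values, for illustration only.
for p in (0.0004, 0.03, 0.2):
    print(p, "->", strictest_level(p))
```

Passing a different `levels` tuple (for example one that includes 0.10) applies the same rule to a different set of thresholds.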

Example 1 (linear regression with one independent variable):

Suppose we observe the following data:

  Selling Price                Size of House
  (in thousands of dollars)    (in square feet)
       83.5                       1,656
      161                         1,912
      170                         2,090
      242.5                       2,378
      175.5                       2,148
      168.9                       1,942
      110                         1,725
      150                         1,930
      309.9                       2,456
      156.3                       2,278
      251.5                       2,170
      236.5                       2,125
      210.8                       2,342
      179.9                       1,756

estimated coefficients of $\hat{b}_0 = -201.1960$ and $\hat{b}_1 = 0.1876$ => $\hat{y} = -201.1960 + (0.1876)x$

  Selling Price    Size     Estimate    Residual    Residual Squared
       83.5        1,656     109.46      -25.96           674.13
      161          1,912     157.49        3.51            12.33
      170          2,090     190.88      -20.88           436.02
      242.5        2,378     244.91       -2.41             5.80
      175.5        2,148     201.76      -26.26           689.67
      168.9        1,942     163.12        5.78            33.45
      110          1,725     122.41      -12.41           153.96
      150          1,930     160.87      -10.87           118.06
      309.9        2,456     259.54       50.36         2,535.99
      156.3        2,278     226.15      -69.85         4,878.91
      251.5        2,170     205.89       45.61         2,080.39
      236.5        2,125     197.45       39.05         1,525.15
      210.8        2,342     238.16      -27.36           748.32
      179.9        1,756     128.22       51.68         2,670.44
      TOTAL                                            16,562.6

Recall, the Sum of Squared Residuals without observing the X's was 45,779.8. By using the information conveyed in the observed values of the X's, the Sum of Squared Residuals has been reduced to only 16,562.6.

For this example, $TSS = 45{,}779.8$ and $RSS = 16{,}562.6$ => $R^2 = 0.6382$
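The numbers in this example can be checked directly from the fourteen observations. The sketch below (Python, with variable names chosen here for illustration) applies the least-squares formulas to the price and size columns and then computes TSS, RSS, and the Coefficient of Determination; its results should land close to the coefficient and goodness-of-fit values reported for the example, up to rounding.

```python
# Selling price (thousands of dollars) and house size (square feet) for the
# fourteen observations of the one-variable example.
prices = [83.5, 161, 170, 242.5, 175.5, 168.9, 110,
          150, 309.9, 156.3, 251.5, 236.5, 210.8, 179.9]
sizes = [1656, 1912, 2090, 2378, 2148, 1942, 1725,
         1930, 2456, 2278, 2170, 2125, 2342, 1756]

n = len(prices)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

# Slope = sum of cross-deviations / sum of squared x-deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))
s_xx = sum((x - x_bar) ** 2 for x in sizes)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values, residuals, and the two sums of squares.
fitted = [b0 + b1 * x for x in sizes]
residuals = [y - f for y, f in zip(prices, fitted)]
rss = sum(e ** 2 for e in residuals)
tss = sum((y - y_bar) ** 2 for y in prices)
r_squared = (tss - rss) / tss

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"TSS = {tss:.1f}, RSS = {rss:.1f}, R^2 = {r_squared:.4f}")
```

With an intercept in the model the residuals sum to zero, which is a quick sanity check on any hand calculation of the residual column.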

Regression results for Example 1

SUMMARY OUTPUT

  Regression Statistics
  Multiple R             0.7989
  R Square               0.6382
  Adjusted R Square      0.6081
  Standard Error        37.15
  Observations          14

  ANOVA
                df        SS           MS          F        Significance F
  Regression     1     29,217.2     29,217.2     21.17         0.00061
  Residual      12     16,562.6      1,380.2
  Total         13     45,779.8

                 Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
  Intercept       -201.196         84.776        -2.373    0.035      -385.91      -16.49
  X Variable 1       0.1876         0.0408        4.601    0.00061       0.0988       0.2764

Example 2 (linear regression with multiple independent variables):

Suppose we now observe the following data:

  Price     Sq Ft    Lot Size    Age
   83.5     1,656      0.3        9
  161       1,912      0.9        6
  170       2,090      0.         5
  242.5     2,378      0.7
  175.5     2,148      0.33
  168.9     1,942      0.38       7
  110       1,725      0.39       6
  150       1,930      0.33       6
  309.9     2,456      0.75       6
  156.3     2,278      0.5        5
  251.5     2,170      0.63       3
  236.5     2,125      0.44
  210.8     2,342      0.35
  179.9     1,756      0.38       8

estimated coefficients of $\hat{b}_0 = -10.3612$, $\hat{b}_1 = 0.0995$, $\hat{b}_2 = 92.4375$, and $\hat{b}_3 = -4.0285$
=> $\hat{y} = -10.3612 + 0.0995 x_1 + 92.4375 x_2 - 4.0285 x_3$

  - higher value of $R^2 = 0.87413$ (it can generally be shown that the Coefficient of Determination can never decrease when a new independent variable is added, and it almost always increases)
  - Positive valued coefficient estimates for size and lot size and a negative valued coefficient for age (nothing surprising).
  - Finally, note that in terms of statistical significance, the coefficient estimate for
      size is significant at the 5% level (p-value is between .01 and .05)
      lot size is significant at the 10% level (p-value is between .05 and .1)
      age is significant at the 1% level (p-value is below .01)
      the intercept is not significant (p-value is above .1)
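To see how an estimated multiple-regression equation is used for forecasting, the sketch below plugs values into the equation reported for this example. The coefficient values are the rounded estimates from the example; the function name and the house being priced (2,000 square feet, lot size 0.40, age 10) are hypothetical and chosen only for illustration.

```python
# Rounded coefficient estimates reported for the multiple-regression example.
INTERCEPT = -10.3612
B_SQ_FT = 0.0995
B_LOT_SIZE = 92.4375
B_AGE = -4.0285

def predict_price(sq_ft, lot_size, age):
    """Estimated selling price (thousands of dollars) from the fitted equation."""
    return INTERCEPT + B_SQ_FT * sq_ft + B_LOT_SIZE * lot_size + B_AGE * age

# Hypothetical house, not one of the fourteen observations.
estimate = predict_price(sq_ft=2000, lot_size=0.40, age=10)
print(f"Estimated price: {estimate:.2f} thousand dollars")
```

Each term in the sum is one coefficient times one input, so the contribution of size, lot size, and age to the forecast can be read off separately.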

Regression results for Example 2

SUMMARY OUTPUT

  Regression Statistics
  Multiple R             0.9349
  R Square               0.8741
  Adjusted R Square      0.8364
  Standard Error        24.005
  Observations          14

  ANOVA
                df        SS           MS          F        Significance F
  Regression     3     40,017.4     13,339.1     23.15        8.097E-05
  Residual      10      5,762.4        576.24
  Total         13     45,779.8

                 Coefficients   Standard Error   t Stat    P-value    Lower 95%    Upper 95%
  Intercept        -10.3612        74.406        -0.139    0.892      -176.15      155.42
  X Variable 1       0.0995         0.0333        2.989    0.0136        0.0253      0.1737
  X Variable 2      92.4375        48.835         1.893    0.0877      -16.37      201.25
  X Variable 3      -4.0285         1.1476       -3.510    0.0057       -6.5855     -1.4716

Multiple Choice Questions:

1. ______ can be broadly described as statistical techniques for modeling and analyzing the relationship between multiple variables.
   A. Decision Analysis.
   B. Linear Programming.
   C. Regression Analysis.
   D. None of the above answers is correct.

2. The difference between the actual value of $y$ and the estimated value of $y$ is called the
   A. residual.
   B. correlation coefficient.
   C. Total Sum of Squares.
   D. p-value.

3. Which of the following is NOT one of the inputs into a Regression Analysis that was discussed in lecture and the Coursepak?
   A. Identify the Dependent Variable.
   B. Provide data for the analysis.
   C. Obtain estimated values of regression coefficients.
   D. None of the above answers is correct (since each choice is one of the inputs into a Regression Analysis).

4. ______ refers to an error which occurs when the null hypothesis is false but it is incorrectly not rejected.
   A. Type I Error.
   B. Type II Error.
   C. Type III Error.
   D. Omitted variables bias.

5. The Coefficient of Determination, denoted $R^2$, is defined as
   A. $R^2 = RSS - TSS$.
   B. $R^2 = TSS - RSS$.
   C. $R^2 = \frac{RSS - TSS}{TSS}$.
   D. $R^2 = \frac{TSS - RSS}{TSS}$.

6. Which of the following values is NOT a possible value for $R^2$, based upon the mathematical definition of the Coefficient of Determination?
   A. $-0.395$.
   B. 0.78.
   C. 1.46.
   D. More than one (perhaps all) of the above answers is correct (i.e., more than one of the above choices is NOT a possible value for $R^2$).

7. Consider the linear relationship $y = b_0 + b_1 x$. The best estimates of the parameters are those which minimize the sum total of the squared values of residuals. For the parameter $b_1$, this best estimate is:
   A. $\hat{b}_1 = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$.
   B. bˆ x. x
   C. bˆ x. x
   D. bˆ var( x).

Problem Solving or Short Answer Questions:

Answer Questions 1 and 2 using the data posted online at: http://ksuweb.kennesaw.edu/~tmathew7/econ800/06_regressonanalssfoundatons_problemsetdata_sprng04.xlsx

1. The worksheet titled Data for Question 1 contains observations on Quiz Average and Final Exam Grade for 30 students enrolled in ECON 00 during a previous semester. Using this data, answer the following questions.
   A. Determine the values of Sample Mean, Sample Standard Deviation, and Sample Covariance for each variable.
   B. Determine the value of the Correlation Coefficient between these two variables. Does there appear to be a positive or negative correlation between Quiz Average and Final Exam Grade?
   C. If we run a regression to estimate the equation $(ExamGrade) = b_0 + b_1 (QuizAvg)$, what must we assume about the fundamental relation between Final Exam Grade and Quiz Average? Explain.
   D. Run a regression to estimate the parameters in $(ExamGrade) = b_0 + b_1 (QuizAvg)$. What are the estimated values of $\hat{b}_0$ and $\hat{b}_1$? Is each of these estimates statistically significant at the 1% level? Explain. What is the value of the Coefficient of Determination for this Regression? If Jan's quiz average is 10 points higher than Marcia's quiz average, how will their expected Final Exam Grades differ from one another? Explain.

2. Suzy wants to run a regression to determine the impact of three different factors (running speed, weight, and height) on the salary of professional football running backs. She has obtained the data in the worksheet titled Data for Question 2. This dataset contains observations on Salary (annual base salary plus prorated signing bonus, in millions of dollars), Speed (time needed to run 40 yards, measured in seconds), Height (measured in inches), and Weight (measured in pounds) for the starting running back on each of the 32 teams in the league.
   A. Compute the value of Sample Mean and Sample Standard Deviation for each variable.

   B. Run a regression to estimate $(Salary) = b_0 + b_1 (Speed) + b_2 (Height) + b_3 (Weight)$. What are the estimated values of $\hat{b}_0$, $\hat{b}_1$, $\hat{b}_2$, and $\hat{b}_3$?
   C. Based upon our regression results, which of the three independent variables have an impact on Salary that is statistically significant at the 1% level? Which of the three independent variables have an impact on Salary that is statistically significant at the 10% level?
   D. If Running Back A and Running Back B are similar in all aspects, except Running Back A is 67 inches tall while Running Back B is 69 inches tall, by how much do their expected salaries differ? Explain.
   E. All other factors equal, if Running Back C were to decrease his 40 yard dash time by (.05) seconds, by how much would his expected salary change? Explain.

Answers to Multiple Choice Questions:
1. C
2. A
3. C
4. B
5. D
6. D
7. A

Answers to Problem Solving or Short Answer Questions:

1A. The value of the mean of the data given in rows 2 through 31 of Column X can be computed by Excel using the formula =AVERAGE(X2:X31). The value of the standard deviation of the data given in rows 2 through 31 of Column X can be computed by Excel using the formula =STDEV(X2:X31). Doing this, for Quiz Average we obtain a mean of 47.5 and a standard deviation of 22.752. Likewise, for Final Exam we obtain a mean of 57.3667 and a standard deviation of 13.8377. The sample covariance can be computed by the Excel code =COVARIANCE.S(B2:B31,C2:C31). This yields a value of 147.7586.

1B. The value of the Correlation Coefficient can be obtained by either using the Excel code =CORREL(B2:B31,C2:C31) or by applying the formula $\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{s_X s_Y}$ (using the relevant values determined in part A). In either case, we have $\rho_{XY} = 0.4693$. Since this value is positive, there is a positive correlation between Quiz Average and Final Exam Grade.

1C. In order to legitimately run a regression on the equation $(ExamGrade) = b_0 + b_1 (QuizAvg)$, we must assume that there is not just a correlation between Quiz Average and Final Exam Grade, but rather a causal relation between these two variables. That is, we must assume that a higher Quiz Average causes the Final Exam Grade to be higher.

1D. The estimated parameter values are $\hat{b}_0 = 43.808798$ and $\hat{b}_1 = 0.2854$. The corresponding p-values are 6.0046E-09 and 0.00888756 respectively. Since each of these values is less than (.01), each estimated coefficient is statistically significant at the 1% level. The Coefficient of Determination for this regression is $R^2 = 0.2203$. Finally, if Jan's quiz average is 10 points higher than Marcia's quiz average, Jan's expected grade on the Final Exam will be $(10)\hat{b}_1 = (10)(0.285) = 2.85$ points higher than Marcia's.

2A. Using the Excel formulas described in the answer to question 1A, we obtain:

                     Salary     Speed       Height     Weight
  Sample Mean        4.00465    4.499375    69.4375    217.9063
  Sample Std Dev     .858       0.0708      .60583     .469

2B. The estimated parameter values are $\hat{b}_0 = 153.50058$, $\hat{b}_1 = -26.46148$, $\hat{b}_2 = -0.38383$, and $\hat{b}_3 = -0.01736$.

2C. The corresponding p-values are .84466E-07, 4.9866E-05, 0.06860477, and 0.600344 respectively. The first two of these are less than (.01), while the last two are greater than (.01). Thus, the estimates of the intercept and the coefficient for Speed are each statistically significant at the 1% level, while the estimates for the coefficients for Height and Weight are not. Similarly, the first three are less than (.10), while the final one is greater than (.10). Thus, the estimates of the intercept, the coefficient for Speed, and the coefficient for Height are each statistically significant at the 10% level, while the estimate for the coefficient for Weight is not.

2D. If Running Back A is two inches shorter than Running Back B, then the expected salary of Running Back A will differ from that for Running Back B by approximately $(-2)(-0.38383) = 0.76766$ (i.e., the shorter running back can expect to earn $767,660 more).

2E. If Running Back C were to shorten his 40 yard dash time by (.05) seconds (all other factors equal), his expected salary would be approximately $(-0.05)(-26.46148) = 1.323074$ (i.e., $1,323,074) greater.
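The arithmetic behind several of these answers can be cross-checked from the summary statistics alone, without the spreadsheet. The sketch below is one way to do that in Python; the variable names are chosen here for illustration, and the inputs are the rounded values quoted in the answers, so the results agree with the reported figures only up to rounding.

```python
# Question 1: summary statistics quoted in the answer to part A (rounded).
mean_quiz, sd_quiz = 47.5, 22.752
mean_exam, sd_exam = 57.3667, 13.8377
cov_quiz_exam = 147.7586

rho = cov_quiz_exam / (sd_quiz * sd_exam)   # correlation coefficient
b1 = cov_quiz_exam / sd_quiz ** 2           # slope = cov(x, y) / var(x)
b0 = mean_exam - b1 * mean_quiz             # intercept = y_bar - b1 * x_bar
r_squared = rho ** 2                        # holds with one independent variable
jan_minus_marcia = 10 * b1                  # effect of a 10-point higher quiz average

# Question 2: a change in expected salary (millions of dollars) equals the
# coefficient times the change in that variable, all other factors held equal.
b_speed, b_height = -26.46148, -0.38383
height_effect = (67 - 69) * b_height        # Running Back A versus Running Back B
speed_effect = (-0.05) * b_speed            # dash time shortened by 0.05 seconds

print(f"rho = {rho:.4f}, b1 = {b1:.4f}, b0 = {b0:.3f}, R^2 = {r_squared:.4f}")
print(f"Jan vs. Marcia: {jan_minus_marcia:.2f} points")
print(f"Height effect: {height_effect:.5f}, speed effect: {speed_effect:.6f}")
```

The identity R^2 = rho^2 used here is specific to a regression with a single independent variable; it does not carry over to the three-variable salary regression.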