FAQ: Linear and Multiple Regression Analysis: Coefficients

Size: px
Start display at page:

Download "FAQ: Linear and Multiple Regression Analysis: Coefficients"

Transcription

1 Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable (dependent variable) can be predicted from the others (independent variables). For example, if one knows the relationship between corporate research and development expenditures and future sales, one may be able to predict future sales. The linear regression model is typically represented as Y= a+bx+e. In order to construct a regression model, information on both x and y must be obtained from a sample of objects or individuals. The relationship between the two can then be estimated. Assume we are estimated the relation between the number of operating hours of a machine and its annual repair and maintenance costs. In this example, the number of operating hours is the independent variable and the repair cost is the dependent variable. The parameters a and b can take on any of an infinite number of real values. The goal in the regression procedure is to create a model where we can predict the value of y with good accuracy. More exactly, we estimate a regression to minimize the sum of the squared deviations between predicted y and actual y (the residual). Fortunately, statisticians have developed equations to estimate a and b so that the best fitting regression line is obtained (called least squares equations). So, lets use the following information on machine hours and repair costs to calculate a least squares regression line. First, collect data to estimate the relation. We limit the sample to 5 observations for simplicity only. Typically at least 3 observations will be required to fit any type of line. Hours Costs 1

2 5 $1, 8 $2, 1,5 $2,5 2,5 $7, 6, $9, Mean 2,26 Mean $4,3 Second, estimate the regression coefficient b, the average amount repair costs increase when operating hours increase one unit and other independent variables are held constant. b is estimated as S (x-mean of x) (y-mean of y) / S(x-mean of x) 2 This is the covariance of x and y divided by the variance of the x. In our example, this estimate is (5-2,26)*(1,-4,3) + (8-2,26)*(2,-4,3) + (5-226) + (8-226) + b = Third, estimate for a is computed as mean of y (b*mean of x). In our example, this estimate is a = 43 (1.449 * 226) The final estimated line is thus y = 1, x + e. This means we can predict repair costs. For x equal to 3 hours, we estimate repair costs as 1, *3, = $5,373. 2

3 Note that for b coefficients for dummy variables which have been binary coded (1 or ), or based on categories, b is relative to the reference category (the category left out). Thus for the a of dummy variables for "Region," assuming "North" is the reference category and income is the dependent, a b of -1.5 for the dummy "South" means that the expected income the South is 1.5 units less than the average of "North" respondents. Also, note that t-tests are used to assess the significance of individual b coefficients. specifically testing the null hypothesis that the regression coefficient is zero. A common rule of thumb is to drop from the equation all variables not significant at the.5 level or better. Note that there is always error in the regression procedure. Error may be incorporated into the information collected to estimate the line. Regression models are a source of information about the world but they must be used wisely. Question 2: How do I calculate the coefficient of correlation? Answer 2: The correlation coefficient measures the strength of the relationship between two variables. Perfectly positive correlation coefficients are +1.. This means that for every change in one variable, the other variable changes in the same direction and in the same proportion as every other change in the first variable. For example, if every $1, increase in a company s reported net income is associated with a 5% increase in its stock price, then the relationship between company reported net income and stock price is perfectly positively correlated (the correlation coefficient equals 1.). Note that this correlation infers nothing about cause and effect. That is, an increase in the stock price may not be caused by the increase in net income. It may be that another identified variable is affecting both. Perfectly negative correlations also exist such as the relationship between age and purchases of CDs. When no correlation exists, the coefficient is zero, such as the relation between the air temperature and birth rates. Like any statistic, we can test the significance of the measured correlation using a t-statistic. 3

4 The correlation coefficient is defined as the covariance of x and y divided by the product of the standard deviations of x and y. Correlation coefficients are particularly useful when we have situations in which y is considered to be a dependent variable and x is taken as an independent variable and times when x is considered to be the dependent variable and y is the independent variable. So how do we calculate it the coefficient if we do not use the function in excel? Consider the following example of 5 observations of the relation between number of years of education school past high school and dollars spent on new cars over the 1 years period following high school. Years $ 3, 5, 2 9, 4 95, 6 8, 8 First, compute the mean of each variable. Mean of years = 4. years Mean of $ = $69, Next compute the difference of each variable from its mean. Years Deviation $ Deviation 4. 3, 39K , 19K 4. 9, 21K 4

5 , 26K , 11K Third, for each pair of observations, sum the product of the deviations. The sum is (4. * 39K) + (2. * 19K) + (. * 21K) + (2. * 26K) + (4. * 11K) = 29K Next, compute the product of the sum of the squared deviations. For Years: (4.) 2 + (2.) 2 + (.) 2 + (2.) 2 + (4.) 2 = 4 For $: (39K) 2 +(19K) 2 +(21K) 2 +(26K) 2 +(11K) 2 = 3,12,, The product of the sums is then $124,8,, Finally, divide the sum of the products by the square root of the product of the sum of the squared deviations. 29, / SQR (124,8,,) = 29, / 353, =.82 This is r, the correlation coefficient. In this case the coefficient is positive by not perfectly correlated. We could use a t-statistic to test the significance. Question 3: How do I calculate the coefficient of determination? Answer 3: The coefficient of determination, R 2, is the percent of the variance in the dependent explained by the independents. R-squared can also be interpreted as the proportionate reduction in error in estimating the dependent when knowing the independents. That is, R 2 is the number of errors made when using the regression model to guess the value of the dependent, in ratio to the total errors made when using only the dependent's mean as the basis for estimating all cases. 5

6 Mathematically, R 2 = (1 - (SSE/SST)), where SSE = error sum of squares and SST = total sum of squares. Error sum of squares is the sum of the squared residuals (predicted y less actual y) and total sum of squares is the sum of the squared deviations between actual y and the mean value of y. SSE is the error not explained by the model and SST is the total error whether explained or not. In our example of machine hours and costs, we can compute R 2. First, using the estimated regression(y=1, *x), compute the predicted y for each x. Hours Actual Predicte Costs d Cost 5 $1, 1,75 8 $2, 2,185 1,5 $2,5 3,2 2,5 $7, 4,649 6, $9, 9,72 Mean 2,26 Mean $4,3 Mean $4,31 Second, compute SSE as (1-175) 2 + (2 2185) 2 + (25-32) 2 + (7-4649) 2 + (9 972) 2 = approx. 7,, Third, compute SST as (175 4,31) 2 + ( ) 2 + (32 431) 2 +( ) 2 +( ) 2 = approx. 41,68, Thus, R 2 = (1-16.8%) = 83.2% While R 2 can be increased by adding variables, it is inappropriate unless variables are added to the equation for sound theoretical 6

7 reason. At an extreme, when n-1 variables are added to a regression equation, R 2 will be 1, but this result is meaningless. Adjusted R 2 is used as a conservative reduction to R 2 to penalize for adding variables. Adjusted R-Square is an adjustment for the fact that when one has a large number of independents, it is possible that R 2 will become artificially high simply because some independents' chance variations "explain" small parts of the variance of the dependent. At the extreme, when there are as many independents as cases in the sample, R 2 will always be 1.. The adjustment to the formula arbitrarily lowers R 2 as p, the number of independents, increases. Also note that typically, R 2 should not be compared between samples because one sample may have more variation generally to explain. More variance might allow more explanation of the variance. Question 4: How do I calculate the standard error of the estimate? Answer 4: Residual variance is a measure of the variation of the y values about the regression line. That is, it is the deviations between actual y and predicted y. The square root of the residual variance is the standard error of estimate. If the standard error is too large, the model may not be useful. To compute the standard error, first, determine predicted y values. Second, compute the difference between predicted y and actual y values and square the differences. This is also known as the error sum of squares (SSE). It is the error that cannot be explained by the estimated regression model. Finally, scale the error sum of square (SSE) by the sample size less the number of independent variables less one. The square root of the result is the standard error of the estimate and it measures the dispersion of the actual values of y around the fitted regression line. The standard error is also called the standard deviation of the regression model. Even if the coefficient of determination is high, the standard error will be too high to provide adequate confidence in the model. For example two standard deviations might be impractical for prediction intervals. A better model with different variables or additional variables is then warranted. 7

8 Question 5: What are the similarities between simple linear regression analysis and multiple regression analysis? Answer 5: The multiple regression equation takes the form: y = a + b 1 x 1 + b 2 x b n x n + e. In multiple regression, more than one independent variables explain the one dependent variable. This is expected to reduce the standard error. The b's are the regression coefficients, representing the amount the dependent variable y changes when the independent changes 1 unit. The a is the constant, where the regression line intercepts the y axis, representing the amount the dependent y will be when all the independent variables are. Associated with multiple regression is R 2, multiple correlation, which is the percent of variance in the dependent variable explained collectively by all of the independent variables. While for simple linear regression, R 2 2 (that is, the simple correlations coefficient squared), for multiple regression the same equality does not hold. Like linear regression, multiple regression shares all the assumptions of correlation: linearity of relationships, the same level of relationship throughout the range of the independent variable ("homoscedasticity"),interval or near-interval data, and data whose range is not truncated. In addition, it is important that the model being tested is correctly specified. The exclusion of important causal variables or the inclusion of extraneous variables can change markedly the estimated coefficients and thus the interpretation of the importance of the independent variables. Question 6: What are the basic assumptions of regression analysis? Answer 6: There are four general assumption of regression analysis. 1. First, model errors are normally distributed. 8

9 2. Second, the mean of the model error terms is zero. 3. Third, the model error terms have a constant variance for all values and combination of values of the independent variable("homoscedasticity". 4. Finally, the each error for each x is independent of each other. Error, represented by the residuals, should be normally distributed for each set of values of the independents. A histogram of standardized residuals should show a roughly normal curve. An alternative for the same purpose is the normal probability plot, with the observed cumulative probabilities of occurrence of the standardized residuals on the Y axis and of expected normal probabilities of occurrence on the X axis, such that a 45-degree line will appear when observed conforms to normally expected. The central limit theorem assumes that even when error is not normally distributed, when sample size is large, the sampling distribution of the b coefficient will still be normal. Therefore violations of this assumption usually have little or no impact on substantive conclusions for large samples, but when sample size is small, tests of normality are important. The assumption that the mean error is independent of the x independent variables. This is a critical regression assumption which, when violated, may lead to substantive misinterpretation of output. The (population) error term, which is the difference between the actual values of the dependent and those estimated by the population regression equation, should be uncorrelated with each of the independent variables. Since the population regression line is not known for sample data, the assumption must be assessed by theory. Specifically, one must be confident that the dependent is not also a cause of one or more of the independents, and that the variables not included in the equation are not causes of Y and correlated with the variables which are included. Either circumstance would violate the assumption of uncorrelated error. One common type of correlated error occurs due to selection bias with regard to membership in the independent variable "group" (representing membership in a treatment vs. a comparison group): measured factors such as gender, race, education, etc., may cause differential selection into the two groups and also can be correlated with the dependent variable. When there is correlated error, conventional computation of 9

10 standard deviations, t-tests, and significance are biased and cannot be used validly. Other assumptions are also part of regression. Some of the major ones are discussed here. Model specification is critical. Relevant variables are not omitted. If relevant variables are omitted from the model, the common variance they share with included variables may be wrongly attributed to those variables, and the error term is inflated. Similarly, if causally irrelevant variables are included in the model, the common variance they share with included variables may be wrongly attributed to the irrelevant variables. Omission and irrelevancy can both affect substantially the size of the b coefficients. This is one reason why it is better to use regression to compare the relative fit of two models rather than to seek to establish the validity of a single model specification. Continuous data are required(interval or ratio), though it is common to use ordinal data. Dummy variables form a special case and are allowed in regression as independents. Dichotomies may be used as independents but not as the dependent variable. Use of a dichotomous dependent in regression violates the assumptions of normality and homoscedasticity as a normal distribution is impossible with only two values. Also, when the values can only be or 1, residuals will be low for the portions of the regression line near Y= and Y=1, but high in the middle -- hence the error term will violate the assumption of homoscedasticity (equal variances) when a dichotomy is used as a dependent. Unbounded data are an assumption. That is, the regression line produced by can be extrapolated in both directions but is meaningful only within the upper and lower natural bounds of the dependent. Data are not censored, sample selected, or truncated. There are as many observations of the independents as for the dependents. Absence of perfect multicollinearity is another assumption. When there is perfect multicollinearity, there is no unique regression solution. Perfect multicollinearity occurs if independents are linear functions of each other (ex., age and year of birth), when the researcher creates dummy variables 1

11 for all values of a categorical variable rather than leaving one out, and when there are fewer observations than variables. Regression analysis is a linear procedure. To the extent nonlinear relationships are present, conventional regression analysis will underestimate the relationship. Nonlinear transformation of selected variables may be a pre-processing step, but this is not common because it runs the danger of over fitting the model to what are, in fact, chance variations in the data. When nonlinearity is present, there may be a need for exponential or interactive terms. The same underlying distribution is assumed for all variables. To the extent that an independent variable has a different underlying distribution compared to the dependent (bimodal vs. normal, for instance), then a unit increase in the independent will have nonlinear impacts on the dependent. Even when independent/dependent data pairs are ordered perfectly, unit increases in the independent cannot be associated with fixed linear changes in the dependent. For instance, perfect ordering of a bimodal independent with a normal dependent will generate an s-shaped scatter plot not amenable to a linear solution. Linear regression will underestimate the correlation of the independent and dependent when they come from different underlying distributions. Variable measurement is reliable and valid. To the extent there is systematic error in the measurement of the variables, the regression coefficients will be simply wrong. Independent observations (absence of autocorrelation) leading to uncorrelated error terms. Current values should not be correlated with previous values in a data series. This is often a problem with time series data, where many variables tend to increment over time such that knowing the value of the current observation helps one estimate the value of the previous observation. That is, each observation should be independent of each other observation if the error terms are not to be correlated, which would in turn lead to biased estimates of standard deviations and significance. 11

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis. 17th Class 7/1/10

Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis. 17th Class 7/1/10 Econ 300/QAC 201: Quantitative Methods in Economics/Applied Data Analysis 17th Class 7/1/10 The only function of economic forecasting is to make astrology look respectable. --John Kenneth Galbraith show

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Regression Analysis. BUS 735: Business Decision Making and Research

Regression Analysis. BUS 735: Business Decision Making and Research Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity R.G. Pierse 1 Omitted Variables Suppose that the true model is Y i β 1 + β X i + β 3 X 3i + u i, i 1,, n (1.1) where β 3 0 but that the

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

Chapter 13. Multiple Regression and Model Building

Chapter 13. Multiple Regression and Model Building Chapter 13 Multiple Regression and Model Building Multiple Regression Models The General Multiple Regression Model y x x x 0 1 1 2 2... k k y is the dependent variable x, x,..., x 1 2 k the model are the

More information

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han Econometrics Honor s Exam Review Session Spring 2012 Eunice Han Topics 1. OLS The Assumptions Omitted Variable Bias Conditional Mean Independence Hypothesis Testing and Confidence Intervals Homoskedasticity

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

Single and multiple linear regression analysis

Single and multiple linear regression analysis Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Chapter 14 Multiple Regression Analysis

Chapter 14 Multiple Regression Analysis Chapter 14 Multiple Regression Analysis 1. a. Multiple regression equation b. the Y-intercept c. $374,748 found by Y ˆ = 64,1 +.394(796,) + 9.6(694) 11,6(6.) (LO 1) 2. a. Multiple regression equation b.

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

PBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression.

PBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression. PBAF 528 Week 8 What are some problems with our model? Regression models are used to represent relationships between a dependent variable and one or more predictors. In order to make inference from the

More information

Correlation and Regression Bangkok, 14-18, Sept. 2015

Correlation and Regression Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Correlation and Regression Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Correlation The strength

More information

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Ref.:   Spring SOS3003 Applied data analysis for social science Lecture note SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM 1 REGRESSION AND CORRELATION As we learned in Chapter 9 ( Bivariate Tables ), the differential access to the Internet is real and persistent. Celeste Campos-Castillo s (015) research confirmed the impact

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C = Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

Econometrics Review questions for exam

Econometrics Review questions for exam Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

ECONOMETRICS HONOR S EXAM REVIEW SESSION

ECONOMETRICS HONOR S EXAM REVIEW SESSION ECONOMETRICS HONOR S EXAM REVIEW SESSION Eunice Han ehan@fas.harvard.edu March 26 th, 2013 Harvard University Information 2 Exam: April 3 rd 3-6pm @ Emerson 105 Bring a calculator and extra pens. Notes

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK

REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK REED TUTORIALS (Pty) LTD ECS3706 EXAM PACK 1 ECONOMETRICS STUDY PACK MAY/JUNE 2016 Question 1 (a) (i) Describing economic reality (ii) Testing hypothesis about economic theory (iii) Forecasting future

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati 405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati Prof. M. El-Sakka Dept of Economics Kuwait University In this chapter we take a critical

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

Multiple Linear Regression estimation, testing and checking assumptions

Multiple Linear Regression estimation, testing and checking assumptions Multiple Linear Regression estimation, testing and checking assumptions Lecture No. 07 Example 1 The president of a large chain of fast-food restaurants has randomly selected 10 franchises and recorded

More information

Regression Diagnostics Procedures

Regression Diagnostics Procedures Regression Diagnostics Procedures ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X For any fixed value of the independent variable X, the distribution of the

More information

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013

Eco 391, J. Sandford, spring 2013 April 5, Midterm 3 4/5/2013 Midterm 3 4/5/2013 Instructions: You may use a calculator, and one sheet of notes. You will never be penalized for showing work, but if what is asked for can be computed directly, points awarded will depend

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Econometrics -- Final Exam (Sample)

Econometrics -- Final Exam (Sample) Econometrics -- Final Exam (Sample) 1) The sample regression line estimated by OLS A) has an intercept that is equal to zero. B) is the same as the population regression line. C) cannot have negative and

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression With a Independent Variable Lecture 10 November 5, 2008 ERSH 8320 Lecture #10-11/5/2008 Slide 1 of 54 Today s Lecture Today s Lecture Chapter 11: Regression with a single categorical independent

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 6 Multiple regression model Siv-Elisabeth Skjelbred University of Oslo February 5th Last updated: February 3, 2016 1 / 49 Outline Multiple linear regression model and

More information

Diagnostics of Linear Regression

Diagnostics of Linear Regression Diagnostics of Linear Regression Junhui Qian October 7, 14 The Objectives After estimating a model, we should always perform diagnostics on the model. In particular, we should check whether the assumptions

More information

CHAPTER 5 LINEAR REGRESSION AND CORRELATION

CHAPTER 5 LINEAR REGRESSION AND CORRELATION CHAPTER 5 LINEAR REGRESSION AND CORRELATION Expected Outcomes Able to use simple and multiple linear regression analysis, and correlation. Able to conduct hypothesis testing for simple and multiple linear

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

Regression With a Categorical Independent Variable

Regression With a Categorical Independent Variable Regression ith a Independent Variable ERSH 8320 Slide 1 of 34 Today s Lecture Regression with a single categorical independent variable. Today s Lecture Coding procedures for analysis. Dummy coding. Relationship

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section: Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 You have until 10:20am to complete this exam. Please remember to put your name,

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator

More information

Applied Econometrics Lecture 1

Applied Econometrics Lecture 1 Lecture 1 1 1 Università di Urbino Università di Urbino PhD Programme in Global Studies Spring 2018 Outline of this module Beyond OLS (very brief sketch) Regression and causality: sources of endogeneity

More information

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang Project Report for STAT7 Statistical Methods Instructor: Dr. Ramon V. Leon Wage Data Analysis Yuanlei Zhang 77--7 November, Part : Introduction Data Set The data set contains a random sample of observations

More information

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 7: Multicollinearity Egypt Scholars Economic Society November 22, 2014 Assignment & feedback Multicollinearity enter classroom at room name c28efb78 http://b.socrative.com/login/student/

More information

Classification & Regression. Multicollinearity Intro to Nominal Data

Classification & Regression. Multicollinearity Intro to Nominal Data Multicollinearity Intro to Nominal Let s Start With A Question y = β 0 + β 1 x 1 +β 2 x 2 y = Anxiety Level x 1 = heart rate x 2 = recorded pulse Since we can all agree heart rate and pulse are related,

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing

More information

Introduction to Econometrics. Heteroskedasticity

Introduction to Econometrics. Heteroskedasticity Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory

More information

Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals.

Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals. 9.1 Simple linear regression 9.1.1 Linear models Response and eplanatory variables Chapter 9 Regression With bivariate data, it is often useful to predict the value of one variable (the response variable,

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

6. Assessing studies based on multiple regression

6. Assessing studies based on multiple regression 6. Assessing studies based on multiple regression Questions of this section: What makes a study using multiple regression (un)reliable? When does multiple regression provide a useful estimate of the causal

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap Deviations from the standard

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

DEMAND ESTIMATION (PART III)

DEMAND ESTIMATION (PART III) BEC 30325: MANAGERIAL ECONOMICS Session 04 DEMAND ESTIMATION (PART III) Dr. Sumudu Perera Session Outline 2 Multiple Regression Model Test the Goodness of Fit Coefficient of Determination F Statistic t

More information

Chapter 12 - Part I: Correlation Analysis

Chapter 12 - Part I: Correlation Analysis ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

6. Dummy variable regression

6. Dummy variable regression 6. Dummy variable regression Why include a qualitative independent variable?........................................ 2 Simplest model 3 Simplest case.............................................................

More information

Multiple Regression Methods

Multiple Regression Methods Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret

More information

ISQS 5349 Spring 2013 Final Exam

ISQS 5349 Spring 2013 Final Exam ISQS 5349 Spring 2013 Final Exam Name: General Instructions: Closed books, notes, no electronic devices. Points (out of 200) are in parentheses. Put written answers on separate paper; multiple choices

More information

Friday, March 15, 13. Mul$ple Regression

Friday, March 15, 13. Mul$ple Regression Mul$ple Regression Mul$ple Regression I have a hypothesis about the effect of X on Y. Why might we need addi$onal variables? Confounding variables Condi$onal independence Reduce/eliminate bias in es$mates

More information

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity

More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3

More information

FNCE 926 Empirical Methods in CF

FNCE 926 Empirical Methods in CF FNCE 926 Empirical Methods in CF Lecture 2 Linear Regression II Professor Todd Gormley Today's Agenda n Quick review n Finish discussion of linear regression q Hypothesis testing n n Standard errors Robustness,

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2015-16 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Ch 13 & 14 - Regression Analysis

Ch 13 & 14 - Regression Analysis Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more

More information

1 Linear Regression Analysis The Mincer Wage Equation Data Econometric Model Estimation... 11

1 Linear Regression Analysis The Mincer Wage Equation Data Econometric Model Estimation... 11 Econ 495 - Econometric Review 1 Contents 1 Linear Regression Analysis 4 1.1 The Mincer Wage Equation................. 4 1.2 Data............................. 6 1.3 Econometric Model.....................

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Lecture (chapter 13): Association between variables measured at the interval-ratio level Lecture (chapter 13): Association between variables measured at the interval-ratio level Ernesto F. L. Amaral April 9 11, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015.

More information