Multicollinearity: Estimation and Elimination

S. S. Shantha Kumari [1]

Abstract

Multiple regression fits a model to predict a dependent (Y) variable from two or more independent (X) variables. If the model fits the data well, the overall R² value will be high and the corresponding P value will be low. In addition to the overall P value, multiple regression also reports an individual P value for each independent variable. A low P value here means that this particular independent variable significantly improves the fit of the model. It is calculated by comparing the goodness-of-fit of the entire model to the goodness-of-fit when that independent variable is omitted. If the fit is much worse when the variable is omitted, its P value will be low, indicating that the variable has a significant impact on the model.

In some cases, multiple regression results may seem paradoxical: even though the overall P value is very low, all of the individual P values are high. This means that the model fits the data well even though none of the X variables individually has a statistically significant impact on predicting Y. The cause is high correlation among the independent variables. In that case neither variable contributes significantly to the model once the other is included, yet together they contribute a great deal, and removing both would make the fit much worse. So the overall model fits the data well, but no single X variable makes a significant contribution when it is added to the model. When this happens, the X variables are collinear and the results show multicollinearity. The best solution is to understand the cause of the multicollinearity and remove it. This paper presents ways of identifying and eliminating multicollinearity so as to arrive at a best-fit model.

[1] Faculty, PSG Institute of Management, PSG College of Technology, Coimbatore.

Introduction

The past twenty years have seen an extraordinary growth in the use of quantitative methods in financial markets, and this is one area where econometric methods have rapidly gained ground. As economic growth makes more and more people wealthier, and with the rapid progress in information technology, there will be a continuing need to improve the performance of financial models in forecasting returns, making use of all the information available, in particular ultra-high-frequency intra-daily data. With the development of multivariate and simultaneous extensions of financial models, finance professionals now routinely use sophisticated techniques in portfolio management, proprietary trading, risk management, financial consulting and securities regulation. Regression analysis is almost certainly the most important tool at the econometrician's disposal.

The explanation and prediction of security returns and their relation to risk has received a great deal of attention in financial research. Both intuitive and theoretical models have been developed in which return or risk is expressed as a linear function of one or several macroeconomic, market or firm-related variables. Studies attempting to explore these relationships, however, have been plagued by the interdependent nature of corporate financial variables. In classical multiple regression analysis, these interdependencies may produce the various symptoms of multicollinearity, including overstated regression coefficients, incorrect signs and highly unstable predictive equations. The objective of this paper is to present ways and means for the detection and elimination of multicollinearity, so as to improve the predictive power of any financial model.

Multicollinearity: Its Nature

One of the three basic assumptions in regression modelling is that the independent variables in the model are not linearly related. The other two assumptions are that the model residuals are normally distributed with zero mean and constant variance, and that they have no autocorrelation. The existence of a linear relationship among the independent variables is called multicollinearity; the term is due to Ragnar Frisch (1934). Multicollinearity can cause large forecasting errors and makes it difficult to assess the relative importance of individual variables in the model. If two or more variables have an exact linear relationship between them, we have perfect multicollinearity. Consider the regression equation

    Y_i = a + b X_1i + c X_2i + d X_3i + u_i        (1)

which has three independent variables, X_1i, X_2i and X_3i. The assumption requires that the three variables are not linearly related in the form

    X_1i = k_1 X_2i + k_2 X_3i + e_i                (2)

If the assumption holds, then k_1 = k_2 = 0, e_i is simply X_1i, and there is no multicollinearity among the independent variables included in the model. If either of the coefficients k_1 or k_2 in equation (2) is not zero, the model has a multicollinearity problem.

Consequences of Multicollinearity

1. In a two-variable model, when multicollinearity is present the estimated standard errors of the coefficients will be large. This is because the coefficient variance formula contains a multiplying factor of the form 1/(1 - r²), where r is the correlation coefficient between the two variables and takes values between -1 and +1. This factor is often called the variance inflation factor. When r = 0 there is no multicollinearity and the inflation factor equals 1; as r increases in absolute value the variances of the estimated coefficients increase too, and as r approaches ±1 the inflation factor approaches infinity.

2. The estimated coefficients may become insignificant or take the wrong sign, and consequently will be sensitive to changes in the data. When the independent variables are correlated, the estimated standard errors of the coefficients are large and the t statistics are therefore small. Coefficients with large standard errors are unstable: adding a few more data points to the sample can cause a large change in the size of the coefficients and sometimes in their signs. When a coefficient changes sign from positive to negative, or vice versa, at model updating, the model will not produce a good forecast.

3. When the estimated coefficients have large standard errors and are unstable, it is difficult for the model user to assess properly the relative importance of the independent variables.

4. The presence of multicollinearity can lead the researcher to drop an important variable from the model because of its low t statistic.

Detection of Multicollinearity

Multicollinearity is essentially a sample phenomenon, arising out of the largely non-experimental data collected in most social sciences. According to Kmenta (1986), multicollinearity is a question of degree and not of kind, and it is a feature of the sample rather than of the population. Therefore, we do not test for multicollinearity; we can only measure its degree in any particular sample.
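The variance-inflation effect described in consequence 1 is easy to see in a small simulation before turning to the individual diagnostics. The sketch below is not taken from the paper: it uses numpy and statsmodels with invented coefficients, sample size and correlation levels, and simply prints the estimated standard errors next to the factor 1/(1 - r²).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200

for r in (0.0, 0.5, 0.9, 0.99):
    # Two regressors drawn with correlation r (hypothetical data, not the paper's).
    cov = [[1.0, r], [r, 1.0]]
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=2.0, size=n)

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(f"r = {r:4.2f}   se(b1) = {fit.bse[1]:.3f}   se(b2) = {fit.bse[2]:.3f}   "
          f"1/(1 - r^2) = {1.0 / (1.0 - r**2):6.2f}")
```

As r approaches 1 the printed standard errors grow roughly in proportion to the square root of the inflation factor, which is exactly the instability described above.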

1. High R² but few significant t ratios.

Table 1. Model Summary: R = .925, R² = .855 (predictors: (constant), logx6, logx5, logx2, logx3, logx4; dependent variable: logy).

Table 2. ANOVA: regression, residual and total sums of squares with their degrees of freedom, mean squares, F ratio and significance for the same model.

Table 3. Coefficients: unstandardized and standardized coefficients, standard errors, t statistics, significance levels and collinearity statistics (tolerance and VIF) for the constant and logx2 to logx6.

It is clear from Table 1 that R² is .855, and the F ratio (Table 2) is also significant, showing that the model fits. But most of the t statistics are insignificant, pointing to the possibility of multicollinearity.
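The pattern in Tables 1 to 3 (a high R² and a significant F ratio alongside mostly insignificant t ratios) can be checked for any data set with standard software. The sketch below uses statsmodels as an illustration; the column names logy, logx2, ..., logx6 follow the tables, and the data frame passed to the function is assumed to already contain the logged series.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_summary(df: pd.DataFrame, response: str = "logy",
                         predictors=("logx2", "logx3", "logx4", "logx5", "logx6")):
    """Fit the full model and print the diagnostics reported in Tables 1-3."""
    X = sm.add_constant(df[list(predictors)])
    fit = sm.OLS(df[response], X).fit()
    print(fit.summary())                            # R-squared, F statistic, t ratios
    for i, name in enumerate(predictors, start=1):  # column 0 of X is the constant
        vif = variance_inflation_factor(X.values, i)
        print(f"{name}: tolerance = {1.0 / vif:.3f}, VIF = {vif:.2f}")
```

The variance_inflation_factor call reproduces the VIF column of Table 3, and tolerance is simply its reciprocal.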

2. High pair-wise correlation among the regressors.

Table 4. Correlations (Pearson, two-tailed significance)

            logx2     logx3     logx4     logx5     logx6
  logx2     1         .996**    .993**    .585*     .974**
  logx3     .996**    1         .996**    .619*     .974**
  logx4     .993**    .996**    1         .585*     .987**
  logx5     .585*     .619*     .585*     1         .600*
  logx6     .974**    .974**    .987**    .600*     1

  ** Correlation is significant at the 0.01 level (2-tailed).
  *  Correlation is significant at the 0.05 level (2-tailed).

If the pair-wise correlation coefficient between two regressors is high, say in excess of 0.80, then multicollinearity is a problem. High pair-wise correlation is a sufficient, but not a necessary, condition for the existence of multicollinearity.
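A correlation matrix like Table 4 is a short computation with pandas. The sketch below, under the same assumed data frame and column names, also flags the pairs whose absolute correlation breaches the 0.80 rule of thumb mentioned above.

```python
import pandas as pd

def pairwise_correlations(df: pd.DataFrame,
                          predictors=("logx2", "logx3", "logx4", "logx5", "logx6"),
                          threshold: float = 0.80):
    """Pearson correlations among the regressors, as in Table 4."""
    corr = df[list(predictors)].corr(method="pearson")
    print(corr.round(3))
    # Flag pairs whose absolute correlation breaches the rule-of-thumb threshold.
    for i, a in enumerate(predictors):
        for b in predictors[i + 1:]:
            if abs(corr.loc[a, b]) > threshold:
                print(f"{a} vs {b}: r = {corr.loc[a, b]:.3f}  (possible collinearity)")
```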

3. Auxiliary regressions.

Table 5.1 Model Summary: dependent variable logx2; predictors (constant), logx6, logx5, logx3, logx4; R = .998.
Table 5.2 Model Summary: dependent variable logx3; predictors (constant), logx2, logx5, logx6, logx4.
Table 5.3 Model Summary: dependent variable logx4; predictors (constant), logx3, logx5, logx6, logx2.
Table 5.4 Model Summary: dependent variable logx5; predictors (constant), logx4, logx6, logx2, logx3; R = .933.
Table 5.5 Model Summary: dependent variable logx6; predictors (constant), logx5, logx4, logx2, logx3; R = .997.

Tables 5.1 to 5.5 show that the R² of each auxiliary regression exceeds the overall R² of the original model, suggesting that multicollinearity is a troublesome problem.
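Each auxiliary regression regresses one explanatory variable on all of the remaining ones and records its R². A sketch of the loop, again under the assumed data frame and column names used earlier:

```python
import pandas as pd
import statsmodels.api as sm

def auxiliary_r_squared(df: pd.DataFrame,
                        predictors=("logx2", "logx3", "logx4", "logx5", "logx6")):
    """R-squared of each regressor on the remaining regressors (Tables 5.1-5.5)."""
    results = {}
    for target in predictors:
        others = [p for p in predictors if p != target]
        X = sm.add_constant(df[others])
        results[target] = sm.OLS(df[target], X).fit().rsquared
    return results
```

Comparing each returned R² with the overall R² of the original model gives the comparison drawn above.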

4. Eigenvalues and Condition Index.

From the eigenvalues of the scaled cross-product matrix of the regressors we can derive the condition number k:

    k = maximum eigenvalue / minimum eigenvalue

If k lies between 100 and 1,000 there is moderate to strong multicollinearity; if it exceeds 1,000 the multicollinearity is severe. The condition index (CI) is defined as

    CI = sqrt(maximum eigenvalue / minimum eigenvalue) = sqrt(k)

If the CI lies between 10 and 30 there is moderate to strong multicollinearity; if it exceeds 30 the multicollinearity is severe.

Table 6. Eigenvalue and condition index for each dimension of the model, together with the resulting k and CI.

In the example, the k value is greater than 1,000, showing the existence of multicollinearity, and the condition index is also greater than 30, confirming the existence of severe multicollinearity.

5. Tolerance and Variance Inflation Factor.

Table 7. Tolerance and VIF for each of the regressors logx2 to logx6.

The closer the tolerance is to zero, and the larger the VIF (a value exceeding 10 is the usual benchmark), the greater the degree of multicollinearity.
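These eigenvalue-based diagnostics can also be computed directly; tolerance and VIF were already produced by the sketch accompanying Table 3. In the sketch below the scaling convention is an assumption stated in the docstring, and the column names are the ones used throughout the tables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def condition_diagnostics(df: pd.DataFrame,
                          predictors=("logx2", "logx3", "logx4", "logx5", "logx6")):
    """Eigenvalues, condition number k and condition index CI (detection method 4).

    Assumption: the columns of the design matrix, including the constant, are
    scaled to unit length (the convention behind SPSS's collinearity diagnostics);
    some texts compute the eigenvalues of the correlation matrix instead.
    """
    X = sm.add_constant(df[list(predictors)]).to_numpy(dtype=float)
    X = X / np.linalg.norm(X, axis=0)            # unit-length columns
    eigenvalues = np.linalg.eigvalsh(X.T @ X)    # eigenvalues of the scaled X'X
    k = eigenvalues.max() / eigenvalues.min()    # condition number
    ci = np.sqrt(k)                              # condition index
    # Rules of thumb from the text: k > 1,000 or CI > 30 indicates severe multicollinearity.
    return eigenvalues, k, ci
```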

Elimination of Multicollinearity

The choice of a remedial measure depends on the circumstances the researcher encounters. A method that solves the problem in one model may not be effective in another, so the researcher may have to try several procedures before obtaining a best-fit model:

1. Dropping one or more variables
2. Transforming the variables
3. Collecting additional or new data
4. Reducing collinearity in a polynomial regression

In the example, the tolerance and VIF values and the zero-order correlations point to variables such as log X2, log X3, log X4 and log X6. On the basis of these diagnostics and the theoretical background, the variables X2 and X3 are eliminated from the model, leaving a revised model in which log Y (the sale of new passenger cars) is regressed on the three remaining variables: personal disposable income, the interest rate and the employed civilian labour force (log X4, log X5 and log X6).

Revised model results:

    R²              .759
    Adjusted R²     .699
    F ratio         significant (p = 0.001)
    Sample size     16

The F ratio is significant, confirming the impact of the explanatory variables on the sale of new passenger cars. The R² of .759 means that about 76% of the variation in the dependent variable is explained by the explanatory variables, and the t value of every estimated coefficient is significant.

Conclusion

The explanatory variables specified in an economic model usually come from economic theory or from a basic understanding of the behaviour the researcher is trying to model. The data for these variables typically come from uncontrolled experiments and often move together. In such a situation it is difficult to solve the problem merely by omitting or adding a variable, so the researcher should take care to reduce the problem of multicollinearity at the stage of formulating a model with time-series data.

References

i.   Ragnar Frisch, Statistical Confluence Analysis by Means of Complete Regression Systems, Institute of Economics, Oslo University, Publication No. 5, 1934.
ii.  Jan Kmenta, Elements of Econometrics, 2nd edition, Macmillan, New York, 1986.
iii. Ramu Ramanathan, Introductory Econometrics with Applications, 5th edition, Thomson South-Western, Bangalore.
iv.  Chris Brooks, Introductory Econometrics for Finance, Cambridge University Press.
v.   Damodar Gujarati and Sangeetha, Basic Econometrics, 4th edition, Tata McGraw-Hill, New Delhi.
vi.  G. S. Maddala, Introduction to Econometrics, 3rd edition, Wiley India, New Delhi.
