Application of Independent Variables Transformations for Polynomial Regression Model Estimations


3rd International Conference on Applied Mathematics and Pharmaceutical Sciences (ICAMPS'2013), April 29-30, 2013, Singapore

W. Wongrin, M. Rodchuen and P. Bookamana

Abstract: The objective of this research was to find a solution for the polynomial regression model when it has a problem with multicollinearity. The study was done by transforming the independent variables. The first proposed solution was centering. The second was the Z[-1,1] transformation. The next was the Z[-1,1] transformation with Legendre polynomials. The last was the transformation of the design matrix into an orthogonal matrix. The least squares method with stepwise regression was used to estimate the parameters. We applied the transformations to fit a fourth-degree polynomial regression for the heat capacity of solid hydrogen bromide (HBr) as a function of temperature. It was found that the orthogonal matrix had the lowest VIFs and a small mean square error (MSE). Thus, the model from the orthogonal matrix is the best, with the smallest effect of multicollinearity.

Keywords: Centering, Multicollinearity, Orthogonal Matrix, Polynomial Regression.

I. INTRODUCTION

Linear regression is a statistical technique that is widely used to model the relationship between a dependent variable (Y) and an independent variable (X) when the relationship between the variables is linear. In practice, however, the function is often nonlinear or curvilinear, and polynomial regression is then appropriate for the data. Polynomial regression is a special case of multiple linear regression, since each term of the polynomial is considered an independent variable [1], [2]. The parameters of the polynomial regression are estimated by the least squares method. However, the least squares method has a problem with correlation among the terms of the polynomial, called multicollinearity or an ill-conditioned matrix: the matrix X'X becomes nearly singular, with determinant |X'X| close to 0 [1], [2]. Multicollinearity usually leads to unreliable estimated parameters in the polynomial regression, which then have large variances. There are many ways to reduce the effects of multicollinearity, such as augmentation of the data, selection of variables, and alternative procedures: ridge regression, principal component analysis, shrunken estimators, etc.

Multicollinearity can be addressed by centering, but large correlations may remain between terms of the polynomial [3]. Transformations of the independent variables have therefore been proposed to minimize the effects of the collinearity: the Z[-1,1] transformation, the V[0,1] transformation, the W[0,1] transformation, and the same three transformations combined with Legendre polynomials, with the condition number used to determine the effects on multicollinearity; the Z[-1,1] transformation and the Z[-1,1] transformation with Legendre polynomials gave the best models [4]. Although other methods, such as the weighted least squares method, the ridge absolute value method and the Liu-type method, have been proposed to estimate the parameters, one important thing that can be done to remove the multicollinearity is to use orthogonal polynomials [5].

Therefore, we were interested in fitting the data on the heat capacity of solid hydrogen bromide (HBr) versus temperature [4] with a polynomial regression model of degree 4.

W. Wongrin is a master student in the Department of Statistics, Faculty of Science, Chiang Mai University, Thailand (e-mail: weerinrada@gmail.com).
M. Rodchuen is with the Department of Statistics, Chiang Mai University, Thailand (e-mail: r.manachai@gmail.com).
P. Bookamana is with the Department of Statistics, Chiang Mai University, Thailand (e-mail: putipongb@gmail.com).
We propose to use centering, the Z[-1,1] transformation, the Z[-1,1] transformation with Legendre polynomials, and the matrix transformation that yields orthogonal columns. We estimate the parameters using the method of least squares with stepwise regression, check for multicollinearity with the condition number and the VIFs (variance inflation factors), and use the mean square error (MSE) to determine the efficiency of the models.

II. POLYNOMIAL REGRESSION

Polynomial regression is a special case of multiple linear regression which sets each term of the polynomial regression model as an independent variable. The model is of the form

    Yi = β0 + β1 xi + β2 xi^2 + ... + βj xi^j + ... + βk xi^k + εi    (1)

where i = 1, 2, ..., n and j = 0, 1, 2, ..., k; Yi is the dependent variable of unit i, βj is the parameter (regression coefficient) of the j-th term, xi^j is the independent variable of unit i raised to the j-th order of the polynomial, and εi is the random error.

Hence, the model in matrix form is

    Y = Xβ + ε

The least squares method is used to estimate the parameters in the fixed-X model: we find the estimators that minimize the sum of squared errors of the n observed values (y) about their predicted values (ŷ).
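As a brief, self-contained illustration of model (1) and its least squares fit, the sketch below builds the design matrix whose columns are the powers of x and computes the estimates. This is our own sketch, not code from the paper; the data grid and the coefficients in it are hypothetical stand-ins.

    import numpy as np

    def poly_design_matrix(x, degree):
        # Columns 1, x, x^2, ..., x^degree: each power of x becomes
        # a separate regressor, as in model (1).
        return np.vander(x, degree + 1, increasing=True)

    rng = np.random.default_rng(0)
    x = np.linspace(118.0, 160.0, 18)               # hypothetical temperature grid
    y = 2.0 + 0.05 * x + 1e-6 * x**4 + rng.normal(0.0, 0.05, x.size)  # hypothetical response

    X = poly_design_matrix(x, 4)
    # lstsq minimizes ||y - X b||^2 and is numerically safer than inverting
    # X'X when the power columns are nearly collinear.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat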

III. INDEPENDENT VARIABLE TRANSFORMATIONS

A. Centering Data

Centering [1], [3] converts the n observations on the independent variable into deviations from their center, i.e., their mean. It is widely used to reduce the collinearity between independent variables toward zero, but high collinearity still remains between the terms x^j and x^k, j < k, when j + k is even [3]. The centered variable is

    x_new,i = x_i - x̄    (2)

where x_new,i is the centered observation of unit i on the independent variable, x_i is the observation of unit i on the independent variable, and x̄ is the mean of the independent variable. The design matrix of the centered variable is

            | 1  x_new,1  x_new,1^2  ...  x_new,1^k |
    X_new = | 1  x_new,2  x_new,2^2  ...  x_new,2^k |
            | ...                                   |
            | 1  x_new,n  x_new,n^2  ...  x_new,n^k |

Hence the normal equations with centering are

    X_new' X_new β̂ = X_new' Y

B. Z[-1,1] Transformation

The Z[-1,1] transformation is a transformation of the data that can reduce the effects of the collinearity in regression; it is widely used and recommended by statisticians. The Z[-1,1] transformation normalizes x to run from -1 to +1 and is given by [4]

    z_i = (2x_i - x_max - x_min) / (x_max - x_min)    (3)

where z_i is the transformed observation of unit i on the independent variable, x_max is the maximum observed value of the independent variable, and x_min is the minimum observed value of the independent variable. The transformed design matrix is

        | 1  z_1  z_1^2  ...  z_1^k |
    Z = | 1  z_2  z_2^2  ...  z_2^k |
        | ...                       |
        | 1  z_n  z_n^2  ...  z_n^k |

Hence the normal equations with this transformation are

    Z'Z β̂ = Z'Y

C. Z[-1,1] Transformation with Legendre Polynomials

One approach often suggested to reduce collinearity in polynomial regression is orthogonal polynomials, because this property makes the normal-equations matrix diagonal. There are several ways to generate orthogonal polynomials. For a large number of data points which are evenly spaced in the [-1,1] interval, the Legendre polynomials provide an orthogonal set. The first six Legendre polynomials for the Z[-1,1] transformation are given by [4]

    P0(z) = 1
    P1(z) = z
    P2(z) = (1/2)(3z^2 - 1)
    P3(z) = (1/2)(5z^3 - 3z)
    P4(z) = (1/8)(35z^4 - 30z^2 + 3)
    P5(z) = (1/8)(63z^5 - 70z^3 + 15z)

D. Matrix Transformation to Orthogonal Columns

If the columns in a matrix are highly correlated, we can transform the columns to be orthogonal, so that the matrix becomes an orthogonal matrix. The column transformation takes the following form [1], [2], [5]:

    z_j = {I - Z(Z'Z)^(-1) Z'} x_j = x_j - Z(Z'Z)^(-1) Z' x_j    (4)

where Z is the matrix of the column vectors already transformed, x_j is the next column vector of the matrix X to be transformed, and z_j is the transformed vector, orthogonal to the other vectors in the previous Z matrix. In this way we obtain a matrix whose cross-product Z'Z is diagonal, which makes the inverse of the matrix easy to calculate:

        | 1  z_11  z_12  ...  z_1k |
    Z = | 1  z_21  z_22  ...  z_2k |
        | ...                      |
        | 1  z_n1  z_n2  ...  z_nk |

Hence the normal equations with this transformation are

    Z'Z β̂ = Z'Y
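The four transformations of Section III can be expressed compactly as follows. This is a minimal sketch of our own (the helper names are ours, and numpy's legvander supplies the Legendre columns), not an implementation taken from the paper.

    import numpy as np
    from numpy.polynomial import legendre

    def center(x):
        # Centering, eq. (2): deviations of x from its mean.
        return x - x.mean()

    def z_transform(x):
        # Z[-1,1] transformation, eq. (3): linear map of x onto [-1, 1].
        return (2.0 * x - x.max() - x.min()) / (x.max() - x.min())

    def legendre_columns(z, degree):
        # Columns P0(z), P1(z), ..., P_degree(z) of the Legendre polynomials.
        return legendre.legvander(z, degree)

    def orthogonalize_columns(X):
        # Sequential orthogonalization, eq. (4): each column is replaced by
        # its residual after projecting onto the columns already processed,
        # so the resulting Z'Z is diagonal.
        Z = np.array(X, dtype=float)
        for j in range(1, Z.shape[1]):
            Q = Z[:, :j]
            Z[:, j] -= Q @ np.linalg.solve(Q.T @ Q, Q.T @ Z[:, j])
        return Z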

IV. MEASURING MULTICOLLINEARITY

A. VIFs (Variance Inflation Factors)

The VIFs show the effects of the independent variables in a multiple linear regression model. They provide an index that measures how much the variance of an estimator is inflated because of multicollinearity. The VIFs can be obtained from

    VIF_j = 1 / (1 - R_j^2)    (5)

where R_j^2 is the multiple coefficient of determination obtained when the j-th predictor variable is regressed against all the remaining predictor variables x_i (i ≠ j). Note that VIFs greater than 5 or 10 indicate that correlations among the independent variables have occurred [6].

V. CONDITION NUMBER

Multicollinearity can also be measured by an eigensystem analysis of X'X, using the condition number K(X) given by

    K(X) = λ_max / λ_min    (6)

where λ_max is the maximum eigenvalue of X'X and λ_min is the minimum eigenvalue of X'X. Note that a K(X) greater than 1,000 indicates severe multicollinearity [6].

Both K(X) and the VIFs can diagnose the collinearity among independent variables, but there are no well-established threshold values for these indicators to identify a harmful level of collinearity; therefore, they cannot be considered exact quantitative measures of collinearity [7].

VI. THE MODEL EVALUATION

The performance of the models was compared using the mean square error of the predicted values of the dependent variable,

    MSE(Ŷ) = E(Ŷ - Y)^2    (7)

which can be estimated by

    MSE(Ŷ) = Σ_{i=1}^{n} (y_i - ŷ_i)^2 / n = Σ_{i=1}^{n} ê_i^2 / n    (8)
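The diagnostics of equations (5), (6) and (8) admit short implementations. The helpers below are our own sketch; they assume a design matrix X whose first column is the intercept.

    import numpy as np

    def vifs(X):
        # VIF_j = 1/(1 - R_j^2), eq. (5): regress each non-intercept column
        # of X on all the remaining columns.
        out = []
        for j in range(1, X.shape[1]):
            others = np.delete(X, j, axis=1)
            beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
            resid = X[:, j] - others @ beta
            tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
            r2 = 1.0 - resid @ resid / tss
            out.append(1.0 / (1.0 - r2))
        return np.array(out)

    def condition_number(X):
        # K(X) = lambda_max/lambda_min of X'X, eq. (6). The eigenvalues of
        # X'X are the squared singular values of X, so working with the SVD
        # of X avoids forming the ill-conditioned X'X explicitly.
        s = np.linalg.svd(X, compute_uv=False)
        return (s.max() / s.min()) ** 2

    def mse(y, y_hat):
        # Estimated mean square error of prediction, eq. (8).
        return np.mean((y - y_hat) ** 2)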

VII. A POLYNOMIAL REGRESSION MODEL FOR THE HEAT CAPACITY OF SOLID HYDROGEN BROMIDE (HBr)

In most cases, the relationship between the heat capacity of a substance and temperature is curvilinear. The data in Table I give the heat capacity of solid hydrogen bromide against temperature, which was fitted with a polynomial regression model of up to fourth order.

    TABLE I
    THE DATA OF SOLID HYDROGEN BROMIDE: HEAT CAPACITY (Cp) AND TEMPERATURE
    *Data from the work of Giauque and Wiebe in 1928.

VIII. RESULTS

The fitted curve between the heat capacity and temperature shows that the relationship in the data is curvilinear, so we modeled the polynomial with degrees up to fourth order (see Fig. 1).

    Fig. 1 Scatter plot of heat capacity vs. temperature

In analyzing the data, we applied the four transformations of the independent variable. The results were that the Z[-1,1] transformation had the smallest condition number, followed by the Z[-1,1] transformation with Legendre polynomials, while the untransformed independent variable had the largest condition number. The differences in condition number are caused by the large observed values and the eigenvalues of the X'X matrix: when the smallest eigenvalue is not close to zero, the condition number stays small as well. A higher condition number indicates stronger effects of multicollinearity, as shown in Table II.

    TABLE II
    THE CONDITION NUMBERS OF THE X'X MATRIX

    Transformation                                       Condition number
    Independent variable (untransformed)                 6.5 x 10^8
    Centering                                            x 10^10
    Z[-1,1] transformation
    Z[-1,1] transformation with Legendre polynomials
    Orthogonal matrix                                    .858 x 10^7

The models selected by stepwise regression were

    Model 1 (untransformed):     Ŷi = ... + 0.47xi + ...xi^3 + ...xi^4    (9)
    Model 2 (centering):         Ŷi = ... + ...x_new,i + ...x_new,i^4    (10)
    Model 3 (Z[-1,1]):           Ŷi = ... + ...zi + ...zi^2 + ...zi^3    (11)
    Model 4 (Z[-1,1], Legendre): Ŷi = ... + ...P1(zi) + ...P2(zi) + ...P3(zi)    (12)
    Model 5 (orthogonal matrix): Ŷi = ... + ...zi + ...zi^2 + ...zi^3    (13)

Considering the VIF values, we ordered the terms among the models. Model 1 had the highest VIFs, because the terms of its polynomial were not transformed and suffered from high collinearity among the terms. Model 5 had the lowest, since each term was orthogonal, that is, each term of the polynomial is exactly linearly independent, as shown in Table III. It is noted that in Model 4, the model from the Z[-1,1] transformation with Legendre polynomials, the regressors are orthogonal polynomials; its VIF values were quite good, but not equal to 1, which suggests there is a little correlation among the terms in the polynomial model.

    TABLE III
    THE VIF VALUES OF THE MODELS

The results of the modeling for all five models, the coefficients of determination and the effects of selecting the degree of the polynomial with stepwise regression, are shown in Table IV.

    TABLE IV
    THE COEFFICIENT OF DETERMINATION AND MEAN SQUARE ERRORS OF THE MODELS

    Model      R^2        MSE(Ŷ)
    Model 1    99.99%
    Model 2    99.00%
    Model 3
    Model 4
    Model 5

Consequently, the mean square errors of the models are also shown in Table IV. Models 3, 4 and 5 had small mean square errors, while the highest was that of Model 2. Although the mean square errors of prediction of Models 3, 4 and 5 were equal, and they had the same polynomial order and the same predicted values, the variances of the estimated parameters were different. The estimator variances of Model 5 were the smallest, as shown in Table V. Thus Model 5 can predict the values of the dependent variable with the highest accuracy and precision of all five models.

    TABLE V
    THE STANDARD ERRORS OF THE REGRESSION COEFFICIENTS OF MODELS 3, 4 AND 5
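The qualitative comparison of Table II can be reproduced in a few lines with the helpers sketched earlier. The temperature grid below is a hypothetical stand-in for the HBr measurements, so only the ordering of the condition numbers is meaningful, and a QR factorization stands in for the column-by-column orthogonalization of eq. (4).

    import numpy as np
    from numpy.polynomial import legendre

    x = np.linspace(118.0, 160.0, 18)     # hypothetical evenly spaced temperatures
    z = (2 * x - x.max() - x.min()) / (x.max() - x.min())   # eq. (3)

    designs = {
        "untransformed":      np.vander(x, 5, increasing=True),
        "centering":          np.vander(x - x.mean(), 5, increasing=True),
        "Z[-1,1]":            np.vander(z, 5, increasing=True),
        "Z[-1,1] + Legendre": legendre.legvander(z, 4),
        # Q of a QR factorization has orthonormal columns.
        "orthogonal matrix":  np.linalg.qr(np.vander(x, 5, increasing=True))[0],
    }
    for name, X in designs.items():
        s = np.linalg.svd(X, compute_uv=False)   # eigenvalues of X'X are s**2
        print(f"{name:20s} K(X'X) = {(s.max() / s.min()) ** 2:.3e}")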

IX. CONCLUSION

It was found that, among the four transformations of the independent variable, the fourth-order polynomial regression model for the relationship between the heat capacity of solid hydrogen bromide and temperature should use the transformation of the design matrix into an orthogonal matrix, which makes the terms of the polynomial linearly independent. It also provides an error of estimation for the dependent variable that is at least as small as those of the other transformations. Highly accurate predicted values are obtained, whereas a model that retains high collinearity among the terms of the polynomial also has the highest mean square error.

ACKNOWLEDGMENTS

Thanks to the professors of the Department of Statistics for their guidance on this paper. We are grateful to the Graduate School of Chiang Mai University, the Faculty of Science, the Department of Statistics, and the Science Achievement Scholarship of Thailand for supporting this research.

REFERENCES

[1] T. Veraaworn, Linear Model (Theory and Application). Bangkok: Vittayawat, 1998.
[2] G. A. F. Seber and A. J. Lee, Linear Regression Analysis, 2nd ed. New York: John Wiley & Sons, 2003.
[3] R. A. Bradley and S. S. Srivastava, "Correlation in polynomial regression," The American Statistician, vol. 33, pp. 11-14, 1979.
[4] M. Shacham and N. Brauner, "Minimizing the effects of collinearity in polynomial regression," Ind. Eng. Chem. Res., vol. 36, pp. 4405-4412, 1997.
[5] G.-L. Tian, "The comparison between polynomial regression and orthogonal polynomial regression," Statistics & Probability Letters, vol. 38, 1998.
[6] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 4th ed. New York: John Wiley & Sons, 2006.
[7] N. Brauner and M. Shacham, "Identifying and removing sources of imprecision in polynomial regression," Mathematics and Computers in Simulation, vol. 48, pp. 75-91, 1998.
[8] A. C. Rencher and G. B. Schaalje, Linear Models in Statistics, 2nd ed. New York: John Wiley & Sons, 2008.
[9] N. R. Draper and H. Smith, Applied Regression Analysis, 3rd ed. New York: John Wiley & Sons, 1998.
[10] S. Weisberg, Applied Linear Regression, 3rd ed. New York: John Wiley & Sons, 2005.
