Efficient Choice of Biasing Constant for Ridge Regression


Int. J. Contemp. Math. Sciences, Vol. 3, 2008, 527-536

Efficient Choice of Biasing Constant for Ridge Regression

Sona Mardikyan* and Eyüp Çetin

*Department of Management Information Systems, School of Applied Disciplines, Boğaziçi University, Istanbul, Turkey, mardikya@boun.edu.tr

Department of Quantitative Methods, School of Business Administration, Istanbul University, Istanbul, Turkey, eycetin@istanbul.edu.tr

Abstract. This paper develops a mathematical programming model with two main objectives, minimizing the variance inflation factors (VIF) and maximizing the coefficient of determination, to yield an efficient biasing constant for ridge regression, which is widely used for the multicollinearity problem. The multiobjective structure is handled by means of goal programming. The model, which also allows analysts preemptive planning, is analytical rather than trial and error, which is often the case in determining the biasing constant for ridge regression. The approach lets decision makers move in an efficient region for choosing the constant via efficient frontiers stemming from the nonlinear goal programming model. The derived model, requiring only a few inputs, may be embedded in any appropriate statistical software, even though it is implemented on MS Excel for a published data set in this study. The results, including those from many other data sets, are impressive and in accordance with the literature. The recipe may be a useful tool for ridge regression analysis for both researchers and industry.

Keywords: Multi-objective and goal programming; Efficient frontier; Multicollinearity; Variance inflation factors; Coefficient of determination; Spreadsheet modeling

Mathematics Subject Classification: 90C29; 90C99; 62J99; 68U99

1. Introduction

Ridge regression, first introduced by Hoerl and Kennard [5, 6], is one of the most popular methods suggested for the multicollinearity problem. In the presence of multicollinearity, the unbiased ordinary least squares (OLS) estimators can become very unstable owing to their large variances, which leads to poor prediction. Ridge regression allows biased estimators of the regression coefficients by modifying the least squares method to attain a substantial reduction in variance, with an accompanying increase in the stability of these coefficients. The method is thoroughly discussed and applied in the literature (see Belsley et al. [2]; Draper and Smith [3]; Hawkins and Yin [4]; Kleinbaum et al. [7]; Maddala [9]; Marquardt and Snee [11]; Myers [12]; Neter et al. [13]; Ngo et al. [14]).

The vector of ridge regression estimators $b_R$ is obtained by adding a small positive biasing constant (shrinkage parameter) $c \geq 0$ to the least squares normal equations:

$$b_R = (X'X + cI)^{-1} X'Y \qquad (1)$$

Here $X$ is the $n \times p$ matrix of the $n$ observed values of the explanatory variables $X_1, X_2, \ldots, X_p$, $Y$ is the $n \times 1$ vector of sample values of the explained variable, and $I$ is the $p \times p$ identity matrix. Because of multicollinearity and the dissimilar orders of magnitude of the original variables, equation (1) requires inverting a near-singular matrix. The centering and scaling transformations are applied to the variables to handle this problem, so that $X'X$ becomes the correlation matrix. Clearly, when $c = 0$ in equation (1), the OLS estimators are recovered. As $c$ increases, the ridge regression estimators are biased but more precise than the OLS estimators, and hence closer to the true parameters. Hoerl and Kennard [5, 6] showed that if $c$ is chosen small enough that the mean squared error of the ridge estimator is below that of ordinary least squares, the ridge regression procedure is successful and $b_R$ becomes stable. The difficulty of the method is that the efficient value of $c$ is not known and varies from one application to another.
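To make equation (1) concrete, the following sketch (ours, not the authors'; Python with numpy, and hypothetical function names) computes $b_R$ after centering and scaling, so that $X'X$ is the correlation matrix of the predictors:

```python
import numpy as np

def correlation_transform(X, y):
    """Center and scale X and y so that X'X equals the correlation matrix."""
    n = len(y)
    Xs = (X - X.mean(axis=0)) / (np.sqrt(n - 1) * X.std(axis=0, ddof=1))
    ys = (y - y.mean()) / (np.sqrt(n - 1) * y.std(ddof=1))
    return Xs, ys

def ridge_coefficients(Xs, ys, c):
    """Equation (1): b_R = (X'X + cI)^{-1} X'Y on the transformed data.
    With c = 0 this returns the OLS estimates of the standardized model."""
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + c * np.eye(p), Xs.T @ ys)
```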

There are a number of ways to determine the biasing constant $c$. The most common is based on the ridge trace and the variance inflation factors (VIF). The ridge trace is a plot of the estimated ridge parameters for different values of $c$, usually between 0 and 1. The VIF measures the degree of multicollinearity in the model and is computed as

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \qquad (2)$$

where $R_j^2$ is the coefficient of determination in the regression of the explanatory variable $X_j$ on the remaining explanatory variables of the model. The variance of the $j$th regression coefficient is proportional to $\mathrm{VIF}_j$:

$$\operatorname{var}(b_{Rj}) = (\sigma')^2 \, \mathrm{VIF}_j \qquad (3)$$

where $(\sigma')^2$ is the error term variance of the transformed model. As a rule of thumb, if any VIF exceeds 10, the corresponding variable is said to be highly collinear. A common strategy (Belsley et al. [2]; Draper and Smith [3]; Myers [12]; Neter et al. [13]) for determining $c$ is to examine the ridge trace and the VIF values simultaneously and to choose the smallest $c$ at which the regression coefficients first become stable with the lowest VIF values. At this point, the corresponding value of the coefficient of determination of the model ($R^2$) can also be taken into consideration. Other procedures (Nordberg [15]; Lee [8]) also exist and give approximately efficient values for $c$. One of them, the df-trace criterion (Tripp [16]), is based on the effective regression degrees of freedom (df):

$$\mathrm{df} = \operatorname{tr}\!\left(X (X'X + cI)^{-1} X'\right) \qquad (4)$$

The procedure involves plotting df against $c$ and choosing the $c$ at which df becomes stable.
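Both diagnostics can be evaluated directly for any trial value of $c$. A minimal sketch under the same conventions as above ($r$ denotes the correlation matrix of the predictors; the ridge VIFs are the diagonal of $(r + cI)^{-1} r (r + cI)^{-1}$, which reduces to equation (2) at $c = 0$):

```python
import numpy as np

def ridge_vif(r, c):
    """Ridge VIFs: diagonal of (r + cI)^{-1} r (r + cI)^{-1}.
    At c = 0 this is diag(r^{-1}), i.e. VIF_j = 1 / (1 - R_j^2)."""
    A = np.linalg.inv(r + c * np.eye(r.shape[0]))
    return np.diag(A @ r @ A)

def df_trace(Xs, c):
    """Equation (4): df = tr(X (X'X + cI)^{-1} X')."""
    p = Xs.shape[1]
    return np.trace(Xs @ np.linalg.solve(Xs.T @ Xs + c * np.eye(p), Xs.T))
```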

In this study, we have developed a goal programming model that determines an efficient value of the biasing constant $c$ based on two criteria: the minimization of the variance inflation factors and the maximization of the coefficient of determination $R^2$ of the model. The proposed model may be clearly superior in the sense that it gives an efficient value of $c$ much closer to the values discussed in the literature, in just one attempt. The paper is organized in the following way: the model is formulated, computational results are given for a published data set, and finally some concluding remarks are discussed.

2. Developing the Model

Let $j = 1, 2, 3, \ldots, p$ index the independent variables and let $c$ be the biasing constant for a ridge regression analysis. We have two primary objectives: the minimization of the variance inflation factors and the maximization of the $R^2$ value of the model. These goals are generally competitive, so there is a tradeoff between them. This competitive structure may be modeled by means of goal programming, a kind of multiobjective programming (see Winston [17] and Winston and Venkataramanan [18] for more discussion of goal programming). The proposed goal program can be formulated as follows.

The objective function of the model is the minimization of the weighted sum of the appropriate deviations from the aspiration levels, as is classical in goal programming. The individual VIF values of the explanatory variables are treated as separate goals. If the priority coefficients of the VIF goals are $P_j$ and that of the coefficient of determination goal is $P_{p+1}$, the objective function is

$$\text{Minimize } \sum_{j=1}^{p} P_j d_j^+ + P_{p+1} d_{p+1}^- \qquad (5)$$

where $d_j^+$ and $d_{p+1}^-$ are the positive and negative deviations from the aspiration levels, respectively. Since the VIF goals are to be minimized whereas the $R^2$ goal is to be maximized, the positive and the negative deviations, respectively, enter the objective. The group of goal constraints for the VIF objectives can be stated, for aspiration levels $\rho_j$, as

$$\mathrm{VIF}_j - d_j^+ + d_j^- = \rho_j \qquad (6)$$

where $\mathrm{VIF}_j$ is the $j$th diagonal element of the matrix $(r + cI)^{-1} r (r + cI)^{-1}$ and $r$ is the correlation matrix of the explanatory variables. The other goal constraint imposes that the coefficient of determination equal 1 (in fact, the aspiration level $\rho_{p+1}$) within the allowed deviations:

$$R^2 - d_{p+1}^+ + d_{p+1}^- = 1 \qquad (7)$$

where $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$ is the well-known coefficient of determination, with observation index $i$.

There are some basic system constraints. First, the biasing constant, which is a decision variable, is generally bounded below by 0 and above by 1:

$$c \geq 0 \quad \text{and} \quad c \leq 1 \qquad (8)$$

Second, the deviation variables of each goal cannot both be positive:

$$d_j^+ d_j^- = 0 \quad \text{and} \quad d_{p+1}^+ d_{p+1}^- = 0 \qquad (9)$$

Finally, nonnegativity of the remaining decision variables completes the mathematical model:

$$d_j^+, \; d_j^-, \; d_{p+1}^+, \; d_{p+1}^- \geq 0 \qquad (10)$$

The foregoing model is a nonlinear goal programming model. The desired (aspiration) values may be chosen arbitrarily, but the literature suggests that a desired VIF value of 1 is sufficient (Neter et al. [13]). The priorities may be specified by the decision maker through the priority constants. If all goals are nonpreemptive, which is the general case, all the constants should be equal, for example to 1. The solution of the model yields a biasing constant $c$ that is efficient for all objectives. The model may easily be optimized by means of MS Excel, which facilitates matrix operations in addition to providing its optimization tool, the Solver (see Winston and Albright [19] and Albright et al. [1] for details of spreadsheet modeling and optimization).
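In the nonpreemptive case with fixed aspiration levels, every deviation variable in (6)-(10) is determined by $c$ alone, so the goal program collapses to a one-dimensional minimization over $c \in [0, 1]$. A sketch of this reduction (our simplification for illustration, not the authors' Excel/Solver implementation), reusing ridge_vif and ridge_coefficients from the earlier sketches:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gp_objective(c, Xs, ys, rho, P, P_r2):
    """Objective (5): weighted positive VIF deviations d_j^+ plus the
    negative R^2 deviation d_{p+1}^- implied by a given c."""
    r = Xs.T @ Xs                        # correlation matrix of predictors
    b = ridge_coefficients(Xs, ys, c)
    r2 = 1.0 - np.sum((ys - Xs @ b) ** 2) / np.sum((ys - ys.mean()) ** 2)
    d_plus = np.maximum(ridge_vif(r, c) - rho, 0.0)   # d_j^+ from (6)
    d_minus = max(1.0 - r2, 0.0)                      # d_{p+1}^- from (7)
    return P @ d_plus + P_r2 * d_minus

def efficient_c(Xs, ys, rho, P=None, P_r2=1.0):
    """Search c in [0, 1] (constraint (8)) for the efficient value."""
    P = np.ones(Xs.shape[1]) if P is None else P
    result = minimize_scalar(gp_objective, bounds=(0.0, 1.0), method="bounded",
                             args=(Xs, ys, np.asarray(rho, float), P, P_r2))
    return result.x
```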

3. Computational Results

For a typical numerical example we use data given by Neter et al. [13], with 54 observations on one explained and three explanatory variables. Various diagnostics show the existence of severe multicollinearity in the OLS model. As a formal measure, the VIF values of the three variables are computed; they greatly exceed 10 ($\mathrm{VIF}_1 = 708.84$, $\mathrm{VIF}_2 = 564.34$, $\mathrm{VIF}_3 = 104.6$), which again indicates a serious multicollinearity problem. To combat the problem, all the variables are first centered and scaled, and then ridge regression is applied to the data using our goal programming model. As stated before, the aim is to determine the efficient value of $c$ from which the ridge regression model can be specified. The model is run with the MS Excel Solver for a sequence of equal aspiration levels for the VIF goals (from 0 to 100), and the efficient $c$ values with the corresponding VIF, $R^2$, $b_R$ and df values are summarized in Table 1. For aspiration level 1, the efficient value of $c$ is found to be 0.02, which is exactly the efficient value proposed in the literature. For this efficient $c$, all the VIF values are nearly 1, $R^2$ is obtained as 0.78, which may be sufficient for this coefficient, and the corresponding ridge regression parameters are obtained.
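A sweep over equal aspiration levels, of the kind that produced Table 1, can then be scripted along the following lines (a usage sketch continuing the code above; the arrays X and y are assumed to hold the raw data):

```python
# Sweep equal aspiration levels for the three VIF goals and report the
# efficient c together with the resulting diagnostics (cf. Table 1).
Xs, ys = correlation_transform(X, y)
r = Xs.T @ Xs
for level in [1, 2, 4, 8, 20, 100]:                  # a representative grid
    c = efficient_c(Xs, ys, rho=[level] * Xs.shape[1])
    print(level, round(c, 4), np.round(ridge_vif(r, c), 3),
          round(df_trace(Xs, c), 3))
```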

ρ (VIF goals)   c       VIF_1     VIF_2    VIF_3    R^2      b_R1     b_R2      b_R3      df
0               0.07    0.36      0.7      0.68     0.80     0.30     -0.006    .57       –
1               0.02    1.000     0.998    0.993    0.78     0.538    0.384     -0.33     .000
2               0.04    2.000     .797     .53      0.78     0.605    0.38      -0.60     .09
4               0.009   4.000     3.39     .454     0.783    0.694    0.5       -0.95     .059
6               0.007   6.000     4.984    .750     0.784    0.76     0.9       -0.       .079
8               0.006   8.000     6.576    .045     0.785    0.87     0.43      -0.4      .095
10              0.005   10.000    8.68     .339     0.785    0.866    0.099     -0.6      .09
12              0.005   12.000    9.760    .63      0.785    0.90     0.060     -0.78     .
20              0.004   20.000    6.7      3.805    0.787    .058     -0.070    -0.334    .6
100             0.00    100.000   79.796   5.56     0.793    .859     -0.784    -0.64     .373

Table 1. The variation of $c$ and of the corresponding VIF, $R^2$, $b_R$ and df values for different equal aspiration levels for the VIF goals.

A typical ridge trace for the data is shown in Figure 1; from this ridge trace, the $c$ values up to about 0.05 seem promising.

[Figure 1. Ridge trace of the estimated regression coefficients for 0 ≤ c ≤ 0.05.]

Figure 2 traces the VIF values over the initial range of $c$. It is evident that at $c = 0.02$ the VIF values become stable and very close to 1.

[Figure 2. VIF values versus c.]

As mentioned before, a formal criterion for determining the efficient $c$ is to compute the df values as $c$ varies from 0 to 1. In Figure 3, the df values are plotted for $c$ between 0 and 0.1, and it is again clear that the smallest value of $c$ at which the df values become stable is around 0.05.

[Figure 3. df values versus c.]

By the nature of multiobjective programming there is a trade-off between the competing objectives, in this case maximizing $R^2$ versus minimizing the VIF values, and trade-off curves can be generated by parametric variation of the aspiration levels associated with the VIF goals.

For the data at hand, the efficient frontiers of $R^2$ versus the VIF values are plotted in Figure 4. The graphical structure is as theory predicts for two vying objectives: one is minimized where the other is maximized. The trade-off curves let the decision maker move efficiently within a bounded region of the VIF and $R^2$ axes. The efficient point for a decision maker is the point at which her indifference curve intersects the efficient frontier.

[Figure 4. The efficient frontiers: R² versus VIF_1, VIF_2 and VIF_3.]

The foregoing alternative methodologies for obtaining $c$ are, in this case, in agreement with the developed goal programming model. Moreover, many trials on different data sets showed that our model offers reliable and accurate results consistent with the literature.

4. Conclusion

This study proposes a mathematical programming model to obtain the biasing constant for ridge regression analytically, rather than by a process of trial and error. The model has two main competing objectives: minimizing the VIF values and maximizing the $R^2$ value. The multiobjective structure is handled via goal programming. The recipe, balancing the objectives analytically, yields an efficient biasing constant $c$ for ridge regression models in a single step. From a computational point of view, this road map may render the ridge trace, the df-trace, and other investigations for determining the biasing constant redundant, since it yields an efficient value directly. The presented mathematical programming model may also be employed by means of MS Excel as a powerful spreadsheet tool, and the methodology lets analysts specify the desired aspiration levels. Moreover, parametric runs of the model can generate efficient frontiers (trade-off curves) for the competition between $R^2$ and the VIFs, which provide analysts with an efficient move within some bounds.

Furthermore, the model may be used for preemptive planning simply by assigning different weights to the priority constants. The algorithm, which requires only a few input parameters, may be integrated into any appropriate statistical software package, such as SPSS (at least via macros) or recently developed statistical software (Mardikyan and Darcan [10]), as future research. Further research might also extend the mathematical formulation to include other convenient criteria. The analytical model, an application of operations research to statistics, may be a useful tool for the efficient choice of the biasing constant for both researchers and industry.

References

[1] S.C. Albright, W.L. Winston and C. Zappe, Data Analysis & Decision Making with Microsoft Excel (3rd edn), Thomson South-Western, 2006.

[2] D.A. Belsley, E. Kuh and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley, New York, 1980.

[3] N.R. Draper and H. Smith, Applied Regression Analysis, Wiley, New York, 1998.

[4] D.M. Hawkins and X. Yin, A faster algorithm for ridge regression, Computational Statistics & Data Analysis, 40 (2002), 253-262.

[5] A.E. Hoerl and R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12 (1970a), 55-67. (Republished in Technometrics, 42 (2000), 80-86.)

[6] A.E. Hoerl and R.W. Kennard, Ridge regression: applications to nonorthogonal problems, Technometrics, 12 (1970b), 69-82.

[7] D.G. Kleinbaum, L.L. Kupper, K.E. Muller and A. Nizam, Applied Regression Analysis and Other Multivariable Methods (3rd edn), Duxbury Press, Washington, 1998.

[8] T.S. Lee, Optimum ridge parameter selection, Applied Statistics, 36(1) (1988), 112-118.

[9] G.S. Maddala, Introduction to Econometrics (2nd edn), Macmillan, New York, 1992.

[10] S. Mardikyan and O.N. Darcan, A software tool for regression analysis and its assumptions, Information Technology Journal, 5(5) (2006), 884-891.

[11] D.W. Marquardt and R.D. Snee, Ridge regression in practice, The American Statistician, 29(1) (1975), 3-20.

[12] R.H. Myers, Classical and Modern Regression with Applications (2nd edn), Duxbury Press, Washington, 1990.

[13] J. Neter, W. Wasserman and M.H. Kutner, Applied Linear Regression Models, Richard D. Irwin, Inc., Illinois, 1983.

[14] S.H. Ngo, S. Kemény and A. Deák, Performance of the ridge regression method as applied to complex linear and nonlinear models, Chemometrics and Intelligent Laboratory Systems, 67 (2003), 69-78.

[15] L. Nordberg, A procedure for determination of a good ridge parameter in linear regression, Communications in Statistics – Simulation and Computation, 11(3) (1982), 285-289.

[16] R.E. Tripp, Non-stochastic ridge regression and effective rank of the regressors matrix, Ph.D. thesis, Department of Statistics, Virginia Polytechnic Institute and State University, 1983.

[17] W.L. Winston, Operations Research: Applications and Algorithms (4th edn), Brooks/Cole, Belmont, 2004.

[18] W.L. Winston and M. Venkataramanan, Introduction to Mathematical Programming (4th edn), Brooks/Cole, USA, 2003.

[19] W.L. Winston and S.C. Albright, Practical Management Science (2nd edn), Brooks/Cole, Pacific Grove, 2001.

Received: October 4, 2007