T.C. SELÇUK ÜNİVERSİTESİ FEN BİLİMLERİ ENSTİTÜSÜ


T.C. SELÇUK ÜNİVERSİTESİ FEN BİLİMLERİ ENSTİTÜSÜ

LIU TYPE LOGISTIC ESTIMATORS

Yasin ASAR

DOKTORA TEZİ

İstatistik Anabilim Dalı

Ocak-2015, KONYA

Her Hakkı Saklıdır


ÖZET

DOKTORA TEZİ

LİU TİPİ LOJİSTİK REGRESYON TAHMİN EDİCİLERİ (Liu Type Logistic Regression Estimators)

Yasin ASAR

Selçuk Üniversitesi Fen Bilimleri Enstitüsü, İstatistik Anabilim Dalı

Advisor: Prof. Dr. Aşır GENÇ

2015, 99 pages

Jury: Prof. Dr. Aşır GENÇ, Doç. Dr. Coşkun KUŞ, Doç. Dr. Murat ERİŞOĞLU, Doç. Dr. İsmail KINACI, Yrd. Doç. Dr. Aydın KARAKOCA

In binary logistic regression models, the multicollinearity problem inflates the variance of the maximum likelihood estimator and degrades the performance of that estimator. For this reason, estimators proposed to remedy the multicollinearity problem in linear models have been generalized to logistic regression. In this thesis, the logistic versions of some biased estimators are reviewed. Moreover, a new generalization is made and its performance is compared with the earlier estimators in terms of the MSE criterion. Since the newly proposed estimator has two parameters, an iterative method is proposed for selecting these parameters.

Keywords: Binary logistic regression, multicollinearity, mean squared error, Monte Carlo simulation, MSE, biased estimator.

ABSTRACT

Ph.D. THESIS

LIU TYPE LOGISTIC ESTIMATORS

Yasin ASAR

THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCE OF SELÇUK UNIVERSITY

THE DEGREE OF DOCTOR OF PHILOSOPHY IN STATISTICS

Advisor: Prof. Dr. Aşır GENÇ

2015, 99 Pages

Jury: Prof. Dr. Aşır GENÇ, Assoc. Prof. Dr. Coşkun KUŞ, Assoc. Prof. Dr. Murat ERİŞOĞLU, Assoc. Prof. Dr. İsmail KINACI, Asst. Prof. Dr. Aydın KARAKOCA

The multicollinearity problem inflates the variance of the maximum likelihood estimator and affects the performance of this estimator negatively in binary logistic regression. Thus, biased estimators proposed to overcome this problem in the linear model have been generalized to logistic regression. Some of these estimators are reviewed in this thesis. Moreover, a new generalization is proposed to overcome multicollinearity, and the performances of the estimators are compared in the sense of the MSE criterion. Since the new estimator has two parameters, an iterative method is proposed to choose these parameters.

Keywords: Biased estimator, binary logistic regression, mean squared error, Monte Carlo simulation, MSE, multicollinearity

PREFACE

First and foremost I want to thank my advisor Prof. Dr. Aşır Genç for his support and guidance, and for his patience and kindness. It has been an honor to be his Ph.D. student. I appreciate all his contributions to making my Ph.D. experience productive and stimulating; he has been very inspiring for me. I would also like to thank my committee members Assoc. Prof. Dr. Coşkun Kuş and Assist. Prof. Dr. Aydın Karakoca for their guidance. I am also grateful to Assoc. Prof. Dr. Murat Erişoğlu, as he was always there to help. His guidance and helpful contributions made the thesis more comprehensive. I also appreciate his theoretical contributions to the thesis.

Lastly and most importantly, I would like to express immense appreciation and love to my wife and daughter. Without them, I simply would not be here. I owe them my success. Thank you for being so patient and kind.

Yasin ASAR
KONYA-2015

CONTENTS

ÖZET
ABSTRACT
PREFACE
CONTENTS
SYMBOLS AND ABBREVIATIONS
1. INTRODUCTION
1.1. Review of the Literature
2. MULTICOLLINEARITY
2.1. Definition of Multicollinearity
2.1.1. A graphical representation of the problem
2.2. Sources of Multicollinearity
2.3. Consequences of Multicollinearity
2.4. Multicollinearity Diagnostics
2.5. Some Methods to Overcome Multicollinearity
3. BIASED ESTIMATORS IN LINEAR MODEL
3.1. Ridge Estimator
3.1.1. MSE properties of ridge estimator
3.1.2. Selection of the parameter k
3.1.3. New proposed ridge estimators
3.1.4. Comparison of the ridge estimators: A Monte Carlo simulation study
3.1.5. Some conclusive remarks regarding ridge regression and simulation
3.2. Liu Estimator
3.3. Liu Type Estimator
3.4. Another Two Parameter Liu Type Estimator
3.5. Two Parameter Ridge Estimator
4. BIASED ESTIMATORS IN LOGISTIC MODEL: NEW METHODS
4.1. Logistic Ridge Estimator
4.1.1. MMSE comparison of MLE and LRE
4.2. Logistic Liu Estimator
4.2.1. MMSE comparison of MLE and LLE
4.3. Logistic Liu-Type Estimator
4.3.1. MMSE and MSE comparisons of MLE and LT1
4.3.2. A Monte Carlo simulation study
4.3.3. Results and discussions regarding the simulation study
4.3.4. Summary and conclusion
4.4. Another Logistic Liu-Type Estimator
4.4.1. MMSE and MSE comparisons of MLE and LT2
4.4.2. Proposed estimators of the shrinkage parameter d
4.4.3. A Monte Carlo simulation study
4.4.4. The results of the Monte Carlo simulation
4.4.5. An application of real data
4.4.6. Summary and conclusion
5. A NEW BIASED ESTIMATOR FOR THE LOGISTIC MODEL
5.1. MMSE Comparisons Between The Estimators
5.1.1. The Comparison of MLE and YA
5.1.2. The Comparison of LRE and YA
5.1.3. The Comparison of LLE and YA
5.1.4. The Comparison of LT1 and YA
5.1.5. The Comparison of LT2 and YA
5.2. Selection Processes of the Parameters d, k and q
5.3. Design of the Monte Carlo Simulation Study
5.4. Results of the Simulation and Discussions
5.5. Real Data Application 1
5.6. Real Data Application 2
5.7. Some Conclusive Remarks
6. CONCLUSION AND SUGGESTIONS
6.1. Conclusion
6.2. Suggestions
REFERENCES
CV

SYMBOLS AND ABBREVIATIONS

Symbols

$Bias(\cdot)$ : Bias of the vector
$diag(X)$ : Diagonal of the matrix $X$
$E(\cdot)$ : Expected value of the random variable
$Var(\cdot)$ : Variance-covariance matrix of the random variable
$Cov(X, Y)$ : Covariance of the random variables $X$ and $Y$
$rank(X)$ : Rank of the matrix $X$
$tr(X)$ : Trace of the matrix $X$
$a \perp b$ : Vector $a$ is orthogonal to vector $b$
$\sigma^2$ : Variance of the error terms in the linear regression model
$I_n$ : $n \times n$ identity matrix
$\varepsilon \sim N(0, \sigma^2 I_n)$ : The vector $\varepsilon$ is distributed normally with zero mean and variance $\sigma^2 I_n$
$X = (x_{ij})$ : Another representation of the matrix $X$ such that $x_{ij}$ is the element placed in the $i$th row and $j$th column of the matrix $X$
$|X|$ : Determinant of the matrix $X$
$X^{-1}$ : Inverse of the matrix $X$
$X'$ : Transpose of the matrix $X$
$\|\cdot\|$ : Euclidean norm of the vector

Abbreviations

p.d. : positive definite
IWLS : Iteratively weighted least squares
LLE : Logistic Liu estimator
LRE : Logistic ridge estimator
LT1 : Liu-type logistic estimator
LT2 : Another Liu-type logistic estimator
MAE : Mean absolute error
MLE : Maximum likelihood estimator
MMSE : Matrix mean square error
MSE : Mean square error
OLS : Ordinary least squares
VIF : Variance inflation factor
YA : New two-parameter estimator

1. INTRODUCTION

Regression analysis is one of the most widely used statistical techniques; it can be applied to investigate and model the relationship between variables. There is a wide range of applications of regression in areas such as engineering, the physical and chemical sciences, economics, management and the social sciences (Montgomery et al., 2001). Multiple linear regression is used to predict the values of a dependent variable when there is more than one explanatory variable (regressor). A multiple linear regression model can be expressed as follows:

$$Y = X\beta + \varepsilon \tag{1.1}$$

where $Y$ is an $n \times 1$ vector of the dependent variable, $X$ is an $n \times p$ data matrix (design matrix) whose columns are the explanatory variables, $\beta$ is a $p \times 1$ vector of unknown regression coefficients and $\varepsilon$ is an $n \times 1$ vector of random errors following the normal distribution with zero mean and variance $\sigma^2 I$, where $I$ is an $n \times n$ identity matrix, i.e., $\varepsilon \sim N(0, \sigma^2 I_n)$.

There are some assumptions on the random error term in multiple linear regression models, as follows:

1. The error terms $\varepsilon_i$, $i = 1, 2, \ldots, n$, are independent of each other and random,
2. $E(\varepsilon_i) = 0$, $i = 1, 2, \ldots, n$,
3. $Var(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$, where $Var(\cdot)$ is the variance function,
4. $Cov(\varepsilon_i, \varepsilon_j) = 0$, $i \neq j$, $1 \leq i, j \leq n$, where $Cov(\cdot,\cdot)$ is the covariance function,
5. $X_j \perp \varepsilon$, where $X_j$ is the $j$th column vector of the matrix $X$ such that $j = 1, 2, \ldots, p$ and $X = (X_1, X_2, \ldots, X_p)$,
6. The elements of the matrix $X$ are constants and $X$ has full rank, i.e., $rank(X) = p$,
7. The matrix $X'X$ is non-singular,
8. $\varepsilon \sim N(0, \sigma^2 I_n)$ holds for the hypothesis tests, i.e., the error terms are random variables following the normal distribution,
9. The model specification is correct.
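To make the notation concrete, the following Python sketch (illustrative only; the sample size, true coefficients and seed are assumed values, not taken from the thesis) generates data from model (1.1) and computes the OLS estimate through the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4

X = rng.normal(size=(n, p))          # design matrix with full rank (assumption 6)
beta = np.ones(p)                    # true coefficient vector (assumed)
eps = rng.normal(scale=0.5, size=n)  # errors: epsilon ~ N(0, sigma^2 I_n)
Y = X @ beta + eps                   # model (1.1)

# OLS estimate: (X'X)^{-1} X'Y, solved without forming the explicit inverse.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_ols)
```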

The above assumptions can be summarized in five basic assumptions (Allison, 1999; Aydın, 2014), namely:

1. $Y = X\beta + \varepsilon$ means that $Y$ is a linear function of $X$ and a random error term,
2. $E(\varepsilon_i) = 0$, $i = 1, 2, \ldots, n$, implies that the error terms have zero mean and, more importantly, that the expected value of $\varepsilon$ does not change with $X$, implying that $X$ and $\varepsilon$ are not correlated,
3. $Var(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$, provides that the variance of $\varepsilon$ is the same for all observations; this property is called homoscedasticity,
4. $Cov(\varepsilon_i, \varepsilon_j) = 0$, $i \neq j$, $1 \leq i, j \leq n$, means that the random errors are independent of each other,
5. $\varepsilon \sim N(0, \sigma^2 I_n)$ means that the error terms are distributed normally.

On the other hand, logistic regression or logit regression is a type of probabilistic statistical classification model (Bishop, 2006). It is also used for predicting the outcome of a categorical dependent variable. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by using probability scores as the predicted values of the dependent variable (Bhandari and Joensson, 2011). Logistic regression can be binomial or multinomial. Binomial (binary) logistic regression is a type of regression analysis where the dependent (response) variable is dichotomous and the explanatory (independent) variables are continuous, categorical or both.

In logistic regression, there is no assumption that the relationship between the independent variables and the dependent variable is linear; thus it is a nonlinear method and a useful tool for analyzing data that include categorical response variables. The binary logistic regression model has become a popular method of analysis in situations where the outcome variable is discrete or dichotomous. Although it originally gained acceptance in the field of epidemiologic research, the method has become commonly employed in applied sciences such as engineering, health policy, biomedical research, business and finance, criminology, ecology, linguistics and biology (Hosmer and Lemeshow, 2000).

It is very important to understand that using the logistic regression model for a dichotomous dependent variable as a predictive model is a better approach than using

the ordinary linear regression. It is known that if the five basic assumptions of the linear regression are satisfied, ordinary least squares (OLS) estimates of the regression coefficients are unbiased and have minimum variance. However, if the dependent variable is assumed to be dichotomous, then it can be observed that if the first and second assumptions are true, the third and fifth ones are definitely false. For example, if the fifth assumption is considered, it is easy to observe that the error terms can only take two values, implying that it is not possible for them to have a normal distribution. If the variance of the errors is considered, then the first assumption implies that the variance of $\varepsilon_i$ equals the variance of $y_i$. Generally, the variance of a dummy variable is $\pi_i(1 - \pi_i)$, where $\pi_i = \pi(x_i)$ is the probability that $y_i = 1$. Thus, the following equation holds: $Var(\varepsilon_i) = \pi_i(1 - \pi_i)$. But this equation says that the variance of $\varepsilon_i$ differs for different observations, which violates the third assumption. For a great discussion of this situation see Allison (1999), Chapter 2.

Of course, after determining the difference between the logistic regression and the linear regression in terms of the choice of parametric model and the assumptions, one can assess the regression analysis using the same general principles used in the linear regression model (Midi et al., 2010). In order for the model to be valid, it has to satisfy the assumptions of logistic regression. The following are the general assumptions involved in logistic regression analysis (UCLA, 2007):

1. The conditional probabilities are a logistic function of the explanatory variables,
2. No important variables are omitted,
3. Irrelevant variables are not included,
4. The explanatory variables are measured without error,
5. The observations of the variables are independent,
6. There is no linear dependency between the explanatory variables,
7. The errors of the model are distributed binomially.

Both in multiple linear regression models and in binary logistic regression models, one may not be able to verify the given assumptions most of the time. Assumptions 5, 6 and 7 of the linear model and the sixth assumption of the logistic model imply that there should be no linear dependency between the explanatory variables or the columns of the

matrix $X$. In other words, the explanatory variables are said to be orthogonal to each other. However, the regressors are not orthogonal in most applications. The multicollinearity (collinearity or ill-conditioning) problem arises when at least one linear function of the explanatory variables is nearly close to zero (Rawlings et al., 1998). The main point is that, if there is collinearity within the explanatory variables, it is hardly possible to obtain statistically good estimates of their distinct effects on some dependent variable. In the next chapter, the causes, results and diagnostics of the multicollinearity problem are given in detail.

In this thesis, the maximum likelihood estimator (MLE) and the following methods proposed to overcome multicollinearity are considered: the ridge estimator, the Liu estimator, Liu-type estimators and a two-parameter ridge estimator. These estimators are successfully used in the linear regression model, and all of them except the two-parameter ridge estimator have recently been adapted to the logistic regression model. In this study, the logistic version of the two-parameter ridge estimator is defined, and the above estimators are compared by using the matrix mean squared error (MMSE) and mean squared error (MSE) criteria in the logistic regression model. Both theoretical comparisons in the sense of MMSE and numerical comparisons via Monte Carlo simulation studies are given. In the simulations, the MSE criterion is used to compare the performances of the estimators.

The purposes of this study are as follows: some new ridge regression estimators are defined for the linear model in Chapter 3, and these new estimators are used successfully in the logistic versions of the above-mentioned estimators in Chapter 4; moreover, some new optimal shrinkage parameters are proposed for use in Liu-type estimators to decrease the variance of the estimator in the logistic regression model in Section 4.4; finally, a new two-parameter ridge estimator is defined for the logistic regression model, and it is shown, by conducting an extensive Monte Carlo simulation, that this new estimator is better than the above estimators in the sense of the MSE criterion.

The organization of the thesis is as follows: In Chapter 2, the formal definition of multicollinearity, a graphical representation of the problem, and the sources and consequences of multicollinearity are given. Moreover, multicollinearity diagnostics are reviewed.

In Chapter 3, biased estimators in the linear model which can be used to solve the collinearity problem are reviewed. The definition of the ridge estimator and its MSE

properties are given. Also, some new ridge estimators are defined and their performances are demonstrated via a Monte Carlo simulation study in Section 3.1. In the remaining parts of Chapter 3, the Liu estimator, the Liu-type estimator, another two-parameter Liu-type estimator and finally a two-parameter ridge estimator are given and some of their properties are reviewed.

In Chapter 4, brief introductory information regarding logistic regression is given. Logistic versions of the mentioned estimators are discussed. In Section 4.1, the logistic ridge estimator is given and the MMSE comparison of the maximum likelihood estimator and the logistic ridge estimator is obtained. In Section 4.2, the logistic version of the Liu estimator, called the logistic Liu estimator, is discussed and the MMSE comparison of MLE and the logistic Liu estimator is obtained. In Section 4.3, the logistic Liu-type estimator is reviewed and the MMSE comparison between MLE and this estimator is obtained. Moreover, a Monte Carlo simulation study is designed to compare some existing ridge estimators and the new estimators defined in Chapter 3 in logistic regression. In Section 4.4, another two-parameter Liu-type estimator is given in logistic regression. Again, MMSE and MSE comparisons are obtained. Moreover, some new shrinkage estimators are defined to be used in the logistic Liu-type estimator, and a Monte Carlo experiment is conducted to evaluate the performances of these estimators. Finally, an application to real data is demonstrated.

In Chapter 5, a new two-parameter logistic ridge estimator is defined. In Section 5.1, MMSE comparisons between the new estimator and the other reviewed estimators are obtained. In Section 5.2, some new iterative selection processes of the parameters are defined. In Section 5.3, a Monte Carlo simulation is designed to compare the performances of all logistic estimators discussed in the thesis. Finally, two real data applications are illustrated.

In the last chapter, a brief summary and conclusion are given and some future suggestions for practitioners are provided.

1.1. Review of the Literature

There are many distribution functions proposed for use in the analysis of a dichotomous dependent variable; see Cox and Snell (1989). However, the logistic distribution, being an extremely flexible and easily used function and providing

clinically meaningful interpretations, has become the popular distribution in this research area.

The weighted sum of squares can be minimized approximately by using the maximum likelihood estimator. However, this estimator becomes unstable when the explanatory variables are intercorrelated. Thus, due to high variance and very low t-ratios, the estimates of MLE are no longer trustworthy. This is because the weighted matrix of cross-products becomes ill-conditioned when there is multicollinearity. There are some solutions to this problem. One of them is the so-called ridge regression, which was first defined by Hoerl and Kennard (1970) for the linear model. The ridge estimator was successfully adapted to the binary logistic regression model by Schaefer et al. (1984). The authors applied the ridge estimators defined by Hoerl and Kennard (1970) and Hoerl et al. (1975) in logistic regression.

Recently, Månsson and Shukur (2011) applied and investigated a number of logistic ridge estimators. By conducting a Monte Carlo experiment, they investigated the performances of MLE and logistic ridge estimators in the presence of multicollinearity under different conditions. According to the results of the simulation, logistic ridge estimators have better performance than MLE.

Kibria et al. (2012) generalized different types of estimation methods of the ridge parameter proposed by Muniz et al. (2012) to be used for logistic ridge regression. They evaluated the performances of the estimators via a Monte Carlo simulation. In the simulation study, they also calculated the average values and the standard deviations of the ridge parameter. Results showed that logistic ridge estimators outperform the MLE approach.

In the study of Månsson et al. (2012), a new shrinkage estimator which is a generalization of the estimator defined by Liu (1993) is proposed. Using MSE, the optimal value of the shrinkage parameter is obtained and some methods of estimation are given. Since the logistic Liu estimator uses a shrinkage parameter, its length becomes smaller than the length of MLE. The authors showed that the logistic Liu estimator has a better performance than MLE according to the MSE and mean absolute error (MAE) criteria.

Huang (2012) defined a biased estimator which is a combination of the logistic ridge estimator and the logistic Liu estimator in order to combat multicollinearity. Necessary and sufficient conditions for the superiority of this estimator over MLE and the logistic ridge

estimator are given. Moreover, a Monte Carlo simulation is designed to evaluate the performances of the estimators numerically.

Finally, the logistic Liu-type estimator defined by Inan and Erdogan (2013) can also be used as a solution to the problem. There are two parameters used in this estimator, which appears to be a combination of the ridge estimator and the Liu estimator. It is shown that the logistic Liu-type estimator has a better performance than MLE and the logistic ridge estimator defined by Schaefer et al. (1984) in the sense of MSE. Moreover, a real data application is given in the paper.
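As a rough illustration of the estimators reviewed above (a sketch, not the exact formulas of the cited papers): the logistic MLE can be computed by iteratively weighted least squares (IWLS), and adding a positive constant k to the weighted cross-product matrix gives a ridge-type variant in the spirit of Schaefer et al. (1984). The function name and the fixed-iteration convergence scheme below are simplifying assumptions.

```python
import numpy as np

def logistic_irls(X, y, k=0.0, iters=25):
    """IWLS for binary logistic regression; k > 0 adds a ridge-type penalty
    to the weighted cross-product matrix (sketch with simplified convergence)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        W = np.clip(pi * (1.0 - pi), 1e-10, None)  # IWLS weights
        z = X @ beta + (y - pi) / W                # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X) + k * np.eye(p),
                               X.T @ (W * z))
    return beta
```

With collinear columns in X, the matrix X'WX is ill-conditioned and the k = 0 solution is unstable, which is exactly the situation the reviewed estimators address.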

2. MULTICOLLINEARITY

In the first chapter, an informal definition of collinearity, which was first introduced by Frisch (1934), is given: the problem occurs when there is a strong linear relation between the explanatory variables. In this chapter, the formal definition of multicollinearity, its sources and consequences, and the diagnostics of the problem are discussed.

2.1. Definition of Multicollinearity

It is given that $X = (X_1, X_2, \ldots, X_p)$, where $X_j$ contains the $n$ observations of the $j$th regressor. Now, the formal definition of multicollinearity can be written in terms of the linear dependencies of the columns of $X$. The vectors $X_1, X_2, \ldots, X_p$ are linearly dependent if there is a set of constants $a_1, a_2, \ldots, a_p$, not all zero, such that

$$\sum_{j=1}^{p} a_j X_j = 0. \tag{2.1}$$

If the left-hand side of the above equation is exactly equal to zero, it is said that the data set has perfect (or exact) multicollinearity, i.e., the correlation coefficient between two regressors is 1 or -1. In this case, the rank of the matrix $X'X$ becomes less than $p$ and its inverse does not exist. However, if equation (2.1) holds approximately for some subset of the columns of $X$, then there is a near-linear dependency in $X'X$ and the problem of collinearity exists. The matrix $X'X$ is then said to be ill-conditioned.

2.1.1. A graphical representation of the problem

The nature of the collinearity problem can be exhibited geometrically with the following figures, taken from Belsley et al. (2005). Some situations of the model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$, $i = 1, 2, \ldots, n$, are given. The figures show the scatters of the observations: on the $(x_1, x_2)$ floor are the $(x_1, x_2)$ scatters, while above them the data cloud resulting when the $y$ dimension is included is shown. Figure 2.1 shows the case where $x_1$ and $x_2$ are not collinear. The well-defined least squares plane is represented by the data cloud above. The $y$

intercept of this plane is the estimate of $\beta_0$, and the partial slopes in the $x_1$ and $x_2$ directions are the estimates of $\beta_1$ and $\beta_2$ respectively.

Figure 2.1. No collinearity

Figure 2.2. Exact collinearity

Figure 2.2 exhibits the case of perfect multicollinearity between $x_1$ and $x_2$. One can see that there is no plane through the data cloud, i.e., the plane of least squares is not defined. Figure 2.3 shows strong (but not perfect) collinearity. The plane of least squares is ill-defined since the least squares estimates are imprecise, i.e., their variances inflate.

Figure 2.3. Strong collinearity: All coefficients ill-determined

Figure 2.4. Strong collinearity: Constant term well-determined

Figure 2.5. Strong collinearity: $\beta_2$ well-determined

It is known that multicollinearity may not inflate all of the parameter estimates. This situation is represented in Figure 2.4, where the estimates of $\beta_1$ and $\beta_2$ are imprecise but the estimate of the intercept term is precise. Similarly, in Figure 2.5 the estimate of $\beta_2$ is precise while the estimates of $\beta_0$ and $\beta_1$ lack precision.

2.2. Sources of Multicollinearity

There are several sources of collinearity. The following are four primary sources of collinearity (Montgomery et al., 2001):

1. The data collection method employed: when the researcher samples only a subspace of the region of the explanatory variables defined by equation (2.1).
2. Constraints on the model or in the population: when physical constraints are present, multicollinearity exists regardless of the sampling method employed. These constraints generally occur in problems involving production or chemical processes, where the regressors are the components of a product and these components add to a constant.
3. Model specification: it is possible that adding a polynomial term to a regression model can cause ill-conditioning in the matrix $X'X$.
4. An overdefined model: if a model has more explanatory variables than observations, it is called an overdefined model. According to Gunst and Webster (1975), there are three things to do: the first is to redefine the model in terms of a smaller set of explanatory variables; the second is to perform preliminary studies using only subsets of the original explanatory variables; the last is to use methods such as principal components analysis to decide which variables to remove from the model.

2.3. Consequences of Multicollinearity

If there is a multicollinearity problem with the data set $X$, one can face the following consequences:

1. The ordinary least squares (OLS) estimator used in the linear model and the MLE used in the logistic model are unbiased estimators of the coefficient vector. However, they

have large variances and covariances, which make the estimation process difficult.
2. The lengths of the unbiased estimators mentioned above tend to be large in absolute value. It is easy to see this by computing the squared distance between the unbiased estimator and the true parameter. To compute this distance, one should find the diagonal elements of the matrix $(X'X)^{-1}$. Since some of the eigenvalues of the matrix $X'X$ will be close to zero in the presence of collinearity, the inverses of those eigenvalues become too large (possibly approaching infinity in the worst case). Thus, the squared distance becomes too large.
3. Because of the first consequence, the confidence intervals of the coefficients become wider, which leads to the acceptance of the null hypothesis.
4. Similarly, the t-ratios of the coefficients become statistically insignificant.
5. However, the measure of goodness of fit $R^2$ may be very high, even though the t-ratios are insignificant.
6. The unbiased estimators and their standard errors are very sensitive to small changes in the values of the observations.
7. Finally, the multicollinearity problem can affect model selection seriously. The increase in the sample standard errors of the coefficients virtually assures a tendency for relevant variables to be discarded incorrectly from regression equations (Farrar and Glauber, 1967).

2.4. Multicollinearity Diagnostics

There are several techniques that can be used for detecting multicollinearity. Some of them, which directly determine the degree of collinearity and provide information for determining which regressors are involved in the collinearity, are given as follows:

1. Correlation matrix: after standardizing the data, the matrix $X'X$ becomes the correlation matrix. Let $X'X = (r_{ij})$, where $r_{ij}$ represents the correlation between the variables $x_i$ and $x_j$. The stronger the linear dependency is, the closer $r_{ij}$ becomes to 1. However, this criterion shows only pairwise correlations, i.e., if more than two

variables are involved in the collinearity, it is not certain that the pairwise correlations will be large (Montgomery et al., 2001).

2. Variance inflation factors (VIF): the diagonal elements of the matrix $S = (X'X)^{-1}$ were first used by Farrar and Glauber (1967) to determine collinearity and were named variance inflation factors by Marquardt (1970). With $S_{jj}$ being the $j$th diagonal element of the matrix $S$, $j = 1, 2, \ldots, p$, it can be written that $VIF_j = S_{jj} = (1 - R_j^2)^{-1}$, where $R_j^2$ is the coefficient of determination obtained when $x_j$ is regressed on the remaining $p - 1$ regressors. It can be seen that if $R_j^2$ increases, the value of $VIF_j$ increases. The VIF of each explanatory variable in the model measures the combined effect of the linear dependencies among the regressors on the variance term. It is said that if $VIF_j$ exceeds 10, then there is a multicollinearity problem with the $j$th regressor; thus the $j$th regressor is estimated poorly (Montgomery et al., 2001). However, in weaker models, which is often the case in logistic regression, values above 2.5 may be a sign of collinearity (Allison, 1999). There is another interpretation of VIF: since the length of the confidence interval concerning the $j$th regression coefficient can be computed by $L_j = 2(S_{jj})^{1/2}\hat{\sigma}\,t_{\alpha/2,\,n-p-1}$, the square root of $VIF_j$ shows how large the interval is.

3. Eigenvalues of the matrix $X'X$: the eigenvalue analysis of $X'X$ can be used to determine multicollinearity. Let the eigenvalues of $X'X$ be $\lambda_1, \lambda_2, \ldots, \lambda_p$, and let $\lambda_{max}$ and $\lambda_{min}$ be the maximum and minimum eigenvalues respectively. If there is a linear dependency between the columns of $X$, then one or more eigenvalues of $X'X$ will be small; if the dependency is strong, the smallest eigenvalue will be close to zero.

4. Condition number: the condition number can be defined as follows:

$$\kappa = \frac{\lambda_{max}}{\lambda_{min}}. \tag{2.2}$$

It is used as a measure of the degree of multicollinearity. If $\kappa < 10$, then there is no multicollinearity problem. If $10 \leq \kappa \leq 100$, then there is moderate multicollinearity. If $\kappa > 100$, then there is strong multicollinearity (Aydın, 2014; Montgomery et al., 2001).

5. Determinant of $X'X$: one can also check the determinant of $X'X$ to determine whether there is collinearity or not. Since $X'X$ is in correlation form, its possible values are between zero and one. If $|X'X| \approx 0$, then there is a near-linear dependency, so there is multicollinearity (Aydın, 2014; Montgomery et al., 2001).

There are some other methods in the literature for determining the collinearity problem. However, the first four methods given above will be used to determine the problem most of the time in this study.

2.5. Some Methods to Overcome Multicollinearity

There are several methods for dealing with the problem of multicollinearity, including collecting additional data, model re-specification and using biased estimators, especially when dealing with the regression coefficients. The first two methods are not the topic of this study, but the last one is. In this study, some biased estimators are considered, namely the ridge estimator, the Liu estimator, Liu-type estimators and finally a new two-parameter ridge estimator. In Chapter 3, these biased estimators are defined and their properties are investigated in the linear model.
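A minimal Python sketch of the diagnostics of Section 2.4 (the data and function name are assumptions for illustration): it standardizes the regressors so that X'X is in correlation form, then reports the VIFs, eigenvalues, condition number (2.2) and determinant.

```python
import numpy as np

def collinearity_diagnostics(X):
    """VIFs, eigenvalues, condition number (2.2) and determinant of X'X."""
    # Unit-length scaling so that X'X is the correlation matrix.
    Z = (X - X.mean(axis=0)) / (np.sqrt(X.shape[0] - 1) * X.std(axis=0, ddof=1))
    XtX = Z.T @ Z
    eig = np.linalg.eigvalsh(XtX)          # lambda_1 <= ... <= lambda_p
    vif = np.diag(np.linalg.inv(XtX))      # VIF_j = (1 - R_j^2)^{-1}
    kappa = eig.max() / eig.min()          # condition number (2.2)
    return vif, eig, kappa, np.linalg.det(XtX)

# Example: two nearly collinear regressors plus an independent one.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200), rng.normal(size=200)])
vif, eig, kappa, det = collinearity_diagnostics(X)
print(vif, kappa, det)  # large VIFs and kappa, determinant close to zero
```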

3. BIASED ESTIMATORS IN LINEAR MODEL

Consider the multiple linear regression model given in (1.1). The OLS estimator of $\beta$ is given by

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y. \tag{3.1}$$

The OLS estimator can be obtained by minimizing the following sum of squared deviations objective function with respect to the coefficient vector $\beta$:

$$S(\beta) = (Y - X\beta)'(Y - X\beta) \tag{3.2}$$

where the prime denotes the transposition operation. Assuming that the assumptions of regression hold, OLS is the unbiased linear estimator with minimum variance (BLUE). However, the OLS estimator becomes unstable and its variance is inflated when there is multicollinearity. If the distance between $\beta$ and $\hat{\beta}_{OLS}$ is considered, one can obtain the following distance function and its expected value respectively:

$$L_1^2 = (\hat{\beta}_{OLS} - \beta)'(\hat{\beta}_{OLS} - \beta), \tag{3.3}$$

$$E(L_1^2) = \sigma^2\,tr\left[(X'X)^{-1}\right] = \sigma^2\sum_{j=1}^{p}\frac{1}{\lambda_j} \tag{3.4}$$

where $\lambda_j$ is the $j$th eigenvalue of the matrix $X'X$ such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. When the error term is distributed normally, the variance of the distance can be written as follows:

$$Var(L_1^2) = 2\sigma^4\,tr\left[(X'X)^{-2}\right] = 2\sigma^4\sum_{i=1}^{p}\frac{1}{\lambda_i^2}. \tag{3.5}$$

It can easily be seen from the above equations that if one of the eigenvalues is close to zero, then the variance of the distance becomes inflated. Thus, the length of the OLS estimator becomes longer than that of the original coefficient vector.
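Equation (3.4) can be checked numerically; in the sketch below (assumed sizes and seed), the expected squared distance explodes as soon as one eigenvalue of X'X approaches zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 100, 1.0

X_ortho = rng.normal(size=(n, 3))      # nearly orthogonal regressors
X_coll = X_ortho.copy()
X_coll[:, 2] = X_coll[:, 0] + X_coll[:, 1] + 0.001 * rng.normal(size=n)

for X in (X_ortho, X_coll):
    lam = np.linalg.eigvalsh(X.T @ X)  # eigenvalues of X'X
    print(sigma2 * np.sum(1.0 / lam))  # E(L1^2) from (3.4)
```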

In order to overcome this problem, biased estimators have been proposed. Although these estimators introduce some bias, their variances are much smaller than the variance of the OLS estimator. The following biased estimators are reviewed in this chapter: the ridge estimator (Hoerl and Kennard, 1970), the Liu estimator (Liu, 1993), the Liu-type estimator (Liu, 2003), the two-parameter estimator (Özkale and Kaçıranlar, 2007) and finally the two-parameter ridge estimator (Lipovetsky and Conlin, 2005).

Before reviewing these estimators, a comparison criterion is needed. Since the MMSE contains all the relevant information about an estimator, the MMSE and MSE are commonly used to compare the performances of estimators. Let $\tilde{\beta}$ be an estimator of the coefficient vector $\beta$. Then, the MMSE and MSE of $\tilde{\beta}$ can be obtained respectively as follows:

$$MMSE(\tilde{\beta}) = E\left[(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'\right] = Var(\tilde{\beta}) + Bias(\tilde{\beta})Bias(\tilde{\beta})', \tag{3.6}$$

$$MSE(\tilde{\beta}) = E\left[(\tilde{\beta} - \beta)'(\tilde{\beta} - \beta)\right] = tr\left[Var(\tilde{\beta})\right] + Bias(\tilde{\beta})'Bias(\tilde{\beta}) \tag{3.7}$$

where $Bias(\tilde{\beta}) = E(\tilde{\beta}) - \beta$ is the bias of the estimator and $MSE(\tilde{\beta}) = tr\left[MMSE(\tilde{\beta})\right]$ holds.

3.1. Ridge Estimator

The ridge estimator was first proposed by Hoerl and Kennard (1970) in order to control the inflation and general instability associated with the OLS estimator. The idea behind ridge regression is that by adding a positive constant $k > 0$ to the diagonal elements of the matrix $X'X$, one obtains a smaller condition number, so the variance is decreased. The ridge estimator can be written as follows:

$$\hat{\beta}_k = (X'X + kI)^{-1}X'Y. \tag{3.8}$$
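Before turning to the derivation, a minimal numerical sketch of (3.8) (the value of k is arbitrary here, chosen only for illustration):

```python
import numpy as np

def ridge(X, Y, k):
    """Ridge estimator (3.8): (X'X + kI)^{-1} X'Y; k = 0 recovers OLS."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ Y)

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n)])  # collinear columns
Y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=n)

print(ridge(X, Y, k=0.0))   # unstable OLS solution
print(ridge(X, Y, k=0.5))   # shrunken, stabilized ridge solution
```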

Hoerl and Kennard (1970) obtained the ridge estimator by minimizing $(\beta - \hat{\beta}_{OLS})'X'X(\beta - \hat{\beta}_{OLS})$ subject to $\beta'\beta \leq c$, where $c$ is a constant. As a Lagrangian problem, one can form the following equation:

$$F = (\beta - \hat{\beta}_{OLS})'X'X(\beta - \hat{\beta}_{OLS}) + \frac{1}{k}\left(\beta'\beta - c\right) \tag{3.9}$$

where $1/k$ is the multiplier. Then, differentiating (3.9) with respect to $\beta$,

$$\frac{\partial F}{\partial \beta} = 2X'X(\beta - \hat{\beta}_{OLS}) + \frac{2}{k}\beta = 0 \tag{3.10}$$

is obtained, and solving the last equation gives the ridge estimator given in equation (3.8). The ridge estimator can also be obtained by minimizing the following objective function with respect to $\beta$:

$$S(\beta) = \|Y - X\beta\|^2 + k\|\beta\|^2 \tag{3.11}$$

where $\|\cdot\|$ is the usual Euclidean norm.

3.1.1. MSE properties of ridge estimator

To investigate the properties of the ridge estimator, the MMSE and MSE functions should be obtained. By using (3.6) and (3.7), it is easy to obtain these functions. First of all, it is better to compute the variance and bias of the ridge estimator. To manage this, the following alternative form of the ridge estimator is used:

$$\hat{\beta}_k = C_k^{-1}C\,\hat{\beta}_{OLS} \tag{3.12}$$

where $C_k = X'X + kI$, $C = X'X$ and $I$ is the $p \times p$ identity matrix. The bias and variance of the estimator are obtained respectively as follows:

$$Bias(\hat{\beta}_k) = \left(C_k^{-1}C - I\right)\beta = -kC_k^{-1}\beta, \tag{3.13}$$

$$Var(\hat{\beta}_k) = \sigma^2 C_k^{-1}CC_k^{-1}. \tag{3.14}$$

Moreover, to define the MSE functions easily, the original model (1.1) can be written as follows:

$$Y = Z\alpha + \varepsilon \tag{3.15}$$

where $\alpha = Q'\beta$ and $Z = XQ$, with $Q$ being the matrix whose columns are the eigenvectors of $X'X$, so that $Z'Z = \Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_p)$. This model is also called the canonical model.

Thus, the MMSE and MSE of the ridge estimator can be obtained respectively as follows:

$$MMSE(\hat{\beta}_k) = \sigma^2 C_k^{-1}CC_k^{-1} + k^2 C_k^{-1}\beta\beta'C_k^{-1}, \tag{3.16}$$

$$MSE(\hat{\beta}_k) = \sigma^2\sum_{j=1}^{p}\frac{\lambda_j}{(\lambda_j + k)^2} + k^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + k)^2} = f_1(k) + f_2(k) \tag{3.17}$$

where $f_1(k)$ is the total variance and $f_2(k)$ is the total bias of the estimator.

There are some interesting properties of this function, obtained by Hoerl and Kennard (1970). The total variance $f_1(k)$ is a continuous, monotonically decreasing function of $k$. The squared bias function $f_2(k)$ is a continuous, monotonically increasing function of $k$; see the figure below.

Figure 3.1. MSE function of ridge estimator
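The trade-off in (3.17) is easy to trace numerically. In the sketch below (the eigenvalues, canonical coefficients and sigma^2 are assumed illustrative values), f1(k) decreases and f2(k) increases, so their sum dips below the OLS value for small k, as in Figure 3.1:

```python
import numpy as np

lam = np.array([10.0, 5.0, 0.05])  # eigenvalues of X'X, one near zero (assumed)
alpha = np.ones(3)                 # canonical coefficients (assumed)
sigma2 = 1.0

def mse_ridge(k):
    f1 = sigma2 * np.sum(lam / (lam + k) ** 2)         # total variance f1(k)
    f2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)  # squared bias f2(k)
    return f1 + f2

ks = np.linspace(0.0, 2.0, 201)
print(mse_ridge(0.0))                              # OLS MSE: sigma^2 * sum(1/lambda_j)
print(ks[np.argmin([mse_ridge(k) for k in ks])])   # k minimizing the curve of Figure 3.1
```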

These two results can be proven by examining the derivatives of $f_1(k)$ and $f_2(k)$; see Hoerl and Kennard (1970). Moreover, the most important property is that there is always a $k > 0$ such that $E(L_1^2(k)) < E(L_1^2(0))$, where $E(L_1^2(0)) = E\left[(\hat{\beta}_{OLS} - \beta)'(\hat{\beta}_{OLS} - \beta)\right]$. This property was proved by Hoerl and Kennard (1970), and they found the following sufficient condition:

$$k < \frac{\sigma^2}{\alpha_{max}^2} \tag{3.18}$$

where $\alpha_{max}$ is the maximum element of the coefficient vector $\alpha$. However, Theobald (1974) expanded this upper bound to $2\sigma^2/\alpha_{max}^2$ by using the MMSE properties of the ridge estimator.

Since the estimator given above fully depends on the unknown parameters $\sigma^2$ and $\alpha$, Hoerl and Kennard (1970) suggested using the unbiased estimators $\hat{\sigma}^2 = (Y - X\hat{\beta}_{OLS})'(Y - X\hat{\beta}_{OLS})/(n - p)$ and $\hat{\alpha} = \Lambda^{-1}Z'Y$ rather than the unknown values.

On the other hand, there are some estimators of the ridge parameter that are greater than $\sigma^2/\alpha_{max}^2$. Thus there is no definite condition to determine whether an estimator is optimal or not. This is an open problem for researchers.

3.1.2. Selection of the parameter k

There are many studies proposing different selection processes for the parameter $k$. In this subsection, a short literature review is provided. Before giving the review, the following discussion is needed: it can be seen from Figure 3.1 that it is possible to obtain smaller MSE values as long as the derivative of the MSE function of the ridge estimator is negative. If one adds a different positive $k_j$ to each diagonal element of the matrix $X'X$ (this is known as generalized ridge regression), the optimal value of $k_j$ can be obtained by computing

$$k_j = \frac{\sigma^2}{\alpha_j^2}, \tag{3.19}$$

and by using the unbiased estimators $\hat{\sigma}^2$ and $\hat{\alpha}_j$, the following is obtained:

$$\hat{k}_j = \frac{\hat{\sigma}^2}{\hat{\alpha}_j^2}. \tag{3.20}$$

There are some methods using the above individual parameter and its modifications in order to obtain new estimators of the usual ridge parameter. Some of these new estimators, being greater than $\hat{\sigma}^2/\hat{\alpha}_{max}^2$, do not satisfy the sufficient condition. Moreover, there are estimators not satisfying (3.18) that have better performances at the same time. This is because, if one can find estimators making the derivative of equation (3.17) positive up to the intersection point of the $MSE(\hat{\beta}_k)$ and $MSE(\hat{\beta}_{OLS})$ functions, it is possible that $MSE(\hat{\beta}_k) < MSE(\hat{\beta}_{OLS})$. Some of the estimators considered in this study do not satisfy condition (3.18). The following estimators are chosen from the literature:

Hoerl and Kennard (1970) proposed the following estimator to estimate the values of $k$:

$$\hat{k}_{HK} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{max}^2}. \tag{3.21}$$

Hoerl et al. (1975) proposed the following estimator by applying the harmonic mean function to the individual parameter (3.19):

$$\hat{k}_{HKB} = \frac{p\hat{\sigma}^2}{\hat{\alpha}'\hat{\alpha}} \tag{3.22}$$

where $\hat{\alpha}$ is the OLS estimator of $\alpha$ given in (3.15), so that $\hat{k}_{HK} < \hat{k}_{HKB}$ clearly holds.

Lawless and Wang (1976) defined a new individual parameter $k_j = \sigma^2/(\lambda_j\alpha_j^2)$ and applied the harmonic mean function to obtain

$$\hat{k}_{LW} = \frac{p\hat{\sigma}^2}{\sum_{j=1}^{p}\lambda_j\hat{\alpha}_j^2}, \tag{3.23}$$

which is definitely smaller than $\hat{k}_{HKB}$.

Kibria (2003) proposed using the arithmetic and geometric means and the median of the individual parameter (3.19) to obtain the following new estimators:

$$\hat{k}_{AM} = \frac{1}{p}\sum_{j=1}^{p}\frac{\hat{\sigma}^2}{\hat{\alpha}_j^2}, \tag{3.24}$$

which is the arithmetic mean of $\hat{k}_j$ given in (3.20),

$$\hat{k}_{GM} = \frac{\hat{\sigma}^2}{\left(\prod_{j=1}^{p}\hat{\alpha}_j^2\right)^{1/p}}, \tag{3.25}$$

which is the geometric mean of $\hat{k}_j$, and

$$\hat{k}_{MED} = median\left(\frac{\hat{\sigma}^2}{\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p, \tag{3.26}$$

which is obtained by using the median function. All of these estimators are clearly greater than $\hat{k}_{HK}$.

Khalaf and Shukur (2005) defined the following estimator:

$$\hat{k}_{KS} = \frac{\lambda_{max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{max}\hat{\alpha}_{max}^2}. \tag{3.27}$$

The suggested modification here is adding the amount $\hat{\sigma}^2/\lambda_{max}$, which is a function of the correlation between the independent variables, to the denominator of (3.21). However, this amount varies according to the sample size used. Thus, to keep the variation fixed, the authors multiply $\hat{\sigma}^2/\lambda_{max}$ by the number of degrees of freedom $n - p$. In the end, $\hat{k}_{KS} < \hat{k}_{HK}$ holds.

Alkhamisi et al. (2006) suggested applying the modifications mentioned in Kibria (2003) to the estimator $\hat{k}_{KS}$ defined by Khalaf and Shukur (2005) as follows:

$$\hat{k}_{KS\text{-}max} = \max_{j}\left(\frac{\lambda_j\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p, \tag{3.28}$$

$$\hat{k}_{KS\text{-}mean} = \frac{1}{p}\sum_{j=1}^{p}\frac{\lambda_j\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}, \tag{3.29}$$

$$\hat{k}_{KS\text{-}med} = median\left(\frac{\lambda_j\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p. \tag{3.30}$$

Alkhamisi and Shukur (2007) proposed a new method based on (3.21) to estimate the ridge parameter, which is given by

$$\hat{k}_{AS} = \hat{k}_{HK} + \frac{1}{\lambda_{max}} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{max}^2} + \frac{1}{\lambda_{max}}, \tag{3.31}$$

which is definitely greater than $\hat{k}_{HK}$. Moreover, new methods based on (3.31) are derived as follows:

$$\hat{k}_{NHKB} = \hat{k}_{HKB} + \frac{1}{\lambda_{max}}, \tag{3.32}$$

$$\hat{k}_{NAS} = \max_{j}\left(\hat{k}_j + \frac{1}{\lambda_j}\right), \quad j = 1, 2, \ldots, p, \tag{3.33}$$

$$\hat{k}_{ARITH} = \frac{1}{p}\sum_{j=1}^{p}\left(\hat{k}_j + \frac{1}{\lambda_j}\right), \tag{3.34}$$

$$\hat{k}_{NMED} = median\left(\hat{k}_j + \frac{1}{\lambda_j}\right), \quad j = 1, 2, \ldots, p; \tag{3.35}$$

again, one can observe that $\hat{k}_{NHKB} > \hat{k}_{HKB}$ and $\hat{k}_{NAS} > \hat{k}_{HK}$.

Muniz and Kibria (2009) defined new estimators by applying the algorithms of the geometric mean and the square root to the approaches obtained by Khalaf and Shukur (2005) and Kibria (2003). The idea of the square root transformation is taken from Alkhamisi and Shukur (2008). The proposed estimators are as follows:

$$\hat{k}_{KM1} = \left(\prod_{j=1}^{p}\frac{\lambda_{max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{max}\hat{\alpha}_j^2}\right)^{1/p}, \tag{3.36}$$

$$\hat{k}_{KM2} = \max_{j}\left(\frac{1}{m_j}\right), \quad j = 1, 2, \ldots, p, \tag{3.37}$$

$$\hat{k}_{KM3} = \max_{j}\left(m_j\right), \quad j = 1, 2, \ldots, p, \tag{3.38}$$

$$\hat{k}_{KM4} = \left(\prod_{j=1}^{p}\frac{1}{m_j}\right)^{1/p}, \tag{3.39}$$

$$\hat{k}_{KM5} = \left(\prod_{j=1}^{p}m_j\right)^{1/p}, \tag{3.40}$$

$$\hat{k}_{KM6} = median\left(\frac{1}{m_j}\right), \quad j = 1, 2, \ldots, p, \tag{3.41}$$

$$\hat{k}_{KM7} = median\left(m_j\right), \quad j = 1, 2, \ldots, p, \tag{3.42}$$

where $m_j = \sqrt{\hat{\sigma}^2/\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$.

Muniz et al. (2012) proposed some new estimators of the ridge regression parameter. These estimators use different quantiles and the square root transformation proposed in Khalaf and Shukur (2005) and Alkhamisi and Shukur (2008) respectively. However, the base of the different functions is no longer the optimal value (3.19) but a modification proposed by Khalaf and Shukur (2005). This modification, which in general leads to larger values of the ridge parameter than those derived from the optimal values, was shown to work well in the simulation study conducted in that paper. Writing $t_j = \frac{\lambda_{max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{max}\hat{\alpha}_j^2}$, the proposed estimators are

$$\hat{k}_{KM8} = \max_{j}\left(\frac{1}{\sqrt{t_j}}\right), \tag{3.43}$$

$$\hat{k}_{KM9} = \max_{j}\left(\sqrt{t_j}\right), \tag{3.44}$$

$$\hat{k}_{KM10} = \left(\prod_{j=1}^{p}\frac{1}{\sqrt{t_j}}\right)^{1/p}, \tag{3.45}$$

$$\hat{k}_{KM11} = \left(\prod_{j=1}^{p}\sqrt{t_j}\right)^{1/p}, \tag{3.46}$$

$$\hat{k}_{KM12} = median\left(\frac{1}{\sqrt{t_j}}\right) \tag{3.47}$$

where $j = 1, 2, \ldots, p$.
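To fix ideas, the sketch below (function and variable names are assumptions made for illustration) computes a representative subset of the reviewed rules, namely (3.21)-(3.27), from the canonical quantities of (3.15):

```python
import numpy as np

def ridge_parameters(X, Y):
    """Estimates of k: HK (3.21), HKB (3.22), LW (3.23), AM/GM/MED (3.24)-(3.26)
    and KS (3.27), computed from the canonical form (3.15)."""
    n, p = X.shape
    lam, Q = np.linalg.eigh(X.T @ X)              # eigenvalues/eigenvectors of X'X
    Z = X @ Q                                     # canonical regressors
    alpha = (Z.T @ Y) / lam                       # OLS estimate of alpha
    s2 = np.sum((Y - Z @ alpha) ** 2) / (n - p)   # unbiased sigma^2 estimate
    kj = s2 / alpha**2                            # individual parameters (3.20)
    return {
        "HK": s2 / np.max(alpha**2),
        "HKB": p * s2 / np.sum(alpha**2),
        "LW": p * s2 / np.sum(lam * alpha**2),
        "AM": np.mean(kj),
        "GM": s2 / np.prod(alpha**2) ** (1.0 / p),
        "MED": np.median(kj),
        "KS": lam.max() * s2 / ((n - p) * s2 + lam.max() * np.max(alpha**2)),
    }
```

Each rule plugs into (3.8) in place of k; the simulation of the next subsections compares the resulting AMSE values.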

Dorugade (2014) suggested some new ridge parameters. The author proposed a new individual parameter which is a modification of (3.20), obtained by multiplying its denominator by $\lambda_{max}/2$. This estimator takes on a little more bias than the estimator given by Hoerl et al. (1975) but substantially reduces the total variance of the parameter estimates compared with the estimator given by Lawless and Wang (1976), thus improving the mean square error of estimation and prediction. The suggested individual estimator is defined as follows:

$$k_j^{AD} = \frac{2\hat{\sigma}^2}{\lambda_{max}\hat{\alpha}_j^2}, \quad j = 1, 2, \ldots, p. \tag{3.48}$$

This leads to the denominator of the new estimator being greater than that of (3.20) by the factor $\lambda_{max}/2$. Thus, one can write $k_j^{AD} \leq \hat{k}_j$, $j = 1, 2, \ldots, p$. It is clear that this new estimator is between the estimators given by Hoerl et al. (1975) and Lawless and Wang (1976). After that, the author uses the arithmetic, geometric and harmonic means and the median function to obtain the following new estimators:

$$\hat{k}_{AD1} = \frac{2\hat{\sigma}^2}{p\lambda_{max}}\sum_{j=1}^{p}\frac{1}{\hat{\alpha}_j^2}, \tag{3.49}$$

which is the arithmetic mean of $k_j^{AD}$,

$$\hat{k}_{AD2} = median\left(\frac{2\hat{\sigma}^2}{\lambda_{max}\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p, \tag{3.50}$$

$$\hat{k}_{AD3} = \frac{2\hat{\sigma}^2}{\lambda_{max}\left(\prod_{j=1}^{p}\hat{\alpha}_j^2\right)^{1/p}}, \tag{3.51}$$

which is the geometric mean of $k_j^{AD}$, and

$$\hat{k}_{AD4} = \frac{2p\hat{\sigma}^2}{\lambda_{max}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.52}$$

which is the harmonic mean of $k_j^{AD}$. All of these estimators satisfy the upper bound given in (3.18).

3.1.3. New proposed ridge estimators

In this subsection, some new estimators of the ridge parameter $k$ are defined. The newly defined estimators are modifications of the estimators $\hat{k}_{AM}$ proposed in Kibria (2003) and $\hat{k}_{AD4}$ proposed in Dorugade (2014). Some transformations are applied following Alkhamisi and Shukur (2007) and Khalaf and Shukur (2005) in order to obtain new estimators having better performance. The following are the newly defined estimators:

1. Following Dorugade (2014), the first new estimator is obtained by multiplying the denominator of (3.20) by $\lambda_{max}^{2/p}$, such that $k_j = \frac{\hat{\sigma}^2}{\lambda_{max}^{2/p}\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$. The new estimator, which is the harmonic mean of this individual estimator and smaller than $\hat{k}_{HKB}$, is as follows:

$$\hat{k}_{AY1} = \frac{p\hat{\sigma}^2}{\lambda_{max}^{2/p}\sum_{j=1}^{p}\hat{\alpha}_j^2}. \tag{3.53}$$

2. Similarly, multiplying the denominator of (3.20) by $\lambda_{max}^{3/p}$, the second new estimator, being smaller than $\hat{k}_{HKB}$, is obtained as follows:

$$\hat{k}_{AY2} = \frac{p\hat{\sigma}^2}{\lambda_{max}^{3/p}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.54}$$

which is the harmonic mean of the individual parameter $k_j = \frac{\hat{\sigma}^2}{\lambda_{max}^{3/p}\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$.

3. The third new estimator is obtained by modifying the denominator of $\hat{k}_{HKB}$, multiplying it by $\lambda_{max}^{1/3}$. It is defined as follows:

$$\hat{k}_{AY3} = \frac{p\hat{\sigma}^2}{\lambda_{max}^{1/3}\sum_{j=1}^{p}\hat{\alpha}_j^2} \tag{3.55}$$

which is clearly smaller than $\hat{k}_{HKB}$.

4. Another new estimator is obtained by multiplying the denominator of $\hat{k}_{HKB}$ by $p^{1/3}$ as follows:

$$\hat{k}_{AY4} = \frac{p\hat{\sigma}^2}{p^{1/3}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.56}$$

which is again smaller than $\hat{k}_{HKB}$.

5. The last new estimator is obtained by multiplying the denominator of the individual parameter (3.20) by $p\lambda_{max}/2$, such that $k_j = \frac{2\hat{\sigma}^2}{p\lambda_{max}\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$. It is defined by

$$\hat{k}_{AY5} = \frac{2p\hat{\sigma}^2}{p\lambda_{max}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.57}$$

which is definitely smaller than $\hat{k}_{HKB}$.

3.1.4. Comparison of the ridge estimators: A Monte Carlo simulation study

This subsection is devoted to the comparison of the ridge estimators via a Monte Carlo simulation. In conducting the simulation, the performances of the estimators are compared in the sense of MSE. For a sound Monte Carlo simulation, two criteria are used in the design. One criterion is to determine the effective factors affecting the properties of the estimators. The other is to specify the criterion of judgment. The sample size $n$, the number of predictors $p$, the degree of correlation $\rho$ and the variance of the error terms $\sigma^2$ are decided to be the effective factors. Also, the mean squared error (MSE) is chosen as the criterion for comparing the performances. Thus the average MSE (AMSE) values of all estimators are computed with respect to the different effective factors.

There are many ridge estimators proposed in the papers mentioned in the previous subsections. Hence, some of them are chosen from the literature and they are compared to the new proposed estimators defined above. The following are the

estimators to be compared: $\hat{k}_{HK}$, $\hat{k}_{NAS}$, $\hat{k}_{AM}$, $\hat{k}_{AD4}$, $\hat{k}_{KM8}$, $\hat{k}_{KM12}$, $\hat{k}_{AY1}$, $\hat{k}_{AY2}$, $\hat{k}_{AY3}$, $\hat{k}_{AY4}$ and $\hat{k}_{AY5}$.

Now, the general multiple linear regression model (1.1) is considered such that $\varepsilon \sim N(0, \sigma^2 I_n)$. If $\beta$ is chosen to be the normalized eigenvector corresponding to the largest eigenvalue of the matrix $X'X$, so that $\beta'\beta = 1$, then the minimized value of the MSE can be obtained (Newhouse and Oman, 1971).

In order to generate the explanatory variables, the following equation is used (Asar et al., 2014; Månsson and Shukur, 2011; Muniz and Kibria, 2009):

$$x_{ij} = (1 - \rho^2)^{1/2}z_{ij} + \rho z_{ip}, \tag{3.58}$$

where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, $\rho^2$ represents the correlation between the explanatory variables and the $z_{ij}$'s are independent random numbers obtained from the standard normal distribution. The dependent variable $Y$ is obtained by

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n. \tag{3.59}$$

The following variations of the effective factors are considered: $n = 50, 100, 150$; $\rho = 0.95, 0.99, 0.999$; $p = 4, 6$; and $\sigma^2 = 0.1, 0.25, 1.0$. For the given values of $\rho$, the following condition numbers are observed respectively: 15, 30, 90 for $p = 4$ and 25, 45, 120 for $p = 6$.

First of all, the data matrix $X$ and the observation vector $Y$ are generated. Then they are standardized in such a way that $X'X$ and $X'Y$ are in correlation form. For the different values of $n$, $p$, $\rho$ and $\sigma^2$, the simulation is repeated 5000 times by generating the error terms of the general linear regression equation (1.1).

The average mean squared errors of the estimators are computed via the following equation:

$$AMSE(\tilde{\beta}) = \frac{1}{5000}\sum_{r=1}^{5000}(\tilde{\beta}_r - \beta)'(\tilde{\beta}_r - \beta) \tag{3.60}$$

where $\tilde{\beta}_r$ is $\hat{\beta}_{OLS}$ or $\hat{\beta}_k$ for the different estimators of $k$ in the $r$th replication.
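The design (3.58)-(3.60) can be sketched as follows (a simplified illustration: the standardization to correlation form is omitted, the seed and sizes are assumptions, and ridge_parameters refers to the hypothetical helper sketched earlier):

```python
import numpy as np

def simulate_amse(k_rule, n=50, p=4, rho=0.95, sigma=0.5, reps=5000):
    """AMSE (3.60) of the ridge estimator under the design (3.58)-(3.59)."""
    rng = np.random.default_rng(4)
    amse = 0.0
    for _ in range(reps):
        Z = rng.normal(size=(n, p + 1))
        X = np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]   # (3.58)
        lam, V = np.linalg.eigh(X.T @ X)
        beta = V[:, -1]                  # eigenvector of the largest eigenvalue
        Y = X @ beta + rng.normal(scale=sigma, size=n)          # (3.59)
        k = k_rule(X, Y)
        b = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)   # ridge (3.8)
        amse += (b - beta) @ (b - beta) / reps                  # (3.60)
    return amse

# Example: OLS (k = 0) versus the HK rule from the earlier sketch.
# print(simulate_amse(lambda X, Y: 0.0),
#       simulate_amse(lambda X, Y: ridge_parameters(X, Y)["HK"]))
```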

Results of the simulation study

In the tables of results, the AMSE values are presented for fixed $n$, $p$, $\sigma^2$ and different $\rho$'s. All of the proposed parameters have better performance than $\hat{k}_{HK}$, and the OLS estimator has the largest AMSE values. One can see from the tables that when the error variance increases, the AMSE values increase for all estimators. For the case $p = 4$, $\rho = 0.95$, $\hat{k}_{AY2}$ has the least AMSE value, and the other new proposed estimators except $\hat{k}_{AY3}$ have better performance than the ones chosen from the literature. When the degree of correlation is increased, $\hat{k}_{AY5}$ and $\hat{k}_{AD4}$ have smaller AMSE values than the other estimators for $\rho = 0.99$ and $\rho = 0.999$. One can see that all of the new proposed estimators except $\hat{k}_{AY3}$ are considerably better than the others; especially, $\hat{k}_{AY4}$ is the best among them for $p = 6$ and $\rho = 0.95$ and $\rho = 0.99$. However, $\hat{k}_{AY5}$ and $\hat{k}_{AD4}$ perform almost equally for $\rho = 0.999$.

Some of these results can easily be seen from Figures 3.2 and 3.3 as well. All comparison graphs have been sketched using the earlier parameters $\hat{k}_{NAS}$, $\hat{k}_{AM}$, $\hat{k}_{AD4}$ and the new parameters $\hat{k}_{AY2}$, $\hat{k}_{AY3}$, selected randomly. Since the values of the OLS and $\hat{k}_{HK}$ estimators are larger than the others in scale, they are not included in the graphs.

It is known that multicollinearity becomes severe when the correlation increases. However, there is an interesting result that the AMSE values of the new proposed estimators decrease when the correlation increases. In other words, the new proposed estimators are robust to the correlation. This feature is also observed for $\hat{k}_{AD4}$ and is presented in the

figures below.

Figure 3.2. Comparison of AMSE values for $n = 50$, $\rho = 0.95$, $p = 4$ (AMSE versus error variance; curves for $\hat{k}_{NAS}$, $\hat{k}_{AM}$, $\hat{k}_{AD4}$, $\hat{k}_{AY2}$, $\hat{k}_{AY3}$)

Figure 3.3. Comparison of AMSE values for $n = 150$, $\rho = 0.95$, $p = 4$

Figure 3.4. Comparison of AMSE values for $n = 50$, $\rho = 0.999$, $p = 6$

Figure 3.5. Comparison of AMSE values for $n = 150$, $\rho = 0.999$, $p = 6$

Figure 3.6. Comparison of AMSE values for $n = 50$, $\sigma^2 = 0.1$, $p = 4$

Although MSE is used as a comparison criterion, the bias of an estimator is another indicator of good performance. Thus, comparisons of the estimators according to their biases are summarized in Figures 3.7-3.10. In most of the cases, $\hat{k}_{HK}$ has the least bias value. If the degree of correlation is increased, the estimators have more bias, as observed from Figures 3.7 and 3.8. Moreover, $\hat{k}_{AY2}$ has a smaller bias than $\hat{k}_{AD4}$. When $\rho = 0.999$, $\hat{k}_{AY2}$ becomes the estimator having the least bias; $\hat{k}_{AD4}$ and $\hat{k}_{AY4}$ also have quite small biases in this situation. If the biases are compared with respect to the error variance, one can see that when the error variance increases, the bias values of all of the estimators except $\hat{k}_{KM8}$ and $\hat{k}_{KM12}$ increase monotonically when $\rho = 0.95$

and $p = 4$. Figures 3.7-3.10 show the performances of the bias values of the estimators.

Figure 3.7. Comparison of biases for $n = 100$, $\rho = 0.99$, $p = 6$

Figure 3.8. Comparison of biases for $n = 100$, $\rho = 0.999$, $p = 6$

Figure 3.9. Comparison of biases for $n = 50$, $p = 4$, $\sigma^2 = 0.1$

Figure 3.10. Comparison of biases for $n = 50$, $p = 4$, $\sigma^2 = 1.0$

It is observed from Figures 3.7-3.10 that some of the estimators are robust to the correlation. The biases of the estimators $\hat{k}_{AY3}$, $\hat{k}_{AY4}$ and $\hat{k}_{AD4}$ decrease when the degree of correlation increases. However, $\hat{k}_{NAS}$ and $\hat{k}_{KM8}$ show the opposite behavior.

3.1.5. Some conclusive remarks regarding ridge regression and simulation

In Section 3.1, existing ridge estimators are reviewed and some new estimators are proposed. Six estimators chosen from the literature and the new proposed estimators are compared according to the mean squared error and bias criteria. A Monte Carlo simulation is conducted to compare the estimators by generating random numbers for the dependent and independent variables and pseudorandom numbers for the error terms from the normal distribution. Tables consisting of AMSE values are given according to different values of the sample size $n$, the correlation coefficient $\rho$ between the explanatory variables, the number of predictors $p$ and the variance of the error terms $\sigma^2$. Some graphs are provided for selected situations. According to the tables and figures, it can be said that the new suggested estimators are better than the other estimators in the sense of MSE. Finally, $\hat{k}_{AY1}$ and $\hat{k}_{AY2}$ are the best ones in the sense of AMSE and bias among the new estimators. The superiority of the new estimators changes according to the


More information

Classification & Regression. Multicollinearity Intro to Nominal Data

Classification & Regression. Multicollinearity Intro to Nominal Data Multicollinearity Intro to Nominal Let s Start With A Question y = β 0 + β 1 x 1 +β 2 x 2 y = Anxiety Level x 1 = heart rate x 2 = recorded pulse Since we can all agree heart rate and pulse are related,

More information

Application of Independent Variables Transformations for Polynomial Regression Model Estimations

Application of Independent Variables Transformations for Polynomial Regression Model Estimations 3rd International Conference on Applied Maematics and Pharmaceutical Sciences (ICAMPS'03) April 9-30, 03 Singapore Application of Independent Variables Transformations for Polynomial Regression Model Estimations

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 2002

Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 2002 Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 00 Please do not quote without permission Variance Inflation Factors. Variance

More information

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS Advances in Adaptive Data Analysis Vol. 2, No. 4 (2010) 451 462 c World Scientific Publishing Company DOI: 10.1142/S1793536910000574 MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS STAN LIPOVETSKY

More information

The Precise Effect of Multicollinearity on Classification Prediction

The Precise Effect of Multicollinearity on Classification Prediction Multicollinearity and Classification Prediction The Precise Effect of Multicollinearity on Classification Prediction Mary G. Lieberman John D. Morris Florida Atlantic University The results of Morris and

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods. TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin

More information

Lecture 1: OLS derivations and inference

Lecture 1: OLS derivations and inference Lecture 1: OLS derivations and inference Econometric Methods Warsaw School of Economics (1) OLS 1 / 43 Outline 1 Introduction Course information Econometrics: a reminder Preliminary data exploration 2

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

L7: Multicollinearity

L7: Multicollinearity L7: Multicollinearity Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Introduction ï Example Whats wrong with it? Assume we have this data Y

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

A Practical Guide for Creating Monte Carlo Simulation Studies Using R

A Practical Guide for Creating Monte Carlo Simulation Studies Using R International Journal of Mathematics and Computational Science Vol. 4, No. 1, 2018, pp. 18-33 http://www.aiscience.org/journal/ijmcs ISSN: 2381-7011 (Print); ISSN: 2381-702X (Online) A Practical Guide

More information

COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR

COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Noname manuscript No. (will be inserted by the editor) COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Deniz Inan Received: date / Accepted: date Abstract In this study

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati 405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati Prof. M. El-Sakka Dept of Economics Kuwait University In this chapter we take a critical

More information

Regularized Multiple Regression Methods to Deal with Severe Multicollinearity

Regularized Multiple Regression Methods to Deal with Severe Multicollinearity International Journal of Statistics and Applications 21, (): 17-172 DOI: 1.523/j.statistics.21.2 Regularized Multiple Regression Methods to Deal with Severe Multicollinearity N. Herawati *, K. Nisa, E.

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Response Surface Methodology

Response Surface Methodology Response Surface Methodology Process and Product Optimization Using Designed Experiments Second Edition RAYMOND H. MYERS Virginia Polytechnic Institute and State University DOUGLAS C. MONTGOMERY Arizona

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the

More information

J.Thi-Qar Sci. Vol.2 (2) April/2010

J.Thi-Qar Sci. Vol.2 (2) April/2010 ISSN 1991-8690 1661 الترقيم الدولي - 0968 Correction for multicollinearity between the explanatory variables to estimation by using the Principal component method Ali L.Areef Science college Thi- Qar University

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Lecture 11. Correlation and Regression

Lecture 11. Correlation and Regression Lecture 11 Correlation and Regression Overview of the Correlation and Regression Analysis The Correlation Analysis In statistics, dependence refers to any statistical relationship between two random variables

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

Lecture 1: Systems of linear equations and their solutions

Lecture 1: Systems of linear equations and their solutions Lecture 1: Systems of linear equations and their solutions Course overview Topics to be covered this semester: Systems of linear equations and Gaussian elimination: Solving linear equations and applications

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to moderator effects Hierarchical Regression analysis with continuous moderator Hierarchical Regression analysis with categorical

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD

Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD Least Squares Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 75A Winter 0 - UCSD (Unweighted) Least Squares Assume linearity in the unnown, deterministic model parameters Scalar, additive noise model: y f (

More information

Linear Regression Models

Linear Regression Models Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least

More information

Ridge Regression Revisited

Ridge Regression Revisited Ridge Regression Revisited Paul M.C. de Boer Christian M. Hafner Econometric Institute Report EI 2005-29 In general ridge (GR) regression p ridge parameters have to be determined, whereas simple ridge

More information

Regression Analysis By Example

Regression Analysis By Example Regression Analysis By Example Third Edition SAMPRIT CHATTERJEE New York University ALI S. HADI Cornell University BERTRAM PRICE Price Associates, Inc. A Wiley-Interscience Publication JOHN WILEY & SONS,

More information

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,

More information

26:010:557 / 26:620:557 Social Science Research Methods

26:010:557 / 26:620:557 Social Science Research Methods 26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Available online at (Elixir International Journal) Statistics. Elixir Statistics 49 (2012)

Available online at   (Elixir International Journal) Statistics. Elixir Statistics 49 (2012) 10108 Available online at www.elixirpublishers.com (Elixir International Journal) Statistics Elixir Statistics 49 (2012) 10108-10112 The detention and correction of multicollinearity effects in a multiple

More information

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors James Younker Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Research Article On the Weighted Mixed Almost Unbiased Ridge Estimator in Stochastic Restricted Linear Regression

Research Article On the Weighted Mixed Almost Unbiased Ridge Estimator in Stochastic Restricted Linear Regression Applied Mathematics Volume 2013, Article ID 902715, 10 pages http://dx.doi.org/10.1155/2013/902715 Research Article On the Weighted Mixed Almost Unbiased Ridge Estimator in Stochastic Restricted Linear

More information

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity R.G. Pierse 1 Omitted Variables Suppose that the true model is Y i β 1 + β X i + β 3 X 3i + u i, i 1,, n (1.1) where β 3 0 but that the

More information