T.C. SELÇUK ÜNİVERSİTESİ FEN BİLİMLERİ ENSTİTÜSÜ


T.C. SELÇUK ÜNİVERSİTESİ FEN BİLİMLERİ ENSTİTÜSÜ

LIU TYPE LOGISTIC ESTIMATORS

Yasin ASAR

DOKTORA TEZİ

İstatistik Anabilim Dalı

Ocak-2015, KONYA

Her Hakkı Saklıdır


ÖZET

DOKTORA TEZİ

LİU TİPİ LOJİSTİK REGRESYON TAHMİN EDİCİLERİ (Liu Type Logistic Regression Estimators)

Yasin ASAR

Selçuk Üniversitesi Fen Bilimleri Enstitüsü, İstatistik Anabilim Dalı

Advisor: Prof. Dr. Aşır GENÇ

2015, 99 pages

Jury: Prof. Dr. Aşır GENÇ, Doç. Dr. Coşkun KUŞ, Doç. Dr. Murat ERİŞOĞLU, Doç. Dr. İsmail KINACI, Yrd. Doç. Dr. Aydın KARAKOCA

In binary logistic regression models, the multicollinearity problem inflates the variance of the maximum likelihood estimator and degrades the performance of that estimator. For this reason, estimators proposed to remedy the multicollinearity problem in linear models have been generalized to logistic regression. In this thesis, the logistic versions of some biased estimators are reviewed. Moreover, a new generalization is made and its performance is compared with the earlier estimators in terms of the MSE criterion. Since the newly proposed estimator has two parameters, an iterative method is proposed for selecting these parameters.

Keywords: Binary logistic regression, multicollinearity, mean squared error, Monte Carlo simulation, MSE, biased estimator.

ABSTRACT

Ph.D. THESIS

LIU TYPE LOGISTIC ESTIMATORS

Yasin ASAR

THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCE OF SELÇUK UNIVERSITY

THE DEGREE OF DOCTOR OF PHILOSOPHY IN STATISTICS

Advisor: Prof. Dr. Aşır GENÇ

2015, 99 Pages

Jury: Prof. Dr. Aşır GENÇ, Assoc. Prof. Dr. Coşkun KUŞ, Assoc. Prof. Dr. Murat ERİŞOĞLU, Assoc. Prof. Dr. İsmail KINACI, Asst. Prof. Dr. Aydın KARAKOCA

The multicollinearity problem inflates the variance of the maximum likelihood estimator and affects the performance of this estimator negatively in binary logistic regression. Thus, biased estimators proposed to overcome this problem in the linear model have been generalized to logistic regression. Some of these estimators are reviewed in this thesis. Moreover, a new generalization is proposed to overcome multicollinearity, and the performances of the estimators are compared in the sense of the MSE criterion. Since the new estimator has two parameters, an iterative method is proposed to choose these parameters.

Keywords: Biased estimator, binary logistic regression, mean squared error, Monte Carlo simulation, MSE, multicollinearity

PREFACE

First and foremost I want to thank my advisor Prof. Dr. Aşır Genç for his support and guidance, and for his patience and kindness. It has been an honor to be his Ph.D. student. I appreciate all his contributions to making my Ph.D. experience productive and stimulating; he has been very inspiring for me. I would also like to thank my committee members Assoc. Prof. Dr. Coşkun Kuş and Assist. Prof. Dr. Aydın Karakoca for their guidance. I am also grateful to Assoc. Prof. Dr. Murat Erişoğlu, as he was always there to help. His guidance and helpful contributions made the thesis more comprehensive. I also appreciate his theoretical contributions to the thesis.

Lastly and most importantly, I would like to express immense appreciation and love to my wife and daughter. Without them, I simply would not be here. I owe them my success. Thank you for being so patient and kind.

Yasin ASAR
KONYA-2015

CONTENTS

ÖZET
ABSTRACT
PREFACE
CONTENTS
SYMBOLS AND ABBREVIATIONS
1. INTRODUCTION
1.1. Review of the Literature
2. MULTICOLLINEARITY
2.1. Definition of Multicollinearity
2.1.1. A graphical representation of the problem
2.2. Sources of Multicollinearity
2.3. Consequences of Multicollinearity
2.4. Multicollinearity Diagnostics
2.5. Some Methods to Overcome Multicollinearity
3. BIASED ESTIMATORS IN LINEAR MODEL
3.1. Ridge Estimator
3.1.1. MSE properties of ridge estimator
3.1.2. Selection of the parameter k
3.1.3. New proposed ridge estimators
3.1.4. Comparison of the ridge estimators: A Monte Carlo simulation study
3.1.5. Some conclusive remarks regarding ridge regression and simulation
3.2. Liu Estimator
3.3. Liu Type Estimator
3.4. Another Two Parameter Liu Type Estimator
3.5. Two Parameter Ridge Estimator
4. BIASED ESTIMATORS IN LOGISTIC MODEL: NEW METHODS
4.1. Logistic Ridge Estimator
4.1.1. MMSE comparison of MLE and LRE
4.2. Logistic Liu Estimator
4.2.1. MMSE comparison of MLE and LLE
4.3. Logistic Liu-Type Estimator
4.3.1. MMSE and MSE comparisons of MLE and LT1
4.3.2. A Monte Carlo simulation study
4.3.3. Results and discussions regarding the simulation study
4.3.4. Summary and conclusion
4.4. Another Logistic Liu-Type Estimator
4.4.1. MMSE and MSE comparisons of MLE and LT2
4.4.2. Proposed estimators of the shrinkage parameter d
4.4.3. A Monte Carlo simulation study
4.4.4. The results of the Monte Carlo simulation
4.4.5. An application of real data
4.4.6. Summary and conclusion
5. A NEW BIASED ESTIMATOR FOR THE LOGISTIC MODEL
5.1. MMSE Comparisons Between The Estimators
5.1.1. The Comparison of MLE and YA
5.1.2. The Comparison of LRE and YA
5.1.3. The Comparison of LLE and YA
5.1.4. The Comparison of LT1 and YA
5.1.5. The Comparison of LT2 and YA
5.2. Selection Processes of the Parameters d, k and q
5.3. Design of the Monte Carlo Simulation Study
5.4. Results of the Simulation and Discussions
5.5. Real Data Application 1
5.6. Real Data Application 2
5.7. Some Conclusive Remarks
6. CONCLUSION AND SUGGESTIONS
6.1. Conclusion
6.2. Suggestions
REFERENCES
CV

SYMBOLS AND ABBREVIATIONS

Symbols

$Bias(\cdot)$ : Bias of the vector
$diag(X)$ : Diagonal of the matrix $X$
$E(\cdot)$ : Expected value of the random variable
$Var(\cdot)$ : Variance-covariance matrix of the random variable
$Cov(X, Y)$ : Covariance of the random variables $X$ and $Y$
$rank(X)$ : Rank of the matrix $X$
$tr(X)$ : Trace of the matrix $X$
$a \perp b$ : Vector $a$ is orthogonal to vector $b$
$\sigma^2$ : Variance of the error terms in the linear regression model
$I_n$ : $n \times n$ identity matrix
$\varepsilon \sim N(0, \sigma^2 I_n)$ : The vector $\varepsilon$ is distributed normally with zero mean and variance $\sigma^2 I_n$
$X = (x_{ij})$ : Another representation of the matrix $X$ such that $x_{ij}$ is the element placed in the $i$th row and $j$th column of the matrix $X$
$|X|$ : Determinant of the matrix $X$
$X^{-1}$ : Inverse of the matrix $X$
$X'$ : Transpose of the matrix $X$
$\|\cdot\|$ : Euclidean norm of the vector

Abbreviations

p.d. : positive definite
IWLS : Iteratively weighted least squares
LLE : Logistic Liu estimator
LRE : Logistic ridge estimator
LT1 : Liu-type logistic estimator
LT2 : Another Liu-type logistic estimator
MAE : Mean absolute error
MLE : Maximum likelihood estimator
MMSE : Matrix mean square error
MSE : Mean square error
OLS : Ordinary least squares
VIF : Variance inflation factor
YA : New two-parameter estimator

1. INTRODUCTION

Regression analysis is one of the most widely used statistical techniques; it can be applied to investigate and model the relationship between variables. There is a wide range of applications of regression in areas such as engineering, the physical and chemical sciences, economics, management and the social sciences (Montgomery et al., 2001). Multiple linear regression is used to predict the values of a dependent variable when there is more than one explanatory variable (regressor). A multiple linear regression model can be expressed as follows:

$$Y = X\beta + \varepsilon \tag{1.1}$$

where $Y$ is an $n \times 1$ vector of the dependent variable, $X$ is an $n \times p$ data matrix (design matrix) whose columns are the explanatory variables, $\beta$ is a $p \times 1$ vector of unknown regression coefficients and $\varepsilon$ is an $n \times 1$ vector of random errors following the normal distribution with zero mean and variance $\sigma^2 I$, where $I$ is an $n \times n$ identity matrix, i.e., $\varepsilon \sim N(0, \sigma^2 I_n)$.

There are some assumptions on the random error term in multiple linear regression models, as follows:

1. The error terms $\varepsilon_i$, $i = 1, 2, \ldots, n$, are independent of each other and random,
2. $E(\varepsilon_i) = 0$, $i = 1, 2, \ldots, n$,
3. $Var(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$, where $Var(\cdot)$ is the variance function,
4. $Cov(\varepsilon_i, \varepsilon_j) = 0$, $i \neq j$, $1 \leq i, j \leq n$, where $Cov(\cdot,\cdot)$ is the covariance function,
5. $X_j \perp \varepsilon$, where $X_j$ is the $j$th column vector of the matrix $X$ such that $j = 1, 2, \ldots, p$ and $X = (X_1, X_2, \ldots, X_p)$,
6. The elements of the matrix $X$ are constants and $X$ has full rank, i.e., $rank(X) = p$,
7. The matrix $X'X$ is non-singular,
8. $\varepsilon \sim N(0, \sigma^2 I_n)$ holds for the hypothesis tests, i.e., the error terms are random variables following the normal distribution,
9. The model specification is correct.
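To make the notation concrete, the following Python sketch (illustrative only; the sample size, true coefficients and seed are assumed values, not taken from the thesis) generates data from model (1.1) and computes the OLS estimate through the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4

X = rng.normal(size=(n, p))          # design matrix with full rank (assumption 6)
beta = np.ones(p)                    # true coefficient vector (assumed)
eps = rng.normal(scale=0.5, size=n)  # errors: epsilon ~ N(0, sigma^2 I_n)
Y = X @ beta + eps                   # model (1.1)

# OLS estimate: (X'X)^{-1} X'Y, solved without forming the explicit inverse.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_ols)
```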

The above assumptions can be summarized in five basic assumptions (Allison, 1999; Aydın, 2014), namely:

1. $Y = X\beta + \varepsilon$ means that $Y$ is a linear function of $X$ and a random error term,
2. $E(\varepsilon_i) = 0$, $i = 1, 2, \ldots, n$, implies that the error terms have zero mean and, more importantly, that the expected value of $\varepsilon$ does not change with $X$, implying that $X$ and $\varepsilon$ are not correlated,
3. $Var(\varepsilon_i) = \sigma^2$, $i = 1, 2, \ldots, n$, provides that the variance of $\varepsilon$ is the same for all observations; this property is called homoscedasticity,
4. $Cov(\varepsilon_i, \varepsilon_j) = 0$, $i \neq j$, $1 \leq i, j \leq n$, means that the random errors are independent of each other,
5. $\varepsilon \sim N(0, \sigma^2 I_n)$ means that the error terms are distributed normally.

On the other hand, logistic regression or logit regression is a type of probabilistic statistical classification model (Bishop, 2006). It is also used for predicting the outcome of a categorical dependent variable. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by using probability scores as the predicted values of the dependent variable (Bhandari and Joensson, 2011). Logistic regression can be binomial or multinomial. Binomial (binary) logistic regression is a type of regression analysis where the dependent (response) variable is dichotomous and the explanatory (independent) variables are continuous, categorical or both.

In logistic regression, there is no assumption that the relationship between the independent variables and the dependent variable is linear; thus it is a nonlinear method and a useful tool for analyzing data that include categorical response variables. The binary logistic regression model has become a popular method of analysis in situations where the outcome variable is discrete or dichotomous. Although it originally gained acceptance in the field of epidemiologic research, the method has become commonly employed in applied sciences such as engineering, health policy, biomedical research, business and finance, criminology, ecology, linguistics and biology (Hosmer and Lemeshow, 2000).

It is very important to understand that using the logistic regression model for a dichotomous dependent variable as a predictive model is a better approach than using

the ordinary linear regression. It is known that if the five basic assumptions of the linear regression are satisfied, ordinary least squares (OLS) estimates of the regression coefficients are unbiased and have minimum variance. However, if the dependent variable is assumed to be dichotomous, then it can be observed that if the first and second assumptions are true, the third and fifth ones are definitely false. For example, if the fifth assumption is considered, it is easy to observe that the error terms can only take two values, implying that it is not possible for them to have a normal distribution. If the variance of the errors is considered, then the first assumption implies that the variance of $\varepsilon_i$ equals the variance of $y_i$. Generally, the variance of a dummy variable is $\pi_i(1 - \pi_i)$, where $\pi_i = \pi(x_i)$ is the probability that $y_i = 1$. Thus, the following equation holds: $Var(\varepsilon_i) = \pi_i(1 - \pi_i)$. But this equation says that the variance of $\varepsilon_i$ differs for different observations, which violates the third assumption. For a great discussion of this situation see Allison (1999), Chapter 2.

Of course, after determining the difference between the logistic regression and the linear regression in terms of the choice of parametric model and the assumptions, one can assess the regression analysis using the same general principles used in the linear regression model (Midi et al., 2010). In order for the model to be valid, it has to satisfy the assumptions of logistic regression. The following are the general assumptions involved in logistic regression analysis (UCLA, 2007):

1. The conditional probabilities are a logistic function of the explanatory variables,
2. No important variables are omitted,
3. Irrelevant variables are not included,
4. The explanatory variables are measured without error,
5. The observations of the variables are independent,
6. There is no linear dependency between the explanatory variables,
7. The errors of the model are distributed binomially.

Both in multiple linear regression models and in binary logistic regression models, one may not be able to verify the given assumptions most of the time. Assumptions 5, 6 and 7 of the linear model and the sixth assumption of the logistic model imply that there should be no linear dependency between the explanatory variables or the columns of the

matrix $X$. In other words, the explanatory variables are said to be orthogonal to each other. However, the regressors are not orthogonal in most applications. The multicollinearity (collinearity or ill-conditioning) problem arises when at least one linear function of the explanatory variables is nearly close to zero (Rawlings et al., 1998). The main point is that, if there is collinearity within the explanatory variables, it is hardly possible to obtain statistically good estimates of their distinct effects on some dependent variable. In the next chapter, the causes, results and diagnostics of the multicollinearity problem are given in detail.

In this thesis, the maximum likelihood estimator (MLE) and the following methods proposed to overcome multicollinearity are considered: the ridge estimator, the Liu estimator, Liu-type estimators and a two-parameter ridge estimator. These estimators are successfully used in the linear regression model, and all of them except the two-parameter ridge estimator have recently been adapted to the logistic regression model. In this study, the logistic version of the two-parameter ridge estimator is defined, and the above estimators are compared by using the matrix mean squared error (MMSE) and mean squared error (MSE) criteria in the logistic regression model. Both theoretical comparisons in the sense of MMSE and numerical comparisons via Monte Carlo simulation studies are given. In the simulations, the MSE criterion is used to compare the performances of the estimators.

The purposes of this study are as follows: some new ridge regression estimators are defined for the linear model in Chapter 3, and these new estimators are used successfully in the logistic versions of the above-mentioned estimators in Chapter 4; moreover, some new optimal shrinkage parameters are proposed for use in Liu-type estimators to decrease the variance of the estimator in the logistic regression model in Section 4.4; finally, a new two-parameter ridge estimator is defined for the logistic regression model, and it is shown, by conducting an extensive Monte Carlo simulation, that this new estimator is better than the above estimators in the sense of the MSE criterion.

The organization of the thesis is as follows: In Chapter 2, the formal definition of multicollinearity, a graphical representation of the problem, and the sources and consequences of multicollinearity are given. Moreover, multicollinearity diagnostics are reviewed.

In Chapter 3, biased estimators in the linear model which can be used to solve the collinearity problem are reviewed. The definition of the ridge estimator and its MSE

properties are given. Also, some new ridge estimators are defined and their performances are demonstrated via a Monte Carlo simulation study in Section 3.1. In the remaining parts of Chapter 3, the Liu estimator, the Liu-type estimator, another two-parameter Liu-type estimator and finally a two-parameter ridge estimator are given and some of their properties are reviewed.

In Chapter 4, brief introductory information regarding logistic regression is given. Logistic versions of the mentioned estimators are discussed. In Section 4.1, the logistic ridge estimator is given and the MMSE comparison of the maximum likelihood estimator and the logistic ridge estimator is obtained. In Section 4.2, the logistic version of the Liu estimator, called the logistic Liu estimator, is discussed and the MMSE comparison of MLE and the logistic Liu estimator is obtained. In Section 4.3, the logistic Liu-type estimator is reviewed and the MMSE comparison between MLE and this estimator is obtained. Moreover, a Monte Carlo simulation study is designed to compare some existing ridge estimators and the new estimators defined in Chapter 3 in logistic regression. In Section 4.4, another two-parameter Liu-type estimator is given in logistic regression. Again, MMSE and MSE comparisons are obtained. Moreover, some new shrinkage estimators are defined to be used in the logistic Liu-type estimator, and a Monte Carlo experiment is conducted to evaluate the performances of these estimators. Finally, an application to real data is demonstrated.

In Chapter 5, a new two-parameter logistic ridge estimator is defined. In Section 5.1, MMSE comparisons between the new estimator and the other reviewed estimators are obtained. In Section 5.2, some new iterative selection processes of the parameters are defined. In Section 5.3, a Monte Carlo simulation is designed to compare the performances of all logistic estimators discussed in the thesis. Finally, two real data applications are illustrated.

In the last chapter, a brief summary and conclusion are given and some future suggestions for practitioners are provided.

1.1. Review of the Literature

There are many distribution functions proposed for use in the analysis of a dichotomous dependent variable; see Cox and Snell (1989). However, the logistic distribution, being an extremely flexible and easily used function and providing

clinically meaningful interpretations, has become the popular distribution in this research area.

The weighted sum of squares can be minimized approximately by using the maximum likelihood estimator. However, this estimator becomes unstable when the explanatory variables are intercorrelated. Thus, due to high variance and very low t-ratios, the estimates of MLE are no longer trustworthy. This is because the weighted matrix of cross-products becomes ill-conditioned when there is multicollinearity. There are some solutions to this problem. One of them is the so-called ridge regression, which was first defined by Hoerl and Kennard (1970) for the linear model. The ridge estimator was successfully adapted to the binary logistic regression model by Schaefer et al. (1984). The authors applied the ridge estimators defined by Hoerl and Kennard (1970) and Hoerl et al. (1975) in logistic regression.

Recently, Månsson and Shukur (2011) applied and investigated a number of logistic ridge estimators. By conducting a Monte Carlo experiment, they investigated the performances of MLE and logistic ridge estimators in the presence of multicollinearity under different conditions. According to the results of the simulation, logistic ridge estimators have better performance than MLE.

Kibria et al. (2012) generalized different types of estimation methods of the ridge parameter proposed by Muniz et al. (2012) to be used for logistic ridge regression. They evaluated the performances of the estimators via a Monte Carlo simulation. In the simulation study, they also calculated the average values and the standard deviations of the ridge parameter. Results showed that logistic ridge estimators outperform the MLE approach.

In the study of Månsson et al. (2012), a new shrinkage estimator which is a generalization of the estimator defined by Liu (1993) is proposed. Using MSE, the optimal value of the shrinkage parameter is obtained and some methods of estimation are given. Since the logistic Liu estimator uses a shrinkage parameter, its length becomes smaller than the length of MLE. The authors showed that the logistic Liu estimator has a better performance than MLE according to the MSE and mean absolute error (MAE) criteria.

Huang (2012) defined a biased estimator which is a combination of the logistic ridge estimator and the logistic Liu estimator in order to combat multicollinearity. Necessary and sufficient conditions for the superiority of this estimator over MLE and the logistic ridge

estimator are given. Moreover, a Monte Carlo simulation is designed to evaluate the performances of the estimators numerically.

Finally, the logistic Liu-type estimator defined by Inan and Erdogan (2013) can also be used as a solution to the problem. There are two parameters used in this estimator, which appears to be a combination of the ridge estimator and the Liu estimator. It is shown that the logistic Liu-type estimator has a better performance than MLE and the logistic ridge estimator defined by Schaefer et al. (1984) in the sense of MSE. Moreover, a real data application is given in the paper.
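As a rough illustration of the estimators reviewed above (a sketch, not the exact formulas of the cited papers): the logistic MLE can be computed by iteratively weighted least squares (IWLS), and adding a positive constant k to the weighted cross-product matrix gives a ridge-type variant in the spirit of Schaefer et al. (1984). The function name and the fixed-iteration convergence scheme below are simplifying assumptions.

```python
import numpy as np

def logistic_irls(X, y, k=0.0, iters=25):
    """IWLS for binary logistic regression; k > 0 adds a ridge-type penalty
    to the weighted cross-product matrix (sketch with simplified convergence)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        W = np.clip(pi * (1.0 - pi), 1e-10, None)  # IWLS weights
        z = X @ beta + (y - pi) / W                # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X) + k * np.eye(p),
                               X.T @ (W * z))
    return beta
```

With collinear columns in X, the matrix X'WX is ill-conditioned and the k = 0 solution is unstable, which is exactly the situation the reviewed estimators address.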

2. MULTICOLLINEARITY

In the first chapter, an informal definition of collinearity, which was first introduced by Frisch (1934), is given: the problem occurs when there is a strong linear relation between the explanatory variables. In this chapter, the formal definition of multicollinearity, its sources and consequences, and the diagnostics of the problem are discussed.

2.1. Definition of Multicollinearity

It is given that $X = (X_1, X_2, \ldots, X_p)$, where $X_j$ contains the $n$ observations of the $j$th regressor. Now, the formal definition of multicollinearity can be written in terms of the linear dependencies of the columns of $X$. The vectors $X_1, X_2, \ldots, X_p$ are linearly dependent if there is a set of constants $a_1, a_2, \ldots, a_p$, not all zero, such that

$$\sum_{j=1}^{p} a_j X_j = 0. \tag{2.1}$$

If the left-hand side of the above equation is exactly equal to zero, it is said that the data set has perfect (or exact) multicollinearity, i.e., the correlation coefficient between two regressors is 1 or -1. In this case, the rank of the matrix $X'X$ becomes less than $p$ and its inverse does not exist. However, if equation (2.1) holds approximately for some subset of the columns of $X$, then there is a near-linear dependency in $X'X$ and the problem of collinearity exists. The matrix $X'X$ is then said to be ill-conditioned.

2.1.1. A graphical representation of the problem

The nature of the collinearity problem can be exhibited geometrically with the following figures, taken from Belsley et al. (2005). Some situations of the model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$, $i = 1, 2, \ldots, n$, are given. The figures show the scatters of the observations: on the $(x_1, x_2)$ floor are the $(x_1, x_2)$ scatters, while above them the data cloud resulting when the $y$ dimension is included is shown. Figure 2.1 shows the case where $x_1$ and $x_2$ are not collinear. The well-defined least squares plane is represented by the data cloud above. The $y$

intercept of this plane is the estimate of $\beta_0$, and the partial slopes in the $x_1$ and $x_2$ directions are the estimates of $\beta_1$ and $\beta_2$ respectively.

Figure 2.1. No collinearity

Figure 2.2. Exact collinearity

Figure 2.2 exhibits the case of perfect multicollinearity between $x_1$ and $x_2$. One can see that there is no plane through the data cloud, i.e., the plane of least squares is not defined. Figure 2.3 shows strong (but not perfect) collinearity. The plane of least squares is ill-defined since the least squares estimates are imprecise, i.e., their variances inflate.

Figure 2.3. Strong collinearity: All coefficients ill-determined

Figure 2.4. Strong collinearity: Constant term well-determined

Figure 2.5. Strong collinearity: $\beta_2$ well-determined

It is known that multicollinearity may not inflate all of the parameter estimates. This situation is represented in Figure 2.4, where the estimates of $\beta_1$ and $\beta_2$ are imprecise but the estimate of the intercept term is precise. Similarly, in Figure 2.5 the estimate of $\beta_2$ is precise while the estimates of $\beta_0$ and $\beta_1$ lack precision.

2.2. Sources of Multicollinearity

There are several sources of collinearity. The following are four primary sources of collinearity (Montgomery et al., 2001):

1. The data collection method employed: when the researcher samples only a subspace of the region of the explanatory variables defined by equation (2.1).
2. Constraints on the model or in the population: when physical constraints are present, multicollinearity exists regardless of the sampling method employed. These constraints generally occur in problems involving production or chemical processes, where the regressors are the components of a product and these components add to a constant.
3. Model specification: it is possible that adding a polynomial term to a regression model can cause ill-conditioning in the matrix $X'X$.
4. An overdefined model: if a model has more explanatory variables than observations, it is called an overdefined model. According to Gunst and Webster (1975), there are three things to do: the first is to redefine the model in terms of a smaller set of explanatory variables; the second is to perform preliminary studies using only subsets of the original explanatory variables; the last is to use methods such as principal components analysis to decide which variables to remove from the model.

2.3. Consequences of Multicollinearity

If there is a multicollinearity problem with the data set $X$, one can face the following consequences:

1. The ordinary least squares (OLS) estimator used in the linear model and the MLE used in the logistic model are unbiased estimators of the coefficient vector. However, they

have large variances and covariances, which make the estimation process difficult.
2. The lengths of the unbiased estimators mentioned above tend to be large in absolute value. It is easy to see this by computing the squared distance between the unbiased estimator and the true parameter. To compute this distance, one should find the diagonal elements of the matrix $(X'X)^{-1}$. Since some of the eigenvalues of the matrix $X'X$ will be close to zero in the presence of collinearity, the inverses of those eigenvalues become too large (possibly approaching infinity in the worst case). Thus, the squared distance becomes too large.
3. Because of the first consequence, the confidence intervals of the coefficients become wider, which leads to the acceptance of the null hypothesis.
4. Similarly, the t-ratios of the coefficients become statistically insignificant.
5. However, the measure of goodness of fit $R^2$ may be very high, even though the t-ratios are insignificant.
6. The unbiased estimators and their standard errors are very sensitive to small changes in the values of the observations.
7. Finally, the multicollinearity problem can affect model selection seriously. The increase in the sample standard errors of the coefficients virtually assures a tendency for relevant variables to be discarded incorrectly from regression equations (Farrar and Glauber, 1967).

2.4. Multicollinearity Diagnostics

There are several techniques that can be used for detecting multicollinearity. Some of them, which directly determine the degree of collinearity and provide information for determining which regressors are involved in the collinearity, are given as follows:

1. Correlation matrix: after standardizing the data, the matrix $X'X$ becomes the correlation matrix. Let $X'X = (r_{ij})$, where $r_{ij}$ represents the correlation between the variables $x_i$ and $x_j$. The stronger the linear dependency is, the closer $r_{ij}$ becomes to 1. However, this criterion shows only pairwise correlations, i.e., if more than two

variables are involved in the collinearity, it is not certain that the pairwise correlations will be large (Montgomery et al., 2001).

2. Variance inflation factors (VIF): the diagonal elements of the matrix $S = (X'X)^{-1}$ were first used by Farrar and Glauber (1967) to determine collinearity and were named variance inflation factors by Marquardt (1970). With $S_{jj}$ being the $j$th diagonal element of the matrix $S$, $j = 1, 2, \ldots, p$, it can be written that $VIF_j = S_{jj} = (1 - R_j^2)^{-1}$, where $R_j^2$ is the coefficient of determination obtained when $x_j$ is regressed on the remaining $p - 1$ regressors. It can be seen that if $R_j^2$ increases, the value of $VIF_j$ increases. The VIF of each explanatory variable in the model measures the combined effect of the linear dependencies among the regressors on the variance term. It is said that if $VIF_j$ exceeds 10, then there is a multicollinearity problem with the $j$th regressor; thus the $j$th regressor is estimated poorly (Montgomery et al., 2001). However, in weaker models, which is often the case in logistic regression, values above 2.5 may be a sign of collinearity (Allison, 1999). There is another interpretation of VIF: since the length of the confidence interval concerning the $j$th regression coefficient can be computed by $L_j = 2(S_{jj})^{1/2}\hat{\sigma}\,t_{\alpha/2,\,n-p-1}$, the square root of $VIF_j$ shows how large the interval is.

3. Eigenvalues of the matrix $X'X$: the eigenvalue analysis of $X'X$ can be used to determine multicollinearity. Let the eigenvalues of $X'X$ be $\lambda_1, \lambda_2, \ldots, \lambda_p$, and let $\lambda_{max}$ and $\lambda_{min}$ be the maximum and minimum eigenvalues respectively. If there is a linear dependency between the columns of $X$, then one or more eigenvalues of $X'X$ will be small; if the dependency is strong, the smallest eigenvalue will be close to zero.

4. Condition number: the condition number can be defined as follows:

$$\kappa = \frac{\lambda_{max}}{\lambda_{min}}. \tag{2.2}$$

It is used as a measure of the degree of multicollinearity. If $\kappa < 10$, then there is no multicollinearity problem. If $10 \leq \kappa \leq 100$, then there is moderate multicollinearity. If $\kappa > 100$, then there is strong multicollinearity (Aydın, 2014; Montgomery et al., 2001).

5. Determinant of $X'X$: one can also check the determinant of $X'X$ to determine whether there is collinearity or not. Since $X'X$ is in correlation form, its possible values are between zero and one. If $|X'X| \approx 0$, then there is a near-linear dependency, so there is multicollinearity (Aydın, 2014; Montgomery et al., 2001).

There are some other methods in the literature for determining the collinearity problem. However, the first four methods given above will be used to determine the problem most of the time in this study.

2.5. Some Methods to Overcome Multicollinearity

There are several methods for dealing with the problem of multicollinearity, including collecting additional data, model re-specification and using biased estimators, especially when dealing with the regression coefficients. The first two methods are not the topic of this study, but the last one is. In this study, some biased estimators are considered, namely the ridge estimator, the Liu estimator, Liu-type estimators and finally a new two-parameter ridge estimator. In Chapter 3, these biased estimators are defined and their properties are investigated in the linear model.
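A minimal Python sketch of the diagnostics of Section 2.4 (the data and function name are assumptions for illustration): it standardizes the regressors so that X'X is in correlation form, then reports the VIFs, eigenvalues, condition number (2.2) and determinant.

```python
import numpy as np

def collinearity_diagnostics(X):
    """VIFs, eigenvalues, condition number (2.2) and determinant of X'X."""
    # Unit-length scaling so that X'X is the correlation matrix.
    Z = (X - X.mean(axis=0)) / (np.sqrt(X.shape[0] - 1) * X.std(axis=0, ddof=1))
    XtX = Z.T @ Z
    eig = np.linalg.eigvalsh(XtX)          # lambda_1 <= ... <= lambda_p
    vif = np.diag(np.linalg.inv(XtX))      # VIF_j = (1 - R_j^2)^{-1}
    kappa = eig.max() / eig.min()          # condition number (2.2)
    return vif, eig, kappa, np.linalg.det(XtX)

# Example: two nearly collinear regressors plus an independent one.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=200), rng.normal(size=200)])
vif, eig, kappa, det = collinearity_diagnostics(X)
print(vif, kappa, det)  # large VIFs and kappa, determinant close to zero
```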

3. BIASED ESTIMATORS IN LINEAR MODEL

Consider the multiple linear regression model given in (1.1). The OLS estimator of $\beta$ is given by

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y. \tag{3.1}$$

The OLS estimator can be obtained by minimizing the following sum of squared deviations objective function with respect to the coefficient vector $\beta$:

$$S(\beta) = (Y - X\beta)'(Y - X\beta) \tag{3.2}$$

where the prime denotes the transposition operation. Assuming that the assumptions of regression hold, OLS is the unbiased linear estimator with minimum variance (BLUE). However, the OLS estimator becomes unstable and its variance is inflated when there is multicollinearity. If the distance between $\beta$ and $\hat{\beta}_{OLS}$ is considered, one can obtain the following distance function and its expected value respectively:

$$L_1^2 = (\hat{\beta}_{OLS} - \beta)'(\hat{\beta}_{OLS} - \beta), \tag{3.3}$$

$$E(L_1^2) = \sigma^2\,tr\left[(X'X)^{-1}\right] = \sigma^2\sum_{j=1}^{p}\frac{1}{\lambda_j} \tag{3.4}$$

where $\lambda_j$ is the $j$th eigenvalue of the matrix $X'X$ such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. When the error term is distributed normally, the variance of the distance can be written as follows:

$$Var(L_1^2) = 2\sigma^4\,tr\left[(X'X)^{-2}\right] = 2\sigma^4\sum_{i=1}^{p}\frac{1}{\lambda_i^2}. \tag{3.5}$$

It can easily be seen from the above equations that if one of the eigenvalues is close to zero, then the variance of the distance becomes inflated. Thus, the length of the OLS estimator becomes longer than that of the original coefficient vector.
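Equation (3.4) can be checked numerically; in the sketch below (assumed sizes and seed), the expected squared distance explodes as soon as one eigenvalue of X'X approaches zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 100, 1.0

X_ortho = rng.normal(size=(n, 3))      # nearly orthogonal regressors
X_coll = X_ortho.copy()
X_coll[:, 2] = X_coll[:, 0] + X_coll[:, 1] + 0.001 * rng.normal(size=n)

for X in (X_ortho, X_coll):
    lam = np.linalg.eigvalsh(X.T @ X)  # eigenvalues of X'X
    print(sigma2 * np.sum(1.0 / lam))  # E(L1^2) from (3.4)
```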

In order to overcome this problem, biased estimators have been proposed. Although these estimators introduce some bias, their variances are much smaller than the variance of the OLS estimator. The following biased estimators are reviewed in this chapter: the ridge estimator (Hoerl and Kennard, 1970), the Liu estimator (Liu, 1993), the Liu-type estimator (Liu, 2003), the two-parameter estimator (Özkale and Kaçıranlar, 2007) and finally the two-parameter ridge estimator (Lipovetsky and Conlin, 2005).

Before reviewing these estimators, a comparison criterion is needed. Since the MMSE contains all the relevant information about an estimator, the MMSE and MSE are commonly used to compare the performances of estimators. Let $\tilde{\beta}$ be an estimator of the coefficient vector $\beta$. Then, the MMSE and MSE of $\tilde{\beta}$ can be obtained respectively as follows:

$$MMSE(\tilde{\beta}) = E\left[(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)'\right] = Var(\tilde{\beta}) + Bias(\tilde{\beta})Bias(\tilde{\beta})', \tag{3.6}$$

$$MSE(\tilde{\beta}) = E\left[(\tilde{\beta} - \beta)'(\tilde{\beta} - \beta)\right] = tr\left[Var(\tilde{\beta})\right] + Bias(\tilde{\beta})'Bias(\tilde{\beta}) \tag{3.7}$$

where $Bias(\tilde{\beta}) = E(\tilde{\beta}) - \beta$ is the bias of the estimator and $MSE(\tilde{\beta}) = tr\left[MMSE(\tilde{\beta})\right]$ holds.

3.1. Ridge Estimator

The ridge estimator was first proposed by Hoerl and Kennard (1970) in order to control the inflation and general instability associated with the OLS estimator. The idea behind ridge regression is that by adding a positive constant $k > 0$ to the diagonal elements of the matrix $X'X$, one obtains a smaller condition number, so the variance is decreased. The ridge estimator can be written as follows:

$$\hat{\beta}_k = (X'X + kI)^{-1}X'Y. \tag{3.8}$$
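Before turning to the derivation, a minimal numerical sketch of (3.8) (the value of k is arbitrary here, chosen only for illustration):

```python
import numpy as np

def ridge(X, Y, k):
    """Ridge estimator (3.8): (X'X + kI)^{-1} X'Y; k = 0 recovers OLS."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ Y)

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n)])  # collinear columns
Y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=n)

print(ridge(X, Y, k=0.0))   # unstable OLS solution
print(ridge(X, Y, k=0.5))   # shrunken, stabilized ridge solution
```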

Hoerl and Kennard (1970) obtained the ridge estimator by minimizing $(\beta - \hat{\beta}_{OLS})'X'X(\beta - \hat{\beta}_{OLS})$ subject to $\beta'\beta \leq c$, where $c$ is a constant. As a Lagrangian problem, one can form the following equation:

$$F = (\beta - \hat{\beta}_{OLS})'X'X(\beta - \hat{\beta}_{OLS}) + \frac{1}{k}\left(\beta'\beta - c\right) \tag{3.9}$$

where $1/k$ is the multiplier. Then, differentiating (3.9) with respect to $\beta$,

$$\frac{\partial F}{\partial \beta} = 2X'X(\beta - \hat{\beta}_{OLS}) + \frac{2}{k}\beta = 0 \tag{3.10}$$

is obtained, and solving the last equation gives the ridge estimator given in equation (3.8). The ridge estimator can also be obtained by minimizing the following objective function with respect to $\beta$:

$$S(\beta) = \|Y - X\beta\|^2 + k\|\beta\|^2 \tag{3.11}$$

where $\|\cdot\|$ is the usual Euclidean norm.

3.1.1. MSE properties of ridge estimator

To investigate the properties of the ridge estimator, the MMSE and MSE functions should be obtained. By using (3.6) and (3.7), it is easy to obtain these functions. First of all, it is better to compute the variance and bias of the ridge estimator. To manage this, the following alternative form of the ridge estimator is used:

$$\hat{\beta}_k = C_k^{-1}C\,\hat{\beta}_{OLS} \tag{3.12}$$

where $C_k = X'X + kI$, $C = X'X$ and $I$ is the $p \times p$ identity matrix. The bias and variance of the estimator are obtained respectively as follows:

$$Bias(\hat{\beta}_k) = \left(C_k^{-1}C - I\right)\beta = -kC_k^{-1}\beta, \tag{3.13}$$

$$Var(\hat{\beta}_k) = \sigma^2 C_k^{-1}CC_k^{-1}. \tag{3.14}$$

Moreover, to define the MSE functions easily, the original model (1.1) can be written as follows:

$$Y = Z\alpha + \varepsilon \tag{3.15}$$

where $\alpha = Q'\beta$ and $Z = XQ$, with $Q$ being the matrix whose columns are the eigenvectors of $X'X$, so that $Z'Z = \Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_p)$. This model is also called the canonical model.

Thus, the MMSE and MSE of the ridge estimator can be obtained respectively as follows:

$$MMSE(\hat{\beta}_k) = \sigma^2 C_k^{-1}CC_k^{-1} + k^2 C_k^{-1}\beta\beta'C_k^{-1}, \tag{3.16}$$

$$MSE(\hat{\beta}_k) = \sigma^2\sum_{j=1}^{p}\frac{\lambda_j}{(\lambda_j + k)^2} + k^2\sum_{j=1}^{p}\frac{\alpha_j^2}{(\lambda_j + k)^2} = f_1(k) + f_2(k) \tag{3.17}$$

where $f_1(k)$ is the total variance and $f_2(k)$ is the total bias of the estimator.

There are some interesting properties of this function, obtained by Hoerl and Kennard (1970). The total variance $f_1(k)$ is a continuous, monotonically decreasing function of $k$. The squared bias function $f_2(k)$ is a continuous, monotonically increasing function of $k$; see the figure below.

Figure 3.1. MSE function of ridge estimator
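The trade-off in (3.17) is easy to trace numerically. In the sketch below (the eigenvalues, canonical coefficients and sigma^2 are assumed illustrative values), f1(k) decreases and f2(k) increases, so their sum dips below the OLS value for small k, as in Figure 3.1:

```python
import numpy as np

lam = np.array([10.0, 5.0, 0.05])  # eigenvalues of X'X, one near zero (assumed)
alpha = np.ones(3)                 # canonical coefficients (assumed)
sigma2 = 1.0

def mse_ridge(k):
    f1 = sigma2 * np.sum(lam / (lam + k) ** 2)         # total variance f1(k)
    f2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)  # squared bias f2(k)
    return f1 + f2

ks = np.linspace(0.0, 2.0, 201)
print(mse_ridge(0.0))                              # OLS MSE: sigma^2 * sum(1/lambda_j)
print(ks[np.argmin([mse_ridge(k) for k in ks])])   # k minimizing the curve of Figure 3.1
```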

These two results can be proven by examining the derivatives of $f_1(k)$ and $f_2(k)$; see Hoerl and Kennard (1970). Moreover, the most important property is that there is always a $k > 0$ such that $E(L_1^2(k)) < E(L_1^2(0))$, where $E(L_1^2(0)) = E\left[(\hat{\beta}_{OLS} - \beta)'(\hat{\beta}_{OLS} - \beta)\right]$. This property was proved by Hoerl and Kennard (1970), and they found the following sufficient condition:

$$k < \frac{\sigma^2}{\alpha_{max}^2} \tag{3.18}$$

where $\alpha_{max}$ is the maximum element of the coefficient vector $\alpha$. However, Theobald (1974) expanded this upper bound to $2\sigma^2/\alpha_{max}^2$ by using the MMSE properties of the ridge estimator.

Since the estimator given above fully depends on the unknown parameters $\sigma^2$ and $\alpha$, Hoerl and Kennard (1970) suggested using the unbiased estimators $\hat{\sigma}^2 = (Y - X\hat{\beta}_{OLS})'(Y - X\hat{\beta}_{OLS})/(n - p)$ and $\hat{\alpha} = \Lambda^{-1}Z'Y$ rather than the unknown values.

On the other hand, there are some estimators of the ridge parameter that are greater than $\sigma^2/\alpha_{max}^2$. Thus there is no definite condition to determine whether an estimator is optimal or not. This is an open problem for researchers.

3.1.2. Selection of the parameter k

There are many studies proposing different selection processes for the parameter $k$. In this subsection, a short literature review is provided. Before giving the review, the following discussion is needed: it can be seen from Figure 3.1 that it is possible to obtain smaller MSE values as long as the derivative of the MSE function of the ridge estimator is negative. If one adds a different positive $k_j$ to each diagonal element of the matrix $X'X$ (this is known as generalized ridge regression), the optimal value of $k_j$ can be obtained by computing

$$k_j = \frac{\sigma^2}{\alpha_j^2}, \tag{3.19}$$

and by using the unbiased estimators $\hat{\sigma}^2$ and $\hat{\alpha}_j$, the following is obtained:

$$\hat{k}_j = \frac{\hat{\sigma}^2}{\hat{\alpha}_j^2}. \tag{3.20}$$

There are some methods using the above individual parameter and its modifications in order to obtain new estimators of the usual ridge parameter. Some of these new estimators, being greater than $\hat{\sigma}^2/\hat{\alpha}_{max}^2$, do not satisfy the sufficient condition. Moreover, there are estimators not satisfying (3.18) that have better performances at the same time. This is because, if one can find estimators making the derivative of equation (3.17) positive up to the intersection point of the $MSE(\hat{\beta}_k)$ and $MSE(\hat{\beta}_{OLS})$ functions, it is possible that $MSE(\hat{\beta}_k) < MSE(\hat{\beta}_{OLS})$. Some of the estimators considered in this study do not satisfy condition (3.18). The following estimators are chosen from the literature:

Hoerl and Kennard (1970) proposed the following estimator to estimate the values of $k$:

$$\hat{k}_{HK} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{max}^2}. \tag{3.21}$$

Hoerl et al. (1975) proposed the following estimator by applying the harmonic mean function to the individual parameter (3.19):

$$\hat{k}_{HKB} = \frac{p\hat{\sigma}^2}{\hat{\alpha}'\hat{\alpha}} \tag{3.22}$$

where $\hat{\alpha}$ is the OLS estimator of $\alpha$ given in (3.15), so that $\hat{k}_{HK} < \hat{k}_{HKB}$ clearly holds.

Lawless and Wang (1976) defined a new individual parameter $k_j = \sigma^2/(\lambda_j\alpha_j^2)$ and applied the harmonic mean function to obtain

$$\hat{k}_{LW} = \frac{p\hat{\sigma}^2}{\sum_{j=1}^{p}\lambda_j\hat{\alpha}_j^2}, \tag{3.23}$$

which is definitely smaller than $\hat{k}_{HKB}$.

Kibria (2003) proposed using the arithmetic and geometric means and the median of the individual parameter (3.19) to obtain the following new estimators:

$$\hat{k}_{AM} = \frac{1}{p}\sum_{j=1}^{p}\frac{\hat{\sigma}^2}{\hat{\alpha}_j^2}, \tag{3.24}$$

which is the arithmetic mean of $\hat{k}_j$ given in (3.20),

$$\hat{k}_{GM} = \frac{\hat{\sigma}^2}{\left(\prod_{j=1}^{p}\hat{\alpha}_j^2\right)^{1/p}}, \tag{3.25}$$

which is the geometric mean of $\hat{k}_j$, and

$$\hat{k}_{MED} = median\left(\frac{\hat{\sigma}^2}{\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p, \tag{3.26}$$

which is obtained by using the median function. All of these estimators are clearly greater than $\hat{k}_{HK}$.

Khalaf and Shukur (2005) defined the following estimator:

$$\hat{k}_{KS} = \frac{\lambda_{max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{max}\hat{\alpha}_{max}^2}. \tag{3.27}$$

The suggested modification here is adding the amount $\hat{\sigma}^2/\lambda_{max}$, which is a function of the correlation between the independent variables, to the denominator of (3.21). However, this amount varies according to the sample size used. Thus, to keep the variation fixed, the authors multiply $\hat{\sigma}^2/\lambda_{max}$ by the number of degrees of freedom $n - p$. In the end, $\hat{k}_{KS} < \hat{k}_{HK}$ holds.

Alkhamisi et al. (2006) suggested applying the modifications mentioned in Kibria (2003) to the estimator $\hat{k}_{KS}$ defined by Khalaf and Shukur (2005) as follows:

$$\hat{k}_{KS\text{-}max} = \max_{j}\left(\frac{\lambda_j\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p, \tag{3.28}$$

$$\hat{k}_{KS\text{-}mean} = \frac{1}{p}\sum_{j=1}^{p}\frac{\lambda_j\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}, \tag{3.29}$$

$$\hat{k}_{KS\text{-}med} = median\left(\frac{\lambda_j\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_j\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p. \tag{3.30}$$

Alkhamisi and Shukur (2007) proposed a new method based on (3.21) to estimate the ridge parameter, which is given by

$$\hat{k}_{AS} = \hat{k}_{HK} + \frac{1}{\lambda_{max}} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{max}^2} + \frac{1}{\lambda_{max}}, \tag{3.31}$$

which is definitely greater than $\hat{k}_{HK}$. Moreover, new methods based on (3.31) are derived as follows:

$$\hat{k}_{NHKB} = \hat{k}_{HKB} + \frac{1}{\lambda_{max}}, \tag{3.32}$$

$$\hat{k}_{NAS} = \max_{j}\left(\hat{k}_j + \frac{1}{\lambda_j}\right), \quad j = 1, 2, \ldots, p, \tag{3.33}$$

$$\hat{k}_{ARITH} = \frac{1}{p}\sum_{j=1}^{p}\left(\hat{k}_j + \frac{1}{\lambda_j}\right), \tag{3.34}$$

$$\hat{k}_{NMED} = median\left(\hat{k}_j + \frac{1}{\lambda_j}\right), \quad j = 1, 2, \ldots, p; \tag{3.35}$$

again, one can observe that $\hat{k}_{NHKB} > \hat{k}_{HKB}$ and $\hat{k}_{NAS} > \hat{k}_{HK}$.

Muniz and Kibria (2009) defined new estimators by applying the algorithms of the geometric mean and the square root to the approaches obtained by Khalaf and Shukur (2005) and Kibria (2003). The idea of the square root transformation is taken from Alkhamisi and Shukur (2008). The proposed estimators are as follows:

$$\hat{k}_{KM1} = \left(\prod_{j=1}^{p}\frac{\lambda_{max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{max}\hat{\alpha}_j^2}\right)^{1/p}, \tag{3.36}$$

$$\hat{k}_{KM2} = \max_{j}\left(\frac{1}{m_j}\right), \quad j = 1, 2, \ldots, p, \tag{3.37}$$

$$\hat{k}_{KM3} = \max_{j}\left(m_j\right), \quad j = 1, 2, \ldots, p, \tag{3.38}$$

$$\hat{k}_{KM4} = \left(\prod_{j=1}^{p}\frac{1}{m_j}\right)^{1/p}, \tag{3.39}$$

$$\hat{k}_{KM5} = \left(\prod_{j=1}^{p}m_j\right)^{1/p}, \tag{3.40}$$

$$\hat{k}_{KM6} = median\left(\frac{1}{m_j}\right), \quad j = 1, 2, \ldots, p, \tag{3.41}$$

$$\hat{k}_{KM7} = median\left(m_j\right), \quad j = 1, 2, \ldots, p, \tag{3.42}$$

where $m_j = \sqrt{\hat{\sigma}^2/\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$.

Muniz et al. (2012) proposed some new estimators of the ridge regression parameter. These estimators use different quantiles and the square root transformation proposed in Khalaf and Shukur (2005) and Alkhamisi and Shukur (2008) respectively. However, the base of the different functions is no longer the optimal value (3.19) but a modification proposed by Khalaf and Shukur (2005). This modification, which in general leads to larger values of the ridge parameter than those derived from the optimal values, was shown to work well in the simulation study conducted in that paper. Writing $t_j = \frac{\lambda_{max}\hat{\sigma}^2}{(n - p)\hat{\sigma}^2 + \lambda_{max}\hat{\alpha}_j^2}$, the proposed estimators are

$$\hat{k}_{KM8} = \max_{j}\left(\frac{1}{\sqrt{t_j}}\right), \tag{3.43}$$

$$\hat{k}_{KM9} = \max_{j}\left(\sqrt{t_j}\right), \tag{3.44}$$

$$\hat{k}_{KM10} = \left(\prod_{j=1}^{p}\frac{1}{\sqrt{t_j}}\right)^{1/p}, \tag{3.45}$$

$$\hat{k}_{KM11} = \left(\prod_{j=1}^{p}\sqrt{t_j}\right)^{1/p}, \tag{3.46}$$

$$\hat{k}_{KM12} = median\left(\frac{1}{\sqrt{t_j}}\right) \tag{3.47}$$

where $j = 1, 2, \ldots, p$.
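To fix ideas, the sketch below (function and variable names are assumptions made for illustration) computes a representative subset of the reviewed rules, namely (3.21)-(3.27), from the canonical quantities of (3.15):

```python
import numpy as np

def ridge_parameters(X, Y):
    """Estimates of k: HK (3.21), HKB (3.22), LW (3.23), AM/GM/MED (3.24)-(3.26)
    and KS (3.27), computed from the canonical form (3.15)."""
    n, p = X.shape
    lam, Q = np.linalg.eigh(X.T @ X)              # eigenvalues/eigenvectors of X'X
    Z = X @ Q                                     # canonical regressors
    alpha = (Z.T @ Y) / lam                       # OLS estimate of alpha
    s2 = np.sum((Y - Z @ alpha) ** 2) / (n - p)   # unbiased sigma^2 estimate
    kj = s2 / alpha**2                            # individual parameters (3.20)
    return {
        "HK": s2 / np.max(alpha**2),
        "HKB": p * s2 / np.sum(alpha**2),
        "LW": p * s2 / np.sum(lam * alpha**2),
        "AM": np.mean(kj),
        "GM": s2 / np.prod(alpha**2) ** (1.0 / p),
        "MED": np.median(kj),
        "KS": lam.max() * s2 / ((n - p) * s2 + lam.max() * np.max(alpha**2)),
    }
```

Each rule plugs into (3.8) in place of k; the simulation of the next subsections compares the resulting AMSE values.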

Dorugade (2014) suggested some new ridge parameters. The author proposed a new individual parameter which is a modification of (3.20), obtained by multiplying its denominator by $\lambda_{max}/2$. This estimator takes on a little more bias than the estimator given by Hoerl et al. (1975) but substantially reduces the total variance of the parameter estimates compared with the estimator given by Lawless and Wang (1976), thus improving the mean square error of estimation and prediction. The suggested individual estimator is defined as follows:

$$k_j^{AD} = \frac{2\hat{\sigma}^2}{\lambda_{max}\hat{\alpha}_j^2}, \quad j = 1, 2, \ldots, p. \tag{3.48}$$

This leads to the denominator of the new estimator being greater than that of (3.20) by the factor $\lambda_{max}/2$. Thus, one can write $k_j^{AD} \leq \hat{k}_j$, $j = 1, 2, \ldots, p$. It is clear that this new estimator is between the estimators given by Hoerl et al. (1975) and Lawless and Wang (1976). After that, the author uses the arithmetic, geometric and harmonic means and the median function to obtain the following new estimators:

$$\hat{k}_{AD1} = \frac{2\hat{\sigma}^2}{p\lambda_{max}}\sum_{j=1}^{p}\frac{1}{\hat{\alpha}_j^2}, \tag{3.49}$$

which is the arithmetic mean of $k_j^{AD}$,

$$\hat{k}_{AD2} = median\left(\frac{2\hat{\sigma}^2}{\lambda_{max}\hat{\alpha}_j^2}\right), \quad j = 1, 2, \ldots, p, \tag{3.50}$$

$$\hat{k}_{AD3} = \frac{2\hat{\sigma}^2}{\lambda_{max}\left(\prod_{j=1}^{p}\hat{\alpha}_j^2\right)^{1/p}}, \tag{3.51}$$

which is the geometric mean of $k_j^{AD}$, and

$$\hat{k}_{AD4} = \frac{2p\hat{\sigma}^2}{\lambda_{max}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.52}$$

which is the harmonic mean of $k_j^{AD}$. All of these estimators satisfy the upper bound given in (3.18).

3.1.3. New proposed ridge estimators

In this subsection, some new estimators of the ridge parameter $k$ are defined. The newly defined estimators are modifications of the estimators $\hat{k}_{AM}$ proposed in Kibria (2003) and $\hat{k}_{AD4}$ proposed in Dorugade (2014). Some transformations are applied following Alkhamisi and Shukur (2007) and Khalaf and Shukur (2005) in order to obtain new estimators having better performance. The following are the newly defined estimators:

1. Following Dorugade (2014), the first new estimator is obtained by multiplying the denominator of (3.20) by $\lambda_{max}^{2/p}$, such that $k_j = \frac{\hat{\sigma}^2}{\lambda_{max}^{2/p}\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$. The new estimator, which is the harmonic mean of this individual estimator and smaller than $\hat{k}_{HKB}$, is as follows:

$$\hat{k}_{AY1} = \frac{p\hat{\sigma}^2}{\lambda_{max}^{2/p}\sum_{j=1}^{p}\hat{\alpha}_j^2}. \tag{3.53}$$

2. Similarly, multiplying the denominator of (3.20) by $\lambda_{max}^{3/p}$, the second new estimator, being smaller than $\hat{k}_{HKB}$, is obtained as follows:

$$\hat{k}_{AY2} = \frac{p\hat{\sigma}^2}{\lambda_{max}^{3/p}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.54}$$

which is the harmonic mean of the individual parameter $k_j = \frac{\hat{\sigma}^2}{\lambda_{max}^{3/p}\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$.

3. The third new estimator is obtained by modifying the denominator of $\hat{k}_{HKB}$, multiplying it by $\lambda_{max}^{1/3}$. It is defined as follows:

$$\hat{k}_{AY3} = \frac{p\hat{\sigma}^2}{\lambda_{max}^{1/3}\sum_{j=1}^{p}\hat{\alpha}_j^2} \tag{3.55}$$

which is clearly smaller than $\hat{k}_{HKB}$.

4. Another new estimator is obtained by multiplying the denominator of $\hat{k}_{HKB}$ by $p^{1/3}$ as follows:

$$\hat{k}_{AY4} = \frac{p\hat{\sigma}^2}{p^{1/3}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.56}$$

which is again smaller than $\hat{k}_{HKB}$.

5. The last new estimator is obtained by multiplying the denominator of the individual parameter (3.20) by $p\lambda_{max}/2$, such that $k_j = \frac{2\hat{\sigma}^2}{p\lambda_{max}\hat{\alpha}_j^2}$, $j = 1, 2, \ldots, p$. It is defined by

$$\hat{k}_{AY5} = \frac{2p\hat{\sigma}^2}{p\lambda_{max}\sum_{j=1}^{p}\hat{\alpha}_j^2}, \tag{3.57}$$

which is definitely smaller than $\hat{k}_{HKB}$.

3.1.4. Comparison of the ridge estimators: A Monte Carlo simulation study

This subsection is devoted to the comparison of the ridge estimators via a Monte Carlo simulation. In conducting the simulation, the performances of the estimators are compared in the sense of MSE. For a sound Monte Carlo simulation, two criteria are used in the design. One criterion is to determine the effective factors affecting the properties of the estimators. The other is to specify the criterion of judgment. The sample size $n$, the number of predictors $p$, the degree of correlation $\rho$ and the variance of the error terms $\sigma^2$ are decided to be the effective factors. Also, the mean squared error (MSE) is chosen as the criterion for comparing the performances. Thus the average MSE (AMSE) values of all estimators are computed with respect to the different effective factors.

There are many ridge estimators proposed in the papers mentioned in the previous subsections. Hence, some of them are chosen from the literature and they are compared to the new proposed estimators defined above. The following are the

estimators to be compared: $\hat{k}_{HK}$, $\hat{k}_{NAS}$, $\hat{k}_{AM}$, $\hat{k}_{AD4}$, $\hat{k}_{KM8}$, $\hat{k}_{KM12}$, $\hat{k}_{AY1}$, $\hat{k}_{AY2}$, $\hat{k}_{AY3}$, $\hat{k}_{AY4}$ and $\hat{k}_{AY5}$.

Now, the general multiple linear regression model (1.1) is considered such that $\varepsilon \sim N(0, \sigma^2 I_n)$. If $\beta$ is chosen to be the normalized eigenvector corresponding to the largest eigenvalue of the matrix $X'X$, so that $\beta'\beta = 1$, then the minimized value of the MSE can be obtained (Newhouse and Oman, 1971).

In order to generate the explanatory variables, the following equation is used (Asar et al., 2014; Månsson and Shukur, 2011; Muniz and Kibria, 2009):

$$x_{ij} = (1 - \rho^2)^{1/2}z_{ij} + \rho z_{ip}, \tag{3.58}$$

where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, $\rho^2$ represents the correlation between the explanatory variables and the $z_{ij}$'s are independent random numbers obtained from the standard normal distribution. The dependent variable $Y$ is obtained by

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n. \tag{3.59}$$

The following variations of the effective factors are considered: $n = 50, 100, 150$; $\rho = 0.95, 0.99, 0.999$; $p = 4, 6$; and $\sigma^2 = 0.1, 0.25, 1.0$. For the given values of $\rho$, the following condition numbers are observed respectively: 15, 30, 90 for $p = 4$ and 25, 45, 120 for $p = 6$.

First of all, the data matrix $X$ and the observation vector $Y$ are generated. Then they are standardized in such a way that $X'X$ and $X'Y$ are in correlation form. For the different values of $n$, $p$, $\rho$ and $\sigma^2$, the simulation is repeated 5000 times by generating the error terms of the general linear regression equation (1.1).

The average mean squared errors of the estimators are computed via the following equation:

$$AMSE(\tilde{\beta}) = \frac{1}{5000}\sum_{r=1}^{5000}(\tilde{\beta}_r - \beta)'(\tilde{\beta}_r - \beta) \tag{3.60}$$

where $\tilde{\beta}_r$ is $\hat{\beta}_{OLS}$ or $\hat{\beta}_k$ for the different estimators of $k$ in the $r$th replication.
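The design (3.58)-(3.60) can be sketched as follows (a simplified illustration: the standardization to correlation form is omitted, the seed and sizes are assumptions, and ridge_parameters refers to the hypothetical helper sketched earlier):

```python
import numpy as np

def simulate_amse(k_rule, n=50, p=4, rho=0.95, sigma=0.5, reps=5000):
    """AMSE (3.60) of the ridge estimator under the design (3.58)-(3.59)."""
    rng = np.random.default_rng(4)
    amse = 0.0
    for _ in range(reps):
        Z = rng.normal(size=(n, p + 1))
        X = np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]   # (3.58)
        lam, V = np.linalg.eigh(X.T @ X)
        beta = V[:, -1]                  # eigenvector of the largest eigenvalue
        Y = X @ beta + rng.normal(scale=sigma, size=n)          # (3.59)
        k = k_rule(X, Y)
        b = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)   # ridge (3.8)
        amse += (b - beta) @ (b - beta) / reps                  # (3.60)
    return amse

# Example: OLS (k = 0) versus the HK rule from the earlier sketch.
# print(simulate_amse(lambda X, Y: 0.0),
#       simulate_amse(lambda X, Y: ridge_parameters(X, Y)["HK"]))
```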

Results of the simulation study

In the tables of results, the AMSE values are presented for fixed $n$, $p$, $\sigma^2$ and different $\rho$'s. All of the proposed parameters have better performance than $\hat{k}_{HK}$, and the OLS estimator has the largest AMSE values. One can see from the tables that when the error variance increases, the AMSE values increase for all estimators. For the case $p = 4$, $\rho = 0.95$, $\hat{k}_{AY2}$ has the least AMSE value, and the other new proposed estimators except $\hat{k}_{AY3}$ have better performance than the ones chosen from the literature. When the degree of correlation is increased, $\hat{k}_{AY5}$ and $\hat{k}_{AD4}$ have smaller AMSE values than the other estimators for $\rho = 0.99$ and $\rho = 0.999$. One can see that all of the new proposed estimators except $\hat{k}_{AY3}$ are considerably better than the others; especially, $\hat{k}_{AY4}$ is the best among them for $p = 6$ and $\rho = 0.95$ and $\rho = 0.99$. However, $\hat{k}_{AY5}$ and $\hat{k}_{AD4}$ perform almost equally for $\rho = 0.999$.

Some of these results can easily be seen from Figures 3.2 and 3.3 as well. All comparison graphs have been sketched using the earlier parameters $\hat{k}_{NAS}$, $\hat{k}_{AM}$, $\hat{k}_{AD4}$ and the new parameters $\hat{k}_{AY2}$, $\hat{k}_{AY3}$, selected randomly. Since the values of the OLS and $\hat{k}_{HK}$ estimators are larger than the others in scale, they are not included in the graphs.

It is known that multicollinearity becomes severe when the correlation increases. However, there is an interesting result that the AMSE values of the new proposed estimators decrease when the correlation increases. In other words, the new proposed estimators are robust to the correlation. This feature is also observed for $\hat{k}_{AD4}$ and is presented in the

figures below.

Figure 3.2. Comparison of AMSE values for $n = 50$, $\rho = 0.95$, $p = 4$ (AMSE versus error variance; curves for $\hat{k}_{NAS}$, $\hat{k}_{AM}$, $\hat{k}_{AD4}$, $\hat{k}_{AY2}$, $\hat{k}_{AY3}$)

Figure 3.3. Comparison of AMSE values for $n = 150$, $\rho = 0.95$, $p = 4$

Figure 3.4. Comparison of AMSE values for $n = 50$, $\rho = 0.999$, $p = 6$

Figure 3.5. Comparison of AMSE values for $n = 150$, $\rho = 0.999$, $p = 6$

Figure 3.6. Comparison of AMSE values for $n = 50$, $\sigma^2 = 0.1$, $p = 4$

Although MSE is used as a comparison criterion, the bias of an estimator is another indicator of good performance. Thus, comparisons of the estimators according to their biases are summarized in Figures 3.7-3.10. In most of the cases, $\hat{k}_{HK}$ has the least bias value. If the degree of correlation is increased, the estimators have more bias, as observed from Figures 3.7 and 3.8. Moreover, $\hat{k}_{AY2}$ has a smaller bias than $\hat{k}_{AD4}$. When $\rho = 0.999$, $\hat{k}_{AY2}$ becomes the estimator having the least bias; $\hat{k}_{AD4}$ and $\hat{k}_{AY4}$ also have quite small biases in this situation. If the biases are compared with respect to the error variance, one can see that when the error variance increases, the bias values of all of the estimators except $\hat{k}_{KM8}$ and $\hat{k}_{KM12}$ increase monotonically when $\rho = 0.95$

and $p = 4$. Figures 3.7-3.10 show the performances of the bias values of the estimators.

Figure 3.7. Comparison of biases for $n = 100$, $\rho = 0.99$, $p = 6$

Figure 3.8. Comparison of biases for $n = 100$, $\rho = 0.999$, $p = 6$

Figure 3.9. Comparison of biases for $n = 50$, $p = 4$, $\sigma^2 = 0.1$

Figure 3.10. Comparison of biases for $n = 50$, $p = 4$, $\sigma^2 = 1.0$

It is observed from Figures 3.7-3.10 that some of the estimators are robust to the correlation. The biases of the estimators $\hat{k}_{AY3}$, $\hat{k}_{AY4}$ and $\hat{k}_{AD4}$ decrease when the degree of correlation increases. However, $\hat{k}_{NAS}$ and $\hat{k}_{KM8}$ show the opposite behavior.

3.1.5. Some conclusive remarks regarding ridge regression and simulation

In Section 3.1, existing ridge estimators are reviewed and some new estimators are proposed. Six estimators chosen from the literature and the new proposed estimators are compared according to the mean squared error and bias criteria. A Monte Carlo simulation is conducted to compare the estimators by generating random numbers for the dependent and independent variables and pseudorandom numbers for the error terms from the normal distribution. Tables consisting of AMSE values are given according to different values of the sample size $n$, the correlation coefficient $\rho$ between the explanatory variables, the number of predictors $p$ and the variance of the error terms $\sigma^2$. Some graphs are provided for selected situations. According to the tables and figures, it can be said that the new suggested estimators are better than the other estimators in the sense of MSE. Finally, $\hat{k}_{AY1}$ and $\hat{k}_{AY2}$ are the best ones in the sense of AMSE and bias among the new estimators. The superiority of the new estimators changes according to the


More information

Classification & Regression. Multicollinearity Intro to Nominal Data

Classification & Regression. Multicollinearity Intro to Nominal Data Multicollinearity Intro to Nominal Let s Start With A Question y = β 0 + β 1 x 1 +β 2 x 2 y = Anxiety Level x 1 = heart rate x 2 = recorded pulse Since we can all agree heart rate and pulse are related,

More information

Application of Independent Variables Transformations for Polynomial Regression Model Estimations

Application of Independent Variables Transformations for Polynomial Regression Model Estimations 3rd International Conference on Applied Maematics and Pharmaceutical Sciences (ICAMPS'03) April 9-30, 03 Singapore Application of Independent Variables Transformations for Polynomial Regression Model Estimations

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 2002

Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 2002 Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 00 Please do not quote without permission Variance Inflation Factors. Variance

More information

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS Advances in Adaptive Data Analysis Vol. 2, No. 4 (2010) 451 462 c World Scientific Publishing Company DOI: 10.1142/S1793536910000574 MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS STAN LIPOVETSKY

More information

The Precise Effect of Multicollinearity on Classification Prediction

The Precise Effect of Multicollinearity on Classification Prediction Multicollinearity and Classification Prediction The Precise Effect of Multicollinearity on Classification Prediction Mary G. Lieberman John D. Morris Florida Atlantic University The results of Morris and

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods. TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin

More information

Lecture 1: OLS derivations and inference

Lecture 1: OLS derivations and inference Lecture 1: OLS derivations and inference Econometric Methods Warsaw School of Economics (1) OLS 1 / 43 Outline 1 Introduction Course information Econometrics: a reminder Preliminary data exploration 2

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

L7: Multicollinearity

L7: Multicollinearity L7: Multicollinearity Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Introduction ï Example Whats wrong with it? Assume we have this data Y

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

A Practical Guide for Creating Monte Carlo Simulation Studies Using R

A Practical Guide for Creating Monte Carlo Simulation Studies Using R International Journal of Mathematics and Computational Science Vol. 4, No. 1, 2018, pp. 18-33 http://www.aiscience.org/journal/ijmcs ISSN: 2381-7011 (Print); ISSN: 2381-702X (Online) A Practical Guide

More information

COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR

COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Noname manuscript No. (will be inserted by the editor) COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Deniz Inan Received: date / Accepted: date Abstract In this study

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati

405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati 405 ECONOMETRICS Chapter # 11: MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED? Domodar N. Gujarati Prof. M. El-Sakka Dept of Economics Kuwait University In this chapter we take a critical

More information

Regularized Multiple Regression Methods to Deal with Severe Multicollinearity

Regularized Multiple Regression Methods to Deal with Severe Multicollinearity International Journal of Statistics and Applications 21, (): 17-172 DOI: 1.523/j.statistics.21.2 Regularized Multiple Regression Methods to Deal with Severe Multicollinearity N. Herawati *, K. Nisa, E.

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Response Surface Methodology

Response Surface Methodology Response Surface Methodology Process and Product Optimization Using Designed Experiments Second Edition RAYMOND H. MYERS Virginia Polytechnic Institute and State University DOUGLAS C. MONTGOMERY Arizona

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the

More information

J.Thi-Qar Sci. Vol.2 (2) April/2010

J.Thi-Qar Sci. Vol.2 (2) April/2010 ISSN 1991-8690 1661 الترقيم الدولي - 0968 Correction for multicollinearity between the explanatory variables to estimation by using the Principal component method Ali L.Areef Science college Thi- Qar University

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Lecture 11. Correlation and Regression

Lecture 11. Correlation and Regression Lecture 11 Correlation and Regression Overview of the Correlation and Regression Analysis The Correlation Analysis In statistics, dependence refers to any statistical relationship between two random variables

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

Lecture 1: Systems of linear equations and their solutions

Lecture 1: Systems of linear equations and their solutions Lecture 1: Systems of linear equations and their solutions Course overview Topics to be covered this semester: Systems of linear equations and Gaussian elimination: Solving linear equations and applications

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Section 2 NABE ASTEF 65

Section 2 NABE ASTEF 65 Section 2 NABE ASTEF 65 Econometric (Structural) Models 66 67 The Multiple Regression Model 68 69 Assumptions 70 Components of Model Endogenous variables -- Dependent variables, values of which are determined

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to moderator effects Hierarchical Regression analysis with continuous moderator Hierarchical Regression analysis with categorical

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD

Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD Least Squares Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 75A Winter 0 - UCSD (Unweighted) Least Squares Assume linearity in the unnown, deterministic model parameters Scalar, additive noise model: y f (

More information

Linear Regression Models

Linear Regression Models Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least

More information

Ridge Regression Revisited

Ridge Regression Revisited Ridge Regression Revisited Paul M.C. de Boer Christian M. Hafner Econometric Institute Report EI 2005-29 In general ridge (GR) regression p ridge parameters have to be determined, whereas simple ridge

More information

Regression Analysis By Example

Regression Analysis By Example Regression Analysis By Example Third Edition SAMPRIT CHATTERJEE New York University ALI S. HADI Cornell University BERTRAM PRICE Price Associates, Inc. A Wiley-Interscience Publication JOHN WILEY & SONS,

More information

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,

More information

26:010:557 / 26:620:557 Social Science Research Methods

26:010:557 / 26:620:557 Social Science Research Methods 26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses

Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Available online at (Elixir International Journal) Statistics. Elixir Statistics 49 (2012)

Available online at   (Elixir International Journal) Statistics. Elixir Statistics 49 (2012) 10108 Available online at www.elixirpublishers.com (Elixir International Journal) Statistics Elixir Statistics 49 (2012) 10108-10112 The detention and correction of multicollinearity effects in a multiple

More information

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors James Younker Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Research Article On the Weighted Mixed Almost Unbiased Ridge Estimator in Stochastic Restricted Linear Regression

Research Article On the Weighted Mixed Almost Unbiased Ridge Estimator in Stochastic Restricted Linear Regression Applied Mathematics Volume 2013, Article ID 902715, 10 pages http://dx.doi.org/10.1155/2013/902715 Research Article On the Weighted Mixed Almost Unbiased Ridge Estimator in Stochastic Restricted Linear

More information

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity R.G. Pierse 1 Omitted Variables Suppose that the true model is Y i β 1 + β X i + β 3 X 3i + u i, i 1,, n (1.1) where β 3 0 but that the

More information