Ill-conditioning and multicollinearity
Linear Algebra and its Applications 321 (2000) 295–305

Fikri Öztürk (a), Fikri Akdeniz (b)

(a) Department of Statistics, Ankara University, Ankara, Turkey (ozturk@turkuaz.kku.edu.tr)
(b) Department of Mathematics, Çukurova University, Adana, Turkey (akdeniz@mail.cu.edu.tr)

Received 2 June 1997; accepted 20 April 2000
Submitted by H.J. Werner

Abstract

It is well known that instability of solutions with respect to small changes in inputs causes many problems in numerical computations. Existence, uniqueness and stability of solutions are important features of mathematical problems, and problems that fail to satisfy these conditions are called ill-posed. The purpose of this study is to review briefly some methods of solution for ill-posed problems and to see the impact of, and connections between, these techniques and some statistical methods. © 2000 Elsevier Science Inc. All rights reserved.

AMS classification: 65F15; 62J07

Keywords: Ill-posed problem; Ill-conditioning; Multicollinearity

1. Introduction

In general, a system (or problem) has an input (initial value, data) u and an output (solution) r, where r = R(u) and R is a method of solution. Let u be in the metric space U with metric d_U and let r = R(u) be in the metric space F with metric d_F. The problem is called stable in the pair of spaces (F, U) if for every \varepsilon > 0 there exists \delta(\varepsilon) > 0 such that

d_U(u_1, u_2) \le \delta(\varepsilon) \implies d_F(r_1, r_2) \le \varepsilon,

where u_1, u_2 \in U and r_1 = R(u_1), r_2 = R(u_2).
The problem of determining the solution r in the space F from the initial data u in the space U is said to be well posed on the pair of metric spaces (F, U) if the following three conditions are satisfied:
(i) for every element u \in U, there exists a solution r in F;
(ii) the solution is unique;
(iii) the problem is stable on (F, U).
Problems that fail to satisfy these conditions are called ill-posed. Problems that satisfy conditions (i) and (ii) but not (iii) are called ill-conditioned; that is, problems which are unstable are called ill-conditioned [1,2,20]. It should be pointed out that the definition of an ill-posed problem is related to the pair of metric spaces (F, U): the same problem may be well posed in other metrics.

As a simple example, consider two systems of three linear equations in x, y, z sharing the same nonsingular coefficient matrix, whose right-hand side vectors differ only in the third component:

u_1 = (1, 4, 1.999)', with solution r_1 = (1, 1, 1)',
u_2 = (1, 4, 1.998)', with solution r_2 = (10, 10, -8)'.

If we take the right-hand side vectors as inputs u_1, u_2 \in U = R^3, the inverse of the coefficient matrix as R, and the solutions as r_1, r_2 \in F = R^3, then according to the Euclidean metric

d_U(u_1, u_2) = \sqrt{(1-1)^2 + (4-4)^2 + (1.999 - 1.998)^2} = 0.001

and

d_F(r_1, r_2) = \sqrt{(1-10)^2 + (1-10)^2 + (1-(-8))^2} = 9\sqrt{3}.

Small changes in the inputs result in large changes in the outputs. In order to see that a problem is well posed, we need to check two things: the first is the existence and uniqueness of the solution, and the second is the stability of the solution.
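This kind of sensitivity is easy to reproduce numerically. The sketch below (Python with NumPy) uses a hypothetical nearly singular 3×3 matrix constructed for illustration, not the system above, and perturbs only the third component of the right-hand side:

```python
import numpy as np

# A hypothetical nearly singular system: the third row is almost half of
# the second row, so the matrix is ill-conditioned.
A = np.array([[1/6, 1/3, 1/2],
              [1.0, 1.0, 2.0],
              [0.5, 0.5, 1.0001]])

u1 = A @ np.array([1.0, 1.0, 1.0])      # right-hand side whose solution is (1, 1, 1)
u2 = u1 + np.array([0.0, 0.0, 0.001])   # perturb only the third component

r1 = np.linalg.solve(A, u1)
r2 = np.linalg.solve(A, u2)

print("d_U =", np.linalg.norm(u2 - u1))  # 0.001: a tiny change in the input
print("d_F =", np.linalg.norm(r2 - r1))  # about 17.3: a huge change in the output
print("k(A) =", np.linalg.cond(A))       # very large condition number
```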
For a real linear equation system it is easy to check the existence and uniqueness of the solution, but it is more difficult to check the stability. Let us focus on the latter problem. For a nonsingular matrix A \in R^{n \times n} and b \in R^n, the solution of the system Ax = b exists and is unique. Suppose that \Delta A and \delta b are, respectively, the perturbations of A and b in the linear system Ax = b. If A is replaced by a nonsingular matrix A + \Delta A and b is replaced by b + \delta b, then the error (change) \delta x in the solution can be bounded as

\|\delta x\| \le \frac{\|A^{-1}\| \left( \|\delta b\| + \|\Delta A\| \, \|x\| \right)}{1 - \|A^{-1}\| \, \|\Delta A\|} \quad \text{for } \|A^{-1}\| \, \|\Delta A\| < 1

(see e.g. [2, p. 404] and [8,10]). When there is no change in A, that is, when \Delta A = 0,

\|\delta x\| \le \|A^{-1}\| \, \|\delta b\|.

The relationship between the relative error \|\delta x\| / \|x\| and the relative change \|\delta b\| / \|b\| in the right-hand side vector is

\frac{\|\delta x\|}{\|x\|} \le k(A) \, \frac{\|\delta b\|}{\|b\|},

where k(A) is the condition number of A, defined by

k(A) = \sup_{b} \sup_{\delta b} \left[ \frac{\|\delta x\| / \|x\|}{\|\delta b\| / \|b\|} \right] = \|A\| \, \|A^{-1}\| \ge \frac{\max |\lambda_A|}{\min |\lambda_A|},

where |\lambda_A| is the absolute value of an eigenvalue \lambda_A of A. When the norm is the Euclidean norm and A is symmetric, then

k(A) = \frac{\max |\lambda_A|}{\min |\lambda_A|}.

If the condition number k(A) of the coefficient matrix in the linear equation system Ax = b is large, then the problem with input b and output x = A^{-1} b is unstable; that is, the (inverse) problem is ill-conditioned. The matrix A itself is called ill-conditioned.
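As a quick numerical check of this characterization (a sketch with an arbitrary symmetric test matrix), the eigenvalue ratio and the product \|A\| \|A^{-1}\| agree in the Euclidean norm:

```python
import numpy as np

# An arbitrary symmetric, nonsingular test matrix.
A = np.array([[4.0, 2.0, 0.9],
              [2.0, 1.1, 0.5],
              [0.9, 0.5, 0.3]])

lam = np.linalg.eigvalsh(A)                     # eigenvalues of the symmetric matrix
k_eig = np.abs(lam).max() / np.abs(lam).min()   # max|lambda| / min|lambda|
k_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# For a symmetric matrix and the 2-norm the two quantities coincide,
# and both match NumPy's built-in condition number.
print(k_eig, k_norm, np.linalg.cond(A, 2))
```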
2. Some solution methods for ill-posed inverse problems

In general, an inverse problem contains a known operator A, a function (a vector) z which characterizes the phenomenon that we are going to model, and an input u obtained from measurements, such that Az = u. The purpose of the inverse problem is to solve for z. Since the input comes from measurements, we have some \tilde{u} in place of u. If \tilde{u} does not belong to the range of A, then there is a problem of existence of a solution. Even if \tilde{u} is in the range of A and the solution is unique, the solution may not be stable; that is, the inverse problem may be ill-conditioned. So, measurement errors will cause some problems. Also, when the inverse problem is ill-conditioned and \tilde{u} is an observation of a random function (or vector), the solution (for example, the estimate) will be unreliable.

We now recall some solution methods for ill-posed inverse problems from the book [20] by Tikhonov and Arsenin. It will be supposed throughout that the operator A has a closed range.

2.1. Selection method

For z in F with a metric d_F, u in U with a metric d_U and an operator A : F \to U, consider the inverse problem Az = \tilde{u}. The selection method for ill-posed inverse problems consists of selecting a subset V (V \subset F) and searching for a solution z_0 (z_0 \in V) such that

\inf_{z \in V} d_U(Az, \tilde{u}) = d_U(Az_0, \tilde{u}).

When F, U are linear spaces, A : F \to U is a linear operator and U is a Hilbert space, then for a subspace V (V \subset F),

Az_0 = P_{A(V)} \tilde{u},    (1)

where P_{A(V)} is the orthogonal projection onto A(V). In this method, the question becomes the selection of the subspace V in order to overcome the ill-posedness, that is, to ensure the existence of a solution and (or) its stability.

2.2. Replacement method

In this method, the operator A in the equation Az = \tilde{u} is replaced by a regular operator A + \delta I (\delta \in R, \delta > 0, and I is the identity operator), and

z_\delta = (A + \delta I)^{-1} \tilde{u}    (2)

is taken to be a solution. The question is how to choose the regularization parameter \delta.
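A minimal sketch of the replacement method (2), assuming a square matrix operator; the parameter delta trades stability against bias:

```python
import numpy as np

def replacement_solution(A, u_tilde, delta):
    """Replacement method (2): solve (A + delta*I) z = u_tilde, delta > 0."""
    return np.linalg.solve(A + delta * np.eye(A.shape[0]), u_tilde)

# On an ill-conditioned matrix, a moderate delta stabilizes the solution at
# the cost of a small bias; with exact (noise-free) data the regularized
# solution approaches the exact one as delta shrinks.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
u = A @ np.array([1.0, 1.0])
for delta in (1e-1, 1e-3, 1e-6):
    print(delta, replacement_solution(A, u, delta))
```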
2.3. Regularization method

In this method, in order to stabilize the solution, a continuous functional \Psi : F \to R^+ \cup \{0\} is defined and the functional

M^{\Psi,\alpha}(z, \tilde{u}) = d_U^2(Az, \tilde{u}) + \alpha \Psi(z)

is constructed for a chosen value \alpha (\alpha > 0). The functional M^{\Psi,\alpha} is called the smoothing functional for the inverse problem Az = \tilde{u}. The element z_\alpha in F which minimizes the smoothing functional,

\inf_{z \in F} M^{\Psi,\alpha}(z, \tilde{u}) = M^{\Psi,\alpha}(z_\alpha, \tilde{u}),

is called the regularized solution. The regularized solution z_\alpha is a function of \tilde{u} and \alpha, and it also depends on \Psi. So, the question is how to choose \Psi and \alpha.

When F, U are Hilbert spaces, A : F \to U is a linear operator and \Psi is the functional \Psi : F \to R^+ \cup \{0\}, z \mapsto \Psi(z) = \|z\|^2, the smoothing functional is

M^{\alpha}(z, \tilde{u}) = \|Az - \tilde{u}\|^2 + \alpha \|z\|^2 = (Az - \tilde{u})^*(Az - \tilde{u}) + \alpha z^* z,    (3)

where * denotes the conjugate transpose. In order to minimize M^{\alpha}(z, \tilde{u}), the derivative with respect to z is set to zero and the following equation is obtained:

(A^* A + \alpha I) z = A^* \tilde{u}.

The solution of this equation is the regularized solution

z_\alpha = (A^* A + \alpha I)^{-1} A^* \tilde{u} = \sum_{n} \frac{c_n}{\lambda_n + \alpha} \varphi_n,    (4)

where the \lambda's and \varphi's are the eigenvalues and eigenvectors of A^* A, respectively, and the coefficients c_n are such that A^* \tilde{u} = \sum_{n} c_n \varphi_n.

2.4. Iterative method

Assume that the inverse problem Az = \tilde{u} has a unique solution. An iterative method is given by

z_{n+1} = z_n - h A^* (A z_n - \tilde{u}), n = 0, 1, 2, ...,    (5)

where z_0 (z_0 \in F) is an initial value and h is a number such that 0 < h \|A^* A\| < 2 [4]. In this method, h can be chosen differently for every iteration, or h can be replaced by an operator H.
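Both the regularized solution (4) and the iteration (5) are short to state in code. The sketch below assumes a real matrix operator, so the conjugate transpose reduces to the ordinary transpose:

```python
import numpy as np

def regularized_solution(A, u_tilde, alpha):
    """Regularized solution (4): z_a = (A'A + alpha*I)^(-1) A'u, computed
    through the eigendecomposition of A'A."""
    lam, phi = np.linalg.eigh(A.T @ A)   # eigenvalues/eigenvectors of A'A
    c = phi.T @ (A.T @ u_tilde)          # coefficients c_n of A'u in the eigenbasis
    return phi @ (c / (lam + alpha))     # sum_n c_n / (lam_n + alpha) * phi_n

def iterative_solution(A, u_tilde, n_steps):
    """Iterative method (5): z_{n+1} = z_n - h A'(A z_n - u), 0 < h ||A'A|| < 2."""
    h = 1.0 / np.linalg.norm(A.T @ A, 2)  # step size with h * ||A'A|| = 1
    z = np.zeros(A.shape[1])              # initial value z_0 = 0
    for _ in range(n_steps):
        z = z - h * (A.T @ (A @ z - u_tilde))
    return z

A = np.array([[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]])
u = A @ np.array([1.0, 1.0])
print(regularized_solution(A, u, 1e-8))
print(iterative_solution(A, u, 1000))
```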
By applying the replacement method (2) to the preceding equation system — adding a small \delta to the diagonal coefficients — and computing the solutions with the program Eureka: The Solver, version 1.0, for several values of \delta, approximate solutions were obtained in each case. In this computer output it is seen that the solutions depend on the value of the regularization parameter \delta, and that they get better for small values of \delta.

3. The problem of multicollinearity

In statistics, our inputs (data) are observed values, and the outputs are values of some statistics. When the outputs are sensitive (unstable) to small changes in the data, we can consider this sensitivity as an ill-conditioning problem. The variance of a statistic is a natural measure of this sensitivity.

In statistical inference, the statistics (estimators) are obtained according to some optimization criteria or principles. The best statistics under some criteria may be sensitive (unstable) to the data, even when the criterion is the minimization of the variance. For example, in linear models the Gauss–Markov estimator of any estimable linear combination of the parameter vector has minimum variance among all linear unbiased estimators, but because of a bad design (multicollinearity) the variance may be large and the estimated value may be far from the true value.
In this section, we give a short summary of the problem of ill-conditioning in linear models, that is, the problem of multicollinearity. We especially emphasize that the origins of the methods for handling multicollinearity lie in the solution techniques for ill-posed problems.

Multicollinearity is defined as the existence of a nearly linear dependency among the column vectors of the design matrix X = [X_1, X_2, ..., X_p] in the linear model Y = X\beta + \varepsilon. The existence of multicollinearity may result in wide confidence intervals for individual parameters (unstable estimates), may give estimates with wrong signs, and may affect our decisions in hypothesis testing. Severe multicollinearity may make the estimates so unstable that they are practically useless.

Multicollinearity is synonymous with ill-conditioning. So, the problem of multicollinearity is another version of the problem of ill-conditioning in the solution of the normal equations

X'X \hat{\beta} = X'Y.

Therefore, in this inverse problem the condition number k(X'X) is a measure of the existence of multicollinearity. Even though the condition number gives some information about the existence of multicollinearity, it does not explain the structure of the linear dependency among the column vectors X_1, X_2, ..., X_p. The best way of explaining the existence and structure of multicollinearity is to look at the eigenvalues and eigenvectors of the matrix X'X [9,11,13,18]. Let the column vectors X_1, X_2, ..., X_p be standardized such that X'X is in the form of a correlation matrix, and let the spectral decomposition of X'X be

X'X = V \, \mathrm{diag}(\lambda_1, ..., \lambda_p) \, V' = VDV' = \sum_{i=1}^{p} \lambda_i v_i v_i',

where \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0. If the condition number k(X'X) = \lambda_1 / \lambda_p is large, then multicollinearity exists and its structure is explained by the eigenvector v_p, since

[X_1, X_2, ..., X_p] v_p \approx 0.

In the last one to two decades a number of papers have appeared on the causes of multicollinearity. It has been found that individual observations may be influential on the value of the condition number: the omission of such observations may create or remove the problem of multicollinearity [3,5-7,23].
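A sketch of this eigenvalue-based diagnosis on simulated data (the near dependency X3 ≈ X1 + X2 is planted deliberately; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Simulated design with a planted near-linear dependency: X3 ~ X1 + X2.
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = X1 + X2 + 0.001 * rng.normal(size=n)
X = np.column_stack([X1, X2, X3])

# Standardize the columns so that X'X is a correlation matrix.
Z = (X - X.mean(axis=0)) / (X.std(axis=0) * np.sqrt(n))
R = Z.T @ Z

lam, V = np.linalg.eigh(R)            # eigenvalues in ascending order
print("k(X'X) =", lam[-1] / lam[0])   # large: multicollinearity is present
print("v_p    =", V[:, 0])            # eigenvector of the smallest eigenvalue:
                                      # its pattern of large entries and signs
                                      # identifies the near dependency
```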
Hundreds of studies have been done on the problem of multicollinearity. Even though many solution methods have been proposed, the problem has not been solved completely. Among the methods, the most popular one is the ridge method proposed by Hoerl and Kennard [12]. The ordinary ridge estimator

\hat{\beta}_k = (X'X + kI)^{-1} X'Y    (6)

is obtained by adding a small positive number k to the diagonal elements of the normal equations. Notice that this estimator has the same expression as the solutions in the replacement method (2) and the regularization method (4). The ridge method has been a popular estimation method for over 25 years. It can yield more reliable point estimates of the coefficients than the ordinary least-squares estimates by reducing the standard errors of the estimates at the expense of introducing some bias. However, there are some problems with the choice of the biasing parameter k.

The addition of a small number to the diagonal elements of a matrix was also familiar to Marquardt [14], who used it in his algorithms for nonlinear optimization. Marquardt defined a class of generalized inverse estimators and discussed their properties in the framework of biased estimators, including the ordinary ridge estimator [15].

Let us now consider the normal equations X'X \hat{\beta} = X'Y with an ill-conditioned matrix X'X, and let us try to solve the normal equations by the selection method. For a subspace K (K \subset R^p), we look for a solution \hat{\beta}_0 \in K such that

\min_{\hat{\beta} \in K} d_E(X'X \hat{\beta}, X'Y) = d_E(X'X \hat{\beta}_0, X'Y),

where d_E is the Euclidean metric on R^p. From (1),

X'X \hat{\beta}_0 = P_{X'X(K)}(X'Y).    (7)

When K = span{v_1, v_2, ..., v_m}, that is, when K is spanned by the eigenvectors corresponding to the m largest eigenvalues \lambda_1, \lambda_2, ..., \lambda_m with m < p, then X'X(K) = K and

P_{X'X(K)} = P_K = W(W'W)^{-1} W' = WW',

where W = [v_1, v_2, ..., v_m]. Moreover, from (7),

\hat{\beta}_0 = (X'X)^{-1} WW' X'Y = VD^{-1} V' WW' X'Y = V \, \mathrm{diag}(1/\lambda_1, ..., 1/\lambda_m, 0, ..., 0) \, V' X'Y.

This estimator \hat{\beta}_0 obtained by the selection method is the principal component estimator of \beta. In the selection method, the difficulty was selecting the subspace for the solutions; here the difficulty is in choosing the integer m, the number of principal components. There are numerous suggestions in the literature concerning appropriate criteria for the retention or deletion of components.
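A sketch of the two estimators just derived, assuming X and Y are NumPy arrays (the helper names are hypothetical):

```python
import numpy as np

def ridge_estimator(X, Y, k):
    """Ordinary ridge estimator (6): (X'X + kI)^(-1) X'Y, k > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)

def pc_estimator(X, Y, m):
    """Principal component estimator: invert X'X only on the span of the
    eigenvectors of its m largest eigenvalues."""
    lam, V = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]     # reorder so lambda_1 >= ... >= lambda_p
    d_inv = np.zeros_like(lam)
    d_inv[:m] = 1.0 / lam[:m]          # diag(1/lambda_1, ..., 1/lambda_m, 0, ..., 0)
    return V @ (d_inv * (V.T @ (X.T @ Y)))
```

Both reduce to the ordinary least-squares estimator in the limiting cases k = 0 and m = p.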
Estimators based on the iterative method (5) and their statistical properties have also been investigated. An estimation procedure based on the iterative method can be constructed as follows. Let \hat{\beta}^{(0)} be a fixed point in the parameter space and

\hat{\beta}^{(n)} = (I - h X'X) \hat{\beta}^{(n-1)} + h X'Y, n = 1, 2, ...,

where h \in (0, 1/\lambda_1). For \hat{\beta}^{(0)} = 0 and suitably chosen values of h and n, the estimator \hat{\beta}^{(n)} can have smaller mean-squared error than the least-squares estimator [17]. An estimator similar to \hat{\beta}^{(n)} was proposed by Trenkler [21], whose motivation is based on a series expansion for the Moore–Penrose generalized inverse. The estimators considered above are better than the ordinary least-squares estimator in the case of multicollinearity according to the mean-squared error criterion. Nowadays the problem of multicollinearity is also a subject of research in generalized and multivariate linear models.

The ridge method, which corresponds to the operator replacement method for ill-conditioned problems, is used extensively in linear models as a solution method to overcome multicollinearity. The ridge method is also used in statistical analysis techniques which involve matrix inverses. For example, the Mahalanobis distance

D(x, \mu) = (x - \mu)' \Sigma^{-1} (x - \mu)

involves a matrix inversion, where x is an observed vector and \mu and \Sigma are the mean vector and covariance matrix of the distribution (population). Substituting \Sigma + kI for \Sigma,

D_k(x, \mu) = (x - \mu)' (\Sigma + kI)^{-1} (x - \mu)

is called the ridge-type Mahalanobis distance. In discriminant analysis applications, the classification rule based on the Mahalanobis distance involves the sample covariance matrix \hat{\Sigma}; by substituting \hat{\Sigma} + kI for \hat{\Sigma}, a ridge-type classification rule is obtained [19].

As another example, let us consider canonical correlation analysis:

\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_{p+k} \left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right), \qquad V_1 = \sum_{i=1}^{p} \alpha_i X_{1i}, \quad V_2 = \sum_{j=1}^{k} \beta_j X_{2j}.

The canonical correlation analysis problem is to find the coefficients \alpha and \beta such that the correlation coefficient between the random variables V_1 and V_2 is maximized. Hotelling proved that the problem reduces to the eigenvalue and eigenvector analysis of \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}. In applications, the ridge-type canonical correlation analysis [22] is based on

(\hat{\Sigma}_{22} + k_2 I)^{-1} \hat{\Sigma}_{21} (\hat{\Sigma}_{11} + k_1 I)^{-1} \hat{\Sigma}_{12}.
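Sketches of the iterative estimator and the ridge-type Mahalanobis distance described above (hypothetical helper names; real data would replace the arguments):

```python
import numpy as np

def iterative_estimator(X, Y, n_steps):
    """Iterative estimator: beta^(n) = (I - h X'X) beta^(n-1) + h X'Y,
    started at beta^(0) = 0 with h in (0, 1/lambda_1)."""
    G = X.T @ X
    h = 0.9 / np.linalg.eigvalsh(G).max()       # step size inside (0, 1/lambda_1)
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        beta = beta - h * (G @ beta - X.T @ Y)  # same update, written in one step
    return beta

def ridge_mahalanobis(x, mu, Sigma, k):
    """Ridge-type Mahalanobis distance: (x - mu)'(Sigma + kI)^(-1)(x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(Sigma + k * np.eye(d.size), d)
```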
In computations which frequently involve matrix inversion, as in control theory and nonlinear models, the ridge method can be used as a powerful tool against ill-conditioning.

4. Conclusion

A statistical analysis can be affected by ill-conditioning in three ways (stages). First, measurement errors can create problems in the presence of ill-conditioning, since the outputs will be sensitive (unstable) to small changes in the inputs. At this stage the problem is the sensitivity of the measurement devices, and it looks technological rather than statistical. Although statisticians generally ignore measurement errors, they sometimes consider models with errors in the variables. When errors are included in some variables, the statistical inference procedures will require some information about the distributional properties of these errors.

Second, inference procedures based on the optimization of some statistical criteria can give misleading results in the presence of ill-conditioning caused by a bad design or bad sampling. Because of randomness, there will always be natural variability in the observations. Therefore, it is necessary to check for ill-conditioning before performing the statistical analysis; that is, in some way we have to check whether the values of the statistics (outputs) in an inference procedure are sensitive to small changes in the data (inputs).

Third, rounding errors (small changes in the inputs) can create problems in statistical computations when ill-conditioning exists. This computational aspect of ill-conditioning is a purely numerical problem, and it is among the most difficult ones in numerical analysis. Perhaps because of this, "Why has no one noticed?" asked McCullough and Vinod [16]. They also added: "It is understandable that economists have paid little attention to whether or not econometric software is accurate. Until recently, econometrics texts rarely discussed computational aspects of solving econometric problems... Many textbooks convey the impression that all one has to do is use a computer to solve the problem, the implicit and unwarranted assumptions being that the computer's solution is accurate and that one software package is as good as any other... Pencil-and-paper arithmetic is not like computer arithmetic."

To overcome the problems mentioned, we can use robust numerical algorithms and statistical procedures, apply operations such as variable selection or outlier detection, or apply remedial methods such as ridge regression. Finally, let us note that the problem associated with ill-conditioning is a problem of degree (severity) rather than of existence. The method of computation and the method of statistical inference greatly affect the results. In the last 40 years, some well-known numerical methods have been adapted to obtain statistical methods for inference in the presence of ill-conditioning; among the pioneers are Marquardt [15] and Hoerl and Kennard [12]. Conversely, some results obtained by statisticians, for example the research on leverage and influential observations, are helpful in creating numerical algorithms and performing numerical computations when ill-conditioning exists. Within the framework of software reliability, statistical methods play a great role in the development of numerical methods that are robust against ill-conditioning.
Acknowledgment

The authors would like to thank the associate editor and the referees for their helpful comments on this paper.

References

[1] K.E. Atkinson, An Introduction to Numerical Analysis, Wiley, New York, 1978.
[2] N.S. Bakhvalov, Numerical Methods, Mir Publishers, Moscow, 1977.
[3] D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics, Wiley, New York, 1980.
[4] M. Brill, E. Schock, Iterative solution of ill-posed problems — a survey, in: Proceedings of the Fourth International Mathematical Geophysics Seminar, Berlin, 1987.
[5] S. Chatterjee, A.S. Hadi, Influential observations, high leverage points and outliers in linear regression, Statist. Sci. 1 (1986).
[6] S. Chatterjee, A.S. Hadi, Sensitivity Analysis in Linear Regression, Wiley, New York, 1988.
[7] R.D. Cook, S. Weisberg, Residuals and Influence in Regression, Chapman & Hall, New York, 1982.
[8] B.N. Datta, Numerical Linear Algebra and Applications, Brooks/Cole Publishing Company, 1994.
[9] J. Fox, G. Monette, Generalized collinearity diagnostics, J. Amer. Statist. Assoc. 87 (1992).
[10] G.H. Golub, Ch.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 1991.
[11] R.F. Gunst, Regression analysis with multicollinear predictor variables: definition, detection and effects, Comm. Statist. Theory Methods 12 (1983).
[12] A.E. Hoerl, R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12 (1970).
[13] E.R. Mansfield, B.P. Helms, Detecting multicollinearity, Amer. Statist. 36 (1982).
[14] D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, J. Soc. Ind. Appl. Math. 11 (1963).
[15] D.W. Marquardt, Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation, Technometrics 12 (1970).
[16] B.D. McCullough, H.D. Vinod, The numerical reliability of econometric software, J. Econom. Literature XXXVII (1999).
[17] F. Öztürk, A discrete shrinking method as alternative to least-squares, Communications de la Faculté des Sciences de l'Université d'Ankara (1984).
[18] S.D. Silvey, Multicollinearity and imprecise estimation, J. Roy. Statist. Soc. B 31 (1969).
[19] R.K. Smidt, L.L. McDonald, Ridge discriminant analysis, Research Paper No. 108, University of Wyoming, 1976.
[20] A.N. Tikhonov, V.Y. Arsenin, Solutions of Ill-posed Problems, Wiley, New York, 1977.
[21] G. Trenkler, An iteration estimator for the linear model, Compstat 1978, Physica-Verlag, Würzburg, 1978.
[22] H.D. Vinod, A. Ullah, Recent Advances in Regression Methods, Marcel Dekker, New York, 1981.
[23] E. Walker, Detection of collinearity-influential observations, Comm. Statist. Theory Methods 18 (1989).