Ill-conditioning and multicollinearity

Fikri Öztürk a,*, Fikri Akdeniz b

a Department of Statistics, Ankara University, Ankara, Turkey
b Department of Mathematics, Çukurova University, Adana, Turkey

Linear Algebra and its Applications 321 (2000) 295–305. Received 2 June 1997; accepted 20 April 2000. Submitted by H.J. Werner.

Abstract

It is well known that instability of solutions to small changes in inputs causes many problems in numerical computations. Existence, uniqueness and stability of solutions are important features of mathematical problems. Problems that fail to satisfy these conditions are called ill-posed. The purpose of this study is to review briefly some methods of solution for ill-posed problems and to trace the impact of, and connections between, these techniques and some statistical methods. © 2000 Elsevier Science Inc. All rights reserved.

AMS classification: 65F35; 62J07

Keywords: Ill-posed problem; Ill-conditioning; Multicollinearity

1. Introduction

In general, a system (or problem) has an input (initial value, data) $u$ and an output (solution) $r$, where $r = R(u)$ and $R$ is a method of solution. Let $u$ be in the metric space $U$ with a metric $d_U$ and let $r = R(u)$ be in the metric space $F$ with a metric $d_F$. Then the problem is called stable in the pair of spaces $(F, U)$ if for all $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that
$$d_U(u_1, u_2) \le \delta(\varepsilon) \implies d_F(r_1, r_2) \le \varepsilon,$$
where $u_1, u_2 \in U$ and $r_1 = R(u_1)$, $r_2 = R(u_2)$.

* Corresponding author.
E-mail addresses: ozturk@turkuaz.kku.edu.tr (F. Öztürk), akdeniz@mail.cu.edu.tr (F. Akdeniz).

The problem of determining the solution $r$ in the space $F$ from the initial data $u$ in the space $U$ is said to be well posed on the pair of metric spaces $(F, U)$ if the following three conditions are satisfied: (i) for every element $u \in U$ there exists a solution $r$ in $F$; (ii) the solution is unique; (iii) the problem is stable on $(F, U)$. Problems that fail to satisfy these conditions are called ill-posed. Problems that satisfy conditions (i) and (ii) but not (iii) are called ill-conditioned; that is, problems which are unstable are called ill-conditioned [1,2,20]. It should be pointed out that the definition of an ill-posed problem is related to the pair of metric spaces $(F, U)$: the same problem may be well posed in other metrics.

As a simple example let us consider the following two systems of equations:
$$\tfrac{1}{6}x + \tfrac{1}{3}y + \tfrac{1}{2}z = 1, \quad \tfrac{1}{3}x + \tfrac{1}{3}y + \tfrac{2}{3}z = \tfrac{4}{3}, \quad \tfrac{1}{3}x + \tfrac{2}{3}y + 0.999z = 1.999,$$
with solution $x = 1$, $y = 1$, $z = 1$, and
$$\tfrac{1}{6}x + \tfrac{1}{3}y + \tfrac{1}{2}z = 1, \quad \tfrac{1}{3}x + \tfrac{1}{3}y + \tfrac{2}{3}z = \tfrac{4}{3}, \quad \tfrac{1}{3}x + \tfrac{2}{3}y + 0.999z = 1.998,$$
with solution $x = 0$, $y = 0$, $z = 2$. The only difference between these systems is the value of the third component of the right-hand side vectors. If we take the right-hand side vectors as inputs $u_1, u_2 \in U = \mathbb{R}^3$, the inverse of the coefficient matrix as $R$ and the solutions as $r_1, r_2 \in F = \mathbb{R}^3$, then according to the Euclidean metric
$$d_U(u_1, u_2) = \sqrt{(1-1)^2 + \bigl(\tfrac{4}{3}-\tfrac{4}{3}\bigr)^2 + (1.999 - 1.998)^2} = 0.001$$
and
$$d_F(r_1, r_2) = \sqrt{(1-0)^2 + (1-0)^2 + (1-2)^2} = \sqrt{3} \approx 1.732.$$
Small changes in the inputs result in large changes in the outputs. In order to see whether a problem is well posed, we need to check two things: first the existence and uniqueness of the solution, and second its stability.
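For readers who wish to reproduce this, here is a minimal NumPy sketch (not part of the original paper) that solves both systems, using the coefficients of the example as given above, and computes the two distances:

```python
import numpy as np

# Coefficient matrix and the two right-hand sides of the example;
# b1 and b2 differ only in the third component (1.999 vs. 1.998).
A = np.array([[1/6, 1/3, 1/2],
              [1/3, 1/3, 2/3],
              [1/3, 2/3, 0.999]])
b1 = np.array([1.0, 4/3, 1.999])
b2 = np.array([1.0, 4/3, 1.998])

r1 = np.linalg.solve(A, b1)          # (1, 1, 1)
r2 = np.linalg.solve(A, b2)          # (0, 0, 2)

d_U = np.linalg.norm(b1 - b2)        # 0.001
d_F = np.linalg.norm(r1 - r2)        # about 1.732
print(r1, r2, d_U, d_F)
```

An input change of $0.001$ is amplified into an output change of about $1.73$, a factor of more than a thousand.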

For a real linear equation system it is easy to check the existence and uniqueness of the solution, but it is more difficult to check the stability. Let us focus on the latter problem. For a nonsingular matrix $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$ the solution of the system $Ax = b$ exists and is unique. Suppose that $\Delta A$ and $\delta b$ are, respectively, the perturbations of $A$ and $b$ in the linear system $Ax = b$. If $A$ is replaced by a nonsingular matrix $A + \Delta A$ and $b$ is replaced by $b + \delta b$, then the error (change) $\Delta x$ in the solution can be bounded as
$$\|\Delta x\| \le \frac{\|A^{-1}\|\bigl(\|\delta b\| + \|\Delta A\|\,\|x\|\bigr)}{1 - \|A^{-1}\|\,\|\Delta A\|} \quad \text{for } \|A^{-1}\|\,\|\Delta A\| < 1$$
(see e.g. [2, p. 404] and [8,10]). When there is no change in $A$, that is, when $\Delta A = 0$, then $\|\Delta x\| \le \|A^{-1}\|\,\|\delta b\|$. The relationship between the relative error $\|\Delta x\|/\|x\|$ and the relative change in the right-hand side vector $\|\delta b\|/\|b\|$ is
$$\frac{\|\Delta x\|}{\|x\|} \le k(A)\,\frac{\|\delta b\|}{\|b\|},$$
where $k(A)$ is the condition number of $A$, defined by
$$k(A) = \sup_{x}\,\sup_{\delta b}\left[\frac{\|\Delta x\|}{\|x\|}\Big/\frac{\|\delta b\|}{\|b\|}\right] = \|A\|\,\|A^{-1}\|.$$
When the norm is the Euclidean norm and $A$ is symmetric, then
$$k(A) = \frac{\max|\lambda_A|}{\min|\lambda_A|},$$
where $|\lambda_A|$ is the absolute value of an eigenvalue $\lambda_A$ of $A$. If the condition number $k(A)$ of the coefficient matrix in the linear equation system $Ax = b$ is large, then the problem with input $b$ and output $x = A^{-1}b$ is unstable; that is, the (inverse) problem is ill-conditioned. The matrix $A$ itself is called ill-conditioned.
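The condition number driving the behaviour of the example can be computed directly. The sketch below uses NumPy's `cond`, which for the 2-norm returns the ratio of the largest to the smallest singular value, equal to $\|A\|\,\|A^{-1}\|$:

```python
import numpy as np

A = np.array([[1/6, 1/3, 1/2],
              [1/3, 1/3, 2/3],
              [1/3, 2/3, 0.999]])

# For the 2-norm, cond(A) is the ratio of the largest to the smallest
# singular value, which equals ||A|| * ||A^{-1}||.
print(np.linalg.cond(A, 2))   # a large value: the matrix is ill-conditioned
```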

2. Some solution methods for ill-posed inverse problems

In general, an inverse problem contains a known operator $A$, a function (a vector) $z$ which characterizes the phenomenon that we are going to model, and an input $u$ obtained from measurements, such that $Az = u$. The purpose of the inverse problem is to solve for $z$. Since the input comes from measurements, we have some $\tilde u$ in place of $u$. If $\tilde u$ does not belong to the range of $A$, then there is a problem of existence of a solution. Even if $\tilde u$ is in the range of $A$ and the solution is unique, it may not be stable; that is, the inverse problem may be ill-conditioned. So, measurement errors will cause some problems. Also, when the inverse problem is ill-conditioned and $\tilde u$ is an observation of a random function (or vector), then the solution (for example, the estimate) will be unreliable. We now recall some solution methods for ill-posed inverse problems from the book [20] by Tikhonov and Arsenin. It will be supposed that the operator $A$ has a closed range.

2.1. Selection method

For $z$ in $F$ with a metric $d_F$, $u$ in $U$ with a metric $d_U$ and an operator $A : F \to U$, let us consider the inverse problem $Az = \tilde u$. The selection method for ill-posed inverse problems consists of selecting a subset $V \subset F$ and searching for a solution $z_0 \in V$ such that
$$\inf_{z \in V} d_U(Az, \tilde u) = d_U(Az_0, \tilde u).$$
When $F, U$ are linear spaces, $A : F \to U$ is a linear operator and $U$ is a Hilbert space, then for a subspace $V \subset F$,
$$Az_0 = P_{A(V)}\,\tilde u, \tag{1}$$
where $P_{A(V)}$ is the orthogonal projection onto $A(V)$. In this method, the question becomes the selection of the subspace $V$ in order to overcome the ill-posedness, that is, the existence of a solution and (or) its instability.

2.2. Replacement method

In this method, the operator $A$ in the equation $Az = \tilde u$ is replaced by a regular operator $A + \delta I$ ($\delta \in \mathbb{R}$, $\delta > 0$, and $I$ is the identity operator) and
$$z_\delta = (A + \delta I)^{-1}\tilde u \tag{2}$$
is taken to be a solution. The question is how to choose the regularization parameter $\delta$.

2.3. Regularization method

In this method, in order to stabilize the solution, a continuous functional $\Psi : F \to \mathbb{R}^+ \cup \{0\}$ is defined and the functional
$$M^{\Psi,\alpha}(z, \tilde u) = d_U^2(Az, \tilde u) + \alpha\Psi(z)$$
is constructed for a chosen value $\alpha > 0$. The functional $M^{\Psi,\alpha}$ is called the smoothing functional for the inverse problem $Az = \tilde u$. The element $z_\alpha$ in $F$ which minimizes the smoothing functional,
$$\inf_{z \in F} M^{\Psi,\alpha}(z, \tilde u) = M^{\Psi,\alpha}(z_\alpha, \tilde u),$$
is called the regularized solution. The regularized solution $z_\alpha$ is a function of $\tilde u$ and $\alpha$, and it also depends on $\Psi$. So the question is how to choose $\Psi$ and $\alpha$. When $F, U$ are Hilbert spaces, $A : F \to U$ is a linear operator and $\Psi$ is the functional defined by $\Psi(z) = \|z\|^2$, the smoothing functional is
$$M^{\alpha}(z, \tilde u) = \|Az - \tilde u\|^2 + \alpha\|z\|^2 = (Az - \tilde u)^*(Az - \tilde u) + \alpha z^* z, \tag{3}$$
where $*$ denotes the conjugate transpose. In order to minimize $M^{\alpha}(z, \tilde u)$, the derivative with respect to $z$ is set to zero and the following equation is obtained:
$$(A^*A + \alpha I)\,z = A^*\tilde u.$$
The solution of this equation is the regularized solution
$$z_\alpha = (A^*A + \alpha I)^{-1}A^*\tilde u = \sum_{n} \frac{c_n}{\lambda_n + \alpha}\,\varphi_n, \tag{4}$$
where the $\lambda_n$ and $\varphi_n$ are the eigenvalues and eigenvectors of $A^*A$, respectively, and the coefficients $c_n$ are such that $A^*\tilde u = \sum_n c_n \varphi_n$.
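As an illustration of (4), the following sketch (assuming real matrices, so that the conjugate transpose is the ordinary transpose, and reusing the example matrix) computes the regularized solution both from the normal-equations form and from the eigensystem of $A^*A$; the two agree to rounding error:

```python
import numpy as np

def tikhonov(A, u, alpha):
    """Regularized solution z_alpha = (A*A + alpha*I)^{-1} A* u."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ u)

def tikhonov_spectral(A, u, alpha):
    """The same solution written in the eigenbasis of A*A, as in (4)."""
    lam, Phi = np.linalg.eigh(A.T @ A)   # eigenvalues and eigenvectors of A*A
    c = Phi.T @ (A.T @ u)                # coefficients of A*u in that basis
    return Phi @ (c / (lam + alpha))     # sum_n c_n/(lam_n + alpha) * phi_n

A = np.array([[1/6, 1/3, 1/2],
              [1/3, 1/3, 2/3],
              [1/3, 2/3, 0.999]])
u = np.array([1.0, 4/3, 1.999])
print(tikhonov(A, u, 1e-6))
print(tikhonov_spectral(A, u, 1e-6))    # agrees to rounding error
```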

2.4. Iterative method

Assume that the inverse problem $Az = \tilde u$ has a unique solution. An iterative method is given by
$$z_{n+1} = z_n - hA^*(Az_n - \tilde u), \quad n = 0, 1, 2, \ldots, \tag{5}$$
where $z_0 \in F$ is an initial value and $h$ is a number such that $0 < h\|A^*A\| < 2$ [4]. In this method, $h$ can be chosen differently for every iteration, or $h$ can be replaced by an operator $H$.
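Here is a sketch of iteration (5) for the example matrix, with the step size chosen so that $0 < h\|A^*A\| < 2$. Because the problem is ill-conditioned, the iterates improve quickly along the well-conditioned directions and only very slowly along the remaining one, which is why stopping the iteration early acts as a form of regularization:

```python
import numpy as np

def landweber(A, u, h, n_iter, z0=None):
    """Iteration (5): z_{n+1} = z_n - h * A*(A z_n - u)."""
    z = np.zeros(A.shape[1]) if z0 is None else z0.astype(float).copy()
    for _ in range(n_iter):
        z = z - h * (A.T @ (A @ z - u))
    return z

A = np.array([[1/6, 1/3, 1/2],
              [1/3, 1/3, 2/3],
              [1/3, 2/3, 0.999]])
u = np.array([1.0, 4/3, 1.999])

h = 1.0 / np.linalg.norm(A.T @ A, 2)   # ensures 0 < h * ||A*A|| < 2
print(landweber(A, u, h, 500))         # converges only slowly toward (1, 1, 1)
```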

By applying the replacement method to the preceding equation system, that is, by adding a small positive $\delta$ to the diagonal coefficients and solving the perturbed systems with Eureka: The Solver, version 1.0, a sequence of numerical solutions is obtained, one for each trial value of $\delta$. In this computer output the solutions depend on the value of the regularization parameter $\delta$, and they get better, approaching the exact solution $x = 1$, $y = 1$, $z = 1$, for small values of $\delta$.
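The replacement-method experiment is easy to repeat numerically. This sketch solves $(A + \delta I)z = \tilde u$ for a decreasing sequence of $\delta$ values; the particular values of $\delta$ are illustrative, not those of the original runs:

```python
import numpy as np

A = np.array([[1/6, 1/3, 1/2],
              [1/3, 1/3, 2/3],
              [1/3, 2/3, 0.999]])
b = np.array([1.0, 4/3, 1.999])

# Replacement method: solve (A + delta*I) z = b for decreasing delta.
for delta in (1e-1, 1e-2, 1e-4, 1e-6):
    z = np.linalg.solve(A + delta * np.eye(3), b)
    print(delta, z)   # approaches the exact solution (1, 1, 1)
```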

3. The problem of multicollinearity

In statistics, our inputs (data) are observed values, and the outputs are values of some statistics. When the outputs are sensitive (unstable) to small changes in the data, we can consider this sensitivity as an ill-conditioning problem. The variance of a statistic is a natural measure of this sensitivity. In statistical inference, the statistics (estimators) are obtained according to some optimization criteria or principles. The best statistics under some criteria may still be sensitive (unstable) to the data, even when the criterion is the minimization of the variance. For example, in linear models the Gauss–Markov estimator of any estimable linear combination of the parameter vector has minimum variance among all linear unbiased estimators, but because of a bad design (multicollinearity) the variance may be large and the estimated value may be far distant from the true value.

In this section, we give a short summary of the problem of ill-conditioning in linear models, that is, the problem of multicollinearity. We especially emphasize that the origins of the methods for handling multicollinearity lie in the solution techniques for ill-posed problems.

Multicollinearity is defined as the existence of a nearly linear dependency among the column vectors of the design matrix $X = [X_1, X_2, \ldots, X_p]$ in the linear model $Y = X\beta + \varepsilon$. The existence of multicollinearity may result in wide confidence intervals for individual parameters (unstable estimates), may give estimates with wrong signs, and may affect our decisions in hypothesis testing. Severe multicollinearity may make the estimates so unstable that they are practically useless. Multicollinearity is synonymous with ill-conditioning: the problem of multicollinearity is another version of the problem of ill-conditioning in the solution of the normal equations
$$X'X\hat\beta = X'Y.$$
Therefore, in this inverse problem the condition number $k(X'X)$ is a measure of the existence of multicollinearity. Even though the condition number gives some information about the existence of multicollinearity, it does not explain the structure of the linear dependency among the column vectors $X_1, X_2, \ldots, X_p$. The best way of exploring the existence and structure of multicollinearity is to look at the eigenvalues and eigenvectors of the matrix $X'X$ [9,11,13,18]. Let the column vectors $X_1, X_2, \ldots, X_p$ be standardized such that $X'X$ is in the form of a correlation matrix, and let the spectral decomposition of $X'X$ be
$$X'X = V\,\operatorname{diag}(\lambda_1, \ldots, \lambda_p)\,V' = VDV' = \sum_{i=1}^{p}\lambda_i v_i v_i',$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$. If the condition number $k(X'X) = \lambda_1/\lambda_p$ is large, then multicollinearity exists and its structure is explained by the eigenvector $v_p$, since $[X_1, X_2, \ldots, X_p]\,v_p \approx 0$.

In the last one or two decades a number of papers have appeared on the causes of multicollinearity. It has been found that individual observations may be influential on the value of the condition number: the omission of such observations may create or remove the problem of multicollinearity [3,5-7,23].
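The eigenvalue diagnosis can be sketched as follows, on hypothetical data in which the third column is nearly the sum of the first two; the data, the column names and the degree of near-dependence are assumptions of the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: the third column is nearly the sum of the first two.
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardize so that X'X is a correlation matrix.
X = (X - X.mean(axis=0)) / (np.sqrt(n) * X.std(axis=0))

lam, V = np.linalg.eigh(X.T @ X)          # eigenvalues in ascending order
print("k(X'X) =", lam[-1] / lam[0])       # large: multicollinearity present
print("X v_p  =", X @ V[:, 0])            # nearly the zero vector
```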

Hundreds of studies have been done on the problem of multicollinearity. Even though many solution methods have been proposed, the problem has not been solved completely. Among the methods, the most popular one is the ridge method proposed by Hoerl and Kennard [12]. The ordinary ridge estimator
$$\hat\beta_k = (X'X + kI)^{-1}X'Y \tag{6}$$
is obtained by adding a small positive number $k$ to the diagonal elements of the normal equations. Notice that this estimator has the same expression as the solutions given by the replacement method (2) and the regularization method (4). The ridge method has been a popular estimation method for over 25 years. It can yield more reliable point estimates of the coefficients than the ordinary least-squares estimates by reducing the standard errors of the estimates at the expense of introducing some bias. However, there are some problems with the choice of the biasing parameter $k$. The addition of a small number to the diagonal elements of a matrix was also familiar to Marquardt [14], who used it in his algorithm for nonlinear optimization. Marquardt defined a class of generalized inverse estimators and discussed their properties in the framework of biased estimators, including the ordinary ridge estimator [15].

Let us now consider the normal equations $X'X\hat\beta = X'Y$ with an ill-conditioned matrix $X'X$, and let us try to solve the normal equations by the selection method. For a subspace $K \subset \mathbb{R}^p$ we look for a solution $\hat\beta_0 \in K$ such that
$$\min_{\hat\beta \in K} d_E(X'X\hat\beta, X'Y) = d_E(X'X\hat\beta_0, X'Y),$$
where $d_E$ is the Euclidean metric on $\mathbb{R}^p$. From (1),
$$X'X\hat\beta_0 = P_{X'X(K)}(X'Y). \tag{7}$$
When $K = \operatorname{span}\{v_1, v_2, \ldots, v_m\}$, that is, when $K$ is spanned by the eigenvectors corresponding to the $m$ largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ with $m < p$, then $X'X(K) = K$ and $P_{X'X(K)} = P_K = W(W'W)^{-1}W' = WW'$, where $W = [v_1, v_2, \ldots, v_m]$. Moreover, from (7),
$$\hat\beta_0 = (X'X)^{-1}WW'X'Y = VD^{-1}V'WW'X'Y = V\,\operatorname{diag}\bigl(\lambda_1^{-1}, \ldots, \lambda_m^{-1}, 0, \ldots, 0\bigr)\,V'X'Y.$$
This estimator $\hat\beta_0$ obtained by the selection method is the principal component estimator of $\beta$. In the selection method the difficulty was selecting the subspace for the solutions; here the difficulty is choosing the integer $m$, the number of principal components.
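Both estimators are a few lines in NumPy. The sketch below implements (6) and the principal component estimator as derived above; `ridge` and `pc_estimator` are illustrative names, and the design matrix is assumed to be standardized as in Section 3:

```python
import numpy as np

def ridge(X, Y, k):
    """Ordinary ridge estimator (6): (X'X + kI)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ Y)

def pc_estimator(X, Y, m):
    """Principal component estimator: invert X'X only on the span of the
    eigenvectors belonging to the m largest eigenvalues."""
    lam, V = np.linalg.eigh(X.T @ X)
    lam, V = lam[::-1], V[:, ::-1]           # reorder to descending
    d_inv = np.zeros_like(lam)
    d_inv[:m] = 1.0 / lam[:m]                # diag(1/lam_1,...,1/lam_m,0,...,0)
    return V @ (d_inv * (V.T @ (X.T @ Y)))
```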

There are numerous suggestions in the literature concerning appropriate criteria for the retention or deletion of components.

Estimators based on the iterative method (5), and their statistical properties, have also been investigated. An estimation procedure based on the iterative method can be constructed as follows. Let $\hat\beta^{(0)}$ be a fixed point in the parameter space and
$$\hat\beta^{(n)} = (I - hX'X)\hat\beta^{(n-1)} + hX'Y, \quad n = 1, 2, \ldots,$$
where $h \in (0, 1/\lambda_1)$. For $\hat\beta^{(0)} = 0$ and suitably chosen values of $h$ and $n$, the estimator $\hat\beta^{(n)}$ can have smaller mean-squared error than the least-squares estimator [17]. An estimator similar to $\hat\beta^{(n)}$ was proposed by Trenkler [21], whose motivation is based on a series expansion of the Moore–Penrose generalized inverse. The estimators considered above are better than the ordinary least-squares estimator in the case of multicollinearity according to the mean-squared error criterion. Nowadays the problem of multicollinearity is also a subject of research in generalized and multivariate linear models.

The ridge method, which corresponds to the operator replacement method for ill-conditioned problems, is used extensively in linear models as a solution method to overcome multicollinearity. The ridge method is also used in statistical analysis techniques which involve matrix inverses. For example, the Mahalanobis distance
$$D(x, \mu) = (x - \mu)'\Sigma^{-1}(x - \mu)$$
involves a matrix inversion, where $x$ is an observed vector and $\mu$ and $\Sigma$ are the mean vector and covariance matrix of the distribution (population). Substituting $\Sigma + kI$ for $\Sigma$,
$$D_k(x, \mu) = (x - \mu)'(\Sigma + kI)^{-1}(x - \mu)$$
is called the ridge-type Mahalanobis distance. In discriminant analysis applications the classification rule based on the Mahalanobis distance involves the sample covariance matrix $\hat\Sigma$; by substituting $\hat\Sigma + kI$ for $\hat\Sigma$ a ridge-type classification rule is obtained [19].

As another example let us consider canonical correlation analysis:
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_{p+k}\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right), \qquad V_1 = \sum_{i=1}^{p}\alpha_i X_{1i}, \quad V_2 = \sum_{j=1}^{k}\beta_j X_{2j}.$$
The canonical correlation analysis problem is to find the coefficients $\alpha_i$ and $\beta_j$ such that the correlation coefficient between the random variables $V_1$ and $V_2$ is maximized. Hotelling proved that the problem reduces to the eigenvalue and eigenvector analysis of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. In applications, the ridge-type canonical correlation analysis [22] is based on
$$(\hat\Sigma_{22} + k_2 I)^{-1}\hat\Sigma_{21}(\hat\Sigma_{11} + k_1 I)^{-1}\hat\Sigma_{12}.$$
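As a small illustration of the ridge-type Mahalanobis distance, the following sketch uses a hypothetical, nearly singular covariance matrix; the observation, the population parameters and the value $k = 0.05$ are all arbitrary choices of the illustration:

```python
import numpy as np

def ridge_mahalanobis(x, mu, Sigma, k):
    """Ridge-type distance D_k(x, mu) = (x-mu)'(Sigma + kI)^{-1}(x-mu)."""
    d = x - mu
    return d @ np.linalg.solve(Sigma + k * np.eye(len(d)), d)

# Hypothetical, nearly singular covariance matrix.
Sigma = np.array([[1.0, 0.999],
                  [0.999, 1.0]])
x = np.array([1.0, 1.1])
mu = np.zeros(2)
print(ridge_mahalanobis(x, mu, Sigma, 0.0))    # highly sensitive to Sigma
print(ridge_mahalanobis(x, mu, Sigma, 0.05))   # stabilized version
```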

In computations which frequently involve matrix inversion, as in control theory and nonlinear models, the ridge method can be used as a powerful tool against ill-conditioning.

4. Conclusion

A statistical analysis can be affected by ill-conditioning in three ways (stages). First, measurement errors can create problems in the presence of ill-conditioning, since the outputs will be sensitive (unstable) to small changes in the inputs. At this stage the problem is the sensitivity of the measurement devices, and it looks technological rather than statistical. Although statisticians often ignore measurement errors, they sometimes consider models with errors in the variables. When errors are included in some variables, the statistical inference procedures will require some information about the distributional properties of these errors.

Second, inference procedures based on the optimization of some statistical criteria can give misleading results in the presence of ill-conditioning caused by a bad design or by sampling. Because of randomness there will always be natural variability in the observations. Therefore, it is necessary to check for the existence of ill-conditioning before performing the statistical analysis; that is, in some way we have to check whether the values of the statistics (outputs) involved in an inference procedure are sensitive to small changes in the data (inputs).

Third, rounding errors (small changes in the inputs) can create problems in statistical computations when ill-conditioning exists. This computational aspect of ill-conditioning is a purely numerical problem and is the most difficult one in numerical analysis. Perhaps because of this, "Why has no one noticed?" asked McCullough and Vinod [16]. They also added: "It is understandable that economists have paid little attention to whether or not econometric software is accurate. Until recently, econometrics texts rarely discussed computational aspects of solving econometric problems... Many textbooks convey the impression that all one has to do is use a computer to solve the problem, the implicit and unwarranted assumptions being that the computer's solution is accurate and that one software package is as good as any other... Pencil-and-paper arithmetic is not like computer arithmetic."

To overcome the problems mentioned, we can use robust numerical algorithms and statistical procedures, apply operations such as variable selection or outlier detection, or apply remedial (cure) methods such as ridge regression. Finally, let us note that the problem associated with ill-conditioning is a problem of severity rather than of existence. The method of computation and the method of statistical inference greatly affect the results. In the last 40 years, some known numerical methods were adopted to obtain statistical methods for inference in the presence of ill-conditioning; some of the pioneers are Marquardt [15], Hoerl and Kennard [12], etc. Conversely, some results obtained by statisticians, for example the research on leverage and influential observations, are helpful in creating

numerical algorithms and performing numerical computations when ill-conditioning exists. In the framework of software reliability, statistical methods play a great role in the development of numerical methods which are robust against ill-conditioning.

Acknowledgment

The authors would like to thank the associate editor and the referees for their helpful comments on this paper.

References

[1] K.E. Atkinson, An Introduction to Numerical Analysis, Wiley, New York, 1978.
[2] N.S. Bakhvalov, Numerical Methods, Mir Publishers, Moscow, 1977.
[3] D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics, Wiley, New York, 1980.
[4] M. Brill, E. Schock, Iterative solution of ill-posed problems: a survey, in: Proceedings of the Fourth International Mathematical Geophysics Seminar, Berlin, 1987.
[5] S. Chatterjee, A.S. Hadi, Influential observations, high leverage points, and outliers in linear regression, Statist. Sci. 1 (1986).
[6] S. Chatterjee, A.S. Hadi, Sensitivity Analysis in Linear Regression, Wiley, New York, 1988.
[7] R.D. Cook, S. Weisberg, Residuals and Influence in Regression, Chapman & Hall, New York, 1982.
[8] B.N. Datta, Numerical Linear Algebra and Applications, Brooks/Cole Publishing Company, 1994.
[9] J. Fox, G. Monette, Generalized collinearity diagnostics, J. Amer. Statist. Assoc. 87 (1992).
[10] G.H. Golub, C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 1991.
[11] R.F. Gunst, Regression analysis with multicollinear predictor variables: definition, detection and effects, Comm. Statist. Theory Methods 12 (1983).
[12] A.E. Hoerl, R.W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12 (1970).
[13] E.R. Mansfield, B.P. Helms, Detecting multicollinearity, Amer. Statist. 36 (1982).
[14] D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, J. Soc. Ind. Appl. Math. 11 (1963).
[15] D.W. Marquardt, Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation, Technometrics 12 (1970).
[16] B.D. McCullough, H.D. Vinod, The numerical reliability of econometric software, J. Econom. Literature XXXVII (1999).
[17] F. Öztürk, A discrete shrinking method as an alternative to least squares, Communications de la Faculté des Sciences de l'Université d'Ankara 33 (1984).
[18] S.D. Silvey, Multicollinearity and imprecise estimation, J. Roy. Statist. Soc. Ser. B 31 (1969).
[19] R.K. Smidt, L.L. McDonald, Ridge discriminant analysis, Research Paper No. 108, Stat. Lab., University of Wyoming, 1976.
[20] A.N. Tikhonov, V.Y. Arsenin, Solutions of Ill-posed Problems, Wiley, New York, 1977.
[21] G. Trenkler, An iteration estimator for the linear model, in: COMPSTAT 1978, Physica-Verlag, Würzburg, 1978.
[22] H.D. Vinod, A. Ullah, Recent Advances in Regression Methods, Marcel Dekker, New York, 1981.
[23] E. Walker, Detection of collinearity-influential observations, Comm. Statist. Theory Methods 18 (1989).
