The successive raising estimator and its relation with the ridge estimator

Size: px
Start display at page:

Download "The successive raising estimator and its relation with the ridge estimator"

Transcription

1 The successive raising estimator and its relation with the ridge estimator J. García and D.E. Ramirez July 27, 206 Abstract The raise estimators are used to reduce collinearity in linear regression models by raising a column in the experimental data matrix which may be nearly linear with the other columns. The raising procedure has two components, namely stretching and rotating which we can analyze separately. We give the relationship between the raise estimators and the classical ridge estimators. Using a case study, we show how to determine the perturbation parameter for the raise estimators by controlling the amount of precision to be retained in the original data. J. García Department of Economics and Business, Almeria University, Almeria, Spain jgarcia@ual.es D. Ramirez Department of Mathematics, University of Virginia, Charlottesville, VA USA der@virginia.edu Keywords: Raise estimators, ridge estimators, collinearity, Stein estimators MSC: 62J05, 62J07 Introduction A technique to mitigate the existence of collinearity is constrained estimators such as ridge estimators and Stein estimators. In both cases, the goal is to diminish the Mean Square Error MSE and to mitigate the problems inherent with collinearity. These estimators improve the effect of collinearity by constraining the Mahalanobis distance of the estimator β under the constraint β Qβ = δ 2 with Q a positive definite matrix and δ > 0. From the model y = Xβ + u with uncorrelated, zero-mean, and homoscedastic errors, we know that the OLS estimators β are given by β = X X X y.

2 A direct application of Lagrange multipliers gives the constrained estimator β cs with β csq β cs = δ 2 as β cs = X X+kQ X y, 2 where k is the Lagrange parameter chosen to satisfy the constraint β csq β cs = δ 2. From the Loewner partial ordering X X L X X+kQ, the inverses will have the opposite partial order X X+kQ L X X. When Q and X X commute, for example with Q = ki p or Q = kx X, the two cases considered below, the pair X X, X X+kQ can be simultaneously diagonalized into the pair I p, Λ with Λ the diagonal matrix containing the characteristic roots of X X+kQ. From the Loewner partial ordering X X L X X+kQ, the Loewner partial ordering I p L Λ follows, and thus I 2 p L Λ 2 so X X 2 L X X+kQ 2, and it follows that β cs 2 β 2 with the constrained estimators shown to have decreased in length. We note that in general, A L B does not imply that A 2 L B 2, Puntanen et al. 20, p. 36. From this idea, we consider two types of constrained estimators:. Stein estimators: From Q = X X the expression 2 becomes: ˆβ s = X X + kx X X y = + k ˆβ. 3 For a detailed study of the Stein estimators we recommend the original reference of Stein 960. These estimators were also studied by Mayer and Willke 973. These estimators, which satisfy expression 3, define a whole class of biased estimators, the so called shrinkage estimators. They are obtained by shrinking the least squares estimator towards the origin, and they satisfy the M SE Admissibility Condition which assures an improvement decrease in MSE for some k 0,. 2. Ridge estimators: From Q = I p, the expression 2 becomes: ˆβ r = X X + ki p X y, 4 where I p is the matrix identity. Hoerl and Kennard 970 established that the ridge estimators also satisfy the M SE Admissibility Condition assuring an improvement in MSE for some k 0,. The earliest detailed expositions of these estimators are found in Marquardt 963 and Hoerl and Kennard 970 with Marquardt 963 acknowledging that Levenberg 944 had observed that a perturbation of the diagonal improved convergence in steepest descent algorithms. The history of the early use of matrix diagonal increments in statistical problems is given in the article by Piegorsh and Casella 989. Numerous papers have been written justifying the need to find values 0 < k i i p to perturb each of the characteristic roots of the matrix X X to 2

3 improve the condition number of the inverse X X. For a positive definite matrix A, we define the condition number κ as the ratio of the largest to smallest of the eigenvalues of A. From this, the generalized ridge estimator is established as β r = X X + K X y, where K is the diagonal matrix diagk,..., k p containing these perturbations. This regression was briefly commented on by Hoerl and Kennard 970 and Goldstein and Smith 974. As the ridge estimator is not equivariant under scaling, the common convention is to scale X X to correlation form with the explanatory variables centered and scaled to unit length and allowing for a constant term in the linear model. With X X in correlation form, the estimated variances varβ i of the estimates are equal, and the perturbation matrix is usually taken to be K =ki p with k > 0. The attempts to find the basis of the ridge estimator have been numerous; for example, by using the constrained regression imposing the constraint β β δ 2, Davidov 2006 used the Kuhn-Tucker Theorem to show that the conditioned least squared solution agrees with the ridge solution with constraint β β = δ 2. Some anomalies of the ridge methodology can be found in Jensen and Ramirez 2008, Kapat and Goel 200, and Jensen and Ramirez 200a. Using the notation from García et al. 205 define the augmented matrix for X and the augmented vector for y by X A = [X, ki p ] and y A = [y, 0 p], respectively. The least squared solutions for the constrained model X Xβ = X y subject to β β = δ 2 and the unconstrained model X A X Aβ = X A y A both have solution β = X X+kI p X y. Augmenting the matrix and the response vector is similar in spirit to imposing identifiability constraints on the parameters as in Seber 977. Hoerl and Kennard 970, Eq. 2.3 suggested the relationship between the ridge estimator and the OLS estimator as β r = [I p + kx X ] β. However, note that since β r is constrained to lie on the sphere β β = δ 2, β r has a joint singular distribution in R n with rank n while β has a joint non-singular distribution in R n with rank n. These and other anomalies with ridge regression are discussed in Jensen and Ramirez 200a. To avoid problems of singularity in the distribution of β r, one can consider a perturbation of the data X X S with X S X S = X X+kI p as in Jensen and Ramirez They call X S the surrogate matrix, and it is defined using a perturbation of the singular values of X, and yields the surrogate estimator β S from X S X Sβ = X Sy. Similarly, the raised estimators, discussed below, also improve the ill-conditioning X X with X used on both sides of the normal equation X Xβ = X y, an importance difference with ridge estimators. Some authors bypass the constrained optimization foundation of ridge estimators and adopt the view, from numerical analysis given in Riley 955, that perturbations of the diagonal improves the stability for matrix inverses. For the augmented model y A = X A β+u the error terms cannot be centered. This could be understood as a critique to ridge estimator obtained from the 3

4 augmented model. Minimizing the sum of the squared errors e i = y i ŷ i for the augmented rows p i= e2 i = k β β replaces the constraint β β = δ 2. Case studies in García et al. 205 show that the variance inflation factors V IF i i p computed for the surrogate model y = X S β + u and for the augmented model y A = X A β + u A are asymptotically equal n. To conclude, by using the words of Marshall and Olkin 979 it seems that the ridge estimator can find a justification in the result provided by Riley 955 where it is shown that the matrix X X + ki p is better conditioned than the matrix X X. In this paper, we present some properties of the raised estimators originally presented by García et al. 20 and its varieties: the stretched and the rotated estimators. We focus on finding an explication about the basis of the ridge and Stein estimators. Raised regression is similar to ridge estimator. One difference is that the ridge procedure seeks stability of X X on the left-hand side of the normal equations X X β= X y, transforming the normal equations into the ridge equations X X+kI p β= X y. Our procedure is in the spirit of the surrogate estimators of Jensen and Ramirez 200b where X is modified X X and the modified matrix is used on both sides of the normal equations. Thus the raised procedure allows the user to visualize the perturbed design matrix X, a feature not available with ridge regression. The paper is organized as follows: Section 2 presents the generalized raised estimator for two variables. Section 3 presents the Stretched and the Rotated estimators as varieties of the raised estimator. Section 4 studies the raised estimators in the case of two variables and shows the way to obtain X X+kI p by using the successive raising procedure. Section 5 studies the variance inflation factors and the metric number. Section 6 gives the relation between the raise and ridge estimators. Our case study is presented in Section 7. Finally, Section 8 contains the conclusions. Some technical results are shown in the two appendices. 2 The generalized raise method for two variables We consider the linear model y = Xβ + u where Eu = 0, Euu = σ 2 I n and X is a full rank matrix n p. For convenience, we have taken σ 2 =. We assume that the variables are centered and standardized, that is, X X is in correlation form. For the n k matrix A = [a, a 2,..., a p ], the columns span is denoted by SpA, with A j denoting the j th column vector a j, and A [j] denoting the n p matrix formed by deleting A j from A. For the linear model y = Xβ + u, central to a study of collinearity is the relationship between X j and SpX [j]. Indeed, there is a monotone relationship between the collinearity indices κ j of Stewart 987, the variance inflation factors V IF j, and the angle between X j, SpX [j], see Jensen and Ramirez 203, Theorem 4. 4

5 Let P [j] = X [j] X [j] X [j] X [j] be the projection operator onto the subspace SpX [j] R n spanned by the columns of the reduced or relaxed matrix X [j]. From the geometry of the right triangle formed by X j, P [j] X j, the squared lengths satisfy X j 2 = P [j] X j 2 + X j P [j] X j 2, where X j P [j] X j 2 = RSS j is the residual sum of squares from the regression X j = X [j] α + v with α = P [j] X j. The angle θ j between X j and SpX [j] is given by the angle between X j and P [j] X j and satisfies X j cosθ j = P X [j] j X j P [j] X j = P X [j] j. X j We show the details leading to the raise method with p = 2. In the case of two variables, X = [x, x 2 ], the model is given by y = β x + β 2 x 2 +u where the matrix X X is defined as X x2i x X = i = x2i x i ρ with ρ the correlation coeffi cient between the exogenous variables. ρ On the other hand, the matrix X y is given by X y = x i y i, x 2i y i. The collinearity problem occurs when the vectors x and x 2 are very close; that is, when the angle between the vectors is very small. To correct this problem, we will try to separate both vectors. The raise method begins with the regression between x and x 2 in the following way. Pick one of the columns of X, say X, and regress this column using the remaining columns: X = X [] α + v x = αx 2 + v. The matrix X 2 is X 2 = x 2, x 22,..., x 2n, with X 2 X 2 = x 2 2i = and X [] X = X 2 X = x 2i x i = ρ. Thus, ˆα = X [] X [] X [] X = [ ] [ ] X 2 X 2 X 2 X = ρ. 5 Summarizing, with x 2 = = x 2 2, we can write that x = ρx 2 + e with the estimated errors e orthogonal to SpX [] ; and, in particular, e x 2 with e = x ρx 2 with x 2 = P [] x 2 + e 2 x 2 = ρx e 2. As x 2 = x 2 2 =, the raise method transforms x x by the rule x = x + λe = x + λx ρx 2 = + λx λρx 2 5

6 with x 2 = + λ 2 x λλρx x 2 + λρ 2 x 2 2 = + λ 2 ρ 2 λ 2 + 2λ. 6 Equivalently with x 2 = +2λ ρ 2 +λ 2 ρ 2 with inner product x x 2 = x + λe x 2 = ρ + λe x 2 = ρ. Denote X = [ x, x 2]. With λ > 0, x 2 > x 2, and we say x has been raised from an angle θ to an angle θ with cosθ = P [] X / x and cos θ = P [] X / x. The Variance Inflation Factor V IF for X is functionally related to the angle θ by the rule θ = arccos /V IF, for example Jensen and Ramirez 203. Thus as λ, θ 90 and the variance inflation factor V IF converges to one indicating that collinearity is being eliminated, as in García et al. 20, Theorem 4.2. If we replace x by x in the original linear regression from Eq., the raised model will be y = β λ x +β 2 λx 2 +u where the estimated parameters depend on λ and will be denoted as β λ and β 2 λ. Summarizing, in the case of two variables, X = [x, x 2 ], we regressed the first column using the remaining second column from X = X [] α + v with x = ρx 2 + e and x = x + λe. With the first column raised, the raised transformation X X <> has the raised design matrix given by X <> = x, x 2 = + λ x λ ρx 2, x 2, 7 using <,..., k > to indicate the columns that have been raised, notation that will be helpful in the general case. We note that X + 2λ ρ <>X <> = 2 + λ 2 ρ 2 ρ, 8 ρ with only the, element of X X being effected by raising the first column of X. We consider additional columns to be raised in Section 4. With ridge regression, the coeffi cient of determination R 2 k monotonically decreases as k, MacDonald 200. We give the parallel result for surrogate regression for RS 2 k in Appendix A. The importance of these results is that it allows the user to set a lower bound for changes to the original model, such as R 2 k 0.95 or RS 2 k One very desirable property of the raised regression method is that the coeffi cient of determination does not change with R 2 λ = R 2 0 for all λ 0,. Additionally, the predicted values with the OLS regression X β and the predicted values with the raised regression X βλ are the same: X β = X βλ. 9 Thus raising a column vector in X is not effecting the basic OLS regression model. These results follow from noting that the raised vector remains in the original SpX, e j = X j P [j] X j SpX so Sp X = SpX, as shown in García et al

7 3 The stretched and rotated estimators The raised column x j = x j + λe j with λ 0 has two effects; namely, stretching the column and rotating the column. We study the effects separately with a case study. Set X 0 = [, x, x 2, x 3 ] the 20 4 matrix from Rawlings 998, Ex.. which has been designed to be highly collinear. The pattern has x as the sequence from 20 to 29 and repeated; x 2 is x 25 with the st and th term changed from 5 to 4; x 3 is the repeated sequence 5, 4, 3, 2,, 2, 3, 4, 5, 6. The eigenvalues for X 0X 0 are [2439, 52.4, 40., ] with condition number for X 0X 0 as λ max /λ min = Scaling X 0 X u to have unit length reduces the eigenvalues for X ux u to [2.898,.007, , ] with condition number for X ux u as Centering and rescaling X u X c into correlation form without the constant term reduces the eigenvalues for X cx c to [2.67, 0.830, ] with condition number for X cx c as The correlation between column of X c and column 2 of X c2 is ρx c, X c2 = The variance inflation factors V IF j for X c are the diagonal elements of X cx c and are [69.4, 75.7,.688] from which the angles in radians between the column vector X cj and the span of the remaining columns SpX c[j] can be computed by θ j = arccos /V IF j, for example Jensen and Ramirez 203. Converting to degree measure, the angles are [4.407, 4.327, ] supporting that one should first try to improve collinearity by modifying the second column. Following the case study in García et al. 20, the responses are generated for the linear model y = X u, with u i generated from a normal N0, distribution. Using this data set, the centered and standardize model is y y n = X c β c + u = X c u, with OLS estimates β = 4.48, 3.8, The raised column will be the second column of the centered and standardized X c with x c2 = x c2 + λe 2 with λ 0. The empirical risk for the OLS estimator given the true value β c is β β c β β c = β β c 2 = The goal is to improve, that is decrease, the empirical risk by using the raised estimators and its two components separately; namely, the stretched estimators and the rotated estimators. Following the Stein procedure, we consider the stretched estimators based on the transformation X c2 + λx c2 ; that is, the second column of X c will be stretched by the factor + λ with λ 0. Denote the stretched estimators by β st λ. They are given by β st λ = β, β 2 / + λ, β 3 with β st λ 2 monotonically decreasing to β, 0, β 3 2. For this case study, the 7

8 empirical risk for the stretched estimators β st λ β c 2 can be reduced from β st 0 β c 2 = to min λ 0 β st λ β c 2 = for + λ = 5.402, noting that = 3.8/ Using the Jackknife procedure, the minimum for the average empirical risk for the stretched estimators over the n = 20 replications is min λ 0 avg n=20 β st λ β c 2 = for +λ = As this is a scaling procedure, the angle between + λx c2 and SpX c[2] has not changed, and thus the V IF s have not changed. The rotated estimators are similar to the raised estimators with the added constraint that the columns length remains at unit length. This allows one to track the improvement in ill-conditioning without confounding with a change in scale. The rotated estimators are based on the transformation X c2 [X c2 + λe 2 ]/ X c2 + λe 2 with λ 0; that is, the second column of X c is raised, but then scaled back to unit length. For the same λ, both the rotated model and the raised model will have the same correlation matrix, the same angle between the second column of the design matrix and the span of the remaining columns, and the same variance inflation factors. Denote the rotated estimators by β rot λ. As the second column is perturbed by a multiple of e 2, and since e 2 is orthogonal to the other columns, both the rotated estimators β rot λ and the raised estimators βλ have the same values except in the row corresponding to the raised column. For this case study, the empirical risk for the rotated estimators β rot λ β c 2 can be reduced from β rot 0 β c 2 = to min λ 0 β rot λ β c 2 = for λ = For λ = 3.502, the V IF s are [9.394, 9.68,.2] with associated angle θ 2 = 8.8. Using the Jackknife procedure, the minimum for the average empirical risk for the rotated estimators over the n = 20 replications is min λ 0 avg n=20 β rot λ β c 2 = for λ = For this case study, the empirical risk for the raised estimators βλ β c 2 can be reduced from β0 β c 2 = to min λ 0 βλ β c 2 = for λ = For λ = 3.40, V IF s are [9.744, 9.98,.22] with associated angle θ 2 = Using the Jackknife procedure, the minimum for the average empirical risk for the raised estimators over the n = 20 replications is min λ 0 avg n=20 βλ β c 2 = for λ = These values are comparable to those using the rotated estimator. Summarizing, the raised estimators are based on stretching and rotating the column to be improved. Both effects should be positive as shown with the stretching estimators and the rotating estimators. Computationally, the raised estimators are easier to compute lacking the square root in the denominator of the transformed columns. For this case study, the rotated and raised methods yielded comparable results and both performed better than the stretching method. The Jackknife procedure showed that the average over the replications for the raised estimators was smaller better than the average over the replications for the stretched estimators. 8

9 4 The MSE risk for two variables We consider the linear model y = Xβ + u where Eu = 0, Euu = σ 2 I n and X is a full rank matrix n p. In this section p = 2. For convenience, we often take σ 2 =, however, for these calculations we need to track the value of σ 2. We assume that the variables are centered and standardized, that is, X X is in correlation form. With respect to the MSE risk, Risk θ = E θ θ E θ θ + trvar θ, we compare the three estimators: the raised βλ, the stretched β st λ, and the rotated β rot λ. For the raised model with column raised with X X <>, the design matrix and moment matrix are given in Eqs. 7 and 8. The calculations follow from standard results for OLS estimators and where we denote E β = δ. From Eq. 9 the predicted values using OLS estimators and raised estimators are the same with β x + β 2 x 2 = X β = X βλ = β λ x + β 2 λx 2 = β λ[ + λx λρx 2 ] + β 2 λx 2 = + λ β λx + [ β 2 λ λρ β λ]x 2 and thus β λ = + λ β β 2 λ = β 2 + λρ + λ β. A more general result is given in García et al. 20. It follows that E β λ = +λ δ, E β 2 λ = δ 2 + λρ +λ δ, with squared bias E βλ δ E βλ δ = λ2 +ρ 2 +λ δ 2. The variances are given by 2 var β λ σ 2 = var β 2 λ σ 2 = 2 + λ ρ 2 2 λρ ρ λρ ρ λ ρ λ ρ 2. The minimum value for Riskβλ can be shown to be For the stretched estimator β st λ, σ 2 λ = ρ 2 δ 2. β st, λ = + λ β β st,2 λ = β 2, with E β st, λ = +λ δ, E β st,2 λ = δ 2, with squared bias E β st λ 9

10 δ E β st λ δ = λ2 +λ 2 δ 2, and with variances var β st, λ σ 2 = var β st,2 λ σ 2 = with the minimum value for Risk β st λ when + λ 2 ρ 2 ρ 2, σ 2 λ st = ρ 2 δ 2. We note that this is the same λ value as with the raised estimator. For the rotated estimator β rot λ the calculations are more tedious. The design matrix is X <>rot = x +λe / x +λe, x 2 = +λx λρx 2 / x + λe, x 2 with moment matrix X <>rotx <>rot = ρ x +λe ρ x +λe recalling that e x 2. As the predicted responses are the same for both the OLS model and the rotated model, β x + β 2 x 2 = X β = X <>rot βrot λ = β rot, λ x / x +λe + β rot,2 λx 2 = β rot, λ[+λx λρx 2 / x +λe ]+ β rot,2 λx 2 = +λ β λρ x +λe rot, λx +[ β rot,2 λ β x +λe rot, λ]x 2. And thus β rot, λ = x + λe β + λ β rot,2 λ = β 2 + λρ + λ β, with E β rot, = x+λe +λ δ, E β rot,2 λ = δ 2 + λρ +λ δ, with squared bias 2 2 E β rot λ δ E β rot λ δ = x+λe +λ δ 2 + λρ +λ δ 2, and with variances var β rot, λ 2 x + λe σ 2 = + λ ρ 2 var β rot,2 λ σ 2 = ρ λρ + λ ρ ρ 2, 2 λρ + + λ ρ 2. The value of λ rot for the minimum of Risk β rot λ was found using Maple. First one solves for the positive root t 0 of the quartic polynomial p 4 t = a 0 t 4 + a t 3 + a 2 t 2 + a 3 t + a 4 with a 0 = ρ 2, a = 2 ρ 2, a 2 = ρ ρ 2 + σ δ 2 2, a 3 = 2ρ 2 ρ 2, a 4 = ρ 2 ρ 2. Appendix B establishes that 0

11 real roots exist for p 4 t. The value of λ rot is then given by 2 t 0 ρ 2 σ + 2t 0 δ λ rot = t 0 + ρ 2. As an example, with {ρ = 0.95, δ = 2, σ = }: for the raised estimator the Riskβ0 = 20.5 and decreases to Riskβ λ = with λ = for the stretched estimator, the risk decreases to Risk β st λ = 3.3 with λ st = and 3 for the rotated estimator p 4 t = t t t t with positive root t 0 =.69, λ rot = 3.479, and Risk β st λ rot = In summary, the raised and stretched estimators have their minimum risk at the same λ value; the raised and rotated estimators have comparable results; and the rotated estimators are more complex to compute. 5 Variance inflation factors and the metric number We consider the linear model y = Xβ + u where Eu = 0, Euu = σ 2 I n and X is a full rank matrix n p. We assume that the variables are centered. Reorder X = [X [p], X p ] with X p = x p the p th column and X [p], the design matrix X without the p th column, the resting columns. The problem in this section is to measure the effect of adding the last column X p to X [p]. An ideal column would be orthogonal to the previous columns with the entries in the off diagonal elements of the p th row and p th column of X X all zeros. Denote by M p the idealized moment matrix [ ] X M p = [p] X [p] 0 p 0 p x. px p The metric number associated to x p is defined by detx MNx p = X detm p. The metric number is easy to compute and has been used in García et al. 20 as a measure of collinearity. A similar measure of collinearity is mentioned in Footnote 2 in Wichers 975 and Theorem of Berk 977. The geometry for the metric number has been shown in García et al The case study in García et al. 20 suggests the functional relationship between MNx p and the variance inflation factor for β p as V IF β p = MNx p 2.

12 We note that this relationship holds, and so the metric number MNx j is also functionally equivalent to the collinearity indices κ j of Stewart 987, the variance inflation factors V IF j, and the angle between X j, SpX [j]. To evaluate V IF β p = V IF j, transform X X into correlation form R with S = diagx x,..., x px p the diagonal matrix with entries from the diagonal of X X with R = S /2 X XS /2. The V IF s are the diagonal entries of R = S /2 X X S /2. It remains to note that the inverse R can be computed using cofactors C i,j, and, in particular, V IF β p = R p,p = [S /2 X X S /2 ] p,p = x px p /2 detc p,p detx X x px p /2 = detm p detx X = MNx p 2. With the example in the previous section with {p = 2, ρ = 0.95}, V IF 2 is easy to compute from Eq. with: for the raised estimator V IF β 2 0 = 0.26 and decreases to V IF β 2 λ =.729 with λ = for the stretched estimator, V IF β st,2 λ st = 0.26 for any λ st 0, since there is no change in the angle between X 2, SpX [2] 3 for the rotated estimator V IF β rot,2 λ rot =.080 for λ rot = 3.479, noting that V IF for the rotated estimator is smaller than the raised estimator since the rotated estimator has the larger angle between X 2, SpX [2] as λ rot > λ. 6 How to overcome the ill-conditioned of the matrix X X by successive raises We will focus on finding a natural way to get the perturbation matrix X X+kI p from the raised estimators as this matrix is an essential component in ridge regression. We will show that the raised estimators allow an estimation by OLS which retains the coeffi cient of determination. Firstly, we will present the estimation for two variables and, secondly, to the general case. 6. For two variables In the case of two variables, X = [x, x 2 ] with the first column raised, X X <> with the raised design matrix given by X <> = x, x 2 = + λ x λ ρx 2, x 2 and X + 2λ ρ <>X <> = 2 + λ 2 ρ 2 ρ. ρ 2

13 After having raised column with λ > 0, we will raise column 2 with λ 2 > 0. First regress the second column of X <> using the remaining columns as X <>2 = αx <>[2] + v; that is, x 2 = α x + v. Scaling x to unit length gives x 2 = α x x / x + v with α x = ρ as in Eq. 5; and with x 2 = + λ 2 ρ 2 λ 2 + 2λ from Eq. 6. Thus x 2 = ρ x x = ρ x + λ x λ ρx 2, with residual e 2 from x 2 = ˆx 2 + e 2 = ρ x x + e 2 so e 2 = x 2 ρ x / x with e 2 x. We now raise the vector x 2 using the residual e 2 by the rule x 2 = x 2 + λ 2 e 2 = x 2 + λ 2 x 2 ρ x x = + λ 2 x 2 λ 2ρ x x. The doubly raised matrix is given by X <,2> = + λ x λ ρx 2 + λ 2 x 2 λ2ρ x x For any values k > 0, k 2 > 0, it is possible to show that there exist raising parameters λ > 0, λ 2 > 0 such that X + k ρ <,2>X <,2> =, 2 ρ + k 2 the matrix used in generalized ridge regression. This follows from noting that x 2 = +2 ρ 2 λ + ρ 2 λ 2 viewed as a polynomial in λ has positive coeffi cients so it can achieve any value greater than one; and similarly for x 2 2 = +2 ρ2 +k λ 2 + ρ 2 2 +k λ 2 2. The off-diagonal entries in Eq. 2 are not effected by raising column 2 since x x 2 = x x 2 + λ 2 e 2 = x x 2 since e 2 x ; and earlier we had established that x x 2 = x + λ e x 2 = x x 2 = ρ since e x 2. Thus, in particular, given k > 0 there are raising parameters λ and λ 2 such that X + k ρ <,2>X <,2> = = X X + ki ρ + k p, the perturbation matrix usually used in ridge regression. Some advantages of the raise method for addressing the problem of collinearity are: The raise estimators are estimated by OLS and thus confidence intervals can be computed. It retains the coeffi cient of determination of the initial regression. The V IF s associated with the raise estimators are monotone functions decreasing with k, see García et al

14 6.2 For the general case In the general case we start with a standardized matrix X with the following column vectors X = x, x 2,..., x p, and the raising procedure will be given by the following steps: Step : Firstly, we raise the vector x from the regression of x with the resting vectors X [] = x 2, x 3,...x p. For this, we take the residual e from x = x + λ e such that e SpX []. The raised design matrix is denoted X <> = x, x 2,..., x p. Step 2: Next, we raise the vector x 2 from the regression of x 2 with the resting vectors from X <>, namely X <>[2] = x, x 3,..., x p. Then, we get the residual e 2 from x 2 = x 2 + λ 2 e 2 such that e 2 SpX <>[2]. The raised design matrix is denoted X <,2> = x, x 2,..., x p. Step j: We raise the vector x j from the regression of x j with the resting vectors from X <,...,j >, namely X <,...,j >[j] = x,..., x j, x j+,..., x p. Then, we get the residual e j from x j = x j + λ j e j such that e j SpX <,...,j >[j]. The raised design matrix is denoted X <,...,j> = x,..., x j, x j+,..., x p. Step p: We raise the vector x p from the regression of x p with the resting vectors from X <,...,p >, namely X <,...,p >[p] = x,..., x p. Then, we get the residual e p from x p = x p +λ p e p such that e p SpX <,...,p >[p]. The raised design matrix is denoted X <,...,p> = x,..., x p. Suppose columns,..., j have been raised to form X <,...,j >. The j th column of X <,...,j > is raised to define X <,...,j>, with X <,...,j >[j] the resting columns, by the rule X <,...,j>j = X <,...,j >j + λ j e j with e j SpX <,...,j >[j]. Noted that X <,...,j> X <,...,j> will differ from X <,...,j > X <,...,j > only in the j, j entry as the j th raised column is perturbed by the vector λ j e j orthogonal to the span of the remaining columns. The [j, j] entry for the j th raised moment matrix is X <,...,j>x <,...,j> [j, j] = X <,...,j >j X <,...,j >j + 2λ j X <,...,j >j e j + λ 2 je je j, with the surplus term denoted by k j = 2λ j X <,...,j >j e j + λ 2 je j e j. We view X <,...,j> X <,...,j>[j, j] as a quadratic polynomial in λ j ; and since the coeffi - cients of λ 2 j and λ j are both positive, the quadratic polynomial can achieve any value larger than the original value X <,...,j >j X <,...,j >j. Let P [j] be the projection matrix for SpX <,...,j >[j]. That the coeffi cient of λ j is positive follows from 4

15 X <,...,j >j e j = X <,...,j >j I p P [j] X <,...,j >j 3 = X <,...,j >j I p P [j] 2 X <,...,j >j = I p P [j] X <,...,j >j 2 = e j 2 > 0. We started with X X in correlation form and with final raising matrix having moment matrix + k ρ 2 ρ 2... ρ p ρ 2 + k 2 ρ ρ 2p X <,2,...,p>X <,2,...,p> = ρ 3 ρ 23 + k 3... ρ 3p ρ p ρ 2p ρ 3p... + k p = X X + K, with K the diagonal matrix K = diagk,..., k p with k j 0 for j p. Thus the raised regression perturbation matrix is equivalent to a generalized ridge regression perturbation matrix. And conversely, any generalized ridge regression matrix has a corresponding raised regression matrix. Given ridge parameters k,..., k p, the associated raised parameters are given from Eq. 3 by k j = 2λ j X <,...,j >j e j + λ 2 je j e j = 2λ j e j 2 + λ 2 j e j 2 = 2λ j + λ 2 j e j 2, and thus λ j = 7 Empirical application + k j e j 2 4 k j = e j 2 λ 2 j + 2λ j. 5 We have chosen the following data set for an empirical application because it is relatively recent and presents a high level of collinearity. In the course of the financial crisis in the United States and over the whole world there is a big discussion about the life of the American people on credit. The original data set is presented in Table and it is taken from the Economic Report of the President 2008 where y is the outstanding Mortgage Debt in trillions of dollars and the three independent variables are X Personal Consumption in trillions of dollars, X 2 Personal Income in trillions of dollars, and X 3 Consumer credit outstanding in trillions of dollars. 5

16 Table. y Mortgage debt, X Personal Consumption, X 2 Personal Income, and X 3 Consumer Credit. year y X X 2 X The ridge procedure perturbs the moment matrix X X while the raised and surrogate procedures perturbs the design matrix X. This second approach has the distinct advantage of allowing the user to specify, for each of the variables, a precision that the data will retain during the perturbation stages by restricting the mean absolute perturbations λ j n n e j,i = π j. 6 i= For the case study, we set the precision π j = in the first trial, and for the second trial we set the precision π j = Thus given a specified precision π j > 0, we raise column j in X <,...,j> to x j = x j + λ j e j where λ j is solved from Eq. 6. The precision values should be based on the researcher s belief in the accuracy of the data. The raised parameters λ j are constrained to assure that the original data has not been perturbed more than what the researcher has permitted. The original data X 0 is centered and standardized X 0 X. For convenience, the columns are raised in sequence, 2, 3. In Table 2 with precision π j = for all j and in Table 3 with precision π j = 0.05 for all j, we report the raised parameters λ j, the corresponding generalized ridge parameters k j the V IF s, the associated angles θ, the condition number κ for the moment matrices as the ratio of the largest to smallest eigenvalue for the moment matrices, and 6

17 the squared length of the coeffi cient vector βλ j 2. Recall from Eq. 7 that the final raised matrix X <,..,p> has moment matrix X X + diagk,..., k p. Note that the initial V IF s present high values, and this fact justifies the need to correct the collinearity. With precision π = 0.05, after the successive raising process all the V IF s from X <,2,3> are less than 0, namely 7.063, 6.2, But what is more important is that the matrix we need to invert, namely X X + diagk,..., k p, is better conditioned than the initial matrix X X ; and it has been obtained by adding a parameter k i 0 in the main diagonal of the matrix X X, in a way similar to ridge estimation, and where we can preserve the original data to a specified precision. It is important to note that our methodology offers a concrete method to determine the perturbation parameter λ j for the raise estimators, and therefore of k, while the choice of the ridge parameter remains an open problem. As noted by one of the referees, "the authors have obtained, by using an intuitive way, the matrix X X+kI p with the successive raising estimator, while other proposed justifications, are a posteriori." Table 2. With precision π j = 0.005, raised parameters λ j, ridge parameters k j, V IF j, associated angles θ j, condition numbers κ = λ max /λ min and βλ j 2. X X <> X <,2> X <,2,3> λ j k j V IF θ κ βλ j

18 Table 3. With precision π j = 0.05, raised parameters λ j, ridge parameters k j, V IF j, associated angles θ j, condition numbers κ = λ max /λ min and βλ j 2. X X <> X <,2> X <,2,3> λ j k j V IF θ κ βλ j Simulation Example To determine if collinearity is present in a design matrix X, a common criterion is max{v IF j : i j p} > 0.0. As shown in Table 2, for X <,2,3> with precision π j = 0.005, max{v IF j } = 45.6 so the precision needs to be increased. From Table 3, for X <,2,3> with precision π j = 0.05, max{v IF j } = and collinearity has been eliminated in the design matrix X <,2,3> ; and the matrix X <,2,3> is the recommended design matrix. We return to the simulated data example considered in Section 3. This example is from Rawlings 998, and it has been studied in Garcia et al. 20. The example will demonstrate how the raise estimator βλ j will reduce collinearity and can also improve the Mean Squared Error βλ j β 2, a measure of the squared distance between the raise estimators and the true parameters. From Eq. 0 the centered and scaled model has true coeffi cients β = [5.38, 2.440, 2.683]. A measure of how the original design has been perturbed is the Mean Absolute Deviation X X <,...,j> MAD = nk i,l Xi, l X <,...,j>i, l. An important feature with the raise procedure is that the new design matrix can be explicitly computed; and that the perturbations used can be controlled column by column. This is not possible with the ridge procedure which perturbs X X. Table 4 reports the results using precision π j = As successive columns are raised the V IF s decrease, the corresponding angles θ increase, and X X <,2,3> MAD = the required precision. The matrix X <,2> satisfies the criterion max{v IF j } = < 0 and unknown to the user shows a decrease in Mean Squared Error from down to Thus X <,2> is the recommended design with reduced collinearity with the measure of perturbation in the design given by X X <,2> MAD = Note that column 3 8

19 of both X and X <,2> are the same since columns that need not be perturbed can be left as is, a feature not possible with the ridge or surrogate procedures. Table 5 reports the results using precision π j = As in Table 4, as successive columns are raised, the V IF s decrease, the corresponding angles θ increase, and X X <,2,3> MAD = the required precision. The matrix X <> satisfies the criterion max{v IF j } = 7.76 < 0 and shows a decrease in Mean Squared Error from down to For the raised matrix X <>, the Mean Absolute Deviation X X <> MAD = with columns 2 and 3 the same for both X and X <>. The raise procedure is able to perturb some or all columns and with varying precisions a feature not available with the ridge or surrogate procedures. The recommended new design is either X <,2> from Table 4 or X <> from Table 5 both having identical MADs. In summary, the raise procedure has identified candidates for the perturbed design matrix which have eliminated collinearity and are nearly identical to the original design matrix. 9

20 Table 4. With precision π j = 0.025, raised parameters λ j, ridge parameters k j, V IF j, associated angles θ j, condition numbers κ = λ max /λ min, βλ j 2, βλ j, βλ j β 2, and Mean Absolute Deviation X X <,...,j> MAD X X <> X <,2> X <,2,3> λ j k j V IF θ κ βλ j βλ j βλ j β M AD Table 5. With precision π j = 0.05, raised parameters λ j, ridge parameters k j, V IF j, associated angles θ j, condition numbers κ = λ max /λ min, βλ j 2, βλ j, βλ j β 2, and Mean Absolute Deviation X X <,...,j> MAD X X <> X <,2> X <,2,3> λ j k j V IF θ κ βλ j βλ j βλ j β M AD

21 9 Conclusions The ridge procedure perturbs the moment matrix X X X X + λi p but does not allow the user to compute the perturbed design so the changes to X will be unknown to the user. The surrogate procedure has the advantage of allowing the user to explicitly compute the surrogate design X S in terms of the singular values of X. Thus i,j Xi, j X Si, j can the Mean Absolute Deviation X X S MAD = nk be computed showing the average change in the design X X S. However, the surrogate procedure does not allow for varying perturbations for the varying explanatory variables. The surrogate procedure is based on perturbating the singular values of X and thus it is not intuitive nor geometric. On the other hand, the raise procedure it both intuitive and geometric. It is important to note that the raise procedure does yield the explicit new design with X X, and that the mean absolute perturbations given in Eq. 6 are permitted to be different for each explanatory variable. Thus the user can set the mean absolute deviation to be smaller for variables which are known to be accurate and allow larger deviations for variables which are known to be less accurate. The raised estimator βλ = X X X y obtained from the successive raising process in Section 4 improves collinearity. The raised regression retains the value of the coeffi cient of determination, contrary to what happens with the ridge and surrogate estimators. Thus, when the collinearity is mitigated with the ridge or surrogate estimator, the model can become not globally significant while with the successive raising estimator the model will remain globally significant, if initially it was, since the coeffi cient of determination is invariant. The estimator obtained from the successive raising process is given by βλ = X X + K X y with K a diagonal matrix. If we want K =ki p to be a constant diagonal matrix then this can be achieved with the proper choice of λ = λ,..., λ p using Eq. 4. To conclude, if the user wants a better conditioned matrix than X X, the proposed methodology offers a procedure to improve the collinearity in the spirit of Levenberg 944 with a perturbation matrix similar to the perturbation matrix used in ridge regression. Unlike ridge regression, the user can visualize the perturbations of the underlying model and easily control the amount of perturbations to the original data retaining a specified precision in the data. Unlike ridge regression where the choice of the ridge parameter remains an open problem, our methodology offers a concrete procedure for determining the perturbation parameter for the raise estimators. 0 Appendix A In a full rank model {Y = Xβ + ε} having zero mean, uncorrelated, and homoscedastic errors, the Ordinary Least Squares OLS estimator β solves the p equations {X Xβ = X y}. These solutions are unbiased with dispersion ma- 2

22 trix V β = σ 2 X X. Near dependency among the columns of X, as ill conditioning, often causes the OLS solutions to have inflated size β 2, and of questionable signs, and very sensitive to small changes in X Belsley, 986. Among standard remedies are the ridge system {X X + ki p β = X y; k 0} Hoerl and Kennard, 970 with solutions { β R k; k 0}. The OLS solution at k = 0 is known to be unstable with a slight movement away from k = 0 giving completely different estimates of the coeffi cients. McDonald 2009, 200 showed that the square of the correlation coeffi cient R 2 k of the observed values y and the predicted values ŷ R k = X β R k for the ridge regression estimators β R k is a monotone decreasing function of the ridge parameter k. We show the corresponding result for the surrogate estimators, Jensen and Ramirez 2008, 200b. The singular decomposition X = P D ξ Q, with P P = I p, together with θ = Q β as an orthogonal reparametrization, give y = Xβ + ε y = P D ξ Q β + ε U = P y = D ξ θ + P ε as a canonical and equivalent model on R p. Gauss Markov assumptions regarding y = Xβ + ε stipulate that Eε = 0 R n and V ε = σ 2 I n. From these assumptions, it follows that EP ε = 0 R p and V P ε = σ 2 P I n P = σ 2 I p, so that EU = D ξ θ and V U = σ 2 I p. Specifically, the singular decomposition X = P D ξ Q, and the surrogate X k = P Diagξ 2 + k 2,..., ξ 2 p + k 2 Q with singular values {ξ 2 ξ ξ 2 p > 0}, give X kx k = X X + ki p with {X kx k β = X y} for the ridge estimators. For the surrogate model, {y = X k β + ε} is taken as an approximation, or surrogate, for the ill conditioned model {y = Xβ + ε} itself with {X kx k β = X ky} for the ridge estimators. The canonical estimators { θ, θ R, θ S } in terms of {k, ξ i } are given by Estimator Definition θ D ξ U i θr D ξ i U ξ 2 i +k θs D U ξ 2 i +k Following McDonald 2009, 200, we assumed the model has been standardized with y = 0 and y y = and with X X having correlation form; in particular, X = 0 with ŷ R = X β R = 0 = X β S = ŷ S. It follows that the squared correlation coeffi cient RS 2 k of the observed values y and the 22

23 predicted values ŷ S k = X β S in terms of the canonical variables is [y y ŷ S k ŷ S k ] 2 RSk 2 = [ ] [y y y y] ŷ S k ŷ S k ŷ S k ŷ S k 2 = y ŷ S k y 2 X β U D ŷ S kŷ Sk. = S β = SX X β S R 2 Rk = ξ i ξ 2 i +k 2 U U D ξ 2 i ξ 2 i +k U with RS 2 0 = U U 2 /U U =. For ridge estimators, McDonald 200 gives the parallel result for ridge estimators U ξ 2 2 i D ξ 2 i +k U with drr 2 k/dk < 0 for k > 0. For the surrogate estimators, with dr 2 S k dk = 2G 2 H J G = p i= u2 i J = p i= u2 i ξ i ξ 2 i +k ξ 2 i ξ 2 i +k U D ξ 4 i ξ 2 i +k2 U G2 K J 2 We show that HJ + GK < 0 for k > 0 : p p HJ + GK = H i i= = H = p i= u2 i j= K = p i= u2 i J j + with H i J i + G i K i = 0. For i j, we consider, G HJ + GK J 2 p i= ξ i ξ 2 i +k3/2 ξ 2 i ξ 2 i +k2 G i p j= K j, H i J j H j J i + G i K j + G j K i = u 2 ξ i i ξ 2 u 2 ξ 2 j i + k 3/2 j u 2 ξ j j ξ j + k ξ 2 j + k 3/2 u 2 i ξ i ξ 2 j + u 2 ξ 2 j i + k ξ 2 j + k 2 = u2 i u2 j ξ iξ j ξ 2 i + k ξ 2 j + k ξ 2 i + k 5/2 ξ 2 j + k 5/2 = u2 i u2 j ξ iξ j ξ 2 i + k ξ 2 j + k ξ 2 i + k 5/2 ξ 2 j + k 5/2 + u 2 j ξ j ξ 2 j + k u 2 i u 2 i ξ 2 i ξ i + k ξ 2 i ξ i + k 2 [ ξ 2 i ξ 2 ] j ξ j ξ 2 i + k ξ i ξ 2 j + k [ ξ 2 i ξ 2 ] j ξ 2 i ξ 2 j + kξ 2 j ξ 2 i ξ 2 j + kξ 2 i, 23

24 with the term in the brackets negative as the two terms in parentheses will have opposite signs. Appendix B With p = 2, to find the minimum MSE risk for the rotated estimators required solving for the roots of the quartic polynomial p 4 t = a 0 t 4 +a t 3 +a 2 t 2 +a 3 t+a with a 0 = ρ 2, a = 2 ρ 2, a 2 = ρ ρ 2 σ + δ, a 3 = 2ρ 2 ρ 2, a 4 = ρ 2 ρ 2. Yang 999 has given the root classification for quartic polynomials with general coeffi cients. Following his notation, we compute D 4 = 256a 3 0a a 2 0a a 2 0a 3 a 2 4a 27a 4 a 2 4 6a 0 a 2 a 4 a a 2 2a 2 3a 2 4a 0 a 3 2a a 2 a 4 a 3 a 3 +44a 0 a 2 a 2 4a 2 80a 0 a 2 2a 4 a a 3 + 8a 0 a 2 a 3 3a 4a 3 2a 4 a 2 4a 3 a a 0 a 4 2a 4 28a 2 0a 2 2a a 2 0a 2 a 4 a 2 3. A quartic polynomial has two distinct roots when D 4 < 0. For p 4 t, this follows from factoring D 4 with A, B, C, D, E, F, G, H, I > 0 as D 4 = AB C 4σ 2 D 4σ 4 E 6σ 6 F 6σ 8 G 64σ 0 H σ 2 I where A = 64 ρ2 2 ρ 2 > 0 δ 6 B = δ 2 δ 2 ρ 2 + σ 2 2 > 0 C = 27δ 2 + ρ 2 2 ρ 4 + ρ 4 > 0 D = 54δ 0 ρ 2 + ρ 4 ρ 3 + ρ 3 > 0 E = 9δ ρ 2 + 9ρ 4 ρ 2 + ρ 2 > 0 F = 68δ 6 ρ 3 + ρ 3 > 0 G = 57δ 4 ρ 2 + ρ 2 > 0 H = 6δ 2 ρ + ρ > 0 I = 64 > 0. 2 References. Belsley, D.A Centering, the constant, first differencing, and assessing conditioning. In Model Reliability E. Kuh and D.A. Belsley, D.A. eds.. Cambridge: MIT Press. 2. Berk, K Tolerance and condition in regression computations. J. Amer. Stat. Assoc. 72:

25 3. Davidov, O Constrained estimation and the theorem of Kuhn- Tucker. J. Appl. Math. Decis. Sci. Article ID 92970: García, J., Andújar, A., Soto, J Acerca de la detección de la colinealidad en los modelos de regresión lineal. XIII Reunión Asepelt- España. Universidad de Burgos. 5. García, C.B., García, J. and Soto, J. 20. The raise method: An alternative procedure to estimate the parameters in presence of collinearity. Quality and Quantity 45: García, C.B., García, J., López Martin, M.M. and Salmeron, R Collinearity: Revisiting the variance inflation factor in ridge regression. Journal of Applied Statistics 42: Goldstein, M. and Smith, A. F. M Ridge-type estimators for regression analysis. J. of R. Stat. Soc. B 36: Hoerl, A.E Ridge Analysis. Chemical Engineering Progress, Symposium Series 60: Hoerl, A.E. and Kennard, R.W Ridge regression: biased estimation for nonorthogonal problems. Technometrics 2: Jensen, D.R. and Ramirez, D.E Anomalies in the foundations of ridge regression. Int. Stat. Rev. 76: Jensen, D.R. and Ramirez, D.E. 200a. Tracking MSE effi ciencies in ridge regression. Advances and Applications in Statistical Sciences : Jensen, D.R. and Ramirez, D.E. 200b. Surrogate models in ill-conditioned systems. Journal of Statistical Planning and Inference 40: Jensen, D.R. and Ramirez, D.E Revision: Variance inflation in regression. Adv. Decis. Sci. Article ID 67204: Kapat, P. and Goel, P.K Anomalies in the foundations of ridge Regression: Some Clarifications. Int. Stat. Rev. 78: Levenberg, K A method for the solution of certain non-linear problems in least-squares. Quart. of Appl. Math. 2: Marshal, A.W. and Olkin, I Inequalities : Theory of Majorization and Its Applications. New York: Academic Press. 7. McDonald, G.C Ridge regression. Wiley Interdisciplinary Reviews: Computational Statistics : McDonald, G.C Tracing ridge regression coeffi cients. Wiley Interdisciplinary Review: Computational Statistics 2:

26 9. Marquardt, D.W An algorithm for least-squares estimation for nonlinear parameters. J. Soc. Ind. Appl. Math. : Mayer, L.S. and Willke, T.A.973. On Biased Estimation in Linear Models. Technometrics 5: The Economic Report of the President Piegorsch, W. and Casella, G The early use of matrix diagonal increments in statistical problems. SIAM Review 3: Puntanen, S., Styan, G.P.H., and Isotalo, J. 20. Matrix Tricks for Linear Statistical Models. Berlin: Springer-Verlag. 24. Rawlings, J.O., Pentula, S.G., and Dickey, D.A Applied Regression Analysis: A Research Tool - 2 nd ed. New York: Springer. 25. Riley, J Solving Systems of linear equations with a positive definite, symmetric but possibly ill-conditioned matrix. Math Tables Aids Comput. 9: Seber, G.A.F Linear Regression Analysis. New York: John Wiley and Sons. 27. Stein, C.M. 960 Multiple regression, contribution to probability and Statistics. Chapter 37 in Essays in Honor of Harold Hotelling I. Olkin, S. Ghurye, W. Hoeffding, W. Madow and H. Mann eds. Stanford: Stanford University Press. 28. Stewart, G.W Collinearity and least squares regression. Stat. Sci. 2: Wichers, R The detection of multicollinearity: A comment. Review Econ. Stat. 57: Yang, L Recent advances on determining the number of real roots of parametric polynomials. J. Symbolic Computation 28:

Mitigating collinearity in linear regression models using ridge, surrogate and raised estimators

Mitigating collinearity in linear regression models using ridge, surrogate and raised estimators STATISTICS RESEARCH ARTICLE Mitigating collinearity in linear regression models using ridge, surrogate and raised estimators Diarmuid O Driscoll 1 and Donald E. Ramirez 2 * Received: 21 October 2015 Accepted:

More information

Revisting Some Design Criteria

Revisting Some Design Criteria Revisting Some Design Criteria Diarmuid O Driscoll diarmuid.odriscoll@mic.ul.ie Head, Department of Mathematics and Computer Studies Mary Immaculate College, Ireland Donald E. Ramirez der@virginia.edu

More information

Response surface designs using the generalized variance inflation factors

Response surface designs using the generalized variance inflation factors STATISTICS RESEARCH ARTICLE Response surface designs using the generalized variance inflation factors Diarmuid O Driscoll and Donald E Ramirez 2 * Received: 22 December 204 Accepted: 5 May 205 Published:

More information

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan. Vol.15, No.1, June 2004, pp. 97-106 ISSN 1021-1012 APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA G. R. Pasha 1 and

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Multicollinearity and A Ridge Parameter Estimation Approach

Multicollinearity and A Ridge Parameter Estimation Approach Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com

More information

Ridge Regression Revisited

Ridge Regression Revisited Ridge Regression Revisited Paul M.C. de Boer Christian M. Hafner Econometric Institute Report EI 2005-29 In general ridge (GR) regression p ridge parameters have to be determined, whereas simple ridge

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 3-24-2015 Comparison of Some Improved Estimators for Linear Regression Model under

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model

Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model International Journal of Statistics and Applications 04, 4(): 4-33 DOI: 0.593/j.statistics.04040.07 Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model

More information

ISyE 691 Data mining and analytics
