Indian Statistical Institute


Introductory Computer Programming

Robust Regression Methods with High Breakdown Point

Author: Roll No: MD1701

February 24, 2018

Contents

1 Introduction
2 Criteria for evaluating regression estimators
  2.1 Breakdown point
  2.2 The Mean Square Error
  2.3 The Bounded Influence
3 Robust Regression methods
  3.1 Least Trimmed Square estimator
  3.2 M estimator
  3.3 S estimator
4 Algorithms
  4.1 M estimation
  4.2 S estimation
5 Simulation Study
6 Simulation Results
  6.1 Mean of the estimates of coefficients
  6.2 Mean Square Error
  6.3 Absolute Bias
  6.4 Mean Square Error (n = 100)
  6.5 Absolute Bias (n = 100)
7 Comparison of Run times
  7.1 Least Absolute Deviation
  7.2 Huber M estimate
  7.3 Bisquare M estimate
8 Conclusions

1 Introduction

Regression analysis is used to study how a dependent variable is linearly related to a set of explanatory variables. The linear regression model is given by

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_{p-1} x_{i,p-1} + \epsilon_i, \qquad i = 1, 2, \dots, n$$

or, in matrix notation,

$$Y = X\beta + \epsilon,$$

where $Y$ denotes the $n \times 1$ column vector containing the values of the dependent variable, $X$ is an $n \times p$ matrix whose first column consists of 1s and whose $i$-th column (for $i \ge 2$) contains the $(i-1)$-th explanatory variable, $\beta$ is the unknown $p \times 1$ vector of parameters, and $\epsilon$ is the vector of errors.

Regression analysis aims to estimate the unknown vector $\beta$. One of the most commonly used methods is to choose as $\hat\beta$ the value that minimises $e'e$, where $e$ is the vector of residuals, i.e.,

$$\hat\beta = \arg\min_{\beta} \, (Y - X\beta)'(Y - X\beta).$$

To use this estimator, the assumptions on which it is based should be met, the most important being that the errors are normally distributed. A common problem, however, is the presence of outliers in the data, which can be attributed to different sources such as faulty measurement, an incorrect reading, or failure of an instrument. Outliers can in general be classified into two types: outliers in the x direction and outliers in the y direction. Outliers in the x direction (or high leverage points) are observations whose $x_i$ is an outlier in the $p$-dimensional space occupied by the rows of the regression matrix. Outliers in the y direction are observations with a large difference between the $y$ value and the value predicted by the model. In this study we compare various robust regression methods on the basis of their MSE and absolute bias in the case when the data contain outliers in the y direction.
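As a concrete illustration of the estimator above, here is a minimal R sketch computing the least squares estimate directly from the normal equations; the simulated data and variable names are ours for illustration, not taken from the report's code.

```r
## Minimal sketch: least squares via the normal equations (illustrative data).
set.seed(1)
n <- 30
X <- cbind(1, matrix(rnorm(n * 4), n, 4))   # first column of 1s, p = 5
beta <- rep(1, 5)                           # true coefficients all equal to 1
y <- X %*% beta + rnorm(n)

beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # minimises (y - Xb)'(y - Xb)
drop(beta_hat)                              # same as coef(lm(y ~ X[, -1]))
```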

2 Criteria for evaluating regression estimators

2.1 Breakdown point

The breakdown point is a very popular quantitative characteristic of robustness. Consider a random sample $x^0 = (x_1, x_2, \dots, x_n)$ and the corresponding value $T_n(x^0)$ of an estimator of a functional $T$ based on the sample $x^0$. Now suppose that in this original sample we may replace any $m$ of the components by arbitrary values (even infinity is allowed); denote the new sample by $x^m$ and let $T_n(x^m)$ denote the value of the estimator for this new sample. The breakdown point of the estimator $T_n$ for the sample $x^0$ is the number

$$\epsilon^*_n(T_n, x^0) = \frac{m^*(x^0)}{n},$$

where $m^*(x^0)$ is the smallest integer $m$ for which

$$\sup_{x^m} \left\| T_n(x^m) - T_n(x^0) \right\| = \infty.$$

2.2 The Mean Square Error

The mean square error is another performance criterion used to evaluate an estimator. It is defined as

$$\mathrm{MSE} = (\hat\beta_R - \beta)'(\hat\beta_R - \beta),$$

where $\hat\beta_R$ is the vector of robust parameter estimates and $\beta$ is the vector of the true model coefficients.

2.3 The Bounded Influence

Bounded influence in the X space is the estimator's resistance to being pulled toward extreme observations in the X space. Whether an estimator has bounded influence is determined by studying its influence function. Define the influence function $\mathrm{IF}_{T,F}(\cdot)$ of the estimator $T$, at the underlying probability distribution $F$, by

$$\mathrm{IF}_{T,F}(x_0) = \lim_{\epsilon \to 0} \frac{T(\tilde F) - T(F)}{\epsilon} = \left[ \frac{\partial\, T(\tilde F)}{\partial \epsilon} \right]_{\epsilon = 0}.$$
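The breakdown point can be made concrete with a small R experiment (ours, for illustration): replacing a single observation by an arbitrarily large value already carries the sample mean away, while the median resists contamination of almost half the sample. The mse() helper simply mirrors the MSE definition above.

```r
## Empirical breakdown of the sample mean vs. the median (illustrative).
set.seed(2)
x0 <- rnorm(20)
x_bad <- x0
x_bad[1] <- 1e12                # replace m = 1 component by an arbitrary value
c(mean(x0), mean(x_bad))        # the mean explodes: breakdown point 1/n
c(median(x0), median(x_bad))    # the median barely moves: ~50% breakdown

mse <- function(beta_hat, beta_true)   # the MSE criterion of Section 2.2
  sum((beta_hat - beta_true)^2)
```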

Here $T(F)$ denotes the estimator of interest expressed as a functional. The functional $T(\tilde F)$ represents the estimator of interest under the altered CDF $\tilde F = (1-\epsilon)F + \epsilon\,\delta_{x_0}$, which places mass $\epsilon$ at the point $x_0$. Thus the influence function is a first-order derivative of the estimator viewed as a functional, and measures the influence of the point $x_0$ on the estimator $T$.

3 Robust Regression methods

3.1 Least Trimmed Square estimator

The least trimmed squares estimator is defined as

$$\hat\beta_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} (e^2)_{(i)},$$

where $(e^2)_{(j)}$ denotes the $j$-th smallest squared residual. The best robustness properties are achieved when $h = n/2$, in which case the breakdown point equals 50%, the highest possible breakdown point.

3.2 M estimator

The M estimator of $\beta$ is defined as the solution $M_n$ of the minimisation

$$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho(y_i - x_i'\beta),$$

where $\rho$ is continuous. Here the residuals should also be scaled, but we have merged the scaling factor into the tuning constant of the $\rho$ function. A reasonable $\rho$ function should satisfy the following properties:

- Non-negativity: $\rho(e) \ge 0$ for all $e \in \mathbb{R}$.
- $\rho(0) = 0$.
- Symmetry: $\rho(e) = \rho(-e)$.
- Monotonicity: $\rho(e) \ge \rho(e')$ if $|e| > |e'|$.

Let $\psi$ denote the derivative of $\rho$, i.e. $\psi = \rho'$, define the weight function $w(e) = \psi(e)/e$, and set $w_i = w(e_i)$.
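For a quick LTS fit in practice one can use ltsReg() from the robustbase package (the same package used later in this report for run-time comparisons); a sketch with our own simulated data, where alpha = 0.5 corresponds to $h \approx n/2$ and hence the maximal ~50% breakdown point:

```r
## Illustrative LTS fit via robustbase::ltsReg (not the report's own code).
library(robustbase)
set.seed(3)
n <- 30
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 + rowSums(X) + rnorm(n)
y[1:6] <- y[1:6] + 15                            # plant some y-outliers
d <- data.frame(X, y)
fit_lts <- ltsReg(y ~ ., data = d, alpha = 0.5)  # alpha = 0.5: h ~ n/2
coef(fit_lts)
```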

Differentiating the objective above and setting the derivative equal to 0, we get a set of $p$ estimating equations; substituting $w_i = w(e_i)$ gives

$$\sum_{i=1}^{n} \psi(y_i - x_i'b)\, x_i = 0 \qquad (1)$$

$$\sum_{i=1}^{n} w_i (y_i - x_i'b)\, x_i = 0 \qquad (2)$$

Solving these equations is equivalent to solving a weighted least squares problem; however, the weights depend on the residuals themselves. We shall use an iterative algorithm (IRLS) to solve it, which is described later in detail.

The choice of the $\rho$ function determines the estimate of $\beta$. We will investigate the following $\rho$ functions:

1. Huber

$$\rho_H(e) = \begin{cases} e^2/2 & \text{for } |e| \le k \\ k|e| - k^2/2 & \text{otherwise} \end{cases} \qquad
w_H(e) = \begin{cases} 1 & \text{for } |e| \le k \\ k/|e| & \text{otherwise} \end{cases}$$

2. Bisquare

$$\rho_B(e) = \begin{cases} \dfrac{k^2}{6}\left\{1 - \left[1 - (e/k)^2\right]^3\right\} & \text{for } |e| \le k \\ k^2/6 & \text{otherwise} \end{cases} \qquad
w_B(e) = \begin{cases} \left[1 - (e/k)^2\right]^2 & \text{for } |e| \le k \\ 0 & \text{otherwise} \end{cases}$$

3. Least Squares

$$\rho_{LS}(e) = e^2, \qquad w_{LS}(e) = 1$$
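Written out in R, the $\rho$ and weight functions above are short one-liners (a sketch; the function names are ours, and $k$ is the tuning constant discussed in the note below):

```r
rho_huber <- function(e, k) ifelse(abs(e) <= k, e^2 / 2, k * abs(e) - k^2 / 2)
w_huber   <- function(e, k) ifelse(abs(e) <= k, 1, k / abs(e))

rho_bisq  <- function(e, k)
  ifelse(abs(e) <= k, (k^2 / 6) * (1 - (1 - (e / k)^2)^3), k^2 / 6)
w_bisq    <- function(e, k) ifelse(abs(e) <= k, (1 - (e / k)^2)^2, 0)

## Huber weights are bounded below by k/|e| and never reach 0; bisquare
## weights redescend to exactly 0, so gross outliers are ignored entirely.
curve(w_huber(x, 1.345), -6, 6, ylab = "w(e)", xlab = "e")
curve(w_bisq(x, 4.685), -6, 6, add = TRUE, lty = 2)
```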

4. Least Absolute Deviation

$$\rho_{LAD}(e) = |e|, \qquad w_{LAD}(e) = 1/|e|$$

Note that the least squares method is a special case of M estimation, with $\rho_{LS}(e) = e^2$.

NOTE: The value of $k$ for the Huber and bisquare estimators is called the tuning constant. Smaller values of $k$ produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed. The tuning constant is chosen to give reasonably high efficiency in the normal case. We have chosen $k = 1.345\,\hat\sigma$ for the Huber estimate and $k = 4.685\,\hat\sigma$ for the bisquare estimator, which provides an efficiency close to 95% in the case of normal errors.

3.3 S estimator

The goal of the S estimator is to have a simple high breakdown point estimator which shares the flexibility and nice asymptotic properties of the M estimator. The name S estimator was chosen because these estimators are based on an estimate of scale. The weakness of M estimation is its lack of consideration of the data distribution: it is not a function of the overall data, because only the median is used as the weighting value. The S estimator instead uses the residual standard deviation to overcome this weakness of the median.

Huber (1964) defined the M-scale estimate for $e = (e_1, \dots, e_n)$ as

$$s_M(e) = \inf\left\{ s > 0 : \frac{1}{n} \sum_{i=1}^{n} \rho\!\left(\frac{e_i}{s}\right) \le b \right\}.$$

To guarantee consistency when the data are normally distributed, the constant $b$ is usually chosen to be $E_\Phi(\rho(u))$, where $\Phi$ denotes the standard normal distribution. When $\rho$ is continuous, equality is achieved at $s_M(e)$.

The regression estimates associated with M-scales are the S estimators; in particular they satisfy

$$\hat\beta_n = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho\!\left(\frac{e_i(\beta)}{\hat s}\right), \qquad \text{where } \hat s = s_M(r(\hat\beta_n)).$$
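The M-scale $s_M(e)$ can be computed by one-dimensional root finding; below is a minimal R sketch (ours). We assume the bisquare $\rho$ with cut-off $c = 1.547$ and $b = 0.5\,\rho(\infty)$, the standard choice that gives consistency at the normal model together with a 50% breakdown point.

```r
rho_bisq <- function(u, c = 1.547)
  ifelse(abs(u) <= c, (c^2 / 6) * (1 - (1 - (u / c)^2)^3), c^2 / 6)

## Smallest s with mean(rho(e/s)) <= b; for continuous rho, equality holds.
m_scale <- function(e, c = 1.547, b = 0.5 * c^2 / 6) {
  f <- function(s) mean(rho_bisq(e / s, c)) - b
  uniroot(f, lower = 1e-8, upper = 10 * max(abs(e)))$root
}

set.seed(4)
m_scale(rnorm(1000))   # close to 1 for standard normal residuals
```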

4 Algorithms

4.1 M estimation

1. Obtain an initial estimate $\beta^{(0)}$, such as the least squares estimate.
2. At each iteration $t$, calculate the residuals $e_i^{(t-1)}$ and the associated weights $w_i^{(t-1)} = w[e_i^{(t-1)}]$.
3. Solve for the new weighted least squares estimate
$$b^{(t)} = \left[ X' W^{(t-1)} X \right]^{-1} X' W^{(t-1)} y,$$
where $W^{(t-1)} = \mathrm{diag}(w_i^{(t-1)})$.
4. Repeat steps 2 and 3 until the estimated coefficients converge.

This method is called the Iteratively Reweighted Least Squares (IRLS) method; a compact implementation sketch is given after this section.

4.2 S estimation

1. Obtain an initial estimate $\beta^{(0)}$ using the method of least squares.
2. Calculate the residuals $e_i = y_i - \hat y_i$.
3. Calculate the scale estimate
$$\hat\sigma = \begin{cases} \dfrac{\operatorname{median}_i |e_i - \operatorname{median}(e_i)|}{0.6745} & \text{iteration } = 1 \\[2mm] \sqrt{\dfrac{\sum_{i=1}^{n} w_i e_i^2}{nK}} & \text{iteration } > 1 \end{cases}$$
where $K$ is a consistency constant.
4. Calculate $u_i = e_i / \hat\sigma$.
5. Calculate the weights. If the iteration number is 1,
$$w_i = \begin{cases} \left[1 - \left(\dfrac{u_i}{c}\right)^2\right]^2 & |u_i| \le c \\ 0 & \text{otherwise} \end{cases}$$
with $c = 1.547$; if the iteration number is greater than 1,
$$w_i = \frac{\rho(u_i)}{u_i^2}.$$
6. Calculate $\hat\beta_S$ by the WLS method with weights $w_i$.
7. Repeat steps 2-6 to obtain a convergent value of $\hat\beta_S$.
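The following is a compact R sketch of the IRLS algorithm of Section 4.1 (our code). One detail is an assumption on our part: the residuals are standardised by the MAD at each iteration, a common choice, whereas the report merges the scale into the tuning constant. w_fun stands for any of the weight functions of Section 3.2 (e.g. w_huber above).

```r
## IRLS for M estimation: step numbers refer to Section 4.1.
irls <- function(X, y, w_fun, k, tol = 1e-8, max_iter = 100) {
  b <- solve(t(X) %*% X, t(X) %*% y)            # step 1: least squares start
  for (it in seq_len(max_iter)) {
    e <- drop(y - X %*% b)                      # step 2: residuals ...
    s <- median(abs(e - median(e))) / 0.6745    #   ... a MAD residual scale
    w <- w_fun(e, k * s)                        #   ... and weights
    b_new <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # step 3: WLS
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break                        # step 4: stop at convergence
  }
  drop(b)
}

## e.g. a Huber M fit with k = 1.345 * sigma_hat:
## irls(cbind(1, X), y, w_huber, 1.345)
```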

5 Simulation Study

A simulation study is conducted to compare different methods of estimation. These methods are:

1. The ordinary least squares estimator (OLS).
2. The least median squares estimator (LMS).
3. The least trimmed squares estimator (LTS).
4. The S estimator.
5. The least absolute deviation estimator (LAD).
6. The Huber M estimator (M_huber) with k = 1.345.
7. Tukey's M estimator (M_tukey) with k = 4.685.

The criteria used for comparison of the regression estimates are the Mean Square Error (MSE) and the Absolute Bias (AB). The data are generated according to the model

$$Y = 1 + X_1 + X_2 + X_3 + X_4 + e,$$

and hence the true values of the coefficients are all equal to 1. The data simulation is repeated 5000 times to obtain 5000 independent samples of X and Y of a given size n. This process is done for samples of size n = 30 and n = 100 (the least median squares estimator is only implemented for the n = 30 case). In order to cover the effects of various situations on the regression coefficients, five scenarios for the density of the error term have been used (a sketch of the data-generating step follows the list). These are as follows:

Scenario I: e ~ N(0,1), the standard normal distribution.
Scenario II: e ~ t distribution with 1 degree of freedom, i.e. the Cauchy distribution.
Scenario III: e ~ t distribution with 5 degrees of freedom.
Scenario IV: e ~ N(0,1) with 25% outliers in the y direction from an N(0,10) distribution.
Scenario V: e ~ N(0,1) with 40% outliers in the y direction from an N(0,10) distribution.
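A sketch of the data-generating step for one replication is below (ours; the exact contamination mechanics are assumptions, e.g. we read N(0,10) as variance 10 and draw the outlier indicator independently per observation):

```r
## One replication of the simulated data under a given error scenario.
gen_data <- function(n, scenario) {
  X <- matrix(rnorm(n * 4), n, 4)
  e <- switch(scenario,
    I   = rnorm(n),                          # standard normal
    II  = rt(n, df = 1),                     # Cauchy
    III = rt(n, df = 5),
    IV  = ifelse(runif(n) < 0.25,            # 25% y-outliers ...
                 rnorm(n, sd = sqrt(10)),    # ... N(0,10) read as variance 10
                 rnorm(n)),
    V   = ifelse(runif(n) < 0.40,            # 40% y-outliers
                 rnorm(n, sd = sqrt(10)),
                 rnorm(n)))
  list(X = X, y = 1 + rowSums(X) + e)
}
d <- gen_data(30, "IV")   # the study repeats this 5000 times per scenario
```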

6 Simulation Results

Simulation results for the case when n = 30.

6.1 Mean of the estimates of coefficients

[Tables: mean of the estimated coefficients $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$ under Scenarios I-V, for the LTS, OLS, M Bisquare, M Huber, LAD and S estimators. The numeric entries of these tables are not recoverable from this copy.]

6.2 Mean Square Error

[Tables: MSE under Scenarios I-V for the LTS, OLS, M Bisquare, M Huber, LAD and S estimators, n = 30. Numeric entries not recoverable.]

6.3 Absolute Bias

[Tables: absolute bias under Scenarios I-V for the same six estimators, n = 30. Numeric entries not recoverable.]

Case when n = 100.

6.4 Mean Square Error

[Tables: MSE under Scenarios I-V for the same six estimators, n = 100. Numeric entries not recoverable.]

6.5 Absolute Bias

[Tables: absolute bias under Scenarios I-V for the same six estimators, n = 100. Numeric entries not recoverable.]
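For concreteness, here is a sketch of how one cell of these tables is produced: the estimates over 5000 replications, then their mean, MSE and absolute bias. It is our code, not the report's; gen_data() is the sketch from Section 5, OLS stands in for any of the estimators, and whether AB is reported per coefficient is an assumption.

```r
## Mean, MSE and absolute bias of an estimator over 5000 replications.
n_rep <- 5000
beta_true <- rep(1, 5)
est <- t(replicate(n_rep, {
  d <- gen_data(30, "I")
  Xd <- cbind(1, d$X)
  drop(solve(t(Xd) %*% Xd, t(Xd) %*% d$y))   # OLS; swap in any estimator
}))

colMeans(est)                      # mean of the estimated coefficients
dev <- sweep(est, 2, beta_true)    # deviations from the true coefficients
mean(rowSums(dev^2))               # MSE as defined in Section 2.2
colMeans(abs(dev))                 # absolute bias, per coefficient
```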

7 Comparison of Run times

We carried out a study to compare the running times of two different implementations of the same methods. For the second implementation we used functions from the package robustbase.

7.1 Least Absolute Deviation

The function lmrob.lar() in the package uses the simplex method to solve the required minimisation, while we have used IRLS, which is an iterative algorithm.

[Figure: run time in seconds of the two LAD implementations, plotted against sample index.]
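A sketch of how such a timing curve can be produced is below (our code, not the report's). We assume lmrob.lar(x, y) can be called directly with a design matrix and response, as documented in robustbase; replacing that call with one's own IRLS-based LAD function gives the second curve.

```r
## Timing the simplex-based LAD of robustbase across sample sizes.
library(robustbase)
set.seed(5)
sizes <- seq(1000, 10000, by = 1000)
times <- sapply(sizes, function(n) {
  X <- cbind(1, matrix(rnorm(n * 4), n, 4))
  y <- drop(X %*% rep(1, 5) + rnorm(n))
  system.time(lmrob.lar(X, y))["elapsed"]
})
plot(sizes / 1000, times, type = "b",
     xlab = "Size of sample in thousands", ylab = "Time (seconds)")
```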

7.2 Huber M estimate

A comparison of the run times required by the two implementations was carried out. We used the function rlm() of the MASS package. From the graph it is clearly evident that rlm() performs significantly better than our m_huber() function.

[Figure: run time in seconds against sample size in thousands, for rlm() and m_huber().]
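For reference, one timing point with rlm() and the Huber psi looks roughly as follows (a sketch; the data are ours, and k = 1.345 is the tuning constant from Section 3.2, which is also psi.huber's default):

```r
## One timing point for MASS::rlm with the Huber psi.
library(MASS)
set.seed(6)
n <- 5000
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 + rowSums(X) + rnorm(n)
system.time(fit <- rlm(y ~ X, psi = psi.huber, k = 1.345))
coef(fit)
```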

7.3 Bisquare M estimate

The function rlm() also gives the bisquare M estimate upon changing the psi function in its arguments.

[Figure: run time in seconds against sample size in thousands, for the two bisquare M implementations.]
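The corresponding call only swaps the psi argument (a sketch with our own data; c = 4.685 is the tuning constant from Section 3.2 and psi.bisquare's default):

```r
## MASS::rlm with the redescending bisquare psi; rlm initialises from
## least squares by default, which a redescending psi relies on.
library(MASS)
set.seed(7)
n <- 5000
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 + rowSums(X) + rnorm(n)
fit_bisq <- rlm(y ~ X, psi = psi.bisquare, c = 4.685)
coef(fit_bisq)
```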

8 Conclusions

Scenario I: The OLS estimate obtains the best performance. The M estimates and the S estimates perform better than the high breakdown point estimate (i.e., LTS). As expected, the MSE is lower when the sample size is 100 than when it is 30; the same holds for the absolute bias.

Scenario II: The ordinary least squares method performs worst, with biased estimates and very high MSE. Even the LAD and Huber M estimates perform very poorly. The MSE of the S estimate is better than that of the others, but still very poor. The LTS estimate has the best performance. In terms of absolute bias, the LTS, bisquare M and S estimates have the lowest bias.

Scenario III: In this case the Huber M and bisquare M estimates have the best performance in terms of both MSE and absolute bias. OLS performs better than the LTS and LAD estimates.

Scenario IV: In terms of MSE, LAD and OLS have the worst performance, both for n = 30 and for n = 100. On the other hand, the S estimates have the best performance, followed by the LTS and M estimates. In terms of absolute bias, the LTS and S estimates have the lowest bias.

Scenario V: Again the worst performance is by OLS and then LAD, whereas the best performance is by the S and LTS estimates, followed by the M estimates. In terms of absolute bias, the LTS and S estimates have the lowest bias.

Final Verdict: Except for the case when the errors have a standard normal distribution, OLS performs the worst, suggesting the need for alternative, more robust methods. LTS and S perform the best in all of the other four scenarios. The S estimates perform better than the M estimates because the M estimates do not account for the scaling factor, which the S estimates do.
