Chapter 4

High Breakdown Regression Procedures

Introduction

While M and BI estimators provide an improvement over OLS if the data has an outlier or high influence point, they cannot provide protection against data with large amounts of contamination. As seen in Example 2.3 (Section 2.5.3), clusters of bad data may overwhelm these methods, which leads to poor estimators. Thus, the discussion next turns to methods with high breakdown points (see also Markatou and He (1994)). They have the ability to sift through as much as 50% of the data being contaminated and still provide decent estimators for explaining the general trend.

4.1 Least Median of Squares (LMS)

All of the methods mentioned above have an objective function that involves a Σ operator. Changing the function inside this summation operator, as in M or BI regression, had limited success. Rousseeuw (1984) suggested replacing the summation by the median, paralleling the fact that the sample median is more robust than the sample mean in location estimation. This leads to Least Median of Squares regression (LMS), with the objective function being

    min_b med_i (y_i − x_i′b)².

The resulting estimator has a 50% breakdown point, but converges at the slow rate of O(n^(−1/3)), making its asymptotic efficiency against normal errors zero (Rousseeuw and Leroy, 1987, page 179). In the location model there exists a closed-form algorithm to calculate LMS. Originally, no exact algorithm was available to calculate LMS in the regression setting (excluding, of course, degenerate cases). Stromberg (1993) then provided an exact LMS algorithm, but
it requires that all C(n, p + 1) subsets be investigated. This being a daunting task, the approach generally taken is to approximate LMS by one of the handful of available subsampling algorithms.

Consider a subsample of size p, the number of unknown coefficients. This is referred to as an elemental set because, assuming that the reduced X matrix, i.e. the matrix formed by using only the p rows of X that correspond to the p subsampled observations, has full rank, an exact fit can be obtained from these points and the objective function can then be evaluated. One subsampling algorithm (Rousseeuw and Leroy, 1987, page 197) simply draws an elemental set and evaluates the objective function. This is then repeated a large number of times, and the final LMS estimate corresponds to the estimate that had the smallest observed objective function. This estimate retains the high breakdown and convergence properties of the theoretical LMS (Rousseeuw and Bassett, 1991). However, even if an exhaustive search of all C(n, p) elemental sets is performed, the resulting estimator is generally not the true LMS estimator, but rather only an estimate of it (just as in the case of MVE estimation via this method). Furthermore, the calculations involved to obtain the estimated regression coefficients are of order O(n^(p+1)) (Rousseeuw, 1993).

Instead, the number of elemental subsets selected can be based on a probabilistic argument of how likely it is to obtain an elemental set that contains only good observations. The drawback is that since this probability is less than 1, the algorithm may break down in its calculation of a high breakdown estimator. This defeats the purpose of the high breakdown philosophy. In any event, the number of elemental subsets needed is roughly 3·2^p and 4.6·2^p for 95% and 99% probabilities, respectively, of obtaining at least one purely good elemental set. To obtain the regression coefficients, the order of calculation would then be these numbers of subsets multiplied by n, respectively (Rousseeuw, 1993).
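The repeated-subsampling scheme just described is easy to sketch. The following is a minimal illustration rather than the exact code used for this chapter's case studies; the function name is hypothetical, and the design matrix X is assumed to already contain its intercept column.

```python
import numpy as np

def lms_elemental(X, y, n_subsets=500, seed=None):
    """Approximate LMS by randomly drawn elemental sets of size p.

    Each elemental set yields an exact fit; the fit with the smallest
    median squared residual over the full data set is retained."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_obj, best_b = np.inf, None
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        Xs = X[idx]
        # the reduced X matrix must have full rank for an exact fit
        if np.linalg.matrix_rank(Xs) < p:
            continue
        b = np.linalg.solve(Xs, y[idx])        # exact fit through the p points
        obj = np.median((y - X @ b) ** 2)      # LMS objective function
        if obj < best_obj:
            best_obj, best_b = obj, b
    return best_b, best_obj
```

On data that fall exactly on a line, every elemental fit recovers that line and the returned objective is (numerically) zero.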
Of course, while the probabilistic argument provides for the analysis of a strictly good elemental subset with high probability, there is no guarantee that any of those randomly obtained
good elemental subsets reflects the general trend by itself. Therefore, the resulting estimator can be potentially very misleading.

A second algorithm was introduced by Rousseeuw (1993) to reduce the order of computations required and eliminate the problem of having an algorithm breakdown due to the probability of obtaining a purely good elemental set being less than one. Basically, the data is randomly assigned into blocks of size 2p − 2. Any extra points are disbursed as evenly as possible. Then, within each block, all possible subsets of size p are evaluated with the objective function. Again, the final LMS estimate corresponds to the estimate that had the smallest observed objective function.

[Scatterplot omitted: y (15–35) versus x (10–19), eight points labeled 1–4 by block.]

Figure 4.1: Possible configuration of the four blocks used in the second LMS subsampling algorithm, given eight observations on a scatterplot.
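The block-based scheme can be sketched as follows. This is a simplified illustration (hypothetical function name, blocks formed by simple chunking of a random permutation rather than perfectly even dispersal of the extra points), again assuming X carries its intercept column.

```python
import numpy as np
from itertools import combinations

def lms_blocks(X, y, seed=None):
    """Second LMS subsampling algorithm: partition the data at random into
    blocks of size 2p - 2, evaluate every size-p subset within each block,
    and keep the exact fit with the smallest median squared residual."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    size = max(2 * p - 2, p)                   # block size (at least p)
    perm = rng.permutation(n)
    blocks = [perm[i:i + size] for i in range(0, n, size)]
    best_obj, best_b = np.inf, None
    for block in blocks:
        for idx in combinations(block, p):     # all size-p subsets in a block
            Xs = X[list(idx)]
            if np.linalg.matrix_rank(Xs) < p:
                continue
            b = np.linalg.solve(Xs, y[list(idx)])
            obj = np.median((y - X @ b) ** 2)
            if obj < best_obj:
                best_obj, best_b = obj, b
    return best_b, best_obj
```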
It is guaranteed that at least one elemental set will consist only of good observations. The advantage of this algorithm is theoretical. It achieves a high breakdown point, but the estimates obtained could be very misleading. A major problem is that there is no guarantee that any block will provide enough information concerning the general trend. To illustrate this idea, suppose that a simple linear regression (SLR) model is posed for a dataset of eight good observations. Here, the block size is 2(2) − 2 = 2, and there are 8/2 = 4 blocks. Next, suppose that the data are shown in Figure 4.1, and are labeled as to which block each observation was randomly assigned. Even though the slope is obviously negative, all four blocks result in positive slope estimates. It seems that the algorithm relies on an asymptotic combinatoric argument, in that the probability of obtaining this type of block structure goes to zero as n gets large. If there are replications at the regressor locations, such as in a designed experiment, this scenario may not be all that uncommon.

The first algorithm will be utilized in the case studies to come. The number of randomly drawn elemental sets is generally taken to be 500 or 1,000 in practice. In some case studies to follow, as many as 5,000 randomly drawn elemental subsets were used in an attempt to avoid an algorithm breakdown.

Additionally, the LMS estimator that is obtained from a random subsampling algorithm can be modestly improved by adjusting the intercept. By viewing the residuals as a univariate sample, the exact LMS algorithm for location estimation can be performed. The updated intercept is found as the location LMS of the residuals from the current LMS regression estimator. This procedure is guaranteed to reduce (or not change) the LMS objective function, and it also eliminates the condition of always having at least p residuals being zero (because of the exact fit). This intercept adjustment procedure is incorporated into all LMS calculations.

4.2 Least Trimmed Squares (LTS)

Recall that one drawback to LMS is that it possesses a very slow convergence rate of O(n^(−1/3)).
Rousseeuw (1984) introduced Least Trimmed Squares (LTS) to remedy this situation.
It also possesses a 50% breakdown point, but converges at the faster rate of O(n^(−1/2)). Here, the objective function is

    min_b Σ_{i=1}^{h} r²_[i],

which represents the sum of the h smallest squared residuals. As mentioned before (in Chapter 3), h is generally taken to be [(n + p + 1)/2]. The problem is that no closed-form algorithm exists, except for the location model, to construct the true LTS estimator. Instead, the algorithms stated previously for approximating LMS can also be used to approximate LTS simply by changing the objective function being evaluated for each elemental set. The intercept adjustment step is also available for the LTS estimator. Simply replace the current intercept with the location LTS estimate of the residuals from the current LTS regression estimator. This update is performed in all LTS calculations. Agullo (2001) offers more discussion regarding LTS algorithms.

Both LMS and LTS are inefficient estimators. In fact, for the location model, LMS has a 1.39% asymptotic efficiency versus the sample mean under normal errors. LTS is only slightly better, having a 7.14% asymptotic efficiency versus the sample mean under normal errors. Therefore, several methods employ LTS, which converges more rapidly than LMS, as an initial estimator and perform some improvement calculation. These are referred to as one-step estimators. The idea is to utilize the high breakdown properties of these initial estimators, but improve on their lack of efficiency. However, this introduces another problem. The improvement step generally will require weights for the observations. Thus, robust weights are required to retain the desired high breakdown properties. This leads back to the material of Chapter 3. As mentioned in Section 3.3, a popular choice among the robust weighting schemes is the MVE-based Mallows weight. The remainder of Chapter 4 will introduce two competing one-step estimators and provide an overall high breakdown regression analysis of the stackloss case study.
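For concreteness, the LTS objective can be evaluated as below; a minimal sketch (hypothetical function name) using the trimming count h = [(n + p + 1)/2] from above.

```python
import numpy as np

def lts_objective(b, X, y):
    """Sum of the h smallest squared residuals, with h = [(n + p + 1)/2].

    Substituting this for the median-based objective turns the LMS
    subsampling algorithms into LTS subsampling algorithms."""
    n, p = X.shape
    h = (n + p + 1) // 2
    r2 = np.sort((y - X @ b) ** 2)
    return float(r2[:h].sum())
```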
4.3 One-Step Generalized M Estimators

The previous high breakdown estimators, LMS and LTS, attack the regression problem by changing the objective function to an expression that leads to improved breakdown point capabilities. Because of poor efficiency and numerical sensitivity due to both the random subsampling process as well as to small internal movements of the data (these topics are discussed later in Chapter 8), other high breakdown regression techniques have been developed. One remedy for the poor efficiency is to incorporate a high breakdown initial estimator with the generalized M estimator to obtain the one-step generalized M estimator. The objective function has the same form as the bounded influence estimator of Section 2.1, but with robust leverage weights. The solution is no longer found via the IRLS procedure, but instead through a one-step Taylor series expansion of the objective function. This estimator can be written as the sum of two terms: the initial estimator and the one-step improvement calculation.

LTS has become the initial estimator of choice for many one-step improvement algorithms. It has a high breakdown point and converges more rapidly than LMS. The GM-estimators inherit the high breakdown point of LTS, but improve on the efficiency aspect. In the following discussion on one-step GM estimators, it is understood that

(1) the initial estimator, β̂, is LTS,

(2) the residuals from the initial fit are denoted by r_i(β̂) = y_i − x_i′β̂,

(3) a diagonal matrix W = diag(w_i(x_i)) of robust Mallows weights, with

    w_i = min{ 1, χ²_(0.95, p−1) / RD_i² },

is calculated using MVE estimates (based solely on the regressor space), and

(4) the robust scale estimate, σ̂, is based on the LMS estimate (Rousseeuw and Leroy, 1987, page 202), and is found as

    σ̂ = 1.4826 (1 + 5/(n − p)) √( med_i r_i²(β̂) ).
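Items (3) and (4) can be sketched as below, assuming the MVE-based robust distances RD_i and the χ² cutoff are supplied by the caller; both function names are hypothetical.

```python
import numpy as np

def lms_scale(resid, p):
    """Robust scale from the LMS residuals (Rousseeuw and Leroy, 1987):
    sigma-hat = 1.4826 * (1 + 5/(n - p)) * sqrt(median r_i^2)."""
    r = np.asarray(resid, dtype=float)
    n = len(r)
    return 1.4826 * (1.0 + 5.0 / (n - p)) * np.sqrt(np.median(r ** 2))

def mallows_weights(rd, cutoff):
    """Robust Mallows weights w_i = min(1, cutoff / RD_i^2), where RD_i is
    the MVE-based robust distance and cutoff is the chi-square(0.95, p - 1)
    quantile (e.g. 5.991 for p - 1 = 2 degrees of freedom)."""
    rd = np.asarray(rd, dtype=float)
    return np.minimum(1.0, cutoff / rd ** 2)
```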
4.3.1 Mallows 1-Step Estimator

The Mallows 1-step (M1S) estimator, a generalized M estimator, was introduced by Simpson, Ruppert, and Carroll (1992). The focus of the 1-step improvement is to incorporate a leverage control term and an outlier control term in the estimation of β̂. The residuals from the initial estimate are utilized, with the M1S estimator being the solution to the altered normal equations

    Σ_{i=1}^{n} w_i ψ( r_i(β̂) / σ̂ ) x_i = 0.

Outliers are controlled by the ψ-function downweighting large scaled residuals. For our discussion the Huber ψ-function is used. In addition, with the w_i's being robust Mallows weights, high leverage points get downweighted due to this term. Using Newton-Raphson to solve the altered normal equations, the form of the estimator is

    β̂₁ = β̂ + H⁻¹ g,

where

    g = σ̂ Σ_{i=1}^{n} w_i ψ( r_i(β̂) / σ̂ ) x_i

and

    H = Σ_{i=1}^{n} w_i ψ⁽¹⁾( r_i(β̂) / σ̂ ) x_i x_i′,

with ψ⁽¹⁾ being the first derivative of ψ, i.e. ψ⁽¹⁾(u) = dψ(u)/du. This can be simplified when written in matrix notation as

    β̂₁ = β̂ + (X′BX)⁻¹ X′Wψ σ̂.

Here, ψ is an n × 1 vector, W is the n × n weight matrix defined earlier in Section 4.3, and B is the n × n diagonal matrix B = diag( w_i ψ⁽¹⁾( r_i(β̂)/σ̂ ) ). Using the Huber ψ-function, the
diagonal elements of B become

    b_ii = w_i   if |r_i(β̂)| ≤ cσ̂,
         = 0     otherwise.

The ψ vector elements are calculated as

    ψ_i = −c,           if r_i(β̂) < −cσ̂,
        = r_i(β̂)/σ̂,   if |r_i(β̂)| ≤ cσ̂,
        = c,            if r_i(β̂) > cσ̂.

To further the analysis beyond estimation, standard errors are needed for each of the coefficients. If the p × p matrix M is defined as

    M = σ̂² Σ_{i=1}^{n} w_i² ψ²( r_i(β̂)/σ̂ ) x_i x_i′,

then the estimated (asymptotic) covariance matrix for the parameter estimates is given by

    Cov(β̂₁) = H⁻¹ M H⁻¹.

By defining the matrix V = diag( w_i ψ( r_i(β̂)/σ̂ ) ), the estimated covariance matrix can be written in matrix form as

    Cov(β̂₁) = σ̂² (X′BX)⁻¹ (X′V²X) (X′BX)⁻¹.

Thus, standard errors for the M1S coefficients are determined by the square root of the diagonal elements of this estimated covariance matrix.

4.3.2 Schweppe 1-Step Estimator

Another generalized M estimator is the Schweppe 1-step (S1S) estimator introduced by Coakley and Hettmansperger (1993). The focus is on the selection of an appropriate weighting scheme. The M1S estimator is modified by replacing the Mallows form of the altered normal equations with the Schweppe form of the altered normal equations. Basically, this entails adding
a weight to the denominator of the ψ-function argument, which improves the efficiency of the estimator (Coakley and Hettmansperger, 1993). The altered normal equations for the S1S estimator are

    Σ_{i=1}^{n} w_i ψ( r_i(β̂) / (σ̂ w_i) ) x_i = 0.

These equations are of the same form as those for BI regression. The difference is that the S1S method uses the same Mallows weights that are used in the M1S method rather than the hat diagonal-based Welsch weights that are used in BI regression. A Gauss-Newton approximation using a first-order Taylor series expansion about the initial estimate β̂ yields a one-step improvement of the form

    β̂₁ = β̂ + (X′BX)⁻¹ X′Wψ σ̂.

This has the same form as the M1S method, and uses the same weight matrix, W, but with changes in the definition of the B matrix and ψ vector. Now, B = diag( ψ⁽¹⁾( r_i(β̂) / (σ̂ w_i) ) ). Using the Huber ψ-function, the diagonal elements of B are

    b_ii = 1   if |r_i(β̂)| ≤ cσ̂w_i,
         = 0   otherwise.

The ψ vector entries are calculated as

    ψ_i = −c,                 if r_i(β̂) < −cσ̂w_i,
        = r_i(β̂)/(σ̂w_i),    if |r_i(β̂)| ≤ cσ̂w_i,
        = c,                  if r_i(β̂) > cσ̂w_i.
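Both one-step updates share the form β̂₁ = β̂ + (X′BX)⁻¹X′Wψσ̂ and differ only in where the Mallows weight enters. A minimal sketch of the two updates with the Huber ψ-function follows; the function names are hypothetical, b0 stands for the LTS initial estimate, w for the Mallows weights, and sigma for the robust scale, while the tuning constant c = 1.345 is the usual Huber choice and is an assumption here.

```python
import numpy as np

def m1s_step(b0, X, y, w, sigma, c=1.345):
    """Mallows 1-step: psi argument r_i/sigma; B_ii = w_i inside, 0 outside."""
    r = y - X @ b0
    u = r / sigma
    psi = np.clip(u, -c, c)                    # Huber psi
    Bd = w * (np.abs(u) <= c)                  # w_i * psi'(u_i)
    g = X.T @ (w * psi) * sigma                # X' W psi * sigma
    return b0 + np.linalg.solve(X.T @ (Bd[:, None] * X), g)

def s1s_step(b0, X, y, w, sigma, c=1.345):
    """Schweppe 1-step: psi argument r_i/(sigma w_i); B_ii = 1 inside, 0 outside."""
    r = y - X @ b0
    u = r / (sigma * w)
    psi = np.clip(u, -c, c)
    Bd = (np.abs(u) <= c).astype(float)        # psi'(u_i)
    g = X.T @ (w * psi) * sigma
    return b0 + np.linalg.solve(X.T @ (Bd[:, None] * X), g)
```

When every residual falls inside the Huber corner (|r_i| ≤ cσ̂) and all weights equal 1, both updates reduce to a full least squares (Gauss-Newton) correction of the initial estimate.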
By defining the matrix V = diag( w_i ψ( r_i(β̂)/(σ̂w_i) ) ), the estimated covariance matrix can be written in matrix form as

    Cov(β̂₁) = σ̂² (X′BX)⁻¹ (X′V²X) (X′BX)⁻¹.

Standard errors for the S1S coefficients are then determined by the square root of the diagonal elements of this estimated covariance matrix.

4.4 Case Study: Stackloss Data

In order to obtain the M1S and S1S estimates, LTS and MVE (without the 1-step improvement) are first estimated to provide a high breakdown starting point and robust weights for the improvement step. Using the repeated subsample (elemental set) algorithm with 5,000 iterations yields

    MVE₁(Z) = ( 56.5, 19.75, 88.0 )′

and

    MVE₂(Z) = [ 99.6575   4.5274   31.894
                 4.5274  38.221    23.9178
                31.894   23.9178   82.3836 ]

as the MVE estimates defining an ellipsoid having a (minimum) volume of 224.3446. Corresponding to an objective function evaluation of 3.1797, the LTS estimator produces the fitted equation

    ŷ = −37.31 + 0.734 x1 + 0.438 x2 + 0.0 x3.

Given these preliminary calculations, along with the LTS-based scale estimate of σ̂ = 1.793, the high breakdown regression estimators M1S and S1S both yield the equation

    ŷ = −4.8148 + 1.443 x1 + 0.685 x2 − 0.2133 x3,

as shown in Table 4.1, along with their respective asymptotic standard errors.
Table 4.1: High breakdown regression for stackloss data.

    Parameter    LTS       M1S      M1S s.e.   S1S      S1S s.e.
    Intercept   −37.31    −4.815     5.169    −4.815     5.169
    x1            0.734     1.44     0.289      1.44     0.289
    x2            0.438     0.681    0.197      0.681    0.197
    x3            0.0      −0.213    0.132     −0.213    0.132

M1S and S1S essentially differ in their respective weighting schemes, particularly in cases with high influence points. Since the stackloss data has no high influence points, just the four outliers, it is not surprising that the two 1-step methods agree. By viewing the standard errors for the M1S and S1S estimators, it is evident that acid concentration (x3) is not significant in the presence of air flow (x1) and temperature (x2). One could extend the analysis further by viewing observation weights, leverage weights, plots, etc., but this is omitted for the discussion purposes here.

4.5 Computational Issues for High Breakdown Regression

The main goal of high breakdown regression methods like those mentioned in this chapter is to keep large quantities (up to 50% of the data) of outliers from ruining the analysis. Thus, outliers in the response and high influence points are under control, not exerting any undue influence on the regression analysis. There is, however, a major problem that still needs to be addressed. The M1S and S1S methods require both an initial estimate and a set of weights in order to proceed with the one-step improvement. The initial estimate is taken to be LTS, with the weights being robust Mallows weights based on the MVE estimator. Both have problems attached to them. Recall, LTS is generally not the solution to its objective function, but merely an estimate of the estimate. As pointed out by Hettmansperger and Sheather (1992), LMS (and LTS for that matter) is highly sensitive to small changes in the middle of the regressor space. The process of repeated subsampling can easily result in drastically different final estimates for LTS. This results from the objective function in question having many local minima at vastly different locations.
In a similar fashion, the MVE estimator is also very unstable in the sense that drastically different results are common if the repeated subsampling algorithm were itself repeated. Thus, the robust weights generated from this algorithm may become very different in another simulation. Cook and Hawkins (1990) discuss this lack of repeatability problem when trying to mimic the results of Rousseeuw and van Zomeren (1990). Of note, for a data set having 20 observations and 5 regressors, it took the authors nearly 60,000 iterations to obtain the true MVE. This indicates that determining the number of iterations needed in subsampling by probabilistic arguments such as those given by Rousseeuw and Leroy (1987) and Rousseeuw (1993) may fail to find the proper estimates. The alternative would be to incorporate the FSA approach in obtaining the MVE estimates, which is much more stable in terms of the effects of random starts. This approach would definitely increase the computational time required to perform the regression since the initial regression estimator and the robust weights are no longer calculated in a parallel fashion.

Basing a method on an LTS-based initial estimate leaves the method vulnerable to an internal breakdown. Obtaining decent results will require an enormous amount of calculation, not the small number of iterations (say under 1,000) currently suggested (Rousseeuw and Leroy (1987); Rousseeuw (1993)). Even so, the researcher is not guaranteed to avoid misleading results. This extends to the M1S and S1S methods as well. These methods are very reliant on their initial estimates, and very different results are a disturbing reality. Agostinelli and Markatou (1998) also offer a one-step robust regression estimator.

In conclusion, it is stressed that a current high breakdown estimator such as LTS may not be reproducible. Two researchers analyzing the same data by the same regression procedure may obtain vastly different results.
Case studies to come in Chapter 7 show a wide disparity of values over a small number of repeated analyses, while extending the discussion to include M1S and S1S, and their inherent reproducibility issues, as well. This issue is another reason why taking a different approach in obtaining a high breakdown regression estimator is justified.