Robust Estimator with the SCAD Function in Penalized Linear Regression

Size: px

Start display at page:

Download "Robust Estimator with the SCAD Function in Penalized Linear Regression"

Elinor Griffith
6 years ago
Views:

1 The SIJ Trasactios o Computer Sciece Egieerig & its Applicatios (CSEA), Vol. 2, No. 4, Jue 24 Robust Estimator with the SCAD Fuctio i Pealize Liear Regressio Kag-Mo Jug* *Professor, Departmet of Statistics a Computer Sciece, Kusa Natioal Uiversity, Kusa, Chobuk, SOUTH KOREA. kmjug@kusa.ac.kr Abstract Pealize regressio proceures have recetly see a lot of attetio, because it ca yiel both estimatio a variable selectio simultaeously. There are icreasig applicatios i bioiformatics research which treats large umber of variables. However, their performace ca be severely ifluece by outliers i either the respose or the covariate space. This paper proposes a weighte regressio estimator with the Smoothly Clippe Absolute Deviatio (SCAD) fuctio, because it has two avatages, sparse system a ubiaseess for large coefficiets. It eals with robust variable selectio a robust estimatio. We evelop a uifie algorithm for the propose estimator icluig the SCAD estimate a the tuig parameter, base o the Local Quaratic Approximatio (LQA) a the Local Liear Approximatio (LLA) of the o-covex SCAD pealty fuctio. We compare the robustess of the propose algorithm with other pealize regressio estimators. Numerical simulatio results show that the propose estimator is effective to aalyze a cotamiate ata. Keywors Liear Regressio; Local Liear Approximatio; Local Quaratic Approximatio; Robust; Pealize Fuctio; Smoothly Clippe Absolute Deviatio; Tuig Parameter; Weight. Abbreviatios Akaike Iformatio Criterio (AIC); Bayesia Iformatio Criterio (BIC); Least Absolute Deviatio (LAD); LAD a the L -type pealty (LAD-L); LAD a the SCAD pealty (LAD-SCAD); Least Absolute Shrikage a Selectio Operator (LASSO); Local Liear Approximatio (LLA); Local Quaratic Approximatio (LQA); Least Squares Estimator (LSE); Smoothly Clippe Absolute Deviatio (SCAD); the Weighte LAD a the SCAD pealty (WLAD-SCAD). I. INTRODUCTION RECENT ays we ca obtai easily large sample high imesioal ata sets from cheap sesors. However the growth of variables ca prevet us from costructig a parsimoious moel which provies goo iterpretatio about the system. Oe importat stream of statistical research requires effective variable selectio proceures to improve both accuracy a iterpretability of the learig techique [Kitter, 986]. Variable selectio is a importat research topic i liear regressio especially for moel selectio i high-imesioal ata situatio [Jug, 28]. Tibshirai (996) propose the least absolute shrikage a selectio operator (LASSO), which ca simultaeously select valuable covariates a estimate regressio parameters. Traitioal moel selectio criteria such as Akaike Iformatio Criterio (AIC) [Akaike, 973] a Bayesia Iformatio Criterio (BIC) [Schwarz, 978] have major rawbacks that parameter estimatio a moel selectio are two separate processes. The LASSO is a regularisatio with l -type pealizatio a it becomes extremely popular, because it shriks the regressio coefficiets towar zero with the possibility of settig some coefficiets equal to zero, resultig i a simultaeous estimatio a variable selectio. May literatures show successful applicatios usig the LASSO. However, the LASSO ca be biase for the coefficiets whose absolute values are large. Fa & Li (2) propose a pealize regressio with the Smoothly Clippe Absolute Deviatio (SCAD) pealty fuctio a showe that it has better theoretical properties tha the LASSO with L -type pealty. The pealize regressio with SCAD ot oly selects importat covariates cosistetly but also prouces parameter estimators as efficiet as if the true moel were kow, i.e., the oracle property. The LASSO oes ot satisfy the oracle property. The SCAD fuctio is a o-covex pealty fuctio to make up for the eficiecies of the LASSO. The pealize regressio cosists of a loss fuctio a a pealty fuctio. I traitioal regressio settig it is well kow that the least squares metho is sesitive to eve sigle outlier. Alterative to the least squares metho is the Least Absolute Deviatio (LAD) metho. Wag et al., (27) evelope a robust algorithm with the LAD loss fuctio a L -type pealty (LAD-L). They showe that the LAD-L is resistat to o-ormal error istributios a outliers. Jug (27, 22) propose a robust metho with the LAD loss fuctio a the SCAD pealty (LAD-SCAD) estimate. It shows that the SCAD pealty fuctio is more efficiet tha the LAD-L. ISSN: Publishe by The Staar Iteratioal Jourals (The SIJ) 56

2 The SIJ Trasactios o Computer Sciece Egieerig & its Applicatios (CSEA), Vol. 2, No. 4, Jue 24 Recetly statisticias ofte treat ata sets with a oormal respose variable or covariates that may cotai multiple outliers or leverage poits. Eve though the LAD is more robust tha the least squares metho, the uboue loss fuctio of the LAD affects strogly the LAD estimator. I this paper we cosier a weight metho for the boue loss fuctio. The weight i our algorithm atteuates the ifluece of outliers o the estimator a works the same ifluece of o-outliers as the u-weighte LAD-SCAD metho. The weighte LAD loss fuctio with the SCAD pealty fuctio (WLAD-SCAD) improves the error performace of the LAD-SCAD. The propose metho combies the robustess of the weighte LAD a the oracle property of the SCAD pealty. The tuig parameter cotrols the moel complexity a plays a importat role i the variable selectio proceure. We propose a BIC-type tuig parameter selector usig a ata-rive metho, the WLAD- SCAD with the BIC tuig parameter ca ietify the most parsimoious correct moel. The paper is orgaize as follows. Sectio 2 escribes the previously relate works. Sectio 3 provies our propose algorithm of the weighte LAD with the SCAD pealty fuctio. Sice the SCAD fuctio is ot covex, we use a approximatio algorithm to solve the o-ifferetiable a o-covex objective fuctio i pealize regressio with the SCAD pealty. We use approximatio methos such as the Local Quaratic Approximatio (LQA) a the Local Liear Approximatio (LLA) to solve the optimizatio problem for the o-covex SCAD pealty fuctio. We provie two results, the solver of a Newto-Raphso solutio a the LAD criterio. Sectio 4 illustrates simulatio results. It shows that the propose algorithm has superior to other methos from the view of robustess a parsimoious moel. II. RELATED WORKS Cosier the liear regressio moel y i = α + x T i β + ε i, i =,,, where x i = (x i,, x i ) T is the -imesioal covariate, β = (β,, β ) T, is the umber of covariates a is the umber of observatios. Let y = (y,, y ) T a let X be a T ( + ) matrix whose i-th row, x i. The Least Squares Estimator (LSE) (X T X) X T y which miimizes the sum of squares of resiuals ca be istorte by the heay-taile probability istributio of errors or eve sigle outlier [Rousseeuw & Leroy, 987]. The criterio ca be writte by ( y i α + x T i β ) 2. Oe alterative to the least squares metho is the LAD metho which miimises the criterio fuctio, the sum of absolute eviatios of the errors y i ( α + x T i β). The major avatage of the LAD metho lies i its robustess relative to the LSE. The LAD estimates are less affecte by the presece of a few outliers or ifluetial observatios. However both the LSE a the LAD caot be useful for moel selectio whe especially the umber of covariates is very large. Tibshirai (996) propose a pealty base o the L orm for automatically eletig uecessary covariates. The LASSO criterio is a simply pealize least squares with the L pealty ( y i α + x T i β ) 2 + λ, () where λ > is the tuig parameter which cotrols the traeoff betwee moel fittig a moel sparsity. Whe the tuig parameter is large, the criterio focuses o moel sparsity. Traitioally i moel selectio, crossvaliatio a iformatio criteria-icluig the AIC [Akaike, 973] a BIC [Schwarz, 978]-are wiely applie. Shao (997) showe that the BIC ca ietify the true moel cosistetly i liear regressio with fixe imesioal covariates, but the AIC may fail ue to over-fittig. Yag (25) showe that cross-valiatio is asymptotically equivalet to the AIC [Yag, 25] a so they behave similarly. Leg et al., (26) showe that the LASSO is ot asymptotically cosistet a so the LASSO ca be biase for large coefficiets. Fa & Li (2) aresse this problem a propose the SCAD pealty fuctio. They escribe the coitios of a goo pealty fuctio (a) ubiaseess: the resultig estimator is early ubiase whe the true ukow parameter is large; (b) sparsity: the resultig estimator is a thresholig rule, which automatically sets small estimate coefficiets to be zero; (c) cotiuity: the resultig estimator is cotiuous i the ata. The LSE with the SCAD pealty fuctio miimizes the criterio fuctio ( y i α + x T i β ) 2 + p λ ( ), (2) where λ β, if β < λ a 2 λ 2 ( β aλ) 2, p λ ( β ) = 2(a ) if λ β < aλ 2 (a + )2 λ 2, a so its erivative becomes if β > aλ λ, if β < λ p λ β = aλ β, a if λ β < aλ, if β > aλ where a ca be chose usig cross-valiatio or geeralize cross-valiatio. However, the simulatio of Fa & Li (2) gives us a = 3.7 which is approximately optimal. I this article we set a = 3.7. Similar to that the LAD is more robust tha the LSE i o-pealize regressio moel, Wag et al., (27) propose the LAD with the SCAD pealty which miimises the criterio fuctio ISSN: Publishe by The Staar Iteratioal Jourals (The SIJ) 57

3 The SIJ Trasactios o Computer Sciece Egieerig & its Applicatios (CSEA), Vol. 2, No. 4, Jue 24 y i α + x T i β + p λ ( ). (3) Eve though the LAD is robust, its breakow poit is also / which is equivalet to the LSE. Jug (22) propose the weighte LAD with the SCAD pealty, because the simulatio results of Giloi et al., (26) show that i o-pealize liear regressio the performace of the weighte LAD estimator is competitive with that of high breakow regressio estimators, particularly i the presece of outliers locate at leverage poits. Jug (2) propose a weighte LAD pealize estimator which combies the weighte LAD estimator with the L pealty. Jug (22) use the criterio fuctio y i α + x T i β + p λ ( ), (4) where the weight epes o the space of covariates. We call the solutio of (4) the weighte LAD with the SCAD (WLAD-SCAD). The propose metho uses the weight for robustess which is resistat to leverage poits or ifluetial observatios, because the weight reuces the effects of the observatios havig large eviatios or the leverage poits. The objective fuctio will give the robustess of weight methos a the avatages of the SCAD pealize fuctio. III. METHODS The solutio of (4) ca be obtaie by a staar optimizatio program if the criterio fuctio is covex a ifferetiable. Ufortuately the absolute fuctio i (4) is ot ifferetiable at zero a the SCAD pealty fuctio is ot covex i β. Approximatio of the absolute fuctio a the SCAD fuctio trasforms the objective fuctio ito liear equatios a so we ca obtai efficietly a iterative solutio of the WLAD-SCAD estimator. The absolute fuctio u u2 ear u gives + 2 u 2 u y i α + x i T β (y i α + x i T β ) 2 2 y i α + x i T β + 2 y i α + x i T β, for ozero u for iitial values α, β ear the miimisatio of (3). Also the Taylor expasio at the o-zero yiels p λ p λ + p λ β 2 2 j, (6) 2 a we set = if is ear zero. Assume that the loglikelihoo fuctio is smooth with respect to β a its first two partial erivatives are cotiuous. The liear quaratic approximatio (LQA) is a moificatio of the Newto- Raphso algorithm [Fa & Li, 2]. The LQA is broaly useful for solvig optimizatio problems with the oifferetiable criterio fuctio. The the criterio fuctio (4) becomes (5) (y i α + x i T β ) 2 y i α + x i T β + [ p λ y i α + x i T β + p λ β 2 2 j ]. 2 a we obtai the criterio fuctio with up to costats (y X β)t W y X β + 2 βt Qβ, (8) where iag( p λ β = (α, β T ) T, W = iag y i α +x i T β (7), Q = ) a X is the ( + ) ata matrix whose first colum vector is the vector of oes with legth. Thus the Newto-Raphso solutio yiels the iterative solutio of (8) β (l+) = (X T W l X + Q (l) ) X T W l y, (9) for the lth solutio β (l), the matrix Q l a the lth weight matrix W l. Whe W = I a Q l = Q, the solutio (2) reuces to the LSE-SCAD estimate which is the solutio of the criterio (3). For the case the lth weight matrix W l = iag iag( p λ y i α +x i T β a the matrix Q l = ), the solutio of (9) becomes the LAD- SCAD. A for the lth weight matrix W l = iag iag( p λ y i α +x i T β a the matrix Q l = ), the solutio of (9) is calle the WLAD- SCAD. I Sectio 4 the iteratio stops whe the maximum ifferece amog betwee the previous a the curret solutio is less tha 4. We will provie aother approach to solve the miimizatio of the criterio fuctio (4). By the Taylor expasio of p λ ( ) we obtai the relatioship as follows p λ p λ + p λ, for β () j. The costat terms i () ca ot affect the fuctio of (3) or (4) a so they ca be elimiate. The criterio fuctio (4) ca be writte by y i α + x T i β + p λ ( β j ), () whose regularizatio part is the liearize SCAD pealty fuctio [Zou & Li, 28]. It is a local liear approximatio (LLA) metho for the SCAD pealty fuctio. Computatioally, it is very easy to fi the solutio of (). Specifically, we ca costruct the augmete ata set { y i, x i } as ISSN: Publishe by The Staar Iteratioal Jourals (The SIJ) 58

4 The SIJ Trasactios o Computer Sciece Egieerig & its Applicatios (CSEA), Vol. 2, No. 4, Jue 24 y i, x i = y i, x i for i =,,, p λ β i e i for i = +,, + p, where e j is the uit vector havig except the jth elemet oe [Wag et al., 27]. The the WLAD-SCAD estimator of () ca be obtaie by miimisig the criterio fuctio +p y T i (α + x i β). (2) Cosequetly we ca use a staar LAD program (the fuctio rq i R program) without computatioal effort. To fi a goo tuig parameter is a importat proceure i pealize estimatio methos. Fa & Li (2) fou the values of tuig parameters by optimizig the performace via cross-valiatio a geeralize cross valiatio. Zou (26) i LASSO use the tuig parameters set by the reciprocal of the absolute value of the LSE. Wag et al., (27) use the tuig parameter by miimisig a BICtype criterio fuctio. I this paper we propose the tuig parameter which ca be obtaie by miimizig (y i α + x T i β λ ) 2 / GCV λ = ( f(λ)/) 2, Where f λ = tr[ X(X T X + Q) X T ], X is the ata matrix excluig the costat term, a Q = iag( p λ β λj ). For liear moels, the geeralize cross-valiatio is asymptotically equivalet to the Mallows C p, AIC a leaveoe-out cross valiatio [Hastie et al., 2]. IV. SIMULATION RESULTS This sectio emostrates simulatios i various situatios to show the robustess of the metho propose i Sectio III. We umerically compare the propose metho WLAD- SCAD estimates with the LASSO estimates [Thibshirai, 996], the LSE with the SCAD pealize fuctio [Fa & Li, 2] a the LAD-SCAD estimate [Jug, 27]. All simulatios are performe by R program. We cosier the liear regressio moel y i = x T i β + σε i, i =,,, Where β T = (3,,,, 2,,) a ε i are staar ormally istribute. The covariate vector follows the multivariate ormal istributio with the zero mea vector a the correlatio matrix whose (j, k)th elemet is.5 j k for j, k =,,7. This is a similar to Tibshirai (996) a Fa & Li (2). We set several situatios. The sample sizes are give by = 2, 4,. Two ifferet values for the error staar eviatio σ are give by σ =, 2. The error istributios are the staar ormal istributio, the ouble expoetial istributio a the t istributio with egrees of 2. The last two istributios have thick-taile probability istributio fuctios. To show the robustess of our propose algorithm we cosier the cotamiate ata with leverage poits about 2%. β λj The simulatio ata cosist of the traiig ata a iepeet test ata. The regressio coefficiets ca be obtaie o the traiig ata, a the performace is evaluate o the test ata of sample size geeratig from the ormal istributio with the above efie matrix. We coucte 2 simulatio iteratios to evaluate the performace of the fitess of the algorithm, which ca be calculate by the Average of the Mea Absolute Deviatios () o the test ata. The performace of the sparseess of the algorithm ca be evaluate by the average umber of correctly estimate zero coefficiets which is the colum labelle i the tables. Aalogously the colum labelle Icorrect eotes the umber of zero estimates which are ot zero coefficiets, which meas the iaccuracy for each algorithm. The umber i the parethesis is the sample staar eviatio for each algorithm. Thus the moel is best i case the lesser of the a the Icorrect terms. For the true moel the umber of o-zero coefficiets is 5, a thus the term shoul be close to 5 if the algorithm is compatible. Table summarize the simulatio results for the ata without outliers. We cosiere three error moels, a summarize oly the results for the ouble expoetial error istributio. It shows that eve though LASSO is the best moel i the sparseess of the moel, it has large moel errors. As expecte the staar eviatio of the values become larger whe σ becomes larger. Whe, the values of the WLAD-SCAD is little large tha that of the LASSO. It meas that the propose estimator yiels a sparse moel whe the error istributio is ot ormal. Table : Simulatio Results with Double Expoetial Errors without Outliers DE LAD Methos LASSO SCAD WLAD Errors -SCAD -SCAD (.94) (.85) (.77) (.78) = σ = (.43) (.66) (.76) (.75).... Icorrect (.) (.) (.) (.) (.354) (.346) (.364) (.393) = (.846) (.845) (.95) (.838) Icorrect (.549) (.538) (.57) (.54) Table 2 summarize the simulatio results for the staar ormal errors a 2% leverage poits. Seeig term implies that i moel sparsity the WLAD- SCAD is the best amog the total estimators. The ifferece betwee the WLAD-SCAD a other estimator is meaigful eve the icorrectess of the propose metho is somewhat large. Table 2: Simulatio Results with the Normal Errors with 2% Outliers Normal Methos LASSO SCAD LAD- WLAD ISSN: Publishe by The Staar Iteratioal Jourals (The SIJ) 59

5 The SIJ Trasactios o Computer Sciece Egieerig & its Applicatios (CSEA), Vol. 2, No. 4, Jue 24 Errors SCAD -SCAD (.25) (.88) (.239) (.29) = σ = (.48) (.842) (.966) (.929) Icorrect (.749) (.6) (.77) (.742) (.226) (.98) (.257) (.25) = (.45) (.837) (.995) (.868) Icorrect (.83) (.639) (.724) (.728) Table 3 summarize the simulatio results for the ouble expoetial error istributio a 2% leverage poits. The WLAD-SCAD is the best algorithm amog 4 estimators i the view of variable selectio, because the value of the WLAD-SCAD is the closest to 5 for all estimators. Table 3 shows that i moel error the WLAD-SCAD is the most efficiet estimatio regarless of outliers a the sprea of errors a the sample size, sice we cosiere the estimatio metho which reuces the ifluece of leverage poits. Table 3: Simulatio Results with the Double Expoetial Errors with 2% Outliers Normal LAD- WLAD Methos LASSO SCAD Errors SCAD -SCAD (.225) (.26) (.278) (.22) = σ = (.68) (.86) (.97) (.878) Icorrect (.77) (.598) (.642) (.73) (.) (.98) (.29) (.29) = σ = (.58) (.824) (.884) (.582) Icorrect (.657) (.532) (.63) (.5) (.272) (.255) (.34) (.88) = (.7) (.85) (.) (.77) Icorrect (.82) (.636) (.742) (.628) V. CONCLUSION I this paper we propose a robust algorithm for the pealize regressio moel with the LAD loss fuctio a the SCAD pealty fuctio. We use a weight fuctio for the robust loss fuctio a improve the effectiveess of the propose algorithm through umerical simulatios. We erive two approximatios for objective fuctios to treat the o-covex optimizatio problem. Oe is LLA a the other is LQA. Sice the former is liear, it is so easy to implemet it. Both two methos are robust to outliers or ifluetial observatios. The umerical simulatios show that the propose metho is more robust tha other methos from the poit view of fiig exact o-zero coefficiets. Thus the propose metho gives a metho of variable selectio from thousas of iput variables appeare i the applicatios of biometrical experimets. For the further stuy we will cosier the Huber fuctio for the loss fuctio with the SCAD pealty fuctio. ACKNOWLEDGEMENTS This research was supporte by Basic Sciece Research Program through the Natioal Research Fouatio of Korea (NRF) fue by the Miistry of Eucatio, Sciece a Techology (NRF-22RAA4A4594). REFERENCES [] H. Akaike (973), Iformatio Theory a a Extesio of the Maximum Likelihoo Priciple, Proceeigs of the Seco Iteratioal Symposium o Iformatio Theory, Eitors: B.N. Petrov & F. Csàki, Akaemiai Kiao, Buapest. [2] G. Schwarz (978), Estimatig the Dimesio of a Moel, Aals of Statistics, Vol. 6, Pp [3] J. Kitter (986), Feature Selectio a Extractio, Habook of Patter Recogitio a Image Processig, Eitors: T.Y. Youg a K.-S. Fu, Acaemic Press, New York. [4] P.J. Rousseeuw & A.M. Leroy (987), Robust Regressio a Outlier Detectio, Joh Wiley, New York. [5] R. Tibshirai (996), Regressio Shrikage a Selectio via the LASSO, Joural of the Royal Statistical Society, Series B, Vol. 58, Pp [6] J. Shao (997), A Asymptotic Theory for Liear Moel Selectio, Statistical Siica, Vol. 7, Pp [7] T. Hastie, R. Tibshirai & J. Friema (2), The Elemets of Statistical Learig: Data Miig, Iferece, a Preictio, Spriger, New York. [8] J. Fa & R. Li (2), Variable Selectio via Nococave Pealize Likelihoo a its Oracle Properties, Joural of the America Statistical Associatio, Vol. 96, Pp [9] Y. Yag (25), Ca the Stregths of AIC a BIC be Share? A Coflict betwee Moel Ietificatio a Regressio Estimatio, Biometrika, Vol. 92, Pp [] H. Zou (26), The Aaptive Lasso a its Oracle Properties, Joural of the America Statistical Associatio, Vol., Pp [] C. Leg, Y. Li & G. Wahba (26), A Note o the LASSO a Relate Proceures i Moel Selectio, Statistical Siica, Vol. 6, Pp [2] A. Giloi, J.S. Simooff & B. Segupta (26), Robust Weighte LAD Regressio, Computatioal Statistics a Data Aalysis, Vol. 5, Pp [3] H. Wag, G. Li & G. Jiag (27), Robust Regressio Shrikage a Cosistet Variable Selectio through the LAD- Lasso, Joural of Busiess & Ecoomic Statistics, Vol. 25, Pp [4] K.-M. Jug (27), A Robust Estimator i Rige Regressio, Joural of the Korea Data Aalysis Society, Vol. 9, Pp [5] K.-M. Jug (28), Robust Statistical Methos i Variable Selectio, Joural of the Korea Data Aalysis Society, Vol., Pp [6] H. Zou & R. Li (28), Oe-step Sparse Estimates i Nococave Pealize Likelihoo Moels, Aals of Statistics, Vol. 36, Pp [7] K.-M. Jug (2), Weighte Least Absolute Deviatio Lasso Estimator, Commuicatios of the Korea Statistical Society, Vol. 8, Pp [8] K.-M. Jug (22), Weighte Least Absolute Deviatio Regressio Estimator with the SCAD Fuctio, Joural of the Korea Data Aalysis Society, Vol. 4, Pp ISSN: Publishe by The Staar Iteratioal Jourals (The SIJ) 6

Robust Algorithm for Multiclass Weighted Support Vector Machine

he SIJ rasactios o Iustrial Fiacial & Busiess Maagemet (IFBM) Vol. 2 No. 3 May 24 Robust Algorithm for Multiclass Weighte Support Vector Machie ag-mo Jug* *Professor Departmet of Statistics a Computer