THE ADAPTIVE LASSO UNDER A GENERALIZED SPARSITY CONDITION


by

Joel L. Horowitz
Department of Economics
Northwestern University
Evanston, IL 60208

and

Jian Huang
Department of Statistics and Actuarial Science
University of Iowa
Iowa City, IA 52242

October

ABSTRACT

We consider estimation of a linear model in which a few coefficients are large and may be objects of substantive interest, whereas others are small but not necessarily zero. The number of small coefficients may exceed the sample size. It is not known which coefficients are large and which are small. The large coefficients can be estimated with a smaller mean-square error if the small coefficients can be identified and the covariates associated with them dropped from the model. We show that the adaptive LASSO distinguishes correctly between large and small coefficients with probability approaching 1 as the sample size increases. The results of Monte Carlo experiments and an empirical example show that the adaptive LASSO reduces the mean-square error of the estimates of the large coefficients even in quite small samples.

Key words: Penalized regression, high-dimensional data, variable selection.

AMS subject classification: Primary 62J05, 62J07; secondary 62E20, 62F12

The research of Joel L. Horowitz was supported in part by NSF grant SES-0817552.

1. Introduction

We consider the linear mean-regression model

$$Y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i; \qquad i = 1, \ldots, n, \tag{1}$$

where $Y_i$ is the response variable, the $x_{ij}$ are fixed covariates, and the $\varepsilon_i$ are unobserved mean-zero random variables. We assume without loss of generality that the data are centered, so there is no intercept in the model, and that $n^{-1}\sum_{i=1}^{n} x_{ij}^2 = 1$ for each $j = 1, \ldots, p$. The number of covariates ($p$) may be large, possibly larger than the sample size ($n$). Our interest is in identifying and estimating the coefficients $\beta_j$ that are large in a sense that will be defined. We assume that the number of large $\beta_j$'s is smaller than $n$. The remaining coefficients are defined to be small. Their values are small in a sense that will be explained, but they are not necessarily zero. We give conditions under which the adaptive LASSO procedure distinguishes correctly between covariates with large and small coefficients with probability approaching one as $n \to \infty$. The adaptive LASSO also provides consistent estimators of the large coefficients.

Specifically, we consider a generalized sparsity condition (GSC; Zhang and Huang 2008). Denote $A_0 \subset \{1, \ldots, p\}$ as the set of small coefficients. Under the GSC,

$$\sum_{j \in A_0} |\beta_j| \le \eta_1, \tag{2}$$

where $\{\eta_1\}$ is a sequence of non-negative constants. This assumption is weaker than the one commonly made in the literature, which is that

$$\beta_j = 0 \ \text{ if } j \in A_0. \tag{3}$$

Note that (3) is a special case of the GSC. In practice, the GSC can be a more realistic formulation of sparse models than is the exact-sparsity condition (3). Let $A_1$ denote the complement of $A_0$. We define the elements of $A_1$ to be large coefficients. We define a covariate to be important if its coefficient is in $A_1$ and unimportant if its coefficient is in $A_0$.

An important special case occurs when the number of large coefficients is fixed, $\eta_1 = o(n^{-1/2})$, and $n^{1/2}\min_{j \in A_1}|\beta_j| \to \infty$ as $n \to \infty$. In other words, the small coefficients are smaller than $O(n^{-1/2})$ and the large coefficients are larger than $O(n^{-1/2})$ as $n \to \infty$. In this case, the mean-square estimation errors of the large coefficients are smaller if all the unimportant covariates are excluded from the model than if any of the unimportant covariates is included. Thus, if the objective is to estimate one or more large coefficients, it is better to drop the unimportant covariates from the model. For example, in an economic wage equation, the dependent variable is the logarithm of an individual's weekly wage, and the objects of interest in applications include the coefficients of covariates such as an individual's years of education, years of labor-force experience, marital status, and labor union membership. However, widely available data sets for estimating wage equations can contain hundreds or even thousands of variables that may be weakly related to wages. It is not clear a priori which of these variables should be included in a wage equation, though it is clear that including all of them will result in very imprecise estimates of the coefficients of interest. This illustrates the usefulness of a systematic method for discriminating between covariates with large and small coefficients. We give conditions under which the adaptive LASSO does this with probability approaching 1 as $n \to \infty$.

LASSO-type penalization methods for model selection (Tibshirani 1996) have attracted much attention in recent years. There is also a large literature on the use of LASSO for the related problem of prediction (see, e.g., Greenshtein and Ritov 2004 and Bickel, Ritov, and Tsybakov 2009), but here we are concerned only with model selection. Meinshausen and Bühlmann (2006) and Zhao and Yu (2006) showed that, under a strong irrepresentable condition on the design matrix, the LASSO is model-selection consistent in high-dimensional settings. Zhang (2009) gave conditions under which the LASSO combined with a thresholding procedure consistently distinguishes between coefficients that are zero and coefficients whose magnitudes exceed a threshold of order $n^{-s}$ for some $s < 1/2$. Zou (2006) gave conditions under which the adaptive LASSO is model-selection consistent when the number of covariates is fixed. Huang, Ma, and Zhang (2008) provided conditions under which the adaptive LASSO is model-selection consistent even when the number of covariates is as large as $\exp(n^a)$ for some $a \in (0,1)$.
Knight and Fu (2000) and Huang, Horowitz, and Ma (2008) established model-selection consistency of bridge estimators. Other penalization approaches have been investigated by Fan and Li (2001); Fan and Peng (2004); Fan, Peng, and Huang (2005); Lv and Fan (2009); and Zou and Zhang (2009).

The foregoing model-selection procedures assume that the large $\beta_j$'s are non-zero and that the small $\beta_j$'s are exactly zero. In a recent paper, Zhang and Huang (2008) studied the selection properties of the LASSO under the GSC when $p > n$. They showed that the LASSO selects a model that includes all the covariates with large coefficients and has the right order of dimensionality. However, in general, the LASSO also includes some covariates with small coefficients. Thus, for example, the LASSO tends to select a model that is too large when the large coefficients are larger and the small coefficients are smaller than $O(n^{-1/2})$. Zhang (2009) gave conditions under which the LASSO combined with a thresholding procedure correctly selects coefficients that are sufficiently far from zero. However, Zhang's procedure requires a user-selected threshold, and it is not clear how to choose this threshold in applications. In this paper, we show that with probability approaching one as $n \to \infty$, the adaptive LASSO correctly distinguishes between large and small coefficients under the GSC. No user-selected threshold is needed.

Section 2 of this paper describes the adaptive LASSO. Section 3 presents its asymptotic properties under the GSC. Section 4 presents the results of a Monte Carlo investigation of the numerical performance of the adaptive LASSO. Section 5 presents an empirical example, and Section 6 presents concluding comments. The proofs of theorems are in the appendix, which is Section 7.

2. The Adaptive LASSO

Define $y = (Y_1, \ldots, Y_n)'$. Let $x_j = (x_{1j}, \ldots, x_{nj})'$ denote the vector of values of the $j$th covariate, and let $X = (x_1, \ldots, x_p)$ denote the design matrix. Let $\beta = (\beta_1, \ldots, \beta_p)'$, and let $\beta_0$ denote the true but unknown value of $\beta$. The LASSO objective function is

$$L_1(\beta; \lambda_1) = 0.5\,\|y - X\beta\|^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j|,$$

where $\lambda_1$ is the penalty parameter. The LASSO estimator is defined as $\tilde\beta(\lambda_1) = \arg\min_{\beta} L_1(\beta; \lambda_1)$. The adaptive LASSO objective function is

$$L_2(\beta; \lambda_2) = 0.5\,\|y - X\beta\|^2 + \lambda_2 \sum_{j=1}^{p} w_j |\beta_j|,$$

where $\lambda_2$ is the penalty parameter. The weights $w_j$ are $w_j = |\tilde\beta_j|^{-1}$, where $\tilde\beta_j$ is the $j$th component of $\tilde\beta(\lambda_1)$. The adaptive LASSO estimator is defined as $\hat\beta(\lambda_2) = \arg\min_{\beta} L_2(\beta; \lambda_2)$. We define $w_j = \infty$ when $\tilde\beta_j = 0$, and we set $\infty \cdot 0 = 0$. Minimization of the adaptive LASSO objective function results in $\hat\beta_j = 0$ if $w_j = \infty$. Thus, if a variable is not selected by the LASSO, it is not selected by the adaptive LASSO.
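The two-stage definition above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `soft_threshold`, `weighted_lasso`, and `adaptive_lasso` are hypothetical, and cyclic coordinate descent is a solver chosen here for brevity, not one the paper specifies. The code assumes every column of `X` is non-zero and uses the objective scaling 0.5·||y − Xβ||² + λ·Σ w_j|β_j| from this section.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(X, y, lam, w, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam * sum_j w[j]*|b[j]|.

    Coordinates with w[j] = inf are held at zero, matching the
    convention w_j = infinity when the first-stage estimate is zero.
    Assumes every column of X has a non-zero sum of squares.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X b
    col_ss = (X ** 2).sum(axis=0)          # x_j' x_j for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if not np.isfinite(w[j]):
                continue                   # infinite weight: b[j] stays 0
            old = b[j]
            rho = X[:, j] @ r + col_ss[j] * old
            new = soft_threshold(rho, lam * w[j]) / col_ss[j]
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b

def adaptive_lasso(X, y, lam1, lam2):
    """Stage 1: LASSO with penalty lam1. Stage 2: weights 1/|b_tilde_j|."""
    p = X.shape[1]
    b_tilde = weighted_lasso(X, y, lam1, np.ones(p))
    with np.errstate(divide="ignore"):
        w = 1.0 / np.abs(b_tilde)          # inf where the LASSO estimate is 0
    b_hat = weighted_lasso(X, y, lam2, w)
    return b_tilde, b_hat
```

Because coordinates with infinite weight are skipped, any covariate dropped in the first stage stays out of the second-stage model, which is the nesting property stated above.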

Under conditions that are stated in Section 3, the LASSO selects (asymptotically) all coefficients that exceed a certain threshold. However, the LASSO also tends to select coefficients that are below the threshold. The adaptive LASSO is a way to correct LASSO's over-selection problem.

3. Asymptotic Properties

For any $A \subseteq \{1, \ldots, p\}$, let $X_A = \{x_j : j \in A\}$ and $C_A = X_A' X_A / n$. Define

$$c_{\min}(m) = \min_{|A| = m}\ \min_{\|\nu\| = 1} \nu' C_A \nu, \qquad c_{\max}(m) = \max_{|A| = m}\ \max_{\|\nu\| = 1} \nu' C_A \nu,$$

where $|A|$ is the number of elements of $A$ and $\|\cdot\|$ is the $\ell_2$ norm. We say that the covariate matrix $X$ satisfies the sparse Riesz condition (SRC) with rank $q$ and spectrum bounds $0 < c_* < c^* < \infty$ if

$$c_* \le c_{\min}(q) \le c_{\max}(q) \le c^* \quad \forall A \text{ with } |A| = q \text{ and } \nu \in \mathbb{R}^q. \tag{5}$$

Under (5), all the eigenvalues of $C_A$ are contained in the interval $[c_*, c^*]$ when $|A| \le q$. We make the following assumptions.

(A1) The random variables $\varepsilon_1, \varepsilon_2, \ldots$ are independently and identically distributed with mean 0. There are constants $C > 0$ and $K > 0$ such that $P(|\varepsilon_i| > z) \le K \exp(-C z^2)$ for all $z \ge 0$ and $i = 1, 2, \ldots$

(A2) There is a finite constant $c_1 > 0$ such that $\eta_1 \le c_1 q \lambda_1 / n$.

(A3) The SRC holds.

Condition (A1) requires the $\varepsilon_i$ to have subgaussian tails. Condition (A2) defines the class of small coefficients. Let $\tilde A_1 = \{j : \tilde\beta_j(\lambda_1) \ne 0\}$ be the set of coefficients estimated to be non-zero by the LASSO. The following lemma, which is proved in Zhang and Huang (2008), summarizes important properties of $\tilde A_1$ and $\tilde\beta$.

Lemma 1: Let (A1)-(A3) hold, and let $\lambda_1 = O(\sqrt{n \log p})$. Then there are finite constants $M_1$ and $M_2$ such that

(i) $|\tilde A_1| \le M_1 q$ with probability approaching 1 as $n \to \infty$.

(ii) All covariates with $\beta_j^2 > M_2 q \lambda_1^2 / (n^2 c_* c^*)$ are selected with probability approaching 1 as $n \to \infty$.

(iii) $\|\tilde\beta - \beta_0\| = O_p(h)$, where $h = \sqrt{q (\log p)/n}$.

Lemma 1 shows that with high probability, the number of covariates selected by the LASSO is a finite multiple of the number of covariates in $A_1$ (that is, of the number of covariates with large coefficients). Moreover, all covariates exceeding the threshold in (ii) are selected with probability approaching 1 as $n \to \infty$. In particular, all of the covariates with large coefficients are selected with probability approaching 1 if $\eta_1 = o(\sqrt{q(\log p)/n})$ and the large $\beta_j$'s are larger than $O(\sqrt{q(\log p)/n})$. In addition, the LASSO estimator is estimation consistent. However, estimation consistency does not imply model-selection consistency.

We now give conditions under which the adaptive LASSO achieves model-selection consistency. Denote the smallest and largest eigenvalues of $C_{A_1} = X_{A_1}' X_{A_1} / n$ by $\tau_{n1}$ and $\tau_{n2}$, respectively. Make the following additional assumptions.

(A4) There are constants $0 < \tau_1 \le \tau_2 < \infty$ such that $\tau_1 \le \tau_{n1} \le \tau_{n2} \le \tau_2$ for all sufficiently large $n$.

(A5) Let $b_1 = \min_{j \in A_1} |\beta_j|$. As $n \to \infty$, the nonstochastic quantities $q$, $\eta_1$, $b_1$, and $\lambda_2$ satisfy

$$\frac{\lambda_2\, q^{1/2}}{n\, b_1^2} + \frac{q^{1/2}(h + \eta_1)}{b_1} + \frac{n\, \eta_1 (h + \eta_1)}{\lambda_2} \to 0.$$

In a sparse model, $|A_1|$ is much smaller than $n$ and may even be fixed as $n \to \infty$, so it is reasonable to assume in (A4) that the eigenvalues of $C_{A_1}$ are bounded away from 0 and $\infty$. (A5) restricts $q$, $\eta_1$, $b_1$, and $\lambda_2$. It requires $b_1$, the smallest of the large coefficients, to be not too small and that the $\ell_1$ norm of the small coefficients is not too large. In particular, it requires $b_1 \gg q^{1/2}\eta_1$. In other words, there must be enough separation between the large and small coefficients for the adaptive LASSO to distinguish between them.

Now define $\hat\beta_{A_1} = \{\hat\beta_j : j \in A_1\}$ and $\beta_{A_1} = \{\beta_j : j \in A_1\}$. For any vector $u = (u_1, u_2, \ldots)$, define $\mathrm{sgn}(u) = (\mathrm{sgn}(u_1), \mathrm{sgn}(u_2), \ldots)$, where $\mathrm{sgn}(u_j) = -1$, $0$, or $1$ according to whether $u_j < 0$, $u_j = 0$, or $u_j > 0$.

Theorem 1: Let (A1)-(A5) hold. Then as $n \to \infty$, $P(\{j : \hat\beta_j \ne 0\} = A_1) \to 1$ and $P(\mathrm{sgn}(\hat\beta_{A_1}) = \mathrm{sgn}(\beta_{A_1})) \to 1$.

Thus, with probability approaching 1 as $n \to \infty$, the adaptive LASSO selects all the covariates with large coefficients and drops the covariates with small coefficients in the sense that it sets the coefficients of those covariates equal to zero.

3.1 An Important Special Case

In the social sciences, it is not unusual for a survey data set to contain thousands of observations and hundreds or thousands of variables that are arguably related to the dependent variable of interest in the sense of having non-zero $\beta_j$ coefficients in (1). However, in typical applications, most of these coefficients are thought to be small in the sense of having magnitudes and effects on the dependent variable that are smaller than the random sampling errors of their estimates. The large coefficients are typically few in number relative to the sample size. If, as often happens, the total number of covariates is less than the sample size, this setting can be represented by a model in which $p$ and $q$ are fixed as $n \to \infty$, the small coefficients satisfy $\eta_1 = o(n^{-1/2})$, and the large coefficients satisfy $b_1 \ge \kappa (\log n)/n^{1/2}$ as $n \to \infty$ for some constant $\kappa > 0$. It follows from Theorem 1 with $\lambda_2 \propto \log n$ that as $n \to \infty$, the adaptive LASSO estimates of the large coefficients are non-zero and the estimates of the small coefficients are zero. Moreover, a straightforward calculation shows that the mean-square error (MSE) of the adaptive LASSO estimator of each large coefficient is never larger and, except in special cases, is strictly smaller than the MSE of the ordinary least squares (OLS) estimator that is obtained when all covariates are included in (1). Thus, the adaptive LASSO improves the precision of the estimates of the large coefficients.

If $p > n$ but the number of large coefficients is fixed at $q$, then we can consider a model in which the small coefficients satisfy $\eta_1 = O[1/\sqrt{n \log p}]$, the large coefficients satisfy $b_1 \ge \kappa (\log p)/n^{1/2}$ for some constant $\kappa > 0$, and $\lambda_2 \propto \log p$. Then it follows again from Theorem 1 that as $n \to \infty$, the adaptive LASSO estimates of the large coefficients are non-zero and the estimates of the small coefficients are zero. Moreover, the MSE of the adaptive LASSO estimator of each large coefficient is no larger and, except in special cases, is strictly smaller than the MSE of the OLS estimator that is obtained by including in the model any group of up to $n - q$ unimportant covariates or linear combinations of unimportant covariates. In summary, the adaptive LASSO estimator reduces the MSE of the estimator of any large coefficient if there is sufficient separation between the magnitudes of the large and small coefficients.
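The MSE comparison described in this section can be illustrated numerically with the standard algebra of least squares for a fixed design: the OLS estimator on a subset of columns has variance σ²(X_K'X_K)⁻¹ and a bias of (X_K'X_K)⁻¹X_K'X_Dβ_D from the dropped columns. The sketch below is an illustration under those textbook formulas, not the authors' calculation; the function name `ols_mse`, the toy design, and the coefficient values are all assumptions made for the example.

```python
import numpy as np

def ols_mse(X, beta, keep, target=0, sigma2=1.0):
    """Exact MSE of the OLS estimator of beta[target] when only the
    columns in `keep` are included and the true model uses all columns.

    MSE = variance + squared omitted-variable bias (fixed design,
    homoskedastic errors with variance sigma2).
    """
    keep = np.asarray(list(keep), dtype=int)
    drop = np.asarray([j for j in range(X.shape[1]) if j not in set(keep)],
                      dtype=int)
    XK, XD = X[:, keep], X[:, drop]
    G_inv = np.linalg.inv(XK.T @ XK)
    bias = G_inv @ XK.T @ XD @ beta[drop]   # zero vector if nothing is dropped
    k = int(np.flatnonzero(keep == target)[0])
    return sigma2 * G_inv[k, k] + bias[k] ** 2

# Toy design (hypothetical values): one large and one small coefficient.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
beta = np.array([1.0, 0.2])
mse_full = ols_mse(X, beta, keep=[0, 1])     # both covariates included
mse_reduced = ols_mse(X, beta, keep=[0])     # small-coefficient covariate dropped
```

In this toy case the reduced model trades a small squared bias for a larger drop in variance, so its MSE for the large coefficient is lower; with a larger dropped coefficient the comparison can reverse, which is why separation between large and small coefficients matters.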

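4. Monte Carlo Experiments

This section reports the results of a Monte Carlo investigation of the finite-sample performance of the LASSO and adaptive LASSO when the small coefficients are not necessarily zero. We write model (1) in the form

$$Y_i = \sum_{j=1}^{d} \beta_j X_{ij} + \sum_{j=d+1}^{p} \beta_j X_{ij} + \varepsilon_i; \qquad i = 1, \ldots, n,$$

where $\beta_1, \ldots, \beta_d$ are large coefficients and the coefficients $\beta_{d+1}, \ldots, \beta_p$ are small or zero. The random variables $\varepsilon_i$ are independently distributed as $N(0, \sigma_\varepsilon^2)$. The covariates are fixed in repeated samples and are centered and scaled so that

$$\sum_{i=1}^{n} X_{ij} = 0; \qquad n^{-1}\sum_{i=1}^{n} X_{ij}^2 = 1; \qquad j = 1, \ldots, p.$$

The covariates are generated as follows. Define

$$\xi_{ij} = \zeta_{ij} + \left(\frac{\rho_1}{1 - \rho_1}\right)^{1/2} \nu_i; \qquad i = 1, \ldots, n;\ j = 1, \ldots, p/2,$$

$$\xi_{ij} = \zeta_{ij} + \left(\frac{\rho_2}{1 - \rho_2}\right)^{1/2} \nu_i; \qquad i = 1, \ldots, n;\ j = p/2 + 1, \ldots, p,$$

where the $\zeta_{ij}$ and $\nu_i$ are independently distributed as $N(0,1)$ and $0 \le \rho_1, \rho_2 < 1$. Define

$$\bar\xi_j = n^{-1}\sum_{i=1}^{n} \xi_{ij}; \qquad s_{\xi_j} = \left[ n^{-1}\sum_{i=1}^{n} (\xi_{ij} - \bar\xi_j)^2 \right]^{1/2}.$$

Then

$$X_{ij} = \frac{\xi_{ij} - \bar\xi_j}{s_{\xi_j}}.$$

Moreover, for $j \ne k$,

$$\mathrm{corr}(X_{ij}, X_{ik}) = \begin{cases} \rho_1 & \text{if } j, k \le p/2 \\ \rho_2 & \text{if } p/2 < j, k \le p \\ (\rho_1 \rho_2)^{1/2} & \text{if } j \le p/2 < k \le p. \end{cases}$$

In the experiments reported here,

$$\beta_j = \begin{cases} [\ldots] & \text{if } j \le d \\ .5 & \text{if } d + 1 \le j \le p/2 \\ 0 & \text{if } p/2 + 1 \le j \le p. \end{cases}$$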
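The covariate construction above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names `make_covariates` and `population_corr` are hypothetical, and the values of `n`, `p`, `rho1`, and `rho2` in the usage lines are placeholders chosen for the example rather than the settings used in the experiments. The loading on the common factor is taken to be sqrt(ρ/(1−ρ)), which is the form that yields within-group correlation ρ.

```python
import numpy as np

def make_covariates(n, p, rho1, rho2, rng):
    """Generate an n-by-p design with two correlation groups, then
    center each column and scale it so that mean(X_j**2) == 1."""
    half = p // 2
    zeta = rng.standard_normal((n, p))     # idiosyncratic N(0,1) terms
    nu = rng.standard_normal((n, 1))       # common factor shared by all columns
    load = np.empty(p)
    load[:half] = np.sqrt(rho1 / (1.0 - rho1))
    load[half:] = np.sqrt(rho2 / (1.0 - rho2))
    xi = zeta + nu * load                  # broadcasts to shape (n, p)
    xi_centered = xi - xi.mean(axis=0)
    scale = np.sqrt((xi_centered ** 2).mean(axis=0))
    return xi_centered / scale

def population_corr(rho_a, rho_b):
    """Population correlation between two columns whose factor
    loadings are sqrt(rho/(1-rho)) for rho_a and rho_b."""
    la = np.sqrt(rho_a / (1.0 - rho_a))
    lb = np.sqrt(rho_b / (1.0 - rho_b))
    return la * lb / np.sqrt((1.0 + la ** 2) * (1.0 + lb ** 2))

# Illustrative values only (assumptions, not the paper's settings).
rng = np.random.default_rng(0)
X = make_covariates(n=200, p=10, rho1=0.5, rho2=0.1, rng=rng)
```

`population_corr` reproduces the three cases of the correlation formula: ρ₁ within the first group, ρ₂ within the second, and (ρ₁ρ₂)^{1/2} across groups.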

In addition, $n = [\ldots]$, $p = 5$, $\sigma_\varepsilon = [\ldots]$, $\rho_1 = .5$, and $\rho_2 = .[\ldots]$. The coefficient of interest is $\beta_1$. Experiments are reported with $d = 2$, 4, and 6 and with the LASSO and adaptive LASSO. The penalization parameter is obtained by minimizing the BIC.

Table 1 shows the mean-square errors of the estimates of $\beta_1$ obtained from applying OLS to the full model and to the model containing only the variables whose coefficients are large (the reduced model). These results are obtained analytically using the algebra of least squares. They show that the mean-square error is smaller when $\beta_1$ is estimated from the reduced model than when it is estimated from the full model.

Table 2 shows the results of estimation using the LASSO and adaptive LASSO. There are [...] Monte Carlo replications in each experiment. Both versions of the LASSO reduce the mean-square estimation error by about a factor of two relative to OLS estimation with the full model. Not surprisingly, neither version achieves the mean-square error that is achievable when the variables with large coefficients are known. The model selected by the LASSO has a higher probability of containing all the important covariates than does the model selected by the adaptive LASSO.

If $\beta_1$ is the coefficient of interest, it is reasonable to consider versions of the LASSO and adaptive LASSO in which $X_{i1}$ ($i = 1, \ldots, n$) is always in the chosen model. This can be achieved by leaving $\beta_1$ out of the penalty function. Table 3 shows the results of LASSO and adaptive LASSO estimation with $\beta_1$ not in the penalty function. Forcing $X_{i1}$ into the model greatly reduces the mean-square error of the adaptive LASSO estimator of $\beta_1$. It is essentially the same as the mean-square error that is obtained by applying OLS to the model with only the covariates with large coefficients.

5. An Empirical Example

This section presents an empirical example that illustrates the application of the LASSO and adaptive LASSO in a setting where many coefficients are plausibly small but non-zero. The application consists of estimating a wage equation for black males aged 4-49 years who reside in the northeastern U.S. The data are from the National Longitudinal Survey of Youth. There are 6 observations.
The dependent variable is the logarithm of the hourly wage. There are 4 covariates, including scores on sections of the armed forces qualification examination, indicators of education levels, a variety of personal characteristics, a binary indicator of marital status (married or not), and a binary indicator of membership in a labor union. The variables of interest in this example are marital status and union membership. Their coefficients measure the fractional change in the wage associated with being married or belonging to a labor union. It is arguable that all of the covariates affect productivity and, therefore, the hourly wage but that the effects of many covariates may be small.

Application of the LASSO and adaptive LASSO using the BIC to select the penalty parameter resulted in selection of 7 and 4 covariates, respectively. An asymptotic chi-square test does not reject the hypotheses that the coefficients of the variables not selected by the LASSO or adaptive LASSO are zero ($p > .6$). This implies that the values of these coefficients are small enough to be within random sampling error of zero. They are not necessarily equal to zero.

Table 4 shows the estimates and asymptotic standard errors of the two coefficients of interest that are obtained from applying ordinary least squares to the full model (all 4 covariates), the model selected by the LASSO, and the model selected by the adaptive LASSO. The three point estimates of the coefficient of labor union membership are similar, but the standard error of the estimate obtained from the full model is nearly twice as large as the standard errors obtained from the models selected by the LASSO and adaptive LASSO. The estimates of the coefficient of marital status obtained from the models selected by the LASSO and adaptive LASSO are nearly 4 times as large as the estimate obtained from the full model, and the standard errors of the estimates obtained from the selected models are about 55% of the standard error obtained with the full model.

6. Conclusions

In applications of regression analysis, it is often the case that there are many covariates whose coefficients are thought to be small but not necessarily zero and relatively few covariates with large coefficients. In such situations, the precision of estimating the large coefficients can be increased by leaving the covariates with small coefficients out of the model. However, it is rarely known a priori which coefficients are large and which are small. This paper has given conditions under which the adaptive LASSO correctly distinguishes between large coefficients and coefficients that are small but not necessarily zero. Specifically, we have shown that under a generalized sparsity condition and other mild regularity conditions, the adaptive LASSO correctly distinguishes between large and small coefficients with probability approaching one as the sample size increases.

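7. Proofs of Theorems

Let $\psi_2(v) = \exp(v^2) - 1$. The $\psi_2$-Orlicz norm $\|x\|_{\psi_2}$ of any random variable $x$ is defined as $\|x\|_{\psi_2} = \inf\{C > 0 : E\psi_2(|x|/C) \le 1\}$. The Orlicz norm is useful for obtaining maximal inequalities (Van der Vaart and Wellner 1996).

Lemma 2: Suppose that $\varepsilon_1, \ldots, \varepsilon_n$ are iid random variables with $E\varepsilon_i = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$. Suppose that $P(|\varepsilon_i| > z) \le K\exp(-Cz^2)$ for $i = 1, \ldots, n$ and constants $C$ and $K$. Then, for all constants $a_i$ satisfying $\sum_{i=1}^{n} a_i^2 = 1$,

$$\Big\| \sum_{i=1}^{n} a_i \varepsilon_i \Big\|_{\psi_2} \le K_2\left[\sigma + (1 + K)^{1/2} C^{-1/2}\right], \tag{6}$$

where $K_2$ is a constant. Consequently

$$g(t) \equiv \sup_{\|a\| = 1} P\Big( \Big|\sum_{i=1}^{n} a_i \varepsilon_i\Big| > t \Big) \le \exp(-t^2/M) \tag{7}$$

for some constant $M$ that depends only on $K$ and $C$.

Proof of Theorem 1: By the Karush-Kuhn-Tucker conditions, $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)'$ is the unique adaptive LASSO estimator if

$$\begin{cases} x_j'(y - X\hat\beta) = \lambda_2 w_j\, \mathrm{sgn}(\hat\beta_j) & \text{if } \hat\beta_j \ne 0 \\ |x_j'(y - X\hat\beta)| \le \lambda_2 w_j & \text{if } \hat\beta_j = 0 \end{cases} \tag{8}$$

and the vectors $\{x_j : \hat\beta_j \ne 0\}$ are linearly independent. Let $s_{A_1} = \{w_j\, \mathrm{sgn}(\beta_j) : j \in A_1\}$ and

$$\hat\beta_{A_1} = (X_{A_1}'X_{A_1})^{-1}(X_{A_1}'y - \lambda_2 s_{A_1}) = \beta_{A_1} + n^{-1} C_{A_1}^{-1}\left[ X_{A_1}'(\varepsilon + X_{A_0}\beta_{A_0}) - \lambda_2 s_{A_1} \right], \tag{9}$$

where $C_{A_1} = X_{A_1}'X_{A_1}/n$. If $\mathrm{sgn}(\hat\beta_{A_1}) = \mathrm{sgn}(\beta_{A_1})$, then the first part of (8) holds for $\hat\beta = (\hat\beta_{A_1}', 0')'$, where $0$ is a vector of zeros with length $|A_0|$. Let $\beta^* = (\beta_{A_1}', 0')'$. To prove the theorem, it suffices to show that $P[\mathrm{sgn}(\hat\beta) = \mathrm{sgn}(\beta^*)] \to 1$.

Since $X\hat\beta = X_{A_1}\hat\beta_{A_1}$ for this $\hat\beta$ and $\{x_j : j \in A_1\}$ are linearly independent,

$$\mathrm{sgn}(\hat\beta) = \mathrm{sgn}(\beta^*) \ \text{ if } \ \begin{cases} \mathrm{sgn}(\hat\beta_{A_1}) = \mathrm{sgn}(\beta_{A_1}) \\ |x_j'(y - X_{A_1}\hat\beta_{A_1})| \le \lambda_2 w_j \ \ \forall j \notin A_1. \end{cases} \tag{10}$$

Let $H = I - X_{A_1} C_{A_1}^{-1} X_{A_1}'/n$ be the projection onto the null of $X_{A_1}'$, where $I$ is the $n \times n$ identity matrix. From (9), we have

$$y - X_{A_1}\hat\beta_{A_1} = H\varepsilon + \lambda_2 X_{A_1} C_{A_1}^{-1} s_{A_1}/n + H X_{A_0}\beta_{A_0}. \tag{11}$$

By (10) and (11), $\mathrm{sgn}(\hat\beta) = \mathrm{sgn}(\beta^*)$ if

$$\begin{cases} \mathrm{sgn}(\beta_j)(\beta_j - \hat\beta_j) < |\beta_j| & \forall j \in A_1 \\ \left| x_j'\left( H\varepsilon + \lambda_2 X_{A_1} C_{A_1}^{-1} s_{A_1}/n + H X_{A_0}\beta_{A_0} \right) \right| \le \lambda_2 w_j & \forall j \notin A_1. \end{cases} \tag{12}$$

Thus, by (9) and (12), for any $0 < \kappa < \kappa + \nu < 1$,

$$\begin{aligned} P\{\mathrm{sgn}(\hat\beta) \ne \mathrm{sgn}(\beta^*)\} \le{} & P\{ |e_j' C_{A_1}^{-1} X_{A_1}' X_{A_0}\beta_{A_0}|/n \ge |\beta_j|/3 \text{ for some } j \in A_1 \} \\ & + P\{ |e_j' C_{A_1}^{-1} X_{A_1}' \varepsilon|/n \ge |\beta_j|/3 \text{ for some } j \in A_1 \} \\ & + P\{ \lambda_2 |e_j' C_{A_1}^{-1} s_{A_1}|/n \ge |\beta_j|/3 \text{ for some } j \in A_1 \} \\ & + P\{ |x_j' H \varepsilon| \ge \lambda_2 w_j/3 \text{ for some } j \notin A_1 \} \\ & + P\{ |x_j' X_{A_1} C_{A_1}^{-1} s_{A_1}|/n \ge w_j/3 \text{ for some } j \notin A_1 \} \\ & + P\{ |x_j' H X_{A_0}\beta_{A_0}| \ge \lambda_2 w_j/3 \text{ for some } j \notin A_1 \} \\ \equiv{} & P(B_1) + P(B_2) + P(B_3) + P(B_4) + P(B_5) + P(B_6), \end{aligned}$$

where $e_j$ is the unit vector in the direction of the $j$th coordinate.

Consider $B_1$. Because

$$|e_j' C_{A_1}^{-1} X_{A_1}' X_{A_0}\beta_{A_0}|/n \le \|e_j' C_{A_1}^{-1} X_{A_1}'\|\, \|X_{A_0}\beta_{A_0}\|/n \le \tau_1^{-1/2}\eta_1,$$

we have $P(B_1) \to 0$ by (A5).

Now consider $B_2$. Because $\|e_j' C_{A_1}^{-1} X_{A_1}'\|/n \le (n\tau_1)^{-1/2}$ and $|\beta_j| \ge b_1$ for $j \in A_1$,

$$P(B_2) = P\left( |e_j' C_{A_1}^{-1} X_{A_1}'\varepsilon|/n \ge |\beta_j|/3 \ \ \exists j \in A_1 \right) \le q\, g\left[ b_1 (n\tau_1)^{1/2}/3 \right]$$

with the tail probability $g(t)$ in Lemma 2. Therefore, $P(B_2) \to 0$ by (A1), Lemma 2, (A4) and (A5).

Now $\|s_{A_1}\| = O_p[q^{1/2}/b_1]$. Therefore, by (A5),

$$\frac{\lambda_2}{n}\, |e_j' C_{A_1}^{-1} s_{A_1}| = O_p\left( \frac{\lambda_2\, q^{1/2}}{n \tau_1 b_1} \right) = o_p(b_1).$$

This gives $P(B_3) \to 0$.

For $B_4$, we have $w_j^{-1} = |\tilde\beta_j| \le O_p(h) + \eta_1$ for $j \notin A_1$. Since $\|x_j' H\| \le n^{1/2}$, for large $C$

$$P(B_4) \le P\left\{ |x_j' H \varepsilon|/n^{1/2} \ge (1/3)\lambda_2/[n^{1/2} C (h + \eta_1)] \ \ \exists j \notin A_1 \right\} + o(1) \le q\, g\left\{ (1/3)\lambda_2/[n^{1/2} C (h + \eta_1)] \right\} + o(1).$$

Therefore, by Lemma 2 and (A5), $P(B_4) \to 0$.

For $B_5$ we have

$$\frac{|x_j' X_{A_1} C_{A_1}^{-1} s_{A_1}|}{n\, w_j} \le \frac{\|x_j\|\, \|X_{A_1} C_{A_1}^{-1}\|\, \|s_{A_1}\|}{n\, w_j} \le \frac{q^{1/2}\,[O_p(h) + \eta_1]}{\tau_1^{1/2}\, b_1}.$$

Therefore, $P(B_5) \to 0$ by (A5).

Finally, for $B_6$ we have $|x_j' H X_{A_0}\beta_{A_0}| \le \|x_j\|\, \|X_{A_0}\beta_{A_0}\| \le n\eta_1$. Therefore,

$$\frac{|x_j' H X_{A_0}\beta_{A_0}|}{w_j} \le n\eta_1 |\tilde\beta_j| \le n\eta_1 [O_p(h) + \eta_1].$$

Therefore, $P(B_6) \to 0$ by (A5). This completes the proof.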
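The residual decomposition labelled (11) in the proof follows from (9) in one substitution. The lines below are a worked check rather than part of the original argument; they assume the notation $A_1$ for the index set of large coefficients, $A_0$ for its complement, $y = X_{A_1}\beta_{A_1} + X_{A_0}\beta_{A_0} + \varepsilon$, $C_{A_1} = X_{A_1}'X_{A_1}/n$, and $H = I - X_{A_1}C_{A_1}^{-1}X_{A_1}'/n$.

```latex
\begin{aligned}
y - X_{A_1}\hat\beta_{A_1}
  &= X_{A_1}\beta_{A_1} + X_{A_0}\beta_{A_0} + \varepsilon
     - X_{A_1}\Big\{\beta_{A_1} + n^{-1}C_{A_1}^{-1}
       \big[X_{A_1}'(\varepsilon + X_{A_0}\beta_{A_0}) - \lambda_2 s_{A_1}\big]\Big\} \\
  &= \big(I - n^{-1}X_{A_1}C_{A_1}^{-1}X_{A_1}'\big)(\varepsilon + X_{A_0}\beta_{A_0})
     + \lambda_2 X_{A_1}C_{A_1}^{-1}s_{A_1}/n \\
  &= H\varepsilon + \lambda_2 X_{A_1}C_{A_1}^{-1}s_{A_1}/n + H X_{A_0}\beta_{A_0}.
\end{aligned}
```

The three terms on the last line are exactly the quantities bounded in the events $B_4$, $B_5$, and $B_6$ after multiplying by $x_j'$ for $j \notin A_1$.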

REFERENCES

Bickel, P.J., Y. Ritov, and A.B. Tsybakov (2009). Simultaneous analysis of Lasso and Dantzig selector, Annals of Statistics, 37.

Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96.

Fan, J. and H. Peng (2004). Nonconcave penalized likelihood with a diverging number of parameters, Annals of Statistics, 32.

Fan, J., H. Peng, and T. Huang (2005). Semilinear high-dimensional model for normalization of microarray data, Journal of the American Statistical Association, 100.

Greenshtein, E. and Y. Ritov (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparameterization, Bernoulli, 10.

Huang, J., J.L. Horowitz, and S. Ma (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models, Annals of Statistics, 36.

Huang, J., S. Ma, and C.-H. Zhang (2008). Adaptive Lasso for sparse high-dimensional regression models, Statistica Sinica, 18.

Knight, K. and W.J. Fu (2000). Asymptotics for lasso-type estimators, Annals of Statistics, 28.

Lv, J. and Y. Fan (2009). A unified approach to model selection and sparse recovery using regularized least squares, Annals of Statistics, 37.

Meinshausen, N. and P. Bühlmann (2006). High dimensional graphs and variable selection with the Lasso, Annals of Statistics, 34.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society Series B, 58.

Van der Vaart, A.W. and J.A. Wellner (1996). Weak Convergence and Empirical Processes, New York: Springer-Verlag.

Zhang, C.-H. and J. Huang (2008). The sparsity and bias of the Lasso selection in high-dimensional linear models, Annals of Statistics, 36.

Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_1$ penalization, Annals of Statistics, 37.

Zhao, P. and B. Yu (2006). On model selection consistency of LASSO, Journal of Machine Learning Research, 7.

Zou, H. (2006). The adaptive Lasso and its oracle properties, Journal of the American Statistical Association, 101.

Zou, H. and H.H. Zhang (2009). On the adaptive elastic-net with a diverging number of parameters, Annals of Statistics, 37.

TABLE 1: Mean Square Errors of OLS Estimates of $\beta_1$ from Full and Reduced Models

Columns: d | Mean-Square Error of Estimate of $\beta_1$: Full Model | Reduced Model

TABLE 2: Results of LASSO and Adaptive LASSO Estimation

Columns: d | MSE of $\hat\beta_1$ | Average Size of Selected Model | Prob. that Selected Model Contains Large Variables | Average $\lambda$
Row groups: LASSO | ADAPTIVE LASSO

TABLE 3: Results of LASSO and Adaptive LASSO Estimation with $\beta_1$ Not in the Penalty Function

Columns: d | MSE of $\hat\beta_1$ | Average Size of Selected Model | Prob. that Selected Model Contains Large Variables | Average $\lambda$
Row groups: LASSO | ADAPTIVE LASSO

TABLE 4: Results of Estimating Effects of Union Membership and Marital Status on Wages

                  Coefficient (Standard Error) Obtained from
Variable          OLS        LASSO      Adaptive LASSO
Union Member      .          .          .
                  (.7)       (.96)      (.94)
Marital Status    .5         .9         .
                  (.9)       (.)        (.)
