Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning

Yi Yang, Heng Tao Shen, Zhigang Ma, Zi Huang, Xiaofang Zhou
School of Information Technology & Electrical Engineering, The University of Queensland
Department of Information Engineering & Computer Science, University of Trento
yangyi zju@yahoo.com.cn, shenht@itee.uq.edu.au, ma@disi.unitn.it, {huang, zxf}@itee.uq.edu.au

Abstract

Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., the manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and $\ell_{2,1}$-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm selects the most discriminative feature subset from the whole feature set in batch mode. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.

Introduction

In many areas, such as computer vision, pattern recognition and biological study, data are represented by high dimensional feature vectors. Feature selection aims to select a subset of features from the high dimensional feature set for a compact and accurate data representation. It plays a twofold role in improving the performance of data analysis. First, the dimension of the selected feature subset is much lower, making subsequent computation on the input data more efficient. Second, noisy features are eliminated, yielding a better data representation and consequently more accurate clustering and classification results. In recent years feature selection has attracted much research attention, and several new feature selection algorithms have been proposed for a variety of applications.

Feature selection algorithms can be roughly classified into two groups, i.e., supervised feature selection and unsupervised feature selection. Supervised feature selection algorithms, e.g., Fisher score [Duda et al., 2001], robust regression [Nie et al., 2010], sparse multi-output regression [Zhao et al., 2010] and trace ratio [Nie et al., 2008], usually select features according to the labels of the training data. Because discriminative information is enclosed in the labels, supervised feature selection is usually able to select discriminative features. In unsupervised scenarios, however, there is no label information directly available, making it much more difficult to select the discriminative features. A frequently used criterion in unsupervised learning is to select the features which best preserve the data similarity or manifold structure derived from the whole feature set [He et al., 2005; Zhao and Liu, 2007; Cai et al., 2010]. However, discriminative information is then neglected, even though it has been demonstrated to be important in data analysis [Fukunaga, 1991].

Most of the traditional supervised and unsupervised feature selection algorithms evaluate the importance of each feature individually [Duda et al., 2001; He et al., 2005; Zhao and Liu, 2007] and select features one by one. A limitation is that the correlation among features is neglected [Zhao et al., 2010; Cai et al., 2010]. More recently, researchers have applied a two-step approach, i.e., spectral regression, to supervised and unsupervised feature selection [Zhao et al., 2010; Cai et al., 2010]. These efforts have shown that it is better to evaluate the importance of the selected features jointly. In this paper, we propose a new unsupervised feature selection algorithm which simultaneously exploits discriminative information and feature correlations. Because we utilize local discriminative information, the manifold structure is considered too.
While [Zhao et al., 2010; Cai et al., 2010] also select features in batch mode, our algorithm is a one-step approach and is able to select the discriminative features for unsupervised learning. We also propose an efficient algorithm to optimize the problem.

The Objective Function

In this section, we give the objective function of the proposed Unsupervised Discriminative Feature Selection (UDFS) algorithm. In the next section, we propose an efficient algorithm to optimize this objective function. It is worth mentioning that UDFS aims to select the most discriminative features for data representation, where manifold structure is considered, making it different from the existing unsupervised feature selection algorithms.

Denote $X = \{x_1, x_2, \dots, x_n\}$ as the training set, where $x_i \in \mathbb{R}^d$ ($1 \le i \le n$) is the $i$-th datum and $n$ is the total number of training data. In this paper, $I$ denotes an identity matrix. For a constant $m$, $\mathbf{1}_m \in \mathbb{R}^m$ is a column vector with all of its elements being 1, and $H_m = I - \frac{1}{m} \mathbf{1}_m \mathbf{1}_m^T \in \mathbb{R}^{m \times m}$. For an arbitrary matrix $A \in \mathbb{R}^{r \times p}$, its $\ell_{2,1}$-norm is defined as

  $\|A\|_{2,1} = \sum_{i=1}^{r} \sqrt{\sum_{j=1}^{p} A_{ij}^2}.$  (1)

Suppose the $n$ training data $x_1, x_2, \dots, x_n$ are sampled from $c$ classes and there are $n_i$ samples in the $i$-th class. We define $y_i \in \{0,1\}^c$ ($1 \le i \le n$) as the label vector of $x_i$: the $j$-th element of $y_i$ is 1 if $x_i$ belongs to the $j$-th class, and 0 otherwise. $Y = [y_1, y_2, \dots, y_n]^T \in \{0,1\}^{n \times c}$ is the label matrix. The total scatter matrix $S_t$ and the between-class scatter matrix $S_b$ are defined as follows [Fukunaga, 1991]:

  $S_t = \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T = \tilde{X}\tilde{X}^T$  (2)

  $S_b = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T = \tilde{X} G G^T \tilde{X}^T$  (3)

where $\mu$ is the mean of all samples, $\mu_i$ is the mean of the samples in the $i$-th class, $n_i$ is the number of samples in the $i$-th class, $\tilde{X} = X H_n$ is the data matrix after centering, and $G = [G_1, \dots, G_n]^T = Y (Y^T Y)^{-1/2}$ is the scaled label matrix.

A well-known method to utilize discriminative information is to find a low dimensional subspace in which $S_b$ is maximized while $S_t$ is minimized [Fukunaga, 1991]. Recently, some researchers proposed two different new algorithms to exploit local discriminative information [Sugiyama, 2006; Yang et al., 2010b] for classification and image clustering, demonstrating that local discriminative information is more important than global information. Inspired by this, for each data point $x_i$ we construct a local set $N_k(x_i)$ comprising $x_i$ and its $k$ nearest neighbors $x_{i_1}, \dots, x_{i_k}$. Denote $X_i = [x_i, x_{i_1}, \dots, x_{i_k}]$ as the local data matrix. Similar to (2) and (3), the local total scatter matrix $S_t^{(i)}$ and between-class scatter matrix $S_b^{(i)}$ of $N_k(x_i)$ are defined as follows:

  $S_t^{(i)} = \tilde{X}_i \tilde{X}_i^T;$  (4)

  $S_b^{(i)} = \tilde{X}_i G^{(i)} G^{(i)T} \tilde{X}_i^T,$  (5)

where $\tilde{X}_i = X_i H_{k+1}$ and $G^{(i)} = [G_i, G_{i_1}, \dots, G_{i_k}]^T$. For ease of representation, we define the selection matrix $S_i \in \{0,1\}^{n \times (k+1)}$ as follows:

  $(S_i)_{pq} = 1$ if $p = F_i\{q\}$, and $0$ otherwise,  (6)

where $F_i = \{i, i_1, \dots, i_k\}$. Because we focus on unsupervised learning, where no label information is available, $G$ cannot be defined directly as above. In order to make use of local discriminative information, we assume there is a linear classifier $W \in \mathbb{R}^{d \times c}$ which classifies each data point to a class, i.e., $G_i = W^T x_i$. Note that $G_i, G_{i_1}, \dots, G_{i_k}$ are selected from $G$, i.e., $G^{(i)} = S_i^T G$. Then we have

  $G^{(i)} = [G_i, G_{i_1}, \dots, G_{i_k}]^T = S_i^T G = S_i^T X^T W.$  (7)

It is worth noting that the proposed algorithm is an unsupervised one. In other words, $G$ defined in (7) is the output of the algorithm, i.e., $G_i = W^T x_i$, and is not provided by human supervisors. If some rows of $W$ shrink to zero, $W$ can be regarded as the combination coefficients for the different features that best predict the class labels of the training data. Next, we give the approach which learns a discriminative $W$ for feature selection.

Inspired by [Fukunaga, 1991; Yang et al., 2010b], we define the local discriminative score $DS_i$ of $x_i$ as

  $DS_i = \operatorname{tr}\big[ (S_t^{(i)} + \lambda I)^{-1} S_b^{(i)} \big] = \operatorname{tr}\big[ G^{(i)T} \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i G^{(i)} \big] = \operatorname{tr}\big[ W^T X S_i \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i S_i^T X^T W \big],$  (8)

where $\lambda$ is a parameter and $\lambda I$ is added to make the term $(\tilde{X}_i \tilde{X}_i^T + \lambda I)$ invertible. Clearly, a larger $DS_i$ indicates that $W$ has a higher discriminative ability w.r.t. the datum $x_i$. We intend to train a $W$ corresponding to the highest discriminative scores for all the training data $x_1, \dots, x_n$. Therefore, we propose to minimize (9) for feature selection:

  $\min_{W} \sum_{i=1}^{n} \Big\{ \operatorname{tr}\big[ G^{(i)T} H_{k+1} G^{(i)} \big] - DS_i \Big\} + \gamma \|W\|_{2,1}.$  (9)

Considering that the number of data in each local set is usually small, $\operatorname{tr}[G^{(i)T} H_{k+1} G^{(i)}]$ is added in (9) to avoid overfitting. The regularization term $\|W\|_{2,1}$ controls the capacity of $W$ and also ensures that $W$ is sparse in rows, making it particularly suitable for feature selection. Substituting $DS_i$ in (9) by (8), the objective function of our UDFS is given by

  $\min_{W^T W = I} \sum_{i=1}^{n} \operatorname{tr}\Big\{ W^T X S_i H_{k+1} S_i^T X^T W - W^T X S_i \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i S_i^T X^T W \Big\} + \gamma \|W\|_{2,1},$  (10)

where the orthogonal constraint is imposed to avoid arbitrary scaling and the trivial solution of all zeros.
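To make the notation concrete, here is a minimal NumPy sketch (our illustration; the paper itself contains no code) of the $\ell_{2,1}$-norm in (1) and the centering matrix $H_m$ used throughout:

```python
import numpy as np

def l21_norm(A):
    # Eq. (1): sum of the l2-norms of the rows of A.
    return np.sqrt((A ** 2).sum(axis=1)).sum()

def centering_matrix(m):
    # H_m = I - (1/m) 1_m 1_m^T; right-multiplying a d x m data matrix
    # by H_m subtracts the sample mean from every column.
    return np.eye(m) - np.ones((m, m)) / m

X = np.random.rand(3, 5)                  # d = 3 features, m = 5 samples
X_tilde = X @ centering_matrix(5)         # centered data, X_tilde = X H_m
assert np.allclose(X_tilde.mean(axis=1), 0.0)
print(l21_norm(np.array([[3.0, 4.0], [0.0, 0.0]])))  # 5.0: row norms 5 and 0
```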
Note that the first term of (10) is equivalent to the following:

  $\operatorname{tr}\Big\{ W^T X \Big[ \sum_{i=1}^{n} S_i \Big( H_{k+1} - \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i \Big) S_i^T \Big] X^T W \Big\}.$

Meanwhile, we have

  $H_{k+1} - \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i$
  $= H_{k+1} - H_{k+1} \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i H_{k+1}$
  $= H_{k+1} - H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} (\tilde{X}_i^T \tilde{X}_i + \lambda I) \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i H_{k+1}$
  $= H_{k+1} - H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} \tilde{X}_i^T \tilde{X}_i H_{k+1}$
  $= H_{k+1} - H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} (\tilde{X}_i^T \tilde{X}_i + \lambda I - \lambda I) H_{k+1}$
  $= \lambda H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} H_{k+1}.$

Therefore, the objective function of UDFS is rewritten as

  $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \|W\|_{2,1},$  (11)

(It can also be interpreted from a regression view [Yang et al., 2010a].)
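This chain of identities is easy to spot-check numerically. The following sketch (ours, on random data) verifies that $H_{k+1} - \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i$ equals $\lambda H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} H_{k+1}$ once the local data matrix is centered:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, lam = 6, 4, 0.1
H = np.eye(k + 1) - np.ones((k + 1, k + 1)) / (k + 1)   # H_{k+1}
Xi = rng.standard_normal((d, k + 1))                     # local data matrix X_i
Xt = Xi @ H                                              # centered: X_tilde_i

lhs = H - Xt.T @ np.linalg.inv(Xt @ Xt.T + lam * np.eye(d)) @ Xt
rhs = lam * H @ np.linalg.inv(Xt.T @ Xt + lam * np.eye(k + 1)) @ H
print(np.allclose(lhs, rhs))  # True
```

The practical payoff, reflected in line 2 of Algorithm 1 below, is that the $d \times d$ inverse is replaced by a much smaller $(k+1) \times (k+1)$ inverse.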

where

  $M = X \Big[ \sum_{i=1}^{n} S_i H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} H_{k+1} S_i^T \Big] X^T.$  (12)

Denote $w^i$ as the $i$-th row of $W$, i.e., $W = [w^1, \dots, w^d]^T$. The objective function shown in (11) can also be written as

  $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \sum_{i=1}^{d} \|w^i\|.$  (13)

We can see that many rows of the optimal $W$ corresponding to (13) shrink to zeros. (In practice, many rows of the optimal $W$ are close to zero rather than exactly zero.) Consequently, for a datum $x_i$, $\hat{x}_i = W^T x_i$ is a new representation of $x_i$ using only a small set of selected features. Alternatively, we can rank the features $f_i|_{i=1}^{d}$ according to $\|w^i\|$ in descending order and select the top ranked features.

Optimization of the UDFS Algorithm

The $\ell_{2,1}$-norm minimization problem has been studied in several previous works, such as [Argyriou et al., 2008; Nie et al., 2010; Obozinski et al., 2008; Liu et al., 2009; Zhao et al., 2010; Yang et al., 2011]. However, it remains unclear how to directly apply the existing algorithms to optimize our objective function, where the orthogonal constraint $W^T W = I$ is imposed. In this section, inspired by [Nie et al., 2010], we give a new approach to solve the optimization problem shown in (11) for feature selection. We first describe the detailed approach of UDFS in Algorithm 1.

Algorithm 1: The UDFS algorithm.
 1: for $i = 1$ to $n$ do
 2:   $B_i = (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1}$;
 3:   $M_i = S_i H_{k+1} B_i H_{k+1} S_i^T$;
 4: $M = X \big( \sum_{i=1}^{n} M_i \big) X^T$;
 5: Set $t = 0$ and initialize $D_t \in \mathbb{R}^{d \times d}$ as an identity matrix;
 6: repeat
 7:   $P = M + \gamma D_t$;
 8:   $W_t = [p_1, \dots, p_c]$, where $p_1, \dots, p_c$ are the eigenvectors of $P$ corresponding to the $c$ smallest eigenvalues;
 9:   Update the diagonal matrix $D_{t+1}$, whose $i$-th diagonal element is $\frac{1}{2\|w_t^i\|}$;
10:   $t = t + 1$;
11: until convergence;
12: Sort the features $f_i|_{i=1}^{d}$ according to $\|w^i\|$ in descending order and select the top ranked ones.
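The following NumPy sketch is our reading of Algorithm 1, not the authors' code: it uses a brute-force k-nearest-neighbor search, a fixed iteration count in place of the convergence test, illustrative default values for λ and γ, and a small constant `eps` that anticipates the ς-regularized update of $D_{t+1}$ discussed after Theorem 1 below.

```python
import numpy as np

def udfs(X, c, k=5, lam=1e-6, gamma=0.1, n_iter=30, eps=1e-10):
    """Sketch of Algorithm 1. X: d x n data matrix, c: number of clusters.
    Returns W (d x c) and feature indices ranked by ||w^i|| (descending)."""
    d, n = X.shape
    H = np.eye(k + 1) - np.ones((k + 1, k + 1)) / (k + 1)      # H_{k+1}
    # Lines 1-4: M = X (sum_i S_i H B_i H S_i^T) X^T. Rather than forming the
    # sparse n x n matrix, accumulate each d x d contribution directly.
    M = np.zeros((d, d))
    dist = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # brute-force distances
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k + 1]      # x_i plus its k nearest neighbors
        Xi = X[:, nbrs]
        Xt = Xi @ H                              # centered local data X_tilde_i
        Bi = np.linalg.inv(Xt.T @ Xt + lam * np.eye(k + 1))    # line 2
        M += Xi @ (H @ Bi @ H) @ Xi.T            # lines 3-4 (S_i selects columns nbrs)
    # Lines 5-11: iteratively minimize tr(W^T M W) + gamma * ||W||_{2,1}.
    D = np.eye(d)
    for _ in range(n_iter):
        P = M + gamma * D                        # line 7
        _, vecs = np.linalg.eigh(P)              # ascending eigenvalues
        W = vecs[:, :c]                          # line 8: c smallest eigenvectors
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * np.sqrt(row_norms ** 2 + eps)))  # line 9
    # Line 12: rank features by the row norms of W.
    return W, np.argsort(-row_norms)
```

For example, `W, ranked = udfs(X, c)` followed by `ranked[:300]` would select 300 features; any off-the-shelf k-NN routine can replace the O(n²) distance computation, which is only kept for brevity.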

Next, we show that the iterative algorithm in Algorithm 1 converges, via the following theorem.

Theorem 1. The iterative approach in Algorithm 1 (lines 6 to 11) monotonically decreases the objective function value of $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \sum_{i=1}^{d} \|w^i\|$ in each iteration.

Proof. According to the definition of $W_t$ in line 8 of Algorithm 1, we have

  $W_t = \arg\min_{W^T W = I} \operatorname{tr}\big[ W^T (M + \gamma D_t) W \big].$  (17)

Therefore,

  $\operatorname{tr}\big[ W_{t+1}^T (M + \gamma D_{t+1}) W_{t+1} \big] \le \operatorname{tr}\big[ W_t^T (M + \gamma D_{t+1}) W_t \big]$

  $\Rightarrow \operatorname{tr}(W_{t+1}^T M W_{t+1}) + \gamma \sum_i \frac{\|w_{t+1}^i\|^2}{2\|w_t^i\|} \le \operatorname{tr}(W_t^T M W_t) + \gamma \sum_i \frac{\|w_t^i\|^2}{2\|w_t^i\|}.$

(When computing $D_{t+1}$, its $i$-th diagonal element is $\frac{1}{2\|w_t^i\|}$. In practice $\|w_t^i\|$ can be very close to zero but nonzero; theoretically, however, it can be exactly zero. In that case we follow the traditional regularization approach and define the diagonal element as $\frac{1}{2\sqrt{w_t^{iT} w_t^i + \varsigma}}$, where $\varsigma$ is a very small constant; as $\varsigma \to 0$, $\frac{1}{2\sqrt{w_t^{iT} w_t^i + \varsigma}}$ approximates $\frac{1}{2\|w_t^i\|}$.)

Then we have the following inequality:

  $\operatorname{tr}(W_{t+1}^T M W_{t+1}) + \gamma \sum_i \|w_{t+1}^i\| - \gamma \Big( \sum_i \|w_{t+1}^i\| - \sum_i \frac{\|w_{t+1}^i\|^2}{2\|w_t^i\|} \Big) \le \operatorname{tr}(W_t^T M W_t) + \gamma \sum_i \|w_t^i\| - \gamma \Big( \sum_i \|w_t^i\| - \sum_i \frac{\|w_t^i\|^2}{2\|w_t^i\|} \Big).$

Meanwhile, according to Lemma 2,

  $\sum_i \|w_{t+1}^i\| - \sum_i \frac{\|w_{t+1}^i\|^2}{2\|w_t^i\|} \le \sum_i \|w_t^i\| - \sum_i \frac{\|w_t^i\|^2}{2\|w_t^i\|}.$

Therefore,

  $\operatorname{tr}(W_{t+1}^T M W_{t+1}) + \gamma \sum_i \|w_{t+1}^i\| \le \operatorname{tr}(W_t^T M W_t) + \gamma \sum_i \|w_t^i\|,$

which indicates that the objective function value of $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \sum_{i=1}^{d} \|w^i\|$ monotonically decreases using the updating rule in Algorithm 1.

According to Theorem 1, the iterative approach in Algorithm 1 converges to the optimal $W$ corresponding to (13). Because $k$ is much smaller than $n$, the time complexity of computing $M$ defined in (12) is about $O(n)$. To optimize the objective function of UDFS, the most time consuming operation is the eigen-decomposition of $P$. Note that $P \in \mathbb{R}^{d \times d}$, so the time complexity of this operation is approximately $O(d^3)$.

Experiments

In this section, we test the performance of the proposed UDFS. Following [He et al., 2005; Cai et al., 2010], we evaluate the algorithm in terms of clustering performance.

Experiment Setup

In our experiments, we have collected a diversity of six public datasets to compare the performance of different unsupervised feature selection algorithms. These datasets include three face image datasets, i.e., UMIST (http://images.ee.umist.ac.uk/danny/database.html), FERET (http://www.frvt.org/feret/default.htm) and YALEB [Georghiades et al., 2001]; one gait image dataset, i.e., USF HumanID [Sarkar et al., 2005]; one spoken letter recognition dataset, i.e., Isolet (http://www.ics.uci.edu/~mlearn/mlsummary.html); and one handwritten digit image dataset, i.e., USPS [Hull, 1994]. Detailed information on the six datasets is summarized in Table 1.

Table 1: Database Description.

  Dataset        Size    # of Features   # of Classes
  UMIST           575         644              20
  FERET          1400        1296             200
  YALEB          2414        1024              38
  USF HumanID    5795        2816             122
  Isolet         1560         617              26
  USPS           9298         256              10

We compare the proposed UDFS with the following unsupervised feature selection algorithms:

- All Features, which adopts all the features for clustering. It is used as the baseline method in this paper.
- Max Variance, which selects the features corresponding to the maximum variances.
- Laplacian Score (LS) [He et al., 2005], which selects the features most consistent with the Gaussian Laplacian matrix.
- Feature Ranking (FR) [Zhao and Liu, 2007], which selects features using spectral regression.
- Multi-Cluster Feature Selection (MCFS) [Cai et al., 2010], which selects features using spectral regression with $\ell_1$-norm regularization.

For LS, MCFS and UDFS, we fix $k$, which specifies the size of the neighborhood, at 5 for all the datasets. For LS and FR, we need to tune the bandwidth parameter of the Gaussian kernel, and for MCFS and UDFS we need to tune the regularization parameter. To fairly compare the different unsupervised feature selection algorithms, we tune these parameters over $\{10^{-9}, 10^{-6}, 10^{-3}, 1, 10^{3}, 10^{6}, 10^{9}\}$. We set the number of selected features to $\{50, 100, 150, 200, 250, 300\}$ for the first five datasets. Because the total feature number of USPS is 256, we set the number of selected features to $\{50, 80, 110, 140, 170, 200\}$ for this dataset. We report the best results of all the algorithms over the different parameters.

In our experiments, each feature selection algorithm is first performed to select features, and then the K-means clustering algorithm is run on the selected features. Because the result of K-means clustering depends on initialization, it is repeated 20 times with random initializations, and we report the average results with standard deviation (std). Two evaluation metrics, i.e., Accuracy (ACC) and Normalized Mutual Information (NMI), are used in this paper. Denote $q_i$ as the clustering result and $p_i$ as the ground truth label of $x_i$. ACC is defined as

  $\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \delta\big(p_i, \mathrm{map}(q_i)\big),$  (18)

where $\delta(x, y) = 1$ if $x = y$ and $\delta(x, y) = 0$ otherwise, and $\mathrm{map}(\cdot)$ is the best mapping function that permutes clustering labels to match the ground truth labels, computed by the Kuhn-Munkres algorithm. A larger ACC indicates better performance.

Given two variables $P$ and $Q$, NMI is defined as

  $\mathrm{NMI}(P, Q) = \frac{I(P, Q)}{\sqrt{H(P) H(Q)}},$  (19)

where $I(P, Q)$ is the mutual information between $P$ and $Q$, and $H(P)$ and $H(Q)$ are the entropies of $P$ and $Q$ [Strehl and Ghosh, 2002]. Denote $t_l$ as the number of data in the cluster $C_l$ ($1 \le l \le c$) according to the clustering results and $\tilde{t}_h$ as the number of data in the $h$-th ground truth class ($1 \le h \le c$). NMI is computed as follows [Strehl and Ghosh, 2002]:

  $\mathrm{NMI} = \frac{\sum_{l=1}^{c} \sum_{h=1}^{c} t_{l,h} \log\big( \frac{n \, t_{l,h}}{t_l \tilde{t}_h} \big)}{\sqrt{\big( \sum_{l=1}^{c} t_l \log \frac{t_l}{n} \big) \big( \sum_{h=1}^{c} \tilde{t}_h \log \frac{\tilde{t}_h}{n} \big)}},$  (20)

where $t_{l,h}$ is the number of samples in the intersection between cluster $C_l$ and the $h$-th ground truth class. Again, a larger NMI indicates a better clustering result.
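Both metrics are straightforward to reproduce. The sketch below (our code) implements (18) using the Kuhn-Munkres assignment available in SciPy, and (20) directly from the contingency table:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_acc(truth, pred):
    # Eq. (18): best one-to-one mapping of cluster labels to classes
    # via the Kuhn-Munkres (Hungarian) algorithm.
    classes, clusters = np.unique(truth), np.unique(pred)
    cost = np.zeros((clusters.size, classes.size))
    for i, ql in enumerate(clusters):
        for j, pl in enumerate(classes):
            cost[i, j] = -np.sum((pred == ql) & (truth == pl))
    row, col = linear_sum_assignment(cost)
    return -cost[row, col].sum() / truth.size

def clustering_nmi(truth, pred):
    # Eq. (20): mutual information over the contingency table, normalized
    # by the geometric mean of the two entropies.
    n = truth.size
    classes, clusters = np.unique(truth), np.unique(pred)
    t = np.array([[np.sum((pred == ql) & (truth == pl)) for pl in classes]
                  for ql in clusters], dtype=float)        # t_{l,h}
    tl, th = t.sum(axis=1), t.sum(axis=0)                  # cluster / class sizes
    nz = t > 0
    mi = (t[nz] * np.log(n * t[nz] / np.outer(tl, th)[nz])).sum()
    hp = -(tl * np.log(tl / n)).sum()
    hq = -(th * np.log(th / n)).sum()
    return mi / np.sqrt(hp * hq)
```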

Table 2: Clustering Results (ACC% ± std) of Different Feature Selection Algorithms.

  Dataset        All Features   Max Variance   Laplacian Score   Feature Ranking   MCFS         UDFS
  UMIST          4.9 ± 3.       46. ± .3       46.3 ± 3.3        48. ± 3.7         46.5 ± 3.5   49. ± 3.8
  FERET          . ± .5         . ± .3         .4 ± .5           .8 ± .5           5. ± .7      6. ± .6
  YALEB          . ± .6         9.6 ± .3       .4 ± .6           3.3 ± .8          .4 ± .       4.7 ± .6
  USF HumanID    3. ± .6        .9 ± .5        8.8 ± .3          . ± .             3. ± .6      4.6 ± .8
  Isolet         57.8 ± 4.      56.6 ± .6      56.9 ± .9         57. ± .9          6. ± 4.4     66. ± 3.6
  USPS           6.9 ± 4.3      63.4 ± 3.      63.5 ± 3.         63.6 ± 3.         65.3 ± 5.4   65.8 ± 3.3

Table 3: Clustering Results (NMI% ± std) of Different Feature Selection Algorithms.

  Dataset        All Features   Max Variance   Laplacian Score   Feature Ranking   MCFS        UDFS
  UMIST          6.9 ± .4       63.6 ± .8      65. ± .           64.9 ± .6         65.9 ± .3   66.3 ± .
  FERET          6.7 ± .4       6.3 ± .4       63. ± .3          63.3 ± .5         64.8 ± .5   65.6 ± .4
  YALEB          4. ± .7        3. ± .4        8.4 ± .           .3 ± .9           8.8 ± .     5.4 ± .9
  USF HumanID    5.9 ± .4       49. ± .4       47.5 ± .          9.3 ± .3          5.6 ± .4    5.6 ± .5
  Isolet         74. ± .8       73. ± .        7. ± .            7.5 ± .7          75.5 ± .8   78. ± .3
  USPS           59. ± .5       59.6 ± .       6. ± .3           59.6 ± .          6. ± .7     6.6 ± .5

Experimental Results and Discussion

First, we compare the performance of the different feature selection algorithms. The experimental results are shown in Table 2 and Table 3. We can see from the two tables that the clustering results of All Features are better than those of Max Variance. However, because Max Variance significantly reduces the number of features, the subsequent operations, e.g., clustering, are faster, so it is more efficient. The results of the other feature selection algorithms are generally better than All Features, and they are also more efficient. Except for Max Variance, all of the other feature selection algorithms are non-linear approaches. We conclude that local structure is crucial for feature selection in many applications, which is consistent with previous work on feature selection [He et al., 2005].

We can also see from the two tables that MCFS attains the second best performance. Both Feature Ranking [Zhao and Liu, 2007] and MCFS [Cai et al., 2010] adopt a two-step approach, i.e., spectral regression, for feature selection. The difference is that Feature Ranking analyzes features separately and selects them one after another, whereas MCFS selects features in batch mode. This observation validates that it is better to analyze data features jointly for feature selection. Finally, we observe that the proposed UDFS algorithm obtains the best performance. There are two main reasons for this. First, UDFS analyzes features jointly. Second, UDFS simultaneously utilizes discriminative information and the local structure of the data distribution.

Next, we study the performance variation of UDFS with respect to the regularization parameter γ in (10) and the number of selected features. Due to the space limit, we use the three face image datasets as examples. The experimental results are shown in Fig. 1. We can see from Fig. 1 that the performance is not very sensitive to γ as long as it is smaller than 1. However, the performance is comparatively sensitive to the number of selected features. How to decide the number of selected features is data dependent and still an open problem.
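The evaluation protocol just described can be summarized in a short driver (ours; it assumes the `udfs`, `clustering_acc` and `clustering_nmi` sketches given earlier, and leaves dataset loading to the caller):

```python
import numpy as np
from sklearn.cluster import KMeans

def evaluate(X, labels, n_selected, c, n_runs=20):
    # X: d x n data matrix, labels: ground truth, c: number of clusters.
    _, ranked = udfs(X, c)                     # Algorithm 1 sketch above
    Xs = X[ranked[:n_selected], :].T           # n x n_selected selected features
    accs, nmis = [], []
    for seed in range(n_runs):                 # 20 random K-means initializations
        pred = KMeans(n_clusters=c, n_init=1, random_state=seed).fit_predict(Xs)
        accs.append(clustering_acc(labels, pred))
        nmis.append(clustering_nmi(labels, pred))
    return (np.mean(accs), np.std(accs)), (np.mean(nmis), np.std(nmis))
```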
Conclusion

While it has been shown in many previous works that discriminative information is beneficial to many applications, it is not straightforward to utilize it in unsupervised learning due to the lack of label information. In this paper, we have proposed a new unsupervised feature selection algorithm which is able to select discriminative features in batch mode. An efficient algorithm is proposed to optimize the $\ell_{2,1}$-norm regularized minimization problem with an orthogonal constraint. Different from existing algorithms, which select the features that best preserve the data structure of the whole feature set, the proposed UDFS selects discriminative features for unsupervised learning. We have shown that it is better to select discriminative features for data representation, and that UDFS outperforms existing unsupervised feature selection algorithms.

Acknowledgment

This work is supported by ARC DP1094678 and partially supported by the FP7-IP GLOCAL European project.

[Figure 1: Performance variation of UDFS w.r.t. different parameters. Panels (a)-(c) show ACC and panels (d)-(f) show NMI on UMIST, FERET and YALEB, as γ varies over {10^-9, 10^-6, 10^-3, 1, 10^3, 10^6, 10^9} and as the number of selected features varies.]

References

[Argyriou et al., 2008] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. In Machine Learning, 2008.

[Cai et al., 2010] Deng Cai, Chiyuan Zhang, and Xiaofei He. Unsupervised feature selection for multi-cluster data. In KDD, 2010.

[Duda et al., 2001] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification (2nd Edition). John Wiley & Sons, New York, USA, 2001.

[Fukunaga, 1991] K. Fukunaga. Introduction to Statistical Pattern Recognition (2nd Edition). Academic Press Professional, Inc., San Diego, USA, 1991.

[Georghiades et al., 2001] A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE TPAMI, 23(6):643-660, 2001.

[He et al., 2005] Xiaofei He, Deng Cai, and Partha Niyogi. Laplacian score for feature selection. In NIPS, 2005.

[Hull, 1994] J.J. Hull. A database for handwritten text recognition research. IEEE TPAMI, 16(5):550-554, 1994.

[Liu et al., 2009] Jun Liu, Shuiwang Ji, and Jieping Ye. Multi-task feature learning via efficient l2,1-norm minimization. In UAI, 2009.

[Nie et al., 2008] Feiping Nie, Shiming Xiang, Yangqing Jia, Changshui Zhang, and Shuicheng Yan. Trace ratio criterion for feature selection. In AAAI, 2008.

[Nie et al., 2010] Feiping Nie, Heng Huang, Xiao Cai, and Chris Ding. Efficient and robust feature selection via joint l2,1-norms minimization. In NIPS, 2010.

[Obozinski et al., 2008] G. Obozinski, M.J. Wainwright, and M.I. Jordan. High-dimensional union support recovery in multivariate regression. In NIPS, 2008.

[Sarkar et al., 2005] S. Sarkar, P.J. Phillips, Z. Liu, I.R. Vega, P. Grother, and K.W. Bowyer. The humanID gait challenge problem: data sets, performance, and analysis. IEEE TPAMI, 27(2):162-177, 2005.

[Strehl and Ghosh, 2002] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583-617, 2002.

[Sugiyama, 2006] Masashi Sugiyama. Local Fisher discriminant analysis for supervised dimensionality reduction. In ICML, 2006.

[Yang et al., 2010a] Yi Yang, Feiping Nie, Shiming Xiang, Yueting Zhuang, and Wenhua Wang. Local and global regressive mapping for manifold learning with out-of-sample extrapolation. In AAAI, 2010.

[Yang et al., 2010b] Yi Yang, Dong Xu, Feiping Nie, Shuicheng Yan, and Yueting Zhuang. Image clustering using local discriminant models and global integration. IEEE TIP, 19(10):2761-2773, 2010.

[Yang et al., 2011] Yang Yang, Yi Yang, Zi Huang, Heng Tao Shen, and Feiping Nie. Tag localization with spatial correlations and joint group sparsity. In CVPR, pages 881-888, 2011.

[Zhao and Liu, 2007] Zheng Zhao and Huan Liu. Spectral feature selection for supervised and unsupervised learning. In ICML, 2007.

[Zhao et al., 2010] Z. Zhao, L. Wang, and H. Liu. Efficient spectral feature selection with minimum redundancy. In AAAI, 2010.