Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning

Yi Yang, Heng Tao Shen, Zhigang Ma, Zi Huang, Xiaofang Zhou
School of Information Technology & Electrical Engineering, The University of Queensland
Department of Information Engineering & Computer Science, University of Trento
yangyi zju@yahoo.com.cn, shenht@itee.uq.edu.au, ma@disi.unitn.it, {huang, zxf}@itee.uq.edu.au

Abstract

Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., the manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and $\ell_{2,1}$-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm selects the most discriminative feature subset from the whole feature set in batch mode. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.

Introduction

In many areas, such as computer vision, pattern recognition and biological study, data are represented by high dimensional feature vectors. Feature selection aims to select a subset of features from the high dimensional feature set for a compact and accurate data representation. It plays a twofold role in improving the performance of data analysis. First, the dimension of the selected feature subset is much lower, making subsequent computation on the input data more efficient. Second, noisy features are eliminated, yielding a better data representation and consequently more accurate clustering and classification results. In recent years feature selection has attracted much research attention, and several new feature selection algorithms have been proposed for a variety of applications.

Feature selection algorithms can be roughly classified into two groups, i.e., supervised feature selection and unsupervised feature selection. Supervised feature selection algorithms, e.g., Fisher score [Duda et al., 2001], robust regression [Nie et al., 2010], sparse multi-output regression [Zhao et al., 2010] and trace ratio [Nie et al., 2008], usually select features according to the labels of the training data. Because discriminative information is enclosed in the labels, supervised feature selection is usually able to select discriminative features. In unsupervised scenarios, however, there is no label information directly available, making it much more difficult to select the discriminative features. A frequently used criterion in unsupervised learning is to select the features which best preserve the data similarity or manifold structure derived from the whole feature set [He et al., 2005; Zhao and Liu, 2007; Cai et al., 2010]. However, discriminative information is then neglected, even though it has been demonstrated to be important in data analysis [Fukunaga, 1991].

Most of the traditional supervised and unsupervised feature selection algorithms evaluate the importance of each feature individually [Duda et al., 2001; He et al., 2005; Zhao and Liu, 2007] and select features one by one. A limitation is that the correlation among features is neglected [Zhao et al., 2010; Cai et al., 2010]. More recently, researchers have applied a two-step approach, i.e., spectral regression, to supervised and unsupervised feature selection [Zhao et al., 2010; Cai et al., 2010]. These efforts have shown that it is better to evaluate the importance of the selected features jointly. In this paper, we propose a new unsupervised feature selection algorithm which simultaneously exploits discriminative information and feature correlations. Because we utilize local discriminative information, the manifold structure is considered too.
While [Zhao et al., 2010; Cai et al., 2010] also select features in batch mode, our algorithm is a one-step approach and is able to select the discriminative features for unsupervised learning. We also propose an efficient algorithm to optimize the problem.

The Objective Function

In this section, we give the objective function of the proposed Unsupervised Discriminative Feature Selection (UDFS) algorithm. In the next section, we propose an efficient algorithm to optimize this objective function. It is worth mentioning that UDFS aims to select the most discriminative features for data representation, where manifold structure is considered, making it different from the existing unsupervised feature selection algorithms.

Denote $X = \{x_1, x_2, \dots, x_n\}$ as the training set, where $x_i \in \mathbb{R}^d$ ($1 \le i \le n$) is the $i$-th datum and $n$ is the total number of training data. In this paper, $I$ denotes an identity matrix. For a constant $m$, $\mathbf{1}_m \in \mathbb{R}^m$ is a column vector with all of its elements being 1, and $H_m = I - \frac{1}{m} \mathbf{1}_m \mathbf{1}_m^T \in \mathbb{R}^{m \times m}$. For an arbitrary matrix $A \in \mathbb{R}^{r \times p}$, its $\ell_{2,1}$-norm is defined as

  $\|A\|_{2,1} = \sum_{i=1}^{r} \sqrt{\sum_{j=1}^{p} A_{ij}^2}.$  (1)

Suppose the $n$ training data $x_1, x_2, \dots, x_n$ are sampled from $c$ classes and there are $n_i$ samples in the $i$-th class. We define $y_i \in \{0,1\}^c$ ($1 \le i \le n$) as the label vector of $x_i$: the $j$-th element of $y_i$ is 1 if $x_i$ belongs to the $j$-th class, and 0 otherwise. $Y = [y_1, y_2, \dots, y_n]^T \in \{0,1\}^{n \times c}$ is the label matrix. The total scatter matrix $S_t$ and the between-class scatter matrix $S_b$ are defined as follows [Fukunaga, 1991]:

  $S_t = \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T = \tilde{X}\tilde{X}^T$  (2)

  $S_b = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T = \tilde{X} G G^T \tilde{X}^T$  (3)

where $\mu$ is the mean of all samples, $\mu_i$ is the mean of the samples in the $i$-th class, $n_i$ is the number of samples in the $i$-th class, $\tilde{X} = X H_n$ is the data matrix after centering, and $G = [G_1, \dots, G_n]^T = Y (Y^T Y)^{-1/2}$ is the scaled label matrix.

A well-known method to utilize discriminative information is to find a low dimensional subspace in which $S_b$ is maximized while $S_t$ is minimized [Fukunaga, 1991]. Recently, some researchers proposed two different new algorithms to exploit local discriminative information [Sugiyama, 2006; Yang et al., 2010b] for classification and image clustering, demonstrating that local discriminative information is more important than global information. Inspired by this, for each data point $x_i$ we construct a local set $N_k(x_i)$ comprising $x_i$ and its $k$ nearest neighbors $x_{i_1}, \dots, x_{i_k}$. Denote $X_i = [x_i, x_{i_1}, \dots, x_{i_k}]$ as the local data matrix. Similar to (2) and (3), the local total scatter matrix $S_t^{(i)}$ and between-class scatter matrix $S_b^{(i)}$ of $N_k(x_i)$ are defined as follows:

  $S_t^{(i)} = \tilde{X}_i \tilde{X}_i^T;$  (4)

  $S_b^{(i)} = \tilde{X}_i G^{(i)} G^{(i)T} \tilde{X}_i^T,$  (5)

where $\tilde{X}_i = X_i H_{k+1}$ and $G^{(i)} = [G_i, G_{i_1}, \dots, G_{i_k}]^T$. For ease of representation, we define the selection matrix $S_i \in \{0,1\}^{n \times (k+1)}$ as follows:

  $(S_i)_{pq} = 1$ if $p = F_i\{q\}$, and $0$ otherwise,  (6)

where $F_i = \{i, i_1, \dots, i_k\}$. Because we focus on unsupervised learning, where no label information is available, $G$ cannot be defined directly as above. In order to make use of local discriminative information, we assume there is a linear classifier $W \in \mathbb{R}^{d \times c}$ which classifies each data point to a class, i.e., $G_i = W^T x_i$. Note that $G_i, G_{i_1}, \dots, G_{i_k}$ are selected from $G$, i.e., $G^{(i)} = S_i^T G$. Then we have

  $G^{(i)} = [G_i, G_{i_1}, \dots, G_{i_k}]^T = S_i^T G = S_i^T X^T W.$  (7)

It is worth noting that the proposed algorithm is an unsupervised one. In other words, $G$ defined in (7) is the output of the algorithm, i.e., $G_i = W^T x_i$, and is not provided by human supervisors. If some rows of $W$ shrink to zero, $W$ can be regarded as the combination coefficients for the different features that best predict the class labels of the training data. Next, we give the approach which learns a discriminative $W$ for feature selection.

Inspired by [Fukunaga, 1991; Yang et al., 2010b], we define the local discriminative score $DS_i$ of $x_i$ as

  $DS_i = \operatorname{tr}\big[ (S_t^{(i)} + \lambda I)^{-1} S_b^{(i)} \big] = \operatorname{tr}\big[ G^{(i)T} \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i G^{(i)} \big] = \operatorname{tr}\big[ W^T X S_i \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i S_i^T X^T W \big],$  (8)

where $\lambda$ is a parameter and $\lambda I$ is added to make the term $(\tilde{X}_i \tilde{X}_i^T + \lambda I)$ invertible. Clearly, a larger $DS_i$ indicates that $W$ has a higher discriminative ability w.r.t. the datum $x_i$. We intend to train a $W$ corresponding to the highest discriminative scores for all the training data $x_1, \dots, x_n$. Therefore, we propose to minimize (9) for feature selection:

  $\min_{W} \sum_{i=1}^{n} \Big\{ \operatorname{tr}\big[ G^{(i)T} H_{k+1} G^{(i)} \big] - DS_i \Big\} + \gamma \|W\|_{2,1}.$  (9)

Considering that the number of data in each local set is usually small, $\operatorname{tr}[G^{(i)T} H_{k+1} G^{(i)}]$ is added in (9) to avoid overfitting. The regularization term $\|W\|_{2,1}$ controls the capacity of $W$ and also ensures that $W$ is sparse in rows, making it particularly suitable for feature selection. Substituting $DS_i$ in (9) by (8), the objective function of our UDFS is given by

  $\min_{W^T W = I} \sum_{i=1}^{n} \operatorname{tr}\Big\{ W^T X S_i H_{k+1} S_i^T X^T W - W^T X S_i \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i S_i^T X^T W \Big\} + \gamma \|W\|_{2,1},$  (10)

where the orthogonal constraint is imposed to avoid arbitrary scaling and the trivial solution of all zeros.
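To make the notation concrete, here is a minimal NumPy sketch (our illustration; the paper itself contains no code) of the $\ell_{2,1}$-norm in (1) and the centering matrix $H_m$ used throughout:

```python
import numpy as np

def l21_norm(A):
    # Eq. (1): sum of the l2-norms of the rows of A.
    return np.sqrt((A ** 2).sum(axis=1)).sum()

def centering_matrix(m):
    # H_m = I - (1/m) 1_m 1_m^T; right-multiplying a d x m data matrix
    # by H_m subtracts the sample mean from every column.
    return np.eye(m) - np.ones((m, m)) / m

X = np.random.rand(3, 5)                  # d = 3 features, m = 5 samples
X_tilde = X @ centering_matrix(5)         # centered data, X_tilde = X H_m
assert np.allclose(X_tilde.mean(axis=1), 0.0)
print(l21_norm(np.array([[3.0, 4.0], [0.0, 0.0]])))  # 5.0: row norms 5 and 0
```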
Note that the first term of (10) is equivalent to the following:

  $\operatorname{tr}\Big\{ W^T X \Big[ \sum_{i=1}^{n} S_i \Big( H_{k+1} - \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i \Big) S_i^T \Big] X^T W \Big\}.$

Meanwhile, we have

  $H_{k+1} - \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i$
  $= H_{k+1} - H_{k+1} \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i H_{k+1}$
  $= H_{k+1} - H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} (\tilde{X}_i^T \tilde{X}_i + \lambda I) \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i H_{k+1}$
  $= H_{k+1} - H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} \tilde{X}_i^T \tilde{X}_i H_{k+1}$
  $= H_{k+1} - H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} (\tilde{X}_i^T \tilde{X}_i + \lambda I - \lambda I) H_{k+1}$
  $= \lambda H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} H_{k+1}.$

Therefore, the objective function of UDFS is rewritten as

  $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \|W\|_{2,1},$  (11)

(It can also be interpreted from a regression view [Yang et al., 2010a].)
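This chain of identities is easy to spot-check numerically. The following sketch (ours, on random data) verifies that $H_{k+1} - \tilde{X}_i^T (\tilde{X}_i \tilde{X}_i^T + \lambda I)^{-1} \tilde{X}_i$ equals $\lambda H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} H_{k+1}$ once the local data matrix is centered:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, lam = 6, 4, 0.1
H = np.eye(k + 1) - np.ones((k + 1, k + 1)) / (k + 1)   # H_{k+1}
Xi = rng.standard_normal((d, k + 1))                     # local data matrix X_i
Xt = Xi @ H                                              # centered: X_tilde_i

lhs = H - Xt.T @ np.linalg.inv(Xt @ Xt.T + lam * np.eye(d)) @ Xt
rhs = lam * H @ np.linalg.inv(Xt.T @ Xt + lam * np.eye(k + 1)) @ H
print(np.allclose(lhs, rhs))  # True
```

The practical payoff, reflected in line 2 of Algorithm 1 below, is that the $d \times d$ inverse is replaced by a much smaller $(k+1) \times (k+1)$ inverse.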

where

  $M = X \Big[ \sum_{i=1}^{n} S_i H_{k+1} (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1} H_{k+1} S_i^T \Big] X^T.$  (12)

Denote $w^i$ as the $i$-th row of $W$, i.e., $W = [w^1, \dots, w^d]^T$. The objective function shown in (11) can also be written as

  $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \sum_{i=1}^{d} \|w^i\|.$  (13)

We can see that many rows of the optimal $W$ corresponding to (13) shrink to zeros. (In practice, many rows of the optimal $W$ are close to zero rather than exactly zero.) Consequently, for a datum $x_i$, $\hat{x}_i = W^T x_i$ is a new representation of $x_i$ using only a small set of selected features. Alternatively, we can rank the features $f_i|_{i=1}^{d}$ according to $\|w^i\|$ in descending order and select the top ranked features.

Optimization of the UDFS Algorithm

The $\ell_{2,1}$-norm minimization problem has been studied in several previous works, such as [Argyriou et al., 2008; Nie et al., 2010; Obozinski et al., 2008; Liu et al., 2009; Zhao et al., 2010; Yang et al., 2011]. However, it remains unclear how to directly apply the existing algorithms to optimize our objective function, where the orthogonal constraint $W^T W = I$ is imposed. In this section, inspired by [Nie et al., 2010], we give a new approach to solve the optimization problem shown in (11) for feature selection. We first describe the detailed approach of UDFS in Algorithm 1.

Algorithm 1: The UDFS algorithm.
 1: for $i = 1$ to $n$ do
 2:   $B_i = (\tilde{X}_i^T \tilde{X}_i + \lambda I)^{-1}$;
 3:   $M_i = S_i H_{k+1} B_i H_{k+1} S_i^T$;
 4: $M = X \big( \sum_{i=1}^{n} M_i \big) X^T$;
 5: Set $t = 0$ and initialize $D_t \in \mathbb{R}^{d \times d}$ as an identity matrix;
 6: repeat
 7:   $P = M + \gamma D_t$;
 8:   $W_t = [p_1, \dots, p_c]$, where $p_1, \dots, p_c$ are the eigenvectors of $P$ corresponding to the $c$ smallest eigenvalues;
 9:   Update the diagonal matrix $D_{t+1}$, whose $i$-th diagonal element is $\frac{1}{2\|w_t^i\|}$;
10:   $t = t + 1$;
11: until convergence;
12: Sort the features $f_i|_{i=1}^{d}$ according to $\|w^i\|$ in descending order and select the top ranked ones.
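The following NumPy sketch is our reading of Algorithm 1, not the authors' code: it uses a brute-force k-nearest-neighbor search, a fixed iteration count in place of the convergence test, illustrative default values for λ and γ, and a small constant `eps` that anticipates the ς-regularized update of $D_{t+1}$ discussed after Theorem 1 below.

```python
import numpy as np

def udfs(X, c, k=5, lam=1e-6, gamma=0.1, n_iter=30, eps=1e-10):
    """Sketch of Algorithm 1. X: d x n data matrix, c: number of clusters.
    Returns W (d x c) and feature indices ranked by ||w^i|| (descending)."""
    d, n = X.shape
    H = np.eye(k + 1) - np.ones((k + 1, k + 1)) / (k + 1)      # H_{k+1}
    # Lines 1-4: M = X (sum_i S_i H B_i H S_i^T) X^T. Rather than forming the
    # sparse n x n matrix, accumulate each d x d contribution directly.
    M = np.zeros((d, d))
    dist = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # brute-force distances
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k + 1]      # x_i plus its k nearest neighbors
        Xi = X[:, nbrs]
        Xt = Xi @ H                              # centered local data X_tilde_i
        Bi = np.linalg.inv(Xt.T @ Xt + lam * np.eye(k + 1))    # line 2
        M += Xi @ (H @ Bi @ H) @ Xi.T            # lines 3-4 (S_i selects columns nbrs)
    # Lines 5-11: iteratively minimize tr(W^T M W) + gamma * ||W||_{2,1}.
    D = np.eye(d)
    for _ in range(n_iter):
        P = M + gamma * D                        # line 7
        _, vecs = np.linalg.eigh(P)              # ascending eigenvalues
        W = vecs[:, :c]                          # line 8: c smallest eigenvectors
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * np.sqrt(row_norms ** 2 + eps)))  # line 9
    # Line 12: rank features by the row norms of W.
    return W, np.argsort(-row_norms)
```

For example, `W, ranked = udfs(X, c)` followed by `ranked[:300]` would select 300 features; any off-the-shelf k-NN routine can replace the O(n²) distance computation, which is only kept for brevity.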

Next, we show that the iterative algorithm in Algorithm 1 converges, via the following theorem.

Theorem 1. The iterative approach in Algorithm 1 (lines 6 to 11) monotonically decreases the objective function value of $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \sum_{i=1}^{d} \|w^i\|$ in each iteration.

Proof. According to the definition of $W_t$ in line 8 of Algorithm 1, we have

  $W_t = \arg\min_{W^T W = I} \operatorname{tr}\big[ W^T (M + \gamma D_t) W \big].$  (17)

Therefore,

  $\operatorname{tr}\big[ W_{t+1}^T (M + \gamma D_{t+1}) W_{t+1} \big] \le \operatorname{tr}\big[ W_t^T (M + \gamma D_{t+1}) W_t \big]$

  $\Rightarrow \operatorname{tr}(W_{t+1}^T M W_{t+1}) + \gamma \sum_i \frac{\|w_{t+1}^i\|^2}{2\|w_t^i\|} \le \operatorname{tr}(W_t^T M W_t) + \gamma \sum_i \frac{\|w_t^i\|^2}{2\|w_t^i\|}.$

(When computing $D_{t+1}$, its $i$-th diagonal element is $\frac{1}{2\|w_t^i\|}$. In practice $\|w_t^i\|$ can be very close to zero but nonzero; theoretically, however, it can be exactly zero. In that case we follow the traditional regularization approach and define the diagonal element as $\frac{1}{2\sqrt{w_t^{iT} w_t^i + \varsigma}}$, where $\varsigma$ is a very small constant; as $\varsigma \to 0$, $\frac{1}{2\sqrt{w_t^{iT} w_t^i + \varsigma}}$ approximates $\frac{1}{2\|w_t^i\|}$.)

Then we have the following inequality:

  $\operatorname{tr}(W_{t+1}^T M W_{t+1}) + \gamma \sum_i \|w_{t+1}^i\| - \gamma \Big( \sum_i \|w_{t+1}^i\| - \sum_i \frac{\|w_{t+1}^i\|^2}{2\|w_t^i\|} \Big) \le \operatorname{tr}(W_t^T M W_t) + \gamma \sum_i \|w_t^i\| - \gamma \Big( \sum_i \|w_t^i\| - \sum_i \frac{\|w_t^i\|^2}{2\|w_t^i\|} \Big).$

Meanwhile, according to Lemma 2,

  $\sum_i \|w_{t+1}^i\| - \sum_i \frac{\|w_{t+1}^i\|^2}{2\|w_t^i\|} \le \sum_i \|w_t^i\| - \sum_i \frac{\|w_t^i\|^2}{2\|w_t^i\|}.$

Therefore,

  $\operatorname{tr}(W_{t+1}^T M W_{t+1}) + \gamma \sum_i \|w_{t+1}^i\| \le \operatorname{tr}(W_t^T M W_t) + \gamma \sum_i \|w_t^i\|,$

which indicates that the objective function value of $\min_{W^T W = I} \operatorname{tr}(W^T M W) + \gamma \sum_{i=1}^{d} \|w^i\|$ monotonically decreases using the updating rule in Algorithm 1.

According to Theorem 1, the iterative approach in Algorithm 1 converges to the optimal $W$ corresponding to (13). Because $k$ is much smaller than $n$, the time complexity of computing $M$ defined in (12) is about $O(n)$. To optimize the objective function of UDFS, the most time consuming operation is the eigen-decomposition of $P$. Note that $P \in \mathbb{R}^{d \times d}$, so the time complexity of this operation is approximately $O(d^3)$.

Experiments

In this section, we test the performance of the proposed UDFS. Following [He et al., 2005; Cai et al., 2010], we evaluate the algorithm in terms of clustering performance.

Experiment Setup

In our experiments, we have collected a diversity of six public datasets to compare the performance of different unsupervised feature selection algorithms. These datasets include three face image datasets, i.e., UMIST (http://images.ee.umist.ac.uk/danny/database.html), FERET (http://www.frvt.org/feret/default.htm) and YALEB [Georghiades et al., 2001]; one gait image dataset, i.e., USF HumanID [Sarkar et al., 2005]; one spoken letter recognition dataset, i.e., Isolet (http://www.ics.uci.edu/~mlearn/mlsummary.html); and one handwritten digit image dataset, i.e., USPS [Hull, 1994]. Detailed information on the six datasets is summarized in Table 1.

Table 1: Database Description.

  Dataset        Size    # of Features   # of Classes
  UMIST           575         644              20
  FERET          1400        1296             200
  YALEB          2414        1024              38
  USF HumanID    5795        2816             122
  Isolet         1560         617              26
  USPS           9298         256              10

We compare the proposed UDFS with the following unsupervised feature selection algorithms:

- All Features, which adopts all the features for clustering. It is used as the baseline method in this paper.
- Max Variance, which selects the features corresponding to the maximum variances.
- Laplacian Score (LS) [He et al., 2005], which selects the features most consistent with the Gaussian Laplacian matrix.
- Feature Ranking (FR) [Zhao and Liu, 2007], which selects features using spectral regression.
- Multi-Cluster Feature Selection (MCFS) [Cai et al., 2010], which selects features using spectral regression with $\ell_1$-norm regularization.

For LS, MCFS and UDFS, we fix $k$, which specifies the size of the neighborhood, at 5 for all the datasets. For LS and FR, we need to tune the bandwidth parameter of the Gaussian kernel, and for MCFS and UDFS we need to tune the regularization parameter. To fairly compare the different unsupervised feature selection algorithms, we tune these parameters over $\{10^{-9}, 10^{-6}, 10^{-3}, 1, 10^{3}, 10^{6}, 10^{9}\}$. We set the number of selected features to $\{50, 100, 150, 200, 250, 300\}$ for the first five datasets. Because the total feature number of USPS is 256, we set the number of selected features to $\{50, 80, 110, 140, 170, 200\}$ for this dataset. We report the best results of all the algorithms over the different parameters.

In our experiments, each feature selection algorithm is first performed to select features, and then the K-means clustering algorithm is run on the selected features. Because the result of K-means clustering depends on initialization, it is repeated 20 times with random initializations, and we report the average results with standard deviation (std). Two evaluation metrics, i.e., Accuracy (ACC) and Normalized Mutual Information (NMI), are used in this paper. Denote $q_i$ as the clustering result and $p_i$ as the ground truth label of $x_i$. ACC is defined as

  $\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \delta\big(p_i, \mathrm{map}(q_i)\big),$  (18)

where $\delta(x, y) = 1$ if $x = y$ and $\delta(x, y) = 0$ otherwise, and $\mathrm{map}(\cdot)$ is the best mapping function that permutes clustering labels to match the ground truth labels, computed by the Kuhn-Munkres algorithm. A larger ACC indicates better performance.

Given two variables $P$ and $Q$, NMI is defined as

  $\mathrm{NMI}(P, Q) = \frac{I(P, Q)}{\sqrt{H(P) H(Q)}},$  (19)

where $I(P, Q)$ is the mutual information between $P$ and $Q$, and $H(P)$ and $H(Q)$ are the entropies of $P$ and $Q$ [Strehl and Ghosh, 2002]. Denote $t_l$ as the number of data in the cluster $C_l$ ($1 \le l \le c$) according to the clustering results and $\tilde{t}_h$ as the number of data in the $h$-th ground truth class ($1 \le h \le c$). NMI is computed as follows [Strehl and Ghosh, 2002]:

  $\mathrm{NMI} = \frac{\sum_{l=1}^{c} \sum_{h=1}^{c} t_{l,h} \log\big( \frac{n \, t_{l,h}}{t_l \tilde{t}_h} \big)}{\sqrt{\big( \sum_{l=1}^{c} t_l \log \frac{t_l}{n} \big) \big( \sum_{h=1}^{c} \tilde{t}_h \log \frac{\tilde{t}_h}{n} \big)}},$  (20)

where $t_{l,h}$ is the number of samples in the intersection between cluster $C_l$ and the $h$-th ground truth class. Again, a larger NMI indicates a better clustering result.
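Both metrics are straightforward to reproduce. The sketch below (our code) implements (18) using the Kuhn-Munkres assignment available in SciPy, and (20) directly from the contingency table:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_acc(truth, pred):
    # Eq. (18): best one-to-one mapping of cluster labels to classes
    # via the Kuhn-Munkres (Hungarian) algorithm.
    classes, clusters = np.unique(truth), np.unique(pred)
    cost = np.zeros((clusters.size, classes.size))
    for i, ql in enumerate(clusters):
        for j, pl in enumerate(classes):
            cost[i, j] = -np.sum((pred == ql) & (truth == pl))
    row, col = linear_sum_assignment(cost)
    return -cost[row, col].sum() / truth.size

def clustering_nmi(truth, pred):
    # Eq. (20): mutual information over the contingency table, normalized
    # by the geometric mean of the two entropies.
    n = truth.size
    classes, clusters = np.unique(truth), np.unique(pred)
    t = np.array([[np.sum((pred == ql) & (truth == pl)) for pl in classes]
                  for ql in clusters], dtype=float)        # t_{l,h}
    tl, th = t.sum(axis=1), t.sum(axis=0)                  # cluster / class sizes
    nz = t > 0
    mi = (t[nz] * np.log(n * t[nz] / np.outer(tl, th)[nz])).sum()
    hp = -(tl * np.log(tl / n)).sum()
    hq = -(th * np.log(th / n)).sum()
    return mi / np.sqrt(hp * hq)
```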

Table 2: Clustering Results (ACC% ± std) of Different Feature Selection Algorithms.

  Dataset        All Features   Max Variance   Laplacian Score   Feature Ranking   MCFS         UDFS
  UMIST          4.9 ± 3.       46. ± .3       46.3 ± 3.3        48. ± 3.7         46.5 ± 3.5   49. ± 3.8
  FERET          . ± .5         . ± .3         .4 ± .5           .8 ± .5           5. ± .7      6. ± .6
  YALEB          . ± .6         9.6 ± .3       .4 ± .6           3.3 ± .8          .4 ± .       4.7 ± .6
  USF HumanID    3. ± .6        .9 ± .5        8.8 ± .3          . ± .             3. ± .6      4.6 ± .8
  Isolet         57.8 ± 4.      56.6 ± .6      56.9 ± .9         57. ± .9          6. ± 4.4     66. ± 3.6
  USPS           6.9 ± 4.3      63.4 ± 3.      63.5 ± 3.         63.6 ± 3.         65.3 ± 5.4   65.8 ± 3.3

Table 3: Clustering Results (NMI% ± std) of Different Feature Selection Algorithms.

  Dataset        All Features   Max Variance   Laplacian Score   Feature Ranking   MCFS        UDFS
  UMIST          6.9 ± .4       63.6 ± .8      65. ± .           64.9 ± .6         65.9 ± .3   66.3 ± .
  FERET          6.7 ± .4       6.3 ± .4       63. ± .3          63.3 ± .5         64.8 ± .5   65.6 ± .4
  YALEB          4. ± .7        3. ± .4        8.4 ± .           .3 ± .9           8.8 ± .     5.4 ± .9
  USF HumanID    5.9 ± .4       49. ± .4       47.5 ± .          9.3 ± .3          5.6 ± .4    5.6 ± .5
  Isolet         74. ± .8       73. ± .        7. ± .            7.5 ± .7          75.5 ± .8   78. ± .3
  USPS           59. ± .5       59.6 ± .       6. ± .3           59.6 ± .          6. ± .7     6.6 ± .5

Experimental Results and Discussion

First, we compare the performance of the different feature selection algorithms. The experimental results are shown in Table 2 and Table 3. We can see from the two tables that the clustering results of All Features are better than those of Max Variance. However, because Max Variance significantly reduces the number of features, the subsequent operations, e.g., clustering, are faster, so it is more efficient. The results of the other feature selection algorithms are generally better than All Features, and they are also more efficient. Except for Max Variance, all of the other feature selection algorithms are non-linear approaches. We conclude that local structure is crucial for feature selection in many applications, which is consistent with previous work on feature selection [He et al., 2005].

We can also see from the two tables that MCFS attains the second best performance. Both Feature Ranking [Zhao and Liu, 2007] and MCFS [Cai et al., 2010] adopt a two-step approach, i.e., spectral regression, for feature selection. The difference is that Feature Ranking analyzes features separately and selects them one after another, whereas MCFS selects features in batch mode. This observation validates that it is better to analyze data features jointly for feature selection. Finally, we observe that the proposed UDFS algorithm obtains the best performance. There are two main reasons for this. First, UDFS analyzes features jointly. Second, UDFS simultaneously utilizes discriminative information and the local structure of the data distribution.

Next, we study the performance variation of UDFS with respect to the regularization parameter γ in (10) and the number of selected features. Due to the space limit, we use the three face image datasets as examples. The experimental results are shown in Fig. 1. We can see from Fig. 1 that the performance is not very sensitive to γ as long as it is smaller than 1. However, the performance is comparatively sensitive to the number of selected features. How to decide the number of selected features is data dependent and still an open problem.
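The evaluation protocol just described can be summarized in a short driver (ours; it assumes the `udfs`, `clustering_acc` and `clustering_nmi` sketches given earlier, and leaves dataset loading to the caller):

```python
import numpy as np
from sklearn.cluster import KMeans

def evaluate(X, labels, n_selected, c, n_runs=20):
    # X: d x n data matrix, labels: ground truth, c: number of clusters.
    _, ranked = udfs(X, c)                     # Algorithm 1 sketch above
    Xs = X[ranked[:n_selected], :].T           # n x n_selected selected features
    accs, nmis = [], []
    for seed in range(n_runs):                 # 20 random K-means initializations
        pred = KMeans(n_clusters=c, n_init=1, random_state=seed).fit_predict(Xs)
        accs.append(clustering_acc(labels, pred))
        nmis.append(clustering_nmi(labels, pred))
    return (np.mean(accs), np.std(accs)), (np.mean(nmis), np.std(nmis))
```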
Conclusion

While it has been shown in many previous works that discriminative information is beneficial to many applications, it is not straightforward to utilize it in unsupervised learning due to the lack of label information. In this paper, we have proposed a new unsupervised feature selection algorithm which is able to select discriminative features in batch mode. An efficient algorithm is proposed to optimize the $\ell_{2,1}$-norm regularized minimization problem with an orthogonal constraint. Different from existing algorithms, which select the features that best preserve the data structure of the whole feature set, the proposed UDFS selects discriminative features for unsupervised learning. We have shown that it is better to select discriminative features for data representation, and that UDFS outperforms existing unsupervised feature selection algorithms.

Acknowledgment

This work is supported by ARC DP1094678 and partially supported by the FP7-IP GLOCAL European project.

[Figure 1: Performance variation of UDFS w.r.t. different parameters. Panels (a)-(c) show ACC and panels (d)-(f) show NMI on UMIST, FERET and YALEB, as γ varies over {10^-9, 10^-6, 10^-3, 1, 10^3, 10^6, 10^9} and as the number of selected features varies.]

References

[Argyriou et al., 2008] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. In Machine Learning, 2008.

[Cai et al., 2010] Deng Cai, Chiyuan Zhang, and Xiaofei He. Unsupervised feature selection for multi-cluster data. In KDD, 2010.

[Duda et al., 2001] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification (2nd Edition). John Wiley & Sons, New York, USA, 2001.

[Fukunaga, 1991] K. Fukunaga. Introduction to Statistical Pattern Recognition (2nd Edition). Academic Press Professional, Inc., San Diego, USA, 1991.

[Georghiades et al., 2001] A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE TPAMI, 23(6):643-660, 2001.

[He et al., 2005] Xiaofei He, Deng Cai, and Partha Niyogi. Laplacian score for feature selection. In NIPS, 2005.

[Hull, 1994] J.J. Hull. A database for handwritten text recognition research. IEEE TPAMI, 16(5):550-554, 1994.

[Liu et al., 2009] Jun Liu, Shuiwang Ji, and Jieping Ye. Multi-task feature learning via efficient l2,1-norm minimization. In UAI, 2009.

[Nie et al., 2008] Feiping Nie, Shiming Xiang, Yangqing Jia, Changshui Zhang, and Shuicheng Yan. Trace ratio criterion for feature selection. In AAAI, 2008.

[Nie et al., 2010] Feiping Nie, Heng Huang, Xiao Cai, and Chris Ding. Efficient and robust feature selection via joint l2,1-norms minimization. In NIPS, 2010.

[Obozinski et al., 2008] G. Obozinski, M.J. Wainwright, and M.I. Jordan. High-dimensional union support recovery in multivariate regression. In NIPS, 2008.

[Sarkar et al., 2005] S. Sarkar, P.J. Phillips, Z. Liu, I.R. Vega, P. Grother, and K.W. Bowyer. The humanID gait challenge problem: data sets, performance, and analysis. IEEE TPAMI, 27(2):162-177, 2005.

[Strehl and Ghosh, 2002] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583-617, 2002.

[Sugiyama, 2006] Masashi Sugiyama. Local Fisher discriminant analysis for supervised dimensionality reduction. In ICML, 2006.

[Yang et al., 2010a] Yi Yang, Feiping Nie, Shiming Xiang, Yueting Zhuang, and Wenhua Wang. Local and global regressive mapping for manifold learning with out-of-sample extrapolation. In AAAI, 2010.

[Yang et al., 2010b] Yi Yang, Dong Xu, Feiping Nie, Shuicheng Yan, and Yueting Zhuang. Image clustering using local discriminant models and global integration. IEEE TIP, 19(10):2761-2773, 2010.

[Yang et al., 2011] Yang Yang, Yi Yang, Zi Huang, Heng Tao Shen, and Feiping Nie. Tag localization with spatial correlations and joint group sparsity. In CVPR, pages 881-888, 2011.

[Zhao and Liu, 2007] Zheng Zhao and Huan Liu. Spectral feature selection for supervised and unsupervised learning. In ICML, 2007.

[Zhao et al., 2010] Z. Zhao, L. Wang, and H. Liu. Efficient spectral feature selection with minimum redundancy. In AAAI, 2010.