Semi-supervised Classification with Active Query Selection

Jiao Wang and Siwei Luo
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Wangjiao088@163.com

Abstract. Labeled samples are crucial in semi-supervised classification, but which samples should we choose to be the labeled samples? In other words, which samples, if labeled, would provide the most information? We propose a method to solve this problem. First, we give each unlabeled example an initial class label using unsupervised learning. Then, by maximizing the mutual information, we choose the most informative samples to be the user-specified labeled samples. After that, we run a semi-supervised algorithm with the user-specified labeled samples to get the final classification. Experimental results on synthetic data show that our algorithm can obtain a satisfying classification with active query selection.

1 Introduction

Recently, there has been great interest in semi-supervised classification. The goal of semi-supervised learning is to use unlabeled data to improve the performance of standard supervised learning algorithms. Since obtaining labeled data is hard or expensive in many fields, semi-supervised learning methods that need only a small labeled sample are of great use.

In case unsupervised learning methods can separate the points well (see e.g. Fig. 1a), there is no need for semi-supervised methods. However, in case of noise (see e.g. Fig. 1b), or in case two modes belonging to two different classes overlap (see e.g. Fig. 1c), semi-supervised learning with a few labeled points in each class can improve performance significantly.

A number of algorithms have been proposed for semi-supervised learning, including EM [8], co-training [1, 14], tri-training [15], random field models [9, 12], and graph-based approaches [2, 6, 13]. Different methods rest on different assumptions and suit different situations. In particular, when data resides on a low-dimensional manifold within a high-dimensional representation space, semi-supervised learning methods should be adjusted to work on the manifold. Belkin gives a solution to this problem with manifold regularization methods in [4].

Query selection is extensively studied in the supervised framework. In [10], queries are selected to minimize the version space size for support vector machines. In [7], a committee of classifiers is employed, and a point is queried whenever the committee members disagree. Many other methods have been proposed to actively choose samples in supervised learning, but little has been done to choose samples in semi-supervised learning.

D.-Y. Yeung et al. (Eds.): SSPR&SPR 2006, LNCS 4109, pp. 741-746, 2006. © Springer-Verlag Berlin Heidelberg 2006

The labeled samples play an important role in semi-supervised learning. A question then arises: which samples should be the labeled samples? Among the existing semi-supervised learning methods, some choose the labeled samples manually [6]; to do this, one has to have some domain knowledge of which samples most need to be labeled. Some choose the labeled samples randomly, which may not capture the right samples. In [11], Zhu et al. choose the samples actively by greedily selecting queries from the unlabeled data to minimize the estimated expected classification error, but Zhu's active learning method can only be used together with his semi-supervised learning method.

In this paper, we give a more general and automatic query selection method in the semi-supervised framework. Our method can be applied to most of the existing semi-supervised learning methods, as a pre-process to them. The main idea is to consider which samples, if labeled, would give the most information. Following this idea, we use the mutual information I(Y; y_i) (where y_i represents one sample's class label and Y represents all the samples' class labels) as the measure for active query selection. By maximizing the mutual information, we get the sample that most needs to be labeled. Using this method, we can choose the samples to be labeled actively and automatically, without any domain knowledge.

In order to explain our method, we work with the Laplacian Eigenmaps [3] and manifold regularization [4] of Belkin to show the entire process, so we can see how active query selection works on a manifold. We do not claim that this method can only be used on manifolds; rather, we aim to show that applying our method to any semi-supervised method would yield satisfying results.

This paper is organized as follows: in Section 2, we introduce our algorithm briefly. Section 3 gives details of every part of our algorithm. Experimental results on synthetic data are shown in Section 4, followed by conclusions in Section 5.

Fig. 1. (a) Example of a situation in which unsupervised learning methods (here we use Laplacian Eigenmaps) work well. (b)(c) Examples of situations in which unsupervised learning cannot give satisfying results and needs some labeled samples to help.

2 Our Algorithm

To explain the entire process of our active query selection method, we work with the Laplacian Eigenmaps [3] and manifold regularization [4] of Belkin. The steps are as follows:

Step 1. Give each (unlabeled) sample an initial class label using unsupervised learning. Here, we use Laplacian Eigenmaps to map the samples to a real-valued function f.

Step 2. By maximizing the mutual information I(Y; y_i) (where y_i represents one sample's class label and Y represents all the samples' class labels), we actively choose the samples with the most uncertain class labels to be the user-specified labeled samples.

Step 3. Give the chosen samples their class labels.

Step 4. Run the semi-supervised algorithm (here, the manifold regularization algorithm) with the user-specified labeled samples to get the final classification.

3 Details of Our Method

3.1 Using Laplacian Eigenmaps to Get Initial Class Labels

Given a sample set $x_1, \ldots, x_n \in R^m$, construct its neighborhood graph $G = (V, E)$, whose vertices are the sample points $V = \{x_1, \ldots, x_n\}$ and whose edge weights $\{w_{ij}\}$ represent appropriate pairwise similarity relationships between samples. For example, $w_{ij}$ can be the radial basis function

$$w_{ij} = \exp\left(-\frac{1}{\sigma^2}\sum_{d=1}^{m}(x_{id} - x_{jd})^2\right)$$

where $\sigma$ is a scale parameter. The radial basis function ensures that nearby points are assigned large edge weights.

We first consider the two-class situation. Assume that $f$ is a real-valued function whose values are bounded between 0 and 1 (0 and 1 each represent a class label), with $y_i = f(x_i)$ and $Y = (y_1, y_2, \ldots, y_n)^T$. Laplacian Eigenmaps tries to minimize the following objective function:

$$\sum_{i,j}(y_i - y_j)^2 W_{ij} \qquad (1)$$

By minimizing this objective function, we get the initial class label of each sample,

$$y_1, \ldots, y_n, \quad y_i \in [0, 1] \qquad (2)$$
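To make this construction concrete, the following minimal numpy sketch (ours, not code from the paper) builds the RBF affinity matrix, forms the graph Laplacian, and uses its second-smallest eigenvector as the real-valued function f, rescaled into [0, 1]. The function names, the default sigma, and the choice of the unnormalized Laplacian are illustrative assumptions; Laplacian Eigenmaps [3] also admits a normalized variant.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Edge weights w_ij = exp(-sum_d (x_id - x_jd)^2 / sigma^2) for all pairs."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def eigenmap_initial_labels(W):
    """Step 1: minimize eq. (1) via the graph Laplacian's spectrum.

    The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
    is the smoothest non-constant f on the graph; rescaling it into [0, 1]
    gives the soft initial class labels y_1, ..., y_n of eq. (2).
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                        # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    f = eigvecs[:, 1]                # skip the constant eigenvector
    return (f - f.min()) / (f.max() - f.min())
```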

3.2 Using Mutual Information to Choose the Samples with the Most Uncertain Class Labels

This is our active query selection step. We use the mutual information I(Y; y_i) (where y_i represents one sample's class label and Y represents all the samples' class labels) as the measure for query selection. By maximizing the mutual information, we get the sample that would give the most information, that is, the one that most needs to be labeled.

In order to calculate I(Y; y_i), inspired by the work of [5], we define a Gaussian random field on the vertices of V:

$$p(Y) \propto \exp\left(-\frac{\lambda}{2}\, Y^T \Delta Y\right) \qquad (3)$$

where $\Delta = D - W$, and $D$ is a diagonal matrix given by $D_{ii} = \sum_{j=1}^{n} W_{ij}$.

The mutual information between $Y$ and $y_*$ is the expected decrease in entropy of $Y$ when $y_*$ is observed:

$$I(Y; y_*) = H(Y) - E\{H(Y \mid y_*)\} = \frac{1}{2}\log\left(1 + p(y_*)(1 - p(y_*))\, x_*^T H^{-1} x_*\right) \qquad (4)$$

where $H = -\nabla\nabla \log p(Y)$ is the Hessian matrix. The best sample to label is the one that maximizes $I(Y; y_*)$, and the mutual information is largest when $p(y_*) \approx 0.5$, i.e., for the samples with the most information.
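The extracted text of eq. (4) is ambiguous about what x_* denotes, so the sketch below is one plausible reading rather than the paper's own code: it scores sample i by 0.5 * log(1 + p_i (1 - p_i) v_i), treating the soft label from Step 1 as p_i and taking v_i to be the Gaussian field's marginal variance at node i, the i-th diagonal entry of H^{-1} with H = lambda * Delta; a small ridge makes the singular Laplacian invertible. Consistent with the paper's observation, the score peaks where p_i is near 0.5, so |p_i - 0.5| would serve as an even simpler proxy.

```python
import numpy as np

def query_scores(y, W, lam=1.0, ridge=1e-6):
    """Score every sample by one reading of the MI criterion of eq. (4).

    p_i is the soft label from Step 1; v_i = (H^{-1})_{ii} is the marginal
    variance of the Gaussian field of eq. (3), whose precision is
    H = lam * (D - W), ridge-regularized because the Laplacian is singular.
    (Illustrative sketch; defaults are not from the paper.)
    """
    n = len(W)
    D = np.diag(W.sum(axis=1))
    H = lam * (D - W) + ridge * np.eye(n)
    v = np.diag(np.linalg.inv(H))
    p = np.clip(y, 1e-6, 1.0 - 1e-6)
    return 0.5 * np.log1p(p * (1.0 - p) * v)

def select_query(y, W, labeled=()):
    """Step 2: index of the most informative still-unlabeled sample."""
    scores = query_scores(y, W)
    scores[list(labeled)] = -np.inf  # never re-query a labeled point
    return int(np.argmax(scores))
```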

3.3 Using the User-Specified Labeled Samples to Get the Final Classification

After we actively choose the samples to label, we can run a semi-supervised classification method to get the final result. In the manifold regularization method of Belkin, the author minimizes the following cost function:

$$\min_{f \in H_K} H[f] = \frac{1}{l}\sum_{i=1}^{l}(f(x_i) - y_i)^2 + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{n^2}\sum_{i,j=1}^{n}(f(x_i) - f(x_j))^2 W_{ij} \qquad (5)$$

where $l$ is the number of labeled samples, $\gamma_A$ and $\gamma_I$ are regularization parameters, and $\|f\|_K$ is a constraint that ensures the smoothness of the learned manifold. Here, the $l$ samples in the above cost function are not chosen randomly or manually as in the original work of Belkin; rather, they are chosen with the active query selection method discussed in Section 3.2.
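As one concrete instance of Step 4, here is a sketch of Laplacian-regularized least squares in the closed form Belkin et al. give for costs of the shape of eq. (5). The function names, the {0, 1} label encoding, and the default gamma values are illustrative assumptions; K can simply be the RBF affinity from Section 3.1 reused as a kernel, and L the graph Laplacian D - W.

```python
import numpy as np

def laprls_fit(K, L, labeled_idx, labels, gamma_A=1e-2, gamma_I=1.0):
    """Step 4: minimize a cost of the form of eq. (5) over f = K @ alpha.

    Closed form in the style of Belkin et al.'s Laplacian RLS:
        alpha = (J K + gamma_A l I + (gamma_I l / n^2) L K)^{-1} Y
    with J selecting the l labeled points and Y their {0, 1} labels.
    (Illustrative sketch; hyperparameters are not from the paper.)
    """
    n = K.shape[0]
    l = len(labeled_idx)
    J = np.zeros((n, n))
    J[labeled_idx, labeled_idx] = 1.0  # diagonal indicator of labeled points
    Y = np.zeros(n)
    Y[labeled_idx] = labels
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, Y)

def laprls_predict(K, alpha):
    """Evaluate f on the graph's points and threshold at 0.5."""
    return (K @ alpha > 0.5).astype(int)
```

Chaining the three sketches reproduces the whole pipeline: compute W with rbf_affinity, get soft labels y with eigenmap_initial_labels, query a few points with select_query and ask the user for their labels, then fit with laprls_fit (K = W, L = D - W) and read off the final classification with laprls_predict.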

4 Experimental Results

As we pointed out at the beginning of this paper, unsupervised learning cannot work well in case of noise, or in case two modes belonging to two different classes overlap. In these situations, semi-supervised learning with a few labeled samples can help. Using some synthetic data, we show that our active query selection method can choose the most informative samples to label. Fig. 2(a) is a noisy case of Fig. 1(a): without labeled samples, Laplacian Eigenmaps cannot find a satisfying classification (the yellow curve). Using our active query selection method, the algorithm chooses some samples to be labeled; these samples are shown in purple in (b). After that, the user gives the class labels of these chosen samples (the red and blue samples in (c), each color representing a class). Then, with these user-specified labeled samples, the manifold regularization method finds the more satisfying classification shown in (c) (the yellow curve).

Fig. 2. (a) Laplacian Eigenmaps cannot find a satisfying classification without labeled samples. (b) The samples automatically chosen to be labeled. (c) The manifold regularization results with the labeled samples.

5 Conclusions

A key problem of semi-supervised learning is choosing the most informative samples to be labeled at the very beginning of a semi-supervised algorithm. Using mutual information, we give a solution to this problem. Our sample selection method can be applied to most of the existing semi-supervised learning methods; in this paper, we combine it with manifold regularization to show how it works. We also run experiments on some synthetic data and obtain satisfying results. In future work, we will try this method in some real-world experiments.

Another problem of semi-supervised learning is how many labeled samples are suitable: for example, should we choose five samples to label, or ten? In future work, we will consider this problem within the active query selection framework of this paper.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (60373029), the Research Fund for the Doctoral Program of Higher Education of China (20050004001), and the Co-Construction Project of Key Subject of Beijing.

References

1. A. Blum and T. Mitchell: Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory, Madison, WI, pp. 92-100 (1998).
2. A. Blum and S. Chawla: Learning from labeled and unlabeled data using graph mincuts. ICML (2001).
3. M. Belkin and P. Niyogi: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation, June (2003).
4. M. Belkin, P. Niyogi, and V. Sindhwani: On manifold regularization. Department of Computer Science, University of Chicago, TR-2004-05.
5. B. Krishnapuram, D. Williams, Ya Xue, A. Hartemink, L. Carin, and M. A. T. Figueiredo: On semi-supervised classification. NIPS (2004).
6. D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schoelkopf: Learning with local and global consistency. NIPS (2003).
7. Y. Freund, H. S. Seung, E. Shamir, and N. Tishby: Selective sampling using the query by committee algorithm. Machine Learning, 28, 133-168 (1997).
8. K. Nigam: Using Unlabeled Data to Improve Text Classification. PhD thesis, Carnegie Mellon University Computer Science Dept. (2001).
9. M. Szummer and T. Jaakkola: Partially labeled classification with Markov random walks. NIPS (2001).
10. S. Tong and D. Koller: Support vector machine active learning with applications to text classification. ICML (2000).
11. Xiaojin Zhu, J. Lafferty, and Z. Ghahramani: Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. ICML (2003).
12. Xiaojin Zhu, Z. Ghahramani, and J. Lafferty: Semi-supervised learning using Gaussian fields and harmonic functions. ICML (2003).
13. Xiaojin Zhu: Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University Computer Science Dept. (2005).
14. Z.-H. Zhou and M. Li: Semi-supervised regression with co-training. International Joint Conference on Artificial Intelligence (2005).
15. Z.-H. Zhou and M. Li: Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans. Knowledge and Data Engineering, 17, 1529-1541 (2005).