Multi-sensor classification with Consensus-based Multi-view Maximum Entropy Discrimination


Tianpei Xie, Nasser M. Nasrabadi, Alfred O. Hero
University of Michigan, Ann Arbor; U.S. Army Research Lab

Outline
1. Problem Motivations
2. Consensus-constraint via information geometry
3. Consensus-based Multi-view Maximum Entropy Learning
4. Experiments
5. Conclusion


Motivations
In many applications, samples can be represented in multiple ways (referred to as multi-view samples). For instance:
1. In a web-page network, ...
2. In a multi-sensor network, ...
3. In biometrics, ...
4. etc.

Multi-view learning and challenges
We focus on multi-view learning: learning to predict or classify based on multi-view data. A few challenges arise in multi-view learning:
1. Information fusion? Robustness?
2. Parsimony?
3. Unlabeled samples?
In this work, we consider the semi-supervised multi-view learning problem.

Previous works
1. Multi-view feature learning methods, e.g. Canonical Correlation Analysis (CCA) [Rupnik and Shawe-Taylor, 2010], Bi-modal Deep Autoencoder (Bi-DAE) [Ngiam et al., 2011], SVM-2K [Farquhar et al., 2005], etc.
   Cons: sensitive to local outliers.
2. Decision-level fusion, e.g.
   Bayes-Fusion: e.g. MCMC, particle filter methods [Klein, 2004];
   Model averaging: e.g. boosting methods [Collins and Singer, 1999], etc.
   Cons: between-view correlation not taken into account.
3. Consensus-based multi-view learning models, e.g. Co-training [Blum and Mitchell, 1998], Bayesian Co-training (Bayes Co-trn) [Yu et al., 2007], Multi-View MED [Sun and Chao, 2013], etc.

Our contribution
We propose a Consensus-based Multi-View Maximum Entropy Discrimination (CMV-MED) framework:
- The features are view-specific posterior distributions.
- A dissimilarity measure between these posterior distributions is proposed.
- The consensus-view model is the centroid in an intrinsic non-Euclidean space induced by the K-L divergence.

Comparison of multi-view learning methods
Criteria: fusion stage, parsimony, semi-supervised, noise tolerance, Bayesian, number of views.
CCA: feature; x x x; 2
Bi-DAE: feature; x x x; 2
SVM-2K: feature; x x x; 2
Bayes-Fusion: decision; x; 2
Boosting: decision; x x; 2
Co-training: consensus; x; 2
Bayes Co-trn: consensus; x; 2
MV-MED: consensus; x; 2
CMV-MED: consensus; 2


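Assumptions and stochastic consensus
Binary classification task with $V$ views, with $x^{[V]} \triangleq (x^1, \ldots, x^V) \in \mathcal{X}^1 \times \cdots \times \mathcal{X}^V$ and $\mathcal{Y} = \{-1, +1\}$.
Log-linear predictive model for view $i$: $\log p_i(y \mid x^i, w_i) \propto \frac{1}{2}\, y\, (w_i^T x^i)$.
The proposed stochastic consensus measure (for $V = 2$) is
$$R_\pi(w_1, w_2) = E_{(x^1, x^2)}\left[ D\left( p_1(y \mid x^1, w_1),\, p_2(y \mid x^2, w_2) \right) \right] = E_{(x^1, x^2)}\left[ \min_{q(y \mid x^{[2]}) \in \Delta(\mathcal{Y})} \sum_{i \in \{1,2\}} \pi_i\, \mathrm{KL}\left( q(y \mid x^{[2]}) \,\|\, p_i(y \mid x^i, w_i) \right) \right],$$
where $\mathrm{KL}(\cdot \,\|\, \cdot)$ denotes the K-L divergence and $\pi = (\pi_1, \pi_2)$ are the view weights.
$R_\pi \ge 0$, with $R_\pi = 0$ iff $p_1 = p_2$.
The optimal solution $q^*(y \mid x^{[2]}_m)$ is the consensus-view model.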
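A minimal numerical sketch of the per-sample consensus term may help here. It assumes the weights sum to one, in which case the inner minimization has a standard closed form: the minimizer is the normalized weighted geometric mean of the view posteriors, and the attained minimum is the negative log of the normalizer. The function names (`view_posterior`, `consensus`, `kl`) and the toy weights and inputs are hypothetical, not taken from the slides.

```python
import math

LABELS = (-1, +1)

def view_posterior(w, x):
    """Posterior of one view under log p(y | x, w) proportional to (1/2) * y * (w . x)."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    p_pos = 1.0 / (1.0 + math.exp(-s))  # normalizing over y in {-1, +1}
    return {+1: p_pos, -1: 1.0 - p_pos}

def kl(q, p):
    """K-L divergence KL(q || p) between two distributions over LABELS."""
    return sum(q[y] * math.log(q[y] / p[y]) for y in LABELS if q[y] > 0)

def consensus(posteriors, weights):
    """Minimize sum_i weights[i] * KL(q || p_i) over q (weights assumed to sum to 1).

    Returns (q_star, value): the normalized weighted geometric mean of the
    posteriors and the attained minimum, which equals -log(normalizer).
    """
    log_mix = {y: sum(pi * math.log(p[y]) for pi, p in zip(weights, posteriors))
               for y in LABELS}
    z = sum(math.exp(v) for v in log_mix.values())
    q_star = {y: math.exp(v) / z for y, v in log_mix.items()}
    return q_star, -math.log(z)

# Toy example (hypothetical numbers): the two views disagree on the label.
p1 = view_posterior([1.0, -0.5], [2.0, 1.0])  # view 1 leans towards +1
p2 = view_posterior([-1.0], [1.0])            # view 2 leans towards -1
q_star, disagreement = consensus([p1, p2], [0.5, 0.5])
```

Averaging `disagreement` over a set of paired samples would give an empirical estimate of $R_\pi$; the value is zero exactly when the two posteriors coincide.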


Comparison with other consensus measures
[Figure: distance between classifier 1 and classifier 2 under each consensus measure, panels (1)-(3)]
(1) Stochastic consensus;
(2) Exp-consensus: D(p, q) = exp( sign(p) p sign(q) p);
(3) $\ell_2$-norm consensus: $D(p, q) = \|p - q\|_2$.

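Interpretation in information geometry
$$q^*(y \mid x^{[V]}_m) = \arg\min_{q(y) \in \Delta(\mathcal{Y})} \sum_{i=1}^{V} \pi_i\, \mathrm{KL}\left( q(y \mid x^{[V]}) \,\|\, p_i(y \mid x^i, w_i) \right).$$
This is the centroid of $\mathrm{conv}\{ p_i(y \mid x^i, w_i),\ i = 1, \ldots, V \}$ in $\log(\Delta(\mathcal{Y}))$.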
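The minimizer has a closed form, which the short derivation below sketches. It is a standard argument written under the assumption (not stated on the slide) that the weights satisfy $\sum_i \pi_i = 1$; the symbols $\bar p$ and $Z$ are introduced here for illustration.

```latex
\begin{align*}
\sum_{i=1}^{V} \pi_i\, \mathrm{KL}\big(q \,\|\, p_i\big)
  &= \sum_{y} q(y) \Big[ \log q(y) - \sum_{i=1}^{V} \pi_i \log p_i(y) \Big] \\
  &= \sum_{y} q(y) \log \frac{q(y)}{\bar p(y)} - \log Z
   = \mathrm{KL}\big(q \,\|\, \bar p\big) - \log Z, \\
\bar p(y) &= \frac{1}{Z} \prod_{i=1}^{V} p_i(y)^{\pi_i}, \qquad
Z = \sum_{y} \prod_{i=1}^{V} p_i(y)^{\pi_i}.
\end{align*}
```

Since $\mathrm{KL}(q \,\|\, \bar p) \ge 0$ with equality iff $q = \bar p$, the minimizer is $q^* = \bar p$, the normalized weighted geometric mean of the view posteriors, and the minimum value is $-\log Z$. This is an arithmetic average in the log domain, which is consistent with describing $q^*$ as a centroid in $\log(\Delta(\mathcal{Y}))$.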


Maximum Entropy Discrimination (MED)
The MED framework was introduced by Jaakkola et al. [1999].
Let
$$F(y_n, x_n; w) = \log\left( \frac{p(y_n \mid x_n, w)}{p(y \neq y_n \mid x_n, w)} \right)$$
be the discriminative function.
MED learns a convex combination of discriminative functions via the Maximum Entropy principle.
Assuming the prior on $w$ and $\gamma$ is $p_0(w)\, p_0(\gamma)$, the goal is to learn $q(w, \gamma \mid D)$ by solving
$$\min_{q(w, \gamma \mid D)} \mathrm{KL}\left( q(w, \gamma \mid D) \,\|\, p_0(w)\, p_0(\gamma) \right) \quad \text{s.t.} \quad E_{q(w, \gamma \mid D)}\left[ F(y_n, x_n; w) - \gamma_n \right] \ge 0, \ \forall n.$$
MED defines the decision rule via Bayesian averaging:
$$y^* = \arg\max_{y} \int_{w, \gamma} p(y \mid x, w)\, q(w, \gamma \mid D)\, dw\, d\gamma.$$
MED is robust compared to a single classifier [Jaakkola et al., 1999].
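The Bayesian-averaging decision rule can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that the learned posterior over $w$ is an isotropic Gaussian with a given mean and standard deviation, and it uses the log-linear model from the earlier slide, for which $F(y, x; w) = y\, w^T x$. Because $p(y \mid x, w)$ does not depend on $\gamma$, the margin variables integrate out. The names `med_predict`, `w_mean`, and `w_std` are hypothetical.

```python
import math
import random

def p_pos(w, x):
    """p(y = +1 | x, w) under the log-linear model, i.e. a logistic function of w . x."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))

def med_predict(x, w_mean, w_std=1.0, n_samples=2000, seed=0):
    """Approximate y* = argmax_y E_q[p(y | x, w)] by sampling w from q.

    q(w) is ASSUMED to be an isotropic Gaussian N(w_mean, w_std^2 I);
    a real MED solution would supply its own posterior.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w = [rng.gauss(m, w_std) for m in w_mean]
        total += p_pos(w, x)
    avg_pos = total / n_samples          # estimate of E_q[p(+1 | x, w)]
    label = +1 if avg_pos >= 0.5 else -1
    return label, avg_pos

label, avg_pos = med_predict([1.0, 2.0], w_mean=[0.5, 0.5])
```

Compared with plugging in a single point estimate of $w$, the averaged probability is pulled towards 0.5 when the posterior over $w$ is broad, which is one way to read the robustness claim.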

Algorithm
Our solution for CMV-MED is based on variational EM [Sindhwani et al., 2006].
1. Given $\hat{w}_i^{t-1} = E_{q^{t-1}(w_i)}[w_i]$, $i = 1, \ldots, V$, from single-view MED, find the consensus view on the unlabeled data $U$ via information projection, i.e.
$$\log q^t(y \mid x_n^{[V]}) = \frac{1}{V} \sum_{i=1}^{V} \log p_i(y \mid x_n^i, \hat{w}_i^{t-1}) - \log Z(x_n), \quad n \in U,$$
where $Z(x_n)$ is the normalization factor.
2. Given the consensus view $q^t(y \mid x_n)$, $n \in U$, solve a MED problem for each view $i = 1, \ldots, V$ independently to obtain the optimal solution
$$q^t(w_i \mid \alpha_i, \beta_i) = \text{MED-Solver}\left( \{(y_n, x_n^i)\}_{n \in L},\ \{x_m^i\}_{m \in U},\ \{\hat{y}_m \sim q^t(y \mid x_m)\}_{m \in U} \right),$$
where $(\alpha_i, \beta_i)$ are the dual variables associated with the SVM-type solution.
3. Repeat steps 1 and 2 until convergence.
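The alternating structure of steps 1 to 3 can be sketched as follows. This is not the MED solver: as a stand-in, each view is fitted by L2-regularized logistic regression with gradient descent, and the unlabeled samples enter with soft targets equal to the consensus probability rather than sampled labels. For the log-linear model, the uniform-weight projection in step 1 reduces to a logistic function of the mean score across views, which the code uses directly. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def fit_view(X, targets, steps=200, lr=0.5, l2=0.01):
    """STAND-IN for MED-Solver: regularized logistic regression on soft
    targets t = P(y = +1) in [0, 1], fitted by plain gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [l2 * wj for wj in w]
        for x, t in zip(X, targets):
            err = sigmoid(dot(w, x)) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def consensus_pos(ws, views):
    """Step 1: q(y = +1 | x^[V]) with uniform weights 1/V.
    For log-linear views this is sigmoid(mean_i w_i . x^i)."""
    return sigmoid(sum(dot(w, x) for w, x in zip(ws, views)) / len(ws))

def cmv_fit(labeled, unlabeled, n_iter=5):
    """labeled: list of (views, y), y in {-1, +1}; unlabeled: list of views;
    views is a list of V feature vectors, one per view."""
    n_views = len(labeled[0][0])
    hard = [1.0 if y == 1 else 0.0 for _, y in labeled]
    # Initialization: single-view fits on the labeled data only.
    ws = [fit_view([v[i] for v, _ in labeled], hard) for i in range(n_views)]
    for _ in range(n_iter):
        soft = [consensus_pos(ws, v) for v in unlabeled]        # step 1
        ws = [fit_view([v[i] for v, _ in labeled] + [v[i] for v in unlabeled],
                       hard + soft)                             # step 2, per view
              for i in range(n_views)]
    return ws

# Toy two-view data with one feature per view (hypothetical numbers).
labeled = [([[1.0], [0.8]], 1), ([[-1.0], [-1.2]], -1)]
unlabeled = [[[0.9], [1.1]], [[-0.7], [-0.9]], [[1.2], [0.5]], [[-1.1], [-0.6]]]
ws = cmv_fit(labeled, unlabeled)
```

A fixed iteration count replaces the convergence check here for brevity; monitoring the change in the consensus probabilities between iterations would be a natural stopping rule.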


Multisensor footstep recognition
We test on the ARL-Footstep data set [Damarla et al., 2011], a multi-sensor data set of acoustic signals collected by four well-synchronized sensors (labeled Sensor 1, 2, 3, 4) in a natural environment.
The task is to discriminate between human footsteps and human-leading-animal footsteps.
The data comprise 840 segments from human subjects and 660 segments from human-animal subjects.
We choose 600 segments from each class as the training set, with L = 50. In each view, the feature dimension is d = 200.
We measure classification accuracy versus the number of labeled samples.
We compare the proposed CMV-MED model with SVM-2K, MV-MED, and the single-view MED for each view.


Web-page classification
The WebKB4 data set [Craven et al., 2000] is widely used in the multi-view learning literature. It consists of 1051 two-view web pages collected from computer science department web sites at four universities.
The task is to discriminate between course pages and non-course pages. There are 230 course pages and 821 non-course pages.
The two natural views are the words in a web page and the words appearing in the links pointing to that page.
In each view, we compute term frequency-inverse document frequency (TF-IDF) features from the document-word matrix.
We measure classification accuracy versus the number of labeled samples.
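As a small illustration of the feature step, the sketch below computes TF-IDF weights from a document-word count matrix. The slides do not say which TF-IDF variant was used; this version assumes term frequency normalized by document length and an unsmoothed idf of log(N / df). The function name `tfidf` and the toy counts are hypothetical.

```python
import math

def tfidf(counts):
    """counts[d][t] is the raw count of term t in document d.

    ASSUMED variant: tf = count / document length, idf = log(N / df),
    with idf set to 0 for terms that appear in no document.
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[t] > 0) for t in range(n_terms)]
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    features = []
    for row in counts:
        length = sum(row)
        features.append([(c / length) * idf[t] if length > 0 else 0.0
                         for t, c in enumerate(row)])
    return features

# Toy document-word matrix: 3 documents, 3 terms (hypothetical numbers).
page_counts = [[2, 1, 0],
               [1, 0, 1],
               [3, 0, 0]]
page_features = tfidf(page_counts)
```

In the two-view setting, the same function would be applied separately to the page-text matrix and to the link-text matrix, giving one feature vector per view for each web page. A term that occurs in every document, such as the first column here, receives zero weight.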


Conclusion
- The proposed method maximizes the stochastic agreement between different models on unlabeled samples.
- The learned consensus-view distribution is the centroid of all view-specific posterior distributions over the space of probability measures.
- The proposed multi-view learning algorithm has higher accuracy and lower variance than its single-view counterparts.

Acknowledgment
This research was partially supported by US Army Research Office (ARO) grants W911NF and WA11NF A1. We thank the Army Research Lab for providing the data sets.

References
Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT). ACM, 1998.
Michael Collins and Yoram Singer. Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Citeseer, 1999.
Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, and Seán Slattery. Learning to construct knowledge bases from the world wide web. Artificial Intelligence, 118(1):69-113, 2000.
Thyagaraju Damarla, Asif Mehmood, and James Sabatier. Detection of people and animals using non-imaging sensors. In Proceedings of the 14th International Conference on Information Fusion (FUSION), pages 1-8, 2011.
Jason Farquhar, David Hardoon, Hongying Meng, John S. Shawe-Taylor, and Sandor Szedmak. Two view learning: SVM-2K, theory and practice. In Advances in Neural Information Processing Systems, 2005.
Tommi Jaakkola, Marina Meila, and Tony Jebara. Maximum entropy discrimination. In Advances in Neural Information Processing Systems, 1999.
Lawrence A. Klein. Sensor and Data Fusion: A Tool for Information Assessment and Decision Making, volume 324. SPIE Press, Bellingham, WA, 2004.
Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011.
Jan Rupnik and John Shawe-Taylor. Multi-view canonical correlation analysis. In Conference on Data Mining and Data Warehouses (SiKDD 2010), pages 1-4, 2010.
Vikas Sindhwani, S. Sathiya Keerthi, and Olivier Chapelle. Deterministic annealing for semi-supervised kernel machines. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
Shiliang Sun and Guoqing Chao. Multi-view maximum entropy discrimination. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press, 2013.
Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, and Rómer Rosales. Bayesian co-training. In Advances in Neural Information Processing Systems, 2007.
