Multi-sensor classification with Consensus-based Multi-view Maximum Entropy Discrimination
Tianpei Xie, Nasser M. Nasrabadi, Alfred O. Hero
University of Michigan, Ann Arbor; U.S. Army Research Lab
Outline
1. Problem Motivations
2. Consensus-constraint via information geometry
3. Consensus-based Multi-view Maximum Entropy Learning
4. Experiments
5. Conclusion
Motivations
In many applications, samples can be represented in multiple ways (referred to as multi-view samples). For instance:
1. In a web-page network, ...
2. In a multi-sensor network, ...
3. In biometrics, ...
4. etc.
Multi-view learning and challenges
We focus on multi-view learning: learning to predict or classify based on multi-view data. A few challenges arise in multi-view learning:
1. Information fusion? Robustness?
2. Parsimony?
3. Unlabeled samples?
In this work, we consider the semi-supervised multi-view learning problem.
Previous works
1. Multi-view feature learning methods, e.g. Canonical Correlation Analysis (CCA) [Rupnik and Shawe-Taylor, 2010], Bi-modal Deep Autoencoder (Bi-DAE) [Ngiam et al., 2011], SVM-2K [Farquhar et al., 2005], etc. Cons: sensitive to local outliers.
2. Decision-level fusion, e.g. Bayes-Fusion (MCMC, particle filter methods [Klein, 2004]) and model averaging (boosting methods [Collins and Singer, 1999]), etc. Cons: between-view correlation is not taken into account.
3. Consensus-based multi-view learning models, e.g. Co-training [Blum and Mitchell, 1998], Bayesian Co-training (Bayes Co-trn) [Yu et al., 2007], Multi-View MED [Sun and Chao, 2013], etc.
Our contribution
We propose a Consensus-based Multi-View Maximum Entropy Discrimination (CMV-MED) framework:
- Features are view-specific posterior distributions.
- A consensus-view model is learned jointly with the view-specific models.
- A dissimilarity measure between these posterior distributions is proposed.
- The consensus view is the centroid in an intrinsic non-Euclidean space induced via the K-L divergence.
Comparison of multi-view learning methods
Method | Fusion stage | Marks (parsimony, semi-supervised, noise tolerance, Bayesian) | # views
CCA | feature | x x x | 2
Bi-DAE | feature | x x x | 2
SVM-2K | feature | x x x | 2
Bayes-Fusion | decision | x | 2
Boosting | decision | x x | 2
Co-training | consensus | x | 2
Bayes Co-trn | consensus | x | 2
MV-MED | consensus | x | 2
CMV-MED | consensus | | 2
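Assumptions and stochastic consensus
- Binary classification task with $V$ views, with $x^{[V]} \triangleq (x^1, \ldots, x^V) \in \mathcal{X}^1 \times \cdots \times \mathcal{X}^V$ and $\mathcal{Y} = \{-1, +1\}$.
- Log-linear predictive model $\log p_i(y \mid x^i, w_i) \propto \tfrac{1}{2}\, y\, (w_i^T x^i)$ for view $i$.
- The proposed stochastic consensus measure (for $V = 2$) is
$$R_\pi(w_1, w_2) = \mathbb{E}_{(x^1, x^2)}\Big[ D\big(p_1(y \mid x^1, w_1),\, p_2(y \mid x^2, w_2)\big) \Big] = \mathbb{E}_{(x^1, x^2)}\Big[ \min_{q(y \mid x^{[2]}) \in \Delta(\mathcal{Y})} \sum_{i \in \{1,2\}} \pi_i\, \mathrm{KL}\big( q(y \mid x^{[2]}) \,\|\, p_i(y \mid x^i, w_i) \big) \Big],$$
where $\mathrm{KL}(\cdot \,\|\, \cdot)$ denotes the K-L divergence and $\pi$ is the weight.
- $R_\pi \ge 0$, with $R_\pi = 0$ iff $p_1 = p_2$.
- The optimal solution $q^*(y \mid x_m^{[2]})$ is the consensus-view model.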
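To make the definition concrete, here is a minimal sketch that evaluates the two-view dissimilarity directly from its definition. It assumes the normalized form of the log-linear model, which for binary labels is a logistic function of the score, and it replaces the inner minimization with a plain grid search over the single free parameter of a binary distribution. The function names and the grid size are illustrative choices, not part of the slides.

```python
import math

def predictive(w, x):
    """Normalized log-linear model: log p(y|x,w) is proportional to 0.5*y*(w.x).

    For y in {-1,+1} this normalizes to p(y|x,w) = 1 / (1 + exp(-y * w.x)).
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return {+1: p_pos, -1: 1.0 - p_pos}

def kl(q, p):
    """K-L divergence KL(q || p) between two distributions over {-1,+1}."""
    return sum(q[y] * math.log(q[y] / p[y]) for y in q if q[y] > 0.0)

def dissimilarity(p1, p2, pi=(0.5, 0.5), grid=2000):
    """Grid-search approximation of min_q pi_1*KL(q||p1) + pi_2*KL(q||p2)."""
    best = float("inf")
    for k in range(1, grid):
        a = k / grid  # candidate value of q(y=+1), strictly inside (0, 1)
        q = {+1: a, -1: 1.0 - a}
        best = min(best, pi[0] * kl(q, p1) + pi[1] * kl(q, p2))
    return best

def stochastic_consensus(pairs, w1, w2, pi=(0.5, 0.5)):
    """Empirical average of the dissimilarity over paired samples (x1, x2)."""
    values = [dissimilarity(predictive(w1, x1), predictive(w2, x2), pi)
              for x1, x2 in pairs]
    return sum(values) / len(values)
```

The grid search is only for illustration: it shows that the measure is zero when the two views agree and grows as their predictive distributions diverge.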
Comparison with other consensus measures
[Figure: three panels plotting the consensus distance against the outputs of classifier 1 and classifier 2, one panel per measure.]
(1) Stochastic consensus.
(2) Exp-consensus: D(p, q) = exp( sign(p) p sign(q) p).
(3) $\ell_2$-norm consensus: $D(p, q) = \|p - q\|_2$.
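Interpretation in information geometry
$$q^*(y \mid x_m^{[V]}) = \arg\min_{q(y) \in \Delta(\mathcal{Y})} \sum_{i=1}^{V} \pi_i\, \mathrm{KL}\big( q(y \mid x^{[V]}) \,\|\, p_i(y \mid x^i, w_i) \big).$$
The consensus view is the centroid of the convex hull $\mathrm{conv}\{ p_i(y \mid x^i, w_i),\ i = 1, \ldots, V \}$ in $\log(\Delta(\mathcal{Y}))$.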
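The centroid has a closed form, which the following short derivation spells out. It is a standard result for this direction of the K-L divergence and assumes the weights are nonnegative and sum to one; the conditioning on $x$ and $w_i$ is dropped for readability, and $Z$ is a normalizer introduced here for the derivation.

```latex
\begin{aligned}
\sum_{i=1}^{V} \pi_i\, \mathrm{KL}(q \,\|\, p_i)
  &= \sum_{y} q(y) \log q(y) - \sum_{y} q(y) \sum_{i=1}^{V} \pi_i \log p_i(y) \\
  &= \mathrm{KL}(q \,\|\, q^{*}) - \log Z, \\
q^{*}(y) &= \frac{1}{Z} \prod_{i=1}^{V} p_i(y)^{\pi_i},
\qquad
Z = \sum_{y \in \mathcal{Y}} \prod_{i=1}^{V} p_i(y)^{\pi_i}.
\end{aligned}
```

Since $\mathrm{KL}(q \,\|\, q^{*}) \ge 0$ with equality only at $q = q^{*}$, the minimizer is the normalized weighted geometric mean of the view-specific distributions, and the minimum value is $-\log Z$. In log-space this is a weighted average of the $\log p_i$, which is why the centroid lives in $\log(\Delta(\mathcal{Y}))$.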
Maximum Entropy Discrimination (MED)
- The MED framework was introduced by Jaakkola et al. [1999].
- Let $F(y_n, x_n; w) = \log \dfrac{p(y_n \mid x_n, w)}{p(y \neq y_n \mid x_n, w)}$ be the discriminative function.
- MED learns a convex combination of discriminative functions via the maximum entropy principle.
- Assuming the prior on $w$ and $\gamma$ is $p_0(w)\, p_0(\gamma)$, the goal is to learn $q(w, \gamma \mid \mathcal{D})$ by solving
$$\min_{q(w, \gamma \mid \mathcal{D})} \mathrm{KL}\big( q(w, \gamma \mid \mathcal{D}) \,\|\, p_0(w)\, p_0(\gamma) \big) \quad \text{s.t.} \quad \mathbb{E}_{q(w, \gamma \mid \mathcal{D})}\big[ F(y_n, x_n; w) - \gamma_n \big] \ge 0, \ \forall n.$$
- MED defines the decision rule via Bayesian averaging:
$$y^* = \arg\max_{y} \int_{w, \gamma} p(y \mid x, w)\, q(w, \gamma \mid \mathcal{D})\, dw\, d\gamma.$$
- MED is robust compared to a single classifier [Jaakkola et al., 1999].
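As an illustration of the discriminative function and the Bayesian-averaging decision rule, the sketch below assumes the binary log-linear model from the earlier slide, for which the log-odds reduce to $F(y, x; w) = y\, w^T x$. The integral over $q(w, \gamma \mid \mathcal{D})$ is approximated by an average over a handful of weight vectors; these samples are hypothetical stand-ins for draws from a learned posterior, and the margin variables drop out because $p(y \mid x, w)$ does not depend on them.

```python
import math

def log_odds(y, x, w):
    """F(y, x; w) = log p(y|x,w) - log p(-y|x,w), which equals y * (w.x)
    under the binary log-linear model (an assumption of this sketch)."""
    return y * sum(wi * xi for wi, xi in zip(w, x))

def prob(y, x, w):
    """p(y|x,w) = 1 / (1 + exp(-F(y, x; w)))."""
    return 1.0 / (1.0 + math.exp(-log_odds(y, x, w)))

def bayes_average_predict(x, w_samples):
    """Approximate argmax_y of the posterior-averaged p(y|x,w).

    w_samples is a list of weight vectors standing in for draws from q(w|D).
    Returns the predicted label and the averaged class probabilities.
    """
    avg = {y: sum(prob(y, x, w) for w in w_samples) / len(w_samples)
           for y in (+1, -1)}
    return max(avg, key=avg.get), avg
```

For example, `bayes_average_predict([2.0, 1.0], [[1.0, 0.0], [0.8, 0.2], [1.2, -0.1]])` averages three classifiers that all favor the positive class. Averaging over several plausible weight vectors instead of committing to one is the source of the robustness the slide refers to.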
Algorithm
Our solution for CMV-MED is based on variational EM [Sindhwani et al., 2006].
1. Given $\hat{w}_i^{t-1} = \mathbb{E}_{q^{t-1}(w_i)}[w_i]$, $i = 1, \ldots, V$, from single-view MED, find the consensus view on the unlabeled data $U$ via information projection, i.e.
$$\log q^t(y \mid x_n^{[V]}) = \frac{1}{V} \sum_{i=1}^{V} \log p_i(y \mid x_n^i, \hat{w}_i^{t-1}) - \log Z(x_n), \quad n \in U,$$
where $Z(x_n)$ is the normalization factor.
2. Given the consensus view $q^t(y \mid x_n)$, $n \in U$, solve a MED problem independently for each view $i = 1, \ldots, V$ to obtain the optimal solution
$$q^t(w_i \mid \alpha_i, \beta_i) = \text{MED-Solver}\Big( \{(y_n, x_n^i)\}_{n \in L},\ \{x_m^i\}_{m \in U},\ \{\hat{y}_m \sim q^t(y \mid x_m)\}_{m \in U} \Big),$$
where $(\alpha_i, \beta_i)$ are the dual variables associated with the SVM-type solution.
3. Repeat steps 1 and 2 until convergence.
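Step 1 of the algorithm can be sketched in a few lines. The code below assumes the per-view predictive probabilities for one unlabeled sample are already available as dictionaries over the labels, averages their logarithms with equal weights $1/V$, and normalizes. The helper `sample_label` illustrates how a pseudo-label for step 2 might be drawn from the consensus view. Both function names are hypothetical, and the MED solver itself is not reproduced here.

```python
import math
import random

def consensus_view(view_probs):
    """Information projection for one sample.

    view_probs: list of V dicts {+1: p, -1: 1 - p}, one per view.
    Returns q with log q(y) = (1/V) * sum_i log p_i(y) - log Z.
    """
    n_views = len(view_probs)
    log_q = {y: sum(math.log(p[y]) for p in view_probs) / n_views
             for y in (+1, -1)}
    # Log-sum-exp for the normalizer Z, shifted by the maximum for stability.
    m = max(log_q.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_q.values()))
    return {y: math.exp(v - log_z) for y, v in log_q.items()}

def sample_label(q, rng):
    """Draw a pseudo-label from the consensus view q using the generator rng."""
    return +1 if rng.random() < q[+1] else -1

# Two views that agree on the sign but differ in confidence.
q = consensus_view([{+1: 0.9, -1: 0.1}, {+1: 0.6, -1: 0.4}])
pseudo = sample_label(q, random.Random(0))
```

The consensus is a normalized geometric mean, so a view that assigns very low probability to a label pulls the consensus away from that label more strongly than an arithmetic average would.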
Multisensor footstep recognition
We test on the ARL-Footstep data [Damarla et al., 2011], a multi-sensor data set that contains acoustic signals collected by four well-synchronized sensors (labeled Sensor 1, 2, 3, 4) in a natural environment.
- The task is to discriminate between human footsteps and human-leading-animal footsteps.
- The data set involves 840 segments from human subjects and 660 segments from human-animal subjects.
- We choose 600 segments from each class as the training set, with L = 50.
- In each view, the feature dimension is d = 200.
- We measure the classification accuracy vs. the number of labeled samples.
- We compare the proposed CMV-MED model with SVM-2K, MV-MED, and the single-view MED for each view.
Web-page classification
The WebKB4 data set [Craven et al., 2000] is widely used in the multi-view learning literature. It consists of 1051 two-view web pages collected from computer science department web sites at four universities.
- The task is to discriminate between course pages and non-course pages. There are 230 course pages and 821 non-course pages.
- The two natural views are the words in a web page and the words appearing in the links pointing to that page.
- In each view, we compute term frequency-inverse document frequency (TF-IDF) features from the document-word matrix.
- We measure the classification accuracy vs. the number of labeled samples.
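For readers unfamiliar with the feature computation, here is a minimal TF-IDF sketch operating on a document-word count matrix. The slides do not say which TF-IDF variant was used, so the choices below (relative term frequency and an unsmoothed $\log(N / \mathrm{df})$ inverse document frequency) are assumptions made for illustration.

```python
import math

def tfidf(counts):
    """Compute TF-IDF weights from a document-word count matrix.

    counts: list of documents, each a list of nonnegative term counts.
    tf  = count / total count in the document (assumed variant)
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for doc in counts if doc[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) if df[j] > 0 else 0.0
           for j in range(n_terms)]
    weights = []
    for doc in counts:
        total = sum(doc)
        weights.append([(doc[j] / total) * idf[j] if total > 0 else 0.0
                        for j in range(n_terms)])
    return weights
```

Under this variant a term that occurs in every document receives weight zero. In the two-view setting the same computation would be applied separately to the page-text matrix and the link-text matrix.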
Conclusion
- The proposed method maximizes the stochastic agreement between different models on unlabeled samples.
- The learned consensus-view distribution is the centroid of all view-specific posterior distributions over the space of probability measures.
- The proposed multi-view learning algorithm has higher accuracy and lower variance than its single-view counterparts.
Acknowledgment
This research was partially supported by US Army Research Office (ARO) grants W911NF and WA11NF A1. Thanks to the Army Research Lab for providing the data sets.
References
Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT). ACM, 1998.
Michael Collins and Yoram Singer. Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. Citeseer, 1999.
Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, and Seán Slattery. Learning to construct knowledge bases from the world wide web. Artificial Intelligence, 118(1):69-113, 2000.
Thyagaraju Damarla, Asif Mehmood, and James Sabatier. Detection of people and animals using non-imaging sensors. In Information Fusion (FUSION), 2011 Proceedings of the 14th International Conference on, pages 1-8, 2011.
Jason Farquhar, David Hardoon, Hongying Meng, John S. Shawe-Taylor, and Sandor Szedmak. Two view learning: SVM-2K, theory and practice. In Advances in Neural Information Processing Systems, 2005.
Tommi Jaakkola, Marina Meila, and Tony Jebara. Maximum entropy discrimination. In Advances in Neural Information Processing Systems, 1999.
Lawrence A. Klein. Sensor and data fusion: a tool for information assessment and decision making, volume 324. SPIE Press, Bellingham, WA, 2004.
Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y. Ng. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011.
Jan Rupnik and John Shawe-Taylor. Multi-view canonical correlation analysis. In Conference on Data Mining and Data Warehouses (SiKDD 2010), pages 1-4, 2010.
Vikas Sindhwani, S. Sathiya Keerthi, and Olivier Chapelle. Deterministic annealing for semi-supervised kernel machines. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
Shiliang Sun and Guoqing Chao. Multi-view maximum entropy discrimination. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press, 2013.
Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, and Rómer Rosales. Bayesian co-training. In Advances in Neural Information Processing Systems, 2007.
More informationSINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES
SINGLE-TASK AND MULTITASK SPARSE GAUSSIAN PROCESSES JIANG ZHU, SHILIANG SUN Department of Computer Science and Technology, East China Normal University 500 Dongchuan Road, Shanghai 20024, P. R. China E-MAIL:
More informationPramod K. Varshney. EECS Department, Syracuse University This research was sponsored by ARO grant W911NF
Pramod K. Varshney EECS Department, Syracuse University varshney@syr.edu This research was sponsored by ARO grant W911NF-09-1-0244 2 Overview of Distributed Inference U i s may be 1. Local decisions 2.
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationComparison of Log-Linear Models and Weighted Dissimilarity Measures
Comparison of Log-Linear Models and Weighted Dissimilarity Measures Daniel Keysers 1, Roberto Paredes 2, Enrique Vidal 2, and Hermann Ney 1 1 Lehrstuhl für Informatik VI, Computer Science Department RWTH
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Andreas Maletti Technische Universität Dresden Fakultät Informatik June 15, 2006 1 The Problem 2 The Basics 3 The Proposed Solution Learning by Machines Learning
More informationTowards Maximum Geometric Margin Minimum Error Classification
THE SCIENCE AND ENGINEERING REVIEW OF DOSHISHA UNIVERSITY, VOL. 50, NO. 3 October 2009 Towards Maximum Geometric Margin Minimum Error Classification Kouta YAMADA*, Shigeru KATAGIRI*, Erik MCDERMOTT**,
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More informationECE-271B. Nuno Vasconcelos ECE Department, UCSD
ECE-271B Statistical ti ti Learning II Nuno Vasconcelos ECE Department, UCSD The course the course is a graduate level course in statistical learning in SLI we covered the foundations of Bayesian or generative
More informationExpectation Maximization, and Learning from Partly Unobserved Data (part 2)
Expectation Maximization, and Learning from Partly Unobserved Data (part 2) Machine Learning 10-701 April 2005 Tom M. Mitchell Carnegie Mellon University Clustering Outline K means EM: Mixture of Gaussians
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive
More informationPrinciples of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata
Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More information