Metric Learning. 16th Feb 2017. Rahul Dey, Anurag Chowdhury
Presentation based on
- Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data." arXiv, 2013.
- Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." JMLR, 2009.
- Parameswaran, Shibin, and Kilian Q. Weinberger. "Large margin multi-task metric learning." NIPS, 2010.
- Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure." AISTATS, 2007.
- Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "Learning good edit similarities with generalization guarantees." ECML

Outline
- Introduction
- Supervised Mahalanobis Distance Learning
- Non-Linear Methods
- Metric Learning for Structured Data
- Conclusion
- Software Packages

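Introduction
The goal of metric learning is to adapt a pairwise real-valued metric function, such as the Mahalanobis distance $d_M(x, x') = \sqrt{(x - x')^T M (x - x')}$, to a problem of interest using training data. The positive semidefinite matrix $M \succeq 0$ in the Mahalanobis distance is the metric to be learnt, subject to constraints of the following forms:
- Must-link / cannot-link constraints (sometimes called positive / negative pairs): $\mathcal{S} = \{(x_i, x_j) : x_i \text{ and } x_j \text{ should be similar}\}$ and $\mathcal{D} = \{(x_i, x_j) : x_i \text{ and } x_j \text{ should be dissimilar}\}$
- Relative constraints (sometimes called training triplets): $\mathcal{R} = \{(x_i, x_j, x_k) : x_i \text{ should be more similar to } x_j \text{ than to } x_k\}$
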
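As a small illustration of how the matrix M reshapes distances, the sketch below evaluates the Mahalanobis distance for two hand-picked matrices. The function name `mahalanobis` and the example matrices are assumptions made for this illustration; they do not come from the slides.

```python
import math

def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a PSD matrix M given as nested lists."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    squared = sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))

identity = [[1.0, 0.0], [0.0, 1.0]]    # M = I recovers the Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]   # illustrative M that weights the first feature more

d_euclid = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
d_stretch = mahalanobis([0.0, 0.0], [3.0, 4.0], stretched)
```

With the identity matrix the result is the ordinary Euclidean distance; the second matrix makes differences along the first feature count four times as much in the squared distance.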
Introduction
A metric learning algorithm aims to find the parameters of the metric $M$ that best agree with the constraints $\mathcal{S}$, $\mathcal{D}$ and $\mathcal{R}$. This is typically formulated as an optimization problem of the following general form:
$$\min_M \; \ell(M, \mathcal{S}, \mathcal{D}, \mathcal{R}) + \lambda R(M)$$
where $\ell(M, \mathcal{S}, \mathcal{D}, \mathcal{R})$ is a loss function, $R(M)$ is a regularizer on the parameters $M$ of the learned metric, and $\lambda \ge 0$ is the regularization parameter.

Introduction
Figure sourced from: Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data."

Applications
Metric learning can be beneficial whenever the notion of a metric between instances plays an important role. Active research areas where metric learning is used include:
- Computer Vision: image classification, face recognition
- Information Retrieval: search engines
- Bioinformatics: comparing DNA sequences

Key Properties of a Metric Learning Algorithm
Figure sourced from: Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data."

Supervised Mahalanobis Distance Learning

Introduction
Learn the Mahalanobis distance metric $M \in \mathbb{S}_+^d$ (a $d \times d$ symmetric positive semidefinite matrix) from the data.
Figure sourced from: Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data."

Large Margin Nearest Neighbour (LMNN)
- Aimed at improving kNN classification
- Minimizes the kNN leave-one-out classification error
- Similar to SVMs in its use of a large margin
- The k nearest neighbours $x_j$ of $x_i$ from the same class (target neighbours) are to be pulled together within a margin
- Instances $x_l$ of other classes (impostors), defined by $\|L(x_i - x_l)\|^2 \le \|L(x_i - x_j)\|^2 + 1$ with $M = L^T L$, are to be pushed away from the margin

LMNN Loss Function
$\mu \in [0, 1]$
Figure sourced from: Weinberger, K. Q., and Saul, L. K. "Distance Metric Learning for Large Margin Nearest Neighbour Classification."

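The loss equation itself is not preserved in this transcription. As a hedged sketch, assuming the form given in the cited Weinberger and Saul paper, the loss combines a pull term over target neighbours and a hinge-based push term over impostors, weighted by $\mu$: $\varepsilon(L) = (1 - \mu) \sum_{i, j \rightsquigarrow i} \|L(x_i - x_j)\|^2 + \mu \sum_{i, j \rightsquigarrow i} \sum_{l : y_l \ne y_i} [1 + \|L(x_i - x_j)\|^2 - \|L(x_i - x_l)\|^2]_+$. The function names, the `targets` argument, and the toy data below are assumptions made for illustration.

```python
def sq_dist(L, a, b):
    """Squared distance ||L(a - b)||^2 under the linear map L (nested lists)."""
    diff = [p - q for p, q in zip(a, b)]
    projected = [sum(l * d for l, d in zip(row, diff)) for row in L]
    return sum(v * v for v in projected)

def lmnn_loss(L, X, y, targets, mu=0.5):
    """Return (1 - mu) * pull + mu * push.

    targets[i] lists the indices of the target neighbours of point i
    (same-class points chosen beforehand; an assumption of this sketch).
    """
    pull, push = 0.0, 0.0
    for i, neighbours in enumerate(targets):
        for j in neighbours:
            d_ij = sq_dist(L, X[i], X[j])
            pull += d_ij
            for l in range(len(X)):
                if y[l] != y[i]:
                    # Hinge: active only when x_l invades the unit margin around x_j.
                    push += max(0.0, 1.0 + d_ij - sq_dist(L, X[i], X[l]))
    return (1.0 - mu) * pull + mu * push

L_identity = [[1.0]]
y = [0, 0, 1]
targets = [[1], [0], []]

# A point of class 1 sits between the two class-0 points: impostor present.
loss_with_impostor = lmnn_loss(L_identity, [[0.0], [1.0], [0.5]], y, targets)
# The class-1 point is far away: only the pull term remains.
loss_separated = lmnn_loss(L_identity, [[0.0], [1.0], [10.0]], y, targets)
```

In the first configuration both class-0 points see the class-1 point as an impostor, so the push term is active; in the second it vanishes.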
Loss Function Minimization using SDP
- $\xi_{ijl} \ge 0$: large margin inequality violation
- Semidefinite programming objective

Extensions to LMNN
- Multi-pass LMNN: iteratively use the transformation matrix $L_p$ from the $p$-th pass to compute new target neighbours in the $(p+1)$-th pass.
- Multi-metric LMNN: learn multiple locally linear transformations instead of a single global linear transformation.
- Kernel version: $K_{ij} = \Phi(x_i)^T \Phi(x_j)$
- Pre-processing with PCA to get better distance estimates

LMNN: Application
Figure sourced from: Weinberger, K. Q., and Saul, L. K. "Distance Metric Learning for Large Margin Nearest Neighbour Classification."

Multi-Task Metric Learning

Multi-Task Learning
- We assume that we are given $T$ different but related tasks.
- Each input $(x_i, y_i)$ belongs to exactly one of the tasks $1, \dots, T$.
- Learn $T$ classifiers $\{w_1, \dots, w_T\}$, where each classifier $w_t$ is dedicated to task $t$.
- Learn a global classifier $w_0$ that captures the commonality among all the tasks.
- An example $x_i \in \mathcal{T}_t$ is classified by the rule $y_i = \mathrm{sign}(x_i^T (w_0 + w_t))$.

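Multi-Task Learning
The joint optimization problem is to minimize the following cost:
$$\min_{w_0, \dots, w_T} \; \sum_{t=0}^{T} \gamma_t \|w_t\|_2^2 + \sum_{t=1}^{T} \sum_{x_i \in \mathcal{T}_t} [1 - y_i (w_0 + w_t)^T x_i]_+$$
where $[a]_+ = \max(0, a)$. The constants $\gamma_t \ge 0$ trade off the regularization of the various tasks.
- If $\gamma_0 \to +\infty$, then $w_0 = 0$ and all tasks are decoupled.
- If $\gamma_0$ is small and $\gamma_{t>0} \to +\infty$, we obtain $w_{t>0} = 0$ and all the tasks share the same decision function with weights $w_0$.
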
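To make the roles of $w_0$, $w_t$ and $\gamma_t$ concrete, here is a minimal sketch that evaluates the classification rule and a regularized hinge cost of the kind described above (squared-norm regularizers weighted by $\gamma_t$ plus one hinge term per example). The exact cost on the original slide is not preserved, so this form is an assumption based on the cited multi-task paper; all function names and the toy data are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def mt_predict(x, w0, wt):
    """Classify x for one task with the rule sign(x^T (w0 + wt)); ties map to +1."""
    return 1 if dot(x, [a + b for a, b in zip(w0, wt)]) >= 0 else -1

def mt_cost(w0, task_ws, gammas, tasks):
    """Regularized hinge cost.

    gammas[0] weights ||w0||^2 and gammas[t] weights ||w_t||^2;
    tasks[t - 1] is the list of (x, y) pairs of task t, with y in {-1, +1}.
    """
    reg = gammas[0] * dot(w0, w0)
    reg += sum(g * dot(w, w) for g, w in zip(gammas[1:], task_ws))
    hinge = 0.0
    for w_t, examples in zip(task_ws, tasks):
        combined = [a + b for a, b in zip(w0, w_t)]
        for x, y in examples:
            hinge += max(0.0, 1.0 - y * dot(combined, x))
    return reg + hinge

w0 = [1.0, 0.0]                       # shared weights
task_ws = [[0.0, 1.0], [0.0, -1.0]]   # task-specific offsets
tasks = [[([1.0, 1.0], 1)], [([1.0, 1.0], -1)]]
cost = mt_cost(w0, task_ws, [1.0, 1.0, 1.0], tasks)
label_task1 = mt_predict([1.0, 1.0], w0, task_ws[0])
```

Raising `gammas[0]` penalizes the shared vector and pushes the solution towards independent per-task classifiers; raising the task-specific entries pushes all tasks towards the shared classifier.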
Multi-Task Large Margin Nearest Neighbor (mt-lmnn)
The goal is to learn a metric $d_t(\cdot, \cdot)$ for each of the $T$ tasks that minimizes the kNN leave-one-out classification error. The distance for task $t$ is defined by
$$d_t(x_i, x_j) = \sqrt{(x_i - x_j)^T (M_0 + M_t)(x_i - x_j)}$$
where $M_0 \succeq 0$ is the shared metric and $M_1, \dots, M_T \succeq 0$ are task-specific metrics. To balance the learning between the shared parameters $M_0$ and the individual parameters $M_1, \dots, M_T$, we use the regularization given below.

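A brief sketch of the task-specific distance, assuming the shared and task-specific matrices are simply added before the Mahalanobis form is evaluated, as in the cited mt-lmnn paper. The matrices and the task names `speaker` and `gender` are made-up illustrations.

```python
import math

def mt_distance(x, y, M0, Mt):
    """Return sqrt((x - y)^T (M0 + Mt) (x - y)) with matrices as nested lists."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    squared = 0.0
    for i in range(n):
        for j in range(n):
            squared += diff[i] * (M0[i][j] + Mt[i][j]) * diff[j]
    return math.sqrt(max(squared, 0.0))

M0 = [[1.0, 0.0], [0.0, 1.0]]          # shared metric
M_speaker = [[3.0, 0.0], [0.0, 0.0]]   # hypothetical task metric: stresses feature 1
M_gender = [[0.0, 0.0], [0.0, 8.0]]    # hypothetical task metric: stresses feature 2

d_speaker = mt_distance([1.0, 1.0], [0.0, 0.0], M0, M_speaker)
d_gender = mt_distance([1.0, 1.0], [0.0, 0.0], M0, M_gender)
```

The same pair of points ends up at different distances under each task, while the shared matrix contributes equally to both.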
Multi-Task Large Margin Nearest Neighbor (mt-lmnn)
- Regularization in multi-task metric learning: $\gamma_0 \|M_0 - I\|_F^2 + \sum_{t=1}^{T} \gamma_t \|M_t\|_F^2$
- Convex optimization problem of LMNN

Multi-Task Large Margin Nearest Neighbor (mt-lmnn)
Convex optimization problem of mt-lmnn

An application of mt-lmnn
mt-lmnn can be used in text-dependent speech analysis. The tasks are speaker recognition, gender recognition, dialect recognition and emotion recognition. These tasks differ from one another but are related, since every speaker reads the same sample of text.

Non-Linear Metric Learning

Non-Linear Methods
Most work in supervised metric learning has focused on linear metrics, because they make it convenient to derive and optimize convex formulations and are less prone to over-fitting. However, linear metrics are unable to capture nonlinear structure in the data. Two possible solutions are:
- Kernelization of linear methods
- Learning nonlinear forms of metrics
Both approaches involve a non-linear projection of the data into another space where the data become linearly separable, so that linear metrics work well.

Non-Linear Neighborhood Component Analysis
The similarity between two input vectors $x_a, x_b \in X$ is given by $D[f(x_a \mid W), f(x_b \mid W)]$, where $f(x \mid W)$ is a function $f: X \to Y$ that maps the input vectors in $X$ into a feature space $Y$ and is parametrized by $W$.
If $D$ is the Euclidean distance and $f(x \mid W) = Wx$, the (squared) Euclidean distance in the feature space is the Mahalanobis distance in input space:
$$D[f(x_a), f(x_b)] = (x_a - x_b)^T W^T W (x_a - x_b)$$

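The identity between the feature-space Euclidean distance and the input-space Mahalanobis form can be checked numerically. The sketch below does so for one arbitrary matrix W; the helper names and the numbers are assumptions chosen only for illustration.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def gram(W):
    """Return W^T W as nested lists."""
    rows, cols = len(W), len(W[0])
    return [[sum(W[k][i] * W[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]

W = [[1.0, 2.0], [0.0, 1.0]]   # arbitrary linear map f(x | W) = W x
x_a, x_b = [1.0, 0.0], [0.0, 1.0]
diff = [p - q for p, q in zip(x_a, x_b)]

# Squared Euclidean distance between the mapped points W x_a and W x_b.
f_a, f_b = matvec(W, x_a), matvec(W, x_b)
feature_space = sum((p - q) ** 2 for p, q in zip(f_a, f_b))

# Quadratic form (x_a - x_b)^T W^T W (x_a - x_b) in input space.
M = gram(W)
input_space = sum(d * md for d, md in zip(diff, matvec(M, diff)))
```

Both quantities agree, which is why learning a linear map W is equivalent to learning the positive semidefinite matrix $M = W^T W$.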
Non-Linear Neighborhood Component Analysis
We are given a set of $N$ labelled training cases $(x_a, c_a)$, $a = 1, 2, \dots, N$, where $x_a \in \mathbb{R}^d$ and $c_a \in \{1, 2, \dots, C\}$.
For a given training vector $x_a$, the probability that $a$ selects $b$ as its neighbour is
$$p_{ab} = \frac{\exp(-d_{ab})}{\sum_{z \ne a} \exp(-d_{az})}$$
where $d_{ab} = \|f(x_a \mid W) - f(x_b \mid W)\|^2$ is the Euclidean distance metric, $f(\cdot \mid W)$ is a multi-layer neural network and $W$ is its weight vector.

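Non-Linear Neighborhood Component Analysis
The probability that point $a$ belongs to class $k$ depends on the relative proximity of all other data points that belong to class $k$:
$$p(c_a = k) = \sum_{b : c_b = k} p_{ab}$$
The NCA objective is to maximize the expected number of correctly classified points on the training data, $\sum_{a=1}^{N} \sum_{b : c_a = c_b} p_{ab}$. The variant shown here maximizes the sum of the log probabilities of correct classification instead:
$$\max_W \; \sum_{a=1}^{N} \log\Big(\sum_{b : c_a = c_b} p_{ab}\Big)$$
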
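The neighbour probabilities and the two objectives can be sketched directly from these definitions. To keep the example short, the embedded codes are given as plain lists rather than produced by a neural network, which is a simplifying assumption; the function names and the toy points are also illustrative.

```python
import math

def nca_probabilities(Z):
    """p[a][b] = exp(-d_ab) / sum over z != a of exp(-d_az), with p[a][a] = 0.

    Z holds the embedded codes f(x_a | W); d_ab is the squared Euclidean distance.
    """
    n = len(Z)
    P = []
    for a in range(n):
        weights = [
            0.0 if b == a
            else math.exp(-sum((u - v) ** 2 for u, v in zip(Z[a], Z[b])))
            for b in range(n)
        ]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P

def nca_objectives(P, labels):
    """Return (expected number of correct points, sum of log probabilities)."""
    n = len(labels)
    p_correct = [
        sum(P[a][b] for b in range(n) if b != a and labels[b] == labels[a])
        for a in range(n)
    ]
    expected = sum(p_correct)
    # Assumes every point has at least one same-class neighbour with p > 0.
    log_likelihood = sum(math.log(p) for p in p_correct)
    return expected, log_likelihood

Z = [[0.0], [0.1], [3.0], [3.1]]   # two tight, well separated clusters
labels = [0, 0, 1, 1]
P = nca_probabilities(Z)
expected, log_likelihood = nca_objectives(P, labels)
```

For well separated clusters the expected number of correctly classified points approaches N and the log-probability objective approaches zero from below.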
Non-Linear Neighborhood Component Analysis
Figure sourced from: Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure."

Applications of NNCA
- For any fixed distance metric $D$, any feature extraction technique can be thought of as learning a similarity metric.
- The simple classification task of MNIST handwritten digits can be solved by NNCA.
- More complicated tasks such as face recognition and object detection can also be seen as potential application areas for NNCA.

Structured Data Metric Learning

Introduction
Distance between structured data (strings / graphs)
Edit distance (Levenshtein)
- $x, x' \in \Sigma^*$ are two strings made of symbols from the alphabet $\Sigma$.
- Edit script: a sequence of insertions, deletions and substitutions that transforms $x$ into $x'$.
- The edit distance is the cost of the cheapest edit script. It can be computed in $O(|x| \cdot |x'|)$ time using dynamic programming.
- Cost matrix $C$ of size $(|\Sigma| + 1) \times (|\Sigma| + 1)$, where $C_{ij}$ is the cost of substituting $\Sigma_i$ with $\Sigma_j$ (the extra index stands for the empty symbol, which covers insertions and deletions).
Example: the Levenshtein distance between "kitten" and "sitting" is 3.
1. kitten → sitten (substitution of "s" for "k")
2. sitten → sittin (substitution of "i" for "e")
3. sittin → sitting (insertion of "g" at the end)

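The dynamic program mentioned above can be sketched as follows. This is the textbook formulation with uniform operation costs exposed as parameters (an assumption of this sketch; a learned cost matrix would replace them), and the function name is illustrative.

```python
def edit_distance(x, y, sub_cost=1, ins_cost=1, del_cost=1):
    """Cost of the cheapest edit script turning x into y, in O(|x| * |y|) time."""
    m, n = len(x), len(y)
    # D[i][j] = cheapest cost of turning x[:i] into y[:j].
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * del_cost
    for j in range(1, n + 1):
        D[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(
                D[i - 1][j] + del_cost,          # delete x[i-1]
                D[i][j - 1] + ins_cost,          # insert y[j-1]
                D[i - 1][j - 1] + substitution,  # match or substitute
            )
    return D[m][n]

distance = edit_distance("kitten", "sitting")
```

With unit costs this reproduces the kitten/sitting example; changing the cost parameters is the simplest way to see how a different cost assignment changes which script is cheapest.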
Good Similarity Functions
- Balcan et al. introduced the concept of $(\epsilon, \gamma, \tau)$-good similarity functions.
- A similarity function $K$ is $(\epsilon, \gamma, \tau)$-good if a $1 - \epsilon$ proportion of examples are on average $2\gamma$ more similar to reasonable examples of the same class than to reasonable examples of the opposite class, where a $\tau$ proportion of examples must be reasonable.
- $K$ can be used to build a linear separator in an explicit projection space that has margin $\gamma$ and error close to $\epsilon$.
- The separator is a linear classifier with weight vector $\alpha$.

Good Edit Similarity Learning (GESL)
- $\#(x, x')$ is a matrix of size $(|\Sigma| + 1) \times (|\Sigma| + 1)$ such that $\#_{i,j}(x, x')$ is the number of times the edit operation $(i, j)$ is used to turn $x$ into $x'$.
- Edit function
- Similarity function

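The formulas for the edit function and the similarity function are not preserved in this transcription. As a hedged sketch, assuming the definitions from the cited Bellet et al. paper, the edit function is the cost-weighted sum of operation counts, $e_C(x, x') = \sum_{i,j} C_{ij} \, \#_{i,j}(x, x')$, and the similarity is $K_C(x, x') = 2 \exp(-e_C(x, x')) - 1 \in [-1, 1]$. The dictionary representation, the `"$"` empty symbol and the cost values below are assumptions for illustration.

```python
import math

def edit_function(counts, cost):
    """Sum of cost[op] * count over the edit operations used in the script."""
    return sum(cost[op] * n for op, n in counts.items())

def edit_similarity(counts, cost):
    """Map the edit function to a similarity in [-1, 1]: 2 * exp(-e_C) - 1."""
    return 2.0 * math.exp(-edit_function(counts, cost)) - 1.0

# Operation counts for kitten -> sitting; "$" stands for the empty symbol.
counts = {("k", "s"): 1, ("e", "i"): 1, ("$", "g"): 1}

uniform_cost = {op: 0.5 for op in counts}   # hypothetical cost values
free_cost = {op: 0.0 for op in counts}      # every operation is free

e_value = edit_function(counts, uniform_cost)
k_value = edit_similarity(counts, uniform_cost)
k_identical = edit_similarity(counts, free_cost)
```

Because the operation counts are fixed once the edit script is known, the edit function is linear in the costs, which is what makes optimizing the cost matrix tractable.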
GESL Continued
- $T = \{z_i = (x_i, \ell_i)\}_{i=1}^{N_T}$ is a set of $N_T$ training samples.
- $S_L = \{z_j = (x_j, \ell_j)\}_{j=1}^{N_L}$ is a set of $N_L$ landmark examples.
- Optimization criterion
- Alternatively

GESL: Salient Features
- Can be optimized using stochastic gradient descent
- Can be generalized to tree edit distance learning
- Takes advantage of both positive and negative samples
- Converges fast and leads to more accurate and sparser classifiers
Applications
- Natural language processing (spelling correction, etc.)
- DNA sequence matching, etc.

Metric Learning: Conclusions
- Metric learning for numerical data has reached a good level of maturity, with improvements in scalability, accuracy and generalization.
- Much less work has been done on metric learning for structured data. However, recent advances such as GESL are a step towards better theoretical understanding, scalability and flexibility.
- Modelling multimodal similarity, which captures the different ways in which two instances are similar or dissimilar (similarity due to different features), the degree of similarity and the reasons for it, would bring learned metrics closer to our own notions of similarity.

References
- Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data." arXiv, 2013.
- Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." JMLR, 2009.
- Parameswaran, Shibin, and Kilian Q. Weinberger. "Large margin multi-task metric learning." NIPS, 2010.
- Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure." AISTATS, 2007.
- Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "Learning good edit similarities with generalization guarantees." ECML

Software Links
- Metric Learning Toolkit
- GESL

Thank you
Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Temporal and Frequential Metric Learning for Time Series knn Classication Cao-Tri Do 123, Ahlame Douzal-Chouakria
More informationt-sne and its theoretical guarantee
t-sne and its theoretical guarantee Ziyuan Zhong Columbia University July 4, 2018 Ziyuan Zhong (Columbia University) t-sne July 4, 2018 1 / 72 Overview Timeline: PCA (Karl Pearson, 1901) Manifold Learning(Isomap
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationNon-linear Dimensionality Reduction
Non-linear Dimensionality Reduction CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Laplacian Eigenmaps Locally Linear Embedding (LLE)
More informationAn Invariant Large Margin Nearest Neighbour Classifier
An Invariant Large Margin Nearest Neighbour Classifier M. Pawan Kumar P.H.S. Torr A. Zisserman Oxford Brookes University University of Oxford {pkmudigonda,philiptorr}@brookes.ac.uk http://cms.brookes.ac.uk/research/visiongroup
More informationIntroduction to Machine Learning
Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationSupport Vector Machine & Its Applications
Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationSVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels
SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationSupport Vector Machines
Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification
More information