Metric Learning. 16th Feb 2017. Rahul Dey, Anurag Chowdhury


1 Metric Learning
16th Feb 2017
Rahul Dey, Anurag Chowdhury

2 Presentation based on
Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data." arXiv, 2013.
Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." JMLR, 2009.
Parameswaran, Shibin, and Kilian Q. Weinberger. "Large margin multi-task metric learning." NIPS, 2010.
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure." AISTATS, 2007.
Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "Learning good edit similarities with generalization guarantees." ECML

3 Outline
Introduction
Supervised Mahalanobis Distance Learning
Non-Linear Methods
Metric Learning for Structured Data
Conclusion
Software Packages

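4 Introduction
The goal of metric learning is to adapt a pairwise real-valued metric function, such as the Mahalanobis distance $d_M(x, x') = \sqrt{(x - x')^T M (x - x')}$, to a problem of interest using training data.
The matrix $M \succeq 0$ in the Mahalanobis distance is the metric to be learnt / adapted, subject to the constraints below.
Must-link / cannot-link constraints (sometimes called positive / negative pairs): $S = \{(x_i, x_j) : x_i \text{ and } x_j \text{ should be similar}\}$, $D = \{(x_i, x_j) : x_i \text{ and } x_j \text{ should be dissimilar}\}$
Relative constraints (sometimes called training triplets): $R = \{(x_i, x_j, x_k) : x_i \text{ should be more similar to } x_j \text{ than to } x_k\}$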
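
A minimal sketch of how a Mahalanobis distance can change which constraints hold, assuming the usual form $d_M(x, x') = \sqrt{(x - x')^T M (x - x')}$. The matrix `M` and the three points are hypothetical toy values chosen for illustration, not data from the presentation.

```python
import math

def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a PSD matrix M given as nested lists."""
    diff = [a - b for a, b in zip(x, y)]
    # Compute M @ diff, then the dot product diff . (M @ diff).
    M_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in M]
    return math.sqrt(sum(d_i * v_i for d_i, v_i in zip(diff, M_diff)))

# Hypothetical metric: stretches the first feature and shrinks the second.
M = [[4.0, 0.0],
     [0.0, 0.25]]
identity = [[1.0, 0.0],
            [0.0, 1.0]]
x_i, x_j, x_k = [0.0, 0.0], [0.0, 2.0], [1.0, 0.0]

d_ij = mahalanobis(x_i, x_j, M)   # pair that should be similar
d_ik = mahalanobis(x_i, x_k, M)   # pair that should be dissimilar

# A relative (triplet) constraint (x_i, x_j, x_k) holds when d_ij < d_ik.
triplet_satisfied = d_ij < d_ik
# Under the plain Euclidean metric (M = I) the same triplet is violated.
triplet_satisfied_euclidean = (
    mahalanobis(x_i, x_j, identity) < mahalanobis(x_i, x_k, identity)
)
print(d_ij, d_ik, triplet_satisfied, triplet_satisfied_euclidean)
```

With these toy values the triplet is violated under the Euclidean metric but satisfied under `M`, which is the kind of change a metric learning algorithm searches for.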

5 Introduction
A metric learning algorithm aims to find the parameters of the metric $M$ that best agree with the constraints $S$, $D$ and $R$.
This is typically formulated as an optimization problem of the following general form: $\min_M \; \ell(M, S, D, R) + \lambda R(M)$
where $\ell(M, S, D, R)$ is a loss function, $R(M)$ is a regularizer on the parameters $M$ of the learned metric, and $\lambda \ge 0$ is the regularization parameter.
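
The general form "loss plus $\lambda$ times regularizer" can be sketched as below. The hinge penalties, the unit margin and the squared Frobenius regularizer are illustrative assumptions picked for this example; the presentation leaves both the loss and the regularizer generic.

```python
def sq_dist(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))

def hinge(a):
    return max(0.0, a)

def objective(M, S, D, R, lam, margin=1.0):
    """Illustrative loss(M, S, D, R) + lam * ||M||_F^2."""
    loss = 0.0
    # Similar pairs are penalized when farther apart than the margin.
    loss += sum(hinge(sq_dist(x, y, M) - margin) for x, y in S)
    # Dissimilar pairs are penalized when closer than the margin.
    loss += sum(hinge(margin - sq_dist(x, y, M)) for x, y in D)
    # Triplets (x, y, z): x should be closer to y than to z by at least the margin.
    loss += sum(hinge(margin + sq_dist(x, y, M) - sq_dist(x, z, M)) for x, y, z in R)
    regularizer = sum(m * m for row in M for m in row)
    return loss + lam * regularizer
```

A solver would minimize `objective` over positive semidefinite `M`; the sketch only evaluates it for a given matrix.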

6 Introduction
Figure sourced from: Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data."

7 Applications
Metric learning can be beneficial whenever the notion of a metric between instances plays an important role. Active research areas where it is used include:
Computer vision: image classification, face recognition
Information retrieval: search engines
Bioinformatics: comparing DNA sequences

8 Key Properties of a Metric Learning Algorithm
Figure sourced from: Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data."

9 Supervised Mahalanobis Distance Learning

10 Introduction
Learn a Mahalanobis distance metric $M \in \mathbb{S}_+^d$ (a $d \times d$ symmetric positive semidefinite matrix) from the data
Figure sourced from: Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data."

11 Large Margin Nearest Neighbour (LMNN)
Aimed at improving kNN
Minimizes the kNN leave-one-out classification error
Similarity with SVM
The k nearest neighbours $x_j$ of $x_i$ that share its class (target neighbours) are to be pulled together within a margin
Instances $x_l$ of other classes (impostors), defined by $\|L(x_i - x_l)\|^2 \le \|L(x_i - x_j)\|^2 + 1$ where $L$ is the learned linear transformation, are to be pushed away from the margin
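
The notions of target neighbour and impostor can be sketched as follows, assuming distances are measured in the original space (that is, $L = I$) and that an impostor of the pair $(x_i, x_j)$ is a differently labelled point $x_l$ with $\|x_i - x_l\|^2 \le \|x_i - x_j\|^2 + 1$, as in Weinberger and Saul. The function names and the toy data are hypothetical.

```python
def sq_euclid(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def target_neighbours(X, y, i, k):
    """Indices of the k nearest same-class points to X[i] (Euclidean distance)."""
    same_class = [j for j in range(len(X)) if j != i and y[j] == y[i]]
    return sorted(same_class, key=lambda j: sq_euclid(X[i], X[j]))[:k]

def impostors(X, y, i, j):
    """Differently labelled points l that invade the unit margin around (i, j)."""
    limit = sq_euclid(X[i], X[j]) + 1.0
    return [l for l in range(len(X))
            if y[l] != y[i] and sq_euclid(X[i], X[l]) <= limit]

# Toy data: point 2 (class 1) sits between points 0 and 1 (class 0).
X = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
y = [0, 0, 1, 1]
targets = target_neighbours(X, y, i=0, k=1)
invaders = impostors(X, y, i=0, j=targets[0])
print(targets, invaders)
```

Here point 1 is the target neighbour of point 0, and point 2 is an impostor because it lies closer to point 0 than the target neighbour does.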

12 LMNN Loss Function
The weight $\mu \in [0,1]$ balances the pull term against the push term
Figure sourced from: Weinberger K. Q., & Saul L. K. "Distance Metric Learning for Large Margin Nearest Neighbour Classification".
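
A rough sketch of the LMNN loss, assuming the form given by Weinberger and Saul: $\varepsilon(L) = (1-\mu)\sum_{j \rightsquigarrow i} \|L(x_i - x_j)\|^2 + \mu \sum_{i,\, j \rightsquigarrow i} \sum_l (1 - y_{il})\,[1 + \|L(x_i - x_j)\|^2 - \|L(x_i - x_l)\|^2]_+$, where $j \rightsquigarrow i$ means $x_j$ is a target neighbour of $x_i$ and $y_{il} = 1$ only when the labels match. The `targets` dictionary and function names are assumptions made for this example.

```python
def transform(L, x):
    """Apply the linear map L (nested lists) to the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def sq_dist(L, x, y):
    """Squared distance ||L (x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(v * v for v in transform(L, diff))

def lmnn_loss(L, X, y, targets, mu=0.5):
    """targets[i] lists the target-neighbour indices j of point i."""
    pull, push = 0.0, 0.0
    for i, neighbours in targets.items():
        for j in neighbours:
            d_ij = sq_dist(L, X[i], X[j])
            pull += d_ij                      # pull target neighbours closer
            for l in range(len(X)):
                if y[l] != y[i]:              # (1 - y_il) is 1 only for other classes
                    # Hinge: active only when x_l invades the unit margin.
                    push += max(0.0, 1.0 + d_ij - sq_dist(L, X[i], X[l]))
    return (1.0 - mu) * pull + mu * push
```

Only differently labelled points inside the margin contribute to the push term, which is why distant points of other classes do not affect the loss.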

13 Loss Function Minimization using SDP
$\xi_{ijl} \ge 0$: slack variables measuring the violation of the large margin inequality
Semidefinite programming objective
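
For reference, the semidefinite program can be written out as below. This is a reconstruction from the cited Weinberger and Saul paper rather than the slide's own rendering, assuming $M = L^T L$, that $j \rightsquigarrow i$ denotes "$x_j$ is a target neighbour of $x_i$", and that $y_{il} = 1$ if and only if $x_i$ and $x_l$ share a label.

```latex
\begin{aligned}
\min_{M,\,\xi}\quad
  & (1-\mu) \sum_{i,\, j \rightsquigarrow i} (x_i - x_j)^T M (x_i - x_j)
    \;+\; \mu \sum_{i,\, j \rightsquigarrow i,\, l} (1 - y_{il})\, \xi_{ijl} \\
\text{s.t.}\quad
  & (x_i - x_l)^T M (x_i - x_l) - (x_i - x_j)^T M (x_i - x_j) \;\ge\; 1 - \xi_{ijl}, \\
  & \xi_{ijl} \ge 0, \qquad M \succeq 0 .
\end{aligned}
```

The slack variables replace the hinge terms of the loss, which makes the objective linear in $M$ and $\xi$ and leaves $M \succeq 0$ as the only non-linear constraint.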

14 Extensions to LMNN
Multi-pass LMNN: iteratively use the transformation matrix $L_p$ from the $p$-th pass to compute new target neighbours in the $(p+1)$-th pass
Multi-metric LMNN: learn multiple locally linear transformations instead of a single global linear transformation
Kernel version: $K_{ij} = \Phi(x_i)^T \Phi(x_j)$
Pre-processing with PCA to get better distance estimates

15 LMNN: Application
Figure sourced from: Weinberger K. Q., & Saul L. K. "Distance Metric Learning for Large Margin Nearest Neighbour Classification".

16 Multi-Task Metric Learning

17 Multi-Task Learning
We assume that we are given $T$ different but related tasks
Each input $(x_i, y_i)$ belongs to exactly one of the tasks $1, \dots, T$
Learn $T$ classifiers $\{w_1, \dots, w_T\}$, where each classifier $w_t$ is dedicated to task $t$
Learn a global classifier $w_0$ that captures the commonality among all the tasks
An example $x_i \in \mathcal{T}_t$ (an input belonging to task $t$) is classified by the rule $y_i = \mathrm{sign}(x_i^T (w_0 + w_t))$

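18 Multi-Task Learning
The joint optimization problem is to minimize the following cost:
$\min_{w_0, \dots, w_T} \sum_{t=0}^{T} \gamma_t \|w_t\|_2^2 + \sum_{t=1}^{T} \sum_{x_i \in \mathcal{T}_t} [1 - y_i (w_0 + w_t)^T x_i]_+$
where $[a]_+ = \max(0, a)$ and $\mathcal{T}_t$ is the set of inputs belonging to task $t$
The constants $\gamma_t \ge 0$ trade off the regularization of the various tasks
If $\gamma_0 \to +\infty$, then $w_0 = 0$ and all tasks are decoupled
If $\gamma_0$ is small and $\gamma_{t>0} \to +\infty$, we obtain $w_{t>0} = 0$ and all the tasks share the same decision function with weights $w_0$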
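
The classification rule and the joint cost can be sketched as follows, assuming the cost is a sum of $\gamma_t \|w_t\|_2^2$ terms plus a hinge loss $[1 - y_i (w_0 + w_t)^T x_i]_+$ per training example. The data layout (triples of input, label and 1-based task index) and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(x, w0, wt):
    """Classify x from task t with the rule sign(x^T (w_0 + w_t))."""
    score = dot(x, [a + b for a, b in zip(w0, wt)])
    return 1 if score >= 0 else -1   # ties broken towards +1 (arbitrary choice)

def multitask_cost(w0, W, gammas, data):
    """W[t-1] is w_t, gammas[0] is gamma_0, data holds (x, y, t) triples."""
    regularizer = gammas[0] * dot(w0, w0)
    regularizer += sum(g * dot(w, w) for g, w in zip(gammas[1:], W))
    hinge_total = 0.0
    for x, y, t in data:
        w = [a + b for a, b in zip(w0, W[t - 1])]   # shared plus task-specific weights
        hinge_total += max(0.0, 1.0 - y * dot(w, x))
    return regularizer + hinge_total
```

Raising `gammas[0]` pushes `w0` towards zero and decouples the tasks, while raising the task-specific entries pushes every `W[t-1]` towards zero so that all tasks share `w0`.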

19 Multi-Task Large Margin Nearest Neighbor (mt-lmnn)
The goal is to learn a metric $d_t(\cdot, \cdot)$ for each of the $T$ tasks that minimizes the kNN leave-one-out classification error
The distance for task $t$ is defined by $d_t(x_i, x_j) = \sqrt{(x_i - x_j)^T (M_0 + M_t)(x_i - x_j)}$, where $M_0 \succeq 0$ is the shared metric and $M_1, \dots, M_T \succeq 0$ are task-specific metrics
To balance the learning between the shared $M_0$ and the task-specific parameters $M_1, \dots, M_T$, we use the regularization given on the next slide

20 Multi-Task Large Margin Nearest Neighbor (mt-lmnn)
Regularization in multi-task metric learning: $\gamma_0 \|M_0 - I\|_F^2 + \sum_{t=1}^{T} \gamma_t \|M_t\|_F^2$
Convex optimization problem of LMNN
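
A small sketch of the two mt-lmnn ingredients, assuming the task distance uses the sum $M_0 + M_t$ and the regularizer has the form $\gamma_0 \|M_0 - I\|_F^2 + \sum_t \gamma_t \|M_t\|_F^2$ from Parameswaran and Weinberger. The function names are illustrative only.

```python
def task_sq_dist(x, y, M0, Mt):
    """Squared task distance (x - y)^T (M_0 + M_t) (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * (M0[i][j] + Mt[i][j]) * diff[j]
               for i in range(n) for j in range(n))

def frob_sq(A):
    """Squared Frobenius norm of a matrix given as nested lists."""
    return sum(a * a for row in A for a in row)

def mt_regularizer(M0, Ms, gamma0, gammas):
    """gamma_0 ||M_0 - I||_F^2 + sum_t gamma_t ||M_t||_F^2."""
    n = len(M0)
    M0_minus_I = [[M0[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return gamma0 * frob_sq(M0_minus_I) + sum(
        g * frob_sq(M) for g, M in zip(gammas, Ms))
```

Under this form the shared metric is regularized towards the identity (plain Euclidean distance) while the task-specific metrics are regularized towards zero, so a large `gamma0` keeps the shared part Euclidean and large task weights force all tasks onto the shared metric.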

21 Multi-Task Large Margin Nearest Neighbor (mt-lmnn)
Convex optimization problem of mt-lmnn

22 An application of mt-lmnn
mt-lmnn can be used in text-dependent speech analysis applications
The different tasks are speaker recognition, gender recognition, dialect recognition and emotion recognition
The tasks differ from one another but are related in that all speakers read one common sample of text

23 Non-Linear Metric Learning

24 Non-Linear Methods
Most work in supervised metric learning has focused on linear metrics, because they are convenient to derive and optimize (convex formulations) and are less prone to over-fitting
But linear metrics are unable to capture nonlinear structure in the data
Two possible solutions are:
Kernelization of linear methods
Learning nonlinear forms of metrics
Both approaches involve a non-linear projection of the data into another space where the data is linearly separable, so that linear metrics work well

25 Non-Linear Neighborhood Component Analysis
The similarity between two input vectors $x_a, x_b \in X$ is given by $D[f(x_a|W), f(x_b|W)]$
where $f(x|W)$ is a function $f: X \to Y$ that maps the input vectors in $X$ into a feature space $Y$ and is parametrized by $W$
If $D$ is the Euclidean distance and $f(x|W) = Wx$, then the (squared) Euclidean distance in the feature space is the Mahalanobis distance in input space: $D[f(x_a), f(x_b)] = (x_a - x_b)^T W^T W (x_a - x_b)$
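
The equivalence for the linear case $f(x|W) = Wx$ can be checked numerically. The sketch below compares the squared Euclidean distance between the mapped points with $(x_a - x_b)^T W^T W (x_a - x_b)$; the matrix `W` and the two vectors are hypothetical values.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def sq_mahalanobis(x_a, x_b, W):
    """(x_a - x_b)^T W^T W (x_a - x_b), with M = W^T W formed explicitly."""
    rows, cols = len(W), len(W[0])
    M = [[sum(W[k][i] * W[k][j] for k in range(rows)) for j in range(cols)]
         for i in range(cols)]
    diff = [a - b for a, b in zip(x_a, x_b)]
    return sum(diff[i] * M[i][j] * diff[j]
               for i in range(cols) for j in range(cols))

W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]          # hypothetical 2x3 projection
x_a, x_b = [1.0, 0.0, 2.0], [0.0, 1.0, 1.0]

in_feature_space = sq_euclid(matvec(W, x_a), matvec(W, x_b))
in_input_space = sq_mahalanobis(x_a, x_b, W)
print(in_feature_space, in_input_space)
```

Both quantities agree, which is why a learned linear map and a learned Mahalanobis matrix $M = W^T W$ describe the same metric.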

26 Non-Linear Neighborhood Component Analysis
Given a set of $N$ labelled training cases $(x_a, c_a)$, $a = 1, 2, \dots, N$, where $x_a \in \mathbb{R}^d$ and $c_a \in \{1, 2, \dots, C\}$
For a given training vector $x_a$, the probability that $a$ selects $b$ as one of its neighbours is $p_{ab} = \frac{\exp(-d_{ab})}{\sum_{z \ne a} \exp(-d_{az})}$
Here $d_{ab} = \|f(x_a|W) - f(x_b|W)\|^2$ is the squared Euclidean distance in feature space, $f(\cdot|W)$ is a multi-layer neural network and $W$ is its weight vector

27 Non-Linear Neighborhood Component Analysis
The probability that point $a$ belongs to class $k$ depends on the relative proximity of all other data points that belong to class $k$: $p(c_a = k) = \sum_{b: c_b = k} p_{ab}$
NCA aims to maximize the expected number of correctly classified points on the training data; in the log form used here, the objective is $\max_W \sum_{a=1}^{N} \log\big(\sum_{b: c_a = c_b} p_{ab}\big)$
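
A minimal sketch of the neighbour probabilities and the log objective, assuming $p_{ab} = \exp(-d_{ab}) / \sum_{z \ne a} \exp(-d_{az})$ with $d_{ab}$ the squared Euclidean distance between embedded points. The list `F` stands for the already computed embeddings $f(x_a|W)$; the neural network itself is not modelled here, and the toy values are hypothetical.

```python
import math

def sq_euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def neighbour_probs(F):
    """P[a][b] = exp(-d_ab) / sum_{z != a} exp(-d_az), with P[a][a] = 0."""
    n = len(F)
    P = []
    for a in range(n):
        weights = [0.0 if z == a else math.exp(-sq_euclid(F[a], F[z]))
                   for z in range(n)]
        total = sum(weights)   # assumes points are close enough to avoid underflow
        P.append([w / total for w in weights])
    return P

def nca_log_objective(F, labels):
    """Sum over a of log( sum of p_ab over b != a with the same label as a )."""
    P = neighbour_probs(F)
    n = len(F)
    return sum(
        math.log(sum(P[a][b] for b in range(n)
                     if b != a and labels[b] == labels[a]))
        for a in range(n)
    )

# Two well separated classes: each point's nearest neighbour shares its label.
F = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [0, 0, 1, 1]
objective_value = nca_log_objective(F, labels)
print(objective_value)
```

The objective approaches its maximum of zero as same-class points move together and other classes move away; training would adjust $W$ by gradient ascent on this quantity.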

28 Non-Linear Neighborhood Component Analysis
Figure sourced from: Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure."

29 Applications of NNCA
For any fixed distance metric $D$, any feature extraction technique can be thought of as learning a similarity metric
The simple classification task of MNIST hand-written digits can be solved by NNCA
Even complicated tasks such as face recognition and object detection can be seen as potential application areas for NNCA

30 Structured Data Metric Learning

31 Introduction
Distance between structured data (strings / graphs)
Edit distance (Levenshtein): cost of the cheapest edit script
$x, x' \in \Sigma^*$ are two strings made of symbols from the alphabet $\Sigma$
Edit script: a sequence of insertions, deletions and substitutions that transforms $x$ into $x'$
Can be computed in $O(|x| \cdot |x'|)$ time using dynamic programming
Cost matrix $C$ of size $(|\Sigma| + 1) \times (|\Sigma| + 1)$, where $C_{ij}$ is the cost of substituting $\Sigma_i$ with $\Sigma_j$ (the extra row and column account for insertions and deletions)
Example: the Levenshtein distance between "kitten" and "sitting" is 3
1. kitten → sitten (substitution of s for k)
2. sitten → sittin (substitution of i for e)
3. sittin → sitting (insertion of g at the end)
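
The dynamic program mentioned above can be sketched as follows. This version uses uniform unit costs by default instead of a full cost matrix $C$; the three cost parameters are a simplification assumed for this example.

```python
def edit_distance(x, y, sub_cost=1, ins_cost=1, del_cost=1):
    """Cost of the cheapest edit script turning x into y, in O(|x| * |y|) time."""
    rows, cols = len(x) + 1, len(y) + 1
    # dist[i][j] = cost of turning the prefix x[:i] into the prefix y[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost          # delete every symbol of x[:i]
    for j in range(1, cols):
        dist[0][j] = j * ins_cost          # insert every symbol of y[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = x[i - 1] == y[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,
                dist[i][j - 1] + ins_cost,
                dist[i - 1][j - 1] + (0 if same else sub_cost),
            )
    return dist[-1][-1]

print(edit_distance("kitten", "sitting"))  # 3, matching the example above
```

Learning an edit metric amounts to replacing the fixed unit costs with learned entries of $C$, so that some substitutions become cheaper than others.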

32 Good Similarity Functions
Balcan et al. introduced the concept of $(\varepsilon, \gamma, \tau)$-good similarity functions
A similarity function $K$ is $(\varepsilon, \gamma, \tau)$-good if a $1 - \varepsilon$ proportion of examples are on average $2\gamma$ more similar to reasonable examples of the same class than to reasonable examples of the opposite class, where a $\tau$ proportion of examples must be reasonable
$K$ can then be used to build a linear separator in an explicit projection space that has margin $\gamma$ and error close to $\varepsilon$
The linear classifier $\alpha$ is obtained by solving an L1-regularized hinge-loss problem over the similarities to a set of landmark examples

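33 Good Edit Similarity Learning (GESL)
$\#(x, x')$ is a $(|\Sigma| + 1) \times (|\Sigma| + 1)$ matrix such that $\#_{i,j}(x, x')$ is the number of times edit operation $(i, j)$ is used to turn $x$ into $x'$
Edit function: $e_C(x, x') = \sum_{0 \le i, j \le |\Sigma|} C_{i,j}\, \#_{i,j}(x, x')$
Similarity function: $K_C(x, x') = 2 \exp(-e_C(x, x')) - 1$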
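
A short sketch of the GESL edit function and similarity, assuming the forms $e_C(x, x') = \sum_{i,j} C_{i,j}\, \#_{i,j}(x, x')$ and $K_C(x, x') = 2\exp(-e_C(x, x')) - 1$ from the cited ECML paper. The operation counts are taken as given (they would come from an edit script), and the cost and count matrices below are hypothetical, with index 0 standing for the empty symbol.

```python
import math

def edit_function(C, counts):
    """e_C(x, x') = sum over (i, j) of C[i][j] * counts[i][j]."""
    return sum(c * n for c_row, n_row in zip(C, counts)
               for c, n in zip(c_row, n_row))

def edit_similarity(C, counts):
    """K_C(x, x') = 2 * exp(-e_C(x, x')) - 1, which lies in (-1, 1]."""
    return 2.0 * math.exp(-edit_function(C, counts)) - 1.0

# Hypothetical alphabet {a, b}; row/column 0 is the empty symbol.
C = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.5],
     [1.0, 0.5, 0.0]]
counts = [[0, 1, 0],    # one insertion of 'a'
          [0, 2, 0],    # two 'a' symbols left unchanged
          [0, 1, 0]]    # one substitution of 'b' by 'a'

e_value = edit_function(C, counts)
k_value = edit_similarity(C, counts)
print(e_value, k_value)
```

Because the edit function is linear in the costs $C$, learning $C$ from similarity constraints stays a tractable optimization problem.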

34 GESL Continued
$T = \{z_i = (x_i, \ell_i)\}_{i=1}^{N_T}$ is a set of $N_T$ training samples
$S_L = \{z_j = (x_j, \ell_j)\}_{j=1}^{N_L}$ is a set of $N_L$ landmark examples
Optimization criterion
Alternatively

35 GESL: Salient Features
Can be optimized using stochastic gradient descent
Can be generalized to tree edit distance learning
Takes advantage of both positive and negative samples
Converges fast and leads to more accurate and sparser classifiers
Applications
Natural language processing (spelling correction, etc.)
DNA sequence matching, etc.

36 Metric Learning: Conclusions
Metric learning for numerical data has reached a good level of maturity, with improvements in scalability, accuracy and generalization
Much less work has been done on metric learning for structured data. However, recent advances such as GESL are a step towards better theoretical understanding, scalability and flexibility
Modelling multimodal similarity, that is, capturing the different ways in which two instances are similar or dissimilar (similarity due to different features), the degree of similarity and the reasons for it, would bring learned metrics closer to our own notions of similarity

37 References
Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "A survey on metric learning for feature vectors and structured data." arXiv, 2013.
Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." JMLR, 2009.
Parameswaran, Shibin, and Kilian Q. Weinberger. "Large margin multi-task metric learning." NIPS, 2010.
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure." AISTATS, 2007.
Bellet, Aurélien, Amaury Habrard, and Marc Sebban. "Learning good edit similarities with generalization guarantees." ECML

38 Software Links
Metric Learning Toolkit
GESL

39 THANK YOU


More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Supervised Learning. George Konidaris

Supervised Learning. George Konidaris Supervised Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom Mitchell,

More information

Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues

Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues O. L. Mangasarian and E. W. Wild Presented by: Jun Fang Multisurface Proximal Support Vector Machine Classification

More information

Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning

Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning Large-scale Image Annotation by Efficient and Robust Kernel Metric Learning Supplementary Material Zheyun Feng Rong Jin Anil Jain Department of Computer Science and Engineering, Michigan State University,

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

Notation. Pattern Recognition II. Michal Haindl. Outline - PR Basic Concepts. Pattern Recognition Notions

Notation. Pattern Recognition II. Michal Haindl. Outline - PR Basic Concepts. Pattern Recognition Notions Notation S pattern space X feature vector X = [x 1,...,x l ] l = dim{x} number of features X feature space K number of classes ω i class indicator Ω = {ω 1,...,ω K } g(x) discriminant function H decision

More information

ECE521: Inference Algorithms and Machine Learning University of Toronto. Assignment 1: k-nn and Linear Regression

ECE521: Inference Algorithms and Machine Learning University of Toronto. Assignment 1: k-nn and Linear Regression ECE521: Inference Algorithms and Machine Learning University of Toronto Assignment 1: k-nn and Linear Regression TA: Use Piazza for Q&A Due date: Feb 7 midnight, 2017 Electronic submission to: ece521ta@gmailcom

More information

Review: Support vector machines. Machine learning techniques and image analysis

Review: Support vector machines. Machine learning techniques and image analysis Review: Support vector machines Review: Support vector machines Margin optimization min (w,w 0 ) 1 2 w 2 subject to y i (w 0 + w T x i ) 1 0, i = 1,..., n. Review: Support vector machines Margin optimization

More information

Kernel Density Metric Learning

Kernel Density Metric Learning Washington University in St. Louis Washington University Open Scholarship All Computer Science and Engineering Research Computer Science and Engineering Report Number: WUCSE-2013-28 2013 Kernel Density

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest

More information

Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning Practice Page 2 of 2 10/28/13 Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

Lecture Support Vector Machine (SVM) Classifiers

Lecture Support Vector Machine (SVM) Classifiers Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in

More information

Geometric Mean Metric Learning

Geometric Mean Metric Learning Pourya Habib Zadeh Reshad Hosseini School of ECE, College of Engineering, University of Tehran, Tehran, Iran Suvrit Sra Massachusetts Institute of Technology, Cambridge, MA, USA P.HABIBZADEH@UT.AC.IR RESHAD.HOSSEINI@UT.AC.IR

More information

Temporal and Frequential Metric Learning for Time Series knn Classication

Temporal and Frequential Metric Learning for Time Series knn Classication Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Temporal and Frequential Metric Learning for Time Series knn Classication Cao-Tri Do 123, Ahlame Douzal-Chouakria

More information

t-sne and its theoretical guarantee

t-sne and its theoretical guarantee t-sne and its theoretical guarantee Ziyuan Zhong Columbia University July 4, 2018 Ziyuan Zhong (Columbia University) t-sne July 4, 2018 1 / 72 Overview Timeline: PCA (Karl Pearson, 1901) Manifold Learning(Isomap

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Ad Placement Strategies

Ad Placement Strategies Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January

More information

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers

Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:

More information

Non-linear Dimensionality Reduction

Non-linear Dimensionality Reduction Non-linear Dimensionality Reduction CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Laplacian Eigenmaps Locally Linear Embedding (LLE)

More information

An Invariant Large Margin Nearest Neighbour Classifier

An Invariant Large Margin Nearest Neighbour Classifier An Invariant Large Margin Nearest Neighbour Classifier M. Pawan Kumar P.H.S. Torr A. Zisserman Oxford Brookes University University of Oxford {pkmudigonda,philiptorr}@brookes.ac.uk http://cms.brookes.ac.uk/research/visiongroup

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the

More information

Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2

Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2 Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal

More information

Support Vector Machine & Its Applications

Support Vector Machine & Its Applications Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia

More information

Neural networks. Chapter 19, Sections 1 5 1

Neural networks. Chapter 19, Sections 1 5 1 Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector

More information

Machine Learning (CSE 446): Neural Networks

Machine Learning (CSE 446): Neural Networks Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /

More information

Mining Classification Knowledge

Mining Classification Knowledge Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification

More information