Semi-Supervised Distance Metric Learning

Semi-Supervised Distance Metric Learning wliu@ee.columbia.edu

Outline: Background, Related Work, Learning Framework, Collaborative Image Retrieval, Future Research.

Background. Euclidean distance: $d(x_1, x_2) = \sqrt{(x_1 - x_2)^\top (x_1 - x_2)} = \|x_1 - x_2\|_2$. Mahalanobis distance: $d_M(x_1, x_2) = \sqrt{(x_1 - x_2)^\top P^{-1} (x_1 - x_2)}$. We seek a distance metric defined by a square matrix $A$: $d_A(x_1, x_2) = \sqrt{(x_1 - x_2)^\top A (x_1 - x_2)} = \|x_1 - x_2\|_A$.
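A minimal NumPy sketch of the two distance forms above; the matrix A below is an arbitrary PSD example, not a learned metric.

import numpy as np

x1 = np.array([1.0, 2.0, 0.5])
x2 = np.array([0.0, 1.5, 1.0])

# Euclidean distance: d(x1, x2) = sqrt((x1 - x2)^T (x1 - x2))
d_euc = np.sqrt((x1 - x2) @ (x1 - x2))

# Generic metric d_A(x1, x2) = sqrt((x1 - x2)^T A (x1 - x2));
# A must be positive semidefinite. Here A = B^T B is just an example.
B = np.random.RandomState(0).randn(3, 3)
A = B.T @ B
d_A = np.sqrt((x1 - x2) @ A @ (x1 - x2))

print(d_euc, d_A)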

Background. $A \in \mathbb{R}^{d \times d}$ is positive semi-definite. A $d \times m$ linear subspace $U$ can be learned as a low-rank approximation of the metric, $A = U U^\top$. Under $U$, the distance becomes the in-subspace distance $d_A(x_1, x_2) = \sqrt{(x_1 - x_2)^\top U U^\top (x_1 - x_2)} = \|U^\top (x_1 - x_2)\|_2$. Semi-supervised settings: 1. labeled and unlabeled data; 2. partial pairwise similarities and dissimilarities. This paper deals with setting 2 using a geometric intuition: similar points should be near each other while dissimilar points should be far apart under the target metric.
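A small sketch, with illustrative dimensions d = 10 and m = 3, showing that the metric distance under A = U U^T equals the Euclidean distance after projecting onto the subspace U:

import numpy as np

d, m = 10, 3                      # original dimension d, subspace dimension m
rng = np.random.RandomState(0)
U = rng.randn(d, m)               # d x m subspace
A = U @ U.T                       # rank-m PSD metric

x1, x2 = rng.randn(d), rng.randn(d)
diff = x1 - x2

d_metric = np.sqrt(diff @ A @ diff)          # sqrt((x1-x2)^T U U^T (x1-x2))
d_subspace = np.linalg.norm(U.T @ diff)      # ||U^T (x1-x2)||_2
assert np.isclose(d_metric, d_subspace)      # the two coincide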

Related Work. Supervised distance metric learning: Globerson et al., Metric Learning by Collapsing Classes, NIPS 18, 2006; K. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS 18, 2006. Semi-supervised distance metric learning: E. P. Xing et al., Distance Metric Learning, with Application to Clustering with Side-Information, NIPS 15, 2003; A. Bar-Hillel et al., Learning a Mahalanobis Metric from Equivalence Constraints, JMLR, 6:937-965, 2005; S. C. Hoi, W. Liu, M. R. Lyu, and W. Y. Ma, Learning Distance Metrics with Contextual Constraints for Image Retrieval, Proc. CVPR, 2006. (My Paper)

Xing et al., NIPS 02: $\min_A \; \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2$ subject to $\sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \ge 1$, $A \succeq 0$. S: a set of positive pairs, i.e., similar pairs. D: a set of negative pairs, i.e., dissimilar pairs.
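The objective and the constraint can be evaluated directly for any candidate metric A; a toy sketch (the pairs in S and D are made up for illustration, and nothing is optimized here):

import numpy as np

def sum_sq_dist(A, pairs):
    # sum of ||xi - xj||_A^2 over the given pairs
    return sum((xi - xj) @ A @ (xi - xj) for xi, xj in pairs)

def sum_dist(A, pairs):
    # sum of ||xi - xj||_A over the given pairs
    return sum(np.sqrt((xi - xj) @ A @ (xi - xj)) for xi, xj in pairs)

rng = np.random.RandomState(1)
X = rng.randn(20, 5)
S = [(X[0], X[1]), (X[2], X[3])]        # similar pairs
D = [(X[4], X[5]), (X[6], X[7])]        # dissimilar pairs

A = np.eye(5)                            # candidate metric
objective = sum_sq_dist(A, S)            # the quantity Xing et al. minimize
constraint_ok = sum_dist(A, D) >= 1.0    # feasibility of the D-constraint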

Xing et al., NIPS 02: If the constraint is replaced with $\sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A^2 \ge 1$, the solution $A$ is always rank 1, which implies that the data are always projected onto a line. Learning with only labeled points might result in overfitting. Applied to clustering, it gives low classification performance.

Graph Laplacian. Construct a graph $G(V, E, W)$ given $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$. Set the weight matrix by $W_{ij} = 1$ if $x_i$ is among the k-NN of $x_j$ or $x_j$ is among the k-NN of $x_i$, and $W_{ij} = 0$ otherwise. The Laplacian matrix is $L = D - W$. Smoothness: $g(y) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (y_i - y_j)^2 W_{ij} = y^\top L y$. Linear case: $y = X^\top u$, so $g(u) = u^\top X L X^\top u$. Here $y \in \mathbb{R}^n$ is a 1-D embedding, such as a linear embedding, and $g(y)$ measures its smoothness over the graph.
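A sketch of the k-NN graph Laplacian and the smoothness measure, assuming toy data, k = 5, and scikit-learn's kneighbors_graph for the neighbor search:

import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.RandomState(0)
X = rng.randn(100, 5)                 # n = 100 points in d = 5 dimensions (rows are x_i^T)
k = 5

# Symmetrized k-NN adjacency: W_ij = 1 if xi is among the k-NN of xj or vice versa
W = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
W = np.maximum(W, W.T)

Dg = np.diag(W.sum(axis=1))           # degree matrix
L = Dg - W                            # graph Laplacian L = D - W

u = rng.randn(5)
y = X @ u                             # linear 1-D embedding y = X^T u
smoothness = y @ L @ y                # g(y) = y^T L y (smaller = smoother over the graph)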

Graph Laplacian Regularization. Assume the subspace related to the desired metric $A = U U^\top$ is $U = [u_1, u_2, \ldots, u_m] \in \mathbb{R}^{d \times m}$, and then formulate a smoothness term which is linear in $A$: $g(A) = \sum_{i=1}^{m} u_i^\top X L X^\top u_i = \mathrm{tr}(U^\top X L X^\top U) = \mathrm{tr}(X L X^\top U U^\top) = \mathrm{tr}(X L X^\top A)$. Notice $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
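A quick numerical check of the identity above, continuing the quantities from the Laplacian sketch (the subspace U and m are illustrative):

# Continuing from the Laplacian sketch: Xd is the d x n data matrix
Xd = X.T                               # d x n, columns are the points
m = 2
U = rng.randn(5, m)                    # d x m subspace
A = U @ U.T

g1 = np.trace(U.T @ Xd @ L @ Xd.T @ U) # sum_i u_i^T X L X^T u_i
g2 = np.trace(Xd @ L @ Xd.T @ A)       # tr(X L X^T A), linear in A
assert np.isclose(g1, g2)              # tr(AB) = tr(BA)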

Learning Framework: $\min_{A, t} \; t + c_s \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2 - c_D \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A^2$ subject to $\mathrm{tr}(X L X^\top A) \le t$, $A \succeq 0$. We introduce a slack variable $t$ that enforces the Laplacian regularization. The above optimization is a standard form of semidefinite programming (SDP), which can be solved efficiently, with the global optimum found by existing convex optimization packages such as SeDuMi.
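A hedged CVXPY sketch of this SDP. The solver (SCS), the weights c_s and c_D, and the toy pairs and Laplacian are all assumptions; SeDuMi is a MATLAB package, so an open-source Python solver stands in here. With arbitrary toy data and weights the problem can be unbounded, so this shows the structure only:

import numpy as np
import cvxpy as cp

rng = np.random.RandomState(0)
d, n = 5, 50
X = rng.randn(d, n)                          # columns are data points
L = np.eye(n)                                # placeholder; in practice the k-NN graph Laplacian
S = [(0, 1), (2, 3)]                         # index pairs assumed similar
D = [(4, 5), (6, 7)]                         # index pairs assumed dissimilar
c_s, c_D = 1.0, 1.0

A = cp.Variable((d, d), PSD=True)            # the metric, constrained to be PSD
t = cp.Variable()
M = X @ L @ X.T                              # constant d x d matrix for the trace term

def sq_dist(i, j):
    diff = X[:, i] - X[:, j]
    return cp.quad_form(diff, A)             # ||xi - xj||_A^2, linear in A

objective = t + c_s * sum(sq_dist(i, j) for i, j in S) \
              - c_D * sum(sq_dist(i, j) for i, j in D)
constraints = [cp.trace(M @ A) <= t]

prob = cp.Problem(cp.Minimize(objective), constraints)
prob.solve(solver=cp.SCS)
A_learned = A.value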

Collaborative Image Retrieval. Collect the log data of user relevance feedback. Each log session can be converted into similar and dissimilar pairwise constraints. Specifically, given a query q, for any two images $x_i$ and $x_j$: if both are marked as relevant, we put them into the set of positive pairs $S_q$; if one is marked as relevant and the other as irrelevant, we put them into the set of negative pairs $D_q$. We denote the log data as $\Omega = \{(S_q, D_q) \mid q = 1, \ldots, Q\}$, where Q is the number of log sessions.
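A sketch of the conversion from relevance-feedback log sessions to pair sets; the log-session data structure below is an assumption for illustration, not the paper's actual format:

from itertools import combinations

# Each log session: query id -> list of (image_id, is_relevant) judgments
log_sessions = {
    'q1': [(3, True), (7, True), (9, False)],
    'q2': [(1, True), (4, False), (8, False)],
}

omega = {}                                    # Omega = {(S_q, D_q) : q = 1..Q}
for q, judgments in log_sessions.items():
    relevant = [i for i, rel in judgments if rel]
    irrelevant = [i for i, rel in judgments if not rel]
    S_q = list(combinations(relevant, 2))                 # both relevant -> similar pair
    D_q = [(i, j) for i in relevant for j in irrelevant]  # relevant vs irrelevant -> dissimilar pair
    omega[q] = (S_q, D_q)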

Laplacian Regularized Metric Learning: $\min_{A, t} \; t + \gamma_s \sum_{q=1}^{Q} \sum_{(x_i, x_j) \in S_q} \mathrm{tr}(A (x_i - x_j)(x_i - x_j)^\top) - \gamma_D \sum_{q=1}^{Q} \sum_{(x_i, x_j) \in D_q} \mathrm{tr}(A (x_i - x_j)(x_i - x_j)^\top)$ subject to $\mathrm{tr}(X L X^\top A) \le t$, $A \succeq 0$.
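The per-session sums of outer products can be precomputed once so that the pairwise terms stay linear in A; a small self-contained sketch with toy descriptors and toy pair sets:

import numpy as np

def pair_scatter(X, pairs):
    # sum over pairs (i, j) of (xi - xj)(xi - xj)^T, a d x d matrix
    M = np.zeros((X.shape[0], X.shape[0]))
    for i, j in pairs:
        diff = X[:, i] - X[:, j]
        M += np.outer(diff, diff)
    return M

rng = np.random.RandomState(0)
X = rng.randn(5, 10)                          # d x n image descriptors (toy)
omega = {'q1': ([(0, 1)], [(0, 2)]),          # {q: (S_q, D_q)} as built above
         'q2': ([(3, 4)], [(3, 5)])}

M_S = sum(pair_scatter(X, S_q) for S_q, _ in omega.values())
M_D = sum(pair_scatter(X, D_q) for _, D_q in omega.values())
# The pairwise terms of the objective become
#   gamma_s * tr(A @ M_S) - gamma_D * tr(A @ M_D),
# which is linear in A, so the SDP structure is unchanged.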

Large Margin Version: $\min_{A, t} \; g(A) + c_s t + c_D \sum_{(x_i, x_j) \in D} [1 + t - \|x_i - x_j\|_A^2]_+$ subject to $\|x_i - x_j\|_A^2 \le t \;\; \forall (x_i, x_j) \in S$, $\|x_i - x_j\|_A^2 < \|x_i - x_l\|_A^2 \;\; \forall (x_i, x_j, x_l) \in R$, $A \succeq 0$. Here $[x]_+ = \max\{x, 0\}$ is a linear hinge loss, and R is a set of triples of points known as relative comparisons.
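A hedged CVXPY sketch of the large-margin version. The hinge term uses cp.pos; since a strict inequality cannot be expressed, the relative-comparison constraints use a small margin eps (an assumption), and the data, pairs, and weights are toy placeholders:

import numpy as np
import cvxpy as cp

rng = np.random.RandomState(0)
d, n = 5, 30
X = rng.randn(d, n)
L = np.eye(n)                                  # placeholder Laplacian
S = [(0, 1)]                                   # similar pairs
D = [(2, 3)]                                   # dissimilar pairs
R = [(0, 1, 4)]                                # triples: i closer to j than to l
c_s, c_D, eps = 1.0, 1.0, 1e-3

A = cp.Variable((d, d), PSD=True)
t = cp.Variable()
XLXt = X @ L @ X.T                             # constant matrix for g(A)

def sq(i, j):
    diff = X[:, i] - X[:, j]
    return cp.quad_form(diff, A)               # ||xi - xj||_A^2, linear in A

g = cp.trace(XLXt @ A)                         # Laplacian smoothness g(A)
hinge = sum(cp.pos(1 + t - sq(i, j)) for i, j in D)   # [1 + t - ||xi-xj||_A^2]_+
objective = g + c_s * t + c_D * hinge

constraints = [sq(i, j) <= t for i, j in S]
constraints += [sq(i, j) + eps <= sq(i, l) for i, j, l in R]   # margin replaces strict '<'

cp.Problem(cp.Minimize(objective), constraints).solve(solver=cp.SCS)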

Discussions. Because the objective function and all constraints are linear in $A$, this learning problem can also be cast as an instance of semidefinite programming (SDP). The constraints characterizing relative comparisons are optional. If the original dimension of the data is very high (~10^3), PCA must be applied to reduce it to a lower dimension (~10^2) before metric learning.
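A one-line sketch of the PCA preprocessing step mentioned above (sample count and dimensions are illustrative):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X_high = rng.randn(500, 1000)          # n samples in a very high dimension (~10^3)

pca = PCA(n_components=100)            # reduce to ~10^2 before metric learning
X_low = pca.fit_transform(X_high)      # the metric A is then learned on X_low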

Semi-Supervised Clustering. Works in the similarity/dissimilarity setting. Approaches: spectral clustering; SSML + constrained K-means; spectral embedding + SSML + constrained K-means.

Thanks! http://www.ee.columbia.edu/~wliu/