Distance Metric Learning in Data Mining (Part II)
Fei Wang and Jimeng Sun, IBM TJ Watson Research Center

Outline
Part I - Applications: motivation and introduction; patient similarity application
Part II - Methods: supervised metric learning; unsupervised metric learning; semi-supervised metric learning; challenges and opportunities for metric learning

Metric Learning Meta-Algorithm
Embed the data in some space, usually by formulating an optimization problem:
1. Define an objective function: separation based, geometry based, or information-theoretic.
2. Solve the optimization problem: eigenvalue decomposition, semi-definite programming, gradient descent, or Bregman projection.
3. Use the Euclidean distance on the embedded space as the learned metric.
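As a concrete illustration of the last step (a minimal sketch of my own, not from the slides): once a method has produced a linear embedding W, the learned metric is just the Euclidean distance between embedded points, which is a Mahalanobis distance with M = W W^T in the original space.

```python
import numpy as np

def embedded_distance(x, y, W):
    """Euclidean distance after the linear embedding z = W^T x."""
    dz = W.T @ (x - y)
    return np.sqrt(dz @ dz)

def mahalanobis_distance(x, y, M):
    """Equivalent view: d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = W W^T."""
    d = x - y
    return np.sqrt(d @ M @ d)
```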

Unsupervised Metric Learning
Learning a pairwise distance metric purely from the data, i.e., without any supervision. The methods organize along two axes, type (linear vs. nonlinear) and objective (global vs. local):

Type       Global objective               Local objective
Linear     Principal Component Analysis   Locality Preserving Projections
Nonlinear  Kernelization (e.g., KPCA)     Kernelization (e.g., kernel LPP), Laplacian Eigenmap, Locally Linear Embedding, Isometric Feature Mapping, Stochastic Neighbor Embedding

Outline
Part I - Applications: motivation and introduction; patient similarity application
Part II - Methods: unsupervised metric learning; supervised metric learning; semi-supervised metric learning; challenges and opportunities for metric learning

Principal Component Analysis (PCA)
Find successive projection directions that maximize the data variance (the 1st PC, then the 2nd PC orthogonal to it, and so on).
Properties: geometry based objective; eigenvalue decomposition; linear mapping; global method. When all components are retained, PCA is a rotation, so the pairwise Euclidean distances do not change.
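A minimal NumPy sketch of PCA (my own illustration, not the authors' code): project onto the top-k eigenvectors of the sample covariance, which maximize the projected variance.

```python
import numpy as np

def pca(X, k):
    """X: (n, d) data matrix. Returns the (n, k) embedding and (d, k) projection."""
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / (len(X) - 1)          # sample covariance, (d, d)
    vals, vecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]              # top-k principal directions
    return Xc @ W, W
```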

Kernel Mapping
Map the data into a high-dimensional feature space such that a problem that is nonlinear in the original space becomes linear in the feature space. We do not need to know the explicit form of the mapping; we only need to define a kernel function that computes inner products in the feature space.
Figure from http://omega.albany.edu:8008/machine-learning-dir/notes-dir/ker1/ker1.pdf
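A small illustration (mine, not from the slides): the RBF kernel is one common choice; its value equals an inner product in an implicit infinite-dimensional feature space, so algorithms that only need inner products never have to form the mapping explicitly.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = x - y
    return np.exp(-gamma * (d @ d))
```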

Kernel Principal Component Analysis (KPCA)
Perform PCA in the feature space. By the representer theorem, the principal directions in feature space are linear combinations of the mapped training points, so the embedding can be computed from the kernel matrix alone. [Figure: original data vs. the embedded data after KPCA.]
Properties: geometry based objective; eigenvalue decomposition; nonlinear mapping; global method.
Figure from http://en.wikipedia.org/wiki/kernel_principal_component_analysis
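A minimal KPCA sketch (my own, assuming an RBF kernel): PCA in feature space reduces to an eigendecomposition of the centered kernel matrix.

```python
import numpy as np

def kpca(X, k, gamma=1.0):
    """X: (n, d). Returns the (n, k) kernel-PCA embedding."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = J @ K @ J                            # center the data in feature space
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]   # top-k components
    return vecs * np.sqrt(np.maximum(vals, 0))          # embedded coordinates
```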

Locality Preserving Projections (LPP)
Find linear projections that preserve the localities of the data set: points that are close in the original space, as encoded by an affinity matrix W with degree matrix D and Laplacian matrix L = D - W, stay close after projection.
[Figure: on a toy data set, PCA selects the x-axis as the major projection direction, while LPP selects the y-axis.]
Properties: geometry based objective; generalized eigenvalue decomposition; linear mapping; local method.
X. He and P. Niyogi. Locality Preserving Projections. NIPS 2003.
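A minimal LPP sketch (my own, following He & Niyogi 2003): build a heat-kernel affinity on k-nearest neighbors, then solve the generalized eigenproblem X^T L X a = lambda X^T D X a for the smallest eigenvalues.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k_dim=2, n_neighbors=5, t=1.0):
    """Assumes X^T D X is positive definite (the paper applies PCA first)."""
    n = len(X)
    D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    nn = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nn[i]] = np.exp(-D2[i, nn[i]] / t)   # heat-kernel weights
    W = np.maximum(W, W.T)                        # symmetrize the graph
    D = np.diag(W.sum(axis=1))                    # degree matrix
    L = D - W                                     # graph Laplacian
    A, B = X.T @ L @ X, X.T @ D @ X
    vals, vecs = eigh(A, B)                       # generalized eigenproblem
    return vecs[:, :k_dim]                        # smallest-eigenvalue directions
```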

Laplacian Embedding (LE)
Find embeddings that preserve the localities of the data set: the embedded coordinate of x_i is chosen so that pairs x_i, x_j with a strong relationship (large affinity) remain close in the embedding.
Properties: geometry based objective; generalized eigenvalue decomposition; nonlinear mapping; local method.
M. Belkin and P. Niyogi. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 15(6):1373-1396, June 2003.
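A minimal Laplacian Eigenmaps sketch (my own): embed using the generalized eigenvectors of L y = lambda D y with the smallest nonzero eigenvalues, discarding the trivial constant solution.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, k_dim=2):
    """W: (n, n) symmetric affinity matrix. Returns the (n, k_dim) embedding."""
    D = np.diag(W.sum(axis=1))       # degree matrix
    L = D - W                        # graph Laplacian
    vals, vecs = eigh(L, D)          # generalized eigenvalues, ascending
    return vecs[:, 1:k_dim + 1]      # skip the constant eigenvector
```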

Locally Linear Embedding (LLE)
Two steps: (1) obtain data relationships by reconstructing each point as a weighted combination of its neighbors; (2) obtain data embeddings that preserve those reconstruction weights.
Properties: geometry based objective; generalized eigenvalue decomposition; nonlinear mapping; local method.
S. Roweis and L. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323-2326, December 2000.
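A minimal LLE sketch (my own, following Roweis & Saul 2000): solve a small regularized linear system per point for the reconstruction weights, then embed with the bottom nonzero eigenvectors of (I - W)^T (I - W).

```python
import numpy as np

def lle(X, k_dim=2, n_neighbors=5, reg=1e-3):
    n = len(X)
    D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    nn = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nn[i]] - X[i]                              # neighbors in local frame
        G = Z @ Z.T                                      # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)     # regularize for stability
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nn[i]] = w / w.sum()                        # weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:k_dim + 1]                          # skip the zero eigenvector
```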

Isometric Feature Mapping (ISOMAP)
1. Construct the neighborhood graph.
2. Compute the shortest path length (geodesic distance) between pairwise data points.
3. Recover the low-dimensional embeddings of the data by Multi-Dimensional Scaling (MDS), preserving those geodesic distances.
Properties: geometry based objective; eigenvalue decomposition; nonlinear mapping; local method.
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 2000.
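A minimal Isomap sketch (my own): kNN graph, all-pairs shortest paths for geodesic distances, then classical MDS on the geodesic distance matrix. It assumes the neighborhood graph is connected.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, k_dim=2, n_neighbors=5):
    n = len(X)
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1))
    G = np.zeros((n, n))                           # zero entries = no edge
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
    G = np.maximum(G, G.T)                         # undirected kNN graph
    DG = shortest_path(G, method='D', directed=False)   # geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n            # classical MDS on DG
    B = -0.5 * J @ (DG**2) @ J
    vals, vecs = np.linalg.eigh(B)
    vals, vecs = vals[::-1][:k_dim], vecs[:, ::-1][:, :k_dim]
    return vecs * np.sqrt(np.maximum(vals, 0))
```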

Stochastic Neighbor Embedding (SNE)
Define the probability that point i picks j as its neighbor (a Gaussian neighborhood distribution in the input space), define the corresponding neighborhood distribution in the embedded space, and find the embedding that best preserves the neighborhood distributions by minimizing their KL divergence.
Properties: information-theoretic objective; gradient descent; nonlinear mapping; local method.
G. E. Hinton and S. Roweis. Stochastic Neighbor Embedding. NIPS 2003.
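A minimal SNE sketch (my own, with a single fixed input-space bandwidth rather than the paper's per-point calibration): gradient descent on the sum of KL divergences between the input-space and embedding-space neighbor distributions.

```python
import numpy as np

def sne(X, k_dim=2, sigma=1.0, lr=0.1, n_iter=500, seed=0):
    n = len(X)

    def neighbor_probs(D2, scale):
        P = np.exp(-D2 / scale)
        np.fill_diagonal(P, 0.0)                      # a point never picks itself
        return P / P.sum(axis=1, keepdims=True)       # row i holds p_{j|i}

    D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    P = neighbor_probs(D2, 2 * sigma**2)              # input-space distributions
    Y = np.random.default_rng(seed).normal(scale=1e-2, size=(n, k_dim))
    for _ in range(n_iter):
        d2 = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)
        Q = neighbor_probs(d2, 2.0)                   # embedding distributions
        S = (P - Q) + (P - Q).T                       # gradient coefficients
        grad = 2 * (S.sum(axis=1)[:, None] * Y - S @ Y)
        Y -= lr * grad                                # gradient-descent step
    return Y
```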

Summary: Unsupervised Distance Metric Learning

Method  Local/Global  Linear/Nonlinear  Objective              Extensibility
PCA     Global        Linear            Geometry               Yes (explicit linear map)
LPP     Local         Linear            Geometry               Yes (explicit linear map)
LLE     Local         Nonlinear         Geometry               No
Isomap  Local         Nonlinear         Geometry               No
SNE     Local         Nonlinear         Information theoretic  No

Outline
Unsupervised Metric Learning
Supervised Metric Learning
Semi-Supervised Metric Learning
Challenges and Opportunities for Metric Learning

Supervised Metric Learning
Learning a pairwise distance metric from data together with their supervision, i.e., data labels or pairwise constraints (must-links and cannot-links). The methods again organize along two axes:

Type       Global objective                                                                       Local objective
Linear     Linear Discriminant Analysis; Eric Xing's method; Information Theoretic Metric Learning  Locally Supervised Metric Learning
Nonlinear  Kernelization of the above

Linear Discriminant Analysis (LDA)
Suppose the data come from C different classes. LDA finds the projections that maximize the between-class scatter relative to the within-class scatter, so that the classes are well separated after projection.
Properties: separation based objective; eigenvalue decomposition; linear mapping; global method.
Figure from http://cmp.felk.cvut.cz/cmp/software/stprtool/examples/ldapca_example1.gif
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
Y. Jia, F. Nie, and C. Zhang. Trace Ratio Problem Revisited. IEEE Trans. Neural Networks, 20(4), 2009.
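A minimal LDA sketch (my own): take the top eigenvectors of Sw^{-1} Sb, where Sw and Sb are the within-class and between-class scatter matrices (at most C-1 useful directions).

```python
import numpy as np

def lda(X, y, k_dim):
    """Assumes the within-class scatter Sw is nonsingular."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)    # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:k_dim]].real                # projection matrix
```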

Distance Metric Learning with Side Information (Eric Xing's method)
Must-link constraint set: each element is a pair of data points that come from the same class or cluster. Cannot-link constraint set: each element is a pair of data points that come from different classes or clusters. The method learns a Mahalanobis metric that shrinks must-link distances while keeping cannot-link pairs apart.
Properties: separation based objective; semi-definite programming; no direct mapping; global method.
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance Metric Learning, with Application to Clustering with Side-Information. NIPS 2002.
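A simplified projected-gradient sketch in the spirit of Xing et al. (not the paper's exact iterative-projection algorithm): pull must-link pairs together, push cannot-link pairs apart, and project the metric back onto the positive semi-definite cone after each step.

```python
import numpy as np

def xing_metric(X, must, cannot, lam=1.0, lr=0.01, n_iter=200):
    """must, cannot: lists of index pairs (i, j). Returns a PSD metric A."""
    d = X.shape[1]
    A = np.eye(d)
    for _ in range(n_iter):
        grad = np.zeros((d, d))
        for i, j in must:                       # sum of squared must-link distances
            v = X[i] - X[j]
            grad += np.outer(v, v)
        for i, j in cannot:                     # sum of cannot-link distances
            v = X[i] - X[j]
            dist = max(np.sqrt(v @ A @ v), 1e-12)
            grad -= lam * np.outer(v, v) / (2 * dist)
        A -= lr * grad
        vals, vecs = np.linalg.eigh(A)          # project onto the PSD cone
        A = vecs @ np.diag(np.maximum(vals, 0)) @ vecs.T
    return A
```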

Information Theoretic Metric Learning (ITML)
Suppose we have an initial Mahalanobis distance parameterized by M_0, a set of must-link constraints, and a set of cannot-link constraints. ITML finds the metric closest to M_0 under the LogDet divergence, subject to the constraints: minimize D_ld(M, M_0) such that d_M(x_i, x_j) <= u for must-link pairs and d_M(x_i, x_j) >= l for cannot-link pairs.
Properties: information theoretic objective; Bregman projection; no direct mapping; global method; scalable and efficient.
J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-Theoretic Metric Learning. ICML 2007 (best paper award).
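A simplified ITML-style sketch (my own; no slack variables, unlike the full algorithm): cycle through the constraints and, whenever one is violated, apply the rank-one Bregman projection under the LogDet divergence that fixes it exactly.

```python
import numpy as np

def itml(X, must, cannot, u=1.0, l=4.0, n_sweeps=20):
    """must, cannot: lists of index pairs. u, l: squared-distance targets."""
    d = X.shape[1]
    A = np.eye(d)                                 # start from M_0 = I
    constraints = [(i, j, u, 'must') for i, j in must] + \
                  [(i, j, l, 'cannot') for i, j in cannot]
    for _ in range(n_sweeps):
        for i, j, target, kind in constraints:
            v = X[i] - X[j]
            p = v @ A @ v                         # current squared distance
            violated = p > target if kind == 'must' else p < target
            if violated and p > 1e-12:
                beta = (target - p) / (p * p)
                Av = A @ v                        # rank-one Bregman projection:
                A = A + beta * np.outer(Av, Av)   # afterwards v^T A v == target
    return A
```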

Summary: Supervised Metric Learning

Method  Supervision           Local/Global  Linear/Nonlinear  Objective              Extensibility
LDA     Labels                Global        Linear            Separation             Yes (explicit linear map)
Xing    Pairwise constraints  Global        Linear            Separation             Yes (learned metric applies to any pair)
ITML    Pairwise constraints  Global        Linear            Information theoretic  Yes (learned metric applies to any pair)
LSML    Labels                Local         Linear            Separation             Yes (explicit linear map)

Outline
Unsupervised Metric Learning
Supervised Metric Learning
Semi-Supervised Metric Learning
Challenges and Opportunities for Metric Learning

Semi-Supervised Metric Learning
Learning a pairwise distance metric using data with and without supervision.

Type       Global objective                Local objective
Linear     Constraint Margin Maximization  Laplacian Regularized Metric Learning
Nonlinear  Kernelization                   Semi-Supervised Dimensionality Reduction

Constraint Margin Maximization (CMM)
Find projections that maximize the data variance together with the projected constraint margin: projected cannot-link pairs are pushed apart (scatterness) while projected must-link pairs are pulled together (compactness). No label information is required, only pairwise constraints.
Properties: geometry and separation based objective; eigenvalue decomposition; linear mapping; global method.
F. Wang, S. Chen, T. Li, and C. Zhang. Semi-Supervised Metric Learning by Maximizing Constraint Margin. CIKM 2008.
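A sketch of one natural reading of this objective (mine; see the paper for the exact weighting): combine the total scatter with a cannot-link-minus-must-link margin matrix and take its top eigenvectors.

```python
import numpy as np

def cmm(X, must, cannot, k_dim=2, alpha=1.0):
    """must, cannot: lists of index pairs. Returns (d, k_dim) projections."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)                        # total scatter (variance term)
    M = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in must)
    C = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in cannot)
    obj = S + alpha * (C / max(len(cannot), 1) - M / max(len(must), 1))
    vals, vecs = np.linalg.eigh(obj)              # symmetric eigendecomposition
    return vecs[:, ::-1][:, :k_dim]               # top-k_dim directions
```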

Laplacian Regularized Metric Learning (LRML)
Learn a metric by combining three terms: a smoothness term (a graph-Laplacian regularizer that keeps the metric consistent with the unlabeled data manifold), a compactness term (must-link pairs stay close), and a scatterness term (cannot-link pairs stay apart).
Properties: geometry and separation based objective; semi-definite programming; no mapping; local and global method.
S. Hoi, W. Liu, and S. Chang. Semi-Supervised Distance Metric Learning for Collaborative Image Retrieval. CVPR 2008.

Semi-Supervised Dimensionality Reduction (SSDR)
Most nonlinear dimensionality reduction algorithms can be formalized as solving min_Y tr(Y^T B Y) subject to a normalization constraint such as Y^T Y = I, where Y holds the low-dimensional embeddings and B is an algorithm-specific symmetric matrix. The semi-supervised variant fixes the embedded coordinates of some points and solves for the rest.
Properties: separation based objective; eigenvalue decomposition; no mapping; local method.
X. Yang, H. Fu, H. Zha, and J. Barlow. Semi-Supervised Nonlinear Dimensionality Reduction. ICML 2006.
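A minimal sketch of the semi-supervised step (my own reading, assuming the relevant block of B is nonsingular): with the coordinates Y_l of a few points fixed, minimizing tr(Y^T B Y) over the remaining points has the closed form B_uu Y_u = -B_ul Y_l.

```python
import numpy as np

def ssdr(B, labeled_idx, Y_l):
    """B: (n, n) algorithm-specific symmetric matrix (e.g., a graph Laplacian).
    labeled_idx: indices whose coordinates Y_l, shape (len(labeled_idx), k), are fixed."""
    n = B.shape[0]
    u = np.setdiff1d(np.arange(n), labeled_idx)   # unlabeled indices
    B_uu = B[np.ix_(u, u)]
    B_ul = B[np.ix_(u, labeled_idx)]
    Y_u = np.linalg.solve(B_uu, -B_ul @ Y_l)      # stationarity condition
    Y = np.zeros((n, Y_l.shape[1]))
    Y[labeled_idx], Y[u] = Y_l, Y_u
    return Y
```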

Summary: Semi-Supervised Distance Metric Learning

Method  Supervision           Local/Global    Linear/Nonlinear  Objective                           Extensibility
CMM     Pairwise constraints  Global          Linear            Separation                          Yes (explicit linear map)
LRML    Pairwise constraints  Local & global  Linear            Separation & locality preservation  Yes (learned metric applies to any pair)
SSDR    Embedded coordinates  Local           Nonlinear         Separation                          No

Outline
Unsupervised Metric Learning
Supervised Metric Learning
Semi-Supervised Metric Learning
Challenges and Opportunities for Metric Learning

Challenges and Opportunities for Metric Learning
Scalability: online (sequential) learning (Shalev-Shwartz, Singer, and Ng, ICML 2004; Davis et al., ICML 2007); distributed learning.
Efficiency: labeling efficiency (Yang, Jin, and Sukthankar, UAI 2007); updating efficiency (Wang, Sun, Hu, and Ebadollahi, SDM 2011).
Heterogeneity: data heterogeneity (Wang, Sun, and Ebadollahi, SDM 2011); task heterogeneity (Zhang and Yeung, KDD 2010).
Evaluation.

References
S. Shalev-Shwartz, Y. Singer, and A. Y. Ng. Online and Batch Learning of Pseudo-Metrics. ICML 2004.
J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-Theoretic Metric Learning. ICML 2007.
L. Yang, R. Jin, and R. Sukthankar. Bayesian Active Distance Metric Learning. UAI 2007.
F. Wang, J. Sun, and S. Ebadollahi. Integrating Distance Metrics Learned from Multiple Experts and its Application in Inter-Patient Similarity Assessment. SDM 2011.
F. Wang, J. Sun, J. Hu, and S. Ebadollahi. iMet: Interactive Metric Learning in Healthcare Applications. SDM 2011.
Y. Zhang and D.-Y. Yeung. Transfer Metric Learning by Learning Task Relationships. KDD 2010, pp. 1199-1208.
