Semi-Supervised Distance Metric Learning
wliu@ee.columbia.edu
Outline
- Background
- Related Work
- Learning Framework
- Collaborative Image Retrieval
- Future Research
Background

Euclidean distance: $d(x_1, x_2) = \sqrt{(x_1 - x_2)^\top (x_1 - x_2)} = \|x_1 - x_2\|_2$

Mahalanobis distance: $d_M(x_1, x_2) = \sqrt{(x_1 - x_2)^\top P^{-1} (x_1 - x_2)}$, where $P$ is the covariance matrix.

We find a distance metric in terms of a square matrix $A$:
$$d_A(x_1, x_2) = \sqrt{(x_1 - x_2)^\top A (x_1 - x_2)} = \|x_1 - x_2\|_A$$
Background

$A \in \mathbb{R}^{d \times d}$ is positive semidefinite. A $d \times m$ linear subspace $U$ can be learned as a low-rank approximation of the metric, $A = UU^\top$. Under $U$, the distance becomes the in-subspace distance:
$$d_A(x_1, x_2) = \sqrt{(x_1 - x_2)^\top UU^\top (x_1 - x_2)} = \|U^\top (x_1 - x_2)\|_2$$

Semi-supervised settings:
1. Labeled and unlabeled data.
2. Partial pairwise similarities and dissimilarities.

This paper deals with setting 2, using a geometric intuition: similar points stay near each other while dissimilar points stay far apart under the target metric.
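To make the low-rank factorization concrete, here is a minimal numpy sketch (my own illustration, not from the slides) checking that the metric distance under $A = UU^\top$ equals the Euclidean distance after projecting onto $U$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 2
U = rng.standard_normal((d, m))      # d x m subspace basis (random stand-in)
A = U @ U.T                          # rank-m PSD metric

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
diff = x1 - x2

d_A = np.sqrt(diff @ A @ diff)       # sqrt((x1 - x2)^T A (x1 - x2))
d_proj = np.linalg.norm(U.T @ diff)  # ||U^T (x1 - x2)||_2
assert np.isclose(d_A, d_proj)       # identical because A = U U^T
```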
Related Work

Supervised distance metric learning:
- A. Globerson et al., Metric Learning by Collapsing Classes, in NIPS 18, 2006.
- K. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, in NIPS 18, 2006.

Semi-supervised distance metric learning:
- E. P. Xing et al., Distance Metric Learning, with Application to Clustering with Side-Information, in NIPS 15, 2003.
- A. Bar-Hillel et al., Learning a Mahalanobis Metric from Equivalence Constraints, JMLR, 6:937-965, 2005.
- S. C. Hoi, W. Liu, M. R. Lyu, and W.-Y. Ma, Learning Distance Metrics with Contextual Constraints for Image Retrieval, in Proc. CVPR, 2006. (my paper)
Xing et al. NIPS 02

$$\min_{A} \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2$$
$$\text{s.t.} \quad \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \geq 1, \quad A \succeq 0$$

S: a set of positive pairs, i.e., similar pairs.
D: a set of negative pairs, i.e., dissimilar pairs.
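A hedged cvxpy sketch of this program (cvxpy and the index-pair input format are my assumptions; the square root in the D-constraint is what keeps the problem convex rather than trivially scalable):

```python
import numpy as np
import cvxpy as cp

def xing_metric(X, S, D):
    """X: d x n data matrix; S, D: lists of (i, j) index pairs."""
    d = X.shape[0]
    A = cp.Variable((d, d), PSD=True)    # the metric, constrained PSD

    def C(i, j):                         # outer product (x_i - x_j)(x_i - x_j)^T
        diff = X[:, i] - X[:, j]
        return np.outer(diff, diff)

    # Sum of squared A-distances over similar pairs, affine in A.
    obj = cp.Minimize(sum(cp.trace(C(i, j) @ A) for i, j in S))
    # Sum of (unsquared) A-distances over dissimilar pairs must exceed 1.
    cons = [sum(cp.sqrt(cp.trace(C(i, j) @ A)) for i, j in D) >= 1]
    cp.Problem(obj, cons).solve()
    return A.value
```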
Xing et al. NIPS 02

If the constraint is replaced with $\sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A^2 \geq 1$, the optimal $A$ is always rank 1, which implies that the data are always projected onto a line.

Learning with only labeled points may result in overfitting. Applied to clustering, it yields low classification performance.
Graph Laplacian

Construct a graph $G(V, E, W)$ over the data $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$. Set the weight matrix by
$$W_{ij} = \begin{cases} 1 & \text{if } x_i \text{ is among the } k\text{-NN of } x_j \text{ or } x_j \text{ is among the } k\text{-NN of } x_i, \\ 0 & \text{otherwise.} \end{cases}$$

The Laplacian matrix is $L = D - W$, where $D$ is the diagonal degree matrix.
$$g(y) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (y_i - y_j)^2 W_{ij} = y^\top L y$$

Linear case: $y = X^\top u$, so $g(u) = u^\top X L X^\top u$.

Here $y \in \mathbb{R}^n$ is a 1D embedding (e.g., a linear embedding), and $g(y)$ measures its smoothness over the graph.
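A small numpy/scipy sketch of the construction (my own, with k and the toy data as assumptions), including a numerical check of the smoothness identity:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_laplacian(X, k=5):
    """X: d x n data matrix; returns (W, L) as n x n arrays."""
    n = X.shape[1]
    tree = cKDTree(X.T)
    _, idx = tree.query(X.T, k=k + 1)   # first neighbor is the point itself
    W = np.zeros((n, n))
    for i in range(n):
        W[i, idx[i, 1:]] = 1.0
    W = np.maximum(W, W.T)              # symmetrize: i in kNN(j) OR j in kNN(i)
    L = np.diag(W.sum(axis=1)) - W      # unnormalized Laplacian L = D - W
    return W, L

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))
W, L = knn_laplacian(X)
y = rng.standard_normal(50)
# Smoothness identity: y^T L y = (1/2) sum_ij W_ij (y_i - y_j)^2
lhs = y @ L @ y
rhs = 0.5 * ((y[:, None] - y[None, :]) ** 2 * W).sum()
assert np.isclose(lhs, rhs)
```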
Graph Laplacian Regularization

Assume the subspace related to the desired metric $A$ is $U = [u_1, u_2, \ldots, u_m] \in \mathbb{R}^{d \times m}$, and then formulate a smoothness term which is linear in $A$:
$$g(A) = \sum_{i=1}^{m} u_i^\top X L X^\top u_i = \mathrm{tr}(U^\top X L X^\top U) = \mathrm{tr}(X L X^\top U U^\top) = \mathrm{tr}(X L X^\top A)$$

Notice $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
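The cyclic-trace step is easy to sanity-check numerically; a tiny sketch (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 6, 2, 20
X = rng.standard_normal((d, n))
L = np.eye(n)                 # stand-in for a graph Laplacian
U = rng.standard_normal((d, m))
M = X @ L @ X.T
# tr(U^T M U) = tr(M U U^T) by the cyclic property of the trace
assert np.isclose(np.trace(U.T @ M @ U), np.trace(M @ U @ U.T))
```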
Learning Framework

$$\min_{A,\, t} \; t + c_S \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2 - c_D \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A^2$$
$$\text{s.t.} \quad \mathrm{tr}(X L X^\top A) \leq t, \quad A \succeq 0$$

Introduce a slack variable $t$ that enforces the Laplacian regularization. The above optimization is a standard form of semidefinite programming (SDP), which can be solved efficiently, with the global optimum found, by existing convex optimization packages such as SeDuMi.
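A hedged cvxpy sketch of this SDP (the slides mention SeDuMi; cvxpy with its default SDP-capable solver is my substitute, and the trace normalization is an assumption I add to keep the toy problem bounded):

```python
import numpy as np
import cvxpy as cp

def lrml_sdp(X, L, S, D, c_S=1.0, c_D=1.0):
    """X: d x n data; L: n x n graph Laplacian; S, D: lists of index pairs."""
    d = X.shape[0]
    A = cp.Variable((d, d), PSD=True)
    t = cp.Variable()

    def pair_term(i, j):                         # ||x_i - x_j||_A^2, affine in A
        diff = X[:, i] - X[:, j]
        return cp.trace(np.outer(diff, diff) @ A)

    obj = t + c_S * sum(pair_term(i, j) for i, j in S) \
            - c_D * sum(pair_term(i, j) for i, j in D)
    cons = [cp.trace(X @ L @ X.T @ A) <= t,      # Laplacian regularization via slack t
            cp.trace(A) == 1]                    # assumed normalization (not in slides)
    cp.Problem(cp.Minimize(obj), cons).solve()
    return A.value
```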
Collaborative Image Retrieval

Collect the log data of user relevance feedback. Each log session can be converted into similar and dissimilar pairwise constraints: given a query q, for any two images x_i and x_j, if both are marked as relevant, we put them into the set of positive pairs S_q; if one is marked as relevant and the other as irrelevant, we put them into the set of negative pairs D_q. We denote the log data as $\Omega = \{(S_q, D_q) \mid q = 1, \ldots, Q\}$, where Q is the number of log sessions.
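A minimal sketch of this conversion (the session dict format is hypothetical, not from the paper):

```python
from itertools import combinations

def session_to_pairs(session):
    """session: {'relevant': [ids], 'irrelevant': [ids]} for one query."""
    S_q = list(combinations(session["relevant"], 2))     # relevant-relevant pairs
    D_q = [(i, j) for i in session["relevant"]
                  for j in session["irrelevant"]]        # relevant-irrelevant pairs
    return S_q, D_q

log = [{"relevant": [0, 3, 7], "irrelevant": [2, 5]}]    # toy log, Q = 1 session
S, D = zip(*(session_to_pairs(s) for s in log))          # per-session pair sets
```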
Laplacian Regularized Metric Learning

$$\min_{A,\, t} \; t + \gamma_S \sum_{q=1}^{Q} \sum_{(x_i, x_j) \in S_q} \mathrm{tr}\!\left(A (x_i - x_j)(x_i - x_j)^\top\right) - \gamma_D \sum_{q=1}^{Q} \sum_{(x_i, x_j) \in D_q} \mathrm{tr}\!\left(A (x_i - x_j)(x_i - x_j)^\top\right)$$
$$\text{s.t.} \quad \mathrm{tr}(X L X^\top A) \leq t, \quad A \succeq 0$$
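With the per-session pair sets in hand, this reduces to the earlier `lrml_sdp` sketch with the session sums flattened (the toy pair sets and the commented call reuse hypothetical names from the previous snippets):

```python
# Toy per-session pair sets, as produced by session_to_pairs above (hypothetical).
S = [[(0, 3), (0, 7), (3, 7)]]
D = [[(0, 2), (0, 5), (3, 2), (3, 5), (7, 2), (7, 5)]]

S_all = [pair for S_q in S for pair in S_q]   # union of all S_q
D_all = [pair for D_q in D for pair in D_q]   # union of all D_q
# With data X and Laplacian L in scope, the earlier sketch applies:
# A_hat = lrml_sdp(X, L, S_all, D_all, c_S=gamma_S, c_D=gamma_D)
```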
Large Margin Version

$$\min_{A,\, t} \; g(A) + c_S\, t + c_D \sum_{(x_i, x_j) \in D} \left[ 1 + t - \|x_i - x_j\|_A^2 \right]_+$$
$$\text{s.t.} \quad \|x_i - x_j\|_A^2 \leq t, \; \forall (x_i, x_j) \in S$$
$$\|x_i - x_j\|_A^2 < \|x_i - x_l\|_A^2, \; \forall (x_i, x_j, x_l) \in R$$
$$A \succeq 0$$

$[x]_+ = \max\{x, 0\}$ is a linear hinge loss.
R: a set of triples of points known as relative comparisons.
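A numpy sketch of the hinge penalty on dissimilar pairs, assuming the reconstruction of the objective above (the function names are mine):

```python
import numpy as np

def hinge(x):
    """[x]_+ = max{x, 0}, the linear hinge loss from the slide."""
    return max(x, 0.0)

def dissimilar_penalty(A, X, D, t):
    """Sum over dissimilar pairs of [1 + t - ||x_i - x_j||_A^2]_+ ."""
    total = 0.0
    for i, j in D:
        diff = X[:, i] - X[:, j]
        total += hinge(1.0 + t - diff @ A @ diff)
    return total
```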
Discussions

Because the objective function and all constraints are linear in $A$, this learning problem can also be cast as an instance of semidefinite programming (SDP).

The constraints characterizing relative comparisons are optional.

If the original dimension of the data is very high (~10^3), PCA must be applied to reduce the dimension to a lower one (~10^2) before metric learning.
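A minimal PCA preprocessing sketch (plain SVD-based PCA; the target dimension is an example value, not from the slides):

```python
import numpy as np

def pca_reduce(X, m=100):
    """Project d x n data onto its top-m principal components."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center each feature
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :m].T @ Xc                   # m x n reduced data
```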
Semi-Supervised Clustering

Work in the similarity/dissimilarity setting:
- Spectral clustering
- SSML + constrained k-means
- Spectral embedding + SSML + constrained k-means
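A simplified sketch of the second pipeline (plain k-means stands in for constrained k-means, which scikit-learn does not provide; the factorization step follows the $A = UU^\top$ decomposition from the Background slides):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_metric(X, A, n_clusters=3):
    """Factor the learned metric A = U U^T and cluster in the transformed space."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    U = V @ np.diag(np.sqrt(w))        # A = U U^T
    Z = U.T @ X                        # transformed data; A-distance is now Euclidean
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z.T)
```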
Thanks!
http://www.ee.columbia.edu/~wliu/