Relevance Aggregation Projections for Image Retrieval


Transcription:

Relevance Aggregation Projections for Image Retrieval. CIVR 2008. Wei Liu, Wei Jiang, Shih-Fu Chang (Columbia University). wliu@ee.columbia.edu

Syllabus: Motivations and Formulation; Our Approach: Relevance Aggregation Projections; Experimental Results; Conclusions.


Motivations and Formulation. Relevance feedback is used to close the semantic gap: it explores knowledge about the user's intention and helps select features and refine models. Relevance feedback mechanism: the user selects a query image; the system presents the highest-ranked images to the user, excluding already-labeled ones; during each iteration, the user marks relevant (positive) and irrelevant (negative) images; the system gradually refines the retrieval results.

Problems. Small sample learning: the number of labeled images is extremely small. High dimensionality: the feature dimension exceeds 100 while the number of labeled samples is below 100. Asymmetry: relevant data are coherent while irrelevant data are diverse.

Asymmetry in CBIR (figure: a query image with its relevant images, which are visually coherent, and its irrelevant images, which are diverse).

Possible Solutions. Asymmetry: (figure: relevant images aggregated around the query while irrelevant images are separated with margin = 1). Small sample learning: semi-supervised learning. Curse of dimensionality: dimensionality reduction.

Previous Work. Methods: LPP (NIPS 03), ARE (ACM MM 05), SSP (ACM MM 06), SR (ACM MM 07), compared on their use of labeled data, use of unlabeled data, handling of asymmetry, and bound on the subspace dimension (d for LPP, l-1 for ARE and SSP, 2 for SR). Notation: image dimension d, total sample number n, labeled sample number l; in CBIR, n > d > l.

Disadvantages. LPP is unsupervised. SSP and SR fail to exploit the asymmetry: SSP emphasizes the irrelevant set, while SR treats the relevant and irrelevant sets equally. ARE, SSP and SR produce very low-dimensional subspaces (at most l-1 dimensions), especially SR (a 2D subspace).

Syllabus: Motivations and Formulation; Relevance Aggregation Projections (RAP); Experimental Results; Conclusions.

Symbols. $n$: total number of samples, $l$: number of labeled samples; $d$: original dimension, $r$: reduced dimension. $X = [x_1, \ldots, x_l, x_{l+1}, \ldots, x_n] \in \mathbb{R}^{d \times n}$: all samples; $X_l = [x_1, \ldots, x_l] \in \mathbb{R}^{d \times l}$: labeled samples. $F^+$: relevant set, $F^-$: irrelevant set; $l^+$: number of relevant samples, $l^-$: number of irrelevant samples. $A \in \mathbb{R}^{d \times r}$: subspace (projection matrix), $a \in \mathbb{R}^{d}$: projecting vector. $G(V, E, W)$: graph, $L = D - W$: graph Laplacian.

Graph Construction. Build a k-NN graph with weights
$$W_{ij} = \begin{cases} \exp\!\left(-\dfrac{\|x_i - x_j\|^2}{\sigma^2}\right), & x_i \in N_k(x_j) \ \text{or} \ x_j \in N_k(x_i) \\ 0, & \text{otherwise} \end{cases}$$
i.e. establish an edge if $x_i$ is among the k nearest neighbors of $x_j$ or $x_j$ is among the k nearest neighbors of $x_i$. The graph Laplacian $L = D - W \in \mathbb{R}^{n \times n}$ is used as a smoothness regularizer.
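To make the construction concrete, here is a minimal numpy sketch of a symmetric k-NN graph with Gaussian weights and its Laplacian; the function name build_knn_graph and the parameters k and sigma are illustrative choices, not taken from the paper.

```python
import numpy as np

def build_knn_graph(X, k=5, sigma=1.0):
    """X is d x n (samples as columns); returns affinity W and Laplacian L = D - W."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                       # never count a point as its own neighbor
    knn = np.argsort(D2, axis=1)[:, :k]                # k nearest neighbors of each point

    W = np.zeros((n, n))
    for i in range(n):
        for j in knn[i]:
            w = np.exp(-D2[i, j] / sigma**2)           # Gaussian weight from the slide
            W[i, j] = W[j, i] = w                      # edge if i in kNN(j) or j in kNN(i)
    L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian
    return W, L
```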

Our Approach.
$$\min_{A \in \mathbb{R}^{d \times r}} \ \operatorname{tr}(A^{\top} X L X^{\top} A) \quad (1.1)$$
$$\text{s.t.} \quad A^{\top} x_i = A^{\top} \sum_{j \in F^+} x_j / l^+, \quad i \in F^+ \quad (1.2)$$
$$\left\| A^{\top}\!\left( x_i - \sum_{j \in F^+} x_j / l^+ \right) \right\|^2 \geq r, \quad i \in F^- \quad (1.3)$$
The target subspace A reduces the raw data from d dimensions to r dimensions. Objective (1.1) minimizes the local scatter using both labeled and unlabeled data. Constraint (1.2) aggregates the positive data (in F+) onto the positive center. Constraint (1.3) pushes the negative data (in F-) away from the positive center, with at least unit distance along each of the r projection directions. Constraints (1.2)-(1.3) directly address the asymmetry in CBIR.

Core Idea: Relevance Aggregation. An ideal subspace is one in which the relevant examples are aggregated into a single point and the irrelevant examples are simultaneously separated by a large margin.

Relevance Aggregation Projections. We transform eq. (1) into eq. (2) in terms of each column vector a of A (a is a projecting vector):
$$\min_{a \in \mathbb{R}^{d}} \ a^{\top} X L X^{\top} a \quad (2.1)$$
$$\text{s.t.} \quad a^{\top} x_i = a^{\top} c^+, \quad i \in F^+ \quad (2.2)$$
$$\left( a^{\top} (x_i - c^+) \right)^2 \geq 1, \quad i \in F^- \quad (2.3)$$
where $c^+ = \sum_{j \in F^+} x_j / l^+$ is the positive center.

Solution. Eq. (2.1)-(2.3) is a quadratically constrained quadratic optimization problem and is thus hard to solve directly. We therefore remove the constraints first and then minimize the cost function, adopting a heuristic to explore the solution: find ideal 1D projections that satisfy the constraints; remove the constraints and solve one part of the solution; then solve the remaining part.

Solution: Find Ideal Projections. Run PCA to get the r principal eigenvectors and renormalize them to obtain $V = [v_1, \ldots, v_r] \in \mathbb{R}^{d \times r}$ such that $V^{\top} X X^{\top} V = I$. For each vector v in V, $|v^{\top} x_i - v^{\top} x_j| < 2$, $i, j = 1, \ldots, n$. Form the ideal 1D projections along each projecting direction v:
$$y_i = \begin{cases} v^{\top} c^+, & i \in F^+ \\ v^{\top} x_i, & i \in F^-, \ |v^{\top} x_i - v^{\top} c^+| \geq 1 \\ v^{\top} c^+ + 1, & i \in F^-, \ 0 \leq v^{\top} x_i - v^{\top} c^+ < 1 \\ v^{\top} c^+ - 1, & i \in F^-, \ -1 < v^{\top} x_i - v^{\top} c^+ < 0 \end{cases} \quad (3)$$
$$y = [y_1, \ldots, y_l]^{\top} \in \mathbb{R}^{l}$$
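A small sketch of how the ideal targets of eq. (3) can be formed for one direction v; Xl (the d x l labeled samples as columns) and the boolean mask positive over the labeled set are assumed inputs, and the helper name ideal_targets is hypothetical.

```python
import numpy as np

def ideal_targets(v, Xl, positive):
    """Ideal 1D projections y of eq. (3) for one projecting direction v."""
    proj = v @ Xl                      # v^T x_i for each labeled sample
    c = proj[positive].mean()          # v^T c^+, projection of the positive center
    y = np.empty_like(proj)
    y[positive] = c                    # relevant samples collapse onto the center
    diff = proj - c
    for i in np.where(~positive)[0]:   # irrelevant samples: enforce unit separation
        if abs(diff[i]) >= 1.0:
            y[i] = proj[i]             # already far enough, keep its projection
        elif diff[i] >= 0.0:
            y[i] = c + 1.0             # push to the right margin
        else:
            y[i] = c - 1.0             # push to the left margin
    return y
```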

Solution: Find Ideal Projections (illustration: the labeled projections $v^{\top} X_l$ are mapped to ideal targets y, with relevant samples placed at $v^{\top} c^+$ and irrelevant samples kept at a distance of at least 1 from $v^{\top} c^+$). The vector y is formed according to each PCA vector v.

Solution: QR Factorization. Remove constraints (2.2)-(2.3) by solving the linear system
$$X_l^{\top} a = y \quad (4)$$
Because $l < d$, eq. (4) is underdetermined and can therefore be satisfied exactly. Perform the QR factorization
$$X_l = [Q_1 \ Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad \text{i.e. } X_l = Q_1 R,$$
with $Q_1 \in \mathbb{R}^{d \times l}$, $Q_2 \in \mathbb{R}^{d \times (d-l)}$, and R upper triangular. The optimal solution is the sum of a particular solution and a complementary solution:
$$a = Q_1 b_1 + Q_2 b_2 \quad (5)$$
where $b_1 = (R^{\top})^{-1} y$.
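As a sketch, the QR step and the particular solution $b_1$ of eqs. (4)-(5) can be obtained with numpy's complete QR decomposition; qr_parts is an illustrative helper name.

```python
import numpy as np

def qr_parts(Xl):
    """QR factorization Xl = [Q1 Q2] [R; 0]; Q2 spans the null space of Xl^T."""
    d, l = Xl.shape
    Q, Rfull = np.linalg.qr(Xl, mode='complete')   # Q is d x d, Rfull is d x l
    Q1, Q2 = Q[:, :l], Q[:, l:]
    R = Rfull[:l, :]                               # upper-triangular l x l block
    return Q1, Q2, R

# particular solution of Xl^T a = y:
#   b1 = np.linalg.solve(R.T, y)   so that  a = Q1 @ b1  satisfies  Xl^T a = y
```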

Solution: Regularization. We want the final solution not to deviate too much from the PCA solution, so we develop a regularization framework:
$$f(a) = \| a - v \|^2 + \gamma\, a^{\top} X L X^{\top} a \quad (6)$$
where $\gamma > 0$ controls the trade-off between the PCA solution and data-locality preservation (the original loss function); the second term behaves as a regularization term. Plugging $a = Q_1 b_1 + Q_2 b_2$ into eq. (6) and minimizing over $b_2$, we solve
$$b_2 = \left( I + \gamma Q_2^{\top} X L X^{\top} Q_2 \right)^{-1} \left( Q_2^{\top} v - \gamma Q_2^{\top} X L X^{\top} Q_1 b_1 \right).$$
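A sketch of this closed-form update for $b_2$, assuming $S = X L X^{\top}$ has been precomputed and $Q_1$, $Q_2$, $b_1$ come from the QR step; solve_b2 and gamma are illustrative names.

```python
import numpy as np

def solve_b2(Q2, S, Q1, b1, v, gamma):
    """b2 = (I + gamma Q2^T S Q2)^{-1} (Q2^T v - gamma Q2^T S Q1 b1)."""
    m = Q2.shape[1]
    lhs = np.eye(m) + gamma * (Q2.T @ S @ Q2)
    rhs = Q2.T @ v - gamma * (Q2.T @ (S @ (Q1 @ b1)))
    return np.linalg.solve(lhs, rhs)
```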

Algorithm.
1. Construct a k-NN graph: W, L, and $S = X L X^{\top}$.
2. PCA initialization: $V = [v_1, \ldots, v_r]$.
3. QR factorization: $Q_1$, $Q_2$, $R$.
4. Transductive regularization: for $j = 1:r$, form y from $v_j$; $b_1 = (R^{\top})^{-1} y$; $b_2 = (I + \gamma Q_2^{\top} S Q_2)^{-1} (Q_2^{\top} v_j - \gamma Q_2^{\top} S Q_1 b_1)$; $a_j = Q_1 b_1 + Q_2 b_2$.
5. Projecting: $[a_1, \ldots, a_r]^{\top} x$.
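Putting the pieces together, a compact sketch of this loop using the hypothetical helpers from the previous sketches (build_knn_graph, ideal_targets, qr_parts, solve_b2); V is assumed to hold the r renormalized PCA directions as columns.

```python
import numpy as np

def rap(X, Xl, positive, V, k=5, sigma=1.0, gamma=0.1):
    """Learn the d x r RAP projection matrix A = [a_1, ..., a_r]."""
    _, L = build_knn_graph(X, k, sigma)       # step 1: graph and Laplacian
    S = X @ L @ X.T                           # smoothness matrix S = X L X^T
    Q1, Q2, R = qr_parts(Xl)                  # step 3: QR factorization
    cols = []
    for j in range(V.shape[1]):               # step 4: one projecting vector per PCA direction
        v = V[:, j]
        y = ideal_targets(v, Xl, positive)    # ideal 1D projections, eq. (3)
        b1 = np.linalg.solve(R.T, y)          # particular solution of Xl^T a = y
        b2 = solve_b2(Q2, S, Q1, b1, v, gamma)
        cols.append(Q1 @ b1 + Q2 @ b2)        # a = Q1 b1 + Q2 b2, eq. (5)
    return np.column_stack(cols)              # step 5: project new samples as A.T @ x
```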

Syllabus: Motivations and Formulation; Our Approach: Relevance Aggregation Projections; Experimental Results; Conclusions.

Experimental Setup. Corel image database: 10,000 images, 100 images per category. Features: two types of color features and two types of texture features, 91 dimensions in total. Five feedback iterations, with the top-10 ranked images labeled in each iteration. The statistical average top-N precision is used for performance evaluation.
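For reference, a minimal sketch of a top-N precision measure of this kind; ranked_labels (category labels of the returned images in ranked order) and query_label are assumed inputs, and treating same-category images as relevant is an assumption about the protocol.

```python
def top_n_precision(ranked_labels, query_label, N):
    """Fraction of the top-N returned images that share the query's category."""
    top = ranked_labels[:N]
    return sum(lab == query_label for lab in top) / float(N)
```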

Evaluation (top-N precision comparison figures; plots not reproduced in this transcription).

Syllabus: Motivations and Formulation; Our Approach: Relevance Aggregation Projections; Experimental Results; Conclusions.

Conclusions. We develop RAP to simultaneously address three fundamental issues in relevance feedback: the asymmetry between classes, the small sample size (by incorporating unlabeled samples), and the high dimensionality. RAP learns a semantic subspace in which the relevant samples collapse to a point while the irrelevant samples are pushed outward with a large margin. RAP can also be used to solve imbalanced semi-supervised learning problems with few labeled samples. Experiments on Corel demonstrate that RAP achieves significantly higher precision than the state of the art.

Thanks! http://www.ee.columbia.edu/~wliu/