Kai Yu NEC Laboratories America, Cupertino, California, USA

Kai Yu, NEC Laboratories America, Cupertino, California, USA
Joint work with Jinjun Wang, Fengjun Lv, Wei Xu, Yihong Gong (NEC Laboratories America); Xi Zhou, Jianchao Yang, Thomas Huang (Univ. of Illinois at Urbana-Champaign); Tong Zhang (Rutgers University); Chen Wu (Stanford University)
PASCAL VOC Challenge, ICCV, Kyoto, Japan, October 3rd, 2009

Where We Are in This Competition

Per-class AP (%) for our 4 submissions, our best, the other teams' best, and our improvement:

Class         Sub1  Sub2  Sub3  Sub4 | Our Best  Other's Best  Improvement
Aeroplane     88.1  88.0  87.1  87.7 |   88.1       86.6          +1.5
Bicycle       68.0  68.6  67.4  67.8 |   68.6       63.9          +4.7
Bird          68.0  67.9  65.8  68.1 |   68.1       66.7          +1.4
Boat          72.5  72.9  72.3  71.1 |   72.9       67.3          +5.6
Bottle        41.0  44.2  40.9  39.1 |   44.2       43.7          +0.5
Bus           78.9  79.5  78.3  78.5 |   79.5       74.1          +5.4
Car           70.4  72.5  69.7  70.6 |   72.5       64.7          +7.8
Cat           70.4  70.8  69.7  70.7 |   70.8       64.2          +6.6
Chair         58.1  59.5  58.5  57.4 |   59.5       57.4          +2.1
Cow           53.4  53.6  50.1  51.7 |   53.6       46.2          +7.4
Diningtable   55.7  57.5  55.1  53.3 |   57.5       54.7          +2.8
Dog           59.3  59.0  56.3  59.2 |   59.3       53.5          +5.8
Horse         73.1  72.6  71.8  71.6 |   73.1       68.1          +5.0
Motorbike     71.3  72.3  70.8  70.6 |   72.3       70.6          +1.7
Person        84.5  85.3  84.1  84.0 |   85.3       85.2          +0.1
Pottedplant   32.3  36.6  31.4  30.9 |   36.6       39.1          -2.5
Sheep         53.3  56.9  51.5  51.7 |   56.9       48.2          +8.7
Sofa          56.7  57.9  55.1  55.9 |   57.9       50.0          +7.9
Train         86.0  85.9  84.7  85.9 |   86.0       83.4          +2.6
Tvmonitor     66.8  68.0  65.2  66.7 |   68.0       68.6          -0.6
Average       65.4  66.5  64.3  64.6 |   66.5       62.8          +3.7

Comparative Overview

Paradigm             State of the Art        Ours
Feature Detection    multiple detectors      dense sampling
Feature Extraction   multiple descriptors    SIFT (gray)
Coding Scheme        VQ                      GMM, LCC
Spatial Pooling      SPM                     SPM
Classifier           nonlinear classifiers   linear classifiers

Our Strategy: minimum feature engineering

Paradigm             State of the Art        Ours
Feature Detection    multiple detectors      dense sampling
Feature Extraction   multiple descriptors    SIFT (gray)
Coding Scheme        VQ                      GMM, LCC
Spatial Pooling      SPM                     SPM
Classifier           nonlinear classifiers   linear classifiers

We bet on machine learning techniques.

Pipeline Overview - I

Input gray image -> extract SIFT on a grid of locations
  Grid step size: every 4 pixels
  Patch sizes: 16x16, 24x24, 32x32
  PCA on SIFT: 128 dim -> 80 dim
Unsupervised codebook learning, then two coding branches:
  GMM coding & SPM -> linear classifiers
  LCC coding & SPM -> linear classifiers
Classifier variants: WCCN, Gaussian process, SVM, Universum
Submission entry NECUIUC_LL-CDCV: overall AP = 64.29%
Submission entry NECUIUC_LN-CDCV: overall AP = 64.63%
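The descriptor preprocessing in this pipeline (PCA from 128-dim SIFT to 80 dims) can be sketched as follows; this is a minimal illustration in which random vectors stand in for real SIFT descriptors:

```python
import numpy as np

# Sketch of the descriptor post-processing step: PCA reduces
# 128-dim SIFT descriptors to 80 dims before coding.
# Random vectors stand in for real SIFT descriptors here.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((10000, 128))  # one row per grid location

def pca_fit(X, n_components):
    """Fit PCA; returns (mean, projection matrix of shape (dim, n_components))."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)        # sample covariance of descriptors
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]        # top principal directions

mean, W = pca_fit(descriptors, 80)
reduced = (descriptors - mean) @ W
print(reduced.shape)  # (10000, 80)
```

In the real pipeline the PCA basis would be fit once on a large pool of training descriptors and reused for all images.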

Pipeline Overview - II

Input gray image -> extract SIFT on a grid of locations
Three coding branches, each followed by linear classifiers:
  GMM coding & SPM
  LCC coding & SPM / Fast LCC coding & SPM (note 1)
  Sliding window by object detection, max pooling (note 2)
Submission entry NECUIUC_CLS-DTCT: overall AP = 66.48%
Notes:
  1. Overall AP is around 58.0%.
  2. Overall AP is around 46% (estimate based on 5-fold cross-validation).

Prior Publications

Local Coordinate Coding:
- Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang, CVPR 2009.
- Nonlinear Learning using Local Coordinate Coding. Kai Yu, Tong Zhang, and Yihong Gong, NIPS 2009, to appear.

GMM:
- Hierarchical Gaussianization for Image Classification. Xi Zhou, Na Cui, Zhen Li, Feng Liang, and Thomas S. Huang, ICCV 2009.
- SIFT-Bag Kernel for Video Event Analysis. Xi Zhou, Xiaodan Zhuang, Shuicheng Yan, Shih-Fu Chang, Mark Hasegawa-Johnson, and Thomas S. Huang, ACM Multimedia 2008.

In our work on the PASCAL challenge, we extended the above work in both engineering and theory.

A Unified Framework

Dense SIFT -> Nonlinear coding on SIFT -> Linear pooling -> Linear classifier -> "cat"
Dense SIFT -> Nonlinear coding on SIFT -> Linear function -> Linear pooling -> "cat"
The two pipelines are exchangeable; either way, a nonlinear function is applied to each SIFT vector.
- What matters is learning a good nonlinear function on SIFT vectors.
- This boils down to learning a good coding scheme for SIFT.

Coding of SIFT

Dense SIFT -> [Nonlinear coding on SIFT] -> Linear pooling -> Linear classifier -> "cat"

Some Notation

- a SIFT feature vector
- its encoding function
- the unknown function on local features
- its approximating function
Supervised learning vs. unsupervised learning of the encoding.
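The symbols on this slide were lost in transcription. Based on the Local Coordinate Coding paper cited earlier (Yu, Zhang & Gong, NIPS 2009), the setup is roughly the following; the symbol names here are assumptions, not the slide's originals:

```latex
% x \in \mathbb{R}^d : a SIFT feature vector
% \gamma(x)          : its encoding (coordinates w.r.t. a codebook C of anchors)
% f                  : the unknown nonlinear function on local features
% The code is chosen so that f is approximated by a function
% that is *linear* in the code:
f(x) \;\approx\; \sum_{v \in C} \gamma_v(x)\, f(v)
```

Under this view, learning the nonlinear function f reduces to learning a good codebook C; the per-anchor values f(v) are then absorbed into the weights of the linear classifier.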

Example 1: Vector Quantization Coding (VQ)

(Figure: target function approximation.)
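VQ coding hard-assigns each descriptor to its nearest codeword, producing a 1-of-K code. A minimal sketch, with a random stand-in codebook:

```python
import numpy as np

# Minimal sketch of vector quantization (VQ) coding: each descriptor
# is hard-assigned to its nearest codeword, giving a 1-of-K code.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 80))   # K codewords (e.g. from k-means)
x = rng.standard_normal(80)                  # one PCA-reduced descriptor

def vq_encode(x, codebook):
    dists = np.linalg.norm(codebook - x, axis=1)
    code = np.zeros(len(codebook))
    code[np.argmin(dists)] = 1.0             # hard assignment
    return code

code = vq_encode(x, codebook)
print(int(code.sum()))  # 1: exactly one active codeword
```

The hard assignment is what makes VQ a poor function approximator: the approximation is piecewise constant over the Voronoi cells of the codebook.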

Example 2: Supervector Coding

(Figure: target function approximation.)
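A hedged sketch of the supervector idea: descriptors are softly assigned to GMM components, and the posterior-weighted mean shifts are stacked into one long vector. The GMM parameters below are random stand-ins, not a trained codebook:

```python
import numpy as np

# Hedged sketch of GMM supervector coding: soft posteriors over GMM
# components, then per-component adapted-mean shifts stacked into a
# single long "supervector". Parameters are random stand-ins.
rng = np.random.default_rng(0)
K, D, N = 8, 80, 500
weights = np.full(K, 1.0 / K)
means = rng.standard_normal((K, D))
var = np.ones((K, D))                     # diagonal covariances
X = rng.standard_normal((N, D))           # descriptors from one image

def supervector(X, weights, means, var):
    # log N(x | mu_k, diag(var_k)) for every descriptor/component pair
    log_p = -0.5 * (((X[:, None, :] - means) ** 2) / var).sum(-1)
    log_p += np.log(weights) - 0.5 * np.log(2 * np.pi * var).sum(-1)
    post = np.exp(log_p - log_p.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)    # posteriors, shape (N, K)
    nk = post.sum(0)                      # soft counts per component
    mu_hat = post.T @ X / nk[:, None]     # adapted per-component means
    return (mu_hat - means).ravel()       # stacked shifts, length K*D

sv = supervector(X, weights, means, var)
print(sv.shape)  # (640,)
```

Practical variants normalize each component block (e.g. by mixture weight and covariance), but the stacked-mean-shift structure is the core of the representation.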

Example 3: Local Coordinate Coding

(Figure: target function approximation with bases B1, B2, B3.)

LCC: How It Works

(Figure: data points and anchor points.)
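A minimal numeric sketch of the idea: a descriptor is reconstructed from its few nearest anchor points, with coefficients summing to one. This simplified locality-constrained least-squares solver is an illustration in the spirit of LCC, not the exact algorithm; anchors and data are random stand-ins:

```python
import numpy as np

# Sketch of locality-based coding: reconstruct a descriptor from its
# k nearest anchors, coefficients summing to one (closed-form solve).
rng = np.random.default_rng(0)
anchors = rng.standard_normal((1024, 80))
x = rng.standard_normal(80)

def local_code(x, anchors, k=5, reg=1e-4):
    # keep only the k nearest anchors (locality)
    idx = np.argsort(np.linalg.norm(anchors - x, axis=1))[:k]
    B = anchors[idx] - x                       # shift to the query point
    G = B @ B.T + reg * np.eye(k)              # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                               # enforce sum-to-one
    code = np.zeros(len(anchors))
    code[idx] = w                              # sparse, local code
    return code

code = local_code(x, anchors)
print(np.count_nonzero(code), round(code.sum(), 6))  # 5 1.0
```

The resulting code is sparse and local: only anchors near the descriptor participate, which is what makes the subsequent linear classifier act like a nonlinear function of the descriptor.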

Comparison of Coding Methods

                        VQ [1]   GMM_SupVect [2]   LCC [3]
Function Approximation  Poor     Good              Excellent
Computation             Low      High              Medium
Locality                Yes      Yes               Yes
Caltech-101             ~65%     ~73%              ~73%

Next steps: improve GMM supervector's fitting power; reduce LCC's computation.

1. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce, CVPR 2006.
2. Xi Zhou, Na Cui, Zhen Li, Feng Liang, and Thomas S. Huang, ICCV 2009.
3. Jianchao Yang, Kai Yu, Yihong Gong, and Thomas S. Huang, CVPR 2009.

Improve GMM Supervector Coding

Improve LCC's Efficiency

(Figure: encoding one data point among data points and anchor points.)
Pre-computation: partition the data and the anchor points.
Eliminate anchor points that fall in a different partition from the query point.
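The speed-up above can be sketched as follows: pre-partition the anchors with a few centers (stand-ins for a learned partitioning), then encode each descriptor against only the anchors in its own partition. This is a hedged illustration, not the exact scheme used in the submissions:

```python
import numpy as np

# Sketch of partition-based pruning for fast LCC: anchors are
# pre-assigned to partitions; a query only scores anchors in its
# own partition. Centers/anchors are random stand-ins.
rng = np.random.default_rng(0)
anchors = rng.standard_normal((1024, 80))
centers = rng.standard_normal((16, 80))        # partition centers

def partition(points, centers):
    d = np.linalg.norm(points[:, None, :] - centers, axis=2)
    return d.argmin(axis=1)                    # nearest-center assignment

anchor_part = partition(anchors, centers)      # precomputed once

def candidate_anchors(x, centers, anchor_part):
    p = partition(x[None, :], centers)[0]      # partition of the query
    return np.flatnonzero(anchor_part == p)    # anchors worth scoring

x = rng.standard_normal(80)
cands = candidate_anchors(x, centers, anchor_part)
print(len(cands) < len(anchors))  # True: far fewer anchors to score
```

With P roughly balanced partitions, the per-descriptor cost drops by about a factor of P, which is what makes the "fast LCC" branch in Pipeline II cheap.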

Equivalent to Mixture of Coding Experts

Linear Pooling

Dense SIFT -> Nonlinear coding on SIFT -> [Linear pooling] -> Linear classifier -> "cat"

(Local) Linear Pooling

(Figure: data in an image -> codes -> image representation.)
The pooled representation is a nonlinear function of the local features.
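Linear pooling can be sketched in two lines: the image representation is a linear combination of the per-descriptor codes (here the mean), so the pooled feature remains linear in the codes. Max pooling, used with the detection branch in Pipeline II, is shown for contrast; the codes are random stand-ins:

```python
import numpy as np

# Sketch of pooling over per-descriptor codes from one image.
rng = np.random.default_rng(0)
codes = rng.random((500, 1024))        # one code per descriptor

avg_pooled = codes.mean(axis=0)        # linear (average) pooling
max_pooled = codes.max(axis=0)         # max pooling, for comparison
print(avg_pooled.shape, max_pooled.shape)
```

Because average pooling is linear, a linear classifier on the pooled feature is equivalent to summing a learned nonlinear function over the descriptors, which is the unified-framework view given earlier.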

SPM representation. See also the SurreyUVA_SRKDA method, presented at the PASCAL VOC workshop 2008.
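SPM-style pooling can be sketched as follows: codes are pooled separately in each cell of a spatial pyramid and the per-cell vectors are concatenated. This illustration uses a small 1x1 + 2x2 pyramid (5 cells) rather than the 8 sub-kernels of the actual submissions, and synthetic descriptor coordinates:

```python
import numpy as np

# Hedged sketch of spatial pyramid (SPM) pooling: pool codes per
# pyramid cell, then concatenate the per-cell vectors.
rng = np.random.default_rng(0)
N, K = 500, 1024
codes = rng.random((N, K))
xy = rng.random((N, 2))                      # normalized descriptor locations

def spm_pool(codes, xy, levels=(1, 2)):
    parts = []
    for L in levels:                          # L x L grid per pyramid level
        cell = np.minimum((xy * L).astype(int), L - 1)
        for i in range(L):
            for j in range(L):
                mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                pooled = codes[mask].mean(axis=0) if mask.any() else np.zeros(K)
                parts.append(pooled)
    return np.concatenate(parts)

feat = spm_pool(codes, xy)
print(feat.shape)  # (5120,) = 5 cells x 1024
```

Each added level multiplies the feature dimensionality by the number of its cells, which is where the x8 factor in the "Some Details" slide comes from.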

Linear Classifier

Dense SIFT -> Nonlinear coding on SIFT -> Linear pooling -> [Linear classifier] -> "cat"

Support Vector Machines

Universum SVMs

(Figure: positive class vs. negative class.)
Vapnik, V. N. (1998). Statistical Learning Theory. New York: Wiley.
Weston et al., Inference with the Universum, ICML 2006.

Within-class Covariance Normalization

Improve SPM using Gaussian Process

Some Details

Number of partitions or components:
- GMM: 1024 and 2048
- LCC: 1024 and 2048
Dimensionality of the feature vector for each image (e.g. with 1024 partitions):
- GMM: 1024 x 80 x 8 (1024 components, 80-dim PCA-SIFT, 8 SPM sub-kernels)
- LCC: 1024 x 256 x 8 (1024 partitions, codebook size 256, 8 SPM sub-kernels)
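The quoted dimensionalities, spelled out:

```python
# The feature dimensionalities from the slide, spelled out:
gmm_dim = 1024 * 80 * 8    # components x PCA-SIFT dim x SPM sub-kernels
lcc_dim = 1024 * 256 * 8   # partitions x codebook size x SPM sub-kernels
print(gmm_dim, lcc_dim)    # 655360 2097152
```

Both representations run to hundreds of thousands or millions of dimensions per image, which is precisely why the pipeline relies on linear classifiers.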

Concluding Remarks

Highly nonlinear, highly local encoding of local image features makes the difference!
Still a long way to go:
- No high-level (semantic) features used so far.
- How to get compact image representations?
- Supervised training of coding schemes.
- Better methods to use the bounding-box information.
More details will be provided in an upcoming technical report.