The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

Nuno Vasconcelos, Purdy Ho, Pedro Moreno
ECE Department, University of California, San Diego
HP Labs Cambridge Research Laboratory

Classification architectures for vision
- modern learning theory favors discriminant over generative architectures for classification
- for vision, a fundamental difference is the set of constraints imposed on the representation
- discriminant classifiers favor holistic representations (the image as a point in a high-dimensional space)
- generative classifiers favor localized representations (the image as a bag of local features)
- localized representations have various advantages: more invariant, more robust to occlusion, lower dimensionality

Classification architectures for vision
- also, despite weaker guarantees, generative architectures have great practical appeal:
  - better scalability in the number of classes
  - encoding of prior knowledge in the form of statistical models
  - modular solutions, using Bayesian inference
- Q: can all this be combined with discriminant guarantees?
- we consider SVMs, and the Kullback-Leibler kernel
- we investigate its ability to seamlessly combine discriminant recognition with generative models based on localized representations

Support vector machines

Kernels
- kernel-based feature transformation (diagram not transcribed)

Constraints on representation
- two possible image representations
  - holistic: X is the space of all images, each image is a point in X
  - localized: the image is broken into local windows, X is the space of such windows
- SVM training is O(N²), N = number of examples
  - holistic: N = |I| (number of images), dim(X) large (e.g. 5,000)
  - localized: N = k|I|, with k = windows/image; k is large (e.g. 5,000) but dim(X) is small (e.g. 8×8)
- complexity of localized ~ k² times that of holistic (e.g. a 2.5×10⁷-fold increase)
- no way to capture the grouping of windows into images

Constraints on representation
- localized is not suited for the traditional SVM
- holistic has been successful, but has limitations
  - resolution: images are too high-dimensional and must be drastically down-sampled (e.g. from 240×360 to 20×20); the discarded information is important for fine classification (near the boundary)
  - invariance: images as points span quite convoluted manifolds in X when subject to transformations
  - occlusion: a few occluded pixels can lead to a very large jump in X

Localized representations
- resolution is no issue (simply more points per image)
- greater invariance, e.g. greater robustness to occlusion:
  - if x% of the pixels are occluded, only x% of the probability mass changes; the remaining (1−x)% should still be enough to obtain a good match
  - for a holistic vector, when x% of the components change, matching is hard

Probabilistic kernels
- since the kernel captures similarities between examples, and bags of localized examples are best described by their probability density, it is natural to make the kernel function a measure of distance between probability density functions
- various kernels have been proposed in the literature: Fisher kernel (Jaakkola et al., 1999), TOP kernel (Tsuda et al., 2002), diffusion kernels (Lafferty and Lebanon, 2002), generalized correlation kernel (Kondor and Jebara, 2003), KL kernel (Moreno et al., 2003)

Probabilistic kernels
- three main advantages over holistic kernels:
  1. enable representations of variable length
  2. enable a compact representation of a large sequence of vectors (through the pdf)
  3. can exploit prior knowledge about the classification problem (selection of suitable probability models)
- the interpretation of the standard Gaussian kernel as K(x, y) = exp(−d²(x, y)/2σ²), where d is the Euclidean distance, suggests a natural extension based on pdf distances
- this leads to the KL kernel

The KL kernel
- relies on the (symmetric) Kullback-Leibler divergence as the measure of pdf distance:
  KL_sym(p, q) = KL(p‖q) + KL(q‖p), where KL(p‖q) = ∫ p(x) log [p(x)/q(x)] dx
- the kernel is obtained by exponentiating the negative of this divergence, K(p, q) = exp(−a·KL_sym(p, q) + b)
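
As a concrete illustration, here is a minimal Python sketch of the symmetric divergence and the resulting kernel for discrete densities (e.g. normalized histograms); the constants a and b and the eps smoothing are illustrative assumptions, not values from the talk.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete densities (e.g. normalized histograms).
    eps is a small smoothing constant that keeps empty bins finite (an assumption)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_kl(p, q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def kl_kernel(p, q, a=1.0, b=0.0):
    """KL kernel: exponentiate the negative symmetric divergence."""
    return np.exp(-a * symmetric_kl(p, q) + b)

# toy usage: two 3-bin histograms
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
print(kl_kernel(p, q))
```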

A kernel taxonomy
- probabilistic kernels allow a great deal of flexibility over their traditional counterparts
- the KL kernel can be tuned to the problem in terms of:
  1. performance: choice of probability models that match the statistics of the data
  2. computation: use of approximations to the KL divergence that have been shown to work well in certain domains
  3. joint design of features and kernel
- here we focus on 1 and 2; stay tuned for 3
- it is possible to develop a taxonomy of kernels that implement various trade-offs between performance and computation

Parametric densities
- parametric densities are good models or approximations for various problems
- the kernel can be tailored to the particular pdf family

The Gaussian is a particularly popular case; in general, it is possible to derive the kernel function for the parametric cases.
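
For two Gaussians the divergence has a well-known closed form, so the kernel can be evaluated directly from the fitted means and covariances. A sketch under that assumption (again with illustrative constants a and b):

```python
import numpy as np

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) between multivariate Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def gaussian_kl_kernel(mu0, cov0, mu1, cov1, a=1.0, b=0.0):
    """KL kernel between two Gaussian densities via the symmetric divergence."""
    d_sym = kl_gaussians(mu0, cov0, mu1, cov1) + kl_gaussians(mu1, cov1, mu0, cov0)
    return np.exp(-a * d_sym + b)

# usage: two 2-D Gaussians
mu_p, cov_p = np.zeros(2), np.eye(2)
mu_q, cov_q = np.array([1.0, 0.0]), 2.0 * np.eye(2)
print(gaussian_kl_kernel(mu_p, cov_p, mu_q, cov_q))
```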

Non-parametric densities
- non-parametric density models can be a lot trickier
- some have closed-form KL kernels, e.g. the histogram
- extensions are available for histograms defined on different partitions (Vasconcelos, Trans. Inf. Theory, 2004)

Approximations
- various approximations are possible for kernels without a closed form
- in some cases the divergence itself has no closed form, e.g. between Gaussian mixtures

Approximations and sampling
- various specific approximations have recently been proposed for the Gauss mixture case:
  - log-sum bound (Singer and Warmuth, NIPS 98)
  - asymptotic likelihood approximation (Vasconcelos, ICCV 2001; Trans. IT, 2004)
  - unscented transformation (Goldberger et al., ICCV 2004)
- finally, one can always use Monte Carlo sampling (see the sketch below)
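
A minimal sketch of the Monte Carlo option, using scikit-learn Gaussian mixtures: sample from one mixture and average the log-likelihood ratio. The synthetic feature bags, the diagonal covariances, and the kernel constants are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mc_kl(gmm_p, gmm_q, n_samples=10_000):
    """Monte Carlo estimate of KL(p || q): sample from p, average log p(x) - log q(x)."""
    x, _ = gmm_p.sample(n_samples)
    return float(np.mean(gmm_p.score_samples(x) - gmm_q.score_samples(x)))

def mc_kl_kernel(gmm_p, gmm_q, a=1.0, b=0.0, n_samples=10_000):
    """KL kernel value from the Monte Carlo symmetric divergence."""
    d_sym = mc_kl(gmm_p, gmm_q, n_samples) + mc_kl(gmm_q, gmm_p, n_samples)
    return np.exp(-a * d_sym + b)

# usage: fit mixtures to two synthetic feature bags (rows = local feature vectors)
rng = np.random.default_rng(0)
bag_a = rng.normal(size=(5000, 32))
bag_b = rng.normal(size=(5000, 32)) + 0.5
gmm_a = GaussianMixture(n_components=16, covariance_type="diag").fit(bag_a)
gmm_b = GaussianMixture(n_components=16, covariance_type="diag").fit(bag_b)
print(mc_kl_kernel(gmm_a, gmm_b))
```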

Experiments
- all on COIL-100, at three resolutions: 32×32, 64×64, 128×128
- 4 different combinations of train/test: |I| images of each object used as the training set, |I| in {4, 8, 18, 36}, the remaining used for test; the dataset with |I| = n is referred to as "n"
- holistic representation: each image is one vector
- localized representation: image as a feature bag; extract 8×8 windows, compute the DCT, keep the first 32 coefficients; a mixture of 16 Gaussians is fit to each image (see the sketch below)
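
A rough sketch of this localized feature extraction (8×8 windows, DCT, first 32 coefficients, a 16-component mixture per image); the window stride, the coefficient scan order, and the diagonal covariances are assumptions not specified in the slides.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.mixture import GaussianMixture

def dct_features(image, win=8, n_coeffs=32):
    """Break a grayscale image into win x win blocks, take the 2-D DCT of each block,
    and keep the first n_coeffs coefficients (flattened row-major order here;
    the scan order and the non-overlapping stride are illustrative assumptions)."""
    h, w = image.shape
    feats = []
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            block = image[i:i + win, j:j + win].astype(float)
            feats.append(dctn(block, norm="ortho").ravel()[:n_coeffs])
    return np.array(feats)

# usage: fit a 16-component mixture to the feature bag of one image
image = np.random.rand(128, 128)          # stand-in for a 128x128 COIL-100 view
bag = dct_features(image)                 # (n_windows, 32) feature bag
gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(bag)
print(bag.shape, gmm.converged_)
```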

COIL-100
- 100 objects subject to 3D rotation, one view every 5°

Results
- holistic: SVM with three different kernels: linear (L-SVM), polynomial of order 2 (P2-SVM), Gaussian (G-SVM)
- localized: standard maximum-likelihood Gauss mixture classifier (GMM), and the KL kernel with Gauss mixture models (KL-SVM)
- recognition rates (%) are reported on the following slides; a toy end-to-end sketch of the KL-SVM follows below
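
To make the KL-SVM pipeline concrete, here is a hypothetical end-to-end toy sketch: fit one mixture per image, build a Gram matrix of KL-kernel values, and train an SVM on the precomputed kernel. The data, component count, Monte Carlo divergence, and scaling constant are all illustrative, and the resulting Gram matrix is not guaranteed to be positive definite; this is not the paper's exact setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

# toy data: 20 "images", each a bag of 32-D local features, with binary labels
rng = np.random.default_rng(0)
bags = [rng.normal(loc=y, size=(200, 32)) for y in (0, 1) * 10]
labels = np.array([0, 1] * 10)

# one mixture per image (2 components here just to keep the toy example fast)
gmms = [GaussianMixture(n_components=2, covariance_type="diag").fit(b) for b in bags]

def mc_sym_kl(p, q, n=2000):
    """Monte Carlo symmetric KL between two fitted mixtures (as in the earlier sketch)."""
    xp, _ = p.sample(n)
    xq, _ = q.sample(n)
    return float(np.mean(p.score_samples(xp) - q.score_samples(xp))
                 + np.mean(q.score_samples(xq) - p.score_samples(xq)))

# precomputed Gram matrix of KL-kernel values, then an SVM on top
a = 0.05
gram = np.array([[np.exp(-a * mc_sym_kl(p, q)) for q in gmms] for p in gmms])
svm = SVC(kernel="precomputed").fit(gram, labels)
print(svm.predict(gram[:2]))   # rows = kernel values between test items and the training set
```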

Results: |I| = 4 and |I| = 8 (recognition-rate tables not transcribed)

Results: |I| = 18 and |I| = 36 (recognition-rate tables not transcribed)

Observations
- holistic kernels: G-SVM is clearly better; excellent when n is large, but drops quickly for small n, where it is weaker than the GMM!
- overall: localized + discriminant (KL-SVM) is best; differences between KL-SVM and G-SVM are as high as 10%
- localized + weak learner (GMM) is better than holistic + strong learner (G-SVM)
- conclusions:
  - localized is more invariant and leads to an easier classification problem: the weaker classifier (GMM) has better generalization!
  - resolution (higher dimensionality vs more image information): losses of about 5% at lower resolution; KL-SVM is much more robust than the GMM

Flexibility
- the discriminant attributes for recognition depend on the task (e.g. shape is better for digits, texture better for landscapes)
- the KL kernel supports multiple representations
- comparison of representations based on:
  - support: point-wise vs local appearance vs global appearance
  - color: grayscale vs color
- all experiments on the "4" training set at 128×128; compared:
  - point-wise: KL kernel + histogram (16 bins/channel), and histogram intersection (Laplacian kernel, Chapelle et al., Trans. Neural Nets, 1999); see the sketch below
  - local: KL kernel with GMM (8×8 windows)
  - global: G-SVM
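
One possible reading of the point-wise setup: a color histogram with 16 bins per channel fed to the discrete KL kernel sketched earlier. Whether the histogram is joint or per-channel, and the pixel value range, are assumptions.

```python
import numpy as np

def color_histogram(image_rgb, bins_per_channel=16):
    """Joint RGB histogram with 16 bins per channel, normalized to a pmf.
    image_rgb: (H, W, 3) array of values in [0, 255]; joint (rather than per-channel)
    binning and the value range are illustrative assumptions."""
    pixels = image_rgb.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=bins_per_channel, range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def hist_kl_kernel(p, q, a=1.0, eps=1e-12):
    """Discrete KL kernel (as in the earlier sketch), with eps-smoothing for empty bins."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    d_sym = np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
    return np.exp(-a * d_sym)

# usage: compare the color histograms of two (random stand-in) images
img_a = np.random.randint(0, 256, size=(128, 128, 3))
img_b = np.random.randint(0, 256, size=(128, 128, 3))
print(hist_kl_kernel(color_histogram(img_a), color_histogram(img_b)))
```
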

Results clr imprtant cue fr recgnitin n COIL the less lcalized the better: pint-wise > lcal >> glbal lcalizatin/invariance trade-ff: clr s discriminant that even invariance lss f 88 is t much lss f hlistic is s large that it perfrms quite prly n grayscale (less discriminant) lcalized des best cnclusin: different representatins perfrm best n different tasks, fleibility f KL-kernel is a great asset 25