SUPERVISED MULTI-MODAL TOPIC MODEL FOR IMAGE ANNOTATION

Thu Hoai Tran (2) and Seungjin Choi (1,2)
(1) Department of Computer Science and Engineering, POSTECH, Korea
(2) Division of IT Convergence Engineering, POSTECH, Korea
thtlamson@gmail.com, seungjin@postech.ac.kr

ABSTRACT

Multi-modal topic models are probabilistic generative models where hidden topics are learned from data of different types. In this paper we present supervised multi-modal latent Dirichlet allocation (smmLDA), in which we incorporate the class label (a global description) into the joint modeling of visual words and caption words (local descriptions) for the image annotation task. We derive a variational inference algorithm to approximately compute the posterior distribution over latent variables. Experiments on a subset of the LabelMe dataset demonstrate the useful behavior of our model compared to existing topic models.

Index Terms— Image annotation, latent Dirichlet allocation, topic models

1. INTRODUCTION

Latent Dirichlet allocation (LDA) is a widely used topic model which was originally developed to model text corpora [5]. It is a hierarchical Bayesian model in which each observed item is modeled as a finite mixture over an underlying set of topics, and each topic is characterized by a distribution over words. The basic idea of LDA when it is applied to model a set of images, treating an image as a collection of visual words, is shown in Fig. 1. The same intuition in the case of documents can be found in [2].

Multi-modal extensions of LDA, referred to as multi-modal topic models, have been proposed to jointly model data of different types. These models were mainly applied to image annotation, where the goal is to assign a set of keywords to an image, learning the underlying topics from a set of image-annotation pairs. Earlier work in this direction is correspondence LDA (cLDA) [3], which finds conditional relationships between latent variable representations of visual words and caption words; the conditional distribution of the annotation given the visual descriptors is modeled for automatic image annotation. Topic regression multi-modal LDA (tr-mmLDA) [10] is an alternative method for capturing the statistical association between image and text. Unlike cLDA, tr-mmLDA learns two separate sets of hidden topics and counts on a regression module that allows the caption topics to be linearly predicted from the image topics. It was motivated by the regression-based latent factor model [1], which was further elaborated in a hierarchical Bayesian framework [9]. It was shown in [10] that tr-mmLDA is more flexible than cLDA in the sense that the former allows the number of image topics to differ from the number of caption topics.

A class label is a global description of an image, while annotated keywords are local descriptions of image patches. Class label and annotations are related to each other: for instance, an image labeled as a highway scene is more likely to be annotated with "cars" and "road" than with "apple" and "desk".

Fig. 1. A codeword is assigned to each image patch to represent an image as a collection of visual words. We assume that some number of topics, which are distributions over words, exist for the set of images. An illustration of how an image is generated by an LDA model is shown here. We first choose a distribution over topics (the histogram at right). Then, for each visual word, we choose a topic assignment (the circles with patterns filled in) and choose the visual word from the corresponding topic.
In this paper we present supervised multi-modal latent Dirichlet allocation (smmLDA), in which we incorporate the class label into tr-mmLDA so that two sets of hidden topics, related via linear regression, are learned from data of two types as well as from the class label. Several extensions of LDA that incorporate supervision have been developed in the literature [4, 6, 7, 11, 13]; most of these existing methods are limited to learning from data of a single type. The model tr-mmLDA outperforms most previous methods in the task of image annotation but is an unsupervised method. Our model smmLDA extends the previous state of the art in this domain, tr-mmLDA, by incorporating the supervision of class labels.

2. LATENT DIRICHLET ALLOCATION

We briefly give an overview of LDA [5], a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics, where each topic is described by a distribution over words.

Each document w_{d,1:N} is a sequence of N words, for d = 1, \ldots, D (D is the size of the corpus), and each word w_{dn} \in \mathbb{R}^V (V is the size of the vocabulary) is a unit vector that has a single entry equal to one and all other entries equal to zero. For instance, if w_{dn} is the v-th word in the vocabulary, then w_{dn,v} = 1 and w_{dn,j} = 0 for j \neq v. The graphical model for LDA is shown in Fig. 2, where each document w_{d,1:N} is assumed to be generated as follows:

1. Draw a vector of topic proportions \theta_d \in \mathbb{R}^K: \theta_d \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K).
2. For each word n:
   (a) Draw a topic assignment z_{dn} \in \mathbb{R}^K from a multinomial distribution: z_{dn} \mid \theta_d \sim \mathrm{Mult}(\theta_d).
   (b) Draw a word w_{dn} \in \mathbb{R}^V: w_{dn} \mid z_{dn}, \phi_{1:K} \sim p(w_{dn} \mid z_{dn}, \phi_{1:K}).

Fig. 2. Graphical model for LDA.

Given the parameters \alpha and \phi_{1:K}, the joint distribution of hidden and observed variables is given by

p(\theta_d, z_{d,1:N}, w_{d,1:N} \mid \alpha, \phi_{1:K}) = p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \phi_{1:K}).

Integrating over \theta_d and summing over z_{d,1:N}, the marginal distribution of a document is given by

p(w_{d,1:N} \mid \alpha, \phi_{1:K}) = \int p(\theta_d \mid \alpha) \prod_{n=1}^{N} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \phi_{1:K})\, d\theta_d.

Taking the product of the marginal probabilities of single documents, the probability of the corpus (the marginal likelihood) is given by

p(w_{1:D,1:N} \mid \alpha, \phi_{1:K}) = \prod_{d=1}^{D} p(w_{d,1:N} \mid \alpha, \phi_{1:K}).   (1)

Variational inference allows us to calculate approximate posterior distributions over the hidden variables \{\theta_d, z_{dn}\} by maximizing a variational lower bound on the log marginal likelihood.

3. SUPERVISED MULTI-MODAL LDA

In this section we present the main result, supervised multi-modal LDA (smmLDA), in which we incorporate the class label into the joint modeling of visual words \{r_{dn}\} and caption words \{w_{dm}\}, whose latent variable representations are related via linear regression. The graphical model for smmLDA is shown in Fig. 3.

Fig. 3. Graphical model for supervised multi-modal LDA (smmLDA).

3.1. Model

The generative process for the visual words \{r_{dn}\} and caption words \{w_{dm}\} of a document is as follows.

1. Choose a category label c_d \in \mathbb{R}^C:

c_d \sim \mathrm{Mult}(\eta), \qquad p(c_d \mid \eta) = \prod_{j=1}^{C} \eta_j^{c_{dj}},

where c_d is a C-dimensional unit vector; if the class label is j, then c_{dj} = 1 and c_{di} = 0 for i \neq j.

2. Draw a vector of image topic proportions \theta_d \in \mathbb{R}^K:

\theta_d \sim \prod_{j=1}^{C} \mathrm{Dir}(\theta_d \mid \alpha_j)^{c_{dj}}.

3. For each visual word r_{dn}:
   (a) Draw an image topic assignment z_{dn} \in \mathbb{R}^K:

z_{dn} \sim \mathrm{Mult}(\theta_d), \qquad p(z_{dn} \mid \theta_d) = \prod_{k=1}^{K} \theta_{dk}^{z_{dnk}}.

   (b) Draw a visual word:

r_{dn} \mid z_{dn}, c_d \sim \mathrm{Mult}(\beta^r), \qquad p(r_{dn} \mid z_{dn}, c_d, \beta^r) = \prod_{i=1}^{C} \prod_{k=1}^{K} \prod_{j=1}^{V_r} (\beta^r_{ikj})^{c_{di}\, z_{dnk}\, r_{dnj}},

where V_r is the size of the visual-word vocabulary.

4. Given the empirical image topic frequency \bar{z}_d = \frac{1}{N} \sum_{n=1}^{N} z_{dn}, sample a real-valued topic proportion variable for the caption text:

x_d \mid \bar{z}_d, A, \mu, \Lambda \sim \mathcal{N}(x_d \mid A \bar{z}_d + \mu, \Lambda^{-1}).

5. Compute the caption topic proportions:

v_{dl} = \frac{e^{x_{dl}}}{\sum_{l'=1}^{L} e^{x_{dl'}}}.

6. For each caption word w_{dm}:
   (a) Draw a caption topic assignment:

y_{dm} \sim \mathrm{Mult}(v_d), \qquad p(y_{dm} \mid v_d) = \prod_{l=1}^{L} v_{dl}^{y_{dml}}.

   (b) Draw a caption word:

w_{dm} \mid y_{dm}, c_d \sim \mathrm{Mult}(\beta^w), \qquad p(w_{dm} \mid y_{dm}, c_d, \beta^w) = \prod_{i=1}^{C} \prod_{l=1}^{L} \prod_{j=1}^{V_w} (\beta^w_{ilj})^{c_{di}\, y_{dml}\, w_{dmj}},

where V_w is the size of the caption-word vocabulary.

We define the sets of variables R = \{r_{dn}\}, W = \{w_{dm}\}, Z = \{z_{dn}\}, Y = \{y_{dm}\}, C = \{c_d\}, \Theta = \{\theta_d\}, X = \{x_d\}. Then the joint distribution over these variables obeys the following factorization:

p(R, W, C, \Theta, X, Z, Y) = p(C \mid \eta)\, p(\Theta \mid C, \alpha)\, p(Z \mid \Theta)\, p(R \mid Z, \beta^r, C)\, p(X \mid \bar{Z}, A, \mu, \Lambda)\, p(Y \mid X)\, p(W \mid Y, C),   (2)

where

p(C \mid \eta) = \prod_{d=1}^{D} \prod_{j=1}^{C} \eta_j^{c_{dj}},
p(\Theta \mid C, \alpha) = \prod_{d=1}^{D} \prod_{j=1}^{C} \mathrm{Dir}(\theta_d \mid \alpha_j)^{c_{dj}},
p(Z \mid \Theta) = \prod_{d=1}^{D} \prod_{n=1}^{N} \prod_{k=1}^{K} \theta_{dk}^{z_{dnk}},
p(R \mid Z, \beta^r, C) = \prod_{d=1}^{D} \prod_{n=1}^{N} \prod_{i=1}^{C} \prod_{k=1}^{K} \prod_{j=1}^{V_r} (\beta^r_{ikj})^{c_{di}\, z_{dnk}\, r_{dnj}},
p(X \mid \bar{Z}, A, \mu, \Lambda) = \prod_{d=1}^{D} \mathcal{N}(x_d \mid A \bar{z}_d + \mu, \Lambda^{-1}),
p(Y \mid X) = \prod_{d=1}^{D} \prod_{m=1}^{M} p(y_{dm} \mid x_d) = \prod_{d=1}^{D} \prod_{m=1}^{M} \prod_{l=1}^{L} v_{dl}^{y_{dml}},
p(W \mid Y, C) = \prod_{d=1}^{D} \prod_{m=1}^{M} \prod_{i=1}^{C} \prod_{l=1}^{L} \prod_{j=1}^{V_w} (\beta^w_{ilj})^{c_{di}\, y_{dml}\, w_{dmj}}.
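To make the generative process concrete, the following is a minimal ancestral-sampling sketch in Python. The dimensions and the randomly initialized parameters are illustrative assumptions, not the paper's learned values or released code; the numbered comments mirror the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative, not the paper's settings).
n_cls, K, L = 8, 25, 25      # classes, image topics, caption topics
V_r, V_w = 256, 300          # visual-word / caption-word vocabulary sizes
N, M = 50, 5                 # visual words and caption words in one image

# Randomly initialized parameters; in the paper these are learned (Sec. 3.3).
eta = np.full(n_cls, 1.0 / n_cls)                     # class prior
alpha = rng.gamma(2.0, 1.0, size=(n_cls, K))          # class-specific Dirichlet
beta_r = rng.dirichlet(np.ones(V_r), size=(n_cls, K)) # visual-word multinomials
beta_w = rng.dirichlet(np.ones(V_w), size=(n_cls, L)) # caption-word multinomials
A = rng.normal(scale=0.1, size=(L, K))                # regression matrix
mu = np.zeros(L)                                      # regression offset
Lambda_inv = 0.01 * np.eye(L)                         # covariance of x_d

c = rng.choice(n_cls, p=eta)              # 1. category label c_d
theta = rng.dirichlet(alpha[c])           # 2. image topic proportions theta_d
z = rng.choice(K, size=N, p=theta)        # 3a. image topic assignments z_dn
r = np.array([rng.choice(V_r, p=beta_r[c, k]) for k in z])  # 3b. visual words
z_bar = np.bincount(z, minlength=K) / N   # 4. empirical topic frequency
x = rng.multivariate_normal(A @ z_bar + mu, Lambda_inv)
v = np.exp(x - x.max()); v /= v.sum()     # 5. softmax caption proportions v_d
y = rng.choice(L, size=M, p=v)            # 6a. caption topic assignments y_dm
w = np.array([rng.choice(V_w, p=beta_w[c, l]) for l in y])  # 6b. caption words

print("class:", c, "caption word ids:", w)
```

Running this yields one synthetic "image": a class label, N visual-word indices, and M caption-word indices.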

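The factorization in Eq. (2) can be checked term by term against these sampling steps. The self-contained snippet below (again with illustrative toy values, and SciPy used for the Dirichlet and Gaussian densities) evaluates the log joint probability of a single sampled document by summing the seven factors:

```python
import numpy as np
from scipy.stats import dirichlet, multivariate_normal

rng = np.random.default_rng(0)

# Toy sizes; all values are illustrative, as in the sampling sketch above.
n_cls, K, L, V_r, V_w, N, M = 3, 4, 4, 10, 12, 6, 3
eta = np.full(n_cls, 1.0 / n_cls)
alpha = rng.gamma(2.0, 1.0, size=(n_cls, K))
beta_r = rng.dirichlet(np.ones(V_r), size=(n_cls, K))
beta_w = rng.dirichlet(np.ones(V_w), size=(n_cls, L))
A = rng.normal(scale=0.1, size=(L, K))
mu, Lambda_inv = np.zeros(L), 0.1 * np.eye(L)

# Sample one document, then score it under the seven factors of Eq. (2).
c = rng.choice(n_cls, p=eta)
theta = rng.dirichlet(alpha[c])
z = rng.choice(K, size=N, p=theta)
r = np.array([rng.choice(V_r, p=beta_r[c, k]) for k in z])
z_bar = np.bincount(z, minlength=K) / N
x = rng.multivariate_normal(A @ z_bar + mu, Lambda_inv)
v = np.exp(x - x.max()); v /= v.sum()
y = rng.choice(L, size=M, p=v)
w = np.array([rng.choice(V_w, p=beta_w[c, l]) for l in y])

log_joint = (np.log(eta[c])                       # p(c_d | eta)
             + dirichlet.logpdf(theta, alpha[c])  # p(theta_d | c_d, alpha)
             + np.log(theta[z]).sum()             # p(z_dn | theta_d)
             + np.log(beta_r[c, z, r]).sum()      # p(r_dn | z_dn, c_d, beta^r)
             + multivariate_normal.logpdf(x, A @ z_bar + mu, Lambda_inv)
             + np.log(v[y]).sum()                 # p(y_dm | x_d)
             + np.log(beta_w[c, y, w]).sum())     # p(w_dm | y_dm, c_d, beta^w)
print(log_joint)
```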
3.2. Variational Inference

The log marginal likelihood is given by

\log p(R, W, C) = \log \int\!\!\int \sum_{Z, Y} p(R, W, C, \Theta, X, Z, Y)\, d\Theta\, dX
\geq \int\!\!\int \sum_{Z, Y} q(\Theta, X, Z, Y) \log \frac{p(R, W, C, \Theta, X, Z, Y)}{q(\Theta, X, Z, Y)}\, d\Theta\, dX \triangleq \mathcal{F}(q),   (3)

where q(\Theta, X, Z, Y) denotes the variational distribution and Jensen's inequality is used to reach the variational lower bound \mathcal{F}(q). We assume that the variational distribution factorizes as

q(\Theta, X, Z, Y) = q(\Theta)\, q(X)\, q(Z)\, q(Y),   (4)

where each factor is assumed to be of the form given in Table 1. The variational parameters \{\tilde{\alpha}_d\}, \{\tilde{x}_d, \Gamma_d^{-1}\}, \{\tau_{dn}\}, \{\rho_{dml}\} are determined by maximizing the variational lower bound

\mathcal{F}(q) = \mathbb{E}_q\big[\log p(C \mid \eta) + \log p(\Theta \mid C, \alpha) + \log p(Z \mid \Theta) + \log p(R \mid Z, \beta^r, C) + \log p(X \mid \bar{Z}, A, \mu, \Lambda) + \log p(Y \mid X) + \log p(W \mid Y, C)\big] - \mathbb{E}_q\big[\log q(\Theta) + \log q(X) + \log q(Z) + \log q(Y)\big],

where \mathbb{E}_q denotes the statistical expectation with respect to the variational distribution q(\cdot). Detailed derivations for variational inference are omitted here due to space limitations; they can be carried out in a manner similar to [10]. In particular, the expectation

\mathbb{E}_q[\log v_{dl}] = \mathbb{E}_q\Big[x_{dl} - \log \sum_{l'=1}^{L} e^{x_{dl'}}\Big]

is not maximized directly. Instead, as in [10], its convex lower bound is maximized:

\mathbb{E}_q[\log v_{dl}] \geq \tilde{x}_{dl} - \frac{1}{\xi_d} \sum_{l'=1}^{L} e^{\tilde{x}_{dl'} + \gamma_{dl'}/2} + 1 - \log \xi_d,

where \xi_d is an auxiliary variational parameter.
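This bound can be sanity-checked numerically: for a Gaussian q(x_d), \mathbb{E}_q[e^{x_{dl}}] = e^{\tilde{x}_{dl} + \gamma_{dl}/2}, so the right-hand side has a closed form, while the left-hand side can be estimated by Monte Carlo. A small illustrative check (not from the paper), using the tightest choice \xi_d = \sum_{l'} e^{\tilde{x}_{dl'} + \gamma_{dl'}/2}:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 5
x_tilde = rng.normal(size=L)            # variational means of x_d
gamma = rng.uniform(0.1, 0.5, size=L)   # variational variances (diag of Gamma_d^{-1})

# Monte Carlo estimate of E_q[log v_dl] for l = 0, q(x_d) = N(x_tilde, diag(gamma)).
samples = x_tilde + np.sqrt(gamma) * rng.normal(size=(100_000, L))
lhs = np.mean(samples[:, 0] - np.log(np.exp(samples).sum(axis=1)))

# Closed-form lower bound with the tightest xi_d.
xi = np.exp(x_tilde + 0.5 * gamma).sum()
rhs = x_tilde[0] - np.exp(x_tilde + 0.5 * gamma).sum() / xi + 1 - np.log(xi)

print(lhs, rhs, lhs >= rhs)   # lhs should exceed rhs (up to Monte Carlo noise)
```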

Table 1. Variational posterior distributions and updating equations for the variational parameters (\psi denotes the digamma function; \gamma_{dl} is the l-th diagonal entry of \Gamma_d^{-1}).

q(\Theta) = \prod_{d=1}^{D} \mathrm{Dir}(\theta_d \mid \tilde{\alpha}_d):
  \tilde{\alpha}_d = \sum_{i=1}^{C} c_{di}\, \alpha_i + \sum_{n=1}^{N} \tau_{dn}

q(Z) = \prod_{d=1}^{D} \prod_{n=1}^{N} \prod_{k=1}^{K} \tau_{dnk}^{z_{dnk}}:
  \log \tau_{dnk} \propto \psi(\tilde{\alpha}_{dk}) - \psi(\tilde{\alpha}_{d1} + \cdots + \tilde{\alpha}_{dK}) + \sum_{i=1}^{C} \sum_{j=1}^{V_r} c_{di}\, r_{dnj} \log \beta^r_{ikj} + \frac{1}{N}\big[A^\top \Lambda (\tilde{x}_d - \mu)\big]_k - \frac{1}{2N^2}\Big[\mathrm{diag}(A^\top \Lambda A) + 2 A^\top \Lambda A \sum_{m \neq n} \tau_{dm}\Big]_k

q(Y) = \prod_{d=1}^{D} \prod_{m=1}^{M} \prod_{l=1}^{L} \rho_{dml}^{y_{dml}}:
  \log \rho_{dml} \propto \tilde{x}_{dl} + \sum_{i=1}^{C} \sum_{j=1}^{V_w} c_{di}\, w_{dmj} \log \beta^w_{ilj}

q(X) = \prod_{d=1}^{D} \mathcal{N}(x_d \mid \tilde{x}_d, \Gamma_d^{-1}):
  \tilde{x}_d and \gamma_{dl} are determined by the Newton-Raphson method, with \xi_d = \sum_{l=1}^{L} e^{\tilde{x}_{dl} + \gamma_{dl}/2}

3.3. Parameter Estimation

Coordinate-ascent algorithms for updating the variational parameters are summarized in Table 1. The regression parameters \{A, \mu, \Lambda^{-1}\} are updated as

A = \Big[\sum_{d=1}^{D} (\tilde{x}_d - \mu)\, \bar{\tau}_d^\top\Big] \Big[\sum_{d=1}^{D} \frac{1}{N^2} \Big(\sum_{n=1}^{N} \mathrm{diag}(\tau_{dn}) + \sum_{n \neq m} \tau_{dn} \tau_{dm}^\top\Big)\Big]^{-1},

\mu = \frac{1}{D} \sum_{d=1}^{D} \big(\tilde{x}_d - A \bar{\tau}_d\big),

\Lambda^{-1} = \frac{1}{D} \sum_{d=1}^{D} \Big[(\tilde{x}_d - \mu)(\tilde{x}_d - \mu)^\top + \Gamma_d^{-1} - A \bar{\tau}_d (\tilde{x}_d - \mu)^\top\Big],

where \bar{\tau}_d = \frac{1}{N} \sum_{n=1}^{N} \tau_{dn}. The multinomial parameters \{\beta^r, \beta^w\} are updated as

\beta^r_{ikj} = \frac{\sum_{d=1}^{D} \sum_{n=1}^{N} c_{di}\, \tau_{dnk}\, r_{dnj}}{\sum_{j'=1}^{V_r} \sum_{d=1}^{D} \sum_{n=1}^{N} c_{di}\, \tau_{dnk}\, r_{dnj'}}, \qquad
\beta^w_{ilj} = \frac{\sum_{d=1}^{D} \sum_{m=1}^{M} c_{di}\, \rho_{dml}\, w_{dmj}}{\sum_{j'=1}^{V_w} \sum_{d=1}^{D} \sum_{m=1}^{M} c_{di}\, \rho_{dml}\, w_{dmj'}}.

The Dirichlet parameters \{\alpha_c\} are updated using the Newton-Raphson method, as in LDA [5]. Given a test image r_{1:N}, the class label and annotations are determined by choosing the most probable candidates under the conditional probabilities p(c_d \mid r_{1:N}) and p(w_{dm} \mid r_{1:N}).
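As a concrete instance of the \beta^r update above, the sketch below accumulates the expected counts c_{di}\, \tau_{dnk}\, r_{dnj} and normalizes over the visual vocabulary. Shapes and values are illustrative, not the paper's code; the \beta^w update is analogous with \rho_{dml} and caption words.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, n_cls, K, V_r = 4, 6, 3, 5, 10
tau = rng.dirichlet(np.ones(K), size=(D, N))   # tau[d, n, k] = q(z_dn = k)
cls = rng.integers(0, n_cls, size=D)           # class index of each document
r = rng.integers(0, V_r, size=(D, N))          # visual-word index of each patch

# Accumulate expected counts: counts[i, k, j] = sum_{d,n} c_di * tau_dnk * r_dnj.
counts = np.zeros((n_cls, K, V_r))
for d in range(D):
    for n in range(N):
        counts[cls[d], :, r[d, n]] += tau[d, n]

# Normalize over the visual vocabulary (the j index) to obtain beta^r.
beta_r = counts / counts.sum(axis=2, keepdims=True).clip(min=1e-12)
```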
4. EXPERIMENTS

We use the 8-category subset of the LabelMe dataset [12] to perform image annotation experiments. The categories are coast, forest, highway, inside city, mountain, open country, street, and tall building. This subset has 2,686 images of size 256 x 256 with complete annotations. We use 128-dimensional SIFT descriptors [8] computed on 20 x 20 image patches, where each image patch is obtained by sliding a window at a 20-pixel interval. We then run k-means clustering on the 128-dimensional descriptors to learn a 256-word visual codebook. For the annotation words, we remove the words appearing fewer than 3 times in the whole data. Finally, we have a complete set of triples (visual words, caption words, class label). The whole dataset is separated into a training set of size 2,000 and a test set of size 686.

We evaluate performance in terms of caption perplexity, defined as

\mathrm{Perplexity} = \exp\Big\{ -\frac{\sum_{d=1}^{D} \sum_{m=1}^{M_d} \log p(w_{dm} \mid r_{d,1:N})}{\sum_{d=1}^{D} M_d} \Big\},

where p(w_{dm} \mid r_{d,1:N}) is the conditional probability of a caption word given the image r_{d,1:N} and M_d is the number of caption words in document d. A higher conditional likelihood leads to a lower perplexity. The performance comparison with the previous state of the art, tr-mmLDA, is summarized in Table 2, where our smmLDA outperforms tr-mmLDA.

Table 2. Perplexity comparison.

Method                   K = 25    K = 30
tr-mmLDA [10]              35        36
smmLDA (our method)       28.5      30.4
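For reference, here is a sketch of the visual-word pipeline described above, assuming OpenCV's SIFT and scikit-learn's MiniBatchKMeans as stand-ins for the paper's unspecified implementation; the random test images are placeholders for the LabelMe images.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

sift = cv2.SIFT_create()

def dense_sift(gray, patch=20, step=20):
    # One SIFT descriptor per 20x20 patch, on a grid with 20-pixel spacing.
    kps = [cv2.KeyPoint(float(x), float(y), float(patch))
           for y in range(patch // 2, gray.shape[0], step)
           for x in range(patch // 2, gray.shape[1], step)]
    _, desc = sift.compute(gray, kps)
    return desc                                    # (num_patches, 128)

# Random placeholder images standing in for the 2,000 LabelMe training images.
images = [np.random.default_rng(i).integers(0, 256, size=(256, 256), dtype=np.uint8)
          for i in range(4)]
all_desc = np.vstack([dense_sift(img) for img in images])
codebook = MiniBatchKMeans(n_clusters=256, random_state=0).fit(all_desc)
visual_words = codebook.predict(dense_sift(images[0]))  # the indices r_dn
print(visual_words[:10])
```

Each image then becomes the bag of codeword indices \{r_{dn}\} that smmLDA consumes.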
5. CONCLUSIONS

In this paper we have presented a multi-modal extension of LDA with supervision, leading to smmLDA. We have developed a variational inference algorithm to approximately compute the posterior distributions over the variables of interest in smmLDA. Applications to image annotation demonstrated the high performance of smmLDA compared to the previous state of the art.

Acknowledgments: This work was supported by the National Research Foundation (NRF) of Korea (NRF-2013R1A2A2A01067464) and the POSTECH Rising Star Program.

6. REFERENCES

[1] D. Agarwal and B.-C. Chen, "Regression-based latent factor models," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Paris, France, 2009.
[2] D. M. Blei, "Probabilistic topic models," Communications of the ACM, vol. 55, no. 4, pp. 77-84, 2012.
[3] D. M. Blei and M. I. Jordan, "Modeling annotated data," in Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Toronto, Canada, 2003.
[4] D. M. Blei and J. D. McAuliffe, "Supervised topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 20, MIT Press, 2008.
[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[6] L. Fei-Fei and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005.
[7] S. Lacoste-Julien, F. Sha, and M. I. Jordan, "DiscLDA: Discriminative learning for dimensionality reduction and classification," in Advances in Neural Information Processing Systems (NIPS), vol. 21, 2009.
[8] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[9] S. Park, Y.-D. Kim, and S. Choi, "Hierarchical Bayesian matrix factorization with side information," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013.
[10] D. Putthividhya, H. T. Attias, and S. S. Nagarajan, "Topic regression multi-modal latent Dirichlet allocation for image annotation," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010.
[11] D. Ramage, D. Hall, R. Nallapati, and C. D. Manning, "Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
[12] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, "LabelMe: A database and web-based tool for image annotation," International Journal of Computer Vision, vol. 77, pp. 157-173, 2008.
[13] C. Wang, D. M. Blei, and L. Fei-Fei, "Simultaneous image classification and annotation," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 2009.