SUPERVISED MULTI-MODAL TOPIC MODEL FOR IMAGE ANNOTATION


Thu Hoai Tran² and Seungjin Choi¹,²

¹Department of Computer Science and Engineering, POSTECH, Korea
²Division of IT Convergence Engineering, POSTECH, Korea
thtlamson@gmail.com, seungjin@postech.ac.kr

ABSTRACT

Multi-modal topic models are probabilistic generative models where hidden topics are learned from data of different types. In this paper we present supervised multi-modal latent Dirichlet allocation (smmLDA), where we incorporate a class label (a global description) into the joint modeling of visual words and caption words (local descriptions) for the image annotation task. We derive a variational inference algorithm to approximately compute the posterior distribution over latent variables. Experiments on a subset of the LabelMe dataset demonstrate the useful behavior of our model compared to existing topic models.

Index Terms: Image annotation, latent Dirichlet allocation, topic models

1. INTRODUCTION

Latent Dirichlet allocation (LDA) is a widely-used topic model which was originally developed to model text corpora [5]. It is a hierarchical Bayesian model in which each observed item is modeled as a finite mixture over an underlying set of topics, and each topic is characterized by a distribution over words. The basic idea of LDA when it is applied to model a set of images, treating an image as a collection of visual words, is shown in Fig. 1. The same intuition in the case of documents can be found in [2].

Fig. 1. A codeword is assigned to each image patch to represent an image as a collection of visual words. We assume that some number of topics, which are distributions over words, exist for the set of images. An illustration of how an image is generated by an LDA model is shown here: we first choose a distribution over topics (the histogram at right); then, for each visual word, we choose a topic assignment (the circles with patterns filled in) and choose the visual word from the corresponding topic.

Multi-modal extensions of LDA, referred to as multi-modal topic models, have been proposed to jointly model data of different types. These models were mainly applied to image annotation, where the goal is to assign a set of keywords to an image, learning underlying topics from a set of image-annotation pairs. Earlier work in this direction is correspondence LDA (cLDA) [3], which finds conditional relationships between latent variable representations of visual words and caption words. The conditional distribution of the annotation given visual descriptors is modeled for automatic image annotation. Topic regression multi-modal LDA (tr-mmLDA) [10] is an alternative method for capturing the statistical association between image and text. Unlike cLDA, tr-mmLDA learns two separate sets of hidden topics and counts on a regression module to allow a set of caption topics to be linearly predicted from the set of image topics. It was motivated by the regression-based latent factor model [1], which was further elaborated in the hierarchical Bayesian framework [9]. It was shown in [10] that tr-mmLDA is more flexible than cLDA in the sense that the former allows the number of image topics to differ from the number of caption topics.

A class label is a global description of an image, while annotated keywords are local descriptions of image patches. Class label and annotations are related to each other: for instance, an image labeled as a highway scene is more likely to be annotated with "cars" and "road" rather than "apple" and "desk". In this paper we present supervised multi-modal latent Dirichlet allocation (smmLDA), where we incorporate the class label into tr-mmLDA so that two sets of hidden topics, which are related via linear regression, are learned from data of two types as well as from the class label. Several extensions of LDA to incorporate supervision have been developed in the literature, but most of these existing methods are limited to learning from data of a single type. The model tr-mmLDA outperforms most previous methods in the task of image annotation but is an unsupervised method. Our model smmLDA extends the previous state of the art in this domain, tr-mmLDA, by incorporating the supervision of class labels.

2. LATENT DIRICHLET ALLOCATION

We briefly give an overview of LDA [5]. LDA is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics, where each topic is described by a distribution over words. Each document $w_{d,1:N}$ is a sequence of $N$ words, for $d = 1, \ldots, D$ ($D$ is the size of the corpus), and each word $w_{dn} \in \mathbb{R}^V$ ($V$ is the size of the vocabulary) is a unit vector that has a single entry equal to one and all other entries equal to zero. For instance, if $w_{dn}$ is the $v$-th word in the vocabulary, then $w_{dnv} = 1$ and $w_{dnj} = 0$ for $j \neq v$. The graphical model for LDA is shown in Fig. 2, where each document $w_{d,1:N}$ is assumed to be generated as follows:

- Draw a vector of topic proportions $\theta_d \in \mathbb{R}^K$: $\theta_d \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$.
- For each word $n$:
  - Draw a topic assignment $z_{dn} \in \mathbb{R}^K$ from a multinomial distribution: $z_{dn} \mid \theta_d \sim \mathrm{Mult}(\theta_d)$.
  - Draw a word $w_{dn} \in \mathbb{R}^V$: $w_{dn} \mid z_{dn}, \phi_{1:K} \sim p(w_{dn} \mid z_{dn}, \phi_{1:K})$.

Fig. 2. Graphical model for LDA.

Given parameters $\alpha$ and $\phi_{1:K}$, the joint distribution of hidden and observed variables is given by
$$p(\theta_d, z_{d,1:N}, w_{d,1:N} \mid \alpha, \phi_{1:K}) = p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \phi_{1:K}).$$
Integrating over $\theta_d$ and summing over $z_{d,1:N}$, the marginal distribution of a document is given by
$$p(w_{d,1:N} \mid \alpha, \phi_{1:K}) = \int p(\theta_d \mid \alpha) \prod_{n=1}^{N} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \phi_{1:K})\, d\theta_d.$$
Taking the product of marginal probabilities of single documents, the probability of a corpus (the marginal likelihood) is given by
$$p(w_{1:D,1:N} \mid \alpha, \phi_{1:K}) = \prod_{d=1}^{D} p(w_{d,1:N} \mid \alpha, \phi_{1:K}). \qquad (1)$$
Variational inference allows us to calculate approximate posterior distributions over the hidden variables $\{\theta_d, z_{dn}\}$ by maximizing a variational lower-bound on the log marginal likelihood.
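The three-step story above fully specifies the model and can be transcribed almost line for line. Below is a minimal NumPy sketch of the LDA generative process (not code from the paper); the topic count K, vocabulary size V, document length N, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, N = 5, 256, 100                         # topics, vocabulary size, words per document (assumed)
alpha = np.full(K, 0.1)                       # Dirichlet parameters alpha_1, ..., alpha_K (assumed)
phi = rng.dirichlet(np.full(V, 0.01), size=K) # topic-word distributions phi_{1:K} (assumed)

def generate_document():
    """Generate one document w_{d,1:N} following the process above."""
    theta = rng.dirichlet(alpha)              # theta_d ~ Dir(alpha_1, ..., alpha_K)
    words = np.empty(N, dtype=int)
    for n in range(N):
        z = rng.choice(K, p=theta)            # z_dn ~ Mult(theta_d)
        words[n] = rng.choice(V, p=phi[z])    # w_dn ~ Mult(phi_{z_dn})
    return words

doc = generate_document()
```

Marginalizing $\theta_d$ and $z_{d,1:N}$ out of this sampler is exactly what Eq. (1) expresses; inference runs the process in reverse.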

3. SUPERVISED MULTI-MODAL LDA

In this section we present the main result, supervised multi-modal LDA (smmLDA), where we incorporate the class label into the joint modeling of visual words $\{r_{dn}\}$ and caption words $\{w_{dm}\}$, whose latent variable representations are related via linear regression. The graphical model for smmLDA is shown in Fig. 3.

Fig. 3. Graphical model for supervised multi-modal LDA (smmLDA).

3.1. Model

The generative process for the visual words $\{r_{dn}\}$ and caption words $\{w_{dm}\}$ of an image $d$ is as follows:

- Choose a category label $c_d \in \mathbb{R}^C$: $c_d \sim \mathrm{Mult}(\eta) = \prod_{j=1}^{C} \eta_j^{c_{dj}}$, where $c_d$ is a $C$-dimensional unit vector; if the class label is $j$, then $c_{dj} = 1$ and $c_{di} = 0$ for $i \neq j$.
- Draw a vector of image topic proportions $\theta_d \in \mathbb{R}^K$: $\theta_d \sim \prod_{j=1}^{C} \mathrm{Dir}(\theta_d \mid \alpha_j)^{c_{dj}}$.
- For each visual word $r_{dn}$:
  - Draw an image topic assignment $z_{dn} \in \mathbb{R}^K$: $z_{dn} \sim \mathrm{Mult}(\theta_d) = \prod_{k=1}^{K} \theta_{dk}^{z_{dnk}}$.
  - Draw a visual word: $r_{dn} \mid z_{dn}, c_d \sim \mathrm{Mult}(\Omega^r) = \prod_{i=1}^{C} \prod_{k=1}^{K} \prod_{j=1}^{V_r} (\Omega^r_{ikj})^{c_{di} z_{dnk} r_{dnj}}$, where $V_r$ is the size of the visual word vocabulary.
- Given the empirical image topic frequency $\bar{z}_d = \frac{1}{N} \sum_{n=1}^{N} z_{dn}$, sample a real-valued topic proportion variable for the caption text: $x_d \mid \bar{z}_d, A, \mu, \Lambda \sim \mathcal{N}(x_d \mid A \bar{z}_d + \mu, \Lambda^{-1})$.
- Compute caption topic proportions: $v_{dl} = e^{x_{dl}} / \sum_{l'=1}^{L} e^{x_{dl'}}$ for $l = 1, \ldots, L$.
- For each caption word $w_{dm}$:
  - Draw a caption topic assignment: $y_{dm} \sim \mathrm{Mult}(v_d) = \prod_{l=1}^{L} v_{dl}^{y_{dml}}$.
  - Draw a caption word: $w_{dm} \mid y_{dm}, c_d \sim \mathrm{Mult}(\Omega^w) = \prod_{i=1}^{C} \prod_{l=1}^{L} \prod_{j=1}^{V_w} (\Omega^w_{ilj})^{c_{di} y_{dml} w_{dmj}}$, where $V_w$ is the size of the caption word vocabulary.

We define sets of variables as $R = \{r_{dn}\}$, $W = \{w_{dm}\}$, $Z = \{z_{dn}\}$, $Y = \{y_{dm}\}$, $C = \{c_d\}$, $\Theta = \{\theta_d\}$, and $X = \{x_d\}$. The joint distribution over these variables then obeys the following factorization:
$$p(R, W, C, \Theta, X, Z, Y) = p(C \mid \eta)\, p(\Theta \mid C, \alpha)\, p(Z \mid \Theta)\, p(R \mid Z, \Omega^r, C)\, p(X \mid Z, A, \mu, \Lambda)\, p(Y \mid X)\, p(W \mid Y, \Omega^w, C), \qquad (2)$$
where
$$p(C \mid \eta) = \prod_{d=1}^{D} \prod_{j=1}^{C} \eta_j^{c_{dj}}, \qquad p(\Theta \mid C, \alpha) = \prod_{d=1}^{D} \prod_{j=1}^{C} \mathrm{Dir}(\theta_d \mid \alpha_j)^{c_{dj}}, \qquad p(Z \mid \Theta) = \prod_{d=1}^{D} \prod_{n=1}^{N} \prod_{k=1}^{K} \theta_{dk}^{z_{dnk}},$$
$$p(R \mid Z, \Omega^r, C) = \prod_{d=1}^{D} \prod_{n=1}^{N} \prod_{i=1}^{C} \prod_{k=1}^{K} \prod_{j=1}^{V_r} (\Omega^r_{ikj})^{c_{di} z_{dnk} r_{dnj}}, \qquad p(X \mid Z, A, \mu, \Lambda) = \prod_{d=1}^{D} \mathcal{N}(x_d \mid A \bar{z}_d + \mu, \Lambda^{-1}),$$
$$p(Y \mid X) = \prod_{d=1}^{D} \prod_{m=1}^{M} \prod_{l=1}^{L} v_{dl}^{y_{dml}}, \qquad p(W \mid Y, \Omega^w, C) = \prod_{d=1}^{D} \prod_{m=1}^{M} \prod_{i=1}^{C} \prod_{l=1}^{L} \prod_{j=1}^{V_w} (\Omega^w_{ilj})^{c_{di} y_{dml} w_{dmj}}.$$
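The smmLDA generative story is likewise mechanical to simulate. The sketch below (mine, not the paper's code) follows the steps of Section 3.1, with the Gaussian draw linking image topics to caption topics; all dimensions and parameter initializations are illustrative assumptions, and the max-subtraction in the softmax is a standard numerical-stability detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's settings):
# C classes, K image topics, L caption topics, V_r visual words,
# V_w caption words, N patches per image, M caption words per image.
C, K, L, V_r, V_w, N, M = 8, 25, 25, 256, 500, 100, 6

eta = np.full(C, 1.0 / C)                                  # class prior eta
alpha = rng.gamma(2.0, 1.0, size=(C, K))                   # class-specific Dirichlet parameters alpha_c
Omega_r = rng.dirichlet(np.full(V_r, 0.01), size=(C, K))   # Omega^r: p(visual word | class, image topic)
Omega_w = rng.dirichlet(np.full(V_w, 0.01), size=(C, L))   # Omega^w: p(caption word | class, caption topic)
A = rng.normal(size=(L, K))                                # regression matrix A
mu = np.zeros(L)                                           # offset mu
Lambda_inv = 0.1 * np.eye(L)                               # noise covariance Lambda^{-1}

def generate_image():
    """Sample (class label, visual words, caption words) for one image."""
    c = rng.choice(C, p=eta)                                     # c_d ~ Mult(eta)
    theta = rng.dirichlet(alpha[c])                              # theta_d ~ Dir(alpha_c)
    z = rng.choice(K, p=theta, size=N)                           # z_dn ~ Mult(theta_d)
    r = np.array([rng.choice(V_r, p=Omega_r[c, k]) for k in z])  # visual words
    z_bar = np.bincount(z, minlength=K) / N                      # empirical topic frequency z_bar_d
    x = rng.multivariate_normal(A @ z_bar + mu, Lambda_inv)      # x_d ~ N(A z_bar_d + mu, Lambda^{-1})
    v = np.exp(x - x.max())                                      # softmax -> caption proportions v_d
    v /= v.sum()
    y = rng.choice(L, p=v, size=M)                               # y_dm ~ Mult(v_d)
    w = np.array([rng.choice(V_w, p=Omega_w[c, l]) for l in y])  # caption words
    return c, r, w
```

Note how the class label $c_d$ enters twice: it selects the Dirichlet prior over image topic proportions and it selects the class-specific topic-word multinomials for both modalities.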

3.2. Variational Inference

The log marginal likelihood is given by
$$\log p(R, W, C) = \log \int\!\!\int \sum_{Z, Y} p(R, W, C, \Theta, X, Z, Y)\, d\Theta\, dX \geq \int\!\!\int \sum_{Z, Y} q(\Theta, X, Z, Y) \log \frac{p(R, W, C, \Theta, X, Z, Y)}{q(\Theta, X, Z, Y)}\, d\Theta\, dX \equiv \mathcal{F}(q), \qquad (3)$$
where $q(\Theta, X, Z, Y)$ denotes the variational distribution and Jensen's inequality is used to reach the variational lower-bound $\mathcal{F}(q)$. We assume that the variational distribution factorizes as
$$q(\Theta, X, Z, Y) = q(\Theta)\, q(X)\, q(Z)\, q(Y), \qquad (4)$$
where each factor is assumed to be of the form in Table 1. The variational parameters $\{\tilde{\alpha}_d\}$, $\{\bar{x}_d, \Gamma_d^{-1}\}$, $\{\tau_{dn}\}$, and $\{\rho_{dml}\}$ are determined by maximizing the variational lower-bound
$$\mathcal{F}(q) = \mathbb{E}_q\big[\log p(C \mid \eta) + \log p(\Theta \mid C, \alpha) + \log p(Z \mid \Theta) + \log p(R \mid Z, \Omega^r, C) + \log p(X \mid Z, A, \mu, \Lambda) + \log p(Y \mid X) + \log p(W \mid Y, \Omega^w, C)\big] - \mathbb{E}_q\big[\log q(\Theta) + \log q(X) + \log q(Z) + \log q(Y)\big],$$
where $\mathbb{E}_q$ denotes the statistical expectation with respect to the variational distribution $q(\cdot)$. The coordinate ascent algorithms for updating the variational parameters are summarized in Table 1.

Table 1. Variational posterior distributions and updating equations for the variational parameters.

- $q(\Theta) = \prod_{d} \mathrm{Dir}(\theta_d \mid \tilde{\alpha}_d)$:  $\tilde{\alpha}_d = \sum_{i=1}^{C} c_{di}\, \alpha_i + \sum_{n=1}^{N} \tau_{dn}$
- $q(Z) = \prod_{d} \prod_{n} \prod_{k} \tau_{dnk}^{z_{dnk}}$:  $\log \tau_{dnk} \propto \psi(\tilde{\alpha}_{dk}) - \psi\big(\tilde{\alpha}_{d1} + \cdots + \tilde{\alpha}_{dK}\big) + \sum_{i=1}^{C} \sum_{j=1}^{V_r} c_{di}\, r_{dnj} \log \Omega^r_{ikj} + \frac{1}{N} \big[A^\top \Lambda (\bar{x}_d - \mu)\big]_k - \frac{1}{2N^2} \big[\mathrm{diag}(A^\top \Lambda A) + 2\, A^\top \Lambda A \sum_{m \neq n} \tau_{dm}\big]_k$
- $q(Y) = \prod_{d} \prod_{m} \prod_{l} \rho_{dml}^{y_{dml}}$:  $\log \rho_{dml} \propto \sum_{i=1}^{C} \sum_{j=1}^{V_w} c_{di}\, w_{dmj} \log \Omega^w_{ilj} + \bar{x}_{dl}$
- $q(X) = \prod_{d} \mathcal{N}(x_d \mid \bar{x}_d, \Gamma_d^{-1})$:  determine $\bar{x}_d$ and $\gamma_{dl}$ (the diagonal entries of $\Gamma_d^{-1}$) by the Newton-Raphson method, with $\xi_d = \sum_{l=1}^{L} e^{\bar{x}_{dl} + \gamma_{dl}/2}$

The term $\mathbb{E}_q[\log v_{dl}] = \mathbb{E}_q\big[x_{dl} - \log \sum_{l'=1}^{L} e^{x_{dl'}}\big]$ has no closed form, so $\mathcal{F}(q)$ is not directly maximized. Instead, as in [10], its convex lower-bound is maximized:
$$\mathbb{E}_q[\log v_{dl}] \geq \bar{x}_{dl} - \log \xi_d - \frac{1}{\xi_d} \sum_{l'=1}^{L} e^{\bar{x}_{dl'} + \gamma_{dl'}/2} + 1.$$
Detailed derivations for variational inference are omitted here due to the space limitation; they can be carried out in a manner similar to [10].

3.3. Parameter Estimation

Regression parameters $\{A, \mu, \Lambda^{-1}\}$ are updated as
$$A = \Big[\sum_{d=1}^{D} (\bar{x}_d - \mu)\, \bar{\tau}_d^\top\Big] \Big[\sum_{d=1}^{D} \frac{1}{N^2} \Big(\sum_{n=1}^{N} \mathrm{diag}(\tau_{dn}) + \sum_{n=1}^{N} \tau_{dn} \sum_{m \neq n} \tau_{dm}^\top\Big)\Big]^{-1},$$
$$\mu = \frac{1}{D} \sum_{d=1}^{D} \big(\bar{x}_d - A \bar{\tau}_d\big), \qquad \Lambda^{-1} = \frac{1}{D} \sum_{d=1}^{D} \Big[(\bar{x}_d - \mu)(\bar{x}_d - \mu)^\top + \Gamma_d^{-1} - A \bar{\tau}_d (\bar{x}_d - \mu)^\top\Big],$$
where $\bar{\tau}_d = \frac{1}{N} \sum_{n=1}^{N} \tau_{dn}$. Multinomial parameters $\{\Omega^r, \Omega^w\}$ are updated as
$$\Omega^r_{ikj} = \frac{\sum_{d=1}^{D} \sum_{n=1}^{N} c_{di}\, \tau_{dnk}\, r_{dnj}}{\sum_{j'=1}^{V_r} \sum_{d=1}^{D} \sum_{n=1}^{N} c_{di}\, \tau_{dnk}\, r_{dnj'}}, \qquad \Omega^w_{ilj} = \frac{\sum_{d=1}^{D} \sum_{m=1}^{M} c_{di}\, \rho_{dml}\, w_{dmj}}{\sum_{j'=1}^{V_w} \sum_{d=1}^{D} \sum_{m=1}^{M} c_{di}\, \rho_{dml}\, w_{dmj'}}.$$
Dirichlet parameters $\{\alpha_c\}$ are updated using the Newton-Raphson method, as in LDA [5]. Given a test image $r_{1:N}$, the class label and annotations are determined by choosing the most probable candidates under the conditional probabilities $p(c_d \mid r_{1:N})$ and $p(w_{dm} \mid r_{1:N})$.

4. EXPERIMENTS

We use the 8-category subset of the LabelMe dataset [12] to perform image annotation experiments. Categories include coast, forest, highway, inside city, mountain, open country, street, and tall building. This subset has 2686 fully annotated images. We use 128-dimensional SIFT descriptors [8] computed on image patches, where each image patch is obtained by sliding a window with a 20-pixel interval. We then run k-means clustering on the 128-dimensional descriptors to learn a 256-word visual codebook. For the annotation words, we remove the words appearing fewer than 3 times in the whole data. Finally we have a complete set of triples (visual words, caption words, class label). The whole dataset is separated into a training set of size 2000 and a test set of size 686.

We evaluate performance in terms of caption perplexity, defined as
$$\mathrm{Perplexity} = \exp\left\{- \frac{\sum_{d=1}^{D} \sum_{m=1}^{M_d} \log p(w_{dm} \mid r_{d,1:N})}{\sum_{d=1}^{D} M_d}\right\},$$
where $p(w_{dm} \mid r_{d,1:N})$ is the conditional probability of caption words given an image $r_{d,1:N}$ and $M_d$ is the number of caption words in document $d$. A higher conditional likelihood leads to a lower perplexity. The performance comparison with the previous state of the art, tr-mmLDA, is summarized in Table 2, where our smmLDA attains lower perplexity than tr-mmLDA at both K = 25 and K = 30.

Table 2. Perplexity comparison of tr-mmLDA and smmLDA (our method) for K = 25 and K = 30 topics.
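For concreteness, the caption-perplexity metric can be computed directly from per-word conditional log-probabilities. The helper below is a hypothetical sketch (the function name and input layout are mine); the values $\log p(w_{dm} \mid r_{d,1:N})$ would come from the fitted model.

```python
import numpy as np

def caption_perplexity(log_probs_per_image):
    """Perplexity = exp(- sum_d sum_m log p(w_dm | r_d) / sum_d M_d).

    log_probs_per_image[d] holds log p(w_dm | r_{d,1:N}) for each of the
    M_d caption words of test image d. A higher conditional likelihood of
    the captions given the image yields a lower perplexity.
    """
    total_log_prob = sum(np.sum(lp) for lp in log_probs_per_image)
    total_words = sum(len(lp) for lp in log_probs_per_image)
    return float(np.exp(-total_log_prob / total_words))

# Toy usage: two test images with 3 and 2 caption words, respectively.
print(caption_perplexity([np.log([0.1, 0.2, 0.05]), np.log([0.3, 0.1])]))
```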

5. CONCLUSIONS

In this paper we have presented a multi-modal extension of LDA with supervision, leading to smmLDA. We have developed variational inference algorithms to approximately compute posterior distributions over the variables of interest in smmLDA. Applications to image annotation demonstrated the high performance of smmLDA compared to the previous state of the art.

Acknowledgments: This work was supported by the National Research Foundation (NRF) of Korea (NRF-2013R1A2A2A...) and the POSTECH Rising Star Program.

6. REFERENCES

[1] D. Agarwal and B.-C. Chen, "Regression-based latent factor models," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Paris, France, 2009.
[2] D. M. Blei, "Probabilistic topic models," Communications of the ACM, vol. 55, no. 4, pp. 77-84, 2012.
[3] D. M. Blei and M. I. Jordan, "Modeling annotated data," in Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Toronto, Canada, 2003.
[4] D. M. Blei and J. McAuliffe, "Supervised topic models," in Advances in Neural Information Processing Systems (NIPS), vol. 20, MIT Press, 2008.
[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[6] L. Fei-Fei and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 2005.
[7] S. Lacoste-Julien, F. Sha, and M. I. Jordan, "DiscLDA: Discriminative learning for dimensionality reduction and classification," in Advances in Neural Information Processing Systems (NIPS), vol. 21, 2008.
[8] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[9] S. Park, Y.-D. Kim, and S. Choi, "Hierarchical Bayesian matrix factorization with side information," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013.
[10] D. Putthividhya, H. T. Attias, and S. S. Nagarajan, "Topic regression multi-modal latent Dirichlet allocation for image annotation," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 2010.
[11] D. Ramage, D. Hall, R. Nallapati, and C. D. Manning, "Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
[12] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, "LabelMe: A database and web-based tool for image annotation," International Journal of Computer Vision, vol. 77, pp. 157-173, 2008.
[13] C. Wang, D. M. Blei, and L. Fei-Fei, "Simultaneous image classification and annotation," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 2009.
