Graphical Models for Query-driven Analysis of Multimodal Data


Graphical Models for Query-driven Analysis of Multimodal Data
John Fisher
Sensing, Learning, & Inference Group
Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
http://groups.csail.mit.edu/vision/sli/
October 15, 2015

Actionable Information
"...information is actionable if it is prescriptive of actions that can be taken to either improve upon the state of uncertainty for a particular task or allow one to accurately evaluate the cost of ancillary decisions related to the task." - original source in dispute
"The perfect is the enemy of the good." - Voltaire, 1764 (though he probably said it in French)

Actionable Information: Some Comments on Value, Information, and Action Choices
The economic notion of value is subjective: the value of a tangible good is inversely proportional to its availability, while the value of information depends on the actions it induces and the subsequent rewards. Depending on context, informational utility can vary. In the economic setting, informational value has focused primarily on decision making, and its relation to uncertainty has been one of the key drivers: perfect information removes all uncertainty about outcomes and hence uncertainty about the consequences of actions. In the Bayesian setting we are interested in reasoning about the relation between the uncertain state of nature (e.g., the truth or falsity of a set of assertions) and the inherent distribution of risk associated with subsequent actions.

Analysis by Query
[Diagram: processing pipeline with blocks labeled sensors, tracker, attribution, scene representation, logic processing, predicate processing, EES, SUT.]
Which comes first, the sensor or the query? The goal is to efficiently bridge the gap between sensors (broadly construed), queries, and answers (including uncertainty).

Airborne Detection of Material
Analysis: the structure of the graphical model guides the allocation of computational resources and sensing resources.
[Graphical model with nodes η_x, x_j, η_w, w_j, y_i, b, η_b, η_s, s_j, z_i, σ_ε, η_σ over plates of sizes m and n.]
p(θ) = p(b) p(σ_ε) ∏_{j=1}^{m} p(x_j) p(w_j) p(s_j) ∏_{i=1}^{n} p(z_i | x, w, s, b_i, σ_ε; y_i)

Parametric vs. Nonparametric Methods
Component-wise MH and Gibbs sampling are two common parametric methods for sampling from high-dimensional distributions. Others exist, e.g., slice sampling, rejection sampling, Hamiltonian Monte Carlo, etc. Parametric methods require that the dimension of X is constant! Suppose X contains the cluster parameters for m different clusters, but we do not know the number of clusters m a priori. Represent each number of clusters m as corresponding to a model M_m. Now X_m denotes an r.v. of model M_m, not the m-th component of X! Nonparametric MCMC methods extend to these cases while retaining guarantees about convergence to the stationary distribution π(m, x_m).

Reversible Jump MCMC (RJMCMC)
RJMCMC [1] adds a trans-dimensional jump proposal to each iteration of any parametric method, hence allowing sampling of the model order!
Algorithm 1: Reversible-Jump MCMC
  Initial state x_{m_0}^(0)
  For t = 1, 2, ...
    Generate x_m^(t) from x_m^(t-1) using componentwise MH, Gibbs, etc.
    Propose a new model M_n with probability π_mn
    Sample auxiliary variables u_mn ~ φ_mn(u)
    Set (x_n, v_nm) = T_mn(x_m^(t), u_mn) via the transform T_mn : M_m → M_n
    Set x_n^(t) = x_n with acceptance probability
      min( 1, [π(n, x_n) π_nm φ_nm(v_nm)] / [π(m, x_m^(t)) π_mn φ_mn(u_mn)] · |∂T_mn(x_m, u_mn) / ∂(x_m, u_mn)| );
    otherwise retain x_m^(t).
[1] Green (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.
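As a rough illustration of the acceptance step above, here is a minimal, generic sketch of one reversible-jump move in Python. All function names and signatures (log_target, propose_model, transform, etc.) are hypothetical placeholders to be supplied by the user, not part of the talk.

```python
import numpy as np

def rjmcmc_step(m, x_m, log_target, propose_model, sample_aux, transform,
                jump_prob, aux_logpdf, rng=np.random.default_rng()):
    """One trans-dimensional jump in the style of Green (1995).

    log_target(m, x)      -- log pi(m, x), unnormalized is fine
    propose_model(m)      -- draws a candidate model index n
    sample_aux(m, n)      -- draws auxiliary variables u_mn
    transform(m, n, x, u) -- returns (x_n, v_nm, log|Jacobian|) of T_mn
    jump_prob(m, n)       -- pi_mn, probability of proposing model n from m
    aux_logpdf(m, n, u)   -- log phi_mn(u)
    """
    n = propose_model(m)
    u = sample_aux(m, n)
    x_n, v_nm, log_jac = transform(m, n, x_m, u)

    # log of: pi(n, x_n) pi_nm phi_nm(v_nm) / [pi(m, x_m) pi_mn phi_mn(u_mn)] * |Jacobian|
    log_ratio = (log_target(n, x_n) - log_target(m, x_m)
                 + np.log(jump_prob(n, m)) - np.log(jump_prob(m, n))
                 + aux_logpdf(n, m, v_nm) - aux_logpdf(m, n, u)
                 + log_jac)

    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return n, x_n          # accept the dimension change
    return m, x_m              # reject: stay in model M_m
```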

Grid-based events
Divide the survey area into cells. Sources are indicated as x's, with the 1σ contour shown as dashed lines. Average over the set of samples: compute the probability that each cell contains at least one source by integrating the source pdf over the cell (see the sketch below).
[Figure: survey grid with cells indexed by x_0 ... x_8 and y_0 ... y_8; example cell (3,5) marked.]
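A hedged sketch of the "average over the set of samples" computation: given posterior samples of source locations, estimate for each grid cell the probability that it contains at least one source. The array layout and function name are assumptions for illustration only.

```python
import numpy as np

def cell_occupancy_prob(samples, x_edges, y_edges):
    """samples: list of (k_s, 2) arrays, one per MCMC sample, holding the
    (x, y) positions of the k_s sources in that sample.
    Returns a (len(x_edges)-1, len(y_edges)-1) array of Monte Carlo
    estimates of P(cell contains at least one source)."""
    counts = np.zeros((len(x_edges) - 1, len(y_edges) - 1))
    for src in samples:
        hist, _, _ = np.histogram2d(src[:, 0], src[:, 1],
                                    bins=[x_edges, y_edges])
        counts += (hist > 0)          # indicator: at least one source in the cell
    return counts / len(samples)      # average the indicator over posterior samples
```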

MCMC Inference
[Photo: what an MCMC practitioner might look like.]
Generative models can be used to compute various expectations and marginal event probabilities. Markov chain Monte Carlo (MCMC) methods are a way to do this exactly, akin to flipping a coin and counting. They are suitable for large and complex models, i.e., the quality of the estimates does not depend on dimensionality or dependency structure. The challenge is to obtain sufficiently many independent flips (from the correct distribution).

MCMC Inference
Definition (Detailed Balance). Let π(z) denote the target distribution. If an ergodic Markov chain is constructed with a transition distribution q(ẑ | z) that satisfies π(z) q(ẑ | z) = π(ẑ) q(z | ẑ), then the chain is said to satisfy the detailed balance condition and π(z) is guaranteed to be the unique stationary distribution of the chain.
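For concreteness, a minimal random-walk Metropolis-Hastings sampler (an illustrative sketch, not the talk's implementation); with a symmetric proposal the acceptance ratio reduces to π(ẑ)/π(z), which is exactly what detailed balance requires. The 1-D Gaussian-mixture target below is an assumed toy example.

```python
import numpy as np

def metropolis_hastings(log_pi, z0, n_steps, step=0.5, rng=np.random.default_rng()):
    """Random-walk MH: the Gaussian proposal is symmetric, q(z_hat|z) = q(z|z_hat),
    so accepting with probability min(1, pi(z_hat)/pi(z)) satisfies detailed balance."""
    z, samples = z0, []
    for _ in range(n_steps):
        z_hat = z + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_pi(z_hat) - log_pi(z):
            z = z_hat                  # accept the proposed move
        samples.append(z)              # "flip the coin and count"
    return np.array(samples)

# Toy target: a two-component 1-D Gaussian mixture (illustrative assumption only).
log_pi = lambda z: np.logaddexp(-0.5 * (z + 2.0) ** 2, -0.5 * (z - 2.0) ** 2)
draws = metropolis_hastings(log_pi, z0=0.0, n_steps=5000)
```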

Parallel Sampling in DP Mixture Models (Chang and Fisher III [2013])
Key ideas: composition of non-ergodic restricted Gibbs iterations; points in different super-clusters (groups of clusters) can be sampled in parallel (see the sketch below); splits are proposed via sub-cluster assignments in constant time and in parallel. This yields an ergodic Markov chain that satisfies detailed balance and gives significantly faster convergence in experiments with large datasets.
[Graphical models: the DPMM (α, π, z_i, x_i, θ_k, λ, N) and the model augmented with sub- and super-clusters.]
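A loose sketch, under stated assumptions, of the inter-cluster parallelism described above: points are only reassigned among clusters within their own super-cluster, so the restricted Gibbs sweeps for disjoint super-clusters can run in separate processes. The helper names and data structures are hypothetical and omit the sub-cluster split/merge machinery of Chang and Fisher III [2013].

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def restricted_gibbs(data, labels, clusters_in_group, sample_label):
    """Reassign each point only among the clusters of its own super-cluster.
    sample_label(x, allowed) is a user-supplied conditional sampler."""
    new = labels.copy()
    for i in np.flatnonzero(np.isin(labels, clusters_in_group)):
        new[i] = sample_label(data[i], clusters_in_group)
    return new

def parallel_sweep(data, labels, super_clusters, sample_label):
    """super_clusters: list of disjoint groups of cluster indices.
    Because the groups are disjoint, the restricted sweeps touch disjoint
    sets of points and can run in separate processes.
    (sample_label must be a module-level, picklable function.)"""
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(restricted_gibbs, data, labels, g, sample_label)
                   for g in super_clusters]
        for g, f in zip(super_clusters, futures):
            mask = np.isin(labels, g)          # points owned by this super-cluster
            labels[mask] = f.result()[mask]    # merge the group's updates back in
    return labels
```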

DP Sampling Properties (Chang and Fisher III [2013])
[Table comparing samplers CW, [7, 8], [4, 14], [3, 6, 9], [10], [11, 18], and [1] on: exact model, splits & merges, intra-cluster parallelizable², inter-cluster parallelizable, and non-conjugate priors.]
[Figure: log likelihood vs. computation time for real data; all parallel algorithms use 16 cores.]
² Intra-cluster parallelization has not been a significant factor. Decentralized inference may show different behavior.

Parallel Sampling in HDP Mixture Models (Chang and Fisher III [2014])
The extension of Chang and Fisher III [2013] to HDPs is not straightforward. The notion of sub-clusters remains, but additional latent variables and overlapping distributions introduce complexity and necessitate some bookkeeping; split/merge moves are modified from the DP case. Empirical results indicate that held-out log-likelihood (aka perplexity) can be a poor indicator of convergence.
[Graphical models: the HDP model and the augmented HDP model (nodes include γ, α, β, π_d, z_di, x_di, θ_k, λ, N_d, D).]

HDP Sampling Properties (Chang and Fisher III [2014])
[Table comparing CRF [15], DA [15], SAMS [16], FSD [5], HW [13], SC [17], and [2] on: infinite model, MCMC guarantees, non-conjugate priors, parallelizable, local splits/merges, and global splits/merges (the latter potentially with adaptation of the DP Metropolis-Hastings framework of Neal [2000]).]
[Figure: results on (a) Enron emails and (b) NYTimes articles for 1 and 50 initial topics.]

Source visibility
Single source of specified emission rate s, marginalizing over wind fields, aircraft path, and half-width. The figure plots the detection probability, conditioned on a given source rate, against the false-alarm probability. We can reliably detect sources with an emission rate of at least 0.02 m³/s.

Video example of contextual modeling for scene understanding

References I
J. Chang and J. W. Fisher III. Parallel sampling of DP mixture models using sub-cluster splits. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 620–628, 2013. URL http://papers.nips.cc/paper/5162-parallel-sampling-of-dp-mixture-models-using-sub-cluster-splits.pdf.
J. Chang and J. W. Fisher III. MCMC sampling in HDPs using sub-clusters. In Advances in Neural Information Processing Systems 27, 2014.
D. B. Dahl. An improved merge-split sampler for conjugate Dirichlet process mixture models. Technical report, University of Wisconsin–Madison, Dept. of Statistics, 2003.
S. Favaro and Y. W. Teh. MCMC for normalized random measure mixture models. Statistical Science, 2013.
E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. An HDP-HMM for systems with state persistence. In International Conference on Machine Learning, July 2008.
P. J. Green and S. Richardson. Modelling heterogeneity with and without the Dirichlet process. Scandinavian Journal of Statistics, pages 355–375, 2001.

References II
H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:161–173, 2001.
H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Canadian Journal of Statistics, 30:269–283, 2002.
S. Jain and R. Neal. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13:158–182, 2000.
S. Jain and R. Neal. Splitting and merging components of a nonconjugate Dirichlet process mixture model. Bayesian Analysis, 2(3):445–472, 2007.
D. Lovell, R. P. Adams, and V. K. Mansingka. Parallel Markov chain Monte Carlo for Dirichlet process mixtures. In Workshop on Big Learning, NIPS, 2012.
R. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, June 2000.
D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. The Journal of Machine Learning Research, 10:1801–1828, Dec. 2009. ISSN 1532-4435.

References III
O. Papaspiliopoulos and G. O. Roberts. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95(1):169–186, 2008.
Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
C. Wang and D. Blei. A split-merge MCMC algorithm for the hierarchical Dirichlet process. arXiv:1207.1657 [stat.ML], 2012.
S. Williamson, A. Dubey, and E. P. Xing. Parallel Markov chain Monte Carlo for nonparametric mixture models. In ICML, 2013a.
S. A. Williamson, A. Dubey, and E. P. Xing. Parallel Markov chain Monte Carlo for nonparametric mixture models. In ICML, 2013b.
