Graphical Models for Query-driven Analysis of Multimodal Data John Fisher Sensing, Learning, & Inference Group Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology http://groups.csail.mit.edu/vision/sli/ October 15, 2015
Actionable Information
"...information is actionable if it is prescriptive of actions that can be taken to either improve upon the state of uncertainty for a particular task or allow one to accurately evaluate the cost of ancillary decisions related to the task." - original source in dispute
"The perfect is the enemy of the good." - Voltaire, 1764 (though he probably said it in French)
Actionable Information: Some Comments on Value, Information, and Action Choices
The economic notion of value is subjective. The value of a tangible good is inversely proportional to its availability. The value of information depends on the actions it induces and the subsequent rewards; depending on context, informational utility can vary.
In the economic setting, the study of informational value has focused primarily on decision making, and the relation to uncertainty has been one of the key drivers: perfect information removes all uncertainty about outcomes and hence uncertainty about the consequences of actions.
In the Bayesian setting we are interested in reasoning about the relation between the uncertain state of nature (e.g., truth or falsity of a set of assertions) and the inherent distribution of risk associated with subsequent actions.
Analysis by Query
[Block diagram with components: EES, SUT, predicate processing, logic processing, scene representation, attribution, tracker, sensors.]
Which comes first, the sensor or the query? The goal is to efficiently bridge the gap between sensors (broadly construed), queries, and answers (including uncertainty).
Airborne Detection of Material
Analysis of the structure of the graphical model guides allocation of computational resources and sensing resources.
[Graphical model with nodes η_x, x_j, η_w, w_j, y_i, b, η_b, η_s, s_j, z_i, σ_ε, η_σ over plates of size m and n.]
p(θ) = p(b) p(σ_ε) ∏_{j=1}^{m} p(x_j) p(w_j) p(s_j) ∏_{i=1}^{n} p(z_i | x, w, s, b_i, σ_ε; y_i)
Parametric vs. Nonparametric Methods
Component-wise MH and Gibbs sampling are two common parametric methods for sampling from high-dimensional distributions. Others exist, e.g., slice sampling, rejection sampling, Hamiltonian Monte Carlo, etc.
Parametric methods require that the dimension of X is constant!
Suppose X contains the cluster parameters for m different clusters, but we do not know the number of clusters m a priori. Represent each number of clusters m as corresponding to a model M_m. Now x_m denotes an r.v. of model M_m, not the m-th component of X!
Nonparametric MCMC methods extend to these cases while retaining guarantees about convergence to the stationary distribution π(m, x_m).
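As a concrete illustration (not from the talk), the component-wise Metropolis-Hastings step mentioned above can be sketched in a few lines: each coordinate of X is updated in turn with a one-dimensional random-walk proposal while the others are held fixed. The target (a correlated 2-D Gaussian), step size, and iteration counts are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: 2-D Gaussian with correlation 0.8 (an unnormalized
# log-density is all the sampler needs).
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)

def log_target(x):
    return -0.5 * x @ prec @ x

# Component-wise MH: propose a change to one coordinate at a time,
# accept with probability min(1, pi(x') / pi(x)).
x = np.zeros(2)
samples = []
for t in range(20000):
    for d in range(2):
        prop = x.copy()
        prop[d] += rng.normal(0, 1.0)
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
    samples.append(x)
samples = np.asarray(samples)[5000:]   # discard burn-in

print("empirical covariance:\n", np.round(np.cov(samples.T), 2))
```

Note that the proposal never changes the dimension of X; handling an unknown number of clusters requires the trans-dimensional moves discussed next.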
Reversible Jump MCMC (RJMCMC)
RJMCMC [1] adds a trans-dimensional jump proposal to each iteration of any parametric method, hence allowing sampling of the model order!
Algorithm 1: Reversible-Jump MCMC
  Initial state x^(0)_{m_0}
  for t = 1, 2, ...
    Generate x^(t)_m from x^(t-1)_m using componentwise MH, Gibbs, etc.
    Propose new model M_n with probability π_{mn}
    Sample auxiliary variables u_{mn} ~ φ_{mn}(u)
    Set (x_n, v_{nm}) = T_{mn}(x^(t)_m, u_{mn}) via transform T_{mn}: M_m → M_n
    Set x^(t)_n = x_n with acceptance probability
      min( 1, [π(n, x_n) π_{nm} φ_{nm}(v_{nm})] / [π(m, x^(t)_m) π_{mn} φ_{mn}(u_{mn})] · |∂T_{mn}(x_m, u_{mn}) / ∂(x_m, u_{mn})| )
    otherwise keep x^(t)_m (i.e., remain in model M_m).
[1] Green (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.
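A minimal toy sketch (again, not from the talk) of the jump step: two models for Gaussian data, M1 fixing the mean at zero (no parameters) and M2 with an unknown mean μ. The M1→M2 jump samples u ~ φ(u) and sets μ = u, so the transform has unit Jacobian and the symmetric model-proposal probabilities cancel. The data, prior variance, and proposal scales are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data drawn from N(1, 1), so M2 (unknown mean) should dominate.
y = rng.normal(1.0, 1.0, size=50)
tau2 = 4.0       # prior variance of mu under M2
sigma_u = 2.0    # std of the auxiliary proposal phi(u) for the M1 -> M2 jump

def log_norm(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_joint(m, mu):
    # log pi(m, x_m): likelihood times prior, equal prior weight on models
    if m == 1:
        return np.sum(log_norm(y, 0.0, 1.0))
    return np.sum(log_norm(y, mu, 1.0)) + log_norm(mu, 0.0, tau2)

m, mu = 1, 0.0
visits = {1: 0, 2: 0}
for t in range(20000):
    # within-model move: random-walk MH on mu (only exists under M2)
    if m == 2:
        prop = mu + rng.normal(0, 0.3)
        if np.log(rng.uniform()) < log_joint(2, prop) - log_joint(2, mu):
            mu = prop
    # trans-dimensional jump (dimension matching, Jacobian = 1)
    if m == 1:
        u = rng.normal(0, sigma_u)
        log_a = log_joint(2, u) - log_joint(1, None) - log_norm(u, 0.0, sigma_u**2)
        if np.log(rng.uniform()) < log_a:
            m, mu = 2, u
    else:
        log_a = log_joint(1, None) + log_norm(mu, 0.0, sigma_u**2) - log_joint(2, mu)
        if np.log(rng.uniform()) < log_a:
            m, mu = 1, 0.0
    visits[m] += 1

p_m2 = visits[2] / sum(visits.values())
print(f"estimated P(M2 | y) = {p_m2:.3f}")
```

The chain mixes over both the model indicator and the within-model parameters, which is exactly what sampling π(m, x_m) requires.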
Grid-based events
Divide the survey area into cells.
Sources are indicated as ×'s, with 1σ contours shown as dashed lines.
Average over the set of samples: compute the probability that each cell contains at least one source by integrating the source pdf over the cell.
[Grid figure: cells indexed x_0..x_8 and y_0..y_8; example cell (3,5) and region A highlighted.]
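The averaging step above can be sketched as a Monte Carlo estimate: given posterior samples of source locations, the per-cell occupancy probability is the fraction of samples in which at least one source lands in the cell. The grid size and the synthetic "posterior samples" here are assumptions standing in for the output of the actual sampler.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in posterior samples of source locations on an 8x8 survey grid:
# each sample is an (n_sources, 2) array of (x, y) positions in [0, 8)^2.
samples = [rng.uniform(0, 8, size=(rng.integers(1, 4), 2))
           for _ in range(2000)]

# P(cell contains >= 1 source) = average over samples of the indicator
# that some source falls in the cell (Monte Carlo integral of the
# source pdf over the cell).
counts = np.zeros((8, 8))
for src in samples:
    cells = np.floor(src).astype(int)        # (col, row) index per source
    for cx, cy in set(map(tuple, cells)):    # count each cell once per sample
        counts[cy, cx] += 1
probs = counts / len(samples)

print("max cell occupancy probability:", probs.max())
```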
MCMC Inference
[Photo: what an MCMC practitioner might look like.]
Generative models can be used to compute various expectations and marginal event probabilities. Markov chain Monte Carlo (MCMC) methods are a way to do this (asymptotically) exactly, akin to flipping a coin and counting. They are suitable for large and complex models, i.e., the quality of the estimates does not depend on dimensionality or dependency structure. The challenge is to obtain sufficiently many independent flips (from the correct distribution).
MCMC Inference
Definition (Detailed Balance): Let π(z) denote the target distribution. If an ergodic Markov chain is constructed with a transition distribution q(ẑ | z) that satisfies π(z) q(ẑ | z) = π(ẑ) q(z | ẑ), then the chain is said to satisfy the detailed balance condition and π(z) is guaranteed to be the unique stationary distribution of the chain.
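The detailed balance condition can be verified numerically on a small discrete example (an illustration, not part of the talk): build a Metropolis kernel for a target π on four states and check both π(z) q(ẑ | z) = π(ẑ) q(z | ẑ) and the implied stationarity πQ = π. The target and proposal here are arbitrary choices for the demonstration.

```python
import numpy as np

# Target distribution on a small discrete state space.
pi = np.array([0.1, 0.2, 0.3, 0.4])
K = len(pi)

# Symmetric proposal: pick any other state uniformly at random.
prop = (np.ones((K, K)) - np.eye(K)) / (K - 1)

# Metropolis acceptance min(1, pi_j / pi_i) gives the off-diagonal
# transition probabilities; rejected mass stays on the diagonal.
Q = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        if i != j:
            Q[i, j] = prop[i, j] * min(1.0, pi[j] / pi[i])
    Q[i, i] = 1.0 - Q[i].sum()

# Detailed balance: the probability flow matrix pi(z) q(z_hat | z)
# must be symmetric.
flow = pi[:, None] * Q
print("detailed balance holds:", np.allclose(flow, flow.T))

# Detailed balance implies pi is stationary: pi Q = pi.
print("pi is stationary:", np.allclose(pi @ Q, pi))
```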
Parallel Sampling in DP Mixture Models Chang and Fisher III [2013]
Key Ideas
Composition of non-ergodic restricted Gibbs iterations.
Points in different super-clusters (groups of clusters) can be sampled in parallel.
Splits proposed via sub-cluster assignments in constant time and in parallel.
Yields an ergodic Markov chain that satisfies detailed balance.
Significantly faster convergence in experiments with large datasets.
[Graphical models: the standard DPMM (α, π, z_i, x_i, θ_k, λ) and the model augmented with sub- and super-clusters.]
DP Sampling Properties Chang and Fisher III [2013]
[Comparison table of samplers — CW, [7, 8], [4, 14], [3, 6, 9], [10], [11, 18], [1] — against the properties: exact model, splits & merges, intra-cluster parallelizable², inter-cluster parallelizable, non-conjugate priors.]
[Plot: log likelihood vs. computation time for real data. All parallel algorithms use 16 cores.]
² Intra-cluster parallelization has not been a significant factor. Decentralized inference may show different behavior.
Parallel Sampling in HDP Mixture Models Chang and Fisher III [2014]
The extension of Chang and Fisher III [2013] to HDPs is not straightforward.
The notion of sub-clusters remains.
Complexity arises from additional latent variables and overlapping distributions, necessitating some bookkeeping.
Split/merge moves are modified from the DP case.
Empirical results indicate that held-out log-likelihood (aka perplexity) can be a poor indicator of convergence.
[Graphical models: the HDP model (γ, α, β, π_d, z_di, x_di, θ_k, λ over D documents of N_d observations) and the augmented HDP model.]
HDP Sampling Properties Chang and Fisher III [2014]
[Comparison table of samplers — CRF [15], DA [15], SAMS [16], FSD [5], HW [13], SC [17], [2] — against the properties: infinite model, MCMC guarantees, non-conjugate priors, parallelizable, local splits/merges, global splits/merges³.]
³ Potentially with adaptation of the DP Metropolis-Hastings framework of Neal [2000].
[Plots: results on (a) Enron emails and (b) NYTimes articles for 1 and 50 initial topics.]
Source visibility
Single source of specified emission rate s, marginalizing over wind fields, aircraft path, and half-width.
[Plot: detection probability, conditioned on source rate, versus false-alarm probability.]
We can reliably detect sources with an emission rate of at least 0.02 m³/s.
Video example of contextual modeling for scene understanding
References I
J. Chang and J. W. Fisher III. Parallel sampling of DP mixture models using sub-cluster splits. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 620–628, 2013. URL http://papers.nips.cc/paper/5162-parallel-sampling-of-dp-mixture-models-using-sub-cluster-splits.pdf.
J. Chang and J. W. Fisher III. Parallel sampling of HDPs using sub-cluster splits. In Advances in Neural Information Processing Systems 27, 2014.
D. B. Dahl. An improved merge-split sampler for conjugate Dirichlet process mixture models. Technical report, University of Wisconsin–Madison Dept. of Statistics, 2003.
S. Favaro and Y. W. Teh. MCMC for normalized random measure mixture models. Statistical Science, 2013.
E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. An HDP-HMM for systems with state persistence. In International Conference on Machine Learning, July 2008.
P. J. Green and S. Richardson. Modelling heterogeneity with and without the Dirichlet process. Scandinavian Journal of Statistics, pages 355–375, 2001.
References II
H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:161–173, 2001.
H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Canadian Journal of Statistics, 30:269–283, 2002.
S. Jain and R. Neal. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13:158–182, 2004.
S. Jain and R. Neal. Splitting and merging components of a nonconjugate Dirichlet process mixture model. Bayesian Analysis, 2(3):445–472, 2007.
D. Lovell, R. P. Adams, and V. K. Mansingka. Parallel Markov chain Monte Carlo for Dirichlet process mixtures. In Workshop on Big Learning, NIPS, 2012.
R. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, June 2000.
D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed algorithms for topic models. The Journal of Machine Learning Research, 10:1801–1828, Dec. 2009. ISSN 1532-4435.
References III
O. Papaspiliopoulos and G. O. Roberts. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95(1):169–186, 2008.
Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
C. Wang and D. Blei. A split-merge MCMC algorithm for the Hierarchical Dirichlet process. arXiv:1207.1657 [stat.ML], 2012.
S. Williamson, A. Dubey, and E. P. Xing. Parallel Markov chain Monte Carlo for nonparametric mixture models. In ICML, 2013a.
S. A. Williamson, A. Dubey, and E. P. Xing. Parallel Markov chain Monte Carlo for nonparametric mixture models. In ICML, 2013b.