Efficient Information Planning in Graphical Models

Efficient Information Planning in Graphical Models: computational complexity considerations
John Fisher & Giorgos Papachristoudis, MIT
VITALITE Annual Review 2013, September 9, 2013

Information Fusion
(Figure slide: illustrations of information fusion and distributed information fusion; no further text.)

Established Results: Key Ideas

1. A broad class of information measures, the f-divergences, is fundamentally linked to bounds on risk. Bartlett et al. [2003], Nguyen et al. [2009]
   f-divergence → φ-risk → bound on excess risk

2. Submodularity as applied to information measures is a key enabler. Krause and Guestrin [2005], Williams et al. [2007], Papachristoudis and Fisher III [2012]
   - off-line and on-line performance bounds
   - guarantees on tractable planning methods
   - incorporation of inhomogeneous resource constraints

3. Submodular properties are intimately related to the structure of graphical models. Williams et al. [2007]
   - local properties (and computations) yield global properties

Key Ideas

Information planning is posed as a combinatorial selection problem over sequential consideration of groups of measurements.

1. Bounds apply to all sequences (visit walks).
2. Information rewards vary across walks.
3. Evaluating multiple walks leads to increased information rewards, with diminishing probability.
4. Evaluating multiple walks also leads to a tighter upper bound, again with reduced probability.

Some Context: Distributed Sensing

(Figure: a network of latent variables x_0, x_k, x_n observed through measurements z_1, z_2, ..., z_{N_s}.)

Computational Hurdles
- Evaluating information measures for complex sensors induces a computational bottleneck.
- Evaluating information measures for simple sensors and complex graphs (or even simple graphs) induces a computational bottleneck.
- Because of the branching structure (i.e., the dependence on prior sensor actions), optimal plans have exponential complexity and are intractable.

Inference versus Information: VoI and Information Gain

A latent variable x is observed through measurements z_1, ..., z_k, ..., z_N that are conditionally independent given x.

Bayesian inference (recursive update):
p(x \mid z_1, \dots, z_k) = p(x)\, \frac{p(z_1 \mid x)}{p(z_1)}\, \frac{p(z_2 \mid x)}{p(z_2 \mid z_1)} \cdots \frac{p(z_k \mid x)}{p(z_k \mid z_1, \dots, z_{k-1})}

Information gain (each new measurement contributes its complementary information minus its common information with earlier measurements):
I(x; z_1, \dots, z_k) = I(x; z_1) + \overbrace{I(x; z_2)}^{\text{complementary}} - \underbrace{I(z_1; z_2)}_{\text{common}} + \cdots + I(x; z_k) - I(z_k; z_1, \dots, z_{k-1})
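The decomposition above can be checked numerically. Below is a minimal NumPy sketch (not from the talk) for a toy jointly Gaussian model in which two measurements z_1, z_2 are conditionally independent given a scalar x; the function gaussian_mi and the covariance values are illustrative assumptions.

```python
import numpy as np

def gaussian_mi(cov, idx_a, idx_b):
    """Mutual information I(a; b) in nats for jointly Gaussian variables,
    computed from sub-blocks of the joint covariance matrix."""
    a, b, ab = list(idx_a), list(idx_b), list(idx_a) + list(idx_b)
    det = lambda S: np.linalg.det(np.atleast_2d(S))
    return 0.5 * np.log(det(cov[np.ix_(a, a)]) * det(cov[np.ix_(b, b)])
                        / det(cov[np.ix_(ab, ab)]))

# Toy model (assumed): x ~ N(0, 1), z_t = x + noise_t with independent noises,
# so z_1 and z_2 are conditionally independent given x.
r1, r2 = 0.5, 2.0
cov = np.array([[1.0, 1.0,      1.0     ],   # variable order: x, z1, z2
                [1.0, 1.0 + r1, 1.0     ],
                [1.0, 1.0,      1.0 + r2]])

lhs = gaussian_mi(cov, [0], [1, 2])          # I(x; z1, z2)
rhs = (gaussian_mi(cov, [0], [1])            # I(x; z1)
       + gaussian_mi(cov, [0], [2])          # I(x; z2)   (complementary)
       - gaussian_mi(cov, [1], [2]))         # I(z1; z2)  (common)
print(lhs, rhs)                              # the two sides agree
```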

VoI: Submodularity

Given a set V, a real-valued function f on 2^V is submodular if
f(A) + f(B) \ge f(A \cup B) + f(A \cap B), \quad \forall A, B \subseteq V.

Define the set increment function as \rho_S(j) \triangleq f(S \cup \{j\}) - f(S). Equivalently, a real-valued function is submodular if
\rho_A(j) \ge \rho_B(j), \quad \forall A \subseteq B \subseteq V \text{ and } j \notin B,
that is, the incremental value of j is greater relative to A than to any B which contains A. Submodularity captures the notion of diminishing returns.
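As a concrete illustration (not from the talk), the sketch below checks the diminishing-returns characterization exhaustively for a small set-coverage function, a standard example of a monotone submodular reward; the coverage sets are made up for this example.

```python
from itertools import combinations

coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a", "e"}}
V = set(coverage)

def f(S):
    """Coverage reward: number of distinct elements covered by S."""
    covered = set()
    for j in S:
        covered |= coverage[j]
    return len(covered)

def rho(S, j):
    """Set increment rho_S(j) = f(S ∪ {j}) - f(S)."""
    return f(S | {j}) - f(S)

# Diminishing returns: rho_A(j) >= rho_B(j) for all A ⊆ B ⊆ V and j ∉ B.
ok = all(rho(A, j) >= rho(B, j)
         for r in range(len(V) + 1)
         for B in map(set, combinations(V, r))
         for s in range(len(B) + 1)
         for A in map(set, combinations(sorted(B), s))
         for j in V - B)
print("diminishing returns holds:", ok)      # True for this coverage function
```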

VoI: Monotonicity and Greedy Selection

Monotonicity: a real-valued f is monotone if
f(A) \le f(B), \quad \forall A \subseteq B, \quad \text{or equivalently} \quad \rho_S(j) \ge 0, \quad \forall j \in V,\ S \subseteq V.

Greedy selection:
- Batch setting: g_j = \arg\max_{u \in V \setminus G_{j-1}} \rho_{G_{j-1}}(u)
- Sequential setting: g_j = \arg\max_{u \in V_{w_j} \setminus G_{j-1}} \rho_{G_{j-1}}(u)

The batch setting chooses from among all measurements, conditioned on previous selections. The sequential setting is restricted to only those measurements available at the current node in the visit walk.
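The difference between the two settings is easy to see in code. Here is a minimal sketch (not the authors' implementation) of batch versus sequential greedy selection, reusing a toy coverage reward; the observation sets and walk are hypothetical.

```python
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a", "e"}}

def f(S):
    """Toy monotone submodular reward: number of elements covered."""
    covered = set()
    for j in S:
        covered |= coverage[j]
    return len(covered)

def greedy(f, V, k, walk=None, obs_sets=None):
    """Batch setting (walk=None): candidates at every step are all of V minus G.
    Sequential setting: at step j only obs_sets[walk[j]] minus G is available."""
    G = []
    for j in range(k):
        cands = (set(V) if walk is None else set(obs_sets[walk[j]])) - set(G)
        if not cands:
            continue
        g = max(cands, key=lambda u: f(G + [u]) - f(G))   # argmax of rho_G(u)
        G.append(g)
    return G, f(G)

V = list(coverage)
obs_sets = {1: [1, 2], 2: [3, 4]}                # hypothetical observation sets V_1, V_2
print(greedy(f, V, k=2))                          # batch greedy over all of V
print(greedy(f, V, k=2, walk=[1, 2], obs_sets=obs_sets))  # greedy along a visit walk
```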

Preliminaries: VoI Notation

- X = {X_1, ..., X_n} denotes the n latent inference variables.
- Z = {Z_1, ..., Z_n} denotes the n measurement vectors. Each Z_t is comprised of N_t measurements corresponding to variable X_t.
- V_t = {1, ..., N_t} indicates the measurement indices, i.e., the observation sets.
- Z_i \perp Z_j \mid X: measurements are independent given X.
- Reward function f : 2^V \to \mathbb{R}: a set function that captures the value of sensing actions.
- Cost function c : 2^V \to \mathbb{R}_+: a nonnegative set function that quantifies the cost of a subset, with costs assumed additive over its elements: c(S) = \sum_{j \in S} c_j.

Sequential Setting (VoI)

Goal: choose k_1 measurements from V_1, ..., k_n measurements from V_n:
O = \arg\max_{|S_1| \le k_1, \dots, |S_n| \le k_n} f(S), \quad \text{where } S = \bigcup_{t=1}^{n} S_t \text{ and } S_i \cap S_j = \emptyset,\ i \ne j.

(Figure: a chain of hidden variables X_1, ..., X_T, each X_t with candidate measurements Z_t^1, ..., Z_t^{N_t}; there are N_t measurements for each hidden variable X_t.)

Visit walk: define the M-length visit walk as the order {w_1, ..., w_M} in which we visit the observation sets V_t during a selection process.
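For intuition about why this selection problem is combinatorial, the sketch below (toy data, assumed for illustration and not from the talk) enumerates every feasible selection S for a tiny instance; the number of feasible sets grows as the product of binomial coefficients over the observation sets.

```python
from itertools import combinations, product
from math import comb

# Toy instance (assumed): two observation sets with per-set budgets.
V = {1: ["z1_1", "z1_2", "z1_3"], 2: ["z2_1", "z2_2", "z2_3"]}   # V_1, V_2
k = {1: 1, 2: 2}                                                  # k_1, k_2

info = {"z1_1": {"a"}, "z1_2": {"a", "b"}, "z1_3": {"c"},
        "z2_1": {"b", "c"}, "z2_2": {"d"}, "z2_3": {"a", "d"}}

def f(S):
    """Toy submodular reward: number of distinct items covered by S."""
    return len(set().union(*(info[u] for u in S))) if S else 0

n_feasible = 1
for t in V:
    n_feasible *= comb(len(V[t]), k[t])      # product of C(N_t, k_t)
print("feasible selections:", n_feasible)    # grows quickly with n, N_t, k_t

best = max(product(*(combinations(V[t], k[t]) for t in V)),
           key=lambda parts: f([u for part in parts for u in part]))
print("optimum O:", [u for part in best for u in part])
```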

Sequential Setting (VoI), continued

(Figure: the same selection problem, now annotated with a visit walk w_j, w_{j+1}, ..., w_M over the observation sets at steps t = 1, 2, ..., T.)

- Analysis is specialized to Markov chains and LQG models.
- It extends to trees and polytrees.

Gaussian Markov Chains (VoI)

We consider Gaussian Markov chains for convenience in derivations. The underlying dynamical system is
X_k = A_{k-1} X_{k-1} + V_{k-1}
Y_k = C_k X_k + W_k,
where X_0 \sim \mathcal{N}(x_0, \Sigma_0), V_{k-1} \sim \mathcal{N}(0, Q_{k-1}), and W_k \sim \mathcal{N}(0, R_k). (A Markov chain is shown in the upper-right figure.) Results can be generalized to trees and polytrees.
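For this linear-Gaussian model, the information gain of a measurement follows from the Kalman covariance recursion: I(X_k; Y_k | Y_{1:k-1}) = 0.5 (log det P_{k|k-1} - log det P_{k|k}). The sketch below uses toy parameters assumed for illustration (in the planning setting only selected measurements would be incorporated) and computes these gains along a short chain.

```python
import numpy as np

# Toy constant-velocity model (assumed): A_k, C_k, Q_k, R_k are time-invariant.
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition A_k
C = np.array([[1.0, 0.0]])               # observe position only, C_k
Q = 0.05 * np.eye(2)                     # process noise covariance Q_k
R = np.array([[0.5]])                    # measurement noise covariance R_k
P = np.eye(2)                            # prior covariance Sigma_0

logdet = lambda M: np.linalg.slogdet(M)[1]
for k in range(1, 6):
    P_pred = A @ P @ A.T + Q                         # predict: P_{k|k-1}
    S = C @ P_pred @ C.T + R                         # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)              # Kalman gain
    P = P_pred - K @ C @ P_pred                      # update: P_{k|k}
    ig = 0.5 * (logdet(P_pred) - logdet(P))          # I(X_k; Y_k | Y_{1:k-1}) in nats
    print(f"step {k}: information gain {ig:.3f}")
```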

VoI: Sparsity

- Usually, measurements are obtained from a small subset of the underlying process.
- A hidden variable depends only on a restricted set of hidden variables at the previous time point.

(Figure: sparse dependence structure between the hidden variables at t = k and t = k + 1.)

Empirical Results (VoI)

(Figure: information gain (IG) as a function of complexity, measured in number of messages from 0 to 1000. Curves compare random walks, segmented walks of lengths 2 through 5, the forward walk, the worst-complexity walk, and the maximum-IG walk; IG values range roughly from 13.8 to 14.8.)

References

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 2003.

A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In Uncertainty in Artificial Intelligence, July 2005.

X. Nguyen, M. J. Wainwright, and M. I. Jordan. On surrogate loss functions and f-divergences. Annals of Statistics, 2009.

G. Papachristoudis and J. W. Fisher III. Theoretical guarantees on penalized information gathering. In Proc. IEEE Workshop on Statistical Signal Processing, August 2012. URL publications/papers/papachristoudis12sspworkshop.pdf. **

J. L. Williams, J. W. Fisher III, and A. S. Willsky. Performance guarantees for information theoretic active inference. In M. Meila and X. Shen, editors, Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pages 616-623, March 2007. URL publications/papers/wilfis07aistats.pdf. **

** Outgrowth of supervised student research.