Introduction to Artificial Intelligence

Introduction to Artificial Intelligence. Lecture 13: Approximate Inference. CS/CNS/EE 154, Andreas Krause

Bayesian networks
- Compact representation of distributions over a large number of variables
- (Often) allows efficient exact inference (computing marginals, etc.)
- Example: HailFinder, 56 variables with ~3 states each (JavaBayes applet); the full joint distribution has ~10^26 terms, which would take > 10,000 years on top supercomputers

Typical queries: Conditional distribution
- Compute distribution of some variables given values for others
[Figure: burglary network with nodes E, B, A, J, M]

Typical queries: Maximization
- MPE (most probable explanation): given values for some variables, compute the most likely assignment to all remaining variables
- MAP (maximum a posteriori): compute the most likely assignment to some variables
[Figure: burglary network with nodes E, B, A, J, M]

Hardness of inference for general BNs
- Computing conditional distributions:
  - Exact solution: #P-complete
  - NP-hard to obtain any nontrivial approximation
- Maximization:
  - MPE: NP-complete
  - MAP: NP^PP-complete
- Inference in general BNs is really hard!

Inference
- Can exploit structure (conditional independence) to efficiently perform exact inference in many practical situations
- For BNs where exact inference is not possible, can use algorithms for approximate inference (later)

Variable elimination algorithm
- Given a BN and a query P(X | E = e)
- Choose an ordering of X_1, ..., X_n
- Set up initial factors: f_i = P(X_i | Pa_i)
- For i = 1:n, X_i ∉ {X, E}:
  - Collect all factors f that include X_i
  - Generate a new factor g by marginalizing out X_i
  - Add g to the set of factors
- Renormalize P(x, e) to get P(x | e)
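
A minimal sketch of this loop over discrete tabular factors. The Factor class and the multiply/marginalize helpers are illustrative names, not from the lecture; evidence would be handled by zeroing out table rows that disagree with it before eliminating.

```python
import itertools
from collections import defaultdict

class Factor:
    """A discrete factor: a scope (list of variable names) and a table
    mapping full assignments (tuples, in scope order) to nonnegative values."""
    def __init__(self, scope, table):
        self.scope, self.table = list(scope), dict(table)

def multiply(f, g):
    """Pointwise product over the union of the two scopes."""
    scope = f.scope + [v for v in g.scope if v not in f.scope]
    dom = defaultdict(set)                      # recover domains from the tables
    for fac in (f, g):
        for assn in fac.table:
            for var, val in zip(fac.scope, assn):
                dom[var].add(val)
    table = {}
    for assn in itertools.product(*(sorted(dom[v]) for v in scope)):
        a = dict(zip(scope, assn))
        table[assn] = (f.table[tuple(a[v] for v in f.scope)]
                       * g.table[tuple(a[v] for v in g.scope)])
    return Factor(scope, table)

def marginalize(f, var):
    """Sum out `var` from factor f."""
    i = f.scope.index(var)
    table = defaultdict(float)
    for assn, val in f.table.items():
        table[assn[:i] + assn[i+1:]] += val
    return Factor(f.scope[:i] + f.scope[i+1:], dict(table))

def variable_elimination(factors, order):
    """Eliminate the variables in `order` (all vars except query and evidence);
    return the normalized factor over the remaining variables."""
    factors = list(factors)
    for var in order:
        relevant = [f for f in factors if var in f.scope]
        factors = [f for f in factors if var not in f.scope]
        g = relevant[0]
        for f in relevant[1:]:
            g = multiply(g, f)
        factors.append(marginalize(g, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result.table.values())
    return Factor(result.scope, {a: v / z for a, v in result.table.items()})

# Tiny chain A -> B with hypothetical CPTs:
pA = Factor(['A'], {(0,): 0.6, (1,): 0.4})
pBgA = Factor(['A', 'B'], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(variable_elimination([pA, pBgA], order=['A']).table)
# {(0,): 0.62, (1,): 0.38}  i.e. P(B)
```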

Reusing computation
- Often, we want to compute conditional distributions of many variables, for fixed observations
- E.g., probability of Pits at different locations given observed Breezes
- Repeatedly performing variable elimination is wasteful (many factors are recomputed)
- Need the right data structure to avoid recomputation: message passing on factor graphs

Factor graphs
- P(C,D,G,I,S,L) = P(C) P(I) P(D|C) P(G|D,I) P(S|I,G) P(L|S)
[Figure: factor graph with variable nodes C, D, G, I, S, L and factor nodes f_1 (scope C,D), f_2 (D,I,G), f_3 (I,G,S), f_4 (S,L)]

Factor graph
- A factor graph for a Bayesian network is a bipartite graph consisting of variable nodes and factor nodes
- Each factor is associated with a subset of variables, and all CPDs of the Bayesian network have to be assigned to one of the factor nodes
[Figure: variable nodes C, D, G, I, S, L and factor nodes f_1 (C,D), f_2 (D,I,G), f_3 (I,G,S), f_4 (S,L)]

Sum-product message passing on factor graphs
- Messages from node v to factor u: µ_{v→u}(x_v) = ∏_{u' ∈ N(v)\{u}} µ_{u'→v}(x_v)
- Messages from factor u to node v: µ_{u→v}(x_v) = Σ_{x_{N(u)\{v}}} f_u(x_{N(u)}) ∏_{v' ∈ N(u)\{v}} µ_{v'→u}(x_{v'})
[Figure: factor graph with factors f_1 (C,D), f_2 (D,I,G), f_3 (I,G,S), f_4 (S,L)]

Example messages
[Figure: chain factor graph A - f_1 - B - f_2 - C with f_1(A,B) = P(A) P(B|A) and f_2(B,C) = P(C|B)]

Belief propagation on polytrees
- Belief propagation (a.k.a. sum-product) is exact for polytree Bayesian networks
- The factor graph of a polytree is a tree
- Choose one node as root; send messages from leaves to root, and from root to leaves
- After convergence, the marginal of each variable is the normalized product of its incoming messages
- Thus: we immediately have correct values for all marginals!
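
As a concrete illustration, here is a minimal sketch of the two passes on the chain A - f_1 - B - f_2 - C from the "Example messages" slide. All CPT numbers are invented for illustration.

```python
import numpy as np

# Hypothetical binary CPTs for the chain A -> B -> C.
pA   = np.array([0.6, 0.4])                    # P(A)
pBgA = np.array([[0.9, 0.1], [0.2, 0.8]])      # P(B|A), rows indexed by A
pCgB = np.array([[0.7, 0.3], [0.5, 0.5]])      # P(C|B), rows indexed by B

f1 = pA[:, None] * pBgA                        # f1(a,b) = P(a) P(b|a)
f2 = pCgB                                      # f2(b,c) = P(c|b)

# Forward pass (root = C): leaf variable A sends the all-ones message.
mu_A_to_f1 = np.ones(2)
mu_f1_to_B = (f1 * mu_A_to_f1[:, None]).sum(axis=0)   # sum over a
mu_B_to_f2 = mu_f1_to_B
mu_f2_to_C = (f2 * mu_B_to_f2[:, None]).sum(axis=0)   # sum over b

# Backward pass (from root C back toward A).
mu_C_to_f2 = np.ones(2)
mu_f2_to_B = (f2 * mu_C_to_f2[None, :]).sum(axis=1)   # sum over c
mu_B_to_f1 = mu_f2_to_B
mu_f1_to_A = (f1 * mu_B_to_f1[None, :]).sum(axis=1)   # sum over b

# Marginal = normalized product of incoming messages.
pB = mu_f1_to_B * mu_f2_to_B
print("P(B) =", pB / pB.sum())     # matches pA @ pBgA = [0.62, 0.38]
```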

What if we have loops?
- Can still apply belief propagation even if we have loops: just run it, close your eyes, and hope for the best!
- Use it as an approximation: in general it will not converge, and even if it converges, it may converge to incorrect marginals
- However, in practice it is often still useful! E.g., turbo codes, etc.
- This is loopy belief propagation
[Figure: loopy network with nodes C, D, I, G, S, L]

Behavior of loopy BP
[Figure: P(X_1 = 1) as a function of iteration number for a four-variable loop X_1, ..., X_4, comparing the true posterior to the BP estimate]
- Loopy BP multiplies the same factors multiple times, so BP is often overconfident

Does loopy BP always converge?
- No! It can oscillate!
- Typically, the oscillation is more severe the more deterministic the potentials
(Graphs from K. Murphy, UAI '99)

What about MPE queries?
- E.g.: what is the most likely assignment to the unobserved variables, given the observed ones?
- Use max-product (same as sum-product/BP, but with max instead of sums!)
[Figure: burglary network with nodes E, B, A, J, M]

Max-product message passing on factor graphs
- Messages from nodes to factors: µ_{v→u}(x_v) = ∏_{u' ∈ N(v)\{u}} µ_{u'→v}(x_v)
- Messages from factors to nodes: µ_{u→v}(x_v) = max_{x_{N(u)\{v}}} f_u(x_{N(u)}) ∏_{v' ∈ N(u)\{v}} µ_{v'→u}(x_{v'})
[Figure: factor graph with factors f_1 (C,D), f_2 (D,I,G), f_3 (I,G,S), f_4 (S,L)]
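
On the same hypothetical chain as in the sum-product sketch, replacing sums with max and keeping argmax back-pointers recovers the MPE assignment (Viterbi-style):

```python
import numpy as np

# Hypothetical chain A -> B -> C, same invented CPTs as before.
pA   = np.array([0.6, 0.4])
pBgA = np.array([[0.9, 0.1], [0.2, 0.8]])
pCgB = np.array([[0.7, 0.3], [0.5, 0.5]])
f1 = pA[:, None] * pBgA
f2 = pCgB

# Max-product messages toward root C, with argmax back-pointers.
m_f1_to_B = f1.max(axis=0)                            # max over a, per b
back_a    = f1.argmax(axis=0)
m_f2_to_C = (f2 * m_f1_to_B[:, None]).max(axis=0)     # max over b, per c
back_b    = (f2 * m_f1_to_B[:, None]).argmax(axis=0)

# Decode by following the back-pointers from the root.
c_star = int(m_f2_to_C.argmax())
b_star = int(back_b[c_star])
a_star = int(back_a[b_star])
print("MPE assignment (a, b, c):", (a_star, b_star, c_star))   # (0, 0, 0)
```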

Sampling-based inference
- So far: deterministic inference techniques
  - Variable elimination
  - (Loopy) belief propagation
- Will now introduce stochastic approximations: algorithms that randomize to compute expectations
- In contrast to the deterministic methods, these are guaranteed to converge to the right answer (if we wait long enough...)
- More exact, but slower than the deterministic variants

Computing expectations
- Often, we are not necessarily interested in computing marginal distributions, but in certain expectations E_P[f(X)] = Σ_x P(x) f(x):
  - Moments (mean, variance, ...)
  - Event probabilities P(X ∈ A) = E_P[1{X ∈ A}]

Sample approximations of expectations
- x_1, ..., x_N samples from RV X
- Law of large numbers: (1/N) Σ_{i=1}^N f(x_i) → E_P[f(X)] as N → ∞
- Hereby, the convergence is with probability 1 (almost sure convergence)
- Finite samples: use the estimate Ê_N[f] = (1/N) Σ_{i=1}^N f(x_i)

How many samples do we need?
- Hoeffding inequality: suppose f is bounded in [0, C]. Then P(|Ê_N[f] - E_P[f]| > ε) ≤ 2 exp(-2Nε²/C²)
- Thus, the probability of error decreases exponentially in N!
- But: we need to be able to draw samples from P
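
A small numerical illustration of the bound, estimating the event probability P(X > 1) for a standard normal. All numbers here are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = 1{x > 1} is bounded in [0, C] with C = 1, so Hoeffding applies:
# N >= C^2 * log(2/delta) / (2 * eps^2) samples suffice to guarantee
# P(|estimate - truth| > eps) <= delta.
def f(x):
    return (x > 1.0).astype(float)

C, eps, delta = 1.0, 0.01, 0.05
N = int(np.ceil(C**2 * np.log(2 / delta) / (2 * eps**2)))
print("N =", N)                      # 18445 samples

x = rng.standard_normal(N)
print("estimate:", f(x).mean())      # true value: 1 - Phi(1) ~ 0.1587
```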

Sampling from a Bernoulli distribution
- Most random number generators produce (approximately) uniformly distributed random numbers
- How can we draw samples from X ~ Bernoulli(p)? Draw u ~ Uniform([0,1]) and output 1 if u < p, else 0
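
A one-line sketch of this transformation:

```python
import random

def sample_bernoulli(p, rng=random):
    """Turn a Uniform([0,1]) draw into a Bernoulli(p) sample."""
    return 1 if rng.random() < p else 0

print(sum(sample_bernoulli(0.3) for _ in range(10000)) / 10000)  # ~0.3
```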

Sampling from a Multinomial
- X ~ Mult([µ_1, ..., µ_k]) where µ_i = P(X = i); Σ_i µ_i = 1
- The function g: [0,1] → {1, ..., k} assigns state g(x) to each x by partitioning [0,1] into consecutive segments of lengths µ_1, ..., µ_k
- Draw a sample x from the uniform distribution on [0,1]
- Return g(x)
[Figure: the unit interval [0,1] split into segments of lengths µ_1, µ_2, µ_3, ..., µ_k]
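
A minimal sketch of this inverse-CDF construction (states indexed from 0 here rather than 1):

```python
import random
from bisect import bisect

def sample_multinomial(mu, rng=random):
    """Partition [0,1] into segments of length mu[i] and return the index
    of the segment that the uniform draw lands in."""
    cdf, total = [], 0.0
    for p in mu:
        total += p
        cdf.append(total)
    return bisect(cdf, rng.random())   # state in {0, ..., k-1}

counts = [0, 0, 0]
for _ in range(10000):
    counts[sample_multinomial([0.2, 0.5, 0.3])] += 1
print(counts)   # roughly [2000, 5000, 3000]
```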

Forward sampling from a BN

Monte Carlo sampling from a BN
- Sort the variables in a topological ordering X_1, ..., X_n
- For i = 1 to n: sample x_i ~ P(X_i | X_1 = x_1, ..., X_{i-1} = x_{i-1})
- Since parents precede children in the ordering, this is just sampling x_i ~ P(X_i | Pa_i = pa_i)
- Works even with "loopy" models!
[Figure: network with nodes C, D, I, G, S, L, H, J]
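
A minimal sketch on a hypothetical three-variable network A -> B, A -> C (topological order A, B, C; all CPT numbers invented):

```python
import random

def sample_bn(rng=random):
    """Forward-sample one joint assignment, parents before children."""
    a = int(rng.random() < 0.4)                    # P(A=1) = 0.4
    b = int(rng.random() < (0.8 if a else 0.1))    # P(B=1 | A=a)
    c = int(rng.random() < (0.6 if a else 0.3))    # P(C=1 | A=a)
    return {"A": a, "B": b, "C": c}

samples = [sample_bn() for _ in range(10000)]
print(sum(s["B"] for s in samples) / len(samples))
# ~ P(B=1) = 0.4*0.8 + 0.6*0.1 = 0.38
```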

Computing probabilities through sampling
- Want to estimate probabilities
- Draw N samples from the BN
- Marginals: P(x_A) ≈ Count(x_A) / N
- Conditionals: P(x_A | x_B) ≈ Count(x_A, x_B) / Count(x_B)
[Figure: network with nodes C, D, I, G, S, L, H, J]

Rejection sampling
- To estimate P(X_A | X_B = x_B): collect samples over all variables
- Throw away the samples that disagree with x_B
- Can be problematic if {X_B = x_B} is a rare event
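
A sketch of rejection sampling for P(A = 1 | B = 1), reusing the hypothetical sample_bn from the forward-sampling sketch above:

```python
# Keep only the forward samples consistent with the evidence B = 1.
kept = [s for s in (sample_bn() for _ in range(100000)) if s["B"] == 1]
print(len(kept), "samples kept out of 100000")      # ~38000, since P(B=1)=0.38
print(sum(s["A"] for s in kept) / len(kept))
# Exact: P(A=1 | B=1) = 0.4*0.8 / (0.4*0.8 + 0.6*0.1) = 0.32/0.38 ~ 0.842
```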

Sample complexity for probability estimates
- Absolute error: by Hoeffding, P(|p̂_N - p| > ε) ≤ 2 exp(-2Nε²)
- Relative error: by the multiplicative Chernoff bound, P(|p̂_N - p| > εp) ≤ 2 exp(-Npε²/3), so estimating a small probability p to fixed relative error requires N to grow like 1/p

Sampling from rare events
- Estimating conditional probabilities P(X_A | X_B = x_B) using rejection sampling is hard!
- The more observations we condition on, the less likely the event X_B = x_B becomes
- Want to directly sample from the posterior distribution!

Gibbs sampling
- Start with an initial assignment x^(0) to all variables
- For t = 1 to ∞:
  - Set x^(t) = x^(t-1)
  - For each variable X_i:
    - Set v_i = values of all variables in x^(t) except x_i
    - Sample x_i^(t) from P(X_i | v_i)
- For large enough t, the sampling distribution will be close to the true posterior distribution!
- Key challenge: computing the conditional distributions P(X_i | v_i)
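
A minimal sketch of a Gibbs sampler for P(A | B = 1) on the same hypothetical three-variable network used in the sampling sketches above. The full conditionals are computed exactly from the (invented) CPTs; B is clamped to its observed value.

```python
import random

pA    = [0.6, 0.4]          # P(A=a)
pB1gA = [0.1, 0.8]          # P(B=1 | A=a)
pC1gA = [0.3, 0.6]          # P(C=1 | A=a)

def cond_sample_A(c, rng=random):
    """Sample A from P(A | B=1, C=c) ∝ P(A) P(B=1|A) P(C=c|A)."""
    w = [pA[a] * pB1gA[a] * (pC1gA[a] if c else 1 - pC1gA[a]) for a in (0, 1)]
    return int(rng.random() < w[1] / (w[0] + w[1]))

def cond_sample_C(a, rng=random):
    """C depends only on A here, so its full conditional is just P(C|A)."""
    return int(rng.random() < pC1gA[a])

a, c = 0, 0                          # arbitrary initial assignment
burn_in, n_samples, tally = 1000, 20000, 0
for t in range(burn_in + n_samples):
    a = cond_sample_A(c)             # resample each unobserved variable in turn
    c = cond_sample_C(a)
    if t >= burn_in:
        tally += a
print(tally / n_samples)             # ~0.842, matching rejection sampling
```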