CSE 473: Artificial Intelligence. Hidden Markov Models. Bayes Nets.



CSE 473: Artificial Intelligence. Bayes Nets. Daniel Weld. [Most slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Hidden Markov Models. Two random variables at each time step: the hidden state X_i and the observation E_i, giving a chain X_1, X_2, X_3, X_4, ..., X_N with evidence E_1, E_2, E_3, E_4, E_5, ..., E_N. Conditional independences: the dynamics don't change over time, e.g. P(X_2 | X_1) = P(X_18 | X_17).

Example. An HMM is defined by: an initial distribution P(X_1), transitions P(X_t | X_t-1), and emissions P(E_t | X_t).

HMM Computations. Given the parameters and evidence E_1:n = e_1:n, the inference problems include: Filtering: find P(X_t | e_1:t) for all t. Smoothing: find P(X_t | e_1:n) for all t. Most probable explanation: find x*_1:n = argmax_{x_1:n} P(x_1:n | e_1:n).
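To make the three ingredients concrete, here is a minimal Python sketch; the two-state chain (rain/sun with umbrella observations) and all of the numbers are invented for illustration and are not taken from the slides.

# A minimal HMM sketch: initial distribution, transitions, emissions.
# States and probabilities are made-up illustration values.
initial = {"rain": 0.5, "sun": 0.5}                      # P(X_1)
transition = {                                           # P(X_t | X_{t-1})
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun":  {"rain": 0.3, "sun": 0.7},
}
emission = {                                             # P(E_t | X_t)
    "rain": {"umbrella": 0.9, "none": 0.1},
    "sun":  {"umbrella": 0.2, "none": 0.8},
}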

Base Cases of Inference (in the Forward Algorithm): the observation update (conditioning X_1 on E_1) and the passage-of-time update (moving from X_1 to X_2).

Particle Filtering: Representation. Represent P(X) with a list of N particles (samples); generally N << |X|. E.g. with the particles (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3), P(ghost at (3,3)) = 5/10 = 0.5.
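A sketch of the representation in Python: the belief assigned to a state is just the fraction of particles at that state (the ten particles below are the ones from the slide; the helper name is my own).

from collections import Counter

particles = [(3,3), (2,3), (3,3), (3,2), (3,3), (3,2), (1,2), (3,3), (3,3), (2,3)]

def belief(particles, state):
    # P(state) is approximated by the fraction of particles equal to state.
    return Counter(particles)[state] / len(particles)

print(belief(particles, (3, 3)))   # 5/10 = 0.5, matching the slide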

Particle Filtering: Summary. Particles track samples of states rather than an explicit distribution. Each update has three steps: elapse time, weight by the evidence, and resample (a sketch of one such update appears below). Example: starting particles (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3); after elapsing time: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2); after weighting: (3,2) w=.9, (2,3) w=.2, (3,2) w=.9, (3,1) w=.4, (3,3) w=.4, (3,2) w=.9, (1,3) w=.1, (2,3) w=.2, (3,2) w=.9, (2,2) w=.4; (new) particles after resampling: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2). [Demos: ghostbusters particle filtering (L15D3,4,5)]

Which Algorithm? Particle filter, uniform initial beliefs, 25 particles.
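A minimal sketch of one particle-filter update (elapse, weight, resample). The functions sample_transition and evidence_likelihood stand in for the HMM's transition and emission models; they are assumptions for illustration, not course code.

import random

def particle_filter_step(particles, evidence, sample_transition, evidence_likelihood):
    # Elapse time: sample a successor state for each particle.
    moved = [sample_transition(x) for x in particles]
    # Weight: score each particle by the likelihood of the observed evidence.
    weights = [evidence_likelihood(evidence, x) for x in moved]
    # Resample: draw N new particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))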

Which Algorithm? Particle filter, uniform initial beliefs, 300 particles. Which Algorithm? Exact filter, uniform initial beliefs.

Complexity of the Forward Algorithm? We are given evidence at each time step and want the belief P(X_t | e_1:t). We use the single (time-passage + observation) updates at each step. If we only need P(x | e) at the end, we only need to normalize there. Complexity: O(|X|^2) time and O(|X|) space. But |X| is exponential in the number of state variables!

Why Does |X| Grow? 1 ghost: k (e.g. 9) possible positions in the maze. 2 ghosts: k^2 combinations. N ghosts: k^N combinations.
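A sketch of the forward algorithm using the dict-of-dicts tables from the earlier HMM sketch (that representation is my assumption). The nested loop over states in the time-passage step is where the O(|X|^2) cost per update comes from.

def forward(initial, transition, emission, observations):
    # Base case: condition the initial distribution on the first observation.
    belief = {x: initial[x] * emission[x][observations[0]] for x in initial}
    for e in observations[1:]:
        # Passage of time: sum over previous states (the O(|X|^2) step).
        predicted = {x: sum(belief[xp] * transition[xp][x] for xp in belief)
                     for x in belief}
        # Observation: multiply in the evidence likelihood.
        belief = {x: predicted[x] * emission[x][e] for x in belief}
    z = sum(belief.values())                # normalize only at the end
    return {x: p / z for x, p in belief.items()}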

Joint Distribution for a Snapshot of the World. It gets big. Example: a joint table over three Boolean variables P, Q, R, one entry per assignment:
  P=T Q=T R=T  0.1
  P=T Q=T R=F  0.05
  P=T Q=F R=T  0.2
  P=T Q=F R=F  0.07
  P=F Q=T R=T  0.03
  P=F Q=T R=F  0.05
  P=F Q=F R=T  0.1
  P=F Q=F R=F  0.3

The Sword of Conditional Independence! Slay the Basilisk ("I am a BIG joint distribution!"). Conditional independence means the big joint factors into a product of smaller local distributions, e.g. P(X, Y | Z) = P(X | Z) P(Y | Z), or, equivalently, P(X | Y, Z) = P(X | Z).

HMM Conditional Independence. HMMs have two important independence properties: (1) Markov hidden process: the future depends on the past only via the present. (2) The current observation is independent of everything else given the current state. (Structure: X_1 -> X_2 -> X_3 -> X_4 -> ..., with each E_t emitted from its X_t.)

Conditional Independence in a Snapshot. Can we do something here? Factor the state X_3 into a product of (conditionally) independent random variables? Maybe also factor the evidence E_3? Yes! With Bayes nets.

Dynamic Bayes Nets (DBNs). We want to track multiple variables over time, using multiple sources of evidence. Idea: repeat a fixed Bayes net structure at each time step; variables from time t can condition on those from t-1. Example over t = 1, 2, 3: ghost positions G_t^a and G_t^b at each step, each with its own evidence node E_t^a and E_t^b. Dynamic Bayes nets are a generalization of HMMs. [Demo: pacman sonar ghost DBN model (L15D6)]

DBN Particle Filters. A particle is a complete sample for a time step. Initialize: generate prior samples for the t=1 Bayes net; example particle: G_1^a = (3,3), G_1^b = (5,3). Elapse time: sample a successor for each particle; example successor: G_2^a = (2,3), G_2^b = (6,3). Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample; likelihood: P(E_1^a | G_1^a) * P(E_1^b | G_1^b). Resample: select prior samples (tuples of values) in proportion to their likelihood. (A sketch of one such update appears after the next slide.)

Probabilistic Models. Models describe how (a portion of) the world works. Models are always simplifications: they may not account for every variable, and may not account for all interactions between variables. "All models are wrong; but some are useful." George E. P. Box. What do we do with probabilistic models? We (or our agents) need to reason about unknown variables, given evidence. Example: explanation (diagnostic reasoning). Example: prediction (causal reasoning). Example: value of information.
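A sketch of one DBN particle-filter update for the two-ghost example above. Each particle is a tuple (g_a, g_b); sample_ghost_move and sensor_likelihood are assumed helper models standing in for the transition and sensor CPTs, not course code.

import random

def dbn_particle_step(particles, evidence_a, evidence_b,
                      sample_ghost_move, sensor_likelihood):
    # Elapse time: sample a successor position for each ghost in each particle.
    moved = [(sample_ghost_move(ga), sample_ghost_move(gb)) for ga, gb in particles]
    # Observe: weight each entire sample by P(E_a | G_a) * P(E_b | G_b).
    weights = [sensor_likelihood(evidence_a, ga) * sensor_likelihood(evidence_b, gb)
               for ga, gb in moved]
    # Resample: select samples in proportion to their likelihood.
    return random.choices(moved, weights=weights, k=len(moved))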

Independence. Two variables are independent if P(x, y) = P(x) P(y) for all x, y. This says that their joint distribution factors into a product of two simpler distributions. Another form: P(x | y) = P(x). We write X ⫫ Y. Independence is a simplifying modeling assumption: empirical joint distributions are at best close to independent. What could we assume for {Weather, Traffic, Cavity, Toothache}?

Example: Independence? Joint P(T, W):
  hot  sun  0.4
  hot  rain 0.1
  cold sun  0.2
  cold rain 0.3
Marginals: P(T): hot 0.5, cold 0.5. P(W): sun 0.6, rain 0.4. Product of the marginals P(T) P(W):
  hot  sun  0.3
  hot  rain 0.2
  cold sun  0.3
  cold rain 0.2
The joint does not match the product of the marginals, so T and W are not independent.

Example: Independence. N fair, independent coin flips: each flip has P(H) = 0.5, P(T) = 0.5.
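A quick check of the tables above, as a sketch assuming the joint is stored as a dict keyed by (t, w) pairs: compare every joint entry against the product of the marginals.

joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
p_t = {"hot": 0.5, "cold": 0.5}
p_w = {"sun": 0.6, "rain": 0.4}

independent = all(abs(joint[(t, w)] - p_t[t] * p_w[w]) < 1e-9 for (t, w) in joint)
print(independent)   # False: e.g. 0.4 != 0.5 * 0.6 = 0.3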

Conditional Independence. P(Toothache, Cavity, Catch). If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: P(+catch | +toothache, +cavity) = P(+catch | +cavity). The same independence holds if I don't have a cavity: P(+catch | +toothache, -cavity) = P(+catch | -cavity). Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity). Equivalent statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity), and P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity). One can be derived from the other easily.

Conditional Independence. Unconditional (absolute) independence is very rare (why?). Conditional independence is our most basic and robust form of knowledge about uncertain environments. X is conditionally independent of Y given Z if and only if P(x, y | z) = P(x | z) P(y | z) for all x, y, z, or, equivalently, if and only if P(x | y, z) = P(x | z).
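To spell out the "derived easily" step, here is the standard algebra (not on the slides): start from the definition of conditional probability and substitute the product form,
P(Toothache | Catch, Cavity) = P(Toothache, Catch | Cavity) / P(Catch | Cavity)
                             = P(Toothache | Cavity) P(Catch | Cavity) / P(Catch | Cavity)
                             = P(Toothache | Cavity).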

Conditional Independence. What about this domain: Traffic, Umbrella, Raining?

Conditional Independence and the Chain Rule. Chain rule: P(X_1, X_2, ..., X_n) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) ... Trivial decomposition: P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic). With the assumption of conditional independence: P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain). Bayes nets / graphical models help us express conditional independence assumptions.

Ghostbusters Chain Rule. Each sensor depends only on where the ghost is. That means the two sensors are conditionally independent given the ghost position. Variables: T: top square is red; B: bottom square is red; G: ghost is in the top. Givens: P(+g) = 0.5, P(+t | +g) = 0.8, P(+t | -g) = 0.4, P(+b | +g) = 0.4, P(+b | -g) = 0.8. Then P(T, B, G) = P(G) P(T | G) P(B | G):
  +t +b +g  0.16
  +t +b -g  0.16
  +t -b +g  0.24
  +t -b -g  0.04
  -t +b +g  0.04
  -t +b -g  0.24
  -t -b +g  0.06
  -t -b -g  0.06
Number of parameters?

Bayes Nets: Big Picture.
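As a check on the table, a short sketch that rebuilds P(T, B, G) from the givens by multiplying P(G) P(T | G) P(B | G); the dictionary layout is my choice, the numbers are the slide's.

from itertools import product

p_g = {"+g": 0.5, "-g": 0.5}
p_t_given_g = {"+g": {"+t": 0.8, "-t": 0.2}, "-g": {"+t": 0.4, "-t": 0.6}}
p_b_given_g = {"+g": {"+b": 0.4, "-b": 0.6}, "-g": {"+b": 0.8, "-b": 0.2}}

for t, b, g in product(["+t", "-t"], ["+b", "-b"], ["+g", "-g"]):
    # P(T, B, G) = P(G) P(T | G) P(B | G)
    print(t, b, g, p_g[g] * p_t_given_g[g][t] * p_b_given_g[g][b])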

Bayes Nets: Big Picture. Two problems with using full joint distribution tables as our probabilistic models: unless there are only a few variables, the joint is WAY too big to represent explicitly, and it is hard to learn (estimate) anything empirically about more than a few variables at a time. Bayes nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities). More properly called graphical models. We describe how variables locally interact; local interactions chain together to give global, indirect interactions. For about 10 minutes, we'll be vague about how these interactions are specified.

Example Bayes Net: Insurance.

Example Bayes Net: Car.

Graphical Model Notation. Nodes: variables (with domains); they can be assigned (observed) or unassigned (unobserved). Arcs: interactions; similar to CSP constraints; they indicate direct influence between variables. Formally, arcs encode conditional independence (more later). For now, imagine that arrows mean direct causation (in general, they don't!).

Example: Coin Flips. N independent coin flips X_1, X_2, ..., X_n. No interactions between variables: absolute independence.

Example: Traffic. Variables: R: it rains; T: there is traffic. Model 1: independence (R and T with no arc). Model 2: rain causes traffic (an arc from R to T). Why is an agent using model 2 better?

Example: Traffic II. Let's build a causal graphical model! Variables: T: traffic; R: it rains; L: low pressure; D: roof drips; B: ballgame; C: cavity.

Example: Alarm Network. Variables: B: burglary; A: alarm goes off; M: Mary calls; J: John calls; E: earthquake!

Bayes Net Semantics. A set of nodes, one per variable X. A directed, acyclic graph. A conditional distribution for each node given its parents A_1, ..., A_n: a collection of distributions over X, one for each combination of the parents' values, written P(X | A_1, ..., A_n) and stored as a CPT (conditional probability table). This is a description of a noisy causal process. A Bayes net = topology (graph) + local conditional probabilities.

Probabilities in BNs. Bayes nets implicitly encode joint distributions as a product of local conditional distributions. To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: P(x_1, ..., x_n) = ∏_i P(x_i | parents(X_i)).

Probabilities in BNs. Why are we guaranteed that this setting results in a proper joint distribution? Chain rule (valid for all distributions): P(x_1, ..., x_n) = ∏_i P(x_i | x_1, ..., x_i-1). Assume the conditional independences P(x_i | x_1, ..., x_i-1) = P(x_i | parents(X_i)). Consequence: P(x_1, ..., x_n) = ∏_i P(x_i | parents(X_i)). Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.

Example: Coin Flips. X_1, X_2, ..., X_n, each with P(h) = 0.5, P(t) = 0.5. Only distributions whose variables are absolutely independent can be represented by a Bayes net with no arcs.

Example: Traffic. P(R): +r 1/4, -r 3/4. P(T | R): given +r: +t 3/4, -t 1/4; given -r: +t 1/2, -t 1/2.
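As a worked instance of the BN joint formula above: P(+r, +t) = P(+r) P(+t | +r) = 1/4 * 3/4 = 3/16, which is exactly the entry that appears in the joint table on the next slide.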

Example: Alarm Network. CPTs:
P(B): +b 0.001; -b 0.999.
P(E): +e 0.002; -e 0.998.
P(A | B, E): +b +e: +a 0.95, -a 0.05; +b -e: +a 0.94, -a 0.06; -b +e: +a 0.29, -a 0.71; -b -e: +a 0.001, -a 0.999.
P(J | A): +a: +j 0.9, -j 0.1; -a: +j 0.05, -j 0.95.
P(M | A): +a: +m 0.7, -m 0.3; -a: +m 0.01, -m 0.99.

Example: Traffic (causal direction). P(R): +r 1/4, -r 3/4. P(T | R): +r: +t 3/4, -t 1/4; -r: +t 1/2, -t 1/2. Joint P(R, T): +r +t 3/16; +r -t 1/16; -r +t 6/16; -r -t 6/16.
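To illustrate multiplying the relevant conditionals, here is a sketch that computes one full assignment of the alarm network from the CPTs above; the particular assignment (+b, -e, +a, +j, +m) is just an example I chose, not one worked on the slides.

p_b = {"+b": 0.001, "-b": 0.999}
p_e = {"+e": 0.002, "-e": 0.998}
p_a = {("+b", "+e"): {"+a": 0.95, "-a": 0.05}, ("+b", "-e"): {"+a": 0.94, "-a": 0.06},
       ("-b", "+e"): {"+a": 0.29, "-a": 0.71}, ("-b", "-e"): {"+a": 0.001, "-a": 0.999}}
p_j = {"+a": {"+j": 0.9, "-j": 0.1}, "-a": {"+j": 0.05, "-j": 0.95}}
p_m = {"+a": {"+m": 0.7, "-m": 0.3}, "-a": {"+m": 0.01, "-m": 0.99}}

# P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a)
p = p_b["+b"] * p_e["-e"] * p_a[("+b", "-e")]["+a"] * p_j["+a"]["+j"] * p_m["+a"]["+m"]
print(p)   # approximately 0.00059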

Example: Reverse Traffic. Reverse causality? P(T): +t 9/16, -t 7/16. P(R | T): given +t: +r 1/3, -r 2/3; given -t: +r 1/7, -r 6/7. Joint P(R, T): +r +t 3/16; +r -t 1/16; -r +t 6/16; -r -t 6/16 (the same joint as in the causal direction).

Causality? When Bayes nets reflect the true causal patterns they are often simpler (nodes have fewer parents), often easier to think about, and often easier to elicit from experts. But BNs need not actually be causal: sometimes no causal net exists over the domain (especially if variables are missing), e.g. consider the variables Traffic and Drips; we end up with arrows that reflect correlation, not causation. What do the arrows really mean? The topology may happen to encode causal structure, but it really encodes conditional independence.

Bayes Nets. So far: how a Bayes net encodes a joint distribution. Next: how to answer queries about that distribution. Today: we first assembled BNs using an intuitive notion of conditional independence as causality, then saw that the key property is conditional independence. Main goal: answer queries about conditional independence and influence. After that: how to answer numerical queries (inference).