Announcements. CS 188: Artificial Intelligence Spring 2011. Bayes Net Semantics. Probabilities in BNs. All Conditional Independences


CS 188: Artificial Intelligence, Spring 2011
Lecture 16: Bayes Nets IV: Inference, 3/28/2011
Pieter Abbeel, UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

Announcements
Assignments: W4 out today --- this is your last written!! Any assignments you have not picked up yet are in the bin in 283 Soda [same room as for submission drop-off].

Bayes Net Semantics
A Bayes net has a set of nodes, one per variable X, arranged in a directed, acyclic graph, and a conditional distribution for each node: a collection of distributions over X, one for each combination of its parents' values, P(X | A_1, ..., A_n). This CPT (conditional probability table) is a description of a noisy "causal" process. A Bayes net = topology (graph) + local conditional probabilities.

Probabilities in BNs
For all joint distributions we have the chain rule, P(x_1, ..., x_n) = Π_i P(x_i | x_1, ..., x_{i-1}). Bayes nets implicitly encode joint distributions as a product of local conditional distributions: to see what probability a BN gives to a full assignment, multiply all the relevant conditionals together,

    P(x_1, ..., x_n) = Π_i P(x_i | parents(X_i))

This lets us reconstruct any entry of the full joint. Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.

All Conditional Independences
Given a Bayes net structure, we can run d-separation to build a complete list of conditional independences that are necessarily true, each of the form X_i ⊥ X_j | {X_k1, ..., X_kn}. This list determines the set of probability distributions that can be represented. Is it possible to have the same full list of conditional independence assumptions for different BN graphs? Yes! (The slide shows example graphs that share the same list.)
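As a concrete illustration of the product above, here is a minimal Python sketch (not course-provided code) that multiplies local conditionals for the two-node traffic net used a few slides later; the dict-based CPT layout and the name joint_probability are assumptions made for illustration.

    # A minimal sketch: a Bayes net assigns probability to a full assignment
    # by multiplying each node's conditional probability given its parents.
    # CPT values are the slides' traffic net (R -> T).

    cpts = {
        # P(R)
        "R": {(): {"+r": 1/4, "-r": 3/4}},
        # P(T | R)
        "T": {("+r",): {"+t": 3/4, "-t": 1/4},
              ("-r",): {"+t": 1/2, "-t": 1/2}},
    }
    parents = {"R": (), "T": ("R",)}

    def joint_probability(assignment):
        """P(x_1, ..., x_n) = product over nodes of P(x_i | parents(X_i))."""
        p = 1.0
        for var, cpt in cpts.items():
            parent_vals = tuple(assignment[pa] for pa in parents[var])
            p *= cpt[parent_vals][assignment[var]]
        return p

    print(joint_probability({"R": "+r", "T": "+t"}))  # 1/4 * 3/4 = 3/16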

Topology Limits Distributions
Given some graph topology G, only certain joint distributions can be encoded. The graph structure guarantees certain (conditional) independences (there might be more independence). Adding arcs increases the set of distributions that can be represented, but has several costs. Full conditioning can encode any distribution. [The slide shows, for each possible three-node topology, the set of conditional independences it guarantees; the fully connected graph guarantees none.]

Causality?
When Bayes nets reflect the true causal patterns they are often simpler (nodes have fewer parents), often easier to think about, and often easier to elicit from experts. But BNs need not actually be causal: sometimes no causal net exists over the domain (e.g. consider the variables Traffic and Drips), and you end up with arrows that reflect correlation, not causation. What do the arrows really mean? Topology may happen to encode causal structure, but topology is only guaranteed to encode conditional independence. [[If you want to learn more about causality, beyond the scope of 188: Causality, Judea Pearl.]]

Example: Traffic
Basic traffic net; let's multiply out the joint.

    P(R):      +r 1/4   -r 3/4
    P(T | +r): +t 3/4   -t 1/4
    P(T | -r): +t 1/2   -t 1/2
    P(R, T):   +r +t 3/16   +r -t 1/16   -r +t 6/16   -r -t 6/16

Example: Reverse Traffic
Reverse causality? The same joint can be factored the other way:

    P(T):      +t 9/16   -t 7/16
    P(R | +t): +r 1/3    -r 2/3
    P(R | -t): +r 1/7    -r 6/7
    P(R, T):   +r +t 3/16   +r -t 1/16   -r +t 6/16   -r -t 6/16

Example: Coins
Extra arcs don't prevent representing independence, they just allow non-independence. Two fair coin flips X_1 and X_2 (each h 0.5, t 0.5) can be represented with no arc between them, or with an arc X_1 -> X_2 whose CPT gives X_2 the same 0.5/0.5 distribution for either value of X_1. Adding unneeded arcs isn't wrong, it's just inefficient.

Changing Bayes Net Structure
The same joint distribution can be encoded in many different Bayes nets. Causal structure tends to be the simplest. Analysis question: given some edges, what other edges do you need to add? One answer: fully connect the graph. Better answer: don't make any false conditional independence assumptions.
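The short sketch below, again assuming a simple dict representation, multiplies out the forward net P(R) P(T | R) and then recovers the reverse factorization P(T) and P(R | T) from the joint, reproducing the two tables above and illustrating that both structures encode the same joint.

    # A minimal sketch, using the traffic numbers from the slides, showing
    # that the forward net (R -> T) and the reverse net (T -> R) encode the
    # same joint distribution.  Names are illustrative.

    P_R = {"+r": 1/4, "-r": 3/4}
    P_T_given_R = {"+r": {"+t": 3/4, "-t": 1/4},
                   "-r": {"+t": 1/2, "-t": 1/2}}

    # Multiply out the joint P(R, T) = P(R) P(T | R).
    joint = {(r, t): P_R[r] * P_T_given_R[r][t]
             for r in P_R for t in ("+t", "-t")}
    print(joint)  # (+r,+t) 3/16, (+r,-t) 1/16, (-r,+t) 6/16, (-r,-t) 6/16

    # Recover the reverse factorization P(T) and P(R | T) from the joint.
    P_T = {t: sum(joint[(r, t)] for r in P_R) for t in ("+t", "-t")}
    P_R_given_T = {t: {r: joint[(r, t)] / P_T[t] for r in P_R} for t in P_T}
    print(P_T)          # +t 9/16, -t 7/16
    print(P_R_given_T)  # +t: {+r 1/3, -r 2/3},  -t: {+r 1/7, -r 6/7}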

An Algorithm for Adding Necessary Edges
Choose an ordering consistent with the partial ordering induced by the existing edges; let's refer to the ordered variables as X_1, X_2, ..., X_n. For i = 1, 2, ..., n, find the minimal set parents(X_i) such that

    P(x_i | x_1, ..., x_{i-1}) = P(x_i | parents(X_i))

Why does this ensure no spurious conditional independencies remain?

Example: Alternate Alarm
In the alarm network, Burglary and Earthquake point to Alarm, which points to John calls and Mary calls. If we reverse the edges (John calls and Mary calls point to Alarm, which points to Burglary and Earthquake), we make different conditional independence assumptions. To capture the same joint distribution, we have to add more edges to the graph.

Bayes Nets Representation Summary
Bayes nets compactly encode joint distributions. Guaranteed independencies of distributions can be deduced from the BN graph structure: d-separation gives precise conditional independence guarantees from the graph alone. A Bayes net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution.

Bayes Nets Status
Representation; Inference; Learning Bayes Nets from Data.

Inference
Inference: calculating some useful quantity from a joint probability distribution. Examples: posterior probability, P(Q | E_1 = e_1, ..., E_k = e_k), and most likely explanation, argmax_q P(Q = q | E_1 = e_1, ...). (The slides illustrate both on the alarm network over B, E, A, J, M.)

Inference by Enumeration
Given unlimited time, inference in BNs is easy. Recipe: state the marginal probabilities you need, figure out all the atomic probabilities you need, then calculate and combine them. The slide works an example on the alarm network.
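Here is a minimal sketch of the enumeration recipe for the query P(T | +l) in the R -> T -> L traffic domain used later in this lecture; the CPT numbers are the ones implied by the joint table in the "Example: Multiple Joins" section below, and the dict layout is an illustrative assumption.

    # Inference by enumeration: enumerate every atomic probability you need,
    # sum out the hidden variables, and normalize.

    P_R = {"+r": 0.1, "-r": 0.9}
    P_T_given_R = {"+r": {"+t": 0.8, "-t": 0.2}, "-r": {"+t": 0.1, "-t": 0.9}}
    P_L_given_T = {"+t": {"+l": 0.3, "-l": 0.7}, "-t": {"+l": 0.1, "-l": 0.9}}

    def joint(r, t, l):
        """One atomic probability: P(r, t, l) = P(r) P(t | r) P(l | t)."""
        return P_R[r] * P_T_given_R[r][t] * P_L_given_T[t][l]

    # P(T, +l): sum the atomic probabilities over the hidden variable R.
    unnormalized = {t: sum(joint(r, t, "+l") for r in P_R) for t in ("+t", "-t")}
    z = sum(unnormalized.values())
    print({t: p / z for t, p in unnormalized.items()})
    # P(T | +l): +t = 0.051/0.134 ~ 0.38,  -t = 0.083/0.134 ~ 0.62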

Example: Enumeration
(Worked out on the slide for the alarm-network query.)

Inference by Enumeration?
In this simple method, we only need the BN to synthesize the joint entries.

Variable Elimination
Why is inference by enumeration so slow? You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work! Idea: interleave joining and marginalizing! This is called Variable Elimination. It is still NP-hard, but usually much faster than inference by enumeration. We'll need some new notation to define VE.

Factor Zoo I
Joint distribution: P(X, Y). Entries P(x, y) for all x, y; sums to 1. Example, P(T, W):

    T     W     P
    hot   sun   0.4
    hot   rain  0.1
    cold  sun   0.2
    cold  rain  0.3

Selected joint: P(x, Y). A slice of the joint distribution. Entries P(x, y) for fixed x, all y; sums to P(x). Example, P(cold, W):

    T     W     P
    cold  sun   0.2
    cold  rain  0.3

Number of capitals = dimensionality of the table.

Factor Zoo II
Family of conditionals: P(Y | X). Multiple conditionals. Entries P(y | x) for all x, y; sums to |X| (one distribution per value of X). Example, P(W | T):

    T     W     P
    hot   sun   0.8
    hot   rain  0.2
    cold  sun   0.4
    cold  rain  0.6

Single conditional: P(Y | x). Entries P(y | x) for fixed x, all y; sums to 1. Example, P(W | cold):

    T     W     P
    cold  sun   0.4
    cold  rain  0.6

Factor Zoo III
Specified family: P(y | X). Entries P(y | x) for fixed y, but for all x; sums to... who knows! Example, P(rain | T):

    T     W     P
    hot   rain  0.2
    cold  rain  0.6

In general, when we write P(Y_1, ..., Y_N | X_1, ..., X_M), it is a factor, a multi-dimensional array whose values are all P(y_1, ..., y_N | x_1, ..., x_M). Any assigned X or Y is a dimension missing (selected) from the array.
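To make the factor notation concrete, here is a small sketch that stores a factor as a table from assignment tuples to numbers and implements the selection slice P(cold, W) from the factor zoo; the representation and the helper name select are assumptions made for illustration.

    # A factor as a table: assignment tuple -> number, plus a "select"
    # operation that keeps only rows consistent with a fixed value.

    # Joint distribution P(T, W): entries for all (t, w), sums to 1.
    P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
            ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
    variables = ("T", "W")

    def select(factor, var_names, **evidence):
        """Keep only the rows consistent with the given fixed values."""
        keep = []
        for row, p in factor.items():
            assignment = dict(zip(var_names, row))
            if all(assignment[v] == val for v, val in evidence.items()):
                keep.append((row, p))
        return dict(keep)

    # Selected joint P(cold, W): a slice of the joint, sums to P(cold) = 0.5.
    print(select(P_TW, variables, T="cold"))
    # {('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}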

Example: Traffic Domain
Random variables: R (raining), T (traffic), L (late for class!), with structure R -> T -> L.

Variable Elimination Outline
Track objects called factors. The initial factors are the local CPTs (one per node): P(R), P(T | R), P(L | T). Any known values are selected: e.g. if we know the value of L, say +l, the initial factors become P(R), P(T | R), P(+l | T). VE: alternately join factors and eliminate variables.

Operation 1: Join Factors
First basic operation: joining factors. Combining factors is just like a database join: get all factors over the joining variable and build a new factor over the union of the variables involved. Example: joining on R combines P(R) and P(T | R) into P(R, T). The computation for each entry is a pointwise product: P(r, t) = P(r) P(t | r).

Example: Multiple Joins
Join on R to turn P(R) and P(T | R) into P(R, T); then join on T to combine P(R, T) with P(L | T) into P(R, T, L):

    +r +t +l  0.024
    +r +t -l  0.056
    +r -t +l  0.002
    +r -t -l  0.018
    -r +t +l  0.027
    -r +t -l  0.063
    -r -t +l  0.081
    -r -t -l  0.729

Operation 2: Eliminate
Second basic operation: marginalization. Take a factor and sum out a variable. This shrinks a factor to a smaller one; it is a projection operation. Example: summing R out of P(R, T) gives P(T):

    +t  0.17
    -t  0.83
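The sketch below implements the two basic operations on dict-based factors, joining (a pointwise product over the union of the variables, like a database join) and eliminating (summing a variable out). Running it reproduces the P(R, T, L) entries and the P(T) table above, up to floating-point rounding; the factor layout and function names are illustrative assumptions, not course code.

    def join(f1, vars1, f2, vars2):
        """Pointwise product; the result ranges over the union of the variables."""
        out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
        out = {}
        for row1, p1 in f1.items():
            for row2, p2 in f2.items():
                a = dict(zip(vars1, row1))
                b = dict(zip(vars2, row2))
                if all(a[v] == b[v] for v in vars1 if v in vars2):
                    a.update(b)
                    out[tuple(a[v] for v in out_vars)] = p1 * p2
        return out, out_vars

    def eliminate(factor, var_names, var):
        """Sum out `var`, shrinking the factor (a projection)."""
        out_vars = tuple(v for v in var_names if v != var)
        out = {}
        for row, p in factor.items():
            a = dict(zip(var_names, row))
            key = tuple(a[v] for v in out_vars)
            out[key] = out.get(key, 0.0) + p
        return out, out_vars

    P_R = {("+r",): 0.1, ("-r",): 0.9}
    P_T_given_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                   ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}
    P_L_given_T = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                   ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}

    P_RT, rt_vars = join(P_R, ("R",), P_T_given_R, ("R", "T"))      # join on R
    P_RTL, rtl_vars = join(P_RT, rt_vars, P_L_given_T, ("T", "L"))  # join on T
    print(P_RTL[("+r", "+t", "+l")])   # ~0.024, matching the table above
    P_T, t_vars = eliminate(P_RT, rt_vars, "R")                     # sum out R
    print(P_T)                         # {('+t',): ~0.17, ('-t',): ~0.83}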

Multiple Elimination
Starting from the joint P(R, T, L) built above, sum out R to get P(T, L), then sum out T to get P(L):

    P(T, L):   +t +l  0.051
               +t -l  0.119
               -t +l  0.083
               -t -l  0.747

    P(L):      +l  0.134
               -l  0.866

Marginalizing Early! (aka VE*)
Instead of building the full joint first, alternate joining and summing out: join on R to get P(R, T), sum out R to get P(T) (+t 0.17, -t 0.83), join with P(L | T) to get P(T, L), and sum out T to get P(L). The intermediate P(T, L) and the final P(L) are the same tables as above.
* VE is variable elimination.

Evidence
If there is evidence, start with factors that select that evidence. With no evidence, the initial factors are P(R), P(T | R), P(L | T). Computing P(L | +r), the initial factors become P(+r), P(T | +r), P(L | T). We eliminate all variables other than the query and the evidence.

Evidence II
The result will be a selected joint of query and evidence. E.g. for P(L | +r), we'd end up with:

    +r +l  0.026
    +r -l  0.074

To get our answer, just normalize this! That's it!

    +l  0.26
    -l  0.74

General Variable Elimination
Query: P(Q | E_1 = e_1, ..., E_k = e_k). Start with the initial factors: the local CPTs, instantiated by the evidence. While there are still hidden variables (not Q or evidence): pick a hidden variable H, join all factors mentioning H, and eliminate (sum out) H. Finally, join all remaining factors and normalize.
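A short sketch of the evidence and normalization steps for P(L | +r): start from CPTs instantiated by the evidence R = +r, sum out the hidden variable T, then normalize the selected joint. The numbers match the Evidence II table above; the variable names are illustrative.

    P_plus_r = 0.1
    P_T_given_plus_r = {"+t": 0.8, "-t": 0.2}
    P_L_given_T = {"+t": {"+l": 0.3, "-l": 0.7}, "-t": {"+l": 0.1, "-l": 0.9}}

    # Selected joint P(+r, L): eliminate (sum out) the hidden variable T.
    selected = {l: sum(P_plus_r * P_T_given_plus_r[t] * P_L_given_T[t][l]
                       for t in P_T_given_plus_r)
                for l in ("+l", "-l")}
    print(selected)            # {'+l': 0.026, '-l': 0.074}

    # To get the answer, just normalize: P(L | +r).
    z = sum(selected.values())
    print({l: p / z for l, p in selected.items()})   # {'+l': 0.26, '-l': 0.74}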

Example
The slides work an alarm-network query through the VE loop: choose E (join all factors mentioning E, then sum it out), choose A (join all factors mentioning A, then sum it out), then finish with the remaining joins and normalize.

Example 2: P(B | +a)
Start / select the initial factors:

    P(B):       +b 0.1    -b 0.9
    P(A | B):   +b: +a 0.8, -a 0.2    -b: +a 0.1, -a 0.9

Select the evidence +a from P(A | B), then join on B to get P(+a, B):

    +a +b  0.08
    +a -b  0.09

Normalize to get P(B | +a):

    +b  8/17
    -b  9/17

Variable Elimination
What you need to know: you should be able to run it on small examples and understand the factor creation / reduction flow. It is better than enumeration: it saves time by marginalizing variables as soon as possible rather than at the end. We will see special cases of VE later: on tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs. You'll have to implement a tree-structured special case to track invisible ghosts (Project 4).
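As a quick numeric check of Example 2, the sketch below joins P(B) with the selected conditional P(+a | B) and normalizes, reproducing the 8/17 and 9/17 exactly by using Python's fractions module; the dict layout is an illustrative assumption.

    from fractions import Fraction

    P_B = {"+b": Fraction(1, 10), "-b": Fraction(9, 10)}
    P_a_given_B = {"+b": Fraction(8, 10), "-b": Fraction(1, 10)}   # P(+a | B)

    selected_joint = {b: P_B[b] * P_a_given_B[b] for b in P_B}     # P(+a, B)
    z = sum(selected_joint.values())                               # P(+a) = 17/100
    posterior = {b: p / z for b, p in selected_joint.items()}      # P(B | +a)
    print(posterior)   # {'+b': Fraction(8, 17), '-b': Fraction(9, 17)}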