CS 188: Artificial Intelligence, Spring 2011
Lecture 16: Bayes Nets IV -- Inference
3/28/2011
Pieter Abbeel, UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

Announcements
- Assignments: W4 out today -- this is your last written!
- Any assignments you have not picked up yet are in the bin in 283 Soda (same room as for submission drop-off)

Bayes Net Semantics
- A set of nodes, one per variable
- A directed, acyclic graph
- A conditional distribution for each node: a collection of distributions over the variable, one for each combination of its parents' values

Probabilities in BNs
- For all joint distributions, we have (chain rule):
    P(x1, x2, ..., xn) = prod_i P(xi | x1, ..., x(i-1))
- Bayes nets implicitly encode joint distributions as a product of local conditional distributions
- To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, ..., xn) = prod_i P(xi | parents(Xi))
- CPT: conditional probability table, a description of a noisy causal process
- A Bayes net = topology (graph) + local conditional probabilities
- This lets us reconstruct any entry of the full joint
- Not every BN can represent every joint distribution: the topology enforces certain conditional independencies

All Conditional Independences
- Given a Bayes net structure, we can run d-separation to build a complete list of conditional independences that are necessarily true, each of the form
    Xi is independent of Xj given {Xk1, ..., Xkn}
- This list determines the set of probability distributions that can be represented
- Is it possible to have the same full list of conditional independence assumptions for different BN graphs? Yes!
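The "multiply all the relevant conditionals together" rule can be sketched in a few lines. This is an illustrative encoding of my own (the dict layout and the name `joint_probability` are not from the slides); the numbers are the rain/traffic CPTs used in the lecture's traffic example.

```python
# A tiny Bayes net as a dict: variable -> (parents, CPT).
# CPT keys are (value, parent_values) pairs. Numbers are the slides'
# traffic net: P(+r) = 1/4, P(+t|+r) = 3/4, P(+t|-r) = 1/2.
bn = {
    "R": ((), {("+r", ()): 0.25, ("-r", ()): 0.75}),
    "T": (("R",), {("+t", ("+r",)): 0.75, ("-t", ("+r",)): 0.25,
                   ("+t", ("-r",)): 0.50, ("-t", ("-r",)): 0.50}),
}

def joint_probability(bn, assignment):
    """P(x1, ..., xn) = product over nodes of P(xi | parents(Xi))."""
    p = 1.0
    for var, (parents, cpt) in bn.items():
        parent_vals = tuple(assignment[q] for q in parents)
        p *= cpt[(assignment[var], parent_vals)]
    return p

print(joint_probability(bn, {"R": "+r", "T": "+t"}))  # 0.25 * 0.75 = 3/16
```

Any full-joint entry is recovered the same way, which is exactly what "a Bayes net = topology + local conditional probabilities" buys us.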
Topology Limits Distributions
- Given some graph topology G, only certain joint distributions can be encoded
- The graph structure guarantees certain (conditional) independences (there might be more independence)
- Adding arcs increases the set of distributions that can be encoded, but has several costs
- Full conditioning can encode any distribution

Causality?
- When Bayes nets reflect the true causal patterns:
  - Often simpler (nodes have fewer parents)
  - Often easier to think about
  - Often easier to elicit from experts
- BNs need not actually be causal
  - Sometimes no causal net exists over the domain; e.g. consider the variables Traffic and Drips
  - We end up with arrows that reflect correlation, not causation
- What do the arrows really mean?
  - Topology may happen to encode causal structure
  - Topology is only guaranteed to encode conditional independence
- If you want to learn more about causality, beyond the scope of 188: "Causality", Judea Pearl

Example: Traffic
- Basic traffic net: R -> T; let's multiply out the joint

    P(R):     +r 1/4      -r 3/4
    P(T|R):   +t|+r 3/4   -t|+r 1/4   +t|-r 1/2   -t|-r 1/2
    P(R,T):   +r+t 3/16   +r-t 1/16   -r+t 6/16   -r-t 6/16

Example: Reverse Traffic
- Reverse causality? T -> R

    P(T):     +t 9/16     -t 7/16
    P(R|T):   +r|+t 1/3   -r|+t 2/3   +r|-t 1/7   -r|-t 6/7
    P(R,T):   +r+t 3/16   +r-t 1/16   -r+t 6/16   -r-t 6/16   (the same joint)

Example: Coins
- Extra arcs don't prevent representing independence, they just allow non-independence
- Independent coins X1, X2:
    P(X1):  h 0.5   t 0.5
    P(X2):  h 0.5   t 0.5
- With an added arc X1 -> X2, the same distribution is encoded as:
    P(X1):     h 0.5      t 0.5
    P(X2|X1):  h|h 0.5   t|h 0.5   h|t 0.5   t|t 0.5
- Adding unneeded arcs isn't wrong, it's just inefficient

Changing Bayes Net Structure
- The same joint distribution can be encoded in many different Bayes nets
- Causal structure tends to be the simplest
- Analysis question: given some edges, what other edges do you need to add?
  - One answer: fully connect the graph
  - Better answer: don't make any false conditional independence assumptions
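The traffic and reverse-traffic tables above are claimed to encode the same joint; that is easy to check mechanically. A small sketch (variable layout and names are mine; the fractions are the slides'), using exact rational arithmetic so equality is literal:

```python
from fractions import Fraction as F
from itertools import product

# Forward net R -> T: keys are (r,) for P(R) and (t, r) for P(T|R).
fwd = {("+r",): F(1, 4), ("-r",): F(3, 4),
       ("+t", "+r"): F(3, 4), ("-t", "+r"): F(1, 4),
       ("+t", "-r"): F(1, 2), ("-t", "-r"): F(1, 2)}

# Reverse net T -> R: keys are (t,) for P(T) and (r, t) for P(R|T).
rev = {("+t",): F(9, 16), ("-t",): F(7, 16),
       ("+r", "+t"): F(1, 3), ("-r", "+t"): F(2, 3),
       ("+r", "-t"): F(1, 7), ("-r", "-t"): F(6, 7)}

# Both products should give the same P(r, t) for every assignment.
same = all(fwd[(r,)] * fwd[(t, r)] == rev[(t,)] * rev[(r, t)]
           for r, t in product(["+r", "-r"], ["+t", "-t"]))
print(same)  # True: the two nets encode the same joint distribution
```

This is the "same joint, different topology" point made concrete: the graphs make different independence assumptions, yet both can carry this particular distribution.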
An Algorithm for Adding Necessary Edges
- Choose an ordering consistent with the partial ordering induced by existing edges; refer to the ordered variables as X1, X2, ..., Xn
- For i = 1, 2, ..., n:
  - Find the minimal set parents(Xi) such that P(xi | x1, ..., x(i-1)) = P(xi | parents(Xi))
- Why does this ensure no spurious conditional independencies remain?

Example: Alternate Alarm
- Original net: Burglary -> Alarm <- Earthquake, with Alarm -> John calls and Alarm -> Mary calls
- If we reverse the edges, we make different conditional independence assumptions
- To capture the same joint distribution with the reversed net (John calls, Mary calls -> Alarm -> Burglary, Earthquake), we have to add more edges to the graph

Bayes Nets Representation Summary
- Bayes nets compactly encode joint distributions
- Guaranteed independencies of distributions can be deduced from BN graph structure
- D-separation gives precise conditional independence guarantees from graph alone
- A Bayes net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution

Bayes Nets Status
- Representation
- Inference
- Learning Bayes Nets from Data

Inference
- Inference: calculating some useful quantity from a joint probability distribution
- Examples (in the alarm network):
  - Posterior probability, e.g. P(B | +j, +m)
  - Most likely explanation, e.g. argmax over assignments of P(B, E, A | +j, +m)

Inference by Enumeration
- Given unlimited time, inference in BNs is easy
- Recipe:
  - State the marginal probabilities you need
  - Figure out ALL the atomic probabilities you need
  - Calculate and combine them
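The enumeration recipe can be written out directly for the R -> T -> L traffic net used in the rest of the lecture. The CPT numbers below are the ones consistent with the slides' joint table (0.024, 0.056, ...); the function names are mine:

```python
# Inference by enumeration on R (rain) -> T (traffic) -> L (late).
P_R = {"+r": 0.1, "-r": 0.9}
P_T = {("+t", "+r"): 0.8, ("-t", "+r"): 0.2,
       ("+t", "-r"): 0.1, ("-t", "-r"): 0.9}
P_L = {("+l", "+t"): 0.3, ("-l", "+t"): 0.7,
       ("+l", "-t"): 0.1, ("-l", "-t"): 0.9}

def joint(r, t, l):
    """One atomic probability: product of the local conditionals."""
    return P_R[r] * P_T[(t, r)] * P_L[(l, t)]

def query_L_given_r(r_evidence):
    """P(L | r): sum the needed atomic probabilities, then normalize."""
    unnormalized = {l: sum(joint(r_evidence, t, l) for t in ["+t", "-t"])
                    for l in ["+l", "-l"]}
    z = sum(unnormalized.values())
    return {l: p / z for l, p in unnormalized.items()}

print(query_L_given_r("+r"))  # approx +l: 0.26, -l: 0.74
```

Note the cost: we touch every atomic entry consistent with the evidence, which is exactly the wastefulness variable elimination will fix.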
Example: Enumeration
- In this simple method, we only need the BN to synthesize the joint entries

Variable Elimination
- Why is inference by enumeration so slow?
  - You join up the whole joint distribution before you sum out the hidden variables
  - You end up repeating a lot of work!
- Idea: interleave joining and marginalizing!
  - Called "Variable Elimination"
  - Still NP-hard, but usually much faster than inference by enumeration
- We'll need some new notation to define VE

Factor Zoo I
- Joint distribution: P(X,Y)
  - Entries P(x,y) for all x, y
  - Sums to 1
  - Example P(T,W):
      T     W     P
      hot   sun   0.4
      hot   rain  0.1
      cold  sun   0.2
      cold  rain  0.3
- Selected joint: P(x,Y)
  - A slice of the joint distribution
  - Entries P(x,y) for fixed x, all y
  - Sums to P(x)
  - Example P(cold, W):
      T     W     P
      cold  sun   0.2
      cold  rain  0.3
- Number of capitals = dimensionality of the table

Factor Zoo II
- Family of conditionals: P(Y | X)
  - Multiple conditionals
  - Entries P(y | x) for all x, y
  - Sums to |X| (one distribution per value of x)
  - Example P(W | T):
      T     W     P
      hot   sun   0.8
      hot   rain  0.2
      cold  sun   0.4
      cold  rain  0.6
- Single conditional: P(Y | x)
  - Entries P(y | x) for fixed x, all y
  - Sums to 1
  - Example P(W | cold):
      T     W     P
      cold  sun   0.4
      cold  rain  0.6

Factor Zoo III
- Specified family: P(y | X)
  - Entries P(y | x) for fixed y, but for all x
  - Sums to ... who knows!
  - Example P(rain | T):
      T     W     P
      hot   rain  0.2
      cold  rain  0.6
- In general, when we write P(Y1 ... YN | X1 ... XM):
  - It is a "factor", a multi-dimensional array
  - Its values are all P(y1 ... yN | x1 ... xM)
  - Any assigned X or Y is a dimension missing (selected) from the array
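The "selection" idea (an assigned variable is a dimension missing from the array) is worth seeing in code. A minimal sketch with my own representation: a factor is a dict from value tuples to numbers, and `select` is a hypothetical helper, not an API from the course code. The numbers are the Factor Zoo P(T,W) table.

```python
# The joint P(T, W) from the Factor Zoo slides; keys are (t, w) tuples.
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def select(factor, var_index, value):
    """Fix the variable at var_index to an observed value: keep only the
    matching entries and drop that dimension (a slice of the table)."""
    return {key[:var_index] + key[var_index + 1:]: p
            for key, p in factor.items() if key[var_index] == value}

# Selected joint P(cold, W): one dimension fewer, sums to P(cold) = 0.5.
P_cold_W = select(P_TW, 0, "cold")
print(P_cold_W)  # {('sun',): 0.2, ('rain',): 0.3}
```

This is how evidence enters VE later: observed values are selected out of the initial CPT factors before any joining happens.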
Example: Traffic Domain
- Random variables:
  - R: Raining
  - T: Traffic
  - L: Late for class!

Variable Elimination Outline
- Track objects called factors
- Initial factors are local CPTs (one per node)
- Any known values are selected
  - E.g. if we know L = +l, the initial factors are P(R), P(T|R), P(+l|T)
- VE: alternately join factors and eliminate variables

Operation 1: Join Factors
- First basic operation: joining factors
- Combining factors is just like a database join:
  - Get all factors over the joining variable
  - Build a new factor over the union of the variables involved
- Example: join on R: P(R) and P(T|R) become P(R,T)
- Computation for each entry: pointwise products

Example: Multiple Joins
- Join on R: P(R), P(T|R), P(L|T) become P(R,T), P(L|T)
- Join on T: P(R,T), P(L|T) become the full joint P(R,T,L):
    +r +t +l  0.024
    +r +t -l  0.056
    +r -t +l  0.002
    +r -t -l  0.018
    -r +t +l  0.027
    -r +t -l  0.063
    -r -t +l  0.081
    -r -t -l  0.729

Operation 2: Eliminate
- Second basic operation: marginalization
- Take a factor and sum out a variable
  - Shrinks a factor to a smaller one
  - A projection operation
- Example: summing R out of P(R,T) gives P(T):
    +t 0.17
    -t 0.83
Multiple Elimination
- Join everything first, then eliminate: starting from P(R,T,L), sum out R, then T:
  - Sum out R gives P(T,L):
      +t +l  0.051
      +t -l  0.119
      -t +l  0.083
      -t -l  0.747
  - Sum out T gives P(L):
      +l  0.134
      -l  0.866

P(L): Marginalizing Early!
- Join P(R) and P(T|R), then sum out R to get P(T):
    +t 0.17
    -t 0.83

Marginalizing Early (aka VE*)
- Join P(T) with P(L|T), then sum out T:
  - P(T,L):
      +t +l  0.051
      +t -l  0.119
      -t +l  0.083
      -t -l  0.747
  - P(L):
      +l  0.134
      -l  0.866
- * VE is variable elimination

Evidence
- If there is evidence, start with factors that select that evidence
  - No evidence uses these initial factors: P(R), P(T|R), P(L|T)
  - Computing P(L | +r), the initial factors become: P(+r), P(T|+r), P(L|T)
- We eliminate all vars other than query + evidence

Evidence II
- The result will be a selected joint of query and evidence
  - E.g. for P(L | +r), we'd end up with P(+r, L):
      +r +l  0.026
      +r -l  0.074
- To get our answer, just normalize this:
      +l  0.26
      -l  0.74
- That's it!

General Variable Elimination
- Query: P(Q | E1=e1, ..., Ek=ek)
- Start with initial factors: local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Eliminate (sum out) H
- Join all remaining factors and normalize
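The early-vs-late marginalization contrast above is easy to verify end to end: both orders give the same P(L), but the early order never builds the 8-row joint. A sketch under the same CPT numbers as the slides (the dict layout is mine):

```python
# R -> T -> L traffic net, CPTs consistent with the slides' tables.
P_R = {"+r": 0.1, "-r": 0.9}
P_TR = {("+t", "+r"): 0.8, ("-t", "+r"): 0.2,
        ("+t", "-r"): 0.1, ("-t", "-r"): 0.9}
P_LT = {("+l", "+t"): 0.3, ("-l", "+t"): 0.7,
        ("+l", "-t"): 0.1, ("-l", "-t"): 0.9}
R, T, L = ["+r", "-r"], ["+t", "-t"], ["+l", "-l"]

# Late: build every entry of the full joint P(R,T,L), then sum out R and T.
late = {l: sum(P_R[r] * P_TR[(t, r)] * P_LT[(l, t)] for r in R for t in T)
        for l in L}

# Early: sum out R first (a 2-entry factor P(T)), then sum out T.
P_T = {t: sum(P_R[r] * P_TR[(t, r)] for r in R) for t in T}
early = {l: sum(P_T[t] * P_LT[(l, t)] for t in T) for l in L}

print(late, early)  # both approx +l: 0.134, -l: 0.866
```

The intermediate factor in the early order has 2 entries instead of 8; on larger nets that gap between intermediate-factor sizes is where VE's speedup comes from.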
Example
- Query P(B | +j, +m) in the alarm network, with initial factors P(B), P(E), P(A|B,E), P(+j|A), P(+m|A)
- Choose E: join all factors mentioning E, then sum E out
- Choose A: join all factors mentioning A, then sum A out
- Finish with B: join the remaining factors, then normalize

Example 2: P(B | +a)
- Start / select:
    P(B):     +b 0.1      -b 0.9
    P(A|B):   +a|+b 0.8   -a|+b 0.2   +a|-b 0.1   -a|-b 0.9
- Join on B (with A = +a selected) to get P(+a, B):
    +a +b  0.08
    +a -b  0.09
- Normalize to get P(B | +a):
    +b  8/17
    -b  9/17

Variable Elimination
- What you need to know:
  - You should be able to run it on small examples and understand the factor creation / reduction flow
  - Better than enumeration: saves time by marginalizing variables as soon as possible rather than at the end
  - We will see special cases of VE later
    - On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs
    - You'll have to implement a tree-structured special case to track invisible ghosts (Project 4)
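Example 2 is small enough to compute in full: select the evidence +a, join on B, normalize. A sketch using the slides' numbers (variable names mine):

```python
from fractions import Fraction

# CPTs from the Example 2 slide: B -> A.
P_B = {"+b": 0.1, "-b": 0.9}
P_AB = {("+a", "+b"): 0.8, ("-a", "+b"): 0.2,
        ("+a", "-b"): 0.1, ("-a", "-b"): 0.9}

# Select A = +a in P(A|B), then join with P(B): gives P(+a, B).
unnormalized = {b: P_B[b] * P_AB[("+a", b)] for b in P_B}  # 0.08, 0.09

# Normalize the selected joint to get the posterior P(B | +a).
z = sum(unnormalized.values())
posterior = {b: p / z for b, p in unnormalized.items()}
print(posterior)  # approx +b: 8/17, -b: 9/17, matching the slide
```

Note the posterior on +b drops from a prior of 0.1 to about 0.47 only because +a is much likelier under +b; the normalization constant z = P(+a) = 0.17 falls out for free.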