PROBABILITY. Inference: constraint propagation


PROBABILITY
Inference: constraint propagation
- Use the constraints to reduce the number of legal values for a variable
- It may be possible to find a solution without searching
- Node consistency: a node is node-consistent if all values in its domain satisfy the unary constraints
- Arc consistency: a node i is arc-consistent w.r.t. node j if for every value in D_i there exists a value in D_j that satisfies the binary constraint
  - Algorithm AC-3
- Other types of consistency (path consistency, k-consistency, global constraints)

AC-3 algorithm for arc consistency

function AC-3(csp) returns false if an inconsistency is found, true otherwise
    queue ← all arcs in csp
    while queue is not empty                          [the loop runs O(cd) times]
        (X_i, X_j) ← REMOVE-FIRST(queue)
        if REMOVE-INCONSISTENT-VALUES(X_i, X_j)
            if size of D_i == 0 then return false
            for each arc (X_k, X_i): add (X_k, X_i) to queue
    return true
[c = number of constraints (arcs), d = domain size; total O(cd^3)]

function REMOVE-INCONSISTENT-VALUES(X_i, X_j)         [O(d^2)]
    revised ← false
    for each x in D_i
        if there is no y in D_j s.t. (x, y) satisfies the constraint
            delete x from D_i
            revised ← true
    return revised

Backtracking search

function CSP-BACKTRACKING(assignment) returns a solution or failure
    if assignment is complete then return assignment
    X ← select an unassigned variable
    D ← select an ordering for the domain of X
    for each value in D
        if value is consistent with assignment
            add (X = value) to assignment
            (ADD INFERENCE HERE)
            result ← CSP-BACKTRACKING(assignment)
            if result ≠ failure then return result
            remove (X = value) from assignment
    return failure
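To make the pseudocode concrete, here is a minimal Python sketch of AC-3. The representation is an assumption made for illustration: `domains` maps each variable to a set of values, `neighbors` maps each variable to the variables it shares a binary constraint with, and `constraint(xi, x, xj, y)` returns True when the pair of values is allowed.

```python
from collections import deque

def revise(domains, constraint, xi, xj):
    """Remove values from domains[xi] that have no supporting value in domains[xj]."""
    revised = False
    for x in list(domains[xi]):
        if not any(constraint(xi, x, xj, y) for y in domains[xj]):
            domains[xi].remove(x)
            revised = True
    return revised

def ac3(domains, neighbors, constraint):
    """Return False if some domain is wiped out, True otherwise."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraint, xi, xj):
            if not domains[xi]:
                return False
            # Re-check every arc pointing into xi, except the one just processed.
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))
    return True
```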

4-Queens Problem (worked example over several slides): one variable per column, each with initial domain {1, 2, 3, 4}; successive board diagrams show how the domains are pruned as queens are placed and constraints are propagated.
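As a concrete companion to the worked example, here is a hypothetical use of the `ac3` sketch from above on the 4-Queens CSP (variables are column indices 0..3, values are row numbers 1..4); the starting assignment of the first queen is an assumption chosen so that propagation alone pins down a solution.

```python
def queens_constraint(xi, x, xj, y):
    # Queens in columns xi and xj must not share a row or a diagonal.
    return x != y and abs(x - y) != abs(xi - xj)

domains = {col: {1, 2, 3, 4} for col in range(4)}
neighbors = {col: [other for other in range(4) if other != col] for col in range(4)}

domains[0] = {2}  # assumed starting assignment: first queen in row 2
print(ac3(domains, neighbors, queens_constraint))  # True
print(domains)  # propagation alone yields {0: {2}, 1: {4}, 2: {1}, 3: {3}}
```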

Progress Report
- We've finished Part I: Problem Solving!
- Part II: Reasoning with uncertainty
  - Probability
  - Bayesian networks
  - Reasoning over time (hidden Markov models)
  - Applications: tracking objects, speech recognition, medical diagnosis
- Part III: Learning

Today
- Reading: we're skipping to AIMA Chapter 13
- Goals
  - Random variables
  - Joint, marginal, and conditional distributions
  - Product rule, chain rule, Bayes rule
  - Inference
  - Independence
- We're going to be using these concepts a lot, so it's worth learning them well the first time!

Handling Uncertainty
- The world is an uncertain place: partially observable, non-deterministic
  - On the way to the bank, you get in a car crash!
  - Medical diagnosis
  - Driving to LA (if you have to)
  - Sensors
- Probability theory gives us a language to reason about an uncertain world.
- Probability theory is beautiful!

Random variables
- A random variable (rv) is a variable (that captures some quantity of interest) whose value is random
  - X = the next word uttered by my professor (this is of great interest and importance)
  - Y = the number of people that enter this building on a given day
  - D = the time it will take to drive to LA
  - W = today's weather
- Like variables in a CSP, random variables have domains
  - X in {the, a, of, is, in, if, when, up, on, ..., sky, doughnut, ...}
  - Y in {0, 1, 2, ..., 5, 6, ...}
  - D in [0, ∞)
  - W in {sun, rain, cloudy, snowy}
- A discrete rv has a countable domain
- A continuous rv has an uncountable domain

Discrete probability distribution
- Each value (outcome) in the domain is associated with a real-valued number called a probability that reflects the chances of the random variable taking on that value, e.g.

  w       P(W = w)          x              P(X = x)
  sunny   0.6               the            .005
  rain    0.                a              .00
  cloudy  0.9               of             .000
  snow    0.0               ...            ...
                            snickerdoodle  10^-9

- Constraints for a valid probability distribution: 0 ≤ p(ω) ≤ 1 for every outcome ω, and Σ_ω p(ω) = 1 (equivalently, Σ_x P(X = x) = 1)
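As a quick illustration of these constraints, here is a minimal Python sketch that stores a discrete distribution as a dict and checks the two validity conditions; the example values are hypothetical, chosen only so that they sum to 1.

```python
# A discrete distribution over W as a plain dict (hypothetical example values).
P_W = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

def is_valid_distribution(p, tol=1e-9):
    """Every probability lies in [0, 1] and the total probability mass is 1."""
    return all(0.0 <= v <= 1.0 for v in p.values()) and abs(sum(p.values()) - 1.0) < tol

print(is_valid_distribution(P_W))  # True
```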

Discrete probability distribution
- Constraints for a valid probability distribution: 0 ≤ p(ω) ≤ 1 and Σ_ω p(ω) = 1
- The total probability mass, which is 1, is divided among the possible outcomes (as in the W table above)

Joint probability distribution
- A joint distribution over a set of r.v.s {X_1, X_2, ..., X_n} assigns a probability to each possible assignment:
  p(X_1 = x_1, X_2 = x_2, ..., X_n = x_n), written p(x_1, x_2, ..., x_n) for short

  P(W = w, T = t)
  w      t     P
  sunny  hot   0.4
  rain   hot   0.1
  sunny  cold  0.2
  rain   cold  0.3

Joint probability distribution
- A joint distribution over a set of r.v.s {X_1, X_2, ..., X_n} assigns a probability p(x_1, x_2, ..., x_n) to each possible assignment
- Still subject to the constraints:
  0 ≤ p(x_1, x_2, ..., x_n) ≤ 1 and Σ_(x_1, ..., x_n) p(x_1, ..., x_n) = 1

  P(W = w, T = t)
  w      t     P
  sunny  hot   0.4
  rain   hot   0.1
  sunny  cold  0.2
  rain   cold  0.3

- If we have n variables with domain size d, what is the size of the probability distribution (the number of rows in the table)?
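A joint table like P(W, T) can be represented directly as a dict keyed by assignment tuples. This is a minimal sketch using the (W, T) numbers from the table above; the table-size helper is illustrative and points at the d^n blow-up the question above is getting at.

```python
# Joint distribution P(W, T) as a dict keyed by (w, t) tuples (values from the table above).
P_WT = {
    ("sunny", "hot"): 0.4, ("rain", "hot"): 0.1,
    ("sunny", "cold"): 0.2, ("rain", "cold"): 0.3,
}
assert abs(sum(P_WT.values()) - 1.0) < 1e-9  # total probability mass is 1

def table_size(domains):
    """Number of rows in the full joint table over the given variable domains (d**n)."""
    size = 1
    for dom in domains:
        size *= len(dom)
    return size

print(table_size([{"sunny", "rain"}, {"hot", "cold"}]))  # 4 == 2**2
```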

Events
- An event is a set E of outcomes
  - sunny AND hot = {(sunny, hot)}
  - sunny = {(sunny, hot), (sunny, cold)}
  - sunny OR hot = {(sunny, hot), (rainy, hot), (sunny, cold)}
- The joint distribution can be used to calculate the probability of an event:
  p(E) = Σ_((x_1, ..., x_n) ∈ E) p(x_1, ..., x_n)
- The probability of an event is the sum of the probabilities of the outcomes in the set
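A small sketch of computing an event's probability by summing over its outcomes, reusing the hypothetical `P_WT` dict from above; the event is passed as a predicate over outcomes.

```python
def prob_of_event(joint, event):
    """Sum the probabilities of the outcomes that belong to the event."""
    return sum(p for outcome, p in joint.items() if event(outcome))

# Event "sunny OR hot" = {(sunny, hot), (rain, hot), (sunny, cold)}
print(prob_of_event(P_WT, lambda wt: wt[0] == "sunny" or wt[1] == "hot"))  # ≈ 0.7
```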

Marginal distributions
- Sometimes we have the joint distribution but we're only interested in the distribution of a subset of the variables
- This is called the marginal distribution
- We marginalize out the other variables by summing over them:
  p(X = x) = Σ_y P(X = x, Y = y)
- Corresponds to a sub-table created by summing over rows
- Oftentimes, the events we're interested in are marginal distributions

  P(W = w, T = t)          P(W = w)         P(T = t)
  w      t     P           w      P         t     P
  sunny  hot   0.4         sunny  0.6       hot   0.5
  rain   hot   0.1         rain   0.4       cold  0.5
  sunny  cold  0.2
  rain   cold  0.3
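Here is a small marginalization helper in the same dict representation (reusing `P_WT` from above); it sums the joint over the variables we are not interested in. The helper name and tuple-position interface are assumptions for illustration.

```python
from collections import defaultdict

def marginal(joint, keep_index):
    """Marginalize a joint dict keyed by tuples down to one variable position."""
    result = defaultdict(float)
    for outcome, p in joint.items():
        result[outcome[keep_index]] += p
    return dict(result)

print(marginal(P_WT, 0))  # sunny ≈ 0.6, rain ≈ 0.4
print(marginal(P_WT, 1))  # hot ≈ 0.5, cold ≈ 0.5
```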

Conditional (posterior) distribution
- Often, we observe some information (evidence) and we want to know the probability of an event conditioned on this evidence, e.g. p(W | T = cold)
  - In all the worlds where T = cold, what is the probability that W = sunny? That W = rainy?
- This is called the conditional distribution (or the posterior distribution), e.g. the distribution of W conditioned on the evidence T = cold
- The conditional distribution is given by the equation
  p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y)
- Whereas before the total probability mass was 1, the total probability mass is now p(T = cold). We compute everything in relation to this value.
- Exercises (using the joint table above): p(W | T = cold)?  p(W | T = hot)?  p(T | W = rainy)?  p(T | W = sunny)?
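A sketch of conditioning in the same dict representation: filter the joint down to the rows consistent with the evidence, then divide by the probability of the evidence. The function name and tuple-position interface are illustrative, and `P_WT` is the hypothetical joint from above.

```python
def conditional(joint, query_index, evidence_index, evidence_value):
    """p(X_query | X_evidence = evidence_value) from a joint dict keyed by tuples."""
    consistent = {o: p for o, p in joint.items() if o[evidence_index] == evidence_value}
    p_evidence = sum(consistent.values())
    return {o[query_index]: p / p_evidence for o, p in consistent.items()}

print(conditional(P_WT, 0, 1, "cold"))  # p(W | T=cold): sunny ≈ 0.4, rain ≈ 0.6
```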

Normalization constants
- p(W = s | T = c) = p(W = s, T = c) / p(T = c) = 0.2 / 0.5 = 0.4
- p(W = r | T = c) = p(W = r, T = c) / p(T = c) = 0.3 / 0.5 = 0.6
- Note that p(T = c) is constant no matter the value of W
- We call p(T = c) a normalization constant because:
  1. It is constant with respect to the distribution of interest, p(W | T = c)
  2. It ensures that the distribution sums to 1 (i.e. it restores the distribution p(W | T = c) back to the normal condition of summing to 1)
- Even without the probability of the evidence, i.e. p(T = c), we could calculate the conditional distribution using the joint distribution alone:
  - Step 1: compute Z = the sum of p(W, T = c) over all values of W
  - Step 2: divide each joint probability by Z
  - (All we've done is compute the probability of the evidence, i.e. p(T = c), from the joint distribution by marginalizing over W)
- p(X | Y) ∝ p(X, Y):
  < 0.2 / (0.2 + 0.3), 0.3 / (0.2 + 0.3) > = < 0.4, 0.6 >

Normalization constant
- Normalize the following distribution:

  P(W = w, T = t)
  w      t     P
  sunny  hot   20
  rain   hot   5
  sunny  cold  10
  rain   cold  15

- Step One: Z = 20 + 5 + 10 + 15 = 50
- Step Two: p(W, T) = < 20/50, 5/50, 10/50, 15/50 > = < 0.4, 0.1, 0.2, 0.3 >

Summary of distributions so far
- Joint: p(x, y)
- Marginal: p(x) = Σ_y p(x, y)
- Conditional: p(x | y) = p(x, y) / p(y)
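The normalization trick as a small sketch: sum the unnormalized entries to get Z, then divide each entry by Z. The counts mirror the example table above; the helper name is illustrative.

```python
def normalize(table):
    """Divide every entry by the normalization constant Z so the result sums to 1."""
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}

counts = {("sunny", "hot"): 20, ("rain", "hot"): 5, ("sunny", "cold"): 10, ("rain", "cold"): 15}
print(normalize(counts))  # ('sunny','hot') ≈ 0.4, ('rain','hot') ≈ 0.1, ...
```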

Probabilistic inference
- Probabilistic inference refers to the task of computing some desired probability given other known probabilities (evidence)
- Typically we compute the conditional (posterior) probability of an event, e.g. p(on time | no accidents) = 0.80
- Probabilities change with new evidence:
  - p(on time | no accidents, 5 a.m.) = 0.95
  - p(on time | no accidents, 5 a.m., raining) = 0.8

Inference by enumeration
- We have a set of random variables {X_1, X_2, ..., X_n}
- Partition the set of random variables into:
  - Evidence variables: E_1 = e_1, E_2 = e_2, ..., E_k = e_k
  - Query variable: Q
  - Hidden (misc.) variables: H_1, H_2, ..., H_r
- We're interested in computing p(Q | E_1 = e_1, E_2 = e_2, ..., E_k = e_k)
- Step One: select the entries in the table consistent with the evidence (this becomes our world)
- Step Two: sum over the H variables to get the joint distribution of the query and evidence variables:
  p(Q, e_1, ..., e_k) = Σ_(h_1, ..., h_r) p(Q, e_1, ..., e_k, h_1, ..., h_r)
- Step Three: normalize:
  Z = Σ_q p(Q = q, e_1, ..., e_k)
  p(Q | e_1, ..., e_k) = (1/Z) p(Q, e_1, ..., e_k)

Inference by enumeration: example p(W | S = winter)?

  S       W      T     P
  summer  sunny  hot   0.30
  summer  rain   hot   0.05
  summer  sunny  cold  0.10
  summer  rain   cold  0.05
  winter  sunny  hot   0.10
  winter  rain   hot   0.05
  winter  sunny  cold  0.15
  winter  rain   cold  0.20

- Step One: select the entries in the table consistent with the evidence (this becomes our world), i.e. the four winter rows:

  S       W      T     P
  winter  sunny  hot   0.10
  winter  rain   hot   0.05
  winter  sunny  cold  0.15
  winter  rain   cold  0.20

- Step Two: sum over the H variables (here T) to get the joint distribution of the query and evidence variables:
  p(Q, e_1, ..., e_k) = Σ_(h_1, ..., h_r) p(Q, e_1, ..., e_k, h_1, ..., h_r)

- Step Three: normalize:
  Z = Σ_q p(Q = q, e_1, ..., e_k)
  p(Q | e_1, ..., e_k) = (1/Z) p(Q, e_1, ..., e_k)
- Further queries on the same table: p(W | S = winter, T = hot)?  p(S, W)?  p(S, W | T = hot)?
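A compact sketch of the three steps on a joint dict keyed by (S, W, T) tuples. The table values follow the reconstruction above and the helper name and evidence interface are illustrative assumptions.

```python
# Joint P(S, W, T) keyed by (s, w, t); values as in the table above.
P_SWT = {
    ("summer", "sunny", "hot"): 0.30, ("summer", "rain", "hot"): 0.05,
    ("summer", "sunny", "cold"): 0.10, ("summer", "rain", "cold"): 0.05,
    ("winter", "sunny", "hot"): 0.10, ("winter", "rain", "hot"): 0.05,
    ("winter", "sunny", "cold"): 0.15, ("winter", "rain", "cold"): 0.20,
}

def enumeration_inference(joint, query_index, evidence):
    """p(Query | evidence), where evidence maps tuple positions to required values."""
    # Step One: keep only the entries consistent with the evidence.
    world = {o: p for o, p in joint.items()
             if all(o[i] == v for i, v in evidence.items())}
    # Step Two: sum out the hidden variables, keeping only the query variable.
    unnormalized = {}
    for o, p in world.items():
        unnormalized[o[query_index]] = unnormalized.get(o[query_index], 0.0) + p
    # Step Three: normalize by Z.
    z = sum(unnormalized.values())
    return {q: p / z for q, p in unnormalized.items()}

print(enumeration_inference(P_SWT, 1, {0: "winter"}))  # p(W | S=winter): sunny ≈ 0.5, rain ≈ 0.5
```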

Inference by enumeration: complexity
- With n random variables of domain size d:
  - Worst-case time is O(d^n)
  - Space is O(d^n) to store the entire table in memory

The Product Rule
- Given the conditional and marginal distributions, we can compute the joint distribution using the Product Rule:
  p(x | y) = p(x, y) / p(y)  ⇒  p(x, y) = p(x | y) p(y)

The Chain Rule
- In general, the joint distribution of a set of random variables can be expressed as a product of conditional and marginal distributions:
  p(x_1, ..., x_n) = p(x_1) p(x_2 | x_1) ... p(x_n | x_1, ..., x_(n-1)) = Π_i p(x_i | x_1, ..., x_(i-1))

Bayes Rule (we've arrived!)
- We can combine the equation for the conditional distribution with the Product Rule:
  p(x | y) = p(x, y) / p(y)          (conditional distribution)
           = p(y | x) p(x) / p(y)    (rewrite the joint using the Product Rule)
- This is known as Bayes Rule
- It allows us to compute one conditional p(x | y) in terms of the other conditional p(y | x) and the marginals p(x) and p(y)

Bayes Rule
- Often one conditional is hard to specify whereas the other is simple:
  p(disease | symptom) = p(symptom | disease) p(disease) / p(symptom)
- Example: s = stiff neck, m = meningitis
  p(+s | +m) = 0.80, p(+s) = 0.1, p(+m) = 0.0001
  p(+m | +s) = p(+s | +m) p(+m) / p(+s) = (0.80)(0.0001) / (0.1) = 0.0008

Bayes Rule for more than two variables
- p(x_1 | x_2, ..., x_n) = p(x_1, ..., x_n) / p(x_2, ..., x_n)
                         = p(x_n | x_1, ..., x_(n-1)) ... p(x_2 | x_1) p(x_1) / p(x_2, ..., x_n)
- We need to specify n-1 conditional distributions!
- This leads to the notion of independence: some variables do not depend on the value of other variables
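A one-line check of the stiff-neck calculation, written as a small helper; `bayes` is an illustrative name and the inputs follow the worked example as reconstructed above.

```python
def bayes(p_b_given_a, p_a, p_b):
    """p(A | B) = p(B | A) p(A) / p(B)."""
    return p_b_given_a * p_a / p_b

# p(+m | +s) from p(+s | +m) = 0.80, p(+m) = 0.0001, p(+s) = 0.1
print(bayes(0.80, 0.0001, 0.1))  # ≈ 0.0008
```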

Independence
- Two variables are independent if
  p(X = x, Y = y) = p(X = x | Y = y) p(Y = y) = p(X = x) p(Y = y) for all x, y
  i.e. the value of X is independent of the value of Y
- Under such assumptions the chain rule simplifies, e.g.
  p(x_1, x_2, ..., x_n) = p(x_n | x_(n-1)) ... p(x_2 | x_1) p(x_1)
- The joint distribution now factors into a product of simpler distributions
- Independence is a modeling assumption that we make for simplification
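A small sketch that tests whether two variables in a joint dict are independent by comparing the joint to the product of its marginals, reusing the `marginal` helper and the hypothetical `P_WT` from above.

```python
def are_independent(joint, tol=1e-9):
    """Check p(x, y) == p(x) * p(y) for every entry of a two-variable joint dict."""
    p_x = marginal(joint, 0)
    p_y = marginal(joint, 1)
    return all(abs(p - p_x[x] * p_y[y]) < tol for (x, y), p in joint.items())

print(are_independent(P_WT))  # False: weather and temperature are correlated in this table
```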