Y. Xiang, Inference with Uncertain Knowledge 1


Inference with Uncertain Knowledge

Objectives
- Why must an agent use uncertain knowledge?
- Fundamentals of Bayesian probability
- Inference with full joint distributions
- Inference with Bayes rule
- Bayesian networks (BNs)
- Exact inference in BNs
Reference: Russell & Norvig, Chapters 13 and 14

Limitation of Logic Inference
- When a logic agent knows enough about its env, it can derive plans that are guaranteed to work.
- What if it does not have all the necessary facts?
- Ex: Wumpus World has at most 2 pits. Starting from [1,1], the agent senses a breeze at [1,2] and [2,1]. What should the agent do?
- A logic agent may not be able to act under uncertainty.

Review of Wumpus World
- Gold is located in a room. Gold glitters.
- The wumpus eats anyone in its room. There is a stench in each room next to the wumpus.
- Pits trap explorers. There is a breeze in each room next to a pit.
- Goal of agent: find the gold without being eaten or falling into a pit.
- Room coordinates of the 4x4 grid:
    1,4  2,4  3,4  4,4
    1,3  2,3  3,3  4,3
    1,2  2,2  3,2  4,2
    1,1  2,1  3,1  4,1

Uncertainty and Decision Making
- The presence of uncertainty in the env radically changes how an agent should make decisions.
- A logic agent can select or reject an action based on whether it achieves the goal, regardless of what other actions might achieve.
- Ex: A flight departs at 9am from Pearson airport.
    Action $a_1$: Drive from Guelph at 6:30am.
    Action $a_2$: Drive from Guelph at 5:30am.
    Action $a_3$: Drive from Guelph at 1:30am.
  Which action should be selected?

Preference and Utility
- Making decisions under uncertainty requires the agent to have preferences over env states.
- The degree of desirability of a state is specified by a numerical utility.
- The degrees of desirability over all relevant states are specified by a utility function.
- How to specify preferences using utility functions is investigated in utility theory.
- Ex: Catching the 9am flight

Decision Theory
- Making decisions under uncertainty also requires the agent to estimate the likelihood of the env states that result from alternative actions.
- The likelihood of the unobservable given observations is investigated by probability theory, among others.
- Ex: Catching the 9am flight
- Decision theory = probability theory + utility theory
- Let the utility of state $s$ be $U(s)$ and the probability of $s$ resulting from action $a$ be $P(s \mid a)$. Then the expected utility of action $a$ is $EU(a) = \sum_s U(s)\,P(s \mid a)$.
- What is $EU(a_1)$, where $a_1$ is leaving Guelph at 6:30am?

Maximum Expected Utility Principle
- Maximum expected utility (MEU) principle: a rational agent should select the action $a^*$ with maximum $EU(a)$.
- Ex: Catching the 9am flight: $a_1$, $a_2$ or $a_3$?
- Can a probabilistic Wumpus World (WW) agent do better? Which room should be entered, [1,3], [2,2] or [3,1]?
- Focus: likelihood estimation by probability theory
- Approach: extend propositional logic to Bayesian probability

Variables and Their Domains
- Describe the env state by a set $V = \{x_1, \dots, x_n\}$ of discrete variables.
- Ex: Health state of a cough patient
- Associate each variable $x_i$ in $V$ with a finite domain $D_i = \{x_{i1}, \dots, x_{ik}\}$ of possible values.
- Values in each domain must be mutually exclusive and exhaustive.
- Ex: An athlete's highest achievement in an Olympic game
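The expected-utility formula and the MEU principle can be sketched in a few lines of Python. The slides give no numbers for the flight example, so the outcome states, utilities and probabilities below are hypothetical values chosen only to make the computation concrete.

```python
# Hypothetical utilities and probabilities: the slides give no numbers for
# the flight example, so every value below is an illustrative assumption.
UTILITY = {"catch_short_wait": 100.0, "catch_long_wait": 40.0, "miss": -500.0}

# P(s | a): one distribution over outcome states per action (each sums to 1).
P_STATE_GIVEN_ACTION = {
    "a1_leave_0630": {"catch_short_wait": 0.70, "catch_long_wait": 0.00, "miss": 0.30},
    "a2_leave_0530": {"catch_short_wait": 0.90, "catch_long_wait": 0.05, "miss": 0.05},
    "a3_leave_0130": {"catch_short_wait": 0.00, "catch_long_wait": 0.999, "miss": 0.001},
}


def expected_utility(action):
    """EU(a) = sum over states s of U(s) * P(s | a)."""
    return sum(UTILITY[s] * p for s, p in P_STATE_GIVEN_ACTION[action].items())


def meu_action():
    """Return the action a* with maximum expected utility."""
    return max(P_STATE_GIVEN_ACTION, key=expected_utility)


if __name__ == "__main__":
    for a in P_STATE_GIVEN_ACTION:
        print(a, round(expected_utility(a), 2))
    print("MEU choice:", meu_action())
```

Under these assumed numbers the 5:30am departure wins: leaving at 6:30am risks missing the flight too often, while leaving at 1:30am pays for near-certainty with a long wait. Different utilities would change the answer, which is the point of separating $U(s)$ from $P(s \mid a)$.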

Propositions
- A proposition is an assertion about an env state.
- A simple proposition asserts an assignment over a variable $x \in V$.
- A complex proposition is formed by combining simple propositions with logical connectives.
- A complex proposition asserts an assignment over a subset $X \subseteq V$.
- Intuitively, a proposition asserts an event in the env.

Atomic Event
- An atomic event is a conjunction of simple propositions, one over each $x \in V$. It corresponds to a complete assignment.
- Properties of atomic events:
    They are mutually exclusive.
    They are collectively exhaustive.
    An atomic event entails the truth value of every proposition.
    Any proposition $q$ is logically equivalent to the disjunction of all atomic events that entail the truth of $q$.
- An atomic event is also referred to as a model.

Express Uncertain Knowledge Probabilistically
- Represent an env state $s$ by a proposition $q$.
- Express an agent's uncertain knowledge about $s$ by assigning $q$ a probability $p$.
- $p$ is the agent's degree of belief in $q$ or $s$.
- The agent's belief state is the collection of probabilities it assigns to all env states.
- Frequentist versus Bayesian probability

Prior Probability
- The prior probability $P(q)$ for proposition $q$ is an agent's degree of belief in $q$ at its initial belief state.
- The prior probability distribution $P(x)$ for $x \in V$ is a set of probabilities, one for each value of $x$.
- The joint probability distribution $P(X)$ for $X \subseteq V$ is a set of probabilities, one for each assignment over $X$.
- The full joint probability distribution $P(V)$ is a set of probabilities, one for each atomic event.

Conditional Probability
- The conditional or posterior probability $P(q \mid e)$, where $q$ and $e$ are propositions, is the agent's degree of belief in $q$ after knowing $e$ beyond its initial belief state.
- Conditional probability is related to prior probability according to the product rule: $P(q \mid e) = P(q \wedge e) / P(e)$, whenever $P(e) > 0$.
- Alternatively: $P(q \wedge e) = P(q \mid e)\,P(e)$.
- The conditional probability distribution $P(X \mid y)$ for assignment $y$ over $Y$ is the set of all conditional probabilities $P(x \mid y)$, one for each assignment $x$ of $X$.
- $P(q)$ is a special case of conditional probability.

Conditional Probability Table
- A conditional probability table (CPT) $P(X \mid Y)$ is a set of conditional probability distributions (CPDs), one for each assignment $y$ over $Y$.
- Ex: CPD
    P(lung_cancer | pos_x_ray) = { P(early_lung_cancer | pos_x_ray), P(late_lung_cancer | pos_x_ray), P(absent_lung_cancer | pos_x_ray) }
- Ex: CPT
    P(lung_cancer | x_ray) = { P(lung_cancer | pos_x_ray), P(lung_cancer | neg_x_ray) }

Axioms of Probability
- Since degree of belief is subjective, can an agent assign numerical values to propositions arbitrarily?
- A probability assignment must satisfy a set of axioms, where $q$, $r$ and $e$ are propositions:
    1. [range] For any propositions $q$ and $r$, $0 \le P(q \mid r) \le 1$.
    2. [certainty] $P(q \mid q) = 1$.
    3. [sum] If $q$ and $r$ are mutually exclusive, $P(q \vee r \mid e) = P(q \mid e) + P(r \mid e)$.
    4. [product] $P(q \wedge r \mid e) = P(q \mid r \wedge e)\,P(r \mid e)$.

Properties Derived from Axioms
- The rest of probability theory is derivable from the axioms. In the following, $h$, $e$ and $q$ are propositions.
- [Bayes rule] $P(h \mid e \wedge q) = P(e \mid h \wedge q)\,P(h \mid q) / P(e \mid q)$
- [Negation] $P(\neg q \mid e) = 1 - P(q \mid e)$
- [Marginalization] Given $P(X, Z)$, $P(X) = \sum_z P(X, z)$, where $z$ is an assignment over $Z$. $P(X)$ is called a marginal distribution.
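As one worked instance of a property being derivable from the axioms, Bayes rule follows from applying the product axiom twice to the same conjunction. The derivation below is a sketch that is not part of the slides, and it assumes $P(e \mid q) > 0$.

```latex
\begin{align*}
P(h \wedge e \mid q) &= P(h \mid e \wedge q)\, P(e \mid q)
  && \text{[product]} \\
P(e \wedge h \mid q) &= P(e \mid h \wedge q)\, P(h \mid q)
  && \text{[product, with $h$ and $e$ swapped]} \\
P(h \mid e \wedge q)\, P(e \mid q) &= P(e \mid h \wedge q)\, P(h \mid q)
  && \text{since } h \wedge e \equiv e \wedge h \\
P(h \mid e \wedge q) &= \frac{P(e \mid h \wedge q)\, P(h \mid q)}{P(e \mid q)}
  && \text{dividing by } P(e \mid q) > 0
\end{align*}
```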

Properties Derived from Axioms (Cont)
- [Sum to 1] For any $Z \subseteq V$, $\sum_z P(z) = 1$, where $z$ is an assignment over $Z$.
- Marginalization and Sum to 1 hold for conditional probability as well.
- [Sum from atomic events] For any proposition $q$, $P(q) = \sum_z P(z)$, where $z$ ranges over the atomic events that entail $q$.

Why Should Agent Belief be Probability?
- Agent Ag having degree of belief $p$ in proposition $e$ implies the following:
    When offered the alternative lotteries $L_1 = \{\$1 \mid e,\ \$0 \mid \neg e\}$ and $L_2 = \{\$p \mid e \vee \neg e\}$, Ag is indifferent between them.
    Ag is indifferent among the lotteries $L_3 = \{\$(p-1) \mid e,\ \$p \mid \neg e\}$, $L_4 = \{\$(1-p) \mid e,\ -\$p \mid \neg e\}$, and $L_5 = \{\$0 \mid e \vee \neg e\}$.
- Ag's degree of belief either follows the axioms of probability (hence it is probability), or it does not.
- There exists a combination lottery that guarantees Ag's loss iff its degree of belief is not probability.

When Agent Belief is Not Probability
(payoffs in dollars, one column per possible world)

    Proposition  Belief  Lottery                          a∧b    a∧¬b   ¬a∧b   ¬a∧¬b
    a            0.1     {-0.9 | a, 0.1 | ¬a}             -0.9   -0.9    0.1    0.1
    b            0.6     {-0.4 | b, 0.6 | ¬b}             -0.4    0.6   -0.4    0.6
    a∨b          0.8     {0.2 | a∨b, -0.8 | ¬(a∨b)}        0.2    0.2    0.2   -0.8
    Combo lottery                                         -1.1   -0.1   -0.1   -0.1

Reflection
- In stochastic and partially observable envs, an agent frequently faces decision situations like lotteries.
- If an agent does not base its belief on probability, it will encounter situations (by nature or by design) where its performance is guaranteed to be sub-optimal.
- The only way an agent can always avoid such an undesirable position is to adopt probabilistic belief.
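The combination lottery from the "When Agent Belief is Not Probability" slide can be checked mechanically. The sketch below takes the beliefs 0.1, 0.6 and 0.8 from the slide, builds the $L_3$-style lotteries for $a$ and $b$ and the $L_4$-style lottery for $a \vee b$, and sums the payoffs in each of the four possible worlds. The function names are illustrative, not from the slides.

```python
from itertools import product


def lottery_l3(p, holds):
    """Payoff of L3 = {$(p-1) | e, $p | not e} given whether e holds."""
    return p - 1.0 if holds else p


def lottery_l4(p, holds):
    """Payoff of L4 = {$(1-p) | e, -$p | not e} given whether e holds."""
    return 1.0 - p if holds else -p


def combo_payoffs(belief_a, belief_b, belief_a_or_b):
    """Total payoff of the three lotteries in each world (a, b)."""
    payoffs = {}
    for a, b in product([True, False], repeat=2):
        payoffs[(a, b)] = (
            lottery_l3(belief_a, a)
            + lottery_l3(belief_b, b)
            + lottery_l4(belief_a_or_b, a or b)
        )
    return payoffs


if __name__ == "__main__":
    # Beliefs from the slide: P(a)=0.1, P(b)=0.6, P(a or b)=0.8.
    for world, payoff in combo_payoffs(0.1, 0.6, 0.8).items():
        print(world, round(payoff, 2))
```

Every world yields a loss because the beliefs violate $P(a \vee b) \le P(a) + P(b)$, a bound that follows from the axioms. The agent regards each lottery on its own as fair, yet the combination loses money whatever happens.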

Probabilistic Inference
- To act effectively, an agent needs to determine the state of the env given its perception (observation).
- In a stochastic and partially observable env, this task becomes the computation of the posterior distribution $P(X \mid e)$ from the prior distribution $P(X)$ and observation $e$, where $X \subseteq V$.
- This task is referred to as probabilistic inference.
- Ex: Given P(hygiene, cavity, toothache) and the observation toothache=yes, what is the posterior P(hygiene | toothache=yes)?

Ex Dental Hygiene
$V$ = {hygiene, cavity, toothache}, where hygiene ∈ {good, bad}, cavity ∈ {yes, no}, and toothache ∈ {yes, no}.

    h     c    t    P(h,c,t)
    good  yes  yes  0.0595
    good  yes  no   0.0105
    good  no   yes  0.0315
    good  no   no   0.5985
    bad   yes  yes  0.2040
    bad   yes  no   0.0360
    bad   no   yes  0.0030
    bad   no   no   0.0570

Inference Using Full Joint Distribution
- Query: What is P(hygiene | toothache=yes)?
- Algorithm
    1. Constrain $P(h,c,t)$ to $P(h,c,t=y)$.
    2. Compute $P(t=y)$.
    3. Condition $P(h,c,t=y)$ into $P(h,c,t \mid t=y)$.
    4. Marginalize out $c$ and $t$ to get $P(h \mid t=y)$.
- Normalization: Steps 2 and 3 above
- Normalization constant: $\alpha = 1/P(t=y)$
- Hence, $P(h \mid t=y) = \alpha \sum_{c,t} P(h,c,t=y)$.

Inference by Bayes Rule
- [Bayes rule] $P(h \mid e) = P(e \mid h)\,P(h) / P(e)$
- Compute the probability of hypothesis $h$ given observation $e$.
- Ex: Meningitis patients often have a stiff neck.
    $P(s \mid m)$: knowledge of the causal mechanism
    $P(m)$: disease incidence rate in the population
    $P(s)$: symptom statistics from the population
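The four-step algorithm for inference with the full joint can be traced on the dental hygiene table. The sketch below uses the eight probabilities from the slide. The dictionary layout and function name are illustrative choices, not part of the original material.

```python
# Full joint P(h, c, t) from the Dental Hygiene slide, keyed by (h, c, t).
JOINT = {
    ("good", "yes", "yes"): 0.0595,
    ("good", "yes", "no"): 0.0105,
    ("good", "no", "yes"): 0.0315,
    ("good", "no", "no"): 0.5985,
    ("bad", "yes", "yes"): 0.2040,
    ("bad", "yes", "no"): 0.0360,
    ("bad", "no", "yes"): 0.0030,
    ("bad", "no", "no"): 0.0570,
}


def posterior_hygiene(joint, toothache="yes"):
    """Return P(hygiene | toothache) following the four steps on the slide."""
    # Step 1: constrain the joint to entries consistent with the observation.
    constrained = {k: p for k, p in joint.items() if k[2] == toothache}
    # Step 2: probability of the observation, P(t = y).
    p_obs = sum(constrained.values())
    # Step 3: condition, i.e. multiply by alpha = 1 / P(t = y).
    conditioned = {k: p / p_obs for k, p in constrained.items()}
    # Step 4: marginalize out cavity and toothache.
    posterior = {}
    for (h, _c, _t), p in conditioned.items():
        posterior[h] = posterior.get(h, 0.0) + p
    return posterior


if __name__ == "__main__":
    for h, p in posterior_hygiene(JOINT).items():
        print(h, round(p, 4))
```

With these numbers $P(t = y) = 0.298$, so observing a toothache moves the belief in bad hygiene from the prior 0.3 to roughly 0.69.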

Complexity of Inference Using Full Joint
- Suppose $|V| = n$, $|D_x| \le k$ for $x \in V$, and $V' = V \setminus \{x\}$.
- How many independent probability parameters are needed to specify $P(V)$?
- How many parameters in $P(V)$ make up $P(V' \mid x = x_0)$?
- These parameters must be processed to compute $P(y \mid x = x_0)$ for $y \in V'$.
- Inference using full joint distributions is intractable!
- Idea for efficiency improvement: explore independence using graphical models.

Explore Independence
- Inference by full joint assumes that every variable is directly dependent on every other. This is often not the case in an application env.
- Ex: Knowing a child's dental hygiene affects the belief that the child suffers from toothache. But $P(t \mid c, h) = P(t \mid c)$.
- This allows the full joint over $V = \{h, c, t\}$ to be specified with fewer (independent) parameters.
- Variables $x$ and $y$ are conditionally independent given variable $z$ iff $P(x' \mid y', z') = P(x' \mid z')$ holds for every value $x'$, $y'$, $z'$ of $x$, $y$, $z$, respectively.

Conditional Independence
- When $x$ and $y$ are conditionally independent given $z$, we write $P(x \mid y, z) = P(x \mid z)$ and $I(x, z, y)$.
- Subsets of variables $X$ and $Y$ are conditionally independent given $Z$, where $X$, $Y$ and $Z$ are disjoint, iff $P(x \mid y, z) = P(x \mid z)$ holds for every assignment $x$, $y$, $z$ of $X$, $Y$, $Z$, respectively. We write $P(X \mid Y, Z) = P(X \mid Z)$ and $I(X, Z, Y)$.
- Ex: $V = \{x_0, \dots, x_9\}$ and $|D_i| = 2$, where $x_{k+1}$ ($k = 1, \dots, 8$) is conditionally independent of $x_0, \dots, x_{k-1}$ given $x_k$. How many parameters are needed to specify $P(V)$?
- Symmetry of $I(X, Z, Y)$ and unconditional independence

Encode Conditional Independence Concisely
- When $V$ is large, how can an agent encode knowledge of conditional independence concisely?
- Many dependence and independence relations can be encoded in a graph:
    1. Direct dependence
    2. Indirect dependence
    3. Causal dependence
    4. Unconditional independence
    5. Conditional independence
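The parameter-counting questions above can be answered with a short calculation. The sketch below counts independent parameters for a full joint and for the ten-variable chain example. It assumes the chain factorizes as $P(V) = P(x_0) \prod_i P(x_i \mid x_{i-1})$, which follows from the stated independencies via the product rule. The function names are illustrative.

```python
def full_joint_params(domain_sizes):
    """Independent parameters in a full joint: one per assignment, minus 1
    because the probabilities must sum to 1."""
    total = 1
    for size in domain_sizes:
        total *= size
    return total - 1


def chain_params(domain_sizes):
    """Independent parameters when each variable depends only on its
    predecessor: P(x_0) plus one CPT P(x_i | x_{i-1}) per later variable."""
    count = domain_sizes[0] - 1
    for prev, cur in zip(domain_sizes, domain_sizes[1:]):
        # One distribution (cur - 1 free values) per value of the predecessor.
        count += prev * (cur - 1)
    return count


if __name__ == "__main__":
    sizes = [2] * 10  # ten binary variables x_0 .. x_9
    print("full joint:", full_joint_params(sizes))
    print("chain:", chain_params(sizes))
```

For ten binary variables the full joint needs 1023 independent parameters, while the chain needs 19. The first count grows exponentially in the number of variables and the second grows linearly.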

Bayesian Network (BN)
- A BN is an acyclic, directed graph $G$, where each node is associated with a CPT.
- $V$ is a set of discrete env variables.
- Each node in $G$ is labeled by a variable $v \in V$.
- Each node $v$ with parent set $\pi(v)$ is associated with the conditional probability table $P(v \mid \pi(v))$.
- Ex: Burglar-quake
    A home burglar alarm is reliable, but responds to quakes on occasion.
    Neighbor John calls if the alarm is heard, but may confuse it with the phone ringing.
    Mary also calls, but may miss the alarm due to loud music.
- Number of independent probability parameters

Semantics of BNs
1. A variable $v$ in a BN is conditionally independent of its non-descendants given its parents $\pi(v)$.
- [Chain rule] Given a BN over $V$, the full joint distribution is $P(V) = \prod_{v \in V} P(v \mid \pi(v))$.
- Ex: Reduced BN for burglar alarm.
- The Markov blanket of a variable in a BN includes its parents, children, and children's parents.
2. A variable in a BN is conditionally independent of all other variables given its Markov blanket.

Inference in BNs
- BNs avoid the specification of an intractable number of probability parameters through the chain rule.
- Ex: Burglar-quake. Compute $P(b \mid j=t, m=t)$ through the chain rule.
- Key operations
    Product: multiply($B_1(X_1), \dots, B_m(X_m)$). Multiply a set of potentials.
    Marginalization: marginout($B(X)$, $Z$). Marginalize out the variables in $Z$ from potential $B(X)$.
- A potential is a non-negative real function with at least one positive value.

Ex Burglary-Quake
Variables: $b$ (burglary), $q$ (quake), $a$ (alarm), $j$ (John calls), $m$ (Mary calls). Arcs: $b \to a$, $q \to a$, $a \to j$, $a \to m$. Compute $P(b \mid j=t, m=t)$.

    b  P(b)          q  P(q)
    t  0.001         t  0.002
    f  0.999         f  0.998

    a  j  P(j|a)     a  m  P(m|a)
    t  t  0.90       t  t  0.70
    t  f  0.10       t  f  0.30
    f  t  0.05       f  t  0.01
    f  f  0.95       f  f  0.99

    b  q  a  P(a|b,q)
    t  t  t  0.95
    t  t  f  0.05
    t  f  t  0.94
    t  f  f  0.06
    f  t  t  0.29
    f  t  f  0.71
    f  f  t  0.001
    f  f  f  0.999
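The chain rule makes the burglary-quake query computable directly from the five CPTs. The sketch below builds each full-joint entry as a product of CPT entries and answers $P(b \mid j=t, m=t)$ by brute-force enumeration over the hidden variables $q$ and $a$. The CPT numbers are those of the example. The helper names and the choice to store only the probability of the value `True` are illustrative.

```python
from itertools import product

# CPTs of the burglary-quake BN; only P(. = true | parents) is stored.
P_B_TRUE = 0.001
P_Q_TRUE = 0.002
P_A_TRUE = {  # keyed by (b, q)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J_TRUE = {True: 0.90, False: 0.05}  # keyed by a
P_M_TRUE = {True: 0.70, False: 0.01}  # keyed by a


def prob(p_true, value):
    """Probability of a binary variable taking `value` given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(b, q, a, j, m):
    """Chain rule: P(b,q,a,j,m) = P(b) P(q) P(a|b,q) P(j|a) P(m|a)."""
    return (
        prob(P_B_TRUE, b)
        * prob(P_Q_TRUE, q)
        * prob(P_A_TRUE[(b, q)], a)
        * prob(P_J_TRUE[a], j)
        * prob(P_M_TRUE[a], m)
    )


def posterior_burglary(j=True, m=True):
    """P(b | j, m) by summing the joint over q and a, then normalizing."""
    unnormalized = {
        b: sum(joint(b, q, a, j, m) for q, a in product([True, False], repeat=2))
        for b in (True, False)
    }
    total = sum(unnormalized.values())
    return {b: p / total for b, p in unnormalized.items()}


if __name__ == "__main__":
    print(round(posterior_burglary()[True], 3))
```

The BN is specified with 1 + 1 + 4 + 2 + 2 = 10 independent parameters instead of the 31 a full joint over five binary variables would need. Even with both neighbors calling, the posterior probability of a burglary stays below 0.3 because burglaries are rare a priori.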

Multiply Potentials

    multiply(B_1(X_1), ..., B_m(X_m)) {
      Z = X_1 ∪ ... ∪ X_m;
      init F(Z) s.t. F(z) = 1 for each assignment z of Z;
      for each assignment z of Z,
        for i = 1 to m,
          x_i = assignment of X_i consistent with z;
          F(z) *= B_i(x_i);
      return F(Z);
    }

Marginalize Out Variables

    marginout(B(X), Z) {
      Y = X \ Z;
      init F(Y) s.t. F(y) = 0 for each assignment y of Y;
      for each assignment y of Y,
        for each assignment z of Z,
          F(y) += B(y, z);
      return F(Y);
    }

BN Inference By Variable Elimination
- When BN inference is based on the full joint, it still processes an intractable number of parameters.
- Ideas to improve inference efficiency
    Avoid processing the full joint directly.
    Reduce the size of intermediate results (potentials or factors).
    Marginalize as soon as possible.
    Multiply as late as possible.
- Algorithm varelim(S, x, y): Given a BN S, a query variable x, and observation y over Y, compute the posterior P(x | y).

    varelim(S, x, y) {
      // S: BN over V; y: observation over Y ⊂ V; x ∈ V \ Y
      T = set of all CPTs in S;
      for each v ∈ Y with v = u in y,
        pick potential B over v in T;
        constrain B by v = u;
      for i = 1 to |V| - 1,
        select variable z ≠ x covered by a potential in T;
        Q = set of potentials over z in T;
        B_1 = multiply(potentials in Q);
        B_2 = marginout(B_1, z);
        T = (T \ Q) ∪ {B_2};
      F = multiply(potentials in T);
      normalize F and return result;
    }
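The three routines above can be turned into a small runnable sketch. It makes several assumptions that the pseudocode leaves open: all variables are binary, a potential is stored as a `(variables, table)` pair, "constrain" is read as zeroing the entries inconsistent with the observation (so the observed variable stays in the potential and is summed out later), and variables are eliminated in alphabetical order rather than by any heuristic. The CPT values are those of the burglary-quake example.

```python
from itertools import product

DOMAIN = (True, False)  # assumption: every variable is binary


def make_potential(variables, fn):
    """Build (variables, table); fn maps an assignment dict to a value."""
    table = {
        vals: fn(dict(zip(variables, vals)))
        for vals in product(DOMAIN, repeat=len(variables))
    }
    return (tuple(variables), table)


def multiply(potentials):
    """Pointwise product over the union of the potentials' variables."""
    union = []
    for variables, _ in potentials:
        union += [v for v in variables if v not in union]

    def value(assign):
        result = 1.0
        for variables, table in potentials:
            result *= table[tuple(assign[v] for v in variables)]
        return result

    return make_potential(union, value)


def marginout(potential, drop):
    """Sum out the variables in the set `drop`."""
    variables, table = potential
    keep = tuple(v for v in variables if v not in drop)
    out = {}
    for vals, p in table.items():
        key = tuple(val for v, val in zip(variables, vals) if v not in drop)
        out[key] = out.get(key, 0.0) + p
    return (keep, out)


def constrain(potential, var, value):
    """Zero entries inconsistent with var=value; the variable is kept."""
    variables, table = potential
    i = variables.index(var)
    return (variables, {k: (p if k[i] == value else 0.0) for k, p in table.items()})


def varelim(cpts, query, evidence):
    """Posterior of `query` given `evidence` (a dict var -> value)."""
    pots = list(cpts)
    for var, value in evidence.items():
        idx = next(i for i, (vs, _) in enumerate(pots) if var in vs)
        pots[idx] = constrain(pots[idx], var, value)
    all_vars = {v for vs, _ in pots for v in vs}
    for z in sorted(all_vars - {query}):  # arbitrary (alphabetical) order
        covering = [p for p in pots if z in p[0]]
        rest = [p for p in pots if z not in p[0]]
        pots = rest + [marginout(multiply(covering), {z})]
    _, table = multiply(pots)
    total = sum(table.values())
    return {vals[0]: p / total for vals, p in table.items()}


def prob(p_true, value):
    return p_true if value else 1.0 - p_true


P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
CPTS = [
    make_potential(["b"], lambda s: prob(0.001, s["b"])),
    make_potential(["q"], lambda s: prob(0.002, s["q"])),
    make_potential(["b", "q", "a"], lambda s: prob(P_A[(s["b"], s["q"])], s["a"])),
    make_potential(["a", "j"], lambda s: prob(0.90 if s["a"] else 0.05, s["j"])),
    make_potential(["a", "m"], lambda s: prob(0.70 if s["a"] else 0.01, s["m"])),
]

if __name__ == "__main__":
    print(round(varelim(CPTS, "b", {"j": True, "m": True})[True], 3))
```

Because alphabetical order eliminates the alarm variable first, the first intermediate potential covers all five variables, which is exactly the blow-up that a good elimination order is meant to avoid. The result is still correct, only more expensive to obtain.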

Select Variable to Eliminate
- Heuristic: Select the variable whose elimination produces the smallest potential.
- Algorithm

    for each variable v to be eliminated,
      U_v = {};
      for each potential B(X) s.t. v ∈ X,
        U_v = U_v ∪ X;
    return variable y s.t. |U_y| is minimal;

Complexity of VE
- What is the complexity if the BN has a chain topology?
- If each variable in the BN is directly connected to every other variable, what is the complexity?
- The sparser the BN topology, the more efficient the inference computation.
- VE can be extended to query about a set $X \subseteq V$ of variables directly.

Remarks
- Logic reasoning was once considered sufficient for AI, but that view was soon found to be flawed.
- BNs and related graphical models are the dominant paradigm for uncertain reasoning in AI today.
- More advanced topics
    Acquisition of BNs by learning or elicitation
    Alternative BN inference paradigms
    BNs with mixed variables or under dynamic envs
    Multi-agent BNs
    Decision making with graphical models
    Uncertain reasoning in practical applications
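The selection heuristic can be sketched by working only with the scopes (variable sets) of the current potentials. The example scopes below are those of the five burglary-quake CPTs with $b$ as the query variable. Measuring size by the number of variables in $U_v$ and breaking ties alphabetically are assumptions of this sketch, since the slide does not say how ties are resolved.

```python
def merged_scope(scopes, v):
    """U_v: union of the scopes of all potentials that mention v."""
    merged = set()
    for scope in scopes:
        if v in scope:
            merged |= scope
    return merged


def select_variable(scopes, candidates):
    """Return the candidate whose U_v has the fewest variables.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    return min(sorted(candidates), key=lambda v: len(merged_scope(scopes, v)))


# Scopes of P(b), P(q), P(a|b,q), P(j|a), P(m|a) in the burglary-quake BN.
SCOPES = [{"b"}, {"q"}, {"b", "q", "a"}, {"a", "j"}, {"a", "m"}]

if __name__ == "__main__":
    print(select_variable(SCOPES, {"q", "a", "j", "m"}))
```

Here $|U_j| = |U_m| = 2$, $|U_q| = 3$ and $|U_a| = 5$, so a leaf such as $j$ is eliminated first and the alarm variable, which touches every other variable, is postponed.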