Introduction to Artificial Intelligence. Unit # 11



Course Outline
- Overview of Artificial Intelligence
- State Space Representation
- Search Techniques
- Machine Learning
- Logic
- Probabilistic Reasoning/Bayesian Networks
- Knowledge Elicitation
- Inference in BNs
- Evolutionary Computation
- Miscellaneous Topics (depending upon the availability of time): Computer Vision, Introduction to Robotics, Reinforcement Learning

Acknowledgement The slides for this lecture have been taken from the lecture slides of CS307 Introduction to Artificial Intelligence by Dr. Sajjad Haider, Artificial Intelligence Lab - IBA, Spring 2011.

Recap Joint Distribution: A joint probability gives us a full picture of how random variables are related. Marginal Distribution: Marginalization lets us focus on one aspect of the picture. Prior probability: Belief about a hypothesis h before obtaining observations. Example: Suppose 10% of people suffer from Hepatitis B. A doctor's prior probability that a new patient suffers from Hepatitis B is 0.1. Posterior probability: Belief about a hypothesis after obtaining observations.

Bayes Theorem
P(A | B) = P(B | A) P(A) / P(B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | ¬A) P(¬A)]
P(A) is the prior probability and P(A | B) is the posterior probability.
Suppose events A_1, A_2, ..., A_k are mutually exclusive and exhaustive, i.e., exactly one of the events must occur. Then for any event B:
P(A_i | B) = P(B | A_i) P(A_i) / Σ_j P(B | A_j) P(A_j)

Probabilistic Reasoning Logic is used to represent and reason about knowledge of the world with facts and rules. bird(tweety). fly(X) :- bird(X). In many cases, certain rules are hard to find. Probability theory is the most popular way of representing uncertainty. lung_cancer(X) :- smoking(X). is replaced by P(Lung Cancer | Smoking) = 0.6

Bayes Theorem Example Suppose that Bob can decide to go to work by one of three modes of transportation: car, bus, or commuter train. Because of high traffic, if he decides to go by car, there is a 50% chance he will be late. If he goes by bus, which has special reserved lanes but is sometimes overcrowded, the probability of being late is only 20%. The commuter train is almost never late, with a probability of only 1%, but is more expensive than the bus. Suppose that Bob is late one day, and his boss wishes to estimate the probability that he drove to work that day by car. Since he does not know which mode of transportation Bob usually uses, he gives a prior probability of 1/3 to each of the three possibilities. What is the boss's estimate of the probability that Bob drove to work by car?

Bayes Theorem Example Suppose that a coworker of Bob's knows that he almost always takes the commuter train to work, never takes the bus, but sometimes, 10% of the time, takes the car. What is the coworker's probability that Bob drove to work that day, given that he was late? [Network: Mode of Transportation → Late]
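The boss and the coworker differ only in their priors over the transport modes; the likelihoods P(Late | mode) are the same. A minimal Python sketch of this Bayes-theorem calculation follows; the dictionary layout and the function name posterior are just illustrative choices, not part of the original slides.

```python
# Bayes' theorem for the "Bob is late" example:
# P(mode | late) = P(late | mode) P(mode) / sum_m P(late | m) P(m)

def posterior(priors, likelihoods):
    """Return P(mode | late) for every mode, given P(mode) and P(late | mode)."""
    evidence = sum(likelihoods[m] * priors[m] for m in priors)  # P(late)
    return {m: likelihoods[m] * priors[m] / evidence for m in priors}

p_late = {"car": 0.50, "bus": 0.20, "train": 0.01}   # P(late | mode)

boss_prior     = {"car": 1 / 3, "bus": 1 / 3, "train": 1 / 3}
coworker_prior = {"car": 0.10, "bus": 0.00, "train": 0.90}

print(posterior(boss_prior, p_late)["car"])       # boss: P(car | late) ~ 0.70
print(posterior(coworker_prior, p_late)["car"])   # coworker: P(car | late) ~ 0.85
```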

Earthquake Example (Pearl) You have a new burglar alarm installed. It is fairly reliable at detecting burglary, but it also responds to minor earthquakes. Two neighbors (John, Mary) promise to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm (and calls then also). Mary likes loud music and sometimes misses the alarm. Given evidence about who has and hasn't called, estimate the probability of a burglary.

Cause and Effect Relationships
Burglary → Alarm Goes Off
Earthquake → Alarm Goes Off
Alarm Goes Off → John Calls
Alarm Goes Off → Mary Calls
Number of Variables: 5. Number of Links: 4

Earthquake Example (Pearl)
Burglary: P(B) = 0.1. Earthquake: P(E) = 0.02.
Alarm, P(A | B, E): B=T, E=T: 0.95; B=T, E=F: 0.94; B=F, E=T: 0.29; B=F, E=F: 0.001
JohnCalls, P(J | A): A=T: 0.9; A=F: 0.05
MaryCalls, P(M | A): A=T: 0.7; A=F: 0.01
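With these tables the full joint factorises as P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A), so any query can be answered by brute-force enumeration. A minimal sketch using the CPT values printed above; the query P(B | J=T, M=T) is just one illustrative example.

```python
from itertools import product

# CPTs from the slide (probabilities of the True value).
P_B = {True: 0.1, False: 0.9}
P_E = {True: 0.02, False: 0.98}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.9, False: 0.05}                        # P(J=T | A)
P_M = {True: 0.7, False: 0.01}                        # P(M=T | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of the local tables."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(Burglary | JohnCalls=T, MaryCalls=T) by summing out the hidden variables.
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
print(num / den)   # ~0.93 with these tables
```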

Graphical Models Models are descriptions of how (a portion of) the world works. Models are always simplifications: they may not account for every variable, or for all interactions between variables. What do we do with probabilistic models? We need to reason about unknown variables, given evidence. Example: explanation (diagnostic reasoning). Example: prediction (causal reasoning).

Bayesian Networks Two problems with using full joint distributions for probabilistic models: unless there are only a few variables, the joint is far too big to represent explicitly, and it is hard to estimate anything empirically about more than a few variables at a time. Bayesian networks are a technique for describing complex joint distributions using many simple, local distributions. We describe how variables interact locally; local interactions chain together to give global, indirect interactions.

Bayesian Networks A problem domain is modeled by a list of variables X_1, X_2, ..., X_n. Knowledge about the problem domain is represented by a joint probability P(X_1, X_2, ..., X_n). A general probability distribution over 8 binary variables has 2^8 = 256 possible values, and 2^8 − 1 = 255 probabilities need to be specified: exponential model size, difficult knowledge acquisition, and exponential storage and inference. A Bayes net is a graphical representation of the joint distribution. It assumes that each node is conditionally independent of all its non-descendants given its parents. The product of all the conditional probabilities is the joint probability of all variables:
P(X_1, X_2, ..., X_n) = ∏_i P(X_i | parents(X_i))
[Example network over eight binary variables A-H; with that structure only 18 probabilities are required to specify the joint distribution.]
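The saving comes from the factorisation: a full joint over n binary variables needs 2^n − 1 numbers, while a Bayes net needs one number per assignment of each node's parents. The slide's eight-node network is shown only as a figure, so the structure in the sketch below is a hypothetical one, chosen so that it happens to need 18 parameters, matching the count quoted above.

```python
def full_joint_params(n_binary_vars):
    """Independent numbers needed for a full joint over n binary variables."""
    return 2 ** n_binary_vars - 1

def bayes_net_params(parents):
    """Independent numbers needed for a Bayes net over binary variables:
    each node needs one probability per joint assignment of its parents."""
    return sum(2 ** len(ps) for ps in parents.values())

# Hypothetical 8-node structure (the slide's exact graph is only shown as a figure).
parents = {
    "A": [], "B": [], "C": ["A", "B"], "D": ["C"],
    "E": ["C"], "F": ["D"], "G": ["E"], "H": ["F", "G"],
}

print(full_joint_params(8))       # 255
print(bayes_net_params(parents))  # 18 for this hypothetical structure
```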

Bayesian Networks A BN is a Directed Acyclic Graph (DAG) in which: a set of random variables makes up the nodes in the network; a set of directed links or arrows connects pairs of nodes; and each node has a conditional probability table that quantifies the effects the parents have on the node. The intuitive meaning of an arrow from a parent to a child is that the parent directly influences the child. The direction of this influence is often taken to represent causal influence. These influences are quantified by conditional probabilities.

Inference Problem The canonical inference problem: find the posterior probability distribution for some variable(s) given direct or virtual evidence about other variable(s).

Cancer Example (Pearl) Metastatic cancer is a possible cause of a brain tumor and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumor.

Cancer Example (Pearl) [Network diagram: Metastatic Cancer → Increased Serum Calcium, Metastatic Cancer → Brain Tumor; Increased Serum Calcium → Coma, Brain Tumor → Coma; Brain Tumor → Severe Headache]


Revised Bob late to work Example
Mode of Transportation: {Car, Not Car (Public Transport)}; Late: {True, False}
P(C) = 1/3, P(¬C) = 2/3
P(L | C) = 0.4, P(L | ¬C) = 0.2
[Network: Mode of Transportation → Late]
Queries: P(L) = ?, P(C | L) = ?, P(¬C | L) = ?

Revised Bob late to work Example One approach is to compute the joint distribution through P(X_1, X_2, ..., X_n) = ∏_i P(X_i | parents(X_i)):
P(L, C) = P(L | C) P(C) = 0.4 × 0.33
P(L, ¬C) = P(L | ¬C) P(¬C) = 0.2 × 0.67
P(¬L, C) = P(¬L | C) P(C) = 0.6 × 0.33
P(¬L, ¬C) = P(¬L | ¬C) P(¬C) = 0.8 × 0.67
Now compute P(L) and P(C | L):
P(L) = P(L, C) + P(L, ¬C)
P(C | L) = P(C, L) / P(L)

Revised Bob late to work Example Or we could use Bayes-Theorem-based propagation:
P(C | L) = P(L | C) P(C) / P(L)
where P(L) = P(L | C) P(C) + P(L | ¬C) P(¬C) = 0.4 × 0.33 + 0.2 × 0.67 = ?
Both approaches (on the previous and the current slides) should give the same result.
[Network: Mode of Transportation → Late; inference direction from Late back to Mode of Transportation]
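A small Python sketch of both routes, using the exact prior 1/3 rather than the rounded 0.33; the variable names are illustrative only.

```python
# Revised "Bob is late" model: C (takes the car) -> L (late).
p_c = 1 / 3                            # P(C); the slides round this to 0.33
p_l_given_c = {True: 0.4, False: 0.2}  # P(L = T | C)

# Approach 1: build the joint P(L, C) from the network factorisation.
joint = {(l, c): (p_l_given_c[c] if l else 1 - p_l_given_c[c]) * (p_c if c else 1 - p_c)
         for l in (True, False) for c in (True, False)}
p_l = joint[(True, True)] + joint[(True, False)]   # P(L) ~ 0.267
p_c_given_l = joint[(True, True)] / p_l            # P(C | L) = 0.5

# Approach 2: Bayes-theorem propagation, as on this slide.
p_l_2 = p_l_given_c[True] * p_c + p_l_given_c[False] * (1 - p_c)
p_c_given_l_2 = p_l_given_c[True] * p_c / p_l_2

print(p_l, p_c_given_l)      # ~0.267 and 0.5
print(p_l_2, p_c_given_l_2)  # identical values
```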

Lecturer's Life Example (Korb & Nicholson) Dr. Ann Nicholson spends 60% of her work time in her office. The rest of her work time is spent elsewhere. When Ann is in her office, half the time her light is off (when she is trying to hide from students and get some real work done). When she is not in her office, she leaves her light on only 5% of the time. 80% of the time she is in her office, Ann is logged onto the computer. Because she sometimes logs onto the computer from home, 10% of the time when she is not in her office, she is still logged onto the computer. Suppose a student checks Dr. Nicholson's login status and sees that she is logged on. What effect does this have on the student's belief that Dr. Nicholson's light is on?

Lecturer's Life Example (Cont'd)
In Office (O): {T, F}; Lights On (L): {T, F}; Logged on to the Computer (C): {T, F}
P(O) = 0.6
P(L | O) = 0.5, P(L | ¬O) = 0.05
P(C | O) = 0.8, P(C | ¬O) = 0.1
[Network: In Office → Lights On, In Office → Logged on to the Computer]
Query: P(L | C) = ?

Lecturer's Life Example (Cont'd) One possibility is to compute the joint distribution and then compute the conditional probability, using P(X_1, X_2, ..., X_n) = ∏_i P(X_i | parents(X_i)):
P(O, L, C) = P(L | O) P(C | O) P(O)
P(O, L, ¬C) = P(L | O) P(¬C | O) P(O)
P(O, ¬L, C) = P(¬L | O) P(C | O) P(O)
P(O, ¬L, ¬C) = P(¬L | O) P(¬C | O) P(O)
P(¬O, L, C) = P(L | ¬O) P(C | ¬O) P(¬O)
P(¬O, L, ¬C) = P(L | ¬O) P(¬C | ¬O) P(¬O)
P(¬O, ¬L, C) = P(¬L | ¬O) P(C | ¬O) P(¬O)
P(¬O, ¬L, ¬C) = P(¬L | ¬O) P(¬C | ¬O) P(¬O)
After computing the joint distribution, we can compute P(L | C).
[Network: In Office → Lights On, In Office → Logged on to the Computer; inference direction from Logged on to the Computer toward Lights On]

Lecturer's Life Example (Cont'd) The other approach is to do inference using probability propagation based on Bayes Theorem.
First compute the posterior probability P(O | C), referred to as P(O*):
P(O*) = P(C | O) P(O) / P(C)
And then P(L | C), referred to as P(L*):
P(L*) = P(L | O) P(O*) + P(L | ¬O) P(¬O*)
[Network: In Office → Lights On, In Office → Logged on to the Computer; inference direction from Logged on to the Computer toward Lights On]
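A short Python sketch of both approaches for this model; the dictionary names are illustrative. Both routes give the same answer, P(L | C) ≈ 0.465.

```python
# Lecturer's life model: O (in office) -> L (lights on), O -> C (logged on).
p_o = 0.6
p_l_o = {True: 0.5, False: 0.05}   # P(L = T | O)
p_c_o = {True: 0.8, False: 0.1}    # P(C = T | O)

# Approach 1: full joint over (O, L, C), then condition on C = True.
joint = {(o, l, c): (p_o if o else 1 - p_o)
                    * (p_l_o[o] if l else 1 - p_l_o[o])
                    * (p_c_o[o] if c else 1 - p_c_o[o])
         for o in (True, False) for l in (True, False) for c in (True, False)}
p_c = sum(v for (o, l, c), v in joint.items() if c)                     # P(C) = 0.52
p_l_given_c = sum(v for (o, l, c), v in joint.items() if c and l) / p_c

# Approach 2: propagation via the posterior P(O*) = P(O | C).
p_o_star = p_c_o[True] * p_o / p_c
p_l_star = p_l_o[True] * p_o_star + p_l_o[False] * (1 - p_o_star)

print(p_l_given_c, p_l_star)   # both ~0.465
```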

Inference in Singly Connected Networks
P(y1) = P(y1 | x1) P(x1) + P(y1 | x2) P(x2)
P(y2) = 1 − P(y1)
P(z1) = P(z1 | y1) P(y1) + P(z1 | y2) P(y2)
P(z2) = 1 − P(z1)
P(w1) = ???

Inference in Singly Connected Networks Suppose we get evidence that w1 is true, i.e., P(w1) = 1. Now compute the posterior probabilities P*(z1), P*(y1), P*(x1).
P*(z1) = P(z1 | w1) P(w1) + P(z1 | w2) P(w2), with P(w1) = 1 and P(w2) = 0.
Computing P(z1 | w1) using Bayes theorem:
P(z1 | w1) = P(w1 | z1) P(z1) / P(w1) = 0.5 × 0.652 / 0.5348 = 0.61
=> P*(z1) = 0.61 × 1 + 0 = 0.61
Computation of P*(y1) on the next slide.

Inference in Singly Connected Networks Now update the probability of Y.
P*(y1) = P(y1 | z1) P*(z1) + P(y1 | z2) P*(z2), with P*(z1) = 0.61 and P*(z2) = 0.39 (computed on the previous slide).
Computing P(y1 | z1) and P(y1 | z2) using Bayes theorem:
P(y1 | z1) = P(z1 | y1) P(y1) / P(z1) = 0.7 × 0.84 / 0.652 = 0.90
P(y1 | z2) = P(z2 | y1) P(y1) / P(z2) = 0.3 × 0.84 / 0.348 = 0.72
Finally, plug the values into the equation at the top:
P*(y1) = P(y1 | z1) P*(z1) + P(y1 | z2) P*(z2) = 0.90 × 0.61 + 0.72 × 0.39 = 0.83
Confirm the results using GeNIe. Finally, compute P*(x1).
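The same numbers can be reproduced by brute force. The slides do not print P(z1 | y2) or P(w1 | z2); the values 0.4 and 0.6 used below are the ones consistent with the quoted marginals P(z1) = 0.652 and P(w1) = 0.5348, and the prior P(y1) = 0.84 is taken as given. A sketch over the chain Y → Z → W:

```python
# Chain Y -> Z -> W (binary), using the numbers quoted on these slides.
# P(z1 | y2) = 0.4 and P(w1 | z2) = 0.6 are not printed on the slides; they are
# the values consistent with the quoted marginals P(z1) = 0.652 and P(w1) = 0.5348.
p_y1 = 0.84
p_z1_given_y = {True: 0.7, False: 0.4}   # P(z1 | y)
p_w1_given_z = {True: 0.5, False: 0.6}   # P(w1 | z)

# Prior (predictive) propagation down the chain.
p_z1 = sum(p_z1_given_y[y] * (p_y1 if y else 1 - p_y1) for y in (True, False))
p_w1 = sum(p_w1_given_z[z] * (p_z1 if z else 1 - p_z1) for z in (True, False))

# Evidence w1 = True: posteriors by conditioning the joint P(y, z, w1).
joint = {(y, z): (p_y1 if y else 1 - p_y1)
                 * (p_z1_given_y[y] if z else 1 - p_z1_given_y[y])
                 * p_w1_given_z[z]
         for y in (True, False) for z in (True, False)}
norm = sum(joint.values())                                    # P(w1)
post_z1 = sum(v for (y, z), v in joint.items() if z) / norm   # P*(z1) ~ 0.61
post_y1 = sum(v for (y, z), v in joint.items() if y) / norm   # P*(y1) ~ 0.83

print(p_z1, p_w1)         # 0.652, 0.5348
print(post_z1, post_y1)
```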

d-separation d-separation is short for direction-dependent separation. d-separation of vertices in a graph corresponds to conditional independence of the associated random variables, and it is the mathematical basis for efficient inference algorithms in Bayesian networks. Two nodes X and Y are d-separated by a set Z if all paths between X and Y are blocked by Z. When X and Y are d-separated by Z, no information can be transmitted between X and Y given Z; hence X and Y are conditionally independent given Z.

Markov Blanket (Laskey) A node's Markov blanket consists of its parents, children, and the other parents of its children (co-parents). Equivalently, the Markov blanket of node B consists of all other nodes that appear in a local probability table together with B. A node's Markov blanket d-separates it from all other nodes in the graph: a node is conditionally independent of all other nodes given its Markov blanket. [Example graph with nodes A-H]
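Given the graph as a parents dictionary, the blanket can be read off directly. A small sketch, using Pearl's cancer network described earlier in the lecture as the example (the node names are spelled out here for readability):

```python
def markov_blanket(parents, node):
    """Parents, children, and co-parents of `node`, given a dict node -> list of parents."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | co_parents

# Pearl's cancer example, written as a parents dictionary.
cancer_net = {
    "MetastaticCancer": [],
    "IncreasedSerumCalcium": ["MetastaticCancer"],
    "BrainTumor": ["MetastaticCancer"],
    "Coma": ["IncreasedSerumCalcium", "BrainTumor"],
    "SevereHeadache": ["BrainTumor"],
}

print(markov_blanket(cancer_net, "BrainTumor"))
# {'MetastaticCancer', 'Coma', 'SevereHeadache', 'IncreasedSerumCalcium'}
```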

Conditional Dependence/Independence
Causal chains (A → B → C): smoking causes cancer, which causes dyspnoea. If we have evidence on B, then A and C become independent.
Common causes (A ← B → C): cancer is a common cause of the two symptoms, a positive X-ray result and dyspnoea. If we have evidence on B, then A and C become independent.
Common effects (A → B ← C): cancer is a common effect of pollution and smoking. If we have evidence on B, then A and C become dependent.
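These three patterns are exactly what the moralised ancestral-graph test for d-separation captures: keep only the ancestors of X ∪ Y ∪ Z, add an edge between every pair of co-parents, drop edge directions, delete Z, and check whether X and Y are still connected. A self-contained sketch in plain Python (graphs are given as parents dictionaries; the same helper could be used to check the exercises on the next slide once that network's edges are written down):

```python
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """Check whether xs and ys are d-separated given zs (moralised ancestral graph)."""
    # 1. Keep only the ancestors of the variables involved.
    keep, frontier = set(), set(xs) | set(ys) | set(zs)
    while frontier:
        n = frontier.pop()
        if n not in keep:
            keep.add(n)
            frontier |= set(parents[n])
    # 2. Moralise: undirected parent-child edges, plus edges between co-parents.
    adj = {n: set() for n in keep}
    for n in keep:
        for p in parents[n]:
            adj[n].add(p); adj[p].add(n)
        for p, q in combinations(parents[n], 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Remove the conditioning set, then 4. test reachability from xs to ys.
    zs = set(zs)
    reached, stack = set(), [x for x in xs if x not in zs]
    while stack:
        n = stack.pop()
        if n in reached:
            continue
        reached.add(n)
        stack += [m for m in adj[n] if m not in zs and m not in reached]
    return reached.isdisjoint(ys)

chain    = {"A": [], "B": ["A"], "C": ["B"]}     # A -> B -> C
common   = {"B": [], "A": ["B"], "C": ["B"]}     # A <- B -> C
collider = {"A": [], "C": [], "B": ["A", "C"]}   # A -> B <- C

print(d_separated(chain,    {"A"}, {"C"}, {"B"}))   # True
print(d_separated(common,   {"A"}, {"C"}, {"B"}))   # True
print(d_separated(collider, {"A"}, {"C"}, set()))   # True (independent a priori)
print(d_separated(collider, {"A"}, {"C"}, {"B"}))   # False (explaining away)
```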

Inference Mechanism [Exercise network with nodes A, B, C, D, E, F, G, H, J, K; structure shown in the figure]
A and K are independent?
What is the Markov blanket of H?
A and B are independent given D?
A and K are independent given G?
E and J are independent given K?
G and B are independent given D?
E and H are independent given B?
E and H are independent given B and K?

Inference Mechanism (answers)
A and K are independent? (F)
A and K are independent given G? (F)
What is the Markov blanket of H?
A and B are independent given D? (F)
E and J are independent given K? (F)
G and B are independent given D? (F)
E and H are independent given B? (T)
E and H are independent given B and K? (F)