Review: Bayesian learning and inference

Review: Bayesian learning and inference. Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E. Inference problem: given some evidence E = e, what is P(X | e)? Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(e_1, x_1), ..., (e_n, x_n)}.

Example of model and parameters. Naïve Bayes model:

P(spam | message) ∝ P(spam) ∏_{i=1..n} P(w_i | spam)
P(¬spam | message) ∝ P(¬spam) ∏_{i=1..n} P(w_i | ¬spam)

Model parameters:
Prior: P(spam), P(¬spam)
Likelihood of spam: P(w_1 | spam), P(w_2 | spam), ..., P(w_n | spam)
Likelihood of ¬spam: P(w_1 | ¬spam), P(w_2 | ¬spam), ..., P(w_n | ¬spam)

Example of model and parameters. Naïve Bayes model, with the parameters θ made explicit:

P_θ(spam | message) ∝ P_θ(spam) ∏_{i=1..n} P_θ(w_i | spam)
P_θ(¬spam | message) ∝ P_θ(¬spam) ∏_{i=1..n} P_θ(w_i | ¬spam)

Model parameters (θ):
Prior: P(spam), P(¬spam)
Likelihood of spam: P(w_1 | spam), P(w_2 | spam), ..., P(w_n | spam)
Likelihood of ¬spam: P(w_1 | ¬spam), P(w_2 | ¬spam), ..., P(w_n | ¬spam)
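A minimal sketch of the Naïve Bayes scoring above in Python; the class prior and word probabilities below are made-up illustration values, not estimates from any real training set.

```python
# Sketch of Naive Bayes spam scoring (hypothetical parameter values).
import math

prior = {"spam": 0.4, "ham": 0.6}                     # P(spam), P(not spam) -- assumed
likelihood = {                                        # P(w_i | class) -- assumed
    "spam": {"free": 0.30, "meeting": 0.02, "money": 0.25},
    "ham":  {"free": 0.03, "meeting": 0.20, "money": 0.05},
}

def log_score(cls, words):
    """log P(cls) + sum_i log P(w_i | cls), proportional to log P(cls | message)."""
    return math.log(prior[cls]) + sum(math.log(likelihood[cls][w]) for w in words)

message = ["free", "money"]
scores = {c: log_score(c, message) for c in prior}
print(max(scores, key=scores.get))   # class with the highest posterior -> 'spam'
```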

Learning and inference. x: class, e: evidence, θ: model parameters.

MAP inference: x* = argmax_x P_θ(x | e) = argmax_x P_θ(e | x) P_θ(x)
ML inference: x* = argmax_x P_θ(e | x)
Learning:
θ* = argmax_θ P(θ | (e_1, x_1), ..., (e_n, x_n)) = argmax_θ P((e_1, x_1), ..., (e_n, x_n) | θ) P(θ)   (MAP)
θ* = argmax_θ P((e_1, x_1), ..., (e_n, x_n) | θ)   (ML)
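To make the ML vs. MAP distinction concrete, here is a small sketch estimating one Bernoulli parameter (e.g., the spam prior) from counts; the Beta(2, 2) prior is an assumption for illustration, not something taken from the slides.

```python
# ML vs. MAP estimation of a single Bernoulli parameter theta.
heads, n = 7, 10                                  # hypothetical training counts

theta_ml = heads / n                              # ML: argmax_theta P(data | theta)

a, b = 2, 2                                       # assumed Beta(a, b) prior pseudo-counts
theta_map = (heads + a - 1) / (n + a + b - 2)     # MAP: argmax_theta P(data | theta) P(theta)

print(theta_ml, theta_map)                        # 0.7 vs. 0.666..., prior pulls toward 0.5
```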

Probabilistic inference. A general scenario: query variables X; evidence (observed) variables E = e; unobserved variables Y. If we know the full joint distribution P(X, E, Y), how can we perform inference about X?

P(X | E = e) = P(X, e) / P(e) = (1 / P(e)) Σ_y P(X, e, y)

Problems: full joint distributions are too large, and marginalizing out Y may involve too many summation terms.
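A toy sketch of this kind of inference from an explicit full joint table, assuming three binary variables and made-up probabilities; it also hints at why the approach blows up, since both the table and the sum over y grow exponentially in the number of unobserved variables.

```python
# Inference by summing out Y from a (hypothetical) full joint table P(X, E, Y).
from itertools import product

joint = {(x, e, y): 0.0 for x, e, y in product([0, 1], repeat=3)}
joint.update({(1, 1, 1): 0.10, (1, 1, 0): 0.15, (0, 1, 1): 0.05, (0, 1, 0): 0.20,
              (1, 0, 1): 0.10, (1, 0, 0): 0.05, (0, 0, 1): 0.15, (0, 0, 0): 0.20})

def posterior_x(e):
    """P(X | E=e): marginalize out Y, then normalize by P(E=e)."""
    unnorm = {x: sum(joint[(x, e, y)] for y in (0, 1)) for x in (0, 1)}
    z = sum(unnorm.values())                       # = P(E = e)
    return {x: p / z for x, p in unnorm.items()}

print(posterior_x(e=1))   # the sum over y has 2^|Y| terms in general
```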

Bayesian networks. More commonly called graphical models. A way to depict conditional independence relationships between random variables. A compact specification of full joint distributions.

Structure. Nodes: random variables; can be assigned (observed) or unassigned (unobserved). Arcs: interactions; an arrow from one variable to another indicates direct influence. Arcs encode conditional independence: Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity. The arcs must form a directed, acyclic graph.

Example: N independent coin flips. Complete independence: no interactions; the nodes X_1, X_2, ..., X_n have no arcs between them.

Example: Naïve Bayes spam filter. Random variables: C: message class (spam or not spam); W_1, ..., W_n: words comprising the message. [Network: C is the parent of each of W_1, W_2, ..., W_n.]

Example: Burglar Alarm. I have a burglar alarm that is sometimes set off by minor earthquakes. My two neighbors, John and Mary, promised to call me at work if they hear the alarm. Example inference task: suppose Mary calls and John doesn't call. Is there a burglar? What are the random variables? Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. What are the direct influence relationships? A burglar can set the alarm off. An earthquake can set the alarm off. The alarm can cause Mary to call. The alarm can cause John to call.

Example: Burglar Alarm What are the model parameters?

Conditional probability distributions. To specify the full joint distribution, we need to specify a conditional distribution for each node given its parents: P(X | Parents(X)). For a node X with parents Z_1, Z_2, ..., Z_n, this is P(X | Z_1, ..., Z_n).

Example: Burglar Alarm

The joint probability distribution. For each node X_i, we know P(X_i | Parents(X_i)). How do we get the full joint distribution P(X_1, ..., X_n)? Using the chain rule:

P(X_1, ..., X_n) = ∏_{i=1..n} P(X_i | X_1, ..., X_{i-1}) = ∏_{i=1..n} P(X_i | Parents(X_i))

For example, P(j, m, a, b, e) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)
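As a sketch, this factorization can be evaluated directly from the per-node tables. The CPT values below are the ones usually quoted for this textbook (AIMA) burglary example; they are an assumption here, since the slide figure carrying the numbers did not survive transcription.

```python
# Chain-rule factorization for the burglar-alarm network.
# CPT values assumed from the standard AIMA textbook example.
P_b = 0.001                                                        # P(Burglary)
P_e = 0.002                                                        # P(Earthquake)
P_a = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}    # P(Alarm | B, E)
P_j = {1: 0.90, 0: 0.05}                                           # P(JohnCalls | Alarm)
P_m = {1: 0.70, 0: 0.01}                                           # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) as a product of per-node conditional probabilities."""
    pb = P_b if b else 1 - P_b
    pe = P_e if e else 1 - P_e
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    pj = P_j[a] if j else 1 - P_j[a]
    pm = P_m[a] if m else 1 - P_m[a]
    return pb * pe * pa * pj * pm

# e.g. alarm rings and both neighbors call, with no burglary and no earthquake:
print(joint(b=0, e=0, a=1, j=1, m=1))
```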

Conditional independence. Key assumption: X is conditionally independent of every non-descendant node given its parents. Example: causal chain X → Y → Z. Are X and Z independent? Is Z independent of X given Y?

P(Z | X, Y) = P(X, Y, Z) / P(X, Y) = [P(X) P(Y | X) P(Z | Y)] / [P(X) P(Y | X)] = P(Z | Y)

So Z is independent of X given Y.
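The same cancellation can be checked numerically; the sketch below uses made-up CPT values for a chain X → Y → Z and confirms that P(Z | X, Y) does not depend on X.

```python
# Numeric check: in a chain X -> Y -> Z, P(Z | X, Y) = P(Z | Y).
# CPT numbers are made up for illustration.
P_x = {1: 0.3, 0: 0.7}
P_y_given_x = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # P(Y=y | X=x), keyed [x][y]
P_z_given_y = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.4, 0: 0.6}}   # P(Z=z | Y=y), keyed [y][z]

def joint(x, y, z):
    return P_x[x] * P_y_given_x[x][y] * P_z_given_y[y][z]

for x in (0, 1):
    for y in (0, 1):
        p_z_given_xy = joint(x, y, 1) / sum(joint(x, y, z) for z in (0, 1))
        print(x, y, round(p_z_given_xy, 3), P_z_given_y[y][1])   # last two columns match
```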

Conditional independence. Common cause (X ← Y → Z): Are X and Z independent? No. Are they conditionally independent given Y? Yes. Common effect (X → Y ← Z): Are X and Z independent? Yes. Are they conditionally independent given Y? No.

Compactness. Suppose we have a Boolean variable X_i with k Boolean parents. How many rows does its conditional probability table have? 2^k rows, one for each combination of parent values. Each row requires one number p for X_i = true. If each variable has no more than k parents, how many numbers does the complete network require? O(n · 2^k) numbers, vs. O(2^n) for the full joint distribution. How many numbers for the burglary network? 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
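A two-line check of the counts above; the parent counts are read off the burglary network's structure.

```python
# Parameter count: each Boolean node with k Boolean parents needs 2^k numbers.
num_parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2, "JohnCalls": 1, "MaryCalls": 1}
print(sum(2 ** k for k in num_parents.values()))   # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** 5 - 1)                                  # 31 numbers for the full joint
```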

Constructing Bayesian networks. 1. Choose an ordering of variables X_1, ..., X_n. 2. For i = 1 to n: add X_i to the network and select parents from X_1, ..., X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1}).

Example. Suppose we choose the ordering M, J, A, B, E.
P(J | M) = P(J)? No
P(A | J, M) = P(A)? P(A | J, M) = P(A | J)? P(A | J, M) = P(A | M)? No
P(B | A, J, M) = P(B)? No
P(B | A, J, M) = P(B | A)? Yes
P(E | B, A, J, M) = P(E)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

Example contd. Deciding conditional independence is hard in noncausal directions. The causal direction seems much more natural. The resulting network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

A more realistic Bayes network: car diagnosis. Initial observation: car won't start. Orange: "broken, so fix it" nodes. Green: testable evidence. Gray: hidden variables introduced to ensure a sparse structure and reduce the number of parameters.

Car insurance

In research literature: Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Karen Sachs, Omar Perez, Dana Pe'er, Douglas A. Lauffenburger, and Garry P. Nolan. Science 308 (5721), 523 (22 April 2005).

In research literature: Describing Visual Scenes Using Transformed Objects and Parts. E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. International Journal of Computer Vision, no. 1-3, May 2008, pp. 291-330.

Summary. Bayesian networks provide a natural representation for (causally induced) conditional independence. Topology + conditional probability tables. Generally easy for domain experts to construct.