
Bayesian Networks

Today: Why do we need Bayesian networks (Bayes nets)? Factoring the joint probability. Conditional independence. What is a Bayes net? (Mostly, we discuss discrete r.v.'s.) What can you do with a Bayes net?

Exponential growth of joint probability tables. The size of a table representing the joint probabilities of discrete r.v.'s is exponential in the number of r.v.'s.

One r.v. (Recur):

                  recur    not recur
                  0.24     0.76

Two r.v.'s (Size, Recur):

                  recur    not recur
    big cells     0.16     0.33
    little cells  0.08     0.43

Three r.v.'s (Size, Recur, Texture):

                  recur              not recur
                  rough    smooth    rough    smooth
    big cells     0.06     0.10      0.17     0.16
    little cells  0.02     0.06      0.18     0.25
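To make the storage cost concrete, here is a minimal Python sketch (not part of the slides; only the probabilities come from the table above, everything else is invented for illustration) that stores the three-variable joint as one number per cell and sums two variables out to recover the one-variable column for Recur.

```python
# Storing the three-variable table above literally: one entry per joint outcome.
joint = {
    ("recur", "big", "rough"): 0.06,      ("recur", "big", "smooth"): 0.10,
    ("recur", "little", "rough"): 0.02,   ("recur", "little", "smooth"): 0.06,
    ("not recur", "big", "rough"): 0.17,  ("not recur", "big", "smooth"): 0.16,
    ("not recur", "little", "rough"): 0.18, ("not recur", "little", "smooth"): 0.25,
}

# Summing out Size and Texture recovers the one-variable table for Recur.
p_recur = {}
for (recur, size, texture), p in joint.items():
    p_recur[recur] = p_recur.get(recur, 0.0) + p

print(p_recur)      # P(recur) ≈ 0.24, P(not recur) ≈ 0.76
print(len(joint))   # 8 = 2^3 cells; m binary r.v.'s would need 2^m cells
```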

Large joint probability tables are: Awkward or impossible to store and maintain, if there are many r.v.'s. Difficult to reason about or visualize. Intractable for computing marginal or conditional probabilities, which requires summing over enormous numbers of cells. Difficult to learn (because there are so many free parameters!). To represent an arbitrary joint probability distribution, an exponentially large table is necessary. But...

The world is not arbitrarily complex! Effects have a limited number of causes. R.v.'s have limited relationships with other r.v.'s. Bayes nets are a technique for representing and reasoning about big joint probability distributions in a compact way. They rely on two things: 1. Writing the joint probability as a product of conditional probabilities. 2. Simplifying based on conditional independence.

Rewriting the joint as a product of conditionals. By the definition of conditional probability, any joint probability can be rewritten as

P(X_1, X_2, ..., X_m) = P(X_1 | X_2, ..., X_m) P(X_2, ..., X_m)
                      = P(X_1 | X_2, ..., X_m) P(X_2 | X_3, ..., X_m) P(X_3, ..., X_m)
                      = P(X_1 | X_2, ..., X_m) P(X_2 | X_3, ..., X_m) ... P(X_{m-1} | X_m) P(X_m)

We can factor the joint probability into a product of conditional probabilities in different ways. Another one is:

P(X_1, X_2, ..., X_m) = P(X_m | X_1, ..., X_{m-1}) P(X_1, ..., X_{m-1})
                      = P(X_m | X_1, ..., X_{m-1}) P(X_{m-1} | X_1, ..., X_{m-2}) ... P(X_2 | X_1) P(X_1)
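The identity is easy to check numerically. The sketch below (invented code, with an arbitrary made-up joint over three binary variables) computes the marginals P(X_2, X_3) and P(X_3) from the joint, forms the factors P(X_1 | X_2, X_3), P(X_2 | X_3), P(X_3), and confirms that their product reproduces every cell.

```python
from itertools import product

# An arbitrary, made-up joint distribution over three binary variables X1, X2, X3.
vals = [0, 1]
weights = [3, 1, 4, 1, 5, 9, 2, 6]          # arbitrary positive weights
total = sum(weights)
joint = {cell: w / total for cell, w in zip(product(vals, repeat=3), weights)}

def marginal(keep):
    """Sum the joint down to the variable positions listed in `keep`."""
    out = {}
    for cell, p in joint.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_23 = marginal([1, 2])   # P(X2, X3)
p_3 = marginal([2])       # P(X3)

# Form the factors P(X1 | X2, X3), P(X2 | X3), P(X3) and multiply them back together.
for (x1, x2, x3), p in joint.items():
    p_x1_given_23 = p / p_23[(x2, x3)]
    p_x2_given_3 = p_23[(x2, x3)] / p_3[(x3,)]
    reconstructed = p_x1_given_23 * p_x2_given_3 * p_3[(x3,)]
    assert abs(p - reconstructed) < 1e-12

print("P(X1 | X2, X3) P(X2 | X3) P(X3) reproduces the joint on every cell")
```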

Example. Recur = whether or not the cancer recurs. Size = whether tumor cells are big or little. Texture = whether tumor cells are rough or smooth.

P(Recur, Size, Texture) = P(Recur | Size, Texture) P(Size | Texture) P(Texture)
                        = P(Recur | Size, Texture) P(Texture | Size) P(Size)
                        = P(Size | Recur, Texture) P(Recur | Texture) P(Texture)
                        = P(Size | Recur, Texture) P(Texture | Recur) P(Recur)
                        = P(Texture | Recur, Size) P(Recur | Size) P(Size)
                        = P(Texture | Recur, Size) P(Size | Recur) P(Recur)

With m variables, there are m! ways of doing this.

Viewing it graphically. A factorization can be depicted graphically. Nodes correspond to r.v.'s. Arcs into a r.v. come from the r.v.'s upon which it is conditioned. For the factorization P(Recur | Size, Texture) P(Size | Texture) P(Texture), the graph has arcs Texture → Size, Texture → Recur, and Size → Recur.

Space savings? None yet... If we imagine that each term P is represented by a table, there is no advantage to rewriting:

P(Recur, Size, Texture) = P(Recur | Size, Texture) * P(Size | Texture) * P(Texture)
     [2^3 = 8 cells]          [2^3 = 8 cells]          [2^2 = 4 cells]     [2 cells]

Conditional independence: a generalization of independence. Let X, Y, and Z each represent one or more r.v.'s. X is conditionally independent of Y given Z if

P(X | Y, Z) = P(X | Z)

That is, if we know Z, knowing Y too doesn't help us predict X any better. Examples: Is there conditional independence or not? X = die roll, Y = roll is even, Z = roll is odd. X = die roll, Y = roll is even, Z = roll is prime. X = coin flip, Y = another coin flip, Z = whether X and Y match. Ordinary independence of X and Y is conditional independence with Z = ∅ (an empty set of conditioning variables).
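These three examples can be checked mechanically. The sketch below (invented code, not from the slides) enumerates the equally likely outcomes and tests whether P(X | Y, Z) = P(X | Z) holds for every value combination; it finds that the first example is conditionally independent and the other two are not.

```python
from itertools import product
from fractions import Fraction

def cond_indep(outcomes, X, Y, Z):
    """Brute-force check of P(X | Y, Z) = P(X | Z) over equally likely outcomes.

    X, Y, Z are functions mapping an outcome to a value."""
    p = Fraction(1, len(outcomes))
    def prob(pred):
        return sum(p for o in outcomes if pred(o))
    xs = {X(o) for o in outcomes}
    ys = {Y(o) for o in outcomes}
    zs = {Z(o) for o in outcomes}
    for x, y, z in product(xs, ys, zs):
        p_yz = prob(lambda o: Y(o) == y and Z(o) == z)
        p_z = prob(lambda o: Z(o) == z)
        if p_yz == 0 or p_z == 0:
            continue  # conditioning event has probability zero; nothing to check
        lhs = prob(lambda o: X(o) == x and Y(o) == y and Z(o) == z) / p_yz
        rhs = prob(lambda o: X(o) == x and Z(o) == z) / p_z
        if lhs != rhs:
            return False
    return True

die = list(range(1, 7))
# X = die roll, Y = roll is even, Z = roll is odd   -> conditionally independent
print(cond_indep(die, lambda r: r, lambda r: r % 2 == 0, lambda r: r % 2 == 1))     # True
# X = die roll, Y = roll is even, Z = roll is prime -> not conditionally independent
print(cond_indep(die, lambda r: r, lambda r: r % 2 == 0, lambda r: r in (2, 3, 5)))  # False
# X, Y = two coin flips, Z = whether they match     -> not conditionally independent
flips = list(product("HT", repeat=2))
print(cond_indep(flips, lambda o: o[0], lambda o: o[1], lambda o: o[0] == o[1]))     # False
```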

Taking advantage of conditional independencies. If there are conditional independencies between r.v.'s, and if we factor the joint correctly, we can represent the joint more compactly. For example, if W, X, Y, and Z are independent binary r.v.'s:

P(W, X, Y, Z) = P(W | X, Y, Z) P(X | Y, Z) P(Y | Z) P(Z) = P(W) * P(X) * P(Y) * P(Z)
  [16 cells]                                               [2 cells each]

For m independent binary random variables, the space requirement goes from 2^m to 2m.
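A tiny sketch of the saving (the marginal probabilities are made up, since the slide gives none): four 2-cell tables suffice to reconstruct all 2^4 = 16 joint cells when the variables are independent.

```python
from itertools import product

# Made-up marginals for four independent binary r.v.'s: 4 tables of 2 cells each.
marginals = {
    "W": {0: 0.7, 1: 0.3},
    "X": {0: 0.4, 1: 0.6},
    "Y": {0: 0.9, 1: 0.1},
    "Z": {0: 0.5, 1: 0.5},
}

# Under independence, every one of the 2^4 = 16 joint cells is just a product.
joint = {
    (w, x, y, z): marginals["W"][w] * marginals["X"][x] * marginals["Y"][y] * marginals["Z"][z]
    for w, x, y, z in product([0, 1], repeat=4)
}

print(len(joint), "joint cells reconstructed from",
      sum(len(t) for t in marginals.values()), "stored numbers")   # 16 from 8
assert abs(sum(joint.values()) - 1.0) < 1e-12
```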

Example: Markov chains. Let X_t denote the state of a dynamical system at time t. The Markov assumption: X_{t+1} is conditionally independent of X_0, ..., X_{t-1} given X_t.

P(X_1, X_2, ..., X_T) = P(X_1) P(X_2 | X_1) ... P(X_T | X_{T-1})
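A sketch of the Markov-chain factorization in code, with an invented two-state "weather" chain and made-up numbers: the probability of an entire trajectory is the initial probability times one transition probability per step.

```python
# Hypothetical two-state weather chain; all numbers are made up for illustration.
p_initial = {"sunny": 0.6, "rainy": 0.4}          # P(X_1)
p_transition = {                                  # P(X_{t+1} | X_t)
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}

def chain_probability(trajectory):
    """P(X_1, ..., X_T) = P(X_1) * P(X_2 | X_1) * ... * P(X_T | X_{T-1})."""
    p = p_initial[trajectory[0]]
    for prev, nxt in zip(trajectory, trajectory[1:]):
        p *= p_transition[prev][nxt]
    return p

print(chain_probability(["sunny", "sunny", "rainy", "rainy"]))
# 0.6 * 0.8 * 0.2 * 0.7 ≈ 0.0672
```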

Example: Hidden Markov models. Let X_t denote the unobserved state of a dynamical system at time t. Let O_t denote an observation made at time t.

P(X_1, O_1, ..., X_T, O_T) = P(X_1) P(O_1 | X_1) P(X_2 | X_1) P(O_2 | X_2) ... P(X_T | X_{T-1}) P(O_T | X_T)
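Extending the chain sketch above with a made-up emission table P(O_t | X_t) gives the HMM factorization; again, all state names, observation names, and numbers are invented for illustration.

```python
# Same made-up chain as above, plus an emission table P(O_t | X_t).
p_initial = {"sunny": 0.6, "rainy": 0.4}
p_transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.3, "rainy": 0.7},
}
p_emission = {
    "sunny": {"walk": 0.7, "umbrella": 0.3},
    "rainy": {"walk": 0.2, "umbrella": 0.8},
}

def hmm_joint(states, observations):
    """P(X_1, O_1, ..., X_T, O_T) = P(X_1)P(O_1|X_1) * prod_t P(X_t|X_{t-1})P(O_t|X_t)."""
    p = p_initial[states[0]] * p_emission[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= p_transition[states[t - 1]][states[t]] * p_emission[states[t]][observations[t]]
    return p

print(hmm_joint(["sunny", "rainy"], ["walk", "walk"]))
# 0.6 * 0.7 * 0.2 * 0.2 ≈ 0.0168
```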

Example: Prediction problems. Let X_1, ..., X_m represent input features and Y the r.v. to be predicted. We might assume the X_i are independent of one another, but that Y depends on all of them, giving the factorization P(X_1, ..., X_m, Y) = P(Y | X_1, ..., X_m) P(X_1) ... P(X_m).

What is a Bayes net? It is a representation for a joint probability in terms of conditional probabilities.

P(W, X, Y, Z) = P(W) P(Y | W) P(X | W) P(Z | Y, X)

The corresponding graphical structure is a directed, acyclic graph.
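One minimal way (a sketch, not the lecture's code; all numbers and the data layout are invented) to code this particular factorization: store each node's parents and a conditional probability table, then multiply one conditional probability per node to evaluate any full assignment.

```python
# Each node stores its parents and a table giving P(node = 1 | parent values).
# Variables are binary (0/1); all probabilities are made up for illustration.
parents = {"W": [], "Y": ["W"], "X": ["W"], "Z": ["Y", "X"]}
cpt = {
    "W": {(): 0.3},
    "Y": {(0,): 0.2, (1,): 0.9},
    "X": {(0,): 0.5, (1,): 0.6},
    "Z": {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.95},
}

def joint_probability(assignment):
    """P(W, X, Y, Z) = P(W) P(Y | W) P(X | W) P(Z | Y, X)."""
    p = 1.0
    for node, pa in parents.items():
        parent_values = tuple(assignment[v] for v in pa)
        p_one = cpt[node][parent_values]          # P(node = 1 | parents)
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

print(joint_probability({"W": 1, "X": 0, "Y": 1, "Z": 1}))
# 0.3 * 0.9 * (1 - 0.6) * 0.4 ≈ 0.0432
```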

What is a Bayes net (2)? For discrete r.v.'s, conditional probabilities can be represented as conditional probability tables (CPTs) or in more compact forms, such as trees. Continuous r.v.'s can be included too, with conditional probabilities represented, e.g., parametrically.

What do we do with Bayes nets? Mainly, we compute conditional probabilities after observing some data. Diagnosis: What's the probability of cancer, given symptoms x, y, z? Prediction/causal reasoning: What's the probability of recurrence, given measurements x, y, z? What's the probability of a UFO sighting being a true alien spaceship?
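To make "compute conditional probabilities after observing some data" concrete, here is a brute-force sketch (not the lecture's inference method; it reuses the Recur/Size/Texture numbers from the earlier table) that answers a query such as P(Recur | Size = big, Texture = rough) by summing the matching joint cells and renormalizing.

```python
# Brute-force conditional query against the three-variable joint table from earlier.
joint = {
    ("recur", "big", "rough"): 0.06,      ("recur", "big", "smooth"): 0.10,
    ("recur", "little", "rough"): 0.02,   ("recur", "little", "smooth"): 0.06,
    ("not recur", "big", "rough"): 0.17,  ("not recur", "big", "smooth"): 0.16,
    ("not recur", "little", "rough"): 0.18, ("not recur", "little", "smooth"): 0.25,
}

def p_recur_given(size, texture):
    """P(Recur | Size = size, Texture = texture), by summing matching cells and renormalizing."""
    matching = {r: p for (r, s, t), p in joint.items() if s == size and t == texture}
    z = sum(matching.values())
    return {r: p / z for r, p in matching.items()}

print(p_recur_given("big", "rough"))
# P(recur | big, rough) = 0.06 / (0.06 + 0.17) ≈ 0.26
```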

Advantages of Bayes nets. They represent the joint compactly. Marginal and conditional probabilities can be computed more efficiently than by naive summing-out over the joint (at least if all r.v.'s are discrete). They provide a visual representation of the relationships between variables. And they generalize the prediction problem, allowing other forms of reasoning.