Machine learning: lecture 20. Tommi S. Jaakkola, MIT CSAIL (tommi@csail.mit.edu)

Topics: representation and graphical models (examples); Bayesian networks: examples and specification, graphs and independence, the associated distribution.

What is a good representation? Properties of good representations: 1. explicit, 2. modular, 3. permits efficient computation, 4. etc.

Representation: explicit. A model can be represented in terms of variables and their dependencies (a graphical model), e.g., a chain s_1 -> s_2 -> s_3 -> s_4, or in terms of state transitions (a transition diagram) with parameters P(s_1), P(s_2 | s_1), P(s_3 | s_2), and so on.
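Written out, the chain encodes the standard first-order Markov factorization:

$$P(s_1, s_2, s_3, s_4) = P(s_1)\, P(s_2 \mid s_1)\, P(s_3 \mid s_2)\, P(s_4 \mid s_3).$$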

Representation: modular. We can easily add or remove components of the model. A Markov model is a chain s_1 -> s_2 -> s_3 -> s_4; a hidden Markov model attaches an observation x_t to each state s_t (x_1, x_2, x_3, x_4).

Representation: efficient computation. The chain structure of an HMM supports efficient inference: posterior marginals via the forward-backward algorithm, and max-probabilities via the Viterbi algorithm.
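As a minimal sketch of the kind of computation the chain structure makes efficient, here is the forward pass of an HMM in Python. The two-state transition matrix, emission matrix, and initial distribution are illustrative assumptions, not numbers from the lecture.

```python
import numpy as np

# Illustrative HMM parameters (assumed): 2 hidden states, 3 symbols.
pi = np.array([0.6, 0.4])                 # P(s_1)
A = np.array([[0.7, 0.3],                 # A[i, j] = P(s_{t+1}=j | s_t=i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],            # B[i, k] = P(x_t=k | s_t=i)
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Forward recursion: alpha[t, i] = P(x_1..x_t, s_t = i)."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

obs = [0, 2, 1]                           # an observation sequence
alpha = forward(obs)
print("P(x_1..x_T) =", alpha[-1].sum())   # likelihood of the sequence
```

The cost is linear in the sequence length, which is exactly what the graph structure buys over summing the joint by brute force.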

Graphical models: examples. A factorial hidden Markov model as a Bayesian network (directed graphical model): parallel chains of linguistic features jointly generating the acoustic observations.

Graphical models: examples. Plates and repeated sampling (figure: a plate over M documents, each containing words, with per-document topics and a class). Each document has words sampled from a distribution that depends on the choice of topics; the topics for each document are sampled from a class-conditional distribution.
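One way to write the factorization the plate describes (a sketch; the symbols c for class, t for a document's topics, and w for words are my labels, not the slide's):

$$\prod_{d=1}^{M} P(c_d)\, P(t_d \mid c_d) \prod_{n=1}^{N_d} P(w_{d,n} \mid t_d).$$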

Graphical models: examples. Lattice models (e.g., the Ising model) as a Markov random field: spins s_1, s_2, ... on a grid, with symmetric interactions between neighbors (e.g., alignment of two nearby spins is energetically favorable).
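In the usual parameterization (standard Ising notation, not the slide's), the distribution over spins s_i in {-1, +1} is

$$P(s) = \frac{1}{Z}\exp\Big(\sum_{(i,j)\in E} J_{ij}\, s_i s_j + \sum_i h_i s_i\Big),$$

where J_{ij} > 0 makes aligned neighboring spins more probable, which is the "energetically favorable" interaction above.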

Graphical models: examples. Factor graphs and codes (information theory): noisy observations y_1, ..., y_5 of transmitted bits x_1, ..., x_5, with the bits tied together by parity checks. Circles denote variables while the squares are factors (functions) that constrain the values of the variables.
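In general, a factor graph represents a distribution as a normalized product of its factors,

$$P(x) = \frac{1}{Z}\prod_a f_a(x_{\partial a}),$$

where x_{\partial a} are the variables adjacent to factor a. A parity-check factor, for instance, is 1 when its bits have even parity and 0 otherwise.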

Graphical models (summary; figure: thumbnails of the previous examples, the factorial HMM, the document plate, and the factor-graph code). Graph semantics: graph separation properties encode independence. Association with probability distributions: the independencies pick out a family of distributions. Inference and estimation: graph structure enables efficient computation.

Bayesian networks. Bayesian networks are directed acyclic graphs where the nodes represent variables and directed edges capture dependencies. A mixture model as a Bayesian network: i -> x, with joint P(i) p(x | i). Here i is the "parent of x" and x is the "child of i"; the edge can be read as "i influences x", "i causes x", or "x depends on i". The same two levels apply: graph semantics tie graph separation properties to independence, and the independencies pick out a family of distributions.
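A minimal sketch of ancestral sampling in this two-node network: draw the parent i from P(i), then the child x from p(x | i). The particular mixture (two Gaussian components) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed mixture parameters: P(i) and Gaussian components p(x | i).
p_i = np.array([0.3, 0.7])      # mixing weights P(i)
means = np.array([-2.0, 2.0])   # component means
stds = np.array([1.0, 0.5])     # component standard deviations

def sample():
    """Ancestral sampling: draw parent i, then child x given i."""
    i = rng.choice(2, p=p_i)            # i ~ P(i)
    x = rng.normal(means[i], stds[i])   # x ~ p(x | i)
    return i, x

print([sample() for _ in range(5)])
```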

Example: a simple Bayesian network for coin tosses. Two fair coins x_1 and x_2, with P(x_1) = (0.5, 0.5) and P(x_2) = (0.5, 0.5), and a third variable x_3 = "same?" with conditional table

P(x_3 | x_1, x_2):
        hh    ht    th    tt
  y     1.0   0.0   0.0   1.0
  n     0.0   1.0   1.0   0.0

Two levels of description: 1. the graph structure (dependencies, independencies), 2. the associated probability distribution.

Example cont'd. What can the graph alone tell us? In the graph x_1 -> x_3 <- x_2 (x_3 = "same?"), x_1 and x_2 are marginally independent, but x_1 and x_2 become dependent if we know x_3 (the dependence concerns our beliefs about the outcomes).
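A short brute-force check of both claims, using the coin-toss network from the previous slide (the probabilities are from the slide; the encoding is mine):

```python
import itertools

# Coin-toss network: x1, x2 fair coins, x3 = "same?".
P_x1 = {"h": 0.5, "t": 0.5}
P_x2 = {"h": 0.5, "t": 0.5}
P_x3 = lambda x3, x1, x2: float((x1 == x2) == (x3 == "y"))

# Joint distribution, factored as P(x1) P(x2) P(x3 | x1, x2).
joint = {(x1, x2, x3): P_x1[x1] * P_x2[x2] * P_x3(x3, x1, x2)
         for x1, x2, x3 in itertools.product("ht", "ht", "yn")}

def prob(pred, cond=lambda w: True):
    """P(pred | cond) by summing joint entries."""
    den = sum(p for w, p in joint.items() if cond(w))
    num = sum(p for w, p in joint.items() if cond(w) and pred(w))
    return num / den

# Marginally independent: conditioning on x2 does not move P(x1 = h).
print(prob(lambda w: w[0] == "h"))                          # 0.5
print(prob(lambda w: w[0] == "h", lambda w: w[1] == "h"))   # 0.5
# Dependent given x3: once we know x3 = "y", x2 pins down x1.
print(prob(lambda w: w[0] == "h",
           lambda w: w[2] == "y" and w[1] == "h"))          # 1.0
```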

Traffic example. N = X is nice? L = the traffic light. S = X decides to stop? T = the other car turns left? C = crash? If we only know that X decided to stop, can X's character (variable N) tell us anything about the other car turning (variable T)?

Graph, independence, d-separation. Are N and T independent given S? Definition: variables are D-separated given a conditioning set if the set separates them in the moralized ancestral graph. To answer the question we take the original graph, form the ancestral graph (keep only the variables involved and their ancestors), and moralize it (connect parents that share a child and drop the edge directions); N and T are then independent given S exactly when every path between them in the resulting undirected graph passes through S.
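A sketch of the d-separation test as just described: build the ancestral graph, moralize it, and check graph separation. The traffic graph used at the bottom (S with parents N and L, C with parents S and T) is my reading of the figure, so treat it as an assumption.

```python
from collections import deque

def d_separated(parents, x, y, given):
    """d-separation of x and y given a set, via the moralized
    ancestral graph; `parents` maps each node to its DAG parents."""
    # 1. Ancestral graph: keep x, y, the conditioning set, and ancestors.
    keep, stack = set(), list({x, y} | set(given))
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, []))
    # 2. Moralize: undirected parent-child edges, plus co-parent edges.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, [])
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):          # marry co-parents
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. Does `given` block every path from x to y in the moral graph?
    frontier, seen = deque([x]), {x} | set(given)
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w == y:
                return False              # found an unblocked path
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

# Assumed traffic graph: S has parents N and L; C has parents S and T.
parents = {"S": ["N", "L"], "C": ["S", "T"]}
print(d_separated(parents, "N", "T", {"S"}))  # True: C is not an ancestor
print(d_separated(parents, "N", "T", {"C"}))  # False: moralizing marries S, T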

Graphs and distributions. A graph is a compact representation of a large collection of independence properties. Theorem: any probability distribution that is consistent with a directed graph G has to factor according to "node given parents":

$$P(x \mid G) = \prod_{i=1}^{d} P(x_i \mid x_{\mathrm{pa}_i}),$$

where x_{pa_i} are the parents of x_i and d is the number of nodes (variables) in the graph.

Model: the explaining-away phenomenon. The network: Earthquake -> Alarm <- Burglary, with Earthquake -> Radio report. Evidence and competing causes: observing that the alarm went off makes both causes, earthquake and burglary, more probable. Additional evidence and explaining away: if a radio report then confirms an earthquake, the earthquake explains away the alarm and our belief in a burglary drops back down.
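A brute-force illustration of explaining away on this network. All the numbers are made-up assumptions; the point is only the direction of the change in P(Burglary | evidence).

```python
import itertools

# Earthquake/burglary network (illustrative probabilities, assumed).
P_E = {True: 0.01, False: 0.99}            # earthquake
P_B = {True: 0.02, False: 0.98}            # burglary
P_A = lambda e, b: {                       # P(alarm = True | e, b)
    (True, True): 0.95, (True, False): 0.4,
    (False, True): 0.9, (False, False): 0.01}[(e, b)]
P_R = lambda e: 0.8 if e else 0.001        # P(radio report = True | e)

def posterior_burglary(**evidence):
    """P(B = True | evidence) by brute-force enumeration."""
    num = den = 0.0
    for e, b, a, r in itertools.product([True, False], repeat=4):
        p = (P_E[e] * P_B[b]
             * (P_A(e, b) if a else 1 - P_A(e, b))
             * (P_R(e) if r else 1 - P_R(e)))
        world = {"E": e, "B": b, "A": a, "R": r}
        if all(world[k] == v for k, v in evidence.items()):
            den += p
            if b:
                num += p
    return num / den

print(posterior_burglary(A=True))          # ~0.57: alarm raises P(burglary)
print(posterior_burglary(A=True, R=True))  # ~0.05: radio report explains it away
```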