Machine learning: lecture 20
Tommi Jaakkola, MIT CSAIL
tommi@csail.mit.edu
Topics
- Representation and graphical models: examples
- Bayesian networks: examples, specification; graphs and independence; associated distribution
What is a good representation?
Properties of good representations:
1. Explicit
2. Modular
3. Permits efficient computation
4. etc.
Representation: explicit
Representation in terms of variables and dependencies (a graphical model): a chain s1 → s2 → s3 → s4 with probabilities P(s1), P(s2 | s1), P(s3 | s2), ...
Representation in terms of state transitions (transition diagram) over the states s1, s2, s3, ...
Representation: modular
We can easily add/remove components of the model.
Markov model: s1 → s2 → s3 → s4
Hidden Markov model: the same state chain s1 → ... → s4, with an observation xt attached to each state st (x1, ..., x4).
Representation: efficient computation
[Figure: an HMM with states s1, s2, s3 and observations x1, x2, x3]
Posterior marginals (forward-backward)
Max-probabilities (Viterbi)
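Both inference tasks named above run in time linear in the sequence length. A minimal sketch of the forward-backward computation of posterior marginals; the 2-state HMM parameters below are hypothetical numbers chosen for illustration, not from the lecture:

```python
import numpy as np

# Hypothetical 2-state HMM parameters (illustrative, not from the lecture).
pi = np.array([0.6, 0.4])            # P(s_1)
A = np.array([[0.7, 0.3],            # A[i, j] = P(s_{t+1} = j | s_t = i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],            # B[i, k] = P(x_t = k | s_t = i)
              [0.3, 0.7]])
obs = [0, 1, 1]                      # observed sequence x_1 .. x_3

T, K = len(obs), len(pi)

# Forward pass: alpha[t, i] = P(x_1..x_t, s_t = i)
alpha = np.zeros((T, K))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta[t, i] = P(x_{t+1}..x_T | s_t = i)
beta = np.ones((T, K))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Posterior marginals gamma[t, i] = P(s_t = i | x_1..x_T)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)   # each row sums to 1
```

Viterbi has the same structure with the sums in the forward pass replaced by maximizations.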
Graphical models: examples
Factorial hidden Markov model as a Bayesian network (directed graphical model): chains of linguistic features generating acoustic observations.
Graphical models: examples
Plates and repeated sampling: a class variable, topics, and words, with a plate over the M documents in a class. [Figure: repeated copies of a document reading "This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents"]
Each document has words, sampled from a distribution that depends on the choice of topics; the topics for each document are sampled from a class-conditional distribution.
Graphical models: examples
Lattice models (e.g., Ising model) as a Markov random field: spins s1, s2, ... on a grid with symmetric interactions (e.g., alignment of two nearby spins is energetically favorable).
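To make "energetically favorable" concrete, here is a tiny 1-D Ising chain (a stand-in for the slide's 2-D lattice) whose symmetric pairwise potentials exp(J s_i s_j) reward aligned neighbors; the coupling J and the chain length are illustrative choices:

```python
import math
from itertools import product

J = 1.0      # coupling strength (illustrative)
n = 4        # number of spins, each in {-1, +1}

def unnorm(s):
    # Product of edge potentials exp(J * s_i * s_j) over neighboring pairs.
    return math.exp(J * sum(s[i] * s[i + 1] for i in range(n - 1)))

# Partition function: sum of unnormalized potentials over all 2^n configurations.
Z = sum(unnorm(s) for s in product((-1, 1), repeat=n))

p_aligned = unnorm((1, 1, 1, 1)) / Z     # all spins aligned
p_mixed = unnorm((1, -1, 1, -1)) / Z     # alternating spins
print(p_aligned > p_mixed)               # True: aligned configurations are more probable
```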
Graphical models: examples
Factor graphs and codes (information theory): bits x1, ..., x5 and variables y1, ..., y5 connected to parity checks. Circles denote variables while the squares are factors (functions) that constrain the values of the variables.
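A toy illustration of how factors constrain variables: each parity check (square) forces the XOR of its attached bits (circles) to zero, so a bit string is a valid codeword only if every check is satisfied. The particular checks below are made up for illustration, not the lecture's code:

```python
# Hypothetical parity checks over 5 bits; each tuple lists the bit
# indices whose XOR (sum mod 2) must be zero.
checks = [(0, 1, 2), (2, 3, 4)]

def is_codeword(bits):
    # A bit string satisfies the factor graph iff every parity factor holds.
    return all(sum(bits[i] for i in c) % 2 == 0 for c in checks)

print(is_codeword([1, 1, 0, 1, 1]))  # True: both parity checks are satisfied
print(is_codeword([1, 0, 0, 0, 0]))  # False: the first check fails
```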
Graphical models
[Figure: the four examples again: the factorial HMM (linguistic features, acoustic observations), the plate model (class, topics, words, M documents), the lattice model, and the factor graph (bits, parity checks)]
Graph semantics: graph separation properties → independence
Association with probability distributions: independence → family of distributions
Inference and estimation: graph structure → efficient computation
Bayesian networks
Bayesian networks are directed acyclic graphs, where the nodes represent variables and directed edges capture dependencies.
A mixture model as a Bayesian network: i → x, with distribution P(i) P(x | i). Here i is the "parent of x" and x the "child of i"; read the edge as "i influences x", "i causes x", or "x depends on i".
Bayesian networks cont'd
[Figure: the mixture model i → x with distribution P(i) P(x | i), as on the previous slide]
Graph semantics: graph separation properties → independence
Association with probability distributions: independence → family of distributions
Example
A simple Bayesian network: coin tosses x1 and x2, with x3 = same?
P(x1): h 0.5, t 0.5
P(x2): h 0.5, t 0.5
P(x3 | x1, x2):
       hh   ht   th   tt
  y   1.0  0.0  0.0  1.0
  n   0.0  1.0  1.0  0.0
Two levels of description:
1. graph structure (dependencies, independencies)
2. associated probability distribution
Example cont'd
What can the graph alone tell us?
[Figure: x1 and x2 with no edge between them, both pointing into x3 = same?]
x1 and x2 are marginally independent.
x1 and x2 become dependent if we know x3 (the dependence concerns our beliefs about the outcomes).
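The claim above can be verified numerically from the CPTs on the earlier slide: building the joint P(x1) P(x2) P(x3 | x1, x2) shows x1 and x2 are independent marginally, but become dependent once x3 is observed:

```python
from itertools import product

# CPTs from the slides: two fair coins, x3 = "same outcome?"
P_x1 = {'h': 0.5, 't': 0.5}
P_x2 = {'h': 0.5, 't': 0.5}

def p_x3_given(x3, x1, x2):
    same = 'y' if x1 == x2 else 'n'
    return 1.0 if x3 == same else 0.0

# Joint distribution: P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2)
joint = {(x1, x2, x3): P_x1[x1] * P_x2[x2] * p_x3_given(x3, x1, x2)
         for x1, x2, x3 in product('ht', 'ht', 'yn')}

# Marginally independent: P(x1=h, x2=h) = 0.25 = P(x1=h) P(x2=h)
p_hh = sum(v for (x1, x2, _), v in joint.items() if x1 == 'h' and x2 == 'h')

# Conditioned on x3 = 'y', they become dependent:
p_y = sum(v for (_, _, x3), v in joint.items() if x3 == 'y')
p_hh_y = joint[('h', 'h', 'y')] / p_y                      # = 0.5
p_h_y = sum(v for (x1, _, x3), v in joint.items()
            if x1 == 'h' and x3 == 'y') / p_y              # = 0.5
print(p_hh_y, p_h_y * p_h_y)   # 0.5 vs 0.25: not conditionally independent
```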
Traffic example
Variables: X is nice?, the traffic light, X decides to stop?, the other car turns left?, C = crash?
[Figure: the Bayesian network over these variables]
Traffic example cont'd
Variables: X is nice?, the traffic light, X decides to stop?, the other car turns left?, C = crash?
If we only know that X decided to stop, can X's character (the "X is nice?" variable) tell us anything about the other car turning (the "turns left?" variable)?
Graph, independence, d-separation
Are "X is nice?" and "the other car turns left?" independent given "X decides to stop?"
Definition: two variables are D-separated given a conditioning set if the conditioning set separates them in the moralized ancestral graph.
[Figures: the original graph, its ancestral graph, and the moralized ancestral graph]
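The definition above translates directly into an algorithm: restrict to the ancestral graph, moralize (connect co-parents, drop edge directions), then test ordinary graph separation. A sketch, applied to a DAG with the traffic example's rough shape; the short variable names and the exact edge set here are assumptions, since the slide's figure is not reproduced:

```python
from collections import deque

def d_separated(parents, x, y, given):
    """Check d-separation of x and y given a set, via the moralized
    ancestral graph. parents: dict mapping node -> set of parent nodes."""
    # 1. Ancestral subgraph: keep x, y, the conditioning set, and all ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents.get(n, ()))
    # 2. Moralize: connect each pair of co-parents, then drop edge directions.
    undirected = {n: set() for n in keep}
    for n in keep:
        ps = [p for p in parents.get(n, ()) if p in keep]
        for p in ps:
            undirected[n].add(p); undirected[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:          # "marry" the parents
                undirected[p].add(q); undirected[q].add(p)
    # 3. Graph separation: BFS from x that may not pass through the given set.
    blocked, seen, queue = set(given), {x}, deque([x])
    while queue:
        n = queue.popleft()
        for m in undirected[n]:
            if m not in seen and m not in blocked:
                seen.add(m); queue.append(m)
    return y not in seen

# Assumed traffic-example structure: nice -> stop <- light -> turn,
# and stop -> crash <- turn.
dag = {'stop': {'nice', 'light'}, 'turn': {'light'}, 'crash': {'stop', 'turn'}}
print(d_separated(dag, 'nice', 'turn', {'stop'}))   # False: stop links them via the light
```

With no conditioning, "nice" and "turn" are d-separated in this assumed graph; conditioning on "stop" pulls their common ancestor "light" into play and connects them.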
Graphs and distributions
A graph is a compact representation of a large collection of independence properties.
Theorem: any probability distribution that is consistent with a directed graph G has to factor according to "node given parents":
  P(x | G) = ∏_{i=1}^{d} P(x_i | x_{pa_i})
where x_{pa_i} are the parents of x_i and d is the number of nodes (variables) in the graph.
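The theorem says the joint is simply a product of local "node given parents" tables. A generic sketch of that product, evaluated on the coin-toss network from the earlier slides:

```python
def joint_prob(cpts, parents, assignment):
    """P(x | G) = product over nodes of P(x_i | x_pa_i), for a full assignment.
    cpts[v] maps (value of v, tuple of parent values) -> a probability."""
    prob = 1.0
    for v, table in cpts.items():
        pa_vals = tuple(assignment[u] for u in parents[v])
        prob *= table[(assignment[v], pa_vals)]
    return prob

# The coin-toss network: x1, x2 fair coins, x3 = "same outcome?"
parents = {'x1': (), 'x2': (), 'x3': ('x1', 'x2')}
cpt_x3 = {}
for a in 'ht':
    for b in 'ht':
        cpt_x3[('y', (a, b))] = 1.0 if a == b else 0.0
        cpt_x3[('n', (a, b))] = 0.0 if a == b else 1.0
cpts = {
    'x1': {('h', ()): 0.5, ('t', ()): 0.5},
    'x2': {('h', ()): 0.5, ('t', ()): 0.5},
    'x3': cpt_x3,
}
print(joint_prob(cpts, parents, {'x1': 'h', 'x2': 'h', 'x3': 'y'}))  # 0.25
```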
Model: explaining away phenomenon
[Figure, shown in three stages: Earthquake → Alarm ← Burglary, with Earthquake → Radio report]
1. Model
2. Evidence, competing causes: once the alarm sounds, earthquake and burglary compete as explanations.
3. Additional evidence and explaining away: a radio report of an earthquake "explains away" the alarm, lowering our belief in a burglary.
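Explaining away can be seen with a few made-up numbers for this network (the CPT values below are illustrative, not from the lecture): the posterior probability of a burglary given the alarm drops sharply once an earthquake is also known.

```python
from itertools import product

# Hypothetical CPTs (numbers chosen for illustration only).
P_b = {1: 0.01, 0: 0.99}          # P(burglary)
P_e = {1: 0.02, 0: 0.98}          # P(earthquake)
P_a = {(1, 1): 0.95, (1, 0): 0.9,  # P(alarm = 1 | burglary, earthquake)
       (0, 1): 0.3, (0, 0): 0.01}

def joint(b, e, a):
    pa1 = P_a[(b, e)]
    return P_b[b] * P_e[e] * (pa1 if a == 1 else 1 - pa1)

# P(burglary | alarm): burglary and earthquake compete as explanations.
p_a = sum(joint(b, e, 1) for b, e in product((0, 1), repeat=2))
p_b_a = sum(joint(1, e, 1) for e in (0, 1)) / p_a

# P(burglary | alarm, earthquake): the earthquake explains the alarm away.
p_ae = sum(joint(b, 1, 1) for b in (0, 1))
p_b_ae = joint(1, 1, 1) / p_ae

print(round(p_b_a, 3), round(p_b_ae, 3))  # belief in burglary drops once the earthquake is known
```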