Machine learning: lecture 20. Tommi S. Jaakkola, MIT CSAIL (tommi@csail.mit.edu)

Topics
- Representation and graphical models: examples
- Bayesian networks
  - examples, specification
  - graphs and independence
  - associated distribution

What is a good representation?
Properties of good representations include:
1. Explicit
2. Modular
3. Permits efficient computation

Representation: explicit
- Representation in terms of variables and dependencies (a graphical model): s1 → s2 → s3 → s4
- Representation in terms of state transitions (a transition diagram)
[Figure: the Markov chain drawn both ways, with labels P(s1), P(s2 | s1), P(s3 | s2).]

Representation: modular
We can easily add or remove components of the model:
- Markov model: s1 → s2 → s3 → s4
- Hidden Markov model: the same chain s1 → s2 → s3 → s4, with each state s_t also emitting an observation x_t (x1, x2, x3, x4)

Representation: efficient computation
[Figure: hidden Markov model with states s1, s2, s3 and observations x1, x2, x3.]
- Posterior marginals: the forward-backward algorithm
- Max-probabilities: the Viterbi algorithm
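
The two computations named above can be sketched on a small hidden Markov model. This is a minimal illustration, not the lecture's code. The parameters init, trans and emit are hypothetical, observations are assumed discrete, and probabilities are left unscaled, so long sequences would need scaling or log-space arithmetic.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def forward_backward(init: List[float], trans: Matrix, emit: Matrix,
                     obs: List[int]) -> Matrix:
    """Posterior marginals P(s_t = k | x_1..x_T) for every time step t and state k."""
    n, T = len(init), len(obs)
    # Forward pass: alpha[t][k] = P(x_1..x_t, s_t = k)
    alpha = [[init[k] * emit[k][obs[0]] for k in range(n)]]
    for t in range(1, T):
        alpha.append([emit[k][obs[t]] * sum(alpha[t - 1][j] * trans[j][k] for j in range(n))
                      for k in range(n)])
    # Backward pass: beta[t][k] = P(x_{t+1}..x_T | s_t = k)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[k][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for k in range(n)]
    # Combine the two passes and normalize at each time step
    posteriors = []
    for t in range(T):
        unnorm = [alpha[t][k] * beta[t][k] for k in range(n)]
        z = sum(unnorm)  # equals P(x_1..x_T) at every t
        posteriors.append([u / z for u in unnorm])
    return posteriors

def viterbi(init: List[float], trans: Matrix, emit: Matrix,
            obs: List[int]) -> Tuple[List[int], float]:
    """Most probable state sequence and its joint probability with the observations."""
    n, T = len(init), len(obs)
    # delta[k] = max over earlier states of P(s_1..s_{t-1}, s_t = k, x_1..x_t)
    delta = [init[k] * emit[k][obs[0]] for k in range(n)]
    backptrs = []
    for t in range(1, T):
        ptr = [max(range(n), key=lambda j: delta[j] * trans[j][k]) for k in range(n)]
        delta = [delta[ptr[k]] * trans[ptr[k]][k] * emit[k][obs[t]] for k in range(n)]
        backptrs.append(ptr)
    # Trace the best path backwards from the most probable final state
    path = [max(range(n), key=lambda k: delta[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[path[-1]]

# Hypothetical 2-state model with 2 observation symbols
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1]
```

With these numbers, viterbi(init, trans, emit, obs) returns the path [0, 1, 1] with joint probability about 0.0622. Both functions reuse the chain structure: each step only looks at the previous time step.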

Graphical models: examples
Factorial hidden Markov model as a Bayesian network (directed graphical model).
[Figure: hidden chains of linguistic features above a sequence of acoustic observations.]

Graphical models: examples
Plates and repeated sampling.
[Figure: plate diagram over class, topics and words, with a plate labeled M, shown beside a stack of example text documents.]
- The topics for each document are sampled from a class-conditional distribution.
- Each document has words sampled from a distribution that depends on the choice of topics.
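
The generative story above can be written as ancestral sampling through the plates. This is a minimal sketch under stated assumptions. Each document is assumed to have M word positions, and one topic is drawn per position from P(topic | class). Each word is then drawn from P(word | topic). The class names, topics, vocabulary and probabilities are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical distributions, for illustration only
P_CLASS = {"sports": 0.5, "science": 0.5}
P_TOPIC_GIVEN_CLASS = {
    "sports": {"games": 0.8, "health": 0.2},
    "science": {"health": 0.4, "physics": 0.6},
}
P_WORD_GIVEN_TOPIC = {
    "games": {"team": 0.5, "score": 0.5},
    "health": {"diet": 0.6, "score": 0.4},
    "physics": {"energy": 0.7, "field": 0.3},
}

def draw(dist: Dict[str, float], rng: random.Random) -> str:
    """Sample one key of a {value: probability} table."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]

def sample_document(M: int, rng: random.Random) -> Tuple[str, List[str], List[str]]:
    """Ancestral sampling: class, then M topics given the class, then one word per topic."""
    c = draw(P_CLASS, rng)
    topics = [draw(P_TOPIC_GIVEN_CLASS[c], rng) for _ in range(M)]  # the plate repeats M times
    words = [draw(P_WORD_GIVEN_TOPIC[z], rng) for z in topics]
    return c, topics, words

rng = random.Random(0)
corpus = [sample_document(M=5, rng=rng) for _ in range(3)]  # an outer plate over documents
```

The plate notation corresponds directly to the loops. Each plate is a repetition of the same conditional distributions with fresh samples.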

Graphical models: examples
Lattice models (e.g., the Ising model) as a Markov random field.
[Figure: lattice of spins s1, s2, ..., with edges between neighboring sites.]
The interactions are symmetric (e.g., alignment of two nearby spins is energetically favorable).
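
To make "energetically favorable" concrete, here is a sketch of the Ising energy on a tiny lattice. It rests on standard simplifications that are assumptions here, not details from the slide: spins take values in {-1, +1}, all nearest-neighbor pairs share one coupling J > 0, and there is no external field. The Markov random field assigns each configuration a probability proportional to exp(-E(s)), so lower energy means higher probability.

```python
import itertools
import math
from typing import List

Grid = List[List[int]]  # each spin is -1 or +1

def ising_energy(s: Grid, J: float = 1.0) -> float:
    """E(s) = -J * sum of s_i * s_j over horizontally and vertically adjacent sites."""
    rows, cols = len(s), len(s[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbor
                energy -= J * s[r][c] * s[r + 1][c]
            if c + 1 < cols:  # horizontal neighbor
                energy -= J * s[r][c] * s[r][c + 1]
    return energy

def boltzmann(grids: List[Grid], J: float = 1.0) -> List[float]:
    """P(s) proportional to exp(-E(s)), normalized over the listed configurations."""
    weights = [math.exp(-ising_energy(g, J)) for g in grids]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

# All 16 configurations of a 2x2 lattice, so the normalization is exact
configs = [[[a, b], [c, d]] for a, b, c, d in itertools.product([-1, 1], repeat=4)]
probs = boltzmann(configs)
```

The two fully aligned configurations have energy -4 and are the most probable. The checkerboard patterns have energy +4 and are the least probable. Each term s_i * s_j is symmetric in i and j, which matches the symmetric interactions on the slide.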

Graphical models: examples
Factor graphs and codes (information theory).
[Figure: bits x1, ..., x5 linked to parity-check factors and to variables y1, ..., y5.]
Circles denote variables, while squares are factors (functions) that constrain the values of the variables.
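
The factor-graph reading of a code can be sketched in a few lines. Each parity-check square is a function that equals 1 when its bits have even parity and 0 otherwise. Each y_i is tied to its bit x_i by a channel factor. The product of all factors is nonzero only on valid codewords.

The three checks below are hypothetical, chosen so the code can correct one flipped bit. The binary symmetric channel with flip probability 0.1 is also an assumption. Decoding here is brute-force enumeration; practical decoders instead pass messages on the factor graph.

```python
import itertools
from typing import List, Sequence, Tuple

# Hypothetical parity checks over bits x1..x5 (0-based indices); each is one square node
CHECKS: List[Tuple[int, ...]] = [(0, 1, 2), (0, 2, 3), (1, 2, 4)]

def parity_factor(x: Sequence[int], idx: Tuple[int, ...]) -> int:
    """1 if the bits in this check have even parity (XOR = 0), else 0."""
    return 1 if sum(x[i] for i in idx) % 2 == 0 else 0

def channel_factor(xi: int, yi: int, flip: float) -> float:
    """Factor linking bit x_i to observed y_i, assuming a binary symmetric channel."""
    return 1 - flip if xi == yi else flip

def score(x: Sequence[int], y: Sequence[int], flip: float = 0.1) -> float:
    """Product of all factors; proportional to P(x | y) for uniformly chosen codewords."""
    s = 1.0
    for idx in CHECKS:
        s *= parity_factor(x, idx)
    for xi, yi in zip(x, y):
        s *= channel_factor(xi, yi, flip)
    return s

def decode(y: Sequence[int], flip: float = 0.1) -> Tuple[int, ...]:
    """Brute-force MAP decoding over all 2^n bit vectors (feasible only for tiny n)."""
    return max(itertools.product([0, 1], repeat=len(y)), key=lambda x: score(x, y, flip))

# Valid codewords: the bit vectors on which every parity factor equals 1
codewords = [x for x in itertools.product([0, 1], repeat=5)
             if all(parity_factor(x, idx) for idx in CHECKS)]
```

These checks admit four codewords. decode((1, 0, 1, 1, 1)) recovers (1, 0, 1, 0, 1), the codeword that differs from the received word in a single bit.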

Graphical models
[Figure: the factorial HMM, plate, and parity-check examples side by side.]
- Graph semantics: graph separation properties correspond to independence properties
- Association with probability distributions: independence properties define a family of distributions
- Inference and estimation: graph structure enables efficient computation

Bayesian networks
Bayesian networks are directed acyclic graphs in which the nodes represent variables and the directed edges capture dependencies.
A mixture model as a Bayesian network: i → x, with joint distribution P(i) p(x | i). The edge can be read as "i influences x", "i causes x", or "x depends on i"; i is the parent of x, and x is the child of i.
- Graph semantics: graph separation properties correspond to independence properties
- Association with probability distributions: independence properties define a family of distributions
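
Reading the two-node network i → x as a recipe gives the joint P(i) p(x | i). Summing out the parent gives the marginal p(x), and Bayes' rule reverses the edge to give P(i | x).

This is a minimal sketch that assumes a discrete component index i and a discrete observation x. The tables are hypothetical. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction as F

# Hypothetical mixture: components i in {0, 1}, observation values x in {0, 1, 2}
P_i = {0: F(1, 3), 1: F(2, 3)}
p_x_given_i = {
    0: {0: F(1, 2), 1: F(1, 2), 2: F(0)},
    1: {0: F(1, 4), 1: F(1, 4), 2: F(1, 2)},
}

def joint(i: int, x: int) -> F:
    """Joint probability read off the graph i -> x: P(i) * p(x | i)."""
    return P_i[i] * p_x_given_i[i][x]

def marginal(x: int) -> F:
    """p(x) = sum over i of P(i) p(x | i)."""
    return sum(joint(i, x) for i in P_i)

def posterior(i: int, x: int) -> F:
    """P(i | x) by Bayes' rule: how likely component i generated x."""
    return joint(i, x) / marginal(x)
```

Here x = 2 can only come from component 1, so posterior(1, 2) is 1. For x = 0, both components are equally likely after weighting by the prior.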

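Example: a simple Bayesian network of coin tosses
Two coin tosses x1 and x2 (each h or t) are both parents of x3 = "same?": x1 → x3 ← x2.
Each coin has its own table, P(x1) and P(x2). Since x3 only records whether the tosses agree, its table is deterministic:
P(x3 | x1, x2):
  x1 x2     hh  ht  th  tt
  x3 = y     1   0   0   1
  x3 = n     0   1   1   0
Two levels of description:
1. graph structure (dependencies, independencies)
2. associated probability distribution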
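
The two levels of description map onto two data structures. The graph is a parent list, and the distribution is one conditional table per node. The joint is the product of the tables.

This is a minimal sketch that assumes both coins are fair. That choice of P(x1) and P(x2) is illustrative. The table for x3 is the deterministic "same?" table.

```python
from fractions import Fraction
from itertools import product

# Level 1: graph structure as a parent list (x1 -> x3 <- x2)
parents = {"x1": [], "x2": [], "x3": ["x1", "x2"]}

# Level 2: one conditional probability table per node, as functions of (value, parent values)
half = Fraction(1, 2)  # assumption: both coins are fair
cpt = {
    "x1": lambda v, pa: half,
    "x2": lambda v, pa: half,
    # x3 = "same?" is deterministic: "y" exactly when the two tosses agree
    "x3": lambda v, pa: Fraction(int(v == ("y" if pa[0] == pa[1] else "n"))),
}
values = {"x1": "ht", "x2": "ht", "x3": "yn"}

def joint(assignment):
    """Product over nodes of P(node value | parent values)."""
    p = Fraction(1)
    for node, pa in parents.items():
        p *= cpt[node](assignment[node], [assignment[q] for q in pa])
    return p

table = {(a, b, c): joint({"x1": a, "x2": b, "x3": c})
         for a, b, c in product(values["x1"], values["x2"], values["x3"])}
```

Only the four assignments consistent with the "same?" rule get nonzero probability, each 1/4. For example, table[("h", "t", "n")] is 1/4 and table[("h", "t", "y")] is 0.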

Example cont'd
What can the graph alone tell us? (x1 → x3 ← x2, with x3 = "same?")
- x1 and x2 are marginally independent.
- x1 and x2 become dependent if we know x3 (the dependence concerns our beliefs about the outcomes). For instance, if x3 = y and x1 = h, then x2 must be h.
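
Both statements can be checked exactly by comparing probabilities. This is a minimal sketch, assuming both coins are fair by default.

- Marginal independence: compare P(x1, x2) with P(x1) P(x2).
- Conditional independence: compare P(x1, x2 | x3) with P(x1 | x3) P(x2 | x3).

The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def coin_joint(p1=Fraction(1, 2), p2=Fraction(1, 2)):
    """P(x1, x2, x3) for x1 -> x3 <- x2, where p1, p2 are P(heads) and x3 = 'same?'."""
    joint = {}
    for a, b, c in product("ht", "ht", "yn"):
        pa = p1 if a == "h" else 1 - p1
        pb = p2 if b == "h" else 1 - p2
        agrees = (c == "y") == (a == b)  # deterministic table for x3
        joint[(a, b, c)] = pa * pb * (1 if agrees else 0)
    return joint

def marginal(joint, keep):
    """Sum out every position not listed in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0) + p
    return out

def independent(joint, i, j, given=None):
    """Exact test of x_i independent of x_j, optionally conditioned on x_given."""
    if given is None:
        pij, pi, pj = marginal(joint, (i, j)), marginal(joint, (i,)), marginal(joint, (j,))
        return all(pij[(a, b)] == pi[(a,)] * pj[(b,)] for (a, b) in pij)
    k = given
    pijk = marginal(joint, (i, j, k))
    pik, pjk, pk = marginal(joint, (i, k)), marginal(joint, (j, k)), marginal(joint, (k,))
    # P(a, b | c) = P(a | c) P(b | c) is equivalent to P(a, b, c) P(c) = P(a, c) P(b, c)
    return all(pijk[(a, b, c)] * pk[(c,)] == pik[(a, c)] * pjk[(b, c)]
               for (a, b, c) in pijk)

joint = coin_joint()
```

independent(joint, 0, 1) returns True, while independent(joint, 0, 1, given=2) returns False, matching the two statements above.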

Traffic example
Variables:
- X is nice?
- traffic light
- X decides to stop?
- the other car turns left?
- C = crash?
[Figure: Bayesian network over these variables.]
If we only know that X decided to stop, can X's character tell us anything about the other car turning?

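Graph, independence, d-separation
Are X's character and the other car turning independent given that X decided to stop?
Definition: two sets of variables are d-separated given a third set if the third set separates them in the moralized ancestral graph. The ancestral graph keeps only the variables involved and their ancestors. Moralizing it connects ("marries") every pair of parents that share a child and then drops the edge directions. D-separation implies conditional independence in every distribution that factors according to the graph.
[Figure: the traffic network in three stages: original, ancestral, moralized ancestral.]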
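
The definition translates directly into a procedure:

1. Keep only the query variables and their ancestors.
2. Moralize the result.
3. Delete the conditioning variables and check whether the two sets are still connected.

This is a minimal sketch, assuming the DAG is given as a dictionary from each node to its list of parents; the function names are illustrative. It is exercised on the coin-toss graph x1 → x3 ← x2 from earlier in the lecture.

```python
from itertools import combinations
from typing import Dict, List, Set

def ancestral_set(parents: Dict[str, List[str]], nodes: Set[str]) -> Set[str]:
    """The given nodes together with all of their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents.get(v, []))
    return result

def moralize(parents: Dict[str, List[str]], keep: Set[str]) -> Dict[str, Set[str]]:
    """Undirected moral graph restricted to `keep` (assumed closed under ancestors)."""
    adj = {v: set() for v in keep}
    for child in keep:
        pa = [p for p in parents.get(child, []) if p in keep]
        for p in pa:  # drop the direction of each parent -> child edge
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):  # "marry" parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    return adj

def d_separated(parents: Dict[str, List[str]], A: Set[str], B: Set[str], C: Set[str]) -> bool:
    """True if C separates A from B in the moralized ancestral graph of A, B and C."""
    adj = moralize(parents, ancestral_set(parents, A | B | C))
    # Search outward from A without entering C; reaching B means A and B are not separated
    seen, stack = set(A), list(A)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for w in adj[v]:
            if w not in seen and w not in C:
                seen.add(w)
                stack.append(w)
    return True

coin = {"x1": [], "x2": [], "x3": ["x1", "x2"]}
```

Without conditioning, x3 is not an ancestor of x1 or x2, so it drops out and the two tosses are separated. Conditioning on x3 keeps it, and moralization adds the edge x1 - x2, so they are no longer separated.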

Graphs and distributions
A graph is a compact representation of a large collection of independence properties.
Theorem: any probability distribution that is consistent with a directed graph G has to factor into the conditional distributions of each node given its parents:
$$P(x \mid G) = \prod_{i=1}^{d} P(x_i \mid x_{\mathrm{pa}_i})$$
where $x_{\mathrm{pa}_i}$ are the parents of $x_i$ and $d$ is the number of nodes (variables) in the graph.
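
As a worked instance, take the coin-toss network x1 → x3 ← x2 from earlier in the lecture. Here d = 3, x1 and x2 have no parents, and pa(x3) = {x1, x2}. The theorem turns the general chain rule into a product of the three tables:

```latex
\begin{aligned}
P(x_1, x_2, x_3) &= P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2) && \text{(chain rule, always true)}\\
                 &= P(x_1)\,P(x_2)\,P(x_3 \mid x_1, x_2) && \text{(graph: } x_1 \perp x_2 \text{)}
\end{aligned}
```

In this small graph the only saving is P(x2 | x1) = P(x2). In larger, sparser graphs, the same substitution replaces each full conditional with a table over just a few parents.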

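Model: explaining away phenomenon
[Figure: Earthquake → Alarm ← Burglary, and Earthquake → Radio report.]
- Evidence, competing causes: observing the alarm raises our belief in both an earthquake and a burglary, which compete to explain it.
- Additional evidence and explaining away: a radio report of an earthquake accounts for the alarm and lowers our belief in a burglary.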
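
Explaining away can be seen numerically by brute-force enumeration of the joint. This sketch assumes the structure Earthquake → Alarm ← Burglary and Earthquake → Radio report. All probabilities are hypothetical, chosen only for illustration.

The example compares three quantities:
- the prior P(B);
- P(B | A), after hearing the alarm;
- P(B | A, R), after also hearing the radio report.

```python
from itertools import product

# Hypothetical conditional probability tables for binary (True/False) variables
P_E = 0.01   # prior probability of an earthquake
P_B = 0.01   # prior probability of a burglary
P_A = {      # P(alarm = True | burglary, earthquake)
    (True, True): 0.95, (True, False): 0.9,
    (False, True): 0.3, (False, False): 0.01,
}
P_R = {True: 0.9, False: 0.001}   # P(radio report = True | earthquake)

def bern(p: float, value: bool) -> float:
    """P(var = value) for a binary variable with P(var = True) = p."""
    return p if value else 1 - p

def joint(b: bool, e: bool, a: bool, r: bool) -> float:
    """P(B, E, A, R) = P(B) P(E) P(A | B, E) P(R | E), read off the graph."""
    return bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a) * bern(P_R[e], r)

def prob_burglary(**evidence: bool) -> float:
    """P(B = True | evidence) by summing the joint over all assignments consistent with it."""
    num = den = 0.0
    for b, e, a, r in product([True, False], repeat=4):
        x = {"b": b, "e": e, "a": a, "r": r}
        if all(x[k] == v for k, v in evidence.items()):
            p = joint(b, e, a, r)
            den += p
            if b:
                num += p
    return num / den

prior = prob_burglary()
given_alarm = prob_burglary(a=True)
given_alarm_and_radio = prob_burglary(a=True, r=True)
```

With these numbers, the alarm lifts the burglary probability from 0.01 to roughly 0.41. The radio report then drops it to roughly 0.034, because the earthquake now accounts for the alarm.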