Probability and Structure in Natural Language Processing


1 Probability and Structure in Natural Language Processing Noah Smith Heidelberg University, November 2014

2

3 Statistical methods in NLP arrived ~20 years ago and now dominate. Mercer was right: there's no data like more data. And there's more and more data. Lots of new techniques; it's formidable to learn and keep up with all of them.

4 Thesis Most of the main ideas are related and similar to each other. Different approaches to decoding. Different learning criteria. Supervised and unsupervised learning. Umbrella: reasoning about discrete structures. This is good news!

5 Plan 1. Graphical models and inference Monday 2. Decoding and structures Tuesday 3. Supervised learning Wednesday 4. Hidden variables Thursday

6 The content is formal, but the style doesn't need to be. Ask questions! Help me find the right pace. Lecture 4 can be dropped or reduced if needed.

7 Lecture 1: Graphical Models and Inference

8 Random Variables Probability is usually defined over events. Events are complicated! We tend to group events by attributes. Person: Age, Grade, HairColor. Random variables formalize attributes: "Grade = A" is shorthand for the event {ω : f_Grade(ω) = A}. Properties of a random variable X: Val(X) = possible values of X. For discrete (categorical): Σ_{x ∈ Val(X)} P(X = x) = 1. For continuous: ∫ P(X = x) dx = 1. Nonnegativity: ∀ x ∈ Val(X), P(X = x) ≥ 0.

9 Conditional Probability After learning that α is true, how do we feel about β? P(β | α). (Venn diagram of the sample space Ω with events α and β.)

10 Chain Rule P(α ∩ β) = P(α) P(β | α). More generally, P(α_1 ∩ ··· ∩ α_k) = P(α_1) P(α_2 | α_1) ··· P(α_k | α_1 ∩ ··· ∩ α_{k-1}).

11 Bayes Rule P(α | β) = P(β | α) P(α) / P(β); posterior = likelihood × prior / normalization constant. With an additional conditioning event γ (an external event): P(α | β ∩ γ) = P(β | α ∩ γ) P(α | γ) / P(β | γ).

12 Independence α and β are independent if P(β | α) = P(β); shorthand: P ⊨ (α ⊥ β). Proposition: α and β are independent if and only if P(α ∩ β) = P(α) P(β).

13 Conditional Independence Independence is rarely true. α and β are conditionally independent given γ if P(β | α ∩ γ) = P(β | γ); shorthand: P ⊨ (α ⊥ β | γ). Proposition: P ⊨ (α ⊥ β | γ) if and only if P(α ∩ β | γ) = P(α | γ) P(β | γ).

14 Joint and Marginal Distributions P(Grade, Intelligence) given as a table over Grade ∈ {A, B} and Intelligence ∈ {high, very high}. (Table values not recovered.) How do we compute the marginal over each individual random variable?

15 General Case P(X_1 = x) = Σ_{x_2 ∈ Val(X_2)} ··· Σ_{x_n ∈ Val(X_n)} P(X_1 = x, X_2 = x_2, ..., X_n = x_n). How many terms are in the sum?
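Computed by brute force, this marginal makes the cost concrete: with n binary variables, the inner sum ranges over 2^(n-1) terms. A minimal Python sketch (the joint table here is a made-up uniform example, not from the slides):

import itertools

# Hypothetical joint table over three binary variables X1, X2, X3 (uniform here).
P = {assign: 1.0 / 8 for assign in itertools.product([0, 1], repeat=3)}

def marginal_x1(x, joint, n_vars=3):
    # Sum the joint over all values of X2, ..., Xn with X1 fixed to x.
    total = 0.0
    for rest in itertools.product([0, 1], repeat=n_vars - 1):
        total += joint[(x,) + rest]
    return total  # the loop has 2**(n_vars - 1) terms

print(marginal_x1(0, P))  # 0.5 for the uniform joint above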

16 Basic Concepts So Far Atomic outcomes: assignment of x_1, ..., x_n to X_1, ..., X_n. Conditional probability: P(X, Y) = P(X) P(Y | X). Bayes rule: P(X | Y) = P(Y | X) P(X) / P(Y). Chain rule: P(X_1, ..., X_n) = P(X_1) P(X_2 | X_1) ··· P(X_n | X_1, ..., X_{n-1}). Marginals: deriving P(X = x) from P(X, Y).

17 Sets of Variables Sets of variables X, Y, Z. X is independent of Y given Z if P ⊨ (X = x ⊥ Y = y | Z = z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z). Shorthand for this conditional independence: P ⊨ (X ⊥ Y | Z). For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y). Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z).

18 Factor Graphs

19 Factor Graphs Random variable nodes (circles). Factor nodes (squares). Edge between a variable and a factor if the factor depends on that variable; the graph is bipartite. A factor is a function from tuples of r.v. values to nonnegative numbers. Example: variables X_1, X_2, X_3 with factors φ_1, φ_2, φ_3, φ_4. P(X = x) ∝ ∏_j φ_j(x_j).
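As a concrete illustration of P(X = x) ∝ ∏_j φ_j(x_j), a factor can be stored as a table over its scope, and the unnormalized score of a full assignment is the product of the matching entries. A Python sketch with made-up factor values:

# Each factor: (scope, table mapping value tuples to nonnegative numbers). Values are illustrative.
phi1 = (("X1", "X2"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0})
phi2 = (("X2", "X3"), {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})
factors = [phi1, phi2]

def unnormalized_score(assignment, factors):
    # Product of the factor entries that match the assignment (a dict: variable -> value).
    score = 1.0
    for scope, table in factors:
        score *= table[tuple(assignment[v] for v in scope)]
    return score

print(unnormalized_score({"X1": 1, "X2": 1, "X3": 0}, factors))  # 3.0 * 1.0 = 3.0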

20 Two Kinds of Factors Conditional probability tables, e.g., P(X_2 | X_1, X_3): leads to Bayesian networks and causal explanations. Potential functions (arbitrary positive scores): leads to Markov networks. (Same example graph: X_1, X_2, X_3 with φ_1, φ_2, φ_3, φ_4.)

21 Example: Bayesian Network The flu causes sinus inflammation. Allergies also cause sinus inflammation. Sinus inflammation causes a runny nose. Sinus inflammation causes headaches. (Variables: Flu, All., S.I., R.N., H.)

22 The same story as a factor graph: the flu causes sinus inflammation; allergies also cause sinus inflammation; sinus inflammation causes a runny nose; sinus inflammation causes headaches. Factors: ϕ_F(Flu), ϕ_A(All.), ϕ_FAS(Flu, All., S.I.), ϕ_SR(S.I., R.N.), ϕ_SH(S.I., H.). Some local configurations are more likely than others.

23 The same factor graph, now with the factor tables ϕ_F(F), ϕ_A(A), ϕ_FAS(F, A, S), ϕ_SR(S, R), ϕ_SH(S, H) shown. (Table values not recovered.) Some local configurations are more likely than others.

24 Example: Markov Network "Swinging couples" or confused students: (A ⊥ C | B, D) and (B ⊥ D | A, C). (Graph: A, B, C, D arranged in a ring.)

25 Example: Markov Network Each random variable is a vertex. Undirected edges. Factors are associated with subsets of nodes that form cliques. A factor maps assignments of its nodes to nonnegative values. (Graph: A, B, C, D.)

26 In this example, associate a factor with each edge: ϕ_1(A, B), ϕ_2(B, C), ϕ_3(C, D), ϕ_4(A, D). Could also have factors for single nodes! (Table values not recovered.)

27 Markov Networks Probability: P(a, b, c, d) ∝ ϕ_1(a, b) ϕ_2(b, c) ϕ_3(c, d) ϕ_4(a, d), i.e., P(a, b, c, d) = ϕ_1(a, b) ϕ_2(b, c) ϕ_3(c, d) ϕ_4(a, d) / Z, where Z = Σ_{a,b,c,d} ϕ_1(a, b) ϕ_2(b, c) ϕ_3(c, d) ϕ_4(a, d).

28 Example: Markov Network With the factor tables on the slide, Z = Σ_{a,b,c,d} ϕ_1(a, b) ϕ_2(b, c) ϕ_3(c, d) ϕ_4(a, d) = 7,201,840.

29 Example: Markov Network For instance, P(0, 1, 1, 0) = 5,000,000 / Z ≈ 0.69.

30 Example: Markov Network And P(1, 1, 0, 0) = 10 / Z ≈ 1.4 × 10⁻⁶.
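This whole example fits in a few lines of Python. The factor tables below are filled in with the standard values for this example (the well-known Koller & Friedman tables); they reproduce the numbers on the slides, Z = 7,201,840, P(0, 1, 1, 0) ≈ 0.69, and P(1, 1, 0, 0) ≈ 1.4 × 10⁻⁶.

import itertools

# Pairwise factor tables on the ring A-B, B-C, C-D, A-D (standard values for this example).
phi = {
    ("A", "B"): {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0},
    ("B", "C"): {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0},
    ("C", "D"): {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0},
    ("A", "D"): {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0},
}

def score(a, b, c, d):
    # Unnormalized score: product of the four pairwise factors.
    vals = {"A": a, "B": b, "C": c, "D": d}
    s = 1.0
    for (u, v), table in phi.items():
        s *= table[(vals[u], vals[v])]
    return s

Z = sum(score(*assign) for assign in itertools.product([0, 1], repeat=4))
print(Z)                        # 7201840.0
print(score(0, 1, 1, 0) / Z)    # ~0.69
print(score(1, 1, 0, 0) / Z)    # ~1.4e-06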

31 Independence and Structure There's a lot of theory about how BNs and MNs encode conditional independence assumptions. BNs: a variable X is independent of its non-descendants given its parents. MNs: conditional independence derived from Markov blanket and separation properties. Local configurations can be used to check all conditional independence questions; almost no need to look at the values in the factors!

32 Independence Spectrum At one extreme, full independence assumptions: P(x) = ∏_i ϕ_i(x_i). At the other extreme, a single factor ϕ(x): everything is dependent. Various graphs lie in between.

33 Products of Factors Given two factors with different scopes, we can calculate a new factor equal to their product: ϕ_product(x, y) = ϕ_1(x) ϕ_2(y).

34 Products of Factors Given two factors with different scopes, we can calculate a new factor equal to their product. Example: ϕ_1(A, B) × ϕ_2(B, C) = ϕ_3(A, B, C). (Table values not recovered.)
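A factor product is easy to implement over tables: the scope of the result is the union of the two scopes, and each entry is the product of the matching entries. A Python sketch (the factor values are illustrative, not the slide's):

import itertools

def factor_product(scope1, table1, scope2, table2, domain=(0, 1)):
    # Result scope: union of the two scopes, keeping order of first appearance.
    scope = list(scope1) + [v for v in scope2 if v not in scope1]
    table = {}
    for vals in itertools.product(domain, repeat=len(scope)):
        assign = dict(zip(scope, vals))
        key1 = tuple(assign[v] for v in scope1)
        key2 = tuple(assign[v] for v in scope2)
        table[vals] = table1[key1] * table2[key2]
    return scope, table

# phi1(A, B) * phi2(B, C) = phi3(A, B, C)
phi1 = {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.0}
phi2 = {(0, 0): 0.5, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.2}
scope3, phi3 = factor_product(("A", "B"), phi1, ("B", "C"), phi2)
print(scope3, phi3[(0, 1, 0)])  # ['A', 'B', 'C'] 0.08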

35 Factor Maximization Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via maximization: ψ(X) = max_Y ϕ(X, Y). We can refer to this new factor as max_Y ϕ.

36 Factor Maximization Example: maximizing out B from ϕ(A, B, C) gives ψ(A, C) = max_B ϕ(A, B, C). (Table values not recovered.)

37 Factor Marginalization Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization: ψ(X) = Σ_{y ∈ Val(Y)} ϕ(X, y).

38 Factor Marginalization Example: summing out B from ϕ(A, B, C) gives ψ(A, C) = Σ_b ϕ(A, b, C). (Table values not recovered.)

39 Factor Marginalization Example: summing out C from ϕ(A, B, C) gives ψ(A, B) = Σ_c ϕ(A, B, c). (Table values not recovered.)

40 Factor Marginalization We can refer to the new factor ψ(X) = Σ_{y ∈ Val(Y)} ϕ(X, y) as Σ_Y ϕ.
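Summing out and maxing out have the same table-level implementation; they differ only in whether entries that agree on the remaining variables are combined with + or max. A sketch, assuming factors are stored as dicts over value tuples (values made up):

def eliminate_var(scope, table, var, combine):
    # Remove `var` from the factor by combining entries (sum or max) over its values.
    idx = scope.index(var)
    new_scope = [v for v in scope if v != var]
    new_table = {}
    for vals, score in table.items():
        key = vals[:idx] + vals[idx + 1:]
        new_table[key] = combine(new_table.get(key), score)
    return new_scope, new_table

def add(old, x):
    return x if old is None else old + x

def take_max(old, x):
    return x if old is None else max(old, x)

# An illustrative factor phi(A, B, C) over binary variables.
phi = {(a, b, c): (a + 1) * (b + 2) * (c + 3)
       for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(eliminate_var(["A", "B", "C"], phi, "B", add))       # psi(A, C) = sum_B phi
print(eliminate_var(["A", "B", "C"], phi, "B", take_max))  # psi(A, C) = max_B phi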

41 Marginalizing Everything? Take a factor graph's "everything factor" by multiplying all of its factors. Sum out all the variables (one by one). What do you get?

42 Factors Are Like Numbers Products are commutative: ϕ_1 ϕ_2 = ϕ_2 ϕ_1. Products are associative: (ϕ_1 ϕ_2) ϕ_3 = ϕ_1 (ϕ_2 ϕ_3). Sums are commutative: Σ_X Σ_Y ϕ = Σ_Y Σ_X ϕ (max, too). Distributivity of multiplication over marginalization and maximization: if X ∉ Scope(ϕ_1), then Σ_X (ϕ_1 ϕ_2) = ϕ_1 Σ_X ϕ_2 and max_X (ϕ_1 ϕ_2) = ϕ_1 max_X ϕ_2.
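These identities are easy to sanity-check numerically. A tiny sketch with made-up tables, where ϕ_1 has scope {A} and ϕ_2 has scope {A, B}:

# phi1 has scope {A}, phi2 has scope {A, B}; values are made up, variables binary.
phi1 = {0: 2.0, 1: 5.0}
phi2 = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.6}

for a in (0, 1):
    lhs = sum(phi1[a] * phi2[(a, b)] for b in (0, 1))    # sum_B (phi1 * phi2)
    rhs = phi1[a] * sum(phi2[(a, b)] for b in (0, 1))    # phi1 * (sum_B phi2)
    assert abs(lhs - rhs) < 1e-12
print("distributivity holds on this example")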

43 Inference

44 Querying the Model Inference (e.g., do you have allergies or the flu?). What's the best explanation for your symptoms? Active data collection (what is the next best r.v. to observe?). (Example graph: Flu, All., S.I., R.N., H. with factors ϕ_F, ϕ_A, ϕ_FAS, ϕ_SR, ϕ_SH.)

45 A Bigger Example: Your Car The car doesn't start. What do we conclude about the battery age? 18 random variables; 2^18 possible scenarios.

46 Inference: A Ubiquitous Obstacle Decoding is inference (lecture 2). Learning is inference (lectures 3 and 4). Exact inference is #P-complete. Even approximations within a given absolute or relative error are hard.

47 Inference Problems Given values for some random variables (X ⊆ V): Most probable explanation: what are the most probable values of the rest of the r.v.s, V \ X? (More generally:) Maximum a posteriori (MAP): what are the most probable values of some other r.v.s, Y ⊆ (V \ X)? Random sampling from the posterior over values of Y. Full posterior over values of Y. Marginal probabilities from the posterior over Y. Minimum Bayes risk: what is the Y with the lowest expected cost? Cost-augmented decoding: what is the most dangerous Y?

48 Approaches to Inference (taxonomy diagram) Exact: variable elimination, dynamic programming, ILP. Approximate: randomized methods (MCMC, Gibbs sampling, importance sampling, randomized search, simulated annealing) and others (loopy belief propagation, LP relaxation, local search, mean field, dual decomposition, beam search). The diagram also marks which methods are covered today vs. tomorrow, and which are hard-inference methods, soft-inference methods, or both.

49 Exact Marginal Inference for Y This will be a generalization of algorithms you may already have seen: the forward and backward algorithms. The general name is variable elimination. After we see it for the marginal, we'll see how to use it for the MAP.

50 Inference Example Goal: P(D). Chain-structured BN A → B → C → D with P(A) = ϕ_A(A), P(B | A) = ϕ_AB(A, B), P(C | B) = ϕ_BC(B, C), P(D | C) = ϕ_CD(C, D). (Table values not recovered.)

51 Inference Example Let's calculate P(B) first. (Chain A → B → C → D.)

52 Inference Example Let's calculate P(B) first: P(B) = Σ_{a ∈ Val(A)} P(A = a) P(B | A = a), using ϕ_A(A) = P(A) and ϕ_AB(A, B) = P(B | A).

53 Inference Example Let's calculate P(B) first: P(B) = Σ_{a ∈ Val(A)} P(A = a) P(B | A = a). Note: C and D don't matter.

54 Inference Example P(B) = Σ_{a ∈ Val(A)} P(A = a) P(B | A = a) defines a new factor ϕ_B(B) = P(B). (Table values not recovered.)

55 Inference Example New model in which A is eliminated; defines P(B, C, D). Chain B → C → D with ϕ_B(B) = P(B), ϕ_BC(B, C) = P(C | B), ϕ_CD(C, D) = P(D | C).

56 Inference Example Same thing to eliminate B: P(C) = Σ_{b ∈ Val(B)} P(B = b) P(C | B = b), giving ϕ_C(C) = P(C).

57 Inference Example New model in which B is eliminated; defines P(C, D). Chain C → D with ϕ_C(C) = P(C) and ϕ_CD(C, D) = P(D | C).

58 Simple Inference Example Last step to get P(D): P(D) = Σ_{c ∈ Val(C)} P(C = c) P(D | C = c), giving ϕ_D(D) = P(D).

59 Simple Inference Example Notice that the same step happened for each random variable: we created a new factor over the variable and its successor, then summed out (marginalized) the variable. P(D) = Σ_{a ∈ Val(A)} Σ_{b ∈ Val(B)} Σ_{c ∈ Val(C)} P(A = a) P(B = b | A = a) P(C = c | B = b) P(D | C = c) = Σ_{c ∈ Val(C)} P(D | C = c) Σ_{b ∈ Val(B)} P(C = c | B = b) Σ_{a ∈ Val(A)} P(A = a) P(B = b | A = a).

60 That Was Variable Elimination We reused work from previous steps and avoided doing the same work more than once. Dynamic programming à la the forward algorithm! We exploited the graph structure (each subexpression only depends on a small number of variables). Exponential blowup avoided!
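The chain computation above fits in a few lines once each elimination step is written as a small sum. A Python sketch with made-up CPT values (the slides' actual numbers are not recovered):

# P(D) = sum_c P(D|c) sum_b P(C=c|b) sum_a P(a) P(B=b|a), computed inside-out so that
# no sum ever ranges over more than one variable at a time.
p_A = {0: 0.6, 1: 0.4}
p_B_given_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # key: (b, a)
p_C_given_B = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}   # key: (c, b)
p_D_given_C = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}   # key: (d, c)

# Eliminate A, then B, then C: each step produces a one-variable factor.
p_B = {b: sum(p_A[a] * p_B_given_A[(b, a)] for a in (0, 1)) for b in (0, 1)}
p_C = {c: sum(p_B[b] * p_C_given_B[(c, b)] for b in (0, 1)) for c in (0, 1)}
p_D = {d: sum(p_C[c] * p_D_given_C[(d, c)] for c in (0, 1)) for d in (0, 1)}
print(p_D)  # sums to 1.0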

61 What Remains Variable elimination in general. The max-product version (for MAP inference). A bit about approximate inference.

62 Eliminating One Variable Input: set of factors Φ, variable Z to eliminate. Output: new set of factors Ψ. 1. Let Φ' = {ϕ ∈ Φ : Z ∈ Scope(ϕ)}. 2. Let Ψ = {ϕ ∈ Φ : Z ∉ Scope(ϕ)}. 3. Let ψ be Σ_Z ∏_{ϕ ∈ Φ'} ϕ. 4. Return Ψ ∪ {ψ}.
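A direct transcription of this procedure, under the factor-as-table representation used in the sketches above (variable domains are assumed binary for simplicity):

import itertools

def eliminate_one(factors, Z, domain=(0, 1)):
    # factors: list of (scope, table) pairs; Z: name of the variable to eliminate.
    relevant = [f for f in factors if Z in f[0]]        # step 1: Phi'
    rest = [f for f in factors if Z not in f[0]]        # step 2: Psi
    scope = sorted({v for s, _ in relevant for v in s}) # scope of the product of Phi'
    new_scope = [v for v in scope if v != Z]
    psi = {}
    for vals in itertools.product(domain, repeat=len(scope)):
        assign = dict(zip(scope, vals))
        prod = 1.0
        for s, t in relevant:
            prod *= t[tuple(assign[v] for v in s)]
        key = tuple(assign[v] for v in new_scope)
        psi[key] = psi.get(key, 0.0) + prod             # step 3: sum out Z
    return rest + [(new_scope, psi)]                    # step 4: Psi plus the new factor

# Eliminating A from { phi_A(A), phi_AB(A, B) } leaves one factor over B (values made up).
factors = [(("A",), {(0,): 0.6, (1,): 0.4}),
           (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})]
print(eliminate_one(factors, "A"))   # [(['B'], {(0,): 0.5, (1,): 0.5})]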

63 Example Query: P(Flu | runny nose). (Graph: Flu, All., S.I., R.N., H. with factors ϕ_F, ϕ_A, ϕ_FAS, ϕ_SR, ϕ_SH.) Let's eliminate H.

64 Example Query: P(Flu | runny nose). Let's eliminate H. 1. Φ' = {ϕ_SH}. 2. Ψ = {ϕ_F, ϕ_A, ϕ_FAS, ϕ_SR}. 3. ψ = Σ_H ∏_{ϕ ∈ Φ'} ϕ. 4. Return Ψ ∪ {ψ}.

65 Example Query: P(Flu | runny nose). Same as the previous slide, with step 3 made concrete: ψ = Σ_H ϕ_SH.

66 Example Query: P(Flu | runny nose). The new factor is ψ(S) = Σ_H ϕ_SH(S, H). (Table values not recovered.)

67 Example Query: P(Flu | runny nose). The new factor ψ(S) replaces ϕ_SH, and H no longer appears in the graph.

68 Example Query: P(Flu | runny nose). We can actually ignore the new factor ψ(S), equivalently just deleting H! Why? In some cases eliminating a variable is really easy!

69 Example Query: P(Flu | runny nose). H is already eliminated. Let's now eliminate S.

70 Example Query: P(Flu | runny nose). Eliminating S: 1. Φ' = {ϕ_SR, ϕ_FAS}. 2. Ψ = {ϕ_F, ϕ_A}. 3. ψ_FAR = Σ_S ∏_{ϕ ∈ Φ'} ϕ. 4. Return Ψ ∪ {ψ_FAR}.

71 Example Query: P(Flu | runny nose). Same as the previous slide, with step 3 made concrete: ψ_FAR = Σ_S ϕ_SR ϕ_FAS.

72 Example Query: P(Flu | runny nose). After eliminating S, the graph contains Flu, All., and R.N., joined by the new factor ψ_FAR (plus ϕ_F and ϕ_A).

73 Example Query: P(Flu | runny nose). Finally, eliminate A.

74 Example Query: P(Flu | runny nose). Eliminating A: 1. Φ' = {ϕ_A, ψ_FAR}. 2. Ψ = {ϕ_F}. 3. ψ_FR = Σ_A ϕ_A ψ_FAR. 4. Return Ψ ∪ {ψ_FR}.

75 Example Query: P(Flu | runny nose). After eliminating A, only Flu and R.N. remain, with factors ϕ_F and ψ_FR.

76 Chain, Again Goal: P(D). Chain A → B → C → D with P(A) = ϕ_A(A), P(B | A) = ϕ_AB(A, B), P(C | B) = ϕ_BC(B, C), P(D | C) = ϕ_CD(C, D). Earlier, we eliminated A, then B, then C.

77 Chain, Again Goal: P(D). Earlier, we eliminated A, then B, then C. Let's start with C instead.


79 Chain, Again Eliminating C uses ϕ_BC(B, C) and ϕ_CD(C, D); their product is ϕ_BCD(B, C, D). (Table values not recovered.)

80 Chain, Again Eliminating C: ψ(B, D) = Σ_C ϕ_BCD(B, C, D). (Table values not recovered.)

81 Chain, Again Eliminating B will be similarly complex: it involves ϕ_A(A), ϕ_AB(A, B), and ψ(B, D).

82 Variable Elimination Comments Can prune away all non-ancestors of the query variables. Ordering makes a difference!

83 What about Evidence? So far, we've just considered the posterior/marginal P(Y). Next: P(Y | X = x). It's almost the same: the additional step is to reduce factors to respect the evidence.

84 Example Query: P(Flu | runny nose). Let's reduce to R = true (runny nose): the rows of ϕ_SR(S, R) with R = true become a new factor ϕ'_S(S). (Table values not recovered.)

85 Example Query: P(Flu | runny nose). After reducing to R = true, the factors are ϕ_F, ϕ_A, ϕ_FAS, ϕ'_S, ϕ_SH, and R.N. no longer appears in the graph.
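Reducing a factor by evidence just keeps the rows consistent with the observed value and drops the observed variable from the scope. A Python sketch, continuing the factor-as-table representation (the values used for ϕ_SR in the demonstration are made up):

def reduce_by_evidence(factors, evidence):
    # Keep only rows consistent with the evidence, then drop the observed variables.
    reduced = []
    for scope, table in factors:
        new_scope = [v for v in scope if v not in evidence]
        new_table = {}
        for vals, score in table.items():
            assign = dict(zip(scope, vals))
            if all(assign[v] == x for v, x in evidence.items() if v in scope):
                new_table[tuple(assign[v] for v in new_scope)] = score
        reduced.append((new_scope, new_table))
    return reduced

# phi_SR(S, R) reduced by R = 1 becomes phi'_S(S).
phi_SR = (("S", "R"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print(reduce_by_evidence([phi_SR], {"R": 1}))   # [(['S'], {(0,): 0.1, (1,): 0.8})]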

86 Example Query: P(Flu | runny nose). Now run variable elimination all the way down to one factor (for F). Eliminate H.

87 Example Query: P(Flu | runny nose). Now run variable elimination all the way down to one factor (for F). Eliminate S.

88 Example Query: P(Flu | runny nose). Eliminating S leaves Flu and All. with the new factor ψ_FA. Eliminate A.

89 Example Query: P(Flu | runny nose). Only Flu remains, with factors ϕ_F and ψ_F. Take the final product.

90 Example Query: P(Flu | runny nose). The result of running variable elimination all the way down is a single factor over Flu: ϕ_F ψ_F.

91 Comments Runtime depends on the size of the intermediate factors. Hence, variable ordering matters a lot. But it's NP-hard to find the best one. For MNs, chordal graphs permit inference linear in the size of the original factors. For BNs, polytree structures do the same. If you can avoid big intermediate factors, you can make inference linear in the size of the original factors.

92 Variable Elimination for P(Y | X = x) Input: graphical model on V, set of query variables Y, evidence X = x. Output: factor ϕ and scalar α. 1. Φ = factors in the model. 2. Reduce factors in Φ by X = x. 3. Choose a variable ordering π on Z = V \ Y \ X. 4. ϕ = Variable-Elimination(Φ, Z, π). 5. α = Σ_{y ∈ Val(Y)} ϕ(y). 6. Return ϕ, α.
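Putting the pieces together, a sketch of this procedure in Python, reusing eliminate_one and reduce_by_evidence from the earlier sketches (the factor representation and the caller-supplied ordering are assumptions, not the slides'):

import itertools

def query(factors, query_vars, evidence, ordering, domain=(0, 1)):
    # Steps 1-2: start from the model's factors, reduced by the evidence X = x.
    factors = reduce_by_evidence(factors, evidence)
    # Steps 3-4: eliminate the non-query, non-evidence variables in the given order.
    for Z in ordering:
        factors = eliminate_one(factors, Z)
    # Multiply what is left into a single factor phi over the query variables.
    phi = {}
    for vals in itertools.product(domain, repeat=len(query_vars)):
        assign = dict(zip(query_vars, vals))
        prod = 1.0
        for scope, table in factors:
            prod *= table[tuple(assign[v] for v in scope)]
        phi[vals] = prod
    # Step 5: the normalizer; P(query_vars = vals | evidence) = phi[vals] / alpha.
    alpha = sum(phi.values())
    return phi, alpha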

93 Getting Back to NLP Structured NLP models were chosen with these properties in mind: HMMs, PCFGs (with a little work). But not: IBM Model 3. To decode, we need MAP inference. When models get complicated, we need approximations!

94 From Marginals to MAP Replace factor marginalization steps with maximization. Add bookkeeping to keep track of the maximizing values. Add a traceback at the end to recover the solution. This is analogous to the connection between the forward algorithm and the Viterbi algorithm. The ordering challenge is the same.

95 Variable Elimination (Max-Product Version with Decoding) Input: set of factors Φ, ordered list of variables Z to eliminate. Output: new factor and decoded assignment. 1. For each Z_i ∈ Z (in order): let (Φ, ψ_{Z_i}) = Eliminate-One(Φ, Z_i). 2. Return ∏_{ϕ ∈ Φ} ϕ and Traceback({ψ_{Z_i}}).

96 Eliminating One Variable (Max-Product Version with Bookkeeping) Input: set of factors Φ, variable Z to eliminate. Output: new set of factors Ψ and bookkeeping factor ψ. 1. Let Φ' = {ϕ ∈ Φ : Z ∈ Scope(ϕ)}. 2. Let Ψ = {ϕ ∈ Φ : Z ∉ Scope(ϕ)}. 3. Let τ be max_Z ∏_{ϕ ∈ Φ'} ϕ, and let ψ be ∏_{ϕ ∈ Φ'} ϕ (bookkeeping). 4. Return Ψ ∪ {τ}, ψ.

97 Traceback Input: sequence of factors with associated variables (ψ_{Z_1}, ..., ψ_{Z_k}). Output: z*. Each ψ_{Z_i} is a factor with scope including Z_i and variables eliminated after Z_i. Work backwards from i = k to 1: let z*_i = arg max_z ψ_{Z_i}(z, z*_{i+1}, z*_{i+2}, ..., z*_k). Return z*.
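A sketch of the max-product version with bookkeeping and traceback, again over factors stored as (scope, table) pairs with binary variables; the factor values in the usage example are made up:

import itertools

def eliminate_one_max(factors, Z, domain=(0, 1)):
    # Max-product analogue of Eliminate-One: returns the new factor set and the bookkeeping factor psi.
    relevant = [f for f in factors if Z in f[0]]
    rest = [f for f in factors if Z not in f[0]]
    scope = sorted({v for s, _ in relevant for v in s})
    psi = {}
    for vals in itertools.product(domain, repeat=len(scope)):
        assign = dict(zip(scope, vals))
        prod = 1.0
        for s, t in relevant:
            prod *= t[tuple(assign[v] for v in s)]
        psi[vals] = prod
    new_scope = [v for v in scope if v != Z]
    tau = {}
    for vals, score in psi.items():
        key = tuple(v for v, name in zip(vals, scope) if name != Z)
        tau[key] = max(tau.get(key, 0.0), score)
    return rest + [(new_scope, tau)], (scope, psi)

def max_decode(factors, ordering, domain=(0, 1)):
    bookkeeping = []
    for Z in ordering:
        factors, psi = eliminate_one_max(factors, Z)
        bookkeeping.append((Z, psi))
    # Traceback: fix the last-eliminated variable first, then work backwards.
    best = {}
    for Z, (scope, psi) in reversed(bookkeeping):
        best[Z] = max(domain, key=lambda v: psi[tuple(best.get(name, v) for name in scope)])
    return best

factors = [(("A",), {(0,): 0.3, (1,): 0.7}),
           (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6})]
print(max_decode(factors, ["A", "B"]))   # A = 1, B = 1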

98 About the Traceback No extra expense: a linear traversal over the intermediate factors. The factor operations used by both sum-product VE and max-product VE can be generalized. Example: get the K most likely assignments.

99 Variable Elimination Tips Any ordering will be correct. Most orderings will be too expensive. There are heuristics for choosing an ordering. If the graph is chain-like, work from one end toward the other.

100 (Rocket Science: True MAP) Evidence: X = x. Query: Y. Other variables: Z = V \ X \ Y. y* = arg max_{y ∈ Val(Y)} P(Y = y | X = x) = arg max_{y ∈ Val(Y)} Σ_{z ∈ Val(Z)} P(Y = y, Z = z | X = x). First marginalize out Z, then do MAP inference over Y given X = x. This is not usually attempted in NLP, with some exceptions.

101 Parting Shots You will probably never implement the general variable elimination algorithm. You will rarely use exact inference. Understand what the inference problem would look like in exact form; then approximate. Sometimes you get lucky. You'll appreciate the approximations better as they come along.
