Probability and Structure in Natural Language Processing
1 Probability and Structure in Natural Language Processing Noah Smith Heidelberg University, November 2014
3 Statistical methods in NLP arrived ~20 years ago and now dominate. Mercer was right: "There's no data like more data." And there's more and more data. There are lots of new techniques, too; it's formidable to learn and keep up with all of them.
4 Thesis: Most of the main ideas are related and similar to each other: different approaches to decoding, different learning criteria, supervised and unsupervised learning. The umbrella: reasoning about discrete structures. This is good news!
5 Plan 1. Graphical models and inference Monday 2. Decoding and structures Tuesday 3. Supervised learning Wednesday 4. Hidden variables Thursday
6 The content is formal, but the style doesn't need to be. Ask questions! Help me find the right pace. Lecture 4 can be dropped or reduced if needed.
7 Lecture 1: Graphical Models and Inference
8 Random Variables: Probability is usually defined over events. Events are complicated! We tend to group events by attributes: Person → Age, Grade, HairColor. Random variables formalize attributes: "Grade = A" is shorthand for the event {ω : f_Grade(ω) = A}. Properties of a random variable X: Val(X) = possible values of X. For discrete (categorical) X: Σ_{x ∈ Val(X)} P(X = x) = 1. For continuous X: ∫ P(X = x) dx = 1. Nonnegativity: ∀x ∈ Val(X), P(X = x) ≥ 0.
9 Conditioning: After learning that α is true, how do we feel about β? P(β | α). (Venn diagram: Ω, α, α ∩ β, β.)
10 Chain Rule: P(α ∩ β) = P(α) P(β | α). In general, P(α1 ∩ … ∩ αk) = P(α1) P(α2 | α1) ⋯ P(αk | α1 ∩ … ∩ αk−1).
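The chain rule can be checked numerically on a toy joint distribution; the table values below are made up for illustration, not from the lecture.

```python
# Toy joint over two binary variables (made-up values for illustration).
P_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Marginal P(X = x) by summing over y.
P_x = {x: sum(p for (a, _), p in P_xy.items() if a == x) for x in (0, 1)}

# Conditional P(Y = y | X = x) from the definition of conditional probability.
P_y_given_x = {(y, x): P_xy[(x, y)] / P_x[x] for (x, y) in P_xy}

# Chain rule: P(x, y) = P(x) * P(y | x), for every assignment.
for (x, y), p in P_xy.items():
    assert abs(p - P_x[x] * P_y_given_x[(y, x)]) < 1e-12
```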
11 Bayes Rule: P(α | β) = P(β | α) P(α) / P(β), i.e., posterior = likelihood × prior / normalization constant. Conditioned on an external event γ: P(α | β ∩ γ) = P(β | α ∩ γ) P(α | γ) / P(β | γ).
12 Independence: α and β are independent if P(β | α) = P(β). Proposition: α and β are independent if and only if P(α ∩ β) = P(α) P(β).
13 Conditional Independence: Independence is rarely true. α and β are conditionally independent given γ if P(β | α ∩ γ) = P(β | γ). Proposition: (α ⊥ β | γ) holds if and only if P(α ∩ β | γ) = P(α | γ) P(β | γ).
14 Joint Distributions: P(Grade, Intelligence), given as a table over Grade ∈ {A, B} and Intelligence ∈ {high, very high} (table values not shown). Exercise: compute the marginal over each individual random variable.
15 General Case: P(X1 = x) = Σ_{x2 ∈ Val(X2)} ⋯ Σ_{xn ∈ Val(Xn)} P(X1 = x, X2 = x2, …, Xn = xn). How many terms?
16 Basic Concepts So Far: Atomic outcomes: assignment of x1, …, xn to X1, …, Xn. Conditional probability: P(X, Y) = P(X) P(Y | X). Bayes rule: P(X | Y) = P(Y | X) P(X) / P(Y). Chain rule: P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn−1). Marginals: deriving P(X = x) from P(X, Y).
17 Sets of Variables: For sets of variables X, Y, Z: X is independent of Y given Z if P ⊨ (X = x ⊥ Y = y | Z = z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z). Shorthand for this conditional independence: P ⊨ (X ⊥ Y | Z). For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y). Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z).
18 Factor Graphs
19 Factor Graphs: Random variable nodes (circles) and factor nodes (squares), with an edge between a variable and a factor if the factor depends on that variable. The graph is bipartite. A factor is a function from tuples of r.v. values to nonnegative numbers. Example: variables X1, X2, X3 with factors φ1, φ2, φ3, φ4, and P(X = x) ∝ ∏_j φ_j(x_j).
20 Two Kinds of Factors: Conditional probability tables, e.g., P(X2 | X1, X3); these lead to Bayesian networks and causal explanations. Potential functions: arbitrary positive scores; these lead to Markov networks.
21 Example: Bayesian Network. The flu causes sinus inflammation. Allergies also cause sinus inflammation. Sinus inflammation causes a runny nose. Sinus inflammation causes headaches. (Variables: Flu, All., S.I., R.N., H.)
22 The same network with factors ϕF, ϕA, ϕFAS, ϕSR, ϕSH attached to Flu, All., S.I., R.N., and H. Some local configurations are more likely than others.
23 (Tables for ϕF(F), ϕA(A), ϕFAS(F, A, S), ϕSR(S, R), and ϕSH(S, H); values not shown.) Some local configurations are more likely than others.
24 Example: Markov Network. "Swinging couples" (or confused students): we want exactly the independencies (A ⊥ C | B, D) and (B ⊥ D | A, C). No Bayesian network over A, B, C, D captures both; the undirected 4-cycle A–B–C–D–A does.
25 Example: Markov Network. Each random variable is a vertex; edges are undirected. Factors are associated with subsets of nodes that form cliques. A factor maps assignments of its nodes to nonnegative values. (Here: the 4-cycle on A, B, C, D.)
26 In this example, associate a factor with each edge: ϕ1(A, B), ϕ2(B, C), ϕ3(C, D), ϕ4(A, D) (tables not shown). Could also have factors for single nodes!
27 Markov Network Probability: P(a, b, c, d) ∝ ϕ1(a, b) ϕ2(b, c) ϕ3(c, d) ϕ4(a, d), i.e., P(a, b, c, d) = ϕ1(a, b) ϕ2(b, c) ϕ3(c, d) ϕ4(a, d) / Z, where Z = Σ_{a,b,c,d} ϕ1(a, b) ϕ2(b, c) ϕ3(c, d) ϕ4(a, d).
28 Example: with the factor tables used here, Z = 7,201,840.
29 P(0, 1, 1, 0) = 5,000,000 / Z ≈ 0.69.
30 P(1, 1, 0, 0) = 10 / Z ≈ 1.4 × 10⁻⁶.
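The slide's numbers can be reproduced by brute force. The factor tables below are an assumption: they follow the standard "misconception" example (Koller & Friedman), which yields exactly the Z and unnormalized scores quoted on the slides.

```python
from itertools import product

# Pairwise factors on the 4-cycle A-B-C-D. Table values are an assumption,
# taken from the standard "misconception" example; they reproduce the
# numbers quoted on the slides.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi4(D, A)

def score(a, b, c, d):
    """Unnormalized probability: the product of all four factor values."""
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

# Partition function: sum the product over all 2^4 assignments.
Z = sum(score(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4))
print(Z)                      # 7201840, as on the slide
print(score(0, 1, 1, 0) / Z)  # P(0,1,1,0) ~ 0.69
```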
31 Independence and Structure: There's a lot of theory about how BNs and MNs encode conditional independence assumptions. BNs: a variable X is independent of its non-descendants given its parents. MNs: conditional independence derived from Markov blanket and separation properties. Local configurations can be used to check all conditional independence questions; almost no need to look at the values in the factors!
32 Independence Spectrum: at one extreme, full independence, ∏_i ϕ_i(x_i); at the other, a single factor ϕ(x) over all variables: everything is dependent. Various graphs span the assumptions in between.
33 Products of Factors: Given two factors with different scopes, we can calculate a new factor equal to their product: ϕ_product(x, y) = ϕ1(x) ϕ2(y).
34 Products of Factors: example: ϕ1(A, B) × ϕ2(B, C) = ϕ3(A, B, C) (tables not shown).
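The factor product can be sketched in a few lines of Python. The (scope, table) representation and the table values here are illustrative assumptions, not from the lecture.

```python
from itertools import product

# A factor as (scope, table): scope is a tuple of variable names; table maps
# assignment tuples (aligned with scope) to nonnegative values.
# Table values are arbitrary illustrations.
phi1 = (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0})
phi2 = (("B", "C"), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0})

def factor_product(f, g):
    """New factor over the union of scopes: value = phi1(x) * phi2(y)."""
    (fs, ftab), (gs, gtab) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for assignment in product((0, 1), repeat=len(scope)):
        val = dict(zip(scope, assignment))
        table[assignment] = (ftab[tuple(val[v] for v in fs)]
                             * gtab[tuple(val[v] for v in gs)])
    return scope, table

scope3, tab3 = factor_product(phi1, phi2)
print(scope3)           # ('A', 'B', 'C')
print(tab3[(1, 1, 0)])  # phi1(1,1) * phi2(1,0) = 3.0 * 2.0 = 6.0
```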
35 Factor Maximization: Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via maximization: ψ(X) = max_Y ϕ(X, Y). We can refer to this new factor as max_Y ϕ.
36 Factor Maximization example: maximizing out B turns ϕ(A, B, C) into ψ(A, C) = max_b ϕ(A, b, C) (tables not shown).
37 Factor Marginalization: Given X and Y (Y ∉ X), we can turn a factor ϕ(X, Y) into a factor ψ(X) via marginalization: ψ(X) = Σ_{y ∈ Val(Y)} ϕ(X, y).
38 Example: summing out B turns ϕ(A, B, C) into ψ(A, C) (tables not shown).
39 Example: summing out C turns ϕ(A, B, C) into ψ(A, B).
40 We can refer to this new factor as Σ_Y ϕ.
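Summing out a variable can be sketched as follows; the (scope, table) representation and the table values are illustrative assumptions.

```python
from itertools import product

# A factor over (A, B, C) with made-up values 1..8, one per assignment.
phi = (("A", "B", "C"),
       {a: float(i + 1) for i, a in enumerate(product((0, 1), repeat=3))})

def sum_out(factor, var):
    """Marginalize var out of a (scope, table) factor: psi(x) = sum_y phi(x, y)."""
    scope, table = factor
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for assignment, value in table.items():
        reduced = assignment[:i] + assignment[i + 1:]
        new_table[reduced] = new_table.get(reduced, 0.0) + value
    return new_scope, new_table

scope_psi, psi = sum_out(phi, "B")  # "summing out B": psi(A, C)
print(scope_psi)                    # ('A', 'C')
```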
41 Marginalizing Everything? Take a factor graph. Build one big factor by multiplying all of its factors. Sum out all the variables (one by one). What do you get? (A single number: the partition function Z.)
42 Factors Are Like Numbers: Products are commutative: ϕ1 ϕ2 = ϕ2 ϕ1. Products are associative: (ϕ1 ϕ2) ϕ3 = ϕ1 (ϕ2 ϕ3). Sums are commutative: Σ_X Σ_Y ϕ = Σ_Y Σ_X ϕ (max, too). Distributivity of multiplication over marginalization and maximization, when X ∉ Scope(ϕ1): Σ_X (ϕ1 ϕ2) = ϕ1 Σ_X ϕ2, and max_X (ϕ1 ϕ2) = ϕ1 max_X ϕ2.
43 Inference
44 Querying the Model (the flu network): Inference (e.g., do you have allergies or the flu?). What's the best explanation for your symptoms? Active data collection (what is the next best r.v. to observe?).
45 A Bigger Example: Your Car. The car doesn't start. What do we conclude about the battery age? 18 random variables; 2^18 possible scenarios.
46 Inference: A Ubiquitous Obstacle. Decoding is inference (lecture 2). Learning is inference (lectures 3 and 4). Exact inference is #P-complete. Even approximations within a given absolute or relative error are hard.
47 Inference Problems: Given values for some random variables (X ⊆ V): Most Probable Explanation (MPE): what are the most probable values of the rest of the r.v.s, V \ X? (More generally…) Maximum a posteriori (MAP): what are the most probable values of some other r.v.s Y ⊆ (V \ X)? Random sampling from the posterior over values of Y. Full posterior over values of Y. Marginal probabilities from the posterior over Y. Minimum Bayes risk: what is the Y with the lowest expected cost? Cost-augmented decoding: what is the most dangerous Y?
48 Approaches to Inference: exact (variable elimination / dynamic programming, ILP) vs. approximate. Approximate methods may be randomized (MCMC, Gibbs sampling, importance sampling, randomized search, simulated annealing) or deterministic (loopy belief propagation, LP relaxation, local search, mean field, dual decomposition, beam search). Exact inference today; approximate inference tomorrow. Some are hard-inference methods, some soft, some both.
49 Exact Marginal for Y: This will be a generalization of algorithms you may already have seen: the forward and backward algorithms. The general name is variable elimination. After we see it for the marginal, we'll see how to use it for the MAP.
50 Inference Example. Goal: P(D). Chain A → B → C → D with factors ϕA(A) = P(A), ϕAB(A, B) = P(B | A), ϕBC(B, C) = P(C | B), ϕCD(C, D) = P(D | C) (tables not shown).
51 Let's calculate P(B) first.
52 P(B) = Σ_{a ∈ Val(A)} P(A = a) P(B | A = a), using ϕA and ϕAB.
53 Note: C and D don't matter for P(B).
54 The result is a new factor ϕB(B) = P(B).
55 New model in which A is eliminated; it defines P(B, C, D).
56 Same thing to eliminate B: P(C) = Σ_{b ∈ Val(B)} P(B = b) P(C | B = b), giving ϕC(C).
57 New model in which B is eliminated; it defines P(C, D).
58 Last step to get P(D): P(D) = Σ_{c ∈ Val(C)} P(C = c) P(D | C = c).
59 Simple Inference Example that the same step happened for each random variable: We created a new factor over the variable and its successor We summed out (marginalized) the variable. P (D) = P (A = a)p (B = b A = a)p (C = c B = b)p (D C = c) a Val(A) b Val(B) c Val(C) = P (D C = c) P (C = c B = b) P (A = a)p (B = b A = a) c Val(C) b Val(B) a Val(A)
60 That Was Variable Elimination: We reused work from previous steps and avoided doing the same work more than once. Dynamic programming à la the forward algorithm! We exploited the graph structure (each subexpression depends on only a small number of variables). Exponential blowup avoided!
61 What Remains: Variable elimination in general. The max-product version (for MAP inference). A bit about approximate inference.
62 Eliminating One Variable. Input: set of factors Φ, variable Z to eliminate. Output: new set of factors Ψ. 1. Let Φ' = {ϕ ∈ Φ : Z ∈ Scope(ϕ)}. 2. Let Ψ = {ϕ ∈ Φ : Z ∉ Scope(ϕ)}. 3. Let ψ = Σ_Z ∏_{ϕ ∈ Φ'} ϕ. 4. Return Ψ ∪ {ψ}.
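A minimal sketch of this step, using a (scope, table) factor representation over binary variables; the representation, helper names, and factor values are assumptions for illustration.

```python
from itertools import product

def factor_product(f, g):
    """Multiply two (scope, table) factors over binary variables."""
    (fs, ftab), (gs, gtab) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for a in product((0, 1), repeat=len(scope)):
        val = dict(zip(scope, a))
        table[a] = ftab[tuple(val[v] for v in fs)] * gtab[tuple(val[v] for v in gs)]
    return scope, table

def sum_out(factor, var):
    """Marginalize var out of a (scope, table) factor."""
    scope, table = factor
    i = scope.index(var)
    out = {}
    for a, v in table.items():
        key = a[:i] + a[i + 1:]
        out[key] = out.get(key, 0.0) + v
    return scope[:i] + scope[i + 1:], out

def eliminate_one(factors, z):
    """One elimination step: Phi' = factors mentioning z, psi = sum_z prod(Phi')."""
    relevant = [f for f in factors if z in f[0]]   # Phi'
    rest = [f for f in factors if z not in f[0]]   # Psi
    prod_f = relevant[0]
    for f in relevant[1:]:
        prod_f = factor_product(prod_f, f)
    return rest + [sum_out(prod_f, z)]             # Psi plus {psi}

# Illustrative factors (made-up values); eliminate B.
phiAB = (("A", "B"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
phiBC = (("B", "C"), {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0})
new_factors = eliminate_one([phiAB, phiBC], "B")
print([f[0] for f in new_factors])  # [('A', 'C')]
```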
63 Example. Query: P(Flu | runny nose). Let's eliminate H.
64 Eliminating H: 1. Φ' = {ϕSH}. 2. Ψ = {ϕF, ϕA, ϕFAS, ϕSR}. 3. ψ = Σ_H ϕSH, a factor ψ(S). 4. Return Ψ ∪ {ψ}.
68 We can actually ignore the new factor ψ, equivalently just deleting H! Why? Because ϕSH is a conditional probability table P(H | S), so Σ_h ϕSH(s, h) = 1 for every s. In some cases eliminating a variable is really easy!
69 H is already eliminated. Let's now eliminate S.
70 Eliminating S: 1. Φ' = {ϕSR, ϕFAS}. 2. Ψ = {ϕF, ϕA}. 3. ψFAR = Σ_S ϕSR ϕFAS. 4. Return Ψ ∪ {ψFAR}.
73 Finally, eliminate A.
74 Eliminating A: 1. Φ' = {ϕA, ψFAR}. 2. Ψ = {ϕF}. 3. ψFR = Σ_A ϕA ψFAR. 4. Return Ψ ∪ {ψFR}.
76 Chain, Again. Goal: P(D). Earlier, we eliminated A, then B, then C.
77 This time, let's start with C.
79 Eliminating C: first take the product ϕBC(B, C) ϕCD(C, D) = ϕBCD(B, C, D).
80 Then sum out C: ψ(B, D) = Σ_c ϕBCD(B, c, D). Note the intermediate factor over both B and D.
81 Eliminating B will be similarly complex.
82 Variable Elimination: Comments. Can prune away all non-ancestors of the query variables. Ordering makes a difference!
83 What about Evidence? So far, we've just considered the posterior/marginal P(Y). Next: P(Y | X = x). It's almost the same: the additional step is to reduce factors to respect the evidence.
84 Example. Query: P(Flu | runny nose). Let's reduce to R = true (runny nose): ϕSR(S, R) becomes ϕ'S(S) = ϕSR(S, true).
85 The reduced factor set is {ϕF, ϕA, ϕFAS, ϕ'S, ϕSH}.
86 Now run variable elimination all the way down to one factor (for F). Eliminate H.
87 Eliminate S.
88 Eliminate A, leaving ψF.
89 Take the final product, ϕF ψF.
90 The result is an unnormalized factor over F; normalizing it gives P(Flu | runny nose).
91 Comments: Complexity depends on the size of the intermediate factors; hence, variable ordering matters a lot. But it's NP-hard to find the best one. For MNs, chordal graphs permit inference linear in the size of the original factors. For BNs, polytree structures do the same. If you can avoid big intermediate factors, you can make inference linear in the size of the original factors.
92 Variable Elimination for P(Y | X = x). Input: graphical model on variables V, set of query variables Y, evidence X = x. Output: factor ϕ and scalar α. 1. Φ = factors in the model. 2. Reduce the factors in Φ by the evidence X = x. 3. Choose a variable ordering π on Z = V \ Y \ X. 4. ϕ = Variable-Elimination(Φ, Z, π). 5. α = Σ_{y ∈ Val(Y)} ϕ(y). 6. Return ϕ, α; then P(Y | X = x) = ϕ / α.
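The recipe above (reduce by evidence, eliminate, normalize) can be sketched on a tiny chain A → B → C with evidence C = 1 and query A. The CPT values are made up for illustration.

```python
# Model: chain A -> B -> C with made-up CPTs; evidence C = 1, query A.
pA  = {0: 0.6, 1: 0.4}
pBA = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # P(B=b | A=a)
pCB = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # P(C=c | B=b)

# Step 2: reduce phi_BC by the evidence C = 1 -> a factor over B alone.
phiB_reduced = {b: pCB[1, b] for b in (0, 1)}

# Step 4: eliminate Z = {B}: phi(a) = P(a) * sum_b P(B=b | a) * phi'(b).
phi = {a: pA[a] * sum(pBA[b, a] * phiB_reduced[b] for b in (0, 1))
       for a in (0, 1)}

# Step 5: alpha normalizes the result; phi / alpha is P(A | C = 1).
alpha = sum(phi.values())  # this equals P(C = 1)
posterior = {a: phi[a] / alpha for a in (0, 1)}
assert abs(sum(posterior.values()) - 1.0) < 1e-12
```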
93 Getting Back to NLP: Classical structured NLP models were chosen with these properties in mind: HMMs, PCFGs (with a little work). But not: IBM Model 3. To decode, we need MAP inference. When models get complicated, we need approximations!
94 From Marginals to MAP: Replace the factor summation steps with maximization. Add bookkeeping to keep track of the maximizing values. Add a traceback at the end to recover the solution. This is analogous to the connection between the forward algorithm and the Viterbi algorithm. The ordering challenge is the same.
95 Variable Elimination (Max-Product Version with Decoding). Input: set of factors Φ, ordered list of variables Z to eliminate. Output: new factor and maximizing assignment. 1. For each Zi ∈ Z (in order): let (Φ, ψZi) = Eliminate-One(Φ, Zi). 2. Return ∏_{ϕ ∈ Φ} ϕ and Traceback({ψZi}).
96 Eliminating One Variable (Max-Product Version with Bookkeeping). Input: set of factors Φ, variable Z to eliminate. Output: new set of factors Ψ and a bookkeeping factor. 1. Let Φ' = {ϕ ∈ Φ : Z ∈ Scope(ϕ)}. 2. Let Ψ = {ϕ ∈ Φ : Z ∉ Scope(ϕ)}. 3. Let τ = max_Z ∏_{ϕ ∈ Φ'} ϕ; let ψ = ∏_{ϕ ∈ Φ'} ϕ (bookkeeping). 4. Return Ψ ∪ {τ}, ψ.
97 Traceback. Input: sequence of factors with associated variables (ψZ1, …, ψZk). Output: z*. Each ψZ is a factor whose scope includes Z and the variables eliminated after Z. Work backwards from i = k to 1: let z*_i = argmax_z ψZi(z, z*_{i+1}, z*_{i+2}, …, z*_k). Return z*.
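Max-product elimination plus traceback can be sketched on a chain A → B → C, Viterbi-style. The CPT values are made up for illustration; tau are the max-factors, psi the bookkeeping (argmax) factors.

```python
# Chain A -> B -> C with made-up CPTs (binary variables).
pA  = {0: 0.6, 1: 0.4}
pBA = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # P(B=b | A=a)
pCB = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # P(C=c | B=b)

# Eliminate A: tau1(b) = max_a P(a) P(b|a); bookkeeping psi1(b) = argmax_a.
tau1 = {b: max(pA[a] * pBA[b, a] for a in (0, 1)) for b in (0, 1)}
psi1 = {b: max((0, 1), key=lambda a: pA[a] * pBA[b, a]) for b in (0, 1)}

# Eliminate B: tau2(c) = max_b tau1(b) P(c|b); psi2(c) = argmax_b.
tau2 = {c: max(tau1[b] * pCB[c, b] for b in (0, 1)) for c in (0, 1)}
psi2 = {c: max((0, 1), key=lambda b: tau1[b] * pCB[c, b]) for c in (0, 1)}

# Final maximization over C, then trace back through the bookkeeping factors.
c_star = max((0, 1), key=lambda c: tau2[c])
b_star = psi2[c_star]
a_star = psi1[b_star]
print((a_star, b_star, c_star))  # most probable (A, B, C): (0, 0, 0)
```

Here tau2[c_star] equals the probability of the best full assignment (0.6 × 0.7 × 0.9 = 0.378), which is the max-product analogue of the marginal computed by sum-product elimination.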
98 About the Traceback: No extra expense; it is a linear traversal over the intermediate factors. The semiring underlying both sum-product VE and max-product VE can be generalized. Example: get the K most likely assignments.
99 Variable Elimination Tips: Any ordering will be correct. Most orderings will be too expensive. There are heuristics for choosing an ordering. If the graph is chain-like, work from one end toward the other.
100 (Rocket Science: True MAP.) Evidence: X = x. Query: Y. Other variables: Z = V \ X \ Y. y* = argmax_{y ∈ Val(Y)} P(Y = y | X = x) = argmax_{y ∈ Val(Y)} Σ_{z ∈ Val(Z)} P(Y = y, Z = z | X = x). First marginalize out Z, then do MAP inference over Y given X = x. This is not usually attempted in NLP, with some exceptions.
101 Parting Shots: You will probably never implement the general variable elimination algorithm. You will rarely use exact inference. Understand what the inference problem would look like in exact form; then approximate. Sometimes you get lucky. You'll appreciate better methods as they come along.
ecall from last time Lecture 3: onditional independence and graph structure onditional independencies implied by a belief network Independence maps (I-maps) Factorization theorem The Bayes ball algorithm
More informationBayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks Matt Gormley Lecture 24 April 9, 2018 1 Homework 7: HMMs Reminders
More informationBN Semantics 3 Now it s personal!
Readings: K&F: 3.3, 3.4 BN Semantics 3 Now it s personal! Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 22 nd, 2008 10-708 Carlos Guestrin 2006-2008 1 Independencies encoded
More informationLecture 10: Introduction to reasoning under uncertainty. Uncertainty
Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,
More informationStructure Learning: the good, the bad, the ugly
Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 Structure Learning: the good, the bad, the ugly Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 29 th, 2006 1 Understanding the uniform
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationLearning in Bayesian Networks
Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks
More informationHidden Markov Models. George Konidaris
Hidden Markov Models George Konidaris gdk@cs.brown.edu Fall 2018 Recall: Bayesian Network Flu Allergy Sinus Nose Headache Recall: BN Flu Allergy Flu P True 0.6 False 0.4 Sinus Allergy P True 0.2 False
More informationDiscrimina)ve Latent Variable Models. SPFLODD November 15, 2011
Discrimina)ve Latent Variable Models SPFLODD November 15, 2011 Lecture Plan 1. Latent variables in genera)ve models (review) 2. Latent variables in condi)onal models 3. Latent variables in structural SVMs
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationPart 2. Representation Learning Algorithms
53 Part 2 Representation Learning Algorithms 54 A neural network = running several logistic regressions at the same time If we feed a vector of inputs through a bunch of logis;c regression func;ons, then
More informationCS Lecture 4. Markov Random Fields
CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields
More informationCSE 473: Ar+ficial Intelligence
CSE 473: Ar+ficial Intelligence Hidden Markov Models Luke Ze@lemoyer - University of Washington [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 16: Bayes Nets IV Inference 3/28/2011 Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationProbabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm
Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to
More informationLecture 17: May 29, 2002
EE596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 2000 Dept. of Electrical Engineering Lecture 17: May 29, 2002 Lecturer: Jeff ilmes Scribe: Kurt Partridge, Salvador
More informationGraphical Models. Lecture 4: Undirected Graphical Models. Andrew McCallum
Graphical Models Lecture 4: Undirected Graphical Models Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Administrivia HW#1: Source code was due
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/32 Lecture 5a Bayesian network April 14, 2016 2/32 Table of contents 1 1. Objectives of Lecture 5a 2 2.Bayesian
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationMachine Learning & Data Mining CS/CNS/EE 155. Lecture 8: Hidden Markov Models
Machine Learning & Data Mining CS/CNS/EE 155 Lecture 8: Hidden Markov Models 1 x = Fish Sleep y = (N, V) Sequence Predic=on (POS Tagging) x = The Dog Ate My Homework y = (D, N, V, D, N) x = The Fox Jumped
More informationIntroduction to Bayesian Learning
Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline
More informationMarkov Networks.
Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationA Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004
A Brief Introduction to Graphical Models Presenter: Yijuan Lu November 12,2004 References Introduction to Graphical Models, Kevin Murphy, Technical Report, May 2001 Learning in Graphical Models, Michael
More informationDynamic models 1 Kalman filters, linearization,
Koller & Friedman: Chapter 16 Jordan: Chapters 13, 15 Uri Lerner s Thesis: Chapters 3,9 Dynamic models 1 Kalman filters, linearization, Switching KFs, Assumed density filters Probabilistic Graphical Models
More informationCSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Professor Wei-Min Shen Week 8.1 and 8.2
CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Professor Wei-Min Shen Week 8.1 and 8.2 Status Check Projects Project 2 Midterm is coming, please do your homework!
More informationIntroduction to Bayes Nets. CS 486/686: Introduction to Artificial Intelligence Fall 2013
Introduction to Bayes Nets CS 486/686: Introduction to Artificial Intelligence Fall 2013 1 Introduction Review probabilistic inference, independence and conditional independence Bayesian Networks - - What
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due next Wednesday (Nov 4) in class Start early!!! Project milestones due Monday (Nov 9)
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More information1 Probabilities. 1.1 Basics 1 PROBABILITIES
1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability
More informationTópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863
Tópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863 Daniel, Edmundo, Rosa Terceiro trimestre de 2012 UFRJ - COPPE Programa de Engenharia de Sistemas e Computação Bayesian Networks
More informationProbabilistic Graphical Models. Rudolf Kruse, Alexander Dockhorn Bayesian Networks 153
Probabilistic Graphical Models Rudolf Kruse, Alexander Dockhorn Bayesian Networks 153 The Big Objective(s) In a wide variety of application fields two main problems need to be addressed over and over:
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationUVA CS / Introduc8on to Machine Learning and Data Mining
UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 13: Probability and Sta3s3cs Review (cont.) + Naïve Bayes Classifier Yanjun Qi / Jane, PhD University of Virginia Department
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationDirected and Undirected Graphical Models
Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed
More information11 The Max-Product Algorithm
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 11 The Max-Product Algorithm In the previous lecture, we introduced
More informationMachine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler
+ Machine Learning and Data Mining Bayes Classifiers Prof. Alexander Ihler A basic classifier Training data D={x (i),y (i) }, Classifier f(x ; D) Discrete feature vector x f(x ; D) is a con@ngency table
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More information