Directed Probabilistic Graphical Models CMSC 678 UMBC
1 Directed Probabilistic Graphical Models CMSC 678 UMBC
2 Announcement 1: Assignment 3 Due Wednesday, April 11th, 11:59 AM. Any questions?
3 Announcement 2: Progress Report on Project Due Monday, April 16th, 11:59 AM. Build on the proposal: update it to address comments; discuss the progress you've made; discuss what remains to be done; discuss any new blocks you've experienced (or anticipate experiencing). Any questions?
4 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
5 Recap from last time
6 Expectation Maximization (EM): E-step 0. Assume some value for your parameters. Two-step, iterative algorithm: 1. E-step: count under uncertainty, assuming these parameters: use p^(t)(z_i) to form expected counts count(z_i, w_i). 2. M-step: maximize log-likelihood, assuming these uncertain counts: p^(t)(z) → estimated counts → p^(t+1)(z).
7 EM Math E-step: count under uncertainty, using the old parameters: compute E_{z ~ p^(t)(·|w)}[log p_θ(z, w)], the expectation of the complete-data log-likelihood under the posterior distribution. M-step: maximize over θ to get new parameters: θ^(t+1) = argmax_θ E_{z ~ p^(t)(·|w)}[log p_θ(z, w)]. Notation: M(θ) = marginal log-likelihood of observed data X; C(θ) = log-likelihood of complete data (X, Y); P(θ) = posterior log-likelihood of incomplete data Y. Then M(θ) = E_{Y|θ^(t)}[C(θ) | X] − E_{Y|θ^(t)}[P(θ) | X]. EM does not decrease the marginal log-likelihood.
8 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
9 Lagrange multipliers Assume an original optimization problem
10 Lagrange multipliers Assume an original optimization problem We convert it to a new optimization problem:
11 Lagrange multipliers: an equivalent problem?
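The general pattern behind these slides (whose equations were lost in transcription) can be written out; a standard statement, with f as the objective and g the equality constraint:

```latex
% Original problem: optimize an objective under an equality constraint
\max_{\theta} \; f(\theta) \quad \text{s.t.} \quad g(\theta) = 0

% New problem: fold the constraint into the Lagrangian with multiplier \lambda
F(\theta, \lambda) = f(\theta) - \lambda\, g(\theta)

% At a solution, both gradients vanish:
\nabla_{\theta} F = \nabla_{\theta} f - \lambda \nabla_{\theta} g = 0,
\qquad
\frac{\partial F}{\partial \lambda} = -g(\theta) = 0
```

The second stationarity condition just recovers the constraint, which is why the two problems have the same constrained optima.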
14 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
15 Probabilistic Estimation of Rolling a Die N different (independent) rolls: p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i). Example observations: w_1 = 1, w_2 = 5, w_3 = 4. Generative Story: for roll i = 1 to N: w_i ~ Cat(θ), where θ is a probability distribution over the 6 sides of the die: Σ_{k=1}^{6} θ_k = 1 and 0 ≤ θ_k ≤ 1 for all k.
16 Probabilistic Estimation of Rolling a Die N different (independent) rolls: p(w_1, w_2, …, w_N) = ∏_i p(w_i). Generative Story: for roll i = 1 to N: w_i ~ Cat(θ). Maximize Log-likelihood: L(θ) = Σ_i log p_θ(w_i) = Σ_i log θ_{w_i}.
17 Probabilistic Estimation of Rolling a Die Maximize Log-likelihood: L(θ) = Σ_i log θ_{w_i}. Q: What's an easy way to maximize this, as written exactly (even without calculus)?
18 Probabilistic Estimation of Rolling a Die Maximize Log-likelihood: L(θ) = Σ_i log θ_{w_i}. Q: What's an easy way to maximize this, as written exactly (even without calculus)? A: Just keep increasing each θ_k (we know θ must be a distribution, but that constraint isn't specified in the objective).
19 Probabilistic Estimation of Rolling a Die Maximize Log-likelihood (with distribution constraints): L(θ) = Σ_i log θ_{w_i} s.t. Σ_{k=1}^{6} θ_k = 1. (We could include the inequality constraints 0 ≤ θ_k, but they complicate the problem and, right now, are not needed.) Solve using Lagrange multipliers.
20 Probabilistic Estimation of Rolling a Die Maximize Log-likelihood (with distribution constraints) via the Lagrangian: F(θ) = Σ_i log θ_{w_i} − λ (Σ_{k=1}^{6} θ_k − 1). Partial derivatives: ∂F/∂θ_k = Σ_{i: w_i = k} 1/θ_k − λ and ∂F/∂λ = −Σ_{k=1}^{6} θ_k + 1.
21 Probabilistic Estimation of Rolling a Die Setting ∂F/∂θ_k = 0 gives θ_k = (Σ_{i: w_i = k} 1) / λ; the optimal λ is the one for which Σ_{k=1}^{6} θ_k = 1.
22 Probabilistic Estimation of Rolling a Die θ_k = (Σ_{i: w_i = k} 1) / λ = (Σ_{i: w_i = k} 1) / (Σ_k Σ_{i: w_i = k} 1) = N_k / N: the maximum-likelihood estimate is the empirical frequency of side k.
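The closed form θ_k = N_k / N says the MLE is just the empirical frequency of each side; a minimal Python sketch (the roll data is made up for illustration):

```python
from collections import Counter

def mle_die(rolls, sides=6):
    """MLE for a categorical die: theta_k = N_k / N (empirical frequency)."""
    counts = Counter(rolls)
    n = len(rolls)
    return [counts.get(k, 0) / n for k in range(1, sides + 1)]

rolls = [1, 5, 4, 1, 1, 6, 5, 2]   # N = 8 observed rolls (made-up data)
theta = mle_die(rolls)             # theta[0] = N_1 / N = 3/8
```

The returned vector automatically satisfies the constraint Σ_k θ_k = 1, which is exactly what the Lagrange-multiplier derivation guarantees.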
23 Example: Conditionally Rolling a Die Before: p(w_1, w_2, …, w_N) = ∏_i p(w_i). Add complexity to better explain what we see: p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1) p(w_1 | z_1) ⋯ p(z_N) p(w_N | z_N) = ∏_i p(z_i) p(w_i | z_i). Example: z_1 = H, w_1 = 1; z_2 = T, w_2 = 5. Three coins: a penny with p(heads) = λ, p(tails) = 1 − λ; a dollar coin with p(heads) = γ, p(tails) = 1 − γ; a dime with p(heads) = ψ, p(tails) = 1 − ψ.
24 Example: Conditionally Rolling a Die p(z_1, w_1, z_2, w_2, …, z_N, w_N) = ∏_i p(z_i) p(w_i | z_i). Generative Story: λ = distribution over the penny, γ = distribution for the dollar coin, ψ = distribution over the dime. For item i = 1 to N: z_i ~ Bernoulli(λ); if z_i = H: w_i ~ Bernoulli(γ); else: w_i ~ Bernoulli(ψ).
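This generative story can be run forward to sample data; a minimal Python sketch (the parameter values for λ, γ, ψ are illustrative assumptions, not from the slides):

```python
import random

def sample_coins(n, lam=0.5, gamma=0.3, psi=0.8, seed=0):
    """Sample n (z_i, w_i) pairs: flip the penny to pick a coin, then flip it.
    H is encoded as 1, T as 0; lam/gamma/psi are illustrative values."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < lam else 0     # z_i ~ Bernoulli(lambda), penny
        p = gamma if z == 1 else psi           # dollar coin if H, dime if T
        w = 1 if rng.random() < p else 0       # w_i ~ Bernoulli(gamma or psi)
        data.append((z, w))
    return data
```

With both z and w recorded this is complete data; hiding the z column is exactly what makes EM necessary later.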
25 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
26 Classify with Bayes Rule argmax_Y p(Y | X) = argmax_Y [log p(X | Y) + log p(Y)], i.e., likelihood plus prior.
27 The Bag of Words Representation Adapted from Jurafsky & Martin (draft)
30 Bag of Words Representation [figure: word counts extracted from a review (seen 2, sweet 1, whimsical 1, recommend 1, happy 1) are fed to a classifier γ(d) = c] Adapted from Jurafsky & Martin (draft)
31 Naïve Bayes: A Generative Story Generative Story: φ = distribution over K labels; for label k = 1 to K: θ_k = generate parameters. These are global parameters.
32 Naïve Bayes: A Generative Story Generative Story: φ = distribution over K labels; for label k = 1 to K: θ_k = generate parameters; for item i = 1 to N: y_i ~ Cat(φ).
33 Naïve Bayes: A Generative Story Generative Story: φ = distribution over K labels; for label k = 1 to K: θ_k = generate parameters; for item i = 1 to N: y_i ~ Cat(φ); for each feature j: x_ij ~ F_j(θ_{y_i}). The y_i and x_ij are local variables; in the graph, the label y_i is the parent of the features x_i1, x_i2, x_i3, x_i4, x_i5.
34 Naïve Bayes: A Generative Story Each x_ij is conditionally independent of the others (given the label).
35 Naïve Bayes: A Generative Story Maximize Log-likelihood: L(θ) = Σ_i Σ_j log F_{y_i}(x_ij; θ_{y_i}) + Σ_i log φ_{y_i} s.t. Σ_k φ_k = 1 and θ_k is valid for F_k.
36 Multinomial Naïve Bayes: A Generative Story Generative Story: φ = distribution over K labels; for label k = 1 to K: θ_k = distribution over J feature values; for item i = 1 to N: y_i ~ Cat(φ); for each feature j: x_ij ~ Cat(θ_{y_i}). Maximize Log-likelihood: L(θ) = Σ_i Σ_j log θ_{y_i, x_ij} + Σ_i log φ_{y_i} s.t. Σ_k φ_k = 1 and Σ_j θ_{kj} = 1 for each k.
37 Multinomial Naïve Bayes: A Generative Story Maximize Log-likelihood via Lagrange Multipliers: F(θ, φ) = Σ_i Σ_j log θ_{y_i, x_ij} + Σ_i log φ_{y_i} − μ (Σ_k φ_k − 1) − Σ_k λ_k (Σ_j θ_{kj} − 1).
38 Multinomial Naïve Bayes: Learning Calculate class priors: for each k, let items_k = all items with class = k; then p(k) = |items_k| / (# items). Calculate feature generation terms: for each k, let obs_k = a single object containing all items labeled as k; for each feature j, let n_kj = # of occurrences of j in obs_k; then p(j | k) = n_kj / Σ_{j'} n_kj'.
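The learning procedure above is pure counting; a minimal Python sketch of unsmoothed multinomial naïve Bayes estimation (the data and feature names are toy assumptions):

```python
from collections import Counter, defaultdict

def train_nb(items):
    """items: list of (label, {feature: count}) pairs.
    Returns class priors p(k) and per-class feature distributions p(j|k),
    both as plain normalized counts (no smoothing)."""
    label_counts = Counter(label for label, _ in items)
    priors = {k: c / len(items) for k, c in label_counts.items()}
    feat_counts = defaultdict(Counter)
    for label, feats in items:
        feat_counts[label].update(feats)          # accumulates n_kj per class
    likelihoods = {
        k: {j: n / sum(cnt.values()) for j, n in cnt.items()}
        for k, cnt in feat_counts.items()
    }
    return priors, likelihoods

items = [("pos", {"happy": 2, "sweet": 1}), ("neg", {"sad": 3})]
priors, likelihoods = train_nb(items)   # priors["pos"] = 0.5, p(happy|pos) = 2/3
```

Merging all items of a class into one counter is exactly the obs_k trick from the slide: p(j | k) only depends on pooled counts.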
39 Brill and Banko (2001): with enough data, the classifier may not matter. Adapted from Jurafsky & Martin (draft)
40 Summary: Naïve Bayes is Not So Naïve, but not without issue. Pro: very fast, with low storage requirements; robust to irrelevant features; very good in domains with many equally important features; optimal if the independence assumptions hold; a dependable baseline for text classification (but often not the best). Con: why not model the posterior in one go (e.g., with a conditional maxent model)? Are the features really uncorrelated? Are plain counts always appropriate? Are there better (automated, more principled) ways of handling missing/noisy data? Adapted from Jurafsky & Martin (draft)
41 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
42 Hidden Markov Models Two candidate tag sequences for p(British Left Waffles on Falkland Islands): (i): Adjective Noun Verb Prep Noun Noun; (ii): Noun Verb Noun Prep Noun Noun. Class-based model: p(w_i | z_i). Bigram model of the classes: p(z_i | z_{i−1}). Model all class sequences z_1, …, z_N: p(z_1, w_1, z_2, w_2, …, z_N, w_N).
43 Hidden Markov Model p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1}). Goal: maximize the (log-)likelihood. In practice: we don't actually observe these z values; we just see the words w.
44 Hidden Markov Model p(z_1, w_1, z_2, w_2, …, z_N, w_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1}). Goal: maximize the (log-)likelihood. In practice: we don't actually observe these z values; we just see the words w. If we did observe z, estimating the probability parameters would be easy but we don't! :( If we knew the probability parameters, then we could estimate z and evaluate likelihood but we don't! :(
45 Hidden Markov Model Terminology p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1}). Each z_i can take the value of one of K latent states.
46 Hidden Markov Model Terminology The p(z_i | z_{i−1}) factors are the transition probabilities/parameters. Each z_i can take the value of one of K latent states.
47 Hidden Markov Model Terminology The p(w_i | z_i) factors are the emission probabilities/parameters; the p(z_i | z_{i−1}) factors are the transition probabilities/parameters. Each z_i can take the value of one of K latent states.
48 Hidden Markov Model Terminology Each z_i can take the value of one of K latent states. Transition and emission distributions do not change across positions.
49 Hidden Markov Model Terminology Q: How many different probability values are there with K states and V vocab items?
50 Hidden Markov Model Terminology A: VK emission values and K² transition values.
51 Hidden Markov Model Representation p(z_1, w_1, z_2, w_2, …, z_N, w_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1}), with emission probabilities p(w_i | z_i) and transition probabilities p(z_i | z_{i−1}). Represent the probabilities and independence assumptions in a graph: a chain z_1 → z_2 → z_3 → z_4, where each z_i emits w_i.
52 Hidden Markov Model Representation The initial/starting distribution is p(z_1 | z_0), where z_0 is a special start symbol. Transitions: p(z_2 | z_1), p(z_3 | z_2), p(z_4 | z_3); emissions: p(w_1 | z_1), p(w_2 | z_2), p(w_3 | z_3), p(w_4 | z_4). Each z_i can take the value of one of K latent states. Transition and emission distributions do not change.
53 Example: 2-state Hidden Markov Model as a Lattice [lattice figure: states z_i ∈ {N, V} at positions i = 1, …, 4, each position emitting w_i, with distinguished start and end states]
54 Example: 2-state Hidden Markov Model as a Lattice [lattice figure, adding the emission probabilities p(w_i | N) and p(w_i | V) at each position]
55 Example: 2-state Hidden Markov Model as a Lattice [lattice figure, adding the transition probabilities p(N | start), p(V | start), p(N | N), p(V | V) along the same-state arcs]
56 Example: 2-state Hidden Markov Model as a Lattice [lattice figure, completing the transitions with the cross-state arcs p(V | N) and p(N | V)]
57 A Latent Sequence is a Path through the Graph [lattice figure with one path highlighted: N, V, V, N] Q: What's the probability of (N, w_1), (V, w_2), (V, w_3), (N, w_4)? A: (.7 × .7) × (.8 × .6) × (.35 × .1) × (.6 × .05) ≈ 0.000247
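A path's probability is the product of its transition and emission factors; a Python sketch using the slide's numbers (the split of each pair into a transition factor and an emission factor is an assumption about the figure):

```python
def path_prob(path, trans, emit, start="start"):
    """Joint probability of a (state, word) path: product of transition
    and emission factors along the path."""
    p, prev = 1.0, start
    for state, word in path:
        p *= trans[(prev, state)] * emit[(state, word)]
        prev = state
    return p

# the slide's four pairs of factors: (.7*.7) * (.8*.6) * (.35*.1) * (.6*.05)
trans = {("start", "N"): 0.7, ("N", "V"): 0.8, ("V", "V"): 0.35, ("V", "N"): 0.6}
emit = {("N", "w1"): 0.7, ("V", "w2"): 0.6, ("V", "w3"): 0.1, ("N", "w4"): 0.05}
path = [("N", "w1"), ("V", "w2"), ("V", "w3"), ("N", "w4")]
p = path_prob(path, trans, emit)   # ≈ 0.000247
```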
58 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
59 Message Passing: Count the Soldiers If you are the front soldier in the line, say the number one to the soldier behind you. If you are the rearmost soldier in the line, say the number one to the soldier in front of you. If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side ITILA, Ch 16
63 What s the Maximum Weighted Path?
69 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
70 What's the Maximum Value? Consider any shared path ending with B (A→B, B→B, or C→B). Maximize across the previous hidden state values: v(i, B) = max_s v(i−1, s) p(B | s) p(obs at i | B). v(i, B) is the maximum probability of any path to state B from the beginning (and emitting the observation).
71 What's the Maximum Value? [lattice extended back to step i−2] v(i, B) = max_s v(i−1, s) p(B | s) p(obs at i | B). v(i, B) is the maximum probability of any path to state B from the beginning (and emitting the observation).
72 What's the Maximum Value? Computing v at time i−1 will correctly incorporate (maximize over) paths through time i−2: we correctly obey the Markov property. v(i, B) = max_s v(i−1, s) p(B | s) p(obs at i | B). This recursion is the Viterbi algorithm.
73 Viterbi Algorithm

v = double[N+2][K*]
b = int[N+2][K*]   // backpointers / book-keeping
v[*][*] = 0
v[0][start] = 1
for (i = 1; i <= N+1; ++i) {
    for (state = 0; state < K*; ++state) {
        p_obs = p_emission(obs_i | state)
        for (old = 0; old < K*; ++old) {
            p_move = p_transition(state | old)
            if (v[i-1][old] * p_obs * p_move > v[i][state]) {
                v[i][state] = v[i-1][old] * p_obs * p_move
                b[i][state] = old
            }
        }
    }
}
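The pseudocode above translates nearly line-for-line into Python; a runnable sketch with dictionary-keyed parameters (the state and observation names in the example are made up):

```python
def viterbi(obs, states, p_trans, p_emit, start="start"):
    """Most likely state sequence. p_trans[(prev, cur)] and p_emit[(state, word)]
    are dicts; missing entries are treated as probability 0."""
    v = [{} for _ in range(len(obs) + 1)]      # v[i][s]: best prob of a path to s
    back = [{} for _ in range(len(obs) + 1)]   # backpointers / book-keeping
    v[0] = {start: 1.0}
    for i, word in enumerate(obs, start=1):
        for s in states:
            p_obs = p_emit.get((s, word), 0.0)
            best_old, best_p = None, 0.0
            for old, p_old in v[i - 1].items():
                p = p_old * p_trans.get((old, s), 0.0) * p_obs
                if p > best_p:
                    best_old, best_p = old, p
            v[i][s], back[i][s] = best_p, best_old
    last = max(v[len(obs)], key=v[len(obs)].get)   # best final state
    path = [last]
    for i in range(len(obs), 1, -1):               # follow backpointers
        path.append(back[i][path[-1]])
    return v[len(obs)][last], path[::-1]

# toy 2-state model (made-up numbers)
states = ["X", "Y"]
trans = {("start", "X"): 1.0, ("X", "X"): 0.4, ("X", "Y"): 0.6}
emit = {("X", "a"): 1.0, ("X", "b"): 0.5, ("Y", "b"): 1.0}
best_p, best_path = viterbi(["a", "b"], states, trans, emit)
```

For the toy model the two complete paths score 1.0 × (0.4 × 0.5) = 0.2 for X, X and 1.0 × (0.6 × 1.0) = 0.6 for X, Y, so the recursion picks X, Y.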
74 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
75 Forward Probability [lattice over states A, B, C at steps i−2, i−1, i] α(i, B) is the total probability of all paths to state B from the beginning.
76 Forward Probability Marginalize across the previous hidden state values: α(i, B) = Σ_s α(i−1, s) p(B | s) p(obs at i | B). Computing α at time i−1 will correctly incorporate paths through time i−2: we correctly obey the Markov property. α(i, B) is the total probability of all paths to state B from the beginning.
77 Forward Probability α(i, s) = Σ_{s'} α(i−1, s') p(s | s') p(obs at i | s). α(i, s) is the total probability of all paths: 1. that start from the beginning; 2. that end (currently) in s at step i; 3. that emit the observation obs at i.
78 Forward Probability In α(i, s) = Σ_{s'} α(i−1, s') p(s | s') p(obs at i | s): the sum over s' ranges over the immediate ways to get into state s; α(i−1, s') is the total probability up until now; p(s | s') p(obs at i | s) is how likely it is to get into state s this way.
79 Forward Probability Q: What do we return? (How do we return the likelihood of the sequence?) A: α[N+1][END].
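The forward recursion is Viterbi with max replaced by a sum; a Python sketch (instead of an explicit END state as on the slides, this version marginalizes over the final state, and the toy model's numbers are made up):

```python
def forward(obs, states, p_trans, p_emit, start="start"):
    """Total (marginal) probability of the observation sequence.
    Missing dict entries are treated as probability 0."""
    alpha = {start: 1.0}
    for word in obs:
        alpha = {
            s: sum(p_old * p_trans.get((old, s), 0.0) for old, p_old in alpha.items())
               * p_emit.get((s, word), 0.0)
            for s in states
        }
    return sum(alpha.values())   # marginalize over the final state

# toy 2-state model (made-up numbers)
states = ["X", "Y"]
trans = {("start", "X"): 1.0, ("X", "X"): 0.4, ("X", "Y"): 0.6}
emit = {("X", "a"): 1.0, ("X", "b"): 0.5, ("Y", "b"): 1.0}
likelihood = forward(["a", "b"], states, trans, emit)   # 0.2 + 0.6 = 0.8
```

Note how only one line differs from Viterbi: summing instead of maximizing over the previous state.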
80 Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
81 Forward & Backward Message Passing [lattice over states A, B, C at steps i−1, i, i+1] α(i, s) is the total probability of all paths: 1. that start from the beginning; 2. that end (currently) in s at step i; 3. that emit the observation obs at i. β(i, s) is the total probability of all paths: 1. that start at step i at state s; 2. that terminate at the end; 3. (that emit the observation obs at i+1).
82 Forward & Backward Message Passing α(i, B) arrives at state B from the left; β(i, B) summarizes everything to the right.
83 Forward & Backward Message Passing α(i, B) × β(i, B) = total probability of paths through state B at step i.
84 Forward & Backward Message Passing α(i, B) at step i pairs with β(i+1, s') at the next step.
85 Forward & Backward Message Passing α(i, B) × p(s' | B) × p(obs at i+1 | s') × β(i+1, s') = total probability of paths through the B → s' arc (at time i).
86 With Both Forward and Backward Values α(i, s) × β(i, s) = total probability of paths through state s at step i, so p(z_i = s | w_1, …, w_N) = α(i, s) β(i, s) / α(N+1, END). α(i, s) × p(s' | s) × p(obs at i+1 | s') × β(i+1, s') = total probability of paths through the s → s' arc (at time i), so p(z_i = s, z_{i+1} = s' | w_1, …, w_N) = α(i, s) p(s' | s) p(obs at i+1 | s') β(i+1, s') / α(N+1, END).
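Both posterior quantities fall out of one forward pass and one backward pass; a Python sketch of the state posteriors (β here folds in the emission at the next step, matching the slides' convention; the toy model is an assumption):

```python
def forward_backward(obs, states, p_trans, p_emit, start="start"):
    """Posterior p(z_i = s | w_1..w_N) for i = 1..N via alpha and beta values."""
    n = len(obs)
    alpha = [{} for _ in range(n + 1)]
    alpha[0] = {start: 1.0}
    for i, word in enumerate(obs, start=1):        # forward pass
        for s in states:
            alpha[i][s] = sum(
                p * p_trans.get((old, s), 0.0) for old, p in alpha[i - 1].items()
            ) * p_emit.get((s, word), 0.0)
    beta = [{} for _ in range(n + 1)]
    beta[n] = {s: 1.0 for s in states}
    for i in range(n - 1, 0, -1):                  # backward pass
        for s in states:
            beta[i][s] = sum(                      # folds in emission at step i+1
                p_trans.get((s, s2), 0.0) * p_emit.get((s2, obs[i]), 0.0)
                * beta[i + 1][s2]
                for s2 in states
            )
    likelihood = sum(alpha[n][s] * beta[n][s] for s in states)
    return [{s: alpha[i][s] * beta[i][s] / likelihood for s in states}
            for i in range(1, n + 1)]

# toy 2-state model (made-up numbers)
states = ["X", "Y"]
trans = {("start", "X"): 1.0, ("X", "X"): 0.4, ("X", "Y"): 0.6}
emit = {("X", "a"): 1.0, ("X", "b"): 0.5, ("Y", "b"): 1.0}
post = forward_backward(["a", "b"], states, trans, emit)
```

Each returned dictionary sums to one: dividing α(i, s) β(i, s) by the total likelihood turns path masses into a proper posterior.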
87 Expectation Maximization (EM) 0. Assume some value for your parameters. Two-step, iterative algorithm: 1. E-step: count under uncertainty (estimated counts), assuming these parameters. 2. M-step: maximize log-likelihood, assuming these uncertain counts.
88 Expectation Maximization (EM) For an HMM, the parameters are p_obs(w | s) and p_trans(s' | s). 1. E-step: count under uncertainty, assuming these parameters. 2. M-step: maximize log-likelihood, assuming these uncertain counts.
89 Expectation Maximization (EM) 1. E-step: count under uncertainty, assuming these parameters: p(z_i = s | w_1, …, w_N) = α(i, s) β(i, s) / α(N+1, END) and p(z_i = s, z_{i+1} = s' | w_1, …, w_N) = α(i, s) p(s' | s) p(obs at i+1 | s') β(i+1, s') / α(N+1, END). 2. M-step: maximize log-likelihood, assuming these uncertain counts.
90 M-Step Maximize log-likelihood, assuming these uncertain counts: p_new(s' | s) = c(s → s') / Σ_x c(s → x), if we observed the hidden transitions.
91 M-Step Maximize log-likelihood, assuming these uncertain counts: p_new(s' | s) = E[c(s → s')] / Σ_x E[c(s → x)]. We don't observe the hidden transitions, but we can approximately count them.
92 M-Step p_new(s' | s) = E[c(s → s')] / Σ_x E[c(s → x)]. We compute these expected counts in the E-step, with our α and β values.
93 EM For HMMs (Baum-Welch Algorithm)

α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i >= 0; --i) {
    for (next = 0; next < K*; ++next) {
        c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
        for (state = 0; state < K*; ++state) {
            u = p_obs(obs_{i+1} | next) * p_trans(next | state)
            c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
        }
    }
}
94 Bayesian Networks: Directed Acyclic Graphs p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i)), where π(x_i) denotes the parents of x_i and the product follows a topological sort of the graph.
95 Bayesian Networks: Directed Acyclic Graphs p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i)). Exact inference in general DAGs is NP-hard; inference in trees can be exact.
96 D-Separation: Testing for Conditional Independence X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z. X & Y are d-separated if for all paths P, one of the following is true: P has a chain with an observed middle node (X → Z → Y); P has a fork with an observed parent node (X ← Z → Y); P includes a v-structure or collider with all unobserved descendants (X → Z ← Y).
97 D-Separation: Testing for Conditional Independence Chain X → Z → Y: observing Z blocks the path from X to Y. Fork X ← Z → Y: observing Z blocks the path from X to Y. V-structure/collider X → Z ← Y: not observing Z blocks the path from X to Y.
98 D-Separation: Testing for Conditional Independence For the v-structure X → Z ← Y: p(x, y, z) = p(x) p(y) p(z | x, y), so marginalizing out z gives p(x, y) = p(x) p(y) Σ_z p(z | x, y) = p(x) p(y): X and Y are independent when Z is unobserved.
99 Markov Blanket The Markov blanket of a variable x_i is the set of nodes needed to form its complete conditional: p(x_i | x_{j≠i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i = ∏_k p(x_k | π(x_k)) / ∫ ∏_k p(x_k | π(x_k)) dx_i (the factorization of the graph). Factoring out terms that do not depend on x_i leaves ∏_{k: k = i or i ∈ π(x_k)} p(x_k | π(x_k)) / ∫ ∏_{k: k = i or i ∈ π(x_k)} p(x_k | π(x_k)) dx_i. The Markov blanket of a node x is its parents, children, and children's parents.
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationStatistical Methods for NLP
Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey Week 5, Oct 3, 2012 *many slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More informationCOMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma
COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationConditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013
Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationWe Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named
We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,
More informationStatistical Approaches to Learning and Discovery
Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework
More informationMACHINE LEARNING 2 UGM,HMMS Lecture 7
LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationDirected and Undirected Graphical Models
Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationLatent Dirichlet Allocation
Latent Dirichlet Allocation 1 Directed Graphical Models William W. Cohen Machine Learning 10-601 2 DGMs: The Burglar Alarm example Node ~ random variable Burglar Earthquake Arcs define form of probability
More informationStatistical Methods for NLP
Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationGraphical Models Seminar
Graphical Models Seminar Forward-Backward and Viterbi Algorithm for HMMs Bishop, PRML, Chapters 13.2.2, 13.2.3, 13.2.5 Dinu Kaufmann Departement Mathematik und Informatik Universität Basel April 8, 2013
More informationHidden Markov Models
Hidden Markov Models Slides mostly from Mitch Marcus and Eric Fosler (with lots of modifications). Have you seen HMMs? Have you seen Kalman filters? Have you seen dynamic programming? HMMs are dynamic
More informationPart of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015
Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch COMP-599 Oct 1, 2015 Announcements Research skills workshop today 3pm-4:30pm Schulich Library room 313 Start thinking about
More informationCOS402- Artificial Intelligence Fall Lecture 10: Bayesian Networks & Exact Inference
COS402- Artificial Intelligence Fall 2015 Lecture 10: Bayesian Networks & Exact Inference Outline Logical inference and probabilistic inference Independence and conditional independence Bayes Nets Semantics
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationHidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208
Hidden Markov Model Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/19 Outline Example: Hidden Coin Tossing Hidden
More informationDirected Graphical Models
CS 2750: Machine Learning Directed Graphical Models Prof. Adriana Kovashka University of Pittsburgh March 28, 2017 Graphical Models If no assumption of independence is made, must estimate an exponential
More information3 : Representation of Undirected GM
10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:
More informationGenerative and Discriminative Approaches to Graphical Models CMSC Topics in AI
Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationProbabilistic Graphical Models and Bayesian Networks. Artificial Intelligence Bert Huang Virginia Tech
Probabilistic Graphical Models and Bayesian Networks Artificial Intelligence Bert Huang Virginia Tech Concept Map for Segment Probabilistic Graphical Models Probabilistic Time Series Models Particle Filters
More informationConditional Independence and Factorization
Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationLecture 9. Intro to Hidden Markov Models (finish up)
Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationRepresentation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2
Representation Stefano Ermon, Aditya Grover Stanford University Lecture 2 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 2 1 / 32 Learning a generative model We are given a training
More informationLearning with Probabilities
Learning with Probabilities CS194-10 Fall 2011 Lecture 15 CS194-10 Fall 2011 Lecture 15 1 Outline Bayesian learning eliminates arbitrary loss functions and regularizers facilitates incorporation of prior
More informationQualifier: CS 6375 Machine Learning Spring 2015
Qualifier: CS 6375 Machine Learning Spring 2015 The exam is closed book. You are allowed to use two double-sided cheat sheets and a calculator. If you run out of room for an answer, use an additional sheet
More informationWeighted Finite-State Transducers in Computational Biology
Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). 1 This Tutorial
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Hidden Markov Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 33 Introduction So far, we have classified texts/observations
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training
More informationExpectation-Maximization (EM) algorithm
I529: Machine Learning in Bioinformatics (Spring 2017) Expectation-Maximization (EM) algorithm Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2017 Contents Introduce
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationProbability. CS 3793/5233 Artificial Intelligence Probability 1
CS 3793/5233 Artificial Intelligence 1 Motivation Motivation Random Variables Semantics Dice Example Joint Dist. Ex. Axioms Agents don t have complete knowledge about the world. Agents need to make decisions
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationChapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang
Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check
More informationMachine Learning Lecture 14
Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de
More informationBayesian Networks aka belief networks, probabilistic networks. Bayesian Networks aka belief networks, probabilistic networks. An Example Bayes Net
Bayesian Networks aka belief networks, probabilistic networks A BN over variables {X 1, X 2,, X n } consists of: a DAG whose nodes are the variables a set of PTs (Pr(X i Parents(X i ) ) for each X i P(a)
More informationCS221 / Autumn 2017 / Liang & Ermon. Lecture 15: Bayesian networks III
CS221 / Autumn 2017 / Liang & Ermon Lecture 15: Bayesian networks III cs221.stanford.edu/q Question Which is computationally more expensive for Bayesian networks? probabilistic inference given the parameters
More informationProbabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm
Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to
More informationDirected Graphical Models
Directed Graphical Models Instructor: Alan Ritter Many Slides from Tom Mitchell Graphical Models Key Idea: Conditional independence assumptions useful but Naïve Bayes is extreme! Graphical models express
More informationLecture 4: State Estimation in Hidden Markov Models (cont.)
EE378A Statistical Signal Processing Lecture 4-04/13/2017 Lecture 4: State Estimation in Hidden Markov Models (cont.) Lecturer: Tsachy Weissman Scribe: David Wugofski In this lecture we build on previous
More informationStatistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields
Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields Sameer Maskey Week 13, Nov 28, 2012 1 Announcements Next lecture is the last lecture Wrap up of the semester 2 Final Project
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/32 Lecture 5a Bayesian network April 14, 2016 2/32 Table of contents 1 1. Objectives of Lecture 5a 2 2.Bayesian
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are
More informationMidterm sample questions
Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework
More information6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm
6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm Overview The EM algorithm in general form The EM algorithm for hidden markov models (brute force) The EM algorithm for hidden markov models (dynamic
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationLecture 8: Graphical models for Text
Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationAdvanced Data Science
Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell Agenda Topics Covered: Time series data Markov Models Hidden Markov Models Dynamic Bayes Nets Additional Reading: Bishop: Chapter
More informationSequential Supervised Learning
Sequential Supervised Learning Many Application Problems Require Sequential Learning Part-of of-speech Tagging Information Extraction from the Web Text-to to-speech Mapping Part-of of-speech Tagging Given
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More information