Inferring the Number of State Clusters in a Hidden Markov Model and Its Extension
1 Inferring the Number of State Clusters in a Hidden Markov Model and Its Extension. Xugang Ye, Department of Applied Mathematics and Statistics, Johns Hopkins University
2 Elements of a Hidden Markov Model (HMM)
S: set of state clusters, S = {i}
O: set of observations, O = {k}
X_t: time series of state clusters
Y_t: time series of observations
A: set of state transitions, A = {A(i, j)}, where A(i, j) = P(X_{t+1} = j | X_t = i), t = 0, 1, 2, ...
B: set of emissions, B = {B(i, k)}, where B(i, k) = P(Y_t = k | X_t = i), t = 0, 1, 2, ...
μ: initial state cluster distribution, μ = {μ(i)}, where μ(i) = P(X_0 = i)
Graphic illustration: a chain X_0 → X_1 → X_2 → X_3 → ... → X_t → X_{t+1}, each X_t emitting Y_t (Y_0, Y_1, Y_2, Y_3, ..., Y_t, Y_{t+1})
Property 1: stationarity
Property 2 (implicit): P(X_{t+1} | X_{0:t}) = P(X_{t+1} | X_t)
Property 3 (implicit): P(Y_{t+1} | Y_{0:t}, X_{0:t+1}) = P(Y_{t+1} | X_{t+1})
Compact notation: λ = (A, B, μ)
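To make the notation concrete, here is a minimal Python/NumPy sketch (an illustration, not from the slides) that stores λ = (A, B, μ) as arrays and samples one state/observation sequence from the generative model; the sizes |S| = 3 and |O| = 4 and all array values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: |S| = 3 state clusters, |O| = 4 observation symbols.
A = np.array([[0.8, 0.1, 0.1],       # A[i, j] = P(X_{t+1} = j | X_t = i)
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
B = np.array([[0.7, 0.1, 0.1, 0.1],  # B[i, k] = P(Y_t = k | X_t = i)
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
mu = np.array([0.5, 0.3, 0.2])       # mu[i] = P(X_0 = i)

def sample_sequence(A, B, mu, T, rng):
    """Sample (X_{0:T}, Y_{0:T}) from the HMM lambda = (A, B, mu)."""
    x = rng.choice(len(mu), p=mu)
    xs, ys = [x], [rng.choice(B.shape[1], p=B[x])]
    for _ in range(T):
        x = rng.choice(A.shape[1], p=A[x])          # Property 2: X_{t+1} depends only on X_t
        xs.append(x)
        ys.append(rng.choice(B.shape[1], p=B[x]))   # Property 3: Y_t depends only on X_t
    return np.array(xs), np.array(ys)

X, Y = sample_sequence(A, B, mu, T=10, rng=rng)
```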
3 Factorization of the complete joint likelihood of the history up to time T:
P_T := P(X_{0:T}, Y_{0:T} | λ) = μ(X_0) ∏_{t=0}^{T} B(X_t, Y_t) ∏_{t=1}^{T} A(X_{t-1}, X_t)
(Try to find a recursive relation!)
P_T = P(X_{0:T}, Y_{0:T} | λ)
= P(X_T, Y_T | X_{0:T-1}, Y_{0:T-1}; λ) P(X_{0:T-1}, Y_{0:T-1} | λ)
= P(X_T, Y_T | X_{0:T-1}, Y_{0:T-1}; λ) P_{T-1}   (conditioning on the history up to time T-1)
= P(Y_T | X_T, X_{0:T-1}, Y_{0:T-1}; λ) P(X_T | X_{0:T-1}, Y_{0:T-1}; λ) P_{T-1}   (conditioning on X_T)
= P(Y_T | X_T; λ) P(X_T | X_{T-1}; λ) P_{T-1}   (by Properties 2 and 3)
= B(X_T, Y_T) A(X_{T-1}, X_T) P_{T-1}   (emission factor and transition factor)
Unrolling the recursion gives P_T = ∏_{t=1}^{T} B(X_t, Y_t) ∏_{t=1}^{T} A(X_{t-1}, X_t) P_0, where
P_0 = P(X_0, Y_0 | λ) = P(Y_0 | X_0; λ) P(X_0 | λ) = B(X_0, Y_0) μ(X_0),
so P_T = μ(X_0) ∏_{t=0}^{T} B(X_t, Y_t) ∏_{t=1}^{T} A(X_{t-1}, X_t).
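The factorization maps directly onto code. Below is a minimal sketch (again an illustration, not the author's implementation) that evaluates log P(X_{0:T}, Y_{0:T} | λ) for a given pair of integer-coded sequences, reusing the arrays from the sketch above.

```python
import numpy as np

def complete_log_likelihood(X, Y, A, B, mu):
    """log P(X_{0:T}, Y_{0:T} | lambda) = log mu(X_0)
       + sum_{t=0}^{T} log B(X_t, Y_t) + sum_{t=1}^{T} log A(X_{t-1}, X_t)."""
    ll = np.log(mu[X[0]])
    ll += np.sum(np.log(B[X, Y]))            # emission factors, t = 0..T
    ll += np.sum(np.log(A[X[:-1], X[1:]]))   # transition factors, t = 1..T
    return ll
```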
4 A fundamental problem: computing the data likelihood P(Y_{0:T} | λ).
Computing method: forward/backward iteration (dynamic programming).
Forward calculation: define α_t(i) = P(Y_{0:t}, X_t = i | λ). Then
α_0(i) = P(Y_0, X_0 = i | λ) = P(Y_0 | X_0 = i; λ) P(X_0 = i | λ) = B(i, Y_0) μ(i),
and for t = 1, 2, ..., T,
α_t(i) = P(Y_{0:t}, X_t = i | λ)
= Σ_j P(Y_{0:t}, X_{t-1} = j, X_t = i | λ)
= Σ_j P(Y_t, X_t = i | Y_{0:t-1}, X_{t-1} = j; λ) P(Y_{0:t-1}, X_{t-1} = j | λ)
= Σ_j P(Y_t, X_t = i | Y_{0:t-1}, X_{t-1} = j; λ) α_{t-1}(j)
= Σ_j P(Y_t | X_t = i, Y_{0:t-1}, X_{t-1} = j; λ) P(X_t = i | Y_{0:t-1}, X_{t-1} = j; λ) α_{t-1}(j)
= Σ_j P(Y_t | X_t = i; λ) P(X_t = i | X_{t-1} = j; λ) α_{t-1}(j)
= Σ_j α_{t-1}(j) A(j, i) B(i, Y_t).
Finally, since α_T(i) = P(Y_{0:T}, X_T = i | λ), we have P(Y_{0:T} | λ) = Σ_i α_T(i).
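A minimal sketch of this forward recursion (Python/NumPy, not from the slides); it returns the full α table and P(Y_{0:T} | λ). No rescaling is applied, so it can underflow for long sequences.

```python
import numpy as np

def forward(Y, A, B, mu):
    """alpha[t, i] = P(Y_{0:t}, X_t = i | lambda)."""
    T = len(Y)
    alpha = np.zeros((T, A.shape[0]))
    alpha[0] = B[:, Y[0]] * mu                      # alpha_0(i) = B(i, Y_0) mu(i)
    for t in range(1, T):
        # alpha_t(i) = sum_j alpha_{t-1}(j) A(j, i) B(i, Y_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, Y[t]]
    return alpha, alpha[-1].sum()                   # P(Y_{0:T} | lambda) = sum_i alpha_T(i)
```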
5 Backward calculation: define β_t(i) = P(Y_{t+1:T} | X_t = i; λ). Then β_T(i) = 1, and for t = T-1, T-2, ..., 0,
β_t(i) = P(Y_{t+1:T} | X_t = i; λ)
= Σ_j P(Y_{t+1:T}, X_{t+1} = j | X_t = i; λ)
= Σ_j P(Y_{t+2:T} | Y_{t+1}, X_{t+1} = j, X_t = i; λ) P(Y_{t+1}, X_{t+1} = j | X_t = i; λ)
= Σ_j P(Y_{t+2:T} | X_{t+1} = j; λ) P(Y_{t+1} | X_{t+1} = j, X_t = i; λ) P(X_{t+1} = j | X_t = i; λ)
= Σ_j β_{t+1}(j) P(Y_{t+1} | X_{t+1} = j; λ) P(X_{t+1} = j | X_t = i; λ)
= Σ_j A(i, j) B(j, Y_{t+1}) β_{t+1}(j).
Finally,
P(Y_{0:T} | λ) = Σ_i P(Y_{0:T}, X_0 = i | λ)
= Σ_i P(Y_{1:T} | Y_0, X_0 = i; λ) P(Y_0, X_0 = i | λ)
= Σ_i P(Y_{1:T} | X_0 = i; λ) P(Y_0 | X_0 = i; λ) P(X_0 = i | λ)
= Σ_i μ(i) B(i, Y_0) β_0(i).
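A matching sketch of the backward recursion, under the same assumptions; the returned likelihood should agree with the forward value up to rounding.

```python
import numpy as np

def backward(Y, A, B, mu):
    """beta[t, i] = P(Y_{t+1:T} | X_t = i; lambda)."""
    T = len(Y)
    beta = np.zeros((T, A.shape[0]))
    beta[-1] = 1.0                                   # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j A(i, j) B(j, Y_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, Y[t + 1]] * beta[t + 1])
    likelihood = np.sum(mu * B[:, Y[0]] * beta[0])   # = sum_i mu(i) B(i, Y_0) beta_0(i)
    return beta, likelihood
```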
6 An important posterior: P(X_t = i | Y_{0:T}; λ).
P(X_t = i | Y_{0:T}; λ) = P(X_t = i, Y_{0:T} | λ) / P(Y_{0:T} | λ)
= P(X_t = i, Y_{0:t}, Y_{t+1:T} | λ) / P(Y_{0:T} | λ)
= P(Y_{t+1:T} | X_t = i, Y_{0:t}; λ) P(X_t = i, Y_{0:t} | λ) / P(Y_{0:T} | λ)
= P(Y_{t+1:T} | X_t = i; λ) P(X_t = i, Y_{0:t} | λ) / P(Y_{0:T} | λ)
= α_t(i) β_t(i) / Σ_j α_T(j).
Now, consider the inverse problem of inferring λ given Y_{0:T}. An annoying problem is that we don't know the dimensions of A, B, and μ.
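Combining the two recursions gives this posterior in one line; a short sketch reusing the forward and backward functions from the sketches above:

```python
def state_posteriors(Y, A, B, mu):
    """gamma[t, i] = P(X_t = i | Y_{0:T}; lambda) = alpha_t(i) beta_t(i) / sum_j alpha_T(j)."""
    alpha, likelihood = forward(Y, A, B, mu)    # forward sketch above
    beta, _ = backward(Y, A, B, mu)             # backward sketch above
    return alpha * beta / likelihood
```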
7 Goal: infer the number of distinct hidden state clusters given the observation data Y_{0:T}^{(1:N)}.
Note that for different n and n', Y_{0:T}^{(n)} and Y_{0:T}^{(n')} are independent. However, for any n, there is sequential dependence within Y_{0:T}^{(n)}.
Inference method: Gibbs sampling + stick-breaking construction + Dirichlet distribution.
Steps (a sketch of one Gibbs sweep follows this slide):
Step 1. Select an initial estimate of |S| (hence S = {1, 2, ..., |S|}). Select an initialization of λ = (A, B, μ): use uniform initialization for A and μ, and initialize B by drawing from a Dirichlet distribution parameterized by the empirical distribution of Y_t.
Step 2. For each n = 1, 2, ..., N, draw a sequence of state clusters X_{0:T}^{(n)} from the posteriors P(X_t^{(n)} = i | Y_{0:T}^{(n)}; λ), t = 0, 1, 2, ..., T.
Step 3. For each n = 1, 2, ..., N, compute the count statistics #(i → j)^{(n)}, #(i → k)^{(n)}, and #(X_0 = i)^{(n)}. Compute the state cluster occupation based on the newly drawn X_{0:T}^{(1:N)}. This yields the new estimate of |S|. Relabel the newly drawn X_{0:T}^{(1:N)} according to the occupation if necessary.
Step 4. Based on the count statistics obtained in Step 3, draw A and μ via the stick-breaking process, and draw B from the Dirichlet conditional posteriors. Go to Step 2.
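The sketch below illustrates Steps 2-3 of one sweep under the sampling scheme described on this slide (each X_t drawn independently from its marginal posterior, as stated, rather than a joint draw of the whole path); the function and variable names are assumptions, not the author's code.

```python
import numpy as np

def gibbs_sweep(sequences, A, B, mu, rng):
    """One pass of Steps 2-3: for each sequence, draw X_t from the per-time posteriors
    P(X_t = i | Y_{0:T}; lambda) and accumulate the count statistics."""
    S, K = B.shape
    trans_counts = np.zeros((S, S))      # #(i -> j), summed over n
    emit_counts = np.zeros((S, K))       # #(i -> k), summed over n
    init_counts = np.zeros(S)            # #(X_0 = i), summed over n
    occupied = np.zeros(S, dtype=bool)
    for Y in sequences:
        gamma = state_posteriors(Y, A, B, mu)                 # posterior sketch above
        # Step 2 as described on the slide: draw each X_t from its marginal posterior.
        X = np.array([rng.choice(S, p=g / g.sum()) for g in gamma])
        occupied |= np.bincount(X, minlength=S) > 0
        init_counts[X[0]] += 1
        np.add.at(trans_counts, (X[:-1], X[1:]), 1)
        np.add.at(emit_counts, (X, Y), 1)
    # Step 3: the occupation vector gives the new estimate of |S|.
    return trans_counts, emit_counts, init_counts, int(occupied.sum())
```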
8 Posteriors involved in the stick-breaking process:
α_i^{(A)} ~ Gamma(c_1 + 1, d_1 − Σ_j log(1 − V_{ij}^{(A)}))
V_{ij}^{(A)} ~ Beta(1 + Σ_n #(i → j)^{(n)}, α_i^{(A)} + Σ_n #(i → j' > j)^{(n)}),
a_{i1} = V_{i1}^{(A)}, a_{ij} = V_{ij}^{(A)} ∏_{j' < j} (1 − V_{ij'}^{(A)})
α^{(μ)} ~ Gamma(c_2 + 1, d_2 − Σ_j log(1 − V_j^{(μ)}))
V_j^{(μ)} ~ Beta(1 + Σ_n #(X_0 = j)^{(n)}, α^{(μ)} + Σ_n #(X_0 > j)^{(n)}),
μ_1 = V_1^{(μ)}, μ_j = V_j^{(μ)} ∏_{j' < j} (1 − V_{j'}^{(μ)})
Conditional Dirichlet posteriors:
(b_{i1}, b_{i2}, ..., b_{i|O|}) ~ Dirichlet(β_1^{(B)} + Σ_n #(i → 1)^{(n)}, β_2^{(B)} + Σ_n #(i → 2)^{(n)}, ..., β_{|O|}^{(B)} + Σ_n #(i → |O|)^{(n)})
Empirical hyper-parameters: c_1 = 10^{-6}, d_1 = 0.1, c_2 = 0.01, d_2 = 0.01; all β_k's are determined by the empirical distribution of Y_t.
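As an illustration of how these posteriors could be sampled, here is a hedged sketch of one coupled update for a single row of A (draw α_i given the previous sticks, then the sticks given α_i and the counts) together with the Dirichlet draws for B. The truncation at the current number of clusters and the numerical clipping are my assumptions, and under truncation the resulting row sums to one only approximately.

```python
import numpy as np

def draw_stick_row(counts_row, V_prev, c1=1e-6, d1=0.1, rng=None):
    """One coupled update of the stick-breaking posteriors for a row A(i, .):
    alpha_i ~ Gamma(c1 + 1, d1 - sum_j log(1 - V_ij)) given the previous sticks,
    V_ij    ~ Beta(1 + #(i->j), alpha_i + #(i -> j' > j)) given alpha_i and the counts,
    a_ij    = V_ij * prod_{j' < j} (1 - V_ij')."""
    rng = rng or np.random.default_rng()
    counts = np.asarray(counts_row, dtype=float)
    tail = counts[::-1].cumsum()[::-1] - counts                # #(i -> j' > j) for each j
    rate = d1 - np.sum(np.log1p(-np.clip(V_prev, None, 1 - 1e-12)))
    alpha_i = rng.gamma(shape=c1 + 1.0, scale=1.0 / rate)      # NumPy gamma uses scale = 1/rate
    V = rng.beta(1.0 + counts, alpha_i + tail)
    sticks = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * sticks, V                                       # row of A, and sticks for next sweep

def draw_emission_rows(emit_counts, beta_B, rng):
    """Draw each row B(i, .) from its Dirichlet conditional posterior
    Dirichlet(beta_k^(B) + #(i -> k))."""
    return np.vstack([rng.dirichlet(beta_B + row) for row in emit_counts])
```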
9 A toy ground truth (randomly generated): A is 10×10, B is 10×4, μ is 1×10.
[The slide shows the entries of A, B, and μ.]
Simulated toy data: Y_{0:39}^{(1:100)}
10 Estimation Results
[Figure: estimates of |S| versus the number of Gibbs iterations, starting from 6 different initial estimates (up to 60); ground truth |S| = 10.]
11 An extension of the Hidden Markov Model
S: set of state clusters, S = {i}
A: set of behaviors, A = {a}
O: set of inputs, O = {o}
z_t: time series of state clusters
a_t: time series of behaviors
o_t: time series of inputs
W: set of state transitions, W = {W(i, a, o, j)}, where W(i, a, o, j) = P(z_{t+1} = j | z_t = i, a_t = a, o_{t+1} = o), t = 0, 1, 2, ...
π: set of emissions, π = {π(i, a)}, where π(i, a) = P(a_t = a | z_t = i), t = 0, 1, 2, ...
μ: initial state cluster distribution, μ = {μ(i)}, where μ(i) = P(z_0 = i)
Graphic illustration: a chain z_0 → z_1 → z_2 → z_3 → ... (t = 0, 1, 2, 3, ...) with initial factor μ(z_0), emission factors π(z_0, a_0), π(z_1, a_1), π(z_2, a_2), π(z_3, a_3), ..., and transition factors W(z_0, a_0, o_1, z_1), W(z_1, a_1, o_2, z_2), W(z_2, a_2, o_3, z_3), ..., so that each transition is driven by the previous behavior a_t and the next input o_{t+1}.
12 Key properties
Property 1: stationarity
Property 2 (implicit): P(z_{t+1}, a_{t+1} | z_{0:t}, a_{0:t}, o_{1:t+1}) = P(z_{t+1}, a_{t+1} | z_t, a_t, o_{t+1})
Property 3 (implicit): P(z_{0:t}, a_{0:t} | o_{1:t+1}) = P(z_{0:t}, a_{0:t} | o_{1:t})
Property 4 (implicit): P(a_{t+1} | z_{t+1}, z_t, a_t, o_{t+1}) = P(a_{t+1} | z_{t+1})
Compact notation: θ = (W, π, μ), where W is a tensor, π is a matrix, and μ is a vector.
Factorization of the complete joint likelihood of the history given the input data up to time T:
P(z_{0:T}, a_{0:T} | o_{1:T}; θ) = μ(z_0) ∏_{t=0}^{T} π(z_t, a_t) ∏_{t=1}^{T} W(z_{t-1}, a_{t-1}, o_t, z_t)
13 P(z_{0:T}, a_{0:T} | o_{1:T}; θ) = P_T   (Try to find a recursive relation!)
P_T = P(z_T, a_T | z_{0:T-1}, a_{0:T-1}, o_{1:T}; θ) P(z_{0:T-1}, a_{0:T-1} | o_{1:T}; θ)   (conditioning on the history up to time T-1)
= P(z_T, a_T | z_{T-1}, a_{T-1}, o_T; θ) P(z_{0:T-1}, a_{0:T-1} | o_{1:T-1}; θ)
(By Property 2, z_T, a_T are independent of z_{0:T-2}, a_{0:T-2}, o_{1:T-1} when z_{T-1}, a_{T-1}, o_T are given; by Property 3, z_{0:T-1}, a_{0:T-1} are independent of o_T, that is, the history does not depend on the future!)
= P(z_T, a_T | z_{T-1}, a_{T-1}, o_T; θ) P_{T-1}
= P(a_T | z_T, z_{T-1}, a_{T-1}, o_T; θ) P(z_T | z_{T-1}, a_{T-1}, o_T; θ) P_{T-1}   (conditioning on z_T)
= P(a_T | z_T, z_{T-1}, a_{T-1}, o_T; θ) W(z_{T-1}, a_{T-1}, o_T, z_T) P_{T-1}   (transition factor)
= P(a_T | z_T; θ) W(z_{T-1}, a_{T-1}, o_T, z_T) P_{T-1}   (by Property 4, a_T is independent of z_{T-1}, a_{T-1}, o_T given z_T)
= π(z_T, a_T) W(z_{T-1}, a_{T-1}, o_T, z_T) P_{T-1}   (emission factor)
Unrolling the recursion gives P_T = ∏_{t=2}^{T} π(z_t, a_t) ∏_{t=2}^{T} W(z_{t-1}, a_{t-1}, o_t, z_t) P_1, where
P_1 = P(z_{0:1}, a_{0:1} | o_1; θ) = P(z_1, a_1 | z_0, a_0, o_1; θ) P(z_0, a_0 | o_1; θ)
= π(z_1, a_1) W(z_0, a_0, o_1, z_1) P(z_0, a_0 | θ)
= π(z_1, a_1) W(z_0, a_0, o_1, z_1) P(a_0 | z_0; θ) P(z_0 | θ)
= π(z_1, a_1) W(z_0, a_0, o_1, z_1) π(z_0, a_0) μ(z_0),
so P_T = μ(z_0) ∏_{t=0}^{T} π(z_t, a_t) ∏_{t=1}^{T} W(z_{t-1}, a_{t-1}, o_t, z_t).
14 An important posterior: P(z_t = i | a_{0:T}, o_{1:T}; θ).
Similar to (but different from) the hidden Markov model, we define
α_0(i) = P(z_0 = i, a_0 | o_{1:T}; θ) = P(z_0 = i, a_0 | θ) = P(a_0 | z_0 = i; θ) P(z_0 = i | θ) = π(i, a_0) μ(i).
For t = 1, 2, ..., T,
α_t(i) = P(z_t = i, a_{0:t} | o_{1:T}; θ) = P(z_t = i, a_{0:t} | o_{1:t}; θ)
= Σ_j P(z_{t-1} = j, z_t = i, a_{0:t} | o_{1:t}; θ)
= Σ_j P(z_t = i, a_t | z_{t-1} = j, a_{0:t-1}, o_{1:t}; θ) P(z_{t-1} = j, a_{0:t-1} | o_{1:t}; θ)
= Σ_j P(z_t = i, a_t | z_{t-1} = j, a_{t-1}, o_t; θ) P(z_{t-1} = j, a_{0:t-1} | o_{1:t-1}; θ)
= Σ_j P(z_t = i, a_t | z_{t-1} = j, a_{t-1}, o_t; θ) α_{t-1}(j)
= Σ_j P(a_t | z_t = i, z_{t-1} = j, a_{t-1}, o_t; θ) P(z_t = i | z_{t-1} = j, a_{t-1}, o_t; θ) α_{t-1}(j)
= Σ_j P(a_t | z_t = i; θ) P(z_t = i | z_{t-1} = j, a_{t-1}, o_t; θ) α_{t-1}(j)
= Σ_j α_{t-1}(j) W(j, a_{t-1}, o_t, i) π(i, a_t).
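A minimal sketch of this forward recursion for the extension (Python/NumPy, an illustration rather than the author's code), assuming W is stored as a 4-D array indexed W[i, a, o, j] and that o_seq[0] holds o_1.

```python
import numpy as np

def forward_ext(a_seq, o_seq, W, pi, mu):
    """alpha[t, i] = P(z_t = i, a_{0:t} | o_{1:t}; theta).
    a_seq has length T+1 (a_0..a_T); o_seq has length T (o_1..o_T)."""
    T = len(a_seq)
    alpha = np.zeros((T, len(mu)))
    alpha[0] = pi[:, a_seq[0]] * mu                    # alpha_0(i) = pi(i, a_0) mu(i)
    for t in range(1, T):
        trans = W[:, a_seq[t - 1], o_seq[t - 1], :]    # trans[j, i] = W(j, a_{t-1}, o_t, i)
        # alpha_t(i) = sum_j alpha_{t-1}(j) W(j, a_{t-1}, o_t, i) pi(i, a_t)
        alpha[t] = (alpha[t - 1] @ trans) * pi[:, a_seq[t]]
    return alpha, alpha[-1].sum()                      # P(a_{0:T} | o_{1:T}; theta)
```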
15 Since α_T(i) = P(z_T = i, a_{0:T} | o_{1:T}; θ), we have P(a_{0:T} | o_{1:T}; θ) = Σ_i P(z_T = i, a_{0:T} | o_{1:T}; θ) = Σ_i α_T(i); this is the data likelihood.
We also define β_T(i) = 1, and for t = T-1, T-2, ..., 0,
β_t(i) = P(a_{t+1:T} | z_t = i, a_t, o_{1:T}; θ) = P(a_{t+1:T} | z_t = i, a_t, o_{t+1:T}; θ)
= Σ_j P(z_{t+1} = j, a_{t+1:T} | z_t = i, a_t, o_{t+1:T}; θ)
= Σ_j P(a_{t+2:T} | z_{t+1} = j, z_t = i, a_{t+1}, a_t, o_{t+2:T}; θ) P(z_{t+1} = j, a_{t+1} | z_t = i, a_t, o_{t+1:T}; θ)
= Σ_j β_{t+1}(j) P(z_{t+1} = j, a_{t+1} | z_t = i, a_t, o_{t+1:T}; θ)
= Σ_j β_{t+1}(j) P(a_{t+1} | z_{t+1} = j, z_t = i, a_t, o_{t+1:T}; θ) P(z_{t+1} = j | z_t = i, a_t, o_{t+1:T}; θ)
= Σ_j β_{t+1}(j) P(a_{t+1} | z_{t+1} = j; θ) P(z_{t+1} = j | z_t = i, a_t, o_{t+1}; θ)
= Σ_j W(i, a_t, o_{t+1}, j) π(j, a_{t+1}) β_{t+1}(j).
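The corresponding backward recursion, under the same storage assumptions as the forward sketch above:

```python
import numpy as np

def backward_ext(a_seq, o_seq, W, pi):
    """beta[t, i] = P(a_{t+1:T} | z_t = i, a_t, o_{t+1:T}; theta)."""
    T = len(a_seq)
    beta = np.zeros((T, pi.shape[0]))
    beta[-1] = 1.0                                       # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        trans = W[:, a_seq[t], o_seq[t], :]              # trans[i, j] = W(i, a_t, o_{t+1}, j)
        # beta_t(i) = sum_j W(i, a_t, o_{t+1}, j) pi(j, a_{t+1}) beta_{t+1}(j)
        beta[t] = trans @ (pi[:, a_seq[t + 1]] * beta[t + 1])
    return beta
```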
16 Finally,
P(z_t = i | a_{0:T}, o_{1:T}; θ) = P(z_t = i, a_{0:T} | o_{1:T}; θ) / P(a_{0:T} | o_{1:T}; θ)
= P(z_t = i, a_{0:t}, a_{t+1:T} | o_{1:T}; θ) / Σ_j P(z_T = j, a_{0:T} | o_{1:T}; θ)
= P(z_t = i, a_{0:t}, a_{t+1:T} | o_{1:T}; θ) / Σ_j α_T(j)
= P(a_{t+1:T} | z_t = i, a_{0:t}, o_{1:T}; θ) P(z_t = i, a_{0:t} | o_{1:T}; θ) / Σ_j α_T(j)
= P(a_{t+1:T} | z_t = i, a_t, o_{t+1:T}; θ) P(z_t = i, a_{0:t} | o_{1:t}; θ) / Σ_j α_T(j)
= β_t(i) α_t(i) / Σ_j α_T(j).
Same form as that in the HMM!
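And the posterior itself, as a one-line sketch reusing the two extension recursions above:

```python
def state_posteriors_ext(a_seq, o_seq, W, pi, mu):
    """gamma[t, i] = P(z_t = i | a_{0:T}, o_{1:T}; theta) = alpha_t(i) beta_t(i) / sum_j alpha_T(j)."""
    alpha, likelihood = forward_ext(a_seq, o_seq, W, pi, mu)   # forward sketch above
    beta = backward_ext(a_seq, o_seq, W, pi)                   # backward sketch above
    return alpha * beta / likelihood
```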
17 Goal: infer the number of distinct hidden state clusters given the observation data a_{0:T}^{(1:N)} and the input data o_{1:T}^{(1:N)}.
Inference method: Gibbs sampling + stick-breaking construction + Dirichlet distribution.
Steps:
Step 1. Select an initial estimate of |S| (hence S = {1, 2, ..., |S|}). Select an initialization of θ = (W, π, μ): use uniform initialization for W and μ, and initialize π by drawing from a Dirichlet distribution parameterized by the empirical distribution of a_t.
Step 2. For each n = 1, 2, ..., N, draw a sequence of states z_{0:T}^{(n)} from the conditional posteriors P(z_t^{(n)} = i | a_{0:T}^{(n)}, o_{1:T}^{(n)}; θ), t = 0, 1, 2, ..., T.
Step 3. For each n = 1, 2, ..., N, compute the count statistics #(i → j | a, o)^{(n)}, #(i → a)^{(n)}, and #(z_0 = i)^{(n)}. Compute the state cluster occupation based on the newly drawn z_{0:T}^{(1:N)}. This yields the new estimate of |S|. Relabel the newly drawn z_{0:T}^{(1:N)} according to the occupation if necessary.
Step 4. Based on the count statistics obtained in Step 3, draw W and μ via the stick-breaking process, and draw π from the Dirichlet conditional posteriors. Go to Step 2.
18 Posteriors involved in the stick-breaking process:
α_i^{(W, a, o)} ~ Gamma(c_1 + 1, d_1 − Σ_j log(1 − V_{ij}^{(W, a, o)}))
V_{ij}^{(W, a, o)} ~ Beta(1 + Σ_n #(i → j | a, o)^{(n)}, α_i^{(W, a, o)} + Σ_n #(i → j' > j | a, o)^{(n)}),
w_{i1}(a, o) = V_{i1}^{(W, a, o)}, w_{ij}(a, o) = V_{ij}^{(W, a, o)} ∏_{j' < j} (1 − V_{ij'}^{(W, a, o)})
α^{(μ)} ~ Gamma(c_2 + 1, d_2 − Σ_j log(1 − V_j^{(μ)}))
V_j^{(μ)} ~ Beta(1 + Σ_n #(z_0 = j)^{(n)}, α^{(μ)} + Σ_n #(z_0 > j)^{(n)}),
μ_1 = V_1^{(μ)}, μ_j = V_j^{(μ)} ∏_{j' < j} (1 − V_{j'}^{(μ)})
Conditional Dirichlet posteriors:
(π_{i1}, π_{i2}, ..., π_{i|A|}) ~ Dirichlet(β_1^{(π)} + Σ_n #(i → 1)^{(n)}, β_2^{(π)} + Σ_n #(i → 2)^{(n)}, ..., β_{|A|}^{(π)} + Σ_n #(i → |A|)^{(n)})
Empirical hyper-parameters: c_1 = 10^{-6}, d_1 = 0.1, c_2 = 0.01, d_2 = 0.01; all β_k's are determined by the empirical distribution of a_t.
19 A toy ground truth (randomly generated): W is a 4×15 collection of 10×10 transition matrices (one per behavior-input pair), π is 10×4, μ is 1×10 (|A| = 4, |O| = 15, |S| = 10).
[The slide shows the MATLAB display of the cell array W_a_o, each cell a 10x10 double matrix, and the entries of W_a_o{1,1}.]
20 A toy ground truth (randomly generated), continued.
[The slide shows the entries of the transition matrices W_a_o{3,5} and W_a_o{4,10}.]
22 A toy ground truth (randomly generated), continued.
[The slide shows the entries of π (P_i) and μ (m_u), and of p_o, the discrete distribution over the 15 input symbols used to simulate the input data.]
23 Simulated toy data: o_{1:19}^{(1:100)}, a_{0:19}^{(1:100)}
[Figure: estimates of |S| versus the number of Gibbs iterations, starting from 6 different initial estimates (up to 60); ground truth |S| = 10.]
24 Now, with the same toy ground truth, change the mechanism of generating o_{1:T}^{(1:N)}; that is, don't just use a single multinomial distribution.
Simulated toy data: o_{1:39}^{(1:100)}, a_{0:39}^{(1:100)}
[Figure: estimates of |S| versus the number of Gibbs iterations, starting from 6 different initial estimates (up to 60); ground truth |S| = 10.]
25 Questions? Thanks