Biostatistics 615/815 Lecture 20: The Baum-Welch Algorithm Advanced Hidden Markov Models


Hyun Min Kang
November 22nd, 2011

2 Today
Baum-Welch Algorithm
  - An E-M algorithm for HMM parameter estimation
  - Three main HMM algorithms
      - The forward-backward algorithm
      - The Viterbi algorithm
      - The Baum-Welch algorithm
Advanced HMM
  - Expedited inference with uniform HMM
  - Continuous-time Markov process

3 Revisiting Hidden Markov Model
[Figure: HMM diagram. Over times 1, 2, 3, ..., T, the hidden states $q_1, q_2, q_3, \ldots, q_T$ start from the prior $\pi$ and are linked by the transition probabilities $a_{q_1 q_2}, a_{q_2 q_3}, \ldots, a_{q_{T-1} q_T}$; each state $q_t$ emits the observed data $o_t$ with probability $b_{q_t}(o_t)$.]

4 Statistical analysis with HMM
HMM for a deterministic problem
  - Given parameters $\lambda = \{\pi, A, B\}$ and data $o = (o_1, \ldots, o_T)$
  - Forward-backward algorithm: compute $\Pr(q_t \mid o, \lambda)$
  - Viterbi algorithm: compute $\arg\max_q \Pr(q \mid o, \lambda)$
HMM for a stochastic process / algorithm
  - Generate random samples of o given $\lambda$
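Generating random samples of o given $\lambda$ can be sketched as follows. This is a minimal illustration rather than the lecture's code: the function name sampleObservations, the Mat alias, and the choice of std::discrete_distribution from <random> are assumptions made for this example.

```cpp
#include <random>
#include <vector>

typedef std::vector<std::vector<double>> Mat;

// Draw T observations from an HMM with priors pi, transition matrix a
// (a[i][j] = Pr(q_{t+1} = j | q_t = i)) and emission matrix b
// (b[i][k] = Pr(o_t = k | q_t = i)). Hypothetical helper for illustration.
std::vector<int> sampleObservations(const std::vector<double>& pi, const Mat& a,
                                    const Mat& b, int T, std::mt19937& rng) {
    std::vector<int> o;
    if (T <= 0) return o;
    o.reserve(T);
    // sample the initial hidden state from the prior
    std::discrete_distribution<int> init(pi.begin(), pi.end());
    int q = init(rng);
    for (int t = 0; t < T; ++t) {
        // emit an observation from the current hidden state
        std::discrete_distribution<int> emit(b[q].begin(), b[q].end());
        o.push_back(emit(rng));
        // move to the next hidden state
        std::discrete_distribution<int> move(a[q].begin(), a[q].end());
        q = move(rng);
    }
    return o;
}
```

Only the observations are returned, since the hidden states are by definition not part of the data; returning q as well would be a small change if the true path is needed for checking an inference method.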

5 Deterministic Inference using HMM
  - If we know the exact set of parameters, the inference is deterministic given the data
      - No stochastic process is involved in the inference procedure
      - Inference is deterministic just as estimation of the sample mean is deterministic
  - The computational complexity of the inference procedure is exponential with naive algorithms
  - Using dynamic programming, the complexity can be reduced to $O(n^2 T)$, where n is the number of states and T the number of time points

6 Using Stochastic Process for HMM Inference
Using a random process for the inference
  - Randomly sampling o from $\Pr(o \mid \lambda)$
  - Estimating $\arg\max_\lambda \Pr(o \mid \lambda)$
      - No analytic algorithm is available
      - Simplex, the E-M algorithm, or simulated annealing can be applied
  - Estimating the distribution $\Pr(\lambda \mid o)$
      - Gibbs sampling

7 Recap : The E-M Algorithm
Expectation step (E-step)
  - Given the current parameter estimates $\theta^{(t)}$, calculate the conditional distribution of the latent variable z
  - The expected log-likelihood of the data under that conditional distribution of z is then
    $Q(\theta \mid \theta^{(t)}) = E_{z \mid x, \theta^{(t)}} [\log p(x, z \mid \theta)]$
Maximization step (M-step)
  - Find the parameter that maximizes the expected log-likelihood
    $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$

8 Baum-Welch for estimating $\arg\max_\lambda \Pr(o \mid \lambda)$
Assumptions
  - The transition matrix is identical across time points
    $a_{ij} = \Pr(q_{t+1} = j \mid q_t = i) = \Pr(q_t = j \mid q_{t-1} = i)$
  - The emission matrix is identical across time points
    $b_i(j) = \Pr(o_t = j \mid q_t = i) = \Pr(o_{t-1} = j \mid q_{t-1} = i)$
  - This is NOT the only possible configuration of an HMM
      - For example, $a_{ij}$ can be parameterized as a function of t
      - Multiple sets of o independently drawn from the same distribution can be provided
      - Other assumptions result in a different formulation of the E-M algorithm


10 E-step of the Baum-Welch Algorithm
1. Run the forward-backward algorithm given $\lambda^{(\tau)}$
   $\alpha_t(i) = \Pr(o_1, \ldots, o_t, q_t = i \mid \lambda^{(\tau)})$
   $\beta_t(i) = \Pr(o_{t+1}, \ldots, o_T \mid q_t = i, \lambda^{(\tau)})$
   $\gamma_t(i) = \Pr(q_t = i \mid o, \lambda^{(\tau)}) = \frac{\alpha_t(i)\beta_t(i)}{\sum_k \alpha_t(k)\beta_t(k)}$
2. Compute $\xi_t(i,j)$ using $\alpha_t(i)$ and $\beta_t(i)$
   $\xi_t(i,j) = \Pr(q_t = i, q_{t+1} = j \mid o, \lambda^{(\tau)}) = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\Pr(o \mid \lambda^{(\tau)})} = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum_{(k,l)} \alpha_t(k) a_{kl} b_l(o_{t+1}) \beta_{t+1}(l)}$

11 E-step : $\xi_t(i,j)$
$\xi_t(i,j) = \Pr(q_t = i, q_{t+1} = j \mid o, \lambda^{(\tau)})$
  - Quantifies the joint state probability between consecutive states
  - Needed to estimate the transition probabilities
  - Requires $O(n^2 T)$ memory to store entirely
  - Only $O(n^2)$ is necessary for running the Baum-Welch algorithm, because the M-step uses only the running sum $\sum_t \xi_t(i,j)$

12 M-step of the Baum-Welch Algorithm
Let $\lambda^{(\tau+1)} = (\pi^{(\tau+1)}, A^{(\tau+1)}, B^{(\tau+1)})$
  $\pi^{(\tau+1)}(i) = \frac{\sum_{t=1}^{T} \Pr(q_t = i \mid o, \lambda^{(\tau)})}{T} = \frac{\sum_{t=1}^{T} \gamma_t(i)}{T}$
  $a_{ij}^{(\tau+1)} = \frac{\sum_{t=1}^{T-1} \Pr(q_t = i, q_{t+1} = j \mid o, \lambda^{(\tau)})}{\sum_{t=1}^{T-1} \Pr(q_t = i \mid o, \lambda^{(\tau)})} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
  $b_i(k)^{(\tau+1)} = \frac{\sum_{t=1}^{T} \Pr(q_t = i, o_t = k \mid o, \lambda^{(\tau)})}{\sum_{t=1}^{T} \Pr(q_t = i \mid o, \lambda^{(\tau)})} = \frac{\sum_{t=1}^{T} \gamma_t(i) I(o_t = k)}{\sum_{t=1}^{T} \gamma_t(i)}$
A detailed derivation can be found in Welch, "Hidden Markov Models and the Baum-Welch Algorithm", IEEE Information Theory Society Newsletter, Dec 2003.

13 Additional function in HMM615.h
    class HMM615 {
        // ... (other members omitted on this slide)

        // assign newval to dst, after computing the relative difference between them
        // note that dst is call-by-reference, and newval is call-by-value
        static double update(double& dst, double newval) {
            // calculate the relative difference
            double reldiff = fabs((dst-newval)/(newval+ZEPS));
            dst = newval; // update the destination value
            return reldiff;
        }
    };
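As a standalone illustration of how this helper behaves, the sketch below copies it into a free function. The value of ZEPS is not shown on the slides, so the 1e-10 used here is an assumption made only for this example.

```cpp
#include <cmath>

// Assumed value; the slides use ZEPS but do not show its definition.
const double ZEPS = 1e-10;

// Overwrite dst with newval and report how much it changed,
// relative to the new value.
double update(double& dst, double newval) {
    double reldiff = std::fabs((dst - newval) / (newval + ZEPS));
    dst = newval;
    return reldiff;
}
```

Updating a parameter from 1.0 to 2.0 returns a relative difference of about 0.5, and repeating the same update returns 0. Summing these values over all parameters gives the convergence measure that baumWelch compares against its tolerance.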

14 Handling large number of states
    class HMM615 {
        // ... (other members omitted on this slide)

        void normalize(std::vector<double>& v) { // additional function
            double sum = 0;
            for(int i=0; i < (int)v.size(); ++i) sum += v[i];
            for(int i=0; i < (int)v.size(); ++i) v[i] /= sum;
        }

        void forward() {
            for(int i=0; i < nstates; ++i)
                alphas.data[0][i] = pis[i] * emis.data[i][outs[0]];
            for(int t=1; t < ntimes; ++t) {
                for(int i=0; i < nstates; ++i) {
                    alphas.data[t][i] = 0;
                    for(int j=0; j < nstates; ++j) {
                        alphas.data[t][i] += (alphas.data[t-1][j] * trans.data[j][i] * emis.data[i][outs[t]]);
                    }
                }
                normalize(alphas.data[t]); // **ADD THIS LINE**
            }
        }
    };
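The reason for the added normalize call is numerical: unscaled forward probabilities shrink geometrically and eventually underflow to zero. The sketch below contrasts the two on a standalone toy implementation. The function names are hypothetical, and unlike the slide's code this version also rescales at the first time point and accumulates the logs of the scaling constants, which yields $\log \Pr(o \mid \lambda)$ as a by-product.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double>> Mat;

// One forward step: next[i] = (sum_j alpha[j] * a[j][i]) * b[i][obs].
static void forwardStep(const std::vector<double>& alpha, const Mat& a,
                        const Mat& b, int obs, std::vector<double>& next) {
    const std::size_t n = alpha.size();
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j) s += alpha[j] * a[j][i];
        next[i] = s * b[i][obs];
    }
}

// Unscaled forward pass; returns sum_i alpha_T(i) = Pr(o | lambda).
// Underflows to 0 for long sequences. Assumes o is non-empty.
double forwardRaw(const std::vector<double>& pi, const Mat& a, const Mat& b,
                  const std::vector<int>& o) {
    const std::size_t n = pi.size();
    std::vector<double> alpha(n), next(n);
    for (std::size_t i = 0; i < n; ++i) alpha[i] = pi[i] * b[i][o[0]];
    for (std::size_t t = 1; t < o.size(); ++t) {
        forwardStep(alpha, a, b, o[t], next);
        alpha.swap(next);
    }
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += alpha[i];
    return total;
}

// Forward pass that rescales alpha to sum to one at every time point and
// returns log Pr(o | lambda) as the sum of log scaling constants.
double forwardLogLik(const std::vector<double>& pi, const Mat& a, const Mat& b,
                     const std::vector<int>& o) {
    const std::size_t n = pi.size();
    std::vector<double> alpha(n), next(n);
    for (std::size_t i = 0; i < n; ++i) alpha[i] = pi[i] * b[i][o[0]];
    double loglik = 0.0;
    for (std::size_t t = 0; t < o.size(); ++t) {
        if (t > 0) {
            forwardStep(alpha, a, b, o[t], next);
            alpha.swap(next);
        }
        double c = 0.0;                     // scaling constant at time t
        for (std::size_t i = 0; i < n; ++i) c += alpha[i];
        for (std::size_t i = 0; i < n; ++i) alpha[i] /= c;
        loglik += std::log(c);
    }
    return loglik;
}
```

On a short sequence the exponential of forwardLogLik matches forwardRaw; on a sequence of a few thousand observations forwardRaw collapses to zero while forwardLogLik stays finite.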

15 Additional functions in Matrix615.h
    // fill every element of the matrix with val
    template <class T>
    void Matrix615<T>::fill(T val) {
        int nr = rownums();
        for(int i=0; i < nr; ++i) {
            std::fill(data[i].begin(), data[i].end(), val);
        }
    }

    // print the content of matrix
    template <class T>
    void Matrix615<T>::print(std::ostream& o) {
        int nr = rownums();
        int nc = colnums();
        for(int i=0; i < nr; ++i) {
            for(int j=0; j < nc; ++j) {
                if ( j > 0 ) o << "\t";
                o << data[i][j];
            }
            o << std::endl;
        }
    }

16 Baum-Welch algorithm : initialization
    // return a pair of (# iter, relative diff) given tolerance
    std::pair<int,double> HMM615::baumWelch(double tol) {
        // temporary variables to use internally
        Matrix615<double> xis(nstates,nstates);        // xi_t(i,j) = Pr(q_t = i, q_{t+1} = j | o)
        Matrix615<double> sumxis(nstates,nstates);     // sum_t xis(i,j)
        Matrix615<double> sumobsgammas(nstates,nobs);  // sum_t gammas(i) I(o_t=j)
        std::vector<double> sumgammas(nstates);        // sum_t gammas(i)
        double tmp, sum, reldiff = 1;
        int iter;
        for(iter=0; (iter < MAX_ITERATION) && ( reldiff > tol ); ++iter) {
            reldiff = 0;
            // E-step : compute Pr(q | o,lambda)
            forwardbackward();
            // (the loop body continues on the following slides)

17 Baum-Welch algorithm : M-step
            // initialize temporary storage
            std::fill(sumgammas.begin(),sumgammas.end(),0);
            sumxis.fill(0); sumobsgammas.fill(0); xis.fill(0);

            // M-step : updates pis, trans, and emis
            // gammas are accumulated over every t; xis exist only for t = 0..ntimes-2
            for(int t=0; t < ntimes; ++t) {
                if ( t < ntimes-1 ) {
                    sum = 0; // sum stores sum of xis
                    for(int i=0; i < nstates; ++i)
                        for(int j=0; j < nstates; ++j)
                            sum += (xis.data[i][j] = alphas.data[t][i] * trans.data[i][j]
                                    * betas.data[t+1][j] * emis.data[j][outs[t+1]]);
                }
                // update sumgammas, sumobsgammas, sumxis
                for(int i=0; i < nstates; ++i) {
                    sumgammas[i] += gammas.data[t][i];
                    sumobsgammas.data[i][outs[t]] += gammas.data[t][i];
                    if ( t < ntimes-1 ) {
                        for(int j=0; j < nstates; ++j)
                            sumxis.data[i][j] += (xis.data[i][j] /= sum);
                    }
                }
            }

18 Baum-Welch algorithm : M-step
            for(int i=0; i < nstates; ++i) {
                // pi(i) = sum_t gamma_t(i) / T, as in the M-step formula
                reldiff += update( pis[i], sumgammas[i]/ntimes );
                for(int j=0; j < nstates; ++j) {
                    // the denominator excludes the last time point
                    reldiff += update(trans.data[i][j], sumxis.data[i][j] /
                                      (sumgammas[i] - gammas.data[ntimes-1][i] + ZEPS));
                }
                for(int j=0; j < nobs; ++j ) {
                    reldiff += update(emis.data[i][j], sumobsgammas.data[i][j] /
                                      (sumgammas[i] + ZEPS) );
                }
            }
        }
        return std::pair<int,double>(iter,reldiff);
    }

19 A working example : Biased dice example
Observations : $O = \{1, 2, \ldots, 6\}$
Hidden states : $S = \{\text{FAIR}, \text{1 BIASED}, \ldots, \text{6 BIASED}\}$
Priors : $\pi = \{0.70, 0.05, 0.05, \ldots, 0.05\}$
Transition matrix : A = [matrix not shown]
Emission matrix : B = [matrix not shown; one row, 1/6 1/6 1/6 1/6 1/6 1/6, survives]

20 Biased dice example : main() function
    #include <iostream>
    #include <iomanip>
    #include <vector>
    #include "Matrix615.h"
    #include "HMM615.h"

    int main(int argc, char** argv) {
        if ( argc != 5 ) {
            std::cerr << "Usage: baumwelch [trans0] [emis0] [pis0] [obs]" << std::endl;
            return -1;
        }
        std::vector<int> obs;
        Matrix615<double> trans(argv[1]);
        int ns = trans.rownums();
        if ( ns != trans.colnums() ) {
            std::cerr << "Transition matrix is not square" << std::endl;
            return -1;
        }
        Matrix615<double> emis(argv[2]);

21 Biased dice example : main() function (cont'd)
        if ( ns != emis.rownums() ) {
            std::cerr << "Emission and transition matrices do not match" << std::endl;
            return -1;
        }
        int no = emis.colnums();
        readfromfile<int> (obs, argv[4]);
        int nt = (int)obs.size();
        HMM615 hmm(ns, no, nt);
        readfromfile<double> (hmm.pis, argv[3]);
        if ( ns != (int)hmm.pis.size() ) {
            std::cerr << "Transition and Prior matrices do not match" << std::endl;
            return -1;
        }
        hmm.trans = trans;
        hmm.emis = emis;
        hmm.outs = obs;

22 Biased dice example : main() function (cont'd)
        std::pair<int,double> result = hmm.baumWelch(1e-6);
        std::cout << "# ITERATIONS : " << result.first << "\t";
        std::cout << "SUM RELATIVE DIFF : " << result.second << std::endl;
        std::cout << std::fixed << std::setprecision(5);
        std::cout << "PIS:" << std::endl;
        for(int i=0; i < ns; ++i) {
            if ( i > 0 ) std::cout << "\t";
            std::cout << hmm.pis[i];
        }
        std::cout << std::endl;
        std::cout << " " << std::endl;
        std::cout << "TRANS:" << std::endl;
        hmm.trans.print(std::cout);
        std::cout << " " << std::endl;
        std::cout << "EMIS:" << std::endl;
        hmm.emis.print(std::cout);
        std::cout << " " << std::endl;
        return 0;
    }

23 Biased dice example : Results with 20,000 samples
    $ ./baumwelch trans.dice.txt emis.dice.txt pis.dice.txt outs.dice.txt
    # ITERATIONS : 23    SUM RELATIVE DIFF : e-07
    PIS:
    TRANS:
    EMIS:

24 Biased dice example : Starting with uniform parameters
    $ ./baumwelch trans0.dice.txt emis0.dice.txt pis0.dice.txt outs.dice.txt
    # ITERATIONS : 2    SUM RELATIVE DIFF : 0
    PIS:
    TRANS:
    EMIS:

25 Starting with incorrect emission matrix
    $ cat emis1.dice.txt

26 Starting with incorrect emission matrix
    $ ./baumwelch trans0.dice.txt emis1.dice.txt pis0.dice.txt outs.dice.txt
    # ITERATIONS : 37    SUM RELATIVE DIFF : e-07
    PIS:
    TRANS:
    EMIS:

27 Summary : Baum-Welch Algorithm
  - E-M algorithm for estimating HMM parameters
  - Assumes identical transition and emission probabilities across t
  - The framework can be adapted to differently constrained HMMs
  - Requires many observations to reach reliable estimates


29 Rapid Inference with Uniform HMM
Definition
  - $\pi_i = 1/n$
  - $a_{ij} = \begin{cases} \theta & i \neq j \\ 1 - (n-1)\theta & i = j \end{cases}$
  - $b_i(k)$ has no restriction
  - Independent transition between n states
  - Useful model in genetics and speech recognition
The Problem
  - The time complexity of HMM inference is $O(n^2 T)$
  - For large n, this can still be a substantial computational burden
  - Can we reduce the time complexity by leveraging the simplicity?

30 Forward Algorithm with Uniform HMM
Original Forward Algorithm
  $\alpha_t(i) = \Pr(o_1, \ldots, o_t, q_t = i \mid \lambda) = \left[ \sum_{j=1}^{n} \alpha_{t-1}(j) a_{ji} \right] b_i(o_t)$
Rapid Forward Algorithm for Uniform HMM
  $\alpha_t(i) = \left[ \sum_{j=1}^{n} \alpha_{t-1}(j) a_{ji} \right] b_i(o_t)$
  $= \left[ (1 - (n-1)\theta)\,\alpha_{t-1}(i) + \sum_{j \neq i} \alpha_{t-1}(j)\,\theta \right] b_i(o_t)$
  $= \left[ (1 - n\theta)\,\alpha_{t-1}(i) + \theta \right] b_i(o_t)$
  - Assuming normalized $\sum_i \alpha_t(i) = 1$ for every t
  - The total time complexity is $O(nT)$

31 Backward Algorithm with Uniform HMM
Original Backward Algorithm
  $\beta_t(i) = \Pr(o_{t+1}, \ldots, o_T \mid q_t = i, \lambda) = \sum_{j=1}^{n} \beta_{t+1}(j)\, a_{ij}\, b_j(o_{t+1})$
Rapid Backward Algorithm for Uniform HMM
  $\beta_t(i) = \sum_{j=1}^{n} \beta_{t+1}(j)\, a_{ij}\, b_j(o_{t+1})$
  $= (1 - (n-1)\theta)\,\beta_{t+1}(i)\, b_i(o_{t+1}) + \theta \sum_{j \neq i} \beta_{t+1}(j)\, b_j(o_{t+1})$
  $= (1 - n\theta)\,\beta_{t+1}(i)\, b_i(o_{t+1}) + \theta$
  - Assuming $\sum_i \beta_t(i)\, b_i(o_t) = 1$ for every t

32 Summary : Uniform HMM
  - Rapid computation of the forward-backward algorithm by leveraging the symmetric structure
  - A rapid Baum-Welch algorithm is also possible in a similar manner
  - It is important to understand the computational details of existing methods in order to tweak them further when necessary

33 Summary
Today
  - The Baum-Welch algorithm
  - Rapid inference with uniform HMM
Next Lecture
  - Linear Algebra in C++
