Further details of the Baum-Welch algorithm


Martin Emms, November 15, 2018

Real Baum-Welch: summing the clock-tick probs

Brute-force EM would, for each observation sequence $o^d$, calculate the responsibility $\gamma^d(s) = p(s \mid o^d)$ for every hidden state sequence $s$, and from these calculate various expectations (e.g. $E^d(i)$, $E^d(ij)$). Baum-Welch instead first runs $\alpha$ and $\beta$ for $o^d$. From various terms involving $\alpha$ and $\beta$ it derives per-clock-tick "mini responsibilities", and summations over these give the expectations ($E^d(i)$, $E^d(ij)$, etc.). These are:

occupation: $\gamma^d_t(i)$, the conditional probability of state $i$ at $t$ given $o^d$, $\gamma^d_t(i) = \alpha_t(i)\,\beta_t(i) / p(o^d)$

transition: $\xi^d_t(i,j)$, the conditional probability of transition $i \to j$ at $t$ given $o^d$, $\xi^d_t(i,j) = [\alpha_t(i)\, a_{ij}\, b_j(o^d_{t+1})\, \beta_{t+1}(j)] / p(o^d)$

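Occupation $\gamma_t(i)$

[Figure: the observations $o_1 \dots o_t\, o_{t+1} \dots o_T$ with state $i$ at tick $t$; $\alpha(t,i)$ covers $o_1 \dots o_t$ and $\beta(t,i)$ covers $o_{t+1} \dots o_T$]

Occupation: $\gamma^d_t(i)$ = the probability of state $i$ at $t$ given $o^d$ = $\alpha_t(i)\,\beta_t(i) / p(o^d)$. This works because $\alpha_t(i)\,\beta_t(i) = P(o^d_1 \dots o^d_T, s_t = i)$, so dividing by $p(o^d)$ turns the joint probability into the conditional one.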
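A minimal sketch of the occupation computation, assuming a discrete-output HMM stored as plain Python lists with made-up toy numbers (`PI`, `A`, `B`, `OBS` are hypothetical, and `B[i][k]` stands for $b_i(k)$). The helper names `forward`, `backward` and `occupation` are also hypothetical. Indices are 0-based, so `gamma[t][i]` corresponds to $\gamma_{t+1}(i)$ in the slides. Since some state must be occupied at every tick, each row of `gamma` should sum to 1.

```python
# Occupation probabilities gamma_t(i) = alpha_t(i) * beta_t(i) / p(o).
# Toy parameters are hypothetical; all indices are 0-based.

def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]     # alpha_1(i) = pi(i) b_i(o_1)
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]                     # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def occupation(alpha, beta):
    p_o = sum(alpha[-1])                                     # p(o) = sum_i alpha_T(i)
    return [[a * b / p_o for a, b in zip(a_t, b_t)] for a_t, b_t in zip(alpha, beta)]


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]                                 # B[i][k] = b_i(k)
OBS = [0, 1, 1, 0]

gamma = occupation(forward(PI, A, B, OBS), backward(A, B, OBS))
for t, row in enumerate(gamma, start=1):                     # print with 1-based ticks
    print(t, [round(g, 4) for g in row])
```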

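Transition $\xi_t(i,j)$

[Figure: state $i$ at tick $t$ linked to state $j$ at tick $t+1$ by $A[i,j]$, with $j$ emitting $o_{t+1}$ via $B[j, o_{t+1}]$; $\alpha(t,i)$ covers $o_1 \dots o_t$ and $\beta(t+1,j)$ covers $o_{t+2} \dots o_T$]

Transition: $\xi^d_t(i,j)$ = the probability of transition $i \to j$ at $t$ given $o^d$ = $[\alpha_t(i)\, a_{ij}\, b_j(o^d_{t+1})\, \beta_{t+1}(j)] / p(o^d)$.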
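The transition formula can be sketched the same way, again assuming hypothetical toy parameters, 0-based indices and hypothetical helper names (`transition_probs` here). `xi[t][i][j]` is only defined for the first $T-1$ ticks, since there is no tick after $T$. A useful sanity check is that summing $\xi_t(i,j)$ over $j$ gives back $\gamma_t(i)$, because $\sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) = \beta_t(i)$.

```python
# Transition probabilities xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / p(o).
# Toy parameters are hypothetical; all indices are 0-based.

def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def transition_probs(A, B, obs, alpha, beta):
    N, p_o = len(A), sum(alpha[-1])
    # one N x N matrix per tick t = 0 .. T-2 (the slides' t = 1 .. T-1)
    return [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_o
              for j in range(N)] for i in range(N)]
            for t in range(len(obs) - 1)]


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]                                 # B[i][k] = b_i(k)
OBS = [0, 1, 1, 0]

alpha, beta = forward(PI, A, B, OBS), backward(A, B, OBS)
p_o = sum(alpha[-1])
xi = transition_probs(A, B, OBS, alpha, beta)
for t, mat in enumerate(xi, start=1):
    print(t, [[round(x, 4) for x in row] for row in mat])
```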

Re-estimation of transition probs A

The re-estimation of the transition probs $a_{ij}$ involves getting the expected count of transition $i \to j$ and comparing it to the expected count of $i$:

$$\hat{a}_{ij} = \frac{\sum_d \sum_{t=1}^{T-1} \xi^d_t(i,j)}{\sum_d \sum_{t=1}^{T-1} \gamma^d_t(i)}$$

Note the limit $T-1$: at the last time tick there is no defined $i \to j$ transition, so the expected occupation of $i$ at $T$ should not be counted in the denominator either.
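A sketch of one re-estimation step for $A$ over several observation sequences, assuming hypothetical toy parameters and sequences (`PI`, `A`, `B`, `SEQS`) and a hypothetical helper `reestimate_A`. Both sums stop at the slides' $T-1$ (`len(obs) - 1` with 0-based ticks). With that limit each row of the new matrix sums to exactly 1, since $\sum_j \xi^d_t(i,j) = \gamma^d_t(i)$ at every tick that is counted; including tick $T$ in the denominator would break this. The sketch assumes every state has non-zero expected occupancy, so the division is safe.

```python
# One Baum-Welch update of the transition matrix from several sequences.
# Toy parameters and sequences are hypothetical; ticks are 0-based.

def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def reestimate_A(pi, A, B, sequences):
    N = len(A)
    num = [[0.0] * N for _ in range(N)]   # sum_d sum_t xi_t^d(i, j)
    den = [0.0] * N                       # sum_d sum_t gamma_t^d(i)
    for obs in sequences:
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        p_o = sum(alpha[-1])
        for t in range(len(obs) - 1):     # the slides' t = 1 .. T-1
            for i in range(N):
                den[i] += alpha[t][i] * beta[t][i] / p_o
                for j in range(N):
                    num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                  * beta[t + 1][j] / p_o)
    return [[num[i][j] / den[i] for j in range(N)] for i in range(N)]


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]              # B[i][k] = b_i(k)
SEQS = [[0, 1, 1, 0], [1, 1, 0], [0, 0, 1, 1, 1]]

A_new = reestimate_A(PI, A, B, SEQS)
print([[round(a, 4) for a in row] for row in A_new])
```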

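Picturing the numerator summation for transition probs

[Figure: $\xi_t(i,j)$ drawn at successive ticks $t$, each linking state $i$ at $t$ to state $j$ at $t+1$]

Summing over $t$ gives the expectation of $i \to j$ transitions given the observations: $\sum_{t=1}^{T-1} \xi_t(i,j)$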
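The claim that this sum is the expected transition count can be checked directly against the brute-force EM route: enumerate every state sequence $s$, weight its number of $i \to j$ transitions by the responsibility $p(s \mid o)$, and compare with $\sum_t \xi_t(i,j)$. This is a sketch on hypothetical toy parameters, with hypothetical helper names (`joint` computes $p(s, o)$ for one complete state sequence). The enumeration visits $N^T$ sequences, which is what Baum-Welch avoids.

```python
# Brute-force expected transition counts vs. summed xi_t(i, j).
# Toy parameters are hypothetical; ticks are 0-based.
from itertools import product


def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def joint(pi, A, B, obs, states):
    """p(s, o) for one complete state sequence s."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]              # B[i][k] = b_i(k)
OBS = [0, 1, 1, 0]
N, T = len(PI), len(OBS)

# Brute-force EM: responsibility p(s | o) for every s, then expected i -> j counts.
seqs = list(product(range(N), repeat=T))
p_o_brute = sum(joint(PI, A, B, OBS, s) for s in seqs)
E_brute = [[0.0] * N for _ in range(N)]
for s in seqs:
    resp = joint(PI, A, B, OBS, s) / p_o_brute
    for t in range(T - 1):
        E_brute[s[t]][s[t + 1]] += resp

# Baum-Welch: sum the per-tick xi_t(i, j) over the slides' t = 1 .. T-1.
alpha, beta = forward(PI, A, B, OBS), backward(A, B, OBS)
p_o = sum(alpha[-1])
E_bw = [[sum(alpha[t][i] * A[i][j] * B[j][OBS[t + 1]] * beta[t + 1][j] / p_o
             for t in range(T - 1)) for j in range(N)] for i in range(N)]

print(E_brute)
print(E_bw)
```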

Re-estimating the observation probs B

The re-estimation of the observation probs $b_j(k)$ involves getting the expected count of being in state $j$ and producing observation symbol $k$, and comparing this to the expected count of being in state $j$:

$$\hat{b}_j(k) = \frac{\sum_d \sum_{t=1,\, o^d_t = k}^{T} \gamma^d_t(j)}{\sum_d \sum_{t=1}^{T} \gamma^d_t(j)}$$

In the numerator only the time ticks where $o_t = k$ are taken; in the denominator every time tick is taken.
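A sketch of the corresponding update for $B$, assuming hypothetical toy parameters with three output symbols of which the toy sequences only ever use two; `reestimate_B` is a hypothetical helper name. Each tick's $\gamma_t(j)$ goes into the denominator, but only into the numerator cell for the symbol actually observed at that tick. A symbol that never occurs therefore gets re-estimated probability 0, which is one reason practical systems often smooth these counts.

```python
# One Baum-Welch update of the emission matrix from several sequences.
# Toy parameters and sequences are hypothetical; ticks are 0-based.

def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def reestimate_B(pi, A, B, sequences):
    N, M = len(B), len(B[0])
    num = [[0.0] * M for _ in range(N)]
    den = [0.0] * N
    for obs in sequences:
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        p_o = sum(alpha[-1])
        for t, k in enumerate(obs):                  # every tick, the slides' t = 1 .. T
            for j in range(N):
                g = alpha[t][j] * beta[t][j] / p_o   # gamma_t^d(j)
                den[j] += g                          # denominator: every tick
                num[j][k] += g                       # numerator: only the symbol seen at t
    return [[num[j][k] / den[j] for k in range(M)] for j in range(N)]


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]               # B[i][k] = b_i(k), 3 symbols
SEQS = [[0, 1, 1, 0], [1, 1, 0], [0, 0, 1, 1, 1]]    # symbol 2 never occurs

B_new = reestimate_B(PI, A, B, SEQS)
print([[round(b, 4) for b in row] for row in B_new])
```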

Picturing the numerator summation for the observation probs

[Figure: $\gamma_t(j)$ drawn at the ticks $t$ where the observation at $t$ is $k$]

Summing over the ticks $t$ where the observation is $k$ gives the expectation of observation $k$ with state $j$ given the observations: $\sum_{t=1,\, o_t = k}^{T} \gamma_t(j)$, i.e. sum $\gamma_t(j)$ only where $o_t$ is observation $k$.

Re-estimating the start probs π

The re-estimation of start prob $\pi[i]$ involves getting the expected count of being in state $i$ at $t = 1$ and comparing it to the number of observation sequences $D$:

$$\hat{\pi}[i] = \frac{\sum_d \gamma^d_1(i)}{D}$$
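A short sketch of this update, assuming hypothetical toy parameters and sequences and a hypothetical helper `reestimate_pi`. Only the first tick of each sequence contributes, and since the $\gamma^d_1(i)$ of each sequence sum to 1 over $i$, dividing the total by $D$ yields a proper distribution.

```python
# One Baum-Welch update of the start probabilities from D sequences.
# Toy parameters and sequences are hypothetical; ticks are 0-based.

def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def reestimate_pi(pi, A, B, sequences):
    N, D = len(pi), len(sequences)
    new_pi = [0.0] * N
    for obs in sequences:
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        p_o = sum(alpha[-1])
        for i in range(N):
            new_pi[i] += alpha[0][i] * beta[0][i] / p_o   # gamma_1^d(i)
    return [x / D for x in new_pi]


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]                               # B[i][k] = b_i(k)
SEQS = [[0, 1, 1, 0], [1, 1, 0], [0, 0, 1, 1, 1]]

pi_new = reestimate_pi(PI, A, B, SEQS)
print([round(p, 4) for p in pi_new])
```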

The backward algorithm

Recall the forward probability $\alpha_t(i) = P(o_1 \dots o_t, s_t = i)$.

Recursion for $\alpha$:
base: $\alpha_1(i) = \pi(i)\, b_i(o_1)$
recursive: $\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)$, for $t = 2, \dots, T$

Correspondingly, the backward probability is $\beta_t(i) = P(o_{t+1} \dots o_T \mid s_t = i)$.

Recursion for $\beta$:
base: $\beta_T(i) = 1$
recursive: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$, for $t = T-1, \dots, 1$
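The two recursions translate almost line for line into code. This is a minimal sketch, assuming a discrete-output HMM held in plain Python lists with made-up toy parameters (`PI`, `A`, `B`, `OBS` are hypothetical, and `B[i][k]` stands for $b_i(k)$). Indices are 0-based, so `alpha[t][i]` is $\alpha_{t+1}(i)$ in the notation above. No scaling is done, so for long sequences these products would underflow; practical implementations rescale at each tick or work in log space.

```python
# Forward and backward recursions for a discrete-output HMM.
# Toy parameters are hypothetical; all indices are 0-based.

def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]     # base: pi(i) b_i(o_1)
    for o in obs[1:]:
        prev = alpha[-1]
        # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(o_t)
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]                     # base: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]                                 # B[i][k] = b_i(k)
OBS = [0, 1, 1, 0]

alpha = forward(PI, A, B, OBS)
beta = backward(A, B, OBS)

# p(o) two ways: from the end of the forward pass and from the start of the backward pass
p_forward = sum(alpha[-1])
p_backward = sum(PI[i] * B[i][OBS[0]] * beta[0][i] for i in range(len(PI)))
print(p_forward, p_backward)
```

The two printed values should agree. More generally $\sum_i \alpha_t(i)\,\beta_t(i) = p(o)$ at every tick $t$, which is why the occupation probabilities $\gamma_t(i)$ sum to 1.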

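Deriving β

For $\beta_t(i)$ we need $P(o_{t+1} \dots o_T \mid s_t = i)$. Let $j$ be some arbitrary state at $t+1$. If we had $P(s_{t+1} = j, o_{t+1} \dots o_T \mid s_t = i)$, we could sum over $j$ to get the desired quantity. Hence

$$\begin{aligned}
P(s_{t+1} = j, o_{t+1} \dots o_T \mid s_t = i)
&= \frac{P(s_t = i, s_{t+1} = j, o_{t+1} \dots o_T)}{P(s_t = i)} \\
&= \frac{P(s_t = i, s_{t+1} = j, o_{t+1})\, \beta_{t+1}(j)}{P(s_t = i)} \\
&= \frac{P(s_t = i, s_{t+1} = j)\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(s_t = i)} \\
&= a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)
\end{aligned}$$

so

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

The second step uses the fact that, given $s_{t+1} = j$, the later observations $o_{t+2} \dots o_T$ are independent of $s_t$ and $o_{t+1}$, so their probability is $\beta_{t+1}(j)$; the third uses $P(o_{t+1} \mid s_t = i, s_{t+1} = j) = b_j(o_{t+1})$.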
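One way to check the derivation numerically is to compare the recursive $\beta$ against its definition $P(o_{t+1} \dots o_T \mid s_t = i)$, computed by brute force as the sum of $\prod_u a_{s_{u-1} s_u}\, b_{s_u}(o_u)$ over every continuation $s_{t+1} \dots s_T$. This is a sketch on hypothetical toy parameters with hypothetical helper names (`backward`, `beta_brute`) and 0-based ticks. The enumeration is exponential in $T - t$, which is exactly the cost the recursion avoids.

```python
# Recursive beta vs. brute-force P(o_{t+1} .. o_T | s_t = i).
# Toy parameters are hypothetical; ticks are 0-based.
from itertools import product


def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def beta_brute(A, B, obs, t, i):
    """Sum over every continuation of the states after tick t, starting from state i."""
    N, T = len(A), len(obs)
    total = 0.0
    for tail in product(range(N), repeat=T - 1 - t):
        prob, prev = 1.0, i
        for u, s in enumerate(tail, start=t + 1):
            prob *= A[prev][s] * B[s][obs[u]]      # a_{prev, s} * b_s(o_u)
            prev = s
        total += prob
    return total                                   # at the last tick the empty tail gives 1


A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]                       # B[i][k] = b_i(k)
OBS = [0, 1, 1, 0]

beta = backward(A, B, OBS)
for t in range(len(OBS)):
    for i in range(len(A)):
        print(t, i, round(beta[t][i], 6), round(beta_brute(A, B, OBS, t, i), 6))
```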
