Hidden Markov Models
Probabilistic reasoning over time
So far, we have mostly dealt with episodic environments (exceptions: games with multiple moves, planning). In particular, the Bayesian networks we have seen so far describe static situations: each random variable gets a single fixed value in a single problem instance. Now we consider the problem of describing probabilistic environments that evolve over time. Examples: robot localization, tracking, speech.
Hidden Markov Models
At each time slice t, the state of the world is described by an unobservable state variable X_t and an observable evidence variable E_t.
Transition model: distribution over the current state given the whole past history: P(X_t | X_0, ..., X_{t-1}) = P(X_t | X_{0:t-1}).
Observation model: P(E_t | X_{0:t}, E_{1:t-1}).
[Diagram: chain X_0 → X_1 → X_2 → ... → X_{t-1} → X_t, with each X_t emitting evidence E_t.]
Hidden Markov Models
Markov assumption: the current state is conditionally independent of all the other states given the state at the previous time step (first order). So the transition model simplifies to P(X_t | X_{0:t-1}) = P(X_t | X_{t-1}).
Markov assumption for observations: the evidence at time t depends only on the state at time t. So the observation model simplifies to P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t).
[Diagram: chain X_0 → X_1 → X_2 → ... → X_{t-1} → X_t, with each X_t emitting evidence E_t.]
Example
[Figure: hidden state and observed evidence]
Example
[Figure: transition model over the hidden state; observation model from state to evidence]
An alternative visualization
[State diagram: two states R_t = T and R_t = F, each with self-transition probability 0.7 and cross-transition probability 0.3; state R = T emits U=T: 0.9, U=F: 0.1, and state R = F emits U=T: 0.2, U=F: 0.8.]

Transition probabilities:
                R_t = T   R_t = F
R_{t-1} = T      0.7       0.3
R_{t-1} = F      0.3       0.7

Observation (emission) probabilities:
                U_t = T   U_t = F
R_t = T          0.9       0.1
R_t = F          0.2       0.8
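A minimal sketch of how these two tables could be encoded, assuming state index 0 = rain (R=T), 1 = no rain (R=F) and observation index 0 = umbrella (U=T), 1 = no umbrella (U=F); the indexing and variable names are illustrative choices, not part of the slides:

```python
import numpy as np

# Transition probabilities P(R_t | R_{t-1}); rows = previous state, columns = next state.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Emission probabilities P(U_t | R_t); rows = state, columns = observation.
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Each row of both matrices is a probability distribution and sums to 1.
assert np.allclose(T.sum(axis=1), 1.0) and np.allclose(O.sum(axis=1), 1.0)
```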
Another example
States: X_t ∈ {Home, Office, Cafe}. Observations: E_t ∈ {SMS, Facebook, Email}. Slide credit: Andy White
The Joint Distribution
Transition model: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1}). Observation model: P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t). How do we compute the full joint P(X_{0:t}, E_{1:t})?

P(X_{0:t}, E_{1:t}) = P(X_0) ∏_{i=1}^{t} P(X_i | X_{i-1}) P(E_i | X_i)

[Diagram: chain X_0 → X_1 → X_2 → ... → X_{t-1} → X_t with emissions E_1, E_2, ..., E_{t-1}, E_t.]
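As an illustration of this factorization, the joint probability of one fully specified trajectory can be computed by multiplying the factors directly. This is a sketch assuming the rain/umbrella matrices above and a hypothetical uniform prior over X_0:

```python
import numpy as np

def joint_probability(prior, T, O, states, observations):
    # P(x_0..x_t, e_1..e_t) = P(x_0) * prod_{i=1..t} P(x_i | x_{i-1}) * P(e_i | x_i)
    # states: indices x_0..x_t (length t+1); observations: indices e_1..e_t (length t)
    p = prior[states[0]]
    for i, e in enumerate(observations, start=1):
        p *= T[states[i - 1], states[i]] * O[states[i], e]
    return p

prior = np.array([0.5, 0.5])                    # assumed uniform prior over X_0
T = np.array([[0.7, 0.3], [0.3, 0.7]])          # P(X_t | X_{t-1})
O = np.array([[0.9, 0.1], [0.2, 0.8]])          # P(E_t | X_t)
print(joint_probability(prior, T, O, states=[0, 0, 1], observations=[0, 1]))  # 0.0756
```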
HMM inference tasks
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}? (The forward algorithm.)
Smoothing: what is the distribution of some state X_k given the entire observation sequence e_{1:t}? (The forward-backward algorithm.)
Evaluation: compute the probability of a given observation sequence e_{1:t}.
Decoding: what is the most likely state sequence X_{0:t} given the observation sequence e_{1:t}? (The Viterbi algorithm.)
[Diagram: chain X_0, X_1, ..., X_k, ..., X_{t-1}, X_t with evidence variables E_1, ..., E_k, ..., E_t.]
Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Diagram: X_t is the query variable; e_1, ..., e_t are the evidence variables.]
Filtering: worked example
Suppose at time t-1 (after observing e_{t-1} = Facebook) the filtered belief P(X_{t-1} | e_{1:t-1}) is Home 0.6, Office 0.3, Cafe 0.1, and the transition probabilities into Office are 0.6 from Home, 0.2 from Office, and 0.8 from Cafe.

Prediction: what is P(X_t = Office | e_{1:t-1})?

P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t, x_{t-1} | e_{1:t-1})
                   = Σ_{x_{t-1}} P(X_t | x_{t-1}, e_{1:t-1}) P(x_{t-1} | e_{1:t-1})
                   = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})

Here: 0.6 · 0.6 + 0.2 · 0.3 + 0.8 · 0.1 = 0.5.

Correction: now observe e_t = Email, with P(e_t | X_t = Office) = 0.8. What is P(X_t = Office | e_{1:t})?

P(X_t | e_{1:t}) = P(X_t | e_t, e_{1:t-1})
                 = P(e_t | X_t, e_{1:t-1}) P(X_t | e_{1:t-1}) / P(e_t | e_{1:t-1})
                 ∝ P(e_t | X_t) P(X_t | e_{1:t-1})

Here: 0.5 · 0.8 = 0.4 (unnormalized). Note: we must also compute this value for Home and Cafe, and renormalize so the three values sum to 1.
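A quick numeric check of the two steps in this example, as a short sketch (the dictionaries below are just the numbers read off the slide, with hypothetical variable names):

```python
# Prediction step: P(X_t = Office | e_{1:t-1})
belief_prev = {"Home": 0.6, "Office": 0.3, "Cafe": 0.1}       # P(X_{t-1} | e_{1:t-1})
trans_to_office = {"Home": 0.6, "Office": 0.2, "Cafe": 0.8}   # P(X_t = Office | x_{t-1})
predicted_office = sum(trans_to_office[s] * belief_prev[s] for s in belief_prev)
print(predicted_office)          # 0.5

# Correction step: multiply by P(e_t = Email | X_t = Office) = 0.8, then renormalize
# over all three states (the renormalization needs the Home and Cafe values as well).
print(predicted_office * 0.8)    # 0.4 (unnormalized)
```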
Filtering: The Forward Algorithm
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}). Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
Base case: the prior P(X_0).
Prediction: propagate the belief from X_{t-1} to X_t:
P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})
Correction: weight by the evidence e_t:
P(X_t | e_{1:t}) = P(X_t | e_t, e_{1:t-1}) ∝ P(e_t | X_t) P(X_t | e_{1:t-1})
Renormalize so that the values P(X_t = x | e_{1:t}) sum to 1.
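Putting the three steps together, here is a minimal sketch of the full forward recursion in matrix form; the function name, variable names, and numpy encoding are illustrative assumptions, not from the slides:

```python
import numpy as np

def forward_filter(prior, T, O, observations):
    # prior: P(X_0), shape (S,)
    # T[i, j] = P(X_t = j | X_{t-1} = i), shape (S, S)
    # O[i, k] = P(E_t = k | X_t = i), shape (S, K)
    # Returns the filtered beliefs P(X_t | e_{1:t}) for t = 0 .. len(observations).
    beliefs = [prior]
    for e in observations:
        predicted = beliefs[-1] @ T                   # prediction step
        corrected = predicted * O[:, e]               # correction: weight by P(e_t | X_t)
        beliefs.append(corrected / corrected.sum())   # renormalize
    return beliefs
```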
Filtering: The Forward Algorithm
[Diagram: belief vectors over (Home, Office, Cafe) at times 0, 1, ..., t; the prior at time 0 is propagated forward, incorporating the evidence e_{t-1}, e_t, ... at each step.]
Evaluation
Compute the probability of the current observation sequence: P(e_{1:t}).
Recursive formulation: suppose we know P(e_{1:t-1}). Then

P(e_{1:t}) = P(e_t, e_{1:t-1})
           = P(e_{1:t-1}) P(e_t | e_{1:t-1})
           = P(e_{1:t-1}) Σ_{x_t} P(e_t, x_t | e_{1:t-1})
           = P(e_{1:t-1}) Σ_{x_t} P(e_t | x_t) P(x_t | e_{1:t-1})

The first factor is the recursion; the last factor, P(x_t | e_{1:t-1}), comes from filtering.
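One way to compute this quantity is to accumulate the per-step factors P(e_t | e_{1:t-1}) while running the forward recursion; a sketch under the same matrix conventions as above (the names are illustrative):

```python
import numpy as np

def sequence_likelihood(prior, T, O, observations):
    # Returns P(e_{1:t}) as the product of the per-step factors P(e_t | e_{1:t-1}).
    belief = prior
    likelihood = 1.0
    for e in observations:
        predicted = belief @ T                 # P(X_t | e_{1:t-1})
        unnormalized = predicted * O[:, e]     # P(e_t, X_t | e_{1:t-1})
        step = unnormalized.sum()              # P(e_t | e_{1:t-1})
        likelihood *= step
        belief = unnormalized / step           # filtered belief for the next step
    return likelihood
```

In practice this product underflows for long sequences, so implementations usually sum the logs of the per-step factors instead.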
Smoothing
What is the distribution of some state X_k given the entire observation sequence e_{1:t}?
Solution: the forward-backward algorithm.
Forward message: P(X_k | e_{1:k}). Backward message: P(e_{k+1:t} | X_k).
[Diagram: belief vectors over (Home, Office, Cafe) at times 0, k, and t; the forward message reaches X_k from the left, the backward message from the right.]
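A minimal forward-backward sketch under the same conventions (the helper name and the per-step normalization are assumptions): the backward pass computes messages proportional to P(e_{k+1:t} | X_k), and the smoothed belief is the normalized product of the forward and backward messages.

```python
import numpy as np

def forward_backward(prior, T, O, observations):
    # Returns the smoothed beliefs P(X_k | e_{1:t}) for k = 0 .. len(observations).
    n = len(observations)
    # Forward messages: alpha[k] is proportional to P(X_k | e_{1:k}).
    alpha = [prior]
    for e in observations:
        a = (alpha[-1] @ T) * O[:, e]
        alpha.append(a / a.sum())
    # Backward messages: beta[k] is proportional to P(e_{k+1:t} | X_k).
    beta = [np.ones_like(prior) for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        b = T @ (O[:, observations[k]] * beta[k + 1])
        beta[k] = b / b.sum()
    # Smoothed belief: proportional to the product of the two messages.
    return [a * b / (a * b).sum() for a, b in zip(alpha, beta)]
```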
Decoding: Viterbi Algorithm
Task: given an observation sequence e_{1:t}, compute the most likely state sequence x_{0:t}:

x*_{0:t} = argmax_{x_{0:t}} P(x_{0:t} | e_{1:t})

Key observation: the most likely path that ends in a particular state x_t consists of the most likely path to some state x_{t-1} followed by the transition to x_t.
Let m_t(x_t) denote the probability of the most likely path that ends in x_t:

m_t(x_t) = max_{x_{0:t-1}} P(x_{0:t-1}, x_t, e_{1:t})
         = max_{x_{t-1}} [ m_{t-1}(x_{t-1}) P(x_t | x_{t-1}) P(e_t | x_t) ]

[Diagram: trellis over times 0, 1, ..., t; each m_{t-1}(x_{t-1}) is extended to x_t via P(x_t | x_{t-1}).]
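A hedged sketch of this recursion with the same matrix conventions; it also keeps back-pointers so the most likely path can be read off at the end (the names are illustrative):

```python
import numpy as np

def viterbi(prior, T, O, observations):
    # Most likely state sequence x_0 .. x_t for observations e_1 .. e_t.
    m = prior.copy()                                # m_0(x_0) = P(x_0)
    backpointers = []
    for e in observations:
        # scores[i, j] = m_{t-1}(i) * P(x_t = j | x_{t-1} = i) * P(e_t | x_t = j)
        scores = (m[:, None] * T) * O[:, e]
        backpointers.append(scores.argmax(axis=0))  # best predecessor for each x_t
        m = scores.max(axis=0)                      # m_t(x_t)
    # Trace back from the most likely final state.
    path = [int(m.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

As with evaluation, real implementations work in log space to avoid underflow.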
Learning
Given: a training sample of observation sequences.
Goal: compute the model parameters — the transition probabilities P(X_t | X_{t-1}) and the observation probabilities P(E_t | X_t).
What if we have complete data, i.e., both e_{1:t} and x_{0:t}? Then we can estimate all the parameters by relative frequencies:

P(X_t = b | X_{t-1} = a) = (# of times state b follows state a) / (total # of transitions from state a)
P(E_t = e | X_t = a) = (# of times e is emitted from state a) / (total # of emissions from state a)
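A small sketch of this complete-data case: with labeled state sequences, both tables are just normalized counts (the function and variable names are illustrative assumptions):

```python
from collections import Counter

def estimate_parameters(state_seqs, obs_seqs):
    # state_seqs[n] is x_0 .. x_t and obs_seqs[n] is e_1 .. e_t for training sequence n.
    # Returns relative-frequency estimates of P(X_t = b | X_{t-1} = a) and P(E_t = e | X_t = a).
    trans, trans_from = Counter(), Counter()
    emit, emit_from = Counter(), Counter()
    for states, obs in zip(state_seqs, obs_seqs):
        for a, b in zip(states, states[1:]):
            trans[(a, b)] += 1
            trans_from[a] += 1
        for x, e in zip(states[1:], obs):          # e_i is emitted from state x_i
            emit[(x, e)] += 1
            emit_from[x] += 1
    P_trans = {(a, b): c / trans_from[a] for (a, b), c in trans.items()}
    P_emit = {(x, e): c / emit_from[x] for (x, e), c in emit.items()}
    return P_trans, P_emit
```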
Learning
What if we only have the observations (no state labels)? Then we need to use the EM algorithm (and somehow figure out the number of states).
Review: HMM Learning and Inference
Inference tasks:
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}?
Smoothing: what is the distribution of some state X_k given the entire observation sequence e_{1:t}?
Evaluation: compute the probability of a given observation sequence e_{1:t}.
Decoding: what is the most likely state sequence X_{0:t} given the observation sequence e_{1:t}?
Learning: given a training sample of sequences, learn the model parameters (transition and emission probabilities), e.g. with the EM algorithm.
Applications of HMMs
Speech recognition HMMs: observations are acoustic signals (continuous valued); states are specific positions in specific words (so, tens of thousands of states).
Machine translation HMMs: observations are words (tens of thousands); states are translation options.
Robot tracking: observations are range readings (continuous); states are positions on a map (continuous).
Source: Tamara Berg
Application of HMMs: Speech recognition
Noisy channel model of speech
Speech feature extraction
The acoustic waveform is sampled at 8 kHz and quantized to 8-12 bits. Each frame (10 ms, or 80 samples) is converted into a feature vector of ~39 dimensions.
[Figure: acoustic waveform (amplitude vs. time) and spectrogram (frequency vs. time).]
Phonetic model
Phones: speech sounds. Phonemes: groups of speech sounds that have a unique meaning/function in a language (e.g., there are several different ways to pronounce the same phoneme).
Phonetic model
[Figure]
HMM models for phones
HMM states in most speech recognition systems correspond to subphones. There are around 60 phones and as many as 60³ context-dependent triphones.
HMM models for words
Putting words together
Given a sequence of acoustic features, how do we find the corresponding word sequence?
Decoding with the Viterbi algorithm
Reference
D. Jurafsky and J. Martin, Speech and Language Processing, 2nd ed., Prentice Hall, 2008.
More general models: Dynamic Bayesian networks
Detecting interaction links in a collaborating group using manually annotated data. S. Mathur, M.S. Poole, F. Pena-Mora, M. Hasegawa-Johnson, N. Contractor, Social Networks, doi: 10.1016/j.socnet.2012.04.002.
Speaking: S_i = 1 if #i is speaking. Link: L_ij = 1 if #i is listening to #j. Neighborhood: N_ij = 1 if they are near one another. Gaze: G_ij = 1 if #i is looking at #j. Indirect: I_ij = 1 if #i and #j are both listening to the same person.