Temporal probability models
CS194-10 Fall 2011 Lecture 25
Outline
Hidden variables
Inference: filtering, prediction, smoothing
Hidden Markov models
Kalman filters (a brief mention)
Dynamic Bayesian networks
Particle filtering
Hidden variables
The underlying state of the process is usually unobservable; observations at time t do not cause observations at time t+1
E.g., diabetes management:
  X_t = set of unobservable state variables at time t, e.g., BloodSugar_t, StomachContents_t, etc.
  E_t = set of observable evidence variables at time t, e.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t
Sensor Markov assumption: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)
This assumes discrete time; step size depends on problem; model structure depends on time step chosen
Example
(Network: Rain_{t-1} → Rain_t → Rain_{t+1}, each Rain_t with sensor Umbrella_t)
Transition model: P(R_t | R_{t-1} = true) = 0.7, P(R_t | R_{t-1} = false) = 0.3
Sensor model: P(U_t | R_t = true) = 0.9, P(U_t | R_t = false) = 0.2
First-order Markov assumption not exactly true in real world!
Possible fixes:
1. Increase order of Markov process
2. Augment state, e.g., add Temp_t, Pressure_t
Example: robot motion. Augment position and velocity with Battery_t
Inference tasks
Filtering: P(X_t | e_{1:t}), the belief state; input to the decision process of a rational agent
Prediction: P(X_{t+k} | e_{1:t}) for k > 0; evaluation of possible action sequences, like filtering without the evidence
Smoothing: P(X_k | e_{1:t}) for 0 ≤ k < t; better estimate of past states, essential for learning
Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t}); speech recognition, decoding with a noisy channel
Filtering
Aim: devise a recursive state estimation algorithm:
  P(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, P(X_t | e_{1:t}))
  P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})
                         = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})
                         = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
I.e., prediction + estimation. Prediction by summing out X_t:
  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
                         = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
  f_{1:t+1} = Forward(f_{1:t}, e_{t+1})  where  f_{1:t} = P(X_t | e_{1:t})
Time and space constant (independent of t) for finite-state X_t
Filtering example
Umbrella world with u_1 = u_2 = true:
  prior f_{1:0} = P(R_0) = <0.500, 0.500>
  one-step prediction P(R_1) = <0.500, 0.500>; update on U_1 gives f_{1:1} = <0.818, 0.182>
  one-step prediction P(R_2 | u_1) = <0.627, 0.373>; update on U_2 gives f_{1:2} = <0.883, 0.117>
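To make the recursion concrete, here is a minimal Python sketch of the forward update on the umbrella model; the transition and sensor numbers come from the slides, while the function and variable names are illustrative.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# T[j][i] = P(Rain_t = i | Rain_{t-1} = j), state 0 = rain, 1 = no rain
T = [[0.7, 0.3],
     [0.3, 0.7]]
# P(Umbrella_t | Rain_t) for umbrella observed true / false
O = {True: [0.9, 0.2], False: [0.1, 0.8]}

def forward(f, e):
    """One filtering step: predict by summing out x_t, then weight by evidence."""
    predicted = [sum(T[j][i] * f[j] for j in range(2)) for i in range(2)]
    return normalize([O[e][i] * predicted[i] for i in range(2)])

f = [0.5, 0.5]              # prior P(Rain_0)
for e in [True, True]:      # u_1 = u_2 = true
    f = forward(f, e)
    print(f)                # [0.818, 0.182], then [0.883, 0.117]
```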
Smoothing
(Network: X_0 → X_1 → ... → X_k → ... → X_t, with evidence E_1, ..., E_k, ..., E_t)
Divide evidence e_{1:t} into e_{1:k}, e_{k+1:t}:
  P(X_k | e_{1:t}) = P(X_k | e_{1:k}, e_{k+1:t})
                   = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k})
                   = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)
                   = α f_{1:k} b_{k+1:t}
Backward message computed by a backwards recursion:
  P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
                     = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
                     = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
Smoothing example
Umbrella world with u_1 = u_2 = true:
  forward messages: <0.500, 0.500>, <0.818, 0.182>, <0.883, 0.117>
  backward messages: b_{2:2} = <0.690, 0.410>, b_{3:2} = <1.000, 1.000>
  smoothed estimate P(R_1 | u_1, u_2) = <0.883, 0.117>
Forward-backward algorithm: cache forward messages along the way
Time linear in t (polytree inference), space O(t|f|)
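A sketch of the full forward-backward pass on the same model, caching the forward messages and reproducing the backward message <0.690, 0.410> and the smoothed estimate above; the code structure is mine, only the model numbers are from the slides.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

T = [[0.7, 0.3], [0.3, 0.7]]            # P(X_t = i | X_{t-1} = j)
O = {True: [0.9, 0.2], False: [0.1, 0.8]}

def forward(f, e):
    pred = [sum(T[j][i] * f[j] for j in range(2)) for i in range(2)]
    return normalize([O[e][i] * pred[i] for i in range(2)])

def backward(b, e):
    """b_{k+1:t} from b_{k+2:t}: sum over x_{k+1}."""
    return [sum(T[i][j] * O[e][j] * b[j] for j in range(2)) for i in range(2)]

evidence = [True, True]                 # u_1 = u_2 = true
fs = [[0.5, 0.5]]                       # cache forward messages along the way
for e in evidence:
    fs.append(forward(fs[-1], e))

b = [1.0, 1.0]                          # b_{t+1:t} = <1, 1>
smoothed = [None] * (len(evidence) + 1)
for k in range(len(evidence), 0, -1):
    smoothed[k] = normalize([fs[k][i] * b[i] for i in range(2)])
    b = backward(b, evidence[k - 1])    # after first pass: [0.69, 0.41]
print(smoothed[1])                      # [0.883, 0.117]
```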
Most likely explanation
Most likely sequence ≠ sequence of most likely states!
Most likely path to each x_{t+1} = most likely path to some x_t plus one more step:
  max_{x_1...x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1})
    = P(e_{t+1} | X_{t+1}) max_{x_t} [ P(X_{t+1} | x_t) max_{x_1...x_{t-1}} P(x_1, ..., x_{t-1}, x_t | e_{1:t}) ]
Identical to filtering, except f_{1:t} replaced by
  m_{1:t} = max_{x_1...x_{t-1}} P(x_1, ..., x_{t-1}, X_t | e_{1:t}),
i.e., m_{1:t}(i) gives the probability of the most likely path to state i.
Update has sum replaced by max, giving the Viterbi algorithm:
  m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) m_{1:t} )
Viterbi example
Observations: umbrella = true, true, false, true, true
State-space paths over Rain_1 ... Rain_5, each true or false; bold arrows mark the most likely path to each state
  m_{1:1} = <.8182, .1818>
  m_{1:2} = <.5155, .0491>
  m_{1:3} = <.0361, .1237>
  m_{1:4} = <.0334, .0173>
  m_{1:5} = <.0210, .0024>
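A minimal Viterbi sketch for the five-day sequence above, tracking back-pointers to recover the path; starting from the normalized m_{1:1}, it reproduces the m-message values on the slide. The layout is my own illustration.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

T = [[0.7, 0.3], [0.3, 0.7]]          # T[j][i] = P(X_t = i | X_{t-1} = j)
O = {True: [0.9, 0.2], False: [0.1, 0.8]}
evidence = [True, True, False, True, True]

m = normalize([0.5 * O[evidence[0]][i] for i in range(2)])  # m_{1:1}
back = []                             # back-pointers for path recovery
for e in evidence[1:]:
    new_m, ptrs = [], []
    for i in range(2):                # best predecessor of state i
        j = max(range(2), key=lambda j: T[j][i] * m[j])
        ptrs.append(j)
        new_m.append(O[e][i] * T[j][i] * m[j])
    m = new_m
    back.append(ptrs)

state = max(range(2), key=lambda i: m[i])
path = [state]
for ptrs in reversed(back):           # follow back-pointers from the end
    state = ptrs[state]
    path.append(state)
path.reverse()
print([s == 0 for s in path])         # [True, True, False, True, True]
```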
Hidden Markov models (HMMs)
X_t is a single, discrete variable (often E_t is too); domain of X_t is {1, ..., S}
Transition matrix T_{ij} = P(X_t = j | X_{t-1} = i), e.g., ( 0.7 0.3 ; 0.3 0.7 )
Sensor matrix O_t for each time step, diagonal elements P(e_t | X_t = i),
  e.g., with U_1 = true, O_1 = ( 0.9 0 ; 0 0.2 )
Forward and backward messages as column vectors:
  f_{1:t+1} = α O_{t+1} T^T f_{1:t}
  b_{k+1:t} = T O_{k+1} b_{k+2:t}
Forward-backward algorithm needs time O(S^2 t) and space O(St)
(logarithmic or constant space is also possible)
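The same updates in matrix-vector form, as a brief numpy sketch of the two equations above; the matrix names follow the slide, everything else is illustrative.

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])   # T[i, j] = P(X_t = j | X_{t-1} = i)

def O(u):
    """Diagonal sensor matrix for umbrella observed true / false."""
    return np.diag([0.9, 0.2] if u else [0.1, 0.8])

f = np.array([0.5, 0.5])                  # prior as a column vector
for u in [True, True]:
    f = O(u) @ T.T @ f                    # f_{1:t+1} = alpha O_{t+1} T^T f_{1:t}
    f /= f.sum()                          # the alpha normalization
print(f)                                  # [0.883 0.117]

b = np.ones(2)                            # b_{t+1:t}
b = T @ O(True) @ b                       # b_{k+1:t} = T O_{k+1} b_{k+2:t}
print(b)                                  # [0.69 0.41]
```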
Learning in HMMs
(Network: Rain_1 → Rain_2 → Rain_3 with Umbrella_2, Umbrella_3; separate parameters θ for each P(R_t | R_{t-1}) and φ for each P(U_t | R_t))
If parameters at each time step were separate, the EM update would be
  E step: p_{tijk} = P(X_t = k, X_{t-1} = j | e^{(i)}, θ)  for all t, i, j, k
  M step: θ_{tjk} = N̂_{tjk} / Σ_k N̂_{tjk} = Σ_i p_{tijk} / Σ_i Σ_k p_{tijk}
Stationary parameters θ_{tjk} = λ_{jk}, so EM chain rule:
  ∂L/∂λ_{jk} = Σ_t (∂L/∂θ_{tjk}) (∂θ_{tjk}/∂λ_{jk})
  E step: p_{tijk} = P(X_t = k, X_{t-1} = j | e^{(i)}, λ)  for all t, i, j, k
  M step: λ_{jk} = N̂_{jk} / Σ_k N̂_{jk} = Σ_i Σ_t p_{tijk} / Σ_i Σ_t Σ_k p_{tijk}
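A hedged numpy sketch of one EM (Baum-Welch) iteration for the stationary transition parameters λ_{jk}, pooling expected pairwise counts over all time steps as in the M step above; the helper names, shapes, and toy data are assumptions, not from the lecture.

```python
import numpy as np

def em_step(T, O_true, sequences, prior):
    """One Baum-Welch update of T[j, k] = P(X_t = k | X_{t-1} = j)."""
    N = np.zeros_like(T)                        # pooled expected counts N_jk
    for e in sequences:                         # each e is a list of booleans
        Ofac = [np.where(u, O_true, 1 - O_true) for u in e]
        f = [prior * Ofac[0]]                   # unnormalized forward messages
        for t in range(1, len(e)):
            f.append(Ofac[t] * (T.T @ f[-1]))
        b = np.ones(2)                          # backward message, b_{t+1:L} = 1
        for t in range(len(e) - 1, 0, -1):
            # E step: p[j, k] = P(X_{t-1} = j, X_t = k | e)
            p = np.outer(f[t - 1], Ofac[t] * b) * T
            N += p / p.sum()
            b = T @ (Ofac[t] * b)
    return N / N.sum(axis=1, keepdims=True)     # M step: row-normalize counts

T0 = np.array([[0.6, 0.4], [0.4, 0.6]])
print(em_step(T0, np.array([0.9, 0.2]),
              [[True, True, False]], np.array([0.5, 0.5])))
```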
Kalman filters
Modelling systems described by a set of continuous variables, e.g., tracking a bird flying: X_t = X, Y, Z, Ẋ, Ẏ, Ż
Airplanes, robots, ecosystems, economies, chemical plants, planets, ...
(Network: X_t → X_{t+1}, Ẋ_t → Ẋ_{t+1}, with observations Z_t, Z_{t+1})
Gaussian prior, linear Gaussian transition model and sensor model
Updating Gaussian distributions
Prediction step: if P(X_t | e_{1:t}) is Gaussian, then the prediction
  P(X_{t+1} | e_{1:t}) = ∫ P(X_{t+1} | x_t) P(x_t | e_{1:t}) dx_t
is Gaussian. If P(X_{t+1} | e_{1:t}) is Gaussian, then the updated distribution
  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
is Gaussian
Hence P(X_t | e_{1:t}) is multivariate Gaussian N(μ_t, Σ_t) for all t
General (nonlinear, non-Gaussian) process: description of posterior grows unboundedly as t → ∞
Simple 1-D example
Gaussian random walk on X axis, s.d. σ_x, sensor s.d. σ_z:
  μ_{t+1} = ((σ_t^2 + σ_x^2) z_{t+1} + σ_z^2 μ_t) / (σ_t^2 + σ_x^2 + σ_z^2)
  σ_{t+1}^2 = ((σ_t^2 + σ_x^2) σ_z^2) / (σ_t^2 + σ_x^2 + σ_z^2)
(Plot: prior P(x_0), prediction P(x_1), and posterior P(x_1 | z_1 = 2.5), with the observation z_1 marked on the X position axis)
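Evaluating the 1-D update for one step; the numeric values here (prior N(0, 1), σ_x = 2, σ_z = 1, z_1 = 2.5) are assumed for illustration and are not given in the text, though z_1 = 2.5 matches the plot.

```python
def kalman_1d(mu, var, z, var_x, var_z):
    """One prediction + update step of the 1-D Kalman filter above."""
    pred_var = var + var_x                   # variance after prediction
    mu_new = (pred_var * z + var_z * mu) / (pred_var + var_z)
    var_new = pred_var * var_z / (pred_var + var_z)
    return mu_new, var_new

print(kalman_1d(mu=0.0, var=1.0, z=2.5, var_x=4.0, var_z=1.0))
# -> (2.083..., 0.833...): the new mean lies between prediction and observation
```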
General Kalman update
Transition and sensor models:
  P(x_{t+1} | x_t) = N(F x_t, Σ_x)(x_{t+1})
  P(z_t | x_t) = N(H x_t, Σ_z)(z_t)
F is the matrix for the transition; Σ_x the transition noise covariance
H is the matrix for the sensors; Σ_z the sensor noise covariance
Filter computes the following update:
  μ_{t+1} = F μ_t + K_{t+1} (z_{t+1} - H F μ_t)
  Σ_{t+1} = (I - K_{t+1} H)(F Σ_t F^T + Σ_x),
where K_{t+1} = (F Σ_t F^T + Σ_x) H^T (H (F Σ_t F^T + Σ_x) H^T + Σ_z)^{-1} is the Kalman gain matrix
Σ_t and K_t are independent of the observation sequence, so compute offline
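A numpy sketch of the general update; the constant-velocity F, position-only H, and noise covariances describe an assumed 2-D tracker like the one plotted next, not a model specified in the lecture.

```python
import numpy as np

def kalman_step(mu, Sigma, z, F, H, Sx, Sz):
    """One step of the general Kalman update above."""
    mu_pred = F @ mu                                  # predicted mean F mu_t
    S_pred = F @ Sigma @ F.T + Sx                     # predicted covariance
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + Sz)  # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ S_pred
    return mu_new, Sigma_new

# Assumed model: state = (x, y, vx, vy), observe position only
F = np.block([[np.eye(2), np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])
Sx = 0.1 * np.eye(4)
Sz = 0.5 * np.eye(2)
mu, Sigma = np.zeros(4), np.eye(4)
mu, Sigma = kalman_step(mu, Sigma, np.array([1.0, 0.5]), F, H, Sx, Sz)
print(mu)
```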
2-D tracking example: filtering
(Plot: true trajectory, observations, and filtered estimate in the X-Y plane)
2-D tracking example: smoothing
(Plot: true trajectory, observations, and smoothed estimate in the X-Y plane)
Dynamic Bayesian networks
X_t, E_t contain arbitrarily many variables in a replicated Bayes net
(Example DBNs: the umbrella network with P(R_0) = 0.7, P(R_1 | R_0 = true) = 0.7, P(R_1 | R_0 = false) = 0.3, P(U_1 | R_1 = true) = 0.9, P(U_1 | R_1 = false) = 0.2; and a robot network with Battery_0 → Battery_1 → BMeter_1, X_0 → X_1, Ẋ_0 → Ẋ_1, and observation Z_1)
DBNs vs. HMMs
Every HMM is a single-variable DBN; every discrete DBN is an HMM
(Figure: a three-variable DBN over X_t, Y_t, Z_t collapsed into one HMM state variable)
Sparse dependencies ⇒ exponentially fewer parameters;
e.g., 20 state variables, three parents each:
  DBN has 20 × 2^3 = 160 parameters, HMM has 2^20 × 2^20 ≈ 10^12
DBNs vs. Kalman filters
Every Kalman filter model is a DBN, but few DBNs are KFs; real world requires non-Gaussian posteriors
E.g., where are my keys? What's the battery charge?
(Figure: battery DBN with BMBroken_0 → BMBroken_1 → BMeter_1, Battery_0 → Battery_1; plots of E(Battery | ...) and P(BMBroken | ...) over time for observation sequences ...5555005555... and ...5555000000...)
Exact inference in DBNs
Naive method: unroll the network and run any exact algorithm
(Figure: the umbrella DBN unrolled over seven slices, each with the same transition and sensor CPTs)
Problem: inference cost for each update grows with t
Rollup filtering: add slice t+1, sum out slice t using variable elimination
Largest factor is O(K^{D+L}), update cost O(K^{D+L+1})
(cf. HMM update cost O(K^{2D}))
Particle filtering
Basic idea: posterior at time t represented by a population of N particles; resample given evidence to track high-likelihood regions of the state space
Replicate particles proportional to likelihood for e_t
(Figure: Rain_t → Rain_{t+1} particle populations for true/false states: (a) Propagate, (b) Weight, (c) Resample)
Widely used for tracking nonlinear systems, esp. in vision
Also used for simultaneous localization and mapping in mobile robots (10^5-dimensional state space)
Particle filtering contd.
Assume consistent at time t: N(x_t | e_{1:t}) / N = P(x_t | e_{1:t})
Propagate forward: populations of x_{t+1} are
  N(x_{t+1} | e_{1:t}) = Σ_{x_t} P(x_{t+1} | x_t) N(x_t | e_{1:t})
Weight samples by their likelihood for e_{t+1}:
  W(x_{t+1} | e_{1:t+1}) = P(e_{t+1} | x_{t+1}) N(x_{t+1} | e_{1:t})
Resample to obtain populations proportional to W:
  N(x_{t+1} | e_{1:t+1}) / N = α W(x_{t+1} | e_{1:t+1}) = α P(e_{t+1} | x_{t+1}) N(x_{t+1} | e_{1:t})
    = α P(e_{t+1} | x_{t+1}) Σ_{x_t} P(x_{t+1} | x_t) N(x_t | e_{1:t})
    = α' P(e_{t+1} | x_{t+1}) Σ_{x_t} P(x_{t+1} | x_t) P(x_t | e_{1:t})
    = P(x_{t+1} | e_{1:t+1})
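A minimal particle filter for the umbrella model implementing steps (a)-(c) above; the particle count and seed are arbitrary, and the filtered estimate approaches the exact 0.883 from the earlier filtering example.

```python
import random
random.seed(0)

P_RAIN = {True: 0.7, False: 0.3}       # P(Rain_{t+1} = true | Rain_t)
P_UMB = {True: 0.9, False: 0.2}        # P(Umbrella = true | Rain)

def particle_filter_step(particles, e):
    # (a) Propagate each particle through the transition model
    particles = [random.random() < P_RAIN[x] for x in particles]
    # (b) Weight each particle by its likelihood for the evidence
    w = [P_UMB[x] if e else 1 - P_UMB[x] for x in particles]
    # (c) Resample proportional to weight
    return random.choices(particles, weights=w, k=len(particles))

particles = [random.random() < 0.5 for _ in range(1000)]
for e in [True, True]:                 # u_1 = u_2 = true
    particles = particle_filter_step(particles, e)
print(sum(particles) / len(particles))  # approx 0.883
```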
Particle filtering performance
Approximation error of particle filtering remains bounded over time, if transition and sensor probabilities are bounded away from 0 and 1
(Plot: average absolute error vs. time step for likelihood weighting LW(25), LW(100), LW(1000), LW(10000), and ER/SOF(25))
Summary
General stationary Markovian model composed of
  transition model P(X_t | X_{t-1})
  sensor model P(E_t | X_t)
Tasks are filtering, prediction, smoothing, most likely sequence; all done recursively with constant cost per time step
Hidden Markov models have a single discrete state variable; EM training by forward-backward algorithm; core model for speech recognition
Kalman filters allow n state variables, linear Gaussian, O(n^3) update
Dynamic Bayes nets subsume HMMs, Kalman filters; exact update intractable
Particle filtering is a good approximate filtering algorithm for DBNs