CS 188: Artificial Intelligence
Lecture 7: HMMs and Particle Filtering
Pieter Abbeel --- UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

Reasoning over Time or Space
Often, we want to reason about a sequence of observations:
- Speech recognition
- Robot localization
- User attention
- Medical monitoring
Need to introduce time (or space) into our models.

Outline
- Markov Models (= a particular Bayes net)
- Hidden Markov Models (HMMs)
  - Representation (= another particular Bayes net)
  - Inference
    - Forward algorithm (= variable elimination)
    - Particle filtering (= likelihood weighting with some tweaks)
    - Viterbi (= variable elimination, but replace sum by max = graph search)
- Dynamic Bayes Nets
  - Representation (= yet another particular Bayes net)
  - Inference: forward algorithm and particle filtering

Markov Models
A Markov model is a chain-structured BN: X_1 → X_2 → X_3 → X_4 → ...
Each node is identically distributed (stationarity). The value of X at a given time is called the state.
Parameters: called transition probabilities or dynamics, they specify how the state evolves over time (also, initial state probabilities). Same as an MDP transition model, but with no choice of action.

Conditional Independence
Basic conditional independence: past and future are independent given the present; each time step only depends on the previous one. This is called the (first-order) Markov property.
Note that the chain is just a (growing) BN; we can always use generic BN reasoning on it if we truncate the chain at a fixed length.

Query: P(X_4)
Slow answer: inference by enumeration. Enumerate all sequences of length t which end in s and add up their probabilities (= join on X_1, X_2, X_3, then sum over x_1, x_2, x_3). A brute-force sketch follows below.
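To make the cost of enumeration concrete, here is a minimal sketch in Python, assuming the two-state weather chain introduced on the next slide (states, initial distribution, and CPT restated so the snippet is self-contained). The work is exponential in t, which motivates the fast answer below.

```python
from itertools import product

states = ['sun', 'rain']
P_init = {'sun': 1.0, 'rain': 0.0}                       # start from an observed sunny day
P_trans = {('sun', 'sun'): 0.9, ('sun', 'rain'): 0.1,
           ('rain', 'sun'): 0.3, ('rain', 'rain'): 0.7}  # P(X_t | X_{t-1})

def enumerate_query(t, s):
    """P(X_t = s) by summing the joint over all length-t sequences ending in s."""
    total = 0.0
    for seq in product(states, repeat=t):
        if seq[-1] != s:
            continue
        p = P_init[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            p *= P_trans[(prev, cur)]
        total += p
    return total

print(enumerate_query(4, 'sun'))   # O(d^t) sequences: exponential in t
```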
Query: P(X_t)
Fast answer: variable elimination in order X_1, X_2, ..., X_{t-1}. It computes, for k = 2, 3, ..., t:
P(x_k) = Σ_{x_{k-1}} P(x_k | x_{k-1}) P(x_{k-1})
Forward simulation = the mini-forward algorithm.
Note a common thread in this lecture: these are special cases of algorithms we already know; they just have special names in the context of HMMs for historical reasons.

Example Markov Chain: Weather
States: X = {rain, sun}
CPT P(X_t | X_{t-1}):
  X_{t-1}  X_t    P(X_t | X_{t-1})
  sun      sun    0.9
  sun      rain   0.1
  rain     sun    0.3
  rain     rain   0.7
Two other ways of representing the same CPT are often used for Markov models (these are not BNs!): a state-transition diagram with arcs labeled 0.9, 0.1, 0.3, 0.7, and the corresponding 2x2 transition matrix.

Example Run of Mini-Forward Algorithm
From an initial observation of sun: P(X_1), P(X_2), P(X_3), P(X_4), ..., P(X_∞)
From an initial observation of rain: P(X_1), P(X_2), P(X_3), P(X_4), ..., P(X_∞)
From yet another initial distribution P(X_1): P(X_1), ..., P(X_∞)

Stationary Distributions
For most chains, the influence of the initial distribution gets less and less over time: the distribution we end up in is independent of the initial distribution. The distribution we end up with is called the stationary distribution P_∞ of the chain. It satisfies
P_∞(X) = Σ_x P(X | x) P_∞(x)

Application of Markov Chain Stationary Distribution: Web Link Analysis
PageRank over a web graph: each web page is a state.
Initial distribution: uniform over pages.
Transitions:
- With prob. c, uniform jump to a random page (dotted lines, not all shown)
- With prob. 1-c, follow a random outlink (solid lines)
Stationary distribution: will spend more time on highly reachable pages (e.g., there are many ways to get to the Acrobat Reader download page); somewhat robust to link spam.
Google 1.0 returned the set of pages containing all your keywords in decreasing rank; now all search engines use link analysis along with many other factors (rank is actually getting less important over time).
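A minimal sketch of the mini-forward algorithm on the weather chain above (tables restated so the snippet is self-contained). Running it from either starting observation shows the convergence to the stationary distribution; for this CPT the fixed point works out to 0.75 sun, 0.25 rain.

```python
states = ['sun', 'rain']
P_trans = {('sun', 'sun'): 0.9, ('sun', 'rain'): 0.1,
           ('rain', 'sun'): 0.3, ('rain', 'rain'): 0.7}

def mini_forward(P_init, t):
    """P(X_t) via repeated updates: P(x_k) = Σ_{x_{k-1}} P(x_k | x_{k-1}) P(x_{k-1})."""
    P = dict(P_init)
    for _ in range(t - 1):
        P = {x: sum(P_trans[(xp, x)] * P[xp] for xp in states) for x in states}
    return P

# Both starting points converge to the same stationary distribution (0.75 sun, 0.25 rain).
print(mini_forward({'sun': 1.0, 'rain': 0.0}, 100))
print(mini_forward({'sun': 0.0, 'rain': 1.0}, 100))
```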
Application of Markov Chain Stationary Distribution: Gibbs Sampling*
Each joint instantiation over all hidden and query variables is a state: let X = H ∪ Q.
Transitions: with probability 1/n, resample variable X_j according to P(X_j | x_1, x_2, ..., x_{j-1}, x_{j+1}, ..., x_n, e_1, ..., e_m).
Stationary distribution: the conditional distribution P(X_1, X_2, ..., X_n | e_1, ..., e_m). → When we run Gibbs sampling long enough, we get a sample from the desired distribution! We did not prove this; all we did is state the result.

Outline (recap): Markov Models · HMMs (representation; inference: forward algorithm, particle filtering, Viterbi) · Dynamic Bayes Nets

Hidden Markov Models
Markov chains are not so useful for most agents: we need observations to update our beliefs.
Hidden Markov models (HMMs): an underlying Markov chain over states S; you observe outputs (effects) at each time step.
As a Bayes net: X_1 → X_2 → X_3 → X_4 → X_5, with an emission E_t hanging off each X_t.
An HMM is defined by:
- Initial distribution: P(X_1)
- Transitions: P(X_t | X_{t-1})
- Emissions: P(E_t | X_t)

Example: Ghostbusters HMM
P(X_1) = uniform (1/9 for each of the nine grid cells)
P(X | X') = usually move clockwise, but sometimes move in a random direction or stay in place (e.g., from X_1 = <1,2>: probability 1/2 for the clockwise move and 1/6 for each of the alternatives)
P(R_ij | X) = same sensor model as before: red means close, green means far away.

Conditional Independence
HMMs have two important independence properties:
- Markov hidden process: the future depends on the past via the present
- The current observation is independent of all else given the current state
Quiz: does this mean that the evidence variables are guaranteed to be independent? [No, they tend to be correlated by the hidden state]
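To make the definition concrete, here is a minimal sketch of an HMM as three tables. It uses a toy weather chain with a binary umbrella observation rather than the Ghostbusters grid; the emission numbers are illustrative assumptions, not from the slides.

```python
# An HMM is fully specified by three tables (toy weather/umbrella model;
# the emission probabilities are illustrative assumptions).
states = ['sun', 'rain']
P_init = {'sun': 0.5, 'rain': 0.5}                        # P(X_1)
P_trans = {('sun', 'sun'): 0.9, ('sun', 'rain'): 0.1,
           ('rain', 'sun'): 0.3, ('rain', 'rain'): 0.7}   # P(X_t | X_{t-1})
P_emit = {('sun', 'umbrella'): 0.2, ('sun', 'no_umbrella'): 0.8,
          ('rain', 'umbrella'): 0.9, ('rain', 'no_umbrella'): 0.1}  # P(E_t | X_t)
```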
Real HMM Examples
- Speech recognition HMMs: observations are acoustic signals (continuous valued); states are specific positions in specific words (so, tens of thousands of states).
- Machine translation HMMs: observations are words (tens of thousands); states are translation options.
- Robot tracking: observations are range readings (continuous); states are positions on a map (continuous).

Filtering / Monitoring
Filtering, or monitoring, is the task of tracking the distribution B_t(X) = P_t(X_t | e_1, ..., e_t) (the belief state) over time. We start with B_1(X) in an initial setting, usually uniform. As time passes, or as we get observations, we update B(X).
The Kalman filter was invented in the 60's and first implemented as a method of trajectory estimation for the Apollo program.

Example: Robot Localization
(Example from Michael Pfeiffer; figures show the belief map at t = 0 through t = 3, shaded from probability 0 to 1.)
Sensor model: the robot can read in which directions there is a wall, with never more than 1 mistaken reading. Motion model: the robot may fail to execute its action, with small probability.
t = 1: lighter grey cells were possible given the reading, but are less likely because they require 1 mistake.
t = 2, t = 3: the belief sharpens as readings accumulate.
Example: Robot Localization (continued)
(Figures for t = 4 and t = 5: the belief concentrates on the robot's true position.)

Query: P(X_4 | e_1, e_2, e_3, e_4) --- Variable Elimination in order X_1, X_2, X_3
P(X_4 | e_1,e_2,e_3,e_4) ∝ P(X_4, e_1,e_2,e_3,e_4)
= Σ_{x_1,x_2,x_3} P(x_1,x_2,x_3,X_4,e_1,e_2,e_3,e_4)
= Σ_{x_3} Σ_{x_2} Σ_{x_1} P(e_4|X_4) P(X_4|x_3) P(e_3|x_3) P(x_3|x_2) P(e_2|x_2) P(x_2|x_1) P(e_1|x_1) P(x_1)
= Σ_{x_3} Σ_{x_2} Σ_{x_1} P(e_4|X_4) P(X_4|x_3) P(e_3|x_3) P(x_3|x_2) P(e_2|x_2) P(x_2|x_1) P(x_1,e_1)
= Σ_{x_3} Σ_{x_2} P(e_4|X_4) P(X_4|x_3) P(e_3|x_3) P(x_3|x_2) P(e_2|x_2) Σ_{x_1} P(x_2|x_1) P(x_1,e_1)
= Σ_{x_3} Σ_{x_2} P(e_4|X_4) P(X_4|x_3) P(e_3|x_3) P(x_3|x_2) P(e_2|x_2) P(x_2,e_1)
= Σ_{x_3} P(e_4|X_4) P(X_4|x_3) P(e_3|x_3) Σ_{x_2} P(x_3|x_2) P(x_2,e_1,e_2)
= Σ_{x_3} P(e_4|X_4) P(X_4|x_3) P(e_3|x_3) P(x_3,e_1,e_2)
= Σ_{x_3} P(e_4|X_4) P(X_4|x_3) P(x_3,e_1,e_2,e_3)
= P(e_4|X_4) Σ_{x_3} P(X_4|x_3) P(x_3,e_1,e_2,e_3)
= P(e_4|X_4) P(X_4,e_1,e_2,e_3)
= P(X_4,e_1,e_2,e_3,e_4)

The Forward Algorithm
We are given evidence at each time and want to know B_t(X) = P(X_t | e_1, ..., e_t). We can derive the following update, which is the recurring computation above:
P(x_t, e_1:t) = P(e_t | x_t) Σ_{x_{t-1}} P(x_t | x_{t-1}) P(x_{t-1}, e_1:t-1)
We can normalize as we go if we want to have P(x_t | e_1:t) at each time step, or just once at the end. This is exactly variable elimination in order X_1, X_2, ...

Belief Updating
Belief updates can also easily be derived from basic probability: the forward algorithm breaks down into two steps, with normalization.
- Passage of time (time update). Given P(X_t | e_1:t) and P(X_{t+1} | X_t), query P(x_{t+1} | e_1:t) for all x_{t+1}:
  P(x_{t+1} | e_1:t) = Σ_{x_t} P(x_{t+1} | x_t) P(x_t | e_1:t)
- Observation (observation update). Given P(X_{t+1} | e_1:t) and P(e_{t+1} | X_{t+1}), query P(x_{t+1} | e_1:t+1) for all x_{t+1}:
  P(x_{t+1} | e_1:t+1) ∝ P(e_{t+1} | x_{t+1}) P(x_{t+1} | e_1:t)
  Normalizing in the observation update gives the conditional distribution.
Notation: write B'(X_{t+1}) = P(X_{t+1} | e_1:t) and B(X_{t+1}) = P(X_{t+1} | e_1:t+1). Then the time update is B'(X_{t+1}) = Σ_{x_t} P(X_{t+1} | x_t) B(x_t) and the observation update is B(X_{t+1}) ∝ P(e_{t+1} | X_{t+1}) B'(X_{t+1}).
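A minimal sketch of the forward algorithm with the two-step break-down above, written against the toy weather/umbrella HMM sketched earlier (tables repeated so the snippet is self-contained; the emission numbers remain illustrative assumptions).

```python
states = ['sun', 'rain']
P_init = {'sun': 0.5, 'rain': 0.5}
P_trans = {('sun', 'sun'): 0.9, ('sun', 'rain'): 0.1,
           ('rain', 'sun'): 0.3, ('rain', 'rain'): 0.7}
P_emit = {('sun', 'umbrella'): 0.2, ('sun', 'no_umbrella'): 0.8,
          ('rain', 'umbrella'): 0.9, ('rain', 'no_umbrella'): 0.1}

def normalize(B):
    z = sum(B.values())
    return {x: p / z for x, p in B.items()}

def time_update(B):
    """Passage of time: B'(x_{t+1}) = Σ_{x_t} P(x_{t+1} | x_t) B(x_t)."""
    return {x: sum(P_trans[(xp, x)] * B[xp] for xp in states) for x in states}

def observation_update(B, e):
    """Observation: B(x_{t+1}) ∝ P(e_{t+1} | x_{t+1}) B'(x_{t+1})."""
    return normalize({x: P_emit[(x, e)] * B[x] for x in states})

def forward(evidence):
    B = observation_update(dict(P_init), evidence[0])   # fold e_1 into B_1
    for e in evidence[1:]:
        B = observation_update(time_update(B), e)
    return B

print(forward(['umbrella', 'umbrella', 'no_umbrella']))
```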
Example: Passage of Time
As time passes, uncertainty accumulates. (Figures: the belief at T = 1, T = 2, T = 5.) Transition model: ghosts usually go clockwise.

Example: Observation
As we get observations, beliefs get reweighted and uncertainty decreases. (Figures: the belief before and after an observation.)

Example HMM
(Figure: an HMM unrolled over several time steps.)

Outline (recap): Markov Models · HMMs (representation; inference: forward algorithm, particle filtering, Viterbi) · Dynamic Bayes Nets

Particle Filtering
Filtering: an approximate solution. Sometimes X is too big to use exact inference; X may be too big to even store B(X), e.g., when X is continuous. Solution: approximate inference.
- Track samples of X, not all values; the samples are called particles
- Time per step is linear in the number of samples
- But: the number needed may be large
- In memory: a list of particles, not states
This is how robot localization works in practice. "Particle" is just a new name for "sample".

Representation: Particles
Our representation of P(X) is now a list of N particles (samples). Generally, N << |X|; storing a map from X to counts would defeat the point. P(x) is approximated by the number of particles with value x, so many x will have P(x) = 0! More particles, more accuracy. For now, all particles have weight 1, as in the counting sketch below. (Figure: a grid of particles and the induced cell probabilities, e.g. 0.1, 0.2, 0.5.)
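A minimal sketch of the particle representation: P(x) is recovered approximately by counting particles, and states with no particles get probability 0. The grid coordinates are illustrative, not the ones from the slide figure.

```python
from collections import Counter

particles = [(3, 3), (2, 3), (3, 3), (3, 2), (3, 3), (1, 2)]  # illustrative, weight 1 each

def belief(particles):
    """Approximate P(x) by the fraction of particles with value x; unseen x get 0."""
    counts = Counter(particles)
    N = len(particles)
    return {x: c / N for x, c in counts.items()}

print(belief(particles))   # e.g. {(3, 3): 0.5, (2, 3): 1/6, ...}
```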
Particle Filtering: Elapse Time
Each particle is moved by sampling its next position from the transition model: x' ~ P(X' | x). This is like prior sampling: sample frequencies reflect the transition probabilities. Here, most samples move clockwise, but some move in another direction or stay in place. This captures the passage of time. With enough samples, the result is close to the exact values before and after (consistent).

Particle Filtering: Observe
Slightly trickier: don't sample the observation, fix it. Similar to likelihood weighting, downweight samples based on the evidence: w(x) = P(e | x). As before, the probabilities don't sum to one, since most have been downweighted (in fact they sum to an approximation of P(e)). (Figure: particles such as (3,1), (1,3), (2,2) annotated with weights such as w = 0.9, 0.4, 0.2, 0.1.)

Particle Filtering: Resample
Rather than tracking weighted samples, we resample: N times, we choose from our weighted sample distribution (i.e., draw with replacement). This is equivalent to renormalizing the distribution. Now the update is complete for this time step; continue with the next one.

Recap: Particle Filtering
Particles: track samples of states rather than an explicit distribution. Each step: Elapse → Weight → Resample. (Figure: the particle lists before and after each stage.) A minimal sketch of one full update follows below.
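Putting the three steps together, here is a minimal sketch of one particle filtering update. The model is left abstract: `sample_transition` and `emission_prob` are assumed to be provided by the caller, and `random.choices` does the draw-with-replacement resampling.

```python
import random

def particle_filter_step(particles, evidence, sample_transition, emission_prob):
    """One elapse-weight-resample cycle of particle filtering.

    sample_transition(x) -> x'  samples from P(X' | x)   (assumed provided)
    emission_prob(e, x)  -> P(e | x)                     (assumed provided)
    """
    # Elapse time: move each particle by sampling its successor state.
    particles = [sample_transition(x) for x in particles]
    # Observe: fix the evidence, weight each particle by its likelihood.
    weights = [emission_prob(evidence, x) for x in particles]
    # Resample: N draws with replacement from the weighted particle distribution.
    return random.choices(particles, weights=weights, k=len(particles))
```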
Outline (recap): Markov Models · HMMs (representation; inference: forward algorithm, particle filtering, Viterbi) · Dynamic Bayes Nets

Dynamic Bayes Nets (DBNs)
We want to track multiple variables over time, using multiple sources of evidence. Idea: repeat a fixed Bayes net structure at each time; variables from time t can condition on those from t-1. (Figure: ghost variables G_t^a, G_t^b with evidence E_t^a, E_t^b at t = 1, 2, 3.) Discrete-valued dynamic Bayes nets are also HMMs.

Exact Inference in DBNs
Variable elimination applies to dynamic Bayes nets. Procedure: unroll the network for T time steps, then eliminate variables until P(X_T | e_1:T) is computed. Online belief updates: eliminate all variables from the previous time step; store factors for the current time only.

DBN Particle Filters
A particle is a complete sample for a time step.
- Initialize: generate prior samples for the t = 1 Bayes net. Example particle: G_1^a = (3,3), G_1^b = (5,3).
- Elapse time: sample a successor for each particle. Example successor: G_2^a = (2,3), G_2^b = (6,3).
- Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample. Likelihood: P(E_1^a | G_1^a) * P(E_1^b | G_1^b).
- Resample: select prior samples (tuples of values) in proportion to their likelihood.

Trick I to Improve Particle Filtering Performance: Low Variance Resampling
Advantages:
- More systematic coverage of the space of samples
- If all samples have the same importance weight, no samples are lost
- Lower computational complexity
(A sketch appears after Trick II below.)

Trick II to Improve Particle Filtering Performance: Regularization
If there is no or little noise in the transition model, all particles will start to coincide. → Regularization: introduce additional (artificial) noise into the transition model.
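Trick I can be sketched concretely as systematic resampling, one common low-variance scheme (a sketch under the assumption of a particle list with positive total weight, not necessarily the exact variant any particular implementation uses): instead of N independent draws, take one random offset and then N evenly spaced picks through the cumulative weights.

```python
import random

def low_variance_resample(particles, weights):
    """Systematic (low-variance) resampling: one random number, N evenly spaced picks."""
    N = len(particles)
    total = sum(weights)
    step = total / N
    r = random.uniform(0, step)        # single random offset in [0, total/N)
    out, c, i = [], weights[0], 0
    for m in range(N):
        u = r + m * step               # evenly spaced targets through cumulative weights
        while u > c:                   # advance to the particle whose weight interval covers u
            i += 1
            c += weights[i]
        out.append(particles[i])
    return out
```

Note that if all weights are equal, every particle is selected exactly once, which is precisely the "no samples are lost" advantage listed above.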
Robot Localization
In robot localization: we know the map, but not the robot's position. Observations may be vectors of range finder readings. The state space and readings are typically continuous (it works basically like a very fine grid), so we cannot store B(X). Particle filtering is a main technique.

SLAM
SLAM = Simultaneous Localization And Mapping. We do not know the map or our location; the state consists of the position AND the map! Main techniques: Kalman filtering (Gaussian HMMs) and particle methods. Demos: global-floor.gif

Particle Filter Example
(Figure: 3 particles, each carrying its own map hypothesis: map of particle 1, map of particle 2, map of particle 3.)
SLAM DEMOS: Intel-lab-raw-odo.wmv, Intel-lab-scan-matching.wmv, visionSlam_heliOffice.wmv

P4: Ghostbusters 2.0 (beta)
Plot: Pacman's grandfather, Grandpac, learned to hunt ghosts for sport. He was blinded by his power, but could hear the ghosts banging and clanging.
Transition model: all ghosts move randomly, but are sometimes biased.
Emission model: Pacman knows a noisy distance to each ghost. (Figure: the noisy-distance probability distribution, peaked around the true distance of 8.)

Outline (recap): Markov Models · HMMs (representation; inference: forward algorithm, particle filtering, Viterbi) · Dynamic Bayes Nets

Best Explanation Queries
Query: the most likely sequence, argmax_{x_1:t} P(x_1:t | e_1:t). (Figure: the HMM X_1, ..., X_5 with emissions E_1, ..., E_5.)

Best Explanation Query Solution Method 1: Search
With slight abuse of notation, assume P(x_1 | x_0) = P(x_1).
- States: {(), +x_1, -x_1, +x_2, -x_2, ..., +x_t, -x_t}
- Start state: ()
- Actions: in state x_k, choose any assignment for state x_{k+1}
- Cost: -log[ P(x_{k+1} | x_k) P(e_{k+1} | x_{k+1}) ], so that minimizing total cost maximizes the probability of the sequence
- Goal test: goal(x_k) = true iff k == t
→ We can run uniform cost graph search to find the solution.
→ Uniform cost graph search will take O(t d²). Think about this!

Best Explanation Query Solution Method 2: Viterbi Algorithm (= max-product version of the forward algorithm)
m_t[x_t] = max_{x_1:t-1} P(x_1:t-1, x_t, e_1:t) = P(e_t | x_t) max_{x_{t-1}} P(x_t | x_{t-1}) m_{t-1}[x_{t-1}]
Viterbi computational complexity: O(t d²). Compare to the forward algorithm: the same O(t d²), with a sum in place of the max. A minimal sketch follows below.
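A minimal sketch of Viterbi over the toy weather/umbrella HMM from earlier (tables repeated; emission numbers are illustrative assumptions): the forward loop with max in place of sum, plus backpointers to recover the argmax sequence.

```python
states = ['sun', 'rain']
P_init = {'sun': 0.5, 'rain': 0.5}
P_trans = {('sun', 'sun'): 0.9, ('sun', 'rain'): 0.1,
           ('rain', 'sun'): 0.3, ('rain', 'rain'): 0.7}
P_emit = {('sun', 'umbrella'): 0.2, ('sun', 'no_umbrella'): 0.8,
          ('rain', 'umbrella'): 0.9, ('rain', 'no_umbrella'): 0.1}

def viterbi(evidence):
    """Most likely state sequence: the forward recursion with max in place of sum."""
    m = {x: P_init[x] * P_emit[(x, evidence[0])] for x in states}   # m_1[x_1]
    backptrs = []
    for e in evidence[1:]:
        back, new_m = {}, {}
        for x in states:
            # Best predecessor for x: O(d) work per state, so O(d^2) per time step.
            prev = max(states, key=lambda xp: m[xp] * P_trans[(xp, x)])
            back[x] = prev
            new_m[x] = m[prev] * P_trans[(prev, x)] * P_emit[(x, e)]
        backptrs.append(back)
        m = new_m
    # Trace the argmax back through the stored pointers.
    x = max(states, key=lambda s: m[s])
    seq = [x]
    for back in reversed(backptrs):
        x = back[x]
        seq.append(x)
    return list(reversed(seq))

print(viterbi(['umbrella', 'umbrella', 'no_umbrella']))
```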
Further Readings
We are done with Part II: Probabilistic Reasoning. To learn more (beyond the scope of 188):
- Koller and Friedman, Probabilistic Graphical Models (CS281A)
- Thrun, Burgard and Fox, Probabilistic Robotics (CS287)