CS 188: Artificial Intelligence Hidden Markov Models Instructor: Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein, Pieter Abbeel, and Anca. http://ai.berkeley.edu.]
Markov Models o Value of X at a given time is called the state X1 X2 X3 X4 o Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, initial state probabilities) o Stationarity assumption: transition probabilities the same at all times o Same as MDP transition model, but no choice of action
Mini-Forward Algorithm o Question: What is P(Xt) on some day t? X1 X2 X3 X4 o Forward simulation
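The forward simulation above can be sketched in a few lines: repeatedly push the current distribution through the transition model. The two-state chain and its transition numbers below are illustrative, not from the slides.

```python
# Mini-forward algorithm: P(X_{t+1} = x') = sum_x P(x' | x) P(X_t = x)

def mini_forward(p0, T, steps):
    """p0: dict state -> prob; T: dict (x, x') -> P(x' | x)."""
    p = dict(p0)
    for _ in range(steps):
        p = {x2: sum(T[(x1, x2)] * p[x1] for x1 in p) for x2 in p}
    return p

# Illustrative transition model (not the slide's numbers)
T = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
     ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}
p = mini_forward({"sun": 1.0, "rain": 0.0}, T, 100)
# For these numbers the chain converges to the stationary distribution
# (0.75 sun / 0.25 rain) regardless of the initial distribution.
```

Running the same simulation from `{"sun": 0.0, "rain": 1.0}` converges to the same distribution, which is the stationarity point the next slides rely on.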
Application of Stationary Distribution: Web Link Analysis o PageRank over a web graph o Each web page is a possible value of a state o Initial distribution: uniform over pages o Transitions: o With prob. c, uniform jump to a random page (dotted lines, not all shown) o With prob. 1-c, follow a random outlink (solid lines) o Stationary distribution o Will spend more time on highly reachable pages o E.g. many ways to get to the Acrobat Reader download page o Somewhat robust to link spam o Google 1.0 returned the set of pages containing all your keywords; ranking pages by this stationary distribution (PageRank) was a key improvement
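The random-jump/random-outlink walk above can be sketched with power iteration. The three-page link graph and the damping value are illustrative assumptions, not real data.

```python
# PageRank by power iteration: repeatedly apply the transition model
# (random jump with prob. c, random outlink with prob. 1-c) until the
# distribution stops changing -- that fixed point is the stationary
# distribution.  Graph and c are illustrative.
c = 0.15
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # uniform initial distribution

for _ in range(100):
    new = {p: c / len(pages) for p in pages}       # uniform random jump
    for p, outs in links.items():
        for q in outs:                             # follow a random outlink
            new[q] += (1 - c) * rank[p] / len(outs)
    rank = new
# rank now approximates the stationary distribution over pages;
# highly reachable pages (here C, linked from both A and B) score highest.
```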
Application of Stationary Distributions: Gibbs Sampling* o Each joint instantiation over all hidden and query variables is a state: {X1, …, Xn} = H ∪ Q o Transitions: o With probability 1/n, resample variable Xj according to P(Xj | x1, …, xj-1, xj+1, …, xn, e1, …, em) o Stationary distribution: o Conditional distribution P(X1, X2, …, Xn | e1, …, em) o Means that when running Gibbs sampling long enough, we get a sample from the desired distribution
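A minimal sketch of the idea, on an assumed toy joint over two binary variables (in a real Bayes net the conditionals would come from the network; here they are read off a joint table, and variables are resampled round-robin rather than picked with probability 1/n, which has the same stationary distribution):

```python
import random

# Gibbs sampling sketch: the empirical distribution of visited states
# approaches the target joint (its stationary distribution).
random.seed(0)
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # toy joint

def resample(j, state):
    # Resample Xj from P(Xj | other variable) by reading the joint table
    other = state[1 - j]
    probs = [joint[(v, other)] if j == 0 else joint[(other, v)]
             for v in (0, 1)]
    z = sum(probs)
    state[j] = 0 if random.random() < probs[0] / z else 1

state = [0, 0]
counts = {k: 0 for k in joint}
n_iters = 200000
for t in range(n_iters):
    resample(t % 2, state)      # round-robin variable choice
    counts[tuple(state)] += 1
# counts[k] / n_iters now approximates joint[k] for every state k
```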
Hidden Markov Models
Hidden Markov Models o Markov chains not so useful for most agents o Need observations to update your beliefs o Hidden Markov models (HMMs) o Underlying Markov chain over states X o You observe outputs (effects) at each time step X1 X2 X3 X4 X5 E1 E2 E3 E4 E5
Example: Weather HMM Raint-1 Raint Raint+1 Umbrellat-1 Umbrellat Umbrellat+1 o An HMM is defined by: o Initial distribution: P(X1) o Transitions: P(Xt | Xt-1) o Emissions: P(Et | Xt) Transitions P(Rt | Rt-1): +r→+r 0.7, +r→-r 0.3, -r→+r 0.3, -r→-r 0.7 Emissions P(Ut | Rt): +r→+u 0.9, +r→-u 0.1, -r→+u 0.2, -r→-u 0.8
Inference: Find State Given Evidence o We are given evidence at each time and want to know Bt(X) = P(Xt | e1, …, et) o Idea: start with P(X1) and derive Bt in terms of Bt-1 o equivalently, derive Bt+1 in terms of Bt
Two Steps: Passage of Time + Observation X1 X2 X3 X4 X5 E1 E2 E3 E4 E5
Passage of Time o Assume we have current belief P(X | evidence to date): B(Xt) = P(Xt | e1, …, et) X1 X2 o Then, after one time step passes: P(Xt+1 | e1, …, et) = Σxt P(Xt+1 | xt) P(xt | e1, …, et) o Basic idea: beliefs get pushed through the transitions o With the B notation, we have to be careful about what time step t the belief is about, and what evidence it includes
Example: Passage of Time o As time passes, uncertainty accumulates T=1 T=2 (Transition model: ghosts usually go clockwise) T=5
Observation o Assume we have current belief P(X | previous evidence): B'(Xt+1) = P(Xt+1 | e1, …, et) X1 E1 o Then, after evidence comes in: P(Xt+1 | e1, …, et+1) ∝ P(et+1 | Xt+1) P(Xt+1 | e1, …, et) o Basic idea: beliefs are reweighted by the likelihood of the evidence o Unlike the passage of time, we have to renormalize
Example: Observation o As we get observations, beliefs get reweighted, uncertainty decreases Before observation After observation
Example: Weather HMM B(+r) = 0.5 B(-r) = 0.5 Rain0 B'(+r) = 0.5 B'(-r) = 0.5 B'(+r) = 0.627 B'(-r) = 0.373 B(+r) = 0.818 B(-r) = 0.182 B(+r) = 0.883 B(-r) = 0.117 Rain1 Umbrella1 Rain2 Umbrella2 Transitions P(Rt+1 | Rt): +r→+r 0.7, +r→-r 0.3, -r→+r 0.3, -r→-r 0.7 Emissions P(Ut | Rt): +r→+u 0.9, +r→-u 0.1, -r→+u 0.2, -r→-u 0.8
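The numbers on this slide can be reproduced by alternating the two updates, assuming the umbrella is observed on both days:

```python
# Forward filtering on the weather HMM from the slide, with +u
# (umbrella) observed at t=1 and t=2.
prior = {"+r": 0.5, "-r": 0.5}
trans = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3,
         ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
emit = {("+r", "+u"): 0.9, ("-r", "+u"): 0.2}

B = dict(prior)                      # B(+r) = 0.5
for obs in ["+u", "+u"]:
    # Passage of time: B'(x') = sum_x P(x' | x) B(x)
    Bp = {x2: sum(trans[(x1, x2)] * B[x1] for x1 in B) for x2 in B}
    # Observation: B(x) proportional to P(e | x) B'(x), then renormalize
    un = {x: emit[(x, obs)] * Bp[x] for x in Bp}
    z = sum(un.values())
    B = {x: p / z for x, p in un.items()}
    print(round(B["+r"], 3))         # prints 0.818, then 0.883
```

These match the slide: B'(+r) stays 0.5 after the first time update, the first umbrella pushes B(+r) to 0.818, the second time update gives B'(+r) = 0.627, and the second umbrella gives B(+r) = 0.883.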
Online Belief Updates o Every time step, we start with the current P(X | evidence) o We update for time: P(xt | e1, …, et-1) = Σxt-1 P(xt | xt-1) P(xt-1 | e1, …, et-1) o We update for evidence: P(xt | e1, …, et) ∝ P(et | xt) P(xt | e1, …, et-1) o The forward algorithm does both at once (and doesn't normalize)
The Forward Algorithm o We are given evidence at each time and want to know P(Xt | e1, …, et) o We can derive the following update: P(xt | e1, …, et) ∝ P(et | xt) Σxt-1 P(xt | xt-1) P(xt-1 | e1, …, et-1) o We can normalize as we go if we want to have P(x | e) at each time step, or just once at the end [Demo: Ghostbusters Exact Filtering (L15D2)]
Particle Filtering
Particle Filtering o Filtering: approximate solution o Sometimes X is too big to use exact inference o X may be too big to even store B(X) o E.g. X is continuous o Solution: approximate inference o Track samples of X, not all values o Samples are called particles o Time per step is linear in the number of samples o But: number needed may be large o In memory: list of particles, not states o This is how robot localization works in practice o Particle is just a new name for sample
Representation: Particles o Our representation of P(X) is now a list of N particles (samples) o Generally, N << |X| o Storing a map from X to counts would defeat the point o P(x) approximated by the number of particles with value x o So, many x may have P(x) = 0! o More particles, more accuracy o For now, all particles have a weight of 1 Particles: (2,3) (1,2) (2,3)
Particle Filtering: Elapse Time o Each particle is moved by sampling its next position from the transition model x' ~ P(X' | x) o This is like prior sampling: the samples' frequencies reflect the transition probabilities o Here, most samples move clockwise, but some move in another direction or stay in place o This captures the passage of time o If enough samples, close to exact values before and after (consistent) Particles: (2,3) (1,2) (2,3) Particles: (2,3) (3,1) (1,3) (2,3) (2,2)
Particle Filtering: Observe o Slightly trickier: o Don't sample the observation, fix it o Similar to likelihood weighting: downweight samples based on the evidence, w(x) = P(e | x) o As before, the probabilities don't sum to one, since all have been downweighted (in fact they now sum to N times an approximation of P(e)) Particles: (2,3) (3,1) (1,3) (2,3) (2,2) Particles: w=.9 (2,3) w=.2 w=.9 (3,1) w=.4 w=.4 w=.9 (1,3) w=.1 (2,3) w=.2 w=.9 (2,2) w=.4
Particle Filtering: Resample o Rather than tracking weighted samples, we resample o N times, we choose from our weighted sample distribution (i.e. draw with replacement) o This is equivalent to renormalizing the distribution o Now the update is complete for this time step, continue with the next one Particles: w=.9 (2,3) w=.2 w=.9 (3,1) w=.4 w=.4 w=.9 (1,3) w=.1 (2,3) w=.2 w=.9 (2,2) w=.4 (New) Particles: (2,2) (2,3) (1,3) (2,3)
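The three steps above (elapse, weight, resample) can be sketched on an assumed toy model: a 1-D grid world with an illustrative drift-right transition model and a noisy position sensor.

```python
import random

# One particle-filter time step on a 10-cell 1-D grid.
# Transition and sensor models are illustrative assumptions.
random.seed(0)
N = 500
particles = [random.randint(0, 9) for _ in range(N)]  # roughly uniform belief

def transition_sample(x):
    # Mostly drift right, wrapping around the grid
    return (x + (1 if random.random() < 0.8 else 0)) % 10

def emission_prob(e, x):
    # Sensor reads the true cell with prob 0.8, any other cell otherwise
    return 0.8 if e == x else 0.2 / 9

evidence = 4
# Elapse time: sample each particle's next state from the transition model
particles = [transition_sample(x) for x in particles]
# Observe: weight each particle by the likelihood of the evidence
weights = [emission_prob(evidence, x) for x in particles]
# Resample: draw N particles with replacement, proportional to weight
particles = random.choices(particles, weights=weights, k=N)
# After resampling, most particles sit at the evidence cell: the weighted,
# then renormalized, belief has been turned back into unweighted samples.
```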
Recap: Particle Filtering o Particles: track samples of states rather than an explicit distribution Elapse Particles: (2,3) (1,2) (2,3) Weight Particles: (2,3) (3,1) (1,3) (2,3) (2,2) Resample Particles: w=.9 (2,3) w=.2 w=.9 (3,1) w=.4 w=.4 w=.9 (1,3) w=.1 (2,3) w=.2 w=.9 (2,2) w=.4 (New) Particles: (2,2) (2,3) (1,3) (2,3) [Demos: ghostbusters particle filtering (L15D3,4,5)]
Video of Demo Moderate Number of Particles
Video of Demo One Particle
Video of Demo Huge Number of Particles
Robot Localization o In robot localization: o We know the map, but not the robot's position o Observations may be vectors of range finder readings o State space and readings are typically continuous (works basically like a very fine grid), so we cannot store B(X) o Particle filtering is a main technique
Particle Filter Localization (Sonar) [Dieter Fox, et al.] [Video: global-sonar-uw-annotated.avi]
Particle Filter Localization (Laser) [Dieter Fox, et al.] [Video: global-floor.gif]
Robot Mapping o SLAM: Simultaneous Localization And Mapping o We do not know the map or our location o State consists of position AND map! o Main techniques: Kalman filtering (Gaussian HMMs) and particle methods DP-SLAM, Ron Parr [Demo: PARTICLES-SLAM-mapping1-new.avi]
Particle Filter SLAM Video 1 [Sebastian Thrun, et al.] [Demo: PARTICLES-SLAM-mapping1-new.avi]
Particle Filter SLAM Video 2 [Dirk Haehnel, et al.] [Demo: PARTICLES-SLAM-fastslam.avi]
Most Likely Explanation*
HMMs: MLE Queries* o HMMs are defined by: o States X o Observations E o Initial distribution: P(X1) o Transitions: P(Xt | Xt-1) o Emissions: P(Et | Xt) o New query: most likely explanation: argmax over x1:t of P(x1:t | e1:t) o New method: the Viterbi algorithm X1 X2 X3 X4 X5 E1 E2 E3 E4 E5
State Trellis* o State trellis: graph of states and transitions over time sun sun sun sun rain rain rain rain o Each arc represents some transition o Each arc has weight P(xt | xt-1) P(et | xt) o Each path is a sequence of states o The product of weights on a path is that sequence's probability along with the evidence o Forward algorithm computes sums of paths, Viterbi computes best paths
Forward / Viterbi Algorithms* sun sun sun sun rain rain rain rain Forward Algorithm (Sum) Viterbi Algorithm (Max)
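A sketch of Viterbi on the trellis above, reusing the weather HMM's numbers; the observation sequence is an illustrative choice. Where the forward algorithm sums path weights into each node, Viterbi takes the max and keeps a back-pointer.

```python
# Viterbi: most likely state sequence given the evidence.
states = ["+r", "-r"]
prior = {"+r": 0.5, "-r": 0.5}
trans = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3,
         ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
emit = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1,
        ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}

def viterbi(obs):
    # m[x] = weight of the best path ending in state x so far
    m = {x: prior[x] * emit[(x, obs[0])] for x in states}
    back = []                      # back-pointers, one dict per time step
    for e in obs[1:]:
        prev, ptr, m2 = m, {}, {}
        for x2 in states:
            # Max over predecessors (forward algorithm would sum here)
            x1 = max(states, key=lambda x: prev[x] * trans[(x, x2)])
            ptr[x2] = x1
            m2[x2] = prev[x1] * trans[(x1, x2)] * emit[(x2, e)]
        back.append(ptr)
        m = m2
    # Recover the best path by following back-pointers from the best end state
    x = max(states, key=lambda s: m[s])
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    return list(reversed(path))

print(viterbi(["+u", "+u", "-u"]))   # -> ['+r', '+r', '-r']
```

Two umbrella days followed by a clear day are best explained by rain on days 1-2 and no rain on day 3, even though the chain makes rain "sticky".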