Hidden Markov Models

Hidden Markov Models

Probabilistic reasoning over time
So far, we've mostly dealt with episodic environments. Exceptions: games with multiple moves, planning. In particular, the Bayesian networks we've seen so far describe static situations: each random variable gets a single fixed value in a single problem instance. Now we consider the problem of describing probabilistic environments that evolve over time. Examples: robot localization, tracking, speech, ...

Hidden Markov Models
At each time slice, the state of the world is described by an unobservable state variable X_t and an observable evidence variable E_t.
Transition model: distribution over the current state given the whole past history: P(X_t | X_0, ..., X_{t-1}) = P(X_t | X_{0:t-1})
Observation model: P(E_t | X_{0:t}, E_{1:t-1})
[Figure: the HMM as a Bayesian network with hidden states X_0, X_1, X_2, ..., X_{t-1}, X_t and evidence variables E_1, E_2, ..., E_{t-1}, E_t.]

Hidden Markov Models
Markov assumption: the current state is conditionally independent of all the other states given the state in the previous time step (first order).
What is the transition model? P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
Markov assumption for observations: the evidence at time t depends only on the state at time t.
What is the observation model? P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t)
[Figure: HMM graphical model.]

Example
[Figure: an example HMM, showing the hidden state and the observed evidence.]

Example
[Figure: the transition model and the observation model for the state/evidence example.]

An alternative visualization
[Figure: state diagram with states R_t = T and R_t = F; each state has self-transition probability 0.7 and probability 0.3 of switching, and emission probabilities (U_t = T: 0.9, U_t = F: 0.1 for the rain state; U_t = T: 0.2, U_t = F: 0.8 for the no-rain state) attached to each state.]
Transition probabilities:
              R_t = T   R_t = F
R_{t-1} = T     0.7       0.3
R_{t-1} = F     0.3       0.7
Observation (emission) probabilities:
              U_t = T   U_t = F
R_t = T         0.9       0.1
R_t = F         0.2       0.8
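
To make these tables concrete, here is a minimal sketch (not part of the original slides) of how they could be encoded in Python; the nested-dictionary layout and variable names are my own choices, and only the probabilities come from the tables above.

```python
transition = {            # P(R_t | R_{t-1})
    True:  {True: 0.7, False: 0.3},   # R_{t-1} = T
    False: {True: 0.3, False: 0.7},   # R_{t-1} = F
}
emission = {              # P(U_t | R_t)
    True:  {True: 0.9, False: 0.1},   # R_t = T
    False: {True: 0.2, False: 0.8},   # R_t = F
}

# e.g., probability of seeing an umbrella today given that it rained yesterday:
p = sum(transition[True][r] * emission[r][True] for r in (True, False))
print(p)  # 0.7*0.9 + 0.3*0.2 = 0.69
```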

Another example
States: X = {home, office, cafe}. Observations: E = {sms, facebook, email}. Slide credit: Andy White

The Joint Distribution
Transition model: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
Observation model: P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t)
How do we compute the full joint P(X_{0:t}, E_{1:t})?
P(X_{0:t}, E_{1:t}) = P(X_0) ∏_{i=1}^{t} P(X_i | X_{i-1}) P(E_i | X_i)
[Figure: HMM graphical model.]
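
As an illustration of the factored joint, here is a short sketch (assumed, not from the slides) that multiplies out P(x_{0:t}, e_{1:t}) for a concrete state/observation sequence. The rain/umbrella probabilities come from the earlier tables; the 0.5/0.5 prior over X_0 is an assumption.

```python
prior = {True: 0.5, False: 0.5}                       # P(X_0), assumed
transition = {True: {True: 0.7, False: 0.3},
              False: {True: 0.3, False: 0.7}}         # P(X_t | X_{t-1})
emission = {True: {True: 0.9, False: 0.1},
            False: {True: 0.2, False: 0.8}}           # P(E_t | X_t)

def joint_probability(states, observations):
    """states = [x_0, ..., x_t], observations = [e_1, ..., e_t]."""
    p = prior[states[0]]
    for i, e in enumerate(observations, start=1):
        p *= transition[states[i - 1]][states[i]] * emission[states[i]][e]
    return p

# P(x_0 = rain, x_1 = rain, x_2 = no rain, e_1 = umbrella, e_2 = no umbrella)
print(joint_probability([True, True, False], [True, False]))  # 0.5 * 0.63 * 0.24
```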

HMM inference tasks
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}? The forward algorithm.
[Figure: HMM graphical model with X_t as the query variable and E_1, ..., E_t as the evidence variables.]

HMM inference tasks
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}?
Smoothing: what is the distribution of some state X_k given the entire observation sequence e_{1:t}? The forward-backward algorithm.
[Figure: HMM graphical model.]

HMM inference tasks
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}?
Smoothing: what is the distribution of some state X_k given the entire observation sequence e_{1:t}?
Evaluation: compute the probability of a given observation sequence e_{1:t}.
[Figure: HMM graphical model.]

HMM inference tasks
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}?
Smoothing: what is the distribution of some state X_k given the entire observation sequence e_{1:t}?
Evaluation: compute the probability of a given observation sequence e_{1:t}.
Decoding: what is the most likely state sequence X_{0:t} given the observation sequence e_{1:t}? The Viterbi algorithm.
[Figure: HMM graphical model.]

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: HMM graphical model with X_t as the query variable and E_1, ..., E_t as the evidence variables.]

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: at time t-1 (e_{t-1} = Facebook), P(X_{t-1} | e_{1:t-1}) is 0.6 for Home, 0.3 for Office, 0.1 for Cafe; the transition probabilities P(X_t = Office | x_{t-1}) are 0.6 from Home, 0.2 from Office, 0.8 from Cafe.]
What is P(X_t = Office | e_{1:t-1})? 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: at time t-1 (e_{t-1} = Facebook), P(X_{t-1} | e_{1:t-1}) is 0.6 for Home, 0.3 for Office, 0.1 for Cafe; the transition probabilities P(X_t = Office | x_{t-1}) are 0.6 from Home, 0.2 from Office, 0.8 from Cafe.]
What is P(X_t = Office | e_{1:t-1})? 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t, x_{t-1} | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}, e_{1:t-1}) P(x_{t-1} | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: at time t-1 (e_{t-1} = Facebook), P(X_{t-1} | e_{1:t-1}) is 0.6 for Home, 0.3 for Office, 0.1 for Cafe; the transition probabilities P(X_t = Office | x_{t-1}) are 0.6 from Home, 0.2 from Office, 0.8 from Cafe.]
What is P(X_t = Office | e_{1:t-1})? 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: at time t-1 (e_{t-1} = Facebook), P(X_{t-1} | e_{1:t-1}) is 0.6 for Home, 0.3 for Office, 0.1 for Cafe; the transition probabilities P(X_t = Office | x_{t-1}) are 0.6 from Home, 0.2 from Office, 0.8 from Cafe. At time t, e_t = Email.]
What is P(X_t = Office | e_{1:t-1})? 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})
What is P(X_t = Office | e_{1:t})? Here P(e_t | X_t = Office) = 0.8.

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: at time t-1 (e_{t-1} = Facebook), P(X_{t-1} | e_{1:t-1}) is 0.6 for Home, 0.3 for Office, 0.1 for Cafe; at time t, e_t = Email and P(e_t | X_t = Office) = 0.8.]
What is P(X_t = Office | e_{1:t-1})? 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})
What is P(X_t = Office | e_{1:t})?
P(X_t | e_{1:t}) = P(X_t | e_t; e_{1:t-1}) = P(e_t | X_t; e_{1:t-1}) P(X_t | e_{1:t-1}) / P(e_t | e_{1:t-1}) ∝ P(e_t | X_t) P(X_t | e_{1:t-1})

Filtering
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
[Figure: at time t-1 (e_{t-1} = Facebook), P(X_{t-1} | e_{1:t-1}) is 0.6 for Home, 0.3 for Office, 0.1 for Cafe; at time t, e_t = Email and P(e_t | X_t = Office) = 0.8.]
What is P(X_t = Office | e_{1:t-1})? 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})
What is P(X_t = Office | e_{1:t})?
P(X_t | e_{1:t}) ∝ P(e_t | X_t) P(X_t | e_{1:t-1}) = 0.5 * 0.8 = 0.4
Note: we must also compute this value for Home and Cafe, and renormalize so the three values sum to 1.
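
A few lines of Python (an illustrative check, not from the slides) reproduce the arithmetic of this worked example; only the numbers shown above are used.

```python
# Prediction step for X_t = Office, using the beliefs and transition values from the figure.
prior = {"Home": 0.6, "Office": 0.3, "Cafe": 0.1}                # P(X_{t-1} | e_{1:t-1})
p_office_given_prev = {"Home": 0.6, "Office": 0.2, "Cafe": 0.8}  # P(X_t = Office | x_{t-1})

predicted_office = sum(p_office_given_prev[s] * prior[s] for s in prior)
print(predicted_office)        # 0.6*0.6 + 0.2*0.3 + 0.8*0.1 = 0.5

# Correction step: weight by P(e_t = Email | X_t = Office) = 0.8 (still unnormalized).
print(predicted_office * 0.8)  # 0.4; Home and Cafe are handled the same way, then all three are renormalized
```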

Filtering: The Forward Algorithm
Task: compute the probability distribution over the current state given all the evidence so far: P(X_t | e_{1:t}).
Recursive formulation: suppose we know P(X_{t-1} | e_{1:t-1}).
Base case: prior P(X_0).
Prediction: propagate the belief from X_{t-1} to X_t: P(X_t | e_{1:t-1}) = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})
Correction: weight by the evidence e_t: P(X_t | e_{1:t}) = P(X_t | e_t; e_{1:t-1}) ∝ P(e_t | X_t) P(X_t | e_{1:t-1})
Renormalize so that all P(X_t = x | e_{1:t}) sum to 1.
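
The recursion translates directly into code. The following is a minimal sketch (not the lecture's own implementation): the nested-dictionary model format is my own, the transitions into Office (0.6, 0.2, 0.8) and P(Email | Office) = 0.8 match the worked example, and the remaining probabilities and the uniform prior are made-up placeholders.

```python
def forward_step(belief, evidence, transition, emission):
    """One step of the forward algorithm: belief = P(X_{t-1} | e_{1:t-1})."""
    states = list(belief)
    # Prediction: P(X_t | e_{1:t-1}) = sum over x_{t-1} of P(X_t | x_{t-1}) P(x_{t-1} | e_{1:t-1})
    predicted = {x: sum(transition[xp][x] * belief[xp] for xp in states) for x in states}
    # Correction: weight by P(e_t | X_t), then renormalize.
    unnormalized = {x: emission[x][evidence] * predicted[x] for x in states}
    z = sum(unnormalized.values())
    return {x: unnormalized[x] / z for x in states}

def filter_sequence(prior, observations, transition, emission):
    belief = dict(prior)                  # base case: P(X_0)
    for e in observations:
        belief = forward_step(belief, e, transition, emission)
    return belief

# Hypothetical Home/Office/Cafe model (only the starred values appear in the slides).
transition = {"Home":   {"Home": 0.3, "Office": 0.6, "Cafe": 0.1},   # 0.6 *
              "Office": {"Home": 0.4, "Office": 0.2, "Cafe": 0.4},   # 0.2 *
              "Cafe":   {"Home": 0.1, "Office": 0.8, "Cafe": 0.1}}   # 0.8 *
emission = {"Home":   {"sms": 0.4, "facebook": 0.4, "email": 0.2},
            "Office": {"sms": 0.1, "facebook": 0.1, "email": 0.8},   # 0.8 *
            "Cafe":   {"sms": 0.5, "facebook": 0.3, "email": 0.2}}
prior = {"Home": 1/3, "Office": 1/3, "Cafe": 1/3}
print(filter_sequence(prior, ["facebook", "email"], transition, emission))
```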

Filtering: The Forward Algorithm
[Figure: trellis over the states Home, Office, Cafe from time 0 (the prior) through time t-1 (evidence e_{t-1}) to time t (evidence e_t), showing how the belief is propagated forward.]

Evaluation
Compute the probability of the current sequence: P(e_{1:t}).
[Figure: HMM graphical model.]

Evaluation
Compute the probability of the current sequence: P(e_{1:t}).
Recursive formulation: suppose we know P(e_{1:t-1}).
P(e_{1:t}) = P(e_{1:t-1}, e_t) = P(e_{1:t-1}) P(e_t | e_{1:t-1}) = P(e_{1:t-1}) Σ_{x_t} P(e_t, x_t | e_{1:t-1}) = P(e_{1:t-1}) Σ_{x_t} P(e_t | x_t) P(x_t | e_{1:t-1})

Evaluation
Compute the probability of the current sequence: P(e_{1:t}).
Recursive formulation: suppose we know P(e_{1:t-1}).
P(e_{1:t}) = P(e_{1:t-1}) Σ_{x_t} P(e_t | x_t) P(x_t | e_{1:t-1})
The first factor is the recursion; the sum inside is the prediction step from filtering.
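
Here is a short sketch (assumed, using the same nested-dictionary model format as the filtering sketch above) of this recursion: it runs the prediction step, accumulates P(e_t | e_{1:t-1}), and reuses the correction step to keep the filtered belief for the next iteration.

```python
def sequence_likelihood(prior, observations, transition, emission):
    """Evaluate P(e_{1:t}) = P(e_{1:t-1}) * sum over x_t of P(e_t | x_t) P(x_t | e_{1:t-1})."""
    belief = dict(prior)       # filtered belief, starts at the prior P(X_0)
    likelihood = 1.0           # running P(e_{1:t-1})
    for e in observations:
        states = list(belief)
        predicted = {x: sum(transition[xp][x] * belief[xp] for xp in states) for x in states}
        step = sum(emission[x][e] * predicted[x] for x in states)   # P(e_t | e_{1:t-1})
        likelihood *= step
        belief = {x: emission[x][e] * predicted[x] / step for x in states}
    return likelihood
```

With the hypothetical Home/Office/Cafe model from the filtering sketch, sequence_likelihood(prior, ["facebook", "email"], transition, emission) would return the probability of that two-observation sequence.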

Smoothing
What is the distribution of some state X_k given the entire observation sequence e_{1:t}?
[Figure: HMM graphical model.]

Smoothing
What is the distribution of some state X_k given the entire observation sequence e_{1:t}?
Solution: the forward-backward algorithm.
[Figure: trellis over the states Home, Office, Cafe from time 0 through time k (evidence e_k) to time t (evidence e_t).]
Forward message: P(X_k | e_{1:k}). Backward message: P(e_{k+1:t} | X_k).
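
A sketch of forward-backward smoothing under the same assumed model format: the forward pass is the filter above, the backward pass accumulates P(e_{k+1:t} | X_k), and the two messages are multiplied and renormalized at each time step.

```python
def smooth(prior, observations, transition, emission):
    """Return P(X_k | e_{1:t}) for k = 0, ..., t."""
    states = list(prior)
    # Forward pass: forward[k] = P(X_k | e_{1:k}), normalized at each step.
    forward = [dict(prior)]
    for e in observations:
        prev = forward[-1]
        pred = {x: sum(transition[xp][x] * prev[xp] for xp in states) for x in states}
        un = {x: emission[x][e] * pred[x] for x in states}
        z = sum(un.values())
        forward.append({x: un[x] / z for x in states})
    # Backward pass: backward[k] proportional to P(e_{k+1:t} | X_k = x).
    backward = [{x: 1.0 for x in states}]
    for e in reversed(observations):
        nxt = backward[0]
        msg = {x: sum(transition[x][xn] * emission[xn][e] * nxt[xn] for xn in states)
               for x in states}
        backward.insert(0, msg)
    # Combine: P(X_k | e_{1:t}) is proportional to forward[k] * backward[k].
    smoothed = []
    for f, b in zip(forward, backward):
        un = {x: f[x] * b[x] for x in states}
        z = sum(un.values())
        smoothed.append({x: un[x] / z for x in states})
    return smoothed
```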

Decoding: Viterbi Algorithm
Task: given an observation sequence e_{1:t}, compute the most likely state sequence x_{0:t}:
x*_{0:t} = argmax_{x_{0:t}} P(x_{0:t} | e_{1:t})
[Figure: HMM graphical model.]

Decoding: Viterbi Algorithm
Task: given an observation sequence e_{1:t}, compute the most likely state sequence x_{0:t}.
The most likely path that ends in a particular state x_t consists of the most likely path to some state x_{t-1} followed by the transition to x_t.
[Figure: trellis from time 0 to time t, highlighting a path into x_{t-1} followed by the transition to x_t.]

Decoding: Viterbi Algorithm
Let m_t(x_t) denote the probability of the most likely path that ends in x_t:
m_t(x_t) = max_{x_{0:t-1}} P(x_{0:t-1}, x_t, e_{1:t}) = max_{x_{t-1}} [ m_{t-1}(x_{t-1}) P(x_t | x_{t-1}) P(e_t | x_t) ]
[Figure: trellis showing m_{t-1}(x_{t-1}) at time t-1 and the transition P(x_t | x_{t-1}) to x_t.]
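
The m_t recursion plus back-pointers gives the Viterbi algorithm. The sketch below (same assumed nested-dictionary model format, illustrative only) returns the most likely state sequence and its probability.

```python
def viterbi(prior, observations, transition, emission):
    """Return (most likely state sequence x_0..x_t, its joint probability with e_1..e_t)."""
    states = list(prior)
    m = dict(prior)                       # m_0(x_0) = P(x_0)
    backpointers = []
    for e in observations:
        new_m, bp = {}, {}
        for x in states:
            # Best predecessor: argmax over x_{t-1} of m_{t-1}(x_{t-1}) P(x_t | x_{t-1}).
            best_prev = max(states, key=lambda xp: m[xp] * transition[xp][x])
            bp[x] = best_prev
            new_m[x] = m[best_prev] * transition[best_prev][x] * emission[x][e]
        backpointers.append(bp)
        m = new_m
    # Trace back from the best final state.
    best = max(states, key=lambda x: m[x])
    path = [best]
    for bp in reversed(backpointers):
        path.insert(0, bp[path[0]])
    return path, m[best]
```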

Learning
Given: a training sample of observation sequences.
Goal: compute the model parameters: transition probabilities P(X_t | X_{t-1}) and observation probabilities P(E_t | X_t).
What if we have complete data, i.e., both e_{1:t} and x_{0:t}? Then we can estimate all the parameters by relative frequencies:
P(X_t = b | X_{t-1} = a) = (# of times state b follows state a) / (total # of transitions from state a)
P(E_t = e | X_t = a) = (# of times e is emitted from state a) / (total # of emissions from state a)
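
Relative-frequency estimation from fully observed sequences amounts to a couple of counting loops. Here is a minimal sketch (the data format is an assumption: each training example is a pair of lists, states x_0..x_t and observations e_1..e_t); it applies no smoothing, so unseen transitions or emissions simply get no entry.

```python
from collections import Counter, defaultdict

def estimate_parameters(sequences):
    """sequences: iterable of (states, observations) pairs with fully observed states."""
    trans_counts = defaultdict(Counter)   # counts of state b following state a
    emit_counts = defaultdict(Counter)    # counts of symbol e emitted from state a
    for states, observations in sequences:
        for a, b in zip(states, states[1:]):
            trans_counts[a][b] += 1
        for x, e in zip(states[1:], observations):   # e_i is emitted from x_i, i >= 1
            emit_counts[x][e] += 1
    transition = {a: {b: c / sum(cnt.values()) for b, c in cnt.items()}
                  for a, cnt in trans_counts.items()}
    emission = {a: {e: c / sum(cnt.values()) for e, c in cnt.items()}
                for a, cnt in emit_counts.items()}
    return transition, emission
```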

Learning
Given: a training sample of observation sequences.
Goal: compute the model parameters: transition probabilities P(X_t | X_{t-1}) and observation probabilities P(E_t | X_t).
What if we have complete data, i.e., both e_{1:t} and x_{0:t}? Then we can estimate all the parameters by relative frequencies.
What if we only have the observations? Then we need to use the EM algorithm (and somehow figure out the number of states).

Review: HMM Learning and Inference
Inference tasks:
Filtering: what is the distribution over the current state X_t given all the evidence so far, e_{1:t}?
Smoothing: what is the distribution of some state X_k given the entire observation sequence e_{1:t}?
Evaluation: compute the probability of a given observation sequence e_{1:t}.
Decoding: what is the most likely state sequence X_{0:t} given the observation sequence e_{1:t}?
Learning: given a training sample of sequences, learn the model parameters (transition and emission probabilities) with the EM algorithm.

Applications of HMMs
Speech recognition HMMs: observations are acoustic signals (continuous valued); states are specific positions in specific words (so, tens of thousands).
Machine translation HMMs: observations are words (tens of thousands); states are translation options.
Robot tracking: observations are range readings (continuous); states are positions on a map (continuous).
Source: Tamara Berg

Application of HMMs: Speech recognition
Noisy channel model of speech.

Speech feature extraction
[Figure: the acoustic waveform, sampled at 8 kHz and quantized to 8-12 bits, is cut into frames (10 ms, or 80 samples); a spectrogram (frequency and amplitude over time) is computed, and each frame is converted into a feature vector of ~39 dimensions.]

Phonetic model
Phones: speech sounds. Phonemes: groups of speech sounds that have a unique meaning/function in a language (e.g., there are several different ways to pronounce ...).

Phonetic model

HMM models for phones
HMM states in most speech recognition systems correspond to subphones. There are around 60 phones and as many as 60^3 context-dependent triphones.

HMM models for words

Putting words together
Given a sequence of acoustic features, how do we find the corresponding word sequence?

Decoding with the Viterbi algorithm

Reference
D. Jurafsky and J. Martin, Speech and Language Processing, 2nd ed., Prentice Hall, 2008.

More general models: Dynamic Bayesian networks
Detecting interaction links in a collaborating group using manually annotated data. S. Mathur, M.S. Poole, F. Pena-Mora, M. Hasegawa-Johnson, N. Contractor, Social Networks, doi:10.1016/j.socnet.2012.04.002.
Speaking: S_i = 1 if #i is speaking. Link: L_ij = 1 if #i is listening to #j. Neighborhood: N_ij = 1 if they are near one another. Gaze: G_ij = 1 if #i is looking at #j. Indirect: I_ij = 1 if #i and #j are both listening to the same person.