CS 7180: Behavioral Modeling and Decision-making in AI


CS 7180: Behavioral Modeling and Decision-making in AI. Hidden Markov Models. Prof. Amy Sliva, October 26, 2012

Partially observable temporal domains. POMDPs represented uncertainty about the state: belief states give the probability of a state given the actions and observations. They considered observations in states, but not time or the sequence of events. Sometimes order is important, but the underlying states are still hidden:
Computational linguistics (speech recognition, part-of-speech tagging)
Vision (facial expression and human behavior recognition from video)
Bioinformatics (gene finding)
Computer security (attack detection and prediction, anomaly detection)
International politics (conflict recognition and forecasting)
Hidden Markov models: a temporal probabilistic model, a special case of DBNs with hidden states. The structure allows for an elegant matrix implementation.

Hidden Markov model. A stochastic system represented by three matrices:
N = number of states; Q = {q_1, …, q_T} is the sequence of states from times 1 to T (the environmental context)
M = number of observations; O = {o_1, …, o_T} is the sequence of evidence the agent observes at times 1 to T
A = transition model, the state transition probability matrix: a_ij = P(q_{t+1} = j | q_t = i)
B = observation model, the probability distribution over observations (probability of seeing observation o in state q): b_j(k) = P(o_t = k | q_t = j)
π = prior state probabilities, the probability distribution that state q is the start state
The full HMM is a triple λ = (A, B, π), with a first-order Markov transition model and stationary transition and observation models.

Graphical representation of HMMs. [Figure: the umbrella-world DBN, with hidden states Rain_{t-1} → Rain_t → Rain_{t+1} and observed evidence Umbrella_{t-1}, Umbrella_t, Umbrella_{t+1}; transition CPT P(R_t | R_{t-1} = true) = 0.7, P(R_t | R_{t-1} = false) = 0.3; sensor CPT P(U_t | R_t = true) = 0.9.] Assume each state is represented by a single random variable, the same as in Bayesian networks. If states have more than one variable, use a state-space representation: a megavariable for the state equal to the tuple of all values of the individual variables.

Matrix representation of HMM. An HMM is a triple λ = (A, B, π):
π is the initial state distribution: a vector with entries P(q_1), P(q_2), …, P(q_N)
A is the N × N state transition matrix: the entry in row q_i, column q_j is P(q_j | q_i)
B is the observation matrix: the entry in row o_k, column q_j is P(o_k | q_j), i.e. b_j(k)
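As a concrete illustration (not from the slides), here is a minimal sketch of these matrices in Python/NumPy for the umbrella world. The values 0.7, 0.3, and 0.9 come from the CPTs on the earlier slide; P(Umbrella | no rain) = 0.2 and the uniform prior are assumed purely for illustration, and B is indexed state-first for convenience rather than in the slide's observation-by-state layout.

```python
import numpy as np

# States: 0 = Rain, 1 = NoRain. Observations: 0 = Umbrella, 1 = NoUmbrella.

# Transition matrix A: A[i, j] = a_ij = P(q_{t+1} = j | q_t = i).
A = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Observation matrix B: B[j, k] = b_j(k) = P(o_t = k | q_t = j).
# 0.9 is from the slide; 0.2 is an assumed illustrative value.
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Prior state distribution pi (assumed uniform here).
pi = np.array([0.5, 0.5])

lam = (A, B, pi)   # the full HMM, lambda = (A, B, pi)
```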

Example: recognizing human behaviors. Use an HMM to classify human actions in time-sequential images (Yamato et al., 1992): recognize sports activities from images (backhand volley, backhand stroke, forehand volley, forehand stroke, smash, service). Why use an HMM? Temporal: each activity is characterized by temporally related stances. Observable: each activity is associated with observable symbols for each characteristic stance.

Example: recognizing human behaviors. Processing pipeline: background subtraction → feature extraction → feature vector sequence → observation sequence for the HMM.
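The slides do not show how the feature vectors become discrete HMM observations; one common approach (and roughly in the spirit of Yamato et al.) is vector quantization: map each frame's feature vector to the index of its nearest codebook centroid. A minimal sketch under that assumption, with a hypothetical codebook array:

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook centroid.

    features: (T, D) array, one feature vector per frame
    codebook: (M, D) array of centroids (e.g., learned with k-means)
    returns:  length-T array of discrete observation symbols in 0..M-1
    """
    # Squared Euclidean distance from every frame to every centroid.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```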

Example: robot localization. Observations? Hidden states?

Three basic HMM problems.
Evaluation: given an observation sequence O = {o_1, …, o_T} and an HMM λ = (A, B, π), how do we compute the probability of O given the model, P(O | λ)?
Decoding: given an observation sequence O = {o_1, …, o_T} and an HMM λ = (A, B, π), how do we find the state sequence Q = {q_1, …, q_T} that best explains the observations, argmax_Q P(Q | O, λ)?
Learning: how do we adjust the model parameters λ = (A, B, π) to best fit the observation sequence, argmax_λ P(O | λ)?

Probability of an observation sequence. What is P(O | λ)? Useful in sequence classification: which model most likely generated the observations? The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM: P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ). Naïve computation is very expensive: given T observations and N states, there are N^T possible state sequences (a brute-force sketch of this sum follows below). Even for small HMMs, e.g. T = 10 and N = 10, that is 10 billion different paths. Compute more efficiently using dynamic programming.
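For intuition only, a minimal brute-force sketch of that sum, enumerating all N^T state sequences; it assumes the (A, B, pi) arrays defined above and is only practical for tiny T and N.

```python
from itertools import product

def naive_likelihood(obs, A, B, pi):
    """P(O | lambda) by summing over every possible state sequence (O(N^T) paths)."""
    N = len(pi)
    total = 0.0
    for path in product(range(N), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```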

Forward probabilities. Auxiliary probabilities needed for the DP algorithm. What is the probability, given HMM λ, that at time t the state is i and the partial observation o_1 … o_t has been generated? α_t(i) = P(o_1 … o_t, q_t = i | λ). This is the forward probability.

Forward probabilities. Forward probability: α_t(i) = P(o_1 … o_t, q_t = i | λ). Recursive definition: α_t(j) = [Σ_{i=1}^N α_{t-1}(i) a_ij] b_j(o_t), where α_{t-1}(i) is the forward probability of each possible prior state at t-1, a_ij is the transition probability from state i to j, and b_j(o_t) is the observation probability of seeing o_t in state j.

Forward algorithm. Dynamic programming for P(O | λ) using the forward probability.
1. Initialization: for each state i, compute the probability at time 1: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
2. Induction: compute the forward probability for every state j: α_t(j) = [Σ_{i=1}^N α_{t-1}(i) a_ij] b_j(o_t), 2 ≤ t ≤ T, 1 ≤ j ≤ N
3. Termination: P(O | λ) = Σ_{i=1}^N α_T(i), the sum of the forward probabilities at time T
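A minimal NumPy sketch of this forward pass, assuming the (A, B, pi) conventions above (A[i, j] = a_ij, B[j, k] = b_j(k)); real implementations usually rescale or work in log space to avoid underflow on long sequences.

```python
import numpy as np

def forward(obs, A, B, pi):
    """Return P(O | lambda) and the T x N table of forward probabilities."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                   # initialization: alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                          # induction
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum(), alpha                  # termination: sum over states at time T
```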

Analysis of the forward algorithm. The naïve approach to solving the evaluation problem takes O(2T · N^T) computations. The forward algorithm takes O(N² T) computations.

Alternative solution: backward probability. The forward probability used the observation sequence seen so far to determine the probability of state i. The backward probability is another auxiliary probability: it uses the future observation sequence to determine the probability of state i. What is the probability, given an HMM λ and state i at time t, that the partial observation o_{t+1} … o_T is generated? β_t(i) = P(o_{t+1} … o_T | q_t = i, λ)

Backward probabilities. Backward probability: β_t(i) = P(o_{t+1} … o_T | q_t = i, λ). Recursive definition: β_t(i) = Σ_{j=1}^N a_ij b_j(o_{t+1}) β_{t+1}(j), where a_ij is the transition probability from state i to j, b_j(o_{t+1}) is the observation probability of seeing o_{t+1} in state j, and β_{t+1}(j) is the backward probability of each possible next state at t+1.

Backward algorithm. Dynamic programming for P(O | λ) using the backward probability.
1. Initialization: for all states i at time T, β_T(i) = 1, 1 ≤ i ≤ N
2. Induction: for all states i, work backward and compute the backward probability: β_t(i) = Σ_{j=1}^N a_ij b_j(o_{t+1}) β_{t+1}(j), t = T-1, …, 1, 1 ≤ i ≤ N
3. Termination: P(O | λ) = Σ_{i=1}^N π_i b_i(o_1) β_1(i), summing over all backward probabilities at time 1
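A matching sketch of the backward pass under the same assumed conventions; the termination line recovers the same P(O | λ) as the forward algorithm, which is a useful sanity check.

```python
import numpy as np

def backward(obs, A, B, pi):
    """Return P(O | lambda) and the T x N table of backward probabilities."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                  # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                  # induction, working backward in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return (pi * B[:, obs[0]] * beta[0]).sum(), beta   # termination at time 1
```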

Decoding problem. The forward and backward solutions to the evaluation problem efficiently give the sum over all paths through the HMM. In the decoding problem we want to find the highest-probability path: what is the state sequence Q* = q_1 … q_T such that Q* = argmax_Q P(Q | O, λ)? Viterbi algorithm: an inductive DP algorithm that keeps the best state sequence at each instant.

Viterbi algorithm. Similar to computing the forward probabilities, but instead of summing over transitions from incoming states, compute the maximum. Forward recursion: α_t(j) = [Σ_{i=1}^N α_{t-1}(i) a_ij] b_j(o_t). Viterbi recursion: δ_t(j) = [max_{i=1}^N δ_{t-1}(i) a_ij] b_j(o_t).

Viterbi algorithm. Use DP to compute argmax_Q P(Q | O, λ).
1. Initialization: for all states i at time 1, δ_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
2. Induction: for all states j, compute the max path probability and the previous state that produced it: δ_t(j) = [max_{i=1}^N δ_{t-1}(i) a_ij] b_j(o_t), ψ_t(j) = argmax_{i=1}^N δ_{t-1}(i) a_ij, 2 ≤ t ≤ T, 1 ≤ j ≤ N
3. Termination: take the max over the probabilities at time T: p* = max_{i=1}^N δ_T(i), q_T* = argmax_{i=1}^N δ_T(i)
4. Read out the maximal path: get the maximal state at each time point by backtracking, q_t* = ψ_{t+1}(q_{t+1}*) for t = T-1, …, 1
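A minimal Viterbi sketch under the same assumed (A, B, pi) conventions, returning both the path probability p* and the backtracked state sequence; log probabilities would be preferable for long sequences.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Return (p_star, best_path) for the most probable hidden state sequence."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                    # initialization: delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                           # induction
        trans = delta[t - 1][:, None] * A           # trans[i, j] = delta_{t-1}(i) a_ij
        psi[t] = trans.argmax(axis=0)               # best predecessor for each state j
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]                # termination: q_T*
    for t in range(T - 1, 0, -1):                   # backtracking pass
        path.append(int(psi[t][path[-1]]))
    path.reverse()
    return delta[-1].max(), path
```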

Trellis structure of Viterbi paths. [Figure: trellis with hidden states 1 … N on the vertical axis and time t-1, t, t+1, t+2, …, T on the horizontal axis; each node (j, t) carries δ_t(j) = [max_{i=1}^N δ_{t-1}(i) a_ij] b_j(o_t).]

Using Viterbi to find the maximal sequence. The Viterbi algorithm actually consists of two phases: a forward pass to find the maximal probabilities at each time, and a backward pass to extract the maximizing state sequence. [Figure: the same trellis of hidden states 1 … N over times t-1 … T.]

Using Viterbi to find the maximal sequence: a worked example on a 4-state trellis over times t = 1 … 4 (only a few of the δ values, such as 0.3, 0.4, and 0.6, are shown on the slides).
Forward pass: δ_1(i) = π_i b_i(o_1); δ_2(j) = [max_{i=1}^4 δ_1(i) a_ij] b_j(o_2); δ_3(j) = [max_{i=1}^4 δ_2(i) a_ij] b_j(o_3); and so on to t = 4.
Termination: p* = max_{i=1}^4 δ_4(i), q_4* = argmax_{i=1}^4 δ_4(i).
Backward pass: q_3* = ψ_4(q_4*), q_2* = ψ_3(q_3*), q_1* = ψ_2(q_2*).
Resulting maximal state sequence: Q* = {2, 1, 2, 4}.
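Tying the sketches together, a hypothetical end-to-end run on the umbrella HMM defined earlier, assuming the forward, backward, and viterbi functions above are in scope (the observation indices are illustrative, not from the slides):

```python
obs = [0, 0, 1, 0]                 # e.g. umbrella, umbrella, no umbrella, umbrella

likelihood_f, _ = forward(obs, A, B, pi)
likelihood_b, _ = backward(obs, A, B, pi)
p_star, best_path = viterbi(obs, A, B, pi)

print(likelihood_f, likelihood_b)  # the two evaluations should agree
print(p_star, best_path)           # most probable hidden state sequence
```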