Sequential Data and Markov Models


Sequential Data and Markov Models
Sargur N. Srihari, srihari@cedar.buffalo.edu
Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/ce574/index.html

Sequential Data Examples
Sequential data often arise through measurement of time series:
- Snowfall measurements on successive days
- Rainfall measurements on successive days
- Daily values of a currency exchange rate
- Acoustic features at successive time frames in speech recognition
- Nucleotide base pairs in a strand of DNA
- Sequence of characters in an English sentence
- Parts of speech of successive words

Markov Model: Weather
The weather on a given day is observed as being one of the following:
State 1: Rainy
State 2: Cloudy
State 3: Sunny
Weather pattern of a location (transition probabilities from today's weather to tomorrow's):

Today \ Tomorrow   Rain   Cloudy   Sunny
Rain               0.3    0.3      0.4
Cloudy             0.2    0.6      0.2
Sunny              0.1    0.1      0.8

Markov Model: Weather State Diagram
[State diagram with the three states Rainy, Cloudy, Sunny; self-transitions 0.3, 0.6, 0.8 and cross-transitions matching the table above.]
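As a concrete illustration (not part of the original slides), here is a minimal sketch of this weather chain in Python; the state encoding and the sampling routine are illustrative choices:

```python
# A minimal sketch of the weather Markov chain from the table above.
import numpy as np

states = ["Rainy", "Cloudy", "Sunny"]
# Rows are today's state, columns tomorrow's state; each row sums to 1.
A = np.array([
    [0.3, 0.3, 0.4],  # Rainy  -> Rainy, Cloudy, Sunny
    [0.2, 0.6, 0.2],  # Cloudy -> Rainy, Cloudy, Sunny
    [0.1, 0.1, 0.8],  # Sunny  -> Rainy, Cloudy, Sunny
])

def sample_weather(start: int, days: int, seed: int = 0) -> list[str]:
    """Sample a state sequence by repeatedly drawing from the current row of A."""
    rng = np.random.default_rng(seed)
    seq = [start]
    for _ in range(days - 1):
        seq.append(int(rng.choice(3, p=A[seq[-1]])))
    return [states[s] for s in seq]

print(sample_weather(start=2, days=7))  # a week of weather starting from Sunny
```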

Sound Spectrogram of Speech
Plot of the intensity of the spectral coefficients versus time index.
Successive observations of the speech spectrum are highly correlated (Markov dependency).

Markov Model for the Production of Spoken Words
States represent phonemes. Production of the word "cat":
- Represented by states /k/, /a/, /t/
- Transitions from /k/ to /a/, from /a/ to /t/, and from /t/ to a silent state
Although only the correct "cat" sound is represented by the model, other transitions can be introduced, e.g., /k/ followed by /t/.
[Figure: Markov model for the word "cat" with states /k/, /a/, /t/.]

Stationary vs. Non-stationary
Stationary: the data evolve over time but the generative distribution remains the same, e.g., the dependence of the current word on the previous word remains constant.
Non-stationary: the generative distribution itself changes over time.

Making a Sequence of Decisions
- For processes in time, the state at time t is influenced by the state at time t-1.
- We wish to predict the next value from previous values, e.g., in financial forecasting.
- It is impractical to consider general dependence of the future on all previous observations: the complexity would grow without limit as the number of observations increases.
- Markov models assume dependence on only the most recent observations.

Latent Variables
While Markov models are tractable, they are severely limited. Introducing latent variables provides a more general framework and leads to state-space models. When the latent variables are:
- Discrete: the model is a hidden Markov model (HMM)
- Continuous: the model is a linear dynamical system

Hidden Markov Model
[Figure: HMM for the weather example, with hidden weather states and transition/emission probabilities.]
Observations are activities of a friend described over the telephone. Alternatively, observations may come from data such as daily temperature and barometer readings (Day: M, Tu, W, Th, ...).

Markov Model Assuming Independence
The simplest model assumes the observations are independent: a graph without links. A prediction of whether it rains tomorrow is then based only on the relative frequency of rainy days, ignoring the influence of whether it rained the previous day.
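To make the contrast concrete, a small sketch (the 10-day weather record below is invented for illustration): the independence model uses the overall frequency of rainy days, while a first-order model conditions on today's weather.

```python
# Hypothetical 10-day record (R = rain, S = no rain), invented for illustration.
record = "RRRRSSSSRR"

# Independence model: P(rain tomorrow) = overall frequency of rainy days.
p_rain = record.count("R") / len(record)

# First-order model: P(rain tomorrow | rain today), from consecutive pairs.
next_after_rain = [b for a, b in zip(record, record[1:]) if a == "R"]
p_rain_given_rain = next_after_rain.count("R") / len(next_after_rain)

print(p_rain)             # 0.6
print(p_rain_given_rain)  # 0.8 -- yesterday's weather clearly matters here
```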

Markov Model
The most general Markov model for observations {x_n} uses the product rule to express the joint distribution of a sequence of observations:

p(x_1, ..., x_N) = \prod_{n=1}^{N} p(x_n | x_1, ..., x_{n-1})

First-Order Markov Model
A chain of observations {x_n}. The joint distribution for a sequence of N variables is

p(x_1, ..., x_N) = p(x_1) \prod_{n=2}^{N} p(x_n | x_{n-1})

It can be verified (using the product rule above) that

p(x_n | x_1, ..., x_{n-1}) = p(x_n | x_{n-1})

If the model is used to predict the next observation, the distribution of the prediction depends only on the preceding observation and is independent of all earlier observations. Stationarity implies that the conditional distributions p(x_n | x_{n-1}) are all equal.

Markov Model: Sequence Probability
Given that today is sunny, what is the probability that the weather for the next 7 days will be Sunny-Sunny-Rain-Rain-Sunny-Cloudy-Sunny? That is, find the probability of O given the model, where

O = {S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3}

P(O | Model) = P(S_3) P(S_3|S_3) P(S_3|S_3) P(S_1|S_3) P(S_1|S_1) P(S_3|S_1) P(S_2|S_3) P(S_3|S_2)
             = \pi_3 \cdot a_{33} a_{33} a_{31} a_{11} a_{13} a_{32} a_{23}
             = 1 \cdot (0.8)(0.8)(0.1)(0.3)(0.4)(0.1)(0.2)
             = 1.536 \times 10^{-4}
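A quick numeric check of this product (illustrative Python; the index encoding is an assumption, the transition probabilities are those of the weather table above):

```python
# Numeric check of the sequence probability; indices 0, 1, 2 encode Rainy, Cloudy, Sunny.
A = [
    [0.3, 0.3, 0.4],  # Rainy  -> Rainy, Cloudy, Sunny
    [0.2, 0.6, 0.2],  # Cloudy -> Rainy, Cloudy, Sunny
    [0.1, 0.1, 0.8],  # Sunny  -> Rainy, Cloudy, Sunny
]
RAINY, CLOUDY, SUNNY = 0, 1, 2

# O includes today (Sunny, probability 1) followed by the 7 predicted days.
O = [SUNNY, SUNNY, SUNNY, RAINY, RAINY, SUNNY, CLOUDY, SUNNY]

p = 1.0  # pi_3 = 1 since we are told today is sunny
for prev, nxt in zip(O, O[1:]):
    p *= A[prev][nxt]

print(p)  # ~1.536e-04, matching the slide
```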

Second-Order Markov Model
The conditional distribution of observation x_n depends on the values of the two previous observations, x_{n-1} and x_{n-2}:

p(x_1, ..., x_N) = p(x_1) p(x_2 | x_1) \prod_{n=3}^{N} p(x_n | x_{n-1}, x_{n-2})

Each observation is influenced by the previous two observations.

Mth-Order Markov Source
The conditional distribution for a particular variable depends on the previous M variables. We pay a price in the number of parameters. For a discrete variable with K states:
- First order: p(x_n | x_{n-1}) needs K-1 parameters for each of the K states of x_{n-1}, giving K(K-1) parameters.
- Mth order: the M conditioning variables have K^M joint states, so the model needs K^M(K-1) parameters.
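The exponential growth is easy to see numerically; a tiny sketch of the counting argument above:

```python
# Parameter counts for a K-state, Mth-order Markov chain: K^M conditioning
# contexts, each with K - 1 free probabilities (the Kth is fixed by normalization).
def n_params(K: int, M: int) -> int:
    return K**M * (K - 1)

for M in (1, 2, 5):
    print(M, n_params(K=3, M=M))  # 1 -> 6, 2 -> 18, 5 -> 486
```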

Introducing Latent Variables
We want a model for sequences that is not limited by the Markov assumption of any order, yet has a limited number of parameters. For each observation x_n, introduce a latent variable z_n; z_n may be of a different type or dimensionality than the observed variable. The latent variables form the Markov chain, giving the state-space model.
[Figure: chain of latent variables z_n (top) with observations x_n (bottom).]

Conditional Independence with Latent Variables
The model satisfies the key conditional independence property

z_{n+1} \perp z_{n-1} \mid z_n

This follows from d-separation: when the latent node z_n is observed (filled), the only path between z_{n-1} and z_{n+1} passes head-to-tail through z_n and is therefore blocked.
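A small numeric sanity check of this property on a 3-step chain (the initial distribution and transition matrix below are arbitrary illustrative numbers, not from the slides):

```python
# Verify z_{n+1} _|_ z_{n-1} | z_n on a chain z1 -> z2 -> z3 of binary variables.
import numpy as np

pi = np.array([0.5, 0.5])            # p(z1)
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])           # T[i, j] = p(z_{n+1} = j | z_n = i)

# Joint p(z1, z2, z3) = p(z1) p(z2 | z1) p(z3 | z2), built by broadcasting.
joint = pi[:, None, None] * T[:, :, None] * T[None, :, :]

# Condition on z2 = 0: p(z1, z3 | z2 = 0) should factorize into its marginals.
cond = joint[:, 0, :] / joint[:, 0, :].sum()
p_z1 = cond.sum(axis=1)
p_z3 = cond.sum(axis=0)
print(np.allclose(cond, np.outer(p_z1, p_z3)))  # True: z1, z3 independent given z2
```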

Joint Distribution with Latent Variables
[Figure: state-space model; chain of latent variables z_1, ..., z_N with observations x_1, ..., x_N.]
The joint distribution for this model is

p(x_1, ..., x_N, z_1, ..., z_N) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n | z_{n-1}) \right] \prod_{n=1}^{N} p(x_n | z_n)

There is always a path between any x_n and x_m via the latent variables, and this path is never blocked. Thus the predictive distribution p(x_{n+1} | x_1, ..., x_n) for observation x_{n+1} does not exhibit any conditional independence properties and hence depends on all previous observations.
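This factorization can be written directly in code. A minimal sketch for a two-state, two-symbol model; the distributions pi, A, B below are illustrative numbers, not from the slides:

```python
# p(x, z) = p(z1) * prod_n p(z_n | z_{n-1}) * prod_n p(x_n | z_n)
import numpy as np

pi = np.array([0.6, 0.4])        # p(z1)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])       # A[i, j] = p(z_n = j | z_{n-1} = i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # B[i, k] = p(x_n = k | z_n = i)

def joint_prob(x: list[int], z: list[int]) -> float:
    """Evaluate the factored joint distribution for one (x, z) configuration."""
    p = pi[z[0]]
    for prev, cur in zip(z, z[1:]):
        p *= A[prev, cur]                 # latent-chain transitions
    for state, obs in zip(z, x):
        p *= B[state, obs]                # emissions
    return float(p)

print(joint_prob(x=[0, 1, 1], z=[0, 0, 1]))  # 0.6*0.7*0.3*0.9*0.1*0.8
```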

Two Models Described by This Graph
1. Hidden Markov Model: the latent variables are discrete; the observed variables in an HMM may be discrete or continuous.
2. Linear Dynamical Systems: both the latent and observed variables are Gaussian.

Further Topics on Sequential Data
- Hidden Markov Models: http://www.cedar.buffalo.edu/~srihari/ce574/chap11/ch11.2-HiddenMarkovModels.pdf
- Extensions of HMMs: http://www.cedar.buffalo.edu/~srihari/ce574/chap11/ch11.3-HMMExtensions.pdf
- Linear Dynamical Systems: http://www.cedar.buffalo.edu/~srihari/ce574/chap11/ch11.4-LinearDynamicalSystems.pdf
- Conditional Random Fields: http://www.cedar.buffalo.edu/~srihari/ce574/chap11/ch11.5-ConditionalRandomFields.pdf