Advanced Data Science

Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell

Agenda
Topics covered:
- Time series data
- Markov models
- Hidden Markov models
- Dynamic Bayes nets
Additional reading: Bishop, Chapter 13

Sequential Data
Examples: stock market prediction, speech recognition, gene data analysis.
Given a sequence of observations O_1, O_2, ..., O_T, how shall we represent and learn P(O_1, O_2, ..., O_T)?

Markov Model
How shall we represent and learn P(O_1, O_2, ..., O_T)? Use a Bayes net over the chain O_1 -> O_2 -> ... -> O_T.
Markov model: P(O_1, ..., O_T) = P(O_1) * product over t of P(O_t | O_{t-1})
nth order Markov model: P(O_t | O_1, ..., O_{t-1}) = P(O_t | O_{t-1}, ..., O_{t-n})

Nth-order Markov Model
The Markov property specifies that the probability of a state depends only on the previous state, but we can build more memory into our states by using a higher-order Markov model. Higher-order models remember more history and can have higher predictive value.
(Figure: chains over O_{T-3}, O_{T-2}, O_{T-1}, O_T showing first-, second-, and third-order dependencies.)

Markov Model (continued)
If O_t is real-valued, and we assume P(O_t) ~ N(f(O_{t-1}, O_{t-2}, ..., O_{t-n}), sigma), where f is some linear function, this is called an nth order autoregressive (AR) model.
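
A minimal sketch of what fitting such an nth-order AR model could look like, using ordinary least squares; numpy and the function name fit_ar are assumptions for illustration, not part of the slides.

```python
# Sketch: fit an nth-order autoregressive (AR) model by least squares.
# Assumes a 1-D numpy array `o` of real-valued observations.
import numpy as np

def fit_ar(o, n):
    """Estimate coefficients of O_t ~ N(f(O_{t-1},...,O_{t-n}), sigma) with f linear."""
    # rows: [O_{t-1}, O_{t-2}, ..., O_{t-n}] for t = n .. len(o)-1
    X = np.column_stack([o[n - k - 1 : len(o) - k - 1] for k in range(n)])
    X = np.column_stack([X, np.ones(len(o) - n)])   # intercept term
    y = o[n:]                                       # targets O_t
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma = (y - X @ coef).std()                    # residual scale as noise estimate
    return coef, sigma
```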

Hidden Markov Models: Example
An experience in a casino. Game:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (sometimes with a fair die, sometimes with a loaded die)
4. Highest number wins $2
Here is his sequence of die rolls:
1245526462146146136136661664661636616366163616515615115146123562344
Which die is being used in each play?

Question:
1245526462146146136136661664661636616366163616515615115146123562344
Which die is being used in each play?
Unobserved state: S_{t-2}, S_{t-1}, S_t, S_{t+1}, S_{t+2}, with S_t in {fair, loaded}
Observed output: O_{t-2}, O_{t-1}, O_t, O_{t+1}, O_{t+2}, with O_t in {1, 2, 3, 4, 5, 6}

The Dishonest Casino Model
Two states, FAIR and LOADED: each state stays the same with probability 0.95 and switches to the other with probability 0.05.
Fair die: P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
Loaded die: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
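
As a concrete illustration, the casino model can be written down as three tables. The array names are illustrative, and the uniform initial distribution is an assumption (the slide does not specify one).

```python
# The dishonest casino as HMM parameter tables (a sketch, assuming numpy).
# States: 0 = FAIR, 1 = LOADED; observations: die faces 1..6 stored as 0..5.
import numpy as np

init  = np.array([0.5, 0.5])              # P(X_1): assumed uniform start
trans = np.array([[0.95, 0.05],           # P(X_{t+1} | X_t): stay 0.95, switch 0.05
                  [0.05, 0.95]])
emit  = np.array([[1/6] * 6,              # fair die: every face 1/6
                  [1/10] * 5 + [1/2]])    # loaded die: faces 1-5 at 1/10, six at 1/2
```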

Puzzles Regarding the Dishonest Casino
GIVEN: a sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTIONS:
- How likely is this sequence, given our model of how the casino works? This is the EVALUATION problem.
- What portion of the sequence was generated with the fair die, and what portion with the loaded die? This is the DECODING question.
- How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back? This is the LEARNING question.

Definition of HMM
1. N: number of hidden states X_t can take on, X_t in {1, 2, ..., N}
2. O: set of values O_t can take on
3. Initial state distribution: P(X_1 = k) for k = 1, 2, ..., N
4. State transition distribution: P(X_{t+1} = k | X_t = i), for k, i = 1, 2, ..., N
5. Emission distribution: P(O_t | X_t)
(Figure: graphical model S_{t-2} -> S_{t-1} -> S_t with emissions O_{t-2}, O_{t-1}, O_t, alongside the equivalent state-automaton view.)
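
Given those three distributions, generating a sequence from an HMM is straightforward. A hedged sketch; the helper name sample_hmm and the use of numpy's random Generator are assumptions, and it reuses the (init, trans, emit) arrays defined above.

```python
# Sketch: draw a state/observation sequence of length T from an HMM.
import numpy as np

def sample_hmm(init, trans, emit, T, rng=np.random.default_rng(0)):
    states, obs = [], []
    x = rng.choice(len(init), p=init)                      # X_1 ~ initial state distribution
    for _ in range(T):
        states.append(x)
        obs.append(rng.choice(emit.shape[1], p=emit[x]))   # O_t ~ P(O_t | X_t)
        x = rng.choice(len(init), p=trans[x])              # X_{t+1} ~ P(X_{t+1} | X_t)
    return np.array(states), np.array(obs)
```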

Handwriting recognition
Character recognition, e.g., logistic regression, Naïve Bayes.

Handwriting recognition with an HMM
Hidden states S_{t-2}, S_{t-1}, S_t, each in {a, b, ..., z}; observations O_t are the character images. Character recognition per image, e.g., logistic regression, Naïve Bayes.

Example of a hidden Markov model (HMM)

Understanding HMM Semantics
Hidden states X_1, X_2, X_3, X_4, X_5, each in {a, ..., z}; observations O_1, ..., O_5 are vectors of pixel intensities.
Need to evaluate only 3 distributions:
- initial state distribution P(X_1 = i)
- transition distribution theta_{ij} = P(X_{t+1} = j | X_t = i), a 26x26 table of parameters shared across time steps
- emission distribution P(O_t | X_t), e.g. N(mu_{X_t}, sigma), also shared across time steps

Using and Learning HMMs
Core HMM questions:
1. How do we calculate P(o_1, o_2, ..., o_n)?
2. How do we calculate argmax over x_1, x_2, ..., x_n of P(x_1, x_2, ..., x_n | o_1, o_2, ..., o_n)?
3. How do we train the HMM, given its structure and
   3a. fully observed training examples <x_1, ..., x_n, o_1, ..., o_n>?
   3b. partially observed training examples <o_1, ..., o_n>?

How do we compute P(o_1, ..., o_T)?
1. Brute force: sum the joint over all |X|^T hidden state sequences:
P(o_1, o_2, ..., o_T) = sum over x_1, sum over x_2, ..., sum over x_T of P(x_1 = i, x_2 = j, ..., x_T = k, o_1, ..., o_T)
This is exponential in the sequence length.
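
A literal implementation of the brute-force sum, usable only for very short sequences. Parameter names follow the casino sketch above and are assumptions.

```python
# Brute-force evaluation of P(o_1,...,o_T) by enumerating every state sequence.
# Exponential in T; for illustration only. `obs` is a sequence of integer observation codes.
import itertools
import numpy as np

def brute_force_likelihood(obs, init, trans, emit):
    total = 0.0
    for xs in itertools.product(range(len(init)), repeat=len(obs)):  # every state sequence
        p = init[xs[0]] * emit[xs[0], obs[0]]
        for t in range(1, len(obs)):
            p *= trans[xs[t - 1], xs[t]] * emit[xs[t], obs[t]]
        total += p                                                   # accumulate joint P(x, o)
    return total
```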

How do we compute P(o_1, ..., o_T)?
2. Forward algorithm (dynamic programming, variable elimination): define alpha_t(k) = P(O_1 = o_1, ..., O_t = o_t, X_t = k).
Base case: alpha_1(k) = P(X_1 = k) P(O_1 = o_1 | X_1 = k)
For example, alpha_2(k) = P(O_1 = o_1, O_2 = o_2, X_2 = k) = P(O_2 = o_2 | X_2 = k) sum over j of P(O_1 = o_1, X_1 = j) P(X_2 = k | X_1 = j) = P(O_2 = o_2 | X_2 = k) sum over j of alpha_1(j) P(X_2 = k | X_1 = j)
Recursion: alpha_{t+1}(k) = P(O_{t+1} = o_{t+1} | X_{t+1} = k) sum over j of alpha_t(j) P(X_{t+1} = k | X_t = j)
Finally: P(o_1, ..., o_T) = sum over k of alpha_T(k)
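
A compact sketch of the forward recursion, using the same array conventions as above (rows of trans index the current state, columns of emit index observation values).

```python
# Forward algorithm: alpha[t, k] = P(o_1,...,o_t, X_t = k). Linear in T instead of exponential.
import numpy as np

def forward(obs, init, trans, emit):
    T, N = len(obs), len(init)
    alpha = np.zeros((T, N))
    alpha[0] = init * emit[:, obs[0]]                        # alpha_1(k) = P(X_1=k) P(o_1 | X_1=k)
    for t in range(1, T):
        alpha[t] = emit[:, obs[t]] * (alpha[t - 1] @ trans)  # sum over previous states
    return alpha                                             # P(o_1..o_T) = alpha[-1].sum()
```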

How do we compute P(o_1, ..., o_T)?
Backward algorithm (dynamic programming, variable elimination): define beta_t(k) = P(O_{t+1} = o_{t+1}, ..., O_T = o_T | X_t = k).
Base case: beta_T(k) = 1
Recursion: beta_t(k) = sum over j of P(X_{t+1} = j | X_t = k) P(O_{t+1} = o_{t+1} | X_{t+1} = j) beta_{t+1}(j)
Combining the two: P(o_1, ..., o_T) = sum over k of alpha_t(k) beta_t(k), for any t.
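
The backward recursion can be sketched the same way; beta is filled from right to left, with the same parameter conventions as the forward sketch.

```python
# Backward algorithm: beta[t, k] = P(o_{t+1},...,o_T | X_t = k).
import numpy as np

def backward(obs, init, trans, emit):
    T, N = len(obs), len(init)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                             # base case: beta_T(k) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])  # sum over the next state
    return beta                                                # sum_k alpha[t,k]*beta[t,k] = P(o_1..o_T)
```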

Russell & Norvig 2010, Chapter 15, p. 566
Infer the weather given observations of a man either carrying or not carrying an umbrella.
- 2 states of weather: state 1 = rain, state 2 = no rain
- 70% chance of staying the same each day, 30% chance of changing
- Each state generates 2 events: event 1 = umbrella, event 2 = no umbrella, with conditional probabilities for the events occurring in each state
We then observe the following sequence of events: {umbrella, umbrella, no umbrella, umbrella, umbrella}
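
These numbers can be written down as HMM parameters. The transition probabilities come from the slide; the emission probabilities 0.9/0.2 are the usual Russell & Norvig values and are an assumption here, as are the variable names.

```python
# The umbrella world as an HMM (sketch): states 0 = rain, 1 = no rain.
import numpy as np

u_init  = np.array([0.5, 0.5])        # no information about the first day
u_trans = np.array([[0.7, 0.3],       # 70% chance the weather stays the same
                    [0.3, 0.7]])
u_emit  = np.array([[0.9, 0.1],       # rain: umbrella with probability 0.9 (assumed R&N value)
                    [0.2, 0.8]])      # no rain: umbrella with probability 0.2 (assumed R&N value)
obs_seq = np.array([0, 0, 1, 0, 0])   # umbrella, umbrella, no umbrella, umbrella, umbrella
```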

Computing Forward Probabilities
We don't know which state the weather is in before our observations, so we start from a uniform initial distribution.

Computing Backward Probabilities
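
Putting the two passes together gives the smoothed posteriors for the five umbrella observations; this reuses the forward/backward sketches and parameter names defined above.

```python
# Forward-backward smoothing: P(X_t = k | o_1,...,o_T) for the umbrella sequence.
import numpy as np

alpha = forward(obs_seq, u_init, u_trans, u_emit)
beta  = backward(obs_seq, u_init, u_trans, u_emit)
posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)   # normalize each time step
print(posterior[:, 0])                              # P(rain on day t | all five observations)
```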

How do we compute argmax over x_1, ..., x_T of P(x_1, ..., x_T | o_1, ..., o_T)?
Viterbi algorithm, based on recursive computation of the probability of the most likely state sequence ending in each state: delta_t(k) = max over x_1, ..., x_{t-1} of P(x_1, ..., x_{t-1}, X_t = k, o_1, ..., o_t).
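
A minimal Viterbi sketch in the same style as the forward algorithm above: delta holds the best-path probabilities, back the backpointers used to recover the most probable state sequence.

```python
# Viterbi algorithm: most probable hidden state sequence given the observations.
import numpy as np

def viterbi(obs, init, trans, emit):
    T, N = len(obs), len(init)
    delta = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)
    delta[0] = init * emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * trans          # scores[j, k] = delta_{t-1}(j) * P(k | j)
        back[t] = scores.argmax(axis=0)                 # best previous state for each k
        delta[t] = scores.max(axis=0) * emit[:, obs[t]]
    path = [int(delta[-1].argmax())]                    # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))             # follow backpointers
    return path[::-1]
```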

Learning HMMs from fully observable data: easy
Just estimate the 3 distributions (initial state, transition, emission) from counts. Hope I convinced you.
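
With fully observed data the maximum-likelihood estimates are just normalized counts. A sketch; the input format, a list of (states, obs) integer-array pairs, is an assumption.

```python
# MLE for an HMM from fully observed sequences: count and normalize.
import numpy as np

def mle_hmm(sequences, N, M):
    init, trans, emit = np.zeros(N), np.zeros((N, N)), np.zeros((N, M))
    for states, obs in sequences:
        init[states[0]] += 1                          # count initial states
        for t in range(len(states) - 1):
            trans[states[t], states[t + 1]] += 1      # count transitions
        for x, o in zip(states, obs):
            emit[x, o] += 1                           # count emissions
    return (init / init.sum(),
            trans / trans.sum(axis=1, keepdims=True),
            emit / emit.sum(axis=1, keepdims=True))
```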

Learning HMMs when we only observe o_1, ..., o_T
EM Algorithm (the Baum-Welch algorithm):
1. E step: estimate P(X_1, ..., X_T | o_1, ..., o_T) using forward-backward
2. M step: choose theta to maximize E over P(X_1, ..., X_T | o_1, ..., o_T) of log P(X_1, ..., X_T, o_1, ..., o_T | theta)
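
One EM iteration can be sketched on top of the forward/backward functions above. This is a bare-bones illustration for a single sequence, with no smoothing of zero counts; all names are assumptions.

```python
# One Baum-Welch (EM) iteration for a single observation sequence.
import numpy as np

def baum_welch_step(obs, init, trans, emit):
    alpha = forward(obs, init, trans, emit)
    beta = backward(obs, init, trans, emit)
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                      # E step: P(X_t = k | o_1..o_T)
    xi = (alpha[:-1, :, None] * trans[None] *              # P(X_t=j, X_{t+1}=k | o_1..o_T)
          (emit[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    new_init = gamma[0]                                    # M step: re-estimate the 3 distributions
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.zeros_like(emit)
    for t, o in enumerate(obs):
        new_emit[:, o] += gamma[t]
    new_emit /= gamma.sum(axis=0)[:, None]
    return new_init, new_trans, new_emit
```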

Additional Time Series Models

Factorial HMM: several hidden state chains S_i evolve in parallel, and the observation at time t can depend on all the hidden state variables at that time. In the factorial HMM the S_i at a given time are independent of each other; a further variant couples them and adds an observable input X_i.

Another variant pairs a real-valued hidden state with a discrete hidden variable S_i taking one of M values, again with observed outputs.

What you need to know
Hidden Markov models (HMMs):
- Very useful, very powerful! Speech, OCR, time series, ...
- Parameter sharing: only learn 3 distributions
- Dynamic programming (variable elimination) reduces inference complexity
- Special case of a Bayes net
- Dynamic Bayesian Networks