Artificial Intelligence Markov Chains
Artificial Intelligence: Markov Chains
Stephan Dreiseitl, FH Hagenberg, Software Engineering & Interactive Media
Stephan Dreiseitl (Hagenberg/SE/IM) Lecture 12: Markov Chains Artificial Intelligence SS / 36

Overview
- Uncertain reasoning in time
- Using Markov chains for simulations
- Hidden Markov models
- State estimation
- Most probable path estimation
- Application: speech recognition

Uncertain reasoning in time
- Want to model systems that change through time in some non-deterministic manner
- Use stochastic processes: collections of random variables $X_0, X_1, \ldots$ that take on values in some state space $S$
- One random variable collection for each quantity of interest
- Current random variable value (state) depends on previous states
- Easier to model if states are observable; otherwise use hidden Markov models

Example: Wumpus world
- Wumpus world is static; only the agent moves
- Assume that the agent does not reason logically, but moves randomly
- Current agent position $X_t$ depends on previous positions $X_{t-1}, \ldots, X_1$
- I.e., current position is given by $P(X_t \mid X_{t-1}, \ldots, X_1)$
- How to calculate this?

Simplifying assumptions
- Markov assumption: current state depends on a finite number of previous states
- First-order Markov process (chain): $P(X_t \mid X_{t-1}, \ldots, X_0) = P(X_t \mid X_{t-1})$
- Second-order Markov process (chain): $P(X_t \mid X_{t-1}, \ldots, X_0) = P(X_t \mid X_{t-1}, X_{t-2})$
- Stationarity assumption: transition probabilities $P(X_t \mid \mathrm{parents}(X_t))$ do not depend on $t$

Markov chains as Bayesian networks
- First-order Markov chain: $\cdots = P(X_{t-1} \mid X_{t-2}) = P(X_t \mid X_{t-1}) = P(X_{t+1} \mid X_t) = \cdots$
  [Figure: network over $X_{t-2}, X_{t-1}, X_t, X_{t+1}$; each node has its predecessor as only parent]
- Second-order Markov chain: $\cdots = P(X_{t+1} \mid X_t, X_{t-1}) = \cdots$
  [Figure: network over $X_{t-2}, X_{t-1}, X_t, X_{t+1}$; each node has its two predecessors as parents]

Using Markov chains for simulations
- Express transition probabilities $P(X_t = s_i \mid X_{t-1} = s_j)$ in matrix $A_{ij}$ (columns always sum to 1)
- Equilibrium distribution of Markov chain: distribution the chain converges to, i.e. $\lim_{t \to \infty} P(X_t)$
- Problem: sometimes hard to generate random values for complicated distributions (e.g., Bayesian networks)
- Solution: construct Markov chain with desired distribution as equilibrium distribution; random values are samples of the Markov chain

Simple simulation example
- States $S = \{s_1, s_2\}$, state transition matrix
  $$A = \begin{pmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{pmatrix}$$
  [Figure: chain $X_{t-2}, X_{t-1}, X_t, X_{t+1}$ with possible states $s_1$, $s_2$ at each time step]
- With matrix algebra and conditional probabilities, can show that
  $$P(X_{t+1}) = A^t P(X_1) \quad \text{and} \quad \lim_{t \to \infty} A^t = \begin{pmatrix} 1/3 & 1/3 \\ 2/3 & 2/3 \end{pmatrix}$$
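
The convergence claim for the two-state chain can be checked with a few lines of code. This is only a minimal sketch in plain Python: the helper names `mat_vec` and `distribution_after` and the choice of 50 steps are assumptions made for illustration, not part of the lecture.

```python
# Two-state chain from the slide: A[i][j] = P(X_t = s_i | X_{t-1} = s_j),
# so every column of A sums to 1.
A = [[0.5, 0.25],
     [0.5, 0.75]]


def mat_vec(matrix, vec):
    """Multiply a square matrix with a column vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


def distribution_after(matrix, start, steps):
    """Apply the transition matrix `steps` times to the start distribution."""
    dist = list(start)
    for _ in range(steps):
        dist = mat_vec(matrix, dist)
    return dist


# Start once in s_1 and once in s_2; both runs should approach (1/3, 2/3).
from_s1 = distribution_after(A, [1.0, 0.0], 50)
from_s2 = distribution_after(A, [0.0, 1.0], 50)
print(from_s1, from_s2)
```

Both start distributions end up at the same vector, which illustrates that the equilibrium does not depend on the initial state for this chain.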

Simple simulation example (cont.)
- Obtain equilibrium distribution $P(X_t) = (\tfrac{1}{3}, \tfrac{2}{3})$ for arbitrary initial distribution $P(X_1)$
- Verify numerically:
  [Figure: relative frequencies of states $s_1$ and $s_2$ with start state $s_1$ (left) and $s_2$ (right)]

Calculating state probabilities
- Consider stochastic wumpus world example: states are positions on the $4 \times 4$ board
- Distinguish between r.v. $X_t$, state constants $s_j$, and variables $S_t$ denoting the state at time $t$
- For brevity, also write $S_t$ to denote $X_t = S_t$
- Marginalize to obtain probability $P(X_t = g)$ of reaching gold at time step $t$ as
  $$P(X_t = g) = \sum_{\substack{\text{all state sequences } (S_1, \ldots, S_{t-1}) \\ \text{that lead to } g}} P(X_t = g \wedge (S_1, \ldots, S_{t-1}))$$

Calculating state probabilities (cont.)
- State transitions of agent are first-order Markov process: with known first state (i.e., $P(X_1 = s_i) = 1$) obtain
  $$P(X_t = g \wedge (S_1, \ldots, S_{t-1})) = P(X_t = g \mid S_{t-1}) \, P(S_{t-1} \mid S_{t-2}) \cdots P(S_2 \mid S_1)$$
- Therefore, calculation
  $$P(X_t = g) = \sum_{\substack{\text{all state sequences } (S_1, \ldots, S_{t-1}) \\ \text{that lead to } g}} P(X_t = g \mid S_{t-1}) \cdots P(S_2 \mid S_1)$$
  grows exponentially with $t$ (bad)

Improving calculations
- To simplify notation, write $p_t(i) = P(X_t = s_i)$
- Idea to reduce complexity to polynomial (good):
  $$p_1(i) = \begin{cases} 1 & \text{if } s_i \text{ is start state} \\ 0 & \text{otherwise} \end{cases}$$
  $$p_{t+1}(i) = P(X_{t+1} = s_i) = \sum_{j=1}^{n} P(X_{t+1} = s_i \wedge X_t = s_j) = \sum_{j=1}^{n} P(X_{t+1} = s_i \mid X_t = s_j) \, P(X_t = s_j) = \sum_{j=1}^{n} A_{ij} \, p_t(j)$$
- This trick (dynamic programming) is used often with hidden Markov models
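
To make the difference between the two calculations concrete, the sketch below computes the same probability once by summing over all state sequences and once with the recursion $p_{t+1}(i) = \sum_j A_{ij} p_t(j)$. It assumes the small two-state matrix from the simulation example rather than the wumpus board, and the function names are hypothetical.

```python
from itertools import product

# A[i][j] = P(X_{t+1} = s_i | X_t = s_j); two-state example matrix.
A = [[0.5, 0.25],
     [0.5, 0.75]]


def brute_force(A, start, target, t):
    """Sum the probability of every sequence S_1..S_t with S_1 = start and S_t = target.

    Enumerates n**(t-2) intermediate sequences, so the cost grows exponentially in t.
    Assumes t >= 2.
    """
    n = len(A)
    total = 0.0
    for middle in product(range(n), repeat=t - 2):
        path = (start, *middle, target)
        prob = 1.0
        for prev, cur in zip(path, path[1:]):
            prob *= A[cur][prev]
        total += prob
    return total


def dynamic_programming(A, start, t):
    """Return the vector p_t using p_{t+1}(i) = sum_j A[i][j] * p_t(j)."""
    n = len(A)
    p = [1.0 if i == start else 0.0 for i in range(n)]  # p_1
    for _ in range(t - 1):
        p = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


t = 10
p_t = dynamic_programming(A, start=0, t=t)
bf = brute_force(A, start=0, target=1, t=t)
print(p_t[1], bf)
```

The recursion needs on the order of $t \cdot n^2$ multiplications, while the enumeration visits $n^{t-2}$ sequences; both give the same number.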

Hidden Markov models (HMMs)
- Often, states of the world are not observable (hidden)
- However, some evidence $E_t$ that depends on the state (stochastically) is available, i.e., we know $P(E_t \mid X_t)$
  [Figure: hidden chain $X_{t-2}, X_{t-1}, X_t, X_{t+1}$ with evidence nodes $E_{t-2}, E_{t-1}, E_t, E_{t+1}$, each $E_k$ depending on $X_k$]
- Assume $P(X_t \mid X_{t-1})$ and $P(E_t \mid X_t)$ do not depend on $t$:
  $$P(X_1, \ldots, X_t, E_1, \ldots, E_t) = P(X_1) \prod_{k=2}^{t} P(X_k \mid X_{k-1}) \prod_{k=1}^{t} P(E_k \mid X_k)$$

HMM formal specification
To completely specify an HMM, we need
- number $N$ of possible hidden states for each $X_t$
- number $M$ of possible observations for each $E_t$
- initial state probabilities $\pi_1, \ldots, \pi_N$: $\pi_i = P(X_1 = s_i)$
- state transition probabilities $A_{ij} = P(X_t = s_i \mid X_{t-1} = s_j)$
- observation probabilities $B_j(o_i) = P(E_t = o_i \mid X_t = s_j)$

HMM notational conventions
- Distinguish between possible states $s_1, \ldots, s_N$ for each r.v. $X_t$ and the concrete state $S_t$ that the system is in at time $t$ (one of the $s_1, \ldots, s_N$)
- For brevity, write $S_t$ instead of $X_t = S_t$
- Same for evidence $E_t$: at time $t$, one of $M$ possible outputs $o_1, \ldots, o_M$ can be observed. Use $O_t$ to denote the concrete observation at time $t$ (one of $o_1, \ldots, o_M$)
- For brevity, write $O_t$ instead of $E_t = O_t$

HMM interesting problems
- State estimation: what is the probability of state $s_i$, given a list of observations? I.e., what is $P(X_t = s_i \mid (O_1, \ldots, O_t))$?
- Most probable path: given observations $O_1, \ldots, O_t$, what is the most probable sequence of states $S_1, \ldots, S_t$?
- Learning HMMs: given observations $O_1, \ldots, O_t$, what is the most likely HMM to produce these observations?
- HMM applications: speech recognition, bioinformatics, consumer decision modelling, economics and finance

Simple HMM example
- Employee wants to infer food quality in cafeteria from co-worker's expression after lunch
- Three food qualities (hidden states): good (g), mediocre (m), bad (b)
- Three co-worker's expressions (observations): happy (h), indifferent (i), angry (a)
- One day's food quality influences the next (leftovers)
  [Figure: hidden chain $X_{t-2}, X_{t-1}, X_t, X_{t+1}$ with evidence nodes $E_{t-2}, E_{t-1}, E_t, E_{t+1}$]

Simple HMM example (cont.)
Start: $P(X_1 = g) = 0.3$, $P(X_1 = m) = 0.5$, $P(X_1 = b) = 0.2$

State transitions:
  $P(g \mid g) = 0.1$   $P(g \mid m) = 0.3$   $P(g \mid b) = 0$
  $P(m \mid g) = 0.7$   $P(m \mid m) = 0.6$   $P(m \mid b) = 0.8$
  $P(b \mid g) = 0.2$   $P(b \mid m) = 0.1$   $P(b \mid b) = 0.2$

Observation probabilities:
  $P(h \mid g) = 0.8$   $P(h \mid m) = 0.3$   $P(h \mid b) = 0.1$
  $P(i \mid g) = 0.2$   $P(i \mid m) = 0.5$   $P(i \mid b) = 0.2$
  $P(a \mid g) = 0$     $P(a \mid m) = 0.2$   $P(a \mid b) = 0.7$
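
One way to hold the cafeteria model in code is as three lookup tables, one per ingredient of the formal specification. The dictionary layout and the helper `is_normalized` are assumptions chosen for readability; only the numbers come from the slides.

```python
STATES = ("g", "m", "b")          # good, mediocre, bad
OBSERVATIONS = ("h", "i", "a")    # happy, indifferent, angry

# Initial state probabilities pi_i = P(X_1 = s_i).
PI = {"g": 0.3, "m": 0.5, "b": 0.2}

# Transition table: A[new][old] = P(X_t = new | X_{t-1} = old).
A = {
    "g": {"g": 0.1, "m": 0.3, "b": 0.0},
    "m": {"g": 0.7, "m": 0.6, "b": 0.8},
    "b": {"g": 0.2, "m": 0.1, "b": 0.2},
}

# Observation table: B[state][obs] = P(E_t = obs | X_t = state).
B = {
    "g": {"h": 0.8, "i": 0.2, "a": 0.0},
    "m": {"h": 0.3, "i": 0.5, "a": 0.2},
    "b": {"h": 0.1, "i": 0.2, "a": 0.7},
}


def is_normalized(values, tol=1e-9):
    """True if the given probabilities sum to 1 within a tolerance."""
    return abs(sum(values) - 1.0) < tol


# Every conditional distribution must sum to 1: the columns of A
# (fixed previous state) and the rows of B (fixed hidden state).
checks = {
    "initial": is_normalized(PI.values()),
    "transitions": all(is_normalized(A[new][old] for new in STATES) for old in STATES),
    "observations": all(is_normalized(B[s].values()) for s in STATES),
}
print(checks)
```

Indexing `A` as `A[new][old]` mirrors the convention $A_{ij} = P(X_t = s_i \mid X_{t-1} = s_j)$, so the formulas on the slides translate without swapping indices.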

Simple HMM example (cont.)
- Assume first three days are like this:
  [Figure: hidden food qualities m, g, m with observed expressions i, h, h]
- Employee sees only co-worker's expression sequence (i,h,h)
- What can be inferred about the food quality?
- Tackle some easier questions first

Probability of observation sequence
- What is the probability of the observation sequence (i,h,h)?
- Not very good way (but easier to see):
  $$P(i,h,h) = \sum_{(S_1, S_2, S_3)} P((i,h,h) \wedge (S_1, S_2, S_3)) = \sum_{(S_1, S_2, S_3)} P((i,h,h) \mid (S_1, S_2, S_3)) \, P(S_1, S_2, S_3)$$
  where both sums run over all 3-element state sequences $(S_1, S_2, S_3)$
- How to compute $P(S_1, S_2, S_3)$?
- How to compute $P((i,h,h) \mid (S_1, S_2, S_3))$?

Probability of observation sequence (cont.)
- How to compute $P(S_1, S_2, S_3)$? With conditional independence,
  $$P(S_1, S_2, S_3) = P(S_1) \, P(S_2 \mid S_1) \, P(S_3 \mid S_2)$$
  E.g., with $(S_1, S_2, S_3) = (m,b,b)$, we get $P = 0.5 \cdot 0.1 \cdot 0.2 = 0.01$
- How to compute $P((i,h,h) \mid (S_1, S_2, S_3))$?
  $$P((i,h,h) \mid (S_1, S_2, S_3)) = P(i \mid S_1) \, P(h \mid S_2) \, P(h \mid S_3)$$
  E.g., with $(S_1, S_2, S_3) = (m,b,b)$, we get $P = 0.5 \cdot 0.1 \cdot 0.1 = 0.005$

Probability of observation sequence (cont.)
- Problem: 27 possibilities for $(S_1, S_2, S_3)$, so calculating
  $$P(i,h,h) = \sum_{(S_1, S_2, S_3)} P((i,h,h) \mid (S_1, S_2, S_3)) \, P(S_1, S_2, S_3)$$
  over all 3-element state sequences requires 54 calculations (exponential growth in length of sequence)
- Better: use same trick as before (dynamic programming)
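
The exhaustive sum over all 27 state sequences is easy to write down, which makes it a useful reference before switching to dynamic programming. The following is a sketch under the assumption that the cafeteria tables are stored as nested dictionaries; the function names are made up for this example.

```python
from itertools import product

STATES = ("g", "m", "b")
PI = {"g": 0.3, "m": 0.5, "b": 0.2}
# A[new][old] = P(X_t = new | X_{t-1} = old)
A = {
    "g": {"g": 0.1, "m": 0.3, "b": 0.0},
    "m": {"g": 0.7, "m": 0.6, "b": 0.8},
    "b": {"g": 0.2, "m": 0.1, "b": 0.2},
}
# B[state][obs] = P(E_t = obs | X_t = state)
B = {
    "g": {"h": 0.8, "i": 0.2, "a": 0.0},
    "m": {"h": 0.3, "i": 0.5, "a": 0.2},
    "b": {"h": 0.1, "i": 0.2, "a": 0.7},
}


def path_prior(path):
    """P(S_1, ..., S_t) = P(S_1) * prod_k P(S_k | S_{k-1})."""
    prob = PI[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= A[cur][prev]
    return prob


def likelihood(obs, path):
    """P((O_1, ..., O_t) | (S_1, ..., S_t)) = prod_k P(O_k | S_k)."""
    prob = 1.0
    for o, s in zip(obs, path):
        prob *= B[s][o]
    return prob


def observation_probability(obs):
    """Sum likelihood * prior over all len(STATES) ** len(obs) state sequences."""
    return sum(likelihood(obs, path) * path_prior(path)
               for path in product(STATES, repeat=len(obs)))


obs = ("i", "h", "h")
prior_mbb = path_prior(("m", "b", "b"))
lik_mbb = likelihood(obs, ("m", "b", "b"))
total = observation_probability(obs)
print(prior_mbb, lik_mbb, total)
```

The two intermediate values reproduce the $(m,b,b)$ examples, and `total` is the quantity $P(i,h,h)$ that the dynamic programming approach computes without enumerating sequences.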

Dynamic programming for state estimation
- For observation sequence $(O_1, \ldots, O_n)$ and $t \le n$, define
  $$\alpha_t(i) = P(X_t = s_i \wedge (O_1, \ldots, O_t))$$
  as the probability of seeing $(O_1, \ldots, O_t)$ and ending in state $s_i$
- Recursive definition gives polynomial time calculation:
  $$\alpha_t(i) = \begin{cases} B_i(O_1) \, \pi_i & \text{if } t = 1 \\ B_i(O_t) \sum_{k=1}^{N} A_{ik} \, \alpha_{t-1}(k) & \text{if } t > 1 \end{cases}$$

Dynamic programming for state estimation
- Now easy to calculate probabilities of interest: because of
  $$\alpha_t(i) = P(X_t = s_i \wedge (O_1, \ldots, O_t))$$
  marginalizing yields
  $$P(O_1, \ldots, O_t) = \sum_{i=1}^{N} \alpha_t(i)$$
  and the definition of conditional probability gives
  $$P(X_t = s_i \mid (O_1, \ldots, O_t)) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}$$
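
State estimation in cafeteria example
Calculate state probabilities for observations (i,h,h):
  $\alpha_1(g) = 0.2 \cdot 0.3 = 0.06$
  $\alpha_1(m) = 0.5 \cdot 0.5 = 0.25$
  $\alpha_1(b) = 0.2 \cdot 0.2 = 0.04$
  $\alpha_2(g) = 0.8 \cdot (0.1 \cdot 0.06 + 0.3 \cdot 0.25 + 0 \cdot 0.04) = 0.0648$
  $\alpha_2(m) = 0.3 \cdot (0.7 \cdot 0.06 + 0.6 \cdot 0.25 + 0.8 \cdot 0.04) = 0.0672$
  $\alpha_2(b) = 0.1 \cdot (0.2 \cdot 0.06 + 0.1 \cdot 0.25 + 0.2 \cdot 0.04) = 0.0045$
  $\alpha_3(g) = 0.8 \cdot (0.1 \cdot 0.0648 + 0.3 \cdot 0.0672 + 0 \cdot 0.0045) = 0.021312$
  $\alpha_3(m) = 0.3 \cdot (0.7 \cdot 0.0648 + 0.6 \cdot 0.0672 + 0.8 \cdot 0.0045) = 0.026784$
  $\alpha_3(b) = 0.1 \cdot (0.2 \cdot 0.0648 + 0.1 \cdot 0.0672 + 0.2 \cdot 0.0045) = 0.002058$

State estimation in cafeteria example
- Most likely state at time 1 is m with
  $$P(X_1 = m \mid i) = \frac{0.25}{0.06 + 0.25 + 0.04} \approx 0.7143$$
- Most likely state at time 2 is m with
  $$P(X_2 = m \mid (i,h)) = \frac{0.0672}{0.0648 + 0.0672 + 0.0045} \approx 0.4923$$
- Most likely state at time 3 is m with
  $$P(X_3 = m \mid (i,h,h)) = \frac{0.026784}{0.021312 + 0.026784 + 0.002058} \approx 0.5340$$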
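
The recursion for $\alpha_t(i)$ and the normalization step can be sketched as follows for the cafeteria model. The dictionary representation and the names `forward` and `posterior` are assumptions for this illustration, not notation from the lecture.

```python
STATES = ("g", "m", "b")
PI = {"g": 0.3, "m": 0.5, "b": 0.2}
# A[new][old] = P(X_t = new | X_{t-1} = old)
A = {
    "g": {"g": 0.1, "m": 0.3, "b": 0.0},
    "m": {"g": 0.7, "m": 0.6, "b": 0.8},
    "b": {"g": 0.2, "m": 0.1, "b": 0.2},
}
# B[state][obs] = P(E_t = obs | X_t = state)
B = {
    "g": {"h": 0.8, "i": 0.2, "a": 0.0},
    "m": {"h": 0.3, "i": 0.5, "a": 0.2},
    "b": {"h": 0.1, "i": 0.2, "a": 0.7},
}


def forward(obs):
    """Return [alpha_1, ..., alpha_t], each a dict mapping state -> alpha value."""
    # Base case: alpha_1(i) = B_i(O_1) * pi_i
    alphas = [{s: B[s][obs[0]] * PI[s] for s in STATES}]
    for o in obs[1:]:
        prev = alphas[-1]
        # Recursive case: alpha_t(i) = B_i(O_t) * sum_k A_ik * alpha_{t-1}(k)
        alphas.append({s: B[s][o] * sum(A[s][k] * prev[k] for k in STATES)
                       for s in STATES})
    return alphas


def posterior(alpha_t):
    """Normalize alpha_t to obtain P(X_t = s_i | O_1, ..., O_t)."""
    total = sum(alpha_t.values())
    return {s: a / total for s, a in alpha_t.items()}


alphas = forward(("i", "h", "h"))
posteriors = [posterior(a) for a in alphas]
for t, post in enumerate(posteriors, start=1):
    best = max(post, key=post.get)
    print(t, best, round(post[best], 4))
```

Each time step costs $N^2$ multiplications, so the whole pass is linear in the length of the observation sequence.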

Inferring most probable path
- For given observation sequence $(O_1, \ldots, O_t)$, find state sequence $(S_1, \ldots, S_t)$ with
  $$P((S_1, \ldots, S_t) \mid (O_1, \ldots, O_t)) \to \max$$
- Call this best sequence $(S_1^*, \ldots, S_t^*)$. Slow idea to calculate:
  $$P((S_1, \ldots, S_t) \mid (O_1, \ldots, O_t)) = \frac{P((O_1, \ldots, O_t) \mid (S_1, \ldots, S_t)) \, P(S_1, \ldots, S_t)}{P(O_1, \ldots, O_t)}$$
- $(S_1^*, \ldots, S_t^*)$ is not the sequence of most likely states!

Dynamic programming for most prob. path
- Use dynamic programming: for each state $s_i$ and time $t$, calculate the most probable path that ends in $s_i$ at $t$: $\mathrm{mpp}_t(i)$
- Can do this recursively (as before)
- Key insight: $\mathrm{mpp}_t(i)$ can be calculated from
  - all $\mathrm{mpp}_{t-1}(j)$ that are one state shorter
  - transition probabilities $P(X_t = s_i \mid X_{t-1} = s_j)$
  - probability $B_i(O_t)$ of observing $O_t$ in state $s_i$
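
Viterbi algorithm
- Fleshing out these ideas gives the Viterbi algorithm:
  $$\delta_t(i) = \max_{S_1, \ldots, S_{t-1}} P((S_1, \ldots, S_{t-1}) \wedge X_t = s_i \wedge (O_1, \ldots, O_t))$$
- $\mathrm{mpp}_t(i)$ is the path that achieves probability $\delta_t(i)$
- Recursive formula:
  $$\delta_t(i) = \begin{cases} B_i(O_1) \, \pi_i & \text{if } t = 1 \\ \max_j \{ B_i(O_t) \, A_{ij} \, \delta_{t-1}(j) \} & \text{if } t > 1 \end{cases}$$
- Then, $(S_1^*, \ldots, S_t^*)$ is $\mathrm{mpp}_t(i)$ with final state $S_t^* = s_i$ such that $i = \arg\max_i \delta_t(i)$

Viterbi algorithm on cafeteria example
Calculate most probable path for observations (i,h,h):
  $\delta_1(g) = 0.2 \cdot 0.3 = 0.06$
  $\delta_1(m) = 0.5 \cdot 0.5 = 0.25$
  $\delta_1(b) = 0.2 \cdot 0.2 = 0.04$
  $\delta_2(g) = \max\{0.8 \cdot 0.1 \cdot 0.06,\ 0.8 \cdot 0.3 \cdot 0.25,\ 0.8 \cdot 0 \cdot 0.04\} = 0.06$
  $\delta_2(m) = \max\{0.3 \cdot 0.7 \cdot 0.06,\ 0.3 \cdot 0.6 \cdot 0.25,\ 0.3 \cdot 0.8 \cdot 0.04\} = 0.045$
  $\delta_2(b) = \max\{0.1 \cdot 0.2 \cdot 0.06,\ 0.1 \cdot 0.1 \cdot 0.25,\ 0.1 \cdot 0.2 \cdot 0.04\} = 0.0025$
  $\delta_3(g) = \max\{0.8 \cdot 0.1 \cdot 0.06,\ 0.8 \cdot 0.3 \cdot 0.045,\ 0.8 \cdot 0 \cdot 0.0025\} = 0.0108$
  $\delta_3(m) = \max\{0.3 \cdot 0.7 \cdot 0.06,\ 0.3 \cdot 0.6 \cdot 0.045,\ 0.3 \cdot 0.8 \cdot 0.0025\} = 0.0126$
  $\delta_3(b) = \max\{0.1 \cdot 0.2 \cdot 0.06,\ 0.1 \cdot 0.1 \cdot 0.045,\ 0.1 \cdot 0.2 \cdot 0.0025\} = 0.0012$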
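
A compact way to implement the $\delta_t(i)$ recursion is to store, for every state and time step, which predecessor achieved the maximum, and then follow these back-pointers from the best final state. The sketch below does this for the cafeteria model; the back-pointer bookkeeping and all identifiers are assumptions of this example rather than details given on the slides.

```python
STATES = ("g", "m", "b")
PI = {"g": 0.3, "m": 0.5, "b": 0.2}
# A[new][old] = P(X_t = new | X_{t-1} = old)
A = {
    "g": {"g": 0.1, "m": 0.3, "b": 0.0},
    "m": {"g": 0.7, "m": 0.6, "b": 0.8},
    "b": {"g": 0.2, "m": 0.1, "b": 0.2},
}
# B[state][obs] = P(E_t = obs | X_t = state)
B = {
    "g": {"h": 0.8, "i": 0.2, "a": 0.0},
    "m": {"h": 0.3, "i": 0.5, "a": 0.2},
    "b": {"h": 0.1, "i": 0.2, "a": 0.7},
}


def viterbi(obs):
    """Return (most probable state path, its joint probability with obs)."""
    # Base case: delta_1(i) = B_i(O_1) * pi_i
    delta = {s: B[s][obs[0]] * PI[s] for s in STATES}
    backpointers = []  # backpointers[k][s] = best predecessor of s at step k + 2
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for s in STATES:
            # B_i(O_t) is the same for all j, so maximize A_ij * delta_{t-1}(j).
            best_prev = max(STATES, key=lambda j: A[s][j] * delta[j])
            pointers[s] = best_prev
            new_delta[s] = B[s][o] * A[s][best_prev] * delta[best_prev]
        backpointers.append(pointers)
        delta = new_delta
    # Pick the best final state, then walk the back-pointers to time 1.
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


best_path, best_prob = viterbi(("i", "h", "h"))
print(best_path, best_prob)
```

For the observations (i,h,h) this yields the path m, g, m, which differs from picking the individually most likely state at each time step.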

Viterbi algorithm on cafeteria example
- Highest $\delta_3$ value is $\delta_3(m) = 0.0126$, so $S_3^* = m$
- Work backwards $t = 3 \to t = 2$: $\delta_3(m)$ is achieved by a transition $g \to m$, so $S_2^* = g$
- One more step $t = 2 \to t = 1$: $\delta_2(g)$ is achieved by a transition $m \to g$, so $S_1^* = m$
- Most probable path is therefore (m,g,m)
- Not the same as sequence of most probable states (m,m,m)!

Sample application: speech recognition
- Have signals, want to find words that generate the signals, i.e. maximize $P(\text{words} \mid \text{signals})$
- With Bayes' rule, get
  $$P(\text{words} \mid \text{signals}) = \alpha \, \underbrace{P(\text{signals} \mid \text{words})}_{\text{acoustic model}} \, \underbrace{P(\text{words})}_{\text{language model}}$$
  where $\alpha$ is a normalization constant
- Acoustic model comprises pronunciation model and phone model
- A phone is an atomic speech sound; a phoneme is a set of phones that is indistinguishable in a language

Phone models
- Sound is discretized, split into frames (typically 30 ms long) and represented by features
  [Figure: analog acoustic signal; sampled, quantized digital signal; frames with features]
- Phone model is $P(\text{feature} \mid \text{phone})$

Pronunciation models
- Each word is represented as a distribution over phone sequences, implemented as a transition model
  [Figure: transition model for "tomato": [t], then [ow] (0.2) or [ah] (0.8), then [m], then [ey] (0.5) or [aa] (0.5), then [t], then [ow]; all other transitions have probability 1.0]
- $P([\text{towmeytow}] \mid \text{"tomato"}) = P([\text{towmaatow}] \mid \text{"tomato"}) = 0.1$
- $P([\text{tahmeytow}] \mid \text{"tomato"}) = P([\text{tahmaatow}] \mid \text{"tomato"}) = 0.4$
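
The four pronunciation probabilities for "tomato" follow from multiplying the branch probabilities along each route through the transition model. The sketch below assumes a simplified encoding in which the model is a list of positions, each offering alternative phones; this layout and the 0.5 weight for the [aa] branch (inferred from the stated path probabilities) are assumptions of the example.

```python
from itertools import product

# Hypothetical encoding of the "tomato" model: one list of
# (phone, branch probability) alternatives per position in the word.
TOMATO = [
    [("t", 1.0)],
    [("ow", 0.2), ("ah", 0.8)],
    [("m", 1.0)],
    [("ey", 0.5), ("aa", 0.5)],
    [("t", 1.0)],
    [("ow", 1.0)],
]


def pronunciations(model):
    """Map every phone sequence to the product of its branch probabilities."""
    result = {}
    for choice in product(*model):
        phones = "".join(phone for phone, _ in choice)
        prob = 1.0
        for _, branch_prob in choice:
            prob *= branch_prob
        result[phones] = prob
    return result


probs = pronunciations(TOMATO)
for phones, prob in sorted(probs.items()):
    print(phones, prob)
```

Because the alternatives at each position sum to 1, the probabilities of all four pronunciations also sum to 1.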

Language models
- Prior probability $P(w_1, \ldots, w_n)$ of word sequences is modeled with the Markov assumption (bigram model):
  $$P(w_1, \ldots, w_n) = P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})$$
- Obtain conditional probabilities by analyzing large texts
- Can be improved by a model of the language grammar

Summary
- Temporal/sequential reasoning achieved by random processes
- Markov chains: current state depends only on previous state
- Markov chains widely used in simulations
- Hidden Markov models when states are not observable
- State estimation and most probable path by dynamic programming: only linear time/space complexity
- Sample HMM application: speech recognition
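
As an illustration of the bigram factorization from the language-model slide, the sketch below estimates the probabilities by counting word pairs. The three-sentence corpus is invented for this example, $P(w_1)$ is estimated from sentence-initial words (an assumption, since the slides do not say how it is obtained), and no smoothing is applied, so unseen pairs get probability 0.

```python
from collections import Counter


def train_bigram(sentences):
    """Count sentence starts, context words, and adjacent word pairs.

    Assumes every sentence contains at least one word.
    """
    starts, contexts, pairs = Counter(), Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        starts[words[0]] += 1
        for prev, cur in zip(words, words[1:]):
            contexts[prev] += 1
            pairs[(prev, cur)] += 1
    return starts, contexts, pairs


def sequence_probability(words, starts, contexts, pairs):
    """P(w_1, ..., w_n) = P(w_1) * prod_{i>=2} P(w_i | w_{i-1}), by relative frequency."""
    prob = starts[words[0]] / sum(starts.values())
    for prev, cur in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never seen in the corpus
        prob *= pairs[(prev, cur)] / contexts[prev]
    return prob


# Hypothetical toy corpus standing in for the "large texts" of the slide.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram(corpus)
p = sequence_probability("the cat sat".split(), *model)
print(p)
```

Here "the" starts every sentence, "cat" follows "the" in two of three cases, and "sat" follows "cat" in one of two, which gives $1 \cdot \tfrac{2}{3} \cdot \tfrac{1}{2} = \tfrac{1}{3}$.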
More informationReasoning under Uncertainty: Intro to Probability
Reasoning under Uncertainty: Intro to Probability Computer Science cpsc322, Lecture 24 (Textbook Chpt 6.1, 6.1.1) March, 15, 2010 CPSC 322, Lecture 24 Slide 1 To complete your Learning about Logics Review
More informationStatistical Methods for NLP
Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning
More informationStatistical NLP Spring Digitizing Speech
Statistical NLP Spring 2008 Lecture 10: Acoustic Models Dan Klein UC Berkeley Digitizing Speech 1 Frame Extraction A frame (25 ms wide) extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon
More informationDigitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ...
Statistical NLP Spring 2008 Digitizing Speech Lecture 10: Acoustic Models Dan Klein UC Berkeley Frame Extraction A frame (25 ms wide extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon Arnfield
More informationMachine Learning for Data Science (CS4786) Lecture 19
Machine Learning for Data Science (CS4786) Lecture 19 Hidden Markov Models Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2017fa/ Quiz Quiz Two variables can be marginally independent but not
More informationHidden Markov Models
CS769 Spring 2010 Advanced Natural Language Processing Hidden Markov Models Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Part-of-Speech Tagging The goal of Part-of-Speech (POS) tagging is to label each
More informationLEARNING DYNAMIC SYSTEMS: MARKOV MODELS
LEARNING DYNAMIC SYSTEMS: MARKOV MODELS Markov Process and Markov Chains Hidden Markov Models Kalman Filters Types of dynamic systems Problem of future state prediction Predictability Observability Easily
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationHidden Markov Models
Hidden Markov Models Slides mostly from Mitch Marcus and Eric Fosler (with lots of modifications). Have you seen HMMs? Have you seen Kalman filters? Have you seen dynamic programming? HMMs are dynamic
More informationOutline. Logical Agents. Logical Reasoning. Knowledge Representation. Logical reasoning Propositional Logic Wumpus World Inference
Outline Logical Agents ECE57 Applied Artificial Intelligence Spring 008 Lecture #6 Logical reasoning Propositional Logic Wumpus World Inference Russell & Norvig, chapter 7 ECE57 Applied Artificial Intelligence
More informationApproximate Inference
Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate
More informationNgram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer
+ Ngram Review October13, 2017 Professor Meteer CS 136 Lecture 10 Language Modeling Thanks to Dan Jurafsky for these slides + ASR components n Feature Extraction, MFCCs, start of Acoustic n HMMs, the Forward
More informationParametric Models Part III: Hidden Markov Models
Parametric Models Part III: Hidden Markov Models Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2014 CS 551, Spring 2014 c 2014, Selim Aksoy (Bilkent
More informationCSCI 5832 Natural Language Processing. Today 2/19. Statistical Sequence Classification. Lecture 9
CSCI 5832 Natural Language Processing Jim Martin Lecture 9 1 Today 2/19 Review HMMs for POS tagging Entropy intuition Statistical Sequence classifiers HMMs MaxEnt MEMMs 2 Statistical Sequence Classification
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationCS 7180: Behavioral Modeling and Decision- making in AI
CS 7180: Behavioral Modeling and Decision- making in AI Learning Probabilistic Graphical Models Prof. Amy Sliva October 31, 2012 Hidden Markov model Stochastic system represented by three matrices N =
More informationCMSC 723: Computational Linguistics I Session #5 Hidden Markov Models. The ischool University of Maryland. Wednesday, September 30, 2009
CMSC 723: Computational Linguistics I Session #5 Hidden Markov Models Jimmy Lin The ischool University of Maryland Wednesday, September 30, 2009 Today s Agenda The great leap forward in NLP Hidden Markov
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2011 1 HMM Lecture Notes Dannie Durand and Rose Hoberman October 11th 1 Hidden Markov Models In the last few lectures, we have focussed on three problems
More informationQuantifying uncertainty & Bayesian networks
Quantifying uncertainty & Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2016 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition,
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 25&29 January 2018 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More informationRobert Collins CSE586 CSE 586, Spring 2015 Computer Vision II
CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter Recall: Modeling Time Series State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates
More informationCSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Professor Wei-Min Shen Week 8.1 and 8.2
CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Professor Wei-Min Shen Week 8.1 and 8.2 Status Check Projects Project 2 Midterm is coming, please do your homework!
More informationPengju
Introduction to AI Chapter13 Uncertainty Pengju Ren@IAIR Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule Example: Car diagnosis Wumpus World Environment Squares
More informationLearning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1
Learning Objectives At the end of the class you should be able to: identify a supervised learning problem characterize how the prediction is a function of the error measure avoid mixing the training and
More informationReasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule
Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule Alan Mackworth UBC CS 322 Uncertainty 2 March 13, 2013 Textbook 6.1.3 Lecture Overview Recap: Probability & Possible World Semantics
More informationWe Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named
We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,
More informationRecall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series
Recall: Modeling Time Series CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates
More informationCSC401/2511 Spring CSC401/2511 Natural Language Computing Spring 2019 Lecture 5 Frank Rudzicz and Chloé Pou-Prom University of Toronto
CSC401/2511 Natural Language Computing Spring 2019 Lecture 5 Frank Rudzicz and Chloé Pou-Prom University of Toronto Revisiting PoS tagging Will/MD the/dt chair/nn chair/?? the/dt meeting/nn from/in that/dt
More informationAnnouncements. CS 188: Artificial Intelligence Fall Markov Models. Example: Markov Chain. Mini-Forward Algorithm. Example
CS 88: Artificial Intelligence Fall 29 Lecture 9: Hidden Markov Models /3/29 Announcements Written 3 is up! Due on /2 (i.e. under two weeks) Project 4 up very soon! Due on /9 (i.e. a little over two weeks)
More information