CS 188: Artificial Intelligence: Advanced HMMs (Recap: Reasoning Over Time, Demo Bonanza, Speech Recognition, Start Machine Learning)


CS 188: Artificial Intelligence: Advanced HMMs
Dan Klein, Pieter Abbeel, University of California, Berkeley

Demo Bonanza!

Today
- HMMs
- Demo bonanza!
- Most likely explanation queries
- Speech recognition: a massive HMM! (Details of this section not required)
- Start machine learning

Recap: Reasoning Over Time [demo: stationary]

Markov models: a chain X1 -> X2 -> X3 -> X4 over the states rain and sun. (Figure: two-state transition diagram with probabilities 0.7 and 0.3.)

Hidden Markov models: hidden states X1 ... X5 with evidence variables E1 ... E5. Emission probabilities:

  X     E             P
  rain  umbrella      0.9
  rain  no umbrella   0.1
  sun   umbrella      0.2
  sun   no umbrella   0.8
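As a small illustration (not from the slides) of the Markov model recap and the [demo: stationary] reference, the sketch below pushes a belief forward through the rain/sun chain. The 0.7 stay / 0.3 switch transition probabilities are an assumption read off the figure; with them, the belief converges to the uniform stationary distribution.

```python
# Mini-forward update for the rain/sun Markov chain (sketch).
# Assumption: stay in the same state with probability 0.7, switch with 0.3.
TRANSITION = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun":  {"rain": 0.3, "sun": 0.7},
}

def mini_forward(belief, steps):
    """Push a belief over {rain, sun} forward `steps` time steps."""
    for _ in range(steps):
        belief = {x2: sum(belief[x1] * TRANSITION[x1][x2] for x1 in belief)
                  for x2 in TRANSITION}
    return belief

# Starting from certainty of sun, the belief converges to the stationary
# distribution (uniform here, since the chain is symmetric).
print(mini_forward({"sun": 1.0, "rain": 0.0}, 50))  # ~{'rain': 0.5, 'sun': 0.5}
```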

Recap: Filtering
- Elapse time: compute P(X_t | e_1:t-1)
- Observe: compute P(X_t | e_1:t)
Belief <P(rain), P(sun)>: <0.5, 0.5> (prior on X1) -> <0.82, 0.18> (observe E1) -> <0.63, 0.37> (elapse time) -> <0.88, 0.12> (observe E2), as sketched in code below. [demo: exact filtering]

Recap: Particle Filtering
Particles: track samples of states rather than an explicit distribution. Each step: elapse, weight, resample. (Figure: a particle set such as (1,2), (3,1), (1,3), (2,2) is moved forward, weighted, e.g. w=.9, .4, .2, .1, and resampled into a new particle set.) [demo: particle filtering]

Particle Filtering: Robot Localization
In robot localization:
- We know the map, but not the robot's position
- Observations may be vectors of range finder readings
- State space and readings are typically continuous (works basically like a very fine grid), so we cannot store B(X)
- Particle filtering is a main technique
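The following sketch (not the course's reference code) implements the elapse/observe updates from the filtering recap above. Assuming the 0.7/0.3 transitions and the umbrella emission table, and umbrella observations at both steps, it reproduces the belief sequence on the slide.

```python
# Exact filtering on the rain/sun umbrella HMM (sketch).
TRANSITION = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun":  {"rain": 0.3, "sun": 0.7}}
EMISSION = {"rain": {"umbrella": 0.9, "no umbrella": 0.1},
            "sun":  {"umbrella": 0.2, "no umbrella": 0.8}}

def elapse_time(belief):
    """Compute P(X_t | e_1:t-1) from P(X_t-1 | e_1:t-1)."""
    return {x2: sum(belief[x1] * TRANSITION[x1][x2] for x1 in belief)
            for x2 in TRANSITION}

def observe(belief, evidence):
    """Compute P(X_t | e_1:t): weight by P(e_t | X_t), then normalize."""
    unnormalized = {x: belief[x] * EMISSION[x][evidence] for x in belief}
    total = sum(unnormalized.values())
    return {x: p / total for x, p in unnormalized.items()}

belief = {"rain": 0.5, "sun": 0.5}    # prior on X1
belief = observe(belief, "umbrella")  # -> ~<0.82, 0.18>
belief = elapse_time(belief)          # -> ~<0.63, 0.37>
belief = observe(belief, "umbrella")  # -> ~<0.88, 0.12>
print(belief)
```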

Robot Mapping
SLAM: Simultaneous Localization And Mapping
- We do not know the map or our location
- State consists of position AND map!
- Main techniques: Kalman filtering (Gaussian HMMs) and particle methods
(Figure: DP-SLAM, Ron Parr)

Dynamic Bayes Nets (DBNs)
- We want to track multiple variables over time, using multiple sources of evidence
- Idea: repeat a fixed Bayes net structure at each time; variables from time t can condition on those from t-1
- (Figure: at t = 1, 2, 3, variables G1a, G2a, G3a and G1b, G2b, G3b with evidence E1a, E1b, E2a, E2b, E3a, E3b)
- Dynamic Bayes nets are a generalization of HMMs

DBN Particle Filters
A particle is a complete sample for a time step.
- Initialize: generate prior samples for the t=1 Bayes net. Example particle: G1a = ..., G1b = (5,3)
- Elapse time: sample a successor for each particle. Example successor: G2a = ..., G2b = (6,3)
- Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample. Likelihood: P(E1a | G1a) * P(E1b | G1b)
- Resample: select prior samples (tuples of values) in proportion to their likelihood
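To tie the particle-filtering recipe together, here is a generic loop as a sketch; sample_prior, sample_transition, and evidence_likelihood are hypothetical model-specific functions, not from the slides. For a DBN, each particle would be a tuple of values (one per variable) and the weight would be the product of the per-variable evidence likelihoods described above.

```python
import random

# Generic particle-filter loop (sketch). The three function arguments are
# hypothetical placeholders for the model: a prior sampler, a transition
# sampler, and an evidence likelihood P(e | x).
def particle_filter(sample_prior, sample_transition, evidence_likelihood,
                    evidence_sequence, num_particles=300):
    particles = [sample_prior() for _ in range(num_particles)]    # initialize
    for e in evidence_sequence:
        particles = [sample_transition(x) for x in particles]     # elapse time
        weights = [evidence_likelihood(e, x) for x in particles]  # observe / weight
        particles = random.choices(particles, weights=weights,    # resample
                                   k=num_particles)
    return particles
```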

Most Likely Explanation

State Trellis
A state trellis is a graph of states and transitions over time. (Figure: sun and rain states at each of four time steps, with weighted arcs between consecutive steps.)
- Each arc represents some transition
- Each arc has a weight
- Each path is a sequence of states
- The product of weights on a path is that sequence's probability along with the evidence
- The forward algorithm computes sums over paths; Viterbi computes best paths

HMMs: MLE Queries
HMMs are defined by:
- States X and observations E (hidden chain X1 ... X5 with evidence E1 ... E5)
- Initial distribution: P(X1)
- Transitions: P(X_t | X_t-1)
- Emissions: P(E_t | X_t)
New query: most likely explanation, argmax over x_1:t of P(x_1:t | e_1:t)
New method: the Viterbi algorithm

Forward / Viterbi Algorithms
Forward Algorithm (Sum): f_t[x_t] = P(x_t, e_1:t) = P(e_t | x_t) * sum over x_t-1 of P(x_t | x_t-1) * f_t-1[x_t-1]
Viterbi Algorithm (Max): m_t[x_t] = max over x_1:t-1 of P(x_1:t-1, x_t, e_1:t) = P(e_t | x_t) * max over x_t-1 of P(x_t | x_t-1) * m_t-1[x_t-1]
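A compact sketch of the Viterbi recurrence above on the umbrella HMM (not the course's reference implementation): it mirrors the forward algorithm but replaces the sum over predecessors with a max, and keeps backpointers so the best state sequence can be read off at the end.

```python
# Viterbi on the umbrella HMM (sketch): most likely state sequence given evidence.
STATES = ["rain", "sun"]
INITIAL = {"rain": 0.5, "sun": 0.5}
TRANSITION = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun":  {"rain": 0.3, "sun": 0.7}}
EMISSION = {"rain": {"umbrella": 0.9, "no umbrella": 0.1},
            "sun":  {"umbrella": 0.2, "no umbrella": 0.8}}

def viterbi(evidence):
    # m[x] = max over paths ending in x of P(path, evidence so far)
    m = {x: INITIAL[x] * EMISSION[x][evidence[0]] for x in STATES}
    backpointers = []
    for e in evidence[1:]:
        best_prev = {x2: max(STATES, key=lambda x1: m[x1] * TRANSITION[x1][x2])
                     for x2 in STATES}
        m = {x2: m[best_prev[x2]] * TRANSITION[best_prev[x2]][x2] * EMISSION[x2][e]
             for x2 in STATES}
        backpointers.append(best_prev)
    # Trace back the best path from the best final state.
    last = max(STATES, key=lambda x: m[x])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))

print(viterbi(["umbrella", "umbrella", "no umbrella"]))
```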

Natural Language
- Speech technologies (e.g. Siri): automatic speech recognition (ASR), text-to-speech synthesis (TTS), dialog systems
- Language processing technologies: question answering, machine translation, web search, text classification, spam filtering, etc.

Digitizing Speech

Speech Recognition

Speech in an Hour
Speech input is an acoustic waveform. (Figure: waveform of "s p ee ch l a b", e.g. the "l" to "a" transition. Figure: Simon Arnfield, http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/)

Spectral Analysis
- Frequency gives pitch; amplitude gives volume
- Sampling at ~8 kHz (phone), ~16 kHz (mic) (kHz = 1000 cycles/sec)
- The Fourier transform of the wave is displayed as a spectrogram; darkness indicates energy at each frequency (see the short sketch below)
(Figure: spectrogram of "s p ee ch l a b". Human ear figure: depion.blogspot.com)

Why These Peaks?
Articulator process: vocal cord vibrations create harmonics; the mouth is an amplifier; depending on the shape of the mouth, some harmonics are amplified more than others.

Part of [ae] from "lab": a complex wave repeating nine times, plus a smaller wave that repeats 4x for every large cycle. Large wave: frequency of 250 Hz (9 times in .036 seconds). Small wave: roughly 4 times this, or roughly 1000 Hz.

Resonances of the Vocal Tract
The human vocal tract as an open tube: open end (lips), closed end (glottis), length about 17.5 cm. Air in a tube of a given length will tend to vibrate at the resonance frequency of the tube. Constraint: the pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end. (Figure: W. Barry Speech Science slides)
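As an aside illustrating the spectral-analysis idea (not part of the lecture), the sketch below builds a synthetic wave with a 250 Hz fundamental plus a weaker component near 1000 Hz, the frequencies from the [ae] example, and computes its spectrogram with scipy.signal; the dark (high-energy) bands in Sxx sit at the harmonic frequencies.

```python
import numpy as np
from scipy.signal import spectrogram

# Sketch: a synthetic "vowel-like" signal with a 250 Hz fundamental and a
# harmonic near 1000 Hz, sampled at 16 kHz as on the slide.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
wave = np.sin(2 * np.pi * 250 * t) + 0.4 * np.sin(2 * np.pi * 1000 * t)

# Short-time Fourier analysis: Sxx[f, t] is the energy at each frequency and
# time slice, which is exactly what a spectrogram displays.
freqs, times, Sxx = spectrogram(wave, fs=fs, nperseg=512)
peak = freqs[Sxx.mean(axis=1).argmax()]
print(f"strongest frequency band: ~{peak:.0f} Hz")  # ~250 Hz
```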

Spectrum Shapes [demo]
(Figures: Mark Liberman; vowel [i] sung at successively higher pitches: F#2, A2, C3, F#3, A3, C4 (middle C), A4; graphs: Ratree Wayland)

Acoustic Feature Sequence
- Time slices are translated into acoustic feature vectors (~39 real numbers per slice): ... e12 e13 e14 e15 e16 ...
- These are the observations E; now we need the hidden states X

Speech State Space
HMM specification:
- P(E | X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
- P(X | X') encodes how sounds can be strung together
State space:
- We will have one state for each sound in each word
- Mostly, states advance sound by sound
- Build a little state graph for each word and chain them together to form the state space X
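A sketch of the "one state per sound per word" construction described above; the tiny phoneme dictionary and the 0.5 self-loop probability are assumptions for illustration, not values from the lecture.

```python
# Sketch: build (word, sound_index) states, chain word graphs together.
PHONEMES = {            # hypothetical toy pronunciation dictionary
    "speech": ["s", "p", "iy", "ch"],
    "lab":    ["l", "ae", "b"],
}

def build_state_space(words, self_loop=0.5):
    """Return transition probabilities over (word, sound_index) states.
    Each state mostly advances sound by sound (a self-loop lets a sound span
    several time slices); the last sound of a word can enter any next word."""
    transitions = {}
    word_starts = [(w, 0) for w in words]
    for w in words:
        n = len(PHONEMES[w])
        for i in range(n):
            state = (w, i)
            if i + 1 < n:
                transitions[state] = {state: self_loop, (w, i + 1): 1 - self_loop}
            else:  # end of word: uniformly enter the start of some word
                nxt = {s: (1 - self_loop) / len(word_starts) for s in word_starts}
                transitions[state] = {state: self_loop, **nxt}
    return transitions

print(build_state_space(["speech", "lab"])[("speech", 3)])
```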

Transitions with a Bigram Model
Training counts (turned into probabilities in the sketch below):
    198015222   the first
    194623024   the same
    168504105   the following
    158562063   the world
     14112454   the door
  -----------------------------------
  23135851162   the *

States in a Word (Figure: Huang et al, p. 618)

Decoding
- Finding the words given the acoustics is an HMM inference problem
- Which state sequence x_1:T is most likely given the evidence e_1:T?
- From the sequence x, we can simply read off the words
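As referenced above, a tiny sketch that turns the bigram training counts into transition probabilities P(w | "the"); only the listed continuations are included, so most of the probability mass belongs to continuations not shown here.

```python
# Bigram transition probabilities from the training counts on the slide:
# P(w | "the") = count("the", w) / count("the", *).
counts = {
    "first": 198015222,
    "same": 194623024,
    "following": 168504105,
    "world": 158562063,
    "door": 14112454,
}
total = 23135851162  # count of "the *" from the slide

for w, c in counts.items():
    print(f"P({w} | the) = {c / total:.4f}")
# e.g. P(first | the) ~= 0.0086; the rest of the mass is on the long tail.
```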

End of Part II!
Now we are done with our unit on probabilistic reasoning. Last part of class: machine learning.

Machine Learning
- Up until now: how to use a model to make optimal decisions
- Machine learning: how to acquire a model from data / experience
  - Learning parameters (e.g. probabilities)
  - Learning structure (e.g. BN graphs)
  - Learning hidden concepts (e.g. clustering)

Parameter Estimation
Estimating the distribution of a random variable:
- Elicitation: ask a human (why is this hard?)
- Empirically: use training data (learning!)
E.g.: for each outcome x, look at the empirical rate of that value, P_ML(x) = count(x) / total samples. This is the estimate that maximizes the likelihood of the data (a short sketch follows below). (Figure: a training sample of red (r) and blue (b) outcomes.)

Estimation: Smoothing
Relative frequencies are the maximum likelihood estimates. Another option is to consider the most likely parameter value given the data.
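A minimal sketch of the relative-frequency (maximum likelihood) estimate; the red/blue sample list is an assumed toy example, not data from the slides.

```python
from collections import Counter

# Relative-frequency (maximum likelihood) estimate of a distribution from
# samples: the empirical rate of each value.
def ml_estimate(samples):
    counts = Counter(samples)
    n = len(samples)
    return {x: c / n for x, c in counts.items()}

# Assumed toy training data: two red (r) draws and one blue (b) draw.
print(ml_estimate(["r", "r", "b"]))  # ~{'r': 0.67, 'b': 0.33}
```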

Estimation: Laplace Smoothing
Laplace's estimate: pretend you saw every outcome once more than you actually did:
  P_LAP(x) = (count(x) + 1) / (N + |X|)
Can derive this estimate with Dirichlet priors (see CS 281).

Laplace's estimate (extended): pretend you saw every outcome k extra times:
  P_LAP,k(x) = (count(x) + k) / (N + k|X|)
- What's Laplace with k = 0?
- k is the strength of the prior

Laplace for conditionals: smooth each condition independently:
  P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|)
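A minimal sketch of Laplace smoothing with strength k, using the same assumed red/blue samples as above; k = 0 recovers the maximum likelihood estimate, and a large k pulls the estimate toward uniform.

```python
from collections import Counter

# Laplace smoothing (sketch): pretend every outcome was seen k extra times.
def laplace_estimate(samples, domain, k=1):
    counts = Counter(samples)
    n = len(samples)
    return {x: (counts[x] + k) / (n + k * len(domain)) for x in domain}

samples = ["r", "r", "b"]  # assumed toy sample set, as above
print(laplace_estimate(samples, ["r", "b"], k=0))    # ML estimate: {'r': 2/3, 'b': 1/3}
print(laplace_estimate(samples, ["r", "b"], k=1))    # Laplace:     {'r': 0.6, 'b': 0.4}
print(laplace_estimate(samples, ["r", "b"], k=100))  # strong prior: close to uniform
```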