Speech Recognition. Lecture "Computerlinguistische Techniken" (Computational Linguistics Techniques), Alexander Koller. 4 December 2015

1 Speech Recognition. Lecture "Computerlinguistische Techniken", Alexander Koller, 4 December 2015

2 Speech Recognition

3-4 Speech recognition (automatic speech recognition = ASR)
[Figure: waveform of the utterance "she just had a baby" over time (s), segmented into the phones sh iy j ax s h ae dx ax b ey b iy; a zoomed panel shows the phones sh and iy of the word "she".]

5 The noisy channel model
Assumption: the sentence was distorted by a noisy transmission channel. We want to reconstruct the original sentence from the noisy signal.
[Figure: the sentence W ("one two three") passes through the noisy channel and becomes the observation O; decoding recovers "one two three".]
Question: given the acoustic input O, what is the most probable sentence W that could have been distorted into O?

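6 Noisy channel model
We formalize the reconstruction of W as follows:
$$\hat{W} = \arg\max_W P(W \mid O) = \arg\max_W \frac{P(O \mid W)\, P(W)}{P(O)} = \arg\max_W P(O \mid W)\, P(W)$$
The denominator P(O) can be dropped because it does not depend on W.
$P(O \mid W)$: the noisy channel; here, the acoustic model.
$P(W)$: the original signal; here, the language model.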
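The argmax above can be illustrated with a minimal Python sketch. The candidate sentences and their scores are invented toy values, not numbers from the lecture, and the function name `decode` is hypothetical. Working with log probabilities is an implementation choice that avoids underflow.

```python
import math

def decode(candidates, acoustic_logprob, lm_logprob):
    """Return the candidate W that maximises log P(O|W) + log P(W)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])

# Hypothetical toy scores for one fixed acoustic input O.
acoustic = {  # log P(O | W), the acoustic model
    "one two three": math.log(0.020),
    "won too three": math.log(0.025),
}
language = {  # log P(W), the language model
    "one two three": math.log(0.010),
    "won too three": math.log(0.0001),
}

best = decode(list(acoustic), acoustic, language)
print(best)
```

In this toy setting the acoustically slightly better candidate loses because the language model considers it far less likely.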

7-9 HMM-based ASR
[Figure: HMM for the vocabulary "one two three". Pronunciation lexicon: one = w ah n, two = t uw, three = th r iy. Each phone is expanded into a subphone HMM with three states (subscripts b, m, f), e.g. ah_b, ah_m, ah_f. Transitions between words carry language-model probabilities such as P(one), P(two), P(three), P(one | two), P(two | two), P(three | two).]

10 Overview
Use an HMM with these components:
- States are subphones (beginning/middle/end of a phone).
- Observations are vectors of acoustic features.
- The HMM encodes the phone sequence for every word in the lexicon.
- Language model = transitions from the end of one word to the beginning of the next.
We still have to work out:
- Which probability distribution over acoustic feature vectors?
- Training?
- Viterbi is clear, but is it efficient enough?

11-14 Spectrograms
[Figure: waveform (intensity over time) of "she just had a baby", phones sh iy j ax s h ae dx ax b ey b iy. A Fourier transform turns it into a spectrogram (frequency over time), in which the fundamental frequency F0 and the formants are marked.]

15 Acoustic features
[Figure: spectrogram of "she just had a baby" cut into frames (25 ms wide, one every 10 ms).]
Processing per frame: FT → magnitude spectrum → log-magnitude spectrum → inverse FFT → cepstrum.
Acoustic features for each frame (39-dimensional vector of floats):
- 12 cepstral coefficients
- 12 delta cepstral coefficients
- 12 double-delta cepstral coefficients
- 1 energy
- 1 delta energy
- 1 double-delta energy
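The framing and the 39-dimensional layout can be sketched as follows. This assumes that the 13 static values per frame (12 cepstral coefficients plus energy) are already computed, and it approximates deltas by a simple central difference; real front ends typically use a regression over a wider window. All function names are hypothetical.

```python
def frame_starts(num_samples, sample_rate, width_ms=25, step_ms=10):
    """Start indices of all full frames of width_ms, one every step_ms."""
    width = sample_rate * width_ms // 1000
    step = sample_rate * step_ms // 1000
    return list(range(0, num_samples - width + 1, step))

def deltas(frames):
    """Central difference per dimension; the first and last frame are clamped."""
    n = len(frames)
    out = []
    for t in range(n):
        prev, nxt = frames[max(t - 1, 0)], frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def stack_features(static):
    """static: list of 13-dim vectors -> list of 39-dim vectors."""
    d = deltas(static)
    dd = deltas(d)
    return [s + a + b for s, a, b in zip(static, d, dd)]

# One second of audio at 16 kHz gives 98 full frames.
print(len(frame_starts(16000, 16000)))

# Toy static features: every dimension grows by 1 per frame.
static = [[float(t)] * 13 for t in range(5)]
features = stack_features(static)
print(len(features[0]))
```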

16 Continuous probability distributions
Observations are now vectors of numbers, so the HMM must be adapted: the emission probabilities range over an infinite (strictly speaking, uncountable) set of values.
Such cases can be described with continuous probability distributions, given by a density function (pdf) f(x):
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\, dx$$

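17 The normal distribution
The most important continuous distribution is the normal (or Gaussian) distribution. It is uniquely characterized by its mean μ and variance σ²:
$$\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$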
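As a small numerical check, the sketch below evaluates the Gaussian density and approximates the probability of an interval by integrating the density with the midpoint rule. The helper names and the choice of integration rule are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Density of the normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def interval_probability(pdf, x1, x2, steps=10000):
    """Approximate P(x1 <= X <= x2) by the midpoint rule."""
    h = (x2 - x1) / steps
    return h * sum(pdf(x1 + (k + 0.5) * h) for k in range(steps))

# Probability of landing within one standard deviation of the mean.
p = interval_probability(lambda x: normal_pdf(x, 0.0, 1.0), -1.0, 1.0)
print(round(p, 4))

# A density value is not a probability: with a small variance it exceeds 1.
peak = normal_pdf(0.0, 0.0, 0.01)
print(peak > 1.0)
```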

18 Emission probabilities in the HMM
Now define the probability of emitting the D-dimensional vector o = (o_1, ..., o_D) in state j:
$$b_j(o) = \prod_{d=1}^{D} \mathcal{N}(o_d; \mu_{jd}, \sigma^2_{jd}) = \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi\sigma^2_{jd}}} \exp\left(-\frac{(o_d - \mu_{jd})^2}{2\sigma^2_{jd}}\right)$$
Remarks:
- This ignores statistical correlations between dimensions, but cepstral features are only weakly correlated.
- b_j(o) is not really a probability (but that is okay).
- A single Gaussian is too simple in practice; use Gaussian Mixture Models (GMMs) instead.
- Model parameters: μ_jd, σ²_jd for all j, d.
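A minimal sketch of this emission score, assuming one diagonal Gaussian per state as in the formula above. Computing log b_j(o) instead of b_j(o) is an implementation choice, not something the slide prescribes; it turns the product over dimensions into a sum and avoids underflow for 39-dimensional vectors. The function name is hypothetical.

```python
import math

def log_emission(o, means, variances):
    """log b_j(o) for one state j with per-dimension means and variances."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(o, means, variances)
    )

# Two-dimensional toy example with unit variances, evaluated at the mean.
at_mean = log_emission([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
print(at_mean)

# With a small variance the log density is positive, i.e. b_j(o) > 1.
sharp = log_emission([0.0], [0.0], [0.01])
print(sharp > 0.0)
```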

19 Decoding
Decoding = compute the best sequence of (sub)phones for the acoustic input according to the HMM. This can be done with Viterbi.
Problem: the runtime of Viterbi is O(N²T) for N states and T frames, and N is very large.
In practice one uses beam search:
- choose a factor θ < 1
- in step t+1, consider a state q_j as a predecessor only if V_t(j) > θ · max_i V_t(i)
- this speeds up the decoder dramatically and is useful in many other situations as well
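The pruning rule can be sketched as a Viterbi variant in log space, where V_t(j) > θ · max_i V_t(i) becomes log V_t(j) > log θ + max_i log V_t(i). The three-state left-to-right toy HMM, its Gaussian means, and all names below are assumptions made for illustration only.

```python
import math

NEG_INF = float("-inf")

def viterbi_beam(obs, states, log_init, log_trans, log_emit, theta=1e-3):
    """Viterbi in log space; only states inside the beam act as predecessors."""
    v = {s: log_init.get(s, NEG_INF) + log_emit(s, obs[0]) for s in states}
    back = []
    for o in obs[1:]:
        threshold = max(v.values()) + math.log(theta)
        active = [s for s in states if v[s] > threshold]
        new_v, ptr = {}, {}
        for s in states:
            best_prev, best_score = None, NEG_INF
            for p in active:
                score = v[p] + log_trans.get((p, s), NEG_INF)
                if score > best_score:
                    best_prev, best_score = p, score
            ptr[s] = best_prev
            new_v[s] = best_score + log_emit(s, o) if best_prev is not None else NEG_INF
        v = new_v
        back.append(ptr)
    last = max(v, key=v.get)
    if v[last] == NEG_INF:
        raise ValueError("no path survived; beam too narrow or model inconsistent")
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], v[last]

# Toy subphone HMM: begin -> middle -> final, one-dimensional observations.
states = ["b", "m", "f"]
means = {"b": 0.0, "m": 5.0, "f": 10.0}
half = math.log(0.5)
log_trans = {("b", "b"): half, ("b", "m"): half,
             ("m", "m"): half, ("m", "f"): half, ("f", "f"): 0.0}
log_init = {"b": 0.0}

def log_emit(state, o):
    """Unit-variance Gaussian log density around the state's mean."""
    return -0.5 * (math.log(2.0 * math.pi) + (o - means[state]) ** 2)

obs = [0.1, 0.2, 4.8, 5.1, 9.9]
path, score = viterbi_beam(obs, states, log_init, log_trans, log_emit, theta=1e-3)
print(path)
```

With a very small θ the beam keeps practically every state and the result coincides with plain Viterbi; a larger θ prunes more aggressively and may lose the best path.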

20-25 Training
[Figure: training pipeline. The recording is converted into feature vectors; a manual transcription ("one two three") provides the transcription; a text corpus yields the language model (LM). Starting from an initial HMM, EM training uses the feature vectors and the transcription.]

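26 EM for Gaussian HMMs
Adapt the M-step of the forward-backward algorithm to normally distributed emission probabilities:
$$\mu_{jd} = \frac{\sum_{t=1}^{T} \gamma_t(j)\, o_{td}}{\sum_{t=1}^{T} \gamma_t(j)} \qquad \sigma^2_{jd} = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (o_{td} - \mu_{jd})^2}{\sum_{t=1}^{T} \gamma_t(j)}$$
Here γ_t(j) is the posterior probability of being in state j at time t.
Initialize the transition probabilities to 0.5 (for allowed transitions) or to 0 (for forbidden ones).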
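The two update formulas can be sketched for a single state j as follows. The sketch assumes that the state posteriors γ_t(j) have already been computed by forward-backward, and it plugs the freshly updated mean into the variance formula, which is one common reading of the update. The function name `m_step` is hypothetical.

```python
def m_step(gamma_j, observations):
    """Re-estimate mean and variance vectors for one state.

    gamma_j[t]      posterior probability of being in the state at time t
    observations[t] D-dimensional feature vector at time t
    """
    total = sum(gamma_j)
    dims = len(observations[0])
    mu = [
        sum(g * o[d] for g, o in zip(gamma_j, observations)) / total
        for d in range(dims)
    ]
    var = [
        sum(g * (o[d] - mu[d]) ** 2 for g, o in zip(gamma_j, observations)) / total
        for d in range(dims)
    ]
    return mu, var

# Toy example: the third frame has posterior 0 and therefore no influence.
mu, var = m_step([1.0, 1.0, 0.0], [[1.0], [3.0], [100.0]])
print(mu, var)
```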

27 Hard EM
Using the forward-backward algorithm ("proper" or "soft" EM) is correct but slow. A common approximation in practice is Viterbi EM (aka "hard EM"):
- in every EM iteration, compute the single best state sequence with Viterbi
- pretend that this is the true state sequence and simply use maximum likelihood estimation
- repeat until the parameters converge (in theory they need not, but in practice they often do)
In practice this is 1-2 orders of magnitude faster than proper EM (Rodriguez & Torres 03).
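The maximum likelihood step of hard EM reduces to counting along the single best path. The sketch below assumes one-dimensional observations and estimates only transition probabilities and state means; the state sequence is taken as given (in a full system it would come from Viterbi), and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def hard_em_update(path, observations):
    """Maximum likelihood estimates from one state sequence treated as truth."""
    transitions = Counter(zip(path, path[1:]))
    outgoing = Counter(path[:-1])
    trans_prob = {(a, b): c / outgoing[a] for (a, b), c in transitions.items()}

    by_state = defaultdict(list)
    for state, o in zip(path, observations):
        by_state[state].append(o)
    means = {s: sum(values) / len(values) for s, values in by_state.items()}
    return trans_prob, means

# Toy state sequence and matching one-dimensional observations.
path = ["b", "b", "m", "m", "f"]
obs = [0.1, 0.2, 4.8, 5.1, 9.9]
trans_prob, means = hard_em_update(path, obs)
print(trans_prob)
print(means)
```

Compared with the soft update, every frame contributes to exactly one state, which corresponds to posteriors that are either 0 or 1.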

28 Evaluation
The usual error measure is the word error rate (WER):
$$\mathrm{WER} = 100 \cdot \frac{\text{Insertions} + \text{Substitutions} + \text{Deletions}}{\text{total words in correct transcript}}$$
Compute the minimal number of insertions, substitutions and deletions efficiently as the minimum edit distance (= Levenshtein distance).
REF:  i *** ** UM the PHONE IS      i LEFT THE portable **** PHONE UPSTAIRS last night
HYP:  i GOT IT TO the ***** FULLEST i LOVE TO  portable FORM OF    STORES   last night
Eval:   I   I  S      D     S         S    S            I    S     S
WER = 100 · (3 + 6 + 1) / 13 = 76.9%
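The computation can be sketched with the standard dynamic program for the Levenshtein distance over words, with unit cost for insertions, deletions and substitutions. The function name is hypothetical, and the example sentences are the REF and HYP lines from this slide, lowercased.

```python
def word_error_rate(ref, hyp):
    """WER in percent: minimum edit distance over words / reference length."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from the empty reference prefix
    for i, ref_word in enumerate(r, 1):
        cur = [i]
        for j, hyp_word in enumerate(h, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return 100.0 * prev[-1] / len(r)

ref = "i um the phone is i left the portable phone upstairs last night"
hyp = "i got it to the fullest i love to portable form of stores last night"
wer = word_error_rate(ref, hyp)
print(round(wer, 1))
```

Because insertions count as errors, the WER can exceed 100% when the hypothesis is much longer than the reference.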

29-30 State of the art (2012)
TASK | HOURS OF TRAINING DATA | DNN-HMM | GMM-HMM WITH SAME DATA | GMM-HMM WITH MORE DATA
SWITCHBOARD (TEST SET 1) | | | | (2,000 H)
SWITCHBOARD (TEST SET 2) | | | | (2,000 H)
ENGLISH BROADCAST NEWS | | | |
BING VOICE SEARCH (SENTENCE ERROR RATES) | | | |
GOOGLE VOICE INPUT | 5, | | | (22 5,870 H)
YOUTUBE | 1, | | |
DNN = deep neural networks (Hinton et al. 2012)

31 Summary
Speech recognition (ASR) is a central problem in computational linguistics.
The classical approach uses familiar building blocks:
- n-gram language models
- hidden Markov models with continuous output probabilities
Many algorithms can be applied directly.
Cutting edge: deep neural networks instead of HMMs.


More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics 4. Monte Carlo Methods Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 2017/18 Monte Carlo Method Any method which

More information

CS 136 Lecture 5 Acoustic modeling Phoneme modeling

CS 136 Lecture 5 Acoustic modeling Phoneme modeling + September 9, 2016 Professor Meteer CS 136 Lecture 5 Acoustic modeling Phoneme modeling Thanks to Dan Jurafsky for these slides + Directly Modeling Continuous Observations n Gaussians n Univariate Gaussians

More information

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann Lehrstuhl für Multimediakommunikation

More information

Organische Chemie IV: Organische Photochemie

Organische Chemie IV: Organische Photochemie rganische Chemie IV: rganische Photochemie Sommersemester 2006 Technische Universität München Klausur am 04.08.2006 ame; Vorname... Matrikel-r.... (Druckbuchstaben) geboren am... in...... (Eigenhändige

More information

Hidden Markov Models. AIMA Chapter 15, Sections 1 5. AIMA Chapter 15, Sections 1 5 1

Hidden Markov Models. AIMA Chapter 15, Sections 1 5. AIMA Chapter 15, Sections 1 5 1 Hidden Markov Models AIMA Chapter 15, Sections 1 5 AIMA Chapter 15, Sections 1 5 1 Consider a target tracking problem Time and uncertainty X t = set of unobservable state variables at time t e.g., Position

More information

Primzahltests und das Faktorisierungsproblem

Primzahltests und das Faktorisierungsproblem Primzahltests und das Faktorisierungsproblem Ausgewählte Folien zur Vorlesung Wintersemester 2007/2008 Dozent: Prof. Dr. J. Rothe Heinrich-Heine-Universität Düsseldorf http://ccc.cs.uni-duesseldorf.de/

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Hidden Markov Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 33 Introduction So far, we have classified texts/observations

More information

Statistical Sequence Recognition and Training: An Introduction to HMMs

Statistical Sequence Recognition and Training: An Introduction to HMMs Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with

More information

1 3 4 5 6 7 8 9 10 11 12 13 Convolutions in more detail material for this part of the lecture is taken mainly from the theano tutorial: http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html

More information

Hidden Markov Models

Hidden Markov Models 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework

More information

Deep Learning for Automatic Speech Recognition Part I

Deep Learning for Automatic Speech Recognition Part I Deep Learning for Automatic Speech Recognition Part I Xiaodong Cui IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Fall, 2018 Outline A brief history of automatic speech recognition Speech

More information

Robust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen

Robust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen Robust Speech Recognition in the Presence of Additive Noise Svein Gunnar Storebakken Pettersen A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of PHILOSOPHIAE DOCTOR

More information

Speech and Language Processing

Speech and Language Processing Speech and Language Processing Lecture 5 Neural network based acoustic and language models Information and Communications Engineering Course Takahiro Shinoaki 08//6 Lecture Plan (Shinoaki s part) I gives

More information

Ngram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer

Ngram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer + Ngram Review October13, 2017 Professor Meteer CS 136 Lecture 10 Language Modeling Thanks to Dan Jurafsky for these slides + ASR components n Feature Extraction, MFCCs, start of Acoustic n HMMs, the Forward

More information

Graphical Models Seminar

Graphical Models Seminar Graphical Models Seminar Forward-Backward and Viterbi Algorithm for HMMs Bishop, PRML, Chapters 13.2.2, 13.2.3, 13.2.5 Dinu Kaufmann Departement Mathematik und Informatik Universität Basel April 8, 2013

More information

Quantum physics from coarse grained classical probabilities

Quantum physics from coarse grained classical probabilities Quantum physics from coarse grained classical probabilities p z 0 x ψ y what is an atom? quantum mechanics : isolated object quantum field theory : excitation of complicated vacuum classical statistics

More information

Monaural speech separation using source-adapted models

Monaural speech separation using source-adapted models Monaural speech separation using source-adapted models Ron Weiss, Dan Ellis {ronw,dpwe}@ee.columbia.edu LabROSA Department of Electrical Enginering Columbia University 007 IEEE Workshop on Applications

More information

Übungen zur Quantenmechanik (T2)

Übungen zur Quantenmechanik (T2) Arnold Sommerfeld Center LudwigMaximiliansUniversität München Prof Dr Stefan Hofmann Wintersemester 08/9 Übungen zur Quantenmechanik (T) Übungsblatt, Besprechung vom 0 40 Aufgabe Impuls Zeigen Sie für

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Language Models Tobias Scheffer Stochastic Language Models A stochastic language model is a probability distribution over words.

More information

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models. Say Wei Foo, Yong Lian, Liang Dong. IEEE Transactions on Circuits and Systems for Video Technology, May 2004. Shankar

More information

Observable Operator Models

Observable Operator Models AUSTRIAN JOURNAL OF STATISTICS Volume 36 (2007), Number 1, 41 52 Observable Operator Models Ilona Spanczér 1 Dept. of Mathematics, Budapest University of Technology and Economics Abstract: This paper describes

More information

ASR using Hidden Markov Model : A tutorial

ASR using Hidden Markov Model : A tutorial ASR using Hidden Markov Model : A tutorial Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11 samudravijaya@gmail.com Tata Institute of Fundamental Research Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information