Speech Recognition. Lecture: Computerlinguistische Techniken (Computational Linguistics Techniques), Alexander Koller. 4 December 2015
1 Speech Recognition. Lecture: Computerlinguistische Techniken, Alexander Koller, 4 December 2015
2 Speech Recognition
3 Speech recognition (automatic speech recognition = ASR). [Figure: waveform of the utterance "she just had a baby", annotated with the phone sequence sh iy j ax s h ae dx ax b ey b iy over time.]
4 Speech recognition (automatic speech recognition = ASR). [Figure: the same waveform, plus a zoomed-in view of the phones sh and iy in "she".]
5 The noisy-channel model. Assumption: the sentence was distorted by a noisy transmission channel; we want to reconstruct the original sentence from the noisy signal. [Figure: the source sentence W = "one two three" passes through the noisy channel and comes out as the acoustic signal O; decoding recovers "one two three".] Question: given the acoustic input O, what is the most probable sentence W that could have been distorted into O?
6 Noisy-channel model. We formalize the reconstruction of W as follows:

Ŵ = argmax_W P(W | O) = argmax_W P(O | W) · P(W) / P(O) = argmax_W P(O | W) · P(W)

Here P(O | W) models the noisy channel (the acoustic model) and P(W) the distribution over original signals (the language model).
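The argmax decomposition above can be sketched directly in code; `acoustic_logprob` and `lm_logprob` are hypothetical scoring functions standing in for the acoustic model log P(O | W) and the language model log P(W):

```python
def decode(candidates, acoustic_logprob, lm_logprob):
    """Noisy-channel decoding sketch over a finite candidate set.

    Pick the sentence W maximizing log P(O|W) + log P(W); the
    denominator P(O) is the same for every candidate and drops out.
    """
    return max(candidates, key=lambda w: acoustic_logprob(w) + lm_logprob(w))
```

In a real decoder the maximization runs over the HMM's state space rather than an explicit candidate list, but the scoring principle is the same.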
7 HMM-based ASR. [Figure: an HMM for the words "one two three". A pronunciation lexicon maps each word to its phone sequence (one → w ah n, two → t uw, three → th r iy); the phone states of each word are chained, and language-model probabilities such as P(one) and P(two | one) label the transitions from the end of one word to the beginning of the next.]
8 HMM-based ASR. [Figure: as before, but now with a subphone HMM for each phone: every phone is split into three states for beginning, middle, and end (e.g. w_b w_m w_f, ah_b ah_m ah_f, n_b n_m n_f), in addition to the pronunciation lexicon and the language-model transitions.]
10 Overview. We use an HMM with these components: states are subphones (beginning/middle/end of a phone); observations are vectors of acoustic features; the HMM encodes the phone sequence for each word in the lexicon; the language model provides the transitions from the end of one word to the beginning of the next. Still to work out: What probability distribution over acoustic feature vectors? How do we train? Viterbi decoding is clear, but is it efficient enough?
11-14 Spectrograms. [Figure, built up over four slides: the waveform (intensity over time) of "she just had a baby" with phone labels; a Fourier transform turns it into a spectrogram (frequency over time), in which the fundamental frequency F0 and the formants are visible.]
15 Acoustic features. [Figure: spectrogram of "she just had a baby".] The signal is cut into frames (25 ms wide, one every 10 ms). Each frame is processed as: FT → magnitude spectrum → log-magnitude spectrum → inverse FFT → cepstrum. This yields the acoustic features for each frame, a 39-dimensional vector of floats: 12 cepstral, 12 delta-cepstral, and 12 double-delta-cepstral coefficients, plus 1 energy, 1 delta-energy, and 1 double-delta-energy feature.
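The frame/cepstrum pipeline can be sketched as follows. This is a toy illustration under simplifying assumptions: real MFCC front ends insert a mel filterbank and use a DCT, both omitted here, and all function names are my own:

```python
import numpy as np

def frames_to_cepstra(signal, sr=16000, frame_ms=25, hop_ms=10, n_ceps=12):
    """Toy cepstrum extraction: FT -> log-magnitude -> inverse FT,
    one 25 ms frame every 10 ms (no mel filterbank, illustration only)."""
    frame_len = int(sr * frame_ms / 1000)     # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)             # 160 samples
    window = np.hamming(frame_len)
    cepstra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))      # magnitude spectrum
        log_spec = np.log(spectrum + 1e-10)        # log-magnitude spectrum
        cepstrum = np.fft.irfft(log_spec)          # inverse FT -> cepstrum
        cepstra.append(cepstrum[:n_ceps])          # keep first 12 coefficients
    return np.array(cepstra)

def add_deltas(feats):
    """Append delta and double-delta features (simple first differences)."""
    delta = np.diff(feats, axis=0, prepend=feats[:1])
    ddelta = np.diff(delta, axis=0, prepend=delta[:1])
    return np.hstack([feats, delta, ddelta])
```

Adding the frame energy and its deltas in the same way would complete the 39-dimensional vector from the slide.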
16 Continuous probability distributions. Observations are now vectors of numbers, so the HMM must be adapted: the range of values for the emission probabilities is infinite (more precisely: uncountable). Such cases are described with continuous probability distributions, via a density function (pdf) f(x):

P(x1 ≤ X ≤ x2) = ∫_{x1}^{x2} f(x) dx
17 The normal distribution. The most important continuous distribution is the normal distribution (or Gaussian distribution), uniquely characterized by its mean μ and variance σ²:

N(x; μ, σ²) = 1/√(2πσ²) · exp(−(x − μ)² / (2σ²))
18 Emission probabilities in the HMM. Now define the probability of emitting the D-dimensional vector o = (o_1, ..., o_D) in state j:

b_j(o) = ∏_{d=1}^{D} N(o_d; μ_jd, σ²_jd) = ∏_{d=1}^{D} 1/√(2π σ²_jd) · exp(−(o_d − μ_jd)² / (2 σ²_jd))

Remarks: this ignores statistical correlations between the dimensions, but cepstral features are only weakly correlated. b_j(o) is not really a probability (but that is okay). A single Gaussian is too simple in practice; one uses Gaussian mixture models (GMMs) instead. Model parameters: μ_jd and σ²_jd for all j, d.
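A minimal sketch of the diagonal-covariance emission density b_j(o) as a product of univariate Gaussians (illustrative names; a single Gaussian per state rather than the GMM used in practice):

```python
import math

def gauss(x, mu, var):
    """Univariate normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def emission_density(o, mus, variances):
    """b_j(o): density of feature vector o in one HMM state, computed as
    a product of per-dimension univariate Gaussians (diagonal covariance)."""
    b = 1.0
    for x, mu, var in zip(o, mus, variances):
        b *= gauss(x, mu, var)
    return b
```

In real systems this product is computed in log space to avoid underflow over 39 dimensions.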
19 Decoding. Decoding = compute the best sequence of (sub)phones for the acoustic input, according to the HMM. This can be done with Viterbi. Problem: the runtime of Viterbi is O(N²T), and N is very large. In practice one uses beam search: choose a factor θ < 1; in step t+1, we only consider a state q_j as a predecessor if V_t(j) > θ · max_i V_t(i). This increases decoder speed dramatically and is useful in many other situations as well.
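A sketch of Viterbi with this kind of beam pruning, in log space (so the multiplicative factor θ < 1 becomes an additive threshold; the function names and the uniform-start assumption are mine):

```python
import numpy as np

def viterbi_beam(obs_logprobs, log_trans, theta_log=-10.0):
    """Viterbi with beam pruning: at each step, only states whose score is
    within theta_log of the current best survive as predecessors.

    obs_logprobs: (T, N) log emission scores per frame and state.
    log_trans:    (N, N) log transition scores.
    """
    T, N = obs_logprobs.shape
    V = obs_logprobs[0].copy()                  # initial scores (uniform start assumed)
    back = np.zeros((T, N), dtype=int)          # backpointers
    for t in range(1, T):
        beam = np.where(V >= V.max() + theta_log)[0]   # surviving predecessors
        scores = V[beam, None] + log_trans[beam, :]    # (|beam|, N)
        best = scores.argmax(axis=0)                   # best predecessor per state
        back[t] = beam[best]
        V = scores[best, np.arange(N)] + obs_logprobs[t]
    # follow backpointers from the best final state
    path = [int(V.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With theta_log = -inf this degenerates to exact Viterbi; tightening the beam trades accuracy for speed.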
20-25 Training. [Figure, built up over six slides: a recording is turned into feature vectors; a manual transcription ("one two three") supplies the corresponding word sequence; a text corpus yields a language model (LM); transcription and lexicon determine an initial HMM; the HMM is then trained on the feature vectors with EM.]
26 EM for Gaussian HMMs. Adapt the M-step of the forward-backward algorithm to the normally distributed emission probabilities:

μ_jd = ( Σ_{t=1}^{T} γ_t(j) · o_td ) / ( Σ_{t=1}^{T} γ_t(j) )

σ²_jd = ( Σ_{t=1}^{T} γ_t(j) · (o_td − μ_jd)² ) / ( Σ_{t=1}^{T} γ_t(j) )

Initialize the transition probabilities to 0.5 (for allowed transitions) and 0 (for forbidden ones).
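The M-step updates above can be sketched with vectorized operations (a sketch assuming the state posteriors γ_t(j) have already been computed by forward-backward; names are mine):

```python
import numpy as np

def m_step_gaussian(gamma, obs):
    """M-step updates for a Gaussian HMM, following the slide's formulas.

    gamma: (T, N) state posteriors gamma_t(j).
    obs:   (T, D) observed feature vectors.
    Returns per-state means (N, D) and variances (N, D).
    """
    norm = gamma.sum(axis=0)[:, None]               # sum_t gamma_t(j), shape (N, 1)
    mu = gamma.T @ obs / norm                       # gamma-weighted mean per state/dim
    sq = (obs[:, None, :] - mu[None, :, :]) ** 2    # (T, N, D) squared deviations
    var = np.einsum('tn,tnd->nd', gamma, sq) / norm # gamma-weighted variance
    return mu, var
```

With hard EM (next slide), gamma would simply be a 0/1 indicator matrix derived from the Viterbi state sequence.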
27 Hard EM. Using the forward-backward algorithm ("true" or "soft" EM) is correct but slow. A common approximation in practice is Viterbi EM (aka "hard EM"): in each EM iteration, compute the single best state sequence with Viterbi, pretend it is the true state sequence, and simply apply maximum-likelihood estimation. Repeat until the parameters converge (in theory they need not, but in practice they usually do). This is 1-2 orders of magnitude faster than true EM (Rodriguez & Torres 03).
28 Evaluation. The usual error measure is the word error rate (WER):

WER = 100 · (Insertions + Substitutions + Deletions) / (total words in the correct transcript)

The minimal number of insertions, substitutions and deletions is computed efficiently as a minimum edit distance (= Levenshtein distance).

REF:  i *** ** UM the PHONE IS i LEFT THE portable **** PHONE UPSTAIRS last night
HYP:  i GOT IT TO the ***** FULLEST i LOVE TO portable FORM OF STORES last night
Eval:   I   I  S        D   S       S    S            I    S     S

WER = 100 · (3 Insertions + 6 Substitutions + 1 Deletion) / 13 = 76.9%
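The WER computation via minimum edit distance can be sketched as follows (uniform edit costs, as in the Levenshtein distance over words):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over words, normalized by
    the length of the reference transcript, as a percentage."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = min edits turning the first i ref words into the first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                               # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                               # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return 100.0 * d[len(r)][len(h)] / len(r)
```

On the REF/HYP pair above this yields 100 · 10 / 13 ≈ 76.9%, matching the slide.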
29 State of the art (2012). [Table (Hinton et al. 2012): word error rates of DNN-HMM acoustic models versus GMM-HMM models (trained on the same data, and trained on more data) on several tasks: Switchboard test sets 1 and 2 (2,000 h of training data), English Broadcast News, Bing Voice Search (sentence error rates), Google Voice Input (5,870 h), and YouTube. The individual numbers did not survive transcription; across these tasks the DNN-HMM systems achieve the lowest error rates. DNN = deep neural networks.]
31 Summary. Speech recognition (ASR) is a central problem in computational linguistics. The classical approach uses familiar building blocks: n-gram language models and hidden Markov models with continuous emission probabilities, so many algorithms can be applied directly. The current frontier: deep neural networks instead of HMMs.