Discriminative Pronunciation Modeling: A Large-Margin Feature-Rich Approach
1 Discriminative Pronunciation Modeling: A Large-Margin Feature-Rich Approach Hao Tang Toyota Technological Institute at Chicago May 7, 2012 Joint work with Joseph Keshet and Karen Livescu To appear in ACL 2012
2 Outline Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work
3 Problem: Pronunciation Variation probably
4 Problem: Pronunciation Variation probably /pcl p r aa bcl b ax bcl b l iy/
5 Problem: Pronunciation Variation probably /pcl p r aa bcl b ax bcl b l iy/ [p r aa b iy] [p r aa l iy] [p r ay] [p ow ih] [p aa iy]
7 Lexical Access: Definition [p r aa l iy]?
8 Lexical Access: Definition [p r aa l iy]? This is called Lexical Access.
9 Lexical Access: How hard is this task? In Switchboard, fewer than half of the word tokens are pronounced canonically.
10 Lexical Access: Related work Experiments on a subset of Switchboard.
Model — ER
lexicon lookup (from [Livescu, 2005]) — 59.3%
lexicon + Levenshtein distance — 41.8%
[Jyothi et al., 2011] — 29.1%
Our approach — you'll see...
11 Lexical Access: Goal [p r aa l iy] → probably
12 Lexical Access: Goal Find a function f : P* → V such that w = f(p), where
w — a word (e.g., probably)
p — a sequence of sub-word units (e.g., [p r aa l iy])
P* — the set of all sequences of sub-word units
V — the vocabulary
13 Model We model f as a linear function (in φ):
w = f(p) = argmax_{w ∈ V} θ · φ(p, w).
Note: Linearity is not so bad, because φ can be arbitrarily non-linear. For example, φ(p, w) can be the Levenshtein distance between p and the canonical pronunciation of w.
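The argmax above can be sketched directly in code. This is a minimal illustration of the linear model, not the paper's implementation: the feature map, dictionary, and weights below are toy stand-ins.

```python
# Minimal sketch of w = f(p) = argmax_{w in V} theta . phi(p, w).
# phi_toy, PRON_TOY, and theta are illustrative, not the paper's features.

def predict(p, vocab, theta, phi):
    """Return the word in vocab with the highest linear score theta . phi(p, w)."""
    def score(w):
        feats = phi(p, w)  # dict mapping feature name -> value
        return sum(theta.get(name, 0.0) * value for name, value in feats.items())
    return max(vocab, key=score)

# Toy example: a single dictionary-style feature.
PRON_TOY = {
    "probably": [("p", "r", "aa", "b", "iy")],
    "problem": [("p", "r", "aa", "b", "l", "ax", "m")],
}

def phi_toy(p, w):
    return {"dict": 1.0 if tuple(p) in PRON_TOY.get(w, []) else 0.0}

theta = {"dict": 1.0}
print(predict(["p", "r", "aa", "b", "iy"], ["problem", "probably"], theta, phi_toy))  # probably
```

Because the score is linear in φ, learning only has to pick the weights θ; all the modeling effort goes into designing φ, as the rest of the talk shows.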
14 Learning: Goal Given p = [pcl p r aa bcl b l iy], we want to find θ such that the score of the correct word (the "good guy") beats the scores of all competitors (the "bad guys") by a margin:
θ · φ(p, probably) ≥ margin + θ · φ(p, probable), θ · φ(p, problem), ...
15 Learning: Algorithm Process the stream (p_1, w_1), (p_2, w_2), (p_3, w_3), ..., updating θ_0 → θ_1 → θ_2 → ...
1: for t = 1, 2, ... do
2:   Sample (p_t, w_t) from S
3:   ŵ = argmax_{w ∈ V} [ 1_{w_t ≠ w} − θ_t · φ(p_t, w_t) + θ_t · φ(p_t, w) ]
4:   Δφ_t = φ(p_t, w_t) − φ(p_t, ŵ)
5:   Update, either
     PA:      α_t = min{ 1/λ, (1_{w_t ≠ ŵ} − θ_t · Δφ_t) / ‖Δφ_t‖² };  θ_{t+1} = θ_t + α_t Δφ_t
     Pegasos: η_t = 1/(λt);  θ_{t+1} = (1 − λ η_t) θ_t + η_t Δφ_t
6: end for
16 Learning: Algorithm Two Algorithms Passive Aggressive (PA) algorithm Pegasos
17 Learning: Algorithm Two Algorithms Passive Aggressive (PA) algorithm Pegasos Both have good theoretical guarantees. We happily treat them as black boxes.
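Even treated as a black box, the Pegasos-style update is short enough to sketch. This is an illustrative sketch of the update rule η_t = 1/(λt), θ_{t+1} = (1 − λη_t)θ_t + η_t Δφ_t with a loss-augmented prediction step; the feature map and data below are toy stand-ins, not the paper's setup.

```python
# Sketch of a Pegasos-style epoch for the structured objective in the slides.
# phi_toy is a made-up feature map; the real phi is the paper's feature vector.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pegasos_epoch(samples, vocab, phi, lam, theta, t0=1):
    t = t0
    for p, w_true in samples:
        # Loss-augmented prediction: argmax_w [1_{w != w_true} + theta . phi(p, w)]
        # (theta . phi(p, w_true) is constant in w, so it is dropped here).
        w_hat = max(vocab, key=lambda w: (0.0 if w == w_true else 1.0) + dot(theta, phi(p, w)))
        eta = 1.0 / (lam * t)
        dphi = [a - b for a, b in zip(phi(p, w_true), phi(p, w_hat))]
        theta = [(1.0 - lam * eta) * th + eta * d for th, d in zip(theta, dphi)]
        t += 1
    return theta

# Toy run: one indicator feature per word; the correct word's weight should win.
phi_toy = lambda p, w: [1.0, 0.0] if w == "a" else [0.0, 1.0]
theta = pegasos_epoch([(None, "a")] * 10, ["a", "b"], phi_toy, 1.0, [0.0, 0.0])
```

The PA variant differs only in the step size: it replaces η_t with the clipped α_t from the previous slide and skips the shrinkage factor.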
18 Features w = f(p) = argmax_{w ∈ V} θ · φ(p, w). We know how to learn θ. Let's talk about φ.
21 Features Given p = [pcl p r aa bcl b l iy], we want to design φ such that the correct word's score beats every competitor's by a margin:
θ · φ(p, probably) ≥ margin + θ · φ(p, probable), θ · φ(p, problem), ...
22 Dictionary Feature Function Given a pronunciation dictionary:
privacy  — pcl p r ay1 ay2 v ax s iy
private  — pcl p r ay1 ay2 v ax tcl t
pro      — pcl p r ow1 ow2
probably — pcl p r aa bcl b ax bcl b l iy
problem  — pcl p r aa bcl b l ax m
When you see p = [pcl p r aa bcl b ax bcl b l iy], intuitively, you want θ · φ(p, probably) > θ · φ(p, w′) for w′ ≠ probably.
23 Dictionary Feature Function Define the dictionary feature function as φ_dict(p, w) = 1_{p ∈ pron(w)}, where pron(w) is the set of baseforms of w in the dictionary.
24 Dictionary Feature Function Define the dictionary feature function as φ_dict(p, w) = 1_{p ∈ pron(w)}, where pron(w) is the set of baseforms of w in the dictionary. Then for the previous example,
θ_dict · φ_dict(p, probably) = θ_dict · 1 > θ_dict · 0 = θ_dict · φ_dict(p, w′)
for w′ ≠ probably, whenever θ_dict > 0.
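The dictionary feature reads off directly from its definition. This is a direct sketch of φ_dict(p, w) = 1_{p ∈ pron(w)}; the two-entry dictionary below is illustrative, while the real lexicon has thousands of words.

```python
# Sketch of the dictionary feature: 1 iff the surface form is a listed baseform of w.
PRON = {
    "probably": {("pcl", "p", "r", "aa", "bcl", "b", "ax", "bcl", "b", "l", "iy")},
    "problem":  {("pcl", "p", "r", "aa", "bcl", "b", "l", "ax", "m")},
}

def phi_dict(p, w):
    return 1.0 if tuple(p) in PRON.get(w, set()) else 0.0
```

The feature fires only on canonical pronunciations, which is exactly why the rest of the feature set is needed for the non-canonical majority.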
26 Length Feature Functions Suppose we have
w = probably
p = [pcl p r aa bcl b l iy]
pron(w) = /pcl p r aa bcl b ax bcl b l iy/
We want to see how the length of the surface form deviates from the baseform. In this case l = |p| − |pron(w)| = 8 − 11 = −3.
27 Length Feature Functions We can have φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b}.
28 Length Feature Functions We can have φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b}. Now φ_{−3 ≤ l < −2}(p, probably) = 1. But pron(academic) = /ae kcl k ax dcl d eh m ax kcl k/, so φ_{−3 ≤ l < −2}(p, academic) = 1 as well.
29 Length Feature Functions We can have φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b}. Now φ_{−3 ≤ l < −2}(p, probably) = 1, but also φ_{−3 ≤ l < −2}(p, academic) = 1, since pron(academic) = /ae kcl k ax dcl d eh m ax kcl k/. We want the feature to count deletions/insertions separately for different words.
30 Length Feature Functions Let e_probably be the one-hot vector over V: (e_probably)_w = 1 if w = probably and 0 otherwise (a → 0, ..., probable → 0, probably → 1, problem → 0, ..., zero → 0).
31 Length Feature Functions Combine the length indicator with the one-hot vector: 1_{−3 ≤ l < −2} · e_probably.
32 Length Feature Functions The length feature function is defined as
φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b} · e_w,
where l = |p| − |v| for some v ∈ pron(w), and e_w is the |V|-dimensional one-hot vector with (e_w)_i = 1 if w_i = w and 0 otherwise.
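The definition above can be sketched as a small function. This is an illustrative reading of φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b} · e_w; the two-word vocabulary and baseforms below are toy stand-ins with the lengths from the slides (8-phone surface form, 11-phone baseforms).

```python
# Sketch of the word-specific length feature: the indicator lands in w's own slot.
def phi_length(p, w, vocab, pron, a, b):
    vec = [0.0] * len(vocab)              # e_w scaffold: one slot per word in V
    for v in pron.get(w, []):
        l = len(p) - len(v)               # length deviation from this baseform
        if a <= l < b:
            vec[vocab.index(w)] = 1.0
    return vec

vocab = ["probably", "academic"]
pron = {
    "probably": [("pcl", "p", "r", "aa", "bcl", "b", "ax", "bcl", "b", "l", "iy")],
    "academic": [("ae", "kcl", "k", "ax", "dcl", "d", "eh", "m", "ax", "kcl", "k")],
}
p = ["pcl", "p", "r", "aa", "bcl", "b", "l", "iy"]  # 8 phones, so l = -3 for both words
```

Both words fire the same range, but in different coordinates of the vector, so learning can weight them differently.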
34 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word?
35 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word? about, book, ball, baseball, basketball, robot, problem, probably,...
36 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word? about, book, ball, baseball, basketball, robot, problem, probably,... What if /bcl b/ occurs twice?
37 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word? about, book, ball, baseball, basketball, robot, problem, probably,... What if /bcl b/ occurs twice? probably? baseball? basketball?
38 TF-IDF Feature Functions The term (sub-word unit) frequency is defined as
TF_u(p) = (1 / (|p| − |u| + 1)) Σ_{i=1}^{|p|−|u|+1} 1_{u = p_{i:i+|u|−1}}.
Suppose p = [p r aa l iy]. Then TF_{/l iy/}(p) = 1/4. Intuitively, if a sub-word unit has a high TF, then it is more discriminative.
39 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word?
40 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word? buy, cry, fly, flight, lie, light, like, I, right,...
41 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word? buy, cry, fly, flight, lie, light, like, I, right,... What if /z uw/ occurs?
42 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word? buy, cry, fly, flight, lie, light, like, I, right,... What if /z uw/ occurs? zoo? zoology?
43 TF-IDF Feature Functions The inverse document (word) frequency is defined as
IDF_u = log(|V| / |V_u|),
where V_u = {w ∈ V : ∃(p, w) ∈ S such that u occurs in p}. Intuitively, if a sub-word unit is found in a small, specific set of words, then it is more discriminative.
44 TF-IDF Feature Functions The final TF-IDF feature function for sub-word unit u is defined as
φ_u(p, w) = (TF_u(p) · IDF_u) · e_w.
This feature function is borrowed from [Zweig et al., 2010].
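The TF and IDF definitions above can be sketched together. This is an illustrative implementation: TF counts matches of u among the sliding windows of p, and IDF uses a toy stand-in training set S rather than the paper's Switchboard data.

```python
import math

# Sketch of TF_u(p) and IDF_u from the slides; units and surface forms are tuples/lists of phones.
def tf(u, p):
    m = len(u)
    windows = len(p) - m + 1              # number of length-|u| windows in p
    if windows <= 0:
        return 0.0
    return sum(1 for i in range(windows) if tuple(p[i:i + m]) == tuple(u)) / windows

def idf(u, samples, vocab):
    # samples: list of (surface form, word) training pairs, i.e. the set S
    def contains(p):
        m = len(u)
        return any(tuple(p[i:i + m]) == tuple(u) for i in range(len(p) - m + 1))
    v_u = {w for p, w in samples if contains(p)}  # words whose surface forms contain u
    return math.log(len(vocab) / len(v_u)) if v_u else 0.0
```

On the slide's example, tf(("l", "iy"), ["p", "r", "aa", "l", "iy"]) gives 1/4, matching TF_{/l iy/}(p) = 1/4.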
46 Phonetic Alignment Feature Functions Given p = [p r aa l iy] and pron(probably) = { /pcl p r aa bcl b l iy/, /pcl p r aa bcl b ax bcl b l iy/ }, we already have the dictionary feature function. Can we do something fuzzier?
47 Phonetic Alignment Feature Functions
Alignment 1: [p r aa l iy] against /pcl p r aa bcl b l iy/
Alignment 2: [p r aa l iy] against /pcl p r aa bcl b ax bcl b l iy/
48 Phonetic Alignment Feature Functions Turn these two alignments into counts of aligned pairs:
pcl → ε: 2, p → p: 2, r → r: 2, aa → aa: 2, bcl → ε: 3, b → ε: 3, ax → ε: 1, ...
49 Phonetic Alignment Feature Functions Each of those counts, normalized by some factor, is our feature:
φ_ph-align(p, w) = ( pcl → ε: 2/19, p → p: 1, r → r: 1, ..., bcl → ε: 3/19, ... ).
As a sanity check, we can expect to see high weights for l → l and iy → iy, or more generally, p → p for p ∈ P. We can also expect to see a high weight for t → dx, but a low weight for, say, aa → kcl.
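Turning alignments into pair counts is mechanical once the alignments exist. The sketch below writes out the two alignments from the slides by hand, with None standing in for ε (a deletion); a real system would produce them by dynamic programming, as described in the backup slides.

```python
from collections import Counter

# Hand-written alignments of [p r aa l iy] against the two baseforms of "probably";
# each pair is (surface phone, baseform phone), None = gap.
aln1 = [(None, "pcl"), ("p", "p"), ("r", "r"), ("aa", "aa"),
        (None, "bcl"), (None, "b"), ("l", "l"), ("iy", "iy")]
aln2 = [(None, "pcl"), ("p", "p"), ("r", "r"), ("aa", "aa"), (None, "bcl"),
        (None, "b"), (None, "ax"), (None, "bcl"), (None, "b"), ("l", "l"), ("iy", "iy")]

def pair_counts(alignments):
    """Count each aligned (surface, baseform) pair across all alignments."""
    counts = Counter()
    for aln in alignments:
        counts.update(aln)
    return counts

counts = pair_counts([aln1, aln2])
```

The resulting counts (pcl → ε: 2, bcl → ε: 3, b → ε: 3, ax → ε: 1, matches counted twice each) reproduce the tallies on the slide before normalization.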
51 Articulatory Feature Functions: Alignment Similarly, we can define alignment feature functions on the articulatory level:
φ_artic-align(p, w) = ( lip-loc-lab → lip-loc-den: 0.5, lip-open-clo → lip-open-wide: 0.1, tongue-tip-den → tongue-tip-alv: 0.3, vel-clo → vel-open: 0.2, ... ).
Note: the values of the articulatory variables are inferred, not observed.
52 Articulatory Feature Functions: Log-likelihood We also include the shifted and scaled log-likelihood of the alignment as a feature:
φ_LL(p, w) = (L(p, w) − h) / k,
where L(p, w) is the log-likelihood, h a shift, and k a scale.
53 Articulatory Feature Functions: Asynchrony Define the asynchrony feature functions among articulatory variables as
φ_{a ≤ async(F_1,F_2) < b}(p, w) = 1_{a ≤ async(F_1,F_2) < b},
where F_1 and F_2 are sets of articulatory variables and async(F_1, F_2) is the asynchrony between F_1 and F_2. This can help in examples such as sense /s eh n s/ → [s eh n t s].
54 Features: Big picture The first part of the feature vector stacks the dictionary, length, TF-IDF, and phonetic alignment features:
φ(p, w) = [ 1_{p ∈ pron(w)} ;  1_{a ≤ l < b} · e_w over all length ranges ;  TF_u(p) · IDF_u · e_w over all sub-word units u ;  the alignment counts pcl → ε, p → p, r → r, bcl → ε, ... ],
with block dimensions 1, (# of ranges) · |V|, (# of sub-word units) · |V|, and (|P| + 1)², respectively.
55 Features: Big picture The second part stacks the articulatory features:
φ(p, w) = [ lip-loc-lab → lip-loc-den, lip-open-clo → lip-open-wide, tongue-tip-den → tongue-tip-alv, vel-clo → vel-open, ... ;  (L(p, w) − h)/k ;  1_{a ≤ async(tongue tip, tongue body) < b}, 1_{a ≤ async(lip, tongue) < b}, ... ],
with block dimensions Σ_{i=1}^{7} |F_i|², 1, and (# of ranges) · (# of combinations), respectively.
56 Experiments: Setting
dataset: Switchboard; lexicon: 3328 words; total word tokens: 2893
length ranges: (−∞, −3), (−3, −2), (−2, −1), ..., (2, 3), (3, ∞)
asynchrony pairs: tongue tip vs. tongue body; lip vs. tongue; lip + tongue vs. glottis + velum
57 Experiments: Comparing with others Split the 2893 words into a 2492-word training set, a 165-word development set, and a 236-word test set.
Model — ER
lexicon lookup (from [Livescu, 2005]) — 59.3%
lexicon + Levenshtein distance — 41.8%
[Jyothi et al., 2011] — 29.1%
Our approach — 15.2%
58 Experiments: Comparing learning methods 5-fold cross-validation ER for different learning methods: lexicon lookup, lexicon + Levenshtein, CRF/DP+, PA/DP+, and Pegasos/DP+. DP+: dictionary, length, phone bigram TF-IDF, phonetic alignment.
59 Experiments: Comparing learning methods
Algorithm — # of non-zero entries in θ — time per epoch
CRF — 4,000,… — 45 min
PA and Pegasos — …,000 — 15 min
60 Experiments: Feature selection 5-fold cross-validation ER for different feature combinations: PA with phone bigram TF-IDF (p2), PA with dictionary (DP), PA p2 + DP, PA p2 + DP + length, PA DP+, and PA ALL.
61 Interesting Weights Some learned weights worth inspecting: θ_{p ∈ pron(w)}, θ_{p → p}, θ_{t → dx}, θ_{oy1 → oy_n}, θ_{oy2 → oy_n}, θ_{n → r}, θ_{f → kcl}, and the length weights θ_{l < −3}, θ_{−3 ≤ l < −2}, θ_{−2 ≤ l < −1}, θ_{−1 ≤ l < 0} for probably.
62 Conclusion We have a discriminative model. We use Passive-Aggressive or Pegasos to learn it. We have lots of features: Dictionary, Length, Phonetic alignment, Alignment of articulators, Asynchrony among articulators. It achieves good results.
63 Future Work Learning Kernel Direct loss minimization Learn feature weights and alignments jointly Features Features that span across articulatory variables Context-sensitive features from alignments Word Sequences Acoustics
64 References
K. Livescu. Feature-based Pronunciation Modeling for Automatic Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology, 2005.
P. Jyothi, K. Livescu, and E. Fosler-Lussier. Lexical access experiments with context-dependent articulatory feature-based models. In Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011.
G. Zweig, P. Nguyen, and A. Acero. Continuous speech recognition with a TF-IDF acoustic model. In Proc. Interspeech, 2010.
65 [th ae ng kcl k] [y uw]
66 Learning Suppose (p, w) is drawn from a distribution ρ. The goal is to find a θ that minimizes the expected zero-one loss,
θ* = argmin_θ E_{(p,w)∼ρ}[ 1_{w ≠ f(p)} ].
Let S = {(p_1, w_1), ..., (p_m, w_m)}. We instead optimize
θ* = argmin_θ (λ/2) ‖θ‖²₂ + Σ_{i=1}^{m} l(θ; p_i, w_i),
where l(θ; p_i, w_i) = 1_{f(p_i) ≠ w_i}. We cannot optimize the zero-one loss directly. A common trick is to optimize the hinge loss,
l(θ; p_i, w_i) = max_{w ∈ V} [ 1_{w_i ≠ w} − θ · φ(p_i, w_i) + θ · φ(p_i, w) ].
67 Passive-Aggressive Algorithm We can also optimize
θ_{t+1} = argmin_θ (1/2) ‖θ − θ_t‖²₂  s.t.  θ · φ(p_t, w_t) − θ · φ(p_t, w) ≥ 1  for all w ∈ V ∖ {w_t}.
This is still hard to optimize, so we optimize this instead,
θ_{t+1} = argmin_θ (1/2) ‖θ − θ_t‖²₂  s.t.  θ · φ(p_t, w_t) − θ · φ(p_t, ŵ) ≥ 1,
where ŵ = argmax_{w ∈ V ∖ {w_t}} θ_t · φ(p_t, w).
68 Phonetic Alignment Feature Functions Given p, q ∈ P ∪ {ε}, we encode p and q as two four-tuples (s_1, s_2, s_3, s_4) and (t_1, t_2, t_3, t_4), representing consonant place, consonant manner, vowel place, and vowel manner. Define the similarity between p and q as
s(p, q) = 1 if p = q = ε;  Σ_{i=1}^{4} 1_{s_i = t_i} otherwise,
and run dynamic programming.
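The dynamic program itself is a standard maximum-similarity alignment. The sketch below uses a Needleman-Wunsch-style recurrence with the 4-tuple similarity; the phone encodings are made up for illustration, and gaps are scored 0 against any phone (a simplifying assumption, not the paper's exact gap score).

```python
# Sketch of DP alignment with the 4-tuple similarity s(p, q) = # of matching coordinates.
def sim(p, q):
    return sum(1 for a, b in zip(p, q) if a == b)

def align_score(xs, ys):
    """Maximize total similarity over monotone alignments; gaps contribute 0."""
    n, m = len(xs), len(ys)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]),
                          D[i - 1][j],   # xs[i-1] aligned to a gap
                          D[i][j - 1])   # ys[j-1] aligned to a gap
    return D[n][m]

# Toy encodings: two stops differing only in place of articulation.
t = ("alveolar", "stop", "-", "-")
p = ("labial",   "stop", "-", "-")
```

Backtracking through D recovers the alignment pairs that feed the count features of the previous slides.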
69 Phonetic Alignment Feature Functions The alignment feature function for p → q, for p, q ∈ P ∪ {ε}, is defined as
φ_{p→q}(p̄, w) = (1/Z_p) Σ_{k=1}^{K_w} Σ_{i=1}^{L_k} 1_{a_{k,i} = p, b_{k,i} = q},
where K_w = |pron(w)|, L_k is the length of the k-th alignment, p̄ is the surface form, and
Z_p = Σ_{k=1}^{K_w} Σ_{i=1}^{L_k} 1_{a_{k,i} = p} if p ∈ P;  |p̄| · K_w if p = ε.
70 Articulatory Feature Functions Let F be the set of articulatory variables: tongue tip location, tongue tip opening, tongue body location, tongue body opening, lip opening, glottis, and velum.
71 Articulatory Feature Functions Given values p, q of an articulatory variable F ∈ F, the feature function for articulatory alignment is defined as
φ_{p→q}(p̄, w) = (1/L) Σ_{i=1}^{L} 1_{a_i = p, b_i = q}.
72 Articulatory Feature Functions Example: frame-level streams for voicing, nasality, tongue body, and tongue tip, aligned against the surface form of sense, [s eh n t s], with per-frame asynchrony values; the epenthetic [t] appears where the streams are asynchronous.
73 Articulatory Feature Functions For F_h, F_k ∈ F, the asynchrony between F_h and F_k is defined as
async(F_h, F_k) = (1/L) Σ_{i=1}^{L} (t_{h,i} − t_{k,i}).
More generally, for sets F_1, F_2 ⊆ F, the asynchrony between F_1 and F_2 is defined as
async(F_1, F_2) = (1/L) Σ_{i=1}^{L} [ (1/|F_1|) Σ_{F_h ∈ F_1} t_{h,i} − (1/|F_2|) Σ_{F_k ∈ F_2} t_{k,i} ].
Define the asynchrony feature functions as
φ_{a ≤ async(F_1,F_2) ≤ b}(p, w) = 1_{a ≤ async(F_1,F_2) ≤ b}.
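The averaged-difference formula above can be sketched directly. This is an illustrative reading of the definition: per frame, average the position indices within each set of streams, take the difference, then average over frames. The integer streams below are toy stand-ins for the inferred articulatory trajectories.

```python
# Sketch of async(F1, F2) and the corresponding indicator feature.
def async_measure(streams1, streams2):
    L = len(streams1[0])
    total = 0.0
    for i in range(L):
        m1 = sum(s[i] for s in streams1) / len(streams1)  # mean index of F1 at frame i
        m2 = sum(s[i] for s in streams2) / len(streams2)  # mean index of F2 at frame i
        total += m1 - m2
    return total / L

def phi_async(streams1, streams2, a, b):
    # Indicator feature 1_{a <= async(F1, F2) <= b}
    return 1.0 if a <= async_measure(streams1, streams2) <= b else 0.0
```

A positive value means the first set of articulators runs ahead of the second on average, which is the pattern behind examples like sense → [s eh n t s].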
74 Experiments
Model — ER
lexicon lookup (from [Livescu, 2005]) — 59.3%
lexicon + Levenshtein distance — 41.8%
[Jyothi et al., 2011] — 29.1%
CRF/DP+ — 21.5%
PA/DP+ — 15.2%
Pegasos/DP+ — 14.8%
PA/ALL — 15.2%
DP+: dictionary, length, phone bigram TF-IDF, phonetic alignment.
ALL: DP+, alignment for articulatory variables, asynchrony between articulators, DBN log-likelihood.
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationLecture 9: Speech Recognition
EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 1 2 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis
More informationGenerative Adversarial Networks
Generative Adversarial Networks Stefano Ermon, Aditya Grover Stanford University Lecture 10 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 10 1 / 17 Selected GANs https://github.com/hindupuravinash/the-gan-zoo
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationDriving Semantic Parsing from the World s Response
Driving Semantic Parsing from the World s Response James Clarke, Dan Goldwasser, Ming-Wei Chang, Dan Roth Cognitive Computation Group University of Illinois at Urbana-Champaign CoNLL 2010 Clarke, Goldwasser,
More informationCS 5522: Artificial Intelligence II
CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Wei Xu Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley.] Recap: Reasoning
More informationLecture 6. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom
Lecture 6 pr@ n2nsi"eis@n "mad@lin Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom Watson Group IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen,nussbaum}@us.ibm.com
More informationHidden Markov Modelling
Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models
More informationFEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION
FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION Sarika Hegde 1, K. K. Achary 2 and Surendra Shetty 3 1 Department of Computer Applications, NMAM.I.T., Nitte, Karkala Taluk,
More informationData Informatics. Seon Ho Kim, Ph.D.
Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu What is Machine Learning? Overview slides by ETHEM ALPAYDIN Why Learn? Learn: programming computers to optimize a performance criterion using example
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationA fast and simple algorithm for training neural probabilistic language models
A fast and simple algorithm for training neural probabilistic language models Andriy Mnih Joint work with Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 25 January 2013 1
More informationMachine Learning (CSE 446): Probabilistic Machine Learning
Machine Learning (CSE 446): Probabilistic Machine Learning oah Smith c 2017 University of Washington nasmith@cs.washington.edu ovember 1, 2017 1 / 24 Understanding MLE y 1 MLE π^ You can think of MLE as
More informationGraphical Models for Automatic Speech Recognition
Graphical Models for Automatic Speech Recognition Advanced Signal Processing SE 2, SS05 Stefan Petrik Signal Processing and Speech Communication Laboratory Graz University of Technology GMs for Automatic
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationStochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations Weiran Wang Toyota Technological Institute at Chicago * Joint work with Raman Arora (JHU), Karen Livescu and Nati Srebro (TTIC)
More informationStructured Prediction
Machine Learning Fall 2017 (structured perceptron, HMM, structured SVM) Professor Liang Huang (Chap. 17 of CIML) x x the man bit the dog x the man bit the dog x DT NN VBD DT NN S =+1 =-1 the man bit the
More informationA Linguistic Feature Representation of the Speech Waveform
September 1993 LIDS-TH-2195 Research Supported By: National Science Foundation NDSEG Fellowships A Linguistic Feature Representation of the Speech Waveform Ellen Marie Eide September 1993 LIDS-TH-2195
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 6: Hidden Markov Models (Part II) Instructor: Preethi Jyothi Aug 10, 2017 Recall: Computing Likelihood Problem 1 (Likelihood): Given an HMM l =(A, B) and an
More informationGaussian Process Regression
Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationMultimodal Deep Learning for Predicting Survival from Breast Cancer
Multimodal Deep Learning for Predicting Survival from Breast Cancer Heather Couture Deep Learning Journal Club Nov. 16, 2016 Outline Background on tumor histology & genetic data Background on survival
More informationFunctional principal component analysis of vocal tract area functions
INTERSPEECH 17 August, 17, Stockholm, Sweden Functional principal component analysis of vocal tract area functions Jorge C. Lucero Dept. Computer Science, University of Brasília, Brasília DF 791-9, Brazil
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationOur Status in CSE 5522
Our Status in CSE 5522 We re done with Part I Search and Planning! Part II: Probabilistic Reasoning Diagnosis Speech recognition Tracking objects Robot mapping Genetics Error correcting codes lots more!
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationSVM optimization and Kernel methods
Announcements SVM optimization and Kernel methods w 4 is up. Due in a week. Kaggle is up 4/13/17 1 4/13/17 2 Outline Review SVM optimization Non-linear transformations in SVM Soft-margin SVM Goal: Find
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationIntroduction To Graphical Models
Peter Gehler Introduction to Graphical Models Introduction To Graphical Models Peter V. Gehler Max Planck Institute for Intelligent Systems, Tübingen, Germany ENS/INRIA Summer School, Paris, July 2013
More informationFeature Noising. Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager
Feature Noising Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager Outline Part 0: Some backgrounds Part 1: Dropout as adaptive
More informationCS 136 Lecture 5 Acoustic modeling Phoneme modeling
+ September 9, 2016 Professor Meteer CS 136 Lecture 5 Acoustic modeling Phoneme modeling Thanks to Dan Jurafsky for these slides + Directly Modeling Continuous Observations n Gaussians n Univariate Gaussians
More informationEmpirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model
Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model (most slides from Sharon Goldwater; some adapted from Philipp Koehn) 5 October 2016 Nathan Schneider
More informationFoundations of Natural Language Processing Lecture 6 Spelling correction, edit distance, and EM
Foundations of Natural Language Processing Lecture 6 Spelling correction, edit distance, and EM Alex Lascarides (Slides from Alex Lascarides and Sharon Goldwater) 2 February 2019 Alex Lascarides FNLP Lecture
More informationThis is the author s version of a work that was submitted/accepted for publication in the following source:
This is the author s version of a work that was submitted/accepted for publication in the following source: Wallace, R., Baker, B., Vogt, R., & Sridharan, S. (2010) Discriminative optimisation of the figure
More information