Discriminative Pronunciation Modeling: A Large-Margin Feature-Rich Approach


1 Discriminative Pronunciation Modeling: A Large-Margin Feature-Rich Approach Hao Tang Toyota Technological Institute at Chicago May 7, 2012 Joint work with Joseph Keshet and Karen Livescu To appear in ACL 2012

2 Outline Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work

3 Problem: Pronunciation Variation probably

4 Problem: Pronunciation Variation probably /pcl p r aa bcl b ax bcl b l iy/

5 Problem: Pronunciation Variation probably /pcl p r aa bcl b ax bcl b l iy/ [p r aa b iy] [p r aa l iy] [p r ay] [p ow ih] [p aa iy]


7 Lexical Access: Definition [p r aa l iy]?

8 Lexical Access: Definition Given a surface pronunciation such as [p r aa l iy], which word was said? This task is called lexical access.

9 Lexical Access: How hard is this task? In Switchboard, fewer than half of the word tokens are pronounced canonically.

10 Lexical Access: Related works Experiments on a subset of Switchboard.
Model | ER
lexicon lookup (from [Livescu, 2005]) | 59.3%
lexicon + Levenshtein distance | 41.8%
[Jyothi et al., 2011] | 29.1%
Our approach | you'll see...

11 Lexical Access: Goal [p r aa l iy] probably

12 Lexical Access: Goal Find a function f : P → V such that w = f(p), where
w is a word (e.g., probably),
p is a sequence of sub-word units (e.g., [p r aa l iy]),
P is the set of all sequences of sub-word units,
V is the vocabulary.

13 Model We model f as a linear function (in φ):
w = f(p) = argmax_{w ∈ V} θ · φ(p, w).
Note: Linearity is not so bad, because φ can be arbitrarily non-linear. For example, φ(p, w) can be the Levenshtein distance between p and the canonical pronunciation of w.
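To make the decision rule concrete, here is a minimal sketch of the argmax decoder in Python, assuming a hypothetical phi(p, w) that returns a NumPy feature vector and a vocabulary given as a list of words (names and shapes are illustrative, not the authors' code):

import numpy as np

def predict(p, vocab, theta, phi):
    # Return the word w in vocab that maximizes the linear score theta . phi(p, w).
    scores = [np.dot(theta, phi(p, w)) for w in vocab]
    return vocab[int(np.argmax(scores))]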

14 Learning: Goal Given p = [pcl p r aa bcl b l iy], we want to find θ such that the score of the correct word (the good guy) exceeds the scores of the competing words (the bad guys) by a margin:
θ · φ(p, probably) ≥ margin + max{ θ · φ(p, probable), θ · φ(p, problem), ... }.

15 Learning: Algorithm The learner processes the samples (p_1, w_1), (p_2, w_2), (p_3, w_3), ..., (p_t, w_t) one at a time, producing the weight vectors θ_0, θ_1, θ_2, ..., θ_t.
1: for t = 1, ... do
2:   Sample (p_t, w_t) from S
3:   ŵ = argmax_{w ∈ V} [ 1_{w_t ≠ w} − θ_t · φ(p_t, w_t) + θ_t · φ(p_t, w) ]
4:   Δφ_t = φ(p_t, w_t) − φ(p_t, ŵ)
5:   Update: either (PA) α_t = min{ 1/λ, (1_{w_t ≠ ŵ} − θ_t · Δφ_t) / ||Δφ_t||² } and θ_{t+1} = θ_t + α_t Δφ_t, or (Pegasos) η_t = 1/(λt) and θ_{t+1} = (1 − λη_t) θ_t + η_t Δφ_t
6: end for

16 Learning: Algorithm Two Algorithms Passive Aggressive (PA) algorithm Pegasos

17 Learning: Algorithm Two Algorithms Passive Aggressive (PA) algorithm Pegasos Both have good theoretical guarantees. We happily treat them as black boxes.
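To see what these black boxes do on each sample, here is a hedged sketch of the loss-augmented prediction and the two updates in Python, assuming NumPy feature vectors from a hypothetical phi and a small vocabulary list (names are illustrative, not the authors' code):

import numpy as np

def loss_augmented_argmax(p, w_true, vocab, theta, phi):
    # argmax over w of [w != w_true] - theta . phi(p, w_true) + theta . phi(p, w)
    base = np.dot(theta, phi(p, w_true))
    best_w, best_score = None, float("-inf")
    for w in vocab:
        score = float(w != w_true) - base + np.dot(theta, phi(p, w))
        if score > best_score:
            best_w, best_score = w, score
    return best_w

def pa_step(theta, p, w_true, vocab, phi, lam):
    # Passive-Aggressive update on one sample.
    w_hat = loss_augmented_argmax(p, w_true, vocab, theta, phi)
    dphi = phi(p, w_true) - phi(p, w_hat)
    loss = float(w_true != w_hat) - np.dot(theta, dphi)
    if loss <= 0:
        return theta
    alpha = min(1.0 / lam, loss / (np.dot(dphi, dphi) + 1e-12))
    return theta + alpha * dphi

def pegasos_step(theta, p, w_true, vocab, phi, lam, t):
    # Pegasos update on one sample at step t.
    w_hat = loss_augmented_argmax(p, w_true, vocab, theta, phi)
    dphi = phi(p, w_true) - phi(p, w_hat)
    eta = 1.0 / (lam * t)
    return (1.0 - lam * eta) * theta + eta * dphi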

18 Features w = f(p) = argmax_{w ∈ V} θ · φ(p, w). We know how to learn θ. Let's talk about φ.

19 Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work


21 Features Given p = [pcl p r aa bcl b l iy], we want to design φ such that the score of the correct word (the good guy) exceeds the scores of the competing words (the bad guys) by a margin:
θ · φ(p, probably) ≥ margin + max{ θ · φ(p, probable), θ · φ(p, problem), ... }.

22 Dictionary Feature Function Given a pronunciation dictionary:
...
privacy    pcl p r ay1 ay2 v ax s iy
private    pcl p r ay1 ay2 v ax tcl t
pro        pcl p r ow1 ow2
probably   pcl p r aa bcl b ax bcl b l iy
problem    pcl p r aa bcl b l ax m
...
When you see p = [pcl p r aa bcl b ax bcl b l iy], intuitively, you want θ · φ(p, probably) > θ · φ(p, w′) for every w′ ≠ probably.

23 Dictionary Feature Function Define the dictionary feature function as
φ_dict(p, w) = 1_{p ∈ pron(w)},
where pron(w) is the set of baseforms of w in the dictionary.

24 Dictionary Feature Function Define the dictionary feature function as
φ_dict(p, w) = 1_{p ∈ pron(w)},
where pron(w) is the set of baseforms of w in the dictionary. Then for the previous example,
θ_dict · φ_dict(p, probably) = θ_dict · 1 > θ_dict · φ_dict(p, w′) = θ_dict · 0
for every w′ ≠ probably, whenever θ_dict > 0.
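As a minimal sketch of this feature, assume the dictionary is stored as a Python dict mapping each word to a set of baseform tuples (a hypothetical data layout, not the authors' code):

def phi_dict(p, w, pron):
    # 1 if the surface form p is listed as a baseform of w, else 0.
    return 1.0 if tuple(p) in pron.get(w, set()) else 0.0

# Example:
pron = {"probably": {("pcl", "p", "r", "aa", "bcl", "b", "ax", "bcl", "b", "l", "iy")}}
p = ["pcl", "p", "r", "aa", "bcl", "b", "ax", "bcl", "b", "l", "iy"]
print(phi_dict(p, "probably", pron))  # 1.0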

25 Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work

26 Length Feature Functions Suppose we have w = probably, p = [pcl p r aa bcl b l iy], and pron(w) = /pcl p r aa bcl b ax bcl b l iy/. We want to see how the length of the surface form deviates from the baseform. In this case l = −3.

27 Length Feature Functions We can have φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b}.

28 Length Feature Functions We can have φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b}. Now φ_{−3 ≤ l < −2}(p, probably) = 1. But with pron(academic) = /ae kcl k ax dcl d eh m ax kcl k/, we also get φ_{−3 ≤ l < −2}(p, academic) = 1.

29 Length Feature Functions We can have φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b}. Now φ_{−3 ≤ l < −2}(p, probably) = 1. But with pron(academic) = /ae kcl k ax dcl d eh m ax kcl k/, we also get φ_{−3 ≤ l < −2}(p, academic) = 1. We want the feature to count deletions/insertions separately for different words.

30 Length Feature Functions Let e_probably be the indicator vector over the vocabulary:
e_probably = ( a: 0, ..., probable: 0, probably: 1, problem: 0, ..., zero: 0 ).

31 Length Feature Functions Scale that indicator vector by the length indicator:
1_{−3 ≤ l < −2} · e_probably = ( a: 0, ..., probable: 0, probably: 1, problem: 0, ..., zero: 0 ).

32 Length Feature Functions The length feature function is defined as
φ_{a ≤ l < b}(p, w) = 1_{a ≤ l < b} · e_w,
where l = |p| − |v| for some v ∈ pron(w), and e_w is the indicator vector over the vocabulary, with entry 1 at w_i = w and 0 at every other w_i ∈ V.
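A hedged sketch of this feature block in Python, assuming a list of (lower, upper) length ranges and a word-to-index mapping used to build e_w (all names illustrative, not the authors' code):

import numpy as np

def phi_length(p, w, pron, ranges, word_index):
    # One indicator block per length range, placed in the e_w slot for word w.
    feats = np.zeros(len(ranges) * len(word_index))
    for v in pron.get(w, ()):
        l = len(p) - len(v)  # surface length minus baseform length
        for r, (a, b) in enumerate(ranges):
            if a <= l < b:
                feats[r * len(word_index) + word_index[w]] = 1.0
    return feats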

33 Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work

34 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word?

35 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word? about, book, ball, baseball, basketball, robot, problem, probably,...

36 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word? about, book, ball, baseball, basketball, robot, problem, probably,... What if /bcl b/ occurs twice?

37 TF-IDF Feature Functions If I tell you /bcl b/ occurs at least once in the surface form, can you guess the word? about, book, ball, baseball, basketball, robot, problem, probably,... What if /bcl b/ occurs twice? probably? baseball? basketball?

38 TF-IDF Feature Functions The term (sub-word unit) frequency is defined as
TF_u(p) = (1 / (|p| − |u| + 1)) · Σ_{i=1}^{|p| − |u| + 1} 1_{u = p_{i:i+|u|−1}}.
Suppose p = [p r aa l iy]. Then TF_{/l iy/}(p) = 1/4. Intuitively, if a sub-word unit has a high TF, then it is more discriminative.
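A small sketch of this count in Python, treating the sub-word unit u and the surface form p as sequences of phone labels (an assumption about data layout, not the authors' code):

def term_frequency(u, p):
    # Fraction of length-|u| windows of p that equal u.
    u, p = tuple(u), tuple(p)
    n = len(p) - len(u) + 1
    if n <= 0:
        return 0.0
    hits = sum(1 for i in range(n) if p[i:i + len(u)] == u)
    return hits / n

# term_frequency(("l", "iy"), ("p", "r", "aa", "l", "iy")) == 0.25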

39 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word?

40 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word? buy, cry, fly, flight, lie, light, like, I, right,...

41 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word? buy, cry, fly, flight, lie, light, like, I, right,... What if /z uw/ occurs?

42 TF-IDF Feature Functions If I tell you /ay1 ay2/ occurs in a surface form, can you guess the word? buy, cry, fly, flight, lie, light, like, I, right,... What if /z uw/ occurs? zoo? zoology?

43 TF-IDF Feature Functions The inverse document (word) frequency is defined as
IDF_u = log ( |V| / |V_u| ),
where V_u = { w ∈ V : ∃ (p, w) ∈ S such that u occurs in p }. Intuitively, if a sub-word unit is found in a small, specific set of words, then it is more discriminative.

44 TF-IDF Feature Functions The final TF-IDF feature function for sub-word unit u is defined as
φ_u(p, w) = ( TF_u(p) · IDF_u ) · e_w.
This feature function is borrowed from [Zweig et al., 2010].
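Continuing the earlier sketches, the IDF table and the per-unit feature block might look like this, assuming a training set S of (pronunciation, word) pairs and the same word_index used for e_w (illustrative only, not the authors' code):

import math
import numpy as np

def idf_table(S, vocab, units):
    # IDF_u = log(|V| / |V_u|) for each sub-word unit u.
    idf = {}
    for u in units:
        u = tuple(u)
        V_u = {w for p, w in S
               if any(tuple(p[i:i + len(u)]) == u for i in range(len(p) - len(u) + 1))}
        idf[u] = math.log(len(vocab) / max(len(V_u), 1))
    return idf

def phi_tfidf(p, w, u, idf, word_index):
    # TF-IDF value for unit u, placed in the e_w slot for word w.
    feats = np.zeros(len(word_index))
    feats[word_index[w]] = term_frequency(u, p) * idf[tuple(u)]
    return feats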

45 Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work

46 Phonetic Alignment Feature Functions Given p = [p r aa l iy] and pron(probably) = { /pcl p r aa bcl b l iy/, /pcl p r aa bcl b ax bcl b l iy/ }, we already have the dictionary feature function. Can we do something fuzzier?

47 Phonetic Alignment Feature Functions
Alignment 1: surface [p r aa l iy] aligned to baseform /pcl p r aa bcl b l iy/
Alignment 2: surface [p r aa l iy] aligned to baseform /pcl p r aa bcl b ax bcl b l iy/

48 Phonetic Alignment Feature Functions Turn those two alignments into counts of aligned pairs (baseform phone → surface phone, with ε for a deletion):
pcl → ε : 2,  p → p : 2,  r → r : 2,  aa → aa : 2,  bcl → ε : 3,  b → ε : 3,  ax → ε : 1,  ...

49 Phonetic Alignment Feature Functions Each of those counts, normalized by some factor, is our feature:
φ_ph-align(p, w) = ( pcl → ε : 2/19,  p → p : 1,  r → r : 1,  bcl → ε : 3/19,  ... ).
As a sanity check, we can expect to see high weights for l → l and iy → iy, or more generally p → p for p ∈ P. We can also expect a high weight for t → dx, but a low weight for, say, aa → kcl.

50 Problem Model Features Dictionary Feature Function Length Feature Functions TF-IDF Feature Functions Phonetic Alignment Feature Functions Articulatory Feature Functions Experiments Conclusion Future Work

51 Articulatory Feature Functions: Alignment Similarly, we can define alignment feature functions at the articulatory level:
φ_artic-align(p, w) = ( lip-loc-lab → lip-loc-den : 0.5,  lip-open-clo → lip-open-wide : 0.1,  tongue-tip-den → tongue-tip-alv : 0.3,  vel-clo → vel-open : 0.2,  ... ).
Note: The values of the articulatory variables are inferred, not observed.

52 Articulatory Feature Functions: Log-likelihood We also include the log-likelihood of the alignment as a feature:
φ_LL(p, w) = ( L(p, w) − h ) / k,
where L(p, w) is the log-likelihood, h is a shift, and k is a scale.

53 Articulatory Feature Functions: Asynchrony Define the asynchrony feature functions as
φ_{a ≤ async(F_1,F_2) < b}(p, w) = 1_{a ≤ async(F_1,F_2) < b},
where F_1 and F_2 are sets of articulatory variables and async(F_1, F_2) is the asynchrony between F_1 and F_2. This can help in examples such as sense: /s eh n s/ → [s eh n t s].

54 Features: Big picture The first half of φ(p, w) stacks the feature blocks, with dimensions shown on the right:
  1_{p ∈ pron(w)}                                            1
  1_{a ≤ l < b} · e_a, ..., 1_{a ≤ l < b} · e_zero           (# of ranges) × |V|
  TF_u(p) · IDF_u · e_a, ..., TF_u(p) · IDF_u · e_zero       (# of sub-word units) × |V|
  pcl → ε, p → p, r → r, bcl → ε, ...                        (|P| + 1)² − 1

55 Features: Big picture The second half of φ(p, w) stacks the articulatory blocks:
  lip-loc-lab → lip-loc-den, lip-open-clo → lip-open-wide, tongue-tip-den → tongue-tip-alv, vel-clo → vel-open, ...      Σ_{i=1}^{7} |F_i|²
  ( L(p, w) − h ) / k                                        1
  1_{a ≤ async(tongue tip, tongue body) < b}, 1_{a ≤ async(lip, tongue) < b}, ...      (# of ranges) × (# of combinations)
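A minimal sketch of how such a stacked vector could be assembled from the blocks sketched earlier, with any remaining blocks passed in precomputed (purely illustrative, not the authors' code):

import numpy as np

def phi(p, w, pron, ranges, word_index, units, idf, extra_blocks=()):
    # Concatenate the feature blocks into one long vector.
    blocks = [np.array([phi_dict(p, w, pron)]),            # dictionary feature
              phi_length(p, w, pron, ranges, word_index)]  # length features
    blocks += [phi_tfidf(p, w, u, idf, word_index) for u in units]  # TF-IDF features
    blocks += [np.asarray(b, dtype=float) for b in extra_blocks]    # alignment, articulatory, asynchrony
    return np.concatenate(blocks)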

56 Experiments: Setting
dataset: Switchboard
lexicon: 3328 words
total words: 2893
ranges: (−∞, −3), (−3, −2), (−2, −1), ..., (2, 3), (3, ∞)
asynchrony: tongue tip and tongue body; lip and tongue; lip, tongue and glottis, velum

57 Experiments: Comparing with others Split the 2893 words into a 2492-word training set, a 165-word development set, and a 236-word test set.
Model | ER
lexicon lookup (from [Livescu, 2005]) | 59.3%
lexicon + Levenshtein distance | 41.8%
[Jyothi et al., 2011] | 29.1%
Our approach | 15.2%

58 Experiments: Comparing learning methods 5-fold cross-validation error rates (ER) for different learning methods: lex, lex + lev, CRF/DP+, PA/DP+, Pegasos/DP+ (shown as a bar chart on the slide). DP+: dictionary, length, phone bigram TF-IDF, phonetic alignment.

59 Experiments: Comparing learning methods
Algorithm | CRF | PA and Pegasos
# of non-zero entries in θ | 4,000,000 | …,000
Time for each epoch | 45 min | 15 min

60 Experiments: Feature selection 5-fold cross-validation error rates (ER) for different feature combinations with PA: p2, DP, p2 + DP, p2 + DP + length, DP+, ALL (shown as a bar chart on the slide).

61 Interesting Weights The slide shows the learned values of selected weights, among them θ for p ∈ pron(w), θ for p → p, θ for t → dx, alignment weights involving oy1 and oy2, θ for n → r, θ for f → kcl, and the length weights for probably: θ_{l < −3}, θ_{−3 ≤ l < −2}, θ_{−2 ≤ l < −1}, θ_{−1 ≤ l < 0}.

62 Conclusion We have a discriminative model. We use Passive-Aggressive or Pegasos to learn it. We have lots of features: Dictionary, Length, Phonetic alignment, Alignment of articulators, Asynchrony among articulators. It achieves good results.

63 Future Work
Learning: kernels; direct loss minimization; learning feature weights and alignments jointly.
Features: features that span across articulatory variables; context-sensitive features from alignments.
Word sequences.
Acoustics.

64 References
K. Livescu. Feature-based Pronunciation Modeling for Automatic Speech Recognition. Ph.D. thesis, Massachusetts Institute of Technology, 2005.
P. Jyothi, K. Livescu, and E. Fosler-Lussier. Lexical access experiments with context-dependent articulatory feature-based models. In Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011.
G. Zweig, P. Nguyen, and A. Acero. Continuous speech recognition with a TF-IDF acoustic model. In Proc. Interspeech, 2010.

65 [th ae ng kcl k] [y uw]

66 Learning Suppose (p, w) is drawn from a distribution ρ. The goal is to find a θ that minimizes the expected zero-one loss,
θ* = argmin_θ E_{(p,w) ∼ ρ} [ 1_{w ≠ f(p)} ].
Let S = {(p_1, w_1), ..., (p_m, w_m)}. We try to optimize
θ* = argmin_θ (λ/2) ||θ||²_2 + Σ_{i=1}^{m} ℓ(θ; p_i, w_i),
where ℓ(θ; p_i, w_i) = 1_{f(p_i) ≠ w_i}. We cannot optimize the zero-one loss directly. A common trick is to optimize the hinge loss instead,
ℓ(θ; p_i, w_i) = max_{w ∈ V} [ 1_{w_i ≠ w} − θ · φ(p_i, w_i) + θ · φ(p_i, w) ].
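A short sketch of this structured hinge loss in Python, reusing the same hypothetical phi and vocabulary list as in the earlier sketches:

import numpy as np

def hinge_loss(theta, p, w_true, vocab, phi):
    # max over w of [w != w_true] - theta . phi(p, w_true) + theta . phi(p, w)
    base = np.dot(theta, phi(p, w_true))
    return max(float(w != w_true) - base + np.dot(theta, phi(p, w)) for w in vocab)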

67 Passive-Aggressive Algorithm We can also optimize
θ_{t+1} = argmin_θ (1/2) ||θ − θ_t||²_2
s.t. θ · φ(p_t, w_t) − θ · φ(p_t, w) ≥ 0 for all w ∈ V \ {w_t}.
This is also hard to optimize, so we optimize this instead:
θ_{t+1} = argmin_θ (1/2) ||θ − θ_t||²_2
s.t. θ · φ(p_t, w_t) − θ · φ(p_t, ŵ) ≥ 0,
where ŵ = argmax_{w ∈ V \ {w_t}} θ · φ(p_t, w).

68 Phonetic Alignment Feature Functions Given p, q ∈ P ∪ {ε}, we encode p and q as two four-tuples (s_1, s_2, s_3, s_4) and (t_1, t_2, t_3, t_4), representing consonant place, consonant manner, vowel place, and vowel manner. Define the similarity between p and q as
s(p, q) = 1 if p = q = ε, and s(p, q) = Σ_{i=1}^{4} 1_{s_i = t_i} otherwise,
and run dynamic programming.
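A hedged sketch of that dynamic program in Python, assuming a hypothetical encode(phone) that returns the four-tuple and using None as the gap symbol ε (the real scoring and backtracking details may differ):

def similarity(p, q, encode):
    # s(p, q): 1 if both are gaps, else the number of matching tuple entries.
    if p is None and q is None:
        return 1
    if p is None or q is None:
        return 0  # assumption: a phone aligned to a gap contributes 0
    sp, tq = encode(p), encode(q)
    return sum(1 for a, b in zip(sp, tq) if a == b)

def align_score(surface, baseform, encode):
    # Score of the best monotone alignment; backtracking over the same table
    # would recover the aligned (baseform, surface) pairs used as features.
    n, m = len(surface), len(baseform)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                cands.append(score[i - 1][j - 1] + similarity(surface[i - 1], baseform[j - 1], encode))
            if i > 0:
                cands.append(score[i - 1][j] + similarity(surface[i - 1], None, encode))
            if j > 0:
                cands.append(score[i][j - 1] + similarity(None, baseform[j - 1], encode))
            score[i][j] = max(cands)
    return score[n][m]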

69 Phonetic Alignment Feature Functions The alignment feature function for p → q, with p, q ∈ P ∪ {ε}, is defined as
φ_{p→q}(p, w) = (1/Z_p) Σ_{k=1}^{K_w} Σ_{i=1}^{L_k} 1_{a_{k,i} = p, b_{k,i} = q},
where K_w = |pron(w)|, L_k is the length of the k-th alignment, and
Z_p = Σ_{k=1}^{K_w} Σ_{i=1}^{L_k} 1_{a_{k,i} = p} if p ∈ P, and Z_p = |p| · K_w (with |p| the length of the surface form) if p = ε.

70 Articulatory Feature Functions Let F be the set of articulatory variables, consisting of tongue tip location, tongue tip opening, tongue body location, tongue body opening, lip opening, glottis, and velum.

71 Articulatory Feature Functions Given values p, q of an articulatory variable F ∈ F, the feature function for articulatory alignment is defined as
φ_{p→q}(p, w) = (1/L) Σ_{i=1}^{L} 1_{a_i = p, b_i = q}.

72 Articulatory Feature Functions (example, shown as a frame-by-frame table on the slide) The rows give the values of voicing, nasality, tongue body, and tongue tip over time, the resulting surface sequence for sense, [s eh n t s], and the per-frame asynchrony; the epenthetic [t] appears where the variables are out of sync.

73 Articulatory Feature Functions For F_h, F_k ∈ F, the asynchrony between F_h and F_k is defined as
async(F_h, F_k) = (1/L) Σ_{i=1}^{L} (t_{h,i} − t_{k,i}).
More generally, for sets F_1, F_2 ⊆ F, the asynchrony between F_1 and F_2 is defined as
async(F_1, F_2) = (1/L) Σ_{i=1}^{L} [ (1/|F_1|) Σ_{F_h ∈ F_1} t_{h,i} − (1/|F_2|) Σ_{F_k ∈ F_2} t_{k,i} ].
Define the asynchrony feature functions as
φ_{a ≤ async(F_1,F_2) ≤ b}(p, w) = 1_{a ≤ async(F_1,F_2) ≤ b}.
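A short sketch of the generalized asynchrony in Python, assuming each articulatory variable contributes a trajectory of per-frame indices t_{h,i} of equal length (a hypothetical data layout, not the authors' code):

def asynchrony(group1, group2):
    # Mean over frames of (average index in group1 - average index in group2).
    L = len(group1[0])
    total = 0.0
    for i in range(L):
        avg1 = sum(traj[i] for traj in group1) / len(group1)
        avg2 = sum(traj[i] for traj in group2) / len(group2)
        total += avg1 - avg2
    return total / L

def phi_async(group1, group2, a, b):
    # Indicator feature 1_{a <= async(F1, F2) <= b}.
    return 1.0 if a <= asynchrony(group1, group2) <= b else 0.0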

74 Experiments
Model | ER
lexicon lookup (from [Livescu, 2005]) | 59.3%
lexicon + Levenshtein distance | 41.8%
[Jyothi et al., 2011] | 29.1%
CRF/DP+ | 21.5%
PA/DP+ | 15.2%
Pegasos/DP+ | 14.8%
PA/ALL | 15.2%
DP+: dictionary, length, phone bigram TF-IDF, phonetic alignment.
ALL: DP+, alignment for articulatory variables, asynchrony between articulators, DBN log-likelihood.
