Geoffrey Zweig May 7, 2009

Size: px
Start display at page:

Download "Geoffrey Zweig May 7, 2009"

Transcription

1 Geoffrey Zweig May 7, 2009

2 Taxonomy of LID Techniques LID Acoustic Scores Derived LM Vector space model GMM GMM Tokenization Parallel Phone Rec + LM Vectors of phone LM stats [Carrasquillo et. al. 02], [Singer et. al. 03], [Lamel and Gauvain 94] [Zissman 96], [Li et al 07]

3 The GMM Approach English Acoustic Model French Acoustic Model Output Likeliest Tamil Acoustic Model arg max l P( l x) arg max l P( l) P( x l) arg max l P( l) i k c kl N( x i ; l k, l k )

4 A digression: What Features? MFCCs PLPs Stacked Deltas Image from Approaches to Language Identification Torres-Carrasquillo et al.

5 Stacked Deltas: What Are They? Look at cepstral differences for frames separated by 2d frames Move into the future P frames and do it again Do this k times Concatenate all the deltas together

6 Stacked Deltas Seem to Make a Difference (!) From Burget et al, Discriminative Training Techniques for Acoustic Language ID

7 Polishing the Featutres: LDA Linear Discriminant Analysis (LDA) is a dimensionality reduction method [Kumar & Andereou 98] : y T x y R p x R n p n is estimated to maximize the log-likelihood of the data {x i } under assumption: class distributions are Gaussians that have different means and same variance. And the rejected n-p dimensions are modeled with a single shared gaussian. Ghinwa Choueiter, Internship Presentation

8 Visualizing LDA Images from

9 Polishing the Features: HLDA Heteroscedastic (LDA) is a generalization of (LDA) where [Kumar & Andreou 98]: y T x y R p x R n p n is estimated to maximize the log-likelihood of the data {x i } under assumption: class distributions are Gaussians that have different means and different variances. And the rejected n-p dimensions are modeled with a single shared gaussian. Ghinwa Choueiter, Internship Presentation

10 HLDA vs LDA From Saon et al., ICASSP 2000

11 HLDA: Results From: Choueiter & Zweig, ICASSP 2008

12 End of Features Digression Now, how can we improve the GMMs themselves?

13 MMI: Further Polishing the GMMs Discriminative training approach Avoids putting one classes gaussians on top of another s Instead tries to maximize the information between features and class labels, modulo the model parameters Full discussion in last lecture

14 Maximum Mutual Information Training X: Y: How much does knowing X tell about Y? MI ( X ; Y ) xvals yvals P( X xval, Y yval)log P( X P( X xval, Y xval) P( Y yval) yval)

15 Two Cases to Consider ) ( ) ( ), ( )log, ( ) ; ( y Y P x X P y Y x X P y Y x X P Y X MI y x X and Y are independent 0 ) ( ) ( ) ( ) ( )log ( ) ( ) ; ( y Y P x X P y Y P x X P y Y P x X P Y X MI y x X determines Y ) ( ) ; ( Y H Y X MI

16 Language and Acoustic Sequences a l a P l P a l P a l P A L MI ) ( ) ( ), ( )log, ( ) ; ( D is training data. Approximate by: ) ( ) ( ) ( ) ( log ) ( ) ( ), ( log ) ; ( a P l P l a P l P a P l P a l P A L MI D D ) ( ) ( ) ( log ) ( ) ( log r a P r P l a P a P l a P languages r D D Key idea: P(*) is a parametric probability distribution Estimate the parameters of P to maximize mutual information [Bahl et al. 1986]

17 MMI Training for LID Compute the sufficient statistics for the GMM of a language using only data from that language ( numerator statistics ) Do the same, but allow all data to match all GMMs ( denominator statistics ) Update the means and variances, using differences between these quantities Data Samples GMMs

18 MMI Updates For HMMs, the mean and variance update eq [Povey & Woodland 2000]:

19 MMI Results From Choueiter & Zweig, ICASSP 2008

20 Phone Recognition + Language Model (PRLM) p ih n s probably English k r p s t probably Czech Simple HMMs 5/14 Language Models 4/30 After Zissman 1996

21 But why use English phones? Parallel PRLM (PPRLM) Same methods multiple times After Zissman 1996

22 What to do the Classification With? Averaging initially used. What if we also want to use acoustic scores or other information sources? Why not: Maximum entropy model? Decision tree? SVM? Not clear what is best for combining both LM and AM features (Something a project could explore)

23 But why use Phones at all? - Gaussian Tokenizer A tokenizer is a GMM that generates sequences of indices (one index/time frame). Each index corresponds to mixture component with highest likelihood. Tokenizer used to generate index sequences for each language. Index sequences used to generate index LMs for each language.

24 Gaussian Tokenizer Language GMM lang1 utt lang2 utt Adapted from Choueiter Internship presentation

25 Training with agaussian Tokenizer Language-Independent Tokenizer: AR-utts BP- utts VI-utts. G AR LM training data BP LM training data VI LM training data Adapted from Choueiter Internship presentation

26 Testing with a Gaussian Tokenizer Language-Independent Tokenizer: AR LM utts G Index sequence BP LM. Pick Max VI LM Adapted from Choueiter Internship presentation

27 Parallel Gaussian Tokenization Analogous to Parallel-Phone Recognition + Language Modeling (PPRLM) Now using gaussian tokenization rather than phonetic tokenization Note the difference in scale between gaussian tokenization (100/second) and phonetic tokenization (20/second) Torres-Carrasquillo et al., 2002 reports no change from smoothing, compressing, etc.

28 A Digression: Information Retrieval A classical problem is to retrieve documents that are relevant to a query We know (kind of) how to do this What if we pretend that the training data we have for a language is a document? And we pretend that some sample data we want to classify is a query? Can we look up the document (language) that the query is most related to? Let s look at a common IR method: vector-space lookup with TF-IDF weightings.

29 Term-document count matrices Consider the number of occurrences of a term in a document: Each document is a count vector in N v : a column below Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth Antony Brutus Caesar Calpurnia Cleopatra mercy worser Adapted from Manning & Raghavan, CS276

30 IDF: How important is a Word? A word that occurs everywhere (like the ) doesn t tell you much IDF t log { d D : t d} Weight a term t by the log of the fraction of documents that have it IDF Antony 1.1 Brutus 0.69 Caesar 0.18 Calpurnia 1.8 Cleopatra 1.8 mercy 0.18 worser 0.4

31 TF: What words are in a document? Want more frequent words in a document to count more But also want to normalize by total document size So use term frequency Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth Antony Brutus Caesar Calpurnia Cleopatra mercy worser

32 The TF-IDF Document Vectors The value of a word in a document is now TF*IDF Each document is a vector Antony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth Antony Brutus Caesar Calpurnia Cleopatra mercy worser

33 A Query Cleopatra Calpurnia Represent this as a document too, and find the document(s) in the collection that are closest But what does closest mean? TF IDF FTIDF Antony Brutus Caesar Calpurnia Cleopatra mercy worser

34 Cosine Distance Return the document with the smallest angle from the query Or maximum cosine-of-angle Adapted from Manning & Raghavan, CS276

35 cosine(query,document) Dot product q d cos( q, d ) q d Unit vectors q d q d V i 1 q V i 1 2 i q d i i V i 1 d 2 i q i is the tf-idf weight of term i in the query d i is the tf-idf weight of term i in the document cos(q,d) is the cosine similarity of q and d or, equivalently, the cosine of the angle between q and d. Adapted from Manning & Raghavan, CS276

36 A More Sophisticated Approach Li, Ma &Lee: A Vector Space Modeling Approach to Spoken Language Identification Do phone decoding (use a universal phone set) Extract Phone-Ngram features Weight them with something similar to TF-IDF Now each utterance is a vector Labeled with the language in training Build a classifier Tested SVMs and ANNs

37 Wouldn t All Phones Be Everywhere? Maybe But we can use phone n-grams as features Those might be more informative! Raw Phones k r ae l p eh ih t p k ao p t ao k ih eh p l k r ae Phone 2-grams kr rae ael lp peh ehih iht tp pk kao Pt tao aok kih iheh ehp pl lk kr rae

38 Vector Space Model Results Unigram, bigram and trigram phone features Various numbers of universal phone units From Li et al., A Vector Space Modeling Approach

39 Break Next Up: Speaker Identification

40 Flavors of Speaker Recognition From JHU 2002 SuperSID Final Presentation Reynolds et al.

41 Information Sources for Speaker ID From JHU 2002 SuperSID Final Presentation Reynolds et al.

42 Approaches GMMs GMMs with a Universal Background Model SVMs and Supervectors MLLR Matrices High Level Information

43 The GMM Approach: Identification arg max s P( s x) arg max s P( s) P( x s) arg max s P( s) i k c ks N( x i ; s k, s k )

44 Verification and Universal Background Models Emphasis in NIST evaluations is on verification Rather than evaluating all models, just use the supposed speaker s model and a background model The background model is also used to estimate speaker models via adaptation

45 Creating UBMs From Reynolds et al., Speaker Verification using Adapted GMMs

46 MAP Adapting to a Speaker From Reynolds et al., Speaker Verification using Adapted GMMs

47 MAP Equations Smoothly interpolates between generic and speaker-specific parameters From Reynolds et al., Speaker Verification using Adapted GMMs

48 Super Vectors and SVMs Interesting recent work on using support vector machines Builds on classical GMM approach See, e.g. Campbell et al., SVM Based Speaker Verification Using a GMM Supervector Kernel Adapted GMM representing a speaker Concatenated means: feature vector for SVM. Labeled with speaker.

49 Pictorial Idea of SVMs (1) Map input into a high dimensional space With enough dimensions, the classes are likely to be separable with a hyperplane From

50 Pictorial Idea of SVMs (2) In this high dimensional space, the best hyperplane is the one with the maximum margin From K.K. Chin, Support Vector Machines applied to Speech Pattern Classification

51 Pictorial Idea of SVMs (3) Misclassifications allowed but penalized. (Soft margin)

52 SVM Classification f ( x) L i 1 i tik( x, xi ) d L i 1 i t i 0 and i 0 x i is a support vector or labeled example t i is its class: +1 or -1 K is a kernel function * Key idea is that a dot product in a high dimensional space can (sometimes) be implicitly represented as a normal function of two points in the original space s learned to maximize the soft margin

53 The Kernel Gaussian weight From Campbell et al., SVM Based Speaker Verification

54 Using SVMs Training: Create supervectors for background (non-target) speakers. (Label them b ) Create supervectors for each example of a particular speaker, s. (Label them s ). Train an SVM that differentiates s from b. Do this for each speaker. Test (verification): Someone claims to be s. Make a supervector from the sample. Run the s vs b SVM and see what it says

55 Using Speaker Adaptation We have been using MFCCs These are good for speech recognition Where we want to remove speaker variability Why should they be good for speaker ID Where we want to enhance and recognize speaker variability? What captures speaker-specific information? Parameters of a speaker-adaptation scheme! E.g. MLLR First tested by Stolcke et al., MLLR Transforms as Features in Speaker Recognition (2003)

56 Recall MLLR ' T T A (1 ) New mean is linear transformation of old An offset is added to the old mean as well Transformation matrix chosen to maximize the likelihood of the adaptation data under the transformed model One transformation (e.g. 39x39) shared by many gaussians (e.g. 1000s) See, e.g., Leggetter & Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models Similar transforms possible for covariance matrix

57 MLLR Picture New data New means are a linear transform of the old ones Old data modeled by some gaussians

58 MLLR for Speaker Recognition Compute the MLLR Matrix In practice several matrices may be used for different phonetic classes Concatenate all the numbers into a giant feature vector Label it with the speaker Train a SVM classifier Stolcke et al used the straight dot-product kernel

59 MLLR for SID Results From Stolcke et al., MLLR Transforms as Features in Speaker Recognition

60 Moving up the Food Chain From JHU 2002 SuperSID Final Presentation Reynolds et al.

61 Pitch Pitch and pitch tracks should vary from person to person Image from

62 Prosody Rate of Speech Number of words per second Number of phones per second Number of long pauses Number of short pauses Average voiced segment lengths Average unvoiced segment lengths

63 Phone N-grams Analogous to Langauge ID, Except the languages are Speakers Can also train a global background phone-ngram model After Zissman 1996

64 Personal Word Usage From JHU 2002 SuperSID Final Presentation Reynolds et al.

65 Concluding Remarks Numerous techniques for LID and SID Sometimes ad-hoc, e.g. PPRLM Sometimes principled, e.g. MLLR in SID Often requires fusing multiple methods E.g. combining prosody and idiolect with GMM scores in SID Is there room for a more unified theory?

66 Project Discussions Who is set already? Please see me Who is interested in: Formant project Language ID Language Modeling Text-to-Speech Speaker ID HMM / Speech Recognition Speech Coding => Please discuss

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture;

More information

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 6 Scoring term weighting and the vector space model Ranked retrieval Thus far, our queries have all been Boolean. Documents either

More information

Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson

Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson Vector Space Scoring Introduction to Information Retrieval Informatics 141 / CS 121 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics

More information

Term Weighting and the Vector Space Model. borrowing from: Pandu Nayak and Prabhakar Raghavan

Term Weighting and the Vector Space Model. borrowing from: Pandu Nayak and Prabhakar Raghavan Term Weighting and the Vector Space Model borrowing from: Pandu Nayak and Prabhakar Raghavan IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes

More information

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002

CS276A Text Information Retrieval, Mining, and Exploitation. Lecture 4 15 Oct 2002 CS276A Text Information Retrieval, Mining, and Exploitation Lecture 4 15 Oct 2002 Recap of last time Index size Index construction techniques Dynamic indices Real world considerations 2 Back of the envelope

More information

Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson

Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Collection Frequency, cf Define: The total

More information

Scoring, Term Weighting and the Vector Space

Scoring, Term Weighting and the Vector Space Scoring, Term Weighting and the Vector Space Model Francesco Ricci Most of these slides comes from the course: Information Retrieval and Web Search, Christopher Manning and Prabhakar Raghavan Content [J

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture; IIR Sections

More information

Term Weighting and Vector Space Model. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Term Weighting and Vector Space Model. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Term Weighting and Vector Space Model Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Ranked retrieval Thus far, our queries have all been Boolean. Documents either

More information

Information Retrieval and Topic Models. Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin)

Information Retrieval and Topic Models. Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin) Information Retrieval and Topic Models Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin) Sec. 1.1 Unstructured data in 1620 Which plays of Shakespeare

More information

Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson

Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Vector Space Scoring Introduction to Information Retrieval INF 141 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Querying Corpus-wide statistics Querying

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2017 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2018. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-2018 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 2 1 Boolean Retrieval Thus

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-017 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 1 Boolean Retrieval Thus far,

More information

Dealing with Text Databases

Dealing with Text Databases Dealing with Text Databases Unstructured data Boolean queries Sparse matrix representation Inverted index Counts vs. frequencies Term frequency tf x idf term weights Documents as vectors Cosine similarity

More information

Support Vector Machines using GMM Supervectors for Speaker Verification

Support Vector Machines using GMM Supervectors for Speaker Verification 1 Support Vector Machines using GMM Supervectors for Speaker Verification W. M. Campbell, D. E. Sturim, D. A. Reynolds MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420 Corresponding author e-mail:

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 21: Speaker Adaptation Instructor: Preethi Jyothi Oct 23, 2017 Speaker variations Major cause of variability in speech is the differences between speakers Speaking

More information

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given

More information

] Automatic Speech Recognition (CS753)

] Automatic Speech Recognition (CS753) ] Automatic Speech Recognition (CS753) Lecture 17: Discriminative Training for HMMs Instructor: Preethi Jyothi Sep 28, 2017 Discriminative Training Recall: MLE for HMMs Maximum likelihood estimation (MLE)

More information

Informa(on Retrieval

Informa(on Retrieval Introduc*on to Informa(on Retrieval Lecture 6-2: The Vector Space Model Binary incidence matrix Anthony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth... ANTHONY BRUTUS CAESAR CALPURNIA

More information

Informa(on Retrieval

Informa(on Retrieval Introduc*on to Informa(on Retrieval Lecture 6-2: The Vector Space Model Outline The vector space model 2 Binary incidence matrix Anthony and Cleopatra Julius Caesar The Tempest Hamlet Othello Macbeth...

More information

Speaker Verification Using Accumulative Vectors with Support Vector Machines

Speaker Verification Using Accumulative Vectors with Support Vector Machines Speaker Verification Using Accumulative Vectors with Support Vector Machines Manuel Aguado Martínez, Gabriel Hernández-Sierra, and José Ramón Calvo de Lara Advanced Technologies Application Center, Havana,

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information

Multiclass Discriminative Training of i-vector Language Recognition

Multiclass Discriminative Training of i-vector Language Recognition Odyssey 214: The Speaker and Language Recognition Workshop 16-19 June 214, Joensuu, Finland Multiclass Discriminative Training of i-vector Language Recognition Alan McCree Human Language Technology Center

More information

Support Vector Machines and Speaker Verification

Support Vector Machines and Speaker Verification 1 Support Vector Machines and Speaker Verification David Cinciruk March 6, 2013 2 Table of Contents Review of Speaker Verification Introduction to Support Vector Machines Derivation of SVM Equations Soft

More information

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Statistical NLP Spring The Noisy Channel Model

Statistical NLP Spring The Noisy Channel Model Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Lecture 10. Discriminative Training, ROVER, and Consensus. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen

Lecture 10. Discriminative Training, ROVER, and Consensus. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen Lecture 10 Discriminative Training, ROVER, and Consensus Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com

More information

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

Segmental Recurrent Neural Networks for End-to-end Speech Recognition Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave

More information

CS 572: Information Retrieval

CS 572: Information Retrieval CS 572: Information Retrieval Lecture 5: Term Weighting and Ranking Acknowledgment: Some slides in this lecture are adapted from Chris Manning (Stanford) and Doug Oard (Maryland) Lecture Plan Skip for

More information

Heeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University

Heeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University Heeyoul (Henry) Choi Dept. of Computer Science Texas A&M University hchoi@cs.tamu.edu Introduction Speaker Adaptation Eigenvoice Comparison with others MAP, MLLR, EMAP, RMP, CAT, RSW Experiments Future

More information

Joint Factor Analysis for Speaker Verification

Joint Factor Analysis for Speaker Verification Joint Factor Analysis for Speaker Verification Mengke HU ASPITRG Group, ECE Department Drexel University mengke.hu@gmail.com October 12, 2012 1/37 Outline 1 Speaker Verification Baseline System Session

More information

PV211: Introduction to Information Retrieval

PV211: Introduction to Information Retrieval PV211: Introduction to Information Retrieval http://www.fi.muni.cz/~sojka/pv211 IIR 6: Scoring, term weighting, the vector space model Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics,

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Classification: k nearest neighbors Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 28 Introduction Classification = supervised method

More information

Augmented Statistical Models for Speech Recognition

Augmented Statistical Models for Speech Recognition Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent

More information

Front-End Factor Analysis For Speaker Verification

Front-End Factor Analysis For Speaker Verification IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING Front-End Factor Analysis For Speaker Verification Najim Dehak, Patrick Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet, Abstract This

More information

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models Statistical NLP Spring 2010 The Noisy Channel Model Lecture 9: Acoustic Models Dan Klein UC Berkeley Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions Language model: Distributions

More information

IBM Research Report. Training Universal Background Models for Speaker Recognition

IBM Research Report. Training Universal Background Models for Speaker Recognition RC24953 (W1003-002) March 1, 2010 Other IBM Research Report Training Universal Bacground Models for Speaer Recognition Mohamed Kamal Omar, Jason Pelecanos IBM Research Division Thomas J. Watson Research

More information

Robust Speaker Identification

Robust Speaker Identification Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }

More information

TDDD43. Information Retrieval. Fang Wei-Kleiner. ADIT/IDA Linköping University. Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1

TDDD43. Information Retrieval. Fang Wei-Kleiner. ADIT/IDA Linköping University. Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1 TDDD43 Information Retrieval Fang Wei-Kleiner ADIT/IDA Linköping University Fang Wei-Kleiner ADIT/IDA LiU TDDD43 Information Retrieval 1 Outline 1. Introduction 2. Inverted index 3. Ranked Retrieval tf-idf

More information

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging

ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers

More information

Design and Implementation of Speech Recognition Systems

Design and Implementation of Speech Recognition Systems Design and Implementation of Speech Recognition Systems Spring 2013 Class 7: Templates to HMMs 13 Feb 2013 1 Recap Thus far, we have looked at dynamic programming for string matching, And derived DTW from

More information

PV211: Introduction to Information Retrieval

PV211: Introduction to Information Retrieval PV211: Introduction to Information Retrieval http://www.fi.muni.cz/~sojka/pv211 IIR 11: Probabilistic Information Retrieval Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk

More information

CS 572: Information Retrieval

CS 572: Information Retrieval CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Document Similarity in Information Retrieval

Document Similarity in Information Retrieval Document Similarity in Information Retrieval Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin) Sec. 1.1 Unstructured data in 1620 Which plays of

More information

Statistical NLP Spring Digitizing Speech

Statistical NLP Spring Digitizing Speech Statistical NLP Spring 2008 Lecture 10: Acoustic Models Dan Klein UC Berkeley Digitizing Speech 1 Frame Extraction A frame (25 ms wide) extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon

More information

Digitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ...

Digitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ... Statistical NLP Spring 2008 Digitizing Speech Lecture 10: Acoustic Models Dan Klein UC Berkeley Frame Extraction A frame (25 ms wide extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon Arnfield

More information

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3

More information

Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition

Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden

More information

A Small Footprint i-vector Extractor

A Small Footprint i-vector Extractor A Small Footprint i-vector Extractor Patrick Kenny Odyssey Speaker and Language Recognition Workshop June 25, 2012 1 / 25 Patrick Kenny A Small Footprint i-vector Extractor Outline Introduction Review

More information

CS 188: Artificial Intelligence Fall 2011

CS 188: Artificial Intelligence Fall 2011 CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details

More information

Acoustic Unit Discovery (AUD) Models. Leda Sarı

Acoustic Unit Discovery (AUD) Models. Leda Sarı Acoustic Unit Discovery (AUD) Models Leda Sarı Lucas Ondel and Lukáš Burget A summary of AUD experiments from JHU Frederick Jelinek Summer Workshop 2016 lsari2@illinois.edu November 07, 2016 1 / 23 The

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipop Koehn) 30 January

More information

speaker recognition using gmm-ubm semester project presentation

speaker recognition using gmm-ubm semester project presentation speaker recognition using gmm-ubm semester project presentation OBJECTIVES OF THE PROJECT study the GMM-UBM speaker recognition system implement this system with matlab document the code and how it interfaces

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Speaker recognition by means of Deep Belief Networks

Speaker recognition by means of Deep Belief Networks Speaker recognition by means of Deep Belief Networks Vasileios Vasilakakis, Sandro Cumani, Pietro Laface, Politecnico di Torino, Italy {first.lastname}@polito.it 1. Abstract Most state of the art speaker

More information

A Generative Model Based Kernel for SVM Classification in Multimedia Applications

A Generative Model Based Kernel for SVM Classification in Multimedia Applications Appears in Neural Information Processing Systems, Vancouver, Canada, 2003. A Generative Model Based Kernel for SVM Classification in Multimedia Applications Pedro J. Moreno Purdy P. Ho Hewlett-Packard

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.

More information

Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer

Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer Associative Memory Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer Storage Analysis Sparse Coding Implementation on a

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 18: Latent Semantic Indexing Hinrich Schütze Center for Information and Language Processing, University of Munich 2013-07-10 1/43

More information

Midterm sample questions

Midterm sample questions Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts

More information

Natural Language Processing. Statistical Inference: n-grams

Natural Language Processing. Statistical Inference: n-grams Natural Language Processing Statistical Inference: n-grams Updated 3/2009 Statistical Inference Statistical Inference consists of taking some data (generated in accordance with some unknown probability

More information

Hidden Markov Models in Language Processing

Hidden Markov Models in Language Processing Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications

More information

Lecture 4 Ranking Search Results. Many thanks to Prabhakar Raghavan for sharing most content from the following slides

Lecture 4 Ranking Search Results. Many thanks to Prabhakar Raghavan for sharing most content from the following slides Lecture 4 Ranking Search Results Many thanks to Prabhakar Raghavan for sharing most content from the following slides Recap of the previous lecture Index construction Doing sorting with limited main memory

More information

Matrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Matrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Matrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Latent Semantic Indexing Outline Introduction Linear Algebra Refresher

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

Lecture 3: ASR: HMMs, Forward, Viterbi

Lecture 3: ASR: HMMs, Forward, Viterbi Original slides by Dan Jurafsky CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 3: ASR: HMMs, Forward, Viterbi Fun informative read on phonetics The

More information

Session 1: Pattern Recognition

Session 1: Pattern Recognition Proc. Digital del Continguts Musicals Session 1: Pattern Recognition 1 2 3 4 5 Music Content Analysis Pattern Classification The Statistical Approach Distribution Models Singing Detection Dan Ellis

More information

CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring Final Exam CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

More information

MATRIX DECOMPOSITION AND LATENT SEMANTIC INDEXING (LSI) Introduction to Information Retrieval CS 150 Donald J. Patterson

MATRIX DECOMPOSITION AND LATENT SEMANTIC INDEXING (LSI) Introduction to Information Retrieval CS 150 Donald J. Patterson MATRIX DECOMPOSITION AND LATENT SEMANTIC INDEXING (LSI) Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Latent

More information

Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model

Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model (most slides from Sharon Goldwater; some adapted from Philipp Koehn) 5 October 2016 Nathan Schneider

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Sparse Models for Speech Recognition

Sparse Models for Speech Recognition Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations

More information

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION Sarika Hegde 1, K. K. Achary 2 and Surendra Shetty 3 1 Department of Computer Applications, NMAM.I.T., Nitte, Karkala Taluk,

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,

More information

Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning Midterm, Tues April 8 Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend

More information

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.

Sparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation. ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent

More information

ANLP Lecture 22 Lexical Semantics with Dense Vectors

ANLP Lecture 22 Lexical Semantics with Dense Vectors ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous

More information

The Bayes classifier

The Bayes classifier The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

More information

Discriminative models for speech recognition

Discriminative models for speech recognition Discriminative models for speech recognition Anton Ragni Peterhouse University of Cambridge A thesis submitted for the degree of Doctor of Philosophy 2013 Declaration This dissertation is the result of

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph

More information

MAP adaptation with SphinxTrain

MAP adaptation with SphinxTrain MAP adaptation with SphinxTrain David Huggins-Daines dhuggins@cs.cmu.edu Language Technologies Institute Carnegie Mellon University MAP adaptation with SphinxTrain p.1/12 Theory of MAP adaptation Standard

More information

Information Retrieval Using Boolean Model SEEM5680

Information Retrieval Using Boolean Model SEEM5680 Information Retrieval Using Boolean Model SEEM5680 1 Unstructured (text) vs. structured (database) data in 1996 2 2 Unstructured (text) vs. structured (database) data in 2009 3 3 The problem of IR Goal

More information

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 IIR 18: Latent Semantic Indexing Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,

More information

Machine Learning for Structured Prediction

Machine Learning for Structured Prediction Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for

More information

Temporal Modeling and Basic Speech Recognition

Temporal Modeling and Basic Speech Recognition UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab Temporal Modeling and Basic Speech Recognition Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Today s lecture Recognizing

More information

Diagonal Priors for Full Covariance Speech Recognition

Diagonal Priors for Full Covariance Speech Recognition Diagonal Priors for Full Covariance Speech Recognition Peter Bell 1, Simon King 2 Centre for Speech Technology Research, University of Edinburgh Informatics Forum, 10 Crichton St, Edinburgh, EH8 9AB, UK

More information

Mixtures of Gaussians with Sparse Structure

Mixtures of Gaussians with Sparse Structure Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or

More information

Hidden Markov Models Hamid R. Rabiee

Hidden Markov Models Hamid R. Rabiee Hidden Markov Models Hamid R. Rabiee 1 Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature could be modeled as a Markov process. However

More information

Ngram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer

Ngram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer + Ngram Review October13, 2017 Professor Meteer CS 136 Lecture 10 Language Modeling Thanks to Dan Jurafsky for these slides + ASR components n Feature Extraction, MFCCs, start of Acoustic n HMMs, the Forward

More information