Geoffrey Zweig May 7, 2009
Transcription
1 Geoffrey Zweig May 7, 2009
2 Taxonomy of LID Techniques. LID branches into: Acoustic Scores (GMM); Derived LM (GMM Tokenization, Parallel Phone Rec + LM); Vector space model (Vectors of phone LM stats). References: [Torres-Carrasquillo et al. 02], [Singer et al. 03], [Lamel and Gauvain 94], [Zissman 96], [Li et al. 07]
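3 The GMM Approach. Score the input against an English Acoustic Model, a French Acoustic Model, ..., a Tamil Acoustic Model, and output the likeliest language: $\hat{l} = \arg\max_l P(l \mid x) = \arg\max_l P(l)\,P(x \mid l) = \arg\max_l P(l) \prod_i \sum_k c_{kl}\, \mathcal{N}(x_i; \mu_{lk}, \Sigma_{lk})$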
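A minimal sketch of the decision rule on this slide, assuming diagonal-covariance Gaussians and frames that are independent given the language. The function names, the `(weight, mean, variance)` tuple layout, and the toy 1-D models are illustrative assumptions, not part of the lecture.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_frame_loglik(x, gmm):
    """log sum_k c_k N(x; mu_k, Sigma_k) for a single frame x."""
    return log_sum_exp([
        math.log(c) + log_gauss_diag(x, mean, var) for c, mean, var in gmm
    ])

def identify_language(frames, gmms, priors):
    """Return (argmax_l, scores) with scores[l] = log P(l) + sum_i log p(x_i | l)."""
    scores = {
        lang: math.log(priors[lang]) + sum(gmm_frame_loglik(x, gmm) for x in frames)
        for lang, gmm in gmms.items()
    }
    return max(scores, key=scores.get), scores

# Toy 1-D models with made-up parameters, for illustration only.
toy_gmms = {
    "English": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "French": [(0.5, [4.0], [1.0]), (0.5, [5.0], [1.0])],
}
toy_priors = {"English": 0.5, "French": 0.5}
best, _ = identify_language([[0.2], [0.9], [0.4]], toy_gmms, toy_priors)
```

Working in the log domain turns the product over frames into a sum, which avoids underflow on long utterances.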
4 A digression: What Features? MFCCs PLPs Stacked Deltas Image from Approaches to Language Identification Torres-Carrasquillo et al.
5 Stacked Deltas: What Are They? Look at cepstral differences for frames separated by 2d frames. Move into the future P frames and do it again. Do this k times. Concatenate all the deltas together.
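The four steps above can be sketched as follows. The default values of `d`, `P` and `k` and the clamping of indices at the utterance edges are assumptions made for illustration; the slide does not fix them.

```python
def stacked_deltas(cepstra, d=1, P=3, k=7):
    """For each frame t, concatenate k delta vectors.

    Block i (i = 0..k-1) is c[t + i*P + d] - c[t + i*P - d], i.e. a
    difference between frames 2d apart, taken i*P frames into the future.
    Indices past either end are clamped to the nearest valid frame
    (an assumption of this sketch).
    """
    last = len(cepstra) - 1
    features = []
    for t in range(len(cepstra)):
        stacked = []
        for i in range(k):
            center = t + i * P
            ahead = min(max(center + d, 0), last)
            behind = min(max(center - d, 0), last)
            stacked.extend(a - b for a, b in zip(cepstra[ahead], cepstra[behind]))
        features.append(stacked)
    return features
```

With `dim` cepstral coefficients per frame, each output vector has `k * dim` entries.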
6 Stacked Deltas Seem to Make a Difference (!) From Burget et al, Discriminative Training Techniques for Acoustic Language ID
7 Polishing the Features: LDA. Linear Discriminant Analysis (LDA) is a dimensionality reduction method [Kumar & Andreou 98]: $y = \theta^T x$, with $y \in \mathbb{R}^p$, $x \in \mathbb{R}^n$, $p < n$. $\theta$ is estimated to maximize the log-likelihood of the data $\{x_i\}$ under the assumption that the class distributions are Gaussians with different means and the same variance, and that the rejected $n-p$ dimensions are modeled with a single shared Gaussian. Ghinwa Choueiter, Internship Presentation
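As a small illustration, here is the simplest special case: two classes, 2-D inputs, projection to one dimension. It uses the classical Fisher direction (inverse within-class scatter times the mean difference), which is the standard closed form for two classes with a shared covariance. It is a sketch of that special case, not the general maximum-likelihood estimation described on the slide, and all names and data are made up.

```python
import math

def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def within_scatter_2d(classes):
    """Pooled within-class scatter matrix for 2-D data."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in classes:
        m = mean_vec(rows)
        for r in rows:
            d0, d1 = r[0] - m[0], r[1] - m[1]
            s[0][0] += d0 * d0
            s[0][1] += d0 * d1
            s[1][0] += d1 * d0
            s[1][1] += d1 * d1
    return s

def fisher_direction_2d(class_a, class_b):
    """Unit vector w proportional to S_w^{-1} (mean_b - mean_a)."""
    s = within_scatter_2d([class_a, class_b])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]

def project(w, x):
    """One-dimensional projection y = w^T x."""
    return w[0] * x[0] + w[1] * x[1]

# Made-up data: large shared spread along axis 0, class difference along axis 1.
class_a = [[0.0, 0.0], [2.0, 0.2], [4.0, -0.2], [6.0, 0.0]]
class_b = [[0.0, 1.0], [2.0, 1.2], [4.0, 0.8], [6.0, 1.0]]
w = fisher_direction_2d(class_a, class_b)
```

On this toy data the direction points almost entirely along the second axis, so the high-variance but uninformative first axis is discarded.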
8 Visualizing LDA Images from
9 Polishing the Features: HLDA. Heteroscedastic LDA (HLDA) is a generalization of LDA [Kumar & Andreou 98]: $y = \theta^T x$, with $y \in \mathbb{R}^p$, $x \in \mathbb{R}^n$, $p < n$. $\theta$ is estimated to maximize the log-likelihood of the data $\{x_i\}$ under the assumption that the class distributions are Gaussians with different means and different variances, and that the rejected $n-p$ dimensions are modeled with a single shared Gaussian. Ghinwa Choueiter, Internship Presentation
10 HLDA vs LDA From Saon et al., ICASSP 2000
11 HLDA: Results From: Choueiter & Zweig, ICASSP 2008
12 End of Features Digression Now, how can we improve the GMMs themselves?
13 MMI: Further Polishing the GMMs. Discriminative training approach. Avoids putting one class's Gaussians on top of another's. Instead tries to maximize the mutual information between features and class labels, modulo the model parameters. Full discussion in last lecture.
14 Maximum Mutual Information Training. Given two random variables X and Y: how much does knowing X tell about Y? $MI(X;Y) = \sum_{xval} \sum_{yval} P(X = xval, Y = yval) \log \frac{P(X = xval, Y = yval)}{P(X = xval)\,P(Y = yval)}$
15 Two Cases to Consider. In general: $MI(X;Y) = \sum_x \sum_y P(X = x, Y = y) \log \frac{P(X = x, Y = y)}{P(X = x)\,P(Y = y)}$
X and Y are independent: $MI(X;Y) = \sum_x \sum_y P(X = x)\,P(Y = y) \log \frac{P(X = x)\,P(Y = y)}{P(X = x)\,P(Y = y)} = 0$
X determines Y: $MI(X;Y) = H(Y)$
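Both cases can be checked numerically. The sketch below computes MI (in nats) from a joint distribution stored as a dictionary; the two toy distributions are made up for illustration.

```python
import math

def mutual_information(joint):
    """MI(X;Y) in nats from a dict {(x, y): P(X=x, Y=y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0.0
    )

def entropy(dist):
    """H(Y) in nats from a dict {y: P(Y=y)}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

# Case 1: X and Y independent, so every joint entry equals P(x) P(y).
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Case 2: X determines Y (x=0 always gives "a", x=1 always gives "b").
deterministic = {(0, "a"): 0.5, (1, "b"): 0.5}

mi_independent = mutual_information(independent)      # 0
mi_deterministic = mutual_information(deterministic)  # equals H(Y) = log 2
```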
16 Language and Acoustic Sequences. $MI(L;A) = \sum_l \sum_a P(l, a) \log \frac{P(l, a)}{P(l)\,P(a)}$
D is training data. Approximate by: $MI(L;A) \approx \sum_{D} \log \frac{P(l, a)}{P(l)\,P(a)} = \sum_{D} \log \frac{P(l)\,P(a \mid l)}{P(l)\,P(a)} = \sum_{D} \log \frac{P(a \mid l)}{P(a)} = \sum_{D} \log \frac{P(a \mid l)}{\sum_{r \in languages} P(r)\,P(a \mid r)}$
Key idea: P(*) is a parametric probability distribution. Estimate the parameters of P to maximize mutual information [Bahl et al. 1986]
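A hedged sketch of the approximated objective: for each labeled training pair, add the log-likelihood under the correct language and subtract the log of the prior-weighted sum over all languages. The `loglik` callback, the single 1-D unit-variance Gaussian per language, and the toy data are assumptions made for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def mmi_objective(data, loglik, priors):
    """Sum over (l, a) in D of log [ P(a|l) / sum_r P(r) P(a|r) ]."""
    total = 0.0
    for lang, a in data:
        numerator = loglik(a, lang)
        denominator = log_sum_exp(
            [math.log(priors[r]) + loglik(a, r) for r in priors]
        )
        total += numerator - denominator
    return total

def make_loglik(means):
    """Toy model: one unit-variance 1-D Gaussian per language."""
    def loglik(a, lang):
        return -0.5 * (math.log(2.0 * math.pi) + (a - means[lang]) ** 2)
    return loglik

toy_data = [("en", 0.0), ("fr", 4.0)]
toy_priors = {"en": 0.5, "fr": 0.5}
separated = mmi_objective(toy_data, make_loglik({"en": 0.0, "fr": 4.0}), toy_priors)
overlapping = mmi_objective(toy_data, make_loglik({"en": 2.0, "fr": 2.0}), toy_priors)
```

When the two models are identical the objective is zero; it grows as the models separate the languages, with an upper bound of $-\sum_D \log P(l)$.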
17 MMI Training for LID. Compute the sufficient statistics for the GMM of a language using only data from that language ("numerator statistics"). Do the same, but allow all data to match all GMMs ("denominator statistics"). Update the means and variances, using differences between these quantities. [Figure: data samples matched against GMMs]
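The two kinds of statistics can be sketched for a deliberately simplified model: a single 1-D Gaussian per language rather than a full GMM. The numerator accumulates each sample only under its labeled language; the denominator spreads each sample over all languages by posterior probability. Names and data are illustrative assumptions, and the parameter update itself is not shown.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def accumulate_stats(data, means, variances, priors):
    """Return (num, den), each mapping language -> [occupancy, sum of x]."""
    num = {lang: [0.0, 0.0] for lang in means}
    den = {lang: [0.0, 0.0] for lang in means}
    for label, x in data:
        # Numerator: the sample counts only toward its own language.
        num[label][0] += 1.0
        num[label][1] += x
        # Denominator: the sample is shared across all languages
        # in proportion to the posterior P(l | x).
        logp = {
            lang: math.log(priors[lang]) + log_gauss(x, means[lang], variances[lang])
            for lang in means
        }
        top = max(logp.values())
        norm = sum(math.exp(v - top) for v in logp.values())
        for lang in means:
            post = math.exp(logp[lang] - top) / norm
            den[lang][0] += post
            den[lang][1] += post * x
    return num, den

toy_data = [("en", 0.0), ("en", 0.4), ("fr", 3.0)]
num, den = accumulate_stats(
    toy_data,
    means={"en": 0.0, "fr": 3.0},
    variances={"en": 1.0, "fr": 1.0},
    priors={"en": 0.5, "fr": 0.5},
)
```

The difference `num[l][0] - den[l][0]` is positive when the model under-claims the data of language `l`, which is the signal the update uses.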
18 MMI Updates For HMMs, the mean and variance update eq [Povey & Woodland 2000]:
19 MMI Results From Choueiter & Zweig, ICASSP 2008
20 Phone Recognition + Language Model (PRLM). "p ih n s": probably English. "k r p s t": probably Czech. Simple HMMs: 5/14. Language Models: 4/30. After Zissman 1996
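A toy sketch of the language-model half of PRLM: train one phone bigram model per language on decoded phone strings, then pick the language whose model gives the test string the highest probability. The add-one smoothing, the function names and the tiny training sets are assumptions for illustration; the phone recognizer itself is not modeled.

```python
import math
from collections import Counter

def train_bigram_lm(phone_strings, vocab_size):
    """Count phone bigrams (with a start symbol) for one language."""
    bigrams, contexts = Counter(), Counter()
    for phones in phone_strings:
        seq = ["<s>"] + list(phones)
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, vocab_size

def bigram_logprob(phones, lm):
    """Add-one smoothed log probability of a phone string."""
    bigrams, contexts, vocab_size = lm
    seq = ["<s>"] + list(phones)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(seq, seq[1:])
    )

def classify(phones, lms):
    """Return the language whose bigram LM scores the string highest."""
    return max(lms, key=lambda lang: bigram_logprob(phones, lms[lang]))

phone_set = ["p", "ih", "n", "s", "k", "r", "t"]
lms = {
    "English": train_bigram_lm([["p", "ih", "n", "s"], ["p", "ih", "n"]], len(phone_set)),
    "Czech": train_bigram_lm([["k", "r", "p", "s", "t"], ["k", "r", "p"]], len(phone_set)),
}
guess = classify(["p", "ih", "n"], lms)
```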
21 But why use English phones? Parallel PRLM (PPRLM) Same methods multiple times After Zissman 1996
22 What to do the Classification With? Averaging initially used. What if we also want to use acoustic scores or other information sources? Why not: Maximum entropy model? Decision tree? SVM? Not clear what is best for combining both LM and AM features (Something a project could explore)
23 But why use Phones at all? - Gaussian Tokenizer A tokenizer is a GMM that generates sequences of indices (one index/time frame). Each index corresponds to mixture component with highest likelihood. Tokenizer used to generate index sequences for each language. Index sequences used to generate index LMs for each language.
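A minimal sketch of such a tokenizer, assuming diagonal-covariance components and assuming that "highest likelihood" means the weighted component likelihood $c_k \mathcal{N}(x; \mu_k, \Sigma_k)$. The toy three-component model is made up.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def tokenize(frames, gmm):
    """Map each frame to the index of its best-scoring mixture component."""
    indices = []
    for x in frames:
        scores = [
            math.log(c) + log_gauss_diag(x, mean, var) for c, mean, var in gmm
        ]
        indices.append(max(range(len(scores)), key=scores.__getitem__))
    return indices

# Toy tokenizer: (weight, mean, variance) per component, values made up.
toy_tokenizer = [(0.4, [0.0], [1.0]), (0.3, [3.0], [1.0]), (0.3, [6.0], [1.0])]
tokens = tokenize([[0.1], [2.8], [6.3], [5.9], [-0.4]], toy_tokenizer)
```

The resulting index sequence plays the role that the phone string plays in PRLM: it is what the per-language index LMs are trained and scored on.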
24 Gaussian Tokenizer Language GMM lang1 utt lang2 utt Adapted from Choueiter Internship presentation
25 Training with a Gaussian Tokenizer. Language-Independent Tokenizer G: AR-utts, BP-utts and VI-utts pass through G to produce AR LM training data, BP LM training data and VI LM training data. Adapted from Choueiter Internship presentation
26 Testing with a Gaussian Tokenizer Language-Independent Tokenizer: AR LM utts G Index sequence BP LM. Pick Max VI LM Adapted from Choueiter Internship presentation
27 Parallel Gaussian Tokenization Analogous to Parallel-Phone Recognition + Language Modeling (PPRLM) Now using gaussian tokenization rather than phonetic tokenization Note the difference in scale between gaussian tokenization (100/second) and phonetic tokenization (20/second) Torres-Carrasquillo et al., 2002 reports no change from smoothing, compressing, etc.
28 A Digression: Information Retrieval. A classical problem is to retrieve documents that are relevant to a query. We know (kind of) how to do this. What if we pretend that the training data we have for a language is a document? And we pretend that some sample data we want to classify is a query? Can we look up the document (language) that the query is most related to? Let's look at a common IR method: vector-space lookup with TF-IDF weightings.
29 Term-document count matrices. Consider the number of occurrences of a term in a document: each document is a count vector in $\mathbb{N}^{|V|}$ (a column of the table). [Table: columns are Antony and Cleopatra, Julius Caesar, The Tempest, Hamlet, Othello, Macbeth; rows are Antony, Brutus, Caesar, Calpurnia, Cleopatra, mercy, worser] Adapted from Manning & Raghavan, CS276
30 IDF: How important is a Word? A word that occurs everywhere (like "the") doesn't tell you much. $IDF_t = \log \frac{|D|}{|\{d \in D : t \in d\}|}$. Weight a term t by the log of the inverse of the fraction of documents that contain it. IDF values: Antony 1.1, Brutus 0.69, Caesar 0.18, Calpurnia 1.8, Cleopatra 1.8, mercy 0.18, worser 0.4
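The listed values are consistent with a natural logarithm over a six-document collection. The sketch below recomputes them from document frequencies; those frequencies are not given on the slide and are inferred here from the IDF values, so treat them as an assumption.

```python
import math

def idf(n_docs, doc_freq):
    """IDF_t = log(|D| / |{d in D : t in d}|), natural log."""
    return math.log(n_docs / doc_freq)

# Number of plays (out of 6) containing each term, inferred from the
# IDF values on the slide rather than read from the count matrix.
inferred_doc_freq = {
    "Antony": 2, "Brutus": 3, "Caesar": 5, "Calpurnia": 1,
    "Cleopatra": 1, "mercy": 5, "worser": 4,
}
idf_values = {term: idf(6, df) for term, df in inferred_doc_freq.items()}
```

A term found in only one play (Calpurnia) gets the largest weight, while a term found in five of six (Caesar, mercy) is nearly ignored.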
31 TF: What words are in a document? Want more frequent words in a document to count more. But also want to normalize by total document size. So use term frequency. [Table: columns are Antony and Cleopatra, Julius Caesar, The Tempest, Hamlet, Othello, Macbeth; rows are Antony, Brutus, Caesar, Calpurnia, Cleopatra, mercy, worser]
32 The TF-IDF Document Vectors. The value of a word in a document is now TF*IDF. Each document is a vector. [Table: columns are Antony and Cleopatra, Julius Caesar, The Tempest, Hamlet, Othello, Macbeth; rows are Antony, Brutus, Caesar, Calpurnia, Cleopatra, mercy, worser]
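A small sketch of building such vectors, assuming TF is the raw count divided by document length and IDF uses the natural log. The two toy "documents" are invented for illustration and are far smaller than the plays in the table.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return ({doc: {term: tf*idf}}, {term: idf}) for tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))  # each document counts a term once
    idf = {t: math.log(n_docs / df) for t, df in doc_freq.items()}
    vectors = {}
    for name, words in docs.items():
        counts = Counter(words)
        # TF is normalized by document length, as the TF slide suggests.
        vectors[name] = {t: (c / len(words)) * idf[t] for t, c in counts.items()}
    return vectors, idf

toy_docs = {
    "play_a": ["caesar", "brutus", "caesar"],
    "play_b": ["caesar", "cleopatra"],
}
vectors, idf = tf_idf_vectors(toy_docs)
```

Here "caesar" appears in both documents, so its IDF and therefore its TF-IDF weight is zero everywhere, however often it occurs.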
33 A Query: "Cleopatra Calpurnia". Represent this as a document too, and find the document(s) in the collection that are closest. But what does "closest" mean? [Table: columns are TF, IDF, TF-IDF; rows are Antony, Brutus, Caesar, Calpurnia, Cleopatra, mercy, worser]
34 Cosine Distance Return the document with the smallest angle from the query Or maximum cosine-of-angle Adapted from Manning & Raghavan, CS276
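35 cosine(query,document). $\cos(\vec{q}, \vec{d}) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}|\,|\vec{d}|} = \frac{\vec{q}}{|\vec{q}|} \cdot \frac{\vec{d}}{|\vec{d}|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$. The first form is the dot product divided by the lengths; the second is the dot product of unit vectors. $q_i$ is the tf-idf weight of term i in the query. $d_i$ is the tf-idf weight of term i in the document. $\cos(\vec{q}, \vec{d})$ is the cosine similarity of $\vec{q}$ and $\vec{d}$ or, equivalently, the cosine of the angle between $\vec{q}$ and $\vec{d}$. Adapted from Manning & Raghavan, CS276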
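A sketch of the same formula over sparse vectors stored as dictionaries from term to tf-idf weight. Returning 0.0 for an all-zero vector is a convention chosen here, and the weights in the toy collection are made up.

```python
import math

def cosine(q, d):
    """Cosine similarity of two sparse vectors {term: weight}."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # convention of this sketch for empty vectors
    return dot / (norm_q * norm_d)

def closest(query, collection):
    """Name of the document with the largest cosine to the query."""
    return max(collection, key=lambda name: cosine(query, collection[name]))

# Made-up weights, for illustration only.
collection = {
    "doc_1": {"cleopatra": 2.0, "antony": 1.0},
    "doc_2": {"brutus": 1.5, "caesar": 0.5},
}
best = closest({"cleopatra": 1.0, "calpurnia": 1.0}, collection)
```

Because both vectors are length-normalized, a long document does not win simply by containing more words.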
36 A More Sophisticated Approach. Li, Ma & Lee: A Vector Space Modeling Approach to Spoken Language Identification. Do phone decoding (use a universal phone set). Extract phone n-gram features. Weight them with something similar to TF-IDF. Now each utterance is a vector, labeled with the language in training. Build a classifier. Tested SVMs and ANNs.
37 Wouldn't All Phones Be Everywhere? Maybe. But we can use phone n-grams as features. Those might be more informative! Raw phones: "k r ae l p eh ih t p k ao" and "p t ao k ih eh p l k r ae". Phone 2-grams: "kr rae ael lp peh ehih iht tp pk kao" and "pt tao aok kih iheh ehp pl lk kr rae".
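Extracting the 2-grams from a decoded phone string takes only a few lines. The function names are illustrative; the phone strings are the two from the slide.

```python
from collections import Counter

def phone_ngrams(phones, n=2):
    """Join each run of n consecutive phones into one n-gram token."""
    return ["".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def ngram_counts(phones, n=2):
    """Bag of phone n-grams: the raw counts behind an utterance vector."""
    return Counter(phone_ngrams(phones, n))

first = phone_ngrams("k r ae l p eh ih t p k ao".split())
second = phone_ngrams("p t ao k ih eh p l k r ae".split())
shared = set(first) & set(second)
```

The two strings use the same eleven phones, yet they share only a couple of 2-grams, which is why n-grams can separate languages that unigrams cannot.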
38 Vector Space Model Results Unigram, bigram and trigram phone features Various numbers of universal phone units From Li et al., A Vector Space Modeling Approach
39 Break Next Up: Speaker Identification
40 Flavors of Speaker Recognition From JHU 2002 SuperSID Final Presentation Reynolds et al.
41 Information Sources for Speaker ID From JHU 2002 SuperSID Final Presentation Reynolds et al.
42 Approaches GMMs GMMs with a Universal Background Model SVMs and Supervectors MLLR Matrices High Level Information
43 The GMM Approach: Identification. $\hat{s} = \arg\max_s P(s \mid x) = \arg\max_s P(s)\,P(x \mid s) = \arg\max_s P(s) \prod_i \sum_k c_{ks}\, \mathcal{N}(x_i; \mu_{sk}, \Sigma_{sk})$
44 Verification and Universal Background Models. Emphasis in NIST evaluations is on verification. Rather than evaluating all models, just use the supposed speaker's model and a background model. The background model is also used to estimate speaker models via adaptation.
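One common way to use the two models is an average per-frame log-likelihood ratio compared against a threshold. The sketch below assumes that scoring rule, diagonal-covariance GMMs, and made-up 1-D parameters; the threshold value is a placeholder that would be tuned on held-out data.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """log sum_k c_k N(x; mu_k, Sigma_k), computed stably."""
    terms = [math.log(c) + log_gauss_diag(x, mean, var) for c, mean, var in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def verification_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: claimed speaker vs. background."""
    total = sum(
        gmm_frame_loglik(x, speaker_gmm) - gmm_frame_loglik(x, ubm) for x in frames
    )
    return total / len(frames)

def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the identity claim when the score exceeds the threshold."""
    return verification_score(frames, speaker_gmm, ubm) > threshold

# Toy models: a broad background and a narrower claimed-speaker model.
toy_ubm = [(0.5, [0.0], [4.0]), (0.5, [5.0], [4.0])]
toy_speaker = [(0.5, [1.0], [1.0]), (0.5, [1.5], [1.0])]
accepted = verify([[1.0], [1.3], [1.4]], toy_speaker, toy_ubm)
rejected = verify([[5.0], [5.2], [4.7]], toy_speaker, toy_ubm)
```

Only two models are evaluated per trial, regardless of how many speakers are enrolled.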
45 Creating UBMs From Reynolds et al., Speaker Verification using Adapted GMMs
46 MAP Adapting to a Speaker From Reynolds et al., Speaker Verification using Adapted GMMs
47 MAP Equations Smoothly interpolates between generic and speaker-specific parameters From Reynolds et al., Speaker Verification using Adapted GMMs
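The equations themselves are not preserved in this transcription. The sketch below follows the mean update commonly associated with the cited Reynolds et al. paper: with soft count $n_k$ and data mean $E_k[x]$ for component k, $\hat{\mu}_k = \alpha_k E_k[x] + (1 - \alpha_k)\mu_k$ where $\alpha_k = n_k / (n_k + r)$. The relevance factor `relevance` (r), the mean-only adaptation, and the toy model are assumptions of this example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def map_adapt_means(frames, ubm, relevance=16.0):
    """Return MAP-adapted means, one per UBM component (weights, variances kept)."""
    n_comp, dim = len(ubm), len(ubm[0][1])
    occupancy = [0.0] * n_comp
    first_order = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logp = [math.log(c) + log_gauss_diag(x, mean, var) for c, mean, var in ubm]
        top = max(logp)
        weights = [math.exp(v - top) for v in logp]
        norm = sum(weights)
        for k in range(n_comp):
            post = weights[k] / norm  # P(component k | x)
            occupancy[k] += post
            for j in range(dim):
                first_order[k][j] += post * x[j]
    adapted = []
    for k, (_, mean, _) in enumerate(ubm):
        alpha = occupancy[k] / (occupancy[k] + relevance)
        if occupancy[k] > 0.0:
            data_mean = [f / occupancy[k] for f in first_order[k]]
        else:
            data_mean = list(mean)  # no data seen: keep the UBM mean
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, mean)]
        )
    return adapted

toy_ubm = [(0.5, [0.0], [1.0]), (0.5, [10.0], [1.0])]
adapted = map_adapt_means([[1.0]] * 16, toy_ubm, relevance=16.0)
```

With 16 frames and a relevance factor of 16, the first component moves halfway toward the data; the second, which sees essentially no data, stays at its UBM value.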
48 Super Vectors and SVMs Interesting recent work on using support vector machines Builds on classical GMM approach See, e.g. Campbell et al., SVM Based Speaker Verification Using a GMM Supervector Kernel Adapted GMM representing a speaker Concatenated means: feature vector for SVM. Labeled with speaker.
49 Pictorial Idea of SVMs (1) Map input into a high dimensional space With enough dimensions, the classes are likely to be separable with a hyperplane From
50 Pictorial Idea of SVMs (2) In this high dimensional space, the best hyperplane is the one with the maximum margin From K.K. Chin, Support Vector Machines applied to Speech Pattern Classification
51 Pictorial Idea of SVMs (3) Misclassifications allowed but penalized. (Soft margin)
52 SVM Classification. $f(x) = \sum_{i=1}^{L} \alpha_i t_i K(x, x_i) + d$, with $\sum_{i=1}^{L} \alpha_i t_i = 0$ and $\alpha_i > 0$. $x_i$ is a support vector or labeled example. $t_i$ is its class: +1 or -1. $K$ is a kernel function. * Key idea is that a dot product in a high dimensional space can (sometimes) be implicitly represented as a normal function of two points in the original space. The $\alpha$'s are learned to maximize the soft margin.
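Evaluating a trained SVM is just a weighted sum of kernel values. The sketch below assumes the decision function has the form $f(x) = \sum_i \alpha_i t_i K(x, x_i) + d$ and uses a plain dot-product kernel as a stand-in; the support vectors, weights and offset are hand-picked toy values, not learned ones.

```python
def linear_kernel(u, v):
    """Plain dot product; a stand-in for whatever kernel is actually used."""
    return sum(a * b for a, b in zip(u, v))

def svm_decision(x, support_vectors, alphas, labels, offset, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * t_i * K(x, x_i) + d."""
    return sum(
        a * t * kernel(x, sv)
        for a, t, sv in zip(alphas, labels, support_vectors)
    ) + offset

# Two hand-picked support vectors with labels +1 and -1.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
alphas = [0.25, 0.25]
labels = [+1, -1]
offset = 0.0

score = svm_decision([2.0, 2.0], support_vectors, alphas, labels, offset)
predicted_class = +1 if score > 0 else -1
```

In this toy setup the weights satisfy $\sum_i \alpha_i t_i = 0$ and each support vector sits exactly on the margin, where $f(x_i) = t_i$.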
53 The Kernel Gaussian weight From Campbell et al., SVM Based Speaker Verification
54 Using SVMs. Training: Create supervectors for background (non-target) speakers (label them "b"). Create supervectors for each example of a particular speaker, s (label them "s"). Train an SVM that differentiates s from b. Do this for each speaker. Test (verification): Someone claims to be s. Make a supervector from the sample. Run the s vs. b SVM and see what it says.
55 Using Speaker Adaptation. We have been using MFCCs. These are good for speech recognition, where we want to remove speaker variability. Why should they be good for speaker ID, where we want to enhance and recognize speaker variability? What captures speaker-specific information? Parameters of a speaker-adaptation scheme! E.g. MLLR. First tested by Stolcke et al., MLLR Transforms as Features in Speaker Recognition (2005)
56 Recall MLLR. $\mu' = A\,(1\ \ \mu^T)^T$. New mean is linear transformation of old. An offset is added to the old mean as well. Transformation matrix chosen to maximize the likelihood of the adaptation data under the transformed model. One transformation (e.g. 39x39) shared by many Gaussians (e.g. 1000s). See, e.g., Leggetter & Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Similar transforms possible for covariance matrix.
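A sketch of applying one shared transform to many means, assuming the transform is stored as a matrix `W` that multiplies the extended mean `[1, mu_1, ..., mu_n]`, so that its first column acts as the offset. Estimating `W` from adaptation data is not shown, and the numbers are made up.

```python
def mllr_transform_mean(mean, W):
    """Apply W to the extended mean [1, mu]; column 0 of W is the offset."""
    extended = [1.0] + list(mean)
    return [sum(w * e for w, e in zip(row, extended)) for row in W]

def adapt_all_means(means, W):
    """One transform shared by every Gaussian in the regression class."""
    return [mllr_transform_mean(m, W) for m in means]

# Toy 2-D example: offset (1, -1), then scale axis 0 by 2 and axis 1 by 0.5.
W = [
    [1.0, 2.0, 0.0],
    [-1.0, 0.0, 0.5],
]
new_means = adapt_all_means([[3.0, 4.0], [0.0, 0.0]], W)
```

For n-dimensional means `W` has n rows and n+1 columns, so a single speaker is summarized by a small, fixed number of parameters regardless of how many Gaussians share the transform.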
57 MLLR Picture New data New means are a linear transform of the old ones Old data modeled by some gaussians
58 MLLR for Speaker Recognition. Compute the MLLR matrix. In practice several matrices may be used for different phonetic classes. Concatenate all the numbers into a giant feature vector. Label it with the speaker. Train an SVM classifier. Stolcke et al. used the straight dot-product kernel.
59 MLLR for SID Results From Stolcke et al., MLLR Transforms as Features in Speaker Recognition
60 Moving up the Food Chain From JHU 2002 SuperSID Final Presentation Reynolds et al.
61 Pitch Pitch and pitch tracks should vary from person to person Image from
62 Prosody Rate of Speech Number of words per second Number of phones per second Number of long pauses Number of short pauses Average voiced segment lengths Average unvoiced segment lengths
63 Phone N-grams. Analogous to Language ID, except the "languages" are speakers. Can also train a global background phone n-gram model. After Zissman 1996
64 Personal Word Usage From JHU 2002 SuperSID Final Presentation Reynolds et al.
65 Concluding Remarks Numerous techniques for LID and SID Sometimes ad-hoc, e.g. PPRLM Sometimes principled, e.g. MLLR in SID Often requires fusing multiple methods E.g. combining prosody and idiolect with GMM scores in SID Is there room for a more unified theory?
66 Project Discussions Who is set already? Please see me Who is interested in: Formant project Language ID Language Modeling Text-to-Speech Speaker ID HMM / Speech Recognition Speech Coding => Please discuss
More informationMatrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson
Matrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Latent Semantic Indexing Outline Introduction Linear Algebra Refresher
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short
More informationLecture 3: ASR: HMMs, Forward, Viterbi
Original slides by Dan Jurafsky CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 3: ASR: HMMs, Forward, Viterbi Fun informative read on phonetics The
More informationSession 1: Pattern Recognition
Proc. Digital del Continguts Musicals Session 1: Pattern Recognition 1 2 3 4 5 Music Content Analysis Pattern Classification The Statistical Approach Distribution Models Singing Detection Dan Ellis
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationMATRIX DECOMPOSITION AND LATENT SEMANTIC INDEXING (LSI) Introduction to Information Retrieval CS 150 Donald J. Patterson
MATRIX DECOMPOSITION AND LATENT SEMANTIC INDEXING (LSI) Introduction to Information Retrieval CS 150 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Latent
More informationEmpirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model
Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model (most slides from Sharon Goldwater; some adapted from Philipp Koehn) 5 October 2016 Nathan Schneider
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationSparse Models for Speech Recognition
Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations
More informationFEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION
FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION Sarika Hegde 1, K. K. Achary 2 and Surendra Shetty 3 1 Department of Computer Applications, NMAM.I.T., Nitte, Karkala Taluk,
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationEstimation of Relative Operating Characteristics of Text Independent Speaker Verification
International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationThe Bayes classifier
The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal
More informationDiscriminative models for speech recognition
Discriminative models for speech recognition Anton Ragni Peterhouse University of Cambridge A thesis submitted for the degree of Doctor of Philosophy 2013 Declaration This dissertation is the result of
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph
More informationMAP adaptation with SphinxTrain
MAP adaptation with SphinxTrain David Huggins-Daines dhuggins@cs.cmu.edu Language Technologies Institute Carnegie Mellon University MAP adaptation with SphinxTrain p.1/12 Theory of MAP adaptation Standard
More informationInformation Retrieval Using Boolean Model SEEM5680
Information Retrieval Using Boolean Model SEEM5680 1 Unstructured (text) vs. structured (database) data in 1996 2 2 Unstructured (text) vs. structured (database) data in 2009 3 3 The problem of IR Goal
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 IIR 18: Latent Semantic Indexing Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,
More informationMachine Learning for Structured Prediction
Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for
More informationTemporal Modeling and Basic Speech Recognition
UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab Temporal Modeling and Basic Speech Recognition Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Today s lecture Recognizing
More informationDiagonal Priors for Full Covariance Speech Recognition
Diagonal Priors for Full Covariance Speech Recognition Peter Bell 1, Simon King 2 Centre for Speech Technology Research, University of Edinburgh Informatics Forum, 10 Crichton St, Edinburgh, EH8 9AB, UK
More informationMixtures of Gaussians with Sparse Structure
Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or
More informationHidden Markov Models Hamid R. Rabiee
Hidden Markov Models Hamid R. Rabiee 1 Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature could be modeled as a Markov process. However
More informationNgram Review. CS 136 Lecture 10 Language Modeling. Thanks to Dan Jurafsky for these slides. October13, 2017 Professor Meteer
+ Ngram Review October13, 2017 Professor Meteer CS 136 Lecture 10 Language Modeling Thanks to Dan Jurafsky for these slides + ASR components n Feature Extraction, MFCCs, start of Acoustic n HMMs, the Forward
More information