Automatic Speech Recognition: From Theory to Practice


1 Automatic Speech Recognition: From Theory to Practice September 20, 2004 Prof. Bryan Pellom Department of Computer Science Center for Spoken Language Research University of Colorado Automatic Speech Recognition: From Theory to Practice 1

2 Today Introduction to Speech Production and Phonetics Short Video on Spectrogram Reading Quick Review of Probability and Statistics Formulation of the Speech Recognition Problem Talk about Homework #1 Automatic Speech Recognition: From Theory to Practice 2

3 Speech Production and Phonetics Peter Ladefoged, A Course in Phonetics, Harcourt Brace Jovanovich. An excellent introductory reference to this material. Automatic Speech Recognition: From Theory to Practice 3

4 Speech Production Anatomy Figure from Spoken Language Processing (Huang, Acero, Hon) Automatic Speech Recognition: From Theory to Practice 4

5 Speech Production Anatomy Vocal Tract Consists of the pharyngeal and oral cavities Articulators Components of the vocal tract which move to produce various speech sounds Include: vocal folds, velum, lips, tongue, teeth Automatic Speech Recognition: From Theory to Practice 5

6 Source-Filter Representation of Speech Production Production viewed as an acoustic filtering operation. The larynx and lungs provide the input or source excitation; the vocal and nasal tracts act as a filter that shapes the spectrum of the resulting signal. Automatic Speech Recognition: From Theory to Practice 6

7 Describing Sounds The study of speech sounds and their production, classification, and transcription is known as phonetics. A phoneme is an abstract unit that can be used for writing a language down in a systematic and unambiguous way. Sub-classifications of phonemes: Vowels, where air passes freely through the resonators; Consonants, where air is partially or totally obstructed in one or more places as it passes through the resonators. Automatic Speech Recognition: From Theory to Practice 7

8 Time-Domain Waveform Example two plus seven is less than ten Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 8

9 Wide-Band Spectrogram frequency two plus seven is less than ten Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 time Automatic Speech Recognition: From Theory to Practice 9

10 Phonetic Alphabets Allow us to describe the primitive sounds that make up a language Each language will have a unique set of phonemes Useful for speech recognition since words can be represented by sequences of phonemes as described by a phonetic alphabet. Automatic Speech Recognition: From Theory to Practice 10

11 International Phonetic Alphabet (IPA) Phonetic representation standard which describes sounds in most/all world languages IPA last published in 1993 and updated in 1996 Issue: character set difficult to manipulate on a computer Automatic Speech Recognition: From Theory to Practice 11

12 IPA Consonants Automatic Speech Recognition: From Theory to Practice 12

13 IPA Vowels Automatic Speech Recognition: From Theory to Practice 13

14 American English Phonemes (IPA) Table from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 14

15 Alternative Phonetic Alphabets ARPAbet: English only, ASCII representation, phoneme units represented by 1-2 letters; a similar representation is used by the CMU Sphinx-II recognizer. SAMPA (Speech Assessment Methods Phonetic Alphabet): a computer-readable representation that maps symbols of the IPA into ASCII codes. Automatic Speech Recognition: From Theory to Practice 15

16 CMU Sphinx-II Phonetic Symbols (phone: example word) SIL: (silence), AA: odd, AE: at, AH: hut, AO: ought, AW: cow, AX: abide, AXR: user, AY: hide, B: be, BD: Dub, CH: cheese, D: dee, DD: dud, DH: thee, DX: matter, EH: ed, ER: hurt, EY: ate, F: fee, G: green, GD: bag, HH: he, IH: it, IX: acid, IY: eat, JH: gee, K: key, KD: lick, L: lee, M: me, N: note, NG: ping, OW: oat, OY: toy, P: pee, PD: lip, R: read, S: sea, SH: she, T: tea, TD: lit, TH: theta, TS: bits, UH: hood, UW: two, V: vee, W: we, Y: yield, Z: zee, ZH: seizure. Automatic Speech Recognition: From Theory to Practice 16

17 Example Words and Corresponding CMU Dictionary Transcriptions basement: B EY S M AX N TD; Bryan: B R AY AX N; perfect: P AXR F EH KD TD; speech: S P IY CH; recognize: R EH K AX G N AY Z Automatic Speech Recognition: From Theory to Practice 17
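The transcriptions above map directly to a data structure. Below is a minimal sketch of a pronunciation lexicon as a plain Python dictionary, using exactly the phone strings from this slide; the LEXICON name and the phones_for helper are illustrative, not part of any real toolkit.

```python
# Minimal pronunciation lexicon sketch using the transcriptions from this slide.
# A real system would load a full dictionary file (e.g., the CMU dictionary).
LEXICON = {
    "basement":  ["B", "EY", "S", "M", "AX", "N", "TD"],
    "bryan":     ["B", "R", "AY", "AX", "N"],
    "perfect":   ["P", "AXR", "F", "EH", "KD", "TD"],
    "speech":    ["S", "P", "IY", "CH"],
    "recognize": ["R", "EH", "K", "AX", "G", "N", "AY", "Z"],
}

def phones_for(word_sequence):
    """Map a word string to a flat phone sequence; unknown words raise KeyError."""
    phones = []
    for word in word_sequence.lower().split():
        phones.extend(LEXICON[word])
    return phones

print(phones_for("perfect speech"))
# ['P', 'AXR', 'F', 'EH', 'KD', 'TD', 'S', 'P', 'IY', 'CH']
```

This word-to-phone mapping is exactly the lexicon / pronunciation model that reappears in the problem formulation later in the lecture.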

18 Classifications of Speech Sounds Voiced vs. voiceless: voiced if the vocal cords vibrate. Nasal vs. oral: nasal if air travels through the nasal cavity while the oral cavity is closed. Consonant vs. vowel: consonants involve an obstruction in the air stream above the glottis (the glottis is defined as the space between the vocal cords). Lateral vs. non-lateral: non-lateral if the air stream passes through the middle of the oral cavity (rather than along the sides of the oral cavity). Automatic Speech Recognition: From Theory to Practice 18

19 Consonants and Vowels Consonants are characterized by: Place of articulation Manner of articulation Voicing Vowels are characterized by: lip position tongue height tongue advancement Automatic Speech Recognition: From Theory to Practice 19

20 Places of Articulation Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 20

21 Places of Articulation Bilabial Made with the two lips {P,B,M} Labio-dental Lower lip & upper front teeth {F,V} Dental Tongue tip or blade & upper front teeth {TH,DH} Alveolar Tongue tip or blade & alveolar ridge {T,D,N,...} Retroflex Tongue tip & back of the alveolar ridge {R} Palato-Alveolar Tongue blade & back of the alveolar ridge {SH} Palatal Front of the tongue & hard palate {Y,ZH} Velar Back of the tongue & soft palate {K,G,NG} Automatic Speech Recognition: From Theory to Practice 21

22 Manners of Articulation Stop complete obstruction with sudden (explosive) release Nasal Airflow stopped in the oral cavity, soft palate down, airflow is through the nasal tract Fricative Articulators close together, turbulent airflow produced Automatic Speech Recognition: From Theory to Practice 22

23 Manners of Articulation Retroflex (Liquid) Tip of the tongue is curled back slightly (/r/) Lateral (Liquid) Obstruction of the air stream at a point along the center of the oral tract, with incomplete closure between one or both sides of the tongue and the roof of the mouth (/l/) Glide Vowel-like, but initial position within a syllable (/y/, /w/) Automatic Speech Recognition: From Theory to Practice 23

24 American English Consonants by Place and Manner of Articulation Place Manner Automatic Speech Recognition: From Theory to Practice 24

25 American English Unvoiced Fricatives Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 25

26 Voiced vs. Unvoiced Fricatives Voiced fricatives tend to be shorter than unvoiced fricatives Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 26

27 American English Unvoiced Stops Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 27

28 Voiced vs. Unvoiced Stops Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 28

29 Stop Consonant Formant Transitions Figure from Automatic Speech Recognition: From Theory to Practice 29

30 Nasal vs. Oral Articulation (Figure: velum position for the nasal consonants /M/, /N/, /NG/.) Automatic Speech Recognition: From Theory to Practice 30

31 Spectrogram of Nasals Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 31

32 American English Semivowels Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 32

33 Semivowels Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 33

34 Affricates Figure from MIT Course Notes: Automatic Speech Recognition, Spring 2003 Automatic Speech Recognition: From Theory to Practice 34

35 Describing Vowels Velum Position nasal vs. non-nasal Lip Shape Rounded vs. unrounded Tongue height High, mid, low Correlated to first formant position Tongue advancement Front, central, back Correlated to second formant position Automatic Speech Recognition: From Theory to Practice 35

36 American English Vowels heed: IY, hid: IH, head: EH, had: AE, hod: AA, hawed: AO, hood: UH, who'd: UW Automatic Speech Recognition: From Theory to Practice 36

37 Vowel Chart (Figure: vowels arranged by tongue height (high, mid, low) and tongue advancement (front, central, back), e.g. IY high front and UW high back; vowels shown include IY, IH, EY, EH, AE, AX, UW, UH, OW, AO, AA, AY, OY, AW. Diphthongs shown in green.) Automatic Speech Recognition: From Theory to Practice 37

38 The Vowel Space (Figure: vowels plotted by first formant frequency F1 (Hz) versus second formant frequency F2 (Hz); numbered points: 1. IY, 2. IH, 3. EH, 4. AE, 5. UH, 6. AA, 7. AO, 8. UW, 9. a non-English "u", 10. AX.) Automatic Speech Recognition: From Theory to Practice 38

39 Coarticulation (Figure: spectrograms of the sequences F AY, TH AY, S AY, SH AY.) Notice the position of the 2nd formant onset for these words: "fie, thigh, sigh, shy" Automatic Speech Recognition: From Theory to Practice 39

40 Sound Classification Summary Vowels Semivowels Consonants Nasals Stops Affricates Fricatives Automatic Speech Recognition: From Theory to Practice 40

41 Spectrogram Reading Video Speech as Eyes See It (12-minute video) video by Ron Cole and Victor Zue After hours of training: phonemes and words can be transcribed from a spectrogram alone, with 80-90% agreement on segments. Provided insight into the speech recognition problem during the 1970s. Automatic Speech Recognition: From Theory to Practice 41

42 Review: Probability & Statistics for Speech Recognition Automatic Speech Recognition: From Theory to Practice 42

43 Relative-Frequency and Probability Relative frequency of A: an experiment is performed N times with 4 possible outcomes A, B, C, D; N_A is the number of times event A occurs, so N_A + N_B + N_C + N_D = N. The relative frequency of A is r(A) = N_A / N. P(A), defined as the probability of event A, is the relative frequency with which event A would occur if the experiment were repeated many times. Automatic Speech Recognition: From Theory to Practice 43

44 Probability Example (cont'd) N_A + N_B + N_C + N_D = N; r(A) + r(B) + r(C) + r(D) = 1; P(A) = lim_{N→∞} r(A); P(A) + P(B) + P(C) + P(D) = 1. Automatic Speech Recognition: From Theory to Practice 44
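As a quick numerical illustration of the limit r(A) → P(A), the sketch below simulates an experiment with four outcomes whose probabilities are invented for the example, and prints the relative frequencies for increasing N:

```python
import random

# Hypothetical outcome probabilities for events A, B, C, D (they sum to 1).
true_p = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
outcomes, weights = zip(*true_p.items())

for N in (100, 10_000, 1_000_000):
    counts = {o: 0 for o in outcomes}
    for _ in range(N):
        counts[random.choices(outcomes, weights)[0]] += 1
    rel_freq = {o: counts[o] / N for o in outcomes}
    print(N, {o: round(r, 3) for o, r in rel_freq.items()})
# The relative frequencies r(A), r(B), r(C), r(D) sum to 1 and approach
# the underlying probabilities as N increases.
```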

45 Mutually Exclusive Events For mutually exclusive events A, B, C, ..., M: 0 ≤ P(A) ≤ 1 and P(A) + P(B) + P(C) + ... + P(M) = 1. P(A) = 0 represents an impossible event; P(A) = 1 represents a certain event. Automatic Speech Recognition: From Theory to Practice 45

46 Joint and Conditional Probability P(AB) is the joint probability of events A and B both occurring: P(AB) = lim_{N→∞} N_AB / N. P(A|B) is the conditional probability of event A given that event B has occurred: P(A|B) = P(AB) / P(B) = lim_{N→∞} (N_AB / N) / (N_B / N). Automatic Speech Recognition: From Theory to Practice 46

47 Marginal Probability Probability of an event occurring across all conditions: if events A_1, ..., A_n partition the sample space, then P(B) = Σ_{k=1}^{n} P(A_k B) = Σ_{k=1}^{n} P(B|A_k) P(A_k). (Figure: event B overlapping the partition A_1, ..., A_5.) Automatic Speech Recognition: From Theory to Practice 47

48 Bayes Theorem P(A_i|B) = P(B|A_i) P(A_i) / P(B), where P(B) = Σ_{k=1}^{n} P(B|A_k) P(A_k) and P(AB) = P(B|A) P(A). Automatic Speech Recognition: From Theory to Practice 48

49 Bayes Rule Bayes rule allows us to update prior beliefs with new evidence: P(W_i|O) = P(O|W_i) P(W_i) / Σ_j P(O|W_j) P(W_j). Here P(W_i|O) is the posterior probability, P(O|W_i) the likelihood, P(W_i) the prior probability, and the denominator Σ_j P(O|W_j) P(W_j) is the evidence P(O). Automatic Speech Recognition: From Theory to Practice 49
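A toy numerical sketch of Bayes rule as used above: given hypothetical priors P(W_i) and acoustic likelihoods P(O|W_i) for three candidate words, compute the posteriors (all numbers are invented for illustration):

```python
# Hypothetical candidate words with prior P(W) and acoustic likelihood P(O|W).
candidates = {
    "yes":   {"prior": 0.5, "likelihood": 0.010},
    "yet":   {"prior": 0.3, "likelihood": 0.004},
    "chess": {"prior": 0.2, "likelihood": 0.001},
}

# Evidence: P(O) = sum_j P(O|W_j) P(W_j)
evidence = sum(c["likelihood"] * c["prior"] for c in candidates.values())

# Posterior: P(W_i|O) = P(O|W_i) P(W_i) / P(O)
for word, c in candidates.items():
    posterior = c["likelihood"] * c["prior"] / evidence
    print(f"P({word!r} | O) = {posterior:.3f}")
# The posteriors sum to 1; the word maximizing the posterior is the recognizer's choice.
```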

50 Statistically Independent Events Occurrence of event A does not influence occurrence of event B: P(AB) = P(A) P(B), P(A|B) = P(A), P(B|A) = P(B). Automatic Speech Recognition: From Theory to Practice 50

51 Random Variables Used to describe events in which the number of possible outcomes is infinite Values of the outcomes can not be predicted with certainty Distribution of outcome values is known Automatic Speech Recognition: From Theory to Practice 51

52 Probability Distribution Functions The probability of the event that the random variable X is less than or equal to the allowed value x: F_X(x) = P(X ≤ x). (Figure: F_X(x) increasing from 0 to 1 as x increases.) Automatic Speech Recognition: From Theory to Practice 52

53 Probability Distribution Functions Properties: 0 ≤ F_X(x) ≤ 1 for -∞ < x < ∞; F_X(-∞) = 0 and F_X(∞) = 1; F_X(x) is nondecreasing as x increases; P(x_1 < X ≤ x_2) = F_X(x_2) - F_X(x_1). Automatic Speech Recognition: From Theory to Practice 53

54 Probability Density Functions (PDF) Derivative of the probability distribution function: f_X(x) = dF_X(x)/dx. Interpretation: f_X(x) dx = P(x < X ≤ x + dx). Automatic Speech Recognition: From Theory to Practice 54

55 Probability Density Functions (PDF) Properties: f_X(x) ≥ 0 for -∞ < x < ∞; ∫_{-∞}^{∞} f_X(x) dx = 1; F_X(x) = ∫_{-∞}^{x} f_X(u) du; ∫_{x_1}^{x_2} f_X(x) dx = P(x_1 < X ≤ x_2). Automatic Speech Recognition: From Theory to Practice 55

56 Mean and Variance Expectation or mean of a random variable X: µ = E(X) = ∫ x f_X(x) dx. Variance of a random variable X: Var(X) = σ² = E[(X - µ)²] = E(X²) - [E(X)]². Automatic Speech Recognition: From Theory to Practice 56

57 Mean and Variance Variance properties (if X and Y, and the X_i, are independent): Var(X + Y) = Var(X) + Var(Y); Var(aX + b) = a² Var(X); Var(a_1 X_1 + ... + a_n X_n) = a_1² Var(X_1) + ... + a_n² Var(X_n). Automatic Speech Recognition: From Theory to Practice 57

58 Covariance and Correlation Covariance of X and Y: Cov(X, Y) = E[(X - µ_x)(Y - µ_y)], with Cov(X, Y) = Cov(Y, X). Correlation of X and Y: ρ_XY = Cov(X, Y) / (σ_X σ_Y), with -1 ≤ ρ_XY ≤ 1. Automatic Speech Recognition: From Theory to Practice 58
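A short sketch of estimating covariance and correlation from samples, assuming two hypothetical feature streams generated with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)            # hypothetical feature stream X
y = 0.6 * x + rng.normal(0.0, 0.8, size=5000)  # Y partially correlated with X

cov_xy = np.cov(x, y)[0, 1]                    # Cov(X, Y) = E[(X - mu_x)(Y - mu_y)]
rho_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))  # Cov(X, Y) / (sigma_X sigma_Y)

print(f"Cov(X, Y) = {cov_xy:.3f}, rho_XY = {rho_xy:.3f}")
assert -1.0 <= rho_xy <= 1.0                   # correlation always lies in [-1, 1]
```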

59 Gaussian (Normal) Distribution f(X = x | µ, σ²) = (1 / (σ √(2π))) exp( -(x - µ)² / (2σ²) ). Sample estimates: µ = E[x] = (1/N) Σ_{n=1}^{N} x_n and σ² = E[(X - E[X])²] = (1/N) Σ_{n=1}^{N} x_n² - ( (1/N) Σ_{n=1}^{N} x_n )². Automatic Speech Recognition: From Theory to Practice 59

60 Gaussian (Normal) Distribution (Figure: plot of f(X = x | µ, σ²) versus x for µ = 0, σ² = 1.) Automatic Speech Recognition: From Theory to Practice 60
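A minimal sketch that estimates µ and σ² from hypothetical 1-D samples using the formulas on the previous slide and then evaluates the resulting Gaussian density:

```python
import math

def gaussian_pdf(x, mu, var):
    """f(x | mu, sigma^2) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical 1-D feature samples.
samples = [0.2, -0.5, 1.3, 0.7, -0.1, 0.4, 0.9, -0.8]
N = len(samples)
mu = sum(samples) / N                            # mu = (1/N) sum x_n
var = sum(x * x for x in samples) / N - mu ** 2  # sigma^2 = E[X^2] - E[X]^2

print(f"mu = {mu:.3f}, var = {var:.3f}")
print(f"f(0.0) = {gaussian_pdf(0.0, mu, var):.3f}")
```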

61 Multivariate Distributions Characterize more than one random variable at a time Example: Features for speech recognition In speech recognition we typically compute 39-dimensional feature vectors (more about that later) 100 feature vectors per second of audio Often want to compute the likelihood of the observed features given known (estimated) distribution which is being used to model some part of a phoneme (more about that later!). Automatic Speech Recognition: From Theory to Practice 61

62 Multivariate Gaussian Distribution f(X = x | µ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp( -(1/2) (x - µ)^T Σ^{-1} (x - µ) ), where x is the observed vector of random variables (features), µ is the distribution mean vector, Σ is the distribution covariance matrix, and |Σ| is the determinant of the covariance matrix. Automatic Speech Recognition: From Theory to Practice 62

63 Diagonal Covariance Assumption Most speech recognition systems assume diagonal covariance matrices (a data sparseness issue): Σ = diag(σ_11², σ_22², ..., σ_dd²), so the determinant is |Σ| = Π_{n=1}^{d} σ_nn². Automatic Speech Recognition: From Theory to Practice 63

64 Diagonal Covariance Assumption Inverting a diagonal matrix involves simply inverting the elements along the diagonal: Σ^{-1} = diag(1/σ_11², 1/σ_22², ..., 1/σ_dd²). Automatic Speech Recognition: From Theory to Practice 64

65 Multivariate Gaussians with Diagonal Covariance Matrices Assuming a diagonal covariance matrix, f(X = x | µ, Σ) = C exp( -(1/2) Σ_{d=1}^{n} (x[d] - µ[d])² / σ²[d] ), where C = 1 / ( (2π)^{n/2} ( Π_{d=1}^{n} σ²[d] )^{1/2} ) and σ²[d] = σ_dd². Automatic Speech Recognition: From Theory to Practice 65
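Recognizers usually evaluate this density in the log domain to avoid numerical underflow. A sketch of the diagonal-covariance Gaussian log-likelihood, assuming NumPy arrays for the feature vector, the mean vector, and the per-dimension variances (the example numbers are hypothetical):

```python
import numpy as np

def diag_gaussian_log_likelihood(x, mu, var):
    """log f(x | mu, Sigma) for a diagonal covariance Sigma = diag(var).

    log f = -0.5 * sum_d [ log(2*pi*var[d]) + (x[d] - mu[d])^2 / var[d] ]
    """
    x, mu, var = np.asarray(x), np.asarray(mu), np.asarray(var)
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

# Hypothetical 3-dimensional example (real systems use ~39-dimensional features).
x   = np.array([0.5, -1.0, 0.2])
mu  = np.array([0.0,  0.0, 0.0])
var = np.array([1.0,  2.0, 0.5])
print(diag_gaussian_log_likelihood(x, mu, var))
```

Working with per-dimension variances keeps the cost linear in the feature dimension, which is one reason the diagonal assumption is so common in practice.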

66 Multivariate Mixture Gaussians The distribution is governed by several Gaussian density functions, a weighted sum of Gaussians (w_m = mixture weight): f(x) = Σ_{m=1}^{M} w_m N(x; µ_m, Σ_m) = Σ_{m=1}^{M} w_m (1 / ((2π)^{n/2} |Σ_m|^{1/2})) exp( -(1/2) (x - µ_m)^T Σ_m^{-1} (x - µ_m) ). Automatic Speech Recognition: From Theory to Practice 66

67 Multiple Mixtures (1-D case) Example: 3 mixture components used to model an underlying random process of 3 Gaussians. (Figure: plot of the mixture density f(x) versus x.) Automatic Speech Recognition: From Theory to Practice 67
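A sketch of evaluating the mixture density above in the log domain, assuming diagonal covariances and a stable log-sum-exp over the components; the weights, means, and variances below are hypothetical:

```python
import numpy as np

def gmm_log_density(x, weights, means, variances):
    """log f(x) = log sum_m w_m N(x; mu_m, diag(var_m)), computed stably."""
    x = np.asarray(x, dtype=float)
    comp_log_likes = []
    for w, mu, var in zip(weights, means, variances):
        mu, var = np.asarray(mu, dtype=float), np.asarray(var, dtype=float)
        log_gauss = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        comp_log_likes.append(np.log(w) + log_gauss)
    # np.logaddexp.reduce implements log(sum(exp(...))) without underflow.
    return float(np.logaddexp.reduce(comp_log_likes))

# Hypothetical 3-component mixture over 2-D features.
weights   = [0.5, 0.3, 0.2]                        # mixture weights w_m sum to 1
means     = [[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]]  # mu_m
variances = [[1.0, 1.0], [0.5, 2.0], [1.5, 0.8]]   # diagonal of Sigma_m
print(gmm_log_density([0.3, 0.4], weights, means, variances))
```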

68 Speech Recognition Problem Formulation Automatic Speech Recognition: From Theory to Practice 68

69 Problem Description Given a sequence of observations (evidence) from an audio signal, O = o_1 o_2 ... o_T, determine the underlying word sequence, W = w_1 w_2 ... w_m. The number of words (m) is unknown, and the observation sequence is of variable length (T). Automatic Speech Recognition: From Theory to Practice 69

70 Problem Formulation Goal: minimize the classification error rate. Solution: maximize the posterior probability, Ŵ = argmax_W P(W|O). The solution requires optimization over all possible word strings! Automatic Speech Recognition: From Theory to Practice 70

71 Problem Formulation Using Bayes rule, P(W|O) = P(O|W) P(W) / P(O). Since P(O) does not impact the optimization, Ŵ = argmax_W P(W|O) = argmax_W P(O|W) P(W). Automatic Speech Recognition: From Theory to Practice 71

72 Problem Formulation Let's assume words can be represented by a sequence of states, S: Ŵ = argmax_W P(O, S | W) P(W) = argmax_W P(O|S) P(S|W) P(W). Words → Phonemes → States; states represent smaller pieces of phonemes. Automatic Speech Recognition: From Theory to Practice 72

73 Problem Formulation Optimize: Ŵ = argmax_W Σ_S P(O|S) P(S|W) P(W). Practical realization: O is the observation (feature) sequence, P(O|S) is the acoustic model, P(S|W) is the lexicon / pronunciation model, and P(W) is the language model. Automatic Speech Recognition: From Theory to Practice 73
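Conceptually, the decoder picks the hypothesis that maximizes log P(O|W) + log P(W), with the sum over state sequences usually approximated by the best single sequence. A toy sketch, assuming the acoustic and language-model log-scores for a few candidate word strings have already been computed (the scores are invented for illustration):

```python
# Hypothetical candidates with acoustic-model and language-model log-probabilities.
hypotheses = {
    "two plus seven":  {"log_acoustic": -120.4, "log_lm": -8.1},
    "two plus heaven": {"log_acoustic": -121.0, "log_lm": -14.7},
    "too plus seven":  {"log_acoustic": -120.6, "log_lm": -11.2},
}

def total_score(h):
    # log P(O|W) + log P(W); real systems also weight and scale these terms.
    return h["log_acoustic"] + h["log_lm"]

best = max(hypotheses, key=lambda w: total_score(hypotheses[w]))
print("W_hat =", best)
```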

74 Problem Formulation The optimization seeks the most likely word sequence given the observations (evidence). We cannot evaluate all possible word / state sequences (too many possibilities!). We need: a representation for modeling states (HMMs), a means for approximately searching for the best word / state sequence given the evidence (the Viterbi algorithm), and a few other tricks up our sleeves to make it FASTER! Automatic Speech Recognition: From Theory to Practice 74

75 Hidden Markov Models (HMMs) Observation vectors are assumed to be generated by a Markov model. HMM: a finite-state machine in which, at each time t that a state j is entered, an observation is emitted with probability density b_j(o_t). The transition from state i to state j is modeled with probability a_ij. (Figure: three-state HMM with states S_0, S_1, S_2, self-loop probabilities a_00, a_11, a_22, transitions a_01, a_12, and emitted observations o_1, o_2, ..., o_T.) Automatic Speech Recognition: From Theory to Practice 75

76 Beads-on-a-String HMM Representation (Figure: the word SPEECH represented as the phoneme sequence S P IY CH, with the phoneme models concatenated and aligned to observations o_1 ... o_7.) Automatic Speech Recognition: From Theory to Practice 76

77 Modeled Probability Distribution (Figure: three-state HMM with transition probabilities a_00, a_01, a_11, a_12, a_22 and emission densities b_0(o_t), b_1(o_t), b_2(o_t).) Automatic Speech Recognition: From Theory to Practice 77

78 HMMs with Mixtures of Gaussians Multivariate mixture-Gaussian distribution: b_j(o_t) = Σ_{m=1}^{M} w_m (1 / ((2π)^{n/2} |Σ_jm|^{1/2})) exp( -(1/2) (o_t - µ_jm)^T Σ_jm^{-1} (o_t - µ_jm) ). Model parameters are (1) means, (2) variances, and (3) mixture weights. A sum of Gaussians can model complex distributions. Automatic Speech Recognition: From Theory to Practice 78

79 Hidden Markov Model Based ASR The observation sequence is assumed to be known, and the probability of a particular state sequence can be computed; the underlying state sequence is unknown, hidden. (Figure: three-state HMM with transitions a_00, a_01, a_11, a_12, a_22 generating observations o_1 ... o_T.) Automatic Speech Recognition: From Theory to Practice 79
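Finding the single most likely hidden state sequence is exactly what the Viterbi algorithm (mentioned on slide 74) does. A compact sketch over a small left-to-right HMM, assuming log transition probabilities a_ij and precomputed log emission scores log b_j(o_t); all numbers are hypothetical:

```python
import numpy as np

def viterbi(log_trans, log_emit):
    """Most likely state path.

    log_trans: (S, S) array of log a_ij; log_emit: (T, S) array of log b_j(o_t).
    Assumes the path starts in state 0.
    """
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)          # best log-score ending in state j at time t
    back = np.zeros((T, S), dtype=int)        # best predecessor state
    delta[0, 0] = log_emit[0, 0]              # start in state 0
    for t in range(1, T):
        for j in range(S):
            scores = delta[t - 1] + log_trans[:, j]   # best predecessor for state j
            back[t, j] = np.argmax(scores)
            delta[t, j] = scores[back[t, j]] + log_emit[t, j]
    # Trace back from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))

# Hypothetical 3-state left-to-right HMM with 5 observation frames.
log_trans = np.log(np.array([[0.6, 0.4, 0.0],
                             [0.0, 0.7, 0.3],
                             [0.0, 0.0, 1.0]]) + 1e-12)
rng = np.random.default_rng(1)
log_emit = rng.normal(-3.0, 1.0, size=(5, 3))  # stand-in for log b_j(o_t)
print(viterbi(log_trans, log_emit))
```

Working in the log domain turns the products of a_ij and b_j(o_t) into sums, which avoids underflow over long utterances.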

80 Components of a Speech Recognizer (Figure: block diagram in which speech passes through feature extraction into the optimization / search module, which draws on the acoustic model, lexicon, and language model and produces text plus timing and confidence.) Automatic Speech Recognition: From Theory to Practice 80

81 Topics for Next Time An introduction to hearing Speech Detection Frame-based Speech Analysis Feature Extraction Methods for Speech Recognition Automatic Speech Recognition: From Theory to Practice 81
