Cochlear modeling and its role in human speech recognition
Slide 1 (Allen/IPAM, February 1, 2005)
Cochlear modeling and its role in human speech recognition: Miller-Nicely confusions and the articulation index
Jont Allen, Univ. of IL, Beckman Inst., Urbana IL
Slide 2: A model of human speech recognition (HSR)
The research goal is to identify elemental HSR events, where an event is defined as a perceptual feature.
[Block diagram of recognition levels: the signal s(t) passes through cochlear filter layers to events ($\mathrm{AI}_k$, $e_k$), phones ($s$), syllables ($S$), and words ($W$). The analog objects form the "front end"; the discrete objects form the "back end".]
Slide 3: Modeling MaxEnt HSR
Definition of MaxEnt (a.k.a. "nonsense") syllables:
- A fixed set of {C, V} sounds is drawn from the language of interest.
- A uniform distribution over each C and V is required, to minimize syllable context (hence MaxEnt, maximum entropy).
- MaxEnt CV or CVC syllable scores: $S_{cv} = s^2$, $S_{cvc} = s^3$.
- MaxEnt syllables were first described in Fletcher (1929).
- A set of meaningful words is not MaxEnt; modeling non-MaxEnt syllables requires the context models of Boothroyd (1968) and Bronkhorst et al. (1993).
Fletcher's 1921 band-independence model,
$$s(\mathrm{AI}) \equiv 1 - e = 1 - e_1 e_2 e_3 \cdots e_{20} = 1 - e_{\min}^{\mathrm{AI}}, \quad (1)$$
accurately models MaxEnt HSR (Allen, 1994).
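The band-independence score above is simple to compute. Below is a minimal Python sketch of Eq. (1) and the MaxEnt syllable scores; the value of `e_min` is only an illustrative assumption (the slides leave its value to the data):

```python
def phone_score(ai, e_min=0.015):
    """Fletcher band-independence model: s(AI) = 1 - e_min**AI.

    e_min, the best-case phone error at AI = 1, is an illustrative
    assumption here, not a value given on the slides.
    """
    return 1.0 - e_min ** ai

def syllable_scores(ai, e_min=0.015):
    """MaxEnt syllable scores from the phone score s: S_cv = s**2, S_cvc = s**3."""
    s = phone_score(ai, e_min)
    return s ** 2, s ** 3
```

The syllable scores follow because a MaxEnt syllable is correct only when every one of its phones is independently correct, so the CV and CVC scores are $s^2$ and $s^3$.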
Slide 4: Probabilistic measures of recognition
Recognition measures (MaxEnt = maximum entropy):
- $k$-th band articulation index: $\mathrm{AI}_k \propto \log(\mathrm{snr}_k)$
- Band recognition error: $e_k = e_{\min}^{\mathrm{AI}_k/K}$, with $K = 20$
- MaxEnt phone score: $s = 1 - e_1 e_2 \cdots e_K = 1 - e_{\min}^{\mathrm{AI}}$
- MaxEnt syllable scores: $S_{cv} = s^2$, $S_{cvc} = s^3$
Slide 5: What is an elemental event?
Miller and Nicely's 1955 articulation (confusion) matrix A, measured at [-18, -12, -6, 0, +6, +12] dB SNR (the -6 dB matrix is shown).
[Figure: stimulus vs. response confusion matrix; confusion groups form in A(snr) among the unvoiced, voiced, and nasal consonants.]
Slide 6: The case of /pa/, /ta/, /ka/ with /ta/ spoken
Plot of $A_{i,j}(\mathrm{snr})$ for row $i = 2$ and column $j = 2$: /ta/ spoken ($P_{h|s=2}(\mathrm{SNR})$) and /ta/ heard ($P_{s|h=2}(\mathrm{SNR})$), together with the symmetric part $S_{2,j}(\mathrm{SNR})$ and the skew-symmetric part $A_{2,j}(\mathrm{SNR})$, all vs. SNR [dB].
The solid red curve is the total error $e_2 \equiv 1 - A_{2,2} = \sum_{j \neq 2} A_{2,j}$.
Slide 7: The case of /ma/ vs. /na/
Plots of $S(\mathrm{snr}) \equiv \tfrac{1}{2}(A + A^t)$ with /ma/ and /na/ spoken (rows $i = 15, 16$). The solid red curve is the total error $e_i \equiv 1 - S_{i,i} = \sum_{j \neq i} S_{i,j}$.
[Figure: $S_{15,j}(\mathrm{SNR})$ and $S_{16,j}(\mathrm{SNR})$ vs. SNR [dB].]
This 2-group of sounds is closed, since $S_{/ma/,/ma/}(\mathrm{snr}) + S_{/ma/,/na/}(\mathrm{snr}) \approx 1$.
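The symmetric/skew-symmetric split used throughout these slides is a matrix identity, $A = S + T$ with $S = (A + A^t)/2$ and $T = (A - A^t)/2$. A small NumPy sketch with a toy (not Miller-Nicely) confusion matrix, chosen doubly stochastic so that the rows of S also sum to 1:

```python
import numpy as np

# Toy 3x3 articulation matrix (rows = spoken, columns = heard).
# The numbers are illustrative only; the matrix is doubly stochastic
# so the decomposition e_i = 1 - S_ii = sum_{j!=i} S_ij holds exactly.
A = np.array([[0.80, 0.15, 0.05],
              [0.12, 0.80, 0.08],
              [0.08, 0.05, 0.87]])

S = 0.5 * (A + A.T)      # symmetric part, as plotted on the slides
T = 0.5 * (A - A.T)      # skew-symmetric (asymmetric) part
e = 1.0 - np.diag(S)     # total error e_i = 1 - S_ii
```

Note that A recombines exactly as S + T, and for the MN data the skew part T is small (Conclusions V), which is why the slides work mostly with S.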
Slide 8: Definition of the AI
Let $\mathrm{AI}_k$ be the $k$-th band cochlear channel capacity,
$$\mathrm{AI}_k \propto \log_{10}(1 + c^2\,\mathrm{snr}_k^2), \quad c = 2, \quad \mathrm{AI}_k \le 1,$$
then $\mathrm{AI} = \frac{1}{K}\sum_k \mathrm{AI}_k$ (Allen, 1994; Allen, 2004).
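A sketch of the AI computation from per-band SNRs. The slide gives only a proportionality for $\mathrm{AI}_k$; the 1/3 normalization below (so a band saturates at $\mathrm{AI}_k = 1$ near +30 dB band SNR) is an assumption for illustration, not the slide's constant:

```python
import numpy as np

def articulation_index(snr_bands, c=2.0):
    """AI = (1/K) * sum_k AI_k, with AI_k = min(1, (1/3) log10(1 + c**2 snr_k**2)).

    snr_bands holds the per-band (amplitude) SNRs for the K critical
    bands.  The 1/3 scale factor is an assumed normalization so that a
    band's contribution saturates at 1; the slide states only AI_k <= 1.
    """
    snr = np.asarray(snr_bands, dtype=float)
    ai_k = np.minimum(1.0, np.log10(1.0 + c ** 2 * snr ** 2) / 3.0)
    return float(ai_k.mean())
```

With $K = 20$ bands of zero SNR the AI is 0, and with very large band SNRs every band clips at 1, so the AI saturates at 1.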
Slide 9: Long-term spectrum of female speech
Speech spectrum for female speech, after Dunn and White. The dashed red line shows the approximation, with a slope of -29 dB/decade.
Slide 10: Conversion from SNR to AI
Spectra for wideband SNRs of [-18, -12, -6, 0, +6, +12] dB (speech varied from -18 to +12 dB, noise fixed at 0 dB).
[Figure: spectral level (RMS) vs. frequency (kHz).]
From these spectra, $\mathrm{snr}_k$ is determined for each of the $K = 20$ cochlear critical bands.
Slide 11: AI(SNR) for Miller-Nicely's data
AI(SNR) computed from the Dunn and White spectrum, for female speech in white noise.
[Figure: spectral levels for speech from -18 to +12 dB against noise at 0 dB (left); AI vs. wideband SNR [dB] (right).]
Conversion from $\mathrm{snr}_k$ to AI (Allen, 2005, JASA):
$$\mathrm{AI} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{AI}_k, \quad \mathrm{AI}_k \propto \log_{10}(1 + c^2\,\mathrm{snr}_k^2), \quad c = 2, \quad \mathrm{AI}_k \le 1.$$
Slide 12: MN16 and the AI model
$P_c^{(i)}(\mathrm{AI})$ is the probability of correct identification of the $i$-th consonant, and
$$P_c(\mathrm{AI}) \equiv \frac{1}{16}\sum_i P_c^{(i)}(\mathrm{AI}) = 1 - \left(1 - \tfrac{1}{16}\right) e_{\min}^{\mathrm{AI}}$$
is the chance-corrected model of the mean.
[Figure: the individual $P_c^{(i)}(\mathrm{SNR})$ curves (Tables I-VI), their mean $\mathrm{mean}_i(P_c^{(i)})$, and the model $1 - (1 - 1/16)\,e_{\min}^{\mathrm{AI}}$.]
Band independence seems a near-perfect fit to MN16!
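The chance-corrected mean score is a one-liner. A hedged Python sketch, with `e_min` chosen arbitrarily for illustration:

```python
def p_correct(ai, e_min, n=16):
    """Chance-corrected band-independence model for an n-alternative
    closed-set test: P_c(AI) = 1 - (1 - 1/n) * e_min**AI.

    n = 16 for the MN16 consonant set; e_min is illustrative.
    """
    return 1.0 - (1.0 - 1.0 / n) * e_min ** ai

# At AI = 0 the model degenerates to pure guessing over 16 alternatives.
chance = p_correct(0.0, e_min=0.02)   # -> 0.0625, i.e. 1/16
```

The chance correction matters only at low AI; as AI grows, $e_{\min}^{\mathrm{AI}}$ shrinks and $P_c$ approaches 1 regardless of the set size.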
Slide 13: Log-error probability
Band independence, corrected for chance ($H = 4$ bits):
$$P_e(\mathrm{AI}, H = 4) \equiv 1 - P_c(\mathrm{AI}) = (1 - 2^{-H})\,e_{\min}^{\mathrm{AI}}.$$
This suggests linear log-error plots vs. AI for the MN16 CVs:
$$\log\left(P_e^{(i)}\right) = \log\left(1 - 2^{-H}\right) + \mathrm{AI}\,\log\left(e_{\min}^{(i)}\right).$$
This relation is of the form $Y = a + bX$, so plots of $\log(P_e)$ vs. AI provide a direct test of Fletcher's band-independence model. The individual sounds $P_e^{(i)}(\mathrm{AI})$ from MN16 may each be tested for band independence.
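The linearity test above amounts to an ordinary least-squares line fit in $(\mathrm{AI}, \log P_e)$ coordinates. The sketch below uses synthetic data that obeys the model exactly, standing in for the real MN16 curves:

```python
import numpy as np

# Synthetic P_e(AI) obeying the model exactly (H = 4 bits, e_min = 0.01);
# with the real MN16 data these arrays would come from the measurements.
H, e_min = 4, 0.01
ai = np.linspace(0.05, 0.5, 10)
p_e = (1.0 - 2.0 ** -H) * e_min ** ai

# Fit log(P_e) = a + b*AI (natural log here; any base works the same way).
# Band independence predicts intercept a = log(1 - 2**-H) and
# slope b = log(e_min).
b, a = np.polyfit(ai, np.log(p_e), 1)
```

A sound "passes" the test when its measured curve is well fit by such a line; curvature in $\log P_e$ vs. AI (as for sounds 4, 8, 11, 15, 16 on Slide 15) signals a failure of band independence.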
Slide 14: Testing the band-independence model
CV log-error model:
$$\log\left(P_e^{(i)}(\mathrm{AI})\right) = \log\left(1 - 2^{-4}\right) + \mathrm{AI}\,\log\left(e_{\min}^{(i)}\right).$$
[Figure: four panels of $1 - s_i(\mathrm{AI})$, $1 - \mathrm{mean}_i(P_c^{(i)})$, and $e_{\min}^{(i)}$ for the sound groups {1, 3, 5, 9, 10, ...}, {2, 6, 7, 13, 14}, {4, 8, ...}, and {15, 16}. Solid: model; dashed: MN16 CV sounds $P_c^{(i)}(\mathrm{AI})$.]
Slide 15: Testing the band-independence model (continued)
Sounds [1, 2, 3, 5, 8, 9, 10, 12, 13, 14] pass; sounds [4, 8, 11, 15, 16] fail (nonlinear with respect to AI).
[Figure: the same four panels as Slide 14.]
Slide 16: /ma/ and /na/ vs. AI
The multichannel model is valid for the nasals:
$$S_{i,j}(\mathrm{AI}) \approx \delta_{i,j} + (-1)^{\delta_{i,j}}\,(1 - 2^{-H_g})\,\left(e_{\min}^{(i)}\right)^{\mathrm{AI}} \quad \text{for } \mathrm{AI} > \mathrm{AI}_g,$$
with total error $e_i \equiv 1 - S_{i,i}$ and group entropy $H_g = 1$ bit (i.e., a 2-group).
[Figure: symmetric $S_{15,j}(\mathrm{AI})$ for /ma/ and $S_{16,j}(\mathrm{AI})$ for /na/, with $\mathrm{AI}_g$ marked.]
Slide 17: The case of /pa/, /ta/, /ka/ vs. AI
Fletcher's multichannel model is valid for $i = 1, 2, 3$:
$$S_{i,j}(\mathrm{AI}) \approx \delta_{i,j} + (-1)^{\delta_{i,j}}\,(1 - 2^{-H_g})\,\left(e_{\min}^{(i,j)}\right)^{\mathrm{AI}} \quad \text{for } \mathrm{AI} > \mathrm{AI}_g.$$
[Figure: $S_{1,j}(\mathrm{AI})$ for /pa/, $S_{2,j}(\mathrm{AI})$ for /ta/, and $S_{3,j}(\mathrm{AI})$ for /ka/.]
Band independence holds for the [/pa/, /ta/, /ka/] group: $H_g = \log_2(3)$, $\mathrm{AI}_g = 0.15$, and $e_{\min}^{(i,j)}$ depends on $i, j$.
Slide 18: Conclusions I
- The AI predicts above-chance performance near -20 dB SNR.
- HSR performance saturates near 0 dB SNR.
- There is no overlap between ASR and HSR: ASR is at chance near 0 dB SNR, and ASR performance saturates near +20 dB SNR.
Slide 19: Conclusions II
Fletcher's theory is based on band independence:
$$e(\mathrm{snr}) = e_{\min}^{\mathrm{AI}_1/K}\, e_{\min}^{\mathrm{AI}_2/K}\, e_{\min}^{\mathrm{AI}_3/K} \cdots e_{\min}^{\mathrm{AI}_K/K} = e_{\min}^{\mathrm{AI}}. \quad (2)$$
- Recognition of MaxEnt phones satisfies Eq. (2).
- MaxEnt closed-set test scores may be predicted by $P_c(\mathrm{AI}, H) = 1 - (1 - 2^{-H})\,e_{\min}^{\mathrm{AI}}$.
- The first sign of above-chance performance is grouping.
- The sound groups depend on the noise spectrum.
Slide 20: Conclusions III
The average over the 16 Miller-Nicely consonant confusions, where $P_c^{(i)}(\mathrm{snr})$ is the probability of correct identification of the $i$-th CV, is accurately predicted by the Articulation Index:
$$P_c(\mathrm{snr}) \equiv \frac{1}{16}\sum_{i=1}^{16} P_c^{(i)}(\mathrm{snr}) = 1 - \left(1 - \tfrac{1}{16}\right) e_{\min}^{\mathrm{AI}},$$
with $\mathrm{AI}_k \propto \log_{10}(1 + 4\,\mathrm{snr}_k^2)$, where $k$ is the band index. The chance-corrected error is $(1 - 1/16)\,e_{\min}^{\mathrm{AI}}$; at $\mathrm{AI} = 0$ the model reduces to chance, $P_c(0) = 1/16$.
Slide 21: Conclusions IV
- The AI model, corrected for 1/16 chance guessing, predicts the average Miller-Nicely $P_c$ quite well.
- There are no free parameters in this model of $P_c(\mathrm{snr})$.
- For the MN experiment, the AI stays at or below about 0.5.
- The AI(snr) is not linear in snr, due to the shape of the speech-to-noise spectrum snr(f).
- The individual curves remain to be analyzed and modeled (if possible). How can we explain the variance across sounds?
Slide 22: Conclusions V
The MN data are very close to symmetric. There are a few exceptions, but even for these the asymmetric part is very small.
Slide 23: Conclusions VI
The off-diagonal $P(j|i, \mathrm{AI})$ functions linearly decompose the error for the $i$-th sound, since $\sum_j P_{ij} = 1$:
$$P_e(i, \mathrm{AI}) \equiv 1 - P_{i,i}(\mathrm{AI}) = \sum_{j \neq i} P(j|i, \mathrm{AI}).$$
In the previous figure the red curve is identically the sum of all the blue curves, and the green curve is the model
$$P_c(\mathrm{AI}, H = 4) = 1 - \left(1 - \tfrac{1}{16}\right) e_{\min}^{\mathrm{AI}}.$$
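The row decomposition is a direct consequence of each confusion-matrix row summing to 1. A toy NumPy check (illustrative numbers, not the MN data):

```python
import numpy as np

# Toy row i of a 16x16 confusion matrix P(j|i): the off-diagonal entries
# are illustrative, and the diagonal entry is set so the row sums to 1.
i = 3
row = np.full(16, 0.02)
row[i] = 1.0 - 15 * 0.02

# The "red curve" (total error) equals the sum of the "blue curves"
# (off-diagonal confusion probabilities) at every AI.
p_e = 1.0 - row[i]
blue_sum = row.sum() - row[i]
```

Because this identity holds row by row, plotting the off-diagonal curves shows exactly where the error of each sound goes, i.e. which confusion group absorbs it.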
Slide 24: MN16 vs. MN64
Results of the MN16 (1955) and MN64 CV experiments, plotted as the error $1 - P_c(\mathrm{SNR})$ [%] as a function of both the wideband SNR [dB] and the AI.
Slide 25: Human vs. Γ-tone filters
$$h(t, f_0) = t^3\, e^{-2.22\,\pi\,\mathrm{ERB}(f_0)\,t}\,\cos(2\pi f_0 t) \quad (\Gamma\text{-tone filter}),$$
$$\mathrm{ERB}(f_0) = 24.7\,(4.37 f_0/1000 + 1).$$
[Figure: model neural tuning curves (cilia displacement, solid) vs. gamma-tone filters (dashed), magnitude vs. frequency (kHz).]
Problems with the Γ-tone filter:
- high-frequency slope problem;
- low-frequency tails are wrong;
- wrong bandwidth at higher frequencies;
- missing middle-ear high-pass response.
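The Γ-tone impulse response is easy to sample directly. In the sketch below, the ERB constants $24.7\,(4.37 f_0/1000 + 1)$ follow the standard Glasberg-Moore form, assumed here because the slide's formula is truncated; the sample rate and duration are arbitrary illustrative choices:

```python
import numpy as np

def erb(f0):
    """Equivalent rectangular bandwidth in Hz for f0 in Hz.

    The slide's formula is truncated; the standard Glasberg-Moore
    constants 24.7 * (4.37 * f0/1000 + 1) are assumed here.
    """
    return 24.7 * (4.37 * f0 / 1000.0 + 1.0)

def gammatone(f0, fs=16000.0, dur=0.05):
    """Gamma-tone impulse response from the slide,
    h(t) = t**3 * exp(-2.22*pi*ERB(f0)*t) * cos(2*pi*f0*t),
    sampled at rate fs for dur seconds and normalized to unit peak."""
    t = np.arange(int(dur * fs)) / fs
    h = t ** 3 * np.exp(-2.22 * np.pi * erb(f0) * t) * np.cos(2.0 * np.pi * f0 * t)
    return h / np.abs(h).max()

h = gammatone(1000.0)   # impulse response of a 1 kHz channel
```

The $t^3$ envelope makes this a fourth-order gamma-tone; the slide's criticisms (slopes, tails, bandwidth) concern how this shape's magnitude response compares with neural tuning curves, not the sampling details above.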
Slide 26: Narayan et al. data
[Figure: Narayan et al. (gerbil) data.]
Slide 27: Kiang and Moxon 1979 cochlear USM
Nonlinear upward spread of masking.
Slide 28: Neely model for cat, 1986
An active model of the cochlea and cilia, based on the resonant-TM model. Cat data from Liberman and Delgutte.
Slide 29: Allen model, 1980
[Figure: magnitude vs. frequency (Hz). Solid: model; dashed: cat FTC (Liberman and Delgutte).]
Slide 30: Symmetric form S = (A + Aᵗ)/2
[Figure: $S_{r,c}(\mathrm{AI})$ for each of the 16 MN consonants: /pa/, /ta/, /ka/, /fa/, /θa/, /sa/, /ʃa/, /ba/, /da/, /ga/, /va/, /ða/, /za/, /ʒa/, /ma/, /na/.]
Slide 31: References
Allen, J. B. (1994). How do humans process and recognize speech? IEEE Transactions on Speech and Audio Processing, 2(4).
Allen, J. B. (2004). The articulation index is a Shannon channel capacity. In Pressnitzer, D., de Cheveigné, A., McAdams, S., and Collet, L., editors, Auditory Signal Processing: Physiology, Psychoacoustics, and Models. Springer Verlag, New York, NY.
Boothroyd, A. (1968). Statistical theory of the speech discrimination score. J. Acoust. Soc. Am., 43(2).
Bronkhorst, A. W., Bosman, A. J., and Smoorenburg, G. F. (1993). A model for context effects in speech recognition. J. Acoust. Soc. Am., 93(1).
Fletcher, H. (1929). Speech and Hearing. D. Van Nostrand Company, Inc., New York.
Miller, G. A. and Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am., 27(2).
Slide 32: Latest from Sandeep
[Figure: fraction of utterances (%) above the error threshold vs. error threshold (%).]
More informationGaussian Mixture Model Based Coding of Speech and Audio
Gaussian Mixture Model Based Coding of Speech and Audio Sam Vakil Department of Electrical & Computer Engineering McGill University Montreal, Canada October 2004 A thesis submitted to McGill University
More informationText Independent Speaker Identification Using Imfcc Integrated With Ica
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 7, Issue 5 (Sep. - Oct. 2013), PP 22-27 ext Independent Speaker Identification Using Imfcc
More informationNew Insights Into the Stereophonic Acoustic Echo Cancellation Problem and an Adaptive Nonlinearity Solution
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 10, NO 5, JULY 2002 257 New Insights Into the Stereophonic Acoustic Echo Cancellation Problem and an Adaptive Nonlinearity Solution Tomas Gänsler,
More informationSpeech Spectra and Spectrograms
ACOUSTICS TOPICS ACOUSTICS SOFTWARE SPH301 SLP801 RESOURCE INDEX HELP PAGES Back to Main "Speech Spectra and Spectrograms" Page Speech Spectra and Spectrograms Robert Mannell 6. Some consonant spectra
More informationTime-domain representations
Time-domain representations Speech Processing Tom Bäckström Aalto University Fall 2016 Basics of Signal Processing in the Time-domain Time-domain signals Before we can describe speech signals or modelling
More informationMDS codec evaluation based on perceptual sound attributes
MDS codec evaluation based on perceptual sound attributes Marcelo Herrera Martínez * Edwar Jacinto Gómez ** Edilberto Carlos Vivas G. *** submitted date: March 03 received date: April 03 accepted date:
More information200Pa 10million. Overview. Acoustics of Speech and Hearing. Loudness. Terms to describe sound. Matching Pressure to Loudness. Loudness vs.
Overview Acoustics of Speech and Hearing Lecture 1-2 How is sound pressure and loudness related? How can we measure the size (quantity) of a sound? The scale Logarithmic scales in general Decibel scales
More informationCharacterization of phonemes by means of correlation dimension
Characterization of phonemes by means of correlation dimension PACS REFERENCE: 43.25.TS (nonlinear acoustical and dinamical systems) Martínez, F.; Guillamón, A.; Alcaraz, J.C. Departamento de Matemática
More informationEstimating Correlation Coefficient Between Two Complex Signals Without Phase Observation
Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Shigeki Miyabe 1B, Notubaka Ono 2, and Shoji Makino 1 1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki
More informationUpper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition
Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Jorge Silva and Shrikanth Narayanan Speech Analysis and Interpretation
More informationAcoustics 08 Paris 6013
Resolution, spectral weighting, and integration of information across tonotopically remote cochlear regions: hearing-sensitivity, sensation level, and training effects B. Espinoza-Varas CommunicationSciences
More informationR E S E A R C H R E P O R T Entropy-based multi-stream combination Hemant Misra a Hervé Bourlard a b Vivek Tyagi a IDIAP RR 02-24 IDIAP Dalle Molle Institute for Perceptual Artificial Intelligence ffl
More informationTemporal Modeling and Basic Speech Recognition
UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab Temporal Modeling and Basic Speech Recognition Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Today s lecture Recognizing
More informationHidden Markov Model and Speech Recognition
1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed
More informationShort-Time ICA for Blind Separation of Noisy Speech
Short-Time ICA for Blind Separation of Noisy Speech Jing Zhang, P.C. Ching Department of Electronic Engineering The Chinese University of Hong Kong, Hong Kong jzhang@ee.cuhk.edu.hk, pcching@ee.cuhk.edu.hk
More informationChirp Transform for FFT
Chirp Transform for FFT Since the FFT is an implementation of the DFT, it provides a frequency resolution of 2π/N, where N is the length of the input sequence. If this resolution is not sufficient in a
More informationA Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise
334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C
More information