Cochlear modeling and its role in human speech recognition


Slide 1 (Allen/IPAM, February 1, 2005): Cochlear modeling and its role in human speech recognition
Miller-Nicely confusions and the articulation index
Jont Allen, Univ. of Illinois, Beckman Inst., Urbana IL

Slide 2: A model of human speech recognition (HSR)
The research goal is to identify elemental HSR events. An event is defined as a perceptual feature.
Processing chain: signal s(t) -> cochlear filters ("front end", analog objects) -> events e_k -> phone score s -> syllable score S -> words W ("back end", discrete objects).

Slide 3: Modeling MaxEnt HSR
Definition of MaxEnt (a.k.a. "nonsense") syllables:
A fixed set of {C, V} sounds is drawn from the language of interest.
A uniform distribution over each C and V is required, to minimize syllable context (hence MaxEnt, maximum entropy).
MaxEnt CV or CVC syllable scores: S_cv = s^2, S_cvc = s^3.
MaxEnt syllables were first described in Fletcher (1929).
A set of meaningful words is not MaxEnt; modeling non-MaxEnt syllables requires the context models of Boothroyd (1968) and Bronkhorst et al. (1993).
Fletcher's 1921 band-independence model,
    s(AI) = 1 - e = 1 - e_1 e_2 e_3 ... e_20 = 1 - e_min^AI,   (1)
accurately models MaxEnt HSR (Allen, 1994).

Slide 4: Probabilistic measures of recognition (MaxEnt = maximum entropy)
k-th band articulation index: AI_k ∝ log(snr_k)
Band recognition error: e_k = e_min^(AI_k/K), with K = 20
MaxEnt phone score: s = 1 - e_1 e_2 ... e_K = 1 - e_min^AI
MaxEnt syllable scores: S_cv = s^2, S_cvc = s^3
(Processing chain as on Slide 2: s(t) -> filters -> events e_k -> phones s -> syllables S -> words W.)
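Fletcher's bookkeeping on the last two slides can be sketched numerically. The function name `phone_score` is hypothetical, and the values e_min = 0.015 and the uniform AI_k are illustrative assumptions; K = 20 bands follows the slides.

```python
import numpy as np

def phone_score(ai_bands, e_min=0.015, K=20):
    """MaxEnt phone score under Fletcher band independence:
    s = 1 - prod_k e_min**(AI_k/K) = 1 - e_min**AI."""
    ai_bands = np.asarray(ai_bands, dtype=float)
    band_errors = e_min ** (ai_bands / K)   # e_k = e_min^(AI_k/K)
    return 1.0 - np.prod(band_errors)       # total error e = e_min^AI

# All K = 20 bands fully audible (AI_k = 1) gives AI = 1, so s = 1 - e_min
s = phone_score(np.ones(20))
S_cv, S_cvc = s**2, s**3                    # MaxEnt syllable scores
```

Because the band errors multiply, the product telescopes to e_min^AI with AI = (1/K) sum_k AI_k, which is the point of Eq. (1).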

Slide 5: What is an elemental event?
Miller-Nicely's 1955 articulation (confusion) matrix A, measured at [-18, -12, -6 (shown), 0, 6, 12] dB SNR.
Rows: stimulus; columns: response.
Confusion groups form in A(snr): unvoiced, voiced, and nasal consonants.

Slide 6: Case of /pa/, /ta/, /ka/ with /ta/ spoken
Plots of A_{i,j}(snr) for row i = 2 (/ta/ spoken) and column j = 2 (/ta/ heard), i.e. P(h|s=2)(SNR) and P(s|h=2)(SNR), along with the symmetric part S_{2,j}(SNR) and the skew-symmetric part of A_{2,j}(SNR).
The solid red curve is the total error e_2 ≡ 1 - A_{2,2} = Σ_{j≠2} A_{2,j}.

Slide 7: The case of /ma/ vs. /na/
Plots of S(snr) ≡ (1/2)(A + A^T) with /ma/ and /na/ spoken (rows i = 15, 16).
The solid red curve is the total error e_i ≡ 1 - S_{i,i} = Σ_{j≠i} S_{i,j}.
This 2-group of sounds is closed, since S_{/ma/,/ma/}(snr) + S_{/ma/,/na/}(snr) ≈ 1.
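A minimal sketch of the symmetrization and the closed-group check, using an illustrative 3x3 row-stochastic matrix (made-up numbers, not Miller-Nicely data):

```python
import numpy as np

# Illustrative confusion matrix (rows: spoken, cols: heard, P(heard|spoken));
# sounds 0 and 1 play the roles of /ma/ and /na/.
A = np.array([[0.80, 0.18, 0.02],
              [0.15, 0.83, 0.02],
              [0.01, 0.02, 0.97]])

S = 0.5 * (A + A.T)    # symmetric part
T = 0.5 * (A - A.T)    # skew-symmetric part (small when A is near-symmetric)

# The {0, 1} pair forms a (nearly) closed 2-group when its symmetric
# probability mass stays inside the group: S[0,0] + S[0,1] ~ 1
closure = S[0, 0] + S[0, 1]
```

Any confusion matrix splits exactly into these two parts, A = S + T; the slide's claim is that for the real data T is small and the group mass `closure` stays near 1.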

Slide 8: Definition of the AI
Let AI_k be the k-th band cochlear channel capacity,
    AI_k ∝ log10(1 + c^2 snr_k^2), with c = 2 and AI_k ≤ 1,
then AI = (1/K) Σ_k AI_k (Allen, 1994; Allen, 2004).
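One way to turn this band channel-capacity definition into code. The 1/3 normalization (roughly 30 dB of usable SNR per band) and the amplitude-SNR convention are assumptions added here, not given on the slide:

```python
import numpy as np

def band_ai(snr_db, c=2.0):
    """Band channel capacity AI_k ∝ log10(1 + c^2 snr_k^2), clipped so AI_k <= 1.
    The 1/3 normalization (~30 dB of usable SNR per band) is an assumption."""
    snr = 10.0 ** (np.asarray(snr_db, dtype=float) / 20.0)  # amplitude SNR
    return np.clip(np.log10(1.0 + c**2 * snr**2) / 3.0, 0.0, 1.0)

# AI is the mean over K = 20 critical bands, here all at 0 dB SNR
AI = band_ai(np.full(20, 0.0)).mean()
```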

Slide 9: Long-term spectrum of female speech
Speech spectrum for female speech (Dunn and White).
The dashed red line shows the approximation: slope 29 dB/decade.

Slide 10: Conversion from SNR to AI
Spectra for wideband SNRs of [-18, -12, -6, 0, 6, 12] dB (speech at -18 to +12 dB, noise at 0 dB).
From these spectra, snr_k is determined for K = 20 cochlear critical bands.

Slide 11: AI(SNR) for Miller-Nicely's data
AI(SNR) computed from the Dunn and White spectrum (female speech in white noise), as a function of wideband SNR.
Conversion from snr_k to AI [Allen 2005, JASA]:
    AI = (1/K) Σ_{k=1}^{K} AI_k,  AI_k ∝ log10(1 + c^2 snr_k^2),  c = 2,  AI_k ≤ 1.

Slide 12: MN16 and the AI model
P_c^{(i)}(AI) is the probability of correct identification of the i-th consonant (Tables I-VI); its mean over the 16 consonants is the chance-corrected model
    P_c(AI) ≡ (1/16) Σ_i P_c^{(i)}(AI) = 1 - (1 - 1/16) e_min^AI.
Plotted against the data, band independence seems a perfect fit to MN16!

Slide 13: Log-error probability
Band independence, corrected for chance (H = 4 bits):
    P_e(AI, H = 4) ≡ 1 - P_c(AI) = (1 - 2^{-H}) e_min^AI.
This suggests linear log-error plots vs. AI for the MN16 CVs:
    log(P_e^{(i)}) = log(1 - 2^{-H}) + AI log(e_min).
This relation is of the form Y = a + bX, so plots of log(P_e)(AI) vs. AI provide a nice test of Fletcher's band-independence model.
Individual sounds P_e^{(i)}(AI) from MN16 may be tested for band independence.
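The linearity test can be sketched numerically: generate error probabilities from the model itself and confirm that a straight-line fit in (AI, log P_e) recovers the predicted slope and intercept. The values of H and e_min here are illustrative.

```python
import numpy as np

# Band independence predicts log(P_e) linear in AI:
#   log10 P_e(AI) = log10(1 - 2**-H) + AI * log10(e_min)
H, e_min = 4, 0.015                       # illustrative values
ai = np.linspace(0.05, 0.5, 10)
p_e = (1.0 - 2.0**-H) * e_min**ai         # model-generated "data"

slope, intercept = np.polyfit(ai, np.log10(p_e), 1)
```

For real MN16 curves the interesting question is whether the fit residuals stay small, i.e. whether a sound's log-error is actually linear in AI.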

Slide 14: Testing the band-independence model
CV log-error model:
    log P_e^{(i)}(AI) = log(1 - 2^{-4}) + AI log e_min^{(i)}.
Four panels plot 1 - s_i(AI), 1 - mean_i(P_c^{(i)}), and e_min^AI against the model (solid) and the CV sounds (dashed), for sound groups {1,3,5,9,10,12}, {2,6,7,13,14}, {4,8,11}, and {15,16}.

Slide 15: Testing the band-independence model (results)
Sounds [1, 2, 3, 5, 6, 7, 9, 10, 12, 13, 14] pass.
Sounds [4, 8, 11, 15, 16] fail (nonlinear with respect to AI).
(Panels as on the previous slide.)

Slide 16: /ma/ and /na/ vs. AI
The multichannel model is valid for the nasals:
    S_{i,j}(AI) ≈ δ_{i,j} + (-1)^{δ_{i,j}} (1 - 2^{-H_g}) (e_min^{(i)})^AI  for AI > AI_g,
with group entropy H_g = 1 bit (i.e., a 2-group).
Panels show the symmetric /ma/ and /na/ curves S_{15,j}(AI) and S_{16,j}(AI), with 1 - S_{i,i} marked.

Slide 17: Case of /pa/, /ta/, /ka/ vs. AI
Fletcher's multichannel model is valid for i = 1, 2, 3:
    S_{i,j}(AI) ≈ δ_{i,j} + (-1)^{δ_{i,j}} (1 - 2^{-H_g}) (e_min^{(i,j)})^AI  for AI > AI_g.
Band independence holds for the [/pa/, /ta/, /ka/] group: H_g = log2(3), AI_g = 0.15, and e_min^{(i,j)} depends on i, j.
(Panels: S_{1,j}(AI) for /pa/, S_{2,j}(AI) for /ta/, S_{3,j}(AI) for /ka/.)

Slide 18: Conclusions I
The AI model predicts above-chance performance near -20 dB SNR; HSR performance saturates near 0 dB SNR.
There is no overlap between ASR and HSR: ASR is at chance near 0 dB and saturates near +20 dB SNR.

Slide 19: Conclusions II
Fletcher's theory is based on band independence:
    e(snr) = e_min^{AI_1/K} e_min^{AI_2/K} e_min^{AI_3/K} ... e_min^{AI_K/K} = e_min^AI.   (2)
Recognition of MaxEnt phones satisfies Eq. (2).
MaxEnt closed-set test scores may be predicted by P_c(AI, H) = 1 - (1 - 2^{-H}) e_min^AI.
The first sign of above-chance performance is grouping; the sound groups depend on the noise spectrum.

Slide 20: Conclusions III
The average over the 16 Miller-Nicely consonant confusions P_c^{(i)}(snr), where P_c^{(i)} is the probability of correct identification of the i-th CV, is accurately predicted by the Articulation Index:
    P_c(snr) ≡ (1/16) Σ_{i=1}^{16} P_c^{(i)}(snr) = 1 - (1 - 1/16) e_min^AI,
with AI_k ∝ log10(1 + 4 snr_k^2), where k is the band index.
Chance performance corresponds to AI = 0: P_c(0) = 1 - (1 - 1/16) = 1/16.

Slide 21: Conclusions IV
The AI model, corrected for 1/16 chance guessing, predicts the average Miller-Nicely P_c quite well; there are no free parameters in this model of P_c(snr).
For the MN experiment, the AI stays below about 0.5.
The AI(snr) is not linear in snr, due to the shape of the speech snr(f).
The individual curves remain to be analyzed and modeled (if possible): how can we explain the variance across sounds?

Slide 22: Conclusions V
The MN data is very close to symmetric; there are a few exceptions, but even for these, the asymmetric part is very small.

Slide 23: Conclusions VI
The off-diagonal P(AI) functions linearly decompose the error for the i-th sound, since Σ_j P_{i,j} = 1:
    P_e(i, AI) ≡ 1 - P_{i,i}(AI) = Σ_{j≠i} P(j|i, AI).
In the previous figure, the red curve is identically the sum of all the blue curves.
The green curve is the model P_c(AI, H = 4) = 1 - (1 - 1/16) e_min^AI.
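The decomposition is a direct consequence of the rows of a confusion matrix summing to 1, which a short check makes concrete (the matrix entries here are made up for illustration):

```python
import numpy as np

# Rows of a confusion matrix P(j|i) sum to 1, so the total error for sound i
# decomposes exactly into its off-diagonal entries (the "blue curves"):
#   P_e(i) = 1 - P[i,i] = sum over j != i of P[i,j]
P = np.array([[0.90, 0.06, 0.04],       # illustrative row-stochastic matrix
              [0.08, 0.85, 0.07],
              [0.03, 0.05, 0.92]])

i = 1
p_e_total = 1.0 - P[i, i]
p_e_parts = P[i].sum() - P[i, i]        # sum of the off-diagonal row entries
```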

Slide 24: MN16 vs. MN64
Results of the MN16 ('55) and MN64 CV experiments, plotted as error 1 - P_c(SNR) [%] both as a function of the wideband SNR [dB] and as a function of the AI.

Slide 25: Human vs. Γ-tone filters
    h(t, f0) = t^3 e^{-2.22 π ERB(f0) t} cos(2π f0 t)  (Γ-tone filter),
    ERB(f0) = 24.7 (4.37 f0/1000 + 1).
Comparing model neural tuning curves (cilia displacement, solid) against gamma-tone filters (dashed) shows:
a high-frequency slope problem;
the low-frequency tails are wrong;
the wrong bandwidth at higher frequencies;
a missing middle-ear high-pass response.
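The Γ-tone impulse response on this slide can be sampled directly. The ERB expression is completed with the standard Glasberg-Moore form 24.7(4.37 f0/1000 + 1); the sample rate, 25 ms duration, and peak normalization are arbitrary choices, not from the slide.

```python
import numpy as np

def erb(f0):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore), f0 in Hz."""
    return 24.7 * (4.37 * f0 / 1000.0 + 1.0)

def gammatone(f0, fs=16000, dur=0.025):
    """Sampled Γ-tone impulse response from the slide:
    h(t, f0) = t^3 exp(-2.22 π ERB(f0) t) cos(2 π f0 t), peak-normalized."""
    t = np.arange(int(fs * dur)) / fs
    h = t**3 * np.exp(-2.22 * np.pi * erb(f0) * t) * np.cos(2 * np.pi * f0 * t)
    return h / np.max(np.abs(h))

h = gammatone(1000.0)
```

The t^3 envelope makes this a 4th-order gamma-tone; the slide's point is that even so, its tuning-curve tails and high-frequency slopes disagree with the cochlear model.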

Slide 26: Narayan et al. data
Cochlear data from Narayan et al. (gerbil).

Slide 27: Kiang and Moxon 1979, cochlear USM
Nonlinear upward spread of masking (USM).

Slide 28: Neely model for cat, 1986
An active model of the cochlea and cilia, based on the resonant tectorial-membrane (TM) model. Cat data from Liberman and Delgutte.

Slide 29: Allen model, 1980
Magnitude vs. frequency (Hz). Solid: model; dashed: cat FTC (Liberman and Delgutte).

Slide 30: Symmetric form S = (A + A^T)/2
Panels of S_{r,c}(AI) for the 16 CVs: /pa/, /ta/, /ka/, /fa/, /θa/, /sa/, /ʃa/, /ba/, /da/, /ga/, /va/, /ða/, /za/, /ʒa/, /ma/, /na/.

Slide 31: References
Allen, J. B. (1994). How do humans process and recognize speech? IEEE Transactions on Speech and Audio Processing, 2(4).
Allen, J. B. (2004). The articulation index is a Shannon channel capacity. In Pressnitzer, D., de Cheveigné, A., McAdams, S., and Collet, L., editors, Auditory Signal Processing: Physiology, Psychoacoustics, and Models. Springer Verlag, New York, NY.
Boothroyd, A. (1968). Statistical theory of the speech discrimination score. J. Acoust. Soc. Am., 43(2).
Bronkhorst, A. W., Bosman, A. J., and Smoorenburg, G. F. (1993). A model for context effects in speech recognition. J. Acoust. Soc. Am., 93(1).
Fletcher, H. (1929). Speech and Hearing. D. Van Nostrand Company, Inc., New York.
Miller, G. A. and Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am., 27(2).

Slide 32: Latest from Sandeep
Fraction of utterances [%] above the error threshold, plotted vs. error threshold [%].


More information

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic

More information

LIKELIHOOD RECEIVER FOR FH-MFSK MOBILE RADIO*

LIKELIHOOD RECEIVER FOR FH-MFSK MOBILE RADIO* LIKELIHOOD RECEIVER FOR FH-MFSK MOBILE RADIO* Item Type text; Proceedings Authors Viswanathan, R.; S.C. Gupta Publisher International Foundation for Telemetering Journal International Telemetering Conference

More information

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments Michael I. Mandel, Daniel P. W. Ellis LabROSA, Dept. of Electrical Engineering Columbia University New York, NY {mim,dpwe}@ee.columbia.edu

More information

Robust Sound Event Detection in Continuous Audio Environments

Robust Sound Event Detection in Continuous Audio Environments Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University

More information

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli

More information

Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment

Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment Jean-Pierre Nadal CNRS & EHESS Laboratoire de Physique Statistique (LPS, UMR 8550 CNRS - ENS UPMC Univ.

More information

COMP 546, Winter 2018 lecture 19 - sound 2

COMP 546, Winter 2018 lecture 19 - sound 2 Sound waves Last lecture we considered sound to be a pressure function I(X, Y, Z, t). However, sound is not just any function of those four variables. Rather, sound obeys the wave equation: 2 I(X, Y, Z,

More information

A perception- and PDE-based nonlinear transformation for processing spoken words

A perception- and PDE-based nonlinear transformation for processing spoken words Physica D 149 (21) 143 16 A perception- and PDE-based nonlinear transformation for processing spoken words Yingyong Qi a, Jack Xin b, a Department of Electrical and Computer Engineering, University of

More information

Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Probabilistic Inference of Speech Signals from Phaseless Spectrograms Probabilistic Inference of Speech Signals from Phaseless Spectrograms Kannan Achan, Sam T. Roweis, Brendan J. Frey Machine Learning Group University of Toronto Abstract Many techniques for complex speech

More information

GAUSSIANIZATION METHOD FOR IDENTIFICATION OF MEMORYLESS NONLINEAR AUDIO SYSTEMS

GAUSSIANIZATION METHOD FOR IDENTIFICATION OF MEMORYLESS NONLINEAR AUDIO SYSTEMS GAUSSIANIATION METHOD FOR IDENTIFICATION OF MEMORYLESS NONLINEAR AUDIO SYSTEMS I. Marrakchi-Mezghani (1),G. Mahé (2), M. Jaïdane-Saïdane (1), S. Djaziri-Larbi (1), M. Turki-Hadj Alouane (1) (1) Unité Signaux

More information

PDF estimation by use of characteristic functions and the FFT

PDF estimation by use of characteristic functions and the FFT Allen/DSP Seminar January 27, 2005 p. 1/1 PDF estimation by use of characteristic functions and the FFT Jont B. Allen Univ. of IL, Beckman Inst., Urbana IL Allen/DSP Seminar January 27, 2005 p. 2/1 iven

More information

Signal representations: Cepstrum

Signal representations: Cepstrum Signal representations: Cepstrum Source-filter separation for sound production For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes,

More information

Convolutional Associative Memory: FIR Filter Model of Synapse

Convolutional Associative Memory: FIR Filter Model of Synapse Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,

More information

Audio Coding. Fundamentals Quantization Waveform Coding Subband Coding P NCTU/CSIE DSPLAB C.M..LIU

Audio Coding. Fundamentals Quantization Waveform Coding Subband Coding P NCTU/CSIE DSPLAB C.M..LIU Audio Coding P.1 Fundamentals Quantization Waveform Coding Subband Coding 1. Fundamentals P.2 Introduction Data Redundancy Coding Redundancy Spatial/Temporal Redundancy Perceptual Redundancy Compression

More information

Time Varying Loudness as a Means of Quantifying Loudness in Nonlinearly Propagated Acoustical Signals. S. Hales Swift. Kent L. Gee, faculty advisor

Time Varying Loudness as a Means of Quantifying Loudness in Nonlinearly Propagated Acoustical Signals. S. Hales Swift. Kent L. Gee, faculty advisor Time Varying Loudness as a Means of Quantifying Loudness in Nonlinearly Propagated Acoustical Signals by S. Hales Swift Kent L. Gee, faculty advisor A capstone project report submitted to the faculty of

More information

Gaussian Mixture Model Based Coding of Speech and Audio

Gaussian Mixture Model Based Coding of Speech and Audio Gaussian Mixture Model Based Coding of Speech and Audio Sam Vakil Department of Electrical & Computer Engineering McGill University Montreal, Canada October 2004 A thesis submitted to McGill University

More information

Text Independent Speaker Identification Using Imfcc Integrated With Ica

Text Independent Speaker Identification Using Imfcc Integrated With Ica IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735. Volume 7, Issue 5 (Sep. - Oct. 2013), PP 22-27 ext Independent Speaker Identification Using Imfcc

More information

New Insights Into the Stereophonic Acoustic Echo Cancellation Problem and an Adaptive Nonlinearity Solution

New Insights Into the Stereophonic Acoustic Echo Cancellation Problem and an Adaptive Nonlinearity Solution IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 10, NO 5, JULY 2002 257 New Insights Into the Stereophonic Acoustic Echo Cancellation Problem and an Adaptive Nonlinearity Solution Tomas Gänsler,

More information

Speech Spectra and Spectrograms

Speech Spectra and Spectrograms ACOUSTICS TOPICS ACOUSTICS SOFTWARE SPH301 SLP801 RESOURCE INDEX HELP PAGES Back to Main "Speech Spectra and Spectrograms" Page Speech Spectra and Spectrograms Robert Mannell 6. Some consonant spectra

More information

Time-domain representations

Time-domain representations Time-domain representations Speech Processing Tom Bäckström Aalto University Fall 2016 Basics of Signal Processing in the Time-domain Time-domain signals Before we can describe speech signals or modelling

More information

MDS codec evaluation based on perceptual sound attributes

MDS codec evaluation based on perceptual sound attributes MDS codec evaluation based on perceptual sound attributes Marcelo Herrera Martínez * Edwar Jacinto Gómez ** Edilberto Carlos Vivas G. *** submitted date: March 03 received date: April 03 accepted date:

More information

200Pa 10million. Overview. Acoustics of Speech and Hearing. Loudness. Terms to describe sound. Matching Pressure to Loudness. Loudness vs.

200Pa 10million. Overview. Acoustics of Speech and Hearing. Loudness. Terms to describe sound. Matching Pressure to Loudness. Loudness vs. Overview Acoustics of Speech and Hearing Lecture 1-2 How is sound pressure and loudness related? How can we measure the size (quantity) of a sound? The scale Logarithmic scales in general Decibel scales

More information

Characterization of phonemes by means of correlation dimension

Characterization of phonemes by means of correlation dimension Characterization of phonemes by means of correlation dimension PACS REFERENCE: 43.25.TS (nonlinear acoustical and dinamical systems) Martínez, F.; Guillamón, A.; Alcaraz, J.C. Departamento de Matemática

More information

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation Shigeki Miyabe 1B, Notubaka Ono 2, and Shoji Makino 1 1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki

More information

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition

Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Upper Bound Kullback-Leibler Divergence for Hidden Markov Models with Application as Discrimination Measure for Speech Recognition Jorge Silva and Shrikanth Narayanan Speech Analysis and Interpretation

More information

Acoustics 08 Paris 6013

Acoustics 08 Paris 6013 Resolution, spectral weighting, and integration of information across tonotopically remote cochlear regions: hearing-sensitivity, sensation level, and training effects B. Espinoza-Varas CommunicationSciences

More information

R E S E A R C H R E P O R T Entropy-based multi-stream combination Hemant Misra a Hervé Bourlard a b Vivek Tyagi a IDIAP RR 02-24 IDIAP Dalle Molle Institute for Perceptual Artificial Intelligence ffl

More information

Temporal Modeling and Basic Speech Recognition

Temporal Modeling and Basic Speech Recognition UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab Temporal Modeling and Basic Speech Recognition Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Today s lecture Recognizing

More information

Hidden Markov Model and Speech Recognition

Hidden Markov Model and Speech Recognition 1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed

More information

Short-Time ICA for Blind Separation of Noisy Speech

Short-Time ICA for Blind Separation of Noisy Speech Short-Time ICA for Blind Separation of Noisy Speech Jing Zhang, P.C. Ching Department of Electronic Engineering The Chinese University of Hong Kong, Hong Kong jzhang@ee.cuhk.edu.hk, pcching@ee.cuhk.edu.hk

More information

Chirp Transform for FFT

Chirp Transform for FFT Chirp Transform for FFT Since the FFT is an implementation of the DFT, it provides a frequency resolution of 2π/N, where N is the length of the input sequence. If this resolution is not sufficient in a

More information

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise 334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C

More information