Improved noise power spectral density tracking by a MAP-based postprocessor
1 Improved noise power spectral density tracking by a MAP-based postprocessor
Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach
University of Paderborn, Germany
March 28th, 2012
Computer Science, Electrical Engineering and Mathematics, Communications Engineering, Prof. Dr.-Ing. Reinhold Häb-Umbach
2 Table of Contents
1 Introduction and problem formulation
2 MAP-based noise PSD estimation
3 Experimental framework
4 Performance evaluation
5 Summary and outlook
A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach
3 Introduction
Motivation: Noise PSD estimation is a key component of
- speech enhancement
- robust automatic speech recognition
Basic assumptions: Many sophisticated algorithms rely on two assumptions:
- Noise is more stationary than speech
- Noise-only time-frequency bins occur at regular intervals
Here: MAP-based (MAP-B) postprocessor
- Estimates the noise power even if speech is dominant in a time-frequency bin
- Requires an initial estimate of the current speech power
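4 MAP-B as a postprocessor
[Block diagram: the noisy signal $y_t$ (clean speech signal $x_t$ plus noise signal $n_t$) is transformed by an STDFT into $Y_{k,l}$, with frequency bin index $k$ and frame index $l$. First speech enhancement stage: noise estimation ($\sigma^2_{N,k,l}$), a priori SNR and gain, giving $\hat{X}^{(I)}_{k,l}$ and the speech power estimate $\hat\sigma^2_{X,k,l}$. Second speech enhancement stage: MAP-B noise estimate $\hat\sigma^2_{N,k,l}$, a priori SNR and gain, giving $\hat{X}^{(II)}_{k,l}$. An ISTDFT yields the enhanced signals $\hat{x}^{(I)}_t$ and $\hat{x}^{(II)}_t$.]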
5 Problem formulation
Observed: $Y_l = N_l + X_l$
We consider $N_l$ as the target process, corrupted by clean speech $X_l$ of known variance $\sigma^2_{X,l}$.
Our goal: estimation of the noise variance $\sigma^2_{N,l+1}$ at frame $l+1$, from
- the noisy observation $y_{l+1}$
- the given current speech power $\sigma^2_{X,l+1}$
- the a priori PDF $p_{\sigma^2_N}(\sigma^2)$ of the variance $\sigma^2_{N,l}$
Approach: Maximum A Posteriori (MAP) estimation
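6 MAP-based noise PSD estimation
Bayesian variance estimation: $Y_l = N_l$
If $\sigma^2_{X,l+1} = 0$ (uncorrupted observation) and $\sigma^2_{N,l+1} = \sigma^2_N$ (stationary), this is the textbook problem:
The scaled inverse chi-square distribution
$$p_{\sigma^2_N}(\sigma^2) \propto (\sigma^2)^{-\frac{\nu_l+2}{2}} \, e^{-\frac{\nu_l \lambda_l}{2\sigma^2}}$$
with the degrees of freedom $\nu_l$ and the scale factor $\lambda_l$ is a conjugate prior to the normal observation PDF
$$p_{Y_{l+1}|\sigma^2_N}(y_{l+1}|\sigma^2) = \frac{1}{\pi\sigma^2} \, e^{-\frac{|y_{l+1}|^2}{\sigma^2}}$$
Parameter update:
$$\nu_{l+1} = \nu_l + 2 \quad \text{and} \quad \lambda_{l+1} = \frac{2}{\nu_l+2}\,|y_{l+1}|^2 + \frac{\nu_l}{\nu_l+2}\,\lambda_l$$
MAP estimate of the variance:
$$\hat\sigma^2_{N,l+1} = \operatorname*{argmax}_{\sigma^2} \left[ p_{\sigma^2_N|Y_{l+1}}(\sigma^2|y_{l+1}) \right] = \frac{\nu_{l+1}}{\nu_{l+1}+2}\,\lambda_{l+1}$$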
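The stationary, speech-free update on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes the update reads $\nu_{l+1} = \nu_l + 2$ and $\lambda_{l+1} = (2|y_{l+1}|^2 + \nu_l\lambda_l)/(\nu_l+2)$, and the function names, the initial values `nu0` and `lam0`, and the synthetic test noise are all hypothetical.

```python
import random


def conjugate_update(nu, lam, y):
    """Update the scaled inverse chi-square parameters with one complex sample y."""
    nu_new = nu + 2.0
    lam_new = (2.0 * abs(y) ** 2 + nu * lam) / (nu + 2.0)
    return nu_new, lam_new


def map_variance(nu, lam):
    """Mode of the scaled inverse chi-square density, used as the MAP variance."""
    return nu / (nu + 2.0) * lam


def estimate_stationary(samples, nu0=2.0, lam0=1.0):
    """Run the update over all samples and return the final MAP estimate.

    nu0 and lam0 are arbitrary illustrative starting values.
    """
    nu, lam = nu0, lam0
    for y in samples:
        nu, lam = conjugate_update(nu, lam, y)
    return map_variance(nu, lam)


def complex_noise(n, variance, seed=0):
    """Hypothetical test signal: circular complex Gaussian noise of given variance."""
    rng = random.Random(seed)
    s = (variance / 2.0) ** 0.5  # standard deviation of real and imaginary part
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]


if __name__ == "__main__":
    # The estimate should settle close to the true variance of 2.0.
    print(estimate_stationary(complex_noise(5000, variance=2.0)))
```

Because $\nu_l$ grows by 2 with every frame, each new observation receives less weight, which is adequate only while the noise variance really is constant.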
7 Extension to non-stationary noise
Still uncorrupted noise: $Y_l = N_l$
If $\sigma^2_{N,l+1}$ is time-variant, then:
- The parameter $\nu_l$ is kept at a constant value: $\nu_{l+1} = \nu_l = \nu_0$
- This results in recursive smoothing of the variance estimate:
$$\hat\sigma^2_{N,l+1} = (1-\alpha)\,\hat\sigma^2_{N,l} + \alpha\,|y_{l+1}|^2, \quad \text{where } \alpha = \frac{2}{\nu_0+4}$$
Choice of $\nu_0$: trade-off between tracking ability and estimation error in stationary noise
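The link between the fixed degrees of freedom $\nu_0$ and the smoothing constant $\alpha = 2/(\nu_0+4)$ can be checked numerically. The sketch below rests on one assumption that the slide does not spell out: after each frame the prior scale is reset so that the prior mode equals the previous estimate. All function names are hypothetical.

```python
def smoothing_constant(nu0):
    """Smoothing constant alpha = 2 / (nu0 + 4) from the slide."""
    return 2.0 / (nu0 + 4.0)


def recursive_step(est_prev, y_abs2, nu0):
    """One step of the first-order recursive smoothing."""
    alpha = smoothing_constant(nu0)
    return (1.0 - alpha) * est_prev + alpha * y_abs2


def map_step_fixed_nu(est_prev, y_abs2, nu0):
    """The same step written as a conjugate update followed by the MAP estimate.

    Assumption: the prior scale is chosen so that the prior mode equals est_prev.
    """
    lam_prior = (nu0 + 2.0) / nu0 * est_prev
    lam_post = (2.0 * y_abs2 + nu0 * lam_prior) / (nu0 + 2.0)
    nu_post = nu0 + 2.0
    return nu_post / (nu_post + 2.0) * lam_post


def track(periodogram, nu0=40.0, init=1.0):
    """Track the noise variance over a sequence of periodogram values |y|^2."""
    est, out = init, []
    for p in periodogram:
        est = recursive_step(est, p, nu0)
        out.append(est)
    return out


if __name__ == "__main__":
    print(smoothing_constant(40.0))           # alpha for nu0 = 40
    print(recursive_step(1.0, 3.0, 40.0))     # recursive smoothing
    print(map_step_fixed_nu(1.0, 3.0, 40.0))  # same value via the MAP route
```

A small $\nu_0$ gives a large $\alpha$ and fast tracking at the cost of a noisier estimate, which is the trade-off named on the slide.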
8 General case
Observation corrupted by speech: $Y_l = N_l + X_l$
If $\sigma^2_{X,l+1} \neq 0$ and is known, then the posterior PDF
$$p_{\sigma^2_N|Y_{l+1}}(\sigma^2|y_{l+1}) \propto (\sigma^2_{X,l+1}+\sigma^2)^{-1} \, (\sigma^2)^{-\frac{\nu_l+2}{2}} \, e^{-\left( \frac{|y_{l+1}|^2}{\sigma^2_{X,l+1}+\sigma^2} + \frac{\nu_l \lambda_l}{2\sigma^2} \right)}$$
is no longer a scaled inverse chi-square distribution, i.e. the prior is no longer conjugate to the observation PDF.
In order to maintain an efficient MAP estimation procedure, we
- approximate the posterior PDF by a scaled inverse chi-square distribution, and
- match its maximum ($\hat\sigma^2_{N,l+1}$) with the maximum of the posterior PDF, which we calculate efficiently using a bisection and Newton approach.
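Locating the maximum of this posterior can be illustrated with a plain bisection on the derivative of the log posterior. This is only a sketch under assumptions: the posterior is taken in the form $(\sigma^2_X+\sigma^2)^{-1}(\sigma^2)^{-(\nu+2)/2}\exp\left(-|y|^2/(\sigma^2_X+\sigma^2) - \nu\lambda/(2\sigma^2)\right)$, the Newton refinement mentioned on the slide is omitted, and the bracket bounds and function names are choices made here, not taken from the talk. It requires `nu > 0` and `lam > 0`.

```python
def log_posterior_slope(s, y_abs2, sigma_x2, nu, lam):
    """Derivative of the log posterior with respect to the noise variance s."""
    t = sigma_x2 + s
    return (-1.0 / t - (nu + 2.0) / (2.0 * s)
            + y_abs2 / t ** 2 + nu * lam / (2.0 * s ** 2))


def map_noise_variance(y_abs2, sigma_x2, nu, lam, iterations=100):
    """Bisection for a point where the slope changes from positive to negative.

    The bracket is chosen so that the slope is positive at lo and negative
    at hi, so bisection converges to a local maximum of the posterior.
    """
    lo = 0.5 * nu * lam / (nu + 4.0)  # slope is positive here
    hi = max(y_abs2, lam)             # slope is negative here
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_posterior_slope(mid, y_abs2, sigma_x2, nu, lam) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Without speech the result matches the closed form (2|y|^2 + nu*lam)/(nu + 4).
    print(map_noise_variance(3.0, 0.0, 40.0, 1.05))
    # With dominant speech the estimate stays close to the prior mode nu*lam/(nu + 2).
    print(map_noise_variance(1.0e6, 1.0e6, 40.0, 1.05))
```

The second call shows the behaviour the postprocessor relies on: when the known speech power explains the observation, the noise estimate is left almost unchanged instead of being pulled up by the speech energy.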
9 Experimental framework
MAP-B as postprocessor
- First noise PSD estimator: Improved Minima Controlled Recursive Averaging (IMCRA) algorithm [Cohen, 2003]
- Gain function: Optimally-Modified Log-Spectral Amplitude (OM-LSA) estimator [Cohen, Berdugo, 2001]
[Block diagram: $Y_{k,l+1}$ enters the first speech enhancement stage (IMCRA noise estimate $\sigma^2_{N,k,l+1}$, a priori SNR, OM-LSA gain), which gives $\hat{X}^{(I)}_{k,l+1}$ and the speech power estimate $\hat\sigma^2_{X,k,l+1}$. The second speech enhancement stage (MAP-B noise estimate $\hat\sigma^2_{N,k,l+1}$, a priori SNR, OM-LSA gain) gives $\hat{X}^{(II)}_{k,l+1}$.]
10 Performance evaluation
Setup
- Clean speech: TIMIT database, sentences concatenated to 3 minutes length
- Artificially added noise from the Noisex92 database
- Noise types: stationary WGN, triangular WGN, babble and factory-1
- SNR values: -5, 0, 5, 10, 15 dB
- MAP-B estimator: we set $\nu_0 = 40$, corresponding to a time constant of 0.164 s
Reference noise PSD: recursive temporal smoothing
$$\sigma^2_{N,k,l} = 0.95\,\sigma^2_{N,k,l-1} + (1-0.95)\,|N_{k,l}|^2$$
with known noise periodogram $|N_{k,l}|^2$
11 Sample trajectories of noise variance estimates
[Figure, two panels of noise variance in dB over time in s, including the reference trajectory: babble noise at frequency bin k = 97 (3 kHz), and triangular WGN (averaged over all frequency bins).]
- MAP-B: continuous update of the noise variance estimate
- MAP-B: faster response to rising noise power
12 Quantitative evaluation
Performance measures adopted from [Taghia et al., 2011]
Fig. (a): minimum averaged log distance $LE_m$ between the reference and estimated PSD
- MAP-B obtains lower error $LE_m$ for all noise types and SNRs less than or equal to 5 dB or 10 dB
Fig. (b): variance of the logarithmic difference $LE_v$
- MAP-B yields lower variance $LE_v$ for all noise types and SNRs than the IMCRA
[Figure: bar charts of (a) $LE_m$ and (b) $LE_v$ at SNRs of -5, 0, 5, 10 and 15 dB for stationary WGN, triangular WGN, babble and factory-1 noise.]
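To make the two measures concrete, here is a rough sketch of how a log-distance mean and a log-difference variance could be computed from a reference and an estimated PSD trajectory. The exact definitions in [Taghia et al., 2011] are not given on the slide and may differ (for example in how values are averaged over frequency bins and frames), so the formulas and function names below are assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance


def log_differences_db(reference, estimate):
    """Per-value logarithmic difference 10*log10(reference / estimate) in dB."""
    return [10.0 * math.log10(r / e) for r, e in zip(reference, estimate)]


def log_error_mean(reference, estimate):
    """Assumed LE_m: mean absolute logarithmic difference in dB."""
    return fmean([abs(d) for d in log_differences_db(reference, estimate)])


def log_error_variance(reference, estimate):
    """Assumed LE_v: population variance of the logarithmic difference."""
    return pvariance(log_differences_db(reference, estimate))


if __name__ == "__main__":
    ref = [1.0, 1.0, 1.0, 1.0]
    est = [10.0, 0.1, 1.0, 1.0]  # one overestimate and one underestimate by 10 dB
    print(log_error_mean(ref, est))
    print(log_error_variance(ref, est))
```

Under these assumed definitions, the first measure penalises over- and underestimation alike, while the second reflects how strongly the estimate fluctuates around the reference.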
13 Speech enhancement
Gain in perceptual speech quality (PESQ) scores: PESQ Gain = PESQ$_{\text{MAP-B}}$ - PESQ$_{\text{IMCRA}}$
[Figure: PESQ gain at SNRs of -5, 0, 5, 10 and 15 dB for stationary WGN, triangular WGN, babble and factory-1 noise, shown separately for female and male speakers.]
MAP-B has a favourable effect on speech quality for non-stationary noise types
14 Summary and outlook
Summary
- The proposed MAP-B estimator is able to track the noise statistics even if the speech is dominant
- Low computational complexity
- Single parameter $\nu_0$
- Experimental evaluation: MAP-B obtains
  - lower estimation error under low SNR conditions
  - lower fluctuation of the estimated values under all tested environments
  - slightly improved speech quality for non-stationary noise types
Outlook
- Investigations about the dependence of the algorithm performance
  - on the first noise PSD estimator
  - on the degrees of freedom $\nu_0$
15 Thank you for your attention! Questions?
16 Periodograms of clean, noisy and enhanced signals
Periodograms of the spoken sentence "Biblical scholars argue history" for factory-1 noise at an SNR of 5 dB:
(a) clean speech signal
(b) noisy speech signal
(c) enhanced speech signal based on IMCRA estimates
(d) enhanced speech signal based on MAP-B estimates
[Figure: four periodogram panels (a) to (d), frequency in kHz over time, with a highlighted window.]
MAP-B has a positive influence on the periodogram of the enhanced signal, particularly clearly seen in the highlighted window
17 Minimum Statistics (MS) instead of IMCRA
Performance measures $LE_m$ and $LE_v$ for female speaker signals
Fig. (a): MAP-B obtains lower estimation error $LE_m$ than the MS for all noise types and SNRs
Fig. (b): MAP-B yields lower variance $LE_v$ than the MS for all tested setups
[Figure: bar charts of (a) $LE_m$ and (b) $LE_v$ at SNRs of -5, 0, 5, 10 and 15 dB for stationary WGN, triangular WGN, babble and factory-1 noise.]
18 Minimum Statistics (MS) instead of IMCRA
PESQ Gain = PESQ$_{\text{MAP-B}}$ - PESQ$_{\text{NoiseEst}}$ with NoiseEst $\in$ [IMCRA; MS] for female speaker signals
[Figure: PESQ gain at SNRs of -5, 0, 5, 10 and 15 dB for stationary WGN, triangular WGN, babble and factory-1 noise.]
The favourable effect of the MAP-B estimator on speech quality for non-stationary noise types is smaller for MS than for IMCRA
A Priori SNR Estimation Using Weibull Mixture Model
A Priori SNR Estimation Using Weibull Mixture Model 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University
More informationNoise-Presence-Probability-Based Noise PSD Estimation by Using DNNs
Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach Department of Communications
More informationA Priori SNR Estimation Using a Generalized Decision Directed Approach
A Priori SNR Estimation Using a Generalized Decision Directed Approach Aleksej Chinaev, Reinhold Haeb-Umbach Department of Communications Engineering, Paderborn University, 3398 Paderborn, Germany {chinaev,haeb}@nt.uni-paderborn.de
More informationDigital Signal Processing
Digital Signal Processing 0 (010) 157 1578 Contents lists available at ScienceDirect Digital Signal Processing www.elsevier.com/locate/dsp Improved minima controlled recursive averaging technique using
More informationOptimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator
1 Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator Israel Cohen Lamar Signal Processing Ltd. P.O.Box 573, Yokneam Ilit 20692, Israel E-mail: icohen@lamar.co.il
More informationImproved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR
Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR Bengt J. Borgström, Student Member, IEEE, and Abeer Alwan, IEEE Fellow Abstract This paper
More informationNon-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics
Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud To cite this version: Xiaofei Li, Laurent Girin, Sharon Gannot,
More informationA POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL
A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig
More informationNOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group
NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION M. Schwab, P. Noll, and T. Sikora Technical University Berlin, Germany Communication System Group Einsteinufer 17, 1557 Berlin (Germany) {schwab noll
More informationModifying Voice Activity Detection in Low SNR by correction factors
Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir
More informationNew Statistical Model for the Enhancement of Noisy Speech
New Statistical Model for the Enhancement of Noisy Speech Electrical Engineering Department Technion - Israel Institute of Technology February 22, 27 Outline Problem Formulation and Motivation 1 Problem
More informationBIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann
BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION Robert Rehr, Timo Gerkmann Speech Signal Processing Group, Department of Medical Physics and Acoustics
More informationA SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes
A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN Yu ang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London, UK Email: {yw09,
More informationSPEECH enhancement algorithms are often used in communication
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 25, NO. 2, FEBRUARY 2017 397 An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation Robert Rehr, Student
More informationSPEECH enhancement has been studied extensively as a
JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2017 1 Phase-Aware Speech Enhancement Based on Deep Neural Networks Naijun Zheng and Xiao-Lei Zhang Abstract Short-time frequency transform STFT)
More informationA Priori SNR Estimation Using Weibull Mixture Model. Abstract. 2 MAP estimation of the a priori SNR based on Weibull mixture models.
A Priori SNR Estimation Using Weibull ixture odel Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering, Paderborn University, 33100 Paderborn, Germany Email:
More information2D Spectrogram Filter for Single Channel Speech Enhancement
Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 007 89 D Spectrogram Filter for Single Channel Speech Enhancement HUIJUN DING,
More informationSINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS
204 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS Hajar Momeni,2,,
More informationProc. of NCC 2010, Chennai, India
Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay
More informationA SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL
A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL Balázs Fodor Institute for Communications Technology Technische Universität Braunschweig 386 Braunschweig,
More informationSpectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors
IEEE SIGNAL PROCESSING LETTERS 1 Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors Nasser Mohammadiha, Student Member, IEEE, Rainer Martin, Fellow, IEEE, and Arne Leijon,
More informationMAP-BASED ESTIMATION OF THE PARAMETERS OF A GAUSSIAN MIXTURE MODEL IN THE PRESENCE OF NOISY OBSERVATIONS. Aleksej Chinaev, Reinhold Haeb-Umbach
MAP-BASED ESTIMATION OF THE PARAMETERS OF A GAUSSIAN MIXTURE MODEL IN THE PRESENCE OF NOISY OBSERVATIONS Alesej Chinaev, Reinhold Haeb-Umbach Department of Communications Engineering, University of Paderborn,
More informationAnalysis of audio intercepts: Can we identify and locate the speaker?
Motivation Analysis of audio intercepts: Can we identify and locate the speaker? K V Vijay Girish, PhD Student Research Advisor: Prof A G Ramakrishnan Research Collaborator: Dr T V Ananthapadmanabha Medical
More informationModeling speech signals in the time frequency domain using GARCH
Signal Processing () 53 59 Fast communication Modeling speech signals in the time frequency domain using GARCH Israel Cohen Department of Electrical Engineering, Technion Israel Institute of Technology,
More informationESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION
ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION Xiaofei Li 1, Laurent Girin 1,, Radu Horaud 1 1 INRIA Grenoble
More informationCEPSTRAL analysis has been widely used in signal processing
162 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 2, MARCH 1999 On Second-Order Statistics and Linear Estimation of Cepstral Coefficients Yariv Ephraim, Fellow, IEEE, and Mazin Rahim, Senior
More informationJoint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data
Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan Ming Hsieh Department of Electrical Engineering University
More informationSupervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization Nasser Mohammadiha*, Student Member, IEEE, Paris Smaragdis,
More informationSIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS
TO APPEAR IN IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 22 25, 23, UK SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS Nasser Mohammadiha
More informationThe effect of speaking rate and vowel context on the perception of consonants. in babble noise
The effect of speaking rate and vowel context on the perception of consonants in babble noise Anirudh Raju Department of Electrical Engineering, University of California, Los Angeles, California, USA anirudh90@ucla.edu
More informationEnhancement of Noisy Speech. State-of-the-Art and Perspectives
Enhancement of Noisy Speech State-of-the-Art and Perspectives Rainer Martin Institute of Communications Technology (IFN) Technical University of Braunschweig July, 2003 Applications of Noise Reduction
More information"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"
"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:
More informationMANY digital speech communication applications, e.g.,
406 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 2, FEBRUARY 2007 An MMSE Estimator for Speech Enhancement Under a Combined Stochastic Deterministic Speech Model Richard C.
More informationMULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh
MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES Mingzi Li,Israel Cohen and Saman Mousazadeh Department of Electrical Engineering, Technion - Israel
More informationRobust Sound Event Detection in Continuous Audio Environments
Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University
More informationMinimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach
Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach Jesper Jensen Abstract In this work we consider the problem of feature enhancement for noise-robust
More informationA Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction
A Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction Mehrez Souden, Jingdong Chen, Jacob Benesty, and Sofiène Affes Abstract We propose a second-order-statistics-based
More informationSpeech Signal Representations
Speech Signal Representations Berlin Chen 2003 References: 1. X. Huang et. al., Spoken Language Processing, Chapters 5, 6 2. J. R. Deller et. al., Discrete-Time Processing of Speech Signals, Chapters 4-6
More informationA priori SNR estimation and noise estimation for speech enhancement
Yao et al. EURASIP Journal on Advances in Signal Processing (2016) 2016:101 DOI 10.1186/s13634-016-0398-z EURASIP Journal on Advances in Signal Processing RESEARCH A priori SNR estimation and noise estimation
More informationVoice Activity Detection Using Pitch Feature
Voice Activity Detection Using Pitch Feature Presented by: Shay Perera 1 CONTENTS Introduction Related work Proposed Improvement References Questions 2 PROBLEM speech Non speech Speech Region Non Speech
More informationIndependent Component Analysis and Unsupervised Learning. Jen-Tzung Chien
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood
More informationExploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Interspeech 2018 2-6 September 2018, Hyderabad Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics Pavlos Papadopoulos, Colin Vaz, Shrikanth Narayanan Signal
More informationA Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise
334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C
More informationApplication of the Tuned Kalman Filter in Speech Enhancement
Application of the Tuned Kalman Filter in Speech Enhancement Orchisama Das, Bhaswati Goswami and Ratna Ghosh Department of Instrumentation and Electronics Engineering Jadavpur University Kolkata, India
More informationSNR Features for Automatic Speech Recognition
SNR Features for Automatic Speech Recognition Philip N. Garner Idiap Research Institute Martigny, Switzerland pgarner@idiap.ch Abstract When combined with cepstral normalisation techniques, the features
More informationA Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement
A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,
More informationSINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan
SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS Emad M. Grais and Hakan Erdogan Faculty of Engineering and Natural Sciences, Sabanci University, Orhanli
More informationSignal Modeling Techniques in Speech Recognition. Hassan A. Kingravi
Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction
More informationIndependent Component Analysis and Unsupervised Learning
Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent
More informationREAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540
REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION Scott Rickard, Radu Balan, Justinian Rosca Siemens Corporate Research Princeton, NJ 84 fscott.rickard,radu.balan,justinian.roscag@scr.siemens.com
More informationHumans do not maximize the probability of correct decision when recognizing DANTALE words in noise
INTERSPEECH 207 August 20 24, 207, Stockholm, Sweden Humans do not maximize the probability of correct decision when recognizing DANTALE words in noise Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen
More informationA State-Space Approach to Dynamic Nonnegative Matrix Factorization
1 A State-Space Approach to Dynamic Nonnegative Matrix Factorization Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo arxiv:179.5v1 [cs.lg] 31 Aug 17 Abstract Nonnegative matrix factorization
More informationSystem Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain
System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen Outline
More informationNoise Reduction. Two Stage Mel-Warped Weiner Filter Approach
Noise Reduction Two Stage Mel-Warped Weiner Filter Approach Intellectual Property Advanced front-end feature extraction algorithm ETSI ES 202 050 V1.1.3 (2003-11) European Telecommunications Standards
More informationFeature extraction 2
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods
More information9 Multi-Model State Estimation
Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State
More informationProbabilistic Inference of Speech Signals from Phaseless Spectrograms
Probabilistic Inference of Speech Signals from Phaseless Spectrograms Kannan Achan, Sam T. Roweis, Brendan J. Frey Machine Learning Group University of Toronto Abstract Many techniques for complex speech
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationON ADVERSARIAL TRAINING AND LOSS FUNCTIONS FOR SPEECH ENHANCEMENT. Ashutosh Pandey 1 and Deliang Wang 1,2. {pandey.99, wang.5664,
ON ADVERSARIAL TRAINING AND LOSS FUNCTIONS FOR SPEECH ENHANCEMENT Ashutosh Pandey and Deliang Wang,2 Department of Computer Science and Engineering, The Ohio State University, USA 2 Center for Cognitive
More informationR E S E A R C H R E P O R T Entropy-based multi-stream combination Hemant Misra a Hervé Bourlard a b Vivek Tyagi a IDIAP RR 02-24 IDIAP Dalle Molle Institute for Perceptual Artificial Intelligence ffl
More informationTHE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague
THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING Petr Polla & Pavel Sova Czech Technical University of Prague CVUT FEL K, 66 7 Praha 6, Czech Republic E-mail: polla@noel.feld.cvut.cz Abstract
More informationEUSIPCO
EUSIPCO 213 1569744273 GAMMA HIDDEN MARKOV MODEL AS A PROBABILISTIC NONNEGATIVE MATRIX FACTORIZATION Nasser Mohammadiha, W. Bastiaan Kleijn, Arne Leijon KTH Royal Institute of Technology, Department of
More informationOn Optimal Smoothing in Minimum Statistics Based Noise Tracking
INTERSPEECH 25 On Optima Smoothing in Minimum Statistics Based Noise Tracking Aeksej Chinaev, Reinhod Haeb-Umbach Department of Communications Engineering, University of Paderborn, 3398 Paderborn, Germany
More informationENGINEERING TRIPOS PART IIB: Technical Milestone Report
ENGINEERING TRIPOS PART IIB: Technical Milestone Report Statistical enhancement of multichannel audio from transcription turntables Yinhong Liu Supervisor: Prof. Simon Godsill 1 Abstract This milestone
More informationRobust Speaker Identification
Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }
More informationSPEECH signals captured using a distant microphone within
572 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 25, NO. 3, MARCH 2017 Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise Clement S. J. Doire, Student
More informationNOISE reduction is an important fundamental signal
1526 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 5, JULY 2012 Non-Causal Time-Domain Filters for Single-Channel Noise Reduction Jesper Rindom Jensen, Student Member, IEEE,
More informationA SPECTRAL SUBTRACTION RULE FOR REAL-TIME DSP IMPLEMENTATION OF NOISE REDUCTION IN SPEECH SIGNALS
Proc. of the 1 th Int. Conference on Digital Audio Effects (DAFx-9), Como, Italy, September 1-4, 9 A SPECTRAL SUBTRACTION RULE FOR REAL-TIME DSP IMPLEMENTATION OF NOISE REDUCTION IN SPEECH SIGNALS Matteo
More informationSpectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates
Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates Dima Ruinskiy Niv Dadush Yizhar Lavner Department of Computer Science, Tel-Hai College, Israel Outline Phoneme
More informationSoft-Output Trellis Waveform Coding
Soft-Output Trellis Waveform Coding Tariq Haddad and Abbas Yongaçoḡlu School of Information Technology and Engineering, University of Ottawa Ottawa, Ontario, K1N 6N5, Canada Fax: +1 (613) 562 5175 thaddad@site.uottawa.ca
More informationFull-covariance model compensation for
compensation transms Presentation Toshiba, 12 Mar 2008 Outline compensation transms compensation transms Outline compensation transms compensation transms Noise model x clean speech; n additive ; h convolutional
More informationSource localization and separation for binaural hearing aids
Source localization and separation for binaural hearing aids Mehdi Zohourian, Gerald Enzner, Rainer Martin Listen Workshop, July 218 Institute of Communication Acoustics Outline 1 Introduction 2 Binaural
More information502 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 4, APRIL 2016
502 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 4, APRIL 2016 Discriminative Training of NMF Model Based on Class Probabilities for Speech Enhancement Hanwook Chung, Student Member, IEEE, Eric Plourde,