Improved noise power spectral density tracking by a MAP-based postprocessor

Similar documents
A Priori SNR Estimation Using Weibull Mixture Model

Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs

A Priori SNR Estimation Using a Generalized Decision Directed Approach

Digital Signal Processing

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

Modifying Voice Activity Detection in Low SNR by correction factors

New Statistical Model for the Enhancement of Noisy Speech

BIAS CORRECTION METHODS FOR ADAPTIVE RECURSIVE SMOOTHING WITH APPLICATIONS IN NOISE PSD ESTIMATION. Robert Rehr, Timo Gerkmann

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH enhancement algorithms are often used in communication

SPEECH enhancement has been studied extensively as a

A Priori SNR Estimation Using Weibull Mixture Model. Abstract. 2 MAP estimation of the a priori SNR based on Weibull mixture models.

2D Spectrogram Filter for Single Channel Speech Enhancement

SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS

Proc. of NCC 2010, Chennai, India

A SPEECH PRESENCE PROBABILITY ESTIMATOR BASED ON FIXED PRIORS AND A HEAVY-TAILED SPEECH MODEL

Spectral Domain Speech Enhancement using HMM State-Dependent Super-Gaussian Priors

MAP-BASED ESTIMATION OF THE PARAMETERS OF A GAUSSIAN MIXTURE MODEL IN THE PRESENCE OF NOISY OBSERVATIONS. Aleksej Chinaev, Reinhold Haeb-Umbach

Analysis of audio intercepts: Can we identify and locate the speaker?

Modeling speech signals in the time frequency domain using GARCH

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

CEPSTRAL analysis has been widely used in signal processing

Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

SIMULTANEOUS NOISE CLASSIFICATION AND REDUCTION USING A PRIORI LEARNED MODELS

The effect of speaking rate and vowel context on the perception of consonants. in babble noise

Enhancement of Noisy Speech. State-of-the-Art and Perspectives

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

MANY digital speech communication applications, e.g.,

MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh

Robust Sound Event Detection in Continuous Audio Environments

Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features A Theoretically Consistent Approach

A Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction

Speech Signal Representations

A priori SNR estimation and noise estimation for speech enhancement

Voice Activity Detection Using Pitch Feature

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

Application of the Tuned Kalman Filter in Speech Enhancement

SNR Features for Automatic Speech Recognition

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

Independent Component Analysis and Unsupervised Learning

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540

Humans do not maximize the probability of correct decision when recognizing DANTALE words in noise

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain

Noise Reduction. Two Stage Mel-Warped Weiner Filter Approach

Feature extraction 2

9 Multi-Model State Estimation

Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Exemplar-based voice conversion using non-negative spectrogram deconvolution

ON ADVERSARIAL TRAINING AND LOSS FUNCTIONS FOR SPEECH ENHANCEMENT. Ashutosh Pandey 1 and Deliang Wang 1,2. {pandey.99, wang.5664,


THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

EUSIPCO

On Optimal Smoothing in Minimum Statistics Based Noise Tracking

ENGINEERING TRIPOS PART IIB: Technical Milestone Report

Robust Speaker Identification

SPEECH signals captured using a distant microphone within

NOISE reduction is an important fundamental signal

A SPECTRAL SUBTRACTION RULE FOR REAL-TIME DSP IMPLEMENTATION OF NOISE REDUCTION IN SPEECH SIGNALS

Spectral and Textural Feature-Based System for Automatic Detection of Fricatives and Affricates

Soft-Output Trellis Waveform Coding

Full-covariance model compensation for

Source localization and separation for binaural hearing aids

502 IEEE SIGNAL PROCESSING LETTERS, VOL. 23, NO. 4, APRIL 2016

Overview of Single Channel Noise Suppression Algorithms

Single Channel Signal Separation Using MAP-based Subspace Decomposition

MAXIMUM LIKELIHOOD BASED NOISE COVARIANCE MATRIX ESTIMATION FOR MULTI-MICROPHONE SPEECH ENHANCEMENT. Ulrik Kjems and Jesper Jensen

Automatic Speech Recognition (CS753)

Noise Classification based on PCA. Nattanun Thatphithakkul, Boontee Kruatrachue, Chai Wutiwiwatchai, Vataya Boonpiam

Short-Time ICA for Blind Separation of Noisy Speech

Despite decades of focused research on the problem, the accuracy of automatic speech recognition. Missing-Feature Approaches in Speech Recognition

CURRENT state-of-the-art automatic speech recognition

1584 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 6, AUGUST 2011

Model-based unsupervised segmentation of birdcalls from field recordings

Relative Transfer Function Identification Using Convolutive Transfer Function Approximation

Voice activity detection based on conjugate subspace matching pursuit and likelihood ratio test

Machine Recognition of Sounds in Mixtures

RAO-BLACKWELLISED PARTICLE FILTERS: EXAMPLES OF APPLICATIONS

Investigating Mixed Discrete/Continuous Dynamic Bayesian Networks with Application to Automatic Speech Recognition

A Low-Cost Robust Front-end for Embedded ASR System

IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES

DETECTION theory deals primarily with techniques for

Speech Enhancement with Applications in Speech Recognition

Gaussian Processes for Audio Feature Extraction

Efficient Target Activity Detection Based on Recurrent Neural Networks

An Investigation of Spectral Subband Centroids for Speaker Authentication

Approximate Bayesian Inference for Robust Speech Processing. A Thesis. Submitted to the Faculty. Drexel University. Ciira wa Maina

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals

Dominant Feature Vectors Based Audio Similarity Measure

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition

Detection-Based Speech Recognition with Sparse Point Process Models

Transcription:

Improved noise power spectral density tracking by a MAP-based postprocessor Aleksej Chinaev, Alexander Krueger, Dang Hai Tran Vu, Reinhold Haeb-Umbach University of Paderborn, Germany March 8th, 01 Computer Science, Electrical Engineering and Mathematics Communications Engineering Prof. Dr.-Ing. Reinhold Häb-Umbach

Table of Contents 1 Introduction and problem formulation MAP-based noise PSD estimation 3 Experimental framework 4 Performance evaluation 5 Summary and outlook A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 1 / 13

Introduction Motivation Noise PSD estimation is a key component to speech enhancement y t to robust automatic speech recognition Basic assumptions Many sophisticated algorithms rely on two assumptions: Noise more stationary than speech Noise-only time-frequency bins at regular intervals Here MAP-based () postprocessor Estimate of noise power even if speech is dominant in time-frequency bin Initial estimate of the current speech power required A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach / 13

as a postprocessor x t - clean speech signal ˆx (I) t ˆx (II) t n t - noise signal y t STDFT ISTDFT ISTDFT Y k,l frequency bin index frame index ˆX (I) k,l ˆX (II) k,l Noise σ N,k,l a priori ˆσ X,k,l estimation SNR first speech enhancement stage Gain Gain ˆσ N,k,l a priori SNR second speech enhancement stage A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 3 / 13

Problem formulation Observed Y l = N l + X l We consider N l as target process corrupted by clean speech X l of known variance σ X,l Our goal Estimation of noise variance σn,l+1 at frame l+1, from the noisy observation y l+1 given the current speech power σx,l+1 a priori PDF p σ (σ ) of variance σ N N,l Approach Maximum A Posteriori (MAP) estimation A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 4 / 13

MAP-based noise PSD estimation Bayesian variance estimation: Y l = N l If σx,l+1 = 0 (uncorrupted observ.) and σ N,l+1 = σ N (stationary) then the textbook problem: Scaled inverse chi-square distribution p σ (σ ) (σ ) ν l + e ν l λ l σ N with the degrees of freedom ν l and the scale factor λ l is conjugate prior to normal observation PDF p Yl+1 σ N (y l+1 σ ) = 1 y l+1 πσ e σ Parameter update ν l+1 = ν l + and λ l+1 = ν l + y l+1 + ν l ν l + λ l MAP-estimate of variance [ ] ˆσ N,l+1 = argmax p σ σ N Y l+1 (σ y l+1 ) = ν l+1 ν l+1 + λ l+1 A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 5 / 13

Extension to non-stationary noise Still uncorrupted noise: Y l = N l If σn,l+1 is time-variant then: The parameter ν l is kept at a constant value ν l+1 = ν l = ν 0 This results in recursive smoothing of variance estimate ˆσ N,l+1 = (1 α) ˆσ N,l +α y l+1, where α = ν 0 +4 Choice of ν 0 Trade-off between tracking ability and estimation error in stationary noise A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 6 / 13

General case Observation corrupted by speech: Y l = N l + X l If σx,l+1 0 and is known then the posterior PDF ( p σ N Y l+1 (σ y l+1 ) (σx,l+1 +σ ) 1 (σ ) ν l + e is no longer a conjugate prior for the observation PDF. y l+1 σ X,l+1 +σ + ν l λ l σ ), In order to maintain an efficient MAP estimation procedure we approximate the posterior PDF by a scaled inverse chi-squared distribution, and match its maximum (ˆσ N,l+1 ) with the maximum of the posterior PDF, which we calculate efficiently using a bisection and Newton approach. A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 7 / 13

Experimental framework as postprocessor First noise PSD estimator: Improved Minima Controlled Recursive Averaging () algorithm [Cohen, 003] Gain function: Optimally-Modified Log-Spectral Amplitude (OM-LSA) estimator [Cohen, Berdugo, 001] Y k,l+1 ˆX (I) k,l+1 ˆX (II) k,l+1 σ N,k,l+1 ˆσ a priori X,k,l+1 SNR first speech enhancement stage OM-LSA OM-LSA ˆσ N,k,l+1 a priori SNR second speech enhancement stage A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 8 / 13

Performance evaluation Setup Clean speech: TIMIT database, sentences concatenated to 3 minutes length Artificially added noise from Noisex9 database: Noise types: Stationary WGN, Triangular WGN, Babble and Factory-1 SNR values: 5,0,5,10,15dB estimator: we set ν 0 = 40 corresponding to a time constant of 0.164s Reference noise PSD Recursive temporal smoothing σ N,k,l = 0.95 σ N,k,l 1 +0.05 N k,l, with known noise periodogram N k,l A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 9 / 13

Sample trajectories of noise variance estimates Babbble noise at frequency bin k = 97 (3kHz) Triangular WGN (averaged over all frequency bins) 6 68 [db] 56 50 [db] 66 64 reference 4 5 6 7 8 Time [s] : continious update of noise variance estimate 4 5 6 7 8 Time [s] : faster response to rising noise power A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 10 / 13

Quantitative evaluation Performance measures adopted from [Taghia et al., 011] Fig.(a): minimum averaged log distance LE m between the reference and estimated PSD obtains lower error LE m for all noise types and SNRs less than or equal to 5dB or 10dB Fig.(b): variance of the logarithmic difference LE v yields lower variance LE v for all noise types and SNRs than the LEm LEv 3 1 0 4 3 1 0-5dB 0dB 5dB 10dB 15dB Stationary Triangular Babble Factory-1 WGN WGN (a) (b) A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 11 / 13

Speech enhancement Gain in perceptual speech quality (PESQ) scores PESQ Gain = PESQ PESQ PESQGain 0.1-5dB 0dB 5dB 10dB 15dB Stationary Triangular Babble Factory-1 WGN WGN 0.0 Female Male Female Male Female Male Female Male has a favourable effect on speech quality for non-stationary noise types A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 1 / 13

Summary and outlook Summary Proposed estimator is able to track the noise statistics even if the speech is dominant Low computational complexity Single parameter ν 0 Experimental evaluation: obtains lower estimation error under low SNR conditions lower fluctuation of the estimated values under all tested environments slightly improved speech quality for non-stationary noise types Outlook Investigations about dependence of algorithm performance: on the first noise PSD estimator on the degrees of freedom ν 0 A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13

Thank you for your attention! Questions? Computer Science, Electrical Engineering and Mathematics Communications Engineering Prof. Dr.-Ing. Reinhold Häb-Umbach

Periodograms of cleen, noisy and enhanced signals File: /home/chinaev/desktop/icassp01/matlabcode/codecohen/datasforpesq/fem/factory1and5db/s_or57.wav Page: 1 of 1 Printed: Mo Mär 19 18:44:19 khz 7 6 5 4 3 Periodogramms of the spoken sentence Biblical scholars argue history for factory-1 noise at an SNR of 5 db (a) cleen speech signal (b) noisy speech signal (c) enhanced speech signal based on estimates (d) enhanced speech signal based on estimates 1 time (a) 1.0 1.1 1. (b) 1.0 1.1 1. 1.0 1.1 1. 1.0 1.1 1. 0.1 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 File: /home/chinaev/desktop/icassp01/matlabcode/codecohen/datasforpesq/fem/factory1and5db/s_noisy57.wav Page: 1 of Printed: Mo Mär 19 18:46:06 khz 7 6 5 4 3 1 time 0.1 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 File: /home/chinaev/desktop/icassp01/matlabcode/codecohen/datasforpesq/fem/factory1and5db/s_imcra57.wav Page: 1 of Printed: Mo Mär 19 18:47:00 khz 7 6 5 4 3 has a positive influence on periodogramm of enhanced signal particularly clearly seen in the highlighted window 1 time (c) 0.1 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 File: /home/chinaev/desktop/icassp01/matlabcode/codecohen/datasforpesq/fem/factory1and5db/s_mapb57.wav Page: 1 of 1 Printed: Mo Mär 19 18:47:36 khz 7 6 5 4 3 1 time 0.1 0. 0.3 0.4 0.5 (d) 0.6 0.7 0.8 0.9 A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13

Minimum Statistics (MS) instead of Performance measures LE m and LE v for female speaker signals -5dB 0dB 5dB 10dB 15dB LEm 9 6 3 0 Stationary Triangular Babble Factory-1 WGN WGN (a) Fig.(a): obtains lower estimation error LE m than the MS for all noise types and SNRs LEv 4 Fig.(b): yields lower variance LE v than the MS for all tested setups 0 MS MS (b) MS MS A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13

Minimum Statistics (MS) instead of PESQ Gain = PESQ PESQ NoiseEst with NoiseEst [ ; MS ] for female speaker signals PESQGain 0.1-5dB 0dB 5dB 10dB 15dB Stationary Triangular Babble Factory-1 WGN WGN 0.0 MS MS MS MS A favourable effect of estimator on speech quality for non-stationary noise types is for MS smaller than for A. Chinaev, A. Krueger, D.H. Tran Vu, R. Haeb-Umbach 13 / 13