Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation

Similar documents
Consistent Wiener Filtering for Audio Source Separation

Underdetermined Instantaneous Audio Source Separation via Local Gaussian Modeling

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Blind Spectral-GMM Estimation for Underdetermined Instantaneous Audio Source Separation

Scalable audio separation with light Kernel Additive Modelling

Acoustic MIMO Signal Processing

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

Gaussian Processes for Audio Feature Extraction

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing

(3) where the mixing vector is the Fourier transform of are the STFT coefficients of the sources I. INTRODUCTION

Harmonic/Percussive Separation Using Kernel Additive Modelling

Generalized Constraints for NMF with Application to Informed Source Separation

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka

Sparse filter models for solving permutation indeterminacy in convolutive blind source separation

A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues

SINGLE-CHANNEL SPEECH PRESENCE PROBABILITY ESTIMATION USING INTER-FRAME AND INTER-BAND CORRELATIONS

Audio Source Separation With a Single Sensor

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

Spectral masking and filtering

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

BLIND SEPARATION OF INSTANTANEOUS MIXTURES OF NON STATIONARY SOURCES

PROJET - Spatial Audio Separation Using Projections

Gaussian Mixture Model Uncertainty Learning (GMMUL) Version 1.0 User Guide

A Second-Order-Statistics-based Solution for Online Multichannel Noise Tracking and Reduction

IMPROVED MULTI-MICROPHONE NOISE REDUCTION PRESERVING BINAURAL CUES

Performance Measurement in Blind Audio Source Separation

Independent Component Analysis and Unsupervised Learning

System Identification and Adaptive Filtering in the Short-Time Fourier Transform Domain

LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION.

Audio Source Separation Based on Convolutive Transfer Function and Frequency-Domain Lasso Optimization

1 Introduction 1 INTRODUCTION 1

AUDIO compression has been a very active field of research. Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach

Robust Adaptive Beamforming Based on Low-Complexity Shrinkage-Based Mismatch Estimation

SOUND SOURCE SEPARATION BASED ON NON-NEGATIVE TENSOR FACTORIZATION INCORPORATING SPATIAL CUE AS PRIOR KNOWLEDGE

Convolutive Non-Negative Matrix Factorization for CQT Transform using Itakura-Saito Divergence

New Statistical Model for the Enhancement of Noisy Speech

SUPERVISED NON-EUCLIDEAN SPARSE NMF VIA BILEVEL OPTIMIZATION WITH APPLICATIONS TO SPEECH ENHANCEMENT

Enhancement of Noisy Speech. State-of-the-Art and Perspectives

Blind Machine Separation Te-Won Lee

NONNEGATIVE MATRIX FACTORIZATION WITH TRANSFORM LEARNING. Dylan Fagot, Herwig Wendt and Cédric Févotte

Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint

Robotic Sound Source Separation using Independent Vector Analysis Martin Rothbucher, Christian Denk, Martin Reverchon, Hao Shen and Klaus Diepold

Informed Audio Source Separation: A Comparative Study

Deep NMF for Speech Separation

A Convex Model and l 1 Minimization for Musical Noise Reduction in Blind Source Separation

SUPERVISED INDEPENDENT VECTOR ANALYSIS THROUGH PILOT DEPENDENT COMPONENTS

Musical noise reduction in time-frequency-binary-masking-based blind source separation systems

Environmental Sound Classification in Realistic Situations

Variational inference

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES

REAL-TIME TIME-FREQUENCY BASED BLIND SOURCE SEPARATION. Scott Rickard, Radu Balan, Justinian Rosca. Siemens Corporate Research Princeton, NJ 08540

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

AN EM ALGORITHM FOR AUDIO SOURCE SEPARATION BASED ON THE CONVOLUTIVE TRANSFER FUNCTION

Matrix Factorization for Speech Enhancement

Blind Instantaneous Noisy Mixture Separation with Best Interference-plus-noise Rejection

Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017

NOISE reduction is an important fundamental signal

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Morphological Diversity and Source Separation

2D Spectrogram Filter for Single Channel Speech Enhancement

Sampling versus optimization in very high dimensional parameter spaces

Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition

Under-determined reverberant audio source separation using a full-rank spatial covariance model

Lecture Notes 5: Multiresolution Analysis

Dept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan.

Coprime Coarray Interpolation for DOA Estimation via Nuclear Norm Minimization

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

Wavelet Footprints: Theory, Algorithms, and Applications

Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals

1 Introduction Blind source separation (BSS) is a fundamental problem which is encountered in a variety of signal processing problems where multiple s

GMM-based classification from noisy features

This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail.

EUSIPCO

Statistical Signal Processing Detection, Estimation, and Time Series Analysis

SEPARATION OF ACOUSTIC SIGNALS USING SELF-ORGANIZING NEURAL NETWORKS. Temujin Gautama & Marc M. Van Hulle

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)

TRINICON: A Versatile Framework for Multichannel Blind Signal Processing

A UNIFIED PRESENTATION OF BLIND SEPARATION METHODS FOR CONVOLUTIVE MIXTURES USING BLOCK-DIAGONALIZATION

Sound Recognition in Mixtures

A SUBSPACE METHOD FOR SPEECH ENHANCEMENT IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

where A 2 IR m n is the mixing matrix, s(t) is the n-dimensional source vector (n» m), and v(t) is additive white noise that is statistically independ

Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 2, FEBRUARY

HST.582J/6.555J/16.456J

POLYNOMIAL SINGULAR VALUES FOR NUMBER OF WIDEBAND SOURCES ESTIMATION AND PRINCIPAL COMPONENT ANALYSIS

Analytical solution of the blind source separation problem using derivatives

Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

Phase-dependent anisotropic Gaussian model for audio source separation

Estimation of the Optimum Rotational Parameter for the Fractional Fourier Transform Using Domain Decomposition

SHIFT-INVARIANT DICTIONARY LEARNING FOR SPARSE REPRESENTATIONS: EXTENDING K-SVD

Binaural Beamforming Using Pre-Determined Relative Acoustic Transfer Functions

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

USING STATISTICAL ROOM ACOUSTICS FOR ANALYSING THE OUTPUT SNR OF THE MWF IN ACOUSTIC SENSOR NETWORKS. Toby Christian Lawin-Ore, Simon Doclo

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

System Identification in the Short-Time Fourier Transform Domain

TinySR. Peter Schmidt-Nielsen. August 27, 2014

Improved Speech Presence Probabilities Using HMM-Based Inference, with Applications to Speech Enhancement and ASR

Transcription:

Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation Emmanuel Vincent METISS Team Inria Rennes - Bretagne Atlantique E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 1 / 28

1 Artifacts 2 Covariance smoothing 3 Consistent Wiener filtering 4 Conclusion E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 2 / 28

Artifacts Under-determined source separation We aim to separate J sources from I < J mixture channels. In the time-frequency domain, x nf = J j=1 s jnf n: time frame index f : frequency bin index x nf : I 1 mixture STFT coeffs. s jnf : I 1 spatial image of source j Separation is typically performed in two steps: 1 estimate the parameter values of some parametric source model, 2 derive the sources in each time-frequency bin by binary/soft masking, local mixing inversion restricted to a subset of sources, or Wiener filtering. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 3 / 28

Artifacts Artifacts Separate masking/filtering of each time-frequency bin results in phase and amplitude discontinuities between neighboring bins perceived as artifacts. Mixture Binary masking These artifacts are very annoying for, e.g., hearing-aid speech processing, high-fidelity music processing. Fewer artifacts are often preferred at the expense of increased interference. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 4 / 28

Artifacts General approaches to artifact reduction Two complementary approaches to artifact reduction: smooth the parameter values of the source models, given the parameter values of the source models, smooth the masks/filters. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 5 / 28

Artifacts Artifact reduction in the Gaussian modeling framework We focus on the Gaussian modeling framework (includes NMF, GMM, beamforming, some ICA, etc). Each source is parameterized by a time- and frequency-dependent multichannel covariance matrix R sjnf : s jnf N(s jnf ;0,R sjnf ). Given estimates R sjnf of R sjnf, we replace the conventional Wiener filter Ŵ jnf = R J R 1 sjnf x nf with Rxnf = R sjnf by a smoothed Wiener filter W jnf. We then recover the sources by s jnf = W jnf x nf. j=1 E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 6 / 28

Covariance smoothing Covariance smoothing E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 7 / 28

Covariance smoothing Smoothing Two families of smoothing techniques have been developed in the field of speech enhancement: multichannel spatial smoothing with spatially diffuse background, single-channel temporal smoothing with stationary background. In the following, we extend temporal smoothing techniques to multi-channel mixtures, evaluate both families of techniques on under-determined mixtures involving directional nonstationary interference. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 8 / 28

Covariance smoothing Spatial filter smoothing Idea: smooth the spatial response by interpolating between the conventional Wiener filter and the identity filter. W SFS jnf = (1 µ)ŵjnf + µi. This is equivalent to adding part of the mixture signal back to the estimated source signals. Chen, J., Benesty, J., Huang, Y., Doclo, S. New insights into the noise reduction Wiener filter IEEE TASLP, 2006 Rickard, S.T. The DUET blind source separation algorithm Blind Speech Separation, 2007 E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 9 / 28

Covariance smoothing Spatial covariance smoothing Same idea, but nonlinear interpolation. W SCS jnf = R sjnf [(1 µ) R xnf + µ R sjnf ] 1. Doclo, S., Moonen, M. On the output SNR of the speech-distortion weighted multichannel Wiener filter IEEE SPL, 2005 E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 10 / 28

Covariance smoothing Temporal filter smoothing Idea: smooth the power response by interpolating between the conventional Wiener filters for neighboring time frames. W TFS jnf = 1 L + 1 L/2 l= L/2 Ŵ j,n+l,f. Adami, A., Burget, L., Dupont, S., Garudadri, H., Grezl, F., et al. Qualcomm-ICSI-OGI features for ASR Proc. ICSLP, 2002 Multichannel generalization by replacing single-channel Wiener filters by multichannel Wiener filters. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 11 / 28

Covariance smoothing Temporal covariance smoothing Same idea, but applied to the source covariance matrices. R sjnf = 1 L + 1 W TCS jnf L/2 l= L/2 R sj,n+l,f = R sjnf R 1 x nf with Rxnf = J j=1 R sjnf Yu, G., Mallat, S., Bacry, E. Audio denoising by time-frequency block thresholding IEEE TSP, 2008 Multichannel generalization by replacing scalar variances by multichannel covariances. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 12 / 28

Covariance smoothing Temporal SNR smoothing Same idea, but applied to the Signal-to-Noise Ratio (SNR). Ĝ jnf = R sjnf [ R xnf R sjnf ] 1 G jnf = 1 L + 1 W TRS jnf L/2 l= L/2 = I [ G jnf + I] 1 Ĝ j,n+l,f Ephraim, Y., Malah, D. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator IEEE TASSP, 1984 Multichannel generalization by replacing scalar SNRs by multichannel SNRs G jnf. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 13 / 28

Covariance smoothing Experimental evaluation (1) Data: four instantaneous stereo (I = 2) mixtures of J = 3 sources with known mixing matrix. Algorithms: Metrics: initial estimation by binary masking or l 1 -norm minimization, reestimation by spatial or temporal smoothing. Signal-to-Artifacts Ratio (SAR), Signal-to-Interference Ratio (SIR). E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 14 / 28

Covariance smoothing Experimental evaluation (2) Binary masking l 1 norm minimization 20 20 SIR (db) 15 SIR (db) 15 10 10 5 10 15 20 25 SAR (db) 5 10 15 20 25 SAR (db) spatial filter smoothing spatial covariance smoothing temporal filter smoothing temporal covariance smoothing temporal SNR smoothing E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 15 / 28

Consistent Wiener filtering Consistent Wiener filtering E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 16 / 28

Consistent Wiener filtering STFT consistency So far, we have treated each time-frequency bin independently. But the STFT is a redundant representation! The sources ŝ j estimated by conventional Wiener filtering belong to S = {s C I N F s.t. s n,f f = s nf n, f }. Can we smooth the filter so as to obtain a consistent estimate s j, i.e., s j STFT(R I T ) S is the STFT of a time-domain signal? s is consistent F(s) = 0 where F : s s STFT(iSTFT(s)) is a projector. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 17 / 28

Consistent Wiener filtering General formulation (1) To do so, let us come back to the definition of the Wiener filter. Let S nf = s 1,nf. s J 1,nf. The Wiener filter computes the mode of P(S nf x nf ) assuming that s jnf are independent zero-mean Gaussian with estimated covariance R sjnf. This distribution is Gaussian with mean ˆµ nf and precision Λ nf given by ˆµ nf = Σ Sx Σ 1 xx x nf, Λ nf = (Σ SS Σ Sx Σ 1 xx Σ xs ) 1 with Σ xs = Σ H Sx = [ Rs1,nf,..., R sj 1,nf ] Σ xx = J R sjnf j=1 Σ SS = diag( R s1,nf,..., R sj 1,nf ) E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 18 / 28

Consistent Wiener filtering General formulation (2) Wiener filtering is thus equivalent to minimizing log P(S x), which is equal up to a constant to the quadratic loss ψ(s) = n,f (S nf ˆµ nf ) H Λ nf (S nf ˆµ nf ). Without the consistency constraint, the solution is obviously arg minψ(s) = ˆµ. S S We aim to solve one of the following constrained problems instead: arg min S S ψ(s) s.t. F(S) = 0 (hard constraint) F(S) small (soft penalty) E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 19 / 28

Consistent Wiener filtering Consistency as a hard constraint We formulate the hard constrained problem in the time domain as arg min ψ(stft(š)) Š R J 1 I T The solution Š satisfies STFT Λ STFT( Š) = STFT Λ(ˆµ) where STFT is the adjoint of the STFT and Λ(S) nf = Λ nf S nf. When the STFT analysis and synthesis windows are identical, STFT is equal to istft up to a constant, hence istft Λ STFT( Š) = istft Λ(ˆµ) We invert istft Λ STFT using the conjugate gradient (CG) algorithm with preconditioning M 1 : Š istft Λ 1 STFT(Š). E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 20 / 28

Consistent Wiener filtering Consistency as a soft penalty Idea: avoid hard constraints when the source parameters are unknown. Consider the following time-frequency domain penalized problem instead: arg minψ(s) + γ F(S) 2 S S The solution S satisfies (Λ + γf F)( S) = Λ(ˆµ). When the STFT analysis and synthesis windows are identical, F is Hermitian, hence F F = F and (Λ + γf)( S) = Λ(ˆµ) We invert Λ + γf using CG with preconditioning ( M 1 : S Λ + γ FN T ) 1 FN Id (S). E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 21 / 28

Consistent Wiener filtering Experimental evaluation (1) Data: mono (I = 1) mixtures of speech and noise at 3 different SNRs. STFT: half-overlapping sine windows of length 1024 (64 ms). Source covariance estimates: oracle or blind (spectral subtraction). Algorithm: 7 different values of γ, hard constrained version formally denoted as γ = +. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 22 / 28

Consistent Wiener filtering Experimental results (2) Condition numbers with (+P) and without ( P) preconditioning E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 23 / 28

Consistent Wiener filtering Experimental results (3) Signal-to-Distortion Ratio (SDR) E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 24 / 28

Consistent Wiener filtering Experimental results (4) Input SNR -10 db 0 db +10 db SDR SIR SAR Time SDR SIR SAR Time SDR SIR SAR Time Wiener 11.5 21.2 12.2 0.1 17.7 25.0 18.8 0.1 24.7 30.1 26.4 0.1 MISI 12.4 24.0 12.9 1.7 18.4 27.2 19.2 1.1 25.3 31.7 26.6 0.9 Oracle PPR 12.1 24.2 12.5 2.4 18.2 26.9 19.0 1.0 25.0 31.2 26.4 0.6 ISSIR 11.7 22.6 12.3 2.5 17.9 25.7 18.8 1.3 24.8 30.4 26.4 0.7 Proposed 12.7 25.4 13.6 2.7 19.1 27.7 20.1 2.0 25.7 31.4 27.2 1.1 Wiener -4.8-4.8 5.9 0.1 5.0 6.1 11.6 0.1 14.7 16.0 20.8 0.1 MISI -4.8-4.6 4.9 2.1 4.9 6.3 10.7 1.9 14.7 16.2 19.9 1.3 Blind PPR -4.9-3.8 2.8 11.2 4.9 6.9 9.5 6.6 14.7 16.5 19.4 2.0 ISSIR -2.3-0.8 1.0 17.8 6.6 10.0 9.3 7.8 15.8 18.7 19.1 2.5 Proposed 0.5 3.3 1.2 10.6 8.3 13.9 9.7 6.4 17.3 21.7 19.4 2.7 Gunawan, D., Sen, D. Iterative phase estimation for the synthesis of separated sources from single-channel mixtures IEEE SPL, 2010 Sturmel, N., Daudet, L. Iterative phase reconstruction of Wiener filtered signals in Proc. ICASSP, 2012. Sturmel, N., Daudet, L. Informed source separation using iterative reconstruction arxiv:1202.2075v1 E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 25 / 28

Conclusion Conclusion E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 26 / 28

Conclusion Conclusion Among the tested smoothing techniques, temporal covariance smoothing provides the best tradeoff between artifacts and interference independently of the initial separation algorithm. Consistent Wiener filtering can also decrease artifacts and interference. Both techniques rely on one parameter (kernel width L or tradeoff γ) whose setting heavily depends on the accuracy of the initial source covariance estimates R sjnf. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 27 / 28

Conclusion References E. Vincent, An experimental evaluation of Wiener filter smoothing techniques applied to under-determined audio source separation, in Proc. LVA/ICA, pp. 157-164, 2010. J. Le Roux and E. Vincent, Consistent Wiener filtering for audio source separation, IEEE Signal Processing Letters, to appear. E. Vincent (Inria) Artifact reduction in audio source separation ESI - 07/12/2012 28 / 28