Scalable audio separation with light Kernel Additive Modelling

Similar documents
Harmonic/Percussive Separation Using Kernel Additive Modelling

Scalable audio separation with light kernel additive modelling

PROJET - Spatial Audio Separation Using Projections

arxiv: v2 [cs.sd] 8 Feb 2017

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Blind Spectral-GMM Estimation for Underdetermined Instantaneous Audio Source Separation

LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION.

A diagonal plus low-rank covariance model for computationally efficient source separation

Music separation guided by cover tracks: designing the joint NMF model

Underdetermined Instantaneous Audio Source Separation via Local Gaussian Modeling

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

Cauchy Nonnegative Matrix Factorization

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing

Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

(3) where the mixing vector is the Fourier transform of are the STFT coefficients of the sources I. INTRODUCTION

Non-Negative Tensor Factorisation for Sound Source Separation

REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES

STRUCTURE-AWARE DICTIONARY LEARNING WITH HARMONIC ATOMS

Audio Source Separation Based on Convolutive Transfer Function and Frequency-Domain Lasso Optimization

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Fast algorithms for informed source separation

A Comparative Study of Example-guided Audio Source Separation Approaches Based On Nonnegative Matrix Factorization

NON-NEGATIVE MATRIX FACTORIZATION FOR SINGLE-CHANNEL EEG ARTIFACT REJECTION

AN EM ALGORITHM FOR AUDIO SOURCE SEPARATION BASED ON THE CONVOLUTIVE TRANSFER FUNCTION

CONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT

Generalized Constraints for NMF with Application to Informed Source Separation

Gaussian Mixture Model Uncertainty Learning (GMMUL) Version 1.0 User Guide

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka

SOUND SOURCE SEPARATION BASED ON NON-NEGATIVE TENSOR FACTORIZATION INCORPORATING SPATIAL CUE AS PRIOR KNOWLEDGE

Drum extraction in single channel audio signals using multi-layer non negative matrix factor deconvolution

Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017

Convention Paper Presented at the 128th Convention 2010 May London, UK

Dept. Electronics and Electrical Engineering, Keio University, Japan. NTT Communication Science Laboratories, NTT Corporation, Japan.

Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

744 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 4, MAY 2011

Phase-dependent anisotropic Gaussian model for audio source separation

Multichannel Audio Modeling with Elliptically Stable Tensor Decomposition

A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues

Audio Source Separation With a Single Sensor

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

BAYESIAN NONNEGATIVE HARMONIC-TEMPORAL FACTORIZATION AND ITS APPLICATION TO MULTIPITCH ANALYSIS

EUSIPCO

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara

An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation

ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES. Sebastian Ewert Mark D. Plumbley Mark Sandler

Sparse and Shift-Invariant Feature Extraction From Non-Negative Data

Informed Audio Source Separation: A Comparative Study

NONNEGATIVE matrix factorization was originally introduced

Speech enhancement based on nonnegative matrix factorization with mixed group sparsity constraint

Gaussian Processes for Audio Feature Extraction

SHIFTED AND CONVOLUTIVE SOURCE-FILTER NON-NEGATIVE MATRIX FACTORIZATION FOR MONAURAL AUDIO SOURCE SEPARATION. Tomohiko Nakamura and Hirokazu Kameoka,

On the use of a spatial cue as prior information for stereo sound source separation based on spatially weighted non-negative tensor factorization

FACTORS IN FACTORIZATION: DOES BETTER AUDIO SOURCE SEPARATION IMPLY BETTER POLYPHONIC MUSIC TRANSCRIPTION?

Introduction Basic Audio Feature Extraction

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

Soft-LOST: EM on a Mixture of Oriented Lines

Non-negative Matrix Factorization: Algorithms, Extensions and Applications

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints

On Spectral Basis Selection for Single Channel Polyphonic Music Separation

Spatial coding-based informed source separation

Oracle Analysis of Sparse Automatic Music Transcription

ESTIMATION OF RELATIVE TRANSFER FUNCTION IN THE PRESENCE OF STATIONARY NOISE BASED ON SEGMENTAL POWER SPECTRAL DENSITY MATRIX SUBTRACTION

COLLABORATIVE SPEECH DEREVERBERATION: REGULARIZED TENSOR FACTORIZATION FOR CROWDSOURCED MULTI-CHANNEL RECORDINGS. Sanna Wager, Minje Kim

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Consistent Wiener Filtering for Audio Source Separation

Real-Time Spectrogram Inversion Using Phase Gradient Heap Integration

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

Proc. of NCC 2010, Chennai, India

Expectation-maximization algorithms for Itakura-Saito nonnegative matrix factorization

Convolutive Non-Negative Matrix Factorization for CQT Transform using Itakura-Saito Divergence

Temporal models with low-rank spectrograms

Some Interesting Problems in Pattern Recognition and Image Processing

Independent Component Analysis and Unsupervised Learning

GMM-based classification from noisy features

Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector

Acoustic Vector Sensor based Speech Source Separation with Mixed Gaussian-Laplacian Distributions

2D Spectrogram Filter for Single Channel Speech Enhancement

Higher-Order Singular Value Decomposition (HOSVD) for structured tensors

Extraction of Individual Tracks from Polyphonic Music

Music transcription with ISA and HMM

Environmental Sound Classification in Realistic Situations

An Inverse-Gamma Source Variance Prior with Factorized Parameterization for Audio Source Separation

Frequency Domain Speech Analysis

Fast adaptive ESPRIT algorithm

Author manuscript, published in "7th International Symposium on Computer Music Modeling and Retrieval (CMMR 2010), Málaga : Spain (2010)"

Blind Machine Separation Te-Won Lee

OPTIMAL WEIGHT LEARNING FOR COUPLED TENSOR FACTORIZATION WITH MIXED DIVERGENCES

MULTISENSORY SPEECH ENHANCEMENT IN NOISY ENVIRONMENTS USING BONE-CONDUCTED AND AIR-CONDUCTED MICROPHONES. Mingzi Li,Israel Cohen and Saman Mousazadeh

Source Separation Tutorial Mini-Series III: Extensions and Interpretations to Non-Negative Matrix Factorization

10ème Congrès Français d Acoustique

arxiv: v2 [stat.ml] 26 May 2017

Novelty detection. Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University

Course content (will be adapted to the background knowledge of the class):

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION

CS229 Project: Musical Alignment Discovery

SHIFT-INVARIANT DICTIONARY LEARNING FOR SPARSE REPRESENTATIONS: EXTENDING K-SVD

AUDIO compression has been a very active field of research. Coding-based Informed Source Separation: Nonnegative Tensor Factorization Approach

Transcription:

Scalable audio separation with light Kernel Additive Modelling Antoine Liutkus 1, Derry Fitzgerald 2, Zafar Rafii 3 1 Inria, Université de Lorraine, LORIA, UMR 7503, France 2 NIMBUS Centre, Cork Institute of Technology, Ireland 3 Gracenote, Media Technology Lab, Emeryville, CA, USA ICASSP, Brisbane, April 21 st 2015 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 1/20

Separating audio sources MIXING SEPARATION In this presentation: mono mixtures General multichannel case in the paper Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 2/20

Notations MIXTURE + STFT = + + Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 3/20

Time frequency masking = Each source STFT s j (ω, t) is obtained by filtering the mixture ŝ j (ω, t) = x (ω, t) w j (ω, t) Underdetermined separation w j varies with both ω and t Waveforms obtained by inverse STFT Many different ways to get a Time-Frequency (TF) mask w j (ω, t) Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 4/20

Time frequency masking = s j (ω, t) is assumed equal either to x (ω, t)or to 0 A classification task over the mixture STFT x based on features pitch detection+harmonics selection (CASA) panning position (DUET) Y. Han and C. Raphael. Informed source separation of orchestra and soloist. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), pages 315 320, 2010 O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. on Signal Processing, 52(7):1830 1847, 2004 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 5/20

Getting the mask Binary masking yields musical noise Soft masking w j (ω, t) [0 1] is better! Example: Wiener filtering for Gaussian processes Sources energies p j (ω, t) 0 add up to get mix energy p j (ω, t) w j (ω, t) taken as proportion of source j in mix j w j (ω, t) = p j (ω, t) j p [0 1] j (ω, t) L. Benaroya, F. Bimbot, and R. Gribonval. Audio source separation with a single sensor. IEEE Trans. on Audio, Speech and Language Processing, 14(1):191 199, January 2006 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 6/20

Audio separation Spectrogram models KAM Light Results Time-Frequency masking challenges Liutkus?, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 7/20

Audio separation Spectrogram models KAM Light Results Iterative approaches main ideas Liutkus?, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 8/20

The need for spectrograms models For each time frequency bin (ω, t) we have J unknowns p j (ω, t) 0 we have 1 observation x (ω, t) C The problem is ill-posed We need to: exploit redundancies (e.g. multichannel data) reduce the number of parameters We should use prior knowledge on p j exploit expected structure of spectrograms N.Q.K. Duong, E. Vincent, and R. Gribonval. Under-determined reverberant audio source separation using a full-rank spatial covariance model. Audio, Speech, and Language Processing, IEEE Transactions on, 18(7):1830 1840, sept. 2010 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 9/20

Global spectrogram models nonnegative matrix factorization activations over time spectral patterns approximation A. Ozerov, E. Vincent, and F. Bimbot. A general flexible framework for the handling of prior information in audio source separation. Audio, Speech, and Language Processing, IEEE Transactions on, PP(99):1, 2011 Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry, D. Tran, and F. Bimbot. The Flexible Audio Source Separation Toolbox (FASST) version 2.0. In ICASSP, 2014 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 10/20

Kernel spectrogram models principles NMF is a global single model for all of p j Sometimes, our knowledge is only local We assume p j (ω, t) is equal to some neighbours I j (ω, t) Example: harmonic/percussive local models Percussive sounds are locally constant through frequency Harmonic sounds are locally constant through time percussive harmonic D. Fitzgerald. Harmonic/percussive separation using median filtering. In Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 2010 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 11/20

Kernel spectrogram models examples ( ω, t ) I j (ω, t), p j ( ω, t ) p j (ω, t) D. Fitzgerald. Harmonic/percussive separation using median filtering. In Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 2010 Z. Rafii and B. Pardo. A simple music/voice separation method based on the extraction of the repeating musical structure. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 221 224, may 2011 D. FitzGerald. Vocal separation using nearest neighbours and median filtering. In Proceedings of the 23nd IET Irish Signals and Systems Conference, pages 583 588, Maynooth, 2012 Z. Rafii and B. Pardo. Music/voice separation using the similarity matrix. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), pages 583 588, 2012 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 12/20

Kernel spectrogram models objective Combining all those local models together! Example: voice/music separation Musical background 5 sources repeating at different scales (beat, downbeat,...) +1 source which is stable along time (strings, synths) Voice with a locally constant spectrogram (cross-like kernel) Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 13/20

Audio separation Spectrogram models KAM Light Results Kernel backfitting algorithm Liutkus?, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 14/20

Kernel backfitting algorithm monochannel standard version Input Mixture STFT x (ω, t) Neighbourhoods I j (ω, t), also called proximity kernels Initialization: j, ˆp j (ω, t) x (ω, t) 2 : simply take mix spectrogram Iterate Separation with Wiener filtering compute estimates ŝ j (ω, t) = [ˆp j (ω, t) / ] j ˆp j (ω, t) x (ω, t) Spectrograms fitting ˆp j (ω, t) median filter ŝ j 2 with kernel I j (ω, t) Output: source estimates ŝ j Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 15/20

Audio separation Spectrogram models KAM Light Results Scalability issues Kernel models: no compact parameterization all spectrograms must be stored in full resolution for 10 sources and a full length track: approximately 32Gb of RAM needed! Liutkus?, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 16/20

Low-rank models for compression p j (ω, t) = Different possible approaches K W (ω, k) H (k, t) k=1 Nonnegative matrix factorization W and H have nonnegative entries meaningful decompositions, but slow Truncated singular values decompositons (SVD) W and H are real not physically meaningful, but fast After filtering, compress spectrograms with a low-rank model A. Ozerov, E. Vincent, and F. Bimbot. A general flexible framework for the handling of prior information in audio source separation. Audio, Speech, and Language Processing, IEEE Transactions on, PP(99):1, 2011 N. Halko, P. Martinsson, and J. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217 288, 2011 Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 17/20

Audio separation Spectrogram models KAM Light Results Kernel Additive Modelling Light (KAML) Liutkus?, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 18/20

Demo external demo Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 19/20

Kernel Additive Modelling: conclusion A general framework for combining different kernel models Handles multichannel, full-length mixtures Easy to implement and fast algorithms full demo at www.loria.fr/~aliutkus/kaml/ To go further Formalization optimization framework with robust cost-functions equivalence with EM algorithm in some cases Combination with other techniques Learning source kernels automatically? maximizing size of kernel (robustness) maximizing invariance to median filtering Liutkus, Fitzgerald, Rafii KAML for scalable audio separation 04/22/2015 20/20