On Spectral Basis Selection for Single Channel Polyphonic Music Separation

Minje Kim and Seungjin Choi
Department of Computer Science, Pohang University of Science and Technology
San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea
{minjekim,seungjin}@postech.ac.kr

Abstract. In this paper we present a method for separating musical instrument sound sources from their monaural mixture. We take the harmonic structure of music into account and use sparseness and the overlapping NMF to select representative spectral basis vectors, which are then used to reconstruct the unmixed sound. The spectral basis selection method is illustrated, and experimental results with monaural instantaneous mixtures of voice/cello and saxophone/viola are shown to confirm the validity of the proposed method.

1 Introduction

Nonnegative matrix factorization (NMF) [1] and its extended version, nonnegative matrix deconvolution (NMD) [2], have been shown to be useful in polyphonic music description [3], in the extraction of multiple music sound sources [2], and in general sound classification [4]. In addition, a method based on multiple cause models and sparse coding was successfully applied to automatic music transcription [5]. Some of these methods regard each note as a source, which may be appropriate for music transcription but works for source separation only in very limited cases.

In this paper we present a method for single channel polyphonic music separation. The main idea is to select a few representative spectral basis vectors using sparseness and the overlapping NMF [6]; these vectors are then used to reconstruct the unmixed sound signals. We assume that the harmonic structure of a musical instrument remains approximately the same even when it is played at different pitches. This view allows us to reconstruct the original sound using only a few representative spectral basis vectors, through the overlapping NMF.

We illustrate a method of spectral basis selection from the spectrogram of mixed sound and show how the selected basis vectors are used to restore the unmixed sound. Promising results with monaural instantaneous mixtures of voice/cello and saxophone/viola are shown to confirm the validity of the proposed method.

2 Overlapping NMF

Nonnegative matrix factorization (NMF) is a simple but efficient method for decomposing multivariate data into a linear combination of basis vectors, with nonnegativity constraints on both the basis and the encoding matrix [1]. Given a nonnegative data matrix $V \in \mathbb{R}^{m \times N}$ (where $V_{ij} \ge 0$), NMF seeks a factorization

$$V \approx WH, \tag{1}$$

where $W \in \mathbb{R}^{m \times n}$ ($n \le m$) contains nonnegative basis vectors in its columns and $H \in \mathbb{R}^{n \times N}$ is the nonnegative encoding variable matrix. Appropriate objective functions and associated multiplicative updating algorithms for NMF can be found in [7].

The overlapping NMF is an extension of the original NMF in which a transformation-invariant representation and a sparseness constraint are incorporated [6]. Some of the basis vectors computed by NMF may correspond to transformed versions of a single representative basis vector. The basic idea of the overlapping NMF is to find transformation-invariant basis vectors so that fewer basis vectors can reconstruct the observed data. Given a set of transformation matrices, $\mathcal{T} = \{T^{(1)}, T^{(2)}, \ldots, T^{(K)}\}$, the overlapping NMF finds a nonnegative basis matrix $W$ and a set of nonnegative encoding matrices $\{H^{(k)}\}$ (for $k = 1, \ldots, K$) that minimize

$$J(W, H) = \frac{1}{2} \left\| V - \sum_{k=1}^{K} T^{(k)} W H^{(k)} \right\|_F^2, \tag{2}$$

where $\|\cdot\|_F$ denotes the Frobenius norm. As in [7], the multiplicative updating rules for the overlapping NMF were derived in [6]; they are summarized below.

Algorithm Outline: Overlapping NMF [6]

Step 1 Calculate the reconstruction: $R = \sum_{k=1}^{K} T^{(k)} W H^{(k)}$.

Step 2 Update the encoding matrices by

$$H^{(k)} \leftarrow H^{(k)} \odot \frac{W^\top [T^{(k)}]^\top V}{W^\top [T^{(k)}]^\top R}, \quad k = 1, \ldots, K, \tag{3}$$

where $\odot$ denotes the Hadamard product and the division is carried out element-wise.

Step 3 Calculate the reconstruction $R$ again, as in Step 1, using the encoding matrices $H^{(k)}$ updated in Step 2.

Step 4 Update the basis matrix by

$$W \leftarrow W \odot \frac{\sum_{k=1}^{K} [T^{(k)}]^\top V [H^{(k)}]^\top}{\sum_{k=1}^{K} [T^{(k)}]^\top R [H^{(k)}]^\top}. \tag{4}$$

3 Spectral Basis Selection

The goal of spectral basis selection is to choose a few representative vectors from $V = [v_1 \cdots v_N]$, where $V$ is the data matrix associated with the spectrogram of the mixed sound. In other words, each column vector of $V$ corresponds to the power spectrum of the mixed sound at time $t = 1, \ldots, N$. The selected representative vectors are fixed as basis vectors, and an associated encoding matrix is learned through the overlapping NMF with the sparseness constraint in order to reconstruct the unmixed sound.

Our spectral basis selection method consists of two parts, which are summarized in Table 1. The first part selects several candidate vectors from $V$ using a sparseness measure and a clustering technique. We use the sparseness measure proposed by Hoyer [8],

$$\mathrm{sparseness}(v) = \frac{\sqrt{m} - \left( \sum_i |v_i| \right) / \sqrt{\sum_i v_i^2}}{\sqrt{m} - 1}, \tag{5}$$

where $v_i$ is the $i$th element of the $m$-dimensional vector $v$.

The first part starts by choosing the candidate vector that has the largest sparseness value (see Fig. 1 (d)). We then estimate the location of the fundamental frequency bin of each input vector, taken to be the lowest frequency bin whose value is above the mean value of that vector. Each input vector is aligned to the candidate vector so that its fundamental frequency bin appears at the same location as that of the candidate vector. Euclidean distances between the aligned input vectors and the candidate vector are calculated, and a hierarchical clustering method (or any other clustering method) is applied to eliminate all vectors that belong to the same cluster as the candidate vector. This procedure is repeated until a pre-specified number of candidate vectors has been chosen. Increasing this number provides more feasible candidate vectors, but it also increases the computational complexity of the second part.
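The per-frame quantities used in the first part can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, an all-zero or perfectly flat frame is assumed to get sparseness 0 and fundamental bin 0, and the clustering step is left out.

```python
import numpy as np

def hoyer_sparseness(v):
    """Sparseness of Eq. (5): 1 for a single nonzero entry, 0 for a flat vector."""
    v = np.asarray(v, dtype=float)
    m = v.size
    l1 = np.abs(v).sum()
    l2 = np.sqrt((v ** 2).sum())
    if l2 == 0.0:
        return 0.0  # assumption: an all-zero frame is treated as not sparse
    return (np.sqrt(m) - l1 / l2) / (np.sqrt(m) - 1.0)

def fundamental_bin(v):
    """Index of the lowest frequency bin whose value exceeds the mean of v."""
    v = np.asarray(v, dtype=float)
    above = np.flatnonzero(v > v.mean())
    return int(above[0]) if above.size else 0  # assumption: flat frame -> bin 0

def align_to(v, target_bin):
    """Shift v (zero-padded) so that its fundamental bin lands on target_bin."""
    v = np.asarray(v, dtype=float)
    shift = target_bin - fundamental_bin(v)
    out = np.zeros_like(v)
    if shift >= 0:
        out[shift:] = v[:v.size - shift]
    else:
        out[:shift] = v[-shift:]
    return out

# Toy spectrogram: 5 frequency bins (rows), 3 time frames (columns).
V = np.array([[0.0, 0.0, 1.0],
              [4.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.5, 1.0]])
scores = [hoyer_sparseness(V[:, t]) for t in range(V.shape[1])]
candidate = V[:, int(np.argmax(scores))]        # frame with the largest sparseness
target = fundamental_bin(candidate)
aligned = np.stack([align_to(V[:, t], target) for t in range(V.shape[1])], axis=1)
distances = np.linalg.norm(aligned - candidate[:, None], axis=0)
```

In this toy example the second frame is a scaled, one-bin-shifted copy of the first, so after alignment the two differ only in scale; normalizing the frames first, as Table 1 prescribes, would make their distance zero.
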
The repeated procedure produces several candidates, some of which are expected to represent an original musical instrument sound, in the sense that a set of vertically shifted copies of the basis vector restores the original sound. The second part of our method is devoted to the final selection of representative spectral basis vectors from the candidates obtained in the first part. The candidate vectors are regarded as input vectors for the overlapping NMF. For every possible combination of candidates (for two sources, every pair of candidates), we learn an associated encoding matrix with the selected candidates fixed as basis vectors and calculate the reconstruction error. The final representative spectral basis vectors are those that give the lowest reconstruction error.
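The second part can be sketched as follows, assuming the transformations are the vertical shifts of Section 4 and that only Steps 1 and 2 of the algorithm outline are iterated because the basis is held fixed. The names `shift_rows`, `learn_encoding`, and `select_basis`, the iteration count, and the small constant `eps` are illustrative assumptions. The sparseness constraint on the encodings is omitted, since Eqs. (2) and (3) carry no explicit sparseness term.

```python
import itertools
import numpy as np

def shift_rows(X, j):
    """Shift the rows of X up by j (down if j < 0), zero-padding the gap.

    Equivalent to multiplying X from the left by a shifted identity matrix.
    """
    X = np.asarray(X, dtype=float)
    out = np.zeros_like(X)
    m = X.shape[0]
    if j == 0:
        out[:] = X
    elif 0 < j < m:
        out[:m - j] = X[j:]
    elif -m < j < 0:
        out[-j:] = X[:m + j]
    return out

def learn_encoding(V, W, shifts, n_iter=300, eps=1e-9, seed=0):
    """Iterate Steps 1-2 with W fixed; return the encodings and the error of Eq. (2)."""
    rng = np.random.default_rng(seed)
    TW = [shift_rows(W, j) for j in shifts]              # T^(k) W for each k
    H = [rng.random((W.shape[1], V.shape[1])) for _ in shifts]
    for _ in range(n_iter):
        R = sum(B @ Hk for B, Hk in zip(TW, H))          # Step 1: reconstruction
        H = [Hk * (B.T @ V) / (B.T @ R + eps)            # Step 2: Eq. (3)
             for B, Hk in zip(TW, H)]
    R = sum(B @ Hk for B, Hk in zip(TW, H))
    return H, 0.5 * np.linalg.norm(V - R, "fro") ** 2

def select_basis(V, candidates, shifts, n_sources=2):
    """Try every combination of candidate columns and keep the lowest error."""
    best = None
    for combo in itertools.combinations(range(candidates.shape[1]), n_sources):
        _, err = learn_encoding(V, candidates[:, list(combo)], shifts)
        if best is None or err < best[1]:
            best = (combo, err)
    return best
```

Following Table 1, `V` would be the matrix of candidate vectors itself, so a call such as `select_basis(candidates, candidates, shifts)` returns the pair of column indices with the lowest reconstruction error together with that error.
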

Table 1. Spectral basis selection procedure.

  Calculate the sparseness value of every input vector, $v_t$, using (5);
  Normalize every input vector;
  repeat until the number of candidates reaches the threshold or all input vectors are eliminated
      Select the candidate with the highest sparseness value;
      Estimate the fundamental frequency bin of each input vector;
      Align each input vector so that its fundamental frequency bin location is the same as that of the candidate;
      Calculate Euclidean distances between the candidate and every input vector;
      Cluster the input vectors using the Euclidean distances;
      Eliminate the input vectors in the cluster to which the candidate belongs;
  end (repeat)
  repeat for every possible combination of candidates
      Set all candidate vectors as input vectors;
      Select a combination of candidates;
      Learn an encoding matrix through the overlapping NMF, with the selected candidates fixed as basis vectors;
      Compute the reconstruction error of the overlapping NMF;
  end (repeat)
  Select the combination of candidates with the lowest reconstruction error;

4 Numerical Experiments

We present two simulation results for monaural instantaneous mixtures of (1) voice and cello and (2) saxophone and viola. We apply our spectral basis selection method with the overlapping NMF to these two data sets. Experimental results are shown in Figs. 1 and 2, whose captions describe the results in detail. The set of transformation matrices, $\mathcal{T}$, that we used is

$$\mathcal{T} = \left\{ T^{(k)} \;\middle|\; T^{(k)} = \overset{k-m}{I}, \; 1 \le k \le 2m - 1 \right\}, \tag{6}$$

where $I \in \mathbb{R}^{m \times m}$ is the identity matrix and $\overset{j}{I}$ denotes the shift-up or shift-down of the row vectors of $I$ by $|j|$ rows, if $j$ is positive or negative, respectively. After the shift, empty elements are zero-padded. For the case where $m = 3$, $T^{(2)}$ and $T^{(5)}$ are defined as

$$T^{(2)} = \overset{2-3}{I} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad T^{(5)} = \overset{5-3}{I} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \tag{7}$$

Multiplying a vector by these transformation matrices yields a set of vertically shifted vectors.
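As a small check of Eqs. (6) and (7), the transformation set can be built with NumPy's `np.eye`, whose `k` argument places the ones on the `k`-th diagonal. The helper names `shift_matrix` and `transformation_set` are hypothetical; the sketch only reproduces the $m = 3$ example above.

```python
import numpy as np

def shift_matrix(m, j):
    """Identity of size m with its rows shifted up by j (down if j < 0)."""
    return np.eye(m, k=j)

def transformation_set(m):
    """All 2m - 1 vertical shifts of Eq. (6): T^(k) shifts by j = k - m."""
    return [shift_matrix(m, k - m) for k in range(1, 2 * m)]

T = transformation_set(3)        # T[k - 1] holds T^(k)
v = np.array([1.0, 2.0, 3.0])

print(T[1])        # T^(2): rows of I shifted down by one, as in Eq. (7)
print(T[1] @ v)    # -> [0. 1. 2.]  entries moved one bin toward higher indices
print(T[4] @ v)    # -> [3. 0. 0.]  T^(5): entries moved two bins toward lower indices
```

For realistic spectrogram sizes the dense matrices are wasteful, and the same effect can be had by slicing rows directly; the explicit matrices are shown here only because they mirror the notation of the text.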

Fig. 1. Spectrograms of the original sound of voice and of a single string of a cello are shown in (a) and (b), respectively. Horizontal bars reflect the structure of harmonics. One can see that notes played by the same instrument are vertically shifted versions of one another. The monaural mixture of voice and cello is shown in (c), where the 5 candidate vectors selected by our algorithm in Table 1 are denoted by dotted or solid vertical lines. The two solid lines mark the final representative spectral basis vectors, which give the smallest reconstruction error in the overlapping NMF; one represents the voice and the other the cello string. The associated sparseness values over time are shown in (d), where the black dots on the graph correspond to the candidate vectors. The unmixed sound is shown in (e) and (f) for voice and cello, respectively.

5 Discussions

We have presented a method of spectral basis selection for single channel polyphonic music separation that uses harmonic structure, sparseness, clustering, and the overlapping NMF. Rather than learning spectral basis vectors from the data, our approach selects a few representative spectral vectors from the given data and fixes them as basis vectors, then learns the associated encoding variables through the overlapping NMF in order to restore the unmixed sound. The success of our approach rests on the assumption that the distinctive timbre of a given musical instrument can be expressed by a transformation-invariant time-frequency representation, even when its pitch varies. A string instrument has multiple distinct harmonic structures; in such a case, it is reasonable to assign a basis vector to each string.

Acknowledgments: This work was supported by ITEP Brain Neuroinformatics Program and ETRI.

Fig. 2. Spectrograms of the original sound of saxophone and viola are shown in (a) and (b), respectively. Every note is artificially generated by changing the frequency of a real sample sound, so that the spectral character of each instrument is constant across all notes. The monaural mixture is shown in (c), where the 5 selected candidate vectors are denoted by vertical lines. The two solid lines among them mark the final representative spectral basis vector of each instrument. The associated sparseness values over time are shown in (d), where the black dots correspond to the candidate vectors. The unmixed sound is shown in (e) and (f) for saxophone and viola, respectively.

References

1. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401 (1999) 788-791
2. Smaragdis, P.: Non-negative matrix factor deconvolution: Extraction of multiple sound sources from monophonic inputs. In: Proc. Int'l Conf. Independent Component Analysis and Blind Signal Separation, Granada, Spain (2004) 494-499
3. Smaragdis, P., Brown, J.C.: Non-negative matrix factorization for polyphonic music transcription. In: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY (2003) 177-180
4. Cho, Y.C., Choi, S.: Nonnegative features of spectro-temporal sounds for classification. Pattern Recognition Letters 26 (2005) 1327-1336
5. Plumbley, M.D., Abdallah, S.A., Bello, J.P., Davies, M.E., Monti, G., Sandler, M.B.: Automatic transcription and audio source separation. Cybernetics and Systems (2002) 603-627
6. Eggert, J., Wersing, H., Körner, E.: Transformation-invariant representation and NMF. In: Proc. Int'l Joint Conf. Neural Networks. (2004)
7. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems. Volume 13, MIT Press (2001)
8. Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 5 (2004) 1457-1469