Fast algorithms for informed source separation


1 Fast algorithms for informed source separation Augustin Lefèvre September 10th, 2013

2 Source separation in 5 minutes Recover source estimates from a mixed signal. We consider the single-channel setting: $x_t = s^{(1)}_t + s^{(2)}_t$. This is an ill-posed problem: prior information is needed.

3 Read mix waveform [Figure: waveform of the mixture x against time (seconds)]

4 Short time Fourier transform $C_{fn} = \sum_{t=1}^{F} x_{t+(n-1)H}\, w_t \exp\left(-i\,\frac{2(f-1)\pi(t-1)}{F}\right)$
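The transform on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function name `stft`, the default Hann window, and the negative sign in the exponent (the usual DFT convention) are assumptions. Indices are 0-based here, whereas the slide counts from 1.

```python
import numpy as np

def stft(x, F, H, window=None):
    """Return C with C[f, n] = sum_t x[t + n*H] * w[t] * exp(-2i*pi*f*t/F).

    x: 1-D signal, F: frame length, H: hop size (assumed names).
    """
    x = np.asarray(x, dtype=float)
    # Assumption: Hann window when none is supplied.
    w = np.hanning(F) if window is None else np.asarray(window, dtype=float)
    n_frames = 1 + (len(x) - F) // H
    # One windowed frame per column, shape (F, n_frames).
    frames = np.stack([x[n * H:n * H + F] * w for n in range(n_frames)], axis=1)
    # np.fft.fft uses the exp(-2i*pi*f*t/F) convention along the chosen axis.
    return np.fft.fft(frames, axis=0)
```

The magnitude spectrogram used on the following slides would then be `np.abs(stft(x, F, H))`.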

5 Remove phase information [Figure: magnitude spectrogram, frequency (Hz) against time (sec)]

6 Output of source separation program [Figure: two spectrogram panels, frequency (Hz) against time (sec)]

7 Time-frequency masking Estimates of each source's complex STFT are obtained by: $\hat{S}_{g,fn} = \frac{X_{g,fn}}{\sum_l X_{l,fn}}\, C_{fn}$
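The masking step amounts to one broadcasted division. The sketch below assumes the source spectrogram estimates are stacked in an array `X` of shape (G, F, N) and the mixture STFT `C` has shape (F, N); the function name `apply_masks` and the small `eps` guard against division by zero are illustrative choices, not part of the slides.

```python
import numpy as np

def apply_masks(C, X, eps=1e-12):
    """Return S with S[g] = X[g] / sum_l X[l] * C (element-wise).

    C: complex mixture STFT, shape (F, N).
    X: nonnegative source spectrogram estimates, shape (G, F, N).
    """
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=0)
    # Bins where every estimate is zero get a zero mask instead of NaN.
    masks = X / np.maximum(total, eps)
    return masks * C
```

Wherever the summed estimate is positive, the masks add up to one, so the source STFTs add back up to the mixture STFT.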

8 Estimate waveforms from STFT [Figure: two estimated source waveforms x against time (seconds)]

9 Annotation informed source separation [Lefèvre et al., 2012, Bryan and Mysore, 2013]: interaction between user and source separation software. [Lefèvre et al., 2012]: detector trained on development database (random forest, SVM, nearest-neighbour, etc.). Figure: Detections in the spectrogram

10 AISS nmf: non-convex Annotation informed source separation. Information is used as additional constraints: $M_g \odot X_g = M_g \odot T_g$. [Lefèvre et al., 2012]: nonnegative matrix factorization (nmf) with constraints:
$$\min_{D,A} \Big\| Y - \sum_g D_g A_g \Big\|_F^2 \quad \text{s.t.} \quad D \in \mathbb{R}_+^{F \times K},\ A \in \mathbb{R}_+^{K \times N},\ M_g \odot (D_g A_g) = M_g \odot T_g$$
$Y \in \mathbb{R}_+^{F \times N}$ is the input spectrogram. We need only $D_1 A_1 \ge 0$, but impose the stronger constraint $D_1 \ge 0$, $A_1 \ge 0$ (NMF). nmf is a hard problem.

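11 AISS lownuc: convex Informed source separation: $X_1, \dots, X_G \in \mathbb{R}^{F \times N}$.
$$\min_X \frac{1}{2} \Big\| Y - \sum_{g=1}^{G} X_g \Big\|_F^2 + \lambda \sum_{g=1}^{G} \|X_g\|_* \quad \text{s.t.} \quad M_g \odot X_g = M_g \odot T_g,\ X_g \ge 0$$
The rank of a matrix is revealed in its SVD: $X = P \Sigma Q^\top$, with singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_F \ge 0$. $\|X_g\|_* = \sum_{f=1}^{F} \sigma_f$. Projecting on $X_g \ge 0$ is straightforward. Instead of one nmf, we will make repeated calls to svd to compute $\|X_g\|_*$ and additional information.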
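The two building blocks named on this slide, the nuclear norm via an SVD call and the projection onto the constraint set, can be sketched as follows. The function names are hypothetical. The projection additionally assumes that the mask `M` is binary (boolean) and that the annotated values `T` are nonnegative, so that clipping and then overwriting the annotated bins gives a feasible point.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

def project_constraints(X, M, T):
    """Project onto {X >= 0, X = T where M is True}.

    Assumes M is a boolean mask and T >= 0 on the masked bins.
    """
    X = np.maximum(X, 0.0)      # nonnegativity
    return np.where(M, T, X)    # annotated bins are fixed to T
```

For example, `nuclear_norm(np.diag([3.0, 4.0]))` evaluates to 7, the sum of the two singular values.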

12 Algorithms for informed source separation Convex but nonsmooth problem. Related approaches if no noise and no inequality constraints (Recht et al., 2010): $\min_X \|X\|_* \ \text{s.t.}\ \mathcal{A}(X) = b$, where $\mathcal{A} : \mathbb{R}^{F \times NG} \to \mathbb{R}^p$ is linear and $b \in \mathbb{R}^p$ ($p \ll mn$). Link with SDP optimization:
$$\min_{t,X} t \quad \text{s.t.} \quad \mathcal{A}(X) = b,\ \begin{pmatrix} tI & X \\ X^\top & tI \end{pmatrix} \succeq 0$$
Use an interior-point solver, which has a superlinear convergence rate. BUT the Hessian has size $O(F^2 N^2)$, which is too large, even for a ten-second audio track.

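13 Subgradient descent The objective function f is convex, so it admits derivatives in all directions: $f'(X; D) = \lim_{t \to 0^+} \frac{f(X + tD) - f(X)}{t}$. Subgradients generalize the gradient: with $\langle Z, D \rangle = \sum_g \operatorname{Tr}(Z_g^\top D_g)$, $Z \in \partial f(X) \Leftrightarrow f'(X; D) \ge \langle Z, D \rangle$ for all $D$. Projected subgradient descent: $X^{(t+1)} = \Pi\big(X^{(t)} - \mu_t Z^{(t)}\big)$. Warning: $f(X^{(t+1)}) \le f(X^{(t)})$ is not guaranteed. Guarantee: $\mu_t = \mu_0 (1+t)^{-1/2} \Rightarrow \|X^{(t)} - X^\star\| \searrow 0$.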
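A minimal sketch of projected subgradient descent for the lownuc objective is given below, under several assumptions: sources are stacked in an array of shape (G, F, N), the mask `M` is boolean, `T` is nonnegative, and all function names, the tolerance `1e-10`, and the default step size are illustrative. Because the objective need not decrease at every step, the sketch keeps track of the best iterate seen so far.

```python
import numpy as np

def objective(X, Y, lam):
    """0.5 * ||Y - sum_g X_g||_F^2 + lam * sum_g ||X_g||_*."""
    resid = Y - X.sum(axis=0)
    nuc = sum(np.linalg.svd(Xg, compute_uv=False).sum() for Xg in X)
    return 0.5 * np.sum(resid ** 2) + lam * nuc

def subgradient(X, Y, lam):
    """One subgradient: Z_g = -(Y - sum_l X_l) + lam * P_g Q_g^T."""
    resid = Y - X.sum(axis=0)
    Z = np.empty_like(X)
    for g, Xg in enumerate(X):
        P, s, Qt = np.linalg.svd(Xg, full_matrices=False)
        r = s > 1e-10  # keep singular vectors of nonzero singular values
        Z[g] = -resid + lam * (P[:, r] @ Qt[r, :])
    return Z

def project(X, M, T):
    """Projection onto {X >= 0, X = T on the mask}, assuming T >= 0."""
    return np.where(M, T, np.maximum(X, 0.0))

def projected_subgradient(Y, M, T, lam, mu0=0.1, n_iter=200):
    X = project(np.zeros_like(T, dtype=float), M, T)
    best, best_val = X, objective(X, Y, lam)
    for t in range(n_iter):
        mu_t = mu0 / np.sqrt(1.0 + t)  # step size mu_0 (1 + t)^(-1/2)
        X = project(X - mu_t * subgradient(X, Y, lam), M, T)
        val = objective(X, Y, lam)
        if val < best_val:  # not a descent method: remember the best point
            best, best_val = X, val
    return best, best_val
```

Each iteration costs one SVD per source, which is the "repeated calls to svd" trade-off mentioned for the convex formulation.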

14 Controlled experiments [Figure panels: (a) Overall, (b) First few seconds; legend: lownuc, nmf] Figure: (Left) Evolution of SDR as a function of CPU time (in seconds), for (green) our method and (red) NMF started from several initial points. SDR is a measure of how well we have separated sources (the higher the better).

15 Shrinkage of singular values [Plot with logarithmic vertical axis; legend: true, λ = 1.0e-04, λ = 1.0e-02, λ = 1.0e+00, λ = 1.0e+02, λ = 1.0e+04] Figure: Magnitude of singular values in decreasing order, for various values of λ. Dotted line is the true singular value profile.

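16 Smoothing technique [Nesterov, 2003]
$$\min_X \frac{1}{2} \Big\| Y - \sum_{g=1}^{G} X_g \Big\|_F^2 + \lambda \sum_{g=1}^{G} \|X_g\|_{*,\mu} \quad \text{s.t.} \quad M_g \odot X_g = M_g \odot T_g,\ X_g \ge 0$$
$\|\cdot\|_{*,\mu}$ is $C^{1,1}$ with Lipschitz constant $\frac{1}{\mu}$ and: $\|X\|_{*,\mu} \le \|X\|_* \le \|X\|_{*,\mu} + \mu c$ for all $X \in \mathbb{R}^{F \times N}$, where
$$\|X\|_* = \max\{\operatorname{Tr}(U^\top X) : \sigma_1(U) \le 1\}, \qquad \|X\|_{*,\mu} = \max\{\operatorname{Tr}(U^\top X) - \tfrac{\mu}{2}\|U\|_F^2 : \sigma_1(U) \le 1\}$$
Apply accelerated gradient descent to the smooth minimization problem. Small µ (µ → 0): slow convergence but accurate solutions. Large µ: fast but inaccurate solutions.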
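Assuming the smoothing term is $\tfrac{\mu}{2}\|U\|_F^2$, the maximization defining the smoothed norm has a closed form: the maximizer is $U = P \min(\Sigma/\mu, 1) Q^\top$, the value is a Huber function applied to each singular value, and $U$ is also the gradient. The sketch below computes both; the function name is hypothetical, and under this assumption the constant $c$ in the bound can be taken as $\min(F, N)/2$.

```python
import numpy as np

def smoothed_nuclear_norm(X, mu):
    """Value and gradient of max{<U, X> - mu/2 ||U||_F^2 : sigma_1(U) <= 1}."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    # Huber function of each singular value: quadratic below mu, linear above.
    value = np.where(s <= mu, s ** 2 / (2.0 * mu), s - mu / 2.0).sum()
    # Maximizer U: singular values clipped to [0, 1] after scaling by 1/mu.
    grad = (P * np.minimum(s / mu, 1.0)) @ Qt
    return value, grad
```

The value and gradient can be plugged into any accelerated gradient scheme; smaller values of `mu` tighten the approximation at the price of a larger Lipschitz constant `1 / mu`.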

17 Comparison with subgradient [Figure panels: objective value against CPU time, SDR against CPU time; legend: subg, smg 1e-1, smag] Figure: Decrease of the objective function as a function of the allowed CPU time, for various algorithms

18 Effect of µ [Figure panels: objective value against CPU time, SDR against CPU time; legend: mu 1e+00, mu 1e-01, mu 1e-02, mu 1e-03] Figure: Decrease of the objective function as a function of the allowed CPU time, for various values of µ. We display the original objective function: $\frac{1}{2} \Big\| Y - \sum_{g=1}^{G} X_g \Big\|_F^2 + \lambda \sum_{g=1}^{G} \|X_g\|_*$.

19 Conclusion Our formulation contributes to the field of informed source separation methods, where knowledge is directly relevant to the query audio track and involves interaction with the user. These methods are the state of the art in single-channel source separation benchmarks. Our convex formulation compares well with its NMF counterpart, even with a subgradient algorithm. The smoothing technique makes it possible to retrieve more accurate solutions for a given CPU budget. More complex constraints? E.g., source estimates should classify correctly: $\langle W, X_g \rangle + b \ge 0$.

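20 Proximal operator: $\operatorname{prox}(\tilde{X}) = \arg\min_X \frac{1}{2} \|X - \tilde{X}\|_F^2 + \lambda \|X\|_*$, s.t. $M_g \odot X_g = M_g \odot T_g$. Necessary and sufficient conditions:
$$0 = X - \tilde{X} + \lambda (P Q^\top + W) + M \odot E, \quad W^\top X = 0, \quad W X^\top = 0, \quad M \odot X = M \odot T, \quad \|W\|_{op} \le 1$$
where $E \in \mathbb{R}^{F \times N}$ are the Lagrange multipliers associated with the constraint $M \odot X = M \odot T$. Note that here, $X = P \Sigma Q^\top$ is an economy-size SVD of $X$ and not of $\tilde{X}$, so $P$ and $Q$ depend on $X$.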
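For intuition, consider the simpler case without the annotation constraint (empty mask, so the multiplier term vanishes). The proximal operator then reduces to soft-thresholding of the singular values of $\tilde{X}$. The sketch below covers only this unconstrained special case, not the constrained operator on the slide, and the function name is hypothetical.

```python
import numpy as np

def prox_nuclear_unconstrained(X_tilde, lam):
    """argmin_X 0.5 * ||X - X_tilde||_F^2 + lam * ||X||_* with no mask constraint."""
    P, s, Qt = np.linalg.svd(X_tilde, full_matrices=False)
    # Shrink every singular value by lam and clip at zero.
    return (P * np.maximum(s - lam, 0.0)) @ Qt
```

In this case the optimality condition reads $\tilde{X} - X = \lambda (P Q^\top + W)$, so $(\tilde{X} - X)/\lambda$ has operator norm at most 1, which is easy to check numerically.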

21 References
N.J. Bryan and G.J. Mysore. Interactive Refinement of Supervised and Semi-supervised Sound Source Separation Estimates. In ICASSP, 2013.
A. Lefèvre, F. Bach, and C. Févotte. Semi-supervised NMF with time-frequency annotations for single-channel source separation. In International Conference on Music Information Retrieval (ISMIR), 2012.
Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer, 2003.
