Singing voice enhancement for monaural music recordings with a cascade two-stage algorithm


Communication on Applied Mathematics and Computation, Vol. 32, No. 3, Sept. 2018
DOI 10.3969/j.issn.1006-6330.2018.03.007
Article ID 1006-6330(2018)03-0497-12; document code A

YU Shiwei, ZHANG Hongjuan
(College of Sciences, Shanghai University, Shanghai 200444, China)

Abstract In this paper, taking into account that the singing voice is neither a purely harmonic nor a purely percussive sound, we propose a cascade method for monaural singing voice enhancement. Under this framework, robust principal component analysis (RPCA) is first applied to decompose the spectrogram of the music mixture into a sparse singing voice part and a low-rank background music part. Under this strong assumption, some percussive components of the background music (e.g., the bass drum) are prone to be incorrectly assigned to the vocal part, although these percussive components are inherently more repetitive than the singing voice. The repeating pattern extraction technique (REPET) is therefore applied to extract them, leaving a purer singing voice. Evaluations on the public MIR-1K dataset show that the proposed method improves separation performance compared with three state-of-the-art methods.

Key words singing voice enhancement; robust principal component analysis; repetition pattern extraction

2010 Mathematics Subject Classification 93A30

Received 2016-08-06; revised 2017-05-15. Supported by project no. 11501351. Corresponding author e-mail: zhanghongjuan@shu.edu.cn

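Chinese Library Classification TN912.35

0 Introduction

Separating the singing voice from a music recording supports applications such as melody extraction [1], synchronization of lyrics with audio [2], and artist classification [3]. Monaural methods differ mainly in which property of the voice or the accompaniment they exploit.

Rafii and Pardo proposed the repeating pattern extraction technique (REPET) [4-5], which models the accompaniment as a repeating structure and treats the singing voice as the non-repeating remainder. Liutkus et al. extended this idea with adaptive filtering so that the repeating structure may change over time [6].

In [7], Huang et al. applied robust principal component analysis (RPCA), modelling the accompaniment as a low-rank matrix and the singing voice as a sparse matrix. Sparse and low-rank decompositions for singing voice separation are examined further in [8]. In [9], Yang represented both the singing voice and the accompaniment as low-rank with learned dictionaries.

Tachibana et al. enhanced the singing voice [10] by applying harmonic/percussive sound separation (HPSS) [11] in two stages on spectrograms of different resolutions. In [12], FitzGerald and Gainza combined median filtering [13] with factorisation techniques in an HPSS-style vocal separation. In [14], Zhu et al. proposed a multi-stage method based on non-negative matrix factorization (NMF).

With RPCA, some percussive components of the accompaniment tend to be assigned to the sparse vocal part. In this paper, RPCA is therefore followed by REPET, which extracts the repeating components that remain in the RPCA vocal estimate.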

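The extracted repeating components are merged with the low-rank part to form the final background music estimate. Experiments on the MIR-1K dataset evaluate the proposed method.

1 Proposed method

A music recording contains harmonic instruments, percussive instruments, and the singing voice. Harmonic sounds are smooth along time in the spectrogram, percussive sounds are smooth along frequency, and the singing voice varies in both directions, so it belongs to neither class. RPCA-based separation treats the singing voice as a sparse component of the spectrogram (see also [15]). Let H denote the harmonic component, V the singing voice, and P the percussive component. The mixture is modelled as

$$X = H + P + V, \tag{1}$$

where X is the spectrogram of the mixture and H, P, and V are the spectrograms of the harmonic, percussive, and vocal components.

1.1 Robust principal component analysis (RPCA)

RPCA decomposes the mixture spectrogram X into a sparse matrix E and a low-rank matrix A by solving

$$\min_{A,E} \; \|A\|_{*} + \lambda \|E\|_{1}, \quad \text{s.t. } X = A + E, \tag{2}$$

where λ > 0 balances the rank of A against the sparsity of E, $\|A\|_{*}$ is the nuclear norm of A (i.e., the sum of its singular values), and $\|E\|_{1}$ is the $L_1$ norm of E [16].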

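Problem (2) is solved with the alternating direction method of multipliers (ADMM) [17]. The augmented Lagrangian is

$$L(A, E, Y, \mu) = \|A\|_{*} + \lambda \|E\|_{1} + \langle Y, X - A - E \rangle + \frac{\mu}{2} \|X - A - E\|_{F}^{2}, \tag{3}$$

where Y is the Lagrange multiplier and μ > 0 is the penalty parameter. The procedure is summarized in Algorithm 1.

Algorithm 1 ADMM for RPCA
Input: X, λ.
Output: A, E.
(i) Initialize $Y_0 = X / J(X)$, $E_0 = 0$, $\mu_0 > 0$, $\rho > 1$, $t = 0$.
Repeat until convergence:
(ii) $U \Sigma V^{T} = \mathrm{svd}(X - E_t + \mu_t^{-1} Y_t)$; $A_{t+1} = U \, S_{1/\mu_t}(\Sigma) \, V^{T}$;
(iii) $E_{t+1} = S_{\lambda/\mu_t}(X - A_{t+1} + \mu_t^{-1} Y_t)$;
(iv) $Y_{t+1} = Y_t + \mu_t (X - A_{t+1} - E_{t+1})$;
(v) $\mu_{t+1} = \rho \mu_t$;
(vi) $t := t + 1$.

In Algorithm 1, ρ is the factor by which the penalty $\mu_t$ grows at each iteration, U and V contain the singular vectors of the singular value decomposition (SVD), Σ is the diagonal matrix of singular values, and $S_{\tau}(\cdot)$ is the element-wise soft-thresholding operator with threshold τ.

As Fig. 1(c) illustrates, the sparse part recovered by RPCA still contains percussive components of the accompaniment in addition to the singing voice [8]. RPCA alone therefore does not give a clean vocal estimate.

Fig. 1 (a) Spectrogram of the MIR-1K [18] clip Ani-1-01 mixed at 0 dB; (b), (c) spectrograms of the two components separated by RPCA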
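The iteration in Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the stopping rule, the default ρ = 1.5, the initial penalty μ_0 = 1.25/‖X‖_2, and the form J(X) = max(‖X‖_2, ‖X‖_∞/λ) are assumptions borrowed from common inexact augmented Lagrangian implementations, since the text only requires μ_0 > 0 and ρ > 1.

```python
import numpy as np


def soft_threshold(M, tau):
    """Element-wise shrinkage S_tau(M) = sign(M) * max(|M| - tau, 0)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def rpca_admm(X, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split X into a low-rank A and a sparse E, following steps (i)-(vi)."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    sigma_max = np.linalg.norm(X, 2)          # largest singular value of X
    # Assumed scaling J(X); the text does not spell it out.
    J = max(sigma_max, np.abs(X).max() / lam)
    Y = X / J                                 # step (i)
    E = np.zeros_like(X)
    mu = 1.25 / sigma_max                     # assumed initial penalty mu_0
    x_norm = np.linalg.norm(X, "fro")
    for _ in range(max_iter):
        # step (ii): singular value thresholding gives the low-rank part
        U, s, Vt = np.linalg.svd(X - E + Y / mu, full_matrices=False)
        A = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        # step (iii): element-wise shrinkage gives the sparse part
        E = soft_threshold(X - A + Y / mu, lam / mu)
        # steps (iv)-(v): multiplier and penalty updates
        R = X - A - E
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, "fro") <= tol * x_norm:
            break
    return A, E
```

For a magnitude spectrogram `V_mix`, `A, E = rpca_admm(V_mix)` would give the low-rank (accompaniment) and sparse (vocal) estimates; the sketch assumes `X` is not identically zero.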

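To address this, REPET is applied to the RPCA output to remove the repeating components that remain in the vocal estimate.

1.2 Repeating pattern extraction technique (REPET)

The REPET method of Rafii and Pardo [4-5] consists of three steps.

(i) Identification of the repeating period. The mixture signal x is transformed with the short-time Fourier transform (STFT) into X. Discarding the symmetric part of |X| gives the magnitude spectrogram V. The autocorrelation of each row of the power spectrogram $V^2$ (the element-wise square of V) gives the matrix B,

$$B(i,j) = \frac{1}{m-j+1} \sum_{k=1}^{m-j+1} V^{2}(i,k)\, V^{2}(i,k+j-1). \tag{4}$$

Averaging B over its rows and normalizing by the first term gives the beat spectrum b,

$$b(j) = \frac{1}{n} \sum_{i=1}^{n} B(i,j), \qquad b(j) \leftarrow \frac{b(j)}{b(1)}, \tag{5}$$

where i = 1, 2, ..., n indexes frequency, n = N/2 + 1, and j = 1, 2, ..., m indexes the time lag.

(ii) Repeating segment model. The repeating period p is estimated from the beat spectrum b, and the spectrogram V is divided into r segments of length p. The repeating segment model S is the element-wise median of the r segments,

$$S(i,l) = \operatorname*{median}_{k=1,\dots,r} \{ V(i, l + (k-1)p) \}, \tag{6}$$

where i = 1, 2, ..., n indexes frequency, l = 1, 2, ..., p indexes time within a segment, and p is the period length. Time-frequency bins of the repeating pattern take similar values in every segment, whereas non-repeating bins (the varying singing voice) do not, so the median keeps the repeating pattern and suppresses the non-repeating elements.

(iii) Extraction of the repeating pattern. The repeating spectrogram W is derived from S and V. Because the non-repeating spectrogram V − W is non-negative, W ≤ V must hold, so W is taken as the element-wise minimum of S and each segment of V,

$$W(i, l + (k-1)p) = \min\{ S(i,l),\; V(i, l + (k-1)p) \}. \tag{7}$$

Normalizing W by V gives the soft time-frequency mask M,

$$M(i,j) = \frac{W(i,j)}{V(i,j)}, \qquad M(i,j) \in [0,1]. \tag{8}$$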
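Steps (i) to (iii) can be sketched in NumPy as below. This is an illustrative sketch, not the reference REPET code: the function names are hypothetical, the period p is assumed to be supplied by the caller (for example, read off the peaks of the beat spectrum), and the handling of a trailing partial segment and of bins where V = 0 are assumptions.

```python
import numpy as np


def beat_spectrum(V):
    """Eqs. (4)-(5): row-wise autocorrelation of V**2, averaged over frequency."""
    P = np.asarray(V, dtype=float) ** 2
    n, m = P.shape
    B = np.empty((n, m))
    for lag in range(m):                      # lag = j - 1 in Eq. (4)
        B[:, lag] = (P[:, : m - lag] * P[:, lag:]).sum(axis=1) / (m - lag)
    b = B.mean(axis=0)
    return b / b[0]                           # normalise by the zero-lag term


def repeating_mask(V, p):
    """Eqs. (6)-(8): median model S, repeating spectrogram W, soft mask M."""
    V = np.asarray(V, dtype=float)
    n, m = V.shape
    r = int(np.ceil(m / p))
    # Assumption: a trailing partial segment is padded with NaN and ignored
    # by the median, so every frame still receives a model value.
    padded = np.full((n, r * p), np.nan)
    padded[:, :m] = V
    segments = padded.reshape(n, r, p)        # segments[i, k, l] = V(i, l + k*p)
    S = np.nanmedian(segments, axis=1)        # Eq. (6), shape (n, p)
    W = np.minimum(np.tile(S, (1, r))[:, :m], V)   # Eq. (7)
    M = np.ones_like(V)                       # assumed mask value where V == 0
    np.divide(W, V, out=M, where=V > 0)       # Eq. (8)
    return M
```

A bin that exceeds the median model, such as a vocal onset, receives a mask value below 1, while bins that match the repeating pattern receive a value of 1.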

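In Eq. (8), i = 1, 2, ..., n indexes frequency and j = 1, 2, ..., m indexes time frames. Entries of M close to 1 mark time-frequency bins that belong to the repeating pattern, and entries close to 0 mark non-repeating bins. Applying the mask M to the mixture STFT X gives the music spectrogram $X_m$ and the voice spectrogram $X_v$,

$$X_m(i,j) = M(i,j)\, X(i,j), \tag{9}$$

$$X_v(i,j) = (1 - M(i,j))\, X(i,j), \tag{10}$$

where i = 1, 2, ..., N and j = 1, 2, ..., m.

1.3 Proposed cascade algorithm

Fig. 2 shows the framework of the proposed algorithm. First, RPCA decomposes the mixture spectrogram X into a low-rank part $X_H$ and a sparse part $X_V^{1}$. RPCA rests on a strong assumption, namely that the accompaniment is low-rank and the singing voice is sparse. Percussive components of the accompaniment are also sparse in the spectrogram, so they tend to be assigned to the vocal part [8], and RPCA alone cannot separate them from the singing voice. Because these components are more repetitive than the singing voice, REPET is then applied to the RPCA vocal estimate $X_V^{1}$, splitting it into a repeating part $X_P$ and a non-repeating part $X_V$. The repeating part $X_P$ is added to the low-rank part $X_H$ to give the background music estimate $X_M$, $X_M = X_P + X_H$.

Fig. 2 Framework of the proposed cascade algorithm

The vocal and music spectrograms $X_{vocal}$ and $X_{music}$ are then obtained by Wiener filtering,

$$X_{vocal} = \frac{X_V}{X_V + X_M} \odot X, \tag{11}$$

$$X_{music} = \frac{X_M}{X_V + X_M} \odot X, \tag{12}$$

where X is the spectrogram of the original mixture, ⊙ denotes element-wise multiplication, and the division is element-wise.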
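The Wiener-type ratios in Eqs. (11) and (12) amount to a pair of complementary soft masks applied to the mixture. A minimal NumPy sketch, in which the function name and the small constant `eps` (added only to avoid division by zero) are assumptions:

```python
import numpy as np


def wiener_split(X, mag_voice, mag_music, eps=1e-12):
    """Apply the ratios of Eqs. (11)-(12) to the complex mixture STFT X.

    mag_voice and mag_music are the non-negative estimates X_V and X_M.
    """
    total = mag_voice + mag_music + eps       # eps is an assumed safeguard
    X_vocal = mag_voice / total * X           # Eq. (11)
    X_music = mag_music / total * X           # Eq. (12)
    return X_vocal, X_music
```

Because the two ratios sum to one in every bin, the two outputs add back up to the mixture (up to `eps`), which keeps the phase of X in both estimates.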

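Soft time-frequency masks $M_V$ and $M_B$ for the voice and the background music are then formed from $X_{vocal}$ and $X_{music}$ with the same Wiener-type ratio and applied to the mixture spectrogram:

$$M_V = \frac{X_{vocal}}{X_{vocal} + X_{music}}, \tag{13}$$

$$M_B = \frac{X_{music}}{X_{vocal} + X_{music}}, \tag{14}$$

$$X_{vocal} = M_V \odot X, \tag{15}$$

$$X_{music} = M_B \odot X. \tag{16}$$

Finally, the inverse short-time Fourier transform (ISTFT) converts the spectrograms back to time-domain signals.

2 Experiments

2.1 Dataset

The algorithm is evaluated on the MIR-1K dataset of Hsu and Jang [18], which contains 1 000 song clips sampled at 16 kHz with durations between 4 and 13 s. The clips are extracted from karaoke recordings, with the singing voice and the accompaniment stored in separate channels. For each of the 1 000 clips, the voice and the accompaniment are mixed at three voice-to-music ratios: −5 dB (music louder than voice), 0 dB (equal level), and +5 dB (voice louder than music). All 1 000 mixtures at each ratio are used in the evaluation.

2.2 Evaluation metrics

Performance is measured with the BSS EVAL toolbox v2.1, using the source to distortion ratio (SDR), the source to interference ratio (SIR), and the source to artifacts ratio (SAR). To measure the improvement over the unprocessed mixture, the normalized SDR (NSDR) is used,

$$\mathrm{NSDR}(\hat v; v; x) = \mathrm{SDR}(\hat v; v) - \mathrm{SDR}(x; v), \tag{17}$$

where $\hat v$ is the estimated singing voice, v is the original clean singing voice, and x is the mixture. NSDR is the gain in SDR obtained by replacing the mixture x with the estimate $\hat v$.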
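The normalization in Eq. (17) can be illustrated with a short NumPy sketch. Note the assumption: `simple_sdr` below is a plain signal-to-error energy ratio used as a stand-in, not the SDR of the BSS EVAL toolbox, which first decomposes the estimate into target, interference, and artifact terms. The function names are hypothetical.

```python
import numpy as np


def simple_sdr(estimate, reference):
    """Stand-in SDR in dB: reference energy over error energy (not BSS EVAL)."""
    error = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def nsdr(v_hat, v, x):
    """Eq. (17): SDR of the estimate minus SDR of the unprocessed mixture."""
    return simple_sdr(v_hat, v) - simple_sdr(x, v)
```

For example, if a separator scales the accompaniment left in the vocal estimate to one tenth of its amplitude in the mixture, the error energy drops by a factor of 100 and this stand-in NSDR is 20 dB.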

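The global NSDR (GNSDR) summarizes performance over the whole dataset,

$$\mathrm{GNSDR} = \frac{\sum_{i=1}^{N} w_i \, \mathrm{NSDR}(\hat v_i, v_i, x_i)}{\sum_{i=1}^{N} w_i}, \tag{18}$$

where $w_i$ is the length of the i-th clip and N is the number of clips. Higher values of SDR, SAR, SIR, NSDR, and GNSDR indicate better separation.

2.3 Comparison methods

The proposed cascade algorithm (CA) is compared with three methods: REPET, proposed by Rafii and Pardo [4]; RPCA, proposed by Huang et al. [7]; and MLRR, proposed by Yang [9]. For all methods, the spectrogram is computed with an STFT using a window length of 64 ms, a 1 024-point FFT, and adjacent windows overlapped by 25%. The parameter of RPCA is set to $\lambda = 1/\sqrt{\max(m,n)}$, and the parameter of the RPCA stage in CA is likewise set to $\lambda = 1/\sqrt{\max(m,n)}$.

2.4 Results

Fig. 3 shows the GNSDR of the four methods at voice-to-music ratios of −5, 0, and +5 dB. Fig. 4 shows the SDR, SIR, and SAR of RPCA (P), REPET (R), MLRR (M), and the proposed method (CA) at the same three ratios. Overall, the evaluations show that the proposed method improves separation performance relative to the three comparison methods.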

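Fig. 3 GNSDR of the singing voice separated by RPCA, REPET, MLRR, and the proposed method (CA) at −5, 0, and +5 dB; higher values indicate better separation

Fig. 4 SDR, SIR, and SAR of RPCA (P), REPET (R), MLRR (M), and the proposed cascade algorithm (CA) at −5, 0, and +5 dB on the MIR-1K dataset

The comparison with RPCA and REPET concerns how much of the accompaniment is removed from the vocal estimate (SIR) and how many artifacts the separation introduces (SAR).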

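Figs. 5 and 6 show, for a mixture at 0 dB, the spectrograms of the separated components obtained with RPCA, REPET, MLRR, and the proposed method CA, so that the results of CA can be compared visually with those of RPCA, REPET, and MLRR.

3 Conclusion

This paper proposed a cascade two-stage algorithm for singing voice enhancement that combines RPCA and REPET: RPCA first splits the mixture into a low-rank part and a sparse part, and REPET then removes the repeating components that remain in the sparse vocal estimate. Experiments on the 1 000 clips of the MIR-1K dataset show that the proposed method improves separation performance compared with the other methods.

Fig. 5 Separated spectrograms for the MIR-1K clip Ani-1-01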

Fig. 6 Separated spectrograms for the MIR-1K clip Ani-1-01 (second component)

References

[1] Han J, Chen C W. Improving melody extraction using probabilistic latent component analysis [C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2011: 33-36.
[2] Fujihara H, Goto M, Ogata J, Komatani K, Ogata T, Okuno H G. Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals [C]//Proceedings of International Symposium on Multimedia, 2006: 257-264.
[3] Berenzweig A, Ellis D P W, Lawrence S. Using voice segments to improve artist classification of music [C]//Proceedings of AES 22nd International Conference: Virtual, Synthetic, and Entertainment Audio, 2002: 1-8.
[4] Rafii Z, Pardo B. Repeating pattern extraction technique (REPET): a simple method for music/voice separation [J]. IEEE Transactions on Audio, Speech and Language Processing, 2013, 21(1): 73-84.

[5] Rafii Z, Pardo B. A simple music/voice separation method based on the extraction of the repeating musical structure [C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2011: 221-224.
[6] Liutkus A, Rafii Z, Badeau R, Pardo B, Richard G. Adaptive filtering for music/voice separation exploiting the repeating musical structure [C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2012: 53-56.
[7] Huang P S, Chen S D, Smaragdis P, Johnson M H. Singing voice separation from monaural recordings using robust principal component analysis [C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2012: 57-60.
[8] Yang Y H. On sparse and low-rank matrix decomposition for singing voice separation [C]//Proceedings of ACM International Conference on Multimedia, 2012: 757-760.
[9] Yang Y H. Low-rank representation of both singing voice and music accompaniment via learned dictionaries [C]//Proceedings of International Society for Music Information Retrieval Conference, 2013: 427-432.
[10] Tachibana H, Ono N, Sagayama S. Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms [J]. IEEE Transactions on Audio, Speech and Language Processing, 2014, 22(1): 228-237.
[11] Ono N, Miyamoto K, Le Roux J, Kameoka H, Sagayama S. Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram [C]//Proceedings of European Signal Processing Conference, 2008: 1-4.
[12] FitzGerald D, Gainza M. Single channel vocal separation using median filtering and factorisation techniques [J]. ISAST Transactions on Electronic and Signal Processing, 2010, 4(1): 62-73.
[13] FitzGerald D. Harmonic/percussive separation using median filtering [C]//Proceedings of International Conference on Digital Audio Effects (DAFx-10), 2010.
[14] Zhu B, Li W, Li R, Xue X. Multi-stage non-negative matrix factorization for monaural singing voice separation [J]. IEEE Transactions on Audio, Speech and Language Processing, 2013, 21(10): 2096-2107.
[15] Ikemiya Y, Yoshii K, Itoyama K. Singing voice analysis and editing based on mutually dependent F0 estimation and source separation [C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2015: 574-578.
[16] Candès E J, Li X, Ma Y, Wright J. Robust principal component analysis? [J]. Journal of the ACM, 2011, 58(3): 1-37.
[17] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers [J]. Foundations and Trends in Machine Learning, 2011, 3(1): 1-122.
[18] Hsu C L, Jang J S R. On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset [J]. IEEE Transactions on Audio, Speech and Language Processing, 2010, 18(2): 310-319.