MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION

Ikuo Degawa, Kei Sato, Masaaki Ikehara
EEE Dept., Keio University, Yokohama, Kanagawa, Japan

ABSTRACT

This paper investigates pitch estimation and instrument recognition for music signals. A note exemplar is a spectrum segment of a note of a specific pitch and instrument, stored in advance in a dictionary. We describe a method that reconstructs a frame of a musical signal as a linear combination of exemplars drawn from a large exemplar dictionary with a sparse (l1-minimized) coefficient vector. The reconstruction constraint is imposed on the KL divergence between spectra, which is found to produce better results than the Euclidean distance. Experiments show that the proposed algorithm can transcribe music pieces with relatively many notes per frame and can explicitly separate the instruments.

Index Terms: pitch estimation, instrument recognition, l1-regularized minimization, note exemplar.

1. INTRODUCTION

Research on content-based music information retrieval (MIR) has recently drawn increasing attention because of the explosive growth of digital music. Estimating multiple fundamental frequencies (pitches) and recognizing the instruments being played are important tasks for many applications, including automatic music transcription [9]. Many multiple-pitch detection systems have been proposed, such as methods based on spectral peaks and maximum-likelihood estimation [5] and on non-negative matrix factorization (NMF) [6]. On the other hand, few instrument recognition systems for estimated pitches have been proposed; one of them is [4], which handles both pitch and instrument features for real-time estimation at low computational cost.

Exemplar-based sparse representation aims at reconstructing an input signal vector y as a weighted sum of atoms of a dictionary matrix A, with y = Ax. Assuming that the weight vector x is sparse and that its entries correspond to the atoms of A, the information needed to classify or decompose the input signal can be extracted. The technique applies in principle to various systems, such as face recognition [2] and speech recognition [3]. Pitch estimation with this technique was attempted in [1], which is specialized to piano music and relies heavily on preprocessing such as selecting pitch candidates from spectral peaks. In our method, we apply exemplar-based sparse representation to multi-pitch estimation and then perform instrument recognition on the given musical excerpt. Since no preprocessing or retraining is necessary, the proposed method is simple, and the l1 minimization makes it powerful enough to handle musically complicated signals.

2. EXEMPLAR-BASED SPARSE REPRESENTATION

We perform pitch and instrument estimation on each frame individually. Fig. 1 shows an overview of exemplar-based sparse representation. Given an observation vector y_t at frame t, a nonnegative coefficient vector x_t with a sparsity constraint is determined by the following l1 minimization problem:

$\hat{x}_t = \arg\min_{x_t} \|x_t\|_1 \quad \text{s.t.}\; y_t = A x_t,\; x_t \geq 0$   (1)

In most cases y_t = A x_t is underdetermined (in other words, the dictionary A is overcomplete). We therefore reformulate (1) as

$\hat{x}_t = \arg\min_{x_t} \|y_t - A x_t\|_2^2 + \lambda \|x_t\|_1 \quad \text{s.t.}\; x_t \geq 0$   (2)

where λ is a positive regularization parameter. There are many methods for solving (2); one of them is the truncated Newton interior-point (TNIP) method [7], to which the nonnegativity constraint on the coefficients can easily be added.
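As a concrete illustration of the frame-wise sparse coding in (2), the following is a minimal projected proximal-gradient (ISTA) sketch in NumPy. It is a generic solver for the stated objective, not the TNIP method [7] used in the paper; the function name, λ value, and iteration count are illustrative only.

```python
import numpy as np

def sparse_code_l2(y, A, lam=0.1, n_iter=500):
    """Nonnegative l1-regularized reconstruction of one frame, as in (2).

    Minimizes 0.5 * ||y - A x||_2^2 + lam * ||x||_1 subject to x >= 0
    by projected proximal gradient (ISTA). A generic solver sketch,
    not the TNIP solver of [7].
    """
    x = np.zeros(A.shape[1])
    # Step size from the Lipschitz constant of the quadratic term's gradient.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                      # gradient of the data term
        x = np.maximum(0.0, x - step * (grad + lam))  # prox of lam*||.||_1 on x >= 0
    return x
```

The resulting coefficient vector plays the role of $\hat{x}_t$; its entries are later pooled per (pitch, instrument) pair into activation scores.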
Note that the minimization is carried out on each frame individually. One advantage of exemplar-based sparse representation is that it requires no learning process and no retraining of the dictionary when new note exemplars are added. Furthermore, it exploits the pitch range of each instrument automatically, because pitch candidates that are impossible for a specific instrument are simply absent from the dictionary. Prior information, such as which instruments are played in a music piece, is also valuable in the proposed system, because it is easy to reassemble the dictionary A from the note exemplars of the instruments under consideration.
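To give an idea of how such a dictionary might be reassembled, here is a hypothetical sketch in which each atom is one spectrum frame of a note of a given pitch and instrument. The helper structure and names are assumptions; the 5-frames-at-stride-2 sampling mirrors the exemplar preparation described later in Section 6.

```python
import numpy as np

def build_dictionary(note_samples, n_frames=5, stride=2):
    """Stack note exemplars into a dictionary A (illustrative sketch).

    note_samples: iterable of (instrument, midi_pitch, spectrogram), where
    spectrogram is a (freq_bins, time_frames) magnitude array for one note.
    Each selected frame becomes one column (atom) of A; labels record the
    (instrument, pitch) of each atom so activations can be pooled per note.
    """
    atoms, labels = [], []
    for instrument, pitch, spec in note_samples:
        # Take a few frames from the beginning of the note as exemplars.
        for f in range(0, n_frames * stride, stride):
            if f < spec.shape[1]:
                atoms.append(spec[:, f])
                labels.append((instrument, pitch))
    A = np.stack(atoms, axis=1)   # shape: (freq_bins, n_atoms)
    return A, labels
```

Restricting note_samples to the instruments known to appear in a piece yields the smaller, reassembled dictionary mentioned above.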

Fig. 1: Illustration of exemplar-based sparse representation of a music frame. (a) Input spectrum y_t. (b) Note-exemplar dictionary A. (c) Sparse coefficient vector x_t.

3. KL DIVERGENCE MINIMIZATION

To obtain better results, we use the generalized Kullback-Leibler (KL) divergence d(·,·) instead of the Euclidean distance:

$d(y, \hat{y}) = \sum_{k=1}^{K} \left( y_k \log \frac{y_k}{\hat{y}_k} - y_k + \hat{y}_k \right)$   (3)

The KL divergence has been found to give better accuracy than the Euclidean distance in many sound processing methods, such as [3], [12]. The minimization is then reformulated as

$\hat{x}_t = \arg\min_{x_t} \; d(y_t, A x_t) + \lambda \|x_t\|_1$   (4)

The cost function of (4) is minimized by first initializing the entries of the vector x to unity and then iteratively applying the update rule

$x \leftarrow x \odot \left( A^T \left( y \oslash (A x) \right) \right) \oslash \left( A^T \mathbf{1} + \lambda \right)$   (5)

where $\odot$ and $\oslash$ denote element-wise multiplication and division, respectively, and $\mathbf{1}$ is an all-one vector. The derivation of (5) is given in [3], [10].

4. NOTE NUMBER ESTIMATION

Once $\hat{x}_t$ is available, the activation score S(p, t, i) for frame t, pitch p, and instrument i is calculated by summing the elements of $\hat{x}_t$ that correspond to the note under consideration.

However, the activation score cannot be used directly, because deciding the number of notes (pitches) in a frame, and the instrument playing them, is a complex and challenging task. A musical note contains harmonics at integer multiples of the fundamental frequency, so a harmonic may be extracted as another note, and deciding the number of notes with a single fixed threshold can cause octave errors and similar mistakes. To address this problem, we developed a dynamic thresholding algorithm that decides the number of notes in a manner similar to the one proposed in [5] (see Fig. 2). First, silent frames are detected by summing the activation scores over p and thresholding at 0.1 times the maximum of this sum; the note number of these frames is set to 0. Second, for a given instrument and frame, we select the M largest pitches in order of the activation score S(p, t, i) and set the threshold to T·ΔS, where 0 < T < 1 is an experimentally learned constant and ΔS = S(1) − S(M). The note number is then the number of activation scores exceeding the threshold; in Fig. 2 the note number is decided to be 4. For all experiments in this paper the maximum polyphony M is set to 10, and T is determined empirically.

Fig. 2: Illustration of the note number decision algorithm.
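The multiplicative update (5) is simple to implement. The NumPy sketch below is a hedged transcription of that rule; the regularization weight and iteration count are placeholders rather than the paper's settings.

```python
import numpy as np

def sparse_code_kl(y, A, lam=0.1, n_iter=200, eps=1e-12):
    """Sparse nonnegative coding of one frame under the KL objective (4).

    Implements the multiplicative update (5): x is initialized to ones and
    iteratively rescaled. Nonnegativity is preserved automatically because
    every factor is nonnegative. lam and n_iter are illustrative values.
    """
    x = np.ones(A.shape[1])
    denom = A.sum(axis=0) + lam              # A^T 1 + lambda (column sums of A)
    for _ in range(n_iter):
        ratio = y / (A @ x + eps)            # y ./ (A x)
        x *= (A.T @ ratio) / (denom + eps)   # x .* A^T(y ./ Ax) ./ (A^T 1 + lam)
    return x
```

Summing the entries of the returned vector that belong to a given pitch p of instrument i gives the activation score S(p, t, i) used below.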

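The note-number decision of Section 4 can be sketched as follows for one frame of one instrument. The threshold formula encodes one plausible reading of the rule above (ΔS = S(1) − S(M)), T = 0.5 is an arbitrary placeholder rather than the paper's learned value, and silent-frame detection is omitted for brevity.

```python
import numpy as np

def decide_note_number(scores, T=0.5, M=10):
    """Dynamic thresholding over activation scores S(p, t, i) for fixed (t, i).

    scores: 1-D array of activation scores over all candidate pitches p.
    Returns the estimated note number and the indices of the active pitches.
    """
    order = np.argsort(scores)[::-1][:M]   # pitches of the M largest scores
    top = scores[order]                    # S(1) >= S(2) >= ... >= S(M)
    threshold = T * (top[0] - top[-1])     # assumed reading of T * delta_S
    active = order[top > threshold]
    return len(active), active
```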
5. TEMPORAL SMOOTHING

Polyphony estimation in a single frame is not robust, because deletion, insertion, and substitution errors occur frequently. Moreover, if a note is found active in a certain frame, it is very likely to be active in the subsequent frames as well. To exploit this, we adopt a smoothing technique based on a hidden Markov model (HMM), following [1]. For the p-th note (pitch), the problem is formulated as

$\hat{S}_p = \arg\max_{S_p} \prod_{t=1}^{T} P(o_t \mid s_t^p) \, P(s_t^p \mid s_{t-1}^p)$   (6)

where $\hat{S}_p$ is a state sequence, $s_t^p$ is the state of the note at time t, $o_t$ is the music frame beginning at time t, $P(o_t \mid s_t^p)$ is the probability of observing $o_t$ given $s_t^p$, and $P(s_t^p \mid s_{t-1}^p)$ is the transition probability between states. Using Bayes' theorem, $P(s_t^p \mid o_t) \propto P(o_t \mid s_t^p) P(s_t^p)$, so we solve

$\hat{S}_p = \arg\max_{S_p} \prod_{t=1}^{T} \frac{P(s_t^p \mid o_t)}{P(s_t^p)} \, P(s_t^p \mid s_{t-1}^p)$   (7)

instead of (6). $P(s_t^p \mid o_t)$ is obtained by dividing the activation score of the note by the maximum activation score at time t. Both the prior $P(s_t^p)$ and the state transition probability $P(s_t^p \mid s_{t-1}^p)$ can be learned from training data in MIDI format. The Viterbi algorithm is then applied to find the state sequence that maximizes (7); implementation details are described in [11]. Although it is also possible to take account of smoothness between notes (inter-note smoothness) with a coupled HMM, we use only (7), because a coupled HMM is computationally expensive. After the smoothing process, note estimates shorter than 100 ms (10 frames in this paper) are removed: since the pitches of music signals are locally stable, such short notes are considered to be noise.
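A per-note, two-state Viterbi pass maximizing (7) can be sketched as below. The inputs (frame-wise posteriors derived from normalized activation scores, a prior, and a transition matrix learned from MIDI data) are assumed, and this generic log-domain implementation is not the authors' exact code, which follows [11].

```python
import numpy as np

def smooth_note(post, prior, trans):
    """Viterbi smoothing of one note's on/off activity, following (7).

    post:  (T, 2) array with P(s_t = off/on | o_t) per frame.
    prior: length-2 array with P(s = off/on).
    trans: (2, 2) array, trans[i, j] = P(s_t = j | s_{t-1} = i).
    Returns the most likely binary state sequence (0 = off, 1 = on).
    """
    eps = 1e-12
    T = post.shape[0]
    emit = np.log(post + eps) - np.log(prior + eps)   # log[P(s|o) / P(s)]
    logt = np.log(trans + eps)
    delta = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    delta[0] = emit[0]
    for t in range(1, T):
        cand = delta[t - 1][:, None] + logt           # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)             # best previous state
        delta[t] = cand[back[t], [0, 1]] + emit[t]
    states = np.zeros(T, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):                    # backtrack
        states[t] = back[t + 1, states[t + 1]]
    return states
```

Runs shorter than 10 frames in the returned sequence would then be zeroed out, per the 100 ms pruning rule above.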
6. EVALUATION

In this section we present the estimation results. We conducted two experiments, single-instrument music transcription and multiple-instrument music transcription, which use different datasets (training and test samples). The sampling frequency of the music pieces is 44.1 kHz, and the frequency components below 8 kHz are used to produce the spectrogram. The STFT frame length is 100 ms, the hop size is 10% (10 ms), and the pitch range is MIDI 1 to MIDI 120. To reduce the amount of computation, we use only the first 30 s of each piece. Note samples are prepared from Logic Pro and the RWC music database [8]; from each sample, 5 frames taken at intervals of 2 frames from the beginning of the note are used as note exemplars.

Table 1: Results of monophonic music transcription (average over 60 music pieces).

(a) Frame-based results
       None     Gauss    HMM      Lee's [1]
  P    65.4%    62.2%    64.1%    74.4%
  R    65.7%    71.6%    73.3%    66.5%
  F    64.6%    65.5%    67.3%    70.2%

(b) Note-based results
       None     Gauss    HMM
  P    67.3%    62.1%    68.0%
  R    79.4%    67.1%    79.3%
  F    71.4%    63.2%    71.8%

Three standard metrics, precision (P), recall (R), and F-measure (F), are used for the evaluation:

$P = \frac{N_{tp}}{N_{tp} + N_{fp}}$   (8)

$R = \frac{N_{tp}}{N_{tp} + N_{fn}}$   (9)

$F = \frac{2PR}{P + R}$   (10)

where $N_{tp}$ is the number of correct pitch estimates (true positives), $N_{fp}$ is the number of inactive pitches estimated as active (false positives), and $N_{fn}$ is the number of active pitches estimated as inactive (false negatives), counted over all frames. In addition to this frame-based evaluation, we also perform a note-based evaluation of the proposed system, following [1]. If a note's state is consecutively active for more than ten frames, the onset of the note is considered to occur 10 ms before the end of the first active frame, because the hop size is 10 ms. We then compute the same three metrics as in the frame-based evaluation.
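The frame-based metrics (8)-(10) amount to a few lines of code. The sketch below operates on binary piano-roll matrices; it is a direct transcription of the formulas, not an official evaluation script.

```python
import numpy as np

def frame_metrics(est, ref):
    """Frame-based precision, recall, and F-measure as in (8)-(10).

    est, ref: binary piano rolls of shape (n_pitches, n_frames),
    with 1 where a pitch is active.
    """
    tp = np.sum((est == 1) & (ref == 1))   # correctly estimated pitches
    fp = np.sum((est == 1) & (ref == 0))   # inactive pitches estimated active
    fn = np.sum((est == 0) & (ref == 1))   # active pitches estimated inactive
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_measure
```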

6.1 Single-instrument music transcription

In the single-instrument music transcription experiment, we estimate only pitches from test pieces of the MAPS database [13], in which 60 music pieces were generated by a Steinway D piano and the underlying pitches can be obtained from the corresponding (aligned) MIDI files. The note exemplar dictionary consists of 4 instruments (alto sax, piano, vibraphone, and bass), each with 3 individual instances. Table 1 shows the performance of single-instrument music transcription. We compared three types of post-processing: None (no post-processing), Gauss (convolving a Gaussian function along time), and HMM (described in Section 5). The frame-based F-measure of 67.3% obtained with HMM in Table 1(a) is the best of the three post-processing methods, since this technique exploits the musical properties of the signal. The tables also indicate that, although the proposed method uses no preprocessing steps such as tuning adjustment or note-candidate selection and does not include various instrument spectra in the dictionary, the accuracy of our result is comparable to that of Lee's method [1] on the same test samples. In the note-based evaluation, the F-measures of None and HMM exceed 70%, but Gaussian smoothing degrades the result; we believe that the simple smoothing merges some notes with adjacent ones, so that their onsets vanish. An example piano-roll result is depicted in Fig. 3, with true positives shown in black, false positives in red, and false negatives in yellow. It can be seen that each note is clearly distinguished from adjacent notes, which makes the accuracy rates significantly high.

Fig. 3: Estimation result for liz_et4_SptkBGCl (best note-based result) with HMM post-processing. P = 97.3%, R = 82.2%, F = 89.1%.

6.2 Multiple-instrument music transcription

In the multiple-instrument music transcription experiment, we estimate pitches for multiple instruments. The composition of the note exemplar dictionary is the same as in the single-instrument experiment. Since there are few music datasets with corresponding score information such as aligned MIDI files, we mixed three music samples with Logic Pro from MIDI files of the RWC database for evaluation. In the estimation, the number of instruments is assumed to be known; we then take the activation scores of the two instruments i with the largest summed activation S and binarize them as described in Section 4 to generate piano rolls.

Figs. 4 and 5 show a result of the pitch and instrument estimation. They indicate that the proposed approach performs very well at discriminating multiple instruments whose pitch ranges partially overlap and at estimating their pitches independently. The F-measures in Tables 2 and 3 are not less than 60%. To the best of our knowledge, this is the highest reported accuracy for pitch estimation of multiple-instrument music.

Fig. 4: Estimation result for the vibraphone part of Lounge Away with HMM post-processing. P = 37.4%, R = 74.7%, F = 49.8%.

Fig. 5: Estimation result for the piano part of Lounge Away with HMM post-processing. P = 37.4%, R = 74.7%, F = 49.8%.

Table 2: Frame-based results of multiple-instrument music transcription for 3 music pieces.

  P    78.8%   74.2%   71.6%   55.9%   76.5%   49.7%
  R    61.3%   55.8%   56.1%   92.4%   57.7%   87.9%
  F    69.0%   63.7%   62.9%   69.7%   65.7%   63.5%

  P    75.1%   53.9%   66.7%   53.8%   71.2%   46.8%
  R    67.5%   73.8%   66.0%   94.9%   68.1%   91.8%
  F    71.1%   62.3%   66.3%   68.7%   69.6%   62.0%

  P    78.5%   70.4%   71.4%   54.4%   75.9%   48.3%
  R    68.2%   70.0%   66.5%   96.2%   66.3%   94.6%
  F    73.0%   70.2%   68.8%   69.5%   70.8%   63.9%

Table 3: Note-based results of multiple-instrument music transcription for 3 music pieces.

  P    88.8%   54.8%   72.6%   93.2%   79.4%   91.4%
  R    83.3%   83.3%   82.0%   92.1%   87.5%   81.3%
  F    85.9%   66.1%   77.0%   92.7%   83.2%   86.0%

  P    77.7%   15.7%   57.0%   94.0%   58.8%   81.1%
  R    69.9%   45.8%   73.7%   87.6%   77.8%   65.9%
  F    73.6%   23.4%   64.3%   90.7%   67.0%   72.7%

  P    88.8%   58.8%   72.9%   93.2%   79.8%   91.3%
  R    83.3%   83.3%   82.0%   92.1%   87.5%   80.2%
  F    85.9%   69.0%   77.2%   92.7%   83.5%   85.4%

7. CONCLUSION

In this paper we presented a system for detecting multiple pitches produced by multiple instruments. The core of the proposed system relies on the l1-norm minimization (4) and the use of the KL divergence. By nature, the system is free of learning and retraining processes for the exemplar dictionary. The results show that the proposed system is able to estimate pitches and recognize the instruments of music samples with a relatively large number of notes. As future work, we plan to apply the method to non-harmonic musical notes (mainly percussion) and to evaluate the effectiveness of dictionary-based sparse coding on them.

8. REFERENCES

[1] C. Lee, Y. Yang, and H. H. Chen, "Multipitch estimation of piano music by exemplar-based sparse representation," IEEE Trans. Multimedia, vol. 14, no. 3, June 2012.
[2] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, Feb. 2009.
[3] J. F. Gemmeke, T. Virtanen, and A. Hurmalainen, "Exemplar-based sparse representations for noise robust automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, Sept. 2011.
[4] A. Cont and S. Dubnov, "Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints," in Proc. 10th Int. Conf. on Digital Audio Effects (DAFx-07), Bordeaux, France, Sept. 2007.
[5] Z. Duan, B. Pardo, and C. Zhang, "Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, Nov. 2010.
[6] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Discriminative non-negative matrix factorization for multiple pitch estimation," in Proc. Int. Society for Music Information Retrieval Conf. (ISMIR), 2012.
[7] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An interior-point method for large-scale l1-regularized least squares," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, Dec. 2007.
[8] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, "RWC music database: Popular, classical and jazz music databases," in Proc. Int. Symposium on Music Information Retrieval (ISMIR), 2002.
[9] M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, "Content-based music information retrieval: Current directions and future challenges," Proc. IEEE, vol. 96, no. 4, Apr. 2008.
[10] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. Neural Information Processing Systems (NIPS), 2002.
[11] H. L. Lou, "Implementing the Viterbi algorithm," IEEE Signal Processing Magazine, vol. 12, no. 5, 1995.
[12] T. Virtanen, "Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 3, 2007.
[13] MAPS database: a piano database for multipitch estimation and automatic transcription of music.
