Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors
Kazumasa Yamamoto (Department of Computer Science, Chubu University, Kasugai, Aichi, Japan); Chikara Ishikawa, Koya Sahashi, Seiichi Nakagawa (Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan)

Abstract — Acoustic event detection plays an important role in computational acoustic scene analysis. Although overlapping sounds are common in real situations, conventional methods do not address this problem sufficiently. In this paper, we propose a new overlapped acoustic event detection technique that combines a source separation technique, non-negative matrix factorization (NMF) with shared basis vectors, with a deep-neural-network-based acoustic model to improve detection performance. Our approach achieved a frame-based F-measure 20.0% (absolute) higher than the best result of the D-CASE 2012 challenge.

I. INTRODUCTION

Acoustic Event Detection (AED) plays an important role in Computational Acoustic Scene Analysis (CASA) [1]. Applications such as lifelog tagging, security systems combined with image processing, and noise pollution detection have been considered for this technique [2], [3], [4], [5]. To detect an acoustic event, two general approaches have mainly been used. One is the use of acoustic models and features, as in an automatic speech recognition (ASR) system [6]: ASR systems usually use Gaussian Mixture Model (GMM) or Deep Neural Network (DNN) based likelihood calculators with Hidden Markov Models (HMMs) as the acoustic model, and Mel-frequency cepstral coefficients (MFCC) or log Mel-filterbank outputs (FBANK) as the acoustic features. The other is a source separation technique such as non-negative matrix factorization (NMF) [7].
The IEEE D-CASE 2012 workshop [8] was held for the CASA task. The challenge had two tasks: Acoustic Scene Classification and Acoustic Event Detection. As subtasks of AED, there were the Office Live (OL) task and the Office Synthetic (OS) task. For both subtasks, an office environment and 16 acoustic events were assumed as the acoustic condition. In the OL task, development and test tracks were recorded in a real room; these tracks had no overlapped segments of acoustic events. In the OS task, D-CASE provided development and test tracks that contained overlapped segments of acoustic events, created by artificially synthesizing the tracks. For these AED tasks, similarly to ASR methods, Vuegen et al. [9] proposed an MFCC-GMM based system to detect acoustic events for OL and OS; its AED performance was 43.4% frame-based F-measure for OL and 13.5% for OS. Gemmeke et al. [10] took an NMF-HMM based approach, which gave the best performance on OS in the challenge: 31.4% frame-based F-measure for OL and 21.3% for OS. The performance of these detection methods was still low, and the detection accuracy (F-measure) decreased in event-overlapping situations, as in ASR systems. The ability to detect multiple acoustic events in overlapping segments, which often appear in real applications, is an important factor. To improve the detection accuracy in overlapping segments, several points must be considered. One is the complex acoustic features of an event caused by overlap with other acoustic events or background noise, which leads to mismatch between acoustic features and acoustic models and makes the event hard to detect. Another is the variation of acoustic events: this task includes many sound source classes, not only human speech or voice. Some sound classes are partly very similar and have similar basis vectors in their NMF basis matrices.
Therefore, it is hard to discriminate the particular characteristics of each class by NMF. In this paper, we propose a shared-basis-vector method for NMF to detect overlapped acoustic events. Conventional NMF works with a basis (dictionary) matrix and an activation matrix. Ideally, the basis vectors of different classes should differ from each other. In practice, however, the basis matrix contains many similar component bases across event classes, as described above, and the NMF process tends either to spread low activation weights equally over these bases or to give a high activation weight to only one of them, which leads to misdetections, especially for overlapped acoustic events. In such cases, a basis vector in a common basis matrix that is shared among suitable classes can be a better basis representation. We believe this method helps the NMF process assign appropriate activation weights to the bases in overlapping segments. This paper is organized in four sections: the next section describes the conventional and proposed basis methods and our detection system; Section III shows the experimental results; finally, Section IV offers some conclusions.

II. ACOUSTIC EVENT DETECTION FRAMEWORK

Figure 1 shows the block diagram of our AED system. As preprocessing, the input signal (a mixed sound source signal) is separated into per-class sound signals by NMF. After the separation, each sound spectrum is converted into acoustic features (MFCC or FBANK in this paper). To calculate a likelihood score for each event, the converted acoustic features are fed to a deep neural network acoustic model. The scores are compared with pre-defined thresholds, and an acoustic event is detected when its score exceeds the threshold.

A. Conventional NMF

Non-negative matrix factorization (NMF) is an effective tool for separating acoustic events in an overlapping segment [11].
By using NMF, a time-frequency spectrogram matrix S can be approximated by the product of a class basis matrix W and an activation matrix H, i.e. S ≈ Ŝ = WH. With L the number of frequency bins, T the number of frames, and C the total number of basis vectors, the matrices S and Ŝ are of size L × T, W is L × C, and H is C × T. When W_n and H_n denote the basis matrix and the activation matrix of event class n (N is the number of classes), W and H can also be written as W = [W_1, W_2, ..., W_N] and H = [H_1^t, H_2^t, ..., H_N^t]^t (t denotes matrix transpose), and the observed spectrogram can be represented as the summation S ≈ Σ_{n=1}^{N} W_n H_n. In this paper, to separate a mixed signal into per-class signals, we used a Wiener-like separation filter:

Ŝ_n = S ⊙ (W_n H_n) / (Σ_{m=1}^{N} W_m H_m),   (1)

where ⊙ and the division are element-wise. The LBG algorithm (i.e. vector quantization) was used to make a basis matrix that represents the set of spectral bases of each event class [12].

Fig. 1. Block diagram of our acoustic event detection system

B. NMF with shared basis vectors

The conventional NMF method assumes source identity, which is a very important factor for separating a mixed signal into target class signals. However, it does not work well when there are many target classes, because of the basis similarity across classes. Indeed, when we worked on this task with a DNN acoustic model and conventional NMF, we obtained worse results than a DNN without any separation method (see Section III). To explore the conventional basis vectors, Figure 2 visualizes the distances between basis vectors within each class and across classes. Each cell shows the Euclidean distance between two basis vectors; the basis matrix used for this figure has 4 basis vectors per class. In the figure, red indicates a long distance and blue a short one. It is no surprise that the diagonal and the surrounding 4 × 4 blocks are blue, since these pairs are the same or very similar vectors. However, many of the distances between bases are small even across classes. This similarity of basis vectors causes the generation of inappropriate activation weights for each basis.

Fig. 2. Distance between basis vectors within class and over classes
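The Wiener-like separation filter of Eq. (1) can be sketched as follows; a minimal NumPy sketch in which the function and variable names are our own, not from the paper:

```python
import numpy as np

def wiener_separate(S, W_list, H_list, eps=1e-12):
    """Separate a mixed magnitude spectrogram S (L x T) into per-class
    spectrograms with the Wiener-like filter of Eq. (1).

    W_list[n] is the L x C_n basis matrix of class n, and H_list[n] the
    matching C_n x T activation matrix (both non-negative)."""
    recon = [W @ H for W, H in zip(W_list, H_list)]   # per-class W_n H_n
    total = np.sum(recon, axis=0) + eps               # sum over m of W_m H_m
    return [S * r / total for r in recon]             # element-wise filtering

# Toy check: two classes with disjoint spectral support separate exactly.
S = np.array([[2.0, 0.0], [0.0, 3.0]])
W = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
H = [np.array([[2.0, 0.0]]), np.array([[0.0, 3.0]])]
S1, S2 = wiener_separate(S, W, H)
```

Because the per-class reconstructions are divided by their sum, the separated spectrograms always add back up to S wherever the model explains any energy.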
Therefore, it is hard to obtain sufficient separation performance with conventional NMF on such data. To solve this problem, we propose NMF with shared basis vectors. This method uses a common basis matrix, built by the LBG algorithm from the data of all classes. A basis vector in the common basis matrix is linked to basis vectors in the class-wise basis matrices, and the activation of a common basis vector is shared by all the classes linked to it. To link a basis vector in the common basis matrix with multiple classes, we use the Euclidean distance between a basis vector in the common basis matrix and a basis vector from the class-wise basis matrix. We compare two linking criteria between a common basis vector and a class-wise basis vector: (a) Threshold selection (Figure 3): a common basis vector is linked to a class when its Euclidean distance to a class-wise basis vector is lower than a pre-defined threshold. A basis in the common basis matrix can thus be shared by several classes, and the number of bases per class is variable. (b) Constant selection (Figure 4): the k nearest common basis vectors are chosen for each class based on the Euclidean distance; the number of assigned basis vectors k is constant across all classes.

Fig. 3. Shared bases (a) - The number of bases for each class is variable, depending on the threshold.

Fig. 4. Shared bases (b) - The number of bases for each class is constant.

In [13], [14], Komatsu et al. reported a method for building an improved basis matrix for NMF with the same motivation. However, we believe our method can build more efficient basis vectors explicitly.
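The two linking criteria above can be sketched as follows (a NumPy sketch under our own naming; `thr` and `k` correspond to the pre-defined threshold and the constant k):

```python
import numpy as np

def link_threshold(common_W, class_W, thr):
    """Criterion (a): link a common basis vector to the class when its
    Euclidean distance to any class-wise basis vector is below thr.
    Returns the column indices of common_W assigned to this class."""
    # common_W: L x C_common, class_W: L x C_class; pairwise distances.
    d = np.linalg.norm(common_W[:, :, None] - class_W[:, None, :], axis=0)
    return np.where((d < thr).any(axis=1))[0]

def link_knearest(common_W, class_W, k):
    """Criterion (b): link the k common basis vectors nearest to the
    class, so every class gets a constant number of links."""
    d = np.linalg.norm(common_W[:, :, None] - class_W[:, None, :], axis=0)
    return np.argsort(d.min(axis=1))[:k]

# Toy example: 3 common basis vectors, one class with a single basis.
common_W = np.array([[1.0, 0.0, 0.6],
                     [0.0, 1.0, 0.8]])
class_W = np.array([[1.0], [0.0]])
linked_a = link_threshold(common_W, class_W, thr=1.0)
linked_b = link_knearest(common_W, class_W, k=2)
```

In the toy example both criteria pick columns 0 and 2, since column 1 lies at distance √2 from the class basis while the others are closer than 1.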
Fig. 5. Score plot (black line: silence; blue line: event (keys); red line: event (pendrop); top broken line: silence threshold; bottom broken line: event threshold)

C. Event detection

We use DNN output scores to detect acoustic events. The DNN score o_j of output unit (class) j is calculated as

o_j = Σ_{i=1}^{I} w_{ij} h_i + b_j,   (2)

where h_i is the value of unit i in the last hidden layer, w_{ij} the weight between output unit j and hidden unit i, b_j the bias of output unit j, and I the number of units in the last hidden layer. From o_j, the posterior p_j of unit j is calculated as

p_j = exp(o_j) / Σ_{j'=1}^{J} exp(o_{j'}),   (3)

where J is the number of output units (i.e. the number of classes). In this paper, we use p_j for silence detection and o_j for acoustic event detection. As an example, Figure 5 shows a score chart: the first vertical axis is the posterior p_j for silence, the second vertical axis is the DNN score o_j for the event classes, and the horizontal axis is the frame index. To detect acoustic events, we define two thresholds, one for silence and one for the event classes. We use the posterior only for silence detection, since a posterior varies depending on the number of simultaneous events. Additionally, we take a moving average of each class's score values to smooth them and avoid rapid score changes. Based on a preliminary experiment on the OL (Office Live) development tracks, the moving-average window size was set to ±9 frames. For comparison, we also used NMF alone and the DNN alone. In the NMF-only case, we detected events by comparing the sum of the estimated activation weights over all basis vectors of each class against pre-defined thresholds. In the DNN-only case, we used the DNN output values of Equation 2 for each class with pre-defined thresholds, without any source separation.

III. EXPERIMENTS

A. Experimental setup

We evaluate our method on IEEE D-CASE 2012 Task 2 OS (Office Synthetic) [8].
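The frame-wise scoring and detection of Section II-C (softmax posterior, ±9-frame moving average, thresholding) can be sketched as follows; the names are ours, and the DNN producing the raw scores is assumed to be given:

```python
import numpy as np

def softmax(o):
    """Posterior of Eq. (3) from the raw output scores of Eq. (2)."""
    e = np.exp(o - o.max(axis=-1, keepdims=True))  # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def smooth(scores, half_win=9):
    """Moving average over +/-half_win frames (shorter at the edges)."""
    T = len(scores)
    return np.array([scores[max(0, t - half_win):t + half_win + 1].mean()
                     for t in range(T)])

def detect(scores, threshold, half_win=9):
    """Frame-wise detection: active where the smoothed score exceeds
    the class threshold."""
    return smooth(scores, half_win) > threshold

# Toy example: a 20-frame burst of high scores inside a silent track.
scores = np.concatenate([np.zeros(20), np.ones(20), np.zeros(20)])
det = detect(scores, threshold=0.5)
```

Smoothing before thresholding trades a few frames of temporal precision at the event boundaries for robustness against single-frame score spikes.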
The test tracks of OS include acoustic event sequences with overlapped events. In this task, 16 event classes are defined: alert, clear-throat, cough, door-slam, drawer, keyboard, keys, knock, laughter, mouse, page-turn, pen-drop, phone, printer, speech, and switch. In the D-CASE 2012 challenge, training, development, and test sets are provided. The sound samples were recorded in a single channel at a rate of 44.1 kHz with 24-bit quantization. There are 20 training samples per class, with a total duration of about 15 minutes. As test material, the D-CASE 2012 challenge provides 12 tracks of 2 minutes each. The number of reference frames is 99,981, of which 15,180 are overlapped event frames. 1) NMF: The basis matrix for conventional NMF was made of normalized amplitude spectra. We first carried out a Fourier transform to obtain linear amplitude spectra, using a 20 ms Hamming analysis window with 50% overlap. From the spectra, we built a basis matrix for each class using the LBG algorithm, with the Euclidean distance as the distance measure between vectors. We adopted the KL divergence as the cost function for NMF [15]. In this experiment, we used four basis vectors per class. For our proposed basis matrix, we made a common basis matrix from the data of all classes. The number of basis vectors in this matrix was fixed at 64, and the number of basis vectors per class was set to 4 or 8. Using methods (a) and (b) described in Section II-B, we made links between basis vectors in the common (all-class) basis matrix and basis vectors in each class-wise basis matrix using the Euclidean distance. To find the optimum links, we used the following setups: (a) Threshold: 0.35, 0.40, 0.45, or 0.50 (with 4 basis vectors per class), or 0.40, 0.45, 0.50, or 0.55 (with 8 basis vectors per class); (b) Constant selection: select 4 or 5 basis vectors per class.
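With the basis matrix fixed, estimating the activations under the KL-divergence cost can be sketched with the standard multiplicative updates (a sketch under our own naming; the paper does not spell out its NMF implementation):

```python
import numpy as np

def estimate_activations(V, W, n_iter=1000, eps=1e-12):
    """Estimate activations H for a non-negative spectrogram V (L x T)
    given a fixed basis matrix W (L x C), using multiplicative updates
    that minimize the generalized KL divergence D(V || WH)."""
    L, T = V.shape
    C = W.shape[1]
    H = np.random.default_rng(0).random((C, T)) + eps  # random non-negative init
    ones = np.ones((L, T))
    for _ in range(n_iter):
        # Standard Lee-Seung update for the KL cost, W held fixed.
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
    return H

# Toy check: a spectrogram built from known activations is recovered.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H_true = np.array([[1.0, 2.0], [3.0, 0.5]])
V = W @ H_true
H = estimate_activations(V, W)
```

Keeping W fixed and updating only H is what makes this usable as a separation front end: the dictionaries are trained offline and only the activations are fitted per track.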
2) Acoustic feature: Before feature extraction, spectral subtraction (SS) was applied to suppress the background noise [16]. The subtraction coefficient was set to 2.0, and a flooring coefficient was also applied. To estimate the noise spectrum, we used the noise regions denoted in the labels during training, and the first 100 frames of each track during testing. As acoustic features, we used MFCC and FBANK. For feature extraction, the analysis Hamming window was 20 ms long with 50% overlap, the same as for NMF. The number of filters in the Mel filterbank was set to 33 for MFCC. We used 12-dimensional MFCC with log power, plus their temporal deltas and delta-deltas, so the final MFCC feature vector had 39 dimensions per frame. For FBANK, the number of channels was set to 45; with deltas and delta-deltas, as for MFCC, the final FBANK feature vector had 135 dimensions. 3) Acoustic model: As the acoustic model, we used a DNN. Since the original training data is not large enough to train a DNN acoustic model, we produced multi-condition training data by adding noise, which we originally recorded in several office rooms, to the original training data at SNRs of 20, 15, and 10 dB. We also added the OL development tracks to the multi-condition training set. We used a ±3-frame context as DNN input, so the input layer had 273 units for MFCC or 945 units for FBANK. Our DNN had 5 hidden layers of 512, 256, 128, 64, and 32 units, respectively, and the output layer had 17 units corresponding to the 16 acoustic event classes and silence. We used the rectified linear (ReLU) activation function and trained the DNN in a supervised manner without pre-training. We also evaluated a score combination of the two DNN acoustic models, MFCC-based and FBANK-based.
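The spectral subtraction step described above can be sketched as follows; note that the flooring coefficient `beta` below is an assumed placeholder, since the paper's value is not preserved in this text:

```python
import numpy as np

def spectral_subtraction(power, noise_power, alpha=2.0, beta=0.1):
    """Power spectral subtraction: subtract alpha times the noise
    estimate and floor the result at beta times the noise power.
    alpha=2.0 matches the paper; beta is an assumed placeholder."""
    cleaned = power - alpha * noise_power
    floor = beta * noise_power
    return np.maximum(cleaned, floor)

# Noise estimated from the first frames of the track, as done at test time.
frames = np.abs(np.random.default_rng(1).normal(size=(100, 161))) ** 2
noise_est = frames[:10].mean(axis=0)
enhanced = spectral_subtraction(frames, noise_est)
```

The floor prevents the subtraction from producing negative (or zero) power bins, which would otherwise break the subsequent log compression in MFCC/FBANK extraction.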
This method (called FUSION here) simply uses the maximum of the scores from the two acoustic models.
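The FUSION rule is just an element-wise maximum over the two models' frame-wise class scores, e.g.:

```python
import numpy as np

def fuse(scores_mfcc, scores_fbank):
    """FUSION: take the element-wise maximum of the frame-wise class
    scores from the MFCC-based and FBANK-based acoustic models."""
    return np.maximum(scores_mfcc, scores_fbank)

# Toy example with two classes: each model "wins" on one class.
fused = fuse(np.array([0.2, 0.9]), np.array([0.5, 0.1]))
```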
TABLE I. D-CASE 2012 challenge results for OS: frame-based F [%] for the Baseline [8], DHV [6], GVV [10], and VVK [9] systems.

TABLE II. Results using only NMF or only the DNN: R_overlap [%] and F [%] for the DNN (MFCC, FBANK, FUSION) and for NMF alone.

We hope to improve event detection performance by reflecting the characteristics of the different acoustic models in the scores. 4) Evaluation metric: We followed the evaluation metrics of the D-CASE 2012 challenge, defined as the frame-based Precision, Recall, and F-measure; frames are judged every 10 ms. With C the number of correctly detected event frames, E the total number of detected event frames, and GT the number of event frames in the reference labels, the metrics are defined as:

Precision[%] = (C / E) × 100   (4)

Recall[%] = (C / GT) × 100   (5)

F[%] = 2 × Precision × Recall / (Precision + Recall)   (6)

We also used the R_overlap measure, which is the Recall computed only over the event-overlapping frames. We compared the proposed method with the previous results of the D-CASE 2012 challenge.

B. Experimental Results

Tables I and II show the D-CASE 2012 results and our NMF-only and DNN-only results for OS. Using the DNN improved the F-measure over the D-CASE 2012 challenge results, especially with FBANK and FUSION. However, with NMF only we obtained just 14.4% F-measure. Moreover, when we combined conventional NMF with the DNN, the result was worse than the DNN alone (shown in Table IV). These results show that source separation based on conventional NMF does not work well here. Tables III and IV show the results of our proposed method, which reaches 37.6% F-measure with MFCC, 37.4% with FBANK, and 41.3% with FUSION at their respective best settings; R_overlap improved as well. The shared-basis method improved performance remarkably, by 27.2% absolute for MFCC and 16.8% for FBANK over NMF-DNN, as shown in Table IV. The best result came from FUSION with threshold selection (threshold = 0.45, 8 bases per class).
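The frame-based metrics of Eqs. (4)-(6) can be computed as follows (a sketch with our own function name):

```python
import numpy as np

def frame_metrics(ref, hyp):
    """Frame-based Precision, Recall, and F-measure of Eqs. (4)-(6).
    ref, hyp: boolean arrays of shape (n_frames, n_classes)."""
    C = np.logical_and(ref, hyp).sum()   # correctly detected event frames
    E = hyp.sum()                        # total detected event frames
    GT = ref.sum()                       # event frames in the reference
    precision = 100.0 * C / E if E else 0.0
    recall = 100.0 * C / GT if GT else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Toy example: 3 frames, 2 classes; 2 of 3 detections are correct.
ref = np.array([[1, 0], [1, 1], [0, 0]], dtype=bool)
hyp = np.array([[1, 0], [0, 1], [1, 0]], dtype=bool)
p, r, f = frame_metrics(ref, hyp)
```

Because Precision and Recall are already percentages, Eq. (6) needs no extra factor of 100. R_overlap is obtained by restricting `ref` and `hyp` to the frames where the reference contains more than one active class before computing the Recall.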
The threshold selection method is better than constant selection, as shown in Table III. With shared basis vectors, the laughter class had the most links to the common basis matrix (17 shared basis vectors), while alert and switch had only 5 each. On average, a class was linked to 11 common basis vectors, fewer than in the constant selection case. We surmise that the thresholding avoids creating links to unneeded basis vectors.

TABLE III. Results of the sharing basis method (NMF-DNN), F-measure [%], for threshold selection (threshold and number of bases per class) and constant selection (number of selected bases per class), each with MFCC and FBANK.

TABLE IV. Performance comparison (R_overlap [%] and F [%]) between the conventional methods (DNN, FUSION, NMF-DNN without sharing bases) and the proposed sharing-basis method (threshold = 0.45 or 0.55, 8 bases per class), each with MFCC, FBANK, and FUSION.

IV. CONCLUSION

In this paper, we proposed shared basis vectors to improve NMF separation and AED performance in event-overlapping segments. We evaluated the proposed method on the D-CASE 2012 challenge. Compared with the previous results, we obtained 41.3% frame-based F-measure at best, an absolute 20% improvement over the previous best challenge result. As future work, we are interested in other spectral reconstruction methods that use the original class-wise basis matrices, instead of the common basis matrix, to reconstruct the spectra after performing shared-basis NMF. We also plan to evaluate our method on the new AED challenge, D-CASE.

REFERENCES

[1] D. Wang and G. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Hoboken, NJ: J. Wiley & Sons.

[2] J. Salamon and J. P. Bello, "Unsupervised feature learning for urban sound classification," in Proc. IEEE ICASSP 2015, 2015.

[3] R. Radhakrishnan, A. Divakaran, and P. Smaragdis, "Audio analysis for surveillance applications," in Proc. IEEE WASPAA, 2005.

[4] M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani, "Acoustic event detection in speech overlapping scenarios based on high-resolution spectral input and deep learning," IEICE Transactions on Information and Systems, vol. E98-D, 2015.

[5] K. Yamamoto and K. Itou, "Browsing audio life-log data using acoustic and location information," in Proc. UBICOMM '09.

[6] A. Diment, T. Heittola, and T. Virtanen, "Sound event detection for office live and office synthetic AASP challenge," in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013.

[7] A. Mesaros, T. Heittola, O. Dikmen, and T. Virtanen, "Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations," in Proc. IEEE ICASSP 2015, 2015.

[8] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, "Detection and classification of acoustic scenes and events," IEEE Transactions on Multimedia, vol. 17, 2015.

[9] L. Vuegen, B. V. D. Broeck, P. Karsmakers, J. F. Gemmeke, B. Vanrumste, and H. Van hamme, "An MFCC-GMM approach for event detection and classification," in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013.

[10] J. F. Gemmeke, L. Vuegen, P. Karsmakers, and H. Van hamme, "An exemplar-based NMF approach to audio event detection," in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013.

[11] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in NIPS, 2000.

[12] S. Nakano, K. Yamamoto, and S. Nakagawa, "Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization," in Proc. INTERSPEECH 2011, 2011.

[13] T. Komatsu, Y. Senda, and R. Kondo, "Acoustic event detection based on non-negative matrix factorization with mixtures of local dictionaries and activation aggregation," in Proc. IEEE ICASSP 2016, 2016.

[14] T. Komatsu, T. Toizumi, R. Kondo, and Y. Senda, "Acoustic event detection method using semi-supervised non-negative matrix factorization with a mixture of local dictionaries," DCASE2016 Challenge, Tech. Rep., September 2016.

[15] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in NIPS 2000, 2000.

[16] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979.
More informationACOUSTIC SCENE CLASSIFICATION WITH MATRIX FACTORIZATION FOR UNSUPERVISED FEATURE LEARNING. Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard
ACOUSTIC SCENE CLASSIFICATION WITH MATRIX FACTORIZATION FOR UNSUPERVISED FEATURE LEARNING Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard LTCI, CNRS, Télćom ParisTech, Université Paris-Saclay, 75013,
More informationFeature Learning with Matrix Factorization Applied to Acoustic Scene Classification
Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard To cite this version: Victor Bisot, Romain Serizel, Slim Essid,
More informationAdapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017
Adapting Wavenet for Speech Enhancement DARIO RETHAGE JULY 12, 2017 I am v Master Student v 6 months @ Music Technology Group, Universitat Pompeu Fabra v Deep learning for acoustic source separation v
More informationNonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms
Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms Masahiro Nakano 1, Jonathan Le Roux 2, Hirokazu Kameoka 2,YuKitano 1, Nobutaka Ono 1,
More informationEEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1
EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle
More informationDeep NMF for Speech Separation
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Deep NMF for Speech Separation Le Roux, J.; Hershey, J.R.; Weninger, F.J. TR2015-029 April 2015 Abstract Non-negative matrix factorization
More informationCorrespondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure
Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars
More informationPHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS
PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu
More informationMulti-level Attention Model for Weakly Supervised Audio Classification
Multi-level Attention Model for Weakly Supervised Audio Classification Changsong Yu, Karim Said Barsim, Qiuqiang Kong and Bin Yang Institute of Signal Processing and System Theory, University of Stuttgart,
More informationComparing linear and non-linear transformation of speech
Comparing linear and non-linear transformation of speech Larbi Mesbahi, Vincent Barreaud and Olivier Boeffard IRISA / ENSSAT - University of Rennes 1 6, rue de Kerampont, Lannion, France {lmesbahi, vincent.barreaud,
More informationModeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Kornel Laskowski & Qin Jin Carnegie Mellon University Pittsburgh PA, USA 28 June, 2010 Laskowski & Jin ODYSSEY 2010,
More information2D Spectrogram Filter for Single Channel Speech Enhancement
Proceedings of the 7th WSEAS International Conference on Signal, Speech and Image Processing, Beijing, China, September 15-17, 007 89 D Spectrogram Filter for Single Channel Speech Enhancement HUIJUN DING,
More informationDominant Feature Vectors Based Audio Similarity Measure
Dominant Feature Vectors Based Audio Similarity Measure Jing Gu 1, Lie Lu 2, Rui Cai 3, Hong-Jiang Zhang 2, and Jian Yang 1 1 Dept. of Electronic Engineering, Tsinghua Univ., Beijing, 100084, China 2 Microsoft
More informationA TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme
A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus
More informationFEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION
FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION Sarika Hegde 1, K. K. Achary 2 and Surendra Shetty 3 1 Department of Computer Applications, NMAM.I.T., Nitte, Karkala Taluk,
More informationThis is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail.
Powered by TCPDF (www.tcpdf.org) This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. Author(s): Title: Heikki Kallasjoki,
More informationExploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Interspeech 2018 2-6 September 2018, Hyderabad Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics Pavlos Papadopoulos, Colin Vaz, Shrikanth Narayanan Signal
More informationarxiv: v1 [cs.sd] 29 Apr 2016
LEARNING COMPACT STRUCTURAL REPRESENTATIONS FOR AUDIO EVENTS USING REGRESSOR BANKS Huy Phan, Marco Maass, Lars Hertel, Radoslaw Mazur, Ian McLoughlin, and Alfred Mertins Institute for Signal Processing,
More informationBoundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks
INTERSPEECH 2014 Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks Ryu Takeda, Naoyuki Kanda, and Nobuo Nukaga Central Research Laboratory, Hitachi Ltd., 1-280, Kokubunji-shi,
More informationA State-Space Approach to Dynamic Nonnegative Matrix Factorization
1 A State-Space Approach to Dynamic Nonnegative Matrix Factorization Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo arxiv:179.5v1 [cs.lg] 31 Aug 17 Abstract Nonnegative matrix factorization
More informationFACTORS IN FACTORIZATION: DOES BETTER AUDIO SOURCE SEPARATION IMPLY BETTER POLYPHONIC MUSIC TRANSCRIPTION?
FACTORS IN FACTORIZATION: DOES BETTER AUDIO SOURCE SEPARATION IMPLY BETTER POLYPHONIC MUSIC TRANSCRIPTION? Tiago Fernandes Tavares, George Tzanetakis, Peter Driessen University of Victoria Department of
More informationSignal Modeling Techniques in Speech Recognition. Hassan A. Kingravi
Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction
More informationA METHOD OF ICA IN TIME-FREQUENCY DOMAIN
A METHOD OF ICA IN TIME-FREQUENCY DOMAIN Shiro Ikeda PRESTO, JST Hirosawa 2-, Wako, 35-98, Japan Shiro.Ikeda@brain.riken.go.jp Noboru Murata RIKEN BSI Hirosawa 2-, Wako, 35-98, Japan Noboru.Murata@brain.riken.go.jp
More informationMonaural speech separation using source-adapted models
Monaural speech separation using source-adapted models Ron Weiss, Dan Ellis {ronw,dpwe}@ee.columbia.edu LabROSA Department of Electrical Enginering Columbia University 007 IEEE Workshop on Applications
More informationBayesian Hierarchical Modeling for Music and Audio Processing at LabROSA
Bayesian Hierarchical Modeling for Music and Audio Processing at LabROSA Dawen Liang (LabROSA) Joint work with: Dan Ellis (LabROSA), Matt Hoffman (Adobe Research), Gautham Mysore (Adobe Research) 1. Bayesian
More informationNoise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm
EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic
More informationConstrained Nonnegative Matrix Factorization with Applications to Music Transcription
Constrained Nonnegative Matrix Factorization with Applications to Music Transcription by Daniel Recoskie A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the
More informationWhy DNN Works for Acoustic Modeling in Speech Recognition?
Why DNN Works for Acoustic Modeling in Speech Recognition? Prof. Hui Jiang Department of Computer Science and Engineering York University, Toronto, Ont. M3J 1P3, CANADA Joint work with Y. Bao, J. Pan,
More informationNONNEGATIVE FEATURE LEARNING METHODS FOR ACOUSTIC SCENE CLASSIFICATION
NONNEGATIVE FEATURE LEARNING METHODS FOR ACOUSTIC SCENE CLASSIFICATION Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard LTCI, Télécom ParisTech, Université Paris Saclay, F-75013, Paris, France Université
More informationREVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES
REVIEW OF SINGLE CHANNEL SOURCE SEPARATION TECHNIQUES Kedar Patki University of Rochester Dept. of Electrical and Computer Engineering kedar.patki@rochester.edu ABSTRACT The paper reviews the problem of
More informationSupport Vector Machines using GMM Supervectors for Speaker Verification
1 Support Vector Machines using GMM Supervectors for Speaker Verification W. M. Campbell, D. E. Sturim, D. A. Reynolds MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420 Corresponding author e-mail:
More informationGlobal SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks
Interspeech 2018 2-6 September 2018, Hyderabad Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks Rohith Aralikatti, Dilip Kumar Margam, Tanay Sharma,
More informationDiffuse noise suppression with asynchronous microphone array based on amplitude additivity model
Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, and Shoji Makino University
More informationEstimation of Cepstral Coefficients for Robust Speech Recognition
Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment
More informationReformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features
Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.
More informationExperiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition
Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden
More informationCS229 Project: Musical Alignment Discovery
S A A V S N N R R S CS229 Project: Musical Alignment iscovery Woodley Packard ecember 16, 2005 Introduction Logical representations of musical data are widely available in varying forms (for instance,
More informationMixtures of Gaussians with Sparse Structure
Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or
More informationCONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT
CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering
More informationNon-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs Paris Smaragdis TR2004-104 September
More informationModifying Voice Activity Detection in Low SNR by correction factors
Modifying Voice Activity Detection in Low SNR by correction factors H. Farsi, M. A. Mozaffarian, H.Rahmani Department of Electrical Engineering University of Birjand P.O. Box: +98-9775-376 IRAN hfarsi@birjand.ac.ir
More informationIMISOUND: An Unsupervised System for Sound Query by Vocal Imitation
IMISOUND: An Unsupervised System for Sound Query by Vocal Imitation Yichi Zhang and Zhiyao Duan Audio Information Research (AIR) Lab Department of Electrical and Computer Engineering University of Rochester
More informationDesign Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
More informationDeep Neural Networks
Deep Neural Networks DT2118 Speech and Speaker Recognition Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 45 Outline State-to-Output Probability Model Artificial Neural Networks Perceptron Multi
More informationCovariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data
Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data Jan Vaněk, Lukáš Machlica, Josef V. Psutka, Josef Psutka University of West Bohemia in Pilsen, Univerzitní
More informationDiscovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints
Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints Paul D. O Grady and Barak A. Pearlmutter Hamilton Institute, National University of Ireland Maynooth, Co. Kildare,
More informationProc. of NCC 2010, Chennai, India
Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay
More informationDetection-Based Speech Recognition with Sparse Point Process Models
Detection-Based Speech Recognition with Sparse Point Process Models Aren Jansen Partha Niyogi Human Language Technology Center of Excellence Departments of Computer Science and Statistics ICASSP 2010 Dallas,
More informationA Generative Model Based Kernel for SVM Classification in Multimedia Applications
Appears in Neural Information Processing Systems, Vancouver, Canada, 2003. A Generative Model Based Kernel for SVM Classification in Multimedia Applications Pedro J. Moreno Purdy P. Ho Hewlett-Packard
More informationDynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji
Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient
More informationHarmonic Structure Transform for Speaker Recognition
Harmonic Structure Transform for Speaker Recognition Kornel Laskowski & Qin Jin Carnegie Mellon University, Pittsburgh PA, USA KTH Speech Music & Hearing, Stockholm, Sweden 29 August, 2011 Laskowski &
More informationAn Evolutionary Programming Based Algorithm for HMM training
An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,
More informationOVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION
OVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION Na Lin, Haixin Sun Xiamen University Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry
More informationSUPERVISED NON-EUCLIDEAN SPARSE NMF VIA BILEVEL OPTIMIZATION WITH APPLICATIONS TO SPEECH ENHANCEMENT
SUPERVISED NON-EUCLIDEAN SPARSE NMF VIA BILEVEL OPTIMIZATION WITH APPLICATIONS TO SPEECH ENHANCEMENT Pablo Sprechmann, 1 Alex M. Bronstein, 2 and Guillermo Sapiro 1 1 Duke University, USA; 2 Tel Aviv University,
More informationCEPSTRAL analysis has been widely used in signal processing
162 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 2, MARCH 1999 On Second-Order Statistics and Linear Estimation of Cepstral Coefficients Yariv Ephraim, Fellow, IEEE, and Mazin Rahim, Senior
More informationAugmented Statistical Models for Speech Recognition
Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent
More informationNon-negative Matrix Factorization: Algorithms, Extensions and Applications
Non-negative Matrix Factorization: Algorithms, Extensions and Applications Emmanouil Benetos www.soi.city.ac.uk/ sbbj660/ March 2013 Emmanouil Benetos Non-negative Matrix Factorization March 2013 1 / 25
More informationShankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms
Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models. Say Wei Foo, Yong Lian, Liang Dong. IEEE Transactions on Circuits and Systems for Video Technology, May 2004. Shankar
More informationPattern Recognition Applied to Music Signals
JHU CLSP Summer School Pattern Recognition Applied to Music Signals 2 3 4 5 Music Content Analysis Classification and Features Statistical Pattern Recognition Gaussian Mixtures and Neural Nets Singing
More informationEXPLOITING LONG-TERM TEMPORAL DEPENDENCIES IN NMF USING RECURRENT NEURAL NETWORKS WITH APPLICATION TO SOURCE SEPARATION
EXPLOITING LONG-TERM TEMPORAL DEPENDENCIES IN NMF USING RECURRENT NEURAL NETWORKS WITH APPLICATION TO SOURCE SEPARATION Nicolas Boulanger-Lewandowski Gautham J. Mysore Matthew Hoffman Université de Montréal
More informationSupervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization Nasser Mohammadiha*, Student Member, IEEE, Paris Smaragdis,
More information