Robust Speaker Modeling Based on Constrained Nonnegative Tensor Factorization


Qiang Wu, Liqing Zhang, and Guangchuan Shi
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

Abstract. Nonnegative tensor factorization extends nonnegative matrix factorization (NMF) to the multilinear case by imposing nonnegativity constraints on the PARAFAC/Tucker model. In this paper, to identify speakers in noisy environments, we propose a new method based on the PARAFAC model, called constrained nonnegative tensor factorization (CNTF). The speech signal is encoded as a general higher-order tensor so that basis functions can be learned from multiple interrelated feature subspaces. We simulate a cochlea-like peripheral auditory stage that is motivated by the human auditory perception mechanism. CNTF extracts a sparse speech feature representation, which is used for robust speaker modeling. Orthogonality and nonsmooth sparseness-control constraints are also imposed on the PARAFAC model to preserve the useful information of each feature subspace in the higher-order tensor. An alternating projection algorithm is applied to obtain a stable solution. Experimental results demonstrate that our method can improve recognition accuracy, especially in noisy environments.

1 Introduction

Speaker recognition is the task of determining a person's identity from his or her voice. It has great potential applications in industry, business, security, and other areas. Feature extraction is one of the most important tasks in a speaker recognition system: it aims to find succinct, robust, and discriminative features in the acoustic data. Commonly used acoustic features include linear predictive cepstral coefficients (LPCC) [1], mel-frequency cepstral coefficients (MFCC) [1], and perceptual linear predictive coefficients (PLP) [2].
Conventional speaker modeling methods such as Gaussian mixture models (GMM) [3] achieve very high performance in speaker identification and verification tasks on high-quality data when training and testing conditions are well controlled. In real applications, however, such systems usually perform poorly on the wide variety of speech signals that are corrupted by adverse conditions such as environmental noise and channel distortion. Feature compensation techniques [2,4] such as cepstral mean subtraction (CMS) and RASTA have been developed for robust speech recognition. Spectral subtraction [5] and subspace-based filtering [6] techniques, which assume a priori knowledge of the noise spectrum, have been widely used because they are simple.

Recently, computational auditory nerve models and sparse coding have attracted much attention from both the neuroscience and speech signal processing communities. Smith and Lewicki [7] proposed an algorithm for learning efficient auditory codes, using a theoretical model that codes sound in terms of spikes. Much research on sparse coding and sparse representation of sound and speech [8,9,10] has also proved useful for auditory modeling and speech separation, which makes it a promising route to robust speech feature extraction.

F. Sun et al. (Eds.): ISNN 2008, Part I, LNCS 5263, © Springer-Verlag Berlin Heidelberg 2008

Multilinear algebra of higher-order tensors is a powerful data modeling tool for pattern recognition, and it has been proposed as a mathematical framework for handling the multiple factors that underlie the observations. Common tensor decomposition methods currently include: (1) the CANDECOMP/PARAFAC model [11,12,13]; (2) the Tucker model [14,15]; and (3) nonnegative tensor factorization (NTF), which imposes nonnegativity constraints on the CANDECOMP/PARAFAC model [16,17]. In computer vision, multilinear ICA [18] and tensor discriminant analysis [19] have been applied to image representation and recognition, and they improve recognition performance.

In this paper, we propose a new feature extraction method for robust speaker recognition based on an auditory periphery model and tensor factorization. We derive a novel tensor factorization method, called CNTF, by imposing orthogonality and nonnegativity constraints on the tensor structure. Our feature extraction method has the following advantages: (1) simulating the human auditory perception mechanism provides higher frequency resolution at low frequencies, which helps to obtain robust spectro-temporal features; (2) a supervised feature extraction procedure via CNTF learns the basis functions of multiple interrelated feature subspaces, which preserves the individual spectro-temporal information in the tensor structure, and the orthogonality constraint further minimizes redundancy between different basis functions; (3) the sparseness constraint on CNTF enhances the energy concentration of the speech signal, which preserves the useful features during noise reduction.
The sparse tensor feature extracted by CNTF can be further processed with the discrete cosine transform into a representation called the auditory-based nonnegative tensor feature (ANTF), which can be used as the feature for speaker recognition.

2 Method

2.1 Multilinear Algebra and PARAFAC Model

Multilinear algebra is the algebra of higher-order tensors. A tensor is a higher-order generalization of a matrix. Let $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ denote a tensor; the order of $\mathcal{X}$ is $M$. An element of $\mathcal{X}$ is denoted by $x_{n_1, n_2, \ldots, n_M}$, where $1 \le n_d \le N_d$ and $1 \le d \le M$. The mode-$d$ matricization (matrix unfolding) of an $M$th-order tensor $\mathcal{X}$ rearranges its elements to form the matrix $X_{(d)} \in \mathbb{R}^{N_d \times N_{d+1} N_{d+2} \cdots N_M N_1 \cdots N_{d-1}}$. The columns of this matrix are the vectors in $\mathbb{R}^{N_d}$ obtained by varying the index $n_d$ while the other indices are held fixed. Matricizing a tensor is analogous to vectorizing a matrix.

The PARAFAC model was proposed independently by Carroll and Chang [11] under the name CANDECOMP (canonical decomposition) and by Harshman [12] under the name PARAFAC (parallel factor analysis). It has gained increasing attention in the data mining field. The model resembles the structure of many physical models of common real-world data. Its uniqueness property implies that data following the PARAFAC model can, under mild conditions, be uniquely decomposed into individual contributions.
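The short NumPy sketch below makes the unfolding concrete. Its column order is our own assumption, and the function name `unfold` is illustrative. In the remaining indices, $n_{d+1}$ varies fastest and $n_{d-1}$ slowest. We chose this order because it pairs with the Khatri-Rao ordering used in the PARAFAC matrix form (4). Other texts order the columns differently. Mode indices are 0-based in the code.

```python
import numpy as np


def unfold(X: np.ndarray, d: int) -> np.ndarray:
    """Mode-d matricization X_(d) of an M-way array (0-based mode index d).

    Rows are indexed by n_d. In the paper's 1-based notation the remaining
    indices are ordered n_{d-1}, ..., n_1, n_M, ..., n_{d+1}, with n_{d+1}
    varying fastest (an assumed convention).
    """
    M = X.ndim
    # Axis order: d first, then d-1, d-2, ... wrapping around to d+1.
    order = [d] + [(d - k) % M for k in range(1, M)]
    # A C-order reshape makes the last listed axis (d+1) vary fastest.
    return np.transpose(X, order).reshape(X.shape[d], -1)


X = np.arange(24, dtype=float).reshape(2, 3, 4)   # small 2 x 3 x 4 example tensor
for d in range(X.ndim):
    print(d, unfold(X, d).shape)
```

For the 2 × 3 × 4 example, the three unfoldings have shapes (2, 12), (3, 8) and (4, 6). Each column is a mode-$d$ fiber: for instance, the first column of the 0-based mode-0 unfolding is `X[:, 0, 0]`.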

A rank-one $M$-way tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ is one that can be written as the outer product of $M$ vectors:

$$\mathcal{X} = \mathbf{a}^{(1)} \circ \mathbf{a}^{(2)} \circ \cdots \circ \mathbf{a}^{(M)}, \tag{1}$$

where $\circ$ is the outer product operator and $\mathbf{a}^{(d)} \in \mathbb{R}^{N_d}$ for $d = 1, 2, \ldots, M$. The rank of $\mathcal{X}$, denoted $R = \operatorname{rank}(\mathcal{X})$, is the minimal number of rank-one tensors whose sum equals $\mathcal{X}$:

$$\mathcal{X} = \sum_{r=1}^{R} A^{(1)}_{:,r} \circ A^{(2)}_{:,r} \circ \cdots \circ A^{(M)}_{:,r}, \tag{2}$$

where $A^{(d)}_{:,r}$ is the $r$th column of the mode matrix $A^{(d)} \in \mathbb{R}^{N_d \times R}$. The PARAFAC model seeks a rank-$R$ approximation of $\mathcal{X}$:

$$\mathcal{X} \approx \sum_{r=1}^{R} A^{(1)}_{:,r} \circ A^{(2)}_{:,r} \circ \cdots \circ A^{(M)}_{:,r}. \tag{3}$$

The PARAFAC model can also be written in matrix notation by using the Khatri-Rao product, which gives the equivalent expressions

$$X_{(d)} \approx A^{(d)} \left[ A^{(d-1)} \odot \cdots \odot A^{(1)} \odot A^{(M)} \odot \cdots \odot A^{(d+1)} \right]^{T}, \tag{4}$$

where $\odot$ denotes the Khatri-Rao (column-wise Kronecker) product.

2.2 Constrained Nonnegative Tensor Factorization

Given a nonnegative $M$-way tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$, nonnegative tensor factorization (NTF) seeks a factorization of the form

$$\mathcal{X} \approx \hat{\mathcal{X}} = \sum_{r=1}^{R} A^{(1)}_{:,r} \circ A^{(2)}_{:,r} \circ \cdots \circ A^{(M)}_{:,r}, \tag{5}$$

where the mode matrices $A^{(d)} \in \mathbb{R}^{N_d \times R}$, $d = 1, \ldots, M$, may contain only nonnegative elements. To find an approximate factorization $\hat{\mathcal{X}}$, we construct a least-squares cost function $J_{LS}$ and a KL-divergence cost function $J_{KL}$ based on the matrix form (4). In terms of the mode matrices $A^{(d)}$, the cost functions are

$$J_{LS1}(A^{(d)}) = \frac{1}{2} \sum_{d=1}^{M} \left\| X_{(d)} - A^{(d)} Z^{(d)} \right\|_F^2 = \frac{1}{2} \sum_{d=1}^{M} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} - [A^{(d)} Z^{(d)}]_{pq} \right)^2, \tag{6}$$

$$J_{KL1}(A^{(d)}) = \sum_{d=1}^{M} D\left( X_{(d)} \,\|\, A^{(d)} Z^{(d)} \right) = \sum_{d=1}^{M} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} \log \frac{[X_{(d)}]_{pq}}{[A^{(d)} Z^{(d)}]_{pq}} - [X_{(d)}]_{pq} + [A^{(d)} Z^{(d)}]_{pq} \right), \tag{7}$$

where $Z^{(d)} = \left[ A^{(d-1)} \odot \cdots \odot A^{(1)} \odot A^{(M)} \odot \cdots \odot A^{(d+1)} \right]^{T}$ and $\bar{N}_d = \prod_{j \neq d} N_j$. These cost functions are similar to those of NMF [20]: a matrix factorization is performed in each mode, and the error is minimized over all modes.

This model allows an additional constraint that makes the basis functions as orthogonal as possible, i.e., that minimizes redundancy between different basis functions. The orthogonality constraint can be imposed by minimizing $\sum_{p \neq q} [A^{(d)T} A^{(d)}]_{pq}$.

For traditional NMF, many approaches control sparseness through additional constraints or penalty terms. These constraints or penalties can be applied to the basis vectors alone or to both the basis and the encoding vectors. The nsNMF model [22] uses the factorization $V = WSH$ with a smoothing matrix $S \in \mathbb{R}^{q \times q}$ given by

$$S = (1 - \theta) I + \frac{\theta}{q} \mathbf{1}\mathbf{1}^{T}, \tag{8}$$

where $I$ is the identity matrix, $\mathbf{1}$ is a vector of ones, and the parameter $\theta$ satisfies $0 \le \theta \le 1$. For $\theta = 0$, the model reduces to the original NMF. As $\theta \to 1$, $S$ imposes stronger smoothing, which forces strong sparseness on both $W$ and $H$. This nonsmooth approach controls the sparseness of the basis and encoding vectors while keeping the model faithful to the data.

The same idea can be applied to NTF, with $S \in \mathbb{R}^{R \times R}$ (i.e., $q = R$). With the orthogonality and sparseness-control constraints added, the cost functions become

$$J_{LS2}(A^{(d)}) = \sum_{d=1}^{M} \left( \frac{1}{2} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} - [A^{(d)} S Z^{(d)}]_{pq} \right)^2 + \alpha \sum_{p \neq q} [A^{(d)T} A^{(d)}]_{pq} \right), \tag{9}$$

$$J_{KL2}(A^{(d)}) = \sum_{d=1}^{M} \left( \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} \log \frac{[X_{(d)}]_{pq}}{[A^{(d)} S Z^{(d)}]_{pq}} - [X_{(d)}]_{pq} + [A^{(d)} S Z^{(d)}]_{pq} \right) + \alpha \sum_{p \neq q} [A^{(d)T} A^{(d)}]_{pq} \right), \tag{10}$$

where $\alpha > 0$ is a parameter that balances reconstruction accuracy against orthogonality.
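The NumPy sketch below lets the matrix form (4) and the constrained costs (8)-(10) be checked numerically. It illustrates these formulas under our own assumptions and is not the authors' code:

- Mode indices are 0-based.
- The unfolding uses the cyclic column order that matches the Khatri-Rao product in (4).
- A small `eps` guards the logarithm in the KL cost.
- `update_ls` performs one multiplicative step of the LS rule (11) stated next, with the orthogonality term computed as $\sum_{p \neq j} A^{(d)}_{ip}$.

All function names are hypothetical.

```python
import numpy as np


def khatri_rao(mats):
    """Column-wise Kronecker product mats[0] ⊙ mats[1] ⊙ ... (all with R columns)."""
    out = mats[0]
    for B in mats[1:]:
        R = out.shape[1]
        # Column r becomes kron(out[:, r], B[:, r]); the row index of B varies fastest.
        out = np.einsum("ir,jr->ijr", out, B).reshape(-1, R)
    return out


def unfold(X, d):
    """Mode-d unfolding (0-based d) with columns ordered to match Eq. (4)."""
    M = X.ndim
    order = [d] + [(d - k) % M for k in range(1, M)]
    return np.transpose(X, order).reshape(X.shape[d], -1)


def z_matrix(factors, d):
    """Z^(d) = [A^(d-1) ⊙ ... ⊙ A^(1) ⊙ A^(M) ⊙ ... ⊙ A^(d+1)]^T."""
    M = len(factors)
    return khatri_rao([factors[(d - k) % M] for k in range(1, M)]).T


def smoothing_matrix(R, theta):
    """nsNMF smoothing matrix of Eq. (8): (1 - theta) I + (theta / R) 1 1^T."""
    return (1.0 - theta) * np.eye(R) + (theta / R) * np.ones((R, R))


def ortho_penalty(A):
    """Sum of the off-diagonal entries of A^T A (the orthogonality term)."""
    G = A.T @ A
    return G.sum() - np.trace(G)


def cost_ls2(X, factors, S, alpha):
    """Constrained least-squares cost J_LS2 of Eq. (9)."""
    total = 0.0
    for d, A in enumerate(factors):
        E = unfold(X, d) - A @ S @ z_matrix(factors, d)
        total += 0.5 * np.sum(E ** 2) + alpha * ortho_penalty(A)
    return total


def cost_kl2(X, factors, S, alpha, eps=1e-12):
    """Constrained generalized KL cost J_KL2 of Eq. (10); eps guards log(0)."""
    total = 0.0
    for d, A in enumerate(factors):
        Xd = unfold(X, d)
        Y = A @ S @ z_matrix(factors, d) + eps
        total += np.sum(Xd * np.log((Xd + eps) / Y) - Xd + Y)
        total += alpha * ortho_penalty(A)
    return total


def update_ls(X, factors, d, S, alpha, eps=1e-12):
    """One multiplicative step for A^(d) in the form of the LS rule (11)."""
    A = factors[d]
    H = S @ z_matrix(factors, d)                  # S Z^(d), shape R x N̄_d
    num = unfold(X, d) @ H.T                      # [X_(d) Z^(d)T S^T]
    off = A.sum(axis=1, keepdims=True) - A        # sum over p != j of A_ip
    den = A @ H @ H.T + alpha * off + eps         # [A S Z Z^T S^T] + alpha * off
    return A * num / den


rng = np.random.default_rng(0)
R = 3
factors = [rng.random((n, R)) for n in (4, 5, 6)]
X = np.einsum("ir,jr,kr->ijk", *factors)          # exact nonnegative rank-3 tensor
S0 = smoothing_matrix(R, theta=0.0)               # theta = 0: S = I, plain NTF
print(cost_ls2(X, factors, S0, alpha=0.0))        # close to 0 for an exact model
```

With $\theta = 0$ and $\alpha = 0$, every mode term of (9) equals the same plain NTF reconstruction error. A single `update_ls` step therefore cannot increase the cost, in line with the Lee-Seung result for NMF [20].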
Multiplicative learning algorithms for the mode matrices $A^{(d)}$ can be derived using the exponentiated gradient, as in NMF. Following [16,17], the element-wise update rules for minimizing the cost functions (9) and (10) are:

LS:
$$A^{(d)}_{ij} \leftarrow A^{(d)}_{ij} \frac{[X_{(d)} Z^{(d)T} S^{T}]_{ij}}{[A^{(d)} S Z^{(d)} Z^{(d)T} S^{T}]_{ij} + \alpha \sum_{p \neq j} [A^{(d)T}]_{pi}}, \tag{11}$$

KL:
$$A^{(d)}_{ij} \leftarrow A^{(d)}_{ij} \frac{\sum_{k} [S Z^{(d)}]_{jk} \, [X_{(d)}]_{ik} / [A^{(d)} S Z^{(d)}]_{ik}}{\sum_{k} [S Z^{(d)}]_{jk} + \alpha \sum_{p \neq j} [A^{(d)T}]_{pi}}. \tag{12}$$

3 Feature Extraction Based on Auditory Model and Tensor Representation

The human auditory system is highly capable at both speech recognition and speaker recognition. Research on auditory models has shown that features based on simulating the auditory system are more robust against noisy backgrounds than traditional features. In our feature extraction framework, we compute frequency selectivity information by imitating the processing performed in the auditory periphery and pathway. We then obtain robust speech features by projecting the extracted auditory information onto multiple interrelated feature subspaces via CNTF. Fig. 1 shows a diagram of the feature extraction and speaker recognition framework.

Fig. 1. Feature extraction and recognition framework (blocks: Pre-Emphasis, Cochlear Filters, Nonlinearity, X, A, DCT, GMM, Recognition Result).

3.1 Feature Extraction Based on Auditory Model

We extract the features by imitating the processing that occurs in the auditory periphery and pathway, including the outer ear, middle ear, basilar membrane, inner hair cells, auditory nerves, and cochlear nucleus. We model the combined outer- and middle-ear functions with traditional pre-emphasis, $x_{\mathrm{pre}}(t) = x(t) - 0.97\, x(t-1)$. Here $x(t)$ is the discrete-time speech signal, $t = 1, 2, \ldots$, and $x_{\mathrm{pre}}(t)$ is the filtered output signal. A bank of cochlear filters simulates the frequency selectivity of the peripheral auditory system, such as the basilar membrane. The impulse responses of these filters have the form

$$g_i(t) = a_i t^{n-1} e^{-2\pi b_i \mathrm{ERB}(f_i) t} \cos(2\pi f_i t + \phi_i), \quad 1 \le i \le N, \tag{13}$$

where $n$ is the order of the filters and $N$ is the number of filters. For the $i$th filter, $f_i$ is the center frequency, $\mathrm{ERB}(f_i)$ is the equivalent rectangular bandwidth (ERB) of the auditory filter, and $\phi_i$ is the phase. The constants $a_i, b_i \in \mathbb{R}$ scale the response, and $b_i$ sets its decay rate, which is related to the bandwidth. To model the nonlinearity of the inner hair cells, we compute the power of each band in every frame $k$ with a logarithmic nonlinearity:

$$P(i, k) = \log\left( 1 + \gamma \sum_{t \in \text{frame } k} \left\{ (x_{\mathrm{pre}} * g_i)(t) \right\}^2 \right), \tag{14}$$

where $P(i, k)$ is the output power, $\gamma$ is a scaling constant, and $(x_{\mathrm{pre}} * g_i)(t) = \sum_{\tau} x_{\mathrm{pre}}(\tau) g_i(t - \tau)$ is the output of the $i$th gammatone filter. This output can be regarded as the average firing rate of the inner hair cells, which drives the higher auditory pathway. At frame $k$, the resulting power feature vector $P(\cdot, k)$ has one component per center frequency $f_i$ and forms the spectro-temporal power representation of the auditory response. As with mel-scale processing in MFCC extraction, this power spectrum has much higher frequency resolution at low frequencies than at high frequencies.

3.2 Sparse Tensor Representation

To extract robust features based on the tensor structure, we model the cochlear power features of different speakers as a third-order tensor $\mathcal{X} \in \mathbb{R}^{N_f \times N_t \times N_s}$. Each feature tensor is an array with three modes, frequency × time × speaker identity, that stacks the cochlear power feature matrices $X \in \mathbb{R}^{N_f \times N_t}$ of the different speakers. CNTF then transforms the auditory feature tensor into multiple interrelated subspaces and learns the basis functions $A^{(d)}$, $d = 1, 2, 3$. Figure 2 shows the tensor model for computing the basis functions. Compared with traditional subspace learning methods, the extracted tensor features may better characterize the differences between speakers and preserve discriminative information for classification.

Fig. 2. Tensor model for calculation of basis functions via CNTF (panel labels: CNTF, Basis Functions).

As described in Section 3.1, the cochlear power feature can be regarded as the neural response of the inner hair cells. The receptive fields of the hair cells encode sound frequency. Here we use sparse, localized basis functions $A \in \mathbb{R}^{N_f \times R}$ in the time-frequency subspace to transform the auditory feature into the sparse feature subspace, where $R$ is the dimension of the sparse feature subspace. The auditory sparse feature representation $X_s$ is obtained through the transformation

$$X_s = \hat{A} X, \tag{15}$$

where $\hat{A}$ consists of the nonnegative elements of $A^{-1}$, i.e., $\hat{A} = [A^{-1}]_{+}$.

Fig. 3. Results of CNTF applied to the clean speech data. (a) Basis functions (100 × 80) in the spectro-temporal domain. (b) Examples of encoding feature vectors.

Figure 3(a) shows an example of basis functions in the spectro-temporal domain. Most elements of the basis functions are close to zero, which is consistent with the sparseness constraint of CNTF. Figure 3(b) gives several examples of encoding feature vectors after the transformation, and these also confirm that the features are sparse. Our feature extraction model relies on the fact that in sparse coding the energy of the signal is concentrated in only a few components, whereas the energy of additive noise remains spread uniformly over all components. A soft-threshold operation shrinks the absolute values of the sparse coding components toward zero. This reduces the noise while leaving the signal largely unaffected. We also impose the orthogonality constraint on CNTF, which helps to extract useful features by minimizing the redundancy between different basis functions.

4 Experimental Results

This section presents evaluation results for a speaker identification system that uses ANTF. Recognition performance is tested on the Aurora2 speech corpus, which is designed to evaluate speech recognition algorithms in noisy conditions. We considered different noise classes and compared the identification accuracy of ANTF with that of MFCC, Mel-NMF, and Mel-PCA features. In our experiments the speech signals were sampled at 8 kHz, and we used a time window of 5 s. For computational simplicity, we selected 36 cochlear filters and a time duration of 10 samples (1.25 ms), so the dimension of the speaker data is 36 × 10 = 360.
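The NumPy sketch below implements the auditory front end of Section 3.1 with the settings above: 8 kHz sampling, 36 cochlear filters, and frames of 10 samples. The text does not give several details, so the following are assumptions:

- the Glasberg-Moore ERB formula;
- center frequencies spaced uniformly on the ERB-rate scale between 50 Hz and 3.6 kHz;
- filter order $n = 4$, $b_i = 1.019$ and $\phi_i = 0$;
- $a_i$ set for unit energy;
- impulse responses truncated at 64 ms;
- non-overlapping frames;
- $\gamma = 1$.

The function names are hypothetical.

```python
import numpy as np


def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, assumed)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def erb_space(f_lo, f_hi, n):
    """n center frequencies equally spaced on the ERB-rate scale (assumption)."""
    lo = 21.4 * np.log10(4.37 * f_lo / 1000.0 + 1.0)
    hi = 21.4 * np.log10(4.37 * f_hi / 1000.0 + 1.0)
    e = np.linspace(lo, hi, n)
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def gammatone_bank(fs, centers, n=4, b=1.019, dur=0.064):
    """Sampled impulse responses g_i(t) of Eq. (13) with phi_i = 0.

    a_i is chosen so that each truncated response has unit energy (assumption).
    """
    t = np.arange(int(dur * fs)) / fs
    bank = []
    for f in centers:
        g = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(f) * t) * np.cos(2 * np.pi * f * t)
        bank.append(g / np.linalg.norm(g))
    return np.array(bank)


def cochlear_power(x, fs=8000, n_filters=36, frame_len=10, gamma=1.0):
    """Log frame power P(i, k) of Eq. (14), shape (n_filters, n_frames)."""
    # Pre-emphasis x_pre(t) = x(t) - 0.97 x(t-1), taking x(0) = 0 for the first sample.
    x_pre = np.append(x[0], x[1:] - 0.97 * x[:-1])
    bank = gammatone_bank(fs, erb_space(50.0, 0.45 * fs, n_filters))
    n_frames = len(x) // frame_len
    P = np.empty((n_filters, n_frames))
    for i, g in enumerate(bank):
        y = np.convolve(x_pre, g)[: len(x)]          # causal output (x_pre * g_i)(t)
        energy = (y[: n_frames * frame_len] ** 2).reshape(n_frames, frame_len).sum(axis=1)
        P[i] = np.log1p(gamma * energy)              # log(1 + gamma * frame energy)
    return P


fs = 8000
t = np.arange(fs // 2) / fs                          # 0.5 s synthetic test input
x = np.sin(2 * np.pi * 440.0 * t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
P = cochlear_power(x, fs)
print(P.shape)                                       # (36, 400)
```

For a 0.5 s input at 8 kHz (4000 samples), the result has 36 rows, one per cochlear filter, and 400 frame columns.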
After computing the cochlear power features, we computed the basis functions with CNTF. To learn the basis functions in the different subspaces, we randomly selected 550 sentences (5 sentences per person) as training data and extracted a 200-dimensional sparse tensor representation. To estimate the speaker models and test the efficiency of our method, we used 5500 sentences (50 sentences per person) as training data. For testing, we used 1320 sentences (12 sentences per person) mixed with different kinds of noise. The testing data were mixed with subway, babble, car, and exhibition hall noise at SNRs of 20 dB, 15 dB, 10 dB, and 5 dB.

Table 1. Identification accuracy in four noisy conditions (subway, babble, car noise, exhibition hall) for the Aurora2 noise testing dataset. Columns: Noise, SNR (dB), ANTF (%), Mel-NMF (%), Mel-PCA (%), MFCC (%).

For the final feature set, 16 cepstral coefficients were extracted and used for speaker modeling. The recognizer was a GMM with 64 Gaussian mixtures. For comparison, we also tested MFCC, Mel-NMF, and Mel-PCA features with 16 cepstral coefficients. As in [9], we used PCA and NMF to learn a representation in the spectro-temporal domain after mel filtering. The features obtained by PCA or NMF projection were then transformed into the cepstral domain with the discrete cosine transform.

Table 1 presents the identification accuracy of ANTF and the baseline systems in all testing conditions. As noise intensity increases, the performance of ANTF degrades more slowly than that of the other features. In high-noise conditions such as 5 dB, ANTF outperforms the other three features. Figure 4 shows the identification rate in the four noisy conditions averaged over SNRs from 5 to 20 dB, together with the overall average accuracy across all conditions. The results suggest that this auditory-based tensor representation is robust against additive noise. This indicates that the new feature has potential for a wider variety of noisy conditions.

Fig. 4. Identification accuracy in four noisy conditions averaged over SNRs between 5 and 20 dB, and the overall average accuracy across all conditions, for ANTF and the other three features on the Aurora2 noise testing dataset (y-axis: identification rate, 0 to 100%; groups: Subway, Babble, Car noise, Exhibition hall, Average).
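The sketch below computes the final features: the sparse transformation (15), followed by a DCT truncated to 16 coefficients. It is a hypothetical reading and not the authors' implementation:

- We assume $A^{-1}$ in (15) means the Moore-Penrose pseudo-inverse, because $A \in \mathbb{R}^{N_f \times R}$ is generally not square.
- We assume an orthonormal DCT-II taken over the $R$ sparse dimensions, with $R = 200$ as in the setup above.
- Random matrices stand in for the learned CNTF basis and the cochlear power features.
- The 64-mixture GMM back end is not shown.

```python
import numpy as np


def sparse_projection(A):
    """Â = [A^{-1}]_+ of Eq. (15), reading A^{-1} as the pseudo-inverse (assumption)."""
    return np.maximum(np.linalg.pinv(A), 0.0)      # keep nonnegative entries only


def dct2(X, n_coef):
    """Orthonormal DCT-II along axis 0, keeping the first n_coef coefficients."""
    N = X.shape[0]
    k = np.arange(n_coef)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    C[0] /= np.sqrt(2.0)                           # DC row scaling for orthonormality
    return C @ X


def antf_features(A, X, n_ceps=16):
    """Hypothetical ANTF step: X_s = Â X (Eq. (15)), then a truncated DCT."""
    Xs = sparse_projection(A) @ X                  # R x N_t sparse features
    return dct2(Xs, n_ceps)                        # n_ceps x N_t cepstral features


rng = np.random.default_rng(0)
A = rng.random((36, 200))        # stand-in for a learned basis: N_f = 36, R = 200
X = rng.random((36, 50))         # stand-in cochlear power matrix: N_f x N_t
F = antf_features(A, X)
print(F.shape)                   # (16, 50)
```

Each column of the output is a 16-dimensional frame vector of the kind that would be passed to the GMM recognizer.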

5 Conclusion

In this paper, we presented a novel speech feature extraction framework that is robust to noise at different SNRs and evaluated it with identification systems operating under a wide variety of conditions. The approach is primarily data-driven and effectively extracts a speech feature, called ANTF, that is robust to different noise types and interference intensities. We derived a new tensor factorization method, CNTF, for robust speaker identification. The research focuses mainly on encoding speech with a general higher-order tensor structure to extract robust auditory-based features from interrelated feature subspaces. The frequency selectivity features at the basilar membrane and inner hair cells were used to represent the speech signals in the spectro-temporal domain, and the CNTF algorithm was then employed to extract a sparse tensor representation for robust speaker modeling. The projection onto multiple interrelated subspaces may preserve the discriminative and robust information of different speakers. Experiments on Aurora2 have shown that the new method improves noise robustness compared with baseline systems trained on the same amount of information.

Acknowledgment

The work was supported by the National High-Tech Research Program of China (Grant No. 2006AA01Z125) and the National Natural Science Foundation of China (Grant No. ).

References

1. Rabiner, L.R., Juang, B.: Fundamentals of Speech Recognition. Prentice Hall, New Jersey (1996)
2. Hermansky, H., Morgan, N.: RASTA Processing of Speech. IEEE Trans. Speech Audio Process. 2 (1994)
3. Reynolds, D.A., Quatieri, T.F., Dunn, R.: Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing 10 (2000)
4. Reynolds, D.A.: Experimental Evaluation of Features for Robust Speaker Identification. IEEE Trans. Speech Audio Process. 2 (1994)
5. Berouti, M., Schwartz, R., Makhoul, J.: Enhancement of Speech Corrupted by Acoustic Noise. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1979), vol. 4 (1979)
6. Hermus, K., Wambacq, P., Van hamme, H.: A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition. EURASIP Journal on Applied Signal Processing 1 (2007)
7. Smith, E., Lewicki, M.S.: Efficient Auditory Coding. Nature 439 (2006)
8. Kim, T., Lee, S.Y.: Learning Self-organized Topology-preserving Complex Speech Features at Primary Auditory Cortex. Neurocomputing 65 (2005)
9. Cho, Y.C., Choi, S.: Nonnegative Features of Spectro-temporal Sounds for Classification. Pattern Recognition Letters 26 (2005)
10. Asari, H., Pearlmutter, B.A., Zador, A.M.: Sparse Representations for the Cocktail Party Problem. Journal of Neuroscience 26 (2006)

11. Carroll, J.D., Chang, J.J.: Analysis of Individual Differences in Multidimensional Scaling via an N-way Generalization of Eckart-Young Decomposition. Psychometrika 35 (1970)
12. Harshman, R.A.: Foundations of the PARAFAC Procedure: Models and Conditions for an Explanatory Multi-modal Factor Analysis. UCLA Working Papers in Phonetics 16, 1-84 (1970)
13. Bro, R.: PARAFAC: Tutorial and Applications. Chemometrics and Intelligent Laboratory Systems 38 (1997)
14. De Lathauwer, L., De Moor, B., Vandewalle, J.: A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications 21 (2000)
15. Kim, Y.D., Choi, S.: Nonnegative Tucker Decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), pp. 1-8 (2007)
16. Welling, M., Weber, M.: Positive Tensor Factorization. Pattern Recognition Letters 22 (2001)
17. Shashua, A., Hazan, T.: Non-negative Tensor Factorization with Applications to Statistics and Computer Vision. In: Proceedings of the International Conference on Machine Learning (ICML) (2005)
18. Vasilescu, M.A.O., Terzopoulos, D.: Multilinear Independent Components Analysis. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1 (2005)
19. Tao, D.C., Li, X.L., Wu, X.D., Maybank, S.J.: General Tensor Discriminant Analysis and Gabor Features for Gait Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007)
20. Lee, D.D., Seung, H.S.: Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13 (2001)
21. Li, S.Z., Hou, X.W., Zhang, H.J., Cheng, Q.S.: Learning Spatially Localized, Parts-based Representation. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-6 (2001)
22. Pascual-Montano, A., Carazo, J.M., Kochi, K., Lehmann, D., Pascual-Marqui, R.D.: Nonsmooth Nonnegative Matrix Factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (2006)


More information

A Low-Cost Robust Front-end for Embedded ASR System

A Low-Cost Robust Front-end for Embedded ASR System A Low-Cost Robust Front-end for Embedded ASR System Lihui Guo 1, Xin He 2, Yue Lu 1, and Yaxin Zhang 2 1 Department of Computer Science and Technology, East China Normal University, Shanghai 200062 2 Motorola

More information

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification Hafiz Mustafa and Wenwu Wang Centre for Vision, Speech and Signal Processing (CVSSP) University of Surrey,

More information

Single Channel Signal Separation Using MAP-based Subspace Decomposition

Single Channel Signal Separation Using MAP-based Subspace Decomposition Single Channel Signal Separation Using MAP-based Subspace Decomposition Gil-Jin Jang, Te-Won Lee, and Yung-Hwan Oh 1 Spoken Language Laboratory, Department of Computer Science, KAIST 373-1 Gusong-dong,

More information

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification International Journal of Engineering Science Invention Volume 1 Issue 1 December. 2012 PP.18-23 Estimation of Relative Operating Characteristics of Text Independent Speaker Verification Palivela Hema 1,

More information

Non-Negative Matrix Factorization with Quasi-Newton Optimization

Non-Negative Matrix Factorization with Quasi-Newton Optimization Non-Negative Matrix Factorization with Quasi-Newton Optimization Rafal ZDUNEK, Andrzej CICHOCKI Laboratory for Advanced Brain Signal Processing BSI, RIKEN, Wako-shi, JAPAN Abstract. Non-negative matrix

More information

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS

PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu

More information

"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction"

Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction "Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization Analysis of polyphonic audio using source-filter model and non-negative matrix factorization Tuomas Virtanen and Anssi Klapuri Tampere University of Technology, Institute of Signal Processing Korkeakoulunkatu

More information

Fast Nonnegative Matrix Factorization with Rank-one ADMM

Fast Nonnegative Matrix Factorization with Rank-one ADMM Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

A Generative Model Based Kernel for SVM Classification in Multimedia Applications

A Generative Model Based Kernel for SVM Classification in Multimedia Applications Appears in Neural Information Processing Systems, Vancouver, Canada, 2003. A Generative Model Based Kernel for SVM Classification in Multimedia Applications Pedro J. Moreno Purdy P. Ho Hewlett-Packard

More information

Robust Sound Event Detection in Continuous Audio Environments

Robust Sound Event Detection in Continuous Audio Environments Robust Sound Event Detection in Continuous Audio Environments Haomin Zhang 1, Ian McLoughlin 2,1, Yan Song 1 1 National Engineering Laboratory of Speech and Language Information Processing The University

More information

arxiv: v1 [cs.sd] 25 Oct 2014

arxiv: v1 [cs.sd] 25 Oct 2014 Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra

More information

An Evolutionary Programming Based Algorithm for HMM training

An Evolutionary Programming Based Algorithm for HMM training An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,

More information

A perception- and PDE-based nonlinear transformation for processing spoken words

A perception- and PDE-based nonlinear transformation for processing spoken words Physica D 149 (21) 143 16 A perception- and PDE-based nonlinear transformation for processing spoken words Yingyong Qi a, Jack Xin b, a Department of Electrical and Computer Engineering, University of

More information

Cochlear modeling and its role in human speech recognition

Cochlear modeling and its role in human speech recognition Allen/IPAM February 1, 2005 p. 1/3 Cochlear modeling and its role in human speech recognition Miller Nicely confusions and the articulation index Jont Allen Univ. of IL, Beckman Inst., Urbana IL Allen/IPAM

More information

Detection-Based Speech Recognition with Sparse Point Process Models

Detection-Based Speech Recognition with Sparse Point Process Models Detection-Based Speech Recognition with Sparse Point Process Models Aren Jansen Partha Niyogi Human Language Technology Center of Excellence Departments of Computer Science and Statistics ICASSP 2010 Dallas,

More information

Support Vector Machines using GMM Supervectors for Speaker Verification

Support Vector Machines using GMM Supervectors for Speaker Verification 1 Support Vector Machines using GMM Supervectors for Speaker Verification W. M. Campbell, D. E. Sturim, D. A. Reynolds MIT Lincoln Laboratory 244 Wood Street Lexington, MA 02420 Corresponding author e-mail:

More information

A new truncation strategy for the higher-order singular value decomposition

A new truncation strategy for the higher-order singular value decomposition A new truncation strategy for the higher-order singular value decomposition Nick Vannieuwenhoven K.U.Leuven, Belgium Workshop on Matrix Equations and Tensor Techniques RWTH Aachen, Germany November 21,

More information

Single-channel source separation using non-negative matrix factorization

Single-channel source separation using non-negative matrix factorization Single-channel source separation using non-negative matrix factorization Mikkel N. Schmidt Technical University of Denmark mns@imm.dtu.dk www.mikkelschmidt.dk DTU Informatics Department of Informatics

More information

SPARSE NONNEGATIVE MATRIX FACTORIZATION USINGl 0 -CONSTRAINTS. Robert Peharz, Michael Stark, Franz Pernkopf

SPARSE NONNEGATIVE MATRIX FACTORIZATION USINGl 0 -CONSTRAINTS. Robert Peharz, Michael Stark, Franz Pernkopf SPARSE NONNEGATIVE MATRIX FACTORIZATION USINGl 0 -CONSTRAINTS Robert Peharz, Michael Stark, Franz Pernkopf Signal Processing and Speech Communication Lab University of Technology, Graz ABSTRACT Although

More information

Speaker Verification Using Accumulative Vectors with Support Vector Machines

Speaker Verification Using Accumulative Vectors with Support Vector Machines Speaker Verification Using Accumulative Vectors with Support Vector Machines Manuel Aguado Martínez, Gabriel Hernández-Sierra, and José Ramón Calvo de Lara Advanced Technologies Application Center, Havana,

More information

Linear and Non-Linear Responses to Dynamic Broad-Band Spectra in Primary Auditory Cortex

Linear and Non-Linear Responses to Dynamic Broad-Band Spectra in Primary Auditory Cortex Linear and Non-Linear Responses to Dynamic Broad-Band Spectra in Primary Auditory Cortex D. J. Klein S. A. Shamma J. Z. Simon D. A. Depireux,2,2 2 Department of Electrical Engineering Supported in part

More information

Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices

Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices Theoretical Performance Analysis of Tucker Higher Order SVD in Extracting Structure from Multiple Signal-plus-Noise Matrices Himanshu Nayar Dept. of EECS University of Michigan Ann Arbor Michigan 484 email:

More information

MATRIX COMPLETION AND TENSOR RANK

MATRIX COMPLETION AND TENSOR RANK MATRIX COMPLETION AND TENSOR RANK HARM DERKSEN Abstract. In this paper, we show that the low rank matrix completion problem can be reduced to the problem of finding the rank of a certain tensor. arxiv:1302.2639v2

More information

AN INVERTIBLE DISCRETE AUDITORY TRANSFORM

AN INVERTIBLE DISCRETE AUDITORY TRANSFORM COMM. MATH. SCI. Vol. 3, No. 1, pp. 47 56 c 25 International Press AN INVERTIBLE DISCRETE AUDITORY TRANSFORM JACK XIN AND YINGYONG QI Abstract. A discrete auditory transform (DAT) from sound signal to

More information

Harmonic Structure Transform for Speaker Recognition

Harmonic Structure Transform for Speaker Recognition Harmonic Structure Transform for Speaker Recognition Kornel Laskowski & Qin Jin Carnegie Mellon University, Pittsburgh PA, USA KTH Speech Music & Hearing, Stockholm, Sweden 29 August, 2011 Laskowski &

More information

The effect of speaking rate and vowel context on the perception of consonants. in babble noise

The effect of speaking rate and vowel context on the perception of consonants. in babble noise The effect of speaking rate and vowel context on the perception of consonants in babble noise Anirudh Raju Department of Electrical Engineering, University of California, Los Angeles, California, USA anirudh90@ucla.edu

More information

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction

More information

Research Article Relationship Matrix Nonnegative Decomposition for Clustering

Research Article Relationship Matrix Nonnegative Decomposition for Clustering Mathematical Problems in Engineering Volume 2011, Article ID 864540, 15 pages doi:10.1155/2011/864540 Research Article Relationship Matrix Nonnegative Decomposition for Clustering Ji-Yuan Pan and Jiang-She

More information

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints

Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints Discovering Convolutive Speech Phones using Sparseness and Non-Negativity Constraints Paul D. O Grady and Barak A. Pearlmutter Hamilton Institute, National University of Ireland Maynooth, Co. Kildare,

More information

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 5, SEPTEMBER 2001 1215 A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing Da-Zheng Feng, Zheng Bao, Xian-Da Zhang

More information

Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors

Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors Detection of Overlapping Acoustic Events Based on NMF with Shared Basis Vectors Kazumasa Yamamoto Department of Computer Science Chubu University Kasugai, Aichi, Japan Email: yamamoto@cs.chubu.ac.jp Chikara

More information

To be published in Optics Letters: Blind Multi-spectral Image Decomposition by 3D Nonnegative Tensor Title: Factorization Authors: Ivica Kopriva and A

To be published in Optics Letters: Blind Multi-spectral Image Decomposition by 3D Nonnegative Tensor Title: Factorization Authors: Ivica Kopriva and A o be published in Optics Letters: Blind Multi-spectral Image Decomposition by 3D Nonnegative ensor itle: Factorization Authors: Ivica Kopriva and Andrzej Cichocki Accepted: 21 June 2009 Posted: 25 June

More information

Evaluation of the modified group delay feature for isolated word recognition

Evaluation of the modified group delay feature for isolated word recognition Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and

More information

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme

A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus

More information

Nonlinear reverse-correlation with synthesized naturalistic noise

Nonlinear reverse-correlation with synthesized naturalistic noise Cognitive Science Online, Vol1, pp1 7, 2003 http://cogsci-onlineucsdedu Nonlinear reverse-correlation with synthesized naturalistic noise Hsin-Hao Yu Department of Cognitive Science University of California

More information

Fuzzy Support Vector Machines for Automatic Infant Cry Recognition

Fuzzy Support Vector Machines for Automatic Infant Cry Recognition Fuzzy Support Vector Machines for Automatic Infant Cry Recognition Sandra E. Barajas-Montiel and Carlos A. Reyes-García Instituto Nacional de Astrofisica Optica y Electronica, Luis Enrique Erro #1, Tonantzintla,

More information

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise 334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C

More information

THE task of identifying the environment in which a sound

THE task of identifying the environment in which a sound 1 Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification Victor Bisot, Romain Serizel, Slim Essid, and Gaël Richard Abstract In this paper, we study the usefulness of various

More information

Identification and separation of noises with spectro-temporal patterns

Identification and separation of noises with spectro-temporal patterns PROCEEDINGS of the 22 nd International Congress on Acoustics Soundscape, Psychoacoustics and Urban Environment: Paper ICA2016-532 Identification and separation of noises with spectro-temporal patterns

More information

Non-negative Matrix Factorization on Kernels

Non-negative Matrix Factorization on Kernels Non-negative Matrix Factorization on Kernels Daoqiang Zhang, 2, Zhi-Hua Zhou 2, and Songcan Chen Department of Computer Science and Engineering Nanjing University of Aeronautics and Astronautics, Nanjing

More information

Convolutional Associative Memory: FIR Filter Model of Synapse

Convolutional Associative Memory: FIR Filter Model of Synapse Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,

More information

Estimation of Cepstral Coefficients for Robust Speech Recognition

Estimation of Cepstral Coefficients for Robust Speech Recognition Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment

More information

Tensor-Based Dictionary Learning for Multidimensional Sparse Recovery. Florian Römer and Giovanni Del Galdo

Tensor-Based Dictionary Learning for Multidimensional Sparse Recovery. Florian Römer and Giovanni Del Galdo Tensor-Based Dictionary Learning for Multidimensional Sparse Recovery Florian Römer and Giovanni Del Galdo 2nd CoSeRa, Bonn, 17-19 Sept. 2013 Ilmenau University of Technology Institute for Information

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

Allpass Modeling of LP Residual for Speaker Recognition

Allpass Modeling of LP Residual for Speaker Recognition Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:

More information

Gaussian Processes for Audio Feature Extraction

Gaussian Processes for Audio Feature Extraction Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline

More information

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Kumari Rambha Ranjan, Kartik Mahto, Dipti Kumari,S.S.Solanki Dept. of Electronics and Communication Birla

More information

Automatic Rank Determination in Projective Nonnegative Matrix Factorization

Automatic Rank Determination in Projective Nonnegative Matrix Factorization Automatic Rank Determination in Projective Nonnegative Matrix Factorization Zhirong Yang, Zhanxing Zhu, and Erkki Oja Department of Information and Computer Science Aalto University School of Science and

More information

Sparse Sensing in Colocated MIMO Radar: A Matrix Completion Approach

Sparse Sensing in Colocated MIMO Radar: A Matrix Completion Approach Sparse Sensing in Colocated MIMO Radar: A Matrix Completion Approach Athina P. Petropulu Department of Electrical and Computer Engineering Rutgers, the State University of New Jersey Acknowledgments Shunqiao

More information

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure

Correspondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars

More information

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition

ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition Wing-Kin (Ken) Ma 2017 2018 Term 2 Department of Electronic Engineering The Chinese University

More information

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation Mikkel N. Schmidt and Morten Mørup Technical University of Denmark Informatics and Mathematical Modelling Richard

More information

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION Julian M. ecker, Christian Sohn Christian Rohlfing Institut für Nachrichtentechnik RWTH Aachen University D-52056

More information

Lecture 7: Feature Extraction

Lecture 7: Feature Extraction Lecture 7: Feature Extraction Kai Yu SpeechLab Department of Computer Science & Engineering Shanghai Jiao Tong University Autumn 2014 Kai Yu Lecture 7: Feature Extraction SJTU Speech Lab 1 / 28 Table of

More information

/16/$ IEEE 1728

/16/$ IEEE 1728 Extension of the Semi-Algebraic Framework for Approximate CP Decompositions via Simultaneous Matrix Diagonalization to the Efficient Calculation of Coupled CP Decompositions Kristina Naskovska and Martin

More information

Non-negative matrix factorization with fixed row and column sums

Non-negative matrix factorization with fixed row and column sums Available online at www.sciencedirect.com Linear Algebra and its Applications 9 (8) 5 www.elsevier.com/locate/laa Non-negative matrix factorization with fixed row and column sums Ngoc-Diep Ho, Paul Van

More information

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams Jimeng Sun Spiros Papadimitriou Philip S. Yu Carnegie Mellon University Pittsburgh, PA, USA IBM T.J. Watson Research Center Hawthorne,

More information

A randomized block sampling approach to the canonical polyadic decomposition of large-scale tensors

A randomized block sampling approach to the canonical polyadic decomposition of large-scale tensors A randomized block sampling approach to the canonical polyadic decomposition of large-scale tensors Nico Vervliet Joint work with Lieven De Lathauwer SIAM AN17, July 13, 2017 2 Classification of hazardous

More information

Sound Recognition in Mixtures

Sound Recognition in Mixtures Sound Recognition in Mixtures Juhan Nam, Gautham J. Mysore 2, and Paris Smaragdis 2,3 Center for Computer Research in Music and Acoustics, Stanford University, 2 Advanced Technology Labs, Adobe Systems

More information

MVA Processing of Speech Features. Chia-Ping Chen, Jeff Bilmes

MVA Processing of Speech Features. Chia-Ping Chen, Jeff Bilmes MVA Processing of Speech Features Chia-Ping Chen, Jeff Bilmes {chiaping,bilmes}@ee.washington.edu SSLI Lab Dept of EE, University of Washington Seattle, WA - UW Electrical Engineering UWEE Technical Report

More information

OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES

OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES THEORY AND PRACTICE Bogustaw Cyganek AGH University of Science and Technology, Poland WILEY A John Wiley &. Sons, Ltd., Publication Contents Preface Acknowledgements

More information

Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics

Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics Interspeech 2018 2-6 September 2018, Hyderabad Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics Pavlos Papadopoulos, Colin Vaz, Shrikanth Narayanan Signal

More information

Deep NMF for Speech Separation

Deep NMF for Speech Separation MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Deep NMF for Speech Separation Le Roux, J.; Hershey, J.R.; Weninger, F.J. TR2015-029 April 2015 Abstract Non-negative matrix factorization

More information

Novel Alternating Least Squares Algorithm for Nonnegative Matrix and Tensor Factorizations

Novel Alternating Least Squares Algorithm for Nonnegative Matrix and Tensor Factorizations Novel Alternating Least Squares Algorithm for Nonnegative Matrix and Tensor Factorizations Anh Huy Phan 1, Andrzej Cichocki 1,, Rafal Zdunek 1,2,andThanhVuDinh 3 1 Lab for Advanced Brain Signal Processing,

More information

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION

FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION FEATURE SELECTION USING FISHER S RATIO TECHNIQUE FOR AUTOMATIC SPEECH RECOGNITION Sarika Hegde 1, K. K. Achary 2 and Surendra Shetty 3 1 Department of Computer Applications, NMAM.I.T., Nitte, Karkala Taluk,

More information

Nonnegative Tensor Factorization with Smoothness Constraints

Nonnegative Tensor Factorization with Smoothness Constraints Nonnegative Tensor Factorization with Smoothness Constraints Rafal Zdunek 1 and Tomasz M. Rutkowski 2 1 Institute of Telecommunications, Teleinformatics and Acoustics, Wroclaw University of Technology,

More information

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Caroline Chaux Joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon) Aix-Marseille

More information

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION Hiroazu Kameoa The University of Toyo / Nippon Telegraph and Telephone Corporation ABSTRACT This paper proposes a novel

More information