Robust Speaker Modeling Based on Constrained Nonnegative Tensor Factorization
Qiang Wu, Liqing Zhang, and Guangchuan Shi
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China

Abstract. Nonnegative tensor factorization is an extension of nonnegative matrix factorization (NMF) to the multilinear case, in which nonnegativity constraints are imposed on the PARAFAC/Tucker model. In this paper, to identify speakers in noisy environments, we propose a new method based on the PARAFAC model, called constrained Nonnegative Tensor Factorization (cNTF). The speech signal is encoded as a general higher-order tensor in order to learn basis functions from multiple interrelated feature subspaces. We simulate a cochlear-like peripheral auditory stage motivated by the human auditory perception mechanism. A sparse speech feature representation is extracted by cNTF and used for robust speaker modeling. Orthogonality and nonsmooth sparseness-control constraints are further imposed on the PARAFAC model in order to preserve the useful information of each feature subspace in the higher-order tensor. An alternating projection algorithm is applied to obtain a stable solution. Experimental results demonstrate that our method improves recognition accuracy, particularly in noisy environments.

1 Introduction

Speaker recognition is the task of determining the identity of a person from his or her voice, and it has great potential for applications in industry, business, and security. For a speaker recognition system, feature extraction is one of the most important tasks; it aims at finding succinct, robust, and discriminative features in the acoustic data. Acoustic features such as linear predictive cepstral coefficients (LPCC) [1], mel-frequency cepstral coefficients (MFCC) [1], and perceptual linear predictive coefficients (PLP) [2] are commonly used.
Conventional speaker modeling methods such as Gaussian mixture models (GMM) [3] achieve very high performance in speaker identification and verification on high-quality data when training and testing conditions are well controlled. In real applications, however, such systems usually do not perform well on speech signals corrupted by adverse conditions such as environmental noise and channel distortion. Feature compensation techniques [2,4] such as CMS and RASTA have been developed for robust speech recognition. Spectral subtraction [5] and subspace-based filtering [6] techniques, which assume a priori knowledge of the noise spectrum, have been widely used because of their simplicity.

Recently, computational auditory nerve models and sparse coding have attracted much attention from both the neuroscience and the speech signal processing communities. Smith et al. [7] proposed an algorithm for learning efficient auditory codes using a theoretical model for coding sound in terms of spikes. Much research on sparse coding and representation of sound and speech [8,9,10] has also proved useful for auditory modeling and speech separation, and offers a potential route to robust speech feature extraction.

F. Sun et al. (Eds.): ISNN 2008, Part I, LNCS 5263. © Springer-Verlag Berlin Heidelberg 2008

As a powerful data modeling tool for pattern recognition, the multilinear algebra of higher-order tensors has been proposed as a potent mathematical framework for manipulating the multiple factors underlying the observations. Common tensor decomposition methods currently include: (1) the CANDECOMP/PARAFAC model [11,12,13]; (2) the Tucker model [14,15]; (3) Nonnegative Tensor Factorization (NTF), which imposes a nonnegativity constraint on the CANDECOMP/PARAFAC model [16,17]. In computer vision, multilinear ICA [18] and tensor discriminant analysis [19] have been applied to image representation and recognition, where they improve recognition performance.

In this paper, we propose a new feature extraction method for robust speaker recognition based on an auditory periphery model and tensor factorization. A novel tensor factorization method called cNTF is derived by imposing orthogonality and nonnegativity constraints on the tensor structure. The advantages of our feature extraction method include the following: (1) simulating the human auditory perception mechanism provides a higher frequency resolution at low frequencies, which helps to obtain robust spectro-temporal features; (2) a supervised feature extraction procedure via cNTF learns the basis functions of multiple related feature subspaces, which preserve the individual, spectro-temporal information in the tensor structure; furthermore, the orthogonality constraint minimizes redundancy between different basis functions; (3) the sparseness constraint on cNTF enhances the energy concentration of the speech signal, which preserves the useful features during noise reduction.
The sparse tensor feature extracted by cNTF can be further processed, via the discrete cosine transform, into a representation called the auditory-based nonnegative tensor feature (ANTF), which can be used as the feature for speaker recognition.

2 Method

2.1 Multilinear Algebra and PARAFAC Model

Multilinear algebra is the algebra of higher-order tensors. A tensor is a higher-order generalization of a matrix. Let $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ denote a tensor. The order of $\mathcal{X}$ is $M$. An element of $\mathcal{X}$ is denoted by $x_{n_1, n_2, \ldots, n_M}$, where $1 \le n_d \le N_d$ and $1 \le d \le M$. The mode-$d$ matricization, or matrix unfolding, of an $M$th-order tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ rearranges the elements of $\mathcal{X}$ to form the matrix $X_{(d)} \in \mathbb{R}^{N_d \times (N_{d+1} N_{d+2} \cdots N_M N_1 \cdots N_{d-1})}$, whose columns are the vectors in $\mathbb{R}^{N_d}$ obtained by varying the index $n_d$ while keeping the other indices fixed. Matricizing a tensor is similar to vectorizing a matrix.

The PARAFAC model was suggested independently by Carroll and Chang [11] under the name CANDECOMP (canonical decomposition) and by Harshman [12] under the name PARAFAC (parallel factor analysis), and it has gained increasing attention in the data mining field. The model structurally resembles many physical models of common real-world data, and its uniqueness property implies that data following the PARAFAC model can be uniquely decomposed into individual contributions.
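As a small illustration of the mode-$d$ unfolding defined above, the following NumPy sketch unfolds a toy 2 × 3 × 4 array. The function name `unfold` is hypothetical, and the choice that index $n_{d+1}$ varies fastest along the columns is an assumption: it is one convention compatible with the column ordering $N_{d+1} \cdots N_M N_1 \cdots N_{d-1}$, and other orderings are also in use.

```python
import numpy as np

def unfold(X, d):
    """Mode-d unfolding of tensor X (d is 0-based).

    Assumed convention: along the columns, mode d+1 varies fastest,
    followed by d+2, ..., M-1, 0, ..., d-1.
    """
    M = X.ndim
    # NumPy reshapes with the last axis varying fastest, so the
    # slowest-varying mode is listed first: d-1, ..., 0, M-1, ..., d+1.
    order = [d] + list(range(d - 1, -1, -1)) + list(range(M - 1, d, -1))
    return np.transpose(X, order).reshape(X.shape[d], -1)

X = np.arange(24).reshape(2, 3, 4)   # toy tensor with N1=2, N2=3, N3=4
for d in range(X.ndim):
    # each unfolding has N_d rows and (product of the other sizes) columns
    print(d, unfold(X, d).shape)
```

Row $n$ of `unfold(X, d)` collects every element whose $d$-th index equals $n$, so each column is one vector in $\mathbb{R}^{N_d}$.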
An $M$-way tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ is rank-1 if it can be represented by the outer product of $M$ vectors:

$$\mathcal{X} = a^{(1)} \circ a^{(2)} \circ \cdots \circ a^{(M)}, \tag{1}$$

where $\circ$ is the outer product operator and $a^{(d)} \in \mathbb{R}^{N_d}$ for $d = 1, 2, \ldots, M$. The rank of the tensor $\mathcal{X}$, denoted $R = \mathrm{rank}(\mathcal{X})$, is the minimal number of rank-1 tensors required to yield $\mathcal{X}$:

$$\mathcal{X} = \sum_{r=1}^{R} A^{(1)}_{:,r} \circ A^{(2)}_{:,r} \circ \cdots \circ A^{(M)}_{:,r}, \tag{2}$$

where $A^{(d)}_{:,r}$ represents the $r$th column vector of the mode matrix $A^{(d)} \in \mathbb{R}^{N_d \times R}$. The PARAFAC model aims to find a rank-$R$ approximation of the tensor $\mathcal{X}$,

$$\mathcal{X} \approx \sum_{r=1}^{R} A^{(1)}_{:,r} \circ A^{(2)}_{:,r} \circ \cdots \circ A^{(M)}_{:,r}. \tag{3}$$

The PARAFAC model can also be written in matrix notation by use of the Khatri-Rao product, which gives the equivalent expressions

$$X_{(d)} \approx A^{(d)} \left[ A^{(d-1)} \odot \cdots \odot A^{(1)} \odot A^{(M)} \odot \cdots \odot A^{(d+1)} \right]^{T}, \tag{4}$$

where $\odot$ is the Khatri-Rao product operator.

2.2 Constrained Nonnegative Tensor Factorization

Given a nonnegative $M$-way tensor $\mathcal{X} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$, nonnegative tensor factorization (NTF) seeks a factorization of $\mathcal{X}$ in the form

$$\mathcal{X} \approx \hat{\mathcal{X}} = \sum_{r=1}^{R} A^{(1)}_{:,r} \circ A^{(2)}_{:,r} \circ \cdots \circ A^{(M)}_{:,r}, \tag{5}$$

where the mode matrices $A^{(d)} \in \mathbb{R}^{N_d \times R}$ for $d = 1, \ldots, M$ are restricted to have only nonnegative elements. In order to find an approximate tensor factorization $\hat{\mathcal{X}}$, we can construct a least-squares cost function $J_{LS}$ and a KL-divergence cost function $J_{KL}$ based on the approximate factorization model (4). The cost functions with mode matrices $A^{(d)}$ are given by

$$J_{LS1}(A^{(d)}) = \frac{1}{2} \sum_{d=1}^{M} \left\| X_{(d)} - A^{(d)} Z^{(d)} \right\|_F^2 = \frac{1}{2} \sum_{d=1}^{M} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} - [A^{(d)} Z^{(d)}]_{pq} \right)^2 \tag{6}$$
$$J_{KL1}(A^{(d)}) = \sum_{d=1}^{M} D\!\left( X_{(d)} \,\|\, A^{(d)} Z^{(d)} \right) = \sum_{d=1}^{M} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} \log \frac{[X_{(d)}]_{pq}}{[A^{(d)} Z^{(d)}]_{pq}} - [X_{(d)}]_{pq} + [A^{(d)} Z^{(d)}]_{pq} \right) \tag{7}$$

where $Z^{(d)} = \left[ A^{(d-1)} \odot \cdots \odot A^{(1)} \odot A^{(M)} \odot \cdots \odot A^{(d+1)} \right]^{T}$ and $\bar{N}_d = \prod_{j \ne d}^{M} N_j$. These cost functions are quite similar to those of NMF [20]: they perform a matrix factorization in each mode and minimize the error over all modes. To this model we can add a constraint that makes the basis functions as orthogonal as possible, i.e. that minimizes the redundancy between different basis functions. This orthogonality constraint can be imposed by minimizing $\sum_{p \ne q} [A^{(d)T} A^{(d)}]_{pq}$.

For traditional NMF methods, many approaches have been proposed to control sparseness through additional constraints or penalty terms. These constraints or penalties can be applied to the basis vectors or to both the basis and encoding vectors. The nsNMF model [22] proposes the factorization $V = WSH$, with a smoothing matrix $S \in \mathbb{R}^{q \times q}$ given by

$$S = (1 - \theta) I + \frac{\theta}{q} \mathbf{1}\mathbf{1}^{T}, \tag{8}$$

where $I$ is the identity matrix, $\mathbf{1}$ is a vector of ones, and the parameter $\theta$ satisfies $0 \le \theta \le 1$. For $\theta = 0$, the model (8) is equivalent to the original NMF. As $\theta \to 1$, stronger smoothness is imposed by $S$, leading to strong sparseness in both $W$ and $H$. With this nonsmooth approach, we can control the sparseness of the basis vectors and encoding vectors while maintaining the faithfulness of the model to the data. The same idea can be applied to NTF. The corresponding cost functions with orthogonality and sparseness-control constraints are then given by

$$J_{LS2}(A^{(d)}) = \frac{1}{2} \sum_{d=1}^{M} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} - [A^{(d)} S Z^{(d)}]_{pq} \right)^2 + \alpha \sum_{p \ne q} [A^{(d)T} A^{(d)}]_{pq} \tag{9}$$

$$J_{KL2}(A^{(d)}) = \sum_{d=1}^{M} \sum_{p=1}^{N_d} \sum_{q=1}^{\bar{N}_d} \left( [X_{(d)}]_{pq} \log \frac{[X_{(d)}]_{pq}}{[A^{(d)} S Z^{(d)}]_{pq}} - [X_{(d)}]_{pq} + [A^{(d)} S Z^{(d)}]_{pq} \right) + \alpha \sum_{p \ne q} [A^{(d)T} A^{(d)}]_{pq} \tag{10}$$

where $\alpha > 0$ is a parameter balancing reconstruction against orthogonality.
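To make the roles of the smoothing matrix (8) and the orthogonality penalty concrete, here is a minimal NumPy sketch that evaluates the contribution of a single mode $d$ to the least-squares cost (9). It assumes the unfolding $X_{(d)}$ and the matrix $Z^{(d)}$ are already available as plain arrays, and it takes $S$ to be $R \times R$ (that is, $q = R$), which is an assumption about how (8) carries over to NTF. All function names are hypothetical.

```python
import numpy as np

def smoothing_matrix(R, theta):
    """S = (1 - theta) I + (theta / R) 1 1^T, following Eq. (8) with q = R (assumed)."""
    return (1.0 - theta) * np.eye(R) + (theta / R) * np.ones((R, R))

def orthogonality_penalty(A):
    """Sum of the off-diagonal entries of A^T A."""
    G = A.T @ A
    return G.sum() - np.trace(G)

def ls_cost_mode(Xd, Ad, Zd, S, alpha):
    """Mode-d term of the LS cost: 0.5 * ||Xd - Ad S Zd||_F^2 + alpha * penalty(Ad)."""
    residual = Xd - Ad @ S @ Zd
    return 0.5 * np.sum(residual ** 2) + alpha * orthogonality_penalty(Ad)

rng = np.random.default_rng(0)
R = 3
Ad = rng.random((5, R))   # toy mode matrix A^(d) with N_d = 5
Zd = rng.random((R, 8))   # stands in for Z^(d); 8 plays the role of the other sizes' product
Xd = Ad @ Zd              # noise-free unfolding, so theta = 0 fits it exactly

# theta = 0, alpha = 0: plain NMF-style fit, cost is numerically zero
print(ls_cost_mode(Xd, Ad, Zd, smoothing_matrix(R, 0.0), alpha=0.0))
# theta > 0 and alpha > 0: smoothing and the penalty both raise the cost
print(ls_cost_mode(Xd, Ad, Zd, smoothing_matrix(R, 0.5), alpha=0.1))
```

Each row of $S$ sums to one, so $S$ mixes every factor with the average of all factors; the factorization has to compensate for that smoothing, which is where the sparseness of the mode matrices comes from.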
We can derive multiplicative learning algorithms for the mode matrices $A^{(d)}$ using the exponential gradient, similar to those for NMF. Element-wise update rules for minimizing the cost functions (9) and (10) are derived directly, as in [16,17]:
LS:

$$A^{(d)}_{ij} \leftarrow A^{(d)}_{ij} \, \frac{[X_{(d)} Z^{(d)T} S^{T}]_{ij}}{[A^{(d)} S Z^{(d)} Z^{(d)T} S^{T}]_{ij} + \alpha \sum_{p \ne j} [A^{(d)T}]_{pi}} \tag{11}$$

KL:

$$A^{(d)}_{ij} \leftarrow A^{(d)}_{ij} \, \frac{\sum_{k} [S Z^{(d)}]_{jk} \, [X_{(d)}]_{ik} / [A^{(d)} S Z^{(d)}]_{ik}}{\sum_{k} [S Z^{(d)}]_{jk} + \alpha \sum_{p \ne j} [A^{(d)T}]_{pi}} \tag{12}$$

3 Feature Extraction Based on Auditory Model and Tensor Representation

The human auditory system is highly capable at speech recognition and speaker recognition. Much research on auditory models has shown that features based on simulating the auditory system are more robust than traditional features in noisy backgrounds. In our feature extraction framework, we calculate frequency-selectivity information by imitating the processing performed in the auditory periphery and pathway. Robust speech features are then obtained by projecting the extracted auditory information onto multiple interrelated feature subspaces via cNTF. A diagram of the feature extraction and speaker recognition framework is shown in Figure 1.

Fig. 1. Feature extraction and recognition framework (block labels: Pre-Emphasis, Cochlear Filters, Nonlinearity, DCT, GMM, Recognition Result)

3.1 Feature Extraction Based on Auditory Model

We extract the features by imitating the processing that occurs in the auditory periphery and pathway: outer ear, middle ear, basilar membrane, inner hair cells, auditory nerves, and cochlear nucleus. We implement conventional pre-emphasis to model the combined outer- and middle-ear functions, $x_{pre}(t) = x(t) - 0.97\,x(t-1)$, where $x(t)$ is the discrete-time speech signal, $t = 1, 2, \ldots$, and $x_{pre}(t)$ is the filtered output signal. The frequency selectivity of the peripheral auditory system, such as the basilar membrane, is simulated by a bank of cochlear filters whose impulse responses have the form

$$g_i(t) = a_i \, t^{\,n-1} e^{-2\pi b_i \mathrm{ERB}(f_i) t} \cos(2\pi f_i t + \phi_i), \quad 1 \le i \le N, \tag{13}$$
where $n$ is the order of the filters and $N$ is the number of filters in the bank. For the $i$th filter, $f_i$ is the center frequency, $\mathrm{ERB}(f_i)$ is the equivalent rectangular bandwidth (ERB) of the auditory filter, $\phi_i$ is the phase, and $a_i, b_i \in \mathbb{R}$ are constants, where $b_i$ determines the rate of decay of the impulse response, which is related to the bandwidth.

In order to model the nonlinearity of the inner hair cells, we compute the power of each band in every frame $k$ with a logarithmic nonlinearity:

$$P(i, k) = \log\Bigl( 1 + \gamma \sum_{t \in \text{frame } k} \bigl\{ x_g^i(t) \bigr\}^2 \Bigr), \tag{14}$$

where $P(i, k)$ is the output power, $\gamma$ is a scaling constant, and $x_g^i(t) = \sum_{\tau} x_{pre}(\tau)\, g_i(t - \tau)$ is the output of the $i$th gammatone filter. This model can be regarded as the average firing rate of the inner hair cells, which simulates the higher auditory pathway. The resulting power feature vector $P(i, k)$ at frame $k$, whose component index $i$ corresponds to frequency $f_i$, comprises the spectro-temporal power representation of the auditory response. Similar to mel-scale processing in MFCC extraction, this power spectrum provides a much higher frequency resolution at low frequencies than at high frequencies.

3.2 Sparse Tensor Representation

In order to extract robust features based on the tensor structure, we model the cochlear power features of different speakers as a third-order tensor $\mathcal{X} \in \mathbb{R}^{N_f \times N_t \times N_s}$. Each feature tensor is an array with three modes, frequency × time × speaker identity, which comprises the cochlear power feature matrices $X \in \mathbb{R}^{N_f \times N_t}$ of the different speakers. We then transform the auditory feature tensor into multiple interrelated subspaces by cNTF to learn the basis functions $A^{(d)}$, $d = 1, 2, 3$. Figure 2 shows the tensor model for the calculation of the basis functions. Compared with traditional subspace learning methods, the extracted tensor features may characterize the differences between speakers and preserve the discriminative information for classification.

Fig. 2. Tensor model for calculation of basis functions via cNTF

As described in Section 3.1, the cochlear power feature can be regarded as the response of neurons in the inner hair cells. The hair cells have receptive fields that encode sound frequency. Here we employ the sparse localized basis functions $A \in \mathbb{R}^{N_f \times R}$ in the time-frequency subspace to transform the auditory feature into the sparse feature subspace, where $R$ is the dimension of the sparse feature subspace. The auditory sparse feature $X_s$ is obtained via the following transformation:

$$X_s = \hat{A} X \tag{15}$$
where $\hat{A}$ consists of the nonnegative elements of $A^{-1}$, i.e. $\hat{A} = [A^{-1}]_{+}$. Figure 3(a) shows an example of basis functions in the spectro-temporal domain. Most elements of the basis functions are close to zero, in accordance with the sparseness constraint of cNTF. Figure 3(b) gives several examples of the encoding feature vector after the transformation, which also show the sparse character of the feature.

Fig. 3. Results of cNTF applied to the clean speech data. (a) Basis functions (100 × 80) in the spectro-temporal domain. (b) Examples of encoding feature vectors.

Our feature extraction model is based on the fact that in sparse coding the energy of the signal is concentrated in only a few components, while the energy of additive noise remains spread uniformly over all components. As in a soft-threshold operation, the absolute values of the sparse coding components are compressed toward zero. The noise is reduced while the signal is not strongly affected. We also impose an orthogonality constraint on cNTF, which helps to extract useful features by minimizing the redundancy between different basis functions.

4 Experimental Results

In this section we provide evaluation results for a speaker identification system using ANTF. The Aurora2 speech corpus, which is designed to evaluate speech recognition algorithms in noisy conditions, is used to test recognition performance. Different noise classes were considered in order to evaluate ANTF against MFCC, Mel-NMF, and Mel-PCA features, and identification accuracy was assessed. In our experiments the sampling rate of the speech signals was 8 kHz. For the given speech signals, we employed a time window of length 5 s. For computational simplicity, we selected 36 cochlear filter banks and a time duration of 10 samples (1.25 ms). The dimension of the speaker data is then 36 × 10 = 360.
We calculated the basis functions using cNTF after computing the cochlear power features. To learn the basis functions in the different subspaces, 550 sentences (5 sentences per person) were selected randomly as training data, and a 200-dimensional sparse tensor representation was extracted.

To estimate the speaker models and test the efficiency of our method, we used 5500 sentences (50 sentences per person) as training data and 1320 sentences (12 sentences per person), mixed with different kinds of noise, as testing data. The testing data were mixed with subway, babble, car, and exhibition hall noise at SNRs of 20 dB, 15 dB, 10 dB, and 5 dB. For the final feature set, 16 cepstral coefficients were extracted and used for speaker modeling. A GMM with 64 Gaussian mixtures was used to build the recognizer. For comparison, the performance of MFCC, Mel-NMF, and Mel-PCA with 16th-order cepstral coefficients was also tested. We used PCA and NMF to learn part-based representations in the spectro-temporal domain after mel filtering, similar to [9]. The features after PCA or NMF projection were further processed into the cepstral domain via the discrete cosine transform.

Table 1. Identification accuracy in four noisy conditions (subway, babble, car noise, exhibition hall) for the Aurora2 noise testing dataset. Table fields: noise type (Subway, Babble, Car noise, Exhibition hall), SNR (dB), ANTF (%), Mel-NMF (%), Mel-PCA (%), MFCC (%).

Table 1 presents the identification accuracy obtained by ANTF and the baseline systems in all testing conditions. We can observe from Table 1 that the performance of ANTF degrades more slowly with increasing noise intensity than that of the other features. It performs better than the other three features in high-noise conditions such as 5 dB SNR. Figure 4 shows the identification rate in the four noisy conditions averaged over SNRs between 5 and 20 dB, and the overall average accuracy across all conditions. The results suggest that this auditory-based tensor representation feature is robust against additive noise, which indicates the potential of the new feature for dealing with a wider variety of noisy conditions.

Fig. 4. Identification accuracy in four noisy conditions averaged over SNRs between 5 and 20 dB, and the overall average accuracy across all conditions, for ANTF and the other three features using the Aurora2 noise testing dataset. (Vertical axis: identification rate, 0 to 100%; horizontal axis: Subway, Babble, Car noise, Exhibition hall, Average; series: ANTF, Mel-NMF, Mel-PCA, MFCC.)
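The processing chain evaluated in this section (learn nonnegative basis functions with the LS rule (11), project with (15), then take a DCT) can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the tensor is random, the sizes are far smaller than the 200-dimensional representation and 16 coefficients used in the experiments, $A^{-1}$ in (15) is taken to be the Moore-Penrose pseudo-inverse because $A$ is not square, the DCT is an unnormalized DCT-II, and every function name is hypothetical.

```python
import numpy as np

def unfold(X, d):
    """Mode-d unfolding (0-based d); mode d+1 varies fastest along the columns (assumed)."""
    M = X.ndim
    order = [d] + list(range(d - 1, -1, -1)) + list(range(M - 1, d, -1))
    return np.transpose(X, order).reshape(X.shape[d], -1)

def khatri_rao(mats):
    """Column-wise Kronecker product of matrices that share a column count."""
    out = mats[0]
    for B in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, B).reshape(-1, out.shape[1])
    return out

def cntf_ls(X, R, theta=0.3, alpha=0.1, n_iter=50, eps=1e-9, seed=0):
    """Alternating multiplicative updates in the form of rule (11)."""
    rng = np.random.default_rng(seed)
    M = X.ndim
    A = [rng.random((n, R)) + 0.1 for n in X.shape]   # positive initial factors
    S = (1.0 - theta) * np.eye(R) + (theta / R) * np.ones((R, R))
    for _ in range(n_iter):
        for d in range(M):
            # Z^(d) = [A^(d-1) ... A^(1) A^(M) ... A^(d+1)]^T (Khatri-Rao products)
            others = [A[j] for j in range(d - 1, -1, -1)] + \
                     [A[j] for j in range(M - 1, d, -1)]
            SZ = S @ khatri_rao(others).T
            # sum over p != j of A[i, p]: row sum minus the entry itself
            ortho = A[d].sum(axis=1, keepdims=True) - A[d]
            A[d] *= (unfold(X, d) @ SZ.T) / (A[d] @ SZ @ SZ.T + alpha * ortho + eps)
    return A

def project(A_f, X_ft):
    """X_s = [pinv(A)]_+ X, reading A^{-1} in (15) as a pseudo-inverse (assumed)."""
    return np.maximum(np.linalg.pinv(A_f), 0.0) @ X_ft

def dct_ii(x, n_coef):
    """First n_coef unnormalized DCT-II coefficients along axis 0."""
    N = x.shape[0]
    k = np.arange(n_coef)[:, None]
    n = np.arange(N)[None, :]
    return np.cos(np.pi * (n + 0.5) * k / N) @ x

rng = np.random.default_rng(1)
X = rng.random((12, 10, 4))        # toy frequency x time x speaker tensor
A = cntf_ls(X, R=6)                # mode matrices A^(1), A^(2), A^(3)
X_s = project(A[0], X[:, :, 0])    # sparse feature of the first "speaker"
feat = dct_ii(X_s, 4)              # cepstral-style coefficients per frame
print(X_s.shape, feat.shape)       # (6, 10) (4, 10)
```

Because the update multiplies nonnegative factors by a nonnegative ratio, the mode matrices stay nonnegative throughout, and the projected feature is nonnegative as well. In a full system the resulting feature vectors would be passed to the GMM recognizer.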
5 Conclusion

In this paper, we presented a novel speech feature extraction framework that is robust to noise at different SNRs, for evaluation with identification systems operating under a wide variety of conditions. The approach is primarily data-driven and effectively extracts a robust speech feature, called ANTF, that is invariant to noise types and to interference of different intensities. We derived a new feature extraction method, called cNTF, for robust speaker identification. The research focuses mainly on encoding speech with a general higher-order tensor structure in order to extract robust auditory-based features from interrelated feature subspaces. The frequency-selectivity features at the basilar membrane and inner hair cells were used to represent the speech signals in the spectro-temporal domain, and the cNTF algorithm was then employed to extract the sparse tensor representation for robust speaker modeling. The discriminative and robust information of different speakers may be preserved after the multi-related subspace projection. Experiments on Aurora2 have shown that the new method improves noise robustness in comparison with baseline systems trained on the same amount of information.

Acknowledgment

The work was supported by the National High-Tech Research Program of China (Grant No. 2006AA01Z125) and the National Natural Science Foundation of China (Grant No. ).

References

1. Rabiner, L.R., Juang, B.: Fundamentals of Speech Recognition. Prentice Hall, New Jersey (1996)
2. Hermansky, H., Morgan, N.: RASTA Processing of Speech. IEEE Trans. Speech Audio Process. 2 (1994)
3. Reynolds, D.A., Quatieri, T.F., Dunn, R.: Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing 10 (2000)
4. Reynolds, D.A.: Experimental Evaluation of Features for Robust Speaker Identification. IEEE Trans. Speech Audio Process. 2 (1994)
5. Berouti, M., Schwartz, R., Makhoul, J.: Enhancement of Speech Corrupted by Acoustic Noise. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1979, vol. 4 (1979)
6. Hermus, K., Wambacq, P., Van hamme, H.: A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition. EURASIP Journal on Applied Signal Processing 1 (2007)
7. Smith, E., Lewicki, M.S.: Efficient Auditory Coding. Nature 439 (2006)
8. Kim, T., Lee, S.Y.: Learning Self-organized Topology-preserving Complex Speech Features at Primary Auditory Cortex. Neurocomputing 65 (2005)
9. Cho, Y.C., Choi, S.: Nonnegative Features of Spectro-temporal Sounds for Classification. Pattern Recognition Letters 26 (2005)
10. Asari, H., Pearlmutter, B.A., Zador, A.M.: Sparse Representations for the Cocktail Party Problem. Journal of Neuroscience 26 (2006)
11. Carroll, J.D., Chang, J.J.: Analysis of Individual Differences in Multidimensional Scaling via an n-way Generalization of Eckart-Young Decomposition. Psychometrika 35 (1970)
12. Harshman, R.A.: Foundations of the PARAFAC Procedure: Models and Conditions for an Explanatory Multi-modal Factor Analysis. UCLA Working Papers in Phonetics 16, 1-84 (1970)
13. Bro, R.: PARAFAC: Tutorial and Applications. Chemometrics and Intelligent Laboratory Systems 38 (1997)
14. De Lathauwer, L., De Moor, B., Vandewalle, J.: A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications 21 (2000)
15. Kim, Y.D., Choi, S.: Nonnegative Tucker Decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2007, pp. 1-8 (2007)
16. Welling, M., Weber, M.: Positive Tensor Factorization. Pattern Recognition Letters 22 (2001)
17. Shashua, A., Hazan, T.: Non-negative Tensor Factorization with Applications to Statistics and Computer Vision. In: Proceedings of the International Conference on Machine Learning (ICML) (2005)
18. Vasilescu, M.A.O., Terzopoulos, D.: Multilinear Independent Components Analysis. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1 (2005)
19. Tao, D.C., Li, X.L., Wu, X.D., Maybank, S.J.: General Tensor Discriminant Analysis and Gabor Feature for Gait Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007)
20. Lee, D.D., Seung, H.S.: Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems 13 (2001)
21. Li, S.Z., Hou, X.W., Zhang, H.J., Cheng, Q.S.: Learning Spatially Localized, Parts-based Representation. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1-6 (2001)
22. Pascual-Montano, A., Carazo, J.M., Kochi, K., Lehmann, D., Pascual-Marqui, R.D.: Nonsmooth Nonnegative Matrix Factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (2006)
"Robust Automatic Speech Recognition through on-line Semi Blind Source Extraction" Francesco Nesta, Marco Matassoni {nesta, matassoni}@fbk.eu Fondazione Bruno Kessler-Irst, Trento (ITALY) For contacts:
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationAnalysis of polyphonic audio using source-filter model and non-negative matrix factorization
Analysis of polyphonic audio using source-filter model and non-negative matrix factorization Tuomas Virtanen and Anssi Klapuri Tampere University of Technology, Institute of Signal Processing Korkeakoulunkatu
More informationFast Nonnegative Matrix Factorization with Rank-one ADMM
Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,
More informationA Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement
A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationA Generative Model Based Kernel for SVM Classification in Multimedia Applications
Appears in Neural Information Processing Systems, Vancouver, Canada, 2003. A Generative Model Based Kernel for SVM Classification in Multimedia Applications Pedro J. Moreno Purdy P. Ho Hewlett-Packard