Separation of Different Voices in Speech using Fast ICA Algorithm

Volume-6, Issue-6, November-December 2016
International Journal of Engineering and Management Research
Page Number: 364-368

Separation of Different Voices in Speech using Fast ICA Algorithm

Dr. T.V.P. Sundararajan 1, Dr. P. Sampath 2, T. Kiruthika 3, E. Dharani 4
1,2,3,4 Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, INDIA

ABSTRACT

Speech signal processing and feature extraction form the initial stage of any speech recognition system. Our goal is to separate different voices in a single audio file. Imagine you are at a cocktail party. For you it is no problem to follow the discussion of your neighbours, even if there are lots of other sound sources in the room: other discussions in English and in other languages, different kinds of music, and so on. You might even hear a siren from a passing police car. It is not known exactly how humans are able to separate the different sound sources. Independent component analysis is able to do it, provided there are at least as many microphones or 'ears' in the room as there are simultaneous sound sources, and it separates them without knowing anything about the individual sources. Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed non-Gaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA.

Keywords- Cocktail party, ICA

I. INTRODUCTION

Imagine that you are at a large party where several conversations are being held at the same time. Despite the strong background noise, you are able to focus your attention on a specific conversation of your choice and ignore all others. At the same time, if someone were to call your name from the other side of the room, you would immediately be able to respond to it. We can separate different flows of information that occur at the same time and share the very same frequency bands. One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; it is known as the cocktail party problem. No machine has yet been constructed to do just this: to filter out one conversation from a number jumbled together. The first thing to note is that independence is a much stronger property than uncorrelatedness. Considering the blind source separation problem, we could find many different uncorrelated representations of the signals that would not be independent and would not separate the sources. Uncorrelatedness in itself is not enough to separate the components. This is also the reason why principal component analysis (PCA) or factor analysis cannot separate the signals: they give components that are uncorrelated, but little more.

Fig. 1. Separating mixed sources

II. METHODOLOGY

The objective of ICA is to determine a data transformation that makes the output signals as independent as possible. For this purpose, ICA methods often employ higher-order statistics (HOS) to extract the higher-order structure of the signals during feature extraction.
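
As a concrete illustration of this setup (not the authors' exact pipeline; the two synthetic signals and mixing coefficients below are purely illustrative), the following Python sketch mixes two source signals into two "microphone" recordings and unmixes them with scikit-learn's FastICA:

# Minimal sketch: mix two synthetic "voice-like" signals and recover them
# with scikit-learn's FastICA. Signals and mixing matrix are illustrative.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 8000)                      # 1 s at 8 kHz (illustrative)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))          # stand-in for voice 1
s2 = np.sin(2 * np.pi * 13 * t + 2 * np.cos(2 * np.pi * 3 * t))  # stand-in for voice 2
S = np.c_[s1, s2]                                # sources, shape (samples, 2)

A = np.array([[1.0, 0.6],                        # unknown mixing matrix
              [0.4, 1.0]])
X = S @ A.T                                      # two "microphone" recordings

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                     # estimated independent components
# S_est matches S up to permutation and scaling, the usual ICA indeterminacies.

In a real recording scenario, the columns of X would simply be the sampled microphone channels.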

III. LITERATURE SURVEY

Blind Source Separation and Visual Voice Activity Detection for Target Speech Extraction: Many blind source separation (BSS) algorithms have been proposed to recover unknown source signals from their mixtures, using techniques such as independent component analysis (ICA), because in blind source separation it is very difficult to separate the desired information from the mixed signal alone. [1]


Advances in Nonlinear Blind Source Separation: Two main results can be stated. First, solving the nonlinear BSS problem appropriately using only the independence assumption is possible only if the mixtures as well as the separation structure are structurally constrained, for example post-nonlinear mixtures or mappings satisfying an addition theorem. Second, prior information on the sources, for example bounded or temporally correlated sources, can simplify the algorithms or reduce the indeterminacies in the solutions. [2]

Blind Separation of Nonlinear Mixing Signals Using Kernel with Slow Feature Analysis: This work proposes a hybrid blind source separation approach (HBSSA) for the nonlinear mixing model (NLBSS). The hybrid scheme simply combines the kernel feature-space separation technique (kTDSEP) with the principle of slow feature analysis (SFA). The nonlinearly mixed data are mapped to a high-dimensional feature space using a kernel-based method. Then, linear blind source separation (BSS) based on slow feature analysis (SFA) is used to extract the slowest-varying vectors among the independent data vectors. [3]

Adaptive Approach for Blind Source Separation of Nonlinear Mixing Signals: This nonlinear blind source separation (NBSS) method is based on reducing the high-frequency components of the nonlinearly mixed signal by dividing the mixed signal into time-domain blocks of arbitrary size. To remove the distortion of the nonlinear function, the discrete cosine transform (DCT) is applied to each block. By adaptively adjusting the size of the DCT block of data, the highly correlated sub-blocks can be estimated, and the correlation between them can then be reduced. To complete the separation process, a linear blind source separation (BSS) algorithm based on the wavelet transform is used to reduce the correlation between the highly correlated DCT sub-blocks. [4]

Deterministic Independent Component Analysis: For independent component analysis with noisy observations, polynomial-time algorithms recover the non-Gaussian source signals and the mixing matrix with a reconstruction error that vanishes at a 1/√T rate using T observations and that scales only polynomially with the natural parameters of the problem. The algorithms and analysis also extend to deterministic source signals whose empirical distributions are approximately independent. [5]

Source Separation in Post-Nonlinear Mixtures: This work addresses the problem of separating mutually independent sources in nonlinear mixtures. Theoretical results show that, in the general case, it is not possible to separate the sources without nonlinear distortion; the focus is therefore on specific nonlinear mixtures known as post-nonlinear mixtures. These mixtures, constituted by a linear instantaneous mixture (a linear memoryless channel) followed by an unknown and invertible memoryless nonlinear distortion, are realistic models in many situations and have the interesting property that the sources can be estimated with the same indeterminacies as in instantaneous linear mixtures. The separation structure for such mixtures is a two-stage system, namely a nonlinear stage followed by a linear stage, whose parameters are updated to minimize an output independence criterion expressed as a mutual information criterion. The minimization of this criterion requires knowledge or estimation of the source densities or of their log-derivatives. [6]

Nonlinear Independent Component Analysis: Independent component analysis (ICA) and blind source separation (BSS) for nonlinear data models. A fundamental difficulty, especially in the nonlinear ICA problem, is that it is highly non-unique without suitable regularization. Special emphasis is given to a promising approach that applies Bayesian ensemble learning to a flexible multilayer perceptron model to find the sources and the nonlinear mixing mapping that most probably gave rise to the observed mixed data. The efficiency of this method is demonstrated using both artificial and real-world data. [7]

IV. BASIC ICA ESTIMATION

ICA was originally developed to deal with BSS problems, which are closely related to the classical cocktail-party problem. Assume that two microphones record time signals at different locations in one room. The amplitudes of the two recorded signals are denoted x1(t) and x2(t), where t is the time index. Further assume that each recorded signal is a weighted sum of two different source sound signals, denoted s1(t) and s2(t). The relationship between the two source signals and the two microphone signals may be described as

x1(t) = a11 s1(t) + a12 s2(t)
x2(t) = a21 s1(t) + a22 s2(t)
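
In matrix form this is x(t) = A s(t). The following sketch builds the model with illustrative coefficients a_ij (not values from the paper); blind separation amounts to estimating an unmixing matrix W ≈ A^(-1) from x alone:

# Illustrative construction of the two-microphone mixing model x(t) = A s(t).
import numpy as np

rng = np.random.default_rng(0)
T = 1000
s = rng.laplace(size=(2, T))            # s1(t), s2(t): two non-Gaussian sources
A = np.array([[0.9, 0.5],               # a11, a12 (hypothetical values)
              [0.3, 0.8]])              # a21, a22
x = A @ s                               # x1(t), x2(t): the microphone signals

# ICA aims to estimate an unmixing matrix W with W @ x ~ s, knowing only x.
W_ideal = np.linalg.inv(A)              # in the blind setting A (and hence W) is unknown
print(np.allclose(W_ideal @ x, s))      # True: perfect recovery if A were known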

V. PRINCIPLE OF ICA ESTIMATION

One way of stating how independence is stronger than uncorrelatedness is to say that independence implies nonlinear uncorrelatedness: if s1 and s2 are independent, then any nonlinear transformations g(s1) and h(s2) are uncorrelated (in the sense that their covariance is zero). In contrast, for two random variables that are merely uncorrelated, such nonlinear transformations do not have zero covariance in general. Thus, we could attempt to perform ICA by a stronger form of decorrelation, by finding a representation where the yi are uncorrelated even after some nonlinear transformations. This gives a simple principle for estimating the matrix W.

ICA estimation principle 1: Nonlinear decorrelation. Find the matrix W so that for any i ≠ j, the components yi and yj are uncorrelated, and the transformed components g(yi) and h(yj) are uncorrelated, where g and h are some suitable nonlinear functions.

This is a valid approach to estimating ICA: if the nonlinearities are properly chosen, the method does find the independent components. In fact, by computing nonlinear correlations between the two mixtures in Fig. 1, one would immediately see that the mixtures are not independent. Estimation theory provides the most classic method of estimating any statistical model: the maximum likelihood method. Information theory provides exact measures of independence, such as mutual information. Using either one of these theories, we can determine the nonlinear functions g and h in a satisfactory way.

VI. MAXIMUM NON-GAUSSIANITY

Another very intuitive and important principle of ICA estimation is maximum non-Gaussianity. The idea is that, according to the central limit theorem, sums of non-Gaussian random variables are closer to Gaussian than the original ones. Therefore, if we take a linear combination y = Σi bi xi of the observed mixture variables (which, because of the linear mixing model, is a linear combination of the independent components as well), it will be maximally non-Gaussian when it equals one of the independent components. This is because if it were a real mixture of two or more components, it would be closer to a Gaussian distribution, due to the central limit theorem. Thus, the principle can be stated as follows.

ICA estimation principle 2: Maximum non-Gaussianity. Find the local maxima of non-Gaussianity of a linear combination y = Σi bi xi under the constraint that the variance of y is constant. Each local maximum gives one independent component.

To measure non-Gaussianity in practice, we could use, for example, the kurtosis. Kurtosis is a higher-order cumulant; cumulants are generalizations of variance using higher-order polynomials. Cumulants have interesting algebraic and statistical properties, which is why they play an important part in the theory of ICA. For example, the non-Gaussianities of the components given by different coordinate axes can be compared in this way. An interesting point is that this principle of maximum non-Gaussianity shows the very close connection between ICA and an independently developed technique called projection pursuit. In projection pursuit, we are looking for maximally non-Gaussian linear combinations, which are used for visualization and other purposes. Thus, the independent components can be interpreted as projection pursuit directions. When ICA is used to extract features, the principle of maximum non-Gaussianity also shows an important connection to sparse coding, which has been used in neuroscientific theories of feature extraction. The idea in sparse coding is to represent data with components such that only a small number of them are active at the same time. It turns out that this is equivalent, in some situations, to finding components that are maximally non-Gaussian. The projection pursuit and sparse coding connections are related to a deep result that says that ICA gives a linear representation that is as structured as possible. This statement can be given a rigorous meaning using information-theoretic concepts, and it shows that the independent components are in many ways easier to process than the original random variables.
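
The following sketch illustrates both estimation principles on hypothetical Laplacian sources (the signals, nonlinearities, and mixing coefficients are illustrative assumptions): independent sources remain essentially uncorrelated after nonlinear transformations, while a fixed linear mixture has kurtosis closer to that of a Gaussian (zero) than either source.

# Illustrative check of estimation principles 1 and 2 on synthetic sources.
import numpy as np
from scipy.stats import kurtosis   # Fisher (excess) kurtosis: 0 for a Gaussian

rng = np.random.default_rng(0)
s1 = rng.laplace(size=100_000)               # independent non-Gaussian sources
s2 = rng.laplace(size=100_000)
mix = 0.6 * s1 + 0.8 * s2                    # a fixed linear mixture of the two

# Principle 1: nonlinear decorrelation. For independent s1, s2 the covariance
# of g(s1) = tanh(s1) and h(s2) = s2**3 is (up to sampling error) zero.
g, h = np.tanh(s1), s2 ** 3
print(np.mean((g - g.mean()) * (h - h.mean())))   # ~ 0

# Principle 2: maximum non-Gaussianity. The mixture is closer to Gaussian
# than either source, so its excess kurtosis is closer to 0.
print(kurtosis(s1), kurtosis(mix))           # ~ 3.0 for the source vs a smaller value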

VII. COVARIANCES

There are many other methods for estimating the ICA model as well; many of them are treated in the ICA literature. What they all have in common is that they consider statistics that are not contained in the covariance matrix (the matrix that contains the covariances between all pairs of the xi). Using the covariance matrix, we can decorrelate the components in the ordinary linear sense, but nothing stronger. Thus, all ICA methods use some form of higher-order statistics, which specifically means information not contained in the covariance matrix. Earlier, we encountered two kinds of higher-order information: nonlinear correlations and kurtosis. Many other kinds can be used as well.

VIII. NUMERICAL METHODS

In addition to the estimation principle, one has to find an algorithm for implementing the computations needed. Because the estimation principles use non-quadratic functions, the computations usually cannot be expressed using simple linear algebra, and therefore they can be quite demanding. Numerical algorithms are thus an integral part of ICA estimation methods. The numerical methods are typically based on optimization of some objective function. The basic optimization method is the gradient method. Of particular interest is a fixed-point algorithm called FastICA that has been tailored to exploit the particular structure of the ICA problem. For example, we could use either of these methods to find the maxima of non-Gaussianity as measured by the absolute value of kurtosis.
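
As an illustration of the basic gradient method mentioned above, the following sketch performs gradient ascent on the absolute value of kurtosis for a single unit, assuming the data Z have already been centered and whitened (the step size, iteration count, and function names are illustrative choices, not values from the paper):

# Plain gradient ascent on |kurtosis| for one unit, on whitened data Z (n x T).
import numpy as np

def kurtosis(y):
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

def gradient_kurtosis_ica(Z, mu=0.1, n_iter=500, seed=0):
    n, T = Z.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                                         # current projection w^T z(t)
        grad = (Z * y ** 3).mean(axis=1) - 3.0 * w        # d kurt / d w for whitened data
        w += mu * np.sign(kurtosis(y)) * grad             # ascend |kurtosis|
        w /= np.linalg.norm(w)                            # keep the variance of y fixed
    return w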

IX. MEASUREMENT OF NON-GAUSSIANITY

For simplicity, let us assume that all the sources have identical distributions. Our goal is to find the vector w such that y = w^T x is equal to one of the sources. We make the change of variables z = A^T w, so that

y = w^T x = w^T A s = z^T s

Thus, y is a linear combination of the sources s. According to the central limit theorem, the signal y is more Gaussian than the sources, since it is a linear combination of them, and it becomes least Gaussian when it is equal to one of the sources. Therefore, the optimal w is the vector that maximizes the non-Gaussianity of w^T x, since this will make y equal to one of the sources.

X. PREPROCESSING OF ICA

CENTERING

The most basic and necessary preprocessing is to center x, i.e. subtract its mean vector m = E{x} so as to make x a zero-mean variable. This implies that s is zero-mean as well, as can be seen by taking expectations on both sides. This preprocessing is made solely to simplify the ICA algorithms: it does not mean that the mean could not be estimated. After estimating the mixing matrix A with centered data, we can complete the estimation by adding the mean vector of s back to the centered estimates of s. The mean vector of s is given by A^(-1) m, where m is the mean that was subtracted in the preprocessing.

WHITENING

Another useful preprocessing strategy in ICA is to first whiten the observed variables. This means that before the application of the ICA algorithm (and after centering), we transform the observed vector x linearly so that we obtain a new vector x̃ which is white, i.e. its components are uncorrelated and their variances equal unity. In other words, the covariance matrix of x̃ equals the identity matrix:

E{x̃ x̃^T} = I

The whitening transformation is always possible. One popular method for whitening is to use the eigenvalue decomposition (EVD) of the covariance matrix, E{x x^T} = E D E^T, where E is the orthogonal matrix of eigenvectors of E{x x^T} and D is the diagonal matrix of its eigenvalues, D = diag(d1, d2, ..., dn). Note that E{x x^T} can be estimated in a standard way from the available sample x(1), ..., x(T). Whitening can now be done by

x̃ = E D^(-1/2) E^T x

where the matrix D^(-1/2) is computed by a simple component-wise operation, taking the inverse square root of each diagonal element. It is easy to check that now E{x̃ x̃^T} = I. Whitening transforms the mixing matrix into a new one, Ã = E D^(-1/2) E^T A. The utility of whitening resides in the fact that the new mixing matrix is orthogonal. This can be seen from

E{x̃ x̃^T} = Ã E{s s^T} Ã^T = Ã Ã^T = I

Here we see that whitening reduces the number of parameters to be estimated. Instead of having to estimate the n^2 parameters that are the elements of the original matrix A, we only need to estimate the new, orthogonal mixing matrix Ã. An orthogonal matrix has n(n-1)/2 degrees of freedom. For example, in two dimensions, an orthogonal transformation is determined by a single angle parameter. In larger dimensions, an orthogonal matrix contains only about half the number of parameters of an arbitrary matrix. Thus one can say that whitening solves half of the problem of ICA. Because whitening is a very simple and standard procedure, much simpler than any ICA algorithm, it is a good idea to reduce the complexity of the problem this way. It may also be quite useful to reduce the dimension of the data at the same time as we do the whitening. We then look at the eigenvalues dj of E{x x^T} and discard those that are too small, as is often done in the statistical technique of principal component analysis. This often has the effect of reducing noise. Moreover, dimension reduction prevents overlearning, which can sometimes be observed in ICA.

Fig. 2. The joint distribution of the whitened mixtures

A graphical illustration of the effect of whitening can be seen in Fig. 2, in which the data have been whitened. The square defining the distribution is now clearly a rotated version of the original square. All that is left is the estimation of a single angle that gives the rotation.
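
A minimal sketch of this centering and EVD-based whitening, under the assumption that X stores one microphone signal per row and one time sample per column (variable names are illustrative), could look like this:

# Centering and EVD-based whitening of the observed mixtures X (n x T).
import numpy as np

def center_and_whiten(X):
    Xc = X - X.mean(axis=1, keepdims=True)        # centering: subtract the mean vector m
    C = np.cov(Xc)                                # estimate of E{x x^T}
    d, E = np.linalg.eigh(C)                      # EVD: C = E diag(d) E^T
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))        # component-wise D^{-1/2}
    V = E @ D_inv_sqrt @ E.T                      # whitening matrix
    Z = V @ Xc                                    # whitened data, E{z z^T} ~ I
    return Z, V

# Example: after whitening, the covariance of Z is (approximately) the identity.
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ np.random.default_rng(0).laplace(size=(2, 5000))
Z, V = center_and_whiten(X)
print(np.cov(Z).round(2))                          # ~ [[1, 0], [0, 1]]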

In the rest of this paper, we assume that the data have been preprocessed by centering and whitening. For simplicity of notation, we denote the preprocessed data simply by x and the transformed mixing matrix by A, omitting the tildes.

XI. FURTHER PREPROCESSING

The success of ICA for a given data set may depend crucially on performing some application-dependent preprocessing steps. For example, if the data consist of time signals, some band-pass filtering may be very useful. Note that if we filter the observed signals xi(t) linearly to obtain new signals xi*(t), the ICA model still holds for xi*(t), with the same mixing matrix. This can be seen as follows. Denote by X the matrix that contains the observations x(1), ..., x(T) as its columns, and similarly by S for the sources. Then the ICA model can be expressed as

X = AS

Now, time filtering of X corresponds to multiplying X from the right by a matrix, call it M. This gives

X* = XM = ASM = AS*

which shows that the ICA model remains valid.
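
The argument is just matrix associativity, which the following small check illustrates (the 3-tap moving-average filter matrix M and the Laplacian sources are illustrative assumptions, not the filtering used by the authors):

# Linear time-domain filtering of the observations leaves the mixing matrix
# unchanged: X* = X M = A S M = A S*.
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)
T = 500
S = rng.laplace(size=(2, T))        # two non-Gaussian sources, one per row
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S                           # observations, one microphone signal per row

# Build M as a 3-tap moving-average filter acting along the time axis.
first_col = np.zeros(T); first_col[:3] = 1.0 / 3.0
M = toeplitz(first_col, np.zeros(T))   # Toeplitz filtering matrix

X_filt = X @ M                      # filter the mixtures
S_filt = S @ M                      # filter the sources with the same M
print(np.allclose(X_filt, A @ S_filt))   # True: the same mixing matrix A still applies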

THE FASTICA ALGORITHM

In the preceding sections, we introduced different measures of non-Gaussianity, i.e. objective functions for ICA estimation. In practice, one also needs an algorithm for maximizing the contrast function. In this section, we introduce a very efficient method of maximization suited for this task. It is assumed here that the data have been preprocessed by centering and whitening, as discussed in the preceding sections.

PROPERTIES OF THE FASTICA ALGORITHM

The FastICA algorithm and the underlying contrast functions have a number of desirable properties when compared with existing methods for ICA.
1. The convergence is cubic (or at least quadratic) under the assumption of the ICA data model. This is in contrast to ordinary ICA algorithms based on (stochastic) gradient descent methods, where the convergence is only linear. This means very fast convergence, as has been confirmed by simulations and experiments on real data.
2. Contrary to gradient-based algorithms, there are no step-size parameters to choose. This means that the algorithm is easy to use.
3. The algorithm finds directly independent components of (practically) any non-Gaussian distribution using any nonlinearity g. This is in contrast to many algorithms, where some estimate of the probability distribution function has to be available first and the nonlinearity must be chosen accordingly.
4. The performance of the method can be optimized by choosing a suitable nonlinearity g. In particular, one can obtain algorithms that are robust and/or of minimum variance; certain standard nonlinearities have such optimal properties.
5. The independent components can be estimated one by one, which is roughly equivalent to doing projection pursuit. This is useful in exploratory data analysis and decreases the computational load of the method in cases where only some of the independent components need to be estimated.
6. FastICA has most of the advantages of neural algorithms: it is parallel, distributed, computationally simple, and requires little memory. Stochastic gradient methods seem preferable only if fast adaptivity in a changing environment is required.
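
The paper does not spell out the FastICA update itself; the following is a minimal sketch of the standard one-unit fixed-point iteration on centered and whitened data, using the common nonlinearity g(u) = tanh(u) (function and variable names are my own):

# One-unit FastICA fixed-point iteration on whitened data Z (n x T).
import numpy as np

def fastica_one_unit(Z, max_iter=200, tol=1e-6, seed=0):
    """Estimate one independent component direction from whitened data Z."""
    n, T = Z.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ Z                                          # projections w^T z(t)
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # E{z g(w^T z)} - E{g'(w^T z)} w
        w_new /= np.linalg.norm(w_new)                      # project back to the unit sphere
        if abs(abs(w_new @ w) - 1.0) < tol:                 # converged (up to sign)
            return w_new
        w = w_new
    return w

Further components can be obtained by repeating the iteration with deflation, i.e. orthogonalizing each new w against the directions already found.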

XII. CONCLUSION

ICA is a very general-purpose statistical technique in which observed random data are linearly transformed into components that are maximally independent of each other and simultaneously have interesting distributions. ICA can be formulated as the estimation of a latent variable model, and it provides a solution to the cocktail-party problem. The intuitive notion of maximum non-Gaussianity can be used to derive different objective functions whose optimization enables the estimation of the ICA model. Alternatively, one may use more classical notions such as maximum likelihood estimation or minimization of mutual information to estimate ICA; somewhat surprisingly, these approaches are (approximately) equivalent. A computationally very efficient method for performing the actual estimation is given by the FastICA algorithm. Applications of ICA can be found in many areas such as audio processing, biomedical signal processing, image processing, telecommunications, and econometrics.

Fig. 3. ICA estimation of mixed signals

REFERENCES

[1] Naveen Dubey and Rajesh Mehra, "Blind audio source separation (BASS): An unsupervised approach," 2015.
[2] Jonathan Le Roux and Emmanuel Vincent, "Consistent Wiener filtering for audio source separation," 2013.
[3] A. S. Bregman, Computational Auditory Scene Analysis, 1994.
[4] D. L. Wang and G. J. Brown, "Separation of speech from interfering sounds based on oscillatory correlation," 2009.
[5] Shao, S. Srinivasan, Z. Jin, and D. Wang, "A computational auditory scene analysis system for speech segregation and robust speech recognition," 2010.
[6] Virtanen, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria," 2007.
[7] R. Naik and D. K. Kumar, "An overview of independent component analysis and its applications," 2011.
[8] J. V. Stone, Independent Component Analysis: A Tutorial Introduction, 2004.
[9] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, 2001.
[10] G. R. Naik (Ed.), ICA: Audio Applications, 2012.