Introduction to Independent Component Analysis

Jingmei Lu and Xixi Lu

Abstract

Independent component analysis (ICA) can be used to solve the blind signal separation problem. In this article we introduce the definition of the ICA model, the assumptions and principle of ICA, and the FastICA algorithm. Examples are given to demonstrate the method, and we compare principal component analysis (PCA) with ICA on the problem of decomposing a mixture of independent signals.

1. Introduction

Blind signal separation problems arise in many fields, such as image separation, speech recognition, face recognition, and the removal of artifact signals from electroencephalography (EEG) records. In these problems the observed data sets are mixtures of several unknown independent signal sources, and the goal is to separate the multivariate data and recover the original sources. Independent component analysis (ICA) is a powerful method for disentangling unknown independent signals from multivariate data: the observations are modeled as a linear transformation of the original sources, but both the mixing matrix and the sources are unknown. In this article we first describe the cocktail party problem, which motivates the ICA method. We then introduce the definition, assumptions, and principle of ICA in section 3. FastICA, one of the most popular ICA algorithms, is presented in section 4. An example of mixed sound separation is illustrated in the last section.

2. Problem

The simplest blind signal separation problem is the cocktail party problem. Imagine a party hosted in a big room, with a music band s_1, a singer s_2, and people talking s_3 in different corners of the room. We install three audio recorders in the room and record the voice signals x_1, x_2, x_3 simultaneously. The mixing coefficients, which depend on the distances between the sound sources and the recorders, are denoted a_11, a_12, ..., a_33. Each recording is then a linear combination of the sources:

x_j(t) = a_{j1} s_1(t) + a_{j2} s_2(t) + a_{j3} s_3(t),  j = 1, 2, 3.

The coefficients a_ij and the original signals s_i(t) are unknown; we would like to recover the s_i(t) from the recordings x_j(t).
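A minimal numerical sketch of this setup (our own Python/NumPy illustration, not part of the original project; the three sources and the matrix entries below are invented) generates the recordings x = As:

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)

# Three hypothetical sources: band (s1), singer (s2), talking crowd (s3).
s1 = np.sin(2 * np.pi * 5 * t)
s2 = np.sign(np.sin(2 * np.pi * 3 * t))
s3 = rng.laplace(size=t.size)
S = np.vstack([s1, s2, s3])              # source matrix, shape (3, T)

# Unknown mixing matrix a_ij, chosen arbitrarily for the demo.
A = np.array([[1.0, 0.5, 0.3],
              [0.7, 1.0, 0.6],
              [0.4, 0.8, 1.0]])

X = A @ S                                # the three recordings x1, x2, x3

Only X would be available to an ICA algorithm; both A and S are treated as unknown.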

3. Principle of Independent Component Analysis

3.1 The definition of ICA

Independent component analysis (ICA) is a method for finding underlying factors or components in multivariate (multi-dimensional) statistical data. "What distinguishes ICA from other methods is that it looks for components that are both statistically independent, and nongaussian." (Hyvärinen, Karhunen and Oja, 2001) Linear mixtures x_1, x_2, ..., x_n of n independent components can be written as

x_j = a_{j1} s_1 + a_{j2} s_2 + ... + a_{jn} s_n  for all j,   (1)

or, in matrix notation,

x = A s.   (2)

Expanded column by column, x = sum_i a_i s_i, where a_i is the i-th column of A. Here A is the mixing matrix, s is the vector of source signals, and x is the observed data. Our task is to estimate both A and s using only the observable random vector x.

3.2 Assumptions of ICA

In order to apply ICA, the components s_j must be statistically independent and non-Gaussian. To see why Gaussian sources are excluded, suppose the s_j are Gaussian with zero mean and unit variance and the mixing matrix A is orthogonal; then the joint density of two mixtures is

p(x_1, x_2) = (1 / 2π) exp( −(x_1^2 + x_2^2) / 2 ).

This joint distribution of two independent Gaussian variables is shown in figure 1. The density is completely symmetric under rotation, so it contains no information about the directions of the columns of the mixing matrix A; under these conditions ICA cannot recover the source signals from the observed data.

Figure 1. The joint distribution of two independent Gaussian variables.
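This ambiguity is easy to check numerically. In the sketch below (our own illustration; the rotation angle is arbitrary), an orthogonal mixing of two unit-variance Gaussians leaves the second-order statistics, and in fact the whole joint distribution, unchanged:

import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 100_000))        # two independent unit Gaussians

theta = 0.7                                  # arbitrary rotation angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal mixing matrix
X = A @ S

# Both source and mixture covariances are (near) identity, so the data
# carry no trace of the rotation A: no method can recover it.
print(np.cov(S).round(3))
print(np.cov(X).round(3))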

3.3 Principle of ICA

We assume that each s_j has zero mean and unit variance. Consider a linear transform y = w^T x and require y to be as close as possible to one of the sources. From equation (2),

y = w^T x = w^T A s = z^T s,  where z = A^T w,

so y is a linear combination of the s_i, and w is one row of the unmixing matrix W. By the central limit theorem (the distribution of a sum of independent random variables tends toward a Gaussian distribution, under certain conditions), y is more Gaussian than any single s_i; it is least Gaussian exactly when it equals one of the s_i, i.e. when only one entry of z is nonzero. Since we want y to match a source, we should maximize the non-Gaussianity of y. Our problem therefore becomes finding a matrix W whose rows w maximize the non-Gaussianity of w^T x.

3.4 Measuring non-Gaussianity

Before maximizing the non-Gaussianity of y = w^T x, we need a way to measure it. Common measures include kurtosis and negentropy; the FastICA algorithm adopts negentropy. The entropy H of a random variable quantifies the information that an observation gives and, for a density f, is

H(Y) = − ∫ f(y) log f(y) dy.

Negentropy is the difference between the entropy of a Gaussian variable y_gauss with the same variance as y and the entropy of y itself:

J(y) = H(y_gauss) − H(y).

Negentropy is an attractive measure of non-Gaussianity because of its clean statistical justification: it is always nonnegative and is zero only for a Gaussian variable. However, it is hard to compute exactly, since that would require an estimate of the probability density of y, so a simpler approximation is used instead. According to the maximum-entropy principle, negentropy can be approximated as

J(y) ∝ [ E{G(y)} − E{G(v)} ]^2,

where G is some non-quadratic function and v is a Gaussian variable of zero mean and unit variance. Different ICA algorithms may choose different functions G.
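The sketch below (our own illustration; the choice G(u) = log cosh u is a common FastICA default, and the sample sizes are arbitrary) estimates this approximation for a Gaussian and for a heavier-tailed Laplacian variable:

import numpy as np

rng = np.random.default_rng(2)

def negentropy_approx(y, n_ref=1_000_000):
    # J(y) is proportional to (E[G(y)] - E[G(v)])^2 with G(u) = log cosh(u),
    # where v is a standard Gaussian reference variable.
    y = (y - y.mean()) / y.std()             # enforce zero mean, unit variance
    v = rng.standard_normal(n_ref)
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(v).mean()) ** 2

print(negentropy_approx(rng.standard_normal(100_000)))   # ~ 0 for a Gaussian
print(negentropy_approx(rng.laplace(size=100_000)))      # clearly larger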

4. The FastICA algorithm

4.1 Data preprocessing

Before we apply FastICA, the observed data need to be preprocessed in two steps: centering and whitening. Centering subtracts the mean of x from the data set; we keep writing x for the centered data. Whitening then makes the components uncorrelated with unit variance. It is done through the spectral decomposition of the covariance matrix,

E{x x^T} = E D E^T,

where E is the orthogonal matrix of eigenvectors and D the diagonal matrix of eigenvalues. Applying the transform

x* = E D^{−1/2} E^T x,

we obtain data x* whose components are uncorrelated and have unit variance.

4.2 FastICA

FastICA is a very efficient way to maximize non-Gaussianity. Its principle is to find the directions w that maximize the non-Gaussianity of w^T x by fixed-point iteration. The input is the mixture signal x, and the output is the recovered independent signals.

Step 1: give the weight vector w an initial (for example, random) value;
Step 2: compute w+ = E{ x g(w^T x) } − E{ g'(w^T x) } w, where g is the derivative of the non-quadratic function G;
Step 3: normalize, w = w+ / ||w+||;
Step 4: if w has not converged, go back to step 2. Usually the iteration is stopped when the squared Euclidean distance between the new and the old w falls below a small threshold.
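The preprocessing and the iteration can be collected into a compact one-unit FastICA with deflation. This is our own NumPy sketch of the steps above, not the FastICA_25 package used later in the article; it takes g = tanh, the derivative of G(u) = log cosh u:

import numpy as np

def fastica(X, n_components, max_iter=200, tol=1e-4, seed=0):
    # X holds the observed mixtures, one signal per row: shape (n, T).
    rng = np.random.default_rng(seed)

    # 4.1 Centering: subtract the mean of every observed signal.
    X = X - X.mean(axis=1, keepdims=True)

    # 4.1 Whitening via the eigendecomposition E{xx^T} = E D E^T.
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(d ** -0.5) @ E.T @ X     # uncorrelated, unit variance

    W = np.zeros((n_components, Z.shape[0]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[0])  # Step 1: random initial w
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ Z
            # Step 2: w+ = E{z g(w^T z)} - E{g'(w^T z)} w, with g = tanh.
            w_new = (Z * np.tanh(wz)).mean(axis=1) \
                    - (1 - np.tanh(wz) ** 2).mean() * w
            # Deflation: stay orthogonal to the components already found.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)   # Step 3: normalize
            done = abs(abs(w_new @ w) - 1) < tol   # Step 4: converged?
            w = w_new
            if done:
                break
        W[i] = w

    return W @ Z                             # recovered independent signals

Applied to the mixtures X from the sketch in section 2, fastica(X, 3) returns three signals that match the sources up to order and scale.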

5. Examples of FastICA

In our project we apply FastICA to a toy data set and to mixed sound signals, using the FastICA_25 package in MATLAB.

5.1 Toy data set example

We generate two source data sets. Choosing a 2×2 mixing matrix A, we form the mixed data x = As. Applying FastICA to the mixture x recovers the original sources successfully: as shown in figure 2, the recovered signals are almost identical to the original signals, up to the magnitude and the order of the components.

Figure 2. The plot for the toy data set.

5.2 Separating mixed audio recordings

We obtained mixed audio signals from the web page published by Jaakko Särelä, Patrik Hoyer and Ella Bingham, which offers several kinds of source signals. We chose a police car alarm, finance news, a singer with piano music, and a baby crying as source signals; clicking the "mixture" button makes the page generate four mixture signals automatically. Loading these data into MATLAB, we obtain the recovered signals shown in figure 3. After converting them to wav format, each of the four audio files contains only one of the four sounds, which confirms the power of ICA for this kind of problem.

Figure 3. Data decomposed by FastICA.

Principal component analysis (PCA) is another popular method for converting correlated data into uncorrelated data. Unlike ICA, PCA decomposes the data onto an orthogonal subspace that preserves most of the variance of the correlated data set. For this reason we expect PCA to be a poor choice for the cocktail party problem, as the sketch below also illustrates.
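To make the contrast concrete, the following sketch (our own Python illustration; the article's experiments used the MATLAB package, and the toy signals and mixing matrix here are invented) separates two mixed toy signals with scikit-learn's FastICA and with PCA, then correlates each estimate with the true sources:

import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # two toy sources, (T, 2)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                         # mixing matrix
X = S @ A.T                                        # observed mixtures, x = As

S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)
S_pca = PCA(n_components=2).fit_transform(X)

# Correlate each estimate with each true source. The ICA rows come out close
# to a signed permutation (each estimate matches one source up to sign and
# scale); the PCA rows generally stay mixed.
for name, Y in (("ICA", S_ica), ("PCA", S_pca)):
    C = np.corrcoef(np.c_[S, Y].T)[:2, 2:]
    print(name)
    print(C.round(2))

The order-and-sign ambiguity visible in the ICA correlations is the same magnitude-and-order ambiguity noted for the toy example in section 5.1.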

The results of PCA on the audio mixtures are shown in figure 4. Converting the signals decomposed by PCA into wav format, we hear that each signal still contains a mixture of all four source sounds. This result is consistent with our expectation that PCA is not appropriate for the cocktail party problem.

Figure 4. Data decomposed by PCA.

6. Conclusion

ICA is a popular method for decomposing observed random data into independent components. It uses an iterative algorithm to maximize the non-Gaussianity of linear combinations of the mixed signals. PCA, on the other hand, is not a good choice for separating independent signals, since it seeks the directions of largest variance in the data.

References

Hyvärinen, A., Karhunen, J., Oja, E.: 2001, Independent Component Analysis, Wiley, New York.

Särelä, J.: "Cocktail Party Problem" demonstration web page, 2015.