Independent Component Analysis
Philippe B. Laval (KSU), Fall 2017
Introduction

Independent Component Analysis (ICA) falls under the broader topic of Blind Source Separation (BSS). BSS is the separation of a set of source signals (the signals we are looking for) from a set of mixed signals (the signals we can measure), where very little is known about the source signals or the mixing process. With ICA, we assume that the mixing is linear, that is, the measured signals can be expressed as linear combinations of the source signals. We also assume that the source signals are statistically independent.
Introduction

A classical example of ICA is the cocktail party problem. Imagine a room in which p people are speaking, and we are trying to extract each conversation. For this, we use p microphones spread throughout the room to record the various speakers. We must then extract each conversation from the mixed signals captured by the microphones.
Introduction

Example. Consider two signals s_1(t) = sin(2t) + 2 cos(3t) and s_2(t) = sin(t) cos(t), shown in the next two slides, and the linear mixings x_1(t) = 2 s_1(t) + 3 s_2(t) and x_2(t) = 1.5 s_1(t) - 2.37 s_2(t), shown in the two slides after that. s_1(t) and s_2(t) are the source signals, while x_1(t) and x_2(t) are the measured signals; the measured signals are linear combinations of the source signals. Imagine that x_1(t) and x_2(t) are known and we must recover s_1(t) and s_2(t) without actually knowing the linear combination given here. This seems to be an impossible problem, as there are too many unknowns. However, we will see that with some additional assumptions we can come extremely close to recovering s_1(t) and s_2(t).
Introduction

[Figure: plot of s_1(t) = sin(2t) + 2 cos(3t) for t in [0, 10]]

[Figure: plot of s_2(t) = sin(t) cos(t) for t in [0, 10]]

[Figure: plot of x_1(t) = 2 s_1(t) + 3 s_2(t) for t in [0, 10]]

[Figure: plot of x_2(t) = 1.5 s_1(t) - 2.37 s_2(t) for t in [0, 10]]
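The example above can be sketched numerically. This is a minimal illustration, not part of the slides: the sample grid (1000 points on [0, 10]) is an assumption, and it uses the fact that if the mixing coefficients were known, inverting them would recover the sources exactly. ICA's whole difficulty is that they are not known.

```python
import numpy as np

# Assumed discretization: 1000 samples on [0, 10] (the slides do not fix these).
t = np.linspace(0, 10, 1000)

# Source signals from the example
s1 = np.sin(2 * t) + 2 * np.cos(3 * t)
s2 = np.sin(t) * np.cos(t)

# Linear mixings from the example: the measured signals
x1 = 2.0 * s1 + 3.0 * s2
x2 = 1.5 * s1 - 2.37 * s2

# If (and only if) the mixing is known, the sources are recovered by inversion
A = np.array([[2.0, 3.0],
              [1.5, -2.37]])
recovered = np.linalg.inv(A) @ np.vstack([x1, x2])
```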
Setup of the Problem

Let s = (s_1, s_2, ..., s_p)^T represent the source signals and x = (x_1, x_2, ..., x_p)^T represent the signals measured by the microphones. For the ICA model, we assume that each x_k and s_k is a random variable instead of a variable depending on time; the observed values x_k(t) are just samples of the random variable x_k, k = 1, 2, ..., p. We write B_x for the matrix of observations of the x variables and use a similar notation for the other observation matrices. Since both x and s are p x 1, if we discretize the time interval into N points (that is, if each variable has N measurements), then B_x and B_s are p x N. Without loss of generality, we may assume that the x_i's and the s_i's have zero mean; if this is not the case, we first subtract the sample mean from each x_i.
Setup of the Problem

Since the x_i's are linear combinations of the s_i's, we can write

x_j = a_j1 s_1 + a_j2 s_2 + ... + a_jp s_p = sum_{i=1}^{p} a_ji s_i.

If we let A = (a_ij) be the p x p matrix containing the coefficients of this linear combination, then we can write x = As.
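In matrix form, stacking the observed signals row-wise gives B_x = A B_s, since each column of B_x is A applied to the corresponding column of B_s. A small check with the earlier example (the sample grid is assumed):

```python
import numpy as np

t = np.linspace(0, 10, 1000)                      # assumed discretization
Bs = np.vstack([np.sin(2*t) + 2*np.cos(3*t),      # s_1
                np.sin(t) * np.cos(t)])           # s_2 -> B_s is p x N (2 x 1000)

A = np.array([[2.0, 3.0],
              [1.5, -2.37]])                      # mixing coefficients a_ij from the example

Bx = A @ Bs                                       # each row of B_x is a measured signal
```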
Setup of the Problem

The goal of ICA is to find the matrix A (or its inverse) so we can recover the signals s from x; in other words, s = A^{-1} x. This seems to be an impossible problem because we are trying to find the p^2 entries of A plus the p entries of s from only the p entries of x. The approach is to approximate A^{-1} by a matrix W such that if ŝ = Wx then ŝ ≈ s. We will outline some of the steps which make this problem tractable but will not go through all the steps in detail.
Strategy for Solving ICA

We find the matrix A in several steps. Knowing that A has an SVD of the form A = U Σ V^T, instead of finding A we find U, Σ, and V. W (≈ A^{-1}) will then be

W = V Σ^{-1} U^T.

It is also important to remember that both U and V are orthogonal matrices, hence their inverses are their transposes. We proceed in two stages:

1. Use the covariance of the data x to find Σ and U.
2. Use the assumption of independence of s to find V.

We will describe stage 1 in detail but not stage 2.
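The identity W = V Σ^{-1} U^T = A^{-1} is easy to check numerically when A is known; the example mixing matrix is reused here purely for illustration:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [1.5, -2.37]])

# SVD: A = U diag(sigma) V^T  (numpy returns V^T directly)
U, sigma, Vt = np.linalg.svd(A)

# W = V Sigma^{-1} U^T, using that U and V are orthogonal
W = Vt.T @ np.diag(1.0 / sigma) @ U.T
```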
Strategy for Solving ICA

To recover Σ and U, we make one additional assumption, whose meaning we will discuss below: the covariance of the source data satisfies B_s B_s^T = I, where I is the identity matrix. Recall from the homework in the section on PCA that if x = As then B_x = A B_s. Now we compute the covariance of the measured data:

B_x B_x^T = A B_s (A B_s)^T = A (B_s B_s^T) A^T = A A^T = U Σ V^T V Σ U^T = U Σ^2 U^T.

Note that this equation shows that, under our assumptions, the covariance of the measured data depends only on Σ and U, not on V or s.
Strategy for Solving ICA

Recall that B_x B_x^T is symmetric, hence diagonalizable. This means we can write

B_x B_x^T = P D P^T,

where D is the diagonal matrix containing the eigenvalues of B_x B_x^T and P is the matrix containing the corresponding eigenvectors, written as column vectors. This tells us that U = P and Σ^2 = D will work. Therefore we have identified U and Σ: Σ is a diagonal matrix containing the square roots of the eigenvalues of B_x B_x^T, and U is the matrix of the corresponding eigenvectors, written as column vectors. Therefore

W = V D^{-1/2} P^T,

and V is the only orthogonal matrix left to find.
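Stage 1 can be sketched as follows: diagonalize the covariance of the measured data to obtain P and D (the sample grid and mixing are from the earlier example and are assumed; note that numpy's `eigh` returns eigenvalues in ascending order):

```python
import numpy as np

t = np.linspace(0, 10, 1000)                      # assumed discretization
Bs = np.vstack([np.sin(2*t) + 2*np.cos(3*t),
                np.sin(t) * np.cos(t)])
A = np.array([[2.0, 3.0], [1.5, -2.37]])
Bx = A @ Bs                                       # measured data, p x N

C = Bx @ Bx.T                                     # covariance in the slides' convention (no 1/N factor)
eigvals, P = np.linalg.eigh(C)                    # C = P D P^T, symmetric eigendecomposition
D = np.diag(eigvals)
Sigma = np.diag(np.sqrt(eigvals))                 # Sigma^2 = D
```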
Whitening of the Data

Before we say a few words about V, let us discuss our assumption B_s B_s^T = I, also known as whitening of the data. First, we note that (P^T B_x)(P^T B_x)^T = D. Next, we define x_w = D^{-1/2} P^T x. This operation is called whitening of the data; note that B_{x_w} B_{x_w}^T = I. Recall that our goal is to find W such that ŝ = Wx, which amounts to solving ŝ = V x_w, that is, to finding V from the whitened data x_w. Recall that V is an orthogonal (rotation) matrix. You will also note that this implies our assumption B_ŝ B_ŝ^T = I.
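Whitening as defined above: compute x_w = D^{-1/2} P^T x, after which the covariance of the whitened data is exactly the identity (setup as in the earlier example, sample grid assumed):

```python
import numpy as np

t = np.linspace(0, 10, 1000)                      # assumed discretization
Bs = np.vstack([np.sin(2*t) + 2*np.cos(3*t),
                np.sin(t) * np.cos(t)])
Bx = np.array([[2.0, 3.0], [1.5, -2.37]]) @ Bs    # measured data

eigvals, P = np.linalg.eigh(Bx @ Bx.T)            # B_x B_x^T = P D P^T
Bxw = np.diag(1.0 / np.sqrt(eigvals)) @ P.T @ Bx  # whitened data x_w = D^{-1/2} P^T x
```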
Finding V

Solving ICA amounts to finding the rotation matrix V so that ŝ is statistically independent. As mentioned, we will not discuss finding V here; it is a much more challenging and advanced problem. It involves information theory and a quantity called the entropy of a distribution. Algorithms exist to perform this step; we will use them.
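As a toy illustration of what this missing stage does (and not the entropy-based algorithms the slides refer to), in two dimensions one can grid-search over rotation angles for the V that maximizes a simple non-Gaussianity proxy such as excess kurtosis. Everything here (the kurtosis criterion, the grid search, the sample grid) is an assumption for illustration only:

```python
import numpy as np

# Whitened data from the running example (assumed discretization)
t = np.linspace(0, 10, 1000)
Bs = np.vstack([np.sin(2*t) + 2*np.cos(3*t),
                np.sin(t) * np.cos(t)])
Bx = np.array([[2.0, 3.0], [1.5, -2.37]]) @ Bs
eigvals, P = np.linalg.eigh(Bx @ Bx.T)
Bxw = np.diag(1.0 / np.sqrt(eigvals)) @ P.T @ Bx

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def nongaussianity(y):
    # Absolute excess kurtosis of a standardized signal: a crude independence proxy
    z = (y - y.mean()) / y.std()
    return abs(np.mean(z**4) - 3.0)

# Grid-search the rotation angle; V is the best rotation found
thetas = np.linspace(0.0, np.pi / 2, 181)
scores = [sum(nongaussianity(row) for row in rotation(th) @ Bxw) for th in thetas]
V = rotation(thetas[int(np.argmax(scores))])

s_hat = V @ Bxw    # estimated sources, up to order, sign, and scale
```

Because V is a rotation of already-white data, ŝ automatically satisfies B_ŝ B_ŝ^T = I, as noted on the previous slide.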
ICA and MATLAB

ICA is not included in the version of MATLAB one buys from MathWorks. However, implementations can be downloaded from the internet for MATLAB and other platforms. Please visit the fastica webpage.
Exercises

See the problems at the end of the notes on Independent Component Analysis.