A Review of Independent Component Analysis Techniques


Bogdan Matei
Electrical and Computer Engineering Department, Rutgers University, Piscataway, NJ, USA

1 Introduction

Blind Signal Separation (BSS) consists in the recovery of unknown independent signals from observed linear combinations of them. To set up the notation, let s = [s_1, ..., s_n]^T be the unknown source signals and let the m x n matrix A be the unknown mixing matrix relating the observed signals x = [x_1, ..., x_m]^T to the sources,

    x = A s,                                                        (1)

where the explicit dependence on time was dropped. Without loss of generality it is assumed that the source signals are zero mean, E[s_i] = 0. The model (1) can be modified in order to accommodate the presence of additive Gaussian noise v,

    x = A s + v.                                                    (2)

An illustration of a BSS application is the cocktail party problem, in which the speech of several people is received by several microphones present in the room. The task is to recover the speech of the individual speakers from the overlapped talk [14]. The column a_i of the matrix A is called the directional vector associated with the i-th source s_i. When the a_i are known, spatial filters w_i can be estimated using the minimum variance distortionless response (MVDR) beamformer,

    w_i = R_x^{-1} a_i / (a_i^H R_x^{-1} a_i),    R_x = E[x x^H],   (3)

or the linear constrained minimum variance (LCMV) estimator which, in the presence of zero-mean white noise with variance σ² and independent sources, takes the form [4]

    w_i ∝ (R_x − σ² I)^† a_i.                                       (4)

Throughout the paper (·)^H denotes the Hermitian (conjugate transpose) operator and (·)^† the Moore-Penrose pseudo-inverse.
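As a concrete illustration of the models (1)-(2), the short sketch below builds a two-source instantaneous mixture of the kind the ICA algorithms of the later sections are meant to undo. The sources, sample size and mixing matrix are arbitrary illustrative choices, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 10_000                                            # number of samples
    s = np.vstack([np.sign(rng.standard_normal(T)),       # binary +/-1 source (sub-Gaussian)
                   rng.laplace(size=T)])                  # Laplacian source (super-Gaussian)
    s -= s.mean(axis=1, keepdims=True)                    # enforce zero-mean sources
    A = np.array([[1.0, 0.6],
                  [0.5, 1.0]])                            # unknown mixing matrix
    x = A @ s                                             # observed mixtures, model (1)
    x_noisy = x + 0.05 * rng.standard_normal(x.shape)     # noisy observations, model (2)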

Blind deconvolution (BD) is a particular case of BSS in which the mixing matrix A is assumed to have a Toeplitz triangular structure. When A is not triangular the filter is non-causal, and when A is not Toeplitz the filter is non-stationary [9]. In many applications, however, either the mixing matrix A is not available, or the errors in A due to multipath propagation may render the estimates w_i unusable. In these situations it is better not to make any assumptions about A other than the Toeplitz/triangular structure for BD.

Independent component analysis (ICA) is a class of techniques for performing BSS [14], though many references use ICA and BSS interchangeably [22, 6, 9]. We will also consider ICA and BSS synonymous. There are two distinct approaches towards computing the ICA. One employs high-order cumulants and is found mainly in the statistical signal processing literature [9, 6, 22]; the other uses the gradient descent of non-linear activation functions in neuron-like devices and was mainly developed in the neural networks community [1, 18, 19]. Each approach has advantages and shortcomings: the computation of high-order cumulants is very sensitive to outliers and to the lack of sufficient support in the data, especially for signals having a long-tailed probability density function (p.d.f.), while the neural-network algorithms may become unstable, converge slowly and most often require some extra knowledge about the p.d.f. of the source signals in order to choose the non-linearities in the neurons. There are both batch and adaptive versions of ICA algorithms, depending on the particular application. ICA requires that the number of received signals m be at least equal to the number n of independent sources.

A close examination of the model (1) reveals the following ambiguities:

1. The variances E[s_i²] cannot be determined, since

    x = A s = [ A diag(α_1, ..., α_n)^{-1} ] [ diag(α_1, ..., α_n) s ] = A' s'.    (5)

To fix this scale indeterminacy it is assumed that E[s_i²] = 1; however, a_i and s_i can then be found only up to a sign.

2. The order of the components cannot be specified. If P is a permutation matrix, i.e. it has a single 1 in every row and column, then similarly to (5)

    x = ( A P^{-1} ) ( P s ) = A' s'.    (6)
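Both indeterminacies are easy to verify numerically. In the hypothetical snippet below, rescaling the sources by a diagonal matrix D or permuting them with P leaves the observations unchanged when the mixing matrix is modified accordingly.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 2))
    s = rng.laplace(size=(2, 1000))
    D = np.diag([2.0, -0.5])                        # arbitrary rescaling and sign flip
    P = np.array([[0, 1], [1, 0]])                  # permutation matrix

    x = A @ s
    x_scaled = (A @ np.linalg.inv(D)) @ (D @ s)     # same observations, eq. (5)
    x_perm   = (A @ P.T) @ (P @ s)                  # same observations, eq. (6)
    print(np.allclose(x, x_scaled), np.allclose(x, x_perm))    # True True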

The blind decomposition of x into A and s can be found only if at most one of the sources s_i is Gaussian, as will be shown in Section 2.

The ICA techniques have been applied in antenna array processing for the estimation of radiating sources [11], in the separation of evoked potentials from EEG [19], in the recovery of the fetal EKG [8], in the separation of reflections from images [12], etc.

In Section 2 the differences between Principal Component Analysis (PCA) and ICA are investigated and it is shown that ICA is a generalization of PCA. Section 3 presents the contrast functions which serve as ICA criteria. The cumulant based approach for doing ICA is introduced in Section 4. In Section 5 several gradient based techniques for ICA are presented. Section 6 contains simulations using real data extracted from images.

2 Principal Component Analysis vs. Independent Component Analysis

Principal Component Analysis is a standard statistical tool used to find the orthogonal directions corresponding to the highest variance. It is equivalent to a decorrelation of the data using second-order information. If R_x = E[x x^H] is the covariance of x (recall E[x] = 0), let the eigenvalue decomposition of R_x be

    R_x = U Λ U^H,    (7)

and assume that the elements of the diagonal matrix Λ = diag(λ_1, ..., λ_m) are sorted in decreasing order. In the absence of noise, and assuming m ≥ n, the covariance matrix R_x has rank n, therefore the last m − n eigenvalues satisfy λ_k = 0, k = n+1, ..., m. When independent and identically distributed (i.i.d.) noise with zero mean and variance σ² affects the measurements, the smallest eigenvalues satisfy λ_k = σ², k = n+1, ..., m. Define the whitening transform

    W = diag( (λ_1 − σ²)^{-1/2}, ..., (λ_n − σ²)^{-1/2}, 0, ..., 0 ) U^H    (8)

and let the whitened data vector be

    z = W x.    (9)

It can easily be seen that the covariance matrix of z is

    R_z = E[ z z^H ] = diag(1, ..., 1, 0, ..., 0).    (10)
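A minimal sketch of the whitening transform (7)-(9) for real-valued, zero-mean data is given below; the function name and arguments are illustrative, and the noise variance is assumed known or estimated beforehand (cf. (12)).

    import numpy as np

    def whiten(x, n_sources, noise_var=0.0):
        # PCA whitening of zero-mean data x (m x T), following eqs. (7)-(9)
        R = x @ x.T / x.shape[1]                   # sample covariance R_x
        lam, U = np.linalg.eigh(R)                 # eigenvalues in ascending order
        idx = np.argsort(lam)[::-1][:n_sources]    # keep the n largest eigenvalues
        lam, U = lam[idx], U[:, idx]
        W = np.diag(1.0 / np.sqrt(lam - noise_var)) @ U.T    # whitening transform (8)
        z = W @ x                                             # whitened data (9)
        return z, W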

In practice R_x must be estimated from the available data. If N samples of x, x(1), ..., x(N), are available,

    R̂_x = (1/N) Σ_{t=1}^{N} x(t) x(t)^H.    (11)

An estimate of the noise variance is given by the average of the smallest m − n eigenvalues,

    σ̂² = (1/(m − n)) Σ_{k=n+1}^{m} λ_k,    m > n,    (12)

and the whitening transform becomes

    W = diag( (λ_1 − σ̂²)^{-1/2}, ..., (λ_n − σ̂²)^{-1/2}, 0, ..., 0 ) U^H.    (13)

When m = n, the noise variance cannot be estimated and ICA proceeds as in the noiseless situation. A degradation in the performance of the ICA algorithms is therefore expected, depending on the signal to noise ratio (SNR). Since the last m − n components of z are identically zero, we will consider in the following that we work only in the signal subspace and that z is n-dimensional.

The whitening transform W can be determined only up to an orthogonal transform. Indeed, for any orthogonal matrix Q the covariance of Q z is

    E[ Q z z^H Q^H ] = Q Q^H = I,    (14)

so Q W is also a valid whitening transform. ICA attempts to go one step further than PCA, by finding the orthogonal matrix Q which transforms the whitened data z into y = Q z with statistically independent components y_1, ..., y_n. The difference between PCA and ICA is illustrated in Figure 1. The independent signals s(t), t = 1, ..., 1000, having uniform, respectively Gaussian, distributions are transformed into x = A s using a fixed 2 x 2 mixing matrix A.    (15)

PCA whitens the data x, as shown in Figure 1(e),(f). The whitened uniform data is distributed inside a square; however, the components are still not independent. The ICA techniques allow the determination of the orthogonal matrix Q which aligns the sides of the square with the coordinate axes. The independent components s_i are recovered up to a scale, sign and order. On the other hand, the whitened normal data has circular symmetry, therefore the matrix Q cannot be found. ICA is thus more general than PCA, in trying not only to decorrelate the data but also to find a decomposition transforming the input into independent components.
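The following sketch (with illustrative choices of sources and mixing matrix) reproduces the essence of the uniform-data example: after whitening, the sample covariance is close to the identity, yet a fourth-order cross statistic such as E[z_1² z_2²] − 1 remains clearly non-zero, showing that decorrelation alone does not give independence.

    import numpy as np

    rng = np.random.default_rng(2)
    T = 20_000
    s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))   # unit-variance uniform sources
    A = np.array([[1.0, 0.8],
                  [0.4, 1.0]])                               # illustrative mixing matrix
    x = A @ s

    lam, U = np.linalg.eigh(x @ x.T / T)                     # PCA whitening, eqs. (7)-(9)
    z = np.diag(lam ** -0.5) @ U.T @ x

    print(np.round(z @ z.T / T, 3))              # ~ identity: decorrelated
    print(np.mean(z[0]**2 * z[1]**2) - 1.0)      # != 0: still not independent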

Figure 1: Whitening uniform (left column) and Gaussian data (right column). (a), (b) original data; (c), (d) rotated data using the eigenvector matrix U; (e), (f) scaled (whitened) data; (g) for the uniformly distributed data the independent components are recovered by finding the orthogonal matrix Q which aligns the sides of the square with the coordinate axes; (h) for Gaussian data the orthogonal matrix cannot be found, since the whitened data has circular symmetry.

The transform we seek is therefore

    y = B x = Q W x = Q W A s.    (16)

If an orthogonal matrix Q can be found which transforms the mixed signals x into y with independent components, then, assuming that at most one independent source s_i is normally distributed, y = C s with C a non-mixing matrix (i.e., a matrix with exactly one non-zero entry in each row and column) [9]. The ICA algorithms attempt to find the matrix B which ensures that the components of y are independent. In Section 3 several criteria for performing ICA are presented.

3 Contrast Functions

Contrast functions (or contrasts) φ serve as objective criteria for ICA. They are real functions of a probability distribution and are designed such that [6, 9]

    φ(y) = φ(B x) ≥ φ(s),    (17)

with equality only when B A is a non-mixing matrix. Most often, contrast functions are derived using concepts from information theory, like the entropy and the Kullback distance between two probability densities [10]. The entropy of a random variable y having p.d.f. p_y is defined as

    H(y) = − ∫ p_y(u) log p_y(u) du.    (18)

The Kullback distance between the probability densities p and q is

    K(p ‖ q) = ∫ p(u) log( p(u) / q(u) ) du.    (19)

The Kullback distance does not satisfy the distance axioms, since K(p ‖ q) ≠ K(q ‖ p); however, K(p ‖ q) ≥ 0, with equality if and only if p = q.

Assuming that the p.d.f.s of the independent signals s_i are known, we have

    p_s(u) = Π_i p_{s_i}(u_i).    (20)

The Maximum Likelihood criterion seeks the matrix B such that the distribution of y = B x is closest to the distribution of s. A suitable contrast is then yielded by the Kullback distance between these two distributions,

    φ_ML(y) = K( p_y ‖ p_s ).    (21)

Bell and Sejnowski proposed in [1] the infomax contrast function φ_IM in the context of a neural network approach. Cardoso proved that φ_IM is in fact identical to φ_ML [7].

When no assumption about the distribution of the sources can be made, the minimization of K(p_y ‖ p_s) should be done not only with respect to B but also with respect to the distribution of s. Let ỹ be a random vector whose components ỹ_i are independent and have p.d.f.s equal to the marginal densities of the y_i. Then the Kullback distance between the distributions of y = B x and s is [6]

    K( p_y ‖ p_s ) = K( p_y ‖ p_ỹ ) + K( p_ỹ ‖ p_s ).    (22)

Since K(p_ỹ ‖ p_s) is minimum (zero) when the distribution of s equals the distribution of ỹ, the minimization of K(p_y ‖ p_s) is equivalent to the minimization of K(p_y ‖ p_ỹ), which is nothing else than the mutual information between the components of y. The mutual information contrast function φ_MI was proposed by Comon [9] and measures the deviation from independence of a random vector,

    φ_MI(y) = K( p_y ‖ p_ỹ ).    (23)

From the definition of the Kullback distance (19) it follows that φ_MI(y) ≥ 0, with equality iff the entries of y are independent. Since y is white (an orthogonal transform of z), H(y) is constant, and

    φ_MI(y) = Σ_i H(y_i) − H(y) = Σ_i H(y_i) + const,    (24)

so φ_MI can be brought to the simpler form of the minimum marginal entropy contrast [2],

    φ°_MI(y) = Σ_i H(y_i),    (25)

where the ° superscript denotes the fact that the minimization of the contrast function is done under the whiteness constraint E[y y^H] = I. The minimum marginal entropy (25) is related to the negentropy maximization criterion [16]

    J(y) = H(y_G) − H(y),    (26)

which measures the difference in entropy between a normally distributed vector y_G, having the same covariance matrix as y, and the vector y. The normal distribution has the largest entropy among all distributions with the same first two moments [10], therefore the negentropy is a nonnegative quantity, vanishing only for Gaussian y. From (24) and using the constancy of H(y_G), the maximization of the negentropy (26) is equivalent to the minimization of (25).

The contrast function φ_MI is the most general, in the sense that it makes no assumption about the distribution of the independent sources. On the other hand, when some knowledge about the p.d.f. of the sources is available, φ_ML should be used [2, 6].
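In practice the marginal entropies in (25) and the negentropy (26) are rarely computed exactly; Section 4 replaces them with cumulant-based approximations. For a single zero-mean, unit-variance signal, the classical Edgeworth-type approximation J(y) ≈ E[y³]²/12 + kurt(y)²/48 is commonly used; the sketch below estimates it from samples (the formula is the standard scalar approximation, not an expression taken verbatim from the paper).

    import numpy as np

    def negentropy_approx(y):
        # cumulant-based approximation of the negentropy (26) of a 1-D signal
        y = (y - y.mean()) / y.std()
        skew = np.mean(y**3)
        kurt = np.mean(y**4) - 3.0      # fourth-order cumulant of a unit-variance signal
        return skew**2 / 12.0 + kurt**2 / 48.0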

4 Cumulant Based Approaches

The contrast functions (21), (23), (25) can be expressed using high-order statistics. The most convenient way of expressing higher-order statistics is through cumulants [21], because:

1. The computations are simpler with cumulants than with moments.
2. The cumulants of a sum of independent variables are the sums of the individual cumulants.
3. The Edgeworth approximations of a distribution can be conveniently expressed using cumulants.
4. The higher-order cumulants of a normal vector are zero, contrary to the moments.

The moments and cumulants can be obtained using the moment, respectively cumulant, generating functions. The moment generating function of a random vector y = [y_1, ..., y_n]^T is

    Φ_y(u) = E[ exp( j u^T y ) ],    (27)

and the cumulant generating function is defined as

    Ψ_y(u) = log Φ_y(u).    (28)

The cumulant generating function has the Taylor expansion

    Ψ_y(u) = Σ_i κ_i (j u_i) + (1/2!) Σ_{i,j} κ_{i,j} (j u_i)(j u_j) + (1/3!) Σ_{i,j,k} κ_{i,j,k} (j u_i)(j u_j)(j u_k) + (1/4!) Σ_{i,j,k,l} κ_{i,j,k,l} (j u_i)(j u_j)(j u_k)(j u_l) + ...    (29)

The cumulants κ are obtained from (29) by identifying the corresponding terms: the order-two cumulants are the κ_{i,j}, the order-three cumulants are the κ_{i,j,k}, etc. Another way of defining the cumulants, presented below [2], is in terms of the centered random variables ỹ_i = y_i − E[y_i]. The first-order cumulants are

    κ_i = E[ y_i ],    (30)

while the second-order cumulants are

    κ_{i,j} = E[ ỹ_i ỹ_j ].    (31)

Thus the second-order cumulants are given by the entries of the covariance matrix. Similarly, the third and fourth-order cumulants (related to the skewness, respectively the kurtosis) are

    κ_{i,j,k} = E[ ỹ_i ỹ_j ỹ_k ],
    κ_{i,j,k,l} = E[ ỹ_i ỹ_j ỹ_k ỹ_l ] − E[ỹ_i ỹ_j] E[ỹ_k ỹ_l] − E[ỹ_i ỹ_k] E[ỹ_j ỹ_l] − E[ỹ_i ỹ_l] E[ỹ_j ỹ_k].    (32)

For a normally distributed vector y having mean μ_y and covariance C_y, the cumulant generating function is

    Ψ_y(u) = j μ_y^T u − (1/2) u^T C_y u,    (33)

therefore the cumulants of order greater than two vanish.
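A direct sample estimator of the fourth-order cross-cumulants of zero-mean data, following the definition in (32), can be written as below (the function name is illustrative).

    import numpy as np

    def cum4(y, i, j, k, l):
        # sample estimate of kappa_{i,j,k,l} for zero-mean data y (n x T), eq. (32)
        yi, yj, yk, yl = y[i], y[j], y[k], y[l]
        m2 = lambda a, b: np.mean(a * b)
        return (np.mean(yi * yj * yk * yl)
                - m2(yi, yj) * m2(yk, yl)
                - m2(yi, yk) * m2(yj, yl)
                - m2(yi, yl) * m2(yj, yk))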

4.1 Contrast Function Approximation Using Cumulants

Approximations of the contrast functions introduced in Section 3 in terms of cumulants up to order four, assuming distributions with vanishing odd moments, are obtained through Edgeworth expansions [2, 16]. Let v and w be two random vectors whose p.d.f.s are close to a normal distribution and have zero odd moments. The Kullback distance between the two distributions is then given, in terms of second and fourth-order cumulants, by

    K( p_v ‖ p_w ) ≈ (1/4) Σ_{i,j} ( κ^v_{i,j} − κ^w_{i,j} )² + (1/48) Σ_{i,j,k,l} ( κ^v_{i,j,k,l} − κ^w_{i,j,k,l} )².    (34)

Using the approximation (34) and the independence of the source components s_i, it follows that

    φ_ML(y) ≈ (1/4) Σ_{i,j} ( κ^y_{i,j} − κ^s_{i,j} δ_{ij} )² + (1/48) Σ_{i,j,k,l} ( κ^y_{i,j,k,l} − κ^s_{i,j,k,l} δ_{ijkl} )²,    (35)

where the Kronecker symbol δ_{ijkl} equals one only when all the indices are identical. Taking into account that after whitening κ^y_{i,j} = δ_{ij}, after some algebra (35) can be further simplified to

    φ°_ML(y) ∝ − Σ_i κ^s_{i,i,i,i} E[ |y_i|⁴ ] + const.    (36)

The mutual information contrast φ_MI is approximated by [2]

    φ_MI(y) ≈ (1/4) Σ_{(i,j)≠(i,i)} ( κ^y_{i,j} )² + (1/48) Σ_{(i,j,k,l)≠(i,i,i,i)} ( κ^y_{i,j,k,l} )²,    (37)

and the minimum marginal entropy contrast φ°_MI by

    φ°_MI(y) ≈ (1/48) [ Σ_{i,j,k,l} ( κ^y_{i,j,k,l} )² − Σ_i ( κ^y_{i,i,i,i} )² ].    (38)

A contrast function similar to (38), which can be minimized using the joint diagonalization of matrices [3], was proposed by Cardoso [4],

    φ_JADE(y) = Σ_{(i,j,k,l)≠(i,i,k,l)} ( κ^y_{i,j,k,l} )².    (39)

The JADE contrast function φ_JADE and the orthogonal maximum likelihood contrast φ°_ML will be used in the following for performing ICA.

4.2 Computation of Fourth Order Cumulants

In this section several computational procedures are described for determining the fourth-order cumulants κ^z of the whitened data z(1), ..., z(N). Define the n x N data matrix Z = [ z(1), ..., z(N) ]. A convenient way of storing the κ^z_{i,j,k,l} is in a square n² x n² matrix C [4, 22]. The Kronecker product [13] between an m₁ x n₁ matrix F and an m₂ x n₂ matrix G is the m₁m₂ x n₁n₂ matrix

    F ⊗ G = [ f_{11} G  ...  f_{1 n₁} G ;  ... ;  f_{m₁ 1} G  ...  f_{m₁ n₁} G ],    (40)

and the vectorization of a matrix is obtained by stacking its columns one below the other,

    vec(Z) = [ z(1)^T, ..., z(N)^T ]^T.    (41)

Define also the n² x N matrix

    Z₂ = [ z(1) ⊗ z(1), ..., z(N) ⊗ z(N) ].    (42)

Note that for whitened data R_z = I and, similarly to [22], the fourth-order cumulant matrix C, whose entry in row (i−1)n + j and column (k−1)n + l is κ^z_{i,j,k,l}, can be estimated as

    C = (1/N) Z₂ Z₂^H − vec(R̂_z) vec(R̂_z)^H − R̂_z ⊗ R̂_z − K,    (43)

where R̂_z is the sample covariance of z and K is the commutation matrix accounting for the term E[z_i z_l] E[z_j z_k] in the definition (32) of the fourth-order cumulant.

A different approach to the computation of the fourth-order cumulants, based on cumulant matrices, is presented in [4, 2]. Given an n x n matrix M and the whitened random vector z, the cumulant matrix Q_z(M) has the (i,j)-th component

    [ Q_z(M) ]_{i,j} = Σ_{k,l} κ^z_{i,j,k,l} M_{k,l},    (44)

and for whitened data it can be computed directly as

    Q_z(M) = E[ ( z^H M z ) z z^H ] − tr(M) I − M − M^H.    (45)

A complete set of cumulant matrices is given by { Q_z(M_{p,q}) }, where M_{p,q} = e_p e_q^H, p, q = 1, ..., n, and e_p is the p-th vector of the canonical basis, since

    [ Q_z( e_p e_q^H ) ]_{i,j} = κ^z_{i,j,p,q}.    (46)

The n⁴ fourth-order cumulants are therefore determined by computing n² cumulant matrices using the definition (44). Since the cumulants are symmetric in their indices, another choice is given by the base

    M_{p,p} = e_p e_p^H,    M_{p,q} = ( e_p e_q^H + e_q e_p^H ) / √2  for p < q,    (47)

for which only n(n+1)/2 cumulant matrices need to be computed [2]. A maximal base containing only n cumulant matrices is obtained by performing an eigenvalue decomposition of the matrix C computed in (43),

    C = V diag( μ_1, ..., μ_{n²} ) V^H,    μ_1 ≥ μ_2 ≥ ... ≥ μ_{n²},    (48)

and taking

    M_k = μ_k mat(v_k),    k = 1, ..., n,    (49)

where the n x n matrix mat(v_k) is obtained by reshaping the k-th column v_k of V.
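For real-valued whitened data, the cumulant matrices (44)-(46) can be estimated directly from the samples; the sketch below follows the whitened-data expression (45) and builds the complete (non-maximal) set of n² matrices from the canonical base (function names are illustrative).

    import numpy as np

    def cumulant_matrix(z, M):
        # sample estimate of Q_z(M) for whitened data z (n x T), using eq. (45)
        n, T = z.shape
        w = np.einsum('it,ij,jt->t', z, M, z)      # z(t)^T M z(t) for every sample
        Q = (z * w) @ z.T / T                      # E[(z^T M z) z z^T]
        return Q - np.trace(M) * np.eye(n) - M - M.T

    def cumulant_matrix_set(z):
        # complete set {Q_z(e_p e_q^T)}, cf. (46)
        n = z.shape[0]
        E = np.eye(n)
        return [cumulant_matrix(z, np.outer(E[p], E[q]))
                for p in range(n) for q in range(n)]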

4.3 JADE Algorithm

In this section we show that the contrast function φ_JADE, which is an approximation of the minimum marginal entropy contrast (38), can be minimized using the joint diagonalization of the cumulant matrices Q_z(M_k). The orthogonal matrix Q introduced in Section 2 is found as the minimizer of [4]

    φ_JADE(Q) = Σ_k off( Q Q_z(M_k) Q^H ),    (50)

where { M_k } is a maximal set of cumulant matrices and off(F) denotes the sum of the squared off-diagonal elements of the matrix F. The matrix Q can be found by a Jacobi technique using Givens rotations [3],

    Q = Π_{p<q} R_{p,q}( θ_{p,q} ),    (51)

the product being repeated in sweeps until convergence, where each Givens rotation R_{p,q} is an identity matrix with the exception of the entries

    R_{pp} = R_{qq} = c,    R_{pq} = −s̄,    R_{qp} = s,    |c|² + |s|² = 1.    (52)

At each step, c and s are found from the minimization of (50); see [3] for the proof and for their closed-form expression. The JADE algorithm is summarized next:

1. Compute the whitening transform W using (13).
2. Compute the maximal set of cumulant matrices Q_z(M_k) using (44) or (49).
3. Find the orthogonal matrix Q which minimizes (50).
4. Estimate the mixing matrix as Â = W^† Q^H.
5. Estimate the independent sources as ŝ = Q W x.
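Step 3 is usually carried out by Jacobi sweeps with the closed-form Givens angle of Cardoso and Souloumiac [3]. The following real-valued sketch is a simplified illustration of the structure of that computation, not the authors' implementation.

    import numpy as np

    def joint_diagonalize(mats, n_sweeps=20, tol=1e-8):
        # approximate joint diagonalization of symmetric n x n matrices by Givens rotations
        mats = [M.copy() for M in mats]
        n = mats[0].shape[0]
        V = np.eye(n)
        for _ in range(n_sweeps):
            rotated = False
            for p in range(n - 1):
                for q in range(p + 1, n):
                    # closed-form Givens angle for the pair (p, q), cf. [3]
                    g = np.array([[M[p, p] - M[q, q], M[p, q] + M[q, p]] for M in mats])
                    G = g.T @ g
                    ton, toff = G[0, 0] - G[1, 1], G[0, 1] + G[1, 0]
                    theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                    c, s = np.cos(theta), np.sin(theta)
                    if abs(s) > tol:
                        rotated = True
                        R = np.eye(n)
                        R[p, p] = R[q, q] = c
                        R[p, q], R[q, p] = -s, s
                        mats = [R.T @ M @ R for M in mats]
                        V = V @ R
            if not rotated:
                break
        return V      # the separating rotation of step 3 is V^T (sources: V.T @ z)

Combined with the whitening and cumulant-matrix sketches given earlier, V = joint_diagonalize(cumulant_matrix_set(z)) yields the rotation, and V.T @ z the estimated sources.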

4.4 MaxKurt/MinKurt Algorithm

When the kurtoses of the sources are known, the contrast function φ°_ML (36) should be minimized. Under the assumption of equal kurtoses with known signs, minimizing (36) amounts to maximizing Σ_i E[y_i⁴] for positive kurtosis (a random variable with positive kurtosis is also known as super-Gaussian or super-kurtic) and to minimizing Σ_i E[y_i⁴] for negative kurtoses (for example, the uniform distribution).

The MaxKurt/MinKurt algorithm maximizes, respectively minimizes, this criterion using again the joint diagonalization machinery. The approach has the advantage of operating directly on the data Z, without the need to compute the cumulant matrices Q_z(M) explicitly, and yields closed-form solutions for the Givens rotation parameters in (52). The disadvantage is that, with a wrong assumption about the sign of the kurtosis, the ICA may fail, as illustrated in Figure 2. A similar approach was used by Farid and Adelson [12] to separate the reflections and the lighting from images; however, they assumed only sources with positive kurtosis.

5 Gradient Techniques for ICA

The approaches presented in Section 4 are batch oriented and can be used only off-line, after the whole data set has been acquired. In this section several gradient based techniques are described which can be used in both on-line and off-line applications. Most of the adaptive approaches were developed in the neural networks community [1, 19, 20, 16]. Assuming that the input data at time t is x(t) and the separation matrix is B(t), the output of ICA is, see also (16),

    y(t) = B(t) x(t).    (53)

The update rule is of the form [5]

    B(t+1) = B(t) − μ(t) H( y(t) ) B(t),    (54)

where H(·) is a vector-to-matrix function depending on the specific algorithm used, and the learning factor μ(t) can be constant or time dependent.

5.1 The EASI Adaptive Algorithm

The equivariant adaptive separation via independence (EASI) algorithm was proposed in [5]. Based on the notion of relative gradient, the following update rule can be derived:

    B(t+1) = B(t) − μ(t) [ y y^H − I + g(y) y^H − y g(y)^H ] B(t),    (55)

with y = y(t) defined in (53). A more stable rule is the normalized version of (55),

    B(t+1) = B(t) − μ(t) [ ( y y^H − I ) / ( 1 + μ(t) y^H y ) + ( g(y) y^H − y g(y)^H ) / ( 1 + μ(t) |y^H g(y)| ) ] B(t).    (56)

The nonlinearity g can be chosen componentwise as

    g(y_i) = y_i |y_i|²,    (57)

though other expressions may be employed. Several observations should be made at this point: (i) the data was assumed centered; (ii) the whitening is performed adaptively by the term y y^H − I.
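A single-sample EASI step following (55) is sketched below; the cubic nonlinearity corresponds to (57), and the step size is an illustrative choice.

    import numpy as np

    def easi_update(B, x, mu=0.01, g=lambda y: y * np.abs(y)**2):
        # one step of the EASI rule (55) on a single sample x
        y = B @ x                               # current output, eq. (53)
        n = len(y)
        grad = (np.outer(y, y) - np.eye(n)
                + np.outer(g(y), y) - np.outer(y, g(y)))
        return B - mu * grad @ B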

Figure 2: MaxKurt/MinKurt fails when the kurtosis signs are wrong. Calling the function with positive sign for the uniform data, or alternatively with negative sign for the exponential data (which is super-Gaussian), causes ICA to find directions rotated by 45 degrees from the directions corresponding to independence. (a), (b) the mixed sources; (c), (d) the mixed sources after whitening; (e), (f) the output of ICA using MaxKurt.

5.2 Neural Approaches Based on Information Maximization

Bell and Sejnowski [1] introduced a class of neural networks based on information maximization. As mentioned in Section 3, the information maximization criterion is in fact equivalent to maximum likelihood [7]. The learning rule proposed for super-kurtic signals is of the form

    B(t+1) = B(t) + μ(t) [ ( B(t)^H )^{-1} − φ( y(t) ) x(t)^H ],    (58)

where φ(·) is the nonlinearity derived from the activation function of the neurons (a sigmoidal unit in [1]). See [1] for learning rules appropriate for sub-kurtic signals. The algorithm is very simple; however, it is rather slow, as pointed out in [16], and requires the whitening of x.
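The matrix inversion in (58) can be avoided by using the relative (natural) gradient: right-multiplying the bracket of (58) by B^H B turns the rule into B ← B + μ [ I − φ(y) y^H ] B. The sketch below implements this variant for a single sample, with tanh as an illustrative nonlinearity for super-kurtic sources.

    import numpy as np

    def infomax_update(B, x, mu=0.001, phi=np.tanh):
        # one relative-gradient step of the infomax/ML rule derived from (58)
        y = B @ x
        n = len(y)
        return B + mu * (np.eye(n) - np.outer(phi(y), y)) @ B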

5.3 The FastICA Algorithm

The FastICA algorithm was proposed in [15] as a fast, batch type algorithm for performing ICA. Assuming whitened data and denoting by Q the orthogonal matrix of (16), FastICA can be summarized as follows [15, 14]:

1. Center the data by subtracting the mean, x ← x − E[x].
2. Whiten the data,

    z = W x,    (59)

   with W defined in (13).
3. Choose a random orthogonal matrix Q.
4. Compute

    y = Q z.    (60)

5. Update Q as

    Q ← Q + diag(α_i) [ diag(β_i) + E[ g(y) y^H ] ] Q,    (61)

   with

    β_i = −E[ y_i g(y_i) ],    (62)
    α_i = −1 / ( β_i − E[ g'(y_i) ] ).    (63)

6. Orthogonalize Q (symmetric decorrelation),

    Q ← ( Q Q^H )^{-1/2} Q.    (64)

7. If not converged, go to Step 4.

The nonlinearity g can be chosen as [14]

    g(u) = u³,    g(u) = tanh(u),    g(u) = u exp( −u²/2 ).    (65)

FastICA can be applied without modification to super- and sub-kurtic sources and has very fast convergence, since it is a Newton-type algorithm. In Section 6 we will show some results using the FastICA implementation made available for download by the ICA research group at the Helsinki University of Technology.
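A compact sketch of steps 3-7 is given below. It uses the basic one-unit fixed-point update of [15] applied to all rows simultaneously, followed by the symmetric orthogonalization (64), rather than the stabilized form (61)-(63); tanh is one of the nonlinearities listed in (65).

    import numpy as np

    def fastica_symmetric(z, n_iter=200, tol=1e-6,
                          g=np.tanh, dg=lambda u: 1.0 - np.tanh(u)**2):
        # symmetric FastICA on whitened data z (n x T); returns the rotation Q
        n, T = z.shape
        rng = np.random.default_rng(0)
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal start
        for _ in range(n_iter):
            y = Q @ z                                      # eq. (60)
            # fixed-point update: w_i <- E{z g(w_i^T z)} - E{g'(w_i^T z)} w_i
            Q_new = (g(y) @ z.T) / T - np.diag(dg(y).mean(axis=1)) @ Q
            u, _, vt = np.linalg.svd(Q_new, full_matrices=False)
            Q_new = u @ vt                                 # orthogonalization, eq. (64)
            if np.max(np.abs(np.abs(np.diag(Q_new @ Q.T)) - 1.0)) < tol:
                return Q_new
            Q = Q_new
        return Q                                           # estimated sources: Q @ z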

6 Experiments

The ability of several widely used ICA algorithms to recover the independent components from unknown mixtures is analyzed next. Simulations with real data extracted from images were performed using the Jade, MaxKurt and FastICA algorithms. The pixels of several images play the role of the one-dimensional independent components. For instance, assume that I_1 and I_2 are two images having the same dimensions. Then the independent data is

    S = [ vec(I_1), vec(I_2) ]^T.    (66)

A random matrix with positive entries plays the role of the mixing matrix A in the ICA model. The mixed signals are then obtained as

    X = A S.    (67)

The data X is subsequently analyzed with ICA. In Figure 3 the independent and the mixed signals obtained from two independent images containing human faces are displayed. Since there is a scale uncertainty, we have rescaled the recovered signals in order to cover all 256 gray levels [12]. The sign uncertainty in the recovered signals was resolved by using the a priori information about the positive elements of the mixing matrix A.

The Jade and MaxKurt algorithms were implemented by the author using the guidelines from the sources available from cardoso/guidesepsou.html; Cardoso's web site also contains numerous links to papers and people involved in ICA research. The FastICA algorithm developed by the Independent Component Analysis research group at the Helsinki University of Technology was also tested, using the nonlinearity g(u) = u³ (see Section 5.3). The EASI adaptive algorithm was also tested; however, its convergence on the real data used here was extremely slow and its performance rather poor.

The results of performing ICA on real data are presented in Figures 4, 6, 8, 10 and 11. We have noticed that Jade and FastICA give very close results, with FastICA usually being somewhat faster. MaxKurt also gave good results when the exact sign of the kurtosis of the sources is known. A histogram of an image can reveal whether its p.d.f. has long or short tails, and this visual information can help in deciding on the sign of the kurtosis; the importance of the correct sign can be observed in Figure 6.

An interesting conclusion is illustrated in Figure 9: when the data does not cooperate, all three algorithms fail to recover the independent signals perfectly, while for good data all three give fairly good estimates. An experiment containing three independent sources is presented in Figure 11. MaxKurt failed to recover the original images for both positive and negative kurtosis signs, and Jade does a slightly better job of recovering the independent components.

7 Conclusion

We have investigated several widely used approaches for performing ICA and have successfully recovered the original images from mixtures of two and three images.

Figure 3: Images containing human faces. (a), (b) originals; (c), (d) mixed images.

Figure 4: The result of ICA on the images from Figure 3. (a), (b) Jade algorithm; (c), (d) MaxKurt/MinKurt assuming positive kurtosis; (e), (f) FastICA.

Figure 5: Natural images. (a), (b) originals; (c), (d) mixed images.

Figure 6: The result of ICA on the images from Figure 5. (a), (b) Jade algorithm; (c), (d) MaxKurt/MinKurt assuming positive kurtosis; (e), (f) MaxKurt/MinKurt assuming negative kurtosis.

Figure 7: Interior images (PUMA set). (a), (b) originals; (c), (d) mixed images.

Figure 8: The result of ICA on the images from Figure 7. (a), (b) Jade algorithm; (c), (d) MaxKurt/MinKurt assuming positive kurtosis; (e), (f) FastICA.

Figure 9: Baboon and human images. (a), (b) originals; (c), (d) mixed images.

Figure 10: The result of ICA on the images from Figure 9. (a), (b) Jade algorithm; (c), (d) MaxKurt/MinKurt assuming positive kurtosis; (e), (f) FastICA.

Figure 11: The result of ICA using 3 overlapped images. (a) the mixed images; (b) output of the Jade algorithm; (c) output of the FastICA algorithm.

References

[1] Anthony J. Bell and Terrence J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, Vol. 7, No. 6, pp. 1129-1159, 1995.

[2] Jean-François Cardoso, "High-order contrasts for independent component analysis," Neural Computation, Vol. 11, No. 1, pp. 157-192, 1999.

[3] Jean-François Cardoso and Antoine Souloumiac, "Jacobi angles for simultaneous diagonalization," SIAM Journal on Matrix Analysis and Applications, Vol. 17, No. 1, 1996.

[4] Jean-François Cardoso and Antoine Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proceedings-F, Vol. 140, No. 6, pp. 362-370, 1993.

[5] Jean-François Cardoso and Beate Laheld, "Equivariant adaptive source separation," IEEE Transactions on Signal Processing, Vol. 44, No. 12, pp. 3017-3030, 1996.

[6] Jean-François Cardoso, "Blind signal separation: statistical principles," Proceedings of the IEEE, special issue on blind identification and estimation (R.-W. Liu and L. Tong, eds.), Vol. 86, No. 10, pp. 2009-2025, 1998.

[7] Jean-François Cardoso, "Infomax and maximum likelihood for source separation," IEEE Signal Processing Letters, Vol. 4, No. 4, pp. 112-114, 1997.

[8] Jean-François Cardoso, "Multidimensional independent component analysis," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), Seattle, 1998.

[9] Pierre Comon, "Independent component analysis, a new concept?," Signal Processing, special issue on higher-order statistics, Vol. 36, No. 3, pp. 287-314, 1994.

[10] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications, John Wiley, 1991.

[11] G. Desodt and D. Muller, "Complex independent component analysis applied to the separation of radar signals," Proc. EUSIPCO, Barcelona, 1990.

[12] Hany Farid and Edward H. Adelson, "Separating reflections from images using independent component analysis," Journal of the Optical Society of America A, Vol. 16, No. 9, 1999.

[13] Alexander Graham, Kronecker Products and Matrix Calculus with Applications, Ellis Horwood Series in Mathematics and its Applications, 1981.

[14] Aapo Hyvärinen and Erkki Oja, "Independent component analysis: a tutorial," 1999. Available online.

[15] Aapo Hyvärinen, "A fast fixed-point algorithm for independent component analysis," Neural Computation, Vol. 9, No. 7, pp. 1483-1492, 1997.

[16] Aapo Hyvärinen, "Survey on independent component analysis," Neural Computing Surveys, Vol. 2, pp. 94-128, 1999.

[17] M. C. Jones and Robin Sibson, "What is projection pursuit?," Journal of the Royal Statistical Society A, Vol. 150, Part 1, pp. 1-36, 1987.

[18] Juha Karhunen, "Neural approaches to independent component analysis and source separation," Proc. 4th European Symposium on Artificial Neural Networks (ESANN'96), Bruges, Belgium, April 24-26, 1996.

[19] Juha Karhunen et al., "Applications of neural blind separation to signal and image processing," Proc. IEEE 1997 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), Munich, Germany, April 21-24, 1997.

[20] Petteri Pajunen and Juha Karhunen, "Least-squares methods for blind source separation based on nonlinear PCA," International Journal of Neural Systems, Vol. 8, Nos. 5-6, 1997.

[21] Peter McCullagh, Tensor Methods in Statistics, Monographs on Statistics and Applied Probability, Chapman and Hall, 1987.

[22] Jacob Sheinvald, "On blind beamforming for multiple non-Gaussian signals and the constant-modulus algorithm," IEEE Transactions on Signal Processing, Vol. 46, No. 7, 1998.


Artificial Intelligence Module 2. Feature Selection. Andrea Torsello Artificial Intelligence Module 2 Feature Selection Andrea Torsello We have seen that high dimensional data is hard to classify (curse of dimensionality) Often however, the data does not fill all the space

More information

COS 424: Interacting with Data. Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007

COS 424: Interacting with Data. Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007 COS 424: Interacting ith Data Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007 Recapitulation of Last Lecture In linear regression, e need to avoid adding too much richness to the

More information

ICA Using Kernel Entropy Estimation with NlogN Complexity

ICA Using Kernel Entropy Estimation with NlogN Complexity ICA Using Kernel Entropy Estimation with NlogN Complexity Sarit Shwartz, Michael Zibulevsky, and Yoav Y. Schechner Department of Electrical Engineering Technion - Israel Institute of Technology, Haifa

More information

Principal Component Analysis vs. Independent Component Analysis for Damage Detection

Principal Component Analysis vs. Independent Component Analysis for Damage Detection 6th European Workshop on Structural Health Monitoring - Fr..D.4 Principal Component Analysis vs. Independent Component Analysis for Damage Detection D. A. TIBADUIZA, L. E. MUJICA, M. ANAYA, J. RODELLAR

More information

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum

CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum 1997 19 CHAPTER 3 THE COMMON FACTOR MODEL IN THE POPULATION 3.0. Introduction

More information

A Canonical Genetic Algorithm for Blind Inversion of Linear Channels

A Canonical Genetic Algorithm for Blind Inversion of Linear Channels A Canonical Genetic Algorithm for Blind Inversion of Linear Channels Fernando Rojas, Jordi Solé-Casals, Enric Monte-Moreno 3, Carlos G. Puntonet and Alberto Prieto Computer Architecture and Technology

More information

Independent Component Analysis

Independent Component Analysis Independent Component Analysis Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) ICA Fall 2017 1 / 18 Introduction Independent Component Analysis (ICA) falls under the broader topic of Blind Source

More information

APPLICATION OF INDEPENDENT COMPONENT ANALYSIS TO CHEMICAL REACTIONS. S.Triadaphillou, A. J. Morris and E. B. Martin

APPLICATION OF INDEPENDENT COMPONENT ANALYSIS TO CHEMICAL REACTIONS. S.Triadaphillou, A. J. Morris and E. B. Martin APPLICAION OF INDEPENDEN COMPONEN ANALYSIS O CHEMICAL REACIONS S.riadaphillou, A. J. Morris and E. B. Martin Centre for Process Analytics and Control echnology School of Chemical Engineering and Advanced

More information

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis 84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.

More information

GENERALIZED DEFLATION ALGORITHMS FOR THE BLIND SOURCE-FACTOR SEPARATION OF MIMO-FIR CHANNELS. Mitsuru Kawamoto 1,2 and Yujiro Inouye 1

GENERALIZED DEFLATION ALGORITHMS FOR THE BLIND SOURCE-FACTOR SEPARATION OF MIMO-FIR CHANNELS. Mitsuru Kawamoto 1,2 and Yujiro Inouye 1 GENERALIZED DEFLATION ALGORITHMS FOR THE BLIND SOURCE-FACTOR SEPARATION OF MIMO-FIR CHANNELS Mitsuru Kawamoto,2 and Yuiro Inouye. Dept. of Electronic and Control Systems Engineering, Shimane University,

More information

Lecture VIII Dim. Reduction (I)

Lecture VIII Dim. Reduction (I) Lecture VIII Dim. Reduction (I) Contents: Subset Selection & Shrinkage Ridge regression, Lasso PCA, PCR, PLS Lecture VIII: MLSC - Dr. Sethu Viayakumar Data From Human Movement Measure arm movement and

More information