The squared symmetric FastICA estimator


Jari Miettinen, Klaus Nordhausen, Hannu Oja, Sara Taskinen and Joni Virta

arXiv preprint [math.ST]

Abstract—In this paper we study the theoretical properties of the deflation-based FastICA method, the original symmetric FastICA method, and a modified symmetric FastICA method, here called the squared symmetric FastICA. This modification is obtained by replacing the absolute values in the FastICA objective function by their squares. In the deflation-based case this replacement has no effect on the estimate since the maximization problem stays the same. However, in the symmetric case a novel estimate with unknown properties is obtained. In the paper we review the classic deflation-based and symmetric FastICA approaches and contrast these with the new squared symmetric version of FastICA. We find the estimating equations and derive the asymptotical properties of the squared symmetric FastICA estimator with an arbitrary choice of nonlinearity. Asymptotic variances of the unmixing matrix estimates are then used to compare their efficiencies for large sample sizes, showing that the squared symmetric FastICA estimator outperforms the other two estimators in a wide variety of situations.

Index Terms—Affine equivariance, independent component analysis, limiting normality, minimum distance index

J. Miettinen and S. Taskinen are with the Department of Mathematics and Statistics, 40014 University of Jyväskylä, Finland (e-mail: jari.p.miettinen@jyu.fi). K. Nordhausen, H. Oja and J. Virta are with the Department of Mathematics and Statistics, University of Turku, 20014 Turku, Finland.

I. INTRODUCTION

We assume that a $p$-variate random vector $x = (x_1, \dots, x_p)^T$ follows the basic independent component (IC) model, that is, the components of $x$ are linear mixtures of $p$ mutually independent latent variables in $z = (z_1, \dots, z_p)^T$. The model can then be written as
$$x = \mu + \Omega z, \qquad (1)$$
where $\mu$ is a location shift and $\Omega$ is a full-rank $p \times p$ mixing matrix. In independent component analysis (ICA), the parameter $\mu$ is usually regarded as a nuisance parameter, as the main interest is to find, using a random sample $X = (x_1, \dots, x_n)$ from the distribution of $x$, an estimate of an unmixing matrix $\Gamma$ such that $\Gamma x$ has independent components [2], [3], [7]. Note that versions of (1) also exist where the dimension of $z$ is larger than that of $x$ (the underdetermined case) or the other way around (the overdetermined case), in the latter of which we can simply apply a dimension reduction method at the first stage. In this paper we, however, restrict to the case where $x$ and $z$ are of the same dimension.

The IC model (1) is a semiparametric model in the sense that the marginal distributions of the components $z_1, \dots, z_p$ are unspecified. However, some assumptions on $z$ are needed in order to fix the model: For identifiability of $\Omega$, we need to assume that

(A1) at most one of the components $z_1, \dots, z_p$ is Gaussian [18].

Nevertheless, $\mu$, $\Omega$ and $z$ are still confounded, and the mixing matrix $\Omega$ can be identified only up to the order, the signs, and heterogeneous multiplication of its columns. To fix $\mu$ and the scales of the columns of $\Omega$ we further assume that

(A2) $E(z_i) = 0$ and $E(z_i^2) = 1$ for $i = 1, \dots, p$.

After these assumptions, the order and signs of the columns of $\Omega$ still remain unidentified. For practical data analysis this is, however, often sufficient. The impact of the component order on asymptotics is further discussed in Section III.
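To fix ideas, the following minimal sketch (our own illustration in Python/NumPy; the chosen source distributions, dimensions and names are hypothetical and not part of the paper) draws a sample from the IC model (1) with three standardized independent components and a full-rank mixing matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 3

# Three independent sources, each standardized so that E(z) = 0 and
# E(z^2) = 1, as required by assumption (A2).
z = np.column_stack([
    rng.laplace(size=n) / np.sqrt(2),         # Laplace(0, 1) has variance 2
    rng.uniform(-np.sqrt(3), np.sqrt(3), n),  # variance (2*sqrt(3))**2 / 12 = 1
    rng.exponential(size=n) - 1.0,            # mean 1, variance 1 before shifting
])

Omega = rng.normal(size=(p, p))  # a random matrix is full rank almost surely
mu = np.array([1.0, -2.0, 0.5])  # nuisance location shift
X = mu + z @ Omega.T             # rows of X are observations from model (1)
```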
The solutions to the ICA problem are often formulated as algorithms with two steps. The first step is to whiten the data, and the second step is to find an orthogonal matrix that rotates the whitened data to independent components. In the following we formulate such an algorithm at the population level using the random variable $x$: Let $S(F_x) = \mathrm{Cov}(x)$ denote the covariance matrix of a random vector $x$, where $F_x$ denotes the cumulative distribution function of $x$, and write $x_{st} = S^{-1/2}(F_x)(x - E(x))$ for the standardized (whitened) random vector. Here the square root matrix $S^{-1/2}$ is chosen to be symmetric. The aim of the second step is to find the rows of an orthogonal matrix $U = (u_1, \dots, u_p)^T$, either one by one (deflation-based approach) or simultaneously (symmetric approach).

The symmetric version of the famous FastICA algorithm [6] finds the orthogonal matrix $U$ which maximizes a measure of non-Gaussianity of the rotated components,
$$\sum_{j=1}^{p} \big|E[G(u_j^T x_{st})]\big|,$$
where $G$ is a twice continuously differentiable, nonlinear and nonquadratic function (see Section II-E for more details). In this paper we replace the absolute values by their squares and consider the objective function
$$\sum_{j=1}^{p} \big(E[G(u_j^T x_{st})]\big)^2,$$
as suggested in [19], where the squared symmetric FastICA estimates based on convex combinations of the third and fourth squared cumulants were studied in detail. Notice that replacing the absolute values by their squares in the objective function has been mentioned in [] and [], but the idea was never carried further.

In Section II we formulate unmixing matrix functionals based on the two symmetric approaches and the deflation-based approach. Some statistical properties of the old estimators are recalled in Section III, and the corresponding results for squared symmetric FastICA are derived for the first time for a general function $G$. The efficiencies of the three estimators are compared in Section IV using both asymptotic results and simulations.
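The two-step formulation translates directly into code. A minimal sketch, assuming the data matrix X from the previous snippet (the helper names sym_inv_sqrt, whiten and objectives are ours, not from any FastICA package):

```python
import numpy as np

def sym_inv_sqrt(S):
    """Symmetric inverse square root S^{-1/2} via the eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals**-0.5) @ vecs.T

def whiten(X):
    """First step: x_st = S^{-1/2} (x - E(x)), with sample moments plugged in."""
    Xc = X - X.mean(axis=0)
    W0 = sym_inv_sqrt(np.cov(Xc, rowvar=False))
    return Xc @ W0, W0

def objectives(U, X_st, G):
    """Sample versions of the two objectives for an orthogonal U (rows u_j):
    sum_j |E[G(u_j' x_st)]| (symmetric) and sum_j (E[G(u_j' x_st)])^2
    (squared symmetric)."""
    m = np.mean(G(X_st @ U.T), axis=0)   # estimates of E[G(u_j' x_st)], j = 1..p
    return np.sum(np.abs(m)), np.sum(m**2)
```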

II. FASTICA FUNCTIONALS

In this section we give formal definitions of three different (two old and one new) FastICA unmixing matrix functionals with corresponding estimating equations and algorithms for their computation. The formal definition of the squared symmetric FastICA functional is new. The conditions on the function $G$ that ensure the consistency of the estimates are also discussed.

A. IC functionals

Let again $F_x$ denote the cumulative distribution function of a random vector $x$ obeying the IC model (1), and write $\Gamma(F_x)$ for the value of an unmixing matrix functional at the distribution $F_x$. Due to the ambiguity in model (1), it is natural to require that the separation result $\Gamma(F_x)x = \Gamma(F_z)z$ does not depend on $\mu$ and $\Omega$ and the choice of $z$ in the model specification. This is formalized in the following.

Definition 1. The $p \times p$ matrix-valued functional $\Gamma(F_x)$ is said to be an independent component (IC) functional if
1) $\Gamma(F_x)x$ has independent components for all $x$ in the IC model (1), and
2) $\Gamma(F_x)$ is affine equivariant in the sense that $\Gamma(F_{Ax+b}) = \Gamma(F_x)A^{-1}$ for all nonsingular $p \times p$ matrices $A$, for all $p$-vectors $b$ and for all $x$ (even beyond the IC model).

The condition $\Gamma(F_{Ax+b}) = \Gamma(F_x)A^{-1}$ can be relaxed to be true only up to permutations and sign changes of the rows. The corresponding sample version $\hat\Gamma = \Gamma(X)$ is obtained when the IC functional is applied to the empirical distribution function of $X = (x_1, \dots, x_n)$. Naturally, the estimator is then also affine equivariant in the sense that $\Gamma(AX + b\,1_n^T) = \Gamma(X)A^{-1}$ for all nonsingular $p \times p$ matrices $A$ and for all $p$-vectors $b$. The rest of this section focuses on three specific FastICA functionals. For recent overviews of FastICA and its variants see also [9] and [22].

B. Deflation-based approach

The deflation-based FastICA functional is based on the algorithm proposed in [4] and [6]. In the deflation-based FastICA method the rows of an unmixing matrix are extracted one after another. The method can thus be used in situations where only the few most important components are needed. The statistical properties of the deflation-based method were studied in [16] and [17], where the influence functions and the limiting variances and covariances of the rows of the unmixing matrix were derived.

Assume now that $x$ is an observation from an IC model (1) with mean vector $\mu = E(x)$ and covariance matrix $S = \mathrm{Cov}(x)$. In deflation-based FastICA, the unmixing matrix $\Gamma = (\gamma_1, \dots, \gamma_p)^T$ is estimated so that, after finding $\gamma_1, \dots, \gamma_{j-1}$, the $j$th row vector $\gamma_j$ maximizes a measure of non-Gaussianity
$$\big|E[G(\gamma_j^T(x - E(x)))]\big|$$
under the constraints $\gamma_l^T S \gamma_j = \delta_{lj}$, $l = 1, \dots, j$, where $\delta_{lj}$ is the Kronecker delta, $\delta_{lj} = 1$ ($0$) as $l = j$ ($l \ne j$). The requirements for the function $G$ and the conventional choices of it are discussed in Section II-E. The deflation-based FastICA functional $\Gamma_d$ satisfies the following $p$ estimating equations [15], [17]:

Definition 2. The deflation-based FastICA functional $\Gamma_d = (\gamma_1^d, \dots, \gamma_p^d)^T$ solves the estimating equations
$$T(\gamma_j) = S\Big(\sum_{l=1}^{j} \gamma_l \gamma_l^T\Big) T(\gamma_j), \quad j = 1, \dots, p,$$
where
$$T(\gamma) = E[g(\gamma^T(x - E(x)))(x - E(x))]$$
and $g = G'$.

The estimating equations imply that $\Gamma S \Gamma^T = I_p$, that is, $\Gamma = U S^{-1/2}$ for some orthogonal matrix $U = (u_1, \dots, u_p)^T$. The estimation problem can then be reduced to the estimation of the rows of $U$ one by one. This suggests the following fixed-point algorithm for $u_j$:
$$u_j \leftarrow \Big(I_p - \sum_{l=1}^{j-1} u_l u_l^T\Big) T(u_j), \qquad u_j \leftarrow \|u_j\|^{-1} u_j,$$
where $T(u) = E[g(u^T x_{st})x_{st}]$ and $x_{st}$ is the whitened random variable.
However, this algorithm is unstable, and we recommend the use of the original algorithm [6], a modified Newton-Raphson algorithm, where the first step is
$$u_j \leftarrow E[g(u_j^T x_{st})x_{st}] - E[g'(u_j^T x_{st})]u_j.$$
For the estimate based on the observed data set, all the expectations above are replaced by sample averages, e.g., $E(x)$ is replaced by $\bar x$ and $S$ by the sample covariance matrix $\hat S$.

Notice that neither the estimating equations nor the algorithm fixes the order in which the components are found, and the order to some extent depends on the initial value of the algorithm. Since a change in the estimation order changes the unmixing matrix estimate more than just by permuting its rows, deflation-based FastICA is not affine equivariant if the initial value is chosen randomly. To find an estimate which globally maximizes the objective function at each stage, we propose the following strategy to choose the initial value for the algorithm:
1) Find a preliminary consistent estimator $\Gamma_0$ of $\Gamma$.
2) Find a permutation matrix $P$ such that $|E[G((P\Gamma_0 x)_1)]| \ge \dots \ge |E[G((P\Gamma_0 x)_p)]|$.
3) The orthogonal initial value for $U$ is $P\Gamma_0 S^{1/2}$.
The preliminary estimate in step 1 can be, for example, the k-JADE estimate [10]. This algorithm, as well as all other FastICA algorithms mentioned in this paper, is implemented in the R package fICA [11].
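A sketch of the deflation-based algorithm on whitened data follows, assuming a nonlinearity g and its derivative dg as in Section II-E. Note that this simple version (our own code, with a random initial value) is therefore not affine equivariant in the sense discussed above:

```python
import numpy as np

def deflation_fastica(X_st, g, dg, eps=1e-8, max_iter=200, seed=0):
    """Deflation-based FastICA on whitened data X_st (an n x p array).

    Each row u_j is iterated with the modified Newton-Raphson step
    u <- E[g(u' x_st) x_st] - E[g'(u' x_st)] u, then orthogonalized
    against the rows found earlier and normalized to unit length.
    """
    rng = np.random.default_rng(seed)
    n, p = X_st.shape
    U = np.zeros((p, p))
    for j in range(p):
        u = rng.normal(size=p)
        u /= np.linalg.norm(u)
        for _ in range(max_iter):
            y = X_st @ u
            u_new = (X_st * g(y)[:, None]).mean(axis=0) - dg(y).mean() * u
            u_new -= U[:j].T @ (U[:j] @ u_new)   # project out u_1, ..., u_{j-1}
            u_new /= np.linalg.norm(u_new)
            done = min(np.linalg.norm(u_new - u),
                       np.linalg.norm(u_new + u)) < eps   # sign-invariant check
            u = u_new
            if done:
                break
        U[j] = u
    return U   # the unmixing matrix estimate is then U @ S^{-1/2}
```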

The extraction order of the components is highly important not only for the affine equivariance of the estimate but also for its efficiency. In the deflationary approach, accurate estimation of the first components can be shown to have a direct impact on accurate estimation of the last components as well. [15] discussed the extraction order and the estimation efficiency and introduced the so-called reloaded deflation-based FastICA, where the extraction order is based on the minimization of the sum of the asymptotic variances, see Section III. [13] discussed the estimate that uses different $G$-functions for different components. Different versions of the algorithm and their performance analysis are presented, for example, in [23], [24].

C. Symmetric approach

In the symmetric FastICA approach, the rows of $\Gamma = (\gamma_1, \dots, \gamma_p)^T$ are found simultaneously by maximizing
$$\sum_{j=1}^{p} \big|E[G(\gamma_j^T(x - E(x)))]\big|$$
under the constraint $\Gamma S \Gamma^T = I_p$. The unmixing matrix $\Gamma$ optimizes the Lagrangian function
$$L(\Gamma, \Theta) = \sum_{j=1}^{p} \big|E[G(\gamma_j^T(x - E(x)))]\big| - \sum_{j=1}^{p} \theta_{jj}(\gamma_j^T S \gamma_j - 1) - \sum_{j=1}^{p-1}\sum_{l=j+1}^{p} \theta_{lj}\, \gamma_l^T S \gamma_j,$$
where the symmetric matrix $\Theta = [\theta_{lj}]$ contains the $p(p+1)/2$ Lagrangian multipliers. Differentiating the above function with respect to $\gamma_j$ and setting the derivative to zero yields
$$E[g(\gamma_j^T(x - E(x)))(x - E(x))]\, s_j = 2\theta_{jj} S\gamma_j + \sum_{l<j} \theta_{lj} S\gamma_l + \sum_{l>j} \theta_{jl} S\gamma_l,$$
where $g = G'$ and $s_j = \mathrm{sign}(E[G(\gamma_j^T(x - E(x)))])$. Then by multiplying both sides by $\gamma_l^T$ we obtain
$$\gamma_l^T E[g(\gamma_j^T(x - E(x)))(x - E(x))]\, s_j = \theta_{lj} \quad \text{for } l < j,$$
and
$$\gamma_l^T E[g(\gamma_j^T(x - E(x)))(x - E(x))]\, s_j = \theta_{jl} \quad \text{for } l > j.$$
Hence the solution $\Gamma$ must satisfy the following estimating equations.

Definition 3. The symmetric FastICA functional $\Gamma_s = (\gamma_1^s, \dots, \gamma_p^s)^T$ solves the estimating equations
$$\gamma_l^T T(\gamma_j)\, s_j = \gamma_j^T T(\gamma_l)\, s_l \quad \text{and} \quad \gamma_l^T S \gamma_j = \delta_{lj},$$
where $j, l = 1, \dots, p$, $T(\gamma) = E[g(\gamma^T(x - E(x)))(x - E(x))]$, $g = G'$, $s_j = \mathrm{sign}(E[G(\gamma_j^T(x - E(x)))])$ and $\delta_{lj}$ is the Kronecker delta.

Again, $\Gamma = US^{-1/2}$ for some orthogonal matrix $U$. Then the estimating equations for $U$ are
$$u_l^T T(u_j)\, s_j = u_j^T T(u_l)\, s_l \quad \text{and} \quad u_l^T u_j = \delta_{lj},$$
where $l, j = 1, \dots, p$ and $T(u) = E[g(u^T x_{st})x_{st}]$, and the equations suggest the following fixed-point algorithm for $U$:
$$T \leftarrow (T(u_1), \dots, T(u_p))^T, \qquad U \leftarrow (T T^T)^{-1/2}\, T.$$
As in the deflation-based approach, a more stable algorithm is obtained when $T(u_j)$ is replaced by
$$\tilde T(u_j) = E[g(u_j^T x_{st})x_{st}] - E[g'(u_j^T x_{st})]u_j.$$
In symmetric FastICA, different initial values give identical unmixing matrix estimates up to the order and signs of the rows.

D. Squared symmetric approach

In squared symmetric FastICA, the absolute values in the objective function of the regular symmetric FastICA are replaced by squares [19]. The squared symmetric FastICA functional $\Gamma_{s2} = (\gamma_1^{s2}, \dots, \gamma_p^{s2})^T$ maximizes
$$\sum_{j=1}^{p} \big(E[G(\gamma_j^T(x - E(x)))]\big)^2$$
under the constraint $\Gamma S \Gamma^T = I_p$. Similarly as in Section II-C, the Lagrange multiplier method yields the following estimating equations.

Definition 4. The squared symmetric FastICA functional $\Gamma_{s2} = (\gamma_1^{s2}, \dots, \gamma_p^{s2})^T$ solves the estimating equations
$$\gamma_l^T T_2(\gamma_j) = \gamma_j^T T_2(\gamma_l) \quad \text{and} \quad \gamma_l^T S \gamma_j = \delta_{lj},$$
where $j, l = 1, \dots, p$,
$$T_2(\gamma) = E[G(\gamma^T(x - E(x)))]\; E[g(\gamma^T(x - E(x)))(x - E(x))],$$
$g = G'$ and $\delta_{lj}$ is the Kronecker delta.

The estimating equations for $U$ are
$$u_l^T T_2(u_j) = u_j^T T_2(u_l) \quad \text{and} \quad u_l^T u_j = \delta_{lj}, \quad l, j = 1, \dots, p,$$
where $T_2(u) = E[G(u^T x_{st})]\, E[g(u^T x_{st})x_{st}]$. The following algorithm, which is based on the same idea as the algorithm for symmetric FastICA, can be used to find the solution in practice:
$$T_2 \leftarrow (\tilde T_2(u_1), \dots, \tilde T_2(u_p))^T, \qquad U \leftarrow (T_2 T_2^T)^{-1/2}\, T_2,$$
where
$$\tilde T_2(u) = E[G(u^T x_{st})]\,\{E[g(u^T x_{st})x_{st}] - E[g'(u^T x_{st})]u\}.$$
Notice that $\tilde T_2(u) = E[G(u^T x_{st})]\, \tilde T(u)$, and hence the squared symmetric FastICA estimator can be seen as a weighted classical symmetric FastICA estimator. The more non-Gaussian, as measured by the function $G$, an independent component is, the more impact it has in the orthogonalization step.
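Both symmetric variants differ from the deflation-based algorithm only in that all rows are updated at once and then orthogonalized together. A sketch, under the same assumptions and naming conventions as the earlier snippets (the flag squared switches between the classical and the squared symmetric update):

```python
import numpy as np

def sym_orth(W):
    """Symmetric orthogonalization W -> (W W')^{-1/2} W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(vals**-0.5) @ vecs.T @ W

def symmetric_fastica(X_st, G, g, dg, squared=False,
                      eps=1e-8, max_iter=200, seed=0):
    """Symmetric (squared=False) or squared symmetric (squared=True)
    FastICA on whitened data X_st; all rows are updated simultaneously
    with the stable step T~(u_j) and then orthogonalized together."""
    rng = np.random.default_rng(seed)
    n, p = X_st.shape
    U = sym_orth(rng.normal(size=(p, p)))
    for _ in range(max_iter):
        Y = X_st @ U.T                # column j holds u_j' x_st for all rows of X_st
        # Row j of T is T~(u_j) = E[g(u_j' x_st) x_st] - E[g'(u_j' x_st)] u_j:
        T = (g(Y).T @ X_st) / n - dg(Y).mean(axis=0)[:, None] * U
        if squared:                   # weight row j by hat E[G(u_j' x_st)]
            T = np.mean(G(Y), axis=0)[:, None] * T
        U_new = sym_orth(T)
        if np.linalg.norm(np.abs(U_new @ U.T) - np.eye(p)) < eps:
            return U_new
        U = U_new
    return U
```

The only difference between the two estimators in this sketch is the single weighting line, which makes concrete the interpretation of the squared symmetric estimator as a weighted symmetric FastICA.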
E. Function G

The function $G$ is required to be a twice continuously differentiable, nonlinear and nonquadratic function such that $E[G(z)] = 0$ when $z$ is a standard Gaussian random variable. The derivative function $g = G'$ is the so-called nonlinearity. The use of classical kurtosis as a measure of non-Gaussianity is given by the nonlinearity function $g(z) = z^3$ (pow) [4]. Other popular choices include $g(z) = \tanh(az)$ (tanh) and $g(z) = z\exp(-az^2/2)$ (gaus) with tuning parameter $a$, as suggested in [6], and $g(z) = z^2$ (skew).
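In code, each nonlinearity comes as a pair $(G, g)$. A sketch with our own naming; the Gaussian centering constants are exact for pow, gaus and skew, while for tanh with $a = 1$ we plug in a numerical approximation of $E[\log\cosh(z)]$:

```python
import numpy as np

A = 1.0            # tuning parameter a for tanh and gaus
EG_TANH = 0.3746   # numerical value of E[log cosh(z)] for standard Gaussian z, a = 1

def G_pow(z):  return (z**4 - 3.0) / 4.0            # E[z^4] = 3 for Gaussian z
def g_pow(z):  return z**3

def G_tanh(z): return np.log(np.cosh(A * z)) / A - EG_TANH
def g_tanh(z): return np.tanh(A * z)

# For Gaussian z, E[exp(-a z^2 / 2)] = (1 + a)^(-1/2), used for centering:
def G_gaus(z): return -(np.exp(-A * z**2 / 2.0) - (1.0 + A)**-0.5) / A
def g_gaus(z): return z * np.exp(-A * z**2 / 2.0)

def G_skew(z): return z**3 / 3.0                    # E[z^3] = 0 for Gaussian z
def g_skew(z): return z**2
```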

The deflation-based, symmetric and squared symmetric FastICA estimators need extra conditions on $G$ to ensure the consistency of the estimation procedure: One then requires that, for any bivariate $z = (z_1, z_2)^T$ with independent and standardized components ($E(z) = 0$ and $\mathrm{Cov}(z) = I_2$) and for any orthogonal matrix $U = (u_1, u_2)^T$,

(def) $\big|E[G(u_1^T z)]\big| \le \max\big(|E[G(z_1)]|, |E[G(z_2)]|\big)$,

(sym) $\big|E[G(u_1^T z)]\big| + \big|E[G(u_2^T z)]\big| \le |E[G(z_1)]| + |E[G(z_2)]|$,

(sym2) $\big(E[G(u_1^T z)]\big)^2 + \big(E[G(u_2^T z)]\big)^2 \le \big(E[G(z_1)]\big)^2 + \big(E[G(z_2)]\big)^2$.

[14] and [19] proved that for pow and skew (as well as for their convex combinations), all three conditions are satisfied. On the contrary, tanh and gaus do not satisfy the conditions for all choices of the distributions of $z_1$ and $z_2$. For these two nonlinearities, [20] found bimodal distributions for which the fixed points of the deflation-based FastICA algorithm are not correct solutions of the IC problem. In Figure 1 we plot the density functions of random variables $z_1$ and $z_2$ which serve as examples of a case where none of the three inequalities holds for gaus.

Fig. 1. Density functions of $z_1$ and $z_2$, which violate the conditions (def), (sym) and (sym2) with the nonlinearity gaus. Both distributions are mixtures of four Gaussian distributions. For more details, see the Appendix.

These examples should, however, be seen as rare and artificial exceptions, and FastICA with tanh and gaus satisfies the conditions for most pairs of distributions of $z$ we have checked. For example, in Section IV-C FastICA with tanh worked as expected under a wide variety of source distributions. Deflation-based or symmetric FastICA with tanh is perhaps the most popular unmixing matrix estimate. See Section IV-B for the optimal choice of the nonlinearity for a component with a known density function.

III. ASYMPTOTICAL PROPERTIES OF THE FASTICA ESTIMATORS

The limiting variances and the asymptotic multinormality of the deflation-based and symmetric FastICA unmixing matrix estimators were found quite recently in [13], [15], [17] and [21]. In this section, we review these findings and derive the corresponding results for the squared symmetric FastICA estimator.

Let now $X = (x_1, \dots, x_n)$ be a random sample from the distribution of $x$ following the IC model (1). The deflation-based, symmetric and squared symmetric FastICA estimators $\hat\Gamma_d$, $\hat\Gamma_s$ and $\hat\Gamma_{s2}$ are then obtained when the three functionals are applied to the empirical distribution of $X$. Due to affine equivariance, we can in the following assume without loss of generality that $\Omega = I_p$. Before proceeding we need to make some additional assumptions on the distribution of $z_i = (z_{i1}, \dots, z_{ip})^T$, namely,

(A3) the fourth moments $\beta_j = E[z_{ij}^4]$ as well as the expected values $\nu_j = E[G(z_{ij})]$, $\mu_j = E[g(z_{ij})]$, $\sigma_j^2 = \mathrm{Var}[g(z_{ij})]$, $\lambda_j = E[g(z_{ij})z_{ij}]$, $\delta_j = E[g'(z_{ij})]$ and $\tau_j = E[g'(z_{ij})z_{ij}]$ exist.

Write also $s_j = \mathrm{sign}(\nu_j)$. Write now
$$\hat q_j = n^{-1}\sum_{i=1}^{n} (g(z_{ij}) - \mu_j)z_i \quad \text{and} \quad \hat{\tilde q}_j = \Big(n^{-1}\sum_{i=1}^{n} G(z_{ij})\Big)\, n^{-1}\sum_{i=1}^{n} (g(z_{ij}) - \mu_j)z_i$$
for $j = 1, \dots, p$. To avoid division by zero in the following theorem, assume that $\nu_j(\lambda_j - \delta_j) \ge 0$ for all $j = 1, \dots, p$, with equality for at most one $j$; see [], who stated that $\nu_j(\lambda_j - \delta_j) > 0$ for most of the reasonable functions $G$ and distributions of $z_{ij}$. For pow, $\nu_j(\lambda_j - \delta_j) > 0$ for any distribution with $E(z_{ij}^4) \ne 3$.

The limiting behavior of the deflation-based FastICA estimate was first given in [17]. The corresponding results for the symmetric FastICA estimates are given in the following.
The result (iii) is proved in the Appendix, and the proof of (ii) is essentially similar to it. In the following theorem, $e_i$ is a $p$-vector with the $i$th element one and the others zero, and $o_P(1)$ denotes a random variable that converges in probability to zero as $n$ goes to infinity.

Theorem 1. Let $X = (x_1, \dots, x_n)$ be a random sample from the IC model (1) satisfying the assumptions (A1)-(A3). If $\Omega = I_p$, then there exist sequences of solutions $\hat\Gamma_d$, $\hat\Gamma_s$ and $\hat\Gamma_{s2}$ converging to $I_p$ such that

(i) (deflation-based)
$$\sqrt{n}\,\hat\gamma^d_{jl} = -\sqrt{n}\,\hat\gamma^d_{lj} - \sqrt{n}\,\hat S_{jl} + o_P(1), \quad l < j,$$
$$\sqrt{n}\,(\hat\gamma^d_{jj} - 1) = -\tfrac12 \sqrt{n}\,(\hat S_{jj} - 1) + o_P(1), \quad l = j,$$
$$\sqrt{n}\,\hat\gamma^d_{jl} = \frac{e_l^T \sqrt{n}\,\hat q_j - \lambda_j \sqrt{n}\,\hat S_{jl}}{\lambda_j - \delta_j} + o_P(1), \quad l > j,$$

(ii) (symmetric)
$$\sqrt{n}\,(\hat\gamma^s_{jj} - 1) = -\tfrac12 \sqrt{n}\,(\hat S_{jj} - 1) + o_P(1), \quad l = j,$$
$$\sqrt{n}\,\hat\gamma^s_{jl} = \frac{e_l^T \sqrt{n}\,\hat q_j\, s_j - e_j^T \sqrt{n}\,\hat q_l\, s_l - (\lambda_j s_j - \delta_l s_l)\sqrt{n}\,\hat S_{jl}}{(\lambda_j - \delta_j)s_j + (\lambda_l - \delta_l)s_l} + o_P(1), \quad l \ne j,$$

(iii) (squared symmetric)
$$\sqrt{n}\,(\hat\gamma^{s2}_{jj} - 1) = -\tfrac12 \sqrt{n}\,(\hat S_{jj} - 1) + o_P(1), \quad l = j,$$
$$\sqrt{n}\,\hat\gamma^{s2}_{jl} = \frac{e_l^T \sqrt{n}\,\hat{\tilde q}_j - e_j^T \sqrt{n}\,\hat{\tilde q}_l + (\nu_l\delta_l - \nu_j\lambda_j)\sqrt{n}\,\hat S_{jl}}{\nu_j(\lambda_j - \delta_j) + \nu_l(\lambda_l - \delta_l)} + o_P(1), \quad l \ne j.$$

For the asymptotical properties of deflation-based FastICA for several nonlinearities $g$, see [13]. As seen from Theorem 1 (i), the limiting distributions of the vectors $\hat\gamma^d_1, \dots, \hat\gamma^d_p$ depend on the order in which they are found. It is shown in Corollary 2 that, for $j < l$, the asymptotic variances of $\hat\gamma^d_{lj}$ and $\hat\gamma^d_{jl}$ depend only on the distribution of the $j$th independent component. The limiting distributions of the diagonal elements do not depend on the method or the chosen nonlinearity $g$. [19] discovered that the squared symmetric FastICA estimator with the pow nonlinearity has the same asymptotics as the JADE (joint approximate diagonalization of eigenmatrices) estimator [1]. We then have the following straightforward but important corollaries.

Corollary 1. Under the assumptions of Theorem 1, if the joint limiting distribution of $\sqrt{n}\,\hat q_{jl}$ and $\sqrt{n}\,\hat{\tilde q}_{jl}$ for $j \ne l = 1, \dots, p$ (where $\hat q_{jl} = e_l^T \hat q_j$ and $\hat{\tilde q}_{jl} = e_l^T \hat{\tilde q}_j$) and of $\sqrt{n}(\hat S_{jl} - \delta_{jl})$ for $j, l = 1, \dots, p$ is a multivariate normal distribution, then also the limiting distributions of $\sqrt{n}\,\mathrm{vec}(\hat\Gamma_d - I_p)$, $\sqrt{n}\,\mathrm{vec}(\hat\Gamma_s - I_p)$ and $\sqrt{n}\,\mathrm{vec}(\hat\Gamma_{s2} - I_p)$ are multivariate normal.

Corollary 2. Under the assumptions of Theorem 1, the asymptotic covariance matrices (ASV) of the $j$th source vectors are given by
$$ASV(\hat\gamma^d_j) = \sum_{l=1}^{p} ASV(\hat\gamma^d_{jl})\, e_l e_l^T, \quad ASV(\hat\gamma^s_j) = \sum_{l=1}^{p} ASV(\hat\gamma^s_{jl})\, e_l e_l^T, \quad ASV(\hat\gamma^{s2}_j) = \sum_{l=1}^{p} ASV(\hat\gamma^{s2}_{jl})\, e_l e_l^T,$$
where

(i) (deflation-based)
$$ASV(\hat\gamma^d_{jl}) = \frac{\sigma_l^2 - \lambda_l^2}{(\lambda_l - \delta_l)^2} + 1, \ l < j; \qquad ASV(\hat\gamma^d_{jj}) = \frac{\beta_j - 1}{4}, \ l = j; \qquad ASV(\hat\gamma^d_{jl}) = \frac{\sigma_j^2 - \lambda_j^2}{(\lambda_j - \delta_j)^2}, \ l > j,$$

(ii) (symmetric)
$$ASV(\hat\gamma^s_{jj}) = \frac{\beta_j - 1}{4}, \ l = j; \qquad ASV(\hat\gamma^s_{jl}) = \frac{\sigma_j^2 + \sigma_l^2 - \lambda_j^2 + \delta_l(\delta_l - 2\lambda_l)}{\big((\lambda_j - \delta_j)s_j + (\lambda_l - \delta_l)s_l\big)^2}, \ l \ne j,$$

(iii) (squared symmetric)
$$ASV(\hat\gamma^{s2}_{jj}) = \frac{\beta_j - 1}{4}, \ l = j; \qquad ASV(\hat\gamma^{s2}_{jl}) = \frac{\nu_j^2(\sigma_j^2 - \lambda_j^2) + \nu_l^2\big(\sigma_l^2 + \delta_l(\delta_l - 2\lambda_l)\big)}{\big(\nu_j(\lambda_j - \delta_j) + \nu_l(\lambda_l - \delta_l)\big)^2}, \ l \ne j.$$

The asymptotic variances of the deflation-based and symmetric FastICA estimators were first derived in [17] and [21], respectively. The asymptotic covariance matrices of the FastICA estimators for given marginal densities can be computed using the R package BSSasymp [12].

IV. EFFICIENCY COMPARISONS

The asymptotical results derived in Section III allow us to evaluate and compare the performances of the FastICA methods. In this section the asymptotic and finite sample efficiencies of the deflation-based and symmetric FastICA estimators are compared to those of the squared symmetric FastICA estimator using a wide range of distributions with varying skewness and kurtosis values.

A. Performance index

We measure the finite sample performance of the unmixing matrix estimates using the minimum distance index [8]
$$\hat D = D(\hat\Gamma\Omega) = \frac{1}{\sqrt{p-1}} \inf_{C \in \mathcal{C}} \|C\hat\Gamma\Omega - I_p\|, \qquad (2)$$
where $\|\cdot\|$ is the matrix (Frobenius) norm and $\mathcal{C}$ is the set of $p \times p$ matrices with exactly one non-zero element in each column and each row. The minimum distance index is scaled so that $0 \le \hat D \le 1$. If $\Omega = I_p$ and $\sqrt{n}\,\mathrm{vec}(\hat\Gamma - I_p) \to N_{p^2}(0, \Sigma)$, then the limiting distribution of $n(p-1)\hat D^2$ is that of a weighted sum of independent chi-squared variables with the expected value
$$\mathrm{trace}\big[(I_{p^2} - D_{p,p})\,\Sigma\,(I_{p^2} - D_{p,p})\big], \qquad (3)$$
where $D_{p,p} = \sum_i (e_i e_i^T) \otimes (e_i e_i^T)$ and $\otimes$ denotes the Kronecker product. Notice that (3) equals the sum of the limiting variances of the off-diagonal elements of $\sqrt{n}\,\mathrm{vec}(\hat\Gamma - I_p)$, and therefore
$$\sum_{j=1}^{p-1}\sum_{l=j+1}^{p} \big(ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj})\big) \qquad (4)$$
provides a global measure of the variation of the estimate $\hat\Gamma$.
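The infimum in (2) can be computed exactly: for a fixed matching of rows to coordinate axes the optimal row rescalings have a closed form, and the matching itself is a linear assignment problem. A sketch (our own reduction and naming; it requires SciPy):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def md_index(Gamma_hat, Omega):
    """Minimum distance index (2) of an unmixing matrix estimate.

    For row j matched to axis k with scaling c, ||c g - e_k||^2 is minimized
    at c = g_k / ||g||^2 with value 1 - g_k^2 / ||g||^2, so the infimum over C
    maximizes the total matched 'profile' mass over permutations.
    """
    G = Gamma_hat @ Omega
    p = G.shape[0]
    prof = G**2 / np.sum(G**2, axis=1, keepdims=True)
    rows, cols = linear_sum_assignment(-prof)          # maximize matched mass
    return np.sqrt(max(p - prof[rows, cols].sum(), 0.0) / (p - 1))
```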
B. Asymptotic efficiency

Let $f_j$ be the density function and $g_j = -f_j'/f_j$ the optimal location score function for the $j$th independent component $z_j$. Also let $I_j = \mathrm{Var}(g_j(z_j))$ be the Fisher information number for the location problem. Write
$$\alpha_j := \frac{\sigma_j^2 - \lambda_j^2}{(\lambda_j - \delta_j)^2} = \big[(I_j - 1)\,\rho^2_{g(z_j)g_j(z_j)\cdot z_j}\big]^{-1},$$
where $\rho^2_{g(z_j)g_j(z_j)\cdot z_j}$ is the squared partial correlation between $g(z_j)$ and $g_j(z_j)$ given $z_j$.
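The constants of assumption (A3), and hence $\alpha_j$ and the ASV expressions of Corollary 2, are easy to approximate by Monte Carlo from a large sample of a candidate source distribution. A sketch with our own function names:

```python
import numpy as np

def fastica_constants(z, G, g, dg):
    """Monte Carlo estimates of the (A3) quantities for one source, from a
    large sample z of that source: nu, lambda, delta, sigma^2 and beta."""
    return {"nu": np.mean(G(z)), "lam": np.mean(g(z) * z),
            "delta": np.mean(dg(z)), "sig2": np.var(g(z)),
            "beta": np.mean(z**4)}

def alpha(c):
    """alpha_j = (sigma_j^2 - lambda_j^2) / (lambda_j - delta_j)^2."""
    return (c["sig2"] - c["lam"]**2) / (c["lam"] - c["delta"])**2

def asv_sq_offdiag(cj, cl):
    """ASV of an off-diagonal element for the squared symmetric estimator,
    following Corollary 2 (iii)."""
    num = cj["nu"]**2 * (cj["sig2"] - cj["lam"]**2) \
        + cl["nu"]**2 * (cl["sig2"] + cl["delta"] * (cl["delta"] - 2.0 * cl["lam"]))
    den = (cj["nu"] * (cj["lam"] - cj["delta"])
           + cl["nu"] * (cl["lam"] - cl["delta"]))**2
    return num / den
```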

Then we have the following.

Theorem 2. For our three estimates and for non-Gaussian $z_j$ and $z_l$, $j \ne l$, $ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj})$ is
$$\Big(\frac{\tilde\beta_j}{\tilde\beta_j + \tilde\beta_l}\Big)^2 (2\alpha_j + 1) + \Big(\frac{\tilde\beta_l}{\tilde\beta_j + \tilde\beta_l}\Big)^2 (2\alpha_l + 1),$$
where (writing $\tilde\beta$ with a tilde to avoid confusion with the fourth moment $\beta_j$)
$$(\tilde\beta_j, \tilde\beta_l) = (1, 0), \quad \big(s_j(\lambda_j - \delta_j),\, s_l(\lambda_l - \delta_l)\big) \quad \text{and} \quad \big(\nu_j(\lambda_j - \delta_j),\, \nu_l(\lambda_l - \delta_l)\big)$$
for the deflation-based, symmetric and squared symmetric FastICA estimates, respectively.

Notice first that the value of $ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj})$ only depends on the $j$th and $l$th marginal distributions, which means that we can restrict the comparison to bivariate distributions, as the other components have no impact. If the $j$th and $l$th marginal distributions are the same, then the three values of $ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj})$ are $2\alpha_j + 1$, $\alpha_j + 1/2$ and $\alpha_j + 1/2$, and these are minimized with the choice $g = g_j$. So, if $z_1, \dots, z_p$ are identically distributed with the density function $f$, then the optimal choice for $g$ is $-f'/f$. If the $l$th component is Gaussian, then $\nu_l = 0$ and $\lambda_l = \delta_l$, and for the deflation-based and squared symmetric FastICA estimates
$$ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj}) = 2\alpha_j + 1,$$
while for the symmetric FastICA estimate one gets
$$ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj}) = 2\alpha_j + 1 + \frac{2(\sigma_l^2 - \lambda_l^2)}{(\lambda_j - \delta_j)^2} = 2\alpha_j + 1 + \frac{2\sigma_l^2\big(1 - \rho^2_{g(z_{il})z_{il}}\big)}{(\lambda_j - \delta_j)^2},$$
where $\rho_{g(z_{il})z_{il}}$ is the correlation between $g(z_{il})$ and $z_{il}$. The symmetric FastICA estimator is therefore always the poorest in this case.

For further comparison of the estimators we use two families of source distributions, the standardized exponential power distribution family and the standardized gamma distribution family. The density function of the standardized exponential power distribution with shape parameter $\beta$ is
$$f(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\{-(|x|/\alpha)^\beta\},$$
where $\beta > 0$, $\alpha = (\Gamma(1/\beta)/\Gamma(3/\beta))^{1/2}$ and $\Gamma$ is the gamma function. The distribution is symmetric for any $\beta$; $\beta = 2$ gives the normal (Gaussian) distribution, $\beta = 1$ gives the heavy-tailed Laplace distribution, and the density converges to the low-tailed uniform distribution as $\beta \to \infty$. The density function of the standardized gamma distribution with shape parameter $\alpha$ is
$$f(x) = \frac{(x + \sqrt{\alpha})^{\alpha - 1}\, \alpha^{\alpha/2}}{\Gamma(\alpha)} \exp\{-(x + \sqrt{\alpha})\sqrt{\alpha}\}.$$
The distributions are right-skewed, and for $\alpha = k/2$ the distribution is a chi-squared distribution with $k$ degrees of freedom, $k = 1, 2, \dots$. When $\alpha = 1$, we have an exponential distribution, and the distribution converges to a normal distribution as $\alpha \to \infty$.

We next compare the asymptotic variances of the unmixing matrix estimates with the same nonlinearity and for $\Omega = I_p$. For the comparison, write
$$ARE_{s2,d} = \frac{ASV(\hat\gamma^d_{jl}) + ASV(\hat\gamma^d_{lj})}{ASV(\hat\gamma^{s2}_{jl}) + ASV(\hat\gamma^{s2}_{lj})}$$
for the asymptotic relative efficiency of the squared symmetric estimate with respect to the deflation-based estimate, and similarly $ARE_{s2,s}$. Notice that $ARE_{s2,d}$ and $ARE_{s2,s}$ depend on the two marginal distributions as well as on the chosen nonlinearity. We then plot the contour maps of the AREs as functions of the shape parameters of the exponential power or gamma distributions with the nonlinearities pow and tanh. Equal efficiency corresponds to the ARE value 1 and can be found using the bar with contour thresholds on the right-hand side of the figures. As seen in Figure 2, the squared symmetric FastICA estimator is in most cases more efficient than the deflation-based estimator. In Figure 3 we use $ARE_{s2,s}$ similarly for the comparison between symmetric and squared symmetric FastICA. In the figures, the darker the point, the higher the relative efficiency. Notice that $ARE_{s2,d} = 1$ if one of the components is Gaussian, and $ARE_{s2,s} = 1$ if $|E(G(z_1))| = |E(G(z_2))|$ (e.g. if the two distributions are the same).
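For the simulations of Section IV-C, variates from these two standardized families can be generated as follows (a sketch; the gamma representation of the exponential power distribution is a standard device, and the function names are ours):

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def r_exp_power(n, beta, rng):
    """Standardized exponential power variates: if V ~ Gamma(1/beta, 1), then
    sign * alpha * V**(1/beta) has the stated density, with unit variance by
    the choice of alpha."""
    alpha = np.sqrt(gamma_fn(1.0 / beta) / gamma_fn(3.0 / beta))
    v = rng.gamma(1.0 / beta, 1.0, size=n)
    return rng.choice([-1.0, 1.0], size=n) * alpha * v**(1.0 / beta)

def r_gamma_std(n, alpha, rng):
    """Standardized gamma variates: (Y - alpha) / sqrt(alpha), Y ~ Gamma(alpha, 1),
    since a Gamma(alpha, 1) variable has mean and variance alpha."""
    y = rng.gamma(alpha, 1.0, size=n)
    return (y - alpha) / np.sqrt(alpha)
```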
Figure 3 shows that the areas where $ASV(\hat\gamma^{s2}_{jl}) > ASV(\hat\gamma^{s}_{jl})$ and $ASV(\hat\gamma^{s2}_{jl}) < ASV(\hat\gamma^{s}_{jl})$ are almost equally large, but the differences in favour of the squared symmetric estimator are larger. Also, they occur in cases where the separation of the components is difficult, and hence the efficiency is important there. In Table I the values of $ARE_{s2,s}$ and $ARE_{s2,d}$ are displayed for different pairs of source distributions, for pow in the upper triangle and for tanh in the lower triangle. Table I presents a sample of the values of Figure 2 and Figure 3 in a numerical form.

C. Finite-sample efficiencies

We compare the finite-sample efficiencies of the estimates in a simulation study using the same two-dimensional settings with $\Omega = I_p$ as in the previous section. In each setting we consider the average of $n(p-1)\hat D^2$, which has the limiting expected value $ASV(\hat\gamma_{jl}) + ASV(\hat\gamma_{lj})$. Thus, the simulation study also illustrates how well the asymptotic results approximate the finite-sample variances. Let $\hat\Gamma^{s2}_i$ and $\hat\Gamma^{s}_i$, $i = 1, \dots, M$, be the estimates from $M$ samples of size $n$. Then the finite sample asymptotic relative efficiency is estimated by
$$\widehat{ARE}_{s2,s} = \frac{\sum_{i=1}^{M} \{D(\hat\Gamma^{s}_i \Omega)\}^2}{\sum_{i=1}^{M} \{D(\hat\Gamma^{s2}_i \Omega)\}^2}.$$
In Table II, we list the estimated values of $\widehat{ARE}_{s2,s}$ and $\widehat{ARE}_{s2,d}$ for the same set of distributions as in Table I. For each setting, $M = 10000$ samples of size $n = 1000$ are generated. In most of the settings, the ratios of the averages are close to the corresponding asymptotical values. When both components are nearly Gaussian, a larger sample size than 1000 is required for $\widehat{ARE}_{s2,s}$ and $\widehat{ARE}_{s2,d}$ to converge to $ARE_{s2,s}$ and $ARE_{s2,d}$, respectively.
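The estimator $\widehat{ARE}$ above can be reproduced with a straightforward Monte Carlo loop; a sketch that reuses whiten() and md_index() from the earlier snippets (all names are ours):

```python
import numpy as np

def estimate_ARE(est_a, est_b, draw_sources, Omega, n=1000, M=10000, seed=0):
    """Finite-sample ARE of estimator est_a relative to est_b as the ratio
    of sums of D^2 over M replications; values above one favour est_a.
    Each est_* maps whitened data to an orthogonal rotation estimate U."""
    rng = np.random.default_rng(seed)
    d2 = {"a": 0.0, "b": 0.0}
    for _ in range(M):
        Z = draw_sources(n, rng)             # n x p matrix of independent sources
        X = Z @ Omega.T
        X_st, W0 = whiten(X)
        for key, est in (("a", est_a), ("b", est_b)):
            Gamma_hat = est(X_st) @ W0       # unmixing estimate U S^{-1/2}
            d2[key] += md_index(Gamma_hat, Omega)**2
    return d2["b"] / d2["a"]
```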

Fig. 2. Contour maps of asymptotic relative efficiencies $ARE_{s2,d}$ when $\Omega = I_p$ and the source distributions are exponential power or gamma distributions with varying shape parameter values. The nonlinearities are pow on the top row and tanh on the bottom row.

Also, if $E[G(z_{ij})] \approx E[G(z_{il})]$, then the extraction order of the deflation-based estimate is not always the one which is assumed when computing the asymptotical variances. This may have a large impact on the efficiency of the deflation-based estimate.

In Figure 4 we plot the contour maps of the average of $n(p-1)\hat D^2$ over the simulation runs for the deflation-based, symmetric and squared symmetric FastICA estimates using tanh. Each setting has two independent components with exponential power distributions with varying shape parameter values, and $n = 1000$. Also the contour maps of the limiting expected values are given, and the corresponding maps resemble each other rather nicely. The asymptotical results thus provide good approximations already for $n = 1000$.

V. CONCLUSIONS

In this paper we investigated in detail the properties of the squared symmetric FastICA procedure, obtained from the regular symmetric FastICA procedure by replacing the analytically cumbersome absolute values in the objective function by their squares. We reviewed in a unified way the estimating equations, algorithms and asymptotic theory of the classical deflation-based and symmetric FastICA estimators, and provided similar tools and derived similar results for the novel squared symmetric FastICA. The asymptotic variances were used to compare the three methods in numerous different situations. The asymptotic and finite sample efficiency studies imply that, although none of the methods uniformly outperforms the others, the squared symmetric approach has the best overall performance under the considered combinations of source distributions and nonlinearities. Also, a crude ranking order of (deflation-based, symmetric, squared symmetric) from worst to best can be given, and thus the use of the squared symmetric variant over the two other methods is highly recommended.

APPENDIX

Write
$$\hat T(\hat\gamma) = n^{-1}\sum_{i=1}^{n} g(\hat\gamma^T(x_i - \bar x))(x_i - \bar x)$$
and
$$\hat T_2(\hat\gamma) = \Big(n^{-1}\sum_{i=1}^{n} G(\hat\gamma^T(x_i - \bar x))\Big)\, n^{-1}\sum_{i=1}^{n} g(\hat\gamma^T(x_i - \bar x))(x_i - \bar x).$$
The deflation-based, symmetric and squared symmetric FastICA estimators $\hat\Gamma_d = \Gamma_d(X)$, $\hat\Gamma_s = \Gamma_s(X)$ and $\hat\Gamma_{s2} = \Gamma_{s2}(X)$ are defined as follows.

Fig. 3. Contour maps of asymptotic relative efficiencies $ARE_{s2,s}$ when $\Omega = I_p$ and the source distributions are exponential power or gamma distributions with varying shape parameter values. The nonlinearities are pow on the top row and tanh on the bottom row.

Definition 5. The deflation-based FastICA estimate $\hat\Gamma_d = (\hat\gamma_1^d, \dots, \hat\gamma_p^d)^T$ solves the estimating equations
$$\hat T(\hat\gamma_j) = \hat S\Big(\sum_{l=1}^{j} \hat\gamma_l \hat\gamma_l^T\Big)\hat T(\hat\gamma_j), \quad j = 1, \dots, p. \qquad (5)$$

Definition 6. The symmetric FastICA unmixing matrix estimate $\hat\Gamma_s = (\hat\gamma_1^s, \dots, \hat\gamma_p^s)^T$ solves the estimating equations
$$\hat\gamma_l^T \hat T(\hat\gamma_j)\,\hat s_j = \hat\gamma_j^T \hat T(\hat\gamma_l)\,\hat s_l \quad \text{and} \quad \hat\gamma_l^T \hat S \hat\gamma_j = \delta_{lj}, \qquad (6)$$
where $j, l = 1, \dots, p$ and $\delta_{lj}$ is the Kronecker delta.

Definition 7. The squared symmetric FastICA unmixing matrix estimate $\hat\Gamma_{s2} = (\hat\gamma_1^{s2}, \dots, \hat\gamma_p^{s2})^T$ solves the estimating equations
$$\hat\gamma_l^T \hat T_2(\hat\gamma_j) = \hat\gamma_j^T \hat T_2(\hat\gamma_l) \quad \text{and} \quad \hat\gamma_l^T \hat S \hat\gamma_j = \delta_{lj}, \qquad (7)$$
where $j, l = 1, \dots, p$ and $\delta_{lj}$ is the Kronecker delta.

To prove Theorem 1, we need the following straightforward result.

Lemma 1. The second set of estimating equations, $\hat\gamma_j^T \hat S \hat\gamma_l = \delta_{lj}$, $j, l = 1, \dots, p$, yields
$$\sqrt{n}(\hat\gamma_{jj} - 1) = -\tfrac12\sqrt{n}(\hat S - I_p)_{jj} + o_P(1) \quad \text{and} \quad \sqrt{n}\,\hat\gamma_{jl} + \sqrt{n}\,\hat\gamma_{lj} = -\sqrt{n}\,\hat S_{jl} + o_P(1). \qquad (8)$$

Proof of Theorem 1 (iii): Let us now consider the first set of estimating equations. To shorten the notation, write $\hat T_2(\hat\gamma_j) = \hat T_j$. Now
$$\sqrt{n}\,\hat\gamma_l^T \hat T_j = \sqrt{n}(\hat\gamma_l - e_l)^T \hat T_j + \sqrt{n}\,e_l^T(\hat T_j - \nu_j\lambda_j e_j).$$
By a Taylor expansion and Slutsky's theorem, we have
$$\sqrt{n}(\hat T_j - \nu_j\lambda_j e_j) = \sqrt{n}(\hat{\tilde q}_j - \nu_j\lambda_j e_j) - (\mu_j\lambda_j + \nu_j\tau_j)\,e_j e_j^T\,\sqrt{n}\,\bar x + (\lambda_j^2\, e_j e_j^T + \nu_j T_j)\,\sqrt{n}(\hat\gamma_j - e_j) + o_P(1),$$
where $T_j = E[g'(z_{ij})z_i z_i^T]$. Consequently,
$$\sqrt{n}\,\hat\gamma_l^T \hat T_j = \nu_j\lambda_j\,\sqrt{n}\,\hat\gamma_{lj} + e_l^T\sqrt{n}\,\hat{\tilde q}_j + \nu_j\delta_j\,\sqrt{n}\,\hat\gamma_{jl} + o_P(1).$$
According to our estimating equations, the above expression should be equal to
$$\sqrt{n}\,\hat\gamma_j^T \hat T_l = \nu_l\lambda_l\,\sqrt{n}\,\hat\gamma_{jl} + e_j^T\sqrt{n}\,\hat{\tilde q}_l + \nu_l\delta_l\,\sqrt{n}\,\hat\gamma_{lj} + o_P(1).$$

TABLE II. Values of $\widehat{ARE}_{s2,s}$ (on the top) and $\widehat{ARE}_{s2,d}$ (on the bottom) computed from 10000 samples of size $n = 1000$ for different distributions. L = Laplace = exponential power distribution with $\beta = 1$, N = normal distribution = exponential power distribution with $\beta = 2$, U = uniform distribution, G = gamma distribution. Upper triangle for pow and lower triangle for tanh.

Fig. 4. Contour maps of the average of $n(p-1)\hat D^2$ over the simulation runs with deflation-based, symmetric and squared symmetric FastICA estimates using tanh on the top, and the contour maps of the limiting expected values on the bottom. Two independent components with exponential power distributions and varying shape parameter values.

It follows that
$$(\nu_l\lambda_l - \nu_j\delta_j)\sqrt{n}\,\hat\gamma_{jl} - (\nu_j\lambda_j - \nu_l\delta_l)\sqrt{n}\,\hat\gamma_{lj} = e_l^T\sqrt{n}\,\hat{\tilde q}_j - e_j^T\sqrt{n}\,\hat{\tilde q}_l + o_P(1).$$
Now using (8) in Lemma 1, we have that
$$(\nu_l\lambda_l - \nu_j\delta_j)\sqrt{n}\,\hat\gamma_{jl} + (\nu_j\lambda_j - \nu_l\delta_l)\big(\sqrt{n}\,\hat S_{jl} + \sqrt{n}\,\hat\gamma_{jl}\big) = e_l^T\sqrt{n}\,\hat{\tilde q}_j - e_j^T\sqrt{n}\,\hat{\tilde q}_l + o_P(1).$$

TABLE I. Values of $ARE_{s2,s}$ (on the top) and $ARE_{s2,d}$ (on the bottom) for different distributions. L = Laplace = exponential power distribution with $\beta = 1$, N = normal distribution = exponential power distribution with $\beta = 2$, U = uniform distribution, G = gamma distribution. Upper triangle for pow and lower triangle for tanh.

Then
$$\big(\nu_j(\lambda_j - \delta_j) + \nu_l(\lambda_l - \delta_l)\big)\sqrt{n}\,\hat\gamma_{jl} = e_l^T\sqrt{n}\,\hat{\tilde q}_j - e_j^T\sqrt{n}\,\hat{\tilde q}_l + (\nu_l\delta_l - \nu_j\lambda_j)\sqrt{n}\,\hat S_{jl} + o_P(1),$$
which proves the theorem.

The densities of $z_1$ and $z_2$ in Section II-E are given by
$$f_i = \sum_{j=1}^{4} \pi_{ij}\, N(\mu_{ij}, \sigma_{ij}^2), \quad i = 1, 2,$$
where $N(\mu, \sigma^2)$ denotes the Gaussian density function with mean $\mu$ and variance $\sigma^2$; both densities are thus mixtures of four Gaussian distributions.

ACKNOWLEDGMENT

This work was supported by the Academy of Finland.

REFERENCES

[1] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," IEE Proceedings-F, vol. 140, pp. 362-370, 1993.
[2] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. Wiley, Chichester, 2002.
[3] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Oxford, 2010.
[4] A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Computation, vol. 9, pp. 1483-1492, 1997.
[5] A. Hyvärinen, "One-unit contrast functions for independent component analysis: a statistical analysis," in Neural Networks for Signal Processing VII (Proc. IEEE NNSP Workshop 1997), Amelia Island, Florida, 1997.
[6] A. Hyvärinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Transactions on Neural Networks, vol. 10, pp. 626-634, 1999.
[7] A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis. John Wiley and Sons, New York, 2001.
[8] P. Ilmonen, K. Nordhausen, H. Oja and E. Ollila, "A new performance index for ICA: properties, computation and asymptotic analysis," in Latent Variable Analysis and Signal Separation (Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation), 2010.
[9] Z. Koldovský and P. Tichavský, "Improved variants of the FastICA algorithm," in Advances in Independent Component Analysis and Learning Machines, Academic Press, 2015.
[10] J. Miettinen, K. Nordhausen, H. Oja and S. Taskinen, "Fast equivariant JADE," in Proceedings of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, 2013.
[11] J. Miettinen, K. Nordhausen, H. Oja and S. Taskinen, fICA: Classical, Reloaded and Adaptive FastICA Algorithms, R package.
[12] J. Miettinen, K. Nordhausen, H. Oja and S. Taskinen, BSSasymp: Asymptotic Covariance Matrices of Some BSS Mixing and Unmixing Matrix Estimates, R package.
[13] J. Miettinen, K. Nordhausen, H. Oja and S. Taskinen, "Deflation-based FastICA with adaptive choices of nonlinearities," IEEE Transactions on Signal Processing, vol. 62, pp. 5716-5724, 2014.
[14] J. Miettinen, S. Taskinen, K. Nordhausen and H. Oja, "Fourth moments and independent component analysis," Statistical Science, vol. 30, pp. 372-390, 2015.
[15] K. Nordhausen, P. Ilmonen, A. Mandal, H. Oja and E. Ollila, "Deflation-based FastICA reloaded," in Proc. 19th European Signal Processing Conference (EUSIPCO 2011), Barcelona, 2011.
[16] E. Ollila, "On the robustness of the deflation-based FastICA estimator," in Proc. IEEE Workshop on Statistical Signal Processing (SSP'09), 2009.
[17] E. Ollila, "The deflation-based FastICA estimator: statistical analysis revisited," IEEE Transactions on Signal Processing, vol. 58, pp. 1527-1541, 2010.
[18] F.J. Theis, "A new concept for separability problems in blind source separation," Neural Computation, vol. 16, pp. 1827-1850, 2004.
[19] J. Virta, K. Nordhausen and H. Oja, "Joint use of third and fourth cumulants in independent component analysis," arXiv preprint.
[20] T. Wei, "On the spurious solutions of the FastICA algorithm," in Proc. IEEE Workshop on Statistical Signal Processing (SSP 2014), 2014.
[21] T. Wei, "A convergence and asymptotic analysis of the generalized symmetric FastICA algorithm," IEEE Transactions on Signal Processing, vol. 63, 2015.
[22] T. Wei, "An overview of the asymptotic performance of the family of the FastICA algorithms," in E. Vincent et al. (Eds.): LVA/ICA 2015, LNCS 9237, Springer, 2015.
[23] V. Zarzoso and P. Comon, "Comparative speed analysis of FastICA," in M.E. Davies et al. (Eds.): Independent Component Analysis and Signal Separation, LNCS 4666, Springer, pp. 293-300, 2007.
[24] V. Zarzoso, P. Comon and M. Kallel, "How fast is FastICA?" in Proceedings of the 14th European Signal Processing Conference (EUSIPCO 2006), 2006.


More information

Independent Component Analysis

Independent Component Analysis A Short Introduction to Independent Component Analysis Aapo Hyvärinen Helsinki Institute for Information Technology and Depts of Computer Science and Psychology University of Helsinki Problem of blind

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

where A 2 IR m n is the mixing matrix, s(t) is the n-dimensional source vector (n» m), and v(t) is additive white noise that is statistically independ

where A 2 IR m n is the mixing matrix, s(t) is the n-dimensional source vector (n» m), and v(t) is additive white noise that is statistically independ BLIND SEPARATION OF NONSTATIONARY AND TEMPORALLY CORRELATED SOURCES FROM NOISY MIXTURES Seungjin CHOI x and Andrzej CICHOCKI y x Department of Electrical Engineering Chungbuk National University, KOREA

More information

On Expected Gaussian Random Determinants

On Expected Gaussian Random Determinants On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose

More information

DEFLATION-BASED FASTICA RELOADED

DEFLATION-BASED FASTICA RELOADED 19th Euroean Signal Processing Conference (EUSIPCO 011) Barcelona, Sain, August 9 - Setember, 011 DEFLATION-BASED FASTICA RELOADED Klaus Nordhausen, Pauliina Ilmonen, Abhijit Mandal, Hannu Oja and Esa

More information

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace

More information

On the Estimation of Entropy in the FastICA Algorithm

On the Estimation of Entropy in the FastICA Algorithm 1 On the Estimation of Entropy in the FastICA Algorithm Abstract The fastica algorithm is a popular dimension reduction technique used to reveal patterns in data. Here we show that the approximations used

More information

Independent component analysis: algorithms and applications

Independent component analysis: algorithms and applications PERGAMON Neural Networks 13 (2000) 411 430 Invited article Independent component analysis: algorithms and applications A. Hyvärinen, E. Oja* Neural Networks Research Centre, Helsinki University of Technology,

More information

BLIND source separation (BSS) aims at recovering a vector

BLIND source separation (BSS) aims at recovering a vector 1030 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 3, MARCH 2007 Mixing and Non-Mixing Local Minima of the Entropy Contrast for Blind Source Separation Frédéric Vrins, Student Member, IEEE, Dinh-Tuan

More information

Wavelet de-noising for blind source separation in noisy mixtures.

Wavelet de-noising for blind source separation in noisy mixtures. Wavelet for blind source separation in noisy mixtures. Bertrand Rivet 1, Vincent Vigneron 1, Anisoara Paraschiv-Ionescu 2 and Christian Jutten 1 1 Institut National Polytechnique de Grenoble. Laboratoire

More information

Independent Component Analysis and Its Applications. By Qing Xue, 10/15/2004

Independent Component Analysis and Its Applications. By Qing Xue, 10/15/2004 Independent Component Analysis and Its Applications By Qing Xue, 10/15/2004 Outline Motivation of ICA Applications of ICA Principles of ICA estimation Algorithms for ICA Extensions of basic ICA framework

More information

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017 CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Strong Sub- and Super-Gaussianity

Strong Sub- and Super-Gaussianity Strong Sub- and Super-Gaussianity Jason A. Palmer 1, Ken Kreutz-Delgado 2, and Scott Makeig 1 1 Swartz Center for Computational Neuroscience 2 Department of Electrical and Computer Engineering University

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

Tensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia

Tensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia Network Feature s Decompositions for Machine Learning 1 1 Department of Computer Science University of British Columbia UBC Machine Learning Group, June 15 2016 1/30 Contact information Network Feature

More information

Raninen, Elias; Ollila, Esa Optimal Pooling of Covariance Matrix Estimates Across Multiple Classes

Raninen, Elias; Ollila, Esa Optimal Pooling of Covariance Matrix Estimates Across Multiple Classes Powered by TCPDF (www.tcpdf.org) This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. Raninen, Elias; Ollila, Esa Optimal

More information

Parametric independent component analysis for stable distributions

Parametric independent component analysis for stable distributions ORIGIAL RESEARCH Parametric independent component analysis for stable distributions ohammad Reza Ameri, 2, ona Shokripour 3, Adel ohammadpour 2, Vahid assiri 2, 4. Department of Computer Science and Software

More information

Causal Inference Using Nonnormality Yutaka Kano and Shohei Shimizu 1

Causal Inference Using Nonnormality Yutaka Kano and Shohei Shimizu 1 Causal Inference Using Nonnormality Yutaka Kano and Shohei Shimizu 1 Path analysis, often applied to observational data to study causal structures, describes causal relationship between observed variables.

More information

ACENTRAL problem in neural-network research, as well

ACENTRAL problem in neural-network research, as well 626 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 3, MAY 1999 Fast and Robust Fixed-Point Algorithms for Independent Component Analysis Aapo Hyvärinen Abstract Independent component analysis (ICA)

More information

Independent Component Analysis

Independent Component Analysis 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 1 Introduction Indepent

More information

Principal Component Analysis vs. Independent Component Analysis for Damage Detection

Principal Component Analysis vs. Independent Component Analysis for Damage Detection 6th European Workshop on Structural Health Monitoring - Fr..D.4 Principal Component Analysis vs. Independent Component Analysis for Damage Detection D. A. TIBADUIZA, L. E. MUJICA, M. ANAYA, J. RODELLAR

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

Independent Component Analysis

Independent Component Analysis 1 Independent Component Analysis Background paper: http://www-stat.stanford.edu/ hastie/papers/ica.pdf 2 ICA Problem X = AS where X is a random p-vector representing multivariate input measurements. S

More information

Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)

Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech) Fourier PCA Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech) Introduction 1. Describe a learning problem. 2. Develop an efficient tensor decomposition. Independent component

More information

arxiv: v1 [math.st] 21 Dec 2018

arxiv: v1 [math.st] 21 Dec 2018 arxiv:1812.09187v1 [math.st] 21 Dec 2018 Spatial Blind Source Separation François BACHOC Institut de Mathématiques de Toulouse, Université Paul Sabatier, France francois.bachoc@math.univ-toulouse.fr Marc

More information

I. INTRODUCTION INDEPENDENT component analysis (ICA) [1] [3] has received

I. INTRODUCTION INDEPENDENT component analysis (ICA) [1] [3] has received IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 3, MAY 2007 809 A Minimum-Range Approach to Blind Extraction of Bounded Sources Frédéric Vrins, Student Member, IEEE, John A. Lee, Michel Verleysen, Senior

More information

ORIENTED PCA AND BLIND SIGNAL SEPARATION

ORIENTED PCA AND BLIND SIGNAL SEPARATION ORIENTED PCA AND BLIND SIGNAL SEPARATION K. I. Diamantaras Department of Informatics TEI of Thessaloniki Sindos 54101, Greece kdiamant@it.teithe.gr Th. Papadimitriou Department of Int. Economic Relat.

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

NONLINEAR INDEPENDENT FACTOR ANALYSIS BY HIERARCHICAL MODELS

NONLINEAR INDEPENDENT FACTOR ANALYSIS BY HIERARCHICAL MODELS NONLINEAR INDEPENDENT FACTOR ANALYSIS BY HIERARCHICAL MODELS Harri Valpola, Tomas Östman and Juha Karhunen Helsinki University of Technology, Neural Networks Research Centre P.O. Box 5400, FIN-02015 HUT,

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

BLIND SEPARATION OF POSITIVE SOURCES USING NON-NEGATIVE PCA

BLIND SEPARATION OF POSITIVE SOURCES USING NON-NEGATIVE PCA BLIND SEPARATION OF POSITIVE SOURCES USING NON-NEGATIVE PCA Erkki Oja Neural Networks Research Centre Helsinki University of Technology P.O.Box 54, 215 HUT, Finland erkki.oja@hut.fi Mark Plumbley Department

More information

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

Proofs for Large Sample Properties of Generalized Method of Moments Estimators Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

arxiv: v2 [stat.me] 31 Aug 2017

arxiv: v2 [stat.me] 31 Aug 2017 Asymptotic and bootstrap tests for subspace dimension Klaus Nordhausen 1,2, Hannu Oja 1, and David E. Tyler 3 arxiv:1611.04908v2 [stat.me] 31 Aug 2017 1 Department of Mathematics and Statistics, University

More information

An Introduction to Independent Components Analysis (ICA)

An Introduction to Independent Components Analysis (ICA) An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce

More information

7. The Multivariate Normal Distribution

7. The Multivariate Normal Distribution of 5 7/6/2009 5:56 AM Virtual Laboratories > 5. Special Distributions > 2 3 4 5 6 7 8 9 0 2 3 4 5 7. The Multivariate Normal Distribution The Bivariate Normal Distribution Definition Suppose that U and

More information

Discovery of non-gaussian linear causal models using ICA

Discovery of non-gaussian linear causal models using ICA Discovery of non-gaussian linear causal models using ICA Shohei Shimizu HIIT Basic Research Unit Dept. of Comp. Science University of Helsinki Finland Aapo Hyvärinen HIIT Basic Research Unit Dept. of Comp.

More information