Optimization and Testing in Linear Non-Gaussian Component Analysis


Optimization and Testing in Linear Non-Gaussian Component Analysis

arXiv v2 [stat.ME], 29 Dec 2017

Ze Jin, Benjamin B. Risk, David S. Matteson

May 13, 2018

Abstract

Independent component analysis (ICA) decomposes multivariate data into mutually independent components (ICs). The ICA model is subject to a constraint that at most one of these components is Gaussian, which is required for model identifiability. Linear non-Gaussian component analysis (LNGCA) generalizes the ICA model to a linear latent factor model with any number of both non-Gaussian components (signals) and Gaussian components (noise), where observations are linear combinations of independent components. Although the individual Gaussian components are not identifiable, the Gaussian subspace is identifiable. We introduce an estimator along with its optimization approach in which non-Gaussian and Gaussian components are estimated simultaneously, maximizing the discrepancy of each non-Gaussian component from Gaussianity while minimizing the discrepancy of each Gaussian component from Gaussianity. When the number of non-Gaussian components is unknown, we develop a statistical test to determine it based on resampling and the discrepancy of the estimated components. Through a variety of simulation studies, we demonstrate the improvements of our estimator over competing estimators, and we illustrate the effectiveness of the test to determine the number of non-Gaussian components. Further, we apply our method to real data examples and demonstrate its practical value.

Key words: independent component analysis; multivariate analysis; hypothesis testing; subspace estimation; dimension reduction; projection pursuit

Research support from an NSF award (DMS), a Xerox PARC Faculty Research Award, and the Cornell University Atkinson Center for a Sustainable Future (AVF-2017).

1 Introduction

Independent component analysis (ICA) finds a representation of multivariate data based on mutually independent components (ICs). As an unsupervised learning method, ICA has been developed for applications including blind source separation, feature extraction, brain imaging, and many others. Hyvärinen et al. (2004) provided an overview of ICA approaches for measuring non-Gaussianity and estimating the ICs.

Let Y = (Y_1, ..., Y_p)^T ∈ R^p be a random vector of observations. Assume that Y has a nonsingular, continuous distribution F_Y, with E(Y_j) = 0 and Var(Y_j) < ∞, j = 1, ..., p. Let X = (X_1, ..., X_p)^T ∈ R^p be a random vector of latent components. Without loss of generality, X is assumed to be standardized such that E(X_j) = 0 and Var(X_j) = 1, j = 1, ..., p. A static linear latent factor model to estimate the components X from the observations Y is given by

Y = AX,    X = A^{-1} Y =: BY,

where A ∈ R^{p×p} is a constant, nonsingular mixing matrix, and B ∈ R^{p×p} is a constant, nonsingular unmixing matrix.

Pre-whitened random variables are uncorrelated and thus easier to work with from both practical and theoretical perspectives. Let Σ_Y = Cov(Y) be the covariance matrix of Y, and H = Σ_Y^{-1/2} be an uncorrelating matrix. Let Z = HY = (Z_1, ..., Z_p)^T ∈ R^p be a random vector of uncorrelated observations, such that Σ_Z = Cov(Z) = I_p, the p × p identity matrix.
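As a concrete illustration of the pre-whitening step above, the following minimal R sketch (our own helper, not code from the paper; the function name whiten_data is hypothetical) centers a sample and multiplies it by an inverse square root of the sample covariance obtained from its eigendecomposition.

  # Minimal pre-whitening sketch (hypothetical helper, not the authors' code).
  # Y: n x p data matrix; returns Z whose sample covariance is (approximately) I_p.
  whiten_data <- function(Y) {
    Yc <- scale(Y, center = TRUE, scale = FALSE)      # remove column means
    e  <- eigen(cov(Yc), symmetric = TRUE)            # sample covariance and its spectrum
    H  <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)  # Sigma^{-1/2}
    list(Z = Yc %*% t(H), H = H)                      # Z = Y H^T, uncorrelated observations
  }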

The ICA model further assumes that the components X_1, ..., X_p are mutually independent, with at most one Gaussian component. The relationship between X and Z in the ICA model is then

X = A^{-1} Y = A^{-1} H^{-1} Z =: WZ = M^T Z,    Z = W^{-1} X = HAX =: MX = W^T X,    (1)

where W = A^{-1} H^{-1} ∈ R^{p×p} is a constant, nonsingular unmixing matrix, and M = HA ∈ R^{p×p} is a constant, nonsingular mixing matrix. Given that Z are uncorrelated observations, W is an orthogonal matrix, and M is an orthogonal matrix as well. Thus, we have W = M^{-1} = M^T and M = W^{-1} = W^T.

Many methods have been proposed for estimating the ICA model in the literature, including the fourth-moment diagonalization of FOBI (Cardoso, 1989) and JADE (Cardoso and Souloumiac, 1993), the information criterion of Infomax (Bell and Sejnowski, 1995), maximizing negentropy in FastICA (Hyvärinen and Oja, 1997), the maximum likelihood principle of ProDenICA (Hastie and Tibshirani, 2003), and the mutual dependence measures of dCovICA (Matteson and Tsay, 2017) and MDMICA (Jin and Matteson, 2017). Most of them use optimization to obtain components that have maximal non-Gaussianity under the constraint that they are uncorrelated. The goal is to use Z to estimate both W and X, by maximizing the non-Gaussianity of the components in X according to a particular measure of non-Gaussianity.

To overcome the limitation of the ICA model that at most one Gaussian component exists, the NGCA (non-Gaussian component analysis) model was proposed by Blanchard et al. (2006). Beginning with (1), the components X ∈ R^p are decomposed into signals S ∈ R^q and noise N ∈ R^{p−q}, and M is decomposed into M_S and M_N, and W into W_S and W_N correspondingly. The components in S are assumed to be non-Gaussian, while the components in N are assumed to be Gaussian. The NGCA model further assumes that the non-Gaussian components S are independent of the Gaussian components N, and that the components in N are mutually independent and thus multivariate normal, although the components in S may remain mutually dependent.

The relationship between X and Z in the NGCA model is then

X = [S; N] = WZ = [W_S Z; W_N Z],    Z = MX = [M_S  M_N] [S; N] = M_S S + M_N N,    (2)

where M_S ∈ R^{p×q} has rank q, M_N ∈ R^{p×(p−q)} has rank p − q, W_S ∈ R^{q×p} has rank q, and W_N ∈ R^{(p−q)×p} has rank p − q. The goal is to estimate the non-Gaussian subspace spanned by the rows of W_S corresponding to S, as the Gaussian subspace corresponding to N is uninteresting. Kawanabe et al. (2007) developed an improved algorithm based on radial kernel functions. Theis et al. (2011) proved a necessary and sufficient condition for the uniqueness of the non-Gaussian subspace from projection methods. Bean (2014) developed theory for an approach based on characteristic functions. Sasaki et al. (2016) introduced a least-squares NGCA (LSNGCA) algorithm based on least-squares estimation of log-density gradients and eigenvalue decomposition, and Shiino et al. (2016) proposed a whitening-free variant of LSNGCA. Nordhausen et al. (2017) developed asymptotic and bootstrap tests for the dimension of the non-Gaussian subspace based on the FOBI method.

To incorporate attractive characteristics of both the ICA model and the NGCA model, we consider the LNGCA (linear non-Gaussian component analysis) model proposed in Risk et al. (2017) as a special case of the NGCA model, which is the same as the NGICA model in Virta et al. (2016). In the form of (2), the LNGCA model further assumes that the components X_1, ..., X_p are mutually independent, and allows any number of both non-Gaussian components and Gaussian components among them. Similarly, we have W = M^{-1} = M^T and M = W^{-1} = W^T.

The relationship between X and Z in the LNGCA model is then

X = [S; N] = WZ = [W_S Z; W_N Z] = M^T Z = [M_S^T Z; M_N^T Z],    Z = MX = [M_S  M_N] [S; N] = M_S S + M_N N,

where M_S ∈ R^{p×q} has rank q, M_N ∈ R^{p×(p−q)} has rank p − q, W_S ∈ R^{q×p} has rank q, and W_N ∈ R^{(p−q)×p} has rank p − q. Risk et al. (2017) presented a parametric LNGCA using the logistic density and a semi-parametric LNGCA using tilted Gaussians with cubic B-splines to estimate this model. Virta et al. (2016) used projection pursuit to extract the non-Gaussian components and separate the corresponding signal and noise subspaces, where the projection index is a convex combination of squared third and fourth cumulants.

In this paper, we study the LNGCA model by taking advantage of its flexibility in the number of Gaussian components and of the mutual independence assumption between all components. With pre-whitening, the Gaussian contribution to the model likelihood is invariant to linear transformations that preserve unit variance, as shown in Risk et al. (2017). Thus, an alternative framework is necessary in order to leverage the information in the Gaussian subspace. This motivates our novel objective function, which estimates the unmixing matrix W by maximizing the discrepancy from Gaussianity for the non-Gaussian components and minimizing the discrepancy for the Gaussian components, thereby explicitly estimating the Gaussian subspace to improve upon constrained maximum likelihood approaches.

The rest of this paper is organized as follows. In Section 2, we introduce the discrepancy functions used to measure the distance from Gaussianity. In Section 3, we propose a framework for LNGCA estimation given the number of non-Gaussian components q. In Section 4, we introduce a sequence of statistical tests to determine the number of non-Gaussian components q when it is unknown.

We present the simulation results in Section 5, followed by real data examples in Section 6. Finally, Section 7 summarizes our work.

The following notation will be used throughout this paper. Let O^{a×b} denote the set of a × b matrices whose columns are orthonormal. Let P^±_{a×a} denote the set of a × a signed permutation matrices. Let ||U||_F = (Σ_{i,j} U_{ij}^2)^{1/2} denote the Frobenius norm of U ∈ R^{a×b}.

2 Discrepancy

2.1 Population Discrepancy Measures

In order to find the best estimate for the LNGCA model, we need a criterion to measure the discrepancy between X and its underlying assumption, i.e., S should be far from Gaussianity and N should be close to Gaussianity. Specifically, we choose a general class of functions D that measure the discrepancy D(X_j) between each component X_j and Gaussianity.

Hastie and Tibshirani (2003) proposed the expected log-likelihood tilt function to measure the discrepancy from Gaussianity in the estimation of the ICA model. Suppose the density of X_j is f_j, j = 1, ..., p, and each of the densities f_j is represented by an exponentially tilted Gaussian density

f_j(x_j) = φ(x_j) e^{g_j(x_j)},

where φ is the standard univariate Gaussian density, and g_j is a smooth function. The log-tilt function g_j represents departures from Gaussianity, and the expected log-likelihood ratio between f_j and the Gaussian density is

GPois(X_j) = E[g_j(X_j)].

Virta et al. (2015, 2016) proposed the use of the Jarque-Bera (JB) test statistic (Jarque and Bera, 1987),

JB(X_j) = Skew(X_j) + Kurt(X_j)/4,

to measure the discrepancy from Gaussianity in the estimation of ICA and LNGCA models, where

Skew(X_j) = (E[X_j^3])^2,    Kurt(X_j) = (E[X_j^4] − 3)^2

are the squared skewness and squared excess kurtosis. In fact, Virta et al. (2015, 2016) studied a linear combination of Skew and Kurt, i.e., α Skew + (1 − α) Kurt, and advised the choice α = 0.8, which corresponds to JB. This takes deviations in both skewness and kurtosis into account, while Skew and Kurt are valid discrepancy functions as well. Notice that JB(X_j), Skew(X_j), and Kurt(X_j) are simplified here because X_j is standardized.

2.2 Empirical Discrepancy Measures

Let Y = {Y^i = (Y_1^i, ..., Y_p^i) : i = 1, ..., n} ∈ R^{n×p} be an i.i.d. sample of observations from F_Y, and let Y_j = {Y_j^i : i = 1, ..., n} ∈ R^n be the corresponding i.i.d. sample of observations from F_{Y_j}, j = 1, ..., p, such that Y = [Y_1, ..., Y_p]. Let Σ̂_Y be the sample covariance matrix of Y, and Ĥ = Σ̂_Y^{-1/2} be the estimated uncorrelating matrix. Although the covariance Σ_Y is unknown in practice, the sample covariance Σ̂_Y is a consistent estimate under the finite second-moment assumption. Let Ẑ = YĤ^T ∈ R^{n×p} be the estimated uncorrelated observations, such that Σ̂_Ẑ = I_p, and Σ_Ẑ → I_p almost surely as n → ∞. To simplify notation, we assume below that an uncorrelated i.i.d. sample Z with mean zero and unit variance is given.

Let X = {X^i = (X_1^i, ..., X_p^i) : i = 1, ..., n} = [S, N] = ZW^T ∈ R^{n×p} be the sample of X, where S ∈ R^{n×q} and N ∈ R^{n×(p−q)}, and let X_j = {X_j^i : i = 1, ..., n} ∈ R^n be the sample of X_j, i.e., the jth column of X.

Similarly, we can define S_j, N_j ∈ R^n. Notice that each X_j, S_j, N_j has sample mean 0 and sample variance 1. We obtain the empirical discrepancy D̂ by replacing expectations with sample averages. The empirical GPois is given by

GPois(X_j) = (1/n) Σ_{i=1}^n ĝ_j(X_j^i),

where ĝ_j is estimated by maximum penalized likelihood, maximizing the criterion

Σ_{j=1}^p { (1/n) Σ_{i=1}^n [ log φ(X_j^i) + ĝ_j(X_j^i) ] − λ_j ∫ [ĝ_j''(x)]^2 dx }    subject to    ∫ φ(x) e^{ĝ_j(x)} dx = 1,

where the solution ĝ_j is a smoothing spline, and λ_j is selected by controlling the degrees of freedom of the smoothing spline, which is 6 by default in the R package ProDenICA (Hastie and Tibshirani, 2010).

The empirical JB is given by

JB(X_j) = Skew(X_j) + Kurt(X_j)/4,

where

Skew(X_j) = ( (1/n) Σ_{k=1}^n (X_j^k)^3 )^2,    Kurt(X_j) = ( (1/n) Σ_{k=1}^n (X_j^k)^4 − 3 )^2

are the empirical Skew and empirical Kurt. We will see that JB (joint use of skewness and kurtosis) performs much better than either Skew (use of skewness only) or Kurt (use of kurtosis only) alone in the simulations of Section 5, which was shown in Virta et al. (2016) as well.
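The empirical moment-based discrepancies above are simple sample averages; a minimal R sketch (our own helper names), assuming the input component has already been standardized to mean zero and unit variance, and using the Skew + Kurt/4 weighting implied by the α = 0.8 combination:

  # Empirical squared skewness, squared excess kurtosis, and JB discrepancy
  # for a standardized component x, following Section 2.2.
  skew_hat <- function(x) mean(x^3)^2
  kurt_hat <- function(x) (mean(x^4) - 3)^2
  jb_hat   <- function(x) skew_hat(x) + kurt_hat(x) / 4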

3 Optimization Strategy

Using the discrepancy D to measure the difference between X_j and Gaussianity, we seek an optimal W such that X is most likely to fit the underlying model with independent components. For the ICA model, a classical ICA estimator of W, used in FastICA (Hyvärinen and Oja, 1997) and ProDenICA (Hastie and Tibshirani, 2003), is defined by

Ŵ = arg max_{W ∈ O^{p×p}} Σ_{j=1}^p D(X_j).

We can naturally extend the ICA estimator to an LNGCA estimator given q as

Ŵ_S^max = arg max_{W ∈ O^{p×q}} Σ_{j: X_j ∈ S} D(X_j) = arg max_{W ∈ O^{p×q}} Σ_{j=1}^q D(S_j),    (3)

which is named the max estimator, as we maximize the discrepancy between the non-Gaussian components and Gaussianity. The algorithm for the max estimator is described in Algorithm 1, where the fixed point algorithm is elaborated in Hastie and Tibshirani (2003). The objective function used in Spline-LCA from Risk et al. (2017) is the same as that of the max estimator when the discrepancy is GPois, but the optimization differs, which will be explored in Section 5.

Given the estimated unmixing matrix Ŵ_S^max, the estimated non-Gaussian components are Ŝ = Z(Ŵ_S^max)^T. Since any rotation of a Gaussian distribution leads to the same Gaussian distribution, the Gaussian components N are not identifiable. However, we can benefit from estimating the Gaussian subspace in the LNGCA model, since the column space of W_N is identifiable.

Algorithm 1 LNGCA algorithm for the max estimator
1. Initialize W ∈ O^{p×q}.
2. Alternate until convergence of W, using the Frobenius norm:
   (a) Given W, estimate the discrepancy D(S_j) of component S_j for each j.
   (b) Given D(S_j), j = 1, ..., q, perform one step of the fixed point algorithm towards finding the optimal W.

Taking N into account by optimizing S and N simultaneously in the objective function, we expect to recognize the Gaussian subspace, which helps shape the non-Gaussian subspace because the non-Gaussian subspace is the complement of the Gaussian subspace. Motivated by this optimization idea, we propose a new LNGCA estimator given q as

Ŵ^max-min = arg max_{W ∈ O^{p×p}} [ Σ_{j: X_j ∈ S} D(X_j) − Σ_{j: X_j ∈ N} D(X_j) ] = arg max_{W ∈ O^{p×p}} [ Σ_{j=1}^q D(S_j) − Σ_{j=1}^{p−q} D(N_j) ],    (4)

which is named the max-min estimator for the LNGCA model, as we simultaneously maximize the discrepancy between the non-Gaussian components and Gaussianity and minimize the discrepancy between the Gaussian components and Gaussianity. The algorithm for the max-min estimator is described in Algorithm 2, where the fixed point algorithm is elaborated in Hastie and Tibshirani (2003). We will see that the max-min estimator (joint optimization of S and N) performs much better than the max estimator (optimization of S only) in the simulations of Section 5.

Algorithm 2 LNGCA algorithm for the max-min estimator
1. Initialize W ∈ O^{p×p}.
2. Alternate until convergence of W, using the Frobenius norm:
   (a) Given W, estimate the discrepancy D(X_j) of component X_j for each j.
   (b) Sort the components by D(X_j) in decreasing order.
   (c) Flip the sign of D(X_j) for the last p − q components.
   (d) Given D(X_j), j = 1, ..., p, perform one step of the fixed point algorithm towards finding the optimal W.
3. Sort the components by D(X_j) in decreasing order.
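To make the contrast between the objectives in (3) and (4) concrete, the following R sketch (our own, hypothetical helper; not the authors' implementation and without the fixed point updates) evaluates both objectives for a candidate orthogonal W, using any per-component discrepancy such as the jb_hat sketch above, and assuming q < p.

  # Evaluate the max objective (3) and the max-min objective (4) at a candidate W.
  # Z: n x p whitened data; W: p x p orthogonal matrix; q: number of signals;
  # disc: per-component empirical discrepancy function (e.g., jb_hat).
  lngca_objectives <- function(Z, W, q, disc) {
    X <- Z %*% t(W)                                 # candidate components X = Z W^T
    d <- sort(apply(X, 2, disc), decreasing = TRUE) # order components by discrepancy
    p <- length(d)
    c(max    = sum(d[1:q]),                         # signals only
      maxmin = sum(d[1:q]) - sum(d[(q + 1):p]))     # signals minus noise
  }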

Given the estimated unmixing matrix Ŵ^max-min, the estimated non-Gaussian and Gaussian components are X̂ = Z(Ŵ^max-min)^T. However, it is not immediately clear which components of X̂ belong to Ŝ and which to N̂, since Ŝ and N̂ are obtained together instead of Ŝ only. The solution is to sort the independent components X_1, ..., X_p by discrepancy value D(X_j) in decreasing order, and obtain the ordered independent components X_(1), ..., X_(p). Given that there are q non-Gaussian components, it is natural to take S = (X_(1), ..., X_(q))^T and N = (X_(q+1), ..., X_(p))^T based on the discrepancy function measuring non-Gaussianity. As the q non-Gaussian components in S have the q largest discrepancy values D among X_1, ..., X_p, the estimated non-Gaussian components in Ŝ are expected to have the q largest empirical discrepancy values among X̂_1, ..., X̂_p.

Nevertheless, we cannot sort X̂ by the empirical discrepancy once at the beginning to determine which components of X̂ belong to Ŝ or N̂, stick to that order throughout the iterative algorithm, and conclude which components belong to Ŝ or N̂ at the end, since the optimization depends on the initialization, and the order of the components may change after each iteration. Instead, we repeatedly sort X̂ by empirical discrepancy and adaptively determine the components in Ŝ and N̂ at the end of each iteration in Algorithm 2. Finally, when the algorithm converges, we sort the estimated components X̂_1, ..., X̂_p by empirical discrepancy, and obtain the ordered estimated components X̂_(1), ..., X̂_(p). Then we take Ŝ = [X̂_(1), ..., X̂_(q)] and N̂ = [X̂_(q+1), ..., X̂_(p)]. Accordingly, we decompose Ŵ into Ŵ_S and Ŵ_N, and M̂ = Ŵ^T into M̂_S and M̂_N.

4 Testing and Subspace Estimation

In practice, the number of non-Gaussian components q is unknown. Following the convention of ordering components with respect to non-Gaussianity, we introduce a sequence of statistical tests to decide q.

The main idea is that, for any j < j′, X̂_(j) is more likely to be non-Gaussian than X̂_(j′) in terms of its discrepancy value. If there are k non-Gaussian independent components, then X_(1), ..., X_(k) are non-Gaussian, and X_(k+1), ..., X_(p) are Gaussian. Based on this heuristic, we propose a sequence of hypotheses for searching q as

H_0^(k): X_(1), ..., X_(k−1) are non-Gaussian and X_(k), ..., X_(p) are Gaussian,
H_A^(k): X_(1), ..., X_(k) are non-Gaussian,

which is equivalent to testing whether there are exactly k − 1 non-Gaussian components or at least k non-Gaussian components.

Under H_0^(k), we first run the optimization from X = ZW^T using the max-min estimator with q = k − 1, in which we estimate Ŵ and X̂ = [X̂_(1), ..., X̂_(p)] from the sample data Z. One thing worth mentioning is that X̂ depends on k because the optimization depends on k, although we suppress this in the notation. Next we repeat the following resampling procedure B times: in the bth repetition, we randomly generate independent Gaussian components G^(b) = [G_1^(b), ..., G_{p−k+1}^(b)] with the same number of observations as Z, and construct pseudo components X̃^(b) = [X̂_(1), ..., X̂_(k−1), G^(b)]. Based on the estimated unmixing matrix Ŵ, we use the estimated mixing matrix M̂ = Ŵ^T to construct pseudo observations Z^(b) = X̃^(b) M̂^T. Then we run the optimization from X = Z^(b) W^T using the max-min estimator with q = k − 1, and we estimate Ŵ^(b) and X̂^(b) = [X̂^(b)_(1), ..., X̂^(b)_(p)] from the pseudo data Z^(b). At last, we calculate an approximate p-value by comparing D̂(X̂_(k)) to D̂(X̂^(b)_(k)), or Σ_{j=1}^k D̂(X̂_(j)) to Σ_{j=1}^k D̂(X̂^(b)_(j)), as

p_curr = (1/B) #{ b : D̂(X̂_(k)) ≤ D̂(X̂^(b)_(k)) },
p_cumu = (1/B) #{ b : Σ_{j=1}^k D̂(X̂_(j)) ≤ Σ_{j=1}^k D̂(X̂^(b)_(j)) },    (5)

which we name the current method and the cumulative method, respectively.
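Given the ordered empirical discrepancies of the estimated components from the data and from the B resampled pseudo data sets, the approximate p-values in (5) reduce to simple proportions; a minimal R sketch (our own helper names):

  # Approximate p-values for H_0^(k) from the resampling scheme in (5).
  # d_obs:  length-p vector of ordered empirical discrepancies from the data;
  # d_boot: B x p matrix whose bth row holds the ordered discrepancies from the
  #         bth pseudo data set.
  test_pvalues <- function(d_obs, d_boot, k) {
    c(current    = mean(d_obs[k] <= d_boot[, k]),
      cumulative = mean(sum(d_obs[1:k]) <= rowSums(d_boot[, 1:k, drop = FALSE])))
  }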

Our test shares the resampling technique with Nordhausen et al. (2017). However, there are two major differences. On the one hand, our test does not need to bootstrap on X̂, and thus saves remarkable computational cost, and we will show that it accurately estimates the number of components. On the other hand, our test is more flexible with respect to the test statistic, as the test statistic does not need to match what is used in the objective function of the optimization. The algorithm for our sequential test is summarized in Algorithm 3 below.

Algorithm 3 The algorithm for the sequential test of H_0^(k)
1. Estimate Ŵ from X = ZW^T using the max-min estimator with q = k − 1.
2. Estimate X̂ = ZŴ^T = [X̂_(1), ..., X̂_(p)].
3. Repeat the following procedure for b = 1, ..., B:
   (a) Generate independent Gaussian components G^(b) = [G_1^(b), ..., G_{p−k+1}^(b)].
   (b) Construct X̃^(b) = [X̂_(1), ..., X̂_(k−1), G^(b)].
   (c) Construct Z^(b) = X̃^(b) M̂^T = X̃^(b) Ŵ.
   (d) Estimate Ŵ^(b) from X = Z^(b) W^T using the max-min estimator with q = k − 1.
   (e) Estimate X̂^(b) = Z^(b) (Ŵ^(b))^T = [X̂^(b)_(1), ..., X̂^(b)_(p)].
4. Calculate the p-value using the current or cumulative method in (5).

The proposed procedure involves a sequence of tests, but the number of tests can be dramatically reduced by using a binary search. This approach quickly narrows in on the selected q because we focus on the boundary at which the p-value crosses a specified significance level. As we expect no more than ⌈log_2 p⌉ tests, it makes sense to apply the Bonferroni correction. Note that even for fairly large p, the number of tests remains reasonable, e.g., p = 10,000 implies fewer than fourteen tests. Multiple testing in this setting of sequential testing may become more problematic as the dimension or search space grows, though the sequential search works well in the simulations of Section 5. The issue of multiple testing is an important direction for future research.
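The binary search over k can be written compactly. The sketch below (our own; it assumes a function test_fun(k) returning the p-value of H_0^(k), e.g., via Algorithm 3, and ignores boundary cases) returns the smallest k whose null hypothesis is not rejected, minus one, i.e., the selected number of non-Gaussian components.

  # Binary search for q: find the boundary k where the p-value first exceeds alpha.
  select_q <- function(p_dim, test_fun, alpha) {
    lo <- 1; hi <- p_dim
    while (lo < hi) {
      k <- floor((lo + hi) / 2)
      if (test_fun(k) <= alpha) lo <- k + 1 else hi <- k  # reject H_0^(k) => q >= k
    }
    lo - 1                                                # exactly lo - 1 non-Gaussian components
  }

With p = 125, for example, this requires at most seven evaluations of test_fun, matching the count used for the Bonferroni correction in Section 6.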

5 Simulation Study

5.1 Sub- and Super-Gaussian Densities

In this section, we evaluate the performance of the max-min estimator by performing simulations similar to Matteson and Tsay (2017) for the LNGCA model, and compare it to that of the max estimator using several discrepancy functions, including Skew, Kurt, JB, GPois, and Spline. Moreover, we elaborate on the implementation and the performance measure for the LNGCA model.

We generate the non-Gaussian independent components S ∈ R^{n×q} from 18 distributions using rjordan in the R package ProDenICA (Hastie and Tibshirani, 2010) with sample size n and dimension q. See Figure 1 for the density functions of the 18 distributions. We also generate the Gaussian independent components N ∈ R^{n×(p−q)} with sample size n and dimension p − q. Then X = [S, N] are the underlying components of interest. We simulate a mixing matrix A ∈ R^{p×p} with condition number between 1 and 2 using mixmat in the R package ProDenICA (Hastie and Tibshirani, 2010) and obtain the observations Y = XA^T, which are centered by their sample mean, then pre-whitened by their sample covariance to obtain uncorrelated observations Z = YĤ^T. Finally, we estimate Ŵ_S and M̂_S = Ŵ_S^T based on Z via the max estimator or the max-min estimator. Therefore, Z = XA^T Ĥ^T = X(ĤA)^T, and we evaluate the estimation by comparing the estimated unmixing matrix Ŵ to the ground truth W^0 = (ĤA)^{-1} = A^{-1} Ĥ^{-1} = BĤ^{-1} with respect to S, i.e., comparing Ŵ_S to W_S^0, where W_S^0 = B_S Ĥ^{-1}.
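A sketch of the data-generating step of the simulations, using rjordan and mixmat from the ProDenICA package as described above (the argument forms passed to rjordan and mixmat are assumptions on our part):

  # Simulated LNGCA data in the style of Experiment 1 (q = 2, p = 4, n = 1000).
  library(ProDenICA)
  n <- 1000; q <- 2; p <- 4
  S <- cbind(rjordan("a", n), rjordan("b", n))   # two non-Gaussian signals
  N <- matrix(rnorm(n * (p - q)), n, p - q)      # Gaussian noise components
  X <- scale(cbind(S, N))                        # standardize all components
  A <- mixmat(p)                                 # mixing matrix, condition number in [1, 2]
  Y <- X %*% t(A)                                # observations to be whitened and unmixed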

The optimization problem associated with the max estimator in (3) and the max-min estimator in (4) is non-convex, which requires an initialization step and is sensitive to the initial point. Risk et al. (2014) demonstrated strong sensitivity to the initialization matrix in various ICA algorithms for the eighteen distributions considered in the experiments below. To mitigate the effect of local maxima, we explore two options: one with a single initial point, and another with multiple initial points, where each initial point is generated by orthogonalizing a matrix with random Gaussian elements. We suggest that the number of initial points m grow with the dimension p, e.g., m = p. Each method returns an estimate of the mixing matrix.

To jointly measure the uncertainty associated with both pre-whitening the observations and estimating the non-Gaussian components, we introduce an error measure to evaluate the difference between Ŵ_S and W_S^0 as

min_{Q ∈ P^±} (1/(pq)) || W_S^0 − Q Ŵ_S ||_F^2,

where Q ranges over signed permutation matrices, which is similar to the measures in Ilmonen et al. (2010), Risk et al. (2017), and Miettinen et al. (2017). The minimum above is taken so that the measure is invariant to the sign and order of the components, with respect to the ambiguities associated with the LNGCA model, and the optimal Q is solved by the Hungarian method (Papadimitriou and Steiglitz, 1982).
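The error measure above can be computed by enumerating, for each pair of true and estimated rows, the better of the two signs and then solving the resulting assignment problem; a minimal R sketch (our own helper, using solve_LSAP from the CRAN package clue as the Hungarian-type assignment solver):

  # Error between W0 (q x p, true) and What (q x p, estimated), minimized over
  # signed permutations of the estimated rows, as in Section 5.1.
  library(clue)
  lngca_error <- function(W0, What) {
    q <- nrow(W0); p <- ncol(W0)
    cost <- matrix(0, q, q)
    for (i in 1:q) for (j in 1:q) {
      cost[i, j] <- min(sum((W0[i, ] - What[j, ])^2),  # sign +1
                        sum((W0[i, ] + What[j, ])^2))  # sign -1
    }
    perm <- as.integer(solve_LSAP(cost))               # optimal row matching
    sum(cost[cbind(1:q, perm)]) / (p * q)
  }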

We compare the max-min estimator to the max estimator with various distributions, dimensions of components, and discrepancy functions in Experiments 1 and 2 below.

Experiment 1 (Different distributions of components). We sample S from one of the 18 distributions with q = 2, p = 4, and n = 1000. See Figure 2 for the error measures over 100 trials, with both multiple initial points (m = 4) and a single initial point (m = 1).

For both multiple initial points and a single initial point, the error measure of the max-min estimator is much lower than that of the max estimator for most distributions and discrepancy functions. Therefore, the max-min estimator improves the estimation over the max estimator, no matter whether a single initial point or multiple initial points are used in the optimization. For both the max-min estimator and the max estimator, the error measure with multiple initial points is much lower than that with a single initial point for most of the distributions and discrepancy functions, which illustrates the advantage of using multiple initial points over a single initial point. Moreover, the max-min estimator and multiple initial points turn out to be a powerful combination, since the error measure of the max estimator with multiple initial points can be further reduced by replacing the max estimator with the max-min estimator. The error measure of JB is much lower than that of Skew and Kurt for most of the distributions, which justifies the joint use of moments. In addition, GPois is at least as good as, and often better than, the other discrepancy functions for all the distributions, especially with multiple initial points.

Experiment 2 (Different dimensions of components). We sample S from q randomly selected distributions of the 18 distributions, with q ∈ {2, 4, 8, 16}, p = 2q, and n = 500q. See Figure 3 for the error measures over 100 trials, with both multiple initial points (m = p) and a single initial point (m = 1).

As in the previous experiment, the max-min estimator improves the estimation over the max estimator, and the error measure with multiple initial points is much lower than that with a single initial point in most cases. In addition, GPois performs the best for q = 2, 4, 8, and JB and GPois perform similarly for q = 16 with the max-min estimator and multiple initial points. Since GPois turns out to be more robust to different distributions than Spline in the simulations, and it shares the same idea with Spline, we omit the results of Spline in the following simulation experiments and data examples.

We compare the current method to the cumulative method for selecting q with various sample sizes and discrepancy functions, using the max-min estimator, in Experiment 3 below.

Experiment 3 (Selecting q with varying n). We sample S from q randomly selected distributions of the 18 distributions, with q = 2, p = 4, n ∈ {2000, 4000, 8000}, and B = 200. See Tables 2 and 3 for the empirical size and power over 100 trials, with significance level α = 5%, and both multiple initial points (m = 4) and a single initial point (m = 1).

For both multiple initial points and a single initial point, the empirical power of the current method is much higher than that of the cumulative method, while both methods have empirical size around 5% or even lower, for all the sample sizes and discrepancy functions. Hence, the current method outperforms the cumulative method in testing, no matter whether a single initial point or multiple initial points are used in the optimization. For both the current method and the cumulative method, the empirical size and power with multiple initial points are similar to those with a single initial point, for all the sample sizes and discrepancy functions, which implies no remarkable effect on testing from using multiple initial points or a single initial point in estimation. This suggests that the estimate of the rank of the subspace is less sensitive to initialization than the estimates of the individual components. The empirical power of JB is much higher than that of Skew and Kurt, for all the sample sizes, which justifies the joint use of moments. In addition, GPois outperforms the other discrepancy functions, for all the sample sizes.

5.2 Image Data

Fulfilling a task of unmixing vectorized images similar to Virta et al. (2016), we consider three gray-scale images from the test images of the Computer Vision Group at the University of Granada, depicting a cameraman, a clock, and a leopard, respectively. Each image is represented by a 256 × 256 matrix, where each element indicates the intensity value of a pixel. Three noise images of the same size are simulated with independent standard Gaussian pixels. We standardize the six images such that the intensity values across all the pixels in each image have mean zero and unit variance. Then we vectorize each image into a vector of length 256^2, and combine the vectors from all six images into a matrix X, i.e., p = 6 and n = 256^2.

Thus, each row of X contains the intensity values of a single pixel across all images, and each column of X contains the intensity values of a single image. Then we simulate a mixing matrix A ∈ R^{p×p} using mixmat in the R package ProDenICA (Hastie and Tibshirani, 2010), and mix the six images to obtain the observations Y = XA^T, which are centered by their sample mean, then pre-whitened by their sample covariance to get uncorrelated observations Z = YĤ^T. We aim to infer the number of true images, and then estimate the intensity values in them.

First, we run the sequential test to infer the number of true images q with B = 200. See Table 1 for the p-values corresponding to each k with a single initial point (m = 1). Both the current method and the cumulative method correctly select q = 3 at significance level α = 5%, for all the discrepancy functions. Second, we estimate the intensity values Ŝ with q = 3 and multiple initial points (m = 3). See Figures 4 and 5 for the recovered images Ŝ and the error images Ŝ − S, where the Euclidean norm of the vectorized error image is used to evaluate the accuracy of estimation.

The max-min estimator outperforms the max estimator for Kurt: the max-min estimator recovers the second image, while the second image recovered by the max estimator is masked by noise, and the max-min estimator also has a much lower error than the max estimator on the first recovered image. This illustrates the advantage of the max-min estimator over the max estimator, especially when the max estimator does not perform well. For the other discrepancy functions, both the max-min estimator and the max estimator nicely recover the true images. The estimation with JB is more accurate than that with Skew and Kurt, as its recovered images are mixed with less noise, as indicated by both the estimated images and the error images. In addition, JB and GPois have similar performance, as JB achieves the lowest error on the first image while GPois achieves the lowest error on the second image.

6 EEG Data

There are 24 subjects in the EEG data from the Human Ecology Department at Cornell University, where each subject receives 20 trials. In each trial, 128 EEG channels (3 unused) were collected with 1024 sample points over a few seconds. We study the first trial of the first subject. The data of interest are represented by a 1024 × 125 matrix, i.e., p = 125 and n = 1024. Here, we estimate the number of non-Gaussian signals and examine their time series. Since the max-min estimator and the current method with GPois perform the best in the estimation and testing of the simulations, we only use the max-min estimator and the current method with GPois in this application.

First, we conduct the sequential test to estimate the number of non-Gaussian signals q with B = 200. Using the binary search for p = 125, we expect at most ⌈log_2 125⌉ = 7 tests. Hence, we correct the significance level to α = 5%/7 ≈ 0.714% from the original level of 5%. See Figure 6 for the test statistic values (empirical discrepancies) and critical values at significance levels α ∈ {0.714%, 5%, 10%} (i.e., the 99.286%, 95%, and 90% quantiles of D̂(X̂^(b)_(k))) corresponding to k ∈ {63, 94, 110, 118, 114, 116, 115} chosen by the binary search with a single initial point (m = 1). The current method rejects the null hypothesis that there are exactly 114 non-Gaussian components (p-value < corrected α) and fails to reject the null hypothesis that there are exactly 115 non-Gaussian components (p-value > corrected α), thus selecting q = 115. We also iterate over all k = 1, ..., p and provide the complete testing results for reference. See Figure 7 for the test statistic values and critical values at significance levels α ∈ {0.714%, 5%, 10%} corresponding to each k with a single initial point (m = 1). The dashed lines pinpoint where the test statistic values meet the critical values, indicating that the corresponding component is assumed to be Gaussian because we cannot reject the null hypothesis.
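The number of tests and the corrected level used here follow directly from the binary search; as a quick check in R:

  # Expected number of sequential tests from the binary search and the
  # Bonferroni-corrected significance level for the EEG data (p = 125).
  ceiling(log2(125))   # 7 tests
  0.05 / 7             # about 0.00714, i.e., the 0.714% level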

Second, we estimate the true signals Ŝ with q = 115 and multiple initial points (m = 100). See Figure 8 for the estimated signals Ŝ. The max-min estimator successfully extracts meaningful first and second components, which may be artifacts related to eye blinks in the middle and at the end of the trial. The 115th and 116th components are likely to be Gaussian, as they lie on the boundary where the p-value crosses the corrected level of 0.714%. The 125th (last) component is fairly close to Gaussian, compared to the Gaussian noise we randomly generate with the same sample size as a reference distribution.

7 Conclusion

In this paper, we study the LNGCA model as a generalization of the ICA model, which can have any number of non-Gaussian components and Gaussian components, given that all components are mutually independent. Our contributions are the following.

(1) We propose a new max-min estimator, simultaneously maximizing the discrepancy of each non-Gaussian component from Gaussianity and minimizing the discrepancy of each Gaussian component from Gaussianity. In contrast, the existing max estimator only maximizes the discrepancy of each non-Gaussian component from Gaussianity, as has been used in the ICA model (Hastie and Tibshirani, 2003) and the LNGCA model (Risk et al., 2017). Our approach may seem unintuitive because the individual Gaussian components are not identifiable. However, the Gaussian subspace is identifiable, and joint estimation of the non-Gaussian components and Gaussian components balances the non-Gaussian subspace against the Gaussian subspace. This helps shape the non-Gaussian subspace, and thus improves the accuracy of estimating the non-Gaussian components.

(2) In practice, we need to choose the number of non-Gaussian components. We introduce a sequence of statistical tests based on generating Gaussian components and ordering the estimated components by empirical discrepancy, which is computationally efficient when combined with a binary search to reduce the actual number of tests. Two methods with different test statistics are proposed, where the current method considers the discrepancy value of the component under investigation, while the cumulative method considers the total discrepancy value of all the components from the first one up to the one under investigation.

Although our test shares some characteristics with that of Nordhausen et al. (2017), it has a lower computational burden, since no bootstrap is needed, and it is more flexible in the choice of test statistic.

We evaluate the performance of our methods in simulations, demonstrating that the max-min estimator outperforms the max estimator given the number of non-Gaussian components for different discrepancy functions, dimensions, and distributions of the components, no matter whether a single initial point or multiple initial points are used in the optimization. When the number of non-Gaussian components is unknown, our statistical test successfully finds the correct number for different discrepancy functions and sample sizes, where the current method is more powerful than the cumulative method. In the task of recovering true images from mixed image data, our test determines the correct number of true images, and we illustrate the advantage of the max-min estimator over the max estimator for some discrepancy functions. Specifically, the max-min estimator nicely recovers the images where the max estimator fails using the same discrepancy function, and the estimation error of the max-min estimator is equal to, and sometimes lower than, that of the max estimator. In the task of exploring EEG data, our test finds a large number of non-Gaussian signals, and the estimator extracts two components as the first two non-Gaussian components that may correspond to eye-blink artifacts. The distributions of the estimated signals tend to become more Gaussian as their empirical discrepancy values decrease. There are a large number of non-Gaussian components in this data set. In data applications, applying a preliminary data reduction step using principal component analysis (PCA) would likely remove non-Gaussian signals. This underscores the importance of a flexible estimation and testing procedure.

There are two directions for future research. One is to look for a better way to address the multiple testing issue in searching for a suitable q. Another is to better understand the improvements achieved by the max-min estimator from a theoretical perspective.

Our intuition is that the contributions of the non-Gaussian components to the asymptotic variances would equal zero. Therefore, it would be valuable to gain additional insight into the statistical versus computational advantages of the max-min estimator.

References

F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3(Jul):1-48, 2002.

D. M. Bean. Non-Gaussian component analysis. PhD thesis, University of California, Berkeley, 2014.

A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1995.

G. Blanchard, M. Sugiyama, M. Kawanabe, V. Spokoiny, and K.-R. Müller. Non-Gaussian component analysis: a semi-parametric framework for linear dimension reduction. In Advances in Neural Information Processing Systems, 2006.

J.-F. Cardoso. Source separation using higher order moments. In Acoustics, Speech, and Signal Processing, 1989 International Conference on (ICASSP-89). IEEE, 1989.

J.-F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. In IEE Proceedings F (Radar and Signal Processing), volume 140. IET, 1993.

T. Hastie and R. Tibshirani. Independent components analysis through product density estimation. In Advances in Neural Information Processing Systems, 2003.

T. Hastie and R. Tibshirani. ProDenICA: Product density estimation for ICA using tilted Gaussian density estimates. R package version 1.0, 2010.

A. Hyvärinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7), 1997.

A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis, volume 46. John Wiley & Sons, 2004.

P. Ilmonen, K. Nordhausen, H. Oja, and E. Ollila. A new performance index for ICA: properties, computation and asymptotic analysis. In Latent Variable Analysis and Signal Separation, 2010.

C. M. Jarque and A. K. Bera. A test for normality of observations and regression residuals. International Statistical Review / Revue Internationale de Statistique, 1987.

Z. Jin and D. S. Matteson. Independent component analysis via energy-based mutual dependence measures. Under review, 2017.

M. Kawanabe, M. Sugiyama, G. Blanchard, and K.-R. Müller. A new algorithm of non-Gaussian component analysis with radial kernel functions. Annals of the Institute of Statistical Mathematics, 59(1):57-75, 2007.

D. S. Matteson and R. S. Tsay. Independent component analysis via distance covariance. Journal of the American Statistical Association, 112(518), 2017.

J. Miettinen, K. Nordhausen, and S. Taskinen. Blind source separation based on joint diagonalization in R: The packages JADE and BSSasymp. Journal of Statistical Software, 76, 2017.

K. Nordhausen, H. Oja, D. E. Tyler, and J. Virta. Asymptotic and bootstrap tests for the dimension of the non-Gaussian subspace. IEEE Signal Processing Letters, 24(6), 2017.

C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., 1982.

B. B. Risk, D. S. Matteson, D. Ruppert, A. Eloyan, and B. S. Caffo. An evaluation of independent component analyses with an application to resting-state fMRI. Biometrics, 70(1), 2014.

B. B. Risk, D. S. Matteson, and D. Ruppert. Linear non-Gaussian component analysis via maximum likelihood. Journal of the American Statistical Association, to appear, 2017.

H. Sasaki, G. Niu, and M. Sugiyama. Non-Gaussian component analysis with log-density gradient estimation. In Artificial Intelligence and Statistics, 2016.

H. Shiino, H. Sasaki, G. Niu, and M. Sugiyama. Whitening-free least-squares non-Gaussian component analysis. arXiv preprint, 2016.

F. J. Theis, M. Kawanabe, and K.-R. Müller. Uniqueness of non-Gaussianity-based dimension reduction. IEEE Transactions on Signal Processing, 59(9), 2011.

J. Virta, K. Nordhausen, and H. Oja. Joint use of third and fourth cumulants in independent component analysis. arXiv preprint, 2015.

J. Virta, K. Nordhausen, and H. Oja. Projection pursuit for non-Gaussian independent components. arXiv preprint, 2016.

Figure 1: Density plots of the 18 distributions (labeled a through r) from rjordan in the R package ProDenICA.

Table 1: p-values of both the current method and the cumulative method with q = 3, p = 6, n = 256^2, B = 200, α = 5%, and a single initial point (m = 1) in testing for the image data. Rows correspond to the discrepancy functions Skew, Kurt, JB, and GPois, each with the current and cumulative methods; columns correspond to k = 1, ..., 6. (Numerical entries omitted.)

Figure 2: Error measures of both the max estimator and the max-min estimator with q = 2, p = 4, n = 1000, 100 trials, and both multiple initial points (m = 4) and a single initial point (m = 1) in Experiment 1. Panels a through r correspond to the 18 distributions; the four estimation settings shown are single + max, single + maxmin, multi + max, and multi + maxmin.

Figure 3: Error measures of both the max estimator and the max-min estimator with p = 2q, n = 500q, 100 trials, and both multiple initial points (m = p) and a single initial point (m = 1) in Experiment 2. Panels correspond to q = 2, 4, 8, 16; the four estimation settings shown are single + max, single + maxmin, multi + max, and multi + maxmin.

Table 2: Empirical size and power of both the current method and the cumulative method with q = 2, p = 4, B = 200, 100 trials, α = 5%, and a single initial point in Experiment 3. Rows correspond to each sample size n combined with the discrepancy functions Skew, Kurt, JB, and GPois and the current/cumulative methods; columns report the rejection rates at k = 1, ..., 4, grouped into power and size. (Numerical entries omitted.)

Table 3: Empirical size and power of both the current method and the cumulative method with q = 2, p = 4, B = 200, 100 trials, α = 5%, and multiple initial points in Experiment 3. Rows correspond to each sample size n combined with the discrepancy functions Skew, Kurt, JB, and GPois and the current/cumulative methods; columns report the rejection rates at k = 1, ..., 4, grouped into power and size. (Numerical entries omitted.)

Figure 4: Recovered images of both the max estimator and the max-min estimator with q = 3, p = 6, n = 256^2, and multiple initial points (m = 3) in estimation for the image data. Each value in the title is the Euclidean norm of the vectorized error image corresponding to the recovered image. We apply a signed permutation to the images and modify the gray scales for illustration purposes.

Figure 5: Error images of both the max estimator and the max-min estimator with q = 3, p = 6, n = 256^2, and multiple initial points (m = 3) in estimation for the image data. Each value in the title is the Euclidean norm of the vectorized error image. We apply a signed permutation to the images and modify the gray scales for illustration purposes.

Figure 6: Test statistics and critical values of the current method for the values of k visited by the binary search, with p = 125, n = 1024, B = 200, and a single initial point (m = 1) in testing for the EEG data. The plot shows the test statistic together with the 10%, 5%, and 0.714% critical values against k.

Figure 7: Test statistics and critical values of the current method for all k, with p = 125, n = 1024, B = 200, and a single initial point (m = 1) in testing for the EEG data. The panels show the test statistic together with the 10%, 5%, and 0.714% critical values against k.

Figure 8: Estimated signals of the max-min estimator with q = 115, p = 125, n = 1024, and multiple initial points (m = 100) in estimation for the EEG data. The panels show histograms and time series of components 1, 2, 115, 116, and 125, and of randomly generated Gaussian noise, each labeled with its empirical GPois value.


More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

Non-Euclidean Independent Component Analysis and Oja's Learning

Non-Euclidean Independent Component Analysis and Oja's Learning Non-Euclidean Independent Component Analysis and Oja's Learning M. Lange 1, M. Biehl 2, and T. Villmann 1 1- University of Appl. Sciences Mittweida - Dept. of Mathematics Mittweida, Saxonia - Germany 2-

More information

Massoud BABAIE-ZADEH. Blind Source Separation (BSS) and Independent Componen Analysis (ICA) p.1/39

Massoud BABAIE-ZADEH. Blind Source Separation (BSS) and Independent Componen Analysis (ICA) p.1/39 Blind Source Separation (BSS) and Independent Componen Analysis (ICA) Massoud BABAIE-ZADEH Blind Source Separation (BSS) and Independent Componen Analysis (ICA) p.1/39 Outline Part I Part II Introduction

More information

Deflation-based separation of uncorrelated stationary time series

Deflation-based separation of uncorrelated stationary time series Deflation-based separation of uncorrelated stationary time series Jari Miettinen a,, Klaus Nordhausen b, Hannu Oja c, Sara Taskinen a a Department of Mathematics and Statistics, 40014 University of Jyväskylä,

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Independent Component Analysis and Blind Source Separation

Independent Component Analysis and Blind Source Separation Independent Component Analysis and Blind Source Separation Aapo Hyvärinen University of Helsinki and Helsinki Institute of Information Technology 1 Blind source separation Four source signals : 1.5 2 3

More information

Independent Component Analysis of Rock Magnetic Measurements

Independent Component Analysis of Rock Magnetic Measurements Independent Component Analysis of Rock Magnetic Measurements Norbert Marwan March 18, 23 Title Today I will not talk about recurrence plots. Marco and Mamen will talk about them later. Moreover, for the

More information

Separation of the EEG Signal using Improved FastICA Based on Kurtosis Contrast Function

Separation of the EEG Signal using Improved FastICA Based on Kurtosis Contrast Function Australian Journal of Basic and Applied Sciences, 5(9): 2152-2156, 211 ISSN 1991-8178 Separation of the EEG Signal using Improved FastICA Based on Kurtosis Contrast Function 1 Tahir Ahmad, 2 Hjh.Norma

More information

Blind separation of instantaneous mixtures of dependent sources

Blind separation of instantaneous mixtures of dependent sources Blind separation of instantaneous mixtures of dependent sources Marc Castella and Pierre Comon GET/INT, UMR-CNRS 7, 9 rue Charles Fourier, 9 Évry Cedex, France marc.castella@int-evry.fr, CNRS, I3S, UMR

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Introduction Consider a zero mean random vector R n with autocorrelation matri R = E( T ). R has eigenvectors q(1),,q(n) and associated eigenvalues λ(1) λ(n). Let Q = [ q(1)

More information

An Introduction to Independent Components Analysis (ICA)

An Introduction to Independent Components Analysis (ICA) An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce

More information

Artificial Intelligence Module 2. Feature Selection. Andrea Torsello

Artificial Intelligence Module 2. Feature Selection. Andrea Torsello Artificial Intelligence Module 2 Feature Selection Andrea Torsello We have seen that high dimensional data is hard to classify (curse of dimensionality) Often however, the data does not fill all the space

More information

Independent Component (IC) Models: New Extensions of the Multinormal Model

Independent Component (IC) Models: New Extensions of the Multinormal Model Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Comparative Analysis of ICA Based Features

Comparative Analysis of ICA Based Features International Journal of Emerging Engineering Research and Technology Volume 2, Issue 7, October 2014, PP 267-273 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Comparative Analysis of ICA Based Features

More information

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES S. Visuri 1 H. Oja V. Koivunen 1 1 Signal Processing Lab. Dept. of Statistics Tampere Univ. of Technology University of Jyväskylä P.O.

More information

VARIABLE SELECTION AND INDEPENDENT COMPONENT

VARIABLE SELECTION AND INDEPENDENT COMPONENT VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS Richard Samworth University of Cambridge Joint work with Rajen Shah and Ming Yuan My core research interests A broad range of methodological

More information

One-unit Learning Rules for Independent Component Analysis

One-unit Learning Rules for Independent Component Analysis One-unit Learning Rules for Independent Component Analysis Aapo Hyvarinen and Erkki Oja Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C, FIN-02150 Espoo,

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

A MULTIVARIATE MODEL FOR COMPARISON OF TWO DATASETS AND ITS APPLICATION TO FMRI ANALYSIS

A MULTIVARIATE MODEL FOR COMPARISON OF TWO DATASETS AND ITS APPLICATION TO FMRI ANALYSIS A MULTIVARIATE MODEL FOR COMPARISON OF TWO DATASETS AND ITS APPLICATION TO FMRI ANALYSIS Yi-Ou Li and Tülay Adalı University of Maryland Baltimore County Baltimore, MD Vince D. Calhoun The MIND Institute

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

ICA [6] ICA) [7, 8] ICA ICA ICA [9, 10] J-F. Cardoso. [13] Matlab ICA. Comon[3], Amari & Cardoso[4] ICA ICA

ICA [6] ICA) [7, 8] ICA ICA ICA [9, 10] J-F. Cardoso. [13] Matlab ICA. Comon[3], Amari & Cardoso[4] ICA ICA 16 1 (Independent Component Analysis: ICA) 198 9 ICA ICA ICA 1 ICA 198 Jutten Herault Comon[3], Amari & Cardoso[4] ICA Comon (PCA) projection persuit projection persuit ICA ICA ICA 1 [1] [] ICA ICA EEG

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Independent Component Analysis and Its Application on Accelerator Physics

Independent Component Analysis and Its Application on Accelerator Physics Independent Component Analysis and Its Application on Accelerator Physics Xiaoying Pang LA-UR-12-20069 ICA and PCA Similarities: Blind source separation method (BSS) no model Observed signals are linear

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

A two-layer ICA-like model estimated by Score Matching

A two-layer ICA-like model estimated by Score Matching A two-layer ICA-like model estimated by Score Matching Urs Köster and Aapo Hyvärinen University of Helsinki and Helsinki Institute for Information Technology Abstract. Capturing regularities in high-dimensional

More information

ICA. Independent Component Analysis. Zakariás Mátyás

ICA. Independent Component Analysis. Zakariás Mátyás ICA Independent Component Analysis Zakariás Mátyás Contents Definitions Introduction History Algorithms Code Uses of ICA Definitions ICA Miture Separation Signals typical signals Multivariate statistics

More information

FuncICA for time series pattern discovery

FuncICA for time series pattern discovery FuncICA for time series pattern discovery Nishant Mehta and Alexander Gray Georgia Institute of Technology The problem Given a set of inherently continuous time series (e.g. EEG) Find a set of patterns

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Combining EMD with ICA to analyze combined EEG-fMRI Data

Combining EMD with ICA to analyze combined EEG-fMRI Data AL-BADDAI, AL-SUBARI, et al.: COMBINED BEEMD-ICA 1 Combining EMD with ICA to analyze combined EEG-fMRI Data Saad M. H. Al-Baddai 1,2 saad.albaddai@yahoo.com arema S. A. Al-Subari 1,2 s.karema@yahoo.com

More information

APPLICATION OF INDEPENDENT COMPONENT ANALYSIS TO CHEMICAL REACTIONS. S.Triadaphillou, A. J. Morris and E. B. Martin

APPLICATION OF INDEPENDENT COMPONENT ANALYSIS TO CHEMICAL REACTIONS. S.Triadaphillou, A. J. Morris and E. B. Martin APPLICAION OF INDEPENDEN COMPONEN ANALYSIS O CHEMICAL REACIONS S.riadaphillou, A. J. Morris and E. B. Martin Centre for Process Analytics and Control echnology School of Chemical Engineering and Advanced

More information

Independent Component Analysis (ICA)

Independent Component Analysis (ICA) Independent Component Analysis (ICA) Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Statistical Analysis of fmrl Data

Statistical Analysis of fmrl Data Statistical Analysis of fmrl Data F. Gregory Ashby The MIT Press Cambridge, Massachusetts London, England Preface xi Acronyms xv 1 Introduction 1 What Is fmri? 2 The Scanning Session 4 Experimental Design

More information

Principal Component Analysis vs. Independent Component Analysis for Damage Detection

Principal Component Analysis vs. Independent Component Analysis for Damage Detection 6th European Workshop on Structural Health Monitoring - Fr..D.4 Principal Component Analysis vs. Independent Component Analysis for Damage Detection D. A. TIBADUIZA, L. E. MUJICA, M. ANAYA, J. RODELLAR

More information

Package steadyica. November 11, 2015

Package steadyica. November 11, 2015 Type Package Package steadyica November 11, 2015 Title ICA and Tests of Independence via Multivariate Distance Covariance Version 1.0 Date 2015-11-08 Author Benjamin B. Risk and Nicholas A. James and David

More information

Different Estimation Methods for the Basic Independent Component Analysis Model

Different Estimation Methods for the Basic Independent Component Analysis Model Washington University in St. Louis Washington University Open Scholarship Arts & Sciences Electronic Theses and Dissertations Arts & Sciences Winter 12-2018 Different Estimation Methods for the Basic Independent

More information

Independent Component Analysis

Independent Component Analysis 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 1 Introduction Indepent

More information

FEATURE EXTRACTION USING SUPERVISED INDEPENDENT COMPONENT ANALYSIS BY MAXIMIZING CLASS DISTANCE

FEATURE EXTRACTION USING SUPERVISED INDEPENDENT COMPONENT ANALYSIS BY MAXIMIZING CLASS DISTANCE FEATURE EXTRACTION USING SUPERVISED INDEPENDENT COMPONENT ANALYSIS BY MAXIMIZING CLASS DISTANCE Yoshinori Sakaguchi*, Seiichi Ozawa*, and Manabu Kotani** *Graduate School of Science and Technology, Kobe

More information

Undercomplete Independent Component. Analysis for Signal Separation and. Dimension Reduction. Category: Algorithms and Architectures.

Undercomplete Independent Component. Analysis for Signal Separation and. Dimension Reduction. Category: Algorithms and Architectures. Undercomplete Independent Component Analysis for Signal Separation and Dimension Reduction John Porrill and James V Stone Psychology Department, Sheeld University, Sheeld, S10 2UR, England. Tel: 0114 222

More information

ANALYSING ICA COMPONENTS BY INJECTING NOISE. August-Bebel-Strasse 89, Potsdam, Germany

ANALYSING ICA COMPONENTS BY INJECTING NOISE. August-Bebel-Strasse 89, Potsdam, Germany ANALYSING ICA COMPONENTS BY INJECTING NOISE Stefan Harmeling, Frank Meinecke, and Klaus-Robert Müller, Fraunhofer FIRST.IDA, Kekuléstrasse, 9 Berlin, Germany University of Potsdam, Department of Computer

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

On Independent Component Analysis

On Independent Component Analysis On Independent Component Analysis Université libre de Bruxelles European Centre for Advanced Research in Economics and Statistics (ECARES) Solvay Brussels School of Economics and Management Symmetric Outline

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise

Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise Makoto Yamada and Masashi Sugiyama Department of Computer Science, Tokyo Institute of Technology

More information

Blind separation of sources that have spatiotemporal variance dependencies

Blind separation of sources that have spatiotemporal variance dependencies Blind separation of sources that have spatiotemporal variance dependencies Aapo Hyvärinen a b Jarmo Hurri a a Neural Networks Research Centre, Helsinki University of Technology, Finland b Helsinki Institute

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,

More information

HST.582J/6.555J/16.456J

HST.582J/6.555J/16.456J Blind Source Separation: PCA & ICA HST.582J/6.555J/16.456J Gari D. Clifford gari [at] mit. edu http://www.mit.edu/~gari G. D. Clifford 2005-2009 What is BSS? Assume an observation (signal) is a linear

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information