INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION

[Title-slide figure: panels "Truth", "Rotated", "Reconstructed" and "Marginal Densities", showing the true sources S, the mixed data X, the reconstructed sources Ŝ and the estimated marginal densities.]

Richard Samworth, University of Cambridge. Joint work with Ming Yuan.

What are ICA models? ICA is a special case of the blind source separation problem, where, from a set of mixed signals, we aim to infer both the source signals and the mixing process; e.g. the cocktail party problem. It was pioneered by Comon (1994), and has become enormously popular in signal processing, machine learning, medical imaging, ...

Mathematical definition. In the simplest, noiseless case, we observe replicates x_1, ..., x_n of X = AS, where the d × d mixing matrix A is invertible and S = (S_1, ..., S_d)^T has independent components. Our main aim is to estimate the unmixing matrix W = A^{-1}; estimation of the marginal distributions P_1, ..., P_d of S is a secondary goal. This semiparametric model is therefore related to PCA.
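
As a quick illustration of this model (my own sketch, not part of the talk), the following numpy snippet simulates the noiseless setting: independent sources whose marginals echo the talk's later examples, a hypothetical mixing matrix A, and observations x_i = A s_i. The estimation target is W = A^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2

# Independent sources (marginals chosen to echo the talk's examples):
# S_1 ~ Exp(1) - 1,  S_2 ~ 0.7 N(-0.9, 1) + 0.3 N(2.1, 1).
s1 = rng.exponential(1.0, n) - 1.0
mix = rng.random(n) < 0.7
s2 = np.where(mix, rng.normal(-0.9, 1.0, n), rng.normal(2.1, 1.0, n))
S = np.column_stack([s1, s2])            # n x d matrix of source replicates

A = np.array([[1.0, 0.6],                # an arbitrary invertible mixing matrix (illustrative)
              [0.4, 1.0]])
X = S @ A.T                              # observed replicates x_i = A s_i

W_true = np.linalg.inv(A)                # the unmixing matrix we aim to estimate
```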

Different previous approaches. Postulate a parametric family for the marginals P_1, ..., P_d, and optimise a contrast function involving (W, P_1, ..., P_d); the contrast usually represents mutual information, maximum entropy or non-Gaussianity (Eriksson et al., 2000; Karvanen et al., 2000). Alternatively, postulate smooth (log-)densities for the marginals (Bach and Jordan, 2002; Hastie and Tibshirani, 2003; Samarov and Tsybakov, 2004; Chen and Bickel, 2006).

Our approach (S. and Yuan, 2012). To avoid assumptions on the existence of densities, and the choice of tuning parameters, we propose to maximise the log-likelihood

  log|det W| + (1/n) Σ_{i=1}^n Σ_{j=1}^d log f_j(w_j^T x_i)

over all non-singular matrices W = (w_1, ..., w_d)^T and univariate log-concave densities f_1, ..., f_d. To understand how this works, we need to understand log-concave ICA projections.
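
A minimal sketch (mine, not the authors' code) of evaluating this objective for a candidate W and candidate log-densities. The placeholder logistic log-density is only there to make the snippet runnable; the proposed estimator would use univariate log-concave MLEs for the f_j instead.

```python
import numpy as np

def ica_loglik(W, log_densities, X):
    """l_n(W, f_1, ..., f_d) = log|det W| + (1/n) sum_i sum_j log f_j(w_j^T x_i)."""
    n, d = X.shape
    proj = X @ W.T                               # proj[i, j] = w_j^T x_i
    total = np.log(abs(np.linalg.det(W)))
    for j in range(d):
        total += np.mean(log_densities[j](proj[:, j]))
    return total

# Placeholder log-density (standard logistic), just to make the sketch runnable.
logistic = lambda u: -u - 2.0 * np.log1p(np.exp(-u))
X_demo = np.random.default_rng(1).normal(size=(200, 2))
value = ica_loglik(np.eye(2), [logistic, logistic], X_demo)
```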

Notation. Let P_k be the set of probability distributions P on R^k with ∫_{R^k} ‖x‖ dP(x) < ∞ and P(H) < 1 for all hyperplanes H. Let F_k be the set of upper semi-continuous log-concave densities on R^k. The condition P ∈ P_d is necessary and sufficient for the existence of a unique log-concave projection ψ* : P_d → F_d given by

  ψ*(P) = argmax_{f ∈ F_d} ∫_{R^d} log f dP

(Cule, S. and Stewart, 2010; Cule and S., 2010; Dümbgen, S. and Schuhmacher, 2011).

ICA notation. Let W_d be the set of invertible d × d matrices. The ICA model P_d^ICA consists of those P ∈ P_d with

  P(B) = Π_{j=1}^d P_j(w_j^T B)  for all Borel B,

for some W ∈ W_d and P_1, ..., P_d ∈ P_1. The log-concave ICA model F_d^ICA consists of those f ∈ F_d with

  f(x) = |det W| Π_{j=1}^d f_j(w_j^T x)

with W ∈ W_d and f_1, ..., f_d ∈ F_1. If X has density f ∈ F_d^ICA, then w_j^T X has density f_j.

Log-concave ICA projections. Let

  ψ**(P) = argmax_{f ∈ F_d^ICA} ∫_{R^d} log f dP.

We also write L**(P) = sup_{f ∈ F_d^ICA} ∫_{R^d} log f dP. The condition P ∈ P_d is necessary and sufficient for L**(P) ∈ R, and then ψ**(P) defines a non-empty, proper subset of F_d^ICA.

An example. Suppose P is the uniform distribution on the unit Euclidean disc in R^2. Then ψ**(P) consists of those f ∈ F_2^ICA that can be represented by an arbitrary W ∈ W_2 and

  f_1(x) = f_2(x) = (2/π)(1 − x^2)^{1/2} 𝟙_{[−1,1]}(x).
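
A quick numerical sanity check (my addition): the stated marginal density, the projection of the uniform distribution on the disc onto a coordinate axis, integrates to one.

```python
import numpy as np
from scipy.integrate import quad

# Integrating the disc's constant density 1/pi over the chord at x gives (2/pi) * sqrt(1 - x^2).
f = lambda x: (2.0 / np.pi) * np.sqrt(1.0 - x ** 2)
total, _ = quad(f, -1.0, 1.0)
print(total)   # approximately 1
```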

Schematic picture of the maps.

[Diagram: P_d ⊃ P_d^ICA on top and F_d ⊃ F_d^ICA below; the log-concave projection ψ* maps P_d into F_d, the log-concave ICA projection ψ** maps P_d into subsets of F_d^ICA, and on P_d^ICA the two projections coincide.]

Log-concave ICA projection on P_d^ICA. If P ∈ P_d^ICA, then ψ**(P) defines a unique element of F_d^ICA, and the map ψ** restricted to P_d^ICA coincides with ψ* restricted to P_d^ICA. Moreover, suppose that P ∈ P_d^ICA, so that

  P(B) = Π_{j=1}^d P_j(w_j^T B)  for all Borel B,

for some W ∈ W_d and P_1, ..., P_d ∈ P_1. Then

  f*(x) := ψ*(P)(x) = |det W| Π_{j=1}^d f_j*(w_j^T x),

where f_j* = ψ*(P_j).

Identifiability (Comon, 1994; Eriksson and Koivunen, 2004). Suppose a probability measure P on R^d satisfies

  P(B) = Π_{j=1}^d P_j(w_j^T B) = Π_{j=1}^d P̃_j(w̃_j^T B)  for all Borel B,

where W, W̃ ∈ W_d and P_1, ..., P_d, P̃_1, ..., P̃_d are probability measures on R. Then there exist a permutation π and a scaling vector ε ∈ (R \ {0})^d such that P̃_j(B_j) = P_{π(j)}(ε_j B_j) and w̃_j = ε_j^{-1} w_{π(j)} if and only if none of P_1, ..., P_d is a Dirac mass and not more than one of them is Gaussian. Consequence: if P ∈ P_d^ICA, then ψ**(P) is identifiable iff P is identifiable.

Convergence. Suppose that P, P^1, P^2, ... ∈ P_d satisfy d(P^n, P) → 0, where d denotes the Wasserstein distance. Then

  sup_{f^n ∈ ψ**(P^n)} inf_{f ∈ ψ**(P)} ∫_{R^d} |f^n − f| → 0.

If P ∈ P_d^ICA is identifiable and (W, P_1, ..., P_d) is an ICA representation of P, then

  sup_{f^n ∈ ψ**(P^n)} sup_{(W^n, f_1^n, ..., f_d^n) representing f^n} inf_{ε_1^n, ..., ε_d^n ∈ R \ {0}} inf_{π^n ∈ Π} { ‖(ε_j^n)^{-1} w^n_{π^n(j)} − w_j‖ + ∫ |ε_j^n f^n_{π^n(j)}(ε_j^n x) − f_j*(x)| dx } → 0,

for each j = 1, ..., d, where f_j* = ψ*(P_j). Consequently, for large n, every f^n ∈ ψ**(P^n) is identifiable.

Estimation procedure. Now suppose (W^0, P_1^0, ..., P_d^0) is an ICA representation of P^0 ∈ P_d^ICA, and we have data x_1, ..., x_n iid from P^0 with n ≥ d + 1. We propose to estimate P^0 by ψ**(P̂_n), where P̂_n is the empirical distribution of the data. That is, we maximise

  ℓ_n(W, f_1, ..., f_d) = log|det W| + (1/n) Σ_{i=1}^n Σ_{j=1}^d log f_j(w_j^T x_i)

over W ∈ W_d and f_1, ..., f_d ∈ F_1.

Consistency. Suppose P^0 is identifiable. For any maximiser (Ŵ^n, f̂_1^n, ..., f̂_d^n) of ℓ_n(W, f_1, ..., f_d), there exist π̂^n ∈ Π and ε̂_1^n, ..., ε̂_d^n ∈ R \ {0} such that

  (ε̂_j^n)^{-1} ŵ^n_{π̂^n(j)} → w_j^0 almost surely  and  ∫ |ε̂_j^n f̂^n_{π̂^n(j)}(ε̂_j^n x) − f_j*(x)| dx → 0 almost surely,

for j = 1, ..., d, where f_j* = ψ*(P_j^0).

Pre-whitening. Pre-whitening is a standard pre-processing step in ICA algorithms to improve stability. We replace the data with z_1 = Σ̂^{-1/2} x_1, ..., z_n = Σ̂^{-1/2} x_n, and maximise the log-likelihood over O ∈ O(d) and g_1, ..., g_d ∈ F_1. If (Ô^n, ĝ_1^n, ..., ĝ_d^n) is a maximiser, we then set Ŵ^n = Ô^n Σ̂^{-1/2} and f̂_j^n = ĝ_j^n. Thus, to estimate the d^2 parameters of W^0, we first estimate the d(d+1)/2 free parameters of Σ, then maximise over the d(d−1)/2 free parameters of O.
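
A minimal sketch of this pre-whitening step, assuming the sample covariance is nonsingular; the centring and the function name are my own choices rather than anything prescribed by the slides.

```python
import numpy as np

def prewhiten(X):
    """Replace x_i by z_i = Sigma_hat^{-1/2} x_i, where Sigma_hat is the sample covariance."""
    Xc = X - X.mean(axis=0)                     # centre the data
    Sigma = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)        # symmetric inverse square root
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ Sigma_inv_sqrt
    return Z, Sigma_inv_sqrt

# After maximising over O in O(d) with the whitened data, set W_hat = O_hat @ Sigma_inv_sqrt.
```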

Equivalence of the pre-whitened algorithm. Suppose P^0 is identifiable and ∫ ‖x‖^2 dP^0(x) < ∞. With probability 1, for large n, a maximiser (Ŵ^n, f̂_1^n, ..., f̂_d^n) of ℓ_n(W, f_1, ..., f_d) over W ∈ O(d) Σ̂^{-1/2} and f_1, ..., f_d ∈ F_1 exists. For any such maximiser, there exist π̂^n ∈ Π and ε̂_1^n, ..., ε̂_d^n ∈ R \ {0} such that

  (ε̂_j^n)^{-1} ŵ^n_{π̂^n(j)} → w_j^0 almost surely  and  ∫ |ε̂_j^n f̂^n_{π̂^n(j)}(ε̂_j^n x) − f_j*(x)| dx → 0 almost surely,

where f_j* = ψ*(P_j^0).

Computational algorithm. With (pre-whitened) data x_1, ..., x_n, consider maximising ℓ_n(W, f_1, ..., f_d) over W ∈ O(d) and f_1, ..., f_d ∈ F_1.
(1) Initialise W according to Haar measure on O(d).
(2) For j = 1, ..., d, update f_j with the log-concave MLE of w_j^T x_1, ..., w_j^T x_n (Dümbgen and Rufibach, 2011).
(3) Update W using a projected gradient step.
(4) Repeat (2) and (3) until the relative change in the log-likelihood is negligible.

Projected gradient step. The set SO(d) is a d(d−1)/2-dimensional Riemannian submanifold of R^{d^2}. The tangent space at W ∈ SO(d) is

  T_W SO(d) := {WY : Y = −Y^T}.

The unique geodesic passing through W ∈ SO(d) with tangent vector WY (where Y = −Y^T) is the map α : [0, 1] → SO(d) given by α(t) = W exp(tY), where exp is the usual matrix exponential.
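
A small sketch (mine) of such a geodesic step using scipy's matrix exponential; the function name and step size are illustrative only.

```python
import numpy as np
from scipy.linalg import expm

def geodesic_step(W, Y, t):
    """Move along the geodesic alpha(t) = W exp(tY) on SO(d); Y must be skew-symmetric."""
    assert np.allclose(Y, -Y.T), "tangent directions have the form WY with Y = -Y^T"
    return W @ expm(t * Y)

d = 3
W = np.eye(d)
Y = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])          # a skew-symmetric direction
W_new = geodesic_step(W, Y, 0.1)
print(np.allclose(W_new @ W_new.T, np.eye(d)))   # the step stays on the orthogonal group
```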

Projected gradient step 2. On [min(w_j^T x_1, ..., w_j^T x_n), max(w_j^T x_1, ..., w_j^T x_n)], we have

  log f̂_j(x) = min_{k=1,...,m_j} (b_jk x − β_jk).

For 1 ≤ s < r ≤ d, let Y_{r,s} denote the matrix with Y_{r,s}(r, s) = 1/√2, Y_{r,s}(s, r) = −1/√2 and zeros otherwise. Then Y^+ = {Y_{r,s} : 1 ≤ s < r ≤ d} forms an orthonormal basis for the skew-symmetric matrices. Let Y^− = {−Y : Y ∈ Y^+}. Choose Y_max ∈ Y^+ ∪ Y^− to maximise the one-sided directional derivative ∇_{WY} g(W), where

  g(W) = (1/n) Σ_{i=1}^n Σ_{j=1}^d min_{k=1,...,m_j} (b_jk w_j^T x_i − β_jk).
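
A sketch, under the sign convention written on this slide (log f_j(x) = min_k (b_jk x − β_jk)), of building the orthonormal basis Y^+ and evaluating g(W) when each log-density is stored as a list of affine pieces; the variable names and the toy call at the end are mine.

```python
import numpy as np

def skew_basis(d):
    """Orthonormal basis {Y_{r,s}: 1 <= s < r <= d} for the d x d skew-symmetric matrices."""
    basis = []
    for r in range(1, d):
        for s in range(r):
            Y = np.zeros((d, d))
            Y[r, s] = 1.0 / np.sqrt(2.0)
            Y[s, r] = -1.0 / np.sqrt(2.0)
            basis.append(Y)
    return basis

def g(W, X, slopes, intercepts):
    """g(W) = (1/n) sum_i sum_j min_k (b_jk w_j^T x_i - beta_jk),
    where slopes[j], intercepts[j] hold the affine pieces (b_jk, beta_jk) of log f_j."""
    proj = X @ W.T                                              # proj[i, j] = w_j^T x_i
    total = 0.0
    for j in range(X.shape[1]):
        pieces = np.outer(proj[:, j], slopes[j]) - intercepts[j]   # n x m_j array of affine values
        total += pieces.min(axis=1).mean()
    return total

# Toy call: 2-d data with a single affine piece per coordinate.
X_demo = np.random.default_rng(2).normal(size=(100, 2))
print(g(np.eye(2), X_demo, [np.array([0.0]), np.array([0.0])], [np.array([1.0]), np.array([1.0])]))
```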

Exp(1) − 1 example.

[Figure: panels "Truth", "Rotated", "Reconstructed" and "Marginal Densities" for sources with Exp(1) − 1 marginals.]

0.7 N(−0.9, 1) + 0.3 N(2.1, 1) example.

[Figure: panels "Truth", "Rotated", "Reconstructed" and "Marginal Densities" for sources with 0.7 N(−0.9, 1) + 0.3 N(2.1, 1) marginals.]

Performance comparison.

[Figure: boxplots of the Amari metric for LogConICA, FastICA and ProDenICA, with Uniform, Exponential, t_2, mixture-of-normals and Binomial source distributions.]
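
For reference, here is a common form of the Amari discrepancy used in such comparisons (my sketch; normalisation conventions vary across papers, so this is indicative of the metric plotted rather than a reproduction of it).

```python
import numpy as np

def amari_error(W_hat, A):
    """Amari discrepancy between an estimated unmixing matrix W_hat and the true mixing
    matrix A; it is 0 iff W_hat @ A is a scaled permutation matrix."""
    P = np.abs(W_hat @ A)
    d = P.shape[0]
    rows = (P.sum(axis=1) / P.max(axis=1) - 1.0).sum()
    cols = (P.sum(axis=0) / P.max(axis=0) - 1.0).sum()
    return (rows + cols) / (2.0 * d)
```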

References

Bach, F. and Jordan, M. I. (2002) Kernel independent component analysis. Journal of Machine Learning Research, 3, 1–48.

Chen, A. and Bickel, P. J. (2006) Efficient independent component analysis. The Annals of Statistics, 34, 2825–2855.

Comon, P. (1994) Independent component analysis, a new concept? Signal Processing, 36, 287–314.

Cule, M. and Samworth, R. (2010) Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electron. J. Stat., 4, 254–270.

Cule, M., Samworth, R. and Stewart, M. (2010) Maximum likelihood estimation of a multi-dimensional log-concave density. J. Roy. Statist. Soc., Ser. B (with discussion), 72, 545–607.

Dümbgen, L. and Rufibach, K. (2011) logcondens: Computations related to univariate log-concave density estimation. J. Statist. Software, 39, 1–28.

Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011) Approximation by log-concave distributions, with applications to regression. Ann. Statist., 39, 702–730.

Eriksson, J. and Koivunen, V. (2004) Identifiability, separability and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11, 601–604.

Hastie, T. and Tibshirani, R. (2003) Independent component analysis through product density estimation. In Advances in Neural Information Processing Systems 15 (Becker, S. and Obermayer, K., eds), MIT Press, Cambridge, MA, pp. 649–656.

Hastie, T. and Tibshirani, R. (2003) ProDenICA: Product Density Estimation for ICA using tilted Gaussian density estimates. R package version 1.0. http://cran.r-project.org/web/packages/prodenica/.

Samarov, A. and Tsybakov, A. (2004) Nonparametric independent component analysis. Bernoulli, 10, 565–582.

Samworth, R. J. and Yuan, M. (2012) Independent component analysis via nonparametric maximum likelihood estimation. http://arxiv.org/abs/1206.0457.