INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION
1 INDEPENDENT COMPONENT ANALYSIS VIA NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATION
Richard Samworth, University of Cambridge. Joint work with Ming Yuan.
[Figure: panels showing the true sources S, the rotated data X, the reconstructed sources S^ and their estimated marginal densities]
2 What are ICA models?
ICA is a special case of the blind source separation problem, where from a set of mixed signals we aim to infer both the source signals and the mixing process; e.g. the cocktail party problem. It was pioneered by Comon (1994), and has become enormously popular in signal processing, machine learning, medical imaging...
3 Mathematical definition
In the simplest, noiseless case, we observe replicates x_1,...,x_n of X = AS, where the d x d mixing matrix A is invertible and S has independent components. Our main aim is to estimate the unmixing matrix W = A^{-1}; estimation of the marginals P_1,...,P_d of S = (S_1,...,S_d) is a secondary goal. This semiparametric model is therefore related to PCA.
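As a quick illustration (mine, not from the slides), one can simulate the noiseless model with hypothetical non-Gaussian independent sources and check that unmixing with W = A^{-1} recovers them exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2

# Independent (non-Gaussian) source components S_1, S_2.
S = np.column_stack([rng.exponential(size=n), rng.uniform(-1, 1, size=n)])

# An invertible mixing matrix A; observations are x_i = A s_i.
A = np.array([[2.0, 1.0], [1.0, 1.0]])
X = S @ A.T

# The unmixing matrix W = A^{-1} recovers the sources exactly.
W = np.linalg.inv(A)
S_hat = X @ W.T
print(np.allclose(S_hat, S))  # True
```

In practice, of course, A is unknown and only X is observed; the whole problem is to estimate W from the data.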
4 Different previous approaches
Postulate a parametric family for the marginals P_1,...,P_d; optimise a contrast function involving (W, P_1,...,P_d). The contrast usually represents mutual information or maximum entropy, or non-Gaussianity (Eriksson et al., 2000; Karvanen et al., 2000). Alternatively, postulate smooth (log-)densities for the marginals (Bach and Jordan, 2002; Hastie and Tibshirani, 2003; Samarov and Tsybakov, 2004; Chen and Bickel, 2006).
5 Our approach (S. and Yuan, 2012)
To avoid assumptions of existence of densities, and choices of tuning parameters, we propose to maximise the log-likelihood

log det W + (1/n) Σ_{i=1}^n Σ_{j=1}^d log f_j(w_j^T x_i)

over all non-singular matrices W = (w_1,...,w_d)^T and univariate log-concave densities f_1,...,f_d. To understand how this works, we need to understand log-concave ICA projections.
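Evaluating this objective for a candidate W and given univariate log-densities is straightforward; a minimal sketch (function and variable names are mine, and the caller supplies hypothetical log-density functions):

```python
import numpy as np

def ica_loglik(W, logfs, X):
    """Evaluate log|det W| + (1/n) sum_i sum_j log f_j(w_j^T x_i).

    W     : (d, d) non-singular candidate unmixing matrix, rows w_j
    logfs : list of d univariate log-density functions (assumed log-concave)
    X     : (n, d) data matrix with rows x_i
    """
    n, d = X.shape
    S = X @ W.T  # S[i, j] = w_j^T x_i
    term = sum(np.sum(logf(S[:, j])) for j, logf in enumerate(logfs)) / n
    return np.log(np.abs(np.linalg.det(W))) + term

# Toy check with standard normal log-densities and W = I.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
lognorm = lambda s: -0.5 * s**2 - 0.5 * np.log(2 * np.pi)
print(ica_loglik(np.eye(2), [lognorm, lognorm], X))
```

The absolute value in log|det W| keeps the objective real for either orientation of W; in the method itself the f_j are not fixed in advance but maximised over the log-concave class.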
6 Notation
Let P^k be the set of probability distributions P on R^k with ∫_{R^k} ||x|| dP(x) < ∞ and P(H) < 1 for all hyperplanes H. Let F^k be the set of upper semi-continuous log-concave densities on R^k. The condition P ∈ P^k is necessary and sufficient for the existence of a unique log-concave projection ψ* : P^k → F^k given by

ψ*(P) = argmax_{f ∈ F^k} ∫_{R^k} log f dP

(Cule, S. and Stewart, 2010; Cule and S., 2010; Dümbgen, S. and Schuhmacher, 2011).
7 ICA notation
Let W be the set of d x d invertible matrices. The ICA model P_ICA consists of those P ∈ P^d with

P(B) = Π_{j=1}^d P_j(w_j^T B), for all Borel B,

for some W ∈ W and P_1,...,P_d ∈ P^1. The log-concave ICA model F_ICA consists of those f ∈ F^d with

f(x) = det W Π_{j=1}^d f_j(w_j^T x),

with W ∈ W and f_1,...,f_d ∈ F^1. If X has density f ∈ F_ICA, then w_j^T X has density f_j.
8 Log-concave ICA projections
Let

ψ**(P) = argmax_{f ∈ F_ICA} ∫_{R^d} log f dP.

We also write L**(P) = sup_{f ∈ F_ICA} ∫_{R^d} log f dP. The condition P ∈ P^d is necessary and sufficient for L**(P) ∈ R, and then ψ**(P) defines a non-empty, proper subset of F_ICA.
9 An example
Suppose P is the uniform distribution on the unit Euclidean disk in R^2. Then ψ**(P) consists of those f ∈ F_ICA that can be represented by an arbitrary W ∈ W and

f_1(x) = f_2(x) = (2/π)(1 - x^2)^{1/2} 1{x ∈ [-1,1]}.
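As a sanity check (mine, not from the slides), the stated marginal is indeed a density: it is the semicircle law, and numerical quadrature confirms it integrates to one.

```python
import numpy as np
from scipy.integrate import quad

# Semicircle density f_1(x) = f_2(x) = (2/pi) * sqrt(1 - x^2) on [-1, 1].
f = lambda x: (2 / np.pi) * np.sqrt(max(1.0 - x**2, 0.0))

total, _ = quad(f, -1, 1)
print(round(total, 6))  # 1.0
```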
10 Schematic picture of maps
[Diagram: the log-concave projection ψ* maps P^d into F^d, while the log-concave ICA projection ψ** maps P^d into F_ICA ⊂ F^d; on the submodel P_ICA ⊂ P^d the two projections coincide.]
11 Log-concave ICA projection on P_ICA
If P ∈ P_ICA, then ψ**(P) defines a unique element of F_ICA. The map ψ**|_{P_ICA} coincides with ψ*|_{P_ICA}. Moreover, suppose that P ∈ P_ICA, so that P(B) = Π_{j=1}^d P_j(w_j^T B) for all Borel B, for some W ∈ W and P_1,...,P_d ∈ P^1. Then

f*(x) := ψ**(P)(x) = det W Π_{j=1}^d f_j*(w_j^T x),

where f_j* = ψ*(P_j).
12 Identifiability (Comon, 1994; Eriksson and Koivunen, 2004)
Suppose a probability measure P on R^d satisfies

P(B) = Π_{j=1}^d P_j(w_j^T B) = Π_{j=1}^d P~_j(w~_j^T B), for all Borel B,

where W, W~ ∈ W and P_1,...,P_d, P~_1,...,P~_d are probability measures on R. Then there exist a permutation π and a scaling vector ε ∈ (R \ {0})^d such that P~_j(B_j) = P_{π(j)}(ε_j B_j) and w~_j = ε_j^{-1} w_{π(j)} iff none of P_1,...,P_d is a Dirac mass and no more than one of them is Gaussian.

Consequence: If P ∈ P_ICA, then ψ**(P) is identifiable iff P is identifiable.
13 Convergence
Suppose that P, P^1, P^2, ... ∈ P^d satisfy d_W(P^n, P) → 0, where d_W denotes Wasserstein distance. Then

sup_{f^n ∈ ψ**(P^n)} inf_{f ∈ ψ**(P)} ∫_{R^d} |f^n - f| → 0.

If P ∈ P_ICA is identifiable and (W, P_1,...,P_d) ∈ ICA(P), then

sup_{f^n ∈ ψ**(P^n)} sup_{(W^n, f^n_1,...,f^n_d) ∈ ICA(f^n)} inf_{ε^n_1,...,ε^n_d ∈ R\{0}} inf_{π^n ∈ Π} { ||(ε^n_j)^{-1} w^n_{π^n(j)} - w_j|| + ∫ |ε^n_j f^n_{π^n(j)}(ε^n_j x) - f_j*(x)| dx } → 0,

for each j = 1,...,d, where f_j* = ψ*(P_j). Consequently, for large n, every f^n ∈ ψ**(P^n) is identifiable.
14 Estimation procedure
Now suppose (W^0, P^0_1,...,P^0_d) ∈ ICA(P^0) with P^0 ∈ P_ICA, and we observe i.i.d. data x_1,...,x_n ~ P^0 with n ≥ d + 1. We propose to estimate P^0 by ψ**(P̂_n), where P̂_n is the empirical distribution of the data. That is, we maximise

ℓ_n(W, f_1,...,f_d) = log det W + (1/n) Σ_{i=1}^n Σ_{j=1}^d log f_j(w_j^T x_i)

over W ∈ W and f_1,...,f_d ∈ F^1.
15 Consistency
Suppose P^0 is identifiable. For any maximiser (Ŵ^n, f̂^n_1,...,f̂^n_d) of ℓ_n(W, f_1,...,f_d), there exist π̂_n ∈ Π and ε̂^n_1,...,ε̂^n_d ∈ R \ {0} such that

(ε̂^n_j)^{-1} ŵ^n_{π̂_n(j)} → w^0_j a.s. and ∫ |ε̂^n_j f̂^n_{π̂_n(j)}(ε̂^n_j x) - f_j*(x)| dx → 0 a.s.,

for j = 1,...,d, where f_j* = ψ*(P^0_j).
16 Pre-whitening
Pre-whitening is a standard pre-processing step in ICA algorithms to improve stability. We replace the data with z_1 = Σ̂^{-1/2} x_1, ..., z_n = Σ̂^{-1/2} x_n, and maximise the log-likelihood over O ∈ O(d) and g_1,...,g_d ∈ F^1. If (Ô^n, ĝ^n_1,...,ĝ^n_d) is a maximiser, we then set Ŵ^n = Ô^n Σ̂^{-1/2} and f̂^n_j = ĝ^n_j. Thus to estimate the d^2 parameters of W^0, we first estimate the d(d+1)/2 free parameters of Σ, then maximise over the d(d-1)/2 free parameters of O.
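The whitening step itself is a few lines of linear algebra; a minimal numpy sketch (helper name mine; centering is added here for numerical hygiene, while the slide applies Σ̂^{-1/2} directly):

```python
import numpy as np

def prewhiten(X):
    """Return Z = (x_i - x_bar) Sigma_hat^{-1/2}, so Z has identity sample covariance."""
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False, bias=True)   # empirical covariance Sigma_hat
    evals, evecs = np.linalg.eigh(Sigma)          # symmetric inverse square root
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc @ Sigma_inv_sqrt, Sigma_inv_sqrt

rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 2)) @ np.array([[2.0, 1.0], [0.0, 1.0]])
Z, Sigma_inv_sqrt = prewhiten(X)
print(np.allclose(np.cov(Z, rowvar=False, bias=True), np.eye(2)))  # True
```

After whitening, the remaining optimisation over W reduces to the orthogonal group O(d), which is what the algorithm below exploits.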
17 Equivalence of pre-whitened algorithm
Suppose P^0 is identifiable and ∫_{R^d} ||x||^2 dP^0(x) < ∞. With probability 1, for large n, a maximiser (Ŵ^n, f̂^n_1,...,f̂^n_d) of ℓ_n(W, f_1,...,f_d) over W ∈ O(d)Σ̂^{-1/2} and f_1,...,f_d ∈ F^1 exists. For any such maximiser, there exist π̂_n ∈ Π and ε̂^n_1,...,ε̂^n_d ∈ R \ {0} such that

(ε̂^n_j)^{-1} ŵ^n_{π̂_n(j)} → w^0_j a.s. and ∫ |ε̂^n_j f̂^n_{π̂_n(j)}(ε̂^n_j x) - f_j*(x)| dx → 0 a.s.,

where f_j* = ψ*(P^0_j).
18 Computational algorithm
With (pre-whitened) data x_1,...,x_n, consider maximising ℓ_n(W, f_1,...,f_d) over W ∈ O(d) and f_1,...,f_d ∈ F^1.
(1) Initialise W according to Haar measure on O(d)
(2) For j = 1,...,d, update f_j with the log-concave MLE of w_j^T x_1,...,w_j^T x_n (Dümbgen and Rufibach, 2011)
(3) Update W using a projected gradient step
(4) Repeat (2) and (3) until negligible relative change in the log-likelihood.
19 Projected gradient step
The set SO(d) is a d(d-1)/2-dimensional Riemannian submanifold of R^{d^2}. The tangent space at W ∈ SO(d) is

T_W SO(d) := {WY : Y = -Y^T}.

The unique geodesic passing through W ∈ SO(d) with tangent vector WY (where Y = -Y^T) is the map α : [0,1] → SO(d) given by α(t) = W exp(tY), where exp is the usual matrix exponential.
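The geodesic α(t) = W exp(tY) can be sketched with scipy's matrix exponential; a small illustration (mine, under the assumptions of the slide) that stepping along it never leaves SO(d):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
d = 3

# Start at a rotation W in SO(d): orthogonalise a random matrix, fix det = +1.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
W = Q
if np.linalg.det(W) < 0:
    W[:, 0] = -W[:, 0]

# A skew-symmetric tangent direction Y = -Y^T.
B = rng.standard_normal((d, d))
Y = (B - B.T) / 2

# Points along the geodesic alpha(t) = W exp(tY) remain in SO(d).
for t in (0.1, 0.5, 1.0):
    Wt = W @ expm(t * Y)
    assert np.allclose(Wt @ Wt.T, np.eye(d))       # orthogonality preserved
    assert np.isclose(np.linalg.det(Wt), 1.0)      # determinant stays +1
print("geodesic stays on SO(d)")
```

This is why the W-update can move along geodesics without any re-projection onto the constraint set.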
20 Projected gradient step 2
On [min(w_j^T x_1,...,w_j^T x_n), max(w_j^T x_1,...,w_j^T x_n)], we have

log f_j(x) = min_{k=1,...,m_j} (b_jk x - β_jk).

For 1 ≤ s < r ≤ d, let Y^{r,s} denote the matrix with Y^{r,s}(r,s) = 1/√2, Y^{r,s}(s,r) = -1/√2 and zero otherwise. Then Y_+ = {Y^{r,s} : 1 ≤ s < r ≤ d} forms an o.n.b. for the skew-symmetric matrices. Let Y_- = {-Y : Y ∈ Y_+}. Choose Y_max ∈ Y_+ ∪ Y_- to maximise the one-sided directional derivative ∂_{WY} g(W), where

g(W) = (1/n) Σ_{i=1}^n Σ_{j=1}^d min_{k=1,...,m_j} (b_jk w_j^T x_i - β_jk).
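The basis Y^{r,s} is easy to construct and check; a sketch (helper name mine) verifying skew-symmetry and orthonormality under the trace inner product <A, B> = tr(A^T B):

```python
import numpy as np

def skew_basis(d):
    """Orthonormal basis {Y^{r,s} : 1 <= s < r <= d} for the skew-symmetric matrices."""
    basis = []
    for r in range(1, d):
        for s in range(r):                 # s < r (0-indexed)
            Y = np.zeros((d, d))
            Y[r, s] = 1 / np.sqrt(2)
            Y[s, r] = -1 / np.sqrt(2)
            basis.append(Y)
    return basis

d = 4
basis = skew_basis(d)
assert len(basis) == d * (d - 1) // 2      # matches dim SO(d)
for i, Yi in enumerate(basis):
    assert np.allclose(Yi, -Yi.T)          # skew-symmetric
    for j, Yj in enumerate(basis):
        ip = np.trace(Yi.T @ Yj)           # trace inner product
        assert np.isclose(ip, 1.0 if i == j else 0.0)
print(len(basis))  # 6
```

Because the candidate directions come in ± pairs, the one-sided directional derivatives handle the kinks of the piecewise-linear log-densities.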
21 Exp(1)
[Figure: true Exp(1) sources S, rotated data X, reconstructed sources S^ and estimated marginal densities]
22 0.7 N(-0.9, 1) + 0.3 N(2.1, 1)
[Figure: true mixture-of-normals sources S, rotated data X, reconstructed sources S^ and estimated marginal densities]
23 Performance comparison
[Figure: boxplots of the Amari metric for LogConICA, FastICA and ProDenICA with Uniform, Exponential, t_2, Mixture of Normal and Binomial source distributions]
24 References
Bach, F. and Jordan, M. I. (2002) Kernel independent component analysis. Journal of Machine Learning Research, 3.
Chen, A. and Bickel, P. J. (2006) Efficient independent component analysis. The Annals of Statistics, 34.
Comon, P. (1994) Independent component analysis, a new concept? Signal Processing, 36.
Cule, M. and Samworth, R. (2010) Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electron. J. Stat., 4.
Cule, M., Samworth, R. and Stewart, M. (2010) Maximum likelihood estimation of a multi-dimensional log-concave density. J. Roy. Statist. Soc., Ser. B (with discussion), 72.
25 References (continued)
Dümbgen, L. and Rufibach, K. (2011) logcondens: Computations related to univariate log-concave density estimation. J. Statist. Software, 39.
Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011) Approximation by log-concave distributions, with applications to regression. Ann. Statist., 39.
Eriksson, J. and Koivunen, V. (2004) Identifiability, separability and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11.
Hastie, T. and Tibshirani, R. (2003) Independent component analysis through product density estimation. In Advances in Neural Information Processing Systems 15 (Becker, S. and Obermayer, K., eds), MIT Press, Cambridge, MA.
Hastie, T. and Tibshirani, R. (2003) ProDenICA: Product density estimation for ICA using tilted Gaussian density estimates. R package.
Samarov, A. and Tsybakov, A. (2004) Nonparametric independent component analysis. Bernoulli, 10.
Samworth, R. J. and Yuan, M. (2012) Independent component analysis via nonparametric maximum likelihood estimation.
More informationIntrinsic Polynomials for Regression on Riemannian Manifolds
J Math Imaging Vis 214) 5:32 52 DOI 1.17/s1851-13-489-5 Intrinsic Polynomials for Regression on Riemannian Manifols Jacob Hinkle P. Thomas Fletcher Sarang Joshi Receive: 11 February 213 / Accepte: 28 December
More informationHomework 2 EM, Mixture Models, PCA, Dualitys
Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The
More informationStatistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart
Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms
More informationOn conditional moments of high-dimensional random vectors given lower-dimensional projections
Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationSecond order differentiation formula on RCD(K, N) spaces
Secon orer ifferentiation formula on RCD(K, N) spaces Nicola Gigli Luca Tamanini February 8, 018 Abstract We prove the secon orer ifferentiation formula along geoesics in finite-imensional RCD(K, N) spaces.
More informationNecessary and Sufficient Conditions for Sketched Subspace Clustering
Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This
More informationSYMPLECTIC GEOMETRY: LECTURE 3
SYMPLECTIC GEOMETRY: LECTURE 3 LIAT KESSLER 1. Local forms Vector fiels an the Lie erivative. A vector fiel on a manifol M is a smooth assignment of a vector tangent to M at each point. We think of M as
More informationReal and Complex Independent Subspace Analysis by Generalized Variance
Real and Complex Independent Subspace Analysis by Generalized Variance Neural Information Processing Group, Department of Information Systems, Eötvös Loránd University, Budapest, Hungary ICA Research Network
More informationEuler Equations: derivation, basic invariants and formulae
Euler Equations: erivation, basic invariants an formulae Mat 529, Lesson 1. 1 Derivation The incompressible Euler equations are couple with t u + u u + p = 0, (1) u = 0. (2) The unknown variable is the
More informationLecture 6: Calculus. In Song Kim. September 7, 2011
Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear
More information