FuncICA for time series pattern discovery
Nishant Mehta and Alexander Gray
Georgia Institute of Technology
The problem

Given a set of inherently continuous time series (e.g. EEG), find a set of patterns that vary independently over the data.

Existing solutions:
1. Standard independent component analysis (ICA)
2. Functional principal component analysis
High level

Pipeline: Data → Basis functions → Principal curves → Independent curves
High level

FuncICA - Functional independent component analysis: ICA for time series data.

1. Express the data using a set of basis functions.
2. Functional PCA: find the k curves of maximum variation across the data, subject to a smoothness constraint.
3. Rotate the functional principal components to maximize independence, yielding independent components Y.
4. Optimize an independence objective Q(Y) with respect to a smoothness regularization term.
Towards automating science

- ICA is used for pattern discovery in natural domains like neuroscience, genetics, and astrophysics, but the patterns are often smooth.
- Many human-created domains, including financial markets and chemical processes, also involve smooth variation.
- FuncICA offers a way to find optimally smoothed patterns in these domains:
  - Automatic identification of event-related potentials in EEG
  - Automatic discovery of gene signaling mechanisms from microarray gene expression data
  - Identification of spatiotemporal activation patterns in fMRI
Independent component analysis

- Observe multiple signals X(t) over time.
- Each univariate signal X_i(t) is a mixture of independent sources S(t).
- Linear instantaneous mixing model: X(t) = A S(t).
- Find an unmixing transformation that maximizes the statistical independence of the recovered sources.

[Figure: example independent sources (left) and their observed mixtures (right).]
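The mixing model above can be sketched in a few lines. The sources, mixing matrix, and sizes here are illustrative choices, not values from the talk; the unmixing matrix is taken as the oracle inverse of A purely to show what ICA aims to recover.

```python
import numpy as np

# Sketch of the linear instantaneous mixing model X(t) = A S(t).
t = np.linspace(0, 1, 500)

# Two independent sources: a sinusoid and a sawtooth-like signal.
S = np.vstack([np.sin(2 * np.pi * 5 * t),
               2 * (t * 3 % 1) - 1])

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])   # unknown mixing matrix
X = A @ S                    # observed mixtures

# ICA seeks W such that Y = W X recovers S up to scale and permutation.
W = np.linalg.inv(A)         # oracle unmixing, for illustration only
Y = W @ X
```

In practice W is unknown and is estimated by maximizing the independence of the rows of Y.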
Independent component analysis

- Find an unmixing matrix W such that Y = WX = WAS.
- Then Y ≈ S, up to a scaling and permutation of the sources.
- Maximizing independence amounts to minimizing the difference between the joint distribution P[Y] and the product of marginals ∏ P[Y_i].
- Measure this with the Kullback-Leibler divergence (the mutual information):

    I(Y) = D_{KL}\left( P[Y] \,\Big\|\, \prod_{i=1}^{n} P[Y_i] \right) = \sum_{i=1}^{n} H(Y_i) - H(Y)
ICA duality

- Primal view: IC 1 varies in loading over the samples of x.
- Dual view: the IC 1 loading varies over samples.

[Figure: primal view (axes x_1, x_2, with IC 1 as a direction over samples) and dual view (axes sample 1, sample 2, with IC 1 as a point).]
Why functional data?

- Higher sampling rates yield higher-dimensional data, with no principled way of dimensionality reduction (subsampling?).
- No principled way to handle asynchronous observations:
  - Missing data from occasionally offline sensors
  - Each observation lives in a different space
- Alternatives? Generative (parametric) models (HMMs, dynamic Bayesian networks) OR a functional representation.
Functional data

- A set of n curves X = {X_1(t), ..., X_n(t)}, with each X_i ∈ 𝒳.
- A set of m basis functions β = {β_1, ..., β_m}.
- Decompose the data as

    X_i(t) = \sum_{j=1}^{m} \psi_{i,j} \beta_j(t)

- 𝒳 ⊂ L² (a Hilbert space), with inner product ⟨f, g⟩ = ∫ f(t) g(t) dt.

[Figure: example curves expanded in the basis.]
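The decomposition and inner product above can be sketched numerically. The orthonormal Fourier basis and the grid quadrature here are illustrative stand-ins (the talk uses a spline basis), and all sizes are arbitrary.

```python
import numpy as np

# Sketch of X_i(t) = sum_j psi_{i,j} beta_j(t) and <f, g> = ∫ f(t) g(t) dt,
# approximated on a uniform grid over [0, 1].
t = np.linspace(0, 1, 1001)
dt = t[1] - t[0]

def fourier_basis(t, m):
    """First m functions of an orthonormal Fourier basis on [0, 1]."""
    B = [np.ones_like(t)]
    k = 1
    while len(B) < m:
        B.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(B) < m:
            B.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.vstack(B)              # shape (m, len(t))

beta = fourier_basis(t, 5)

def inner(f, g):
    """L2 inner product via the trapezoidal rule."""
    h = f * g
    return dt * (np.sum(h) - 0.5 * (h[0] + h[-1]))

# Gram matrix of the basis: B_{k,l} = <beta_k, beta_l> (≈ identity here).
B = np.array([[inner(bk, bl) for bl in beta] for bk in beta])

# One curve expressed through its basis coefficients psi.
psi = np.array([0.5, 1.0, 0.0, -0.3, 0.2])
X1 = psi @ beta                      # X_1(t) evaluated on the grid
```

For a non-orthonormal basis (e.g. B-splines), the Gram matrix B is not the identity and must be carried through the loading computations on the following slides.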
Functional PCA

Run functional PCA to get the principal curves

    E_i(t) = \sum_{j=1}^{m} \rho_{i,j} \beta_j(t)

The principal components (curve loadings) over the data are

    \sigma^{(E)}_{i,j} = \langle E_i, X_j \rangle = \sum_{k=1}^{m} \sum_{l=1}^{m} \rho_{i,k} \langle \beta_k, \beta_l \rangle \psi_{j,l}

or, in matrix form, \sigma^{(E)} = \rho B \psi^{T}, where B is the Gram matrix of the basis.
Functional ICA

The independent curves are a rotation W of the principal curves. The independent components are

    Y_i(t) = \sum_{j=1}^{m} [W\rho]_{i,j} \beta_j(t)

with loadings

    \sigma^{(Y)}_{i,j} = \langle Y_i, X_j \rangle = \sum_{k=1}^{m} \sum_{l=1}^{m} [W\rho]_{i,k} \langle \beta_k, \beta_l \rangle \psi_{j,l}

so that \sigma^{(Y)} = W \sigma^{(E)}. Now, just solve for W to find the IC basis weights \phi = W\rho.
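The two matrix identities above can be checked directly. The small random matrices stand in for real coefficient data, and the basis is assumed orthonormal (Gram matrix = identity) for brevity.

```python
import numpy as np

# Sketch of sigma^(E) = rho B psi^T and sigma^(Y) = W sigma^(E),
# with illustrative sizes: n curves, m basis functions, k retained components.
rng = np.random.default_rng(1)
n, m, k = 6, 4, 2

psi = rng.normal(size=(n, m))   # data basis coefficients
rho = rng.normal(size=(k, m))   # principal-curve basis coefficients
B = np.eye(m)                   # Gram matrix <beta_k, beta_l>, identity if orthonormal

sigma_E = rho @ B @ psi.T       # principal-component loadings, shape (k, n)

theta = 0.3                     # a 2-D rotation, as used by the pairwise optimizer
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

sigma_Y = W @ sigma_E           # independent-component loadings
phi = W @ rho                   # IC basis weights phi = W rho
```

Because W is orthogonal, rotating the loadings never changes the reconstruction of the data; it only redistributes variation among the retained components.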
Independence objective

The KL-divergence objective is

    I(Y) = \sum_{i=1}^{n} H(Y_i) - H(Y)

After FPCA, we have

    I(E) = \sum_{i=1}^{n} H(E_i) - H(E)

Since Y = WE with W orthogonal, H(Y) = H(E) is constant, yielding the minimum-marginal-entropy objective

    H^{*}(Y) = \sum_{i=1}^{n} H(Y_i)
Entropy estimator to evaluate H(Y_i)

The Vasicek entropy estimator is a nonparametric entropy estimator based on order statistics:

1. Order the samples in non-decreasing order: Z_{(1)} ≤ Z_{(2)} ≤ ... ≤ Z_{(N)}.
2. The m-spacing is Z_{(i+m)} − Z_{(i)}.
3. Estimate

    \hat{H}_N(Z_1, Z_2, \ldots, Z_N) = \frac{1}{N - m_N} \sum_{i=1}^{N - m_N} \log\left( \frac{N}{m_N} \left( Z_{(i+m_N)} - Z_{(i)} \right) \right)
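A minimal sketch of this m-spacing estimator follows. Normalization conventions vary slightly across references; this follows the form written above, with the common m_N ≈ √N spacing choice and a small floor on spacings to guard against ties, both of which are implementation assumptions.

```python
import numpy as np

# m-spacing (Vasicek-style) differential entropy estimator.
def mspacing_entropy(z, m=None):
    z = np.sort(np.asarray(z, dtype=float))   # step 1: order statistics
    n = len(z)
    if m is None:
        m = int(np.sqrt(n))                   # m_N ~ sqrt(N)
    spacings = z[m:] - z[:-m]                 # step 2: Z_(i+m) - Z_(i)
    spacings = np.maximum(spacings, 1e-12)    # guard against tied samples
    return np.mean(np.log((n / m) * spacings))

# Sanity check against a known value: H(N(0,1)) = 0.5 * log(2*pi*e) ≈ 1.4189.
rng = np.random.default_rng(2)
h_gauss = mspacing_entropy(rng.standard_normal(20000))
```

The estimator needs no density model, which is what makes it attractive as a plug-in criterion for ICA.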
Optimizer

- Plug-in ICA estimator: RADICAL [Learned-Miller and Fisher 2003].
- RADICAL uses the Vasicek entropy estimator.
- For ICA in D dimensions (D eigenfunctions), do pairwise separation. The 2-D rotation matrix is parameterized by a single angle θ:

    \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

- Since θ ∈ [−π/2, π/2], use brute force to optimize \hat{H}_N(\theta).
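The pairwise brute-force search can be sketched as follows: sweep θ over a grid, rotate the two components, and keep the angle minimizing the summed m-spacing entropies. The grid size, sample size, and Laplace test sources are illustrative choices, not settings from RADICAL.

```python
import numpy as np

def mspacing_entropy(z, m=None):
    """m-spacing entropy estimate of a 1-D sample."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    m = m or int(np.sqrt(n))
    sp = np.maximum(z[m:] - z[:-m], 1e-12)
    return np.mean(np.log((n / m) * sp))

def best_rotation(y, n_angles=150):
    """y: (2, N) pair of signals; returns theta minimizing summed marginal entropy."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_angles)
    def cost(th):
        c, s = np.cos(th), np.sin(th)
        ry = np.array([[c, -s], [s, c]]) @ y
        return mspacing_entropy(ry[0]) + mspacing_entropy(ry[1])
    return min(thetas, key=cost)

# Mix two independent Laplace sources by a known rotation, then recover it.
rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 4000))
true_theta = 0.4
R = np.array([[np.cos(true_theta), -np.sin(true_theta)],
              [np.sin(true_theta),  np.cos(true_theta)]])
theta_hat = best_rotation(R @ S)
```

The recovered angle is only defined up to multiples of π/2, reflecting ICA's permutation and sign ambiguity.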
Smoothing

- FPCA chooses the smoothing level α_p that minimizes leave-one-out cross-validation error.
- But α_p is not optimal for source recovery.
- FuncICA balances reconstruction error against avoiding Gaussian components.
L² smoothing (FPCA) [Ramsay and Silverman 2002]

Penalize the second derivative so that functions are not too wiggly:

    \int \xi(t)^2 \, dt + \alpha \int (D^2 \xi(t))^2 \, dt = 1, \quad \text{for } \alpha \geq 0

- This smooths the principal curves directly, and hence also the independent curves.
- Select the optimal-reconstruction-error α_p via leave-one-out cross-validation.
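The roughness term ∫(D²ξ)² dt can be approximated with finite differences, which makes the penalty's behavior concrete: high-frequency curves are penalized far more heavily. The grid and test curves here are illustrative.

```python
import numpy as np

# Finite-difference sketch of the roughness penalty ∫ (D^2 xi(t))^2 dt.
t = np.linspace(0, 1, 2001)
dt = t[1] - t[0]

def roughness(xi):
    d2 = np.diff(xi, 2) / dt**2    # second-difference approximation of D^2 xi
    return np.sum(d2**2) * dt      # quadrature of the squared curvature

smooth = np.sin(2 * np.pi * t)         # low-frequency curve
wiggly = np.sin(2 * np.pi * 20 * t)    # high-frequency curve

# Analytically, ∫ (D^2 sin(2*pi*k*t))^2 dt = (2*pi*k)^4 / 2 on [0, 1],
# so the k = 20 curve is penalized 20^4 = 160,000 times more heavily.
```

This quartic growth in frequency is why even a small α suppresses wiggly components.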
Inverse negentropy smoothing (FuncICA)

Motivation: penalize Gaussian components.

Why? ICA fails if there is more than one Gaussian component. With oversmoothing, components may become noisy, and non-Gaussian components may become Gaussian.

How? Use the inverse negentropy objective function:

    Q(Y) = \sum_{i=1}^{p} \frac{1}{J(Y_i)}

where J(Y_i) = H(\mathcal{N}(0,1)) - H(Y_i) is the negentropy of the unit-variance Y_i.
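A small sketch shows why Q penalizes near-Gaussian components: as any H(Y_i) approaches the Gaussian maximum, 1/J(Y_i) blows up. The entropy values below are closed-form illustrations (unit-variance Laplace and near-Gaussian), not estimates from data.

```python
import numpy as np

# Inverse-negentropy objective Q(Y) = sum_i 1 / J(Y_i),
# with J(Y_i) = H(N(0,1)) - H(Y_i) for unit-variance components.
H_GAUSS = 0.5 * np.log(2 * np.pi * np.e)   # entropy of N(0, 1), ≈ 1.4189

def inverse_negentropy(entropies):
    J = H_GAUSS - np.asarray(entropies)    # negentropy of each component
    return np.sum(1.0 / J)

H_LAPLACE = 1.0 + 0.5 * np.log(2.0)        # entropy of a unit-variance Laplace

q_sharp = inverse_negentropy([H_LAPLACE, H_LAPLACE])     # clearly non-Gaussian pair
q_flat = inverse_negentropy([H_GAUSS - 1e-3] * 2)        # nearly Gaussian pair
```

The nearly Gaussian pair yields a far larger Q, so minimizing Q over the smoothing level steers FuncICA away from Gaussian components.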
Optimal FuncICA inverse negentropy smoothing

Algorithm:

    1  α^(1) = α_p, τ = 0, Q^(0) = ∞
    2  repeat
    3      τ = τ + 1
    4      (Y, Q^(τ)) = FuncICA(X, α^(τ))
    5      α^(τ+1) = γ α^(τ)
    6  until Q^(τ) > Q^(τ−1)
    7  return α^(τ−1)

Intuition: α_p optimally smooths for reconstruction error. Can further smoothing be beneficial for source recovery? Yes! It can effectively dampen Gaussian noise components.
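The schedule above can be sketched in Python. The func_ica below is a hypothetical stand-in returning a synthetic Q curve with a single minimum; the starting α_p, the growth factor γ, and the minimizing α = 1e-7 are all illustrative, not the real FuncICA routine or its values.

```python
import numpy as np

# Stand-in for FuncICA(X, alpha): returns only the Q statistic, minimized
# (by construction) at alpha = 1e-7.
def func_ica(alpha):
    return (np.log10(alpha) + 7.0) ** 2 + 1.0

def optimal_alpha(alpha_p=1e-9, gamma=2.0, max_iter=100):
    """Geometrically increase alpha from alpha_p until Q starts rising."""
    alpha, q_prev = alpha_p, np.inf
    for _ in range(max_iter):
        q = func_ica(alpha)
        if q > q_prev:
            return alpha / gamma   # Q rose: return the previous alpha
        alpha, q_prev = alpha * gamma, q
    return alpha / gamma

alpha_star = optimal_alpha()
```

Because Q is evaluated on a geometric grid, the returned α lands within one γ-factor of the true minimizer.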
Synthetic data results

- Perfect source recovery for a mixture of Laplace-distributed harmonics.
- Successful isolation of a single high-frequency Gaussian source:
  - FuncICA performs well as α → 0
  - FPCA blends the high-frequency source into all recovered curves
- Dampening of two high-frequency Gaussian sources: the Q statistic performs well in recovering the Laplacian sources.

[Figure: recovered source curves f(t).]
Synthetic data results

Source curves and their distributions:

    S_1(t) = (1/2) sin(10πt)    Laplace
    S_2(t) = (1/2) cos(10πt)    Laplace
    S_3(t) = sin(40πt)          Gaussian
    S_4(t) = cos(40πt)          Gaussian

[Figure: effect of the smoothing level α on Q (left) and on recovery accuracy (right).]
Event-related potential discovery results

[Figure: accuracy of FuncICA, ICA, FPCA, and the empirical P300 for column, row, and letter prediction.]
Microarray gene expression results

- 6178 genes observed at 18 time points in 7-minute increments.
- Goal: identify co-regulated genes related to specific phases of the cell cycle (G1-phase regulated vs. non-G1-phase regulated).

    Genes       IC      PC
    Top 11      7.1%    9.2%
    Filtered    6.2%    8.2%
Conclusion

- Functional ICA offers a way to find smooth, independent modes of variation in time series and other continuous-natured data.
- An alternative to FPCA when the components of interest may not be Gaussian.
- Applicable to EEG, gene expression, finance, and other domains.
Questions?
Perhaps not functional PCA

Let's see what FPCA does for our ERP data.

- It looks like a Fourier basis.
- That makes sense: α rhythm, β rhythm, µ rhythm.

[Figure: leading FPCA curves f(t) for the ERP data.]
Let's see what Functional ICA (FuncICA) extracts

[Figure, three panels: the IC closest to the P300 ERP for α = 5×10⁻⁸ (top); the empirical P300 waveform, calculated from 2550 trials (middle); the IC closest to the P300 ERP for α = 1×10⁻⁷ (bottom).]
Event-related potential discovery results

[Figure: effect of the smoothing level α (over 10⁻⁸ to 10⁻⁶) on Q (left) and on accuracy (right) for the ERP data.]
Smoothing

Choices to make:

- Spline functional form? Cubic B-spline: computationally efficient, common in statistics.
- Number of knots? As many as we can use tractably.
- Number of principal curves to retain? Use a reconstruction-error threshold, OR the largest number that is tractable for FuncICA.