FuncICA for time series pattern discovery

Nishant Mehta and Alexander Gray, Georgia Institute of Technology

The problem
Given a set of inherently continuous time series (e.g. EEG), find a set of patterns that vary independently over the data.
Existing solutions:
1. Standard independent component analysis (ICA)
2. Functional principal component analysis (FPCA)

High level
[Figure: pipeline from data to basis functions to principal curves to independent curves]

High level
FuncICA: functional independent component analysis, i.e. ICA for time series data.
1. Express the data using a set of basis functions.
2. Functional PCA: find the k curves of maximum variation across the data, subject to a smoothness constraint.
3. Rotate the functional principal components to maximize independence, yielding independent components Y.
4. Optimize an independence objective Q(Y) with respect to a smoothness regularization term.

Towards automating science
ICA is used for pattern discovery in natural domains like neuroscience, genetics, and astrophysics, where the patterns are often smooth. Many domains created by humans, including financial markets and chemical processes, also involve smooth variation. FuncICA offers a way to find optimally smoothed patterns in these domains:
- Automatic identification of event-related potentials in EEG
- Automatic discovery of gene signaling mechanisms from microarray gene expression data
- Identifying spatiotemporal activation patterns in fMRI

Independent component analysis
Observe multiple signals $X(t)$ over time; each univariate signal $X_i(t)$ is a mixture of independent sources $S(t)$.
Linear instantaneous mixing model: $X(t) = A\,S(t)$
Find the unmixing transformation that maximizes statistical independence of the recovered sources.
[Figure: example independent source signals and their observed mixtures]
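As a concrete illustration of the mixing model (not from the slides), here is a minimal NumPy sketch that generates non-Gaussian (Laplace) sources and mixes them with a random matrix A; an ICA algorithm would then seek W ≈ A⁻¹ up to scaling and permutation. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 500, 2                      # number of time samples, number of sources
S = rng.laplace(size=(n, T))       # non-Gaussian independent sources (rows)
A = rng.normal(size=(n, n))        # unknown square mixing matrix
X = A @ S                          # observed mixtures: X(t) = A S(t)

# An ICA algorithm seeks W such that Y = W X recovers S
# up to scaling and permutation; with the oracle inverse:
W_true = np.linalg.inv(A)
Y = W_true @ X
print(np.allclose(Y, S))           # should print True: exact recovery with the oracle W
```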

Independent component analysis
Find an unmixing matrix $W$ such that $Y = WX = WAS$, so that $Y \approx S$ up to a scaling and permutation of the sources.
Maximizing independence = minimizing the difference between the joint distribution $P[Y]$ and the product of marginals $\prod_i P[Y_i]$.
Use the Kullback-Leibler divergence:
$$I(Y) = D_{\mathrm{KL}}\!\left(P[Y] \,\Big\|\, \prod_{i=1}^{n} P[Y_i]\right) = \sum_{i=1}^{n} H(Y_i) - H(Y)$$

ICA duality
[Figure: primal vs. dual view. Primal: IC 1 varies in loading over samples of x (axes $x_1$, $x_2$). Dual: IC 1's loading varies over the samples (axes sample 1, sample 2).]

Why functional data?
- Higher sampling rate means higher dimensional data, with no principled way to reduce dimensionality (subsampling?).
- No principled way to handle asynchronous observations: missing data from occasionally offline sensors means each observation lives in a different space.
- Alternative? Generative (parametric) models such as HMMs or dynamic Bayesian networks, OR a functional representation.

Functional data
Set of $n$ curves $X = \{X_1(t), \ldots, X_n(t)\}$, $X_i \in \mathcal{X}$.
Set of $m$ basis functions $\beta = \{\beta_1, \ldots, \beta_m\}$.
Decompose the data as $X_i(t) = \sum_{j=1}^{m} \psi_{i,j}\, \beta_j(t)$.
$\mathcal{X} \subset L^2$ (a Hilbert space), with inner product $\langle f, g \rangle = \int f(t)\, g(t)\, dt$.
[Figure: example curve represented in a basis]
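A minimal sketch (not from the slides) of this basis expansion using cubic B-splines via SciPy; the knot placement and the synthetic curve data are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(0)

t_obs = np.linspace(0.0, 1.0, 200)                                # sampling grid
X = np.sin(2 * np.pi * t_obs) + 0.1 * rng.normal(size=(5, 200))   # 5 noisy curves

k = 3                                                # cubic B-splines
interior = np.linspace(0.1, 0.9, 9)                  # interior knots (assumed)
knots = np.r_[(t_obs[0],) * (k + 1), interior, (t_obs[-1],) * (k + 1)]

# psi[i, j] = coefficient of basis function beta_j for curve X_i
psi = np.vstack([make_lsq_spline(t_obs, xi, knots, k).c for xi in X])
print(psi.shape)   # (5, len(knots) - k - 1) = (5, 13)
```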

Functional PCA
Apply functional PCA to get principal curves $E_i(t) = \sum_{j=1}^{m} \rho_{i,j}\, \beta_j(t)$.
Principal component scores (curve loadings) over the data:
$$\sigma^{(E)}_{i,j} = \langle E_i, X_j \rangle = \sum_{k=1}^{m} \sum_{l=1}^{m} \rho_{i,k}\, \langle \beta_k, \beta_l \rangle\, \psi_{j,l}, \qquad \sigma^{(E)} = \rho\, B\, \psi^T$$
where $B_{k,l} = \langle \beta_k, \beta_l \rangle$ is the Gram matrix of the basis.
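A minimal NumPy sketch of how the principal-curve coefficients $\rho$ and the scores $\sigma^{(E)} = \rho B \psi^T$ could be computed, following the standard basis-coefficient reduction of FPCA (Ramsay and Silverman). The function name is hypothetical and this is the unsmoothed formulation; the slides use a smoothness-penalized version.

```python
import numpy as np

def functional_pca(psi, B, k):
    """Sketch of unsmoothed functional PCA in basis-coefficient space.

    psi : (n, m) basis coefficients of the n centered curves
    B   : (m, m) Gram matrix B[k, l] = <beta_k, beta_l>
    k   : number of principal curves to retain
    Returns rho (k, m), the coefficients of the principal curves, and the scores.
    """
    n = psi.shape[0]
    # Symmetric square root of the Gram matrix
    evals, evecs = np.linalg.eigh(B)
    B_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    B_half_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

    # Eigenproblem (1/n) B^{1/2} psi^T psi B^{1/2} u = lambda u, with rho = B^{-1/2} u
    M = (B_half @ psi.T @ psi @ B_half) / n
    lam, U = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1][:k]
    rho = (B_half_inv @ U[:, order]).T          # (k, m) principal-curve coefficients

    scores = rho @ B @ psi.T                    # sigma^(E) = rho B psi^T
    return rho, scores
```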

Functional ICA
The independent curves are a rotation $W$ of the principal curves:
$$Y_i(t) = \sum_{j=1}^{m} [W\rho]_{i,j}\, \beta_j(t)$$
Independent component scores:
$$\sigma^{(Y)}_{i,j} = \langle Y_i, X_j \rangle = \sum_{k=1}^{m} \sum_{l=1}^{m} [W\rho]_{i,k}\, \langle \beta_k, \beta_l \rangle\, \psi_{j,l}, \qquad \sigma^{(Y)} = W \sigma^{(E)}$$
Now, just solve for $W$ to find the IC basis weights $\phi = W\rho$.
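A small sketch (my own illustration, with placeholder random inputs) tying this to the FPCA sketch above: given rho and scores and an orthogonal rotation W assembled from the pairwise 2-D rotations described later, the IC basis weights and loadings are simple matrix products.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = rng.normal(size=(3, 13))                    # stand-in for functional_pca output
scores = rng.normal(size=(3, 5))                  # stand-in for sigma^(E)
W = np.linalg.qr(rng.normal(size=(3, 3)))[0]      # stand-in orthogonal rotation

phi = W @ rho         # phi = W rho: basis coefficients of the independent curves Y_i(t)
sigma_Y = W @ scores  # sigma^(Y) = W sigma^(E): independent component loadings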

Independence objective
KL-divergence objective after FPCA:
$$I(Y) = \sum_{i=1}^{n} H(Y_i) - H(Y), \qquad I(E) = \sum_{i=1}^{n} H(E_i) - H(E)$$
Since $Y = WE$ with $W$ orthogonal, $H(Y) = H(E)$ is constant, so minimizing $I(Y)$ reduces to the minimum marginal entropy objective $\sum_{i=1}^{n} H(Y_i)$.

Entropy estimator to evaluate $H(Y_i)$
Vasicek entropy estimator: a nonparametric entropy estimator based on order statistics.
1. Order the samples in non-decreasing order: $Z_{(1)} \le Z_{(2)} \le \ldots \le Z_{(N)}$
2. The $m$-spacing is $Z_{(i+m)} - Z_{(i)}$
3. $$\hat{H}_N(Z_1, Z_2, \ldots, Z_N) = \frac{1}{N - m_N} \sum_{i=1}^{N - m_N} \log\left(\frac{N}{m_N}\left(Z_{(i+m_N)} - Z_{(i)}\right)\right)$$
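A minimal NumPy sketch (not from the slides) of an m-spacing entropy estimator of this form; the exact normalization on the slide may differ slightly, and the choice m = √N is an assumption.

```python
import numpy as np

def vasicek_entropy(z, m=None):
    """m-spacing (Vasicek-style) estimate of the differential entropy of samples z."""
    z = np.sort(np.asarray(z, dtype=float))
    N = len(z)
    if m is None:
        m = max(1, int(np.sqrt(N)))         # common choice: m_N ~ sqrt(N)
    spacings = z[m:] - z[:-m]               # Z_(i+m) - Z_(i), for i = 1, ..., N - m
    spacings = np.maximum(spacings, 1e-12)  # guard against zero spacings
    return np.mean(np.log(N / m * spacings))

# Sanity check: entropy of a standard normal is 0.5*log(2*pi*e) ≈ 1.419
rng = np.random.default_rng(0)
print(vasicek_entropy(rng.normal(size=5000)))
```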

Optimizer
Plug-in ICA estimator: RADICAL [Learned-Miller and Fisher 2003], which uses the Vasicek entropy estimator.
For ICA in $D$ dimensions ($D$ eigenfunctions), do pairwise separation. A 2-D rotation matrix is parameterized by a single angle $\theta$:
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
Since $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$, use brute force to optimize $\hat{H}_N(\theta)$.
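A minimal sketch of the brute-force search over θ for a single 2-D pair, reusing the vasicek_entropy helper sketched above; the grid size is an assumption, and full RADICAL additionally augments the data with jittered copies, which is omitted here.

```python
import numpy as np

def best_rotation_2d(e_pair, n_angles=150):
    """Brute-force search for the 2-D rotation minimizing the sum of marginal entropies.

    e_pair : (2, N) array holding one pair of component scores.
    Returns the best angle theta and the rotated pair.
    """
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_angles)
    best_theta, best_obj = 0.0, np.inf
    for theta in thetas:
        c, s = np.cos(theta), np.sin(theta)
        Y = np.array([[c, -s], [s, c]]) @ e_pair
        obj = vasicek_entropy(Y[0]) + vasicek_entropy(Y[1])  # sum of marginal entropies
        if obj < best_obj:
            best_theta, best_obj = theta, obj
    c, s = np.cos(best_theta), np.sin(best_theta)
    return best_theta, np.array([[c, -s], [s, c]]) @ e_pair
```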

Smoothing
FPCA chooses the smoothing level $\alpha_p$ that minimizes leave-one-out cross-validation error, but $\alpha_p$ is not optimal for source recovery.
FuncICA balances reconstruction error against avoiding Gaussian components.

$L^2$ smoothing (FPCA) [Ramsay and Silverman 2002]
Penalize the second derivative so that functions are not too wiggly:
$$\int \xi(t)^2\, dt + \alpha \int \left(D^2 \xi(t)\right)^2 dt = 1, \quad \text{for } \alpha \ge 0$$
This smooths the principal curves directly, and hence also smooths the independent curves.
Select the optimal reconstruction-error $\alpha_p$ via leave-one-out cross-validation.

Inverse negentropy smoothing (FuncICA)
Motivation: penalize Gaussian components. Why? ICA fails if there is more than one Gaussian component, and with oversmoothing, noisy non-Gaussian components may become Gaussian.
How? Inverse negentropy objective function:
$$Q(Y) = \sum_{i=1}^{p} \frac{1}{J(Y_i)}, \qquad J(Y_i) = H(\mathcal{N}(0, 1)) - H(Y_i)$$
where $J(Y_i)$ is the negentropy of the unit-variance $Y_i$.
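A minimal sketch (assuming the vasicek_entropy helper above and component rows standardized to unit variance) of how this inverse negentropy objective could be evaluated; the function name is illustrative.

```python
import numpy as np

GAUSSIAN_ENTROPY = 0.5 * np.log(2 * np.pi * np.e)   # H(N(0, 1))

def inverse_negentropy_objective(Y):
    """Q(Y) = sum_i 1 / J(Y_i) for component rows Y_i."""
    Q = 0.0
    for y in Y:
        y = (y - y.mean()) / y.std()                 # enforce zero mean, unit variance
        J = GAUSSIAN_ENTROPY - vasicek_entropy(y)    # estimated negentropy J(Y_i)
        Q += 1.0 / max(J, 1e-8)                      # guard against near-Gaussian components
    return Q
```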

Optimal FuncICA inverse negentropy smoothing
Algorithm:
1. $\alpha^{(1)} = \alpha_p$, $\tau = 0$, $Q^{(0)} = \infty$
2. repeat
3. &nbsp;&nbsp; $\tau = \tau + 1$
4. &nbsp;&nbsp; $(Y, Q^{(\tau)}) = \mathrm{FuncICA}(X, \alpha^{(\tau)})$
5. &nbsp;&nbsp; $\alpha^{(\tau+1)} = \gamma\, \alpha^{(\tau)}$
6. until $Q^{(\tau)} > Q^{(\tau-1)}$
7. return $\alpha^{(\tau-1)}$
Intuition: $\alpha_p$ optimally smooths for reconstruction error. Can further smoothing be beneficial for source recovery? Yes! It can effectively dampen Gaussian noise components.
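A sketch of this annealing loop in Python; the funcica callable, its return values, and the growth factor gamma = 2 are stand-ins for whatever the authors actually use.

```python
import numpy as np

def select_smoothing(X, alpha_p, funcica, gamma=2.0, max_iter=20):
    """Increase the smoothing level geometrically until the inverse
    negentropy objective Q stops improving, then back off one step."""
    alpha, prev_alpha, prev_Q = alpha_p, alpha_p, np.inf
    for _ in range(max_iter):
        components, Q = funcica(X, alpha)     # assumed to return (curves, Q(Y))
        if Q > prev_Q:                        # objective got worse: stop
            return prev_alpha
        prev_alpha, prev_Q = alpha, Q
        alpha = gamma * alpha                 # smooth more aggressively
    return prev_alpha
```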

Synthetic data results
- Perfect source recovery for a mixture of Laplace-distributed harmonics. [Figure: recovered curves f(t) on t in [0, 1]]
- Successful isolation of a single high-frequency Gaussian source: FuncICA performs well for $\alpha \to 0$, while FPCA blends the high-frequency source into all recovered curves.
- Dampening of two high-frequency Gaussian sources: the Q statistic performs well in recovering the Laplacian sources.

Synthetic data results
Source curves and distributions: $S_1(t) = \tfrac{1}{2}\sin(10\pi t)$ (Laplace), $S_2(t) = \tfrac{1}{2}\cos(10\pi t)$ (Laplace), $S_3(t) = \sin(40\pi t)$ (Gaussian), $S_4(t) = \cos(40\pi t)$ (Gaussian).
[Figure: effect of smoothing on Q and on accuracy as functions of α (10⁻¹⁰ to 10⁻⁸)]

Event-related potential discovery results
[Figure: accuracy of FuncICA, ICA, FPCA, and the empirical P300 template on column, row, and letter prediction]

Microarray gene expression results
6178 genes observed at 18 time points in 7-minute increments.
Goal: identify co-regulated genes related to specific phases of the cell cycle ($G_1$-phase regulated vs. non-$G_1$-phase regulated).
           IC      PC
Top 11     7.1%    9.2%
Filtered   6.2%    8.2%

Conclusion
Functional ICA offers a way to find smooth, independent modes of variation in time series and other continuous-natured data.
It is an alternative to FPCA when the components of interest may not be Gaussian.
Applicable to EEG, gene expression, finance, and other domains.

Questions?

Perhaps not functional PCA
Let's see what FPCA does for our ERP data.
[Figure: FPCA principal curves f(t) on t in [0, 1]]
Looks like a Fourier basis. Makes sense: α rhythm, β rhythm, µ rhythm.

Let's see what Functional ICA (FuncICA) extracts
[Figure, three panels: the IC closest to the P300 ERP for α = 5×10⁻⁸; the empirical P300 waveform, calculated from 2550 trials; and the IC closest to the P300 ERP for α = 1×10⁻⁷]

Event-related potential discovery results
[Figure: effect of smoothing on Q and on accuracy as functions of α (10⁻⁸ to 10⁻⁶) for the ERP data]

Smoothing
Choices to make:
- Spline functional form? Cubic B-splines: computationally efficient and common in statistics.
- Number of knots? As many as we can use tractably.
- Number of principal curves to retain? Use a reconstruction error threshold, OR the largest number that is tractable for FuncICA.