
1/17 Research Overview
Kristjan Greenewald
University of Michigan - Ann Arbor
February 2, 2016

2/17 Background and Motivation
- Want efficient statistical modeling of high-dimensional spatio-temporal data with complex correlation structures.
- Dataset limitations:
  - Training samples are often scarce relative to the number of variables.
  - Data distribution changing slowly over time in a non-stationary way.
- Use a mean-covariance model.
- Will consider the combination of several high-dimensional covariance regularization methods, including KronPCA.

3/17 Kronecker Covariance Estimation
- Data vector dimensionality p_t p_s: p_s^2 p_t^2 covariance parameters.
- No assumed structure: the sample covariance matrix (SCM) is known to be a poor estimator in sample-starved situations.
- A natural array arrangement (e.g. spatio-temporal) of the variables should imply exploitable structure.
- Kronecker product covariance [Werner et al 2008, Tsiligkaridis et al 2013, etc.]:

$$\Sigma = T \otimes S = \begin{bmatrix} t_{11} S & \cdots & t_{1T} S \\ \vdots & \ddots & \vdots \\ t_{T1} S & \cdots & t_{TT} S \end{bmatrix}$$

- Only p_s^2 + p_t^2 parameters: much lower estimation variance.
- Issue: the restrictive model gives significant bias in most space-time applications.
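To make the parameter-count argument concrete, here is a minimal NumPy sketch (the dimensions and the random factor matrices are purely illustrative) that assembles a Kronecker-structured covariance and compares its parameter count to the unstructured case.

```python
import numpy as np

p_t, p_s = 10, 50  # temporal and spatial dimensions (illustrative)
rng = np.random.default_rng(0)

# Random symmetric positive definite temporal and spatial factor matrices.
A = rng.standard_normal((p_t, p_t))
T = A @ A.T + p_t * np.eye(p_t)
B = rng.standard_normal((p_s, p_s))
S = B @ B.T + p_s * np.eye(p_s)

# Block (i, j) of Sigma is t_ij * S, exactly as in the displayed matrix above.
Sigma = np.kron(T, S)

print(Sigma.shape)                                   # (500, 500)
print("unstructured parameters:", (p_t * p_s) ** 2)  # 250000 = p_s^2 * p_t^2
print("Kronecker parameters:   ", p_t**2 + p_s**2)   # 2600   = p_t^2 + p_s^2
```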

4/17 Robust KronPCA
- A sum of Kronecker products allows lower bias, but often has conditioning issues.
- Solution: include a sparse correction:

$$\Sigma = \sum_{i=1}^{r} T_i \otimes S_i + \Gamma = \Theta + \Gamma,$$

where Γ is a sparse matrix.
- Robust KronPCA with sparse noise, analogous to PCA with sparse noise, e.g. [Yang et al 2013].
- Motivation:
  - Avoid degradation of the Kronecker basis estimate by a few high-magnitude outlier variables and/or correlations; sensor failure robustness.
  - Noise processes often have sparse correlations, which would be unlikely to share the same Kronecker basis as the signal.
- Identifiability is given via an incoherence assumption on Θ and Γ, e.g. [Chandrasekaran et al 2011].
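A small sketch of drawing a covariance from this model, with illustrative sizes, random SPD Kronecker factors, and a randomly placed sparse corruption; the construction is an assumption for demonstration, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(1)
p_t, p_s, r = 8, 12, 2          # illustrative sizes and separation rank
p = p_t * p_s

def random_spd(d):
    A = rng.standard_normal((d, d))
    return A @ A.T / d + np.eye(d)

# Low separation-rank part Theta = sum of r Kronecker products.
Theta = sum(np.kron(random_spd(p_t), random_spd(p_s)) for _ in range(r))

# Sparse symmetric corruption Gamma: a few high-magnitude outlier correlations.
Gamma = np.zeros((p, p))
support = rng.choice(p * p, size=20, replace=False)
Gamma.flat[support] = 5.0 * rng.standard_normal(20)
Gamma = (Gamma + Gamma.T) / 2

Sigma = Theta + Gamma           # covariance under the Robust KronPCA model
```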

5/17 Nuclear Norm-based Objective Function
- Nuclear and L1 norm penalization based approach: encourage sparsity of the singular values of the rearranged R(Θ) (implying low separation rank r) and of the elements of Γ.
- Objective function:

$$\min_{\Theta,\,\Gamma}\ \|\hat{\Sigma}_{\mathrm{SCM}} - \Theta - \Gamma\|_F^2 + \beta\,\|\mathcal{R}(\Theta)\|_* + \lambda\,\|\Gamma\|_1.$$
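The rearrangement operator R(·) stacks the vectorized p_s x p_s blocks of its argument as rows, so that R(T ⊗ S) = vec(T) vec(S)^T is rank one (up to vec ordering conventions) and the nuclear penalty on R(Θ) encourages a small separation rank r. Below is a hedged sketch of minimizing the objective by alternating exact proximal steps (singular value thresholding and elementwise soft thresholding) in the rearranged domain; this simple alternating scheme is an illustrative choice, not necessarily the paper's algorithm.

```python
import numpy as np

def rearrange(Sigma, p_t, p_s):
    """Van Loan-Pitsianis rearrangement: each p_s x p_s block becomes a row,
    so R(T kron S) = vec(T) vec(S)^T is rank one."""
    rows = []
    for i in range(p_t):
        for j in range(p_t):
            block = Sigma[i * p_s:(i + 1) * p_s, j * p_s:(j + 1) * p_s]
            rows.append(block.ravel())
    return np.array(rows)               # shape (p_t^2, p_s^2)

def svt(X, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Elementwise soft-thresholding: prox of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_kron_pca(Sigma_scm, p_t, p_s, beta, lam, n_iter=100):
    """Block coordinate descent on
       ||R - Th - G||_F^2 + beta ||Th||_* + lam ||G||_1
    in the rearranged domain."""
    R = rearrange(Sigma_scm, p_t, p_s)
    Th, G = np.zeros_like(R), np.zeros_like(R)
    for _ in range(n_iter):
        Th = svt(R - G, beta / 2)       # exact prox step in Theta
        G = soft(R - Th, lam / 2)       # exact prox step in Gamma
    return Th, G                        # rearranged estimates
```

Since R(·) merely permutes matrix entries, it preserves both the Frobenius and L1 norms, so the problem reduces to an ordinary Robust-PCA-style decomposition of R(Σ̂_SCM); the leading singular vectors of the low-rank part, reshaped to p_t x p_t and p_s x p_s, give estimated Kronecker factors.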

6/17 Theoretical Rates
- Able to derive a random matrix concentration bound and leverage general Robust PCA theorems.
- The rate derived for Robust KronPCA:

$$\|\hat{\Theta} - \Theta\|_F = O\!\left(\max\left\{ \sqrt{\frac{r\,(p_t^2 + p_s^2 + \log M)}{n}},\ \frac{r\,(p_t^2 + p_s^2 + \log M)}{n},\ \sqrt{\frac{s \log(p_t p_s)}{n}} \right\}\right)$$

- The rate for unstructured (SCM) covariance estimation:

$$\|\hat{\Sigma}_{\mathrm{SCM}} - \Sigma\|_F = O\!\left(\sqrt{\frac{p_s^2\, p_t^2}{n}}\right)$$

- Note the significant gains when r and s are both small.
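For a feel of the gap, a tiny numeric comparison of the two rates with constants ignored; the values of r and s, and the choice M = max(p_t, p_s, n), are assumptions for illustration (the slide does not define M).

```python
import numpy as np

p_t, p_s, r, s, n = 10, 50, 2, 20, 10_000
M = max(p_t, p_s, n)                    # assumed meaning of M

d = p_t**2 + p_s**2 + np.log(M)
kron_rate = max(np.sqrt(r * d / n),     # terms of the max in the bound above
                r * d / n,
                np.sqrt(s * np.log(p_t * p_s) / n))
scm_rate = np.sqrt(p_t**2 * p_s**2 / n)

print(f"Robust KronPCA rate ~ {kron_rate:.2f}, SCM rate ~ {scm_rate:.2f}")
# ~0.72 vs ~5.00: roughly an order of magnitude smaller error at the same n
```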

7/17 Simulation Results: Estimation MSE
Estimation MSE for corrupted Toeplitz (left) and non-Toeplitz (right) covariances. Results shown as a function of n, for values of λ_Θ and λ_Γ selected via cross-validation.

8/17 Dynamic Metric Tracking
- Metric learning: learning a metric ("inverse covariance") that best separates data classes.
- Big, complex data:
  - Unsupervised learning is not good enough, because there are many possible learning tasks.
  - Fully supervised learning requires too much analyst time.
- Traditional approach: hand-design a custom feature set / learning approach.
- Our goal: use a relatively small amount of targeted analyst feedback to allow unsupervised techniques to home in on the problem of interest.

9/17 Metric Learning
- Unsupervised techniques require a notion of closeness: learn the analyst's problem-specific internal metric.
  - Cluster similar points together.
- Analyst metric: non-Euclidean, potentially a complex Riemannian metric. E.g. imagery: small changes in appearance can cause large L2 changes, and vice versa.
- Feature relevance: metrics that project the data onto a submanifold prevent irrelevant features from confusing the learner.
  - Example: grouping objects by shape vs. grouping by color, i.e. different ways to cluster the same data.

10/17 Metric Drift
- The real world is often dynamic:
  - Social media, news: changing discussion, changing events, behavior, etc.
  - Security: changing attacks, changing technology, changing human behavior.
  - And more...
- What causes metric drift? Changes in:
  - the problem of interest / feature relevance,
  - the analyst's internal metric,
  - the underlying distribution/classes (which changes the optimal metric).
- Potentially rapid changes: need to exploit previous information without being enslaved to it.
- For simplicity, approximate the metric with the Mahalanobis distance, analogous to the inverse covariance:

$$d_M(x, z) = (x - z)^T M (x - z)$$
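A minimal sketch of this Mahalanobis-style distance; the diagonal M in the toy example, which down-weights an irrelevant feature, is purely illustrative.

```python
import numpy as np

def d_M(x, z, M):
    """Mahalanobis-style distance d_M(x, z) = (x - z)^T M (x - z) for a
    symmetric PSD matrix M; M = identity recovers squared Euclidean distance."""
    u = x - z
    return u @ M @ u

# Toy example: M down-weights an irrelevant second feature.
M = np.diag([1.0, 0.01])
x, z = np.array([1.0, 5.0]), np.array([2.0, -5.0])
print(d_M(x, z, M))   # 1.0 * 1 + 0.01 * 100 = 2.0
```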

11/17 Learning Applications
Goals:
- Track the relevant metric in the presence of label noise.
- Find an embedding in which the data clusters are most separated, enabling better interpretation and/or feedback.
Applications:
- Improving k-NN classification performance.
- Partially supervised clustering: estimates a notion of similarity, which is fundamental to clustering.
- Anomaly detection: distance to the non-anomalous distribution.
[Figures: k-NN, clustering, and anomaly detection illustrations; top two images from wikipedia.org.]

12/17 Objective Function: Constraints
- The analyst provides a sequence of triplets (x_t, z_t, y_t):
  - (x_t, z_t): a pair of instances in R^n.
  - y_t: label (similar = +1, dissimilar = -1).
- Drift occurs as the sequence is provided by the analyst.

$$Q(\{M_t\}) = \sum_{t=1}^{T} \left[\, \ell_t(M_t, \mu) + \rho\, r(M_t) \,\right],$$

$$\ell_t(M, \mu) = \ell(m_t), \qquad m_t = y_t \left( \mu - (x_t - z_t)^T M (x_t - z_t) \right).$$
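In code, the margin m_t and a loss might look as follows; the hinge loss is one common choice and an assumption here, since the slide leaves the scalar loss generic.

```python
import numpy as np

def margin(M, mu, x, z, y):
    """m_t = y * (mu - (x - z)^T M (x - z)); positive when the pair lies on
    the correct side of the threshold mu for its label y in {+1, -1}."""
    u = x - z
    return y * (mu - u @ M @ u)

def hinge(m):
    """One common choice for the scalar loss (an assumption here)."""
    return max(0.0, 1.0 - m)
```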

13/17 Online DML
- Efficient implementation of Composite Objective Mirror Descent (COMID) for online static DML [Kunapuli 2012].
- Online updates (B_ψ a Bregman divergence, η_t the learning rate):

$$M_{t+1} = \arg\min_{M \succeq 0}\ B_\psi(M, M_t) + \eta_t \left\langle \nabla_M \ell_t(M_t, \mu_t),\, M - M_t \right\rangle + \eta_t \rho\, r(M)$$

$$\mu_{t+1} = \arg\min_{\mu \geq 1}\ B_\psi(\mu, \mu_t) + \eta_t \nabla_\mu \ell_t(M_t, \mu_t)^T (\mu - \mu_t)$$

- Provably sublinear (O(√T)) regret for learning rate η_t = η_0/√T.
- Problem: in a true online learning scenario, the drift rate may change without warning. Cannot optimize η_0 a priori.
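A hedged sketch of a single COMID step under concrete choices: B_ψ taken as the squared Frobenius distance (which reduces the update to a proximal gradient step), a hinge loss, and r(M) = trace(M) as the low-rank-encouraging regularizer. All three choices are illustrative assumptions, not necessarily the paper's; the prox of the trace penalty over the PSD cone shrinks eigenvalues toward zero.

```python
import numpy as np

def comid_step(M, mu, x, z, y, eta, rho):
    """One COMID update with squared-Frobenius B_psi, hinge loss, and
    r(M) = trace(M); all illustrative assumptions."""
    u = x - z
    m = y * (mu - u @ M @ u)
    if m < 1.0:                          # hinge active: subgradient y * u u^T
        M = M - eta * y * np.outer(u, u)
        mu = max(1.0, mu + eta * y)      # gradient step on mu, kept >= 1
    # Prox of eta*rho*trace(.) over the PSD cone: shrink eigenvalues, clip at 0.
    w, V = np.linalg.eigh((M + M.T) / 2)
    M = (V * np.maximum(w - eta * rho, 0.0)) @ V.T
    return M, mu
```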

14/17 Strongly Adaptive Learning
- Combine COMID learners with low static regret on intervals of different scales; pick the best performers.
- Each learner is a mirror descent learner with fixed η ∝ 1/√T_i.
- Each learner (besides the Scale 1 learner) is initialized to the current estimate and weight of the next-shortest scale.
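A rough sketch of the combination scheme: maintain one fixed-rate learner per geometric scale T_k = 2^k and weight them multiplicatively by their observed losses. The learner interface (predict/update) and the weighting details are assumptions, and the slide's initialization of each new scale from the next-shortest one is not reproduced here.

```python
import numpy as np

class StronglyAdaptive:
    """Sketch: combine fixed-learning-rate learners, one per geometric scale
    T_k = 2^k (each using eta ~ 1/sqrt(T_k)), via multiplicative weights."""

    def __init__(self, make_learner, n_scales, temperature=1.0):
        self.scales = [2 ** k for k in range(n_scales)]
        self.learners = [make_learner(T_k) for T_k in self.scales]
        self.log_w = np.zeros(n_scales)       # log weights, start uniform
        self.temperature = temperature

    def weights(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def predict(self, x):
        # Weighted average of the per-scale predictions.
        return sum(w * L.predict(x) for w, L in zip(self.weights(), self.learners))

    def update(self, x, loss_of):
        # Exponentially downweight learners by their incurred loss, then let
        # every learner take its own online step on the new constraint.
        for k, L in enumerate(self.learners):
            self.log_w[k] -= self.temperature * loss_of(L.predict(x))
            L.update(x)
```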

15/17 Results: Shift to Confuser Classes
[Figures: mean k-NN error rate and K-means NMI versus time (constraints), for the no-drift and drift scenarios, comparing the Nonadaptive, Adaptive, Batch, Windowed Batch, and Online ITML methods.]

16/17 Conclusions
- Kronecker methods significantly reduce the number of training samples required to perform high-dimensional covariance estimation for matrix-valued data.
- Other work:
  - Block Toeplitz KronPCA.
  - Application to detection of moving targets in synthetic aperture radar.
  - Time-varying Kronecker sum model incorporating sparsity in the inverse.
- Introduced dynamic metric tracking: a strongly adaptive online method to find useful low-dimensional embeddings of changing and/or ambiguous datasets.

17/17 Publications
[1] K. Greenewald, T. Tsiligkaridis, and A. Hero, "Kronecker sum decompositions of space-time data," in 2013 IEEE 5th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec. 2013.
[2] K. Greenewald and A. Hero, "Robust Kronecker product PCA for spatio-temporal covariance estimation," IEEE Transactions on Signal Processing, vol. 63, no. 23, Dec. 2015.
[3] K. Greenewald and A. O. Hero III, "Kronecker PCA based robust SAR STAP," arXiv preprint.
[4] K. Greenewald and A. Hero, "Kronecker PCA based spatio-temporal modeling of video for dismount classification," in Proceedings of SPIE.
[5] K. Greenewald and A. Hero, "Regularized block Toeplitz covariance matrix estimation via Kronecker product expansions," in Proceedings of IEEE SSP.

Accepted papers:
- K. Greenewald, E. Zelnio, and A. Hero, SPIE Defense + Security.

Under revision:
- K. Greenewald, E. Zelnio, and A. Hero, "Kronecker PCA Based Robust SAR STAP," IEEE Transactions on Aerospace and Electronic Systems.

In preparation:
- K. Greenewald, S. Kelley, and A. Hero, "Dynamic Metric Learning."
- K. Greenewald, S. Park, S. Zhou, and A. Giessing, "Time Varying Matrix Variate Graphical Models."
- K. Greenewald, S. Zhou, and A. Hero, "Multigraphical Lasso."
