
Detection of astrophysical sources in the data of the MUSE integral field spectrograph
David Mary, André Ferrari and Raja Fazliza R.S.
Université de Nice Sophia Antipolis, France
Porquerolles, May 15, 2014

Introduction
Hyperspectral imaging with MUSE: detection of galaxies and emission lines
The spectrograph MUSE (Multi Unit Spectroscopic Explorer) delivers data cubes of 300 × 300 pixels at 3600 wavelength channels. Lyman-α emitters are very distant and faint; their spectra contain essentially one emission line, whose profile varies with the considered object.
[Figure: a MUSE data cube (image credit: ESO) and the library S_100 of 100 emission-line profiles, flux (arbitrary units) vs. wavelength (nm): the 100 possible alternatives under $H_1$.]
Goal: detect one target signature $s_i \in \mathbb{R}^{N \times 1}$ belonging to a large known library $S = [s_1, \dots, s_L]$ ($S \in \mathbb{R}^{N \times L}$, with L large). The amplitude and the alternative $s_l$ activated under $H_1$ are unknown.

Introduction
Strategy for dimension reduction: average vs. minimax performance
The standard approach with a very large library is to test in subspaces of reduced dimension (e.g. sample mean, SVD, K-SVD, etc.).
+ Less computationally complex, with similar average performance.
− Might induce a drop of performance for some targets.
Proposed approach: learn a reduced-size library with the objective of minimizing the maximum power loss.

Exact model, associated test and loss of performance
Exact model and associated GLRT
$H_0:\ x = n,\quad n \sim \mathcal{N}(0, I)$
$H_1:\ x = S\alpha + n,\quad \|\alpha\|_0 = 1$
where $n \in \mathbb{R}^{N \times 1}$ is Gaussian noise (the covariance matrix is known and equal under $H_0$ and $H_1$), $S = [s_1, s_2, \dots, s_L] \in \mathbb{R}^{N \times L}$ with $\|s_i\|_2^2 = 1$ for all $i$, and $\alpha$ is an unknown vector.
GLR test using S under the constraint $\|\alpha\|_0 = 1$:
$\max_{\alpha:\, \|\alpha\|_0 = 1} \dfrac{p(x \mid S\alpha)}{p(x \mid 0)} \;\gtrless_{H_0}^{H_1}\; \gamma$
Corresponding GLRT, the Max test over S:
$T_{\mathrm{Max}}:\ \max_{i=1,\dots,L} |s_i^\top x| \;\gtrless_{H_0}^{H_1}\; \gamma.$
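A minimal sketch of this Max test in NumPy (the synthetic library, amplitude and threshold are illustrative assumptions, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 100, 100

# Toy library: L unit-norm candidate signatures as columns of S.
S = np.abs(rng.normal(size=(N, L)))
S /= np.linalg.norm(S, axis=0)

def max_test(x, S, gamma):
    """GLRT for the one-sparse alternative: compare the largest
    matched-filter output max_i |s_i^T x| to the threshold gamma."""
    T = np.max(np.abs(S.T @ x))
    return T > gamma, T

# Data generated under H1 with signature s_7 active (amplitude 5).
x = 5.0 * S[:, 7] + rng.normal(size=N)
decision, T = max_test(x, S, gamma=4.0)
print(decision, T)
```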

Exact model, associated test and loss of performance
Issues when testing with a reduced-size library
Assuming in all cases that $s_l$ is the active alternative under $H_1$:
Oracle Neyman-Pearson detector (O. NPD): $x^\top s_l \gtrless_{H_0}^{H_1} \gamma$.
A reduced one-dimension GLR test (R.D. test): $x^\top v \gtrless_{H_0}^{H_1} \gamma$ (e.g. $v$: SVD of S).
[Figure: ROC curves ($P_{Det}$ vs. $P_{FA}$) of the oracle NPD, the Max test over $S_{10^5}$ and the R.D. test, in two cases: 1) $s_l$ has a shape similar to $v$ ($s_l^\top v = 0.93$); 2) $s_l$ has a shape dissimilar to $v$ ($s_l^\top v = 0.50$).]
The R.D. test is advantageous w.r.t. the Max test iff $s_l$ is well correlated with the learned subspace; hence the possibility to devise robust (minimax) R.D. tests.
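The gap between the oracle and a one-dimensional reduced test can be checked with a small Monte Carlo sketch (toy data and amplitude are my assumptions; both statistics are N(0, 1) under $H_0$, so a single threshold serves both):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, L, A, n_trials = 100, 50, 3.0, 20000

# Toy nonnegative library and its rank-one SVD summary v.
S = np.abs(rng.normal(size=(N, L)))
S /= np.linalg.norm(S, axis=0)
v = np.linalg.svd(S, full_matrices=False)[0][:, 0]
v *= np.sign(v.sum())  # fix the sign ambiguity of the singular vector

gamma = norm.ppf(1 - 0.01)          # threshold for P_FA = 0.01
s_l = S[:, 0]                       # active alternative under H1
x = A * s_l[:, None] + rng.normal(size=(N, n_trials))

p_det_oracle = np.mean(s_l @ x > gamma)   # oracle NPD: knows s_l
p_det_rd = np.mean(v @ x > gamma)         # rank-one R.D. test
print(p_det_oracle, p_det_rd)
```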

The reduced dimension model and associated test
Model of reduced dimension and corresponding GLRT
Proposed R.D. model:
$H_0:\ x = n,\quad n \sim \mathcal{N}(0, I)$
$H_1:\ x = D\beta + n,\quad \|\beta\|_0 = 1$
where $D \in \mathbb{R}^{N \times K}$ with $\|d_j\|_2^2 = 1$ is a low-dimensional ($K \ll L$) dictionary to be optimized.
Corresponding GLRT, the Max test over D:
$T_D(x) = \max_{j=1,\dots,K} |d_j^\top x| \;\gtrless_{H_0}^{H_1}\; \xi.$
Aim: optimize D to maximize the worst-case detection performance.

Reduced dimension dictionary estimation
Minimax optimization criterion
Optimal minimax dictionary, exact criterion:
$D^\star = \arg\max_D \min_{i=1,\dots,L} P\!\left( \max_{j=1,\dots,K} |d_j^\top x| > \xi \,\middle|\, H_1, s_i \right)$
subject to $P\!\left( \max_{j=1,\dots,K} |d_j^\top x| > \xi \,\middle|\, H_0 \right) \le P_{FA_0}$ and $\|d_j\|_2 = 1$, $j = 1, \dots, K$.
Solving for $D^\star$ ($K > 1$) involves the distributions of the maximum of the correlated variables $d_j^\top x$, $j = 1, \dots, K$, which is an extremely intricate task.
Exact criterion for $K = 1$:
$d^\star = \arg\max_{d:\, \|d\|_2 = 1} \min_{i=1,\dots,L} d^\top s_i$

Reduced dimension dictionary estimation
Illustration of the minimax atom $d^\star$ (K = 1)
Proposition: assuming that $S \in \mathbb{R}_+^{N \times L}$, then $d^\star \in \mathbb{R}_+^{N}$ is the solution of a QP problem.
[Figure: the alternatives $s_i$, the SVD atom $v$ and the minimax atom $d^\star$; the worst-case alternatives w.r.t. $d^\star$ lie on the border of the smallest enclosing circle C.]
In this example, $d^\star$ is held by three marginal alternatives (at the border of the smallest enclosing circle C). Those isolated $s_i$ induce the worst $P_{Det}$. The SVD atom $v$ represents well the most populated area of the alternatives. Representing S by a single atom may be insufficient w.r.t. the intrinsic diversity of S.
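By minimum-norm duality, the K = 1 minimax atom can be obtained as the normalized minimum-norm point of the convex hull of the $s_i$, which is the QP alluded to above. A sketch with SciPy (solver choice and toy data are my assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def minimax_atom(S):
    """Solve min_w ||S w||^2 s.t. w >= 0, sum(w) = 1, then normalize:
    d = S w / ||S w|| attains max_{||d||=1} min_i d^T s_i
    for a nonnegative library S."""
    L = S.shape[1]
    res = minimize(lambda w: np.sum((S @ w) ** 2),
                   np.full(L, 1.0 / L),
                   method="SLSQP",
                   bounds=[(0.0, None)] * L,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    d = S @ res.x
    return d / np.linalg.norm(d)

# Toy check: the worst-case correlation achieved by the minimax atom.
rng = np.random.default_rng(2)
S = np.abs(rng.normal(size=(50, 200)))
S /= np.linalg.norm(S, axis=0)
d = minimax_atom(S)
print((d @ S).min())
```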

Reduced dimension dictionary estimation
Approximation criterion to optimize D for K > 1: deriving bounds
$P_{Det}(s_l, D) = P\!\left( \max_{j=1,\dots,K} (d_j^\top x)^2 > \xi^2 \,\middle|\, H_1, s_l, D \right).$
CDF of the maximum of continuous random variables $(X_1, \dots, X_N)$: $F_{\max(X_1,\dots,X_N)}(t) \le \min\left(F_1(t), \dots, F_N(t)\right)$, hence
$P_{Det}(s_l, D) \ge \max_{j=1,\dots,K} Q_{1/2}\!\left(|d_j^\top s_l|, \xi\right).$
Approximation of the optimal dictionary:
$D^\star \approx \arg\max_{D:\, \|d_j\|_2 = 1} \rho^{(K)}(D)$, where $\rho^{(K)}(D) = \min_{i=1,\dots,L} \max_{j=1,\dots,K} |d_j^\top s_i|$ is the minimax correlation function.
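The minimax correlation $\rho^{(K)}(D)$ is straightforward to evaluate; a small helper (the naming is mine):

```python
import numpy as np

def rho(D, S):
    """Minimax correlation rho(K)(D) = min_i max_j |d_j^T s_i| for
    unit-norm dictionary atoms d_j (columns of D) and library S."""
    C = np.abs(D.T @ S)          # K x L table of correlations
    return C.max(axis=0).min()   # best atom per alternative, worst case
```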

Reduced dimension dictionary estimation
Approximation criterion to optimize D for K > 1 (continued)
$P_{FA}(D) = P\!\left( \max_{j=1,\dots,K} (d_j^\top x)^2 > \xi^2 \,\middle|\, H_0, D \right).$
CDF of multivariate normals (C. Khatri, 1968): $P(|u_i| \le c_i,\ i = 1, \dots, m) \ge \prod_{i=1}^m P(|u_i| \le c_i)$ when $u = (u_1, \dots, u_m)$ is multivariate normal, $u \sim \mathcal{N}(0, \Sigma)$. Hence
$P_{FA}(D) \le 1 - \prod_{j=1}^{K} P(|u_j| \le \xi) = 1 - \Phi_{\chi^2_1}(\xi^2)^K.$
If D is orthogonal, the upper bound is the exact $P_{FA}(D)$.
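The upper bound gives a conservative threshold calibration: to guarantee $P_{FA} \le p$, set $\Phi_{\chi^2_1}(\xi^2) = (1 - p)^{1/K}$. A one-liner with SciPy (exact when D is orthogonal, conservative otherwise):

```python
import numpy as np
from scipy.stats import chi2

def threshold(p_fa, K):
    """Smallest xi with 1 - F_{chi2_1}(xi^2)^K <= p_fa, i.e. a threshold
    guaranteeing the false-alarm level for the K-atom Max test."""
    return np.sqrt(chi2.ppf((1.0 - p_fa) ** (1.0 / K), df=1))

print(threshold(0.01, 21))
```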

Minimax learning algorithms
Illustrations of the proposed learning methods to optimize D
Greedy minimax (K = 3):
Heuristic optimization based on the approximation criterion.
Dictionary update stage by 1-dimensional minimax optimization.
Samples the distribution in a greedy manner to open new classes.
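The slide gives only the outline, so the following is one plausible reading of the greedy scheme, not the authors' exact algorithm: start from the K = 1 minimax atom, repeatedly open a new class seeded with the worst-covered alternative, reassign the alternatives, and recompute the per-class minimax atoms.

```python
import numpy as np
from scipy.optimize import minimize

def minimax_atom(Sc):
    """Exact 1-D minimax atom of a class: normalized minimum-norm point
    of the convex hull of the columns of Sc (nonnegative data assumed)."""
    L = Sc.shape[1]
    res = minimize(lambda w: np.sum((Sc @ w) ** 2), np.full(L, 1.0 / L),
                   method="SLSQP", bounds=[(0.0, None)] * L,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    d = Sc @ res.x
    return d / np.linalg.norm(d)

def greedy_minimax(S, K):
    D = minimax_atom(S)[:, None]                    # start with K = 1
    for _ in range(1, K):
        cover = np.abs(D.T @ S).max(axis=0)
        D = np.hstack([D, S[:, [cover.argmin()]]])  # open a new class
        labels = np.abs(D.T @ S).argmax(axis=0)     # reassign alternatives
        D = np.column_stack(
            [minimax_atom(S[:, labels == j]) if np.any(labels == j)
             else D[:, j] for j in range(D.shape[1])])
    return D
```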

Minimax learning algorithms
Illustrations of the proposed learning methods to optimize D
K-minimax (K = 3):
The dictionary update stage of the K-SVD algorithm [Aharon et al., 2006] is replaced by the exact solution of the 1-dimensional minimax problem.
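A sketch of this K-SVD-style alternation under the same nonnegativity assumption (initialization and iteration count are my choices):

```python
import numpy as np
from scipy.optimize import minimize

def minimax_atom(Sc):
    """1-D minimax atom: normalized min-norm point of conv{columns of Sc}."""
    L = Sc.shape[1]
    res = minimize(lambda w: np.sum((Sc @ w) ** 2), np.full(L, 1.0 / L),
                   method="SLSQP", bounds=[(0.0, None)] * L,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    d = Sc @ res.x
    return d / np.linalg.norm(d)

def k_minimax(S, K, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    D = S[:, rng.choice(S.shape[1], K, replace=False)]   # random init
    for _ in range(n_iter):
        labels = np.abs(D.T @ S).argmax(axis=0)  # sparse coding, ||beta||_0 = 1
        # Dictionary update: K-SVD's rank-one SVD update is replaced by
        # the exact 1-D minimax atom of each class.
        D = np.column_stack(
            [minimax_atom(S[:, labels == j]) if np.any(labels == j)
             else D[:, j] for j in range(K)])
    return D
```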

Numerical results
Subspace learning of faces
Library S: 40 subjects in frontal position (possible alternatives under $H_1$), taken from the ORL Database of Faces by AT&T Laboratories Cambridge.
[Figure: the atoms learned by K-SVD (K = 3) and by greedy minimax (K = 3).]
The minimax approach captures marginal features, while (K-)SVD tends to represent average face(s).

Numerical results
Subspace learning of 100 spectral profiles
[Figure: i. the spectral profiles, $S \in \mathbb{R}^{100 \times 100}$ (flux in arbitrary units); ii. the minimax correlation function $\rho^{(K)}$ of the greedy minimax dictionary for K = 1 to 21; iv. the minimax dictionary $D^\star_{21}$; v. the K-SVD dictionary $D^{K\text{-SVD}}_{21}$.]

Numerical results
AUC (area under the curve) of the ROCs
[Figure: per-alternative AUC of the oracle NPD, the Max test over $S_{100}$, $D^{K\text{-SVD}}_{21}$, $D^\star_{21}$ and $D^K_{21}$ for the 100 alternatives under $H_1$; the worst cases of the average dictionaries are marked.]

Numerical results
Performance ranking: minimax vs. average

Dictionary     | min AUC      | Average AUC
O. NPD         | Ref: 0.886   | Ref: 0.887
S_100          | 1st: 0.813   | 3rd: 0.847
d*             | 2nd: 0.794   | 6th: 0.836
v (SVD)        | 4th: 0.699   | 1st: 0.863
D^K-SVD_21     | 3rd: 0.764   | 2nd: 0.849
D*_21          | 1st: 0.812   | 4th: 0.845
D^K_21         | 1st: 0.813   | 5th: 0.843

Numerical results
MUSE simulation: 3D atoms, SNR = −11 dB, $P_{FA}$ = 0.01
[Figure: maps of $P_{Det}$ (0 to 1) obtained with the K-SVD dictionary vs. the minimax dictionary.]

Conclusions
We propose a detection test of reduced dimension in which the dictionary D is learned with the objective of maximizing the worst-case $P_{Det}$.
Case K = 1: exact solution to the minimax problem, but a single atom might be insufficient to represent all the intrinsic diversity of S.
Case K > 1: we derived proxies that are used to elaborate two algorithms for minimax detection; they sample S better and thus improve the minimax performance over the case K = 1.
Perspectives:
A minimax modification of learning algorithms involving sparsity can easily be achieved by using $d^\star$ in the dictionary update stage.
Find the optimal value of K, which depends on the intrinsic diversity of S.