A Correntropy based Periodogram for Light Curves & Semi-supervised classification of VVV periodic variables


Los Pablos (P4J): Pablo Huijse, Pavlos Protopapas, Pablo Estévez, Pablo Zegers, Jose Principe
Harvard-Chile Data Science School

Motivation

Light curve analysis challenges:
- Uneven sampling.
- Different noise sources and heteroscedastic errors.
- May have few points.
- Databases can be huge.

Picture sources: http://www.atnf.csiro.au/ and http://www.hao.ucar.edu/

Least Squares Spectral Analysis (LSSA)

For a given frequency, the Lomb-Scargle (LS) power is equivalent to the L2 norm of the coefficients of the sinusoidal model that best fits the data in a least-squares sense. The Generalized Lomb-Scargle (GLS) periodogram extends this with a floating mean and per-sample weights.

M. Zechmeister, The Generalised Lomb-Scargle Periodogram, A&A, 2009
J. VanderPlas & Z. Ivezic, Periodograms for Multiband Astronomical Time Series, ApJ, 2015
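
To make the least-squares equivalence concrete, here is a minimal sketch (plain NumPy, not the P4J API; the normalization and the GLS floating-mean and weight terms are omitted) that fits a single-harmonic model at each trial frequency and uses the squared norm of the fitted coefficients as the power:

    import numpy as np

    def ls_power(t, y, freq):
        # Fit y ~ a*cos(2*pi*f*t) + b*sin(2*pi*f*t) by linear least squares
        # and return the squared norm of the coefficients as the power.
        X = np.column_stack([np.cos(2 * np.pi * freq * t),
                             np.sin(2 * np.pi * freq * t)])
        coef, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
        return np.sum(coef ** 2)

    # Toy periodogram over a frequency grid (true frequency: 2.0)
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0, 100, 200))               # uneven sampling
    y = np.sin(2 * np.pi * 2.0 * t) + 0.3 * rng.standard_normal(t.size)
    freqs = np.linspace(0.1, 5.0, 5000)
    power = np.array([ls_power(t, y, f) for f in freqs])
    print(freqs[np.argmax(power)])                      # close to 2.0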

Maximum correntropy criterion

For two arbitrary random variables X and Y with N realizations, the sample correntropy is

    V_sigma(X, Y) = (1/N) * sum_{i=1..N} G_sigma(x_i - y_i),

where G_sigma is a Gaussian kernel of bandwidth sigma.
- Generalizes correlation to higher-order moments.
- Samples are compared through a kernel.
- Single free parameter: the kernel bandwidth.

J. Principe, Information Theoretic Learning, Springer, 2010
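
In code, the sample estimator above is essentially one line; a minimal sketch, using a Gaussian kernel with the usual 1/(sqrt(2*pi)*sigma) normalization:

    import numpy as np

    def correntropy(x, y, sigma):
        # Sample correntropy: mean Gaussian-kernel similarity of the
        # pairwise differences between the N realizations of X and Y.
        e = np.asarray(x) - np.asarray(y)
        k = np.exp(-0.5 * (e / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        return np.mean(k)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    print(correntropy(x, x, sigma=1.0))                          # maximal: x == y
    print(correntropy(x, rng.standard_normal(1000), sigma=1.0))  # smaller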

Maximum correntropy criterion
- Fit a model to the data by maximizing the correntropy between model and data.
- MCC is equivalent to maximizing the pdf of the error at e = 0 (Principe 2010).
- MCC is an M-estimator: robust to non-Gaussian noise and outliers.
- Assumes homoscedastic noise.

J. Principe, Information Theoretic Learning, Springer, 2010
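
A small sketch of MCC as a robust M-estimator (illustrative only, not the P4J implementation): fit the amplitudes of a sinusoid at a known frequency by maximizing the correntropy of the residuals, and compare with least squares on data containing gross outliers.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0, 10, 200))
    y = np.sin(2 * np.pi * 0.5 * t) + 0.1 * rng.standard_normal(t.size)
    y[::10] += 5.0                                  # 10% gross outliers

    def model(theta):
        a, b = theta
        w = 2 * np.pi * 0.5 * t
        return a * np.cos(w) + b * np.sin(w)

    # MCC: maximize the mean kernel value of the errors, i.e. the error
    # pdf at e = 0 (here: minimize its negative), with sigma = 0.5.
    mcc = minimize(lambda th: -np.mean(np.exp(-0.5 * ((y - model(th)) / 0.5) ** 2)),
                   x0=[0.0, 0.0], method='Nelder-Mead').x
    # Least squares for comparison; the outliers bias this fit.
    lsq = minimize(lambda th: np.sum((y - model(th)) ** 2),
                   x0=[0.0, 0.0], method='Nelder-Mead').x
    print('MCC:', mcc, 'LSQ:', lsq)   # true parameters: a = 0, b = 1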

Weighted Maximum Correntropy Criterion

Simple sample weighting through the kernel bandwidth. Fixed-point updates:
1. Assume sigma fixed and update beta.
2. Assume beta fixed and update sigma.
3. Check WMCC convergence: stop, or go back to 1.

Statistical test for period significance
- LS has an analytical expression for the false alarm probability (assumes Gaussian noise).
- Generalized Extreme Value (GEV) statistics: the maxima from several realizations of an experiment follow a GEV distribution.
- Recipe: do bootstrap, find the periodogram maxima on a subset of frequencies, fit a GEV, and compute the false alarm probability [1].

[1] M. Suveges, Extreme-value modelling for the significance assessment of periodogram peaks, MNRAS, 2012
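
A minimal sketch of that recipe using scipy's GEV distribution (it reuses the ls_power helper from the earlier sketch; P4J's fit_extreme_cdf implements the authors' version):

    import numpy as np
    from scipy.stats import genextreme

    def false_alarm_probability(t, y, freqs, observed_peak, n_bootstrap=100,
                                n_frequencies=100, rng=None):
        # Null distribution: bootstrap-resample the magnitudes (destroying
        # any coherent periodicity), take the periodogram maximum over a
        # random subset of frequencies, and fit a GEV to those maxima.
        rng = rng or np.random.default_rng()
        maxima = np.empty(n_bootstrap)
        for i in range(n_bootstrap):
            y_boot = rng.choice(y, size=y.size, replace=True)
            sub = rng.choice(freqs, size=min(n_frequencies, freqs.size),
                             replace=False)
            maxima[i] = max(ls_power(t, y_boot, f) for f in sub)
        shape, loc, scale = genextreme.fit(maxima)
        # False alarm probability: chance that pure noise exceeds the peak.
        return genextreme.sf(observed_peak, shape, loc=loc, scale=scale)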

Synthetic test

Simple irregular sampling: generate a linearly spaced time vector, add jitter proportional to 1/Fs, and discard 80% of the points. Model:

    import P4J
    # Irregularly sampled time vector and a 3-harmonic periodic signal
    t = P4J.irregular_sampling(T=100.0, N=100)
    y_clean = P4J.trigonometric_model(t, f0=2.0, A=[1.0, 0.5, 0.25])
    # Contaminate with white + red noise (no outliers in this test)
    y, y_noisy, dy = P4J.contaminate_time_series(t, y_clean, SNR=0.0,
                                                 red_noise_ratio=0.25,
                                                 outlier_ratio=0.0)

Example (SNR=10.0, red_noise_var=0.25)

    import numpy as np

    my_per = P4J.periodogram(M=3, method='wmcc')   # WMCC-based periodogram
    my_per.fit(t, y_noisy, dy)
    # Search frequencies in [0.0, 5.0], keeping the 10 best local maxima
    freq, per = my_per.grid_search(0.0, 5.0, 1.0, 0.1, n_local_max=10)
    # Bootstrap + GEV fit, then false-alarm-probability levels for the plot
    my_per.fit_extreme_cdf(n_bootstrap=100, n_frequencies=100)
    per_levels = my_per.get_fap(np.asarray([0.05, 0.01, 0.001]))

Example (SNR=0.0, red_noise_var=0.25): same code as above, run on the noisier series.

Results
- Performance: the percentage of cases where the relative error is below tol.
- Confidence: the average significance at f = f0.
- No outliers, red_noise_ratio = 0.0, i.e. the noise is perfectly explained by the uncertainties.
- 10 random time vectors, 100 noise realizations.

Results
- Performance: the percentage of cases where the relative error is below tol.
- Confidence: the average significance at f = f0.
- No outliers, red_noise_ratio = 1/8.
- 10 random time vectors, 100 noise realizations.

Results
- Performance: the percentage of cases where the relative error is below tol.
- Confidence: the average significance at f = f0.
- No outliers, red_noise_ratio = 1/4.
- 10 random time vectors, 100 noise realizations.

VISTA Variables of the Via Lactea (VVV)
- ESO public survey; most measurements in the K band (near infrared) using 7 apertures.
- Goal: study the structure of the Galactic bulge and the origin of our galaxy.
- F. Gran et al. 2016: 1,019 RRab light curves, fields b201-b228 (~47 sq. deg), detected with AoV and corrected manually.

VISTA Variables of the Via Lactea (VVV): Method
1. Grab light curves (LC) from fields b201-b228 (~47 sq. deg).
2. Discard LC with chi2 < 2.0 and N < 30.
3. Discard LC with periodogram confidence < th (steps 2-3 are sketched below).
4. Create features.
5. Semi-supervised PU classification.
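
As a rough illustration, steps 2 and 3 reduce to simple cuts on a per-light-curve summary table; the file and column names below (chi2, n_obs, confidence) and the threshold value are hypothetical, not the VVV/P4J schema:

    import pandas as pd

    lcs = pd.read_csv('vvv_lc_summary.csv')    # hypothetical per-LC summary table
    th = 0.99                                  # hypothetical confidence threshold
    keep = (lcs['chi2'] >= 2.0) & (lcs['n_obs'] >= 30) & (lcs['confidence'] >= th)
    selected = lcs[keep]                       # goes on to feature extraction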

Analysis of VVV periodic variables

Take the first N light curves sorted by periodicity confidence. Each cell reports: number of reported RRL missing from this set (lost RRL) / relative error of the detected periods vs the reported periods.

                        N=10,000      N=20,000     N=50,000
    Lomb-Scargle        154 / 0.019   66 / 0.025   32 / 0.043
    Generalized LS      554 / 0.011   94 / 0.030   11 / 0.038
    WMCC periodogram     75 / 0.007   54 / 0.011    6 / 0.018

Analysis of VVV periodic variables

[Four figure-only slides; images not transcribed.]

Semi-supervised and PU Learning

Semi-supervised classification rests on the manifold, clustering, and low-density-separation assumptions; common approaches are self-learning, graph-based methods, and avoiding changes in dense regions [1].

Our setting: >10,000 unlabeled periodic light curves, ~1,000 labeled RRab (positive class), and no other survey to crossmatch, i.e. a Positive/Unlabeled (PU) scenario.

Images taken from Wikipedia.

[1] X. Zhu, Semi-supervised Learning Literature Survey, 2005 (online, public)

Efficient SS/PU Learning

Bagging PU [1] (transductive version):
1. Positive dataset (size NP), unlabeled dataset (size NU).
2. Draw T bootstrap sets from the unlabeled data (size K each).
3. Train T weak learners (on NP+K samples each) and predict on the OOB set (NU - unique[K]).
4. Average the OOB predictions.

No graph computation, few parameters, highly parallel, simple: github.com/phuijse/bagging_pu (a minimal sketch follows below).

[1] F. Mordelet and J.P. Vert, A bagging SVM to learn from positive and unlabeled examples, Pattern Recognition Letters, v. 37, 2014
[2] M. Claesen et al., A robust ensemble approach to learn from positive and unlabeled data using SVM base models, Neurocomputing, 2014
[3] M. Claesen et al., Assessing binary classifiers using only positive and unlabeled data, Neurocomputing, 2015
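
A minimal sketch of the transductive scheme above, with a decision tree as the weak learner (the learner choice and parameters are illustrative; the authors' implementation is at github.com/phuijse/bagging_pu):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_pu(X_pos, X_unl, T=100, K=None, rng=None):
        # Transductive bagging PU: in each round a bootstrap sample of the
        # unlabeled set is treated as negative, a weak learner is trained on
        # positives + bootstrap, and the out-of-bag (OOB) unlabeled points
        # are scored. The final score is the average OOB prediction.
        rng = rng or np.random.default_rng()
        K = K or len(X_pos)
        n_unl = len(X_unl)
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(K)])
        score_sum = np.zeros(n_unl)
        oob_count = np.zeros(n_unl)
        for _ in range(T):
            idx = rng.choice(n_unl, size=K, replace=True)   # bootstrap "negatives"
            clf = DecisionTreeClassifier().fit(np.vstack([X_pos, X_unl[idx]]), y)
            oob = np.setdiff1d(np.arange(n_unl), idx)       # NU - unique[K]
            score_sum[oob] += clf.predict_proba(X_unl[oob])[:, 1]
            oob_count[oob] += 1
        return score_sum / np.maximum(oob_count, 1)         # averaged OOB pbb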

Analysis of VVV periodic variables

[Two figure-only slides: t-SNE visualizations; images not transcribed.]
L.J.P. van der Maaten and G.E. Hinton, Visualizing Data using t-SNE, JMLR, 2008

Analysis of VVV periodic variables

[Figure-only slide; references [1]-[3] above.]

Candidate light curves (lc) by classifier probability bin [figure slides]:
- Pbb in [0.95, 1.00]: 132 lc
- Pbb in [0.85, 0.95]: 101 lc
- Pbb in [0.65, 0.85]: 102 lc
- Pbb in [0.50, 0.65]: 82 lc

Conclusions and Future work
- Periodicity detection based on information-theoretic functionals is more precise and less sensitive to false positives.
- New set of VVV RR Lyrae candidates to confirm, and more fields to run.
- Compare with more periodicity detection methods (conditional entropy, AoV, PDM); test different features (FATS) and other PU/SS methods.
- Test other surveys (Pan-STARRS, CRTS, synthetic LSST light curves).
- Improve the computational implementations.

LINKS:
pypi.python.org/pypi/p4j
github.com/phuijse/p4j
github.com/phuijse/bagging_pu