SNLS supernovae photometric classification with machine learning
Anais Möller, CAASTRO, Australian National University
Statistical Challenges in Astronomy, Carnegie Mellon University, June 2016
(Image courtesy of D. Campbell-Wilson)
SNe Ia cosmology
Type Ia SNe are very luminous, with homogeneous spectral and photometric properties and characteristic light curves -> we assume reproducible luminosities.
SNIa cosmology advantages (Betoule et al. 2014):
- orthogonal constraints to CMB and BAO
- sensitive to dark energy properties
SNIa cosmology challenges
we want:
- more SNe Ia
- precise measurements, e.g. improving systematics: host galaxy, modelling / templates (e.g. JLA, Betoule et al. 2014)
more SNe Ia: not enough resources for spectroscopy (DES, LSST) -> photometric classification
SN photometric classification challenges
how to classify?
- features: some type of fitter
- redshift: photometric / host-galaxy spectroscopic
- classification algorithm
we need to trust our sample:
- high purity sample
- without sacrificing statistics (efficiency)
from simulations to data; from a classified sample to cosmology; the unexpected
Supernova Photometric Classification Challenge (Richard Kessler, Alex Conley, Saurabh Jha, Stephen Kuhlmann; released Jan 29, 2010; last update September 24, 2013): a publicly released blinded mix of simulated SNe, with types (Ia, Ib, Ic, II) selected in proportion to their expected rate, realized in the griz filters of the Dark Energy Survey (DES) with realistic observing conditions.
[Screenshot: table of challenge participants and strategies, including light-curve χ² tests against templates, cuts on SiFTO/SALT2 fit parameters, kernel density estimation, boosted decision trees, random forests, and neural networks.]
Other classification papers shown: "A simple and robust method for automated photometric classification of supernovae using neural networks" (N. V. Karpenka, F. Feroz and M. P. Hobson); "Photometric supernova classification with machine learning" (Michelle Lochner, Jason D. McEwen, Hiranya V. Peiris, Ofer Lahav, Max K. Winter).
SNLS, the SuperNova Legacy Survey
based on the Canada France Hawaii Telescope MegaCam: 36-CCD mosaic, 4 broadband filters (g, r, i, z), 4 fields of 1 square degree
rolling search mode; spectroscopic follow-up (Keck, Gemini and VLT); observations 2003-2008
- SNLS3: analysed and published
- SNLS5: currently being processed (complete SNLS data set)
SNLS3 contains large spectroscopically and photometrically identified SNe samples.
[Histogram: host photometric redshift, 0 to 1.6. Identified sample: 175 entries, mean 0.60, RMS 0.21. Unidentified sample: 310 entries, mean 0.86, RMS 0.22.]
SNLS pipelines
standard SNLS approach (e.g. Astier et al. 2005, Guy et al. 2010; used in JLA, Betoule et al. 2014): real-time detection, uses spectroscopy for typing and redshift, 0.15 < z < 1.1
deferred photometric pipeline, developed in the SNLS Saclay group (France): no spectroscopy required, deferred detection of all kinds of transient events, larger number of detections, larger redshift coverage, sensitive to other SN types
G. Bazin et al., A&A 534, A43 (2011): Photometric selection of Type Ia supernovae in the Supernova Legacy Survey.
G. Bazin et al., A&A 499, Issue 3 (2009): The core-collapse rate from the Supernova Legacy Survey.
SNLS photometric pipeline
pre-processed CFHT images -> photometry of transient events -> SN-like events -> SNe Ia / core-collapse SNe
preselection cuts (SN-like):
- one significant flux variation
- star rejection
- quality cuts on light curves
SNLS3 original pipeline: 300,000 detections -> 1,500 SN-like events
SN classification
How?
1. extract information: A. light-curve features, B. redshift
2. classification strategy: SN-like -> SNe Ia (good z / bad z) vs core-collapse SNe
I. sequential cuts with host-galaxy photometric z & SALT2 (G. Bazin et al., A&A 534, A43, 2011)
II. supervised learning with host-galaxy photometric z & SALT2
III. supervised learning with SN z & general fitter (Möller et al., in preparation, 2016)
method I (G. Bazin et al., A&A 534, A43, 2011)
How:
1. extract information:
A. light-curve features: SALT2 fit: C, x1, magnitudes, χ² [light-curve fits vs phase in the B, V, R bands; Guy et al. 2008]
B. redshift: host-galaxy photometric (Ilbert et al. 2006): 520,000 galaxies with an AB magnitude brighter than 25 in the i band; ~4% resolution; catastrophic assignment (Δz/(1 + z) > 0.15): 3.7%
2. selection cuts: sampling, poor fit
3. classification strategy: sequential cuts
method I: photometric host-galaxy z + SALT2 + sequential cuts (G. Bazin et al., A&A 534, A43, 2011)
[Plot: SNLS3 data, spectroscopic SNe Ia, spectroscopic CC, simulated SNe Ia]
method I: photometric host-galaxy z + SALT2 + sequential cuts (G. Bazin et al., A&A 534, A43, 2011)

Simulation (SNIa):
  purity                          94.4 ± 0.5%
  contamination, SNe Ia bad z      0.65 ± 0.08%
  contamination, core-collapse     4.9 ± 0.5%
  efficiency                      29.9 ± 0.3%

SNLS3 data:
  # events                  486
  # spectroscopic SNe Ia    175
  # spectroscopic CC          0
  # photometric CC            0

the core-collapse photometric sample was determined by Bazin et al. 2009 in a specific analysis.
supervised learning
decision trees make successive rectangular cuts in the parameter space
supervised classification methods use simulations to train
improving the model:
- boosting (making the model more complex): AdaBoost, XGBoost
- averaging methods: Random Forest, Bagging
scikit-learn implementation
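As a sketch of the scikit-learn implementation mentioned above, the snippet below trains one boosting and one averaging ensemble; the data set, labels, and hyperparameters are illustrative stand-ins, not the SNLS features:

```python
# Illustrative sketch: one boosting and one averaging ensemble from
# scikit-learn, trained on toy data standing in for simulated SN features.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Stand-in for simulated light-curve features + redshift (label 1 = SN Ia)
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)

boosted = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)
averaged = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Each ensemble collapses the N-dimensional input into a 1-dimensional score
score = boosted.decision_function(X)     # BDT response
proba = averaged.predict_proba(X)[:, 1]  # Ia probability
```

Cutting on the 1-dimensional score (or probability) is then what sets the purity/efficiency trade-off of the classified sample.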
supervised learning
redshift + features: an N-dimensional problem -> BDT response: a 1-dimensional problem
method II
How:
1. extract information:
A. light-curve features: SALT2: C, x1, magnitudes and χ² of the fit
B. redshift: host-galaxy photometric
2. selection cuts: sampling, poor fit
3. classification strategy: XGB boosted decision trees
method II: host-galaxy photometric z, SALT2 fit, XGB
efficiency 38.2 ± 0.7%, purity 98.0 ± 0.3% (cf. Bazin et al. 2011, Gupta et al. 2016)
BUT the host-galaxy redshift assignment (zgal) efficiency is 83%
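The figures of merit quoted throughout these slides (purity, efficiency, contamination) reduce to simple count ratios; a minimal sketch with illustrative counts, not the SNLS numbers:

```python
# Figures of merit as count ratios (counts here are illustrative).
def purity(n_true_ia_classified, n_contaminants_classified):
    """Fraction of the classified sample that is truly SN Ia."""
    return n_true_ia_classified / (n_true_ia_classified + n_contaminants_classified)

def efficiency(n_true_ia_classified, n_true_ia_total):
    """Fraction of all true SNe Ia that end up in the classified sample."""
    return n_true_ia_classified / n_true_ia_total

print(purity(98, 2))        # 0.98
print(efficiency(98, 250))  # 0.392
```

Contamination is simply 1 minus the purity, split by contaminant class (core-collapse vs SNe Ia with a badly assigned redshift).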
method III
How:
1. extract information:
A. redshift: SN redshift
B. light-curve features: general fitter
Möller et al. (2016), in preparation
2. selection cuts (adapted to our analysis): sampling, poor fit
3. classification strategy: supervised learning
SN redshift (Palanque-Delabrouille et al. 2010)
estimated directly from SN light curves; light curves are fitted using SALT2, minimising the reduced χ²:
1. for successive values of the redshift separated by δz = 0.1, with colour and stretch set with Gaussian priors
2. setting the colour and stretch parameters free and performing a new redshift scan around the first step's redshift, with the same step δz as above
set up on SNLS3 data: ~2% resolution; catastrophic assignment (Δz/(1 + z) > 0.15): 1.4%
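The two-pass scan above can be sketched as a grid search. The real method evaluates a SALT2 fit at each trial redshift; `chi2_at` below is a hypothetical stand-in with a toy χ² surface, and the narrower window and finer step in the second pass are illustrative choices:

```python
# Schematic two-pass redshift scan. The real method fits SALT2 at each trial
# redshift; chi2_at is a hypothetical stand-in returning a toy reduced chi^2.
def scan_redshift(chi2_at, z_min=0.0, z_max=1.2, dz=0.1):
    """Return the trial redshift minimising chi2_at over [z_min, z_max]."""
    best_z, best_chi2 = z_min, float("inf")
    z = z_min
    while z <= z_max + 1e-9:
        c = chi2_at(z)
        if c < best_chi2:
            best_z, best_chi2 = z, c
        z += dz
    return best_z

toy_chi2 = lambda z: (z - 0.42) ** 2  # toy surface with a minimum at z = 0.42
z1 = scan_redshift(toy_chi2)          # pass 1: coarse scan (priors on colour/stretch)
z2 = scan_redshift(toy_chi2, z1 - 0.1, z1 + 0.1, 0.01)  # pass 2: scan around z1
```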
general SN fitter
the SN photometric redshift algorithm uses SALT2, so it is advisable to use a light-curve fitter which is independent from SALT2 and able to fit other types of SNe, e.g. the empirical model used for SN-like selection:

f_k(t) = A_k e^{-(t - t_0^k)/τ_fall^k} / (1 + e^{-(t - t_0^k)/τ_rise^k}) + c_k,   k = filter

t_max^k = t_0^k + τ_rise^k ln(τ_fall^k / τ_rise^k - 1)
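A minimal Python transcription of this empirical model (Bazin et al. 2009), per filter k, using only the standard library; the example parameter values are illustrative:

```python
import math

def bazin_flux(t, A, t0, tau_rise, tau_fall, c):
    """f(t) = A * exp(-(t - t0)/tau_fall) / (1 + exp(-(t - t0)/tau_rise)) + c."""
    x = t - t0
    return A * math.exp(-x / tau_fall) / (1.0 + math.exp(-x / tau_rise)) + c

def bazin_tmax(t0, tau_rise, tau_fall):
    """Time of maximum flux: t0 + tau_rise * ln(tau_fall / tau_rise - 1)."""
    return t0 + tau_rise * math.log(tau_fall / tau_rise - 1.0)

# Example: a light curve rising over ~5 days and fading over ~30 days
tmax = bazin_tmax(t0=0.0, tau_rise=5.0, tau_fall=30.0)
peak = bazin_flux(tmax, A=1.0, t0=0.0, tau_rise=5.0, tau_fall=30.0, c=0.0)
```

The fitted parameters (A, t0, τ_rise, τ_fall, c) per filter are the kind of general light-curve features fed to the classifier in method III.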
method III: data (preliminary)
[Plot legend: all events, spectroscopic SNe Ia, spectroscopic CC, visually non SN-like, visually long, extreme z]
method III
How:
1. extract information:
A. light-curve features: general fitter
B. redshift: SN redshift
2. selection cuts (adapted to our analysis): sampling, poor fit
3. classification strategy: supervised learning
- setting parameters
- selecting features
- cross-validation
using scikit-learn (Random Forest, AdaBoost) and XGBoost (extreme gradient boosting, http://xgboost.readthedocs.org/)
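The training step above might look like the following sketch. XGBoost is swapped here for scikit-learn's GradientBoostingClassifier so the example needs only scikit-learn, and the features/labels are an illustrative stand-in for the fitted light-curve parameters plus SN redshift:

```python
# Cross-validate three tree ensembles on stand-in data (not the SNLS set).
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=6, random_state=1)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Gradient boosting": GradientBoostingClassifier(random_state=1),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
```

Cross-validated scores of this kind are what motivate the algorithm comparison at fixed purity on the following slides.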
method III preliminary simulations
method III: simulations (preliminary)
we want to compare classified samples -> require 95% purity
method III at 95% purity (preliminary)

simulations:              AdaBoost      Random Forest   XGBoost
total efficiency Ia       36.9 ± 0.6    32.4 ± 0.7      41.1 ± 0.7
purity Ia                 95.6 ± 0.5    95.6 ± 0.4      95.3 ± 0.4
contamination Ia bad z     0.53 ± 0.09   0.29 ± 0.07     0.60 ± 0.09
contamination CC           3.9 ± 0.4     4.1 ± 0.4       4.0 ± 0.4

data:                     AdaBoost      Random Forest   XGBoost
photometric sample        478           549             670
spectroscopic Ia          166           198             223
photometric Ia            318           364             444
spectroscopic CC          2             2               3
photometric CC            1             1               6
method III simulations
method III: SN-z, general fit, XGB at 98% purity (preliminary)

            SNLS3 events   spectroscopic      photometric        simulated
cut         in sample      Ia        CC       Ia        CC       Ia %
SN-like     1483           246       42       486       109      50
selected    1193           238       30       481       77       47
classified  529            205       1        374       1        35
method III (preliminary): simulations vs data [plots]
take away
how to classify? 3 DT algorithms, different features, high purity and efficiency
from simulations to data: 1st application to high-z data, 95-98% purity with high efficiency
- selection cuts are necessary
- the unexpected appears going from simulations to data: we rely on the training
from a classified sample to cosmology: SNLS3 (V. Ruhlmann-Kleider), with improved statistics
ML opens new questions: SNIa diversity
method to be applied to SNLS5