Applying Machine Learning for Gravitational-wave Burst Data Analysis Junwei Cao LIGO Scientific Collaboration Research Group Research Institute of Information Technology Tsinghua University June 29, 2016 jcao@tsinghua.edu.cn http://ligo.org.cn
Outline Introduction to Tsinghua Group LIGO Data & Computing Infrastructure Real-time / low-latency GW Burst Search» Applying machine learning for burst veto analysis Future Work Entering the era of GW astronomy jcao@tsinghua.edu.cn http://ligo.org.cn
Our Group The only LSC member group in mainland China, including 3 faculty members GW burst data analysis and computing infrastructure Also involved in KAGRA, AIGO and ASTROD With close collaboration with MIT, Caltech and UWA jcao@tsinghua.edu.cn http://ligo.org.cn
LIGO Data Data are comprised of:» Gravitational Wave channel (AS_Q)» Physical Environment Monitors» Internal Engineering Monitors Multiple data products beyond raw data» RDS_R_L1, including both GW and environmental channels» RDS_R_L3, including only AS_Q channel L-RDS_R_L3-751658016-16.gwf Site (H,L) jcao@tsinghua.edu.cn http://ligo.org.cn Data Type Duration GPS Start Time
Challenges in AdvLIGO Era Larger Data Volume jcao@tsinghua.edu.cn http://ligo.org.cn, July 29, 2010 Cite from LIGO-G0900008
The LSC Data Grid (LDG) Birmingham Cardiff AEI/Golm jcao@tsinghua.edu.cn http://ligo.org.cn
Real-time Search Real-time: between online and offline mode for largescale data analysis Online Monitoring Data Streams On-site Real-time Search Data Streams+ Data Production On-site+ Off-site Offline Analysis Data Production Off-site jcao@tsinghua.edu.cn http://ligo.org.cn
Motivation Prompt E/M follow-up by LIGO s external collaborators Detect astronomy events earlier than traditional observation methods Increase the confidence of the GW candidate event Obtain more information about GW candidate event and its source: more accurate sky position, distance, Rapid detector characterization jcao@tsinghua.edu.cn http://ligo.org.cn
An Example Implementation KleineWelle jcao@tsinghua.edu.cn http://ligo.org.cn, October 28 2009 LIGO Document ID: LIGO-G0900981
Existing Veto Methods hveto: uses Poisson distribution to evaluate the coincidence significance for all auxiliary channels for a number of thresholds and time-windows UPV: finds time-coincident triggers between the GW channel and an auxiliary channel within a timewindow according to a series of self-defined metrics, used percentages Basically event-by-event methods jcao@tsinghua.edu.cn http://ligo.org.cn
Multivariate Veto Study the correlation between a GW channel trigger and adjacent triggers from auxiliary channels jcao@tsinghua.edu.cn http://ligo.org.cn
jcao@tsinghua.edu.cn http://ligo.org.cn Veto Approach The veto process can be considered as a classification problem of instrument status:» The input of the classifier is the combination of properties of all coincident AUX/ENV triggers at the time t i, assuming there is a GC trigger at the time t i.» The output of the classifier is whether the instrument is fault or not. This is definitely a GW signal If yes, the GC trigger is a glitch; If not, the GC trigger is a GW signal We want to know if this is a signal or a glitch GC 1 1 0 0 AUX1 0 1 0 0 AUXn 0 0 1 0 ENV1 0 0 1 0 ENVm 0 1 0 0 No signal and no glitch either! No instrument faults! perfect for training the classifier
13 1. Artificial neural networks 2. Random forests Some "Classic" classifiers from machine learning 3. Support Vector Machines (SVM)
14 Artificial Neural networks 1 coefficient for each input (linear combination) Function of the linear combination Example: Output > 0 = glitch predicted Input values = aux. channel vector
15 Random forests Based on decision trees. Predictions of multiple trees combined. Auxiliary channel vector Test on an auxiliary channel coordinate = auxiliary channel measurement Glitch / clean prediction
16 Support Vector Machines (SVM) Maps input vectors to a higher dimensional space and separate them with a (hyper) plane. Glitches Separation hyper-plane Clean samples
17 Prediction threshold between glitches and signals The prediction of a classifier is typically a number. Use of a threshold for separating glitches from signals. signals glitch Classifier prediction How to set the threshold? If too many glitches are predicted, we may not look often enough for gravitational waves. If too many clean predictions (signals) are made, then we analyze background noise too often.
True glitch rate 18 The receiver operating characteristic (ROC) The user of the classifier can choose the false glitch rate that he finds acceptable. The true glitch rate is given by the ROC, obtained by varying the glitch/signal threshold. "Every disturbance is a glitch" Glitch samples correctly predicted as glitches ROC Signals incorrectly predicted as glitches "Every disturbance is a signal (i.e. not a glitch)" False glitch rate
Observed performance for the gravitational wave channel veto at LIGO Performance of machine learning (ANN, RF, SVM): as good as the performance of the best designed algorithm but machine learning gives an automatic GW channel veto algorithm!
jcao@tsinghua.edu.cn http://ligo.org.cn
jcao@tsinghua.edu.cn http://ligo.org.cn Beijing GW Workshop
Entering the era of GW Astronomy There could be GW signals at the same level of noises. More machine learning methods, e.g. deep learning, could be applied for actual GW signal discoveries. 输出层 隐层 输入层 jcao@tsinghua.edu.cn http://ligo.org.cn