Gaia Data Processing - Overview and Status
Anthony Brown
Leiden Observatory, Leiden University
brown@strw.leidenuniv.nl

Teamwork to deliver the promise of Gaia
- 10+ years of effort
- 450 scientists and engineers
- 160 institutes
- 24 countries and ESA
- Six data processing centres
- Support from national space and funding agencies (MLA) and ESA
Tokyo - 2016.12.06-2/35

DPAC/ESA responsibilities
ESA
- Flight control, up/downlink, ground stations
- Deliver telemetry to Gaia Science Operations Centre (at ESAC)
- Payload configuration and commanding
- Hosting and running Gaia Archive
DPAC
- All other processing
- Gaia Archive requirements and facilities
- Validation of processed data before publication
ESA/DPAC
- Payload health monitoring and configuration optimization
- Unpacking telemetry, Initial Data Treatment (IDT) and First Look (FL)
- Astrometric Global Iterative Solution (AGIS) processing
- Main Database (MDB) hosting and running
- System architecture (data models, GaiaTools)
- Coordination/project management

What we deliver
- Gaia DR1: positions + G magnitude for 1.1 billion sources; parallaxes and proper motions for 2 million Tycho-2 stars; RR Lyrae and Cepheids around LMC
- Gaia DR2: radial velocities for bright stars, (G_BP - G_RP), 5-parameter astrometry (α, δ, ϖ, µ_α, µ_δ)
- Gaia DR3: full astrometry, orbital solutions for short period binaries, (G_BP - G_RP), BP/RP spectrophotometry and astrophysical parameters, radial velocities, RVS spectra
- Gaia DR4: more sources, source classifications, multiple astrophysical parameters, variable star solutions and epoch photometry for them, solar system results, binary orbital solutions
- Gaia DR5: everything, including epoch photometry/astrometry/spectra
Each release updates the previous one
- Increasing numbers, accuracy, richness
Photometric and solar system alerts throughout mission lifetime

How big is Gaia data?
Astrometry/Photometry
- 1 billion sources
- 70 observations on average, 12 CCD images per observation
- 10^12 images over 5 years
- 500 million images/day
Spectroscopy
- 100 million sources
- 40 observations on average, 3 CCD spectra per observation
- 10^10 spectra over 5 years
- 5 million spectra/day
Telemetry
- 40 GB/day on average
- up to 100 TB at end of mission
Main database
- 3.5 TB/month
- up to 1 PB at end of mission
Not so big in terms of number of sources and data volume
However, the data processing is complex
- data interwoven in space and time: cannot split up processing according to sky area
- self-calibration introduces complex dependencies
- time variability
- interdependence of processing for the different instruments
- each bit counts...
Estimated computational effort: 10^20 to 10^21 Flops

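The per-mission and per-day counts on this slide follow from the source counts and observation rates by simple multiplication. The sketch below redoes that arithmetic; the only inputs are the slide's round numbers, and the 5-year mission length expressed in days and the 1 TB = 1000 GB convention are assumptions of this example.

```python
# Back-of-envelope check of the data-volume figures quoted on the slide.
# Inputs are the slide's round numbers; MISSION_DAYS and the decimal
# TB convention are assumptions made for this sketch.

MISSION_DAYS = 5 * 365.25  # nominal 5-year mission


def totals(n_sources, obs_per_source, ccd_per_obs):
    """Return (CCD records over the mission, CCD records per day)."""
    total = n_sources * obs_per_source * ccd_per_obs
    return total, total / MISSION_DAYS


images_total, images_per_day = totals(1e9, 70, 12)
spectra_total, spectra_per_day = totals(1e8, 40, 3)
telemetry_tb = 40 * MISSION_DAYS / 1000  # 40 GB/day accumulated, in TB

print(f"images   : {images_total:.1e} total, {images_per_day:.1e} per day")
print(f"spectra  : {spectra_total:.1e} total, {spectra_per_day:.1e} per day")
print(f"telemetry: {telemetry_tb:.0f} TB over the mission")
```

The products land within a factor of a few of the slide's rounded 10^12 images and 10^10 spectra, and the accumulated telemetry stays below the quoted 100 TB ceiling.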
Data processing overview (simplified)
Data processing flows
[Flow diagram, upstream to downstream. Boxes as labelled, with the responsible CU and processing centre:]
- Telemetry (ESOC)
- CU3 Initial Data Treatment, First Look (DPCE/DPCT)
- CU3 Astrometric core processing (DPCE/DPCT)
- CU3 Intermediate Data Update (DPCB)
- CU5 Photometric processing (DPCI)
- CU6 Spectroscopic processing (DPCC)
- CU4 Complex object processing (DPCC)
- CU7 Variability analysis (DPCG)
- CU8 Astrophysical characterization (DPCC)
- Alerts: transients, new SSOs,... (DPCI/DPCC)
- Main Data Base (DPCE)
- CU1 System/IT architecture
- CU2 Simulations (DPCB/DPCC)
- CU9 Archive and Catalogue access (ESDC)

Data processing flows
[Flow diagram linking raw data, pre-processing, PhotPipe and AGIS. Labels as extracted:]
- Raw data
- Pre-processing, source list creation
- Bias; background including stray light; CTI terms
- Image location: colour independent, CTI free
- PSF model: accurate, colour dependent
- PhotPipe: flux, source magnitude, source colour, calibrations
- AGIS
  - Source parameters: α, δ, ϖ, µ_α, µ_δ
  - Attitude model: accurate, including µ-meteoroids and µ-clanks
  - Calibrations: geometric, basic angle, global

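AGIS solves for source parameters, attitude and calibrations together, each block depending on the others. A minimal sketch of that block-iterative idea follows, under a deliberately toy model that is an assumption of this example and far simpler than the real solution: each observation is the sum of one scalar source value and one scalar attitude offset. The function name `solve`, the data layout and the zero-mean convention on the attitude block are all hypothetical.

```python
# Toy block-iterative solution (NOT the AGIS algorithm itself).
# Assumed model: obs[(i, j)] = source[i] + attitude[j]
# Each block is updated in turn while the other is held fixed.


def solve(obs, n_iter=50):
    """obs maps (source index, time index) -> measured value."""
    n_src = 1 + max(i for i, _ in obs)
    n_att = 1 + max(j for _, j in obs)
    src = [0.0] * n_src
    att = [0.0] * n_att
    for _ in range(n_iter):
        # Source update: mean residual with the attitude held fixed.
        for i in range(n_src):
            res = [v - att[j] for (k, j), v in obs.items() if k == i]
            src[i] = sum(res) / len(res)
        # Attitude update: mean residual with the sources held fixed.
        for j in range(n_att):
            res = [v - src[i] for (i, k), v in obs.items() if k == j]
            att[j] = sum(res) / len(res)
        # A constant can be moved freely between the two blocks, so fix
        # it by forcing the attitude offsets to have zero mean.
        shift = sum(att) / n_att
        att = [a - shift for a in att]
        src = [s + shift for s in src]
    return src, att


# Noise-free demonstration with an incomplete observation pattern.
true_src = [10.0, -3.0, 4.5, 0.0, 7.25, -1.5]
true_att = [0.4, -0.1, 0.3, -0.2, 0.0, -0.4, 0.1, 0.2, -0.3]  # zero mean
obs = {
    (i, j): true_src[i] + true_att[j]
    for i in range(len(true_src))
    for j in range(len(true_att))
    if (i + j) % 3 != 0  # each source is seen at only some of the times
}
est_src, est_att = solve(obs)
print(max(abs(a - b) for a, b in zip(est_src, true_src)))
```

Even in this toy the sources cannot be solved one sky region at a time: every attitude offset is shared by several sources, which is the kind of interdependence that makes the real processing global.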
Data processing flows
[Flow diagram of downstream processing. Labels as extracted:]
- RVS telemetry
- RVS processing: radial velocities (per epoch), spectra (per epoch)
- AGIS and PhotPipe: source brightness and colours, BP/RP spectra, epoch photometry, astrometry, epoch astrometry, attitude model
- Non-single stars, solar system, extended objects: orbital solutions, asteroid orbits and taxonomy, extended object parameters
- Source astrophysical characterization: source classification and astrophysical parameters
- Variable source processing: variable source classification and characterization

Data processing flows
[Flow diagram from Main database to release. Labels as extracted:]
- Main database
- Data quality filtering
- Filtered data
- Ingestion into Archive
- Gaia Archive
- Make release
- Additional data filtering criteria
- Global validation
- (Partial) reprocessing

Why do higher level processing?
- Characterization of over 1 billion sources needed to enhance value of astrometry
  - source classification and astrophysical parameterization
  - variable source classification and characterization
  - complex astrometric solutions for non-single stars
  - solar system object orbits and taxonomy
  - extended source identification and characterization
- Effort too large for individuals or groups in astronomical community
  - lack of interest and motivation to tackle all Gaia sources
- Higher level processing is our strongest internal data quality control

Challenges faced by DPAC
Scheduling
- Complex inter-dependencies
  - Many feedback loops
  - Many processes cannot run in parallel: dependent on input from upstream processes
- Need to plan for
  - processing time
  - data transfer time
  - data arrangement before processing
  - output validation after processing
  - global validation as early as possible...
- Solutions
  - Extensive operations planning before launch
  - End-to-end tests and operations rehearsals
  - Close monitoring and coordination by Project Office
  - Flexibility to accommodate unanticipated problems

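The constraint that a process can only start once its upstream inputs exist can be expressed as a dependency graph and ordered topologically. The sketch below uses Python's standard `graphlib` module; the task names and the dependencies between them are hypothetical, loosely inspired by the flows in this talk, and do not reproduce DPAC's actual operations plan.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each task maps to the set of tasks
# whose output it needs. Names and edges are illustrative assumptions.
deps = {
    "IDT": {"telemetry"},
    "AGIS": {"IDT"},
    "PhotPipe": {"IDT"},
    "RVS": {"IDT", "AGIS", "PhotPipe"},
    "variability": {"PhotPipe"},
    "astrophysical_params": {"PhotPipe", "RVS"},
    "validation": {"AGIS", "variability", "astrophysical_params"},
}


def schedule(graph):
    """Return batches of tasks; tasks within a batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for step, batch in enumerate(schedule(deps), start=1):
    print(step, batch)
```

A genuine feedback loop would make `prepare()` raise `CycleError`, so in a sketch like this each loop has to be unrolled into successive processing cycles before it can be scheduled.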
Data processing and coordination
- Keeping up with daily telemetry (typically 60 million transits/day)
  - data not ordered, arrival times up to two weeks after observation, not all inputs readily available
- Initial Data Treatment and First Look
  - in-depth analysis
  - feedback to spacecraft operations

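One generic way to cope with records that arrive out of order but with a bounded delay is a reorder buffer with a watermark: a record is released only once nothing older can still arrive. The sketch below illustrates that technique with the slide's two-week bound; it is not a description of how IDT handles telemetry, and the class and method names are hypothetical. It assumes `push` is called in non-decreasing order of arrival day.

```python
import heapq

MAX_DELAY_DAYS = 14  # slide: arrival up to two weeks after observation


class ReorderBuffer:
    """Hold out-of-order records and release them in observation order."""

    def __init__(self, max_delay=MAX_DELAY_DAYS):
        self.max_delay = max_delay
        self._heap = []
        self._count = 0  # tie-breaker so payloads are never compared

    def push(self, obs_day, arrival_day, payload):
        """Store a record; return the records that are now safe to release."""
        heapq.heappush(self._heap, (obs_day, self._count, payload))
        self._count += 1
        # Anything observed on or before this day must already be here.
        return self._release(arrival_day - self.max_delay)

    def flush(self):
        """Release everything that is left, e.g. at the end of a run."""
        return self._release(float("inf"))

    def _release(self, watermark):
        out = []
        while self._heap and self._heap[0][0] <= watermark:
            obs_day, _, payload = heapq.heappop(self._heap)
            out.append((obs_day, payload))
        return out


buf = ReorderBuffer()
released = []
for obs_day, arrival_day in [(1, 2), (5, 6), (3, 10), (20, 21), (9, 22)]:
    released += buf.push(obs_day, arrival_day, f"transit@{obs_day}")
released += buf.flush()
print([day for day, _ in released])
```

The cost of this design is latency: nothing is processed until the full delay bound has elapsed, which is one reason a fixed watermark may be too blunt for a daily pipeline that also has to feed back to spacecraft operations.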
Data processing and coordination
- Data complexity, including unanticipated features
  - need to adapt software during operations
- Software and IT infrastructure
  - development and testing timescales often underestimated
  - mix of astronomers and software engineers essential, but can also lead to communication problems
- Lots of time spent on understanding and preparing data
  - ordering; are all necessary auxiliary data available?; adapting to data model changes
- Validation at both CU level and at global level very time consuming
- Data volume and data transfers
  - volume estimates converging but not perfect yet
  - transfer time scales between DPCs significant
- Data accounting

Data processing and coordination
- Dealing with events in parallel to operations
  - bugs, performance issues
  - reprocessing of data to fix errors
  - testing, rehearsals, deployment of new/updated software
- DPAC workload is very high
  - much research and development still needed
  - stress within stretched DPAC teams

Data processing and coordination
[Organisation chart. Labels as extracted:]
- MLA Steering Committee, Mission/Project manager, Project Scientist, Gaia Science Team
- DPACE, DPAC Project Office
- CU: Coordination Unit
  - CU1 System architecture
  - CU2 Simulations
  - CU3 Core processing
  - CU4 Object processing
  - CU5 Photometric processing
  - CU6 Spectroscopic processing
  - CU7 Variability processing
  - CU8 Astrophysical parameters
  - CU9 Archive and catalogue
- DPC: Data Processing Centre
  - DPCE ESAC
  - DPCB Barcelona
  - DPCT Torino
  - DPCI Cambridge
  - DPCG Geneva
  - DPCC CNES
Coordination of 450 members and 6 processing centres
- many interface issues to monitor and resolve
- frequent communication along varied channels essential

Data processing examples
Initial data treatment and core calibrations
- Monitoring of image location results (here formal errors along scan)
Figure courtesy DPAC/CU3/IDT team

Initial data treatment and core calibrations
- PSF mapping across the astrometric focal plane
- PSF/LSF modelling for high accuracy image location
[Figure: FOV1]
Figures courtesy DPAC/CU5/DU10@IfA

Initial data treatment and core calibrations
- PSF mapping across the astrometric focal plane
- PSF/LSF modelling for high accuracy image location
[Figure: FOV2]
Figures courtesy DPAC/CU5/DU10@IfA

Radiation damage monitoring
[Figure with two solar flares marked. Credits: C. Crowley, R. Kohley]
- Radiation damage and CTI evolution monitored through charge injections
- Extrapolation indicates end-of-life radiation damage significantly lower (factor 10) than pre-launch predictions

Photometric throughput monitoring and calibration
Credits: CU3/FL - CU5/DPCI

Source list creation
- Grouping of transits and linking to sources
[Figures: spurious sources example; cross-match input source distribution; cross-match output source distribution]
Credits: CU3 IDT and IDU teams

Science Alerts
- Over 1000 transient events so far
- dominated by supernovae and cataclysmic variables (lens example above, http://www.cosmos.esa.int/web/gaia/iow_20161027)
Credits: DPAC/Science Alerts team

Solar system alerts
- SSO alert on known asteroid GAIA1158 = (126970) 2002 FZ21
- Gaia first observation Apr 12, follow-up on Apr 28 with C2PU Telescope
Credits: CU4/SSO team

Toward BP/RP spectrophotometry
Three step internal calibration of BP/RP spectra
1. geometric calibration (alignment of spectra)
2. differential flux calibration
3. derive mean spectrum
To be followed by wavelength and absolute flux calibration
Credits: CU5/DPCI team

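The three internal calibration steps can be illustrated on a toy problem. The sketch below is not the CU5 pipeline: it assumes each epoch spectrum of a single source differs from a reference only by an integer pixel shift and an overall flux scale, aligns by maximising a cross-correlation, and normalises to the reference total flux. All function names and the sample data are hypothetical.

```python
# Toy version of the three steps (assumptions: integer pixel shifts,
# one multiplicative flux factor per epoch, a non-variable source).


def best_shift(ref, spec, max_lag=3):
    """Integer lag that maximises the cross-correlation of spec with ref."""
    n = len(ref)

    def corr(lag):
        return sum(ref[k] * spec[k + lag] for k in range(n) if 0 <= k + lag < n)

    return max(range(-max_lag, max_lag + 1), key=corr)


def shift(spec, lag):
    """Return out with out[k] = spec[k + lag], zero-padded at the edges."""
    n = len(spec)
    return [spec[k + lag] if 0 <= k + lag < n else 0.0 for k in range(n)]


def calibrate(epochs, ref_index=0):
    ref = epochs[ref_index]
    # Step 1: geometric calibration (align every epoch to the reference).
    aligned = [shift(s, best_shift(ref, s)) for s in epochs]
    # Step 2: differential flux calibration (common internal flux scale).
    ref_flux = sum(ref)
    scaled = [[v * ref_flux / sum(s) for v in s] for s in aligned]
    # Step 3: mean spectrum over all epochs.
    return [sum(col) / len(scaled) for col in zip(*scaled)]


reference = [0.0, 0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0, 0.0]
epoch_b = [0.0, 0.0, 0.0, 2.0, 8.0, 18.0, 8.0, 2.0, 0.0, 0.0]  # shifted, x2
epoch_c = [0.0, 0.5, 2.0, 4.5, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0]   # shifted, x0.5
mean_spectrum = calibrate([reference, epoch_b, epoch_c])
print(mean_spectrum)
```

Normalising each epoch to the same total flux would erase real variability, so this only makes sense as an illustration; a real differential calibration has to separate instrument response from source behaviour, which requires many sources observed under many conditions. The result is still in pixel and instrumental flux units, which is why wavelength and absolute flux calibration follow.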
Radial velocity spectrograph
- First complete run of global RVS pipeline
- Preliminary validation results are very promising
- Figure shows residuals of v_rad with respect to 4 data sets
- Precision at bright end around 500 m s^-1 and at faint end 2.5 km s^-1
- Around G_RVS = 8, RVS three times as precise as RAVE (CS15 data set)
Credits: CU6 team

DPAC and JASMINE programme
- Space astrometric missions will be needed in the future
  - maintenance of the reference frame
  - pursuit of scientific questions at comparable or higher accuracy than Gaia
- Options are being considered in Europe
  - Theia: targeted mission, 50 nano-arcsec
  - GaiaNIR: near-infrared version of Gaia
- No guarantee that future space astrometry missions will be done by ESA
- Given space mission time scales, maintaining expertise is a major issue
- JASMINE programme therefore very welcome
- DPAC ready to continue collaboration to mutual benefit