Data Replication in LIGO


Data Replication in LIGO
Kevin Flasch, for the LIGO Scientific Collaboration
University of Wisconsin-Milwaukee
06/25/07

Outline
» LIGO and the LIGO Scientific Collaboration
» Basic data challenge
» Specific problems & challenges
» LDR
» LSCdataFind
» Successes
» Warts
» Future of LIGO
» Future of LDR

LIGO, LIGO Science
» Facility dedicated to the detection and use of cosmic gravitational waves.
» Two sites: Livingston, LA and Hanford, WA.
» Three interferometers: two at Hanford, one at Livingston.
[Photo: 4 km LIGO interferometer in Livingston, LA]
» Partnership with Virgo (Italy and France) and GEO (Germany and the United Kingdom).
» LIGO is supported by the NSF.

LIGO Scientific Collaboration
The LIGO Scientific Collaboration (LSC) currently includes 428 people at 52 different institutions. Data replication mainly occurs at Caltech, MIT, the interferometer sites (Livingston and Hanford), UWM, Penn State, the Albert Einstein Institute (Germany), Cardiff (UK), and Birmingham (UK).
[Map: Birmingham, Cardiff, AEI/Golm]

Basic Data Challenge
The basic issue is distributing approximately one TB of raw data per day to all sites.
» Data is continually generated at both interferometer sites (LLO and LHO) during science runs, long periods of uninterrupted data collection; the current run, S5, has lasted over a year and a half.
» Caltech (CIT) retrieves the data from LHO and LLO and provides access to it for Tier 2 sites (all sites besides CIT, LLO and LHO).
» Tier 2 sites replicate from CIT or from other sites that have already transferred the desired data.
» Processed data sets (e.g., filtered or calibrated) are occasionally created at various sites; they are initially replicated from the site of origin.

Specific Problems & Challenges: Metadata Service
» All data must be described in some fashion by a specific metadata schema.
» Metadata must be generated continually during a science run.
» Metadata must be distributed constantly and consistently to each site that needs it.
» Example metadata fields:
  gpsstart: 815497955 (seconds since the beginning of the GPS epoch)
  gpsend: 815498048
  runtag: S5
  frametype: H1_RDS_C03_L2
  md5: 28329c0eee60dbbde352a1ba94bca61f
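The example fields above can be modeled as a simple record. The following is a minimal sketch, assuming a dict-based representation; the field names come from the slide, but the validation rules are illustrative, not LDR's actual schema:

```python
# Sketch of an LDR-style metadata record. Field names are from the talk;
# the validation logic below is an assumption for illustration only.

REQUIRED_FIELDS = {"gpsstart", "gpsend", "runtag", "frametype", "md5"}

def validate_record(record: dict) -> bool:
    """Check that a record has every required field and a sane GPS interval."""
    if not REQUIRED_FIELDS <= record.keys():
        return False
    return record["gpsstart"] < record["gpsend"]

example = {
    "gpsstart": 815497955,   # seconds since the beginning of the GPS epoch
    "gpsend": 815498048,
    "runtag": "S5",
    "frametype": "H1_RDS_C03_L2",
    "md5": "28329c0eee60dbbde352a1ba94bca61f",
}
```

A record like `example` would be generated continually at the interferometer sites and shipped to every site that needs it.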

Specific Problems & Challenges: Storage of Data
» Each site has its own in-house storage solution:
  Most have some configuration of commodity hard disk drives; CIT uses SAM-QFS (disk and tape).
  Local filesystems and layout may differ as well. For example: UWM uses 24 NFS-mounted storage servers; Cardiff stores data on 100 compute nodes; CIT has one large SAM-QFS filesystem.
» Must provide a way for administrators to store incoming data on their systems in a customizable way.

Specific Problems & Challenges: Data Location
Data is not distributed equally.
» Sites must be able to pick and choose what particular data they want to replicate, driven by users' requests.
» Sites must be able to tell what specific data another site has, in order to replicate what they themselves need.
Users need to locate and access data.
» There are computing clusters at all sites, and users may be at any one of them.
» Users must find which sites have the data they want.
» Users, and their computing jobs, must be able to locate the physical location of data at a given site.

LDR
LDR, the Lightweight/LIGO Data Replicator, was created to solve these problems.
» Lightweight: a minimal code base wrapped around other services.
» LIGO: the code is built around LIGO's needs.
» What data we have: a custom metadata service.
» Where data is located: Globus RLS.
» Authenticated, fast data transfer: a custom GridFTP client with a standard server.
» Ease of data transfer: easy for administrators to pick and choose data to replicate and data to make available.

LDR runs at each site as a few separate daemons:
» LDRMaster: monitors the other daemons.
» LDRSchedule: finds and schedules files for transfer.
» LDRTransfer: supervises the transfer and storage of files.
» LDRMetadataServer: serves local metadata to other sites.
» LDRMetadataUpdate: updates the local metadata database.
LDR relies on a few other important pieces: MySQL, Globus RLS (Replica Location Service), the Globus GridFTP server, and pyglobus (a Python interface to the Globus Toolkit).
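The master/worker split above can be sketched as a small supervision loop. The daemon names come from the slide, but the restart policy and polling loop here are illustrative assumptions, not LDR's actual implementation:

```python
# Sketch of an LDRMaster-style supervisor: start each worker daemon and
# restart any that exit. The restart policy is an illustrative assumption.
import subprocess
import time

def supervise(commands, poll_interval=5.0, max_cycles=None):
    """Launch each named worker; restart exited ones; return the process table.

    commands: dict mapping a worker name (e.g. "LDRSchedule") to its argv list.
    max_cycles: stop after this many polling cycles (None = run forever).
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for name, proc in procs.items():
            if proc.poll() is not None:   # worker exited -> restart it
                procs[name] = subprocess.Popen(commands[name])
        time.sleep(poll_interval)
        cycles += 1
    return procs
```

In LDR's case the workers would be LDRSchedule, LDRTransfer, LDRMetadataServer, and LDRMetadataUpdate, each a long-running daemon.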

LDR (continued)
Each site fulfills certain roles:
» Some publish new data, some provide data, some replicate data (or any combination).
» New data is published into the metadata catalog and RLS for other sites to replicate.
Local storage:
» Each site has its own storage solution.
» An administrator modifies a local storage module to govern how incoming data will be stored and recorded, via functions like newholdingfile(), enterfile(), newfilecallback(), and failedtransfercallback().

LDR and LSCdataFind
» We needed a way for users to easily find available data.
» The work was already done in LDR itself to find data to replicate to other sites, so a user tool, LSCdataFind, was built on the LDR backend.
» LSCdataFind uses a local RLS and metadata service to let users specify characteristics of the data they want (metadata fields like gpsstart, for example) and receive usable physical locations.
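Conceptually, this is a join of a metadata query against the replica catalog. The toy sketch below makes that concrete; the in-memory dicts stand in for the real metadata service and Globus RLS, and the logical file name and path are invented for illustration:

```python
# Toy model of an LSCdataFind-style lookup: filter the metadata catalog by the
# user's constraints, then map matching logical file names (LFNs) to physical
# URLs (PFNs). The dicts stand in for the metadata service and Globus RLS.

METADATA = {
    "H-H1_RDS_C03_L2-815497955-93.gwf": {
        "gpsstart": 815497955, "gpsend": 815498048,
        "runtag": "S5", "frametype": "H1_RDS_C03_L2",
    },
}
REPLICAS = {  # LFN -> physical locations known at this site
    "H-H1_RDS_C03_L2-815497955-93.gwf":
        ["file:///data/s5/H-H1_RDS_C03_L2-815497955-93.gwf"],
}

def datafind(frametype, gpsstart, gpsend):
    """Return physical URLs for files of `frametype` overlapping [gpsstart, gpsend)."""
    urls = []
    for lfn, meta in METADATA.items():
        if (meta["frametype"] == frametype
                and meta["gpsstart"] < gpsend
                and meta["gpsend"] > gpsstart):
            urls.extend(REPLICAS.get(lfn, []))
    return urls
```

A user (or a batch job) asks by frametype and GPS interval and gets back paths that are valid on the local cluster, without ever needing to know how the site lays out its storage.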

LSCdataFind Example
[Slide: screenshot of an LSCdataFind query and its output]

Successes
» Replicated over 770 TB of raw and processed S5 data so far.
» Reliable, good-enough transfer rates (10-15 MB/s CIT → UWM).
» A usable tool (LSCdataFind) for users to locate data at sites.
» Small core development team.
» Involved community.
» Dependable software, in production!
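As a back-of-envelope check on "good enough": moving roughly one TB of raw data per day requires only about 12 MB/s of sustained throughput, so the observed 10-15 MB/s CIT → UWM is adequate for keeping pace with acquisition:

```python
# Back-of-envelope: average rate needed to replicate ~1 TB of raw data per day.
TB = 1e12                      # bytes (decimal terabyte)
SECONDS_PER_DAY = 86400
required_mb_per_s = TB / SECONDS_PER_DAY / 1e6   # ~11.6 MB/s
```

Rates at the low end of the observed range leave little headroom, which is one reason scaling is a concern for later runs.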

Lag Plot for Data Transfer
[Plot: time delay of data transfer from the interferometer sites to CIT for further Tier 2 replication]

Warts
» No 24/7 reliability: issues coping with sites going down; the unintelligent backend doesn't determine the best (or alternative) places to go.
» Had issues with RLS reliability (problems addressed, thanks to the RLS team!).
» Not very user- or administrator-friendly: relies on learning much new terminology and software, and on support from the LSC community; the interface is clumsy and obfuscated.

Future of LIGO
» The next data run, S6, is slated to begin in June 2009. LDR must be able to scale to the amount of data it will need to track and replicate.
» Enhanced LIGO (S6) will increase the sensitivity of the interferometers.
» Advanced LIGO will greatly increase the sensitivity, and therefore the replication and storage requirements for all new data.
» Advanced LIGO will also likely bring increased demand for faster turnaround in specific data replication.

Future of LDR
» Move the metadata daemons to WSRF-compliant services, probably built on the Globus Java WS Core.
» Integrate Lots Of Small Files / pipelined GridFTP: we replicate many big files, but increasingly more small files, such as user-processed ones; pipelining will help us maintain good transfer rates.
» Improve monitoring by leveraging Globus MDS4.
» Investigate integrating Globus RFT and Globus DRS.
» Focus on stability and scaling...

Scaling
Metadata:
» About 17,800,000 files are tracked at CIT currently; we have managed to keep scaling our metadata services to this point.
» Starting to feel strain; we will need to cope with scaling much higher for S6.
Data transfer:
» Current data rates are acceptable and will continue to be.
» No worries about scaling with GridFTP; the only limitation is the network.
User demands:
» Currently, we are able to handle user requests for data location.
» We expect more users, more queries, and faster expected response times.

Credits
Current development team:
» Stuart Anderson, Gerald Davies, Kevin Flasch, Filippo Grimaldi, Steffen Grunewald, Ben Johnson, Scott Koranda, Dan Kozak, Greg Mendel, Brian Moe, Murali Ramsunder, David Stops, Igor Yakushin
Alumni:
» Bruce Allen, Paul Armor, Keith Bayer, Patrick Brady, Junwei Cao, Mike Foster, Tom Kobialka, Adam Mercer
More information:
» LIGO: http://www.ligo.caltech.edu/
» UWM LSC: http://www.lsc-group.phys.uwm.edu/
» LDR: http://www.lsc-group.phys.uwm.edu/ldr/