DataONE: Enabling Data-Intensive Biological and Environmental Research through Cyberinfrastructure

Similar documents
Computational Biology, University of Maryland, College Park, MD, USA

Scientific Data Flood. Large Science Project. Pipeline

Environmental Summit: Consolidated view from earth science to biodiversity

Spatial Data Infrastructure Concepts and Components. Douglas Nebert U.S. Federal Geographic Data Committee Secretariat

Spatially Enabled Society

From Research Objects to Research Networks: Combining Spatial and Semantic Search

Astroinformatics: massive data research in Astronomy Kirk Borne Dept of Computational & Data Sciences George Mason University

Millennium Ecosystem Assessment

Data Origin. Ron van Lammeren CGI-GIRS 0910

Data Origin. How to obtain geodata? Ron van Lammeren CGI-GIRS 0910

Cyberinfrastructure and CyberGIS: Recent Advances and Key Themes

Realizing benefits of Spatial Data Infrastructure A user s perspective from Environment Agency - Abu Dhabi

City Models, Metrics, and Future Scenarios: Basis for an Urban Genome Project?

INDONESIA S S UPDATE: GEOSPATIAL INFORMATION FOR SUSTAINABLE DEVELOPMENT RELATED TO INA SDI Dr. Asep Karsidi, M.Sc

Atlas of Living Scotland New data infrastructure for research and citizen science

Archaeology & Digital Humanities

the IRIS Consortium Collaborative, Multi-user Facilities for Research and Education Briefing NSF Business Systems Review September 9, 2008

June 19 Huntsville, Alabama 1

GEON: Geosciences Network

GeoMAPP: Using Metadata to Help Preserve Geospatial Content

The Atlas of Living Australia

Oak Ridge Urban Dynamics Institute

The Open Geospatial Consortium and EarthCube

13 June Board on Research Data & Information (BRDI)

INTEGRATION OF BIODIVERSITY DATABASES IN TAIWAN AND LINKAGE TO GLOBAL DATABASES

Louis-François Guerre, Planet Action Coordinator Copenhagen, December Spot Image 2007

Overview. Gulf of Mexico Alliance Ocean and Coastal Mapping Regional Ecosystem Data Management (REDM) Q2O (QARTOD to OGC) Things to Consider

Data Sharing and Accessibility

Overview. Project Background Project Approach: Content and Application Development Application Demonstration Future Developments

The Open Geospatial Consortium and EarthCube

Data Intensive Computing meets High Performance Computing

Observatories in the context of the Digital Continent: CZO s and HIS

Education Toolkits for the 21 rst Century Ecological Research

Welcome. C o n n e c t i n g

Introduction: The Gulf of Mexico Alliance. The Gulf GAME project MERMAid and PHINS Results & Conclusions What s next? Examples

Assessing state-wide biodiversity in the Florida Gap analysis project

Application of GIS Technology in Watershed-based Management and Decision Making

EuroGEOSS Protected Areas Pilot

Remote Sensing for Ecosystems

Part 1: Fundamentals

Integration for Informed Decision Making

OCEANOGRAPHIC DATA MANAGEMENT

Conference panel Session 98, Defining Data Professionals A Geospatial Industry Perspective on Becoming a Data Professional

Landscape Planning and Habitat Metrics

LandScan Global Population Database

Better Topographic Map Production Using ArcGIS. A Comprehensive Solution for Mapping Organizations

The future of SDIs. Ian Masser

NetRA Resources Assessment (NetRA) 1

The Role of Urban Planning and Local SDI Development in a Spatially Enabled Government. Faisal Qureishi

Basic Dublin Core Semantics

the map Redrawing Donald Hobern takes a look at the challenges of managing biodiversity data [ Feature ]

Open data integration: from satellite to UAV for protection of built environment. The archaeological site in the Centa River bed in Albenga

Manual of Digital Earth

Innovation. The Push and Pull at ESRI. September Kevin Daugherty Cadastral/Land Records Industry Solutions Manager

Ontology Summit Framing the Conversation: Ontologies within Semantic Interoperability Ecosystems

ArcGIS. for Server. Understanding our World

An Introduction to Day Two. Linking Conservation and Transportation Planning Lakewood, Colorado August 15-16, 16, 2006

AAG Partnerships for. Sustainable Development in Africa: Geospatial Science & Technology for. Partnerships and Applications

Building a National Data Repository

TRAITS to put you on the map

CHAPTER 7 PRODUCT USE AND AVAILABILITY

1. Omit Human and Physical Geography electives (6 credits) 2. Add GEOG 677:Internet GIS (3 credits) 3. Add 3 credits to GEOG 797: Final Project

The Atlas Aspect of the Atlas of Living Australia

Key Points Sharing fosters participation and collaboration Metadata has a big role in sharing Sharing is not always easy

EnviroAtlas: An Atlas about Ecosystems and their Connection with People

Using Big Interagency Databases to Identify Climate Refugia for Idaho s Species of Concern

Big Data and Geospatial Cyberinfrastructure for Advancing Applications

Marine Spatial Planning: A Tool for Implementing Ecosystem-Based Management

Michael L. Norman Chief Scientific Officer, SDSC Distinguished Professor of Physics, UCSD

What s the problem? A Modern Odyssey in Search of Relevance. The search for relevance. Some current drivers for new services. Some Major Applications

Advancing Flood Detection and Preparedness through GEOSS Water Services

Discovery and Access of Geospatial Resources using the Geoportal Extension. Marten Hogeweg Geoportal Extension Product Manager

Roadmap to interoperability of geoinformation

Spatial Data Availability Energizes Florida s Citizens

Kansas Academic Standards Science Grade: 7 - Adopted: 2013

Open Data meets Big Data

Geoarchiving Kentucky State Government Data. Glen McAninch Kentucky Department for Libraries and Archives

The Current Status of EarthCube with an EarthScope Perspective. Tim Ahern IRIS Director of Data Services

Antarctic Marine Biodiversity Data Now Online

Introduction to Biosystematics. Course Website: Lecture 1: Introduction to Biological Systematics Outline: The role and value of Systematics

Colin Bray, OSi CEO. Collaboration to develop a data platform for geospatial and statistical information in Ireland

The Arctic Landscape Conservation Cooperative Conservation Goals

EuroGEOSS for Drought - Linking the European Drought Observatory to global and local scales

GIS Geographical Information Systems. GIS Management

Getting Started with Community Maps

NDIIPP: The National Digital Information Infrastructure and Preservation Program

Applied Geoscience and Technology Division SOPAC. Joy Papao, Risk Information Systems Officer

Introduction. Elevation Data Strategy. Status and Next Steps

VISualizing Terrestrial-Aquatic Systems (VISTAS)

Economic and Social Council

Data Dictionary for Network of Conservation Areas Transcription Reports from the Colorado Natural Heritage Program

Cross-border Maritime Spatial Plan for the Black sea - Romania and Bulgaria project

Reducing Consumer Uncertainty

Arctic Spatial Data Infrastructure Enabling Access to Arctic Location-Based Information

Plan4all (econtentplus project)

Ankur R Desai University of Wisconsin-Madison ESA 2018, SYMP 9 7 Aug 2018 New Orleans, LA

GIS at UCAR. The evolution of NCAR s GIS Initiative. Olga Wilhelmi ESIG-NCAR Unidata Workshop 24 June, 2003

Creating a Staff Development Plan with Esri

Experiences and Directions in National Portals"

Organizing Inter- and Transdisciplinary Research in TERRECO TERRECO Seminar, Winter term 2009/10

Transcription:

DataONE: Enabling Data-Intensive Biological and Environmental Research through Cyberinfrastructure Presented by John W. Cobb, Ph.D. Computer Science and Mathematics Office of Cyberinfrastructure National Science Foundation In collaboration with the DataONE team: PI: Bill Michener, University of New Mexico, and ORNL, U. Tennessee, UC Santa Barbara, Cornell, U. Cal. Digital library, USGS, Duke U., U, N. Carolina, U. Ill. Chicago, U. Kansas, Utah State U.

Global change research: Multiple data sources Smith, Knapp, Collins. In press. 2 Managed by UT-Battelle

With coupled human and natural systems 3 Managed by UT-Battelle

DataONE data sources Research networks and environmental observatories Biological specimens Individual scientists Citizen scientists data Natural resources and conservation data Observational data Global and continental land cover/land change and biogeochemical data 4 Managed by UT-Battelle

Heterogeneous data integration for scales small to large Decreasing Spatial Coverage Increasing Process Knowledge Intensive science sites and experiments Extensive science sites Volunteer & education networks Remote sensing Adapted from CENR-OSTP 5 Managed by UT-Battelle

Scattered data sources finding the needle in the haystack Data are massively dispersed Ecological field stations and research centers (100s) Natural history museums and biocollection facilities (100s) Agency data collections (100s to 1000s) Individual scientists (1000s to 10,000s to 100,000s) NCEAS LTER Sites UC Natural Reserves OBFS Sites 6 Managed by UT-Battelle

Data production exceeds storage Petabytes Worldwide 1,000,000 900,000 800,000 700,000 Transient information or unfilled demand for storage 600,000 500,000 400,000 Information Available Storage, 2007 264 EB Tape 21% 300,000 200,000 100,000 0 Available Storage 2005 2006 2007 2008 2009 2010 Disk 56% Optical 22% Other 1% Source: John Gantz, IDC Corporation: The Expanding Digital Universe 7 Managed by UT-Battelle

Poor data practice data entropy Information Content Time of publication Specific details General details Retirement or career change Accident Death Time (Michener et al. 1997) 8 Managed by UT-Battelle

Data deluge the flood of increasingly heterogeneous data Data are heterogeneous Syntax (format) Schema (model) Semantics (meaning) Jones et al. 2007 9 Managed by UT-Battelle

Distributed framework Coordinating Nodes Retain complete metadata catalog Subset of all data Perform basic indexing Provide network-wide services Ensure data availability (preservation) Provide replication services Flexible, scalable, sustainable network 10 Managed by UT-Battelle

Examples of data holdings Metadata interoperability across data holdings Data Archive Types of Data Managed Biodiversity, taxonomic, ecological Metadata Standard(s) BDP, DwC, DC, OGIS Biogeochemical dynamics, terrestrial ecological Earth observation imagery Ecological, biodiversity, biophysical, social, genomics, and taxonomic DIF, BDP, ECHO EML Avian populations and molecular biology DwC Biological and taxonomic DC subset Biophysical, biodiversity, disturbance, and Earth observation imagery Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic EML EML 11 Managed by UT-Battelle BDP = Biological Data Profile DC subset = Dublin Core subset DwC = Darwin Core DC = Dublin Core DIF = Directory Interchange Format ECHO = EOS ClearingHouse EML = Ecological Metadata Language OGIS = OpenGISZ

Supporting the data lifecycle Get Data NBII Replicate Data SAEON Upload Data/Metadata Publish Workflow escience Center Write Data Cal Digital Library Check for Data NCSA Authenticate UT Library Check Heartbeat Get Data Search Living Australia UCSB Node NCEAS ORC Node UNM Node Replicate Data NPN LTR USGS Library TERN NESCent Coenell UIC Library Upload Data/Metadata Morpho ESA The data lifecycle 12 Managed by UT-Battelle 1. Deposition/acquisition/ingest 2. Curation and metadata management 3. Protection, including privacy 4. Discovery, access, use, and dissemination 5. Interoperability, standards, and integration 6. Evaluation, analysis, and visualization

Use Case Scenario: Forecasting the spread of an invasive species across North America Integrated data analysis The Eurasian Collared Dove (Streptopelia decaocto) was introduced in the Bahamas in 1988. Since then it has spread across North America. There is concern that the Eurasian Collared Dove could compete with native dove species (i.e. Mourning Dove, Zenaida macrura, or Whitewinged Dove, Zenaida asiatica), both of which are economically beneficial. A analyst would like to predict how the invasive dove will spread over the next 20 years, and how it might impact the ranges of the other dove species. First, the analyst searches DataONE clearinghouse to discover and access data on the distribution of the dove species. They find that a continent-wide network of citizen scientists has gathered information on the occurrence of Eurasian Collared Dove since it was introduced. Second, analysis workflows are used to first explore and then predict patterns of species occurrence. Exploratory analysis techniques that identify the factors that best predict species occurrence and drive hypotheses generation and predictive analysis. Range expansion of Eurasian Collared Dove Sources of species observations are linked to landscape, climate, geographical and human factors. Observations are available through the distributed network of DataONE data providers. Data are organized via a core semantic model for observational data making data synthesis straightforward. Third, accurate long-range forecasting models for each species are presented. Predicted Ranges of 3 Dove species in 2025. 13 Managed by UT-Battelle Note: maps are examples of possible outcomes, and not actual representations of range.

Engaging citizens in science www.citizenscience.org 14 Managed by UT-Battelle

Libraries and digital libraries Academic institutions Research networks NSF- and government-funded synthesis and supercomputer centers/networks Governmental organizations International organizations Data and metadata archives Professional societies NGOs Commercial sector engaging diverse partners. 15 Managed by UT-Battelle

Contact John W. Cobb, Ph.D Computer Science and Mathematics (865) 576-5439 cobbjw@ornl.gov 16 Managed by UT-Battelle