DATA and KNOWLEDGE INFRASTRUCTURE ANALYTICS, KNOWLEDGE DISSEMINATION

Similar documents
Observational Data Standard - List of Entities and Attributes

Spatial Data Infrastructure Concepts and Components. Douglas Nebert U.S. Federal Geographic Data Committee Secretariat

Land Use Methods & Metrics Development Outcome

Calculating the Natura 2000 network area in Europe: The GIS approach

Recent Topics regarding ISO/TC 211 in Japan

File Geodatabase Feature Class. Tags platts, price assessement, crude oil, crude, petroleum

世界在线植物志 (World Flora Online) 项目介绍

Subwatersheds File Geodatabase Feature Class

HIGH RESOLUTION BASE MAP: A CASE STUDY OF JNTUH-HYDERABAD CAMPUS

LUCAS Technical reference document U1 LUCAS Survey data user guide. (Land Use / Cover Area Frame Survey)

ISO INTERNATIONAL STANDARD. Geographic information Spatial referencing by coordinates

CHAPTER 7 PRODUCT USE AND AVAILABILITY

INSPIRE - A Legal framework for environmental and land administration data in Europe

Orbital Insight Energy: Oil Storage v5.1 Methodologies & Data Documentation

Shale Plays. File Geodatabase Feature Class. Tags shale plays, basins, gas production, regions

8 th Meeting of IAEG-SDGs 5 8 November 2018, Stockholm, Sweden. Meetings. Report

METADATA. Publication Date: Fiscal Year Cooperative Purchase Program Geospatial Data Presentation Form: Map Publication Information:

Creating an e-flora for South Africa

Open Data meets Big Data

Survey Protocols for Monitoring Status and Trends of Pollinators

Arctic ecosystem services: TEEB Arctic Scoping study. Alexander Shestakov WWF Global Arctic Programme 3 December Arctic Biodiversity Congress

ISO INTERNATIONAL STANDARD. Geographic information Metadata Part 2: Extensions for imagery and gridded data

THE GLOBAL INDICATOR FRAMEWORK: NEW AND INNOVATIVE METHODS FOR DISAGGREGATIN BY GEOLOCATION

ISO INTERNATIONAL STANDARD. Geographic information Metadata Part 2: Extensions for imagery and gridded data

Reducing Consumer Uncertainty

Data Dictionary for Observation Data Transcription Reports from the Colorado Natural Heritage Program

INSPIRE Basics. Vlado Cetl European Commission Joint Research Centre.

Motivation for project:

Infrastructure for Spatial Information in Europe (INSPIRE) Steve Peedell

How a Media Organization Tackles the. Challenge Opportunity. Digital Gazetteer Workshop December 8, 2006

Restoration efforts required for achieving the objectives of the Birds and Habitats Directives

Spatial Planning for Protected Areas in Response to Climate Change (SPARC)

Global Distribution of Northern Fur Seals (2013)

PAD-US (CBI Edition) Version 2: Standards and Procedures

Introduction. Elevation Data Strategy. Status and Next Steps

SMEX04 Bulk Density and Rock Fraction Data: Arizona

CoB_Bounds_Full_201802

1. PURPOSE 2. PERIOD OF PERFORMANCE

Metadata Drava & Mura River Basins (Austria)

INSPIRE Monitoring and Reporting Implementing Rule Draft v2.1

Guide to the NBN Atlas Occurrence Record Upload Template for species datasets

New York City Public Use Microdata Areas (PUMAs) for 2010 Census

Integration for Informed Decision Making

Creating a Pavement Management System Using GIS

GBIF Global Biodiversity Information Facility. Dmitry Schigel, Tim Hirsch Arctic data and Global Biodiversity Information Facility

The study of insect species diversity and modern approaches

Declaration Population and culture

Quality and Coverage of Data Sources

Best Practices in Geospatial Metadata

Land Accounts - The Canadian Experience

SEA ICE OUTLOOK 2016 Report

Digitisation of insect collections An Australian perspective

Animal Sound Archives and Zoological Museums

National assessment on member s readiness to produce indicators

CUNY Tax Lots, New York NY, Sept 2016

Introduction to Part III Examining wildlife distributions and abundance using boat surveys

Soil Moisture Measurements

Colin Bray, OSi CEO. Collaboration to develop a data platform for geospatial and statistical information in Ireland

SAFARI 2000 NBI Vegetation Map of the Savannas of Southern Africa

Roadmap to interoperability of geoinformation

Marine Spatial Planning in the Baltic Sea Region

INSPIREd solutions for Air Quality problems Alexander Kotsev

Carbon Cycle Sample Site Set- up - Student Field Guide. Sample Site Corner Team

Chitra Sood, R.M. Bhagat and Vaibhav Kalia Centre for Geo-informatics Research and Training, CSK HPKV, Palampur , HP, India

Post-doc fellowships to non-eu researchers FINAL REPORT. Home Institute: Centro de Investigaciones Marinas, Universidad de La Habana, CUBA

econtentplus GS Soil

Future Ocean Floor Mapping: Ocean Stewardship & Initial Industry Contributions. U.S Hydro Galveston, TX March 23, 2017 David Millar - Fugro

Putting the U.S. Geospatial Services Industry On the Map

Topographic Strategy National Topographic Office March 2015

SEA ICE OUTLOOK 2016 Report

INTEGRATION OF BIODIVERSITY DATABASES IN TAIWAN AND LINKAGE TO GLOBAL DATABASES

Lesser Sunda - Banda Seascape Atlas

Phase One Development of a Comprehensive GIS for the Mentor Marsh and its Proximal Watershed

Compact guides GISCO. Geographic information system of the Commission

Summary Description Municipality of Anchorage. Anchorage Coastal Resource Atlas Project

Geographical Information Systems Energy Database Report. WP1 T1.3- Deliverable 1.9

Boreal Surface water inventory - metadata

The Global Statistical Geospatial Framework and the Global Fundamental Geospatial Themes

Flood Hazard Zone Modeling for Regulation Development

CLT/HER/CHP/OG 1- page 29

CLIMATE PREFERENCES FOR TOURISM: AN EXPLORATORY TRI-NATION COMPARISON. New Zealand.

Data Origin. Ron van Lammeren CGI-GIRS 0910

Global Geospatial Information Management Country Report Finland. Submitted by Director General Jarmo Ratia, National Land Survey

Amy Driskell. Laboratories of Analytical Biology National Museum of Natural History Smithsonian Institution, Wash. DC

UN-GGIM: Europe ExCom 1 June 2016 Francfort Work Group A «Core Data» Status and Progress François Chirié, France

Utilization and Provision of Geographical Name Information on the Basic Map of Japan*

A Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series

APPLICATION OF GIS IN WATER SUPPLY MANAGEMENT NETWORK

UN COMMITTEE OF EXPERTSON GLOBAL GEOSPATIAL INFORMATION MANAGEMENT. August 13 15, 2012 COUNTRY REPORT OF THE PHILIPPINES

DATA SOURCES AND INPUT IN GIS. By Prof. A. Balasubramanian Centre for Advanced Studies in Earth Science, University of Mysore, Mysore

Data Origin. How to obtain geodata? Ron van Lammeren CGI-GIRS 0910

19.2 Geographic Names Register General The Geographic Names Register of the National Land Survey is the authoritative geographic names data

New York City Council Districts Water Included

The production and use of a hydrographic flow-direction network of surface waters. Rickard HALLENGREN, Håkan OLSSON and Erik SISELL, Sweden

Canadian Urban Environmental Health Research Consortium

ISO Agroclimate Data Data Product Specification. Revision: A

Directorate E: Sectoral and regional statistics Unit E-4: Regional statistics and geographical information LUCAS 2018.

Approach to Field Research Data Generation and Field Logistics Part 1. Road Map 8/26/2016

2010 Wildlife Management Unit 347 moose

2!3! (9)( 8x3 ) = 960x 3 ( 720x 3 ) = 1680x 3

Transcription:

DATA and KNOWLEDGE INFRASTRUCTURE ANALYTICS, KNOWLEDGE DISSEMINATION

Contributors from multiple sectors add to Map of Life

Consumers use Map of Life knowledge for societal needs

BUT TO PROPERLY USE THESE DATA WE NEED TO UNDERSTAND DIFFERENT SOURCES OF DATA AND HOW TO DESCRIBE THEM Description Summary inventory (limited protocol & effort report) Summary inventory (some protocol and effort reporting) Single person/group inventory: observation Inventory following protocol: stationary trapping Inventory following protocol: active sampling campaigns Full inventory following very defined protocol Inventory following loose protocol: citizen science observation Example Protected area species list Standardized area survey (e.g. atlas grid cell) Standardized area survey (e.g. transect count) Camera traps & more typical trappings fish, zooplancton netting, algal sampling CTFS forest plot, Revelle plots transect Contribut., Quality many, heterogen. many, heterogen. multi, unclear single, high single, & vetted clear single, high & vetted Effort Report Very limited multiple single, Possible clear singe, somewhat clear single, high single, & vetted clear single, high single, & vetted clear single, single, heterogen., clear unvetted Yes Yes Yes Source Raw data Literature no Literature, Project reporting Project data Project data Project data Perfect: Project full data coverage Yes Project data no yes yes yes yes yes Protocols Temporal Scope long (years) Geographic scope Clear (often >1km) long Clear (month (often s, >1km) years) short (hours, days) short (hours, days) usually short short short (hours, days) Clear (m to 1km) Very small (meters) Small (e.g. meters) V. small (e.g. meters) Clear (m to 1km) Complete &suited Reporting absence basis inference Multi (observati Yes on, Photo, Lit,) Single (e.g. yes Observati on) Single (e.g. Observati on) Single (e.g. Observati on) Single (e.g. Observati on) Single (e.g. Observati on) Single (e.g. Observati on) no somewhat (over very small extent) no yes (over very small extent) no Suited input occpy. Model No no Potent -ially Yes Yes NA Potent -ially

A metadata schema for collating data from inventories Humboldt Core Version 1 Area species checklists Gridded Atlas survey Protected area species list Geographically restricted surveys Transect count Trapping and netting CTFS forest, Revelle plots General dataset & identification terms Geospatial & Habitat Scope Terms Temporal Scope Terms Taxonomic Scope Terms inventory perfomed by; dataset name, identifier, publisher, licensing, rights holders; metadata recorded by; citation reference and id; taxa identifier by; identification quality; cited taxonomic authority Geospatial scope; areal extent; total area inventoried; number of sites; site names and details; lat/lon by site; elevation range and units; habitats included and excluded. Survey time blocks; start and end year, month day; time units spent In blocks; daily start, end time; study diurnality, study season. Prospective taxonomic scope inclusion and exclusion; distribution status included and excluded; developmental stage included and excluded, size classes included and excluded. Methodology Description Terms Completeness & effort terms Inventory type; Compiled data Y/N & type; abundances and/or absences reported? Absence list Completeness reported and how; Inferred taxonomic completeness Upper/lower bound and how. Inventory type, protocol name, detail, citation, reference, abundances reported Y/N absences reported? Effort reporting & lower/upper bound and granular breakdown; effort method; Vouchers or samples taken and how?

Putting it into practice Assembling area species checklist data and metadata from the literature Team of Boulder and Yale developers and students assembled metadata for (so far) 143 area checklists and collated information about area checklists characteristics Humbolt Core Term Possible Values Percents Compilation effort Low, medium, high, na Low 38.7%, Medium 6.2%, High 8.5%, n.a 45.7% Abundances reported Yes/No Yes-57.5% No-42.6 Absences reported Yes/No Yes-41.8, No-58.2% Completeness assessment Scale in 25 percent increments from 0-100 50-100% complete 30.3%, 75-100% - 27.9%, other- 41.8%

A tool for the long-tail data on YOUR computer

Point (and list) Uploader Key Points The point uploader is the first of many upload tools Metadata about datasets provides critical ownership and rights data As well as essential content for better use of the datasets e.g. probabilistic assessment of absences Data provided may be kept private for use Or made available for broader use, curation, improvement Metadata also provides value for downstream modeling (more expert versus novice data)

TAXONOMY TRIVIAL BUT TERRIFYING ISSUES Whether citizen science data or data from museum records, data cleaning before import is important and provides value for providers and consumers Based on a gold standard, hand vetted set of 500 museum digitized label data in VertNet: - 7.8% of scientific names could not be resolved at all - 32% of names are unaccepted but could be resolved to accepted names - 2.6% are misspelled and unaccepted names - 10% are misspellings of accepted names - 47.6% are current accepted names Take home: Huge issues with ingested data, requiring novel solutions

The non-trival problemtracking how naming meanings change Species X Species Y Lump Species X Initial circumscription New circumscription Species X Sp. X ssp1 Split Species X Species Y (ssp1) Initial circumscription New circumscription Geographic range outcome

* Based on AOU checklist (a conservative assessment) This is not a problem of old records or just some groups Taxonomic effort In North American birds 1885-2015 Primary descriptions Redescriptions that change taxon concepts

Big Challenges working with Names Reconciling Names on Ingest (for those resources where name validation has been inconsistent/problematic) Scrape list to text Import list to resolver Check name is accepted? If no, get accepted name Use new algorithmic approaches to find misspelling, revalidate, store location geom. & validated names Check names in MOL authority files

Feature Total Total records 200183168 (100.00%) (A) Geospatial issues Coordinates equal to zero 1456654 (0.73%) Impossible coordinates 10117 (0.01%) Low precision 16252100 (8.12%) Out of the specified country 14040820 (7.01%) Transposed coordinates 322734 (0.16%) Negated Latitude 272829 (0.14%) Negated Longitude 383919 (0.19%) (B) Spatio-taxonomic issues Inside range map 146877631 (73.37%) Less than 55Km 18408468 (9.20%) 55-111Km 2228994 (1.11%) 111-555Km 3783218 (1.89%) More than 555Km 5417136 (2.71%) Without range map assessment 23467721 (11.72%) Without RM assessment - 12912456 (6.45%) taxon issues A global assessment of terrestrial vertebrates using GBIF records. Data from Otegui and Guralnick CLEANING IS NOT A ONE STEP PROCESS... It is a constant process of further refining...

WHY IT ALL MATTERS - Reintegration of disparate data critical (but so is improving those data) - The data and communities assembling data are highly heterogeneous and disconnected - The data sciences components are not trivial. - Map of Life provides tools for ALL to provision data, metadata and provide innovative tools to help curate & improve it - To better serve needs for monitoring and assessment