Big Data and Geospatial Cyberinfrastructure for Advancing Applications
Presented at the GIScience 2012 Big Data and CyberGIS Panel
Budhendra Bhaduri
September 20, 2012, Columbus, OH
Geospatial Cyberinfrastructure
- Provides access to best-in-class, geographically distributed resources: data, scalable computation, and visualization
- Platform for data integration and knowledge dissemination
- Enables on-time and on-demand information and knowledge delivery, particularly for time-critical mission support
In the Garden of Big and Evil
Big Data is your mother-in-law!
- You know she exists
- You are (still) trying to understand her
- You feel she's influencing your life
- Sometimes a challenge, but sometimes a solution
- Not sure when to deal with her and when to ignore her
- But your wife knows you love her (and she's right)!
How BIG is big?
- What we can't handle today
It's not about the needle, but about defining the haystack!
A CyberGIS-enabled GIScience community could:
- Define the needle (new challenges)
- Do better in finding the needle (novel analytics)
- Make a bigger haystack (keep adding data)
ORNL activities
Population Assessment
- Web service for LandScan population data
- Population Density Tables portal
Settlement and Damage Assessment
- Dynamic computation for settlement mapping and damage assessment
Collaborative Knowledge Discovery
- Bioenergy KDF: cyber platform for sharing data and resources
- Sensorpedia: Wikipedia for sensors
Transportation Analysis
- Routing and contingency analysis
- Evacuation modeling
Energy Security
- EARSS: real-time monitoring for the electric grid
- Large-scale biomass monitoring
Visualization (iGlobe)
- Climate data analytics and visualization
- Spatio-temporal data mining as a service
LandScan Population Distribution and Dynamics Model and Database
[Figure panel labels: Census, LandScan Global, Gridded, Night, Day, LandScan USA]
As the finest population distribution data ever produced for the world and the US, LandScan Global and LandScan USA are the community standard for estimating population at risk.
Neighborhood mapping: From local interactions to global realizations
Damascus, Syria
Unstructured settlements
- Lowest to lower-middle income
- Rural migrants
Very loosely structured
- Historical ethnic quarters/neighborhoods
- Poor residents currently being displaced in some areas by urban development/tourism
Formal urban planning
- Typical urban services
- Middle to upper income
Addis Ababa, Ethiopia
- Hardware: 2 Xeon quad-core 2.4 GHz CPUs + 4 Tesla GPUs + 48 GB
- Image analyzed: 0.3 m, 40,000 x 40,000 pixels (800 sq. km), RGB bands
- Overall accuracy: 93%
- Settlement class: 89%
- Non-settlement class: 94%
- Total processing time: 27 seconds
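The figures on this slide imply a processing throughput that is easy to work out. The following is a small arithmetic sketch using only the pixel dimensions, band count, and processing time quoted above; the variable names are illustrative and nothing here reflects how the settlement mapping code itself is organized.

```python
# Figures quoted on the slide.
width_px = 40_000
height_px = 40_000
bands = 3                  # RGB
processing_seconds = 27

total_pixels = width_px * height_px
pixels_per_second = total_pixels / processing_seconds
band_samples_per_second = total_pixels * bands / processing_seconds

print(f"Total pixels: {total_pixels:,}")
print(f"Throughput: {pixels_per_second / 1e6:.1f} million pixels/s")
print(f"Band samples: {band_samples_per_second / 1e6:.1f} million samples/s")
```

At roughly 59 million pixels per second, the quoted run time corresponds to about 1.6 billion pixels classified in under half a minute.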
Settlement characterization tool
LandScan Data Accessed Through Google Earth Interface
[Figure: Fargo, ND on June 20, 2007 and July 19, 2007; labeled classes: Sunflower, Corn, Soybeans]
Successful change prediction with Gaussian Process Model
- MODIS NDVI time series from Iowa
- 6 years (2001-2006), 23 observations per year
- Trained on the first 5 years and monitored the last year
- Accuracy was 88% on a validation set consisting of 97 labeled time series with 13 true changes
[Figure: time series panels labeled No Change and Change, showing predicted values, observed values, and variance]
Varun Chandola, Ranga Raju Vatsavai: Scalable Time Series Change Detection for Biomass Monitoring Using Gaussian Process. NASA CIDU 2010: 69-82 (one of the best papers, invited to SADM Journal).
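The train-then-monitor setup described above can be illustrated with a much simpler stand-in. The sketch below is not the Gaussian process model from the cited paper: it replaces it with a per-slot seasonal mean and standard deviation learned from the first five years, then flags the sixth year when enough observations fall outside the expected band. The thresholds (`z_threshold`, `min_fraction`) and the synthetic NDVI-like series are hypothetical, chosen only to make the example self-contained.

```python
import math
import statistics

OBS_PER_YEAR = 23   # from the slide
TRAIN_YEARS = 5     # from the slide: train on 5 years, monitor the 6th


def seasonal_baseline(series):
    """Per-slot mean and standard deviation over the training years."""
    train = [series[y * OBS_PER_YEAR:(y + 1) * OBS_PER_YEAR]
             for y in range(TRAIN_YEARS)]
    means, stds = [], []
    for slot in range(OBS_PER_YEAR):
        values = [year[slot] for year in train]
        means.append(statistics.mean(values))
        stds.append(statistics.stdev(values))
    return means, stds


def flag_change(series, z_threshold=3.0, min_fraction=0.25):
    """Flag the monitored year if enough observations leave the expected band.

    Both thresholds are hypothetical tuning parameters.
    """
    means, stds = seasonal_baseline(series)
    start = TRAIN_YEARS * OBS_PER_YEAR
    monitored = series[start:start + OBS_PER_YEAR]
    outliers = 0
    for obs, mu, sd in zip(monitored, means, stds):
        sd = max(sd, 1e-6)  # guard against a zero spread
        if abs(obs - mu) / sd > z_threshold:
            outliers += 1
    return outliers / OBS_PER_YEAR >= min_fraction


def ndvi_curve(slot):
    """Synthetic seasonal curve: low in winter, peaking mid-year."""
    return 0.2 + 0.6 * math.sin(math.pi * slot / (OBS_PER_YEAR - 1)) ** 2


def jitter(year, slot):
    """Small deterministic year-to-year variation (synthetic)."""
    return 0.01 * ((7 * year + 3 * slot) % 5 - 2)


# Six synthetic years with a stable seasonal cycle.
stable = [ndvi_curve(s) + jitter(y, s)
          for y in range(6) for s in range(OBS_PER_YEAR)]
# Same training years, but the monitored year stays flat (e.g. vegetation removed).
changed = stable[:TRAIN_YEARS * OBS_PER_YEAR] + [0.2] * OBS_PER_YEAR

print("stable  ->", flag_change(stable))
print("changed ->", flag_change(changed))
```

A Gaussian process would additionally model correlation between observations and provide a predictive variance at every time step, which is what the variance band in the slide's figure refers to; the baseline here only captures the seasonal mean and spread.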
Wide area biomass monitoring in near real time is becoming a reality
- MODIS tile (4800 x 4800 pixels), 42,76,383 time series
- FROST: an SGI Altix ICE 8200 cluster at ORNL with 128 compute nodes, each with 16 virtual cores and 24 GB of RAM
- Multicore (multithreaded) and distributed (message passing) computing strategy

Configuration                 Run time
Serial                        41,105 seconds (11.4 hours)
Threads (16)                  5,872 seconds (1.6 hours)
MPI (96 nodes)                604 seconds (10 minutes)
MPI + Threads (1536 cores)    34 seconds
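The run times above translate directly into speedup over the serial run. The sketch below computes speedup and a rough parallel efficiency from the quoted timings only. One assumption to note: for the MPI-only row the slide gives a node count rather than a core count, so the efficiency for that row is per node and is not directly comparable with the other rows.

```python
# (label, parallel units, run time in seconds), as quoted on the slide.
# For "MPI (96 nodes)" the unit is nodes, not cores (an assumption about
# how to read that row).
timings = [
    ("Serial", 1, 41_105),
    ("Threads (16)", 16, 5_872),
    ("MPI (96 nodes)", 96, 604),
    ("MPI + Threads (1536 cores)", 1536, 34),
]

serial_seconds = timings[0][2]
results = {}

for label, units, seconds in timings:
    speedup = serial_seconds / seconds
    efficiency = speedup / units
    results[label] = (speedup, efficiency)
    print(f"{label:<28} speedup {speedup:8.1f}x   efficiency {efficiency:5.2f}")

# 96 nodes with 16 virtual cores each gives the 1536 cores in the last row.
cores_in_hybrid_run = 96 * 16
```

The hybrid MPI + threads run is about 1,200 times faster than the serial run, which is what moves a tile-scale analysis from hours to seconds.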
Scalable analytics and visualization
iGlobe: an integrated visualization and analysis environment
- Built using the open source NASA World Wind Java SDK library
- Collaboration among ORNL, NASA Ames, CSIRO, NCAR, NOAA, and the University of Kansas
- Support for different data access mechanisms via OGC-compliant web services
- Allows interactive visualization and analysis of time series data
- Support for server-side and client-side data analysis algorithms for identifying patterns in spatiotemporal data
- Support for advanced time series and spatial analysis
- Support for advanced visualization capabilities (vector fields, animations)
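As an illustration of data access through an OGC-compliant web service, the sketch below builds a WMS 1.3.0 GetMap request for one time step of a time series layer. The slide does not say which OGC services are used, so WMS is an assumption here, and the endpoint `https://example.org/wms`, the layer name `ndvi`, and the bounding box are all hypothetical placeholders. The query parameter names themselves come from the WMS standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(endpoint, layer, bbox, time_iso, width=512, height=512):
    """Build a WMS 1.3.0 GetMap URL for a single time step.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time_iso,   # optional time dimension defined by WMS
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, layer, and an approximate, illustrative bounding box.
url = build_getmap_url(
    endpoint="https://example.org/wms",
    layer="ndvi",
    bbox=(-96.6, 40.4, -90.1, 43.5),
    time_iso="2006-07-12",
)
print(url)

# Round-trip the query string to confirm the parameters are encoded as intended.
query = parse_qs(urlsplit(url).query)
print(query["REQUEST"], query["TIME"])
```

Requesting the same layer with successive `TIME` values is one simple way a client could step through a time series for animation or analysis.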
Population Density Tables (PDT)
- Population per 1,000 ft²
- 42 structural facility categories in 8 land use classes
- Documented data sources and methodology with traceable provenance
- Open source collection from reputed sources
  - Published sources include academic journals, official government statistics, corporate and university webpages, and tourism brochures
  - Utilizing other sources such as GeoCommunity, Wikipedia, Panoramio, and Wikimapia
- Spatial resolution/extent: Region, Nation, City, Neighborhood
- Temporal resolution: Diurnal, Workweek/weekend, Episodic/holidays/special event, Seasonal
- Political and socio-economic characteristics will be included in the future: Ethnic, Religious, Racial, Political affinity, Economic strength
- Available as a web service on SIPR: https://pdt.ornl.doe.sgov.gov
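A density expressed as population per 1,000 ft² converts to an occupancy estimate by scaling with floor area. The sketch below shows that arithmetic for a day and night lookup. Every density value, category name, and the table layout are hypothetical and are not taken from the PDT itself.

```python
import math

# Hypothetical densities in people per 1,000 ft², keyed by
# (facility category, time period). Not actual PDT values.
density_per_1000_sqft = {
    ("office", "day"): 4.0,
    ("office", "night"): 0.2,
    ("residential", "day"): 0.8,
    ("residential", "night"): 2.5,
}


def estimate_occupancy(category, period, floor_area_sqft):
    """Estimated number of occupants for a facility of the given floor area."""
    density = density_per_1000_sqft[(category, period)]
    return density * floor_area_sqft / 1000.0


office_day = estimate_occupancy("office", "day", 50_000)
office_night = estimate_occupancy("office", "night", 50_000)

print(f"Office, 50,000 ft², day:   {office_day:.0f} people")
print(f"Office, 50,000 ft², night: {office_night:.0f} people")
```

Keying the table by time period reflects the diurnal resolution listed above; the same structure would extend to weekday/weekend or seasonal periods.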
Making Big Data Bigger and Better
- CyberGIS can create a community platform for data
  - What if we could assemble all the LULC data and maps produced by graduate student thesis research?
- Collective contributions from scientific communities
  - Similar to Internet interest groups and social networks (Facebook, Twitter)
- Individual contributions through VGI and crowdsourcing
  - May become part of our open data economy
- It seems we want more without knowing exactly how much more we really need to solve specific problems
- It's probably all about organizing the haystack(s), and CyberGIS can help

Managed by UT-Battelle