Mining Climate Data. Michael Steinbach Vipin Kumar University of Minnesota /AHPCRC


Mining Climate Data
Michael Steinbach, Vipin Kumar
University of Minnesota / AHPCRC

Collaborators: G. Karypis, S. Shekhar (University of Minnesota/AHPCRC); V. Chadola, S. Iyer, G. Simon, P. Zhang (UM/AHPCRC); P. N. Tan (Michigan State University); C. Potter (NASA Ames Research Center); S. Klooster (California State University, Monterey Bay).

NASA funded project: Discovery of Changes from the Global Carbon Cycle and Climate System Using Data Mining. Additional support from the Army High Performance Computing Research Center. Access to computing facilities was provided by the AHPCRC and the Minnesota Supercomputing Institute.

M. Steinbach Mining Climate Data

Overview
Background
Data Mining Tasks
Detection of Disturbances and Associations
Discovery of Climate Indices
Distributed Issues
Conclusion

Research Goal
Find global climate patterns of interest to Earth Scientists.
A key interest is finding connections between the ocean / atmosphere and the land.
Data: global snapshots of values for a number of variables (e.g., average monthly temperature, NPP, pressure, precipitation, SST) on land surfaces or water, spanning a range of 10 to 50 years.
[Figure: gridded data. Stacks of global snapshots per variable; axes are latitude, longitude, and time; labels mark a grid cell and a zone.]
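The gridded layout described above can be pictured as a three-dimensional array indexed by time, latitude, and longitude. The following is a minimal sketch under assumed conditions: the 5-degree grid, the 10-year monthly span, the random values, and the helper name `cell_series` are illustrative choices, not details from the slides.

```python
import numpy as np

# Assumed grid: 5-degree cells, 10 years of monthly snapshots (illustrative only).
lats = np.arange(-87.5, 90.0, 5.0)    # cell-centre latitudes, 36 values
lons = np.arange(-177.5, 180.0, 5.0)  # cell-centre longitudes, 72 values
n_months = 120

rng = np.random.default_rng(0)
# One variable (say SST) as a (time, latitude, longitude) array of synthetic values.
sst = rng.normal(size=(n_months, lats.size, lons.size))

def cell_series(data, lat, lon):
    """Return the time series of the grid cell whose centre is nearest (lat, lon)."""
    i = int(np.argmin(np.abs(lats - lat)))
    j = int(np.argmin(np.abs(lons - lon)))
    return data[:, i, j]

snapshot = sst[0]                      # one global snapshot (a single month)
series = cell_series(sst, 10.0, 10.0)  # one grid cell followed through time
```

Slicing along the first axis gives a global snapshot; fixing the last two axes gives the time series of one grid cell, which is the unit most of the later analyses operate on.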

The El Nino Climate Phenomenon
El Nino is the anomalous warming of the eastern tropical region of the Pacific.
Normal Year: Trade winds push warm ocean water west, cool water rises in its place.
El Nino Year: Trade winds ease, switch direction, warmest water moves east.
http://www.usatoday.com/weather/tg/wetnino/wetnino.htm


Detection of Ecosystem Disturbances
Ecosystem disturbances can be detected as sudden changes in greenness in satellite data.
FPAR: Fraction of Photosynthetically Active Radiation absorbed by the green part of vegetation.
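One way to picture "sudden changes in greenness" is to remove the seasonal cycle from a monthly FPAR series and flag months where the smoothed anomaly stays well below normal. The sketch below is an illustration only, not the method used in the project: the 12-month window, the 1.5 standard deviation threshold, the synthetic data, and the function name `flag_sustained_drop` are all assumptions.

```python
import numpy as np

def flag_sustained_drop(fpar, window=12, n_sigma=1.5):
    """Return indices of months whose smoothed FPAR anomaly is unusually low.

    window and n_sigma are illustrative assumptions, not values from the slides.
    """
    fpar = np.asarray(fpar, dtype=float)
    months = np.arange(fpar.size) % 12
    # Mean seasonal cycle: one average value per calendar month.
    climatology = np.array([fpar[months == m].mean() for m in range(12)])
    anomaly = fpar - climatology[months]
    # Moving average so that single noisy months are not flagged.
    smooth = np.convolve(anomaly, np.ones(window) / window, mode="same")
    return np.flatnonzero(smooth < -n_sigma * anomaly.std())

# Synthetic 18-year monthly record with a seasonal cycle and a little noise.
rng = np.random.default_rng(0)
t = np.arange(216)
fpar = 0.5 + 0.2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.02, t.size)

# Simulated disturbance: greenness drops for two years, then recovers.
disturbed = fpar.copy()
disturbed[120:144] -= 0.3

flagged = flag_sustained_drop(disturbed)
```

Smoothing over a full year is a design choice that trades detection delay for robustness: short dips from noise or cloud contamination are averaged out, while a multi-month drop survives.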

Detection of Ecosystem Disturbances
Major ecosystem disturbances detected in North America.
[Figure: NASA image of patterns in the 18-year record (1982-1999) of global satellite observations of vegetation greenness from the Advanced Very High Resolution Radiometer (AVHRR). Different colored areas identify the major ecosystem disturbance events detected and the year they occurred. The majority of potential disturbance events pictured occurred in boreal forest ecosystems of Canada or shrublands and rangelands of the southern United States.]
Release: 03-51AR. NASA DATA MINING REVEALS A NEW HISTORY OF NATURAL DISASTERS. NASA is using satellite data to paint a detailed global picture of the interplay among natural disasters, human activities and the rise of carbon dioxide in the Earth's atmosphere during the past 20 years.
http://amesnews.arc.nasa.gov/releases/2003/03_51ar.html
[Figure: Smoke over Borneo, Indonesia]

Mining Associations in Earth Science Data: Challenges

Transaction Id   Items
1                Bread, Milk
2                Beer, Diaper, Bread, Eggs
3                Beer, Coke, Diaper, Milk
4                Beer, Bread, Diaper, Milk
5                Coke, Bread, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}

How to transform Earth Science data into transactions?
What are the baskets?
What are the items?
How to define support?
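For readers unfamiliar with the terms, support and confidence for the two market-basket rules above can be computed directly from the five transactions. This is a minimal sketch using the standard definitions (support as the fraction of transactions containing all items of the rule, confidence as support of the whole rule divided by support of its left-hand side); the function names are illustrative.

```python
# The five market-basket transactions from the table above.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Beer", "Diaper", "Bread", "Eggs"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Bread", "Diaper", "Milk"},
}

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)

milk_coke = (support({"Milk", "Coke"}), confidence({"Milk"}, {"Coke"}))
diaper_milk_beer = (
    support({"Diaper", "Milk", "Beer"}),
    confidence({"Diaper", "Milk"}, {"Beer"}),
)
```

Both rules are supported by 2 of the 5 transactions (40%). {Milk} --> {Coke} has confidence 2/4 = 50%, and {Diaper, Milk} --> {Beer} has confidence 2/3, about 67%.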

Mining Association Patterns in Earth Science Data: Challenges

(Lat,Long,time)   Events
(10N,10E,1)       {Temp-Hi, Prec-Lo}
(10N,10E,2)       {Temp-Hi, Prec-Lo, NPP-Lo}
(10N,11E,2)       {Temp-Hi, NPP-Lo}
(10N,11E,5)       {Solar-Hi, NPP-Lo}
(10N,11E,10)      {Prec-Hi, PET-Lo}

Example rules:
1  FPAR-HI PET-HI PREC-HI SOLAR-HI TEMP-HI ==> NPP-HI (support count=145, confidence=100%)
2  FPAR-HI PET-HI PREC-HI TEMP-HI ==> NPP-HI (support count=933, confidence=99.3%)
3  FPAR-HI PET-HI PREC-HI ==> NPP-HI (support count=1655, confidence=98.8%)
4  FPAR-HI PET-HI PREC-HI SOLAR-HI ==> NPP-HI (support count=268, confidence=98.2%)

How to efficiently discover spatio-temporal associations?
  Use existing algorithms.
  Develop new algorithms.
How to identify interesting patterns?
  Use objective interest measures.
  Use domain knowledge.
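The (Lat,Long,time) table above suggests one possible answer to "what are the baskets and items": each grid cell at each time step is a basket, and the items are discretized events such as Temp-Hi. The sketch below shows one way such transactions could be built. The z-score threshold of 1.5, the function names, and the toy data are assumptions made for illustration; the slides do not state how events were actually defined.

```python
import numpy as np

def to_events(name, series, n_sigma=1.5):
    """Label each time step 'NAME-Hi', 'NAME-Lo', or None using a z-score cut.

    The 1.5 standard deviation threshold is an illustrative assumption.
    The series must not be constant, or the z-score is undefined.
    """
    series = np.asarray(series, dtype=float)
    z = (series - series.mean()) / series.std()
    events = []
    for value in z:
        if value > n_sigma:
            events.append(f"{name}-Hi")
        elif value < -n_sigma:
            events.append(f"{name}-Lo")
        else:
            events.append(None)
    return events

def build_transactions(cell_data, n_sigma=1.5):
    """Map {(lat, lon): {variable: series}} to {(lat, lon, t): set of events}."""
    transactions = {}
    for (lat, lon), variables in cell_data.items():
        for name, series in variables.items():
            # Time steps are numbered from 1, as in the table above.
            for t, event in enumerate(to_events(name, series, n_sigma), start=1):
                if event is not None:
                    transactions.setdefault((lat, lon, t), set()).add(event)
    return transactions

# Toy data: one grid cell, ten time steps, an extreme value in the last step.
cell_data = {
    ("10N", "10E"): {
        "Temp": [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
        "Prec": [0, 0, 0, 0, 0, 0, 0, 0, 0, -5],
    }
}
transactions = build_transactions(cell_data)
```

Once the data is in this form, standard association rule algorithms can be applied, with support counted over (cell, time) baskets.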

Example of Interesting Association Patterns
FPAR-Hi ==> NPP-Hi (sup=5.9%, conf=55.7%)
Rule has high support in shrubland areas.
[Figure label: Shrubland areas]


Climate Indices: Connecting the Ocean/Atmosphere and the Land
A climate index is a time series of temperature or pressure.
  Similar to business or economic indices.
  Based on Sea Surface Temperature (SST) or Sea Level Pressure (SLP).
Climate indices are important because
  They distill climate variability at a regional or global scale into a single time series.
  They are well-accepted by Earth scientists.
  They are related to well-known climate phenomena such as El Niño.
[Figure: Dow Jones Index (from Yahoo)]

A Temperature Based Climate Index: NINO1+2
[Figure: Nino 1+2 index time series with El Nino events marked.]
[Figure: Correlation Between Nino 1+2 (ANOM 1+2) and Land Temperature (>0.2); map with axes longitude and latitude.]
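A correlation map like the one in this figure can be produced by correlating one index time series with the temperature series of every grid cell. The following is a minimal sketch on synthetic data; the function name `correlation_map` is illustrative, and reading the ">0.2" in the title as a cut on the absolute correlation is an assumption.

```python
import numpy as np

def correlation_map(index, field):
    """Pearson correlation between a 1-D index of shape (time,) and every
    cell of a field of shape (time, lat, lon)."""
    index = np.asarray(index, dtype=float)
    field = np.asarray(field, dtype=float)
    zi = (index - index.mean()) / index.std()
    zf = (field - field.mean(axis=0)) / field.std(axis=0)
    # Mean over time of the product of standardized series, per grid cell.
    return np.tensordot(zi, zf, axes=(0, 0)) / index.size

# Synthetic stand-ins for a climate index and a land temperature field.
rng = np.random.default_rng(1)
index = rng.normal(size=60)
field = rng.normal(size=(60, 4, 5))
field[:, 0, 0] = index    # a cell that follows the index exactly
field[:, 0, 1] = -index   # a cell that moves opposite to the index

corr = correlation_map(index, field)
mask = np.abs(corr) > 0.2  # assumed reading of the ">0.2" display threshold
```

Cells where `mask` is true would be the ones drawn on the map; the rest would be left blank.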

SST Clusters
[Figure: 107 SST Clusters; map with axes longitude and latitude.]

SST Clusters That Reproduce El Nino Indices

El Nino Regions Defined by Earth Scientists (matching cluster number in parentheses):

Niño Region   Longitude Range   Latitude Range
1+2 (94)      90 W-80 W         10 S-0
3 (67)        150 W-90 W        5 S-5 N
3.4 (78)      170 W-120 W       5 S-5 N
4 (75)        160 E-150 W       5 S-5 N

Cluster   Nino Index   Correlation
94        NINO 1+2     0.9225
67        NINO 3       0.9462
78        NINO 3.4     0.9196
75        NINO 4       0.9165

[Figure: map showing clusters 75, 78, 67, and 94; axes longitude and latitude.]
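The comparison in the second table can be sketched as follows: average the SST anomalies inside a region's latitude/longitude box to get an index, then correlate that index with a cluster's centroid time series. The box limits below come from the region table above (west and south written as negative numbers). Everything else is an assumption for illustration: the 5-degree grid, the synthetic anomalies, the unweighted (not area-weighted) mean, and the stand-in centroid. NINO 4 is left out because its box crosses the date line and would need wrap-around handling.

```python
import numpy as np

# (lon_min, lon_max, lat_min, lat_max); west and south are negative.
NINO_BOXES = {
    "NINO 1+2": (-90, -80, -10, 0),
    "NINO 3": (-150, -90, -5, 5),
    "NINO 3.4": (-170, -120, -5, 5),
}

def box_index(sst_anom, lats, lons, box):
    """Unweighted mean SST anomaly over a lat/lon box, one value per time step."""
    lon_min, lon_max, lat_min, lat_max = box
    lat_mask = (lats >= lat_min) & (lats <= lat_max)
    lon_mask = (lons >= lon_min) & (lons <= lon_max)
    return sst_anom[:, lat_mask][:, :, lon_mask].mean(axis=(1, 2))

# Assumed 5-degree grid and a synthetic, spatially uniform anomaly field.
lats = np.arange(-87.5, 90.0, 5.0)
lons = np.arange(-177.5, 180.0, 5.0)
signal = np.sin(np.arange(24) / 3.0)
sst_anom = signal[:, None, None] * np.ones((1, lats.size, lons.size))

nino34 = box_index(sst_anom, lats, lons, NINO_BOXES["NINO 3.4"])

# Stand-in for a cluster centroid: the same signal plus a little noise.
rng = np.random.default_rng(2)
centroid = signal + rng.normal(0, 0.1, signal.size)
r = np.corrcoef(nino34, centroid)[0, 1]
```

With real data, `centroid` would be the mean time series of the grid cells assigned to one SST cluster, and `r` would correspond to an entry in the correlation column.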

An SST Cluster Moderately Correlated to Known Indices
[Figure: Cluster 29 vs. Known El Nino Climate Indices: Nino 1+2, Nino 3, Nino 3.4, Nino 4, and SOI. Panel title: Cluster 29 vs. SOI, ANOM 12, ANOM3, ANOM34, ANOM4 (mincorr = 0.2). Map with axes longitude and latitude; color scale labeled Corr Diff.]


Need for High Performance Computing
SNN clustering analysis requires O(n^2) comparisons.
Association rule algorithms can also be very compute intensive, potentially much greater than O(n^2).
The amount of memory required for clustering and association rule algorithms can exceed the 4GB of traditional sequential servers.
Computing the pairwise correlation between every pair of land and ocean pixels is very time consuming on a sequential computer.
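To make the cost of the last point concrete: correlating every land pixel with every ocean pixel produces an n_land by n_ocean matrix, which may not fit in memory at once. One common workaround, sketched below under assumed names and sizes, is to process land pixels in blocks and keep only a summary per pixel (here, the most strongly correlated ocean pixel). This is an illustration of the computational pattern, not the project's implementation; each block is also independent, which is what makes the task easy to distribute.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance version of each row of a (pixels, time) array."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def strongest_ocean_match(land, ocean, block=1024):
    """For each land pixel, find the ocean pixel with the largest |correlation|.

    land: (n_land, T), ocean: (n_ocean, T). Only a (block, n_ocean) slice of
    the full correlation matrix is held in memory at any one time.
    """
    zl, zo = standardize(land), standardize(ocean)
    n_time = zl.shape[1]
    best_idx = np.zeros(zl.shape[0], dtype=int)
    best_val = np.zeros(zl.shape[0])
    for start in range(0, zl.shape[0], block):
        stop = start + block
        corr = zl[start:stop] @ zo.T / n_time      # Pearson correlations
        j = np.abs(corr).argmax(axis=1)
        best_idx[start:stop] = j
        best_val[start:stop] = corr[np.arange(corr.shape[0]), j]
    return best_idx, best_val

# Tiny synthetic check: two land pixels built from known ocean pixels.
rng = np.random.default_rng(3)
ocean = rng.normal(size=(5, 50))
land = np.vstack([ocean[3], -ocean[1]])
best_idx, best_val = strongest_ocean_match(land, ocean, block=1)
```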

High Resolution EOS Data
EOS satellites provide high resolution measurements.
  Finer spatial grids:
    An 8 km x 8 km grid produces 10,848,672 data points.
    A 1 km x 1 km grid produces 694,315,008 data points.
  More frequent measurements.
  Multiple instruments.
  Generates terabytes of data per day.
High resolution data allows us to answer more detailed questions:
  Detecting patterns such as trajectories, fronts, and movements of regions with uniform properties.
  Finding relationships between leaf area index (LAI) and topography of a river drainage basin.
  Finding relationships between fire frequency and elevation as well as topographic position.
[Figure: Earth Observing System (e.g., Terra and Aqua satellites)]
http://www.crh.noaa.gov/lmk/soo/docu/basicwx.htm

Distributed System For Analyzing Earth Science Data
[Diagram: system architecture.]
Data source labels: Distributed Data Sources, Local Data, Climate Data, Model Data, DAAC, ESIP, ...
Modules: Data Acquisition, Fusion and Transformation; Data Exploration and Pre-Processing; Data Mining; Output and Presentation.
Component labels: Subsetting, Event Detection, Clustering, Statistical Analysis, Visualization, Trajectory Analysis, Association Analysis, Deviation Detection, Classification and Regression.

Data Acquisition, Fusion, and Transformation
[Diagram: system architecture with the Data Acquisition, Fusion and Transformation module highlighted.]
This module will provide the ability to acquire the data necessary for the analysis, and since the data comes from heterogeneous sources, the ability to fuse and transform the data.

Data Acquisition, Transformation, and Fusion Challenges
Locate and download data when it becomes available.
  Web services such as directory services and peer-to-peer networking capabilities for file sharing.
Data fusion.
  Conversion between different formats: HDF, HDF-EOS, netCDF, binary, ASCII, Earth Science Markup Language (ESML), Geography Markup Language (GML).
Data transformation.
  Transformations such as scaling, radiometric conversion, sampling in time, aggregation, and mathematical or geometrical map transformations to convert the incoming data to the same coordinate system.
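As an example of the aggregation step listed above, a fine grid can be brought to a coarser resolution by averaging non-overlapping blocks of cells. The sketch below is a minimal illustration with an assumed function name; it ignores missing values, map projections, and the changing area of latitude/longitude cells, all of which a real transformation would need to handle.

```python
import numpy as np

def block_mean(grid, factor):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor blocks."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (coarse index, position within block), then
    # average over the two within-block axes.
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

fine = np.arange(16).reshape(4, 4)
coarse = block_mean(fine, 2)
```

With `factor=8`, each coarse cell would average 64 fine cells, which is the ratio between a 1 km and an 8 km grid.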

Data Exploration and Pre-processing
[Diagram: system architecture with the Data Exploration and Pre-Processing module highlighted.]
This module consists of several visualization, statistical, and time series preprocessing tools for supporting the exploratory analysis of large-scale Earth Science datasets. Such tools can be used to aid scientists in gaining an initial insight into the distribution, regularity, and quality of the input data.

Data Mining
[Diagram: system architecture with the Data Mining module highlighted.]
Data mining technology offers a suite of advanced decision support tools to facilitate the automatic generation of scientific hypotheses from data.

Data Mining and Exploration Challenges
Complex data distribution.
  Data may be split according to time periods, region, attribute, etc.
  Example: To obtain different types of data for a given point on the Earth it is often necessary to pull data from many sources.
Distributed computation.
  For resource or other reasons, data mining tasks may be better executed by distributing the computation across resources in multiple organizations.
  Example: Finding specific events of interest; summarizing data.
Time vs. Accuracy Tradeoff.
  User should be able to make choices.
  Example: Time series similarity using correlation vs. time series similarity computed using dynamic time warping.
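The time vs. accuracy example above can be illustrated with two short series that have the same shape but are shifted by one step. Correlation is cheap (linear in the series length) but penalizes the shift; dynamic time warping (DTW) costs time proportional to the product of the two lengths but can align the shifted shapes. The sketch below uses the textbook DTW recurrence with absolute difference as the local cost; the function name and the toy series are illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with |a_i - b_j| as local cost.

    Runs in O(len(a) * len(b)) time, versus O(len(a)) for correlation.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or step both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Same bump, shifted by one time step.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]

r = np.corrcoef(x, y)[0, 1]     # lowered by the shift
dtw = dtw_distance(x, y)        # warping absorbs the shift
```

Here the correlation is about 0.46 while the DTW distance is 0, so the two measures disagree on how similar the series are; which one is appropriate depends on whether timing differences matter for the question being asked.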

Output and Presentation
[Diagram: system architecture with the Output and Presentation module highlighted.]
Output and presentation tools will convert results to the common representations (e.g., MIME, ESML/GML/XML, jpeg, binary, etc.). Results can be posted to the Web and discussed in a collaborative fashion or easily incorporated into more traditional publications.

Output and Presentation Challenges
Allow others to locate and download results when they become available.
  Web services such as directory services and peer-to-peer networking capabilities for file sharing.
  Describing the data via XML, Earth Science Markup Language, etc.
Data Transformation.
Privacy and Policy Constraints.
  Different levels of access needed.

Usage Scenario
Download software from a web site and install it.
Launch the application and collect data from well-known sources on the Web or from local sources.
The system selects the appropriate format transformation and data fusion steps to convert all the data into a single coregistered format.
Use the metadata associated with the retrieved data to understand the resolution, spatio-temporal framework, attributes.
User input.
Use analysis tools and subsets of the data to perform preprocessing, data exploration, data mining, and post-processing.
Select results to be published on the Web, allowing collaboration and access via the Internet.

Conclusions
Disturbance and association analysis can uncover interesting patterns for Earth Scientists to investigate.
By using clustering we have made some progress towards automatically finding climate patterns that display interesting connections between the ocean and the land.
Many more opportunities for data mining/data analysis in Earth Science data.
Many opportunities for distributed computing to play a useful or critical role.

Questions? More information can be found at http://www.ahpcrc.umn.edu/nasa-umn/index.html