The construction of complex networks from linear and nonlinear measures Climate Networks

Similar documents
What have we learned about our climate by using networks and nonlinear analysis tools?

Interacting Networks in Climate. Marcelo Barreiro Department of Atmospheric Sciences School of Sciences, Universidad de la República, Uruguay

Statistical modelling in climate science

Preferred spatio-temporal patterns as non-equilibrium currents

arxiv: v3 [physics.ao-ph] 27 Jan 2011

Exploring Climate Patterns Embedded in Global Climate Change Datasets

The Forcing of the Pacific Decadal Oscillation

A Ngari Director Cook Islands Meteorological Service

Separation of a Signal of Interest from a Seasonal Effect in Geophysical Data: I. El Niño/La Niña Phenomenon

ATMOSPHERIC MODELLING. GEOG/ENST 3331 Lecture 9 Ahrens: Chapter 13; A&B: Chapters 12 and 13

Q & A on Trade-off between intensity and frequency of global tropical cyclones

Simple Mathematical, Dynamical Stochastic Models Capturing the Observed Diversity of the El Niño Southern Oscillation (ENSO)

The importance of sampling multidecadal variability when assessing impacts of extreme precipitation

CHAPTER 1: INTRODUCTION

Forced and internal variability of tropical cyclone track density in the western North Pacific

Ecological indicators: Software development

GENERALIZED LINEAR MODELING APPROACH TO STOCHASTIC WEATHER GENERATORS

Seasonal Climate Watch September 2018 to January 2019

Stefan Liess University of Minnesota Saurabh Agrawal, Snigdhansu Chatterjee, Vipin Kumar University of Minnesota

Assessment of the Impact of El Niño-Southern Oscillation (ENSO) Events on Rainfall Amount in South-Western Nigeria

Seasonal Climate Watch July to November 2018

Seasonal Climate Watch April to August 2018

Will it be a Good Ski Season? Correlation between El Niño and U.S. Weather

TOOLS AND DATA NEEDS FOR FORECASTING AND EARLY WARNING

Which Climate Model is Best?

GENERALIZED LINEAR MODELING APPROACH TO STOCHASTIC WEATHER GENERATORS

Appearance of solar activity signals in Indian Ocean Dipole (IOD) phenomena and monsoon climate pattern over Indonesia

Link between Hurricanes and Climate Change: SST

statistical methods for tailoring seasonal climate forecasts Andrew W. Robertson, IRI

2013 ATLANTIC HURRICANE SEASON OUTLOOK. June RMS Cat Response

Seasonal Climate Watch June to October 2018

JP1.7 A NEAR-ANNUAL COUPLED OCEAN-ATMOSPHERE MODE IN THE EQUATORIAL PACIFIC OCEAN

TROPICAL-EXTRATROPICAL INTERACTIONS

Chapter outline. Reference 12/13/2016

18. ATTRIBUTION OF EXTREME RAINFALL IN SOUTHEAST CHINA DURING MAY 2015

Transformational Climate Science. The future of climate change research following the IPCC Fifth Assessment Report

HMM part 1. Dr Philip Jackson

Lecture 8: Natural Climate Variability

Seasonal Climate Watch January to May 2016

Ocean cycles and climate ENSO, PDO, AMO, AO

NOTES AND CORRESPONDENCE. A Quantitative Estimate of the Effect of Aliasing in Climatological Time Series

EL NIÑO/LA NIÑA UPDATE

Weather Outlook for Spring and Summer in Central TX. Aaron Treadway Meteorologist National Weather Service Austin/San Antonio

NON-STATIONARY & NON-LINEAR ANALYSIS, & PREDICTION OF HYDROCLIMATIC VARIABLES OF AFRICA

Author's personal copy

NIWA Outlook: March-May 2015

SIO 210: Data analysis methods L. Talley, Fall Sampling and error 2. Basic statistical concepts 3. Time series analysis

Construction and Analysis of Climate Networks

El Niño / Southern Oscillation

Historical Changes in Climate

Hurricane Risk: Importance of Climate Time Scale and Uncertainty

ENSO-DRIVEN PREDICTABILITY OF TROPICAL DRY AUTUMNS USING THE SEASONAL ENSEMBLES MULTIMODEL

Hilbert analysis unveils inter-decadal changes in large-scale patterns of surface air temperature variability

An ENSO-Neutral Winter

How Patterns Far Away Can Influence Our Weather. Mark Shafer University of Oklahoma Norman, OK

SIO 210 CSP: Data analysis methods L. Talley, Fall Sampling and error 2. Basic statistical concepts 3. Time series analysis

PRMS WHITE PAPER 2014 NORTH ATLANTIC HURRICANE SEASON OUTLOOK. June RMS Event Response

Graphical Techniques Stem and Leaf Box plot Histograms Cumulative Frequency Distributions

NOTES AND CORRESPONDENCE. El Niño Southern Oscillation and North Atlantic Oscillation Control of Climate in Puerto Rico

Research Note INTELLIGENT FORECASTING OF RAINFALL AND TEMPERATURE OF SHIRAZ CITY USING NEURAL NETWORKS *

Early Successes El Nino Southern Oscillation and seasonal forecasting. David Anderson, With thanks to Magdalena Balmaseda, Tim Stockdale.

The New Normal or Was It?

William M. Frank* and George S. Young The Pennsylvania State University, University Park, PA. 2. Data

Learning a probabalistic model of rainfall using graphical models

The role of teleconnections in extreme (high and low) precipitation events: The case of the Mediterranean region

An Introduction to Coupled Models of the Atmosphere Ocean System

2. DYNAMICS OF CLIMATIC AND GEOPHYSICAL INDICES

Challenges to Improving the Skill of Weekly to Seasonal Climate Predictions. David DeWitt with contributions from CPC staff

Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications

arxiv: v1 [physics.soc-ph] 18 Sep 2014

Lecture 28. El Nino Southern Oscillation (ENSO) part 5

Evaluating a Genesis Potential Index with Community Climate System Model Version 3 (CCSM3) By: Kieran Bhatia

Predicting uncertainty in forecasts of weather and climate (Also published as ECMWF Technical Memorandum No. 294)

Principal Component Analysis of Sea Surface Temperature via Singular Value Decomposition

Impacts of Recent El Niño Modoki on Extreme Climate Conditions In East Asia and the United States during Boreal Summer

Atmosphere, Ocean and Climate Dynamics Fall 2008

A Rossby Wave Bridge from the Tropical Atlantic to West Antarctica

How can we explain possible human contribution to weather events?

Latin Hypercube Sampling with Multidimensional Uniformity

CLIMATE SIMULATION AND ASSESSMENT OF PREDICTABILITY OF RAINFALL IN THE SOUTHEASTERN SOUTH AMERICA REGION USING THE CPTEC/COLA ATMOSPHERIC MODEL

June Current Situation and Outlook

the 2 past three decades

An Introduction to Nonlinear Principal Component Analysis

OAKLYN PUBLIC SCHOOL MATHEMATICS CURRICULUM MAP EIGHTH GRADE

JEFF JOHNSON S Winter Weather Outlook

Projections of future climate change

Entropy Rate of Stochastic Processes

Pacific Decadal Oscillation ( PDO ):

MPACT OF EL-NINO ON SUMMER MONSOON RAINFALL OF PAKISTAN

by Jerald Murdock, Ellen Kamischke, and Eric Kamischke An Investigative Approach

Semiblind Source Separation of Climate Data Detects El Niño as the Component with the Highest Interannual Variability

EL NIÑO/LA NIÑA UPDATE

SUPPLEMENTARY INFORMATION

The Climate System and Climate Models. Gerald A. Meehl National Center for Atmospheric Research Boulder, Colorado

Comparison between ensembles generated with atmospheric and with oceanic perturbations, using the MPI-ESM model

Global Ocean Monitoring: A Synthesis of Atmospheric and Oceanic Analysis

UPDATE OF REGIONAL WEATHER AND SMOKE HAZE (May 2017)

Application of Clustering to Earth Science Data: Progress and Challenges

REQUEST FOR A SPECIAL PROJECT

Climate, Weather and Renewable Energy. Prof. Frank McDermott, UCD School of Earth Sciences 30 th November 2017 UCD

Transcription:

Procedia Computer Science Volume 51, 215, Pages 44 412 ICCS 215 International Conference On Computational Science The construction of complex networks from linear and nonlinear measures Climate Networks J. Ignacio Deza 1 and Hisham Ihshaish 2 1 Department of Physics and Nuclear Engineering, Polytechnic University of Catalonia, Barcelona, Spain juan.ignacio.deza@upc.edu 2 Department of Computer Science and Creative Technologies, University of the West of England - Bristol, the UK hisham.ihshaish@uwe.ac.uk Abstract During the last decade the techniques of complex network analysis have found application in climate research. The main idea consists in embedding the characteristics of climate variables, e.g., temperature, pressure or rainfall, into the topology of complex networks by appropriate linear and nonlinear measures. Applying such measures on climate time series leads to defining links between their corresponding locations on the studied region, whereas the locations are the network s nodes. The resulted networks, consequently, are analysed using the various network analysis tools present in literature in order to get a better insight on the processes, patterns and interactions occurring in climate system. In this regard we present ClimNet; a complete set of software tools to construct climate networks based on a wide range of linear (cross correlation) and nonlinear (Information theoretic) measures. The presented software will allow the construction of large networks adjacency matrices from climate time series while supporting functions to tune relationships to different time-scales by means of symbolic ordinal analysis. The provided tools have been used in the production of various original contributions in climate research. This work presents an in-depth description of the implemented statistical functions widely used to construct climate networks. Additionally, a general overview of the architecture of the developed software is provided as well as a brief analysis of application examples. Keywords: complex systems, climate networks, time series analysis, graph theory Software availability Availability: ClimNet - Software will be available shortly in a public repository, meanwhile researchers are recommended to contact the authors for the source code. Programming languages: C++, Python Software requirements: NetCDF, C++ compiler. Python (optional, for pre- and postprocessing). 44 Selection and peer-review under responsibility of the Scientific Programme Committee of ICCS 215 c The Authors. Published by Elsevier B.V. doi:1.116/j.procs.215.5.26

1 Introduction Many techniques of complex network analysis have found application to complex systems. Examples of these include, but not limited to, applications in airline transportation networks modelling [?], social interactions [?] ortheinternet[?]. The study of network properties have proven to be vital tools to analyse and predict the behaviour of such complex systems. These systems share the characteristic that they can be straightforwardly represented in terms of a well-defined set of nodes coupled via links that have a clear physical interpretation. In a continuous system, however, like the atmosphere or the ocean, it is not always clear how to define the relevant network nodes, and there is usually no obvious interaction that can be used to define links between them. In order to tackle this problem, the so called interaction networks have been recently used. As such, the observed atmospheric or ocean locations serve as nodes, and the links are calculated based on statistical measures of similarity (e.g. correlation coefficient) between pairwise time series of observations at the studied locations. In like manner, climate networks (CNs) represent a statistical similarity structure of spatiotemporal resolved climatological variables, which depend on this definition of nodes and links. Their spatial sampling results in a space-embedded topology, and thus, a careful interpretation of the inferred network is required. CNs have been successfully employed to analyse climate features, for instance the analysis of significant correlations [?,?] and atmospheric teleconnections [?], the study of local phenomena as El Niño-Southern Oscillation (ENSO) [?]. Other CNs techniques have been applied to investigate the connections between sea surface temperature (SST) variability and the global mean temperature [?] and to study the possible collapse of the meridional overturning circulation (MOC) [?] on the north Atlantic. On a related note, climate system is known for its multi-scale interactions and hence one would like to explore the interaction of processes over the different scales. Data are available through high-resolution ocean/atmosphere/climate observational and model simulations but they lead to networks with a big number of nodes ( 1) making the process of network construction computationally demanding. Moreover, when high-resolution model data are used, constructing such networks becomes very challenging using currently available software tools. For such scales, parallel climate network construction and analysis tools have been developed, e.g. Par@Graph [?]. However, these tools are not usually based on information theoretic techniques, but rather on linear correlation calculations. In contrast, ClimNet enables the construction of directed and non-directed climate networks based on state of the art Information-theoretic tools, besides ordinary correlation coefficient measures. As well, ordinal analysis technique is implemented to analyse time scales directly from the data. Additionally, robust statistical analysis are in place to assure the significance of the results. The provided features of ClimNet have been used in previous works [?,?,?] to construct and analyse climate networks from correlations in surface air temperature (SAT) time series. More precisely, the software have been used to process the interdependencies between the time series and use this information together with a statistical significance test to create an adjacency matrix suitable to be explored and analysed by conventional network analysis tools and interpreted from a climatological point of view (see section 5). In this paper the software to construct complex climate networks is described, mainly from a functional (features) perspective, with special attention to the implemented statistical and Information-theoretic measures. The software functions have been computationally optimised and safe multithreading has been also introduced to enable parallel processing on multi-core 45

machines. However, since our major focus here is on the usability and application of the tools, their application to real climate data will be addressed here, whereas the scalability and performance of ClimNet will be discussed elseweher in a coming work. On a different matter, It should be mentioned that knowledge in climate dynamics is necessary in order to understand the patterns calculated by the the presented software. However, the analysis of the networks are done by running common graph theory algorithms to find network metrics including degree distribution, closeness and betweenness centrality. 2 Measures for Climate Networks Construction The software is equipped with functions to produce Information-theoretic measures - non linear correlation and information transfer between time series. All the measures (except entropy) are pairwise, i.e. based on the similarity found between each pair of time series. The list of these measures are described as follows: 2.1 Cross Correlation It is implemented for benchmarking and for comparison with information theoretic techniques. The Pearson correlation coefficient [?] is defined as: E[(X μ X )(Y μ Y )] = 1 (n 1) n (x i μ x )(y i μ y ) (1) i=1 where x i and y i are X and Y are two random variables with expected values μ X and μ Y standard deviations σ X and σ Y and E is the expected value. and 2.2 Entropy The Shannon s definition of entropy is generally defined in terms of the probability density function (PDF) p i, of the system to be in state x out of a possible set of states A: H = i A p i log b p i. (2) Here, p i is the probability of the value number i to appear in a sequence of characters of a given time series. This measure can be used to better understand the characteristics of the series. It is an univariate measure and the result of the calculations does not yield a network. 2.3 Mutual Information Mutual information is computed from the probability density functions (PDFs) that characterise two time series in two nodes, p i and p j, as well as their joint probability function, p ij [?]: M ij = m,n p ij (m, n)log p ij(m, n) p i (m)p j (n). (3) M ij is a symmetric measure M ij = M ji (4) 46

of the degree of statistical interdependence for the time series i(t) andj(t); if they are independent: p ij (m, n) =p i (m)p j (n) (5) and thus M ij =. 2.4 Directionality Index The directionality index (DI) is defined as: DI XY (τ) = I XY (τ) I YX (τ) I XY (τ)+i YX (τ), (6) where the I XY (τ) andi YX (τ) are called conditional mutual information and are defined as: I XY (τ) = I(X; Y X τ ) = H(X X τ )+H(Y X τ ) H(X, Y X τ ); (7) I YX (τ) = I(Y ; X Y τ ) = H(Y Y τ )+H(X Y τ ) H(Y,X Y τ ), (8) with X τ = X(t τ), Y τ = Y (t τ) andh(x Y ) being the conditional entropy [?]. The directionality index, DI XY, then quantifies the net information flow. From the definition of DI XY, Eq.(6), it is clear that DI XY = DI YX. Also, 1 DI XY 1: DI XY = 1 if and only if I XY,I YX =(i.e., the information flow is X Y and there is no back coupling Y X) anddi XY = 1 if and only if I XY =,I YX (i.e., the information flow is Y X and there is no back coupling X Y ). Naturally, τ>is a parameter that has to be tuned appropriately to the time-scales relevant to the particular dataset. 2.5 Symbolic Ordinal Patterns For the information theoretic measures (entropy, MI and DI) the PDF of the time series has to be approximated. This can be done in several ways, the most usual is by means of histograms of the data. ClimNet provides two way to calculate the PDFs p i, p j and p ij : by histograms of the original values (this case will be referred to as MIH) and by using a symbolic transformation, in terms of probabilities of ordinal patterns [?] (this case will be referred to as MIOP). MIOP allows time scale selection, which can be used in any information theoretic measure. The procedure for this calculation is the following: Ordinal Patterns are calculated by pointing out the value of a data point relative to other values in the series. Using e.g. three symbols (letters) 3! = 6 different patterns exist, for four symbols there will be 4! = 24 patterns, and so forth see Fig. 1 (a) and (b) respectively. The possibility of two equal adjacent values is not considered, partly due to the presence of noise in the time series. As shown in Fig. 1 OPs of length 3 are formed by 3 symbols in the following way: if a value (x i (2)) is higher than the previous one (x i (1)) but lower than the next one (x i (3)), it will yield the pattern 123 (Fig. 1 (1)), while the opposite case (x i (3) <x i (2) <x i (1)) will give the pattern 321 (Fig. 1 (6)), etc. This symbolic transformation allows to detect correlations in the sequence of values which are not taken into account when using histograms of values as they do not consider the order in which the values appear in the time series. As a drawback, this 47

1 2 3 4 5 6 Figure 1: Ordinal patterns are computed from the comparison between (not necessarily adjacent) points on a time series. The number of possible patterns depend on the number of letters considered. In the general case they grow as n!. [?] technique does not contain information about the relative magnitudes; this is usually useful as a natural robustness under low to moderate noise embedding. After constructing time series of the OPs, histograms can be calculated from them. In this symbolic approach, the number of bins is naturally defined by the number of possible patterns, which in turn is determined by the number of symbols in the ordinal pattern. As explained above, if the OP word is of length n, there will be n! possible patterns, and this will be the number of bins used for computing the probabilities associated with the symbolic sequences. This eliminates the binning problems which frequently appear when using histograms. Ordinal patterns do not need to be constructed with immediately adjacent data points only. It is possible to construct them with data points that are separated in time, and in this way different time scales are considered. This symbolic transformation keeps the information about correlations present in a time series at the selected time scale, but does not keep information about the absolute values of the data points. Therefore, the e.g. Mutual Information computed from ordinal patterns (MIOP) can be expected to provide complementary information with respect to the standard method of computing the mutual information (MIH). 3 Statistical Significance In any experiment or observation that involves drawing a sample from a population there is always the possibility that an observed effect would have occurred due to sampling error or chance alone. However if the probability of obtaining an extreme result large difference between two or more sample means given the null hypothesis is true, is less than a predetermined threshold (e.g. 5% chance), then it is possible to conclude that the observed effect is not due to chance[?]. To address the significance of the values, ClimNet uses bootstrap (BS) algorithm [?]. This algorithm randomly resamples with replacement from the original datasets using blocks of data of approximately the size of the autocorrelation time of the time series, and then computes the estimators (MI and DI) from the resampled data. Doing so, both the statistics (histogram) and the power spectrum of the original time series are approximately preserved. In this way, an empirical distribution can be obtained for each link, and significance thresholds can be extracted from them. 48

4 ClimNet Architecture ClimNet is equipped with wide range of necessary functions to construct climate networks from raw time series. In line with the described measures in the previous section, the processing time series sequence carried out by ClimNet is shown in fig. 2. The core functionality of the software is to compute the adjacency matrix from time series. To do so, various options are possible: from the linear cross correlation, used mostly for benchmarking and for comparison to the other measures, through entropy (an univariate measure which estimates the amount of variability a certain region has) to Mutual information and the directionality index, which can be calculated both using histograms or using the more sophisticated method of ordinal analysis. Each link is repeated a number of times by using the bootstrap algorithm in order to assure the statistical significance of the result. Insignificant links are discarded, whereas significant ones are accepted and thus included in the resulted adjacency matrix. Time series (input) Pre-processing tools [Python] Statistical Measures Entropy Cross Correlation Mutual Information Mutual Information OP Directionality Index Directionality Index OP Correlation / Directionality Matrix Statistical analysis Adjacency Matrix Post-processing/plotting tools [Python] Figure 2: Software description. ClimNet provides an interface to read and pre-process climate time series. Afterwards, the user is given the choice to select the type of network based on a statistical measure(s). The resulting network (adjacency matrix) is subject to further analysis by graph analysis by common graph analysing algorithms/libraries. 49

Having produced the adjacency matrix (network), it can be processed consecutively by any graph analysing algorithm(s) in order to obtain more advanced measures as betweenness centrality and community detection. Accordingly, the general architecture of ClimNet is gown in fig. 3. netcdf files ASCII files User Interface Core Functions C++ Interface with Graph Algorithms Figure 3: ClimNet architecture. The software core is implemented in C++ with python tools for pre-processing (removal of the seasonal cycle, or trends from the time series, normalisation, among others) and post-processing (calculating some basic network properties and graphing the network as a map in order to be interpreted by climatologists). ClimNet s interface is provided with routines to read netcdf files (self-describing, machine-independent data formats - very popular in climate science) and simple ASCII files. 5 Applications - Climate Networks Examples The software presented in this paper has been used in various climatological studies through its development. Some of the results obtained with these techniques is shown below. All the maps shown are statistically significant. In fig. 4 Mutual information using ordinal patterns is shown for the SAT field. This technique allows to select the time scales of the phenomena directly, something harder to achieve using traditional methods. The figures (left) show networks for shorter time scales while in the right longer time scales are displayed. The top maps are of connectivity while the bottom maps are of connections to a given point (in the central Pacific ocean). Note that for longer time scales the connectivity is higher and there are longer connections than for shorter timescales. This is compatible with the current understanding of climatological patterns, dominated by inter annual time scales like ENSO. Figure 5 show the comparison between the mutual information and the directionality index. Both calculated without ordinal patterns. MI show that the Pacific ocean is a highly clustered area for the SAT field while DI show that information is flowing from east to west in the equator, probably following the trade winds. All this information is inferred directly from the data and can be of big help for climatologists in order to analyse non linear relationships and the flow of the information hinting (but not proving) causal effects. 41

MIOP3L1: threshold =.52 ; ρ =.63 6N 3N 6S 45E 9E 135E 18E 135W 9W 45W MIOP3L1: threshold =.52 ; ρ =.63 6N 3N 6S MIOP3L4: threshold =.52 ; ρ =.113.13.12.11 6N.1.9 3N.8.7.6.5.4.3 6S.2 45E 9E 135E 18E 135W 9W 45W MIOP3L4: threshold =.52 ; ρ =.113 1.4 1.26 6N 1.12.98 3N.84.7.56.42.28 6S.14 MIOP3L12: threshold =.54 ; ρ =.117.3.27 6N.24 3N.21.18.15.12.9.6 6S.3. 45E 9E 135E 18E 135W 9W 45W MIOP3L12: threshold =.54 ; ρ =.117 1.62 1.44 6N 1.26 3N 1.8.9.72.54.36 6S.18.5.45.4.35.3.25.2.15.1.5. 1.98 1.76 1.54 1.32 1.1.88.66.44.22 45E 9E 135E 18E 135W 9W 45W 45E 9E 135E 18E 135W9W 45W 45E 9E 135E 18E 135W 9W 45W Figure 4: Example of Mutual information using ordinal patterns (MIOP). The Area weighted connectivity (equivalent to the degree of the network over a sphere) is shown in the top maps. Bottom maps show one column of the matrix showing the connections to the point shown with an x. First row: mostly time scales, second row, seasonal time scales, third row, yearly time scales [?]. Figure 5: Columns of the Adjacency matrix, showing connections for a point in the central Pacific ocean (marked with a green triangle), for (left) Mutual information and (right) Directionality Index. Notice the similarities in the shape of the patterns and the new structure unveiled by DI showing the direction of the flow of information, not present in the figure on the left. [?] 411

6 Conclusions The presented software tools in ClimNet allow the construction of relatively large climate networks from linear and non-linear statistical measures. ClimNet is equipped with easy-to-use user interface to facilitate the reading and the pre-processing of climate time series. The developed techniques to evaluate linear and the nonlinear relationships between variables at different timescales has been addressed in detail. Additionally, we have discussed examples on its application to real climate data, resulting in recent findings and original contributions in climate research. We believe ClimNet will help researchers in various complex sciences domains, and it is hoped that it will contribute to further findings in the investigation of complex systems as well as in climate research References 412