Processing NOAA Observation Data over Hybrid Computer Systems for Comparative Climate Change Analysis


Xuan Shi 1, Dali Wang 2
1 Department of Geosciences, University of Arkansas, Fayetteville, AR 72701, USA
2 Environmental Science Division, Oak Ridge National Lab, Oak Ridge, TN 37831, USA

Abstract - With the rapid development of weather monitoring systems, large volumes of observational data have become available. For example, NOAA provides Global Surface Summary of Day (GSOD) data that incorporates daily weather measurements from over 9,000 weather stations around the world. In this paper, a comprehensive workflow and methodology are presented that elaborate how to transform GSOD data into a new and more useful format so as to generate interpolated products of daily, monthly, and annual mean surface temperature by using advanced computation platforms. The quality of this gridded, high-resolution (¼ degree) daily product is further examined by comparing it to an existing global climate dataset. A preliminary comparison of global surface temperature shows consistent agreement between the two datasets, with the major differences located in a few regions. The interpolated GSOD data products supplement existing datasets by providing new gridded, high-resolution, observation-based daily temperature information over three decades (1982-2011), which is very useful for decadal climate change research.

Keywords: GSOD/NOAA Observational Data, Interpolation, Parallel Computing, GPU

1 Introduction

Historical weather datasets have been extensively used as a source of information to study global climate change and to validate and verify earth system models [1][2]. NOAA provides Global Surface Summary of the Day (GSOD) data through an FTP server (ftp://ftp.ncdc.noaa.gov/pub/data/gsod/). GSOD data incorporates daily weather measurements (temperature, dew point, wind speed, humidity, barometric pressure, and so forth) from over 9,000 weather stations around the world.
GSOD data are available from 1929 to the present, with the data from 1973 onward being the most complete. In its original format, however, GSOD data can hardly be utilized directly and efficiently by researchers. Each weather station is identified only by a station code; the location (latitude and longitude) of each station, along with other descriptive information about the data, is documented in a metadata file that is stored separately from the source data in ASCII format. This means that even when users can download the data from the FTP server, it is difficult for them to determine where the weather stations are located unless they link the location information in the metadata with the ASCII data files, let alone find the station(s) for a specific area or place. It is practically impossible to derive global climate change information from such individual, station-based, unstructured data for any given temporal scope. This paper describes the workflow and method used to transform GSOD raw data into an applicable format and to produce interpolated temperature data from daily meteorological observations at an arbitrary resolution. Considering the general interest of the journal readership, global daily temperature results on a 0.25 degree x 0.25 degree surface grid were generated for the period 1982 to 2011. Furthermore, time-series monthly average temperature grids were generated for comparison with the widely used high-resolution gridded datasets (http://www.cru.uea.ac.uk/cru/data/hrg/) of monthly average temperature developed by the Climatic Research Unit at the University of East Anglia (CRU TS). The preliminary comparison indicates overall consistency between the two datasets, with the major differences located around the Tibetan plateau.
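The metadata-linking step described above can be illustrated with a short sketch. The record layouts, station identifiers, and values below are simplified, hypothetical stand-ins for the actual fixed-width GSOD files; the point is the join of station coordinates onto daily records and the regrouping by date.

```python
# Sketch of the station-to-date transformation: join station coordinates
# from the metadata onto daily records, then regroup records by date.
# Record layouts and values are hypothetical, simplified for illustration.
from collections import defaultdict

def embed_locations(metadata_lines, daily_lines):
    """Embed station locations into daily records and regroup by date."""
    # metadata line: "station_id latitude longitude"
    coords = {}
    for line in metadata_lines:
        sid, lat, lon = line.split()
        coords[sid] = (float(lat), float(lon))

    # daily record line: "station_id YYYYMMDD mean_temp"
    by_date = defaultdict(list)
    for line in daily_lines:
        sid, date, temp = line.split()
        if sid in coords:  # skip stations missing from the metadata
            lat, lon = coords[sid]
            by_date[date].append((lat, lon, float(temp)))
    return by_date

# Hypothetical sample inputs
meta = ["724666 39.57 -104.85", "037683 51.48 -0.45"]
daily = ["724666 20090701 22.4", "037683 20090701 18.1"]
grouped = embed_locations(meta, daily)
```

Each date then carries a self-contained set of (latitude, longitude, value) triples, which is exactly the input an interpolation routine needs.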
Future work will focus on improvements to station selection for interpolation, on the topography-dependent heterogeneity of surface temperature measurement, and on comparison with other existing global datasets, such as the NASA Goddard Institute for Space Studies (GISS) and Moderate Resolution Imaging Spectroradiometer (MODIS) datasets.

2 Methodology and Workflow

GSOD data contains a variety of observed weather information, including the mean, maximum, and minimum temperature, mean dew point, mean sea level and station pressure, precipitation amount, and snow depth, as well as other elements. However, this invaluable data has been archived by individual station in unstructured ASCII format. In order to utilize GSOD data efficiently for climate change research, station-based data has to be transformed into date-based data in which the location of each station is embedded and merged into the time series datasets. As a result, daily global mean surface temperature can be approximated by applying an interpolation algorithm, and monthly and annual mean surface temperature, as well as anomalies, can then be derived. Interpolation is a method to estimate the value at an unsampled location based on the values of existing observations, and it can be implemented by different approaches in different domain science applications. Among these approaches, Kriging is a geostatistical interpolation method that is effective for predicting the spatial distribution of geographic features, although it has a complex implementation and a large computational load [3][4][5][6]. We applied Kriging interpolation in this pilot study. Technically, the data processing workflow contains four steps (as shown in Figure 1): 1) data transformation, which converts station-based ASCII files into date-based data by integrating the location information into the new daily dataset; 2) data interpolation, which generates interpolated daily mean surface temperature; 3) data aggregation, which generates the monthly and annual mean surface temperature, or a 30-year anomaly, for example; and 4) data subtraction, which derives the temperature change information. Scientifically, the new data product will help users and researchers to 1) identify and understand the spatial and temporal differentiation of climate change over the past decades at global and local scales; 2) explore and understand climate change tendencies by analyzing and visualizing historical data; 3) compare, validate, or examine potential climate models and results; and 4) integrate the output into other research projects as a source of data.

3 Computational Platform

Kriging interpolation is data and compute intensive, especially when great-circle distance is applied to identify a given number of nearest neighboring observation stations.
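The neighbor-selection step can be sketched with the standard haversine great-circle distance; this is an illustrative serial version for clarity, not the authors' CUDA implementation.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_stations(cell_lat, cell_lon, stations, k=10):
    """Return the k stations closest to a grid cell.

    stations: iterable of (lat, lon, observed_value) triples.
    """
    return sorted(
        stations,
        key=lambda s: great_circle_km(cell_lat, cell_lon, s[0], s[1]),
    )[:k]
```

For every output cell the k selected neighbors feed the Kriging system; this brute-force search over all stations is what the paper later notes can be avoided with a spatial index.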
When high-resolution output grids are generated, it may take hundreds of days to process the entire dataset for three decades. Parallel computing over Graphics Processing Units (GPUs) can significantly accelerate this time-consuming calculation and improve performance in a variety of scientific computations. When multiple GPUs are utilized to accelerate the Kriging calculation, the processing time can be reduced from dozens of minutes for the serial program to dozens of seconds on a single GPU in a desktop computer, and to a few seconds on Keeneland [7], a hybrid computer system with 240 CPUs and 360 GPUs. We started by implementing the Kriging computation on a desktop computer in order to establish a standard for quality control and for performance comparison with the parallel solutions and products. The desktop computer has an Intel Pentium 4 CPU with a 3.00 GHz clock frequency and 4 GB of RAM. Its graphics processing unit (GPU) is an NVIDIA GeForce GTS 450, which has 192 cores and 1 GB of global memory. According to the technical specification, this GPU has 24 streaming multiprocessors (SMs), each with 8 CUDA cores called streaming processors (SPs). In the GTS 450, with a compute capability of 2.1, up to 1024 threads can be assigned to each SM. Thus a maximum of 1024 x 24 = 24,576 threads can run concurrently on the physical GPU, although the maximum size of each dimension of a block is 512 x 512 x 64 and the maximum size of each dimension of a grid is 65535 x 65535 x 1. If the number of threads exceeds this maximum [24,576], the remaining threads have to wait.

Figure 1: Workflow of GSOD data transformation and computation

After the Kriging interpolation algorithm was implemented and validated in both the serial program and the parallel solution on the desktop GPU, Kriging was implemented on Keeneland, a hybrid computer system jointly developed by the Georgia Institute of Technology, the University of Tennessee at Knoxville, and Oak Ridge National Laboratory under NSF sponsorship, to accelerate the computation over more than 10,000 daily temperature estimations. Keeneland is composed of an HP SL-390 (Ariston) cluster with Intel Westmere hex-core CPUs, NVIDIA 6 GB Fermi GPUs, and a QLogic QDR InfiniBand interconnect. The system has 120 nodes, each with two CPUs and three GPUs, where all CPUs and GPUs are bridged through one I/O hub from which the CPUs can read and write data. Generally, the CPUs serve as high-level controllers coordinated through the Message Passing Interface (MPI), while the GPUs carry out the intensive computation at a relatively low level. By utilizing multiple GPUs on Keeneland, the computational time for interpolation was reduced to 3-4 seconds when one Keeneland node with three GPUs was utilized, even without applying any spatial index.

4 Implementation Details

Within the workflow of data transformation and computation, Kriging interpolation can be the most time-consuming procedure when a 0.25 x 0.25 degree grid is designed as the output product, since such a grid has 1440 x 720 = 1,036,800 cells. If the GSOD data has 2,000+ to 5,000+ records per day, Kriging through the serial program needs more than 11 to 30 minutes, respectively. As the number of weather observation stations in the world has increased in recent decades, GSOD data may have more than 10,000 daily records, in which case Kriging may need 40-50 minutes to process one day of data.
If interpolating one day of data needs 30 minutes on average through the serial program, it would take about 328,500 minutes, or 228 days, to process the 30 years of daily data from 1982 to 2011. For this reason, we pursued a high-performance computing solution that utilizes a hybrid computer system to implement Kriging through combined Message Passing Interface (MPI) and Graphics Processing Unit (GPU) programs.

4.1 Implementation on desktop GPU

GPUs were traditionally utilized in computer graphics applications. Given the massive parallelism enabled by the GPU, it can also be used for general-purpose computing, an approach known as GPGPU. By executing tens of thousands of threads concurrently, GPGPU enables high-performance computing even on desktop or laptop computers. Compute Unified Device Architecture (CUDA) is NVIDIA's general-purpose parallel computing architecture. Here, the Central Processing Unit (CPU) is referred to as the host, while an individual GPU is referred to as a device. Normally the GPU executes the data computation, while I/O is done on the CPU, which also manages the workflow. The kernel is the function that runs on the device; it is executed by an array of threads, all of which run the same code concurrently. Each thread has a unique thread identifier, accessible via the threadIdx variable, and thread identifiers can be defined in one, two, or three dimensions. Furthermore, threads are grouped into thread blocks and grids. Threads in the same thread block can cooperate with each other via shared memory, atomic operations, or barrier synchronization; threads in different blocks cannot cooperate. A user-defined number of threads can be organized in a block, with a maximum of 512 threads per block. Similarly, a group of thread blocks can be organized into a grid, in which each thread executes independently and thus may execute in parallel. The first test was run on the desktop computer through Visual Studio .NET 2010.

Interpolation is a natural match for parallel computing on GPUs. In essence, interpolation can be treated as a matrix calculation, which is generic in GPGPU applications. We specify the numbers of columns and rows to define the dimensions of the output grid. For each cell, we first find a given number of nearest neighboring points that have observational records; we then run the Kriging algorithm over the cell to derive its approximated value from the observational values of those neighbors. The calculation on each cell has no dependence on the other cells, so the interpolation is embarrassingly parallel. A general scheme for Kriging in a CUDA C program can be summarized as:

1. Specify the types and sizes of the input and output data;
2. Allocate memory on the GPU for input, output, and intermediate data;
3. Allocate the computing resources on the GPU, i.e., specify the number of threads per block and the total number of blocks;
4. Copy the input data from the CPU to the GPU;
5. Execute the Kriging computation kernel;
6. Copy the output data from the GPU back to the CPU;
7. Write the output data in ASCII grid format;
8. Free the allocated GPU memory.

To achieve high performance, we specify the number of blocks to be used and the number of threads in each block. In this case, for example, if 20,000 concurrent threads run the Kriging interpolation, the kernel threads will sweep the grid about 50 times to cover an output grid of 1 million cells.
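The batching arithmetic behind that "about 50 times" figure can be sketched with illustrative host-side code (not the CUDA kernel itself); the 20,000-thread pool size is the example value from the text.

```python
import math

N_CELLS = 1440 * 720           # 0.25-degree global grid: 1,036,800 cells
CONCURRENT_THREADS = 20_000    # illustrative pool of concurrently running threads

# Each pass, every thread handles one cell; covering the whole grid takes
# ceil(N_CELLS / CONCURRENT_THREADS) passes -- 52, i.e. roughly the "about
# 50 times" cited in the text.
passes = math.ceil(N_CELLS / CONCURRENT_THREADS)

def cell_for_thread(pass_idx, thread_id, n_cols=1440):
    """Map a (pass, thread) pair to the (row, col) of the cell it computes."""
    flat = pass_idx * CONCURRENT_THREADS + thread_id
    return divmod(flat, n_cols)
```

In the real CUDA kernel the flat index would come from blockIdx and threadIdx; the mapping from flat index to grid cell is the same.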

Table 1. Performance comparison across data sizes for the serial program, a single GPU in the desktop machine, and the combined MPI and GPU program on Keeneland. Times are in seconds; entries are time / speedup relative to the serial (1 CPU) run.

Data Size | 1 CPU (s) | Desktop 1 GPU | Keeneland 1 GPU | Keeneland 3 GPUs | Keeneland 6 GPUs | Keeneland 9 GPUs
2191      | 669       | 56 / 12       | 7 / 96          | 4 / 167          | 6 / 112          | 6 / 112
4596      | 1570      | 66 / 24       | 8 / 196         | 5 / 314          | 6 / 262          | 7 / 224
6941      | 1960      | 65 / 30       | 7 / 280         | 4 / 490          | 7 / 280          | 6 / 327
9817      | 2771      | 52 / 53       | 6 / 462         | 4 / 693          | 7 / 396          | 6 / 462

4.2 Implementation on Keeneland

Implementing the spherical interpolation computation on Keeneland combines MPI and CUDA programs. Each CUDA program is responsible for computing a block of the interpolated raster grid, divided along horizontal rows. Each MPI process has a unique rank, which is used to determine how many rows, and which ones, its CUDA program will process on the GPU node. The MPI processes read the input data, assign the jobs to the GPU nodes that run the spherical interpolation program, and write segments of the output data to files in parallel. When all MPI processes have completed, one MPI process merges all segments of the output data into a single file.

5 Performance Evaluation

Datasets of varied scale were used in the performance testing. Given an output grid of 1440 x 720 = 1,036,800 cells, Table 1 displays the performance of the Kriging interpolation over different datasets using a single CPU, along with the performance and speedups of the parallelized solutions on a single GPU in a desktop computer and on 1, 3, 6, and 9 GPUs on Keeneland. In each case, the 10 nearest neighboring points with observational values are used in the Kriging calculation. The advantage of using the GPU becomes more noticeable as the data scale increases, since the speedup grows even when only a single GPU is used on a desktop computer.
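The speedup entries in Table 1 are simply the serial (1 CPU) time divided by the parallel time, rounded to the nearest integer; for example, the 3-GPU Keeneland column can be reproduced as follows.

```python
# Reproduce the 3-GPU speedup column of Table 1: speedup = serial / parallel.
serial_s = {2191: 669, 4596: 1570, 6941: 1960, 9817: 2771}   # 1-CPU times (s)
keeneland_3gpu_s = {2191: 4, 4596: 5, 6941: 4, 9817: 4}      # 3-GPU times (s)

speedup = {n: round(serial_s[n] / keeneland_3gpu_s[n]) for n in serial_s}
```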
On Keeneland, the maximum speedup was achieved when one node with 3 GPUs was utilized, at all scales of input data. This result implies that, at the current input data sizes, utilizing more GPUs incurs more overhead for data movement between host and device. At larger data scales, a different performance pattern might emerge across different numbers of GPUs.

6 Visualization and Analytics of Interpolated GSOD Data

Now that the GSOD data has been transformed into the new format, the observational data can be visualized and analyzed through geographic information system (GIS) software, for example. Figure 2 displays the distribution of the weather observation stations across the globe on July 1, 2009, and the global mean surface temperature on that day as modeled by the Kriging calculation.

Figure 2: Distribution of weather observation stations and the global mean surface temperature on 07/01/2009

The time series of temperature evolution can be visualized as a movie or animation. For any given location on the earth, we can run search queries on the GSOD data through the identify function. By clicking on the map interface, we can identify the location (latitude and longitude) of the clicked point and retrieve the daily mean surface temperature of that location for any year. In this way, we can examine the quality of the Kriging result by comparing it with the original GSOD data for any known station, so as to validate the methodology and guide further improvement of this work on the one hand. On the other hand, we can offer the capability or service that allows users to run search queries over local or

regional temperature change for a certain time period, to enhance domain science research and application development.

7 Comparison with an Existing Global Climate Dataset

7.1 Dataset description

The Climatic Research Unit (CRU) TS (time-series) datasets, or CRU TS, contain month-by-month variations of global climate information over the last century or so. CRU TS datasets are archived as high-resolution (0.5 x 0.5 degree) grids of monthly mean temperatures derived from more than 4,000 weather stations distributed around the world. CRU TS data includes weather information such as cloud cover, diurnal temperature range, frost day frequency, precipitation, daily mean temperature, monthly average daily maximum temperature, vapor pressure, and wet day frequency. At present, the British Atmospheric Data Centre holds the CRU TS 3.0 datasets for the period 1901-2006 as well as the CRU TS 3.1 datasets for the period 1901-2009. In this study, the monthly temperature dataset from CRU TS 3.0 [8] is used for comparison with the gridded daily product of GSOD temperature.

7.2 Comparison

Since CRU TS only provides half-degree grids of monthly average temperature over six continents (excluding Antarctica), we filtered out all data covering the ocean and Antarctica and compared the remaining grid with the monthly mean surface temperature derived from the interpolated GSOD daily mean surface temperature over those six continents. Figures 3 and 4 display the spatial distribution of the average temperature for two specific months of 2006, a winter month (January) and a summer month (July).

Figure 3: Global temperature profiles in a winter month (January 2006) (left: CRU TS data; right: gridded GSOD product with the same color scheme).

Figure 4: Global temperature profiles in a summer month (July 2006) (left: CRU TS data; right: gridded GSOD product with the same color scheme).
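The paper does not state how the 0.25-degree product was brought onto the 0.5-degree CRU TS grid for comparison; one minimal approach, shown here purely as an illustration, is to average each 2x2 block of fine cells into one coarse cell.

```python
def block_average_2x2(grid):
    """Aggregate a 0.25-degree grid to 0.5 degrees by averaging each
    2x2 block of cells. `grid` is a list of equal-length rows of values;
    row and column counts are assumed to be even."""
    coarse = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            block = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(sum(block) / 4.0)
        coarse.append(row)
    return coarse
```

A differencing of the two half-degree grids then yields the spatial mismatch maps summarized in Figures 3 and 4.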
As shown in Figures 3 and 4, the temperature distribution patterns match well between the CRU TS data and our gridded GSOD product over the majority of areas, while major differences exist around the Himalayan mountain areas.

8 Conclusions and Future Work

This paper presented a method and workflow to transform NOAA Global Surface Summary of Day (GSOD) data into a more useful format to support climate change research. With the locations of the weather observation stations embedded, date-based GSOD data can be further transformed into detailed gridded data products at very fine spatial and temporal scales. By deploying hybrid computer architectures and systems, interpolating the global daily mean surface temperature for the past 30 years can be accomplished within two hours. The interpolated GSOD products exhibit satisfactory quality in most regions over the continents. The preliminary comparison between the CRU TS data and our new gridded data products derived from GSOD shows a consistent match between the two datasets, with the major differences identified around the boundaries of the Tibetan plateau. With the increasing demands of research on decadal climate change and its impacts, our gridded GSOD data products can serve as high-fidelity

benchmark datasets to validate and verify finer-scale climate simulation results. They can also be used as fine-scale (both temporal and spatial) external forcing to investigate regional climate impacts. Future work will focus on improvements to station selection for interpolation and on the topography-dependent heterogeneity of surface temperature measurement in the data generation procedure. Further comparison with other existing global climate datasets, such as the NASA GISS datasets (http://data.giss.nasa.gov) and MODIS datasets (lpdaac.usgs.gov), will help in understanding the different models and outputs for climate change research. Our gridded GSOD data product [i.e., global daily mean surface temperature grids at a resolution of 0.25 degree x 0.25 degree for the period between 01/01/1982 and 12/31/2011] is now available upon request, and the authors plan to make the product available via the Distributed Active Archive Center for Biogeochemical Dynamics at Oak Ridge National Laboratory. All the datasets generated by this study are available upon request, and a DOI has been requested for these datasets.

9 Acknowledgement

This research was supported partially by the National Science Foundation through award OCI-1047916. This research used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735. Oak Ridge National Laboratory is managed by UT-Battelle LLC for the Department of Energy under contract DE-AC05-00OR22725.

10 References

[1] National Research Council. A National Strategy for Advancing Climate Modeling. Washington, DC: The National Academies Press, 2012.

[2] Trenberth, K. E., Anthes, R. A., Belward, A., Brown, O., Haberman, E., Karl, T. R., Running, S., Ryan, B., Tanner, M., and Wielicki, B., 2012: Challenges of a sustained climate observing system. In Climate Science for Serving Society: Research, Modelling and Prediction Priorities, Hurrell, J. W. and Asrar, G., eds., Springer, accepted.

[3] Oliver, M. A. and R. Webster. 1990. Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information Science, Vol. 4, No. 3, pp. 313-332.

[4] Tang, Tao. 2005. Spatial Statistic Interpolation of Morphological Factors for Terrain Development. GIScience & Remote Sensing, Volume 42, Number 2, April-June 2005, pp. 131-143.

[5] Cheng, T., Zhang, Y., Li, D., and Wang, Q. 2010. A component-based design and implementation for parallel Kriging library. In Proceedings of the 2nd International Conference on Information Science and Engineering (ICISE), pp. 1-4, 4-6 Dec. 2010.

[6] Srinivasan, B. V., R. Duraiswami, and R. Murtugudde. 2010. Efficient kriging for real-time spatio-temporal interpolation. Online proceedings of the 20th Conference on Probability and Statistics in the Atmospheric Sciences.

[7] Vetter, J. S., R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili. Keeneland: Bringing heterogeneous GPU computing to the computational science community. IEEE Computing in Science and Engineering, 13(5):90-95, 2011. doi:10.1109/MCSE.2011.83

[8] Mitchell, T. D. and Jones, P. D., 2005: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. International Journal of Climatology 25, 693-712. doi:10.1002/joc.1181