GPU Acceleration of Weather Forecasting and Meteorological Satellite Data Assimilation, Processing and Applications
Allen Huang, Ph.D. (allen@tempoquest.com), CTO, Tempo Quest Inc.
http://www.tempoquest.com
GTC 2016, San Jose, CA, 5 April 2016
Why weather forecasts are not accurate enough
  - Model is not perfect yet: evolving scientific understanding & algorithm development
  - Data is not always accurate: actual and accurate initial data are expensive to collect & process
  - High performance computer is expensive: can afford only limited resources to deploy & operate HPC
Acceleration of Weather Forecasting S/W
  - Same forecasts faster, much faster
  - Better forecasts take much more computation: location, timing, intensity, next hour, tomorrow, next week, ...
  - Most of the legacy S/W can't take advantage of the new H/W
Acceleration of Satellite Data Processing
  - Hyperspectral Data Retrieval
  - Hyperspectral Data Compression
Summary
Why are the Weather Forecast Models not accurate enough?
Three critical factors:
1. Imperfect MODEL
2. Lack of/erroneous INITIAL DATA/CONDITIONS
   - No data or sparse coverage, infrequent
   - Unknown attributes; not coupled
3. Lack of COMPUTING POWER
Why are the Weather Forecast Models not accurate enough?
Three critical factors:
1. Imperfect MODEL
2. Lack of/erroneous INITIAL DATA/CONDITIONS
3. Lack of COMPUTING POWER
   - Increasing need for ensemble runs
   - Increasing demand for higher resolution
   - Increasing frequency of assimilations
   - Increasing model complexity
   Resulting in high demand for computing resources
100,000 to 200,000 CPU cores required for:
   - Global cloud resolving: NIM @ 2 km resolution, 2x/day
   - Regional models, North American (NA) domain: HRRR @ <1 km, hourly
   - Ensembles: HRRR @ 3 km NA, 100 members, hourly
Reference: 250,000 CPUs cost ~$100M; use 7,000 kW & ~$8M/year energy bill
Why are the Weather Forecast Models not accurate enough?
[Figure: forecast track, Operational (T574, ~27 km) vs. Experiment (T1500, ~13 km)]
Note: Last 24h of the high-resolution experiment track is based on 6h model output.
2X resolution: 10X the computing cost
1 Zflops = 10^21 flops
1 million trillion (1 billion billion) flop per sec, or 1 exaflops (10^18 flops)
Why weather forecasts are not accurate enough
  - Model is not perfect yet: evolving scientific understanding & algorithm development
  - Data is not always accurate: actual and accurate initial data are expensive to collect & process
  - High performance computer is expensive: can afford only limited resources to deploy & operate HPC
Acceleration of Satellite Data Processing
  - Hyperspectral Data Retrieval
  - Hyperspectral Data Compression
Acceleration of Weather Forecasting S/W
  - Same forecasts faster, much faster
  - Better forecasts take much more computation: location, timing, intensity, next hour, tomorrow, next week, ...
  - Most of the legacy S/W can't take advantage of the new H/W
Summary
Processing times, CPU vs. GPU: early result (2009)

Configuration                       Time [ms]
The original Fortran code on CPU    16928
CUDA C with I/O on GPU              83.6
CUDA C without I/O on GPU           48.3

Our experiments ran on an Intel i7 970 CPU at 3.20 GHz and on a single GPU of the two on an NVIDIA GTX 590.
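The speedup implied by these timings can be checked with a few lines of arithmetic. The sketch below only divides the numbers quoted on this slide; the variable and function names are illustrative, and the "I/O share" is simply the difference between the two GPU timings, not a separately measured quantity.

```python
# Timings in milliseconds, copied from this slide.
cpu_fortran_ms = 16928.0
gpu_with_io_ms = 83.6
gpu_without_io_ms = 48.3


def speedup(baseline_ms, accelerated_ms):
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


speedup_with_io = speedup(cpu_fortran_ms, gpu_with_io_ms)
speedup_without_io = speedup(cpu_fortran_ms, gpu_without_io_ms)

# Share of the GPU wall time attributed to I/O, taken as the difference
# between the "with I/O" and "without I/O" GPU timings.
io_fraction = (gpu_with_io_ms - gpu_without_io_ms) / gpu_with_io_ms

print(f"speedup with I/O:      {speedup_with_io:.1f}x")
print(f"speedup without I/O:   {speedup_without_io:.1f}x")
print(f"I/O share of GPU time: {io_fraction:.0%}")
```

The ratios come out at roughly 202x with I/O and 350x without, which shows how strongly I/O affects the achievable speedup once the compute kernel itself is fast.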
The Fast Radiative Transfer Model
Without loss of generality for our GPU implementation, we consider the following radiative transfer model:

\[
R_\nu = \varepsilon_\nu B_\nu(T_s)\,\tau_\nu(p_s) + \int_{p_s}^{0} B_\nu\bigl(T(p)\bigr)\,\frac{d\tau_\nu(p)}{dp}\,dp
\]

with the regression-based transmittances: [formula not recoverable from the slide text]
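To make the structure of the radiative transfer equation concrete, here is a minimal single-channel sketch in Python. It is not the presenter's GPU code. It assumes the equation reads as a surface term (emissivity times Planck radiance times surface-to-space transmittance) plus an integral of Planck emission over transmittance, and it assumes a simple layer-by-layer sum with layer-mean temperatures. The transmittance profile is passed in as data rather than computed from the regression model, and all input values in the usage lines are hypothetical.

```python
import math

C1 = 1.191042e-5  # first radiation constant 2hc^2, mW / (m^2 sr cm^-4)
C2 = 1.4387752    # second radiation constant hc/k, K cm


def planck(wavenumber_cm, temperature_k):
    """Planck radiance B_nu(T) in mW / (m^2 sr cm^-1)."""
    return C1 * wavenumber_cm**3 / math.expm1(C2 * wavenumber_cm / temperature_k)


def radiance(wavenumber_cm, emissivity, t_surface, tau_levels, t_layers):
    """Discretized clear-sky radiance for one channel.

    tau_levels: level-to-space transmittances, surface first,
                top of atmosphere (tau = 1) last.
    t_layers:   mean temperature of each layer between adjacent levels.
    """
    if len(tau_levels) != len(t_layers) + 1:
        raise ValueError("need exactly one more level than layers")
    # Surface term: emissivity * B(T_s) * tau(p_s).
    total = emissivity * planck(wavenumber_cm, t_surface) * tau_levels[0]
    # Atmospheric term: integral of B(T(p)) d(tau), summed layer by layer
    # from the surface up to the top of the atmosphere.
    for i, t_layer in enumerate(t_layers):
        d_tau = tau_levels[i + 1] - tau_levels[i]
        total += planck(wavenumber_cm, t_layer) * d_tau
    return total


# Hypothetical three-layer profile for a 900 cm^-1 channel.
tau = [0.30, 0.55, 0.80, 1.00]
print(radiance(900.0, 0.98, 288.0, tau, [280.0, 250.0, 220.0]))
```

A useful sanity check on any such discretization: for an isothermal atmosphere over a black surface at the same temperature, the result should reduce to the Planck radiance at that temperature. Because every channel and every profile is evaluated independently in this form, the computation maps naturally onto many parallel GPU threads.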
GPU-based Multi-input RTM
A forward model that concurrently computes 40 radiance spectra was further developed to take advantage of the GPU's massive parallelism. To compute one day's worth of 1,296,000 IASI spectra, the original RTM (with O2 optimization) takes ~10 days on a 3.0 GHz CPU core; the single-input GPU-RTM takes ~10 minutes (1455x speedup), whereas the multi-input GPU-RTM takes ~5 minutes (3024x speedup).
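The quoted run times follow from the speedup factors, as the short calculation below shows. It uses only the figures on this slide; the names are illustrative, and the batch count assumes every GPU pass processes exactly 40 spectra, which the slide does not state explicitly.

```python
SPECTRA_PER_DAY = 1_296_000  # IASI spectra in one day (from the slide)
CPU_DAYS = 10.0              # approximate single-core CPU time (from the slide)
BATCH_SIZE = 40              # spectra computed concurrently by the multi-input RTM

cpu_minutes = CPU_DAYS * 24 * 60


def gpu_minutes(speedup_factor):
    """GPU wall time in minutes implied by a speedup over the CPU baseline."""
    return cpu_minutes / speedup_factor


single_input_min = gpu_minutes(1455)
multi_input_min = gpu_minutes(3024)

# Assumption: each pass handles exactly BATCH_SIZE spectra.
batches = SPECTRA_PER_DAY // BATCH_SIZE

print(f"single-input GPU-RTM: {single_input_min:.1f} min")
print(f"multi-input GPU-RTM:  {multi_input_min:.1f} min")
print(f"batches of {BATCH_SIZE} spectra: {batches}")
```

Both results agree with the rounded "~10 minutes" and "~5 minutes" on the slide, and the multi-input version gains roughly another factor of two over the single-input one.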
GPU Acceleration of Satellite Hyperspectral Maximum Likelihood Retrieval
GPU Acceleration of Predictive Partitioned Vector Quantization for Ultraspectral Sounder Data Compression
Why weather forecasts are not accurate enough
  - Model is not perfect yet: evolving scientific understanding & algorithm development
  - Data is not always accurate: actual and accurate initial data are expensive to collect & process
  - High performance computer is expensive: can afford only limited resources to deploy & operate HPC
Acceleration of Satellite Data Processing
  - Hyperspectral Data Retrieval
  - Hyperspectral Data Compression
Acceleration of Weather Forecasting S/W
  - Same forecasts faster, much faster
  - Acceleration of Weather Research and Forecasting (WRF) Model: Radiation; PBL; Surface; Cumulus Parameterization; Cloud Microphysics; and Dynamic Core
Summary
CONtinental United States (CONUS) benchmark data set for 12 km resolution domain for October 24, 2001
The size of the CONUS 12 km domain is 433 x 308 horizontal grid points with 35 vertical levels.
The test problem is a 12 km resolution 48-hour forecast over the Continental U.S. capturing the development of a strong baroclinic cyclone and a frontal boundary that extends from north to south across the entire U.S.
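For a sense of scale, the domain dimensions translate into the following counts. This is a rough sketch: the grid dimensions come from the slide, while the 4-byte (single-precision) storage per value is an assumption made here for illustration only.

```python
NX, NY, NZ = 433, 308, 35  # horizontal grid points and vertical levels (from the slide)
BYTES_PER_VALUE = 4        # assumption: single-precision (32-bit) floats

columns = NX * NY          # number of vertical columns in the domain
cells = columns * NZ       # total number of 3D grid cells
field_mib = cells * BYTES_PER_VALUE / 2**20  # size of one 3D field in MiB

print(f"vertical columns: {columns:,}")
print(f"3D grid cells:    {cells:,}")
print(f"one 3D field:     {field_mib:.1f} MiB (assuming 4 bytes per value)")
```

With more than a hundred thousand columns and several million cells, the benchmark offers ample data parallelism for a GPU.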
Category    Scheme                                GPU speedup    MIC factor   Reference
Radiation   RRTMG LW                              123x / 127x    -            (GPU) JSTARS, 7, 3660-3667, 2014
Radiation   RRTMG SW                              202x / 207x    -            (GPU) JSTARS, PP, 1-11, 2015
Radiation   Goddard SW                            92x / 134x     -            (GPU) JSTARS, 5, 555-562, 2012
Radiation   Dudhia SW                             19x / 409x     -            -
Surface     MYNN SL                               6x / 113x      -            -
Surface     TEMF SL                               5x / 214x      -            -
Surface     Thermal Diffusion LS                  10x / 311x     [2.1x]       (GPU) JSTARS, 8, 2249-2259, 2015
PBL         YSU PBL                               34x / 193x     [2.4x]       (GPU) GMD, 8, 2977-2990, 2015
PBL         TEMF PBL                              -              [14.8x]      (MIC) SPIE: doi:10.1117/12.2055040
CU P        Betts-Miller-Janjic (BMJ) convection  55x / 105x     -            -

GPU speedup: speedup with IO / speedup without IO
MIC improvement factor in [ ]: w.r.t. 1st version multi-threading code before any improvement
Cloud Microphysics

Scheme            GPU speedup    MIC factor   Reference
Kessler MP        70x / 816x     -            J. Comp. & GeoSci., 52, 292-299, 2012
Purdue-Lin MP     156x / 692x    [4.2x]       (GPU) SPIE: doi:10.1117/12.901825
WSM 3-class MP    150x / 331x    -            -
WSM 5-class MP    202x / 350x    -            (GPU) JSTARS, 5, 1256-1265, 2012
Eta MP            37x / 272x     -            SPIE: doi:10.1117/12.976908
WSM 6-class MP    165x / 216x    -            (GPU) J. Comp. & GeoSci., 83, 17-26, 2015
Goddard GCE MP    348x / 361x    [4.7x]       (GPU) JSTARS, 8, 2260-2272, 2015
Thompson MP       76x / 153x     [2.3x]       (MIC) SPIE: doi:10.1117/12.2055038
SBU 5-class MP    213x / 896x    -            JSTARS, 5, 625-633, 2012
WDM 5-class MP    147x / 206x    -            -
WDM 6-class MP    150x / 206x    -            J. Atmo. Ocean. Tech., 30, 2896, 2013

GPU speedup: speedup with IO / speedup without IO
MIC improvement factor in [ ]: w.r.t. 1st version multi-threading code before any improvement
Tempo Quest Inc. (TQI) S/W Product Pipeline
Weather/Environment Domain
  - AceCAST Lite: 6 months out
      Pre AceCAST (CPU/GPU Hybrid WRF)
  - AceCAST: 12 months out (subject to funding)
      CUDA GPU WRF
  - Beyond AceCAST: 2-3 years out (subject to funding)
      DataCAST (CUDA WRF Data Assimilation)
      ChemCAST (CUDA WRF Chem)
      HurCAST (CUDA Hurricane WRF)
      HydroCAST (CUDA WRF Hydro)
      FireCAST (CUDA WRF Fire)
Thank you for your attention
Questions are welcome
allen@tempoquest.com