Statistical analysis of regional climate models. Douglas Nychka, National Center for Atmospheric Research National Science Foundation Olso workshop, February 2010
Outline Regional models and the NARCCAP design ANOVA for rainfall fields Distribution of Extremes Challenges: Spatial and functional data, computation for large data sets.
Acknowledgements NARCCAP Linda Mearns (NCAR), Cari Kaufman (Berkeley), Stephan Sain (NCAR)
What is climate? Climate is what you expect... weather is what you get. Weather: Climate: For example, the 30 year average rainfall In general it is the distribution of weather events.
Climate models A climate model is a computer code based on physical principles that simulates the complex interactions between the many parts of the Earth system. It can be used to: run climate process experiments simulate current climate or make projections about future climate. Historically the models have been numerical laboratories for geophysics.
Climate system model components: Global, Atmosphere, Ocean, Sea Ice, Land. AOGCM = Atmosphere Ocean General Circulation Model.
Big Iron to run the models 1 day wall time 5 years climate simulation 10s of terabytes of output. 100s of scientists 10 6 lines of code
The problem of regional climate. The global models on their own do not give enough detailed information at regional and local scales. 32km 256km
A single global model grid box All the features in the atmosphere are reduced to a single one dimensional column! Many physical processes and features can not be modeled explicitly! E.g. thunderstorms or extreme weather events.
An data example from Colorado Elevation (m) and grid boxes at 60 and 250 km Doppler radar provides detailed daily rainfall data on a complete grid. mm/hour 0.0 0.5 1.0 1.5 0 20 40 60 80 June August 1996 Larger grid box mean does not capture extremes and variability of the point radar observations. Smaller grid box mean better tracks observations.
Downscaling and statistics Point rainfall and the grid box mean point precip mm/hour 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.2 0.4 0.6 0.8 grid point precip The skewness for low values yields a drizzle for the grid box mean compared to stronger intermittent local rainfall. Not enough observations to build empirical, statistical models. Physical relationships are important for projecting future changes
An approach to Regional Climate Nest a higher resolution weather model inside a global simulation to gain higher resolution Downscaled rainfall: A snapshot from the 3-dimensional RSM3 model forced by global observations (NCEP) Consider different combinations of global and regional models to characterize model uncertainty.
NARCCAP the design 4GCMS 6RCMs: 12 runs as a balanced half fraction design Global observations 6RCMs for model calibration High resolution global X - examples in this talk AOGCM Regional Climate Models MM5I RegCM CRCM HADRM RSM3 WRFP GFDL CCSM GFDL X X CGCM3 HADCM3 CCSM Obs Massive data: Surface fields on 100 100 grid. Every 3 hours for 30 year periods!
Data is never simple... Model output is on different grids and requires regridding. We regrid using compact splines
How will rainfall change for the US? Investigate with a 2 2 design that is a subset of NARCCAP. The Factors (levels): Downscaling: (high resolution global model, regional model) GFDL climate model with GFDL high resolution atmosphere and UC Santa Cruz Regional Climate Model. Time period: (current climate, future scenario) [1971, 2000] and [2040, 2071] A scenario of relatively high emissions of greenhouse gases (A2). The Response: Summer precipitation (JJA) on a common 134 83 grid.
A Functional ANOVA Model Y M,T,k : Summer rainfall at grid boxes based on: Model (timeslice vs RCM) Time Period (current vs future) for 30 years (k = 1,..., 30). Y M,T,k = Baseline + Model effect + Time Period + Interaction + noise k noise k sum of a trend component and Markov random field. Linear model components a realization from a Markov random field Key inference concerns the uncertainty in present verses future regional climate change.
Useful covariance models Σ modeled as Σ = σ 2 R 1 (φ) C 1 (φ) R and C are 1-D stationary Markov field covariances depending on parameters φ. Computationally efficient: sparse precision matrices with Kronecker construction. Approximates exponential form of spatial covariance.
A Bayesian implementation Prior distributions are chosen to be largely non-informative. A Gibbs sampler was constructed with some Metropolis- Hastings A single chain of length 5000 was run; convergence suggested
Posterior inference on the factors present α 0 model α 1 change α 2 inter. α 3 P [α0 > µcurr ] P [α1 > 0] P [α2 > µdif f ] P [α3 > 0] p < 0.05 blue; p > 0.95 red.
Some interpretation Larger differences among regional models than between present and future. Interaction is small suggesting a linear response. Some of the largest benefits will be in interaction with the modelers about significant spatial differences and their dynamical sources.
Distribution of daily precipitation An analysis using part of the NARCCAP present climate runs. Includes 800 grid points from RCM simulations
An example Daily (summer) precipitation from a grid box of ECPC. Values relative to the mean and the logged transformation. 0 200 400 600 800 0 50 100 150 200 0 2 4 6 8 10 PPT/mean 4 3 2 1 0 1 2 log(ppt/mean) We have 1K of these!
Summarizing the distribution log spline estimates Stone, Hansen, Kooperberg, Truong (1997) ˆf(y) = exp{b 1 (y)β 1 + B 2 (y)β 2 +... + B 2 (y)β 2 + C(β)} Basis will be a cubic polynomial in between the J knots. Y 1 < Y 2 <... < Y J and linear outside of Y 1 and Y J ˆf will have exponential tails. If applied to the logged data will have polynomial tails.
Logspline fit Fit having logged the precipitation values Density 0.00 0.05 0.10 0.15 0.20 log pdf 8 7 6 5 4 3 2 4 2 0 2 log( PPT/mean) 4 2 0 2 log(ppt/mean)
Fit in unlogged scale Log density estimated tail probabilities. log pdf 10 6 4 2 0 2 PPT/mean 0.5 1.0 2.0 5.0 10.0 0 5 10 15 PPT/mean 0.001 0.010 0.100 tail probability We have 1K of these!
Basis functions to reduce dimension Comparing leading EOFs of 4 RCMs: ECPC, MM5I, WRFP, RCM3 log(ppt/pmean) 0.2 0.0 0.2 0.4 0.001 0.005 0.020 0.050 0.200 0.500 1 P (Is this a result about the logspline estimator?)
An ANOVA summary of first EOF Mean CRCM ECPC 13.5 13.0 12.5 12.0 MM5I RCM3 WRF 3 2 1 0 1 2
Conclusions Numerical models for climate projections or prediction produce complicated output that benefits from statistical summaries. Spatial statistics needs to address functional data in the form of curves and surfaces. Large data sets are the norm.
Thank you!