Extremes Analysis for Climate Models. Dan Cooley Department of Statistics

In collaboration with Miranda Fix, Grant Weller, Steve Sain, Melissa Bukovsky, and Linda Mearns.
CMIP Analysis Workshop, NCAR, August 2016.
NSF DMS-1243102.

Outline
- Climate models from a statistician's point of view
- Basics of extreme value analysis
- Extreme analyses of climate model output
- A different type of climate model study: assessing extremal correspondence
- Understanding processes leading to extremes

Climate vs. Weather
"Climate is what you expect, weather is what you get."
- Expected value: climate is often summarized by means; 30-year averages are sometimes used in practice.
- A useful way of describing the difference to the public, but it doesn't really apply when doing an EV analysis.
Instead: Climate is the distribution from which weather is drawn (not just the expected value).
- Climate change: how will this distribution change?
- EV: how will the tail of the distribution change?

Explaining Climate Models to Statisticians
- Tools for simulating weather under different climate conditions.
- Deterministic models: discretized solutions of the differential equations which govern circulation.
- Why? We cannot run experiments on Earth itself.
- Climate models produce output, not data (observations).
- Future projections, not predictions.
Uncertainty
- Initial conditions (internal variability). Statistician's interpretation: sample variability.
- Parameter uncertainty. Interpretation: somewhat like a prior, but forward running.
- Model uncertainty. Interpretation: somewhat like model mismatch. Not as hard for statisticians to understand.

Model Output and Observations
[Figure: Annual maxima for Boulder, Colorado. Densities vs. precipitation (mm); legend entries: observations, model output, 2013.]
- Output may not be directly physically interpretable.
- People want to use climate model output to investigate potential impacts of climate change.
- Still may be useful via (statistical) downscaling.
- Q (for later): When does it make sense to downscale?

GCMs, Reanalysis Products, and RCMs
General Circulation Models
- model large-scale processes over the entire globe.
- grid boxes on the scale of 100s of km.
- good for studying large-scale changes.
- are not weather prediction models.
Regional Climate Models
- dynamical downscalers: resolve smaller-scale processes.
- grid boxes on the scale of 10s of km.
- model only a region.
- driven by GCMs (or reanalysis).
- assessing local impacts sensible? downscaling needed?
Reanalysis Products
- data assimilation products that integrate observations.
- produce GCM-like output.
- "Hindcasts" should exhibit correspondence with observations.

Ordinary vs Extreme Value Statistics
- Ordinary statistics: tries to describe the main part of the distribution; may ignore outliers.
- Extremes: tries to characterize the tail of the distribution; keeps only the extreme observations.
[Figure: two panels, "Ordinary Stats" and "Extremes".]

Colorado Flood of 2013
- Widespread heavy precipitation, Sept 9-15, 2013.
- 8 killed, > $1B damage.
[Photo: Big Thompson River Canyon. Source: NOAA's NWS, HDSC.]
Boulder: flash flood event
- Sept 12: 9.08 in.
- NOAA HDSC 1000-year return level estimate for 24-hr precipitation: 8.16 in; 90% CI: (5.46, 10.9).

Univariate Extreme Value Analysis
EVA has a relatively long history of answering questions like:
- very high quantile: e.g., the return level of the 100-year flood.
- return frequency of an observed event.
Illustration: Boulder precipitation record (May-Sept)
[Figure: time series of precipitation (in.), x-axis ticks 1950-2010.]
Analyze the data two ways:
1. Model all of the data.
2. Model only the tail. (Classical EVA)

Modeling all precipitation data
Let $X_t$ be the daily "summer" precipitation amount for Boulder (summer = May-Sept).
Assume:
$$X_t > 0 \text{ w.p. } p, \qquad X_t = 0 \text{ w.p. } 1 - p,$$
with $\hat{p} = 0.32$.
Further, assume that $[X_t \mid X_t > 0] \sim \text{Gamma}(k, \theta)$.
ML estimates: $\hat{k} = 0.653$, $\hat{\theta} = 0.322$.
[Figure: histogram of non-zero data (density).]

Tail Estimates (Modeling all data)
- 100-year return level estimate: 2.395 (2.29, 2.50). NOAA: 5.52 (4.20, 6.93).
- Return period of the 2013 event, estimated: 161 billion years (42B, 727B).
[Figure: QQ plot of non-zero data; axes: model, empirical.]
Note: < 1% of all data and 2.7% of non-zero data are > 1.25.

Tail Estimates (Modeling all data)
Q: Is the gamma model to blame?
A: Only partly.
Lognormal: $\hat{\mu} = -2.49$, $\hat{\sigma} = 1.37$.
[Figure: histogram of non-zero data (density) and QQ plot of non-zero data (model vs. empirical).]
- 100-year return level estimate: 10.42 (9.31, 11.60). NOAA: 5.52 (4.20, 6.93).
- Return period of the 2013 event, estimated: 68.8 years (52.2, 93.2).
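The step from a fitted daily model to a return level can be sketched as follows. This is a minimal illustration, not the computation behind the slide: it assumes a 153-day May-Sept season and independent days (neither is stated on the slides), and the function names are hypothetical. Under those assumptions the r-year return level x solves (1 - p(1 - F(x)))^153 = 1 - 1/r, where F is the wet-day lognormal distribution function.

```python
import math
from statistics import NormalDist

# Values reported on the slides.
P_WET = 0.32                 # estimated probability of a wet day
MU, SIGMA = -2.49, 1.37      # lognormal parameters for wet-day amounts (inches)
# Assumption (not stated on the slides): 153 days in May-Sept, treated as independent.
DAYS_PER_SEASON = 153


def daily_exceedance_prob(x):
    """P(X_t > x) under the wet/dry mixture with lognormal wet-day amounts."""
    z = (math.log(x) - MU) / SIGMA
    return P_WET * (1.0 - NormalDist().cdf(z))


def return_period_years(x):
    """Average number of years between seasons whose maximum exceeds x."""
    p_season = 1.0 - (1.0 - daily_exceedance_prob(x)) ** DAYS_PER_SEASON
    return 1.0 / p_season


def return_level(years):
    """Level exceeded by the seasonal maximum with probability 1/years."""
    # Solve (1 - P_WET * (1 - F(x))) ** DAYS_PER_SEASON = 1 - 1/years for x.
    daily_non_exceed = (1.0 - 1.0 / years) ** (1.0 / DAYS_PER_SEASON)
    wet_day_quantile = 1.0 - (1.0 - daily_non_exceed) / P_WET
    z = NormalDist().inv_cdf(wet_day_quantile)
    return math.exp(MU + SIGMA * z)


rl_100 = return_level(100)
rp_2013 = return_period_years(9.08)   # Sept 12, 2013 amount from the slides
print(f"100-year return level: {rl_100:.2f} in")
print(f"return period of 9.08 in: {rp_2013:.1f} years")
```

Because the parameters are rounded and the season length is assumed, the printed values should land near, but not exactly on, the slide's 10.42 in and 68.8 years.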

Classical Extremes Approach
Select a subset of extreme data and fit a model from EVT:
- block maxima: GEV
- threshold exceedances: GPD
GEV analysis (annual maxima: only 64 data points!): $\hat{\mu} = 1.35$; $\hat{\sigma} = 0.57$; $\hat{\xi} = 0.15$.
[Figure: QQ plot; axes: model, empirical.]
- 100-year return level estimate: 5.12 (4.06, 7.32); NOAA: 5.52 (4.20, 6.93).
- Return period of the 2013 event, estimated: 1654 years (188.3, 65K).
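As a check on how the point estimates relate to the fitted parameters, the sketch below evaluates the GEV quantile and distribution function at the reported estimates. The function names are illustrative, and the confidence intervals on the slide (which need the likelihood) are not reproduced.

```python
import math

# GEV estimates reported on the slide (annual maxima, inches).
MU, SIGMA, XI = 1.35, 0.57, 0.15


def gev_cdf(z, mu=MU, sigma=SIGMA, xi=XI):
    """P(M <= z) for a GEV with xi != 0."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(years, mu=MU, sigma=SIGMA, xi=XI):
    """Quantile of the annual maximum with non-exceedance probability 1 - 1/years."""
    p = 1.0 - 1.0 / years
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def gev_return_period(z, mu=MU, sigma=SIGMA, xi=XI):
    """Reciprocal of the annual exceedance probability of level z."""
    return 1.0 / (1.0 - gev_cdf(z, mu, sigma, xi))


print(f"100-year return level: {gev_return_level(100):.2f} in")
print(f"return period of 9.08 in: {gev_return_period(9.08):.0f} years")
```

The return period is sensitive to the third decimal of the parameters, so the rounded estimates give a value close to, but not identical with, the 1654 years quoted above.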

Why is the GEV right?
Time out. Why is the normal right for the sample mean?
[Figure: Gamma(.5, 2) density and histogram of sample means (n = 100); as n → ∞ the limit is sum-stable.]
Central Limit Theorem: If the $X_i$ are iid with finite second moments, then
$$S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma}$$
converges to the normal distribution.

Convergence of Sample Maxima
[Figure: Gamma(.5, 2) density and histogram of sample maxima (n = 100); as n → ∞ the limit is max-stable.]
Max-stability: If $Z_1, \ldots, Z_n$ are iid, $M_n = \max_i(Z_i)$, and there exist $a_n$ and $b_n$ such that
$$P\left(\frac{M_n - b_n}{a_n} < z\right) \to F(z),$$
then $F$ is max-stable.
3-Types Theorem: $F$ is Gumbel (light tail), Fréchet (heavy tail), or Weibull (bounded tail). [Fisher and Tippett, 1928]
GEV: The three max-stable types all belong to the Generalized Extreme Value family of distributions.

GEV Distribution
$$P\{M_n \le z\} = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$
where $M_n = \max(Z_1, Z_2, \ldots, Z_n)$.
Generalized Pareto Distribution
Used for threshold exceedances.
$$P\{Z - u > z \mid Z > u\} = \left(1 + \frac{\xi z}{\sigma_u}\right)^{-1/\xi}$$
Return Levels
- Defined in terms of the annual maximum.
- A high quantile of the GEV (easy to calculate).
- Need to account for dependence if using the GPD.
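For the GPD case, one common textbook form of the return level inverts the exceedance probability and scales it by an extremal index to allow for clustering of exceedances. The sketch below uses that form; the slides do not say which adjustment was used, and every numeric value here is hypothetical.

```python
def gpd_survival(z, sigma_u, xi):
    """P(Z - u > z | Z > u) for z >= 0 and xi != 0."""
    t = 1.0 + xi * z / sigma_u
    return t ** (-1.0 / xi) if t > 0.0 else 0.0


def gpd_return_level(years, u, sigma_u, xi, exceed_rate, obs_per_year,
                     extremal_index=1.0):
    """Level exceeded on average once every `years` years.

    exceed_rate is P(Z > u); extremal_index in (0, 1] discounts clustered
    exceedances (1.0 means exceedances are treated as independent).
    """
    m = years * obs_per_year
    return u + sigma_u / xi * ((m * exceed_rate * extremal_index) ** xi - 1.0)


# Hypothetical inputs, chosen only to exercise the functions.
u, sigma_u, xi = 1.0, 0.5, 0.1
exceed_rate, obs_per_year = 0.05, 153

level_indep = gpd_return_level(100, u, sigma_u, xi, exceed_rate, obs_per_year)
level_clustered = gpd_return_level(100, u, sigma_u, xi, exceed_rate,
                                   obs_per_year, extremal_index=0.5)
print(level_indep, level_clustered)
```

With a positive shape parameter, ignoring clustering (extremal index 1) gives the larger return level of the two.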

Fitting GEV Demo
Demo here!
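The live demo is not preserved in these slides. As a stand-in, here is a minimal sketch of fitting a GEV to block maxima. Two loud assumptions: it uses probability-weighted moments (Hosking's approximation) rather than maximum likelihood, because that needs only the standard library, and it fits synthetic maxima simulated from the estimates quoted earlier, since the Boulder record itself is not reproduced here. All function names are illustrative.

```python
import math
import random


def gev_quantile(p, mu, sigma, xi):
    """Inverse GEV distribution function for xi != 0."""
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def fit_gev_pwm(maxima):
    """Probability-weighted-moment estimates (mu, sigma, xi) of a GEV."""
    x = sorted(maxima)
    n = len(x)
    # Unbiased sample probability-weighted moments b0, b1, b2.
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    # Hosking's shape parameter k equals -xi in the notation of these slides.
    c = (2.0 * b1 - b0) / (3.0 * b2 - b0) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c          # approximation; assumes k != 0
    gamma_term = math.gamma(1.0 + k)
    sigma = (2.0 * b1 - b0) * k / (gamma_term * (1.0 - 2.0 ** (-k)))
    mu = b0 + sigma * (gamma_term - 1.0) / k
    return mu, sigma, -k


# Synthetic "annual maxima" drawn from GEV(1.35, 0.57, 0.15) by inversion.
rng = random.Random(2016)
sample = [gev_quantile(rng.uniform(1e-12, 1.0 - 1e-12), 1.35, 0.57, 0.15)
          for _ in range(5000)]

mu_hat, sigma_hat, xi_hat = fit_gev_pwm(sample)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}, xi = {xi_hat:.3f}")
```

The sample here is far larger than the 64 annual maxima in the real record; with only 64 points the shape estimate in particular would be much noisier.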

Summary of Classical Univariate Extremes
Mantra: "Let the tail speak for itself."
Fit only a subset of extreme data because...
- any single distribution is wrong.
- non-extreme data overwhelm the fit, so the tail is poorly fit.
- a large amount of data results in small uncertainties in parameter estimates, which underestimates uncertainty in the tail (model uncertainty is not accounted for).
Use a distribution from extreme value theory because...
- it is asymptotically justified (probability theory).
- it doesn't matter what the underlying distribution is.
- it provides justification for extrapolation into the tail.

Pointwise Extreme Analysis of Model Output
General aim: Characterize climatological/marginal behavior of extremes.
Approach: Fit a GEV (or occasionally a GPD) at each location independently.
Used to:
- Contrast current and future behavior.
- Model how extremes change in time or in terms of some covariate.
- Compare extreme behavior produced by different climate models.
- There is a desire to use the results for future impacts studies (likely requires downscaling).
Examples appear regularly in the climate science literature.

Example Study: Fix et al. (2016)
- Output from CESM1 (NCAR's CMIP5 coupled model).
- Focus only on the continental US.
- Historical period: 1920-2005, transient run with observed forcings.
- Future period: 2005-2080 (2100) under RCP8.5 and RCP4.5 projections.
- Unique aspect: output from an initial condition experiment. 30 ensemble members for historical/RCP8.5; 15 ensemble members for RCP4.5.
- Somewhat unique statistical modeling aspect: pattern scaling.

Conditional GEV Model
$$G_{s,x(t)}(y) = \exp\left\{-\left[1 + \xi(s)\,\frac{y - \mu(s, x(t))}{\sigma(s, x(t))}\right]^{-1/\xi(s)}\right\},$$
$$\mu(s, x(t)) = \mu_0(s) + \mu_1(s)\,(x(t) - x(2005)),$$
and
$$\phi(s, x(t)) := \log(\sigma(s, x(t))) = \phi_0(s) + \phi_1(s)\,(x(t) - x(2005)).$$
- Covariate is global mean temperature $x(t)$.
- Global mean temperature is approximately 14.5 °C in 2005 and 17.9 °C in 2080 under RCP8.5 (16.4 °C under RCP4.5).
- Model locations $s$ independently.
- Intercept at year 2005.
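To make the role of the covariate concrete, the sketch below evaluates a 1% AEP level at the 2005 temperature and then asks how often that same level is exceeded at the 2080 temperatures. The temperatures come from the slide; the grid-cell parameter values are hypothetical placeholders, not estimates from Fix et al. (2016), and the function names are illustrative.

```python
import math

X_2005 = 14.5                                  # global mean temperature (deg C), from the slide
X_2080 = {"RCP8.5": 17.9, "RCP4.5": 16.4}      # from the slide

# Hypothetical parameters for a single grid cell s (NOT values from the study).
CELL = {"mu0": 40.0, "mu1": 2.0, "phi0": 2.5, "phi1": 0.05, "xi": 0.1}


def gev_params(x, cell):
    """GEV parameters at global mean temperature x; sigma uses a log link."""
    dx = x - X_2005
    mu = cell["mu0"] + cell["mu1"] * dx
    sigma = math.exp(cell["phi0"] + cell["phi1"] * dx)
    return mu, sigma, cell["xi"]


def aep_level(aep, x, cell):
    """Level with annual exceedance probability `aep` at temperature x."""
    mu, sigma, xi = gev_params(x, cell)
    return mu + sigma / xi * ((-math.log(1.0 - aep)) ** (-xi) - 1.0)


def aep_of_level(y, x, cell):
    """Annual exceedance probability of level y at temperature x."""
    mu, sigma, xi = gev_params(x, cell)
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:
        return 1.0 if xi > 0 else 0.0          # y lies outside the GEV support
    return 1.0 - math.exp(-t ** (-1.0 / xi))


level_2005 = aep_level(0.01, X_2005, CELL)
for scenario, temp in X_2080.items():
    aep = aep_of_level(level_2005, temp, CELL)
    print(f"{scenario}: 2005 1% AEP level has AEP {100 * aep:.2f}% in 2080")
```

Evaluating the fitted model at a temperature taken from a different scenario, as the loop does for RCP4.5, is the pattern-scaling idea in miniature: the model is a function of the covariate only, so any scenario that supplies the covariate can be plugged in.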

Back to Case Study: Problem of Pointwise Analysis
[Figure: maps of $\hat{\xi}(s)$ over the continental US (latitude vs. longitude). Left: fit of a single ensemble member. Right: fit from all ensemble members.]
Takeaway: The ensemble yields an artificially large data set: 30 annual maxima for each year!

Parameter Estimates/SEs: $\mu_0$, $\mu_1$
[Figure: four map panels (a)-(d) over the continental US (latitude vs. longitude).]
Note: All results are "according to this climate model."

Parameter Estimates/SEs: $\phi_0$, $\phi_1$
[Figure: four map panels (e)-(h) over the continental US (latitude vs. longitude).]

Estimates for 1% AEP: RCP 8.5
[Figure: four map panels over the continental US (latitude vs. longitude).]
(a) 2005 (mm); (b) RCP8.5 2080 (mm); (c) estimated percent increase 2005-2080; (d) the AEP (%) in 2080 corresponding to the 1% AEP level in 2005 (an 8% AEP is a 12.5-year event).

Estimates for 1% AEP: RCP 4.5
[Figure: two map panels (latitude vs. longitude) and two time-series panels (1% AEP level in mm vs. year).]
(a) Estimated percent increase 2005-2080 under RCP4.5; (b) magnitude decrease (mm) of RCP4.5 relative to RCP8.5; (c) trajectories of the estimated 1% AEP level for RCP4.5 (blue) and RCP8.5 (red) at Charlotte, NC; (d) the same at Fort Collins, CO.

Pattern Scaling for Extremes
Q: What is extreme behavior under a different projection?
A1: Run the GCM under the new projection, get output, apply similar methods, and get answers. But this would be expensive.
A2: Build a conditional model, assume the covariate is adequate for describing behavior, get covariate information under the different projection from a cheap model, and apply the conditional model to the covariate for the new projection.
- This has been done for mean behavior. Novel to apply to extremes.
- The RCP4.5 run allows us to evaluate pattern scaling, where the model is fit only to RCP8.5 output.
- Global mean temperatures in RCP4.5 are contained in the range of RCP8.5.

Evaluating Pattern Scaling
[Figure: (a) map, latitude vs. longitude; (b) densities vs. annual maximum daily precipitation (mm).]
(a) P-values for grid cells where the Anderson-Darling test rejects the null hypothesis that the annual precipitation maxima come from the pattern-scaled GEV distribution. (b) For a Colorado grid cell (p-value < 0.001), a comparison of the densities of the pattern-scaled GEV distribution (black) and the GEV distribution fitted directly to the RCP4.5 precipitation maxima (red).

Evaluating Spatial Smoothing Methods
[Figure: four map panels over the continental US (latitude vs. longitude).]
Comparison of $\hat{\xi}(s)$: (a) using all ensemble members; (b) one ensemble member; (c) spatially smoothed estimate from one ensemble member (borrowing strength); (d) bias.

Borrowing Strength Across Locations
Cooley and Sain (2010)
Data: Output from an RCM for the western US, 2500 locations. Both control and future runs modeled simultaneously.
Hierarchical Model
- Data level: point process model for threshold exceedances. Conditional independence assumed.
- Process level: multivariate IAR model for (μ, σ, ξ). Precision matrix Q has dimension 14784 x 14784.
- 29598 (non-independent) parameters; effective number 4250.
Inference via Gibbs sampler.

What is gained?
[Figure: maps of ξ (latitude vs. longitude), MLE and BHM.]
Less noisy. Still not great. The simple spatial model has limited range.

Do the RCMs get extremes right?
What is meant by "right"? Possibilities:
- Do RCMs get marginal distributions right? Maybe not (spatial averaging, heavy tails, tuned for mean behavior). But even if the marginal isn't right, one could (still) downscale.
- Do RCMs produce extreme behavior when they should? When (large-scale) conditions are right for extremes, do the RCMs produce extremes? The marginal is less important; correspondence is important. Perhaps answering: Does downscaling make sense?
For the second question, we assess the tail dependence between the RCM output and observations using a bivariate framework.

NARCCAP Project
Multi-model RCM comparison project.
[Figure: NARCCAP domain and elevation.]
Recall: RCMs are driven by large-scale boundary conditions.
Note for later: Hawaii is outside the RCM domain.

Our Studied RCMs: NARCCAP Project
Drivers: CCSM, CGCM3, GFDL, HadCM3, NCEP.
RCM      Runs
WRFG     P/F, P/F, P
ECP2     P/F, P
CRCM     P/F, P/F, P
MM5I     P/F, P/F, P
RegCM3   P/F, P/F, P
HRM3     P/F, P/F, P
P = present, F = future.
- GCM-driven runs: present (1981-2003), future (2041-2070).
- Future runs: A2 emissions scenario.
- NCEP reanalysis driven runs (1979-2004); should exhibit correspondence with observations.
- Observations: high-resolution gridded product (Maurer et al., 2002).

What is the Pineapple Express?
- PE: atmospheric rivers hitting the west coast in winter.
- Often bring heavy rain and warm temperatures.
- Great impact on water resources of the western US.

Pacific Coast Study Region and Data
[Figure: February 7, 1996.]
- Data: max of daily precipitation "footprints", $(200\,\text{km})^2$.
- Bivariate pairs $(X_{jt}, Y_t)$ of output from model $j$ and obs.
- Do not require the locations of footprints to coincide.
- Note different spatial resolutions.
- RCM and NCEP show evidence for extreme precip above.

Marginal Behavior
Model   $u_j$   $\hat{\psi}_j$ (se)   $\hat{\xi}_j$ (se)   $\hat{x}_{j,20}$ (CI)    $\hat{x}_{j,50}$ (CI)
CRCM    863     172.5 (21.6)     0.02 (0.09)     102.3 (93.0, 125.7)     111.3 (98.6, 148.0)
ECP2    1129    325.9 (43.8)     0.04 (0.10)     157.4 (140.5, 203.5)    172.5 (149.4, 245.3)
HRM3    1032    273.9 (32.3)     0.13 (0.08)     124.5 (115.6, 145.8)    132.5 (114.2, 161.6)
MM5I    1026    246.7 (33.3)     0.11 (0.10)     159.0 (135.0, 222.5)    184.0 (148.3, 293.9)
RegCM   1093    325.2 (42.4)     0.06 (0.10)     151.6 (136.4, 192.4)    165.4 (144.9, 228.7)
WRFG    1086    339.8 (43.2)     0.06 (0.09)     153.8 (138.4, 193.1)    167.7 (147.2, 228.0)
NCEP    46      10.4 (1.2)       0.07 (0.08)     88.3 (81.3, 105.0)      95.1 (86.1, 120.1)
(Obs)   14969   3938.5 (554.6)   0.00 (0.11)     116.1 (102.4, 154.8)    128.8 (109.5, 192.1)
- GPD fit to exceedances of the 0.955 quantile threshold, based on exploratory analysis.
- RCMs have relatively consistent estimates (CRCM lower).
- Mostly negative point estimates for ξ; obs ≈ 0.0.
- Return level estimates are high for RCMs, low for NCEP. Downscaling still needed?

Assessing Correspondence of Extreme Precip
[Figure: yellow = NCEP; RCMs are other colors; 95% CI for the CRCM model (black, dashed).]
- NCEP exhibits tail dependence with obs ($\hat{\chi} \approx 0.35$).
- RCMs exhibit quite strong tail dependence ($\hat{\chi} \approx 0.5$).
- RCMs an improvement over NCEP(?)
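The tail dependence coefficient χ is the limit, as u → 1, of P(F_X(X) > u | F_Y(Y) > u). A simple empirical version at a fixed u can be sketched as below. This is a plain rank-based estimator for illustration, not necessarily the estimator behind the values on the slide, and the data are synthetic.

```python
import random


def empirical_cdf_values(values):
    """Rank-transform a sample to (0, 1) using rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank / (n + 1)
    return ranks


def chi_hat(x, y, u):
    """Empirical P(F_X(X) > u | F_Y(Y) > u) for paired samples x and y."""
    fx = empirical_cdf_values(x)
    fy = empirical_cdf_values(y)
    both = sum(1 for a, b in zip(fx, fy) if a > u and b > u)
    y_only = sum(1 for b in fy if b > u)
    return both / y_only


# Synthetic pairs: a shared heavy-tailed signal plus independent noise,
# standing in for (model output, observation) pairs.
rng = random.Random(7)
signal = [rng.paretovariate(2.0) for _ in range(5000)]
model_output = [s + rng.gauss(0.0, 0.5) for s in signal]
observations = [s + rng.gauss(0.0, 0.5) for s in signal]

for u in (0.90, 0.95, 0.98):
    print(u, round(chi_hat(model_output, observations, u), 3))
```

In practice the estimate is examined over a range of u, because values that decay toward zero as u increases point to asymptotic independence, while values that level off above zero point to tail dependence.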

Modeling Tail Dependence
[Figure: three panels: original scale, Fréchet scale, angular measure estimate.]

Spatial Discrepancy
Model   E      N      $p_{d>500}$
CRCM    0.12   0.23   0.05
ECP2    0.38   0.69   0.05
HRM3    0.45   2.21   0.18
MM5I    0.15   0.39   0.07
RegCM   0.24   0.68   0.11
WRFG    0.06   0.70   0.09
NCEP    1.50   1.34   0.11
Median location biases and proportion of events with separation > 500 km.
- Most models show little spatial bias (HRM3).
- Few events with spatial separation of > 500 km.
Overall conclusions, Pacific Coast:
- Both NCEP and RCMs show correspondence with obs.
- They "get extremes right": they produce extreme precip when they should.
- Downscaling still likely needed.

Corn Belt Summer Precipitation
[Figure: June 16, 1990.]

Corn Belt Summer Precipitation
- Topographically more simple.
- Further from boundary conditions, convective precip.
- Footprint: $(100\,\text{km})^2$.

Marginal Behavior
Model   $u_j$   $\hat{\psi}_j$ (se)   $\hat{\xi}_j$ (se)   $\hat{x}_{j,20}$ (CI)    $\hat{x}_{j,50}$ (CI)
CRCM    153     51.1 (5.9)       0.03 (0.07)     94.2 (84.2, 116.8)      104.2 (91.2, 139.0)
ECP2    220     72.7 (9.3)       0.06 (0.09)     153.0 (130.8, 202.2)    175.3 (144.1, 267.2)
HRM3    230     142.7 (18.1)     0.10 (0.09)     191.9 (169.1, 249.8)    211.6 (181.9, 299.7)
MM5I    237     85.9 (10.0)      0.12 (0.08)     136.8 (124.9, 163.9)    147.5 (132.7, 187.2)
RegCM   364     108.2 (14.5)     0.07 (0.10)     240.5 (204.6, 331.8)    275.4 (223.7, 429.4)
WRFG    280     67.8 (10.1)      0.15 (0.12)     184.6 (151.4, 280.0)    217.5 (166.8, 391.7)
NCEP    59      11.9 (1.7)       0.14 (0.10)     97.9 (91.5, 114.1)      103.5 (95.3, 128.0)
(Obs)   3939    964.3 (119.1)    -0.05 (0.09)    99.0 (89.8, 121.8)      107.7 (95.4, 142.8)
- Threshold at the empirical 0.94 quantile.
- Thresholds, GPD parameters, and return levels are less consistent.
- Obs. have a negative shape parameter estimate.

Tail Correspondence
- χ near zero. Reject χ = 1.
- My conclusion: NCEP/RCM output is asymptotically independent of obs.
- "Models do not produce their most extreme behavior when conditions are such that we see largest obs."
- Also: large spatial discrepancy: 40% > 500 km.

Summary
- Extremes methods are specifically designed for studying the tail. Use only data in the tail. Use models justified for the tail. This includes dependence!
- Pointwise extremes analysis of climate model output.
- Assessing tail dependence/correspondence between obs and output. Pacific Coast winter extreme precipitation showed correspondence. Corn Belt summer precip did not.

References
Cooley, D. and Sain, S. R. (2010). Spatial hierarchical modeling of precipitation extremes from a regional climate model. Journal of Agricultural, Biological, and Environmental Statistics, 15:381-402.
Dettinger, M. (2004). Fifty-two years of "pineapple-express" storms across the west coast of North America. US Geological Survey, Scripps Institution of Oceanography for the California Energy Commission, PIER Project Rep. CEC-500-2005-004.
Fix, M., Cooley, D., Sain, S., and Tebaldi, C. (2016). A comparison of US precipitation extremes under RCP8.5 and RCP4.5 with application to pattern scaling. Climatic Change.
Maurer, E., Wood, A., Adam, J., Lettenmaier, D., and Nijssen, B. (2002). A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States. Journal of Climate, 15(22):3237-3251.
Weller, G., Cooley, D., and Sain, S. (2012). An investigation of the pineapple express phenomenon via bivariate extreme value theory. Environmetrics, 23:420-439.
Weller, G., Cooley, D., Sain, S., Bukovsky, M., and Mearns, L. (2013). Two case studies of NARCCAP precipitation extremes. Journal of Geophysical Research: Atmospheres, 118:10475-10489.

Aside: Conditional/Non-stationary/Regression Extremes Models Geoscientists love this approach. They are in danger of loving it to death.

Aside: Conditional Extremes Models
The way I think about it:
Assume $X_{b,t} \mid Y_b$ are identically distributed for $t = 1, \ldots, n$, for each block $b = 1, \ldots, B$.
If $n$ is large enough, the distribution of $M_{b,n} = \max_{t=1,\ldots,n} X_{b,t} \mid Y_b$ should be well approximated by a GEV. Then,
$$[M_{b,n} \mid Y_b] \sim \text{GEV}(\mu = f_\mu(Y_b),\ \sigma = f_\sigma(Y_b),\ \xi = f_\xi(Y_b)),$$
and
$$[X_{b,t} \mid X_{b,t} > u_b, Y_b] \sim \text{GPD}(\psi_{u_b} = f_\psi(Y_b),\ \xi = f_\xi(Y_b)),$$
where $u_b$ is a suitable threshold given $Y_b$.
Moral 1: Condition first on a covariate, then extract the subset of extreme data (given the covariate).
Moral 2: The covariate (like ENSO) must change more slowly than the dependent variable (like precip).