(Regional) Climate Model Validation Francis W. Zwiers Canadian Centre for Climate Modelling and Analysis Atmospheric Environment Service Victoria, BC
Outline - three questions What sophisticated validation methods can be employed in evaluating regional climate? How do we quantify the value added (or lost) by downscaling when compared with raw GCM output? Can (dynamical) downscaling improve the simulation of extreme events relative to GCMs?
Question 1 What sophisticated validation methods can be employed in evaluating regional climate? Are objective skill measures that were mostly developed for large-scale diagnostics adequate for regional applications? Implicit in these questions is the suspicion that current large-scale techniques are not up to the task: they are data hungry and perhaps not powerful enough, and we are not sure they can tell us how models differ from reality (or from each other)
Question 1 What are the objectives of model assessment? do we reject a model if it is significantly different from observations? would we have any models left? Can we set standards for acceptable model performance? application specific some users will have very stringent standards but useful to consider biomedical science analogy is a rabbit a suitable animal model? Need to accept a rabbit for what it is
Question 1 What can statistics do? Identify and assess differences (model vs obs, model vs model) in the context of uncontrolled variability relative to a specified standard usual standard - no difference is acceptable! What are the tools? local assessment (at a grid point or station) multivariate analysis assessment of spatial variability assessment of temporal variability pattern correlation evaluate hindcast and forecast skill
Tools Local assessments paired difference tests may be appropriate e.g., to compare different kinds of nesting what does a significant difference mean? - precipitation - large scale (e.g., MSLP) - surface temperature Multivariate analysis requires estimates of covariance structure spatial modelling may be necessary, but not easy
Tools Assess spatial variability/covariability EOF, EEOF, REOF, MCA(SVD), etc. same problems as large scale simple structure not necessarily physical structure data sets too small inference tools very limited no rational selection rules EOFs adapt to data from which they are estimated difficult to compare observed with model need to do common EOF analyses
EOFs adapt... Eigenspectrum of annual mean surface temperature from CGCM1 control run - 40S to 90N Eigenspectrum from last 50-year chunk Mean pseudo-eigenspectrum from 19 remaining 50-year chunks Graphic by Slava Kharin
Figure Caption From Presentation Notes Figure 1: This diagram illustrates the adaptation of EOFs to the sample from which they are estimated. A 1000-year control run performed with the first generation CCCma coupled model (CGCM1) was divided into 50-year chunks. EOFs and eigenvalues of annual mean surface temperature were estimated from the last 50-year chunk. The domain of analysis used for this calculation covers 90N to 40S, roughly mimicking the coverage of observations during the 2nd half of the 20th century. The black bars illustrate the resulting eigenvalue distribution. The data in each of the other chunks were projected onto the EOFs obtained from the last chunk. The variance of each of the resulting pseudo-PCs was computed for each chunk. For each EOF, the 19 pseudo-PC variances obtained in this way were averaged. The result is illustrated by the red bars. The message is that the EOFs estimated from one sample explain less variance in other, independent samples from the same process. This occurs because the EOFs are tuned to best fit the specific realization of variability that is contained in the sample from which they were obtained. Consequently, it is not a good idea to compare samples from two climates (say observed and climate model simulated) by fitting EOFs to one sample (observations), and then comparing the variance explained by each of the EOFs in that sample with the variance explained in the second (model simulated) sample. There is a real danger that such a comparison will erroneously conclude that the second sample contains less variance in the direction of each of the EOFs, and therefore that the second climate is inferior.
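The overfitting effect described in Figure 1 is easy to reproduce with synthetic data. The sketch below is illustrative only: a toy 20-dimensional field stands in for the CGCM1 surface temperature field, and the chunking mimics the 50-year blocks. EOFs are fitted to one chunk and the variance they explain in the remaining chunks is measured via pseudo-PCs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 1000-year control run: 1000 "annual means"
# of a 20-dimensional field with a fixed covariance structure.
n_time, n_space = 1000, 20
mixing = np.eye(n_space) + 0.3 * rng.standard_normal((n_space, n_space))
data = rng.standard_normal((n_time, n_space)) @ mixing.T

chunks = np.split(data, 20)                      # twenty 50-sample chunks
last = chunks[-1] - chunks[-1].mean(axis=0)      # anomalies of last chunk

# EOFs of the last chunk: rows of vt from the SVD of its anomalies.
_, s, vt = np.linalg.svd(last, full_matrices=False)
eigvals = s**2 / (last.shape[0] - 1)             # in-sample eigenspectrum

# Project every other chunk onto these EOFs and average the pseudo-PC
# variances (the analogue of the red bars in Figure 1).
pseudo = np.mean(
    [((c - c.mean(axis=0)) @ vt.T).var(axis=0, ddof=1) for c in chunks[:-1]],
    axis=0,
)

# The leading EOFs are tuned to the chunk they were fitted to, so they
# typically explain more variance there than in the independent chunks.
print(eigvals[:3], pseudo[:3])
```

Because the EOF basis is orthonormal, the in-sample eigenvalues sum to the total variance of the fitting chunk; the overfitting shows up in how that variance is concentrated in the leading modes.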
Barnett (1999) variability of annual mean temperature common EOF analysis of first 100 years of 11 CMIP1 runs caution - drift not removed! Observations and 95% confidence interval
Figure Caption From Presentation Notes Figure 2: The problem described in Figure 1 is avoided by performing a common EOF analysis. Such an analysis is performed by removing the sample means from the individual samples to be intercompared, combining the resulting anomalies into a large super sample, finding the EOFs of this super sample, and then calculating the variance explained by each of the EOFs in the individual samples. This diagram illustrates the result of such an analysis using annual mean surface temperature simulated by 11 coupled models participating in CMIP1. Diagram from Barnett (1999, J. Climate).
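The recipe in the caption maps directly onto a few lines of numpy. The function name `common_eofs` and the toy "observed" and "model" samples below are hypothetical stand-ins, not part of the Barnett analysis.

```python
import numpy as np

def common_eofs(samples):
    """Common EOF analysis: remove each sample's own mean, pool the
    anomalies into one super sample, take the EOFs of the pooled data,
    then report the variance each EOF explains within each sample."""
    anoms = [s - s.mean(axis=0) for s in samples]
    pooled = np.concatenate(anoms, axis=0)
    _, _, vt = np.linalg.svd(pooled, full_matrices=False)  # rows = EOFs
    explained = np.array([(a @ vt.T).var(axis=0, ddof=1) for a in anoms])
    return vt, explained

rng = np.random.default_rng(1)
obs = rng.standard_normal((100, 12))               # toy "observations"
model = 1.5 * rng.standard_normal((100, 12))       # toy "model" run
eofs, explained = common_eofs([obs, model])
print(explained.shape)   # one row of explained variances per sample
```

Because every sample is projected onto the same basis, the per-sample variances are directly comparable, which is exactly the property that the single-sample EOF comparison in Figure 1 lacks.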
Tools Assess temporal variability/covariability eigen-analyses SSA, MSSA, EEOF, CEOF, POP, etc. time domain modelling Box-Jenkins Markov processes (e.g., for precipitation occurrence) classification / weather typing frequency domain analysis can ask if model produces observed variability on some time/space scale
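As one concrete example of the time domain tools listed above, a first-order two-state Markov chain for precipitation occurrence can be fitted simply by counting transitions. The helper `fit_markov_occurrence` below is a minimal illustrative sketch, not a reference to any particular package.

```python
import numpy as np

def fit_markov_occurrence(wet):
    """Estimate the transition probabilities of a first-order two-state
    Markov chain for precipitation occurrence (0 = dry day, 1 = wet day)."""
    wet = np.asarray(wet)
    prev, curr = wet[:-1], wet[1:]
    p01 = curr[prev == 0].mean()   # P(wet today | dry yesterday)
    p11 = curr[prev == 1].mean()   # P(wet today | wet yesterday)
    return p01, p11

# Toy daily occurrence record: wet and dry spells
record = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
p01, p11 = fit_markov_occurrence(record)
print(p01, p11)   # 0.4 and 0.6 for this record
```

Comparing the fitted (p01, p11) pairs for observed and simulated records is one simple way to test whether a model reproduces the observed persistence of wet and dry spells.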
Figure Caption From Presentation Notes Figure 3: An illustration of a frequency domain analysis. Illustrated is the estimated power spectrum of global annual mean temperature simulated in 9 long control runs with coupled general circulation models. The solid black curve is a corresponding estimate obtained from observations and the dotted black line is the estimate obtained after an estimate of the anthropogenic signal is removed from the data. The spectral estimates have varying equivalent bandwidths adjusted in such a way that all estimates have the same 5-95% uncertainty band. The vertical dashed lines indicate the range of time scales (10-60 years) most important for detection and attribution studies. Models that simulate significantly less variance than observed on these time scales are indicated by an asterisk. From IPCC WG1 Third Assessment Report (2001, Ch 12, Figure 12.2).
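A bare-bones version of such a frequency domain comparison is the raw periodogram. The sketch below is illustrative only: a toy AR(1) series stands in for annual global mean temperature, and the bandwidth adjustment described in the caption is omitted.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Raw periodogram of a demeaned series.  In practice the estimate
    would be smoothed (or chunk-averaged) before comparing spectra."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    spec = np.abs(np.fft.rfft(x))**2 * dt / n
    freqs = np.fft.rfftfreq(n, d=dt)
    return freqs, spec

rng = np.random.default_rng(6)
# Toy red-noise "annual global mean temperature": AR(1), 500 years
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()

freqs, spec = periodogram(x)   # frequencies in cycles per year
```

Plotting `spec` against `1 / freqs[1:]` shows the variance on each time scale, which can then be compared between model runs and observations on the 10-60 year band highlighted in Figure 3.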
Tools Pattern correlation can include time evolution large gradients in space-time may be a problem assess uncertainty/significance with resampling techniques and with ensembles
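One way to attach uncertainty to a pattern correlation, as suggested above, is to bootstrap over ensemble members. The helper names below are hypothetical and the fields are synthetic; this is a sketch of the resampling idea, not a standard package routine.

```python
import numpy as np

def pattern_correlation(obs, sim):
    """Centred pattern correlation between two fields (flattened)."""
    o = obs.ravel() - obs.mean()
    s = sim.ravel() - sim.mean()
    return float(o @ s / np.sqrt((o @ o) * (s @ s)))

def bootstrap_ci(obs, ensemble, n_boot=1000, seed=0):
    """Rough 95% bootstrap interval for the mean pattern correlation,
    obtained by resampling ensemble members with replacement."""
    rng = np.random.default_rng(seed)
    members = np.array([pattern_correlation(obs, m) for m in ensemble])
    boots = [rng.choice(members, size=len(members), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(boots, [2.5, 97.5])

rng = np.random.default_rng(2)
truth = rng.standard_normal((10, 10))                       # toy "obs" field
ens = [truth + 0.5 * rng.standard_normal((10, 10)) for _ in range(8)]
lo, hi = bootstrap_ci(truth, ens)
```

With only a handful of ensemble members the interval is wide, which is itself a useful reminder of how much sampling uncertainty sits behind a single pattern correlation value.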
Figure Caption From Presentation Notes Figure 4: An example of the BLT diagram that can be used to intercompare pattern correlation and related information. This particular diagram (from Lambert and Boer, 2001, Climate Dynamics) intercompares the surface air temperature climatologies of models participating in CMIP1. The pink radii labeled correlation indicate the pattern correlation between observed and simulated climatologies. The horizontal scale indicates the mean squared difference between the climatologies, and the vertical scale compares the spatial variance of the model simulated climatology with the observed spatial variance. A perfect model would be located at the red dot. Dots illustrate the quality of the model simulation of the meridional structure (zonal means) of DJF mean temperature. Coloured labels indicate models without flux adjustment. Performance is uniformly good because it is easy to reproduce the gross meridional temperature structure of the observed climate. Triangles illustrate the quality of the model simulation of the eddy (pattern) structure that remains when zonal means are removed from the climatological distribution of DJF mean temperature. This statistic discriminates more effectively between models (for example, flux adjusted models tend to perform better) because correlations in this calculation are not dominated by the large (and easy to obtain) pole to equator temperature gradient. Note that in both cases, the ensemble mean simulation outperforms individual simulations.
Figure Caption From Presentation Notes Figure 5: As Figure 4, except for DJF precipitation.
Tools Evaluate hindcast and forecast skill seasonal forecasting climate change detection/attribution turns out to be multiple linear regression T_t = \sum_{i=1}^{p} a_i S_{i,t} + N_t where a_i is the amplitude of the i-th signal
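A toy version of this regression, with two synthetic signal time series and ordinary (rather than generalized) least squares, might look like the following; all numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic illustration of the regression T_t = sum_i a_i * S_it + N_t:
# two known signal time series, true amplitudes (1.0, 0.5), plus noise.
n_t = 200
S = rng.standard_normal((n_t, 2))          # columns: the signals S_it
a_true = np.array([1.0, 0.5])
T = S @ a_true + 0.1 * rng.standard_normal(n_t)

# Ordinary least squares estimate of the amplitudes.  Detection studies
# use *generalized* regression, weighting by an estimated covariance of
# internal variability; plain OLS is the unweighted special case.
a_hat, *_ = np.linalg.lstsq(S, T, rcond=None)
print(a_hat)   # close to (1.0, 0.5) for this low-noise example
```

Detection then amounts to asking whether the estimated amplitudes are significantly greater than zero, and attribution to asking whether they are consistent with one.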
[Figure: model estimated signal (left column) and observed anomalies (right column) for the decades 1946-56, 1956-66, 1966-76, 1976-86, and 1986-96.]
Figure Caption From Presentation Notes Figure 6: The diagram illustrates schematically the extended 5-decade observation (on the right) and corresponding model simulated signal pattern (on the left) that are matched by generalized multiple linear regression in modern detection studies (such as Stott et al, 2000, Science). Signal uncertainty resulting from internal climate variability is reduced by decadal averaging, by averaging across an ensemble of transient simulations, and by considering only the largest global scales. The mask that appears in the animation of this diagram illustrates the much greater challenge posed by detection on the regional scale.
Summary What sophisticated validation methods can be employed in evaluating regional climate? Need to decide what constitutes an acceptable model We have many validation methods (statistical and physical) Need to use methods that are well understood so that physical interpretation is not obscured Can't avoid cost - long runs and ensembles
Question 3 Can downscaling improve the simulation of extreme events relative to GCMs? How do we demonstrate this, and what measures can be used? What should we expect? How do GCMs do? How might RCMs improve, and how might we tell?
Figure Caption From Presentation Notes Figure 7: Daily precipitation as observed at the Toronto Airport during a 2-year period and as simulated by two generations of the CCCma atmospheric general circulation model at a grid point near Toronto. The diagram demonstrates that these GCMs simulate precipitation variability that is similar to that which is observed. The more recent version of the model (AGCM3) appears to produce smaller precipitation extremes at this particular location than the earlier version of the model. Scale considerations suggest that models should simulate smaller precipitation extremes than observed, and that there should be increasing agreement as the resolution of the model increases.
Figure Caption From Presentation Notes Figure 8: As Figure 7, except for a 90-day subset of the two-year record. We see here some suggestion that the models precipitate small amounts of moisture more frequently than observed.
[Figure: CGCM1 simulated precipitation extremes (1975-1995). Upper panel: observed 20-year precipitation events (mm in 24 hours), with 60 mm and 80 mm marked on the scale. Lower panel: simulated 20-year precipitation events (present climate).]
Figure Caption From Presentation Notes Figure 9: Illustration of the ability of a GCM to simulate extreme precipitation. Upper panel: 20-year return values for 24-hour precipitation as estimated from Canadian station data. Lower panel: As above, except as estimated from daily precipitation amounts simulated by the first generation Canadian coupled model (CGCM1) in an ensemble of three transient change simulations (see Boer et al., 2000, Climate Dynamics) forced with observed changes in greenhouse gas concentrations and sulphate aerosol loadings. The period analysed represents 1975-1995. Note that there is some (probably fortuitous) similarity between the return values of the observed and simulated climates. Extreme precipitation events are clearly undersimulated by the model on the west coast of North America (because the model has a very smooth version of the surface topography) and on the eastern seaboard.
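Return values like those mapped above are usually obtained by fitting an extreme value distribution to block (e.g. annual) maxima. The sketch below uses the Gumbel special case with a method-of-moments fit to keep it dependency-free; operational analyses typically fit the full GEV (e.g., by L-moments), and the toy maxima here are synthetic.

```python
import numpy as np

def return_value_gumbel(block_maxima, period=20):
    """T-year return value from a Gumbel fit (method of moments) to a
    series of block maxima.  The Gumbel is the zero-shape-parameter
    special case of the GEV used in practice."""
    x = np.asarray(block_maxima, dtype=float)
    beta = x.std(ddof=1) * np.sqrt(6.0) / np.pi       # scale parameter
    mu = x.mean() - 0.5772156649 * beta               # location (Euler gamma)
    p = 1.0 - 1.0 / period                            # non-exceedance prob.
    return mu - beta * np.log(-np.log(p))

rng = np.random.default_rng(4)
# Toy "annual maximum 24-h precipitation" sample (mm), Gumbel-distributed
maxima = rng.gumbel(loc=30.0, scale=8.0, size=100)
rv20 = return_value_gumbel(maxima, period=20)
```

Applying the same fit to observed and simulated maxima, grid point by grid point, yields maps like the two panels in Figure 9; sampling uncertainty in the fitted return values is substantial with only a few decades of data.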
[Figure: simulated 20-year precipitation events (present climate), with locations where they differ significantly from ERA15 marked as too wet or too dry.]
Figure Caption From Presentation Notes Figure 10: Continued assessment of the ability of a GCM to simulate observed daily precipitation extremes. Upper panel shows the global distribution of estimated 20-year return values of daily precipitation simulated by CGCM1 for present day climate. The lower panel identifies locations where CGCM1 simulated daily precipitation extremes are significantly different from daily precipitation extremes inferred from the ECMWF ERA15 reanalysis.
[Figure: two panels, each labelled T42L18 AMIP2.]
Figure Caption From Presentation Notes Figure 11: Continued assessment of the ability of GCMs to simulate daily precipitation extremes. Upper and lower panels illustrate estimated 20-year return values of daily precipitation from two AMIP2 simulations performed with two closely related climate models. These two models have different parameterizations of convection, and one is tempted to infer that this is the cause of the large difference. However, comparison between the upper model, and an unrelated model using the same convection scheme (not shown) shows that such a conclusion would be premature. This third model has precipitation extremes behaviour that is very similar to that of the lower model above.
Question 3 What should we expect RCMs to improve? Reduction of mean bias Improved stochastic behaviour More realistic variance Better spatial variability Better tail behaviour (i.e., extremes) What would we not expect RCMs to improve? Large scale errors in the climate (and forced response) of the driving model (e.g., an El Niño-like response to GHG forcing).
Question 3 - can we demonstrate? Reduction of mean bias yes Improved stochastic behaviour probably can study threshold crossing less demanding of data More realistic variance maybe tests not as powerful, data not as good
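Threshold-crossing statistics of the sort mentioned above are simple and data-light to compute. The sketch below compares exceedance rates for toy observed and modelled daily precipitation series; the gamma parameters and the 20 mm threshold are purely illustrative.

```python
import numpy as np

def exceedance_rate(series, threshold):
    """Fraction of days on which the series exceeds the threshold - a
    simple statistic for comparing stochastic behaviour that demands
    less data than full distribution or variance comparisons."""
    return float(np.mean(np.asarray(series) > threshold))

rng = np.random.default_rng(5)
# Toy 10-year daily precipitation records (mm): a skewed gamma shape
obs = rng.gamma(shape=0.8, scale=5.0, size=3650)   # toy "observations"
mod = rng.gamma(shape=0.8, scale=4.0, size=3650)   # toy "model", drier tail
rate_obs = exceedance_rate(obs, 20.0)              # days above 20 mm
rate_mod = exceedance_rate(mod, 20.0)
```

A difference in exceedance rates between model and observations can then be tested with standard methods for comparing two proportions, which is considerably less demanding of data than estimating full return-value curves.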
Question 3 - can we demonstrate? Better spatial variability easy and difficult
Figure Caption From Presentation Notes Figure 12: This animated diagram shows that some aspects of the spatial variability of RCMs relative to GCMs are easy to assess. The spatial structure seen in this snapshot from a run with the Canadian RCM is obviously superior to the kinds of spatial structure that may be seen in a typical scene for the same region from the Canadian second generation GCM. Diagram courtesy Rene Laprise and colleagues.
Question 3 - can we demonstrate? Better spatial variability? is there a supporting body of literature? many local features have remote links can RCMs simulate these features? Better tail behaviour (i.e., extremes) maybe need appropriate observations model performance mixed, at best
Summary Can downscaling improve the simulation of extreme events relative to GCMs? Yes - they should Can't avoid cost or uncertainty as you move into the tails Again leads to questions about what constitutes acceptable model performance.