Supplemental material

The multivariate bias correction algorithm presented by Bürger et al. (2011) is based on a linear transformation that is specified in terms of the observed and climate model multivariate means and covariances. In this case, correction of the covariance structure means that the method is guaranteed to fully correct data that follow a multivariate Gaussian distribution. In Bürger et al. (2011), marginal distributions are first transformed to be univariate Gaussian via normalizing probability integral transforms (probit) before applying the multivariate linear transformation; each corrected variable is then subjected to the observed inverse probit to map back to the observed marginal distributions. While multivariate Gaussianity ensures that marginal distributions will be univariate Gaussian, the reverse need not be true; having the marginal distributions be Gaussian does not guarantee that the multivariate distribution will be Gaussian. Hence, it is not known a priori how closely the method will correct the full multivariate distribution in cases where data are not multivariate Gaussian. The same is true of the algorithm.

Comparisons between the multivariate linear bias correction of Bürger et al. (2011), including application of the empirical probit/inverse probit, henceforth referred to as, and the and algorithms presented in the main text are given for the three simple bivariate distributions shown in Figure 1 and described below. For and, results are also presented following a single algorithm iteration to illustrate the potential need for additional iterations to correct both marginal distributions and correlation dependence structure; these single iteration results are denoted and respectively.

Dataset #1: Synthetic univariate Gaussian/non-Gaussian multivariate distributions

The first dataset features synthetic variables that are univariate Gaussian but whose joint distributions are non-Gaussian.
Following Dutta and Genton (2014), the synthetic observational dataset is given by $x_1 = \tilde{x}_1 \, \mathrm{sign}(\tilde{x}_2)$ and $x_2 = \tilde{x}_2 \, \mathrm{sign}(\tilde{x}_1)$, where $\tilde{x}_1$ and $\tilde{x}_2$ are independent and identically distributed draws from a standard Gaussian distribution and $\mathrm{sign}(\cdot)$ is the univariate sign function (i.e., returning +1 for positive values, -1 for negative values, and 0 otherwise). Each of $x_1$ and $x_2$ is therefore a standard Gaussian variable multiplied by an independent random sign, which leaves the marginal distributions Gaussian, while the product $x_1 x_2 = |\tilde{x}_1| |\tilde{x}_2|$ is never negative, so the joint distribution cannot be Gaussian. The first synthetic climate model variable $y_1$ is drawn from a standard Gaussian distribution and the second synthetic climate model variable is given by $y_2 = y_1 \, \mathrm{sign}(\varepsilon_2)$, where $\varepsilon_2$ is also drawn from a standard Gaussian distribution.

Dataset #2: Synthetic Weibull/exponential distributions

The second dataset features variables with non-Gaussian marginal and joint distributions, in this case designed to loosely mimic correlated wet-day precipitation amounts at neighboring locations. The first synthetic observational variable $x_1$ is drawn from a Weibull distribution, $\mathrm{Weibull}(\lambda = 2, k = 1)$, where $\lambda$ and $k$ are the Weibull scale and shape parameters, respectively. The second synthetic observational variable is then given by $x_2 = x_1 + \varepsilon_1$, where $\varepsilon_1 \sim \mathrm{Weibull}(\lambda = 2, k = 1)$. The first synthetic climate model variable $y_1$ is drawn from an exponential distribution, $\mathrm{Exp}(\lambda = 1/2)$, where $\lambda$ is now defined as the exponential distribution rate parameter. The second variable is then given by $y_2 = 5 y_1 + \varepsilon_2$, where $\varepsilon_2 \sim \mathrm{Exp}(\lambda = 1/2)$.
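The two synthetic datasets described above can be generated along the following lines. This is a minimal sketch rather than the code used for the reported experiments: the function names, the seed, and the intermediate names g1 and g2 (the independent standard Gaussian draws) are illustrative assumptions. NumPy's Weibull sampler has unit scale, so draws are multiplied by the scale parameter, and the exponential sampler takes a scale equal to the reciprocal of the rate.

```python
import numpy as np


def make_dataset1(n, rng):
    """Dataset #1: Gaussian marginals with a non-Gaussian joint distribution."""
    # Observations: each variable is a Gaussian draw times the sign of the other draw.
    g1 = rng.standard_normal(n)  # hypothetical name for the first i.i.d. draw
    g2 = rng.standard_normal(n)  # hypothetical name for the second i.i.d. draw
    x1 = g1 * np.sign(g2)
    x2 = g2 * np.sign(g1)
    # Model: y2 is y1 with a random sign taken from an independent Gaussian draw.
    y1 = rng.standard_normal(n)
    eps2 = rng.standard_normal(n)
    y2 = y1 * np.sign(eps2)
    return np.column_stack([x1, x2]), np.column_stack([y1, y2])


def make_dataset2(n, rng):
    """Dataset #2: Weibull observations and exponential model simulations."""
    # Weibull(scale=2, shape=1): rng.weibull draws with unit scale, so rescale by 2.
    x1 = 2.0 * rng.weibull(1.0, n)
    eps1 = 2.0 * rng.weibull(1.0, n)
    x2 = x1 + eps1
    # Exp(rate=1/2): NumPy is parameterized by scale = 1 / rate = 2.
    y1 = rng.exponential(scale=2.0, size=n)
    eps2 = rng.exponential(scale=2.0, size=n)
    y2 = 5.0 * y1 + eps2
    return np.column_stack([x1, x2]), np.column_stack([y1, y2])


rng = np.random.default_rng(2011)  # arbitrary seed, chosen for reproducibility only
obs1, mod1 = make_dataset1(500, rng)
obs2, mod2 = make_dataset2(500, rng)
```

Each function returns an (n, 2) array of observations and an (n, 2) array of model simulations, so a calibration/verification pair can be obtained by calling the function twice with the same sample size.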
[Figure 1 panels: (a) UV Gaussian/non-Gaussian MV (obs.); (b) UV Gaussian/non-Gaussian MV (mod.); (c) Weibull/exponential (obs.); (d) Weibull/exponential (mod.); (e) Surface temperature/humidity (obs.); (f) Surface temperature/humidity (mod.). Axes: x1 and x2 for observations, y1 and y2 for model simulations; tas and huss in panels (e) and (f).]

Figure 1: Pairwise scatterplot, bivariate kernel density estimate histogram, and marginal histograms of (a) synthetic observations and (b) synthetic climate model simulations for dataset #1; (c) synthetic observations and (d) synthetic climate model simulations for dataset #2; and (e) observations and (f) model simulations for dataset #3.
Dataset #3: Surface temperature and specific humidity

The third dataset features 3-hourly near-surface temperature and specific humidity data from the 0.5-deg × 0.5-deg global WFDEI forcing dataset (Weedon et al., 2014) and corresponding dynamically downscaled outputs from the CanRCM4 regional climate model (Scinocca et al., 2016) for the North American NAM-44i 0.5-deg × 0.5-deg domain. The evaluation run from CanRCM4 relies on lateral boundary conditions provided by the ERA-Interim global reanalysis and also employs interior spectral nudging to constrain large scales to respect the reanalysis driving fields. In this case, the WFDEI data serve as the observational reference dataset to which the CanRCM4 outputs are corrected. Data are extracted for a grid point in the Pacific Northwest of North America (121.25°W and 43.25°N). CanRCM4 outputs are available for the period 1989-2009. Outputs are split by month into 12 subsets and are bias corrected separately for each month. The first half of each data subset is used for calibration of the bias correction algorithms and the second half is used as an independent verification dataset.

Comparison results

For the three datasets,,,,, and are fitted to calibration data and applied to independent verification data. Performance on the verification data is summarized via the energy skill score (ESS) taken with respect to univariate quantile mapping (QDM), i.e., without taking into account multivariate Pearson or rank correlation dependence structure. To assess the dependence of performance on sample size, the two synthetic datasets are generated with random calibration/verification samples each of sizes 500, 1500, and 3000 cases; each experiment is repeated over 100 trials. For the third dataset, 12 trials, one for each month, are run. Monthly calibration/verification subsamples each of sizes 868 and 1364 cases, in addition to the entire sample size of 2604 cases, are evaluated.
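As an illustration of how a skill score of this kind can be computed, the sketch below evaluates the sample energy distance between two multivariate samples and converts it to a skill score relative to a reference correction. It assumes the common skill-score form ESS = 1 - D(obs, corrected)/D(obs, reference), with D the squared energy distance 2E||X - Y|| - E||X - X'|| - E||Y - Y'||. The function names are hypothetical, and any rescaling of the variables before distances are taken is left out, so this should not be read as the exact procedure behind Figure 2.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs of rows from a and b."""
    diff = a[:, None, :] - b[None, :, :]  # shape (n_a, n_b, n_variables)
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Squared sample energy distance between samples x and y (rows are cases)."""
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def energy_skill_score(obs, corrected, reference):
    """Skill of `corrected` relative to `reference`; 1 is perfect, 0 is no gain."""
    return 1.0 - energy_distance(obs, corrected) / energy_distance(obs, reference)
```

Under this convention a positive value indicates that the corrected sample lies closer to the observed multivariate distribution than the reference sample does, and a negative value indicates that it lies farther away. The pairwise distance arrays grow with the product of the two sample sizes, so for the largest samples considered here the computation may need to be done in blocks.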
Performance is measured as the median ESS and lower/upper ESS quartiles over the respective trials and is reported in Figure 2. Of the three methods that correct the Pearson correlation dependence structure, outperforms in terms of median ESS for all 9 combinations of dataset and sample size. It also performs better than on datasets #1 and #3, highlighting the benefits of multiple iterations, with similar performance seen on dataset #2. /, which correct the Spearman rank correlation dependence structure, perform best on the non-Gaussian multivariate dataset #1 that features a highly artificial bivariate dependence structure. Results highlight the fact that a wide range of bias correction behavior is possible depending on the specific characteristics of the underlying multivariate distributions. Correction of both marginal distributions and the correlation dependence structure may not be possible using a single iteration algorithm, e.g., or /, which provides evidence for the added value of the and methods.

Figure 2: Median (bars) and lower/upper quartile (vertical lines) verification ESS values for,,,, and algorithms applied to dataset #1 (UV Gaussian/non-Gaussian MV) with (a) 500 cases, (d) 1500 cases, and (g) 3000 cases; dataset #2 (Weibull/exponential) with (b) 500 cases, (e) 1500 cases, and (h) 3000 cases; and dataset #3 (surface temperature/humidity) with (c) 868 cases, (f) 1364 cases, and (i) 2604 cases.

References

Bürger, G., J. Schulla, and A. Werner, Estimates of future flow, including extremes, of the Columbia River headwaters, Water Resources Research, 47(10), doi:10.1029/2010wr009716, 2011.

Dutta, S., and M. G. Genton, A non-Gaussian multivariate distribution with all lower-dimensional Gaussians and related families, Journal of Multivariate Analysis, 132, 82-93, doi:10.1016/j.jmva.2014.07.007, 2014.

Scinocca, J., V. Kharin, Y. Jiao, M. Qian, M. Lazare, L. Solheim, G. Flato, S. Biner, M. Desgagne, and B. Dugas, Coordinated global and regional climate modeling, Journal of Climate, 29(1), 17-35, doi:10.1175/jcli-d-15-0161.1, 2016.

Weedon, G. P., G. Balsamo, N. Bellouin, S. Gomes, M. J. Best, and P. Viterbo, The WFDEI meteorological forcing data set: WATCH Forcing Data methodology applied to ERA-Interim reanalysis data, Water Resources Research, 50(9), 7505-7514, doi:10.1002/2014wr015638, 2014.
Table S1: CMIP5 GCMs used for bivariate bias correction of monthly temperature and precipitation to the CRU TS3.22 dataset. A ditto mark (") indicates the same institution as the row above.

Name | Institution
ACCESS | CSIRO (Commonwealth Scientific and Industrial Research Organisation, Australia), and BOM (Bureau of Meteorology, Australia)
ACCESS1.3 | "
BNU-ESM | College of Global Change and Earth System Science, Beijing Normal University
CCSM4 | National Center for Atmospheric Research
CESM1-BGC | National Science Foundation, Department of Energy, National Center for Atmospheric Research
CESM1-CAM5 | "
CESM1-FASTCHEM | "
CESM1-WACCM | "
CMCC-CESM | Centro Euro-Mediterraneo per I Cambiamenti Climatici
CMCC-CMS | "
CMCC-CM | "
CNRM-CM5 | "
CSIRO-Mk3.6.0 | CSIRO (Commonwealth Scientific and Industrial Research Organisation, Australia), and BOM (Bureau of Meteorology, Australia)
CanESM2 | Canadian Centre for Climate Modelling and Analysis
EC-EARTH | EC-EARTH consortium
FGOALS-g2 | LASG, Institute of Atmospheric Physics, Chinese Academy of Sciences; and CESS, Tsinghua University
FGOALS-s2 | LASG, Institute of Atmospheric Physics, Chinese Academy of Sciences
FIO-ESM | The First Institute of Oceanography, SOA, China
GFDL-CM3 | Geophysical Fluid Dynamics Laboratory
GFDL-ESM2G | "
GFDL-ESM2M | "
GISS-E2-H | NASA Goddard Institute for Space Studies
GISS-E2-R | "
HadCM3 | Met Office Hadley Centre (additional HadGEM2-ES realizations contributed by Instituto Nacional de Pesquisas Espaciais)
HadGEM2-CC | "
HadGEM2-ES | "
HadGEM2-AO | National Institute of Meteorological Research/Korea Meteorological Administration
IPSL-CM5A-LR | Institut Pierre-Simon Laplace
IPSL-CM5A-MR | "
IPSL-CM5B-LR | "
MIROC-ESM-CHEM | Japan Agency for Marine-Earth Science and Technology, Atmosphere and Ocean Research Institute (The University of Tokyo), and National Institute for Environmental Studies
MIROC-ESM | "
MIROC4h | Atmosphere and Ocean Research Institute (The University of Tokyo), National Institute for Environmental Studies, and Japan Agency for Marine-Earth Science and Technology
MIROC5 | "
MPI-ESM-LR | Max Planck Institute for Meteorology (MPI-M)
MPI-ESM-MR | "
MPI-ESM-P | "
MRI-CGCM3 | Meteorological Research Institute
NorESM1-ME | Norwegian Climate Centre
NorESM1-M | "
bcc-csm1.1(m) | Beijing Climate Center, China Meteorological Administration
bcc-csm1.1 | Beijing Climate Center, China Meteorological Administration
inmcm4 | Institute for Numerical Mathematics