
Supplemental Information (SI) README

1. Methodological details

1.1 Overview

The anomalies (i.e., z-scores) of rainfall, maximum temperature (TMAX) and minimum temperature (TMIN) are computed by subtracting each variable's mean and dividing by its standard deviation. Both statistics are computed over a baseline period (1900-1999). A 30-year moving average is then applied to each anomaly series to reveal low-frequency temporal behavior and to suppress high-frequency components. The moving-average plots show two distinct climatic patterns: periodic behavior in rainfall and minimum temperature, and an increasing trend in maximum temperature. The moving-average series for rainfall (observed and GCM-simulated) are de-trended to focus on the periodic pattern of All India Monsoon Rainfall (AIMR). Figure S1 displays the individual observed and modeled time series for the entire country and for several subdivisions of India.

1.2 Metrics

We compare the performance of individual GCMs with that of equally weighted multi-model averages (MMAs). To do so, all combinations of the seven GCMs are evaluated, from single models to sets of 2, 3, ..., 7, for a total of 127 (2^7 - 1) combinations. The Median Absolute Percentage Error (MAPE) is chosen as the metric of model skill in Figure 1; it has been used to evaluate forecasts (e.g., Armstrong and Collopy 1992). It is used instead of the more conventional Mean Absolute Percentage Error because, being a median, it is robust to gross outlying errors. Figure S3 shows the same box plots as Figure 1(c) and 1(f), but using the Mean Squared Error (MSE) and the Mean Absolute Deviation (MAD).

Note that for AIMR, MSE is sensitive to outlying errors. Consequently, BCCR is no longer ranked 1st but 4th overall. However, all three model combinations that outperform BCCR also contain BCCR. The other models in these combinations appear to dampen the effect of the outlying errors, producing lower MSE values.

Tables S1 and S2 summarize information derived from the data displayed in Figures 1 and S3. Table S1 gives, for AIMR under all three metrics, the proportional improvement of the single best GCM over the seven-model MMA, together with the best model combination. Table S2 ranks the MMA among the individual GCMs for AIMR and TMAX under all three metrics.

2.2 Computation of periodicity

To compute the periodicities of rainfall and minimum temperature, a sine curve of the following general form is fitted to each time series:

$$X_{\mathrm{detrended}}(t) = A + C \sin\left(\frac{\pi t}{N} + \phi\right) \qquad (1)$$

Here $X_{\mathrm{detrended}}$ is a de-trended anomaly series and t is time. A, C, N and φ are the parameters of the regression equation. They are estimated by least-squares optimization using the search algorithm Probabilistic Global Search Lausanne (PGSL) (Raphael and Smith 2003). The parameters are interpreted as follows:

- A is the intercept term. It is a consequence of the sinusoidal fit and plays no role in evaluating how well a GCM simulates periodicity.
- C may be considered the amplitude of the wave (periodic time series). A low C value indicates a flat curve with almost no periodicity.
- N denotes the duration of half a cycle of the periodic signal, so one full cycle spans 2N.
- φ indicates the phase of the signal. Given two signals with the same N value, a difference in φ denotes a difference in phase. However, when N differs between two signals, φ by itself does not indicate the phase difference between them.

Probabilistic Global Search Lausanne (PGSL) is a search algorithm for solving linear and nonlinear optimization problems. Tests on benchmark problems with multi-parameter nonlinear objective functions showed that PGSL performs better than genetic algorithms and advanced simulated annealing algorithms (Raphael and Smith 2003). PGSL assumes that better sets of points are more likely to be found in the neighborhood of good sets of points, and it therefore intensifies its search in regions containing good solutions. The procedure runs as follows:

1. It first generates sets of values for the decision variables, sampled uniformly over their ranges, and computes the corresponding objective function values.
2. It then ranks these sets and assigns each decision variable a modified probability distribution before repeating the procedure. The better a solution, the higher the probability density in the region containing it.
3. After a number of iterations, PGSL narrows the domain of the decision variables based on the positions of peaks in the density function.

This continues until convergence, through four cycles: the sampling cycle, the probability updating cycle, the focusing cycle and the sub-domain cycle. Details of the algorithm are given in Raphael and Smith (2003). Because the least-squares problem of interest here has a nonlinear objective function, PGSL is used.
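PGSL itself is not reproduced here. The sketch below illustrates fitting Eq. (1) with a simpler technique swapped in, so it should not be read as the method used in this study. It relies on the identity C sin(πt/N + φ) = a sin(πt/N) + b cos(πt/N), with a = C cos φ and b = C sin φ. Once N is fixed, the model is therefore linear in A, a and b. A grid scan over N, with ordinary linear least squares at each grid point, then yields the least-squares fit. Several choices are assumptions for illustration and are not documented in this SI: the function name `fit_sinusoid`, the N search range (5 to 80), the grid step, and counting t in years from the start of the series.

```python
import numpy as np

def fit_sinusoid(t, x, n_grid=None):
    """Least-squares fit of Eq. (1): x(t) ~ A + C*sin(pi*t/N + phi).

    Swapped-in alternative to PGSL (hypothetical helper): scan the half-period N
    over a grid; for each N, solve the linear problem in (A, a, b), where
    a = C*cos(phi) and b = C*sin(phi).
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    if n_grid is None:
        n_grid = np.arange(5.0, 80.0, 0.05)  # assumed search range and step for N
    best = None
    for N in n_grid:
        theta = np.pi * t / N
        # Design matrix columns: intercept, sin(theta), cos(theta)
        design = np.column_stack([np.ones_like(t), np.sin(theta), np.cos(theta)])
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)
        sse = float(np.sum((x - design @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(N), coef)
    sse, N, (A, a, b) = best
    return {
        "A": float(A),
        "C": float(np.hypot(a, b)),        # amplitude, always >= 0
        "N": N,                            # half-period; full period is 2N
        "phi": float(np.arctan2(b, a)),    # phase in (-pi, pi]
        "mse": sse / len(x),
    }

# Usage with a synthetic, noise-free series loosely modeled on the observed AIMR
# fit; t counts years from the start of the series (an assumption).
t = np.arange(130)
x = 0.03 + 0.26 * np.sin(np.pi * t / 33.7 + 2.7)
fit = fit_sinusoid(t, x)
print(f"N = {fit['N']:.2f}, full period = {2 * fit['N']:.1f} years, phi = {fit['phi']:.2f}")
```

The grid scan is exhaustive within its range, so it cannot stall in a local minimum there, but its resolution in N is limited by the step size. PGSL, by contrast, searches all parameters jointly and concentrates its samples near good solutions, as described above. Note also that the fitted φ depends on where t = 0 is placed.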

3. Notes on Monsoon Periodicity

The onset and withdrawal of Indian summer monsoon rainfall are governed by the difference in tropospheric temperature (DTT) between a north box (30°E-100°E, 10°N-35°N) and a south box (30°E-100°E, 15°S-10°N), with temperature vertically integrated between 200 hPa and 600 hPa (Goswami et al 2006). DTT is also known as the meridional gradient of tropospheric temperature. The monsoon onset occurs when DTT changes sign from negative to positive. Similarly, the withdrawal is marked by DTT changing sign from positive to negative. The area under the positive DTT curve is strongly correlated with AIMR.

The Atlantic Multidecadal Oscillation (AMO) is responsible for negative (positive) tropospheric temperature anomalies over Eurasia. These anomalies weaken (strengthen) the DTT, and in this way the AMO influences AIMR. The periodicity of the AMO is approximately 65-70 years (Schlesinger and Ramankutty 1994; Delworth and Mann 2000; Ottera et al 2010). This is similar to the periodicity of Eurasian temperature and of AIMR (computed here as a 67-year periodicity).

There is empirical evidence (Delworth and Mann 2000) that all-India TMIN in the three months preceding the AIMR season (March-April-May) is a predictor of monsoon rainfall (June-July-August-September). This relationship is likely to have an explanation analogous to that of the relationship between Indian monsoon rainfall and Eurasian temperature discussed above, possibly with similar underlying drivers.

4. Sensitivity of Results to Initial Conditions

Model performance may be sensitive, to varying degrees, to the choice of initial conditions (ICs) (Shukla and Fennessy 1994; Wu et al 2005). For four of the seven GCMs used in this study, multiple IC runs are publicly available. Figure S4 displays AIMR calculated from each available IC run of each of these four GCMs. In many cases, the choice of IC run appears to have a considerable effect on the resulting AIMR time series. One notable exception is PCM, for which three of the four IC runs appear quite similar. AOM also appears relatively robust to the choice of IC.

Figure S5(a-b) is analogous to Figure 1(b-c). The difference is that, for these four GCMs, arbitrarily chosen alternative IC runs replace the runs used to generate the results in Figure 1: CSIRO run 2, AOM run 2, MIROC-MED run 3 and PCM run 4. The results suggest that simulated phase and variability may be sensitive to ICs, which affects model performance as measured by MAPE. In Figure S5, BCCR is ranked 12th among all 127 combinations and is no longer the best individual GCM with respect to MAPE. Instead, CSIRO is ranked as the best individual GCM. The 7-model MMA, however, remains suboptimal, as in Figure 1. This may be related to the poor performance of MIROC-HI, whose IC run is the same as in the manuscript. Unfortunately, only one IC run is available for BCCR, the GCM that appeared to perform well for AIMR in the manuscript. We therefore cannot readily test whether this particular model is affected by the choice of IC.

5. AIMR and TMAX Data

For those who wish to reproduce the results in the main text, the files AIMR.csv and TMAX.csv are available as supplementary files. They contain the 30-year moving average anomalies calculated from the observations and from one run of each of the seven global climate models (the runs used in the main text). In each file, a column header of the form 30MAA_X denotes the 30-year moving average anomalies of dataset X, where X is either the observed data or a GCM. Values of -999999 indicate years for which data are not available.
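The following minimal sketch shows one way to read such a file and to reproduce the processing chain of Section 1.1: z-scores against a 1900-1999 baseline, a 30-year moving average, and de-trending. Several details are assumptions because the SI does not specify them: the helper names, a numeric year column, the trailing alignment of the moving-average window, the sample standard deviation (ddof=1) and the use of a linear trend.

```python
import csv

import numpy as np

MISSING = -999999.0  # sentinel for years without data, as stated in Section 5

def load_columns(lines):
    """Parse CSV text (a file object or any iterable of lines) into
    {column header: float array}, converting the -999999 sentinel to NaN.

    Hypothetical helper; assumes every column (e.g. a year column and the
    30MAA_X columns) is numeric.
    """
    reader = csv.DictReader(lines)
    data = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = float(row[name])
            data[name].append(np.nan if value == MISSING else value)
    return {name: np.asarray(values) for name, values in data.items()}

def zscore_anomalies(years, values, base=(1900, 1999)):
    """Standardize with the mean and standard deviation of the baseline period."""
    in_base = (years >= base[0]) & (years <= base[1])
    mean = np.nanmean(values[in_base])
    std = np.nanstd(values[in_base], ddof=1)  # sample std: an assumption
    return (values - mean) / std

def moving_average(x, window=30):
    """Trailing moving average (the window alignment is an assumption).

    out[i] is the mean of x[i-window+1 : i+1]; the first window-1 entries are NaN.
    """
    out = np.full(len(x), np.nan)
    if len(x) >= window:
        out[window - 1:] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out

def linear_detrend(years, x):
    """Subtract a least-squares straight line fitted to the non-NaN entries."""
    ok = ~np.isnan(x)
    slope, intercept = np.polyfit(years[ok], x[ok], 1)
    return x - (slope * years + intercept)
```

For a rainfall series, the steps chain as `linear_detrend(years, moving_average(zscore_anomalies(years, raw)))`, following the order described in Section 1.1, which applies the de-trending step to the rainfall series.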

SI README References

Armstrong, J. S., & Collopy, F. 1992, Int J Forecasting, 8, 69-80
Raphael, B., & Smith, I. F. C. 2003, J App Math Comp, 146(2-3), 729-758
Shukla, J., & Fennessy, M. 1994, in Proceedings of the International Conference on Monsoon Variability and Prediction, Tech. Rep. WCRP-84, World Climate Research Programme, 567-575
Wu, W., Lynch, A. H., & Rivers, A. 2005, J Clim, 18, 917-933

Supplementary Figure Captions

Figure S1 Trends and periodicity in temperature and monsoon rainfall for all-India and for meteorologically homogeneous regions within India. The first column shows 30-year moving average plots for observed TMAX, AIMR and TMIN. These plots show the primary patterns of interest: an increasing trend in TMAX and a dominant low-frequency periodicity in AIMR and TMIN, with TMIN having a periodicity similar to that of AIMR. Reading from left to right, then top to bottom:
(a-c) The observed all-India TMAX trend (a) is not necessarily reflected in the observed TMAX over regions considered homogeneous in terms of temperature (b). This heterogeneity is not reproduced by the simulations of the best individual GCM (INM), as seen in (c), which shows the best individual GCM over each region.
(d-f) The observed 67-year AIMR periodicity (d) can be roughly seen across all regions considered homogeneous in terms of rainfall (e), except Peninsular and Northeast India. It is also seen in the corresponding simulations of the best individual GCM (BCCR) (f).
(g-i) The corresponding periodicity and behavior are also seen in TMIN, although the regional patterns appear even more homogeneous.

Figure S2 Best-fit sinusoids for each of the seven GCM simulations of the de-trended 30-year moving average of AIMR, showing the degree to which the models capture the observed periodicity. The best-fit sinusoidal curves are plotted to give a preliminary idea of how well the individual GCMs capture the low-frequency periodicity, which appears dominant in the observed AIMR. Three of the seven GCMs roughly capture the periodicity: BCCR, PCM and MIROC-HI. Two of these three GCMs, PCM and MIROC-HI, are in opposite phase with the observed AIMR. The multi-model average of all seven GCMs does not show any substantial periodicity. Table S3 displays the parameters of these sinusoidal fits.

Figure S3 Evaluation of GCM combinations based on Mean Squared Error (MSE) and Mean Absolute Deviation (MAD). The box plots are similar to those in Figure 1b and 1e, except that the performance metrics are MSE (a-b) and MAD (c-d), for AIMR (left column) and TMAX (right column). For AIMR, the best individual GCM (defined as the best performer under each metric) consistently outperforms the 7-GCM MMA.

Figure S4 Time series of all available initial condition (IC) runs from 4 of the 7 GCMs. The colors denote the following runs:
- black: IC run 1 as labeled in the PCMDI CMIP3 archive (the run used in the manuscript)
- red: IC run 2
- green: IC run 3 (where applicable)
- blue: IC run 4 (where applicable)
Significant within-model differences in periodic variability and phase can be seen in most cases.

Figure S5 Panels (a-c) show results analogous to Figure 1(a-c), but with alternative IC runs for four models (CSIRO, AOM, MIROC-MED and PCM). BCCR, formerly the best individual GCM with respect to MAPE (Figure 1), is now ranked 12th, and the 7-GCM MMA ranks 59th. The best individual GCM is now CSIRO, a result of choosing a different IC run from the one used in the manuscript.

Supplementary Figures

Figure S1

Figure S2

Figure S3

Figure S4

Figure S5

Supplementary Table Legends

Table S1: Percent improvement of the single best model over the 7-GCM MMA in terms of MSE, MAD and MAPE for AIMR periodicity. The first column shows the percent improvement from using BCCR alone (the best individual model under each skill metric) compared with the 7-GCM MMA. For each skill metric, the table also shows the best of all 127 combinations for AIMR.

Table S2: Ranking of individual GCMs and the MMA. The individual GCMs and the 7-model equally weighted MMA are ranked according to the MAPE, MSE and MAD error metrics for both the AIMR and TMAX patterns. Suppose an individual GCM were selected at random (out of the 7 here). For AIMR, the chance of selecting a model that performs better than the MMA would be 3/7 (43%) or 4/7 (57%), depending on the metric. For TMAX, the corresponding chance is 1/7 (14%) or 0/7 (0%), depending on the metric.

Table S3: Parameters of the sinusoidal fit for the observations, each of the seven GCMs and the MMA. The table lists the parameters of the overall sinusoidal fit, corresponding to Figure S2.

Table S1
AIMR (de-trended MAA), best individual model: BCCR

Metric  Improvement of BCCR over 7-GCM MMA  Best combination
MSE     49.1%                               0.5*BCCR + 0.5*GISS
MAD     35.8%                               BCCR
MAPE    46.9%                               BCCR
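The "best combination" column of Table S1 comes from scoring every equally weighted average of 1 to 7 GCMs, 2^7 - 1 = 127 in all. The following is a sketch of that enumeration, together with the three skill metrics. This SI does not state how MAPE is normalized, so the form used below is an assumption: the median of |(sim - obs)/obs|, reported as a fraction rather than a percentage. The function names are also assumptions.

```python
from itertools import combinations

import numpy as np

def mape(sim, obs):
    """Median absolute percentage error, returned as a fraction (assumed form).

    Points where obs is near zero give very large ratios; taking the median
    limits their influence on the score.
    """
    return float(np.median(np.abs((sim - obs) / obs)))

def mse(sim, obs):
    """Mean squared error."""
    return float(np.mean((sim - obs) ** 2))

def mad(sim, obs):
    """Mean absolute deviation between simulated and observed series."""
    return float(np.mean(np.abs(sim - obs)))

def score_all_combinations(models, obs, metric):
    """Score every equally weighted MMA of 1..len(models) GCMs, best first.

    models: dict mapping a GCM name to its series (same length as obs).
    Returns a list of (score, tuple_of_names); 7 models give 2**7 - 1 = 127 entries.
    """
    names = sorted(models)
    results = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            mma = np.mean([models[name] for name in combo], axis=0)
            results.append((metric(mma, obs), combo))
    results.sort(key=lambda item: item[0])  # lower error ranks first
    return results
```

Because `mape` takes a median, a few near-zero observed anomalies cannot dominate the score. This matches the rationale given in Section 1.2 for preferring it over the mean-based percentage error.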

Table S2

a. AIMR
Rank  MAPE             MSE               MAD
1     0.75 (BCCR)      0.02 (BCCR)       0.11 (BCCR)
2     1.15 (INM)       0.026 (INM)       0.135 (INM)
3     1.21 (GISS)      0.027 (GISS)      0.14 (GISS)
4     1.42 (MMA)       0.037 (MMA)       0.16 (CSIRO)
5     1.50 (CSIRO)     0.04 (CSIRO)      0.17 (MMA)
6     1.60 (MIROC-MED) 0.05 (MIROC-MED)  0.18 (MIROC-MED)
7     1.93 (PCM)       0.07 (PCM)        0.22 (PCM)
8     2.71 (MIROC-HI)  0.11 (MIROC-HI)   0.29 (MIROC-HI)

b. TMAX
Rank  MAPE             MSE               MAD
1     1.07 (INM)       0.011 (MMA)       0.09 (MMA)
2     1.17 (MMA)       0.018 (PCM)       0.11 (INM)
3     1.67 (PCM)       0.019 (INM)       0.12 (PCM)
4     2.09 (MIROC-HI)  0.027 (MIROC-MED) 0.13 (MIROC-MED)
5     2.36 (MIROC-MED) 0.04 (MIROC-HI)   0.16 (MIROC-HI)
6     3.91 (CSIRO)     0.08 (CSIRO)      0.22 (CSIRO)
7     2.99 (BCCR)      0.10 (BCCR)       0.29 (GISS)
8     4.65 (GISS)      0.11 (GISS)       0.30 (BCCR)
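The chances quoted in the Table S2 legend can be checked directly against the table. The snippet below transcribes the Table S2 scores, with model names normalized, into a dictionary; `TABLE_S2` and `chance_better_than_mma` are names introduced here for illustration. For each variable and metric, it counts how many of the seven individual GCMs have a lower error than the MMA.

```python
from fractions import Fraction

# Error scores transcribed from Table S2 (lower is better).
TABLE_S2 = {
    ("AIMR", "MAPE"): {"BCCR": 0.75, "INM": 1.15, "GISS": 1.21, "MMA": 1.42,
                       "CSIRO": 1.50, "MIROC-MED": 1.60, "PCM": 1.93, "MIROC-HI": 2.71},
    ("AIMR", "MSE"): {"BCCR": 0.02, "INM": 0.026, "GISS": 0.027, "MMA": 0.037,
                      "CSIRO": 0.04, "MIROC-MED": 0.05, "PCM": 0.07, "MIROC-HI": 0.11},
    ("AIMR", "MAD"): {"BCCR": 0.11, "INM": 0.135, "GISS": 0.14, "CSIRO": 0.16,
                      "MMA": 0.17, "MIROC-MED": 0.18, "PCM": 0.22, "MIROC-HI": 0.29},
    ("TMAX", "MAPE"): {"INM": 1.07, "MMA": 1.17, "PCM": 1.67, "MIROC-HI": 2.09,
                       "MIROC-MED": 2.36, "CSIRO": 3.91, "BCCR": 2.99, "GISS": 4.65},
    ("TMAX", "MSE"): {"MMA": 0.011, "PCM": 0.018, "INM": 0.019, "MIROC-MED": 0.027,
                      "MIROC-HI": 0.04, "CSIRO": 0.08, "BCCR": 0.10, "GISS": 0.11},
    ("TMAX", "MAD"): {"MMA": 0.09, "INM": 0.11, "PCM": 0.12, "MIROC-MED": 0.13,
                      "MIROC-HI": 0.16, "CSIRO": 0.22, "GISS": 0.29, "BCCR": 0.30},
}

def chance_better_than_mma(scores):
    """Fraction of the individual GCMs whose error is lower than the MMA's."""
    gcms = [name for name in scores if name != "MMA"]
    better = sum(scores[name] < scores["MMA"] for name in gcms)
    return Fraction(better, len(gcms))

for (variable, metric), scores in TABLE_S2.items():
    print(variable, metric, chance_better_than_mma(scores))
```

For AIMR, this gives 3/7 under MAPE and MSE and 4/7 under MAD. For TMAX, it gives 1/7 under MAPE and 0 under MSE and MAD. These values agree with the legend.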

Table S3
Rainfall       A       C       N        φ        MSE     Peaks          Troughs
Observed       0.0284  0.2594  33.6807   2.7046  0.0091  1865, 1930     1830, 1900, 1967
BCCR           0.0193  0.1707  24.7520  -0.9461  0.0054  1882, 1931     1856, 1905, 1954
MIROC-HiRes    0.2028  0.4499  56.2930   2.2080  0.0029  No peak        1940
MIROC-MedRes   0.0000  0.1879  47.2923   2.6287  0.0073  No visible periodicity
INM-CM         0.0005  0.0370   5.3798   0.4432  0.0044  No visible periodicity
NCAR-PCM       0.0001  0.1495  30.1094   1.7633  0.0034  1903, 1963     1933
CSIRO-MK3      0.0291  0.1840  38.6555   0.3288  0.0071  No visible periodicity
GISS-AOM       0.0227  0.1063  41.0312   3.1416  0.00    No visible periodicity
MMA            0.0054  0.0438  27.3658  -0.2413  0.0021  No visible periodicity
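Eq. (1) uses πt/N, so the full period of each fitted sinusoid is 2N. The short calculation below takes the three series in Table S3 that have two tabulated peaks, converts their half-periods into full periods, and compares these with the spacing of their peaks. The variable names are illustrative only.

```python
# Half-periods N (years) and peak years transcribed from Table S3 for the three
# series with two tabulated peaks. In Eq. (1), one full cycle spans 2N.
half_periods = {"Observed": 33.6807, "BCCR": 24.7520, "NCAR-PCM": 30.1094}
peaks = {"Observed": (1865, 1930), "BCCR": (1882, 1931), "NCAR-PCM": (1903, 1963)}

for name, n in half_periods.items():
    spacing = peaks[name][1] - peaks[name][0]
    print(f"{name}: 2N = {2 * n:.1f} years, peak spacing = {spacing} years")
```

This prints full periods of 67.4, 49.5 and 60.2 years, against peak spacings of 65, 49 and 60 years. The observed value of about 67 years is the AIMR periodicity quoted in Section 3.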