
Monthly Weather Review, Vol. 134 (August 2006), Notes and Correspondence, pp. 2279-2284

Improving Week-2 Forecasts with Multimodel Reforecast Ensembles

JEFFREY S. WHITAKER AND XUE WEI
NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado

FRÉDÉRIC VITART
Seasonal Forecasting Group, ECMWF, Reading, United Kingdom

(Manuscript received 1 August 2005, in final form 3 November 2005)

ABSTRACT

It has recently been demonstrated that model output statistics (MOS) computed from a long retrospective dataset of ensemble reforecasts from a single model can significantly improve the skill of probabilistic week-2 forecasts made with that same model. In this study the technique is extended to a multimodel reforecast dataset consisting of forecasts from the ECMWF and NCEP global models. Even though the ECMWF model is more advanced than the version of the NCEP model used (it has more than double the horizontal resolution and is about five years newer), the forecasts produced by the multimodel MOS technique are more skillful than those produced by the MOS technique applied to either the NCEP or ECMWF forecasts alone. These results demonstrate that the MOS reforecast approach yields benefits for week-2 forecasts that are just as large for high-resolution, state-of-the-art models as they are for relatively low-resolution, out-of-date models. Furthermore, operational forecast centers can benefit by sharing both retrospective reforecast datasets and real-time forecasts.

1. Introduction

Operational weather prediction centers strive to improve forecasts by continually improving weather forecast models and the analyses used to initialize those models. In a recent study, Hamill et al. (2004) investigated an alternative approach to improving forecasts, namely, using the statistics of past forecast errors to correct errors in independent forecasts.
They used a dataset of retrospective ensemble forecasts (or "reforecasts") made with a 1998 version of the operational global forecast model from the National Centers for Environmental Prediction (NCEP; part of the U.S. National Weather Service). Probabilistic forecasts of surface temperature and precipitation in week 2 (days 8-14) were not skillful relative to climatology when computed directly from the model output, but were more skillful than the operationally produced forecasts from NCEP's Climate Prediction Center when computed using forecast-error statistics from the reforecast dataset.

Corresponding author address: Jeffrey Whitaker, NOAA-CIRES Climate Diagnostics Center, 325 Broadway, R/CDC1, Boulder, CO 80305-3328. E-mail: jeffrey.s.whitaker@noaa.gov

Ensembles of forecasts with a single model suffer from a deficiency in spread, partly because they do not represent the error in the forecast model itself. This causes probabilistic forecasts derived from such ensembles to be overconfident. Statistical postprocessing using reforecast datasets can ameliorate this problem to some extent, yielding reliable probabilities (e.g., Hamill et al. 2004). However, the use of multimodel ensembles (ensembles consisting of forecasts from different models, perhaps initialized from different analyses) has also been shown to significantly increase the skill of probabilistic forecasts (Krishnamurti et al. 1999; Palmer et al. 2004; Rajagopalan et al. 2002; Mylne et al. 2002), even without statistical postprocessing. In this note, we examine whether the results of Hamill et al. (2004) can be improved upon if the reforecast dataset used to statistically correct the forecasts includes more than one model. Here we combine the reforecast dataset described in Hamill et al. (2006), which uses a 1998 version of

the NCEP Global Forecast System (GFS) model, with a reforecast dataset generated at the European Centre for Medium-Range Weather Forecasts (ECMWF) to support monthly forecasting (Vitart 2004). The ECMWF forecasts are run at roughly 1° grid spacing, compared with the 2.5° grid spacing used for the NCEP 1998 model forecasts.

In this note we address two questions. The first concerns whether the large gains in skill reported by Hamill et al. (2004) were a consequence of the fact that the forecast model was run at a relatively low resolution and is now nearly 6 yr behind the state-of-the-art model. In other words, do the benefits of statistically correcting forecasts using reforecast datasets found in that study also apply to newer, higher-resolution models? Second, we investigate whether the methods described in Hamill et al. (2004) can be applied to multimodel ensembles. Specifically, we ask whether statistically corrected multimodel forecasts are more skillful than the corrected forecasts generated from the component models (using the statistics from the component reforecast datasets).

2. Datasets and methodology

a. The analyses

Forecasts for 850-hPa temperature are presented in this note. These forecasts have been verified using both the NCEP-National Center for Atmospheric Research (NCAR) reanalysis (Kistler et al. 2001) and the 40-yr ECMWF Re-Analysis (ERA-40; Uppala et al. 2005). Both of these analyses are on a 2.5° grid, and only those points poleward of 20°N are included. Because the verification statistics computed with the NCEP-NCAR and ERA-40 analyses are so similar, all of the results described here have been computed using the ERA-40 reanalyses unless otherwise noted.

b.
The forecasts

A reforecast dataset was created at the National Oceanic and Atmospheric Administration (NOAA) Climate Diagnostics Center (CDC), using a T62-resolution (roughly 2.5° grid spacing) version of NCEP's GFS model that was operational until January 1998. The model was run with 28 vertical sigma levels. A 15-member ensemble run out to 15 days lead time is available for every day from 1979 to the present, starting from 0000 UTC initial conditions. The ensemble initial conditions consisted of a control initialized with the NCEP-NCAR reanalysis (Kistler et al. 2001) and a set of seven bred pairs of initial conditions (Toth and Kalnay 1997), centered each day on the reanalysis initial condition. The breeding method and the forecast model are the same as those used operationally at NCEP in January 1998. Sea surface conditions were specified from the NCEP-NCAR reanalysis and were held fixed at their initial values throughout the forecast. Further details describing this dataset are available in Hamill et al. (2004) and Hamill et al. (2006).

An independent set of reforecasts has been generated at ECMWF to support operational monthly forecasting (Vitart 2004). The atmospheric component of the forecast model is a T159 (roughly 1° grid) version of the ECMWF Integrated Forecast System (IFS) with 40 levels in the vertical. The oceanic component is the Hamburg Ocean Primitive Equation model. Atmospheric initial conditions were taken from the ERA-40 reanalysis, and ocean initial conditions were taken from the ocean data assimilation system used to produce seasonal forecasts at ECMWF. A five-member ensemble was run out to 32 days once every 2 weeks for a 12-yr period from 27 March 1990 to 18 June 2002. Initial atmospheric perturbations were generated using the same singular vector technique used in ECMWF operations (Buizza and Palmer 1995). Further details describing the ECMWF monthly forecasting system can be found in Vitart (2004).
A combined reforecast dataset was created by subsampling five members from the NCEP CDC reforecasts for just those dates in December, January, and February for which ECMWF forecasts were available (a total of 84). Only week-2 (8-14-day average) forecast means of 850-hPa temperature in the Northern Hemisphere poleward of 20°N were included.

c. Methodology

Three-category probability forecasts for 850-hPa temperature in week 2 were produced for the Northern Hemisphere poleward of 20°N. The three categories are the lower, middle, and upper terciles of the climatological distribution of analyzed anomalies, so that the climatological forecast for each category is always 33%. The climatological distribution is defined over all winter (December-February) seasons from 1971 to 2000. The climatological mean for each day is computed by smoothing the 30-yr mean analyses for each calendar day with a 31-day running mean. The terciles of the climatological distribution are calculated using the analyzed anomalies over a 31-day window centered on each calendar day. Further details may be found in Hamill et al. (2004).

Tercile probability forecasts are generated from the raw five-member ensemble output for each model by simply counting the number of members that fall into each category at each grid point, and then dividing that

number by 5. The NCEP 15-member ensemble was subsampled by taking the control and the first two (out of seven) positive and negative perturbations generated by the breeding method (Toth and Kalnay 1997). These uncorrected, or "raw," ensemble probabilities are compared with probabilities computed using a technique called logistic regression (Wilks 1995), which is described in detail in Hamill et al. (2004). In short, as implemented here, the logistic regression predicts the tercile probabilities given the ensemble-mean forecast anomaly (relative to the observed climatology). As discussed in Hamill et al. (2004), we have found that adding other predictors, such as ensemble spread, to the logistic regression does not improve forecast skill. Software from the Numerical Algorithms Group (NAG; more information available online at http://www.nag.co.uk) was used to compute the regression coefficients (NAG library routine G02GBF). These coefficients are computed at each grid point using cross validation (i.e., for each of the 84 forecast times, the other 83 forecast-verification pairs are used to compute the regression coefficients). Since the forecasts are separated by 2 weeks, the correlation between adjacent forecasts is quite small, and each forecast is essentially independent of the others.

Probabilities are computed using logistic regression with each model's ensemble-mean anomaly separately, and then a multimodel forecast probability is produced by using a combined ECMWF-NCEP ensemble-mean anomaly as the predictor in the logistic regression. The combined ensemble-mean anomaly is computed using

F_combined = b_0 + b_NCEP * F_NCEP + b_ECMWF * F_ECMWF,    (1)

where F_NCEP is the NCEP ensemble mean, F_ECMWF is the ECMWF ensemble mean, F_combined is the multimodel ensemble mean, b_NCEP is the weight given to the NCEP ensemble mean, b_ECMWF is the weight given to the ECMWF ensemble mean, and b_0 is a constant term.
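A minimal sketch of this two-step procedure at a single grid point is given below, using synthetic data and a plain gradient-ascent logistic fit in place of the NAG routines. All names, sample sizes, and noise levels are illustrative, not taken from the paper; giving the middle tercile the leftover probability is likewise our simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_logistic(x, y, iters=2000, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(a + b*x) by gradient ascent on the
    log-likelihood (an illustrative stand-in for NAG routine G02GBF)."""
    a = b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        a += lr * np.mean(y - p)
        b += lr * np.mean((y - p) * x)
    return a, b

# Synthetic reforecast sample at one grid point (the paper has 84 winter cases).
n = 84
truth = rng.standard_normal(n)                 # analyzed 850-hPa T anomaly
f_ncep = truth + 0.9 * rng.standard_normal(n)  # older, coarser model: noisier mean
f_ecmwf = truth + 0.6 * rng.standard_normal(n) # newer model: less noisy mean

# Step 1: least-squares weights for the combined ensemble mean, as in Eq. (1).
X = np.column_stack([np.ones(n), f_ncep, f_ecmwf])
b0, b_ncep, b_ecmwf = np.linalg.lstsq(X, truth, rcond=None)[0]
f_comb = b0 + b_ncep * f_ncep + b_ecmwf * f_ecmwf

# Step 2: logistic regressions for the two extreme terciles; the middle
# tercile receives the remaining probability (our simplification).
lo, hi = np.quantile(truth, [1 / 3, 2 / 3])
a_lo, c_lo = fit_logistic(f_comb, (truth < lo).astype(float))
a_hi, c_hi = fit_logistic(f_comb, (truth > hi).astype(float))

def tercile_probs(fc_anom):
    """Three-category forecast from a combined ensemble-mean anomaly."""
    p_lo = 1.0 / (1.0 + np.exp(-(a_lo + c_lo * fc_anom)))
    p_hi = 1.0 / (1.0 + np.exp(-(a_hi + c_hi * fc_anom)))
    return p_lo, max(0.0, 1.0 - p_lo - p_hi), p_hi
```

For a strongly cold combined-mean anomaly the fitted model shifts most of the probability into the lower tercile, while near-zero anomalies relax toward the 33% climatological forecast, which is the qualitative behavior the calibration is meant to produce.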
The coefficients (b_0, b_NCEP, and b_ECMWF) are estimated at each Northern Hemisphere grid point using an iterative least squares regression algorithm (NAG library routine G02GAF). Instead of combining the ensemble means and using the result as a predictor in the logistic regression, a multimodel forecast can also be computed by simply using each ensemble mean as a separate predictor in the logistic regression. We have found that forecasts obtained with this two-predictor logistic regression are slightly less skillful, so only results using the two-step procedure (using the combined ensemble mean as a single predictor in the logistic regression) are presented here.

FIG. 1. Reliability diagrams for week-2 tercile probability forecasts of 850-hPa temperature. A total of 84 forecasts during December-February 1990-2002 are used, and forecasts are evaluated for the Northern Hemisphere poleward of 20°N using the ERA-40 reanalysis for verification. The reliability curve shown is for both upper- and lower-tercile probability forecasts. Inset histograms indicate the frequency with which extreme tercile probabilities were issued. Forecasts are from (a) the five-member ECMWF model ensemble and (b) the five-member NCEP model ensemble.

3. Results

Figure 1 is a reliability diagram summarizing the skill of the raw, uncorrected week-2 tercile probability forecasts for each model in the Northern Hemisphere poleward of 20°N. For both the NCEP and ECMWF models, the reliability of the forecasts is poor (the forecasts are overconfident) and the ranked probability skill score (RPSS; see Wilks 1995) is near zero, indicating no

skill relative to the climatological forecast of 33% in each category.

FIG. 2. Same as in Fig. 1, but for forecasts corrected with a logistic regression.

FIG. 3. Same as in Fig. 1, but for the multimodel ensemble. The NCEP and ECMWF forecast ensemble means are combined using linear regression, and then the combined ensemble mean is used to predict tercile probabilities using logistic regression. The combined ensemble is more skillful than the logistic regression correction applied to either the ECMWF or NCEP model alone (Fig. 2).

Richardson (2001) has shown that the Brier skill score (BSS; which is the same as the RPSS for forecasts of a single category) depends on ensemble size according to

B_M = B_inf - (1 - B_inf) / M,    (2)

where B_M is the BSS for an ensemble of size M and B_inf is the BSS for an infinite ensemble. This equation assumes the ensemble prediction system is perfect (i.e., the verifying analysis is statistically indistinguishable from a randomly chosen ensemble member). According to this formula, if a five-member ensemble with a perfect model has no skill (B_M = 0), then an infinite ensemble should have a BSS of 0.166. However, we have found that the RPSS of the NCEP forecast is still negative for an ensemble size of 15 (the total number of ensemble members available). This suggests that model error, not the small ensemble size used to generate the probabilities, is primarily responsible for the low skill of the NCEP model forecasts. Although we cannot test the sensitivity of the RPSS to ensemble size with the ECMWF model (since the reforecasts were only run with five members), the fact that the reliability diagrams shown in Fig. 1 are so similar suggests that the same holds true for the ECMWF forecasts. The lack of reliability shown in Fig. 1 is a consequence of the spread deficiency in both models: on average, the ensemble spread is a factor of nearly 2 smaller than the ensemble mean error (not shown).
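Solving Eq. (2) for the infinite-ensemble skill gives the back-of-envelope check used in the paragraph above. A minimal sketch (the function name is ours, not from the paper):

```python
def bss_infinite(bss_m, m):
    """Invert Eq. (2), B_M = B_inf - (1 - B_inf) / M, for the BSS of an
    infinite ensemble, under the perfect-model assumption of Richardson (2001)."""
    return (m * bss_m + 1.0) / (m + 1.0)

# A skilless five-member perfect-model ensemble (B_M = 0) still implies
# positive skill in the infinite-ensemble limit.
print(round(bss_infinite(0.0, 5), 3))  # 0.167
```

The printed value is 1/6, which the text quotes (truncated) as 0.166; the fact that the 15-member NCEP RPSS stays negative is what rules out ensemble size, rather than model error, as the explanation.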
Applying a logistic regression to the ensemble means produces forecasts that are more reliable and more skillful for both ensembles (Fig. 2). The corrected ECMWF model forecasts are significantly more skillful than the corrected NCEP model forecasts (RPSS of 0.155 for the ECMWF versus 0.113 for the NCEP). The relative improvement for the ECMWF forecasts is nearly the same as for the NCEP forecasts, demonstrating that the value of having a reforecast dataset is just as large for the ECMWF model as it was for the NCEP model. This is despite the fact that the ECMWF model

has the benefit of twice the resolution and five extra years of model development. The improvement in skill is limited by the small sample size: if all 25 yr of forecasts available for the NCEP model are used to compute the logistic regression, the RPSS increases to 0.15.

The multimodel forecast, using the combined ensemble mean in the logistic regression, produces a forecast with an average RPSS of 0.17 (Fig. 3), about a 10% improvement over the corrected ECMWF forecast alone. The multimodel ensemble forecast is more skillful than the ECMWF forecast, even though the ECMWF forecast is on average superior to the NCEP forecast, because there are places where the NCEP model is consistently more skillful than the ECMWF model (Krishnamurti et al. 2000). This is reflected in the weights used to combine the two ensemble means (Fig. 4). Although the ECMWF model receives more weight on average, there are regions where the NCEP model receives a weight greater than 0.5 (the red regions in the left panel of Fig. 4). In other words, the NCEP model, though inferior on average to the ECMWF model, supplies independent information that can be used to improve upon the ECMWF forecast. This is obviously only possible if a reforecast dataset is available for both models.

FIG. 4. Maps of the weights used to combine the ECMWF and NCEP ensemble-mean forecasts. These maps correspond to b_NCEP and b_ECMWF in Eq. (1). The constant term b_0 is not shown.

All of the results discussed in this section have been computed using the ERA-40 reanalysis as the verifying analysis. The results are not substantively changed if the NCEP-NCAR reanalysis is used instead (not shown).

4. Discussion

The study of Hamill et al. (2004) has been extended to include more than one forecast model.
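The reason an on-average inferior model can still improve the combination is that least squares exploits independent information: adding a second predictor can never increase the in-sample error of the fit, and it strictly reduces it when the weaker model's errors are partly independent of the stronger model's. A toy sketch with synthetic data (all numbers illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
truth = rng.standard_normal(n)
# "ECMWF-like" predictor: smaller errors; "NCEP-like": larger but independent.
good = truth + 0.5 * rng.standard_normal(n)
poor = truth + 1.0 * rng.standard_normal(n)

def sse(pred_cols):
    """In-sample sum of squared errors of a least-squares fit to truth."""
    X = np.column_stack([np.ones(n)] + pred_cols)
    resid = truth - X @ np.linalg.lstsq(X, truth, rcond=None)[0]
    return float(resid @ resid)

sse_good = sse([good])        # better model alone
sse_both = sse([good, poor])  # combination exploits the poorer model too
improvement = 1.0 - sse_both / sse_good  # fraction of residual error removed
```

In the paper the same effect shows up spatially: regions where b_NCEP exceeds 0.5 in Fig. 4 are where the nominally weaker model carries the independent information.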
The results show that the benefits of reforecasts apply to higher-resolution, state-of-the-art forecast systems just as much as to older, lower-resolution systems. In this case, logistic regression was used to combine the NCEP and ECMWF week-2 forecasts. The resulting multimodel forecast was more skillful than the statistically corrected ECMWF forecast, even though the ECMWF model was run at more than twice the horizontal resolution of the other model, a 1998 version of the NCEP global forecast model. These results clearly demonstrate that all operational centers can benefit by sharing both reforecast

datasets and real-time forecasts. This is true even if the models run by the various centers differ substantially in resolution, as long as they have some skill and provide independent information to the statistical scheme used to combine the forecasts. For week-2 tercile probability forecasts of 850-hPa temperature, the benefits of multimodel ensembles cannot be realized unless all the models have corresponding reforecast datasets with which to estimate regression equations to combine (and calibrate) the forecasts.

Acknowledgments. Fruitful discussions with Tom Hamill are gratefully acknowledged. The NOAA Weather-Climate Connection program funded this project.

REFERENCES

Buizza, R., and T. N. Palmer, 1995: The singular vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434-1456.
Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434-1447.
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33-46.
Kistler, R., and Coauthors, 2001: The NCEP-NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247-268.
Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, E. C. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensembles. Science, 285, 1548-1550.
Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196-4216.
Mylne, K. R., R. E. Evans, and R. T. Clark, 2002: Multi-model multi-analysis ensembles in quasi-operational medium-range forecasting. Quart. J. Roy. Meteor. Soc., 128, 361-384.
Palmer, T.
N., and Coauthors, 2004: Development of a European multimodel ensemble system for seasonal-to-interannual prediction (DEMETER). Bull. Amer. Meteor. Soc., 85, 853-872.
Rajagopalan, B., U. Lall, and S. E. Zebiak, 2002: Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Mon. Wea. Rev., 130, 1792-1811.
Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127, 2473-2489.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297-3319.
Uppala, S. M., and Coauthors, 2005: The ERA-40 reanalysis. Quart. J. Roy. Meteor. Soc., 131, 2961-3012.
Vitart, F., 2004: Monthly forecasting at ECMWF. Mon. Wea. Rev., 132, 2761-2779.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.