Verification at JMA on Ensemble Prediction - Part II: Seasonal prediction
Yukiko Naruse, Hitoshi Sato
Climate Prediction Division, Japan Meteorological Agency
05/11/08 Training seminar on Forecasting
Contents
- Verification of seasonal predictions at JMA
  - Real-time verification of the three-month prediction
  - Verification of the hindcasts based on SVSLRF
- Standardized Verification System for Long-Range Forecasts (SVSLRF)
  - Outline of SVSLRF
  - Comparison of JMA verification scores with those of other centers
  - Notes
- Summary
- Information: probabilistic forecasts using the Ordered Probit model on the TCC website (the Ordered Probit model is used as the statistical tool of the MOS)
Verification of seasonal predictions on the TCC website
- TCC home → NWP Model Prediction
- Verification of real-time forecasts
- Verification of hindcasts based on SVSLRF
Real-time verification of three-month prediction
Z500 over the Northern Hemisphere: observation, forecast and error maps.
Initial date: 2008.6.16.12Z; forecast period: Jul.-Aug.-Sep.
Shading shows anomalies (blue: negative, orange: positive) and errors (forecast minus observation).
Stream function at 850 hPa and 200 hPa:
- Observation: anticyclonic circulation anomalies to the north of the Pacific were stronger.
- Forecast: cyclonic circulation anomalies were located to the south-east of Japan, giving larger errors.
RMSE and anomaly correlation: ACC = 0.134 (over NH), 0.623 (over Japan).
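The two scores quoted on this slide can be computed as follows. A minimal sketch, assuming the forecast and observed anomaly fields have already been interpolated to a common set of grid points and flattened to 1-D arrays (area weighting is omitted for brevity):

```python
import numpy as np

def rmse(forecast, obs):
    """Root-mean-square error of the forecast against the observation."""
    return float(np.sqrt(np.mean((forecast - obs) ** 2)))

def anomaly_correlation(f_anom, o_anom):
    """Centered anomaly correlation coefficient (ACC) between forecast
    and observed anomaly fields (climatology already removed)."""
    fa = f_anom - f_anom.mean()
    oa = o_anom - o_anom.mean()
    return float(np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2)))
```

Restricting the input arrays to grid points inside a region (e.g. NH or Japan) gives the regional scores shown above.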
Real-time verification of three-month prediction
[Figure: 90-day-mean NH Z500 anomaly correlation, plotted by initial date from Sep. 2002 to Jun. 2008]
The time series shows the anomaly correlation of 500 hPa height over the Northern Hemisphere from Sep. 2002, when JMA started the three-month forecast, to the present. Red solid circles show the ACC of each forecast; the thick red line is a five-case running mean. Periods of El Nino and La Nina events are marked by red and green arrows, respectively. The blue solid circle marks the average ACC from Sep. 2002 to Jun. 2008: 0.32.
My opinion: the time series seems to vary with a cycle of about one and a half to two years. Seasonal and three-month forecasts over periods of El Nino and La Nina events have better accuracy.
Verification of the hindcasts based on SVSLRF
SVSLRF: Standardized Verification System for Long-Range Forecasts
Two kinds of verification:
1) Verification of deterministic forecasts: Mean Square Skill Score (MSSS), contingency tables
2) Verification of probabilistic forecasts: reliability diagrams, ROC curves and ROC areas
The methods are the same as for the one-month forecast.
Mean Squared Skill Score (MSSS)

    MSSS = 1 - MSE / MSE_c

Perfect score: 1 (when MSE = 0); the climatology forecast scores 0. MSE is the mean squared error of the forecast F against the observation O:

    MSE = (1/N) * sum_{i=1..N} (F_i - O_i)^2

MSSS can be expanded (Murphy, 1988) as

    MSSS = [ 2*(s_f/s_o)*r_fo - (s_f/s_o)^2 - ((fbar - obar)/s_o)^2 + (2n-1)/(n-1)^2 ]
           / [ 1 + (2n-1)/(n-1)^2 ]

where s is the standard deviation, r_fo the forecast-observation correlation, f the forecast, o the observation, and bars denote means; the (2n-1)/(n-1)^2 terms account for the cross-validated climatology. The first three terms are related to:
1) phase error (through the correlation);
2) amplitude error (through the ratio of the forecast to observed variance);
3) bias error.
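A minimal sketch of the score, taking climatology as the sample mean of the observations over the verification period (the cross-validation correction term is omitted here for clarity). Without that correction, Murphy's three-term decomposition reproduces MSSS exactly, which the second function demonstrates:

```python
import numpy as np

def msss(forecast, obs):
    """MSSS = 1 - MSE/MSE_c, with the climatology forecast taken as
    the sample mean of the observations."""
    mse = np.mean((forecast - obs) ** 2)
    mse_c = np.mean((obs - obs.mean()) ** 2)
    return float(1.0 - mse / mse_c)

def msss_decomposed(forecast, obs):
    """Murphy (1988) three-term form: phase (correlation), amplitude
    (variance-ratio) and bias terms; equal to msss() when the
    cross-validation correction is dropped."""
    s_f, s_o = forecast.std(), obs.std()   # ddof=0, consistent with the MSE above
    r = np.corrcoef(forecast, obs)[0, 1]
    bias = (forecast.mean() - obs.mean()) / s_o
    return float(2.0 * r * s_f / s_o - (s_f / s_o) ** 2 - bias ** 2)
```

Expanding MSE = s_f^2 + s_o^2 - 2*r*s_f*s_o + (fbar - obar)^2 and dividing by MSE_c = s_o^2 gives the identity term by term.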
Examples of MSSS
Forecast period: Dec-Feb, started on 10 Nov. Maps of T2m and precipitation.
Positive MSSS indicates that the forecast is better than the climatological forecast. T2m is better than the climatological forecast almost everywhere; precipitation is a little worse. Be careful with forecasts over regions where MSSS is negative.
Contingency tables

General 3 by 3 contingency table for categorical deterministic forecasts:

                          Forecasts
                  Below    Near     Above
  Obs.  Below      n11      n12      n13   | n1*
        Near       n21      n22      n23   | n2*
        Above      n31      n32      n33   | n3*
                   n*1      n*2      n*3   | Total    (e.g. n*1 = n11 + n21 + n31)

General ROC contingency table for deterministic forecasts:

                              Observations
                     occurrences   non-occurrences
  Fcst  occurrences      O1             NO1        | O1 + NO1
        non-occ.         O2             NO2        | O2 + NO2
                       O1 + O2        NO1 + NO2    | T

Example: Dec.-Feb. started on 10 Nov.; element: T2m; region: Northern Japan.
The contingency tables are useful for comparisons between different deterministic categorical forecast sets. From the tables we derive the hit rate (HR), false-alarm rate (FAR) and Hanssen-Kuipers skill score (KSS); KSS = 0.5 means no information (HR equal to FAR). Our model is over-forecasting in the below- and above-normal categories.
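A minimal sketch of the scores derived from the 2 by 2 ROC table above. It uses the standard Hanssen-Kuipers form KSS = HR - FAR (zero means no information); the slide's convention, where 0.5 means no information, is the rescaled (1 + HR - FAR)/2, returned as well:

```python
def roc_table_scores(o1, o2, no1, no2):
    """Scores from a 2x2 occurrence/non-occurrence contingency table:
    o1  = forecast yes, observed yes (hits)
    o2  = forecast no,  observed yes (misses)
    no1 = forecast yes, observed no  (false alarms)
    no2 = forecast no,  observed no  (correct rejections)
    Returns (HR, FAR, KSS, scaled KSS)."""
    hr = o1 / (o1 + o2)            # hit rate
    far = no1 / (no1 + no2)        # false-alarm rate
    kss = hr - far                 # 0 = no information
    return hr, far, kss, (1.0 + kss) / 2.0   # rescaled: 0.5 = no information
```

When HR = FAR the forecast separates occurrences from non-occurrences no better than chance, which is exactly the "no information" case quoted on the slide.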
Examples of reliability diagrams
Dec-Feb started on 10 Nov.; event: upper tercile.
T2m: good skill (BSS* = 4.1 and 16.3). Precipitation: no skill (BSS = -1.2 and -6.1). *Brier Skill Score x 100.
Positive BSS indicates that the forecast is better than the climatological forecast. The T2m prediction has good skill. Be careful when using the precipitation prediction: some calibration is necessary before use.
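A minimal sketch of the Brier Skill Score used on this slide, measured against a constant climatological probability (1/3 for a tercile event):

```python
import numpy as np

def brier_skill_score(probs, outcomes, clim_prob=1.0 / 3.0):
    """BSS = 1 - BS/BS_clim. probs are forecast probabilities of the
    event, outcomes are 0/1 occurrences, clim_prob is the
    climatological event frequency (1/3 for a tercile event)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bs = np.mean((probs - outcomes) ** 2)                 # Brier score
    bs_clim = np.mean((clim_prob - outcomes) ** 2)        # climatology reference
    return float(1.0 - bs / bs_clim)
```

A perfect probabilistic forecast gives BSS = 1; always issuing the climatological probability gives BSS = 0, which is why negative values on the precipitation panel signal no skill.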
Examples of ROC curves and areas
Dec-Feb started on 10 Nov.; event: upper tercile.
T2m over the NH: good skill (ROC* = 67.9). Precipitation over the tropics: ROC = 65.0. *ROC area x 100. Also shown: ROC area of T2m at each grid point.
A ROC area greater than 0.5 indicates that the forecast is better than the climatological forecast. According to the ROC area at each grid point, scores over Iran, India and Southeast Asia are good; on the other hand, scores over Japan are worse than the climatological forecast.
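A minimal sketch of the ROC area: trapezoidal integration of the (FAR, HR) points obtained by thresholding the forecast probabilities, with the (0,0) and (1,1) end points added. The input points are assumed to come from nested probability thresholds, so sorting both arrays preserves their pairing:

```python
import numpy as np

def roc_area(hit_rates, false_alarm_rates):
    """ROC area by trapezoidal integration over (FAR, HR) points.
    hit_rates[i] and false_alarm_rates[i] belong to the same
    probability threshold; end points (0,0) and (1,1) are appended."""
    far = np.concatenate(([0.0], np.sort(false_alarm_rates), [1.0]))
    hr = np.concatenate(([0.0], np.sort(hit_rates), [1.0]))
    # trapezoid rule: sum of segment widths times mean segment height
    return float(np.sum((far[1:] - far[:-1]) * (hr[1:] + hr[:-1]) / 2.0))
```

The no-skill diagonal (HR = FAR everywhere) integrates to 0.5, the threshold quoted on the slide.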
Contents
- Verification of seasonal predictions at JMA
  - Real-time verification of the three-month prediction
  - Verification results for the hindcasts based on SVSLRF
- Standardized Verification System for Long-Range Forecasts (SVSLRF)
  - Outline of SVSLRF
  - Comparison of JMA verification scores with those of other centers
  - Notes
- Summary
- Information: probabilistic forecasts using the Ordered Probit model on the TCC website (the Ordered Probit model is used as the statistical tool of the MOS)
SVSLRF (Standardized Verification System for Long-Range Forecasts)
What is this? The WMO standard tool for verifying the skill of seasonal forecast models.
Why is SVSLRF necessary? Long-range forecasts are issued by several centers and made available in the public domain. Forecasts for specific locations may differ substantially at times, owing to the inherently limited skill of long-range forecast systems. This situation can lead to confusion amongst users and ultimately reflect badly on the science behind long-range forecasts. Users should properly understand how much skill a forecast issued by a given center has.
The SVS for LRF constitutes the basis for long-range forecast evaluation and validation, and for the exchange of verification scores. The manual was adopted within the Commission for Basic Systems (CBS) of the World Meteorological Organization (WMO) in December 2002.
Outline of SVSLRF
Long-range forecasts (LRF) extend from thirty (30) days up to two (2) years, including monthly and three-month (90-day) seasonal forecasts.
Forecast periods: a 90-day period or a season; if available, 12 rolling three-month periods (e.g. JFM, FMA, MAM).
Parameters to be verified:
a) surface air temperature (T2m) anomaly at screen level;
b) precipitation anomaly;
c) sea surface temperature (SST) anomaly and the Nino 3.4 index (coupled ocean-atmosphere models only).
Three levels of verification:
- Level 1: large-scale aggregated overall measures of forecast performance.
- Level 2: verification at grid points.
- Level 3: grid-point by grid-point contingency tables for more extensive verification.
Three levels of verification
Level 1:
- Parameters: T2m anomaly, precipitation anomaly (and the Nino 3.4 index)
- Verification regions: Tropics (20S-20N), northern extratropics (20N-90N), southern extratropics (20S-90S)
- Deterministic forecasts: MSSS
- Probabilistic forecasts: ROC curves, ROC areas, reliability diagrams, frequency histograms
Level 2:
- Parameters: T2m anomaly, precipitation anomaly (and SST anomaly)
- Grid-point verification on a 2.5 by 2.5 degree grid
- Deterministic forecasts: MSSS and its three-term decomposition at each grid point
- Probabilistic forecasts: ROC areas at each grid point
Level 3:
- Parameters: T2m anomaly, precipitation anomaly (and SST anomaly)
- Grid-point verification on a 2.5 by 2.5 degree grid
- Deterministic forecasts: 3 by 3 contingency tables at each grid point
- Probabilistic forecasts: ROC reliability tables at each grid point
Results of levels 1 and 2 are submitted to the Lead Centre.
Verification data
Hindcasts: LRF systems should be verified over as long a period as possible in hindcast mode (not on real-time operational forecasts, for which there is not enough forecast history for verification).
- Period: from 1981 to 2001
- Number of bins: between 9 and 20 (bins = ensemble size + 1)
- Calculations (means, standard deviations, class limits, etc.) are done in a cross-validation framework
- Lead time: at least 2, up to 4 (max lead time); JMA submitted only 1 lead time
Verification data sets:
- T2m: UKMO/CRU or ERA-40
- Precipitation: GPCP or CMAP
- SST: Reynolds OI or Smith and Reynolds
If the recommended data are not available, a center may use its own reanalysis (JRA-25 in the case of JMA). Considering the merits of JRA-25 and JCDAS in the verification of hindcasts and real-time forecasts, JMA uses JRA-25 instead of UKMO/CRU as the verification data for the T2m prediction.
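The cross-validation framework mentioned above means that each year's climatology (and hence its anomaly and class limits) must be computed with that year withheld. A minimal sketch of leave-one-out anomalies under that assumption:

```python
import numpy as np

def cross_validated_anomalies(obs):
    """Leave-one-out anomalies: each year's climatology is the mean of
    the other n-1 years, so the verified year never contributes to its
    own reference climatology."""
    obs = np.asarray(obs, dtype=float)
    n = obs.size
    loo_clim = (obs.sum() - obs) / (n - 1)   # per-year leave-one-out mean
    return obs - loo_clim
```

The same leave-one-out treatment would apply to standard deviations and tercile class limits; only the mean is shown here.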
Outline of SVSLRF
Lead Centres for the Long-Range Forecast Verification System: Australian Bureau of Meteorology (BOM) and Meteorological Service of Canada (MSC).
URL: http://www.bom.gov.au/wmo/lrfvs/
The manual, and the verification scores of several centers, are available on the Lead Centre website.
Examples of graphics on the SVSLRF website
Level 1 region scores. Example: ROC curves; parameter: T2m; season: DJF; lead time: 1 month; area: Tropics.
ROC scores (upper tercile): JMA 0.755; UKMetO 0.76. ROC scores > 0.5 indicate better skill than climatological forecasts.
Comparison of verification scores of JMA with those of other centers
Publishing centers (12), with verification data sets (T2m; precipitation), hindcast period (years), ensemble size for hindcasts, and model notes:
- JMA: JRA-25; GPCP2; 1984-2005 (22); 11; TL95L40, SST: combination of persisted anomaly, climatology and prediction
- MSC: CRU2.1; GPCP; 1981-2000 (20); 12; AGCM (1.875x1.875 L50, T32L10)
- UKMetO: ERA-40; GPCP; 1987-2002 (16); 15; AGCM (2.5x3.75 L40, 0.3-1.25x1.25 L40)
- KMA: CDAS2; CMAP; 1979-2006 (28); ?; GDAPS (deterministic long-range forecasts)
- BOM: ERA-40; CMAP; 1987-2001 (18); ?
- ECMWF: ERA-40; GPCP; 1987-2002 (16); 5 (May and Nov.: 40); AOGCM (TL95L60, 0.3-1.4x1.4 L29), system 2
- Meteo-Fr: ERA-40; CMAP; 1993-2003 (11); 5; AOGCM (T63L31C1)
- NCEP: CPC; CMAP; 1982-2004 (23); AOGCM (T62L64, 0.3-1x1 L40)
- IRI: CRU; GPCP; 1981-2001 (21); 12; AGCM (T42), SST: persisted anomaly
- BCC-CGCM: ERA-40; CMAP; 1983-2001 (19); ?; AOGCM (T63L16, T63L30)
- HMC-SLAV: CRU; CMAP; 1980-2001 (22); 10; (1.125x1.40625 L28)
- CPTEC: CRU?; GPCP; 1979-2000 (22); 10; AGCM T062L28
Comparison of verification scores of JMA with those of other centers
ROC areas; parameter: 2-meter temperature (T2m); event: upper tercile.
[Charts: ROC score by season (DJF, MAM, JJA, SON) for the northern extratropics (20N-90N) and the tropics (20S-20N); centers: JMA (V0703C), MSC, UKMetO, ECMWF, Meteo-Fr, NCEP, IRI, BCC-CGCM, HMC-SLAV]
The ROC scores of T2m over the northern extratropics are the best for JMA among these centers. The ROC scores of T2m over the tropics for JMA are similar to those of ECMWF.
Comparison of verification scores of JMA with those of other centers
ROC areas; parameter: precipitation; event: upper tercile.
[Charts: ROC score by season for the northern extratropics (20N-90N) and the tropics (20S-20N); same centers as for T2m]
The ROC scores of precipitation over the northern extratropics for JMA are similar to those of ECMWF, and the JMA score in JJA is the best among these centers. The ROC scores of precipitation over the tropics for JMA are worse than those of ECMWF.
Comparison of verification scores of JMA with those of other centers
Mean Square Skill Score; parameter: 2-meter temperature (T2m).
[Charts: MSSS by season for the northern extratropics (20N-90N) and the tropics (20S-20N); centers: JMA (V0703C), MSC, ECMWF, Meteo-Fr]
The MSSS of T2m over both the northern extratropics and the tropics is the best for JMA among these centers.
Comparison of verification scores of JMA with those of other centers
Mean Square Skill Score; parameter: precipitation.
[Charts: MSSS by season for the northern extratropics (20N-90N) and the tropics (20S-20N); centers: JMA (V0703C), MSC, ECMWF, Meteo-Fr]
The MSSS of precipitation over the northern extratropics for JMA is similar to that of ECMWF. The MSSS of precipitation over the tropics in JJA for JMA is the worst among these centers.
Notes
Sensitivity of skill scores to the verification data: the differences between scores computed against the three verification data sets (JRA-25, ERA-40, UKMO/CRU) are smaller than 1% (Figures 1, 2). The error bars indicate the uncertainty due to verification sampling; verification sampling has a larger influence on the forecast skill scores than the choice of verification data.
Sensitivity of skill scores to the hindcast period: the difference between the 18-year and 22-year scores is larger than the differences due to the verification data shown in Figure 2 (Figure 3).
It is important to use the same verification sampling and hindcast period as the other centers.
Figure 1: MSSS of 2-meter temperature over land in the tropics, verified with JRA-25, ERA-40 and UKMO/CRU.
Figure 2: ROC area of 2-meter temperature (upper tercile) over land in the tropics, verified with JRA-25, ERA-40 and UKMO/CRU.
Figure 3: ROC area with the 18-year and the 22-year hindcasts.
Summary of Part II
We verify both the hindcasts, based on SVSLRF, and the real-time operational forecasts. Please check the accuracy of our models on the TCC website: http://ds.data.jma.go.jp/tcc/tcc/products/model/index.html
Verification scores of several centers are available on the Lead Centre website. Note that the verification sampling and hindcast periods differ among centers when comparing our scores with theirs.
References
Murphy, A. H., 1988: Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon. Wea. Rev., 116, 2417-2424.
WMO, 2006: Standardized Verification System (SVS) for Long-Range Forecasts (LRF). New Attachment II-8 to the Manual on the GDPFS (WMO-No. 485), Volume I. http://www.bom.gov.au/wmo/lrfvs/
Information: Probabilistic forecasts
Probabilistic forecasts using the Ordered Probit model* on the TCC website
*The Ordered Probit model is used as the statistical tool of the MOS.
http://ds.data.jma.go.jp/tcc/tcc/products/model/probfcst/4me/index.html
Three-month forecasts and their verification.
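A minimal sketch of how an ordered probit model turns a fitted latent mean into tercile probabilities. This is an illustration of the general technique, not JMA's implementation: the names mu (the latent mean from the MOS regression on model predictors), c_low and c_high (the fitted cut points) are hypothetical:

```python
import math

def ordered_probit_probs(mu, c_low, c_high, sigma=1.0):
    """Tercile probabilities from an ordered probit model with latent
    mean mu, scale sigma and cut points c_low < c_high:
      P(below)  = Phi((c_low - mu)/sigma)
      P(above)  = 1 - Phi((c_high - mu)/sigma)
      P(normal) = the remainder."""
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    p_below = phi((c_low - mu) / sigma)
    p_above = 1.0 - phi((c_high - mu) / sigma)
    return p_below, 1.0 - p_below - p_above, p_above
```

With mu = 0 and the standard-normal tercile cut points (about -0.431 and +0.431), each category gets probability close to 1/3, i.e. the climatological distribution; shifting mu moves probability toward one tail.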
Probabilistic forecasts
- Forecast period: three-month mean (e.g. Nov-Jan)
- Parameters: surface temperature and precipitation
- Region: each grid point
- Issue date: the end of every month
Probabilistic forecasts
Initialized Oct. 2008; forecast period: Nov.-Dec.-Jan. 2009; parameter: surface temperature.
Probabilistic verification scores at the grid point (100E, 15N), initialized 10th Aug., over all hindcast cases: BSS = 8.4 > 0; ROC area = 0.59 > 0.5. The forecast at this grid point is better than climatology.
Notice: the observed frequency for forecast probabilities above 0.5 is less than the expected frequency.