Verification of the operational NWP models at DWD - with special focus on COSMO-EU
Ulrich Damrath, Ulrich.Damrath@dwd.de
A man discerns (and this is important): nothing is entirely wrong and nothing entirely right. (Eugen Roth)
Basic tasks of verification
- Comparison of forecast results with the truth (observations and analyses)
- Calculation of forecast quality in terms of so-called scores
- Feedback to end users and modellers
Outline: A mixture of
- General remarks
- The verification system at DWD
- Examples of verification results
- How did verification help (in some cases)?
Purposes of verification
- Watching the operational system
- Assessment of new model versions
- Evaluation of predictability
- Interpretation of model errors
Methods of verification
- Subjective assessment
- Objective assessment
- Special investigations
Principal course of verification (flow diagram): observations pass through data control into the analysis; the forecast is compared with the controlled observations and the analysis in the verification, which feeds back to the modeller and the end user.
Forecast of precipitation
Climate simulation for precipitation (from: U. Böhm et al.: Climate reconstruction over Europe...)
Climate simulation for precipitation
Observation (12.08.2002 06 UTC - 13.08.2002 06 UTC) and forecast of precipitation (T+18 - T+42 from 11.08.2002 12 UTC)
Observation (12.08.2002 06 UTC - 13.08.2002 06 UTC) and forecast of precipitation (T+06 - T+30 from 12.08.2002 00 UTC)
Objective assessment: calculation of scores which characterize the truth of the forecast
Advantage:
- objective criteria for all forecasts
Disadvantages:
- perhaps only some aspects of the forecasts are captured
- it is impossible to express all degrees of freedom of a forecast in a few simple numbers
Methods of objective verification
- calculation of scores for continuous weather elements
- calculation of scores for categorical weather elements
- calculation of scores for probabilistic weather elements
- calculation of scores that are representative for very high resolution models
Scores for continuous weather elements
- Mean error
- Mean absolute error
- Root mean square error
- Standard deviation
- Correlation coefficient
- Skill scores
A basic rule: try to interpret each score, depending on its characteristics, as both a physical and a statistical quantity!
Scores for continuous weather elements - mean error (BIAS)
ME = (1/N) Σ_{i=1}^{N} (F_i − O_i)    F_i: forecast, O_i: observation
Properties:
- ideal value: 0
- no general information about forecast quality
- if the errors have a Gaussian distribution: estimate of the expected value of the errors
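The mean error defined above can be sketched in a few lines; the forecast and observation values below are hypothetical examples, not DWD data.

```python
# Minimal sketch of the mean error (BIAS): ME = (1/N) * sum(F_i - O_i).
def bias(forecasts, observations):
    """Positive BIAS: the model is too high on average; negative: too low."""
    n = len(forecasts)
    return sum(f - o for f, o in zip(forecasts, observations)) / n

f = [2.0, 1.0, 3.0, 4.0]  # hypothetical forecasts
o = [1.0, 2.0, 2.0, 3.0]  # hypothetical observations
print(bias(f, o))  # 0.5
```

Note that errors of opposite sign cancel, which is exactly why the slide warns that the BIAS carries no general information about forecast quality.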
Time series of BIAS of 2m-temperatures over Germany (January 1994 till May 2007)
Time series of BIAS of wind speed 10m over Germany (May 2006 - May 2007)
Time series of BIAS of wind speed 10m over Germany stations below 100 m (May 2006 - May 2007)
Monthly mean values of surface weather elements for station 10427 - January 2008
Monthly mean values of surface weather elements for station 10673 - January 2008
Monthly mean values of surface weather elements for station 10637 2008 (cases with cloud cover observed and forecasted >4/8)
Example of conditional verification
- Forecasted and observed surface pressure over the region of Germany during DJF 2005/2006 (RMSE and STDV)
- the same, restricted to observed and forecasted values lower than 1020 hPa (RMSE and STDV)
- the same, restricted to observed and forecasted values higher than 1020 hPa (RMSE and STDV)
Distribution of surface level pressure in COSMO-EU and in GME
Verification of surface weather elements: The effect of the SSO scheme in COSMO-EU
Consideration of frictional and gravity effects caused by processes that are not resolved by the model orography
Distribution of surface level pressure in COSMO-EU and in GME
Scores for continuous weather elements - mean absolute error
MAE = (1/N) Σ_{i=1}^{N} |F_i − O_i|
Properties:
- ideal value: 0
- error magnitudes are simply averaged
Scores for continuous weather elements - root mean square error
RMSE = sqrt( (1/N) Σ_{i=1}^{N} (F_i − O_i)² )
Properties:
- ideal value: 0
- sensitive to both phase and amplitude errors
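The MAE and RMSE definitions above can be sketched together; the sample values are hypothetical. The quadratic term in the RMSE makes it weight large errors more heavily than the MAE does.

```python
import math

# Minimal sketch of MAE and RMSE as defined on the slides.
def mae(f, o):
    """Mean absolute error: average error magnitude."""
    return sum(abs(a - b) for a, b in zip(f, o)) / len(f)

def rmse(f, o):
    """Root mean square error: penalizes large errors quadratically."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, o)) / len(f))

f = [1.0, 3.0, 2.0]  # hypothetical forecasts
o = [2.0, 1.0, 2.0]  # hypothetical observations
print(mae(f, o))   # (1 + 2 + 0) / 3 = 1.0
print(rmse(f, o))  # sqrt((1 + 4 + 0) / 3) ≈ 1.29
```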
Verification of surface weather elements: The effect of the SSO-scheme in COSMO-EU (Spring)
Verification of surface weather elements: The effect of the SSO-scheme in COSMO-EU (Autumn)
Verification of surface weather elements: Modification of the T2m diagnosis in GME, COSMO-EU and COSMO-DE
[Schematic: near-surface profile between the lowest model layer and the 2 m level - logarithmic free-turbulent-layer profile above an exponential roughness-layer profile, for stable and unstable conditions]
Verification of surface weather elements: Modification of the T2m diagnosis in GME, COSMO-EU and COSMO-DE
z₀ taken from the model surface vs. z₀ taken as 2 cm
Verification of surface weather elements: Modification of T2m-diagnosis in COSMO-EU (Winter)
Verification of surface weather elements: Modification of T2m-diagnosis in COSMO-EU (Summer)
Diurnal cycle of forecasted and observed T2m- COSMO-EU, Summer 2007, Start 00 UTC
Diurnal cycle of forecasted and observed T2m - COSMO-EU, Summer 2008, Start 00 UTC
Diurnal cycle of forecasted and observed precipitation - COSMO-EU, Summer 2007 (top) and 2008 (bottom), Start 00 UTC
Bias T2m- GME/LM/COSMO-EU, since 1999, Stations below 100 m, Start 00 UTC
Scores for continuous weather elements - standard deviation
STDV = sqrt( (1/N) Σ_{i=1}^{N} (F_i − O_i − (F̄ − Ō))² )
Properties:
- ideal value: 0
- if the errors have a Gaussian distribution: around 68% of the errors can be expected in the interval (BIAS − STDV, BIAS + STDV)
Scores for continuous weather elements - correlation coefficient
R: reference forecast (short range: persistence; medium range: climate)
CC = Σ (F − R)′ (A − R)′ / sqrt( Σ ((F − R)′)² · Σ ((A − R)′)² )
where (F − R)′ and (A − R)′ denote the deviations of F − R and A − R from their mean values
Properties:
- range: −1 to +1 (ideal value)
- mainly sensitive to phase errors
- critical value: 0.6
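The standard deviation of the errors and the anomaly correlation above can be sketched as follows; all numerical values are hypothetical examples, and the reference forecast `r` stands for persistence or climate as on the slide.

```python
import math

# Minimal sketch of STDV and the (anomaly) correlation coefficient CC.
def stdv(f, o):
    """Standard deviation of the forecast errors around their mean (the BIAS)."""
    errs = [a - b for a, b in zip(f, o)]
    m = sum(errs) / len(errs)  # = BIAS
    return math.sqrt(sum((e - m) ** 2 for e in errs) / len(errs))

def cc(f, a, r):
    """Correlation of forecast anomalies (F - R) with analysed anomalies (A - R)."""
    df = [x - y for x, y in zip(f, r)]
    da = [x - y for x, y in zip(a, r)]
    mf, ma = sum(df) / len(df), sum(da) / len(da)
    num = sum((x - mf) * (y - ma) for x, y in zip(df, da))
    den = math.sqrt(sum((x - mf) ** 2 for x in df) * sum((y - ma) ** 2 for y in da))
    return num / den

print(stdv([1.0, 2.0, 4.0], [1.0, 3.0, 3.0]))  # errors 0, -1, 1 -> sqrt(2/3)
print(cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 0.0]))  # 1.0
```

Because the mean error is removed, STDV isolates the scatter of the errors, while the BIAS isolates their systematic part.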
Scores for continuous weather elements
Examples of scores for four artificial situations:
Scores for continuous weather elements - skill score S1
Properties: range: 0 (ideal value) to 200 (worst value)
Scores for continuous weather elements - skill score
Relates a score of interest to the score of a reference forecast:
SKSC = (SCORE(forecast) − SCORE(reference)) / (SCORE(ideal) − SCORE(reference))
Properties: range: −∞ to +1 (ideal value)
Example: reduction of variance (sometimes given in percent):
RV = 1 − RMSE²_forecast / RMSE²_reference
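The generic skill score and the reduction of variance above can be sketched as follows; the score values are hypothetical (for RMSE-like scores the ideal value is 0, which is used as the default here).

```python
# Minimal sketch of the generic skill score and the reduction of variance.
def skill_score(score_fc, score_ref, score_ideal=0.0):
    """1 = perfect, 0 = no better than the reference, negative = worse."""
    return (score_fc - score_ref) / (score_ideal - score_ref)

def reduction_of_variance(rmse_fc, rmse_ref):
    """RV = 1 - RMSE_forecast^2 / RMSE_reference^2."""
    return 1.0 - rmse_fc ** 2 / rmse_ref ** 2

print(skill_score(1.0, 2.0))            # 0.5: halfway between reference and perfect
print(reduction_of_variance(1.0, 2.0))  # 0.75
```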
A global score for COSMO models: COSI
- Continuous elements: rmse = sqrt( (1/n) Σ (t_f − t_o)² )
- Wind: rmsvwe = sqrt( (1/n) Σ |V_f − V_o|² )
- Skill score for wind and continuous elements, related to persistence: SS = 1 − r_f² / r_p²
- Skill score for categorical elements, related to chance: ETS = (R − "chance") / (T − "chance")
Final definition of the score: 1
Continuous elements, sum over 8 time steps (start: vv=3, end: vv=24, step: 3); equal weights for all forecast times!
SK_j = (1/8) Σ_{i=1}^{8} sk_{i,j}
sk_{i,j}: skill score for element j at time step i
Final definition of the score: 2
Categorical elements, sum over N time steps (start: vv=vvb, end: vv=vve, step: vvs)
Cloud cover: vvb=3, vve=24, vvs=3; Precipitation: vvb=6, vve=24, vvs=6
Equal weights for all forecast times and categories!
SK_j = (1/(M·N)) Σ_{m=1}^{M} Σ_{i=1}^{N} sk_{i,j,m}
sk_{i,j,m}: skill score for element j at time step i in category m
Final definition of the score: 3
Sum over all elements; equal weights for all elements!
SK = (1/N_E) Σ_{j=1}^{N_E} SK_j
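The equal-weight aggregation in the three steps above can be sketched in a few lines; the per-element skill values are hypothetical examples.

```python
# Minimal sketch of the COSI aggregation: average the skill scores of each
# element over its forecast times (and categories), then average over elements,
# with equal weights everywhere.
def cosi(per_element_skills):
    """per_element_skills: one list of sk_{i,j} values per element j."""
    sk_j = [sum(s) / len(s) for s in per_element_skills]  # per-element mean
    return sum(sk_j) / len(sk_j)                          # mean over elements

# Two hypothetical elements with different numbers of time steps:
print(cosi([[0.5, 0.7], [0.2, 0.4, 0.6]]))  # mean of 0.6 and 0.4 = 0.5
```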
The score COSI 2003-2010: all days, COSMO-EU (model changes marked: V3.19, V3.22, T2m diagnosis, SSO scheme)
Long term trends of verification results for surface weather elements: Wind 00 UTC
Long term trends of verification results for surface weather elements: Wind 12 UTC
Long term trends of verification results for surface weather elements: Temperature 2m 12 UTC
Long term trends of verification results for surface weather elements: Minimum temperature 2m 06 UTC
Long term trends of verification results for surface weather elements: Maximum temperature 2m 18 UTC
Long term trends of verification results for surface weather elements: Temperature 2m 18 UTC
Single components of the COSI: Day 1 (trend clearly positive for three components, unclear for two)
Verification of precipitation area mean
Scores for categorical weather elements
Aim: to express the correctness of forecasts of yes/no events, for example fog, thunderstorm, cloudiness, precipitation (above a threshold of interest), temperature warming or cooling, ...
Method: stratify the forecasts and build up contingency tables
Scores for categorical weather elements
General structure of contingency tables:
                Observation yes   Observation no
Forecast yes          A                 B
Forecast no           C                 D
Scores for categorical weather elements
Probability of detection: POD = A / (A + C)
- range: 0 - 1, perfect: 1
- can be maximised by overforecasting the event
- disadvantage: only observed events are considered (false alarms are ignored)
Scores for categorical weather elements
False alarm rate: FAR = B / (A + B)
- range: 0 - 1, perfect: 0
- can be optimised by underforecasting the event
- disadvantage: only yes-forecasts are considered
Scores for categorical weather elements
Percentage correct: PEC = (A + D) / (A + B + C + D) = (A + D) / N
- range: 0 - 1, perfect: 1
- maximised by doing everything correctly
- disadvantage: for rare events a high correctness is suggested (the correct rejections D dominate)
Scores for categorical weather elements
Frequency bias: FBI = (A + B) / (A + C)
- range: 0 - ∞, perfect: 1
- compares forecast and observed event frequencies
- disadvantage: takes no account of accuracy
Scores for categorical weather elements
Threat score (critical success index): CSI = A / (A + B + C)
- range: 0 - 1, perfect: 1
- advantage: false alarms and missed events are included
- disadvantage: depends on the climatological frequency of the events (poorer scores for rare events), since some hits can occur purely due to random chance
Scores for categorical weather elements
Equitable threat score: ETS (relates the CSI to a reference forecast)
ETS = (A − E) / (A + B + C − E),  E("chance") = (A + B)(A + C) / (A + B + C + D)
- range: −1/3 - 1, perfect: 1
- advantage: false alarms and missed events are included, and the effect of a reference forecast is included
Scores for categorical weather elements
Heidke skill score: HSS (relates PEC to a reference forecast)
HSS = (A + D − R) / (A + B + C + D − R),  R("chance") = [ (A + B)(A + C) + (C + D)(B + D) ] / (A + B + C + D)
- range: −∞ - 1, perfect: 1
- advantage: false alarms and missed events are included, and the effect of a reference forecast is included
- disadvantage: very sensitive to dry events
Scores for categorical weather elements
True skill statistic (Hanssen-Kuipers discriminant): TSS
TSS = (A·D − B·C) / ( (A + C)(B + D) )
- range: −1 - +1, perfect: 1
- advantage: equal emphasis on yes and no events
- disadvantage: for rare events similar to POD
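All of the categorical scores above follow from the four contingency-table counts. A minimal sketch (the counts are hypothetical example values, not verification results):

```python
# Categorical scores from contingency-table counts:
# A = hits, B = false alarms, C = misses, D = correct rejections.
def scores(A, B, C, D):
    n = A + B + C + D
    e = (A + B) * (A + C) / n                              # hits expected by chance
    r = ((A + B) * (A + C) + (C + D) * (B + D)) / n        # chance term of the HSS
    return {
        "POD": A / (A + C),
        "FAR": B / (A + B),
        "PEC": (A + D) / n,
        "FBI": (A + B) / (A + C),
        "CSI": A / (A + B + C),
        "ETS": (A - e) / (A + B + C - e),
        "HSS": (A + D - r) / (n - r),
        "TSS": (A * D - B * C) / ((A + C) * (B + D)),
    }

s = scores(A=30, B=10, C=20, D=40)
print(s["POD"], s["FAR"])  # 0.6 0.25
```

With these counts the ETS (0.25) is clearly below the CSI (0.5), illustrating how the chance correction removes hits that random forecasts would score anyway.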
Scores for categorical weather elements: GME-grid 2007
Scores for categorical weather elements COSMO-EU-grid 2007
Scores for probabilistic weather elements: the Brier score
BRSC = (1/N) Σ_{i=1}^{N} (F_i − O_i)²
F_i: forecast probability
O_i: observed occurrence (0 if the event was not observed, 1 if it was observed)
Properties: ideal value: 0, maximum: 1
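The Brier score above can be sketched directly from its definition; the probabilities and outcomes are hypothetical examples.

```python
# Minimal sketch of the Brier score: mean squared difference between
# forecast probabilities and observed occurrences (0 or 1).
def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

print(brier_score([0.8, 0.1, 0.5], [1, 0, 0]))  # (0.04 + 0.01 + 0.25) / 3 = 0.1
```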
Verification of very high resolution models - the basic problem: the double penalty
- high resolution models often give the right signals
- but the forecasts may be slightly shifted compared to the observations
- therefore the forecasts are penalized at the points where the event was predicted but not observed
- and at the points where the event was observed but not predicted
- the right signal is not rewarded!
A question for a quiz program: which forecast is the best?
Verification of very high resolution models - basic problem: solution
- look into windows of different sizes
- calculate a set of scores that represent the content of the windows
Fractions skill score for forecasts of GME, COSMO-EU and COSMO-DE for December 2008, forecast time 06 - 18 hours
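The fractions skill score compares the fraction of event points inside a window in forecast and observation instead of point-by-point matches, which defuses the double penalty. A minimal 1-D sketch with toy binary fields (not the model data shown here):

```python
# Minimal 1-D sketch of the fractions skill score (FSS):
# FSS = 1 - sum((Pf - Po)^2) / (sum(Pf^2) + sum(Po^2)),
# where Pf, Po are event fractions in a sliding window.
def fractions(field, w):
    """Fraction of event points in a window of width w around each point."""
    n, half, out = len(field), w // 2, []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(field[lo:hi]) / (hi - lo))
    return out

def fss(fc, ob, w):
    pf, po = fractions(fc, w), fractions(ob, w)
    num = sum((a - b) ** 2 for a, b in zip(pf, po))
    den = sum(a * a for a in pf) + sum(b * b for b in po)
    return 1.0 - num / den

fc = [0, 1, 1, 0, 0, 0]  # toy forecast event field
ob = [0, 0, 1, 1, 0, 0]  # toy observed field, shifted by one point
print(fss(fc, ob, 1), fss(fc, ob, 3))
```

With the one-point window the shifted forecast is doubly penalized; the three-point window already credits the correct signal and scores higher.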
Examination of the statistical significance of fuzzy-verification results using bootstrapping
Basic idea of bootstrapping:
- resample the elements of a given sample of forecasts and observations as often as necessary (N times) and calculate the relevant score(s) each time
- from the N scores, calculate statistical properties of the sample such as the mean value, standard deviation, confidence intervals and quantiles
Application to fuzzy verification:
- resampling is done using blocks; blocks are defined as single days
- number of resampling cases: N = days × 100
- scores are calculated from the N samples for NT thresholds and NW windows
- quantiles are calculated for each window and threshold
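The block-bootstrap procedure above can be sketched as follows; `day_blocks` holds hypothetical per-day data and `score` is any score function (here simply the mean), not the actual fuzzy-verification scores.

```python
import random

# Minimal sketch of block bootstrapping over days: resample whole days with
# replacement, recompute the score on the pooled resample, collect quantiles.
def bootstrap_quantiles(day_blocks, score, n_resamples=1000, q=(0.1, 0.9), seed=1):
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample = [rng.choice(day_blocks) for _ in day_blocks]  # days, with replacement
        pooled = [x for block in sample for x in block]        # pool the resampled days
        stats.append(score(pooled))
    stats.sort()
    return [stats[int(p * n_resamples)] for p in q]

# Hypothetical daily data (e.g. one value per station and day):
days = [[0.2, 0.3], [0.5, 0.4], [0.1, 0.6], [0.3, 0.3]]
mean = lambda xs: sum(xs) / len(xs)
q10, q90 = bootstrap_quantiles(days, mean)
print(q10, q90)
```

Resampling whole days rather than single values preserves the within-day correlation of the data, which is the point of using blocks.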
Values and quantiles 0.1 and 0.9 for the upscaling ETS, COSMO-DE, period June - August 2009, Germany
Differences between GME and COSMO-DE: ETS(COSMO-DE) − ETS(GME), with significance test, Germany (areas where COSMO-DE is significantly better or worse than GME)
Differences between COSMO-DE and COSMO-EU: ETS(COSMO-DE) − ETS(COSMO-EU), with significance test, Germany (areas where COSMO-DE is significantly better or worse than COSMO-EU)
Differences between COSMO-DE and COSMO-EU: ETS(COSMO-DE) − ETS(COSMO-EU), with significance test (areas where COSMO-DE is significantly better or worse than COSMO-EU)
The problem of precipitation seems to be solved - BUT!
One of the open questions: surprisingly bad forecasts of temperature on some days during winter
One of the open questions: forecasted temperature over snow
The verification system at DWD
- verification against analyses
- verification against surface observations
- verification of vertical structures using different observation systems
Verification against analysis
- verification of geopotential, temperature, wind and humidity at all pressure levels
- calculation of ME, RMSE, correlation coefficients, skill score S1, ...
- calculation of scores for 26 regions
- calculation of horizontal distributions of errors
Verification against surface observations
- worldwide (~3300 stations): cloud cover and 6h-precipitation (contingency tables), gusts; wind, temperature, dew point (depression), extreme temperatures
- over Germany (~205 stations): cloud cover and 1h-precipitation (contingency tables), gusts; wind, temperature, dew point (depression), extreme temperatures
Verification of vertical structures
- geopotential, wind, temperature, humidity
- TEMPs, AMDARs, wind profilers
- GME, COSMO-EU, COSMO-DE
- single TEMPs, single wind profilers, ...
Vertical profiles of forecast errors according to AMDAR measurements: bias of wind speed, forecasts starting 00 UTC since 2000 (profiles for forecast times 00 h, 24 h and 48 h; positive and negative bias marked)
Vertical profiles of forecast errors according to TEMP measurements: bias of wind speed, forecasts starting 00 UTC since 2000 (profiles for forecast times 00 h, 12 h, 24 h, 36 h and 48 h; positive and negative bias marked)
Number of scores for each model run (crude approximation):
- verification against surface observations: ~35 000 scores
- verification against analysis: ~45 000 scores
- verification against upper-air observations: ~130 000 scores
Calculate the scores AND watch the system!
Verification is a full-time job for at least 2 persons!
A typical situation
First possible solution of the problem: the t-test
Hypothesis tests: are the mean values of two different time series significantly different?
Significance tests for the parameters of these time series:
One-sample t-test:
t = (X̄ − μ) √N / S
N: number of cases, X̄: sample mean value, S: sample standard deviation, μ: hypothetical mean value
Two-sample t-test:
t = (x̄₁ − x̄₂) / S_D · sqrt( N₁ N₂ / (N₁ + N₂) )
with S_D² = ( (N₁ − 1) S₁² + (N₂ − 1) S₂² ) / (N₁ + N₂ − 2)
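The two-sample t statistic with pooled standard deviation from the slide can be sketched as follows; the sample values are hypothetical.

```python
import math

# Minimal sketch of the two-sample t statistic with pooled standard deviation:
# t = (mean1 - mean2) / S_D * sqrt(N1*N2 / (N1 + N2)).
def two_sample_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)  # sample variances
    s2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    sd = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (m1 - m2) / sd * math.sqrt(n1 * n2 / (n1 + n2))

print(two_sample_t([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))  # sqrt(1.5) ≈ 1.22
```

The resulting t value is then compared against the critical value of the t distribution for N₁ + N₂ − 2 degrees of freedom at the chosen significance level.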
Second possible solution of the problem: bootstrapping
Sample with 10 elements; each realisation draws elements with replacement and computes the mean value:
Realisation 1: mean value using elements 5 3 8 7 8 4 7 0 4 3
Realisation 2: mean value using elements 3 2 0 5 1 2 0 2 2 8
Realisation 3: mean value using elements 5 2 3 6 8 3 8 0 8 6
Realisation 4: mean value using elements 7 5 1 6 4 0 1 2 1 6
Realisation 5: mean value using elements 6 5 8 6 1 0 0 2 3 2
Realisation 6: mean value using elements 1 0 5 5 6 5 8 5 5 8
Realisation 7: mean value using elements 3 4 4 4 2 8 5 3 2 6
Realisation 8: mean value using elements 0 8 2 0 6 4 1 6 6 5
Realisation 9: mean value using elements 0 7 5 6 3 2 2 3 8 8
Realisation 10: mean value using elements 2 2 3 6 6 6 6 2 0 0
A typical situation