
Verification of the operational NWP models at DWD - with special focus on COSMO-EU Ulrich Damrath Ulrich.Damrath@dwd.de

Ein Mensch erkennt (und das ist wichtig): Nichts ist ganz falsch und nichts ganz richtig Eugen Roth

A man discerns - perhaps you too: Nothing is entirely wrong and nothing entirely true Eugen Roth

Basic tasks of verification
- Comparison of forecast results with the truth (observations and analyses)
- Calculation of forecast quality in terms of so-called scores
- Feedback to end users and modellers

Outline: A mixture of
- General remarks
- The verification system at DWD
- Examples of verification results
- How did verification help (in some cases)?

Purposes of verification
- Watching the operational system
- Assessment of new model versions
- Evaluation of predictability
- Interpretation of model errors

Course of verification in principle [flow chart connecting: Observation, Data control, Analysis, Forecast, Verification, Modeller, Enduser]

Methods of verification
- Subjective assessment
- Objective assessment
- Special investigations

Forecast of precipitation

Observation (12.08.2002 06 UTC to 13.08.2002 06 UTC) and forecast of precipitation (T+18 to T+42 from 11.08.2002 12 UTC)

Observation (12.08.2002 06 UTC to 13.08.2002 06 UTC) and forecast of precipitation (T+06 to T+30 from 12.08.2002 00 UTC)

Objective assessment: calculation of scores which characterize the truth of the forecast.
Advantage:
- objective criteria for all forecasts
Disadvantages:
- perhaps only some aspects of the forecasts are captured
- it is impossible to express all degrees of freedom of a forecast in a few simple numbers

Methods of objective verification
- calculation of scores for continuous weather elements
- calculation of scores for categorical weather elements
- calculation of scores for probabilistic weather elements (a short information only)
- calculation of scores that are representative for very high resolution models

Scores for continuous weather elements
- Mean error
- Mean absolute error
- Root mean square error
- Standard deviation
- Correlation coefficient
- Skill scores

A basic rule: try to interpret each score, according to its characteristics, both as a physical and as a statistical quantity!

Scores for continuous weather elements - mean error or BIAS
$\mathrm{ME} = \mathrm{BIAS} = \frac{1}{N}\sum_{i=1}^{N}(F_i - O_i)$, with $F_i$ forecast and $O_i$ observation.
Properties: ideal value: 0; gives no general information about forecast quality. If the errors have a Gaussian distribution: an estimate of the expected value of the errors.
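
Not part of the original slides: a minimal Python sketch of the mean error, with illustrative values, to make the formula concrete.

```python
import numpy as np

def mean_error(forecast, observation):
    """ME / BIAS: average of forecast minus observation."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observation, dtype=float)
    return np.mean(f - o)

# Illustrative 2m-temperature pairs (not real DWD data):
print(mean_error([12.3, 14.1, 9.8, 11.0], [11.9, 14.5, 10.2, 10.4]))
# 0.05 -> on average the forecasts are 0.05 K too warm
```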

Time series of BIAS of 2m-temperatures over Germany (January 1994 to May 2007)

Time series of BIAS of 2m-temperatures over Germany (January 1994 to May 2007)

Time series of BIAS of wind speed 10m over Germany (May 2006 - May 2007)

Time series of BIAS of wind speed 10m over Germany stations below 100 m (May 2006 - May 2007)

Monthly mean values of surface weather elements for station 10427 - January 2008

Monthly mean values of surface weather elements for station 10673 - January 2008

Monthly mean values of surface weather elements for station 10637 2008 (cases with cloud cover observed and forecasted >4/8)

Example of conditional verification: forecast and observed values of surface level pressure over the region of Germany during DJF 2005/2006 (RMSE and STDV); the same for observed and forecast values lower than 1020 hPa; and for observed and forecast values higher than 1020 hPa.

Distribution of surface level pressure in COSMO-EU and in GME

Verification of surface weather elements: the effect of the SSO scheme in COSMO-EU: consideration of frictional and gravity-wave effects caused by processes that are not resolved by the model orography

Distribution of surface level pressure in COSMO-EU and in GME

Scores for continuous weather elements - mean absolute error
$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|F_i - O_i\right|$
Properties: ideal value: 0; the absolute errors are simply averaged.

Scores for continuous weather elements - root mean square error
$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(F_i - O_i)^2}$
Properties: ideal value: 0; sensitive to both phase and amplitude errors.

Verification of surface weather elements: The effect of the SSO-scheme in COSMO-EU (Spring)

Verification of surface weather elements: The effect of the SSO-scheme in COSMO-EU (Autumn)

Verification of surface weather elements: Modification of the T2m diagnosis in GME, COSMO-EU and COSMO-DE [schematic: the 2m temperature is diagnosed between the lowest model main level and the surface, combining an exponential roughness-layer profile with a logarithmic free-turbulent-layer profile, for stable and unstable stratification]

Verification of surface weather elements: Modification of the T2m diagnosis in GME, COSMO-EU and COSMO-DE (z0 taken from the model surface vs. z0 taken as 2 cm)

Verification of surface weather elements: Modification of the T2m diagnosis in GME, COSMO-EU and COSMO-DE (z0 taken from the model surface vs. z0 taken as 2 cm)

Verification of surface weather elements: Modification of T2m-diagnosis in COSMO-EU (Winter)

Verification of surface weather elements: Modification of T2m-diagnosis in COSMO-EU (Winter)

Verification of surface weather elements: Modification of T2m-diagnosis in COSMO-EU (Summer)

Verification of surface weather elements: Modification of T2m-diagnosis in COSMO-EU (Summer)

Diurnal cycle of forecasted and observed T2m - COSMO-EU, Summer 2007, Start 00 UTC

Diurnal cycle of forecasted and observed T2m - COSMO-EU, Summer 2008, Start 00 UTC

Diurnal cycle of forecasted and observed precipitation - COSMO-EU, Summer 2007 (top) and 2008 (bottom), Start 00 UTC

Bias T2m - GME/LM/COSMO-EU, since 1999, stations below 100 m, Start 00 UTC

Bias T2m - GME/LM/COSMO-EU, since 1999, stations below 100 m, Start 00 UTC

Scores for continuous weather elements - standard deviation
$\mathrm{STDV} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left((F_i - O_i) - \overline{(F - O)}\right)^2}$
Properties: ideal value: 0. If the errors have a Gaussian distribution, around 68% of the errors can be expected in the interval (BIAS - STDV, BIAS + STDV).
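
A sketch (again not from the slides) computing MAE, RMSE and STDV together; with these definitions the decomposition RMSE² = BIAS² + STDV² holds, which is why a large bias alone inflates the RMSE.

```python
import numpy as np

def mae_rmse_stdv(forecast, observation):
    """MAE, RMSE and STDV of the errors, as defined on the slides."""
    e = np.asarray(forecast, float) - np.asarray(observation, float)
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    stdv = np.std(e)  # population form (ddof=0), matching the formula
    return mae, rmse, stdv

# With these definitions: rmse**2 == mean_error**2 + stdv**2
```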

Scores for continuous weather elements - correlation coefficient
R: reference forecast (short range: persistence; medium range: climate)
$\mathrm{CC} = \frac{\sum\left((F-R) - \overline{(F-R)}\right)\left((A-R) - \overline{(A-R)}\right)}{\sqrt{\sum\left((F-R) - \overline{(F-R)}\right)^2}\,\sqrt{\sum\left((A-R) - \overline{(A-R)}\right)^2}}$
Properties: range: -1 to +1 (ideal value); mainly sensitive to phase errors; critical value: 0.6.
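
A sketch of this correlation with respect to a reference forecast R (persistence or climate), as defined above; the function and variable names are illustrative.

```python
import numpy as np

def correlation_vs_reference(f, a, r):
    """Correlation of forecast and analysis anomalies relative to a
    reference forecast r (short range: persistence, medium range: climate)."""
    df = np.asarray(f, float) - np.asarray(r, float)   # forecast anomaly
    da = np.asarray(a, float) - np.asarray(r, float)   # analysis anomaly
    df -= df.mean()
    da -= da.mean()
    return np.sum(df * da) / np.sqrt(np.sum(df**2) * np.sum(da**2))
```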

Scores for continuous weather elements - examples of scores for four artificial situations:

Scores for continuous weather elements - skill score S1
Properties: range: 0 (ideal value) to 200 (worst value)

Scores for continuous weather elements - skill score: relates a score of interest to the score of a reference forecast
$\mathrm{SKSC} = \frac{\mathrm{SCORE}(\mathrm{forecast}) - \mathrm{SCORE}(\mathrm{reference})}{\mathrm{SCORE}(\mathrm{ideal}) - \mathrm{SCORE}(\mathrm{reference})}$
Properties: range: $-\infty$ to +1 (ideal value). Example: reduction of variance, sometimes given in percent:
$\mathrm{RV} = 1 - \frac{\mathrm{RMSE}^2_{\mathrm{forecast}}}{\mathrm{RMSE}^2_{\mathrm{reference}}}$
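
A sketch of the generic skill score and of the reduction of variance; the default `score_ideal=0.0` is an assumption that fits error scores such as the RMSE.

```python
def skill_score(score_fc, score_ref, score_ideal=0.0):
    """1 for a perfect forecast, 0 at the level of the reference,
    negative (down to -infinity) when worse than the reference."""
    return (score_fc - score_ref) / (score_ideal - score_ref)

def reduction_of_variance(rmse_fc, rmse_ref):
    """RV = 1 - RMSE_fc**2 / RMSE_ref**2; multiply by 100 for percent."""
    return 1.0 - rmse_fc**2 / rmse_ref**2
```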

Scores for categorical weather elements
Aim: to express the correctness of forecasts of yes/no events, for example fog, thunderstorm, cloudiness, precipitation (above a threshold of interest), temperature (warming or cooling), ...
Method: stratification of the forecasts; build up contingency tables.

Scores for categorical weather elements
General structure of contingency tables:

             | Observation yes | Observation no
Forecast yes |        A        |       B
Forecast no  |        C        |       D
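
Not part of the slides: a sketch of how the four table entries can be counted from yes/no arrays (names illustrative).

```python
import numpy as np

def contingency_table(fcst_yes, obs_yes):
    """A = hits, B = false alarms, C = misses, D = correct rejections."""
    fy = np.asarray(fcst_yes, dtype=bool)
    oy = np.asarray(obs_yes, dtype=bool)
    A = int(np.sum(fy & oy))
    B = int(np.sum(fy & ~oy))
    C = int(np.sum(~fy & oy))
    D = int(np.sum(~fy & ~oy))
    return A, B, C, D

# e.g. for 6h-precipitation above a threshold of interest:
# A, B, C, D = contingency_table(precip_fc >= 1.0, precip_obs >= 1.0)
```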

Scores for categorical weather elements
Probability of detection: POD = A / (A + C)
Range: 0 to 1; perfect: 1. Can be maximised by overforecasting the event. Disadvantage: only observed events are considered (false alarms are ignored).

Scores for categorical weather elements
False alarm ratio: FAR = B / (A + B)
Range: 0 to 1; perfect: 0. Can be minimised by underforecasting the event. Disadvantage: only yes-forecasts are considered.

Scores for categorical weather elements
Percentage correct: PEC = (A + D) / N, with N = A + B + C + D
Range: 0 to 1; perfect: 1. Maximised by doing everything correctly. Disadvantage: for rare events a deceptively high correctness is suggested.

Scores for categorical weather elements
Frequency bias: FBI = (A + B) / (A + C)
Range: 0 to $\infty$; perfect: 1. Disadvantage: compares only forecast and observed event frequencies, taking no account of accuracy.

Scores for categorical weather elements
Threat score (critical success index): CSI = A / (A + B + C)
Range: 0 to 1; perfect: 1. Advantage: false alarms and missed events are included. Disadvantage: depends on the climatological frequency of events (poorer scores for rare events), since some hits can occur purely by chance.

Scores for categorical weather elements
Equitable threat score: ETS - relates the CSI to a reference (chance) forecast
$\mathrm{ETS} = \frac{A - E}{A + B + C - E}$, with $E(\mathrm{chance}) = \frac{(A+B)(A+C)}{A+B+C+D}$
Range: -1/3 to 1; perfect: 1. Advantage: false alarms and missed events are included; the effect of a reference forecast is included.

Scores for categorical weather elements
Heidke skill score: HSS - relates PEC to a reference forecast
$\mathrm{HSS} = \frac{A + D - R}{A + B + C + D - R}$, with $R(\mathrm{chance}) = \frac{(A+B)(A+C) + (C+D)(B+D)}{A+B+C+D}$
Range: $-\infty$ to 1; perfect: 1. Advantage: false alarms and missed events are included; the effect of a reference forecast is included. Disadvantage: very sensitive to dry events.

Scores for categorical weather elements
True skill statistic (Hanssen-Kuipers discriminant): TSS
$\mathrm{TSS} = \frac{AD - BC}{(A+C)(B+D)}$
Range: -1 to +1; perfect: 1. Advantage: equal emphasis on yes and no events. Disadvantage: for rare events similar to POD.
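
The scores of the preceding slides collected in one sketch (not from the slides themselves); applied to the two fictitious situations on the next slide it reproduces the quoted percentages.

```python
def categorical_scores(A, B, C, D):
    """All categorical scores defined above, from one 2x2 table."""
    N = A + B + C + D
    e = (A + B) * (A + C) / N                        # hits expected by chance
    r = ((A + B) * (A + C) + (C + D) * (B + D)) / N  # PEC expected by chance
    return {
        "POD": A / (A + C),
        "FAR": B / (A + B),
        "PEC": (A + D) / N,
        "FBI": (A + B) / (A + C),
        "CSI": A / (A + B + C),
        "ETS": (A - e) / (A + B + C - e),
        "HSS": (A + D - r) / (N - r),
        "TSS": (A * D - B * C) / ((A + C) * (B + D)),
    }

# categorical_scores(1, 7, 3, 89) -> PEC 0.90, POD 0.25, FAR 0.88, HSS 0.12
```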

Scores for categorical weather elements
Examples of scores for two fictitious situations (contingency table entries as above):

Situation 1: A=1, B=7, C=3, D=89 -> PEC: 90%, POD: 25%, FAR: 88%, HSS: 12%
Situation 2: A=25, B=10, C=10, D=55 -> PEC: 80%, POD: 71%, FAR: 29%, HSS: 56%

Scores for categorical weather elements: GME-grid 2007

Scores for categorical weather elements COSMO-EU-grid 2007

A global score for COSMO models: COSI - a mixture of continuous and categorical scores
Continuous elements: $\mathrm{rmse} = \sqrt{\frac{1}{n}\sum_t (f_t - o_t)^2}$
Wind: $\mathrm{rmsvwe} = \sqrt{\frac{1}{n}\sum \left|\vec{V}_f - \vec{V}_o\right|^2}$
Skill score for wind and continuous elements, related to persistence: $SS = 1 - \frac{r_f^2}{r_p^2}$
Skill score for categorical elements, related to chance: $\mathrm{ETS} = \frac{R - \mathrm{chance}}{T - \mathrm{chance}}$

Final definition of the score: 1
Continuous elements, sum over 8 time steps (start: vv=3, end: vv=24, step: 3). Equal weights for all forecast times!
$SK_j = \frac{1}{8}\sum_{i=1}^{8} sk_{i,j}$
$SK_j$: skill score for element j; $sk_{i,j}$: skill score for element j at time step i

Final definition of the score: 2
Categorical elements, sum over N time steps (start: vv=vvb, end: vv=vve, step: vvs). Cloud cover: vvb=3, vve=24, vvs=3; precipitation: vvb=6, vve=24, vvs=6. Equal weights for all forecast times and categories!
$SK_j = \frac{1}{M N}\sum_{m=1}^{M}\sum_{i=1}^{N} sk_{i,j,m}$
$SK_j$: skill score for element j; $sk_{i,j,m}$: skill score for element j at time step i in category m

Final definition of the score: 3
Sum over all elements. Equal weights for all elements!
$SK = \frac{1}{N_E}\sum_{j=1}^{N_E} SK_j$
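
A sketch of the equal-weight aggregation defined in the three slides above; the per-element skill-score arrays are assumed as inputs, and this is not DWD's operational code.

```python
import numpy as np

def cosi(skill_scores_per_element):
    """COSI-style aggregation: average each element's skill scores over
    time steps (and, for categorical elements, over categories) with
    equal weights, then average over all elements with equal weights."""
    sk_j = [np.mean(np.asarray(sk)) for sk in skill_scores_per_element]  # SK_j
    return np.mean(sk_j)                                                 # SK

# e.g. cosi([sk_t2m (8 time steps), sk_cloud (8 time steps x M categories)])
```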

The score COSI 2003-2010: all days, COSMO-EU (model changes marked: V3.19, V3.22, T2m diagnosis, SSO scheme)

The Score COSI 2003-2011: All days

Long term trends of verification results for surface weather elements: Wind 00 UTC

Long term trends of verification results for surface weather elements: Wind 12 UTC

Long term trends of verification results for surface weather elements: Temperature 2m 12 UTC

Long term trends of verification results for surface weather elements: Minimum temperature 2m 06 UTC

Long term trends of verification results for surface weather elements: Maximum temperature 2m 18 UTC

Long term trends of verification results for surface weather elements: Temperature 2m 18 UTC

Single components of the COSI: Day 1 (trend positive for three components, unclear for two)

Verification of precipitation: area mean

Scores for probabilistic weather elements: the Brier score
$\mathrm{BRSC} = \frac{1}{N}\sum_{i=1}^{N}(F_i - O_i)^2$
$F_i$: forecast probability; $O_i$: observed probability (0 if the event was not observed, 1 if it was observed)
Properties: ideal value: 0; maximum: 1
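
A sketch of the Brier score with illustrative probabilities (not from the slides).

```python
import numpy as np

def brier_score(p_forecast, event_observed):
    """Mean squared difference between forecast probability and the
    0/1 observation; 0 is ideal, 1 the worst possible value."""
    p = np.asarray(p_forecast, dtype=float)
    o = np.asarray(event_observed, dtype=float)
    return np.mean((p - o) ** 2)

print(brier_score([0.9, 0.1, 0.7], [1, 0, 0]))  # 0.17
```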

Scores for probabilistic weather elements: the Brier skill score, the reliability diagram, the ROC curve, the Talagrand diagram

Verification of very high resolution models - basic problem: the double penalty
- high resolution models often give the right signals
- but forecasts may be slightly shifted compared to the observations
- therefore forecasts are penalized both at the points where the event was predicted but not observed and at the points where it was observed but not predicted
- the right signal is not rewarded!

Verification of very high resolution models - solution:
- look into windows of different sizes
- calculate a set of scores that represent the content of the windows

Fraction of pixels with rain accumulation >= 4 mm in 1x1 km windows (left) and in 35x35 km windows (right). (Fuzzy Verification of High Resolution Gridded Forecasts: A Review and Proposed Framework, Elizabeth E. Ebert, Bureau of Meteorology Research Centre, submitted to Meteorological Applications.)
$\mathrm{FSS} = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(P_{fcst} - P_{obs}\right)^2}{\frac{1}{N}\left[\sum_{i=1}^{N} P_{fcst}^2 + \sum_{i=1}^{N} P_{obs}^2\right]}$

Fractions skill score for forecasts of GME, COSMO-EU and COSMO-DE for December 2008, forecast time 06-18 hours
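
A sketch of the fractions skill score for one threshold and one window size, following Ebert's formula above; the moving average via scipy's uniform_filter (including its default boundary treatment) is an implementation choice, not taken from the slides.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """Fractions skill score on 2-D fields for a given threshold
    and square window size (in grid points)."""
    p_f = uniform_filter((np.asarray(fcst) >= threshold).astype(float),
                         size=window)
    p_o = uniform_filter((np.asarray(obs) >= threshold).astype(float),
                         size=window)
    fbs = np.mean((p_f - p_o) ** 2)                  # fractions Brier score
    fbs_worst = np.mean(p_f ** 2) + np.mean(p_o ** 2)
    return 1.0 - fbs / fbs_worst
```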

The verification system at DWD
- verification against analyses
- verification against surface observations
- verification of vertical structures using different observation systems

Verification against analyses
- verification of geopotential, temperature, wind and humidity at all pressure levels
- calculation of ME, RMSE, correlation coefficients, skill score S1, ...
- calculation of scores for 26 regions
- calculation of the horizontal distribution of errors

Verification against surface weather elements
- Verification worldwide (~3300 stations): cloud cover(s) and 6h-precipitation (contingency tables), gusts; wind, temperature, dew point (depression), extreme temperatures
- Verification over Germany (~205 stations): cloud cover(s) and 1h-precipitation (contingency tables), gusts; wind, temperature, dew point (depression), extreme temperatures

Verification of vertical structures
- Geopotential, wind, temperature, humidity
- TEMPs, AMDARs, wind profilers
- GME, COSMO-EU, COSMO-DE
- single TEMPs, single wind profilers, ...
- Wait perhaps a moment for the next slide

BIAS and RMSE of geopotential (00 H, 24 H, 48 H)

BIAS and RMSE of geopotential; 17.12.2003: GME pseudo-TEMPs

BIAS and RMSE of geopotential; 27.09.2004: GME 40 km

BIAS and RMSE of geopotential; 28.09.2005: introduction of COSMO-EU

BIAS and RMSE of geopotential; 12.11.2008: activation of the SSO scheme

BIAS and RMSE of geopotential; 29.06.2010: Runge-Kutta dynamical core / change of the reference atmosphere

BIAS and RMSE of wind speed (00 H, 24 H, 48 H)

BIAS and RMSE of wind speed; 17.12.2003: GME pseudo-TEMPs

BIAS and RMSE of wind speed; 27.09.2004: GME 40 km

BIAS and RMSE of wind speed; 28.09.2005: introduction of COSMO-EU

BIAS and RMSE of wind speed; 12.11.2008: activation of the SSO scheme

BIAS and RMSE of wind speed; 29.06.2010: Runge-Kutta dynamical core / change of the reference atmosphere

BIAS and RMSE of temperature (00 H, 24 H, 48 H)

BIAS and RMSE of temperature; 17.12.2003: GME pseudo-TEMPs

BIAS and RMSE of temperature; 27.09.2004: GME 40 km

BIAS and RMSE of temperature; 28.09.2005: introduction of COSMO-EU

BIAS and RMSE of temperature; 12.11.2008: activation of the SSO scheme

BIAS and RMSE of temperature; 29.06.2010: Runge-Kutta dynamical core / change of the reference atmosphere

BIAS of temperature, 2007-2010

BIAS and RMSE of relative humidity (00 H, 24 H, 48 H)

BIAS and RMSE of relative humidity; 16.09.2003: prognostic cloud ice

BIAS and RMSE of relative humidity; 28.09.2005: introduction of COSMO-EU

BIAS and RMSE of relative humidity; 17.07.2007: improved quality control of radiosonde humidities

BIAS and RMSE of relative humidity; 02.02.2010: change of the lateral boundary conditions for rain/snow water content

Vertical profiles of forecast errors according to AMDAR measurements: bias of wind speed, forecasts starting 00 UTC, since 2000 (positive and negative bias; forecast times 00 h, 24 h, 48 h)

Vertical profiles of forecast errors according to TEMP measurements: bias of wind speed, forecasts starting 00 UTC, since 2000 (positive and negative bias; forecast times 00 h, 12 h, 24 h, 36 h, 48 h)

Number of scores for each model run (crude approximation)
- Verification against surface observations: ~35000 scores
- Verification against analyses: ~45000 scores
- Verification against upper air observations: ~130000 scores
- Calculate the scores AND watch the system!
- Verification is a full time job for at least 2 persons!

A typical situation

First possible solution of the problem: the t-test
Hypothesis tests: are the average values of two different time series significantly different? Significance tests for the parameters of these time series.
One-sample t-test: $t = \frac{\bar{X} - \mu}{S}\sqrt{N}$
N: number of cases; $\bar{X}$: sample mean value; S: sample standard deviation; $\mu$: hypothetical mean value
Two-sample t-test: $t = \frac{\bar{x}_1 - \bar{x}_2}{S_D}\sqrt{\frac{N_1 N_2}{N_1 + N_2}}$, with $S_D^2 = \frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}$
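
A sketch of the two-sample test with scipy; the daily score series are synthetic stand-ins for, say, the daily RMSE of two model versions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_old = rng.normal(1.50, 0.30, size=90)   # 90 days, old model version
scores_new = rng.normal(1.42, 0.30, size=90)   # 90 days, new model version

# Pooled two-sample t-test, matching the S_D formula above
t, p = stats.ttest_ind(scores_old, scores_new)
print(f"t = {t:.2f}, p = {p:.3f}")  # small p -> means differ significantly
```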

Second possible solution of the problem: bootstrapping
Sample with 10 elements
Realisation 1: mean value using elements 5 3 8 7 8 4 7 0 4 3
Realisation 2: mean value using elements 3 2 0 5 1 2 0 2 2 8
Realisation 3: mean value using elements 5 2 3 6 8 3 8 0 8 6
Realisation 4: mean value using elements 7 5 1 6 4 0 1 2 1 6
Realisation 5: mean value using elements 6 5 8 6 1 0 0 2 3 2
Realisation 6: mean value using elements 1 0 5 5 6 5 8 5 5 8
Realisation 7: mean value using elements 3 4 4 4 2 8 5 3 2 6
Realisation 8: mean value using elements 0 8 2 0 6 4 1 6 6 5
Realisation 9: mean value using elements 0 7 5 6 3 2 2 3 8 8
Realisation 10: mean value using elements 2 2 3 6 6 6 6 2 0 0

A typical situation

Examination of the statistical significance of fuzzy-verification results using bootstrapping
Basic idea of bootstrapping: resample the elements of a given sample of forecasts and observations as often as necessary (N times) and calculate the relevant score(s) each time. From the N scores, calculate the statistical properties of the sample, such as mean value, standard deviation, confidence intervals and quantiles.
Application to fuzzy verification: resampling is done using blocks; blocks are defined as single days. Number of resampling cases: N = days * 100. Calculation of scores from the N samples for NT thresholds and NW windows. Calculation of quantiles for each window and threshold.
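
A sketch of the day-block bootstrap described above, for one window and threshold; `day_tables` and `score_fn` are illustrative assumptions, not DWD's operational code.

```python
import numpy as np

def block_bootstrap_quantiles(day_tables, score_fn, n_resamples,
                              q=(0.1, 0.9), seed=0):
    """Resample whole days (the blocks) with replacement, recompute the
    score from the summed contingency counts, and return quantiles."""
    rng = np.random.default_rng(seed)
    days = np.asarray(day_tables, dtype=float)      # shape (n_days, 4): A,B,C,D
    n_days = len(days)
    scores = np.empty(n_resamples)
    for k in range(n_resamples):
        idx = rng.integers(0, n_days, size=n_days)  # draw days with replacement
        A, B, C, D = days[idx].sum(axis=0)
        scores[k] = score_fn(A, B, C, D)
    return np.quantile(scores, q)

# As on the slide: n_resamples = number_of_days * 100; score_fn can be the
# ETS from the categorical_scores sketch above.
```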

Values and quantiles 0.1 and 0.9 for upscaling ETS, COSMO-DE, period June to August 2009, Germany

Differences between GME and COSMO-DE: ETS(COSMO-DE) - ETS(GME), significance test, Germany (regions where COSMO-DE is better / worse than GME)

Differences between COSMO-DE and COSMO-EU: ETS(COSMO-DE) - ETS(COSMO-EU), significance test, Germany (regions where COSMO-DE is better / worse than COSMO-EU)

Differences between COSMO-DE and COSMO-EU: ETS(COSMO-DE) - ETS(COSMO-EU), significance test (regions where COSMO-DE is better / worse than COSMO-EU)

One of the open questions: surprisingly bad forecasts of temperature on some days during winter

One of the open questions: surprisingly bad forecasts of temperature on some days during winter

Weather situation from 12.07.2010 06 UTC

Verification meteogram from 12.07.2010 00 UTC

Verification meteogram from 01.07.2010 00 UTC

Verification meteogram from 02.07.2010 00 UTC

Verification meteogram from 03.07.2010 00 UTC