Categorical Verification


Categorical Verification. Tina Kalb. Contributions from Tara Jensen, Matt Pocernich, Eric Gilleland, Tressa Fowler, Barbara Brown, and others. [Title figure: forecast/observation grid marking hits (H), misses (M), and false alarms (F).]

Finley Tornado Data (1884). Forecast answering the question: Will there be a tornado? (YES/NO). Observation answering the question: Did a tornado occur? (YES/NO). Answers fall into one of two categories, so forecasts and observations are binary.

Finley Tornado Data (1884): Contingency Table

              Observed
Forecast    Yes     No   Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Copyright 2015, University Corporation for Atmospheric Research, all rights reserved

A Success?

              Observed
Forecast    Yes     No   Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Percent Correct = (28 + 2680)/2803 = 96.6%!!!!

What if the forecaster never forecast a tornado?

              Observed
Forecast    Yes     No   Total
Yes           0      0       0
No           51   2752    2803
Total        51   2752    2803

Percent Correct = (0 + 2752)/2803 = 98.2%!!!!
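This accuracy paradox is easy to reproduce in code. A minimal Python sketch (function and variable names are illustrative, not from MET):

```python
def percent_correct(hits, false_alarms, misses, correct_negatives):
    """Accuracy: fraction of all forecasts that were correct."""
    total = hits + false_alarms + misses + correct_negatives
    return (hits + correct_negatives) / total

# Finley's actual 1884 forecasts vs. a constant "no tornado" forecast
finley_pc = percent_correct(28, 72, 23, 2680)   # ~0.966
never_pc = percent_correct(0, 0, 51, 2752)      # ~0.982
print(f"Finley: {finley_pc:.3f}, never-forecast: {never_pc:.3f}")
```

The constant "no" forecast wins on accuracy precisely because tornadoes are rare, which is the motivation for the conditioned scores that follow.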

Maybe accuracy is not the most informative statistic. But the contingency table concept is good.

2 x 2 Contingency Table

              Observed
Forecast    Yes           No                Total
Yes         Hit           False Alarm       Forecast Yes
No          Miss          Correct Negative  Forecast No
Total       Obs. Yes      Obs. No           Total

Example: Accuracy = (Hits + Correct Negatives)/Total
MET supports both 2x2 and NxN contingency tables.

Common Notation (however, not universal notation)

              Observed
Forecast    Yes    No    Total
Yes          a      b     a+b
No           c      d     c+d
Total       a+c    b+d     n

Example: Accuracy = (a+d)/n

What if data are not binary? Examples:
- Temperature < 0 °C
- Precipitation > 1 inch
- CAPE > 1000 J/kg
- Ozone > 20 µg/m³
- Winds at 80 m > 24 m/s
- 500 mb heights < 5520 m
- Radar reflectivity > 40 dBZ
- MSLP < 990 hPa
- LCL < 1000 ft
- Cloud droplet concentration > 500/cc
Hint: Pick a threshold that is meaningful to your end-user.

Contingency Table for Freezing Temps (i.e., T <= 0 °C)

              Observed
Forecast   <= 0 °C   > 0 °C   Total
<= 0 °C       a        b       a+b
> 0 °C        c        d       c+d
Total        a+c      b+d       n

Another example: Base Rate (aka sample climatology) = (a+c)/n

Alternative Perspective on the Contingency Table
[Venn diagram: the "Forecast = yes" and "Observed = yes" regions overlap at the Hits; the forecast-only region holds the False Alarms; the observed-only region holds the Misses; everything outside both is the Correct Negatives.]

Conditioning to form a statistic. Considers the probability of one event given another event. Notation: p(X | Y=1) is the probability of X occurring given Y=1, or in other words Y = yes.
Conditioning on the forecast provides:
- Info about how your forecast is performing
- An apples-to-oranges comparison if comparing stats from 2 models
Conditioning on the observations provides:
- Info about the ability of the forecast to discriminate between event and nonevent (also called Conditional Probability or Likelihood)
- An apples-to-apples comparison if comparing stats from 2 models

Conditioning on forecasts (Forecast = yes, f=1): p(x | f=1)
p(x=1 | f=1) = a/(a+b) = Fraction of Hits
p(x=0 | f=1) = b/(a+b) = False Alarm Ratio

Conditioning on observations (Observed = yes, x=1): p(f | x=1)
p(f=1 | x=1) = a/(a+c) = Hit Rate
p(f=0 | x=1) = c/(a+c) = Fraction of Misses
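Both conditionings can be computed directly from the Finley counts. A small Python sketch (variable names are illustrative):

```python
a, b, c, d = 28, 72, 23, 2680  # hits, false alarms, misses, correct negatives

# Conditioning on the forecast (f = 1, i.e. forecast = yes)
fraction_of_hits = a / (a + b)    # p(x=1 | f=1)
false_alarm_ratio = b / (a + b)   # p(x=0 | f=1)

# Conditioning on the observation (x = 1, i.e. observed = yes)
hit_rate = a / (a + c)            # p(f=1 | x=1), aka PODy
fraction_of_misses = c / (a + c)  # p(f=0 | x=1)
```

Note that each pair of conditional probabilities sums to 1, since they exhaust the outcomes given the conditioning event.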

What's considered good?
Conditioning on the forecast:
- Fraction of hits: p(x=1 | f=1) = a/(a+b), close to 1
- False Alarm Ratio: p(x=0 | f=1) = b/(a+b), close to 0
Conditioning on the observations:
- Hit Rate: p(f=1 | x=1) = a/(a+c), close to 1 [aka Probability of Detection Yes (PODy)]
- Fraction of misses: p(f=0 | x=1) = c/(a+c), close to 0

Examples of Categorical Scores (most based on conditioning)
- PODy (Probability of Detection) = a/(a+c)
- False Alarm Ratio (FAR) = b/(a+b)
- PODn = d/(b+d) = 1 - POFD
- False Alarm Rate / Probability of False Detection (POFD) = b/(b+d)
- (Frequency) Bias (FBIAS) = (a+b)/(a+c)
- Threat Score or Critical Success Index (CSI) = a/(a+b+c)
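Each of these scores is a one-liner from the four counts. A minimal Python sketch (function and key names are illustrative, not MET's API):

```python
def categorical_scores(a, b, c, d):
    """Common categorical scores from 2x2 contingency counts
    (a=hits, b=false alarms, c=misses, d=correct negatives)."""
    return {
        "PODy": a / (a + c),
        "FAR": b / (a + b),
        "PODn": d / (b + d),
        "POFD": b / (b + d),
        "FBIAS": (a + b) / (a + c),
        "CSI": a / (a + b + c),
    }

finley = categorical_scores(28, 72, 23, 2680)
```

On the Finley counts this reproduces PODy ≈ 0.55, FAR = 0.72, and CSI ≈ 0.228, and PODn + POFD = 1 as expected.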

Examples of CTC calculations

              Observed
Forecast    Yes     No   Total
Yes          28     72     100
No           23   2680    2703
Total        51   2752    2803

Threat Score = 28/(28 + 72 + 23) = 0.228
Probability of Detection = 28/(28 + 23) = 0.55
False Alarm Ratio = 72/(28 + 72) = 0.720

Example

Relationships among scores
- CSI is a nonlinear function of POD and FAR:
  CSI = 1 / [ 1/POD + 1/(1 - FAR) - 1 ]
- CSI depends on the base rate (event frequency) and on Bias; when Bias = 1, POD = 1 - FAR.
- Very different combinations of FAR and POD can lead to the same CSI value.
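The nonlinear relation can be checked numerically on the Finley counts. A quick Python sketch (variable names are illustrative):

```python
a, b, c = 28, 72, 23  # hits, false alarms, misses

pod = a / (a + c)
success_ratio = 1 - b / (a + b)  # 1 - FAR

# CSI reconstructed from POD and FAR via the identity...
csi_from_pod_far = 1 / (1 / pod + 1 / success_ratio - 1)
# ...agrees with the direct definition a/(a+b+c)
csi_direct = a / (a + b + c)
```

The identity follows algebraically: 1/POD + 1/(1-FAR) - 1 = (a+c)/a + (a+b)/a - a/a = (a+b+c)/a.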

HMT Performance Diagram
[Performance diagram: POD plotted against Success Ratio (1 - FAR), with isolines of CSI and of frequency bias on the same plot; the top-right corner is best. Dots: scores aggregated over lead time (06-72 h). Colors: 6 h precipitation thresholds (> 0.1, > 0.5, > 1.0, > 2.0 in) for the 9 km ensemble mean.]
Results: decreasing skill with higher thresholds across multiple metrics; highest skill at 18-24 h lead times.
Roberts et al. (2011), Roebber (WAF, 2009), Wilson (presentation, 2008)

Skill Scores. How do you compare the skill of easy-to-predict events with that of difficult-to-predict events? A skill score provides a single value summarizing performance relative to a reference forecast: the best naive guess, persistence, or climatology. The reference forecast must be comparable. A perfect forecast implies that the object can be perfectly observed.

Generic Skill Score
SS = (A - A_ref) / (A_perf - A_ref)
where A = any measure, A_ref = the measure for the reference forecast, and A_perf = the measure for a perfect forecast.
Example: MSESS = 1 - MSE/MSE_climo, where MSE = Mean Square Error.
Interpreted as fractional improvement over the reference forecast. The reference could be climatology, persistence, your baseline forecast, etc. Climatology could be a separate forecast or a gridded forecast sample climatology. SS is typically positively oriented with 1 as optimal.
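The generic formula collapses to the MSESS form when the perfect score is zero. A small Python sketch (names and error values are illustrative):

```python
def skill_score(a_fcst, a_ref, a_perf):
    """Fractional improvement of the forecast measure over a reference."""
    return (a_fcst - a_ref) / (a_perf - a_ref)

# For MSE a perfect forecast scores 0, so SS reduces to 1 - MSE/MSE_climo
mse, mse_climo = 2.5, 4.0  # illustrative error values
msess = skill_score(mse, mse_climo, 0.0)
```

With these numbers the forecast beats climatology by 37.5%; a forecast no better than the reference scores 0, and a perfect one scores 1.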

Commonly Used Skill Scores
- Gilbert Skill Score: based on the CSI, corrected for the number of hits expected by chance.
- Heidke Skill Score: based on Accuracy, corrected for the number of hits expected by chance.
- Hanssen-Kuipers Discriminant (Peirce Skill Score): measures the ability of the forecast to discriminate between (or correctly classify) events and non-events. H-K = POD - POFD.
- Brier Skill Score: for probabilistic forecasts.
- Fractional Skill Score: for neighborhood methods.
- Intensity-Scale Skill Score: for wavelet methods.
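The chance corrections are straightforward to write out. A Python sketch of the first three scores using the standard 2x2 formulas (the function name is illustrative):

```python
def skill_scores(a, b, c, d):
    """Gilbert (GSS/ETS), Heidke (HSS), and Hanssen-Kuipers (H-K)
    skill scores from 2x2 contingency counts."""
    n = a + b + c + d
    a_chance = (a + b) * (a + c) / n  # hits expected by chance
    gss = (a - a_chance) / (a + b + c - a_chance)
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    hk = a / (a + c) - b / (b + d)    # POD - POFD
    return gss, hss, hk

gss, hss, hk = skill_scores(28, 72, 23, 2680)  # Finley counts
```

On Finley's data these come out near 0.22, 0.36, and 0.52 respectively, far less flattering than the 96.6% accuracy.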

Example

Thank you!
References:
- Jolliffe and Stephenson (2012): Forecast Verification: A Practitioner's Guide, Wiley & Sons, 240 pp.
- Wilks (2011): Statistical Methods in the Atmospheric Sciences, Academic Press, 467 pp.
- Stanski, Burrows, Wilson (1989): Survey of Common Verification Methods in Meteorology. http://www.eumetcal.org.uk/eumetcal/verification/www/english/courses/msgcrs/index.htm
- WMO verification working group forecast verification web page: http://www.cawcr.gov.au/projects/verification/