Spatial Forecast Verification Methods

Spatial Forecast Verification Methods. Barbara Brown, Joint Numerical Testbed Program, Research Applications Laboratory, NCAR, 22 October 2014. Acknowledgements: Tara Jensen, Randy Bullock, Eric Gilleland, David Ahijevych, Beth Ebert, Marion Mittermaier, Mike Baldwin

Goal: Describe new methods for evaluation of spatial fields. Many of these methods have been developed in the context of high-resolution precipitation forecasts, and they may have application to aerosol predictions.

Context. Motivation: Traditional verification approaches don't (1) reflect the observed capabilities of new modeling systems (generally with higher spatial resolution), or (2) provide diagnostic information about model performance (i.e., what was wrong). History: Most new spatial methods were developed over the last 10-15 years; initial methods were published in the early 1990s, the earliest commonly used method was published in 2000 (Ebert and McBride), and they are beginning to be used operationally. Initial development target: mesoscale precipitation. Other applications: clouds and reflectivity; jets and low-pressure systems (Mittermaier); convective storm characteristics (Clark); wind/RH; climate/climatology. Limitations: these methods typically require gridded forecasts and observations.

Spatial fields: Weather variables defined over spatial domains have coherent spatial structure and features. [Figure: WRF model forecast and Stage II radar analysis]

Spatial fields: Weather variables defined over spatial domains have coherent spatial structure and features. [Figure: AOD, dust, sulfate, and smoke fields]

Matching two fields (forecasts and observations). Focus: gridded fields. Traditional grid-to-grid approach: overlay the forecast and observed grids and match each forecast gridpoint with the corresponding observation gridpoint. [Figure: forecast grid overlaid on observed grid]

Traditional spatial verification measures. Contingency table:

                    Observed: yes       Observed: no
    Forecast: yes   hits                false alarms
    Forecast: no    misses              correct negatives

[Figure: overlap of forecast and observed areas showing hits, misses, and false alarms]

Basic methods: (1) create a contingency table by thresholding the forecast and observed values and compute the traditional contingency-table statistics: POD, FAR, frequency bias, CSI, GSS (= ETS); (2) directly compute errors in the predictions using measures for continuous variables: MSE, MAE, ME.
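
As a concrete illustration of the first basic method, here is a minimal sketch (mine, not code from the presentation or from MET) of computing these contingency-table scores for two gridded fields with NumPy; the function name, field names, and threshold are assumptions.

```python
# Illustrative sketch only: threshold two gridded fields and compute the
# traditional contingency-table statistics named above.
import numpy as np

def contingency_scores(fcst, obs, threshold):
    """Return POD, FAR, frequency bias, CSI, and GSS (ETS) for two 2-D fields."""
    f = fcst >= threshold
    o = obs >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    correct_negatives = np.sum(~f & ~o)
    n = hits + false_alarms + misses + correct_negatives

    pod = hits / (hits + misses)                          # probability of detection
    far = false_alarms / (hits + false_alarms)            # false alarm ratio
    freq_bias = (hits + false_alarms) / (hits + misses)   # frequency bias
    csi = hits / (hits + misses + false_alarms)           # critical success index
    hits_random = (hits + misses) * (hits + false_alarms) / n
    gss = (hits - hits_random) / (hits + misses + false_alarms - hits_random)  # GSS = ETS
    return {"POD": pod, "FAR": far, "FBIAS": freq_bias, "CSI": csi, "GSS": gss}
```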

[Figure: observed field compared with Forecast #1 (smooth) and Forecast #2 (detailed); from Baldwin 2002]

Measures-oriented approach to evaluating these forecasts (from Baldwin 2002):

    Verification measure     Forecast #1 (smooth)   Forecast #2 (detailed)
    Mean absolute error      0.157                  0.159
    RMS error                0.254                  0.309
    Bias                     0.98                   0.98
    CSI (>0.45)              0.214                  0.161
    GSS (>0.45)              0.170                  0.102

What are the issues with the traditional approaches? The double penalty problem. Scores may be insensitive to the size or the kind of the errors, small errors can lead to very poor scores, and forecasts are generally rewarded for being smooth. The verification measures don't provide information about the kinds of errors (placement? intensity? pattern?) or diagnostic information: What went wrong? What went right? Does the forecast look realistic? How can I improve this forecast? How can I use it to make a decision?

Double penalty problem: the traditional approach requires an exact match between forecasts and observations at every grid point to score a hit. Double penalty: (1) an event predicted where it did not occur counts as a false alarm, and (2) no event predicted where it did occur counts as a miss. [Schematic: a high-resolution forecast with a slightly displaced feature scores RMS ~ 4.7, POD = 0, FAR = 1, TS = 0, while a smooth low-resolution forecast scores RMS ~ 2.7, POD ~ 1, FAR ~ 0.7, TS ~ 0.3]
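
A tiny toy example of the double penalty (my own construction, not from the slides): a forecast that places the observed feature only a few gridpoints away scores zero hits, so POD = 0 and FAR = 1 despite being nearly correct.

```python
# Toy double-penalty example: a displaced feature produces both misses and
# false alarms at every affected gridpoint.
import numpy as np

obs = np.zeros((10, 10))
obs[4:6, 4:6] = 10.0                    # observed rain feature
fcst = np.roll(obs, shift=3, axis=1)    # same feature, displaced three gridpoints

f, o = fcst >= 1.0, obs >= 1.0
hits = np.sum(f & o)
misses = np.sum(~f & o)
false_alarms = np.sum(f & ~o)
print(hits, misses, false_alarms)       # 0 hits, 4 misses, 4 false alarms
```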

Summary: What are the issues with the traditional approaches? The double penalty problem; scores may be insensitive to the size or kind of the errors; small errors can lead to very poor scores; forecasts are generally rewarded for being smooth; and the verification measures don't provide information about the kinds of errors (placement? intensity? pattern?) or diagnostic information (What went wrong? What went right? Does the forecast look realistic? How can I improve this forecast? How can I use it to make a decision?).

Traditional approach: Consider gridded forecasts and observations of precipitation. Which is better? [Figure: an observed precipitation field (OBS) and five candidate forecasts, Examples 1-5]

Traditional approach. Scores for Examples 1-4: correlation coefficient = -0.02, probability of detection = 0.00, false alarm ratio = 1.00, Hanssen-Kuipers = -0.03, Gilbert Skill Score (ETS) = -0.01. Scores for Example 5: correlation coefficient = 0.2, probability of detection = 0.88, false alarm ratio = 0.89, Hanssen-Kuipers = 0.69, Gilbert Skill Score (ETS) = 0.08. According to these traditional scores, Forecast 5 is "best".

Spatial Method Categories: To address the issues described above, a variety of new methods have been developed.

New spatial verification approaches. Neighborhood: successive smoothing of forecasts/obs; gives credit to "close" forecasts. Object- and feature-based: evaluate attributes of identifiable features. Scale separation: measure scale-dependent error. Field deformation: measure distortion and displacement (phase error) for the whole field; how should the forecast be adjusted to make the best match with the observed field?

Scale separation methods. Goal: examine performance as a function of spatial scale. Example: power spectra. Does the forecast look real? Harris et al. (2001) compare multi-scale statistics for model and radar data. [Figure: from Harris et al. 2001]
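
A minimal sketch of a power-spectrum comparison in this spirit (my own code, with assumed 2-D NumPy array inputs; not the Harris et al. implementation): compute the radially averaged spectrum for both the forecast and the observed field and compare their scale-by-scale variability.

```python
# Illustrative sketch: radially averaged spatial power spectrum of a 2-D field,
# used to check whether forecast texture "looks real" compared with radar.
import numpy as np

def radial_power_spectrum(field):
    """Radially averaged spatial power spectrum of a 2-D array."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ny, nx = field.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(x - nx // 2, y - ny // 2).astype(int)   # integer radial wavenumber bins
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)                  # index = wavenumber
```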

Scale decomposition: wavelet component analysis (Briggs and Levine 1997; Casati et al. 2004). Examine how different scales contribute to the traditional scores.

Scale separation methods: the intensity-scale approach (Casati et al. 2004) uses a discrete wavelet decomposition to estimate performance (an MSE skill score) as a function of scale; multi-scale variability (Zepeda-Arce et al. 2000; Harris et al. 2001; Mittermaier 2006); variogram (Marzban and Sandgathe 2009).

Neighborhood verification. Goal: examine forecast performance in a region; don't require exact matches. Also called "fuzzy" verification. Example: upscaling, i.e., put the observations and/or forecast on a coarser grid and calculate the traditional metrics. These approaches provide information about the scales at which the forecasts have skill.
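
A minimal sketch of the upscaling idea (my own, with synthetic stand-in fields and an assumed helper name; not code from the presentation): block-average both fields to coarser grids and recompute a categorical score, here the CSI, at each scale.

```python
# Illustrative upscaling sketch: verify the same forecast at several grid spacings.
import numpy as np

def upscale(field, factor):
    """Block-average a 2-D field by an integer factor (dimensions must divide evenly)."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

def csi(fcst, obs, threshold):
    """Critical success index for thresholded 2-D fields."""
    f, o = fcst >= threshold, obs >= threshold
    hits = np.sum(f & o)
    return hits / (hits + np.sum(~f & o) + np.sum(f & ~o))

fcst = np.random.gamma(0.5, 2.0, size=(64, 64))   # synthetic stand-in forecast field
obs = np.random.gamma(0.5, 2.0, size=(64, 64))    # synthetic stand-in observed field
for factor in (1, 2, 4, 8):
    print(factor, csi(upscale(fcst, factor), upscale(obs, factor), threshold=1.0))
```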

Neighborhood methods, examples: distribution approach (Marsigli); Fractions Skill Score (Roberts 2005; Roberts and Lean 2008; Mittermaier and Roberts 2009); multiple approaches such as upscaling, the multi-event contingency table, and "practically perfect" (Ebert 2008, 2009); single threshold (Atger 2001). Ebert (2008, Meteorological Applications) provides a review and synthesis of these approaches.

Fractions skill score (Roberts 2005; Roberts and Lean 2008):

FSS = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(P_{fcst,i} - P_{obs,i}\right)^{2}}{\frac{1}{N}\left[\sum_{i=1}^{N}P_{fcst,i}^{2} + \sum_{i=1}^{N}P_{obs,i}^{2}\right]}

where P_{fcst,i} and P_{obs,i} are the forecast and observed fractional coverages of the event in neighborhood i. [Figure: example neighborhoods with observed fraction = 6/25 and forecast fraction = 6/25]
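
A minimal sketch of the FSS computation for a single neighborhood size, following the formula above (function and parameter names are my own; this is not the MET implementation):

```python
# Illustrative FSS sketch: neighborhood fractions via a moving-window mean,
# then the skill score relative to the worst-case (no-overlap) MSE.
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, neighborhood):
    """FSS of two 2-D fields for a square neighborhood of `neighborhood` gridpoints."""
    p_fcst = uniform_filter((fcst >= threshold).astype(float), size=neighborhood)
    p_obs = uniform_filter((obs >= threshold).astype(float), size=neighborhood)
    mse = np.mean((p_fcst - p_obs) ** 2)
    mse_ref = np.mean(p_fcst ** 2) + np.mean(p_obs ** 2)   # reference (worst-case) MSE
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```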

Field deformation. Goal: examine how much a forecast field needs to be transformed in order to match the observed field.

Field deformation methods, examples: Forecast Quality Index (Venugopal et al. 2005); Forecast Quality Measure / Displacement and Amplitude Score (Keil and Craig 2007, 2009); image warping (Gilleland et al. 2009; Lindström et al. 2010; Engel 2009). [Figure: from Keil and Craig 2008]

Object/feature-based methods. Goals: (1) identify relevant features in the forecast and observed fields; (2) compare attributes of the forecast and observed features. [Figure: MODE example]

Object/feature-based methods, examples: cluster analysis (Marzban and Sandgathe 2006a,b); composite (Nachamkin 2005, 2009); Contiguous Rain Area (CRA) (Ebert and Gallus 2009); Procrustes (Micheas et al. 2007; Lack et al. 2009); SAL (Wernli et al. 2008, 2009); MODE (Davis et al. 2006, 2009). The CRA method measures displacement and estimates the error due to displacement, pattern, and volume. [Figure: CRA example (Ebert and Gallus)]

MODE: Method for Object-based Diagnostic Evaluation (Davis et al., MWR, 2006, 2009). Object identification uses two parameters: (1) a convolution radius and (2) a threshold.
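
A simplified sketch of this two-parameter object definition (my own approximation using a square box filter rather than MODE's circular convolution; not the MET/MODE code):

```python
# Illustrative object identification: smooth with a convolution "radius",
# threshold the smoothed field, and label the connected objects.
import numpy as np
from scipy.ndimage import uniform_filter, label

def identify_objects(field, conv_radius, threshold):
    """Return a labelled object field and the number of objects found."""
    smoothed = uniform_filter(field.astype(float), size=2 * conv_radius + 1)  # smoothing step
    mask = smoothed >= threshold                                              # thresholding step
    objects, n_objects = label(mask)                                          # connected components
    return objects, n_objects
```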

MODE methodology: identification (smoothing and thresholding process); measure object attributes; merging and matching using a fuzzy-logic approach (compare forecast and observed attributes, merge single objects into composite objects, compute individual and total interest values, identify matched pairs); comparison; summarize (accumulate and examine comparisons across many cases).
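
The fuzzy-logic step combines per-attribute interest values into a total interest for each forecast/observed object pair. Here is a minimal sketch with invented attribute names and weights (MODE's actual attributes, interest functions, and weights are configurable and differ):

```python
# Illustrative fuzzy-logic combination: weighted average of attribute interests.
def total_interest(attribute_interests, weights):
    """attribute_interests and weights are dicts keyed by attribute name; interests in [0, 1]."""
    weighted_sum = sum(weights[a] * attribute_interests[a] for a in weights)
    return weighted_sum / sum(weights.values())

# Hypothetical object pair: matched if total interest exceeds some chosen value (e.g. ~0.7).
pair = {"centroid_distance": 0.8, "area_ratio": 0.9, "intensity_ratio": 0.6}
w = {"centroid_distance": 2.0, "area_ratio": 1.0, "intensity_ratio": 1.0}
print(total_interest(pair, w))  # 0.775
```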

Example: MODE object definitions as a function of spatial scale (convolution radius) and threshold. [Figure]

MODE example: Summer 2014 QPF experiment (STEP). [Figures]

Method intercomparison projects. First International Intercomparison Project (ICP, 2006-2011): participants applied methods to a variety of cases, including geometric cases, real precipitation cases from the US Midwest, and modified real cases (e.g., with known displacements); summarized in several publications in Weather and Forecasting and the Bulletin of the AMS (see the reference list at http://www.rap.ucar.edu/projects/icp/index.html). Second International Intercomparison Project, MesoVICT (Mesoscale Verification Intercomparison over Complex Terrain): focus on precipitation and wind in complex terrain (the Alps region), including ensemble forecasts and analyses; kick-off workshop Oct 2-3, 2014 in Vienna, Austria; web site: http://www.ral.ucar.edu/projects/icp/

Method strengths and limitations: filtering methods. Strengths: accounts for unpredictable scales and for uncertainty in the observations; simple and ready to go; evaluates different aspects of a forecast (e.g., texture); provides information about scale-dependent skill. Limitations: does not clearly isolate specific errors (e.g., displacement, amplitude, structure).

Method strengths and limitations: displacement methods. Strengths: feature-based methods give credit for a close forecast, measure displacement and structure, and provide diagnostic information; field-deformation methods give credit for a close forecast and can be combined with a field-comparison significance test. Limitations: may have somewhat arbitrary matching criteria; often many parameters to be defined; more research is needed on diagnosing structure.

What do the new methods measure?

Traditional: scale-specific errors = no; scales of useful skill = no; structure errors = no; location errors = no; intensity errors = yes; hits/misses/false alarms/correct negatives = yes.
Neighborhood (filter): scale-specific errors = yes; scales of useful skill = yes; structure errors = no; location errors = sensitive, but no direct information; intensity errors = yes; hits/misses/false alarms/correct negatives = yes.
Scale separation (filter): scale-specific errors = yes; scales of useful skill = yes; structure errors = no; location errors = sensitive, but no direct information; intensity errors = yes; hits/misses/false alarms/correct negatives = yes.
Field deformation (displacement): scale-specific errors = no; scales of useful skill = no; structure errors = no; location errors = yes; intensity errors = yes; hits/misses/false alarms/correct negatives = no.
Features-based (displacement): scale-specific errors = indirectly; scales of useful skill = no; structure errors = yes; location errors = yes; intensity errors = yes; hits/misses/false alarms/correct negatives = yes, based on features rather than gridpoints.

Back to the earlier example: what can the new methods tell us? Example: MODE. Interest measures the overall ability of the forecasts to match the observations, and interest values provide more intuitive estimates of performance than the traditional measure (ETS). Warning: even for spatial methods, single measures don't tell the whole story. [Figure: the five example forecasts compared by MODE interest]

Application to other fields. The methods have been commonly applied to precipitation and reflectivity. New applications include wind, cloud analyses, vertical cloud profiles, satellite estimates of precipitation, tropical cyclone structure, and ensembles. The time dimension can also be included (see the Fowler presentation).

CloudSat object-based comparison along the satellite track. [Figure: CPR reflectivity vs. RUC reflectivity]

Satellite precipitation estimates: TRMM vs. PERSIANN (Skok et al. 2010). [Figure: object counts for all, small, medium, and large objects]

Conclusion: New spatial methods provide great opportunities for more meaningful evaluation of spatial fields; they feed back into forecast or product development and measure aspects of importance to users. Each method is useful for particular types of situations and for answering particular types of questions, and the methods are useful for a wide variety of types of fields. For more information (and references), see http://www.rap.ucar.edu/projects/icp/index.html

Method availability: The neighborhood, intensity-scale, and MODE methods are available as part of the Model Evaluation Tools (MET), available at http://www.dtcenter.org/met/users/ and implemented and supported by the Developmental Testbed Center and staff at NCAR/RAL/JNTP. Software for other methods may be available on the ICP web page (http://www.ral.ucar.edu/projects/icp/index.html) or directly from the developer.