The Impact of Horizontal Resolution and Ensemble Size on Probabilistic Forecasts of Precipitation by the ECMWF EPS
S. L. Mullen, Univ. of Arizona
R. Buizza, ECMWF
University of Wisconsin Predictability Workshop, 14 March 2004
Why Precipitation?
-Directly impacts society, economy, public safety
-Most difficult sensible weather element to forecast accurately and skillfully
-Requires precise forecasts of circulation, moisture, stability and boundary forcing
-Inherently short predictability limits
-Model errors are VERY important
-Verifying observations and analyses are available, but compromised by LARGE uncertainty
Why Precipitation?
Precipitation is related to other critical weather elements:
1) Temperature, Dew Point
2) Cloud Cover, Ceiling
3) Icing
4) Turbulence
5) Microburst Winds, Wind Shear
6) Lightning
7) Visibility and Dust
Fundamental Operational Issue
-Operational centers must make choices on the allocation of CPU resources
-CPU allocation will always be an issue
-What are the tradeoffs between model resolution and ensemble size for equivalent CPU time?
Resolution and Ensemble Size Experiments for Precipitation
Two Sets of Experiments
1) Compare T159 and T255 truncations for
-30 summer 1999 forecasts (Aug 99)
-57 winter 1999/2000 forecasts (Dec 99-Feb 00)
2) Compare T159, T255 and T319 truncations for Day 5 forecasts with observed amounts > 50 mm
-8 forecasts from summer of 1998
-8 forecasts from winters of 1997/98 & 1998/99
Verifying Data
-24 h Precipitation Reports from NOAA River Forecast Centers (RFC)
-7,000 to 8,000 Stations over Conterminous U.S.
-Map Station Data to 1.25° x 1.25° Grid, Consistent with Resolution of T159 Model
-Verification on Uniform 1.25° Lat-Lon Grid
-Interpolated to 1.25° Lat-Lon Grid
-Also Verify at Rain Gauge Sites
-Verification Region: U.S. East of 105°W
Verification Measures
-24 Hour Thresholds: 1, 10, 20 & 50 mm
-Bias and RMSE
-Brier Skill Score (BSS)
-Ranked Probability Skill Score (RPSS)
-Verification of Rank-Outlier Ratio
-Relative Operating Characteristic (ROC)
-Cost-Loss Value Curves for Economic Model
-Subjective Evaluation of Synoptic Cases
Results shown today
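As a rough illustration of the Brier Skill Score listed above, the sketch below scores probability forecasts of a binary event (for example, 24 h precipitation at or above a threshold). The function names are hypothetical, and the reference forecast is assumed to be the sample climatological frequency; the study may have used a different climatology.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and binary outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """BSS relative to a constant forecast of the sample event frequency (an assumption)."""
    o_bar = sum(outcomes) / len(outcomes)
    bs_cli = o_bar * (1.0 - o_bar)  # Brier score of the climatological forecast
    if bs_cli == 0.0:
        raise ValueError("event never or always occurred; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_cli


if __name__ == "__main__":
    obs = [1, 0, 0, 1]
    print(brier_skill_score([0.9, 0.1, 0.2, 0.8], obs))
```

A BSS of 1 corresponds to perfect forecasts, 0 to no improvement over climatology, and negative values to forecasts worse than climatology.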
Brier Skill for 57 Winter Cases, 1 mm on 1.25° Model Grid
[Figure: Brier Skill vs. Fcst Day (1-10) for T159 and T255; Mullen and Buizza (2002)]
-T255 More Skillful than T159 to Day 9
-Loss of Skill by Day 8 for T159, Day 9 for T255
-1 Day Increase in Predictability at T255
-Differences are significant at 5% level wherever error bars are separated on all figures
Brier Skill for 57 Winter Cases, 1 mm at Rain Gauges
[Figure: Brier Skill vs. Fcst Day (1-10) for T159 and T255; Mullen and Buizza (2002)]
-Lower Skill at Gauges
-Differences are Stat. Significant to Day 8
-Loss of Skill by Day 5 for T159, Day 8 for T255
-3 Day Increase in Predictability at T255
Brier Skill for 57 Winter Cases, 10 mm on 1.25° Model Grid
[Figure: Brier Skill vs. Fcst Day (1-10) for T159 and T255; Mullen and Buizza (2002)]
-Both Resolutions are Skillful at Day 10
-Differences Not Significant at 5%
-Skill Lower for Higher Amounts (20 & 50 mm), Differences Not Significant at 5%
Rank Histograms for 30 Summer Days, All Rain Gauges
[Figure: rank histograms by Rank and Fcst Day, panels for T159 and T255; Mullen and Buizza (2002)]
-Wet Bias
-Under Dispersion: Strongest at Days 1-2, Decreases with Time
-T255 Resolution Has Significantly Fewer Outliers Than T159
Outlier Percentages, Winter-Summer, All Rain Gauges
[Figure: Outlier Percentage vs. Fcst Day (1-10), Winter and Summer panels, for T159M51, T255M15 and T255M51, with Expected Values = 2/(M+1); Mullen and Buizza (2002)]
-Comparable CPU Costs (51 versus 15 members)
-Percentages Above Expected Values for Perfect Ensemble
-Fewest Outliers at T255M51
-Fewer Outliers at T159M51 Than T255M15
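The expected value 2/(M+1) follows from the fact that, for a perfect M-member ensemble, the observation is equally likely to fall in any of the M+1 rank intervals, two of which lie outside the ensemble envelope. A minimal sketch of the outlier count, with hypothetical function names, is given here. It assumes a simple tie rule (the observation ranks below any member it ties with); the random tie-breaking often used for zero-precipitation cases is not shown.

```python
def rank_of_observation(members, obs):
    """Rank from 1 to M+1: one plus the number of members strictly below the observation."""
    return 1 + sum(1 for m in members if m < obs)


def outlier_fraction(ensembles, observations):
    """Fraction of cases where the observation falls in the lowest or highest rank."""
    count = 0
    for members, obs in zip(ensembles, observations):
        rank = rank_of_observation(members, obs)
        if rank == 1 or rank == len(members) + 1:
            count += 1
    return count / len(observations)


def expected_outlier_fraction(m):
    """Outlier fraction expected for a perfect ensemble of m members."""
    return 2.0 / (m + 1)


if __name__ == "__main__":
    for size in (15, 51):
        print(size, expected_outlier_fraction(size))
```

For the ensemble sizes on the slide, the expected fraction is 2/16 for 15 members and 2/52 for 51 members, so a smaller ensemble has more outliers even when it is perfectly calibrated.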
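Potential Economic Value (V)
V is defined as the improvement over the climatological mean expense (ME), with the maximum possible value defined by a hypothetical perfect forecast.
$$V = \frac{ME(\text{climate}) - ME(\text{forecast})}{ME(\text{climate}) - ME(\text{perfect})}$$
$$V = \frac{\min[\alpha, \bar{o}] - far\,\alpha\,(1-\bar{o}) + hr\,\bar{o}\,(1-\alpha) - \bar{o}}{\min[\alpha, \bar{o}] - \bar{o}\,\alpha}$$
where $\alpha$ is the cost/loss ratio, $\bar{o}$ is the climatological frequency of the event, $hr$ is the hit rate and $far$ is the false alarm rate.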
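The value formula can be evaluated directly once a hit rate and a false alarm rate are available. The sketch below is an illustration, not the authors' code: the function names are hypothetical, the false alarm rate is taken as false alarms divided by observed non-events (which is what the formula requires), and the user is assumed to act at whichever probability threshold gives the highest value for a given cost/loss ratio.

```python
def potential_value(hit_rate, false_alarm_rate, base_rate, cost_loss):
    """Potential economic value V for cost/loss ratio alpha and event frequency o."""
    a, o = cost_loss, base_rate
    if not (0.0 < a < 1.0 and 0.0 < o < 1.0):
        raise ValueError("cost_loss and base_rate must lie strictly between 0 and 1")
    numerator = min(a, o) - false_alarm_rate * a * (1.0 - o) + hit_rate * o * (1.0 - a) - o
    denominator = min(a, o) - o * a
    return numerator / denominator


def rates_at_threshold(probs, outcomes, p_min):
    """Hit rate and false alarm rate when action is taken whenever prob >= p_min."""
    hits = sum(1 for p, o in zip(probs, outcomes) if p >= p_min and o == 1)
    false_alarms = sum(1 for p, o in zip(probs, outcomes) if p >= p_min and o == 0)
    events = sum(outcomes)
    non_events = len(outcomes) - events
    return hits / events, false_alarms / non_events


def best_value(probs, outcomes, cost_loss):
    """Largest V over all probability thresholds present in the forecasts (assumed envelope)."""
    base_rate = sum(outcomes) / len(outcomes)
    best = float("-inf")
    for p_min in sorted(set(probs)):
        hr, far = rates_at_threshold(probs, outcomes, p_min)
        best = max(best, potential_value(hr, far, base_rate, cost_loss))
    return best


if __name__ == "__main__":
    print(best_value([0.9, 0.1, 0.3, 0.7], [1, 0, 0, 1], cost_loss=0.3))
```

A larger ensemble offers more distinct probability thresholds, which matters most for users with small cost/loss ratios who need to act on low probabilities.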
Day 7 Value for 57 Winter Cases, 20 mm Threshold, Gauges
[Figure: Value vs. Cost/Loss (0.001 to 1.000) for 51-T159, 51-T255 and 15-T255, annotated "Always protect" and "Never protect"; comparable CPU costs in same color; Mullen and Buizza (2002)]
-Comparable CPU Costs (51 versus 15 members)
-51-T255 Value Exceeds 51-T159 for All C/L: Resolution Benefit
-51-T159 Value Exceeds 15-T255 for All C/L: Ensemble Size Benefits
T159-T255-T319 Reruns
-Examine day 4 and day 5 forecasts with large RMSE and observed amounts > 50 mm
-8 summer cases: 4 mid-latitude cyclones, 4 land-falling tropical cyclones
-8 winter cases: all mid-latitude cyclones
-T159 and T255 combined to make a 102-member Mixed Resolution EPS vs. T319 51-member EPS
T159-T255-T319 Reruns
Combine T159 and T255 Ensembles to Create Mixed Resolution Ensembles
-Add T159M51+T255M51 EPS to Create Mixed Resolution EPS, 102 Members
-Add T159M26+T255M25 EPS to Create Mixed Resolution EPS, 51 Members
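One simple way to form a mixed resolution ensemble at a verification point is to pool the member forecasts and count exceedances. The sketch below is an assumption-laden illustration: the function names are hypothetical, members are weighted equally, the subsets are taken as the first n members of each ensemble (the slides do not say how the 26 and 25 members were chosen), and an event is defined as an amount at or above the threshold.

```python
def pool_members(low_res, high_res, n_low=None, n_high=None):
    """Concatenate the first n_low and n_high members; None keeps every member."""
    return list(low_res[:n_low]) + list(high_res[:n_high])


def exceedance_probability(members, threshold_mm):
    """Fraction of members with 24 h precipitation at or above the threshold."""
    return sum(1 for m in members if m >= threshold_mm) / len(members)


if __name__ == "__main__":
    # Made-up member amounts (mm) at a single point, for illustration only.
    t159 = [0.0, 3.0, 12.0, 55.0]
    t255 = [1.0, 8.0, 60.0, 72.0]
    mixed = pool_members(t159, t255)
    print(len(mixed), exceedance_probability(mixed, 50.0))
```

With 51 members in each input, `pool_members(t159, t255)` would yield the 102-member configuration and `pool_members(t159, t255, 26, 25)` the 51-member one.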
Day 5 Value for 16 Runs with > 50 mm, 1 mm Threshold, 1.25° Grid
[Figure: Value vs. Cost/Loss (0.001 to 1.000) for 51-T159, 51-T255, 51-T319, 15-T255, 15-T319 and 102-Mixed, annotated "Always protect" and "Never protect"; comparable CPU costs in same color; Mullen and Buizza (2002)]
-Little Separation by Ensemble Groups
-Benefit from Increased Ensemble Size or Finer Resolution at Light Thresholds is Minuscule
Day 5 Value for 16 Runs with > 50 mm, 50 mm Threshold, 1.25° Grid
[Figure: Value vs. Cost/Loss (0.001 to 1.000) for 51-T159, 51-T255, 51-T319, 15-T255, 15-T319 and 102-Mixed, annotated "Always protect" and "Never protect"; comparable CPU costs in same color; Mullen and Buizza (2002)]
-Many Users Benefit More from Larger Size
-102-Mixed Ensemble Beats 51-T319 Ensemble for C/L Ratios < 0.03 at Lower CPU Cost
-51-T159 Beats 15-T255, 51-T255 Beats 15-T319 for Same CPU Cost
Summary
Impact of Finer Resolution on PQPF?
-Improves Low Threshold Forecasts
-Reduces Outliers
-Improves Discrimination of Heavy Events
All Evidence Points to Benefits from Finer Resolution. However
Summary
Benefit of Ensemble Size on PQPF?
-Helps Resolve Small Forecast Probabilities Associated with Heavy Precipitation Events (i.e. Rare in Terms of Climatic Frequency)
-For Same CPU Cost, Larger Ensemble Sizes at Lower Resolution can Beat Smaller Ensemble Sizes at Higher Resolution for Heavy Precipitation Events
Conclusion
User Should Consider Examining the Tradeoff Between Ensemble Size and Model Resolution for All Weather Sensitivities of Interest
-Specific Weather Elements
-Spatial-Temporal Scales
-Cost-Loss Ratio
What's a Next Step?
Examine the Impact of Statistical Post-Processing on the Tradeoff Between Model Resolution and Ensemble Size
-Can smaller ensemble sizes, whether from the same op center or a poor man's ensemble, benefit users as much as current larger ensemble sizes after all competitors are post-processed?
Brier Skill Score (4 Warm Seasons, All Stations)
[Figure: Brier Skill Score vs. Fcst Day (1-10) for NET and EPS at 1 mm, 10 mm and 25 mm]
Neural Net Calibration
-Skill Increases for 1, 10 and 25 mm but not 50 mm
-Increases Largest at Beginning of Forecast
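Brier Score Decomposition, Murphy (1973)
$$BS = BS_{rel} - BS_{res} + BS_{unc}$$
where
$$BS_{rel} = \frac{1}{N}\sum_{i=1}^{I} N_i\,[f_i - \bar{o}_i]^2$$
$$BS_{res} = \frac{1}{N}\sum_{i=1}^{I} N_i\,[\bar{o}_i - \bar{o}]^2$$
$$BS_{unc} = \bar{o}\,[1 - \bar{o}]$$
Skill Score
$$BSS = \frac{BS_{cli} - BS}{BS_{cli}} = \frac{BS_{res} - BS_{rel}}{BS_{unc}}, \qquad BS_{cli} = BS_{unc}$$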
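The decomposition can be checked numerically. The sketch below, with hypothetical function names, groups cases by distinct forecast probability, which makes the identity BS = REL - RES + UNC exact; this suits ensemble probabilities that take values k/M. Binning continuous probabilities into wider categories would make the identity only approximate.

```python
from collections import defaultdict


def brier_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by distinct forecast value."""
    n = len(probs)
    o_bar = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    rel = 0.0
    res = 0.0
    for f_i, obs_i in groups.items():
        n_i = len(obs_i)
        o_i = sum(obs_i) / n_i  # observed frequency when f_i was forecast
        rel += n_i * (f_i - o_i) ** 2
        res += n_i * (o_i - o_bar) ** 2
    return rel / n, res / n, o_bar * (1.0 - o_bar)


def skill_from_decomposition(rel, res, unc):
    """BSS = (RES - REL) / UNC, with climatology as the reference forecast."""
    return (res - rel) / unc


if __name__ == "__main__":
    probs = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
    obs = [0, 0, 0, 1, 1, 1]
    rel, res, unc = brier_decomposition(probs, obs)
    print(rel, res, unc, skill_from_decomposition(rel, res, unc))
```

Calibration mainly lowers the reliability term; the resolution term reflects how well the forecasts separate events from non-events.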
Attributes Diagram, Day 2 (4 Summers, All Stations), 10 mm
[Figure: Obs Prob vs. Fcst Prob for NET and EPS, with forecast frequencies on a logarithmic scale]
Typical Results
-Excellent Reliability Every Year-Season
-No High Probabilities (Pr < 85% at D+2)
-Forecasts not as Sharp
-Note Differences in Forecast Frequencies (On Logarithmic Scale)
Summer Brier Decomposition (4 Summers, All Stations), 10 mm
[Figure: REL and RES vs. Fcst Day (1-5) for NET and EPS; UNC = 0.088]
Typical Results
-REL (Reliability) Increases Skill
-RES (Resolution) Slight Increase @ D+1
-Calibration does not improve ability to discriminate events
Other Considerations?
The Impact of Analysis Uncertainty on Verification Scores
-What are the differences among NCEP precipitation analyses?
-How does the use of different analyses affect estimates of forecast quality?
Other Considerations?
Thorough Verification Should Include an Estimate of Observational or Analysis Uncertainty
Inclusion can lead to markedly different
-values for accuracy measures
-conclusions
Rainfall is marked by LARGE uncertainty
QPE differences can be comparable to spread at 24-48 h for QPF in localized regions!
RFC Analyses on Same Grid
[Figure: panels GAGE (14.3 km grid) and Blend (12.2 km grid)]
Note large differences over CA, TX, FL
Uncertainty of Precipitation Analyses
NCEP Precipitation Analyses

Analysis  Resolution     Data source  QC   Interval  Time (UTC)   Mask  Gauge
RFC8      1/8th (14 km)  Radar+Gauge  Yes  24 h      1200         Yes   7K-8K
RFC4      4 km           Gauge only   No   24 h      1200         Yes   7K-8K
Stage4    4 km           Radar+Gauge  Yes  6 h/24 h  0000 & 1200  No    3000

QC: Quality Control done at RFCs
[Figure: Accumulated 24-h precipitation for 1200 UTC 8 Nov-9 Nov, 2002; panels RFC8, RFC4, Stage4]
Yuan et al. (2004, in progress)
Different Verifying Analyses
4 km grid for CONUS region, NDJFM 2002-03
-Black, Blue: same QPE on different grids (14 vs. 4 km)
-Red, Green: different QPE on same ~4 km grid
Yuan et al. (2004, in progress)
Open Issues for Ensemble QPF
Heavy Rain Events Problematical
-Predictability of convection, mesoscale forcing
-Rarity of events, small samples of observed events
-Uncertainty of verifying analysis
-More Cases Required
No Substitute for Large Sample