Ensemble Consistency Testing for CESM: A new form of Quality Assurance


Dorit Hammerling
Institute for Mathematics Applied to Geosciences
National Center for Atmospheric Research (NCAR)

Joint work with Allison Baker (Application Scalability and Performance Group, NCAR) and Daniel Milroy (Department of Computer Science, CU Boulder), and many other contributors.

Hammerling et al. (NCAR), Ensemble Consistency Testing, April 5, 2016

Outline
1. Motivation
2. Ensemble Consistency Testing
3. Recent developments
4. ECT for the ocean component
5. Ultra-fast version
6. Summary

Motivation

NCAR's Community Earth System Model
- a virtual laboratory to study past, present and future climate states
- describes the interactions of the atmosphere, land, river runoff, land ice, oceans and sea ice: complex!
- large code base: approx. 1.5 million lines of code

Need for Software Quality Assurance
To ensure that changes during the CESM development life cycle do not adversely affect the results:
- code ported to a new environment (different institution)
- code modifications
- new machine architectures
- compiler changes
- exascale-computing technologies
- ...

Bit-for-Bit reproducibility
CESM is deterministic, and results are bit-for-bit reproducible IF the exact same code is run with the same parameter settings, same initial conditions, on the same hardware architecture, using the same compiler, same implementation of MPI, same random number generator, ...
This is not the case in most applications.
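A tiny Python example (not from the talk) shows why bit-for-bit reproducibility is so fragile: floating-point addition is not associative, so anything that reorders operations, such as a new compiler, a different optimization level, or a changed MPI decomposition, can change the bits of the result.

```python
# Floating-point addition is not associative: regrouping the same four
# numbers changes the result. Any toolchain or code change that reorders
# operations can therefore break bit-for-bit reproducibility.
vals = [1.0, 1e-16, 1e-16, -1.0]

left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # tiny terms absorbed by 1.0
regrouped = (vals[0] + vals[3]) + (vals[1] + vals[2])      # tiny terms survive

print(left_to_right)   # 0.0
print(regrouped)       # 2e-16
```

Both groupings are "the same sum" mathematically, yet the binary results differ, which is exactly why a statistical notion of consistency is needed.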

How to assess the difference if not bit-for-bit?
- X: original accepted data
- X_new: new data
Key question: if X ≠ X_new, is the code/implementation still correct? Does the new data still represent the same climate, or is it climate-changing?

Evaluating the difference
How to assess whether the difference between X and X_new is climate-changing?
Main problem: there is no clear definition of "climate-changing."
What was done in the past: climate scientists compared 400-year simulations, which is
- computationally intensive
- time-consuming
- subjective
- ...

Ensemble Consistency Testing

New approach: Ensemble Consistency Testing
- Leverage the climate system's natural variability (or rather its chaotic nature)!
- Evaluate new data in the context of an ensemble of CESM runs.
- Use an accepted machine and accepted software stack.
- One-year CESM simulations with O(10^-14) perturbations in the initial atmospheric temperature.
- 1-degree atmosphere model (F-case): 134 variables.
- This creates an accepted ensemble distribution that can be used to evaluate new runs.
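The ensemble-creation step can be sketched in a few lines. The function name and the toy field below are hypothetical, but the perturbation magnitude follows the O(10^-14) initial-temperature perturbations used in CESM-ECT:

```python
import numpy as np

def make_perturbed_ensemble(t_init, n_members, magnitude=1e-14, seed=0):
    """Return n_members copies of an initial temperature field, each with a
    tiny random relative perturbation of order `magnitude`, mimicking the
    CESM-ECT setup where members differ only in O(1e-14) initial atmospheric
    temperature. Illustrative sketch, not CESM tooling."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        noise = rng.uniform(-magnitude, magnitude, size=t_init.shape)
        members.append(t_init * (1.0 + noise))
    return members

# toy 4x4 "temperature field" at 280 K
t0 = 280.0 + np.zeros((4, 4))
ens = make_perturbed_ensemble(t0, n_members=3)
```

Each member is indistinguishable from the original at any physical level of precision, yet chaotic growth makes the one-year simulations spread into a usable distribution.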

Ensemble features
Variable dependencies: many of the variables are highly correlated (> 0.9). Because of these dependencies, it is difficult to base pass/fail choices on counts of individual variables.
Principal Component Analysis (PCA):
- projection into an orthogonal space
- linear combinations of variables ("scores"), rather than individual variables, are used as the ensemble distribution and evaluated
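A compact sketch of the standardize-and-project step (plain NumPy SVD; a sketch of the PCA idea, not the PyCECT implementation):

```python
import numpy as np

def pca_scores(ensemble, new_run):
    """Standardize variables with the ensemble mean/std, derive PCA loadings
    from the ensemble, and project both the ensemble and a new run into
    score space. `ensemble` is (n_members, n_vars); `new_run` is (n_vars,)."""
    mu = ensemble.mean(axis=0)
    sigma = ensemble.std(axis=0, ddof=1)
    z = (ensemble - mu) / sigma                # standardized ensemble
    # loadings: right singular vectors of the standardized data
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt.T
    ens_scores = z @ loadings                  # ensemble score distribution
    new_scores = ((new_run - mu) / sigma) @ loadings
    return ens_scores, new_scores
```

Because the scores are projections onto orthogonal directions, the strong cross-variable correlations no longer distort the counting of "unusual" quantities.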

PCA-based Ensemble Consistency Testing setup
Step 1: Create the ensemble.
Step 2: Standardize the variables and determine the transformation matrix ("loadings").
Step 3: Determine the distribution of scores.
Step 4: Create new runs, apply the transformation, and determine the new scores.
Step 5: Compare the new scores to the ensemble scores and determine pass or fail.
Main idea: compare scores from new runs to scores from the ensemble.
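The five steps end in a pass/fail decision. A deliberately simplified decision rule, for illustration only (the released PyCECT rule is more refined), might read:

```python
import numpy as np

def ect_test(ens_scores, new_scores, n_pcs=10, max_fails=2):
    """Toy ECT decision rule: a new run FAILs if more than `max_fails` of its
    first `n_pcs` scores fall outside the range spanned by the accepted
    ensemble's scores. Thresholds here are illustrative, not PyCECT's."""
    lo = ens_scores[:, :n_pcs].min(axis=0)
    hi = ens_scores[:, :n_pcs].max(axis=0)
    outside = (new_scores[:n_pcs] < lo) | (new_scores[:n_pcs] > hi)
    return "PASS" if outside.sum() <= max_fails else "FAIL"
```

Allowing a small number of out-of-range scores is what keeps the false-positive rate controlled: even a perfectly consistent run will occasionally produce an extreme score in some component.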

Manuscript and code available: https://github.com/ncar-cisl-asap/pycect/releases

Schematic of the Ensemble Testing framework [figure]

Highlights from the manuscript
- Defined the testing framework to obtain a 0.5% false-positive rate (conservative!).
- Tested purposefully constructed cases that should fail.
- Tested other machines: most passed, but some failed; mira failed persistently. This turned out to be an issue with FMA (fused multiply-add), but it led us to scrutinize the approach and the ensemble composition.
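The quoted 0.5% false-positive rate can be sanity-checked empirically: every held-out ensemble member is consistent by construction, so any FAIL it receives is a false positive. A hedged leave-one-out sketch (function names hypothetical, not part of PyCECT):

```python
import numpy as np

def empirical_false_positive_rate(ens_scores, decision_fn):
    """Leave-one-out estimate of a consistency test's false-positive rate.
    `decision_fn(rest, member)` returns "PASS" or "FAIL"; each ensemble
    member is tested against the remaining members. Sketch only."""
    n = ens_scores.shape[0]
    fails = sum(
        decision_fn(np.delete(ens_scores, i, axis=0), ens_scores[i]) == "FAIL"
        for i in range(n)
    )
    return fails / n
```

A rule tuned so that this estimate stays at or below the target rate is "conservative" in the sense the slide uses: legitimate runs are rarely flagged.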

Recent developments

How well does CAM-ECT truly work?
Modifications expected to be climate-changing do fail, but is what *should* pass actually passing at our specified false-positive rate?
The properties of the test rely heavily on the composition of the accepted ensemble.

Revisiting the ensemble composition
- Is using only initial-condition perturbations the right approach?
- How well do initial-condition perturbations capture legitimate differences?
- How do runs from different compilers compare to initial-perturbation runs?
- How large an ensemble do we need to capture legitimate changes?
We created very large ensembles using a combination of initial conditions and different compilers.

Two-way ANOVA [figure]

Two-way ANOVA... in more detail [figure]
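The ANOVA slides are figures in the original deck. As a minimal illustration of the underlying computation, a two-way variance decomposition across, say, a compiler factor and an initial-condition factor (one run per cell; the data here are entirely illustrative) could look like:

```python
import numpy as np

def two_way_ss(y):
    """Sums of squares for a two-way layout y[i, j] with one observation per
    cell (e.g. rows = compilers, columns = initial-condition perturbations).
    Returns (SS for factor A, SS for factor B, residual SS)."""
    grand = y.mean()
    ss_a = y.shape[1] * ((y.mean(axis=1) - grand) ** 2).sum()  # row factor
    ss_b = y.shape[0] * ((y.mean(axis=0) - grand) ** 2).sum()  # column factor
    ss_tot = ((y - grand) ** 2).sum()
    ss_resid = ss_tot - ss_a - ss_b
    return ss_a, ss_b, ss_resid
```

Comparing the factor sums of squares shows whether compiler choice contributes variability comparable to the initial-condition perturbations, which is the question the ensemble-composition study asks.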

How large of an ensemble? [figure]

Purposeful-code-changes experiments
Modification to the preq_omega_ps subroutine of prim_si_mod.f90.

Original code:
  omega_p(i,j,1) = vgrad_p(i,j,1)/p(i,j,1)
  omega_p(i,j,1) = omega_p(i,j,1) - 0.5d0/p(i,j,1)*divdp(i,j,1)

Modified code:
  omega_p(i,j,1) = (vgrad_p(i,j,1) - 0.5d0*divdp(i,j,1))/p(i,j,1)
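The original and modified lines are algebraically equivalent but not guaranteed to be bit-for-bit identical, which is precisely the kind of change ECT has to judge. A quick double-precision check in Python, mirroring the Fortran expressions with illustrative inputs:

```python
# Two algebraically equivalent formulations of the omega_p update from the
# slide. The different operation order may round differently in the last
# bits, while the values remain extremely close.
def original(vgrad_p, p, divdp):
    omega_p = vgrad_p / p
    return omega_p - 0.5 / p * divdp

def modified(vgrad_p, p, divdp):
    return (vgrad_p - 0.5 * divdp) / p

x = original(0.3, 0.7, 0.9)
y = modified(0.3, 0.7, 0.9)
print(abs(x - y))  # rounding-level difference, possibly zero
```

ECT should (and, per the next slide, does) treat such reformulations as consistent, while still flagging changes that alter the simulated climate.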

Purposeful code changes... indeed have purpose! [figure]

ECT for the ocean component

POP-ECT: a version for the ocean component of CESM [figure]

Ensemble spread in the ocean model component [figure]

POP-ECT is conceptually very different
- Only a few variables (compared to over 100 for CAM).
- Variability is spatially heterogeneous; too much is lost by using a global mean (no sensitivity!).
- We evaluate each location as a function of the spatially varying, point-wise ensemble variability and look at the combined exceedances above a threshold.
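The point-wise idea can be sketched directly: compare a new run to the ensemble at every grid point, in units of the local ensemble spread, and count threshold exceedances. Names and the threshold value are illustrative, not the released POP-ECT interface:

```python
import numpy as np

def pop_ect_exceedances(ens_field, new_field, z_thresh=2.0):
    """Count grid points where a new run deviates from the ensemble mean by
    more than `z_thresh` local (point-wise) standard deviations.
    `ens_field` is (n_members, ny, nx); `new_field` is (ny, nx).
    Sketch of the POP-ECT idea only."""
    mu = ens_field.mean(axis=0)
    sd = ens_field.std(axis=0, ddof=1)
    # guard against zero spread: such points can never "exceed"
    z = np.abs(new_field - mu) / np.where(sd > 0, sd, np.inf)
    return int((z > z_thresh).sum())
```

Because each point is judged against its own variability, quiet ocean regions stay sensitive instead of being drowned out in a global mean.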

Empirical evaluation of ensemble size for POP-ECT [figure]

Ultra-fast version

How fast does the ensemble spread? [figure]

Ultra-fast Ensemble Consistency Testing
- We use only the first 9 time steps (corresponding to about 5 hours of model time).
- Early results are very promising: we catch errors, and runs that should pass do pass.
- Early results also indicate that we might need a large ensemble size.
- Runs are very cheap, and we don't need to store the ensemble long term, just the (very small!) summary file.
- The idea is to use ultra-fast ECT in combination with yearly ECT, as a first (necessary, but not sufficient) test.
- Might be ideal for a quick check after code changes.

Summary

Summary
- Ensemble consistency testing is an objective and user-friendly way to do quality assurance.
- Estimating a high-dimensional covariance matrix well demands a large ensemble.
- Ultra-fast testing is very promising so far.
- Ocean testing demands a conceptually different approach than the atmosphere.
Future work:
- fine-grained testing: what is the source of an inconsistency?
- more thorough study of ensemble composition and size