EnKF Localization Techniques and Balance

Steven Greybush
Eugenia Kalnay, Kayo Ide, Takemasa Miyoshi, and Brian Hunt
Weather Chaos Meeting, September 21, 2009

Data Assimilation Equation
Scalar form: $x_a = x_b + w\,(y_o - h(x_b))$, with $w = \sigma_b^2 / (\sigma_b^2 + \sigma_o^2)$.
The analysis is equal to the background plus a weighted sum of observation increments.
Matrix form: $x_a = x_b + K\,(y_o - H(x_b))$, with $K = BH^T (HBH^T + R)^{-1}$.
$B$ = background error covariance matrix
$R$ = observation error covariance matrix
$HBH^T$ = background error covariance in observation space
$BH^T$ = background error covariance between model variables and observed variables
$X$ = ensemble perturbation matrix
$p$ = number of ensemble members
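
The scalar and matrix forms above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the identity observation operator in the scalar case, and all numerical values are assumptions chosen for the example.

```python
import numpy as np

def scalar_analysis(xb, yo, var_b, var_o):
    """Scalar update, assuming h(x) = x (identity observation operator)."""
    w = var_b / (var_b + var_o)          # weight given to the observation increment
    return xb + w * (yo - xb)

def matrix_analysis(xb, yo, B, R, H):
    """Matrix update for a linear observation operator H."""
    S = H @ B @ H.T + R                  # HBH^T + R
    K = B @ H.T @ np.linalg.inv(S)       # Kalman gain
    return xb + K @ (yo - H @ xb), K

# Equal background and observation error variances give w = 0.5.
xa_scalar = scalar_analysis(xb=10.0, yo=12.0, var_b=1.0, var_o=1.0)

# Two state variables, one observation of the first variable (illustrative values).
xb = np.array([10.0, 5.0])
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
yo = np.array([12.0])
xa, K = matrix_analysis(xb, yo, B, R, H)
```

The unobserved second variable is updated only through the off-diagonal element of B, which is the quantity that localization later modifies.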

Motivation for Localization
Distance-dependent assumption (reasoning empirically): covariances between nearby locations, which are typically large, are physically valid, whereas covariances between distant locations, which are typically small, are more noise than signal and thus spurious. (Hamill et al., 2001)

Covariance Localization
A modification of the covariance matrices in the Kalman gain formula that reduces the influence of distant regions. (Houtekamer and Mitchell, 2001)
Removes spurious long-distance correlations caused by sampling error in the model covariance estimated from a finite ensemble. (Anderson, 2007)
Takes advantage of the atmosphere's lower dimensionality in local regions. (Hunt et al., 2007)
Ultimately creates a more accurate analysis (reduces RMSE); it is a practical necessity.

The Notion of Balance
An atmospheric state that approximately follows the physical balance equations appropriate to its scale and location.
A forecast from a balanced state will not have spurious time oscillations.
Example: geostrophic balance between the wind field and the mass (temperature / height) field.

Geostrophic Adjustment
[Figure sequence: temperature (mass field) and wind magnitude (wind field) at T = 0, 2, 4, 6, and 8 hours.]

Gravity Waves

Coping with Imbalance
Richardson's failed forecast
Simplified models
Initialization step
Penalty methods

Balance vs. Accuracy
Observations are noisy, and hence unbalanced.
Therefore an analysis that fits the observations too closely (accurate) is not balanced.
Additionally, data assimilation techniques can introduce imbalance.

Covariance Localization
Accomplished by taking a Schur (element-wise) product between the model covariance matrix and a matrix whose elements depend on the distance between the corresponding grid points (Hamill et al., 2001):
$B_{\mathrm{loc},ij} = B_{ij}\,\exp\left(-(r_i - r_j)^2 / 2L^2\right)$
where $L$ is the localization distance.
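
As a rough sketch of the Schur product above, the following builds a sample covariance from a small random ensemble and tapers it with the Gaussian function of grid-point separation. The grid, the ensemble, the `p - 1` normalization, and the helper name `gaussian_localization_matrix` are assumptions made for illustration only.

```python
import numpy as np

def gaussian_localization_matrix(r, L):
    """rho[i, j] = exp(-(r_i - r_j)^2 / (2 L^2)) for grid-point positions r."""
    d = r[:, None] - r[None, :]
    return np.exp(-d**2 / (2.0 * L**2))

rng = np.random.default_rng(0)
n, p = 41, 5                                 # grid points, ensemble members
r = np.linspace(0.0, 4000.0, n)              # grid-point positions in km (assumed)

# Ensemble perturbations: random members with the ensemble mean removed.
X = rng.standard_normal((n, p))
X -= X.mean(axis=1, keepdims=True)
B = X @ X.T / (p - 1)                        # noisy sample covariance

rho = gaussian_localization_matrix(r, L=500.0)
B_loc = B * rho                              # Schur (element-wise) product
```

Variances on the diagonal are left unchanged, while covariances between the two ends of the domain, which here are pure sampling noise, are driven to nearly zero.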

Problems with Localization
Lorenc (2003) and Kepert (2006) argue that localization reduces the balance information encoded in the model covariance matrix.
Houtekamer and Mitchell (2005) noted balance problems when applying a localized EnKF to the Canadian GCM.
Imbalanced analyses project information onto inertial-gravity waves, which are filtered out (geostrophic adjustment, digital filtering, etc.), resulting in a loss of information and a suboptimal analysis.

Localization Methods
B Localization: model grid points that are far apart have zero error covariance.
$B_{\mathrm{loc},ij} = B_{ij}\,\exp\left(-(r_i - r_j)^2 / 2L^2\right)$
R Localization: observations that are far away from a grid point have infinite error covariance.
$R_{\mathrm{loc}} = R\,\exp\left(+d^2 / 2L^2\right)$, where $d$ is the distance between the observation and the grid point.
R localization can be used with LETKF. (Hunt 2005, Miyoshi 2005)
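
The R-localization formula can be illustrated for a single analysis grid point: each observation's error variance is inflated according to its distance from that point. This is only a sketch; the observation positions, the unit error variance, and the function name are hypothetical.

```python
import numpy as np

def localized_obs_error_variance(var_o, d, L):
    """Inflate observation error variance with distance d from the grid point."""
    return var_o * np.exp(d**2 / (2.0 * L**2))

obs_x = np.array([0.0, 250.0, 500.0, 1000.0])   # observation positions in km (assumed)
grid_x = 0.0                                    # grid point being analyzed
d = np.abs(obs_x - grid_x)

var_loc = localized_obs_error_variance(var_o=1.0, d=d, L=500.0)
```

An observation at the grid point keeps its original error variance, and the variance grows without bound as the distance increases, so distant observations receive vanishing weight in that grid point's analysis.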

Research Questions
How does localization introduce imbalance into an analysis? Can it be avoided?
How do the analyses produced by B-localization and R-localization EnKF compare in terms of accuracy (RMSE) and (geostrophic) balance?

Part I: Simple Model
The shallow water equations in a rotating, inviscid fluid: [equations shown on slide]
The geostrophic balance is thus: [equation shown on slide]
Consider variation only along the x-axis. The variables of interest are thus h and v.
Linearize the equations, and apply a harmonic form to the solution: [equation shown on slide]
Substituting into the governing equations, and assuming geostrophic balance, yields the following solutions for h and v: [equations shown on slide]
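
For orientation, the textbook form of this setup is sketched below. The slide's own equations are not reproduced here, so the notation is an assumption: an f-plane, perturbations $u$, $v$, $h$ about a state of rest with mean depth $\bar{h}$, variation in $x$ only, and a cosine height wave of amplitude $h_0$ and wavenumber $k$.

```latex
% Linearized shallow water equations on an f-plane, variation in x only
% (standard form, assumed; \bar{h} is the mean fluid depth)
\begin{align}
\frac{\partial u}{\partial t} - f v &= -g \frac{\partial h}{\partial x}, \\
\frac{\partial v}{\partial t} + f u &= 0, \\
\frac{\partial h}{\partial t} + \bar{h} \frac{\partial u}{\partial x} &= 0.
\end{align}

% Geostrophic balance: steady state with u = 0
\begin{equation}
f v = g \frac{\partial h}{\partial x}
\end{equation}

% A harmonic height field and the geostrophically balanced wind it implies
\begin{align}
h(x) &= h_0 \cos(kx), \\
v(x) &= \frac{g}{f} \frac{\partial h}{\partial x} = -\frac{g k h_0}{f} \sin(kx).
\end{align}
```

In this form, any departure of $v$ from $(g/f)\,\partial h / \partial x$ can serve as a pointwise measure of geostrophic imbalance.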

Experimental Design

Analysis Increments
Circles are observations. L = 500 km.

Analysis Error
Circles are observations. L = 500 km.
Here, RMS error: B localization ~ R localization < no localization.

Analysis Imbalance
Black circles are observation locations. L = 500 km.
Imbalance: B localization >> R localization > no localization.

Covariance Localization: Varying the Localization Distance L
[Figure: RMS error and RMS imbalance, each plotted against localization distance L (km).]
Wavelength (W) = 2000 km; distance between observations (D) = 250 km; number of ensemble members (p) = 5.
Results are taken as the mean over 100 random simulations.
LETKF is used for R-localization to avoid undesired statistical properties (an asymmetric B matrix). Results are very similar to EnKF, so the comparison is fair.

Simple Model Conclusions
Both types of localization introduce imbalance into analysis increments, especially for short localization distances.
R localization is more balanced than B localization for the same L, but is slightly less accurate.
The two methods have differing optimal localization length scales (L).

Why Does Localization Produce Imbalance?
Example: apply a Gaussian localization function to an h and v waveform based upon the distance from the origin.
[Figure: original (solid) and localized (dashed) waveforms; imbalance of the original and localized waveforms. L = 250 km.]
Analogy: assimilate a height observation at the origin. The waveforms are proportional to the analysis increments.
The example considers the modification of K, irrespective of whether B or R is modified.
Example adapted from Lorenc (2003).
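
The waveform example can be reproduced approximately as follows. This is a sketch under stated assumptions rather than the calculation behind the slide: the balanced pair is taken to be h = h0 cos(kx) with v = (g/f) dh/dx, imbalance is defined as the departure of v from (g/f) dh/dx, and the values of g, f, h0, and the grid are illustrative.

```python
import numpy as np

g, f = 9.81, 1.0e-4              # gravity (m/s^2) and Coriolis parameter (1/s), assumed
wavelength = 2000.0e3            # m
k = 2.0 * np.pi / wavelength
h0 = 1.0                         # height amplitude in m (illustrative)

x = np.linspace(-2000.0e3, 2000.0e3, 801)   # distance from the origin in m

def waveforms(x):
    """Geostrophically balanced height and wind waveforms."""
    h = h0 * np.cos(k * x)
    dhdx = -h0 * k * np.sin(k * x)
    v = (g / f) * dhdx
    return h, dhdx, v

def localized_imbalance(x, L):
    """Imbalance v_loc - (g/f) d(h_loc)/dx after Gaussian localization."""
    h, dhdx, v = waveforms(x)
    rho = np.exp(-x**2 / (2.0 * L**2))
    drho = -(x / L**2) * rho                 # analytic derivative of rho
    v_loc = rho * v
    dh_loc = drho * h + rho * dhdx           # product rule for d(rho * h)/dx
    return v_loc - (g / f) * dh_loc

h, dhdx, v = waveforms(x)
imb_orig = v - (g / f) * dhdx                # zero by construction
imb_250 = localized_imbalance(x, 250.0e3)
imb_1000 = localized_imbalance(x, 1000.0e3)
```

Multiplying h and v by the same taper leaves the wind proportional to the original height gradient, but the gradient of the tapered height picks up an extra term from the slope of the taper. That extra term is the imbalance, and it grows as L shrinks.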

Why do the optimum length scales differ?
Two grid points, with an observation at grid point 1. K for grid point 2:
B localization: $K_2 = f_{Bloc}(d_{12})\,B_{12}\,(B_{11} + R_1)^{-1}$
R localization: $K_2 = B_{12}\,(B_{11} + f_{Rloc}(d_{12})\,R_1)^{-1} = f_{Bloc}(d_{12})\,B_{12}\,(f_{Bloc}(d_{12})\,B_{11} + R_1)^{-1}$
$f_{Bloc}(d_{ij}) = \exp(-d_{ij}^2 / 2L^2)$, $f_{Rloc}(d_{ij}) = \exp(+d_{ij}^2 / 2L^2)$
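
A quick numerical check of the two-point gains may make the difference concrete. The covariance values, distance, and length scale below are made up for illustration, and the function names are hypothetical.

```python
import math

def f_bloc(d, L):
    """Gaussian taper exp(-d^2 / 2L^2)."""
    return math.exp(-d**2 / (2.0 * L**2))

def k2_b_localization(B11, B12, R1, d, L):
    """Gain at grid point 2 when B12 is tapered."""
    return f_bloc(d, L) * B12 / (B11 + R1)

def k2_r_localization(B11, B12, R1, d, L):
    """Gain at grid point 2 when R1 is inflated by exp(+d^2 / 2L^2)."""
    f_rloc = 1.0 / f_bloc(d, L)
    return B12 / (B11 + f_rloc * R1)

B11, B12, R1 = 1.0, 0.8, 1.0     # illustrative variances and covariance
d, L = 500.0, 500.0              # km

kb = k2_b_localization(B11, B12, R1, d, L)
kr = k2_r_localization(B11, B12, R1, d, L)

# Equivalent form of the R-localization gain from the second equality above.
fb = f_bloc(d, L)
kr_alt = fb * B12 / (fb * B11 + R1)
```

With these numbers kb is about 0.24 and kr is about 0.30. For the same L, R localization damps the gain less because the taper also shrinks the B11 term in the denominator, so it behaves like B localization with a somewhat longer length scale. This is consistent with the two methods having different optimal values of L.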

Measuring Balance in Full Model
The background can no longer be considered balanced.
Natural imbalance vs. imbalance induced by data assimilation.
Methods:
Magnitude of the ageostrophic wind
Second derivative of surface pressure
Difference between the original analysis and the initialized analysis (with digital filter)

SPEEDY Model
Simplified Parametrizations, primitive-equation DYnamics (SPEEDY)
Atmospheric Global Circulation Model
Seven vertical levels using the sigma coordinate system
Horizontal spectral resolution of T30, which corresponds to a standard 96x48 Gaussian grid
Leapfrog time step
Five dynamical variables are included in the output: zonal wind (u), meridional wind (v), temperature (T), specific humidity (q), and surface pressure (ps).

SPEEDY Evaluation Metrics
Compare EnSRF B-localization vs. LETKF R-localization for accuracy and balance at level 4 (~500 hPa) in the mid-latitudes of both hemispheres.
Observing system: rawinsonde distribution, located on grid points.
Experiment length: 2 months (Feb. and Mar.).

SPEEDY Results: Southern Hemisphere
Dark solid line = nature run; dotted line = free run.

SPEEDY Results: Northern Hemisphere
Dark solid line = nature run; dotted line = free run.

SPEEDY Results
Results are averaged between Feb 20 and Mar 20, which is after spin-up.

SPEEDY Conclusions
Error is greater in the Southern Hemisphere (fewer observations) than in the Northern Hemisphere.
Imbalance is greater in the Northern Hemisphere (presence of the Tibetan Plateau; seasonal dependence).
The optimal localization scale for LETKF R-localization is shorter (~300-500 km) than for EnKF B-localization (~500-750 km). This agrees with the earlier simple-model results.
Both localization methods introduce similar imbalance when each is run at its own optimal length scale.
Additional balance metrics must be evaluated.

Advanced Localization Methods
Adaptive Localization (does not require the distance-dependent assumption)
  Anderson: hierarchical filter
  Bishop and Hodyss: raising correlations to a power
  Miyoshi
Variable Transformations
  Kepert: transform variables to streamfunction and velocity potential
Variable Localization
  Kang et al.

B Localization vs. K Localization
B cannot be expressed explicitly for high-dimensional systems. Therefore, $BH^T$ and $HBH^T$ are determined directly from the ensemble, using $Y = HX$.
Kalman gain: $K = BH^T (HBH^T + R)^{-1}$
In K-localization, elements of K are localized based upon the distance between the observation and the model grid point. It is thus a hybrid of B-localization and R-localization, and is used in place of B-localization in practice for several variations of EnKF.
K localization reduces to B localization for EnSRF if observations are located on grid points.
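
One way to picture K-localization is sketched below: BH^T and HBH^T are formed from ensemble perturbations without building B, and each element of K is then tapered by the distance between its grid point and its observation. The `p - 1` normalization, the Gaussian taper, the grid, and the random ensemble are assumptions made for the example. Operational schemes differ in detail.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 21, 5                                   # grid points, ensemble members
grid_x = np.linspace(0.0, 2000.0, n)           # km (assumed grid)
obs_idx = np.array([0, 10, 20])                # observations located on grid points
obs_x = grid_x[obs_idx]
m = obs_idx.size

# Ensemble perturbations X and their observation-space counterparts Y = HX.
X = rng.standard_normal((n, p))
X -= X.mean(axis=1, keepdims=True)
H = np.zeros((m, n))
H[np.arange(m), obs_idx] = 1.0
Y = H @ X
R = np.eye(m)                                  # unit observation error variance

# Covariances estimated directly from the ensemble, without forming B.
BHt = X @ Y.T / (p - 1)                        # n x m
HBHt = Y @ Y.T / (p - 1)                       # m x m
K = BHt @ np.linalg.inv(HBHt + R)

# Taper each element of K by the grid-point-to-observation distance.
L = 500.0
d = np.abs(grid_x[:, None] - obs_x[None, :])   # n x m distances in km
rho = np.exp(-d**2 / (2.0 * L**2))
K_loc = rho * K
```

Only n x m and m x m matrices are ever formed, which is what makes this practical when the n x n matrix B is too large to store.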

Discussion Questions
How does one avoid / mitigate / cope with assimilation-induced imbalance?
Will adaptive localization schemes improve accuracy and balance? Are they practical?
Should an initialization step (digital filters) be used with assimilation?
Are there other ways of encouraging balance within the LETKF?