Revision of the ECMWF humidity analysis: Construction of a gaussian control variable

Similar documents
How does 4D-Var handle Nonlinearity and non-gaussianity?

Lecture notes on assimilation algorithms

Variational Data Assimilation Current Status

The ECMWF Hybrid 4D-Var and Ensemble of Data Assimilations

Introduction to Data Assimilation. Reima Eresmaa Finnish Meteorological Institute

GSI Tutorial Background and Observation Error Estimation and Tuning. 8/6/2013 Wan-Shu Wu 1

Autonomous Navigation for Flying Robots

Fundamentals of Data Assimila1on

GSI Tutorial Background and Observation Errors: Estimation and Tuning. Daryl Kleist NCEP/EMC June 2011 GSI Tutorial

Numerical Weather prediction at the European Centre for Medium-Range Weather Forecasts (2)

Assimilation Techniques (4): 4dVar April 2001

In the derivation of Optimal Interpolation, we found the optimal weight matrix W that minimizes the total analysis error variance.

Background Error Covariance Modelling

Ensemble forecasting and flow-dependent estimates of initial uncertainty. Martin Leutbecher

ACCOUNTING FOR THE SITUATION-DEPENDENCE OF THE AMV OBSERVATION ERROR IN THE ECMWF SYSTEM

Optimal Interpolation ( 5.4) We now generalize the least squares method to obtain the OI equations for vectors of observations and background fields.

Representation of inhomogeneous, non-separable covariances by sparse wavelet-transformed matrices

4. DATA ASSIMILATION FUNDAMENTALS

Diagnostics of linear and incremental approximations in 4D-Var

Parallel Algorithms for Four-Dimensional Variational Data Assimilation

Linear Regression (continued)

(Extended) Kalman Filter

APPENDIX 1 BASIC STATISTICS. Summarizing Data

Filtering of variances and correlations by local spatial averaging. Loïk Berre Météo-France

Evolution of Forecast Error Covariances in 4D-Var and ETKF methods

The Instability of Correlations: Measurement and the Implications for Market Risk

Variational ensemble DA at Météo-France Cliquez pour modifier le style des sous-titres du masque

Use of analysis ensembles in estimating flow-dependent background error variance

Weak Constraints 4D-Var

Ensemble Data Assimilation and Uncertainty Quantification

Direct assimilation of all-sky microwave radiances at ECMWF

The Australian Operational Daily Rain Gauge Analysis

Observation impact on data assimilation with dynamic background error formulation

The Choice of Variable for Atmospheric Moisture Analysis DEE AND DA SILVA. Liz Satterfield AOSC 615

Stochastic Processes for Physicists

Accounting for non-gaussian observation error

Computer Vision & Digital Image Processing

Assimilation of IASI reconstructed radiances from Principal Components in AROME model

Fundamentals of Data Assimila1on

Comparison of ensemble and NMC type of background error statistics for the ALADIN/HU model

Transformation of Probability Densities

Model error and parameter estimation

REPORT ABOUT ENVISAT SCIAMACHY NRT OZONE PRODUCT (SCI RV 2P) FOR JULY August 11, 2005

DART_LAB Tutorial Section 2: How should observations impact an unobserved state variable? Multivariate assimilation.

Gaussian Filtering Strategies for Nonlinear Systems

The University of Reading

Model-error estimation in 4D-Var

Introduction to Mobile Robotics Compact Course on Linear Algebra. Wolfram Burgard, Bastian Steder

Numerical Weather prediction at the European Centre for Medium-Range Weather Forecasts

Assimilation of MIPAS limb radiances at ECMWF using 1d and 2d radiative transfer models

All-sky observations: errors, biases, and Gaussianity

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Multivariate Correlations: Applying a Dynamic Constraint and Variable Localization in an Ensemble Context

Histogram Processing

Lesson 1: Inverses of Functions Lesson 2: Graphs of Polynomial Functions Lesson 3: 3-Dimensional Space

Data Assimilation Research Testbed Tutorial

Inverse Theory. COST WaVaCS Winterschool Venice, February Stefan Buehler Luleå University of Technology Kiruna

Why is the field of statistics still an active one?

Variational data assimilation

Index. Geostatistics for Environmental Scientists, 2nd Edition R. Webster and M. A. Oliver 2007 John Wiley & Sons, Ltd. ISBN:

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University

Masahiro Kazumori, Takashi Kadowaki Numerical Prediction Division Japan Meteorological Agency

Data assimilation concepts and methods March 1999

1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det

Observation error specifications

Configuration of All-sky Microwave Radiance Assimilation in the NCEP's GFS Data Assimilation System

Measurements made for web data, media (IP Radio and TV, BBC Iplayer: Port 80 TCP) and VoIP (Skype: Port UDP) traffic.

All-sky observations: errors, biases, representativeness and gaussianity

Rosemary Munro*, Graeme Kelly, Michael Rohn* and Roger Saunders

Open Problems in Mixed Models

Statistical Methods in Particle Physics

Efficient implementation of inverse variance-covariance matrices in variational data assimilation systems

HANDBOOK OF APPLICABLE MATHEMATICS

Simple Examples. Let s look at a few simple examples of OI analysis.

Anisotropic spatial filter that is based on flow-dependent background error structures is implemented and tested.

EXPERIMENTAL ASSIMILATION OF SPACE-BORNE CLOUD RADAR AND LIDAR OBSERVATIONS AT ECMWF

How 4DVAR can benefit from or contribute to EnKF (a 4DVAR perspective)

Hypothesis testing:power, test statistic CMS:

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Towards a more physically based approach to Extreme Value Analysis in the climate system

Assimilation of cloud/precipitation data at regional scales

Applied Probability and Stochastic Processes

An Overly Simplified and Brief Review of Differential Equation Solution Methods. 1. Some Common Exact Solution Methods for Differential Equations

Multivariate Regression

Random processes and probability distributions. Phys 420/580 Lecture 20

Update on the KENDA project

Descriptive Statistics

8.1 Bifurcations of Equilibria

Learning Objectives for Stat 225

Estimation of state space models with skewed shocks

7. The Multivariate Normal Distribution

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Using Hankel structured low-rank approximation for sparse signal recovery

Introduction to Mobile Robotics Compact Course on Linear Algebra. Wolfram Burgard, Cyrill Stachniss, Kai Arras, Maren Bennewitz

Prospects for radar and lidar cloud assimilation

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion

IMU-Laser Scanner Localization: Observability Analysis

Computationally, diagonal matrices are the easiest to work with. With this idea in mind, we introduce similarity:

E. Andersson, M. Fisher, E. Hólm, L. Isaksen, G. Radnóti, and Y. Trémolet. Research Department

Random Variate Generation

Transcription:

Revision of the ECMWF humidity analysis: Construction of a gaussian control variable Elías Valur Hólm ECMWF, Shinfield Park, Reading GR2 9AX, United Kingdom 1 Introduction In the ECMWF analysis specific humidity q is used as a control variable for the humidity. The background error covariances for humidity are determined partly statistically and partly empirically. The correlations are determined statistically from forecast differences. However, due to the spatial inhomogeneity of the humidity field, the global statistical variances are given by a local empiric function of relative humidity and temperature. As a part of a revision of the ECMWF humidity analysis, we have looked for a new control variable for humidity with simpler errors characteristics than q. The main aim has been to find a control variable which has close to gaussian error statistics, which are the only error statistics that can be modelled accurately within the least square formulation of the analysis system. Another limitation posed by the analysis system is that at present only uncorrelated control variables can be modelled accurately. To obtain an uncorrelated control variable for humidity, the total variable is divided into an unbalanced/uncorrelated part and a balanced part, with the unbalanced part used as control variable and the balanced part being given by a function of the other variables. The aim is thus to obtain a gaussian control variable uncorrelated with other control variables. Here we will only consider a control variable for total humidity in order to demonstrate the methodology. Later, when a humidity balance has been defined, statistically and/or analytically, exactly the same methodology can be applied to the unbalanced part of the humidity background error. In the following sections we will look at the probability distributions of humidity forecast differences and use these distributions to obtain a more gaussian control variable. 2 Analysis of humidity forecast differences By a gaussian control variable we mean that the background error of the variable, sampled over all gridpoints, has a gaussian distribution. We do not have the background errors available. However, we do have sets of forecast differences which are statistically related to the background errors. Consider two forecasts of the truth x, x b 1 and xb 2, where x b i x b b x ε b i (1) with b b the bias and ε b the stochastic error. The difference between forecasts is x b 1 x b 2 ε b 1 ε b 2 (2) We know that ε b 1 and εb 2 are independent stochastic variables with the same pdf Pb ε b. Then for δx ε b 1 ε b 2, P b C ε b 1 ε b 2 P b C δx P b ε b 1 P b δx ε b 1 dε b 1 (3) So the forecast error differences are a convolution of the background errors with themselves. If P b C δx is gaussian with variance σ 2, it can be shown that P b ε b is also a gaussian and with variance σ 2 2. Therefore 1

it is of particular interest to find a gaussian description of the forecast differences, since this directly translates into a gaussian description of the background errors. To study the error distribution of different humidity variables we can create histograms of the forecast differences. The forecast differences are between two forecasts valid at the same time, where each forecast comes from a different assimilation. Each assimilation uses the same set of observations with different perturbations added to each observation (consistent with the observation errors). To begin with, it is enough to consider one set of forecast differences (about 140000 gridpoints per level) to get a good idea about the statistical behaviour of the errors. From studying various candidates for the control variable, two results emerge: Forecast differences for q, log q and RH at a given model level (ca. 850 hpa here) show exponential like, rather than gaussian error distribution (Fig. 1, left). The same was found to be the case for several other humidity variables. For the particular level shown here, the q distribution deviates most from a gaussian. The forecast differences for a limited geographical region and/or similar values of the background fields are easier to approximate with a gaussian. For the same 850 hpa model level difference we have now looked at the distribution of differences in a 2.5% interval centered around the median of the background values of each variable (Fig. 1, right). Here q and log q now still have an exponential distribution, whereas RH is closer to a gaussian. The form of these distributions varies from level to level and for different values of the background. There are instances where for example RH is definitely not gaussian (close to RH 0 and RH 1), and where log q appears almost gaussian (close to the surface). 0.01 q log q RH Exponential q log q RH Exponential 5 4 3 2 1 0 1 2 3 4 5 3 2 1 0 1 2 3 Figure 1: The pdf for a single forecast difference at ca. 850 hpa for q, logq and RH. The left panel shows all differences, and the right panel shows differences for similar values of the background field (a 2.5% interval centered around the median of the background values of each variable). For comparison, a gaussian and an exponential pdf are shown. The right panel is noisy due to limited number of differences in each interval, but the result remains similar when more fields are added to the statistics. 3 Finding a gaussian control variable The explanation for the results in the previous section is that the total distribution we see at each level is an integral over different conditions, each with its characteristic errors. The forecast difference δϕ of a given humidity variable ϕ q T p can be studied as a function of the background, for example as a function of ϕ 2

HÓLM, E. V.: REVISION OF THE ECMWF HUMIDITY ANALYSIS itself. The conditional error distribution P δϕ ϕ varies with ϕ, and the total distribution is P δϕ P δϕ ϕ dϕ (4) This function is generally not gaussian. Even if P δϕ ϕ were gaussian for all values of ϕ (with standard deviation σ ϕ and bias b ϕ ), P δϕ would more probably be exponential due to the variation of σ and b with ϕ. A special case which would give gaussian P δϕ is if σ and b were constants. This gives a clue how to construct a gaussian control variable: Find a variable ϕ whose forecast difference δϕ follows a gaussian conditional error distribution P δϕ Φ as a function of some variable Φ. This means that the background conditional error P b ε b Φ will also be gaussian, with variance σ 2 2. In practice a close to gaussian distribution will be sufficient. Determine the bias and standard deviation of the forecast differences as a function of Φ. From technical point of view, we would like the bias b Φ to be negligible. The reason is that if the minimum of the costfunction J b δx is not at δx 0, then we will have nonzero analysis increments even if there are no observations. Normalize forecast differences by the bias and standard deviation, δϕ δϕ b Φ σ Φ Note that if we have managed to find δϕ following this procedure, then the forecast differences will be uniformly gaussian for all levels taken together, with zero bias and a standard deviation σ 1. δϕ For the analysis this implies a change to a control variable according to Eq. 5, with Φ chosen so that the bias is negligible and with the forecast difference standard deviation σ Φ replaced by σ Φ 2. (5) 4 Control variable for humidity 4.1 Linear transformation of relative humidity After experimenting with several formulations of the control variable, it was found that relative humidity RH had reasonably homogeneous statistics, but as shown in Fig. 1 using δrh as a control variable gives exponential error distribution. Including normalization δrh δrh b RH b σ RH b gives close to gaussian distributions for median values of RH b, but the distribution is asymmetric for extreme values of RH b. This is expected since P δrh RH b is skewed towards negative values for large RH b and positive values for small RH b (see Fig. 2, left). An additional problem with this choice of control variable is the non-negligible bias, which does not fit into the formulation of the analysis (Fig. 2, right). 4.2 Symmetrizing transformation of relative humidity From the study of forecast differences we can see that there is a way to avoid bias and asymmetry. If we have two forecasts RH a and RH b, then P RH a RH b RH a and P RH a RH b RH b are antisymmetric. This is easily checked by plotting the corresponding graphs (not shown). This antisymmetry can be explained by rewriting P RH a RH b RH b as P RH b RH a RH b and noting that since RH a and RH b follow the same distribution, they can change place in the calculation of the statistics, so that P RH a RH b RH b P RH a RH b RH a. From this follows that if we stratify the statistics of δrh we get the symmetric distribution P 1 RH a RH b RH a in Fig. 3. The control variable is thus δrh RH a RH b according to the average of the forecasts 2 RH b P 1 δrh RH b 2δRH. The result is shown σ δrh RH b 1 δrh, where we have been able to eliminate the bias. 2 3

Lowest RH Median RH Highest RH 0.05 0.00 Standard deviation Bias 5 4 3 2 1 0 1 2 3 4 5 0.05 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 RH Figure 2: Forecast differences for the linear δrh RH b at 850 hpa. The left panel shows the pdf s for lowest, median and highest 2.5% values of RH b, and the right panel shows the standard deviation and bias as a function of RH b. Lowest RH+dRH/2 Median RH+dRH/2 Highest RH+dRH/2 0.05 0.00 Standard deviation Bias 5 4 3 2 1 0 1 2 3 4 5 0.05 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 RH + drh/2 Figure 3: Forecast differences for the symmetric δrh RH b 1 2 δrh at 850 hpa. The left panel shows the pdf s for lowest, median and highest 2.5% values of RH b 1 2δRH, and the right panel shows the standard deviation and bias as a function of RH b 1 2δRH. All bins can be reasonably modelled by a gaussian. Note that the extreme bins are particularly affected by model and analysis effects of supersaturation clipping and resetting humidity to positive values. 4.3 Direct gaussianization An alternative way to derive a gaussian control variable would be to define a nonlinear transformation f δϕ Φ directly, based on forecast error differences. As we have seen in previous sections, we need to make the transformation a function of the background conditions if we want to end up with gaussian distributions for all conditions. The way to achieve this is to base the gaussianization transformation on the conditional pdf P δϕ Φ instead of basing it on P δϕ. For a given Φ, we just need to find a transformation f δϕ Φ of the δϕ axis so that the probability that x δϕ equals the probability that ξ f δϕ Φ for a normal gaussian 4

distribution, Π δϕ Φ δϕ P x Φ dx f δϕ Φ 1 2π e ξ 2 2 dξ Π G f δϕ Φ (6) where Π are the cumulative pdf s. Inverting the gaussian cumulative distribution then gives f δϕ Φ Π 1 G Π δϕ Φ (7) This will always work, but we may end up with a bias term which does not fit into the analysis framework as has been explained above. However, we could apply this transform as a finishing step after the symmetrizing transform and in that way avoid the bias term. This approach makes the search for a gaussian control variable automatic, although it does not guarantee the best control variable. There may be another choice which is less nonlinear for example. For the normalized relative humidity control variable, we do not need to apply this final gaussianization, since the departures are already close to gaussian. But if there is no other way to find a reasonably control variable, this approach can always be applied. 4.4 Implementation of the nonlinear control variable The difficulty with the symmetric control variable is that a nonlinear variable transform is needed to go from the model to the control variable. This nonlinearity is unavoidable when converting from non-gaussian to gaussian control variables. The ECMWF analysis consists of a series of minimizations (inner loops) linearized around a nonlinear reference state (outer loop). In the inner loops we must use a control variable which is linearly related to the model variables, but there is no such restriction in the outer loops. So we can use the nonlinear symmetrizing transform when going between inner and outer loops and use a linearization around the last outer loop in the inner loops. The new control variable is implemented as a change of variable in the analysis, with the background error at each level approximated by a polynomial P RH b 1 2δRH. This makes the relative humidity background error a function of relative humidity and pressure, whereas the earlier formulation was a function of relative humidity and temperature. In additon, there is now a multivariate coupling between q, T, and p through the use of linearized relative humidity. The background error covariance matrices are now calculated for normalized relative humidity instead of specific humidity. In Fig. 4 we show a comparison of the forecast differences δq and δrh at ca. 850 hpa. These are the actual inputs for the calculation of the covariances. The transformed relative humidity shows a homogeneous difference field which will be much easier to characterize statistically than the specific humidity difference field. 5 Conclusions We have shown how a detailed study of the probability distributions of forecast differences can help in the design of a gaussian control variable. The main point is to find an axis along which the differences follow gaussian distributions, then determine the bias and standard deviation of the differences along the axis, and finally normalize the differences with the bias and standard deviation to obtain a normal gaussian distribution. The background errors are directly related to the forecast differences in that if the forecast differences are gaussian, so are the background errors. There are a few problems which can arise while finding a gaussian control variable. First, it may be difficult to find an appropriate axis, expecially for variables like relative humidity which are bounded from above and below. Asymmetric distributions are a common problem here. To solve this a symmetrizing transform has been 5

Figure 4: Forecast differences for specific humifity δq (left, isolines 0.005) and normalized relative humidity δrh (right, isolines 1) at a single level (ca. 850 hpa). The two fields show what goes into the statistical determination of covariance matrices for the two variables. Normalized relative humidity is more homogeneous. defined. Second, from a technical point of view, we would like to be able to neglect any bias term in the change of variable, and this is also automatically solved by the symmetrizing transform. Third, the transform from a non-gaussian to a gaussian variable introduces nonlinearities in the analysis. The solution to this is to apply the full nonlinear transform only when going between inner and outer loops of the analysis, and use a linearized transform in inner loops. The control variable we found for (total) humidity is a normalized relative humidity. At each model gridpoint relative humidity is divided by a polynomial approximation of the background error, which varies as a function of the background relative humidity in the inner loops, with an additional nonlinear dependency of the increment itself in the outer loops. Since relative humidity is not a model variable, this choice of control variable introduces a multivariate relation between the background errors of specific humidity, temperature, and pressure. We plan to extend the approach here to an unbalanced humidity control variable, as a part of work to further integrate the humidity with other analysis variables. Acknowledgements I owe many thanks to my colleagues Mike Fisher, Erik Andersson, and Lars Isaksen for discussions and suggestions during this work. 6