M.Sc. in Meteorology, UCD
Numerical Weather Prediction
Prof. Peter Lynch
Meteorology & Climate Centre
School of Mathematical Sciences
University College Dublin
Second Semester, 2005-2006.
Text for the Course
The lectures will be based closely on the text Atmospheric Modeling, Data Assimilation and Predictability by Eugenia Kalnay, published by Cambridge University Press (2002).
Data Assimilation (Kalnay, Ch. 5)
NWP is an initial/boundary value problem.
Given an estimate of the present state of the atmosphere (initial conditions), and appropriate surface and lateral boundary conditions, the model simulates or forecasts the evolution of the atmosphere.
The more accurate the estimate of the initial conditions, the better the quality of the forecasts.
Operational NWP centers produce initial conditions through a statistical combination of observations and short-range forecasts.
This approach is called data assimilation.
Insufficiency of Data Coverage
Modern primitive equation models have a number of degrees of freedom of the order of 10^7.
For a time window of ±3 hours, there are typically 10,000 to 100,000 observations of the atmosphere, two orders of magnitude fewer than the number of degrees of freedom of the model.
Moreover, they are distributed nonuniformly in space and time.
It is necessary to use additional information, called the background field, first guess or prior information.
A short-range forecast is used as the first guess in operational data assimilation systems.
Present-day operational systems typically use a 6-h cycle performed four times a day.
Typical 6-hour analysis cycle.
Suppose the background field is a model 6-h forecast: x_b.
To obtain the background or first guess of the observations, the model forecast is interpolated to the observation location.
If the observed quantities are not the same as the model variables, the model variables are converted to the observed variables y_o.
The first guess of the observations is denoted H(x_b), where H is called the observation operator.
The difference between the observations and the background, y_o - H(x_b), is called the observational increment or innovation.
The analysis x_a is obtained by adding the innovations to the background field with weights W that are determined based on the estimated statistical error covariances of the forecast and the observations:

    x_a = x_b + W[y_o - H(x_b)]

Different analysis schemes (SCM, OI, 3D-Var, and KF) are based on this equation, but differ in the approach taken to combine the background and the observations to produce the analysis.
Earlier methods such as the SCM used weights which were determined empirically. The weights were a function of the distance between the observation and the grid point, and the analysis was iterated several times.
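The analysis equation can be illustrated with a minimal numerical sketch. All values below (a three-point state, two point observations, and the weight matrix W) are invented for illustration; in practice W is derived from the error covariances, as discussed next.

```python
import numpy as np

# Hypothetical 3-point model state, e.g. temperature (K) at three grid points
x_b = np.array([280.0, 282.0, 284.0])   # background (6-h forecast)

# Two observations located at grid points 0 and 2, so H simply selects them
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
y_o = np.array([279.0, 285.0])

# Illustrative weight matrix (assumed here; normally computed from B and R)
W = np.array([[0.5, 0.0],
              [0.2, 0.2],
              [0.0, 0.5]])

innovation = y_o - H @ x_b              # observational increments
x_a = x_b + W @ innovation              # the analysis
```

Note how the middle grid point, which is not observed, is still corrected: the weight matrix spreads information from observed locations to unobserved ones.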
In Optimal Interpolation (OI), the matrix of weights W is determined from the minimization of the analysis errors at each grid point.
In the 3D-Var approach, one defines a cost function proportional to the square of the distance between the analysis and both the background and the observations. This cost function is minimized to obtain the analysis.
Lorenc (1986) showed that OI and the 3D-Var approach are equivalent if the cost function is defined as:

    J = (1/2) { [y_o - H(x)]^T R^{-1} [y_o - H(x)] + (x - x_b)^T B^{-1} (x - x_b) }

The cost function J measures:
The distance of a field x to the observations (first term).
The distance to the background x_b (second term).
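A minimal sketch of minimizing this cost function directly, in the spirit of 3D-Var, using an invented two-variable state with one observation (all numbers are assumptions for illustration). Steepest descent stands in for the sophisticated minimization algorithms used operationally.

```python
import numpy as np

# Toy 2-variable state, one observation of the first variable (assumed values)
x_b = np.array([1.0, 2.0])
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])              # background error covariance
H = np.array([[1.0, 0.0]])              # observe the first variable only
y_o = np.array([2.0])
R = np.array([[0.5]])                   # observation error covariance
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(x):
    d_o = y_o - H @ x                   # distance to the observations
    d_b = x - x_b                       # distance to the background
    return 0.5 * (d_o @ Rinv @ d_o + d_b @ Binv @ d_b)

def gradJ(x):
    return -H.T @ Rinv @ (y_o - H @ x) + Binv @ (x - x_b)

# Simple steepest-descent minimization, starting from the background
x = x_b.copy()
for _ in range(2000):
    x = x - 0.1 * gradJ(x)
x_a = x
```

The unobserved second variable is also adjusted, because the off-diagonal term in B couples it to the observed one.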
The distances are scaled by the observation error covariance R and by the background error covariance B, respectively. The minimum of the cost function is obtained for x = x_a, which is defined as the analysis.
The analysis obtained by OI and 3D-Var is the same if the weight matrix is given by

    W = B H^T (H B H^T + R)^{-1}

The difference between OI and the 3D-Var approach is in the method of solution:
In OI, the weights W are obtained for each grid point or grid volume, using suitable simplifications.
In 3D-Var, the minimization of J is performed directly, allowing for additional flexibility and a simultaneous global use of the data.
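The stated equivalence can be checked numerically: the OI weight matrix gives the same analysis as solving the normal equations of the cost function J. The small matrices below are invented for illustration; for linear H, the minimum of J satisfies (B^{-1} + H^T R^{-1} H) x = B^{-1} x_b + H^T R^{-1} y_o.

```python
import numpy as np

# Assumed toy problem: 2-variable state, one observation of the first variable
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x_b = np.array([1.0, 2.0])
y_o = np.array([2.0])

# OI route: closed-form weight matrix applied to the innovation
W = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a_oi = x_b + W @ (y_o - H @ x_b)

# Variational route: solve the normal equations of the cost function J
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
x_a_var = np.linalg.solve(Binv + H.T @ Rinv @ H,
                          Binv @ x_b + H.T @ Rinv @ y_o)
```

Both routes produce the same analysis; what differs operationally is the cost and structure of the computation.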
Schematic of the size of matrix operations in OI and 3D-Var (from www.ecmwf.int)
Recently, the variational approach has been extended to four dimensions, by including within the cost function the distance to observations over a time interval (the assimilation window). This is called four-dimensional variational assimilation (4D-Var).
In the analysis cycle, the importance of the model cannot be overemphasized:
It transports information from data-rich to data-poor regions.
It provides a complete estimation of the four-dimensional state of the atmosphere.
The introduction of 4D-Var at ECMWF has resulted in marked improvements in the quality of medium-range forecasts.
End of Introduction
Background Field
For operational models, it is not enough to perform spatial interpolation of observations onto regular grids: there are not enough data available to define the initial state.
The number of degrees of freedom in a modern NWP model is of the order of 10^7. The total number of conventional observations is of the order of 10^4 to 10^5.
There are many new types of data, such as satellite and radar observations, but:
they don't measure the variables used in the models;
their distribution in space and time is very nonuniform.
Background Field
In addition to observations, it is necessary to use a first-guess estimate of the state of the atmosphere at the grid points.
The first guess (also known as the background field or prior information) is our best estimate of the state of the atmosphere prior to the use of the observations.
A short-range forecast is normally used as a first guess in operational systems, in what is called an analysis cycle.
If a forecast is unavailable (e.g., if the cycle is broken), we may have to use climatological fields... but they are normally a poor estimate of the initial state.
Global 6-h analysis cycle (00, 06, 12, and 18 UTC).
Regional analysis cycle, performed (perhaps) every hour.
Intermittent data assimilation is used in most global operational systems, typically with a 6-h cycle performed four times a day.
The model forecast plays a very important role:
Over data-rich regions, the analysis is dominated by the information contained in the observations.
In data-poor regions, the forecast benefits from the information upstream.
For example, 6-h forecasts over the North Atlantic Ocean are relatively good because of the information coming from North America.
The model is able to transport information from data-rich to data-poor areas.
Least Squares Method (Kalnay, 5.3)
We start with a toy model example, the two temperatures problem.
We use two methods to solve it, a sequential and a variational approach, and find that they are equivalent: they yield identical results.
The problem is important because the methodology and results carry over to multivariate OI, Kalman filtering, and 3D-Var and 4D-Var assimilation.
If you fully understand the toy model, you should find the more realistic application straightforward.
Statistical estimation
Introduction. Each of you: guess the temperature in this room right now. How can we get a best estimate of the temperature?
The best estimate of the state of the atmosphere is obtained by combining prior information about the atmosphere (background or first guess) with observations.
In order to combine them optimally, we also need statistical information about the errors in these pieces of information.
As an introduction to statistical estimation, we consider a simple problem that we call the two temperatures problem:
Given two independent observations T_1 and T_2, determine the best estimate of the true temperature T_t.
Simple (toy) Example
Let the two observations of temperature be

    T_1 = T_t + ε_1
    T_2 = T_t + ε_2

[For example, we might have two iffy thermometers.]
The observations have errors ε_i, which we don't know.
Let E(·) represent the expected value, i.e., the average of many similar measurements.
We assume that the measurements T_1 and T_2 are unbiased:

    E(T_1 - T_t) = 0,   E(T_2 - T_t) = 0

or equivalently,

    E(ε_1) = E(ε_2) = 0
We also assume that we know the variances of the observational errors:

    E(ε_1^2) = σ_1^2,   E(ε_2^2) = σ_2^2

We next assume that the errors of the two measurements are uncorrelated:

    E(ε_1 ε_2) = 0

This implies, for example, that there is no systematic tendency for one thermometer to read high (ε_1 > 0) when the other is high (ε_2 > 0).
The above equations represent the statistical information that we need about the actual observations.
We estimate T_t as a linear combination of the observations:

    T_a = a_1 T_1 + a_2 T_2

The analysis T_a should be unbiased: E(T_a) = E(T_t). This implies

    a_1 + a_2 = 1

T_a will be the best estimate of T_t if the coefficients are chosen to minimize the mean squared error of T_a:

    σ_a^2 = E[(T_a - T_t)^2] = E{[a_1 (T_1 - T_t) + a_2 (T_2 - T_t)]^2}

subject to the constraint a_1 + a_2 = 1. This may be written

    σ_a^2 = E[(a_1 ε_1 + a_2 ε_2)^2]
Expanding this expression for σ_a^2, we get

    σ_a^2 = a_1^2 σ_1^2 + a_2^2 σ_2^2

To minimize σ_a^2 w.r.t. a_1, we require ∂σ_a^2/∂a_1 = 0.
Naïve solution: ∂σ_a^2/∂a_1 = 2 a_1 σ_1^2 = 0, so a_1 = 0. Similarly, ∂σ_a^2/∂a_2 = 0 implies a_2 = 0.
We have forgotten the constraint a_1 + a_2 = 1. So a_1 and a_2 are not independent.
Substituting a_2 = 1 - a_1, we get

    σ_a^2 = a_1^2 σ_1^2 + (1 - a_1)^2 σ_2^2

Equating the derivative w.r.t. a_1 to zero, ∂σ_a^2/∂a_1 = 0, gives

    a_1 = σ_2^2 / (σ_1^2 + σ_2^2),   a_2 = σ_1^2 / (σ_1^2 + σ_2^2)

Thus, we have expressions for the weights a_1 and a_2 in terms of the variances (which are assumed to be known).
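The weight formulas above are a one-liner to compute. A minimal sketch, using assumed variances σ_1^2 = 4 and σ_2^2 = 1:

```python
def analysis_weights(sigma1_sq, sigma2_sq):
    """Optimal weights for combining two unbiased, uncorrelated observations."""
    total = sigma1_sq + sigma2_sq
    a1 = sigma2_sq / total   # the less accurate observation gets the smaller weight
    a2 = sigma1_sq / total
    return a1, a2

a1, a2 = analysis_weights(4.0, 1.0)
# Resulting analysis error variance is smaller than either input variance
sigma_a_sq = a1**2 * 4.0 + a2**2 * 1.0
```

Note that each weight is proportional to the variance of the other observation: the more uncertain measurement contributes less.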
We define the precision to be the inverse of the variance. It is a measure of the accuracy of the observations.
Note: the term precision, while a good one, does not have universal currency, so it should be defined when used.
Substituting the coefficients in σ_a^2 = a_1^2 σ_1^2 + a_2^2 σ_2^2, we obtain

    σ_a^2 = σ_1^2 σ_2^2 / (σ_1^2 + σ_2^2)

This can be written in the alternative form:

    1/σ_a^2 = 1/σ_1^2 + 1/σ_2^2

Thus, if the coefficients are optimal, the precision of the analysis is the sum of the precisions of the measurements.
Variational approach
We can also obtain the same best estimate of T_t by minimizing a cost function.
The cost function is defined as the sum of the squares of the distances of T to the two observations, weighted by their observational error precisions:

    J(T) = (1/2) [ (T - T_1)^2 / σ_1^2 + (T - T_2)^2 / σ_2^2 ]

The minimum of the cost function J is obtained by requiring ∂J/∂T = 0.
Exercise: Prove that ∂J/∂T = 0 gives the same value for T_a as the least squares method.
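A numerical check of this equivalence (not a substitute for the proof asked for in the exercise): minimize J by brute-force scan and compare with the precision-weighted average from the least squares method. The observation values and variances are assumed for illustration.

```python
import numpy as np

T1, s1_sq = 2.0, 4.0   # observation 1 and its error variance (assumed)
T2, s2_sq = 0.0, 1.0   # observation 2 and its error variance (assumed)

def J(T):
    return 0.5 * ((T - T1)**2 / s1_sq + (T - T2)**2 / s2_sq)

# Variational route: scan candidate temperatures for the minimum of J
Ts = np.linspace(-5.0, 5.0, 100001)
T_var = Ts[np.argmin(J(Ts))]

# Least squares route: precision-weighted average of the two observations
T_ls = (T1 / s1_sq + T2 / s2_sq) / (1.0 / s1_sq + 1.0 / s2_sq)
```

Both routes land on the same analysis temperature, up to the resolution of the scan.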
The control variable for the minimization of J (i.e., the variable with respect to which we are minimizing the cost function) is the temperature. For the least squares method, the control variables were the weights.
The equivalence between the minimization of the analysis error variance and the variational cost function approach is important.
This equivalence also holds for multidimensional problems, in which case we use the covariance matrix rather than the scalar variance. It indicates that OI and 3D-Var are solving the same problem by different means.
Example: Suppose T_1 = 2, σ_1 = 2, T_2 = 0, σ_2 = 1. Show that T_a = 0.4 and σ_a² = 0.8. σ_1² + σ_2² = 5 a_1 = σ_2² / (σ_1² + σ_2²) = 1/5 a_2 = σ_1² / (σ_1² + σ_2²) = 4/5 CHECK: a_1 + a_2 = 1. T_a = a_1 T_1 + a_2 T_2 = (1/5)·2 + (4/5)·0 = 0.4 σ_a² = σ_1² σ_2² / (σ_1² + σ_2²) = (4·1)/(4+1) = 0.8 This solution is illustrated in the next figure. 10
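The arithmetic of this worked example can be verified in a few lines (an illustrative check, not part of the notes):

```python
# Worked example: two observations with different error variances.
T1, s1 = 2.0, 2.0   # first observation and its error standard deviation
T2, s2 = 0.0, 1.0   # second observation and its error standard deviation

a1 = s2 ** 2 / (s1 ** 2 + s2 ** 2)   # weight on T1: 1/5
a2 = s1 ** 2 / (s1 ** 2 + s2 ** 2)   # weight on T2: 4/5
assert abs(a1 + a2 - 1.0) < 1e-12    # CHECK: weights sum to one

Ta = a1 * T1 + a2 * T2                       # analysis: 0.4
var_a = s1 ** 2 * s2 ** 2 / (s1 ** 2 + s2 ** 2)   # analysis error variance: 0.8
print(Ta, var_a)   # 0.4 0.8
```

Note that the analysis variance (0.8) is smaller than either observational variance (4 and 1), as the figure below illustrates.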
The probability distribution for a simple case. The analysis has a pdf with a maximum closer to T_2, and a smaller standard deviation than either observation. 11
Conclusion of the foregoing. 12
Simple Sequential Assimilation We consider the toy example as a prototype of a full multivariate OI. Recall that we wrote the analysis as a linear combination T_a = a_1 T_1 + a_2 T_2 The requirement that the analysis be unbiased led to a_1 + a_2 = 1, so T_a = T_1 + a_2 (T_2 − T_1) Assume that one of the two temperatures, say T_1 = T_b, is not an observation, but a background value, such as a forecast. Assume that the other value is an observation, T_2 = T_o. We can write the analysis as T_a = T_b + W (T_o − T_b) where W = a_2 can be expressed in terms of the variances. 13
The least squares method gave us the optimal weight: W = σ_b² / (σ_b² + σ_o²) When the analysis is written as T_a = T_b + W (T_o − T_b) the quantity (T_o − T_b) is called the observational innovation, i.e., the new information brought by the observation. It is also known as the observational increment (with respect to the background). 14
The analysis error variance is, as before, given by 1/σ_a² = 1/σ_b² + 1/σ_o² or σ_a² = σ_b² σ_o² / (σ_b² + σ_o²) The analysis variance can also be written as σ_a² = (1 − W) σ_b² Exercise: Verify all the foregoing formulæ. We have shown that the simple two-temperature problem serves as a paradigm for the problem of objective analysis of the atmospheric state. 15
Collection of Main Equations We gather the principal equations here: T_a = T_b + W (T_o − T_b) W = σ_b² / (σ_b² + σ_o²) σ_a² = σ_b² σ_o² / (σ_b² + σ_o²) = W σ_o² σ_a² = (1 − W) σ_b² 16
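These four scalar equations fit in a single small function; a minimal sketch (the function name `assimilate` is an assumption for illustration):

```python
def assimilate(Tb, var_b, To, var_o):
    """One scalar analysis step: return the analysis and its error variance."""
    W = var_b / (var_b + var_o)                # optimal weight
    Ta = Tb + W * (To - Tb)                    # background plus weighted innovation
    var_a = var_b * var_o / (var_b + var_o)    # equals (1 - W)*var_b and W*var_o
    return Ta, var_a

# Same numbers as the earlier worked example, now read as background + observation:
Ta, var_a = assimilate(Tb=2.0, var_b=4.0, To=0.0, var_o=1.0)
print(round(Ta, 10), round(var_a, 10))   # 0.4 0.8
```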
These four equations have been derived for the simplest scalar case... but they are important for the problem of data assimilation because they have exactly the same form as more general equations: The least squares sequential estimation method is used for real multidimensional problems (OI, 3D-Var and even Kalman filtering). Therefore we will interpret these four equations in detail. 17
The first equation T_a = T_b + W (T_o − T_b) This says: The analysis is obtained by adding to the background value, or first guess, the innovation (the difference between the observation and first guess), weighted by the optimal weight. 18
The second equation W = σ_b² / (σ_b² + σ_o²) This says: The optimal weight is the background error variance multiplied by the inverse of the total error variance (the sum of the background and the observation error variances). Note that the larger the background error variance, the larger the correction to the first guess. Look at the limits: σ_o² = 0 gives W = 1, so the analysis equals the observation; σ_b² = 0 gives W = 0, so the analysis equals the background. 19
The third equation The variance of the analysis is σ_a² = σ_b² σ_o² / (σ_b² + σ_o²) This can also be written 1/σ_a² = 1/σ_b² + 1/σ_o² This says: The precision of the analysis (inverse of the analysis error variance) is the sum of the precisions of the background and the observation. 20
The fourth equation σ_a² = (1 − W) σ_b² This says: The error variance of the analysis is the error variance of the background, reduced by a factor equal to one minus the optimal weight. It can also be written σ_a² = W σ_o² 21
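The identity (1 − W) σ_b² = W σ_o² follows by substituting W = σ_b²/(σ_b² + σ_o²). A quick numeric spot-check over random variances (illustrative only):

```python
import random

random.seed(0)
for _ in range(1000):
    var_b = random.uniform(0.1, 10.0)   # random background error variance
    var_o = random.uniform(0.1, 10.0)   # random observation error variance
    W = var_b / (var_b + var_o)         # optimal weight
    # Both forms of the analysis error variance agree:
    assert abs((1.0 - W) * var_b - W * var_o) < 1e-9
print("identity holds")
```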
All the above statements are important because they also hold true for sequential data assimilation systems (OI and Kalman filtering) for multidimensional problems. In these problems, in which T_b and T_a are three-dimensional fields of size order 10⁷ and T_o is a set of observations (typically of size 10⁵), we have to replace expressions as follows: error variance → error covariance matrix optimal weight → optimal gain matrix. Note that there is one essential tuning parameter in OI: it is the ratio of the observational error variance to the background error variance, (σ_o/σ_b)². 22
Application to Analysis If the background is a forecast, we can use the four equations to create a simple sequential analysis cycle. The observation is used once, at the time it appears, and then discarded. Assume that we have completed the analysis at time t_i (e.g., at 06 UTC), and we want to proceed to the next cycle (time t_{i+1}, or 12 UTC). The analysis cycle has two phases: a forecast phase, to update the background T_b and its error variance σ_b², and an analysis phase, to update the analysis T_a and its error variance σ_a². 23
Typical 6-hour analysis cycle. 24
Forecast Phase In the forecast phase of the analysis cycle, the background is first obtained through a forecast: T_b(t_{i+1}) = M[T_a(t_i)] where M represents the forecast model. We also need the error variance of the background. In OI, this is obtained by making a suitably simple assumption, such as that the model integration increases the initial error variance by a fixed factor a somewhat greater than 1: σ_b²(t_{i+1}) = a σ_a²(t_i) This allows the new weight W(t_{i+1}) to be estimated using W = σ_b² / (σ_b² + σ_o²) 25
Analysis Phase In the analysis phase of the cycle we get the new observation T_o(t_{i+1}), and we derive the new analysis T_a(t_{i+1}) using T_a = T_b + W (T_o − T_b) The estimate of σ_b² comes from σ_b²(t_{i+1}) = a σ_a²(t_i) The new analysis error variance σ_a²(t_{i+1}) comes from σ_a² = (1 − W) σ_b² It is smaller than the background error variance. After the analysis, the cycle for time t_{i+1} is completed, and we can proceed to the next cycle. 26
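The forecast and analysis phases above can be sketched as a short loop. This is a minimal illustration, not the operational algorithm: the persistence "model" M and the inflation factor a = 1.5 are stand-in assumptions.

```python
def M(T):
    """Stand-in forecast model (persistence); a real NWP model goes here."""
    return T

def cycle(Ta, var_a, observations, var_o, a=1.5):
    """One forecast phase + analysis phase per observation; yield (Ta, var_a)."""
    for To in observations:
        Tb = M(Ta)                    # forecast phase: advance the analysis
        var_b = a * var_a             # inflate error variance by the factor a
        W = var_b / (var_b + var_o)   # new optimal weight
        Ta = Tb + W * (To - Tb)       # analysis phase: absorb the innovation
        var_a = (1.0 - W) * var_b     # analysis error < background error
        yield Ta, var_a

results = list(cycle(Ta=2.0, var_a=4.0, observations=[0.0, 1.0, 0.5], var_o=1.0))
for Ta, var_a in results:
    print(round(Ta, 3), round(var_a, 3))
```

Note that each observation is used once and discarded, exactly as the text describes, and that the analysis error variance stays below the inflated background variance at every step.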
Reading Assignment Study the Remarks in Kalnay, Section 5.3.1. 27