Data Assimilation Training Course, Reading, March

Tangent linear and adjoint models for variational data assimilation

Angela Benedetti
with contributions from: Marta Janisková, Philippe Lopez, Lars Isaksen, Gabor Radnoti and Yannick Trémolet
Introduction

4D-Var is based on the minimization of a cost function which measures the distance between the model and the observations, and between the model and the background state. The cost function and its gradient are needed in the minimization. The tangent linear model provides a computationally efficient (although approximate) way to calculate the model trajectory, and from it the cost function. The adjoint model is a very efficient tool to compute the gradient of the cost function.

Overview:
- Brief introduction to 4D-Var with focus on TL/AD aspects
- General definitions of tangent linear and adjoint models and why they are extremely useful in variational assimilation
- Writing TL and AD models and testing them
- Brief mention of automatic differentiation software
4D-Var

In 4D-Var the cost function can be expressed as follows:

J(x) = 1/2 (x - x_b)^T B^{-1} (x - x_b) + 1/2 Σ_{i=0}^{n} (H_i[x_i] - y_i)^T R_i^{-1} (H_i[x_i] - y_i)

where the first term is J_b and the second is J_o. B = background error covariance matrix, R_i = observation error covariance matrix (instrumental + interpolation + observation operator error), M = forward nonlinear forecast model (time evolution of the model state, index i), H_i = observation operator (model space → observation space).

min J  requires  ∇J(x_0) = B^{-1} (x_0 - x_b) + Σ_{i=0}^{n} M^T[t_0, t_i] H_i^T R_i^{-1} (H_i[x_i] - y_i) = 0

H_i^T = adjoint of the observation operator and M^T = adjoint of the forecast model.
Incremental 4D-Var at ECMWF

In incremental 4D-Var, the cost function is minimized in terms of increments:

δx(t_i) = M'[t_0, t_i] δx(t_0)

with the model state defined at any time t_i as x(t_i) = x_b(t_i) + δx(t_i), where x_b(t_i) is the trajectory around which the linearization is performed.

The 4D-Var cost function can then be approximated to first order by:

J(δx_0) = 1/2 δx_0^T B^{-1} δx_0 + 1/2 Σ_{i=0}^{n} (H_i' M'[t_0, t_i] δx_0 - d_i)^T R_i^{-1} (H_i' M'[t_0, t_i] δx_0 - d_i)

where d_i = y_i - H_i[x_b(t_i)] is the so-called departure, computed using the nonlinear model and observation operator. The gradient of the cost function to be minimized is:

∇J(δx_0) = B^{-1} δx_0 + Σ_{i=0}^{n} M'^T[t_0, t_i] H_i'^T R_i^{-1} (H_i' M'[t_0, t_i] δx_0 - d_i)

M' and H_i' are the tangent linear models, used in the computation of the incremental updates during the minimization (iterative procedure). M'^T and H_i'^T are the adjoint models, used to obtain the gradient of the cost function with respect to the initial condition.
Details on the linearization

To first order, a perturbation of the control variable (initial condition) evolves according to the tangent linear model:

δx_i = M_i' M_{i-1}' … M_1' δx_0

where i is the time-step index. The perturbation of the cost function around the initial state is:

J(δx_0) = 1/2 δx_0^T B^{-1} δx_0 + 1/2 Σ_{i=0}^{n} (H_i' δx_i - d_i)^T R_i^{-1} (H_i' δx_i - d_i)
        = 1/2 δx_0^T B^{-1} δx_0 + 1/2 Σ_{i=0}^{n} (H_i' M_i' … M_1' δx_0 - d_i)^T R_i^{-1} (H_i' M_i' … M_1' δx_0 - d_i)

where H_i' is the linearized version of H_i about x_i, and d_i = y_i - H_i(x_i) are the departures from the observations.
Details of the linearization (cont.)

The gradient of the cost function with respect to δx_0 is given by:

∇J(δx_0) = B^{-1} δx_0 + Σ_{i=0}^{n} (H_i' M_i' … M_1')^T R_i^{-1} (H_i' M_i' … M_1' δx_0 - d_i)

Remembering that (AB)^T = B^T A^T:

∇J(δx_0) = B^{-1} δx_0 + Σ_{i=0}^{n} M_1'^T … M_i'^T H_i'^T R_i^{-1} (H_i' M_i' … M_1' δx_0 - d_i)
         = B^{-1} δx_0 + Σ_{i=0}^{n} M'^T[t_0, t_i] H_i'^T R_i^{-1} (H_i' δx_i - d_i)

The optimal initial perturbation is obtained by finding the value of δx_0 for which ∇J(δx_0) = 0. The gradient of the cost function with respect to the initial condition is provided by the adjoint solution at time t_0. Let's see how.
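The gradient formula above can be checked numerically. Below is a minimal Python sketch (the course code is Fortran; Python is used here only for illustration). An arbitrary matrix M stands in for the linearized forecast model and H for the observation operator; all values are invented for the test. The gradient is computed once with the adjoint formula (one forward sweep plus one backward sweep) and once with finite differences of the cost function.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny, nsteps = 3, 2, 4                      # state size, obs size, time steps

M = 0.5 * rng.standard_normal((nx, nx))       # linear forecast model (one step)
H = rng.standard_normal((ny, nx))             # linear observation operator
Binv = np.eye(nx)                             # B^{-1} (identity for simplicity)
Rinv = 10.0 * np.eye(ny)                      # R^{-1}
xb = rng.standard_normal(nx)                  # background state
obs = [rng.standard_normal(ny) for _ in range(nsteps)]  # observations y_i

def cost(x0):
    """J(x0) = Jb + Jo, with x_i = M^i x0 and innovations H x_i - y_i."""
    J = 0.5 * (x0 - xb) @ Binv @ (x0 - xb)
    x = x0
    for y in obs:
        x = M @ x                             # forward model step
        d = H @ x - y
        J += 0.5 * d @ Rinv @ d
    return J

def grad_adjoint(x0):
    """Gradient via one forward sweep (store trajectory) + one backward adjoint sweep."""
    traj = [x0]
    for _ in obs:
        traj.append(M @ traj[-1])
    x_ad = np.zeros(nx)
    for x, y in zip(traj[:0:-1], obs[::-1]):  # backward in time
        x_ad = M.T @ (x_ad + H.T @ Rinv @ (H @ x - y))
    return Binv @ (x0 - xb) + x_ad

x0 = rng.standard_normal(nx)
eps = 1e-6
g_fd = np.array([(cost(x0 + eps * e) - cost(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(nx)])
print(np.allclose(grad_adjoint(x0), g_fd, atol=1e-5))   # True
```

Because J is quadratic in x0, the central finite difference agrees with the adjoint gradient up to round-off.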
Definition of the adjoint operator

For any linear operator L' such that L': x → y, there exists an adjoint operator L'* satisfying

⟨L' x, y⟩ = ⟨x, L'* y⟩

where ⟨·,·⟩ is an inner (scalar) product and x, y are vectors (or functions) of the space where this product is defined. It can be shown that, for the inner product defined in the Euclidean space, L'* = L'^T.

We will now show that the gradient of the cost function at time t_0 is provided by the solution of the adjoint equations at the same time.
Adjoint solution

Usually the initial guess is chosen to be equal to the background x_b, so that the initial perturbation δx_0 = 0. The gradient of the cost function is hence simplified as:

∇J = - Σ_{i=0}^{n} M'^T[t_0, t_i] H_i'^T R_i^{-1} d_i

We choose the solution of the adjoint system as follows:

δx*_{n+1} = 0                                            (arbitrary final condition)
δx*_i = M'_{i+1}^T δx*_{i+1} + H_i'^T R_i^{-1} d_i,  i = n, n-1, …, 0   (iterative backward solution)

We then substitute progressively the solution expression for δx*_{i+1} into that for δx*_i.
Adjoint solution (cont.)

Substituting repeatedly (written here for n = 3):

δx*_0 = M_1'^T ( M_2'^T ( M_3'^T H_3'^T R_3^{-1} d_3 + H_2'^T R_2^{-1} d_2 ) + H_1'^T R_1^{-1} d_1 ) + H_0'^T R_0^{-1} d_0

The factors H_i'^T R_i^{-1} d_i come from the trajectory. Finally, regrouping and remembering that (M_i' … M_1')^T = M_1'^T … M_i'^T = M'^T[t_0, t_i], we obtain the following equality:

δx*_0 = Σ_{i=0}^{n} M'^T[t_0, t_i] H_i'^T R_i^{-1} d_i = - ∇J

The gradient of the cost function with respect to the control variable (initial condition) is obtained by a backward integration of the adjoint model.
Iterative steps in the 4D-Var algorithm

1. Integrate the forward model: gives J(x_0).
2. Integrate the adjoint model backwards: gives ∇J(x_0).
3. If ||∇J|| < ε then stop.
4. Compute a descent direction D (Newton, CG, …).
5. Compute the step size ρ.
6. Update the initial condition: x_0 ← x_0 + ρ D.
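The steps above can be sketched in a few lines of Python. This is a toy illustration, not the operational minimizer: a small quadratic stands in for the 4D-Var cost function (its Hessian A and minimizer xa are invented), the "adjoint" is just the analytic gradient, and plain steepest descent with an exact step size replaces the CG/Newton methods used in practice.

```python
import numpy as np

# Quadratic stand-in for the 4D-Var cost function (Hessian A assumed SPD)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
xa = np.array([1.0, -2.0])                  # the (known) minimizer, for checking

def J(x):     return 0.5 * (x - xa) @ A @ (x - xa)
def gradJ(x): return A @ (x - xa)           # in 4D-Var this comes from the adjoint

x = np.zeros(2)                             # first guess of the initial condition
for _ in range(50):
    g = gradJ(x)                            # 2. adjoint integration gives the gradient
    if np.linalg.norm(g) < 1e-10:           # 3. convergence test
        break
    D = -g                                  # 4. descent direction (steepest descent here)
    rho = (g @ g) / (D @ A @ D)             # 5. exact step size for a quadratic J
    x = x + rho * D                         # 6. update the initial condition
```

After the loop, x has converged to the minimizer xa of the quadratic cost.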
Finding the minimum of the cost function J is an iterative minimization procedure.

[Figure: cost function J versus the control variable; starting from J(x_b), successive steps x_{m+1} = x_m + ρ_m D_m descend towards the minimum J_mini.]
An analysis cycle in 4D-Var

1st ifstraj: The nonlinear model is used to compute the high-resolution trajectory (T1279 operational, 12-h forecast). High-resolution departures are computed at the exact observation time and location. The trajectory is interpolated to low resolution (T159). (2 minimizations in the old configuration; now 3 minimizations are operational!)

1st ifsmin (70 iterations): Iterative minimization at T159 resolution. The tangent linear model with simplified physics is used to calculate the increments δx_i. The adjoint is used to compute the gradient of the cost function ∇J with respect to the initial condition. The analysis increment at initial time is interpolated back linearly from low to high resolution and provides a new initial state for the 2nd trajectory run.

2nd ifstraj: repeats the 1st ifstraj and interpolates to T255 resolution.

2nd ifsmin (30 iterations): repeats the 1st ifsmin at T255.

Last ifstraj: uses the updated initial condition to run another 12-h forecast and stores the analysis departures in the Observational Data Base (ODB).
Recap on TL and AD models
Simple example of adjoint writing
Simple example of adjoint writing (cont.)

Often the adjoint variables in mathematical formulations are indicated with an asterisk. Do not forget the last equation! That too is part of the adjoint! As an alternative to the matrix method, adjoint coding can be carried out using a line-by-line approach.
More practical examples of adjoint coding: the Lorenz model

dx/dt = -p (x - y)
dy/dt = r x - y - x z
dz/dt = x y - b z

where p is the Prandtl number, r the Rayleigh number, and b the aspect ratio. x is the intensity of convection, y is the maximum temperature difference, and z is the stratification change due to convection. Details on the Lorenz model can be found in the references.
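For readers who prefer to experiment outside Fortran, here is a Python sketch of the nonlinear Lorenz model. The parameter values p = 10, r = 28, b = 8/3 are the classic Lorenz (1963) choices (an assumption; the course practical may use different values), and a simple forward-Euler step is assumed for the time integration.

```python
import numpy as np

p, r, b = 10.0, 28.0, 8.0 / 3.0      # classic Lorenz (1963) parameters (assumed)

def lorenz_rhs(x):
    """Tendencies y = dx/dt of the Lorenz model for state x = (x, y, z)."""
    return np.array([
        -p * x[0] + p * x[1],        # dx/dt = -p(x - y)
        x[0] * (r - x[2]) - x[1],    # dy/dt = rx - y - xz
        x[0] * x[1] - b * x[2],      # dz/dt = xy - bz
    ])

def step(x, dt=0.01):
    """One forward-Euler time step, x <- x + dt * y (assumed scheme)."""
    return x + dt * lorenz_rhs(x)
```

For example, one step from (1, 1, 1) gives (1.0, 1.26, 1 - 1/60).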
The linear code in Fortran

Linearize each line of the code one by one, setting dx/dt = y for simplicity:

y(1) = -p*x(1) + p*x(2)                         ! Nonlinear statement
yd(1) = -p*xd(1) + p*xd(2)                      ! Tangent linear

y(2) = x(1)*(r - x(3)) - x(2)                   ! Nonlinear statement
yd(2) = xd(1)*(r - x(3)) - x(1)*xd(3) - xd(2)   ! Tangent linear

etc.

Remember that p, r, b are constants; x(1), x(2) and x(3) are the independent variables; y(1), y(2) and y(3) are the dependent variables. We chose the suffix d for the tangent linear variables for consistency with the automatic differentiation software TAPENADE (see optional practical exercise). Adjoint variables are indicated with the suffix b. This is just a convention. Note that in the ECMWF Integrated Forecast System (IFS) the tangent linear and adjoint variables are indicated without any suffixes and the nonlinear trajectory x is indicated with the suffix 5 (x5).
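The same line-by-line linearization can be written and cross-checked in Python (for illustration only; the course code is Fortran). The nonlinear tendencies are restated so the block is self-contained, and the hand-written tangent linear is verified against a finite-difference approximation; the trajectory point and perturbation values are arbitrary.

```python
import numpy as np

p, r, b = 10.0, 28.0, 8.0 / 3.0   # Lorenz parameters (assumed values)

def rhs(x):
    """Nonlinear statements y(1..3)."""
    return np.array([-p * x[0] + p * x[1],
                     x[0] * (r - x[2]) - x[1],
                     x[0] * x[1] - b * x[2]])

def rhs_tl(x, xd):
    """Tangent linear: each line differentiated around the trajectory x."""
    return np.array([
        -p * xd[0] + p * xd[1],                       # yd(1)
        xd[0] * (r - x[2]) - x[0] * xd[2] - xd[1],    # yd(2)
        xd[0] * x[1] + x[0] * xd[1] - b * xd[2],      # yd(3)
    ])

# finite-difference cross-check of the hand-written TL
x  = np.array([1.5, -0.7, 21.0])   # arbitrary trajectory point
xd = np.array([0.3, 0.1, -0.2])    # arbitrary perturbation
eps = 1e-7
fd = (rhs(x + eps * xd) - rhs(x - eps * xd)) / (2 * eps)
print(np.allclose(rhs_tl(x, xd), fd, atol=1e-5))   # True
```

The tendencies are quadratic in x, so the central difference matches the tangent linear up to round-off.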
Adjoint of one instruction

We start from the tangent linear code:

yd(1) = -p*xd(1) + p*xd(2)

In matrix form, it can be written as:

( xd(1) )   (  1  0  0 ) ( xd(1) )
( xd(2) ) = (  0  1  0 ) ( xd(2) )
( yd(1) )   ( -p  p  0 ) ( yd(1) )

which can easily be transposed (asterisk indicates adjoint variables):

( xd(1)* )   ( 1  0  -p ) ( xd(1)* )
( xd(2)* ) = ( 0  1   p ) ( xd(2)* )
( yd(1)* )   ( 0  0   0 ) ( yd(1)* )

The corresponding adjoint code in Fortran is:

xb(1) = xb(1) - p*yb(1)
xb(2) = xb(2) + p*yb(1)
yb(1) = 0.
Adjoint of one instruction (II)

We start again from the tangent linear code:

yd(2) = xd(1)*(r - x(3)) - xd(2) - x(1)*xd(3)

In matrix form, it can be written as:

( xd(1) )   (    1      0    0   0 ) ( xd(1) )
( xd(2) ) = (    0      1    0   0 ) ( xd(2) )
( xd(3) )   (    0      0    1   0 ) ( xd(3) )
( yd(2) )   ( r-x(3)   -1  -x(1) 0 ) ( yd(2) )

which can easily be transposed (asterisk indicates adjoint variables):

( xd(1)* )   ( 1  0  0  r-x(3) ) ( xd(1)* )
( xd(2)* ) = ( 0  1  0    -1   ) ( xd(2)* )
( xd(3)* )   ( 0  0  1  -x(1)  ) ( xd(3)* )
( yd(2)* )   ( 0  0  0    0    ) ( yd(2)* )

The corresponding adjoint code in Fortran is:

xb(1) = xb(1) + (r - x(3))*yb(2)
xb(2) = xb(2) - yb(2)
xb(3) = xb(3) - x(1)*yb(2)
yb(2) = 0.

The terms r - x(3) and x(1) come from the trajectory! They need to be stored in memory or recomputed.
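The transposed statement can be verified with the adjoint identity ⟨L'x, y⟩ = ⟨x, L'^T y⟩ applied to this single instruction. A Python sketch (trajectory values x and perturbations are arbitrary test values):

```python
import numpy as np

r = 28.0
x = np.array([1.5, -0.7, 21.0])   # trajectory values at this point (arbitrary)

def tl_line(xd):
    """TL statement: yd(2) = xd(1)*(r - x(3)) - xd(2) - x(1)*xd(3)."""
    return xd[0] * (r - x[2]) - xd[1] - x[0] * xd[2]

def ad_line(yb2):
    """Adjoint of that single statement, transposed term by term."""
    xb = np.zeros(3)
    xb[0] += (r - x[2]) * yb2     # xb(1) = xb(1) + (r - x(3))*yb(2)
    xb[1] += -yb2                 # xb(2) = xb(2) - yb(2)
    xb[2] += -x[0] * yb2          # xb(3) = xb(3) - x(1)*yb(2)
    return xb                     # ...and yb(2) would then be set to zero

xd, yb2 = np.array([0.3, 0.1, -0.2]), -1.7   # arbitrary perturbation and forcing
lhs = tl_line(xd) * yb2           # <TL xd, yb>
rhs = xd @ ad_line(yb2)           # <xd, AD yb>
print(abs(lhs - rhs) < 1e-12)     # True: the adjoint identity holds
```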
Trajectory

The trajectory has to be available. It can be:
- saved, which costs memory,
- recomputed, which costs CPU time.
Depending on the complexity of the code, one option or the other is adopted (or both options at the same time).
The Adjoint Code

Property of adjoints (transposition):

(L_1 L_2 … L_n)^T = L_n^T … L_2^T L_1^T

Application: if the tangent linear model is M' = L_n L_{n-1} … L_1, where L_i represents one line of the tangent linear code, then its adjoint is M'^T = L_1^T … L_{n-1}^T L_n^T. The adjoint code is made of the transpose of each line of the tangent linear code, in reverse order.
Adjoint of loops

In the TL code for the Lorenz model we have:

DO i = 1, 3
  xd(i) = xd(i) + dt*yd(i)
ENDDO

dt is a constant for our purposes. This loop can be written explicitly:

xd(1) = xd(1) + dt*yd(1)
xd(2) = xd(2) + dt*yd(2)
xd(3) = xd(3) + dt*yd(3)

We can now transpose and reverse the lines to get the adjoint:

yb(3) = yb(3) + dt*xb(3)
yb(2) = yb(2) + dt*xb(2)
yb(1) = yb(1) + dt*xb(1)

which is equivalent to:

DO i = 3, 1, -1   ! reverse order of indices!
  yb(i) = yb(i) + dt*xb(i)
ENDDO
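The reversed loop can again be checked against the adjoint identity, treating the loop as a linear map on the concatenated vector (xd, yd). A Python sketch with arbitrary random inputs:

```python
import numpy as np

dt = 0.01   # a constant for our purposes

def tl_loop(xd, yd):
    """TL loop:  do i=1,3  xd(i) = xd(i) + dt*yd(i)."""
    xd = xd.copy()
    for i in range(3):
        xd[i] = xd[i] + dt * yd[i]
    return xd, yd               # yd is unchanged by the loop

def ad_loop(xb, yb):
    """Adjoint: transpose each line and run the loop in reverse order."""
    yb = yb.copy()
    for i in reversed(range(3)):
        yb[i] = yb[i] + dt * xb[i]
    return xb, yb               # xb is unchanged by the adjoint loop

rng = np.random.default_rng(3)
xd, yd = rng.standard_normal(3), rng.standard_normal(3)
xb, yb = rng.standard_normal(3), rng.standard_normal(3)
xd2, yd2 = tl_loop(xd, yd)
xb2, yb2 = ad_loop(xb, yb)
# adjoint identity on the full (xd, yd) vector: <TL u, v> == <u, AD v>
lhs = xd2 @ xb + yd2 @ yb
rhs = xd @ xb2 + yd @ yb2
print(abs(lhs - rhs) < 1e-12)   # True
```

In this particular loop the iterations are independent, so reversing the index order does not change the result; it matters as soon as iteration i depends on iteration i-1.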
Conditional statements (IF statements)

What we want is the adjoint of the statements which were actually executed in the direct model. We need to know which branch of the IF statement was executed. The result of the conditional statement has to be stored: it is part of the trajectory!
Summary of basic rules for line-by-line adjoint coding (1)

Adjoint statements are derived from tangent linear ones in reverse order.

Tangent linear code:              Adjoint code:
  δx = A δy + B δz                  δy* = δy* + A δx*
                                    δz* = δz* + B δx*
                                    δx* = 0

  δx = A δx + B δz                  δz* = δz* + B δx*
                                    δx* = A δx*
  (Order of operations is important when a variable is updated!)

  do k = 1, N                       do k = N, 1, -1   (Reverse the loop!)
    δx(k) = A δx(k-1) + B δy(k)       δx*(k-1) = δx*(k-1) + A δx*(k)
  end do                              δy*(k)   = δy*(k) + B δx*(k)
                                      δx*(k)   = 0
                                    end do

  if (condition) tangent linear code    →    if (condition) adjoint code

And do not forget to initialize local adjoint variables to zero!
Summary of basic rules for line-by-line adjoint coding (2)

To save memory, the trajectory can be recomputed just before the adjoint calculations (again, it depends on the complexity of the model).

Tangent linear code:
  if (x > 0.) then
    xd = A*xd/x
    x = A*Log(x)
  end if

Trajectory and adjoint code:
  !------------- Trajectory ----------------
  x_store = x          ! storage for use in the adjoint
  if (x > 0.) then
    x = A*Log(x)
  end if
  !--------------- Adjoint ------------------
  if (x_store > 0.) then
    xb = A*xb/x_store
  end if

The most common sources of error in adjoint coding are:
1) Pure coding errors
2) Forgotten initialization of local adjoint variables to zero
3) Mismatching trajectories in tangent linear and adjoint (even slightly)
4) Bad identification of trajectory updates
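The branch-plus-stored-trajectory pattern can be illustrated in Python (the constant A and all test values are invented; the course code is Fortran). The stored trajectory value selects the same branch in the adjoint that was taken in the nonlinear code, and the adjoint identity is checked for the statement x = A*log(x):

```python
import numpy as np

A = 2.0   # a constant, as in the slide's example (value assumed)

def nl(x):
    """Nonlinear code: x = A*Log(x), executed only when x > 0."""
    return A * np.log(x) if x > 0 else x

def tl(x, xd):
    """Tangent linear: xd = A*xd/x on the branch the nonlinear code took."""
    return A * xd / x if x > 0 else xd

def ad(x_store, xb):
    """Adjoint: the *stored* trajectory value picks the executed branch."""
    return A * xb / x_store if x_store > 0 else xb

x_store = 2.5                   # trajectory value stored before the update
xd, xb = 0.3, -1.7              # arbitrary perturbation and adjoint forcing
lhs = tl(x_store, xd) * xb      # <TL xd, xb>
rhs = xd * ad(x_store, xb)      # <xd, AD xb>
print(abs(lhs - rhs) < 1e-14)   # True
```

Note that if the adjoint were run with a slightly different trajectory (error source 3 above), the branch test could flip and the identity would fail.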
More facts about adjoints

The adjoint always exists, and it is unique, assuming spaces of finite dimension. Hence, coding the adjoint does not raise questions about its existence, only questions related to the technical implementation.

In the meteorological literature, the term adjoint is often improperly used to denote the adjoint of the tangent linear of a nonlinear operator. In reality, the adjoint can be defined for any linear operator. One must be aware that discussions about the existence of the adjoint usually should address the existence of the tangent linear model.

Without re-computation, the cost of the TL is usually about 1.5 times that of the nonlinear code, and the cost of the adjoint between 2 and 3 times.

The tangent linear model is not strictly necessary to run a 4D-Var system (but it is needed in the incremental 4D-Var formulation in use operationally at ECMWF). It is also needed as an intermediate step to write and test the adjoint.
Test for the tangent linear model

[Figure: relative error of the tangent linear approximation as a function of the perturbation scaling factor; the error decreases with the scaling factor until machine precision is reached.]
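The standard TL test compares the nonlinear difference M(x + λδx) - M(x) with the tangent linear prediction M'(λδx): their ratio must tend to 1 as the scaling factor λ decreases, until round-off takes over. A Python sketch with a toy scalar model standing in for M (the function and the test point are invented for illustration):

```python
import numpy as np

def nl(x):
    """A toy nonlinear model (stand-in for M)."""
    return np.sin(x) * x**2

def tl(x, dx):
    """Its tangent linear: the derivative at x applied to the perturbation dx."""
    return (2 * x * np.sin(x) + x**2 * np.cos(x)) * dx

x, dx = 1.2, 0.7
for lam in 10.0 ** (-np.arange(1.0, 11.0)):
    ratio = (nl(x + lam * dx) - nl(x)) / tl(x, lam * dx)
    print(f"lambda = {lam:.0e}   ratio = {ratio:.12f}")
```

The printed ratio approaches 1 as λ shrinks, then drifts away again once the nonlinear difference is dominated by round-off, reproducing the behaviour in the figure above.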
Test for the adjoint model

The adjoint test is truly unforgiving. If the ratio of the norms is not equal to 1 within the precision of the machine, you know there is a bug in your adjoint. At the end of your debugging you will have a perfect adjoint. If properly tested, the adjoint is the only piece of code on Earth to be entirely bug-free (although you may still have an imperfect tangent linear)!
Test of the adjoint in practice

Compute the perturbed variable δy from perturbations of the input variables (δx, δz) with the tangent linear code. Compute the TL norm:

NORM_TL = ⟨δy, δy⟩

Call the adjoint routine, forced by δy, to obtain the gradients (δx*, δz*) with respect to the initial perturbations. Compute the norm from the adjoint calculation, using the unperturbed state and the gradients:

NORM_AD = ⟨δx, δx*⟩ + ⟨δz, δz*⟩

According to the test of the adjoint, NORM_TL must be equal to NORM_AD to machine precision!
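The same test in miniature, in Python: a random matrix stands in for the tangent linear integration and its transpose for the adjoint integration (all values are invented; in a real model the two are separate pieces of code and the test is what proves they match).

```python
import numpy as np

rng = np.random.default_rng(4)
Mtl = rng.standard_normal((5, 5))   # tangent linear operator (here just a matrix)

dx = rng.standard_normal(5)         # input perturbation
dy = Mtl @ dx                       # TL integration: dy = M' dx
norm_tl = dy @ dy                   # NORM_TL = <M' dx, M' dx>

dx_ad = Mtl.T @ dy                  # adjoint integration forced by dy
norm_ad = dx @ dx_ad                # NORM_AD = <dx, M'^T M' dx>

print(norm_tl, norm_ad)             # equal to machine precision
```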
Automatic differentiation

Because of the strict rules of tangent linear and adjoint coding, automatic differentiation is possible. Existing tools: TAF (TAMC), TAPENADE (Odyssée), ...
- They reverse the order of instructions,
- They transpose instructions instantly without typos!
- They are especially good at deriving tangent linear codes!

There are still unresolved issues:
- It is NOT a black-box tool,
- It cannot handle non-differentiable instructions (the TL is wrong),
- It can create huge arrays to store the trajectory,
- The codes often need to be cleaned up and optimised.

Look in the supplementary material for more information!
Useful references

Variational data assimilation:
Lorenc, A., 1986: Quarterly Journal of the Royal Meteorological Society, 112, 1177-1194.
Courtier, P., et al., 1994: Quarterly Journal of the Royal Meteorological Society, 120, 1367-1387.
Rabier, F., et al., 2000: Quarterly Journal of the Royal Meteorological Society, 126, 1143-1170.

The adjoint technique:
Errico, R. M., 1997: Bulletin of the American Meteorological Society, 78, 2577-2591.

Tangent-linear approximation:
Errico, R. M., et al., 1993: Tellus, 45A, 462-477.
Errico, R. M., and K. Raeder, 1999: Quarterly Journal of the Royal Meteorological Society, 125, 169-195.
Janisková, M., et al., 1999: Monthly Weather Review, 127, 26-45.
Mahfouf, J.-F., 1999: Tellus, 51A, 147-166.

Lorenz model:
Huang, X. Y., and X. Yang, 1996: Variational data assimilation with the Lorenz model. HIRLAM Technical Report 26, April 1996. Available on the ftp site (see notes for the practical session).
Lorenz, E., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130-141.

Automatic differentiation:
Giering, R., 1997: Tangent Linear and Adjoint Model Compiler, Users Manual. Center for Global Change Sciences, Department of Earth, Atmospheric, and Planetary Science, MIT.
Giering, R., and T. Kaminski, 1998: Recipes for Adjoint Code Construction. ACM Transactions on Mathematical Software.
TAMC: http://www.autodiff.org/
TAPENADE: http://www-sop.inria.fr/tropics/tapenade.html

Sensitivity studies using the adjoint technique:
Janisková, M., and J.-J. Morcrette, 2005: Investigation of the sensitivity of the ECMWF radiation scheme to input parameters using adjoint technique. Quart. J. Roy. Meteor. Soc., 131, 2975-2996.