Model parameters of chaotic dynamics: metrics for comparing trajectories

H. Haario 1,2, L. Kalachev 3, J. Hakkarainen 2, A. Bibov 1, FMI people
June 25, 2015
1 Lappeenranta University of Technology, Finland
2 Finnish Meteorological Institute (FMI)
3 University of Montana, USA

Contents
- No standard likelihood due to chaoticity
- Likelihoods based on
  - Summary statistics
  - State space filtering
  - Metrics based on fractal dimension concepts
- Examples: Lorenz, Shallow Water

Chaotic Systems: construction of the likelihood (cost function)?
[Figure: state variable versus time.]
Small changes in the initial values (or solver settings) can lead to unpredictable deviations from the observations.

Likelihood: 1st attempts, summary statistics
Observations and simulations are transformed to summary statistics:
$$s = s(y_{1:n}), \qquad s_\theta = s(z_{\theta,1:n})$$
The likelihood is formulated for $s$, which yields the posterior
$$p(\theta \mid s) \propto p(\theta)\, p(s \mid \theta)$$
The approach was implemented for the ECHAM5 climate model, using a likelihood based on monthly global and zonal net radiation averages. MCMC was used to estimate four parameters related to cloud formation and precipitation.


Climate model MCMC results
[Figure: pairwise plots of the MCMC samples for the parameters CMFCTOP, CAULOC, CPRCON and ENTRSCV.]
The summary statistics do not identify the parameters.


Nonsmooth cost function
[Figure: cost function surfaces J1 to J4 (with t5) over log(cauloc) and log(cmfctop); panel (a): the large perturbations of the parameters.]
Data analysis for summary cost functions to get informative projections: helps but does not solve the problem.

EKF Likelihood: by State Space filtering
Tame chaos by filtering out the state space.
- Use state estimation methods to keep the model close to the data.
- Filtering: integrate out the state space; what remains gives a likelihood for the parameters.
- Standard way for linear time series (DLM, Dynamical Linear Models) and SDE (stochastic differential equation) systems.
- Less standard for chaotic dynamics, but can be implemented with the EKF.
[Figure: Lorenz95 system, state variable versus time; legend: Not assimilated, Assimilated, Observations.]

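The toy model: parameterized Lorenz 95
[Figure: the state variables arranged on a ring, numbered 1, 2, 3, 4, ..., 39.]
NATURE:
$$\frac{dx_k}{dt} = -x_{k-1}(x_{k-2} - x_{k+1}) - x_k + F - \frac{hc}{b} \sum_{j=J(k-1)+1}^{Jk} y_j$$
$$\frac{dy_j}{dt} = -cb\, y_{j+1}(y_{j+2} - y_{j-1}) - c\, y_j + \frac{c}{b} F_y + \frac{hc}{b}\, x_{1 + \lfloor (j-1)/J \rfloor}$$
FORECAST MODEL:
$$\frac{dx_k}{dt} = -x_{k-1}(x_{k-2} - x_{k+1}) - x_k + F - g(x_k, \theta)$$
We use a polynomial parameterization:
$$g(x_k, \theta) = \sum_{i=1}^{d} \theta_i\, x_k^{(i-1)}$$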
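The forecast model with the polynomial parameterization can be illustrated with a short Python sketch. This is not the authors' implementation: the function name `forecast_rhs`, the list representation of the state, and the cyclic indexing are assumptions made here, and the signs follow the standard Lorenz 95 form.

```python
def forecast_rhs(x, F, theta):
    """Tendencies dx_k/dt of the forecast model for a cyclic state x.

    theta[i] multiplies x_k**i (0-based index), which corresponds to
    theta_{i+1} * x_k**i in the 1-based notation of the slide.
    """
    K = len(x)
    dxdt = []
    for k in range(K):
        # Polynomial closure g(x_k, theta) standing in for the fast variables.
        g = sum(t * x[k] ** i for i, t in enumerate(theta))
        # Advection term; negative indices wrap around, k + 1 needs the modulo.
        advection = -x[k - 1] * (x[k - 2] - x[(k + 1) % K])
        dxdt.append(advection - x[k] + F - g)
    return dxdt
```

With a spatially uniform state the advection term vanishes, so `forecast_rhs([8.0] * 40, 8.0, [0.0, 0.0])` returns zeros: the uniform state x_k = F is a fixed point when the closure is switched off.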

The toy model: parameterized Lorenz 95
[Figure: state variable versus time; legend: forecast model, slow state, fast state.]

Results: filter likelihood
[Figure: pairwise scatter plots of the sampled parameters θ and log(σ²).]
Scatter plot of parameter pairs from MCMC runs using 1 (blue), 2 (red), 5 (black) and 5 (green) day simulations.

Comments, so far
- The summary statistics approach has problems in properly identifying the parameters of chaotic dynamical systems.
- The likelihood can be computed by integrating out the uncertain model states using filtering methods.
- Each filter algorithm has built-in tuning parameters (model error covariance, linearization, ...). How much bias do they introduce?
- Filtering avoids the problem rather than solves it.

Metric based on fractal dimension concepts
Several concepts exist to define the fractal dimension of a chaotic trajectory, such as the Hausdorff dimension or box-counting. We employ the Correlation Dimension (CD), as it is simple to compute. Recall first the Correlation Integral:

Denote by $s_i$, $i = 1, 2, \dots, N$, points of a trajectory vector $s \in \mathbb{R}^n$, evaluated at time points $t_i$. For $R > 0$ set
$$C(R, N) = \frac{1}{N^2} \sum_{i,j} \#\left( \| s_i - s_j \| < R \right)$$
and define the correlation integral as the limit $C(R) = \lim_{N \to \infty} C(R, N)$. So we take the total number of point pairs closer than $R$, normalize by the number of pairs $N^2$, and take the limit. Note that for each $N$ we have $1/N \le C(R, N) \le 1$. If $\nu$ is the dimension of the trajectory, we should have $C(R) \sim R^\nu$, and the Correlation Dimension $\nu$ is defined as the limit
$$\nu = \lim_{R \to 0} \frac{\log C(R)}{\log R}.$$
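The finite-N correlation sum is a plain pair count, which the following Python sketch makes explicit. The function name `correlation_sum` and the representation of trajectory points as tuples are assumptions for illustration; the double loop is the naive O(N²) form, not an optimized one.

```python
import math

def correlation_sum(points, R):
    """C(R, N): fraction of ordered pairs (i, j), i == j included,
    whose Euclidean distance is strictly below R."""
    N = len(points)
    count = 0
    for s_i in points:
        for s_j in points:
            if math.dist(s_i, s_j) < R:
                count += 1
    return count / N**2

# Three hypothetical points on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
small = correlation_sum(pts, 0.5)   # only the i == j pairs: 3/9 = 1/N
large = correlation_sum(pts, 10.0)  # every pair counts: 1.0
```

The two values illustrate the bounds: a very small radius only catches the diagonal pairs, giving 1/N, and a radius covering the whole set gives 1.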

Numerical estimation of Correlation Dimension
In numerical practice we have a finite time interval $[0, T]$, and the trajectory vector $s_i$ is evaluated at a finite number of time instants $t_i$, $i = 1, 2, \dots, N$. The above limit may be approximated by a log-log plot obtained by computing $\log C(R)$ at various values of $R$.
A few constants to be selected first:
- The number of points, $N$.
- The maximum radius $R_0$, large enough for each ball $B(s_i, R_0)$ to contain all the points $s_j$, $j \neq i$.
- The set of smaller radii, $R_k = b^{-k} R_0$ with $k = 1, 2, \dots, M$. Select $M$ and the base $b$. Typical defaults: $M = 10$, $b = 2$.
Then:
- For each $k$, compute $C(R_k, N)$.
- Create the log-log curve $\log(R_k)$ vs $\log C(R_k, N)$, $k = 1, 2, \dots, M$.
- Estimate $\nu$ from the slope of the linear part.
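The recipe above can be tried on a set whose dimension is known. The sketch below is an illustration under stated assumptions, not the authors' code: the helper names, the choice M = 8, the least-squares fit, and the index range treated as the "linear part" (k = 2..6) are all choices made here, and the test data are evenly spaced points on a straight segment (dimension 1).

```python
import math

def correlation_curve(points, b=2.0, M=8):
    """Return (log R_k, log C(R_k, N)) for R_k = b**-k * R0, k = 1..M."""
    N = len(points)
    dists = [math.dist(p, q) for p in points for q in points]  # all N^2 pairs
    R0 = max(dists) * (1.0 + 1e-9)  # just large enough to cover every pair
    log_R, log_C = [], []
    for k in range(1, M + 1):
        R_k = R0 * b ** (-k)
        C = sum(d < R_k for d in dists) / N**2
        log_R.append(math.log(R_k))
        log_C.append(math.log(C))  # C >= 1/N > 0 since the pairs i == j count
    return log_R, log_C

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical test data: 200 evenly spaced points on a straight segment.
points = [(i / 199, 0.0) for i in range(200)]
log_R, log_C = correlation_curve(points)
# Assumed linear part: k = 2..6, dropping the largest and smallest radii.
nu = fit_slope(log_R[1:6], log_C[1:6])
```

For this segment the estimate comes out slightly below 1. The largest radius is affected by the finite extent of the set and the smallest radii by the discreteness of the sample, which is why the fit is restricted to the middle of the curve.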

CD for distance between trajectories The above CD is a standard method to calculate the dimension of a (chaotic) trajectory. Here, we want to modify it to get a measure for the distance between two model trajectories, as given, e.g., with different model parameters. Due to chaoticity, even small differences in initial values or numerical solvers change the trajectories. We want to separate this variability with fixed model parameters from that due to different model parameters.

Distance via a generalized correlation sum
Fix again the numerical tuning factors $(T, N, R_0, b, M)$ to cover the range of the trajectory $s$. The generalized correlation sum between trajectories $s = s(\theta, x)$ and $\tilde{s} = s(\tilde\theta, \tilde{x})$ is then defined as
$$C(R, N, \theta, x, \tilde\theta, \tilde{x}) = \frac{1}{N^2} \sum_{i,j} \#\left( \| s_i - \tilde{s}_j \| < R \right), \qquad (1)$$
where $\theta, \tilde\theta$ denote the respective model parameters and $x, \tilde{x}$ the initial values. For $\tilde\theta = \theta$, $\tilde{x} = x$ the formula reduces to the original definition of the correlation sum.
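In code, the only change from the ordinary correlation sum is that the two indices run over different trajectories. A minimal sketch, assuming both trajectories are lists of N state tuples; the function name is hypothetical.

```python
import math

def generalized_correlation_sum(s, s_tilde, R):
    """Fraction of cross pairs (s_i, s_tilde_j) closer than R.

    Assumes both trajectories hold the same number N of points.
    """
    N = len(s)
    count = sum(1 for p in s for q in s_tilde if math.dist(p, q) < R)
    return count / N**2

# Two hypothetical two-point trajectories.
s = [(0.0, 0.0), (1.0, 0.0)]
s_tilde = [(0.5, 0.0), (5.0, 0.0)]
cross = generalized_correlation_sum(s, s_tilde, 1.0)  # 2 of 4 pairs
same = generalized_correlation_sum(s, s, 1.5)         # ordinary sum, all 4 pairs
```

Passing the same trajectory twice recovers the original correlation sum, as stated for equal parameters and initial values.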

Correlation Curve variability with fixed model parameter
First, characterize the within variability of a chaotic dynamical system with a fixed model parameter vector:
1. Repeatedly simulate the trajectory, with varying initial values (and solver tolerances), but fixed model parameter $\theta$.
2. Compute the distance matrix between (all) different trajectory pairs, to get the values $C(R, N, \theta, x, \theta, \tilde{x})$.
An example for Lorenz3, with a log-scale for R:
[Figure: correlation curves from the repeated simulations.]
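Steps 1 and 2 amount to evaluating the generalized correlation sum on a ladder of radii for every unordered pair of simulated trajectories. The sketch below assumes the trajectories have already been simulated and stored as lists of state tuples; `correlation_vector` and `training_vectors` are hypothetical names, and the radii follow R_k = b^(-k) R_0.

```python
import math
from itertools import combinations

def correlation_vector(s, s_tilde, R0, b, M):
    """y_k = generalized correlation sum at R_k = b**-k * R0, k = 1..M."""
    N = len(s)
    dists = [math.dist(p, q) for p in s for q in s_tilde]  # distance matrix
    return [sum(d < R0 * b ** (-k) for d in dists) / N**2
            for k in range(1, M + 1)]

def training_vectors(trajectories, R0, b=2.0, M=10):
    """One correlation vector per unordered pair of trajectories."""
    return [correlation_vector(s, t, R0, b, M)
            for s, t in combinations(trajectories, 2)]

# Three tiny hypothetical trajectories give 3 * 2 / 2 = 3 pairs.
trajs = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.5, 0.0), (5.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.0)],
]
vecs = training_vectors(trajs, R0=2.0, b=2.0, M=2)
```

The distance matrix is computed once per pair and reused for all radii, so the cost per pair is dominated by the N² distance evaluations.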

Cost function for parameter estimation
We treat the above vectors $y = \left( C(R_k, N, \theta, x, \theta, \tilde{x}) \right)_{k = 1, \dots, M}$ as measurements of the variability of a chaotic trajectory with a given fixed model parameter. Construct the respective likelihood:
1. Empirically estimate the statistics of $y$ from repeated simulations.
2. Create the empirical likelihood function.
3. For any trajectory $s(\tilde\theta)$, compute the distance matrix from the reference trajectory and the respective $C(R_k, N, \theta, x, \tilde\theta, \tilde{x})$. Evaluate the likelihood.

Example: Likelihood for 3D Lorenz
$$\frac{dX}{dt} = \sigma (Y - X), \qquad \frac{dY}{dt} = X(\rho - Z) - Y, \qquad \frac{dZ}{dt} = XY - \beta Z. \qquad (2)$$
[Figure: state components versus TIME.]
Figure: Observation samples of 3D Lorenz.
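To make the sensitivity to initial values concrete, the system can be integrated from two nearly identical starting points. The following is a sketch under assumptions: the classic parameter values σ = 10, ρ = 28, β = 8/3, a fixed-step fourth-order Runge-Kutta integrator, and the step size 0.01 are choices made here and are not taken from the slides.

```python
import math

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the 3D Lorenz system."""
    X, Y, Z = state
    return (sigma * (Y - X), X * (rho - Z) - Y, X * Y - beta * Z)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(x0, n_steps, dt=0.01):
    """Trajectory as a list of (X, Y, Z) tuples, including the initial state."""
    traj = [tuple(x0)]
    for _ in range(n_steps):
        traj.append(rk4_step(lorenz63, traj[-1], dt))
    return traj

# Two runs whose initial values differ by 1e-8 in the first component.
run_a = simulate((1.0, 1.0, 1.0), 4000)
run_b = simulate((1.0 + 1e-8, 1.0, 1.0), 4000)
late_gap = max(math.dist(p, q) for p, q in zip(run_a[-500:], run_b[-500:]))
```

Both runs stay on the same bounded attractor, yet their pointwise distance late in the simulation is far larger than the initial perturbation, which is why a pointwise comparison with observations does not yield a usable likelihood.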

Example: Likelihood for 3D Lorenz
The values $y_k = C(R_k, N, \theta, x, \theta, \tilde{x})$, $k = 1, \dots, M$, are averages of distances between state vectors. In analogy with the Central Limit Theorem, test a Gaussian distribution for the vector $y$: calculate the mean value $\mu$ and covariance matrix $\Sigma$ of the training set. Then compute the statistics of the expression $(\mu - y)^T \Sigma^{-1} (\mu - y)$, which should obey the $\chi^2$ distribution for a Gaussian $y$:
$$(\mu - y)^T \Sigma^{-1} (\mu - y) \sim \chi^2_M \qquad (3)$$
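The quadratic form in (3) only needs the training mean, the training covariance and one linear solve. The sketch below uses plain Python lists and a small Gaussian elimination so that it runs without external libraries; all function names are assumptions, and a real application would more likely use a linear algebra library and guard against an ill-conditioned covariance.

```python
def mean_and_cov(Y):
    """Sample mean and unbiased sample covariance of the rows of Y."""
    n, M = len(Y), len(Y[0])
    mu = [sum(row[k] for row in Y) / n for k in range(M)]
    cov = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in Y) / (n - 1)
            for b in range(M)] for a in range(M)]
    return mu, cov

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def chi2_statistic(y, mu, cov):
    """(mu - y)^T Sigma^{-1} (mu - y)."""
    d = [m - v for m, v in zip(mu, y)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))

def gaussian_log_likelihood(y, mu, cov):
    """Gaussian log-likelihood of y up to an additive constant."""
    return -0.5 * chi2_statistic(y, mu, cov)

# Hypothetical training set of four 2-component vectors.
Y_train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu, cov = mean_and_cov(Y_train)
q = chi2_statistic([3.0, 1.0], mu, cov)
```

For the normality check, the statistic would be evaluated for every training vector and its histogram compared with the χ² density with M degrees of freedom.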

Example: Likelihood for 3D Lorenz
[Figure: two panels comparing the empirical statistics with χ² densities; panel titles "χ² distribution, dof: 10" and "χ² distribution, dof: 92".]
Figure: Normality check of the correlation integral vector by the χ² test for the Lorenz 63 system. Left: with 10 radius values used. Right: with 92 radius values.

Inference as a pseudo-marginal MCMC algorithm
Due to chaoticity and the randomised $x$, the likelihood is non-deterministic. But sampling from it can be interpreted as sampling from the joint distribution of the initial values and model parameters. Denote the likelihood function of $y$, evaluated for an arbitrary $\theta$, by $T_\theta(\theta, x)$. The target distribution for $\theta$ is given as
$$\pi(\theta) = \int T_\theta(\theta, x)\, \lambda(x)\, dx,$$
where $\lambda(x)$ is the distribution of the initial values $x$. In our situation, $T_\theta(\theta, x)$ is unknown, but an empirical approximation can be created as above. The method we implement is a bivariate Markov chain $(\theta_n, T_n)_{n \ge 0}$, where $T_n$ are auxiliary variables that are non-negative, unbiased estimators of the underlying intractable target density $\pi(\theta_n)$. In other words, the method is a pseudo-marginal algorithm targeting $\pi$.

Pseudo-marginal MCMC
Start from a pair $(\theta_0, T_0)$ and iterate the following steps for $n \ge 0$:
1. Propose $\theta' = \theta_n + Z$, where $Z$ is sampled from a Gaussian proposal distribution.
2. Propose $x' \sim \lambda$ and calculate $T' = T_\theta(\theta', x')$.
3. With probability $\min\{1, T'/T_n\}$ accept and set $(\theta_{n+1}, T_{n+1}) = (\theta', T')$; otherwise reject and set $(\theta_{n+1}, T_{n+1}) = (\theta_n, T_n)$.
In our cases $T$ is non-negative (Gaussian), and the conditional expectation of $T'$ given $\theta'$ is $\pi(\theta')$. Therefore, the method provides correct simulation in the sense that the ergodic averages $n^{-1} \sum_{k=1}^{n} f(\theta_k)$ converge to $\pi(f)$ almost surely, given minimal irreducibility and aperiodicity assumptions.
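The three steps can be sketched on a toy problem where the answer is known. Everything specific below is an assumption for illustration: the noisy likelihood `noisy_likelihood` is a stand-in, not the correlation-vector likelihood of the talk; the "initial value" x is drawn from a uniform distribution with mean 1, so that the expectation over x is proportional to a standard normal density in θ.

```python
import math
import random

def noisy_likelihood(theta, x):
    """Toy stand-in for T(theta, x): non-negative, and its mean over x is
    proportional to the standard normal density in theta."""
    return x * math.exp(-0.5 * theta**2)

def pseudo_marginal_mcmc(n_steps, step=1.0, seed=0):
    rng = random.Random(seed)

    def draw_x():
        # Hypothetical lambda(x): uniform on [0.5, 1.5], mean 1.
        return rng.uniform(0.5, 1.5)

    theta = 0.0
    T = noisy_likelihood(theta, draw_x())
    chain = []
    for _ in range(n_steps):
        theta_prop = theta + rng.gauss(0.0, step)         # step 1
        T_prop = noisy_likelihood(theta_prop, draw_x())   # step 2
        if rng.random() < min(1.0, T_prop / T):           # step 3
            theta, T = theta_prop, T_prop
        # On rejection the stored estimate T is kept, not recomputed.
        chain.append(theta)
    return chain

chain = pseudo_marginal_mcmc(20000)
mean = sum(chain) / len(chain)
var = sum((t - mean) ** 2 for t in chain) / len(chain)
```

The sample mean and variance of the chain should be close to 0 and 1 despite the noise in every likelihood evaluation. Reusing the stored estimate after a rejection is the point that distinguishes the pseudo-marginal scheme from naively re-evaluating the noisy likelihood at the current state.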

Example: 3D Lorenz
[Figure: pairwise plots of the sampled parameters β, σ and ρ.]
Figure: Marginal distributions of model parameters obtained using MCMC simulations for the three dimensional Lorenz system are very close to Gaussian.

Example: L95
[Figure: Lorenz 95 MCMC results (no splitting); sampled h versus F.]
The sampled values for the parameters (F, h) of L95 (only these two are sampled; the other model parameters are kept fixed) when the correlation integral vector is computed from the whole system $s = (x_k, y_j)$, i.e., the slow and fast, weakly coupled subsystems together.

Example: L95
[Figure: Lorenz 95 MCMC results; sampled h versus F.]
The sampled values for the parameters (F, h) of L95 when the correlation integral vector calculations are split: the vectors are computed separately for the slow and fast subsystems, and both are used as data for the sampling cost function.

High dimension: Shallow water model
Given as
$$h_t + (hu)_x + (hv)_y = 0,$$
$$(hu)_t + \left( hu^2 + \tfrac{1}{2} g h^2 \right)_x + (huv)_y = -g h B_x,$$
$$(hv)_t + (huv)_x + \left( hv^2 + \tfrac{1}{2} g h^2 \right)_y = -g h B_y.$$
Here $h$ denotes water elevation, $u$ and $v$ are the velocity components in the $x$ and $y$ directions, $B_x$ and $B_y$ denote the gradient components of the surface $B$ implementing topography, and $g$ is the acceleration of gravity. It is possible to account for additional phenomena (e.g. wind stresses, friction, etc.) by modifying the right-hand side of the equations.

Discretization by finite volumes
- Numerics: Kurganov-Petrova second-order well-balanced positivity preserving central-upwind scheme.
- The problem is solved for a huge set of discretization cells that form a staggered grid.
- Each cell describes the solution by 5 components, where each component contains related data for the velocity field and water elevation.

Introduction to CUDA
Idea: move computations to the GPU side.
- Each GPU computes information for billions of screen pixels almost independently.
- GPUs are highly parallel.
- Modern GPUs comprise thousands of universal computation cores.
- GPUs can be arranged into arrays forming a multi-GPU node.
- Simple application interfaces that allow general purpose GPU programming exist (e.g. OpenCL, OpenGL Compute Shaders, CUDA, etc.).

CPU vs CUDA GPU implementation
- Both implementations run at a resolution of 256-by-256 grid cells.
- Time step is .6 units of model time.
- Time cost of a single model step for the CPU implementation: .365 sec
- Time cost of a single model step for the GPU implementation: ,5 sec

SWE: can you distinguish change of flow pattern?
A stone in a river, with von Kármán vortex shedding. Dimension of the state (h, u, v) around 200 000. Two slightly different cases. Example:

SWE: can you distinguish change of flow pattern?
Snapshots at 1 time points, repeat simulations 50 times (producing a training set of 50 · 49 / 2 = 1225 pairs). Split the simulated vectors in two parts; use 1/2 as the training set, 1/2 as the test set:
[Figure: two histogram panels, "KHI2 NORMALITY TEST OF TRAINING VECTORS" and "KHI2 SIMILARITY TEST FOR NEW CASES".]

Where is the real data?
No measured data is directly used for parameter estimation. Instead, assume the basic model parameters are given, and determine the posterior of the parameters that would produce essentially the same chaotic dynamics.
A real example: reanalysis studies of weather and climate models (e.g., the ERA-40 data and ECHAM5), which combine past real data and model predictions to achieve the best understanding of the systems.
The aim here: characterize the parameter distributions of the reanalyzed models that fit the climatology of long time runs of a given climate model. Further, use them to quantify the uncertainty of model predictions with respect to the given parameters, by parameter ensemble simulations under various scenarios, such as increased CO2 levels.

Summary, next
- The Correlation distance is a promising way to characterize distances between chaotic trajectories.
- Quite insensitive with respect to varying initial values, solver numerics, etc., in small systems.
- Next: more applications to high dimensional systems. No technical obstacles, in principle:
  - Only L2 norms between vectors are computed after model simulation. Can be done in parallel, during the simulation.
  - From K simulations get K(K-1)/2 trajectory pairs to create the empirical distributions: a moderate K is enough.
- Problems/modifications expected for multi-fractal situations and integration times. Careful with rare but large outliers in the training set.
- Distinguish, generally, a difference from normal behaviour: use for various classification and pattern recognition problems?

References
- Järvinen, H., Räisänen, P., Laine, M., Tamminen, J., Ilin, A., Oja, E., Solonen, A., and Haario, H.: Estimation of ECHAM5 climate model closure parameters with adaptive MCMC. Atmos. Chem. Phys., Vol. 10, no. 20, 9993-10002, 2010.
- Hakkarainen, J., Ilin, A., Solonen, A., Laine, M., Haario, H., Tamminen, J., Oja, E., and Järvinen, H.: On Closure Parameter Estimation in Chaotic Systems. Nonlin. Processes Geophys., 19, 127-143, 2012.
- Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., and Järvinen, H.: Efficient MCMC for Climate Model Parameter Estimation: Parallel Adaptive Chains and Early Rejection. Bayesian Analysis, 7, Number 2, pp. 1-22, 2012.
- Hakkarainen, J., Solonen, A., Ilin, A., Susiluoto, J., Laine, M., Haario, H., and Järvinen, H.: A dilemma on the uniqueness of weather and climate model closure parameters. Tellus A, 65, 20147, 2013.
- Ollinaho, P., Laine, M., Solonen, A., Haario, H., and Järvinen, H.: NWP model forecast skill optimization via closure parameter variations. Q. J. R. Meteorol. Soc., 139, 675, pp. 1520-1532, 2013.
- Ollinaho, P., Bechtold, P., Leutbecher, M., Laine, M., Solonen, A., Haario, H., and Järvinen, H.: Parameter variations in prediction skill optimization at ECMWF. Nonlin. Processes Geophys., 20, 6, 1001-1010, 2013.
- Haario, H., Kalachev, L., and Hakkarainen, J.: Generalized Correlation Integral Vectors: A New Distance Concept for Chaotic Dynamical Systems. UM Tech. Rep. 8/2014. (http://cas.umt.edu/math/reports/)
- Generalized correlation integral vectors: A distance concept for chaotic dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 063102, 2015; doi: 10.1063/1.4921939