NRCSE. Misalignment and use of deterministic models

NRCSE Misalignment and use of deterministic models

Work with Veronica Berrocal Peter Craigmile Wendy Meiring Paul Sampson Gregory Nikulin

The choice of spatial scale some questions 1. Which spatial scale is correct? 2. What if there is spatial misalignment? 3. How do we change from one spatial scale to another? 4. What if we have different spatial datasets that come to us on different spatial scales? 5. How do we combine data sources? We need to be careful not to be misled in our inferences.

Changes of support Observed Inference Method point point kriging point line contouring point area block kriging area point ecological inference area area misalignment

Some issues in model assessment Spatiotemporal misalignment Grid boxes vs observations Types of error Measurement error and bias Model error Approximation error Manipulate data or model output?

Assessing the SARMAP model 60 days of hourly observations at 32 sites in Sacramento region Hourly model runs for three episodes

Task Estimate from data the ozone level in a grid square. Issues: Transformation Diurnal cycle Temporal dependence Spatial dependence Space-time interaction

Transformation Heterogeneous variability mean and variance positively related Square root transformation All modeling now on square root scale approximately normal

Diurnal cycle

Spatial dependence

V t (s) = Estimate Estimating a grid square average Z t (s) V t (s) = µ t (s) + W t (s) W t (s) = α 1 (s)w t 1 (s) + α 2 (s)w t 2 (s) + Y t (s) 1 A A V 2 t (s)ds using 1 M E { V (s t j )2 data from 1,...,t } (not averages of squares of kriging estimates on the square root scale)

Afternoon comparison

Nighttime comparison

Regional climate models Not possible to do long runs of global models at fine resolution Regional models (dynamic downscaling) use global model as boundary conditions and runs on finer resolution Output is averaged over land use classes Weather prediction mode uses reanalysis as boundary conditions

Comparison of model to data Model output daily averaged 3hr predictions on (12.5 km) 2 grid Use open air predictions only RCA3 driven by ERA 40/ERA Interim Data daily averages point measurements (actually weighted average of three hourly measurements, min and max) Aggregate model and data to seasonal averages

Data SMHI synoptic stations in south central Sweden, 1961-2008 (61 missing) Longitude 56 58 60 62 64 66 68 2 Sundsvall 1 3 4 6 5 Borlänge 7 * 8 10 Stockholm 9,11 12 13 14 Göteborg 15 17 16 Lund Sweden Site ID 5 10 15 (6 missing) (66 missing) (14 missing) (129 missing) (30 missing) (1 missing) (2 missing) 12 14 16 18 20 22 24 Latitude 1960 1970 1980 1990 2000 2010 Year

Upscaling Geostatistics: predicting grid square averages from data Difficulties: Trends Seasonal variation Long term memory features Short term memory features

Long term memory models Realization ACS vs. lag Log SDF vs. log freq d=0 4 2 0 2 4 ACS 0.0 0.4 0.8 1 0 1 2 3 0 200 400 600 800 1000 0 10 20 30 40 6 5 4 3 2 1 d=0.25 4 2 0 2 4 ACS 0.0 0.4 0.8 1 0 1 2 3 0 200 400 600 800 1000 0 10 20 30 40 6 5 4 3 2 1 d=0.45 4 2 0 2 4 ACS 0.0 0.4 0.8 1 0 1 2 3 0 200 400 600 800 1000 0 10 20 30 40 6 5 4 3 2 1

A simple model Y t (s) = µ t (s) + ϕ t (s) + exp(α t (s))η t (s) space-time trend noise periodic seasonal component seasonal variability

Seasonal part φ t (s) = A(s)cos(2πt / 365.25 + θ(s)) (a) Latitude Estimated seasonality 56 58 60 62 64 10 0 5 10 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec (b) 9.71 Sundsvall 11.69 11.86 8.9 11.44 10.25 Borlänge 10.1 10.34 9.78 Stockholm 10.31 9.62 9.31 9.85 9.1 Göteborg 9.47 9.68 9.03 Latitude Month 56 58 60 62 64 (c) 0.31 Sundsvall 0.31 0.26 0.44 0.27 0.28 Borlänge 0.31 0.33 0.3 Stockholm 0.34 0.37 0.33 0.44 Göteborg 0.33 0.31 0.44 12 14 16 18 20 Longitude 12 14 16 18 20 Longitude

Seasonal variability Log standard deviation 0.5 1.0 1.5 2.0 2.5 Sundsvalls Flygplats 0.5 1.0 1.5 2.0 2.5 Målilla 0.5 1.0 1.5 2.0 2.5 Films Kyrkby A 0.5 1.0 1.5 2.0 2.5 Kilsbergen Suttarboda Jan Apr Jul Oct Month Jan Apr Jul Oct Month Jan Apr Jul Oct Month Jan Apr Jul Oct Month Modulate noise ζ t (s) = exp(α t (s))η t (s) α t (s) two term Fourier series

Both long and short memory Consider a stationary Gaussian process with spectral density Short term memory Examples: S η (f) = B(f) 4sin 2 (πf) δ Long term memory B(f) constant: fractionally differenced process (FD) B(f) exponential: fractional exponential process (FEXP) (log B truncated Fourier series)

Estimated SDFs of standardized noise Log SDF 30 10 10 30 Sundsvalls Flygplats FEXP FD 30 10 10 30 Målilla 10 8 6 4 2 10 8 6 4 Log SDF 30 10 10 30 Films Kyrkby A 30 10 10 30 Kilsbergen Suttarboda A 10 8 6 4 2 Log frequency 8 6 4 2 Log frequency Clear evidence of both short and long memory parts

Space-time model Gaussian white measurement error Process model in wavelet space scaling coefficients have mean linear in time and latitude separable space-time covariance Gaussian spatially varying parameters

Dependence parameters LTM 57 58 59 60 61 62 63 post. mean d 0.50 0.48 0.46 0.44 0.42 0.40 0.38 57 58 59 60 61 62 63 post. stdev d 0.09 0.08 0.07 0.06 0.05 0.04 0.03 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 57 58 59 60 61 62 63 post. mean theta1 0.8 0.6 0.4 0.2 0.0 57 58 59 60 61 62 63 post. stdev theta1 0.35 0.30 0.25 0.20 0.15 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 Short term 57 58 59 60 61 62 63 post. mean theta2 0.30 0.25 0.20 0.15 0.10 57 58 59 60 61 62 63 post. stdev theta2 0.14 0.12 0.10 0.08 0.06 0.04 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 57 58 59 60 61 62 63 post. mean theta3 0.25 0.20 0.15 57 58 59 60 61 62 63 post. stdev theta3 0.12 0.10 0.08 0.06 0.04 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 Figure 8: Posterior predicted means (left panel) and standard deviations (right panel) for the parameters characterizing the spatially-varying FEXP process. 25

Trend estimates

Estimating grid squares Pick q locations systematically in the grid square Draw sample from posterior distribution of Y(s,t) for s in the locations and t in the season Compute seasonal average Compute grid square average

Downscaling Climatology terms: Dynamic downscaling Stochastic downscaling Statistical downscaling Here we are using the term to allow data assimilation for RCM point prediction using RCM

Downscaling model (0.91,0.95) Y(s,t) =! β 0 (s,t) + β 1!x(s,t) + ε(s,t)!β 0 (s,t) = β 0 (t) + β(s,t) smoothed RCM Winter Spring Beta_0,t 4 2 0 2 Beta_0,t 1 0 1 2 1965 1975 1985 1995 2005 Year 1965 1975 1985 1995 2005 Year Summer Autumn Beta_0,t 1 0 1 2 3 1965 1975 1985 1995 2005 Year Beta_0,t 1.5 1.0 0.5 0.0 0.5 1.0 1.5 1965 1975 1985 1995 2005 Year

Comparisons

Reserved stations Borlänge: Airport that has changed ownership, lots of missing data Stockholm: One of the longest temperature series in the world. Located in urban park. Göteborg: Urban site, located just outside the grid of model output

Predictions and data

Annual scale Downscaling Upscaling Borlänge 1960 1970 1980 1990 2000 2010 1960 1970 1980 1990 2000 2010 Stockholm 1960 1970 1980 1990 2000 2010 1960 1970 1980 1990 2000 2010 Göteborg 1960 1970 1980 1990 2000 2010 1960 1970 1980 1990 2000 2010

Comments Nonstationarity in mean in covariance Uncertainty in model output Extreme seasons where down-and upscaling agree with each other but not with the model output Model correction approaches

Statistical analysis of computer code output Often the process model is expensive to run (in time, at least), especially if different runs needed for MCMC Need to develop real-time approximation to process model Kalman filter is a dynamic linear model approximation SACCO is an alternative Bayesian approach

Basic framework An emulator is a random (Gaussian) process η(x) approximating the process model for input x in R m. Prior mean m(x) = h(x) T β Prior covariance v(x 1,x 2 ) = σ 2 c(x 1,x 2 ) Run the model at n input values to get n output values, so (d β,σ 2 ) N(Hβ,σ 2 C) (η( i) β,σ 2,d) N(m,Σ )

The emulator Integrating out β and σ 2 we get η(x) m (x) t ˆσc (x,x) 1 2 n q where q = dim(β) and m (x) = h(x) T ˆβ + t(x) T C 1 (d Hˆβ) where t(x) T = (c(x,x 1 ),,c(x,x n )) m** is the emulator, and we can also calculate its variance

y=7+x+cos(2x) q=1, h T (x)=(1 x) n=5 An example

Conclusions Model assessment constraints: amount of data data quality ease of producing model runs degree of misalignment Ideally the model should have similar first and second order properties to the data similar peaks and troughs to data (or simulations based on the data)