NRCSE Misalignment and use of deterministic models
Work with Veronica Berrocal Peter Craigmile Wendy Meiring Paul Sampson Gregory Nikulin
The choice of spatial scale some questions 1. Which spatial scale is correct? 2. What if there is spatial misalignment? 3. How do we change from one spatial scale to another? 4. What if we have different spatial datasets that come to us on different spatial scales? 5. How do we combine data sources? We need to be careful not to be misled in our inferences.
Changes of support Observed Inference Method point point kriging point line contouring point area block kriging area point ecological inference area area misalignment
Some issues in model assessment Spatiotemporal misalignment Grid boxes vs observations Types of error Measurement error and bias Model error Approximation error Manipulate data or model output?
Assessing the SARMAP model 60 days of hourly observations at 32 sites in Sacramento region Hourly model runs for three episodes
Task Estimate from data the ozone level in a grid square. Issues: Transformation Diurnal cycle Temporal dependence Spatial dependence Space-time interaction
Transformation Heterogeneous variability mean and variance positively related Square root transformation All modeling now on square root scale approximately normal
Diurnal cycle
Spatial dependence
V t (s) = Estimate Estimating a grid square average Z t (s) V t (s) = µ t (s) + W t (s) W t (s) = α 1 (s)w t 1 (s) + α 2 (s)w t 2 (s) + Y t (s) 1 A A V 2 t (s)ds using 1 M E { V (s t j )2 data from 1,...,t } (not averages of squares of kriging estimates on the square root scale)
Afternoon comparison
Nighttime comparison
Regional climate models Not possible to do long runs of global models at fine resolution Regional models (dynamic downscaling) use global model as boundary conditions and runs on finer resolution Output is averaged over land use classes Weather prediction mode uses reanalysis as boundary conditions
Comparison of model to data Model output daily averaged 3hr predictions on (12.5 km) 2 grid Use open air predictions only RCA3 driven by ERA 40/ERA Interim Data daily averages point measurements (actually weighted average of three hourly measurements, min and max) Aggregate model and data to seasonal averages
Data SMHI synoptic stations in south central Sweden, 1961-2008 (61 missing) Longitude 56 58 60 62 64 66 68 2 Sundsvall 1 3 4 6 5 Borlänge 7 * 8 10 Stockholm 9,11 12 13 14 Göteborg 15 17 16 Lund Sweden Site ID 5 10 15 (6 missing) (66 missing) (14 missing) (129 missing) (30 missing) (1 missing) (2 missing) 12 14 16 18 20 22 24 Latitude 1960 1970 1980 1990 2000 2010 Year
Upscaling Geostatistics: predicting grid square averages from data Difficulties: Trends Seasonal variation Long term memory features Short term memory features
Long term memory models Realization ACS vs. lag Log SDF vs. log freq d=0 4 2 0 2 4 ACS 0.0 0.4 0.8 1 0 1 2 3 0 200 400 600 800 1000 0 10 20 30 40 6 5 4 3 2 1 d=0.25 4 2 0 2 4 ACS 0.0 0.4 0.8 1 0 1 2 3 0 200 400 600 800 1000 0 10 20 30 40 6 5 4 3 2 1 d=0.45 4 2 0 2 4 ACS 0.0 0.4 0.8 1 0 1 2 3 0 200 400 600 800 1000 0 10 20 30 40 6 5 4 3 2 1
A simple model Y t (s) = µ t (s) + ϕ t (s) + exp(α t (s))η t (s) space-time trend noise periodic seasonal component seasonal variability
Seasonal part φ t (s) = A(s)cos(2πt / 365.25 + θ(s)) (a) Latitude Estimated seasonality 56 58 60 62 64 10 0 5 10 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec (b) 9.71 Sundsvall 11.69 11.86 8.9 11.44 10.25 Borlänge 10.1 10.34 9.78 Stockholm 10.31 9.62 9.31 9.85 9.1 Göteborg 9.47 9.68 9.03 Latitude Month 56 58 60 62 64 (c) 0.31 Sundsvall 0.31 0.26 0.44 0.27 0.28 Borlänge 0.31 0.33 0.3 Stockholm 0.34 0.37 0.33 0.44 Göteborg 0.33 0.31 0.44 12 14 16 18 20 Longitude 12 14 16 18 20 Longitude
Seasonal variability Log standard deviation 0.5 1.0 1.5 2.0 2.5 Sundsvalls Flygplats 0.5 1.0 1.5 2.0 2.5 Målilla 0.5 1.0 1.5 2.0 2.5 Films Kyrkby A 0.5 1.0 1.5 2.0 2.5 Kilsbergen Suttarboda Jan Apr Jul Oct Month Jan Apr Jul Oct Month Jan Apr Jul Oct Month Jan Apr Jul Oct Month Modulate noise ζ t (s) = exp(α t (s))η t (s) α t (s) two term Fourier series
Both long and short memory Consider a stationary Gaussian process with spectral density Short term memory Examples: S η (f) = B(f) 4sin 2 (πf) δ Long term memory B(f) constant: fractionally differenced process (FD) B(f) exponential: fractional exponential process (FEXP) (log B truncated Fourier series)
Estimated SDFs of standardized noise Log SDF 30 10 10 30 Sundsvalls Flygplats FEXP FD 30 10 10 30 Målilla 10 8 6 4 2 10 8 6 4 Log SDF 30 10 10 30 Films Kyrkby A 30 10 10 30 Kilsbergen Suttarboda A 10 8 6 4 2 Log frequency 8 6 4 2 Log frequency Clear evidence of both short and long memory parts
Space-time model Gaussian white measurement error Process model in wavelet space scaling coefficients have mean linear in time and latitude separable space-time covariance Gaussian spatially varying parameters
Dependence parameters LTM 57 58 59 60 61 62 63 post. mean d 0.50 0.48 0.46 0.44 0.42 0.40 0.38 57 58 59 60 61 62 63 post. stdev d 0.09 0.08 0.07 0.06 0.05 0.04 0.03 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 57 58 59 60 61 62 63 post. mean theta1 0.8 0.6 0.4 0.2 0.0 57 58 59 60 61 62 63 post. stdev theta1 0.35 0.30 0.25 0.20 0.15 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 Short term 57 58 59 60 61 62 63 post. mean theta2 0.30 0.25 0.20 0.15 0.10 57 58 59 60 61 62 63 post. stdev theta2 0.14 0.12 0.10 0.08 0.06 0.04 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 57 58 59 60 61 62 63 post. mean theta3 0.25 0.20 0.15 57 58 59 60 61 62 63 post. stdev theta3 0.12 0.10 0.08 0.06 0.04 12 13 14 15 16 17 18 19 12 13 14 15 16 17 18 19 Figure 8: Posterior predicted means (left panel) and standard deviations (right panel) for the parameters characterizing the spatially-varying FEXP process. 25
Trend estimates
Estimating grid squares Pick q locations systematically in the grid square Draw sample from posterior distribution of Y(s,t) for s in the locations and t in the season Compute seasonal average Compute grid square average
Downscaling Climatology terms: Dynamic downscaling Stochastic downscaling Statistical downscaling Here we are using the term to allow data assimilation for RCM point prediction using RCM
Downscaling model (0.91,0.95) Y(s,t) =! β 0 (s,t) + β 1!x(s,t) + ε(s,t)!β 0 (s,t) = β 0 (t) + β(s,t) smoothed RCM Winter Spring Beta_0,t 4 2 0 2 Beta_0,t 1 0 1 2 1965 1975 1985 1995 2005 Year 1965 1975 1985 1995 2005 Year Summer Autumn Beta_0,t 1 0 1 2 3 1965 1975 1985 1995 2005 Year Beta_0,t 1.5 1.0 0.5 0.0 0.5 1.0 1.5 1965 1975 1985 1995 2005 Year
Comparisons
Reserved stations Borlänge: Airport that has changed ownership, lots of missing data Stockholm: One of the longest temperature series in the world. Located in urban park. Göteborg: Urban site, located just outside the grid of model output
Predictions and data
Annual scale Downscaling Upscaling Borlänge 1960 1970 1980 1990 2000 2010 1960 1970 1980 1990 2000 2010 Stockholm 1960 1970 1980 1990 2000 2010 1960 1970 1980 1990 2000 2010 Göteborg 1960 1970 1980 1990 2000 2010 1960 1970 1980 1990 2000 2010
Comments Nonstationarity in mean in covariance Uncertainty in model output Extreme seasons where down-and upscaling agree with each other but not with the model output Model correction approaches
Statistical analysis of computer code output Often the process model is expensive to run (in time, at least), especially if different runs needed for MCMC Need to develop real-time approximation to process model Kalman filter is a dynamic linear model approximation SACCO is an alternative Bayesian approach
Basic framework An emulator is a random (Gaussian) process η(x) approximating the process model for input x in R m. Prior mean m(x) = h(x) T β Prior covariance v(x 1,x 2 ) = σ 2 c(x 1,x 2 ) Run the model at n input values to get n output values, so (d β,σ 2 ) N(Hβ,σ 2 C) (η( i) β,σ 2,d) N(m,Σ )
The emulator Integrating out β and σ 2 we get η(x) m (x) t ˆσc (x,x) 1 2 n q where q = dim(β) and m (x) = h(x) T ˆβ + t(x) T C 1 (d Hˆβ) where t(x) T = (c(x,x 1 ),,c(x,x n )) m** is the emulator, and we can also calculate its variance
y=7+x+cos(2x) q=1, h T (x)=(1 x) n=5 An example
Conclusions Model assessment constraints: amount of data data quality ease of producing model runs degree of misalignment Ideally the model should have similar first and second order properties to the data similar peaks and troughs to data (or simulations based on the data)