Fusing point and areal level space-time data. data with application to wet deposition

Similar documents
Fusing point and areal level space-time data with application

Combining Monitoring Data and Computer model Output in Assessing Environmental Exposure

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Space-time downscaling under error in computer model output

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Fusing space-time data under measurement error for computer model output

Bayesian hierarchical modelling for data assimilation of past observations and numerical model forecasts

The Importance of Ammonia in Modeling Atmospheric Transport and Deposition of Air Pollution. Organization of Talk:

Hierarchical Modeling for Multivariate Spatial Data

A Spatio-Temporal Downscaler for Output From Numerical Models

NRCSE. Misalignment and use of deterministic models

Hierarchical Modelling for Multivariate Spatial Data

Spatio-temporal modeling of fine particulate matter

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Statistics for extreme & sparse data

Nearest Neighbor Gaussian Processes for Large Spatial Data

Hierarchical Modeling for Spatio-temporal Data

Bruno Sansó. Department of Applied Mathematics and Statistics University of California Santa Cruz bruno

Standardized Anomaly Model Output Statistics Over Complex Terrain.

Space-Time Data fusion Under Error in Computer Model Output: An Application to Modeling Air Quality

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Climate also has a large influence on how local ecosystems have evolved and how we interact with them.

Bayesian spatial quantile regression

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Bayesian Approaches for Model Evaluation and Uncertainty Estimation

The Canadian ADAGIO Project for Mapping Total Atmospheric Deposition

Sub-kilometer-scale space-time stochastic rainfall simulation

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Hierarchical Bayesian models for space-time air pollution data

A spatial causal analysis of wildfire-contributed PM 2.5 using numerical model output. Brian Reich, NC State

Non-gaussian spatiotemporal modeling

Analysis of Radar-Rainfall Uncertainties and effects on Hydrologic Applications. Emad Habib, Ph.D., P.E. University of Louisiana at Lafayette

Impacts of climate change on flooding in the river Meuse

Hierarchical Modeling for Univariate Spatial Data

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

A Geostatistical Approach to Linking Geographically-Aggregated Data From Different Sources

Chapter 4 - Fundamentals of spatial processes Lecture notes

Hierarchical Bayesian auto-regressive models for large space time data with applications to ozone concentration modelling

Spatial Inference of Nitrate Concentrations in Groundwater

Climate Change Impact Analysis

Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution

X t = a t + r t, (7.1)

Modeling daily precipitation in Space and Time

Hierarchical Modelling for Univariate Spatial Data

Bayesian Dynamic Linear Modelling for. Complex Computer Models

Spatial Misalignment

GAMINGRE 8/1/ of 7

Current and Future Impacts of Wildfires on PM 2.5, Health, and Policy in the Rocky Mountains

Convective-scale NWP for Singapore

Temporal and spatial variations in radiation and energy fluxes across Lake Taihu

Summary STK 4150/9150

Bayesian Forecasting Using Spatio-temporal Models with Applications to Ozone Concentration Levels in the Eastern United States

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case

Record date Payment date PID element Non-PID element. 08 Sep Oct p p. 01 Dec Jan p 9.85p

Multivariate spatial modeling

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

2016 Meteorology Summary

Contents 1 Introduction 2 Statistical Tools and Concepts

Hierarchical Bayesian Modeling of Multisite Daily Rainfall Occurrence

Statistical Models for Monitoring and Regulating Ground-level Ozone. Abstract

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

Climate Change: the Uncertainty of Certainty

Recovery Analysis Methods and Data Requirements Study

Communicating Climate Change Consequences for Land Use

Statistical Models for Rainfall with Applications to Index Insura

Forecasting Wind Ramps

Climatic and Ecological Conditions in the Klamath Basin of Southern Oregon and Northern California: Projections for the Future

Statistical downscaling daily rainfall statistics from seasonal forecasts using canonical correlation analysis or a hidden Markov model?

Daily Rainfall Disaggregation Using HYETOS Model for Peninsular Malaysia

Geostatistical Modeling for Large Data Sets: Low-rank methods

Overview on the modelling setup Seven new features in the modelling framework First results for the Alpine Rhine and Engadin Conclusions and Outlook

SIS Meeting Oct AgriCLASS Agriculture CLimate Advisory ServiceS. Phil Beavis - Telespazio VEGA UK Indicators, Models and System Design

Overview of Spatial Statistics with Applications to fmri

Gridded observation data for Climate Services

A spatio-temporal model for extreme precipitation simulated by a climate model

Challenges in modelling air pollution and understanding its impact on human health

Weather generators for studying climate change

Hierarchical Modelling for Univariate Spatial Data

Gaussian predictive process models for large spatial data sets.

A STATISTICAL TECHNIQUE FOR MODELLING NON-STATIONARY SPATIAL PROCESSES

Wrapped Gaussian processes: a short review and some new results

GIST 4302/5302: Spatial Analysis and Modeling

Chapter 6 Problems with the calibration of Gaussian HMMs to annual rainfall

The Generalized Likelihood Uncertainty Estimation methodology

Simulations of Lake Processes within a Regional Climate Model

Spatial Misalignment

The National Atmospheric Deposition Program (NADP)

Climatography of the United States No

Bayesian Modeling and Inference for High-Dimensional Spatiotemporal Datasets

STATISTICS OF CLIMATE EXTREMES// TRENDS IN CLIMATE DATASETS

IBM Research Report. Gust Speed Forecasting Using Weather Model Outputs and Meteorological Observations

Application of Real-Time Rainfall Information System to CSO control. 2 October 2011 Naruhito Funatsu METAWATER Co., Ltd.

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models

Spatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields

Predicting Long-term Exposures for Health Effect Studies

Transcription:

Fusing point and areal level space-time data with application to wet deposition Alan Gelfand Duke University Joint work with Sujit Sahu and David Holland

Chemical Deposition Combustion of fossil fuel produces various chemicals including sulfate and nitrate gases. In the eastern U.S., most SO 2, and NO x release attributed to power plants. Emitted to the air; wet deposition and dry deposition; interest in total deposition. Deposition means return to the earth s surface by means of precipitation (rain or snow) for example. Wet Deposition = Precipitation Concentration. Wet deposition is responsible for damage to lakes, forests, and streams.

Chemical Deposition Combustion of fossil fuel produces various chemicals including sulfate and nitrate gases. In the eastern U.S., most SO 2, and NO x release attributed to power plants. Emitted to the air; wet deposition and dry deposition; interest in total deposition. Deposition means return to the earth s surface by means of precipitation (rain or snow) for example. Wet Deposition = Precipitation Concentration. Wet deposition is responsible for damage to lakes, forests, and streams.

Chemical Deposition Combustion of fossil fuel produces various chemicals including sulfate and nitrate gases. In the eastern U.S., most SO 2, and NO x release attributed to power plants. Emitted to the air; wet deposition and dry deposition; interest in total deposition. Deposition means return to the earth s surface by means of precipitation (rain or snow) for example. Wet Deposition = Precipitation Concentration. Wet deposition is responsible for damage to lakes, forests, and streams.

Chemical Deposition Combustion of fossil fuel produces various chemicals including sulfate and nitrate gases. In the eastern U.S., most SO 2, and NO x release attributed to power plants. Emitted to the air; wet deposition and dry deposition; interest in total deposition. Deposition means return to the earth s surface by means of precipitation (rain or snow) for example. Wet Deposition = Precipitation Concentration. Wet deposition is responsible for damage to lakes, forests, and streams.

Chemical Deposition Combustion of fossil fuel produces various chemicals including sulfate and nitrate gases. In the eastern U.S., most SO 2, and NO x release attributed to power plants. Emitted to the air; wet deposition and dry deposition; interest in total deposition. Deposition means return to the earth s surface by means of precipitation (rain or snow) for example. Wet Deposition = Precipitation Concentration. Wet deposition is responsible for damage to lakes, forests, and streams.

Chemical Deposition Combustion of fossil fuel produces various chemicals including sulfate and nitrate gases. In the eastern U.S., most SO 2, and NO x release attributed to power plants. Emitted to the air; wet deposition and dry deposition; interest in total deposition. Deposition means return to the earth s surface by means of precipitation (rain or snow) for example. Wet Deposition = Precipitation Concentration. Wet deposition is responsible for damage to lakes, forests, and streams.

The first stage likelihood f (P, Z, Q U, Y, X, V, Ṽ) = f (P U, V) f (Z Y, V) f (Q X, Ṽ) which takes the form [ { } T n t=1 i=1 1 exp u(si, t) 1 exp y(si, t) I (v(s i, t) > 0) { }] 1 exp x(aj, t) I (ṽ(a j, t) > 0) J j=1 where 1 x denotes a degenerate distribution with point mass at x and I( ) is the indicator function.

Deposition Model Y (s i, t) = β 0 + β 1 U(s i, t) + β 2 V (s i, t) + (b 0 + b(s i )) X(A ki, t) +η(s i, t) + ɛ(s i, t). Spatially varying coefficients, b = (b(s 1 ),..., b(s n )) is a Gaussian process (GP). Spatio-temporal intercept η t = (η(s 1, t),..., η(s n, t)) is a GP independent in time. Allow for spatially varying calibration of CMAQ. Could imagine common η(s i ). ɛ(s i, t) N(0, σ 2 ɛ ), provides the nugget effect.

The second stage models... Precipitation U(s i, t) = α 0 + α 1 V (s i, t) + δ(s i, t), where δ t = (δ(s 1, t),..., δ(s n, t)) is a GP independent in time. CMAQ output X(A j, t) = γ 0 + γ 1 Ṽ (A j, t) + ψ(a j, t), j = 1,..., J. Assume ψ(a j, t) N(0, σψ 2 ), independently.

Specification of latent processes Measurement Error Model (MEM) V (s i, t) N(Ṽ (A k i, t), σv 2 ), i = 1,..., n, t = 1,..., T. The process Ṽ (A j, t) is AR in time and CAR in space Ṽ (A j, t) = ρṽ (A j, t 1) + ζ(a j, t), ( J ) ζ(a j, t) N h ji ζ(a i, t), σ2 ζ, m j i=1 Let j define the m j neighboring grid cells of the cell A j. h ji = { 1 m j if i j 0 otherwise.

AR in time and CAR in Space Assume the initial condition for Ṽ0: Ṽ (A j, 0) = 1 T T X(A j, t), t=1 giving Ṽ0. Now we can write the CAR in closed form: f (Ṽt Ṽt 1, ρ, σ 2 ζ ) { exp 1 (Ṽt ρṽt 1) D 1 (I H) (Ṽt ρṽt 1) }, 2 D is a diagonal matrix with entries σ 2 ζ /m j. Note that this is an improper CAR.

National Atmospheric Deposition Program (NADP) NADP collects point-referenced data at several sites. They then use simple interpolation to produce maps. 12 13 11 11 12 19 13 13 19 22 17 14 10 12 8 14 13 4 5 6 7 7 6 6 4 13 16 6 9 16 6 8 15 8 14 4 6 14 15 16 9 4 7 8 10 10 17 18 16 13 11 15 15 16 17 12 13 28 11 10 12 17 14 16 16 19 24 13 18 21 16 21 15 9 13 19 8 12 12 30 16 20 20 10 11 11 13 17 18 10 11 8 9 10 15 11 10 16 11 19 13 16 18 20 15 18 22 13 16 15 11 12 14 22 6 5 10 16 14 13 6 8 12 6 6 11

The CMAQ model Community Multi-scale Air Quality Model (CMAQ) A computer simulation model which produces averaged output on 36km, 12 km(used here), and now 4 km grid cells Uses variables such as power station emission volumes, meteorological data, land-use, etc. with atmospheric science (appropriate differential equations) to predict deposition levels. Not driven by monitoring station data. Predictions are biased but no missing data; monitoring data provide more accurate deposition but missingness

The CMAQ model Community Multi-scale Air Quality Model (CMAQ) A computer simulation model which produces averaged output on 36km, 12 km(used here), and now 4 km grid cells Uses variables such as power station emission volumes, meteorological data, land-use, etc. with atmospheric science (appropriate differential equations) to predict deposition levels. Not driven by monitoring station data. Predictions are biased but no missing data; monitoring data provide more accurate deposition but missingness

The CMAQ model Community Multi-scale Air Quality Model (CMAQ) A computer simulation model which produces averaged output on 36km, 12 km(used here), and now 4 km grid cells Uses variables such as power station emission volumes, meteorological data, land-use, etc. with atmospheric science (appropriate differential equations) to predict deposition levels. Not driven by monitoring station data. Predictions are biased but no missing data; monitoring data provide more accurate deposition but missingness

The CMAQ model Community Multi-scale Air Quality Model (CMAQ) A computer simulation model which produces averaged output on 36km, 12 km(used here), and now 4 km grid cells Uses variables such as power station emission volumes, meteorological data, land-use, etc. with atmospheric science (appropriate differential equations) to predict deposition levels. Not driven by monitoring station data. Predictions are biased but no missing data; monitoring data provide more accurate deposition but missingness

Our Contribution A fully model-based framework for fusing the NADP and CMAQ wet deposition data To accommodate the point masses at 0, i.e., no wet deposition if no precipitation To accommodate misalignment between NADP data at points and CMAQ data at grid cells in a computational feasible way across space and time To provide spatial interpolation and temporal aggregation Both sulfate and nitrate deposition

Our Contribution A fully model-based framework for fusing the NADP and CMAQ wet deposition data To accommodate the point masses at 0, i.e., no wet deposition if no precipitation To accommodate misalignment between NADP data at points and CMAQ data at grid cells in a computational feasible way across space and time To provide spatial interpolation and temporal aggregation Both sulfate and nitrate deposition

Our Contribution A fully model-based framework for fusing the NADP and CMAQ wet deposition data To accommodate the point masses at 0, i.e., no wet deposition if no precipitation To accommodate misalignment between NADP data at points and CMAQ data at grid cells in a computational feasible way across space and time To provide spatial interpolation and temporal aggregation Both sulfate and nitrate deposition

Our Contribution A fully model-based framework for fusing the NADP and CMAQ wet deposition data To accommodate the point masses at 0, i.e., no wet deposition if no precipitation To accommodate misalignment between NADP data at points and CMAQ data at grid cells in a computational feasible way across space and time To provide spatial interpolation and temporal aggregation Both sulfate and nitrate deposition

Our Contribution A fully model-based framework for fusing the NADP and CMAQ wet deposition data To accommodate the point masses at 0, i.e., no wet deposition if no precipitation To accommodate misalignment between NADP data at points and CMAQ data at grid cells in a computational feasible way across space and time To provide spatial interpolation and temporal aggregation Both sulfate and nitrate deposition

Inverse Distance weighting (IDW) Poor person s methodology Value at a new site = weighted mean of observations, Weights inversely proportional to the square of the distance. Problems Not model based! Unable to accommodate known covariate - precipitation! Unable to handle 0 s, unable to handle missing data Can t fuse with model output data With dynamic data, can only do independent weekly or aggregated annually No associated uncertainty maps!

Inverse Distance weighting (IDW) Poor person s methodology Value at a new site = weighted mean of observations, Weights inversely proportional to the square of the distance. Problems Not model based! Unable to accommodate known covariate - precipitation! Unable to handle 0 s, unable to handle missing data Can t fuse with model output data With dynamic data, can only do independent weekly or aggregated annually No associated uncertainty maps!

Inverse Distance weighting (IDW) Poor person s methodology Value at a new site = weighted mean of observations, Weights inversely proportional to the square of the distance. Problems Not model based! Unable to accommodate known covariate - precipitation! Unable to handle 0 s, unable to handle missing data Can t fuse with model output data With dynamic data, can only do independent weekly or aggregated annually No associated uncertainty maps!

Change of support problem Fuentes and Raftery, 2005 Need to upscale (block-average) point level Z (s, t) to obtain grid level Z (A j, t). Z (A j, t) = 1 Z (s, t) ds, (1) A j A j PSfrag replacements A ki si Many more A s than s s Use MEM (measurement error model) at point level centred around grid level values Make inference at the point level by downscaling Huge computational advantages, NO integration (1).

Change of support problem Fuentes and Raftery, 2005 Need to upscale (block-average) point level Z (s, t) to obtain grid level Z (A j, t). Z (A j, t) = 1 Z (s, t) ds, (1) A j A j PSfrag replacements A ki si Many more A s than s s Use MEM (measurement error model) at point level centred around grid level values Make inference at the point level by downscaling Huge computational advantages, NO integration (1).

Hierarchical Bayesian Model Model data weekly. Fuse with gridded weekly CMAQ output. Use weekly precipitation information, available from other monitoring networks. Interpolate in space, predict in time Obtain quarterly and annual maps. Reveal spatial pattern in deposition. Results are illustrative, not definitive.

Our data set Data Weekly deposition data from 128 sites in the eastern U.S. for the year 2001. Use 120 sites to estimate, remaining 8 to validate. Weekly CMAQ output from J = 33, 390 grid cells (about 1.8 million values!) Weekly precipitation data from 2827 predictive sites.

Location of the NADP A B C D E F G H Figure: A map of the study region; points denote the NADP sites for fitting and A-H denote the eight validation sites.

Annual Precipitation 200 150 100 50 Figure: Map of annual total precipitation in 2001.

Exploratory Analyses 4 2.5 2.0 3 1.5 wetso4 2 wetno3 1.0 1 0.5 0 0.0 4 8 12 16 21 25 30 34 38 43 48 52 4 8 12 16 21 25 30 34 38 43 48 52 (a) (b) Figure: Boxplot of weekly depositions: (a) sulfate and (b) nitrate.

Exploratory Analyses... log precipitation log sulfate 6 4 2 0 2 6 4 2 0 (a) log precipitation log nitrate 6 4 2 0 2 6 4 2 0 (b) Figure: Deposition against precipitation (both on the log scale): (a) sulfate and (b) nitrate.

Exploratory Analyses... log cmaq vals log sulfate 6 4 2 0 6 4 2 0 (a) log cmaq vals log nitrate 6 4 2 0 6 4 2 0 (b) Figure: Deposition at the NADP sites against the CMAQ values in the grid cell covering the corresponding NADP site on the log scale: (a) sulfate and (b) nitrate.

Modeling Requirements No deposition, Z (s i, t), without precipitation, P(s i, t); enforced by a latent atmospheric space-time process V (s i, t) below, i = 1,..., n = 120, and for each week t, t = 1,..., 52. Similarly, model CMAQ output, Q(A j, t) for each grid cell A j, j = 1,..., J = 33, 390 and for each week t, modeled using a latent atmospheric areal process Ṽ (A j, t). Model everything on the log-scale; latent processes take care of point masses at zero. Avoid log(0) problems.

First stage models Precipitation model { exp (U(si, t)) if V (s P(s i, t) = i, t) > 0 0 otherwise, Deposition model Z (s i, t) = { exp (Y (si, t)) if V (s i, t) > 0 0 otherwise. Model for CMAQ output { exp (X(Aj, t)) if Q(A j, t) = Ṽ (A j, t) > 0 0 otherwise.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

Clarification P s, Z s, and Q s are the observed precipitation, NADP deposition, and CMAQ deposition, respectively. V (s, t) is a conceptual point level latent atmospheric process which drives P(s, t) and Z (s, t). P(s, t) and Z (s, t) = 0 if V (s, t) 0. U(s, t) and Y (s, t) are log precipitation and deposition, respectively. Models below will specify their values when V (s, t) 0 or if P(s, t) or Z (s, t) are missing. Ṽ (A j, t) is a conceptual areal level latent atmospheric process which drives Q(A j, t). X(A j, t) is log CMAQ output where modeling below will specify its values when Ṽ (A j, t) 0.

The first stage likelihood f (P, Z, Q U, Y, X, V, Ṽ) = f (P U, V) f (Z Y, V) f (Q X, Ṽ) which takes the form [ { } T n t=1 i=1 1 exp u(si, t) 1 exp y(si, t) I (v(s i, t) > 0) { }] 1 exp x(aj, t) I (ṽ(a j, t) > 0) J j=1 where 1 x denotes a degenerate distribution with point mass at x and I( ) is the indicator function.

Deposition Model Y (s i, t) = β 0 + β 1 U(s i, t) + β 2 V (s i, t) + (b 0 + b(s i )) X(A ki, t) +η(s i, t) + ɛ(s i, t). Spatially varying coefficients, b = (b(s 1 ),..., b(s n )) is a Gaussian process (GP). Spatio-temporal intercept η t = (η(s 1, t),..., η(s n, t)) is a GP independent in time. Allow for spatially varying calibration of CMAQ. Could imagine common η(s i ). ɛ(s i, t) N(0, σ 2 ɛ ), provides the nugget effect.

The second stage models... Precipitation U(s i, t) = α 0 + α 1 V (s i, t) + δ(s i, t), where δ t = (δ(s 1, t),..., δ(s n, t)) is a GP independent in time. CMAQ output X(A j, t) = γ 0 + γ 1 Ṽ (A j, t) + ψ(a j, t), j = 1,..., J. Assume ψ(a j, t) N(0, σψ 2 ), independently.

Specification of latent processes Measurement Error Model (MEM) V (s i, t) N(Ṽ (A k i, t), σv 2 ), i = 1,..., n, t = 1,..., T. The process Ṽ (A j, t) is AR in time and CAR in space Ṽ (A j, t) = ρṽ (A j, t 1) + ζ(a j, t), ( J ) ζ(a j, t) N h ji ζ(a i, t), σ2 ζ, m j i=1 Let j define the m j neighboring grid cells of the cell A j. h ji = { 1 m j if i j 0 otherwise.

AR in time and CAR in Space Assume the initial condition for Ṽ0: Ṽ (A j, 0) = 1 T T X(A j, t), t=1 giving Ṽ0. Now we can write the CAR in closed form: f (Ṽt Ṽt 1, ρ, σ 2 ζ ) { exp 1 (Ṽt ρṽt 1) D 1 (I H) (Ṽt ρṽt 1) }, 2 D is a diagonal matrix with entries σ 2 ζ /m j. Note that this is an improper CAR.

Clarification Note that we can have Z > 0, Q = 0 and vice versa. Therefore V and Ṽ can have opposite signs This arises because we are modeling at two different scales - need processes at two different scales. We can view V (s, t) Ṽ (A, t)as a deviation from the areal average. We assume these realized deviations are independent across space and time We have a conditional model for V and X given Ṽ. The resulting marginal model for U and Y given Ṽ is multiscale - additive random effects at two scales

Clarification Note that we can have Z > 0, Q = 0 and vice versa. Therefore V and Ṽ can have opposite signs This arises because we are modeling at two different scales - need processes at two different scales. We can view V (s, t) Ṽ (A, t)as a deviation from the areal average. We assume these realized deviations are independent across space and time We have a conditional model for V and X given Ṽ. The resulting marginal model for U and Y given Ṽ is multiscale - additive random effects at two scales

Clarification Note that we can have Z > 0, Q = 0 and vice versa. Therefore V and Ṽ can have opposite signs This arises because we are modeling at two different scales - need processes at two different scales. We can view V (s, t) Ṽ (A, t)as a deviation from the areal average. We assume these realized deviations are independent across space and time We have a conditional model for V and X given Ṽ. The resulting marginal model for U and Y given Ṽ is multiscale - additive random effects at two scales

Clarification Note that we can have Z > 0, Q = 0 and vice versa. Therefore V and Ṽ can have opposite signs This arises because we are modeling at two different scales - need processes at two different scales. We can view V (s, t) Ṽ (A, t)as a deviation from the areal average. We assume these realized deviations are independent across space and time We have a conditional model for V and X given Ṽ. The resulting marginal model for U and Y given Ṽ is multiscale - additive random effects at two scales

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

The second stage specification Deposition model: f (Y t U t, V t, X t, η t, b, θ). Space-time intercept: f (η t θ). Precipitation model: f (U t V t, θ). Measurement Error model: f (V t Ṽ(1) t, θ) CMAQ Model: f (X t Ṽt, θ) AR and CAR Model: f (Ṽt Ṽt 1, θ). T t=1 [ f (Yt U t, V t, X t, η t, b, θ) f (η t θ) f (U t V ] t, θ) f (V t Ṽ(1) t, θ) f (X t Ṽt, θ)f (Ṽt Ṽt 1, θ) f (b θ). θ = (α 0, α 1, β 0, β 1, β 2, b 0, γ 0, γ 1, ρ, σ 2 δ, σ2 b, σ2 η, σ2 ɛ, σ2 ψ, σ2 v, σ 2 ζ ).

Graphical representation of our model. Regional atmospheric driver (point) Regional atmospheric centering process (areal) replacements δ(s i, t) V (s i, t) MEM Ṽ (A ki, t) b(s i ) η(s i, t) U(s i, t) Y (s i, t) X(A ki, t) Ṽ (A j, t 1) P(s i, t) (observed precipitation) Z(s i, t) (observed deposition) Q(A ki, t) (CMAQ model output)

Graphical representation of a fusion model. replacements Regional atmospheric driver (point) δ(s i, t) η(s i, t) V (s i, t) block average V (A ki, t) a(s i ) U(s i, t) Y (true) (s i, t) block average X(A ki, t) Q(A ki, t) Z (true) (s i, t) true P(s i, t) (observed precipitation) Y (s i, t) Q(A ki, t) Z(s i, t) (observed deposition) (CMAQ model output)

Predictions at new locations At a new site s and time t we need Z (s, t ) which depends on Y (s, t ). If P(s, t ) = 0 then Z (s, t ) = 0. Suppose otherwise. Bayesian predictive distributions: π(z pred z obs ) = π(z pred par) π(par z obs )dpar. Need to simulate Y (s, t ). V (s, t ) N(Ṽ (A, t ), σ 2 v). U(s, t ), η(s, t ) and b(s ) are simulated from the conditional distributions at s given s 1,..., s n. Kriging. X(A, t ) =logq(a, t ) if Q(A, t ) > 0, otherwise updated in the MCMC. More details in the paper.

Predictions at new locations At a new site s and time t we need Z (s, t ) which depends on Y (s, t ). If P(s, t ) = 0 then Z (s, t ) = 0. Suppose otherwise. Bayesian predictive distributions: π(z pred z obs ) = π(z pred par) π(par z obs )dpar. Need to simulate Y (s, t ). V (s, t ) N(Ṽ (A, t ), σ 2 v). U(s, t ), η(s, t ) and b(s ) are simulated from the conditional distributions at s given s 1,..., s n. Kriging. X(A, t ) =logq(a, t ) if Q(A, t ) > 0, otherwise updated in the MCMC. More details in the paper.

Predictions at new locations At a new site s and time t we need Z (s, t ) which depends on Y (s, t ). If P(s, t ) = 0 then Z (s, t ) = 0. Suppose otherwise. Bayesian predictive distributions: π(z pred z obs ) = π(z pred par) π(par z obs )dpar. Need to simulate Y (s, t ). V (s, t ) N(Ṽ (A, t ), σ 2 v). U(s, t ), η(s, t ) and b(s ) are simulated from the conditional distributions at s given s 1,..., s n. Kriging. X(A, t ) =logq(a, t ) if Q(A, t ) > 0, otherwise updated in the MCMC. More details in the paper.

Predictions at new locations At a new site s and time t we need Z (s, t ) which depends on Y (s, t ). If P(s, t ) = 0 then Z (s, t ) = 0. Suppose otherwise. Bayesian predictive distributions: π(z pred z obs ) = π(z pred par) π(par z obs )dpar. Need to simulate Y (s, t ). V (s, t ) N(Ṽ (A, t ), σ 2 v). U(s, t ), η(s, t ) and b(s ) are simulated from the conditional distributions at s given s 1,..., s n. Kriging. X(A, t ) =logq(a, t ) if Q(A, t ) > 0, otherwise updated in the MCMC. More details in the paper.

Choosing the spatial decay parameters Estimation is challenging due to weak identifiability of variances and ranges. Inconsistent see Stein (1999) and Zhang (2004). So, we fix φ η, φ δ and φ b Use validation mean-square error VMSE = 1 n v 8 52 i=1 t=1 ( Z (s i, t) Ẑ (s i, t) ) 2 I(observed) n v = total number of validation observed (=407, here). Optimal ranges were 500, 1000 and 500 kilometers. VMSE is not sensitive near these values.

Model Validation 4 6 5 3 4 Prediction 3 Prediction 2 O 2 1 0 O O O O O O O O O O O OOO OO OO OO OO OO OO O O O OO O O O O O O O OO O O O O O O O O O O O 1 0 O O O O O O O O O O O O O O O O O O OO OO OO OO OO OO OOO O O O OO O OO O O O OO O O OO O O O O O O O O O OO O O O O O O O O O O O O 0 1 2 3 Sulfate Observation (a) 0.0 0.5 1.0 1.5 2.0 Nitrate Observation (b) Figure: Validation versus the observed values at the 8 reserved sites. Validation prediction intervals are plotted as vertical lines. (a) sulfate and (b) nitrate.

b(s) Surfaces 0.2 0.1 0.0 0.1 0.2 0.15 0.10 0.05 0.0 0.05 0.10 0.15 (a) (b) 0.25 0.20 0.15 0.10 0.05 0.25 0.20 0.15 0.10 0.05 (c) (d) Figure: (a) Sulfate. (b) Nitrate. (c) The s.d. for sulfate. (d) The s.d. for nitrate.

Spatially varying slopes? Do we need the spatially varying b(s) s? Only a few of the b(s i ) are significant; they are small relative to their standard errors. Importance of precipitation and the spatially varying intercept makes it difficult to find spatially varying contribution of CMAQ Fusion approaches also have not found spatially varying intercepts Still can see space-time bias in CMAQ by comparing model predictions with CMAQ output.

Spatially varying slopes? Do we need the spatially varying b(s) s? Only a few of the b(s i ) are significant; they are small relative to their standard errors. Importance of precipitation and the spatially varying intercept makes it difficult to find spatially varying contribution of CMAQ Fusion approaches also have not found spatially varying intercepts Still can see space-time bias in CMAQ by comparing model predictions with CMAQ output.

Spatially varying slopes? Do we need the spatially varying b(s) s? Only a few of the b(s i ) are significant; they are small relative to their standard errors. Importance of precipitation and the spatially varying intercept makes it difficult to find spatially varying contribution of CMAQ Fusion approaches also have not found spatially varying intercepts Still can see space-time bias in CMAQ by comparing model predictions with CMAQ output.

Spatially varying slopes? Do we need the spatially varying b(s) s? Only a few of the b(s i ) are significant; they are small relative to their standard errors. Importance of precipitation and the spatially varying intercept makes it difficult to find spatially varying contribution of CMAQ Fusion approaches also have not found spatially varying intercepts Still can see space-time bias in CMAQ by comparing model predictions with CMAQ output.

Spatially varying slopes? Do we need the spatially varying b(s) s? Only a few of the b(s i ) are significant; they are small relative to their standard errors. Importance of precipitation and the spatially varying intercept makes it difficult to find spatially varying contribution of CMAQ Fusion approaches also have not found spatially varying intercepts Still can see space-time bias in CMAQ by comparing model predictions with CMAQ output.

Parameter Estimates Sulfate Nitrate mean sd 95%CI mean sd 95%CI α 0 0.4497 0.0871 ( 0.6189, 0.2733) 0.3548 0.0596 ( 0.4695, -0.2369) α 1 0.1787 0.0379 (0.1017, 0.2499) 0.1522 0.0336 (0.0843, 0.2161) β 0 1.9414 0.0196 ( 1.9784, 1.9012) 1.9976 0.0192 ( 2.0344, 1.9605) β 1 0.9103 0.0067 (0.8972, 0.9240) 0.8412 0.0070 (0.8274, 0.8553) β 2 0.0029 0.0062 ( 0.0091, 0.0151) 0.0040 0.0060 ( 0.0078, 0.0159) b 0 0.0490 0.0053 (0.0386, 0.0599) 0.0535 0.0062 (0.0409, 0.0652) γ 0 3.0768 0.0035 ( 3.0836, -3.0700) 3.2177 0.0033 ( 3.2242, 3.2112) γ 1 0.8957 0.0034 (0.8891, 0.9025) 0.7368 0.0033 (0.7303, 0.7433) ρ 0.7688 0.0012 (0.7664, 0.7712) 0.7492 0.0013 (0.7468, 0.7517) σδ 2 2.6438 0.0602 (2.5254, 2.7631) 1.8694 0.0387 (1.7942, 1.9476) ση 2 0.2812 0.0101 (0.2616, 0.3010) 0.3354 0.0105 (0.3149, 0.3564) σɛ 2 0.0718 0.0057 (0.0607, 0.0832) 0.0727 0.0074 (0.0588, 0.0878) σψ 2 2.5062 0.0033 (2.4997, 2.5127) 2.2148 0.0028 (2.2092, 2.2203) σv 2 0.8087 0.0259 (0.7601, 0.8620) 0.7821 0.0237 (0.7366, 0.8290) σζ 2 0.4345 0.0011 (0.4322, 0.4367) 0.4340 0.0012 (0.4316, 0.4363)

IDW vs. our model Validation PMSE for the annual totals using the 8 holdout sites For sulfates, IDW PMSE is 20.4, our model PMSE is 8.1 For nitrates, IDW PMSE is 3.5, our model PMSE is 1.3 Would expect improvement given the complexity of our model. However 60% is substantial and perhaps justifies the effort

IDW vs. our model Validation PMSE for the annual totals using the 8 holdout sites For sulfates, IDW PMSE is 20.4, our model PMSE is 8.1 For nitrates, IDW PMSE is 3.5, our model PMSE is 1.3 Would expect improvement given the complexity of our model. However 60% is substantial and perhaps justifies the effort

IDW vs. our model Validation PMSE for the annual totals using the 8 holdout sites For sulfates, IDW PMSE is 20.4, our model PMSE is 8.1 For nitrates, IDW PMSE is 3.5, our model PMSE is 1.3 Would expect improvement given the complexity of our model. However 60% is substantial and perhaps justifies the effort

IDW vs. our model Validation PMSE for the annual totals using the 8 holdout sites For sulfates, IDW PMSE is 20.4, our model PMSE is 8.1 For nitrates, IDW PMSE is 3.5, our model PMSE is 1.3 Would expect improvement given the complexity of our model. However 60% is substantial and perhaps justifies the effort

Annual Sulfate Deposition 6 6 8 5 9 9 6 6 7 11 13 15 14 11 16 20 12 19 18 12 15 17 16 19 16 19 13 30 28 24 13 11 17 16 13 19 22 17 14 13 10 18 12 8 21 21 16 14 16 15 13 15 14 16 13 11 15 16 13 8 9 14 10 12 12 16 15 11 12 19 16 13 16 18 16 22 20 15 18 14 22 13 10 15 10 11 11 17 6 6 10 6 8 16 14 13 30 25 20 15 10 5 6 13 20 11 Figure: Model predicted map of annual sulfate deposition in 2001. The observed annual totals are labeled; a larger font size is used for the validation sites.

Annual Nitrate Deposition 9 7 10 7 11 10 8 9 9 13 13 17 13 13 15 16 11 14 14 14 14 15 13 13 12 15 11 17 17 18 13 9 11 13 13 12 16 14 11 11 9 11 10 8 11 11 12 11 9 10 10 12 11 7 10 12 8 6 7 11 8 8 8 11 13 10 12 21 12 10 14 13 14 16 15 14 11 12 11 14 9 8 9 10 7 8 8 8 10 6 7 11 7 8 13 10 10 20 15 10 5 6 9 12 10 Figure: Model predicted map of annual nitrate deposition in 2001. The observed annual totals are labeled; a larger font size is used for the validation sites.

Map of the length of 95% prediction intervals 5 10 15 20 Figure: Uncertainty map of annual sulfate deposition. 2 4 6 8 10 12 14 Figure: Uncertainty map of annual nitrate deposition.

Quarterly Sulfate Deposition (a) (b) 15 10 5 0 (c) (d) Figure: (a) Jan Mar, (b) Apr-Jun, (c) Jul Sep, (d) Oct-Dec.

Quarterly Nitrate Deposition (a) (b) 10 8 6 4 2 0 (c) (d) Figure: (a) Jan Mar, (b) Apr-Jun,(c) Jul Sep, (d) Oct-Dec.

Discussion Novel spatio-temporal model for fusing point and areal data which validates well. Inference can be provided for any spatial or temporal aggregation. With yearly data can study trends in deposition with regard to regulatory assessment There are models in between IDW and ours but may sacrifice the features we accommodate. Preferable to fusion using block averaging since number of modeled grid cells much greater than number of monitoring sites, even worse if across time. Develop model for dry deposition, hence total deposition.