Bayesian hierarchical modelling for data assimilation of past observations and numerical model forecasts

Similar documents
A Spatio-Temporal Downscaler for Output From Numerical Models

Bayesian spatial quantile regression

Fusing point and areal level space-time data. data with application to wet deposition

Statistics for extreme & sparse data

Bayesian Forecasting Using Spatio-temporal Models with Applications to Ozone Concentration Levels in the Eastern United States

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Hierarchical Bayesian models for space-time air pollution data

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

STATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Air Quality Modelling under a Future Climate

Preliminary Experiences with the Multi Model Air Quality Forecasting System for New York State

Fusing point and areal level space-time data with application

Air Quality Modelling for Health Impacts Studies

NOAA-EPA s s U.S. National Air Quality Forecast Capability

Statistical integration of disparate information for spatially-resolved PM exposure estimation

PRELIMINARY EXPERIENCES WITH THE MULTI-MODEL AIR QUALITY FORECASTING SYSTEM FOR NEW YORK STATE

NRCSE. Statistics, data, and deterministic models

Fusing space-time data under measurement error for computer model output

Bayesian Estimation of Input Output Tables for Russia

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013

FORECASTING: A REVIEW OF STATUS AND CHALLENGES. Eric Grimit and Kristin Larson 3TIER, Inc. Pacific Northwest Weather Workshop March 5-6, 2010

Climate Change: the Uncertainty of Certainty

Global Change and Air Pollution (EPA-STAR GCAP) Daniel J. Jacob

University of Southampton Research Repository eprints Soton

Numerical simulation of relationship between climatic factors and ground ozone concentration over Kanto area using the MM5/CMAQ Model

Introduction to Bayesian Inference

Presented at the NADP Technical Meeting and Scientific Symposium October 16, 2008

Probabilities for climate projections

What is one-month forecast guidance?

Motivation & Goal. We investigate a way to generate PDFs from a single deterministic run

Combining Deterministic and Probabilistic Methods to Produce Gridded Climatologies

MULTI-MODEL AIR QUALITY FORECASTING OVER NEW YORK STATE FOR SUMMER 2008

Statistical Models for Monitoring and Regulating Ground-level Ozone. Abstract

A new Hierarchical Bayes approach to ensemble-variational data assimilation

Bruno Sansó. Department of Applied Mathematics and Statistics University of California Santa Cruz bruno

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?

Harrison B. Prosper. Bari Lectures

Challenges in modelling air pollution and understanding its impact on human health

Space-Time Data fusion Under Error in Computer Model Output: An Application to Modeling Air Quality

Space-time downscaling under error in computer model output

Extremes and Atmospheric Data

Probabilistic assessment of danger zones using a surrogate model of CFD simulations

Bayesian PalaeoClimate Reconstruction from proxies:

A Framework for Daily Spatio-Temporal Stochastic Weather Simulation

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Flexible Spatio-temporal smoothing with array methods

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?

GLOBAL MAPPING OF OZONE DISTRIBUTION USING SOFT INFORMATION AND DATA FROM TOMS AND SBUV INSTRUMENTS ON THE NIMBUS 7 SATELLITE

Hierarchical Modeling for Multivariate Spatial Data

A Complete Spatial Downscaler

Importance of Numerical Weather Prediction in Variable Renewable Energy Forecast

Hierarchical Bayesian auto-regressive models for large space time data with applications to ozone concentration modelling

BAYESIAN HIERARCHICAL MODELS FOR EXTREME EVENT ATTRIBUTION

The document was not produced by the CAISO and therefore does not necessarily reflect its views or opinion.

Combining Monitoring Data and Computer model Output in Assessing Environmental Exposure

1 INTRODUCTION 2 DESCRIPTION OF THE MODELS. In 1989, two models were able to make smog forecasts; the MPA-model and

Modelling trends in the ocean wave climate for dimensioning of ships

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets

The ENSEMBLES Project

ASSESSMENT OF PM2.5 RETRIEVALS USING A COMBINATION OF SATELLITE AOD AND WRF PBL HEIGHTS IN COMPARISON TO WRF/CMAQ BIAS CORRECTED OUTPUTS

Bayesian spatial quantile regression

EPA AIRNow Program. Air Quality Index Reporting and Forecasts. Prepared by: John E. White

Ergodicity in data assimilation methods

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

OPTIMISING THE TEMPORAL AVERAGING PERIOD OF POINT SURFACE SOLAR RESOURCE MEASUREMENTS FOR CORRELATION WITH AREAL SATELLITE ESTIMATES

Monte Carlo in Bayesian Statistics

Wind Forecasting using HARMONIE with Bayes Model Averaging for Fine-Tuning

QPE and QPF in the Bureau of Meteorology

Calibration of numerical model output using nonparametric spatial density functions

A short introduction to INLA and R-INLA

Lecture 13 Fundamentals of Bayesian Inference

August Forecast Update for Atlantic Hurricane Activity in 2015

Current and Future Impacts of Wildfires on PM 2.5, Health, and Policy in the Rocky Mountains

Professor David B. Stephenson

Infer relationships among three species: Outgroup:

CSC321 Lecture 18: Learning Probabilistic Models

Responsibilities of Harvard Atmospheric Chemistry Modeling Group

Combining physical and statistical models to predict environmental processes.

Accuracy and Cost Considerations in Choosing a Chemical Mechanism for Operational Use in AQ Models

CALIOPE EU: Air Quality

Statistics Introduction to Probability

Wrapped Gaussian processes: a short review and some new results

Bayesian inference & process convolution models Dave Higdon, Statistical Sciences Group, LANL

Improvement of Meteorological Inputs for Air Quality Study

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

USING UNCERTAIN REGIONAL AIR QUALITY MODEL OUTPUTS FOR OZONE SOURCE APPORTIONMENT. Deliverable 1C. June 2014

The priority program SPP1167 Quantitative Precipitation Forecast PQP and the stochastic view of weather forecasting

The Role of Uncertainty Quantification in Model Verification and Validation. Part 6 NESSUS Exercise Review and Model Calibration

William Battye * EC/R Incorporated, Chapel Hill, NC. William Warren-Hicks, Ph.D. EcoStat, Inc., Mebane, NC

Bayesian Hierarchical Models

AREP GAW. Section 14 Daily Air Quality Forecast Operations

Bayesian Regression Linear and Logistic Regression

Climate Variables for Energy: WP2

A hierarchical Bayesian model to estimate and forecast ozone through space and time

Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference

What is (certain) Spatio-Temporal Data?

Spatiotemporal Analysis of Solar Radiation for Sustainable Research in the Presence of Uncertain Measurements

Transcription:

Bayesian hierarchical modelling for data assimilation of past observations and numerical model forecasts Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint work with Sujit Sahu in University of Southampton MPI-PKS, Dresden, 31st July 2009 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 1 / 27

Motivation Fusing ground level ozone concentration observations with computer deterministic model output. Improving biased forecast. Capturing spatio-temporal variation. Quantifying uncertainty through Bayesian probabilistic forecast. Producing high resolution maps. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 2 / 27

Forecast EPA s www.airnow.gov website Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 3 / 27

Ground Level Ozone Ground level ozone: bad health effects: primarily respiratory, lung function, coughing, throat irritation, congestion, bronchitis, emphysema, asthma. Ozone is a secondary pollutant. VOC s (Volatile Organic Compounds) - organic gases but really chemicals that participate in the formation of ozone. Sunlight + VOC + NOx = Ozone. Meteorological conditions - sunlight, high temperature (so primarily from April to September), wind direction and wind speed. High spatial-temporal correlation. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 4 / 27

-90-80 -80-70 Observations 409 spatial point locations are in the area. Recorded hourly. Measured by unattended photometers. About 20 percent data is missing over 15 days. Sparse data. 40 30 40 30 0 187.5 375 750 1,125 1,500 Kilometers Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 5 / 27

CMAQ modelling system - Computer model output National Oceanic and Atmospheric Administration (NOAA) have designed the Community Multi-scale Air Quality (CMAQ) modelling system. The model is used by Environmental Protection Agency (EPA) CMAQ consists of a set of deterministic physical models from first principle. The forecasts are biased. Computer model outputs are in grid cell, but in the real siuation, we want point location prediction. Uncertainty has not been taken into account. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 6 / 27

CMAQ Location in NY State, MSE = 299 ozone concentration/ppb 0 20 40 60 80 100 CMAQ Ozone 0 50 100 150 200 250 300 time/hour Location in MD State, MSE = 754 ozone concentration/ppb 0 20 40 60 80 100 0 50 100 150 200 250 300 time/hour Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 7 / 27

CMAQ modules Ching and Byun,1999 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 8 / 27

CMAQ modules Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 9 / 27

Problem Daily 8-hour maximum Prediction 8-hour average ozone concentration is an important indicator for environmental monitoring. Measuring the daily 8-hour average maximum ozone concentration is required by the law. One day ahead 8-hour average maximum ozone concentration at an arbitrary location is needed. High resolution map can be produced from the prediction outputs. Obtaining forecasts within few hours. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 10 / 27

Work done by others Fuentes and Raftery (2005) combine the computer model and observation by joint multivariate normal distribution. Zimmerman and Holland (2005) use different data sources with different measurement error and bias. Jun and Stein (2004) compare the correlation structure of computer model and observations. None of them deal with space-time forecast at the same time. The measurement is not ground truth. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 11 / 27

Why do we adopt Bayesian approach? Probabilistic forecast addresses the uncertainty through distribution (pdf). Modelling becomes more flexible. Linear regression model doesn t work here, it cannot capture spatial correlation. The approach distinguish "ground truth", "measurement" and "biased forecast". Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 12 / 27

Model Structure Observation Historical Data Z(s, t) ǫ(s, t) Forecasts Ground Truth O(s, t 1) O(s, t) O(s, t + 1) CMAQ x(s, t) x(s, t + 1) Measurement Equation: Z(s i, t) = O(s i, t) + ǫ(s i, t) System Equation: O(s i, t) = ξ t + ρ O(s i, t 1) + β 0 x(s i, t) + η(s i, t) Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 13 / 27

Model Specification Observation Historical Data Z(s, t) ǫ(s, t) Forecasts Ground Truth O(s, t 1) O(s, t) O(s, t + 1) CMAQ x(s, t) x(s, t + 1) Measurement Equation: Z(s i, t) N(O(s i, t), σ 2 ǫ ), System Equation: O(t) N(ξ t + ρ O(t 1) + β 0 x(t), σ 2 ωσ), where O(t) = (O(s 1, t),...,o(s n, t)), x(t) = (x(s 1, t),...,x(s n, t)). Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 14 / 27

How do we forecast? Posterior Predictive Distribution The posterior predictive distribution of Z(s, t ) is obtained by integrating over the unknown quantities with respect to the joint posterior distribution, i.e., π (Z(s, t ) z) = π ( Z(s, t ) O(s, [t ]), σǫ 2 ) π (O(s, [t ]) θ, w) do(s, [t ]) dθ dw. It can be done by Monte Carlo integration in the Markov chain Monte Carlo routine. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 15 / 27

Prediction Maps The 1-day ahead forecast surfaces on 11th Aug: Bayes and CMAQ Bayes forecast map for the following day: 11th Aug CMAQ forecast map for the following day: 11th Aug 28 28 62 62 69 82 97 87 64 62 50 87 87 7772 6560 69 36 62 62 69 82 97 87 64 62 50 87 87 7772 6560 69 36 57 57 59 57 57 59 59 59 90 70 50 30 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 16 / 27

Prediction Maps The 1-day ahead forecast surfaces on 11th Aug: Bayes and its uncertainty Bayes forecast map for the following day: 11th Aug Length of 95% predictive interval for the following day: 11th Aug 90 70 50 30 130 110 90 70 50 30 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 17 / 27

Prediction Quality Comparison of root mean square error (ppb) (MSE) and relative bias (ppb) (rbias) RMSE rbias Validation Days CMAQ Bayes CMAQ Bayes Aug 2 9 15.15 7.47 0.1588-0.0042 Aug 3 10 15.70 7.20 0.1687-0.0070 Aug 4 11 16.14 8.03 0.1732-0.0174 Aug 5 12 15.92 7.51 0.1728-0.0215 Aug 6 13 15.51 6.53 0.1724-0.0083 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 18 / 27

Validation Plot Validation plot for one day ahead forecast on 11th Aug Validation plot of one day ahead prediction on 11th Forecast/ppb 40 60 80 100 CMAQ Bayes 20 40 60 80 100 Observation/ppb Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 19 / 27

Hit and error percentages for O 3 exceeding 80 ppb. Period CMAQ Hit Error Bayes Hit Error Aug 2-9 84.76 15.24 95.12 4.88 Aug 3-10 82.20 17.80 94.24 5.76 Aug 4-11 82.05 17.95 94.36 5.64 Aug 5-12 84.78 15.22 94.92 5.08 Aug 6-13 83.92 16.08 93.97 6.03 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 20 / 27

Extreme Value Theory Extention Not accurate to predict high values (> 80ppb). Non-normal distribution. 1 Measurement Equation: Z(s, t) GEV(µ(s, t), σ g, ν). 2 Second Equation: µ(s, t) = O(s, t) + ǫ(s, t). 3 System Equation: O(s i, t) = ξ t + ρ O(s i, t 1) + β 0 x(s i, t) + η(s i, t). Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 21 / 27

Validation of the upper tail on Aug 13th Forecasts/ppb 60 70 80 90 EVTDLM DLM(1) 70 75 80 85 90 95 Observation/ppb Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 22 / 27

RMSE of the upper tail on Aug 13th. Observed Value DLM(1) EVTDLM All 6.94 7.37 > 50 7.26 7.30 > 60 7.64 7.61 > 70 8.59 8.28 > 80 10.53 9.45 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 23 / 27

Conclusion The forecast is consistent, more accurate, faster than running another computer model. Maps of probability statement could be produced. The approach is general. We also forecast hourly data under the same framework. C language code is developed and a simplifed version S-plus package for a faster hourly model has been developed. Future work will focus on using monitoring data from different data sources. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 24 / 27

Future Work Modelling the whole USA is also needed. Using other non-normal distributions. Other types of spatial correlation structure could be used. The speed of forecast could be further improved which is a trade-off between accuracy and time. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 25 / 27

Acknowledgements EPSRC Doctoral Training Account in University of Southampton. Data provided by Dave Holland in USEPA. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 26 / 27

References Fuentes and Raftery (2005) Model evaluation and spatial interpolation by Bayesian combination of observations with outputs from numerical models. Biometrics, 61 (1). Zimmerman and Holland (2005) Complementary co-kriging: spatial prediction using data combined from several environmental monitoring networks. Environmetrics, 16. Jun and Stein (2004) Statistical comparison of observed and CMAQ modeled daily sulfate levels. Atmospheric Environment, 38. Sahu, Yip and Holland (2009) Improved space-time prediction of daily ozone concentration levels in the eastern U.S. Atmospheric Environment, 43. Sahu, Yip and Holland (2008) A fast Bayesian method for updating and forecasting hourly ozone levels. Univesity of Southampton, Technical Report. Harrison and West (1997) Bayesian Forecasting and Dynamic Models. Springer. Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 27 / 27