Predictive spatio-temporal models for spatially sparse environmental data. Umeå University

Seminar
Predictive spatio-temporal models for spatially sparse environmental data
Xavier de Luna and Marc G. Genton
xavier.deluna@stat.umu.se and genton@stat.ncsu.edu
http://www.stat.umu.se/egna/xdl/index.html
http://www4.stat.ncsu.edu/ mggenton/
Umeå University

Outline
1. Introduction
2. VAR models with spatial structure
3. Spatio-temporal correlation analysis
4. Prediction performance analysis
5. Conclusions

1. Introduction
- Spatio-temporal data: observations $z(s_i, t)$ at stations $s_1, \dots, s_N$ (irregularly located) and times $t = 1, \dots, T$ (regularly spaced)
- Data: sparse in space and rich in time
- Applications:
  - Environmental: e.g. air pollution levels at meteorological stations in a given region or country
  - Economics: e.g. socio-economic measurements made at different geographical levels, where the stations are whole sub-regions or countries
- Goal: build a simple model for predicting future values at the stations, with as few assumptions as possible (e.g. no stationarity or isotropy in space)

Approaches to space-time modeling
- Direct joint modeling of the space-time covariance
- Hierarchical Bayesian models
- Vector of time series
- Vector of spatial random fields
See the survey by Kyriakidis and Journel (1999). Many papers in the literature, for example:
- Wikle and Cressie (1999): Kalman filter
- Stroud, Müller, and Sansó (2001): state space
- Tonellato (2001): Bayesian multivariate time series
- Pfeifer and Deutsch (1980), Stoffer (1986): STARMAX
- Previous talks...

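2. VAR models with spatial structure
A simple model for $z_t = (z(s_1,t), \dots, z(s_N,t))'$ is:
$$z_t - \beta = \sum_{i=1}^{p} R_i (z_{t-i} - \beta) + \varepsilon_t$$
- $\beta$ is a vector of spatial effects (spatial trend)
- $R_1, \dots, R_p$ are unknown $N \times N$ parameter matrices
- $\varepsilon_t$ is an $N$-dimensional white noise process: $E(\varepsilon_t) = 0$, $E(\varepsilon_t \varepsilon_t') = \Sigma_\varepsilon$, $E(\varepsilon_t \varepsilon_u') = 0$ for $u \neq t$
This is the vector autoregressive (VAR) model commonly used in multivariate time series analysis (e.g., Lütkepohl, 1991). However, the spatial structure of the data typically has major consequences on the $R_i$ (specific structure).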

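Remarks:
- The deterministic dynamic system is stable if $\det(I_N - R_1 x - \dots - R_p x^p) \neq 0$ for $|x| \leq 1$
- Stability implies time-stationarity: $E(z_t) = \beta$ for all $t$, and $\mathrm{Cov}(z_t, z_{t-\tau}) = \Gamma(\tau)$, i.e. it is a function of $\tau$ only, for all $t$ and $\tau$
- The covariance matrix $\Gamma(\tau)$ can easily be computed from $R_1, \dots, R_p$ and $\Sigma_\varepsilon$. For instance, for $p = 1$: $\mathrm{vec}\,\Gamma(0) = (I_{N^2} - R_1 \otimes R_1)^{-1}\,\mathrm{vec}\,\Sigma_\varepsilon$
- A temporal trend needs to be modeled or removed
- No assumption of spatial stationarity or isotropy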
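As a small numerical illustration of the stability and covariance remarks, the sketch below treats a VAR(1) with coefficient matrix `R1` and noise covariance `sigma_eps`. It assumes NumPy; the function names and the matrix values are made up for illustration and do not come from the talk. For one lag, the determinant condition is equivalent to all eigenvalues of `R1` having modulus below one, and the lag-zero covariance solves a linear system in its column-stacked (vec) form.

```python
import numpy as np

def is_stable_var1(R1):
    """A VAR(1) is stable when every eigenvalue of R1 lies strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(R1))) < 1.0)

def gamma0_var1(R1, sigma_eps):
    """Solve vec(Gamma0) = (I - R1 kron R1)^{-1} vec(Sigma_eps), with column-stacking vec."""
    n = R1.shape[0]
    lhs = np.eye(n * n) - np.kron(R1, R1)
    vec_sigma = sigma_eps.reshape(-1, order="F")   # column-major stacking
    vec_gamma = np.linalg.solve(lhs, vec_sigma)
    return vec_gamma.reshape((n, n), order="F")

# Hypothetical two-station example
R1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
sigma_eps = np.eye(2)
stable = is_stable_var1(R1)
gamma0 = gamma0_var1(R1, sigma_eps)
```

A quick sanity check is that the result satisfies the fixed-point relation `gamma0 = R1 @ gamma0 @ R1.T + sigma_eps`, which is the lag-zero covariance equation of a stationary VAR(1).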

Estimation and inference
- Estimation of the parameters of our model with:
  - maximum likelihood (if distributional assumptions are made)
  - least squares
  - moment estimators (Yule-Walker type)
- More about estimation and inference in, e.g., Lütkepohl (1991)
- Robust estimation of the parameters with robust estimators of moments (Ma and Genton, 2000) together with Yule-Walker estimating equations (de Luna and Genton, 2002)
- Estimation of the parameters: all stations simultaneously or station-wise
- Model building must be performed separately for each station
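The least squares option can be sketched for the simplest case, a VAR(1) on centred data with no zero restrictions. This is only an illustration assuming NumPy; `fit_var1_ols` and the simulated data are hypothetical and are not the authors' Splus implementation. Each row of the estimate is the station-wise regression of one station on the lagged values of all stations.

```python
import numpy as np

def fit_var1_ols(z):
    """Least squares estimate of R1 from a (T, N) array of centred observations."""
    y = z[1:]       # responses z_t
    x = z[:-1]      # predictors z_{t-1}
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)   # solves x @ coef ~= y
    return coef.T   # row i holds the coefficients of the equation for station i

# Simulate a hypothetical two-station VAR(1) and recover its coefficients
rng = np.random.default_rng(0)
R1_true = np.array([[0.5, 0.2],
                    [0.0, 0.4]])
z = np.zeros((5000, 2))
for t in range(1, 5000):
    z[t] = R1_true @ z[t - 1] + rng.standard_normal(2)

R1_hat = fit_var1_ols(z)
```

Because the regression is solved column by column, fitting all stations simultaneously and fitting them one at a time give the same unrestricted estimate; the two differ only once station-specific zero restrictions are imposed.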

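Spatio-temporal trends
- Deterministic trend specification: $z(s,t) = m(s,t) + y(s,t)$, where $y(s,t)$ is a time-stationary process
- The trend is often removed by time differencing with $\Delta^d$, where $\Delta z(s,t) = z(s,t) - z(s,t-1)$: $\Delta^d z(s,t) = \Delta^d m(s,t) + \Delta^d y(s,t)$, where $\Delta^d y(s,t)$ is a time-stationary process
- $\Delta^d z(s,t)$ can be modeled by a VAR if $\Delta^d m(s,t)$ is a function of $s$ only
- This will happen most often in practice, at least approximately (as soon as $m(s,t)$ is a polynomial function in $t$ with coefficients that may depend on the location $s$)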
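The effect of time differencing can be checked on a toy case: a trend that is linear in time with station-specific slopes. After one difference the trend contribution is a constant per station, that is, a function of location only. The sketch assumes NumPy, and the intercepts, slopes and noise are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)

# Hypothetical station-specific linear trend m(s, t) = a(s) + b(s) * t for two stations
intercepts = np.array([1.0, 2.0])
slopes = np.array([0.01, 0.03])

y = rng.standard_normal((200, 2))             # stand-in for the stationary part
z = intercepts + slopes * t[:, None] + y      # observed series, shape (T, N)

dz = np.diff(z, axis=0)                       # first difference along time
trend_part = dz - np.diff(y, axis=0)          # what is left of the trend after differencing
```

Here `trend_part` equals `slopes` at every time point, so the differenced series has a mean that depends on the station but not on time.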

- Periodic time trends or cycles may also be tackled by differencing, e.g. observations taken monthly will typically have to be stationarized with the seasonal differencing operator $\Delta_{12}$, where $\Delta_{12} z(s,t) = z(s,t) - z(s,t-12)$
- Modeling a deterministic trend with a weighted sum of known basis functions, where the weights are typically estimated by regression: $m(s,t) = \sum_k w_k f_k(s,t)$
  - Periodic functions: to account for seasonal effects along the time axis
  - Polynomials: to model smooth variations in space
  - See the review article by Kyriakidis and Journel (1999)
- When other variables are observed at the same locations and times, these may also be used to model trends by regressing on them
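A trend built from periodic basis functions can be estimated by ordinary regression, as in the following sketch for a single station with hourly data and a daily cycle. It assumes NumPy; the period of 24, the number of harmonics and the function names are illustrative choices, not values taken from the talk.

```python
import numpy as np

def harmonic_design(t, period, n_harmonics):
    """Design matrix with an intercept and cosine/sine pairs at multiples of the base frequency."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t / period))
        cols.append(np.sin(2 * np.pi * k * t / period))
    return np.column_stack(cols)

def remove_harmonic_trend(y, period=24.0, n_harmonics=2):
    """Regress y on the harmonics and return (residuals, fitted weights)."""
    t = np.arange(len(y), dtype=float)
    X = harmonic_design(t, period, n_harmonics)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ w, w

# Noise-free toy series: level 1.0 plus a daily sine wave of amplitude 0.5
t = np.arange(240, dtype=float)
y = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24.0)
resid, w = remove_harmonic_trend(y)
```

In this toy case the residuals vanish and the weights recover the level and the sine amplitude; with real data the residuals would be passed on to the VAR modeling stage.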

Model building strategy
- Identify zeroes in the matrices $R_i$
- Each station is modeled separately: a covariate selection problem, where the available predictors are the lagged values at each station, $z(s_j, t-k)$
- Complex model selection problem
- From the spatio-temporal structure: define an ordering to sequentially introduce the predictors in the model for $z(s_i, t)$, say
- Such an ordering is used in purely time series models: to explain $z_t$, the lag-one variable $z_{t-1}$ is considered first, then the lag-two variable $z_{t-2}$, etc.

- To explain $z(s_i, t)$, consider the predictors in the following order: $z(s_i, t-1), z(s_{i(1)}, t-1), \dots, z(s_{i(N-1)}, t-1), z(s_i, t-2), z(s_{i(1)}, t-2), \dots$, where $s_{i(1)}, \dots, s_{i(N-1)}$ is an ordering of the remaining stations, e.g. in ascending order with respect to their distance (using a given metric) to $s_i$
- Other orderings can be motivated by physical knowledge about the underlying process, e.g. in the Irish wind speed application
- Predictors can be entered in the model sequentially: this simplifies the model building stage
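A distance-based ordering of the predictors can be sketched in a few lines. The coordinates below are hypothetical, the metric is plain Euclidean distance, and the function names are illustrative; any other metric or a physically motivated ordering could be substituted for `order_by_distance`.

```python
import math

def order_by_distance(coords, target):
    """Station indices sorted by Euclidean distance to the target station (target first)."""
    tx, ty = coords[target]
    return sorted(range(len(coords)),
                  key=lambda j: math.hypot(coords[j][0] - tx, coords[j][1] - ty))

def predictor_sequence(coords, target, max_lag):
    """(station, lag) pairs: all stations at lag 1 in distance order, then lag 2, and so on."""
    order = order_by_distance(coords, target)
    return [(j, lag) for lag in range(1, max_lag + 1) for j in order]

# Four hypothetical stations on a plane
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
seq = predictor_sequence(coords, target=1, max_lag=2)
```

For target station 1 the sequence starts with the station's own lag-one value, followed by its neighbours at lag one from nearest to farthest, and only then moves to lag two.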

Spatio-temporal ordering of the stations
[Figure: ordering of the predictors over locations s and times t, t-1, t-2]
- Several strategies to decide on the number of predictors to be used

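- Popular technique in time series modeling: partial autocorrelations
- Generalization to space-time: look at the partial correlation along the ordering of predictors
- Three random variables: $X$, $Y$ and $Z$ (where $Z$ may be a vector)
- The partial correlation of $X$ and $Y$ given $Z$: $\mathrm{Corr}(X, Y \mid Z) = \mathrm{Corr}(X - P(X \mid Z),\, Y - P(Y \mid Z))$, where $P(\cdot \mid Z)$ is the best linear predictor given $Z$
- This partial correlation has the property that it is zero when $X$ and $Y$ are independent conditional on $Z$
- Renaming the predictors of $z(s_i, t)$, taken in the chosen order, as $x_1, x_2, \dots$:

- $\rho_{s_i}(h) = \mathrm{Corr}(z(s_i, t), x_h \mid x_1, \dots, x_{h-1})$, $h = 1, 2, \dots$: partial correlation function (PCF) for station $s_i$
- The PCF is useful for model selection
- For time lag one, define the cut-off as the last predictor in the ordering with a non-zero partial correlation, the PCF being zero for all later lag-one predictors
- Similarly, a cut-off can be defined for each time lag
- Identify the cut-offs with the sample PCF $\hat\rho_{s_i}(h)$
- Test for $\rho_{s_i}(h) = 0$: under joint normality, $\hat\rho \sqrt{\nu / (1 - \hat\rho^2)}$ is Student $t$-distributed with $\nu$ degrees of freedom, where $\nu$ is the sample size minus 2 minus the number of conditioning variables; see e.g. Krzanowski (1988, Sec. 14.4)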
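A sample partial correlation can be computed by regressing both variables on the conditioning set and correlating the residuals, which mirrors the definition through best linear predictors. The sketch below assumes NumPy; `partial_corr` and `sample_pcf` are illustrative names, and the predictor matrix is assumed to have its columns already arranged in the chosen spatio-temporal order.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing their least squares fits on z (plus an intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    design = np.column_stack([np.ones(len(x)), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

def sample_pcf(target, predictors):
    """h-th value: partial correlation of target and predictor h given predictors 0..h-1."""
    return [partial_corr(target, predictors[:, h], predictors[:, :h])
            for h in range(predictors.shape[1])]

# Toy data: only the first predictor in the ordering matters
rng = np.random.default_rng(1)
predictors = rng.standard_normal((4000, 3))
target = 0.8 * predictors[:, 0] + rng.standard_normal(4000)
pcf = sample_pcf(target, predictors)
```

In this toy example the first PCF value is clearly away from zero while the later ones are close to zero, which is the pattern one would read as a cut-off after the first predictor.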

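Strategy
- Step 1: Identify the last relevant lag-one predictor in the ordering (the lag-one cut-off) by looking at the sample PCF.
- Step 2: Identify the lag-two cut-off by looking at the sample PCF when the coefficients of the lag-one predictors beyond the lag-one cut-off have been set to zero.
- Step 3: Identify the lag-three cut-off by looking at the sample PCF when the coefficients of the predictors beyond the lag-one and lag-two cut-offs have been set to zero.
- Step 4: Repeat Step 3 in a similar manner for all necessary time lags in order to identify the remaining cut-offs.
Deletion of uninteresting predictors:
- Use a model selection criterion: AIC, BIC, ...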
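When a model selection criterion is used along a fixed ordering, one simple variant is to compare nested regressions that include the first h predictors. The sketch below does this with the Gaussian form of BIC (up to an additive constant). It assumes NumPy, and the function name and toy data are invented for illustration; it is not the procedure used in the talk's applications.

```python
import numpy as np

def bic_along_ordering(target, predictors):
    """BIC of the nested regressions using the first h ordered predictors, h = 0..H."""
    n, n_pred = predictors.shape
    scores = []
    for h in range(n_pred + 1):
        X = np.column_stack([np.ones(n), predictors[:, :h]])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ beta) ** 2))
        # Gaussian log-likelihood term plus a log(n) penalty per fitted coefficient
        scores.append(n * np.log(rss / n) + (h + 1) * np.log(n))
    return int(np.argmin(scores)), scores

# Toy data: the first two predictors in the ordering are relevant, the last two are not
rng = np.random.default_rng(2)
predictors = rng.standard_normal((2000, 4))
target = predictors[:, 0] + predictors[:, 1] + rng.standard_normal(2000)
best_h, scores = bic_along_ordering(target, predictors)
```

Replacing the `np.log(n)` penalty by 2 gives the corresponding AIC comparison.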

Carbon monoxide in Venice
- Implementation of the methodology using Splus
- Hourly observations of atmospheric concentrations (micrograms per cubic meter) of carbon monoxide (CO), recorded in September 1995 at 4 monitoring stations located in Mestre (Venice, Italy)
[Figure: Locations of the 4 stations]

Time series of CO concentrations
[Figure: CO concentrations against time (hours), one panel each for Station 1, Station 2, Station 3 and Station 4]
- Stations 2 and 4 are located along streets with high-intensity traffic, whereas station 1 is in a garden and station 3 is in a pedestrian area

- The data set has been analyzed by Tonellato (2001) in a Bayesian dynamic linear model framework
- Our methodology is appropriate for this application:
  - Italian law requires public authorities to produce short-term forecasts of air pollutant concentrations at locations where monitoring stations are present
  - With so few stations, it is not possible to model the spatial dependence adequately. The use of a spatially stationary, isotropic exponential correlation function, as suggested by Tonellato (2001), therefore seems questionable
  - Wind speed and direction can influence air pollutant concentrations in a nonstationary and anisotropic way
  - Explanatory variables such as wind and road traffic information are not available

- We take the logarithm of the CO concentrations and estimate a trend for each station by regressing on a family of daily harmonics
- Fitted models with BIC:
  - Univariate AR time series models
  - Spatial VAR model

[Figure: PCF for station 3; panels: time t-1, time t-2, time t-3; x-axis: station (1 to 4), y-axis: PCF]

3. Spatio-temporal correlation analysis
- Irish wind data set analyzed by Haslett and Raftery (1989)
- Daily averages of wind speeds recorded at meteorological stations in Ireland, 1961-1978
- Square root transformation to stabilize the variance
- Subtraction of an estimated seasonal effect and a spatially varying mean from the data
- Gneiting (2002): the assumption of full symmetry of the spatio-temporal correlation is not realistic because wind patterns are predominantly westerly over Ireland. We incorporate this physical information by defining a special ordering of the stations
- With BIC, we fit a spatial VAR(3) model

[Figure: Spatial correlation against distance (m): from the fitted VAR(3) (open circles) and empirical (pluses)]

[Figure: correlation against distance (m), with rows of panels by direction: WE, NS, SW-NE]

4. Prediction performance analysis
- Comparison of model selection strategies through out-of-sample validation based on recursive prediction errors, see de Luna and Skouras (2003)
- For given $T_0$, a model selection strategy can be evaluated with the accumulated prediction error criterion: $\mathrm{APE} = \sum_{t=T_0}^{T-1} \left( z(s_i, t+1) - \hat z(s_i, t+1) \right)^2$, where $\hat z(s_i, t+1)$ is the prediction of $z(s_i, t+1)$ obtained by applying the model selection strategy on the sub-sample $z_1, \dots, z_t$
- Choosing the strategy that minimizes the above criterion will eventually identify the best strategy with probability one
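The accumulated prediction error can be sketched as a loop that refits on each growing sub-sample and scores the next observation. Below, a "strategy" is any function that maps the history to a one-step forecast; the two toy strategies and the short series are made up for illustration and stand in for a full model selection and fitting routine.

```python
def accumulated_prediction_error(series, strategy, t0):
    """Sum of squared one-step-ahead errors, forecasting each point from the data before it."""
    ape = 0.0
    for t in range(t0, len(series)):
        forecast = strategy(series[:t])   # only observations before time index t are used
        ape += (series[t] - forecast) ** 2
    return ape

def last_value(history):
    """Toy strategy: predict the most recent observation."""
    return history[-1]

def running_mean(history):
    """Toy strategy: predict the mean of everything seen so far."""
    return sum(history) / len(history)

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ape_last = accumulated_prediction_error(series, last_value, t0=2)   # 4.0
ape_mean = accumulated_prediction_error(series, running_mean, t0=2) # 21.5
```

On this steadily increasing series the last-value strategy accumulates less error than the running mean, so it would be the one retained by the criterion.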

RAMSE by station and strategy

Station            AR-BIC  AR-AIC  SVAR-BIC    SVAR-BIC    SVAR-AIC    SVAR-AIC
                                   dist order  wind order  dist order  wind order
Roche's Pt. (1)    0.492   0.492   0.474       0.473       0.473       0.473
Valentia (2)       0.503   0.503   0.498       0.498       0.499       0.499
Kilkenny (3)       0.446   0.446   0.424       0.424       0.423       0.424
Shannon (4)        0.461   0.460   0.455       0.454       0.452       0.452
Birr (5)           0.481   0.480   0.470       0.470       0.469       0.468
Dublin (6)         0.461   0.461   0.445       0.445       0.444       0.444
Claremorris (7)    0.488   0.488   0.485       0.485       0.483       0.482
Mullingar (8)      0.446   0.445   0.433       0.433       0.433       0.433
Clones (9)         0.477   0.477   0.467       0.468       0.467       0.467
Belmullet (10)     0.490   0.490   0.491       0.490       0.489       0.489
Malin Head (11)    0.499   0.499   0.494       0.492       0.493       0.493

- For two strategies, plot the cumulative sums of the differences between their squared prediction errors against time
[Figure: four panels of these sums against time (0 to 5000): AR-AIC vs SVAR-AIC (dist order); SVAR-BIC (dist order) vs SVAR-AIC (dist order); SVAR-BIC (wind order) vs SVAR-AIC (dist order); SVAR-AIC (wind order) vs SVAR-AIC (dist order)]

5. Conclusions
- Simple predictive model for nonstationary spatio-temporal data (sparse in space, rich in time)
- Spatio-temporal trend handled by time differencing or by a weighted sum of known basis functions
- Estimation by least squares, Yule-Walker, or maximum likelihood
- Spatio-temporal ordering of the stations
- Model building strategy with PCF/AIC/BIC
- Each station is modeled separately
- Applications: carbon monoxide in Venice; Irish wind speed; ...
- Future research: comparison with other predictive methods
