Extending clustered point process-based rainfall models to a non-stationary climate

Jo Kaczmarska (1, 2), Valerie Isham (2), Paul Northrop (2)
1 Risk Management Solutions
2 Department of Statistical Science, University College London
Berlin Workshop on Weather Generators, September 2017

Introduction/Motivation

Requirement: Ability to use Poisson cluster models to simulate rainfall in a changing climate.

Method: A non-parametric kernel-based approach relates the parameters of the model to large-scale atmospheric variables, such as sea-level pressure and temperature, which can be simulated from a climate model.

Focus of research: Sub-hourly resolution at a single site.

Structure of Presentation
1. Point process-based rainfall model used here
2. Fitting methodology (Generalised Method of Moments)
   Current approach: discrete covariate (month)
   Proposed approach: smooth seasonality and add or replace with continuous covariates
3. Empirical Study and Results

Original Bartlett-Lewis Rectangular Pulse model

[Figure: schematic of rainfall intensity against time, showing individual cells within a storm. Storm arrivals at rate λ; cell arrivals at rate β, with a cell at the storm origin; storm duration ~ Exp(γ); cell inter-arrival times ~ Exp(β).]

Summary of parameters:
λ    rate of storm arrival
γ    rate of storm death
β    rate of cell arrival
η    rate of cell death
µ_X  mean cell intensity
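As an illustration of the model structure described above, the following is a minimal simulation sketch, not the authors' code. It assumes exponentially distributed cell intensities with mean µ_X (the slide gives only the mean), and the function names `simulate_blrp` and `aggregate` are hypothetical. Storms that began before time 0 are ignored, so the start of the series is not in equilibrium.

```python
import numpy as np

def simulate_blrp(lam, beta, gamma, eta, mu_x, t_end, rng):
    """Return rectangular cells as (start, end, intensity) tuples on [0, t_end]."""
    cells = []
    t = rng.exponential(1.0 / lam)                    # first storm origin
    while t < t_end:
        storm_end = t + rng.exponential(1.0 / gamma)  # storm duration ~ Exp(gamma)
        cell_start = t                                # one cell at the storm origin
        while cell_start < storm_end:
            duration = rng.exponential(1.0 / eta)     # cell lifetime ~ Exp(eta)
            intensity = rng.exponential(mu_x)         # ASSUMPTION: exponential intensity
            cells.append((cell_start, cell_start + duration, intensity))
            cell_start += rng.exponential(1.0 / beta) # cell inter-arrival ~ Exp(beta)
        t += rng.exponential(1.0 / lam)               # storm origins form a Poisson process
    return cells

def aggregate(cells, t_end, dt):
    """Accumulate rainfall depth in consecutive bins of width dt."""
    n = int(np.ceil(t_end / dt))
    edges = np.arange(n + 1) * dt
    depth = np.zeros(n)
    for start, end, x in cells:
        if start >= t_end:
            continue
        end = min(end, t_end)
        # Overlap of the cell [start, end] with each bin, floored at zero.
        overlap = np.minimum(end, edges[1:]) - np.maximum(start, edges[:-1])
        depth += x * np.clip(overlap, 0.0, None)
    return depth
```

With rates expressed per hour, `aggregate(cells, t_end, 1/12)` would give a 5-minute depth series. The loop over cells is written for clarity rather than speed.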

GMM fitting with discrete covariate

With a separate model for each calendar month, m, we solve:

\[
\hat{\theta}_m = \arg\min_{\theta_m} \left[ \left\{ \frac{1}{\sum_{t=1}^{n} I(m_t = m)} \sum_{t=1}^{n} I(m_t = m)\,[T(Y_t) - \tau(\theta_m)] \right\}^{T} W_m \left\{ \frac{1}{\sum_{t=1}^{n} I(m_t = m)} \sum_{t=1}^{n} I(m_t = m)\,[T(Y_t) - \tau(\theta_m)] \right\} \right]
\]

where:
- Y_t is the rainfall data in observation month t, T(Y_t) is the vector of statistics for that month, and τ(θ_m) is the vector of expected values for calendar month m.
- We have θ with 5 components, and T and τ with 13 (mean hourly rainfall, plus the coefficient of variation, skewness and lag-1 autocorrelation at timescales of 5 minutes and 1, 6 and 24 hours).
- W_m is the weights matrix for month m; we use the diagonal matrix of inverse sample variances of the 13 properties.
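To make the 13-component statistics vector T(Y_t) concrete, here is a rough sketch of how it might be computed from one month of 5-minute data. The function names and the ordering of the components (grouped by timescale) are assumptions for illustration, and the simple moment estimators used here may differ from those in the original work. A month with no rain would give a division by zero and would need separate handling.

```python
import numpy as np

def aggregate_series(x, k):
    """Sum consecutive blocks of k values, dropping any incomplete final block."""
    n = (len(x) // k) * k
    return x[:n].reshape(-1, k).sum(axis=1)

def summary_statistics(rain_5min):
    """Mean hourly rainfall, then (CV, skewness, lag-1 ac) at 5 min, 1 h, 6 h and 24 h."""
    x = np.asarray(rain_5min, dtype=float)
    stats = [aggregate_series(x, 12).mean()]   # 12 five-minute values per hour
    for k in (1, 12, 72, 288):                 # 5 min, 1 h, 6 h, 24 h
        y = aggregate_series(x, k)
        m, s = y.mean(), y.std()
        d = y - m
        cv = s / m                             # coefficient of variation
        skew = np.mean(d**3) / s**3            # skewness
        ac1 = np.mean(d[:-1] * d[1:]) / s**2   # lag-1 autocorrelation
        stats.extend([cv, skew, ac1])
    return np.array(stats)
```

Applying `summary_statistics` to each observation month gives an n-by-13 array of statistics; the inverse of its column variances could then form a diagonal weights matrix.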

What if our covariate is continuous rather than discrete?

We could partition our continuous covariate into a number of discrete ordered bins and fit a separate model for each bin, as per the existing method. But is there a better way?

Motivating Example: Scatterplot smoothing

[Figure: two scatterplots of y against x, smoothed using bins (left) and Gaussian kernel weights (right).]

Kernel Smoothing

The example showed a local mean or Nadaraya-Watson estimate, using Gaussian kernel weights. It is given by:

\[
\hat{f}(x_0) = \frac{\sum_{j=1}^{n} K_h(x_j - x_0)\, y_j}{\sum_{j=1}^{n} K_h(x_j - x_0)}
\]

which must be calculated for each required value of x_0.

\[
K_h(X_t - x_0) = \frac{1}{h} K\!\left( \frac{X_t - x_0}{h} \right)
\]

are local weights; h determines the width of the local neighbourhood.

Properties of the kernel function, K:
- integrates to 1
- peaks at zero
- decreases as |X_t - x_0| increases.
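A minimal sketch of the local mean estimate with Gaussian kernel weights follows. The function names are illustrative only, and the inputs are assumed to be one-dimensional NumPy arrays.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density: integrates to 1 and peaks at zero."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, y, x0, h):
    """Local mean estimate of f at each point in x0, with bandwidth h."""
    x0 = np.atleast_1d(x0)
    # w[i, j] = K_h(x_j - x0_i) = (1/h) K((x_j - x0_i) / h)
    w = gaussian_kernel((x[None, :] - x0[:, None]) / h) / h
    return (w @ y) / w.sum(axis=1)
```

Because the weights appear in both numerator and denominator, the 1/h factor cancels here; it is kept only to match the definition of K_h.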

Methodology - Local Mean GMM

Recall the formula with the discrete covariate, month:

\[
\hat{\theta}_m = \arg\min_{\theta_m} \left[ \left\{ \frac{1}{\sum_{t=1}^{n} I(m_t = m)} \sum_{t=1}^{n} I(m_t = m)\,[T(Y_t) - \tau(\theta_m)] \right\}^{T} W_m \left\{ \frac{1}{\sum_{t=1}^{n} I(m_t = m)} \sum_{t=1}^{n} I(m_t = m)\,[T(Y_t) - \tau(\theta_m)] \right\} \right]
\]

Now, for a continuous covariate, we replace the indicator functions with kernel weights to get parameters at X = x_0:

\[
\hat{\theta}(x_0) = \arg\min_{\theta_{x_0}} \left[ \left\{ \frac{1}{n} \sum_{t=1}^{n} K_h(X_t - x_0)\,[T(Y_t) - \tau(\theta_{x_0})] \right\}^{T} W_{x_0} \left\{ \frac{1}{n} \sum_{t=1}^{n} K_h(X_t - x_0)\,[T(Y_t) - \tau(\theta_{x_0})] \right\} \right]
\]

We use intervals of a month; the covariate will typically be a mean monthly value.
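The kernel-weighted criterion above can be sketched as a function to be minimised over θ at each target value x_0. This is an illustration under stated assumptions: `toy_tau` is a hypothetical stand-in for the model's expected-value function τ(θ) (the real Bartlett-Lewis moment expressions are not reproduced here), and a Gaussian kernel is assumed.

```python
import numpy as np

def gaussian_kernel_weights(x, x0, h):
    """K_h(X_t - x0) for a Gaussian kernel with bandwidth h."""
    u = (x - x0) / h
    return np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))

def local_gmm_objective(theta, x0, h, x, T, tau, W):
    """Quadratic form g' W g, with g = (1/n) sum_t K_h(X_t - x0) [T(Y_t) - tau(theta)].

    x : (n,) covariate values, one per observation month
    T : (n, p) observed statistics, one row per observation month
    tau : callable mapping theta to a (p,) vector of expected values
    W : (p, p) weights matrix
    """
    k = gaussian_kernel_weights(x, x0, h)
    g = (k[:, None] * (T - tau(theta))).mean(axis=0)
    return float(g @ W @ g)

def toy_tau(theta):
    """HYPOTHETICAL expected-value function, for illustration only."""
    return np.array([theta[0], theta[1], theta[0] + theta[1]])
```

In practice the objective would be handed to a general-purpose numerical optimiser, started from several initial values, and re-solved for every x_0 of interest. Since τ(θ) does not depend on t, the vector g equals the mean kernel weight times the difference between the kernel-weighted mean of T and τ(θ), so the fit at x_0 matches τ(θ) to a Nadaraya-Watson estimate of the statistics.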

The different timescales

Time intervals in the summation - months
- Short enough that it's reasonable to treat the series within each interval as stationary.
- Long enough for small sample biases in the properties to be negligible.
- Long enough to permit treatment of the properties as independent between the intervals.

Timescales for the properties - 5 min to 24 hours
- To provide information about the observed rainfall structure at both cell and storm levels.
- Including a range of levels (especially sub-hourly) helps with parameter identification.

Data

Practical application:
- 5-min rainfall data from Bochum, Germany, 1931 to 1999;
- Monthly NCEP reanalysis data (available from Jan 1948, grid point: lat 52.5N, long 7.5E) plus NAO index.
Combined data gives 624 monthly observations.

Practical Issues
- Numerical optimisation
- Extend to local linear? ("design-adaptive")
- Calculation of weighting matrix
- Choice of bandwidth: variance versus bias
- Extension for multiple covariates: How many? Which ones? Bandwidth matrix

Single covariate: effect of the bandwidth

[Figure: fitted log λ, log µ_X, log β, log η and log γ plotted against temperature (deg C), in three columns for h = 0.5, h = 1.5 and h = 5.]

The choice of bandwidth, h, affects the smoothness of the curves. A higher h gives a smoother, flatter curve, with lower variance, but higher bias.

We are assuming a global bandwidth: i.e. the same h across the whole curve.

Optimal bandwidth - temperature

1. Visualisation
2. A variant of Cross-Validation: 25 hold-out samples, splitting our 624 observation months randomly each time into a training sample of 399 and a test sample of 225.

For each split, find the h that minimises the sum of weighted squared residuals over the test set, with parameters derived from the training set. The median optimal h was 1.28, the mean 1.35.

[Figure: scaled prediction error against bandwidth, and density of the optimal bandwidth.]
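The repeated hold-out procedure can be sketched as follows. To keep the example short, the full GMM fit is replaced by a plain Nadaraya-Watson prediction of a single response, so this shows only the splitting and bandwidth-selection logic, not the weighted residuals of the 13 properties. The function names and the bandwidth grid are assumptions.

```python
import numpy as np

def nw_predict(x_train, y_train, x_new, h):
    """Local mean prediction with Gaussian weights (stand-in for the GMM fit)."""
    w = np.exp(-0.5 * ((x_train[None, :] - x_new[:, None]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def holdout_bandwidths(x, y, grid, n_splits=25, n_train=399, seed=0):
    """For each random split, return the grid value of h with the smallest test error."""
    rng = np.random.default_rng(seed)
    best = []
    for _ in range(n_splits):
        perm = rng.permutation(len(x))
        train, test = perm[:n_train], perm[n_train:]
        errors = [
            np.sum((y[test] - nw_predict(x[train], y[train], x[test], h)) ** 2)
            for h in grid
        ]
        best.append(grid[int(np.argmin(errors))])
    return np.array(best)
```

Summaries such as `np.median(best)` and `best.mean()` then play the role of the median and mean optimal bandwidths quoted above.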

Multiple Covariates

Now require a bandwidth matrix, H (controls size and direction of smoothing).

Diagonal H, product kernels: smoothing by different amounts in the directions of the co-ordinate axes (2):

\[
K_H(X_t - x_0) = K_{h_1}(X_{t1} - x_{01})\, K_{h_2}(X_{t2} - x_{02}) \cdots K_{h_3}(X_{t3} - x_{03})
\]

[Figure: three panels labelled (1), (2) and (3), each with axes from -2 to 2.]

Curse of dimensionality
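A product kernel with a diagonal bandwidth matrix could be computed along these lines. This is a sketch assuming Gaussian kernels in every direction; `product_kernel_weights` is an illustrative name.

```python
import numpy as np

def product_kernel_weights(X, x0, h):
    """K_H(X_t - x0) for diagonal H: the product over covariates j of K_{h_j}(X_tj - x0_j).

    X  : (n, d) covariate values, one row per observation month
    x0 : (d,) target point
    h  : (d,) bandwidths, one per covariate
    """
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    h = np.asarray(h, dtype=float)
    u = (X - x0) / h                                       # scale each covariate separately
    k = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))   # univariate weights, shape (n, d)
    return k.prod(axis=1)
```

For example, with sea-level pressure and temperature as the two columns of `X`, `h` might be `[2.0, 1.75]`. These weights take the place of the univariate K_h in the local mean GMM criterion. As more covariates are added, fewer observations receive appreciable weight at any given x_0, which is one way to see the curse of dimensionality.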

Choosing covariates: preliminary analysis

                        coeff of var        skewness         ac lag 1
covariate       mean   5 min  6 hour   5 min  6 hour   5 min  6 hour
slp           -0.538   0.304   0.526   0.121   0.385   0.046  -0.013
geo200         0.030   0.608   0.414   0.525   0.364  -0.470  -0.305
geo500        -0.096   0.629   0.499   0.512   0.420  -0.422  -0.280
geo700        -0.195   0.627   0.550   0.485   0.451  -0.372  -0.258
temp           0.170   0.577   0.275   0.546   0.265  -0.542  -0.344
thick          0.085   0.592   0.370   0.527   0.331  -0.484  -0.307
rhum           0.140  -0.586  -0.417  -0.465  -0.318   0.413   0.240
rhum700        0.169  -0.580  -0.434  -0.448  -0.328   0.385   0.229
shum           0.243   0.538   0.224   0.528   0.238  -0.534  -0.343
shum700        0.289   0.479   0.195   0.467   0.217  -0.478  -0.295
uwind          0.374  -0.204  -0.384  -0.092  -0.275  -0.068  -0.022
vwind          0.213  -0.356  -0.368  -0.232  -0.286   0.124   0.084
nao            0.069  -0.045  -0.094  -0.030  -0.087  -0.036  -0.012
nao(winter)    0.372  -0.182  -0.235   0.070  -0.207  -0.164   0.016

Model Comparison: Optimal covariates

[Figure: bar chart of the scaled error statistic by covariate set: none 100.0, month 86.1, sm.month 87.2, temp 83.0, slp/temp 74.3.]

We compare prediction errors with different covariates (weights based on global variances). For multiple covariates, the optimal bandwidth calculation is limited to re-scaling the diagonal matrix of the univariate h's, i.e. we assume the relative differences stay the same.

Optimal pair of covariates: slp and temp

[Figure: fitted parameters λ, µ_X, β, γ and η plotted against sea-level pressure (mb) and against temperature (deg C).]

Bandwidths: sea-level pressure: 2.0; temperature: 1.75

Comparison of fit v current approach

[Figure: four panels of mean rate (mm/h) against month, sea-level pressure (mb), temperature (deg C) and zonal wind (m/s).]

Model with covariate Month, v Model with 3 optimal covariates.

Comparison of fit: Other Properties

Breakdown of weighted prediction errors by component (mean over 15 test samples)

[Figure: stacked bars of scaled prediction errors (0 to 100) for the covariate sets none, month, temp, slp/temp and slp/temp/uwind. Components: mean, coeffv5m, coeffv1h, coeffv6h, coeffv24h, skew5m, skew1h, skew6h, skew24h, ac1.5m, ac1.1h, ac1.6h, ac1.24h.]

Interannual Variability

[Figure: two panels, "Observed v fitted (by calendar month)" and "Observed v fitted (NCEP covariates)", each showing mean hourly rainfall (mm) against year, 1950 to 2000.]

Simulated distributions of mean annual rainfall (expressed in mm per hour) for Bochum. The bands show the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles and the thick black line shows the observed values.

Summary

Local mean GMM appears to offer a useful new approach to fitting point-process rainfall models. With just 2 or 3 covariates, we can produce a model with better explanatory power than the current approach, and produce simulations that reflect future climate change scenarios.

Key advantages:
- Can relate various properties of rainfall to covariates, not just the mean or probability of occurrence.
- Extends existing methodology, so can be used with other versions of the model (including spatial-temporal versions).
- Framework allows estimation of standard errors.

Limitations:
- Parameter identifiability still an issue
- Interannual variability improved, but doesn't help fit to extremes very much
- Same level of smoothing for every parameter
- Boundary bias & high variance at boundaries
- Curse of dimensionality

Future developments?

Options:
- Penalised splines: allow additive covariate effects and different levels of smoothing for different parameters. Initial tests with a single covariate gave similar results.
- Addressing extremes: adding other properties and/or covariates, or combining with other model versions?

Acknowledgements
- Richard Chandler (University College London)
- Valerie Isham (University College London)
- Paul Northrop (University College London)
- Christian Onof (Imperial College London)
- EPSRC (UK Engineering and Physical Sciences Research Council)

Useful References

Research presented here
Kaczmarska, J. M., Isham, V. S. & Northrop, P. J. (2015), Local generalised method of moments: an application to point process-based rainfall models, Environmetrics 26 (4), 312-325.

Point process based models/GMM
Onof, C., Chandler, R. E., Kakou, A., Northrop, P., Wheater, H. S. & Isham, V. (2000), Rainfall modelling using Poisson-cluster processes: a review of developments, Stochastic Environmental Research and Risk Assessment 14, 384-411.
Jesus, J. & Chandler, R. E. (2011), Estimating functions and the generalized method of moments, Interface Focus 1 (6), 871-885.

Local fitting
Fan, J. & Gijbels, I. (1996), Local Polynomial Modelling and its Applications, Chapman and Hall.
Lewbel, A. (2007), A local generalized method of moments estimator, Economics Letters 94.
Carroll, R. J., Ruppert, D. & Welsh, A. H. (1998), Local estimating equations, Journal of the American Statistical Association 93 (441), 214-227.