Emma Simpson. 6 September 2013


What is Air Pollution?
Figure: Beijing during periods of low and high air pollution.
Air pollution is composed of sulphur oxides, nitrogen oxides, carbon monoxide and particulates. Particulates are small particles of solid or liquid material in the air. PM 2.5 and PM 10 are particulates that are smaller than 2.5 and 10 micrometres in diameter, respectively.

Measuring Air Pollution
The US Embassy and Chinese Government release hourly PM 2.5 readings for Beijing. Some people believe that there is a discrepancy between the two sources of data. Both use the same formula to calculate the PM 2.5 index $I$ from the concentration $C$:

$$I = \frac{I_{high} - I_{low}}{C_{high} - C_{low}} (C - C_{low}) + I_{low}, \qquad (1)$$

where $C_{low}$ and $C_{high}$ are the breakpoints that bracket $C$, and $I_{low}$ and $I_{high}$ are the corresponding index values. The breakpoints are different for the US AQI and Chinese API.

Air Quality Index & Air Pollution Index

        US breakpoints                    Chinese breakpoints
C_low   C_high  I_low   I_high    C_low   C_high  I_low   I_high
0       12      0       50        0       35      0       50
12.1    35.4    51      100       35.1    75      51      100
35.5    55.4    101     150       75.1    115     101     150
55.5    150.4   151     200       115.1   150     151     200
150.5   250.4   201     300       150.1   250     201     300
250.5   350.4   301     400       250.1   350     301     400
350.5   500     401     500       350.1   500     401     500

Table: PM 2.5 breakpoints for the US AQI and Chinese API.

$$I = \frac{I_{high} - I_{low}}{C_{high} - C_{low}} (C - C_{low}) + I_{low}$$
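The index formula and the breakpoint table can be read as a piecewise-linear lookup. The following is a minimal Python sketch under stated assumptions: the names `pm25_index`, `US_AQI` and `CHINESE_API` are illustrative, the right-hand table columns are taken to be the Chinese breakpoints (as the table caption suggests), and concentrations that fall in the small gaps between rows are clamped to the next row's lower breakpoint.

```python
# Breakpoint rows are (C_low, C_high, I_low, I_high), copied from the table.
US_AQI = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.0, 401, 500),
]
CHINESE_API = [
    (0.0, 35.0, 0, 50),
    (35.1, 75.0, 51, 100),
    (75.1, 115.0, 101, 150),
    (115.1, 150.0, 151, 200),
    (150.1, 250.0, 201, 300),
    (250.1, 350.0, 301, 400),
    (350.1, 500.0, 401, 500),
]


def pm25_index(concentration, breakpoints):
    """Apply I = (I_high - I_low) / (C_high - C_low) * (C - C_low) + I_low."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for c_low, c_high, i_low, i_high in breakpoints:
        if concentration <= c_high:
            # Assumption: a value in the gap between two rows is clamped
            # to the lower breakpoint of the row it falls into.
            c = max(concentration, c_low)
            return (i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low
    raise ValueError("concentration is beyond the last breakpoint")
```

For example, a concentration of 100 falls in the 55.5 to 150.4 row of the US table but in the 75.1 to 115 row of the Chinese table, so the same reading gives a higher US index than Chinese index.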

Air Quality Index & Air Pollution Index
Figure: Plot of AQI/API vs Concentration for the two indices (axes: Concentration and AQI/API, each running from 0 to 500).

Our data consists of:
- six months of hourly PM 2.5 readings for Beijing from the US and Chinese sources;
- twelve years of daily PM 10 readings for Beijing, Tianjin, Shanghai and Suzhou from the Chinese Government.

There are two methods for deciding which data points are extreme:
1. Block maxima: separate the data into blocks and take the maximum value in each block;
2. Threshold exceedances: choose a suitable threshold above which points are considered extreme.
Figure: Two panels, "Block Maxima" and "Threshold Exceedances", each plotting API against Index.
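The two ways of selecting extremes can be sketched in a few lines of Python. The function names `block_maxima` and `threshold_exceedances` are illustrative, and the choice to keep a shorter final block is an assumption made for simplicity.

```python
def block_maxima(values, block_size):
    """Split values into consecutive blocks and return each block's maximum.

    The final block may be shorter than block_size if the series length is
    not a multiple of it (an assumption of this sketch).
    """
    return [
        max(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def threshold_exceedances(values, threshold):
    """Return the observations that lie strictly above the threshold."""
    return [v for v in values if v > threshold]


series = [3, 9, 8, 1, 2, 4, 10, 1, 5]
print(block_maxima(series, 3))           # prints [9, 4, 10]
print(threshold_exceedances(series, 7))  # prints [9, 8, 10]
```

In this toy series the block maxima approach keeps the 4 from a quiet block and discards the 8 that shares a block with the 9, whereas the threshold approach keeps every value above 7.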

Generalised Pareto Distribution (GPD)
For the hourly PM 2.5 data, we first took the daily maxima and then applied a threshold to determine the extremes. A GPD could then be fitted to the data. The GPD has distribution functions of the form:

$$H(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi} & \text{if } \xi \neq 0, \\ 1 - \exp\left(-\dfrac{y}{\sigma}\right) & \text{if } \xi = 0, \end{cases}$$

for $y > 0$, and subject to the constraint $\left(1 + \dfrac{\xi y}{\sigma}\right) > 0$.
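As an illustration, the GPD distribution function can be evaluated directly, and combined with the proportion of observations above the threshold to give the probability of exceeding a higher level. This is a sketch only: `gpd_cdf`, `exceedance_probability` and the `rate` argument (the probability of exceeding the threshold) are hypothetical names, and fitting σ and ξ to data is not shown.

```python
import math


def gpd_cdf(y, sigma, xi):
    """GPD distribution function H(y) for an excess y over the threshold."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if y <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:
        # Only possible when xi < 0: y lies beyond the upper end point.
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def exceedance_probability(x, threshold, sigma, xi, rate):
    """Pr(X > x) for x >= threshold, where rate = Pr(X > threshold)."""
    if x < threshold:
        raise ValueError("x must be at or above the threshold")
    return rate * (1.0 - gpd_cdf(x - threshold, sigma, xi))
```

With `rate` estimated as the fraction of daily maxima above the threshold, `exceedance_probability` gives the kind of quantity that can be compared between two fitted models.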

Testing the Reliability of the US/Chinese Data
We want to test whether there is a difference between the US AQI and Chinese API data. Since the US and Chinese data are measured on different scales, they cannot be compared directly. Instead, we fitted GPD models to the US and Chinese data sets separately and compared the threshold exceedance probabilities. We then used a bootstrapping technique to test for differences.

Suppose we have data $x_1, \ldots, x_n$, and a model fitted to this data with parameters $\theta$. Bootstrapping works as follows:
1. Resample (with replacement) from these $n$ observations, obtaining another sample also of length $n$.
2. Fit the model to the resampled data to get a new set of parameters $\theta_1$.
3. Repeat the process of resampling and fitting the model $N$ times, obtaining new parameters $\theta_i$ each time, for $i = 1, \ldots, N$.
4. The estimates $\theta_1, \ldots, \theta_N$ then allow us to make inferences about the parameter $\theta$.
Block bootstrapping involves resampling blocks of consecutive observations from the original data rather than individual data points, which preserves the short-range dependence within each block.
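The resampling steps can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the names `block_bootstrap` and `percentile_interval` are hypothetical, the blocks are assumed to be non-overlapping, and the confidence interval is assumed to be a simple percentile interval. The `statistic` argument stands in for "fit the model and return the quantity of interest".

```python
import random


def block_bootstrap(data, block_length, n_iterations, statistic, seed=None):
    """Resample whole blocks with replacement and recompute the statistic.

    With block_length=1 this reduces to the ordinary bootstrap.
    """
    rng = random.Random(seed)
    # Assumption: consecutive, non-overlapping blocks; the last may be short.
    blocks = [
        data[i:i + block_length]
        for i in range(0, len(data), block_length)
    ]
    estimates = []
    for _ in range(n_iterations):
        resampled = []
        for _ in range(len(blocks)):
            resampled.extend(rng.choice(blocks))
        estimates.append(statistic(resampled))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Percentile confidence interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    k = int((1.0 - level) / 2.0 * (len(ordered) - 1))
    return ordered[k], ordered[len(ordered) - 1 - k]
```

For daily data, `block_length=7` and `n_iterations=1000` would correspond to blocks of seven days and 1000 iterations.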

Result of the Test
The block bootstrapping procedure was applied to the probabilities that the PM 2.5 concentrations exceed the threshold of 500, with:
- blocks of seven days;
- 1000 iterations.
95% confidence intervals were found for the US and Chinese bootstrapped probabilities. If the confidence intervals overlap, we take this to indicate no significant difference between the sets of data. The confidence intervals were:
US: (0.00530, 0.05989)
Chinese: (0.00540, 0.06373).
The confidence intervals overlap, suggesting there is no significant difference between the two data sets.
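The overlap criterion is simple enough to check mechanically. The sketch below uses the two reported intervals; the helper name `intervals_overlap` is illustrative, and the second interval is assumed to be the Chinese one. Overlap of two intervals is an informal check rather than a formal hypothesis test.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


us_interval = (0.00530, 0.05989)
chinese_interval = (0.00540, 0.06373)  # assumed to be the Chinese interval
print(intervals_overlap(us_interval, chinese_interval))  # prints True
```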

Result of the Test
The boxplots of the bootstrapped probabilities are also very similar.
Figure: Boxplot of the bootstrapped probabilities for the two data sets (scale from 0.00 to 0.10).
This reiterates that there is no significant difference between the data from the US and Chinese sources.

Asymptotic Dependence
It is interesting to investigate whether high API/AQI levels in one city correlate with high readings elsewhere. Two sets of data, $X_1$ and $X_2$, are:
- asymptotically dependent if $\lim_{u \to \infty} \Pr(X_1 > u \mid X_2 > u) = \alpha > 0$;
- asymptotically independent if $\lim_{u \to \infty} \Pr(X_1 > u \mid X_2 > u) = 0$.
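For a finite sample, the conditional probability in this definition can be estimated empirically at a fixed level u and then examined as u increases. The sketch below assumes paired observations that are already on a common scale; the function name `conditional_exceedance` is illustrative.

```python
def conditional_exceedance(x1, x2, u):
    """Empirical estimate of Pr(X1 > u | X2 > u) from paired observations."""
    pairs = list(zip(x1, x2))
    n_conditioning = sum(1 for _, b in pairs if b > u)
    if n_conditioning == 0:
        raise ValueError("no observations of x2 exceed u")
    n_joint = sum(1 for a, b in pairs if a > u and b > u)
    return n_joint / n_conditioning


x1 = [1, 5, 7, 9, 2, 8]
x2 = [2, 6, 3, 8, 9, 7]
print(conditional_exceedance(x1, x2, 5))  # prints 0.5
```

An estimate that stays well above zero as u grows points towards asymptotic dependence, while an estimate that decays towards zero points towards asymptotic independence. Few observations remain at high u, so the estimate becomes noisy there.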

Modelling Bivariate Extremes
The data, $X_1$ and $X_2$, first need to be transformed to unit Fréchet random variables, $Y_1$ and $Y_2$, using a Probability Integral Transform. Then the model is as follows:

$$\Pr(Y_1 > y, Y_2 > y) \sim c(y)\, y^{-1/\eta}, \quad \text{for } y \geq u, \qquad (2)$$

where $u$ is the threshold of interest, $c$ is a slowly varying function of $y$, and $\eta \in (0, 1]$. The parameter $\eta$ can be used as a measure of asymptotic dependence:
- if $\eta = 1$, there is asymptotic dependence;
- if $0 < \eta < 1$, there is asymptotic independence.
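One way to estimate η under this model is sketched below. It is an assumption-laden illustration, not necessarily the method used here: the margins are transformed with an empirical (rank-based) Probability Integral Transform, and η is estimated by applying the Hill estimator to T = min(Y_1, Y_2), whose tail Pr(T > y) is the joint probability in the model. The names `to_unit_frechet` and `estimate_eta`, and the choice of threshold as a sample quantile of T, are hypothetical.

```python
import math


def to_unit_frechet(values):
    """Rank-based transform to approximately unit Frechet margins.

    Uses F_hat = rank / (n + 1) and Y = -1 / log(F_hat). Ties are ranked
    in arbitrary order (a simplification of this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    frechet = [0.0] * n
    for rank, i in enumerate(order, start=1):
        frechet[i] = -1.0 / math.log(rank / (n + 1))
    return frechet


def estimate_eta(x1, x2, quantile=0.9):
    """Hill-type estimate of eta from exceedances of T = min(Y1, Y2)."""
    y1 = to_unit_frechet(x1)
    y2 = to_unit_frechet(x2)
    t = sorted(min(a, b) for a, b in zip(y1, y2))
    u = t[int(quantile * len(t))]  # threshold taken as a sample quantile of T
    log_excesses = [math.log(v / u) for v in t if v > u]
    if not log_excesses:
        raise ValueError("no observations exceed the threshold")
    # eta lies in (0, 1], so the estimate is capped at 1.
    return min(1.0, sum(log_excesses) / len(log_excesses))
```

Passing a function like `estimate_eta` as the statistic in a block bootstrap over paired observations would give a confidence interval for η.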

Comparison Between Beijing and Shanghai
Initially, the asymptotic dependence of the PM 10 levels in Beijing and Shanghai was tested. The estimated η value was 0.619804, which corresponds to asymptotic independence. Applying block bootstrapping gave a 95% confidence interval of (0.4573141, 0.6360939) for the η values. This confidence interval does not contain 1, suggesting that the PM 10 levels in Beijing and Shanghai are asymptotically independent. It is possible that the distance between Beijing and Shanghai explains the asymptotic independence.

Time Series for Shanghai and Suzhou
Figure: Time Series Plot of Shanghai API and Time Series Plot of Suzhou API (API against Time).
PM 10 levels are known to vary between seasons, so we focus on just the summer data for Shanghai and Suzhou.

Asymptotic Dependence
Figure: Plot of Suzhou and Shanghai Summer APIs (axes: Shanghai and Suzhou).
The correlation between all the data is approximately 0.82. There is some positive linear correlation between the PM 10 levels in Shanghai and Suzhou.

Asymptotic Dependence
The results for the bootstrapping of the η values were as follows:
Figure: Bootstrapped Eta Values (axis: Eta, from 0.4 to 0.9).
The 95% confidence interval for the η values was (0.4147481, 0.6823547). This interval does not contain 1, which suggests there is asymptotic independence between the air pollution levels in Shanghai and Suzhou.

The correlation coefficient of 0.82 shows that, overall, there is a positive linear relationship between the PM 10 data from Shanghai and Suzhou. The bootstrapping test, however, found no evidence of asymptotic dependence between the two sets of data. This suggests that there are underlying factors that affect the pollution levels of cities in the same region, but that different factors contribute to the extreme air pollution levels in individual cities.


Any Questions?