6 September 2013 Test
What is Air Pollution?
[Figure: Beijing during periods of low and high air pollution.]
Air pollution includes sulphur oxides, nitrogen oxides, carbon monoxide and particulates. Particulates are small particles of solid or liquid material suspended in the air. PM 2.5 and PM 10 are particulates smaller than 2.5 and 10 micrometres in diameter, respectively.
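Measuring
The US Embassy and Chinese Government release hourly PM 2.5 readings for Beijing. Some people believe that there is a discrepancy between the two sources of data. Both use the same formula to calculate the PM 2.5 index $I$ from the concentration $C$:
$$ I = \frac{I_{high} - I_{low}}{C_{high} - C_{low}} \, (C - C_{low}) + I_{low}, \qquad (1) $$
where $C_{low}$ and $C_{high}$ are the breakpoints of the concentration band containing $C$, and $I_{low}$ and $I_{high}$ are the corresponding index values. However, the breakpoints differ between the US Air Quality Index (AQI) and the Chinese Air Pollution Index (API).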
Air Quality Index & Air Pollution Index

              US breakpoints                      Chinese breakpoints
   C low   C high   I low   I high        C low   C high   I low   I high
   0       12       0       50            0       35       0       50
   12.1    35.4     51      100           35.1    75       51      100
   35.5    55.4     101     150           75.1    115      101     150
   55.5    150.4    151     200           115.1   150      151     200
   150.5   250.4    201     300           150.1   250      201     300
   250.5   350.4    301     400           250.1   350      301     400
   350.5   500      401     500           350.1   500      401     500

Table: PM 2.5 breakpoints for the US AQI and Chinese API (concentrations C in µg/m³). Both indices are computed from these breakpoints using formula (1).
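The breakpoint lookup and formula (1) can be sketched in Python. This is a minimal illustration, not the US Embassy's or the Chinese Government's own code; the names US_BREAKPOINTS, CHINESE_BREAKPOINTS and pm25_index are hypothetical, and the sketch assumes concentrations are reported to one decimal place so that every value falls inside a tabulated band.

```python
# Hypothetical sketch of formula (1) using the breakpoints from the table.
# Each row is (C_low, C_high, I_low, I_high); concentrations in µg/m³.
US_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.0, 401, 500),
]
CHINESE_BREAKPOINTS = [
    (0.0, 35.0, 0, 50),
    (35.1, 75.0, 51, 100),
    (75.1, 115.0, 101, 150),
    (115.1, 150.0, 151, 200),
    (150.1, 250.0, 201, 300),
    (250.1, 350.0, 301, 400),
    (350.1, 500.0, 401, 500),
]


def pm25_index(c, breakpoints):
    """Return the index I for concentration c by linear interpolation, formula (1)."""
    for c_low, c_high, i_low, i_high in breakpoints:
        if c_low <= c <= c_high:
            return (i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low
    # Values in the gaps between bands (e.g. 12.05) or above 500 are not handled here.
    raise ValueError(f"concentration {c} is not covered by any breakpoint band")


if __name__ == "__main__":
    for c in (10.0, 40.0, 100.0, 300.0):
        us = pm25_index(c, US_BREAKPOINTS)
        cn = pm25_index(c, CHINESE_BREAKPOINTS)
        print(f"C = {c:6.1f}  US AQI = {us:6.1f}  Chinese API = {cn:6.1f}")
```

At 40 µg/m³, for example, the sketch gives about 112 on the US scale but about 57 on the Chinese scale, so the same concentration maps to quite different index values.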
Air Quality Index & Air Pollution Index
[Figure: Plot of AQI/API vs Concentration; x-axis: Concentration, y-axis: AQI/API.]
Our data consists of: six months of hourly PM 2.5 readings for Beijing from the US and Chinese sources; twelve years of daily PM 10 readings for Beijing, Tianjin, Shanghai and Suzhou from the Chinese Government.
There are two methods for deciding which data points are extreme:
[Figure: two panels, "Block Maxima" and "Threshold Exceedances", each plotting API against Index.]
1. Block maxima: separate the data into blocks and take the maximum value in each block;
2. Threshold exceedances: choose a suitable threshold above which points are considered extreme.
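A minimal Python sketch of the two approaches, with hypothetical helper names. It drops any incomplete final block (an assumption) and returns threshold excesses x - u, the form to which a GPD is usually fitted.

```python
def block_maxima(values, block_size):
    """Method 1: split the series into consecutive blocks and keep each block's maximum.

    An incomplete final block is dropped (an assumption made for this sketch).
    """
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def threshold_exceedances(values, threshold):
    """Method 2: keep only points above the threshold u, returned as excesses x - u."""
    return [x - threshold for x in values if x > threshold]


if __name__ == "__main__":
    # Daily maxima of hourly readings, then a threshold, as done for the PM 2.5 data.
    hourly = [50, 80, 120, 90] * 6 + [200, 310, 150, 140] * 6  # two hypothetical days
    daily_max = block_maxima(hourly, block_size=24)
    print(daily_max)  # [120, 310]
    print(threshold_exceedances(daily_max, threshold=150))  # [160]
```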
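Generalised Pareto Distribution (GPD)
For the hourly PM 2.5 data, we first took the daily maxima and then applied a threshold to determine the extremes. A GPD could then be fitted to the threshold excesses. The GPD has distribution functions of the form
$$ H(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi} & \text{if } \xi \neq 0, \\[4pt] 1 - \exp\left(-\dfrac{y}{\sigma}\right) & \text{if } \xi = 0, \end{cases} $$
for $y > 0$, subject to the constraint $1 + \xi y / \sigma > 0$, where $\sigma > 0$ is the scale parameter and $\xi$ is the shape parameter.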
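The distribution function H(y) translates directly into code. This is a minimal Python sketch with a hypothetical function name, gpd_cdf; it returns 0 for y ≤ 0 and 1 beyond the upper end point −σ/ξ that exists when ξ < 0.

```python
import math


def gpd_cdf(y, sigma, xi):
    """Distribution function H(y) of the GPD with scale sigma > 0 and shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if y <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-y / sigma)  # exponential limit of the xi != 0 case
    z = 1.0 + xi * y / sigma
    if z <= 0:
        # Only possible when xi < 0: y lies beyond the upper end point -sigma / xi.
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


if __name__ == "__main__":
    for xi in (-0.2, 0.0, 0.2):
        print(xi, [round(gpd_cdf(y, sigma=10.0, xi=xi), 4) for y in (5, 20, 60)])
```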
Testing the Reliability of the US/Chinese Data
We want to test whether there is a difference between the US AQI and Chinese API data. Since the US and Chinese data are measured on different scales, they cannot be compared directly. Instead, we fitted GPD models to the US and Chinese data sets separately and compared the threshold exceedance probabilities. We then used a bootstrapping technique to test for differences.
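One way to turn a fitted GPD into a threshold exceedance probability is $\Pr(X > x) = \zeta_u \, [1 + \xi (x - u)/\sigma]^{-1/\xi}$ for $x > u$, where $\zeta_u$ is the proportion of points above the threshold $u$. The Python sketch below is hypothetical: the slides do not say how the GPD was fitted, and for brevity it uses simple method-of-moments estimates (valid only for ξ < 1/2) rather than the maximum likelihood fit that is usually preferred.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (sigma, xi) for GPD excesses.

    Uses mean = sigma / (1 - xi) and variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)),
    which only exist for xi < 1/2.
    """
    m = statistics.mean(excesses)
    r = m * m / statistics.variance(excesses)  # r = 1 - 2 xi
    return 0.5 * m * (1.0 + r), 0.5 * (1.0 - r)


def prob_exceed(x, data, u):
    """Estimate Pr(X > x) for x > u from a GPD fitted to the excesses over u."""
    excesses = [v - u for v in data if v > u]
    zeta_u = len(excesses) / len(data)  # estimated Pr(X > u)
    sigma, xi = fit_gpd_moments(excesses)
    y = x - u
    if xi == 0:
        return zeta_u * math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    return zeta_u * z ** (-1.0 / xi) if z > 0 else 0.0


def sample_gpd(n, sigma, xi, rng):
    """Simulate GPD values by inverting H(y): y = sigma / xi * (U^(-xi) - 1), xi != 0."""
    return [sigma / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


if __name__ == "__main__":
    # Simulated stand-in for one data set; comparing the US and Chinese data would
    # mean calling prob_exceed on each series with the same x and u.
    rng = random.Random(2013)
    data = sample_gpd(20_000, sigma=10.0, xi=0.1, rng=rng)
    print(fit_gpd_moments(data))
    print(prob_exceed(30.0, data, u=0.0), 1.3 ** -10)  # estimate vs true value
```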
Suppose we have data $x_1, \dots, x_n$ and a model fitted to these data with parameters $\theta$. Bootstrapping works as follows:
1. Resample (with replacement) from these $n$ observations, obtaining another sample also of length $n$.
2. Fit the model to the resampled data to get a new set of parameters $\theta_1$.
3. Repeat the process of resampling and fitting the model $N$ times, obtaining new parameters $\theta_i$ each time, for $i = 1, \dots, N$.
4. The estimates $\theta_1, \dots, \theta_N$ then allow us to make inferences about the parameter $\theta$, for example by forming confidence intervals.
Block bootstrapping resamples blocks of consecutive observations rather than individual data points, which preserves the dependence between neighbouring observations within each block.
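Steps 1 to 4, and the block variant, can be sketched in Python with the standard library. The helper names are hypothetical, statistic stands in for "fit the model and return the parameter of interest", and the block version draws overlapping (moving) blocks, which is an assumption: the slides do not say how the blocks were drawn.

```python
import random
import statistics


def bootstrap(data, statistic, n_iter, rng):
    """Steps 1-4: resample n points with replacement and recompute the statistic."""
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_iter)]


def block_bootstrap(data, statistic, block_len, n_iter, rng):
    """Block variant: glue together randomly chosen runs of block_len consecutive
    points until the resample has length n, then truncate it to n."""
    n = len(data)
    starts = range(n - block_len + 1)  # start index of every possible (overlapping) block
    replicates = []
    for _ in range(n_iter):
        sample = []
        while len(sample) < n:
            s = rng.choice(starts)
            sample.extend(data[s:s + block_len])
        replicates.append(statistic(sample[:n]))
    return replicates


if __name__ == "__main__":
    rng = random.Random(1)
    daily = [rng.gauss(100.0, 20.0) for _ in range(180)]  # hypothetical daily series
    reps = block_bootstrap(daily, statistics.mean, block_len=7, n_iter=1000, rng=rng)
    print(min(reps), max(reps))
```

With daily data, block_len=7 corresponds to the seven-day blocks used in the test.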
Result of the Test
The block bootstrapping procedure was applied to the probabilities that the PM 2.5 concentrations exceed the threshold of 500, using blocks of seven days and 1000 iterations. 95% confidence intervals were found for the US and Chinese bootstrapped probabilities; we take overlapping intervals to indicate no significant difference between the two data sets. The confidence intervals were:
US: (0.00530, 0.05989)
Chinese: (0.00540, 0.06373)
The confidence intervals overlap, suggesting there is no significant difference between the two data sets.
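The interval and overlap check can be sketched as follows. This assumes simple percentile intervals (the slides do not say which bootstrap interval was used), and the function names are hypothetical.

```python
import math


def percentile_interval(replicates, level=0.95):
    """Percentile bootstrap interval: empirical (1 - level)/2 and (1 + level)/2 quantiles."""
    xs = sorted(replicates)
    n = len(xs)
    alpha = (1.0 - level) / 2.0
    lower = xs[math.floor(alpha * (n - 1))]
    upper = xs[math.ceil((1.0 - alpha) * (n - 1))]
    return lower, upper


def intervals_overlap(a, b):
    """True if the closed intervals a = (a_lo, a_hi) and b = (b_lo, b_hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(interval, value):
    """True if value lies inside the closed interval."""
    return interval[0] <= value <= interval[1]


if __name__ == "__main__":
    us = (0.00530, 0.05989)
    chinese = (0.00540, 0.06373)
    print(intervals_overlap(us, chinese))  # True
```

The same contains helper expresses the later η checks: an interval such as (0.4573141, 0.6360939) does not contain 1.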
Result of the Test
The boxplots of the bootstrapped probabilities are also very similar.
[Figure: Boxplot of the bootstrapped probabilities for the US and Chinese data.]
This reiterates that there is no significant difference between the data from the US and Chinese sources.
Asymptotic Dependence
It is interesting to investigate whether high API/AQI levels in one city coincide with high readings elsewhere. Two sets of data, $X_1$ and $X_2$, are:
asymptotically dependent if $\lim_{u \to \infty} \Pr(X_1 > u \mid X_2 > u) = \alpha > 0$;
asymptotically independent if $\lim_{u \to \infty} \Pr(X_1 > u \mid X_2 > u) = 0$.
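The limit cannot be evaluated from finite data, but the conditional probability can be estimated empirically at increasingly high thresholds. A hypothetical Python sketch: rather than one common threshold u, it uses the q-quantile of each series, which puts both on a common scale (an assumption, since API levels in two cities need not share a distribution).

```python
import random


def empirical_chi(x1, x2, q):
    """Empirical Pr(X1 > u1 | X2 > u2), with u1, u2 the q-quantiles of x1 and x2.

    Estimates that stay away from 0 as q -> 1 point towards asymptotic dependence;
    estimates that fall towards 0 point towards asymptotic independence.
    """
    if len(x1) != len(x2):
        raise ValueError("series must be paired")
    n = len(x1)
    k = int(q * (n - 1))
    u1, u2 = sorted(x1)[k], sorted(x2)[k]
    above2 = [i for i in range(n) if x2[i] > u2]
    if not above2:
        raise ValueError("no points above the threshold; lower q")
    return sum(1 for i in above2 if x1[i] > u1) / len(above2)


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.random() for _ in range(10_000)]
    noise = [rng.random() for _ in range(10_000)]
    for q in (0.9, 0.95, 0.99):
        # Identical series give 1; independent series give roughly 1 - q.
        print(q, empirical_chi(z, z, q), empirical_chi(z, noise, q))
```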
Modelling Bivariate Extremes
The data $X_1$ and $X_2$ first need to be transformed to unit Fréchet random variables $Y_1$ and $Y_2$ using a probability integral transform. The model is then
$$ \Pr(Y_1 > y, Y_2 > y) \sim c(y) \, y^{-1/\eta}, \quad \text{for } y > u, \qquad (2) $$
where $u$ is the threshold of interest, $c$ is a slowly varying function of $y$, and $\eta \in (0, 1]$. The parameter $\eta$ can be used as a measure of asymptotic dependence: if $\eta = 1$ (and $c(y)$ does not tend to 0), there is asymptotic dependence; if $0 < \eta < 1$, there is asymptotic independence.
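A minimal Python sketch of the two steps, under stated assumptions: the probability integral transform uses the empirical distribution function (rank / (n + 1)), and η is estimated with the Hill estimator applied to T = min(Y_1, Y_2), which is one common choice; the slides do not say which estimator was used. By (2), Pr(T > t) = Pr(Y_1 > t, Y_2 > t) ≈ c(t) t^(−1/η), so the tail of T has index 1/η and the Hill estimator targets η directly. The function names are hypothetical.

```python
import math
import random


def to_unit_frechet(x):
    """Empirical probability integral transform to unit Fréchet margins.

    F_hat(x_i) = rank_i / (n + 1) lies in (0, 1); since the unit Fréchet distribution
    function is exp(-1/y), the transformed value is y_i = -1 / log(F_hat(x_i)).
    Ties are broken by position (a simplification).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    y = [0.0] * n
    for rank, i in enumerate(order, start=1):
        y[i] = -1.0 / math.log(rank / (n + 1))
    return y


def hill_eta(y1, y2, k):
    """Hill estimate of eta from the k largest values of T = min(Y1, Y2)."""
    t = sorted(min(a, b) for a, b in zip(y1, y2))
    t_k = t[-(k + 1)]  # the (k+1)-th largest value acts as the threshold u
    return sum(math.log(v / t_k) for v in t[-k:]) / k


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5000
    common = [rng.random() for _ in range(n)]
    other = [rng.random() for _ in range(n)]
    y_common = to_unit_frechet(common)
    # Identical series (perfect dependence): eta should be close to 1.
    print(hill_eta(y_common, y_common, k=100))
    # Independent series: eta should be close to 1/2.
    print(hill_eta(y_common, to_unit_frechet(other), k=100))
```

The choice of k plays the role of the threshold u in (2): smaller k uses only the most extreme points, at the cost of higher variance.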
Comparison Between Beijing and Shanghai
Initially, we tested the asymptotic dependence of the PM 10 levels in Beijing and Shanghai. The estimated η was 0.619804, which indicates asymptotic independence. Applying block bootstrapping gave a 95% confidence interval of (0.4573141, 0.6360939) for η. This confidence interval does not contain 1, suggesting that the PM 10 levels in Beijing and Shanghai are asymptotically independent. This may be due to the distance between Beijing and Shanghai.
Time Series for Shanghai and Suzhou
[Figure: Time series plots of the Shanghai API and the Suzhou API; x-axis: Time, y-axis: API.]
PM 10 levels are known to vary between seasons, so we focus on just the summer data for Shanghai and Suzhou.
Asymptotic Dependence: Shanghai and Suzhou
[Figure: Plot of Suzhou and Shanghai Summer APIs.]
The correlation coefficient for all the data is approximately 0.82, indicating a clear positive linear relationship between the PM 10 levels in Shanghai and Suzhou.
Asymptotic Dependence: Shanghai and Suzhou
The results of bootstrapping the η values were as follows:
[Figure: Bootstrapped Eta Values; y-axis: Eta.]
The 95% confidence interval for η was (0.4147481, 0.6823547). Since this interval does not contain 1, it suggests that there is asymptotic independence between the air pollution levels in Shanghai and Suzhou.
The correlation coefficient of 0.82 shows that, overall, there is a positive linear relationship between the PM 10 data from Shanghai and Suzhou. However, the bootstrapping test indicated no asymptotic dependence between the two sets of data. We can conclude that there are underlying factors that affect the pollution levels of cities in the same region, but that different factors contribute to the extreme air pollution levels in individual cities.
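A standard benchmark, not fitted to these data, shows how strong correlation and asymptotic independence can coexist: for a bivariate normal distribution with correlation ρ < 1, model (2) holds with η = (1 + ρ)/2.

```latex
\eta = \frac{1 + \rho}{2}, \qquad \rho = 0.82 \;\Rightarrow\; \eta = \frac{1 + 0.82}{2} = 0.91 < 1 .
```

So even a Gaussian pair with a correlation of 0.82 would be asymptotically independent.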
Any Questions?