Analysing River Discharge Data for Flood Analysis

This suggested response provides answers to the eleven questions posed on the web-site. Your report should have covered all of this material.

(A) Produce a plot of the time-series of the data. Visualising your data is a crucial part of any form of data analysis.

(B)
The autocorrelation function is reproduced here.
[Figure: autocorrelation function of the discharge record]
It is clear that in just over 3500 days (3650 days, to be precise) there are 10 full cycles of the data from positive correlation through negative correlation and back. 3650/10 = 365 days. Hence, the dominant periodicity is an annual one due to seasonal variation, as one would expect.

(C) The new time series plot, focussing on the period from 2-3.5 years:
[Figure: discharge time series for years 2 to 3.5]
This major discharge event (which is presumably associated with snow melt, as frontal processes do not have this duration) has a timescale of close to half a year: it takes about 0.5 years from the start of the rise to the return to base level at ~2.9 years.

(D) The annual maximum series would appear to be a useful concept because the dominant periodicity in the data is annual (B). Hence, choosing one event a year should decorrelate the events and make them independent of one another. The largest event in the record has a timescale of close to half a year. Hence, choosing one event a year would permit all such maxima to be sampled, but would not permit two maxima on the same event to be chosen.

(E) The annual maximum series looks like:
[Figure: annual maximum series]
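The reasoning in (B) and (D) can be illustrated with a short sketch. It is only an illustration on a made-up record, not the dataset analysed here: the `amplitudes` list, the base flow of 100 and the helper names `autocorrelation` and `annual_maxima` are all assumptions introduced for the example. It checks that a seasonal record is positively correlated with itself at a lag of one year and negatively at half a year, then keeps one peak per 365-day block.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a given lag (in time steps)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numer / denom

def annual_maxima(series, days_per_year=365):
    """Peak of each consecutive block of `days_per_year` values."""
    n_years = len(series) // days_per_year  # an incomplete final year is dropped
    return [max(series[y * days_per_year:(y + 1) * days_per_year])
            for y in range(n_years)]

# Hypothetical 10-year daily record: a seasonal cycle whose amplitude differs by year.
amplitudes = [50, 60, 55, 70, 45, 65, 50, 60, 55, 50]  # made-up values, m^3 s^-1
daily = [100.0 + amplitudes[t // 365] * math.sin(2.0 * math.pi * t / 365.0)
         for t in range(3650)]

r_half_year = autocorrelation(daily, 182)  # strongly negative for an annual cycle
r_full_year = autocorrelation(daily, 365)  # strongly positive for an annual cycle
ams = annual_maxima(daily)                 # one value per year, ten in total
```

Splitting on fixed 365-day blocks ignores leap years and assumes the record starts at a sensible point in the hydrological year; a real analysis would normally split on calendar or water-year boundaries.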
There is no obvious trend for such maxima to increase or decrease through time, which suggests that climate change has not had a major impact on this record. It is perhaps interesting that the highest recorded values were in the first half of the dataset and the lowest two in the most recent eight years, but this could just be random variation. There may be periods lasting approximately ten years in which high values predominate, followed by periods of lower values, but it is again difficult to place any confidence in this given the length of the record.

(F) b will be the slope of the straight line through the data (the gradient) and a will be the intercept (the constant displacing the graph vertically about the origin). Hence, we can use regression to evaluate the parameters of the Gumbel distribution.

(G) The scatterplot of discharges against the reduced variate:
[Figure: annual maximum discharge against the reduced variate]

(H) The value for the coefficient of determination (R²) is 0.9898 (to 4 decimal places), which shows that our regression line explains the vast majority of the variability in the dataset (nearly 99%). Hence, we have some confidence that the Gumbel distribution describes our data. Our value for R² can be obtained from the following equation:

$$R^2 = \frac{SS_{\mathrm{reg}}}{SS_{\mathrm{reg}} + SS_{\mathrm{res}}}$$

where $SS_{\mathrm{reg}}$ is the regression sum of squares and $SS_{\mathrm{res}}$ is the residual sum of squares. That is, if the total sum of squares is the sum of that due to the regression line and that due to the residuals, R² is the ratio of the variation explained by the regression line to the total variation.
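Steps (F) to (H) can be sketched as a least-squares fit of discharge against the reduced variate, followed by the R² ratio described in (H). This is a minimal sketch under stated assumptions rather than the procedure used to produce the figures here: the plotting position P = m/(n+1) (with m the rank in ascending order) is an assumption, since the text does not say how the non-exceedance probabilities were assigned, and the function names are hypothetical.

```python
import math

def reduced_variate(p):
    """Gumbel reduced variate y = -ln(-ln(P)) for non-exceedance probability P."""
    return -math.log(-math.log(p))

def fit_gumbel(annual_maxima):
    """Regress sorted maxima on the reduced variate; return (a, b, r_squared)."""
    x = sorted(annual_maxima)
    n = len(x)
    # ASSUMPTION: Weibull plotting position P = m / (n + 1), m = 1..n
    y = [reduced_variate(m / (n + 1.0)) for m in range(1, n + 1)]
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((yi - y_mean) * (xi - x_mean) for yi, xi in zip(y, x))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    b = sxy / syy                 # slope
    a = x_mean - b * y_mean       # intercept
    fitted = [a + b * yi for yi in y]
    ss_reg = sum((f - x_mean) ** 2 for f in fitted)
    ss_res = sum((xi - f) ** 2 for xi, f in zip(x, fitted))
    return a, b, ss_reg / (ss_reg + ss_res)

# Synthetic check: maxima lying exactly on x = 500 + 200*y should be recovered,
# with R^2 equal to 1 up to rounding.
n = 20
example = [500.0 + 200.0 * reduced_variate(m / (n + 1.0)) for m in range(1, n + 1)]
a, b, r2 = fit_gumbel(example)
```

A different plotting position (Gringorten's, for instance) would shift the fitted a and b slightly, mainly through the largest and smallest events.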
(I) The graph you obtain should look like:
[Figure: scatterplot with the fitted regression line]
The fitted line seems to do a very good job of describing the data apart from the very smallest events. Indeed, if the analysis is repeated without these 6 points, the value for R² increases to 0.994. This new fit is shown as the black line, below.
[Figure: scatterplot with the refitted line in black]
We appear to have a good fit to the very largest events, and the fact that we overestimate the largest one is probably good, as we would wish our estimate to be a conservative (safer) one.

(J) The added values will be:

Return period (years)   Non-exceedance probability, P   -ln(-ln(P))   Est. discharge (m³ s⁻¹)
10                      0.9                             2.250367      1024.434
20                      0.95                            2.970195      1163.739
50                      0.98                            3.901939      1344.055
100                     0.99                            4.600149      1479.176
200                     0.995                           5.295812      1613.804
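The arithmetic behind these added values can be sketched as follows. The relation P = 1 - 1/T is consistent with the tabulated probabilities (T = 10 gives P = 0.9). The fitted intercept a and slope b are not quoted in the text, so the sketch back-calculates them from the 10-year and 200-year rows; that is an assumption made purely for illustration, and it gives roughly a ≈ 589 and b ≈ 194 m³ s⁻¹.

```python
import math

def reduced_variate(p):
    """Gumbel reduced variate y = -ln(-ln(P))."""
    return -math.log(-math.log(p))

return_periods = [10, 20, 50, 100, 200]                           # years
tabulated_q = [1024.434, 1163.739, 1344.055, 1479.176, 1613.804]  # table values, m^3 s^-1

p = [1.0 - 1.0 / t for t in return_periods]   # non-exceedance probability
y = [reduced_variate(pi) for pi in p]         # third column of the table

# ASSUMPTION: a and b recovered from the first and last table rows,
# because the regression coefficients themselves are not given.
b = (tabulated_q[-1] - tabulated_q[0]) / (y[-1] - y[0])
a = tabulated_q[0] - b * y[0]

# Design discharges for every return period from x = a + b*y
estimated_q = [a + b * yi for yi in y]
```

The intermediate rows (20, 50 and 100 years) are then reproduced from the straight line, which is a quick consistency check on the table.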
When these points are added to the plot (blue triangles) we obtain:
[Figure: fitted line with the design-flood estimates added as blue triangles]
It would appear that our estimates of the ten and twenty year floods are pretty good. There is some doubt about the fifty year event, and as soon as we extrapolate beyond the end of our fit, as we must do for the latter two design floods, our confidence must decrease. For example, are there processes operating at time scales greater than the length of our record that would cause the actual data to plot below or, more worryingly, above the blue triangles if we had more years of data? This uncertainty is inherent to any risk analysis of this type.

(K) There are several possibilities here:

- If we are expecting the climate in the area to warm in the future, then we could partition these data into the warmest and coldest years and see whether the predicted Gumbel fits were significantly different.

- If we know that particular discharge events were associated with different types of events (snow melt, frontal rainfall, convective storms), then we could use predictions of the future relative frequency of these (from model simulations) to modify the mixture of events making up the annual maximum series. For example, if these data consist of 30 snowmelt maxima, 30 frontal rainfall maxima and 28 convective storm maxima, and in the future we expect these percentages to change from 34%, 34%, 32% to 40%, 25%, 35%, then we could fit distributions to the separate extreme event series for the different mechanisms and then produce random realisations of an annual maximum series using the new proportions to see how that would affect the overall fitted curve. This could be repeated 100 times to give a set of Gumbel fits that could be compared to the original to see whether they differed significantly.

- If there are patterns in these data of groups of years with high maxima followed by groups of years where the annual maxima are lower than average, then this suggests that there is a coupling to a climatic fluctuation such as El Niño or the North Atlantic Oscillation. It could be seen whether there is a difference in the models fitted to years that are grouped by their status regarding these climatic fluctuations. If so, one of these groups of years could be seen as providing an analogue for climatic conditions in the future, or model predictions about the changing nature of this climatic oscillation under global warming could be used to infer a possible future annual maximum series curve.

- Another idea would be to extend the actual dataset using palaeodata such as tree-ring data, or similar, to see whether there was a different response in the past (e.g. in the Little Ice Age). If so, perhaps clearer patterns might appear in the annual maximum series, which would assist with the above suggestion.
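The resampling idea in the second suggestion (drawing annual maxima from per-mechanism distributions using the new proportions, 100 times over) could be sketched roughly as below. Everything numerical apart from the 40%, 25%, 35% proportions, the 88-year length implied by 30 + 30 + 28 maxima and the 100 repetitions is hypothetical: the per-mechanism Gumbel parameters in `params` are invented, the plotting position is assumed, and treating each year's maximum as coming from a single randomly chosen mechanism is a simplification.

```python
import math
import random

def reduced_variate(p):
    """Gumbel reduced variate y = -ln(-ln(P))."""
    return -math.log(-math.log(p))

def fit_gumbel(maxima):
    """Least-squares intercept a and slope b of sorted maxima against y."""
    x = sorted(maxima)
    n = len(x)
    y = [reduced_variate(m / (n + 1.0)) for m in range(1, n + 1)]  # assumed plotting position
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b = (sum((yi - y_mean) * (xi - x_mean) for yi, xi in zip(y, x))
         / sum((yi - y_mean) ** 2 for yi in y))
    return x_mean - b * y_mean, b

def simulate_series(params, weights, n_years, rng):
    """One annual maximum per year, from a mechanism picked with the given weights."""
    names = list(params)
    series = []
    for _ in range(n_years):
        name = rng.choices(names, weights=[weights[k] for k in names])[0]
        a, b = params[name]
        u = rng.random()
        while u == 0.0:           # avoid log(0) in the inverse CDF
            u = rng.random()
        series.append(a + b * reduced_variate(u))  # Gumbel inverse-CDF sample
    return series

# HYPOTHETICAL per-mechanism Gumbel parameters (a, b) in m^3 s^-1; not from the data.
params = {"snowmelt": (600.0, 200.0), "frontal": (550.0, 150.0), "convective": (500.0, 250.0)}
future_weights = {"snowmelt": 0.40, "frontal": 0.25, "convective": 0.35}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
fits = [fit_gumbel(simulate_series(params, future_weights, 88, rng)) for _ in range(100)]
slopes = sorted(b for _, b in fits)  # spread of fitted slopes across realisations
```

The spread of the 100 fitted (a, b) pairs could then be compared with the parameters fitted to the observed series to judge whether the change in mixture makes a detectable difference.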
If it transpires that there is no evidence for a change in climate and yet the annual maximum series is increasing or decreasing through time, then there are probably two main explanations: either the instrumentation has changed and the new discharge estimates are not really compatible with those in the past, or there has been land use change in the catchment that has altered the timing of water arriving at the river from different parts of the catchment, so that floods increase if arrival becomes more synchronous and decrease if existing synchronicity is reduced.