Regionalization in Hydrology (Proceedings of the Ljubljana Symposium, April 1990). IAHS Publ. no. 191, 1990.

The use of L-moments for regionalizing flow records in the Rio Uruguai basin: a case study

ROBIN T. CLARKE & LUIS EDGAR MONTENEGRO TERRAZAS
Instituto de Pesquisas Hidráulicas, CP 530, Porto Alegre, RS, Brasil

Abstract This paper explores the use of L-moments to regionalize annual maximum mean daily discharge ($y_1$), using data from 29 sub-basins of the Rio Uruguai in southern Brazil. As first assumptions, a Gumbel distribution was taken to describe the probability distribution of $y_1$, and basin area was taken as the principal basin characteristic in regression analyses. Multivariable (as distinct from multivariate) regressions were used to obtain estimates of (a) L-moments of $y_1$ for ungauged basins and (b) conventional moments of $y_1$ for ungauged basins. The two sets of moments were then used to derive estimates of (i) Gumbel parameters and hence (ii) values of $y_1$ with given return periods, together with their approximate confidence limits. Use of L-moments gave estimates with narrower confidence limits than conventional moments, although the difference was not large.

L'utilisation des moments en L pour la régionalisation des données de crues dans le bassin de l'Uruguay: étude d'un cas

Résumé This paper studies the use of L-moments for regionalizing the annual maximum mean daily discharge ($y_1$), starting from data for 29 sub-basins of the Uruguay river in southern Brazil. The starting hypotheses are that the Gumbel distribution describes the probability distribution of $y_1$ and that basin area is the principal characteristic to be taken into account in the regression analyses. Multivariable (and not multivariate) regressions are used to estimate (a) the L-moments of $y_1$ for basins without discharge measurements and (b) the conventional moments of $y_1$ for the same basins. These two sets of moments are used to evaluate (i) the Gumbel parameters and (ii) the values of $y_1$ for given return periods, together with their approximate confidence intervals. The use of L-moments leads to narrower confidence intervals than those obtained with conventional moments, although the difference is small.
INTRODUCTION

The hydrological application of probability-weighted moments (PWMs) began with Greenwood et al. (1979) and Landwehr et al. (1979). Their exploration has subsequently been much extended by Hosking and Wallis (Hosking et al., 1985; Hosking & Wallis, 1987; Hosking, 1986), who derived from PWMs a set of quantities termed L-moments, which are weighted sums of the expected values of order statistics. Hosking has shown that:
(a) the first two L-moments $l_1$ (equal to the sample mean) and $l_2$, and the L-moment ratios $l_3/l_2$ and $l_4/l_2$, give a useful and informative description of any random sample of statistical data: they summarize a data set in a manner similar to, but in many ways preferable to, that of the conventional sample moments;
(b) estimates of the parameters in probability distributions can be obtained using PWMs: the estimation procedures are commonly more tractable than the method of maximum likelihood, with less frequent recourse to iterative procedures. They are reasonably efficient (typically greater than 90% for means, and 70-80%, or higher, for scale and shape parameters). For small samples of data, estimates given by L-moment procedures are sometimes more accurate than maximum likelihood estimates.

The theory of PWMs parallels the theory of conventional moments, but their principal advantage over conventional moments is that PWMs are more robust and less sensitive to outlying values in the data, and enable more secure inferences to be made, from small samples, about an underlying probability distribution. Hosking (1986) also reports that, compared with conventional moments, L-moments are less subject to bias in estimation and approximate their asymptotic normal distribution more closely in finite samples.
PWMs and L-moments therefore appear to be promising tools for use in many areas of statistical analysis of data, and the purpose of this paper is to report a preliminary study of how L-moments perform when used for regionalization of flow records from sub-basins of the Rio Uruguai. Whilst regionalization commonly involves the multiple regression of flood characteristics on a number of variables that describe basin topography, geology and climate, the exploratory nature of this paper limited the number of independent variables in the regression to one, namely basin area, which commonly accounts for a large part of the total variation when a flow statistic is taken as the dependent variable in a regression.

Calculation of L-moments

We assume that we have an ordered random sample $x_1 < x_2 < x_3 < \ldots < x_n$ with $n > r$, drawn from a probability distribution; the $x_j$ could be, for example, an ordered set of annual maximum discharges. Define:

$$a_r = n^{-1} \sum_{j=1}^{n} \binom{n-j}{r} x_j \bigg/ \binom{n-1}{r}, \qquad r = 0, 1, 2, \ldots, n-1 \tag{1}$$
$$b_r = n^{-1} \sum_{j=1}^{n} \binom{j-1}{r} x_j \bigg/ \binom{n-1}{r}, \qquad r = 0, 1, 2, \ldots, n-1 \tag{2}$$

where by convention $\binom{k}{j} = 0$ if $k < j$. Then Hosking (1986) shows that unbiased estimates of the first four L-moments are:

$$l_1 = a_0 = b_0$$
$$l_2 = a_0 - 2a_1 = 2b_1 - b_0$$
$$l_3 = a_0 - 6a_1 + 6a_2 = 6b_2 - 6b_1 + b_0$$
$$l_4 = a_0 - 12a_1 + 30a_2 - 20a_3 = 20b_3 - 30b_2 + 12b_1 - b_0$$

Natural estimators of the L-moment ratios are then $l_3/l_2$ and $l_4/l_2$. However, whilst $l_1$, $l_2$, $l_3$ and $l_4$ are unbiased estimators, the ratios $l_3/l_2$ and $l_4/l_2$ are not unbiased (Hosking, 1986), although they are consistent estimators.

We now turn to the hydrological data used in the case study that is the subject of this paper, and for which the sample L-moments $l_1$, $l_2$, $l_3$ and $l_4$ were calculated. We calculated the regression relationships between the $l_i$ and basin area; if these regressions account for a large part of the basin-to-basin variation in the $l_i$, there will then be the prospect of securing good estimates of the $l_i$ for other basins for which the area can be estimated (from maps or otherwise) but which have no hydrological records. Having obtained such estimates from the regression equations on basin area, the estimates $\hat{l}_i$ for the ungauged basin can be used to fit a selected probability distribution. Suppose, for example, that the Gumbel distribution $f(x)$ were considered appropriate for representing annual maximum discharge from the ungauged basin, where:

$$f(x) = \phi^{-1} \exp\{-(x - \theta)/\phi\} \exp[-\exp\{-(x - \theta)/\phi\}], \qquad -\infty < x < \infty$$

Then the parameters $\phi$ and $\theta$ could be estimated, using the estimates $\hat{l}_1$ and $\hat{l}_2$ obtained from the regression equations, by means of the following equations:

$$\hat{\phi} = \hat{l}_2 / \log_e 2$$
$$\hat{\theta} = \hat{l}_1 - \gamma \hat{\phi}$$

where $\gamma$ is Euler's constant 0.5772....
Similar equations are given by Hosking (1986) for cases where the desired distributions are gamma, generalized extreme value (GEV), Wakeby, or indeed others which are less widely considered for the representation of hydrological variables. Our own work has included the fitting of GEV distributions; this work is to be reported elsewhere.
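As an illustration of the calculations in this section, the following Python sketch computes the sample quantities $a_r$ and $b_r$ of equations (1) and (2), the first four sample L-moments, and the Gumbel parameters from $l_1$ and $l_2$. The function names (`sample_pwm_a`, `sample_pwm_b`, `sample_l_moments`, `gumbel_from_l_moments`) are hypothetical and are not taken from the paper or from any library; the code assumes a sample of at least four values and is a minimal sketch, not the procedure actually used in the study.

```python
from math import comb, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def sample_pwm_a(x_sorted, r):
    """a_r of equation (1) for a sample sorted in ascending order."""
    n = len(x_sorted)
    # math.comb(k, r) returns 0 when r > k, matching the paper's convention
    total = sum(comb(n - j, r) * x for j, x in enumerate(x_sorted, start=1))
    return total / (n * comb(n - 1, r))


def sample_pwm_b(x_sorted, r):
    """b_r of equation (2) for a sample sorted in ascending order."""
    n = len(x_sorted)
    total = sum(comb(j - 1, r) * x for j, x in enumerate(x_sorted, start=1))
    return total / (n * comb(n - 1, r))


def sample_l_moments(data):
    """Return (l1, l2, l3, l4) computed from the b_r."""
    x = sorted(data)
    if len(x) < 4:
        raise ValueError("at least four observations are needed for l4")
    b0, b1, b2, b3 = (sample_pwm_b(x, r) for r in range(4))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


def gumbel_from_l_moments(l1, l2):
    """Return (theta, phi): Gumbel location and scale from l1 and l2."""
    phi = l2 / log(2.0)
    theta = l1 - EULER_GAMMA * phi
    return theta, phi
```

For the evenly spaced sample 1, 2, 3, 4, 5 the sketch gives $l_1 = 3$, $l_2 = 1$ and $l_3 = l_4 = 0$ (to rounding error), as expected for a symmetric sample; computing $l_2$ as $a_0 - 2a_1$ gives the same value.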
THE DATA

The data used in this exploratory study were the flow data from basins within the Rio Uruguai basin, as published in the volume Boletim Fluviométrico Série F-7.02: Bacia do Rio Uruguai (1979). This volume presents flow data up to 1975; a more thorough analysis would have used the considerable quantity of data collected since that time, but the published data were more readily available and served the purposes of the present preliminary study. Basins with fewer than 10 years of flow record (up to 1975) were not included in the study; in terms of area, the basins ranged from 41 km² for the Rio Antinhas at Ponte do Rio Antinhas to 52 832 km² for the Rio Uruguai at Passo Caxambu.

For each of the 29 basins, estimates of the first four L-moments were calculated for each of the following variables: the annual maximum ($y_1$) of the 365 (or 366) mean daily discharges; the annual minimum ($y_2$) of the mean daily discharges; and the mean daily discharges below which 25% ($y_3$), 50% ($y_4$), 75% ($y_5$) and 90% ($y_6$) of the mean daily discharges lay. For each of these variables, the standard deviation and the coefficients of asymmetry and kurtosis were also calculated. For reasons of space, this paper presents only the results from the analysis of annual maximum mean daily discharge, $y_1$.

RESULTS

We assumed that it was appropriate to describe variation in $y_1$ for each basin by a Gumbel distribution (although there is some evidence, not presented here, for believing that some other distribution would be more suitable).
To give focus to the investigation, it was further assumed (a) that the annual maximum discharges $\hat{y}_P$ with return periods $P^{-1}$ = 50, 100 and 200 years were to be calculated for individual ungauged basins with areas equal to 0.5, 0.75, 1.00, 1.25 and 1.5 times the mean area of the 29 basins used in the analysis; (b) that approximate confidence intervals of $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ were required for each of the estimates $\hat{y}_P$. Since the cumulative distribution function of the Gumbel distribution is given by:

$$F(y) = \exp[-\exp\{-(y - \theta)/\phi\}]$$

in the notation used earlier, the annual maximum discharge with return period $P^{-1}$ years (where $P$ is the annual exceedance probability, so that $F(\hat{y}_P) = 1 - P$) is:

$$\hat{y}_P = \hat{\theta} - \hat{\phi} \ln \ln (1/[1 - P])$$

with variance given by:

$$\operatorname{var} \hat{y}_P = \operatorname{var} \hat{\theta} + \{\ln \ln [1/(1 - P)]\}^2 \operatorname{var} \hat{\phi} - 2 \{\ln \ln [1/(1 - P)]\} \operatorname{cov}(\hat{\theta}, \hat{\phi})$$

in which the variances and covariances are given by:
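$$\operatorname{var} \hat{\phi} = \operatorname{var} \hat{l}_2 / (\ln 2)^2$$
$$\operatorname{var} \hat{\theta} = \operatorname{var} \hat{l}_1 + (\gamma/\ln 2)^2 \operatorname{var} \hat{l}_2 - 2(\gamma/\ln 2) \operatorname{cov}(\hat{l}_1, \hat{l}_2)$$
$$\operatorname{cov}(\hat{\theta}, \hat{\phi}) = (1/\ln 2) \operatorname{cov}(\hat{l}_1, \hat{l}_2) - \gamma \operatorname{var} \hat{l}_2 / (\ln 2)^2$$

Expressions for $\operatorname{var} \hat{l}_1$, $\operatorname{var} \hat{l}_2$ and $\operatorname{cov}(\hat{l}_1, \hat{l}_2)$ can be obtained from the multivariable regression equations of $l_1$ and $l_2$ on basin area $A$, after substitution of $A = 0.5\bar{A}$, $A = 0.75\bar{A}$, $A = 1.0\bar{A}$, $A = 1.25\bar{A}$ and $A = 1.5\bar{A}$ respectively, where $\bar{A}$ is the mean basin area.

The above equations provide estimates of the annual maximum discharges with return periods 50, 100 and 200 years for basins without flow records, when a Gumbel distribution is assumed and when the Gumbel parameters have been regionalized through the regression of L-moments on basin area $A$.

Table 1 Regionalized estimates of annual maximum mean daily discharges (units: m³ s⁻¹) with return periods 50, 100 and 200 years, for basins with areas $0.5\bar{A}$, $0.75\bar{A}$, $1.0\bar{A}$, $1.25\bar{A}$ and $1.5\bar{A}$, with confidence limits: regionalization by regression of L-moments on basin area (Gumbel distribution assumed; $\bar{A}$ is the mean basin area)

| Basin area | Quantity | 50 years | 100 years | 200 years |
| $A = 0.5\bar{A}$ | $\hat{y}_P$ | 1594 | 1775 | 1956 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1329 | ±1509 | ±1690 |
| $A = 0.75\bar{A}$ | $\hat{y}_P$ | 2344 | 2613 | 2882 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1328 | ±1508 | ±1689 |
| $A = 1.0\bar{A}$ | $\hat{y}_P$ | 3093 | 3451 | 3808 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1327 | ±1508 | ±1689 |
| $A = 1.25\bar{A}$ | $\hat{y}_P$ | 3842 | 4290 | 4735 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1328 | ±1508 | ±1689 |
| $A = 1.5\bar{A}$ | $\hat{y}_P$ | 4592 | 5128 | 5661 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1329 | ±1509 | ±1690 |

A similar procedure could be followed in which the mean $m$ and standard deviation $s$ are regionalized, through regressions on basin area, with the Gumbel parameters estimated by the ordinary method of moments. The expression for $\operatorname{var} \hat{y}_P$ remains the same, but the expressions for $\operatorname{var} \hat{\phi}$, $\operatorname{var} \hat{\theta}$ and $\operatorname{cov}(\hat{\phi}, \hat{\theta})$ now become: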
$$\operatorname{var} \hat{\phi} = 6 \operatorname{var} s / \pi^2$$
$$\operatorname{var} \hat{\theta} = \operatorname{var} m + 6\gamma^2 \operatorname{var} s / \pi^2 - 2\gamma \sqrt{6/\pi^2} \operatorname{cov}(m, s)$$
$$\operatorname{cov}(\hat{\phi}, \hat{\theta}) = \sqrt{6/\pi^2} \operatorname{cov}(s, m) - 6\gamma \operatorname{var} s / \pi^2$$

Approximate confidence limits for the regionalized estimates $\hat{y}_P$ are then given by $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$, and the width of this confidence band may be compared with the width of the confidence band obtained by the method of L-moments.

With the assumption that the annual maxima are Gumbel-distributed, Tables 1 and 2 show the estimates of the annual maxima with return periods 50, 100 and 200 years, for ungauged basins with areas equal to multiples of the mean basin area of the 29 basins.

Table 2 Regionalized estimates of annual maximum mean daily discharges (units: m³ s⁻¹) with return periods 50, 100 and 200 years, for basins with areas $0.5\bar{A}$, $0.75\bar{A}$, $1.0\bar{A}$, $1.25\bar{A}$ and $1.5\bar{A}$, with confidence limits: regionalization of basin mean (also equal to $l_1$) and standard deviation by regression on basin area (Gumbel distribution assumed; $\bar{A}$ is the mean basin area)

| Basin area | Quantity | 50 years | 100 years | 200 years |
| $A = 0.5\bar{A}$ | $\hat{y}_P$ | 1560 | 1734 | 1907 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1389 | ±1567 | ±1747 |
| $A = 0.75\bar{A}$ | $\hat{y}_P$ | 2314 | 2578 | 2840 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1388 | ±1566 | ±1745 |
| $A = 1.0\bar{A}$ | $\hat{y}_P$ | 3069 | 3422 | 3774 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1388 | ±1565 | ±1745 |
| $A = 1.25\bar{A}$ | $\hat{y}_P$ | 3823 | 4265 | 4706 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1388 | ±1566 | ±1745 |
| $A = 1.5\bar{A}$ | $\hat{y}_P$ | 4577 | 5109 | 5639 |
| | $\pm 2\sqrt{\operatorname{var} \hat{y}_P}$ | ±1389 | ±1567 | ±1747 |

It can be seen that, where regionalized L-moments were used to estimate the Gumbel parameters, the approximate confidence bands for the estimates of annual maxima with given return periods were consistently narrower than where the regionalized mean and standard deviation were used to estimate the Gumbel parameters. This result is consistent with the theoretical results of Hosking (1986) concerning the
efficiency of L-moment estimation procedures relative to ordinary moment estimation procedures; however, with the assumptions used in the present pilot study it can be seen that the increases in efficiency are not large. Comparison of the estimates $\hat{y}_P$ in Tables 1 and 2 shows that the estimates given by L-moment regionalization are consistently greater than the estimates given by regionalization of the mean and standard deviation. Explanations for this observation have been investigated and will be reported in a later paper.

DISCUSSION

The analyses described in this paper are capable of deeper study in a number of ways. First, the data can be assembled from those basins that were not included in the study because, in 1975, the record length was less than 10 years; these basins provide estimates of $\hat{y}_P$ which can be compared with the regionalized estimates of $\hat{y}_P$. Second, the use of probability distributions other than the Gumbel requires further exploration. Third, the regionalization used in the preliminary study has been of a particularly rudimentary kind, since basin area was the only variable used; clearly other variables descriptive of basin morphology and climate must be included. The results given in the paper are of interest principally because they are consistent with theoretical results derived for simpler cases, and because they show the need for further exploration of L-moment properties.

REFERENCES

Greenwood, J. A., Landwehr, J. M., Matalas, N. C. & Wallis, J. R. (1979) Probability weighted moments: definition and relation to parameters of several distributions expressable in inverse form. Wat. Resour. Res. 15, 1049-1054.
Hosking, J. R. M. (1986) The theory of probability weighted moments. Research Report RC12210, IBM Research, Yorktown Heights, New York.
Hosking, J. R. M. & Wallis, J. R. (1987) Parameter and quantile estimation for the generalised Pareto distribution. Technometrics 29, 339-349.
Hosking, J. R. M., Wallis, J. R. & Wood, E. F. (1985) Estimation of the generalised extreme-value distribution by the method of probability-weighted moments. Technometrics 27, 251-261.
Landwehr, J. M., Matalas, N. C. & Wallis, J. R. (1979) Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles. Wat. Resour. Res. 15, 1055-1064.
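
The quantile and confidence-band formulas given in the RESULTS section for the L-moment route can be sketched in Python as follows. The function names are hypothetical. The inputs `var_l1`, `var_l2` and `cov_l12` stand for the variances and covariance of the regression estimates of $l_1$ and $l_2$ at a chosen basin area; they are assumed to be supplied by the regression analysis, and no values from the study are used here. This is an illustrative sketch under those assumptions, not the authors' program.

```python
from math import log, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma
LN2 = log(2.0)


def reduced_term(return_period):
    """ln ln(1/[1 - P]) with P = 1/return_period (return_period > 1)."""
    p = 1.0 / return_period
    return log(log(1.0 / (1.0 - p)))


def gumbel_quantile(theta, phi, return_period):
    """y_P = theta - phi * ln ln(1/[1 - P])."""
    return theta - phi * reduced_term(return_period)


def gumbel_param_cov_from_l(var_l1, var_l2, cov_l12):
    """Return (var theta, var phi, cov(theta, phi)) from the L-moment terms."""
    var_phi = var_l2 / LN2 ** 2
    var_theta = (var_l1 + (EULER_GAMMA / LN2) ** 2 * var_l2
                 - 2.0 * (EULER_GAMMA / LN2) * cov_l12)
    cov_theta_phi = cov_l12 / LN2 - EULER_GAMMA * var_l2 / LN2 ** 2
    return var_theta, var_phi, cov_theta_phi


def quantile_variance(var_theta, var_phi, cov_theta_phi, return_period):
    """var y_P = var theta + c^2 var phi - 2 c cov(theta, phi)."""
    c = reduced_term(return_period)
    return var_theta + c * c * var_phi - 2.0 * c * cov_theta_phi


def confidence_band(estimate, variance):
    """Approximate limits: estimate -/+ 2 * sqrt(variance)."""
    half_width = 2.0 * sqrt(variance)
    return estimate - half_width, estimate + half_width
```

With $\theta = 0$ and $\phi = 1$, `gumbel_quantile(0.0, 1.0, 50)` returns the Gumbel reduced variate for a 50-year return period, about 3.90.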