Advances in Water Resources 23 (2000) 571–578

Basin level statistical properties of topographic index for North America

Praveen Kumar a,*, Kristine L. Verdin b, Susan K. Greenlee b

a Environmental Hydrology and Hydraulic Engineering, Department of Civil and Environmental Engineering, University of Illinois, Urbana, IL 61801, USA
b Earth Resources Observation Systems (EROS) Data Center, Sioux Falls, SD 57198, USA

Received 17 July 1999; accepted 14 October 1999

Abstract

For land–atmosphere interaction studies several Topmodel based land-surface schemes have been proposed. For the implementation of such models over the continental (and global) scales, statistical properties of the topographic indices are derived using GTOPO30 (30-arc-second; 1 km resolution) DEM data for North America. River basins and drainage network extracted using this dataset are overlaid on computed topographic indices for the continent and statistics are extracted for each basin. A total of 5020 basins are used to cover the entire continent with an average basin size of 3640 km². Typically, the first three statistical moments of the distribution of the topographic indices for each basin are required for modeling. Departures of these statistical moments from those obtained using high resolution data have important implications for the prediction of soil-moisture states in the hydrologic models and consequently on the dynamics of the land–atmosphere interaction. It is found that a simple relationship between the statistics obtained at the 1 km and 90 m resolutions can be developed. The mean, standard deviation, skewness, L-scale and L-skewness all show approximate linear relationships between the two resolutions, making it possible to use the moment estimates from the GTOPO30 data for hydrologic studies by applying a simple linear downscaling scheme. This significantly increases the utility of the GTOPO30 datasets for hydrologic modeling studies. © 2000 Elsevier Science Ltd.
All rights reserved.

* Corresponding author. E-mail address: kumar1@uiuc.edu (P. Kumar).

1. Introduction

The treatment of land-surface heterogeneity in land–atmosphere coupled model studies has emerged as a pressing research issue during the last several years. The concept of topographic index (also called wetness index), originally proposed in the Topmodel [1] for characterizing the distribution of moisture states in a basin, has gained considerable attention, and successful models based on this concept have been developed for land–atmosphere interaction studies [3,4,11,15]. For this purpose basins, rather than the rectangular grids typically used to conform to the atmospheric models, are more appropriate units for modeling the terrestrial hydrologic processes, as they are better suited to capture the heterogeneity arising from topographic controls over surface and sub-surface flow. Regions of flow convergence and low vertical soil-moisture deficit are identified by large values of the topographic index, and low values correspond to uphill areas of flow divergence and/or high vertical soil-moisture deficit. In Topmodel a hydrologic similarity assumption is invoked. This states that all locations in a basin with the same topographic index will have the same hydrologic response. Models that utilize the probability distribution of the topographic index over the basin to capture this behavior for land–atmosphere interaction studies have been tested and validated for individual basins or over limited areas [4,15], but have not been implemented in GCMs, among other reasons, for lack of basin level topographic characteristics with continental scale coverage.

Recently the United States Geological Survey (USGS) has developed a digital elevation model (DEM) at 30-arc-second resolution with global coverage [5]. This enables the efficient estimation of derivative information such as slope, aspect, hydrologic flow paths, flow accumulation and basin boundaries [17].
The basins are represented hierarchically at five levels of subdivision with the average basin size ranging from 2,209,207 km² at Level 1 to 3640 km² at Level 5. These developments pave the way for the implementation of basin level hydrologic models with continental and global coverage.

0309-1708/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0309-1708(99)00049-4
The objective of this paper is to describe the characteristics of the topographic indices at the basin level for North America, extracted from the 30-arc-second DEM. Modeling studies using the data are not discussed. Usually, the first three statistical moments of the distribution of the topographic indices for each basin are required for modeling. These moments enable the parameter estimation of the probability distribution. Typically a 3-parameter gamma (or Pearson-III) distribution is used [14]. The properties of the fitted distribution are sensitive to the DEM resolution and this impacts the performance of the hydrologic model [19]. In order to address this problem in the use of the 30-arc-second DEM, we adopt the following procedure: (i) estimate the topographic index using the single flow algorithm from the 30-arc-second DEM data; (ii) overlay basin boundaries and estimate statistical moments; (iii) analyze topographic indices obtained from high resolution 3-arc-second data for selected regions representing a range of topographic features to quantify the impact of resolution on the estimates of the statistical moments; (iv) establish a functional relationship (downscaling function) to convert statistical moments from the 1 km data to equivalent estimates at the higher resolution. Once the downscaling function is identified, we can convert the estimates from the 30-arc-second data to estimates that would be representative of the topographic index if it were estimated from high resolution data. This method is useful because using high resolution data directly for the estimation of the distributional properties of the topographic index is very difficult due to the enormous volume of the data, especially if global applications are considered.
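Steps (ii) and (iv) of the procedure above can be sketched in a few lines. The following is a minimal illustration, not the code used in the study: it assumes the topographic index and the basin identifiers are available as two aligned grids (lists of rows), uses population (biased) moment estimators, and takes the downscaling coefficients as arguments. The function names `basin_moments` and `downscale` are hypothetical, and no fitted coefficient values are implied.

```python
import math
from collections import defaultdict


def basin_moments(index_grid, basin_grid):
    """Group topographic-index cells by basin id and return
    {basin_id: (mean, standard deviation, skewness)}.

    Cells whose index value or basin id is None are skipped.
    Population (divide-by-n) estimators are an assumption of this sketch.
    """
    cells = defaultdict(list)
    for index_row, basin_row in zip(index_grid, basin_grid):
        for value, basin_id in zip(index_row, basin_row):
            if value is not None and basin_id is not None:
                cells[basin_id].append(value)

    stats = {}
    for basin_id, values in cells.items():
        n = len(values)
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
        m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
        std = math.sqrt(m2)
        skew = m3 / m2 ** 1.5 if m2 > 0 else float("nan")
        stats[basin_id] = (mean, std, skew)
    return stats


def downscale(coarse_value, intercept, slope):
    """Apply a linear downscaling function to a 1 km moment estimate.

    intercept and slope must come from a regression between the two
    resolutions; the values passed by a caller here are placeholders.
    """
    return intercept + slope * coarse_value
```

Small basins contribute very few cells, so the skewness returned for them is correspondingly unreliable.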
In a similar study, Wolock and McCabe [20] studied the differences in topographic attributes obtained at the two different resolutions to identify whether the differences were due to terrain-discretization effects or smoothing. Our work is complementary in that it is geared toward studying the properties at the basin scale for GCM applications. Analysis regions for the higher resolution data are chosen to coincide with those of Wolock and McCabe [20] so that the two results can be used together for further studies.

2. Data description

The US Geological Survey's EROS (Earth Resources Observation Systems) Data Center in Sioux Falls, SD has developed a global digital elevation model called GTOPO30. The resolution of this dataset is 30 arc-seconds ($8\tfrac{1}{3} \times 10^{-3}$ degrees). The vertical resolution is 1 m, and the elevation values for the globe range from −407 to 8752 m above mean sea level. Additional details about the dataset are available in [5]. A subset of this dataset corresponding to the North American continent is used for this study. The dataset was first projected from geographic coordinates to the Lambert Azimuthal Equal Area coordinate system at a resolution of 1 km. This makes each cell, regardless of latitude, represent the same ground dimensions (length and area) as every other cell. Consequently, derivative estimates such as drainage areas, slope, etc. are easier to compute, consistent, and reliable.

The extraction of these hydrographic features from the 30-arc-second dataset is based on the drainage analysis algorithm of Jenson and Domingue [9]. This algorithm first identifies and fills spurious sinks (or pits). Effort is made to preserve natural sinks such as lakes in the landscape. Then, for each cell, the direction of steepest descent from among its eight neighbors is computed. This information is then used to compute the flow accumulation for each cell.
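The two steps just described, steepest descent among eight neighbors followed by flow accumulation, can be illustrated with a small sketch. It is a simplified illustration of the single-flow-direction idea and not the implementation of [9]: it assumes square cells on a projected grid, assumes that spurious sinks have already been filled, and counts each cell as contributing one unit including itself. The names `d8_receivers` and `flow_accumulation` are hypothetical.

```python
import math

# Row and column offsets of the eight neighbours of a cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem, cell_size=1.0):
    """For each cell return the (row, col) of its steepest-descent neighbour,
    or None when no neighbour is lower (an outlet or an unfilled sink)."""
    rows, cols = len(dem), len(dem[0])
    receiver = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            steepest = 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    distance = cell_size * math.hypot(dr, dc)
                    slope = (dem[r][c] - dem[rr][cc]) / distance
                    if slope > steepest:
                        steepest = slope
                        receiver[r][c] = (rr, cc)
    return receiver


def flow_accumulation(dem, receiver):
    """Number of cells draining through each cell (the cell itself included).

    Cells are visited from highest to lowest, so every cell's total is
    complete before it is passed to its strictly lower receiver.
    """
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]
    order = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for _, r, c in order:
        if receiver[r][c] is not None:
            rr, cc = receiver[r][c]
            acc[rr][cc] += acc[r][c]
    return acc
```

Multiplying the accumulated cell count by the cell area gives a drainage area, which is the quantity a channel-initiation threshold would be applied to.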
A 1000 km² threshold is then applied to the flow accumulation values to obtain a drainage network in raster format, which is then vectorized [17]. The drainage network is then used to identify basins and sub-basins. In order to represent the basins hierarchically, a system developed by Pfafstetter [12] is used, which utilizes an efficient coding scheme [17] (the reader is encouraged to refer to the web publication [16]). Basins at five levels of subdivision are developed. For this study, the Level 5 description was chosen, with a mean basin size of 3640 km². Fig. 1 illustrates a typical layout of basin patterns at Level 5 for a region in the north-eastern United States. It is found that at this level the basins provide a sufficient subgrid resolution for GCM applications [3,10,11].

Fig. 1. Level 5 subdivision of a region of the eastern United States overlaid on the GTOPO30 DEM. Black and white lines represent the streams and the basin boundaries, respectively.

In order to assess the impact of the resolution on the estimates of topographic indices, a higher resolution dataset (3-arc-second) available from USGS is also used for selected regions (see Fig. 2). These datasets are available in blocks of 1° × 1° latitude–longitude coverage for the entire United States. The ground dimensions are 92.6 m along the latitude and $92.6 \cos(\pi\theta/180)$ m along the longitude, where $\theta$ is the latitude in degrees. Consequently the ground resolution along the longitude ranges from roughly 80 m in the southern US to 60 m near the Canadian border. This dataset is also referred to as the 90 m resolution data in this paper. Thirty-five 1° × 1° latitude–longitude grids, as shown in Fig. 2, were selected for comparative analysis at the high and low resolutions.

Fig. 2. 1 km DEM data for the North American basin. Overlaid 1° × 1° latitude–longitude boxes indicate regions where 90 m data were used to extract the finer resolution topographic indices.

3. Analysis

The topographic index at any location in the watershed is defined as $\ln(a/\tan\beta)$, where $a$ is the upstream contributing area, from the watershed divide, per unit contour length, and $\tan\beta$ is the local slope. It was estimated by applying the single flow algorithm [18] for the entire continent using the 30-arc-second dataset, and for the selected 1° × 1° latitude–longitude regions (Fig. 2) using the 3-arc-second dataset. At these resolutions the multiple flow algorithm [13] does not provide a better estimate and consequently was not used. Appropriate measures are taken to account for the distortions in the estimates of contributing area and slopes due to the curvilinear latitude–longitude coordinates of the 3-arc-second data.

Spatial statistical moments of topographic indices over each of the 5020 basins were computed. Figs. 3 and 4 show the spatial distribution of these statistical moments. We observe that the mean is generally larger in flat areas and the standard deviation is generally larger in mountainous regions. This is intuitive since in flat regions we tend to have larger contributing areas and smaller slopes, giving rise to a larger topographic index. The variability of the topographic relief and local slopes in mountainous regions gives rise to more spread in the distribution function and consequently to larger values of the standard deviation. Fig. 5 shows the spatial distribution of the skew. As expected, the skew is generally larger in areas where there are large flow accumulations. For a very small fraction of the continental area (2.2%) the skew was found to be negative. The basins corresponding to these areas were typically very small. Consequently very few values were used in computing the skew and therefore the estimates have large estimation errors. We believe that the presence of the negative values is a consequence of the limitation of statistical estimation due to small sample size rather than of the estimation of the topographic index from a low resolution dataset.

For each of the thirty-five 1° × 1° latitude–longitude regions selected for high resolution analysis, the spatial statistical moments were computed from estimates obtained from both the 3-arc-second and 30-arc-second DEM data. Fig. 6 shows the plots of the three moments and skewness obtained at the two resolutions. The regression lines (solid lines) are obtained using the least trimmed squares robust regression, which is based on a genetic algorithm [2]. This algorithm provides better linear fits by excluding outliers as compared to the usual least-squares regression (plotted as dotted lines). On the downside, however, no coefficient of correlation can be estimated. As seen from the figures, there are clear linear relationships for the first three moments.
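To make the contrast between the two fits concrete, the sketch below implements a least trimmed squares line fit next to ordinary least squares. It is only an illustration of the LTS criterion (minimize the sum of the h smallest squared residuals): the search strategy here is random two-point starts refined by concentration steps, which is a substitute chosen for brevity and not the genetic algorithm of [2]. The function names, the default trimming level and the number of starts are all assumptions.

```python
import random


def ols_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lts_line(xs, ys, keep=None, n_starts=200, n_csteps=20, seed=0):
    """Least trimmed squares line; returns (intercept, slope).

    keep is the number h of points whose squared residuals are summed;
    the default (n + 3) // 2 is the usual high-breakdown choice for a line.
    """
    n = len(xs)
    h = keep if keep is not None else (n + 3) // 2
    rng = random.Random(seed)

    def sq_res(k, b0, b1):
        return (ys[k] - b0 - b1 * xs[k]) ** 2

    best = None
    for _ in range(n_starts):
        # Start from the line through two randomly chosen points.
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        previous = None
        for _step in range(n_csteps):
            # Concentration step: refit on the h best-fitting points.
            subset = sorted(range(n), key=lambda k: sq_res(k, b0, b1))[:h]
            if subset == previous or len({xs[k] for k in subset}) < 2:
                break
            b0, b1 = ols_line([xs[k] for k in subset], [ys[k] for k in subset])
            previous = subset
        objective = sum(sorted(sq_res(k, b0, b1) for k in range(n))[:h])
        if best is None or objective < best[0]:
            best = (objective, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]
```

Because the trimmed objective ignores the worst-fitting points, a handful of outlying regions cannot pull the line toward them, which is the behavior attributed to the solid lines in Fig. 6.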
However, there are noticeably large deviations from linearity in the skewness plots (no systematic dependence of these deviations on topographic features could be established). This can have significant influence on the estimate of the probability distribution and subsequently on the predicted hydrologic response. For example, the parameter $\alpha$ of the gamma distribution (see Eq. (A.1)), often used to model the probability distribution of the topographic index [14], is completely determined by the skewness (see Eq. (A.10)). Thus error in this parameter quickly propagates to the model response.

Fig. 3. Spatial distribution of basin mean of the topographic index over North America.

Fig. 4. Spatial distribution of basin standard deviation of the topographic index over North America.

Fig. 5. Spatial distribution of basin skew of the topographic index over North America. Negative skew values for approximately 2.2% of the total continental area are displayed as 0 (dark blue).

Fig. 6. Downscaling functions for obtaining 90 m equivalent of usual statistical moments of topographic index from estimates using the 1 km DEM data for the North American continent. Solid lines represent the linear fit using the least trimmed squares robust regression and the dotted lines represent the usual least-squares regression fit. The equations correspond to the solid lines.

The estimates of second and higher order statistical moments suffer from the limitation that a few very large values can distort the estimate, particularly if the sample size is small. The parameter estimates for a probability distribution obtained from these moments can be severely affected. In order to overcome this limitation, L-moments based on probability weighted moments [6–8] can be used. A probability weighted moment of order $r$, for a random variable $X$ with cumulative distribution function $F(x)$, is given as $\alpha_r = E\{X[1 - F(x)]^r\}$. The L-moments of the first four orders are given as

$$\lambda_1 = \alpha_0,$$
$$\lambda_2 = \alpha_0 - 2\alpha_1,$$
$$\lambda_3 = \alpha_0 - 6\alpha_1 + 6\alpha_2,$$
$$\lambda_4 = \alpha_0 - 12\alpha_1 + 30\alpha_2 - 20\alpha_3.$$

Notice that $\lambda_1$ is the usual mean. The moment $\lambda_2$ is called the L-scale as it measures the spread of the distribution. The L-moment ratio of order $r$ is defined as

$$\tau_r = \lambda_r/\lambda_2, \quad r = 3, 4, \ldots$$

The quantities $\tau_3$ and $\tau_4$ are called L-skewness and L-kurtosis, respectively. Note that all moments are estimated using linear combinations of $X$, thereby overcoming the problem associated with raising a few large values to higher powers. Parameters for several distributions including gamma, normal, log-normal, etc. can be estimated using the L-moments [8].

Fig. 7 (top) shows the plots of L-moments at the two resolutions. Since the first L-moment is the same as the usual first moment, i.e., the mean, only the second ($\lambda_2$: L-scale) and the third ($\tau_3$: L-skewness) moments are tested for linearity at the two resolutions. We see that L-skewness shows a significantly improved linear relationship as compared to the usual skewness estimates. No noticeable improvement in the estimate of L-scale as compared to skewness is gained. The linear approximation for the L-kurtosis between the two resolutions is very weak and hence is not shown. These results suggest that: (i) a simple linear relationship between spatial moments of topographic indices obtained at the two different resolutions (1 km and 90 m) exists; and (ii) L-moments provide a better regression equation to downscale the moment estimates from the low resolution of 1 km to the higher resolution of 90 m.

A sense for the appropriate distribution to describe the topographic indices can be obtained from the L-moment ratio diagram, which is a plot between L-skewness and L-kurtosis [8]. Fig. 7 (bottom) shows this diagram obtained from the estimates at the two resolutions. Comparing these with the theoretical values of the gamma distribution (solid line) we see that in the majority of the cases it provides a better approximation at the higher resolution.
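For readers who want to reproduce the L-moment quantities from a sample of topographic index values, a minimal sketch follows. It uses the standard unbiased sample estimator of the probability weighted moments $\alpha_r$ based on order statistics, which is an assumption here since the text does not state which estimator was used. The function name `sample_l_moments` is hypothetical.

```python
from math import comb


def sample_l_moments(values):
    """Return (lambda_1, lambda_2, tau_3, tau_4) for a sample.

    a[r] is the unbiased estimator of alpha_r = E{X [1 - F(X)]^r}:
        a[r] = (1/n) * sum_j C(n - j, r) / C(n - 1, r) * x_(j),
    with x_(1) <= ... <= x_(n) the ordered sample.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed for tau_4")
    a = [
        sum(comb(n - j, r) * x[j - 1] for j in range(1, n + 1)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    lam1 = a[0]
    lam2 = a[0] - 2 * a[1]
    lam3 = a[0] - 6 * a[1] + 6 * a[2]
    lam4 = a[0] - 12 * a[1] + 30 * a[2] - 20 * a[3]
    # lam2 is zero only when all values are equal; the ratios are undefined then.
    return lam1, lam2, lam3 / lam2, lam4 / lam2
```

Each estimate is a weighted sum of the ordered values themselves, never of their squares or cubes, which is why a few extreme cells affect the L-skewness far less than the conventional skewness.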
The theoretical curves for other commonly used 3-parameter distributions such as the lognormal lie above the gamma distribution curve and are not plotted. In Appendix A we summarize the parameter estimation equations for the gamma distribution using the L-moments, for completeness. Additional details can be found in [8].

Fig. 7. (Top) Same as Fig. 6 but for L-scale and L-skewness. (Bottom) L-moment ratio diagram at the two resolutions. Dots represent the observed values. The theoretical values for the 3-parameter gamma distribution are plotted as the solid line.

4. Discussion and conclusions

The topographic index obtained using the GTOPO30 DEM data captures the general spatial distribution over the North American continent. The existence of simple linear downscaling functions to obtain the first three statistical moments (usual and L-moments) from the 30-arc-second DEM data with global coverage to a resolution that is a factor of 10 higher significantly increases its utility for hydrologic applications. The analysis also suggests that the use of L-moments for downscaling will provide better estimates than the usual moments. For hydrologic applications, this is particularly significant since the models utilizing the data are quite sensitive to the tail of the probability distribution. The strong linear relationship of L-skewness suggests that information about the tail of the distribution is simply scaled but not lost. The L-moment ratio diagram suggests that the gamma distribution provides a reasonable approximation to the probability distribution function at the higher resolution of 90 m. Its parameters can be estimated from the first three L-moments.

Several issues can be raised about the use of these results for Topmodel applications, where typically DEM data at 30 m or higher resolution are recommended. However, the empirically observed linear relationships suggest that in the absence of high resolution data, coarse resolution data as used in this paper, along with the linear downscaling scheme, provide a viable means for the application of Topmodel concepts over significantly larger areas. Obviously, care has to be taken that other model assumptions are valid in the region of interest. In addition, sensitivity studies should be performed to assess the magnitude of the model response error associated with the approximations resulting from the linear downscaling scheme.

The theoretical reason for the empirically observed linear relationship is not clear, but we speculate that it is a result of similarity in topographic features at the different resolutions. Similar conclusions were also reached by Wolock and McCabe [20], who also present a more detailed account of the effects of discretization and smoothing on the estimates of the slopes and contributing area. Whether fractal or multifractal characteristics of topographic features can give rise to such behavior is an intriguing hypothesis and will be pursued in a separate study.
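The statement that the gamma parameters can be estimated from the first three L-moments can be made concrete with a short sketch of the Appendix A recipe, Eqs. (A.6)–(A.12). It assumes the rational approximations for the shape parameter given in [8] and that the three inputs are estimates already downscaled to the target resolution. The function name `gamma_from_l_moments` and the returned dictionary keys are hypothetical.

```python
import math


def gamma_from_l_moments(lam1, lam2, tau3):
    """Estimate the 3-parameter gamma distribution from lambda_1, lambda_2, tau_3.

    Returns a dict with shape alpha, location xi, scale beta, and the
    implied mean, standard deviation and skewness.
    """
    t = abs(tau3)
    if not 0.0 < t < 1.0:
        raise ValueError("this sketch requires 0 < |tau3| < 1")
    if t < 1.0 / 3.0:
        z = 3.0 * math.pi * tau3 ** 2                                   # Eq. (A.6)
        alpha = (1.0 + 0.2906 * z) / (z + 0.1882 * z ** 2 + 0.0442 * z ** 3)
    else:
        z = 1.0 - t                                                     # Eq. (A.7)
        alpha = (0.36067 * z - 0.59567 * z ** 2 + 0.25361 * z ** 3) / (
            1.0 - 2.78861 * z + 2.56096 * z ** 2 - 0.77045 * z ** 3
        )
    mean = lam1                                                         # Eq. (A.8)
    # lgamma keeps the Gamma-function ratio finite when alpha is large.
    gamma_ratio = math.exp(math.lgamma(alpha) - math.lgamma(alpha + 0.5))
    std = lam2 * math.sqrt(math.pi * alpha) * gamma_ratio               # Eq. (A.9)
    skew = math.copysign(2.0 / math.sqrt(alpha), tau3)                  # Eq. (A.10)
    xi = mean - 2.0 * std / skew                                        # Eq. (A.11)
    beta = 0.5 * std * abs(skew)                                        # Eq. (A.12)
    return {"alpha": alpha, "xi": xi, "beta": beta,
            "mean": mean, "std": std, "skew": skew}
```

As a consistency check, a gamma distribution with $\alpha = 4$, $\beta = 1.5$ and $\xi = 2$ has $\lambda_1 = 8$, $\lambda_2 = 1.640625$ and $\tau_3 \approx 0.16466$, and feeding these values back through the function recovers the three parameters to within the accuracy of the approximation.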
Acknowledgements

This research was partially funded by NASA grants NAG5-3661, NAGW-5247, and NAG5-7170, and NSF grant EAR 97-06121. Thanks are also due to Dave Wolock for providing the code to estimate the topographic index from the GTOPO30 dataset using ARC/INFO and to Margie Caisley for performing much of the data analysis work.

Appendix A. Estimation of gamma distribution using L-moments

Parameter estimation for the 3-parameter gamma distribution using the L-moments is described here for completeness (see [8] for details). The probability density function of the gamma distribution is given as

$$f(x) = \frac{(x - \xi)^{\alpha - 1} e^{-(x - \xi)/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \tag{A.1}$$

where $\alpha$, $\xi$, and $\beta$ are the parameters of the distribution. The L-moments are given as
$$\lambda_1 = \xi + \alpha\beta, \tag{A.2}$$

$$\lambda_2 = \pi^{-1/2}\,\beta\,\frac{\Gamma(\alpha + \tfrac{1}{2})}{\Gamma(\alpha)}, \tag{A.3}$$

$$\tau_3 = 6\,I_{1/3}(\alpha, 2\alpha) - 3. \tag{A.4}$$

Here $I_x(p, q)$ is the incomplete beta function ratio

$$I_x(p, q) = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)} \int_0^x t^{p-1} (1 - t)^{q-1}\,dt. \tag{A.5}$$

If $0 < |\hat{\tau}_3| < 1/3$, let $z = 3\pi\hat{\tau}_3^2$, where $\hat{\tau}_3$ is the estimated value of $\tau_3$. Then estimate $\alpha$ as

$$\hat{\alpha} \approx \frac{1 + 0.2906z}{z + 0.1882z^2 + 0.0442z^3}. \tag{A.6}$$

If $1/3 \le |\hat{\tau}_3| < 1$, let $z = 1 - |\hat{\tau}_3|$. Then

$$\hat{\alpha} \approx \frac{0.36067z - 0.59567z^2 + 0.25361z^3}{1 - 2.78861z + 2.56096z^2 - 0.77045z^3}. \tag{A.7}$$

Given $\hat{\alpha}$, estimate

$$\text{mean:} \quad \hat{\mu} = \hat{\lambda}_1, \tag{A.8}$$

$$\text{std. dev.:} \quad \hat{\sigma} = \hat{\lambda}_2\,\pi^{1/2}\,\hat{\alpha}^{1/2}\,\frac{\Gamma(\hat{\alpha})}{\Gamma(\hat{\alpha} + \tfrac{1}{2})}, \tag{A.9}$$

$$\text{skewness:} \quad \hat{\gamma} = 2\hat{\alpha}^{-1/2}\,\mathrm{sign}(\hat{\tau}_3), \tag{A.10}$$

where $\hat{\lambda}_1$ and $\hat{\lambda}_2$ are the estimated values of the L-moments $\lambda_1$ and $\lambda_2$, respectively. Then the estimates of the parameters $\xi$ and $\beta$ are given by the equations

$$\hat{\xi} = \hat{\mu} - 2\hat{\sigma}/\hat{\gamma}, \tag{A.11}$$

$$\hat{\beta} = \tfrac{1}{2}\,\hat{\sigma}\,|\hat{\gamma}|. \tag{A.12}$$

References

[1] Beven KJ, Kirkby MJ. A physically based variable contributing area model of basin hydrology. Hydrol Sci Bull 1979;24(1):43–69.
[2] Burns PJ. A genetic algorithm for robust regression estimation. Statsci Technical Note 1992.
[3] Ducharne A, Koster RD, Suarez MJ, Kumar P. A catchment-based land-surface model for GCMs and the framework for its evaluation. Phys Chem Earth 1999;24(7):769–73.
[4] Famiglietti JS, Wood EF. Multiscale modeling of spatially variable water and energy balance processes. Water Resour Res 1994;30(11):3061–78.
[5] Gesch DB, Verdin KL, Greenlee SK. New land surface digital elevation model covers the Earth. EOS, Transactions, American Geophysical Union, February 9 1999;80(6):69–70.
[6] Greenwood JA, Landwehr JM, Matalas NC, Wallis JR. Probability weighted moments: definition and relation to parameters of several distributions expressible in inverse form. Water Resour Res 1979;15:1049–54.
[7] Hosking JRM. L-moments: analysis and estimation of distributions using linear combinations of order statistics. J Royal Stat Soc Ser B 1990;52:105–24.
[8] Hosking JRM, Wallis JR. Regional frequency analysis: an approach based on L-moments. Cambridge: Cambridge University Press, 1997. p. 224.
[9] Jenson S, Domingue J. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Eng Remote Sensing 1988;54:1593–600.
[10] Koster R, Suarez MJ, Kumar P, Ducharne A. A catchment-based land surface model for GCMs, presented at American Geophysical Union Fall Meeting, December 1997 [Abstract published in EOS, Transactions, American Geophysical Union 1997;78(46):F259].
[11] Chen J, Kumar P. Study of hydrologic response over North America using a basin scale model, presented at American Geophysical Union Spring Meeting, May 1999 [Abstract published in EOS, Transactions, American Geophysical Union 1999;80(17):S159].
[12] Pfafstetter O. Classification of hydrographic basins: coding methodology, unpublished manuscript, DNOS, 18 August 1989, Rio de Janeiro [translated by Verdin JP. US Bureau of Reclamation, Brasilia, Brazil, 5 September 1991].
[13] Quinn P, Beven K, Chevallier P, Planchon O. The prediction of hillslope flow paths for distributed hydrologic modeling using digital terrain models. Hydrol Process 1991;5:59–79.
[14] Sivapalan M, Beven KJ, Wood EF. On hydrologic similarity, 2, a scaled model of storm runoff production. Water Resour Res 1987;23(7):1289–99.
[15] Stieglitz M, Rind D, Famiglietti J, Rosenzweig C. An efficient approach to modeling the topographic control of surface hydrology for regional global climate modeling. J Climate 1997;10:118–37.
[16] Verdin KL. A system for topologically coding global drainage basins and stream networks, http://edcwww.cr.usgs.gov/landdaac/gtopo30/hydro/p311.html, 1997.
[17] Verdin KL, Verdin JP. A topological system for delineation and codification of the Earth's river basins. J Hydrol 1999;218:1–12.
[18] Wolock DM. Simulating the variable-source-area concept of streamflow generation with the watershed model TOPMODEL. Water-Resources Investigations Report 93-4124, US Geological Survey, 1993.
[19] Wolock DM, Price CV. Effects of digital elevation model map scale and data resolution on a topography-based watershed model. Water Resour Res 1994;30(11):3041–52.
[20] Wolock DM, McCabe GJ Jr. Differences in topographic characteristics computed from 100- and 1000-meter resolution digital elevation model. Hydrol Process 2000, to appear.