Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically, ad parametrically. First, empiricism. Let s take a buch of data. For ow, we ll take flood data the maimum flood for each year for some umber of years: Year Flow, cfs 1945 90 1946 1470 1947 0 1948 970 1949 300 1950 110 1951 490 195 3170 1953 30 1954 1760 1955 8800 1956 880 1957 1310 1958 500 1959 1960 1960 140 1961 4340 196 3060 1963 1780 1964 1380 1965 980 1966 1040 1967 1580 1968 3630 To plot this data empirically, we eed to order these accordig to rak. That is, the highest flow comes first, ad the the et highest, o dow. Year Flow, cfs Rak 1955 8800 1 1956 880 1961 4340 3 1968 3630 4 1953 30 5
195 3170 6 196 3060 7 1949 300 8 1948 970 9 1958 500 10 1951 490 11 1945 90 1 1947 0 13 1960 140 14 1959 1960 15 1963 1780 16 1954 1760 17 1967 1580 18 1946 1470 19 1964 1380 0 1957 1310 1 1950 110 1966 1040 3 1965 980 4 Icidetally, you ca get Ecel to do this for you. Select the data, the go to Data Sort, ad it will order all the data! From here, use the formula: T +1 m Where T is the recurrece iterval, is the umber of years i the record, ad m is the rak. Thus, for our data: Year Flow, cfs Rak T 1955 8800 1 5.00 1956 880 1.50 1961 4340 3 8.33 1968 3630 4 6.5 1953 30 5 5.00 195 3170 6 4.17 196 3060 7 3.57 1949 300 8 3.13 1948 970 9.78 1958 500 10.50 1951 490 11.7 1945 90 1.08 1947 0 13 1.9 1960 140 14 1.79 1959 1960 15 1.67 1963 1780 16 1.56 1954 1760 17 1.47
1967 1580 18 1.39 1946 1470 19 1.3 1964 1380 0 1.5 1957 1310 1 1.19 1950 110 1.14 1966 1040 3 1.09 1965 980 4 1.04 All that s left is to plot T o the horizotal ais ad Flow o the vertical, ad shoot a best-fit lie through the whole mess. Oh, ad typically we plot it o log-log paper: 100000 Flow, cfs 10000 1000 100 1.00 10.00 100.00 T, years By etrapolatig the tred lie, you ca determie the 100-year or 500-year flood. What is actually meat by 100-year flood, by the way, is that it has a 1% chace of happeig every year, ot that it oly happes every 100 years. Here s a ifty formula for determiig frequecy or probability rather tha recurrece. F m 1 + 1 I other words, F 1 1. T Here s the basic problem, though. Especially with small data sets, each ew data poit will sigificatly alter the rak of all the other poits, ad therefore chage the whole curve. As a result, it s ofte
easier to perform this aalysis parametrically, with a few importat statistics. Let s talk. Let s cosider the height of every perso i the room. The result of this (assumig we have adults ad we ve got eough people) is a oddly shaped curve. Most everyoe fits betwee about 150 cm ad 190 cm, with a proouced hump aroud 170 cm or so. However, the distributio tails off to iclude the Shaq s ad the Billy Bartletts of our populatio. This curve is called may thigs, ad is of vital importace to lots of the atural world. It s called the ormal distributio, the bell curve, the Gaussia distributio, or the biomial distributio (after a easy way to create it). Turs out lots of atural distributios look like this especially if there s some sort of cotrol makig a average. To talk about these distributios, we have a umber of parameters that describe ormal distributios. Here they are. Average (aka 1 st momet) the average value of all the idividual values. 1 Variace (aka d momet) the average of the differece betwee each idividual ad the mea. This is a measure of the spread of the data s 1 ( ) 1 where the square is placed to esure a positive umber. To regai the uits of the mea (e.g. the uits of variace i our height eample are cm ), we have Stadard deviatio which is just the positive square root of variace. At this poit we eed to make a little quibble. Wheever you measure a group of objects, like we just did with height, you are takig samples from some larger populatio. Although your sample may approimate the whole populatio, it may ot. Statisticias,
the, draw a distictio betwee the mythic populatio statistic ad the measurable sample statistic. Mea, for eample, is a populatio statistic; average is a sample statistic. As defied, we have sample stadard deviatio, ad we abbreviate it s. Populatio stadard deviatio is defied with N i place of -1, ad is abbreviated. Most of the time, however, geologists blithely igore this distictio, ad refer to average as mea, ad use for sample stadard deviatio. It s all good. These two parameters (mea ad stadard deviatio) suffice to eplai the ormal curve. The equatio for a ormal curve is: p ( ) ( η ) 1 e Oh, right, η is the populatio mea. Just as it s difficult to use fuctios of because they re hard to compare, so are ormal distributios hard to look at. As a result of the whole thig we came up with a set of skewed aes (log paper) that make fuctios of look like straight lies. So it is with the biomial distributio. If you take oe ad sum the compoets rather tha plottig them like a histogram you get somethig called a cumulative frequecy chart or s-curve. We ca also make skewed aes that plot s-curves as straight lies. This ais is called a probability ais. Plottig o probability paper makes thigs easy ormal distributios plot as straight lies, ad determiig stadard deviatio is easy it s the distace o the lie from 50% to 84% (or from 50% to 16%). This eplais the ormal curve icely, ad it would be ice if that were all there was. But there s more. It turs out that there are a umber of curves called quasi-ormal curves. These ivolve two other statistics skewess ad kurtosis. We ll talk about these ow. Skewess (aka 3 rd momet) this is a measure of lea o the curve. Curves that lea to the left are positively skewed, ad those that lea to the right are egatively skewed.
S k 1 1 s ( )( ) ( ) 3 3 Notice that this effectively takes the ratio of stadard deviatio to stadard deviatio, but allows for a sig (because it s cubed), thus allowig for cotributios o oe side to outweigh those o the other, ad force the parameter to be positive or egative. Note that a ormal distributio has a skewess of zero. A sidebar o media. Media ( ˆ ) is like mea i that it shows somethig about where the bulk of the data lie. It is determied, however, by fidig the middle value of the distributio, rather tha averagig all values. That is, if we take our heights from lowest to highest, ad there s 11 of us i the room, the 6 th value coutig up (or dow) is the media value. Why do we care? I a ormal distributio, the mea ad media must lie at the same poit. If the media is to the left of the mea, the bulk of the data is also to the left of the mea, ad the distributio is positively skewed. There you have it. Kurtosis (aka 4 th momet) this somewhat ebulous statistic says somethig about how peaked the curve is. High values of kurtosis represet curves more peaked tha ormal, ad low values flatter. K 1 1 s ( )( )( 3) ( ) 4 4 Note that because these parameters are themselves oly recombiatios of average ad stadard deviatio, effectively these are oly variats of the ormal curve. Why do we care about all this? It turs out that there is o particular reaso why flood data should be ormally distributed, so we may have to use some OTHER distributio. What do I mea by this? Remember that I gave you a mathematical statemet of the ormal distributio: p ( ) ( η ) 1 e
Meaig that if you re give η ad you ca work out the probability of a evet yourself. There are other distributios, though oe of the most popular i flood aalysis is the gamma-3 or Pearso 3 distributio: f ( ) α 1 β α e Γ β ( α ) although the etreme value distributio (EV1 or Gumbel) is ofte used as well: f ( ) e u γ e To use these, simply determie your statistical parameters (amely mea ad stadard deviatio), the covert these to the parameters used i the distributios. Here: α β γ 6 π u 0.577γ While I m here, it s worth talkig about Gamma 3 ad parameters. Most probability distributios (ad there are lots) have two or three parameters that are i tur fuctios of the elemetary statistics we talked about. I Gamma 3, α is called a shape factor, ad β is called a scale factor. This is because varyig β just stretches the fuctio o the y ais, but chagig α chages the shape of the distributio as a whole. {graphs} You could also iclude a parameter that moves the whole distributio back ad forth o the -ais that would be a locatio parameter. The importat thig is that if you read about some other distributio, you may be able to determie (or be told) what the parameters do.
Eough about this. Back to floods. Oe quick solutio would be to take the average ad stadard deviatio of your flood data (or the log of your flood data) ad use these theoretical curves to estimate peak flow istead. For eample, usig the data above, I got: 7.756 s 0.566 So α 1. 97, β 1410.