WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES?

Harold G. Loomis
Honolulu, HI

ABSTRACT

Most coastal locations have few if any records of tsunami wave heights obtained over various time periods. Still one sees reference to the 100-year and 500-year tsunamis. In fact, in the USA, FEMA requires that at all coastal regions, those wave heights due to tsunamis and hurricanes be specified. The same is required for stream flooding at any location where stream flooding is possible. How are the 100-year and 500-year tsunami wave and stream flooding heights predicted, and how defensible are they? This paper discusses these questions.

Science of Tsunami Hazards, Vol. 24, No. 3, page 218 (2006)

PROBABILITY FUNCTION

The theory of probabilities for extreme events is a well developed subject and is routinely applied to stream and river flooding, wind pressure, minimum and maximum rainfall, life expectancies, breaking of cables and fasteners, and more. The theory and many applications are described in the book by Gumbel [1]. This is an advanced statistics book with lots of definitions and mathematics, from which I am extracting a small part of the theory for application to tsunami wave heights.

Following Gumbel, I use $f(x)$ as the probability density function and $F(x)$ as the probability distribution function, which Gumbel calls simply the probability function, and I will use the same language. In other words,

$$F(x) = \mathrm{prob}(X \le x),$$

where $X$ is the random variable, i.e. the result of an experiment or a measurement, and $F'(x) = f(x)$.

Let's assume that at a given location there actually is some probability function for wave heights of a series of tsunamis over time, and we want to determine what that probability function is and its parameters. At a given location it is assumed that each maximum wave height for a tsunami event is a realization of a random variable and that all of these random variables are drawn from the same probability function $F(x)$. It is this $F(x)$ that we want to determine.

If the collection of $n$ wave heights is arranged according to size, then the variable at each location in the sample has its own probability function, and it is not the same as the overall probability function from which the sample is drawn. This subject is called order statistics. We are interested in the probability function, $F_n(x)$, for the random variable $X_n$, the largest wave height in the sample. If someone were interested in minimum rainfall or minimum breaking strength, they would be interested in $F_1(x)$, the probability function for the smallest value of the sample. In order for the largest member of the sample to be $\le x$ it is necessary that every member of the sample be $\le x$, so, for independently drawn members,

$$F_n(x) = F^n(x).$$
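The relation $F_n(x) = F^n(x)$ can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes an exponential parent distribution with rate 1 and independent draws, and the function names `parent_cdf` and `empirical_max_cdf` are hypothetical.

```python
import math
import random


def parent_cdf(x, rate=1.0):
    """F(x) for an exponential parent distribution (an assumption of this sketch)."""
    return 1.0 - math.exp(-rate * x)


def empirical_max_cdf(x, n, trials=20000, rate=1.0, seed=0):
    """Fraction of simulated samples of size n whose largest member is <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        largest = max(rng.expovariate(rate) for _ in range(n))
        if largest <= x:
            hits += 1
    return hits / trials


# Compare the simulated probability for the largest of n = 10 values
# with the n-th power of the parent probability function.
simulated = empirical_max_cdf(3.0, 10)
predicted = parent_cdf(3.0) ** 10
print(simulated, predicted)
```

The two printed numbers should agree to within sampling error, while both differ clearly from the parent value `parent_cdf(3.0)`.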
Note that the probability function for the largest value in the sample is different from the probability functions of the individual variables in the sample.

Here is where extreme value statistics get interesting. It turns out that there are only 3 asymptotic forms for these extreme value probabilities, depending on whether the probability density function for the individual random variable goes to zero like $e^{-x}$, like $x^{-k}$, or is bounded in some way. Note that this is not a limit as $n \to \infty$ ($n$ is a fixed number!) but rather an asymptotic approximation as $x \to \infty$, which is appropriate because it is for large values of $x$ that we want the probability function. If one knew exactly the original probability function, one could evaluate $F^n(x)$ for any given $x$ and $n$. However, even not knowing the original probability function, but reasoning in some way that it should fall off exponentially, or like a power of $x$, or that the range of $x$ is limited in some way, we can still arrive at the asymptotic probability function for the largest wave height in the sample. In fact, the number $n$ is not required to be known either, as will be shown later.

[1] The book by Gumbel referenced at the end has a bibliography of papers and examples on this subject that is pretty complete up until the year 1958.

Taking first the case where the initial probability function is exactly exponential, we have

$$f(x) = \alpha e^{-\alpha x}, \qquad F(x) = 1 - e^{-\alpha x}.$$

In this case the probability function for the largest value is given by

$$F_n(x) = \left(1 - e^{-\alpha x}\right)^n.$$

The asymptotic value of this expression is given by

$$F_n(x) = \exp\left(-\exp\left(-\alpha (x - b)\right)\right). \tag{1}$$

Such a double exponential function is surprising. One would never guess it from physical principles, but the above function is derived logically, as will be demonstrated. That this is so is similar to the well known fact that

$$\lim_{n \to \infty} \left(1 - \frac{x}{n}\right)^n = e^{-x}.$$

What follows is not a proof (which can be found in Gumbel, in fact two of them) but rather a simple demonstration that the double exponential is reasonable. First a useful value $u_n$, the characteristic largest value, is defined as the largest value one would expect in a sample of size $n$, namely, the value for which $F(u_n) = 1 - 1/n$. If this is in the range where the probability function is approximately exponential, then

$$F(u_n) = 1 - e^{-\alpha u_n} = 1 - \frac{1}{n},$$

from which

$$e^{-\alpha u_n} = \frac{1}{n}.$$

Substituting this in the equation for $F_n(x)$, one has

$$F_n(x) = \left(1 - \frac{1}{n} e^{-\alpha (x - u_n)}\right)^n,$$

which, for large $n$ and with $b = u_n$, gives the asymptotic expression (1) for $F_n(x)$. A similar kind of argument works for the other two assumptions about the nature of the original $F(x)$, leading to moving the original probability function up to the exponential level. However, for tsunamis it seems that this first asymptotic expression is the most reasonable, so we present only the first one.

So how does one find the 100-year and 500-year wave heights? You make an observational probability function out of the existing data, i.e. $X_1, X_2, \ldots, X_n$ are arranged in order of size and $X_m$ is assigned the cumulative probability $m/(n+1)$, so that the plotting points are $(m/(n+1), X_m)$. Gumbel has several sections on the choice of what to use for plotting points, and the one chosen seems to be the best decision. Then the chosen form of the true probability function is fitted to this observational probability function by adjusting $\alpha$ and $b$. $\alpha$ is like a scale factor and $b$ is like the mean. These actually can be estimated from the data according to various statistical formulas, but since we are plotting the data anyway, it is easier to get them from the plot. Normally a change of variable is made so that the plot (if you are using the right probability function) can be fitted with a straight line. With a change of variable we have

$$-\ln(-\ln F) = \alpha (x - b),$$

and $\alpha$ is the slope of the line and $b$ is its intercept on the $x$ axis. Once the line is plotted, one picks off the value of $x$ where $F = .99$, and that is the height of the wave which will be exceeded with probability .01.

If an event has probability $p$ of occurring during each time unit, then $\bar{T}$, the average number of time units between such events, will be $1/p$. This is not a given but is the result of calculating the average return time. For this reason the probability paper usually has the probabilities scaled along the bottom axis and $\bar{T}$ scaled along the top. Similarly, for the 500-year wave, one picks off the wave height corresponding to $F = 1 - 1/500 = .998$.
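The plotting procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper fits the line by eye on probability paper, whereas the sketch swaps in an ordinary least-squares fit, the wave heights are made-up numbers, and the function names are hypothetical.

```python
import math


def gumbel_plot_points(heights):
    """Pair each sorted height X_m with the reduced variate -ln(-ln F),
    using the plotting position F = m / (n + 1)."""
    xs = sorted(heights)
    n = len(xs)
    return [(x, -math.log(-math.log(m / (n + 1))))
            for m, x in enumerate(xs, start=1)]


def fit_line(points):
    """Least-squares fit of z = alpha * (x - b); returns (alpha, b).
    Least squares is a stand-in for drawing the line by eye."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    alpha = sxz / sxx
    b = mean_x - mean_z / alpha
    return alpha, b


def height_for_probability(F, alpha, b):
    """Invert -ln(-ln F) = alpha * (x - b) for x."""
    return b - math.log(-math.log(F)) / alpha


# Hypothetical wave heights in metres, for illustration only.
heights = [0.6, 0.9, 1.2, 1.8, 3.1]
alpha, b = fit_line(gumbel_plot_points(heights))
print(height_for_probability(0.99, alpha, b))   # height at F = .99
print(height_for_probability(0.998, alpha, b))  # height at F = .998
```

With only five points the fitted line is poorly constrained, which is the same uncertainty the text describes for a line drawn through scattered data.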
One can be suspicious of the 500-year wave prediction because one expects geological changes over that period of time, i.e. the sea level could rise significantly or a period of intense volcanic activity could occur. What the 100-year and 500-year predictions really mean is that, given conditions as they are now, the first has a probability of .01/year and the second has a probability of .002/year.

Since the plotted data is scattered about a straight line (hopefully), it is obvious that there is uncertainty in drawing the line and thus in the predictions. Gumbel discusses this and in given instances shows how to calculate these uncertainties. Furthermore, in theory it is possible that the largest value might lie above the line and might be larger than the wave height corresponding to an exceedance probability of .01. In other words, there may be a wave observed in a period of time shorter than 100 years that actually exceeds the predicted 100-year wave.

There is a very useful added advantage: if you have chosen the asymptotic probability function correctly, then you can take the 50-year wave and scale it up to the 100-year wave by a simple arithmetic formula. That is, you can scale from any time interval to any other time interval with this formula. This asymptotic expression for waves with probability $p$ can be used (approximately) to compare maximum wave heights for different time intervals. Suppose that $T_1 = 1/p_1$ and $T_2 = 1/p_2$, then

$$-\ln(-\ln(1 - p_1)) = \alpha (x_1 - b)$$

and

$$-\ln(-\ln(1 - p_2)) = \alpha (x_2 - b).$$

Making use of the series

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots, \qquad \ln(1 - p) = -p - \frac{p^2}{2} - \frac{p^3}{3} - \cdots$$

Since $p$ is small, we can take only the first term of the series, so that the original equations can be written as

$$-\ln(p_1) = \ln(T_1) = \alpha (x_1 - b),$$

$$-\ln(p_2) = \ln(T_2) = \alpha (x_2 - b).$$

Subtracting the first from the second we have

$$\ln(T_2) - \ln(T_1) = \ln(T_2 / T_1) = \alpha (x_2 - x_1),$$

so that

$$x_2 = x_1 + \frac{1}{\alpha} \ln(T_2 / T_1).$$

This is very useful. If $T_2 = 2 T_1$, then

$$x_2 = x_1 + \frac{1}{\alpha} \ln(2), \quad \text{or} \quad x_2 = x_1 + \frac{.693}{\alpha}.$$
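As a rough numerical check of the scaling formula, the sketch below compares it with the height difference obtained from the double exponential function without the small-$p$ approximation. The values $\alpha = 1$ and $b = 0$ and the function names are assumptions chosen only for illustration.

```python
import math


def scale_height(x1, t1, t2, alpha):
    """Approximate scaling: x2 = x1 + ln(T2 / T1) / alpha."""
    return x1 + math.log(t2 / t1) / alpha


def exact_height(t, alpha, b):
    """Height with exceedance probability p = 1/T under the double
    exponential function, without the small-p approximation."""
    p = 1.0 / t
    return b - math.log(-math.log(1.0 - p)) / alpha


alpha, b = 1.0, 0.0  # illustrative values, not taken from any data
x50 = exact_height(50, alpha, b)
x100_exact = exact_height(100, alpha, b)
x100_scaled = scale_height(x50, 50, 100, alpha)
print(x100_exact - x50)   # difference without the approximation
print(x100_scaled - x50)  # ln(2) / alpha
```

For return times of 50 years and longer the two differences agree to within about one percent, so the first-term approximation costs little.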

My first reason for using the extreme value statistics (IUGG, Vancouver 1987) [2] was that it is widely used in many similar situations. Also, it seemed basically right because the asymptotic probability function was the right choice, given exponential fall off of the individual probabilities, no matter what the original probability function was.

In continuing to reflect on the matter, some questions arise. First of all, what is the sample of wave heights from which the maximum is chosen? It could be the collection of wave heights in the near vicinity of the reported wave height, which would surely be the largest.

I should point out that in the application of extreme value statistics to tsunamis, there is not enough data to really determine what probability function to use. The usual test is that the observed cumulative probability function will lie approximately on a straight line when plotted on the correct probability paper, which is in effect the choice of the correct probability function. However, since we have at most 5 values at any location in Hawaii (and many of those values are questionable), this is not a good test. Therefore the choice of the probability function will be mainly an exercise in logical reasoning.

How about augmenting the data with artificial values from imagined tsunamis? There are no probabilities connected with the imagined tsunamis, so that doesn't expand the data for probability calculations.

How about extending wave measurements of a given tsunami to places where no measurements were made, by creating a numerical model of a given tsunami that agrees well at places where the tsunami was measured? This system, which was used for the FEMA maps, has some validity. However, there still are too few data points to decide whether or not a straight line describes them well enough.

If I were to guess what the underlying probability function were for wave heights at any location, I would guess normal or Gaussian [3].
This is based on the Central Limit Theorem, which says that a sum of random variables approaches the Gaussian whatever the probabilities of those random variables. In this case think of the many variables such as source size, location, mechanism, and all of the additional factors affecting runup size at any given shore location. Think of these as random variables. It seems that there are enough variables here to assume Gaussian for the total effect. Gumbel has a section in which he establishes that Gaussian qualifies as being essentially exponential, so that the first extreme value probability function applies. (The conditions to qualify are actually broader than just falling off exponentially!)

Even if it is so that the probability function for wave heights at each point on the shoreline is Gaussian, the value reported and recorded should be treated as following the 1st asymptotic probability function. The reason for this is that the wave height actually reported will be the largest of the wave heights from the immediate vicinity of that location.

[2] IUGG Tsunami Symposium, Vancouver, B.C., August 18-19, 1987.

[3] We would be focusing on the larger tsunamis being fit to the upper end of the Gaussian probability function, since it is probabilities of large tsunamis that we are looking for.

Given that the double exponential probability function for wave height is correct, there is another problem with the prediction of the tsunami wave height with return time 100 years. The following simple calculation will demonstrate the problem. Suppose one has estimated that the wave height $h_1$ is the height exceeded with probability .01 (or $F = .99$). In other words, .01 is the probability that if a tsunami occurs, its size will exceed $h_1$. Suppose that on the average there are 5 significant tsunamis in 100 years. Then the probability that all 5 are less than $h_1$ is

$$F^5(h_1) = (.99)^5 = .951.$$

A larger value $h_2$ must be found so that $F^5(h_2) = .99$, or $F = (.99)^{1/5} = .998$. This would, in fact, be the 500-year wave with probability .002 per tsunami. At the rate of 5 tsunamis/100 years, the probability per year of a tsunami exceeding $h_2$ would be $5 \times .002 = .01$, or on the average, once in 100 years.

The above suggests a scheme appropriate when tsunamis occur rather infrequently, say $k$ per 100 years (based on experience). Assume that the underlying probability function is the 1st asymptotic probability function. It is necessary to create the $-\ln(-\ln y)$ vs. $x$ graph paper with your computer. The observed probability histogram points for the data from a given location are plotted. At this point you can pick off the values of $\alpha$ and $b$ and solve for $x$ for any value of $y$ using the double exponential probability formula. Or graphically you can pick off the value of $x$ for which $y = (.99)^{1/k}$, which gives the 100-year wave at that location.

How well will these methods predict the 100-year and 500-year waves? Unfortunately, or fortunately, we'll never know!

REFERENCES

Gumbel, E.J., Statistics of Extreme Values, Columbia University Press, 1957.

Gumbel, E.J., Statistical Theory of Extreme Values and Some Practical Applications, National Bureau of Standards, Applied Math Series, No. 33.
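The per-tsunami correction described above can be written out as a short calculation. This is a sketch under the paper's own assumptions (the 1st asymptotic probability function and $k$ tsunamis per 100 years); the function names and the sample values of $\alpha$ and $b$ are hypothetical.

```python
import math


def per_tsunami_probability(k, target=0.99):
    """Value of F per tsunami such that all k tsunamis in 100 years stay
    below the height with probability `target`: F = target ** (1 / k)."""
    return target ** (1.0 / k)


def hundred_year_height(k, alpha, b):
    """Solve the double exponential function for x at y = (.99) ** (1 / k)."""
    y = per_tsunami_probability(k)
    return b - math.log(-math.log(y)) / alpha


print(0.99 ** 5)                    # chance that all 5 stay below h_1
print(per_tsunami_probability(5))   # F needed per tsunami
# alpha = 1.0 and b = 0.0 are placeholder values, not fitted to data.
print(hundred_year_height(5, 1.0, 0.0))
```

Because $-\ln\left((.99)^{1/k}\right) = -\ln(.99)/k$, going from one tsunami per 100 years to $k$ per 100 years raises the 100-year height by $\ln(k)/\alpha$.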