WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES?


Harold G. Loomis
Honolulu, HI

ABSTRACT

Most coastal locations have few, if any, records of tsunami wave heights obtained over various time periods. Still, one sees references to the 100-year and 500-year tsunamis. In fact, in the USA, FEMA requires that these wave heights due to tsunamis and hurricanes be specified for all coastal regions. The same is required for stream flooding at any location where stream flooding is possible. How are the 100-year and 500-year tsunami wave and stream flooding heights predicted, and how defensible are they? This paper discusses these questions.

Science of Tsunami Hazards, Vol. 24, No. 3, page 218 (2006)

PROBABILITY FUNCTION

The theory of probabilities for extreme events is a well-developed subject and is routinely applied to stream and river flooding, wind pressure, minimum and maximum rainfall, life expectancies, breaking of cables and fasteners, and more. The theory and many applications are described in the book by Gumbel (see footnote 1). This is an advanced statistics book with many definitions and much mathematics, from which I am extracting a small part of the theory for application to tsunami wave heights.

Following Gumbel, I use $f(x)$ as the probability density function and $F(x)$ as the probability distribution function, which Gumbel calls simply the probability function, and I will use the same language. In other words,

$$F(x) = \mathrm{prob}(X \le x),$$

where $X$ is the random variable, i.e. the result of an experiment or a measurement, and $F'(x) = f(x)$.

Let's assume that at a given location there actually is some probability function for the wave heights of a series of tsunamis over time, and that we want to determine what that probability function is and what its parameters are. At a given location it is assumed that each maximum wave height for a tsunami event is a realization of a random variable and that all of these random variables are drawn from the same probability function $F(x)$. It is this $F(x)$ that we want to determine.

If the collection of $n$ wave heights is arranged according to size, then the variable at each position in the ordered sample has its own probability function, and it is not the same as the overall probability function from which the sample is drawn. This subject is called order statistics. We are interested in the probability function $F_n(x)$ for the random variable $X_n$, the largest wave height in the sample. Someone interested in minimum rainfall or minimum breaking strength would instead be interested in $F_1(x)$, the probability function for the smallest value of the sample. In order for the largest member of the sample to be $\le x$, it is necessary that every member of the sample be $\le x$, so

$$F_n(x) = [F(x)]^n.$$
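
The relation $F_n(x) = [F(x)]^n$ can be checked numerically. The following is a minimal sketch in Python, assuming an exponential parent distribution with $\alpha = 1$ and a sample size of $n = 10$. Both choices, and the function names, are illustrative assumptions and are not taken from any tsunami data. It draws many samples, records the largest value in each, and compares the observed fraction of maxima not exceeding $x$ with $[F(x)]^n$.

```python
import math
import random


def parent_cdf(x, alpha=1.0):
    """F(x) for an exponential parent, chosen here only for illustration."""
    return 1.0 - math.exp(-alpha * x) if x > 0 else 0.0


def empirical_cdf_of_max(x, n, trials=20000, alpha=1.0, seed=0):
    """Fraction of simulated samples of size n whose largest value is <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        largest = max(rng.expovariate(alpha) for _ in range(n))
        if largest <= x:
            hits += 1
    return hits / trials


n = 10  # assumed sample size
for x in (1.0, 2.0, 3.0, 4.0):
    simulated = empirical_cdf_of_max(x, n)
    predicted = parent_cdf(x) ** n
    print(f"x={x:.1f}  simulated={simulated:.3f}  F(x)^n={predicted:.3f}")
```

The two columns should agree to within sampling error, whatever parent distribution is substituted, since the argument uses only the independence of the sample members.
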
Note that the probability function for the largest value in the sample is different from the probability functions of the individual variables in the sample. Here is where extreme value statistics get interesting. It turns out that there are only three asymptotic forms for these extreme value probabilities, depending on whether the probability density function $f(x) = F'(x)$ for the individual random variable goes to zero like $e^{-x}$, goes to zero like $x^{-k}$, or is bounded in some way. Note that this is not a limit as $n \to \infty$ ($n$ is a fixed number!) but rather an asymptotic approximation as $x \to \infty$, which is appropriate because it is for large values of $x$ that we want the probability function.

Footnote 1: The book by Gumbel referenced at the end has a bibliography of papers and examples on this subject that is fairly complete up to the year 1958.

If one knew the original probability function exactly, one could evaluate $F_n(x)$ for any given $x$ and $n$. However, even without knowing the original probability function, but reasoning in some way that it should fall off exponentially, or like a power of $x$, or that the range of $x$ is limited in some way, we can still arrive at the asymptotic probability function for the largest wave height in the sample. In fact, the number $n$ is not required to be known either, as will be shown later.

Taking first the case where the initial probability function is exactly exponential, we have

$$f(x) = \alpha e^{-\alpha x}, \qquad F(x) = 1 - e^{-\alpha x}.$$

In this case the probability function for the largest value is given by

$$F_n(x) = \left(1 - e^{-\alpha x}\right)^n.$$

The asymptotic value of this expression is given by

$$F_n(x) = \exp\left(-\exp\left(-\alpha(x - b)\right)\right). \tag{1}$$

Such a double exponential function is surprising. One would never guess it from physical principles, but the above function is derived logically, as will be demonstrated. That this is so is similar to the well-known fact that

$$\lim_{n \to \infty} \left(1 - \frac{x}{n}\right)^n = e^{-x}.$$

What follows is not a proof (two of which can be found in Gumbel) but rather a simple demonstration that the double exponential is reasonable.

First, a useful value $u_n$, the characteristic largest value, is defined as the largest value one would expect in a sample of size $n$, namely the value for which $F(u_n) = 1 - 1/n$. If this is in the range where the probability function is approximately exponential, then

$$F(u_n) = 1 - e^{-\alpha u_n} = 1 - \frac{1}{n},$$

from which

$$e^{-\alpha u_n} = \frac{1}{n}.$$

Substituting $e^{-\alpha u_n} = 1/n$ in the equation $F_n(x) = (1 - e^{-\alpha x})^n$, and writing $e^{-\alpha x} = e^{-\alpha u_n} e^{-\alpha(x - u_n)}$, one has

$$F_n(x) = \left(1 - \frac{1}{n} e^{-\alpha(x - u_n)}\right)^n,$$

which gives the asymptotic expression (1) for $F_n(x)$, with $b = u_n$. A similar kind of argument works for the other two assumptions about the nature of the original $F(x)$, leading to moving the original probability function up to the exponential level. However, for tsunamis the first asymptotic expression seems the most reasonable, so we present only the first one.

So how does one find the 100-year and 500-year wave heights? You make an observational probability function out of the existing data: $X_1, X_2, \ldots, X_n$ are arranged in order of size, and $X_m$ is assigned the cumulative probability $m/(n+1)$, so that the plotting points are $(m/(n+1), X_m)$. Gumbel has several sections on the choice of plotting points, and the one chosen here seems to be the best choice. Then the chosen form of the true probability function is fitted to this observational probability function by adjusting $\alpha$ and $b$; $\alpha$ is like a scale factor and $b$ is like the mean. These can actually be estimated from the data according to various statistical formulas, but since we are plotting the data anyway, it is easier to get them from the plot.

Normally a change of variable is made so that the plot (if you are using the right probability function) can be fitted with a straight line. With a change of variable we have

$$-\ln(-\ln F) = \alpha(x - b),$$

so $\alpha$ is the slope of the line and $b$ is its intercept on the $x$ axis. Once the line is plotted, one picks off the value of $x$ where $F = .99$, and that is the height of the wave which will be exceeded with probability .01.

If an event has probability $p$ of occurring during each time unit, then $\tilde{T}$, the average number of time units between such events, will be $1/p$. This is not a given but is the result of calculating the average return time. For this reason the probability paper usually has the probabilities scaled along the bottom axis and $\tilde{T}$ scaled along the top. Similarly, for the 500-year wave, one picks off the wave height corresponding to $F = 1 - 1/500 = .998$.
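
The plotting procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wave heights are invented, an ordinary least-squares line stands in for drawing the line by eye, the time unit is taken to be one year so that $F = 1 - 1/T$, and the function names `gumbel_line_fit` and `height_at` are hypothetical.

```python
import math


def gumbel_line_fit(heights):
    """Least-squares line y = alpha * (x - b) through the plotting points.

    x is the m-th smallest height and y = -ln(-ln(m / (n + 1))).
    Returns (alpha, b): the slope and the x-axis intercept of the line.
    """
    xs = sorted(heights)
    n = len(xs)
    ys = [-math.log(-math.log(m / (n + 1))) for m in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    b = mean_x - mean_y / alpha
    return alpha, b


def height_at(F, alpha, b):
    """Invert F = exp(-exp(-alpha * (x - b))) for x."""
    return b - math.log(-math.log(F)) / alpha


# Hypothetical wave heights in metres, invented for illustration only.
heights = [0.6, 0.9, 1.1, 1.4, 1.8, 2.3, 3.1, 4.5]
alpha, b = gumbel_line_fit(heights)
x100 = height_at(1 - 1 / 100, alpha, b)  # F = .99
x500 = height_at(1 - 1 / 500, alpha, b)  # F = .998
print(f"alpha={alpha:.3f}  b={b:.3f}  x100={x100:.2f}  x500={x500:.2f}")
```

With so few points the fitted slope is poorly constrained, so the extrapolated heights should be read as a demonstration of the mechanics and not as estimates.
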
One can be suspicious of the 500-year wave prediction because one expects geological changes over that period of time; for example, the sea level could rise significantly or a period of intense volcanic activity could occur. What the 100-year and 500-year predictions really mean is that, given conditions as they are now, the first has a probability of .01 per year and the second has a probability of .002 per year.

Since the plotted data are scattered about a straight line (hopefully), it is obvious that there is uncertainty in drawing the line and thus in the predictions. Gumbel discusses this and in given instances shows how to calculate these uncertainties. Furthermore, in theory it is possible that the largest observed value might lie above the line and might be larger than the wave height corresponding to an exceedance probability of .01. In other words, a wave may be observed in a period of time shorter than 100 years that actually exceeds the predicted 100-year wave.

There is a very useful added advantage: if you have chosen the asymptotic probability function correctly, then you can take the 50-year wave and scale it up to the 100-year wave by a simple arithmetic formula. In fact, you can scale from any time interval to any other time interval with this formula. The asymptotic expression for waves with probability $p$ can be used (approximately) to compare maximum wave heights for different time intervals. Suppose that $T_1 = 1/p_1$ and $T_2 = 1/p_2$; then

$$-\ln(-\ln(1 - p_1)) = \alpha(x_1 - b)$$

and

$$-\ln(-\ln(1 - p_2)) = \alpha(x_2 - b).$$

Making use of the series

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots,$$

we have

$$-\ln(1 - p) = p + \frac{p^2}{2} + \frac{p^3}{3} + \cdots.$$

Since $p$ is small, we can take only the first term of the series, so that the original equations can be written as

$$-\ln(p_1) = \ln(T_1) = \alpha(x_1 - b),$$

$$-\ln(p_2) = \ln(T_2) = \alpha(x_2 - b).$$

Subtracting the first from the second, we have

$$\ln(T_2) - \ln(T_1) = \ln(T_2/T_1) = \alpha(x_2 - x_1),$$

so that

$$x_2 = x_1 + \frac{1}{\alpha} \ln(T_2/T_1).$$

This is very useful. If $T_2 = 2T_1$, then

$$x_2 = x_1 + \frac{1}{\alpha} \ln(2), \quad \text{or} \quad x_2 = x_1 + \frac{.693}{\alpha}.$$
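
As a quick check of the scaling rule $x_2 = x_1 + (1/\alpha)\ln(T_2/T_1)$, the sketch below compares it with the height obtained directly from the double exponential function. The values of $\alpha$ and $b$ are hypothetical, chosen only so that the arithmetic can be run, and the function names are illustrative.

```python
import math


def scale_height(x1, t1, t2, alpha):
    """Approximate rule x2 = x1 + ln(T2 / T1) / alpha."""
    return x1 + math.log(t2 / t1) / alpha


def exact_height(t, alpha, b):
    """Height with F = 1 - 1/T under the double exponential function."""
    return b - math.log(-math.log(1.0 - 1.0 / t)) / alpha


alpha, b = 0.8, 1.5  # hypothetical parameters, for illustration only
x50 = exact_height(50, alpha, b)
approx_x100 = scale_height(x50, 50, 100, alpha)
exact_x100 = exact_height(100, alpha, b)
print(f"x50={x50:.3f}  scaled x100={approx_x100:.3f}  exact x100={exact_x100:.3f}")
```

The scaled and exact values differ only through the higher-order terms of the series that were dropped, so the agreement improves as the return times grow.
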

My first reason for using extreme value statistics (IUGG, Vancouver 1987; see footnote 2) was that it is widely used in many similar situations. Also, it seemed basically right because the asymptotic probability function was the right choice, given exponential fall-off of the individual probabilities, no matter what the original probability function was.

In continuing to reflect on the matter, some questions arise. First of all, what is the sample of wave heights from which the maximum is chosen? It could be the collection of wave heights in the near vicinity of the reported wave height, which would surely be the largest of them.

I should point out that in the application of extreme value statistics to tsunamis, there is not enough data to really determine what probability function to use. The usual test is that the observed cumulative probability function will lie approximately on a straight line when plotted on the correct probability paper, which is, in effect, the choice of the correct probability function. However, since in Hawaii we have at most 5 values at any location (and many of those values are questionable), this is not a good test. Therefore the choice of the probability function will be mainly an exercise in logical reasoning.

How about augmenting the data with artificial values from imagined tsunamis? There are no probabilities connected with the imagined tsunamis, so that doesn't expand the data for probability calculations. How about extending wave measurements of a given tsunami to places where no measurements were made, by creating a numerical model of that tsunami that agrees well at places where the tsunami was measured? This system, which was used for the FEMA maps, has some validity. However, there still are too few data points to decide whether or not a straight line describes them well enough.

If I were to guess what the underlying probability function is for wave heights at any location, I would guess normal, or Gaussian (see footnote 3).
This is based on the Central Limit Theorem, which says that a sum of random variables approaches the Gaussian whatever the probabilities of those random variables. In this case think of the many variables such as source size, location, mechanism, and all of the additional factors affecting runup size at any given shore location. Think of these as random variables. It seems that there are enough variables here to assume a Gaussian for the total effect. Gumbel has a section in which he establishes that the Gaussian qualifies as being essentially exponential, so that the first extreme value probability function applies. (The conditions to qualify are actually broader than just falling off exponentially!)

Even if the probability function for wave heights at each point on the shoreline is Gaussian, the value reported and recorded should be treated with the 1st asymptotic probability function. The reason for this is that the wave height actually reported will be the largest of the wave heights from the immediate vicinity of that location.

Footnote 2: IUGG Tsunami Symposium, Vancouver, B.C., August 18-19, 1987.

Footnote 3: We would be focusing on the larger tsunamis being fit to the upper end of the Gaussian probability function, since it is probabilities of large tsunamis that we are looking for.
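
The argument above, that the largest of many Gaussian wave heights can be treated with the first asymptotic probability function, can be illustrated by simulation. The sketch below is a rough check under assumptions that are mine and not the paper's: heights are standard normal, each reported value is the largest of 100 nearby heights, there are 2000 reports, and the correlation coefficient of the points on $-\ln(-\ln F)$ versus $x$ paper stands in for judging straightness by eye.

```python
import math
import random


def gumbel_plot_correlation(values):
    """Correlation between sorted values and y = -ln(-ln(m / (N + 1)))."""
    xs = sorted(values)
    n = len(xs)
    ys = [-math.log(-math.log(m / (n + 1))) for m in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
group_size = 100  # hypothetical number of heights in one vicinity
reports = 2000    # hypothetical number of reported values

# Individual Gaussian heights, and the largest height from each vicinity.
raw = [rng.gauss(0.0, 1.0) for _ in range(reports)]
maxima = [max(rng.gauss(0.0, 1.0) for _ in range(group_size))
          for _ in range(reports)]

r_raw = gumbel_plot_correlation(raw)
r_max = gumbel_plot_correlation(maxima)
print(f"individual Gaussian values: r = {r_raw:.4f}")
print(f"largest of {group_size} values:      r = {r_max:.4f}")
```

A correlation closer to 1 means the points lie closer to a straight line on this paper. The maxima should plot straighter than the individual values, although the approach of Gaussian maxima to the double exponential form is known to be slow.
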

Given that the double exponential probability function for wave height is correct, there is another problem with the prediction of the tsunami wave height with a return time of 100 years. The following simple calculation will demonstrate the problem. Suppose one has estimated that the wave height $h_1$ is the height exceeded with probability .01 (or $F = .99$). In other words, .01 is the probability that, if a tsunami occurs, its size will exceed $h_1$. Suppose that on the average there are 5 significant tsunamis in 100 years. Then the probability that all 5 are less than $h_1$ is

$$[F(h_1)]^5 = (.99)^5 = .951.$$

A larger value $h_2$ must be found so that $[F(h_2)]^5 = .99$, or

$$F(h_2) = (.99)^{1/5} = .998.$$

This would, in fact, be the 500-year wave, with probability .002 per tsunami. At the rate of 5 tsunamis per 100 years, the probability per year of a tsunami exceeding $h_2$ would be $5 \times .002 = .01$, or on the average, once in 100 years.

The above suggests a scheme appropriate when tsunamis occur rather infrequently, say $k$ per 100 years (based on experience). Assume that the underlying probability function is the 1st asymptotic probability function. It is necessary to create the $-\ln(-\ln y)$ vs. $x$ graph paper with your computer. The observed probability histogram points for the data from a given location are then plotted. At this point you can pick off the values of $\alpha$ and $b$ and solve for $x$ for any value of $y$ using the double exponential probability formula. Or, graphically, you can pick off the value of $x$ for which $y = (.99)^{1/k}$, which gives the 100-year wave at that location.

How well will these methods predict the 100-year and 500-year waves? Unfortunately, or fortunately, we'll never know!

REFERENCES

Gumbel, E.J., Statistics of Extreme Values, Columbia University Press, 1957.

Gumbel, E.J., Statistical Theory of Extreme Values and Some Practical Applications, National Bureau of Standards, Applied Math Series, No. 33.