Lecture 24 Floods and flood frequency

Similar documents
Chapter 23: Inferences About Means

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.

Economics 250 Assignment 1 Suggested Answers. 1. We have the following data set on the lengths (in minutes) of a sample of long-distance phone calls

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Mathematical Notation Math Introduction to Applied Statistics

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Data Analysis and Statistical Methods Statistics 651

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Elementary Statistics

Understanding Samples

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

Sampling Distributions, Z-Tests, Power

Confidence Intervals for the Population Proportion p

Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day

6.3 Testing Series With Positive Terms

Infinite Sequences and Series

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

Topic 9: Sampling Distributions of Estimators

Frequentist Inference

Topic 9: Sampling Distributions of Estimators

a. For each block, draw a free body diagram. Identify the source of each force in each free body diagram.

Introduction There are two really interesting things to do in statistics.

Statistical Fundamentals and Control Charts

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Homework 5 Solutions

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics

Statistics 511 Additional Materials

2 Definition of Variance and the obvious guess

Topic 9: Sampling Distributions of Estimators

Chapter 2 Descriptive Statistics

AP Calculus BC Review Applications of Derivatives (Chapter 4) and f,

Power and Type II Error

Final Examination Solutions 17/6/2010

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

Parameter, Statistic and Random Samples

Module 1 Fundamentals in statistics

1 Approximating Integrals using Taylor Polynomials

Department of Mathematics

Lecture 2: Poisson Sta*s*cs Probability Density Func*ons Expecta*on and Variance Es*mators

Economics Spring 2015


Read through these prior to coming to the test and follow them when you take your test.

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics. None at all!

MATH/STAT 352: Lecture 15

Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

f(x i ; ) L(x; p) = i=1 To estimate the value of that maximizes L or equivalently ln L we will set =0, for i =1, 2,...,m p x i (1 p) 1 x i i=1

Simulation. Two Rule For Inverting A Distribution Function

Chapter 8: Estimating with Confidence

Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying.

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

The standard deviation of the mean

Polynomial Functions and Their Graphs

SYDE 112, LECTURE 2: Riemann Sums

4.3 Growth Rates of Solutions to Recurrences

BIOSTATISTICS. Lecture 5 Interval Estimations for Mean and Proportion. dr. Petr Nazarov

Sample Size Determination (Two or More Samples)

Lecture 8: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

Chapter 18: Sampling Distribution Models

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1

1 Lesson 6: Measure of Variation

11 Correlation and Regression

Eco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State

Continuous Functions

STAT 515 fa 2016 Lec Sampling distribution of the mean, part 2 (central limit theorem)

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

CHAPTER 2. Mean This is the usual arithmetic mean or average and is equal to the sum of the measurements divided by number of measurements.

September 2012 C1 Note. C1 Notes (Edexcel) Copyright - For AS, A2 notes and IGCSE / GCSE worksheets 1

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Median and IQR The median is the value which divides the ordered data values in half.

(7 One- and Two-Sample Estimation Problem )

Random Variables, Sampling and Estimation

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008

( 1) n (4x + 1) n. n=0

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

CURRICULUM INSPIRATIONS: INNOVATIVE CURRICULUM ONLINE EXPERIENCES: TANTON TIDBITS:

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

1 Inferential Methods for Correlation and Regression Analysis

Common Large/Small Sample Tests 1/55

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234

Math 140 Introductory Statistics

Interval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ),

Final Review for MATH 3510

Sampling, Sampling Distribution and Normality

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised

Lecture 2: Monte Carlo Simulation

Lecture 5. Materials Covered: Chapter 6 Suggested Exercises: 6.7, 6.9, 6.17, 6.20, 6.21, 6.41, 6.49, 6.52, 6.53, 6.62, 6.63.

Probability and statistics: basic terms

Chapter 18 Summary Sampling Distribution Models

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

Transcription:

Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically, ad parametrically. First, empiricism. Let s take a buch of data. For ow, we ll take flood data the maimum flood for each year for some umber of years: Year Flow, cfs 1945 90 1946 1470 1947 0 1948 970 1949 300 1950 110 1951 490 195 3170 1953 30 1954 1760 1955 8800 1956 880 1957 1310 1958 500 1959 1960 1960 140 1961 4340 196 3060 1963 1780 1964 1380 1965 980 1966 1040 1967 1580 1968 3630 To plot this data empirically, we eed to order these accordig to rak. That is, the highest flow comes first, ad the the et highest, o dow. Year Flow, cfs Rak 1955 8800 1 1956 880 1961 4340 3 1968 3630 4 1953 30 5

195 3170 6 196 3060 7 1949 300 8 1948 970 9 1958 500 10 1951 490 11 1945 90 1 1947 0 13 1960 140 14 1959 1960 15 1963 1780 16 1954 1760 17 1967 1580 18 1946 1470 19 1964 1380 0 1957 1310 1 1950 110 1966 1040 3 1965 980 4 Icidetally, you ca get Ecel to do this for you. Select the data, the go to Data Sort, ad it will order all the data! From here, use the formula: T +1 m Where T is the recurrece iterval, is the umber of years i the record, ad m is the rak. Thus, for our data: Year Flow, cfs Rak T 1955 8800 1 5.00 1956 880 1.50 1961 4340 3 8.33 1968 3630 4 6.5 1953 30 5 5.00 195 3170 6 4.17 196 3060 7 3.57 1949 300 8 3.13 1948 970 9.78 1958 500 10.50 1951 490 11.7 1945 90 1.08 1947 0 13 1.9 1960 140 14 1.79 1959 1960 15 1.67 1963 1780 16 1.56 1954 1760 17 1.47

1967 1580 18 1.39 1946 1470 19 1.3 1964 1380 0 1.5 1957 1310 1 1.19 1950 110 1.14 1966 1040 3 1.09 1965 980 4 1.04 All that s left is to plot T o the horizotal ais ad Flow o the vertical, ad shoot a best-fit lie through the whole mess. Oh, ad typically we plot it o log-log paper: 100000 Flow, cfs 10000 1000 100 1.00 10.00 100.00 T, years By etrapolatig the tred lie, you ca determie the 100-year or 500-year flood. What is actually meat by 100-year flood, by the way, is that it has a 1% chace of happeig every year, ot that it oly happes every 100 years. Here s a ifty formula for determiig frequecy or probability rather tha recurrece. F m 1 + 1 I other words, F 1 1. T Here s the basic problem, though. Especially with small data sets, each ew data poit will sigificatly alter the rak of all the other poits, ad therefore chage the whole curve. As a result, it s ofte

easier to perform this aalysis parametrically, with a few importat statistics. Let s talk. Let s cosider the height of every perso i the room. The result of this (assumig we have adults ad we ve got eough people) is a oddly shaped curve. Most everyoe fits betwee about 150 cm ad 190 cm, with a proouced hump aroud 170 cm or so. However, the distributio tails off to iclude the Shaq s ad the Billy Bartletts of our populatio. This curve is called may thigs, ad is of vital importace to lots of the atural world. It s called the ormal distributio, the bell curve, the Gaussia distributio, or the biomial distributio (after a easy way to create it). Turs out lots of atural distributios look like this especially if there s some sort of cotrol makig a average. To talk about these distributios, we have a umber of parameters that describe ormal distributios. Here they are. Average (aka 1 st momet) the average value of all the idividual values. 1 Variace (aka d momet) the average of the differece betwee each idividual ad the mea. This is a measure of the spread of the data s 1 ( ) 1 where the square is placed to esure a positive umber. To regai the uits of the mea (e.g. the uits of variace i our height eample are cm ), we have Stadard deviatio which is just the positive square root of variace. At this poit we eed to make a little quibble. Wheever you measure a group of objects, like we just did with height, you are takig samples from some larger populatio. Although your sample may approimate the whole populatio, it may ot. Statisticias,

the, draw a distictio betwee the mythic populatio statistic ad the measurable sample statistic. Mea, for eample, is a populatio statistic; average is a sample statistic. As defied, we have sample stadard deviatio, ad we abbreviate it s. Populatio stadard deviatio is defied with N i place of -1, ad is abbreviated. Most of the time, however, geologists blithely igore this distictio, ad refer to average as mea, ad use for sample stadard deviatio. It s all good. These two parameters (mea ad stadard deviatio) suffice to eplai the ormal curve. The equatio for a ormal curve is: p ( ) ( η ) 1 e Oh, right, η is the populatio mea. Just as it s difficult to use fuctios of because they re hard to compare, so are ormal distributios hard to look at. As a result of the whole thig we came up with a set of skewed aes (log paper) that make fuctios of look like straight lies. So it is with the biomial distributio. If you take oe ad sum the compoets rather tha plottig them like a histogram you get somethig called a cumulative frequecy chart or s-curve. We ca also make skewed aes that plot s-curves as straight lies. This ais is called a probability ais. Plottig o probability paper makes thigs easy ormal distributios plot as straight lies, ad determiig stadard deviatio is easy it s the distace o the lie from 50% to 84% (or from 50% to 16%). This eplais the ormal curve icely, ad it would be ice if that were all there was. But there s more. It turs out that there are a umber of curves called quasi-ormal curves. These ivolve two other statistics skewess ad kurtosis. We ll talk about these ow. Skewess (aka 3 rd momet) this is a measure of lea o the curve. Curves that lea to the left are positively skewed, ad those that lea to the right are egatively skewed.

S k 1 1 s ( )( ) ( ) 3 3 Notice that this effectively takes the ratio of stadard deviatio to stadard deviatio, but allows for a sig (because it s cubed), thus allowig for cotributios o oe side to outweigh those o the other, ad force the parameter to be positive or egative. Note that a ormal distributio has a skewess of zero. A sidebar o media. Media ( ˆ ) is like mea i that it shows somethig about where the bulk of the data lie. It is determied, however, by fidig the middle value of the distributio, rather tha averagig all values. That is, if we take our heights from lowest to highest, ad there s 11 of us i the room, the 6 th value coutig up (or dow) is the media value. Why do we care? I a ormal distributio, the mea ad media must lie at the same poit. If the media is to the left of the mea, the bulk of the data is also to the left of the mea, ad the distributio is positively skewed. There you have it. Kurtosis (aka 4 th momet) this somewhat ebulous statistic says somethig about how peaked the curve is. High values of kurtosis represet curves more peaked tha ormal, ad low values flatter. K 1 1 s ( )( )( 3) ( ) 4 4 Note that because these parameters are themselves oly recombiatios of average ad stadard deviatio, effectively these are oly variats of the ormal curve. Why do we care about all this? It turs out that there is o particular reaso why flood data should be ormally distributed, so we may have to use some OTHER distributio. What do I mea by this? Remember that I gave you a mathematical statemet of the ormal distributio: p ( ) ( η ) 1 e

Meaig that if you re give η ad you ca work out the probability of a evet yourself. There are other distributios, though oe of the most popular i flood aalysis is the gamma-3 or Pearso 3 distributio: f ( ) α 1 β α e Γ β ( α ) although the etreme value distributio (EV1 or Gumbel) is ofte used as well: f ( ) e u γ e To use these, simply determie your statistical parameters (amely mea ad stadard deviatio), the covert these to the parameters used i the distributios. Here: α β γ 6 π u 0.577γ While I m here, it s worth talkig about Gamma 3 ad parameters. Most probability distributios (ad there are lots) have two or three parameters that are i tur fuctios of the elemetary statistics we talked about. I Gamma 3, α is called a shape factor, ad β is called a scale factor. This is because varyig β just stretches the fuctio o the y ais, but chagig α chages the shape of the distributio as a whole. {graphs} You could also iclude a parameter that moves the whole distributio back ad forth o the -ais that would be a locatio parameter. The importat thig is that if you read about some other distributio, you may be able to determie (or be told) what the parameters do.

Eough about this. Back to floods. Oe quick solutio would be to take the average ad stadard deviatio of your flood data (or the log of your flood data) ad use these theoretical curves to estimate peak flow istead. For eample, usig the data above, I got: 7.756 s 0.566 So α 1. 97, β 1410.