Note Packet #14: Frequency Analysis & Probability Plots
CEE 3710
October 20, 2017

Frequency Analysis
- The process by which engineers determine the magnitude of design events (e.g., the 100-year flood) or assess the risk associated with various outcomes or events
- Based on using sample data to hypothesize a probability model and infer characteristics of the population of interest
- Works with any probability distribution
Motivation: You need to design a levee to withstand the 100-year flood ($X_{0.99}$). Given the sample data $\{x_1, x_2, \ldots, x_n\}$ below, corresponding to the magnitudes of $n = 50$ annual maximum flood flows, what is $\hat{X}_{0.99}$?

[Figure: time series of Annual Maximum Discharge (cfs) versus Year, 1960 to 2010]

(1) Compute sample moments and descriptive statistics

[Figure: Histogram of Annual Maximum Discharge (cfs); vertical axis Frequency; bin labels 410, 860, 1310, 1760, 2210, 2660, 3110, More]

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = 1503.9$ cfs

$s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = 81.0$ cfs

(2) Select an appropriate model (probability distribution) of annual maximum flood flows

Considerations:
- What do the data look like?
- Are the data skewed?
- Are the variables strictly positive?
- Are they continuous or discrete?
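(3) Fit the selected model to the data using the method of moments (MOM) to estimate the distribution parameters (point estimates of the parameters)

$\mu_X = \int x \, f_X(x) \, dx$, estimated by $\bar{x} = 1503.9$ cfs

$\sigma_X = \left[ \int (x - \mu_X)^2 \, f_X(x) \, dx \right]^{1/2}$, estimated by $s_x = 81.0$ cfs

If $X \sim$ Lognormal:

$f_X(x) = \frac{1}{x \, \sigma_Y \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{\ln(x) - \mu_Y}{\sigma_Y} \right)^2 \right]$ for $x > 0$; 0 otherwise

$\hat{\sigma}_Y = \left[ \ln\left( 1 + \frac{s_x^2}{\bar{x}^2} \right) \right]^{1/2} = 0.577$

$\hat{\mu}_Y = \ln[\bar{x}] - \frac{1}{2} \hat{\sigma}_Y^2 = 7.161$

[Figure: Histogram of Annual Maximum Discharge (cfs); vertical axis Frequency]

(4) Assess goodness of fit: how well does the model represent the data?
- How good are our parameter estimates? (How good is our estimate of the 100-year event?)
- Compute confidence intervals
- Construct a quantile-quantile plot

(5) Compute $\hat{X}_{0.99}$ (or other values of interest)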
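The lognormal moment relations in step (3) and the quantile calculation in step (5) can be sketched in Python as follows. This is a minimal illustration rather than the packet's own worksheet: the function names and the input moments (mean 100, standard deviation 50) are hypothetical, and the standard normal quantile $z_p$ is taken from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def lognormal_mom(mean, std):
    """Method-of-moments estimates (mu_Y, sigma_Y) for Y = ln(X)."""
    var_y = math.log(1.0 + (std / mean) ** 2)   # sigma_Y^2 = ln(1 + s^2 / xbar^2)
    sigma_y = math.sqrt(var_y)
    mu_y = math.log(mean) - 0.5 * var_y         # mu_Y = ln(xbar) - sigma_Y^2 / 2
    return mu_y, sigma_y


def lognormal_quantile(p, mu_y, sigma_y):
    """Quantile x_p = exp(mu_Y + sigma_Y * z_p) of the fitted lognormal."""
    z_p = NormalDist().inv_cdf(p)               # standard normal quantile
    return math.exp(mu_y + sigma_y * z_p)


# Hypothetical sample moments, chosen only to exercise the functions.
mu_y, sigma_y = lognormal_mom(mean=100.0, std=50.0)
x99 = lognormal_quantile(0.99, mu_y, sigma_y)
print(f"mu_Y = {mu_y:.3f}, sigma_Y = {sigma_y:.3f}, X_0.99 = {x99:.1f}")
```

Passing the sample mean and standard deviation from step (1) in place of the hypothetical values would give the corresponding estimate of the 100-year event under the lognormal assumption.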
General Procedure:
1. Obtain a sample of size n; compute sample moments and descriptive statistics
2. Hypothesize the underlying probability density function (pdf) of the population
3. Apply the method of moments to compute the parameters of the assumed pdf (i.e., fit the probability model to the data)
4. Assess the fit of the probability model by graphing the fitted cumulative distribution function (cdf) relative to the sample data (empirical cdf, probability plot, or quantile-quantile plot)
5. Use the fitted cdf to obtain percentiles (design events) or probabilities associated with outcomes of interest

[Figure legend]
- Smooth line/curve: the probability model (representation of the population)
- Dots/points: the observed sample data
Empirical Cumulative Distribution Function (CDF)
- A representation of the cumulative distribution function based on the relative magnitude of the observations in a sample of size n
- Obtained by graphing the plotting positions versus the ranked observations
- Plotting position ($p_i$): provides an estimate of the cumulative probability associated with the observation of rank i, $x_{(i)}$:

  $p_i = \frac{i}{n+1}$

- In other words, $p_i = P[X \le x_{(i)}]$, and thus $x_{(i)}$ represents an empirical percentile (or quantile)

Example: Construct an empirical CDF for the following sample data: {90, 105, 65, 135, 95, 115, 80, 73, 76, 88}

   i   x_(i)    p_i
   1     65    0.091
   2     73    0.182
   3     76    0.273
   4     80    0.364
   5     88    0.455
   6     90    0.545
   7     95    0.636
   8    105    0.727
   9    115    0.818
  10    135    0.909

[Figure: Empirical CDF; $p_i$ (0.0 to 1.0) versus x (0 to 150)]

Note: Construction of the empirical CDF does not require consideration of the form of the underlying probability distribution for the random variable/population; however, we can assess the goodness of fit of a probability distribution by plotting the assumed/fitted cdf (model) on the same figure as the empirical cdf (observed).
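The plotting-position table above can be reproduced with a few lines of Python. This is a sketch only; the function name `empirical_cdf` is a hypothetical helper, while the formula $p_i = i/(n+1)$ and the sample data are taken from the example.

```python
def empirical_cdf(sample):
    """Return (rank i, ranked observation x_(i), plotting position p_i) rows."""
    n = len(sample)
    ranked = sorted(sample)                      # x_(1) <= x_(2) <= ... <= x_(n)
    return [(i, x, i / (n + 1)) for i, x in enumerate(ranked, start=1)]


data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
rows = empirical_cdf(data)
for i, x, p in rows:
    print(f"{i:2d} {x:4d} {p:.3f}")
```

Plotting the third column against the second gives the empirical CDF.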
Example: Use the method of moments to fit a normal distribution to the data above, and then assess how well it represents the data by plotting the fitted CDF relative to the empirical CDF.

   i   x_(i)    p_i     z_pi    x̂_(i)
   1     65    0.091   -1.335    63.8
   2     73    0.182   -0.908    72.9
   3     76    0.273   -0.605    79.4
   4     80    0.364   -0.349    84.8
   5     88    0.455   -0.114    89.8
   6     90    0.545    0.114    94.6
   7     95    0.636    0.349    99.6
   8    105    0.727    0.605   105.0
   9    115    0.818    0.908   111.5
  10    135    0.909    1.335   120.6

[Figure: Empirical CDF vs. Fitted Normal CDF; Sample Data (points) and Fitted Normal (curve), x from 0 to 150]

Example: Use the method of moments to fit a lognormal distribution to the data, and then assess how well it represents the data by plotting the fitted CDF relative to the empirical CDF.

   i   x_(i)    p_i     z_pi    x̂_(i)
   1     65    0.091   -1.335    66.3
   2     73    0.182   -0.908    73.1
   3     76    0.273   -0.605    78.3
   4     80    0.364   -0.349    83.0
   5     88    0.455   -0.114    87.5
   6     90    0.545    0.114    92.2
   7     95    0.636    0.349    97.3
   8    105    0.727    0.605   103.1
   9    115    0.818    0.908   110.5
  10    135    0.909    1.335   121.7

[Figure: Empirical CDF vs. Fitted Lognormal CDF; Sample Data (points) and Fitted LN (curve), x from 0 to 150]
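The fitted-quantile columns in the normal and lognormal tables can be checked with a short Python sketch. It assumes the normal fit uses the sample mean and standard deviation directly and the lognormal fit uses the real-space moment relations $\sigma_Y^2 = \ln(1 + s_x^2/\bar{x}^2)$ and $\mu_Y = \ln(\bar{x}) - \sigma_Y^2/2$; the variable names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)

# Sample moments (stdev uses the n - 1 divisor).
x_bar, s_x = mean(data), stdev(data)

# Lognormal parameters by the method of moments.
var_y = math.log(1.0 + (s_x / x_bar) ** 2)
sigma_y = math.sqrt(var_y)
mu_y = math.log(x_bar) - 0.5 * var_y

std_normal = NormalDist()
table = []
for i, x in enumerate(sorted(data), start=1):
    p = i / (n + 1)                           # plotting position
    z = std_normal.inv_cdf(p)                 # standard normal quantile z_p
    x_norm = x_bar + s_x * z                  # fitted normal quantile
    x_logn = math.exp(mu_y + sigma_y * z)     # fitted lognormal quantile
    table.append((i, x, p, z, x_norm, x_logn))

for i, x, p, z, x_norm, x_logn in table:
    print(f"{i:2d} {x:4d} {p:.3f} {z:7.3f} {x_norm:7.1f} {x_logn:7.1f}")
```

The two fitted columns share the same $z_{p_i}$ values and differ only in whether the linear relation is applied to x or to ln(x).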
[Figure: Empirical CDF vs. Fitted Normal CDF; Sample Data (points) and Fitted Normal (curve), x from 0 to 150]

[Figure: Empirical CDF vs. Fitted Lognormal CDF; Sample Data (points) and Fitted LN (curve), x from 0 to 150]

Example: Use the method of moments to fit a Gumbel distribution to the data, and then assess how well it represents the data by plotting the fitted CDF relative to the empirical CDF.

   i   x_(i)    p_i    x̂_(i)
   1     65    0.091    68.1
   2     73    0.182    73.8
   3     76    0.273    78.3
   4     80    0.364    82.4
   5     88    0.455    86.6
   6     90    0.545    90.9
   7     95    0.636    95.8
   8    105    0.727   101.6
   9    115    0.818   109.3
  10    135    0.909   121.6

[Figure: Empirical CDF vs. Fitted Gumbel CDF; Sample Data (points) and Fitted Gumbel (curve), x from 0 to 150]
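The Gumbel quantiles in the table can be reproduced with the sketch below. The page does not state the Gumbel moment relations, so the code assumes the standard form $F(x) = \exp[-\exp(-(x - \xi)/\alpha)]$ with mean $\xi + 0.5772\alpha$ and variance $\pi^2 \alpha^2 / 6$; the symbols $\xi$ and $\alpha$ and the function name are choices made here, not the packet's notation.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649   # Euler-Mascheroni constant

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)
x_bar, s_x = mean(data), stdev(data)

# Method of moments for the assumed Gumbel (maximum) form.
alpha = s_x * math.sqrt(6.0) / math.pi   # scale parameter
xi = x_bar - EULER_GAMMA * alpha         # location parameter


def gumbel_quantile(p):
    """Invert F(x) = exp(-exp(-(x - xi) / alpha)) for the quantile x_p."""
    return xi - alpha * math.log(-math.log(p))


fitted = [gumbel_quantile(i / (n + 1)) for i in range(1, n + 1)]
for i, (x, x_hat) in enumerate(zip(sorted(data), fitted), start=1):
    print(f"{i:2d} {x:4d} {i / (n + 1):.3f} {x_hat:7.1f}")
```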
Quantile-Quantile (Q-Q) Plots
- Constructed by plotting the ranked observations ($x_{(i)}$) against the fitted percentiles, or quantiles ($\hat{x}_{(i)}$)
- Observed or empirical quantiles vs. modeled or fitted quantiles
- The sample data should fall approximately on a straight (1:1) line if the fitted distribution adequately describes the true population
[Figures: three Q-Q plots, each with the fitted quantiles $\hat{x}_{(i)}$ on one axis]
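A Q-Q plot is just a set of (fitted quantile, observed quantile) pairs, which can be assembled as in this sketch. The helper `qq_pairs` is hypothetical and accepts any quantile function; the normal fit by the method of moments is used here only as an example, and the printed difference from the 1:1 line is an informal aid rather than a test statistic from the packet.

```python
from statistics import NormalDist, mean, stdev


def qq_pairs(sample, quantile_fn):
    """Pair each ranked observation x_(i) with the fitted quantile at p_i = i/(n+1)."""
    n = len(sample)
    return [(quantile_fn(i / (n + 1)), x)
            for i, x in enumerate(sorted(sample), start=1)]


data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]

# Normal distribution fitted by the method of moments.
fitted_normal = NormalDist(mean(data), stdev(data))
pairs = qq_pairs(data, fitted_normal.inv_cdf)

for x_hat, x_obs in pairs:
    # Points on the 1:1 line would have x_obs - x_hat equal to zero.
    print(f"{x_hat:7.1f} {x_obs:5d} {x_obs - x_hat:+6.1f}")
```

Swapping in a different quantile function, such as one for a fitted lognormal or Gumbel distribution, produces the pairs for the corresponding Q-Q plot.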
Probability Plots
- Sample data are plotted so that the observations should fall approximately on a straight line if a selected distribution describes the true population; however, unlike Q-Q plots, assessment of the selected distribution (model) does not depend on estimated parameters
- Can be created with special commercially available probability papers for some distributions (normal, lognormal, Gumbel), or with the general technique developed here (easy with a spreadsheet)
- Constructed by plotting the ranked observations ($x_{(i)}$) against standardized percentiles

Example: Reconsider the sample data above. Use a probability plot to assess how well the normal distribution fit using the method of moments represents the sample data.

   i   x_(i)    p_i     z_pi
   1     65    0.091   -1.335
   2     73    0.182   -0.908
   3     76    0.273   -0.605
   4     80    0.364   -0.349
   5     88    0.455   -0.114
   6     90    0.545    0.114
   7     95    0.636    0.349
   8    105    0.727    0.605
   9    115    0.818    0.908
  10    135    0.909    1.335
Example: Reconsider the sample data above. Use a probability plot to assess how well the lognormal distribution fit using the method of moments represents the sample data.

   i   x_(i)   ln(x_(i))    p_i     z_pi
   1     65      4.174     0.091   -1.335
   2     73      4.290     0.182   -0.908
   3     76      4.330     0.273   -0.605
   4     80      4.382     0.364   -0.349
   5     88      4.477     0.455   -0.114
   6     90      4.500     0.545    0.114
   7     95      4.554     0.636    0.349
   8    105      4.654     0.727    0.605
   9    115      4.745     0.818    0.908
  10    135      4.905     0.909    1.335

Example: Reconsider the sample data above. Use a probability plot to assess how well the Gumbel distribution fit using the method of moments represents the sample data.

   i   x_(i)    p_i    -ln(-ln(p_i))
   1     65    0.091      -0.875
   2     73    0.182      -0.533
   3     76    0.273      -0.262
   4     80    0.364      -0.012
   5     88    0.455       0.238
   6     90    0.545       0.501
   7     95    0.636       0.794
   8    105    0.727       1.144
   9    115    0.818       1.606
  10    135    0.909       2.351
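The standardized percentiles used in the three probability plots depend only on the plotting positions, not on fitted parameters, as the following sketch illustrates. The dictionary keys are illustrative names; the normal and lognormal plots use $z_{p_i}$ and the Gumbel plot uses $-\ln(-\ln(p_i))$, as in the tables above.

```python
import math
from statistics import NormalDist

data = [90, 105, 65, 135, 95, 115, 80, 73, 76, 88]
n = len(data)
std_normal = NormalDist()

coords = []
for i, x in enumerate(sorted(data), start=1):
    p = i / (n + 1)
    coords.append({
        "i": i,
        "x": x,                                  # plotted for normal and Gumbel
        "ln_x": math.log(x),                     # plotted for lognormal
        "p": p,
        "z": std_normal.inv_cdf(p),              # normal / lognormal axis
        "gumbel_y": -math.log(-math.log(p)),     # Gumbel axis
    })

for c in coords:
    print(f"{c['i']:2d} {c['x']:4d} {c['ln_x']:.3f} {c['p']:.3f} "
          f"{c['z']:7.3f} {c['gumbel_y']:7.3f}")
```

Plotting x against z, ln(x) against z, and x against the Gumbel variate gives the normal, lognormal, and Gumbel probability plots respectively; the distribution whose points lie closest to a straight line is the better candidate.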