A world-wide investigation of the probability distribution of daily rainfall

Similar documents
The battle of extreme value distributions: A global survey on the extreme

Review of existing statistical methods for flood frequency estimation in Greece

Scaling properties of fine resolution point rainfall and inferences for its stochastic modelling

Long-term properties of annual maximum daily rainfall worldwide

Battle of extreme value distributions: A global survey on extreme daily rainfall

Scaling properties of fine resolution point rainfall and inferences for its stochastic modelling

Spatial and temporal variability of wind speed and energy over Greece

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

Reprinted from MONTHLY WEATHER REVIEW, Vol. 109, No. 12, December 1981 American Meteorological Society Printed in I'. S. A.

Probability Plots. Summary. Sample StatFolio: probplots.sgp

Model Fitting. Jean Yves Le Boudec

On Generalized Entropy Measures and Non-extensive Statistical Mechanics

Method of Moments. which we usually denote by X or sometimes by X n to emphasize that there are n observations.

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -27 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc.

ENTROPY-BASED PARAMETER ESTIMATION IN HYDROLOGY

Knowable moments and K-climacogram

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables

2/23/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY THE NORMAL DISTRIBUTION THE NORMAL DISTRIBUTION

On the modelling of extreme droughts

Consideration of prior information in the inference for the upper bound earthquake magnitude - submitted major revision

On the Comparison of Fisher Information of the Weibull and GE Distributions

Towards a more physically based approach to Extreme Value Analysis in the climate system

Asymptotic distribution of the sample average value-at-risk

Title: Closed-form approximations to the Error and Complementary Error Functions and their applications in atmospheric science

MAXIMUM LQ-LIKELIHOOD ESTIMATION FOR THE PARAMETERS OF MARSHALL-OLKIN EXTENDED BURR XII DISTRIBUTION

Review for the previous lecture

Best Fit Probability Distributions for Monthly Radiosonde Weather Data

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

IT S TIME FOR AN UPDATE EXTREME WAVES AND DIRECTIONAL DISTRIBUTIONS ALONG THE NEW SOUTH WALES COASTLINE

Assessment of the reliability of climate predictions based on comparisons with historical time series

How to make projections of daily-scale temperature variability in the future: a cross-validation study using the ENSEMBLES regional climate models

Chapter 3 Common Families of Distributions

Optimal Reinsurance Strategy with Bivariate Pareto Risks

Long term properties of monthly atmospheric pressure fields

Introduction to Information Entropy Adapted from Papoulis (1991)

Daily Rainfall Disaggregation Using HYETOS Model for Peninsular Malaysia

Lecture 3. Probability - Part 2. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. October 19, 2016

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

Solutions to the Spring 2015 CAS Exam ST

Lifetime Dependence Modelling using a Generalized Multivariate Pareto Distribution

INTRODUCING DIFFERENT ESTIMATORS OF TWO PARAMETERS KAPPA DISTRIBUTION

How Significant is the BIAS in Low Flow Quantiles Estimated by L- and LH-Moments?

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Statistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames

3 Continuous Random Variables

ISSN Article. Tsallis Entropy, Escort Probability and the Incomplete Information Theory

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Trivariate copulas for characterisation of droughts

Estimation of Quantiles

A Marshall-Olkin Gamma Distribution and Process

Math 576: Quantitative Risk Management

LQ-Moments for Statistical Analysis of Extreme Events

Polynomials and Rational Functions (2.1) The shape of the graph of a polynomial function is related to the degree of the polynomial

Frequency Analysis & Probability Plots

Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories

Experimental Design and Statistics - AGA47A

Shape of the return probability density function and extreme value statistics

Extreme Value Analysis and Spatial Extremes

PLANNED UPGRADE OF NIWA S HIGH INTENSITY RAINFALL DESIGN SYSTEM (HIRDS)

Temporal Analysis of Data. Identification of Hazardous Events

On the average uncertainty for systems with nonlinear coupling

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4.

Probability Distributions Columns (a) through (d)

Simple IDF Estimation Under Multifractality

The role of teleconnections in extreme (high and low) precipitation events: The case of the Mediterranean region

Statistical Analysis of Climatological Data to Characterize Erosion Potential: 2. Precipitation Events in Eastern Oregon/Washington

Introduction & Random Variables. John Dodson. September 3, 2008

Shannon entropy in generalized order statistics from Pareto-type distributions

The Kumaraswamy-Burr Type III Distribution: Properties and Estimation

If we want to analyze experimental or simulated data we might encounter the following tasks:

Modeling Rainfall Intensity Duration Frequency (R-IDF) Relationship for Seven Divisions of Bangladesh

Limits at Infinity. Horizontal Asymptotes. Definition (Limits at Infinity) Horizontal Asymptotes

Severity Models - Special Families of Distributions

Calculation of maximum entropy densities with application to income distribution

Entropy measures of physics via complexity

Plotting data is one method for selecting a probability distribution. The following

APPROXIMATING THE GENERALIZED BURR-GAMMA WITH A GENERALIZED PARETO-TYPE OF DISTRIBUTION A. VERSTER AND D.J. DE WAAL ABSTRACT

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( )

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests)

CODING THEOREMS ON NEW ADDITIVE INFORMATION MEASURE OF ORDER

Parsimonious entropy-based stochastic modelling for changing hydroclimatic processes

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

t x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.

Review 1: STAT Mark Carpenter, Ph.D. Professor of Statistics Department of Mathematics and Statistics. August 25, 2015

1 Degree distributions and data

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Climate Change and Flood Frequency Some preliminary results for James River at Cartersville, VA Shuang-Ye Wu, Gettysburg College

Financial Econometrics and Volatility Models Extreme Value Theory

APPLICATION OF EXTREMAL THEORY TO THE PRECIPITATION SERIES IN NORTHERN MORAVIA

Products and Ratios of Two Gaussian Class Correlated Weibull Random Variables

STATISTICS SYLLABUS UNIT I

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Extreme precipitation events in the Czech Republic in the context of climate change

Introduction to Probability and Statistics (Continued)

Modelling Risk on Losses due to Water Spillage for Hydro Power Generation. A Verster and DJ de Waal

Exponential Tail Bounds

Probability Distribution

Transcription:

International Precipitation Conference (IPC10) Coimbra, Portugal, 23 25 June 2010 Topic 1 Extreme precipitation events: Physics- and statistics-based descriptions A world-wide investigation of the probability distribution of daily rainfall S.M. Papalexiou and D. Koutsoyiannis Department of Water Resources & Environmental Engineering National Technical University of Athens, Greece

The dataset A subset of the Global Historical Climatology Network-Daily database: stations with daily rainfall time series with length over 50 years (a total of 11 519 stations with very few missing values). Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 2

Entropy and maximum entropy distributions The Boltzmann-Gibbs-Shannon (BGS) entropy for a non-negative random variable X with probability density function f(x) is Φ X f xln f xdx 0 The Havrda-Charvát-Tsallis (HCT) entropy, a generalization of the BGS entropy that has gained popularity in the last decades, was originally introduced axiomatically by Havrda and Charvát [1967] and re-introduced by Tsallis [1988]; this is defined by 1 α Φα X 1 f x dx α 1 0 Application of the principle of maximum entropy (ME) [Jaynes, 1957a, b], for a given set of macroscopic constrains expressed in the form E[g i (X)] = c i, i = 1,,m, for the BGS and the HCT entropies using Lagrange multipliers λ i results, respectively, in density functions f X (x) = exp[ λ 0 λ 1 g 1 (x) λ m g m (x)], x 0 f X (x) = {1 + (1 a) [λ 0 + λ 1 g 1 (x) + + λ m g m (x)]} 1/(1 a), x 0 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 3

The most common maximum entropy distributions The most common constraints in entropy optimization assume known first and second moments. The resulting probability distributions from these constraints for positive variables are: (a) for the BGS entropy, the normal distribution truncated at zero, and, (b) for the HCT entropy, a symmetric bell-shaped distribution with powertype tails truncated at zero, called the Pearson type VII distribution (introduced by Pearson in 1916; now also called the Tsallis distribution). These distributions do not adequately describe various properties of the empirical daily rainfall distribution, e.g., (a) the truncated normal seems to fail to describe both the right and left tail of fine time-scale rainfall [Koutsoyiannis, 2005], (b) the truncated Tsallis distribution performs better but seems to fail to describe the left tail [Papalexiou and Koutsoyiannis, 2008a], and, (c) both distributions have limitations in the shapes they can form, as their skewness is only due to truncation. Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 4

Seeking a better entropy-based distribution for rainfall We seek a probability distribution resulting from the application of the principle of maximum entropy that is capable of describing the daily rainfall at all locations, if possible. It seems reasonable to generalize the constraints, from the first two moments (orders 1 and 2) to moments of unspecified order p, i.e. E(X p ) = m p. Another simple constrain that seems reasonable for non-negative positively skewed variables is the geometric mean, i.e., E(ln X) = ln μ G. Using only two constrains (μ G, m p ) and BGS entropy, the ME density is f X (x) = exp[ λ 0 λ 1 ln x λ 2 x p ], x 0 which after algebraic manipulation and renaming becomes f X x γ 1 γ2 x x exp βγ γ1 / γ2 β β 1 2 Generalized Gamma GG(x; β, γ 1, γ 2 ) where β is a scale and γ 1 and γ 2 are shape s This includes as special cases many common distributions, e.g., the Gamma, Weibull, and Exponential distributions. γ Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 5

A generalization scheme The following generalizations of the exp and ln functions are used (compatible with the definition of the exp function by Euler, 1748) 1/ n expn( x): (1 nx), so that exp( x) limexpn( x) n0 n lnn( x): ( x 1)/ n, so that ln( x) lim lnn( x) n0 Using these functions, the ME distribution for the HCT entropy is f X (x) = exp a 1 [ λ 0 λ 1 g 1 (x) λ n g n (x)], x 0 Using constraints E(X p ) = m p and E(ln n X) = ln n μ Gn, the ME distribution is f X (x) = exp n [ λ 0 λ 1 ln n x λ 2 x p ], x 0 After algebraic manipulations and assuming small n, and after renaming, this is approximated as: 3 1 1 1/ γ γ1/ γ γ 2 γ 2 γ2γ3 x x fx x 1 γ3 βbγ1 / γ2 1 / γ3, γ1 / γ2 β β The Generalized Gamma is a special case of GBII; another three- special case of GBII is the Burr type XII distribution, introduced in 1942: γ x x β β β 1 fx x 1 γγ 1 2 1 γ1 1 γ1 γ1γ2 1 Generalized Beta of 2 nd kind GBII(x; β, γ 1, γ 2, γ 3 ) Burr type XII BurrXII(x; β, γ 1, γ 2 ) Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 6

Empirical points in the Generalized Gamma L-moments space 97.6% of the empirical points are in the space of GG for 0.1 < γ 2 <2 Empirical average Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 7

Empirical points in the Burr type XII L-moments space 87.7% of the empirical points are in the space of BurrXII for 0 < γ 2 < 0.7 Pareto Empirical average Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 8

From the body to the tail of the distribution The upper tail of the distribution is its most important part as it rules the magnitude and the frequency of the extreme events. The heavy-tailed distributions, whose probability density function goes to zero less rapidly than exponentially, result in more frequent and more intense extreme events compared to distributions of exponential type. The most commonly used models for daily rainfall belong to the exponential family (e.g. Gamma). However, several studies suggest that heavy tailed distributions may be more suitable (e.g. a pioneering study by Milke, 1973, which proposed the Kappa distribution, a heavy tailed distribution, for daily rainfall). As only a very small portion of the empirical data belongs to the tail, fitting a simple (e.g. two-) distribution function using all data will be biased against the tail (the estimated s will result in a fitting that best describes the largest portion of the data, i..e the body rather than the tail). An ill-fitted tail may result in serious errors with severe consequences in hydrological design. For example, the magnitude of the 1000-year precipitation may be seriously underestimated if it is calculated from an exponential tail rather than a heavy tail. It may be assumed that a simple, two- distribution can give an acceptable fit on the tail only, i.e. over a certain threshold. Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 9

Two- special cases and their tails * Distribution Survival function F Pareto BurrXII(x, β, 1, γ) Weibull GG(x, β, γ, γ) Gamma GG(x, β, γ, 1) Log-Normal * x FX x 1 γ β F F * X * X x x 1 γ γ x exp β x Γγ, β Γ γ * 1 lnx γ FX x erfc 2 2β X x Comments The Pareto distribution is the simplest heavy-tailed distribution and depending on the shape γ may produce very extreme events. Other distributions, tail-equivalent with Pareto, are the Burr [Tadikamalla, 1980], the Kappa [Mielke, 1973] and the Log- Logistic [e.g. Ahmad et al., 1988]. The Weibull distribution, a common model in hydrology [e.g. Heo et al., 2001], can be considered as a generalization of the exponential distribution, and for a shape γ < 1, results in a heavier tail compared to that of the standard exponential distribution tail. The Gamma distribution is probably the most popular model for describing daily rainfall [e.g. Buishand, 1978]. Asymptotically, it behaves like the standard exponential distribution. The Log-Normal distribution is a very common distribution in hydrology that may approximate power-law distributions for a large portion of the body of the distribution [Mitzenmacher, 2004]. 2 * t Notes: F x 1 F x, erfcx e s 1 t dt, Γ s, x t e s 1 dt and X X 2 π x x t Γ s t e dt 0 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 10

Fitting distribution tails to empirical data The upper tail is assessed through its survival function F X* (x) := 1 F X (x) = P{X > x x > 0} for large x. For each station with record length of N years and a total number n of non-zero values, we derived the empirical survival function at the tail, F n* (x i ), as the empirical probability (according to the Weibull plotting position) of the N largest non-zero rainfall values, i.e., F n* (x i ) = r(x i )/(n + 1), with r(x i ) being the rank of the value x i, i.e., the position of x i in the ordered sample x (1),, x (n). Four different theoretical distributions were chosen and fitted to the empirical tails: Pareto, Weibull, Log-Normal and Gamma. The theoretical survival functions were fitted to the empirical ones by minimizing a modified mean square error (MSE) norm defined as MSE = (F X* ( x i ) / F n* ( x i ) 1) 2 /N; this is superior to the classical MSE norm (more balanced). Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 11

Fitting results: the Generalized Pareto tail MSE Scale Shape Min 0.0019 0.73 0.00 Mean 0.0179 9.99 0.13 Median 0.0153 9.30 0.13 Max 0.0679 50.00 0.53 Standard Deviation 0.0107 4.92 0.07 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 12

Fitting results: the Weibull tail MSE Scale Shape Min 0.0020 0.17 0.31 Mean 0.0191 7.49 0.72 Median 0.0166 6.55 0.70 Max 0.0664 53.17 1.30 Standard Deviation 0.0111 4.79 0.13 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 13

Fitting results: the Log Normal tail MSE Scale Shape Min 0.0021 0.40 0.20 Mean 0.0183 0.76 2.26 Median 0.0157 0.75 2.32 Max 0.0680 1.53 4.34 Standard Deviation 0.0110 0.14 0.60 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 14

Fitting results: the Gamma tail MSE Scale Shape Min 0.0020 0.82 0.01 Mean 0.0212 25.78 0.34 Median 0.0185 22.17 0.27 Max 0.0679 75.00 2.00 Standard Deviation 0.0123 14.57 0.26 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 15

The fatter the better, the more popular the worse! The four tails were compared in couples by means of the resulting MSE, i.e., the tail with the smaller MSE is considered best fitted. Within the couples, the Pareto tail was better fitted in approximately 60% of the stations. Interestingly, the fatter tail of each couple, in all cases, was better fitted in a higher percentage of the stations, i.e., the fatter, the better! The four fitted tails in each station were ranked according to their MSE. The tail with the smaller MSE was ranked as 1 and the one with larger as 4. The figure depicts the mean rank of all stations. The Pareto is the best fitted, while the most common model, the Gamma, is the worst. Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 16

Can we assume a global tail index? Estimates from historical data Estimates from simulated data Historical shape Simulated shape Historical MSE Simulated MSE Min 0.001 0.001 0.002 0.002 Mean 0.133 0.134 0.018 0.017 Median 0.130 0.134 0.015 0.014 95% CI (0.017, 0.255) (0.045, 0.224) (0.006, 0.040) (0.005, 0.036) Max 0.530 0.349 0.068 0.124 Standard Deviation 0.072 0.054 0.011 0.010 To test the assumption of a global tail index (Pareto shape ) [e.g., Koutsoyiannis, 2004] we generated random samples from a Pareto distribution with tail index γ = 0.13 and lengths equal to the historical ones, and applied the same methodology to estimate the shape. The sampling variability is as high as in the real world data but the historical and simulated histograms are not identical. Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 17

Conclusions A four- distribution, the Generalized Beta of the 2 nd kind, derived by maximum-entropy considerations, can describe daily rainfall at all 11 500 examined locations. The same distribution can also describe rainfall at a wide range of time scales, from hourly to annual [Papalexiou and Koutsoyiannis, 2008b]. Three- special cases of this distribution, i.e. the Generalized Gamma distribution and the Burr type XII distribution can describe very large portions of the entire daily rainfall data set (97.6% and 87.7%, respectively). Two- special cases of this distribution, i.e., the Pareto, the Weibull, and the Gamma, along with the widely used Lognormal distribution were tested for their ability to describe the distribution tails, and the following results were obtained: In comparisons in pairs, the distribution with the heavier tail performed better. Overall, the Pareto distribution performed best. The most popular model, the Gamma distribution, performed worst. Overall, the investigation supports the general conclusions that: Distributions bounded from above should be excluded. Distributions with light tails (in particular, Gamma) are generally not appropriate. Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 18

References Ahmad, M., C. Sinclair, and A. Werritty (1988), Log-logistic flood frequency analysis, Journal of Hydrology, 98(3-4), 205-224, doi:10.1016/0022-1694(88)90015-7. Buishand, T. A. (1978), Some remarks on the use of daily rainfall models, Journal of Hydrology, 36(3-4), 295-308, doi:10.1016/0022-1694(78)90150-6. Euler, L., (1748), Introductio in Analysin Infinitorum Havrda, J., and F. Charvát (1967), Concept of structural a-entropy, Kybernetika, 3, 30 35. Heo, J. H., J. D. Salas, and D. C. Boes (2001), Regional flood frequency analysis based on a Weibull model: Part 2. Simulations and applications, Journal of Hydrology, 242(3-4), 171 182. Jaynes, E. T. (1957a), Information Theory and Statistical Mechanics, Phys. Rev., 106(4), 620, doi:10.1103/physrev.106.620. Jaynes, E. T. (1957b), Information Theory and Statistical Mechanics. II, Phys. Rev., 108(2), 171, doi:10.1103/physrev.108.171. Koutsoyiannis, D. (2004), Statistics of extremes and estimation of extreme rainfall, 2, Empirical investigation of long rainfall records,hydrological Sciences Journal, 49(4), 591 610. Koutsoyiannis, D. (2005), Uncertainty, entropy, scaling and hydrological stochastics, 1, Marginal distributional properties of hydrological processes and state scaling, Hydrological Sciences Journal, 50(3), 381-404. Mielke, P. W. (1973), Another Family of Distributions for Describing and Analyzing Precipitation Data, Journal of Applied Meteorology, 12(2), 275-280. Mitzenmacher, M. (2004), A brief history of generative models for power law and lognormal distributions, Internet mathematics, 1(2), 226 251. Papalexiou, S., and D. Koutsoyiannis (2008a), Ombrian curves in a maximum entropy framework, in European Geosciences Union General Assembly 2008, p. 00702. (www.itia.ntua.gr/en/docinfo/851/). Papalexiou, S.M., and D. Koutsoyiannis (2008b), Probabilistic description of rainfall intensity at multiple time scales, IHP 2008 Capri Symposium: The Role of Hydrology in Water Resources Management, Capri, Italy, UNESCO, International Association of Hydrological Sciences (www.itia.ntua.gr/en/docinfo/884/). Tadikamalla, P. R. (1980), A Look at the Burr and Related Distributions, International Statistical Review / Revue Internationale de Statistique, 48(3), 337-344. Brazauskas, V. (2002), Fisher information matrix for the Feller-Pareto distribution, Statistics & Probability Letters, 59(2), 159-167, doi:10.1016/s0167-7152(02)00143-8. Tsallis, C. (1988), Possible generalization of Boltzmann-Gibbs statistics, Journal of Statistical Physics, 52(1), 479-487 Papalexiou & Koutsoyiannis, A world-wide investigation of the probability distribution of daily rainfall 19