GUIDE FOR THE USE OF THE DECISION SUPPORT SYSTEM (DSS)*


*Note: In French, SAD (Système d'Aide à la Décision).

1. Introduction to the DSS

Eighteen statistical distributions are available in the HYFRAN-PLUS software to fit data sets that are independent, homogeneous and stationary. A Decision Support System (DSS) has been developed to support the selection of the most appropriate class of distributions with respect to extreme values. Distributions that are usually used in flood frequency analysis can be grouped into three main classes:

- Class C (regularly varying distributions): Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3), Inverse Gamma (IG).
- Class D (sub-exponential distributions): Halphen type A (HA), Halphen type B (HB), Gumbel (EV1), Pearson type 3 (P3), Gamma (G).
- Class E (Exponential distribution).

Figure 1 presents the exponential (E), sub-exponential (D) and regularly varying (C) distributions. Distributions are ordered from light tailed (on the left) to heavy tailed (on the right). The limiting cases (bottom squares) are represented by distributions at the limits of the classes.

The tail of the class C distributions is heavier than that of the class D distributions, which is in turn heavier than that of class E. Thus, estimated quantiles can be ordered equivalently. Indeed, for a given sample, the T-event corresponds to the quantile with non-exceedance probability $p = 1 - 1/T$. The estimates of this quantile given by distributions of the classes C, D and E, denoted $Q_T(C)$, $Q_T(D)$ and $Q_T(E)$ respectively, satisfy the following relation:

$Q_T(E) < Q_T(D) < Q_T(C)$.

~ In Business Since 1971 ~ Water Resources Publications, LLC, P.O. Box 630026, Highlands Ranch, CO 80163-0026, USA. E-mail: info@wrpllc.com, http://www.wrpllc.com

Figure 1: Distributions ordered with respect to their right tails, from light tail to heavy tail (El Adlouni et al., 2008). [Diagram labels: Class D (Gumbel, Halphen A, Gamma, Pearson type 3); Class C (Fréchet, Halphen IB, Inverse Gamma, Log-Pearson type 3); Stable distributions; Normal; Lognormal; Exponential; Pareto.]

The methods developed in the DSS allow the identification of the most adequate class of distributions to fit a given sample, especially for extremes. These methods are (cf. the diagram in Figure 2):

- The log-log plot, used to discriminate between class C on the one hand and classes D and E on the other;
- The mean excess function (MEF), used to discriminate between classes D and E; and
- Two statistics, Hill's ratio and the modified Jackson statistic, used for confirmatory analysis of the conclusions suggested by the previous two methods.

- Use the log-log plot and check whether the curve is linear.
  - Yes: the distribution is regularly varying (class C), i.e. HIB, EV2, LP3, IG. Confirmatory analysis: Hill's ratio and the Jackson statistic.
  - No: use the graph of the Mean Excess Function (MEF). This curve is linear for the classes D and E.
    - If the slope of the curve is null, we suggest an exponential type distribution (class E), i.e. Exp.
    - If the slope of the curve is positive, we suggest a sub-exponential type distribution (class D), i.e. HA, G, P3, EV1, LN, HB.
    - Confirmatory analysis: Hill's ratio and the Jackson statistic.

Figure 2: Diagram for class selection used in the DSS.

More theoretical details of this classification and of the criteria are available in El Adlouni et al. (2008). This article is available as an attachment in the HYFRAN-PLUS setup.

2. Log-Log plot

The log-log plot is based on the behaviour of the survival function $\bar F(u) = P(X > u)$. For an exponential tail with mean $\theta$,

$\bar F(u) = P(X > u) = e^{-u/\theta}$,

whereas for a regularly varying distribution with tail index $\alpha$, $\bar F$ is equivalent, for large quantiles, to

$\bar F(u) = P(X > u) \sim C \int_u^{\infty} x^{-\alpha}\,dx = C \left[ \dfrac{x^{1-\alpha}}{1-\alpha} \right]_u^{\infty} = \dfrac{C}{\alpha - 1}\, u^{1-\alpha}$ (with $\alpha > 1$, which is equivalent to finite mean).

Therefore, taking the logarithm and absorbing the factor $1/(\alpha - 1)$ into the constant $C$, we have for regularly varying distributions

$\log P(X > u) \approx \log C - (\alpha - 1) \log u$.

This suggests that, on the log-log plot, the tail probability is represented by a straight line for power-law (regularly varying, class C) distributions, but not for sub-exponential or exponential distributions (class D or E). As illustrated in Figure 3, the curve in the log-log plot is a straight line for the distributions of class C, i.e. Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3) and Inverse Gamma (IG), but not for sub-exponential or exponential type tails (class D or E). When the curve is not linear, we suggest the use of the Mean Excess Function (MEF) to discriminate between the classes D and E.

Figure 3: Illustration of the log-log plot to characterize the regularly varying distributions.

To check the linearity of the curve in the log-log plot, a test on the associated correlation coefficient is considered. Simulation studies provide the critical values corresponding to significance levels of 5 % and 1 % for testing the hypothesis H0: the data follow a distribution of class C (i.e. the curve is linear). These critical values are calculated according to the size N of the sample (30 ≤ N ≤ 200). Note that the decisions given by the DSS are based, by default, on the 5 % significance level: if the hypothesis H0 is rejected at the 5 % level, we suggest the use of the mean excess function (MEF) plot. The critical values at the 1 % level are nevertheless given for more flexibility, allowing the user to make a decision other than the one based on the 5 % level.

Indeed, if the observed correlation coefficient (r0) is greater than the critical value (rc) at the 5 % significance level, we conclude that it is not significantly different from 1 at this level, and the hypothesis H0 of linearity is accepted (Figure 4). In this case, the most adequate choice corresponds to class C of regularly varying distributions (power-law type): Halphen type IB (HIB), Fréchet (EV2), Log-Pearson type 3 (LP3), Inverse Gamma (IG).

Figure 4: Illustration of the decision of a one-sided test of the hypothesis H0. [Diagram labels: rejection region (1 %), rejection region (5 %), acceptance region (5 %), acceptance region (1 %); r0 (case 1), r0 (case 2), r0 (case 3); critical value rc5% at the 5 % significance level; critical value rc1% at the 1 % significance level.]

Figure 4 shows, in general, the decision rule for a one-sided test at the two significance levels 1 % and 5 %. The critical values corresponding to each significance level are rc1% and rc5%, respectively. These two critical values are obtained by Monte Carlo simulations generated from regularly varying distributions. For a given data set, we calculate the correlation coefficient r0. To illustrate the use of this test, three cases are considered, with correlation coefficients satisfying: r0(case 1) < rc5% < r0(case 2) < rc1% < r0(case 3).

In case 1, the hypothesis H0 is rejected at both significance levels 1 % and 5 %, since r0(case 1) < rc5% and r0(case 1) < rc1%. In this case the distribution is not regularly varying (the curve is not linear).
In case 2, the hypothesis H0 is rejected at the 1 % significance level but accepted at the 5 % level, since r0(case 2) > rc5% and r0(case 2) < rc1%. In this case, the hypothesis H0 is accepted by the DSS and the use of a regularly varying distribution is suggested (based on the 5 % significance level). However, the critical value at the 1 % level is presented to give more flexibility to the user.

Case 3 corresponds to the situation where r0 is higher than both critical values (r0(case 3) > rc5% and r0(case 3) > rc1%). In this case, at both significance levels, the hypothesis H0 is accepted and the suggested distribution belongs to class C of regularly varying distributions.

3. The Mean Excess Function Diagram (MEF)

The mean excess function method is based on the function $e(u) = E[X - u \mid X > u]$. This function is constant for exponential tail distributions ($e(u) = \theta$). However, in the case of a regularly varying distribution with tail index $\alpha > 2$: $e(u) \approx \dfrac{u}{\alpha - 2}$.

The Mean Excess Function (MEF) allows discriminating between class D (sub-exponential distributions) and class E (Exponential distribution). Indeed, the curve in the MEF diagram is linear for high observed values for distributions of both classes D and E. If, in addition, the slope of this curve is (Figure 5):

- Equal to zero, the most adequate distribution belongs to class E (Exponential law);
- Strictly positive, the most adequate distribution belongs to class D of sub-exponential distributions: Halphen type A (HA), Gumbel (EV1), Halphen type B (HB), Pearson type 3 (P3), Gamma (G).

Figure 5: Mean excess function for exponential and sub-exponential distributions. [Two panels, "Exponential distribution" and "Sub-exponential distribution"; vertical axis E(X-u | X>u), horizontal axis k.]
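Both graphical diagnostics introduced so far, the log-log plot of Section 2 and the mean excess function, can be computed directly from the ordered sample. The following is only a minimal sketch and not the HYFRAN-PLUS implementation: the function names, the plotting position (n - i + 1)/(n + 1) for the empirical survival function, the use of the absolute value of the Pearson correlation, and the least-squares slope fitted above the median are all assumptions made for illustration.

```python
import numpy as np


def loglog_correlation(sample):
    """Absolute Pearson correlation of the log-log plot points (hypothetical helper).

    Assumes strictly positive observations and uses the whole sample.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # Empirical survival P(X > x_(i)); the plotting position is an assumption.
    surv = (n - ranks + 1) / (n + 1)
    r = np.corrcoef(np.log(x), np.log(surv))[0, 1]
    # The log-log slope is negative, so the absolute value is compared with 1.
    return abs(r)


def mean_excess(sample):
    """Empirical mean excess e(u) = E[X - u | X > u] at each distinct threshold."""
    x = np.asarray(sample, dtype=float)
    # Drop the largest value: no observation exceeds it.
    thresholds = np.unique(x)[:-1]
    e = np.array([np.mean(x[x > u] - u) for u in thresholds])
    return thresholds, e


def mef_slope(sample):
    """Least-squares slope of the empirical MEF for thresholds above the median."""
    u, e = mean_excess(sample)
    keep = u > np.median(np.asarray(sample, dtype=float))
    slope, _intercept = np.polyfit(u[keep], e[keep], 1)
    return float(slope)
```

A correlation close to 1 points towards class C, while a slope close to zero on the upper half of the sample points towards class E. In the DSS these statistics are compared with simulated critical values, which this sketch does not reproduce.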

The use of this diagram in the DSS is based on the slope of the MEF curve for the observations that exceed the median (the highest 50 % of the observed values of the sample). Simulation studies provide the critical values corresponding to significance levels of 5 % and 1 % for testing the hypothesis H0: the data follow a distribution of class E (i.e. the slope of the MEF is equal to zero). These critical values are calculated according to the size N of the sample (30 ≤ N ≤ 200). Note that the decisions given by the DSS are based, by default, on the 5 % significance level. When the hypothesis H0 is accepted, we suggest the use of the Exponential distribution (class E). However, when it is rejected at the 5 % significance level, we suggest the use of a distribution of class D (HA, EV1, HB, P3, G). Note that the critical values at the 1 % significance level are given for more flexibility, allowing the user to make a decision other than the one suggested at the 5 % level (Figure 4).

Remark: The Lognormal distribution (LN) does not belong to any of these classes. Its asymptotic behaviour lies at the frontier between the classes C and D. Indeed, the LN tail is lighter than that of a distribution of class C and heavier than that of a distribution of class D. Thus, the quantiles ($Q_T$) estimated by a distribution of class C, by a distribution of class D and by the LN satisfy the following relation: $Q_T(D) < Q_T(LN) < Q_T(C)$. Consequently:

- If the parent distribution is regularly varying (class C) and the LN distribution is used for the fit, the estimated quantile for a fixed return period will be lower than the real value, and there is a risk of underestimating this quantile;
- If the true distribution is sub-exponential (class D) and the LN distribution is used for the fit, the estimated quantile for a fixed return period will be higher than the real value, and there is a risk of overestimating this quantile.

In the DSS, as a safe choice, LN is considered by default as a distribution of class D. However, the user may make a different decision and associate it with class C.
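The two-step selection described in Sections 2 and 3 can be summarized as a small decision function. This is a hedged sketch rather than the DSS code: the function name and its arguments are hypothetical, the critical values must be supplied by the caller (the guide obtains them by simulation for 30 ≤ N ≤ 200 and does not list them), and the MEF test is assumed to reject H0 when the observed slope statistic exceeds its critical value. The confirmatory analysis (Hill's ratio, Jackson statistic) is not included.

```python
def suggest_class(r_obs, r_crit, slope_obs, slope_crit):
    """Suggest a class of distributions ('C', 'D' or 'E').

    r_obs, r_crit         -- observed and critical correlation of the log-log plot
    slope_obs, slope_crit -- observed and critical slope statistic of the MEF
    All four arguments are illustrative; critical values at the chosen
    significance level (5 % by default in the DSS) are assumed to be known.
    """
    # Step 1: log-log plot. H0 (class C, linear curve) is accepted
    # when the observed correlation exceeds the critical value.
    if r_obs > r_crit:
        return "C"
    # Step 2: MEF. H0 (class E, zero slope) is rejected when the
    # slope statistic exceeds its critical value (assumed one-sided test).
    if slope_obs > slope_crit:
        return "D"
    return "E"
```

For example, `suggest_class(0.99, 0.97, 0.0, 0.2)` returns `"C"` because the first test already accepts linearity, so the MEF statistics are not examined.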

4. Hill's ratio plot [for the theoretical details cf. El Adlouni et al. 2008]

The Hill ratio is defined by

$a_n(x_n) = \dfrac{\sum_{i=1}^{n} I(X_i > x_n)}{\sum_{i=1}^{n} \log(X_i / x_n)\, I(X_i > x_n)}$, where $I(X_i > x_n) = 1$ if $X_i > x_n$ and $0$ if $X_i \le x_n$.

This method is based on the fact that $a_n$ is a consistent estimator of $\alpha$ if the tail is regularly varying (class C) with tail index $\alpha$ (Hill, 1975). In the expression of the Hill ratio, $x_n$ is chosen to be large, such that $P(X > x_n) \to 0$ and $n P(X > x_n) \to \infty$, and $I$ is the indicator function. The standard Hill estimator of the tail index corresponds to the particular case where the observations are ordered, $X_{(1)} \ge \dots \ge X_{(n)}$, and $x_n = X_{(k+1)}$, where $k$ is an integer which tends to infinity as $n$ tends to infinity. In practice, one plots $a_n(x_n)$ as a function of $x_n$ and looks for a stable region from which $a_n(x_n)$ can be considered as an estimator of $\alpha$.

Figure 4 presents the Hill ratio plot for samples generated from (a) regularly varying and (b) exponential distributions.

Figure 4: Generalized Hill ratio plot for (a) regularly varying and (b) sub-exponential distributions.
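The Hill ratio defined above is a count of exceedances divided by a sum of log-ratios, which is straightforward to evaluate for any positive threshold. The sketch below is illustrative only: the function names are hypothetical, and taking every distinct positive sample value except the largest as a threshold for the curve is an assumption, not the choice documented for the DSS.

```python
import numpy as np


def hill_ratio(sample, x):
    """Hill ratio a_n(x): number of exceedances of x divided by the sum of log(X_i / x).

    Assumes a positive threshold x with at least one observation above it.
    """
    data = np.asarray(sample, dtype=float)
    exceed = data[data > x]
    if exceed.size == 0:
        raise ValueError("no observation exceeds the threshold x")
    return exceed.size / np.sum(np.log(exceed / x))


def hill_ratio_curve(sample):
    """Hill ratio evaluated at each distinct positive sample value except the largest."""
    data = np.asarray(sample, dtype=float)
    thresholds = np.unique(data)[:-1]
    thresholds = thresholds[thresholds > 0]
    ratios = np.array([hill_ratio(data, x) for x in thresholds])
    return thresholds, ratios
```

Plotting `ratios` against `thresholds` gives the Hill ratio plot: a curve that stabilizes at a non-zero constant is the behaviour associated with class C.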

This statistic is used in the DSS to confirm the choice suggested by the first two diagrams (the distribution belongs to class C, D or E).

- If the curve converges to a non-null constant value, the most adequate distribution belongs to class C (regularly varying distribution). We then suggest the use of a distribution of class C: Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3), Inverse Gamma (IG).
- If the curve decreases to zero, the distribution belongs to the sub-exponential class (class D: Halphen type A, Gamma, Pearson type 3, Halphen type B, Gumbel) or to the exponential class (class E: Exponential distribution). Note that (cf. section 3) to discriminate between the classes D and E, we suggest the use of the MEF method.

5. Jackson Statistic [for the theoretical details cf. El Adlouni et al. 2008]

This method is presented by Beirlant et al. (2006) and is based on the Jackson statistic. It allows testing whether the sample is consistent with Pareto-type distributions (Class B). Note that the distributions of class C (regularly varying distributions) have asymptotically the same behaviour as the Pareto distribution. Originally, the Jackson statistic (Jackson, 1967) was proposed as a goodness-of-fit statistic for testing exponential behaviour. Given the link between the Exponential and the Pareto distributions (if X has a Pareto distribution, the logarithmic transformation $Y = \log X$ is exponentially distributed), this statistic is used to assess Pareto-type behaviour. The Jackson statistic is further modified by taking into account the second-order tail behaviour of a Pareto-type model. Beirlant et al. (2006) give the limiting distribution of this statistic, with a bias-corrected version for finite-size samples. The modified Jackson statistic converges to 2 for regularly varying (power-law) distributions and has an irregular behaviour for sub-exponential or exponential distributions (Figure 5).
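The link between the Pareto and the Exponential distributions used above can be verified in one line. The derivation below assumes, for illustration only, a Pareto distribution with unit scale and a shape parameter written $a > 0$ (a symbol introduced here, not taken from the guide).

```latex
% Assumption: X is Pareto with unit scale and shape a > 0, i.e. P(X > x) = x^{-a} for x >= 1.
\begin{align*}
P(Y > y) &= P(\log X > y) = P\left(X > e^{y}\right) \\
         &= \left(e^{y}\right)^{-a} = e^{-a y}, \qquad y \ge 0 .
\end{align*}
% Hence Y = log X is exponentially distributed with mean 1/a.
```

A goodness-of-fit statistic for exponential behaviour applied to the log-transformed data therefore assesses Pareto-type behaviour of the original data.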

Figure 5: Modified Jackson statistic for (a) regularly varying and (b) sub-exponential distributions.

In the DSS this method is considered as a confirmatory method for the decision suggested by the log-log plot and the MEF. So:

- If the curve converges clearly and regularly to 2, the studied distribution belongs to class C (regularly varying distribution). We then suggest the use of: Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3), Inverse Gamma (IG);
- If the curve presents some irregularities in the distribution tail, then we suggest the sub-exponential class (class D: Halphen type A, Gamma, Pearson type 3, Halphen type B, Gumbel) or the exponential class (class E: Exponential distribution). Note that (cf. section 3) to discriminate between the classes D and E, we suggest the use of the MEF method.

Remark: Even if the modified Jackson statistic was developed to test Pareto-type behaviour, it is used in the DSS to check whether the studied distribution has a tail similar to that of a regularly varying distribution (class C). Indeed, distributions of class C have asymptotically a Pareto-type tail. In practice, the Generalized Pareto distribution (GPD) is used in the Peaks-over-threshold (POT) model. However, the GPD is available in HYFRAN and can be used to fit any data sets that are independent, homogeneous and stationary.

References:

Beirlant, J., de Wet, T., Goegebeur, Y. (2006). A goodness-of-fit statistic for Pareto-type behaviour. Journal of Computational and Applied Mathematics, 186, 99-116.

El Adlouni, S., Bobée, B. and Ouarda, T.B.M.J. (2008). On the tails of extreme event distributions in Hydrology. Accepted in Journal of Hydrology.

Jackson, O.A.Y. (1967). An analysis of departures from the exponential distribution. Journal of the Royal Statistical Society B, 29, 540-549.