GUIDE FOR THE USE OF THE DECISION SUPPORT SYSTEM (DSS)*

*Note: In French, SAD (Système d'Aide à la Décision).

1. Introduction to the DSS

Eighteen statistical distributions are available in the HYFRAN-PLUS software to fit data sets that are independent, homogeneous and stationary. A Decision Support System (DSS) has been developed to help select the most appropriate class of distributions with respect to extreme values. The distributions usually used in flood frequency analysis can be grouped into three main classes:
- Class C (regularly varying distributions): Fréchet (EV2), Halphen IB (HIB), Log-Pearson type 3 (LP3), Inverse Gamma (IG).
- Class D (sub-exponential distributions): Halphen type A (HA), Halphen type B (HB), Gumbel (EV1), Pearson type 3 (P3), Gamma (G).
- Class E (Exponential distribution).

Figure 1 presents the exponential (E), sub-exponential (D) and regularly varying (C) distributions. The distributions are ordered from light-tailed (left) to heavy-tailed (right). The limiting cases (bottom squares) are distributions lying at the boundaries between classes. The tail of the class C distributions is heavier than that of the class D distributions, which in turn is heavier than that of the class E distribution. Estimated quantiles can therefore be ordered in the same way. Consider a given sample. Let the T-year event be the quantile with non-exceedance probability $p = 1 - 1/T$. Let $Q_T(C)$, $Q_T(D)$ and $Q_T(E)$ be its estimates by distributions of classes C, D and E, respectively. These estimates satisfy:

$$Q_T(E) < Q_T(D) < Q_T(C)$$

~ In Business Since 1971 ~ Water Resources Publications, LLC. P.O. Box 630026 Highlands Ranch, CO 80163-0026, USA. E-mail: info@wrpllc.com http://www.wrpllc.com
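The sketch below illustrates the return-period notation. It computes $p = 1 - 1/T$ and the corresponding quantile for one class E distribution (exponential with mean μ) and one class C distribution (Fréchet, written here with location 0, scale σ and shape k). The parameter values are hypothetical and not fitted to any sample. The sketch therefore only shows how fast each quantile grows with T. It does not check the ordering above, which concerns distributions fitted to the same sample.

```python
import math


def non_exceedance_probability(T: float) -> float:
    """Probability p = 1 - 1/T of not exceeding the T-year event."""
    if T <= 1:
        raise ValueError("the return period T must be greater than 1")
    return 1.0 - 1.0 / T


def q_exponential(T: float, mu: float) -> float:
    """Class E: exponential distribution with mean mu, F(x) = 1 - exp(-x / mu)."""
    p = non_exceedance_probability(T)
    return -mu * math.log(1.0 - p)  # equal to mu * log(T)


def q_frechet(T: float, sigma: float, k: float) -> float:
    """Class C: Frechet (EV2) with location 0, F(x) = exp(-(x / sigma) ** -k), x > 0."""
    p = non_exceedance_probability(T)
    return sigma * (-math.log(p)) ** (-1.0 / k)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to compare growth rates with T.
    for T in (10, 100, 1000):
        print(T, round(q_exponential(T, mu=1.0), 2), round(q_frechet(T, sigma=1.0, k=2.0), 2))
```

Multiplying T by 10 adds the constant μ log 10 to the exponential quantile. It multiplies the Fréchet quantile by roughly $10^{1/k}$ (about 3.2 for k = 2), which is the practical meaning of a heavier tail.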
[Figure 1 shows, from light tail to heavy tail: class D (Gumbel, Halphen A, Gamma, Pearson type 3) and class C (Fréchet, Halphen IB, Inverse Gamma, Log-Pearson type 3, stable distributions), with the limiting cases Normal, Exponential, Lognormal and Pareto.]
Figure 1: Distributions ordered with respect to their right tails (El Adlouni et al., 2008).

The methods developed in the DSS identify the most adequate class of distributions to fit a given sample, especially for extremes. These methods are (cf. Figure 2):
- the log-log plot, used to discriminate between class C on the one hand and classes D and E on the other;
- the mean excess function (MEF), used to discriminate between classes D and E; and
- two statistics, Hill's ratio and the modified Jackson statistic, used for confirmatory analysis of the conclusions suggested by the two previous methods.
[Figure 2 (flowchart):
Use the log-log plot. Is the curve linear?
- Yes: distribution with regular variation (class C), i.e. HIB, EV2, LP3, IG. Confirmatory analysis: Hill's ratio, Jackson statistic.
- No: use the graph of the mean excess function (MEF). This curve is linear for classes D and E.
  - If the slope of the curve is zero, an exponential-type distribution (class E) is suggested, i.e. Exp.
  - If the slope of the curve is positive, a sub-exponential distribution (class D) is suggested, i.e. HA, G, P3, EV1, LN, HB.
  Confirmatory analysis: Hill's ratio, Jackson statistic.]
Figure 2: Diagram for class selection used in the DSS.

More theoretical details on this classification and its criteria are given in El Adlouni et al. (2008). This article is provided as an attachment with the HYFRAN-PLUS setup.

2. Log-Log plot
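The decision logic of Figure 2 can be written as a small function. In this sketch, the two boolean inputs stand for the outcomes of the log-log linearity test (Section 2) and the MEF slope test (Section 3). The function and argument names are hypothetical. The confirmatory step (Hill's ratio, Jackson statistic) is not modeled and is left to the user.

```python
from typing import List, Tuple

# Distribution codes as listed in Figure 2.
CLASS_C = ["HIB", "EV2", "LP3", "IG"]
CLASS_D = ["HA", "G", "P3", "EV1", "LN", "HB"]
CLASS_E = ["Exp"]


def select_class(loglog_is_linear: bool, mef_slope_is_zero: bool) -> Tuple[str, List[str]]:
    """Follow the Figure 2 flowchart and return (class, suggested distributions).

    mef_slope_is_zero is only consulted when the log-log curve is not linear,
    mirroring the order of the two tests in the diagram.
    """
    if loglog_is_linear:
        return "C", CLASS_C
    if mef_slope_is_zero:
        return "E", CLASS_E
    return "D", CLASS_D


if __name__ == "__main__":
    print(select_class(loglog_is_linear=False, mef_slope_is_zero=False))
```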
The log-log plot is based on the behaviour of the survival function $\bar F(u) = P(X > u)$. For an exponential tail with mean $\mu$,

$$\bar F(u) = P(X > u) = e^{-u/\mu},$$

while for a regularly varying distribution with tail index $\alpha$, $\bar F$ is equivalent, for large quantiles, to

$$\bar F(u) = P(X > u) \approx \int_u^{\infty} C\,x^{-1/\alpha - 1}\,dx = C\alpha\,u^{-1/\alpha},$$

with $\alpha < 1$, which is equivalent to a finite mean. Taking logarithms, a regularly varying distribution therefore satisfies

$$\log P(X > u) \approx \log(C\alpha) - \frac{1}{\alpha}\log u.$$

Consequently, in the log-log plot the tail probability is represented by a straight line for power-law (regularly varying, class C) distributions. This is not the case for sub-exponential or exponential distributions (classes D and E). For the exponential tail, for instance, $\log P(X > u) = -u/\mu$, which is not linear in $\log u$.

As illustrated in Figure 3, the curve in the log-log plot is a straight line for the class C distributions, i.e. Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3) and Inverse Gamma (IG). It is not a straight line for sub-exponential or exponential tails (classes D and E). When the curve is not linear, we suggest using the mean excess function (MEF) to discriminate between classes D and E.

Figure 3: Illustration of the log-log plot to characterize regularly varying distributions.

To check the linearity of the curve in the log-log plot, a test on the associated correlation coefficient is used. Simulation studies were used to determine critical values corresponding to significance levels of 5 % and 1 %. These values test the hypothesis H0: THE DATA FOLLOW A DISTRIBUTION OF THE CLASS C (i.e. THE CURVE IS LINEAR). The critical values depend on the sample size N (30 ≤ N ≤ 200). By default, the decisions given by the DSS are based on the 5 % significance level. If H0 is rejected at the 5 % level, we suggest using the mean excess function (MEF) plot. The critical values at the 1 % level are nevertheless provided for more flexibility. They allow the user to make a decision other than the one based on the 5 % level.

If the observed correlation coefficient (r0) is greater than the critical value (rc) at the 5 % level, we conclude that r0 is not significantly different from 1 at that level. The hypothesis H0 of linearity is then accepted at this level (Figure 4). In this case, the most adequate choice is the class C of regularly varying (power-law type) distributions: Halphen type IB (HIB), Fréchet (EV2), Log-Pearson type 3 (LP3), Inverse Gamma (IG).

[Figure 4 shows, along the r0 axis, the rejection and acceptance regions at the 1 % and 5 % levels, the critical values rc5% and rc1%, and the positions of r0 for cases 1, 2 and 3.]
Figure 4: Illustration of the decision for a one-sided test of the hypothesis H0.

Figure 4 shows the general decision rule for a one-sided test at two significance levels, 1 % and 5 %. The corresponding critical values are rc1% and rc5%, respectively. They are obtained by Monte Carlo simulations from regularly varying distributions. For a given dataset, the correlation coefficient r0 is calculated. To illustrate the use of this test, three cases are considered, whose correlation coefficients satisfy: r0(case 1) < rc5% < r0(case 2) < rc1% < r0(case 3).

For case 1, the hypothesis H0 is rejected at both the 1 % and 5 % significance levels, since r0(case 1) < rc5% and r0(case 1) < rc1%. In this case the distribution is not regularly varying (the curve is not linear).
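The coefficient r0 used in these cases can be computed as in the following sketch. The sample is sorted, and the empirical survival probability is estimated with the Weibull plotting position i/(n+1). This is an assumption; the DSS may use another formula. r0 is then taken as the absolute Pearson correlation between log x and log P(X > x) over the whole sample, which is also an assumption. The critical values rc5% and rc1% come from the DSS simulation tables and are not reproduced here. The hypothetical helper `accept_class_c` only encodes the rule "H0 is accepted at a level when r0 exceeds the critical value of that level".

```python
import math
from typing import List, Sequence, Tuple


def loglog_points(sample: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (log x_(i), log S_i) of the empirical log-log plot.

    S_i = 1 - i/(n+1) estimates P(X > x_(i)) for the i-th smallest value
    (Weibull plotting position; an assumption, not taken from the DSS).
    """
    x = sorted(v for v in sample if v > 0)  # logarithms need positive data
    n = len(x)
    return [(math.log(xi), math.log(1.0 - i / (n + 1.0)))
            for i, xi in enumerate(x, start=1)]


def correlation(points: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of the (a, b) pairs."""
    n = len(points)
    ma = sum(a for a, _ in points) / n
    mb = sum(b for _, b in points) / n
    sab = sum((a - ma) * (b - mb) for a, b in points)
    saa = sum((a - ma) ** 2 for a, _ in points)
    sbb = sum((b - mb) ** 2 for _, b in points)
    return sab / math.sqrt(saa * sbb)


def loglog_r0(sample: Sequence[float]) -> float:
    """r0 = |correlation| of the log-log plot (the fitted slope is negative)."""
    return abs(correlation(loglog_points(sample)))


def accept_class_c(r0: float, rc5: float, rc1: float) -> Tuple[bool, bool]:
    """Return (H0 accepted at 5 %, H0 accepted at 1 %).

    H0 (class C, linear curve) is accepted at a level when r0 exceeds the
    critical value of that level; rc5 and rc1 must come from the DSS tables.
    """
    return r0 > rc5, r0 > rc1
```

For a sample drawn from a Pareto law, r0 should be close to 1. For an exponential sample, it stays clearly lower because $\log P(X > x) = -x/\mu$ is curved against $\log x$.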
For case 2, the hypothesis H0 is rejected at the 1 % significance level but accepted at the 5 % level, since r0(case 2) > rc5% and r0(case 2) < rc1%. In this case, H0 is accepted by the DSS and a regularly varying distribution is suggested (based on the 5 % significance level). The critical value at the 1 % level is nevertheless displayed to give the user more flexibility.

Case 3 corresponds to r0 being higher than both critical values (r0(case 3) > rc5% and r0(case 3) > rc1%). In this case, H0 is accepted at both significance levels, and the suggested distribution belongs to the class C of regularly varying distributions.

3. The Mean Excess Function Diagram (MEF)

The mean excess function method is based on the function

$$e(u) = E[X - u \mid X > u].$$

This function is constant for exponential tail distributions ($e(u) = \mu$). However, for a regularly varying distribution with tail index $\alpha$ ($\alpha < 1$), $e(u) \approx \frac{\alpha}{1-\alpha}\,u$.

The mean excess function (MEF) makes it possible to discriminate between class D (sub-exponential distributions) and class E (exponential distribution). For high observed values, the curve in the MEF diagram is linear for distributions of both classes D and E. If, in addition, the slope of this curve is (Figure 5):
- equal to zero, the most adequate distribution belongs to class E (exponential law);
- strictly positive, the most adequate distribution belongs to class D of sub-exponential distributions: Halphen type A (HA), Gumbel (EV1), Halphen type B (HB), Pearson type 3 (P3), Gamma (G).

[Figure 5 panels: "Exponential distribution" and "Sub-exponential distribution", each plotting E(X - u | X > u) against k.]
Figure 5: Mean excess function for exponential and sub-exponential distributions.
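The behaviour of e(u) for an exponential tail, and for a regularly varying tail with index α < 1, can be checked with the identity below. The identity is valid for any positive X with a finite mean. This is a sketch. For the regularly varying case, it uses the tail approximation $\bar F(x) \approx C\alpha\,x^{-1/\alpha}$ of Section 2 as if it held exactly above u.

$$e(u) = E[X - u \mid X > u] = \frac{1}{\bar F(u)}\int_u^{\infty} \bar F(x)\,dx$$

$$\text{Exponential: } e(u) = e^{u/\mu}\int_u^{\infty} e^{-x/\mu}\,dx = e^{u/\mu}\,\mu\,e^{-u/\mu} = \mu$$

$$\text{Regularly varying: } e(u) \approx \frac{\int_u^{\infty} x^{-1/\alpha}\,dx}{u^{-1/\alpha}} = \frac{u^{1-1/\alpha}/(1/\alpha - 1)}{u^{-1/\alpha}} = \frac{\alpha}{1-\alpha}\,u$$

The constant $C\alpha$ cancels between numerator and denominator, so only the tail index α determines the slope.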
In the DSS, this diagram is used through the slope of the MEF curve for the observations that exceed the median (i.e. the 50 % largest observations of the sample). Simulation studies were used to determine critical values corresponding to significance levels of 5 % and 1 %. These values test the hypothesis H0: THE DATA FOLLOW A DISTRIBUTION OF THE CLASS E (i.e. THE SLOPE OF THE MEF IS EQUAL TO ZERO). The critical values depend on the sample size N (30 ≤ N ≤ 200). By default, the decisions given by the DSS are based on the 5 % significance level. When H0 is accepted, we suggest using the exponential distribution (class E). When H0 is rejected at the 5 % level, we suggest using a distribution of class D (HA, EV1, HB, P3, G). The critical values at the 1 % level are provided for more flexibility. They allow the user to make a decision other than the one suggested at the 5 % level (Figure 4).

Remark:
- The lognormal distribution (LN) does not belong to any of these classes. Its asymptotic behaviour lies at the frontier between classes C and D: the LN tail is lighter than that of a class C distribution and heavier than that of a class D distribution. Thus, the quantiles ($Q_T$) estimated by a distribution of class C, a distribution of class D and the LN satisfy: $Q_T(D) < Q_T(LN) < Q_T(C)$. Consequently:
  - If the parent distribution is regularly varying (class C) and the LN distribution is used for the fit, the estimated quantile for a fixed return period will be lower than the true value. There is then a risk of underestimating this quantile;
  - If the true distribution is sub-exponential (class D) and the LN distribution is used for the fit, the estimated quantile for a fixed return period will be higher than the true value. There is then a risk of overestimating this quantile.
In the DSS, to make a safe choice, LN is considered by default as a distribution of class D. The user may, however, make a different decision and associate it with class C.
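The MEF slope criterion can be sketched in three steps. First, the empirical mean excess e(u) is evaluated at each order statistic at or above the sample median. Second, a least-squares slope is fitted to these points. Third, the slope is compared with a critical value. Several choices here are assumptions for illustration: the least-squares fit, the skipping of thresholds with fewer than `min_exceedances` exceedances (to limit noise from the last few points), and all function names. The DSS critical values are not reproduced here, so `critical_slope` must be taken from its tables.

```python
import bisect
from typing import List, Sequence, Tuple


def mef_points(sample: Sequence[float], min_exceedances: int = 10) -> List[Tuple[float, float]]:
    """Empirical mean excess e(u) = mean(X - u | X > u) at each order
    statistic u at or above the sample median.

    Thresholds with fewer than min_exceedances exceedances are skipped
    (a hypothetical safeguard, not taken from the DSS).
    """
    x = sorted(sample)
    n = len(x)
    # suffix[k] = x[k] + x[k+1] + ... + x[n-1], so each e(u) costs O(log n)
    suffix = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + x[k]
    points = []
    for u in x[n // 2:]:
        k = bisect.bisect_right(x, u)  # first index with x[k] > u
        m = n - k                      # number of exceedances of u
        if m < min_exceedances:
            continue
        points.append((u, suffix[k] / m - u))
    return points


def ols_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of e(u) against u."""
    n = len(points)
    mu = sum(u for u, _ in points) / n
    me = sum(e for _, e in points) / n
    sxy = sum((u - mu) * (e - me) for u, e in points)
    sxx = sum((u - mu) ** 2 for u, _ in points)
    return sxy / sxx


def mef_decision(slope: float, critical_slope: float) -> str:
    """'D' if the slope exceeds the critical value (H0 rejected), else 'E'."""
    return "D" if slope > critical_slope else "E"
```

For an exponential sample, the fitted slope should be close to zero, in line with the constant $e(u) = \mu$.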
4. Hill's ratio plot [for the theoretical details, cf. El Adlouni et al. 2008]

The Hill ratio is defined by

$$a_n(x) = \frac{\sum_{i=1}^{n} \log(X_i/x)\,\mathbb{1}_{\{X_i > x\}}}{\sum_{i=1}^{n} \mathbb{1}_{\{X_i > x\}}},$$

where $\mathbb{1}_{\{X_i > x\}} = 1$ if $X_i > x$ and $0$ if $X_i \le x$.

This method is based on the fact that $a_n(x)$ is a consistent estimator of $\alpha$ if the tail is regularly varying (class C) with tail index $\alpha$ (Hill, 1975). In the expression of the Hill ratio, x is chosen large enough that $P(X > x) \to 0$ and $n\,P(X > x) \to \infty$; $\mathbb{1}$ is the indicator function. The standard Hill estimator of the tail index is the particular case in which the observations are ordered, $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$, and $x = X_{(k+1)}$. Here k is an integer that tends to infinity as n tends to infinity (with $k/n \to 0$). In practice, one plots $a_n(x)$ as a function of x and looks for a stable region from which $a_n(x)$ can be taken as an estimator of $\alpha$. Figure 4 presents the Hill ratio plot for samples generated from a regularly varying distribution (a) and from an exponential distribution (b).

Figure 4: Generalized Hill ratio plot for (a) regularly varying and (b) sub-exponential distributions.
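The Hill ratio can be computed directly from its definition, as in the sketch below. `hill_plot` evaluates it at $x = X_{(k+1)}$ for several values of k, which gives the standard Hill estimator. The function names are hypothetical, and the data are assumed positive. In the parameterisation of Section 2 ($\bar F(u) \propto u^{-1/\alpha}$), a sample drawn with Python's `random.paretovariate(a)` has $\alpha = 1/a$. Its Hill ratio should therefore settle near 1/a. For an exponential sample, the ratio keeps decreasing as x grows.

```python
import math
from typing import List, Sequence, Tuple


def hill_ratio(sample: Sequence[float], x: float) -> float:
    """a_n(x): mean of log(X_i / x) over the observations X_i > x."""
    logs = [math.log(xi / x) for xi in sample if xi > x]
    if not logs:
        raise ValueError("no observation exceeds x")
    return sum(logs) / len(logs)


def hill_plot(sample: Sequence[float], k_values: Sequence[int]) -> List[Tuple[int, float]]:
    """Standard Hill estimator: a_n(x) with x = X_(k+1), X_(1) >= ... >= X_(n).

    Each k must satisfy 1 <= k < n.
    """
    desc = sorted(sample, reverse=True)
    return [(k, hill_ratio(desc, desc[k])) for k in k_values]
```

In practice, one would call `hill_plot` over a range of k (for example every k from 10 to n/2) and look for the stable region described above.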
This statistic is used in the DSS to confirm the choice suggested by the first two diagrams (class C, D or E):
- If the curve converges to a non-zero constant value, the most adequate distribution belongs to class C (regularly varying distributions). We then suggest using a class C distribution: Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3), Inverse Gamma (IG).
- If the curve decreases to zero, the distribution belongs to the sub-exponential class (class D: Halphen type A, Gamma, Pearson type 3, Halphen type B, Gumbel) or to the exponential class (class E: exponential distribution). To discriminate between classes D and E, we suggest using the MEF method (cf. Section 3).

5. Jackson Statistic [for the theoretical details, cf. El Adlouni et al. 2008]

This method was presented by Beirlant et al. (2006) and is based on the Jackson statistic. It tests whether the sample is consistent with Pareto-type distributions (class B). Note that the distributions of class C (regularly varying distributions) have asymptotically the same behaviour as the Pareto distribution. The Jackson statistic (Jackson, 1967) was originally proposed as a goodness-of-fit statistic for testing exponential behaviour. There is a link between the exponential and Pareto distributions: if X has a Pareto distribution with unit scale, the logarithmic transformation Y = log X is exponentially distributed. Because of this link, the statistic is used to assess Pareto-type behaviour. The Jackson statistic is further modified to take into account the second-order tail behaviour of a Pareto-type model. Beirlant et al. (2006) give the limiting distribution of this statistic, together with a bias-corrected version for finite samples. The modified Jackson statistic converges to 2 for regularly varying (power-law) distributions and behaves irregularly for sub-exponential or exponential distributions (Figure 5).
Figure 5: Modified Jackson statistic for (a) regularly varying and (b) sub-exponential distributions.

In the DSS, this method is used to confirm the decision suggested by the log-log plot and the MEF:
- If the curve converges clearly and regularly to 2, the studied distribution belongs to class C (regularly varying distributions). We then suggest using Fréchet (EV2), Halphen type IB (HIB), Log-Pearson type 3 (LP3) or Inverse Gamma (IG);
- If the curve shows irregularities in the distribution tail, we suggest the sub-exponential class (class D: Halphen type A, Gamma, Pearson type 3, Halphen type B, Gumbel) or the exponential class (class E: exponential distribution). To discriminate between classes D and E, we suggest using the MEF method (cf. Section 3).

Remark: The modified Jackson statistic was developed to test Pareto-type behaviour. Even so, it is used in the DSS to check whether the studied distribution has a tail similar to that of a regularly varying distribution (class C), since class C distributions have asymptotically Pareto-type tails. In practice, the Generalized Pareto distribution (GPD) is used in the peaks-over-threshold (POT) model. The GPD is nevertheless also available in HYFRAN and can be used to fit any data set that is independent, homogeneous and stationary.
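The sketch below implements the classical Jackson (1967) statistic. Following the link described above, it is applied to the log-exceedances Y = log(X/u) of the k largest observations. It is not the bias-corrected modified statistic of Beirlant et al. (2006), whose second-order correction is not reproduced here. The threshold choice u = X_(k+1) and the function names are assumptions. For a Pareto-type tail, the log-exceedances are close to exponential, so J should be near 2. For an exponential sample, the log-exceedances have a lighter tail, and J stays below 2 at moderate thresholds.

```python
import math
from typing import Sequence


def jackson_statistic(y: Sequence[float]) -> float:
    """Classical Jackson statistic for a sample assumed exponential.

    J = sum_i c_i * y_(i) / sum_i y_(i), with y_(1) <= ... <= y_(m) and
    c_i = sum_{j=1}^{i} 1 / (m - j + 1), the expected i-th order statistic
    of a unit exponential sample. J tends to 2 for exponential data.
    """
    ys = sorted(y)
    m = len(ys)
    if m < 2:
        raise ValueError("need at least two values")
    c = 0.0
    num = 0.0
    for i, yi in enumerate(ys, start=1):
        c += 1.0 / (m - i + 1)  # running value of c_i
        num += c * yi
    return num / sum(ys)


def jackson_on_log_exceedances(sample: Sequence[float], k: int) -> float:
    """Jackson statistic of Y = log(X / u) for the k largest observations,
    with threshold u = X_(k+1), the (k+1)-th largest value (1 <= k < n)."""
    desc = sorted(sample, reverse=True)
    u = desc[k]
    y = [math.log(x / u) for x in desc[:k] if x > u]
    return jackson_statistic(y)
```

Plotting `jackson_on_log_exceedances` against k gives a curve comparable in spirit to the one in Figure 5, without the bias correction.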
References

Beirlant, J., de Wet, T., Goegebeur, Y. (2006). A goodness-of-fit statistic for Pareto-type behaviour. Journal of Computational and Applied Mathematics, 186, 99-116.

El Adlouni, S., Bobée, B., Ouarda, T.B.M.J. (2008). On the tails of extreme event distributions in hydrology. Accepted in Journal of Hydrology.

Jackson, O.A.Y. (1967). An analysis of departures from the exponential distribution. Journal of the Royal Statistical Society B, 29, 540-549.