Extreme value modelling of rainfalls and flows


Université de Paris Sud

Master's Thesis

Extreme value modelling of rainfalls and flows

Author: Fan JIA
Supervisors: Pr Elisabeth Gassiat, Dr Elena Di Bernardino, Pr Michel Béra

A thesis submitted in fulfilment of the requirements for the degree of Master of Probabilities and Statistics in the CEDRIC-C.N.A.M

September 2013

Abstract

The main topic of this thesis is the statistical analysis of extreme values with applications to hydrology, more specifically to rainfalls and flows. Estimating rainfall and flow frequencies is crucial given the potentially significant damage to agriculture, ecology and infrastructure systems. The main goal of this study is to find the best-fitting distributions for the rainfalls and flows measured in the Bièvre region during the observation period. The method of block maxima is applied when we fit the Generalized Extreme Value (GEV) distribution to the data, and the Peak Over Threshold (POT) method is applied when we fit the Generalized Pareto (GP) distribution [1]. The parameters are estimated by maximum likelihood.

Two different methods allow us to treat trends and seasonal factors. In the first method, we begin by removing the trends and seasonal factors, then we fit GEV and GP distributions to the remaining stationary series. In the second method, we use a point process that incorporates trends and seasonal variations in the model, instead of deseasonalizing the data [2]. With these models, we derive estimates of the T-year return levels for several values of T.

Another important goal is to evaluate the performance of the control system installed in the Bièvre region by SIAVB. We estimate the return levels of the flow in the Bièvre region without the control system, which tells us whether the control system is efficient. In order to obtain an accurate and robust prediction of the return level, we estimate the hyper-parameters of the Ridge regression and the PLS (Partial Least Squares) regression by cross-validation.


Acknowledgements

First and foremost I offer my sincerest gratitude to my supervisor, Dr. Elena Di Bernardino. Without her encouragement and effort, this thesis could not have been completed or written. One simply could not wish for a better or friendlier supervisor.

I would also like to express my deepest appreciation to Professor Elisabeth Gassiat, who has supported me throughout my second year of Master with her patience and knowledge. I attribute the level of my Master's degree to her help and encouragement.

My sincere thanks also go to Professor Michel Béra, who taught me everything I needed about robust regression methods. Without his guidance, this work would not have been possible.

I would like to thank M. Hervé Cardinal from SIAVB and M. Jean-Marie Grisot from Veolia for providing the necessary data and very useful information about the project.

Last but not least, I would like to thank the Laboratory CEDRIC of C.N.A.M. for providing the support and equipment I needed to produce and complete my thesis, and for funding my studies.

Contents

Abstract
Acknowledgements
Introduction
1 Theoretical Framework
  1.1 Some definitions of hydrology
    1.1.1 Return period
    1.1.2 Probability of occurrence
    1.1.3 Return level
  1.2 Extreme value theory
    1.2.1 Extreme value theory and models
    1.2.2 Generalized Extreme Value distributions
  1.3 Generalized Pareto distribution
    1.3.1 Generalized Pareto models
  1.4 Dependence in the series
    1.4.1 Extremal index
    1.4.2 Estimation of the extremal index
      Block method
      Runs method
  1.5 Point process approach
  1.6 Linear regression
    1.6.1 Ridge regression
    1.6.2 Partial Least Squares regression
    1.6.3 Cross-validation
2 Case Study
  2.1 Procedure of study
  2.2 Numerical results and discussion
    2.2.1 Descriptive statistics for the data
    2.2.2 Dependence in the series
    2.2.3 GEV approach
    2.2.4 GP approach
    2.2.5 Point process approach
    2.2.6 Generalized linear regression
Conclusion
Bibliography

Introduction

The significant damage to agriculture and ecology and the disruption of human activities caused by floods make it necessary to study the probability of another heavy rainfall in the future. Many areas are affected by heavy rainfalls and the associated floods and landslides. In our case, we focus on the Bièvre region, which faced serious flooding problems before the installation of a fully automated control system by SIAVB. Our goal is to evaluate the performance of this control system, and to do so we fit extreme value models to the rainfalls and flows in the Bièvre region.

The statistical analysis of extreme rainfalls and flows is based on observations from 10 stations in the Bièvre region: 4 stations measure flows and 6 stations measure rainfalls. The data were measured every 5 minutes over 10 years starting in 2003. We use two techniques to select our sample: one considers the monthly maxima of daily rainfalls and flows, the other selects exceedances over a specific threshold value. We fit the Generalized Extreme Value (GEV) distribution and the Generalized Pareto (GP) distribution, respectively, to the observed data. The model parameters are estimated by maximum likelihood.

We also use two different methods to treat the trends and the seasonal factors. The first is a classical approach, which begins by removing trends and seasonal factors in order to obtain a stationary series; we then fit GEV and GP distributions to the stationary series. The second incorporates trends and seasonal variation in the models, which allows us to treat the non-stationary series directly. With the chosen models, we derive estimates of the T-year return levels for different values of T.

A water control system is installed in the Bièvre region. It consists of several computer-operated water gates with which the flows can be controlled. It is therefore interesting to see what would happen without this system, and to this end we simulate the flow without it. The main technique here is regression; in order to improve the quality of prediction, we apply Ridge regression and PLS regression to make the model more robust.

Chapter 1

Theoretical Framework

1.1 Some definitions of hydrology

Frequency analysis provides a systematic approach for using historical data to relate the magnitude of a naturally occurring event to the probability of its occurring in a given time period, or to its recurrence interval. Frequency analysis is typically included in the study of hydrology. We present here some important notions of frequency analysis, which are applied later in the case study.

1.1.1 Return period

The return period ($T$) is the average length of time, in years, between events (e.g. flows or river levels) that equal or exceed a given magnitude. For example, if the rainfall with a 100-year return period at a given location is 80 mm, then a rainfall of 80 mm or more should occur at that location on average only once every 100 years.

1.1.2 Probability of occurrence

The probability of occurrence ($p$) is the probability that an event of the specified magnitude will be equalled or exceeded during a one-year period. For instance, if $N$ is the total number of values and $m$ is the rank of a value in a list ordered by descending magnitude ($x_1 > x_2 > x_3 > \dots > x_m$), then the exceedance probability of the $m$-th largest value, $x_m$, is

$$P(X \ge x_m) = \frac{m}{N}.$$
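As an illustration of the rank-based formula, the following Python sketch computes the empirical exceedance probability $m/N$ of each value together with its reciprocal $N/m$. The function name `exceedance_table` and the sample values are hypothetical and are not taken from the thesis data; the sketch assumes distinct values, as in the strict ordering above.

```python
def exceedance_table(values):
    """Rank values in descending order and attach p = m/N and its reciprocal N/m."""
    n = len(values)
    ranked = sorted(values, reverse=True)
    table = []
    for m, x in enumerate(ranked, start=1):
        p = m / n                          # empirical exceedance probability of the m-th largest value
        table.append((x, m, p, 1.0 / p))   # reciprocal 1/p = N/m
    return table


# Hypothetical annual maxima in mm (illustrative only).
annual_max = [41.0, 55.2, 38.7, 62.1, 47.9]
for x, m, p, recurrence in exceedance_table(annual_max):
    print(f"x={x:5.1f}  rank={m}  p={p:.2f}  N/m={recurrence:.2f}")
```

With one value per year, the last column is expressed in years.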

A fundamental relationship exists between the flood return period ($T$) and the probability of occurrence ($p$). These two variables are inversely related to each other by

$$\frac{1}{T} = p := P(X \ge x_m) = \frac{m}{N},$$

so we have

$$T = \frac{1}{P(X \ge x_m)} = \frac{N}{m}.$$

For example, the probability of a 100-year storm occurring in a one-year period is 1/100 or 0.01.

1.1.3 Return level

A return level with a return period of $T = 1/p$ years is a high threshold $x(p)$ (e.g., the annual peak flow of a river) whose probability of exceedance is $p$. For example, if $p = 0.01$, then the return period is $T = 100$ years, and $P(X \ge x) = 0.01$ means that the probability that the peak flow is at least $x$ is 0.01.

1.2 Extreme value theory

In this section, we deal with extreme value distributions that are especially designed for the analysis of maxima. We first present three types of extreme value distributions: the Gumbel (EV0), the Fréchet (EV1) and the Weibull (EV2) distributions [3].

1.2.1 Extreme value theory and models

Let $X_1, \dots, X_n$ be a sequence of independent and identically distributed random variables with distribution function $F$, and let $M_n = \max(X_1, \dots, X_n)$ denote the maximum. In theory, the exact distribution of the maximum can be derived as

$$\Pr(M_n \le z) = \Pr(X_1 \le z, \dots, X_n \le z) \tag{1.1}$$
$$= \Pr(X_1 \le z) \cdots \Pr(X_n \le z) = (F(z))^n. \tag{1.2}$$

The associated indicator function $I(X_n > z)$ defines a Bernoulli process with success probability $p(z) = 1 - F(z)$, which depends on the magnitude $z$. The number of extreme events within $n$ trials thus follows a binomial distribution, and the number of trials until an event occurs follows a geometric distribution with expected value and standard deviation of the same order $O(1/p(z))$.

In practice, we do not know the distribution function $F$, but the Fisher-Tippett-Gnedenko theorem provides the following asymptotic result.

Theorem 1.1 (see [4]). If there exist sequences of constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that $P\{(M_n - b_n)/a_n \le z\} \to G(z)$ as $n \to \infty$, where $G$ is a non-degenerate distribution, then $G$ belongs to one of the following families:

$$\text{Gumbel:} \quad G(z) = \exp\left\{ -\exp\left( -\frac{z-b}{a} \right) \right\} \quad \text{for } z \in \mathbb{R}, \tag{1.3}$$

$$\text{Fréchet:} \quad G(z) = \begin{cases} 0 & z \le b, \\ \exp\left\{ -\left( \dfrac{z-b}{a} \right)^{-\alpha} \right\} & z > b, \end{cases} \tag{1.4}$$

$$\text{Weibull:} \quad G(z) = \begin{cases} \exp\left\{ -\left( -\dfrac{z-b}{a} \right)^{\alpha} \right\} & z < b, \\ 1 & z \ge b, \end{cases} \tag{1.5}$$

where $\alpha > 0$.

1.2.2 Generalized Extreme Value distributions

In probability theory and statistics, the Generalized Extreme Value (GEV) distribution is a family of continuous probability distributions that includes the Gumbel (Equation (1.3)), Fréchet (Equation (1.4)) and Weibull (Equation (1.5)) distributions. By the extreme value theorem, the GEV distribution is the limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. For this reason, the GEV distribution is used as an approximation to model the maxima of long (finite) sequences of random variables.

The three distributions given by Equations (1.3) to (1.5) can be characterized by a single cumulative distribution function (see Equation (1.6)), which is why we call it the Generalized Extreme Value distribution:

$$F(x; \mu, \sigma, \xi) = \exp\left\{ -\left[ 1 + \xi\left( \frac{x-\mu}{\sigma} \right) \right]^{-1/\xi} \right\}, \tag{1.6}$$

for $1 + \xi(x-\mu)/\sigma > 0$, where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ the scale parameter and $\xi \in \mathbb{R}$ the shape parameter. For $\xi = 0$ the expression is formally undefined and is understood as a limiting case. The density function is, consequently,

$$f(x; \mu, \sigma, \xi) = \frac{1}{\sigma} \left[ 1 + \xi\left( \frac{x-\mu}{\sigma} \right) \right]^{(-1/\xi)-1} \exp\left\{ -\left[ 1 + \xi\left( \frac{x-\mu}{\sigma} \right) \right]^{-1/\xi} \right\}, \tag{1.7}$$

for $1 + \xi(x-\mu)/\sigma > 0$.

Link to Fréchet, Weibull and Gumbel families

The shape parameter $\xi$ governs the tail behaviour of the distributions. The sub-families defined by $\xi = 0$, $\xi > 0$ and $\xi < 0$ correspond, respectively, to the Gumbel, Fréchet and Weibull families, whose cumulative distribution functions are displayed below (see also Figure 1.1) [5].

$\xi = 0$: Gumbel

$$F(x; \mu, \sigma, 0) = e^{-e^{-(x-\mu)/\sigma}} \quad \text{for } x \in \mathbb{R}, \tag{1.8}$$

which is a positively skewed distribution with a light upper tail.

$\xi > 0$: Fréchet

$$F(x; \mu, \sigma, \xi) = \begin{cases} 0 & x \le \mu, \\ e^{-((x-\mu)/\sigma)^{-\alpha}} & x > \mu, \end{cases} \tag{1.9}$$

which is a distribution with a heavy upper tail and infinite higher-order moments.

$\xi < 0$: Weibull

$$F(x; \mu, \sigma, \xi) = \begin{cases} e^{-(-(x-\mu)/\sigma)^{\alpha}} & x < \mu, \\ 1 & x \ge \mu, \end{cases} \tag{1.10}$$

where $\sigma > 0$. The Weibull distribution has a bounded upper tail.
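The GEV distribution function of Equation (1.6) and its inverse can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the thesis: the function names `gev_cdf` and `gev_quantile` and the parameter values in the usage lines are assumptions, and the case $\xi = 0$ is handled by the Gumbel limit of Equation (1.8).

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function, Equation (1.6); xi = 0 uses the Gumbel limit (1.8)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p_exceed, mu, sigma, xi):
    """Level x with gev_cdf(x) = 1 - p_exceed, obtained by inverting Equation (1.6)."""
    y = -math.log(1.0 - p_exceed)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# Hypothetical parameters: the level exceeded with probability 0.01 per block.
level = gev_quantile(0.01, mu=0.5, sigma=0.15, xi=0.1)
print(level, gev_cdf(level, mu=0.5, sigma=0.15, xi=0.1))
```

When the GEV is fitted to block maxima, `gev_quantile(p, ...)` gives the level exceeded by a block maximum with probability `p`, which is how a return level is obtained from the fitted parameters.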

Figure 1.1: Different types of Generalized Extreme Value distributions.

We use the shape parameter to determine the tail behaviour of a distribution. This method is applied in the GEV approach of the case study.

$\chi^2$ test for the Gumbel distribution

The $\chi^2$ test considered here is a likelihood ratio test, used to compare the fit of two models, one of which (the null model) is a special case of the other (the alternative model). Within the extreme value distributions, the Gumbel distribution can be considered as the special case of the GEV distribution whose shape parameter is 0. So when we have a GEV distribution with a shape parameter close to 0, it is natural to test

$$H_0 : \xi = 0 \quad \text{against} \quad H_1 : \xi \ne 0$$

with unknown location and scale parameters. For a given vector $x = (x_1, \dots, x_n)$, the likelihood ratio (LR) statistic is

$$T_{LR}(x) = 2 \log\left( \frac{L_1}{L_0} \right), \tag{1.11}$$

where $L_1$ denotes the maximized likelihood of the more complicated model (Fréchet or Weibull), which has 3 parameters $(\mu, \sigma, \xi)$, and $L_0$ denotes the maximized likelihood of the Gumbel model, which has only 2 parameters $(\mu, \sigma)$. Under $H_0$, the distribution of the test statistic is approximately chi-squared with 1 degree of freedom. Consequently, the p-value is

$$p_{LR}(x) = 1 - \chi^2_1(T_{LR}(x)), \tag{1.12}$$

where $\chi^2_1$ denotes the chi-squared distribution function with 1 degree of freedom.
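Equations (1.11) and (1.12) can be evaluated with the Python standard library alone, because the chi-squared survival function with 1 degree of freedom equals $\mathrm{erfc}(\sqrt{t/2})$. The sketch below assumes that the two maximized log-likelihoods are already available; the function name `lr_test_gumbel` and the numerical values in the usage lines are hypothetical.

```python
import math


def lr_test_gumbel(loglik_gev, loglik_gumbel, alpha=0.05):
    """Likelihood ratio test of H0: xi = 0 from two maximized log-likelihoods."""
    t_lr = 2.0 * (loglik_gev - loglik_gumbel)     # Equation (1.11) on the log scale
    t_lr = max(t_lr, 0.0)                         # guard against tiny negative values from an optimizer
    p_value = math.erfc(math.sqrt(t_lr / 2.0))    # 1 - chi2_1 distribution function, Equation (1.12)
    return t_lr, p_value, p_value < alpha         # last entry is True when H0 is rejected


# Hypothetical log-likelihood values (illustrative only).
t_lr, p_value, reject = lr_test_gumbel(loglik_gev=52.31, loglik_gumbel=52.10)
print(t_lr, p_value, reject)
```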

The decision rule is:

If $p_{LR}(x) > \alpha$, do not reject $H_0$;
If $p_{LR}(x) < \alpha$, reject $H_0$;

where the significance level $\alpha$ is a predetermined number (0.05, for instance).

1.3 Generalized Pareto distribution

This section deals with exceedances, also called Peaks-Over-Threshold (POT) values. We distinguish 3 types of Generalized Pareto (GP) distributions: the Exponential distribution, the Pareto distribution and the Beta distribution.

1.3.1 Generalized Pareto models

The Generalized Pareto (GP) distribution is a well-known family of continuous probability distributions. It is often used to model the extreme values of a distribution. It is specified by three parameters: location $\mu$, scale $\sigma$ and shape $\xi$. Sometimes it is specified by the scale $\sigma$ and shape $\xi$ only. The cumulative distribution function is

$$F_{(\xi,\mu,\sigma)}(x) = \begin{cases} 1 - \left( 1 + \dfrac{\xi(x-\mu)}{\sigma} \right)^{-1/\xi} & \text{for } \xi \ne 0, \\ 1 - \exp\left( -\dfrac{x-\mu}{\sigma} \right) & \text{for } \xi = 0, \end{cases} \tag{1.13}$$

for $x \ge \mu$ when $\xi \ge 0$, and $\mu \le x \le \mu - \sigma/\xi$ when $\xi < 0$, where $\mu \in \mathbb{R}$, $\sigma > 0$ and $\xi \in \mathbb{R}$. The probability density function for $\xi \ne 0$ is

$$f_{(\xi,\mu,\sigma)}(x) = \frac{1}{\sigma} \left( 1 + \frac{\xi(x-\mu)}{\sigma} \right)^{-\frac{1}{\xi}-1}. \tag{1.14}$$

Link to Exponential, Pareto and Beta families

The shape parameter $\xi$ governs the tail behaviour of the distribution. The sub-families defined by $\xi = 0$, $\xi > 0$ and $\xi < 0$ correspond, respectively, to the Exponential, Pareto and Beta families (see Figure 1.2).
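Equations (1.13) and (1.14) translate directly into code. The following Python sketch is only an illustration with hypothetical function names (`gp_cdf`, `gp_pdf`) and parameter values; it makes the support restrictions explicit, including the finite upper end point $\mu - \sigma/\xi$ when $\xi < 0$.

```python
import math


def gp_cdf(x, mu, sigma, xi):
    """Generalized Pareto distribution function, Equation (1.13)."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0                      # at or below the location mu
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-z)       # exponential case
    t = 1.0 + xi * z
    if t <= 0.0:
        return 1.0                      # above the upper end point mu - sigma/xi (xi < 0)
    return 1.0 - t ** (-1.0 / xi)


def gp_pdf(x, mu, sigma, xi):
    """Generalized Pareto density, Equation (1.14), with the exponential limit for xi = 0."""
    z = (x - mu) / sigma
    if z < 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return math.exp(-z) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / xi - 1.0) / sigma


# Hypothetical parameters: probability that an excess over mu = 0 is larger than 3.
print(1.0 - gp_cdf(3.0, mu=0.0, sigma=2.0, xi=0.4))
```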

$\xi = 0$: Exponential family, which is a light-tailed distribution with the memoryless property.
$\xi > 0$: Pareto family, which is a heavy-tailed distribution.
$\xi < 0$: Beta family, which is a bounded distribution.

Figure 1.2: Different types of Generalized Pareto distributions.

The shape parameter is used to identify the type of the distribution and its tail behaviour. We apply this method in our case study; more details of the application can be found in the GP approach of Chapter 2.

1.4 Dependence in the series

Classical extreme value theory is based on the hypothesis that the variables are independent and identically distributed (i.i.d.). In our case, however, we are not sure whether the extreme events are correlated. In order to check this condition, we calculate the extremal index. In this section, we use two different methods to estimate the extremal index, as described in [4]: the first is the block method and the second is the runs method. We explain these two methods and provide the necessary theory.

1.4.1 Extremal index

The extremal index is an important parameter measuring the degree of clustering of extremes in a stationary process. The extremal index, $\theta \in [0, 1]$, can be interpreted as the reciprocal of the mean cluster size. Apart from being of interest in its own right, it is a crucial parameter for determining the correlation between extremal values. In reality, extremal events often tend to occur in clusters caused by local dependence in the data. The extremal index is an indicator that allows us to characterize the dependence structure of the data and their extremal behaviour.

Definition. Suppose we have $n$ observations from a stationary process $\{X_i, i \ge 0\}$ with distribution function $F$. For large $n$ and $u_n$, it is typically the case that

$$F_n(u_n) = P(\max(X_1, \dots, X_n) \le u_n) \approx F^{n\theta}(u_n), \tag{1.15}$$

where $\theta \in [0, 1]$ is a constant of the process known as the extremal index.

1.4.2 Estimation of the extremal index

One of the characterizations of the extremal index [6] is that $1/\theta$ is the mean cluster size in the point process of exceedance times over a high threshold. This suggests that a suitable way to estimate the extremal index is to identify clusters of high-level exceedances and to calculate the mean size of those clusters. Suppose that we have $n$ observations from a stationary series, and let $N_n$ denote the number of observations that exceed a predetermined high threshold $u_n$. We consider two methods of defining clusters: the block method and the runs method [7].

Block method

The block method divides the data into approximately $k_n$ blocks of length $r_n$, where $n \approx k_n r_n$. Each block is treated as one cluster. Thus, if $Z_n$ denotes the number of blocks in which there is at least one exceedance of the threshold $u_n$, we use $N_n / Z_n$ to estimate the mean size of a cluster and hence estimate $\theta$ by

$$\hat\theta_n = Z_n / N_n. \tag{1.16}$$

Runs method

Our second method is based on the idea that runs of observations below or above the threshold define clusters. More precisely, suppose we take any sequence of $r_n$ consecutive observations below the threshold as separating two clusters. An equivalent and in some ways more attractive characterization is the following. Define $W_{n,i}$ to be 1 if the $i$-th observation is above the threshold and 0 otherwise. Let

$$N_n = \sum_{i=1}^{n} W_{n,i}, \qquad Z_n^* = \sum_{i=1}^{n} W_{n,i} (1 - W_{n,i+1}) \cdots (1 - W_{n,i+r_n}), \tag{1.17}$$

then

$$\tilde\theta_n = Z_n^* / N_n. \tag{1.18}$$

We call it the runs estimator. The definition of $Z_n^*$ ensures that an exceedance in position $i$ is counted if and only if the following $r_n$ observations are all below the threshold $u_n$, in other words, if it is the rightmost member of a cluster according to the runs definition.

1.5 Point process approach

This section presents an alternative way of treating the series, based on point processes. It is interesting to determine whether there is any evidence of a long-term increase, so it is natural to look for a possible trend. The seasonal cycle is also an important factor that we have to take into consideration. Rather than modelling the cycle and the trend separately, we use a non-homogeneous point process model in which the parameters depend on the time $t$. This approach has the advantage of accounting for the uncertainty in all the parameters at once (see [2] and [3]). The parameters are denoted by $(\mu_t, \sigma_t, \xi_t)$. A typical model can be written in the form

$$\mu_t = \sum_{j=0}^{q_\mu} \mu_j x_{jt}, \qquad \log(\sigma_t) = \sum_{j=0}^{q_\sigma} \sigma_j x_{jt}, \qquad \xi_t = \xi, \tag{1.19}$$

where the $x_{jt}$, $j = 0, 1, 2, \dots$, are the covariates, and we usually assume $x_{0t} = 1$. In our analysis, we consider models in which only $\mu_t$ depends on the covariates. The covariates taken into consideration are the following: a linear function of time $t$ (trend component) and sinusoidal terms of the form $\sin(2\pi k t / T)$ and $\cos(2\pi k t / T)$ (seasonal component), where $k = 1, 2, \dots$ and $T = 365$ represents one year. In practice, we normally choose $k = 1$. So we can assume that the parameters have the following forms [2]:

$$\mu_t = \mu_0 + \mu_1 t + \mu_2 \sin(2\pi t / T) + \mu_3 \cos(2\pi t / T), \tag{1.20}$$

$$\sigma_t = \sigma, \qquad \xi_t = \xi. \tag{1.21}$$

This approach is used in Section 2.2.5.
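To make the time-varying location parameter concrete, the following Python sketch evaluates a location of the form "intercept + linear trend + one annual harmonic" with $t$ measured in days. The function names and the coefficient values are hypothetical; the second function uses the identity $\mu_2 \sin\theta + \mu_3 \cos\theta = A \sin(\theta + \varphi)$ with $A = \sqrt{\mu_2^2 + \mu_3^2}$ to report the size of the seasonal swing.

```python
import math


def location_parameter(t, mu0, mu1, mu2, mu3, period=365.0):
    """Location mu_t = mu0 + mu1*t + mu2*sin(2*pi*t/T) + mu3*cos(2*pi*t/T), t in days."""
    angle = 2.0 * math.pi * t / period
    return mu0 + mu1 * t + mu2 * math.sin(angle) + mu3 * math.cos(angle)


def seasonal_amplitude(mu2, mu3):
    """Amplitude A of the seasonal term, since mu2*sin + mu3*cos = A*sin(angle + phase)."""
    return math.hypot(mu2, mu3)


# Hypothetical coefficients (illustrative only): location on day 200 of the record.
print(location_parameter(200, mu0=0.3, mu1=1e-5, mu2=0.05, mu3=-0.02))
print(seasonal_amplitude(0.05, -0.02))
```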

1.6 Linear regression

1.6.1 Ridge regression

Ridge regression is the most commonly used method of regularization of ill-posed problems. It is also variously known as the constrained linear inversion method and the method of linear regularization (see [8]). When the problem

$$Ax = b$$

is not well posed (either because of non-existence or non-uniqueness of $x$), the standard approach, which seeks to minimize the residual $\|Ax - b\|^2$, where $\|\cdot\|$ is the Euclidean norm, may not work. This may be because the system is overdetermined or underdetermined ($A$ may be ill-conditioned or singular). In order to give preference to a particular solution with desirable properties, a regularization term is included in the minimization:

$$\|Ax - b\|^2 + \|\Gamma x\|^2,$$

for some suitably chosen matrix $\Gamma$. In many cases, this matrix is chosen as a multiple of the identity matrix, giving preference to solutions with smaller norms. In other cases, lowpass operators (e.g., a difference operator or a weighted Fourier operator) may be used to enforce smoothness if the underlying vector is believed to be mostly continuous. This regularization improves the conditioning of the problem, thus enabling a numerical solution. An explicit solution, denoted by $\hat{x}$, is given by

$$\hat{x} = (A^T A + \Gamma^T \Gamma)^{-1} A^T b.$$

The effect of regularization may be varied via the scale of the matrix $\Gamma$. When $\Gamma = 0$, the solution reduces to the ordinary (unregularized) least squares solution, provided that $(A^T A)^{-1}$ exists.

1.6.2 Partial Least Squares regression

Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space. Because both the X and Y data are projected to new spaces, the PLS family of methods are known as bilinear factor models (see [8]).

PLS regression is used to find the fundamental relations between two matrices (X and Y), i.e. it is a latent variable approach to modelling the covariance structures in these two spaces. A PLS model tries to find the multidimensional direction in the X space that explains the maximum multidimensional variance direction in the Y space. PLS regression is particularly suited to cases where the matrix of predictors has more variables than observations, and where there is multicollinearity among the X values. By contrast, standard regression fails in these cases.

1.6.3 Cross-validation

Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Cross-validation is important in guarding against testing hypotheses suggested by the data, especially where further samples are hazardous, costly or impossible to collect.
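The choice of a ridge penalty by cross-validation can be sketched as follows with NumPy. This is a minimal illustration under stated assumptions and not the implementation used in the thesis: the penalty matrix is taken to be $\sqrt{\lambda}$ times the identity, so that the explicit ridge solution becomes $(A^T A + \lambda I)^{-1} A^T b$, and the function names, the number of folds and the penalty grid are all hypothetical.

```python
import numpy as np


def ridge_fit(A, b, lam):
    """Ridge solution (A^T A + lam*I)^{-1} A^T b; lam = 0 gives ordinary least squares."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)


def cv_error(A, b, lam, k=5, seed=0):
    """Mean squared prediction error of ridge regression averaged over k validation folds."""
    idx = np.random.default_rng(seed).permutation(len(b))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)              # all observations not in the validation fold
        x_hat = ridge_fit(A[train], b[train], lam)
        errors.append(np.mean((A[fold] @ x_hat - b[fold]) ** 2))
    return float(np.mean(errors))


def select_lambda(A, b, grid, k=5):
    """Return the penalty in `grid` with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(A, b, lam, k=k))


# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
A = rng.normal(size=(60, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)
print(select_lambda(A, b, grid=[0.01, 0.1, 1.0, 10.0]))
```

The same fold structure could be reused to choose the number of components of a PLS regression, which plays the role of the hyper-parameter there.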

Chapter 2

Case Study

2.1 Procedure of study

The analysis of rainfalls and floods has been developed worldwide for many years. We mainly follow the steps considered in previous studies, and several methods are applied to our data. Our study contains the following steps:

1. Descriptive statistics for the data
We focus on the descriptive statistics of the data, which give us a lot of information, such as the maximum and minimum of the flows during the 10 years, the average rainfalls and flows, etc. We also present the histograms and the cumulative distribution functions, which give a first idea of the shape of the tails.

2. Dependence in the series
In this section we deal with the dependence in the series. As mentioned in Section 1.4.1, we use two estimators to estimate the extremal index $\theta$.

3. GEV approach
In this section, we search for the Generalized Extreme Value distribution that fits our data well. It allows us to identify the type of extreme value distribution to which the data belong; in other words, we learn the behaviour of the tails (heavy or light) of our series (see Section 1.2.2).

4. GP approach
We fit the exceedances (Peak-Over-Threshold values) with a Generalized Pareto distribution. It can be considered as another approach to identifying specific behaviours of extreme values (see Section 1.3.1).

5. Point process approach
As mentioned in Section 1.5, we use the point process to fit the data. In our study we introduce the trend and the cycle in the parameters using Equation (1.20) and Equation (1.21).

6. Linear regression
In order to protect the area from floods, a control system consisting of several water gates, operated in real time by automated software, was installed at several stations. In order to assess the utility and importance of this system, we simulate the flow without the water gates. To do this, we need to calibrate the influence of the water gates on the flow. We model the flow as a function of the rainfalls and the opening percentage of the water gates (see Section 1.6).

2.2 Numerical results and discussion

2.2.1 Descriptive statistics for the data

In our study, there are 3 datasets, containing respectively the monthly cumulative rainfalls, the flows of the river Bièvre and the opening percentage of the water gates, all recorded over the study period starting in 2003. The cumulative rainfalls were collected at 6 stations in the Bièvre region, the flows of the river at 4 stations, and the opening percentage of the water gates at 4 stations. All the data were measured every five minutes. The flows are measured in m³/s and the rainfalls in mm. We display in Figure 2.1 the coordinates of all the sites concerned, so that we can better understand their geographical distribution.

[Figure 2.1: Coordinates of the stations. Axes: longitude (horizontal), latitude (vertical). Legend: rainfall, flow and water gate stations. Labelled sites: Geneste, Arcades de Buc, Trou Salé, Bas Prés, Vauboyen, Loup Pendu, Damoiseaux, Sablons, Monseigneur, Vilgenis, Cambacérès.]

The rainfalls in the original dataset are monthly cumulative, so the recorded quantity is not continuous: it starts again from zero at the beginning of each month. The first step is to convert the cumulative data to daily rainfall by calculating the difference over each 24 hours. For the flows of the river Bièvre, we use the daily average. As an example, we present here some descriptive statistics of the daily average flow at station Arcades de Buc, in order to understand its individual behaviour.

Statistic   Value
min         0.01 m³/s
max         1.38 m³/s
mean        0.21 m³/s
median      0.17 m³/s
sd          0.15 m³/s

Table 2.1: Descriptive statistics for average flow of station Arcades de Buc.

Figure 2.2: Average flow from 2003 to 2013 at station Arcades de Buc.
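The conversion from monthly cumulative readings to daily rainfall can be sketched as follows. This Python example is an illustration under simplifying assumptions: it takes one end-of-day cumulative reading per day (the original data are recorded every five minutes), and the function name, the month keys and the sample readings are hypothetical.

```python
def daily_from_monthly_cumulative(readings):
    """Convert (month_key, cumulative_mm) pairs, one per day in time order, to daily rainfall.

    The difference restarts from zero whenever the month changes, because the
    cumulative counter starts again from zero at the beginning of each month.
    """
    daily = []
    prev_month, prev_value = None, 0.0
    for month, value in readings:
        if month != prev_month:
            prev_value = 0.0            # counter was reset at the start of this month
        daily.append(value - prev_value)
        prev_month, prev_value = month, value
    return daily


# Hypothetical end-of-day readings in mm (illustrative only).
readings = [("2003-01", 2.0), ("2003-01", 2.0), ("2003-01", 5.5),
            ("2003-02", 1.0), ("2003-02", 4.0)]
print(daily_from_monthly_cumulative(readings))
```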

In Figure 2.2, we can observe the seasonal factors: the peaks appear during the summer every year. The maximum of the average flow is 1.38 m³/s, which was reached on 18th July. We need to remove the trend and the seasonal component to obtain a stationary process. We assume that the different components affect the time series additively:

Data = Seasonal Component + Trend + Residuals

Figure 2.3: Decomposition of the original series.

The residual part does not present any trend or seasonal effect and can be considered as a stationary time series. We are now interested in the maxima over a certain period. Normally one uses the annual maxima; however, since we only have 10 years of data, the sample would be too small to draw any conclusion, so we use the monthly maxima, which gives us about 120 observations.
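One simple way to obtain such an additive decomposition is a centred moving average for the trend followed by phase-wise averages for the seasonal component. The thesis does not state which algorithm was used, so the NumPy sketch below is only an assumption-laden illustration; the function name is hypothetical and the period is assumed to be an odd number of at least 3 (for example 365 for daily data).

```python
import numpy as np


def additive_decompose(y, period):
    """Split y into trend + seasonal + residual (period: odd integer >= 3)."""
    y = np.asarray(y, dtype=float)
    half = period // 2
    # Trend: centred moving average over one full period.
    trend = np.convolve(y, np.ones(period) / period, mode="same")
    trend[:half] = np.nan                 # incomplete windows at both ends
    trend[-half:] = np.nan
    # Seasonal component: average of the detrended series at each position in the cycle.
    detrended = y - trend
    phase = np.arange(len(y)) % period
    means = np.array([np.nanmean(detrended[phase == k]) for k in range(period)])
    means -= means.mean()                 # centre the seasonal component around zero
    seasonal = means[phase]
    residual = y - trend - seasonal       # NaN where the trend is undefined
    return trend, seasonal, residual


# Synthetic series (illustrative only): linear trend plus a 7-step cycle.
t = np.arange(70)
y = 0.01 * t + np.sin(2 * np.pi * t / 7)
trend, seasonal, residual = additive_decompose(y, period=7)
print(np.nanmax(np.abs(residual)))
```

On this noise-free synthetic series the residual is numerically zero away from the edges, which is a quick check that the three components add up to the data.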

Figure 2.4: Monthly average flow of station Arcades de Buc.

In Table 2.2, we present the same descriptive statistics for the monthly maxima of the average flow at station Arcades de Buc. We are also interested in the histogram, the empirical density and the empirical cumulative distribution function. The results are displayed in Figure 2.5 and give some information about the shape of the tails.

Statistic   Value
min         0.14 m³/s
max         1.38 m³/s
mean        0.51 m³/s
median      0.44 m³/s
sd          0.23 m³/s

Table 2.2: Monthly maxima of the average flow of station Arcades de Buc.

Figure 2.5: Histogram with empirical density and cumulative distribution function of monthly maximum flow at station Arcades de Buc.

The histogram and the cumulative distribution function show that the distribution is almost symmetric and that the right tail is slightly heavy: this will be tested in Section 2.2.3 with numerical methods.

We can apply the same method to all 4 stations to obtain the stationary series. They give similar results. Table 2.3 shows the maximum and the minimum of the flow at the 4 stations during the past 10 years. We can see that for all 4 stations, the maximum appears in the year 2007 and

Station          Max flow   Date    Min flow   Date
Arcades de Buc              /07/               /12/10
Cambacérès                  /08/               /08/22
Monseigneur                 /05/               /05/24
Vauboyen                    /05/               /09/08

Table 2.3: Maximum flow and minimum flow of 10 years in 4 stations.

We carry out the same study on the daily rainfalls, using station Geneste as an example. We present the descriptive statistics of the daily rainfall at station Geneste in Table 2.4.

Figure 2.6: Daily rainfall from 2003 to 2013 at station Geneste.

Figure 2.7: Decomposition of daily rainfall at station Geneste.

Statistic   Value
min         0 mm
max         51.6 mm
mean        1.52 mm
median      0 mm
sd          3.58 mm

Table 2.4: Descriptive statistics for daily rainfall of station Geneste.

We can see that the peaks of the rainfall appear every year in August or September, and that in 2007 there were many more days with a large quantity of rainfall than in the other years. After 2008, the rainfall seems to decrease until 2010, where we can observe a significant peak.

We then analyze the monthly maxima of rainfalls during the 10 years. We display the monthly maxima of rainfalls at station Geneste in Figure 2.8.

Figure 2.8: Monthly maxima of rainfall at station Geneste.

In Table 2.5, we present the descriptive statistics of the monthly maxima of rainfalls at station Geneste. We display the histogram, the empirical density and the empirical cumulative distribution function in Figure 2.9.

Statistic   Value
min         0.8 mm
max         51.6 mm
mean        mm
median      12 mm
sd          8.22 mm

Table 2.5: Monthly maxima of the rainfall at station Geneste.

Figure 2.9: Histogram, empirical density and cumulative distribution function of monthly maximum rainfall at station Geneste.

It is clear that the distribution of the rainfalls at station Geneste is not symmetric and that it has a heavier tail than the flows. This will also be verified in Section 2.2.3 by estimating the shape parameters.

2.2.2 Dependence in the series

Classical extreme value theory is based on the hypothesis that the variables are independent and identically distributed (i.i.d.). However, in our case, we can see in Figures 2.2 and 2.6 that the peaks seem to appear together. In order to check whether the extreme events are correlated in time, we calculate the extremal index. In this section, we use two different estimators of the extremal index, the block method and the runs method, as described in Section 1.4. In Tables 2.6 to 2.15, we present the values of the extremal index $\theta$ estimated by the two estimators. The results are calculated with $r = 20$ (the definition of $r$ can be found in Equations (1.16) and (1.17)).
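The two estimators can be sketched in pure Python as follows. This is an illustration and not the code used for the tables: the function names and the sample series are hypothetical, the last block may be shorter than `r` when the series length is not a multiple of `r`, and positions beyond the end of the series are treated as non-exceedances in the runs estimator (an assumption about the boundary).

```python
def extremal_index_blocks(x, u, r):
    """Block estimator: number of blocks containing an exceedance / number of exceedances."""
    exceed = [value > u for value in x]
    n_exc = sum(exceed)
    if n_exc == 0:
        raise ValueError("no observation exceeds the threshold")
    blocks = [exceed[i:i + r] for i in range(0, len(exceed), r)]
    z = sum(1 for block in blocks if any(block))
    return z / n_exc


def extremal_index_runs(x, u, r):
    """Runs estimator: an exceedance is counted when the next r observations are all below u."""
    exceed = [value > u for value in x]
    n_exc = sum(exceed)
    if n_exc == 0:
        raise ValueError("no observation exceeds the threshold")
    z_star = 0
    for i, above in enumerate(exceed):
        # The slice is simply shorter near the end of the series.
        if above and not any(exceed[i + 1:i + 1 + r]):
            z_star += 1
    return z_star / n_exc


# Hypothetical series with three clusters of exceedances of u = 4 (illustrative only).
series = [0, 5, 6, 0, 0, 0, 7, 0, 0, 0, 0, 8, 9, 9, 0, 0]
print(extremal_index_blocks(series, u=4, r=2), extremal_index_runs(series, u=4, r=2))
```

In this toy series the runs estimator returns 0.5, that is a mean cluster size of 2, while the block estimator is larger because a cluster that straddles a block boundary is counted twice.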

28 Chapter 2. Case Study 22 Estimations of extremal index θ for the 4 stations of flow Station Arcades de Buc Quantile Threshold ˆθ by Block Method ˆθ by Runs Method Table 2.6: Extremal index of average flow of station Arcades de Buc. Station Cambacérès Quantile Threshold ˆθ by Block Method ˆθ by Runs Method Table 2.7: Extremal index of average flow of station Cambacérès. Station Monseigneur Quantile Threshold ˆθ by Block Method ˆθ by Runs Method Table 2.8: Extremal index of average flow of station Monseigneur. Station Vauboyen Quantile Threshold ˆθ by Block Method ˆθ by Runs Method Table 2.9: Extremal index of average flow of station Vauboyen.

Estimation of extremal index θ for the 6 stations of rainfall

Station Geneste
Quantile   Threshold   θ̂ by Block Method   θ̂ by Runs Method

Table 2.10: Extremal index of daily rainfall of station Geneste.

Station Loup Pendu
Quantile   Threshold   θ̂ by Block Method   θ̂ by Runs Method

Table 2.11: Extremal index of daily rainfall of station Loup Pendu.

Station Trou Salé
Quantile   Threshold   θ̂ by Block Method   θ̂ by Runs Method

Table 2.12: Extremal index of daily rainfall of station Trou Salé.

Station Sablons
Quantile   Threshold   θ̂ by Block Method   θ̂ by Runs Method

Table 2.13: Extremal index of daily rainfall of station Sablons.

Station Vilgenis
Quantile   Threshold   θ̂ by Block Method   θ̂ by Runs Method

Table 2.14: Extremal index of daily rainfall of station Vilgenis.

Station Vauboyen
Quantile   Threshold   θ̂ by Block Method   θ̂ by Runs Method

Table 2.15: Extremal index of daily rainfall of station Vauboyen.

These tables show that the extremal indexes θ̂ tend to 1 as the threshold increases: the higher the threshold, the fewer peaks over threshold remain, and the less visible the effect of dependence becomes. However, the extremal indexes estimated on our series are smaller than 1 at the 95% quantile. For the rainfall, the extreme values tend to appear separately, since the extremal indexes are close to 1, whereas the extreme values of the flows seem to be more correlated. Indeed, the mean cluster size (which can be read as 1/θ) is between 1 and 3. The correlation is not very strong, but it is still noticeable.
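The two estimators used in these tables can be illustrated with a minimal Python sketch. Equations (1.16) and (1.17) are not reproduced in this chapter, so the code follows common textbook forms of the block and runs estimators, which may differ in detail from the definitions used in the thesis. The function names and the toy series are hypothetical; only the role of r (block length, or number of consecutive non-exceedances closing a cluster) is taken from the text.

```python
def blocks_estimator(series, threshold, r):
    """Blocks estimator: (blocks of length r with an exceedance) / (exceedances)."""
    n_exceed = sum(1 for x in series if x > threshold)
    if n_exceed == 0:
        raise ValueError("no exceedance above the threshold")
    n_blocks_hit = 0
    for start in range(0, len(series), r):
        if any(x > threshold for x in series[start:start + r]):
            n_blocks_hit += 1
    return n_blocks_hit / n_exceed


def runs_estimator(series, threshold, r):
    """Runs estimator: (clusters) / (exceedances), where a cluster is closed
    once r consecutive values stay at or below the threshold."""
    n_exceed = 0
    n_clusters = 0
    gap = r  # lets the first exceedance open a cluster
    for x in series:
        if x > threshold:
            n_exceed += 1
            if gap >= r:
                n_clusters += 1
            gap = 0
        else:
            gap += 1
    if n_exceed == 0:
        raise ValueError("no exceedance above the threshold")
    return n_clusters / n_exceed


# Toy series (hypothetical values, not thesis data)
toy = [0, 5, 6, 0, 0, 0, 7, 0, 0, 0, 0, 8, 9, 0]
theta_blocks = blocks_estimator(toy, threshold=4, r=3)
theta_runs = runs_estimator(toy, threshold=4, r=3)
print(theta_blocks, theta_runs, 1.0 / theta_runs)
```

On the toy series, 5 exceedances fall into 3 runs-clusters and touch 4 blocks, so the runs estimate is 3/5 and the block estimate is 4/5. The reciprocal of either estimate gives the corresponding mean cluster size.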

GEV approach

As we discussed in Section 2.2.1, the rainfalls and flows present different tail shapes. To quantify this, we fit the Generalized Extreme Value (GEV) distribution to the maxima. Because of the size of our data set, we work with monthly maxima instead of annual maxima, which gives 108 maxima for the period starting in 2003. The whole study is based on the stationary series.

We first deal with the average flow data. We give the details of the modelling for station Arcades de Buc, and then the parameter estimates for the other stations.

We fit the GEV distribution to the monthly maxima of station Arcades de Buc. We get the shape parameter ξ = −0.007, the scale parameter σ = 0.15 and the location parameter µ = [...]. Since the shape parameter ξ is negative and very close to zero, it is natural to think that the monthly maxima may be fitted by a Gumbel distribution, which is often used in hydrology to analyze such variables. The following figure displays several diagnostic plots of the fitted GEV distribution.

Figure 2.10: Quality of adjustment using different plots.

According to Figure 2.10, the corresponding Gumbel distribution fits the upper part of our data well, so we can use this parametric approach to estimate the T-year return level. The 10-year return level is 0.92 m³/s and the 100-year return level is 1.26 m³/s.
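As a rough consistency check of these return levels, the sketch below evaluates the standard GEV quantile formula for block maxima. It assumes that a T-year level for monthly maxima is the level exceeded on average once in 12·T blocks; the thesis does not spell out its formula in this chapter, so this convention and the function name are assumptions. Only σ = 0.15 comes from the text. In the Gumbel case the location µ cancels in the difference between two return levels, so a placeholder value is used for it.

```python
import math


def gev_return_level(mu, sigma, xi, n_blocks):
    """Level exceeded on average once every n_blocks block maxima."""
    y = -math.log(1.0 - 1.0 / n_blocks)
    if abs(xi) < 1e-8:                      # Gumbel limit (xi -> 0)
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))


MONTHS_PER_YEAR = 12    # monthly block maxima, as in the text
sigma = 0.15            # scale estimate reported for Arcades de Buc
mu_placeholder = 0.0    # placeholder only: mu cancels in the difference below

level_10 = gev_return_level(mu_placeholder, sigma, 0.0, 10 * MONTHS_PER_YEAR)
level_100 = gev_return_level(mu_placeholder, sigma, 0.0, 100 * MONTHS_PER_YEAR)
gap = level_100 - level_10
print(round(gap, 3))
```

Under these assumptions the gap between the 100-year and 10-year Gumbel levels is about 0.346 m³/s, which agrees with the reported difference 1.26 − 0.92 = 0.34 m³/s up to rounding.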

The same method can be applied to the remaining stations. We present the parameter estimates in Table 2.16.

Station          shape ξ   location µ   scale σ
Arcades de Buc
Cambacérès
Monseigneur
Vauboyen

Table 2.16: Estimation of the parameters in GEV distribution.

We now focus on the daily rainfall measured at the 6 stations. The approach is the same as for the average flow. We present the results in Table 2.17.

Station       shape ξ   location µ   scale σ
Geneste
Loup Pendu
Trou Salé
Sablons
Vilgenis
Vauboyen

Table 2.17: Estimation of the parameters in GEV distribution.

The χ² (likelihood ratio) test mentioned in Section 1.6 is applied here to choose the model. The numerical results can be found in Table 2.18 and Table 2.19:

Station          p-value   Final model
Arcades de Buc   [...]     Gumbel
Cambacérès       [...]     Gumbel
Monseigneur      [...]     Gumbel
Vauboyen         0.57      Gumbel

Table 2.18: Likelihood Ratio Test for Gumbel distribution of average flow.

Station       p-value   Final model
Geneste       [...]     Fréchet
Loup Pendu    0.03      Fréchet
Trou Salé     0.23      Gumbel
Sablons       0.03      Fréchet
Vilgenis      [...]     Fréchet
Vauboyen      0.01      Fréchet

Table 2.19: Likelihood Ratio Test for Gumbel distribution of rainfall.

From Table 2.16 to Table 2.19, we can see that the flows measured at the 4 stations can be fitted by the Gumbel distribution. Among them, station Cambacérès presents more risk than the other stations. Conversely, the rainfalls are fitted by a Fréchet distribution, except for station Trou Salé. These results confirm the discussion of the tail shapes based on descriptive statistics. It is natural that most of the flows do not have a heavy tail, because they are regulated by the control system, which reduces the risk significantly. Most of the rainfall series, however, present a heavy tail. This phenomenon suggests that, without the control system, the flow might present a heavy tail as the rainfall does.

GP approach

This section deals with the exceedances, also called Peaks-Over-Threshold (POT) values (see Section 1.3.1). We fit a Generalized Pareto (GP) distribution to our data. We again use the χ² test to choose the model. The results are displayed in the following tables.

Average flow

Station          shape ξ   scale σ   AIC   p-value   Final model
Arcades de Buc                                        exponential
Cambacérès                                            exponential
Monseigneur                                           exponential
Vauboyen                                              exponential

Table 2.20: Estimation of the parameters in GPD and accepted exponential distribution model.
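The choice between the exponential and the Generalized Pareto model for the threshold excesses can be sketched as a likelihood ratio test of ξ = 0. The code below is an illustration under stated assumptions, not the procedure of the thesis: the GP likelihood is maximised by a crude grid search over the ratio ξ/σ (for a fixed ratio the optimal ξ has a closed form), the grid bounds are ad hoc, and the data are synthetic. All function names are hypothetical.

```python
import math
import random
import statistics


def exponential_loglik(excesses):
    """Maximised log-likelihood of the exponential model (xi = 0)."""
    n = len(excesses)
    sigma_hat = sum(excesses) / n           # MLE of the scale
    return -n * math.log(sigma_hat) - n


def gpd_profile_loglik(excesses, theta):
    """GP log-likelihood maximised over xi for a fixed ratio theta = xi / sigma."""
    n = len(excesses)
    s = sum(math.log1p(theta * y) for y in excesses)
    xi = s / n                              # closed-form maximiser for this theta
    sigma = xi / theta
    return -n * math.log(sigma) - s - n, sigma, xi


def lrt_exponential_vs_gpd(excesses, grid_size=400):
    """Return (deviance, p_value, sigma_hat, xi_hat) for the test of xi = 0."""
    ll0 = exponential_loglik(excesses)
    lo = -0.9 / max(excesses)               # keeps 1 + theta * y > 0
    hi = 5.0 / statistics.median(excesses)  # ad hoc upper bound of the grid
    best = (ll0, sum(excesses) / len(excesses), 0.0)
    for k in range(grid_size + 1):
        theta = lo + (hi - lo) * k / grid_size
        if abs(theta) < 1e-9:
            continue                        # theta = 0 is the exponential case
        candidate = gpd_profile_loglik(excesses, theta)
        if candidate[0] > best[0]:
            best = candidate
    deviance = 2.0 * (best[0] - ll0)
    p_value = math.erfc(math.sqrt(deviance / 2.0))  # chi-square tail, 1 d.o.f.
    return deviance, p_value, best[1], best[2]


# Synthetic excesses (not thesis data): GP with xi = 0.5, and exponential
random.seed(1)
heavy = [((1.0 - random.random()) ** -0.5 - 1.0) / 0.5 for _ in range(2000)]
light = [random.expovariate(1.0) for _ in range(2000)]
print(lrt_exponential_vs_gpd(heavy))
print(lrt_exponential_vs_gpd(light))
```

A small p-value leads to keeping the GP (Pareto-type) model, as for most rainfall stations in the tables, while a large one leads to accepting the exponential model, as for the flows.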

Daily rainfall

Station       shape ξ   scale σ   AIC   p-value   Final model
Geneste                                            Pareto
Loup Pendu                                         Pareto
Trou Salé                                          exponential
Sablons                                            Pareto
Vilgenis                                           Pareto
Vauboyen                                           Pareto

Table 2.21: Estimation of the parameters for the GP distribution.

From Table 2.20 and Table 2.21, we find that the flows measured at all four stations can be fitted by exponential models, which do not present a heavy tail. Conversely, for the rainfalls, only station Trou Salé should be fitted by an exponential model; the other stations are more likely to be heavy-tailed.

Point process approach

This section illustrates an alternative way to include and model the trend and the cycle in extreme value data, using the point process approach (see [2]). As an example, this approach is applied to the daily rainfall measured at station Loup Pendu. In our case, it is interesting to determine whether there is any evidence of a long-term increase, so it is natural to look for a possible trend. The cycle is also an important factor that we have to take into consideration. We use the model described in Section 1.5. The location parameter µ is given by Equation (1.20), and the scale parameter σ and shape parameter ξ are given by Equation (1.21). For ease of reading, we rewrite the two equations here:

$$\mu_t = \mu_0 + \mu_1 t + \mu_2 \sin(2\pi t)/t + \mu_3 \cos(2\pi t)/t, \qquad \sigma_t = \sigma, \qquad \xi_t = \xi.$$

The threshold used here is 8.6 mm, which is the 95% quantile of our data. The 4 nested candidate models used in our case are:

(i) µ₁ = µ₂ = µ₃ = 0: the data present neither the trend nor the cycle
(ii) µ₁ ≠ 0, µ₂ = µ₃ = 0: the data present only the trend
(iii) µ₁ = 0, µ₂ ≠ 0, µ₃ ≠ 0: the data present only the cycle
(iv) µ₁ ≠ 0, µ₂ ≠ 0, µ₃ ≠ 0: the data present both the trend and the cycle

Models (i) - (iii) are three sub-models and model (iv) is the full model. If the sub-model is correct, twice the difference between the negative log-likelihoods (−LogL in Table 2.22) of the sub-model and of the full model has approximately a χ² distribution with n degrees of freedom, where n denotes the difference between the numbers of parameters in the full model and in the sub-model.

Model                            µ̂₀   µ̂₁   µ̂₂   µ̂₃   σ̂   ξ̂   −LogL
(i)   µ₁ = µ₂ = µ₃ = 0
(ii)  µ₁ ≠ 0, µ₂ = µ₃ = 0
(iii) µ₁ = 0, µ₂ ≠ 0, µ₃ ≠ 0
(iv)  µ₁ ≠ 0, µ₂ ≠ 0, µ₃ ≠ 0

Table 2.22: Estimation of the parameters of 4 models.

We use the negative log-likelihood to choose the best fitting model.

Full model               Sub-model                Null hypothesis   p-value   Selected model
µ₁ ≠ 0, µ₂ ≠ 0, µ₃ ≠ 0   µ₁ ≠ 0, µ₂ = µ₃ = 0      µ₂ = µ₃ = 0       [...]     µ₂ ≠ 0, µ₃ ≠ 0
µ₁ ≠ 0, µ₂ ≠ 0, µ₃ ≠ 0   µ₁ = 0, µ₂ ≠ 0, µ₃ ≠ 0   µ₁ = 0            [...]     µ₁ = 0
µ₁ = 0, µ₂ ≠ 0, µ₃ ≠ 0   µ₁ = µ₂ = µ₃ = 0         µ₂ = µ₃ = 0       [...]     µ₂ ≠ 0, µ₃ ≠ 0

Table 2.23: Model Selection.

With the threshold set to 8.6 mm, the best fitting model is the one which presents only the cycle in the parameter µ (see Equation (1.20)). We can also choose the best fitting model by using the AIC (Akaike Information Criterion). The results are presented in the following table:

Model                      AIC
µ₁ = µ₂ = µ₃ = 0
µ₁ ≠ 0, µ₂ = µ₃ = 0
µ₁ = 0, µ₂ ≠ 0, µ₃ ≠ 0
µ₁ ≠ 0, µ₂ ≠ 0, µ₃ ≠ 0

Table 2.24: Akaike Information Criterion of 4 models.
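The two selection rules used above, the nested likelihood ratio comparisons of Table 2.23 and the AIC of Table 2.24, can be sketched from the negative log-likelihoods alone. The numerical values of Table 2.22 are not available here, so the negative log-likelihoods below are purely hypothetical placeholders chosen for illustration; the parameter counts assume that each model has σ, ξ and µ₀ plus its non-zero trend and cycle coefficients. Function names are also hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (closed forms for df = 1, 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def lrt_pvalue(nll_sub, nll_full, df):
    """p-value of the test 'sub-model is correct' against the full model."""
    deviance = 2.0 * (nll_sub - nll_full)
    return chi2_sf(max(deviance, 0.0), df)


def aic(nll, n_params):
    """Akaike Information Criterion from a negative log-likelihood."""
    return 2.0 * n_params + 2.0 * nll


# model label -> (number of parameters, negative log-likelihood)
# HYPOTHETICAL numbers, not the values of Table 2.22.
models = {
    "(i)": (3, 1210.0),
    "(ii)": (4, 1209.6),
    "(iii)": (5, 1203.1),
    "(iv)": (6, 1202.9),
}

# The three nested comparisons of Table 2.23: (full model, sub-model)
for full, sub in [("(iv)", "(ii)"), ("(iv)", "(iii)"), ("(iii)", "(i)")]:
    df = models[full][0] - models[sub][0]
    p = lrt_pvalue(models[sub][1], models[full][1], df)
    print(full, "vs", sub, "df =", df, "p =", round(p, 4))

best = min(models, key=lambda name: aic(models[name][1], models[name][0]))
print("smallest AIC:", best)
```

With these placeholder values, the cycle terms are retained and the trend term is dropped, and the smallest AIC is obtained for model (iii), which mirrors the qualitative conclusion of the text without reproducing its numbers.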

The AIC indicates that the best fitting model is the one with a cycle in the location parameter, so we retain model (iii). With this alternative approach based on a point process model, we find that the shape parameter is between 0 and 0.2, which is close to the result obtained with the first approach, based on removing the trend and seasonal components beforehand (see Tables 2.17 and 2.21). We display the probability and quantile plots in Figure 2.11.

Figure 2.11: Probability and quantile plot of chosen model.

The same method can be applied to all the other stations and gives similar results.

Generalized linear regression

To protect the area from floods, a control system consisting of several water gates was installed at several stations by SIAVB. The system works as follows. When rainfall occurs, its characteristics are measured by several devices in real time, and these measurements are correlated with radar maps describing the weather, which gives an estimate of the amount of rain to be expected on the ground. Software relates the current rainfall to past ones from a database, yielding a flood gate strategy that is executed in real time on the ground by tele-transmission to the water gates.

It is natural to ask what would happen (e.g. floods) if such a system were not in place. To answer this question, we simulate the flow without the water gates, which can be done by setting the water gates to totally open in a flow model. We therefore need to know the relation between the flow and the opening percentage of the water gates. We model the flow as a function of the rainfalls and the opening percentages of the water gates.

The variable that we try to explain is the flow at a given station, for example Arcades de Buc, and the explanatory variables are the rainfalls and the opening percentages of the nearby water gates. In the simple linear regression, we have 10 original explanatory variables (6 rainfalls and 4 opening percentages):

flow ~ 6 rainfall + 4 opening percentage

Since we would also like to consider non-linear effects, we add the terms of degree 2 of the combined explanatory variables:

flow ~ 6 rainfall + 4 opening percentage + all the terms of degree 2.

To make sure our method is robust, we use Ridge regression and PLS regression. For the cross-validation, we separate the data into two parts: a training dataset (75% of the data) and a validation dataset (25% of the data). We repeat the modelling procedure 10 times and calculate the mean squared error (MSE) of the models for each method. We also measure the robustness of these models by the relative error between the MSE on the training data and the MSE on the validation data (see Table 2.25). Robustness comes at a price: the models may lose some fit. It is therefore important to find a good balance between fit and robustness, and the Ridge coefficient, which is our hyper-parameter, can be used to do so. The results can be found in the table below:

Model                                  MSE of the training data   MSE of the validation data   Relative error
1. Simple regression
2. Regression with 2nd degree terms
3. Simple Ridge
4. Ridge with 2nd degree terms
5. Simple PLS
6. PLS with 2nd degree terms

Table 2.25: Mean Squared Error for different models.

Table 2.25 shows that the fit of the model improves when we increase the number of explanatory variables: the models with 2nd degree terms have smaller errors (especially model 2 and model 4). In terms of robustness, however, model 2 is very unstable, which means that the regression depends strongly on how the data are split. The ideal model here is therefore model 4, the Ridge regression with 2nd degree terms.
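The validation scheme described above (degree-2 terms, Ridge penalty, ten random 75%/25% splits, relative error between training and validation MSE) can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the code of the thesis: the relative error is assumed to be |MSE_validation − MSE_training| / MSE_training, the columns are standardised before fitting, the candidate penalties are arbitrary, and all function names are hypothetical.

```python
import numpy as np


def add_degree2_terms(X):
    """Append all squares and pairwise products of the columns of X."""
    p = X.shape[1]
    products = [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]
    return np.column_stack([X] + products)


def fit_ridge(X, y, lam):
    """Ridge fit with an unpenalised intercept; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def mse(X, y, intercept, beta):
    """Mean squared error of the linear predictor on (X, y)."""
    return float(np.mean((y - (intercept + X @ beta)) ** 2))


def repeated_holdout(X, y, lam, n_repeats=10, train_frac=0.75, seed=0):
    """Average training MSE, validation MSE and their relative error."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * len(y))
    train_mse, valid_mse = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        tr, va = idx[:n_train], idx[n_train:]
        b0, beta = fit_ridge(X[tr], y[tr], lam)
        train_mse.append(mse(X[tr], y[tr], b0, beta))
        valid_mse.append(mse(X[va], y[va], b0, beta))
    m_tr, m_va = float(np.mean(train_mse)), float(np.mean(valid_mse))
    return m_tr, m_va, abs(m_va - m_tr) / m_tr   # assumed definition


# Synthetic stand-ins for 6 rainfall series and 4 gate openings (not thesis data)
rng = np.random.default_rng(42)
rain = rng.gamma(shape=0.5, scale=3.0, size=(400, 6))
gates = rng.uniform(0.0, 100.0, size=(400, 4))
X = add_degree2_terms(np.hstack([rain, gates]))      # 10 + 55 = 65 columns
X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardise the columns
flow = (0.2 + 0.05 * rain[:, 0] + 0.0005 * rain[:, 0] * gates[:, 0]
        + rng.normal(0.0, 0.05, size=400))

for lam in (0.01, 1.0, 100.0):                        # arbitrary candidate penalties
    print(lam, repeated_holdout(X, flow, lam))
```

Comparing the three returned values across penalties shows the trade-off mentioned in the text: a larger penalty usually raises the training MSE slightly while bringing the validation MSE closer to it.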

With this final model, we simulate the flow when the water gates are totally open, which removes the influence of the control system. We get the following results:

Figure 2.12: Prediction of the flow at station Arcades de Buc without control system.

min    max    mean   median   sd
3.11   4.65   3.82   3.82     0.06

Table 2.26: Descriptive statistics for predicted flow at station Arcades de Buc.

We also display the monthly maxima with their histogram and cumulative distribution function:

Figure 2.13: Monthly maxima of predicted flow with chosen model.

Figure 2.14: Histogram with empirical density and cumulative distribution function of predicted monthly maximum flow at station Arcades de Buc.

The descriptive statistics show a large increase in flow without the control system. We can also observe from Figure 2.13 that the distribution seems to present a heavy tail. We then estimate the parameters of the GEV and GP distributions for the predicted flow of station Arcades de Buc. The GEV estimates are presented in Table 2.27.

                  location µ   scale σ   shape ξ
Estimates
Standard errors

Table 2.27: Estimations of the parameters in GEV distribution.

With the same test procedure, the p-value is 2.574e-08, which means that the hypothesis that the shape parameter is zero should be rejected. Since the shape parameter is greater than 0, we conclude that the distribution is a Fréchet distribution, which presents a heavy tail. The GP estimates are shown in Table 2.28.

                  scale σ   shape ξ
Estimates
Standard errors

Table 2.28: Estimations of the parameters in GP distribution.

This confirms again that the distribution presents a heavy tail.

This is an interesting result. In Section 2.2.3, we concluded that the flows can be fitted by Gumbel distributions, i.e. they do not present heavy tails. However, when we simulate the flow without the control system, its distribution becomes a Fréchet distribution, which presents a heavy tail. This is evidence of the efficiency of the control system: the system not only reduces the flow in ordinary situations, but also reduces the risk in extreme hydrological events.

We estimate the return level of the flow under the condition that no control system is involved, and compare it with the return level based on the real data (with control system).

Return Levels (mm)       10 years   30 years   50 years   80 years   100 years
With control system
Without control system
Relative improvement     31.27%     57.38%     66.13%     72.69%     73.37%

Table 2.29: Return Levels for 10, 30, 50, 80, 100 years, at the station Arcades de Buc.

From the results above, we find that the system is globally efficient: it reduces the level of extreme flows significantly. This is a very satisfactory result for this control system.


More information

Analysis of climate-crop yield relationships in Canada with Distance Correlation

Analysis of climate-crop yield relationships in Canada with Distance Correlation Analysis of climate-crop yield relationships in Canada with Distance Correlation by Yifan Dai (Under the Direction of Lynne Seymour) Abstract Distance correlation is a new measure of relationships between

More information

APPLICATION OF EXTREMAL THEORY TO THE PRECIPITATION SERIES IN NORTHERN MORAVIA

APPLICATION OF EXTREMAL THEORY TO THE PRECIPITATION SERIES IN NORTHERN MORAVIA APPLICATION OF EXTREMAL THEORY TO THE PRECIPITATION SERIES IN NORTHERN MORAVIA DANIELA JARUŠKOVÁ Department of Mathematics, Czech Technical University, Prague; jarus@mat.fsv.cvut.cz 1. Introduction The

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

MFM Practitioner Module: Quantitative Risk Management. John Dodson. September 23, 2015

MFM Practitioner Module: Quantitative Risk Management. John Dodson. September 23, 2015 MFM Practitioner Module: Quantitative Risk Management September 23, 2015 Mixtures Mixtures Mixtures Definitions For our purposes, A random variable is a quantity whose value is not known to us right now

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The

More information

Journal of Environmental Statistics

Journal of Environmental Statistics jes Journal of Environmental Statistics February 2010, Volume 1, Issue 3. http://www.jenvstat.org Exponentiated Gumbel Distribution for Estimation of Return Levels of Significant Wave Height Klara Persson

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Shape of the return probability density function and extreme value statistics

Shape of the return probability density function and extreme value statistics Shape of the return probability density function and extreme value statistics 13/09/03 Int. Workshop on Risk and Regulation, Budapest Overview I aim to elucidate a relation between one field of research

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Volatility. Gerald P. Dwyer. February Clemson University

Volatility. Gerald P. Dwyer. February Clemson University Volatility Gerald P. Dwyer Clemson University February 2016 Outline 1 Volatility Characteristics of Time Series Heteroskedasticity Simpler Estimation Strategies Exponentially Weighted Moving Average Use

More information

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models Confirmatory Factor Analysis: Model comparison, respecification, and more Psychology 588: Covariance structure and factor models Model comparison 2 Essentially all goodness of fit indices are descriptive,

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis

INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS Tao Jiang A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the

More information

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -27 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc.

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -27 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc. INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY Lecture -27 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc. Summary of the previous lecture Frequency factors Normal distribution

More information

Modeling and Simulating Rainfall

Modeling and Simulating Rainfall Modeling and Simulating Rainfall Kenneth Shirley, Daniel Osgood, Andrew Robertson, Paul Block, Upmanu Lall, James Hansen, Sergey Kirshner, Vincent Moron, Michael Norton, Amor Ines, Calum Turvey, Tufa Dinku

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

Problem 1 (20) Log-normal. f(x) Cauchy

Problem 1 (20) Log-normal. f(x) Cauchy ORF 245. Rigollet Date: 11/21/2008 Problem 1 (20) f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.0 0.2 0.4 0.6 0.8 4 2 0 2 4 Normal (with mean -1) 4 2 0 2 4 Negative-exponential x x f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.5

More information

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Jurusan Teknik Industri Universitas Brawijaya Outline Introduction The Analysis of Variance Models for the Data Post-ANOVA Comparison of Means Sample

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets

Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets Athanasios Kottas Department of Applied Mathematics and Statistics,

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

3/10/03 Gregory Carey Cholesky Problems - 1. Cholesky Problems

3/10/03 Gregory Carey Cholesky Problems - 1. Cholesky Problems 3/10/03 Gregory Carey Cholesky Problems - 1 Cholesky Problems Gregory Carey Department of Psychology and Institute for Behavioral Genetics University of Colorado Boulder CO 80309-0345 Email: gregory.carey@colorado.edu

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

ON THE TWO STEP THRESHOLD SELECTION FOR OVER-THRESHOLD MODELLING

ON THE TWO STEP THRESHOLD SELECTION FOR OVER-THRESHOLD MODELLING ON THE TWO STEP THRESHOLD SELECTION FOR OVER-THRESHOLD MODELLING Pietro Bernardara (1,2), Franck Mazas (3), Jérôme Weiss (1,2), Marc Andreewsky (1), Xavier Kergadallan (4), Michel Benoît (1,2), Luc Hamm

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables Probability Distributions for Continuous Variables Probability Distributions for Continuous Variables Let X = lake depth at a randomly chosen point on lake surface If we draw the histogram so that the

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

A Conditional Approach to Modeling Multivariate Extremes

A Conditional Approach to Modeling Multivariate Extremes A Approach to ing Multivariate Extremes By Heffernan & Tawn Department of Statistics Purdue University s April 30, 2014 Outline s s Multivariate Extremes s A central aim of multivariate extremes is trying

More information

FORECAST VERIFICATION OF EXTREMES: USE OF EXTREME VALUE THEORY

FORECAST VERIFICATION OF EXTREMES: USE OF EXTREME VALUE THEORY 1 FORECAST VERIFICATION OF EXTREMES: USE OF EXTREME VALUE THEORY Rick Katz Institute for Study of Society and Environment National Center for Atmospheric Research Boulder, CO USA Email: rwk@ucar.edu Web

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Change Point Analysis of Extreme Values

Change Point Analysis of Extreme Values Change Point Analysis of Extreme Values TIES 2008 p. 1/? Change Point Analysis of Extreme Values Goedele Dierckx Economische Hogeschool Sint Aloysius, Brussels, Belgium e-mail: goedele.dierckx@hubrussel.be

More information

Introduction to Algorithmic Trading Strategies Lecture 10

Introduction to Algorithmic Trading Strategies Lecture 10 Introduction to Algorithmic Trading Strategies Lecture 10 Risk Management Haksun Li haksun.li@numericalmethod.com www.numericalmethod.com Outline Value at Risk (VaR) Extreme Value Theory (EVT) References

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

STAT 514 Solutions to Assignment #6

STAT 514 Solutions to Assignment #6 STAT 514 Solutions to Assignment #6 Question 1: Suppose that X 1,..., X n are a simple random sample from a Weibull distribution with density function f θ x) = θcx c 1 exp{ θx c }I{x > 0} for some fixed

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Statistical Inference

Statistical Inference Statistical Inference Classical and Bayesian Methods Revision Class for Midterm Exam AMS-UCSC Th Feb 9, 2012 Winter 2012. Session 1 (Revision Class) AMS-132/206 Th Feb 9, 2012 1 / 23 Topics Topics We will

More information

MULTIDIMENSIONAL COVARIATE EFFECTS IN SPATIAL AND JOINT EXTREMES

MULTIDIMENSIONAL COVARIATE EFFECTS IN SPATIAL AND JOINT EXTREMES MULTIDIMENSIONAL COVARIATE EFFECTS IN SPATIAL AND JOINT EXTREMES Philip Jonathan, Kevin Ewans, David Randell, Yanyun Wu philip.jonathan@shell.com www.lancs.ac.uk/ jonathan Wave Hindcasting & Forecasting

More information

The Fundamentals of Heavy Tails Properties, Emergence, & Identification. Jayakrishnan Nair, Adam Wierman, Bert Zwart

The Fundamentals of Heavy Tails Properties, Emergence, & Identification. Jayakrishnan Nair, Adam Wierman, Bert Zwart The Fundamentals of Heavy Tails Properties, Emergence, & Identification Jayakrishnan Nair, Adam Wierman, Bert Zwart Why am I doing a tutorial on heavy tails? Because we re writing a book on the topic Why

More information

Quantifying Weather Risk Analysis

Quantifying Weather Risk Analysis Quantifying Weather Risk Analysis Now that an index has been selected and calibrated, it can be used to conduct a more thorough risk analysis. The objective of such a risk analysis is to gain a better

More information