Extreme Value Theory.


Bank of England Centre for Central Banking Studies, CEMLA 2013. Extreme Value Theory. David G. Barr, November 21, 2013. Any views expressed are those of the author and not necessarily those of the Bank of England.

Contents

1  Outline
2  How probable are improbable events?
3  Can we use the Normal distribution to calculate stock return probabilities?
   3.1  The data exhibit excess kurtosis, i.e. fat tails
   3.2  So what do we do about it?
4  Estimating probability distributions
   4.1  Estimating a Normal distribution
5  Parametric vs. empirical distributions
6  Extreme value theory
   6.1  A quick summary
   6.2  Back to work
   6.3  Sensible curves
7  What makes these curves sensible? The theory in EVT
   7.1  The theory in general terms: Gnedenko's result
   7.2  The general EVT shape: the generalised Pareto distribution
   7.3  Do we actually fit the density function to the histogram?
   7.4  To estimate the GPD, do the following
   7.5  Using the estimated parameters of the Generalised Pareto Distribution
   7.6  A simpler EVT shape: the Pareto distribution
   7.7  Application of the Pareto distribution to SP500 daily returns
8  References

1. Outline.

1. Introduction: why is EVT useful?
2. Ingredients:
   (a) Probability distributions for EVT.
   (b) Gnedenko's results.
3. A general application of EVT.
4. A simpler, restricted, application.

2. How probable are improbable events?

The most damaging events occur rarely (fortunately). As a result we have few observations with which to estimate their probability.

We can estimate them using data from normal times (asset returns, for example) and by assuming that they come from a standard distribution. This gives us an estimated parametric distribution, but it is typically very bad at providing estimated probabilities for rare events.

Or we can assume that the frequencies of past data represent the probabilities of future outcomes. This gives us an empirical distribution, but this too is very bad at providing the probabilities we want (though it can be better than the parametric distribution).

A third option is to use Extreme Value Theory (EVT), which fits a parametric form to the tails of the distribution only.

3. Can we use the Normal distribution to calculate stock return probabilities?

3.1. The data exhibit excess kurtosis, i.e. fat tails.

We hear quite a lot about stock returns being approximately lognormally distributed, i.e.

    ln(1 + r) ~ N(µ, σ)    (1)

where ln(1 + r) is the log return. We will refer to this simply as "the return" from now on.

The following figures show the frequency of daily returns for the US and UK. They demonstrate that, to the eye at least, the lognormal is a reasonable approximation.

Figure 1: SP500.

Figure 2: FT100.

There are several things worth noting in these charts:

1. The returns are centred very close to zero. This is because they are the actual returns from one day to the next. The means are in fact slightly positive and, grossed up to annual rates, are about 15% p.a.

2. The data display fat tails. We could observe these in (at least) two ways:

   (a) We could fit a more general curve than the Normal, and we would see it hover above the Normal in the tails. (We do this later.)

   (b) We could simply observe that there are several spikes where the Normal is effectively at zero (as these figures show).

Figure 3: SP500, left tail.

3. These spikes might not look important, but if we were to use a standard significance test for the returns at these points, we would reject the null hypothesis that the returns are distributed according to the Normals drawn here.

4. More dangerously in a risk-management context, using the Normal makes us feel a lot safer than we actually are.

5. There are several ways to test whether an asset's returns follow a specific distribution. The Bera-Jarque test is a test against Normality (i.e. it is used to detect evidence against Normality specifically). The Kolmogorov-Smirnov test can be used to test against any specified distribution.

3.2. So what do we do about it?

One possibility is to do nothing, but that is not very interesting, and it leads to errors in VaR calculations. Another is to use a simple distribution that makes a better job of matching the data, a Student-t with low degrees of freedom, for example:

Figure 4: t (dashed) and Normal distributions.

The t looks just like the Normal, but it has fatter tails. Unfortunately this doesn't work too well either: the mass of observations close to the mean dominates the parameter estimates.
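These diagnostics are easy to run in practice. Below is a minimal sketch in Python, assuming only numpy and scipy; the return series is a synthetic stand-in (the SP500 data are not reproduced in these notes) and the variable names are mine. It runs the Bera-Jarque and Kolmogorov-Smirnov tests mentioned above, then fits a Student-t by maximum likelihood.

    import numpy as np
    from scipy import stats

    # Synthetic stand-in for a daily log-return series (e.g. SP500);
    # t-distributed draws keep the sketch self-contained and fat-tailed.
    rng = np.random.default_rng(0)
    log_ret = stats.t.rvs(df=4, loc=0.0002, scale=0.007, size=5000,
                          random_state=rng)

    mu, sigma = log_ret.mean(), log_ret.std(ddof=1)

    # Bera-Jarque: tests skewness and kurtosis jointly against Normal values.
    jb_stat, jb_p = stats.jarque_bera(log_ret)
    print(f"Bera-Jarque: stat = {jb_stat:.1f}, p = {jb_p:.3g}")

    # Kolmogorov-Smirnov against a Normal with the estimated moments.
    # (Using estimated parameters biases the KS p-value; a Lilliefors
    # correction would fix this, but the mechanics are the same.)
    ks_stat, ks_p = stats.kstest(log_ret, "norm", args=(mu, sigma))
    print(f"Kolmogorov-Smirnov: stat = {ks_stat:.3f}, p = {ks_p:.3g}")

    # Fit a Student-t by maximum likelihood. The estimated df is low for
    # fat-tailed data, but the central mass dominates the fit.
    df_hat, loc_hat, scale_hat = stats.t.fit(log_ret)
    print(f"fitted t: df = {df_hat:.1f}, loc = {loc_hat:.5f}, "
          f"scale = {scale_hat:.5f}")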

4. Estimating probability distributions.

4.1. Estimating a Normal distribution.

The Normal has only two parameters to estimate, µ and σ. With values for µ̂ and σ̂ we can construct the full Normal distribution. However, having these estimates does not imply that the true distribution of S_t actually is Normal.

Daily returns for the SP500 from 2 January 1957 to 26 April 2012 have the following estimated moments: µ̂ = 0.0244%, σ̂ = 1.0062%, for T = 13928 observations. If the returns were Normally distributed, the 1st percentile r_1% would be found from

    (r_1% - 0.0244) / 1.0062 = -2.33    (2)

    r_1% = -2.32%    (3)

So we should see (1% of 13928) ≈ 139 observations lower than -2.32%. In fact there were 220 such observations.

Conclusion: there are more observations in the lower tail of the actual distribution than we should expect if they are generated by a Normal distribution. 1% of the observations lie below -2.71%, i.e. VaR(1%, empirical) = -2.71%.

VaR(1%, Normal) = -2.32%.

Note that these conclusions about the Normal are not based on a statistical test. Would we, for example, want to reject the Normal distribution if there were only 140 (> 139) observations below -2.32%? As it happens, the statistical tests also reject Normality in this case.
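The quantile arithmetic above is easy to verify. Here is a minimal sketch using the moments quoted in the text (the raw return series is not reproduced here, so the empirical count of 220 appears only as a comment):

    from scipy import stats

    mu_hat, sigma_hat = 0.0244, 1.0062        # percent per day, from the text
    T = 13928

    # Normal 1% quantile: mu + z(0.01) * sigma, with z(0.01) ≈ -2.33.
    r_1pct = mu_hat + stats.norm.ppf(0.01) * sigma_hat
    print(f"VaR(1%, Normal) = {r_1pct:.2f}%")        # ≈ -2.32%

    # Under Normality we expect about 1% of T observations below this level.
    print(f"expected exceedances = {0.01 * T:.0f}")  # ≈ 139

    # With the actual returns (in percent) to hand, the empirical count is:
    # observed = (returns_pct < r_1pct).sum()        # 220 in the SP500 sample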

5. Parametric vs. empirical distributions.

In the above example it is not impossible that the returns come from an N(0.0244, 1.0062) distribution, but the number of observations below -2.32% suggests that this is unlikely. So what next?

1. We can search for a parametric distribution that is not rejected by the data.

2. We can assume that some of the empirical characteristics of the past observations will be reflected in future returns. In particular we usually use the frequency distribution (or "histogram"), as opposed to an estimated probability distribution.

The second of these makes use of the empirical distribution. For example, we would conclude that the probability of a loss in excess of 2.32% is equal to the frequency of these losses in the historical data, i.e. p = 220/13928 = 1.58%, rather than the 1% we get from the estimated Normal.

The main failing of empirical distributions is that they tell us nothing about the probability of outcomes that lie outside the sample of empirical observations.

6. Extreme value theory.

6.1. A quick summary.

EVT provides a method for fitting a sensible curve to an observed histogram like the one below.

Figure 5: SP500, left tail, with example fitted EVT and fitted Normal curves.

We have to do three things:

1. Pick a sensible curve.
2. Rearrange the histogram's data to simplify the estimation process.
3. Perform the estimation.

6.2. Back to work...

Extreme value outcomes are not necessarily more unusual than any other events; consider a uniform distribution. In financial data, however, it turns out that they are. This makes it very difficult to assess their probabilities. In particular, they tend not to fit the distributions that do quite well at assessing the probabilities of ordinary outcomes; see the SP500 charts we looked at earlier. Since these events are also among the most dangerous, extreme value theory (EVT) has been adopted to deal with them.

Hull (2012) summarises EVT's role rather well: "[EVT] is a way of smoothing and extrapolating the tails of an empirical distribution." This is true of fitting most parametric distributions of course (why?), but EVT does a better job with the tails.

6.3. Sensible curves.

We have seen that fitting a Normal to SP500 returns leads to underestimation of the tail probabilities. EVT provides a theoretical distribution that fits the tails much better, although it typically makes a mess of fitting the rest of the distribution. Does this matter? No, because we are interested only in tail risk here.

Why is this better than the empirical distribution? Because it is continuous, and because it can be extended beyond the most extreme empirical data point, making it more accurate for risk calculations.

The core of EVT in finance is the Generalised Pareto Distribution (GPD). This will supply our sensible curve. We present it in terms of the probability that a variable x will be less than a number X, given that x is greater than a number m. Note that this is the cumulative distribution and not the density function. The numerical examples we saw earlier were presented in terms of the density function; we will have to take this change of approach into account when we perform the EVT estimation.

Generalised Pareto Distribution (GPD):

    P(x < X | x > m) ≡ GPD(X)    (5)

    GPD(X) = 1 - (1 + φ(X - m)/β)^(-1/φ),   φ ≠ 0    (6)

    GPD(X) = 1 - exp(-(X - m)/β),           φ = 0    (7)

where β > 0, and m, φ ≥ 0. The distribution starts at x = m. Note that x is the random variable, and X represents a specific number; many texts reverse these definitions. We return to the relative merits of these two later. For now all we need to know is that EVT based on the GPD can describe many types of data.

For financial returns in particular, the PD restrictions (i.e. fat tails) seem to hold. Depending on φ, GPD(X) becomes:

    φ > 0:  Frechet
    φ = 0:  Gumbel
    φ < 0:  Weibull

For financial returns we expect to find φ > 0, since the Frechet exhibits fat tails.

Financial data are often consistent with a further restriction, m = β/φ, and when this is applied to the GPD/Frechet we get the Pareto Distribution.

Pareto Distribution (PD): a restricted version of the GPD that displays fat tails.

    P(x < X | x > m) ≡ PD(X)    (8)

    PD(X) = 1 - (m/X)^α,    X ≥ m, α > 0    (9)

          = 1 - K X^(-α),   where K = m^α    (10)

where the restriction is m = β/φ, and α ≡ 1/φ > 0.

We will estimate a GPD and a PD in what follows.
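For concreteness, equations (5)-(10) translate into a few lines of Python. This is a sketch with my own function names; note that scipy's genpareto parameterises the same family, with its shape c playing the role of φ, loc the role of m, and scale the role of β.

    import numpy as np
    from scipy import stats

    def gpd_cdf(X, m, beta, phi):
        """P(x < X | x > m) for the Generalised Pareto, equations (6)-(7)."""
        if phi == 0.0:
            return 1.0 - np.exp(-(X - m) / beta)
        return 1.0 - (1.0 + phi * (X - m) / beta) ** (-1.0 / phi)

    def pareto_cdf(X, m, alpha):
        """P(x < X | x > m) for the Pareto, equations (9)-(10); K = m**alpha."""
        return 1.0 - (m / X) ** alpha

    # The Pareto is the GPD under the restriction m = beta/phi, alpha = 1/phi:
    m, phi = 1.0, 0.4
    beta = m * phi
    X = 3.0
    print(gpd_cdf(X, m, beta, phi))                       # these three agree
    print(pareto_cdf(X, m, 1.0 / phi))
    print(stats.genpareto.cdf(X, c=phi, loc=m, scale=beta))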

Figure 6: Frechet distributions.

7. What makes these curves sensible? The theory in EVT.

7.1. The theory in general terms: Gnedenko's result.

The key result of the theory is that, under certain conditions, the cdf of many random variables converges on a specific shape as we get further into the tails. This shape is the familiar smooth decline towards zero that we see in the Normal, t, χ², etc. We can apply this result even when we do not know which distribution the variables come from.

So, in finance, even though the true distribution of asset returns may remain a mystery, we can say something about the shape of the distribution in the tails, which is what matters for Value at Risk etc. And, provided that we have enough data, EVT allows us to estimate a shape that will approximate the true distribution, whatever it is (subject to some conditions). More specifically, Gnedenko's result states that for a wide class of distributions the upper tail converges on the GPD above.

7.2. The general EVT shape: the generalised Pareto distribution.

Textbooks typically present this material in terms of losses expressed as positive numbers; we do the same here. For many distributions, the part of the distribution to the right of a threshold u converges on the generalised Pareto distribution. That is, if F_u(Y) is the probability that x lies between u and u + Y, then, as u increases,

    F_u(Y) → G_{φ,β}(Y) = 1 - (1 + φ(Y - u)/β)^(-1/φ),   φ ≠ 0    (11)

                        = 1 - exp(-(Y - u)/β),           φ = 0    (12)

i.e. as u gets larger, and we move into the tail, the distribution converges on the GPD. We use slightly different notation here, in contrast to (6) and (7) above, to emphasize that this is an approximation to the GPD that improves as u gets larger.

In the limit, Y becomes X, and u becomes m, as in (7). The parameters φ and β can then be estimated using non-linear methods, for any choice of u (or m).

7.3. Do we actually fit the density function to the histogram?

A histogram presents the observations grouped into buckets. A density function gives a value for every possible individual observation, i.e. it does not group them into buckets. While the buckets approach is convenient for diagrams, it would be complicated (though not impossible) for estimation. We actually fit the probability of getting each observation (more precisely, the value of the density function at each observation).

The cumulative distribution for the GPD is

    GPD(X) = 1 - (1 + φ(X - m)/β)^(-1/φ)    (13)

for which the density is

    gpd(X) = (1/β) (1 + φ(X - m)/β)^(-1/φ - 1)    (14)

which we can think of, loosely, as the probability of observing x = X.

7.4. To estimate the GPD, do the following.

Order the sample of n loss-making returns (x), expressed as positive numbers, starting with the largest. Choose u: usually this will be the empirical 95th percentile, so select the losses above this level.

If we want to find VaR(q%), for example, we will need to choose u to the left of this (closer to the mean). The 95th percentile satisfies this condition for VaR(1%). Call the number of observations in this percentile n_u, so

    n_u / n = 0.05    (15)

Estimate the parameters of GPD(X) using maximum likelihood. The likelihood for the sample is

    L = ∏_{i=1}^{n_u} gpd(x_i)    (16)

    ln(L) = ∑_{i=1}^{n_u} ln gpd(x_i)    (17)

          = ∑_{i=1}^{n_u} ln[ (1/β) (1 + φ(x_i - u)/β)^(-1/φ - 1) ]    (18)

We have to find the values of φ and β that maximise ln(L). For this we need a numerical search process, and we have to supply a couple of starting values to get the search started; a minimal sketch of such a search is given below.
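Here is that sketch, assuming the fat-tailed case φ > 0 so that (18) applies directly. The exceedances are synthetic stand-ins (the example data below are not reproduced in full), and the starting values are the φ = 0.3, β = 40 used in Hull's example that follows.

    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik(params, exceed):
        """Minus equation (18); exceed holds x_i - u for the losses above u."""
        phi, beta = params
        if phi <= 0 or beta <= 0:        # restrict to the fat-tailed case
            return np.inf
        return np.sum(np.log(beta)
                      + (1.0 / phi + 1.0) * np.log1p(phi * exceed / beta))

    rng = np.random.default_rng(1)
    u = 160.0
    # Synthetic stand-in for the 22 losses above u in Hull's example.
    losses = u + 40.0 * rng.pareto(2.5, size=22)

    res = minimize(neg_loglik, x0=[0.3, 40.0], args=(losses - u,),
                   method="Nelder-Mead")
    phi_hat, beta_hat = res.x
    print(f"phi = {phi_hat:.3f}, beta = {beta_hat:.3f}")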

Hull (2012, p. 316) gives the following data for losses on an example portfolio (from a total sample of 500 losses):

    Loss ($000s)    ln[(1/β)(1 + φ(x_i - u)/β)^(-1/φ-1)], φ = 0.3, β = 40
    477.841         -8.97
    345.435         -7.47
    282.200         -6.51
    ...             ...
    160.778         -3.71

where the second column has been calculated using initial values of φ = 0.3 and β = 40. The value of u is 160. This gives n_u = 22, and n_u/n = 22/500 = 4.4%, i.e. we are fitting the distribution to the largest 4.4% of losses. We then use the search process to find the φ and β that maximise the sum of the final column (i.e. push the sum towards zero from below). The optimising values in this case are φ = 0.436 and β = 32.532. The estimated φ > 0 confirms the presence of fat tails in the data.

7.5. Using the estimated parameters of the Generalised Pareto Distribution.

VaR(q) and expected shortfall ES(q), where q is expressed as the weight in the tail (e.g. 1%), are

    VaR(q) = u + (β/φ) [ ((n/n_u) q)^(-φ) - 1 ]    (19)

    ES(q) = [ VaR(q) + β - φu ] / (1 - φ)    (20)
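Equations (19) and (20) translate directly into code. A minimal sketch using Hull's example numbers:

    def gpd_var(q, u, beta, phi, n, n_u):
        """Equation (19): VaR at tail weight q from the fitted GPD."""
        return u + (beta / phi) * (((n / n_u) * q) ** (-phi) - 1.0)

    def gpd_es(q, u, beta, phi, n, n_u):
        """Equation (20): expected shortfall at tail weight q."""
        return (gpd_var(q, u, beta, phi, n, n_u) + beta - phi * u) / (1.0 - phi)

    # Hull's example: u = 160, n = 500, n_u = 22, phi = 0.436, beta = 32.532.
    u, n, n_u, phi, beta = 160.0, 500, 22, 0.436, 32.532
    print(f"VaR(1%) = {gpd_var(0.01, u, beta, phi, n, n_u):.1f}")  # ≈ 227.7 ($000s)
    print(f"ES(1%)  = {gpd_es(0.01, u, beta, phi, n, n_u):.1f}")   # ≈ 337.8 ($000s)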

7.6. A simpler EVT shape: the Pareto distribution.

For a lot of financial data the Pareto distribution performs well:

    PD(X) = 1 - K X^(-α)    (21)

where

    K = (β/φ)^α = m^α    (22)

    α = 1/φ    (23)

The parameter α is known as the tail index: the smaller is α, the fatter the tail. A thin-tailed distribution such as the Normal has no finite tail index (equivalently φ = 0, so α = 1/φ is unbounded). Tail indices can be found for both parametric and empirical distributions, and the tail index can then be used to calculate the VaR for either distribution.

If the data satisfy the restriction m = β/φ, it is more efficient to estimate a Pareto distribution than the GPD. And we can fit the cumulative distribution function quite easily in this case.

7.7. Application of the Pareto distribution to SP500 daily returns.

We use the same sample of SP500 data as in Section 4 above. The cumulative frequencies of deviations below the mean are shown in Table 1, in which the final column is (1 - Φ(Col 1)) × 100. We can estimate a PD equation to fit the "actual" column in Table 1 as follows:

    Deviation            Percentage    Percentage
    (below the mean)     actual        Normal distn
    > 1 s.d.             10.41         15.87
    > 2 s.d.              2.33          2.23
    > 3 s.d.              0.71          0.14
    > 4 s.d.              0.28          0.005
    > 5 s.d.              0.13          2.4 × 10⁻⁵
    > 6 s.d.              0.08          1.0 × 10⁻⁷
    > 7 s.d.              0.04          1.3 × 10⁻¹⁰
    > 8 s.d.              0.03          0
    > 9 s.d.              0.02          0
    > 10 s.d.             0.01          0

Table 1: Actual and Normal one-sided frequencies.

X will be the value of x measured in standard deviations. This scaling is not important; we could choose units for X arbitrarily. The probability that x exceeds X is K X^(-α), which corresponds to the percentages in Table 1. Take logs of this probability to produce a linear equation for estimation, i.e.

    ln(Pr(x > X)) = ln(K) - α ln(X)    (24)

Construct the ln data for equation (24) using the first 7 observations from Table 1 (we keep the other 3 for out-of-sample tests); see Table 2.

1 2 3 4 X ln(x) P rop(x > X) ln[p rop(x > X)] in s.d. (Actual) = ln(col(3)) 1 0.00 0.1041-2.26 2 0.69 0.0233-3.76 3 1.10 0.0071-4.95 4 1.39 0.0028-5.89 5 1.61 0.0013-6.65 6 1.79 0.0008-7.10 7 1.95 0.0004-7.84 Table 2: Data for fitting equation (24). Plot ln(x) (column 2) against ln[p rop(x > X)] (column 4): Figure 1. These are the 2 variables for the regression. Figure 7: x axis = ln(x), y axis = ln[prop(x>x)]. 22

The plot is approximately linear, which is evidence in favour of our using the Pareto. Select the observation beyond which the actual frequencies appear most linear: observation 3 in this case. So we run the regression using observations 3 to 7. This bit is art, not science.

OLS using the log-linear observations generates

    ln(K)̂ = -1.300    (25)

    K̂ = 0.27    (26)

    α̂ = 3.31    (27)

(This regression is reproduced in the short sketch after the comparison table below.) Most financial time series produce α̂ between 3 and 5.

From the parameter estimates we can construct the following cumulative probabilities for comparison:

    Deviation    Percentage:    Percentage:     Percentage:
    (in s.d.)    fitted         Normal distn    actual
    > 2           2.72           –               2.33
    > 2.33        1.64           –               1.98
    > 3           0.71           –               0.71
    > 4           0.27           –               0.27
    > 4.50        0.19           6.8 × 10⁻⁶      –
    > 5           0.13           –               0.13
    > 8.00        0.03           0               0.03
    > 9.00        0.02           0               0.02
    > 10.00       0.01           0               0.01
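As a check, the regression in equation (24) can be reproduced directly from columns 2 and 4 of Table 2, using observations 3 to 7:

    import numpy as np

    # Observations 3 to 7 of Table 2: X in s.d., and ln of the actual
    # tail proportions.
    ln_X = np.log(np.array([3.0, 4.0, 5.0, 6.0, 7.0]))
    ln_prop = np.array([-4.95, -5.89, -6.65, -7.10, -7.84])

    # OLS of ln(prop) on ln(X): slope = -alpha, intercept = ln(K), eq (24).
    slope, intercept = np.polyfit(ln_X, ln_prop, 1)
    alpha_hat, K_hat = -slope, np.exp(intercept)
    print(f"ln(K) = {intercept:.2f}, K = {K_hat:.2f}, alpha = {alpha_hat:.2f}")
    # Matches (25)-(27) up to rounding in Table 2.

    # Fitted tail probability at 8 s.d., the "> 8.00" extrapolation:
    print(f"P(x > 8 s.d.) = {100 * K_hat * 8.0 ** (-alpha_hat):.2f}%")  # ≈ 0.03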

Returning to the comparison table: the value for 4.5 s.d. represents an interpolation of the fitted distribution, while the values for 8, 9 and 10 s.d. represent extrapolations of the fitted distribution. Figures 8 and 9 show that the power distribution fits the empirical data better than the Normal in the tails.

Figure 8: Fitted distributions: 1 to 7 sigma.

Figure 9: Fitted distributions: 4 to 15 sigma.

The extrapolated power distribution does not beat the empirical distribution in these graphs, due to the Black Monday loss of 23σ. Even the Pareto underestimates the probability of a loss of this size.

8. References.

Hull, J., Risk Management and Financial Institutions, 3rd ed. (2012), is an excellent source for the material covered here. Danielsson, J., Financial Risk Forecasting (2011) also covers this material, but at a more advanced level. Both books are published by Wiley Finance. Working with both simultaneously can be confusing because their notations differ; Hull is the better place to start.