Quantile-quantile plots and the method of peaks-over-threshold

Problems in SF2980 2009-11-09

[Figure 2: QQ-plot of log-returns (x-axis, ticks from -0.15 to 0.15) against quantiles of a standard t-distribution with 4 degrees of freedom (y-axis, ticks from -6 to 6). Note that in this plot the empirical quantiles are on the x-axis and the reference ($t_4$) quantiles are on the y-axis.]

Problem 1

You have bought 10 shares of a stock with share price \$1 today and with historical daily log-returns shown in the QQ-plot in Figure 2. For a standard t-distributed random variable $Z$ with 4 degrees of freedom it holds that $\mathrm{ES}_{0.01}(Z) = 5.22$. Estimate $\mathrm{ES}_{0.01}$ for your linearized portfolio net worth from today until tomorrow.

Problem 2

Below you find the 24 insurance claims that exceed 10 million kronor out of a total of 1949 claims (the total sample size is hence $1949/24 \approx 81$ times larger). The claims are assumed to be observations of independent and identically distributed random variables $X_1, \dots, X_{1949}$.

[1] 152.413209 95.168375 41.213000 34.605146 23.191095 18.301611
[7] 17.569546 16.934801 16.753927 15.527950 15.213358 14.577883
[13] 14.023591 12.701101 12.152200 12.150434 12.059369 11.310395
[19] 11.089682 11.081675 10.692103 10.471204 10.248902 10.125362

(a) Compute the empirical estimate of $F_X^{-1}(0.99)$ based on all 1949 claims.

(b) Recall the POT approximation

$$P(X > u + x) \approx \frac{N_u}{n}\left(1 + \hat\gamma\,\frac{x}{\hat\beta}\right)^{-1/\hat\gamma}, \quad x > 0,$$

where $n$ is the total sample size and $N_u$ the number of observations exceeding the threshold $u$. Use the parameter estimates $(\hat\gamma, \hat\beta) = (0.5, 4.5)$ and the POT approximation above to estimate $F_X^{-1}(0.99)$.
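As an illustration of how the POT approximation in Problem 2 (b) is evaluated, the following minimal Python sketch compares it with the empirical tail probability at a few levels above the threshold. The function names `pot_tail` and `empirical_tail` and the chosen levels are assumptions made for this example only; the data and parameter estimates are those stated in the problem.

```python
# The 24 claims above u = 10 (million kronor), copied from Problem 2.
claims = [
    152.413209, 95.168375, 41.213000, 34.605146, 23.191095, 18.301611,
    17.569546, 16.934801, 16.753927, 15.527950, 15.213358, 14.577883,
    14.023591, 12.701101, 12.152200, 12.150434, 12.059369, 11.310395,
    11.089682, 11.081675, 10.692103, 10.471204, 10.248902, 10.125362,
]
N_TOTAL = 1949                      # total number of claims
U = 10.0                            # threshold
GAMMA_HAT, BETA_HAT = 0.5, 4.5      # parameter estimates given in (b)


def pot_tail(x, u=U, gamma=GAMMA_HAT, beta=BETA_HAT,
             n_exceed=len(claims), n=N_TOTAL):
    """POT approximation of P(X > u + x) for x >= 0 (u is kept for readability)."""
    return (n_exceed / n) * (1.0 + gamma * x / beta) ** (-1.0 / gamma)


def empirical_tail(level, n=N_TOTAL):
    """Empirical estimate of P(X > level); only valid for level >= U."""
    return sum(c > level for c in claims) / n


if __name__ == "__main__":
    for level in (15.0, 20.0, 40.0):    # illustrative levels, not from the text
        print(level, empirical_tail(level), pot_tail(level - U))
```

At $x = 0$ the approximation reduces to $N_u/n$, the empirical probability of exceeding the threshold, which is a quick sanity check on the formula.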

[Figure 3: QQ-plots comparing the four samples (panels with axes labelled X, Y, Z and W).]

Problem 3

Consider four iid samples of log-returns, each of size 500, for four different stocks. The samples $X = \{x_1, \dots, x_{500}\}$, $Y = \{y_1, \dots, y_{500}\}$, $Z = \{z_1, \dots, z_{500}\}$ and $W = \{w_1, \dots, w_{500}\}$ are compared in the QQ-plots in Figure 3. The log-return distributions are $N(0, 1)$, $N(0, \sigma^2)$ with $\sigma > 1$, $t(3)$ and $t(10)$, where $t(\nu)$ denotes a standard t-distribution with $\nu$ degrees of freedom. Determine the correct match between the four samples and the four distributions.

Problem 4

Consider a portfolio loss $L$ with distribution function $F$. It is assumed that $\bar F = 1 - F \in RV_{-2}$, i.e. $\bar F$ is regularly varying at infinity with index $-2$. To account for stochastic volatility you introduce the random variable $\sigma$ with distribution $P(\sigma = 1/2) = 3/4$, $P(\sigma = 2) = 1/4$, which represents two possible volatility regimes. It is assumed that $\sigma$ and $L$ are independent. Which of the models $L$ and $\sigma L$ gives asymptotically the biggest probability for large losses? That is, compute the limit

$$\lim_{z \to \infty} \frac{P(\sigma L > z)}{P(L > z)}.$$

Problem 5

Let $X$ be a financial loss with distribution function $F$ satisfying $\bar F = 1 - F \in RV_{-\alpha}$ ($\alpha > 0$), i.e. $\bar F$ is a regularly varying function.

(a) Compute $\lim_{u \to \infty} P(X - u > ux/\alpha \mid X > u)$ for $x > 0$.

(b) Find a sequence $(a_n)$ such that $\lim_{n \to \infty} n\bar F(a_n x) = x^{-\alpha}$ for $x > 0$. You may assume that $F$ is continuous and strictly increasing.

(c) Let $(X_k)$ be a sequence of iid random variables satisfying $X_k \overset{d}{=} X$ (equal in distribution) for each $k$. Show that $\lim_{n \to \infty} n\bar F(a_n x) = x^{-\alpha}$ for $x > 0$ implies

$$\lim_{n \to \infty} P\left(a_n^{-1}\max\{X_1, \dots, X_n\} \le x\right) = H(x), \quad x > 0,$$

and determine $H(x)$.

(d) Hill estimation on an iid sample of size $n$ from the distribution of $X$ gives you the Hill estimate

$$\left(\frac{1}{k}\sum_{j=1}^{k}\left(\ln x_{j,n} - \ln x_{k,n}\right)\right)^{-1} \approx 2,$$

where $x_{1,n} > \dots > x_{n,n}$ and $k$ is assumed to be optimally chosen. Use this information to approximate $P(X > 2 \cdot 10^6 \mid X > 10^6)$.

Problem 6

Let $X_1, \dots, X_n$ be observations of iid random variables. The quantile-quantile plot of the sample with respect to a reference distribution $F$ consists of the points

$$\left\{\left(F^{\leftarrow}\left(\frac{n-k+1}{n+1}\right), X_{k,n}\right) : k = 1, \dots, n\right\},$$

where $F^{\leftarrow}$ denotes the quantile function of $F$.

(a) Explain how the quantile-quantile plot can be used as a graphical tool to check whether the $X_i$ are approximately distributed according to $F$.

(b) Let $G(x) = F((x - \mu)/\sigma)$ for $\mu \in \mathbb{R}$, $\sigma > 0$, and suppose that the quantile-quantile plot with respect to $F$ is approximately linear. Explain why the quantile-quantile plot with respect to $G$ is also approximately linear. The answer must be properly motivated!

(c) In Figure 4, a quantile-quantile plot is constructed from a sample of 10000 data points, $X_1, \dots, X_{10000}$, with respect to a reference distribution $F$. Does the reference distribution have a too light (right) tail or a too heavy (right) tail? The answer must be properly motivated!

Problem 7

Let $X, X_1, X_2, \dots, X_n$ be iid random variables and suppose $x_1, x_2, \dots, x_n$ are observations of $X_1, X_2, \dots, X_n$. In the peaks-over-threshold (POT) method the tail of the distribution $F$ of $X$ is approximated by

$$\bar F(u_0 + y) \approx \bar F(u_0)\,\bar G_{\gamma;\beta}(y), \quad y \ge 0, \qquad (1)$$

for some (large) $u_0$, where $G_{\gamma;\beta}$ is the generalised Pareto distribution and $\bar G_{\gamma;\beta} = 1 - G_{\gamma;\beta}$.

[Figure 4: Quantile-quantile plot in Problem 6 (c). Horizontal axis ticks from 0 to 180, vertical axis ticks from 0 to 10.]

(a) Assume that $0 < \gamma < 1$. Use the approximation (1) to show that an approximation of the mean excess function $e(u)$ of $X$ for $u \ge u_0$ is given by

$$e(u) = \frac{\beta + \gamma(u - u_0)}{1 - \gamma}, \quad u \ge u_0.$$

(b) In Figure 5, the empirical (or sample) mean excess function $e_n(u)$ for the observations $x_1, x_2, \dots, x_n$ is plotted. Determine from this figure an estimate of $u_0$.

(c) Use Figure 5 to find (graphically) estimates for $\gamma$ and $\beta$.

Hint for (a): The generalised Pareto distribution is given by

$$G_{\gamma;\beta}(x) = \begin{cases} 1 - \left(1 + \frac{\gamma}{\beta}x\right)^{-1/\gamma}, & \gamma \ne 0, \\ 1 - e^{-x/\beta}, & \gamma = 0, \end{cases}$$

where $x \ge 0$ if $\gamma \ge 0$ and $0 \le x \le -\beta/\gamma$ if $\gamma < 0$. You may also use that for $a > 1$, $b > 0$,

$$\int_z^{\infty} (x - z)\,ab(1 + bx)^{-a-1}\,dx = \frac{1}{b(a-1)}(1 + bz)^{1-a}.$$

Problem 8

Let $X$ be a positive random variable with distribution function $F$ satisfying $\bar F \in RV_{-\alpha}$, $\alpha > 0$ (i.e. $\bar F = 1 - F$ is regularly varying with index $-\alpha$).

(a) It holds that

$$\lim_{u \to \infty}\ \sup_{x \in (0, \infty)} \left|\bar F_u(x) - \bar G_{\gamma,\beta(u)}(x)\right| = 0,$$

where $F_u$ is the excess distribution function $F_u(x) = P(X - u \le x \mid X > u)$ of $X$ over the threshold $u$ and $\bar G_{\gamma,\beta(u)}(x) = (1 + \gamma x/\beta(u))^{-1/\gamma}$, $x \ge 0$. Derive the POT approximation of the tail probability $P(X > u + x)$ expressed in terms of $x$, $u$, $\gamma$ and $\beta(u)$.

Suppose that $F(x) = 1 - Cx^{-\alpha}$ for $x \ge C^{1/\alpha}$, where $C > 0$ and $\alpha > 0$.

[Figure 5: Empirical (sample) mean excess function for the sample $x_1, \dots, x_n$. Horizontal axis ticks from 0 to 8, vertical axis ticks from 0 to 4.]

(b) For some function $\beta : (0, \infty) \to (0, \infty)$ and $\gamma > 0$ we have

$$\lim_{u \to \infty} P\big((X - u)/\beta(u) > x \mid X > u\big) = \bar G_{\gamma,1}(x), \quad x \ge 0, \qquad (2)$$

where $\bar G_{\gamma,1}(x) = (1 + \gamma x)^{-1/\gamma}$. Find $\gamma$ and $\beta(u)$ so that (2) holds.

(c) The POT method provides the VaR estimator

$$\hat F^{-1}_{X,\mathrm{POT}}(p) = u + \frac{\hat\beta(u)}{\hat\gamma}\left(\left(\frac{n}{N_u}(1 - p)\right)^{-\hat\gamma} - 1\right),$$

where $n$ is the sample size and $N_u$ the number of observations larger than $u$. Choose the threshold $u = 10$ and $p = 0.995$. Use the information in Figures 6 and 7 and parts (a) and (b) to obtain the estimates $\hat\gamma$ and $\hat\beta(u)$. Compute the empirical estimate $\hat F_n^{-1}(p)$ and the POT estimate $\hat F^{-1}_{X,\mathrm{POT}}(p)$ of $F_X^{-1}(p)$ (give numeric results as well).

Problem 9

An insurance company defaults if its yearly loss is greater than $l$. The yearly loss is given by $L = X + Y$ and represents the sum of losses $X \ge 0$ and $Y \ge 0$ in two lines of business. Suppose that $P(X \le x) = P(Y \le x) = F(x)$ and that $P(X \le x, Y \le y) = \min\{F(x), F(y)\}$, where $F$ is continuous and satisfies $\bar F \in RV_{-\alpha}$, $\alpha > 0$. The insurance company finds its overall position risky and wants to investigate how much diversification can reduce its default probability. Therefore, it considers the loss $\tilde L = X + \tilde Y$, where $\tilde Y \overset{d}{=} Y$ (i.e. equally distributed) but $X$ and $\tilde Y$ are independent. You are asked to compute $P(\tilde L > l)/P(L > l)$, and since $l$ is large you may approximate this ratio of default probabilities by its limit as $l \to \infty$.

[Figure 6: 500 simulations from the distribution of X. Horizontal axis ticks from 0 to 500, vertical axis ticks from 5 to 25.]

[Figure 7: Hill plot of the sample, from the distribution of X, shown in Figure 6. Horizontal axis ticks from 0 to 500, vertical axis ticks from 0 to 4.]

(a) There exists a function $f$ such that

$$f(\alpha) = \lim_{l \to \infty} \frac{P(\tilde L > l)}{P(L > l)}.$$

Compute $f(\alpha)$.

(b) Interpret the result in (a): for which regular variation indices can the default probability be reduced, and by how much, by having independent lines of business?

Hint: To get an upper bound for $f(\alpha)$ you may use that for every $\varepsilon \in (0, 1/2)$,

$$P(\tilde L > l) = P(\tilde L > l, X \le \varepsilon l) + P(\tilde L > l, \tilde Y \le \varepsilon l) + P(\tilde L > l, X > \varepsilon l, \tilde Y > \varepsilon l).$$

Then show that this implies that

$$P(\tilde L > l) \le P(\tilde Y > (1 - \varepsilon)l) + P(X > (1 - \varepsilon)l) + P(X > \varepsilon l, \tilde Y > \varepsilon l).$$

To get a lower bound you may use that

$$P(\tilde L > l) \ge P(X > l) + P(\tilde Y > l) - P(X > l, \tilde Y > l).$$

Then show that for some $g$ and $h$, $g(\alpha) \le f(\alpha) \le h(\alpha, \varepsilon)$ for every $\varepsilon \in (0, 1/2)$ and that $\lim_{\varepsilon \to 0} h(\alpha, \varepsilon) = g(\alpha)$.

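Solutions

Problem 1

Let $Z$ be a standard t-distributed random variable with four degrees of freedom, with distribution function $F$. Let $Y$ be the log-return from today until tomorrow, with distribution function $G$. From the QQ-plot we see that $F^{-1}(q) = a + bG^{-1}(q)$ for $q \in (0, 1)$, with $a \approx -1$ and $b \approx 50$. Hence, approximately, $G^{-1}(q) = F^{-1}(q)/50 + 1/50$, and we may assume that $Y \overset{d}{=} Z/50 + 1/50$. This means that the linearized net worth is $X = 10Y \overset{d}{=} Z/5 + 1/5$, and the linearized loss is $L = -X \overset{d}{=} -Z/5 - 1/5 \overset{d}{=} Z/5 - 1/5$, because $Z$ is symmetric. Then

$$\mathrm{ES}_{0.01}(X) = \frac{1}{0.01}\int_{0.99}^{1}\left(\frac{F^{-1}(p)}{5} - \frac{1}{5}\right)dp = \frac{\mathrm{ES}_{0.01}(Z)}{5} - \frac{1}{5} \approx 0.844 \text{ (dollars)}.$$

Problem 2

(a) $[1949(1 - 0.99)] + 1 = 19 + 1 = 20$. Hence, $\hat F_n^{-1}(0.99) = X_{20,1949} \approx 11.1$.

(b) We get the quantile $F_X^{-1}(p)$ by solving $F_X(F_X^{-1}(p)) = p$ and using the approximation

$$1 - F_X(x) \approx \frac{N_u}{n}\left(1 + \frac{\hat\gamma}{\hat\beta}(x - u)\right)^{-1/\hat\gamma}$$

for $x > u$. This gives

$$F_X^{-1}(p) \approx u + \frac{\hat\beta}{\hat\gamma}\left[\left(\frac{n}{N_u}(1 - p)\right)^{-\hat\gamma} - 1\right].$$

With the given values (and $n/N_u \approx 81$) this yields

$$F_X^{-1}(0.99) = 10 + 9\left[(0.81)^{-1/2} - 1\right] = 10 + 9\left[(0.9)^{-1} - 1\right] = 11.$$

Problem 3

$X \sim t(3)$, $Y \sim N(0, 1)$, $Z \sim t(10)$, $W \sim N(0, \sigma^2)$ with $\sigma > 1$.

Problem 4

Conditioning on the value of $\sigma$ gives

$$\lim_{z \to \infty} \frac{P(\sigma L > z)}{P(L > z)} = \lim_{z \to \infty} \frac{3}{4}\,\frac{P(L > 2z)}{P(L > z)} + \lim_{z \to \infty} \frac{1}{4}\,\frac{P(L > z/2)}{P(L > z)} = \frac{3}{4}\,2^{-2} + \frac{1}{4}\,2^{2} = \frac{3}{16} + 1 > 1.$$

Hence, the model $\sigma L$ gives loss probabilities that are bigger than those for the model $L$ (asymptotically).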
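The arithmetic in the solutions to Problems 1, 2 and 4 above can be checked with a few lines of Python. This is only a sketch: the helper name `pot_quantile` and the dictionary `sigma_dist` are introduced here for illustration, and all numeric inputs are the ones stated in the problems.

```python
import math

# Problem 1: the linearized loss is distributed as Z/5 - 1/5, and expected
# shortfall is positively homogeneous and translation invariant.
ES_Z = 5.22                          # ES_0.01 of a standard t(4), as given
es_portfolio = ES_Z / 5 - 1 / 5

# Problem 2 (a): index of the order statistic used as empirical quantile.
n, n_exceed, p = 1949, 24, 0.99
k = math.floor(n * (1 - p)) + 1      # k-th largest observation


# Problem 2 (b): POT quantile estimate (hypothetical helper name).
def pot_quantile(p, u, gamma, beta, n, n_exceed):
    """Invert the POT tail approximation for the level-p quantile."""
    return u + beta / gamma * ((n / n_exceed * (1 - p)) ** (-gamma) - 1)


q_pot = pot_quantile(p, u=10.0, gamma=0.5, beta=4.5, n=n, n_exceed=n_exceed)

# Problem 4: each regime s contributes P(sigma = s) * s**alpha to the limit,
# since P(L > z / s) / P(L > z) tends to s**alpha for a tail index alpha.
alpha = 2
sigma_dist = {0.5: 0.75, 2.0: 0.25}  # value of sigma -> probability
ratio_limit = sum(prob * s ** alpha for s, prob in sigma_dist.items())

if __name__ == "__main__":
    print("ES of linearized portfolio:", es_portfolio)
    print("order statistic index:", k)
    print("POT quantile estimate:", q_pot)
    print("limit of P(sigma L > z) / P(L > z):", ratio_limit)
```

With the unrounded ratio $1949/24$ the POT quantile comes out slightly below 11 (roughly 10.99), which agrees with the hand calculation that rounds $n/N_u$ to 81.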

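Problem 5

(a) We have

$$P(X - u > ux/\alpha \mid X > u) = \frac{P(X > u(1 + x/\alpha))}{P(X > u)} = \frac{\bar F(u(1 + x/\alpha))}{\bar F(u)} \to (1 + x/\alpha)^{-\alpha}, \quad u \to \infty,$$

using that $\bar F$ is regularly varying with index $-\alpha$.

(b) Setting $x = 1$ in the requirement gives $\lim_{n \to \infty} n\bar F(a_n) = \lim_{n \to \infty} n(1 - F(a_n)) = 1$. Hence one choice is $a_n$ such that $F(a_n) = 1 - 1/n$, i.e. we can choose $a_n = F^{-1}(1 - 1/n)$. Now,

$$n\bar F(a_n x) = \frac{\bar F(a_n x)}{1/n} = \frac{\bar F(a_n x)}{1 - F(F^{-1}(1 - 1/n))} = \frac{\bar F(a_n x)}{\bar F(a_n)} \to x^{-\alpha}.$$

(c)

$$P(\max\{X_1, \dots, X_n\} \le a_n x) = F^n(a_n x) = \left(1 - \bar F(a_n x)\right)^n = \left(1 - \frac{n\bar F(a_n x)}{n}\right)^n \to \exp\{-x^{-\alpha}\} = H(x).$$

(d) From the Hill estimate we find that $\alpha \approx 2$. Hence, $P(X > 2 \cdot 10^6 \mid X > 10^6) \approx 2^{-\alpha} \approx 1/4$.

Problem 6

(a) If the iid random variables $X_1, \dots, X_n$ are distributed according to $F$, then the quotient between the differences of successive quantiles in the reference distribution and the differences of successive empirical quantiles is approximately constant, i.e.

$$\frac{F^{\leftarrow}\left(\frac{n-k+1}{n+1}\right) - F^{\leftarrow}\left(\frac{n-k}{n+1}\right)}{x_{k,n} - x_{k+1,n}} \approx \text{const}.$$

This results in the QQ-plot being (approximately) a straight line.

(b) Note that $G^{\leftarrow}(\alpha) = \sigma F^{\leftarrow}(\alpha) + \mu$. Hence, if $F^{\leftarrow}((n - k + 1)/(n + 1)) \approx cx_{k,n} + m$, then

$$G^{\leftarrow}\left(\frac{n-k+1}{n+1}\right) \approx c\sigma x_{k,n} + \sigma m + \mu,$$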

which is also a straight line.

(c) The QQ-plot does not look like a straight line; it curves down in the right tail. This means that the distance $F^{\leftarrow}\left(\frac{n-k+1}{n+1}\right) - F^{\leftarrow}\left(\frac{n-k}{n+1}\right)$ is too large compared to $X_{k,n} - X_{k+1,n}$ when $k$ is small. That is, the distance between the high quantiles in the reference distribution is too large, which implies that the reference distribution has a too heavy right tail.

Problem 7

(a) By (1) we have $F(x) \approx 1 - \bar F(u_0)\bar G_{\gamma;\beta}(x - u_0)$ for $x > u_0$. Hence the density of $F$ can be approximated by

$$f(x) \approx \frac{d}{dx}\left(1 - \bar F(u_0)\bar G_{\gamma;\beta}(x - u_0)\right) = \bar F(u_0)\,g_{\gamma;\beta}(x - u_0), \quad x > u_0,$$

with $g_{\gamma;\beta}$ the density of $G_{\gamma;\beta}$. The mean excess function is then

$$e(u) = E(X - u \mid X > u) = \frac{1}{\bar F(u)}\int_u^{\infty}(x - u)f_X(x)\,dx$$

$$\approx \frac{1}{\bar F(u_0)\bar G_{\gamma;\beta}(u - u_0)}\int_u^{\infty}(x - u)\bar F(u_0)\,g_{\gamma;\beta}(x - u_0)\,dx$$

$$= \frac{1}{\bar G_{\gamma;\beta}(u - u_0)}\int_{u - u_0}^{\infty}\big(y - (u - u_0)\big)g_{\gamma;\beta}(y)\,dy$$

$$= \{\text{hint, with } a = 1/\gamma,\ b = \gamma/\beta\} = \left(1 + \frac{\gamma}{\beta}(u - u_0)\right)^{1/\gamma}\frac{\beta}{1 - \gamma}\left(1 + \frac{\gamma}{\beta}(u - u_0)\right)^{1 - 1/\gamma} = \frac{\beta + \gamma(u - u_0)}{1 - \gamma}.$$

(b) Since the mean excess function is linear for $u > u_0$, a reasonable estimate of $u_0$ is the point beyond which the empirical mean excess function looks linear. $u_0 \in (2.5, 3.5)$ seems to be a good estimate.

(c) The mean excess function is given by

$$\frac{\beta + \gamma(u - u_0)}{1 - \gamma}$$

for $u > u_0$, which is a straight line. We take $u_0 = 3$. At $u = u_0 = 3$ we have $e_n(3) \approx 1$ and at $u = 5$ we have $e_n(5) \approx 4$. We have to solve

$$\frac{\beta}{1 - \gamma} = 1, \qquad \frac{\beta}{1 - \gamma} + \frac{\gamma}{1 - \gamma}(5 - 3) = 4.$$

This system of equations has the solution $\gamma = 3/5$, $\beta = 2/5$. (Anything reasonably close to these values will give full points.)

Problem 8

(a) See the lecture notes.

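(b) For $u$ sufficiently large,

$$P\big((X - u)/\beta(u) > x \mid X > u\big) = P(X > u + x\beta(u) \mid X > u) = \frac{P(X > u + x\beta(u))}{P(X > u)} = \frac{(u + x\beta(u))^{-\alpha}}{u^{-\alpha}} = \big(1 + x\beta(u)/u\big)^{-\alpha}.$$

By choosing $\beta(u) = u/\alpha$ we see that

$$\lim_{u \to \infty} P\big((X - u)/\beta(u) > x \mid X > u\big) = (1 + x/\alpha)^{-\alpha} = \bar G_{1/\alpha,1}(x),$$

so (2) holds with $\gamma = 1/\alpha$.

(c) From Figure 7 we choose our Hill estimate of $\alpha$ as $\hat\alpha = 2$. Since we have chosen the threshold $u = 10$ we have $\hat\gamma = 1/\hat\alpha = 1/2$ and $\hat\beta(u) = u/\hat\alpha = 5$. Hence, with $p = 0.995$, $n = 500$ and $N_u = 10$,

$$\hat F^{-1}_{X,\mathrm{POT}}(p) = u + \frac{\hat\beta(u)}{\hat\gamma}\left(\left(\frac{n}{N_u}(1 - p)\right)^{-\hat\gamma} - 1\right) = 10 + 10\left(\left(\frac{500}{10}(1 - 0.995)\right)^{-1/2} - 1\right) = 10 + 10\left((0.25)^{-1/2} - 1\right) = 20.$$

For the empirical estimate, we have $[500(1 - 0.995)] + 1 = 2 + 1 = 3$. Hence, the empirical quantile estimate is the third largest observation: $\hat F_n^{-1}(0.995) \approx 19$.

Problem 9

(a) Note that $X$ and $Y$ are comonotonic, so that $(X, Y) \overset{d}{=} (F^{\leftarrow}(U), F^{\leftarrow}(U))$ where $U \sim U(0, 1)$. Hence, $(X, Y) \overset{d}{=} (X, X)$, and therefore $P(L > l) = P(X + X > l) = P(X > l/2)$.

We have

$$P(\tilde L > l) = P(\tilde L > l, X \le \varepsilon l) + P(\tilde L > l, \tilde Y \le \varepsilon l) + P(\tilde L > l, X > \varepsilon l, \tilde Y > \varepsilon l).$$

If $X + \tilde Y > l$ and $X \le \varepsilon l$, then we must have $\tilde Y > (1 - \varepsilon)l$. Similarly, if $X + \tilde Y > l$ and $\tilde Y \le \varepsilon l$, then we must have $X > (1 - \varepsilon)l$. Hence, using the independence of $X$ and $\tilde Y$,

$$P(\tilde L > l) \le P(\tilde Y > (1 - \varepsilon)l) + P(X > (1 - \varepsilon)l) + P(X > \varepsilon l)P(\tilde Y > \varepsilon l) = 2P(X > (1 - \varepsilon)l) + P(X > \varepsilon l)^2.$$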

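Therefore, for any $\varepsilon \in (0, 1/2)$,

$$\frac{P(\tilde L > l)}{P(L > l)} \le 2\,\frac{P(X > (1 - \varepsilon)l)}{P(X > l/2)} + \frac{P(X > \varepsilon l)}{P(X > l/2)}\,P(X > \varepsilon l) \to 2 \cdot 2^{-\alpha}(1 - \varepsilon)^{-\alpha} + 0, \quad \text{as } l \to \infty.$$

Since $\varepsilon \in (0, 1/2)$ can be chosen arbitrarily small,

$$\limsup_{l \to \infty} \frac{P(\tilde L > l)}{P(L > l)} \le 2^{1 - \alpha}.$$

For the lower bound we have

$$\frac{P(\tilde L > l)}{P(L > l)} \ge \frac{P(X > l) + P(\tilde Y > l) - P(X > l, \tilde Y > l)}{P(X > l/2)} = 2\,\frac{P(X > l)}{P(X > l/2)} - \frac{P(X > l)}{P(X > l/2)}\,P(X > l) \to 2 \cdot 2^{-\alpha} - 0, \quad \text{as } l \to \infty.$$

Hence,

$$\liminf_{l \to \infty} \frac{P(\tilde L > l)}{P(L > l)} \ge 2^{1 - \alpha}.$$

We conclude that

$$f(\alpha) = \lim_{l \to \infty} \frac{P(\tilde L > l)}{P(L > l)} = 2^{1 - \alpha}.$$

(b) For $\alpha < 1$ (very heavy tails) the default probability is actually higher for two independent lines of business. For $\alpha > 1$ the default probability is lower for two independent lines of business, asymptotically by the factor $2^{1 - \alpha}$. For large values of $\alpha$ (lighter tails) the default probability is much lower for independent lines of business.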
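To make the interpretation in Problem 9 (b) concrete, the limit $f(\alpha) = 2^{1-\alpha}$ and the upper bound $2^{1-\alpha}(1-\varepsilon)^{-\alpha}$ from part (a) can be tabulated. The sketch below is illustrative only: the function names `f` and `upper_bound` and the grids of $\alpha$ and $\varepsilon$ values are assumptions chosen for this example.

```python
def f(alpha):
    """Limit ratio of default probabilities, independent vs comonotonic lines."""
    return 2.0 ** (1.0 - alpha)


def upper_bound(alpha, eps):
    """Upper bound h(alpha, eps) = 2^(1 - alpha) * (1 - eps)^(-alpha) from part (a)."""
    return 2.0 ** (1.0 - alpha) * (1.0 - eps) ** (-alpha)


if __name__ == "__main__":
    # Illustrative tail indices: below, at, and above the critical value alpha = 1.
    for alpha in (0.5, 1.0, 2.0, 4.0):
        print(f"alpha = {alpha}: f(alpha) = {f(alpha):.4f}")

    # The upper bound approaches f(alpha) as eps decreases to zero.
    alpha = 2.0
    for eps in (0.4, 0.1, 0.01, 0.001):
        print(f"eps = {eps}: h = {upper_bound(alpha, eps):.6f}, f = {f(alpha):.6f}")
```

Values of $f(\alpha)$ above 1 correspond to tail indices for which independence increases the asymptotic default probability, and $f(1) = 1$ marks the boundary case.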