Time Series Models for Measuring Market Risk
José Miguel Hernández Lobato
Universidad Autónoma de Madrid, Computer Science Department
June 28, 2007
Outline
1. Introduction
2. Competitive and collaborative mixtures of experts
3. GARCH processes with non-parametric innovations
4. Conclusions and future work
Introduction
Risk, market risk and risk measures

Risk has two components: (1) uncertainty and (2) exposure. Market risk is caused by exposure to uncertainty in the market value of a portfolio.

The risk involved in a situation is captured by the probability distribution P of its possible outcomes. A risk measure ρ maps P to a risk measurement:

    ρ : D → ℝ ∪ {±∞},    (1)

where D is a set of probability distributions.
Risk measures: Value at Risk and Expected Shortfall

Value at Risk (VaR): the worst result within the α fraction of best results,

    ρ_VaR(P) = P^{-1}(1 − α).    (2)

Expected Shortfall (ES): the average result when the result is worse than the VaR,

    ρ_ES(P) = (1 / (1 − α)) ∫_{−∞}^{P^{-1}(1−α)} x dP(x).    (3)

α is usually high, e.g. 0.95 or 0.99.
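Given only a sample of outcomes, both measures can be estimated empirically. A minimal sketch of such an estimator (the function name and the interpolation choice are ours, not from the slides):

```python
import numpy as np

def var_es(outcomes, alpha=0.95):
    """Empirical VaR and ES at level alpha from a sample of outcomes
    (higher outcome = better result).  VaR is the (1 - alpha)-quantile
    of the sample, as in (2); ES is the mean of the outcomes at or
    below that quantile, a discrete analogue of (3)."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    var = np.quantile(x, 1.0 - alpha)   # worst result within the alpha fraction of best results
    es = x[x <= var].mean()             # average result when worse than (or equal to) the VaR
    return var, es
```

For the sample of outcomes 1, ..., 100 and α = 0.95 this gives a VaR of 5.95 (the interpolated 5% quantile) and an ES of 3 (the mean of 1, ..., 5).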
A risk measuring example

Figure: Density of the loss after holding 100 euros of an imaginary asset for one year. The 0.95 VaR is 29.5 euros and the 0.95 ES is 35.3 euros.

However, we do not know the distribution of future price changes in real financial assets! We have to estimate it from historical data.
Price variations

We focus on logarithmic returns: r_t = log(p_t) − log(p_{t−1}).

Figure: Left, daily prices of Microsoft stock from January 4, 2000 to April 18, 2007. Right, the corresponding daily percentage logarithmic returns.
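Computing percentage logarithmic returns from a price series is a one-liner; a sketch (the function name is ours):

```python
import numpy as np

def pct_log_returns(prices):
    """Percentage logarithmic returns: r_t = 100 * (log(p_t) - log(p_{t-1}))."""
    p = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(p))
```

A price move from 100 to 101 gives a log return of about 0.995%, close to but not equal to the 1% simple return.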
Statistical properties of returns

Returns show linear autocorrelation, time-dependent volatility and heavy tails.

Figure: Left, histogram of the normalized returns of Microsoft stock with a Gaussian density superimposed. Right, normal q-q plot of the returns.
Volatility models: GARCH processes

We say {r_t}_{t=1}^T follows a GARCH(1,1) process if:

    r_t = σ_t ε_t    (4)
    σ_t² = γ + α r_{t−1}² + β σ_{t−1}²,    (5)

where ε_t ~ N(0, 1), γ > 0, α ≥ 0, β ≥ 0 and α + β < 1.

The volatility process is modeled correctly (more or less). The heavy tails are not.
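The variance recursion (5) can be filtered directly once the parameters are given; a minimal sketch (initializing with a user-supplied σ_0 is one common choice, not specified on the slide):

```python
import numpy as np

def garch_volatility(returns, gamma, alpha, beta, sigma0):
    """Conditional volatilities of a GARCH(1,1) process:
    sigma_t^2 = gamma + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    assert gamma > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1
    sigma2 = np.empty(len(returns))
    sigma2[0] = sigma0 ** 2
    for t in range(1, len(returns)):
        sigma2[t] = gamma + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)
```

The constraint α + β < 1 guarantees a finite unconditional variance γ / (1 − α − β).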
Example

Figure: Left, Microsoft returns with ±2 conditional standard deviations estimated by a GARCH process superimposed. Right, normal q-q plot of the GARCH residuals.
Validating market risk models: Backtesting
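The slide gives no details, but a standard backtest counts the days on which the realized loss exceeds the reported VaR: under a correct model each day is an exceedance independently with probability 1 − α, so the observed count can be checked against a binomial distribution. A sketch of such a two-sided test (this particular construction is ours, not from the slides):

```python
from math import lgamma, log, exp

def binom_log_pmf(n, k, p):
    """log P(X = k) for X ~ Binomial(n, p), in log space so large n
    does not overflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))

def exceedance_pvalue(n_days, n_exceed, alpha=0.99):
    """Two-sided binomial p-value for the number of VaR exceedances:
    sum the probabilities of all outcomes no more likely than the
    observed one."""
    p = 1.0 - alpha
    obs = binom_log_pmf(n_days, n_exceed, p)
    total = sum(exp(lp) for lp in
                (binom_log_pmf(n_days, k, p) for k in range(n_days + 1))
                if lp <= obs + 1e-12)
    return min(1.0, total)
```

With 100 days at α = 0.99, one exceedance is exactly what is expected, while ten exceedances are strong evidence against the model.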
Competitive and collaborative mixtures of experts
Mixture of experts paradigm

Figure: Mixture-of-experts architecture. The input feeds a gating network and n experts; the gating network's activations weight the experts' outputs, which a mixture model combines into the final output.
The model

Volatility is modeled by a GARCH process with mean µ:

    r_t = µ + σ_t ε_t    (6)
    σ_t² = γ + α (r_{t−1} − µ)² + β σ_{t−1}².    (7)

We eliminate heteroskedasticity with the normalization:

    z_t = (r_t − µ) / σ_t.    (8)

Linear autocorrelations and heavy tails in {z_t}_{t=1}^T are modeled by competitive or collaborative mixtures of experts.
Example of a normalization process

Figure: A series of daily log-returns over 2500 days and the corresponding normalized log-returns.
The experts

The experts are first-order autoregressive processes:

    z_t = φ_0 + φ_1 z_{t−1} + σ ε_t,    (9)

where |φ_1| < 1, ε_t ~ N(0, 1) and σ > 0.

Mixtures include up to 3 experts on a single level.
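A sketch of one expert's generative process (9), useful for checking the role of the stationarity constraint |φ_1| < 1 (the simulator below is ours):

```python
import numpy as np

def simulate_ar1(phi0, phi1, sigma, n, z0=0.0, seed=0):
    """Simulate z_t = phi0 + phi1 * z_{t-1} + sigma * eps_t, eps_t ~ N(0, 1)."""
    assert abs(phi1) < 1 and sigma > 0   # stationarity and positive scale
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    prev = z0
    for t in range(n):
        prev = phi0 + phi1 * prev + sigma * rng.standard_normal()
        z[t] = prev
    return z
```

Under |φ_1| < 1 the process is stationary, with mean φ_0 / (1 − φ_1) and variance σ² / (1 − φ_1²).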
The gating network

Figure: The gating network takes a constant bias input 1 and z_{t−1}, and produces a softmax activation for each of the M experts.
The strategies

Collaboration: the output of the mixture is a weighted average of the experts' outputs. The gating network determines the weights.

Hard competition: only one expert is active at a given time. The gating network outputs 1 for that expert and 0 for the others.

Soft competition: the gating network's output determines the probability that each expert is the single one generating the output of the system.
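The three strategies differ only in how the gate's softmax weights are used. A schematic sketch for scalar expert outputs (function names are ours; soft competition is shown as a random draw):

```python
import numpy as np

def softmax(a):
    e = np.exp(np.asarray(a, dtype=float) - np.max(a))   # shift by max for numerical stability
    return e / e.sum()

def combine(expert_outputs, gate_activations, strategy, rng=None):
    """Combine the experts' outputs according to the gating strategy."""
    w = softmax(gate_activations)
    y = np.asarray(expert_outputs, dtype=float)
    if strategy == "collaboration":   # weighted average, weights from the gate
        return float(w @ y)
    if strategy == "hard":            # only the most active expert fires
        return float(y[np.argmax(w)])
    if strategy == "soft":            # w = probability of each expert generating the output
        rng = rng or np.random.default_rng()
        return float(y[rng.choice(len(y), p=w)])
    raise ValueError(f"unknown strategy: {strategy}")
```

For density forecasting, soft competition amounts to predicting a mixture of the experts' Gaussians, which is what lets it capture heavy tails.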
Fitting the mixtures

Models are fitted by maximum likelihood.

Optimization problems: sometimes an expert gets stuck on a single data point close to its mean, which makes the variance of that expert tend to 0.

Solution based on quasi-Bayesian techniques: we include a prior on the variances of the experts, as if we had observed 0.1 points with sample variance 1.5 known to have been generated by each expert. This penalizes small variances.
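One way to realize this quasi-Bayesian penalty is to add the pseudo-observations to the responsibility-weighted variance update; the exact formula is not on the slide, so the following is a hedged sketch of the idea:

```python
import numpy as np

def regularized_variance(x, resp, mean, n0=0.1, s0sq=1.5):
    """Variance update for one expert with a quasi-Bayesian prior:
    behave as if n0 extra points with sample variance s0sq had been
    generated by this expert, keeping the estimate away from zero
    when the expert collapses onto a single data point."""
    x = np.asarray(x, dtype=float)
    resp = np.asarray(resp, dtype=float)   # responsibilities in [0, 1]
    ss = np.sum(resp * (x - mean) ** 2)    # weighted sum of squared deviations
    return (ss + n0 * s0sq) / (resp.sum() + n0)
```

Even if the expert claims a single point exactly equal to its mean, the estimated variance stays at n0·s0sq / (1 + n0) ≈ 0.136 rather than collapsing to zero.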
Sliding window experiment

We take the time series of log-returns of the Spanish stock index IBEX 35 from 12/29/1989 to 1/31/2006: 4034 points. Each mixture is trained on a normalized window of 1000 points and tested on the first out-of-sample point. The window then moves forward and the process is repeated, yielding 3034 points that should be standard normally distributed if the model is correct. We apply statistical tests to those points.
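The experimental loop can be sketched generically; here `fit` and `predict` stand for any of the mixture models above (the names and interface are ours):

```python
def sliding_window(series, window, fit, predict):
    """Train on each window of `window` points and evaluate the fitted
    model on the first out-of-sample point; a series of length T yields
    T - window evaluation points."""
    out = []
    for start in range(len(series) - window):
        model = fit(series[start:start + window])
        out.append(predict(model, series[start + window]))
    return out
```

With T = 4034 and window = 1000 this produces the 3034 test points mentioned above.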
Test results

p-values of the tests at the 99% and 95% levels (CL = collaboration, SC = soft competition, HC = hard competition):

                          VaR Exc           Expt. Shortfall
    #AR(1)  Strategy    99%     95%      99%     95%      99%     95%
    1       –           0.01    0.93     0.07    0.95     3e-8    2e-3
    2       CL          0.01    0.19     0.03    0.39     2e-9    1e-4
    2       SC          0.26    0.56     0.50    0.72     0.15    0.13
    2       HC          0.05    0.48     0.07    0.60     3e-6    2e-3
    3       CL          0.02    0.08     0.02    0.17     8e-7    2e-4
    3       SC          0.44    0.31     0.39    0.54     0.16    0.10
    3       HC          0.02    0.07     0.03    0.39     4e-6    5e-4
Analysis

A single Gaussian distribution with time-dependent mean and variance (hard competition and collaboration) fails to model the heavy tails of the conditional distribution of log-returns. Soft competitive models predict the conditional distribution as a mixture of Gaussians, a much more expressive model.
GARCH processes with non-parametric innovations
The model

A power autoregressive GARCH(1,1,1) process with non-parametric innovations:

    r_t = φ_0 + φ_1 r_{t−1} + σ_t ε_t    (10)
    σ_t = γ + α |r_{t−1} − φ_0 − φ_1 r_{t−2}| + β σ_{t−1},    (11)

where ε_t ~ f̂(x; {c_i}_{i=1}^N) and

    f̂(x; {c_i}_{i=1}^N) = (1 / (N h)) Σ_{i=1}^N K((x − c_i) / h),    (12)

where K is the standard Gaussian density. θ = {φ_0, φ_1, γ, α, β}.
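The innovation density (12) is straightforward to evaluate; a vectorized sketch:

```python
import numpy as np

def kde(x, centers, h):
    """Gaussian kernel density estimate
    f_hat(x) = (1 / (N h)) * sum_i K((x - c_i) / h),
    with K the standard Gaussian density."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    c = np.asarray(centers, dtype=float)
    z = (x[:, None] - c[None, :]) / h
    K = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h
```

With a single center at 0 and h = 1 this reduces to the standard Gaussian density, e.g. f̂(0) = 1/√(2π) ≈ 0.3989.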
Training procedure: maximum likelihood

Let u_t = r_t − φ_0 − φ_1 r_{t−1} be the empirical autoregressive residual of the process (u_0 = 0, u_1 = r_1 − µ̂ and σ_0 = σ̂). We maximize:

    L(θ, {c_i}_{i=1}^N | {r_t}_{t=1}^T) = Π_{t=1}^T (1 / σ_t) f̂(u_t / σ_t; {c_i}_{i=1}^N).    (13)

If N = T we just have to repeat:
1. Standardize {u_t / σ_t}_{t=1}^T.
2. Hold θ fixed and maximize (13) with respect to {c_i}_{i=1}^T by setting c_t = u_t / σ_t, t = 1, ..., T.
3. Fix h by means of {c_i}_{i=1}^T and Silverman's rule.
4. Hold {c_i}_{i=1}^T fixed and maximize (13) with respect to θ by means of a non-linear optimization routine.
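Step 3 uses Silverman's rule. A common form of the rule of thumb for a Gaussian kernel (whether the model uses exactly this variant is an assumption on our part):

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth:
    h = 0.9 * min(sample std, IQR / 1.34) * N^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    scale = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * scale * len(x) ** (-0.2)
```

For N standardized centers the bandwidth shrinks like N^(-1/5), so the thousands of kernels stay narrow but never degenerate.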
Problem

Kernel density estimates have difficulty modeling leptokurtic densities. Solution: perform the estimation in a transformed space where the data look more Gaussian.

Figure: Left, log-density of a kernel density estimate for IBM returns. Right, the density estimate obtained in a transformed space.
Choosing a good transformation

We choose g_λ(x) = Φ^{-1}(F̂_λ(x)), where Φ is the standard Gaussian distribution function and F̂_λ is the distribution function of a stable distribution with parameters λ. λ is found by maximum likelihood on {u_t / σ_t}_{t=1}^T.

Figure: Log-densities of stable distributions for several values of the stability parameter a (from 1.00 to 2.00) and of the skewness parameter b (from −0.75 to 0.75).
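The transformation simply composes a heavy-tailed CDF with the Gaussian quantile function. A stdlib-only sketch; the logistic CDF below is only an illustrative stand-in for the fitted stable CDF F̂_λ:

```python
import math
from statistics import NormalDist

_PHI = NormalDist()   # standard Gaussian distribution

def gaussianize(x, cdf):
    """g(x) = Phi^{-1}(F(x)): push x through a heavy-tailed CDF F and back
    through the standard Gaussian quantile function, so heavy-tailed data
    look more Gaussian in the transformed space."""
    return _PHI.inv_cdf(cdf(x))

def logistic_cdf(x):
    """Stand-in for the stable CDF F_lambda (heavier tails than the Gaussian)."""
    return 1.0 / (1.0 + math.exp(-x))
```

The map g is monotone and sends the median of F to 0; large |x| are pulled inward, which is exactly what makes the subsequent kernel density estimation easier.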
Sliding window experiment

We perform another sliding window experiment with the returns of IBM stocks from 1962/07/03 to 1998/12/31: 9190 points.

Figure: Daily returns of IBM stock; annotations mark Black Monday and an unexpected loss.
Test results

p-values of the tests at the 99% and 95% levels:

                            VaR Exc             Expt. Shortfall
    Model               99%      95%      99%      95%      99%      95%
    stable              1.5e-3   0.01     3.7e-5   0.08     0.026    4e-4
    non-parametric      0.51     0.66     0.49     0.86     0.042    0.31
    mix. of 2 exp.      5.7e-3   0.30     0.025    0.19     7e-7     7.2e-4
Conclusions and future work
Conclusions

- Asset returns show time-dependent volatility and heavy tails.
- Soft competitive mixtures predict a distribution which is a mixture of a few Gaussians and can account for the heavy tails.
- GARCH processes with non-parametric innovations outperform soft competitive mixtures because they employ constrained mixtures of thousands of Gaussians without causing overfitting.
Future work

- Risk measurement in multi-asset portfolios.
- Use high-frequency data to build improved models.
- Measure risk at longer horizons.
- Employ different risk measures.