High-frequency data modelling using Hawkes processes

High-frequency data modelling using Hawkes processes Valérie Chavez-Demoulin 1 joint work J.A McGill 1 Faculty of Business and Economics, University of Lausanne, Switzerland Boulder, April 2016 Boulder, April 2016 1 / 25

High frequency financial data The availability of high frequency data on transactions, quotes and order flow has revolutionized data processing and has led to statistical modeling challenges At the same time, the frequency of submission of orders has increased and the time to execution of market orders has dropped from more than 25 milliseconds in 2000 to less than a millisecond in 2014. Average number Number of price of orders in 10s changes (June 26, 2008) Citigroup 4469 12499 General Electric 2356 7862 General Motors 1275 9016 Table: Taken from R.Cont, 2011, http://ssrn.com/abstract=1748022 Boulder, April 2016 2 / 25

Trade database structure SYMBOL DATE TIME PRICE SIZE G127 CORR COND NVDA 20060901 9:30:04 28.67 264600 0 0 - NVDA 20060901 9:30:04 28.67 264509 0 2 - NVDA 20060901 9:30:05 28.65 100 0 0 - NVDA 20060901 9:30:05 28.65 200 0 0 -........ Price 36.5 37.0 37.5 38.0 38.5 09:30:03 10:15:19 11:35:30 13:54:00 15:02:55 15:45:54 Presence of erroneous trades: moving average filtering Irregularly spaced in time: linear interpolation Boulder, April 2016 3 / 25

NVDA negative log-returns from 2006 to 2008 (15-minutes linearly interpolated) Negative log returns 0.1 0.0 0.1 0.2 0.3 2006 01 03 2007 01 03 2008 01 02 2008 12 31 Time(15min intervals) NVDA, July to December 2008 negative returns exceedances : Negative log returns 0.02 0.04 0.06 0.08 2008 07 24 2008 09 03 2008 10 13 2008 11 20 2008 12 31 Time(15min intervals) Boulder, April 2016 4 / 25

Extreme Value Theory: Classical Peaks-Over-Threshold (iid case) 20 40 60 80 x 0 20 40 60 80 100 Time The number of exceedances over u has a Poisson distribution with mean λ Conditional on n exceedances, their sizes W j = Z j u are a random sample of size n from the generalized Pareto distribution (GPD) G ξ,σ (w) = { 1 (1 + ξw/σ) 1/ξ +, ξ 0, 1 exp( w/σ), ξ = 0. Boulder, April 2016 5 / 25

NVDA log-returns from 2006 to 2008 (15-minutes interpolated) time differences between exceedances over high threshold Difference in time 0 100 200 300 0 20 40 60 80 100 120 Exponential does not seem exponentially distributed higher than expected frequencies for short distance between extremes Boulder, April 2016 6 / 25

High-frequency financial data features Returns not iid Absolute returns highly correlated Volatility appears to change randomly with time Returns are leptokurtic or heavy tailed Extremes appear in clusters Boulder, April 2016 7 / 25

Declustering methods exist: An example for daily returns Original data Returns in % 0.04 0.08 0.12 1975 1980 1985 1990 1995 Declustered data Returns in % 0.04 0.08 0.12 1975 1980 1985 1990 1995 Boulder, April 2016 8 / 25

Idea the classical Peaks-Over-Thresholds (POT) model of EVT assumes an homogeneous Poisson process the inter-event times would be independent exponential random variables the Figures above contradict the classical model Aim: Treats the negative log-returns above a high threshold u as a realization of a marked point process: Using a first model that combines a self-exciting process for the threshold exceedance times with Generalized Pareto model for the threshold excess sizes (mark sizes) Using a second model that considers a self-exciting POT model Boulder, April 2016 9 / 25

Overall loglikelihood (POT method) Since number and mark size for the threshold exceedances are assumed independent n l(λ, σ, ξ) = log P(N u = n) g ξ,σ (w j ) j=1. = n log λ λ n log σ (1 + 1/ξ) n log(1 + ξw j /σ) + so inference can be performed separately for the frequency of exceedances and their sizes. j=1 Boulder, April 2016 10 / 25

Marked Point Processes (1) consider an event process (T i, W i ), i Z observed over the period (0; t 0 ] gives data (t 1, w 1 ),..., (t n, w n ) let H t, entire history of process up to time t the joint density of the data seen over (0, t 0 ] can be written n f Ti,W i H ti 1 (t i, w i H ti 1 )P(T n+1 > t 0 H tn ) i=1 Boulder, April 2016 11 / 25

Marked Point Processes (2) Assuming independence of times T i and marks W i the conditional density function is f Ti,W i H ti 1 (t i, w i H ti 1 ) = f Ti H ti 1 (t i H ti 1 ) f Wi H ti 1 (w i H ti 1 ) leading to the log-likelihood l = { n log f Ti H i 1 (t i ) + log P(T n+1 > t 0 H tn ) i=1 { n + log f Wi H ti 1 (w i H ti 1 ) i=1 } B } A Boulder, April 2016 12 / 25

Hawkes process Combining separately a Hawkes process for the counting process of the times (part A) and a GPD model for the contributions from the marks (part B), the likelihood becomes { n ( t0 ) } l = log λ(t i H ti ) exp λ(u H u )du i=1 0 n n log σ (1 + 1/ξ) log(1 + ξw i /σ) i=1 with λ(t H t ) being the conditional intensity Boulder, April 2016 13 / 25

Self-exciting process We chose the following general form with a constant loading µ 0 λ(t H t ) = µ 0 + ω(t t j ; w j ), j:t j <t where µ 0 > 0 and the self-exciting function ω(s) has to be positive when s 0 and 0 elsewhere. For example, the conditional intensity model proposed by Ogata (1988) is λ(t H t ) = µ 0 + ψ where µ 0, ψ, δ, γ, ρ > 0. j:t j <t e δw j (t t j + γ) 1+ρ ETAS Boulder, April 2016 14 / 25

Applications to high-frequency data: 15-minutes linearly interpolated NVDA 2008 negative log-returns Negative log returns 0.10 0.05 2008 01 02 2008 04 03 2008 07 02 2008 10 01 2008 12 31 Time(15min intervals) Estimated intensity 0.00 0.06 0.12 2008 01 02 2008 04 03 2008 07 02 2008 10 01 2008 12 31 Boulder, April 2016 15 / 25

Estimated cumulative events number ˆΛ Ht (t) = t 0 ˆλ(u H u )du and two-sided 95% and 99% confidence limits based on the Kolmogorov-Smirnov statistic Cumulative number of extreme losses 0 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 Transformed time Boulder, April 2016 16 / 25

Alternative models (1) Hawkes process with exponential decay λ(t H t ) = µ 0 + ψ exp{δw j γ(t t j )}. j:t j <t First order Markov chain for the marks W i W i 1 = w i GPD(ξ, σ i = exp(a + bw i 1 )) whose parameters a, b and ξ can be estimated by maximising n f Wi W i 1 (w i w i 1 )f w1 (w 1 ) i=2 Diagnostic for the marginal distribution based on the residuals R j = ˆξ 1 log (1 + ˆξw j /ˆσ j ), j = 1,..., n approximately independent unit exponential variables Boulder, April 2016 17 / 25

Alternative models (2): A self-exciting POT model A model with predictable marks λ (t, w) = τ + α ( 1ν(t) w u 1 + ξ σ + α 2 ν(t) σ + α 2 ν(t) ) 1/ξ 1 Where ν(t) = j:t j <t ω(t t j ) uses a self-exciting function ω. This uses the two-dimensional re-parametrisation of the POT: Homogeneous Poisson process for the exceedances of the level u with intensity τ = ln H(u; µ, β, ξ) = {1 + ξ(u µ)/β} 1/ξ +, where H(y; µ, β, ξ) is the GEV distribution and σ = β + ξ(u µ) Inhomogeneous Poisson process for their sizes with intensity λ(t, w) = λ(w) = 1 σ ( 1 + ξ w u ) 1/ξ 1 σ Boulder, April 2016 18 / 25

Estimated intensity of exceeding the threshold from self-exciting POT model for NVDA with Ogata function Marks amplitude in percent 2 4 6 8 10 0 1000 2000 3000 4000 5000 6000 7000 Time (15 min intervals) Intensity 0.1 0.3 0 1000 2000 3000 4000 5000 6000 7000 Time (15 min intervals) mean of the GPD 0.8 1.0 1.2 1.4 0 1000 2000 3000 4000 5000 6000 7000 Time (15 min intervals) Boulder, April 2016 19 / 25

Conditional risk measures The conditional approach models the intraday clustering of extremes and is used to calculate instantaneous conditional p =95% or 99% VaR over the next period z t+1 p = F 1 Z t+1 H (p), t F Zt+1 H (z) = P(Z t+1 > z H t ) which is t P(Z t+1 u > z Z t+1 > u, H t ) P(Z t+1 > u H t ). P(Z t+1 > u H t ) represents the conditional probability of an event in (t, t + 1) that is: ( t+1 ) 1 P{N(t, t + 1) = 0 H t } = 1 exp λ(u H u )du P(Z T +1 u > z Z t+1 > u, H t ) is estimated using the GPD model with parameters σ and ξ. t Boulder, April 2016 20 / 25

Conditional VaR and Expected Shortfall The VaR z t+1 p is then the solution of the equation P(Z T +1 > z t+1 p H t ) = 1 p, Using the self-exciting POT model the condition VaR is z t+1 p = u + σ + α 2ν(t+) ξ { ( ) 1 p ξ 1} τ + α 1 ν(t+) From the estimated conditional VaR it is straightforward to get an estimation of the conditional Expected Shortfall (ES) ES t+1 p = zt+1 p 1 ξ + σ ξu 1 ξ. with parameters replaced by their maximum likelihood estimates Boulder, April 2016 21 / 25

15-minutes 95%VaR (red) and 95%ES (blue) estimates for NVDA negative log-returns using self-exciting POT models and 95%VaR using classical POT (green) Negative log returns 0.10 0.05 0.00 0.05 2008 07 24 2008 09 03 2008 10 13 2008 11 20 2008 12 31 Times (15 min intervals) Boulder, April 2016 22 / 25

15-minutes 95%VaR and 95%ES estimates for NVDA negative log-returns using a competitor model (Chavez-Demoulin et al. (2014)) Negative log returns 0.10 0.05 0.00 0.05 2008 07 24 2008 09 03 2008 10 13 2008 11 20 2008 12 31 Time Boulder, April 2016 23 / 25

Backtesting results 0.95 Quantile Expected 25 POT 59 (0) ETAS ρ = 0 21 (0.24) ETAS ρ 0 22 (0.31) ETAS ρ = 0 24 (0.47) ETAS ρ 0 23 (0.39) exp 25 (0.55) NPOT 26 (0.39) 0.99 Quantile Expected 5 POT 14 (0) ETAS ρ = 0 6 (0.39) exp 6 (0.39) ETAS ρ = 0 6 (0.39) exp 7 (0.24) NPOT 9 (0.11) Boulder, April 2016 24 / 25

References A.G. Hawkes, 1971, Point spectra of some mutually exciting point processes, Biometrika. Y. Ogata, 1988, Statistical models for earthquake Occurrences and residuals analysis for point processes, JASA. E. Errais, K. Giesecke and L.R. Goldberg, 2010, Affine point processes and portfolio credit risk, SIAM J. Financial Math. Y. AIt-Sahalia, J. Cacho-Diaz, and R. J. A. Laeven, 2010, Modeling financial contagion using mutually exciting jump processes, The Review of Financial Studies. V. Chavez-Demoulin, A.C. Davison, A.J. McNeil, 2005, A point process approach to Value-at-Risk estimation, Quantitative Finance. V. Chavez-Demoulin, P. Embrechts, S.Sardy, 2014 Extreme-quantile tracking for financial time series Journal of Econometrics. Boulder, April 2016 25 / 25