High-frequency data modelling using Hawkes processes

Valérie Chavez-Demoulin (joint work with J.A. McGill)
Faculty of Business and Economics, University of Lausanne, Switzerland
Boulder, April 2016

High frequency financial data
- The availability of high-frequency data on transactions, quotes and order flow has revolutionized data processing and has led to statistical modelling challenges.
- At the same time, the frequency of order submission has increased, and the time to execution of market orders has dropped from more than 25 milliseconds in 2000 to less than a millisecond in 2014.
                   Average number      Number of price changes
                   of orders in 10s    (June 26, 2008)
Citigroup          4469                12499
General Electric   2356                7862
General Motors     1275                9016
Table: Taken from R. Cont, 2011, http://ssrn.com/abstract=1748022

Trade database structure
SYMBOL  DATE      TIME     PRICE  SIZE    G127  CORR  COND
NVDA    20060901  9:30:04  28.67  264600  0     0     -
NVDA    20060901  9:30:04  28.67  264509  0     2     -
NVDA    20060901  9:30:05  28.65  100     0     0     -
NVDA    20060901  9:30:05  28.65  200     0     0     -
...
[Figure: intraday price path; y-axis: Price, x-axis: time of day]
- Presence of erroneous trades: moving average filtering
- Irregularly spaced in time: linear interpolation
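The interpolation step can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing: the function names, the tick values and the 15-minute grid in seconds are all assumptions, and the moving-average filter for erroneous trades is not shown because the slide does not specify it.

```python
import math

def interpolate_to_grid(times, prices, grid):
    """Linearly interpolate irregularly spaced (time, price) ticks onto a grid.

    Assumes `times` is non-decreasing and `grid` is increasing; grid points
    outside the observed range take the first or last observed price.
    """
    out = []
    j = 0  # index of the tick interval currently bracketing the grid point
    for g in grid:
        if g <= times[0]:
            out.append(prices[0])
        elif g >= times[-1]:
            out.append(prices[-1])
        else:
            while times[j + 1] < g:
                j += 1
            weight = (g - times[j]) / (times[j + 1] - times[j])
            out.append(prices[j] + weight * (prices[j + 1] - prices[j]))
    return out

def negative_log_returns(prices):
    """Negative log-returns, so that losses appear as positive values."""
    return [-(math.log(b) - math.log(a)) for a, b in zip(prices, prices[1:])]

# Hypothetical ticks: seconds since the open and trade prices.
tick_times = [0, 600, 1500, 2700]
tick_prices = [28.67, 28.65, 28.70, 28.60]
grid = [0, 900, 1800, 2700]  # one grid point every 15 minutes
grid_prices = interpolate_to_grid(tick_times, tick_prices, grid)
losses = negative_log_returns(grid_prices)
```

Working with negative log-returns means the later threshold exceedances correspond to large losses.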

NVDA negative log-returns from 2006 to 2008 (15-minute, linearly interpolated)
[Figure: negative log-returns against time (15-min intervals), 2006-01-03 to 2008-12-31]
NVDA, July to December 2008: negative return exceedances
[Figure: negative log-returns against time (15-min intervals), 2008-07-24 to 2008-12-31]

Extreme Value Theory: Classical Peaks-Over-Threshold (iid case)
[Figure: observations x against Time]
- The number of exceedances over $u$ has a Poisson distribution with mean $\lambda$.
- Conditional on $n$ exceedances, their sizes $W_j = Z_j - u$ are a random sample of size $n$ from the generalized Pareto distribution (GPD)
$$G_{\xi,\sigma}(w) = \begin{cases} 1 - (1 + \xi w/\sigma)_+^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp(-w/\sigma), & \xi = 0. \end{cases}$$
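As a small illustration of the two ingredients of the POT model, the sketch below extracts the excesses $W_j = Z_j - u$ and evaluates the GPD distribution function and its inverse. The function names and the sample values are assumptions made for the example.

```python
import math

def excesses(z, u):
    """Sizes W_j = Z_j - u of the observations that exceed the threshold u."""
    return [x - u for x in z if x > u]

def gpd_cdf(w, xi, sigma):
    """GPD distribution function G_{xi,sigma}(w) for an excess w."""
    if w <= 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-w / sigma)
    base = max(1.0 + xi * w / sigma, 0.0)  # the (.)_+ truncation
    return 1.0 - base ** (-1.0 / xi)

def gpd_quantile(p, xi, sigma):
    """Inverse of gpd_cdf for 0 <= p < 1."""
    if xi == 0.0:
        return -sigma * math.log(1.0 - p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

# Hypothetical negative log-returns and threshold.
sample = [0.001, 0.025, 0.004, 0.031, 0.012]
w = excesses(sample, u=0.02)
q90 = gpd_quantile(0.9, xi=0.2, sigma=1.5)
```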

NVDA log-returns from 2006 to 2008 (15-minute, interpolated): time differences between exceedances over a high threshold
[Figure: differences in time between exceedances compared with an exponential reference; axis labels: Difference in time, Exponential]
- The time differences do not seem exponentially distributed.
- Frequencies are higher than expected for short distances between extremes.
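One way to produce such a comparison is an exponential QQ-plot of the gaps between exceedances. The slide does not say how its figure was built, so the sketch below is an assumption: it pairs the sorted gaps with quantiles of an exponential distribution that has the same mean, using plotting positions (i + 0.5)/n.

```python
import math

def exponential_qq_points(gaps):
    """Pairs (theoretical exponential quantile, observed gap) for a QQ-plot."""
    n = len(gaps)
    mean_gap = sum(gaps) / n
    observed = sorted(gaps)
    theoretical = [-mean_gap * math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# Hypothetical gaps (in 15-minute intervals): many short, a few very long.
gaps = [1, 1, 2, 1, 3, 40, 1, 2, 95, 1]
points = exponential_qq_points(gaps)
```

Under a homogeneous Poisson process the points should lie close to the diagonal; an excess of short gaps pulls the lower points below it.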

High-frequency financial data features
- Returns are not iid
- Absolute returns are highly correlated
- Volatility appears to change randomly with time
- Returns are leptokurtic or heavy-tailed
- Extremes appear in clusters

Declustering methods exist: an example for daily returns
[Figure, two panels: original data and declustered data; returns in % against year, 1975 to 1995]

Idea
- The classical Peaks-Over-Threshold (POT) model of EVT assumes a homogeneous Poisson process.
- The inter-event times would then be independent exponential random variables.
- The figures above contradict the classical model.
Aim: treat the negative log-returns above a high threshold $u$ as a realization of a marked point process:
- using a first model that combines a self-exciting process for the threshold exceedance times with a generalized Pareto model for the threshold excess sizes (mark sizes);
- using a second model that considers a self-exciting POT model.

Overall log-likelihood (POT method)
Since the number and the mark sizes of the threshold exceedances are assumed independent,
$$l(\lambda, \sigma, \xi) = \log\Big\{ P(N_u = n) \prod_{j=1}^{n} g_{\xi,\sigma}(w_j) \Big\} \doteq n \log\lambda - \lambda - n\log\sigma - (1 + 1/\xi) \sum_{j=1}^{n} \log(1 + \xi w_j/\sigma)_+$$
so inference can be performed separately for the frequency of exceedances and their sizes.
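The separation can be made concrete by returning the two parts of the log-likelihood individually. This is a sketch under the formula above, with the additive constant dropped; the function name and the return convention are assumptions.

```python
import math

def pot_loglik(lam, sigma, xi, excesses):
    """Return (frequency part, size part) of the POT log-likelihood.

    frequency part: n log(lam) - lam
    size part:      -n log(sigma) - (1 + 1/xi) * sum log(1 + xi w_j / sigma)
    For xi == 0 the size part is the exponential limit -n log(sigma) - sum w_j / sigma.
    """
    n = len(excesses)
    frequency = n * math.log(lam) - lam
    size = -n * math.log(sigma)
    for w in excesses:
        if xi == 0.0:
            size -= w / sigma
        else:
            base = 1.0 + xi * w / sigma
            if base <= 0.0:  # excess outside the GPD support
                return frequency, float("-inf")
            size -= (1.0 + 1.0 / xi) * math.log(base)
    return frequency, size

freq_part, size_part = pot_loglik(3.0, 1.0, 0.0, [1.0, 2.0, 3.0])
```

Because `lam` enters only the first part, its maximiser is the observed number of exceedances, while `sigma` and `xi` can be fitted from the excesses alone.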

Marked Point Processes (1)
- Consider an event process $(T_i, W_i)$, $i \in \mathbb{Z}$; observing it over the period $(0, t_0]$ gives data $(t_1, w_1), \ldots, (t_n, w_n)$.
- Let $H_t$ denote the entire history of the process up to time $t$.
- The joint density of the data seen over $(0, t_0]$ can be written
$$\prod_{i=1}^{n} f_{T_i, W_i \mid H_{t_{i-1}}}(t_i, w_i \mid H_{t_{i-1}}) \; P(T_{n+1} > t_0 \mid H_{t_n})$$

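Marked Point Processes (2)
Assuming independence of the times $T_i$ and the marks $W_i$, the conditional density function is
$$f_{T_i, W_i \mid H_{t_{i-1}}}(t_i, w_i \mid H_{t_{i-1}}) = f_{T_i \mid H_{t_{i-1}}}(t_i \mid H_{t_{i-1}}) \, f_{W_i \mid H_{t_{i-1}}}(w_i \mid H_{t_{i-1}})$$
leading to the log-likelihood
$$l = \underbrace{\Big\{ \sum_{i=1}^{n} \log f_{T_i \mid H_{t_{i-1}}}(t_i \mid H_{t_{i-1}}) + \log P(T_{n+1} > t_0 \mid H_{t_n}) \Big\}}_{A} + \underbrace{\Big\{ \sum_{i=1}^{n} \log f_{W_i \mid H_{t_{i-1}}}(w_i \mid H_{t_{i-1}}) \Big\}}_{B}$$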

Hawkes process
Combining separately a Hawkes process for the counting process of the times (part A) and a GPD model for the contributions from the marks (part B), the log-likelihood becomes
$$l = \log\Big\{ \prod_{i=1}^{n} \lambda(t_i \mid H_{t_i}) \exp\Big( -\int_0^{t_0} \lambda(u \mid H_u)\,du \Big) \Big\} - n\log\sigma - (1 + 1/\xi) \sum_{i=1}^{n} \log(1 + \xi w_i/\sigma)$$
with $\lambda(t \mid H_t)$ being the conditional intensity.

Self-exciting process
We chose the following general form with a constant loading $\mu_0$:
$$\lambda(t \mid H_t) = \mu_0 + \sum_{j: t_j < t} \omega(t - t_j; w_j),$$
where $\mu_0 > 0$ and the self-exciting function $\omega(s)$ has to be positive when $s \geq 0$ and 0 elsewhere.
For example, the conditional intensity model proposed by Ogata (1988) is the ETAS model
$$\lambda(t \mid H_t) = \mu_0 + \psi \sum_{j: t_j < t} \frac{e^{\delta w_j}}{(t - t_j + \gamma)^{1+\rho}}$$
where $\mu_0, \psi, \delta, \gamma, \rho > 0$.
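A direct evaluation of the ETAS conditional intensity can be sketched as below. The function name and the numerical values are illustrative assumptions; the sum runs over all past events, so the cost is linear in the history length for each evaluation.

```python
import math

def etas_intensity(t, times, marks, mu0, psi, delta, gamma, rho):
    """ETAS conditional intensity at time t given past events (t_j, w_j), t_j < t."""
    total = 0.0
    for tj, wj in zip(times, marks):
        if tj < t:
            total += math.exp(delta * wj) / (t - tj + gamma) ** (1.0 + rho)
    return mu0 + psi * total

# Hypothetical history: one exceedance at time 1 with mark 0.5.
lam_before = etas_intensity(0.5, [1.0], [0.5], 0.05, 0.1, 2.0, 1.0, 0.5)
lam_after = etas_intensity(2.0, [1.0], [0.5], 0.05, 0.1, 2.0, 1.0, 0.5)
```

Larger marks raise the intensity more strongly through the factor exp(delta * w_j), and the power-law kernel lets the effect decay slowly.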

Applications to high-frequency data: 15-minute linearly interpolated NVDA 2008 negative log-returns
[Figure, two panels over 2008-01-02 to 2008-12-31: negative log-returns against time (15-min intervals); estimated intensity]

Estimated cumulative events number
$$\hat\Lambda_{H_t}(t) = \int_0^t \hat\lambda(u \mid H_u)\,du$$
and two-sided 95% and 99% confidence limits based on the Kolmogorov-Smirnov statistic
[Figure: cumulative number of extreme losses against transformed time]
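A check of this kind can be sketched as follows. If the fitted intensity is adequate, the transformed times $\hat\Lambda(t_i)$ behave like a unit-rate Poisson process, so conditional on their number they are uniformly spread over $(0, \hat\Lambda(t_0)]$. The code compares them with the uniform distribution using the asymptotic Kolmogorov-Smirnov critical values 1.358 (95%) and 1.628 (99%). How the slide's limits were constructed is not stated, so this construction and the function name are assumptions.

```python
import math

def ks_check_transformed_times(transformed_times, total_compensator):
    """KS distance between scaled transformed times and Uniform(0, 1).

    Returns (distance, within 95% limit, within 99% limit).
    """
    n = len(transformed_times)
    scaled = sorted(tau / total_compensator for tau in transformed_times)
    distance = max(
        max((i + 1) / n - x, x - i / n) for i, x in enumerate(scaled)
    )
    return (
        distance,
        distance <= 1.358 / math.sqrt(n),
        distance <= 1.628 / math.sqrt(n),
    )

# Hypothetical transformed times: evenly spread versus strongly clustered.
even = [i + 0.5 for i in range(100)]
clustered = [0.01 * (i + 1) for i in range(100)]
even_result = ks_check_transformed_times(even, 100.0)
clustered_result = ks_check_transformed_times(clustered, 100.0)
```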

Alternative models (1)
Hawkes process with exponential decay:
$$\lambda(t \mid H_t) = \mu_0 + \psi \sum_{j: t_j < t} \exp\{\delta w_j - \gamma(t - t_j)\}.$$
First-order Markov chain for the marks:
$$W_i \mid W_{i-1} = w_{i-1} \sim \mathrm{GPD}(\xi, \sigma_i = \exp(a + b\,w_{i-1}))$$
whose parameters $a$, $b$ and $\xi$ can be estimated by maximising
$$\prod_{i=2}^{n} f_{W_i \mid W_{i-1}}(w_i \mid w_{i-1}) \, f_{W_1}(w_1)$$
Diagnostic for the marginal distribution based on the residuals
$$R_j = \hat\xi^{-1} \log(1 + \hat\xi w_j/\hat\sigma_j), \quad j = 1, \ldots, n,$$
which are approximately independent unit exponential variables.
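For the exponential-decay kernel, the times part of the log-likelihood (sum of log-intensities at the events minus the integrated intensity over $(0, t_0]$) can be computed in one pass, because the sum over past events satisfies a simple recursion. The sketch below is an illustration under that model; the function name and the sample data are assumptions, and event times are assumed strictly increasing.

```python
import math

def hawkes_exp_loglik(times, marks, t0, mu0, psi, delta, gamma):
    """Times part of the log-likelihood for the exponential-decay Hawkes model."""
    loglik = 0.0
    compensator = mu0 * t0  # integral of the constant part over (0, t0]
    state = 0.0             # sum of exp(delta*w_j - gamma*(prev_t - t_j)), j up to prev_t
    prev_t = 0.0
    for t, w in zip(times, marks):
        # Decay the contribution of all earlier events to the current time.
        excitation = state * math.exp(-gamma * (t - prev_t))
        loglik += math.log(mu0 + psi * excitation)
        # Integrated kernel of this event from t to t0.
        compensator += (
            psi * math.exp(delta * w) * (1.0 - math.exp(-gamma * (t0 - t))) / gamma
        )
        state = excitation + math.exp(delta * w)
        prev_t = t
    return loglik - compensator

# Hypothetical exceedance times and marks.
ev_times = [1.0, 1.5, 4.0]
ev_marks = [0.2, 0.1, 0.3]
ll = hawkes_exp_loglik(ev_times, ev_marks, 5.0, 0.5, 0.3, 1.0, 2.0)
```

Adding the GPD part for the marks gives the full log-likelihood; with `psi = 0` the result reduces to the homogeneous Poisson value `n*log(mu0) - mu0*t0`.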

Alternative models (2): A self-exciting POT model
A model with predictable marks:
$$\lambda(t, w) = \frac{\tau + \alpha_1 \nu(t)}{\sigma + \alpha_2 \nu(t)} \Big( 1 + \xi \frac{w - u}{\sigma + \alpha_2 \nu(t)} \Big)^{-1/\xi - 1}$$
where $\nu(t) = \sum_{j: t_j < t} \omega(t - t_j)$ uses a self-exciting function $\omega$.
This uses the two-dimensional re-parametrisation of the POT:
- Homogeneous Poisson process for the exceedances of the level $u$ with intensity
$$\tau = -\ln H(u; \mu, \beta, \xi) = \{1 + \xi(u - \mu)/\beta\}_+^{-1/\xi},$$
where $H(y; \mu, \beta, \xi)$ is the GEV distribution and $\sigma = \beta + \xi(u - \mu)$.
- Inhomogeneous Poisson process for their sizes with intensity
$$\lambda(t, w) = \lambda(w) = \frac{1}{\sigma} \Big( 1 + \xi \frac{w - u}{\sigma} \Big)^{-1/\xi - 1}$$
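The self-exciting POT intensity can be evaluated as sketched below, together with the implied rate of exceeding a level $z \geq u$, which is the integral of $\lambda(t, w)$ over $w > z$. The function names, the exponential choice of $\omega$ and all parameter values are assumptions for illustration, and $\xi \neq 0$ is assumed.

```python
import math

def nu(t, event_times, omega):
    """Self-excitement nu(t): sum of omega(t - t_j) over past events t_j < t."""
    return sum(omega(t - tj) for tj in event_times if tj < t)

def sepot_intensity(w, nu_t, u, tau, sigma, xi, alpha1, alpha2):
    """Two-dimensional intensity lambda(t, w) of the self-exciting POT model."""
    rate = tau + alpha1 * nu_t
    scale = sigma + alpha2 * nu_t
    base = 1.0 + xi * (w - u) / scale
    if w < u or base <= 0.0:
        return 0.0
    return rate / scale * base ** (-1.0 / xi - 1.0)

def sepot_exceedance_rate(z, nu_t, u, tau, sigma, xi, alpha1, alpha2):
    """Rate of events above level z >= u: integral of lambda(t, w) over w > z."""
    rate = tau + alpha1 * nu_t
    scale = sigma + alpha2 * nu_t
    base = 1.0 + xi * (z - u) / scale
    if base <= 0.0:
        return 0.0
    return rate * base ** (-1.0 / xi)

# Hypothetical history and parameters; omega is an assumed exponential decay.
nu_now = nu(10.0, [2.0, 9.0, 9.5], lambda s: math.exp(-0.5 * s))
pars = dict(u=0.02, tau=0.03, sigma=0.01, xi=0.2, alpha1=0.5, alpha2=0.02)
rate_u = sepot_exceedance_rate(0.02, nu_now, **pars)
```

At $z = u$ the exceedance rate equals $\tau + \alpha_1 \nu(t)$, so past events raise both the frequency of exceedances and, through $\alpha_2$, the scale of their sizes.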

Estimated intensity of exceeding the threshold from the self-exciting POT model for NVDA with the Ogata function
[Figure, three panels against time (15-min intervals): marks amplitude in percent; intensity; mean of the GPD]

Conditional risk measures
The conditional approach models the intraday clustering of extremes and is used to calculate the instantaneous conditional VaR at level $p$ = 95% or 99% over the next period:
$$z_p^{t+1} = F^{-1}_{Z_{t+1} \mid H_t}(p), \qquad \bar F_{Z_{t+1} \mid H_t}(z) = P(Z_{t+1} > z \mid H_t),$$
which, for $z > u$, is
$$P(Z_{t+1} - u > z - u \mid Z_{t+1} > u, H_t) \; P(Z_{t+1} > u \mid H_t).$$
- $P(Z_{t+1} > u \mid H_t)$ represents the conditional probability of an event in $(t, t+1)$, that is
$$1 - P\{N(t, t+1) = 0 \mid H_t\} = 1 - \exp\Big( -\int_t^{t+1} \lambda(s \mid H_s)\,ds \Big)$$
- $P(Z_{t+1} - u > z - u \mid Z_{t+1} > u, H_t)$ is estimated using the GPD model with parameters $\sigma$ and $\xi$.

Conditional VaR and Expected Shortfall
The VaR $z_p^{t+1}$ is then the solution of the equation
$$P(Z_{t+1} > z_p^{t+1} \mid H_t) = 1 - p.$$
Using the self-exciting POT model, the conditional VaR is
$$z_p^{t+1} = u + \frac{\sigma + \alpha_2 \nu(t+)}{\xi} \Big\{ \Big( \frac{1 - p}{\tau + \alpha_1 \nu(t+)} \Big)^{-\xi} - 1 \Big\}$$
From the estimated conditional VaR it is straightforward to get an estimate of the conditional Expected Shortfall (ES)
$$ES_p^{t+1} = \frac{z_p^{t+1}}{1 - \xi} + \frac{\sigma - \xi u}{1 - \xi},$$
with parameters replaced by their maximum likelihood estimates.
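The two formulas translate directly into code. The sketch below is an illustration with assumed function names and parameter values. The ES function takes the scale as an argument: the slide writes $\sigma$ there, and passing the time-varying scale $\sigma + \alpha_2 \nu(t+)$ instead would be a choice of the user, not something the slide states. The formulas require $\xi \neq 0$, $\xi < 1$ and a rate $\tau + \alpha_1 \nu(t+)$ larger than $1 - p$ for the VaR to lie above $u$.

```python
def conditional_var(p, nu_t, u, tau, sigma, xi, alpha1, alpha2):
    """Conditional VaR at level p from the self-exciting POT model."""
    rate = tau + alpha1 * nu_t
    scale = sigma + alpha2 * nu_t
    return u + scale / xi * (((1.0 - p) / rate) ** (-xi) - 1.0)

def expected_shortfall(var, u, scale, xi):
    """ES from the VaR via the GPD mean-excess relation (requires xi < 1)."""
    return var / (1.0 - xi) + (scale - xi * u) / (1.0 - xi)

# Hypothetical parameter values and current self-excitement level.
pars = dict(u=0.02, tau=0.03, sigma=0.01, xi=0.2, alpha1=0.5, alpha2=0.02)
nu_now = 0.2
var95 = conditional_var(0.95, nu_now, **pars)
es95 = expected_shortfall(var95, pars["u"], pars["sigma"], pars["xi"])
```

With `nu_t = 0` the VaR reduces to the classical POT quantile with rate `tau` and scale `sigma`.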

15-minute 95% VaR (red) and 95% ES (blue) estimates for NVDA negative log-returns using self-exciting POT models, and 95% VaR using classical POT (green)
[Figure: negative log-returns against time (15-min intervals), 2008-07-24 to 2008-12-31]

15-minute 95% VaR and 95% ES estimates for NVDA negative log-returns using a competitor model (Chavez-Demoulin et al. (2014))
[Figure: negative log-returns against time, 2008-07-24 to 2008-12-31]

Backtesting results
0.95 Quantile
Expected        25
POT             59 (0)
ETAS ρ = 0      21 (0.24)
ETAS ρ ≠ 0      22 (0.31)
ETAS ρ = 0      24 (0.47)
ETAS ρ ≠ 0      23 (0.39)
exp             25 (0.55)
NPOT            26 (0.39)
0.99 Quantile
Expected        5
POT             14 (0)
ETAS ρ = 0      6 (0.39)
exp             6 (0.39)
ETAS ρ = 0      6 (0.39)
exp             7 (0.24)
NPOT            9 (0.11)
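A violation count can be compared with its expected value through a binomial test, sketched below. The slide does not say which test produced the numbers in parentheses, so this is an assumed one-sided test and is not claimed to reproduce them. The sample size of 500 periods is an inference from the expected counts of 25 at the 0.95 level and 5 at the 0.99 level, not a figure given on the slide.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )

def binom_tail_pvalue(violations, n, p):
    """One-sided p-value in the direction of the deviation from n * p."""
    if violations >= n * p:
        ks = range(violations, n + 1)   # too many violations: upper tail
    else:
        ks = range(0, violations + 1)   # too few violations: lower tail
    return min(1.0, sum(math.exp(binom_logpmf(k, n, p)) for k in ks))

n_periods = 500  # assumed: 25 / 0.05 = 5 / 0.01 = 500
p_pot = binom_tail_pvalue(59, n_periods, 0.05)
p_etas = binom_tail_pvalue(21, n_periods, 0.05)
```

A very small p-value, as for 59 violations against 25 expected, indicates that the VaR estimates are exceeded far more often than the nominal level allows.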

References
- A.G. Hawkes, 1971, Point spectra of some mutually exciting point processes, Biometrika.
- Y. Ogata, 1988, Statistical models for earthquake occurrences and residual analysis for point processes, JASA.
- E. Errais, K. Giesecke and L.R. Goldberg, 2010, Affine point processes and portfolio credit risk, SIAM J. Financial Math.
- Y. Aït-Sahalia, J. Cacho-Diaz and R.J.A. Laeven, 2010, Modeling financial contagion using mutually exciting jump processes, The Review of Financial Studies.
- V. Chavez-Demoulin, A.C. Davison and A.J. McNeil, 2005, A point process approach to Value-at-Risk estimation, Quantitative Finance.
- V. Chavez-Demoulin, P. Embrechts and S. Sardy, 2014, Extreme-quantile tracking for financial time series, Journal of Econometrics.