Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage?

Stefan Thurner
www.complex-systems.meduniwien.ac.at, www.santafe.edu
London July 28

Foundations of the theory of financial economics
- CAPM, Markowitz portfolio optimization, etc.
- Key ingredient: correlation matrices of time series of financial instruments
- Efficient market hypothesis
- We know that price time series are not geometric Brownian motion (GBM), Brownian motion (BM), ...
- If they were: would there be hedge funds?

Market inefficiency and random matrices
- Eigenvalue (EV) spectra of empirical equal-time covariance matrices are compared to the EV densities predicted by random matrix theory (RMT) for Gaussian randomness.
- Eigenvalues which strongly depart from the RMT spectrum contain information about market sectors.
- The largest eigenvalue is identified as the market mode.
- EV cleaning of the original correlation matrices results in an improved mean-variance efficient frontier.
- RMT provides a full understanding of why the Markowitz approach is close to useless in actual portfolio management. Reason: dominance of small eigenvalues in the noise regime.

Eigenvalue spectrum of market data
[Figure: plot from Bouchaud/Potters]

Eigenvalue cleaning + mean-variance frontier
[Figure: plot from Bouchaud/Potters]

How can this be used?
- RMT: systematic search for non-random structures in large data sets
- No time lag: eigenvalue cleaning, Markowitz portfolio
- Arbitrage with no time lag? Maybe not.
- Arbitrage from time-shifted covariances? Maybe yes.

Ensembles of random matrices
- Random matrix ensemble of $N \times N$ matrices $M$ with iid random entries, distributed as $P(M) \propto \exp\left(-\frac{\beta N}{2}\,\mathrm{Tr}(M M^T)\right)$.
- $\beta$ is different for different ensembles, i.e. it depends on whether the variables are complex or real.
- Eigenvalue spectra and correlations of eigenvalues are known for:
  - Gaussian orthogonal ensemble (GOE): real-valued entries, symmetric random matrices, $N \to \infty$
  - Ginibre ensembles (GinOE, GinUE, GinSE): not symmetric (real, complex, quaternion entries); eigenvalue densities are available even for finite size

Random matrices from time series
- Input: covariance matrices from $N \times T$ data matrices $X$ ($N$ assets or instruments at $T$ observation points).
- The matrix ensemble for the $N \times N$ covariance matrix $C \propto X X^T$ is the Wishart ensemble (cornerstone of multivariate data analysis).
- For uncorrelated Gaussian distributed data the exact EV spectrum of $X X^T$ is known: the Marchenko-Pastur law.
- The spectrum of the time-lagged covariance matrix $C_\tau^{ij} \propto \sum_{t}^{T} r_t^i\, r_{t-\tau}^j$ is unknown (results exist for symmetrized lagged correlation matrices: problematic).
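As an illustration of the equal-time baseline, the following minimal sketch compares the spectrum of a simulated Wishart matrix with the Marchenko-Pastur support. The normalization $C = X X^T / T$ with unit-variance entries, the sizes `N` and `T`, and the helper names `mp_edges` and `mp_density` are assumptions made for this example, not taken from the slides.

```python
import numpy as np

def mp_edges(N, T):
    """Support edges of the Marchenko-Pastur law for C = X X^T / T (unit variance)."""
    q = N / T
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2

def mp_density(lam, N, T):
    """Marchenko-Pastur density for N <= T; zero outside the support."""
    q = N / T
    lo, hi = mp_edges(N, T)
    lam = np.asarray(lam, dtype=float)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    rho[inside] = np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / (
        2.0 * np.pi * q * lam[inside]
    )
    return rho

rng = np.random.default_rng(0)
N, T = 200, 800                      # assumed toy sizes
X = rng.standard_normal((N, T))      # uncorrelated Gaussian "returns"
C = X @ X.T / T                      # equal-time covariance matrix
eigvals = np.linalg.eigvalsh(C)      # real spectrum, ascending order

lam_minus, lam_plus = mp_edges(N, T)
print("empirical range :", eigvals[0], eigvals[-1])
print("predicted range :", lam_minus, lam_plus)
```

Eigenvalues of real data that fall clearly outside the predicted range are the candidates for non-random structure; finite-size effects blur the edges slightly.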

There is structure in lead-lag relations
- The analysis of asymmetric time-lagged correlations forms a fundamental part of finance and econometrics.
- From a practical point of view (arbitraging): asymmetric lead-lag relationships have been reported for the U.S. stock market (Lo & MacKinlay).
- The lagged correlation function typically exhibits an asymmetric peak.
- Why? Lead-lag effects are mainly explained by information adjustment asymmetry (Brennan).
- There is some recent understanding of the relation between the strength of lagged correlations and the time shift $\tau$ (Kertész).

Eigenvalues of time-lagged covariance matrices
- $N \times T$ data matrix $X$: $N$ assets, $T$ observation times, time lag $\tau$.
- Entries: log-return time series (zero mean, unit variance) of asset $i$ at time $t$, $r_t^i = \ln S_t^i - \ln S_{t-1}^i$.
- Time-lagged correlation function:
  $$C_\tau^{ij}(T) \equiv \left\langle \left(r_t^i - \langle r_t^i \rangle\right)\left(r_{t-\tau}^j - \langle r_{t-\tau}^j \rangle\right) \right\rangle_T = \frac{1}{T} \left( X D_\tau X^T \right)_{ij}, \qquad (D_\tau)_{ts} \equiv \delta_{t,\,s+\tau}$$
- For $\tau \neq 0$, $C_\tau$ is not symmetric.
- Denote the eigenvalues of $C_\tau$ by $\lambda_i$.
- The entries of $C_\tau$ are random variables with a certain distribution. Because of its specific construction, $C_\tau$ is not a purely random real asymmetric $N \times N$ matrix with iid Gaussian entries.
- One cannot expect a flat eigenvalue distribution as in the Ginibre-Girko case.
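The construction above can be sketched in a few lines. This is an illustrative example on synthetic Gaussian data, not the author's code; the function names, the sizes and the convention that the shift matrix has entries $\delta_{t,s+\tau}$ (so that $X D_\tau X^T$ pairs $r_t^i$ with $r_{t-\tau}^j$) are assumptions of this sketch.

```python
import numpy as np

def standardize(X):
    """Give every row (asset) zero mean and unit variance."""
    X = X - X.mean(axis=1, keepdims=True)
    return X / X.std(axis=1, keepdims=True)

def lagged_corr(X, tau):
    """C_tau[i, j] = (1/T) * sum_t X[i, t] * X[j, t - tau], over valid t only."""
    T = X.shape[1]
    return X[:, tau:] @ X[:, :T - tau].T / T

def shift_matrix(T, tau):
    """D_tau with entries delta_{t, s + tau}: ones on the tau-th subdiagonal."""
    return np.eye(T, k=-tau)

rng = np.random.default_rng(1)
N, T, tau = 50, 400, 1                         # assumed toy sizes
X = standardize(rng.standard_normal((N, T)))

C_tau = lagged_corr(X, tau)
C_via_D = X @ shift_matrix(T, tau) @ X.T / T   # same matrix, written as X D_tau X^T / T
lam = np.linalg.eigvals(C_tau)                 # complex in general

print("asymmetric       :", not np.allclose(C_tau, C_tau.T))
print("largest |lambda| :", np.abs(lam).max())
```

Because the matrix is real but not symmetric, its eigenvalues are either real or come in complex-conjugate pairs, which is why the spectrum is discussed in the complex plane.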

[Figure: eigenvalues in the complex plane, Im($\lambda$) versus Re($\lambda$), together with the projected densities $\rho_x(\lambda)$, $\rho_y(\lambda)$ of Re($\lambda$) and Im($\lambda$), for several values of $Q$; panels (b), (d), (f), (h)]

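General arguments
- Idea: view the distribution of EVs in the complex plane as electrical charges.
- In the present case the EV density $\rho(z) = \rho(x, y)$ is then given by the Poisson equation
  $$\rho(x, y) = \frac{1}{4\pi} \nabla^2 \phi(x, y), \qquad \phi(x, y) = \frac{1}{N} \left\langle \ln \det\left( (\mathbf{1} z^* - C_\tau^T)(\mathbf{1} z - C_\tau) \right) \right\rangle_c$$
- $\langle \dots \rangle_c$ is the average over the distribution of $X$, $P(X) \propto \exp\left(-\frac{N}{2}\,\mathrm{Tr}(X X^T)\right)$.
- Expand the argument of the determinant, with $z = x + iy$:
  $$H = \mathbf{1} |z|^2 + C_\tau^T C_\tau - x\,(C_\tau + C_\tau^T) + i y\,(C_\tau - C_\tau^T)$$
- Any symmetric (anti-symmetric) contribution of $C_\tau^{ij}$ only influences the real (imaginary) part of $z$.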
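The expansion of the determinant's argument can be checked numerically for a single matrix. The sketch below uses an arbitrary real matrix as a stand-in for $C_\tau$ and an arbitrary point $z = x + iy$ (both assumptions of the example). With the product taken in the order $(\mathbf{1} z^* - C^T)(\mathbf{1} z - C)$ the quadratic term comes out as $C^T C$; the determinant is the same if the factors are swapped, which gives $C C^T$ instead.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
C = rng.standard_normal((N, N))      # stand-in for C_tau: any real N x N matrix
x, y = 0.3, -0.7                     # arbitrary evaluation point z = x + i y
z = x + 1j * y
I = np.eye(N)

# Argument of the determinant, as a product and in expanded form
product = (np.conj(z) * I - C.T) @ (z * I - C)
H = abs(z) ** 2 * I + C.T @ C - x * (C + C.T) + 1j * y * (C - C.T)

# Single-realization potential: (1/N) sum_i ln|z - lambda_i|^2 = (1/N) ln det H
lam = np.linalg.eigvals(C)
phi_from_eigs = np.sum(np.log(np.abs(z - lam) ** 2)) / N
phi_from_det = np.linalg.slogdet(H)[1] / N     # log of |det H|

print(phi_from_eigs, phi_from_det)
```

The second check is the electrostatic picture itself: each eigenvalue acts as a point charge with a logarithmic two-dimensional potential.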

General arguments: consequences
- In the limit $N \to \infty$ the potential is a function of the radius $r = \sqrt{x^2 + y^2}$ only: $\phi(x, y) = \phi(r)$.
- Thus the EV density is a radially symmetric function,
  $$\rho(x, y) = \rho(r) \equiv \frac{1}{2\pi r} \int_S dz\, \rho(z)\, \delta(|z| - r)$$
- Validity checked by a computation to 4th order.
- If $\rho(r)$ is circularly symmetric, the support $S$ of the EV spectrum is limited to a circle of radius $r_{\max}$, which can be computed.

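Idea: determine $\rho(r)$ based on its radial symmetry
- The EV density of the symmetric problem is obtained from the well-known relation
  $$\rho^S(x) = \sum_n \delta(x - x_n) = \frac{1}{\pi} \lim_{\epsilon \to 0} \left[ \mathrm{Im}\, G^S(x - i\epsilon) \right]$$
- Idea: use the inverse Abel transform to determine the radially symmetric density $\rho(r)$ from
  $$\rho^S(\sqrt{2}\, x) = 2 \int_x^\infty \frac{\rho(r)\, r}{\sqrt{r^2 - x^2}}\, dr$$
- Now reconstruct the EV spectrum exactly for $N \to \infty$ via the inverse Abel transform, and thus via the cuts of the Green's function of the symmetric problem:
  $$\rho(r) = -\frac{1}{\pi^2} \int_r^\infty \frac{d}{dx} \lim_{\epsilon \to 0} \left[ \mathrm{Im}\, G_\tau^S(\sqrt{2}\, x - i\epsilon) \right] \frac{dx}{\sqrt{x^2 - r^2}}$$
- Expect discrepancies in finite data due to finite-size effects.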
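The Abel transform pair behind this idea can be illustrated on a case with a known answer. The sketch below uses the standard pair $F(x) = 2\int_x^{\infty} f(r)\, r / \sqrt{r^2 - x^2}\, dr$ and $f(r) = -\frac{1}{\pi}\int_r^{\infty} F'(x) / \sqrt{x^2 - r^2}\, dx$, and tests it on a flat density on the unit disk (the Ginibre-Girko case), whose projection onto one axis is a semicircle. The quadrature scheme, the function names and the test density are assumptions of this example; the $\sqrt{2}$ rescaling and the Green's function of the lagged problem are not included.

```python
import numpy as np

def abel_forward(rho, x, r_max, n=4000):
    """F(x) = 2 * int_x^r_max rho(r) r / sqrt(r^2 - x^2) dr.

    The substitution r = sqrt(x^2 + u^2) removes the square-root singularity:
    F(x) = 2 * int_0^a rho(sqrt(x^2 + u^2)) du, with a = sqrt(r_max^2 - x^2).
    """
    a = np.sqrt(max(r_max ** 2 - x ** 2, 0.0))
    u = (np.arange(n) + 0.5) * a / n                 # midpoint-rule nodes
    return 2.0 * np.sum(rho(np.sqrt(x ** 2 + u ** 2))) * a / n

def abel_inverse(dF, r, r_max, n=4000):
    """rho(r) = -(1/pi) * int_r^r_max F'(x) / sqrt(x^2 - r^2) dx.

    Uses x = sqrt(r^2 + u^2) and u = a*sin(theta), which also tames a
    square-root edge singularity of F' at x = r_max.
    """
    a = np.sqrt(max(r_max ** 2 - r ** 2, 0.0))
    theta = (np.arange(n) + 0.5) * (np.pi / 2) / n   # midpoint-rule nodes
    u = a * np.sin(theta)
    x = np.sqrt(r ** 2 + u ** 2)
    integrand = dF(x) / x * a * np.cos(theta)
    return -np.sum(integrand) * (np.pi / 2) / n / np.pi

# Test case: flat density on the unit disk and its semicircle projection
flat_disk = lambda r: np.full_like(r, 1.0 / np.pi)
semicircle_derivative = lambda x: -(2.0 / np.pi) * x / np.sqrt(1.0 - x ** 2)

projection = abel_forward(flat_disk, 0.3, 1.0)
recovered = abel_inverse(semicircle_derivative, 0.5, 1.0)
print(projection, recovered)
```

For empirical input the derivative $F'$ would have to be estimated numerically, which is where finite-size noise enters.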

Application to lagged correlation matrices
- The Green's function $G(z)$ of the symmetrized problem $C_\tau^S = \frac{1}{2T} X (D_\tau + D_{-\tau}) X^T$ is given by the implicit solution of (Burda)
  $$\frac{1}{Q^3} z^2 G^4(z) - \frac{2}{Q^2}\left(\frac{1}{Q} - 1\right) z\, G^3(z) - \frac{1}{Q}\left(z^2 - \left(\frac{1}{Q} - 1\right)^2\right) G^2(z) + 2\left(\frac{1}{Q} - 1\right) z\, G(z) + 2 - \frac{1}{Q} = 0$$
- $Q \equiv T/N$ is the information-to-noise ratio.
- The remaining integral is hard to solve in general but easy to solve numerically.
- For $Q = 1$ an exact solution $\rho_{Q=1}(r)$ is possible.
  [Equation: closed-form expression for $\rho_{Q=1}(r)$ involving Gamma functions $\Gamma$ and functions $\Phi$; not recoverable from this transcription]
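A hedged numerical companion to the symmetrized problem: the sketch below builds $C_\tau^S$ from synthetic Gaussian data, evaluates the empirical Green's function $G(z) = \frac{1}{N}\sum_n (z - x_n)^{-1}$, and plugs it into the quartic as reconstructed from the garbled slide (signs restored from its large-$|z|$ behaviour, so treat the coefficients as an assumption). Expanding that quartic at large $|z|$ gives $G \approx 1/z + 1/(2Q z^3)$, i.e. a second moment of $1/(2Q)$, which the simulation can be compared against. Function names and sizes are illustrative.

```python
import numpy as np

def symmetrized_lagged_corr(X, tau):
    """C^S_tau = (1/2T) X (D_tau + D_{-tau}) X^T = (C_tau + C_tau^T) / 2."""
    T = X.shape[1]
    C_tau = X[:, tau:] @ X[:, :T - tau].T / T
    return 0.5 * (C_tau + C_tau.T)

def empirical_green(eigvals, z):
    """G(z) = (1/N) * sum_n 1 / (z - x_n) for z off the real axis."""
    return np.mean(1.0 / (z - eigvals))

def quartic_residual(G, z, Q):
    """Left-hand side of the reconstructed quartic; zero for an exact solution."""
    q = 1.0 / Q
    a = q - 1.0
    return (q ** 3 * z ** 2 * G ** 4 - 2.0 * q ** 2 * a * z * G ** 3
            - q * (z ** 2 - a ** 2) * G ** 2 + 2.0 * a * z * G + 2.0 - q)

rng = np.random.default_rng(4)
N, T, tau = 200, 800, 1                  # assumed toy sizes, Q = T / N = 4
Q = T / N
X = rng.standard_normal((N, T))
C_S = symmetrized_lagged_corr(X, tau)
x_n = np.linalg.eigvalsh(C_S)            # real spectrum of the symmetric problem

second_moment = np.mean(x_n ** 2)        # compare with 1 / (2 Q)
z_far, z_near = 40.0 + 40.0j, 0.5 + 0.5j
res_far = quartic_residual(empirical_green(x_n, z_far), z_far, Q)
res_near = quartic_residual(empirical_green(x_n, z_near), z_near, Q)
print(second_moment, 1.0 / (2.0 * Q), abs(res_far), abs(res_near))
```

The residual near the spectrum is printed only for inspection; how small it is at finite $N$ depends on the sizes chosen.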

[Figure: radial eigenvalue density $\rho(r)$ versus $r$, compared with the distributions of Re($\lambda$) and Im($\lambda$); panels (a), (b), (c)]

Summary: theory
- Used a well-known analogy to classical electrostatics to show radial symmetry of the potential for lagged correlation matrices.
- Introduced a novel method to calculate the exact radial eigenvalue density via the inverse Abel transform.
- Used existing results for symmetrized lagged correlation matrices as an input to our method to arrive at $\rho_Q(r)$.
- With the knowledge of the pure random spectrum: deviations indicate arbitrageable structure in financial data.

Application to markets
- Financial data: 5 min data of the S&P 500, Jan 2 2002 to Apr 2 2004.
- Empirical lagged correlation matrices $C_\tau$, $\tau$ = 5, 30 mins.
- After data cleaning, $X$: $N = 400$ time series at $T = 44720$ observations.
- From $X$ construct two surrogate data sets:
  - Market-mode removed data $X^{res}$: the largest eigenvalue is the market mode. The market return (movement of the index) is $r_t^m = \sum_{j=1}^{N} v_{1j}\, r_t^j$, at equal times, $\tau = 0$. Regress as in CAPM, $r_t^i = \alpha_i + \beta_i r_t^m + \epsilon_t^i$, and set $X_{it}^{res} \equiv \epsilon_t^i$.
  - Randomly reshuffled data $X^{scr}$: a random permutation of all elements of $X$. This destroys correlation structures but keeps the same distributions as in the original data.
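The two surrogate data sets can be sketched as follows. This is an illustration on a synthetic one-factor toy model, not the S&P 500 data; the helper names, the use of ordinary least squares with an intercept via `numpy.linalg.lstsq`, and the toy sizes are assumptions of the example.

```python
import numpy as np

def market_mode_residuals(X):
    """Regress each asset on the market-mode return and keep the residuals."""
    N, T = X.shape
    C0 = X @ X.T / T                         # equal-time matrix (tau = 0)
    w, V = np.linalg.eigh(C0)                # eigenvalues in ascending order
    v1 = V[:, -1]                            # eigenvector of the largest eigenvalue
    r_m = v1 @ X                             # market return r^m_t = sum_j v_1j r^j_t
    A = np.column_stack([np.ones(T), r_m])   # regressors: intercept and market return
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)   # rows: alpha_i and beta_i
    return X - (A @ coef).T                  # residuals epsilon^i_t

def scramble(X, rng):
    """Random permutation of all elements of X (keeps the value distribution)."""
    return rng.permutation(X.ravel()).reshape(X.shape)

def top_eigenvalue(M):
    """Largest eigenvalue of the equal-time matrix M M^T / T."""
    return np.linalg.eigvalsh(M @ M.T / M.shape[1])[-1]

rng = np.random.default_rng(3)
N, T = 30, 500                               # assumed toy sizes
factor = rng.standard_normal(T)              # common "market" factor
X = np.outer(np.ones(N), factor) + rng.standard_normal((N, T))

X_res = market_mode_residuals(X)
X_scr = scramble(X, rng)
print(top_eigenvalue(X), top_eigenvalue(X_res), top_eigenvalue(X_scr))
```

In the toy model the dominant eigenvalue disappears both after removing the market mode and after scrambling, but only the residuals keep whatever structure is left beyond the common factor.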

[Figure: eigenvalue spectra in the complex plane, Im($\lambda$) versus Re($\lambda$), panels (a) to (d); panel (d) shows Im($\lambda^{res}$) versus Re($\lambda^{res}$) for the market-removed data]

Eigenvalue spectra
[Figure: projected densities $\rho_x(\lambda)$, $\rho_y(\lambda)$ of Re($\lambda$) and Im($\lambda$) compared with theory; inset: radial density versus $r$]
- Eigenvalues lying outside the random regime can be confidently associated with specific non-random structures.

Interpretation of deviating eigenvalues
- Deviations from the theoretical pure-random prediction indicate correlation structure in the data across time.
- Which assets participate in a given eigenvector associated with a deviating eigenvalue?
- Define the inverse participation ratio (IPR) for the eigenvectors $u_i$:
  $$\mathrm{IPR}(u_i) \equiv \sum_{k=1}^{N} |u_{ik}|^4$$
  where $u_{ik}$ are the entries of the $i$-th eigenvector.
- The IPR shows to what extent each of the $N = 400$ assets contributes to eigenvector $u_i$.
  - Small IPR: assets contribute equally.
  - Large IPR: a few assets dominate the eigenvector.
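A minimal sketch of the IPR, assuming the eigenvector is normalized to unit length before the sum is taken (the function name `ipr` and the two test vectors are illustrative). The two extreme cases give the scale: a vector spread evenly over all $N$ assets has IPR $= 1/N$, a vector concentrated on one asset has IPR $= 1$.

```python
import numpy as np

def ipr(u):
    """Inverse participation ratio sum_k |u_k|^4 of a unit-normalized vector."""
    u = np.asarray(u)
    u = u / np.linalg.norm(u)        # works for real and complex eigenvectors
    return float(np.sum(np.abs(u) ** 4))

N = 400
uniform = np.ones(N)                 # all assets contribute equally
localized = np.zeros(N)
localized[7] = 1.0                   # a single asset dominates

print(ipr(uniform), ipr(localized))
```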

Indication for group structure
[Figure: IPR of the eigenvectors versus the corresponding eigenvalue, (a) for the original data ($\lambda_i$) and (b) for the residuals ($\lambda_i^{res}$)]
- The largest EV has a relatively small IPR.
- The IPRs for the residuals $X^{res}$ are larger for the deviating eigenvalues, which indicates group structure in the lagged correlations.

Sector organization in time-lagged data
- From RMT with $\tau = 0$ we know that the eigenvectors $u_i$ of large eigenvalues are associated with the sector organization of markets.
- Label sectors with $s$ and define
  $$\delta_{sk} = \begin{cases} 1 & \text{if stock } k \text{ belongs to sector } s \\ 0 & \text{otherwise} \end{cases}$$
- To visualize the influence of each sector $s$ on a given eigenvector $i$:
  $$I_{si} \equiv \frac{1}{N_s} \sum_{k=1}^{N} \delta_{sk}\, |u_{ik}|^2$$
  where $N_s$ is the number of stocks in sector $s$.
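The sector contribution $I_{si}$ is a per-sector average of squared eigenvector components. The sketch below computes it for one eigenvector on a made-up six-stock universe; the sector labels, the vector and the function name are hypothetical.

```python
import numpy as np

def sector_contributions(u, sector_of_stock):
    """I_s = (1/N_s) * sum_k delta_sk * |u_k|^2 for every sector label s."""
    u = np.asarray(u)
    sector_of_stock = np.asarray(sector_of_stock)
    result = {}
    for s in np.unique(sector_of_stock):
        members = sector_of_stock == s       # delta_sk as a boolean mask
        result[int(s)] = float(np.sum(np.abs(u[members]) ** 2) / members.sum())
    return result

# Hypothetical universe: two stocks in sector 10, three in 15, one in 20
sector_of_stock = [10, 10, 15, 15, 15, 20]
u = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0]) / np.sqrt(3.0)   # lives on sector 15 only

I = sector_contributions(u, sector_of_stock)
print(I)
```

Dividing by $N_s$ keeps large sectors from dominating the picture simply because they contain more stocks.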

Sector                        GICS    No. of stocks $N_s$
Energy                        10      22
Materials                     15      27
Industrials                   20      44
Consumer Discretionary        25      63
Consumer Staples              30      35
Healthcare                    35      40
Financials                    40      71
Information Technology        45      63
Telecommunication             50      11
Utilities                     55      24

Table 1: Global Industry Classification Standard (GICS) for the 10 main sectors in the S&P 500, from www.standardandpoors.com

[Figure: contribution of each sector (by GICS code) to the eigenvectors of the leading eigenvalues, for the original data ($\lambda_1$, $\lambda_2 = \lambda_3^*$, $\lambda_4$) and for the market-removed data ($\lambda_1^{res}$, $\lambda_2^{res}$, $\lambda_3^{res}$, $\lambda_4^{res} = \lambda_5^{res\,*}$)]

Lead-lag networks

Some observations
- Structures in the network view are associated with deviating eigenvalues via the decomposition of the lagged correlation matrix in its right eigenvectors, $C_{\lambda_i} = U \Lambda_i U^{-1}$, where $\Lambda_i = \mathrm{diag}(\lambda_i)$ is a diagonal matrix with only one nonzero entry, at the position associated with eigenvalue $\lambda_i$.
- For all eigenvalues we found no indication that the leading stocks are the ones with the highest market capitalization (as would be implied by the work of Lo and MacKinlay).
- The close correspondence between the $C_{\lambda_i}$ and the different sectors, visible in the network representation of the data, confirms the validity of the analysis.
- Large negative eigenvalues are associated with time-lagged anticorrelations between various sectors.
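The decomposition into single-eigenvalue components can be written out directly. The following sketch assumes the matrix is diagonalizable and uses a random real matrix as a stand-in for $C_\tau$; the function name is illustrative. Each component $U \Lambda_i U^{-1}$ is the rank-one matrix $\lambda_i$ times the outer product of the $i$-th column of $U$ with the $i$-th row of $U^{-1}$.

```python
import numpy as np

def eigen_components(C):
    """Split C into rank-one pieces C_{lambda_i} = U Lambda_i U^{-1}."""
    lam, U = np.linalg.eig(C)            # columns of U are right eigenvectors
    U_inv = np.linalg.inv(U)
    comps = [lam[i] * np.outer(U[:, i], U_inv[i, :]) for i in range(len(lam))]
    return lam, comps

rng = np.random.default_rng(5)
C = rng.standard_normal((8, 8))          # stand-in for a lagged correlation matrix
lam, comps = eigen_components(C)

reconstructed = sum(comps)               # the pieces add up to C again
print(np.abs(reconstructed - C).max())
```

For a complex eigenvalue the single component is complex; adding the component of the conjugate eigenvalue gives a real matrix.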

Time dependence of largest eigenvalues
[Figure: (a) $|\lambda_1|$ and (b) $|\lambda_1^{res}|$ versus period index $n$ for two values of $Q$; markers distinguish Im($\lambda_1$) = 0 from Im($\lambda_1$) $\neq$ 0]
- The $C_1(T_i)$ for consecutive, non-overlapping time periods $T_i$ show clear deviations from the predicted support down to $Q = 1.25$, even though noise is drastically increased for low $Q$.
- Non-random structures are present at very short time scales.
- Similar for other ranks of EV.

Predictive power?
[Figure: $c(T_1, T_{1+d})$ and $c^{res}(T_1, T_{1+d})$ versus $d$ for different values of $Q$]
$$c(T_n, T_m) = \frac{\left\langle \left(C_{ij}^\tau(T_n) - \langle C_{ij}^\tau(T_n) \rangle_{ij}\right)\left(C_{ij}^\tau(T_m) - \langle C_{ij}^\tau(T_m) \rangle_{ij}\right) \right\rangle_{ij}}{\sigma_{T_n}\, \sigma_{T_m}}$$
- Correlation of the matrix elements of lagged correlation matrices from non-overlapping periods $T_n$ and $T_m$; the average is over all matrix elements.
- Significance band $\approx 1/400$: extremely significant, increasing with $Q$.
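The quantity $c(T_n, T_m)$ is a Pearson correlation taken over matrix elements. A minimal sketch, using random matrices as stand-ins for lagged correlation matrices from two periods (names and sizes are assumptions of the example):

```python
import numpy as np

def matrix_correlation(A, B):
    """Pearson correlation between the elements of two equally shaped matrices."""
    a = A.ravel() - A.mean()
    b = B.ravel() - B.mean()
    return float(np.mean(a * b) / (A.std() * B.std()))

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 50))        # stand-in for C_tau(T_n)
B = rng.standard_normal((50, 50))        # stand-in for C_tau(T_m), independent of A

print(matrix_correlation(A, A), matrix_correlation(A, B))
```

For independent matrices with $N^2$ elements the result fluctuates around zero with a spread of about $1/N$, which sets the scale of the significance band.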

A strategy or two
There are two ways:
- Cleaning of the matrices, in analogy to the method known for $\tau = 0$. A cleaned matrix at time $T_n$ allows for a strongly improved estimation of future lagged correlation matrices at times $T_m > T_n$:
  - fix timescales $T_n$
  - compute the complex spectrum of the data
  - replace all EVs within the circle with a number
  - rotate back
- Find significant correlations and identify temporarily stable ones:
  - use the movement of the leading instrument as a trading signal at the appropriate timescale
  - use pseudo-sector information for a non-standard stat arb strategy
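The cleaning recipe can be sketched as below. This is one possible reading of the four steps, not the author's implementation: the radius `r_max` is passed in as a parameter (the slides say it can be computed but give no formula here), the replacement value `fill` defaults to zero as an arbitrary choice, and "rotate back" is taken to mean $U \Lambda' U^{-1}$ with the original eigenvectors.

```python
import numpy as np

def clean_lagged_corr(C_tau, r_max, fill=0.0):
    """Replace eigenvalues with |lambda| < r_max by `fill` and rotate back."""
    lam, U = np.linalg.eig(C_tau)                    # complex spectrum
    lam_clean = np.where(np.abs(lam) < r_max, fill, lam)
    C_clean = U @ np.diag(lam_clean) @ np.linalg.inv(U)
    # The disc |lambda| < r_max is closed under complex conjugation and `fill`
    # is real, so the result is real up to round-off.
    return C_clean.real

rng = np.random.default_rng(7)
C = rng.standard_normal((20, 20)) / np.sqrt(20)      # stand-in for C_tau
lam = np.linalg.eigvals(C)
r_max = np.median(np.abs(lam))                       # hypothetical radius for the demo

C_clean = clean_lagged_corr(C, r_max)
kept = lam[np.abs(lam) >= r_max]
print(len(kept), "of", len(lam), "eigenvalues kept")
```

With a real replacement value the trace of the cleaned matrix equals the sum of the kept eigenvalues plus `fill` times the number of replaced ones, which gives a quick consistency check.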

Conclusion
- Extended random matrix theory to lagged cross-correlation matrices (Abel transformation).
- This opens maybe the most straightforward way for a systematic quantitative search for lead-lag structures in a noisy world.
- Possibility for predictability of significant parts of lagged correlation matrices based on measurements of past (non-overlapping) periods.
- Identified correlations can be used in a multitude of strategies.
- As a by-product one gets a pseudo-sectorization of the trading universe (clustering in the lead-lag network of residuals).
- Discussed issues of $Q$ dependence (information-to-noise ratio).