Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage? Stefan Thurner www.complex-systems.meduniwien.ac.at www.santafe.edu London July 28
Foundations of financial economics: CAPM, Markowitz portfolio optimization, etc. Key ingredient: correlation matrices of the time series of financial instruments. Efficient market hypothesis. We know: price time series are not geometric Brownian motion (GBM), Brownian motion (BM), ... If they were, would there be hedge funds?
Market inefficiency and random matrices: eigenvalue spectra of empirical equal-time covariance matrices are compared with the eigenvalue (EV) densities that random matrix theory (RMT) predicts for Gaussian randomness. Eigenvalues that strongly depart from the RMT spectrum contain information about market sectors; the largest eigenvalue is identified as the market mode. EV cleaning of the original correlation matrices results in an improved mean-variance efficient frontier. RMT explains why the Markowitz approach is close to useless in actual portfolio management. Reason: the optimization inverts the covariance matrix, so the small eigenvalues, which lie in the noise regime, dominate.
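A minimal NumPy sketch of one common cleaning variant, eigenvalue clipping, assuming unit-variance data and using the Marcenko-Pastur upper edge as the noise threshold. The function name `clip_correlation` and the choice to replace the noise eigenvalues by their mean are illustrative assumptions, not necessarily the exact procedure behind the cited plots.

```python
import numpy as np


def clip_correlation(C: np.ndarray, Q: float) -> np.ndarray:
    """Eigenvalue clipping of an equal-time correlation matrix (illustrative sketch).

    Eigenvalues below the Marcenko-Pastur upper edge (1 + 1/sqrt(Q))**2, Q = T/N,
    are treated as noise and replaced by their mean, which preserves the trace.
    """
    lam, V = np.linalg.eigh(C)                    # real eigenvalues, orthonormal eigenvectors
    lam_plus = (1.0 + 1.0 / np.sqrt(Q)) ** 2      # noise edge for unit-variance data
    noise = lam < lam_plus
    lam_clean = lam.copy()
    if noise.any():
        lam_clean[noise] = lam[noise].mean()      # flatten the noise band
    C_clean = (V * lam_clean) @ V.T               # V diag(lam_clean) V^T
    d = np.sqrt(np.diag(C_clean))
    return C_clean / np.outer(d, d)               # restore the unit diagonal


# Usage: one common factor plus noise, N = 50 assets, T = 500 observations.
rng = np.random.default_rng(0)
N, T = 50, 500
X = 0.5 * rng.standard_normal(T) + rng.standard_normal((N, T))
C = np.corrcoef(X)
C_clean = clip_correlation(C, T / N)
```

The factor eigenvalue lies far above the noise edge and survives; everything below it is flattened, which removes the small, noisy eigenvalues that destabilize a Markowitz optimization.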
Eigenvalue spectrum of market data (plot from Bouchaud/Potters)
Eigenvalue cleaning + mean-variance frontier (plot from Bouchaud/Potters)
How can this be used? RMT allows a systematic search for non-random structures in large data sets. No time lag: eigenvalue cleaning for Markowitz portfolios. Arbitrage with no time lag? Maybe not. Arbitrage from time-shifted covariances: maybe yes.
Ensembles of random matrices: a random matrix ensemble of N×N matrices M with iid random variables has the distribution $P(M) \propto \exp\left(-\frac{\beta N}{2}\,\mathrm{Tr}(MM^T)\right)$, where β differs between ensembles, i.e. depending on whether the variables are real or complex. Eigenvalue spectra and eigenvalue correlations are known for: the Gaussian orthogonal ensemble (GOE): real-valued entries, symmetric random matrices, N → ∞; the Ginibre ensembles (GinOE, GinUE, GinSE): non-symmetric (real, complex, quaternion) matrices, with eigenvalue densities available even for finite size.
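A quick numerical look at the real Ginibre ensemble (GinOE), assuming iid N(0, 1/N) entries: by the circular law the eigenvalues fill the unit disk uniformly, so E|λ|² → 1/2, and a real Ginibre matrix also has about sqrt(2N/π) exactly real eigenvalues. This is a sketch for illustration; the seed and N are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400
M = rng.standard_normal((N, N)) / np.sqrt(N)   # GinOE: iid real N(0, 1/N) entries
lam = np.linalg.eigvals(M)
r = np.abs(lam)
frac_inside = np.mean(r <= 1.05)               # circular law: almost all eigenvalues in the unit disk
mean_r2 = np.mean(r ** 2)                      # uniform disk of radius 1: E|lambda|^2 = 1/2
n_real = np.sum(lam.imag == 0.0)               # GinOE: about sqrt(2N/pi) ~ 16 real eigenvalues
print(frac_inside, mean_r2, n_real)
```

This flat disk is the reference that the time-lagged matrices below do not follow.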
Random matrices from time series. Input: covariance matrices from N×T data matrices X, with N assets (or instruments) at T observation points. The matrix ensemble for the N×N covariance matrix $C \propto XX^T$ is the Wishart ensemble (cornerstone of multivariate data analysis). For uncorrelated Gaussian-distributed data the exact EV spectrum of $XX^T$ is the Marcenko-Pastur law. The spectrum of the time-lagged covariance matrix $C^\tau_{ij} = \frac{1}{T}\sum_t r^i_t r^j_{t-\tau}$ is unknown (results exist for symmetrized lagged correlation matrices, but symmetrization is problematic).
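A minimal sketch of the Marcenko-Pastur law for $C = XX^T/T$ with iid unit-variance entries and Q = T/N ≥ 1: the density is $\rho(\lambda) = \frac{Q}{2\pi\lambda}\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}$ on $[\lambda_-, \lambda_+]$, $\lambda_\pm = (1 \pm 1/\sqrt{Q})^2$. The helper names `mp_edges` and `mp_density` are hypothetical; the simulation compares a Wishart sample with the predicted support.

```python
import numpy as np


def mp_edges(Q):
    """Support [lambda_-, lambda_+] of the Marcenko-Pastur law for Q = T/N >= 1, unit variance."""
    return (1 - 1 / np.sqrt(Q)) ** 2, (1 + 1 / np.sqrt(Q)) ** 2


def mp_density(lam, Q):
    """Marcenko-Pastur density of the eigenvalues of C = X X^T / T (X: N x T, iid unit variance)."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = mp_edges(Q)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    rho[inside] = Q / (2 * np.pi * x) * np.sqrt((hi - x) * (x - lo))
    return rho


rng = np.random.default_rng(4)
N, T = 200, 800                                   # Q = 4
X = rng.standard_normal((N, T))
eigs = np.linalg.eigvalsh(X @ X.T / T)            # Wishart sample spectrum
lo, hi = mp_edges(T / N)
grid = np.linspace(lo, hi, 20001)
rho = mp_density(grid, T / N)
norm = np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(grid))   # trapezoid rule, should be ~1
print(f"support [{lo:.3f}, {hi:.3f}], sample [{eigs.min():.3f}, {eigs.max():.3f}], norm {norm:.4f}")
```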
There is structure in lead-lag relations. The analysis of asymmetric time-lagged correlations forms a fundamental part of finance and econometrics. From a practical point of view: arbitrage of asymmetric lead-lag relationships has been reported for the U.S. stock market (Lo & MacKinlay). The lagged correlation function typically exhibits an asymmetric peak. Why? Information adjustment asymmetries: lead-lag effects are mainly explained by information adjustment asymmetry (Brennan). There is some recent understanding of how the strength of lagged correlations relates to the time shift τ (Kertész).
Eigenvalues of time-lagged covariance matrices. N×T data matrix X: N assets, T observation times, time lag τ. Entries: log-return time series (zero mean, unit variance) of asset i at time t, $r^i_t = \ln S^i_t - \ln S^i_{t-1}$. Time-lagged correlation function: $C^\tau_{ij}(T) \equiv \left\langle (r^i_t - \langle r^i_t\rangle)(r^j_{t-\tau} - \langle r^j_{t-\tau}\rangle)\right\rangle_T = \frac{1}{T}\left(X D_\tau X^T\right)_{ij}$, with $(D_\tau)_{t't} = \delta_{t',t+\tau}$. For τ ≠ 0, $C^\tau$ is not symmetric. Denote the eigenvalues of $C^\tau$ by $\lambda_i \in \mathbb{C}$. They are random variables with a certain distribution but, through the specific construction, $C^\tau$ is not a purely random real asymmetric N×N matrix with iid Gaussian entries. Hence one cannot expect a flat eigenvalue distribution as in the Ginibre-Girko case.
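A minimal sketch of the construction above, assuming N×T NumPy arrays with one row per asset; `standardize` and `lagged_correlation` are hypothetical helper names. The synthetic prices plant a one-step lead of asset 0 over asset 1, so $C^1_{10}$ is large while $C^1_{01}$ is near zero and $C^1$ is clearly not symmetric.

```python
import numpy as np


def standardize(R):
    """Zero-mean, unit-variance rows (one row per asset)."""
    return (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)


def lagged_correlation(X, tau):
    """C^tau_ij = (1/T) sum_t x_i(t) x_j(t - tau), i.e. C^tau = X D_tau X^T / T.

    Only the T - tau overlapping time steps enter the sum.
    """
    T = X.shape[1]
    if tau == 0:
        return X @ X.T / T
    return X[:, tau:] @ X[:, :-tau].T / T


rng = np.random.default_rng(2)
eps = rng.standard_normal((3, 5001))
shocks = eps.copy()
shocks[1, 1:] += eps[0, :-1]                            # asset 1 follows asset 0 one step later
S = 100 * np.exp(np.cumsum(0.01 * shocks, axis=1))      # synthetic price paths S^i_t
returns = np.diff(np.log(S), axis=1)                    # r^i_t = ln S^i_t - ln S^i_{t-1}
X = standardize(returns)
C1 = lagged_correlation(X, 1)
print(np.round(C1, 2))
```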
[Figure: eigenvalue spectra in the complex plane, Im(λ) versus Re(λ), with the marginal densities ρ_x(λ) and ρ_y(λ) of the real and imaginary parts, panels (a)-(h).]
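General arguments. Idea: view the distribution of EVs in the complex plane as electrical charges. In the present case the EV density ρ(z) = ρ(x, y) is then given by the Poisson equation $\rho(x,y) = \frac{1}{4\pi}\Delta\phi(x,y)$, with $\phi(x,y) = \frac{1}{N}\left\langle \ln\det\left[(\mathbb{1}z^* - C_\tau^T)(\mathbb{1}z - C_\tau)\right]\right\rangle_c$, where $\langle\cdot\rangle_c$ is the average over the distribution of X, $P(X) \propto \exp\left(-\frac{N}{2}\mathrm{Tr}(XX^T)\right)$. Expanding the argument of the determinant, $H = \mathbb{1}|z|^2 + C_\tau^T C_\tau - x(C_\tau + C_\tau^T) + iy(C_\tau - C_\tau^T)$, shows that in the cross terms any symmetric (antisymmetric) contribution to $C^\tau_{ij}$ couples only to the real (imaginary) part of z.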
General arguments: consequences. In the limit N → ∞ the potential is a function of the radius $r = \sqrt{x^2 + y^2}$ only, φ(x, y) = φ(r); thus the EV density is a radially symmetric function, $\rho(x,y) = \rho(r) \equiv \frac{1}{2\pi r}\int_S dz\,\rho(z)\,\delta(|z| - r)$. We checked the validity of this up to a 4th-order computation. If ρ(r) is circularly symmetric, the support S of the EV spectrum is limited to a disk of radius $r_{max}$, which can be computed.
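To compare data with a radially symmetric prediction, the empirical radial density can be estimated by histogramming |λ| and dividing by the annulus area, $\rho(r) \approx \frac{\#\{r \le |\lambda| < r + dr\}}{N\,2\pi r\,dr}$. A minimal sketch, assuming uncorrelated Gaussian data and τ = 1; `radial_density` and the bin count are illustrative choices.

```python
import numpy as np


def radial_density(eigs, n_bins=25):
    """Empirical radial eigenvalue density rho(r): histogram of |lambda| divided by 2*pi*r*dr.

    Normalized so that sum(2*pi*r*rho(r)*dr) = 1 over the bins.
    """
    r = np.abs(eigs)
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r.max() * (1 + 1e-9)))
    mid = 0.5 * (edges[1:] + edges[:-1])           # bin centers
    dr = np.diff(edges)
    rho = counts / (len(r) * 2 * np.pi * mid * dr)
    return mid, rho, dr


rng = np.random.default_rng(5)
N, T, tau = 200, 400, 1
X = rng.standard_normal((N, T))
C_tau = X[:, tau:] @ X[:, :-tau].T / T             # lagged matrix of pure noise
eigs = np.linalg.eigvals(C_tau)                    # complex, in conjugate pairs
r_mid, rho, dr = radial_density(eigs)
```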
Idea: determine ρ(r) from its radial symmetry. The EV density of the symmetric problem is obtained from the well-known relation $\rho_S(x) = \sum_n \delta(x - x_n) = \frac{1}{\pi}\lim_{\epsilon\to 0^+}\mathrm{Im}\,G_S(x - i\epsilon)$. Idea: use the inverse Abel transform to determine the radially symmetric density ρ(r), starting from $\rho_S(\sqrt{2}x) = 2\int_x^\infty \frac{\rho(r)\,r}{\sqrt{r^2 - x^2}}\,dr$. This reconstructs the EV spectrum exactly for N → ∞ via the inverse Abel transform, and thus via the cuts of the Green's function of the symmetric problem: $\rho(r) = -\frac{1}{\pi^2}\int_r^\infty \frac{d}{dx}\left[\lim_{\epsilon\to 0^+}\mathrm{Im}\,G^\tau_S(\sqrt{2}x - i\epsilon)\right]\frac{dx}{\sqrt{x^2 - r^2}}$. Expect discrepancies in finite data due to finite-size effects.
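The inversion step can be carried out numerically. A minimal sketch of a generic inverse Abel transform $\rho(r) = -\frac{1}{\pi}\int_r^\infty \frac{F'(x)}{\sqrt{x^2 - r^2}}dx$, where in the setting above $F(x) = \rho_S(\sqrt{2}x)$ would come from $\mathrm{Im}\,G_S$. The substitution $x = \sqrt{r^2 + s^2}$ turns $dx/\sqrt{x^2 - r^2}$ into $ds/x$ and removes the singularity at x = r. The function `inverse_abel`, the grid, and the finite-difference step are assumptions; the check uses the exact pair $\rho(r) = e^{-r^2}/\pi \leftrightarrow F(x) = e^{-x^2}/\sqrt{\pi}$.

```python
import numpy as np


def inverse_abel(F, r, s_max=10.0, n=4001, h=1e-5):
    """rho(r) = -(1/pi) * int_r^inf F'(x) / sqrt(x^2 - r^2) dx, for r > 0.

    F: vectorized callable (the projected density); F' is taken by central differences.
    Uses x = sqrt(r^2 + s^2), so the integrand becomes F'(x) / x on s in [0, s_max].
    """
    if r <= 0:
        raise ValueError("r must be positive")
    s = np.linspace(0.0, s_max, n)
    x = np.sqrt(r ** 2 + s ** 2)
    dF = (F(x + h) - F(x - h)) / (2 * h)           # central-difference derivative F'(x)
    integrand = dF / x
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))  # trapezoid rule
    return -integral / np.pi


def F_gauss(x):
    """Abel projection of the 2D density exp(-r^2)/pi."""
    return np.exp(-x ** 2) / np.sqrt(np.pi)


print(inverse_abel(F_gauss, 0.5), np.exp(-0.25) / np.pi)
```

Another exact pair for orientation: the uniform disk density 1/π on r ≤ 1 projects onto the semicircle (2/π)√(1 − x²), so the inverse transform links a Wigner-type density of the symmetric problem to a flat disk.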
Application to lagged correlation matrices. The Green's function G(z) of the symmetrized problem $C^\tau_S = \frac{1}{2T}X(D_\tau + D_\tau^T)X^T$ is given by the implicit solution of (Burda) $\frac{1}{Q^3}z^2G^4(z) - \frac{2}{Q^2}\left(\frac{1}{Q} - 1\right)zG^3(z) - \frac{1}{Q}\left(z^2 - \left(\frac{1}{Q} - 1\right)^2\right)G^2(z) + 2\left(\frac{1}{Q} - 1\right)zG(z) + 2 - \frac{1}{Q} = 0$, with Q = T/N the information-to-noise ratio. The remaining integral is hard to solve in general but easy to solve numerically. For Q = 1 an exact solution is possible: $\rho_{Q=1}(r)$ is a closed-form expression in Gamma functions and generalized hypergeometric functions.
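A numerical sanity check of the quartic (a sketch, not the authors' procedure): draw uncorrelated Gaussian data, compute the empirical Green's function $G(z) = \frac{1}{N}\sum_n \frac{1}{z - \lambda_n}$ of the symmetrized lagged matrix at a point off the real axis, and plug it into the left-hand side of the equation; for large N and T the residual should be small. The names `burda_residual` and `symmetrized_lagged`, the test point z, and the sizes are assumptions. The second moment of the spectrum should also be close to 1/(2Q).

```python
import numpy as np


def burda_residual(G, z, Q):
    """Left-hand side of the quartic for G(z); it vanishes for N, T -> infinity at fixed Q = T/N."""
    a = 1.0 / Q
    return (a ** 3 * z ** 2 * G ** 4
            - 2 * a ** 2 * (a - 1) * z * G ** 3
            - a * (z ** 2 - (a - 1) ** 2) * G ** 2
            + 2 * (a - 1) * z * G
            + 2 - a)


def symmetrized_lagged(X, tau):
    """C_S = X (D_tau + D_tau^T) X^T / (2T) = (C^tau + C^tau^T) / 2."""
    T = X.shape[1]
    C = X[:, tau:] @ X[:, :-tau].T / T
    return 0.5 * (C + C.T)


rng = np.random.default_rng(3)
N, T, tau = 500, 1000, 1                   # Q = 2
X = rng.standard_normal((N, T))
lam = np.linalg.eigvalsh(symmetrized_lagged(X, tau))
z = 0.3 + 0.8j
G_emp = np.mean(1.0 / (z - lam))           # empirical (1/N) Tr (z - C_S)^{-1}
residual = burda_residual(G_emp, z, T / N)
print(abs(residual), np.mean(lam ** 2))
```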
[Figure: radial eigenvalue density ρ(r) versus r, panels (a)-(c), with curves labeled Re(λ) and Im(λ).]
Summary (theory): we used a well-known analogy to classical electrostatics to show the radial symmetry of the potential for lagged correlation matrices; introduced a novel method to calculate the exact radial eigenvalue density via the inverse Abel transform; and used existing results for symmetrized lagged correlation matrices as input to this method to arrive at $\rho_Q(r)$. With knowledge of the purely random spectrum, deviations from it indicate arbitrageable structure in financial data.
Application to markets. Financial data: 5-min data of the S&P 500, Jan 2 2002 to Apr 2 2004; empirical lagged correlation matrices $C^\tau$, τ = 5, 3 min; after data cleaning, X contains N = 400 time series at T = 4472 observations. From X we construct two surrogate data sets. Market-mode-removed data $X^{res}$: the largest eigenvalue is the market mode, and the market return (movement of the index) is $r^m_t = \sum_{j=1}^N v_{1j} r^j_t$ at equal times, τ = 0. Regress as in CAPM, $r^i_t = \alpha_i + \beta_i r^m_t + \epsilon^i_t$, and set $X^{res}_{it} \equiv \epsilon^i_t$. Randomly reshuffled data $X^{scr}$: a random permutation of all elements of X. This destroys correlation structures but keeps the same distribution of values as in the original data.
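A minimal sketch of the two surrogates, assuming an N×T array of returns with one row per asset; `market_mode_residuals` and `reshuffled` are hypothetical names, and the per-asset regression uses ordinary least squares via `np.linalg.lstsq`.

```python
import numpy as np


def market_mode_residuals(X):
    """X^res: residuals eps_i(t) of r_i(t) = alpha_i + beta_i r^m(t) + eps_i(t),
    where r^m(t) = sum_j v_1j r_j(t) uses the top eigenvector of the tau = 0 matrix."""
    N, T = X.shape
    C0 = X @ X.T / T                           # equal-time (tau = 0) matrix
    _, V = np.linalg.eigh(C0)
    v1 = V[:, -1]                              # eigenvector of the largest eigenvalue (market mode)
    r_m = v1 @ X                               # market return r^m_t
    A = np.column_stack([np.ones(T), r_m])     # regressors for alpha_i + beta_i r^m_t
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)   # coef has shape (2, N)
    return X - (A @ coef).T, r_m


def reshuffled(X, rng):
    """X^scr: random permutation of all elements of X (same values, no correlation structure)."""
    return rng.permutation(X.ravel()).reshape(X.shape)


rng = np.random.default_rng(6)
N, T = 50, 1000
X = 0.5 * rng.standard_normal(T) + rng.standard_normal((N, T))   # common "market" factor + noise
X_res, r_m = market_mode_residuals(X)
X_scr = reshuffled(X, rng)
```

The sign of the eigenvector is arbitrary, but the regression absorbs it, so the residuals do not depend on it.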
[Figure: eigenvalues in the complex plane, Im(λ) versus Re(λ), panels (a)-(d), including the spectrum of the market-removed residuals (λ^res).]
Eigenvalue spectra. [Figure: marginal densities ρ_x(λ), ρ_y(λ) of Re(λ) and Im(λ), and the radial density ρ(r) versus r, theory compared with data.] Eigenvalues lying outside the random regime can be confidently associated with specific non-random structures.
Interpretation of deviating eigenvalues. Deviations from the theoretical pure-random prediction indicate correlation structure in the data across time. Which assets participate in the eigenvector associated with a deviating eigenvalue? Define the inverse participation ratio (IPR) of the eigenvectors $u_i$, $\mathrm{IPR}(u_i) \equiv \sum_{k=1}^N u_{ik}^4$, where $u_{ik}$ are the entries of the i-th eigenvector. The IPR shows to what extent each of the N = 400 assets contributes to eigenvector $u_i$: a small IPR means the assets contribute equally, a large IPR means a few assets dominate the eigenvector (for a normalized eigenvector the IPR lies between 1/N and 1).
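A minimal sketch of the IPR. It uses $|u_{ik}|$ instead of $u_{ik}$ so that the complex eigenvectors of a non-symmetric $C^\tau$ are handled too; that choice and the function name `ipr` are assumptions. The two limiting cases are a flat vector (IPR = 1/N) and a single dominant asset (IPR = 1).

```python
import numpy as np


def ipr(u) -> float:
    """Inverse participation ratio sum_k |u_k|^4 of a vector normalized to sum_k |u_k|^2 = 1."""
    p = np.abs(np.asarray(u)) ** 2
    p = p / p.sum()                 # enforce the normalization sum_k |u_k|^2 = 1
    return float(np.sum(p ** 2))    # sum_k |u_k|^4


N = 400
ipr_flat = ipr(np.ones(N))          # every asset contributes equally -> 1/N
ipr_single = ipr(np.eye(N)[0])      # one asset dominates -> 1
print(ipr_flat, ipr_single)
```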
Indication for group structure. [Figure: IPR(u_ik) versus λ_i for the original data (a), and IPR(u^res_ik) versus λ^res_i for the residuals (b).] The largest EV has a relatively small IPR. The IPRs for the residuals $X^{res}$ are larger for the deviating eigenvalues, indicating group structure in the lagged correlations.
Sector organization in time-lagged data. RMT with τ = 0: the eigenvectors $u_i$ of large eigenvalues are known to be associated with the sector organization of markets. Label sectors with s and define $\delta_{sk} = 1$ if stock k belongs to sector s and $\delta_{sk} = 0$ otherwise. To visualize the influence of each sector s on a given eigenvector i, use $I_{si} \equiv \frac{1}{N_s}\sum_{k=1}^N \delta_{sk}\,u_{ik}^2$, where $N_s$ is the number of stocks in sector s.
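A minimal sketch of $I_{si}$ for one eigenvector, assuming each stock carries one sector label (for example its GICS code) and using $|u_{ik}|^2$ so that complex eigenvectors work as well; `sector_contributions` and the six-stock labels are hypothetical.

```python
import numpy as np


def sector_contributions(u, sectors):
    """I_s = (1/N_s) * sum_k delta_sk |u_k|^2 for each sector label s."""
    p = np.abs(np.asarray(u)) ** 2
    p = p / p.sum()                                   # unit-norm eigenvector
    labels = np.asarray(sectors)
    # the mean over the members of sector s equals (1/N_s) * sum_k delta_sk p_k
    return {s.item(): float(p[labels == s].mean()) for s in np.unique(labels)}


sectors = [10, 10, 20, 20, 20, 55]                    # hypothetical GICS codes of 6 stocks
I_flat = sector_contributions(np.ones(6), sectors)    # no sector preferred: all 1/6
I_group = sector_contributions([0, 0, 1, 1, 1, 0], sectors)   # concentrated in sector 20
print(I_flat, I_group)
```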
Sector                   GICS code   No. of stocks N_s
Energy                   10          22
Materials                15          27
Industrials              20          44
Consumer Discretionary   25          63
Consumer Staples         30          35
Healthcare               35          40
Financials               40          71
Information Technology   45          63
Telecommunication        50          11
Utilities                55          24
Table 1: Global Industry Classification Standard (GICS) for the 10 main sectors in the S&P 500, from www.standardandpoors.com
[Figure: contribution of each sector, I_si, versus GICS code for the eigenvectors of several leading eigenvalues, for the original data and for the market-removed residuals (λ^res).]
Lead-lag networks
Some observations. Structures in the network view are associated with deviating eigenvalues via the decomposition of the lagged correlation matrix in its right eigenvectors, $C_{\lambda_i} = U\Lambda_i U^{-1}$, where $\Lambda_i = \mathrm{diag}(0, \dots, \lambda_i, \dots, 0)$ is a diagonal matrix with only one nonzero entry, at the position associated with eigenvalue $\lambda_i$. For none of the eigenvalues did we find an indication that the leading stocks are the ones with the highest market capitalization (as would be implied by the work of Lo and MacKinlay). The close correspondence between the $C_{\lambda_i}$ and the different sectors visible in the network representation of the data confirms the validity of the analysis. Large negative eigenvalues are associated with time-lagged anticorrelations between various sectors.
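A minimal sketch of the decomposition, assuming $C^\tau$ is diagonalizable: since $\Lambda_i$ keeps a single eigenvalue, $U\Lambda_i U^{-1} = \lambda_i\,u_i w_i^T$ with $u_i$ the i-th column of U and $w_i^T$ the i-th row of $U^{-1}$, and the components sum back to the full matrix. The function name `eigen_components` and the random test matrix are illustrative.

```python
import numpy as np


def eigen_components(C):
    """Eigenvalues lambda_i and matrices C_{lambda_i} = U Lambda_i U^{-1} = lambda_i u_i w_i^T.

    u_i: i-th right eigenvector (column of U); w_i^T: i-th row of U^{-1}.
    Assumes C is diagonalizable.
    """
    lam, U = np.linalg.eig(C)
    U_inv = np.linalg.inv(U)
    comps = [lam[i] * np.outer(U[:, i], U_inv[i, :]) for i in range(len(lam))]
    return lam, comps


rng = np.random.default_rng(7)
C = rng.standard_normal((6, 6))          # stand-in for a non-symmetric lagged matrix
lam, comps = eigen_components(C)
P0 = comps[0] / lam[0]                   # rank-one spectral projector u_0 w_0^T
```

For a complex-conjugate pair λ, λ*, the sum of the two components is real, which is the natural object to draw as a lead-lag network.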
Time dependence of largest eigenvalues. [Figure: Im(λ_1) and abs(λ_1) versus window index n (a), and Im(λ^res_1) and abs(λ^res_1) versus n (b), for two values of Q, with the predicted support.] The largest eigenvalues of $C^\tau(T_i)$ for consecutive, non-overlapping time periods $T_i$ show clear deviations from the predicted support down to Q ≈ 1.25. Even though the noise increases drastically for low Q, non-random structures are present at very short time scales. Results are similar for eigenvalues of other rank.
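A minimal sketch of the windowed computation, assuming an N×T return array and τ ≥ 1: each consecutive, non-overlapping window of length `window` gives one lagged matrix with Q = window/N, and the eigenvalue of largest modulus is recorded. Re-standardizing each window and the name `leading_eigenvalue_per_window` are assumptions.

```python
import numpy as np


def leading_eigenvalue_per_window(X, tau, window):
    """Largest-modulus eigenvalue of C^tau on consecutive, non-overlapping windows T_n.

    Each window is re-standardized; Q per window is window / N. Requires tau >= 1.
    """
    N, T = X.shape
    out = []
    for start in range(0, T - window + 1, window):
        W = X[:, start:start + window]
        W = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
        C = W[:, tau:] @ W[:, :-tau].T / window
        lam = np.linalg.eigvals(C)
        out.append(lam[np.argmax(np.abs(lam))])
    return np.array(out, dtype=complex)


rng = np.random.default_rng(8)
X = rng.standard_normal((20, 1000))
lam1 = leading_eigenvalue_per_window(X, tau=1, window=200)   # 5 windows with Q = 10
print(np.round(np.abs(lam1), 3))
```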
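Predictive power? [Figure: c(T_1, T_{1+d}) and c^res(T_1, T_{1+d}) versus d, for two values of Q.] The quantity $c(T_n,T_m) = \frac{\left\langle \left(C^\tau_{ij}(T_n) - \langle C^\tau_{ij}(T_n)\rangle_{ij}\right)\left(C^\tau_{ij}(T_m) - \langle C^\tau_{ij}(T_m)\rangle_{ij}\right)\right\rangle_{ij}}{\sigma_{T_n}\sigma_{T_m}}$ is the correlation of the matrix elements of lagged correlation matrices from non-overlapping periods $T_n$ and $T_m$, averaged over all matrix elements; $\sigma_{T_n}$ is the standard deviation of the elements of $C^\tau(T_n)$. Significance band: 1/4. The correlations are extremely significant and increase with Q.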
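A minimal sketch of $c(T_n, T_m)$ as a Pearson correlation over all matrix elements; `element_correlation` and `lagged` are hypothetical helper names. For two independent noise windows the value fluctuates around zero with a spread of roughly one over the square root of the number of elements.

```python
import numpy as np


def element_correlation(C_n, C_m) -> float:
    """c(T_n, T_m): correlation of the matrix elements of two lagged correlation matrices."""
    a = np.ravel(C_n) - np.mean(C_n)
    b = np.ravel(C_m) - np.mean(C_m)
    return float(np.mean(a * b) / (a.std() * b.std()))


def lagged(X, tau=1):
    """C^tau = (1/T) X D_tau X^T for standardized rows of X."""
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return X[:, tau:] @ X[:, :-tau].T / X.shape[1]


rng = np.random.default_rng(9)
X = rng.standard_normal((50, 2000))
c_noise = element_correlation(lagged(X[:, :1000]), lagged(X[:, 1000:]))  # T_n, T_m non-overlapping
print(c_noise)
```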
A strategy or two. There are two ways. (1) Cleaning of the matrices, in analogy to the method known for τ = 0: the cleaned matrix at time $T_n$ allows a strongly improved estimation of future lagged correlation matrices at times $T_m > T_n$. Fix the time scale $T_n$, compute the complex spectrum of the data, replace all EVs within the circle by a number, and rotate back. (2) Find significant correlations and identify temporally stable ones; use the movement of the leading instrument as a trading signal at the appropriate time scale; use pseudo-sector information for a non-standard stat-arb strategy.
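A minimal sketch of the cleaning recipe, assuming a diagonalizable $C^\tau$. The radius `r_max` and the replacement value `fill` are free inputs here; using the largest |λ| of a reshuffled copy of the data as the noise radius is a hypothetical proxy, not necessarily the choice made in the talk.

```python
import numpy as np


def clean_lagged(C, r_max, fill=0.0):
    """Replace all eigenvalues with |lambda| <= r_max by `fill` and rotate back.

    A real fill keeps conjugate pairs intact, so the result is real up to rounding.
    Assumes C is diagonalizable.
    """
    lam, U = np.linalg.eig(C)
    lam_clean = np.where(np.abs(lam) <= r_max, fill, lam)
    C_clean = U @ np.diag(lam_clean) @ np.linalg.inv(U)
    return np.real(C_clean)


rng = np.random.default_rng(10)
N, T = 40, 400
X = rng.standard_normal((N, T))
C_tau = X[:, 1:] @ X[:, :-1].T / T
# Hypothetical noise radius: largest |lambda| of a reshuffled copy of the same data.
X_scr = rng.permutation(X.ravel()).reshape(N, T)
r_noise = np.abs(np.linalg.eigvals(X_scr[:, 1:] @ X_scr[:, :-1].T / T)).max()
C_clean = clean_lagged(C_tau, r_noise)
```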
Conclusion: established random matrix theory for lagged cross-correlation matrices (Abel transformation). This opens perhaps the most straightforward way to a systematic quantitative search for lead-lag structures in a noisy world. Significant parts of lagged correlation matrices may be predictable from measurements of past (non-overlapping) periods. The identified correlations can be used in a multitude of strategies. As a by-product one gets a pseudo-sectorization of the trading universe (clustering in the lead-lag network of residuals). Discussed the issue of Q dependence (information-to-noise ratio).