Point Process Models for Multivariate High-Frequency Irregularly Spaced Data

Stephen Crowley

Abstract. Definitions from the theory of point processes are recalled. Models of intensity function parametrization and maximum likelihood estimation from data are explored. Closed-form log-likelihood expressions are given for the Hawkes (univariate and multivariate) process, the Autoregressive Conditional Duration (ACD) model, with both exponential and Weibull distributed errors, and a hybrid model combining the ACD and the Hawkes models. Diurnal, or daily, adjustment of the deterministic predictable part of the intensity variation via piecewise polynomial splines is discussed. Data from the symbol SPY on three different electronic markets is used to estimate model parameters and generate illustrative plots. The parameters were estimated without diurnal adjustments; a repeat of the analysis with adjustments is due in a future version of this article. The connection of the Hawkes process to quantum theory is briefly mentioned. The Hawkes process with a Weibull kernel is also briefly mentioned and will be explored more in the future.

Table of contents

Definitions
Point Processes and Intensities
Stochastic Integrals
The Exponential Autoregressive Conditional Duration (EACD) Model
The Weibull-ACD Model
The Hawkes Process
Linear Self-Exciting Processes
The Hawkes(1) Model
Maximum Likelihood Estimation
The Hawkes Process in Quantum Theory
The Hawkes Process Having a Weibull Kernel
Combining the ACD and Hawkes Models
Multivariate Hawkes Models
The Compensator
Log-Likelihood
Numerical Methods
The Nelder-Mead Algorithm
Starting Points for Optimizing the Hawkes Process of Order P
Examples
Millisecond Resolution Trade Sequences
Adjusting for the Deterministic Daily Intensity Variation
Univariate Hawkes model fit to SPY (SPDR S&P 500 ETF Trust)
Multivariate SPY Data for [...]
Multivariate SPY Data for [...]
Bibliography

1. Definitions

Point Processes and Intensities.

Consider a $K$-dimensional multivariate point process. Let $N^k_t$ denote the counting process associated with the $k$-th point process, which is simply the number of events which have occurred by time $t$. Let $\mathcal{F}_t$ denote the filtration of the pooled process $N_t$ of $K$ point processes, consisting of the set $t^k_0<t^k_1<t^k_2<\dots<t^k_i<\dots$ denoting the history of arrival times of each event type associated with the $k=1\dots K$ point processes. At time $t$, the most recent arrival time will be denoted $t^k_{N^k_t}$. A process is said to be simple if no points occur at the same time, that is, there are no zero-length durations. The counting process can be represented as a sum of Heaviside step functions $\theta(t)$, where $\theta(t)=0$ for $t<0$ and $\theta(t)=1$ for $t\ge 0$:

\[ N^k_t=\sum_{t^k_i\le t}\theta(t-t^k_i) \tag{1} \]

The conditional intensity function gives the conditional probability per unit time that an event of type $k$ occurs in the next instant:

\[ \lambda^k(t|\mathcal{F}_t)=\lim_{\Delta t\to 0}\frac{\Pr\left(N^k_{t+\Delta t}-N^k_t>0\,|\,\mathcal{F}_t\right)}{\Delta t} \tag{2} \]

For small values of $\Delta t$ we have

\[ \lambda^k(t|\mathcal{F}_t)\,\Delta t=E\left(N^k_{t+\Delta t}-N^k_t\,|\,\mathcal{F}_t\right)+o(\Delta t) \tag{3} \]

so that

\[ E\left(\left(N^k_{t+\Delta t}-N^k_t\right)-\lambda^k(t|\mathcal{F}_t)\,\Delta t\right)=o(\Delta t) \tag{4} \]

and (4) will be uncorrelated with the past of $\mathcal{F}_t$ as $\Delta t\to 0$. Next consider

\[ \lim_{\Delta t\to 0}\sum_{j=1}^{(s_1-s_0)/\Delta t}\left[\left(N^k_{s_0+j\Delta t}-N^k_{s_0+(j-1)\Delta t}\right)-\lambda^k(s_0+j\Delta t|\mathcal{F}_t)\,\Delta t\right]=\left(N^k_{s_1}-N^k_{s_0}\right)-\int_{s_0}^{s_1}\lambda^k(t|\mathcal{F}_t)\,dt \tag{5} \]

which will be uncorrelated with $\mathcal{F}_{s_0}$, that is

\[ E\left(\int_{s_0}^{s_1}\lambda^k(t|\mathcal{F}_t)\,dt\right)=N^k_{s_1}-N^k_{s_0} \tag{6} \]

The integrated intensity function is known as the compensator, or more precisely, the $\mathcal{F}_t$-compensator, and will be denoted by

\[ \Lambda^k(s_0,s_1)=\int_{s_0}^{s_1}\lambda^k(t|\mathcal{F}_t)\,dt \tag{7} \]

Let $x^k_i=t^k_i-t^k_{i-1}$ denote the time interval, or duration, between the $i$-th and $(i-1)$-th arrival times. The $\mathcal{F}_t$-conditional survivor function for the $k$-th process is given by

\[ S_k(x^k_i)=\Pr\left(X^k_i>x^k_i\,|\,\mathcal{F}_{t^k_{i-1}+\tau}\right) \tag{8} \]

where $X^k_i$ denotes the random duration. Let

\[ \tilde{E}^k_i=\int_{t^k_{i-1}}^{t^k_i}\lambda^k(t|\mathcal{F}_t)\,dt=\Lambda^k(t^k_{i-1},t^k_i) \]

then, provided the survivor function is absolutely continuous with respect to Lebesgue measure (which is an assumption that needs to be verified, usually by graphical tests), we have

\[ S_k(x^k_i)=e^{-\int_{t^k_{i-1}}^{t^k_i}\lambda^k(t|\mathcal{F}_t)\,dt}=e^{-\tilde{E}^k_i} \tag{9} \]
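As a quick sanity check of (7) and (9), consider the simplest possible case, a homogeneous Poisson process with constant intensity $\lambda$. This special case is chosen here purely for illustration and is not one of the models estimated in this paper:

\[ \Lambda(t_{i-1},t_i)=\int_{t_{i-1}}^{t_i}\lambda\,dt=\lambda x_i,\qquad S(x_i)=e^{-\lambda x_i}=e^{-\tilde{E}_i},\qquad \tilde{E}_i=\lambda x_i \]

The durations $x_i$ of such a process are exponential with mean $1/\lambda$, so $\tilde{E}_i=\lambda x_i$ is exponential with unit mean. The same reasoning, with a time-varying intensity, is what makes the integrated intensity between consecutive events usable as a residual.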

Under this assumption, $\tilde{E}^k_{N(t)}$ is an i.i.d. exponential random variable with unit mean and variance. Since $E(\tilde{E}^k_{N(t)})=1$, the random variable

\[ E^k_{N(t)}=1-\tilde{E}^k_{N(t)} \tag{10} \]

has zero mean and unit variance. Positive values of $E_{N(t)}$ indicate that the path of the conditional intensity function $\lambda^k(t|\mathcal{F}_t)$ under-predicted the number of events in the time interval, and negative values of $E_{N(t)}$ indicate that $\lambda^k(t|\mathcal{F}_t)$ over-predicted the number of events in the interval. In this way, (10) can be interpreted as a generalized residual. The backwards recurrence time, given by

\[ U^{(k)}(t)=t-t_{N^k(t)} \tag{11} \]

increases linearly with jumps back to 0 at each new point.

Stochastic Integrals. The stochastic Stieltjes integral [1, 2.1][8, 2.2] of a measurable process $X(t)$, having either locally bounded or nonnegative sample paths, with respect to $N^k$ exists, and for each $t$ we have

\[ \int_{(0,t]}X(s)\,dN^k_s=\sum_{i\ge 1}\theta(t-t^k_i)\,X(t^k_i) \tag{12} \]

The Exponential Autoregressive Conditional Duration (EACD) Model. Letting $p_i$ be the family of conditional probability density functions for the arrival time $t_i$, the log-likelihood of the (exponential) ACD model can be expressed in terms of the conditional densities or intensities as [11]

\[ \begin{aligned} \ln\mathcal{L}(\{t_i\}_{i=0\dots n}) &=\sum_{i=1}^{n}\log p_i(t_i|t_0,\dots,t_{i-1})\\ &=\sum_{i=1}^{n}\log\lambda(t_i|i-1,t_0,\dots,t_{i-1})-\int_{t_0}^{t_n}\lambda(u|N_u,t_0,\dots,t_{N_u})\,du\\ &=\sum_{i=1}^{n}\left[\log\lambda(t_i|i-1,t_0,\dots,t_{i-1})-\int_{t_{i-1}}^{t_i}\lambda(u|N_u,t_0,\dots,t_{N_u})\,du\right]\\ &=\sum_{i=1}^{n}\left[\log\lambda(t_i|i-1,t_0,\dots,t_{i-1})-\tilde{E}_i\right]\\ &=\int_{t_0}^{t_n}\ln\lambda(t)\,dN_t-\int_{t_0}^{t_n}\lambda(t)\,dt \end{aligned} \tag{13} \]

We will see that $\lambda$ can be parameterized in terms of the arrival times $t_i$ as

\[ \lambda(t|N_t,t_1,\dots,t_{N_t})=\omega+\sum_{i=1}^{N_t}\pi_i\left(t_{N_t+1-i}-t_{N_t-i}\right) \tag{14} \]

so that the impact of a duration between successive events depends upon the number of intervening events. Let $x_i=t_i-t_{i-1}$ be the interval between consecutive arrival times; then $x_i$ is a sequence of durations or waiting times. The conditional expectation of $x_i$ given its past is then given directly by

\[ E(x_i|x_{i-1},\dots,x_1)=\psi_i(x_{i-1},\dots,x_1;\theta)=\psi_i \tag{15} \]

The ACD models are then those that consist of the assumption

\[ x_i=\psi_i\varepsilon_i \tag{16} \]

where $\varepsilon_i$ is independently and identically distributed with density $p(\varepsilon;\phi)$ and where $\theta$ and $\phi$ are variation free. ACD processes are limited to the univariate setting, but later we will see that this model can be combined with a Hawkes process in a multivariate framework. [6]

The conditional intensity of an ACD model can be expressed in general as

\[ \lambda(t|N_t,t_1,\dots,t_{N_t})=\lambda_0\left(\frac{t-t_{N_t}}{\psi_{N_t+1}}\right)\frac{1}{\psi_{N_t+1}} \tag{17} \]

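where $\lambda_0(t)$ is a deterministic baseline hazard, so that the past history influences the conditional intensity by both a multiplicative effect and a shift in the baseline hazard. This is called an accelerated failure time model since past information influences the rate at which time passes. The simplest model is the exponential ACD, which assumes that the durations are conditionally exponential so that the baseline hazard is $\lambda_0(t)=1$ and the conditional intensity is

\[ \lambda(t|x_{N_t},\dots,x_1)=\frac{1}{\psi_{N_t+1}} \tag{18} \]

The compensator for consecutive events of the ACD model in the case of constant baseline intensity $\lambda_0(t)=1$ is simply

\[ \tilde{E}_i=\Lambda(t_{i-1},t_i)=\int_{t_{i-1}}^{t_i}\lambda(t|x_i,\dots,x_1)\,dt=\int_{t_{i-1}}^{t_i}\frac{1}{\psi_{N_t+1}}\,dt=\int_{t_{i-1}}^{t_i}\frac{1}{\psi_i}\,dt=\frac{t_i-t_{i-1}}{\psi_i}=\frac{x_i}{\psi_i} \tag{19} \]

where $x_i=t_i-t_{i-1}$. A general model without limited memory is referred to as ACD($m$,$q$), where $m$ and $q$ refer to the order of the lags so that there are $(m+q+1)$ parameters:

\[ \psi_i=\omega+\sum_{j=1}^{m}\alpha_j x_{i-j}+\sum_{j=1}^{q}\beta_j\psi_{i-j} \tag{20} \]

where $\omega\ge 0$, $\alpha_j\ge 0$, $\beta_j\ge 0$ and $\psi_i=\dfrac{\omega}{1-\sum_{j=1}^{q}\beta_j}$ for $i=1\dots\max(m,q)$, so the conditional intensity is then written

\[ \lambda(t|x_{N_t},\dots,x_1)=\frac{1}{\omega+\sum_{j=1}^{m}\alpha_j x_{N_t+1-j}+\sum_{j=1}^{q}\beta_j\psi_{N_t+1-j}} \tag{21} \]

The log-likelihood for the ACD($m$,$q$) model is then written in terms of the durations $x_i=t_i-t_{i-1}$ as

\[ \begin{aligned} \ln\mathcal{L}(\{x_i\}_{i=1,\dots,n}) &=\sum_{i=1}^{n}\left[\ln\lambda(t_i|i-1,t_0,\dots,t_{i-1})-\tilde{E}_i\right]\\ &=\sum_{i=1}^{n}\ln\left(\frac{S(x_i)}{\psi_i}\right) =\sum_{i=1}^{n}\ln\left(\frac{e^{-\tilde{E}_i}}{\psi_i}\right) =\sum_{i=1}^{n}\ln\left(\frac{e^{-x_i/\psi_i}}{\psi_i}\right)\\ &=\sum_{i=1}^{n}\left[\ln\left(\frac{1}{\psi_i}\right)-\frac{x_i}{\psi_i}\right] \end{aligned} \tag{22} \]

An ACD process is stationary if

\[ \sum_{j=1}^{m}\alpha_j+\sum_{j=1}^{q}\beta_j<1 \tag{23} \]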
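The recursion (20), the compensator (19) and the log-likelihood (22) can be sketched for the ACD(1,1) case as follows. This is a minimal illustration and not the code used for the estimates in this paper: the function name `eacd11_loglik`, the made-up durations, and the choice to start $\psi$ at the sample mean duration are all assumptions of the sketch.

```python
import math


def eacd11_loglik(x, omega, alpha, beta):
    """Return (log-likelihood, residuals) of an exponential ACD(1,1) model.

    x is a list of durations x_i = t_i - t_{i-1}.
    """
    psi = sum(x) / len(x)  # assumption: initialise psi at the sample mean duration
    loglik = 0.0
    residuals = []
    for i, xi in enumerate(x):
        if i > 0:
            # recursion (20) with m = q = 1
            psi = omega + alpha * x[i - 1] + beta * psi
        e = xi / psi                      # compensator (19)
        loglik += -math.log(psi) - e      # summand of (22)
        residuals.append(e)
    return loglik, residuals


durations = [0.8, 1.3, 0.4, 2.1, 0.9, 1.1]  # made-up durations in seconds
ll, res = eacd11_loglik(durations, omega=0.2, alpha=0.1, beta=0.7)
print(ll, res)
```

The returned residuals are the $\tilde{E}_i$ of (19), so the same call provides the inputs for a goodness-of-fit check on their mean, variance and autocorrelation.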

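When the stationarity condition (23) holds, the unconditional mean exists and is given by

\[ \mu=E[x_i]=\frac{\omega}{1-\left(\sum_{j=1}^{m}\alpha_j+\sum_{j=1}^{q}\beta_j\right)} \tag{24} \]

The goodness of fit can be checked by testing that the residuals $\tilde{E}_i$ have mean and variance equal to 1 and no autocorrelation.

The Weibull-ACD Model. The WACD (Weibull-ACD) model extends the EACD model by assuming a Weibull distribution for the residuals $\varepsilon_i$ in (16) instead of an exponential. We have the intensity given by

\[ \lambda(t|x_{N_t},\dots,x_1)=\left(\frac{\Gamma\left(1+\frac{1}{\gamma}\right)}{\psi_{N_t+1}}\right)^{\gamma}\left(t-t_{N_t}\right)^{\gamma-1}\gamma \tag{25} \]

and the log-likelihood by

\[ \ln\mathcal{L}(\{x_i\}_{i=1,\dots,n})=\sum_{i=1}^{n}\left[\ln\left(\frac{\gamma}{x_i}\right)+\gamma\ln\left(\frac{\Gamma\left(1+\frac{1}{\gamma}\right)x_i}{\psi_i}\right)-\left(\frac{\Gamma\left(1+\frac{1}{\gamma}\right)x_i}{\psi_i}\right)^{\gamma}\right] \tag{26} \]

The goodness of fit can be checked by testing that the mean of $\tilde{E}_i$ is equal to 1 and by graphically checking what is known as a Weibull plot. If the model is a good fit, the empirical curve will be near the straight line. In the example shown below, the Weibull does better than the exponential but is still not a great fit.

Figure 1. Weibull probability plot (probability against data) for the WACD(1,1) model fit to SPY on INET on [...]

The Hawkes Process.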

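Linear Self-Exciting Processes. A (univariate) linear self-exciting (counting) process $N_t$ is one that can be expressed as [15][7][14][3]

\[ \lambda(t)=\lambda_0(t)\kappa+\int_{-\infty}^{t}\nu(t-s)\,dN_s=\lambda_0(t)\kappa+\sum_{t_i<t}\nu(t-t_i) \tag{27} \]

where $\lambda_0(t)$ is a deterministic base intensity, see (77), $\nu:\mathbb{R}_+\to\mathbb{R}_+$ expresses the positive influence of past events $t_i$ on the current value of the intensity process, and $\kappa$ takes the place of the $\lambda_0$ constant in the referenced papers. The (exponential) Hawkes process of order $P$ is a linear self-exciting process defined by the exponential kernel

\[ \nu(t)=\sum_{j=1}^{P}\alpha_j e^{-\beta_j t} \tag{28} \]

so that the intensity is written as

\[ \begin{aligned} \lambda(t) &=\lambda_0(t)\kappa+\int_{0}^{t}\sum_{j=1}^{P}\alpha_j e^{-\beta_j(t-s)}\,dN_s\\ &=\lambda_0(t)\kappa+\sum_{i=0}^{N_t-1}\sum_{j=1}^{P}\alpha_j e^{-\beta_j(t-t_i)}\\ &=\lambda_0(t)\kappa+\sum_{j=1}^{P}\alpha_j\sum_{i=0}^{N_t-1}e^{-\beta_j(t-t_i)}\\ &=\lambda_0(t)\kappa+\sum_{j=1}^{P}\alpha_j B_j(N_t) \end{aligned} \tag{29} \]

where $B_j(i)$ is given recursively by

\[ B_j(i)=\sum_{k=0}^{i-1}e^{-\beta_j(t-t_k)}=\left(1+B_j(i-1)\right)e^{-\beta_j(t-t_{i-1})} \tag{30} \]

with $B_j(i-1)$ evaluated at the time $t_{i-1}$ of the most recent event. A univariate Hawkes process is stationary if

\[ \sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}<1 \tag{31} \]

If a Hawkes process is stationary then the unconditional mean is

\[ \mu=E[\lambda(t)]=\frac{\lambda_0}{1-\int_{0}^{\infty}\nu(t)\,dt}=\frac{\lambda_0}{1-\int_{0}^{\infty}\sum_{j=1}^{P}\alpha_j e^{-\beta_j t}\,dt}=\frac{\lambda_0}{1-\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}} \tag{32} \]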
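The recursion (30) and the stationarity check (31) with the mean (32) can be sketched in a few lines. This is an illustrative sketch rather than the implementation behind the results in this paper; it assumes a constant baseline $\lambda_0(t)=1$ so that the baseline term in (29) is just $\kappa$, and the function names are hypothetical.

```python
import math


def hawkes_intensity(t, events, kappa, alphas, betas):
    """Intensity (29) at time t, using the recursion (30) instead of a double sum.

    events must be sorted; only events strictly before t contribute.
    """
    lam = kappa
    for a, b in zip(alphas, betas):
        B = 0.0        # B_j(0) = 0: no earlier events yet
        prev = None    # time of the most recent event processed
        for tk in events:
            if tk >= t:
                break
            if prev is not None:
                # carry the sum over earlier events forward to the event at tk
                B = (1.0 + B) * math.exp(-b * (tk - prev))
            prev = tk
        if prev is not None:
            # final step of (30): decay from the last event up to time t
            B = (1.0 + B) * math.exp(-b * (t - prev))
        lam += a * B
    return lam


def branching_ratio(alphas, betas):
    """Left-hand side of the stationarity condition (31)."""
    return sum(a / b for a, b in zip(alphas, betas))


def unconditional_mean(lambda0, alphas, betas):
    """Unconditional mean intensity (32); only defined when (31) holds."""
    ratio = branching_ratio(alphas, betas)
    if ratio >= 1.0:
        raise ValueError("process is not stationary: condition (31) fails")
    return lambda0 / (1.0 - ratio)


events = [0.0, 0.5, 0.7, 1.9]  # made-up arrival times in seconds
print(hawkes_intensity(2.0, events, 0.4, [0.3, 0.1], [1.5, 0.2]))
print(unconditional_mean(0.5, [1.0], [2.0]))
```

The recursion costs one exponential per event and per kernel term, whereas evaluating the double sum in (29) directly at every event grows quadratically with the number of events.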

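For consecutive events, we have the compensator (7)

\[ \begin{aligned} \Lambda(t_{i-1},t_i) &=\int_{t_{i-1}}^{t_i}\lambda(t)\,dt\\ &=\int_{t_{i-1}}^{t_i}\left(\lambda_0(t)+\sum_{j=1}^{P}\alpha_j B_j(N_t)\right)dt\\ &=\int_{t_{i-1}}^{t_i}\lambda_0(s)\,ds+\sum_{k=0}^{i-1}\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(e^{-\beta_j(t_{i-1}-t_k)}-e^{-\beta_j(t_i-t_k)}\right)\\ &=\int_{t_{i-1}}^{t_i}\lambda_0(s)\,ds+\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_i-t_{i-1})}\right)A_j(i-1) \end{aligned} \tag{33} \]

where there is the recursion

\[ A_j(i)=\sum_{t_k\le t_i}e^{-\beta_j(t_i-t_k)}=1+e^{-\beta_j(t_i-t_{i-1})}A_j(i-1) \tag{34} \]

with $A_j(0)=0$. If $\lambda_0(t)=\lambda_0$ is constant then (33) simplifies to

\[ \begin{aligned} \Lambda(t_{i-1},t_i) &=(t_i-t_{i-1})\lambda_0+\sum_{k=0}^{i-1}\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(e^{-\beta_j(t_{i-1}-t_k)}-e^{-\beta_j(t_i-t_k)}\right)\\ &=(t_i-t_{i-1})\lambda_0+\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_i-t_{i-1})}\right)A_j(i-1) \end{aligned} \tag{35} \]

Similarly, another parameterization is given by

\[ \begin{aligned} \Lambda(t_{i-1},t_i) &=\int_{t_{i-1}}^{t_i}\kappa\lambda_0(s)\,ds+\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_i-t_{i-1})}\right)A_j(i-1)\\ &=\kappa\int_{t_{i-1}}^{t_i}\lambda_0(s)\,ds+\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_i-t_{i-1})}\right)A_j(i-1)\\ &=\kappa\Lambda_0(t_{i-1},t_i)+\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_i-t_{i-1})}\right)A_j(i-1) \end{aligned} \tag{36} \]

where $\kappa$ scales the predetermined baseline intensity $\lambda_0(s)$. In this parameterization the intensity is also scaled by $\kappa$:

\[ \lambda(t)=\kappa\lambda_0(t)+\sum_{j=1}^{P}\alpha_j B_j(N_t) \tag{37} \]

This makes it possible to precompute the deterministic part of the compensator, $\Lambda_0(t_{i-1},t_i)=\int_{t_{i-1}}^{t_i}\lambda_0(s)\,ds$.

The Hawkes(1) Model. The simplest case occurs when the baseline intensity $\lambda_0(t)$ is constant and $P=1$, where we have

\[ \lambda(t)=\lambda_0+\sum_{t_i<t}\alpha e^{-\beta(t-t_i)} \tag{38} \]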
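Combining the general log-likelihood form (13) with the compensator recursion (34) and (35) gives a single pass over the events. The following is a hedged sketch under the assumption of a constant baseline $\lambda_0$; the function name `hawkes_loglik` and the example timestamps are hypothetical, and the code is not the implementation used for the results in this paper. It uses the fact that the intensity just before the event at $t_i$ equals $\lambda_0+\sum_j\alpha_j e^{-\beta_j x_i}A_j(i-1)$.

```python
import math


def hawkes_loglik(events, lambda0, alphas, betas):
    """Return (log-likelihood, compensators) for a Hawkes process of order P.

    events: sorted arrival times. compensators[i-1] is Lambda(t_{i-1}, t_i) from (35).
    """
    A = [0.0] * len(alphas)   # A_j(0) = 0: nothing has happened yet
    loglik = 0.0
    compensators = []
    previous = None
    for t in events:
        x = 0.0 if previous is None else t - previous
        intensity = lambda0       # intensity just before the event at t
        integral = lambda0 * x    # compensator over (previous, t]
        for j, (a, b) in enumerate(zip(alphas, betas)):
            decay = math.exp(-b * x)
            intensity += a * decay * A[j]
            integral += (a / b) * (1.0 - decay) * A[j]
            A[j] = 1.0 + decay * A[j]   # recursion (34)
        loglik += math.log(intensity) - integral   # summand of (13)
        if previous is not None:
            compensators.append(integral)
        previous = t
    return loglik, compensators


ev = [0.0, 0.4, 0.9, 1.0, 2.5]  # made-up arrival times in seconds
ll, comps = hawkes_loglik(ev, 0.3, [0.5, 0.2], [2.0, 0.7])
print(ll, comps)
```

The list of compensators returned alongside the log-likelihood is the residual series $\tilde{E}_i$, so one evaluation serves both estimation and diagnostics.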

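The Hawkes(1) process (38) has the unconditional mean

\[ E[\lambda(t)]=\frac{\lambda_0}{1-\frac{\alpha}{\beta}} \tag{39} \]

Maximum Likelihood Estimation. The log-likelihood of a simple point process is written as

\[ \ln\mathcal{L}\left(N(t)_{t\in[0,T]}\right)=\int_{0}^{T}\left(1-\lambda(s)\right)ds+\int_{0}^{T}\ln\lambda(s)\,dN_s=T-\int_{0}^{T}\lambda(s)\,ds+\int_{0}^{T}\ln\lambda(s)\,dN_s \tag{40} \]

which in the case of the Hawkes model of order $P$ can be explicitly written [13] as

\[ \begin{aligned} \ln\mathcal{L}(\{t_i\}_{i=1\dots n}) &=T-\Lambda(0,T)+\sum_{i=1}^{n}\ln\lambda(t_i)\\ &=T+\sum_{i=1}^{n}\left[\ln\lambda(t_i)-\Lambda(t_{i-1},t_i)\right]\\ &=T-\Lambda(0,T)+\sum_{i=1}^{n}\ln\left(\kappa\lambda_0(t_i)+\sum_{j=1}^{P}\sum_{k=1}^{i-1}\alpha_j e^{-\beta_j(t_i-t_k)}\right)\\ &=T-\Lambda(0,T)+\sum_{i=1}^{n}\ln\left(\kappa\lambda_0(t_i)+\sum_{j=1}^{P}\alpha_j R_j(i)\right)\\ &=T-\int_{0}^{T}\kappa\lambda_0(s)\,ds-\sum_{i=1}^{n}\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_n-t_i)}\right)+\sum_{i=1}^{n}\ln\left(\kappa\lambda_0(t_i)+\sum_{j=1}^{P}\alpha_j R_j(i)\right) \end{aligned} \tag{41} \]

where $T=t_n$ and we have the recursion [12]

\[ R_j(i)=\sum_{k=1}^{i-1}e^{-\beta_j(t_i-t_k)}=e^{-\beta_j(t_i-t_{i-1})}\left(1+R_j(i-1)\right) \tag{42} \]

If we have constant baseline intensity $\lambda_0(t)=1$ then the log-likelihood can be written

\[ \ln\mathcal{L}(\{t_i\}_{i=1\dots n})=T-\kappa T-\sum_{i=1}^{n}\sum_{j=1}^{P}\frac{\alpha_j}{\beta_j}\left(1-e^{-\beta_j(t_n-t_i)}\right)+\sum_{i=1}^{n}\ln\left(\kappa+\sum_{j=1}^{P}\alpha_j R_j(i)\right) \tag{43} \]

Note that it was necessary to shift each $t_i$ by $t_1$ so that $t_1=0$ and $t_n=T$. Also note that $T$ is just an additive constant which does not vary with the parameters, so for the purposes of estimation it can be removed from the equation.

The Hawkes Process in Quantum Theory.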

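The Hawkes process arises in quantum theory by considering feedback via continuous measurements, where the quantum analog of a self-exciting point process is a source of irreversibility whose strength is controlled by the rate of detections from that source. [16]

The Hawkes Process Having a Weibull Kernel. The exponential kernel of the Hawkes process can be replaced with that of a Weibull kernel. [10, 6.3] Recall that the intensity is defined by (27)

\[ \lambda(t)=\lambda_0(t)\kappa+\int_{-\infty}^{t}\nu(t-s)\,dN_s=\lambda_0(t)\kappa+\sum_{t_i<t}\nu(t-t_i) \tag{44} \]

where the exponential kernel $\nu(t)=\sum_{j=1}^{P}\alpha_j e^{-\beta_j t}$ is replaced by the Weibull kernel

\[ \nu(t)=\sum_{j=1}^{P}\alpha_j\frac{\kappa_j}{\omega_j}\left(\frac{t}{\omega_j}\right)^{\kappa_j-1}e^{-\left(\frac{t}{\omega_j}\right)^{\kappa_j}} \tag{45} \]

so the Weibull-Hawkes intensity is written as

\[ \begin{aligned} \lambda(t) &=\lambda_0(t)\kappa+\int_{0}^{t}\sum_{j=1}^{P}\alpha_j\frac{\kappa_j}{\omega_j}\left(\frac{t-s}{\omega_j}\right)^{\kappa_j-1}e^{-\left(\frac{t-s}{\omega_j}\right)^{\kappa_j}}dN_s\\ &=\lambda_0(t)\kappa+\sum_{i=0}^{N_t-1}\sum_{j=1}^{P}\alpha_j\frac{\kappa_j}{\omega_j}\left(\frac{t-t_i}{\omega_j}\right)^{\kappa_j-1}e^{-\left(\frac{t-t_i}{\omega_j}\right)^{\kappa_j}} \end{aligned} \tag{46} \]

1.5. Combining the ACD and Hawkes Models. The ACD and Hawkes models can be combined to provide a model for intraday volatility. [2] Let

\[ \lambda(t)=\lambda_0(t)+\frac{1}{\psi_{N_t}}+\int_{0}^{t}\nu(t-s)\,dN_s \tag{47} \]

where $\lambda_0(t)$ is the deterministic baseline intensity (77), the ACD (20) part is

\[ \psi_i=\omega+\sum_{j=1}^{m}\alpha_j x_{i-j}+\sum_{j=1}^{q}\beta_j\psi_{i-j} \tag{48} \]

and the Hawkes part has the exponential kernel (28)

\[ \nu(t)=\sum_{j=1}^{P}\gamma_j e^{-\phi_j t} \tag{49} \]

so that

\[ \begin{aligned} \int_{0}^{t}\nu(t-s)\,dN_s &=\int_{0}^{t}\sum_{j=1}^{P}\gamma_j e^{-\phi_j(t-s)}\,dN_s =\sum_{k=0}^{N_t-1}\nu(t-t_k)\\ &=\sum_{k=0}^{N_t-1}\sum_{j=1}^{P}\gamma_j e^{-\phi_j(t-t_k)} =\sum_{j=1}^{P}\gamma_j\sum_{k=0}^{N_t-1}e^{-\phi_j(t-t_k)}\\ &=\sum_{j=1}^{P}\gamma_j B_j(N_t) \end{aligned} \tag{50} \]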

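where we have replaced $\alpha$ by $\gamma$ and $\beta$ by $\phi$ in the Hawkes part so that the parameter names do not conflict with the ACD part, where $\alpha$ and $\beta$ are also used as parameter names. The Hawkes part of the intensity has a recursive structure similar to that of the compensator. Let

\[ B_j(i)=\sum_{k=0}^{i-1}e^{-\phi_j(t-t_k)}=\left(1+B_j(i-1)\right)e^{-\phi_j(t-t_{i-1})} \tag{51} \]

where $B_j(0)=0$. Then we have

\[ \lambda(t)=\lambda_0(t)+\frac{1}{\omega+\sum_{j=1}^{m}\alpha_j x_{N_t-j}+\sum_{j=1}^{q}\beta_j\psi_{N_t-j}}+\sum_{j=1}^{P}\gamma_j B_j(N_t) \tag{52} \]

The log-likelihood for this hybrid model can be written as

\[ \ln\mathcal{L}(\{t_i\}_{i=1,\dots,n})=\sum_{i=1}^{n}\left[\ln\lambda(t_i)-\int_{t_{i-1}}^{t_i}\lambda(t)\,dt\right]=\sum_{i=1}^{n}\left(\ln\lambda(t_i)-\Lambda(t_{i-1},t_i)\right)=\sum_{i=1}^{n}\left(\ln\lambda(t_i)-\tilde{E}_i\right) \tag{53} \]

By direct calculation, combining (19) and (33), and letting $x_i=t_i-t_{i-1}$, we have the compensator

\[ \begin{aligned} \tilde{E}_i=\Lambda(t_{i-1},t_i) &=\int_{t_{i-1}}^{t_i}\lambda(t)\,dt\\ &=\int_{t_{i-1}}^{t_i}\left(\lambda_0(t)+\frac{1}{\psi_{N_t}}+\int_{0}^{t}\nu(t-s)\,dN_s\right)dt\\ &=\frac{x_i}{\psi_i}+\int_{t_{i-1}}^{t_i}\left(\lambda_0(t)+\int_{0}^{t}\nu(t-s)\,dN_s\right)dt\\ &=\int_{t_{i-1}}^{t_i}\lambda_0(t)\,dt+\frac{x_i}{\psi_i}+\sum_{k=0}^{i-1}\sum_{j=1}^{P}\frac{\gamma_j}{\phi_j}\left(e^{-\phi_j(t_{i-1}-t_k)}-e^{-\phi_j(t_i-t_k)}\right)\\ &=\int_{t_{i-1}}^{t_i}\lambda_0(t)\,dt+\frac{x_i}{\psi_i}+\sum_{j=1}^{P}\frac{\gamma_j}{\phi_j}\left(1-e^{-\phi_j x_i}\right)A_j(i-1) \end{aligned} \tag{54} \]

where $\psi_i$ is defined by (48) and $A_j(i)$ is given by (34), that is

\[ A_j(i)=1+e^{-\phi_j x_i}A_j(i-1) \tag{55} \]

so that (53) can be written as

\[ \begin{aligned} \ln\mathcal{L}(\{t_i\}_{i=0,\dots,n}) &=\sum_{i=1}^{n}\left(\ln\lambda(t_i)-\tilde{E}_i\right)\\ &=\sum_{i=1}^{n}\left[\ln\lambda(t_i)-\int_{t_{i-1}}^{t_i}\lambda_0(t)\,dt-\frac{x_i}{\psi_i}-\sum_{j=1}^{P}\frac{\gamma_j}{\phi_j}\left(1-e^{-\phi_j x_i}\right)A_j(i-1)\right]\\ &=\sum_{i=1}^{n}\left[\ln\left(\lambda_0(t_i)+\frac{1}{\psi_i}+\sum_{j=1}^{P}\sum_{k=0}^{i-1}\gamma_j e^{-\phi_j(t_i-t_k)}\right)-\int_{t_{i-1}}^{t_i}\lambda_0(t)\,dt-\frac{x_i}{\psi_i}-\sum_{j=1}^{P}\frac{\gamma_j}{\phi_j}\left(1-e^{-\phi_j x_i}\right)A_j(i-1)\right]\\ &=\sum_{i=1}^{n}\left[\ln\left(\lambda_0(t_i)+\frac{1}{\psi_i}+\sum_{j=1}^{P}\gamma_j B_j(i)\right)-\int_{t_{i-1}}^{t_i}\lambda_0(t)\,dt-\frac{x_i}{\psi_i}-\sum_{j=1}^{P}\frac{\gamma_j}{\phi_j}\left(1-e^{-\phi_j x_i}\right)A_j(i-1)\right] \end{aligned} \tag{56} \]

Multivariate Hawkes Models. Let $M\in\mathbb{N}$ and let $\{(t^m_i)\}_{m=1,\dots,M}$ be an $M$-dimensional point process. The associated counting process will be denoted $N_t=(N^1_t,\dots,N^M_t)$. A multivariate Hawkes process [7][5][9] is defined with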

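intensities $\lambda^m(t)$, $m=1\dots M$, given by

\[ \begin{aligned} \lambda^m(t) &=\lambda^m_0(t)\kappa^m+\sum_{n=1}^{M}\int_{0}^{t}\sum_{j=1}^{P}\alpha^{m,n}_j e^{-\beta^{m,n}_j(t-s)}\,dN^n_s\\ &=\lambda^m_0(t)\kappa^m+\sum_{n=1}^{M}\sum_{j=1}^{P}\sum_{t^n_k<t}\alpha^{m,n}_j e^{-\beta^{m,n}_j(t-t^n_k)}\\ &=\lambda^m_0(t)\kappa^m+\sum_{n=1}^{M}\sum_{j=1}^{P}\alpha^{m,n}_j\sum_{k=0}^{N^n_t-1}e^{-\beta^{m,n}_j(t-t^n_k)}\\ &=\lambda^m_0(t)\kappa^m+\sum_{n=1}^{M}\sum_{j=1}^{P}\alpha^{m,n}_j B^{m,n}_j(N^n_t) \end{aligned} \tag{57} \]

where in this parameterization $\kappa$ is a vector which scales the baseline intensities, in this case specified by piecewise polynomial splines (77). We can write $B^{m,n}_j(i)$ recursively:

\[ B^{m,n}_j(i)=\sum_{k=0}^{i-1}e^{-\beta^{m,n}_j(t-t^n_k)}=\left(1+B^{m,n}_j(i-1)\right)e^{-\beta^{m,n}_j(t-t^n_{i-1})} \tag{58} \]

In the simplest version, with $P=1$ and $\lambda^m_0(t)=1$ constant, we have

\[ \begin{aligned} \lambda^m(t) &=\kappa^m+\sum_{n=1}^{M}\int_{0}^{t}\alpha^{m,n}e^{-\beta^{m,n}(t-s)}\,dN^n_s\\ &=\kappa^m+\sum_{n=1}^{M}\sum_{k=0}^{N^n_t-1}\alpha^{m,n}e^{-\beta^{m,n}(t-t^n_k)}\\ &=\kappa^m+\sum_{n=1}^{M}\alpha^{m,n}\sum_{k=0}^{N^n_t-1}e^{-\beta^{m,n}(t-t^n_k)}\\ &=\kappa^m+\sum_{n=1}^{M}\alpha^{m,n}B^{m,n}_1(N^n_t) \end{aligned} \tag{59} \]

Rewriting (59) in vectorial notation, we have

\[ \lambda(t)=\kappa+\int_{0}^{t}G(t-s)\,dN_s \tag{60} \]

where

\[ G(t)=\left(\alpha^{m,n}e^{-\beta^{m,n}t}\right)_{m,n=1\dots M} \tag{61} \]

Assuming stationarity gives $E[\lambda(t)]=\mu$, a constant vector, and thus

\[ \mu=\left(I-\int_{0}^{\infty}G(u)\,du\right)^{-1}\kappa=\left(I-\left(\frac{\alpha^{m,n}}{\beta^{m,n}}\right)\right)^{-1}\kappa=\left(I-\Gamma\right)^{-1}\kappa \tag{62} \]

A sufficient condition for a multivariate Hawkes process to be stationary is that the spectral radius of the branching matrix

\[ \Gamma=\int_{0}^{\infty}G(s)\,ds=\left(\frac{\alpha^{m,n}}{\beta^{m,n}}\right) \tag{63} \]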

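be strictly less than 1. The spectral radius of the matrix $G$ is defined as

\[ \rho(G)=\max_{a\in S(G)}|a| \tag{64} \]

where $S(G)$ denotes the set of eigenvalues of $G$.

The Compensator. The compensator of the $m$-th coordinate of a multivariate Hawkes process between two consecutive events $t^m_{i-1}$ and $t^m_i$ of type $m$ is given by

\[ \begin{aligned} \Lambda^m(t^m_{i-1},t^m_i) &=\int_{t^m_{i-1}}^{t^m_i}\lambda^m(s)\,ds\\ &=\kappa^m\int_{t^m_{i-1}}^{t^m_i}\lambda^m_0(s)\,ds +\sum_{n=1}^{M}\sum_{j=1}^{P}\sum_{t^n_k<t^m_{i-1}}\frac{\alpha^{m,n}_j}{\beta^{m,n}_j}\left[e^{-\beta^{m,n}_j(t^m_{i-1}-t^n_k)}-e^{-\beta^{m,n}_j(t^m_i-t^n_k)}\right]\\ &\quad+\sum_{n=1}^{M}\sum_{j=1}^{P}\sum_{t^m_{i-1}\le t^n_k<t^m_i}\frac{\alpha^{m,n}_j}{\beta^{m,n}_j}\left[1-e^{-\beta^{m,n}_j(t^m_i-t^n_k)}\right] \end{aligned} \tag{65} \]

To save a considerable amount of computational complexity, note that we have the recursion

\[ A^{m,n}_j(i)=\sum_{t^n_k<t^m_i}e^{-\beta^{m,n}_j(t^m_i-t^n_k)}=e^{-\beta^{m,n}_j(t^m_i-t^m_{i-1})}A^{m,n}_j(i-1)+\sum_{t^m_{i-1}\le t^n_k<t^m_i}e^{-\beta^{m,n}_j(t^m_i-t^n_k)} \tag{66} \]

and rewrite (65) as

\[ \begin{aligned} \Lambda^m(t^m_{i-1},t^m_i) &=\kappa^m\int_{t^m_{i-1}}^{t^m_i}\lambda^m_0(s)\,ds+\sum_{n=1}^{M}\sum_{j=1}^{P}\int_{t^m_{i-1}}^{t^m_i}\sum_{t^n_k<s}\alpha^{m,n}_j e^{-\beta^{m,n}_j(s-t^n_k)}\,ds\\ &=\kappa^m\int_{t^m_{i-1}}^{t^m_i}\lambda^m_0(s)\,ds+\sum_{n=1}^{M}\sum_{j=1}^{P}\frac{\alpha^{m,n}_j}{\beta^{m,n}_j}\left[\left(1-e^{-\beta^{m,n}_j(t^m_i-t^m_{i-1})}\right)A^{m,n}_j(i-1)+\sum_{t^m_{i-1}\le t^n_k<t^m_i}\left(1-e^{-\beta^{m,n}_j(t^m_i-t^n_k)}\right)\right] \end{aligned} \tag{67} \]

where we have the initial conditions $A^{m,n}_j(0)=0$.

Log-Likelihood. The log-likelihood of the multivariate Hawkes process can be computed as the sum of the log-likelihoods for each coordinate. Let

\[ \ln\mathcal{L}(\{t_i\}_{i=1,\dots,N_T})=\sum_{m=1}^{M}\ln\mathcal{L}^m(\{t_i\}) \tag{68} \]

where each term is defined by

\[ \ln\mathcal{L}^m(\{t_i\})=\int_{0}^{T}\left(1-\lambda^m(s)\right)ds+\int_{0}^{T}\ln\lambda^m(s)\,dN^m_s \tag{69} \]

which in this case can be written as

\[ \begin{aligned} \ln\mathcal{L}^m(\{t_i\}) &=T-\Lambda^m(0,T)+\sum_{i=1}^{N_T}z^m_i\ln\left(\lambda^m_0(t_i)\kappa^m+\sum_{n=1}^{M}\sum_{j=1}^{P}\sum_{t^n_k<t_i}\alpha^{m,n}_j e^{-\beta^{m,n}_j(t_i-t^n_k)}\right)\\ &=T-\Lambda^m(0,T)+\sum_{i=1}^{N^m_T}\ln\left(\lambda^m_0(t^m_i)\kappa^m+\sum_{n=1}^{M}\sum_{j=1}^{P}\sum_{t^n_k<t^m_i}\alpha^{m,n}_j e^{-\beta^{m,n}_j(t^m_i-t^n_k)}\right) \end{aligned} \tag{70} \]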
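Before evaluating the likelihood at a candidate parameter set, the stationarity condition on the branching matrix (63) and (64) can be checked numerically. The sketch below is one way to do this with the standard library only and is an assumption of this example, not the method used in this paper: because $\Gamma$ has non-negative entries, its spectral radius is an eigenvalue, so power iteration on $\Gamma+I$ converges to $\rho(\Gamma)+1$. With NumPy available, `numpy.linalg.eigvals` would be the more usual tool. The function names and example matrices are hypothetical.

```python
def branching_matrix(alpha, beta):
    """Gamma from (63) for order P = 1: entrywise ratio alpha[m][n] / beta[m][n]."""
    return [[a / b for a, b in zip(arow, brow)] for arow, brow in zip(alpha, beta)]


def spectral_radius(gamma, iterations=1000):
    """Spectral radius (64) of an entrywise non-negative square matrix.

    Power iteration is applied to gamma + I, whose dominant eigenvalue is
    rho(gamma) + 1; the shift avoids oscillation for periodic matrices.
    """
    n = len(gamma)
    shifted = [[gamma[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
               for r in range(n)]
    v = [1.0] * n
    estimate = 1.0
    for _ in range(iterations):
        w = [sum(shifted[r][c] * v[c] for c in range(n)) for r in range(n)]
        estimate = max(w)               # v is kept normalised so that max(v) == 1
        v = [wi / estimate for wi in w]
    return estimate - 1.0


def is_stationary(alpha, beta):
    """Sufficient condition for stationarity: rho(Gamma) < 1."""
    return spectral_radius(branching_matrix(alpha, beta)) < 1.0


alpha = [[1.0, 0.4], [0.4, 1.0]]   # made-up excitation parameters
beta = [[2.0, 2.0], [2.0, 2.0]]    # made-up decay parameters
print(spectral_radius(branching_matrix(alpha, beta)), is_stationary(alpha, beta))
```

Rejecting non-stationary candidates early, for example by returning a large penalty from the objective, is one way to keep a derivative-free optimizer inside the region where the unconditional mean (62) exists.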

In (70) we again have $t_{N_T}=T$, and $z^m_i$ is the indicator

\[ z^m_i=\begin{cases}1 & \text{if event } t_i \text{ is of type } m\\ 0 & \text{otherwise}\end{cases} \tag{71} \]

and

\[ \Lambda^m(0,T)=\int_{0}^{T}\lambda^m(t)\,dt=\sum_{i=1}^{N^m_T}\Lambda^m(t^m_{i-1},t^m_i) \tag{72} \]

where $\Lambda^m(t^m_{i-1},t^m_i)$ is given by (67). Similar to the one-dimensional case, we have the recursion

\[ R^{m,n}_j(i)=\sum_{t^n_k<t^m_i}e^{-\beta^{m,n}_j(t^m_i-t^n_k)}=\begin{cases} e^{-\beta^{m,n}_j(t^m_i-t^m_{i-1})}R^{m,n}_j(i-1)+\sum_{t^m_{i-1}\le t^n_k<t^m_i}e^{-\beta^{m,n}_j(t^m_i-t^n_k)} & \text{if } m\ne n\\ e^{-\beta^{m,n}_j(t^m_i-t^m_{i-1})}\left(1+R^{m,n}_j(i-1)\right) & \text{if } m=n \end{cases} \tag{73} \]

so that (70) can be rewritten as

\[ \begin{aligned} \ln\mathcal{L}^m(\{t_i\}) &=T-\kappa^m\int_{0}^{T}\lambda^m_0(t)\,dt\\ &\quad-\sum_{i=1}^{N^m_T}\sum_{n=1}^{M}\sum_{j=1}^{P}\frac{\alpha^{m,n}_j}{\beta^{m,n}_j}\left[\left(1-e^{-\beta^{m,n}_j(t^m_i-t^m_{i-1})}\right)A^{m,n}_j(i-1)+\sum_{t^m_{i-1}\le t^n_k<t^m_i}\left(1-e^{-\beta^{m,n}_j(t^m_i-t^n_k)}\right)\right]\\ &\quad+\sum_{i=1}^{N^m_T}\ln\left(\lambda^m_0(t^m_i)\kappa^m+\sum_{n=1}^{M}\sum_{j=1}^{P}\alpha^{m,n}_j R^{m,n}_j(i)\right) \end{aligned} \tag{74} \]

with initial conditions $R^{m,n}_j(0)=0$ and $A^{m,n}_j(0)=0$, where $T=t_N$, $N$ is the number of observations, $M$ is the number of dimensions, and $P$ is the order of the model. Again, $T$ can be dropped from the equation for the purposes of optimization.

2. Numerical Methods

2.1. The Nelder-Mead Algorithm. The Nelder-Mead simplex algorithm [4] was used to optimize the likelihood expressions given above.

Starting Points for Optimizing the Hawkes Process of Order P. A starting point for the optimization of a Hawkes process of order $P$ with an exact unconditional intensity was chosen as the most reasonable starting point, but it is by no means claimed to be the best. Let $x_i=t_i-t_{i-1}$ be the interval between consecutive arrival times as in the ACD model (16). Then set the initial value of $\lambda_0$ to $0.5/E[x_i]$, $\alpha_{1\dots P}=1/P$ and $\beta_{1\dots P}=2$. With these values $\sum_j\alpha_j/\beta_j=0.5$, so by (32) the unconditional mean intensity is $1/E[x_i]$; that is, the starting point reproduces an unconditional mean duration of $E[x_i]$ for the Nelder-Mead algorithm.

3. Examples

3.1. Millisecond Resolution Trade Sequences. The source data has a resolution of milliseconds, but the data is transformed prior to estimation by dividing each time by 1000 so that the unit of time is seconds. Also, trades occurring at the same price within 2ms of each other are dropped from the analysis.
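The preprocessing just described and the starting point for the optimizer can be sketched as follows. Several details here are assumptions of the sketch rather than statements about the code used for this paper: trades are taken to be `(timestamp_ms, price)` pairs, a trade is dropped when it has the same price as the most recently kept trade and arrives within the window of it, the window is passed in as a parameter, times are shifted so that the first kept trade is at zero, and the starting value of $\lambda_0$ is read as $0.5/E[x_i]$.

```python
def preprocess(trades_ms, window_ms):
    """Drop near-duplicate trades and convert millisecond stamps to seconds.

    trades_ms: list of (timestamp_ms, price) sorted by time.
    Returns arrival times in seconds, shifted so the first kept trade is at 0.
    """
    kept = []
    for stamp, price in trades_ms:
        # assumption: compare against the most recently kept trade
        if kept and price == kept[-1][1] and stamp - kept[-1][0] <= window_ms:
            continue
        kept.append((stamp, price))
    origin = kept[0][0]
    return [(stamp - origin) / 1000.0 for stamp, _ in kept]


def starting_point(times, order):
    """Starting parameters for a Hawkes process of the given order P."""
    durations = [b - a for a, b in zip(times, times[1:])]
    mean_duration = sum(durations) / len(durations)
    return {
        "lambda0": 0.5 / mean_duration,      # assumed reading of the starting value
        "alpha": [1.0 / order] * order,
        "beta": [2.0] * order,
    }


trades = [(1000, 10.0), (1001, 10.0), (1500, 10.0), (1501, 10.1), (3000, 10.1)]  # made-up
times = preprocess(trades, window_ms=2)
start = starting_point(times, order=2)
print(times, start)
```

With these values the branching ratio is 0.5 regardless of the order, so the implied unconditional intensity is always twice the starting $\lambda_0$, which equals the reciprocal of the sample mean duration.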
Further work will be done to find the optimal level of time aggregation. Ideally the data would be timestamped with nanosecond resolution, and this will be done in the future.

Adjusting for the Deterministic Daily Intensity Variation. It is a well known fact that arrival rates (and the closely related volatility) have daily seasonal or diurnal patterns, where trading activity peaks after the open and before the close and has a low around the middle of the day known as the lunchtime effect. In order to account for this we will fit a cubic spline with 14 knot points spaced every 30 minutes, including the opening and closing times of $t=0$ and $t=23400$ respectively, since $t$ has units of seconds. Let the adjusted durations be defined as

\[ \tilde{x}_i=\varphi(t_i)\,x_i \tag{75} \]

where $x_i=t_i-t_{i-1}$ is the unadjusted duration and $\varphi(t_i)$ is a (piecewise polynomial) cubic spline with knot points at $t=zj$ with values given by

\[ \varphi_j=\frac{1}{N_{zj+w}-N_{zj-w}}\sum_{i=N_{zj-w}}^{N_{zj+w}}\frac{1}{x_i}\qquad\text{for } j=0\dots 13 \tag{76} \]

where $z=60\cdot 30=1800$ is the number of seconds in a half-hour and $j=0\dots(6.5\cdot 2)$. The first and last knots have a window of 30 minutes, whereas the interior knot points have a window of 1 hour, looking forward and backward in time by 30 minutes; the first knot point only looks forward and the last knot point only looks backward. This gives us the deterministic baseline intensity, which is a piecewise polynomial cubic spline function whose exact form is not given here since it is not the focus of the paper:

\[ \lambda_0(t)=f(t,\varphi_0,\dots,\varphi_j) \tag{77} \]

The following figure shows the deterministic part of the intensity estimated for SPY on [...] for INET, BATS, and ARCA.

Figure 2. Interpolating spline $\varphi(t)$ for SPY on [...] (series: INET, BATS, ARCA)

Univariate Hawkes model fit to SPY (SPDR S&P 500 ETF Trust). Consider these parameter estimates for the (univariate) Hawkes model of various orders fitted to data generated by trades of the symbol SPY traded on the NASDAQ on Nov 30th, 2012. The unconditional sample mean intensity for this symbol on this day on this exchange was [...] trades per second, where the number of samples is $n=$ [...]. The data presented here has not been deseasonalized; the analysis with deterministic diurnal variation accounted for will be presented in the next section. As can be seen, $P=6$ provides the best likelihood, but a more rigorous method to choose $P$ would be to use an information criterion such as Bayes or Akaike to decide the order. Error bars are not provided, but presumably they could be estimated with derivative information. Note that the closer $E[\lambda(t)]$ is to [...] and $E[\Lambda]$ and $\mathrm{Var}[\Lambda]$ are to 1.0 the better, since $\Lambda$ should be exponentially distributed with mean

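1 by design, and for a Poisson process the mean and variance are equal. The next thing to check is that the $\Lambda$ series is not autocorrelated.

Table 1. Parameters and statistics for the model fitted to data without diurnal adjustments. Columns: $P$, $\kappa$, $\alpha_{1\dots P}$, $\beta_{1\dots P}$, $\ln\mathcal{L}(\{t_i\})$, $E[\lambda(t)]$, $E[\Lambda]$, $\mathrm{Var}[\Lambda]$. Entries: [...]

Table 2. Parameters and statistics for the model fitted to data with diurnal adjustments. Columns: $P$, $\kappa$, $\alpha_{1\dots P}$, $\beta_{1\dots P}$, $\ln\mathcal{L}(\{t_i\})$, $E[\lambda(t)]$, $E[\Lambda]$, $\mathrm{Var}[\Lambda]$. Entries: [...]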

Figure 3. Autocorrelations of $\Lambda(t_{i-1},t_i)$ for $P=1\dots 6$ without diurnal adjustments (six panels, one per order $P$; vertical axis: sample autocorrelation)

Figure 4. Autocorrelations of $\Lambda(t_{i-1},t_i)$ for $P=1\dots 6$ with diurnal adjustments (six panels, one per order $P$; vertical axis: sample autocorrelation)

As can be seen by visually inspecting the autocorrelations, all of the residual series are pretty much acceptable without diurnal adjustments, except for $P=1$, which still had significant leftover autocorrelation. Strangely, it seems that inclusion of the diurnal adjustment significantly worsens the model fit in nearly all cases. I am tempted to suspect something wrong with the code.
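The checks described above, a mean and variance of the compensator series near 1 and no leftover autocorrelation, can be computed with a few lines of code. This is a minimal sketch with a hypothetical function name; the $\pm 1.96/\sqrt{n}$ band is the usual large-sample approximation for the autocorrelation of white noise and is an addition of this sketch, not something taken from the figures.

```python
import math


def residual_diagnostics(residuals, max_lag=5):
    """Return (mean, variance, autocorrelations, band) for a residual series.

    autocorrelations[k-1] is the sample autocorrelation at lag k.
    band is the approximate 95% bound under no autocorrelation.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    ss = sum(c * c for c in centred)
    variance = ss / (n - 1)
    acf = [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / ss
        for lag in range(1, max_lag + 1)
    ]
    band = 1.96 / math.sqrt(n)
    return mean, variance, acf, band


series = [1.0, 2.0] * 4  # made-up, deliberately alternating series
mean, variance, acf, band = residual_diagnostics(series, max_lag=2)
print(mean, variance, acf, band)
```

A well-specified model would give a mean and variance close to 1 and autocorrelations inside the band at every lag; the alternating example series above fails the last check by construction.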

Figure 5. Price history for SPY traded on INET on Oct 22nd, [...]

Figure 6. $x_i=t_i-t_{i-1}$ in blue and $\{\Lambda(t_{i-1},t_i):P=1\}$ in green

Figure 7. $x_i=t_i-t_{i-1}$ in blue and $\{\Lambda(t_{i-1},t_i):P=6\}$ in green

Figure 8. Zoomed-in view of $x_i=t_i-t_{i-1}$ in blue and $\{\Lambda(t_{i-1},t_i):P=6\}$ in green

Multivariate SPY Data for [...]. Consider a 5-dimensional multivariate Hawkes model of order $P=1$ fit to data for SPY from 3 exchanges, INET, BATS, and ARCA, on [...]. Both INET and BATS distinguish buys from sells whereas ARCA does not, hence the model is 5-dimensional: 2 dimensions each for INET and BATS and 1 dimension for ARCA, which will naturally have twice as high a rate as that for buys and sells considered separately. The 5 dimensions are organized as follows:

\[ \begin{pmatrix}\text{BATS Buys}\\ \text{BATS Sells}\\ \text{INET Buys}\\ \text{INET Sells}\\ \text{ARCA Trades}\end{pmatrix} \tag{78} \]

Figure 9. SPY trades on [...] (panels: BATS Buys $\Lambda$, BATS Sells $\Lambda$, INET Buys $\Lambda$, INET Sells $\Lambda$, ARCA Trades $\Lambda$)

We say trades for ARCA because the type set by the data broker is Unknown, indicating that it is unknown whether it is a buyer-initiated or seller-initiated trade. We have the following parameter estimates, where large values of $\alpha$ ($>0.1$) are highlighted in bold.

\[ \alpha=[\dots] \tag{79} \]

\[ \lambda=[\dots] \tag{80} \]

\[ \beta=[\dots] \tag{81} \]

with a log-likelihood score of [...].

Multivariate SPY Data for [...].

Consider the same symbol, SPY, as a 5-dimensional Hawkes process as in 3.1.3, for a different day, on [...], estimated with order $P=2$ for a total of 105 parameters. The $\alpha_j$ coefficients that are $>0.1$ are highlighted in bold. The parameters listed below resulted in a log-likelihood value of [...]. An interesting pattern emerges in the $\beta$ coefficients, which take on an approximate stair-step pattern ranging from 2 to 22. This might be indicative of some fixed-frequency algorithms operating across the different exchanges at approximately 1-second intervals.

\[ \lambda=[\dots] \tag{82} \]

\[ \alpha_1=[\dots] \tag{83} \]

\[ \alpha_2=[\dots] \tag{84} \]

\[ \beta_1=[\dots] \tag{85} \]

\[ \beta_2=[\dots] \tag{86} \]

Bibliography

[1] C.G. Bowsher. Modelling security market events in continuous time: intensity based, multivariate point process models. Journal of Econometrics, 141(2), 2007.
[2] Y. Cai, B. Kim, M. Leduc, K. Szczegot, Y. Yixiao and M. Zamfir. A model for intraday volatility. 2007.
[3] V. Chavez-Demoulin and J.A. McGill. High-frequency financial data modeling using Hawkes processes. Journal of Banking & Finance, 2012.
[4] J.E. Dennis and D.J. Woods. Optimization on microcomputers: the Nelder-Mead simplex algorithm. New Computing Environments: Microcomputers in Large-Scale Computing.
[5] P. Embrechts, T. Liniger and L. Lin. Multivariate Hawkes processes: an application to financial data. Journal of Applied Probability, 48, 2011.
[6] R.F. Engle and J.R. Russell. Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica.
[7] A.G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83-90.
[8] A. Karr. Point processes and their statistical inference, volume 7. CRC.
[9] T.J. Liniger. Multivariate Hawkes Processes. PhD thesis, Swiss Federal Institute of Technology Zurich, 2009.
[10] F. Lorenzen. Analysis of Order Clustering Using High Frequency Data: A Point Process Approach. PhD thesis, Tilburg School of Economics and Management, Finance Department, August 2012.
[11] Y. Ogata. The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1).
[12] Y. Ogata. On Lewis' simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23-31.
[13] T. Ozaki. Maximum likelihood estimation of Hawkes' self-exciting point processes. Annals of the Institute of Statistical Mathematics, 31(1).
[14] H. Shek. Modeling high frequency market order dynamics using self-excited point process. Available at SSRN.
[15] Ioane Muni Toke. An introduction to Hawkes processes with applications to finance.
[16] H.M. Wiseman. Quantum theory of continuous feedback. Physical Review A, 49(3):2133, 1994.