QB research September 4, 06

1-Minute Bin Volume Forecast Model

Overview

In response to strong client demand, Quantitative Brokers (QB) has developed a new algorithm called Closer that specifically targets the daily settlement price as the performance benchmark. (For US equity index products, we use the cash close.) The research and development focus of this implementation has been to deliver more precise volume, volatility, and quote size forecasts at the 1-minute interval level. These forecasts are instrument specific and also incorporate seasonal adjustments for month-end, quarter-end and roll periods.

The Closer algorithm lets traders place a simple order instruction without setting any parameters. The benchmark time is selected automatically, and the execution window is optimally determined by granular volume forecasts.

Our previous volume curve model used -minute bins without a confidence interval. To develop an algorithmic strategy with the settlement price as the benchmark, we had to produce 1-minute bin volume curve forecasts with a confidence interval. This paper describes that research. The data shown here are for CME Treasury futures, but the same principles can be applied to other futures instruments.

Model

Since there are virtually no significant economic events around settlement times, we need not worry about changes to the current model that would push the base curves into negative territory. Our statistical model (Almgren, 2014) says that

$$ v_{lj} = \bar v_j + e_{lj} \qquad (1) $$

where $v_{lj}$ is the observed volume on day $l$ for bin $j$, $\bar v_j$ is the base volume in the $j$th bin, and $e_{lj}$ is the residual error, assumed independent with mean zero.

Let $K = \{(l, j) \in \{1, \dots, N\} \times \{1, \dots, n\} : v_{lj} \text{ not null}\}$. We choose the $n$ parameters $\{\bar v_j\}_{j=1}^{n}$ to minimize the weighted sum across days and bins of the squared errors plus a tension term:

$$ E = \frac{1}{W} \sum_{(l,j) \in K} w_l \left(\bar v_j - v_{lj}\right)^2 + \sum_{j=1}^{n-1} \mu_j \left(\bar v_{j+1} - \bar v_j\right)^2 \qquad (2) $$

where the $w_l \ge 0$ are daily weights and $W = \sum_{l=1}^{N} w_l$.
Rectangular averaging takes $w_l = 1$; exponential averaging takes $w_l = e^{-(N-l)/D}$, where $D$ is the scale length in trading days. The last term in (2) penalizes abrupt changes in the base volume profile between neighboring bins. The $n-1$ parameters $\mu_1, \dots, \mu_{n-1} \ge 0$ set a time scale of averaging, which makes the choice of bin size less critical. In practice we take $\mu_j = \mu$ constant, except at certain bins for which $\mu_j = 0$, for example immediately before and at the end of the settlement time.

+ 646 9-888 sales@quantitativebrokers.com www.quantitativebrokers.com
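Because E in (2) is quadratic in the base volumes, setting its gradient to zero gives a linear system $(A + D^\top M D)\,\bar v = b$, where $A$ is diagonal with $A_{jj} = \frac{1}{W}\sum_{l:(l,j)\in K} w_l$, $b_j = \frac{1}{W}\sum_{l:(l,j)\in K} w_l v_{lj}$, $D$ is the first-difference matrix and $M = \mathrm{diag}(\mu_j)$. The sketch below solves that system with NumPy. The function name `fit_base_curve`, its arguments, and the convention that NaN marks entries outside $K$ are illustrative assumptions, not QB's implementation.

```python
import numpy as np


def fit_base_curve(volumes, mu, scale_days=None):
    """Minimize E in (2): weighted squared errors plus a tension penalty.

    volumes    : (N, n) array, row l = day, column j = bin; NaN marks (l, j) not in K.
    mu         : scalar or length-(n-1) tension weights; mu_j = 0 acts as a breakpoint.
    scale_days : D for exponential weights exp(-(N - l)/D); None = rectangular (w_l = 1).
    """
    volumes = np.asarray(volumes, dtype=float)
    N, n = volumes.shape
    days = np.arange(1, N + 1)
    w = np.ones(N) if scale_days is None else np.exp(-(N - days) / scale_days)
    W = w.sum()

    observed = ~np.isnan(volumes)                      # membership in K
    wk = w[:, None] * observed                         # w_l where (l, j) in K, else 0
    a = wk.sum(axis=0) / W                             # diagonal of the data term
    b = (wk * np.where(observed, volumes, 0.0)).sum(axis=0) / W

    mu = np.broadcast_to(np.asarray(mu, dtype=float), (n - 1,))
    D = np.diff(np.eye(n), axis=0)                     # (D @ v)[j] = v[j+1] - v[j]
    H = np.diag(a) + D.T @ (mu[:, None] * D)           # normal-equations matrix
    return np.linalg.solve(H, b)
```

With `mu = 0` every bin is simply its weighted historical mean; a very large constant `mu` pulls the whole curve toward one level. Zeroing a single entry, for example `mu = np.full(n - 1, 50.0); mu[k] = 0.0` with a user-chosen index `k`, lets the curve jump between bins `k` and `k + 1` as at the settlement-window breakpoints. A bin with no observations that is isolated by breakpoints on both sides makes the system singular.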
Figure 1: volume profile around settlement time (panels: volume without smoothing and volume with smoothing, in thousands, versus minutes relative to settlement time)

Figure 1 shows a one-day estimate around settlement time. Red dots are the estimated volume profile with smoothing and breakpoints, blue dots are the estimate without smoothing, and grey dots are historical volume. We set several breakpoints close to the settlement window, based on the jumps in the unsmoothed (blue) estimate, and smooth the other parts outside it.

Confidence Interval

Our goal is to estimate a confidence interval for the sum of any set of bins.

Single Bin Confidence Interval

Using this model, we obtain the current day's volume curve prediction at the end of the previous day. Let

$$ v_{lj} = \hat v_{lj} + e_{lj} \qquad (3) $$

where $v_{lj}$ is the observed volume on day $l$ for bin $j$, $\hat v_{lj}$ is the predicted volume on day $l$ for bin $j$, and $e_{lj}$ is the residual error, assumed independent with mean zero.
We can calculate a single-bin confidence interval using the residual sample standard deviation:

$$ \sigma_j = \sqrt{\frac{1}{N-1} \sum_{l=1}^{N} e_{lj}^2} $$

where $\sigma_j$ is the residual standard deviation for bin $j$.

Multiple Bins Confidence Interval

To generate a confidence interval for multiple bins, we need a model of the residual sample covariance matrix, because the original covariance matrix has many parameters. Our covariance model covers the 1 hour before and after settlement time, without any consideration of price spikes during the settlement itself. During this window there are virtually no significant economic events, and we exclude the historical days that do have them. We also exclude dates in the last week of each month, since on those dates we observed much higher volume spikes around settlement time than on other dates.

We use the principal components method to model the covariance matrix

$$ \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1m} \\ \vdots & \ddots & \vdots \\ \sigma_{m1} & \cdots & \sigma_{mm} \end{pmatrix} $$

Here $m$ is the number of minutes we look at around settlement time; for example, $m = 120$ means 1 hour before and 1 hour after settlement time. The element in row $i$ and column $j$ is the covariance of bins $i$ and $j$ when $i \ne j$, and the variance of bin $i$ otherwise. Our model focuses on the eigenvalues and eigenvectors of this covariance matrix. Let

$$ \Sigma = \sum_{i=1}^{m} \lambda_i v_i v_i^\top, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m $$

where $\lambda_i$ is the $i$th eigenvalue of $\Sigma$ and $v_i$ is the corresponding eigenvector. If the first $k$ eigenvalues and eigenvectors carry most of the information in the matrix, our estimate of the covariance matrix can be written as

$$ \hat\Sigma = \Sigma_{\text{base}} + \Sigma_{\text{add}}, \qquad \Sigma_{\text{base}} = \sum_{i=1}^{k} \lambda_i v_i v_i^\top $$

Here $\Sigma_{\text{base}}$ is built from the $k$ largest eigenvalues and their eigenvectors. When $k < m$, $\Sigma_{\text{base}}$ is singular and underestimates the variance, so we need to add $\Sigma_{\text{add}}$.
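The residual standard deviation above and the goal of an interval for a sum of bins can be sketched as follows. The variance of a sum of bins is the sum of all covariance entries among those bins, $\mathbf 1^\top \Sigma_{S} \mathbf 1$, so any covariance estimate plugs in directly. The normal quantile is an assumption made here for illustration (the text does not state a distribution), and `residual_std` and `sum_interval` are hypothetical names.

```python
import numpy as np
from statistics import NormalDist


def residual_std(resid):
    """sigma_j = sqrt(sum_l e_lj^2 / (N - 1)) for an (N days, m bins) residual array."""
    resid = np.asarray(resid, dtype=float)
    return np.sqrt((resid ** 2).sum(axis=0) / (resid.shape[0] - 1))


def sum_interval(pred, cov, bins, level=0.9):
    """Two-sided interval for the total predicted volume over the chosen bins.

    Assumes normally distributed residuals (an illustrative assumption).
    Var(sum over bins) = sum of every covariance entry between the chosen bins.
    """
    pred = np.asarray(pred, dtype=float)
    cov = np.asarray(cov, dtype=float)
    idx = np.asarray(bins)
    total = pred[idx].sum()
    sd = np.sqrt(cov[np.ix_(idx, idx)].sum())
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return total - z * sd, total + z * sd
```

Passing a single index with `np.diag(sigma ** 2)` reproduces the single-bin interval $\hat v_{lj} \pm z\,\sigma_j$. Using that diagonal matrix for several bins would ignore the cross-bin covariances, which is why the multiple-bin case needs a full covariance model.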
Let

$$ \Sigma_{\text{add}} = \begin{pmatrix} s_1^2 & s_1 s_2 \rho & s_1 s_3 \rho^2 & \cdots & s_1 s_m \rho^{m-1} \\ s_2 s_1 \rho & s_2^2 & s_2 s_3 \rho & \cdots & s_2 s_m \rho^{m-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ s_m s_1 \rho^{m-1} & s_m s_2 \rho^{m-2} & s_m s_3 \rho^{m-3} & \cdots & s_m^2 \end{pmatrix} $$

In general, for $i < j$ the element in row $i$ and column $j$ is $s_i s_j \rho^{j-i}$, and the matrix is symmetric. In practice, we can set

$$ s_k = \sqrt{\sigma_{kk} - x_k} $$

where $x_k$ is the $k$th diagonal element of $\Sigma_{\text{base}}$, so that $\hat\Sigma$ keeps the sample variances on its diagonal.

Another research question is how to choose a reasonable number of eigenvalues $k$ in the formula above. The easiest approach is to accumulate eigenvalues until their cumulative sum exceeds a fixed percentage (e.g. 80%) of the total, or to read $k$ directly from a scree plot. We plot the eigenvalues of our 1-minute volume curve model error covariance matrix (Figure 2). Figure 2 shows that the first eigenvalue is significantly larger than the others. We also plot the eigenvectors for the three largest eigenvalues (Figure 3). The spike at the tenth element of each curve is due to market volatility during the settlement time.

In addition, we use the Akaike information criterion (AIC) to choose the number of eigenvalues. AIC is widely used in regression analysis to handle the tradeoff between model fit and complexity. Applying AIC to a covariance matrix is more complicated, since it involves the historical sample size. Nadakuditi and Edelman (2008) suggest that the optimal number can be found from

$$ t_k = m\left[\frac{(m-k)\sum_{i=k+1}^{m} \lambda_i^2}{\left(\sum_{i=k+1}^{m} \lambda_i\right)^2} - \left(1 + \frac{m}{l}\right)\right] - \frac{m}{l} $$

$$ k_{\text{op}} = \arg\min_k \left\{ \frac{1}{4}\left(\frac{l}{m}\right)^2 t_k^2 + 2(k+1) \right\} $$

where $\lambda_i$ is the $i$th eigenvalue of $\Sigma$, $m$ is the dimension of the matrix, and $l$ is the number of historical days used. Applying this method, we found that 4 is a safe choice for the number of eigenvalues (Figure 4). With it we obtain the estimated covariance matrix $\hat\Sigma$ (Figure 5).

Test Covariance Matrix

Once we have a model of our $m \times m$ covariance matrix, the basic hypothesis is $H_0: \Sigma = \hat\Sigma$. The statistic is

$$ u = \nu\left(\sum_{i=1}^{m} \left(w_i - \ln w_i\right) - m\right) $$
Figure 2: eigenvalues and cumulative sum percentage of eigenvalues (panels: cumulative percentage versus number of eigenvalues included; eigenvalue versus eigenvalue index)
Figure 3: eigenvectors for the largest three eigenvalues

Figure 4: AIC value based on the number of eigenvalues chosen
Figure 5: comparison of the sample covariance matrix (upper) and the estimated covariance matrix (lower) for ZN, zoomed in to 0-minutes before and after settlement time
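The construction of $\hat\Sigma = \Sigma_{\text{base}} + \Sigma_{\text{add}}$ and the two rules for choosing $k$ described in the multiple-bins section can be sketched with NumPy. This is a minimal illustration under stated assumptions: the decay parameter `rho` is taken as given (the text does not say how it is chosen), and `estimate_covariance`, `k_by_share` and `k_by_aic` are hypothetical names.

```python
import numpy as np


def estimate_covariance(cov, k, rho):
    """Sigma_hat = Sigma_base (k largest eigenpairs) + Sigma_add (rho^|i-j| fill-in)."""
    cov = np.asarray(cov, dtype=float)
    lam, vec = np.linalg.eigh(cov)                     # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:k]                    # indices of the k largest
    base = (vec[:, top] * lam[top]) @ vec[:, top].T
    # s_k = sqrt(sigma_kk - x_k) restores the variance dropped by the truncation
    s = np.sqrt(np.clip(np.diag(cov) - np.diag(base), 0.0, None))
    idx = np.arange(cov.shape[0])
    add = np.outer(s, s) * rho ** np.abs(idx[:, None] - idx[None, :])
    return base + add


def k_by_share(eigvals, share=0.8):
    """Smallest k whose cumulative eigenvalue share is strictly larger than `share`."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    frac = np.cumsum(lam) / lam.sum()
    return min(int(np.searchsorted(frac, share, side="right") + 1), lam.size)


def k_by_aic(eigvals, n_days):
    """k minimizing the Nadakuditi-Edelman criterion (m eigenvalues, l = n_days)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    m, l = lam.size, float(n_days)
    scores = []
    for k in range(m):
        rest = lam[k:]                                 # lambda_{k+1}, ..., lambda_m
        ratio = (m - k) * np.sum(rest ** 2) / np.sum(rest) ** 2
        t_k = m * (ratio - (1 + m / l)) - m / l
        scores.append(0.25 * (l / m) ** 2 * t_k ** 2 + 2 * (k + 1))
    return int(np.argmin(scores))
```

A typical call would be `k = k_by_aic(np.linalg.eigvalsh(sample_cov), n_days)` followed by `estimate_covariance(sample_cov, k, rho)`. For $|\rho| < 1$ the matrix $\rho^{|i-j|}$ is positive semidefinite, so $\Sigma_{\text{add}}$, and therefore $\hat\Sigma$, stays a valid covariance matrix.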
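In the statistic $u$ above, $\nu$ is the degrees of freedom of $\Sigma$, equal to the number of historical days used minus 1, and $w_i$ is the $i$th eigenvalue of $\hat\Sigma^{-1}\Sigma$. Under $H_0$ this statistic is asymptotically chi-square distributed, $\chi^2_\alpha\big(m(m+1)/2\big)$ at confidence level $\alpha$. We performed the chi-square test at the 90% confidence level with 0 degrees of freedom; the resulting critical value was 80.4. This critical value is smaller than 84., so the model passed the test.

Intraday update

We can also update our estimates of the covariance matrix and volume throughout the day. Consider the $m \times m$ error covariance matrix $\Sigma$ of the error vector $e$, whose mean vector $\varepsilon$ is an $m$-vector with all elements zero at the beginning. Suppose we have observed the first $q$ minutes of the $m$-minute window, and partition

$$ e = \begin{pmatrix} e_{\text{upd}} \\ e_{\text{needupd}} \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_{\text{upd}} \\ \varepsilon_{\text{needupd}} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{\text{upd}} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{\text{needupd}} \end{pmatrix} $$

where $e_{\text{upd}}$ and $\varepsilon_{\text{upd}}$ are $q$-vectors, $e_{\text{needupd}}$ and $\varepsilon_{\text{needupd}}$ are $(m-q)$-vectors, $\Sigma_{\text{upd}}$ is $q \times q$ and $\Sigma_{\text{needupd}}$ is $(m-q) \times (m-q)$. Let $\varepsilon_{\text{obs}}$ be the $q$-vector of errors observed in the first $q$ minutes. Assuming the errors are jointly normal, the conditional distribution gives

$$ E\left(e_{\text{needupd}} \mid e_{\text{upd}} = \varepsilon_{\text{obs}}\right) = \varepsilon_{\text{needupd}} + \Sigma_{21}\Sigma_{\text{upd}}^{-1}\left(\varepsilon_{\text{obs}} - \varepsilon_{\text{upd}}\right) $$

$$ \operatorname{Cov}\left(e_{\text{needupd}} \mid e_{\text{upd}} = \varepsilon_{\text{obs}}\right) = \Sigma_{\text{needupd}} - \Sigma_{21}\Sigma_{\text{upd}}^{-1}\Sigma_{12} $$

and we update the mean vector and covariance matrix as

$$ \varepsilon' = \begin{pmatrix} \varepsilon_{\text{obs}} \\ E\left(e_{\text{needupd}} \mid e_{\text{upd}} = \varepsilon_{\text{obs}}\right) \end{pmatrix}, \quad \Sigma' = \begin{pmatrix} \Sigma_{\text{upd}} & \Sigma_{12} \\ \Sigma_{21} & \operatorname{Cov}\left(e_{\text{needupd}} \mid e_{\text{upd}} = \varepsilon_{\text{obs}}\right) \end{pmatrix} $$

We can repeat this process and update on a minute-by-minute basis during the day.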
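Both computations of this part, the covariance test statistic and the intraday conditional update, can be sketched in NumPy as below. Assumptions: the chi-square critical value uses the Wilson-Hilferty approximation (swapped in here to avoid a SciPy dependency; the text does not say how its value was computed), the update assumes jointly normal errors, and `covariance_test` and `conditional_update` are hypothetical names.

```python
import numpy as np
from statistics import NormalDist


def covariance_test(sample_cov, model_cov, n_days, level=0.9):
    """u = nu * (sum_i (w_i - ln w_i) - m) for H0: Sigma = Sigma_hat."""
    sample_cov = np.asarray(sample_cov, dtype=float)
    model_cov = np.asarray(model_cov, dtype=float)
    m = sample_cov.shape[0]
    nu = n_days - 1
    # w_i: eigenvalues of Sigma_hat^{-1} Sigma (real and positive for SPD inputs)
    w = np.linalg.eigvals(np.linalg.solve(model_cov, sample_cov)).real
    u = nu * (np.sum(w - np.log(w)) - m)
    dof = m * (m + 1) / 2
    # Wilson-Hilferty approximation to the chi-square quantile at `level`
    z = NormalDist().inv_cdf(level)
    crit = dof * (1 - 2 / (9 * dof) + z * np.sqrt(2 / (9 * dof))) ** 3
    return u, crit, bool(u < crit)


def conditional_update(mean, cov, observed):
    """Condition the last m - q errors on the first q observed errors."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    obs = np.asarray(observed, dtype=float)
    q = obs.size
    s11, s12, s22 = cov[:q, :q], cov[:q, q:], cov[q:, q:]
    gain = np.linalg.solve(s11, s12).T                 # Sigma_21 Sigma_upd^{-1}
    cond_mean = mean[q:] + gain @ (obs - mean[:q])
    cond_cov = s22 - gain @ s12
    new_mean = np.concatenate([obs, cond_mean])        # epsilon' as in the text
    new_cov = cov.copy()
    new_cov[q:, q:] = cond_cov                         # other blocks kept, as in Sigma'
    return new_mean, new_cov
```

One simple way to repeat the update minute by minute, under these assumptions, is to call `conditional_update` on the original start-of-window mean and covariance with the growing vector of observed errors, so that each minute conditions on all observations so far.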
Figure 6: min volume bar including settlement time across dates (10 Yr Note Futures)

Result and Comments

In practice, we also handle special dates that usually have significantly higher volume than other days. Figure 6 shows that month-end dates have much larger volume than other days, and that the days before month-end in February, August and November also have larger volume due to the roll period. We therefore model these dates separately; for example, the estimate for the day before first notice day is based on all past dates that fell on the day before first notice day. We treat the end of December as an ordinary date because of the holiday season.

Figure 1, at the beginning of this paper, shows a one-day estimate around settlement time. In addition, to improve performance around settlement time, we developed a price signal that takes advantage of price reversion. In Figure 7 we use the sign-adjusted price based on minutes before the end of the settlement time. We found that if the market is trending before the end of the settlement time, the price movement is more likely due to temporary market impact, since traders are actively trading ahead of the settlement. The black dots are the average price and the red lines are error bars. The price movement shows a strong reversion pattern after the end of the settlement time.
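A sign-adjusted price profile of the kind described for Figure 7 can be sketched as follows: each day's price path, measured relative to settlement in minimum price increments, is multiplied by the sign of that day's pre-settlement move, so trending days line up and a reversion appears as the average moving back after the settlement window ends. This is an illustrative reading of the text, not QB's signal. The common sampling grid, the look-back index `start_idx`, the use of standard errors for the error bars, and the name `sign_adjusted_profile` are all assumptions.

```python
import numpy as np


def sign_adjusted_profile(prices, settle, tick, start_idx, end_idx):
    """Average sign-adjusted price path (in ticks) around the settlement window.

    prices    : (days, T) prices sampled on a common time grid around settlement.
    settle    : (days,) settlement prices.
    tick      : minimum price increment.
    start_idx : grid index where the pre-settlement look-back starts (hypothetical).
    end_idx   : grid index of the end of the settlement window.
    """
    prices = np.asarray(prices, dtype=float)
    rel = (prices - np.asarray(settle, dtype=float)[:, None]) / tick
    trend = np.sign(prices[:, end_idx] - prices[:, start_idx])
    keep = trend != 0                                  # skip days with no pre-settlement move
    adj = rel[keep] * trend[keep][:, None]             # align trend direction across days
    mean = adj.mean(axis=0)
    stderr = adj.std(axis=0, ddof=1) / np.sqrt(adj.shape[0])
    return mean, stderr
```

The returned `mean` corresponds to the black dots and `stderr` to one possible choice of error bar; at least two trending days are needed for the standard error to be defined.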
Figure 7: price pattern around settlement time (price relative to settlement price, in minimum price increments, versus minutes relative to settlement time)

Further research could focus on the volume profile very close to the settlement window. Volume is very discrete from minute before to minute after the settlement window, so a 1-minute bin size seems too large within that window. One solution is to set a flexible time grid around the end of the settlement time, from second, seconds, to 0 seconds, minute. We found significant volume spikes at second before and second after the end of the settlement time. The bin size can then be chosen based on the distance to the end of the settlement time.

References

Almgren, R. (2014, March). Fitting volume curves. Quantitative Brokers Research.

Nadakuditi, R. and A. Edelman (2008, April). Sample eigenvalue based detection of high-dimensional signals in white noise using relatively few samples. IEEE Transactions on Signal Processing 56, 2625-2638.

Disclaimer

This document contains actual performance results achieved, but past performance is not necessarily indicative of future results. Trading futures and options on futures is a high risk activity. QB offers execution services to institutional traders exclusively.