Gaussian Process Regression Models for Predicting Stock Trends
M. Todd Farrell∗ and Andrew Correa†

December 5, 2007

1 Introduction

Historical stock price data is a massive amount of time-series data with little-to-no noise. From all this relatively clean data it should be possible to predict accurate estimates of future stock prices. A number of different supervised learning methods have been tried to predict future stock prices, both for possible monetary gain and because it is an interesting research question. Examples of recent research are K. G. Srinivasa et al. [3], who used a neuro-genetic algorithm, M. A. Kaboudan [1], who used genetic programming, and Robert J. Yan et al. [5], who tried competitive learning techniques.

We use regression to capture changes in stock price prediction as a function of a covariance (kernel, or Gram) matrix. For our purposes it is natural to think of the price of a stock as being some function over time. Loosely, a Gaussian process can be considered to define a distribution over functions, with the inference step taking place directly in the space of functions [2]. Thus, by using a GP to extend a function beyond known price data, we can predict whether stocks will rise or fall the next day, and by how much.

We conduct two experiments, both of which are carried out with data taken from 8 stocks¹ over a multi-year period. Our experiments keep track of two important features of stocks: their price, and whether their prices are rising or falling. The former is much more important in many cases of stock trading, since it is useful to know by how much you are winning or losing money.

2 Method

Non-kernel regression problems try to predict a continuous response y_t ∈ R from some input vector x_t ∈ R^d. Kernel regression problems, on the other hand, map their input vectors into feature vectors via functions φ(x_t) ∈ R^n for some feature expansion φ(·) and input vector x_t ∈ R^d. The continuous prediction y_t = θ^T φ(x_t) + ε in this simplified model (without a bias term, and with additive noise ε) depends directly on the feature expansion used, and thus on the kernel K = φ(·)φ(·)^T. The situation is similar in the case of Gaussian processes, and many of the same rules for both linear and kernel regression apply. There are two ways to view regression in the case of Gaussian processes (GPs): the weight-space view and the function-space view [2]. We choose the function-space view.

2.1 Gaussian Process

A Gaussian process is any collection of random variables where an arbitrary subset of the variables has a joint Gaussian distribution.

∗mtfarrel@mit.edu
†acorrea@mit.edu
¹These stocks are: adbe, adsk, erts, msft, orcl, sap, symc, and vrsn.
A Gaussian distribution is entirely specified by its mean and covariance; similarly, a Gaussian process is completely specified by its mean function m(x) and covariance function k(x, x′). That is,

    m(x) = E[f(x)]

and

    k(x, x′) = E[(f(x) − m(x))(f(x′) − m(x′))].

We can write the Gaussian process in the form

    f(x) ∼ GP(m(x), k(x, x′)).

To simplify notation and calculation, the mean m(x) is taken to be 0.² Thus we have

    f(x) ∼ GP(0, K).

The closing prices for stocks are assumed to be noise-free. This is a special case, since in realistic processes most training data contains noise. In this case we know the pairs {(t, f_i(t)) : t = 1, …, n} for n training points, where t is the time at which we evaluate the true price f_i(t) of stock i at closing time. There is a joint distribution over the training outputs f and the predictions f̂. The prior is centered with mean 0, so

    [f; f̂] ∼ N( 0, [K(X, X), K(X, X̂); K(X̂, X), K(X̂, X̂)] ),

where each covariance block of X and X̂ elements is a Gram matrix, X is the n × d set of training input vectors, and X̂ is the corresponding set of test input vectors. Functions drawn from this joint prior distribution have the property that they agree with the observed training points. This limits the types of functions acceptable for making predictions on the data. Function samples could be drawn from the distribution and matched against the training data points to see if there is a fit; however, for large data sets such as ours, this is computationally infeasible. Instead, consider calculating the conditional distribution of the values f̂ given the observations f, X, X̂. This takes the form

    f̂ | X̂, X, f ∼ N( K(X̂, X) K(X, X)⁻¹ f,  K(X̂, X̂) − K(X̂, X) K(X, X)⁻¹ K(X, X̂) ).

This is the noise-free prediction for f̂ over the posterior [2].

2.2 Kernel Selection

For the stock price data set we chose to try several different kernels to see which ones produced the best fit to the training data. There are several classes of kernels that can be used [2]. Below are the few considered in this project.

²Stock data does not have a mean of 0. It would not make sense, as stocks cannot have negative prices. However, we used a Gaussian that assumed a mean of 0 with the variance of our kernel (i.e., N(0, K)). We did this by shifting our stock price data such that its mean was 0. Not doing so would lead the GP to believe that our predictive data's variance was quite high, since the actual mean of a stock price is much higher than 0 (one would hope).
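Before turning to the individual kernels, here is a minimal NumPy sketch of the noise-free posterior prediction given above. This is a hypothetical illustration rather than the authors' implementation: it uses the squared exponential kernel defined in Section 2.2.1 below, and the hyperparameter values and synthetic price series are arbitrary stand-ins.

```python
import numpy as np

def se_kernel(t1, t2, sigma=1.0, rho=10.0):
    """Squared exponential covariance between two vectors of times (Sec. 2.2.1)."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-d**2 / (2.0 * rho**2))

def gp_posterior(t_train, f_train, t_test, kernel=se_kernel):
    """Noise-free GP posterior mean and covariance at the test times."""
    K = kernel(t_train, t_train)          # K(X, X)
    Ks = kernel(t_test, t_train)          # K(X*, X)
    Kss = kernel(t_test, t_test)          # K(X*, X*)
    # Solve against K via a Cholesky factorization instead of an explicit inverse;
    # the tiny jitter keeps K positive definite despite the noise-free assumption.
    L = np.linalg.cholesky(K + 1e-9 * np.eye(len(t_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))   # K^{-1} f
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha                     # K(X*,X) K(X,X)^{-1} f
    cov = Kss - v.T @ v                   # K(X*,X*) - K(X*,X) K(X,X)^{-1} K(X,X*)
    return mean, cov

# Extrapolate a mean-centered closing-price series two days past the data.
t = np.arange(20.0)
prices = 30.0 + np.cumsum(np.random.randn(20))   # stand-in for real closing prices
f = prices - prices.mean()                       # shift to zero mean (see footnote 2)
mu, cov = gp_posterior(t, f, np.array([20.0, 21.0]))
print(mu + prices.mean())                        # predicted prices
print(np.sqrt(np.maximum(np.diag(cov), 0.0)))    # predictive standard deviations
```

With noise-free observations the Gram matrix can be nearly singular when training inputs lie close together, which is why the sketch adds a small jitter before factorizing.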
2.2.1 Squared Exponential Covariance

    K(t, t′) = σ² exp( −(t − t′)² / (2ρ²) ),

where ρ is the characteristic length-scale of the covariance function and σ controls the smoothness of the curve. The squared exponential covariance is infinitely differentiable. This means there is some implicit assumption about how smooth the estimated functions are; that is, higher-order noise will be smoothed over by this function.

2.2.2 Matérn Class Covariance

    K(t, t′) = (2^(1−ν) / Γ(ν)) ( √(2ν) |t − t′| / ρ )^ν K_ν( √(2ν) |t − t′| / ρ ),

where K_ν is a modified Bessel function. This covariance function is an alternative to the squared exponential (SE) covariance function. While it can model higher-order terms, it is not infinitely differentiable, and it approximates the SE function as ν → ∞. In our implementation we use only a specific case of this covariance function, ν = 3/2, which is

    K_{ν=3/2}(t, t′) = ( 1 + √3 |t − t′| / ℓ ) exp( −√3 |t − t′| / ℓ ).

K_{ν=3/2} is more interesting in most machine learning applications since the value of ν is not too high. When ν is too low, the predicted function will have very many jagged edges; on the other hand, when it is too high, the function will be too smooth. Either case makes unrealistic assumptions about f̂, so ν = 3/2 is a good value.

2.2.3 Rational Quadratic Covariance

    k_RQ(t, t′) = ( 1 + (t − t′)² / (2αℓ²) )^(−α).

The k_RQ covariance becomes the SE covariance function in the limit α → ∞, and can be viewed as an infinite sum of SE covariance functions with different length-scales.

2.2.4 Marginal Likelihood Gradient

Given a kernel, it is important to find the set of hyperparameters that best fits the observed data. To do this, maximize the marginal likelihood using its gradient,

    ∂/∂θ_j log p(y | X, θ) = (1/2) tr( (αα^T − K⁻¹) ∂K/∂θ_j ),

where α = K⁻¹ y [2]. Inverting the matrix K is very computationally intensive, with O(n³) complexity. This is a large drawback, particularly for large data sets such as ours. However, once K⁻¹ is known, the computational complexity of prediction drops to O(n²).
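The three covariance functions and the marginal likelihood gradient above translate directly into code. The following is a sketch under the same caveats as before (not the authors' implementation; the default hyperparameter values are arbitrary):

```python
import numpy as np

def k_se(r, sigma=1.0, rho=10.0):
    """Squared exponential covariance as a function of the lag r = t - t'."""
    return sigma**2 * np.exp(-r**2 / (2.0 * rho**2))

def k_matern32(r, l=10.0):
    """Matern covariance for the special case nu = 3/2."""
    a = np.sqrt(3.0) * np.abs(r) / l
    return (1.0 + a) * np.exp(-a)

def k_rq(r, alpha=2.0, l=10.0):
    """Rational quadratic covariance: a scale mixture of SE kernels."""
    return (1.0 + r**2 / (2.0 * alpha * l**2)) ** (-alpha)

def log_ml_and_grad(K, dK_dtheta_j, y):
    """Log marginal likelihood and its partial derivative for one hyperparameter.

    Implements d/d(theta_j) log p(y|X,theta) = 0.5 tr((aa^T - K^{-1}) dK/d(theta_j))
    with a = K^{-1} y.
    """
    n = len(y)
    L = np.linalg.cholesky(K + 1e-9 * np.eye(n))       # the O(n^3) step
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))    # a = K^{-1} y
    log_ml = (-0.5 * y @ a
              - np.log(np.diag(L)).sum()               # = -0.5 log|K|
              - 0.5 * n * np.log(2.0 * np.pi))
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    grad_j = 0.5 * np.trace((np.outer(a, a) - K_inv) @ dK_dtheta_j)
    return log_ml, grad_j
```

A gradient-based optimizer would call log_ml_and_grad once per hyperparameter per iteration; the Cholesky factorization of K is the O(n³) bottleneck discussed above, and a production implementation would factor once per iteration and reuse the result across all θ_j.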
3 Experiments

We performed two experiments showing two different methods of stock prediction using Gaussian processes. The first uses a ±1 labeling to produce a simple metric of trend (where the label is +1 when the stock is rising and −1 when it is falling). The second uses regression on the value of the stock at the close of the trading day to predict the price of that stock for the following day(s). To fit the hyperparameters mentioned in Section 2.2, we used an iterative optimization of the marginal likelihood with each different kernel.

3.1 Prediction of Up/Down Trend

In this experiment we see if we can predict stock prices by looking at only stock trend data (i.e., "Is this stock rising or falling?"). Prediction of this property is highly sensitive to the amount of training data used. As shown over the large time range in Figure 2 (from /6/3 to /8/6), the price fluctuates regularly and averages out to a 50% chance that the predicted label ŷ will be +1 and a 50% chance it will be −1. This particular prediction method is more useful on smaller data sets. Consider a Matérn class covariance function trained on one month of stock price data, as in Figure 3. After regression is performed there is an indication for many stocks: take stock 6, and notice how the prediction curve is going down but remains above 0, indicating that the model is becoming less sure there will be a gain in closing price. If the data is biased towards a +1 label, then the next day's prediction is +1. As the variance increases for further predictions beyond the first day, the trend moves towards being about equally likely to be +1 or −1. This is likely due to our model's growing uncertainty in predicting a label effectively as the number of days passed increases.

3.2 Single-Stock Price Prediction

This experiment consists of three parts: training data regression, test data price prediction, and prediction assessment. The model assumes that the training data has zero mean, so the training outputs were recentered by calculating TrainingData = TrainingData − mean(TrainingData), leaving the outputs centered about zero. After the means and variances are predicted, we translate the data back. This gives results like those in Table 1.

Table 1: Experiment 2 results — the stock chosen by each covariance function for each training time span.

Time Period | Squared Exponential | Matérn ν = 3/2 | Rational Quadratic
1 day       | Stock 7             | Stock 7        | no stock predicted
1 month     | Stock 7             | Stock 7        | Stock 7
6 months    | Stock 7             | Stock 7        | Stock 7
1 year      | Stock 8             | Stock 8        | Stock 7
3 years     | Stock 8             | Stock 8        | Stock 7

We measured penalty in terms of regret, as sketched below.
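The following is a minimal sketch of this regret measure; the exact rule is spelled out in the next paragraph, and treating the "no stock" option as a zero-gain benchmark (the floor at zero below) is an assumption rather than something stated in the text.

```python
import numpy as np

def regret(pick, gains):
    """Regret for one trading day.

    pick:  index of the stock the model chose, or None if it predicted
           that no stock would gain money the next day.
    gains: realized day-(t+1) minus day-t closing prices, one entry per stock.
    """
    # Assumption: the "best stock" benchmark includes the option of holding
    # nothing, so its gain is floored at zero.
    best = max(gains.max(), 0.0)
    if pick is None:
        return best              # 0 if no stock gained, else the missed gain
    return best - gains[pick]

# Example: three stocks move +2.0, -0.5, +1.2 and the model picks the third.
print(regret(2, np.array([2.0, -0.5, 1.2])))   # -> 0.8 (stock 0 was better)
print(regret(None, np.array([-1.0, -0.2])))    # -> 0.0 (correctly sat out)
```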
Since we had the actual next-day (day t + 1) prices of the test stocks, we were able to find which stock would have been the best to invest in the day before (day t). We then compared the change in price between time t and t + 1 for the best stock and for the stock we predicted, and took the difference. In the case when our model predicted that no stock would gain any money the next day and no stock did, we gave no penalty. If the model was wrong in this prediction, however, the penalty became the gain of the best stock, just as in the normal case.

4 Conclusions

The two experiments conducted here show the power of using Gaussian processes for predicting stock prices. There were a number of different ways to test how to make a good prediction based on the training data. By varying the lengths of time considered for training, we achieved different results and accuracies. Using a long time period in experiment 2 resulted in good stock predictions, while using shorter times resulted in sub-optimal predictions. In experiment 1, on the other hand, the converse was true. The use of long time periods in the first experiment did not produce interesting results because, over a long time period, regression only predicted the mean value of f when predicting trend. For experiment 1, where the goal was to classify between two types of activity, it was better to consider shorter time scales to avoid this averaging effect. A possible future direction is using a larger set of trend labels rather than restricting ourselves to mere growth and loss trends.

The method in experiment 2 showed that a Matérn class or squared exponential class covariance function predicted the optimal stock with less than or equal regret compared with the rational quadratic covariance function. The squared exponential function makes stronger assumptions about the smoothness of the data than the Matérn class covariance does. The rational quadratic is a combination of squared exponential functions at different length-scales; this must play a role in how ineffective this covariance function was at predicting the optimal stock pick. It seems that if a large enough training set is used, the rational quadratic covariance will eventually also give stock 8 as a result; however, there was not enough time to test this. Given the data, it appears that either the Matérn class or the squared exponential class is appropriate for use in prediction. Practically speaking, however, Gaussian processes are not the most efficient method to use, because the computational load for predictions is high. To make money in the stock market using a machine learning technique, a faster method of computation would be required.

References

[1] M. A. Kaboudan. Genetic programming prediction of stock prices. Computational Economics, 16(3):207–236, 2000.

[2] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[3] K. G. Srinivasa, K. R. Venugopal, and L. M. Patnaik. An efficient fuzzy based neuro-genetic algorithm for stock market prediction. International Journal of Hybrid Intelligent Systems, 3(2):63–81, 2006.

[4] R. von Mises. Mathematical Theory of Probability and Statistics. Academic Press, 1964.

[5] Robert J. Yan and Charles X. Ling. Machine learning for stock selection. In KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1038–1042, New York, NY, USA, 2007. ACM.
[Figure 1 consists of eight panels, one per stock (Stock 1 through Stock 8), each showing the prediction and the fitted hyperparameters σ and ρ.]

Figure 1: The above figures show the results of running a prediction on a large multi-year dataset.
[Figure 2 consists of panels of stock predictions with the Matérn class covariance (Stocks 1–3), each annotated with its fitted hyperparameters.]

Figure 2: Running label classification on a large training set produces poor predictions. In a practical setting some stocks are highly regular and vary only a small amount for years at a time. This is reflected in the fact that, over time, the labels appear to have a 50% chance of being labeled +1 and a 50% chance of being labeled −1.
[Figure 3 consists of eight panels of stock trends with the Matérn class covariance (Stock 1 through Stock 8), each annotated with its fitted hyperparameters σ and ρ and the target labels.]

Figure 3: Running label classification on a smaller dataset produces more meaningful results. The 95% confidence intervals show a trend towards either a +1 or −1 label.
[Figure 4 consists of panels comparing the predictions for stock 7 and stock 8 under the squared exponential, Matérn class, and rational quadratic covariances, each annotated with its fitted hyperparameters.]

Figure 4: The comparison between stock 7 and stock 8. Both were picked by the model; choosing stock 7 caused a small regret while choosing stock 8 caused no loss.