Bayesian Machine Learning in Finance

1 Bayesian Machine Learning in Finance. Patricia (Ning) Ning, Computational Finance & Risk Management, Dept. of Applied Math, Univ. of Washington, Seattle. Talk mainly based on the paper "Multivariate Bayesian Structural Time Series Model", forthcoming in the Journal of Machine Learning Research.

2 AI and Big Data Jobs in Hedge Funds

4 Bayesian Structural Time Series (BSTS) Model. Figures: Steven L. Scott; Hal R. Varian.

6 Bayesian Structural Time Series (BSTS) Model
The BSTS model is a machine learning technique used for feature selection, time series forecasting, nowcasting, inferring causal impact, and other applications. The model consists of three main parts:
1. Kalman filter: the technique for time series decomposition. In this step, a researcher can add different state variables: trend, seasonality, regression, and others.
2. Spike-and-slab method: in this step, the most important regression predictors are selected.
3. Bayesian model averaging: combining the results and calculating the prediction.

7 Multivariate Bayesian Structural Time Series (MBSTS) Model
Structural time series models belong to the class of state space models.
Observation equation:
$$\tilde{y}_t = Z_t^T \alpha_t + \epsilon_t, \qquad \epsilon_t \sim N_m(0, \Sigma_t),$$
where $\tilde{y}_t$ are the observations and $\alpha_t$ are the unobserved latent states.
Transition equation:
$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \qquad \eta_t \sim N_q(0, Q_t).$$
The model matrices $Z_t$, $T_t$, and $R_t$ typically contain unknown parameters.
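To make the observation and transition equations concrete, here is a minimal simulation sketch. It assumes time-invariant matrices (Z, T, R, Sigma, Q do not change with t), which is a simplification of the slide's notation; the function name `simulate_state_space` and the bivariate local-level example are illustrative assumptions, not part of the talk.

```python
import numpy as np

def simulate_state_space(Z, T, R, Sigma, Q, alpha0, n, seed=0):
    """Simulate y_t = Z^T alpha_t + eps_t and alpha_{t+1} = T alpha_t + R eta_t.

    Assumption: all model matrices are constant over time.
    Z is (d x m), T is (d x d), R is (d x q), Sigma is (m x m), Q is (q x q).
    """
    rng = np.random.default_rng(seed)
    m, q = Sigma.shape[0], Q.shape[0]
    alpha = np.asarray(alpha0, dtype=float)
    ys, states = [], []
    for _ in range(n):
        eps = rng.multivariate_normal(np.zeros(m), Sigma)   # observation noise
        ys.append(Z.T @ alpha + eps)                         # observation equation
        states.append(alpha)
        eta = rng.multivariate_normal(np.zeros(q), Q)        # state disturbance
        alpha = T @ alpha + R @ eta                          # transition equation
    return np.array(ys), np.array(states)

# Hypothetical example: two series, each following its own random-walk level.
Z = T = R = np.eye(2)
y, alpha = simulate_state_space(Z, T, R, Sigma=0.1 * np.eye(2),
                                Q=0.01 * np.eye(2), alpha0=[0.0, 1.0], n=50)
```

Each row of `y` is one observation vector and each row of `alpha` is the latent state that generated it.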

8 MBSTS Model
In general, the model in state space form can be written as:
$$\tilde{y}_t = \mu_t + \tau_t + \omega_t + \xi_t + \epsilon_t, \qquad \epsilon_t \overset{iid}{\sim} N_m(0, \Sigma_\epsilon), \qquad t = 1, 2, \dots, n. \tag{1}$$
Based on the state space form, $\alpha_t$ is the collection of these components, namely $\alpha_t = [\mu_t^T, \tau_t^T, \omega_t^T, \xi_t^T]^T$.

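9 MBSTS Model
Trend component:
$$\mu_{t+1} = \mu_t + \delta_t + \tilde{u}_t, \qquad \tilde{u}_t \overset{iid}{\sim} N_m(0, \Sigma_\mu), \tag{2}$$
$$\delta_{t+1} = D + \rho(\delta_t - D) + \tilde{v}_t, \qquad \tilde{v}_t \overset{iid}{\sim} N_m(0, \Sigma_\delta). \tag{3}$$
Seasonality component:
$$\tau^{(i)}_{t+1} = -\sum_{k=0}^{S_i - 2} \tau^{(i)}_{t-k} + w^{(i)}_t, \qquad w_t = [w^{(1)}_t, \dots, w^{(m)}_t]^T \overset{iid}{\sim} N_m(0, \Sigma_\tau). \tag{4}$$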
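The trend and seasonality recursions can be sketched as a simulation. This is an illustration only: it assumes equation (3) is the mean-reverting slope $\delta_{t+1} = D + \rho(\delta_t - D) + \tilde{v}_t$ and equation (4) is the usual sum-to-zero seasonal form with a minus sign in front of the sum over the previous $S_i - 1$ values. The function names and the choice to simulate a single series for the seasonal part are assumptions.

```python
import numpy as np

def simulate_trend(n, D, rho, Sigma_mu, Sigma_delta, mu0, delta0, seed=0):
    """Simulate the level mu_t and slope delta_t for m series (eqs. 2 and 3)."""
    rng = np.random.default_rng(seed)
    m = len(D)
    mu, delta = np.empty((n, m)), np.empty((n, m))
    mu[0], delta[0] = mu0, delta0
    for t in range(n - 1):
        u = rng.multivariate_normal(np.zeros(m), Sigma_mu)
        v = rng.multivariate_normal(np.zeros(m), Sigma_delta)
        mu[t + 1] = mu[t] + delta[t] + u                 # level update
        delta[t + 1] = D + rho @ (delta[t] - D) + v      # slope reverts toward D
    return mu, delta

def simulate_seasonal(n, S, sigma_tau, seed=0):
    """Simulate one seasonal series with S seasons (eq. 4, single series)."""
    rng = np.random.default_rng(seed)
    tau = np.zeros(n)
    tau[:S - 1] = rng.normal(size=S - 1)                 # arbitrary starting pattern
    for t in range(S - 2, n - 1):
        # minus the sum of the last S-1 values, plus a shock
        tau[t + 1] = -tau[t - S + 2 : t + 1].sum() + rng.normal(0.0, sigma_tau)
    return tau

# Hypothetical example with two series and weekly (S = 5 trading days) seasonality.
D = np.array([0.01, 0.02])
mu, delta = simulate_trend(100, D, np.diag([0.8, 0.5]), 0.01 * np.eye(2),
                           0.001 * np.eye(2), mu0=np.zeros(2), delta0=D)
tau = simulate_seasonal(100, S=5, sigma_tau=0.1)
```

With `sigma_tau = 0` any S consecutive seasonal values sum to zero, which is the constraint the recursion encodes.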

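10 MBSTS Model
Cyclical effect component:
$$\omega_{t+1} = \varrho\,\widehat{\cos}(\lambda)\,\omega_t + \varrho\,\widehat{\sin}(\lambda)\,\omega^{\star}_t + \kappa_t, \qquad \kappa_t \overset{iid}{\sim} N_m(0, \Sigma_\omega),$$
$$\omega^{\star}_{t+1} = -\varrho\,\widehat{\sin}(\lambda)\,\omega_t + \varrho\,\widehat{\cos}(\lambda)\,\omega^{\star}_t + \kappa^{\star}_t, \qquad \kappa^{\star}_t \overset{iid}{\sim} N_m(0, \Sigma_\omega), \tag{5}$$
where $\varrho$, $\widehat{\sin}(\lambda)$, and $\widehat{\cos}(\lambda)$ are $m \times m$ diagonal matrices with diagonal entries equal to $\varrho_{ii}$, $\sin(\lambda_{ii})$, and $\cos(\lambda_{ii})$ respectively. Here $\lambda_{ii} = 2\pi/q_i$ is the frequency, with $q_i$ being a period such that $0 < \lambda_{ii} < \pi$.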
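A short simulation sketch of the cyclical component follows. It assumes the standard damped-rotation form: an auxiliary state $\omega^{\star}_t$ appears alongside $\omega_t$, and the sine term in the second equation carries a minus sign. The function name `simulate_cycle` and its arguments are illustrative assumptions.

```python
import numpy as np

def simulate_cycle(n, rho, period, Sigma_omega, omega0=1.0, seed=0):
    """Simulate the cycle omega_t and its auxiliary state omega_star_t for m series."""
    rng = np.random.default_rng(seed)
    rho = np.asarray(rho, dtype=float)
    lam = 2.0 * np.pi / np.asarray(period, dtype=float)   # frequency lambda_ii = 2*pi/q_i
    m = len(rho)
    C = np.diag(rho * np.cos(lam))                         # damping times cos(lambda)
    S = np.diag(rho * np.sin(lam))                         # damping times sin(lambda)
    omega, omega_star = np.zeros((n, m)), np.zeros((n, m))
    omega[0] = omega0
    for t in range(n - 1):
        kappa = rng.multivariate_normal(np.zeros(m), Sigma_omega)
        kappa_star = rng.multivariate_normal(np.zeros(m), Sigma_omega)
        omega[t + 1] = C @ omega[t] + S @ omega_star[t] + kappa
        omega_star[t + 1] = -S @ omega[t] + C @ omega_star[t] + kappa_star
    return omega, omega_star

# Hypothetical example: one series, 12-step period, mild damping and small noise.
omega, omega_star = simulate_cycle(60, rho=[0.95], period=[12],
                                   Sigma_omega=0.01 * np.eye(1))
```

With no noise and a damping factor of one, the recursion traces an exact cosine wave of the chosen period, which is a quick way to check the signs.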

12 MBSTS Model
Regression component:
$$\xi^{(i)}_t = \beta_i^T x^{(i)}_t. \tag{6}$$
For target series $y^{(i)}$, $x^{(i)}_t = [x^{(i)}_{t1}, \dots, x^{(i)}_{tk_i}]^T$ is the pool of all available predictors at time $t$, and $\beta_i = [\beta_{i1}, \dots, \beta_{ij}, \dots, \beta_{ik_i}]^T$ represents the corresponding static regression coefficients.

13 MBSTS Model
Spike-and-slab regression: In feature selection, a high degree of sparsity is expected, in the sense that the coefficients for the vast majority of predictors will be zero. A natural way to represent sparsity in the Bayesian paradigm is through a spike-and-slab prior on the coefficients. One advantage of working in a fully Bayesian setting is that we do not need to commit to a fixed set of predictors.

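14 MBSTS Model
$$\tilde{Y} = \tilde{M} + \tilde{T} + \tilde{W} + X\beta + \tilde{E}, \tag{7}$$
where $\tilde{Y} = \mathrm{vec}(Y)$, $\tilde{M} = \mathrm{vec}(M)$, $\tilde{T} = \mathrm{vec}(T)$, $\tilde{W} = \mathrm{vec}(W)$, $\tilde{E} = \mathrm{vec}(E)$, and $X$, $\beta$ are written as:
$$X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix}, \tag{8}$$
where $X_i$ is an $n \times k_i$ matrix representing all observations of the $k_i$ candidate predictors for $y^{(i)}$.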

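15 MBSTS Model
Prior distribution and elicitation
We define $\gamma_{ij} = 1$ if $\beta_{ij} \neq 0$, and $\gamma_{ij} = 0$ if $\beta_{ij} = 0$. Then $\gamma = [\gamma_1, \dots, \gamma_m]$, where $\gamma_i = [\gamma_{i1}, \dots, \gamma_{ik_i}]$. The spike prior may be written as:
$$\gamma \sim \prod_{i=1}^{m} \prod_{j=1}^{k_i} \pi_{ij}^{\gamma_{ij}} (1 - \pi_{ij})^{1 - \gamma_{ij}}, \tag{9}$$
where $\pi_{ij}$ is the prior inclusion probability of the $j$-th predictor for the $i$-th response series.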
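The spike prior is a product of independent Bernoulli terms, so its log-probability is a simple sum. The sketch below assumes the indicators and the prior inclusion probabilities are supplied as flat arrays over all (i, j) pairs; the function name `log_spike_prior` is illustrative.

```python
import numpy as np

def log_spike_prior(gamma, pi):
    """Log of prod pi^gamma * (1 - pi)^(1 - gamma) over all predictors.

    gamma: 0/1 inclusion indicators; pi: prior inclusion probabilities in (0, 1).
    """
    gamma = np.asarray(gamma, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(gamma * np.log(pi) + (1.0 - gamma) * np.log1p(-pi)))

# Drawing a configuration from the prior: include predictor j with probability pi_j.
rng = np.random.default_rng(0)
pi = np.full(10, 0.2)
gamma_draw = (rng.random(10) < pi).astype(int)
lp = log_spike_prior(gamma_draw, pi)
```

A small common value of the inclusion probability encodes the expectation that most coefficients are zero.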

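16 MBSTS Model
Prior distribution and elicitation
A simple slab prior specification is to make $\beta$ and $\Sigma_\epsilon$ a priori independent:
$$p(\beta, \Sigma_\epsilon, \gamma) = p(\beta \mid \gamma)\, p(\Sigma_\epsilon \mid \gamma)\, p(\gamma),$$
$$\beta \mid \gamma \sim N_K(b_\gamma, A_\gamma^{-1}), \qquad \Sigma_\epsilon \mid \gamma \sim IW(v_0, V_0), \tag{10}$$
where $b_\gamma$ is the vector of prior means with the same dimension as $\beta_\gamma$, and $A_\gamma$ is the full-model prior information matrix.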

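17 MBSTS Model
By the law of total probability, the full likelihood function is given by
$$p(\tilde{Y}^{\star}, \beta, \Sigma_\epsilon, \gamma) = p(\tilde{Y}^{\star} \mid \beta, \Sigma_\epsilon, \gamma)\, p(\beta \mid \gamma)\, p(\Sigma_\epsilon \mid \gamma)\, p(\gamma), \tag{11}$$
$$p(\tilde{Y}^{\star} \mid \beta, \Sigma_\epsilon, \gamma) \propto |\Sigma_\epsilon|^{-n/2} \exp\left(-\tfrac{1}{2} (\tilde{Y}^{\star} - X_\gamma \beta_\gamma)^T (\Sigma_\epsilon^{-1} \otimes I_n)(\tilde{Y}^{\star} - X_\gamma \beta_\gamma)\right), \tag{12}$$
$$p(\beta \mid \gamma) \propto |A_\gamma|^{1/2} \exp\left(-\tfrac{1}{2} (\beta_\gamma - b_\gamma)^T A_\gamma (\beta_\gamma - b_\gamma)\right), \tag{13}$$
$$p(\Sigma_\epsilon \mid \gamma) \propto |\Sigma_\epsilon|^{-(v_0 + m + 1)/2} \exp\left(-\tfrac{1}{2} \mathrm{tr}(V_0 \Sigma_\epsilon^{-1})\right), \tag{14}$$
where $\tilde{Y}^{\star} = \tilde{Y} - \tilde{M} - \tilde{T} - \tilde{W}$ is the multiple response series $\tilde{Y}$ with the time series components subtracted out.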

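18 MBSTS Model
Posterior Inference
$$\beta \mid \tilde{Y}^{\star}, \Sigma_\epsilon, \gamma \sim N_K\left(\tilde{\beta}_\gamma, (\hat{X}_\gamma^T \hat{X}_\gamma + A_\gamma)^{-1}\right). \tag{15}$$
$$\Sigma_\epsilon \mid \tilde{Y}^{\star}, \beta, \gamma \sim IW(v_0 + n, E_\gamma^T E_\gamma + V_0). \tag{16}$$
$$p(\gamma \mid \Sigma_\epsilon, \tilde{Y}^{\star}) = C(\Sigma_\epsilon, \tilde{Y}^{\star})\, \frac{|A_\gamma|^{1/2}\, p(\gamma)}{|\hat{X}_\gamma^T \hat{X}_\gamma + A_\gamma|^{1/2}} \exp\left(-\tfrac{1}{2} \left\{ b_\gamma^T A_\gamma b_\gamma - Z_\gamma^T (\hat{X}_\gamma^T \hat{X}_\gamma + A_\gamma)^{-1} Z_\gamma \right\}\right). \tag{17}$$
$$\Sigma_{\mu,\delta,\tau,\omega} \mid \mu, \delta, \tau, \omega \sim IW(w_{\mu,\delta,\tau,\omega} + n, W_{\mu,\delta,\tau,\omega} + AA^T). \tag{18}$$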
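The inverse-Wishart update for the observation covariance, $IW(v_0 + n, E_\gamma^T E_\gamma + V_0)$, can be sketched with NumPy alone. Two assumptions are made here and are not taken from the slides: $E_\gamma$ is treated as an $n \times m$ matrix of regression residuals, and $v_0$ is an integer so the draw can use the relation that the inverse of a Wishart($\nu$, $V^{-1}$) matrix is IW($\nu$, $V$). The function names are illustrative.

```python
import numpy as np

def sample_inverse_wishart(df, scale, rng):
    """Draw from IW(df, scale) for integer df >= dimension.

    Uses: if W ~ Wishart(df, scale^{-1}) then W^{-1} ~ IW(df, scale).
    """
    m = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))
    G = rng.standard_normal((df, m)) @ L.T      # rows are N(0, scale^{-1}) vectors
    W = G.T @ G                                 # Wishart(df, scale^{-1}) draw
    return np.linalg.inv(W)

def draw_sigma_eps(E, v0, V0, rng):
    """Draw the noise covariance given an (n x m) residual matrix E (assumed shape)."""
    n = E.shape[0]
    return sample_inverse_wishart(v0 + n, E.T @ E + V0, rng)

# Hypothetical example: residuals for m = 2 series over n = 200 time points.
rng = np.random.default_rng(0)
E = rng.standard_normal((200, 2))
Sigma_draw = draw_sigma_eps(E, v0=4, V0=np.eye(2), rng=rng)
```

Averaged over many draws, the result should be close to the inverse-Wishart mean, scale / (df - m - 1).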

19 MBSTS Model
Markov Chain Monte Carlo
MCMC methods are a class of algorithms for sampling from probability distributions (here (15), (16), (17) and (18)) by constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a number of steps is then used as a sample from the desired distribution.

20 Model Training
Let $\theta = (\Sigma_\mu, \Sigma_\delta, \Sigma_\tau, \Sigma_\omega)$ denote the set of state component parameters. Looping through the five steps of the training algorithm yields a sequence of draws $\psi = (\alpha, \theta, \gamma, \Sigma_\epsilon, \beta)$ from a Markov chain with stationary distribution $p(\psi \mid Y)$, the posterior distribution of $\psi$ given $Y$.

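21 MBSTS Model Training
1: Draw the latent state $\alpha = (\mu, \delta, \tau, \omega)$ given the model parameters and $\tilde{Y}$, namely from $p(\alpha \mid \tilde{Y}, \theta, \gamma, \Sigma_\epsilon, \beta)$, using the posterior simulation algorithm of Durbin and Koopman (2002).
2: Draw the time series state component parameters $\theta$ given $\alpha$, namely simulating $\theta \sim p(\theta \mid \tilde{Y}, \alpha)$ based on equation (18).
3: Loop over $i$ in a random order and draw each $\gamma_i \mid \gamma_{-i}, \tilde{Y}, \alpha, \Sigma_\epsilon$, namely simulating $\gamma \sim p(\gamma \mid \tilde{Y}^{\star}, \Sigma_\epsilon)$ one by one based on equation (17), using the stochastic search variable selection (SSVS) algorithm of George and McCulloch (1997). Here $\tilde{Y}^{\star}$ is the series $\tilde{Y}$ with the time series components subtracted out.
4: Draw $\beta$ given $\Sigma_\epsilon$, $\gamma$, $\alpha$ and $\tilde{Y}$, namely simulating $\beta \sim p(\beta \mid \Sigma_\epsilon, \gamma, \tilde{Y}^{\star})$ based on equation (15).
5: Draw $\Sigma_\epsilon$ given $\gamma$, $\alpha$, $\beta$ and $\tilde{Y}$, namely simulating $\Sigma_\epsilon \sim p(\Sigma_\epsilon \mid \gamma, \tilde{Y}^{\star}, \beta)$ based on equation (16).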
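Steps 3 and 4 can be illustrated on a much simpler problem. The sketch below is not the MBSTS sampler: it assumes a single response series, no time series components, a known noise variance `sigma2`, zero prior means, and a prior information matrix `kappa * I`. Under those assumptions the indicator update uses the univariate analogue of equation (17) and the coefficient update the analogue of equation (15). All names (`log_marginal`, `gibbs_ssvs`, `kappa`, `pi`) are hypothetical.

```python
import numpy as np

def log_marginal(gamma, X, y, sigma2, kappa, pi):
    """log p(gamma | y) up to a constant, with beta integrated out."""
    idx = np.flatnonzero(gamma)
    k = idx.size
    log_prior = k * np.log(pi) + (gamma.size - k) * np.log1p(-pi)
    if k == 0:
        return log_prior
    Xg = X[:, idx]
    P = Xg.T @ Xg / sigma2 + kappa * np.eye(k)          # posterior precision of beta
    z = Xg.T @ y / sigma2
    _, logdet = np.linalg.slogdet(P)
    return log_prior + 0.5 * k * np.log(kappa) - 0.5 * logdet + 0.5 * z @ np.linalg.solve(P, z)

def gibbs_ssvs(X, y, sigma2=1.0, kappa=1.0, pi=0.2, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=int)
    gammas = np.zeros((n_iter, p), dtype=int)
    betas = np.zeros((n_iter, p))
    for it in range(n_iter):
        for j in rng.permutation(p):                     # step 3: random-order scan
            lp = np.empty(2)
            for val in (0, 1):
                gamma[j] = val
                lp[val] = log_marginal(gamma, X, y, sigma2, kappa, pi)
            diff = np.clip(lp[0] - lp[1], -700.0, 700.0)
            gamma[j] = int(rng.random() < 1.0 / (1.0 + np.exp(diff)))
        idx = np.flatnonzero(gamma)
        if idx.size:                                     # step 4: draw included betas
            Xg = X[:, idx]
            cov = np.linalg.inv(Xg.T @ Xg / sigma2 + kappa * np.eye(idx.size))
            betas[it, idx] = rng.multivariate_normal(cov @ (Xg.T @ y / sigma2), cov)
        gammas[it] = gamma
    return gammas, betas

# Synthetic data: only the first two of six predictors matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(200)
gammas, betas = gibbs_ssvs(X, y)
inclusion = gammas[100:].mean(axis=0)    # empirical posterior inclusion probabilities
```

The vector `inclusion` plays the role of the empirical posterior inclusion probabilities reported later in the talk, here after discarding the first 100 draws as burn-in.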

22 Target Series Forecasting
Let $\hat{Y}$ represent the set of values to be forecast. The posterior predictive distribution of $\hat{Y}$ can be expressed as follows:
$$p(\hat{Y} \mid Y) = \int p(\hat{Y} \mid \psi)\, p(\psi \mid Y)\, d\psi, \tag{19}$$
where $\psi$ is the set of all model parameters and latent states, randomly drawn from $p(\psi \mid Y)$. Given each draw of $\psi$, we can then draw samples of $\hat{Y}$ from $p(\hat{Y} \mid \psi)$.
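The integral in the posterior predictive distribution is approximated by simulation: for each posterior draw of $\psi$, draw one forecast from $p(\hat{Y} \mid \psi)$. The sketch below shows the idea on a toy model (normal data with unknown mean, unit noise variance, and a N(0, 10^2) prior), which is an assumption chosen so the posterior is available in closed form. It stands in for the MCMC output of the MBSTS model; `posterior_predictive` is a hypothetical helper.

```python
import numpy as np

def posterior_predictive(psi_draws, sample_forecast, rng):
    """For each posterior draw psi, draw one forecast from p(Yhat | psi)."""
    return np.array([sample_forecast(psi, rng) for psi in psi_draws])

# Toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 10^2).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
post_var = 1.0 / (1.0 / 100.0 + len(y))      # conjugate normal posterior variance
post_mean = post_var * y.sum()               # conjugate normal posterior mean

# In the MBSTS setting these draws would come from the Gibbs sampler instead.
psi_draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)
forecasts = posterior_predictive(psi_draws, lambda theta, r: r.normal(theta, 1.0), rng)

# Summaries of the predictive distribution, e.g. a 95% interval.
lower, upper = np.percentile(forecasts, [2.5, 97.5])
```

The spread of `forecasts` reflects both parameter uncertainty and observation noise, which is why the predictive variance exceeds the noise variance alone.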

23 Empirical Analysis
Data: Daily stock prices of Bank of America (BOA), Capital One Financial Corporation (COF), J.P. Morgan (JPM) and Wells Fargo (WFC).
Time horizon: 11/27/ /03/2017
Source: Google Finance.
Purpose: Trade a stock when its future price is predicted to vary by more than p%.
Goal: Forecast the trend of stock movement over the next k (= 5) days.

24 We approximate the daily average price as $\bar{P}_t = (C_t + H_t + L_t)/3$, where $C_t$, $H_t$ and $L_t$ are the close, high, and low quotes for day $t$ respectively. Instead of arithmetic returns, we use the log returns $V_t$ defined as $V_t = \{\log(\bar{P}_{t+j}/C_t)\}_{j=1}^{k}$. We consider the indicator variable $y_t = \max\{v \in V_t\}$, the maximum value of the log returns over the next $k$ days.
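The target variable can be computed directly from daily close, high, and low quotes. This is a minimal sketch of that definition; the function name `max_log_return` and the choice to leave the last k days as NaN (they have no complete look-ahead window) are assumptions.

```python
import numpy as np

def max_log_return(close, high, low, k=5):
    """y_t = max over j = 1..k of log(avg_{t+j} / close_t), avg = (C + H + L) / 3."""
    close, high, low = (np.asarray(a, dtype=float) for a in (close, high, low))
    avg = (close + high + low) / 3.0             # approximate daily average price
    n = len(close)
    y = np.full(n, np.nan)                       # last k days have no full window
    for t in range(n - k):
        y[t] = np.max(np.log(avg[t + 1 : t + k + 1] / close[t]))
    return y

# Small made-up example with k = 2.
close = np.array([10.0, 10.5, 10.2, 10.8, 11.0])
high = close + 0.3
low = close - 0.3
y = max_log_return(close, high, low, k=2)
```

In this example the average price equals the close, so `y[0]` is the log return from 10.0 to the higher of the next two closes.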

26 Fundamental analysis claims that markets may incorrectly price a security in the short run but will eventually correct it.
Table: Google domestic trends
Trend | Abbr.
Advertising & marketing | advert
Air travel | airtvl
Auto buyers | autoby
Auto financing | autofi
Automotive | auto
Business & industrial | bizind
Bankruptcy | bnkrpt
Commercial Lending | comlnd
Computers & electronics | comput
Construction | constr
Credit cards | crcard
Durable goods | durble
Education | educat
Finance & investing | invest
Financial planning | finpln
Furniture | furntr
Insurance | insur
Jobs | jobs
Luxury goods | luxury
Mobile & wireless | mobile
Mortgage | mtge
Real estate | rlest
Rental | rental
Shopping | shop
Small business | smallbiz
Travel | travel
Unemployment | unempl

27 Technical analysis claims that useful information is already reflected in the price of a security. We selected a representative set of technical indicators to capture the volatility, momentum and trend, close location value, and potential reversals of each stock.
Table: Stock Technical Predictors
Variable | Abbr.
Chaikin volatility | ChaVol
Yang and Zhang Volatility historical estimator | Vol
Arms Ease of Movement Value | EMV
Moving Average Convergence/Divergence | MACD
Money Flow Index | MFI
Aroon Indicator | AROON
Parabolic Stop-and-Reverse | SAR
Close Location Value | CLV

28 Figure: True and fitted values of max log return from 11/27/2006 to 10/20/2017 (BOA)

30 Empirical posterior inclusion probability for the most likely predictors of max log return. Figure: Bank of America Corp. Figure: Capital One Financial Corp.

31 Empirical posterior inclusion probability for the most likely predictors of max log return. Figure: JPMorgan Chase & Co. Figure: Wells Fargo & Co.

32 Cumulative absolute one-step-ahead prediction error. Figure: All Predictors Without Deseasonalizing. Figure: Partial Predictors With Deseasonalizing.

33 One-step-ahead prediction of max log return. Figure: Bank of America Corp. Figure: JPMorgan Chase & Co.

34 One-step-ahead prediction of max log return. Figure: JPMorgan Chase & Co. Figure: Wells Fargo & Co.

35 Thank you!
