Estimating Covariance Using Factorial Hidden Markov Models

João Sedoc 1,2
with: Jordan Rodu 3, Lyle Ungar 1, Dean Foster 1 and Jean Gallier 1
1 University of Pennsylvania, Philadelphia, PA (joao@cis.upenn.edu)
2 Chivalric Trading
3 Carnegie Mellon University, Pittsburgh, PA
PGMO Conference, October 29, 2014

Outline
1 Motivation
   What's Novel?
   Portfolio Optimization
   Non-Stationary Covariance
2 Introduction to Factorial HMMs
   HMM Application to Problems
   Quick Overview of Hidden Markov Models
   Estimation
   Factorial HMM
3 Empirical Results
4 Conclusion

What's Novel: Innovations to Factorial HMMs
- Multiple time horizon HMM using a structured approach
- Incorporation of high-frequency data
- Estimation in near real time
- Continuous emission HMM
- Provable bounds
- Incorporation of exogenous data

What's Novel: Application to Portfolio Optimization
- Markowitz optimization is a well-known theory, but hard to do right.
- The allocation is optimized under exponential utility:
  \[ \operatorname*{argmax}_{\alpha_{pos}} \; P^\top \alpha_{pos} - \frac{1}{2\zeta}\, \alpha_{pos}^\top \Sigma\, \alpha_{pos} \]
  where $\alpha_{pos}$ is the notional allocation, $p_t$ is the asset price at time $t$, $P_t = E[p_{t+\tau} - p_t]$ is the expected profit, $\Sigma$ is the asset return covariance matrix, and $\zeta$ is the risk-aversion free variable.
- In this talk we focus only on improving covariance estimation.
- We want a better estimate of $\Sigma \equiv \Sigma_t$.

Drawbacks of Current Models
- Modern approaches are constrained by computational complexity
- Trade-off between model richness and data richness
- Difficult to both explain and identify the model
- Incorporation of exogenous data is often difficult in empirical models

S&P 500 realized variance
Figure: S&P 500 variance (second resolution)

S&P 500 and 30-Year Treasury realized covariance
Figure: S&P 500 and 30-Year Treasury covariance (second resolution)

Common Applications of Hidden Markov Models
- Gene recognition
- Robotics
- Natural language processing tasks
- Speech recognition

Hidden Markov Models
There are two primary assumptions for this basic HMM:
1 The underlying hidden state process is Markovian
2 Given the hidden states, the observations are independent
Figure: HMM with states $h_t$, $h_{t+1}$, and $h_{t+2}$ that emit observations $x_t$, $x_{t+1}$, and $x_{t+2}$ respectively.

Hidden Markov Models
The probability distribution over the next hidden state at time $t+1$ depends only on the current hidden state at time $t$:
\[ \Pr(h_{t+1} \mid h_t, \ldots, h_1) = \Pr(h_{t+1} \mid h_t). \]

The Hidden Markov Model parameters
- $T = \Pr(h_{t+1} \mid h_t = i)$, the transition matrix
- $\lambda(x) = \Pr(x_{t+1} \mid h_{t+1})$, a collection of $\lambda(x)$'s (one per observation)
- $\pi = \Pr(h_1)$, the initial state vector

Hidden Markov Models
The likelihood of a sequence of observations from a specified model is
\[ \Pr(x_1,\ldots,x_t) = \sum_{h_1,\ldots,h_t} [\pi]_{h_1} \prod_{j=2}^{t} [T]_{h_j,h_{j-1}} \prod_{j=1}^{t} [\lambda(x_j)]_{h_j}, \]
though we will not consider this particular form of the likelihood. Instead, we will look at a new form for the likelihood,
\[ \Pr(x_t,\ldots,x_1) = 1^\top A(x_t) \cdots A(x_1)\,\pi, \]
where $\lambda(x)$ is the distribution of the observation given a hidden state, and $A(x) = T \operatorname{diag}(\lambda(x))$.
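The equivalence of the two likelihood forms can be checked numerically. The following is a minimal sketch, assuming a made-up 2-state, 3-symbol discrete HMM (the matrices `T`, `O`, and `pi` and the function names are illustrative, not from the talk), with `T` column-stochastic so that it matches the index order $[T]_{h_j,h_{j-1}}$.

```python
import itertools
import numpy as np

# Hypothetical toy HMM; every number here is an illustrative assumption.
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])      # T[i, j] = Pr(h_{t+1} = i | h_t = j); columns sum to 1
O = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])      # O[x, h] = Pr(x | h), so lambda(x) is the row O[x]
pi = np.array([0.6, 0.4])       # pi[h] = Pr(h_1 = h)

def A(x):
    """A(x) = T diag(lambda(x))."""
    return T @ np.diag(O[x])

def likelihood_matrix_form(xs):
    """1^T A(x_t) ... A(x_1) pi, applying A(x_1) first."""
    v = pi.copy()
    for x in xs:
        v = A(x) @ v
    return float(v.sum())       # summing the entries is the product with 1^T

def likelihood_sum_form(xs):
    """Explicit sum over every hidden state path h_1, ..., h_t."""
    total = 0.0
    for hs in itertools.product(range(len(pi)), repeat=len(xs)):
        p = pi[hs[0]] * O[xs[0], hs[0]]
        for j in range(1, len(xs)):
            p *= T[hs[j], hs[j - 1]] * O[xs[j], hs[j]]
        total += p
    return total

xs = [0, 2, 1, 1]
print(likelihood_matrix_form(xs), likelihood_sum_form(xs))
```

The explicit sum visits every hidden path, so its cost grows exponentially with the sequence length, while the matrix form needs one small matrix-vector product per observation.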

Hidden Markov Models
$\lambda(x) = \Pr(x \mid h)$, and the entries of $A(x)$ are the joint terms $\Pr(h_{t+1}, x \mid h_t)$, for example $\Pr(h_{t+1}, x \mid h_t = 1)$ for current state 1.
Figure: $A(x)$, graphically

Spectral Methods for Estimation
- Spectral methods use singular value decomposition (SVD) and the method of moments.
- Fast SVD replaces forward-backward (EM) estimation.
- Computing observables for spectral estimation of an HMM, fully reduced third moment.
- Estimation speed is critical given the size of high-frequency financial datasets.
- For US equities, sampling per second yields roughly 5 million data points per year per stock!

Spectral Algorithm Sketch
- Calculate $E[X_2 X_1^\top]$.
- Calculate the fast SVD of $E[X_2 X_1^\top]$, keeping $k$ left singular vectors.
- Reduce the data, where $\hat{y} = \hat{U}^\top x$.
- Compute the first three moments $E[Y_1]$, $E[Y_2 Y_1^\top]$, $E[Y_3 Y_1^\top Y_2^\top]$.
In the discrete case,
\[ \Pr(x_t,\ldots,x_1) = b_\infty^\top B(y_t) \cdots B(y_1)\, b_1, \]
where $B(y)$ is the similarity transform of $A(x)$.

Generalization to the Continuous Case
To generalize to the continuous case we need to take expectations, where
\[ \Pr(x_t,\ldots,x_1) \propto b_\infty^\top B(G(x_t)) \cdots B(G(x_1))\, b_1 \]
and $G(x)$ is an estimate of $E[Y_2 \mid x_1]$. $B(G(x))$ is exactly what we want, up to a constant factor depending on $x$.

Factorial HMM
Different state layers evolve differently
Figure: Factorial HMM diagram

Factorial HMM
Figure: Structured Factorial HMM diagram

Structured Factorial HMM Differences
Improvements
- Faster estimation using spectral methods
- Intuition about time horizon
- Simple layer aggregation
Drawbacks
- Jumps in covariance estimation at hourly boundaries
- Heuristic choice of time horizon
- Requires lots of data

Stock Covariance
Model     Horizon   RMSE           N training   N out of sample
CAPM      daily     0.9 × 10^-5    3125         1000
CAPM      hourly    1.2 × 10^-7    40000        4000
CAPM      second    1.7 × 10^-8    4000000      400000
PCA (1)   daily     0.85 × 10^-5   3124         1000
PCA (1)   hourly    1.0 × 10^-7    40000        4000
PCA (1)   second    1.6 × 10^-8    4000000      400000
GARCH     daily     0.6 × 10^-5    3124         1000
GARCH     hourly    0.9 × 10^-7    40000        4000
GARCH     second    1.2 × 10^-8    4000000      400000
FHMM      daily     1.2 × 10^-6    3124         1000
FHMM      hourly    3.0 × 10^-7    40000        4000
FHMM      second    0.9 × 10^-9    4000000      400000
(1) 15 principal components

Summary
Major Contributions
- Multiple time frames
- Richer model
- Intuitive explanation of model
- Fast estimation

Thanks for listening!

Future Work
- Empirical frequency selection
- Expansion to other datasets (energy / weather)
- Better estimation on lower time horizons
- Test more distributions for $G(x)$

Appendix

Spectral Methods for Estimation
In this section we describe how to build the observables $B(x)$. First note that the first three moments of the data from an HMM yield the following theoretical form:
\[ E[X_1] = M\pi \]
\[ E[X_2 X_1^\top] = M\, T \operatorname{diag}(\pi)\, M^\top \]
\[ E[X_3 X_1^\top X_2^\top](\lambda(x)) = M\, T \operatorname{diag}(\lambda(x))\, T \operatorname{diag}(\pi)\, M^\top \]
where in this particular setting $X_1$ is $\Pr(\Sigma_{t-1})$, $X_2$ is $\Pr(\Sigma_t)$, $X_3$ is $\Pr(\Sigma_{t+1})$, $\pi$ is the initial state vector, and column $i$ of $M$ is the expected value of $x$ given hidden state $i$.

Spectral Algorithm Sketch
- Calculate $E[X_2 X_1^\top]$.
- Calculate the fast SVD of $E[X_2 X_1^\top]$, keeping $k$ left singular vectors.
- Reduce the data, where $\hat{y} = \hat{U}^\top x$.
- Compute the first three moments $E[Y_1]$, $E[Y_2 Y_1^\top]$, $E[Y_3 Y_1^\top Y_2^\top]$.
Consider a $U$ such that $U^\top M$ is invertible. Then estimating the second and third moments with the reduced data $y = U^\top x$ allows, in the discrete case,
\[ B(x) \equiv E[Y_3 Y_1^\top Y_2^\top](\lambda(x))\, E[Y_2 Y_1^\top]^{-1} = (U^\top M)\, T \operatorname{diag}(\lambda(x))\, (U^\top M)^{-1}. \]
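The final equality for $B(x)$ follows by substituting the theoretical moment forms with every factor reduced by $U$. A short derivation, assuming in addition that $T$ and $\operatorname{diag}(\pi)$ are invertible (an assumption the slide does not state explicitly):

```latex
\begin{align*}
E[Y_2 Y_1^\top] &= (U^\top M)\, T \operatorname{diag}(\pi)\, (U^\top M)^\top, \\
E[Y_3 Y_1^\top Y_2^\top](\lambda(x)) &= (U^\top M)\, T \operatorname{diag}(\lambda(x))\, T \operatorname{diag}(\pi)\, (U^\top M)^\top, \\
B(x) &= (U^\top M)\, T \operatorname{diag}(\lambda(x))\,
        \underbrace{T \operatorname{diag}(\pi)\, (U^\top M)^\top
        \big[(U^\top M)^\top\big]^{-1} \operatorname{diag}(\pi)^{-1}\, T^{-1}}_{=\, I}\,
        (U^\top M)^{-1} \\
     &= (U^\top M)\, T \operatorname{diag}(\lambda(x))\, (U^\top M)^{-1}
      = (U^\top M)\, A(x)\, (U^\top M)^{-1}.
\end{align*}
```

So $B(x)$ is $A(x)$ conjugated by $S = U^\top M$, which is why the hidden parameters never need to be recovered individually.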

Generalization to the Continuous Case
To generalize to the continuous case we need to take expectations, where
\[ B(G(x)) = (U^\top M)\, T \operatorname{diag}(\lambda(x))\, (U^\top M)^{-1}\, \frac{1}{\Pr(x)}, \]
$\Pr(x)$ is the marginal probability, and $G(x)$ is a function of $E[Y_2 \mid x_1]$. $B(G(x))$ is exactly what we want, up to a constant factor depending on $x$, as
\[ \Pr(Y_1,\ldots,Y_t) \propto b_\infty^\top B(G(x_t)) \cdots B(G(x_1))\, b_1. \]

Outline
Continuous Emission HMM

Continuous Emission HMM
Define $g(x) \equiv E[Y_2 \mid x_1]$. Let $h_t$ be the probability vector associated with being in a particular state at time $t$. Then
\[ E[y_2 \mid h_2] = U^\top M h_2. \]
Also,
\[ E[h_2 \mid h_1] = T h_1, \]
thus
\[ E[y_2 \mid h_1] = U^\top M\, T h_1. \]

Continuous Emission HMM
To establish a belief about $h_1$ given $x_1$, recall from Bayes' formula
\[ \Pr(h_1 \mid x_1) = \frac{\Pr(x_1 \mid h_1)\, \Pr(h_1)}{\Pr(x_1)}. \]
We can arrange each probability into a vector, and because in the indicator vector case the probability vector is the same as the expected value vector, we have, in vector notation,
\[ E[h_1 \mid x_1] = \frac{\operatorname{diag}(\pi)\, \lambda(x)}{\pi^\top \lambda(x)}, \]
and so putting together the pieces we get
\[ E[y_2 \mid x_1] = \frac{U^\top M\, T \operatorname{diag}(\pi)\, \lambda(x)}{\pi^\top \lambda(x)}. \]
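The posterior over the first hidden state can be sketched in a few lines. This example assumes Gaussian emission densities for $\lambda(x)$, which is only one possible choice and is not specified in the talk; the two-state prior, the means, and the standard deviations are invented for illustration.

```python
import numpy as np

def gaussian_density(x, means, stds):
    """lambda(x): density of x under each hidden state (Gaussian emissions are an assumption)."""
    z = (x - means) / stds
    return np.exp(-0.5 * z ** 2) / (stds * np.sqrt(2.0 * np.pi))

def state_posterior(x, pi, means, stds):
    """E[h_1 | x_1 = x] = diag(pi) lambda(x) / (pi^T lambda(x))."""
    lam = gaussian_density(x, means, stds)
    return (pi * lam) / (pi @ lam)   # elementwise product equals diag(pi) @ lam

# Hypothetical prior over a calm state and a volatile state.
pi = np.array([0.7, 0.3])
means = np.array([0.0, 0.0])
stds = np.array([0.5, 2.0])

posterior_small = state_posterior(0.1, pi, means, stds)   # small move
posterior_large = state_posterior(3.0, pi, means, stds)   # large move
print(posterior_small, posterior_large)
```

A small observation leaves most of the posterior mass on the calm state, while a large one shifts it almost entirely to the volatile state. The denominator `pi @ lam` is the marginal $\Pr(x)$ that appears as the constant factor in $B(G(x))$.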

Continuous Emission HMM
Recall that the goal is to isolate $\lambda(x)$. Note that
\[ E[y_2 y_1^\top]^{-1} g(x) = (M^\top U)^{-1}\, \frac{\lambda(x)}{\pi^\top \lambda(x)} \equiv G(x). \]
When this is plugged into our fully reduced version of $B(\gamma)$, we get
\[ B(G(x)) = (U^\top M)\, T \operatorname{diag}(M^\top U\, G(x))\, (U^\top M)^{-1} = (U^\top M)\, T \operatorname{diag}(\lambda(x))\, (U^\top M)^{-1}\, \frac{1}{\Pr(x)}, \]
where $\Pr(x)$ is the marginal probability. $B(G(x))$ is exactly what we want, up to a constant factor depending on $x$.

Spectral Estimation Algorithm
Algorithm 1: Computing observables for spectral estimation of an HMM, fully reduced third moment
1: Input: training examples $x^{(i)}$ for $i \in \{1,\ldots,M\}$, where $x^{(i)} = (x_1^{(i)}, x_2^{(i)}, x_3^{(i)})$.
2: Compute $\hat{E}[x_2 x_1^\top] = \frac{1}{M} \sum_{i=1}^{M} x_2^{(i)} x_1^{(i)\top}$.
3: Compute the $k$ left singular vectors corresponding to the top $k$ singular values of $\hat{E}[x_2 x_1^\top]$. Call the matrix of these vectors $\hat{U}$.
4: Reduce data: $\hat{y} = \hat{U}^\top x$.
5: Compute $\hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} y_1^{(i)}$, $\hat{\Sigma} = \frac{1}{M} \sum_{i=1}^{M} y_2^{(i)} y_1^{(i)\top}$, and the tensor $\hat{C} = \frac{1}{M} \sum_{i=1}^{M} y_3^{(i)} \otimes y_1^{(i)} \otimes y_2^{(i)}$.
6: Set $\hat{b}_1 = \hat{\mu}$ and $\hat{b}_\infty^\top = \hat{b}_1^\top \hat{\Sigma}^{-1}$.
7: Right multiply each slice of the tensor in the $y_2$ direction (so $y_2$ is being sliced up, leaving the $y_3 y_1^\top$ matrices intact) by $\hat{\Sigma}^{-1}$ to form $\hat{B}(\gamma) = \hat{C}(\gamma)\, \hat{\Sigma}^{-1}$.
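The steps above can be sketched for a discrete HMM as follows. This is a simplified illustration rather than the fully reduced algorithm: the middle observation $x_2$ is kept as a symbol index instead of being reduced, and exact population moments from a made-up toy HMM stand in for the sample averages so that the result can be checked against the forward recursion. The names `spectral_observables`, `sequence_probability`, and the toy matrices are assumptions for this sketch.

```python
import numpy as np

def spectral_observables(P1, P21, P3x1, k):
    """Build b_1, b_inf and one operator B(x) per symbol from moments.

    P1   : E[x_1], shape (n,)
    P21  : E[x_2 x_1^T], shape (n, n)
    P3x1 : list over symbols x of E[x_3 x_1^T 1{x_2 = x}], each (n, n)
    """
    U = np.linalg.svd(P21)[0][:, :k]              # top-k left singular vectors
    b1 = U.T @ P1                                 # reduced first moment
    Sigma_inv = np.linalg.inv(U.T @ P21 @ U)      # inverse of reduced second moment
    b_inf = Sigma_inv.T @ b1                      # b_inf^T = b_1^T Sigma^{-1}
    B = [U.T @ P @ U @ Sigma_inv for P in P3x1]   # k x k operator per symbol
    return b1, b_inf, B

def sequence_probability(xs, b1, b_inf, B):
    """b_inf^T B(x_t) ... B(x_1) b_1."""
    v = b1
    for x in xs:
        v = B[x] @ v
    return float(b_inf @ v)

# Hypothetical toy HMM (illustrative numbers), one-hot observations so M = O.
T = np.array([[0.9, 0.2], [0.1, 0.8]])            # column-stochastic transitions
O = np.array([[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]])  # O[x, h] = Pr(x | h)
pi = np.array([0.6, 0.4])

# Exact moments in place of sample averages.
P1 = O @ pi
P21 = O @ T @ np.diag(pi) @ O.T
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(O.shape[0])]

b1, b_inf, B = spectral_observables(P1, P21, P3x1, k=2)

def forward_probability(xs):
    """Reference value 1^T A(x_t) ... A(x_1) pi from the true parameters."""
    v = pi
    for x in xs:
        v = T @ (O[x] * v)
    return float(v.sum())

xs = [0, 2, 1, 1]
print(sequence_probability(xs, b1, b_inf, B), forward_probability(xs))
```

With exact moments the two numbers agree up to floating-point error. With empirical moments the agreement would be only approximate, and the number of retained singular vectors `k` would need to match the number of hidden states.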

Similarity Transform from A(x) to B(x)
Unfortunately, $A(x)$ isn't directly learnable. However, an appropriate similarity transformation of $A(x)$ (of which there is more than one) is learnable by the method of moments, bypassing the need to recover the HMM parameters, and still gets us what we want. Note that
\[ P(x_1,\ldots,x_t) = 1^\top A(x_t) \cdots A(x_1)\,\pi = \underbrace{1^\top S^{-1}}_{b_\infty^\top}\, \underbrace{S A(x_t) S^{-1}}_{B(x_t)} \cdots \underbrace{S A(x_1) S^{-1}}_{B(x_1)}\, \underbrace{S\pi}_{b_1} = b_\infty^\top B(x_t) \cdots B(x_1)\, b_1. \]
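The cancellation of the $S^{-1} S$ pairs can be confirmed numerically. A minimal sketch, assuming a made-up toy HMM and an arbitrary invertible matrix `S` (both chosen only for illustration):

```python
import numpy as np

# Hypothetical toy HMM; numbers are illustrative.
T = np.array([[0.9, 0.2], [0.1, 0.8]])              # column-stochastic transitions
O = np.array([[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]])  # O[x, h] = Pr(x | h)
pi = np.array([0.6, 0.4])

S = np.array([[2.0, 1.0], [0.5, 3.0]])              # any invertible matrix works
S_inv = np.linalg.inv(S)

def A(x):
    return T @ np.diag(O[x])

def B(x):
    return S @ A(x) @ S_inv                         # similarity transform of A(x)

b1 = S @ pi                                         # b_1 = S pi
b_inf = S_inv.T @ np.ones(2)                        # b_inf^T = 1^T S^{-1}

def prob_with_A(xs):
    v = pi
    for x in xs:
        v = A(x) @ v
    return float(np.ones(2) @ v)

def prob_with_B(xs):
    v = b1
    for x in xs:
        v = B(x) @ v
    return float(b_inf @ v)

xs = [2, 0, 1]
print(prob_with_A(xs), prob_with_B(xs))
```

Because any invertible `S` gives the same probabilities, the method of moments only needs to identify the operators up to such a transform.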

Markowitz Optimization
Given a vector of current prices $p_t$ and unknown future prices $P_{t+\tau}$, the market value is
\[ \Psi_\alpha = \alpha_{pos}^\top (P_{t+\tau} - p_t). \tag{1} \]
Assuming that the market is Gaussian, the price distribution is
\[ P_{t+\tau} \sim \mathcal{N}(\mu, \Sigma). \tag{2} \]
Therefore the distribution of the portfolio is
\[ \Psi_\alpha \sim \mathcal{N}\left(\alpha_{pos}^\top (\mu - p_t),\; \alpha_{pos}^\top \Sigma\, \alpha_{pos}\right). \tag{3} \]
The allocation is optimized under exponential utility with risk-aversion parameter $\zeta$. Maximizing the certainty equivalent gives the quadratic program (QP)
\[ \operatorname*{argmax}_{\alpha_{pos}} \; CE(\alpha_{pos}) = P^\top \alpha_{pos} - \frac{1}{2\zeta}\, \alpha_{pos}^\top \Sigma\, \alpha_{pos}, \tag{4} \]
where $P$ is the expected profit, roughly defined as $P_t = E[P_{t+\tau} - p_t]$. Numeric optimizers seek to minimize, so define the objective function as $f(\alpha) \equiv -CE(\alpha)$.
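As a rough numerical sketch of this quadratic program, assume there are no constraints on the allocation and that the penalty term reads $\frac{1}{2\zeta}$. Setting the gradient of the certainty equivalent to zero then gives $\alpha^* = \zeta\, \Sigma^{-1} P$. The expected profits, covariance values, and function names below are invented for illustration.

```python
import numpy as np

def certainty_equivalent(alloc, expected_profit, cov, zeta):
    """CE(a) = P^T a - (1 / (2 * zeta)) * a^T Sigma a."""
    return expected_profit @ alloc - alloc @ cov @ alloc / (2.0 * zeta)

def optimal_allocation(expected_profit, cov, zeta):
    """Unconstrained maximizer of CE.

    First-order condition: P - (1 / zeta) * Sigma a = 0, so a = zeta * Sigma^{-1} P.
    """
    return zeta * np.linalg.solve(cov, expected_profit)

# Hypothetical two-asset example.
P = np.array([0.02, 0.01])                 # expected profit per unit of each asset
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])           # covariance estimate (must be positive definite)
zeta = 2.0

alpha = optimal_allocation(P, Sigma, zeta)
print(alpha, certainty_equivalent(alpha, P, Sigma, zeta))
```

Since the solution is proportional to $\Sigma^{-1} P$, errors in the covariance estimate pass directly into the allocation, which is the motivation for the talk's focus on estimating $\Sigma$.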