GARCH Modeling and Stochastic Volatility
Department of Statistics, North Carolina State University
July 31, 2009 / Ben Kedem Symposium
Outline
Ben Kedem

Ben has made many contributions to time series methodology. A common theme is that some unobserved (latent) series controls either: the values of the observed data, or the distribution of the observed data. In a stochastic volatility model, a latent series controls specifically the variance of the observed data. We relate stochastic volatility models to other time series models.
Correlation

Time series modeling is not just about correlation... http://xkcd.com/552/
Time Domain Approach

The time domain approach to modeling a time series {Y_t} focuses on the conditional distribution of Y_t | Y_{t-1}, Y_{t-2}, .... One reason for this focus is that the joint distribution of Y_1, Y_2, ..., Y_n can be factorized as

f_{1:n}(y_1, y_2, ..., y_n) = f_1(y_1) f_{2|1}(y_2 | y_1) ... f_{n|n-1:1}(y_n | y_{n-1}, y_{n-2}, ..., y_1).

So the likelihood function is determined by these conditional distributions.
The conditional distribution may be defined by:
- the conditional mean, µ_t = E(Y_t | Y_{t-1} = y_{t-1}, Y_{t-2} = y_{t-2}, ...);
- the conditional variance, h_t = Var(Y_t | Y_{t-1} = y_{t-1}, Y_{t-2} = y_{t-2}, ...);
- the shape of the conditional distribution.
Forecasting

The conditional distribution of Y_t | Y_{t-1}, Y_{t-2}, ... also gives the most complete solution to the forecasting problem: we observe Y_{t-1}, Y_{t-2}, ...; what statements can we make about Y_t? The conditional mean is our best forecast, and the conditional standard deviation measures how far we believe the actual value might differ from the forecast. The conditional shape, usually a fixed distribution such as the normal, allows us to make probability statements about the actual value.
The first wave of time series methods focused on the conditional mean, µ_t. The conditional variance was assumed to be constant, and the conditional shape was either normal or unspecified. We need only specify the form of µ_t = µ_t(y_{t-1}, y_{t-2}, ...). Time-homogeneous case: µ_t = µ(y_{t-1}, y_{t-2}, ...), depending on t only through y_{t-1}, y_{t-2}, ....
Autoregression

Simplest form: µ_t is a linear function of a small number of past values:

µ_t = φ_1 y_{t-1} + ... + φ_p y_{t-p}.

Equivalently, and more familiarly,

Y_t = φ_1 Y_{t-1} + ... + φ_p Y_{t-p} + ɛ_t,

where ɛ_t = Y_t − µ_t satisfies E(ɛ_t | Y_{t-1}, Y_{t-2}, ...) = 0 and Var(ɛ_t | Y_{t-1}, Y_{t-2}, ...) = h.
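Not part of the talk, but as a quick illustration, here is a minimal sketch (in Python with NumPy; the function name and defaults are my own choices) of simulating an AR(p) process with the conditional-mean structure above:

```python
import numpy as np

def simulate_ar(phi, n, h=1.0, burn=200, seed=0):
    """Simulate Y_t = phi_1 Y_{t-1} + ... + phi_p Y_{t-p} + eps_t,
    with eps_t i.i.d. N(0, h).  `phi` lists (phi_1, ..., phi_p)."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    y = np.zeros(n + burn)
    eps = rng.normal(0.0, np.sqrt(h), n + burn)
    for t in range(p, n + burn):
        # conditional mean mu_t: linear function of the last p values
        y[t] = np.dot(phi, y[t - p:t][::-1]) + eps[t]
    return y[burn:]  # discard burn-in so start-up transients are gone

y = simulate_ar([0.6, 0.2], n=1000)
```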
Recursion

Problem: some time series need large p. Solution: recursion; include also some past values of µ_t:

µ_t = φ_1 y_{t-1} + ... + φ_p y_{t-p} + ψ_1 µ_{t-1} + ... + ψ_q µ_{t-q}.

Equivalently, and more familiarly,

Y_t = φ_1 Y_{t-1} + ... + φ_p Y_{t-p} + ɛ_t + θ_1 ɛ_{t-1} + ... + θ_q ɛ_{t-q}.

This is the ARMA (AutoRegressive Moving Average) model of order (p, q).
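The ARMA recursion can be simulated in the same spirit; a minimal sketch (again Python/NumPy, my own code, not anything from the slides):

```python
import numpy as np

def simulate_arma(phi, theta, n, h=1.0, burn=200, seed=0):
    """Simulate Y_t = sum_i phi_i Y_{t-i} + eps_t + sum_j theta_j eps_{t-j},
    with eps_t i.i.d. N(0, h)."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    m = max(p, q, 1)
    y = np.zeros(n + burn + m)
    eps = rng.normal(0.0, np.sqrt(h), n + burn + m)
    for t in range(m, n + burn + m):
        ar = np.dot(phi, y[t - p:t][::-1]) if p else 0.0
        ma = np.dot(theta, eps[t - q:t][::-1]) if q else 0.0
        y[t] = ar + eps[t] + ma
    return y[burn + m:]

y = simulate_arma(phi=[0.5], theta=[0.3], n=1000)  # an ARMA(1,1) path
```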
Two Years of S&P 500: Changing Variance

[Figure: daily log-returns of the S&P 500 index, 2008–2009.]
The second wave of time series methods added a focus on the conditional variance, h_t. Now we also need to specify the form of h_t = h_t(y_{t-1}, y_{t-2}, ...). Time-homogeneous case: h_t = h(y_{t-1}, y_{t-2}, ...), depending on t only through y_{t-1}, y_{t-2}, ....
ARCH

Simplest form: h_t is a linear function of a small number of squared ɛs:

h_t = ω + α_1 ɛ²_{t-1} + ... + α_q ɛ²_{t-q}.

Engle, ARCH (AutoRegressive Conditional Heteroscedasticity): proposed in 1982; Nobel Prize in Economics, 2003 (shared with the late Sir Clive Granger).
Recursion

Problem: some time series need large q. Solution: recursion; include also some past values of h_t:

h_t = ω + α_1 ɛ²_{t-1} + ... + α_q ɛ²_{t-q} + β_1 h_{t-1} + ... + β_p h_{t-p}.

Bollerslev, 1986: GARCH (Generalized ARCH; no Nobel yet, nor yet a Knighthood). Warning! Note the reversal of the roles of p and q from the convention of ARMA(p, q).
GARCH(1,1)

The simplest model has p = 1, q = 1:

h_t = ω + α ɛ²_{t-1} + β h_{t-1}.

If α + β < 1, there exists a stationary process with this structure. If α + β = 1, the model is called integrated: IGARCH(1,1).
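The GARCH(1,1) recursion can be simulated directly; a minimal sketch (Python/NumPy, my own code):

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate eps_t = sqrt(h_t) z_t, z_t i.i.d. N(0,1), with
    h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}."""
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    eps = np.empty(n)
    # start at the stationary variance; requires alpha + beta < 1
    h[0] = omega / (1.0 - alpha - beta)
    eps[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, n):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()
    return eps, h

eps, h = simulate_garch11(omega=0.1, alpha=0.1, beta=0.8, n=5000)
```

With these parameters the stationary variance is ω/(1 − α − β) = 1, so the sample variance of eps should be near 1.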
In a stochastic volatility model, an unobserved (latent) process {X_t} affects the distribution of the observed process {Y_t}, specifically the variance of Y_t. Introducing a second source of variability is appealing from a modeling perspective.
Simple Example

For instance: {X_t} satisfies

X_t − µ = φ (X_{t-1} − µ) + ξ_t,

where {ξ_t} are i.i.d. N(0, σ²_ξ). If |φ| < 1, this is a (stationary) autoregression, but if φ = 1 it is a (non-stationary) random walk. The observed series is

Y_t = σ_t η_t,

where σ²_t = σ²(X_t) is a non-negative function, such as σ²(X_t) = exp(X_t), and {η_t} are i.i.d. (0, 1): typically Gaussian, but possibly t.
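A minimal simulation of this example (Python/NumPy sketch, my own code) with the exponential variance function:

```python
import numpy as np

def simulate_sv(mu, phi, sigma_xi, n, seed=0):
    """Simulate the latent AR(1): X_t - mu = phi (X_{t-1} - mu) + xi_t,
    and observations Y_t = sigma(X_t) * eta_t with sigma^2(x) = exp(x)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    # draw X_0 from the stationary distribution (valid when |phi| < 1)
    x[0] = mu + sigma_xi / np.sqrt(1.0 - phi ** 2) * rng.standard_normal()
    for t in range(1, n):
        x[t] = mu + phi * (x[t - 1] - mu) + sigma_xi * rng.standard_normal()
    y = np.exp(x / 2.0) * rng.standard_normal(n)   # Var(Y_t | X_t) = exp(X_t)
    return x, y

x, y = simulate_sv(mu=0.0, phi=0.95, sigma_xi=0.2, n=2000)
```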
Conditional Distributions

The conditional distribution of Y_t given Y_{t-1}, Y_{t-2}, ... and X_t, X_{t-1}, ... is simple:

Y_t | Y_{t-1}, Y_{t-2}, ..., X_t, X_{t-1}, ... ∼ N(0, σ²(X_t)).

But the conditional distribution of Y_t given only Y_{t-1}, Y_{t-2}, ... is not analytically tractable. In particular,

h_t(y_{t-1}, y_{t-2}, ...) = Var(Y_t | Y_{t-1} = y_{t-1}, Y_{t-2} = y_{t-2}, ...)

is not a simple function.
Difficulties

Analytic difficulties cause problems in estimation and forecasting. Computationally intensive methods are needed, e.g.:
- particle filtering;
- numerical quadrature.
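The slides mention particle filtering only by name; purely as an illustrative sketch (all names and settings here are my own assumptions), a bootstrap particle filter for the exp-AR(1) stochastic volatility example might look like:

```python
import numpy as np

def bootstrap_filter(y, mu, phi, sigma_xi, n_particles=1000, seed=0):
    """Bootstrap particle filter for Y_t | X_t ~ N(0, exp(X_t)) with the
    latent AR(1) X_t - mu = phi (X_{t-1} - mu) + xi_t.
    Returns filtered means E(X_t | Y_0..Y_t) and a log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    # initialize particles from the stationary distribution of X
    x = mu + sigma_xi / np.sqrt(1.0 - phi ** 2) * rng.standard_normal(n_particles)
    means = np.empty(len(y))
    loglik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            # propagate particles through the latent AR(1)
            x = mu + phi * (x - mu) + sigma_xi * rng.standard_normal(n_particles)
        # weight by the observation density N(yt; 0, exp(x))
        logw = -0.5 * (np.log(2.0 * np.pi) + x + yt ** 2 * np.exp(-x))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())   # running log-likelihood estimate
        w /= w.sum()
        means[t] = np.dot(w, x)
        # multinomial resampling to combat weight degeneracy
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
    return means, loglik
```

This is only a sketch of the idea: the intractable h_t is replaced by a Monte Carlo approximation built from weighted particles.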
Stochastic volatility models have the attraction of an explicit model for the volatility, or variance. Is analytic difficulty the unavoidable cost of that advantage?
The Latent Process

We construct a latent process by

X_0 ∼ Γ(ν/2, τ²/2),

and for t > 0

X_t = B_t X_{t-1},

where θB_t ∼ β(ν/2, 1/2) and {B_t} are i.i.d. and independent of X_0.
The Observed Process

The observed process is defined for t ≥ 0 by

Y_t = σ_t η_t,

where σ_t = 1/√X_t, and {η_t} are i.i.d. N(0, 1) and independent of {X_t}. Equivalently: given X_u = x_u, 0 ≤ u ≤ t, and Y_u = y_u, 0 ≤ u < t,

Y_t ∼ N(0, σ²_t),

with the same definition of σ_t.
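A minimal simulation of this construction (Python/NumPy sketch, my own code; note that NumPy's gamma generator is parameterized by scale = 1/rate):

```python
import numpy as np

def simulate_conjugate_sv(nu, tau2, n, seed=0):
    """Simulate X_0 ~ Gamma(nu/2, rate tau2/2); X_t = B_t X_{t-1} with
    theta * B_t ~ Beta(nu/2, 1/2); and Y_t = eta_t / sqrt(X_t)."""
    rng = np.random.default_rng(seed)
    theta = (nu - 2.0) / (nu - 1.0)
    x = np.empty(n)
    x[0] = rng.gamma(shape=nu / 2.0, scale=2.0 / tau2)
    for t in range(1, n):
        b = rng.beta(nu / 2.0, 0.5) / theta   # so theta * B_t ~ Beta(nu/2, 1/2)
        x[t] = b * x[t - 1]
    y = rng.standard_normal(n) / np.sqrt(x)   # sigma_t = 1 / sqrt(X_t)
    return x, y

x, y = simulate_conjugate_sv(nu=10.0, tau2=4.0, n=1000)
```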
Constraints

Since Var(Y_0) = E(1/X_0), we constrain ν > 2 to ensure that E(1/X_0) < ∞. Requiring

E(1/X_t) = E(1/X_0) for all t > 0

is also convenient, and is met if

θ = (ν − 2)/(ν − 1).
Comparison with Earlier Example

This is quite similar to the earlier example, with φ = 1: write X*_t = −log(X_t). Then

X*_t = X*_{t-1} + ξ_t,

where ξ_t = −log(B_t). In terms of X*_t, σ²_t = exp(X*_t).
Differences

A key constraint is that now φ = 1, so {X*_t} is a (non-stationary) random walk instead of a (stationary) autoregression. Also, {X*_t} is non-Gaussian, whereas in the earlier example the latent process was Gaussian. Finally, {X*_t} has a drift, because E(ξ_t) ≠ 0; of course, we could include a drift in the earlier example too.
Matched Simulated Random Walks

[Figures: matched simulated random walks over 100 steps, comparing ν = 10 and ν = 5 with a Gaussian random walk.]
So What?

So our model is not very different from (a carefully chosen instance of) the earlier example. So does it have any advantage? Note: the inverse Gamma distribution is the conjugate prior for the variance of the Gaussian distribution.
Marginal Distribution of Y_0

The marginal distribution of Y_0 is

Y_0 ∼ √h_0 t*(ν),

where h_0 = τ²/(ν − 2) and t*(ν) is the standardized t-distribution (i.e., scaled to have variance 1).
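This marginal can be checked by simulation; a sketch (my own, assuming the parameterizations above), where the sample variance of Y_0 should be close to h_0 = τ²/(ν − 2):

```python
import numpy as np

rng = np.random.default_rng(0)
nu, tau2 = 10.0, 4.0
n = 200_000
# X_0 ~ Gamma(nu/2, rate tau2/2); NumPy's gamma takes scale = 1/rate
x0 = rng.gamma(shape=nu / 2.0, scale=2.0 / tau2, size=n)
y0 = rng.standard_normal(n) / np.sqrt(x0)   # Y_0 | X_0 ~ N(0, 1/X_0)
h0 = tau2 / (nu - 2.0)
print(y0.var(), h0)   # the two numbers should be close (h0 = 0.5 here)
```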
Conditional Distributions of X_0 and X_1 | Y_0

Conjugate prior/posterior property: conditionally on Y_0 = y_0,

X_0 ∼ Γ((ν + 1)/2, (τ² + y_0²)/2).

Beta multiplier property: conditionally on Y_0 = y_0,

X_1 = B_1 X_0 ∼ Γ(ν/2, θ(τ² + y_0²)/2).
Conditional Distribution of Y_1 | Y_0

The conditional distribution of X_1 | Y_0 differs from the distribution of X_0 only in scale, so conditionally on Y_0 = y_0,

Y_1 ∼ √h_1 t*(ν),

where

h_1 = θ(τ² + y_0²)/(ν − 2) = θ h_0 + (1 − θ) y_0².

Hmm... so the distribution of Y_1 | Y_0 differs from the distribution of Y_0 only in scale... I smell a recursion!
The Recursion

Write Y_{t-1} = (Y_{t-1}, Y_{t-2}, ..., Y_0). For t > 0, conditionally on Y_{t-1} = y_{t-1},

Y_t ∼ √h_t t*(ν),

where

h_t = θ h_{t-1} + (1 − θ) y²_{t-1}.
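In code the recursion is one line per step; a sketch (Python/NumPy, the helper name is my own):

```python
import numpy as np

def volatility_recursion(y, nu, tau2):
    """Compute h_t = theta * h_{t-1} + (1 - theta) * y_{t-1}^2,
    with h_0 = tau2 / (nu - 2) and theta = (nu - 2) / (nu - 1)."""
    theta = (nu - 2.0) / (nu - 1.0)
    h = np.empty(len(y))
    h[0] = tau2 / (nu - 2.0)
    for t in range(1, len(y)):
        h[t] = theta * h[t - 1] + (1.0 - theta) * y[t - 1] ** 2
    return h

# with constant y_t = 1, h_t climbs from h_0 = 0.5 toward the fixed point 1
h = volatility_recursion(np.ones(10), nu=10.0, tau2=4.0)
```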
The Structure

That is, {Y_t} is IGARCH(1,1) with t(ν)-distributed innovations, with the constraints

ω = 0, β = 1 − α = (ν − 2)/(ν − 1).

So we can have a stochastic volatility structure, and still have (integrated) GARCH structure for the observed process {Y_t}. Details and some multivariate generalizations are sketched out at http://www4.stat.ncsu.edu/~bloomfld/talks/sv.pdf
Same Two Years of S&P 500

[Figure: daily log-returns of the S&P 500 index, 2008–2009.]
Two Years of S&P 500

Data: 500 log-returns for the S&P 500 index, from 05/24/2007 to 05/19/2009. Maximum likelihood estimates:

With θ constrained to (ν − 2)/(ν − 1): τ̂² = 4.37, θ̂ = 0.914, ν̂ = 12.6.
With θ unconstrained: τ̂² = 3.37, θ̂ = 0.918, ν̂ = 9.93.
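Estimates like these can be obtained in principle by maximizing the t-likelihood implied by the recursion; a sketch using SciPy (the parameter transforms and starting values are my own choices, and `returns` stands in for the log-return series, which is not reproduced here):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t

def neg_loglik(params, y):
    """Negative log-likelihood for Y_t | past ~ sqrt(h_t) t*(nu), with
    h_t = theta h_{t-1} + (1 - theta) y_{t-1}^2 and h_0 = tau2 / (nu - 2)."""
    log_tau2, logit_theta, log_nu_minus_2 = params
    tau2 = np.exp(log_tau2)
    theta = 1.0 / (1.0 + np.exp(-logit_theta))   # keep 0 < theta < 1
    nu = 2.0 + np.exp(log_nu_minus_2)            # keep nu > 2
    h = tau2 / (nu - 2.0)
    ll = 0.0
    for yt in y:
        # t*(nu) has variance 1, so the t scale is sqrt(h * (nu - 2) / nu)
        ll += student_t.logpdf(yt, df=nu, scale=np.sqrt(h * (nu - 2.0) / nu))
        h = theta * h + (1.0 - theta) * yt ** 2
    return -ll

# usage sketch:
# res = minimize(neg_loglik, x0=[0.0, 2.0, 2.0], args=(returns,), method="Nelder-Mead")
```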
Comparison

The constrained result has lighter tails and less memory than the unconstrained result. Likelihood ratio test: 2 log(likelihood ratio) = 0.412; assuming χ²(1), P = 0.521, so the differences are not significant. With more data, the difference becomes significant.
Latent processes are useful in time series modeling. GARCH and stochastic volatility are both valuable tools for modeling time series with changing variance. GARCH fits naturally into the time domain approach. Stochastic volatility is appealing but typically intractable. Exploiting conjugate distributions may bridge the gap.

Thank you!