Modeling Conditional Distributions with Mixture Models: Theory and Inference

John Geweke, University of Iowa, USA
Journal of Applied Econometrics Invited Lecture
Università di Venezia, Italy
June 2, 2005
Motivation 1: Conditional distributions

$(w_t, y_t)$ i.i.d., $w_t$ $(r \times 1)$; $p(y_t \mid w_t) = {?}$

Example:
  $w_{t1}$ = experience or age of individual $t$
  $w_{t2}$ = education of individual $t$
  $y_t$ = earnings or wage of individual $t$
Quantiles of posterior distribution of log earnings conditional on age and education
Motivation 2: Financial forecasting and decision making

$y_t$ = return on asset in period $t$; $p(y_t \mid y_1, \ldots, y_{t-1}) = {?}$
Maximized log-likelihood values, S&P 500 daily returns, 1990-1999

  Model                      Maximized log-likelihood
  i.i.d. $N(\mu, \sigma^2)$  -3285.7
  GARCH(1,1)                 -3028.0
  EGARCH(1,1)                -2998.3
  t-GARCH(1,1)               -2959.0
  CMNMM                      > -2967.5
  SMR                        > -2793.9
Posterior quantiles of S&P 500 predictive distribution, 1990-1999
Common structure of the models

  $y_t$ : variable of interest
  $x_t$ $(k \times 1)$, $v_t$ $(p \times 1)$ : (possibly overlapping) sets of covariates
  $s_t$ : latent state, $s_t \in \{1, \ldots, m\}$

$$y_t \mid (x_t, v_t, s_t = j) \sim N(\beta'x_t + \alpha_j'v_t,\ \sigma_j^2)$$
Simple normal mixture model x t ; z t =1 p 1 s t independent of x t s t i.i.d., P (s t = j) =p j y t (x t,,s t = j) N ³ β 0 x t + α j v t,σ 2 j Equivalently y t = β 0 x t + ε t, ε t N ³ α j,σ 2 j
Markov normal mixture model

Take $v_t = 1$; $s_t$ independent of $x_t$:

$$y_t \mid (x_t, s_t = j) \sim N(\beta'x_t + \alpha_j,\ \sigma_j^2)$$
$$P(s_t = j \mid s_{t-1} = i, s_{t-2}, s_{t-3}, \ldots) = p_{ij}$$

Parameters:
$$\beta, \quad \alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix}, \quad P = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & & \vdots \\ p_{m1} & \cdots & p_{mm} \end{pmatrix}, \quad \sigma = \begin{pmatrix} \sigma_1^2 \\ \vdots \\ \sigma_m^2 \end{pmatrix}; \qquad \theta = \{\beta, P, \alpha, \sigma\}$$
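Extending the previous sketch, the only change is that the states now evolve as a first-order Markov chain (transition matrix values again illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.95, 0.05],        # illustrative transition matrix p_ij
              [0.10, 0.90]])
alpha = np.array([-1.0, 2.0])
sigma = np.array([0.5, 1.5])

T = 1000
s = np.empty(T, dtype=int)
s[0] = 0
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])   # P(s_t = j | s_{t-1} = i) = p_ij
eps = rng.normal(alpha[s], sigma[s])      # disturbances; add beta'x_t as needed
```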
Inference and forecasting in the Markov normal mixture model

1. A Markov chain Monte Carlo algorithm provides draws $\{\theta^{(m)}\}$ from $p(\theta \mid y_1, \ldots, y_T)$.
2. Filtered probabilities $P(s_T = i \mid \theta, y_1, \ldots, y_T)$ and draws from $(s_1, \ldots, s_T) \mid (\theta, y_1, \ldots, y_T)$ are given by the algorithm of Chib (1995).
3. Then $p(y_{T+1} \mid \theta, y_1, \ldots, y_T, s_1, \ldots, s_T) = p(y_{T+1} \mid \theta, s_T = i)$ is a simple mixture-of-normals distribution with state probabilities $p_{i1}, \ldots, p_{im}$ (see the sketch below).
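As a hedged sketch of step 3: given a posterior draw of $\theta$ and the sampled terminal state $s_T = i$, the one-step-ahead predictive density is a normal mixture with weights $p_{i1}, \ldots, p_{im}$ (conditioning on $x_{T+1}$ is suppressed here for simplicity; parameter values are the illustrative ones above).

```python
import numpy as np
from scipy.stats import norm

def predictive_density(y_grid, P, alpha, sigma, s_T):
    """One-step-ahead density p(y_{T+1} | theta, s_T): a normal mixture
    with weights given by row s_T of the transition matrix P."""
    w = P[s_T]                                         # p_{i1}, ..., p_{im}
    comps = norm.pdf(y_grid[:, None], loc=alpha, scale=sigma)
    return comps @ w

# Example: evaluate the predictive density on a grid
P = np.array([[0.95, 0.05], [0.10, 0.90]])
dens = predictive_density(np.linspace(-5, 5, 201), P,
                          np.array([-1.0, 2.0]), np.array([0.5, 1.5]), s_T=0)
```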
Compound Markov normal mixture model

Objectives:
1. Construct a Markov mixture model with flexible components.
2. Permit skewed predictive distributions while enforcing absence of serial correlation.
3. Parsimonious modeling of Markov mixtures of many components.
Serial correlation and skewed distributions

$$y_t \mid (x_t, s_t = j) \sim N(\beta'x_t + \alpha_j,\ \sigma_j^2)$$

Example:
$$\alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ -0.5 \end{pmatrix}, \qquad P = \begin{pmatrix} p_{11} & p_{12} & 2p_{12} \\ p_{21} & p_{22} & 2p_{22} \\ p_{31} & \tfrac{1}{2}p_{33} & p_{33} \end{pmatrix}$$
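One way to read this example (assuming the constraint pattern $p_{i3} = 2p_{i2}$ displayed above): each row of $P$ gives the disturbance a conditional mean of zero,

$$E(\alpha_{s_t} \mid s_{t-1} = i) = \sum_{j=1}^{3} p_{ij}\,\alpha_j = p_{i2}\cdot 1 + p_{i3}\cdot(-0.5) = 0,$$

with the third row satisfying the pattern through $p_{32} = \tfrac{1}{2}p_{33}$. So $y_t - \beta'x_t$ is uncorrelated with its own past, even though the mixture of the three components can be asymmetric and the marginal distribution skewed.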
Parameterization of the compound Markov normal mixture model

$$s_t = \begin{pmatrix} s_{t1} \\ s_{t2} \end{pmatrix}; \qquad s_{t1} \in \{1, \ldots, m_1\}, \quad s_{t2} \in \{1, \ldots, m_2\}$$

$$P(s_{t1} = j \mid s_{t-1,1} = i, s_{t-2}, s_{t-3}, \ldots) = p_{ij}$$
$$P(s_{t2} = j \mid s_{t1} = i, s_{t-1}, s_{t-2}, \ldots) = r_{ij}$$

$$y_t \mid (x_t, s_{t1} = i, s_{t2} = j) \sim N(\beta'x_t + \phi_i + \psi_{ij},\ \sigma^2\sigma_i^2\sigma_{ij}^2)$$
Parameters:

$$\beta, \quad \phi = \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_{m_1} \end{pmatrix}, \quad \Psi = \begin{pmatrix} \psi_{11} & \cdots & \psi_{1m_2} \\ \vdots & & \vdots \\ \psi_{m_1 1} & \cdots & \psi_{m_1 m_2} \end{pmatrix}, \quad P = \begin{pmatrix} p_{11} & \cdots & p_{1m_1} \\ \vdots & & \vdots \\ p_{m_1 1} & \cdots & p_{m_1 m_1} \end{pmatrix}, \quad R = \begin{pmatrix} r_{11} & \cdots & r_{1m_2} \\ \vdots & & \vdots \\ r_{m_1 1} & \cdots & r_{m_1 m_2} \end{pmatrix},$$

$$\sigma^2, \quad \sigma = \begin{pmatrix} \sigma_1^2 \\ \vdots \\ \sigma_{m_1}^2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11}^2 & \cdots & \sigma_{1m_2}^2 \\ \vdots & & \vdots \\ \sigma_{m_1 1}^2 & \cdots & \sigma_{m_1 m_2}^2 \end{pmatrix}; \qquad \theta = \{\beta, \phi, \Psi, P, R, \sigma^2, \sigma, \Sigma\}$$
Example for the case $m_1 = 3$, $m_2 = 2$:

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}, \qquad R = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ r_{31} & r_{32} \end{pmatrix}$$

is equivalent to a 6-state Markov normal mixture model with the transition matrix

$$\begin{pmatrix}
p_{11}r_{11} & p_{11}r_{12} & p_{12}r_{21} & p_{12}r_{22} & p_{13}r_{31} & p_{13}r_{32} \\
p_{11}r_{11} & p_{11}r_{12} & p_{12}r_{21} & p_{12}r_{22} & p_{13}r_{31} & p_{13}r_{32} \\
p_{21}r_{11} & p_{21}r_{12} & p_{22}r_{21} & p_{22}r_{22} & p_{23}r_{31} & p_{23}r_{32} \\
p_{21}r_{11} & p_{21}r_{12} & p_{22}r_{21} & p_{22}r_{22} & p_{23}r_{31} & p_{23}r_{32} \\
p_{31}r_{11} & p_{31}r_{12} & p_{32}r_{21} & p_{32}r_{22} & p_{33}r_{31} & p_{33}r_{32} \\
p_{31}r_{11} & p_{31}r_{12} & p_{32}r_{21} & p_{32}r_{22} & p_{33}r_{31} & p_{33}r_{32}
\end{pmatrix}$$

(States are ordered $(s_{t1}, s_{t2}) = (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)$; each row depends only on $s_{t1}$.)
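A small NumPy sketch of the expansion, with illustrative values: the cell for a move from state $(i, j)$ to $(i', j')$ is $p_{ii'}r_{i'j'}$, and each row of the expanded matrix sums to one.

```python
import numpy as np

def expand(P, R):
    """Transition matrix of the equivalent (m1*m2)-state Markov mixture:
    entry for (i, j) -> (i2, j2) is P[i, i2] * R[i2, j2]."""
    m1, m2 = R.shape
    T = np.zeros((m1 * m2, m1 * m2))
    for i in range(m1):
        for j in range(m2):
            for i2 in range(m1):
                for j2 in range(m2):
                    T[i * m2 + j, i2 * m2 + j2] = P[i, i2] * R[i2, j2]
    return T

P = np.array([[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
R = np.array([[0.5, 0.5], [0.7, 0.3], [0.4, 0.6]])
T = expand(P, R)
assert np.allclose(T.sum(axis=1), 1.0)   # each row is a probability vector
```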
Restrictions on parameters

Let $\pi$ be the stationary distribution of $P$: $\pi'P = \pi'$. Imposing $\sum_{j=1}^{m_2} r_{ij}\psi_{ij} = 0$ $(i = 1, \ldots, m_1)$ and $\sum_{i=1}^{m_1} \pi_i\phi_i = 0$ gives

$$E(y_t \mid x_t, s_{t1} = i) = \beta'x_t + \phi_i + \sum_{j=1}^{m_2} r_{ij}\psi_{ij} = \beta'x_t + \phi_i,$$

$$E(y_t \mid x_t) = \beta'x_t + \sum_{i=1}^{m_1} \pi_i\phi_i + \sum_{i=1}^{m_1} \pi_i \sum_{j=1}^{m_2} r_{ij}\psi_{ij} = \beta'x_t.$$

The number of identified parameters in the model is $m_1(m_1 + 3m_2 - 3) + k - 1$.
Absence of serial correlation

$$y_t \mid (x_t, s_{t1} = i, s_{t2} = j) \sim N(\beta'x_t + \phi_i + \psi_{ij},\ \sigma^2\sigma_i^2\sigma_{ij}^2)$$
$$P(s_{t1} = j \mid s_{t-1,1} = i, s_{t-2}, s_{t-3}, \ldots) = p_{ij}, \qquad P(s_{t2} = j \mid s_{t1} = i, s_{t-1}, s_{t-2}, \ldots) = r_{ij}$$

Conditional on $x_t$ $(t = 1, \ldots, T)$, the observables $y_t$ $(t = 1, \ldots, T)$ are serially uncorrelated if $\phi = 0$. Suppose further that $P$ is irreducible and aperiodic and its eigenvalues are distinct. Then the observables $y_t$ are serially uncorrelated if and only if $\phi = 0$.
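A brief sketch of why the "if" direction holds, using the restriction $\sum_j r_{ij}\psi_{ij} = 0$ from the previous slide: writing $\varepsilon_t = y_t - \beta'x_t$, the cross terms involving $\psi$ average to zero within every state, leaving

$$\mathrm{Cov}(\varepsilon_t, \varepsilon_{t+s} \mid x) = \sum_{i=1}^{m_1}\sum_{i'=1}^{m_1} \pi_i\,(P^s)_{ii'}\,\phi_i\,\phi_{i'},$$

which vanishes identically when $\phi = 0$; the eigenvalue conditions on $P$ are what make the converse hold.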
Conditionally conjugate prior distributions

$$y_t \mid (x_t, s_{t1} = i, s_{t2} = j) \sim N(\beta'x_t + \phi_i + \psi_{ij},\ \sigma^2\sigma_i^2\sigma_{ij}^2)$$
$$P(s_{t1} = j \mid s_{t-1,1} = i, s_{t-2}, s_{t-3}, \ldots) = p_{ij}, \qquad P(s_{t2} = j \mid s_{t1} = i, s_{t-1}, s_{t-2}, \ldots) = r_{ij}$$

  Gaussian:                                      $\beta$
  Gaussian conditional on $\sigma^2$:            $\phi$
  Gaussian conditional on $\sigma^2$, $\sigma$:  $\Psi$
  Inverse gamma:                                 $\sigma^2$, $\sigma$, $\Sigma$
  Dirichlet:                                     each row of $P$, each row of $R$
Smoothly mixing regression models (SMRs)

Begin with the same normal mixture model:

$$y_t \mid (x_t, v_t, s_t = j) \sim N(\beta'x_t + \alpha_j'v_t,\ \sigma_j^2), \qquad x_t\ (k \times 1),\ v_t\ (p \times 1)$$

Determination of the latent states $s_t$:

$$\widetilde{w}_t\ (m \times 1) = \Gamma z_t + \zeta_t, \qquad z_t\ (q \times 1), \quad \zeta_t \overset{iid}{\sim} N(0, I_m)$$
$$\widetilde{s}_t = j \iff \widetilde{w}_{tj} \geq \widetilde{w}_{ti}\ \ (i = 1, \ldots, m)$$
$$y_t \sim N(\beta'x_t + \alpha_j'v_t,\ \sigma_j^2), \qquad \widetilde{w}_t = \Gamma z_t + \zeta_t, \qquad \widetilde{s}_t = j \iff \widetilde{w}_{tj} \geq \widetilde{w}_{ti}\ \forall i$$

When $q = 1$, then $z_t = 1$ and the probabilities of the mixture components are fixed.

(A) If $k > 1$ and $p = 1$: simple normal mixture model of disturbances in linear regression
(B) If $k = 1$ and $p > 1$: mixture of linear regressions with fixed component probabilities
($k > 1$, $p > 1$: facilitates a hierarchical prior)
When $q > 1$, the probabilities of the mixture components depend on $z_t$:

(C) $k = 1$ and $p = 1$: mixture of fixed normals, with $w_t$-dependent probabilities
(D) $k > 1$ and $p = 1$: normal mixture model of disturbances in linear regression, but with covariate-dependent state probabilities
(E) $k = 1$ and $p > 1$: mixture of linear regressions with $w_t$-dependent probabilities

(A simulation sketch of the state mechanism follows.)
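A minimal sketch of the smooth-mixing mechanism, with an illustrative $\Gamma$: component probabilities vary with $z_t$ because the state is the arg max of Gaussian-perturbed linear indices.

```python
import numpy as np

rng = np.random.default_rng(2)

m, q = 3, 2
Gamma = np.array([[0.0, 0.0],
                  [1.0, -1.0],
                  [-1.0, 1.0]])        # illustrative (m x q) coefficients

def state_frequencies(z, n=10_000):
    """Monte Carlo frequencies of s~_t = argmax_j (Gamma z + zeta)_j."""
    w = Gamma @ z + rng.normal(size=(n, m))   # latent indices w~_t
    return np.bincount(w.argmax(axis=1), minlength=m) / n

print(state_frequencies(np.array([1.0, 0.5])))   # probabilities shift with z_t
print(state_frequencies(np.array([1.0, 2.0])))
```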
Parameterization issues

$$y_t \sim N(\beta'x_t + \alpha_j'v_t,\ \sigma^2\sigma_j^2), \qquad \widetilde{w}_t = \Gamma z_t + \zeta_t, \qquad \widetilde{s}_t = j \iff \widetilde{w}_{tj} \geq \widetilde{w}_{ti}\ \forall i$$

Because $\zeta_t \overset{iid}{\sim} N(0, I_m)$, translation but not scaling issues arise in $\widetilde{w}_t = \Gamma z_t + \zeta_t$. Impose $\iota_m'\Gamma = 0$ through

$$\Gamma = P \begin{pmatrix} 0' \\ \Gamma^* \end{pmatrix}, \quad \text{where } P = P_{m \times m} := \begin{pmatrix} \iota_m m^{-1/2} & P_2 \end{pmatrix}, \quad P'P = I_m.$$

$$\alpha' = (\alpha_1', \ldots, \alpha_m'), \qquad \sigma' = (\sigma_1^2, \ldots, \sigma_m^2)$$
Conditionally conjugate prior distributions

$$y_t \sim N(\beta'x_t + \alpha_j'v_t,\ \sigma^2\sigma_j^2), \qquad \widetilde{w}_t = \Gamma z_t + \zeta_t, \qquad \widetilde{s}_t = j \iff \widetilde{w}_{tj} \geq \widetilde{w}_{ti}\ \forall i$$

  Gaussian:                            $\beta$, $\Gamma^*$
  Gaussian conditional on $\sigma^2$:  $\alpha$
  Inverse gamma:                       $\sigma^2$, $\sigma$
Blocking for Gibbs sampling

$$y_t \sim N(\beta'x_t + \alpha_j'v_t,\ \sigma^2\sigma_j^2), \qquad \widetilde{w}_t = \Gamma z_t + \zeta_t, \qquad \widetilde{s}_t = j \iff \widetilde{w}_{tj} \geq \widetilde{w}_{ti}\ \forall i$$

  $\sigma^2, \sigma_1^2, \ldots, \sigma_m^2$:  separately conditionally independent inverse gamma
  $\beta$ and $\alpha$:                        jointly conditionally Gaussian
  $\mathrm{vec}(\Gamma^*)$:                    conditionally Gaussian
  $\widetilde{w}_t$:                           orthant-truncated Gaussian times orthant-specific likelihood factors

(A sketch of the first of these blocks follows.)
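As an illustration of the first block only: a sketch under textbook conditional conjugacy, with hypothetical prior hyperparameters `a0`, `b0` (not the lecture's exact updating formulas), treating the common factor $\sigma^2$ as fixed. Given states, $\beta$, and $\alpha$, each $\sigma_j^2$ has an inverse-gamma conditional posterior driven by the within-state squared residuals.

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_sigma2_blocks(resid, s, m, a0=2.0, b0=2.0):
    """Draw sigma_j^2, j = 1..m, from their inverse-gamma conditional
    posteriors given residuals resid and state assignments s.
    a0, b0 are hypothetical prior shape/scale hyperparameters."""
    sigma2 = np.empty(m)
    for j in range(m):
        r = resid[s == j]
        shape = a0 + 0.5 * r.size
        scale = b0 + 0.5 * np.sum(r ** 2)
        # inverse-gamma draw via the reciprocal of a gamma draw
        sigma2[j] = 1.0 / rng.gamma(shape, 1.0 / scale)
    return sigma2

# Example: residuals y_t - beta'x_t - alpha_j'v_t, with known states
resid = rng.normal(size=500)
s = rng.integers(0, 2, size=500)
print(draw_sigma2_blocks(resid, s, m=2))
```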
Covariates in applications to date

Substantively distinct covariates $a_t$, $b_t$:

First example: $a_t$ = age of individual $t$, $b_t$ = education of individual $t$

Second example ($a_t$ = return on asset in period $t-1$):
$$a_t = y_{t-1}, \qquad b_t = g\,b_{t-1} + (1 - g)\,|a_{t-1}|^\kappa = (1 - g)\sum_{s=0}^{\infty} g^s\,|y_{t-2-s}|^\kappa$$

Covariates of the form
$$x_{tj} = a_t^{\ell_1} b_t^{\ell_2}, \qquad \ell_1 \in \{0, \ldots, L_1\}, \quad \ell_2 \in \{0, \ldots, L_2\}$$
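A sketch of the covariate construction for the financial example, with illustrative values of $g$ and $\kappa$ and an arbitrary initialization of the recursion:

```python
import numpy as np

def build_covariates(y, g=0.95, kappa=1.0, L1=2, L2=2):
    """a_t = y_{t-1}; b_t = g*b_{t-1} + (1-g)*|a_{t-1}|**kappa;
    regressors are the polynomial interactions a_t**l1 * b_t**l2."""
    T = len(y)
    a = np.empty(T)
    a[1:] = y[:-1]
    a[0] = 0.0                          # initialization choice (illustrative)
    b = np.zeros(T)
    for t in range(1, T):
        b[t] = g * b[t - 1] + (1 - g) * np.abs(a[t - 1]) ** kappa
    X = np.column_stack([a ** l1 * b ** l2
                         for l1 in range(L1 + 1)
                         for l2 in range(L2 + 1)])   # includes the intercept
    return a, b, X

y = np.random.default_rng(4).normal(scale=0.01, size=250)  # fake returns
a, b, X = build_covariates(y)
```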
Functions of interest

CDFs: $P(y_t \leq c \mid a_t, b_t) = P(y_t \leq c \mid x_t, v_t, z_t)$

Quantiles: $c(q) = \{c : P(y_t \leq c \mid a_t, b_t) = P(y_t \leq c \mid x_t, v_t, z_t) = q\}$

Note that
$$P(\widetilde{s}_t = j \mid \Gamma, z_t) = P\left[\widetilde{w}_{tj} \geq \widetilde{w}_{ti}\ (i = 1, \ldots, m) \mid \Gamma, z_t\right]$$
$$= \int p(\widetilde{w}_{tj} = y \mid \Gamma, z_t)\, P\left[\widetilde{w}_{ti} \leq y\ (i \neq j) \mid \Gamma, z_t\right] dy = \int \phi(y - \gamma_j'z_t) \prod_{i \neq j} \Phi(y - \gamma_i'z_t)\, dy.$$

Given $M$ Markov chain Monte Carlo replications of a mixture model with $m$ components, the posterior distribution is a mixture of normals with $M \cdot m$ components.
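A sketch of evaluating this integral by Monte Carlo rather than quadrature: draw $y \sim N(\gamma_j'z_t, 1)$ and average $\prod_{i \neq j}\Phi(y - \gamma_i'z_t)$, which is exactly the expectation form of the integral above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def state_prob(Gamma, z, j, n=100_000):
    """P(s~_t = j | Gamma, z_t) = E[ prod_{i != j} Phi(y - gamma_i'z) ]
    under y ~ N(gamma_j'z, 1), estimated by simple Monte Carlo."""
    mu = Gamma @ z                        # gamma_i'z_t for all i
    y = mu[j] + rng.normal(size=n)
    others = np.delete(np.arange(len(mu)), j)
    return norm.cdf(y[:, None] - mu[others]).prod(axis=1).mean()

Gamma = np.array([[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]])
z = np.array([1.0, 0.5])
probs = [state_prob(Gamma, z, j) for j in range(3)]
print(probs, sum(probs))                  # should sum to approximately 1
```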
Detail of prior distributions for $\beta$, $\alpha$, $\Gamma$

$$a_t \in [a_1, a_2] = A, \qquad b_t \in [b_1, b_2] = B$$

Grid of points
$$G = \left\{(a_i, b_j) : a_i = a_1 + i\Delta_a,\ b_j = b_1 + j\Delta_b\right\},$$
$$(i = 0, \ldots, N_a;\ j = 0, \ldots, N_b;\ \Delta_a = (a_2 - a_1)/(N_a + 1),\ \Delta_b = (b_2 - b_1)/(N_b + 1))$$

Let $x_{ij}$ be the vector corresponding to $(a_i, b_j)$. Then the prior has $(N_a + 1)(N_b + 1)$ independent components,
$$\beta'x_{ij} \sim N\left[\mu,\ \tau^2(N_a + 1)(N_b + 1)\right].$$
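A sketch of one way to read this prior (under the assumption, stated here and not on the slide, that the grid design matrix has full column rank): stacking the grid evaluations as $X_G\beta \sim N(\mu\iota,\ \tau^2(N_a+1)(N_b+1)\,I)$ induces a Gaussian prior on $\beta$ itself with precision $X_G'X_G / [\tau^2(N_a+1)(N_b+1)]$.

```python
import numpy as np

def grid_prior_on_beta(A, B, Na, Nb, mu=0.0, tau2=1.0, L1=2, L2=2):
    """Induced Gaussian prior on beta from independent priors
    beta'x_ij ~ N(mu, tau2*(Na+1)*(Nb+1)) on a grid over A x B.
    Assumes the grid design matrix XG has full column rank."""
    da = (A[1] - A[0]) / (Na + 1)
    db = (B[1] - B[0]) / (Nb + 1)
    a = A[0] + da * np.arange(Na + 1)
    b = B[0] + db * np.arange(Nb + 1)
    aa, bb = np.meshgrid(a, b, indexing="ij")
    XG = np.column_stack([(aa ** l1 * bb ** l2).ravel()
                          for l1 in range(L1 + 1)
                          for l2 in range(L2 + 1)])
    v = tau2 * (Na + 1) * (Nb + 1)
    H = XG.T @ XG / v                           # prior precision of beta
    beta_bar = np.linalg.solve(H, XG.T @ np.full(XG.shape[0], mu) / v)
    return beta_bar, H

beta_bar, H = grid_prior_on_beta(A=(20, 65), B=(8, 20), Na=9, Nb=9)
```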