Bayesian Modeling of Conditional Distributions
John Geweke
University of Iowa
Indiana University Department of Economics
February 27, 2007
Outline
- Motivation
- Model description
- Methods of inference
- Earnings example
- Asset returns example
- Conclusion
Motivation: The general case and an example

(x_t, y_t), x_t (n x 1), i.i.d.;  p(y_t | x_t) = ?

Example: Earnings
- y_t: earnings or wage of individual t
- x_1t: experience or age of individual t
- x_2t: education of individual t
Motivation: A second example, asset returns

y_t = return on asset in period t;  y_t | (y_1, ..., y_{t-1}) ~ ?

The covariates are
- x_1t: return on asset in period t-1, x_1t = y_{t-1}
- x_2t: recent volatility,
  x_2t = g x_{2,t-1} + (1-g)|y_{t-1}|^κ = (1-g) Σ_{s=0}^∞ g^s |y_{t-1-s}|^κ   (g = 0.95, κ = 1)
Model description: The common structure of normal mixture models

- y_t: variable of interest
- x_t (n x 1): vector of covariates
- s̃_t: latent state, s̃_t ∈ {1, ..., m}

y_t | (x_t, s̃_t = j) ~ N(β_j'x_t, σ_j²)

or, reparameterized,

y_t | (x_t, s̃_t = j) ~ N((β + α_j)'x_t, (h·h_j)⁻¹)
Model description: A simple special case, the normal mixture model with no covariates

x_t = 1 (1 x 1)
s̃_t i.i.d., P(s̃_t = j) = p_j
y_t | (x_t = 1, s̃_t = j) ~ N(β_j, σ_j²) = N(β + α_j, (h·h_j)⁻¹)
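The no-covariate special case can be simulated in a few lines. This is an illustrative sketch (not the talk's code): draw the latent state with probabilities p_j, then draw y from the selected normal component; the sample mean should approach Σ_j p_j β_j.

```python
import random

def draw_mixture(p, beta, sigma, rng):
    """Draw y: pick latent state j with prob p[j], then y ~ N(beta[j], sigma[j]^2)."""
    u, acc = rng.random(), 0.0
    for j, pj in enumerate(p):
        acc += pj
        if u <= acc:
            return rng.gauss(beta[j], sigma[j])
    return rng.gauss(beta[-1], sigma[-1])  # guard against rounding in sum(p)

rng = random.Random(1)
p, beta, sigma = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]
ys = [draw_mixture(p, beta, sigma, rng) for _ in range(200000)]
mean = sum(ys) / len(ys)
true_mean = sum(pj * bj for pj, bj in zip(p, beta))  # 0.3*(-2) + 0.7*1 = 0.1
assert abs(mean - true_mean) < 0.02
```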
Model description: What is new in this work

Begin with the same normal mixture model:
y_t | (x_t, s̃_t = j) ~ N(β_j'x_t, σ_j²)

Determination of the latent states s̃_t:
w̃_t = Γx_t + ζ_t;  ζ_t ~ i.i.d. N(0, I_m);  Γ (m x n), x_t (n x 1)
s̃_t = j  ⇔  w̃_tj ≥ w̃_ti ∀ i = 1, ..., m  ⇔  s̃_t = argmax_{i=1,...,m} w̃_ti
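The multinomial-probit gating mechanism above is easy to simulate. The following sketch (illustrative names, not the paper's code) draws latent utilities w̃ = Γx + ζ and takes the argmax; states whose rows of Γ give larger mean utility at a given x are selected more often:

```python
import random

def draw_state(Gamma, x, rng):
    """w_i = Gamma[i]'x + zeta_i, zeta_i ~ N(0,1); latent state = argmax_i w_i."""
    w = [sum(g * xi for g, xi in zip(row, x)) + rng.gauss(0.0, 1.0)
         for row in Gamma]
    return max(range(len(w)), key=w.__getitem__)

rng = random.Random(2)
Gamma = [[0.0, 2.0], [0.0, -2.0], [0.0, 0.0]]  # m = 3 states, x = (1, x2)
# With x2 = 1.5 the mean utilities are (3, -3, 0): state 0 dominates.
draws = [draw_state(Gamma, [1.0, 1.5], rng) for _ in range(20000)]
freq = [draws.count(j) / len(draws) for j in range(3)]
assert freq[0] > freq[1] and freq[0] > freq[2]
```

Because the state depends on x_t through Γx_t, the mixing weights vary smoothly with the covariates, which is the point of the construction.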
Model description: Literature context

- Bayesian nonparametric regression (e.g. Wahba; Shiller; Smith & Kohn)
- Non-Bayesian nonparametric regression (e.g. Härdle & Tsybakov)
- Quantile regression (e.g. Koenker & Bassett; Yu & Jones)
- Dirichlet process priors (e.g. Griffin & Steel; Dunson & Pillai)
- Mixture-of-experts models (e.g. Jacobs & Jordan; Jiang & Tanner)
Model description: Identification issues, multinomial probit

w̃_t = Γx_t + ζ_t,  j = argmax_{i=1,...,m} w̃_ti,  y_t ~ N((β + α_j)'x_t, (h·h_j)⁻¹)

Because ζ_t ~ i.i.d. N(0, I_m), translation but not scaling issues arise in w̃_t = Γx_t + ζ_t.

Impose ι_m'Γ = 0 through
Γ (m x n) = P [ 0_n' ; Γ* ],  Γ* ((m-1) x n),
where P = [m^{-1/2} ι_m, P₂] (m x m), P₂ (m x (m-1)), P'P = I_m.
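The translation restriction ι_m'Γ = 0 can be imposed constructively. This sketch (illustrative, with a hypothetical helper name) builds an orthonormal P whose first column is ι_m/√m via a QR decomposition, stacks a zero row on top of the free (m-1) x n block Γ*, and checks that the column sums of the resulting Γ vanish:

```python
import numpy as np

def constrained_gamma(gamma_star):
    """Build Gamma = P [0'; Gamma*] with P = [m^{-1/2} iota_m, P2], P'P = I_m,
    so that iota_m' Gamma = 0 by construction."""
    m = gamma_star.shape[0] + 1
    # Orthonormal basis whose first column spans iota_m: QR of [iota_m, e_1..e_{m-1}].
    A = np.column_stack([np.ones(m), np.eye(m)[:, : m - 1]])
    P, _ = np.linalg.qr(A)
    return P @ np.vstack([np.zeros((1, gamma_star.shape[1])), gamma_star])

rng = np.random.default_rng(0)
G_star = rng.standard_normal((4, 3))   # (m-1) x n free parameters, m = 5, n = 3
Gamma = constrained_gamma(G_star)
assert Gamma.shape == (5, 3)
assert np.allclose(np.ones(5) @ Gamma, 0.0)
```

Since the remaining m-1 columns of P are orthogonal to ι_m, the constraint holds for any Γ*, so the sampler can work with the unconstrained block.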
Model description: Identification issues, labeling

w̃_t = Γx_t + ζ_t,  j = argmax_{i=1,...,m} w̃_ti,  y_t ~ N((β + α_j)'x_t, (h·h_j)⁻¹)

States have no substantive interpretation and are exchangeable, with implications for the prior distribution: m! state permutations ⇒ m! reflections of the posterior distribution.

Not to worry: p(y_t | x_t) is invariant to permutations of the states (Geweke, 2006, CSDA, forthcoming).
Model description: Conditionally conjugate prior distributions

w̃_t = Γx_t + ζ_t,  j = argmax_{i=1,...,m} w̃_ti,  y_t ~ N((β + α_j)'x_t, (h·h_j)⁻¹)

Distribution type            | Parameters              | Hyperparameters
Gaussian                     | β, Γ                    | μ, τ_β², τ_γ²
Gaussian conditional on h    | α_j (j = 1, ..., m)     | τ_α²
Inverse gamma                | h; h_j (j = 1, ..., m)  | s², ν; ν
Model description: Some theory, conditions

1. Distribution of x:
   (1) x ∈ Ω ⊆ R^n, Ω compact.
   (2) p(x) > 0 is continuous w.r.t. Lebesgue measure.

2. Conditional distribution of y:
   (1) For some function h(x), p(y|x) is of the exponential form
       p(y|x) = exp{a[h(x)]·y + b[h(x)] + c(y)}  ∀ x ∈ Ω, y ∈ A;
       a(·) and b(·) are analytic, a'(·) ≠ 0, b'(·) ≠ 0.
   (2) h lies in a ball of finite values in a Sobolev space with sup-norm and second-order differentiability:
       sup_{x∈Ω} |h(x)| = c₀ < ∞
       sup_{i=1,...,n} sup_{x∈Ω} |∂h(x)/∂x_i| = c₁ < ∞
       sup_{i=1,...,n} sup_{j=1,...,n} sup_{x∈Ω} |∂²h(x)/∂x_i∂x_j| = c₂ < ∞
Model description: Some theory, a definition

Definition. For any two pdfs f(·) and g(·) w.r.t. Lebesgue measure defined on Z, the Kullback-Leibler directed distance from f to g is
KL(f, g) = ∫_Z f(z) log[f(z)/g(z)] dz.

Remark. In a model p(y_t | θ_A, A) (t = 1, ..., T), let T → ∞. The posterior density of θ_A will collapse about θ_A = θ_A*, where
θ_A* = argmin_{θ_A} KL[p(y), p(y | θ_A, A)]
(Geweke, 2005, Theorem 3.4.2).
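The definition is straightforward to compute. As an illustration (not from the talk), the sketch below evaluates KL(f, g) for two normals both by the closed form and by direct numerical integration of the definition, and confirms they agree:

```python
from statistics import NormalDist
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ), closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

def kl_numeric(f, g, lo, hi, n=200000):
    """Midpoint Riemann sum of  integral f(z) log[f(z)/g(z)] dz  over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        fz = f.pdf(z)
        total += fz * math.log(fz / g.pdf(z)) * h
    return total

f, g = NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)
# Closed form: log 2 + (1 + 1)/8 - 1/2 ~= 0.4431
assert abs(kl_numeric(f, g, -12.0, 12.0) - kl_normal(0.0, 1.0, 1.0, 2.0)) < 1e-6
```

Note the asymmetry of the definition: KL(f, g) and KL(g, f) generally differ, which is why the slide calls it a directed distance.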
Model description: Some theory, a result

Theorem. For all x ∈ Ω,
sup_{p(y|x)} inf_{p(y|x,A_m)} KL[p(y|x), p(y|x, A_m)] < c·m^{-4/n},
where c depends on p(x), c₀, c₁ and c₂, but does not depend on m or n.

Remarks
- The result is a consequence of Jiang and Tanner (1999), Theorem 2, for the class of gating functions defined by the multinomial probit model with scalar variance matrix.
- Condition 1 of Jiang and Tanner (1999) for gating functions is important but not given here. They conjecture that it holds for multinomial logit gating functions. The proof for multinomial probit gating functions is not difficult.
Methods of inference: Blocking for Gibbs sampling

w̃_t = Γx_t + ζ_t,  j = argmax_{i=1,...,m} w̃_ti,  y_t ~ N((β + α_j)'x_t, (h·h_j)⁻¹)

Block           | Conditional posterior
h, h₁, ..., h_m | separately conditionally independent gamma
β, α₁, ..., α_m | jointly conditionally Gaussian
vec(Γ)          | conditionally Gaussian
w̃_t            | Gaussian times orthant-specific likelihood factors

Conditional posteriors and code have been tested (Geweke 2004).
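The w̃_t block above is Gaussian restricted to the orthant consistent with the observed argmax, so its Gibbs updates reduce to one-dimensional truncated normal draws. The sketch below (illustrative, stdlib-only; the function name is hypothetical) shows the core primitive, inverse-CDF sampling from a truncated normal:

```python
from statistics import NormalDist
import random

def trunc_normal(mean, sd, lo=None, hi=None, rng=random):
    """Draw from N(mean, sd^2) truncated to [lo, hi] via inverse-CDF sampling."""
    nd = NormalDist(mean, sd)
    a = nd.cdf(lo) if lo is not None else 0.0
    b = nd.cdf(hi) if hi is not None else 1.0
    return nd.inv_cdf(a + rng.random() * (b - a))

rng = random.Random(3)
draws = [trunc_normal(0.0, 1.0, lo=1.0, rng=rng) for _ in range(5000)]
assert min(draws) >= 1.0
# E[z | z > 1] = phi(1) / (1 - Phi(1)) ~= 1.525 for a standard normal
mean = sum(draws) / len(draws)
assert abs(mean - 1.525) < 0.05
```

In the full sampler, w̃_tj for the selected state j is drawn truncated below at max_{i≠j} w̃_ti, and each w̃_ti (i ≠ j) truncated above at w̃_tj.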
Methods of inference: Quantile functions of interest

Conditional CDFs: P(y_t ≤ c | x_t)
Quantiles: c(q) = {c : P(y_t ≤ c | x_t) = q}

Note
P(s̃_t = j | Γ, x_t) = P[w̃_tj ≥ w̃_ti (i = 1, ..., m) | Γ, x_t]
  = ∫ p(w̃_tj = w | Γ, x_t) · P[w̃_ti ≤ w (i = 1, ..., m) | Γ, x_t] dw
  = ∫ φ(w - γ_j'x_t) Π_{i≠j} Φ(w - γ_i'x_t) dw.

Given M Markov chain Monte Carlo replications of a mixture model with m components, the posterior distribution is a mixture of normals with M·m components.
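The one-dimensional integral for P(s̃_t = j | Γ, x_t) is easy to evaluate by quadrature. This sketch (illustrative, not the paper's code; the γ_i'x_t values are made up) computes it with a midpoint rule and checks that the state probabilities sum to one and are ordered by mean utility:

```python
from statistics import NormalDist
import math

STD = NormalDist()  # standard normal: .pdf = phi, .cdf = Phi

def state_prob(j, means, lo=-10.0, hi=10.0, n=4000):
    """P(state = j) = integral phi(w - mu_j) * prod_{i != j} Phi(w - mu_i) dw,
    where mu_i = gamma_i'x. Midpoint-rule quadrature over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        w = lo + (k + 0.5) * h
        val = STD.pdf(w - means[j])
        for i, mu in enumerate(means):
            if i != j:
                val *= STD.cdf(w - mu)
        total += val * h
    return total

means = [1.0, 0.0, -0.5]            # hypothetical gamma_i'x values, m = 3
probs = [state_prob(j, means) for j in range(3)]
assert abs(sum(probs) - 1.0) < 1e-6
assert probs[0] > probs[1] > probs[2]  # higher mean utility => more probable
```

Averaging these probabilities and the component normals over the M posterior draws yields the M·m-component predictive mixture quoted on the slide.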
Earnings example: Data

- 2698 men from the PSID, interviewed in 1994; data pertain to 1993
- 25 ≤ age ≤ 65
- Not black
- Earnings exceed $1,000
Earnings example: Prior distribution hyperparameters

Gaussian priors:
- β: μ = 10, τ_β² = 1
- α: μ = 0, τ_α² = 4
- Γ: μ = 0, τ_γ² = 16

Inverse gamma priors:
- 2h ~ χ²(2)
- 2h_j ~ χ²(2)

The information contribution from the prior relative to the data is minute.
Earnings example: Comparison of alternative model specifications, modified cross-validated log scores

- Randomly permute all 2698 observations (just to be safe; works for time series too)
- Select the first T₁ = 2153 for inference (full MCMC, which gives θ^(m), m = 1, ..., M = 10⁴)
- Modified cross-validated log scoring rule (Draper and Krnjajic, 2005):
  Σ_{t=T₁+1}^T log[p(y_t | x_t, Y_{T₁}, SMR)],
  p(y_t | x_t, Y_{T₁}, SMR) ≈ M⁻¹ Σ_{m=1}^M p(y_t | θ^(m), x_t, SMR)
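The scoring rule above averages the predictive density over posterior draws before taking logs. A minimal sketch (illustrative code, with fabricated draws rather than real MCMC output; each θ^(m) is reduced to a normal mean and standard deviation) shows that draws concentrated near the data-generating parameters earn a higher log score:

```python
import math
import random
from statistics import NormalDist

def log_score(holdout, theta_draws):
    """Sum over held-out y_t of log[ M^{-1} * sum_m p(y_t | theta^(m)) ],
    where each theta^(m) = (mu, sd) indexes a normal predictive density."""
    total = 0.0
    for y in holdout:
        dens = sum(NormalDist(mu, sd).pdf(y) for mu, sd in theta_draws)
        total += math.log(dens / len(theta_draws))
    return total

rng = random.Random(4)
holdout = [rng.gauss(0.0, 1.0) for _ in range(200)]    # "held-out" N(0,1) data
good = [(rng.gauss(0.0, 0.05), 1.0) for _ in range(100)]  # draws near the truth
bad = [(rng.gauss(3.0, 0.05), 1.0) for _ in range(100)]   # draws near a wrong mean
assert log_score(holdout, good) > log_score(holdout, bad)
```

Averaging densities (not log densities) across draws is what makes this a proper predictive score.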
Earnings example: Earnings quantiles, model with m = 8 mixture components [figure]
Earnings example: Median of conditional posterior earnings distribution [figure]
Asset Returns Example: The data

- Standard and Poor's 500 daily open (O_t) and close (C_t), 1990-1999 (C_{t-1} = O_t)
- Dependent variable: percent log return, y_t = 100·log(C_t/O_t)
- Covariates:
  x_1t = y_{t-1}
  x_2t = g x_{2,t-1} + (1-g)|y_{t-1}|^κ = (1-g) Σ_{s=0}^∞ g^s |y_{t-1-s}|^κ
- The evidence favors g = 0.95 and κ = 1.
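Constructing the dependent variable is a one-liner. This sketch (illustrative, with made-up prices; the function name is hypothetical) computes y_t = 100·log(C_t/O_t) and uses the slide's convention that each open equals the previous close, under which the returns telescope:

```python
import math

def pct_log_returns(opens, closes):
    """y_t = 100 * log(C_t / O_t): percent log return, open to close."""
    return [100.0 * math.log(c / o) for o, c in zip(opens, closes)]

closes = [100.0, 101.0, 99.5, 100.5]       # made-up closing prices
opens = [99.0] + closes[:-1]               # slide's convention: O_t = C_{t-1}
ys = pct_log_returns(opens, closes)
assert len(ys) == len(closes)
assert abs(ys[1] - 100.0 * math.log(101.0 / 100.0)) < 1e-12
# With O_t = C_{t-1}, the returns telescope to the full-period log return:
assert abs(sum(ys) - 100.0 * math.log(closes[-1] / opens[0])) < 1e-12
```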
Asset Returns Example: The model, same as in Geweke and Keane (2006)

w̃_t = Γx_t + ζ_t,  j = argmax_{i=1,...,m} w̃_ti,  y_t ~ N((β + α_j)'x_t, (h·h_j)⁻¹)

In Geweke and Keane (2006), linear functions of x_t are replaced by polynomial functions of x_t. Based on the evidence (modified cross-validated log scores):
- Γx_t becomes second-order polynomials in x_t
- (β + α_j)'x_t becomes zero-order polynomials in x_t (intercepts only)
- Mixture of m = 3 components

Work with linear functions of x_t is currently proceeding.
Asset Returns Example: Comparison with other models

Model comparison: predictive likelihoods (Bayesian) and recursive maximum likelihood (non-Bayesian)
Sample: 1990-1994; prediction: 1995-1999

Model                 | Log score
Normal i.i.d.         | -1848.5
GARCH(1,1)            | -1660.5
EGARCH(1,1)           | -1637.5
t-GARCH(1,1)          | -1625.5 (log predictive likelihood); -1624.7 (log recursive ML)
Stochastic volatility | -1625.4
SMR                   | -1622.0

Dynamic predictive properties of t-GARCH and stochastic volatility are similar; those of SMR are quite different from both.
Asset Returns Example: Posterior means of population moments [figure]
Asset Returns Example: Posterior quantiles [figure]
Conclusions: Summary

The smoothly mixing regressions (SMR) model
- is easy to apply
- is competitive with other models
- produces interesting results
- provides fully Bayesian answers to questions of interest
Conclusions: Current research

- Convergence of SMR
- Calibration properties of SMR
- Contrast with parsimoniously parameterized models in time series applications
- Extensions into difficult territory:
  - More covariates: x_t (n x 1)
  - Multivariate models: p(y_t | x_t), y_t (ℓ x 1)