Modeling conditional distributions with mixture models: Theory and Inference

John Geweke, University of Iowa, USA
Journal of Applied Econometrics Invited Lecture
Università di Venezia, Italia
June 2, 2005

Motivation 1: Conditional distributions

$(w_t, y_t)$ i.i.d., with $w_t$ an $r \times 1$ vector of covariates: $p(y_t \mid w_t) = \,?$

Example: $w_{t1}$ = experience or age of individual $t$; $w_{t2}$ = education of individual $t$; $y_t$ = earnings or wage of individual $t$.

[Figure: Quantiles of the posterior distribution of log earnings conditional on age and education]

Motivation 2: Financial forecasting and decision making

$y_t$ = return on asset in period $t$: $p(y_t \mid y_1, \ldots, y_{t-1}) = \,?$

Maximized log-likelihood values, S&P 500 daily returns, 1990-1999

    Model                        Maximized log-likelihood
    i.i.d. $N(\mu, \sigma^2)$              -3285.7
    GARCH(1,1)                             -3028.0
    EGARCH(1,1)                            -2998.3
    t-GARCH(1,1)                           -2959.0
    CMNMM                                > -2967.5
    SMR                                  > -2793.9

(CMNMM: compound Markov normal mixture model; SMR: smoothly mixing regression. The entries marked ">" are lower bounds rather than maximized values.)

[Figure: Posterior quantiles of the S&P 500 predictive distribution, 1990-1999]

Common structure of the models

$y_t$: variable of interest
$x_t$ ($k \times 1$) and $v_t$ ($p \times 1$): (possibly overlapping) sets of covariates
$s_t$: latent state, $s_t \in \{1, \ldots, m\}$

$$y_t \mid (x_t, v_t, s_t = j) \sim N\left(\beta' x_t + \alpha_j' v_t, \, \sigma_j^2\right)$$

Simple normal mixture model

$x_t$ is $k \times 1$; $v_t = 1$ ($p = 1$)
$s_t$ independent of $x_t$; $s_t$ i.i.d. with $P(s_t = j) = p_j$

$$y_t \mid (x_t, s_t = j) \sim N\left(\beta' x_t + \alpha_j, \, \sigma_j^2\right)$$

Equivalently, $y_t = \beta' x_t + \varepsilon_t$ with $\varepsilon_t \sim N(\alpha_j, \sigma_j^2)$.

Markov normal mixture model

$x_t$ is $k \times 1$; $v_t = 1$
$s_t$ independent of $x_t$

$$y_t \mid (x_t, s_t = j) \sim N\left(\beta' x_t + \alpha_j, \, \sigma_j^2\right)$$
$$P(s_t = j \mid s_{t-1} = i, s_{t-2}, s_{t-3}, \ldots) = p_{ij}$$

Parameters:
$$\beta, \quad \alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix}, \quad P = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & & \vdots \\ p_{m1} & \cdots & p_{mm} \end{pmatrix}, \quad \sigma = \begin{pmatrix} \sigma_1^2 \\ \vdots \\ \sigma_m^2 \end{pmatrix}; \qquad \theta = \{\beta, P, \alpha, \sigma\}$$

Inference and forecasting in the Markov normal mixture model

1. A Markov chain Monte Carlo algorithm provides draws $\{\theta^{(m)}\}$ from $p(\theta \mid y_1, \ldots, y_T)$.
2. Filtered probabilities $P(s_T = i \mid \theta, y_1, \ldots, y_T)$ and draws from $(s_1, \ldots, s_T) \mid (\theta, y_1, \ldots, y_T)$ are given by the algorithm of Chib (1995).
3. Then $p(y_{T+1} \mid \theta, y_1, \ldots, y_T, s_1, \ldots, s_T) = p(y_{T+1} \mid \theta, s_T = i)$ is a simple mixture-of-normals distribution with state probabilities $p_{i1}, \ldots, p_{im}$.
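
To make step 2 concrete, here is a minimal sketch of the standard forward filter that produces the filtered state probabilities; the function name, argument layout, and stationary-distribution initialization are assumptions of this note, not taken from the lecture (which cites Chib 1995; a backward pass through these filtered probabilities yields the draws of $(s_1, \ldots, s_T)$ used in step 3).

```python
# Forward filter for P(s_t = i | theta, y_1..y_t) in the Markov normal mixture
# y_t | (x_t, s_t = j) ~ N(beta'x_t + alpha_j, sigma2_j). Sketch only.
import numpy as np
from scipy.stats import norm

def filtered_probs(y, X, beta, alpha, sigma2, P):
    """y: (T,); X: (T,k); alpha, sigma2: (m,); P: (m,m) transition matrix."""
    T, m = len(y), len(alpha)
    evals, evecs = np.linalg.eig(P.T)              # stationary pi solves pi'P = pi'
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    prob = pi / pi.sum()
    out = np.empty((T, m))
    for t in range(T):
        pred = prob @ P                            # P(s_t = j | y_1..y_{t-1})
        lik = norm.pdf(y[t], loc=X[t] @ beta + alpha, scale=np.sqrt(sigma2))
        prob = pred * lik
        prob /= prob.sum()                         # P(s_t = j | y_1..y_t)
        out[t] = prob
    return out
```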

Compound Markov normal mixture model

Objectives:
1. Construct a Markov mixture model with flexible components.
2. Permit skewed predictive distributions while enforcing absence of serial correlation.
3. Model Markov mixtures with many components parsimoniously.

Serial correlation and skewed distributions

$$y_t \mid (x_t, s_t = j) \sim N\left(\beta' x_t + \alpha_j, \, \sigma_j^2\right)$$

$$\alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ -0.5 \end{pmatrix}, \qquad P = \begin{pmatrix} p_{11} & p_{12} & 2 p_{12} \\ p_{21} & p_{22} & 2 p_{22} \\ p_{31} & \frac{1}{2} p_{33} & p_{33} \end{pmatrix}$$

In every row the probability of state 3 is twice that of state 2, and $\alpha_2 + 2 \alpha_3 = 0$, so $E(\alpha_{s_t} \mid s_{t-1} = i) = 0$ for all $i$: the mixture distribution is skewed, yet the $y_t$ are serially uncorrelated.

Parameterization of the compound Markov normal mixture model

$$s_t = \begin{pmatrix} s_{t1} \\ s_{t2} \end{pmatrix}; \quad s_{t1} \in \{1, \ldots, m_1\}, \quad s_{t2} \in \{1, \ldots, m_2\}$$
$$P\left(s_{t1} = j \mid s_{t-1,1} = i, s_{t-2}, s_{t-3}, \ldots\right) = p_{ij}$$
$$P\left(s_{t2} = j \mid s_{t1} = i, s_{t-1}, s_{t-2}, \ldots\right) = r_{ij}$$
$$y_t \mid (x_t, s_{t1} = i, s_{t2} = j) \sim N\left(\beta' x_t + \phi_i + \psi_{ij}, \, \sigma^2 \sigma_i^2 \sigma_{ij}^2\right)$$

$$\beta, \quad \phi = \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_{m_1} \end{pmatrix}, \quad \Psi = \begin{pmatrix} \psi_{11} & \cdots & \psi_{1 m_2} \\ \vdots & & \vdots \\ \psi_{m_1 1} & \cdots & \psi_{m_1 m_2} \end{pmatrix}, \quad P = \begin{pmatrix} p_{11} & \cdots & p_{1 m_1} \\ \vdots & & \vdots \\ p_{m_1 1} & \cdots & p_{m_1 m_1} \end{pmatrix}, \quad R = \begin{pmatrix} r_{11} & \cdots & r_{1 m_2} \\ \vdots & & \vdots \\ r_{m_1 1} & \cdots & r_{m_1 m_2} \end{pmatrix},$$

$$\sigma^2, \quad \sigma = \begin{pmatrix} \sigma_1^2 \\ \vdots \\ \sigma_{m_1}^2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11}^2 & \cdots & \sigma_{1 m_2}^2 \\ \vdots & & \vdots \\ \sigma_{m_1 1}^2 & \cdots & \sigma_{m_1 m_2}^2 \end{pmatrix}; \qquad \theta = \left\{\beta, \phi, \Psi, P, R, \sigma^2, \sigma, \Sigma\right\}$$

Example for the case $m_1 = 3$, $m_2 = 2$

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}, \qquad R = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ r_{31} & r_{32} \end{pmatrix}$$

is equivalent to a 6-state Markov normal mixture model with the transition matrix

$$\begin{pmatrix}
p_{11} r_{11} & p_{11} r_{12} & p_{12} r_{21} & p_{12} r_{22} & p_{13} r_{31} & p_{13} r_{32} \\
p_{11} r_{11} & p_{11} r_{12} & p_{12} r_{21} & p_{12} r_{22} & p_{13} r_{31} & p_{13} r_{32} \\
p_{21} r_{11} & p_{21} r_{12} & p_{22} r_{21} & p_{22} r_{22} & p_{23} r_{31} & p_{23} r_{32} \\
p_{21} r_{11} & p_{21} r_{12} & p_{22} r_{21} & p_{22} r_{22} & p_{23} r_{31} & p_{23} r_{32} \\
p_{31} r_{11} & p_{31} r_{12} & p_{32} r_{21} & p_{32} r_{22} & p_{33} r_{31} & p_{33} r_{32} \\
p_{31} r_{11} & p_{31} r_{12} & p_{32} r_{21} & p_{32} r_{22} & p_{33} r_{31} & p_{33} r_{32}
\end{pmatrix}$$
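
A short numeric check of this equivalence, with hypothetical values for $P$ and $R$ (the slide leaves them symbolic):

```python
import numpy as np

P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])          # hypothetical 3x3, rows sum to 1
R = np.array([[0.5, 0.5],
              [0.7, 0.3],
              [0.4, 0.6]])               # hypothetical 3x2, rows sum to 1
m1, m2 = R.shape

# Composite transition: Q[(i,j), (i',j')] = p_{ii'} r_{i'j'}.
Q = np.zeros((m1 * m2, m1 * m2))
for i in range(m1):
    for j in range(m2):                   # current composite state (i, j)
        for ip in range(m1):
            for jp in range(m2):          # next composite state (i', j')
                Q[i * m2 + j, ip * m2 + jp] = P[i, ip] * R[ip, jp]

assert np.allclose(Q.sum(axis=1), 1.0)   # a valid 6-state transition matrix
# Rows for (i, 1) and (i, 2) coincide because the next state does not depend
# on the current s_{t2} -- hence the repeated rows in the 6x6 matrix above.
```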

Restrictions on parameters

Let $\pi$ be the stationary distribution of $P$: $\pi' P = \pi'$.

$$E(y_t \mid x_t, s_{t1} = i) = \beta' x_t + \phi_i + \sum_{j=1}^{m_2} r_{ij} \psi_{ij} = \beta' x_t + \phi_i,$$

so the restriction $\sum_{j=1}^{m_2} r_{ij} \psi_{ij} = 0$ is imposed for each $i$;

$$E(y_t \mid x_t) = \beta' x_t + \sum_{i=1}^{m_1} \pi_i \phi_i + \sum_{i=1}^{m_1} \pi_i \sum_{j=1}^{m_2} r_{ij} \psi_{ij} = \beta' x_t,$$

so $\sum_{i=1}^{m_1} \pi_i \phi_i = 0$ is imposed as well.

The number of identified parameters in the model is $m_1 (m_1 + 3 m_2 - 3) + k - 1$.

Absence of serial correlation

$$y_t \mid (x_t, s_{t1} = i, s_{t2} = j) \sim N\left(\beta' x_t + \phi_i + \psi_{ij}, \, \sigma^2 \sigma_i^2 \sigma_{ij}^2\right)$$
$$P\left(s_{t1} = j \mid s_{t-1,1} = i, s_{t-2}, s_{t-3}, \ldots\right) = p_{ij}$$
$$P\left(s_{t2} = j \mid s_{t1} = i, s_{t-1}, s_{t-2}, \ldots\right) = r_{ij}$$

Conditional on $x_t$ $(t = 1, \ldots, T)$, the observables $y_t$ $(t = 1, \ldots, T)$ are serially uncorrelated if $\phi = 0$. Suppose further that $P$ is irreducible and aperiodic and its eigenvalues are distinct. Then the observables $y_t$ are serially uncorrelated if and only if $\phi = 0$.

Conditionally conjugate prior distributions

$$y_t \mid (x_t, s_{t1} = i, s_{t2} = j) \sim N\left(\beta' x_t + \phi_i + \psi_{ij}, \, \sigma^2 \sigma_i^2 \sigma_{ij}^2\right)$$
$$P\left(s_{t1} = j \mid s_{t-1,1} = i, s_{t-2}, s_{t-3}, \ldots\right) = p_{ij}$$
$$P\left(s_{t2} = j \mid s_{t1} = i, s_{t-1}, s_{t-2}, \ldots\right) = r_{ij}$$

- $\beta$: Gaussian
- $\phi$: Gaussian conditional on $\sigma^2$
- $\Psi$: Gaussian conditional on $\sigma^2$, $\sigma$
- $\sigma^2$, $\sigma$, $\Sigma$: inverse gamma
- Each row of $P$, each row of $R$: Dirichlet

Smoothly mixing regression models (SMRs)

Begin with the same normal mixture model, with $x_t$ $k \times 1$ and $v_t$ $p \times 1$:
$$y_t \mid (x_t, v_t, s_t = j) \sim N\left(\beta' x_t + \alpha_j' v_t, \, \sigma_j^2\right)$$

Determination of latent states $s_t$, with $z_t$ $q \times 1$ and $\widetilde{w}_t$ $m \times 1$:
$$\widetilde{w}_t = \Gamma z_t + \zeta_t; \quad \zeta_t \overset{iid}{\sim} N(0, I_m)$$
$$\widetilde{s}_t = j \ \text{iff} \ \widetilde{w}_{tj} \geq \widetilde{w}_{ti}, \ i = 1, \ldots, m$$
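
A minimal simulation sketch of this latent-state mechanism; the sizes and the value of $\Gamma$ are hypothetical, chosen only for illustration.

```python
# Draw w_t = Gamma z_t + zeta_t and assign s_t by the argmax rule.
import numpy as np

rng = np.random.default_rng(0)
m, q, T = 3, 2, 1000
Gamma = rng.normal(size=(m, q))                          # hypothetical Gamma
Z = np.column_stack([np.ones(T), rng.normal(size=T)])    # z_t: intercept + one covariate

W = Z @ Gamma.T + rng.normal(size=(T, m))   # zeta_t ~ N(0, I_m)
s = W.argmax(axis=1)                        # s_t = j iff w_tj >= w_ti for all i
print(np.bincount(s, minlength=m) / T)      # state frequencies, which vary with z_t
```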

$$y_t \sim N\left(\beta' x_t + \alpha_j' v_t, \, \sigma_j^2\right), \quad \widetilde{w}_t = \Gamma z_t + \zeta_t, \quad \widetilde{s}_t = j \ \text{iff} \ \widetilde{w}_{tj} \geq \widetilde{w}_{ti} \ \forall i$$

When $q = 1$, then $z_t = 1$ and the probabilities of the mixture components are fixed.

(A) If $k > 1$ and $p = 1$: simple normal mixture model of disturbances in linear regression.
(B) If $k = 1$ and $p > 1$: mixture of linear regressions with fixed component probabilities.
($k > 1$, $p > 1$ facilitates a hierarchical prior.)

When $q > 1$, the probabilities of the mixture components depend on $z_t$.

(C) $k = 1$ and $p = 1$: mixture of fixed normals, with $w_t$-dependent probabilities.
(D) $k > 1$ and $p = 1$: normal mixture model of disturbances in linear regression, but with covariate-dependent state probabilities.
(E) $k = 1$ and $p > 1$: mixture of linear regressions with $w_t$-dependent probabilities.

Parameterization issues

$$y_t \sim N\left(\beta' x_t + \alpha_j' v_t, \, \sigma^2 \sigma_j^2\right), \quad \widetilde{w}_t = \Gamma z_t + \zeta_t, \quad \widetilde{s}_t = j \ \text{iff} \ \widetilde{w}_{tj} \geq \widetilde{w}_{ti} \ \forall i$$

Because $\zeta_t \overset{iid}{\sim} N(0, I_m)$, translation but not scaling issues arise in $\widetilde{w}_t = \Gamma z_t + \zeta_t$. Impose $\iota_m' \Gamma = 0$ through
$$\Gamma = P \begin{bmatrix} 0' \\ \Gamma^* \end{bmatrix}, \quad \text{where} \quad P = P_{m \times m} := \begin{bmatrix} \iota_m m^{-1/2} & P_2 \end{bmatrix}, \quad P' P = I_m.$$

$$\alpha' = \left(\alpha_1', \ldots, \alpha_m'\right), \quad \sigma' = \left(\sigma_1^2, \ldots, \sigma_m^2\right)$$
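
One way to construct such a $P$ is sketched below; the QR-based construction is an assumption of this note (any orthonormal completion of $\iota_m m^{-1/2}$ works), and the resulting $\Gamma$ satisfies $\iota_m' \Gamma = 0$ by construction.

```python
import numpy as np

def centering_basis(m):
    """Orthogonal P = [iota_m/sqrt(m), P_2]; P_2 spans the complement of iota_m."""
    M = np.eye(m)
    M[:, 0] = 1.0 / np.sqrt(m)           # start Gram-Schmidt from iota_m/sqrt(m)
    Q, _ = np.linalg.qr(M)
    return -Q if Q[0, 0] < 0 else Q      # fix the sign of the first column

m, q = 3, 2
P = centering_basis(m)                   # P'P = I_m
Gamma_star = np.ones((m - 1, q))         # hypothetical free parameters
Gamma = P @ np.vstack([np.zeros(q), Gamma_star])
assert np.allclose(Gamma.sum(axis=0), 0.0)   # iota' Gamma = 0
```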

Conditionally conjugate prior distributions

$$y_t \sim N\left(\beta' x_t + \alpha_j' v_t, \, \sigma^2 \sigma_j^2\right), \quad \widetilde{w}_t = \Gamma z_t + \zeta_t, \quad \widetilde{s}_t = j \ \text{iff} \ \widetilde{w}_{tj} \geq \widetilde{w}_{ti} \ \forall i$$

- $\beta$, $\Gamma^*$: Gaussian
- $\alpha$: Gaussian conditional on $\sigma^2$
- $\sigma^2$, $\sigma$: inverse gamma

Blocking for Gibbs sampling

$$y_t \sim N\left(\beta' x_t + \alpha_j' v_t, \, \sigma^2 \sigma_j^2\right), \quad \widetilde{w}_t = \Gamma z_t + \zeta_t, \quad \widetilde{s}_t = j \ \text{iff} \ \widetilde{w}_{tj} \geq \widetilde{w}_{ti} \ \forall i$$

- $\sigma^2, \sigma_1^2, \ldots, \sigma_m^2$: separately conditionally independent inverse gamma
- $\beta$ and $\alpha$: jointly conditionally Gaussian
- $\mathrm{vec}(\Gamma^*)$: conditionally Gaussian
- $\widetilde{w}_t$: orthant-truncated Gaussian times orthant-specific likelihood factors
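
For flavor, a minimal sketch of the first block: the conditionally conjugate inverse-gamma draw for one state-specific variance given the residuals of the observations currently assigned to that state. The prior hyperparameters a0, b0 and the function name are assumptions of this note, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_state_variance(resid_j, a0=2.0, b0=1.0):
    """sigma2_j | rest ~ IG(a0 + n_j/2, b0 + SSR_j/2) under an assumed IG(a0, b0) prior."""
    a = a0 + 0.5 * resid_j.size
    b = b0 + 0.5 * np.sum(resid_j ** 2)
    return b / rng.gamma(a)   # b / Gamma(a, 1) is an Inverse-Gamma(a, b) draw
```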

Covariates in applications to date

Substantively distinct covariates: $a_t$, $b_t$.

First example: $a_t$ = age of individual $t$, $b_t$ = education of individual $t$.

Second example:
$$a_t = y_{t-1} \ (\text{return on asset in period } t-1), \qquad b_t = g \, b_{t-1} + (1 - g) \, |a_{t-1}|^{\kappa} = (1 - g) \sum_{s=0}^{\infty} g^s |y_{t-2-s}|^{\kappa}$$

Covariates of the form
$$x_{tj} = a_t^{\ell_1} b_t^{\ell_2}, \qquad \ell_1 \in \{0, \ldots, L_1\}, \ \ell_2 \in \{0, \ldots, L_2\}$$
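
A sketch of this covariate construction; the helper name, column ordering, initialization of the $b_t$ recursion, and the values of g and kappa are choices of this note.

```python
import numpy as np

def poly_covariates(a, b, L1, L2):
    """Columns a**l1 * b**l2 for l1 = 0..L1, l2 = 0..L2 (l1 = l2 = 0 is the intercept)."""
    return np.column_stack([a ** l1 * b ** l2
                            for l1 in range(L1 + 1)
                            for l2 in range(L2 + 1)])

# Second example: a_t = y_{t-1}, b_t an exponentially weighted volatility proxy.
y = 0.01 * np.random.default_rng(2).standard_normal(500)   # stand-in returns
g, kappa = 0.95, 1.0
a = np.concatenate([[0.0], y[:-1]])       # a_t = y_{t-1}; a_0 set arbitrarily
b = np.empty_like(y)
b[0] = np.abs(y).mean() ** kappa          # arbitrary initialization of the recursion
for t in range(1, len(y)):
    b[t] = g * b[t - 1] + (1 - g) * np.abs(a[t - 1]) ** kappa
X = poly_covariates(a, b, L1=3, L2=3)     # T x 16 covariate matrix x_t
```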

Functions of interest

CDFs: $P(y_t \leq c \mid a_t, b_t) = P(y_t \leq c \mid x_t, v_t, z_t)$
Quantiles: $c(q) = \{c : P(y_t \leq c \mid a_t, b_t) = P(y_t \leq c \mid x_t, v_t, z_t) = q\}$

Note that
$$P(\widetilde{s}_t = j \mid \Gamma, z_t) = P\left[\widetilde{w}_{tj} \geq \widetilde{w}_{ti} \ (i = 1, \ldots, m) \mid \Gamma, z_t\right]
= \int p\left(\widetilde{w}_{tj} = y \mid \Gamma, z_t\right) P\left[\widetilde{w}_{ti} \leq y \ (i \neq j) \mid \Gamma, z_t\right] dy
= \int \phi\left(y - \gamma_j' z_t\right) \prod_{i \neq j} \Phi\left(y - \gamma_i' z_t\right) dy.$$

Given $M$ Markov chain Monte Carlo replications of a mixture model with $m$ components, the posterior predictive distribution is a mixture of normals with $M \cdot m$ components.
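
The state probabilities reduce to a one-dimensional integral, so they are cheap to evaluate by quadrature. A sketch follows; the function name, the use of scipy.integrate.quad, and the example values of Gamma and z are assumptions of this note.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def state_prob(j, Gamma, z):
    """P(s_t = j | Gamma, z_t) = int phi(y - g_j'z) prod_{i != j} Phi(y - g_i'z) dy."""
    mu = Gamma @ z                        # the m latent means gamma_i' z_t
    def integrand(y):
        val = norm.pdf(y - mu[j])
        for i in range(len(mu)):
            if i != j:
                val *= norm.cdf(y - mu[i])
        return val
    return quad(integrand, -np.inf, np.inf)[0]

Gamma = np.array([[0.5, -1.0], [0.0, 0.3], [-0.5, 0.7]])   # hypothetical 3x2 Gamma
z = np.array([1.0, 0.2])
probs = [state_prob(j, Gamma, z) for j in range(3)]
assert abs(sum(probs) - 1.0) < 1e-6       # the m probabilities sum to one
```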

Detail of prior distributions for $\beta$, $\alpha$, $\Gamma^*$

$a_t \in [a_1, a_2] = A$, $b_t \in [b_1, b_2] = B$.

Grid of points
$$G = \left\{(a_i, b_j) : a_i = a_1 + i \Delta_a, \ b_j = b_1 + j \Delta_b\right\}$$
$$(i = 0, \ldots, N_a; \ j = 0, \ldots, N_b; \ \Delta_a = (a_2 - a_1)/(N_a + 1), \ \Delta_b = (b_2 - b_1)/(N_b + 1))$$

Let $x_{ij}$ be the covariate vector corresponding to $(a_i, b_j)$. Then the prior has $(N_a + 1)(N_b + 1)$ independent components,
$$\beta' x_{ij} \sim N\left[\mu, \, \tau^2 (N_a + 1)(N_b + 1)\right].$$
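
A sketch of this grid-based prior; the bounds, grid sizes, polynomial orders, and the hyperparameters mu and tau are illustrative assumptions, and poly_covariates is the helper sketched earlier.

```python
import numpy as np

a1, a2, b1, b2 = 18.0, 65.0, 0.0, 20.0   # assumed ranges A and B (e.g. age, education)
Na, Nb = 9, 9
da, db = (a2 - a1) / (Na + 1), (b2 - b1) / (Nb + 1)
grid_a = a1 + da * np.arange(Na + 1)
grid_b = b1 + db * np.arange(Nb + 1)
A_, B_ = np.meshgrid(grid_a, grid_b, indexing="ij")
X_grid = poly_covariates(A_.ravel(), B_.ravel(), L1=3, L2=3)   # x_ij at each grid point

# Prior: beta'x_ij ~ N(mu, tau^2 (Na+1)(Nb+1)) independently across the
# (Na+1)(Nb+1) grid points, which induces a Gaussian prior on beta itself.
mu, tau = 0.0, 1.0
prior_var = tau ** 2 * (Na + 1) * (Nb + 1)
```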