Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions
John Geweke, University of Iowa
Indiana University, Department of Economics, February 27, 2007

Outline:
- Motivation
- Model description
- Methods of inference
- Earnings example
- Asset returns example
- Conclusion

Motivation: The general case and two examples

The general case: $(x_t, y_t)$ i.i.d., with $x_t$ $(n \times 1)$; $p(y_t \mid x_t) = ?$

Example: Earnings
- $y_t$: earnings or wage of individual $t$
- $x_{1t}$: experience or age of individual $t$
- $x_{2t}$: education of individual $t$

Motivation: A second example, asset returns

$y_t$ = return on asset in period $t$; $y_t \mid (y_1, \ldots, y_{t-1}) \sim ?$

The covariates are:
- $x_{1t}$: return on asset in period $t-1$, $x_{1t} = y_{t-1}$
- $x_{2t}$: recent volatility, $x_{2t} = g\,x_{2,t-1} + (1-g)\,|y_{t-1}|^{\kappa} = (1-g)\sum_{s=0}^{\infty} g^{s}\,|y_{t-1-s}|^{\kappa}$ (with $g = 0.95$, $\kappa = 1$)

Model description: Mixture modeling

The common structure of normal mixture models:
- $y_t$: variable of interest
- $x_t$ $(n \times 1)$: vector of covariates
- $\tilde{s}_t$: latent state, $\tilde{s}_t \in \{1, \ldots, m\}$

$y_t \mid (x_t, \tilde{s}_t = j) \sim N(\beta_j' x_t, \sigma_j^2)$, equivalently written $y_t \mid (x_t, \tilde{s}_t = j) \sim N[(\beta + \alpha_j)' x_t, (h\,h_j)^{-1}]$

Model description: Mixture modeling

A simple special case, the normal mixture model with no covariates: $x_t = 1$ $(1 \times 1)$; $\tilde{s}_t$ i.i.d. with $P(\tilde{s}_t = j) = p_j$;

$y_t \mid (x_t = 1, \tilde{s}_t = j) \sim N(\beta_j, \sigma_j^2)$, equivalently $y_t \mid (x_t = 1, \tilde{s}_t = j) \sim N[\beta + \alpha_j, (h\,h_j)^{-1}]$
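A minimal simulation sketch of this no-covariate special case: states are drawn i.i.d. with probabilities $p_j$ and observations from the corresponding normal component. All parameter values below are illustrative, not taken from the talk.

```python
import numpy as np

# No-covariate normal mixture: P(s_t = j) = p[j],
# y_t | s_t = j ~ N(beta[j], sigma2[j]). Values are illustrative.
rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])        # mixing probabilities, sum to 1
beta = np.array([-1.0, 0.0, 2.0])    # component means beta_j
sigma2 = np.array([0.5, 1.0, 2.0])   # component variances sigma_j^2

T = 100_000
s = rng.choice(len(p), size=T, p=p)            # latent states
y = rng.normal(beta[s], np.sqrt(sigma2[s]))    # observations

# Moments implied by the mixture, as a sanity check on the draws
mean = p @ beta
var = p @ (sigma2 + beta**2) - mean**2
print(mean, var, y.mean(), y.var())
```

The check compares the analytical mixture mean and variance with their Monte Carlo counterparts.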


Model description: What is new in this work

Begin with the same normal mixture model: $y_t \mid (x_t, \tilde{s}_t = j) \sim N(\beta_j' x_t, \sigma_j^2)$.

Determination of the latent states $\tilde{s}_t$:
$\tilde{w}_t = \Gamma x_t + \zeta_t$, with $\zeta_t \stackrel{iid}{\sim} N(0, I_m)$, $\Gamma$ $(m \times n)$, $x_t$ $(n \times 1)$;
$\tilde{s}_t = j$ iff $\tilde{w}_{tj} \geq \tilde{w}_{ti}$ for all $i = 1, \ldots, m$, i.e. $\tilde{s}_t = \arg\max_{i=1,\ldots,m} \tilde{w}_{ti}$.
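The state-assignment mechanism above can be sketched directly: draw latent utilities $\tilde{w}_t = \Gamma x_t + \zeta_t$ and take the argmax. The values of `Gamma` and `x` below are illustrative placeholders.

```python
import numpy as np

# Latent-state determination: w_t = Gamma @ x_t + zeta_t, zeta_t ~ N(0, I_m),
# and s_t = argmax_i w_ti. Gamma and x are illustrative.
rng = np.random.default_rng(1)

m, n = 3, 2                      # number of states, number of covariates
Gamma = rng.normal(size=(m, n))  # state-utility coefficients (m x n)

def draw_state(x, rng):
    """Draw latent utilities w and return the selected state."""
    w = Gamma @ x + rng.standard_normal(m)
    return int(np.argmax(w))

x = np.array([1.0, 0.5])
states = [draw_state(x, rng) for _ in range(10_000)]
freq = np.bincount(states, minlength=m) / len(states)
print(freq)   # Monte Carlo estimate of P(s_t = j | Gamma, x_t)
```

Repeating the draw many times estimates the multinomial-probit state probabilities for a given covariate vector.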

Model description: Literature context
- Bayesian nonparametric regression (e.g. Wahba; Shiller; Smith & Kohn)
- Non-Bayesian nonparametric regression (e.g. Härdle & Tsybakov)
- Quantile regression (e.g. Koenker & Bassett; Yu & Jones)
- Dirichlet process priors (e.g. Griffin & Steel; Dunson & Pillai)
- Mixture-of-experts models (e.g. Jacobs & Jordan; Jiang & Tanner)

Model description: Identification issues, multinomial probit

$\tilde{w}_t = \Gamma x_t + \zeta_t$, $j = \arg\max_{i=1,\ldots,m} \tilde{w}_{ti}$, $y_t \sim N[(\beta + \alpha_j)' x_t, (h\,h_j)^{-1}]$

Because $\zeta_t \stackrel{iid}{\sim} N(0, I_m)$, translation (but not scaling) issues arise in $\tilde{w}_t = \Gamma x_t + \zeta_t$. Impose $\iota_m' \Gamma = 0$ through

$\Gamma = P \begin{bmatrix} 0_n' \\ \Gamma^* \end{bmatrix}$, $\Gamma^*$ $((m-1) \times n)$, where $P = [\, m^{-1/2}\iota_m \;\; P_2 \,]$ is $m \times m$, $P_2$ is $m \times (m-1)$, and $P'P = I_m$.
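The translation restriction can be constructed and verified numerically: build an orthonormal $P$ whose first column is $\iota_m/\sqrt{m}$, set $\Gamma = P\,[0'\;;\;\Gamma^*]$, and check $\iota_m'\Gamma = 0$. Dimensions below are illustrative.

```python
import numpy as np

# Construct P (orthonormal, first column iota/sqrt(m)) and check the
# identification restriction iota_m' Gamma = 0. Dimensions illustrative.
rng = np.random.default_rng(2)

m, q = 4, 3
iota = np.ones((m, 1))

# QR of [iota, e_1, ..., e_{m-1}] yields an orthonormal basis whose first
# column is proportional to iota.
Q, _ = np.linalg.qr(np.hstack([iota, np.eye(m)[:, : m - 1]]))
P = Q * np.sign(Q[0, 0])          # fix sign so first column is +iota/sqrt(m)

Gamma_star = rng.normal(size=(m - 1, q))
Gamma = P @ np.vstack([np.zeros((1, q)), Gamma_star])

print(np.abs(iota.T @ Gamma).max())   # ~0: the restriction holds
```

Since the columns of $P$ other than the first are orthogonal to $\iota_m$, zeroing the first row of the stacked matrix forces $\iota_m'\Gamma = 0$ exactly.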

Model description: Identification issues, labeling

$\tilde{w}_t = \Gamma x_t + \zeta_t$, $j = \arg\max_{i=1,\ldots,m} \tilde{w}_{ti}$, $y_t \sim N[(\beta + \alpha_j)' x_t, (h\,h_j)^{-1}]$

States have no substantive interpretation and are exchangeable, with implications for the prior distribution: $m!$ state permutations imply $m!$ reflections of the posterior distribution. Not to worry: $p(y_t \mid x_t)$ is invariant to permutations of the states (Geweke, 2006, CSDA, forthcoming).

Model description: Prior distributions

Conditionally conjugate prior distributions for
$\tilde{w}_t = \Gamma x_t + \zeta_t$, $j = \arg\max_{i=1,\ldots,m} \tilde{w}_{ti}$, $y_t \sim N[(\beta + \alpha_j)' x_t, (h\,h_j)^{-1}]$:

Distribution type             Parameters                        Hyperparameters
Gaussian                      $\beta$, $\Gamma$                 $\mu$, $\tau_\beta^2$, $\tau_\gamma^2$
Gaussian conditional on $h$   $\alpha_j$ $(j = 1, \ldots, m)$   $\tau_\alpha^2$
Inverse gamma                 $h$; $h_j$ $(j = 1, \ldots, m)$   $s^2$, $\nu$

Model description: Some theory, conditions

1. Distribution of $x$:
   (a) $x \in \Omega \subseteq R^n$, $\Omega$ compact;
   (b) $p(x) > 0$ is continuous w.r.t. Lebesgue measure.
2. Conditional distribution of $y \in A$:
   (a) For some function $h(x)$, $p(y \mid x)$ is an exponential-family distribution,
       $p(y \mid x) = \exp\{a[h(x)]\,y + b[h(x)] + c(y)\}$ for all $x \in \Omega$, $y \in A$;
       $a(\cdot)$ and $b(\cdot)$ are analytic, $a'(\cdot) \neq 0$, $b'(\cdot) \neq 0$.
   (b) $h$ is in a ball of finite values in a Sobolev space with sup-norm and second-order differentiability:
       $\sup_{x \in \Omega} |h(x)| = c_0 < \infty$;
       $\sup_{i=1,\ldots,n} \sup_{x \in \Omega} |\partial h(x)/\partial x_i| = c_1 < \infty$;
       $\sup_{i,j=1,\ldots,n} \sup_{x \in \Omega} |\partial^2 h(x)/\partial x_i \partial x_j| = c_2 < \infty$.

Model description: Some theory, a definition

Definition. For any two pdfs $f(\cdot)$ and $g(\cdot)$ w.r.t. Lebesgue measure defined on $Z$, the Kullback-Leibler directed distance from $f$ to $g$ is
$KL(f, g) = \int_Z f(z) \log[f(z)/g(z)]\, dz$.

Remark. In a model $p(y_t \mid \theta_A, A)$ $(t = 1, \ldots, T)$, let $T \to \infty$. The posterior density of $\theta_A$ will collapse about $\theta_A = \theta_A^*$, where
$\theta_A^* = \arg\min_{\theta_A} KL[p(y), p(y \mid \theta_A, A)]$
(Geweke, 2005, Theorem 3.4.2).
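The KL definition above can be illustrated numerically for two normal densities, where a closed form is available for comparison; the particular means and variances are arbitrary.

```python
import numpy as np

# Kullback-Leibler directed distance KL(f, g) = int f log(f/g) dz,
# illustrated for f = N(0,1) and g = N(1,4). Closed form for normals:
# log(s_g/s_f) + (s_f^2 + (m_f - m_g)^2) / (2 s_g^2) - 1/2.
def norm_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

mf, sf = 0.0, 1.0    # f = N(0, 1)
mg, sg = 1.0, 2.0    # g = N(1, 4)

z = np.linspace(-12.0, 12.0, 200_001)
f, g = norm_pdf(z, mf, sf), norm_pdf(z, mg, sg)
kl_numeric = np.sum(f * np.log(f / g)) * (z[1] - z[0])   # Riemann sum

kl_closed = np.log(sg / sf) + (sf**2 + (mf - mg) ** 2) / (2 * sg**2) - 0.5
print(kl_numeric, kl_closed)
```

The simple Riemann sum agrees with the closed form to high accuracy, since the integrand is smooth and decays rapidly.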

Model description: Some theory, a result

Theorem. For all $x \in \Omega$,
$\sup_{p(y \mid x)} \inf_{p(y \mid x, A_m)} KL[p(y \mid x), p(y \mid x, A_m)] < c\, m^{-4/n}$,
where $c$ depends on $p(x)$, $c_0$, $c_1$ and $c_2$, but does not depend on $m$ or $n$.

Remarks:
- The result is a consequence of Jiang and Tanner (1999), Theorem 2, for the class of gating functions defined by the multinomial probit model with scalar variance matrix.
- Condition 1 of Jiang and Tanner (1999) for gating functions is important but not given here. They conjecture that it holds for multinomial logit gating functions; the proof for multinomial probit gating functions is not difficult.

Methods of inference: Blocking for Gibbs sampling

$\tilde{w}_t = \Gamma x_t + \zeta_t$, $j = \arg\max_{i=1,\ldots,m} \tilde{w}_{ti}$, $y_t \sim N[(\beta + \alpha_j)' x_t, (h\,h_j)^{-1}]$

- $h, h_1, \ldots, h_m$: separately conditionally independent gamma
- $\beta, \alpha_1, \ldots, \alpha_m$: jointly conditionally Gaussian
- $\mathrm{vec}(\Gamma)$: conditionally Gaussian
- $\tilde{w}_t$: Gaussian times orthant-specific likelihood factors

Conditional posteriors and code have been tested (Geweke, 2004).
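The last block above, the latent utilities $\tilde{w}_t$, is the least standard one: given the current state assignment, $\tilde{w}_t \sim N(\Gamma x_t, I_m)$ restricted to the orthant where the assigned component is maximal. A minimal rejection-sampling sketch follows; in practice one would use truncated-normal draws, and the mean vector and state index here are placeholders.

```python
import numpy as np

# Draw w ~ N(mean, I_m) conditional on argmax(w) == j, by rejection.
# This is an illustrative sketch of the "Gaussian times orthant-specific
# likelihood factors" block; mean and j are placeholders.
rng = np.random.default_rng(3)

def draw_w_given_state(mean, j, rng, max_tries=10_000):
    """Rejection draw of w ~ N(mean, I) conditional on argmax(w) == j."""
    for _ in range(max_tries):
        w = mean + rng.standard_normal(mean.shape)
        if np.argmax(w) == j:
            return w
    raise RuntimeError("rejection sampler failed to accept")

mean = np.array([0.2, -0.1, 0.0])    # Gamma @ x_t for one observation
w = draw_w_given_state(mean, j=1, rng=rng)
print(w, np.argmax(w))               # argmax is 1 by construction
```

Rejection is adequate for illustration; its acceptance rate deteriorates when the conditioning orthant has low probability, which is why production samplers draw each coordinate from its truncated conditional instead.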

Methods of inference: Quantile functions of interest

Conditional CDFs: $P(y_t \leq c \mid x_t)$. Quantiles: $c(q) = \{c : P(y_t \leq c \mid x_t) = q\}$.

Note that
$P(\tilde{s}_t = j \mid \Gamma, x_t) = P[\tilde{w}_{tj} \geq \tilde{w}_{ti}\ (i = 1, \ldots, m) \mid \Gamma, x_t]$
$= \int p(\tilde{w}_{tj} = w \mid \Gamma, x_t)\, P[\tilde{w}_{ti} \leq w\ (i = 1, \ldots, m) \mid \Gamma, x_t]\, dw$
$= \int \phi(w - \gamma_j' x_t) \prod_{i \neq j} \Phi(w - \gamma_i' x_t)\, dw$.

Given $M$ Markov chain Monte Carlo replications of a mixture model with $m$ components, the posterior predictive distribution is a mixture of normals with $M \cdot m$ components.
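The one-dimensional integral for $P(\tilde{s}_t = j \mid \Gamma, x_t)$ can be checked against a direct Monte Carlo frequency of the argmax event; the mean utilities below are illustrative.

```python
import math
import numpy as np

# Check: P(s_t = j | Gamma, x_t) = int phi(w - g_j'x) prod_{i!=j} Phi(w - g_i'x) dw
# versus the frequency of argmax draws. mu stands in for Gamma @ x_t.
rng = np.random.default_rng(4)

phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu = np.array([0.3, -0.2, 0.1])      # mean utilities per state
j = 0

# One-dimensional quadrature over w (simple Riemann sum)
grid = np.linspace(-8.0, 8.0, 4001)
dw = grid[1] - grid[0]
integral = sum(
    phi(w - mu[j]) * math.prod(Phi(w - mu[i]) for i in range(len(mu)) if i != j)
    for w in grid
) * dw

# Monte Carlo frequency of argmax over draws w ~ N(mu, I)
draws = mu + rng.standard_normal((200_000, len(mu)))
freq = np.mean(np.argmax(draws, axis=1) == j)
print(integral, freq)
```

Both routes estimate the same probability; the quadrature is exact to grid resolution, while the Monte Carlo frequency carries sampling noise of order $1/\sqrt{200{,}000}$.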

Earnings example: Data
- 2698 men from the PSID, interviewed in 1994; data pertain to 1993
- $25 \leq$ Age $\leq 65$
- Not black
- Earnings exceed $1,000

Earnings example: Prior distribution hyperparameters

Gaussian priors:
- $\beta$: $\mu = 10$, $\tau_\beta^2 = 1$
- $\alpha$: $\mu = 0$, $\tau_\alpha^2 = 4$
- $\Gamma$: $\mu = 0$, $\tau_\gamma^2 = 16$

Inverse gamma priors: $2h \sim \chi^2(2)$, $2h_j \sim \chi^2(2)$.

The information contribution from the prior relative to the data is minute.

Earnings example: Model specification comparison

Comparison of alternative model specifications by modified cross-validated log scores:
- Randomly permute all 2698 observations (just to be safe; this works for time series too).
- Select the first $T_1 = 2153$ for inference (full MCMC, giving draws $\theta^{(m)}$, $m = 1, \ldots, M = 10^4$).
- Modified cross-validated log scoring rule (Draper and Krnjajic, 2005):
  $\sum_{t=T_1+1}^{T} \log[p(y_t \mid x_t, Y_{T_1}, SMR)]$, where
  $p(y_t \mid x_t, Y_{T_1}, SMR) \approx M^{-1} \sum_{m=1}^{M} p(y_t \mid \theta^{(m)}, x_t, SMR)$.
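The scoring rule above is straightforward to compute once posterior draws are in hand: average the predictive density over draws, then sum the logs over held-out observations. The "model" below is a toy normal location model, purely to illustrate the computation.

```python
import numpy as np

# Modified cross-validated log score: p(y_t | Y_T1) is approximated by
# averaging the density over posterior draws theta^(m), then the log is
# summed over held-out observations. Toy normal model, illustrative only.
rng = np.random.default_rng(5)

def normal_pdf(y, mean, sd):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Toy posterior draws theta^(m) (as if from an MCMC run on the first T1 obs)
M = 10_000
theta_draws = rng.normal(loc=0.0, scale=0.05, size=M)

# Held-out observations y_t, t = T1+1, ..., T
y_holdout = rng.normal(loc=0.0, scale=1.0, size=50)

# p(y_t | Y_T1) ~= M^{-1} sum_m p(y_t | theta^(m))
pred_dens = np.array(
    [np.mean(normal_pdf(y, theta_draws, 1.0)) for y in y_holdout]
)
log_score = np.sum(np.log(pred_dens))
print(log_score)
```

Higher (less negative) log scores indicate better out-of-sample predictive fit, which is how the alternative specifications are ranked.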

Earnings example [figures]: modified cross-validated log scores for the alternative specifications; earnings quantiles (model with m = 8 mixture components); median of the conditional posterior earnings distribution.

Asset returns example: The data
- Standard and Poor's 500 daily open ($O_t$) and close ($C_t$), 1990-1999 ($C_{t-1} = O_t$)
- Dependent variable: percent log return, $y_t = 100 \log(C_t / O_t)$
- Covariates: $x_{1t} = y_{t-1}$; $x_{2t} = g\,x_{2,t-1} + (1-g)\,|y_{t-1}|^{\kappa} = (1-g)\sum_{s=0}^{\infty} g^{s}\,|y_{t-1-s}|^{\kappa}$
- The evidence favors $g = 0.95$ and $\kappa = 1$.
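The volatility covariate is an exponentially weighted average of past absolute returns and is computed by a one-pass recursion. The return series below is simulated, and the initialization of the recursion is an assumption for illustration.

```python
import numpy as np

# Volatility covariate: x2[t] = g*x2[t-1] + (1-g)*|y[t-1]|**kappa,
# an exponentially weighted average of past absolute returns.
# The return series is simulated for illustration.
rng = np.random.default_rng(6)

g, kappa = 0.95, 1.0
y = rng.standard_normal(2_500)          # stand-in for daily percent returns

x2 = np.empty_like(y)
x2[0] = np.abs(y[0]) ** kappa           # initialization choice (an assumption)
for t in range(1, len(y)):
    x2[t] = g * x2[t - 1] + (1 - g) * np.abs(y[t - 1]) ** kappa

print(x2[-5:])
```

With $g = 0.95$ the effective memory of the average is roughly $1/(1-g) = 20$ trading days.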

Asset returns example: The model

The model is the same as in Geweke and Keane (2006):
$\tilde{w}_t = \Gamma x_t + \zeta_t$, $j = \arg\max_{i=1,\ldots,m} \tilde{w}_{ti}$, $y_t \sim N[(\beta + \alpha_j)' x_t, (h\,h_j)^{-1}]$,
except that in Geweke and Keane (2006) linear functions of $x_t$ are replaced by polynomial functions of $x_t$. Based on the evidence (modified cross-validated log scores):
- $\Gamma x_t$ becomes a second-order polynomial in $x_t$;
- $(\beta + \alpha_j)' x_t$ becomes a zero-order polynomial in $x_t$;
- mixture of $m = 3$ components.
Work with linear functions of $x_t$ is currently proceeding.

Asset returns example: Comparison with other models

Model comparison using predictive likelihoods (Bayesian) and recursive maximum likelihood (non-Bayesian). Sample: 1990-1994; prediction: 1995-1999.

Model                   Log score
Normal i.i.d.           -1848.5
GARCH(1,1)              -1660.5
EGARCH(1,1)             -1637.5
t-GARCH(1,1)            -1625.5 (predictive likelihood); -1624.7 (recursive ML)
Stochastic volatility   -1625.4
SMR                     -1622.0

The dynamic predictive properties of t-GARCH and stochastic volatility are similar; the dynamic predictive properties of SMR are quite different from both.

Asset returns example [figures]: posterior means of population moments; posterior quantiles.

Conclusions: Summary

The smoothly mixing regressions (SMR) model:
- is easy to apply;
- is competitive with other models;
- produces interesting results;
- provides fully Bayesian answers to questions of interest.

Conclusions: Current research
- Convergence of SMR
- Calibration properties of SMR
- Contrast with parsimoniously parameterized models in time series applications
- Extensions into difficult territory: more covariates $x_t$ $(n \times 1)$; multivariate models $p(y_t \mid x_t)$ with $y_t$ $(\ell \times 1)$, $x_t$ $(n \times 1)$