Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Athanasios Kottas
Department of Applied Mathematics and Statistics, University of California, Santa Cruz

Department of Statistics, Brigham Young University
November 6, 2008

Outline

1. Introduction and motivation
2. Dirichlet process mixture models
3. Curve fitting using Dirichlet process mixtures
4. Bayesian nonparametric quantile regression
5. Modeling for stock-recruitment relationships
6. Current/future work

1. Introduction and motivation

Two dominant trends in the Bayesian regression literature: seek increasingly flexible regression function models, and accompany these models with more comprehensive uncertainty quantification.

Typically, Bayesian nonparametric modeling focuses on either the regression function or the error distribution.

Bayesian nonparametric extension of implied conditional regression:
- use a flexible nonparametric mixture model for the joint distribution of response and covariates
- obtain full inference for the desired conditional distribution for response given covariates

Both the response distribution and, implicitly, the regression relationship are modeled nonparametrically, thus providing a flexible framework for the general regression problem.

The area of Bayesian nonparametrics provides the framework for such modeling:
- instead of specifying unknown functions and distributions up to a (small) number of parameters, treat them as the random model parameters
- nonparametric priors support the underlying spaces of random functions/distributions, resulting in flexible inferences and more reliable predictions

Modeling utilizes Dirichlet process mixtures, a flexible class of nonparametric mixture models.

2. Dirichlet process mixture models

The Dirichlet process (DP) (Ferguson, 1973) is a random probability measure on distributions characterized by two parameters: a base distribution $G_0$ (the center of the process) and a (precision) parameter $\alpha > 0$.

DP constructive definition (Sethuraman, 1994):
- let $\{z_s, s = 1, 2, \ldots\}$ and $\{\varphi_j, j = 1, 2, \ldots\}$ be independent sequences of random variables, with $z_s$ i.i.d. Beta$(1, \alpha)$ and $\varphi_j$ i.i.d. $G_0$
- define $\omega_1 = z_1$, $\omega_j = z_j \prod_{s=1}^{j-1} (1 - z_s)$, $j \geq 2$ (stick-breaking construction)
- then, a realization $G$ from DP$(\alpha, G_0)$ is (almost surely) of the form
$$G(\cdot) = \sum_{j=1}^{\infty} \omega_j \delta_{\varphi_j}(\cdot)$$
i.e., a discrete distribution that can be represented as a countable mixture of point masses.
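As a small worked consequence of the stick-breaking construction (a standard calculation, not stated on the slide): because the $z_s$ are independent with $E(z_s) = 1/(1+\alpha)$, the expected weights decay geometrically, which indicates how $\alpha$ controls the spread of mass across atoms.

```latex
E(\omega_j) = E(z_j) \prod_{s=1}^{j-1} E(1 - z_s)
            = \frac{1}{1+\alpha} \left( \frac{\alpha}{1+\alpha} \right)^{j-1},
\qquad
\sum_{j=1}^{J} E(\omega_j) = 1 - \left( \frac{\alpha}{1+\alpha} \right)^{J}.
```

Under this calculation, larger $\alpha$ spreads mass over more atoms, and the expected mass left out by truncating at $J$ components is $(\alpha/(1+\alpha))^J$.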

[Figure: left panel, stick-breaking weights w against x; right panel, P(X<x) against x.]

DP with $G_0 = N(0, 1)$ and $\alpha = 20$. In the left panel, the spiked lines are located at 1000 sampled values of x drawn from N(0, 1) with heights given by the weights, $\omega_l$, calculated using the stick-breaking algorithm (a truncated version so that the weights sum to 1). These spikes are then summed from left to right to generate one cdf sample path from the DP. The right panel shows 8 such sample paths indicated by the lighter jagged lines. The heavy smooth line indicates the N(0, 1) cdf.
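The procedure in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the figure: the function names `sample_truncated_dp` and `dp_cdf` are hypothetical, and the truncation simply assigns the leftover stick to the last atom so that the weights sum to 1.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_truncated_dp(alpha, base_sampler, n_atoms, rng):
    """Draw (atoms, weights) from a truncated stick-breaking approximation to DP(alpha, G0)."""
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(n_atoms - 1):
        z = rng.betavariate(1.0, alpha)  # z_s ~ Beta(1, alpha)
        weights.append(z * remaining)    # omega_j = z_j * prod_{s<j} (1 - z_s)
        remaining *= 1.0 - z
    weights.append(remaining)            # truncation: last atom takes the leftover mass
    atoms = [base_sampler(rng) for _ in range(n_atoms)]  # atoms i.i.d. from G0
    return atoms, weights


def dp_cdf(atoms, weights):
    """Return the cdf x -> G((-inf, x]) of the discrete draw, summing spikes left to right."""
    pairs = sorted(zip(atoms, weights))
    xs = [a for a, _ in pairs]
    cum = list(accumulate(w for _, w in pairs))

    def cdf(x):
        k = bisect_right(xs, x)          # number of atoms at or below x
        return cum[k - 1] if k > 0 else 0.0

    return cdf


# Settings taken from the caption: G0 = N(0, 1), alpha = 20, 1000 atoms.
rng = random.Random(0)
atoms, weights = sample_truncated_dp(20.0, lambda r: r.gauss(0.0, 1.0), 1000, rng)
G = dp_cdf(atoms, weights)
print(G(-1.0), G(0.0), G(1.0))
```

Calling `sample_truncated_dp` repeatedly gives different step-function cdfs scattered around the N(0, 1) cdf, which is what the right panel displays.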

Dirichlet process mixture model: for a parametric family of distributions $K(\cdot; \theta)$, $\theta \in \Theta \subseteq R^q$, define
$$F(\cdot; G) = \int K(\cdot; \theta)\, dG(\theta), \quad G \sim DP(\alpha, G_0)$$

DP mixture prior can model both discrete and continuous distributions.

Hierarchical model: for $y_1, \ldots, y_n$ i.i.d., given $G$, from $F(\cdot; G)$,
$$y_i \mid \theta_i \overset{ind.}{\sim} K(\cdot; \theta_i), \quad i = 1, \ldots, n$$
$$\theta_i \mid G \overset{i.i.d.}{\sim} G, \quad i = 1, \ldots, n$$
$$G \sim DP(\alpha, G_0)$$
- typically, hyperpriors on $\alpha$ and/or the parameters $\psi$ of $G_0 \equiv G_0(\psi)$ are added
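To make the hierarchical model concrete, the sketch below simulates data from it with $G$ integrated out, using the Pólya urn representation of the DP (Blackwell and MacQueen, 1973). That representation is a standard result and is not on the slide. The normal kernel with fixed standard deviation, the N(0, 3²) base distribution, and the name `simulate_dp_mixture` are illustrative assumptions.

```python
import random


def simulate_dp_mixture(n, alpha, base_sampler, kernel_sampler, rng):
    """Simulate (theta_i, y_i), i = 1..n, from a DP mixture with G marginalized out."""
    thetas, ys = [], []
    for i in range(n):
        # Polya urn: a new value from G0 with probability alpha / (alpha + i);
        # otherwise copy one of the i earlier thetas chosen uniformly at random.
        if rng.random() < alpha / (alpha + i):
            theta = base_sampler(rng)
        else:
            theta = rng.choice(thetas)
        thetas.append(theta)
        ys.append(kernel_sampler(theta, rng))  # y_i | theta_i ~ K(.; theta_i)
    return thetas, ys


# Illustrative choices (assumptions): G0 = N(0, 3^2), K(.; theta) = N(theta, 0.5^2).
rng = random.Random(1)
thetas, ys = simulate_dp_mixture(
    200, 1.0,
    lambda r: r.gauss(0.0, 3.0),
    lambda th, r: r.gauss(th, 0.5),
    rng,
)
print("distinct mixing parameters:", len(set(thetas)))
```

The repeated values among the `thetas` are what induce clustering in a DP mixture: observations sharing a mixing parameter belong to the same mixture component.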

3. Curve fitting using Dirichlet process mixtures

Focus on univariate continuous response $y$ (though extensions currently studied for categorical and/or multivariate responses).

DP mixture model for the joint density $f(y, x)$ of the response $y$ and the vector of covariates $x$:
$$f(y, x) \equiv f(y, x; G) = \int k(y, x; \theta)\, dG(\theta), \quad G \sim DP(\alpha, G_0(\psi))$$

For the mixture kernel $k(y, x; \theta)$ use:
- multivariate normal for (real-valued) continuous response and covariates
- mixed continuous/discrete distribution to incorporate both categorical and continuous covariates
- kernel component for $y$ supported by $R^+$ for problems in survival/reliability analysis

Again, introduce latent mixing parameters $\theta = \{\theta_i : i = 1, \ldots, n\}$ for each response/covariate observation $(y_i, x_i)$, $i = 1, \ldots, n$.

Full posterior:
$$p(G, \theta, \alpha, \psi \mid \text{data}) = p(G \mid \theta, \alpha, \psi)\, p(\theta, \alpha, \psi \mid \text{data})$$
- $p(\theta, \alpha, \psi \mid \text{data})$ is the posterior of the finite-dimensional parameter vector that results by marginalizing $G$ over its DP prior; MCMC posterior simulation is used to sample from this marginal posterior
- $p(G \mid \theta, \alpha, \psi)$ is a DP with precision parameter $\alpha + n$ and mean
$$(\alpha + n)^{-1} \left\{ \alpha G_0(\cdot; \psi) + \sum_{j=1}^{n^*} n_j \delta_{\theta_j^*}(\cdot) \right\}$$
where $n^*$ is the number of distinct $\theta_i$, $\theta_j^*$ is the $j$-th distinct value, and $n_j$ is the size of the $j$-th distinct component; sample using the DP stick-breaking definition with a truncation approximation

Alternatively, $G$ can be truncated from the outset, resulting in a finite mixture model that can be fitted with Gibbs sampling.
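The second step, drawing $G$ given the mixing parameters, can be sketched as truncated stick-breaking applied to the updated DP. This is a minimal sketch under stated assumptions: scalar mixing parameters, $\alpha$ and $G_0$ held fixed, and a hypothetical function name `sample_posterior_dp`. Picking one of the $n$ values $\theta_i$ uniformly is equivalent to picking a distinct value with probability proportional to its component size.

```python
import random


def sample_posterior_dp(thetas, alpha, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw from the DP with precision alpha + n and
    mean (alpha * G0 + sum_i delta_{theta_i}) / (alpha + n)."""
    n = len(thetas)
    precision = alpha + n

    def updated_base(r):
        # With probability alpha / (alpha + n) draw from G0, else reuse an observed theta.
        if r.random() < alpha / precision:
            return base_sampler(r)
        return r.choice(thetas)

    atoms, weights, remaining = [], [], 1.0
    for j in range(n_atoms):
        # The final stick fraction is set to 1 so that the truncated weights sum to 1.
        z = 1.0 if j == n_atoms - 1 else rng.betavariate(1.0, precision)
        weights.append(z * remaining)
        remaining *= 1.0 - z
        atoms.append(updated_base(rng))
    return atoms, weights


# Toy input (assumption): 100 mixing parameters taking two distinct values.
rng = random.Random(2)
thetas = [0.0] * 50 + [5.0] * 50
atoms, weights = sample_posterior_dp(thetas, 1.0, lambda r: r.gauss(0.0, 3.0), 2000, rng)
mass_on_observed = sum(w for a, w in zip(atoms, weights) if a in (0.0, 5.0))
print(round(mass_on_observed, 3))
```

Because the updated precision is $\alpha + n$, more atoms are needed for a given truncation accuracy than under the prior, which is why the example uses 2000 atoms.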

For any grid of values $(y_0, x_0)$, obtain posterior samples for:
- joint density $f(y_0, x_0; G)$, marginal density $f(x_0; G)$, and therefore, conditional density $f(y_0 \mid x_0; G)$
- conditional expectation $E(y \mid x_0; G)$, which, estimated over a grid in $x$, provides inference for the regression relationship
- conditioning in $f(y_0 \mid x_0; G)$ and/or $E(y \mid x_0; G)$ may involve only a portion of vector $x$

Key features of the modeling approach:
- full and exact nonparametric inference (no need for asymptotics)
- model for both non-linear regression curves and non-standard shapes for the conditional response density
- model does not rely on additive regression formulations; it can uncover interactions between covariates that might influence the regression relationship
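For a single (truncated) draw of $G$ with a bivariate normal kernel, these functionals have closed forms that follow from standard normal conditioning. The conditional density is a mixture of normals whose weights depend on $x_0$. The sketch below assumes one covariate and a hypothetical tuple layout `(w, mu_y, mu_x, s_yy, s_yx, s_xx)` for each component. In practice it would be evaluated once per posterior draw of $G$ to obtain posterior samples.

```python
from math import sqrt
from statistics import NormalDist


def conditional_summary(components, x0):
    """Return f(x0; G), E(y | x0; G) and the function y -> f(y | x0; G)
    for a finite mixture of bivariate normals on (y, x)."""
    q, means, sds = [], [], []
    for w, mu_y, mu_x, s_yy, s_yx, s_xx in components:
        q.append(w * NormalDist(mu_x, sqrt(s_xx)).pdf(x0))  # w_l * N(x0; mu_x, s_xx)
        means.append(mu_y + s_yx / s_xx * (x0 - mu_x))      # component conditional mean
        sds.append(sqrt(s_yy - s_yx ** 2 / s_xx))           # component conditional sd
    marginal = sum(q)                                       # f(x0; G)
    q = [v / marginal for v in q]                           # covariate-dependent weights
    cond_mean = sum(a * m for a, m in zip(q, means))        # E(y | x0; G)

    def cond_density(y):
        return sum(a * NormalDist(m, s).pdf(y) for a, m, s in zip(q, means, sds))

    return marginal, cond_mean, cond_density


# Two-component toy mixture (assumed values, for illustration only).
components = [
    (0.5, -1.0, -1.0, 1.0, 0.0, 1.0),
    (0.5, 1.0, 1.0, 1.0, 0.0, 1.0),
]
marginal, cond_mean, cond_density = conditional_summary(components, 0.0)
print(marginal, cond_mean, cond_density(0.0))
```

Since the weights change with $x_0$, the conditional mean is non-linear in $x_0$ even though each component is linear, which is the source of the flexibility in the regression function.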

Data Example

Simulated data set with a continuous response $y$, one continuous covariate $x_c$, and one binary categorical covariate $x_d$:
- $x_{ci} \overset{ind.}{\sim} N(0, 1)$
- $x_{di} \mid x_{ci} \overset{ind.}{\sim} \text{Bernoulli}(\text{probit}(x_{ci}))$
- $y_i \mid x_{ci}, x_{di} \overset{ind.}{\sim} N(h(x_{ci}), \sigma_{x_{di}})$, with $\sigma_0 = 0.25$, $\sigma_1 = 0.5$, and
$$h(x_c) = 0.4 x_c + 0.5 \sin(2.7 x_c) + 1.1 (1 + x_c^2)^{-1}$$
- two sample sizes: $n = 200$ and $n = 2000$

DP mixture model with a mixed normal/Bernoulli kernel:
$$f(y, x_c, x_d; G) = \int N_2(y, x_c; \mu, \Sigma)\, \pi^{x_d} (1 - \pi)^{1 - x_d}\, dG(\mu, \Sigma, \pi),$$
with $G \sim DP(\alpha, G_0)$ and $G_0(\mu, \Sigma, \pi) = N_2(\mu; m, V)\, \text{IWish}(\Sigma; \nu, S)\, \text{Beta}(\pi; a, b)$
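A data set of this form could be generated roughly as follows. Two readings of the slide are assumptions here: the Bernoulli probability is taken to be the standard normal cdf evaluated at $x_{ci}$, and $\sigma_0$, $\sigma_1$ are treated as standard deviations. The function name `simulate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def h(xc):
    """True regression function from the slide."""
    return 0.4 * xc + 0.5 * math.sin(2.7 * xc) + 1.1 / (1.0 + xc ** 2)


def simulate(n, rng):
    """Generate n triples (y, x_c, x_d) following the simulation design."""
    sigma = {0: 0.25, 1: 0.5}   # assumption: sigma_0, sigma_1 are standard deviations
    std_normal = NormalDist()
    data = []
    for _ in range(n):
        xc = rng.gauss(0.0, 1.0)
        # Assumption: P(x_d = 1 | x_c) = Phi(x_c), the standard normal cdf.
        xd = 1 if rng.random() < std_normal.cdf(xc) else 0
        y = rng.gauss(h(xc), sigma[xd])
        data.append((y, xc, xd))
    return data


small = simulate(200, random.Random(0))
large = simulate(2000, random.Random(0))
print(len(small), len(large), sum(xd for _, _, xd in small))
```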

[Figure: six panels of h(x) against x.]

Posterior point and 90% interval estimates (dashed and dotted lines) for conditional response expectation $E(y \mid x_c, x_d = 0; G)$ (left panels), $E(y \mid x_c, x_d = 1; G)$ (middle panels), and $E(y \mid x_c; G)$ (right panels). The corresponding data is plotted in grey for the sample of size n = 200 (top panels) and n = 2000 (bottom panels). The solid line denotes the true curve.

4. Bayesian nonparametric quantile regression

In regression settings, the covariates may have an effect not only on the center of the response distribution but also on its shape.

Quantile regression quantifies the relationship between a set of quantiles of the response distribution and covariates, and thus provides a more complete explanation of the response distribution in terms of available covariates.

Semiparametric additive quantile regression framework:
$$y_i = h(x_i) + \varepsilon_i,$$
where the $\varepsilon_i$ are i.i.d. from a distribution with $p$-th quantile equal to 0
- earlier work on Bayesian semiparametric modeling with parametric quantile regression functions and nonparametric priors for unimodal error densities (Kottas & Krnjajić, 2008)

Alternative model-based nonparametric approach (Taddy & Kottas, 2007):
- model joint density $f(y, x)$ of the response $y$ and the $M$-variate vector of (continuous) covariates $x$ with a DP mixture of normals:
$$f(y, x; G) = \int N_{M+1}(y, x; \mu, \Sigma)\, dG(\mu, \Sigma), \quad G \sim DP(\alpha, G_0)$$
with $G_0(\mu, \Sigma) = N_{M+1}(\mu; m, V)\, \text{IWish}(\Sigma; \nu, S)$

For any grid of values $(y_0, x_0)$, obtain posterior samples for:
- conditional density $f(y_0 \mid x_0; G)$ and conditional cdf $F(y_0 \mid x_0; G)$
- conditional quantile regression $q_p(x_0; G)$, for any $0 < p < 1$
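For one draw of $G$, the conditional cdf is a weighted sum of normal cdfs, and $q_p(x_0; G)$ can be found by inverting it numerically. The sketch below assumes a single covariate ($M = 1$), a truncated mixture stored as hypothetical tuples `(w, mu_y, mu_x, s_yy, s_yx, s_xx)`, and plain bisection for the inversion. It illustrates the idea and is not the implementation used in the cited work.

```python
from math import sqrt
from statistics import NormalDist


def conditional_quantile(components, x0, p, tol=1e-10):
    """Solve F(y | x0; G) = p for y, with G a finite mixture of bivariate normals."""
    parts = []
    for w, mu_y, mu_x, s_yy, s_yx, s_xx in components:
        q = w * NormalDist(mu_x, sqrt(s_xx)).pdf(x0)   # unnormalized weight at x0
        m = mu_y + s_yx / s_xx * (x0 - mu_x)           # component conditional mean
        s = sqrt(s_yy - s_yx ** 2 / s_xx)              # component conditional sd
        parts.append((q, NormalDist(m, s)))
    total = sum(q for q, _ in parts)

    def cdf(y):
        return sum(q * d.cdf(y) for q, d in parts) / total

    # Bracket the quantile generously, then bisect on the monotone cdf.
    lo = min(d.mean - 10.0 * d.stdev for _, d in parts)
    hi = max(d.mean + 10.0 * d.stdev for _, d in parts)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy mixture (assumed values): median and 90th percentile at x0 = 0.5.
components = [
    (0.6, 0.0, 0.0, 1.0, 0.5, 1.0),
    (0.4, 3.0, 2.0, 0.5, -0.2, 1.0),
]
median = conditional_quantile(components, 0.5, 0.5)
q90 = conditional_quantile(components, 0.5, 0.9)
print(median, q90)
```

Because every quantile is computed from the same conditional distribution, quantile curves for different $p$ obtained this way cannot cross, and several of them can be reported from one model fit.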

Key features:
- modeling framework enables simultaneous inference for more than one quantile regression
- model allows flexible response distributions and non-linear quantile regression relationships

Extensions to modeling for partially observed responses (and/or covariates):
- fully nonparametric Tobit quantile regression for econometrics data

Data Example

Moral hazard data on the relationship between shareholder concentration and several indices for managerial moral hazard in the form of expenditure with scope for private benefit (Yafeh & Yosha, 2003):
- data set includes a variety of variables describing 185 Japanese industrial chemical firms listed on the Tokyo stock exchange
- response $y$: index MH5, consisting of general sales and administrative expenses deflated by sales
- four-dimensional covariate vector $x$: Leverage (ratio of debt to total assets); log(assets); Age of the firm; and TOPTEN (the percent of ownership held by the ten largest shareholders)

[Figure: Marginal Average Medians with 90% CI. Four panels of Moral Hazard against TOPTEN, Leverage, Age, and Log(Assets).]

Posterior mean and 90% interval estimates for median regression for MH5 conditional on each individual covariate. Data scatterplots are shown in grey.

[Figure: Marginal Average 90th Percentiles with 90% CI. Four panels of Moral Hazard against TOPTEN, Leverage, Age, and Log(Assets).]

Posterior mean and 90% interval estimates for 90th percentile regression for MH5 conditional on each individual covariate. Data scatterplots are shown in grey.

[Figure: four surface panels over Leverage and TOPTEN.]

Posterior estimates of median surfaces (left column) and 90th percentile surfaces (right column) for MH5 conditional on Leverage and TOPTEN. The posterior mean is shown on the top row and the posterior interquartile range on the bottom.

[Figure: Conditional density for MH5, four panels with MH5 on the horizontal axis.]

Posterior mean and 90% interval estimates for response densities $f(y \mid x_0; G)$ conditional on four combinations of values $x_0$ for the covariate vector (TOPTEN, Leverage, Age, log(assets)).

5. Modeling for stock-recruitment relationships

Relationship between the number of mature individuals of a species (stock biomass, $S$) and the production of offspring (recruitment, $R$) is fundamental to the behavior of any ecological system.

Special relevance in fisheries research, where the stock-recruitment (S-R) relationship applies directly to decision problems of fishery management.

A common way of writing this relationship is
$$\log(R/S) = g(S) + \varepsilon$$
where $g$ is the S-R function and $\varepsilon$ are additive (typically, normal) errors
- work part of NSF project (joint with Steve Munch, Stony Brook University)
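As a concrete instance of this additive form, one classical parametric choice is the Ricker model, $R = S \exp(a - bS)$, for which $g(S) = a - bS$. The Ricker form, the parameter values, and the name `simulate_ricker` are assumptions introduced only for illustration; none of them comes from the slides.

```python
import math
import random


def simulate_ricker(stock, a, b, sigma, rng):
    """Simulate recruitment under log(R/S) = a - b*S + eps, with eps ~ N(0, sigma^2)."""
    pairs = []
    for s in stock:
        y = a - b * s + rng.gauss(0.0, sigma)  # y = log(R/S), log-reproductive success
        pairs.append((s, s * math.exp(y)))     # back-transform: R = S * exp(y)
    return pairs


# Hypothetical parameter values, chosen only to produce a plausible-looking curve.
rng = random.Random(3)
stock = [0.5 * k for k in range(1, 21)]
pairs = simulate_ricker(stock, a=1.5, b=0.3, sigma=0.4, rng=rng)
for s, r in pairs[:3]:
    print(s, round(r, 3), round(math.log(r / s), 3))
```

With $b > 0$, reproductive success $R/S$ declines as stock increases, which is the kind of density dependence a parametric S-R function builds in by construction.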

Standard ecological assumption: as stock abundance increases, successful recruitment per individual (reproductive success) decreases
- a wide variety of factors affect the S-R relationship, and there are many competing models for the influence of biological and physical mechanisms
- small amounts of noisy data typically available to infer S-R relationships

Traditional (parametric) models may be too limited to extract the relevant information from the data, and to provide reliable predictions and/or temporal forecasts.

DP mixture modeling approach to capture the nature of recruitment dependence upon stock without making parametric assumptions for either the S-R function or the errors around it (Fronczyk, Kottas & Munch, 2008).

DP mixture of bivariate normals for joint distribution of log-reproductive success, $y = \log(R/S)$, and stock biomass, $x = S$,
$$f(y, x; G) = \int N_2(y, x; \mu, \Sigma)\, dG(\mu, \Sigma), \quad G \sim DP(\alpha, G_0)$$

Various types of practically important inference:
- inference for S-R relationship through conditional expectation functional $E(y \mid x; G)$
- inference for log-reproductive success for any specified stock biomass value, $x_0 = S_0$, through conditional density $f(y \mid x_0; G)$
- inference for biological reference points through conditional density $f(x \mid y_0; G)$ for specific log-reproductive success values $y_0$

Cod data from six North Atlantic regions. For each region, posterior mean (blue) and 95% interval estimates (red) for the conditional mean log-reproductive success.

Cod data. For the NE Arctic (top panels) and West of Scotland (bottom panels) regions, posterior mean (blue) and 95% interval estimates (red) for the conditional density of log-reproductive success at four specified stock biomass values.

6. Current/future work

General framework with several potentially important applications:
- nonparametric switching regression modeling (Taddy & Kottas, 2008)
- modeling and inference for marked point processes over time or space (with Matt Taddy)
- fully nonparametric regression for censored survival data
- nonparametric regression models for multivariate ordinal responses (with Kassie Fronczyk)
- sensitivity analysis and inversion for computer model experiments (with Marian Farah)

Contact info:
- e-mail: thanos@ams.ucsc.edu, web: http://www.ams.ucsc.edu/ thanos
- UCSC Department of Applied Math and Statistics: www.ams.ucsc.edu
- Technical Reports series: http://www.ams.ucsc.edu/reports/trview.php

THANKS!!!