Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Athanasios Kottas
Department of Applied Mathematics and Statistics, University of California, Santa Cruz

Department of Statistics, Brigham Young University
November 6, 2008

Outline
1. Introduction and motivation
2. Dirichlet process mixture models
3. Curve fitting using Dirichlet process mixtures
4. Bayesian nonparametric quantile regression
5. Modeling for stock-recruitment relationships
6. Current/future work

1. Introduction and motivation

- Two dominant trends in the Bayesian regression literature: seek increasingly flexible regression function models, and accompany these models with more comprehensive uncertainty quantification
- Typically, Bayesian nonparametric modeling focuses on either the regression function or the error distribution
- Bayesian nonparametric extension of implied conditional regression:
  - use a flexible nonparametric mixture model for the joint distribution of response and covariates
  - obtain full inference for the desired conditional distribution of the response given the covariates
- Both the response distribution and, implicitly, the regression relationship are modeled nonparametrically, thus providing a flexible framework for the general regression problem

- The area of Bayesian nonparametrics provides the framework for such modeling:
  - instead of specifying unknown functions and distributions up to a (small) number of parameters, treat them as the random model parameters
  - nonparametric priors support the underlying spaces of random functions/distributions, resulting in flexible inferences and more reliable predictions
- Modeling utilizes Dirichlet process mixtures, a flexible class of nonparametric mixture models

2. Dirichlet process mixture models

- The Dirichlet process (DP) (Ferguson, 1973) is a random probability measure on distributions characterized by two parameters: a base distribution $G_0$ (the center of the process) and a (precision) parameter $\alpha > 0$
- DP constructive definition (Sethuraman, 1994):
  - let $\{z_s, s = 1, 2, \ldots\}$ and $\{\phi_j, j = 1, 2, \ldots\}$ be independent sequences of random variables, with $z_s$ i.i.d. $\mathrm{Beta}(1, \alpha)$ and $\phi_j$ i.i.d. $G_0$
  - define $\omega_1 = z_1$, $\omega_j = z_j \prod_{s=1}^{j-1} (1 - z_s)$, $j \geq 2$ (stick-breaking construction)
  - then, a realization $G$ from $\mathrm{DP}(\alpha, G_0)$ is (almost surely) of the form $G(\cdot) = \sum_{j=1}^{\infty} \omega_j \delta_{\phi_j}(\cdot)$, i.e., a discrete distribution that can be represented as a countable mixture of point masses
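The stick-breaking construction above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `stick_breaking_draw`, the truncation level of 1000 atoms, and the device of setting the last stick fraction to 1 (so the truncated weights sum to 1) are all assumptions made for the example.

```python
import numpy as np

def stick_breaking_draw(alpha, base_sampler, n_atoms, rng):
    """Truncated draw from DP(alpha, G0); returns (weights, atoms)."""
    # z_s ~ Beta(1, alpha), independently
    z = rng.beta(1.0, alpha, size=n_atoms)
    # Truncation (an assumption of this sketch): the last stick takes all
    # remaining mass, so the finite set of weights sums to 1.
    z[-1] = 1.0
    # remaining[j] = prod_{s<j} (1 - z_s), with remaining[0] = 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - z[:-1])))
    weights = z * remaining
    # phi_j ~ G0, independently of the z_s
    atoms = base_sampler(n_atoms, rng)
    return weights, atoms

rng = np.random.default_rng(0)
weights, atoms = stick_breaking_draw(
    alpha=20.0,
    base_sampler=lambda n, r: r.normal(0.0, 1.0, size=n),  # G0 = N(0, 1)
    n_atoms=1000,
    rng=rng,
)

# One cdf sample path: accumulate the weights from left to right over the atoms.
order = np.argsort(atoms)
cdf_path = np.cumsum(weights[order])
```

Repeating the draw with different seeds gives different jagged cdf paths scattered around the $N(0, 1)$ cdf; larger $\alpha$ pulls them closer to $G_0$.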

[Figure: left panel, stick-breaking weights w against x; right panel, cdf sample paths P(X<x) against x.] DP with $G_0 = N(0, 1)$ and $\alpha = 20$. In the left panel, the spiked lines are located at 1000 sampled values of x drawn from $N(0, 1)$ with heights given by the weights, $\omega_l$, calculated using the stick-breaking algorithm (a truncated version so that the weights sum to 1). These spikes are then summed from left to right to generate one cdf sample path from the DP. The right panel shows 8 such sample paths indicated by the lighter jagged lines. The heavy smooth line indicates the $N(0, 1)$ cdf.

- Dirichlet process mixture model: for a parametric family of distributions $K(\cdot; \theta)$, $\theta \in \Theta \subseteq R^q$, define
    $F(\cdot; G) = \int K(\cdot; \theta) \, dG(\theta)$, $G \sim \mathrm{DP}(\alpha, G_0)$
- DP mixture prior can model both discrete and continuous distributions
- Hierarchical model: for $y_1, \ldots, y_n$ i.i.d., given $G$, from $F(\cdot; G)$,
    $y_i \mid \theta_i \sim K(\cdot; \theta_i)$, independently, $i = 1, \ldots, n$
    $\theta_i \mid G \sim G$, i.i.d., $i = 1, \ldots, n$
    $G \sim \mathrm{DP}(\alpha, G_0)$
- typically, hyperpriors on $\alpha$ and/or the parameters $\psi$ of $G_0 \equiv G_0(\psi)$ are added

3. Curve fitting using Dirichlet process mixtures

- Focus on univariate continuous response $y$ (though extensions are currently being studied for categorical and/or multivariate responses)
- DP mixture model for the joint density $f(y, x)$ of the response $y$ and the vector of covariates $x$:
    $f(y, x) \equiv f(y, x; G) = \int k(y, x; \theta) \, dG(\theta)$, $G \sim \mathrm{DP}(\alpha, G_0(\psi))$
- For the mixture kernel $k(y, x; \theta)$ use:
  - multivariate normal for (real-valued) continuous response and covariates
  - mixed continuous/discrete distribution to incorporate both categorical and continuous covariates
  - kernel component for $y$ supported by $R^+$ for problems in survival/reliability analysis

- Again, introduce latent mixing parameters $\theta = \{\theta_i : i = 1, \ldots, n\}$ for each response/covariate observation $(y_i, x_i)$, $i = 1, \ldots, n$
- full posterior: $p(G, \theta, \alpha, \psi \mid \text{data}) = p(G \mid \theta, \alpha, \psi) \, p(\theta, \alpha, \psi \mid \text{data})$
  - $p(\theta, \alpha, \psi \mid \text{data})$ is the posterior of the finite-dimensional parameter vector that results by marginalizing $G$ over its DP prior; MCMC posterior simulation is used to sample from this marginal posterior
  - $p(G \mid \theta, \alpha, \psi)$ is a DP with precision parameter $\alpha + n$ and mean $(\alpha + n)^{-1} \{ \alpha G_0(\cdot; \psi) + \sum_{j=1}^{n^*} n_j \delta_{\theta_j^*}(\cdot) \}$, where $n^*$ is the number of distinct $\theta_i$, the $\theta_j^*$ are the distinct values, and $n_j$ is the size of the $j$-th distinct component; sample using the DP stick-breaking definition with a truncation approximation
- Alternatively, $G$ can be truncated from the outset, resulting in a finite mixture model that can be fitted with Gibbs sampling
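The conditional draw of $G$ given $\theta$ can be sketched by applying stick-breaking to the updated DP. This is a minimal illustration with scalar $\theta$, not the implementation used for the results in the talk; the function name `sample_posterior_G`, the truncation level, and the example values of the distinct components and their counts are assumptions.

```python
import numpy as np

def sample_posterior_G(theta_star, counts, alpha, base_sampler, n_atoms, rng):
    """Truncated draw from DP(alpha + n, mean) given the distinct values
    theta_star (scalars in this sketch) and their multiplicities counts."""
    n = counts.sum()
    # Stick-breaking with the updated precision alpha + n
    z = rng.beta(1.0, alpha + n, size=n_atoms)
    z[-1] = 1.0  # truncation assumption: last stick takes the leftover mass
    weights = z * np.concatenate(([1.0], np.cumprod(1.0 - z[:-1])))
    # Atoms come from the updated mean: G0 with probability alpha / (alpha + n),
    # otherwise theta_star[j] with probability proportional to n_j.
    from_base = rng.uniform(size=n_atoms) < alpha / (alpha + n)
    picks = rng.choice(len(theta_star), size=n_atoms, p=counts / n)
    atoms = np.where(from_base, base_sampler(n_atoms, rng), theta_star[picks])
    return weights, atoms

rng = np.random.default_rng(2)
theta_star = np.array([-1.5, 2.0])  # hypothetical distinct components
counts = np.array([20, 30])         # hypothetical sizes n_j, so n = 50
weights, atoms = sample_posterior_G(
    theta_star, counts, alpha=1.0,
    base_sampler=lambda n, r: r.normal(0.0, 3.0, size=n),  # assumed G0
    n_atoms=1000, rng=rng,
)
# Total mass the sampled G places on the observed distinct values
mass_on_data = weights[np.isin(atoms, theta_star)].sum()
```

In a full analysis this draw would be repeated once per MCMC iteration, using that iteration's values of $\theta$, $\alpha$ and $\psi$.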

- For any grid of values $(y_0, x_0)$, obtain posterior samples for:
  - joint density $f(y_0, x_0; G)$, marginal density $f(x_0; G)$, and therefore, conditional density $f(y_0 \mid x_0; G)$
  - conditional expectation $E(y \mid x_0; G)$, which, estimated over a grid in $x$, provides inference for the regression relationship
  - conditioning in $f(y_0 \mid x_0; G)$ and/or $E(y \mid x_0; G)$ may involve only a portion of vector $x$
- Key features of the modeling approach:
  - full and exact nonparametric inference (no need for asymptotics)
  - model for both non-linear regression curves and non-standard shapes for the conditional response density
  - model does not rely on additive regression formulations; it can uncover interactions between covariates that might influence the regression relationship
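For a normal kernel and a truncated (finite) realization of $G$, the conditional expectation has a closed form: a weighted average of the component-wise linear regressions, with weights proportional to $\omega_j N(x_0; \mu_j^x, \Sigma_j^{xx})$. The sketch below assumes a single continuous covariate and a given set of weights, means and covariances (for instance, one posterior draw); the function names and the two-component example are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def conditional_mean(x0, weights, means, covs):
    """E(y | x0; G) for a finite mixture of bivariate normals.

    means[j] = (mu_y, mu_x); covs[j] is the 2x2 covariance ordered as (y, x).
    """
    x0 = np.atleast_1d(x0)[:, None]               # shape (n_grid, 1)
    mu_y, mu_x = means[:, 0], means[:, 1]
    s_yx, s_xx = covs[:, 0, 1], covs[:, 1, 1]
    # Covariate-dependent weights q_j(x0), normalized over components
    q = weights * normal_pdf(x0, mu_x, s_xx)      # shape (n_grid, J)
    q = q / q.sum(axis=1, keepdims=True)
    # Component-wise conditional means (linear in x0)
    comp_means = mu_y + s_yx / s_xx * (x0 - mu_x)
    return (q * comp_means).sum(axis=1)

# Hypothetical two-component mixture
weights = np.array([0.5, 0.5])
means = np.array([[-1.0, -1.0], [1.0, 1.0]])
covs = np.array([np.eye(2), np.eye(2)])
x_grid = np.linspace(-3.0, 3.0, 61)
curve = conditional_mean(x_grid, weights, means, covs)
```

Even though each component is linear in $x_0$, the covariate-dependent weights make the resulting curve non-linear. Evaluating the function for every posterior draw of $G$ gives pointwise posterior means and intervals for the regression curve.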

Data Example

- Simulated data set with a continuous response $y$, one continuous covariate $x_c$, and one binary categorical covariate $x_d$:
  - $x_{ci} \sim N(0, 1)$, independently
  - $x_{di} \mid x_{ci} \sim \mathrm{Bernoulli}(\mathrm{probit}(x_{ci}))$, independently
  - $y_i \mid x_{ci}, x_{di} \sim N(h(x_{ci}), \sigma_{x_{di}})$, independently, with $\sigma_0 = 0.25$, $\sigma_1 = 0.5$, and $h(x_c) = 0.4 x_c + 0.5 \sin(2.7 x_c) + 1.1 (1 + x_c^2)^{-1}$
  - two sample sizes: $n = 200$ and $n = 2000$
- DP mixture model with a mixed normal/Bernoulli kernel:
    $f(y, x_c, x_d; G) = \int N_2(y, x_c; \mu, \Sigma) \, \pi^{x_d} (1 - \pi)^{1 - x_d} \, dG(\mu, \Sigma, \pi)$,
  with $G \sim \mathrm{DP}(\alpha, G_0(\mu, \Sigma, \pi) = N_2(\mu; m, V) \, \mathrm{IWish}(\Sigma; \nu, S) \, \mathrm{Beta}(\pi; a, b))$
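The simulation design above can be reproduced along the following lines. Two readings of the slide are assumptions in this sketch and are flagged in the comments: "probit" is taken to mean that the success probability is the standard normal cdf $\Phi(x_c)$, and $\sigma_0$, $\sigma_1$ are treated as standard deviations. The seed and function names are arbitrary.

```python
import math
import numpy as np

def h(xc):
    """True regression function from the data example."""
    return 0.4 * xc + 0.5 * np.sin(2.7 * xc) + 1.1 / (1.0 + xc ** 2)

def simulate(n, rng):
    xc = rng.normal(size=n)
    # Assumption: P(x_d = 1 | x_c) = Phi(x_c), the standard normal cdf
    p = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in xc])
    xd = rng.binomial(1, p)
    # Assumption: sigma_0 = 0.25 and sigma_1 = 0.5 are standard deviations
    sigma = np.where(xd == 1, 0.5, 0.25)
    y = rng.normal(h(xc), sigma)
    return y, xc, xd

rng = np.random.default_rng(3)
y_small, xc_small, xd_small = simulate(200, rng)
y_large, xc_large, xd_large = simulate(2000, rng)
```

Under this reading, $x_d = 1$ is more likely for large $x_c$, so the two groups occupy different covariate regions and have different noise levels.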

[Figure: six panels of h(x) against x.] Posterior point and 90% interval estimates (dashed and dotted lines) for conditional response expectation $E(y \mid x_c, x_d = 0; G)$ (left panels), $E(y \mid x_c, x_d = 1; G)$ (middle panels), and $E(y \mid x_c; G)$ (right panels). The corresponding data are plotted in grey for the sample of size n = 200 (top panels) and n = 2000 (bottom panels). The solid line denotes the true curve.

4. Bayesian nonparametric quantile regression

- In regression settings, the covariates may have an effect not only on the center of the response distribution but also on its shape
- Quantile regression quantifies the relationship between a set of quantiles of the response distribution and the covariates, and thus provides a more complete explanation of the response distribution in terms of available covariates
- Semiparametric additive quantile regression framework: $y_i = h(x_i) + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. from a distribution with $p$-th quantile equal to 0
  - earlier work on Bayesian semiparametric modeling with parametric quantile regression functions and nonparametric priors for unimodal error densities (Kottas & Krnjajić, 2008)

- Alternative model-based nonparametric approach (Taddy & Kottas, 2007):
  - model the joint density $f(y, x)$ of the response $y$ and the $M$-variate vector of (continuous) covariates $x$ with a DP mixture of normals:
      $f(y, x; G) = \int N_{M+1}(y, x; \mu, \Sigma) \, dG(\mu, \Sigma)$, $G \sim \mathrm{DP}(\alpha, G_0)$,
    with $G_0(\mu, \Sigma) = N_{M+1}(\mu; m, V) \, \mathrm{IWish}(\Sigma; \nu, S)$
- For any grid of values $(y_0, x_0)$, obtain posterior samples for:
  - conditional density $f(y_0 \mid x_0; G)$ and conditional cdf $F(y_0 \mid x_0; G)$
  - conditional quantile regression $q_p(x_0; G)$, for any $0 < p < 1$
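Given one (truncated) realization of $G$, the conditional cdf is a weighted sum of normal cdfs, and the quantile $q_p(x_0; G)$ can be obtained by inverting it numerically. The sketch below is a simplified illustration for a single covariate ($M = 1$) that uses bisection; the function names, the bracketing interval of ten conditional standard deviations, and the two-component example are assumptions, not the algorithm from the cited work.

```python
import math
import numpy as np

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def conditional_quantile(p, x0, weights, means, covs, tol=1e-8):
    """q_p(x0; G) for a finite mixture of bivariate normals ordered as (y, x)."""
    mu_y, mu_x = means[:, 0], means[:, 1]
    s_yy, s_yx, s_xx = covs[:, 0, 0], covs[:, 0, 1], covs[:, 1, 1]
    # Covariate-dependent mixture weights
    q = weights * np.exp(-0.5 * (x0 - mu_x) ** 2 / s_xx) / np.sqrt(2.0 * np.pi * s_xx)
    q = q / q.sum()
    # Conditional mean and standard deviation of y within each component
    m = mu_y + s_yx / s_xx * (x0 - mu_x)
    s = np.sqrt(s_yy - s_yx ** 2 / s_xx)

    def cdf(y):
        # F(y | x0; G) = sum_j q_j(x0) * Phi((y - m_j) / s_j)
        return sum(qj * std_normal_cdf((y - mj) / sj) for qj, mj, sj in zip(q, m, s))

    # Bisection on an interval that brackets essentially all conditional mass
    lo, hi = (m - 10.0 * s).min(), (m + 10.0 * s).max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical two-component mixture
weights = np.array([0.3, 0.7])
means = np.array([[0.0, -1.0], [2.0, 1.0]])
covs = np.array([[[1.0, 0.3], [0.3, 1.0]], [[0.5, -0.2], [-0.2, 1.5]]])
median_at_0 = conditional_quantile(0.5, 0.0, weights, means, covs)
q90_at_0 = conditional_quantile(0.9, 0.0, weights, means, covs)
```

Because every quantile is computed from the same conditional distribution, estimates for different values of $p$ are automatically ordered and never cross.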

- Key features:
  - modeling framework enables simultaneous inference for more than one quantile regression
  - model allows flexible response distributions and non-linear quantile regression relationships
- Extensions to modeling for partially observed responses (and/or covariates): fully nonparametric Tobit quantile regression for econometrics data

Data Example

- Moral hazard data on the relationship between shareholder concentration and several indices for managerial moral hazard in the form of expenditure with scope for private benefit (Yafeh & Yosha, 2003)
- data set includes a variety of variables describing 185 Japanese industrial chemical firms listed on the Tokyo Stock Exchange
- response y: index MH5, consisting of general sales and administrative expenses deflated by sales
- four-dimensional covariate vector x: Leverage (ratio of debt to total assets); log(Assets); Age of the firm; and TOPTEN (the percent of ownership held by the ten largest shareholders)

[Figure: Marginal Average Medians with 90% CI; four panels of Moral Hazard against TOPTEN, Leverage, Age, and Log(Assets).] Posterior mean and 90% interval estimates for median regression for MH5 conditional on each individual covariate. Data scatterplots are shown in grey.

[Figure: Marginal Average 90th Percentiles with 90% CI; four panels of Moral Hazard against TOPTEN, Leverage, Age, and Log(Assets).] Posterior mean and 90% interval estimates for 90th percentile regression for MH5 conditional on each individual covariate. Data scatterplots are shown in grey.

[Figure: four surface panels with axes Leverage and TOPTEN.] Posterior estimates of median surfaces (left column) and 90th percentile surfaces (right column) for MH5 conditional on Leverage and TOPTEN. The posterior mean is shown on the top row and the posterior interquartile range on the bottom.

[Figure: Conditional density for MH5; four panels with MH5 on the horizontal axis.] Posterior mean and 90% interval estimates for response densities $f(y \mid x_0; G)$ conditional on four combinations of values $x_0$ for the covariate vector (TOPTEN, Leverage, Age, log(Assets)).

5. Modeling for stock-recruitment relationships

- Relationship between the number of mature individuals of a species (stock biomass, S) and the production of offspring (recruitment, R) is fundamental to the behavior of any ecological system
- Special relevance in fisheries research, where the stock-recruitment (S-R) relationship applies directly to decision problems of fishery management
- A common way of writing this relationship is $\log(R/S) = g(S) + \varepsilon$, where $g$ is the S-R function and $\varepsilon$ are additive (typically, normal) errors
- work part of NSF project (joint with Steve Munch, Stony Brook University)

- Standard ecological assumption: as stock abundance increases, successful recruitment per individual (reproductive success) decreases
  - a wide variety of factors affect the S-R relationship, and there are many competing models for the influence of biological and physical mechanisms
  - small amounts of noisy data are typically available to infer S-R relationships
- Traditional (parametric) models may be too limited to extract the relevant information from the data, and to provide reliable predictions and/or temporal forecasts
- DP mixture modeling approach to capture the nature of recruitment dependence upon stock without making parametric assumptions for either the S-R function or the errors around it (Fronczyk, Kottas & Munch, 2008)

- DP mixture of bivariate normals for the joint distribution of log-reproductive success, $y = \log(R/S)$, and stock biomass, $x = S$:
    $f(y, x; G) = \int N_2(y, x; \mu, \Sigma) \, dG(\mu, \Sigma)$, $G \sim \mathrm{DP}(\alpha, G_0)$
- Various types of practically important inference:
  - inference for the S-R relationship through the conditional expectation functional $E(y \mid x; G)$
  - inference for log-reproductive success for any specified stock biomass value, $x_0 = S_0$, through the conditional density $f(y \mid x_0; G)$
  - inference for biological reference points through the conditional density $f(x \mid y_0; G)$ for specific log-reproductive success values $y_0$
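For reference, the three functionals listed above have closed forms once $G$ is replaced by a truncated realization $\sum_{j=1}^{J} \omega_j \delta_{(\mu_j, \Sigma_j)}$. The truncation level $J$ and the block notation $\mu_j = (\mu_j^y, \mu_j^x)$, with $\Sigma_j$ having entries $\Sigma_j^{yy}$, $\Sigma_j^{yx}$, $\Sigma_j^{xx}$, are assumptions introduced here for illustration; the expressions themselves follow from standard properties of the bivariate normal.

```latex
\begin{align*}
q_j(x_0) &= \frac{\omega_j \, N(x_0; \mu_j^{x}, \Sigma_j^{xx})}
                 {\sum_{l=1}^{J} \omega_l \, N(x_0; \mu_l^{x}, \Sigma_l^{xx})}, \\
E(y \mid x_0; G) &= \sum_{j=1}^{J} q_j(x_0)
   \left\{ \mu_j^{y} + \frac{\Sigma_j^{yx}}{\Sigma_j^{xx}} (x_0 - \mu_j^{x}) \right\}, \\
f(y \mid x_0; G) &= \sum_{j=1}^{J} q_j(x_0) \,
   N\!\left( y; \; \mu_j^{y} + \frac{\Sigma_j^{yx}}{\Sigma_j^{xx}} (x_0 - \mu_j^{x}), \;
   \Sigma_j^{yy} - \frac{(\Sigma_j^{yx})^2}{\Sigma_j^{xx}} \right), \\
f(x \mid y_0; G) &= \frac{\sum_{j=1}^{J} \omega_j \, N_2(y_0, x; \mu_j, \Sigma_j)}
                         {\sum_{j=1}^{J} \omega_j \, N(y_0; \mu_j^{y}, \Sigma_j^{yy})}.
\end{align*}
```

Evaluating these expressions for each posterior draw of $(\omega_j, \mu_j, \Sigma_j)$ gives posterior samples, and hence point and interval estimates, for each functional.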

[Figure] Cod data from six North Atlantic regions. For each region, posterior mean (blue) and 95% interval estimates (red) for the conditional mean log-reproductive success.

[Figure] Cod data. For the NE Arctic (top panels) and West of Scotland (bottom panels) regions, posterior mean (blue) and 95% interval estimates (red) for the conditional density of log-reproductive success at four specified stock biomass values.

6. Current/future work

- General framework with several potentially important applications:
  - nonparametric switching regression modeling (Taddy & Kottas, 2008)
  - modeling and inference for marked point processes over time or space (with Matt Taddy)
  - fully nonparametric regression for censored survival data
  - nonparametric regression models for multivariate ordinal responses (with Kassie Fronczyk)
  - sensitivity analysis and inversion for computer model experiments (with Marian Farah)

Contact info:
- e-mail: thanos@ams.ucsc.edu
- web: http://www.ams.ucsc.edu/ thanos
- UCSC Department of Applied Math and Statistics: www.ams.ucsc.edu
- Technical Reports series: http://www.ams.ucsc.edu/reports/trview.php

THANKS!!!