A Fully Nonparametric Modeling Approach to Binary Regression


A Fully Nonparametric Modeling Approach to Binary Regression
Maria, Department of Applied Mathematics and Statistics, University of California, Santa Cruz
SBIES, April 27-28, 2012

Outline: Simulation Example, Atmospheric Measurements, Credit Card Data

Motivation: Binary responses along with covariates arise in many settings, including biometrics, econometrics, and the social sciences. Goal: determine the relationship between response and covariates. Examples: credit scoring, medicine, population dynamics, environmental sciences. The response-covariate relationship is described by the regression function. Standard approaches impose linearity and distributional assumptions, e.g., GLMs.


Bayesian Nonparametrics: Bayesian nonparametrics can be used to relax common distributional assumptions, resulting in flexible regression models with proper uncertainty quantification. Rather than modeling the regression function directly, model the joint distribution of response and covariates with a nonparametric mixture model (West et al., 1994; Müller et al., 1996). This implies a form for the conditional response distribution, which is thus modeled nonparametrically; the approach treats the covariates as random.


Latent Variable Formulation: Introduce latent continuous random variables z that determine the binary responses y, so that y = 1 if and only if z > 0 (e.g., Albert and Chib, 1993). Estimate the joint distribution f(z, x) of latent responses and covariates using a nonparametric mixture model, to obtain flexible inference for the regression function Pr(y = 1 | x). The latent variables may themselves be of interest in some applications, since they carry more information than a 0/1 observation; in biological applications they can be interpreted as maturity, latent survivorship, or a measure of health.

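In a single mixture component, the latent-variable device above reduces to the familiar probit data augmentation: given y_i and a current mean, z_i is drawn from a normal truncated to the positive or negative half-line. A minimal sketch (the mean value below is an arbitrary illustration, not a quantity from the talk):

```python
import numpy as np
from scipy.stats import truncnorm

def draw_latent(y, mean, sd=1.0, rng=None):
    """Albert-Chib step: sample z | y from N(mean, sd^2), truncated so that
    z > 0 when y = 1 and z <= 0 when y = 0."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y)
    mean = np.broadcast_to(mean, y.shape).astype(float)
    # truncnorm takes its bounds on the standardized scale
    lo = np.where(y == 1, (0.0 - mean) / sd, -np.inf)
    hi = np.where(y == 1, np.inf, (0.0 - mean) / sd)
    return truncnorm.rvs(lo, hi, loc=mean, scale=sd, random_state=rng)

z = draw_latent(np.array([0, 1, 1, 0]), mean=0.2)
```

The sign of each sampled z_i then agrees with the observed y_i by construction.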


DP Mixture Model: The Dirichlet process (DP; Ferguson, 1973) generates random distributions, and can be used as a prior over spaces of distribution functions. DP constructive definition (Sethuraman, 1994): if G ~ DP(α, G_0), then G is almost surely of the form G = Σ_{l=1}^∞ p_l δ_{ν_l}, where ν_l ~ iid G_0 for l = 1, 2, ...; v_r ~ iid Beta(1, α) for r = 1, 2, ...; and p_1 = v_1, p_l = v_l Π_{r=1}^{l-1} (1 − v_r) for l = 2, 3, .... DP mixture model for the latent responses and covariates: f(z, x; G) = ∫ N_{p+1}(z, x; μ, Σ) dG(μ, Σ), with G | α, ψ ~ DP(α, G_0(μ, Σ; ψ)).

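The stick-breaking construction can be simulated directly; a minimal sketch, truncating at L components:

```python
import numpy as np

def stick_breaking(alpha, L, rng=None):
    """Draw truncated stick-breaking weights p_1, ..., p_L for a DP(alpha, G0)."""
    rng = np.random.default_rng(0) if rng is None else rng
    v = rng.beta(1.0, alpha, size=L)              # v_r ~ Beta(1, alpha)
    v[-1] = 1.0                                   # truncation: weights sum to 1
    pieces = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * pieces                             # p_l = v_l * prod_{r<l} (1 - v_r)

p = stick_breaking(alpha=2.0, L=50)
# p is a probability vector; atoms nu_l would be drawn iid from G0
```

Setting the last v to 1 is the standard device that makes the truncated weights sum exactly to one.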

Implied Conditional Regression: From the constructive definition, the model has an a.s. representation as a countable mixture of multivariate normals: f(z, x; G) = Σ_{l=1}^∞ p_l N_{p+1}(z, x; μ_l, Σ_l). Binary regression functional: Pr(y = 1 | x; G). Marginalizing over z yields f(x; G) = Σ_{l=1}^∞ p_l N_p(x; μ_l^x, Σ_l^xx), and the joint distribution f(y, x; G) = Σ_{l=1}^∞ p_l N_p(x; μ_l^x, Σ_l^xx) Bern(y; π_l(x)), where π_l(x) = Φ( [μ_l^z + Σ_l^zx (Σ_l^xx)^{-1} (x − μ_l^x)] / [Σ_l^zz − Σ_l^zx (Σ_l^xx)^{-1} Σ_l^xz]^{1/2} ).


The Regression Function: The implied regression function is Pr(y = 1 | x; G) = Σ_{l=1}^∞ w_l(x) π_l(x), with covariate-dependent weights w_l(x) ∝ p_l N_p(x; μ_l^x, Σ_l^xx) and probabilities π_l(x) = Φ( [μ_l^z + Σ_l^zx (Σ_l^xx)^{-1} (x − μ_l^x)] / [Σ_l^zz − Σ_l^zx (Σ_l^xx)^{-1} Σ_l^xz]^{1/2} ). Notice that the probabilities have the probit form, with component-specific intercept and slope parameters.

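With mixture parameters in hand (e.g., one posterior draw of a truncated mixture), the regression function is a normalized, weighted sum of probit probabilities. A minimal numerical sketch; the two components below are illustrative stand-ins, not values from the talk:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def binary_regression(x, p, mu, Sigma):
    """Pr(y = 1 | x) = sum_l w_l(x) pi_l(x) for a finite mixture of
    (p+1)-dim normals over (z, x); coordinate 0 of each mu_l, Sigma_l is z."""
    w, pi = np.empty(len(p)), np.empty(len(p))
    for l, (pl, m, S) in enumerate(zip(p, mu, Sigma)):
        mz, mx = m[0], m[1:]
        Szz, Szx, Sxx = S[0, 0], S[0, 1:], S[1:, 1:]
        Sxx_inv = np.linalg.inv(Sxx)
        w[l] = pl * multivariate_normal.pdf(x, mean=mx, cov=Sxx)
        cond_mean = mz + Szx @ Sxx_inv @ (x - mx)       # E[z | x] in component l
        cond_sd = np.sqrt(Szz - Szx @ Sxx_inv @ Szx)    # sd of z | x in component l
        pi[l] = norm.cdf(cond_mean / cond_sd)           # probit probability
    return (w / w.sum()) @ pi

# two illustrative components, one covariate (p = 1)
p = [0.6, 0.4]
mu = [np.array([1.0, -1.0]), np.array([-1.0, 1.0])]
Sigma = [np.array([[1.0, 0.5], [0.5, 1.0]])] * 2
prob = binary_regression(np.array([0.0]), p, mu, Sigma)
```

Each pi_l is exactly the probit form noted above, with component-specific intercept and slope implicit in the conditional mean and scale.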

Identifiability: Can the entire covariance matrix Σ be estimated? In probit regression, z ~ N(x^T β, 1): the binary responses cannot inform about the scale of the latent responses. Retaining Σ^zx is important: if it is set to 0, then π_l(x) reduces to a constant π_l. We have shown that if Σ^zz is fixed, the remaining parameters are identifiable in the kernel of the mixture model for y and x.


Facilitating Identifiability: How can we fix only one element of the covariance matrix? The usual inverse-Wishart distribution will not work. The square-root-free Cholesky decomposition of Σ uses the relationship Δ = βΣβ^T, with Δ diagonal with all elements δ_i > 0, and β lower triangular with 1s on its diagonal (Daniels and Pourahmadi, 2002; Webb and Forster, 2007). For y = (y_1, ..., y_m) ~ N(μ, Σ), with Δ = βΣβ^T, the joint distribution of y can be expressed in a recursive form: y_1 ~ N(μ_1, δ_1), and (y_k | y_1, ..., y_{k-1}) ~ N(μ_k − Σ_{j=1}^{k-1} β_{k,j}(y_j − μ_j), δ_k) for k = 2, ..., m. This is useful for modeling longitudinal data and for specifying conditional independence assumptions.

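A quick numerical check of the square-root-free Cholesky parameterization, assuming the Δ = βΣβ^T convention above (the 3×3 matrix is illustrative):

```python
import numpy as np

# illustrative 3x3 covariance matrix
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# standard Cholesky Sigma = L L^T; write L = C sqrt(D) with unit-diagonal C
L = np.linalg.cholesky(Sigma)
d = np.diag(L) ** 2                # the delta_i > 0
C = L / np.diag(L)                 # unit lower triangular (columns rescaled)
beta = np.linalg.inv(C)            # unit lower triangular as well

# Delta = beta Sigma beta^T comes out diagonal with entries delta_i
Delta = beta @ Sigma @ beta.T
assert np.allclose(Delta, np.diag(d), atol=1e-10)
assert np.allclose(np.diag(beta), 1.0)
```

Fixing δ_1 while leaving δ_2, ..., δ_{p+1} and the free elements of β random is what makes constraining a single diagonal element of Σ tractable.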

Facilitating Identifiability: Here no natural ordering is present, but the parameterization has other useful properties, which we exploit. Since δ_1 = Σ^zz, fix δ_1 and mix on δ_2, ..., δ_{p+1} and the p(p+1)/2 free elements of β, collected in a vector. Then the DP mixture model becomes f(z, x; G) = ∫ N_{p+1}(z, x; μ, β^{-1} Δ β^{-T}) dG(μ, β, Δ). This is computationally convenient: there exist conjugate prior distributions for β and δ_2, ..., δ_{p+1}, namely multivariate normal and (independent) inverse-gamma.



Hierarchical Model: Blocked Gibbs sampler: truncate G to G_N(·) = Σ_{l=1}^N p_l δ_{W_l}(·), with W_l = (μ_l, β_l, Δ_l), and introduce configuration variables (L_1, ..., L_n) taking values in {1, ..., N}:
y_i | z_i ~ ind. 1(y_i = 1) 1(z_i > 0) + 1(y_i = 0) 1(z_i ≤ 0), i = 1, ..., n
(z_i, x_i) | W, L_i ~ ind. N_{p+1}((z_i, x_i); μ_{L_i}, β_{L_i}^{-1} Δ_{L_i} β_{L_i}^{-T}), i = 1, ..., n
L_i | p ~ iid Σ_{l=1}^N p_l δ_l(L_i), i = 1, ..., n
W_l | ψ ~ ind. N_{p+1}(μ_l; m, V) N_q(β̃_l; θ, cI) Π_{i=2}^{p+1} IG(δ_{i,l}; ν_i, s_i), l = 1, ..., N

Gibbs sampling may be used to simulate from the full posterior p(W, L, p, ψ, α, z | data), with the conditionally conjugate base distribution and conjugate priors on ψ and α. The posterior for G_N = (p, W) is imputed within the MCMC, enabling full inference for any functional of f(z, x; G_N), which is now a finite sum. Binary regression functional: for any covariate value x_0, at iteration r of the MCMC, compute Pr(y = 1 | x_0; G_N^{(r)}); this provides a point estimate and uncertainty quantification for the regression function. The same can be done for other functionals, such as the latent response distribution f(z | x_0; G_N) at any covariate value x_0.



Simulated Data: Data {(z_i, x_i) : i = 1, ..., n} were simulated from a mixture of 3 bivariate normals, and y was determined from z. We compare inference from the binary regression model, which sees data (y, x), to that from a model which views (z, x) as data. A practical prior specification approach, appropriate when little is known about the problem, is applied here: to specify priors on ψ, consider only one mixture component and use an approximate center and range of the data, together with prior simulation to induce an approximate Unif(−1, 1) prior on corr(z, x).

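The data-generating setup can be sketched as follows; the component weights, means, and covariances below are illustrative stand-ins, not the values used in the talk:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# illustrative 3-component bivariate normal mixture over (z, x)
weights = np.array([0.4, 0.35, 0.25])
means = np.array([[-1.5, -1.0], [0.5, 1.0], [2.0, 3.0]])
covs = np.array([[[1.0, 0.6], [0.6, 1.0]],
                 [[0.8, -0.3], [-0.3, 1.2]],
                 [[0.5, 0.2], [0.2, 0.7]]])

comp = rng.choice(3, size=n, p=weights)                       # component labels
zx = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comp])
z, x = zx[:, 0], zx[:, 1]
y = (z > 0).astype(int)   # binary response determined by the latent z
```

The binary regression model is then fit to (y, x) only, while the comparison model gets to see (z, x) directly.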

Pr(z>0 x;g) 0.0 0.2 0.4 0.6 0.8 1.0 Pr(y=1 x;g) 0.0 0.2 0.4 0.6 0.8 1.0 2 0 2 4 x 2 0 2 4 x The inference for pr(z > 0 x; G) (left) is compared to that for pr(y = 1 x; G) (right) and the truth (solid line).

Figure: Top row: inference for f(z | x_0; G) under the model which views z as observed, with true densities as dashed lines, at 3 values of x_0. Bottom row: inference from the binary regression model.


Ozone and Wind Speed: 111 daily measurements of wind speed (mph) and ozone concentration (parts per billion) in New York City over a 4-month period. Objective: model the probability of exceeding a certain ozone concentration as a function of wind speed. The model only sees whether or not there was an exceedance, but an actual ozone concentration underlies this 0/1 value.

Figure: Left: the probability that ozone concentration (parts per billion) exceeds a threshold of 70 decreases with wind speed (mph). Right: for comparison, the actual non-discretized ozone measurements as a function of wind speed.

Figure: Estimates of f(z | x_0; G) at wind speed values of 5, 8, 10, and 15 mph.


Credit Cards and Income: n = 100 subjects in a study were asked whether or not they owned a travel credit card, and their income was recorded (Agresti, 1996). In this setting it is not clear that the latent continuous random variables have a meaningful interpretation, but the method can still be used for regression. Does the probability of owning a credit card change with income?

Pr(y=1 x;g) 0.0 0.2 0.4 0.6 0.8 1.0 10 20 30 40 50 60 70 income in thousands Probability of owning a credit card appears to increase with income, with a slight dip or leveling off around income of 40-50, since all subjects in that region did not own a credit card.

Extensions to Ordinal Responses: Similar methodology, with a wider range of applications. For an ordinal response with C categories, assume y = j if and only if γ_{j-1} < z ≤ γ_j, for j = 1, ..., C, and apply the same DP mixture of multivariate normals to (z, x). For fixed cut-off points γ, it can be shown that all of μ and Σ are identifiable in the induced kernel for the observables. The C − 1 free cut-off points can be fixed to arbitrary increasing values (Kottas et al., 2005), which is an advantage computationally.

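The cut-off mapping from latent z to an ordinal y can be written in one line with np.searchsorted; the cut-off values below are the kind of arbitrary increasing values the slide describes:

```python
import numpy as np

def ordinalize(z, gamma):
    """Map latent z to y in {1, ..., C} via y = j iff gamma_{j-1} < z <= gamma_j,
    where gamma lists the C - 1 interior cut-offs in increasing order."""
    return np.searchsorted(np.asarray(gamma), z, side="left") + 1

gamma = [-1.0, 0.0, 1.5]              # C = 4 categories
z = np.array([-2.0, -0.5, 0.7, 3.0])
y = ordinalize(z, gamma)              # -> array([1, 2, 3, 4])
```

side="left" puts a value exactly equal to a cut-off into the lower category, matching the "z ≤ γ_j" convention.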

Other Extensions: Multivariate ordinal responses: J ordinal responses associated with a vector of covariates for each subject, with C_j categories for the jth response. There are many applications, but limited existing methods for flexible inference. Here y and z are vectors, and y_j = l if and only if γ_{j,l-1} < z_j ≤ γ_{j,l}, for j = 1, ..., J and l = 1, ..., C_j. If C_j > 2 for all j, then no identifiability restrictions are needed; if C_j = 2 for some j, then the (β, Δ) parameterization can be used, and fixing certain elements of δ provides the necessary restrictions. Mixed ordinal-continuous responses can also be handled.


Conclusions: Binary responses measured along with covariates represent a simple setting, but the scope of problems in this category is large. This framework allows flexible, nonparametric inference for the regression relationship in a general binary regression problem. The methodology extends easily to larger classes of problems in ordinal regression, including multivariate and mixed responses, making the framework considerably more powerful, with utility in a wide variety of applications.