Transformations and Bayesian Density Estimation

Transformations and Bayesian Density Estimation
Andrew Bean (bean.243@osu.edu), Steve MacEachern, Xinyi Xu
The Ohio State University
10th Conference on Bayesian Nonparametrics, June 25, 2015
Transformations and Bayesian Density Estimation BNP10 2015

Distribution of Body Mass Index in Ohio
The Ohio Family Health Survey gathered (self-reported) Body Mass Index measurements on 48,885 Ohio adults. An adult whose BMI is at least 30 is defined as obese by the U.S. Centers for Disease Control and Prevention (CDC).

1 Classical Density Estimation
   Difficulties in capturing tail behavior
   Suggested remedies
   Transformation kernel density-estimation
2 Transformations in Bayesian Density Estimation
   DPM models for density estimation
   Role of transformations
   Two families of transformations
   A proposed transformation routine
3 Simulation study
   Design
   Results
4 Conclusions

Estimating Skewed and Heavy-Tailed Densities
Kernel density estimates can be ineffective when the true distribution is skewed and/or heavy-tailed. Estimation is especially difficult in regions where the data are sparse.
[Figure panels: sample of size n = 100 from $t_2$; sample from SN(0, 1, 10).]

Classical Remedies
Some strategies from the classical density estimation literature:
- Variable (adaptive) kernel density estimates (Terrell and Scott, 1992):
  $$\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h(x_i)} K\!\left( \frac{x - x_i}{h(x_i)} \right)$$
- Combining nonparametric estimates of the body of the distribution with parametric estimates of the tail behavior (cf. Markovitch and Krieger, 2002).
- Transformation density estimation, which we will discuss in detail.
  - Parts of the range of X are stretched or compressed by a non-linear transformation.
  - Sparse regions are treated differently from the body of the density.
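
To make the variable-bandwidth formula concrete, here is a minimal sketch in Python. The slide does not say how $h(x_i)$ is chosen; the square-root law used here (bandwidth inversely proportional to the square root of a pilot density estimate) and the names `fixed_kde`, `adaptive_kde`, and `h0` are assumptions made for illustration only.

```python
import math
import random

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(sample, h):
    """Ordinary Gaussian KDE with one global bandwidth h."""
    n = len(sample)
    return lambda t: sum(normal_pdf((t - s) / h) for s in sample) / (n * h)

def adaptive_kde(sample, h0):
    """Sample-point adaptive KDE: each x_i gets its own bandwidth h(x_i)."""
    n = len(sample)
    pilot = fixed_kde(sample, h0)
    pilot_vals = [pilot(s) for s in sample]
    # Log of the geometric mean of the pilot density at the data points.
    log_gm = sum(math.log(p) for p in pilot_vals) / n
    # ASSUMPTION: square-root law, h(x_i) = h0 * (pilot(x_i) / geometric mean)^(-1/2).
    bandwidths = [h0 * math.exp(-0.5 * (math.log(p) - log_gm)) for p in pilot_vals]

    def f_hat(t):
        return sum(normal_pdf((t - s) / h) / h for s, h in zip(sample, bandwidths)) / n

    return f_hat, bandwidths

# Illustrative data: a normal body plus a few hand-placed outliers.
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(95)] + [8.0, 11.0, -9.0, 15.0, -14.0]
f_hat, h = adaptive_kde(x, h0=0.5)
```

Under this rule, observations in sparse regions receive wider kernels than observations in the body of the data, which is the behavior the formula is meant to deliver.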

A transformation density-estimation strategy
Recipe due to Wand, Marron, and Ruppert (1991), Yang and Marron (1999), and others:
1. Choose a parametric family of transformations $\{g_\lambda : X \to Y,\ \lambda \in \Lambda\}$ and a method for density estimation.
2. Specify a criterion for evaluating density estimates (e.g. integrated squared error, Kullback-Leibler, etc.).
3. For a candidate transformation $\lambda_0$, use the transformed sample $\{Y_i = g_{\lambda_0}(X_i)\}_{i=1}^{n}$ to obtain a density estimate $\hat f_Y(y, \lambda_0)$.
4. Back-transform by the change-of-variables formula $\hat f_X(x, \lambda_0) = \hat f_Y\big(g_{\lambda_0}(x), \lambda_0\big)\, g'_{\lambda_0}(x)$.
5. Search through $\Lambda$ to find the optimal transformation $\hat\lambda$ according to the chosen criterion.
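
Steps 3 and 4 of the recipe can be sketched for a single fixed transformation. This is a rough illustration, not the authors' implementation: it assumes positive data, takes $g = \log$ as the candidate transformation, and uses a Gaussian KDE with a normal-reference bandwidth. All function names are hypothetical.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Fixed-bandwidth Gaussian kernel density estimate, returned as a function."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return lambda t: norm * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in sample)

def normal_reference_bandwidth(sample):
    """ASSUMPTION: rule-of-thumb bandwidth 1.06 * sd * n^(-1/5)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

def transformation_kde(x, g, g_prime):
    """Estimate f_X by a KDE on y = g(x), then back-transform with the Jacobian."""
    y = [g(v) for v in x]                                # step 3: transform the sample
    f_y = gaussian_kde(y, normal_reference_bandwidth(y))
    return lambda t: f_y(g(t)) * abs(g_prime(t))         # step 4: change of variables

# Illustrative right-skewed data; log is the assumed candidate transformation.
random.seed(0)
x = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
f_x = transformation_kde(x, math.log, lambda t: 1.0 / t)
print(f_x(1.0))
```

Step 5 would wrap `transformation_kde` in a search over the transformation parameter, scoring each candidate with the criterion chosen in step 2.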

Yang and Marron's (1999) Iterative Procedure

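Griffin's DPM model for density estimation
Griffin (2010) suggests the following model:
$$y_i \mid \mu_i, \zeta_i \overset{\text{ind}}{\sim} N\!\left( \mu_i,\ a\, \frac{\zeta_i}{\mu_\zeta}\, \sigma^2 \right)$$
$$(\mu_i, \zeta_i) \mid G \overset{\text{iid}}{\sim} G$$
$$G \sim \mathrm{DP}(M G_0), \qquad G_0(\mu, \zeta) = N\big(\mu \mid \mu_0, (1 - a)\sigma^2\big)\, \Gamma\big(\zeta^{-1} \mid \phi, 1\big)$$
with
$$\mu_0 \sim N(\mu_{00}, \lambda_0^{-1}), \qquad \sigma^{-2} \sim \Gamma(s_0, s_1), \qquad a \sim \mathrm{Beta}(a_0, a_1).$$
We take M to be fixed.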

Comparing DPM density estimates with KDEs
Griffin's DPM model produces better estimates than the kernel density estimation procedure.
Hellinger distance of each density estimate:
Density estimate     Sample of size n = 100 from t_2   Sample from SN(0, 1, 10)
KDE                  0.0137                            0.0179
DPM (normal base)    0.0058                            0.0096
DPM (t base)         0.0055                            0.0094
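
For readers who want to reproduce this kind of comparison, a Hellinger distance between two densities can be approximated numerically. The slide does not state which convention it uses (the distance or its square, with or without the factor 1/2). The sketch below assumes $H(f, g) = \sqrt{\tfrac{1}{2}\int (\sqrt{f} - \sqrt{g})^2}$, and the function names are hypothetical.

```python
import math

def hellinger_distance(f, g, lo, hi, steps=20000):
    """ASSUMED convention: H = sqrt(0.5 * integral of (sqrt f - sqrt g)^2).

    The integral is approximated by the midpoint rule on [lo, hi], so the
    interval must cover essentially all of the mass of both densities.
    """
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * width
        total += (math.sqrt(f(t)) - math.sqrt(g(t))) ** 2
    return math.sqrt(0.5 * total * width)

def normal_pdf(t, mean=0.0, sd=1.0):
    u = (t - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))

# Check against the closed form for two unit-variance normals one unit apart:
# H^2 = 1 - exp(-1/8).
h = hellinger_distance(normal_pdf, lambda t: normal_pdf(t, mean=1.0), -12.0, 13.0)
print(h, math.sqrt(1.0 - math.exp(-0.125)))
```

In practice `f` would be the true simulation density and `g` a fitted estimate such as a KDE or a posterior mean density.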

Role of transformations in our analysis
We view the transformation as a data pre-processing step, and estimate the density conditional on the transformation.
- This approach uses the data to estimate the transformation before specifying the prior.
- Although there is uncertainty associated with estimating an optimal transformation, we do not model that uncertainty as part of a larger Bayesian framework.
We believe conditioning on the estimated transformation will be effective because:
- There is far more uncertainty in the density estimation part of the problem; estimates of the transformation are relatively stable.
- The DPM models we've specified are quite flexible.

Yeo-Johnson transformations
Yeo and Johnson (2000) propose a family of transformations closely related to the Box-Cox power transformation:
$$\phi_{YJ}(y; \lambda) = \begin{cases} \dfrac{(y+1)^{\lambda} - 1}{\lambda} & y \ge 0,\ \lambda \ne 0 \\ \log(y+1) & y \ge 0,\ \lambda = 0 \\ -\dfrac{(-y+1)^{2-\lambda} - 1}{2-\lambda} & y < 0,\ \lambda \ne 2 \\ -\log(-y+1) & y < 0,\ \lambda = 2 \end{cases}$$
- $\phi_{YJ}(y; \lambda)$ is continuously differentiable in both arguments
- Symmetry property: $\phi_{YJ}(-y; 2 - \lambda) = -\phi_{YJ}(y; \lambda)$
- Effective at correcting skewness
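
The four cases translate directly into code. The sketch below is a plain-Python illustration (the function name `yeo_johnson` and the tolerance for detecting the limiting cases are choices made here), followed by a numerical check of the symmetry property.

```python
import math

def yeo_johnson(y, lam, tol=1e-12):
    """Yeo-Johnson transformation of a single value y with parameter lam."""
    if y >= 0.0:
        if abs(lam) < tol:                      # limiting case lam = 0
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < tol:                    # limiting case lam = 2
        return -math.log1p(-y)
    return -((-y + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)

# lam = 1 leaves the data unchanged; lam < 1 compresses the right tail.
print(yeo_johnson(3.0, 1.0), yeo_johnson(3.0, 0.0))

# Symmetry property: phi(-y; 2 - lam) = -phi(y; lam).
for y, lam in [(1.5, 0.5), (-0.7, 1.3), (2.0, 0.0)]:
    assert abs(yeo_johnson(-y, 2.0 - lam) + yeo_johnson(y, lam)) < 1e-12
```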

T-cdf transformation
To correct heavy tails, we propose using a simple cdf transformation which maps $t_\nu$ distributions to standard normals. With $\Phi$ and $T_\nu$ as cdfs of a standard normal and a Student-$t_\nu$ distribution, respectively, we set
$$\phi(y; \xi, \tau, \nu) = \Phi^{-1}\!\left( T_\nu\!\left( \frac{y - \xi}{\tau} \right) \right).$$
- Effective at correcting heavy tails
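
A standard-library sketch of this transformation follows. It is an illustration under stated assumptions: the $t_\nu$ tail probability is computed by Simpson's rule after the substitution $u = \sqrt{\nu}\tan\theta$, which is only well behaved for $\nu \ge 1$, and accuracy degrades in the extreme tails. The names `t_upper_tail` and `t_to_normal` are hypothetical.

```python
import math
from statistics import NormalDist

def t_upper_tail(z, nu, steps=1000):
    """P(T_nu > z) for z >= 0 and nu >= 1 (steps must be even).

    With u = sqrt(nu) * tan(theta), the t density integrates to
    C * integral of cos(theta)^(nu - 1), C = Gamma((nu+1)/2) / (Gamma(nu/2) sqrt(pi)).
    """
    theta_max = math.atan(z / math.sqrt(nu))
    h = theta_max / steps
    total = 1.0 + math.cos(theta_max) ** (nu - 1.0)   # Simpson endpoint terms
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * math.cos(k * h) ** (nu - 1.0)
    const = math.gamma((nu + 1.0) / 2.0) / (math.gamma(nu / 2.0) * math.sqrt(math.pi))
    return 0.5 - const * total * h / 3.0

def t_to_normal(y, xi, tau, nu):
    """phi(y; xi, tau, nu) = Phi^{-1}(T_nu((y - xi) / tau)), computed via the tail."""
    z = (y - xi) / tau
    tail = max(t_upper_tail(abs(z), nu), 1e-300)      # guard against rounding to <= 0
    q = -NormalDist().inv_cdf(tail)                   # by symmetry, Phi^{-1}(1 - tail)
    return q if z >= 0.0 else -q

# Large values are pulled in strongly: under t_2, the point 10 maps to roughly 2.3.
print(t_to_normal(3.0, 0.0, 1.0, 2.0), t_to_normal(10.0, 0.0, 1.0, 2.0))
```

Working with the upper tail rather than the cdf itself avoids passing a probability that has rounded to exactly 1 into the normal quantile function.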

Estimating transformations
The transformation parameters may be estimated as follows:
- To estimate the Yeo-Johnson transformation parameter $\lambda$, maximize
  $$-\frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \big( \phi(x_i; \lambda) - \mu \big)^2 + (\lambda - 1) \sum_{i=1}^{n} \operatorname{sgn}(x_i) \log\big( |x_i| + 1 \big).$$
- Estimating the cdf transformation amounts to estimating a three-parameter t model. We do this by maximizing the t likelihood
  $$\prod_{i=1}^{n} \left[ \frac{\Gamma\!\left( \frac{\nu+1}{2} \right)}{\Gamma\!\left( \frac{\nu}{2} \right) \sqrt{\pi\nu}\, \tau} \left( 1 + \frac{1}{\nu} \left( \frac{x_i - \xi}{\tau} \right)^2 \right)^{-\frac{\nu+1}{2}} \right].$$
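
One way to carry out the Yeo-Johnson maximization is to profile out $\mu$ and $\sigma^2$ (their maximizers for fixed $\lambda$ are the mean and variance of the transformed data) and search over $\lambda$. The grid search, its range, and the function names below are assumptions made for this sketch; the slides do not say which optimizer was used.

```python
import math
import random

def yeo_johnson(y, lam, tol=1e-12):
    if y >= 0.0:
        return math.log1p(y) if abs(lam) < tol else ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < tol:
        return -math.log1p(-y)
    return -((-y + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def profile_loglik(x, lam):
    """The log-likelihood above with mu and sigma^2 set to their maximizers."""
    n = len(x)
    z = [yeo_johnson(v, lam) for v in x]
    mu = sum(z) / n
    sigma2 = sum((v - mu) ** 2 for v in z) / n
    jacobian = (lam - 1.0) * sum(math.copysign(1.0, v) * math.log1p(abs(v)) for v in x)
    # Plugging in mu and sigma2 turns the quadratic term into the constant -n/2.
    return -0.5 * n * math.log(sigma2) - 0.5 * n + jacobian

def estimate_lambda(x, grid=None):
    """ASSUMPTION: simple grid search over lam in [-2, 4] with step 0.05."""
    if grid is None:
        grid = [-2.0 + 0.05 * k for k in range(121)]
    return max(grid, key=lambda lam: profile_loglik(x, lam))

# Illustrative data for which log(x + 1) is normal, so lam near 0 is expected.
random.seed(0)
x = [math.exp(random.gauss(2.0, 0.5)) - 1.0 for _ in range(1000)]
print(estimate_lambda(x))
```

The three-parameter t likelihood can be handled the same way with a numerical optimizer over $(\xi, \tau, \nu)$; that part is not sketched here.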

An adaptive transformation routine
Round 1:
- Given sample x and a KDE $\hat f_X$, calculate
  $$\hat L_0 = \hat\sigma_x \left[ \int \big( \hat f_X''(x) \big)^2 \, dx \right]^{1/5}.$$
- Apply both transformations to x; with each of the two candidate transformed samples y, compute
  $$\hat L(y) = \hat\sigma_y \left[ \int \big( \hat f_Y''(y) \big)^2 \, dy \right]^{1/5}.$$
- Select the transformation giving the greatest reduction in $\hat L$.
Round 2+:
- Continue until neither transformation achieves more than a 5% reduction in $\hat L$.
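
The criterion $\hat L$ can be computed directly for a Gaussian KDE, whose second derivative is available in closed form. The sketch below assumes a normal-reference bandwidth and a midpoint-rule integral on a fixed grid; neither choice comes from the slides, and the function names are hypothetical.

```python
import math
import random

def sample_sd(x):
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))

def roughness_criterion(x, grid_size=1000):
    """L-hat = sd * [ integral of (f-hat''(t))^2 dt ]^(1/5) for a Gaussian KDE f-hat."""
    n = len(x)
    sd = sample_sd(x)
    h = 1.06 * sd * n ** (-0.2)            # ASSUMPTION: normal-reference bandwidth
    lo, hi = min(x) - 5.0 * h, max(x) + 5.0 * h
    step = (hi - lo) / grid_size
    const = 1.0 / (n * h ** 3 * math.sqrt(2.0 * math.pi))
    integral = 0.0
    for k in range(grid_size):
        t = lo + (k + 0.5) * step
        # Second derivative of the KDE: (1 / (n h^3)) * sum of (u^2 - 1) * normal_pdf(u).
        d2 = const * sum((u * u - 1.0) * math.exp(-0.5 * u * u)
                         for u in ((t - v) / h for v in x))
        integral += d2 * d2 * step
    return sd * integral ** 0.2

# Compare a right-skewed sample with its log-transformed version.
random.seed(0)
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
logged = [math.log(v) for v in skewed]
print(roughness_criterion(skewed), roughness_criterion(logged))
```

Because the bandwidth and grid scale with the data, this $\hat L$ is unchanged when the sample is rescaled, so a reduction reflects a change in shape rather than in spread. One round of the routine would evaluate `roughness_criterion` on each candidate transformed sample and keep the transformation with the smallest value.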

Simulation design
To assess the effectiveness of our method, we simulate from two-piece densities described in Rubio and Steel (2014):
$$g(x \mid \mu, \sigma_1, \sigma_2) = \frac{2}{\sigma_1 + \sigma_2} \left[ f\!\left( \frac{x - \mu}{\sigma_1} \right) I\big( x \in (-\infty, \mu) \big) + f\!\left( \frac{x - \mu}{\sigma_2} \right) I\big( x \in (\mu, \infty) \big) \right].$$
In the simulations, $\mu = 0$ and $\sigma_1 = 1$ are fixed, while $\sigma_2$ and the form of f are allowed to vary.
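
A two-piece density is easy to evaluate and to sample from when f is symmetric about zero: the left piece carries mass $\sigma_1 / (\sigma_1 + \sigma_2)$, and within each piece the distance from $\mu$ is distributed as the scale times $|Z|$ with $Z \sim f$. The sketch below uses a standard normal f as the default; the function names are hypothetical, and the specific $\sigma_2$ value is chosen only for illustration.

```python
import math
import random

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def two_piece_pdf(x, mu, sigma1, sigma2, f=normal_pdf):
    """Two-piece density: scale sigma1 to the left of mu, sigma2 to the right."""
    scale = sigma1 if x < mu else sigma2
    return 2.0 / (sigma1 + sigma2) * f((x - mu) / scale)

def two_piece_sample(n, mu, sigma1, sigma2, draw_f=lambda: random.gauss(0.0, 1.0)):
    """Draw n values; draw_f must sample from the symmetric base density f."""
    p_left = sigma1 / (sigma1 + sigma2)
    out = []
    for _ in range(n):
        magnitude = abs(draw_f())
        if random.random() < p_left:
            out.append(mu - sigma1 * magnitude)
        else:
            out.append(mu + sigma2 * magnitude)
    return out

# mu = 0 and sigma1 = 1 as in the design; sigma2 = 3 is an illustrative choice.
random.seed(0)
x = two_piece_sample(1000, 0.0, 1.0, 3.0)
print(sum(v < 0.0 for v in x) / len(x))
```

Passing a sampler for a Student-t base through `draw_f` (and the matching density through `f`) gives a heavy-tailed, skewed variant.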

A Dirichlet-process location mixture model
In the simulations, we will also consider the model
$$y_i \mid \mu_i, \sigma \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2)$$
$$\mu_i \mid G \overset{\text{iid}}{\sim} G$$
$$1/\sigma \sim \Gamma(a, b)$$
$$G \sim \mathrm{DP}(M G_0),$$
with $G_0 = N(m_0, s_0^2)$.
Both models can be fit with standard Gibbs samplers.

Simulation results

Conclusions
Despite the flexibility of DPM models for density estimation, a little effort in selecting a good pre-transformation can go a long way for performance.
Future directions:
- More complex settings: DPMs embedded in hierarchical models.
- Comparison to a fully Bayes approach, unifying transformation and density estimation in a single framework.
Thank you!