Transformations and Bayesian Density Estimation


Transformations and Bayesian Density Estimation
Andrew Bean (bean.243@osu.edu), Steve MacEachern, Xinyi Xu
The Ohio State University
10th Conference on Bayesian Nonparametrics, June 25, 2015


Distribution of Body Mass Index in Ohio
The Ohio Family Health Survey gathered (self-reported) Body Mass Index measurements on 48,885 Ohio adults. An adult whose BMI is at least 30 is classified as obese by the U.S. Centers for Disease Control and Prevention (CDC).

1 Classical Density Estimation
  Difficulties in capturing tail behavior; suggested remedies; transformation kernel density estimation
2 Transformations in Bayesian Density Estimation
  DPM models for density estimation; role of transformations; two families of transformations; a proposed transformation routine
3 Simulation study
  Design; results
4 Conclusions

Table of Contents
1 Classical Density Estimation
  Difficulties in capturing tail behavior; suggested remedies; transformation kernel density estimation
2 Transformations in Bayesian Density Estimation
3 Simulation study
4 Conclusions

Estimating Skewed and Heavy-Tailed Densities
Kernel density estimates can be ineffective when the true distribution is skewed and/or heavy-tailed. Estimation is especially difficult in regions where the data are sparse.
[Figures: a sample of size n = 100 from a t_2 distribution; a sample from SN(0, 1, 10).]

Classical Remedies
Some strategies from the classical density estimation literature:
- Variable (adaptive) kernel density estimates (Terrell and Scott, 1992):
$$\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h(x_i)} \, K\!\left(\frac{x - x_i}{h(x_i)}\right)$$
- Combining nonparametric estimates of the body of the distribution with parametric estimates of the tail behavior (cf. Markovitch and Krieger, 2002).
- Transformation density estimation, which we will discuss in detail: parts of the range of X are stretched or compressed by a non-linear transformation, so sparse regions are treated differently from the body of the density.
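The variable-bandwidth estimator above can be sketched in a few lines. This is a minimal illustration, not Terrell and Scott's exact estimator: the Gaussian kernel, the Silverman pilot bandwidth, and the Abramson-style sensitivity exponent `alpha` are choices assumed here for illustration.

```python
import numpy as np

def variable_kde(x_grid, data, alpha=0.5):
    """Sample-point adaptive KDE: each x_i gets its own bandwidth h(x_i),
    smaller where the data are dense and larger where they are sparse."""
    data = np.asarray(data, float)
    n = data.size
    # Pilot fixed-bandwidth Gaussian KDE (Silverman's rule), evaluated at the data points.
    h0 = 1.06 * data.std() * n ** (-0.2)
    pilot = np.exp(-0.5 * ((data[:, None] - data[None, :]) / h0) ** 2)
    pilot = pilot.mean(axis=1) / (h0 * np.sqrt(2 * np.pi))
    # Local bandwidths h(x_i) proportional to pilot(x_i)^(-alpha), normalized by the geometric mean.
    g = np.exp(np.mean(np.log(pilot)))
    h = h0 * (pilot / g) ** (-alpha)
    # f_hat(x) = (1/n) * sum_i K((x - x_i) / h(x_i)) / h(x_i), with Gaussian kernel K.
    u = (np.asarray(x_grid, float)[:, None] - data[None, :]) / h[None, :]
    return np.mean(np.exp(-0.5 * u ** 2) / (h[None, :] * np.sqrt(2 * np.pi)), axis=1)
```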

A transformation density-estimation strategy
Recipe due to Wand, Marron, and Ruppert (1991); Yang and Marron (1999); and others:
1 Choose a parametric family of transformations $\{g_\lambda : \mathcal{X} \to \mathcal{Y},\ \lambda \in \Lambda\}$ and a method for density estimation.
2 Specify a criterion for evaluating density estimates (e.g. integrated squared error, Kullback-Leibler divergence).
3 For a candidate transformation $\lambda_0$, use the transformed sample $\{Y_i = g_{\lambda_0}(X_i)\}_{i=1}^{n}$ to obtain a density estimate $\hat f_Y(y; \lambda_0)$.
4 Back-transform using $\hat f_X(x; \lambda_0) = \hat f_Y\!\left(g_{\lambda_0}(x); \lambda_0\right) \, g_{\lambda_0}'(x)$.
5 Search through $\Lambda$ to find the optimal transformation $\hat\lambda$ according to the chosen criterion.
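Steps 3-4 are short in code. A minimal sketch, assuming the Yeo-Johnson family (introduced later in the talk) as $g_\lambda$ and a Gaussian KDE as the density estimator; the function name and grid handling are illustrative only.

```python
import numpy as np
from scipy import stats

def transformation_kde(x_grid, data, lam):
    """Estimate f_X by (i) transforming the data with g_lambda (Yeo-Johnson),
    (ii) running an ordinary KDE on the transformed scale, and
    (iii) back-transforming: f_X(x) = f_Y(g_lambda(x)) * g_lambda'(x)."""
    y = stats.yeojohnson(np.asarray(data, float), lmbda=lam)   # Y_i = g_lambda(X_i)
    f_y = stats.gaussian_kde(y)
    x_grid = np.asarray(x_grid, float)
    y_grid = stats.yeojohnson(x_grid, lmbda=lam)
    # Jacobian of the Yeo-Johnson map: g'(x) = (|x| + 1)^(sgn(x) * (lambda - 1)).
    jac = (np.abs(x_grid) + 1.0) ** (np.sign(x_grid) * (lam - 1.0))
    return f_y(y_grid) * jac

# Step 5 would loop this over a grid of candidate lambdas and keep the one
# that optimizes the chosen criterion.
```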

Yang and Marron's (1999) Iterative Procedure
[Figure slides: illustration of the iterative procedure.]

Table of Contents
1 Classical Density Estimation
2 Transformations in Bayesian Density Estimation
  DPM models for density estimation; role of transformations; two families of transformations; a proposed transformation routine
3 Simulation study
4 Conclusions

Griffin's DPM model for density estimation
Griffin (2010) suggests the following model:
$$y_i \mid \mu_i, \zeta_i \overset{\text{ind}}{\sim} N\!\left(\mu_i,\ a\,\frac{\zeta_i}{\mu_\zeta}\,\sigma^2\right), \qquad (\mu_i, \zeta_i) \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(M G_0),$$
with
$$G_0(\mu, \zeta) = N(\mu \mid \mu_0, (1-a)\sigma^2)\,\Gamma(\zeta^{-1} \mid \phi, 1)$$
and
$$\mu_0 \sim N(\mu_{00}, \lambda_0^{-1}), \qquad \sigma^2 \sim \Gamma(s_0, s_1), \qquad a \sim \mathrm{Beta}(a_0, a_1).$$
We take M to be fixed.
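For intuition about the DPM structure, a draw from a Dirichlet process mixture prior can be simulated with a truncated stick-breaking representation. The sketch below is a generic normal location mixture, not Griffin's exact base measure; `mu0`, `s0`, `sigma`, and the truncation level are illustrative placeholders.

```python
import numpy as np

def dpm_prior_draw(n, M=1.0, mu0=0.0, s0=1.0, sigma=1.0, trunc=200, seed=0):
    """Draw n observations from a truncated stick-breaking representation of a
    Dirichlet process mixture of normals (generic location mixture, for intuition)."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, M, size=trunc)                            # stick-breaking fractions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # mixture weights
    mu = rng.normal(mu0, s0, size=trunc)                        # atoms drawn from G_0
    z = rng.choice(trunc, size=n, p=w / w.sum())                # component labels
    return rng.normal(mu[z], sigma)                             # observations

y = dpm_prior_draw(100, M=1.0)
```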

Comparing DPM density estimates with KDEs
Griffin's DPM model produces better estimates than the kernel density estimation procedure.

Sample of size n = 100 from t_2 (Hellinger distance):
  KDE                  0.0137
  DPM (normal base)    0.0058
  DPM (t base)         0.0055

Sample from SN(0, 1, 10) (Hellinger distance):
  KDE                  0.0179
  DPM (normal base)    0.0096
  DPM (t base)         0.0094
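Distances of this kind can be reproduced in spirit with a simple numerical integration. A minimal sketch, assuming the squared-Hellinger convention with the 1/2 factor and a trapezoidal grid approximation; the grid and the t_2 example data are illustrative, not the talk's actual simulation.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def hellinger(est_pdf, true_pdf, grid):
    """Squared Hellinger distance 0.5 * int (sqrt(f) - sqrt(g))^2 dx, on a grid."""
    diff = (np.sqrt(est_pdf(grid)) - np.sqrt(true_pdf(grid))) ** 2
    return 0.5 * trapezoid(diff, grid)

grid = np.linspace(-15.0, 15.0, 2001)
data = stats.t(df=2).rvs(size=100, random_state=1)
print(hellinger(stats.gaussian_kde(data), stats.t(df=2).pdf, grid))
```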

Role of transformations in our analysis
We view the transformation as a data pre-processing step and estimate the density conditional on the transformation. This approach uses the data to estimate the transformation before specifying the prior. Although there is uncertainty associated with estimating an optimal transformation, we do not model that uncertainty as part of a larger Bayesian framework.
We believe conditioning on the estimated transformation will be effective because:
- There is far more uncertainty in the density estimation part of the problem; estimates of the transformation are relatively stable.
- The DPM models we've specified are quite flexible.

Yeo-Johnson transformations
Yeo and Johnson (2000) propose a family of transformations closely related to the Box-Cox power transformation:
$$\varphi_{YJ}(y;\lambda) = \begin{cases} \dfrac{(y+1)^{\lambda} - 1}{\lambda} & y \ge 0,\ \lambda \ne 0,\\[4pt] \log(y+1) & y \ge 0,\ \lambda = 0,\\[4pt] -\dfrac{(1-y)^{2-\lambda} - 1}{2-\lambda} & y < 0,\ \lambda \ne 2,\\[4pt] -\log(1-y) & y < 0,\ \lambda = 2. \end{cases}$$
- $\varphi_{YJ}(y;\lambda)$ is continuously differentiable in both arguments.
- Symmetry property: $\varphi_{YJ}(-y; 2-\lambda) = -\varphi_{YJ}(y;\lambda)$.
- Effective at correcting skewness.
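A direct implementation of the piecewise definition, useful for checking the symmetry property. This is a plain sketch; scipy.stats.yeojohnson provides an equivalent transformation.

```python
import numpy as np

def yeo_johnson(y, lam):
    """Yeo-Johnson transformation, implemented directly from the piecewise definition."""
    y = np.asarray(y, float)
    out = np.empty_like(y)
    pos, neg = y >= 0, y < 0
    if lam != 0:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if lam != 2:
        out[neg] = -(((1.0 - y[neg]) ** (2.0 - lam) - 1.0) / (2.0 - lam))
    else:
        out[neg] = -np.log1p(-y[neg])
    return out

# Symmetry property: phi(-y; 2 - lambda) = -phi(y; lambda).
y = np.linspace(-3.0, 3.0, 13)
assert np.allclose(yeo_johnson(-y, 2.0 - 0.5), -yeo_johnson(y, 0.5))
```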

Yeo-Johnson transformations
[Figure slide.]

T-cdf transformation
To correct heavy tails, we propose a simple cdf transformation which maps $t_\nu$ distributions to standard normals. With $\Phi$ and $T_\nu$ the cdfs of a standard normal and a Student-$t_\nu$ distribution, respectively, we set
$$\varphi(y; \xi, \tau, \nu) = \Phi^{-1}\!\left( T_\nu\!\left( \frac{y - \xi}{\tau} \right) \right).$$
- Effective at correcting heavy tails.
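In code the transformation is a one-liner built from standard cdf and quantile routines; this sketch assumes scipy's parameterization of the Student-t (df), with the location xi and scale tau applied explicitly.

```python
from scipy import stats

def t_cdf_transform(y, xi, tau, nu):
    """phi(y; xi, tau, nu) = Phi^{-1}( T_nu( (y - xi) / tau ) )."""
    return stats.norm.ppf(stats.t.cdf((y - xi) / tau, df=nu))

# The inverse (for back-transformation) is xi + tau * stats.t.ppf(stats.norm.cdf(z), df=nu).
```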

T-cdf transformation
[Figure slide.]

Estimating transformations
The transformation parameters may be estimated as follows:
- To estimate the Yeo-Johnson transformation parameter $\lambda$, maximize the log-likelihood of the transformed data,
$$-\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(\varphi(x_i;\lambda) - \mu\right)^2 + (\lambda - 1)\sum_{i=1}^{n}\operatorname{sgn}(x_i)\log(|x_i| + 1).$$
- Estimating the cdf transformation amounts to estimating a three-parameter t model. We do this by maximizing the t likelihood
$$\prod_{i=1}^{n}\left[\frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\tau}\left(1 + \frac{1}{\nu}\left(\frac{x_i - \xi}{\tau}\right)^2\right)^{-\frac{\nu+1}{2}}\right].$$
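Both maximizations are available off the shelf. A minimal sketch: scipy.stats.yeojohnson maximizes the same normal log-likelihood over lambda, and scipy.stats.t.fit gives the three-parameter t maximum likelihood estimates; the simulated data here are only for illustration.

```python
from scipy import stats

# Illustrative heavy-tailed, shifted sample.
x = stats.t(df=3, loc=1.0, scale=2.0).rvs(size=500, random_state=0)

# Yeo-Johnson: scipy profiles (mu, sigma^2) and maximizes the log-likelihood over lambda.
_, lam_hat = stats.yeojohnson(x)

# t-cdf transformation: three-parameter t model fit by maximum likelihood.
nu_hat, xi_hat, tau_hat = stats.t.fit(x)

print(lam_hat, nu_hat, xi_hat, tau_hat)
```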

An adaptive transformation routine
Round 1: Given the sample x and a KDE $\hat f_X$, calculate
$$\hat L_0 = \hat\sigma_x \left[ \int \left( \hat f_X''(x) \right)^2 dx \right]^{1/5}.$$
Apply both transformations to x; for each of the two candidate transformed samples y, compute
$$\hat L(y) = \hat\sigma_y \left[ \int \left( \hat f_Y''(y) \right)^2 dy \right]^{1/5}.$$
Select the transformation giving the greater reduction in $\hat L$.
Round 2 and beyond: Continue until neither transformation achieves more than a 5% reduction in $\hat L$.
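Putting the pieces together, the routine might look like the sketch below. It is not the authors' implementation: the criterion uses the curvature of a pilot Gaussian KDE with the second derivative approximated numerically on a grid, and the function names, grid, and maximum number of rounds are illustrative choices.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def L_hat(y):
    """L(y) = sd(y) * [ integral of (f_Y''(y))^2 dy ]^(1/5), with f_Y'' taken from a
    pilot Gaussian KDE and differentiated numerically on a grid."""
    y = np.asarray(y, float)
    grid = np.linspace(y.min() - 3 * y.std(), y.max() + 3 * y.std(), 2001)
    f = stats.gaussian_kde(y)(grid)
    f2 = np.gradient(np.gradient(f, grid), grid)
    return y.std() * trapezoid(f2 ** 2, grid) ** 0.2

def adaptive_transform(x, max_rounds=5, tol=0.05):
    """Greedily apply a Yeo-Johnson or t-cdf step, whichever reduces L the most,
    stopping once neither gives more than a `tol` (5%) reduction."""
    y = np.asarray(x, float)
    L = L_hat(y)
    for _ in range(max_rounds):
        y_yj, _ = stats.yeojohnson(y)
        nu, xi, tau = stats.t.fit(y)
        y_tc = stats.norm.ppf(stats.t.cdf((y - xi) / tau, df=nu))
        candidates = [y_yj, y_tc]
        scores = [L_hat(c) for c in candidates]
        best = int(np.argmin(scores))
        if scores[best] > (1.0 - tol) * L:
            break
        y, L = candidates[best], scores[best]
    return y
```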

Table of Contents
1 Classical Density Estimation
2 Transformations in Bayesian Density Estimation
3 Simulation study
  Design; results
4 Conclusions

Simulation design
To assess the effectiveness of our method, we simulate from the two-piece densities described in Rubio and Steel (2014):
$$g(x \mid \mu, \sigma_1, \sigma_2) = \frac{2}{\sigma_1 + \sigma_2}\left[ f\!\left(\frac{x-\mu}{\sigma_1}\right) I\!\left(x \in (-\infty, \mu)\right) + f\!\left(\frac{x-\mu}{\sigma_2}\right) I\!\left(x \in [\mu, \infty)\right) \right].$$
In the simulations, $\mu = 0$ and $\sigma_1 = 1$ are fixed, while $\sigma_2$ and the form of f are allowed to vary.
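Sampling from a two-piece density is straightforward when f is symmetric about zero: draw |Z| with Z ~ f, then send it to the left of mu with probability sigma_1 / (sigma_1 + sigma_2) (scaled by sigma_1), or to the right otherwise (scaled by sigma_2). A minimal sketch; the defaults (f normal, sigma_2 = 3) are illustrative, not the settings used in the study.

```python
import numpy as np
from scipy import stats

def r_two_piece(n, f=stats.norm, mu=0.0, s1=1.0, s2=3.0, seed=None):
    """Draw from the Rubio-Steel two-piece density built from a symmetric f:
    take |Z| with Z ~ f, go left (scale s1) w.p. s1/(s1+s2), else right (scale s2)."""
    rng = np.random.default_rng(seed)
    z = np.abs(f.rvs(size=n, random_state=rng))
    left = rng.random(n) < s1 / (s1 + s2)
    return np.where(left, mu - s1 * z, mu + s2 * z)

x = r_two_piece(500, f=stats.t(df=2), s2=4.0, seed=1)   # skewed, heavy-tailed sample
```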

A Dirichlet-process location mixture model
In the simulations, we will also consider the model
$$y_i \mid \mu_i, \sigma \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2), \qquad \mu_i \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(M G_0), \qquad 1/\sigma \sim \Gamma(a, b),$$
with $G_0 = N(m_0, s_0^2)$.
Both models can be fit with standard Gibbs samplers.
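The slide refers to Gibbs samplers; as a rough stand-in for readers who want to experiment, the sketch below fits a truncated variational approximation to a DP mixture of normals with scikit-learn. This is not the authors' sampler, and it also mixes over component variances; the truncation level, DP mass, and example data are illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

y = np.random.default_rng(0).standard_t(df=2, size=500)   # illustrative sample

# Truncated variational approximation to a DP mixture of normals.
dpm = BayesianGaussianMixture(
    n_components=30,                                       # truncation level
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                        # DP mass M
    max_iter=500,
).fit(y.reshape(-1, 1))

grid = np.linspace(y.min() - 1.0, y.max() + 1.0, 400).reshape(-1, 1)
f_hat = np.exp(dpm.score_samples(grid))                    # density estimate on the grid
```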

Simulation results
[Figure slides.]

Table of Contents
1 Classical Density Estimation
2 Transformations in Bayesian Density Estimation
3 Simulation study
4 Conclusions

Conclusions
Despite the flexibility of DPM models for density estimation, a little effort in selecting a good pre-transformation can go a long way for performance.
Future directions:
- More complex settings: DPMs embedded in hierarchical models.
- Comparison to a fully Bayes approach, unifying transformation and density estimation in a single framework.
Thank you!