Nonparametric Bayes Modeling

Lecture 6: Advanced Applications of DPMs
David Dunson
Department of Statistical Science, Duke University
Tuesday, February 2, 2010

- Motivation
- Functional data analysis
- Variable selection & shrinkage

Hierarchical Modeling
- $\theta_i$ = random effects specific to subject $i$
- Hierarchical models let $\theta_i \sim P$, where $P$ = random effects distribution
- Choice of $P$ is critical in controlling borrowing of information

Some Classical Applications
- Meta-analysis: combine data from multiple studies to reach an overall conclusion (e.g., the drug is effective)
- Multi-level designs: subjects are nested in schools, regions, or study centers
- Longitudinal data: data collected for each subject over time; important to accommodate within-subject dependence

Some Emerging Applications
- Joint modeling of data from different domains
  - Images and captions
  - Diagnostic images or functional predictors & health responses
  - Multiple types of omics data (sequence & expression)
- Multi-task learning: borrow strength across tasks
  - Multiple images, music pieces, security videos
  - Compressive sensing
  - User preferences in different domains (film, books, etc.)

Application 1 - Multinational Bioassay
- Increasing concern about adverse effects of environmental estrogens on human development
- Rodent uterotrophic bioassay: system for identifying suspected agonists or antagonists of estrogen
- OECD study: collected data from 19 laboratories to investigate consistency of effects of a known agonist (EE) & antagonist (ZM)
- $y_{ij}$ = uterus weight for rat $j$ in lab $i$
- $x_{ij}$ = protocol type, dose of EE, dose of ZM

Summary of 19 participating laboratories

Notation  Lab Name    Country      Protocols Conducted
F1        Citifrance  France       A
F2        Poulenc     France       A
K1        ChungKorea  Korea        A, B
K2        KoreaPark   Korea        B, C
G1        Berlin      Germany      A
G2        Basf        Germany      A
G3        Bayer       Germany      A
J1        Citijapan   Japan        A, B, C
J2        Hatano      Japan        A, B, C, D
J3        InEnvTox    Japan        A, B, C, D
J4        Mitsubishi  Japan        A, B, C, D
J5        Nihon       Japan        A, B, C, D
J6        Sumitomo    Japan        A, B, C
N1        Denmark     Netherlands  B
N2        TNO         Netherlands  A, B
B1        Huntingdon  UK           C
B2        Zeneca      UK           A, B, C
U1        Exxon       USA          A
U2        WIL         USA          A, B

Some Comments
- Can potentially fit a normal random effects model:
  $y_{ij} = x_{ij}^\top \theta_i + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2)$, $\theta_i \sim N_p(\theta, \Sigma)$
- Normal distribution has light tails & does not allow outlying labs or clusters of labs
- Conclusions may be sensitive to violations of normality
- Appealing to have a more flexible approach available

Application 2 - Dependent Functions
- Interest in estimating a collection of functions, $\{f_i\}_{i=1}^n$
  - Longitudinal trajectories for different individuals
  - Function relating features to the probability that $y_{ij} = 1$ for task $i$
- We will focus on the following model:
  $y_{ij} = f_i(t_{ij}) + \epsilon_{ij}$, $\epsilon_{ij} \sim t_\nu(\sigma^2)$
  $f_i(t) = \sum_{j=1}^{p} \theta_{ij} b_j(t) = b(t)^\top \theta_i$, $\theta_i \sim P$
- $b = \{b_j\}$ = basis functions, $\theta_i$ = basis coefficients

log PdG trajectories (Bigelow & Dunson, JASA, 08)

Comments on Functional Data Model
- Subject-specific basis coefficients, $\theta_i$, allow variability in the functional trajectories for different individuals
- Heterogeneity among subjects is controlled by the random effects distribution, $P$
- Number of basis functions, $p$, is not small ($p \ge 20$)

Multi-Task Compressive Sensing (Ji, Dunson, Carin 08)
- Compressive sensing (CS): limit the number of measurements needed to accurately reconstruct a signal, $u$
- Let $u_i = \Psi \theta_i + \epsilon_i$, with $\Psi$ a wavelet basis, & assume that many of the elements of $\theta_i$ are 0
- Instead of measuring $u_i$, CS measures $v_i = \Phi \Psi^\top u_i$, with $\Phi$ a random projection matrix
- Candès & Tao prove accuracy in reconstructing $u_i$ from $v_i$
- We propose to let $\theta_i \sim P$ to borrow information from related signals

Application 3 - Multiple brain images
[Figure panels: (a)-(e) Linear 1-5, N=4096; (f)-(j) ST 1-5, N=1636; (k)-(o) MT 1-5, N=1636]

Application 4 - Multiple videos
[Figure panels: (a)-(e) Linear 1-5, N=4096; (f)-(j) ST 1-5, N=1717; (k)-(o) MT 1-5, N=1717]

Application 5 - Images and text
[Figure: example image "Coast" with associated words: sky, sea water, sand beach, tree, tree, mountain, mountain, person, person]

DPMs for Longitudinal & Multi-Level Data
- A simple case corresponds to the linear mixed effects model
  $y_{ij} = x_{ij}^\top \beta + z_{ij}^\top b_i + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2)$
  $b_i \sim P$, $P \sim DP(\alpha P_0)$
- DP prior on $P$, the distribution of the random effects
- Useful semiparametric model for longitudinal & correlated data
- Bush & MacEachern (1996), Müller & Rosner (1997), Kleinman & Ibrahim (1998), Ishwaran & Takahara (2002), etc.

Modeling Random Curves
- Let $y_{ij} = f_i(t_{ij}) + \epsilon_{ij}$ = noisy observation of a smooth curve $f_i$ for subject $i$
- For example, $f_i$ may represent child growth during development
- Characterize variability in growth curves & cluster children having similar trajectories
- Can be accomplished using a DPM linear mixed model with
  $f_i(t_{ij}) = \sum_{l=1}^{p} \beta_{il} b_l(t_{ij}) = x_{ij}^\top \beta_i$, $\beta_i \sim P = \sum_{h=1}^{\infty} \pi_h \delta_{\beta_h}$
- $b = \{b_l\}_{l=1}^{p}$ = basis functions (e.g., cubic splines)
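The basis expansion on this slide can be sketched in a few lines. This is only an illustration: the truncated-power form of the cubic spline basis, the knot locations, and the coefficient values are assumptions chosen for the example, not the basis used in the lecture.

```python
import numpy as np

def cubic_spline_basis(t, knots):
    """Truncated-power cubic spline basis: 1, t, t^2, t^3, (t - k)_+^3 per knot."""
    t = np.asarray(t, dtype=float)
    columns = [np.ones_like(t), t, t**2, t**3]
    columns += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)

# Hypothetical observation times and knots for one subject.
t = np.linspace(0.0, 1.0, 25)
B = cubic_spline_basis(t, knots=[0.25, 0.5, 0.75])  # row j is b(t_ij)

# Hypothetical subject-specific coefficients beta_i (p = 7 here).
beta_i = np.array([1.0, 0.5, 0.0, -2.0, 3.0, 0.0, 1.0])
f_i = B @ beta_i  # f_i(t_ij) = b(t_ij)' beta_i at each observation time
```

Each row of `B` plays the role of $x_{ij}$ in the mixed model, so clustering the $\beta_i$ clusters the curves.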

Comments - Functional Dirichlet Process
- Recalling the DP stick-breaking property (Sethuraman, 1994):
  $\beta_i \sim P = \sum_{h=1}^{\infty} V_h \prod_{l<h} (1 - V_l)\, \delta_{\beta_h}$, $V_h \overset{iid}{\sim} \text{beta}(1, \alpha)$, $\beta_h \overset{iid}{\sim} P_0$
- Hence, the $n$ subjects are grouped into $k_n$ clusters
- Subjects in cluster $l$ all have $\beta_i = \beta_l$
- Provides a semiparametric Bayes version of latent trajectory class or growth mixture models
- Avoids fixing the number of clusters in advance
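A minimal sketch of the stick-breaking construction, useful for seeing how a draw from the DP groups subjects into clusters. The finite truncation level, the choice $P_0 = N(0, I)$, and all function names are assumptions made for illustration only.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Weights pi_h = V_h * prod_{l<h} (1 - V_l), with V_h ~ beta(1, alpha)."""
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # truncation approximation: last break takes the rest of the stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def sample_dp_coefficients(n, alpha, p, truncation=50, seed=0):
    """Draw beta_1..beta_n from a truncated stick-breaking draw of P."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = rng.normal(size=(truncation, p))  # assumed base measure P0 = N(0, I)
    labels = rng.choice(truncation, size=n, p=weights)
    return atoms[labels], labels, weights

betas, labels, weights = sample_dp_coefficients(n=100, alpha=1.0, p=3)
n_clusters = len(np.unique(labels))  # k_n, the number of occupied clusters
```

Smaller values of `alpha` put more mass on the first few sticks, so fewer clusters are occupied.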

Comments continued
- The curve in cluster $l$ is $f(t) = b(t)^\top \beta_l$
- The number of functional clusters in $n$ growth curves is treated as unknown
- Gibbs samplers of lecture 1 are straightforward to generalize
- Number of clusters and configuration of subjects into clusters vary across the MCMC iterations
- Problem: label switching!

Label Switching
- Problem arises because the labels on the cluster-specific parameters are ambiguous, so they vary in meaning across the iterations
- Not meaningful to calculate posterior summaries of $\beta_h$ across the iterations
- Strategies:
  1. Relabeling algorithms that align the clusters after running MCMC (Stephens, 00)
  2. Define clusters as individuals that are grouped together with high posterior probability
  3. Estimate optimal clustering (Dahl, 06; Lau & Green, 07)
  4. Ignore the problem & avoid cluster-specific inferences
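Strategies 2 and 3 both rest on pairwise co-clustering probabilities, which do not depend on how clusters are labeled. The sketch below estimates those probabilities from saved MCMC label draws and then picks the sampled partition closest to them in squared error, in the spirit of a least-squares clustering criterion. The function names and the toy label draws are hypothetical.

```python
import numpy as np

def coclustering_matrix(label_draws):
    """Estimate Pr(i and j share a cluster) by averaging over MCMC draws."""
    draws = np.asarray(label_draws)
    same = draws[:, :, None] == draws[:, None, :]  # draws x n x n
    return same.mean(axis=0)

def least_squares_clustering(label_draws):
    """Return the sampled partition whose pairwise matrix is closest to the average."""
    draws = np.asarray(label_draws)
    psm = coclustering_matrix(draws)
    losses = [
        np.sum(((d[:, None] == d[None, :]).astype(float) - psm) ** 2)
        for d in draws
    ]
    return draws[int(np.argmin(losses))], psm

# Toy draws for 3 subjects; draw 2 is draw 1 with the labels switched.
draws = [[0, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
best, psm = least_squares_clustering(draws)
```

Because only equality of labels within a draw is used, the switched labels in the second draw have no effect on `psm` or on the selected partition.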

Joint Modeling
- One is often interested in joint modeling of data having different measurement scales
- Example: joint modeling of a functional predictor with a health outcome
  - Functional predictor may consist of a longitudinally recorded biomarker
  - Predictor may also correspond to a diagnostic image
- Challenging to build flexible joint models for data having different scales

Application to Hormone Curves & Pregnancy Loss
- Progesterone is a female reproductive hormone that maintains pregnancy
- Urinary progesterone measured after ovulation through early pregnancy for 172 women
- Shape of the trajectory may predict impending early pregnancy loss
- Of interest to identify losses before they occur

log PdG trajectories (Bigelow & Dunson, JASA, 08)

Joint Modeling with Functional Predictors
- Suppose interest focuses on joint modeling of a functional predictor $f_i$ & response $y_i$
- Component model for functional predictor:
  $x_i(t_{ij}) = f_i(t_{ij}) + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2)$
- Component model for response:
  $\text{logit}\, \Pr(y_i = 1 \mid u_i, \mu_i) = \mu_i + u_i^\top \psi$
- $u_i$ = vector of covariates

Dirichlet Process Joint Modeling
- How to specify a joint model for functional predictor, $f_i$, & response $y_i$?
- Let $f_i(t) = \sum_{h=1}^{p} \beta_{ih} b_h(t)$ & $(\beta_i, \mu_i) \sim P$
- $P$ can then be considered as unknown through a DP prior
- This same strategy can be used broadly for data fusion & joint modeling

DP Joint Models - Comments
- Approach automatically clusters subjects into groups
- Group $l$ has functional predictor $f(t) = \sum_{h=1}^{p} \beta_{lh} b_h(t)$ & baseline response probability $1/\{1 + \exp(-\mu_l)\}$
- Allows response probability to systematically shift between functional predictor clusters
- Very flexible approach for characterizing nonlinear & complex relationships with a functional predictor

Estimated Hormone Clusters & Risk of EPL

High Dimensional Applications
- Enormous increase in the generation of high-dimensional data
- Large $p$, small $n$ problems create challenges to classical methods
- Appealing to develop flexible Bayesian approaches for identifying sparse latent structure in high-dimensional data

Application - Massive Numbers of Predictors
- Commonly a large number of predictors $x_i = (x_{i1}, \ldots, x_{ip})$ are available
- Interest focuses on identifying important predictors of $y_i$
- Many parametric approaches available (e.g., using variable selection mixture priors)
- Methods commonly rely on two-component priors, with one component concentrated at zero & one more diffuse

Application - Single Nucleotide Polymorphism (SNP) Data
- Single nucleotide polymorphisms (SNPs): variants in a pair of nucleotides at a given locus
- SNP: $g_{icl} \in \{1, 2, 3\}$ = genotype at locus $l$ within gene $c$ for individual $i$
- Total number of loci can be very large (Affymetrix set to launch a million-SNP chip)
- One SNP = pair of alleles inherited from mother & father, with source (phase) unknown

SNPs and Health Outcomes
- In genetic epidemiology, interest focuses on identifying genetic factors predictive of a disease outcome $y_i$
- Epidemiologists tend to favor logistic regression models:
  $\text{logit}\, \Pr(y_i = 1 \mid g_i, z_i) = z_i^\top \alpha + \sum_{c=1}^{C} \sum_{l=1}^{p_c} \sum_{h=1}^{3} 1(g_{icl} = h)\, \beta_{clh}$
- $z_i$ = environmental exposures, demographic variables, etc.
- $g_i$ = SNP data for $C$ genes
- $\beta$ = very high-dimensional vector of coefficients
- Ideally, we would characterize $\beta$ using a sparseness-favoring prior
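To make the triple sum concrete, the sketch below turns genotype codes into the indicators $1(g_{icl} = h)$ and evaluates the linear predictor. It assumes the loci of all genes have been flattened into the columns of one matrix. The data, coefficient values, and function name are hypothetical.

```python
import numpy as np

def genotype_indicators(g):
    """Map an n x L matrix of genotype codes in {1, 2, 3} to n x 3L indicators."""
    g = np.asarray(g)
    levels = np.array([1, 2, 3])
    onehot = (g[:, :, None] == levels).astype(float)  # n x L x 3
    return onehot.reshape(g.shape[0], -1)             # columns ordered (locus, h)

# Hypothetical data: 2 individuals, 2 loci, 1 covariate in z.
g = np.array([[1, 3], [2, 2]])
z = np.array([[0.5], [1.5]])
alpha = np.array([0.2])
beta = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])  # one entry per (locus, h), mostly zero

X = genotype_indicators(g)
eta = z @ alpha + X @ beta            # logit Pr(y_i = 1 | g_i, z_i)
prob = 1.0 / (1.0 + np.exp(-eta))
```

With 3 indicators per locus, the length of `beta` is 3 times the number of loci, which is why a sparseness-favoring prior is attractive.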

DP Priors for Shrinkage & Variable Selection
- Assuming the elements of $\beta$ are exchangeable draws from $P$, assign $P$ a zero-inflated DP prior
- Prior is a mixture of a DP and a point mass at zero, allowing zero coefficients
- SNPs clustered into null & non-null groups according to impact on health response
- Dunson et al. (2008, JASA) implement this approach & generalizations to borrow information across functionally related genes
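A rough sketch of what draws from such a prior look like. It assumes the form $P = \pi_0 \delta_0 + (1 - \pi_0) P^*$ with $P^* \sim DP(\alpha P_0)$ and $P_0 = N(0, 1)$, and it samples the DP part through the Polya urn scheme. The values of `pi0` and `alpha`, the base measure, and the function name are illustrative assumptions, not the specification in the cited paper.

```python
import numpy as np

def sample_zero_inflated_dp(n_coef, alpha=1.0, pi0=0.8, seed=0):
    """Draw coefficients from pi0 * delta_0 + (1 - pi0) * P*, with P* ~ DP(alpha * N(0, 1))."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(n_coef)
    atoms, counts = [], []  # distinct non-null values and their multiplicities
    for j in range(n_coef):
        if rng.random() < pi0:
            continue  # coefficient is exactly zero (null group)
        # Polya urn: join an existing cluster with prob. proportional to its size,
        # or open a new cluster with prob. proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(atoms):
            atoms.append(rng.normal())  # new atom from the assumed base measure
            counts.append(1)
        else:
            counts[k] += 1
        beta[j] = atoms[k]
    return beta

beta = sample_zero_inflated_dp(200, seed=1)
nonzero = beta[beta != 0]
```

Most entries of `beta` are exactly zero, and the non-null entries share a small number of distinct values, which is the clustering of coefficients described on the slide.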

Multiple Lasso Shrinkage
- Popular Lasso procedure corresponds to MAP estimation under a double exponential (Laplace) prior
- Tendency to over-shrink important predictors
- MacLehose & Dunson (08) propose a DP mixture of Laplace priors for the coefficients in a high-dimensional regression model
- Posterior computation uses retrospective MCMC (Papaspiliopoulos & Roberts, 08)
- Simulation studies show reduced MSE relative to the Lasso
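The link between the Lasso and the Laplace prior, and the over-shrinkage it causes, is easiest to see in the one-dimensional normal means case. Assume $y \sim N(\beta, \sigma^2)$ with prior density proportional to $\exp(-\lambda |\beta|)$. The MAP estimate is then soft thresholding of $y$ at $\lambda \sigma^2$. The sketch below uses this simplified setting, not the regression model of the cited paper.

```python
import numpy as np

def laplace_map(y, lam, sigma2=1.0):
    """MAP of beta for y ~ N(beta, sigma2) under a Laplace(rate=lam) prior."""
    return np.sign(y) * np.maximum(np.abs(y) - lam * sigma2, 0.0)

def neg_log_posterior(beta, y, lam, sigma2=1.0):
    """Negative log posterior up to an additive constant."""
    return (y - beta) ** 2 / (2.0 * sigma2) + lam * np.abs(beta)

y = np.array([3.0, 0.5, -2.0])
estimates = laplace_map(y, lam=1.0)

# Numerical check of the closed form for the first observation.
grid = np.linspace(-5.0, 5.0, 100001)
grid_map = grid[np.argmin(neg_log_posterior(grid, y[0], lam=1.0))]
```

Every nonzero estimate is pulled toward zero by the same amount `lam * sigma2`, however large the signal. That is the over-shrinkage a mixture of Laplace priors with different rates is meant to relieve.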

Illustration of Multiple Lasso Prior

Estimated Coefficients for Pima Indian Data

Parkinson's Disease Application
- Parkinson's disease is a common neurologic disease: tremors, rigidity & slowness of movement
- SNP data available for 540 individuals, with 270 having Parkinson's disease
- Focus on 270 SNPs on chromosome 11; 540 SNP coefficients to estimate
- MCMC was run for 50,000 iterations
- Bayesian Lasso is very sensitive to hyperparameters: it selects either all SNPs or none
- Two genotypes were selected by the multiple Lasso

Histograms of Posterior Probabilities of Genotype Effect

Summary
- DPMs are useful in a very broad variety of application areas beyond density estimation
- By sharing clustering across data from different scales, they provide a highly flexible approach for joint modeling
- Also useful for generating more flexible sparse shrinkage priors, since the DP favors few components
- Computation is feasible even in high dimensions using efficient MCMC & alternatives (variational approximations, fast sequential search, etc.)