Latent factor density regression models
Biometrika (2010), 97, 1, pp. 1–7. © 2010 Biometrika Trust. Printed in Great Britain. Advance Access publication on 31 July 2010.

BY A. BHATTACHARYA, D. PATI AND D. B. DUNSON

Department of Statistical Science, Duke University, Durham NC 27708, USA
ab179@stat.duke.edu  dp55@stat.duke.edu  dunson@stat.duke.edu

SUMMARY

In this article, we propose a new class of latent factor conditional density regression models with attractive computational and theoretical properties. The proposed approach is based on a novel extension of the recently proposed latent variable density estimation model, in which the response variables conditioned on the predictors are modeled as unknown functions of uniformly distributed latent variables and the predictors, with an additive Gaussian error. The latent variable specification allows straightforward posterior computation using a uni-dimensional grid via conjugate posterior updates. Moreover, one can center the model on a simple parametric guess, facilitating inference. Our approach relies on characterizing the space of conditional densities induced by the above model as kernel convolutions with a general class of continuous predictor-dependent mixing measures. Theoretical properties in terms of rates of convergence are studied.

Some key words: Density regression; Gaussian process; Factor model; Latent variable; Nonparametric Bayes; Rate of convergence.

1. INTRODUCTION

There is a rich literature on Bayesian methods for density estimation using mixture models of the form

    $y_i \sim f(\theta_i), \quad \theta_i \sim P, \quad P \sim \Pi,$    (1)

where $f(\cdot)$ is a parametric density and $P$ is an unknown mixing distribution assigned a prior $\Pi$. The most common choice of $\Pi$ is the Dirichlet process prior, first introduced by Ferguson (1973, 1974). Recent literature has focused on generalizing model (1) to the density regression setting, in which the entire conditional distribution of $y$ given $x$ changes flexibly with predictors.
Bayesian density regression, also termed conditional density estimation, views the entire conditional density $f(y \mid x)$ as a function-valued parameter and allows its center, spread, skewness, modality and other such features to vary with $x \in \mathbb{R}^p$. For data $\{(y_i, x_i), i = 1, \ldots, n\}$, let

    $y_i \mid x_i \sim f(\cdot \mid x_i), \quad \{f(\cdot \mid x), x \in \mathcal{X}\} \sim \Pi_{\mathcal{X}},$    (2)

where $\mathcal{X}$ is the predictor space and $\Pi_{\mathcal{X}}$ is a prior for the class of conditional densities $\{f(\cdot \mid x), x \in \mathcal{X}\}$ indexed by the predictors. Refer, for example, to Müller et al. (1996); Griffin & Steel (2006, 2008); Dunson et al. (2007); Dunson & Park (2008); Chung & Dunson (2009); Tokdar et al. (2010a) and Pati et al. (2012), among others. The primary focus of this recent development has been mixture models of the form

    $f(y \mid x) = \sum_{h=1}^{\infty} \pi_h(x) \, \phi\!\left(\frac{y - \mu_h(x)}{\sigma_h}\right),$    (3)
where $\phi$ is the standard normal density, $\{\pi_h(x), h = 1, 2, \ldots\}$ are predictor-dependent probability weights that sum to one almost surely for each $x \in \mathcal{X}$, and $(\mu_h, \sigma_h) \sim G_0$ independently, with $G_0$ a base probability measure on $\mathcal{F}_{\mathcal{X}} \otimes \mathbb{R}^+$, $\mathcal{F}_{\mathcal{X}}$ the space of all $\mathcal{X} \to \mathbb{R}$ functions. However, density regression models built on such mixtures are often a black box in realistic applications, which require centering the model on a simple parametric family without sacrificing computational efficiency. For example, it is not clear how to appropriately center (3) on a simple linear regression model; this cannot be achieved simply by letting $\mu_h(x) = x'\beta_h$ and $\beta_h \sim N(\beta_0, \Sigma_0)$. Moreover, (3) is based on infinitely many stochastic processes $\{(\mu_h, \pi_h), h = 1, \ldots, \infty\}$, which is highly computationally demanding irrespective of the dimension. Indeed, Stephen Walker pointed out in one of the ISBA bulletins: "Current density regression models are too big, too non-identifiable and I doubt whether they would survive the test of time."

Lenk (1988, 1991) proposed a logistic Gaussian process prior for density estimation which can be conveniently centered on a parametric family, facilitating inference. Tokdar & Ghosh (2007) and van der Vaart & van Zanten (2008, 2009) derived nice theoretical properties of the logistic Gaussian process prior in terms of the asymptotic behavior of the posterior. Tokdar et al. (2010b) extended the logistic Gaussian process density estimation model to the density regression setting, which allows convenient prior centering and shares similar theoretical properties. Despite such attractive properties, the logistic Gaussian process is computationally quite challenging owing to the presence of an intractable integral in the denominator. Although several existing articles propose to discretize the denominator for posterior inference, it turns out that such an approach is not theoretically justifiable.
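For concreteness, the mixture representation (3) can be evaluated numerically. The logistic stick-breaking weights and linear atoms below are illustrative assumptions for the sketch, not a construction from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(z):
    # standard normal density
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

# Truncated predictor-dependent stick-breaking weights pi_h(x):
# sticks v_h(x) = logistic(a_h + b_h x) vary smoothly with x.
H = 20
a, b = rng.normal(size=H), rng.normal(size=H)
atom = rng.normal(size=(H, 2))        # mu_h(x) = atom[h, 0] + atom[h, 1] * x
sigma = 0.5 + rng.random(H)           # kernel bandwidths sigma_h

def pi(x):
    v = 1.0 / (1.0 + np.exp(-(a + b * x)))
    v[-1] = 1.0                        # truncation: last stick takes the rest
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

def f(y, x):
    # f(y | x) = sum_h pi_h(x) * sigma_h^{-1} * phi((y - mu_h(x)) / sigma_h)
    mu = atom[:, 0] + atom[:, 1] * x
    return float(np.sum(pi(x) * phi((y - mu) / sigma) / sigma))

# Sanity check: f(. | x) integrates to one for each fixed x.
ys = np.linspace(-20.0, 20.0, 8001)
mass = float(np.sum([f(y, 0.3) for y in ys]) * (ys[1] - ys[0]))
```

Note that the truncation at $H$ sticks is a finite approximation; the model in (3) has infinitely many components.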
Moreover, discretization is also not computationally efficient, since one must choose the grid points at which the finite-sum approximation of the integral is evaluated. A recent article by Walker (2011) can potentially bypass this issue, at the cost of introducing several latent variables and performing reversible-jump MCMC to select the appropriate grid points. All these approaches tend to be computationally expensive and suffer from the curse of dimensionality as the number of predictors increases.

Although latent variable models have become increasingly popular as a dimension reduction tool in machine learning applications, it was only recently (Kundu & Dunson, 2011) realized that they are also suitable for density estimation. Kundu & Dunson (2011) developed a density estimation model in which unobserved U(0, 1) latent variables are related to the response variables via a random non-linear regression with an additive error. Consider the nonlinear latent variable model

    $y_i = \mu(\eta_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad (i = 1, \ldots, n)$    (4)
    $\mu \sim \mathrm{GP}(\mu_0, c), \quad \sigma^2 \sim \mathrm{IG}(a, b), \quad \eta_i \sim \mathrm{U}(0, 1),$    (5)

where the $\eta_i$ are subject-specific latent variables, $\mu \in C[0, 1]$ relates the latent variables to the observed variables and is assigned a Gaussian process prior, and $\epsilon_i$ is an idiosyncratic error specific to subject $i$. Kundu & Dunson (2011) and Pati et al. (2011) showed that (4) leads to a highly flexible representation which can approximate a large class of densities subject to minor regularity constraints. Moreover, this allows convenient prior centering, avoids the black-box mixture formulation and enables efficient computation through a griddy Gibbs algorithm. By characterizing the space of densities induced by the above model as kernel convolutions with a general class of continuous mixing measures, Pati et al. (2011) showed optimal posterior convergence properties of this model. They also demonstrated that the posterior converges at a much faster rate under appropriate prior centering.
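Model (4)–(5) and the uni-dimensional griddy Gibbs step can be sketched as follows. For illustration, $\mu$ is fixed at the Exp(1) quantile function $\mu(t) = -\log(1 - t)$, a hypothetical stand-in for a posterior draw of $\mu$ from the Gaussian process; the grid size is likewise an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Model (4) with mu fixed, for illustration, at the Exp(1) quantile function
# mu(t) = -log(1 - t); in the actual sampler a GP draw of mu replaces this.
def mu(t):
    return -np.log(1.0 - t)

sigma, n = 0.1, 100_000
eta = rng.uniform(0.0, 1.0, size=n)        # eta_i ~ U(0, 1)
y = mu(eta) + rng.normal(0.0, sigma, n)    # y_i = mu(eta_i) + eps_i

# Marginally y ~ phi_sigma * Exp(1), so mean ~ 1 and variance ~ 1 + sigma^2.
mean, var = float(y.mean()), float(y.var())

# Griddy Gibbs update for one latent eta_i: its full conditional on [0, 1] is
# proportional to phi_sigma(y_i - mu(eta_i)), sampled on a uniform grid.
def update_eta(y_i, grid):
    logw = -0.5 * ((y_i - mu(grid)) / sigma) ** 2
    w = np.exp(logw - logw.max())          # stabilise before normalising
    return float(rng.choice(grid, p=w / w.sum()))

grid = np.linspace(0.0, 1.0 - 1e-6, 501)   # stay inside [0, 1) so mu is finite
eta_new = update_eta(y[0], grid)
```

The grid evaluation is one-dimensional regardless of the model's complexity, which is the computational appeal of the latent variable formulation.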
Kundu & Dunson (2011) also proposed a density regression model by modeling the response and the covariates jointly, using shared dependence on subject-specific latent factors after expressing $y$ and each $x_j$, $j = 1, \ldots, p$, as conditionally independent non-linear factor models. Such an approach suffers from the inevitable drawback of estimating $p + 1$ Gaussian process components, posing computational inefficiency as the number of predictors diverges. Moreover, the joint modeling framework devotes a significant amount of the estimation effort to explaining the dependence among the high-dimensional predictors, when the key is only to infer the conditional dependence of $y$ given the $x$'s.

The purpose of this article is to develop a novel density regression model for high-dimensional predictors which not only possesses theoretically elegant optimality properties, but is also computationally efficient and makes it easy to incorporate prior information so as to perform meaningful statistical inference. To that end, we introduce a general family of continuous mixing measures indexed by the predictors and discuss connections with existing dependent process mixing measures (MacEachern, 1999) used to construct predictor-dependent models (3) for density regression. As opposed to (3), our proposed model is based on a single stochastic process linking the response to the predictors and a latent variable. We also study theoretical support properties and show that our prior encompasses a wide range of conditional densities.

To our knowledge, only a few papers have considered theoretical properties of density regression models. Tokdar et al. (2010a), Pati et al. (2012) and Norets & Pelenis (2010) consider posterior consistency in estimating conditional distributions, focusing on logistic Gaussian process priors and mixture models (3).
Yoon (2009) tackled the problem in a different way, through a limited information approach approximating the likelihood by the quantiles of the true distribution. Tang & Ghosal (2007a,b) provide sufficient conditions for posterior consistency in estimating an autoregressive conditional density and a transition density, rather than regression with respect to another covariate. However, guaranteeing mere consistency is not enough to characterize the behavior of the posterior as the sample size increases. A more informative way to study this behavior is to derive the rate of posterior contraction of density regression models, which quantifies the concentration of the posterior in terms of the smallest shrinking neighborhood around the true conditional density that still accumulates all the posterior mass asymptotically. To that end, we show that our prior leads to a posterior which converges at the minimax optimal rate under a mixed smoothness assumption on the true conditional density.

2. METHODS

Suppose $y_i \in \mathcal{Y} \subset \mathbb{R}$ is observed independently given the covariates $x_i \in \mathcal{X}$, $i = 1, \ldots, n$, which are drawn independently from a probability distribution $Q$ on $\mathcal{X} \subset \mathbb{R}^p$ compact. Assume that $Q$ admits a density $q$ with respect to the Lebesgue measure. Assuming the $x_i$ are rescaled prior to analysis, $x_i \in [0, 1]^p$. Consider the nonlinear latent variable model

    $y_i = \mu(\eta_i, x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad (i = 1, \ldots, n)$    (6)
    $\mu \sim \Pi_\mu, \quad \sigma \sim \Pi_\sigma, \quad \eta_i \sim \mathrm{U}(0, 1),$    (7)

where $\eta_i \in [0, 1]$ are subject-specific latent variables, $\mu \in C([0, 1]^{p+1})$ is a transfer function relating the latent variables and the covariates to the observed variables, and $\epsilon_i$ is an idiosyncratic error specific to subject $i$. The density of $y$ given $x$ conditional on the transfer function $\mu$ and
scale $\sigma$ is obtained on marginalizing out the latent variable as

    $f(y \mid x; \mu, \sigma) \stackrel{\mathrm{def}}{=} f_{\mu,\sigma}(y \mid x) = \int_0^1 \phi_\sigma(y - \mu(t, x)) \, dt.$    (8)

Define a map $g : C([0, 1]^{p+1}) \times [0, \infty) \to \mathcal{F}$ with $g(\mu, \sigma) = f_{\mu,\sigma}$. One can induce a prior $\Pi$ on $\mathcal{F}$ via the mapping $g$ by placing independent priors $\Pi_\mu$ and $\Pi_\sigma$ on $C([0, 1]^{p+1})$ and $[0, \infty)$ respectively, with $\Pi = (\Pi_\mu \otimes \Pi_\sigma) \circ g^{-1}$. Kundu & Dunson (2011) assumed a Gaussian process prior in (4) with squared exponential covariance kernel on $\mu$ and an inverse-gamma prior on $\sigma^2$. In the density regression context, $\mu$ is assigned a Gaussian process prior with mean $\mu_0 : [0, 1]^{p+1} \to \mathbb{R}$ and a covariance kernel $c : [0, 1]^{p+1} \times [0, 1]^{p+1} \to \mathbb{R}$.

It is not immediately clear whether the class of densities $f_{\mu,\sigma}$ in (8), in the range of $g$, encompasses a large subset of the space of conditional densities

    $\mathcal{F}_d = \left\{ f : f(y \mid x) = g(y, x), \; g : \mathbb{R}^{p+1} \to \mathbb{R}^+, \; \int g(y, x) \, dy = 1 \; \forall \, x \in [0, 1]^p \right\}.$

We provide an intuition that relates the above class with convolutions and is crucially used later on. Let $f_0 \in \mathcal{F}_d$ be a continuous conditional density with cumulative distribution function $F_0(y \mid x) = \int_{-\infty}^{y} f_0(z \mid x) \, dz$. Assume $f_0$ to be non-zero almost everywhere within its support, so that $F_0(\cdot \mid x) : \mathrm{supp}(f_0(\cdot \mid x)) \to [0, 1]$ is strictly monotone for each $x$ and hence has an inverse $F_0^{-1}(\cdot \mid x) : [0, 1] \to \mathrm{supp}(f_0(\cdot \mid x))$ satisfying $F_0\{F_0^{-1}(t \mid x) \mid x\} = t$ for all $t$ and for each $x \in [0, 1]^p$. If $\mathrm{supp}(f_0(\cdot \mid x)) = \mathbb{R}$, then the domain of $F_0^{-1}$ is the open interval $(0, 1)$ instead of $[0, 1]$. Letting $\mu_0(t, x) = F_0^{-1}(t \mid x)$, one obtains

    $f_{\mu_0,\sigma}(y \mid x) = \int_0^1 \phi_\sigma(y - F_0^{-1}(t \mid x)) \, dt = \int_{\mathcal{Y}} \phi_\sigma(y - z) f_0(z \mid x) \, dz,$    (9)

where the second equality follows from the change of variable theorem. Thus $f_{\mu_0,\sigma}(y \mid x) = \phi_\sigma * f_0(y \mid x)$, i.e., $f_{\mu_0,\sigma}$ is the convolution of $f_0$ with a normal density having mean 0 and standard deviation $\sigma$. It is well known that the convolution $\phi_\sigma * f_0(\cdot \mid x)$ can approximate $f_0(\cdot \mid x)$, for each $x$, arbitrarily closely as the bandwidth $\sigma \to 0$.
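The change-of-variables identity (9) can be checked numerically. In this sketch the predictor $x$ is suppressed, $f_0$ is taken to be Exp(1), whose quantile function is $F_0^{-1}(t) = -\log(1 - t)$, and the values of $\sigma$ and $y$ are arbitrary test points, all illustrative assumptions:

```python
import numpy as np

# Numerical check of identity (9) at a single x (suppressed), with f0 = Exp(1)
# and quantile function F0^{-1}(t) = -log(1 - t).
def phi_sigma(z, sigma):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

sigma, y = 0.3, 1.2

# Left side of (9): integral over the latent variable t on (0, 1).
t = np.linspace(1e-9, 1.0 - 1e-9, 1_000_001)
lhs = float(np.mean(phi_sigma(y + np.log(1.0 - t), sigma)))

# Right side of (9): the convolution (phi_sigma * f0)(y), integrating z >= 0.
z = np.linspace(0.0, 40.0, 1_000_001)
rhs = float(np.sum(phi_sigma(y - z, sigma) * np.exp(-z)) * (z[1] - z[0]))

# The two Riemann sums agree, illustrating f_{mu0, sigma} = phi_sigma * f0.
```

Shrinking `sigma` toward zero in the same script also makes both sides approach $f_0(y) = e^{-y}$, which is the approximation property used below.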
More precisely, for $f_0(\cdot \mid x) \in L^p(\lambda)$ for any $p \geq 1$ for each $x$, $\|\phi_\sigma * f_0(\cdot \mid x) - f_0(\cdot \mid x)\|_{p,\lambda} \to 0$ as $\sigma \to 0$. Furthermore, the stronger result $\|\phi_\sigma * f_0(\cdot \mid x) - f_0(\cdot \mid x)\|_\infty = O(\sigma^2)$ holds for each $x$ if $f_0(\cdot \mid x)$ is compactly supported for each $x$. However, convergence pointwise in $x$ is a weaker notion, and one might hope for stronger notions of convergence involving a joint topology defined on $[0, 1]^p \times \mathcal{Y}$. Refer to § 3·2 for details. Thus the above model can approximate a large collection of conditional densities $\{f_0(y \mid x)\}$ by letting $\mu(t, x)$ concentrate around the conditional quantile function $F_0^{-1}(t \mid x)$ through the assigned Gaussian process prior. An advantage of this formulation is the feasibility of efficient posterior computation based on a uni-dimensional griddy Gibbs algorithm.

3. THEORY

3·1. Notation

Throughout the paper, Lebesgue measure on $\mathbb{R}$ or $\mathbb{R}^p$ is denoted by $\lambda$. The supremum and $L_1$ norms are denoted by $\|\cdot\|_\infty$ and $\|\cdot\|_1$ respectively. The indicator function of a set $B$ is denoted by $1_B$. Let $L^p(\nu, M)$ denote the space of real-valued measurable functions defined on $M$ with $\nu$-integrable $p$th absolute power. For two density functions $f, g$, the Kullback–Leibler
divergence is given by $K(f, g) = \int \log(f/g) f \, d\lambda$. A ball of radius $r$ with centre $x_0$ relative to the metric $d$ is denoted $B(x_0, r; d)$. The diameter of a bounded metric space $M$ relative to a metric $d$ is defined to be $\sup\{d(x, y) : x, y \in M\}$. The $\epsilon$-covering number $N(\epsilon, M, d)$ of a semimetric space $M$ relative to the semi-metric $d$ is the minimal number of balls of radius $\epsilon$ needed to cover $M$. The logarithm of the covering number is referred to as the entropy. $\delta_0$ stands for a distribution degenerate at 0 and $\mathrm{supp}(\nu)$ for the support of a measure $\nu$.

3·2. Notions of neighborhoods in conditional density estimation

If we define $h(x, y) = q(x) f(y \mid x)$ and $h_0(x, y) = q(x) f_0(y \mid x)$, then $h, h_0 \in \mathcal{F}$. Throughout the paper, $h_0$ is assumed to be a fixed density in $\mathcal{F}$, which we alternatively refer to as the true data generating density, and $\{f_0(\cdot \mid x), x \in \mathcal{X}\}$ is referred to as the true conditional density. The density $q(x)$ will be needed only for theoretical investigation; in practice, we do not need to know it or learn it from the data.

We define the weak, $\nu$-integrated $L_1$ and sup-$L_1$ neighborhoods of the collection of conditional densities $\{f_0(\cdot \mid x), x \in \mathcal{X}\}$ in the following. A sub-base of a weak neighborhood is defined as

    $W_{\epsilon,g}(f_0) = \left\{ f : f \in \mathcal{F}_d, \; \left| \int_{\mathcal{X} \times \mathcal{Y}} g h - \int_{\mathcal{X} \times \mathcal{Y}} g h_0 \right| < \epsilon \right\},$    (10)

for a bounded continuous function $g : \mathcal{Y} \times \mathcal{X} \to \mathbb{R}$. A weak neighborhood base is formed by finite intersections of neighborhoods of the type (10). Define a $\nu$-integrated $L_1$ neighborhood

    $S_\epsilon(f_0; \nu) = \left\{ f : f \in \mathcal{F}_d, \; \int \|f(\cdot \mid x) - f_0(\cdot \mid x)\|_1 \, \nu(x) \, dx < \epsilon \right\}$    (11)

for any measure $\nu$ with $\mathrm{supp}(\nu) \subset \mathcal{X}$. Observe that under the topology in (11), $\mathcal{F}_d$ can be identified with a closed subset of $L^1(\lambda \otimes \nu, \mathcal{Y} \times \mathrm{supp}(\nu))$, making it a complete separable metric space. For $f_1, f_2 \in \mathcal{F}_d$, let $d_{SS}(f_1, f_2) = \sup_{x \in \mathcal{X}} \|f_1(\cdot \mid x) - f_2(\cdot \mid x)\|_1$ and define the sup-$L_1$ neighborhood

    $SS_\epsilon(f_0) = \left\{ f : f \in \mathcal{F}_d, \; d_{SS}(f, f_0) < \epsilon \right\}.$
    (12)

Under the sup-$L_1$ topology, $\mathcal{F}_d$ can be viewed as a closed subset of the separable Banach space of continuous functions from $\mathcal{X}$ to $L^1(\lambda, \mathcal{Y})$ which are norm bounded, and hence a complete separable metric space. Thus measurability issues won't arise with these topologies.

3·3. Posterior convergence rate theorem in the compact case

We recall the definition of the rate of posterior convergence.

DEFINITION 1. The posterior $\Pi_{\mathcal{X}}(\cdot \mid y^n, x^n)$ is said to contract at a rate $\epsilon_n \to 0$ in the $\nu$-integrated $L_1$ topology, or strongly in the sup-$L_1$ topology, at $\{f_0(\cdot \mid x), x \in \mathcal{X}\}$ if $\Pi_{\mathcal{X}}(U_{M\epsilon_n}^c \mid y^n, x^n) \to 0$ a.s., with $U_{M\epsilon_n} = S_{M\epsilon_n}(f_0; \nu)$ and $SS_{M\epsilon_n}(f_0)$ respectively, for large enough $M > 0$.

Here a.s. consistency at $\{f_0(\cdot \mid x), x \in \mathcal{X}\}$ means that the posterior distribution concentrates around a neighborhood of $\{f_0(\cdot \mid x), x \in \mathcal{X}\}$ for almost every sequence $\{y_i, x_i\}_{i=1}^{\infty}$ generated by i.i.d. sampling from the joint density $q(x) f_0(y \mid x)$.

Studying rates of convergence in density regression models becomes more challenging, as we need to assume mixed smoothness in $y$ and $x$. Although posterior contraction rates have been studied widely in mean regression, logistic regression and density estimation models, results on convergence rates for density regression models are lacking. We study posterior convergence rates of the
density regression model (6) by assuming the true conditional density has different smoothness across $y$ and $x$. Assuming $f_0(y \mid x)$ to be compactly supported, twice continuously differentiable in $y$ and thrice continuously differentiable in $x$, we obtain a rate of $n^{-1/3} (\log n)^{t_2}$ using a Gaussian process prior for $\mu$ having a single inverse-gamma bandwidth across different dimensions. The optimal rate in such a mixed smoothness class is $n^{-6/17}$. The slightly slower rate of $n^{-1/3}$ is the drawback of using an isotropic Gaussian process to model an anisotropic function.

First let us consider the very simple case where the true conditional density $f_0(y \mid x)$ is compactly supported for each $x \in [0, 1]^p$ and $p = 1$. Define $\mu_0(t, x) = F_0^{-1}(t \mid x)$.

Assumption 1. For each $x \in [0, 1]^p$, $f_0(\cdot \mid x)$ is compactly supported. Without loss of generality we can assume that the support is $[0, 1]$. Hence $\mu_0 : [0, 1]^2 \to \mathbb{R}$.

Assumption 2. $f_0(y \mid x)$ is twice continuously differentiable in $y$ for any fixed $x \in [0, 1]^p$ and thrice continuously differentiable in $x$ for any fixed $y \in \mathcal{Y}$. Note that under this assumption $\mu_0 \in C^3([0, 1]^2)$. Also

    $\sup_{x \in [0,1]^p} \int_{\mathcal{Y}} \left\{ \frac{\partial^2 f_0(y \mid x)}{\partial y^2} \right\}^2 \Big/ f_0(y \mid x) \, dy < \infty,$    (13)
    $\sup_{x \in [0,1]^p} \int_{\mathcal{Y}} \left\{ \frac{\partial f_0(y \mid x)}{\partial y} \right\}^4 \Big/ f_0(y \mid x) \, dy < \infty.$    (14)

Assumption 3. For any $x \in [0, 1]^p$, there exist $0 < m_x < M_x$ such that $f_0(\cdot \mid x)$ is bounded, non-decreasing on $(-\infty, m_x]$, bounded away from 0 on $[m_x, M_x]$ and non-increasing on $[M_x, \infty)$. In addition we have

    $\inf_{x \in [0,1]^p} (M_x - m_x) > 0.$    (15)

Assumption 4. We assume $\mu$ follows a centered Gaussian process on $[0, 1]^2$, denoted $\mathrm{GP}(0, c)$, with a squared exponential covariance kernel $c(\cdot, \cdot; A)$ and a Gamma prior for the inverse-bandwidth $A$. Thus $c(t, s; A) = e^{-A \|t - s\|^2}$, $t, s \in [0, 1]^2$, $A \sim \mathrm{Ga}(p, q)$.

Assumption 5. We assume $\sigma \sim \mathrm{IG}(a_\sigma, b_\sigma)$.

THEOREM 1. If the true conditional density $f_0$ satisfies Assumptions 1, 2 and 3 and the prior satisfies Assumptions 4 and 5, the posterior rate of convergence in the $q$-integrated $L_1$ topology is given by
    $\epsilon_n = n^{-1/3} \log^{t_0} n,$    (16)

where $t_0$ is a known constant.

Remark 1. The optimal effective smoothness $\alpha_e$ for this problem yields the optimal rate of convergence $n^{-2\alpha_e/(2\alpha_e+1)} = n^{-6/17}$. It is important to note that the obtained rate of convergence is slower by a very small factor.

REFERENCES

CHUNG, Y. & DUNSON, D. (2009). Nonparametric Bayes conditional distribution modeling with variable selection. Journal of the American Statistical Association 104.
DUNSON, D. & PARK, J. (2008). Kernel stick-breaking processes. Biometrika 95.
DUNSON, D., PILLAI, N. & PARK, J. (2007). Bayesian density regression. Journal of the Royal Statistical Society, Series B 69.
FERGUSON, T. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1.
FERGUSON, T. (1974). Prior distributions on spaces of probability measures. The Annals of Statistics 2.
GRIFFIN, J. & STEEL, M. (2006). Order-based dependent Dirichlet processes. Journal of the American Statistical Association 101.
GRIFFIN, J. & STEEL, M. (2008). Bayesian nonparametric modelling with the Dirichlet process regression smoother. Statistica Sinica (to appear).
KUNDU, S. & DUNSON, D. (2011). Single factor transformation priors for density regression. DSS Discussion Series.
LENK, P. (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. Journal of the American Statistical Association 83.
LENK, P. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika 78, 531.
MACEACHERN, S. (1999). Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science.
MÜLLER, P., ERKANLI, A. & WEST, M. (1996). Bayesian curve fitting using multivariate normal mixtures. Biometrika 83.
NORETS, A. & PELENIS, J. (2010). Posterior consistency in conditional distribution estimation by covariate dependent mixtures. Unpublished manuscript, Princeton Univ.
PATI, D., BHATTACHARYA, A. & DUNSON, D. (2011). Posterior convergence rates in non-linear latent variable models. arXiv preprint (submitted to Bernoulli).
PATI, D., DUNSON, D. & TOKDAR, S. (2012). Posterior consistency in conditional distribution estimation. Journal of Multivariate Analysis (submitted).
TANG, Y. & GHOSAL, S. (2007a). A consistent nonparametric Bayesian procedure for estimating autoregressive conditional densities. Computational Statistics & Data Analysis 51.
TANG, Y. & GHOSAL, S. (2007b). Posterior consistency of Dirichlet mixtures for estimating a transition density. Journal of Statistical Planning and Inference 137.
TOKDAR, S. & GHOSH, J. (2007).
Posterior consistency of logistic Gaussian process priors in density estimation. Journal of Statistical Planning and Inference 137.
TOKDAR, S., ZHU, Y. & GHOSH, J. (2010a). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis 5.
TOKDAR, S., ZHU, Y. & GHOSH, J. (2010b). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis 5.
VAN DER VAART, A. & VAN ZANTEN, J. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. The Annals of Statistics 36.
VAN DER VAART, A. & VAN ZANTEN, J. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. The Annals of Statistics 37.
WALKER, S. (2011). Posterior sampling when the normalizing constant is unknown. Communications in Statistics 40.
YOON, J. (2009). Bayesian analysis of conditional density functions: a limited information approach. Unpublished manuscript, Claremont McKenna College.

[Received January. Revised June 2010]
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationMaximum Smoothed Likelihood for Multivariate Nonparametric Mixtures
Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Igor Prünster University of Turin, Collegio Carlo Alberto and ICER Joint work with P. Di Biasi and G. Peccati Workshop on Limit Theorems and Applications Paris, 16th January
More informationNonparametric Bayes Uncertainty Quantification
Nonparametric Bayes Uncertainty Quantification David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & ONR Review of Bayes Intro to Nonparametric Bayes
More informationConsistency of the maximum likelihood estimator for general hidden Markov models
Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models
More informationA Nonparametric Model for Stationary Time Series
A Nonparametric Model for Stationary Time Series Isadora Antoniano-Villalobos Bocconi University, Milan, Italy. isadora.antoniano@unibocconi.it Stephen G. Walker University of Texas at Austin, USA. s.g.walker@math.utexas.edu
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationis a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.
Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable
More informationANISOTROPIC FUNCTION ESTIMATION USING MULTI-BANDWIDTH GAUSSIAN PROCESSES
Submitted to the Annals of Statistics arxiv: 1111.1044v1 ANISOTROPIC FUNCTION ESTIMATION USING MULTI-BANDWIDTH GAUSSIAN PROCESSES By Anirban Bhattacharya,, Debdeep Pati, and David Dunson, Duke University,
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationICES REPORT Model Misspecification and Plausibility
ICES REPORT 14-21 August 2014 Model Misspecification and Plausibility by Kathryn Farrell and J. Tinsley Odena The Institute for Computational Engineering and Sciences The University of Texas at Austin
More informationA Representation Approach for Relative Entropy Minimization with Expectation Constraints
A Representation Approach for Relative Entropy Minimization with Expectation Constraints Oluwasanmi Koyejo sanmi.k@utexas.edu Department of Electrical and Computer Engineering, University of Texas, Austin,
More informationNonparametric Bayes Density Estimation and Regression with High Dimensional Data
Nonparametric Bayes Density Estimation and Regression with High Dimensional Data Abhishek Bhattacharya, Garritt Page Department of Statistics, Duke University Joint work with Prof. D.Dunson September 2010
More informationFlexible Regression Modeling using Bayesian Nonparametric Mixtures
Flexible Regression Modeling using Bayesian Nonparametric Mixtures Athanasios Kottas Department of Applied Mathematics and Statistics University of California, Santa Cruz Department of Statistics Brigham
More informationGeometric ergodicity of the Bayesian lasso
Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationNonparametric Bayesian modeling for dynamic ordinal regression relationships
Nonparametric Bayesian modeling for dynamic ordinal regression relationships Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz Joint work with Maria
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationA Process over all Stationary Covariance Kernels
A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that
More informationICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts
ICML 2015 Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes Machine Learning Research Group and Oxford-Man Institute University of Oxford July 8, 2015 Point Processes
More informationQuantile Processes for Semi and Nonparametric Regression
Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationNoninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions
Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationSequential Monte Carlo Methods for Bayesian Computation
Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationBayesian Modeling of Conditional Distributions
Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More informationDirichlet Process Mixtures of Generalized Linear Models
Dirichlet Process Mixtures of Generalized Linear Models Lauren Hannah Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08544, USA David Blei Department of
More informationEconomics 204 Fall 2011 Problem Set 2 Suggested Solutions
Economics 24 Fall 211 Problem Set 2 Suggested Solutions 1. Determine whether the following sets are open, closed, both or neither under the topology induced by the usual metric. (Hint: think about limit
More informationFrontier estimation based on extreme risk measures
Frontier estimation based on extreme risk measures by Jonathan EL METHNI in collaboration with Ste phane GIRARD & Laurent GARDES CMStatistics 2016 University of Seville December 2016 1 Risk measures 2
More informationDirichlet Process Mixtures of Generalized Linear Models
Dirichlet Process Mixtures of Generalized Linear Models Lauren A. Hannah Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08544, USA David M. Blei Department
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationBayesian Sparse Linear Regression with Unknown Symmetric Error
Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of
More informationLecture 2: Basic Concepts of Statistical Decision Theory
EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationApproximate Bayesian Computation
Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationBayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam. aad. Bayesian Adaptation p. 1/4
Bayesian Adaptation Aad van der Vaart http://www.math.vu.nl/ aad Vrije Universiteit Amsterdam Bayesian Adaptation p. 1/4 Joint work with Jyri Lember Bayesian Adaptation p. 2/4 Adaptation Given a collection
More informationA Bayesian Nonparametric Hierarchical Framework for Uncertainty Quantification in Simulation
Submitted to Operations Research manuscript Please, provide the manuscript number! Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationTwo Useful Bounds for Variational Inference
Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationLecture 35: December The fundamental statistical distances
36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-2017-0074.R2 Title The semi-parametric Bernstein-von Mises theorem for regression models with symmetric errors Manuscript ID SS-2017-0074.R2 URL http://www.stat.sinica.edu.tw/statistica/
More informationNormalized kernel-weighted random measures
Normalized kernel-weighted random measures Jim Griffin University of Kent 1 August 27 Outline 1 Introduction 2 Ornstein-Uhlenbeck DP 3 Generalisations Bayesian Density Regression We observe data (x 1,
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationInference with few assumptions: Wasserman s example
Inference with few assumptions: Wasserman s example Christopher A. Sims Princeton University sims@princeton.edu October 27, 2007 Types of assumption-free inference A simple procedure or set of statistics
More information