Latent factor density regression models


Biometrika (2010), 97, 1, pp. 1-7
© 2010 Biometrika Trust. Printed in Great Britain. Advance Access publication on 31 July 2010.

Latent factor density regression models

BY A. BHATTACHARYA, D. PATI AND D. B. DUNSON
Department of Statistical Science, Duke University, Durham NC 27708, USA
ab179@stat.duke.edu  dp55@stat.duke.edu  dunson@stat.duke.edu

SUMMARY

In this article, we propose a new class of latent factor conditional density regression models with attractive computational and theoretical properties. The proposed approach is based on a novel extension of a recently proposed latent variable density estimation model, in which the response variables conditioned on the predictors are modeled as unknown functions of uniformly distributed latent variables and the predictors, with additive Gaussian error. The latent variable specification allows straightforward posterior computation using a uni-dimensional grid via conjugate posterior updates. Moreover, one can center the model on a simple parametric guess, facilitating inference. Our approach relies on characterizing the space of conditional densities induced by the above model as kernel convolutions with a general class of continuous predictor-dependent mixing measures. Theoretical properties in terms of rates of convergence are studied.

Some key words: Density regression; Gaussian process; Factor model; Latent variable; Nonparametric Bayes; Rate of convergence.

1. INTRODUCTION

There is a rich literature on Bayesian methods for density estimation using mixture models of the form

    y_i \sim f(\theta_i), \quad \theta_i \sim P, \quad P \sim \Pi,    (1)

where f(\cdot) is a parametric density and P is an unknown mixing distribution assigned a prior \Pi. The most common choice of \Pi is the Dirichlet process prior, first introduced by Ferguson (1973, 1974). Recent literature has focused on generalizing model (1) to the density regression setting, in which the entire conditional distribution of y given x changes flexibly with the predictors. Bayesian density regression, also termed conditional density estimation, views the entire conditional density f(y | x) as a function-valued parameter and allows its center, spread, skewness, modality and other such features to vary with x ∈ R^p. For data {(y_i, x_i), i = 1, ..., n}, let

    y_i \mid x_i \sim f(\cdot \mid x_i), \quad \{f(\cdot \mid x), x \in \mathcal{X}\} \sim \Pi_{\mathcal{X}},    (2)

where \mathcal{X} is the predictor space and \Pi_{\mathcal{X}} is a prior for the class of conditional densities {f(· | x), x ∈ \mathcal{X}} indexed by the predictors. Refer, for example, to Müller et al. (1996); Griffin & Steel (2006, 2008); Dunson et al. (2007); Dunson & Park (2008); Chung & Dunson (2009); Tokdar et al. (2010a) and Pati et al. (2012), among others.
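To fix ideas about model (1), the short Python sketch below draws a mixing distribution P from a Dirichlet process via the stick-breaking construction and evaluates the induced mixture density with a Gaussian kernel f(· | θ) = N(θ, ·). The concentration parameter, the base measure N(0, 1), the kernel scale and the truncation level are illustrative assumptions, not choices made in this paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, H = 1.0, 50          # concentration parameter and truncation level (assumed)

# Stick-breaking construction: P = sum_h pi_h * delta_{theta_h}, with theta_h ~ G0 = N(0, 1).
v = rng.beta(1.0, alpha, size=H)
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
pi = pi / pi.sum()          # renormalize the truncated weights
theta = rng.normal(0.0, 1.0, size=H)

def mixture_density(y, kernel_sd=0.5):
    """Density of y under model (1) with kernel f(y | theta) = N(y; theta, kernel_sd^2)."""
    return float(np.sum(pi * norm.pdf(y, loc=theta, scale=kernel_sd)))

print(mixture_density(0.0))
```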

The primary focus of this recent development has been mixture models of the form

    f(y \mid x) = \sum_{h=1}^{\infty} \pi_h(x)\, \phi\{ (y - \mu_h(x)) / \sigma_h \},    (3)

where φ is the standard normal density, {π_h(x), h = 1, 2, ...} are predictor-dependent probability weights that sum to one almost surely for each x ∈ \mathcal{X}, and (µ_h, σ_h) ~ G_0 independently, with G_0 a base probability measure on F_\mathcal{X} × R^+, where F_\mathcal{X} is the space of all \mathcal{X} → R functions. However, density regression models of this mixture form are often a black box in realistic applications, which require centering the model on a simple parametric family without sacrificing computational efficiency. For example, it is not clear how to appropriately center (3) on a simple linear regression model; this cannot be achieved just by letting µ_h(x) = x'β_h with β_h ~ N(β_0, Σ_0). Moreover, (3) is based on infinitely many stochastic processes {(µ_h, π_h), h = 1, 2, ...}, which is highly computationally demanding irrespective of the dimension. Indeed, Stephen Walker pointed out in one of the ISBA bulletins: "Current density regression models are too big, too non-identifiable and I doubt whether they would survive the test of time."

Lenk (1988, 1991) proposed a logistic Gaussian process prior for density estimation which can be conveniently centered on a parametric family, facilitating inference. Tokdar & Ghosh (2007) and van der Vaart & van Zanten (2008, 2009) derived attractive theoretical properties of the logistic Gaussian process prior in terms of the asymptotic behavior of the posterior. Tokdar et al. (2010b) extended the logistic Gaussian process density estimation model to the density regression setting, which allows convenient prior centering and shares similar theoretical properties. Despite these attractive properties, the logistic Gaussian process is computationally quite challenging owing to the presence of an intractable integral in the denominator. Although several existing approaches propose to discretize the denominator for posterior inference, such an approach is not theoretically justified. Moreover, discretization is not computationally efficient either, since one must choose the grid points at which the finite-sum approximation of the integral is evaluated. A recent article by Walker (2011) can potentially bypass this issue, at the cost of introducing several latent variables and performing reversible jump Markov chain Monte Carlo to select the appropriate grid points. All these approaches tend to be computationally expensive and suffer from the curse of dimensionality as the number of predictors increases.

Although latent variable models have become increasingly popular as a dimension reduction tool in machine learning applications, it was only recently realized (Kundu & Dunson, 2011) that they are also suitable for density estimation. Kundu & Dunson (2011) developed a density estimation model in which unobserved U(0, 1) latent variables are related to the response variables via a random non-linear regression with an additive error. Consider the nonlinear latent variable model

    y_i = \mu(\eta_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad (i = 1, \ldots, n),    (4)
    \mu \sim GP(\mu_0, c), \quad \sigma^2 \sim IG(a, b), \quad \eta_i \sim U(0, 1),    (5)

where the η_i are subject-specific latent variables, µ ∈ C[0, 1] relates the latent variables to the observed variables and is assigned a Gaussian process prior, and ε_i is an idiosyncratic error specific to subject i. Kundu & Dunson (2011) and Pati et al. (2011) showed that (4) leads to a highly flexible representation which can approximate a large class of densities subject to minor regularity constraints.
Moreover, this allows convenient prior centering, avoids the black-box mixture formulation and enables efficient computation through a griddy Gibbs algorithm. By characterizing the space of densities induced by the above model as kernel convolutions with a general class of continuous mixing measures, Pati et al. (2011) showed optimal posterior convergence properties of this model. They also demonstrated that the posterior converges at a much faster rate under appropriate prior centering.
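As a minimal sketch of how the uniform latent variable in (4)-(5) operates computationally, the Python code below evaluates the induced density f_{µ,σ}(y) = ∫_0^1 φ_σ{y − µ(t)} dt on a uniform grid and performs a griddy-Gibbs-style update of a latent η_i given (µ, σ, y_i). The transfer function, noise level and grid size are assumptions made purely for illustration; they are not the settings used by Kundu & Dunson (2011).

```python
import numpy as np
from scipy.stats import norm

sigma = 0.2
grid = (np.arange(100) + 0.5) / 100   # uniform grid on (0, 1)

def mu(t):
    """Illustrative transfer function mu: [0,1] -> R (an assumption, not from the paper)."""
    return norm.ppf(np.clip(t, 1e-6, 1 - 1e-6), loc=0.0, scale=1.0)

def f_induced(y):
    """Induced density f_{mu,sigma}(y) = int_0^1 phi_sigma(y - mu(t)) dt, by grid average."""
    return norm.pdf(y - mu(grid), scale=sigma).mean()

def griddy_gibbs_eta(y_i, rng):
    """Draw eta_i from its discretized full conditional:
    p(eta_i = t | y_i, mu, sigma) proportional to phi_sigma(y_i - mu(t)) on the grid."""
    w = norm.pdf(y_i - mu(grid), scale=sigma)
    return rng.choice(grid, p=w / w.sum())

rng = np.random.default_rng(0)
print(f_induced(0.5), griddy_gibbs_eta(0.5, rng))
```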

Kundu & Dunson (2011) also proposed a density regression model by modeling the response and the covariates jointly through shared dependence on subject-specific latent factors, after expressing y and each x_j, j = 1, ..., p, as conditionally independent non-linear factor models. Such an approach suffers from the drawback of requiring p + 1 Gaussian process components to be estimated, which becomes computationally inefficient as the number of predictors grows. Moreover, the joint modeling framework devotes a significant amount of the estimation effort to explaining the dependence among the high-dimensional predictors, when the key goal is only to infer the conditional distribution of y given the x's.

The purpose of this article is to develop a novel density regression model for high-dimensional predictors which not only possesses theoretically elegant optimality properties, but is also computationally efficient and makes it easy to incorporate prior information to perform meaningful statistical inference. To that end, we introduce a general family of continuous mixing measures indexed by the predictors and discuss connections with existing dependent process mixing measures (MacEachern, 1999) used to construct predictor-dependent models (3) for density regression. As opposed to (3), our proposed model is based on a single stochastic process linking the response to the predictors and a latent variable. We also study theoretical support properties and show that our prior encompasses a wide range of conditional densities.

To our knowledge, only a few papers have considered theoretical properties of density regression models. Tokdar et al. (2010a), Pati et al. (2012) and Norets & Pelenis (2010) consider posterior consistency in estimating conditional distributions, focusing on logistic Gaussian process priors and mixture models (3). Yoon (2009) tackled the problem in a different way through a limited-information approach, approximating the likelihood by the quantiles of the true distribution. Tang & Ghosal (2007a,b) provide sufficient conditions for posterior consistency in estimating an autoregressive conditional density and a transition density, rather than regression with respect to another covariate. However, guaranteeing consistency alone is not enough to characterize the behavior of the posterior as the sample size increases. A more informative way to study this behavior is to derive the rate of posterior contraction, which quantifies the concentration of the posterior in terms of the smallest shrinking neighborhood around the true conditional density that still accumulates all the posterior mass asymptotically. To that end, we show that our prior leads to a posterior which converges at the minimax optimal rate under a mixed smoothness assumption on the true conditional density.

2. METHODS

Suppose y_i ∈ \mathcal{Y} ⊂ R is observed independently given the covariates x_i ∈ \mathcal{X}, i = 1, ..., n, which are drawn independently from a probability distribution Q on \mathcal{X} ⊂ R^p, with \mathcal{X} compact. Assume that Q admits a density q with respect to the Lebesgue measure. Assuming the x_i's are rescaled prior to analysis, x_i ∈ [0, 1]^p.
Consider the nonlinear latent variable model

    y_i = \mu(\eta_i, x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad (i = 1, \ldots, n),    (6)
    \mu \sim \Pi_\mu, \quad \sigma \sim \Pi_\sigma, \quad \eta_i \sim U(0, 1),    (7)

where the η_i ∈ [0, 1] are subject-specific latent variables, µ ∈ C([0, 1]^{p+1}) is a transfer function relating the latent variables and the covariates to the observed variables, and ε_i is an idiosyncratic error specific to subject i.
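To make the data-generating mechanism in (6)-(7) concrete, here is a minimal simulation sketch with p = 1 and a fixed, illustrative transfer function µ(t, x), taken here to be the conditional quantile function of a N(x, 0.25) distribution; the choice of µ, the noise level σ and the sample size are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, sigma = 500, 0.1

def mu(t, x):
    # Illustrative transfer function on [0,1]^2 (an assumption, not the paper's choice):
    # the conditional quantile function of N(x, 0.5^2) evaluated at t.
    return norm.ppf(np.clip(t, 1e-6, 1 - 1e-6), loc=x, scale=0.5)

x = rng.uniform(0.0, 1.0, size=n)                  # predictors rescaled to [0, 1]
eta = rng.uniform(0.0, 1.0, size=n)                # subject-specific U(0,1) latent variables
y = mu(eta, x) + rng.normal(0.0, sigma, size=n)    # model (6): y_i = mu(eta_i, x_i) + eps_i
```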

The density of y given x, conditional on the transfer function µ and scale σ, is obtained on marginalizing out the latent variable as

    f(y \mid x; \mu, \sigma) := f_{\mu,\sigma}(y \mid x) = \int_0^1 \phi_\sigma\{y - \mu(t, x)\}\, dt.    (8)

Define a map g : C([0, 1]^{p+1}) × [0, ∞) → F with g(µ, σ) = f_{µ,σ}. One can induce a prior Π on F via the mapping g by placing independent priors Π_µ and Π_σ on C([0, 1]^{p+1}) and [0, ∞) respectively, with Π = (Π_µ × Π_σ) ∘ g^{-1}. Kundu & Dunson (2011) assumed a Gaussian process prior in (4) with a squared exponential covariance kernel on µ and an inverse-gamma prior on σ². In the density regression context, µ is assigned a Gaussian process prior with mean µ_0 : [0, 1]^{p+1} → R and covariance kernel c : [0, 1]^{p+1} × [0, 1]^{p+1} → R.

It is not immediately clear whether the class of densities f_{µ,σ} in (8), that is, the range of g, encompasses a large subset of the space of conditional densities

    F_d = \{ f : f(y \mid x) = g(y, x), \ g : \mathbb{R}^{p+1} \to \mathbb{R}_+, \ \int g(y, x)\, dy = 1 \ \text{for all } x \in [0, 1]^p \}.

We provide an intuition that relates the above class to convolutions and is used crucially later on. Let f_0 ∈ F_d be a continuous conditional density with cumulative distribution function F_0(y | x) = \int_{-\infty}^{y} f_0(z \mid x)\, dz. Assume f_0 to be non-zero almost everywhere within its support, so that F_0(· | x) : supp{f_0(· | x)} → [0, 1] is strictly monotone for each x and hence has an inverse F_0^{-1}(· | x) : [0, 1] → supp{f_0(· | x)} satisfying F_0{F_0^{-1}(t | x) | x} = t for all t ∈ [0, 1] and each x ∈ [0, 1]^p. If supp{f_0(· | x)} = R, then the domain of F_0^{-1}(· | x) is the open interval (0, 1) instead of [0, 1]. Letting µ_0(t, x) = F_0^{-1}(t | x), one obtains

    f_{\mu_0,\sigma}(y \mid x) = \int_0^1 \phi_\sigma\{y - F_0^{-1}(t \mid x)\}\, dt = \int_{\mathcal{Y}} \phi_\sigma(y - z)\, f_0(z \mid x)\, dz,    (9)

where the second equality follows from the change of variable theorem. Thus f_{µ_0,σ}(y | x) = φ_σ * f_0(y | x), that is, f_{µ_0,σ}(· | x) is the convolution of f_0(· | x) with a normal density having mean 0 and standard deviation σ. It is well known that the convolution φ_σ * f_0(· | x) can approximate f_0(· | x) arbitrarily closely for each x as the bandwidth σ → 0. More precisely, for f_0(· | x) ∈ L_p(λ) with p ≥ 1 for each x, ‖φ_σ * f_0(· | x) − f_0(· | x)‖_{p,λ} → 0 as σ → 0. Furthermore, the stronger result ‖φ_σ * f_0(· | x) − f_0(· | x)‖ = O(σ²) for each x holds if f_0(· | x) is compactly supported for each x. However, convergence pointwise in x is a weaker notion, and one might hope for stronger notions of convergence involving a joint topology defined on [0, 1]^p × \mathcal{Y}; refer to Section 3.2 for details. Thus the above model can approximate a large collection of conditional densities {f_0(y | x)} by letting µ(t, x) concentrate around the conditional quantile function F_0^{-1}(t | x) through the assigned Gaussian process prior. A further advantage of this formulation is the feasibility of efficient posterior computation based on a uni-dimensional griddy Gibbs algorithm.
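The convolution identity (9) is easy to verify numerically. In the sketch below, f_0(· | x) is taken to be an illustrative N(x, 0.25) density, µ_0(t, x) = F_0^{-1}(t | x) is its conditional quantile function, and the one-dimensional grid average of φ_σ{y − µ_0(t, x)} is compared with the convolution φ_σ * f_0(· | x) computed by quadrature; both are close to f_0(y | x) for small σ. The choices of f_0, σ and grid size are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

sigma = 0.05
grid = (np.arange(2000) + 0.5) / 2000        # one-dimensional grid on (0, 1)

def f0(y, x):
    """Illustrative true conditional density f_0(y | x) = N(y; x, 0.5^2)."""
    return norm.pdf(y, loc=x, scale=0.5)

def F0_inv(t, x):
    """Conditional quantile function F_0^{-1}(t | x), i.e. mu_0(t, x)."""
    return norm.ppf(t, loc=x, scale=0.5)

def f_mu0_sigma(y, x):
    """Left-hand side of (9): int_0^1 phi_sigma(y - F_0^{-1}(t | x)) dt, by grid average."""
    return norm.pdf(y - F0_inv(grid, x), scale=sigma).mean()

def convolution(y, x):
    """Right-hand side of (9): (phi_sigma * f_0(. | x))(y), by quadrature in z."""
    z = np.linspace(x - 5.0, x + 5.0, 4000)
    return np.trapz(norm.pdf(y - z, scale=sigma) * f0(z, x), z)

y, x = 0.3, 0.6
print(f_mu0_sigma(y, x), convolution(y, x), f0(y, x))   # first two agree; both near f0 for small sigma
```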

3. THEORY

3.1. Notation

Throughout the paper, Lebesgue measure on R or R^p is denoted by λ. The supremum and L_1 norms are denoted by ‖·‖_∞ and ‖·‖_1 respectively. The indicator function of a set B is denoted by 1_B. Let L_p(ν, M) denote the space of real-valued measurable functions defined on M with ν-integrable pth absolute power. For two density functions f and g, the Kullback-Leibler divergence is given by K(f, g) = \int \log(f/g)\, f\, d\lambda. A ball of radius r with centre x_0 relative to a metric d is denoted B(x_0, r; d). The diameter of a bounded metric space M relative to a metric d is sup{d(x, y) : x, y ∈ M}. The ε-covering number N(ε, M, d) of a semimetric space M relative to the semimetric d is the minimal number of balls of radius ε needed to cover M. The logarithm of the covering number is referred to as the entropy. δ_0 stands for a distribution degenerate at 0 and supp(ν) for the support of a measure ν.

3.2. Notions of neighborhoods in conditional density estimation

If we define h(x, y) = q(x) f(y | x) and h_0(x, y) = q(x) f_0(y | x), then h, h_0 ∈ F. Throughout the paper, h_0 is assumed to be a fixed density in F, which we alternatively refer to as the true data-generating density, and {f_0(· | x), x ∈ \mathcal{X}} is referred to as the true conditional density. The density q(x) is needed only for the theoretical investigation; in practice, we do not need to know it or learn it from the data.

We define the weak, ν-integrated L_1 and sup-L_1 neighborhoods of the collection of conditional densities {f_0(· | x), x ∈ \mathcal{X}} as follows. A sub-base of a weak neighborhood is defined as

    W_{\epsilon,g}(f_0) = \Big\{ f \in F_d : \Big| \int_{\mathcal{X}} \int_{\mathcal{Y}} g\, h - \int_{\mathcal{X}} \int_{\mathcal{Y}} g\, h_0 \Big| < \epsilon \Big\},    (10)

for a bounded continuous function g : \mathcal{Y} × \mathcal{X} → R. A weak neighborhood base is formed by finite intersections of neighborhoods of the type (10). Define a ν-integrated L_1 neighborhood

    S_\epsilon(f_0; \nu) = \Big\{ f \in F_d : \int \| f(\cdot \mid x) - f_0(\cdot \mid x) \|_1\, \nu(x)\, dx < \epsilon \Big\},    (11)

for any measure ν with supp(ν) ⊂ \mathcal{X}. Observe that under the topology in (11), F_d can be identified with a closed subset of L_1(λ × ν, \mathcal{Y} × supp(ν)), making it a complete separable metric space. For f_1, f_2 ∈ F_d, let d_{SS}(f_1, f_2) = \sup_{x \in \mathcal{X}} \| f_1(\cdot \mid x) - f_2(\cdot \mid x) \|_1 and define the sup-L_1 neighborhood

    SS_\epsilon(f_0) = \{ f \in F_d : d_{SS}(f, f_0) < \epsilon \}.    (12)

Under the sup-L_1 topology, F_d can be viewed as a closed subset of the separable Banach space of norm-bounded continuous functions from \mathcal{X} to L_1(λ, \mathcal{Y}), and hence is a complete separable metric space. Thus measurability issues do not arise with these topologies.
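As a small numerical illustration of the distances underlying (11) and (12), the sketch below approximates the q-integrated L_1 distance ∫ ‖f(· | x) − f_0(· | x)‖_1 q(x) dx and the sup-L_1 distance d_SS(f, f_0) on finite grids for two illustrative conditional densities; the densities, the uniform covariate density q and the grids are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

y_grid = np.linspace(-4.0, 5.0, 2000)
x_grid = np.linspace(0.0, 1.0, 200)

f  = lambda y, x: norm.pdf(y, loc=x, scale=0.6)   # candidate conditional density (assumed)
f0 = lambda y, x: norm.pdf(y, loc=x, scale=0.5)   # "true" conditional density (assumed)
q  = lambda x: np.ones_like(x)                    # covariate density: uniform on [0, 1]

# L1 distance between f(. | x) and f0(. | x) for each x on the grid.
l1_x = np.array([np.trapz(np.abs(f(y_grid, x) - f0(y_grid, x)), y_grid) for x in x_grid])

integrated_l1 = np.trapz(l1_x * q(x_grid), x_grid)   # distance used in (11) with nu = Q
sup_l1 = l1_x.max()                                   # d_SS(f, f0) used in (12)
print(integrated_l1, sup_l1)
```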

3.3. Posterior convergence rate theorem in the compact case

We recall the definition of the rate of posterior convergence.

DEFINITION 1. The posterior Π_\mathcal{X}(· | y^n, x^n) is said to contract at a rate ε_n → 0 in the ν-integrated L_1 topology, or strongly in the sup-L_1 topology, at {f_0(· | x), x ∈ \mathcal{X}} if Π_\mathcal{X}(U^c_{Mε_n} | y^n, x^n) → 0 a.s. for large enough M > 0, with U_{Mε_n} = S_{Mε_n}(f_0; ν) and SS_{Mε_n}(f_0) respectively.

Here almost sure consistency at {f_0(· | x), x ∈ \mathcal{X}} means that the posterior distribution concentrates around a neighborhood of {f_0(· | x), x ∈ \mathcal{X}} for almost every sequence {y_i, x_i}_{i=1}^∞ generated by i.i.d. sampling from the joint density q(x) f_0(y | x).

Studying rates of convergence in density regression models becomes more challenging, as we need to assume mixed smoothness in y and x. Although posterior contraction rates have been studied widely in mean regression, logistic regression and density estimation models, results on convergence rates for density regression models are lacking. We study posterior convergence rates of the density regression model (6) assuming that the true conditional density has different smoothness across y and x. Assuming f_0(y | x) to be compactly supported and twice and thrice continuously differentiable in y and x respectively, we obtain a rate of n^{-1/3}(\log n)^{t} using a Gaussian process prior for µ having a single inverse-gamma bandwidth across the different dimensions. The optimal rate in such a mixed smoothness class is n^{-6/17}. The slightly slower rate n^{-1/3} is the drawback of using an isotropic Gaussian process to model an anisotropic function.

First consider the simple case in which the true conditional density f_0(y | x) is compactly supported for each x ∈ [0, 1]^p and p = 1. Define µ_0(t, x) = F_0^{-1}(t | x).

Assumption 1. For each x ∈ [0, 1]^p, f_0(· | x) is compactly supported. Without loss of generality we assume that the support is [0, 1]. Hence µ_0 : [0, 1]^2 → R.

Assumption 2. f_0(y | x) is twice continuously differentiable in y for any fixed x ∈ [0, 1]^p and thrice continuously differentiable in x for any fixed y ∈ \mathcal{Y}. Note that under this assumption µ_0 ∈ C^3([0, 1]^2). Also,

    \sup_{x \in [0,1]^p} \int_{\mathcal{Y}} \{\partial^2 f_0(y \mid x) / \partial y^2\}^2 / f_0(y \mid x)\, dy < \infty,    (13)

    \sup_{x \in [0,1]^p} \int_{\mathcal{Y}} \{\partial f_0(y \mid x) / \partial y\}^4 / f_0(y \mid x)\, dy < \infty.    (14)

Assumption 3. For any x ∈ [0, 1]^p, there exist 0 < m_x < M_x such that f_0(· | x) is bounded, non-decreasing on (−∞, m_x], bounded away from 0 on [m_x, M_x] and non-increasing on [M_x, ∞). In addition,

    \inf_{x \in [0,1]^p} (M_x - m_x) > 0.    (15)

Assumption 4. µ follows a centered Gaussian process on [0, 1]^2, denoted GP(0, c), with a squared exponential covariance kernel c(·, ·; A) and a gamma prior for the inverse-bandwidth A. Thus

    c(t, s; A) = e^{-A \| t - s \|^2}, \quad t, s \in [0, 1]^2, \quad A \sim Ga(p, q).

Assumption 5. σ ~ IG(a_σ, b_σ).

THEOREM 1. If the true conditional density f_0 satisfies Assumptions 1, 2 and 3 and the prior satisfies Assumptions 4 and 5, the posterior rate of convergence in the q-integrated L_1 topology is

    \epsilon_n = n^{-1/3} (\log n)^{t_0},    (16)

where t_0 is a known constant.

Remark 1. The optimal effective smoothness α_e for this problem satisfies 1/α_e = 1/2 + 1/3, and the corresponding minimax optimal rate of convergence is n^{-6/17}. It is important to note that the obtained rate of convergence is slower only by a very small factor.
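For completeness, one way to recover the benchmark rate n^{-6/17} in Remark 1 is via the standard minimax formula for anisotropic smoothness; the particular form of the formula used below is an assumption about the convention intended here.

```latex
% With smoothness alpha_1 = 2 in y and alpha_2 = 3 in x (p = 1), the anisotropic
% minimax rate n^{-1/(2 + 1/alpha_1 + 1/alpha_2)} gives
\[
  n^{-1/(2 + 1/2 + 1/3)} = n^{-1/(17/6)} = n^{-6/17}.
\]
% Equivalently, with effective smoothness defined by 1/alpha_e = 1/alpha_1 + 1/alpha_2 = 5/6,
% i.e. alpha_e = 6/5, the rate can be written as n^{-alpha_e/(2 alpha_e + 1)} = n^{-6/17}.
```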

REFERENCES

CHUNG, Y. & DUNSON, D. (2009). Nonparametric Bayes conditional distribution modeling with variable selection. Journal of the American Statistical Association 104.
DUNSON, D. & PARK, J. (2008). Kernel stick-breaking processes. Biometrika 95.
DUNSON, D., PILLAI, N. & PARK, J. (2007). Bayesian density regression. Journal of the Royal Statistical Society, Series B 69.
FERGUSON, T. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1.
FERGUSON, T. (1974). Prior distributions on spaces of probability measures. The Annals of Statistics 2.
GRIFFIN, J. & STEEL, M. (2006). Order-based dependent Dirichlet processes. Journal of the American Statistical Association 101.
GRIFFIN, J. & STEEL, M. (2008). Bayesian nonparametric modelling with the Dirichlet process regression smoother. Statistica Sinica (to appear).
KUNDU, S. & DUNSON, D. (2011). Single factor transformation priors for density regression. DSS Discussion Series.
LENK, P. (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. Journal of the American Statistical Association 83.
LENK, P. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika 78, 531.
MACEACHERN, S. (1999). Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science.
MÜLLER, P., ERKANLI, A. & WEST, M. (1996). Bayesian curve fitting using multivariate normal mixtures. Biometrika 83.
NORETS, A. & PELENIS, J. (2010). Posterior consistency in conditional distribution estimation by covariate dependent mixtures. Unpublished manuscript, Princeton Univ.
PATI, D., BHATTACHARYA, A. & DUNSON, D. (2011). Posterior convergence rates in non-linear latent variable models. arXiv preprint (submitted to Bernoulli).
PATI, D., DUNSON, D. & TOKDAR, S. (2012). Posterior consistency in conditional distribution estimation. Journal of Multivariate Analysis (submitted).
TANG, Y. & GHOSAL, S. (2007a). A consistent nonparametric Bayesian procedure for estimating autoregressive conditional densities. Computational Statistics & Data Analysis 51.
TANG, Y. & GHOSAL, S. (2007b). Posterior consistency of Dirichlet mixtures for estimating a transition density. Journal of Statistical Planning and Inference 137.
TOKDAR, S. & GHOSH, J. (2007). Posterior consistency of logistic Gaussian process priors in density estimation. Journal of Statistical Planning and Inference 137.
TOKDAR, S., ZHU, Y. & GHOSH, J. (2010a). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis 5.
TOKDAR, S., ZHU, Y. & GHOSH, J. (2010b). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis 5.
VAN DER VAART, A. & VAN ZANTEN, J. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. The Annals of Statistics 36.
VAN DER VAART, A. & VAN ZANTEN, J. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. The Annals of Statistics 37.
WALKER, S. (2011). Posterior sampling when the normalizing constant is unknown. Communications in Statistics 40.
YOON, J. (2009). Bayesian analysis of conditional density functions: a limited information approach. Unpublished manuscript, Claremont McKenna College.

[Received January. Revised June 2010]
