Nonparametric Bayes Uncertainty Quantification
1 Nonparametric Bayes Uncertainty Quantification David Dunson Department of Statistical Science, Duke University Funded by NIH R01-ES017240, R01-ES & ONR
2 Review of Bayes; Intro to Nonparametric Bayes; Density estimation; More advanced applications
3 Bayesian modeling Provides a probabilistic framework for characterizing uncertainty Components of Bayesian inference Prior distribution: a probability distribution characterizing uncertainty in model parameters θ Likelihood function: sampling distribution characterizing uncertainty in the measurements conditionally on θ Loss function: quantifies price paid for errors in decisions
4 Components of Bayes Prior distribution p(θ) characterizes uncertainty in θ prior to observing current data y. p(θ) is chosen to be one's best guess based on previous experiments & knowledge in the field (subjective Bayes). Alternatively, use a default approach with a prior that is flat or non-informative in some sense (objective Bayes). The prior is then updated with the likelihood p(y | θ) via Bayes' Theorem, $$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{\int_\Theta p(y \mid \tilde{\theta})\,p(\tilde{\theta})\,d\tilde{\theta}},$$ with Θ = parameter space
5 Comments on the Posterior Posterior distribution: p(θ | y) quantifies the current state of knowledge about θ. p(θ | y) is a full probability distribution - one can obtain a mean, covariance, intervals/regions characterizing uncertainty, etc. The process of updating the prior with the likelihood to obtain the posterior is known as Bayesian updating. This calculation can be challenging due to the normalizing constant in the denominator of $$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{\int_\Theta p(y \mid \tilde{\theta})\,p(\tilde{\theta})\,d\tilde{\theta}}.$$ In conjugate models we can calculate p(θ | y) analytically, but in more complex models Monte Carlo & other methods are used
6 Simple example - estimating a population proportion Suppose θ ∈ (0, 1) is the population proportion of individuals with diabetes in the US. A prior distribution for θ would correspond to some distribution that distributes probability across (0, 1). A very precise prior corresponding to abundant prior knowledge would be concentrated tightly in a small sub-interval of (0, 1). A vague prior may be distributed widely across (0, 1) - e.g., a uniform distribution would be one choice
7 Some possible prior densities [figure: beta(1,1), beta(1,11), and beta(2,10) densities p(θ)]
8 Collecting data & calculating likelihood To update your prior knowledge & learn more about θ, one can collect data y = (y_1, ..., y_n) that relate to θ. The likelihood of the data is denoted L(y; θ). For example, suppose θ is the population proportion of type II diabetes in year olds. A random sample of n individuals is surveyed & asked type II diabetes status, with y_i = 1 for individuals reporting disease & y_i = 0 otherwise. The likelihood is then $$L(y; \theta) = \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i}.$$
9 Beta-binomial example The prior is $\pi(\theta) = B(a,b)^{-1}\theta^{a-1}(1-\theta)^{b-1}$. The likelihood is $L(y;\theta) = \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i}$. The posterior is then $$p(\theta \mid y) = \frac{B(a,b)^{-1}\theta^{a-1}(1-\theta)^{b-1}\prod_{i=1}^{n}\theta^{y_i}(1-\theta)^{1-y_i}}{\int_0^1 B(a,b)^{-1}\theta^{a-1}(1-\theta)^{b-1}\prod_{i=1}^{n}\theta^{y_i}(1-\theta)^{1-y_i}\,d\theta} = c(a,b,y)\,\theta^{a+\sum_{i=1}^{n}y_i-1}(1-\theta)^{b+n-\sum_{i=1}^{n}y_i-1},$$ i.e., $\mathrm{beta}\big(a + n\bar{y},\, b + n(1-\bar{y})\big)$, where $c(a,b,y)$ is a function of $(a,b,y)$ but not of θ. Updating a beta prior with a Bernoulli likelihood leads to a beta posterior - we have conjugacy
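To make the conjugate update concrete, here is a minimal sketch in Python; the prior values a = 2, b = 10, the sample size, and the true θ are made-up choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: beta(2, 10) prior, n = 50 survey responses
a, b = 2.0, 10.0
theta_true = 0.10  # illustrative "true" proportion used only to simulate data
y = rng.binomial(1, theta_true, size=50)

# Conjugate update: posterior is beta(a + sum(y), b + n - sum(y))
a_post = a + y.sum()
b_post = b + len(y) - y.sum()
posterior = stats.beta(a_post, b_post)

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```

Because the posterior is available in closed form, point estimates and credible intervals come directly from the beta distribution with no sampling.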
10 Generalizing to more interesting settings Although this is a super simple example, the same machinery of Bayes' rule can be used in much more complex settings. In particular, the unknown parameters θ in the model can be conceptually essentially anything: θ can be a vector, matrix, tensor, a function or surface or shape, an unknown density, missing data, etc. Generalizing Bayesian inference to such complex settings can be challenging, but there is a rich literature to leverage. Today I'll be focusing on introducing nonparametric Bayes approaches & some corresponding tools
11 What is nonparametric Bayes? Nonparametric (np) Bayes is fundamentally different from classical nonparametric methods, which are typically based on ranks, etc. One defines a fully generative probabilistic model for the available data. However, unlike in parametric models, some of the model unknowns are infinite-dimensional. For example, there may be unknown functions or densities involved in the model. Also, np Bayes is defined by a large support property guaranteeing flexibility
12 Simple example - Function estimation One of the canonical examples is the simple setting in which $$y_i = \mu(x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2),$$ where µ : X → R is an unknown function, x_i = one or more inputs for observation i, y_i = output for observation i, ɛ_i = measurement error, and σ² = measurement error variance
13 Function estimation - continued In this model the unknowns are θ = {µ, σ²}. For the measurement error variance σ², the usual prior would correspond to an inverse-gamma distribution. However, µ(·) is an infinite-dimensional unknown, being defined at every point in X. The likelihood function is simply $$\prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2}\exp\Big[-\frac{1}{2\sigma^2}\{y_i - \mu(x_i)\}^2\Big],$$ which is parametric. The nonparametric part is µ - how to choose an np Bayes prior?
14 What is np Bayes? From a nonparametric Bayes perspective, there are several key properties for the prior. Large support: this means that the prior can generate functions arbitrarily close to any function in a large class. Interpretability: the prior shouldn't be entirely a black box, but one should be able to put in their prior beliefs about where & how much the prior is concentrated. Computability: unless we can conduct posterior computation (at least approximately) without too much headache, it's not very useful. There is also an increasing literature showing more involved properties - e.g., posterior consistency, minimax optimal rates, etc.
15 Some examples Motivated by these properties & general practical performance, there are two canonical priors that are most broadly used in Bayesian nonparametrics Gaussian processes (GPs): provide a broadly useful prior for unknown functions & surfaces (e.g., µ in the above example) Dirichlet processes (DPs): a prior for discrete random measures providing a building block for priors for densities, clustering, etc There is also a rich literature proposing generalizations such as beta processes, kernel stick-breaking processes, etc
16 Bayes density estimation Introduced by Ferguson (1973) as a nonparametric prior for an unknown distribution that satisfies the three desirable properties listed above. Suppose we have the simple setting in which y_i ∼ f, i = 1, ..., n, with f an unknown density on the real line R. Taking a Bayesian nonparametric UQ approach, we'd like to choose a prior for f - how to do this?
17 Random probability measures (RPMs) In defining priors for unknown distributions, it is particularly convenient to work with probability measures A given distribution (e.g., N(0,1)) has a corresponding probability measure & samples from that distribution obey that probability law. To allow uncertainty in whether samples are N(0, 1) or come from some other unknown distribution, we can choose a random probability measure (RPM). Each realization from the RPM gives a different probability measure & hence different distribution of the samples. A parametric model would always yield distributions in a particular parametric class (e.g., Gaussian) while a nonparametric model can generate PMs close to any PM in a broad class.
18 Probability Measures Let (Ω, B, P) denote a probability space, with Ω the sample space, B the Borel σ-algebra of subsets of Ω, and P a probability measure. For example, we may have Ω = R corresponding to the real line, with P a probability measure corresponding to a density f with respect to a reference measure µ. For continuous univariate densities, µ corresponds to the Lebesgue measure.
19 Random Probability Measures (RPMs) Let P denote the set of all probability measures P on (Ω, B). To allow P to be unknown, we let P ∼ π_P, where π_P is a prior over the set of probability measures. P is then a random probability measure. By allowing P to be unknown, we automatically allow the corresponding distribution of the data to be unknown. How to choose π_P?
20 A simple motivating model - Bayesian histograms The goal is to obtain a Bayes estimate of the density f with y_i ∼ f. From a frequentist perspective, a very common strategy is to rely on a simple histogram. Assume for simplicity we have pre-specified knots ξ = (ξ_0, ξ_1, ..., ξ_k), with ξ_0 < ξ_1 < ... < ξ_{k-1} < ξ_k and y_i ∈ [ξ_0, ξ_k]. The model for the density is as follows: $$f(y) = \sum_{h=1}^{k} \pi_h \frac{1(\xi_{h-1} < y \le \xi_h)}{\xi_h - \xi_{h-1}}, \quad y \in \mathbb{R},$$ with π = (π_1, ..., π_k) an unknown probability vector
21 Choosing priors for probability vectors In the fixed knot model, the only unknown parameters are the probability weights π on the bins By choosing a prior for π we induce a prior on the density f Earlier we used the beta prior for a single probability but now we have a vector of probabilities We need to choose a probability distribution on the simplex A simple choice that provides a multivariate generalization of the beta is the Dirichlet distribution
22 Dirichlet Prior Assume a Dirichlet(a_1, ..., a_k) prior for π, with density $$\frac{\Gamma(\sum_{h=1}^{k} a_h)}{\prod_{h=1}^{k}\Gamma(a_h)} \prod_{h=1}^{k} \pi_h^{a_h - 1}.$$ The hyperparameter vector can be re-expressed as a = απ_0, where E(π) = π_0 = (a_1/\sum_h a_h, ..., a_k/\sum_h a_h) is the prior mean and α is a scale (the prior sample size). Note that an appealing aspect of histograms is that we can easily incorporate prior data & knowledge to elicit α and π_0 - very simple & interpretable, & previous data often come in the form of counts in bins. After choosing the prior, we use Bayesian updating to obtain the posterior distribution for π
23 Posterior distribution for bin probabilities The posterior distribution of π is calculated as $$p(\pi \mid y^n) \propto \prod_{h=1}^{k} \pi_h^{a_h - 1} \prod_{h=1}^{k} \prod_{i: y_i \in (\xi_{h-1}, \xi_h]} \frac{\pi_h}{\xi_h - \xi_{h-1}} \propto \prod_{h=1}^{k} \pi_h^{a_h + n_h - 1},$$ so that (π | y^n) ∼ Diri(a_1 + n_1, ..., a_k + n_k), where $n_h = \sum_i 1(\xi_{h-1} < y_i \le \xi_h)$. Hence, we have conjugacy and the posterior for the bin probabilities has a simple form. We can easily sample from the Dirichlet to obtain realizations from the induced posterior for the density f. These samples can be used to quantify uncertainty through point & interval estimates
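The conjugate Dirichlet update for the bin probabilities is easy to sketch in code. The knots, prior counts, and simulated data below are illustrative assumptions, not the talk's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 10 equal bins on [0, 1], uniform Dirichlet prior a_h = 1
knots = np.linspace(0.0, 1.0, 11)
a = np.ones(10)

# Simulated data as a stand-in for real observations
y = rng.beta(2.0, 5.0, size=100)

# Bin counts n_h and the conjugate update a_h -> a_h + n_h
n_h, _ = np.histogram(y, bins=knots)
a_post = a + n_h

# Posterior draws of pi, converted to density values pi_h / (bin width)
pi_samples = rng.dirichlet(a_post, size=2000)
widths = np.diff(knots)
density_samples = pi_samples / widths

# Pointwise posterior mean and 95% interval for f on each bin
post_mean_density = density_samples.mean(axis=0)
lower, upper = np.percentile(density_samples, [2.5, 97.5], axis=0)
```

Each Dirichlet draw induces one piecewise-constant density, so pointwise means and intervals fall out of simple Monte Carlo summaries of the draws.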
24 Simulation Experiment To evaluate the Bayes histogram method, I simulated data from a mixture of two betas, $$f(y) = 0.75\,\mathrm{beta}(y; 1, 5) + 0.25\,\mathrm{beta}(y; 20, 2).$$ n = 100 samples were obtained from this density. Assuming data between [0,1] and choosing 10 equally-spaced knots, I applied the Bayes histogram approach. The true density and Bayes posterior mean are plotted on the next slide
25 Bayes Histogram Estimate for Simulation Example [figure: true density and posterior mean density vs y]
26 Some Comments The procedure is really easy in that we have conjugacy. Results are very sensitive to the knots, & allowing unknown knots is computationally demanding. Allows prior information to be included in frequentist histogram estimates easily. Can we eliminate sensitivity to the choice of bins by thinking of an RPM characterization & defining a prior for all possible bins?
27 Dirichlet processes (Ferguson, 1973; 1974) Let B_1, ..., B_k denote a partition of the sample space Ω - e.g., histogram bins. Let P correspond to a random probability measure (RPM) that assigns probability to any subset B ⊂ Ω. Then we could let $$\{P(B_1), \ldots, P(B_k)\} \sim \mathrm{Diri}\big(\alpha P_0(B_1), \ldots, \alpha P_0(B_k)\big). \quad (1)$$ P_0 is a base probability measure providing an initial guess at P & α is a prior concentration parameter
28 Dirichlet → Dirichlet process However, we don't want to be sensitive to the bins B_1, ..., B_k or to k. It would be really cool if there was an RPM P such that (1) were satisfied for all B_1, ..., B_k and all k. This RPM does indeed exist (as shown by Ferguson) & corresponds to a Dirichlet process (DP). The DP has many wonderful properties making it widely used in practice. For example, $$E\{P(B)\} = P_0(B) \quad \text{and} \quad V\{P(B)\} = \frac{P_0(B)\{1 - P_0(B)\}}{1 + \alpha},$$ for all B ∈ B, with B the Borel σ-algebra of subsets of Ω. Hence, the prior is centered on P_0 and α controls the variance
29 Some basic properties of the DP Using the notation P ∼ DP(αP_0), the DP prior has large support - assigning probability to arbitrarily small balls around any PM Q over Ω. Also, if we let $y_i \overset{iid}{\sim} P$ then we obtain conjugacy, so that $$\big(P \mid y_1, \ldots, y_n\big) \sim \mathrm{DP}\Big(\alpha P_0 + \sum_i \delta_{y_i}\Big).$$ The posterior is also a DP, but updated to add the empirical measure $\sum_i \delta_{y_i}$ to the base measure αP_0. The updated precision is α + n, so α is a prior sample size. The posterior expectation of P is $$E\{P(B) \mid y^n\} = \Big(\frac{\alpha}{\alpha + n}\Big) P_0(B) + \Big(\frac{n}{\alpha + n}\Big)\frac{1}{n}\sum_i \delta_{y_i}(B).$$
30 DP → DPMs Realizations from the DP are almost surely discrete, having masses at the observed data points. If we convert the probability measure to a cumulative distribution function, it will have jumps at all data points. Not ideal for characterizing continuous data, but really good as a prior for a mixing measure. In particular, instead of using the DP directly as a prior for (the probability measure corresponding to) the distribution that generated the data, let $$f(y) = \int K(y; \theta)\,dP(\theta), \quad P \sim \mathrm{DP}(\alpha P_0), \quad (2)$$ with K(·; θ) a parametric density on Ω parameterized by θ & P an unknown mixing measure
31 Samples from the Dirichlet process [figure: realizations of P for several values of the precision α]
32 Dirichlet process mixtures (DPMs) DP mixtures provide an incredibly powerful & useful framework for UQ in a very broad variety of models. Although expression (2) focuses on a simple setting involving univariate density estimation, we can take any parametric model & develop a more realistic & flexible model by DP mixing the whole model or one or more components. This can allow for unknown residual densities that change in shape and variance with predictors, flexible modeling of large tabular data & many other settings. At this point I'm going to provide some basic details on computation & inference in simple DPMs & then I'll provide a more involved example illustrating what is possible
33 Density estimation via mixtures Considering the DPM in expression (2) above, it seems we have an intractable integral to deal with. However, Sethuraman (1994) provided a constructive representation of the DP showing: $$P \sim \mathrm{DP}(\alpha P_0) \iff P = \sum_{h=1}^{\infty} \pi_h \delta_{\theta_h}, \quad \theta_h \overset{iid}{\sim} P_0,$$ with $\pi_h = V_h \prod_{l<h}(1 - V_l)$, $V_h \sim \mathrm{Be}(1, \alpha)$. This is referred to as the stick-breaking representation & implies that (2) can be expressed as $$f(y) = \sum_{h=1}^{\infty} \pi_h K(y; \theta_h), \quad (3)$$ which is a discrete mixture model.
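A truncated version of Sethuraman's construction is straightforward to simulate. The base measure N(0, 1), the value of α, and the truncation level below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_dp_stick_breaking(alpha, base_sampler, k=200):
    """Draw one truncated realization P = sum_h pi_h * delta_{theta_h} from DP(alpha * P0)."""
    v = rng.beta(1.0, alpha, size=k)
    v[-1] = 1.0  # force the truncated weights to sum exactly to one
    # pi_h = V_h * prod_{l < h} (1 - V_l)
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    theta = base_sampler(k)  # atoms drawn iid from the base measure P0
    return pi, theta

# Base measure P0 = N(0, 1); alpha = 2 is an arbitrary illustrative precision
pi, theta = sample_dp_stick_breaking(alpha=2.0, base_sampler=lambda k: rng.normal(size=k))
```

Smaller α concentrates mass on a few atoms (nearly degenerate draws), while larger α spreads the weights out, matching the role of α as a precision parameter.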
34 Approximating as finite mixtures Although not necessary using slice sampling & other recent(ish) samplers, (3) can be approximated accurately as a finite mixture. This is because the weights {π_h} have a prior that strongly favors stochastic decreases in the index h. After the first small number of weights, the prior is concentrated near zero for the remaining ones. Hence, we can truncate the DP using k components as an upper bound & the extra components will be effectively deleted. We consider a motivating application to make these ideas concrete
35 Gestational Length vs DDE (mg/l) [figure: scatter plot of gestational age at delivery against DDE]
36 Gestational Length Densities within DDE Categories [figure: gestational length densities for DDE <15, [15,30), [30,45), [45,60), and >60 mg/l]
37 Finite mixture application We illustrate finite mixtures using a simple location mixture of Gaussians, $$f(y) = \sum_{h=1}^{k} \pi_h N(y; \mu_h, \tau^{-1}).$$ We apply the model to data on length of gestation. Preterm birth is defined as delivery occurring prior to 37 weeks of completed gestation. The cutoff is somewhat arbitrary, & the shorter the length of gestation, the more adverse the associated health effects. It is appealing to model the distribution of gestational age at delivery as unknown & then allow predictors to impact this distribution
38 Comments on Gestational Length Data Data are non-Gaussian with a left skew. It is not straightforward to transform the data to approximate normality - a different transformation would be needed within each DDE category. First question: how to characterize the gestational age at delivery distribution without considering predictors?
39 Mixture Models Initially ignoring DDE Letting y_i = gestational age at delivery for woman i, $$f(y_i) = \int N(y_i; \mu, \sigma^2)\,dG(\mu, \sigma^2),$$ where G = mixing distribution for θ = (µ, σ²). Mixtures of normals can approximate any smooth density. A location Gaussian mixture with k components is one possibility: $$f(y_i) = \sum_{h=1}^{k} p_h N(y_i; \mu_h, \sigma^2).$$
40 Mixture components for gestational age at delivery [figure: three components accounting for 86%, 12%, and 2% of deliveries, plotted against gestational week of delivery]
41 Mixture-based density of gestational age at delivery [figure: estimated density vs gestational week at delivery]
42 Approximate DPM computation The finite mixture of normals can be equivalently expressed as $$y_i \sim N(\mu_{S_i}, \tau_{S_i}^{-1}), \quad S_i \sim \sum_{h=1}^{k} \pi_h \delta_h,$$ where δ_h = probability measure concentrated at the integer h, and S_i ∈ {1, ..., k} indexes the mixture component for subject i. A Bayesian specification is completed with priors $$\pi = (\pi_1, \ldots, \pi_k) \sim \mathrm{Dirichlet}(a_1, \ldots, a_k), \quad (\mu_h, \tau_h) \sim N(\mu_h; \mu_0, \kappa\tau_h^{-1})\,\mathrm{Ga}(\tau_h; a_\tau, b_\tau), \quad h = 1, \ldots, k,$$ with a_j = α/k approximating the Dirichlet process (as k increases).
43 Posterior Computation via Gibbs Sampling 1. Update S_i from its multinomial conditional posterior with $$\Pr(S_i = h \mid -) = \frac{\pi_h N(y_i; \mu_h, \tau_h^{-1})}{\sum_{l=1}^{k} \pi_l N(y_i; \mu_l, \tau_l^{-1})}, \quad h = 1, \ldots, k.$$ 2. Update (µ_h, τ_h) from its conditional posterior $$(\mu_h, \tau_h \mid -) \sim N(\mu_h; \hat{\mu}_h, \hat{\kappa}_h \tau_h^{-1})\,\mathrm{Ga}(\tau_h; \hat{a}_{\tau h}, \hat{b}_{\tau h}),$$ $$\hat{\kappa}_h = (\kappa^{-1} + n_h)^{-1}, \quad \hat{\mu}_h = \hat{\kappa}_h(\kappa^{-1}\mu_0 + n_h \bar{y}_h), \quad \hat{a}_{\tau h} = a_\tau + n_h/2,$$ $$\hat{b}_{\tau h} = b_\tau + \frac{1}{2}\Big\{\sum_{i: S_i = h}(y_i - \bar{y}_h)^2 + \Big(\frac{n_h}{1 + \kappa n_h}\Big)(\bar{y}_h - \mu_0)^2\Big\}.$$ 3. Update (π | -) ∼ Dir(a_1 + n_1, ..., a_k + n_k)
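The three Gibbs steps above can be sketched as follows. The simulated data, hyperparameter values, seed, and truncation level k are all illustrative assumptions; the normal-gamma update is the standard one, written with prior precision λ₀ = 1/κ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated stand-in data: skewed two-component normal mixture
y = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 50)])
n, k = len(y), 5

# Illustrative hyperparameters
alpha, mu0, kappa, a_tau, b_tau = 1.0, 0.0, 1.0, 1.0, 1.0

S = rng.integers(0, k, size=n)
mu, tau, pi = rng.normal(size=k), np.ones(k), np.full(k, 1.0 / k)

for it in range(200):
    # Step 1: allocations S_i from their multinomial full conditionals
    logp = np.log(pi) + stats.norm.logpdf(y[:, None], mu, 1.0 / np.sqrt(tau))
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    S = (p.cumsum(axis=1) > rng.random(n)[:, None]).argmax(axis=1)

    # Step 2: (mu_h, tau_h) from their normal-gamma full conditionals
    lam0 = 1.0 / kappa
    for h in range(k):
        yh = y[S == h]
        nh = len(yh)
        ybar = yh.mean() if nh > 0 else 0.0
        lam_h = lam0 + nh
        mu_n = (lam0 * mu0 + nh * ybar) / lam_h
        a_h = a_tau + nh / 2.0
        b_h = b_tau + 0.5 * (((yh - ybar) ** 2).sum()
                             + lam0 * nh * (ybar - mu0) ** 2 / lam_h)
        tau[h] = rng.gamma(a_h, 1.0 / b_h)
        mu[h] = rng.normal(mu_n, 1.0 / np.sqrt(lam_h * tau[h]))

    # Step 3: weights pi from their Dirichlet full conditional
    pi = rng.dirichlet(alpha / k + np.bincount(S, minlength=k))
```

Monitoring the mixture density on a grid at each iteration (after burn-in) gives the posterior draws used for point and interval estimates of f.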
44 Some Comments The Gibbs sampler is trivial to implement. Discarding a burn-in, monitor $$f(y) = \sum_{h=1}^{k} \pi_h N(y; \mu_h, \tau_h^{-1})$$ for a large number of iterations & a dense grid of y values. The Bayes estimate of f(y) under squared error loss averages the samples. Can also obtain 95% pointwise intervals for the unknown density
45 Nonparametric residual densities The above focus was on univariate densities, but the approach is very easy to generalize to other settings. For example, suppose we have the nonparametric regression model: $$y_i = \mu(x_i) + \epsilon_i, \quad \epsilon_i \sim f.$$ Then, we can use a DPM to allow the residual density to have an unknown form - allowing possible skewness & multimodality. Due to the modular nature of MCMC, the computational steps for the DPM part of the model are essentially no different from those described above
46 Nonparametric hierarchical models In hierarchical probability models, random effects are often assigned Gaussian distributions. This doesn't account for uncertainty in the random effect distributions. We can use a DPM to allow unknown distributions of random effects. Such a direction can be applied well beyond settings involving univariate continuous responses
47 Probabilistic tensor factorizations We instead propose a low rank & sparse factorization of $$\mathrm{pr}(y_{i1} = c_1, \ldots, y_{ip} = c_p) = \pi_{c_1 \cdots c_p},$$ with these probabilities forming an array Π = {π_{c_1 \cdots c_p}}. Express the array as a weighted average of rank-one arrays, $$\pi_{c_1 \cdots c_p} = \sum_{h=1}^{k} \nu_h \prod_{j=1}^{p} \lambda^{(j)}_{h c_j},$$ with appropriate priors on the components. This leads to a simple & scalable approach - allowing uncertainty in rank & other parameters in the factorization
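The Parafac construction of the probability array can be sketched directly. The dimensions, rank, and Dirichlet draws below are illustrative stand-ins, not the priors or posteriors from the talk:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sizes: p = 3 categorical variables with 4, 3, 2 levels; rank k = 2
levels = [4, 3, 2]
k = 2

# Component weights nu and per-variable conditional pmfs lambda^{(j)}_h
nu = rng.dirichlet(np.ones(k))
lam = [rng.dirichlet(np.ones(d), size=k) for d in levels]  # lam[j][h] is a pmf over variable j

# Build pi_{c1 c2 c3} = sum_h nu_h * lambda^{(1)}_{h c1} lambda^{(2)}_{h c2} lambda^{(3)}_{h c3}
pi = sum(
    nu[h] * np.einsum("a,b,c->abc", lam[0][h], lam[1][h], lam[2][h])
    for h in range(k)
)
```

Because each rank-one term is an outer product of pmfs weighted by ν_h, the full array is itself a valid joint pmf, whatever the rank.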
48 High-dimensional contingency table analysis One application is nonparametric modeling of a high-dimensional pmf for categorical data y_i = (y_{i1}, ..., y_{ip}). Many different categorical items are recorded for the same subjects, and we have unknown dependence in the occurrences of these different categorical items. We could use a log-linear model, but flexible log-linear models aren't scalable beyond small p
49 Probabilistic Parafac Considering the Parafac factorization, $$\pi_{c_1 \cdots c_p} = \sum_{h=1}^{k} \nu_h \prod_{j=1}^{p} \lambda^{(j)}_{h c_j},$$ we can place a stick-breaking prior on the weights {ν_h}. Essentially a DPM of product multinomial distributions - automatically infers the rank of the tensor. Leads to a simple Gibbs sampling algorithm and a full probabilistic characterization of uncertainty in the elements of π
50 Beyond DPMs There are also rich classes of models that go well beyond DPMs. For example, in the premature delivery application from above & many other settings, we may want to model f(y | x). This conditional response density may not factorize as µ(x) + ɛ with ɛ iid. Instead we may need to characterize flexible changes in the mean, variance & shape of the density with predictors
51 Gestational Length vs DDE (mg/l) [figure: scatter plot of gestational age at delivery against DDE, repeated from slide 35]
52 Gestational Length Densities within DDE Categories [figure: gestational length densities by DDE category, repeated from slide 36]
53 Predictor-dependent RPMs One highly flexible method to characterize conditional densities lets $$f(y \mid x) = \int N\{y; \mu(x; \theta), \tau^{-1}\}\,dG_x(\theta, \tau),$$ where µ(x; θ) is a regression function parameterized by θ and G_X = {G_x, x ∈ X} is a mixing measure varying with predictors. Even if µ is linear, we obtain a fully flexible specification with a flexible enough prior for G_X
54 Kernel stick-breaking The kernel stick-breaking process is one popular prior for G_X. It generalizes the stick-breaking representation of the DP to include predictor dependence in the weights. It leads to consistent estimation of the conditional density function, & computation is essentially as easy as for a DPM
55 DDE & Gestational Age at Delivery Application We focus on the mixture model: $$f(y_i \mid x_i) = \int N(y_i; x_i'\beta_i, \sigma_i^2)\,dG_{x_i}(\beta_i, \sigma_i^2),$$ where y_i = gestational age at delivery in days and x_i = DDE (mg/l). The collection of mixture distributions, G_X = {G_x : x ∈ X}, is assigned an adaptive kernel mixture of DPs prior
56 Posterior Simulation Overview The MCMC algorithm was run for 10,000 iterations. Convergence and mixing were good. Fit was excellent based on a pivotal statistic-based approach (Johnson, 2006). We can estimate the gestational age density for any dose of DDE & dose response curves for any quantile
57 Results: DDE & Gestational Age at Delivery [figure: estimated relationship between gestational age at delivery and DDE (mg/l)]
58 Results: DDE & Gestational Age at Delivery [figure: estimated densities f(y | x) of gestational length at the 10th (12.57), 30th (18.69), 60th (28.44), 90th (53.72), and 99th (105.48) percentiles of DDE]
59 Results: DDE & Gestational Age at Delivery [figure: Pr(gestational length < 33), < 35, < 37, and < 40 weeks as functions of DDE (mg/l)]
60 Notes I have provided a very brief overview of nonparametric Bayes with an eye towards UQ, focusing on Dirichlet process mixtures, their applications & closely related models. Gaussian processes are at least as useful in UQ, if not more so - they provide another np Bayes prior. I assume O'Hagan will focus on the GP in his lectures, so I purposely avoided it
61 Some References Dunson & Park (2008). Kernel stick-breaking processes. Biometrika, 95. Dunson & Xing (2009). Nonparametric Bayes modeling of multivariate categorical data. JASA, 104. Gelman et al. (2013). Bayesian Data Analysis (BDA3). CRC Press. Contains chapters on np Bayes.
Introduction to Applied Bayesian Modeling ICPSR Day 4 Simple Priors Remember Bayes Law: Where P(A) is the prior probability of A Simple prior Recall the test for disease example where we specified the
More informationThe Bayesian approach to inverse problems
The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationBayesian Models in Machine Learning
Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of
More informationLecture 13 Fundamentals of Bayesian Inference
Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationShould all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?
Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationA Process over all Stationary Covariance Kernels
A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that
More informationA Fully Nonparametric Modeling Approach to. BNP Binary Regression
A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation
More informationIntroduction to Bayesian Methods
Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationBayesian isotonic density regression
Bayesian isotonic density regression Lianming Wang and David B. Dunson Biostatistics Branch, MD A3-3 National Institute of Environmental Health Sciences U.S. National Institutes of Health P.O. Box 33,
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationNonparametric Bayes regression and classification through mixtures of product kernels
Nonparametric Bayes regression and classification through mixtures of product kernels David B. Dunson & Abhishek Bhattacharya Department of Statistical Science Box 90251, Duke University Durham, NC 27708-0251,
More informationSTAT J535: Chapter 5: Classes of Bayesian Priors
STAT J535: Chapter 5: Classes of Bayesian Priors David B. Hitchcock E-Mail: hitchcock@stat.sc.edu Spring 2012 The Bayesian Prior A prior distribution must be specified in a Bayesian analysis. The choice
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationDS-GA 1002 Lecture notes 11 Fall Bayesian statistics
DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian
More informationNew Bayesian methods for model comparison
Back to the future New Bayesian methods for model comparison Murray Aitkin murray.aitkin@unimelb.edu.au Department of Mathematics and Statistics The University of Melbourne Australia Bayesian Model Comparison
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationBayesian Inference for Regression Parameters
Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 8: Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Based on slides by Sharon Goldwater October 14, 2016 Frank Keller Computational
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationST 740: Linear Models and Multivariate Normal Inference
ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /
More informationLecture 3. Univariate Bayesian inference: conjugate analysis
Summary Lecture 3. Univariate Bayesian inference: conjugate analysis 1. Posterior predictive distributions 2. Conjugate analysis for proportions 3. Posterior predictions for proportions 4. Conjugate analysis
More informationApproximate Bayesian Computation
Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown
More informationNPFL108 Bayesian inference. Introduction. Filip Jurčíček. Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic
NPFL108 Bayesian inference Introduction Filip Jurčíček Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic Home page: http://ufal.mff.cuni.cz/~jurcicek Version: 21/02/2014
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More informationUSEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*
USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* 3 Conditionals and marginals For Bayesian analysis it is very useful to understand how to write joint, marginal, and conditional distributions for the multivariate
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationLecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu
Lecture 16-17: Bayesian Nonparametrics I STAT 6474 Instructor: Hongxiao Zhu Plan for today Why Bayesian Nonparametrics? Dirichlet Distribution and Dirichlet Processes. 2 Parameter and Patterns Reference:
More informationInference for a Population Proportion
Al Nosedal. University of Toronto. November 11, 2015 Statistical inference is drawing conclusions about an entire population based on data in a sample drawn from that population. From both frequentist
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationBayesian nonparametrics
Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationIntroduction to Bayesian Methods. Introduction to Bayesian Methods p.1/??
to Bayesian Methods Introduction to Bayesian Methods p.1/?? We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study, in which the parameter
More informationStochastic Processes, Kernel Regression, Infinite Mixture Models
Stochastic Processes, Kernel Regression, Infinite Mixture Models Gabriel Huang (TA for Simon Lacoste-Julien) IFT 6269 : Probabilistic Graphical Models - Fall 2018 Stochastic Process = Random Function 2
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationChoosing among models
Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationBayesian Multivariate Logistic Regression
Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of
More informationGaussian Processes (10/16/13)
STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs
More informationBayesian Nonparametric Regression through Mixture Models
Bayesian Nonparametric Regression through Mixture Models Sara Wade Bocconi University Advisor: Sonia Petrone October 7, 2013 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression
More information