Nonparametric Bayes Uncertainty Quantification


1 Nonparametric Bayes Uncertainty Quantification David Dunson, Department of Statistical Science, Duke University. Funded from NIH R01-ES017240, R01-ES017436 & ONR

2 Review of Bayes; Intro to Nonparametric Bayes; Density estimation; More advanced applications

3 Bayesian modeling Provides a probabilistic framework for characterizing uncertainty Components of Bayesian inference Prior distribution: a probability distribution characterizing uncertainty in model parameters θ Likelihood function: sampling distribution characterizing uncertainty in the measurements conditionally on θ Loss function: quantifies price paid for errors in decisions

4 Components of Bayes The prior distribution p(θ) characterizes uncertainty in θ prior to observing the current data y. p(θ) may be chosen as one's best guess based on previous experiments & knowledge in the field (subjective Bayes). Alternatively, use a default approach with a prior that is flat or non-informative in some sense (objective Bayes). The prior is then updated with the likelihood p(y | θ) via Bayes' theorem, p(θ | y) = p(y | θ) p(θ) / ∫_Θ p(y | θ̃) p(θ̃) dθ̃, with Θ = parameter space

5 Comments on the Posterior The posterior distribution p(θ | y) quantifies the current state of knowledge about θ. p(θ | y) is a full probability distribution - one can obtain a mean, covariance, intervals/regions characterizing uncertainty, etc. The process of updating the prior with the likelihood to obtain the posterior is known as Bayesian updating. This calculation can be challenging due to the normalizing constant in the denominator of p(θ | y) = p(y | θ) p(θ) / ∫_Θ p(y | θ̃) p(θ̃) dθ̃. In conjugate models we can calculate p(θ | y) analytically, but in more complex models Monte Carlo & other methods are used

6 Simple example - estimating a population proportion Suppose θ ∈ (0, 1) is the population proportion of individuals with diabetes in the US. A prior distribution for θ would correspond to some distribution that distributes probability across (0, 1). A very precise prior corresponding to abundant prior knowledge would be concentrated tightly in a small sub-interval of (0, 1). A vague prior may be distributed widely across (0, 1) - e.g., a uniform distribution would be one choice

7 Some possible prior densities [Figure: beta(1,1), beta(1,11), and beta(2,10) prior densities p(θ) plotted against θ]

8 Collecting data & calculating the likelihood To update your prior knowledge & learn more about θ, one can collect data y = (y_1, ..., y_n) that relate to θ. The likelihood of the data is denoted L(y; θ). For example, suppose θ is the population proportion of type II diabetes in a given age group. A random sample of n individuals is surveyed & asked type II diabetes status, with y_i = 1 for individuals reporting disease & y_i = 0 otherwise. The likelihood is then L(y; θ) = ∏_{i=1}^n θ^{y_i} (1 − θ)^{1 − y_i}.

9 Beta-binomial example The prior is π(θ) = B(a, b)^{-1} θ^{a−1} (1 − θ)^{b−1}. The likelihood is L(y; θ) = ∏_{i=1}^n θ^{y_i} (1 − θ)^{1 − y_i}. The posterior is then p(θ | y) = B(a, b)^{-1} θ^{a−1} (1 − θ)^{b−1} ∏_{i=1}^n θ^{y_i} (1 − θ)^{1 − y_i} / ∫_0^1 B(a, b)^{-1} θ^{a−1} (1 − θ)^{b−1} ∏_{i=1}^n θ^{y_i} (1 − θ)^{1 − y_i} dθ = c(a, b, y) θ^{a + Σ_i y_i − 1} (1 − θ)^{b + n − Σ_i y_i − 1} = beta( a + nȳ, b + n(1 − ȳ) ), where c(a, b, y) is a function of (a, b, y) but not of θ. Updating a beta prior with a Bernoulli likelihood leads to a beta posterior - we have conjugacy
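To make the beta-binomial update concrete, here is a minimal Python sketch (NumPy/SciPy) with simulated Bernoulli data and an illustrative beta(2, 10) prior; the data, prior hyperparameters, and sample size are assumptions for illustration, not values from the lecture.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=100)   # hypothetical 0/1 disease indicators

a, b = 2, 10                         # beta(a, b) prior, e.g. the beta(2, 10) from the earlier slide
n, sum_y = len(y), int(y.sum())

# Conjugate update: posterior is beta(a + sum(y_i), b + n - sum(y_i)).
posterior = stats.beta(a + sum_y, b + n - sum_y)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```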

10 Generalizing to more interesting settings Although this is a super simple example, the same machinery of Bayes' rule can be used in much more complex settings. In particular, the unknown parameters θ in the model can be conceptually essentially anything: θ can be a vector, matrix, tensor, a function or surface or shape, an unknown density, missing data, etc. Generalizing Bayesian inference to such complex settings can be challenging, but there is a rich literature to leverage. Today I'll be focusing on introducing nonparametric Bayes approaches & some corresponding tools

11 What is nonparametric Bayes? Nonparametric (np) Bayes is fundamentally different from classical nonparametric methods, such as those based on ranks. One defines a fully generative probabilistic model for the available data. However, unlike in parametric models, some of the model unknowns are infinite-dimensional. For example, there may be unknown functions or densities involved in the model. Also, np Bayes is defined by a large support property guaranteeing flexibility

12 Simple example - Function estimation One of the canonical examples is the simple setting in which y_i = µ(x_i) + ε_i, ε_i ~ N(0, σ²). µ : X → R is an unknown function; x_i = one or more inputs for observation i; y_i = output for observation i; ε_i = measurement error; σ² = measurement error variance

13 Function estimation - continued In this model the unknowns are θ = {µ, σ²}. For the measurement error variance σ², the usual prior would correspond to an inverse-gamma distribution. However, µ(·) is an infinite-dimensional unknown, being defined at every point in X. The likelihood function is simply ∏_{i=1}^n (2πσ²)^{-1/2} exp[ −{y_i − µ(x_i)}² / (2σ²) ], which is parametric. The nonparametric part is µ - how to choose an np Bayes prior?

14 What is np Bayes? From a nonparametric Bayes perspective, there are several key properties for the prior. Large support: the prior can generate functions arbitrarily close to any function in a large class. Interpretability: the prior shouldn't be entirely a black box; one should be able to put in prior beliefs about where & how much the prior is concentrated. Computability: unless we can conduct posterior computation (at least approximately) without too much headache, it's not very useful. There is also an increasing literature establishing more involved properties - e.g., posterior consistency, minimax optimal rates, etc.

15 Some examples Motivated by these properties & general practical performance, there are two canonical priors that are most broadly used in Bayesian nonparametrics Gaussian processes (GPs): provide a broadly useful prior for unknown functions & surfaces (e.g., µ in the above example) Dirichlet processes (DPs): a prior for discrete random measures providing a building block for priors for densities, clustering, etc There is also a rich literature proposing generalizations such as beta processes, kernel stick-breaking processes, etc

16 Bayes density estimation The Dirichlet process was introduced by Ferguson (1973) as a nonparametric prior for an unknown distribution that satisfies the three desirable properties listed above. Suppose we have the simple setting in which y_i ~ f, i = 1, ..., n, where f is an unknown density on the real line R. Taking a Bayesian nonparametric UQ approach, we'd like to choose a prior for f - how to do this?

17 Random probability measures (RPMs) In defining priors for unknown distributions, it is particularly convenient to work with probability measures A given distribution (e.g., N(0,1)) has a corresponding probability measure & samples from that distribution obey that probability law. To allow uncertainty in whether samples are N(0, 1) or come from some other unknown distribution, we can choose a random probability measure (RPM). Each realization from the RPM gives a different probability measure & hence different distribution of the samples. A parametric model would always yield distributions in a particular parametric class (e.g., Gaussian) while a nonparametric model can generate PMs close to any PM in a broad class.

18 Probability Measures Let (Ω, B, P) denote a probability space, with Ω the sample space, B the Borel σ-algebra of subsets of Ω, and P a probability measure. For example, we may have Ω = R corresponding to the real line, with P a probability measure corresponding to a density f with respect to a reference measure µ. For continuous univariate densities, µ corresponds to the Lebesgue measure.

19 Random Probability Measures (RPMs) Let 𝒫 denote the set of all probability measures P on (Ω, B). To allow P to be unknown, we let P ~ π_P, where π_P is a prior over 𝒫; P is then a random probability measure. By allowing P to be unknown, we automatically allow the corresponding distribution of the data to be unknown. How to choose π_P?

20 A simple motivating model - Bayesian histograms The goal is to obtain a Bayes estimate of the density f with y_i ~ f. From a frequentist perspective, a very common strategy is to rely on a simple histogram. Assume for simplicity we have pre-specified knots ξ = (ξ_0, ξ_1, ..., ξ_k), with ξ_0 < ξ_1 < ⋯ < ξ_{k−1} < ξ_k and y_i ∈ [ξ_0, ξ_k]. The model for the density is f(y) = Σ_{h=1}^k π_h 1(ξ_{h−1} < y ≤ ξ_h) / (ξ_h − ξ_{h−1}), y ∈ R, with π = (π_1, ..., π_k) an unknown probability vector
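As a small illustration of the fixed-knot histogram density, the sketch below evaluates f(y) for a given knot vector and probability vector; the knots and weights are placeholder values, not tied to any data set in the lecture.

```python
import numpy as np

def histogram_density(y, knots, pi):
    """Evaluate f(y) = sum_h pi_h * 1(xi_{h-1} < y <= xi_h) / (xi_h - xi_{h-1})."""
    y = np.atleast_1d(y)
    widths = np.diff(knots)
    # Map each y in (xi_{h-1}, xi_h] to bin index h-1 (0-based).
    h = np.clip(np.searchsorted(knots, y, side="left") - 1, 0, len(widths) - 1)
    return pi[h] / widths[h]

knots = np.linspace(0.0, 1.0, 11)   # 10 equally spaced bins on [0, 1]
pi = np.full(10, 0.1)               # illustrative probability vector
print(histogram_density([0.05, 0.55, 0.95], knots, pi))
```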

21 Choosing priors for probability vectors In the fixed knot model, the only unknown parameters are the probability weights π on the bins By choosing a prior for π we induce a prior on the density f Earlier we used the beta prior for a single probability but now we have a vector of probabilities We need to choose a probability distribution on the simplex A simple choice that provides a multivariate generalization of the beta is the Dirichlet distribution

22 Dirichlet Prior Assume a Dirichlet(a_1, ..., a_k) prior for π, with density Γ(Σ_{h=1}^k a_h) / {∏_{h=1}^k Γ(a_h)} × ∏_{h=1}^k π_h^{a_h − 1}. The hyperparameter vector can be re-expressed as a = α π_0, where E(π) = π_0 = (a_1 / Σ_h a_h, ..., a_k / Σ_h a_h) is the prior mean and α = Σ_h a_h is a scale (the prior sample size). Note that an appealing aspect of histograms is that we can easily incorporate prior data & knowledge to elicit α and π_0 - very simple & interpretable, & previous data often come in the form of counts in bins. After choosing the prior, we use Bayesian updating to obtain the posterior distribution for π

23 Posterior distribution for bin probabilities The posterior distribution of π is calculated as p(π | y^n) ∝ ∏_{h=1}^k π_h^{a_h − 1} × ∏_{h=1}^k ∏_{i: y_i ∈ (ξ_{h−1}, ξ_h]} π_h / (ξ_h − ξ_{h−1}) ∝ ∏_{h=1}^k π_h^{a_h + n_h − 1}, so that (π | y^n) ~ Dirichlet(a_1 + n_1, ..., a_k + n_k), where n_h = Σ_i 1(ξ_{h−1} < y_i ≤ ξ_h). Hence, we have conjugacy and the posterior for the bin probabilities has a simple form. We can easily sample from the Dirichlet to obtain realizations from the induced posterior for the density f. These samples can be used to quantify uncertainty through point & interval estimates
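The conjugate update and the induced uncertainty quantification for the density are easy to code. The following sketch assumes data on [0, 1] with 10 equally spaced knots and a Dirichlet prior with all a_h = 0.5; these choices, and the simulated data, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.beta(1, 5, size=100)                    # hypothetical observations on [0, 1]
knots = np.linspace(0.0, 1.0, 11)
a = np.full(10, 0.5)                            # prior counts a_h = alpha * pi_0h

# Conjugate update: add bin counts n_h to the prior counts.
n_h, _ = np.histogram(y, bins=knots)
post = a + n_h                                  # Dirichlet(a_1 + n_1, ..., a_k + n_k)

# Sample pi from the posterior and convert each draw to density values on a grid.
grid = np.linspace(0.001, 0.999, 200)
widths = np.diff(knots)
bins = np.clip(np.searchsorted(knots, grid, side="left") - 1, 0, 9)
pi_draws = rng.dirichlet(post, size=2000)
f_draws = pi_draws[:, bins] / widths[bins]

f_mean = f_draws.mean(axis=0)                               # posterior mean density
f_lo, f_hi = np.percentile(f_draws, [2.5, 97.5], axis=0)    # 95% pointwise band
```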

24 Simulation Experiment To evaluate the Bayes histogram method, I simulated data from a mixture of two betas, f(y) = 0.75 beta(y; 1, 5) + 0.25 beta(y; 20, 2). n = 100 samples were obtained from this density. Assuming data on [0,1] and choosing 10 equally-spaced knots, I applied the Bayes histogram approach. The true density and Bayes posterior mean are plotted on the next slide

25 Bayes Histogram Estimate for Simulation Example [Figure: true density and Bayes posterior mean histogram estimate plotted against y]

26 Some Comments The procedure is really easy in that we have conjugacy. Results are very sensitive to the knots, & allowing unknown knots is computationally demanding. Allows prior information to be included in frequentist histogram estimates easily. Can we eliminate sensitivity to the choice of bins by thinking of an RPM characterization & defining a prior for all possible bins?

27 Dirichlet processes (Ferguson, 1973; 1974) Let B_1, ..., B_k denote a partition of the sample space Ω - e.g., histogram bins. Let P correspond to a random probability measure (RPM) that assigns probability to any subset B ⊂ Ω. Then we could let {P(B_1), ..., P(B_k)} ~ Dirichlet( αP_0(B_1), ..., αP_0(B_k) ). (1) P_0 is a base probability measure providing an initial guess at P & α is a prior concentration parameter

28 Dirichlet → Dirichlet process However, we don't want to be sensitive to the bins B_1, ..., B_k or to k. It would be really cool if there was an RPM P such that (1) were satisfied for all B_1, ..., B_k and all k. This RPM does indeed exist (as shown by Ferguson) & corresponds to a Dirichlet process (DP). The DP has many wonderful properties making it widely used in practice. For example, E{P(B)} = P_0(B) and V{P(B)} = P_0(B){1 − P_0(B)} / (1 + α), for all B ∈ B, with B the Borel σ-algebra of subsets of Ω. Hence, the prior is centered on P_0 and α controls the variance
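A quick Monte Carlo check of these moment formulas uses only the finite-dimensional Dirichlet marginals in (1); the N(0, 1) base measure, the partition, and α = 5 below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha = 5.0
cuts = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])   # partition of R into 4 bins
P0 = np.diff(stats.norm.cdf(cuts))                   # P0(B_1), ..., P0(B_4) under P0 = N(0, 1)

# (P(B_1), ..., P(B_4)) ~ Dirichlet(alpha * P0(B_1), ..., alpha * P0(B_4))
draws = rng.dirichlet(alpha * P0, size=50000)

print("E{P(B)}:", draws.mean(axis=0), " vs P0(B):", P0)
print("V{P(B)}:", draws.var(axis=0), " vs P0(1-P0)/(1+alpha):", P0 * (1 - P0) / (1 + alpha))
```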

29 Some basic properties of the DP Using the notation P ~ DP(αP_0), the DP prior has large support - assigning probability to arbitrarily small balls around any PM Q over Ω. Also, if we let y_i ~ P iid, then we obtain conjugacy, so that (P | y_1, ..., y_n) ~ DP( αP_0 + Σ_i δ_{y_i} ). The posterior is also a DP, but updated to add the empirical measure Σ_i δ_{y_i} to the base measure αP_0. The updated precision is α + n, so α acts as a prior sample size. The posterior expectation of P is E{P(B) | y_1, ..., y_n} = ( α / (α + n) ) P_0(B) + ( n / (α + n) ) (1/n) Σ_i δ_{y_i}(B).
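The posterior mean of the random distribution function is a simple weighted average of the prior guess and the empirical CDF; the sketch below assumes a N(0, 1) base measure and simulated data purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha = 2.0
y = rng.normal(1.0, 1.5, size=50)   # hypothetical observations

def dp_posterior_mean_cdf(t, y, alpha, P0_cdf=stats.norm.cdf):
    """E{P((-inf, t]) | y} = alpha/(alpha+n) * P0((-inf, t]) + n/(alpha+n) * empirical CDF."""
    t = np.atleast_1d(t)
    n = len(y)
    w = alpha / (alpha + n)
    empirical = (y[None, :] <= t[:, None]).mean(axis=1)
    return w * P0_cdf(t) + (1 - w) * empirical

print(dp_posterior_mean_cdf(np.linspace(-4, 6, 5), y, alpha))
```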

30 DP → DPMs Realizations from the DP are almost surely discrete, having masses at the observed data points. If we convert the probability measure to a cumulative distribution function it will have jumps at all data points. Not ideal for characterizing continuous data but really good as a prior for a mixing measure. In particular, instead of using the DP directly as a prior for (the probability measure corresponding to) the distribution that generated the data, let f(y) = ∫ K(y; θ) dP(θ), P ~ DP(αP_0), (2) with K(·; θ) a parametric density on Ω parameterized by θ & P an unknown mixing measure

31 Samples from the Dirichlet process with precision α [Figure: realizations of P ~ DP(αP_0) for several values of α, including α = 0.5 and α = 5]

32 Dirichlet process mixtures (DPMs) DP mixtures provide an incredibly powerful & useful framework for UQ in a very broad variety of models. Although expression (2) focuses on a simple setting involving univariate density estimation, we can take any parametric model & develop a more realistic & flexible model by DP mixing the whole model or one or more components. This can allow for unknown residual densities that change in shape and variance with predictors, flexible modeling of large tabular data & many other settings. At this point I'm going to provide some basic details on computation & inference in simple DPMs & then I'll provide a more involved example illustrating what is possible

33 Density estimation via mixtures Considering the DPM in expression (2) above, it seems we have an intractable integral to deal with. However, Sethuraman (1994) provided a constructive representation of the DP showing: P ~ DP(αP_0) ⟺ P = Σ_{h=1}^∞ π_h δ_{θ_h}, θ_h ~ P_0 iid, with π_h = V_h ∏_{l<h} (1 − V_l), V_h ~ Beta(1, α). This is referred to as the stick-breaking representation & implies that (2) can be expressed as f(y) = Σ_{h=1}^∞ π_h K(y; θ_h), (3) which is a discrete mixture model.
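The stick-breaking construction also gives a direct way to simulate (truncated) realizations of P; the sketch below uses a N(0, 1) base measure and an arbitrarily chosen truncation level, with the final weight absorbing the leftover stick.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_dp_stick_breaking(alpha, base_sampler, k=50):
    """Truncated stick-breaking draw from DP(alpha * P0): atoms theta_h ~ P0 and
    weights pi_h = V_h * prod_{l<h} (1 - V_l) with V_h ~ Beta(1, alpha)."""
    V = rng.beta(1.0, alpha, size=k)
    V[-1] = 1.0   # close the stick at the truncation level so the weights sum to one
    pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return base_sampler(k), pi

theta, pi = sample_dp_stick_breaking(alpha=1.0, base_sampler=lambda k: rng.normal(size=k))
print("five largest weights:", np.sort(pi)[::-1][:5], " total:", pi.sum())
```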

34 Approximating as finite mixtures Although not necessary using slice sampling & other recent(ish) samplers, (3) can be approximated accurately as a finite mixture. This is because the weights {π_h} have a prior that strongly favors stochastic decreases in the index h; after the first few weights, the prior concentrates the remaining ones near zero. Hence, we can truncate the DP using k components as an upper bound & the extra components will be effectively deleted. We consider a motivating application to make these ideas concrete

35 Gestational Length vs DDE (mg/l) [Figure: scatter plot of gestational age at delivery versus DDE (mg/l)]

36 Gestational Length Densities within DDE Categories [Figure: histograms of gestational length within DDE categories <15, [15,30), [30,45), [45,60), and >60 mg/l]

37 Finite mixture application We illustrate finite mixtures using a simple location mixture of Gaussians, f(y) = Σ_{h=1}^k π_h N(y; µ_h, τ^{-1}). We apply the model to data on length of gestation. Preterm birth is defined as delivery occurring prior to 37 weeks of completed gestation. The cutoff is somewhat arbitrary & the shorter the length of gestation, the more adverse the associated health effects. Appealing to model the distribution of gestational age at delivery as unknown & then allow predictors to impact this distribution

38 Comments on Gestational Length Data Data are non-Gaussian with a left skew. It is not straightforward to transform the data to approximate normality; a different transformation would be needed within each DDE category. First question: how to characterize the gestational age at delivery distribution without considering predictors?

39 Mixture Models Initially ignoring DDE, letting y_i = gestational age at delivery for woman i, f(y_i) = ∫ N(y_i; µ, σ²) dG(µ, σ²), where G = mixing distribution for θ = (µ, σ²). Mixtures of normals can approximate any smooth density. A location Gaussian mixture with k components is one possibility: f(y_i) = Σ_{h=1}^k p_h N(y_i; µ_h, σ²).

40 Mixture components for gestational age at delivery [Figure: estimated component densities over gestational week of delivery - Component 1 (86% of deliveries), Component 2 (12% of deliveries), Component 3 (2% of deliveries)]

41 Mixture-based density of gestational age at delivery [Figure: estimated mixture density plotted against gestational week at delivery]

42 Approximate DPM computation The finite mixture of normals can be equivalently expressed as y_i ~ N(µ_{S_i}, τ_{S_i}^{-1}), S_i ~ Σ_{h=1}^k π_h δ_h, where δ_h = probability measure concentrated at the integer h, and S_i ∈ {1, ..., k} indexes the mixture component for subject i. A Bayesian specification is completed with priors π = (π_1, ..., π_k) ~ Dirichlet(a_1, ..., a_k) and (µ_h, τ_h) ~ N(µ_h; µ_0, κ τ_h^{-1}) Ga(τ_h; a_τ, b_τ), h = 1, ..., k, with a_j = α/k approximating the Dirichlet process (as k increases).

43 Posterior Computation via Gibbs Sampling 1. Update S_i from its multinomial conditional posterior with Pr(S_i = h | −) = π_h N(y_i; µ_h, τ_h^{-1}) / Σ_{l=1}^k π_l N(y_i; µ_l, τ_l^{-1}), h = 1, ..., k. 2. Update (µ_h, τ_h) from its conditional posterior (µ_h, τ_h | −) ~ N(µ_h; µ̂_h, κ̂_h τ_h^{-1}) Ga(τ_h; â_{τh}, b̂_{τh}), with κ̂_h = (κ^{-1} + n_h)^{-1}, µ̂_h = κ̂_h (κ^{-1} µ_0 + n_h ȳ_h), â_{τh} = a_τ + n_h/2, b̂_{τh} = b_τ + (1/2) { Σ_{i: S_i = h} (y_i − ȳ_h)² + n_h (ȳ_h − µ_0)² / (1 + κ n_h) }. 3. Update (π | −) ~ Dirichlet(a_1 + n_1, ..., a_k + n_k)
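Below is a minimal Python sketch of this three-step Gibbs sampler for the truncated mixture; the simulated data, hyperparameter values, truncation level k, and number of iterations are illustrative assumptions, not the settings used in the gestational age analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical data loosely mimicking gestational ages (weeks); illustrative only.
y = np.concatenate([rng.normal(39, 1.2, 850), rng.normal(34, 2.5, 130), rng.normal(28, 3.0, 20)])
n, k = len(y), 10
alpha, mu0, kappa, a_tau, b_tau = 1.0, y.mean(), 100.0, 1.0, 1.0
a = np.full(k, alpha / k)            # a_j = alpha/k approximates the DP as k grows

pi, mu, tau = np.full(k, 1.0 / k), rng.choice(y, k), np.ones(k)
grid, density_draws = np.linspace(y.min(), y.max(), 200), []

for it in range(2000):
    # 1. Update allocations S_i from their multinomial full conditionals.
    logp = np.log(pi) + stats.norm.logpdf(y[:, None], mu, 1.0 / np.sqrt(tau))
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    S = (p.cumsum(axis=1) > rng.random((n, 1))).argmax(axis=1)

    # 2. Update (mu_h, tau_h) from their normal-gamma full conditionals.
    for h in range(k):
        yh = y[S == h]
        nh = len(yh)
        ybar = yh.mean() if nh else 0.0
        kappah = 1.0 / (1.0 / kappa + nh)
        muhat = kappah * (mu0 / kappa + nh * ybar)
        ahat = a_tau + nh / 2.0
        bhat = b_tau + 0.5 * (((yh - ybar) ** 2).sum() + nh * (ybar - mu0) ** 2 / (1 + kappa * nh))
        tau[h] = rng.gamma(ahat, 1.0 / bhat)
        mu[h] = rng.normal(muhat, np.sqrt(kappah / tau[h]))

    # 3. Update the weights from their Dirichlet full conditional.
    pi = rng.dirichlet(a + np.bincount(S, minlength=k))

    # Monitor f(y) on a dense grid after burn-in.
    if it >= 500:
        density_draws.append(pi @ stats.norm.pdf(grid, mu[:, None], 1.0 / np.sqrt(tau)[:, None]))
```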

44 Some Comments The Gibbs sampler is trivial to implement. Discarding a burn-in, monitor f(y) = Σ_{h=1}^k π_h N(y; µ_h, τ_h^{-1}) over a dense grid of y values for a large number of iterations. The Bayes estimate of f(y) under squared error loss averages these samples. Can also obtain 95% pointwise intervals for the unknown density
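Continuing the sketch above, the stored density draws can be summarized into a posterior mean and 95% pointwise credible band; density_draws is assumed to be the list built in the previous code block.

```python
import numpy as np

F = np.asarray(density_draws)        # (num_saved_iterations, len(grid)) array of f(y) draws

f_hat = F.mean(axis=0)                               # Bayes estimate under squared error loss
f_lo, f_hi = np.percentile(F, [2.5, 97.5], axis=0)   # 95% pointwise credible band
```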

45 Nonparametric residual densities The above focus was on univariate densities, but the approach is very easy to generalize to other settings. For example, suppose we have the nonparametric regression model: y_i = µ(x_i) + ε_i, ε_i ~ f. Then, we can use a DPM to allow the residual density to have an unknown form - allowing possible skewness & multimodality. Due to the modular nature of MCMC, the computational steps for the DPM part of the model are essentially no different from those described above

46 Nonparametric hierarchical models In hierarchical probability models, random effects are often assigned Gaussian distributions. This doesn't account for uncertainty in the random effect distributions. We can use a DPM to allow unknown distributions of random effects. Such a direction can be applied well beyond settings involving univariate continuous responses

47 Probabilistic tensor factorizations We instead propose a low rank & sparse factorization of Pr(y_{i1} = c_1, ..., y_{ip} = c_p) = π_{c_1 ⋯ c_p}, with these probabilities forming an array Π = {π_{c_1 ⋯ c_p}}. Express the array as a weighted average of rank-one arrays, π_{c_1 ⋯ c_p} = Σ_{h=1}^k ν_h ∏_{j=1}^p λ^{(j)}_{h c_j}, with appropriate priors on the components. Leads to a simple & scalable approach - allowing uncertainty in the rank & other parameters in the factorization

48 High-dimensional contingency table analysis One application is nonparametric modeling of a high-dimensional pmf for categorical data y_i = (y_{i1}, ..., y_{ip}). Many different categorical items are recorded for the same subjects, and we have unknown dependence in the occurrences of these different categorical items. We could use a log-linear model, but flexible log-linear models aren't scalable beyond small p

49 Probabilistic Parafac Considering the Parafac factorization, π_{c_1 ⋯ c_p} = Σ_{h=1}^k ν_h ∏_{j=1}^p λ^{(j)}_{h c_j}, we can place a stick-breaking prior on the weights {ν_h}. Essentially a DPM of product multinomial distributions - automatically infers the rank of the tensor. Leads to a simple Gibbs sampling algorithm. Full probabilistic characterization of uncertainty in the elements of π
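A short sketch of how a single cell probability is evaluated under the Parafac factorization; the dimensions, weights ν, and factor matrices λ^{(j)} below are randomly generated placeholders rather than fitted values.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
p, k, d = 4, 3, 2   # p categorical items, rank k, d levels per item (illustrative)

nu = rng.dirichlet(np.ones(k))                                  # weights nu_h, summing to 1
lam = [rng.dirichlet(np.ones(d), size=k) for _ in range(p)]     # lam[j][h, c] = lambda^{(j)}_{hc}

def joint_prob(cells, nu, lam):
    """pi_{c_1...c_p} = sum_h nu_h * prod_j lambda^{(j)}_{h, c_j}."""
    probs = np.ones_like(nu)
    for j, c in enumerate(cells):
        probs = probs * lam[j][:, c]
    return float(nu @ probs)

print(joint_prob((0, 1, 0, 1), nu, lam))
# Sanity check: the cell probabilities over all d^p cells sum to one.
print(sum(joint_prob(c, nu, lam) for c in product(range(d), repeat=p)))
```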

50 Beyond DPMs There are also rich classes of models that go well beyond DPMs. For example, in the premature delivery application from above & many other settings, we may want to model f(y | x). This conditional response density may not decompose as µ(x) + ε with ε iid; instead we may need to characterize flexible changes in the mean, variance & shape of the density with predictors

51 Gestational Length vs DDE (mg/l) [Figure: scatter plot of gestational age at delivery versus DDE (mg/l)]

52 Gestational Length Densities within DDE Categories [Figure: histograms of gestational length within DDE categories <15, [15,30), [30,45), [45,60), and >60 mg/l]

53 Predictor-dependent RPMs One highly flexible method to characterize conditional densities lets f(y | x) = ∫ N{ y; µ(x; θ), τ^{-1} } dG_x(θ, τ), where µ(x; θ) is a regression function parameterized by θ and G_X = {G_x : x ∈ X} is a mixing measure varying with predictors. Even if µ is linear, we obtain a fully flexible specification with a flexible enough prior for G_X

54 Kernel stick-breaking The kernel stick-breaking process is one popular prior for G_X. It generalizes the stick-breaking representation of the DP to include predictor dependence in the weights. Leads to consistent estimation of the conditional density function, & computation is essentially as easy as for a DPM
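As a rough illustration of predictor-dependent weights in the spirit of the kernel stick-breaking process, the sketch below multiplies each stick-breaking fraction by a Gaussian kernel centered at a random location; the kernel form, bandwidth, and truncation level are illustrative assumptions rather than the exact specification in Dunson & Park (2008).

```python
import numpy as np

rng = np.random.default_rng(7)

def ksb_weights(x, V, Gamma, psi):
    """pi_h(x) = V_h K(x, Gamma_h) prod_{l<h} {1 - V_l K(x, Gamma_l)},
    with an assumed Gaussian kernel K(x, Gamma) = exp(-psi * (x - Gamma)^2)."""
    U = V * np.exp(-psi * (x - Gamma) ** 2)
    return U * np.concatenate(([1.0], np.cumprod(1.0 - U[:-1])))

k, alpha = 25, 1.0
V = rng.beta(1.0, alpha, size=k)          # stick-breaking fractions
Gamma = rng.uniform(0.0, 100.0, size=k)   # kernel locations spread over the predictor range

# Weights shift smoothly with x, so nearby predictor values share similar mixture weights;
# under truncation any leftover mass can be assigned to a final catch-all component.
for x in (10.0, 50.0, 100.0):
    w = ksb_weights(x, V, Gamma, psi=0.01)
    print(f"x = {x}: largest weight = {w.max():.3f}, total = {w.sum():.3f}")
```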

55 DDE & Gestational Age at Delivery Application We focus on the mixture model: f(y_i | x_i) = ∫ N(y_i; x_i′β_i, σ_i²) dG_{x_i}(β_i, σ_i²), with y_i = gestational age at delivery in days and x_i = DDE (mg/l). The collection of mixture distributions, G_X = {G_x : x ∈ X}, is assigned an adaptive kernel mixture of DPs prior

56 Posterior Simulation Overview The MCMC algorithm was run for 10,000 iterations; convergence and mixing were good. Fit was excellent based on a pivotal statistic-based approach (Johnson, 2006). Can estimate the gestational age density for any dose of DDE & dose response curves for any quantile

57 Results: DDE & Gestational Age at Delivery [Figure: estimated relationship between gestational age at delivery and DDE (mg/l)]

58 Results: DDE & Gestational Age at Delivery [Figure: estimated conditional densities f(y | x) of gestational length at the 10th (DDE = 12.57), 30th (18.69), 60th (28.44), 90th (53.72), and 99th (105.48) percentiles of DDE]

59 Results: DDE & Gestational Age at Delivery [Figure: estimated dose response curves for Pr(gestational length < 33), < 35, < 37, and < 40 weeks as functions of DDE (mg/l)]

60 Notes I have provided a very brief overview of nonparametric Bayes with an eye towards UQ, focusing on Dirichlet process mixtures, their applications & closely related models. Gaussian processes are at least as useful in UQ, if not more so - they provide another np Bayes prior. I assume O'Hagan will focus on the GP in his lectures, so I purposely avoided it

61 Some References Dunson & Park (2008). Kernel stick-breaking processes. Biometrika, 95. Dunson & Xing (2009). Nonparametric Bayes modeling of multivariate categorical data. JASA, 104. Gelman et al. (2013). Bayesian Data Analysis (BDA3). CRC Press. Contains chapters on np Bayes.
