Foundations of Nonparametric Bayesian Methods


Slide 1/27: Foundations of Nonparametric Bayesian Methods, Part II: Models on the Simplex (Peter Orbanz)

Slide 2/27: Tutorial Overview

Part I: Basics

Part II: Models on the simplex
1. Bayesian models and de Finetti
2. DP construction
3. Generalization examples: Pólya trees, neutral processes
4. Consistency of NP Bayesian estimates

Part III: Construction of new models

Slide 3/27: Introduction: NP Bayes Motivation

Machine learning
Roughly: flexible models that adapt well to data.

Bayesian statistics
Search for universal priors.

Bayesian modeling
Assumption: data i.i.d. from an unknown model. (Essential!)
De Finetti: justified for exchangeable data.
The theorem of de Finetti (in part) prompted the search for NP priors.

Exchangeable sequence
The probability of data X_1, X_2, ... is invariant under permutation of the sampling order.

Slide 4/27: De Finetti's Theorem (1)

Theorem (De Finetti, 1937; Hewitt & Savage, 1955)
On a Polish space (D, B(D)), random variables X_1, X_2, ... are exchangeable iff there is a random probability measure F on D such that X_1, X_2, ... are conditionally i.i.d. with distribution F.

Additionally:
1. X_i exchangeable ⇒ the distribution G of F is uniquely determined.
2. For B ∈ B(D):

    lim_{n→∞} (1/n) Σ_{i=1}^{n} I_B(X_i) = F(B)   almost surely

That is: the empirical distribution converges to F almost surely.
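
Point 2 of the theorem can be sketched numerically (a minimal illustration, not from the slides; F is taken to be a Bernoulli measure with a Beta-distributed parameter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a random distribution F (here a Bernoulli parameter from a Beta
# prior), then sample X_1, X_2, ... i.i.d. from F.  The sequence is
# exchangeable, and the empirical frequency of an event B converges
# to F(B), as in point 2 of the theorem.
p = rng.beta(2.0, 2.0)            # the random measure F = Bernoulli(p)
x = rng.random(200_000) < p       # X_i i.i.d. from F, given p

empirical = x.mean()              # (1/n) Σ_i I_B(X_i) with B = {1}
print(abs(empirical - p))        # small for large n
```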

Slide 5/27: De Finetti's Theorem (2)

Interpretation
If the X_i are exchangeable, then:
1. There is a random variable F with values f ∈ M_+^1(D).
2. Its distribution is G = F(P), the image measure of P under F.
3. The X_i are generated as: f ~ G and X_i i.i.d. ~ f.

Consequence
For A^∞ := ∏_{i=1}^{∞} A_i:

    µ_X(A^∞) = ∫_{M_+^1(D)} ( ∏_{i=1}^{∞} f(A_i) ) dG(f)

Slide 6/27: Bayes and de Finetti

NP Bayesian density estimation
1. Start with a prior assumption (e.g. a DP) for G.
2. Update the prior given data.
3. Hopefully: the posterior will concentrate at the true f. (We are not trying to estimate G!)

Requirements
1. Prior sufficiently spread over M_+^1(D).
2. Updating the prior on F given draws from F must be possible.

Terminology: Conjugacy
Algebraic: prior in G, likelihood in F ⇒ posterior in G.
Functional: prior and posterior in the same parametric class.
Measurable map: (prior parameter, data) → posterior parameter.

Slide 7/27: Prior Domains

Note
Parametric priors fit perfectly into the de Finetti framework. Why not use a parametric prior on F?

Example (Draper, 1999)
1. Start with a Gaussian model, prior on µ, σ.
2. Data comes in: bimodal.
3. Two choices:
   - Keep the model ⇒ poor fit.
   - Change to a bimodal model ⇒ confounded (data used twice).

Prior domain requirement
The prior should support sufficiently diverse models to roughly fit arbitrary data.

Slide 8/27: Process Models: Divergent Aims

NP Bayesian statistics
Prior models with large support in M_+^1(D).
The DP concentrates on the discrete distributions in M_+^1(D).
Principal motivation for generalizing the DP: overcome discreteness.

Machine learning
Principal application: clustering and mixtures.
Discreteness is required.

Slide 9/27: Construction Approaches

Kolmogorov constructions
Construct the process directly from its marginals.

Generalizations of the DP
Consider all processes with a particular property of the DP:
- Partitioning: tailfree processes and Pólya trees
- Random CDFs: neutral-to-the-right (NTTR) and Lévy processes
Related: DP mixtures (smoothed DPs)

De Finetti-based
Derive the class of possible G from knowledge about the X_i.
Example: if the X_i follow a generalized Pólya urn scheme, then G is a DP.

Slide 10/27: Dirichlet Process Construction

Recall: two-stage construction
1. Extend marginals to a process on (Ω_E, B_E) (Kolmogorov).
2. Restrict the process to a subset Ω ⊂ Ω_E (Doob).

Kolmogorov's theorem: Dirichlet process
Axes: Ω_0 := [0, 1]
Index set: E = H(D), the "measurable partitions"
    H(D) := { (B_1, ..., B_d) partition of D | d ∈ N, B_i ∈ B(D) }
Marginals: Dirichlet distributions (for I ∈ E)
    dµ^I_{Θ_I}(θ_I | α, y_I) = p^I(θ_I | α, y_I) dλ_I(θ_I)
where p^I is the Dirichlet density and λ_I is Lebesgue measure restricted to the simplex Sim(R, I).

Slide 11/27: DP: Restriction Step

Restrict to Ω = M_+^1(D)
Check: Ω is Polish (in the weak topology; true if D is Polish).
Check: Ω has outer measure one, µ*_E(Ω) = 1 (true for the Dirichlet marginals).
Doob's theorem: there exists a measure µ̃_E on Ω with the same marginals as µ_E.
We are done: µ̃_E is our Dirichlet process, µ̃_E = DP(αG_0).

Historical note
Ferguson's (1973) construction: only the extension step.
Ghosh & Ramamoorthi (2002) note that singletons are not in B_E, and give an equivalent construction on Q.

Slide 12/27: Conjugate Posterior

Bayesian model with DP
Unobserved: G ~ DP(. | α_0 G_0) (draw from prior)
Observed: θ ~ G (sample draw)

Consider marginals
For any H = (B_1, ..., B_m) ∈ H(D): exactly one bin B_k contains θ.
Posterior update:
Prior parameters: y_0^H = (G_0(B_1), ..., G_0(B_m)) and α_0
Posterior parameters:
    y_1^H ∝ (α_0 G_0(B_1), ..., α_0 G_0(B_k) + 1, ..., α_0 G_0(B_m))
    α_1 = α_0 + 1

DP posterior
The posterior is DP(. | α_1 G_1) with G_1 ∝ α_0 G_0 + δ_θ and α_1 = α_0 + 1.
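
On a fixed partition the update above reduces to ordinary Dirichlet conjugacy. A minimal numerical sketch (the bin masses for G_0 are made-up values, not from the slides):

```python
import numpy as np

# Marginal view of the DP posterior on a fixed partition (B_1, B_2, B_3):
# the prior is Dirichlet(α_0·G_0(B_1), ..., α_0·G_0(B_m)); an observation
# θ falling in bin k adds 1 to the k-th parameter.
alpha0 = 2.0
g0 = np.array([0.2, 0.5, 0.3])     # hypothetical masses G_0(B_i)
prior_par = alpha0 * g0

k = 1                              # bin containing the observed θ
post_par = prior_par.copy()
post_par[k] += 1.0

# This matches DP(α_1 G_1): total mass α_1 = α_0 + 1,
# and posterior base measure G_1 ∝ α_0·G_0 + δ_θ.
alpha1 = post_par.sum()
g1 = post_par / alpha1
print(alpha1, g1)
```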

Slide 13/27: Sampling Under the DP Prior

Model
    θ_1, ..., θ_n ~ G
    G ~ DP(α_0 G_0)
Marginalize out G.

Sequential sampling
With the posterior formula:
    θ_1 ~ G_0
    θ_{i+1} ∝ Σ_{j=1}^{i} δ_{θ_j} + α_0 G_0
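
The sequential rule above is the Blackwell-MacQueen urn; a minimal sampler (the base measure G_0 = N(0, 1) is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

def polya_urn(n, alpha, base_sampler, rng):
    """Sample θ_1..θ_n with G marginalized out: θ_{i+1} is a fresh draw
    from G_0 with probability α/(i+α), or repeats a previous value with
    probability i/(i+α) (Blackwell-MacQueen urn)."""
    thetas = []
    for i in range(n):
        if rng.random() < alpha / (i + alpha):
            thetas.append(base_sampler(rng))          # new atom from G_0
        else:
            thetas.append(thetas[rng.integers(i)])    # reuse an old value
    return thetas

draws = polya_urn(500, alpha=3.0, base_sampler=lambda r: r.normal(), rng=rng)
n_clusters = len(set(draws))
print(n_clusters)   # ties occur: far fewer distinct values than draws
```

Because G_0 is continuous, ties can arise only through reuse, which is why the urn induces clustering.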

Slide 14/27: Discreteness

Key observation
If G ~ DP(α_0 G_0 + δ_θ), then G({θ}) > 0 almost surely.

Consequence
A draw G from a DP is discrete almost surely, so
    G = Σ_{i=1}^{∞} c_i δ_{θ_i}

Stick-breaking (Sethuraman, 1994)
Generative model for the c_i and θ_i:
    θ_i ~ G_0 i.i.d.
    c_i := Z_i ∏_{j=1}^{i-1} (1 - Z_j)   with Z_i ~ Beta(1, α_0) i.i.d.
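
The stick-breaking construction can be sketched with a truncation (the truncation level and G_0 = N(0, 1) are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_breaking(alpha, g0_sampler, n_atoms, rng):
    """Truncated Sethuraman construction: atoms θ_i ~ G_0 and weights
    c_i = Z_i · Π_{j<i} (1 - Z_j), with Z_i ~ Beta(1, α) i.i.d."""
    z = rng.beta(1.0, alpha, size=n_atoms)
    # Π_{j<i} (1 - Z_j): 1 for i = 1, then running products.
    weights = z * np.concatenate(([1.0], np.cumprod(1.0 - z)[:-1]))
    atoms = g0_sampler(rng, n_atoms)
    return atoms, weights

atoms, weights = stick_breaking(
    alpha=5.0, g0_sampler=lambda r, n: r.normal(size=n), n_atoms=2000, rng=rng
)
print(weights.sum())  # close to 1: the truncation leaves negligible mass
```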

Slide 15/27: Other Properties

Undominated model
The family DP(. | α_0 G_0 + δ_θ) of posteriors is not dominated.
In particular: no Bayes equation.
Implication: conjugacy is essential.

Prior support: KL property
The DP puts non-zero mass on every Kullback-Leibler ε-neighborhood of every measure in M_+^1(D).

Slide 16/27: Now: Generalizations

(Potentially) smooth models:
- Dirichlet process mixtures
- Pólya trees, tailfree processes
- Neutral-to-the-right (NTTR) and Lévy processes

Slide 17/27: DP Mixtures

Original motivation
Smooth the DP by convolution with a parametric model.

Mixture model
With q a conditional density and µ_Z a mixing distribution:
    p(x) = ∫_{Ω_z} q(x | z) dµ_Z(z)
Example: q and µ_Z Gaussian, z the variance ⇒ p is Student.

Finite mixture
    µ_Z(B) = Σ_{i=1}^{n} c_i δ_{z_i}(B)   for B ∈ B(Ω_z)

DP mixture
Draw µ_Z from a DP.
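
Sampling from a DP mixture can be sketched by drawing µ_Z via truncated stick-breaking and then drawing x from the component densities (q(x|z) = N(z, 1) and G_0 = N(0, 9) are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)

# DP mixture: draw the mixing distribution µ_Z from a DP (truncated
# stick-breaking), then sample x by picking an atom z_i with
# probability c_i and drawing x ~ q(. | z_i).
alpha, n_atoms, n_samples = 2.0, 1000, 5000
z_beta = rng.beta(1.0, alpha, size=n_atoms)
weights = z_beta * np.concatenate(([1.0], np.cumprod(1.0 - z_beta)[:-1]))
weights /= weights.sum()                      # renormalize the truncation
atoms = rng.normal(0.0, 3.0, size=n_atoms)    # z_i ~ G_0

idx = rng.choice(n_atoms, size=n_samples, p=weights)
x = rng.normal(atoms[idx], 1.0)               # x ~ q(x | z)
print(x.mean(), x.std())
```

Because µ_Z is discrete, the draw is a countable mixture of Gaussians, i.e. a smooth density even though the DP draw itself is not.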

Slide 18/27: Pólya Trees

Idea (on R)
Binary splitting of R into measurable sets.
Subdivision as a tree:
- Nodes = sets, with children = subsets
- Root = R
- One random probability per edge
- P(B) = product over the edge probabilities on the path root → B

Model components
m = level in the tree
π_m = measurable partition; π_{m+1} refines π_m
V_{m,B} = Beta random variables
α_{m,B} = Beta parameters
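
The tree construction can be sketched on the dyadic partition of [0, 1] (a minimal illustration; the symmetric Beta split and the level-only dependence of α are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

def polya_tree_masses(levels, alpha_of_level, rng):
    """Random probabilities of the 2**levels dyadic subintervals of [0,1].
    At each node, a Beta variable V_{m,B} splits the node's mass between
    its two children; a set's probability is the product of the edge
    probabilities along the path from the root."""
    masses = np.array([1.0])
    for m in range(1, levels + 1):
        a = alpha_of_level(m)
        v = rng.beta(a, a, size=masses.size)       # left-child proportions
        masses = np.column_stack((masses * v, masses * (1 - v))).ravel()
    return masses

masses = polya_tree_masses(10, alpha_of_level=lambda m: m ** 2, rng=rng)
print(masses.size, masses.sum())
```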

Slide 19/27: Pólya Tree Properties

Include discrete and continuous models.

Particular expectation G_0: choose π_m such that the partition points match the quantiles of G_0.
    α_{m,B} = m^2    ⇒ draws absolutely continuous
    α_{m,B} = 2^{-m} ⇒ DP

Conjugacy
Sample observation:
    G ~ PT(π, α)
    Y ~ G
Posterior update: α_{m,B} → α_{m,B} + 1 for every B with Y ∈ B.

Slide 20/27: The Tailfree Property

Pólya trees are an example of tailfree processes.

Def: Tailfree process
Consider a process defined like a PT, but with arbitrary random variables V_{m,B}.
Tailfree: the variable sets {V_{m,B} | B ∈ π_m} are independent between tree levels.

Meaning: tailfree model
The prior makes no assumption about the shape of the tails outside the region covered by the samples.

Freedman (1968)
On countable domains, tailfree processes have consistent posteriors.

Slide 21/27: Random CDFs on R

Observation: for D = R
The DP is a normalized gamma process.
Draw a CDF as a process trajectory.
CDF properties are local:
1. initialize at 0
2. monotonicity expressible via increments

Possible choice: Lévy processes
- Easily constrained to be monotonic ("subordinators")
- Can be simulated numerically
- Best-understood class of stochastic processes
- Independence of components may be desirable
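
The "DP = normalized gamma process" observation can be sketched on a grid (a minimal illustration; G_0 uniform on [0, 1] and the grid size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Normalized gamma process on a grid: independent Gamma increments with
# shape α_0·G_0(bin) give, after normalization, Dirichlet-distributed
# bin masses -- a draw of a random CDF from the DP restricted to the grid.
alpha0, n_bins = 10.0, 200
shape = alpha0 / n_bins                      # α_0·G_0(bin), G_0 uniform
increments = rng.gamma(shape, 1.0, size=n_bins)
probs = increments / increments.sum()        # ~ Dirichlet(shape, ..., shape)
cdf = np.cumsum(probs)                       # monotone trajectory, ends at 1
print(cdf[-1])
```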

Slide 22/27: Lévy Processes

Def: Lévy process
A stochastic process with stationary, independent increments and right-continuous paths with left-hand limits (rcll).

Intuition: rcll
- Only a countable number of jumps
- Jump at x: the value at x belongs to the right-hand branch

Decomposition property
Decomposes into a Brownian motion (continuous) component and a Poisson (jump) component (⇒ simulation).

De Finetti property (Bühlmann, 1960)
For a process (X_t) with X_s ≤ X_t whenever s ≤ t:
increments exchangeable ⇔ conditionally stationary and independent.

Slide 23/27: Lévy Processes and NPB

Direct approach
To generate a CDF:
1. Draw a trajectory from an increasing process.
2. Normalize.

Dead end (James et al., 2005)
If a normalized subordinator is conjugate, it is a Dirichlet process.

Neutrality (NTTR processes)
A process is neutral to the right if the cumulative hazard function -log(1 - F) has independent increments:
    F NTTR ⇔ -log(1 - F(.)) is a Lévy process

NTTR process properties
Conjugacy (algebraic), contain the DP, a range of consistency results.

Slide 24/27: Lévy Processes and Kolmogorov Extensions

Construction of a Lévy process
1. Choose the µ_I to guarantee independent, stationary increments.
2. Apply Kolmogorov ⇒ process µ_E on Ω_E.
3. Restrict to Ω = the set of rcll functions.

Convolution semigroup
A set of measures ν_s (s ∈ R_+) with
    ν_{s+t} = ν_s * ν_t   for all s, t ∈ R_+

Lévy marginals from a convolution semigroup
For I = {t}, set µ_I := ν_t, with {ν_s}_{s ∈ R_+} a convolution semigroup.
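
The semigroup property can be checked by simulation for the gamma family, which underlies the gamma subordinator (a minimal sketch; the shape parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Convolution semigroup for the gamma family (rate 1): ν_s = Gamma(s)
# satisfies ν_s * ν_t = ν_{s+t}, so the sum of independent Gamma(s) and
# Gamma(t) samples matches Gamma(s + t); for rate 1 both mean and
# variance equal s + t.
s, t, n = 1.5, 2.5, 400_000
convolved = rng.gamma(s, 1.0, n) + rng.gamma(t, 1.0, n)   # ν_s * ν_t
direct = rng.gamma(s + t, 1.0, n)                         # ν_{s+t}
print(convolved.mean(), direct.mean(), convolved.var(), direct.var())
```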

Slide 25/27: Consistency

Consistency means: if θ_0 is the true solution, the posterior (regarded as a random measure) converges to δ_{θ_0} almost surely as n → ∞.

Two notions of consistency
- Frequentist consistency: a.s. under µ_X(X | Θ = θ_0).
- Bayesian consistency: a.s. under the evidence; only ensures consistency for θ_0 in the prior domain.

Different results
- Bayesian definition: under very general conditions, any Bayesian model is consistent (Doob, 1949).
- Frequentist definition: models like the DP can be completely wrong.

Slide 26/27: Example Results

Countable domain D (Freedman, 1968)
Freedman gives an example of a Bayes-consistent model which converges to an arbitrary prescribed distribution for any true value of θ_0.
Remedy: tailfree prior.

DP inconsistency (Diaconis & Freedman, 1986)
The DP estimate of the mean (of an unknown distribution) can be inconsistent.

Recent results
- Quantify model complexity (learning-theory style).
- Under conditions on the complexity: consistency and convergence rates.
- Controllable behavior under model misspecification.
Shen & Wasserman (1998); Ghosal, van der Vaart et al. (2002 and later)

Slide 27/27: Preview: Part III

- Constructing parameterized models from parameterized marginals
- Conjugacy
- Sufficient statistics of process models
- What are suitable families of marginals?


More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Probability Background

Probability Background CS76 Spring 0 Advanced Machine Learning robability Background Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu robability Meure A sample space Ω is the set of all possible outcomes. Elements ω Ω are called sample

More information

Nonparametric Bayesian Uncertainty Quantification

Nonparametric Bayesian Uncertainty Quantification Nonparametric Bayesian Uncertainty Quantification Lecture 1: Introduction to Nonparametric Bayes Aad van der Vaart Universiteit Leiden, Netherlands YES, Eindhoven, January 2017 Contents Introduction Recovery

More information

Density Modeling and Clustering Using Dirichlet Diffusion Trees

Density Modeling and Clustering Using Dirichlet Diffusion Trees p. 1/3 Density Modeling and Clustering Using Dirichlet Diffusion Trees Radford M. Neal Bayesian Statistics 7, 2003, pp. 619-629. Presenter: Ivo D. Shterev p. 2/3 Outline Motivation. Data points generation.

More information

Latent factor density regression models

Latent factor density regression models Biometrika (2010), 97, 1, pp. 1 7 C 2010 Biometrika Trust Printed in Great Britain Advance Access publication on 31 July 2010 Latent factor density regression models BY A. BHATTACHARYA, D. PATI, D.B. DUNSON

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:

More information

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles Peter Green and John Lau University of Bristol P.J.Green@bristol.ac.uk Isaac Newton Institute, 11 December

More information

Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown

More information

Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors

Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors Ann Inst Stat Math (29) 61:835 859 DOI 1.17/s1463-8-168-2 Asymptotic properties of posterior distributions in nonparametric regression with non-gaussian errors Taeryon Choi Received: 1 January 26 / Revised:

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Flexible Regression Modeling using Bayesian Nonparametric Mixtures Flexible Regression Modeling using Bayesian Nonparametric Mixtures Athanasios Kottas Department of Applied Mathematics and Statistics University of California, Santa Cruz Department of Statistics Brigham

More information

Latent Dirichlet Alloca/on

Latent Dirichlet Alloca/on Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which

More information

Order-q stochastic processes. Bayesian nonparametric applications

Order-q stochastic processes. Bayesian nonparametric applications Order-q dependent stochastic processes in Bayesian nonparametric applications Department of Statistics, ITAM, Mexico BNP 2015, Raleigh, NC, USA 25 June, 2015 Contents Order-1 process Application in survival

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

Non-parametric Clustering with Dirichlet Processes

Non-parametric Clustering with Dirichlet Processes Non-parametric Clustering with Dirichlet Processes Timothy Burns SUNY at Buffalo Mar. 31 2009 T. Burns (SUNY at Buffalo) Non-parametric Clustering with Dirichlet Processes Mar. 31 2009 1 / 24 Introduction

More information

Construction of Dependent Dirichlet Processes based on Poisson Processes

Construction of Dependent Dirichlet Processes based on Poisson Processes 1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen Outline

More information

Review of Probabilities and Basic Statistics

Review of Probabilities and Basic Statistics Alex Smola Barnabas Poczos TA: Ina Fiterau 4 th year PhD student MLD Review of Probabilities and Basic Statistics 10-701 Recitations 1/25/2013 Recitation 1: Statistics Intro 1 Overview Introduction to

More information

arxiv: v1 [stat.ml] 20 Nov 2012

arxiv: v1 [stat.ml] 20 Nov 2012 A survey of non-exchangeable priors for Bayesian nonparametric models arxiv:1211.4798v1 [stat.ml] 20 Nov 2012 Nicholas J. Foti 1 and Sinead Williamson 2 1 Department of Computer Science, Dartmouth College

More information

Bayes spaces: use of improper priors and distances between densities

Bayes spaces: use of improper priors and distances between densities Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

Outline. Introduction to Bayesian nonparametrics. Notation and discrete example. Books on Bayesian nonparametrics

Outline. Introduction to Bayesian nonparametrics. Notation and discrete example. Books on Bayesian nonparametrics Outline Introduction to Bayesian nonparametrics Andriy Norets Department of Economics, Princeton University September, 200 Dirichlet Process (DP) - prior on the space of probability measures. Some applications

More information

ICES REPORT Model Misspecification and Plausibility

ICES REPORT Model Misspecification and Plausibility ICES REPORT 14-21 August 2014 Model Misspecification and Plausibility by Kathryn Farrell and J. Tinsley Odena The Institute for Computational Engineering and Sciences The University of Texas at Austin

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Stratégies bayésiennes et fréquentistes dans un modèle de bandit

Stratégies bayésiennes et fréquentistes dans un modèle de bandit Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016

More information