Bayesian (conditionally) conjugate inference for discrete data models. Jon Forster (University of Southampton)


1 Bayesian (conditionally) conjugate inference for discrete data models. Jon Forster (University of Southampton), with Mark Grigsby (Procter and Gamble?) and Emily Webb (Institute of Cancer Research)

2 Table 1: Alcohol intake, hypertension and obesity (Knuiman and Speed, 1988). Classifying variables: Obesity (Low, Average, High), Hypertension (Yes, No), Alcohol intake (drinks/day; ordered categories, the highest being > 5). [Cell counts did not survive transcription.]

3 The data. Sample data consist of counts of (multivariate) categorical variables, recorded for each unit in the sample. Units $i = 1, \dots, n$ are classified by variables $j = 1, \dots, p$, where variable $j$ has $m_j$ categories. Categorical response vectors $y_i = (y_{i1}, y_{i2}, \dots, y_{ip})$ are observed. The contingency table $y$, derived from $\{y_i, i = 1, \dots, n\}$, has $m = \prod_{j=1}^p m_j$ cells, with cell counts
$$ y = \sum_{i=1}^{n} \bigotimes_{j=1}^{p} y_{ij}, $$
where $[y_{ij}]_k = I[y_{ij} = k]$, $k = 1, \dots, m_j$.
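To make the construction concrete, here is a minimal sketch (not from the slides; the function and variable names are illustrative) of forming the cell-count vector $y$ as a sum of Kronecker products of indicator vectors:

```python
import numpy as np

def contingency_counts(responses, m):
    """Cell counts y = sum_i kron_j y_ij for an n x p array of responses,
    where responses[i, j] is in {1, ..., m[j]}."""
    n, p = responses.shape
    y = np.zeros(int(np.prod(m)))
    for i in range(n):
        cell = np.ones(1)
        for j in range(p):
            indicator = np.zeros(m[j])
            indicator[responses[i, j] - 1] = 1.0   # [y_ij]_k = I[y_ij = k]
            cell = np.kron(cell, indicator)        # tensor product across variables
        y += cell
    return y

# Example: three units classified by two binary variables
resp = np.array([[1, 2], [2, 2], [1, 1]])
print(contingency_counts(resp, m=[2, 2]))  # -> [1. 1. 0. 1.]
```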

4 The unrestricted model. For any unit $i$, the marginal distribution of $y$ ($n = 1$) can be expressed in unconstrained form as
$$ p(y) = \prod_{k=1}^{m} \pi_k^{y_k} = \exp\left\{ \sum_{k=1}^{m} \theta_k y_k - n \log\left( \sum_{k=1}^{m} \exp \theta_k \right) \right\} \qquad (1) $$
where $\pi = (\pi_1, \dots, \pi_m)$ is a vector of cell probabilities and $\theta = (\theta_1, \dots, \theta_m)$ is a corresponding multivariate logit
$$ \theta_k = \log \pi_k - \sum_{l=1}^{m} a_l \log \pi_l, \qquad \text{where } \sum_l a_l = 1,\ a_l \ge 0. $$
Typically $a_l = I[l = 1]$ or $a_l = I[l = m]$ (reference cell logit), though $a_l = 1/m$ (centred logit) can sometimes be more useful. Typically, we assume that individual classifications $y_i$ are conditionally independent given a common $\theta$, in which case (1) holds for $n > 1$. The distribution of $y$ is then a multivariate natural exponential family.
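The logit map and its inverse are easy to check numerically; the sketch below (illustrative, not from the slides) uses the centred choice $a_l = 1/m$:

```python
import numpy as np

def theta_from_pi(pi):
    log_pi = np.log(pi)
    return log_pi - log_pi.mean()       # theta_k = log pi_k - (1/m) sum_l log pi_l

def pi_from_theta(theta):
    e = np.exp(theta - theta.max())     # stabilised inverse (softmax)
    return e / e.sum()                  # pi_k = exp(theta_k) / sum_l exp(theta_l)

pi = np.array([0.2, 0.3, 0.5])
print(pi_from_theta(theta_from_pi(pi)))  # recovers [0.2 0.3 0.5]
```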

5 Conjugate inference for the unrestricted model. A Bayesian conjugate prior for the cell probabilities $\pi$ of the unrestricted model is Dirichlet($\alpha$), that is
$$ p(\pi) \propto \prod_{k=1}^{m} \pi_k^{\alpha_k - 1} $$
or equivalently, for $\theta$,
$$ p(\theta) \propto \exp\left\{ \sum_{k=1}^{m} \theta_k \alpha_k - \alpha_+ \log\left( \sum_{k=1}^{m} \exp \theta_k \right) \right\} \qquad (2) $$
where $\alpha_+ = \sum_{k=1}^m \alpha_k$. The correspondence of (2) with (1) makes this the Diaconis-Ylvisaker conjugate prior. [Alternatives include priors based on a multivariate normal distribution for $\theta$; some advantages, some drawbacks.]
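Conjugate updating in the unrestricted model is then immediate: combining Dirichlet($\alpha$) with multinomial counts $y$ gives Dirichlet($\alpha + y$). A minimal sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.ones(4)            # symmetric Dirichlet prior
y = np.array([5, 9, 2, 7])    # observed cell counts (illustrative)
draws = rng.dirichlet(alpha + y, size=10_000)   # posterior is Dirichlet(alpha + y)
print(draws.mean(axis=0))     # approx (alpha + y) / (alpha.sum() + y.sum())
```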

6 Incorporating structure with parsimonious models.
- General log-linear models: $\theta = X\beta$ for some $m \times r$ matrix $X$ satisfying the constraint on $\theta$.
- Log-linear interaction models for contingency tables: imply specific forms for $X$ (with or without hierarchy constraints).
- (Undirected) graphical models: hierarchical models specified purely by conditional independence properties (constraints). A small example follows this list.
[Figure: undirected graph on vertices A, B, C.]
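As a small concrete case (not from the slides), the independence model A + B for a 2 x 2 table has a 4 x 3 design matrix under reference-cell coding:

```python
import numpy as np

# Cells in row-major order: (A=1,B=1), (A=1,B=2), (A=2,B=1), (A=2,B=2)
a = np.array([0, 0, 1, 1])   # indicator of A at its second level
b = np.array([0, 1, 0, 1])   # indicator of B at its second level
X = np.column_stack([np.ones(4), a, b])   # columns: intercept, A, B; no AB term
print(X)
```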

7 Conjugate inference for the general log-linear model.
$$ p(y) \propto \exp\left\{ \sum_{k=1}^{r} \beta_k t_k - n \log\left( \sum_{k=1}^{m} \exp [X\beta]_k \right) \right\} \qquad (3) $$
where $t = X^T y$. The Diaconis-Ylvisaker conjugate prior for $\beta$ is
$$ p(\beta) \propto \exp\left\{ \sum_{k=1}^{r} \beta_k s_k - \alpha \log\left( \sum_{k=1}^{m} \exp [X\beta]_k \right) \right\} \qquad (4) $$
where $s$ and $\alpha$ are hyperparameters. Massam, Liu and Dobra (2008) investigate this structure in detail for hierarchical log-linear models, and provide many interesting results.
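The unnormalised log density in (4) is straightforward to evaluate; a sketch follows (the names dy_log_prior, X, s, alpha are illustrative, not from the slides):

```python
import numpy as np
from scipy.special import logsumexp

def dy_log_prior(beta, X, s, alpha):
    """Unnormalised log density of the DY conjugate prior (4):
    sum_k beta_k s_k - alpha * log sum_k exp([X beta]_k)."""
    return beta @ s - alpha * logsumexp(X @ beta)
```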

8 Forthcoming attraction: Bayesian structural learning and estimation in Gaussian graphical models and hierarchical log-linear models, starring Adrian Dobra. In theatre(s) from July 8.

9 Conditional Dirichlet distributions and compatibility (Grigsby, 2001). Suppose we derive a prior for a general log-linear model by taking a Dirichlet for the unconstrained model and conditioning (on the log-linear scale). Then, essentially trivially, we arrive at
$$ p(\beta) \propto \exp\left\{ \sum_{k=1}^{r} \beta_k s_k - \alpha \log\left( \sum_{k=1}^{m} \exp [X\beta]_k \right) \right\} \qquad (4) $$
as before. Hence, these conjugate priors are compatible (in a conditional sense; see Dawid and Lauritzen, 2001). Conditioning on the linear ($\theta$) scale ensures here that a proper initial prior leads to a proper prior on the submodel. See also Günel and Dickey (1974).

10 Hyper-Dirichlet equivalence. Massam, Liu and Dobra, Leucari, and Grigsby show that, for a decomposable undirected graphical model, the conjugate prior (4) is equivalent to a hyper-Dirichlet (Dawid and Lauritzen, 1993). Grigsby's proof is based on a perfect ordering and the standard decomposition
$$ p(y_i) = \prod_{j=1}^{p} p_j(y_{ij} \mid y_{i,\mathrm{pa}(j)}) $$
where $\pi_j(l_{\mathrm{pa}(j)}) = \{p_j(k \mid l_{\mathrm{pa}(j)}),\ k = 1, \dots, m_j\}$ is an $m_j$-vector of probabilities for each distinct combination $l_{\mathrm{pa}(j)}$ of levels of $\mathrm{pa}(j)$.

11 With this parameterisation, the natural (closed under sampling) prior family is
$$ \pi_j(l_{\mathrm{pa}(j)}) \sim \mathrm{Dirichlet}(\alpha_{j,l_{\mathrm{pa}(j)}}) \qquad (5) $$
independently, for each variable $j$ and each parent combination $l_{\mathrm{pa}(j)}$. This is the product Dirichlet described by Cowell et al (1999), which here is equivalent to the hyper-Dirichlet for particular choices of $\{\alpha_{j,l_{\mathrm{pa}(j)}}\}$ (the directed hyper-Dirichlet). Use this distribution within the (conditional) logit parameterisation
$$ \theta_j(k, l_{\mathrm{pa}(j)}) = \log p_j(k \mid l_{\mathrm{pa}(j)}) - \log p_j(1 \mid l_{\mathrm{pa}(j)}). $$
Then the (directed) hyper-Dirichlet and conditional Dirichlet densities are functionally equivalent ("likelihood-like"). It remains to prove that the Jacobian is 1; this is achieved by identifying a correspondence between $\{\theta_j(k, l_{\mathrm{pa}(j)})\}$ and log-linear parameters $\beta$ for which the derivative matrix is upper triangular with unit diagonal.
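A sketch (illustrative names, not from the slides) of drawing from the product-Dirichlet prior (5), one independent Dirichlet vector per variable and per parent configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_product_dirichlet(m, parents, alpha=1.0):
    """One Dirichlet draw over the m[j] levels of variable j for each
    of the prod_{k in pa(j)} m[k] parent configurations."""
    tables = []
    for j, pa in enumerate(parents):
        n_pa = int(np.prod([m[k] for k in pa])) if pa else 1
        tables.append(rng.dirichlet(np.full(m[j], alpha), size=n_pa))
    return tables

# Example: DAG A -> B with m_A = 2, m_B = 3
tabs = sample_product_dirichlet(m=[2, 3], parents=[[], [0]])
print(tabs[1])   # one conditional distribution of B per level of A
```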

12 Non-hyper product Dirichlet priors for perfect DAG models. Is there any role for using independent
$$ \pi_j(l_{\mathrm{pa}(j)}) \sim \mathrm{Dirichlet}(\alpha_{j,l_{\mathrm{pa}(j)}}) \qquad (5) $$
with non-consistent $\{\alpha_{j,l_{\mathrm{pa}(j)}}\}$? Potentially unattractive, as typically such a prior family will not be closed under changes of (non-trivial) decomposition. For example, for a pair of binary variables, a prior of the form
$$ \{\pi_1 \sim \mathrm{Beta}(\alpha_1),\ \pi_2(1) \sim \mathrm{Beta}(\alpha_{21}),\ \pi_2(2) \sim \mathrm{Beta}(\alpha_{22})\} $$
does not generally correspond to one of the form
$$ \{\pi_2 \sim \mathrm{Beta}(\alpha_2),\ \pi_1(1) \sim \mathrm{Beta}(\alpha_{11}),\ \pi_1(2) \sim \mathrm{Beta}(\alpha_{12})\}. $$
However, for certain decomposable models, for particular decompositions, the Jeffreys prior has this form.

13 Jeffreys prior. Recall Jeffreys' rule for constructing a default prior distribution invariant under reparameterisation:
$$ f(\pi) \propto |I(\pi)|^{1/2}. $$
For a decomposable undirected graphical model, parameterised via a perfect DAG and $\{\pi_j(l_{\mathrm{pa}(j)})\}$, the Jeffreys prior is
$$ f(\pi) \propto \prod_{j} \prod_{l_{\mathrm{pa}(j)}} \left[ \sum_{l \supset l_{\mathrm{pa}(j)}} \prod_{j'} p_{j'}(l_{j'} \mid l_{\mathrm{pa}(j')}) \right]^{\frac{m_j - 1}{2}} \prod_{l_{\{j\} \cup \mathrm{pa}(j)}} p_j(l_j \mid l_{\mathrm{pa}(j)})^{-\frac{1}{2}} \qquad (6) $$
where the bracketed sum runs over complete configurations $l$ consistent with $l_{\mathrm{pa}(j)}$, i.e. it is the marginal parent probability. The Jeffreys prior simplifies to a product Dirichlet when the summation in (6) can be simplified to a single product term for every $j$, $l_{\mathrm{pa}(j)}$.
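As a sanity check on (6) (not on the slide), the parent-free special case reduces to the familiar multinomial result:

```latex
% With no parents the marginal (bracketed) term in (6) is identically 1,
% so the Jeffreys prior |I(\pi)|^{1/2} reduces to
\[
  f(\pi) \;\propto\; \prod_{k=1}^{m} \pi_k^{-1/2},
\]
% i.e. Dirichlet(1/2, ..., 1/2), consistent with the 1/2 hyperparameters
% appearing on the next slide.
```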

14 Jeffreys prior simplification. The Jeffreys prior simplifies to a product Dirichlet for any grandparent-free perfect ordering. Such an ordering is only possible for decomposable graphs whose junction tree has a common separator between nodes.
[Figure: three graphs (a), (b), (c) on vertices A, B, C.]
For (a), the Jeffreys prior simplifies to independent symmetric Dirichlet distributions with $\alpha_C = (m_A + m_B - 3)/2$ and $\alpha_{A,l_C} = \alpha_{B,l_C} = 1/2$ for all $l_C$.

15 Inference and computation. For Dirichlet-equivalent priors, parametric inference and marginal likelihood computation (for model determination) are easy. For the other priors we have discussed, numerical methods are required, and for non-decomposable conjugates and non-Dirichlet Jeffreys priors, prior normalising constants must be computed for marginal likelihoods. (Markov chain) Monte Carlo methods or numerical integration based on Laplace's method work well. For the conditional (generalised hyper) Dirichlet for non-decomposable models, the Laplace approximation can be computed simply in standard software with log-linear model fitting and basic matrix functionality. However, beware of Laplace's method for the prior normalising constant.
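A generic sketch of Laplace's method for a normalising constant (illustrative, not the authors' implementation): with $h$ the log integrand maximised at $\beta^*$, $\int \exp\{h(\beta)\}\,d\beta \approx \exp\{h(\beta^*)\}\,(2\pi)^{r/2}\,|-\nabla^2 h(\beta^*)|^{-1/2}$.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_norm_const(neg_h, beta0):
    """Laplace approximation to log int exp(h(beta)) d beta,
    where neg_h computes -h; beta0 is a starting value."""
    opt = minimize(neg_h, beta0, method="BFGS")
    r = len(beta0)
    _, logdet_hinv = np.linalg.slogdet(opt.hess_inv)  # BFGS estimate of H^{-1}
    return -opt.fun + 0.5 * r * np.log(2 * np.pi) + 0.5 * logdet_hinv

# Check on a standard normal integrand: should be ~0.5 * log(2 pi) = 0.919
print(laplace_log_norm_const(lambda b: 0.5 * b @ b, np.array([1.0])))
```

In higher dimensions the BFGS inverse-Hessian estimate can be crude; an exact Hessian is preferable when available, which is one face of the warning above.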

16 Convergence of Laplace's method for the prior normalising constant. [Figure: four panels, (a) A + B [2], (b) AB + BC [5], (c) ABC [8], (d) ABC [27].] Accuracy can be improved by bridge sampling between the prior and a normal approximation (time-consuming, but a catalogue can be built).

17 Alternative computation? For model determination with conditional Dirichlet priors, it may be possible to exploit the Savage-Dickey density ratio. Recall that, for comparing models $M_1: y \sim p_1(y \mid \theta, \phi)$ and $M_0: y \sim p_0(y \mid \phi)$, where $p_0(y \mid \phi) = p_1(y \mid \theta_0, \phi)$, we have the Bayes factor
$$ \frac{p_0(y)}{p_1(y)} = \frac{p_1(\theta_0 \mid y)}{p_1(\theta_0)} $$
provided that $p_0(\phi) = p_1(\phi \mid \theta_0)$. This constraint is exactly the same as that used in deriving the conditional Dirichlet, so, for example, the models ABC and AB+AC+BC can be compared by generating the missing interaction parameter (a log-contrast) under the Dirichlet prior and posterior for ABC, and calculating the ratio of two density estimates at zero. How feasible is this generally?
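A sketch of the density-ratio step (names illustrative, not from the slides; the density estimation at zero is the delicate part):

```python
import numpy as np
from scipy.stats import gaussian_kde

def savage_dickey_bf(prior_draws, post_draws, theta0=0.0):
    """Estimate B_01 = p_1(theta_0 | y) / p_1(theta_0) from Monte Carlo
    draws of the extra log-contrast under the prior and the posterior."""
    post_density = gaussian_kde(post_draws)(theta0)[0]
    prior_density = gaussian_kde(prior_draws)(theta0)[0]
    return post_density / prior_density
```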

18 Posterior model probabilities for Table 1. Columns: Model, Posterior probability, Posterior probability (ordinal). Models compared: OH + AH, A + OH, AOH, OH + OA, O + AH, OA + AH, O + A + H, H + OA. [Numerical probabilities did not survive transcription.]

19 Ordinal probit models, multivariate (Chib and Greenberg, 1998).
- $z_i \sim N(\beta, \Sigma)$ is a latent continuous vector.
- $y_{ij} = c$ if $\alpha_{j,c-1} < z_{ij} \le \alpha_{j,c}$ ($\alpha_{j,0} = -\infty$, $\alpha_{j,m_j} = \infty$); see the sketch after this list.
- Prior: $\beta$ multivariate normal; $\alpha_{j,c}$ independent uniform, subject to the ordering constraint; $\Sigma$ given a distribution consistent with any constraints.
- Identifiability constraints: $\sigma_{ii} = 1$, $\alpha_{j,1} = 0$, $j = 1, \dots, p$.
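The latent-variable link is just a cutpoint lookup; a minimal sketch (illustrative names, not from the slides):

```python
import numpy as np

def ordinal_from_latent(z, cutpoints):
    """y_j = c iff alpha_{j,c-1} < z_j <= alpha_{j,c}; cutpoints[j] is the
    sorted array (alpha_{j,0} = -inf, ..., alpha_{j,m_j} = +inf)."""
    return np.array([int(np.searchsorted(cutpoints[j], z[j]))
                     for j in range(len(z))])

cuts = [np.array([-np.inf, 0.0, 1.2, np.inf])]     # one variable, 3 categories
print(ordinal_from_latent(np.array([0.4]), cuts))  # -> [2]
```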

20 An alternative parameterisation. Constrain $\alpha_{j,1} = -\alpha_{j,m_j-1} = \Phi^{-1}(1/m_j)$, $j = 1, \dots, p$. Then $\Sigma$ is unconstrained and can be given a (hyper) inverse Wishart prior, and the conditionals are straightforward to sample. This is not possible if any $m_j = 2$ (binary variable). Instead, consider the Cholesky decomposition $\Sigma^{-1} = \Phi^T \Phi$, where $\Phi$ is upper triangular.

21 The elements of $\Phi$ appear in the decomposition
$$ z_{ip} \sim \beta_p + N\!\left(0, \tfrac{1}{\phi_{pp}^2}\right) $$
$$ z_{i,p-1} \mid z_{ip} \sim \beta_{p-1} - \frac{\phi_{p-1,p}}{\phi_{p-1,p-1}}(z_{ip} - \beta_p) + N\!\left(0, \tfrac{1}{\phi_{p-1,p-1}^2}\right) $$
$$ \vdots $$
$$ z_{i1} \mid z_{i2}, \dots, z_{ip} \sim \beta_1 - \frac{\phi_{1p}}{\phi_{11}}(z_{ip} - \beta_p) - \dots - \frac{\phi_{12}}{\phi_{11}}(z_{i2} - \beta_2) + N\!\left(0, \tfrac{1}{\phi_{11}^2}\right) $$
For binary (and other) variables we can constrain $\lambda_j \equiv \phi_{jj}^{-1} = 1$; the remaining $\lambda_j = \phi_{jj}^{-1}$ are given Gamma-type priors, and $\psi_j \equiv (\phi_{j,j+1}, \dots, \phi_{jp})\phi_{jj}^{-1}$ multivariate normal priors. [Equivalence with the (hyper) inverse Wishart: Roverato (2002).]
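The same triangular structure gives a direct sampler for $z \sim N(\beta, \Sigma)$ by back-substitution; a sketch (illustrative names, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_z(beta, Phi):
    """Draw z ~ N(beta, Sigma) with Sigma^{-1} = Phi^T Phi, Phi upper
    triangular, sampling z_p down to z_1 as in the decomposition above."""
    p = len(beta)
    z = np.zeros(p)
    for j in reversed(range(p)):
        # conditional mean: beta_j - sum_{k>j} (phi_jk / phi_jj)(z_k - beta_k)
        mean = beta[j] - Phi[j, j + 1:] @ (z[j + 1:] - beta[j + 1:]) / Phi[j, j]
        z[j] = mean + rng.normal(scale=1.0 / Phi[j, j])   # sd = 1 / phi_jj
    return z
```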

22 Graphical models.
- Gaussian DAG models for $z$ ("graphical" ordinal probit models for $y$) can be specified by setting certain $\psi_{jk} = 0$, for an appropriate ordering.
- Undirected graphical models can be specified using an equivalent DAG.
- Conditional conjugacy allows straightforward MCMC computation.
- Model determination for DAG models, given an ordering, uses reversible jump MCMC with transitions between models which differ by a single edge (see also Fronk, 2002).
- Model determination for undirected graphical models requires order switching: propose to transpose two neighbouring variables in the current ordering, with an associated deterministic parameter transformation (RJMCMC allows this). The prior must compensate for the fact that not all models are available under the same number of orderings (order counting in the model jump step; Chandran et al, 2003).

23 Table 2: Colouring of blackbirds (Anderson and Pemberton, 1985). Variables: Orbital Ring, Lower Mandible, Upper Mandible; categories 1 = mostly black, 2 = intermediate, 3 = mostly yellow. [Cell counts did not survive transcription.]

24 Predictive logarithmic scores for Table 2. Rows: conditional independence structure (none, L O U, U O L, model-averaged); columns: ordinal and non-ordinal models. The score is
$$ S = \sum_{i=1}^{90} \log p(y_i \mid y_{\setminus i}) $$
where $y_{\setminus i}$ represents the data $y$ with $y_i$ removed. Posterior model probabilities were reported for the unstructured model, (L O U) and (U O L). [Numerical scores and probabilities did not survive transcription.]
