Bayesian (conditionally) conjugate inference for discrete data models. Jon Forster (University of Southampton)
1 Bayesian (conditionally) conjugate inference for discrete data models. Jon Forster (University of Southampton), with Mark Grigsby (Procter and Gamble?) and Emily Webb (Institute of Cancer Research)
2 Table 1: Alcohol intake, hypertension and obesity (Knuiman and Speed, 1988). [Table of cell counts cross-classifying alcohol intake (drinks/day; highest category > 5) by obesity (Low, Average, High) and hypertension (Yes, No); the counts themselves were not recovered in transcription.]
3 The data. Sample data consist of counts of (multivariate) categorical variables, recorded for each unit in the sample. Units $i = 1, \dots, n$ are classified by variables $j = 1, \dots, p$, where variable $j$ has $m_j$ categories. Categorical response vectors $y_i = (y_{i1}, y_{i2}, \dots, y_{ip})$ are observed. The contingency table $y$, derived from $\{y_i,\ i = 1, \dots, n\}$, has $m = \prod_{j=1}^p m_j$ cells, with cell counts
$$y = \sum_{i=1}^n \bigotimes_{j=1}^p y_{ij}, \qquad \text{where } [y_{ij}]_k = I[y_{ij} = k], \quad k = 1, \dots, m_j.$$
4 The unrestricted model. For any unit $i$, the marginal distribution of $y$ ($n = 1$) can be expressed in unconstrained form as
$$p(y) = \prod_{k=1}^m \pi_k^{y_k} = \exp\left\{ \sum_{k=1}^m \theta_k y_k - n \log \sum_{k=1}^m \exp \theta_k \right\} \qquad (1)$$
where $\pi = (\pi_1, \dots, \pi_m)$ is a vector of cell probabilities and $\theta = (\theta_1, \dots, \theta_m)$ is a corresponding multivariate logit
$$\theta_k = \log \pi_k - \sum_{l=1}^m a_l \log \pi_l, \qquad \sum_l a_l = 1, \quad a_l \ge 0.$$
Typically $a_l = I[l = 1]$ or $a_l = I[l = m]$ (reference-cell logit), though $a_l = 1/m$ (centred logit) can sometimes be more useful. Typically, we assume that individual classifications $y_i$ are conditionally independent given a common $\theta$, in which case (1) holds for $n > 1$. The distribution of $y$ is a multivariate natural exponential family.
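As a minimal sketch of the logit parameterisation above (function names are illustrative, not from the talk), the map $\pi \mapsto \theta$ with a reference-cell choice of $a$, and its inverse via the softmax, can be written as:

```python
import math

def cell_probs_to_logits(pi, a=None):
    """Multivariate logit: theta_k = log pi_k - sum_l a_l log pi_l.
    Default a is the reference-cell choice a_l = I[l = 1]."""
    m = len(pi)
    if a is None:
        a = [1.0] + [0.0] * (m - 1)  # reference-cell logit
    base = sum(al * math.log(pk) for al, pk in zip(a, pi))
    return [math.log(pk) - base for pk in pi]

def logits_to_cell_probs(theta):
    """Invert: pi_k = exp(theta_k) / sum_l exp(theta_l)."""
    z = sum(math.exp(t) for t in theta)
    return [math.exp(t) / z for t in theta]

pi = [0.1, 0.2, 0.3, 0.4]
theta = cell_probs_to_logits(pi)       # reference cell: theta_1 = 0
pi_back = logits_to_cell_probs(theta)  # round-trips to pi
```

With the centred logit ($a_l = 1/m$) the same two functions apply; only the base changes, and the inverse is unaffected because the softmax is invariant to shifting all $\theta_k$ by a constant.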
5 Conjugate inference for the unrestricted model. A Bayesian conjugate prior for the cell probabilities $\pi$ of the unrestricted model is Dirichlet($\alpha$), that is,
$$p(\pi) \propto \prod_{k=1}^m \pi_k^{\alpha_k - 1}$$
or equivalently, for $\theta$,
$$p(\theta) \propto \exp\left\{ \sum_{k=1}^m \theta_k \alpha_k - \alpha_+ \log \sum_{k=1}^m \exp \theta_k \right\} \qquad (2)$$
where $\alpha_+ = \sum_k \alpha_k$. The correspondence between (2) and (1) makes this the Diaconis–Ylvisaker conjugate prior. [Alternatives include priors based on a multivariate normal distribution for $\theta$; some advantages, some drawbacks.]
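Conjugacy here means the posterior is again Dirichlet with updated hyperparameters, $\alpha \to \alpha + y$. A small stdlib-only sketch (names and numbers are illustrative):

```python
import random

def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts y
    gives a Dirichlet(alpha + y) posterior."""
    return [a + y for a, y in zip(alpha, counts)]

def sample_dirichlet(alpha, rng=None):
    """Draw one Dirichlet vector via normalised Gamma variates."""
    rng = rng or random.Random(0)
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [gi / s for gi in g]

alpha = [1.0, 1.0, 1.0]        # uniform prior on the simplex
counts = [12, 5, 3]            # illustrative observed cell counts
post = dirichlet_posterior(alpha, counts)  # Dirichlet(13, 6, 4)
pi_draw = sample_dirichlet(post)           # one posterior draw of pi
```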
6 Incorporating structure with parsimonious models.
- General log-linear models: $\theta = X\beta$ for some $m \times r$ matrix $X$ satisfying the constraint on $\theta$.
- Log-linear interaction models for contingency tables: imply specific forms for $X$ (with or without hierarchy constraints).
- (Undirected) graphical models: hierarchical models specified purely by conditional independence properties (constraints).
[Figure: an undirected graph on the three vertices A, B, C.]
7 Conjugate inference for the general log-linear model.
$$p(y) \propto \exp\left\{ \sum_{k=1}^r \beta_k t_k - n \log \sum_{k=1}^m \exp[X\beta]_k \right\} \qquad (3)$$
where $t = X^T y$. The Diaconis–Ylvisaker conjugate prior for $\beta$ is
$$p(\beta) \propto \exp\left\{ \sum_{k=1}^r \beta_k s_k - \alpha \log \sum_{k=1}^m \exp[X\beta]_k \right\} \qquad (4)$$
where $s$ and $\alpha$ are hyperparameters. Massam, Liu and Dobra (2008) investigate this structure in detail for hierarchical log-linear models, and provide many interesting results.
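Comparing (3) and (4), conjugate updating amounts to $(s, \alpha) \to (s + t, \alpha + n)$ with $t = X^T y$. A minimal sketch (the design matrix, counts and function names are illustrative, not from the talk):

```python
def suff_stat(X, y):
    """t = X^T y for an m x r design matrix X and cell counts y (length m)."""
    m, r = len(X), len(X[0])
    return [sum(X[k][j] * y[k] for k in range(m)) for j in range(r)]

def dy_posterior(s, alpha, X, y):
    """Diaconis-Ylvisaker conjugate update: (s, alpha) -> (s + t, alpha + n)."""
    t = suff_stat(X, y)
    n = sum(y)
    return [sj + tj for sj, tj in zip(s, t)], alpha + n

# Illustrative 2x2 table under the independence model A + B:
# columns of X = intercept, main effect of A, main effect of B.
X = [[1, 0, 0],
     [1, 0, 1],
     [1, 1, 0],
     [1, 1, 1]]
y = [10, 5, 8, 27]
s_post, a_post = dy_posterior([0.0, 0.0, 0.0], 1.0, X, y)
```

Here `s_post` collects the posterior hyperparameters $s + X^T y$ and `a_post` the updated precision $\alpha + n$; the posterior density itself still has the form (4) and, in general, needs numerical normalisation.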
8 Forthcoming attraction: Bayesian structural learning and estimation in Gaussian graphical models and hierarchical log-linear models, starring Adrian Dobra. In theatre(s) from July 8.
9 Conditional Dirichlet distributions and compatibility (Grigsby, 2001). Suppose we derive a prior for a general log-linear model by taking a Dirichlet for the unconstrained model and conditioning (on the log-linear scale). Then, essentially trivially, we arrive at
$$p(\beta) \propto \exp\left\{ \sum_{k=1}^r \beta_k s_k - \alpha \log \sum_{k=1}^m \exp[X\beta]_k \right\} \qquad (4)$$
as before. Hence, these conjugate priors are compatible (in a conditional sense; see Dawid and Lauritzen, 2001). Conditioning on the linear ($\theta$) scale ensures here that a proper initial prior leads to a proper prior on the submodel. See also Günel and Dickey (1974).
10 Hyper-Dirichlet equivalence. Massam, Liu and Dobra, Leucari, and Grigsby show that, for a decomposable undirected graphical model, the conjugate prior (4) is equivalent to a hyper-Dirichlet (Dawid and Lauritzen, 1993). Grigsby's proof is based on a perfect ordering and the standard decomposition
$$p(y_i) = \prod_{j=1}^p p_j(y_{ij} \mid y_{i,\mathrm{pa}(j)})$$
where $\pi_j(l_{\mathrm{pa}(j)}) = \{p_j(k \mid l_{\mathrm{pa}(j)}),\ k = 1, \dots, m_j\}$ is an $m_j$-vector of probabilities for each distinct combination $l_{\mathrm{pa}(j)}$ of levels of $\mathrm{pa}(j)$.
11 With this parameterisation, the natural (closed under sampling) prior family is
$$\pi_j(l_{\mathrm{pa}(j)}) \sim \mathrm{Dirichlet}(\alpha_{j, l_{\mathrm{pa}(j)}}) \qquad (5)$$
independently, for each variable $j$ and each parent combination $l_{\mathrm{pa}(j)}$. This is the product Dirichlet described by Cowell et al. (1999), which here is equivalent to the hyper-Dirichlet for particular choices of $\{\alpha_{j, l_{\mathrm{pa}(j)}}\}$ (the directed hyper-Dirichlet). Use this distribution within the (conditional) logit parameterisation
$$\theta_j(k, l_{\mathrm{pa}(j)}) = \log p_j(k \mid l_{\mathrm{pa}(j)}) - \log p_j(1 \mid l_{\mathrm{pa}(j)}).$$
Then the (directed) hyper-Dirichlet and conditional Dirichlet densities are functionally equivalent ("likelihood-like"). It remains to prove that the Jacobian equals 1; this is achieved by identifying a correspondence between $\{\theta_j(k, l_{\mathrm{pa}(j)})\}$ and log-linear parameters $\beta$ for which the derivative matrix is upper triangular with unit diagonal.
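To make the product-Dirichlet family (5) concrete, here is a small stdlib-only sketch that draws one conditional probability table per variable and parent configuration (the DAG, hyperparameters and names are illustrative, not from the talk):

```python
import random

def sample_product_dirichlet(alpha_tables, rng=None):
    """Draw pi_j(l_pa(j)) ~ Dirichlet(alpha_{j, l_pa(j)}) independently
    for each variable j and each parent configuration l_pa(j).
    alpha_tables[j] maps a parent-level tuple to its alpha vector."""
    rng = rng or random.Random(1)

    def dirichlet(a):
        g = [rng.gammavariate(ai, 1.0) for ai in a]
        s = sum(g)
        return [gi / s for gi in g]

    return [{lpa: dirichlet(a) for lpa, a in table.items()}
            for table in alpha_tables]

# Two binary variables with DAG 1 -> 2 (hypothetical hyperparameters):
alpha_tables = [
    {(): [0.5, 0.5]},                      # variable 1, no parents
    {(0,): [0.5, 0.5], (1,): [0.5, 0.5]},  # variable 2 given variable 1
]
cpts = sample_product_dirichlet(alpha_tables)
```

Each draw is a full set of conditional tables; multiplying them along the DAG gives one joint cell-probability vector, which under consistent hyperparameter choices is a hyper-Dirichlet draw.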
12 Non-hyper product Dirichlet priors for perfect DAG models. Is there any role for using independent
$$\pi_j(l_{\mathrm{pa}(j)}) \sim \mathrm{Dirichlet}(\alpha_{j, l_{\mathrm{pa}(j)}}) \qquad (5)$$
with non-consistent $\{\alpha_{j, l_{\mathrm{pa}(j)}}\}$? Potentially unattractive, as typically such a prior family will not be closed under changes of (non-trivial) decomposition. For example, for a pair of binary variables, the prior
$$\{\pi_1 \sim \mathrm{Beta}(\alpha_1),\ \pi_2(1) \sim \mathrm{Beta}(\alpha_{21}),\ \pi_2(2) \sim \mathrm{Beta}(\alpha_{22})\}$$
under one ordering does not in general correspond to a prior of the form
$$\{\pi_2 \sim \mathrm{Beta}(\alpha_2),\ \pi_1(1) \sim \mathrm{Beta}(\alpha_{11}),\ \pi_1(2) \sim \mathrm{Beta}(\alpha_{12})\}$$
under the reverse ordering. However, for certain decomposable models, for particular decompositions, the Jeffreys prior has this form.
13 Jeffreys prior. Recall Jeffreys' rule for constructing a default prior distribution invariant under reparameterisation:
$$f(\pi) \propto |I(\pi)|^{1/2}.$$
For a decomposable undirected graphical model, parameterised via a perfect DAG and $\{\pi_j(l_{\mathrm{pa}(j)})\}$, the Jeffreys prior is
$$f(\pi) \propto \prod_{j} \prod_{l_{\mathrm{pa}(j)}} p(l_{\mathrm{pa}(j)})^{(m_j - 1)/2} \prod_{l_{\{j\} \cup \mathrm{pa}(j)}} p_j(l_j \mid l_{\mathrm{pa}(j)})^{-1/2} \qquad (6)$$
where the marginal parent probability $p(l_{\mathrm{pa}(j)})$ is a sum, over configurations of the remaining variables, of products of the conditionals $p_{j'}(l_{j'} \mid l_{\mathrm{pa}(j')})$. The Jeffreys prior simplifies to a product Dirichlet when the summation in (6) can be simplified to a single product term for every $j, l_{\mathrm{pa}(j)}$.
14 Jeffreys prior simplification. The Jeffreys prior simplifies to a product Dirichlet for any grandparent-free perfect ordering. Such an ordering is only possible for decomposable graphs whose junction tree has a common separator between nodes. [Figure: three graphs (a), (b), (c) on the vertices A, B, C.] For (a), the Jeffreys prior simplifies to independent symmetric Dirichlet distributions with $\alpha_C = (m_A + m_B - 3)/2$ and $\alpha_{A, l_C} = \alpha_{B, l_C} = 1/2$ for all $l_C$.
15 Inference and computation. For Dirichlet-equivalent priors, parametric inference and marginal likelihood computation (for model determination) are easy. For the other priors we have discussed, numerical methods are required, and for non-decomposable conjugates and non-Dirichlet Jeffreys priors, prior normalising constants must be computed for marginal likelihoods. (Markov chain) Monte Carlo methods, or numerical integration based on Laplace's method, work well. For the conditional (generalized hyper-) Dirichlet for non-decomposable models, the Laplace approximation can be computed simply in standard software with log-linear model fitting and basic matrix functionality. However, beware of Laplace's method for the prior normalising constant.
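As a reminder of what the Laplace approximation computes here, a one-dimensional sketch (illustrative only; in the talk the integrand is the unnormalised prior or posterior density of $\beta$, and the Hessian comes from the log-linear fit):

```python
import math

def laplace_log_integral(h, theta_hat, h2):
    """One-dimensional Laplace approximation to log \u222b exp{h(theta)} d theta:
    h(theta_hat) + (1/2) log(2 pi) - (1/2) log(-h''(theta_hat)),
    where theta_hat maximises h and h2 = h''(theta_hat) < 0."""
    return h(theta_hat) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(-h2)

# Sanity check on h(theta) = -theta^2/2, whose integral is exactly sqrt(2 pi),
# so the approximation is exact for this Gaussian case.
h = lambda th: -0.5 * th * th
approx = laplace_log_integral(h, 0.0, -1.0)
exact = 0.5 * math.log(2 * math.pi)
```

The caution on the slide is about exactly this device: for the prior (rather than the posterior) the integrand can be far from Gaussian, so the approximation to the normalising constant can converge slowly or poorly.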
16 Convergence of Laplace's method for the prior normalising constant. [Figure: four convergence plots, labelled (a) A + B [2], (b) AB + BC [5], (c) ABC [8], (d) ABC [27].] Convergence can be improved by bridge sampling between the prior and a normal approximation (time-consuming, but a catalogue can be built up).
17 Alternative computation? For model determination with conditional Dirichlet priors, it may be possible to exploit the Savage–Dickey density ratio. Recall that, for comparing models $M_1: y \sim p_1(y \mid \theta, \phi)$ and $M_0: y \sim p_0(y \mid \phi)$, where $p_0(y \mid \phi) = p_1(y \mid \theta_0, \phi)$, the Bayes factor is
$$\frac{p_0(y)}{p_1(y)} = \frac{p_1(\theta_0 \mid y)}{p_1(\theta_0)}$$
provided that $p_0(\phi) = p_1(\phi \mid \theta_0)$. This constraint is exactly the same as that used in deriving the conditional Dirichlet, so, for example, models ABC and AB + AC + BC can be compared by generating the missing interaction parameter (log-contrast) under the Dirichlet prior and posterior for ABC, and calculating the ratio of two density estimates at zero. How feasible is this generally?
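The mechanics of the final step can be sketched with kernel density estimates at zero. Everything below is illustrative: the draws stand in for prior and posterior samples of the missing log-contrast, which in practice would come from the Dirichlet prior and posterior for the larger model.

```python
import math
import random

def gaussian_kde_at_zero(draws, bandwidth=0.1):
    """Gaussian kernel density estimate at 0 from Monte Carlo draws."""
    c = 1.0 / (len(draws) * bandwidth * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * (x / bandwidth) ** 2) for x in draws)

def savage_dickey_bf(prior_draws, posterior_draws, bandwidth=0.1):
    """Savage-Dickey estimate of the Bayes factor for the restricted model:
    posterior density of the extra parameter at 0 over prior density at 0."""
    return (gaussian_kde_at_zero(posterior_draws, bandwidth)
            / gaussian_kde_at_zero(prior_draws, bandwidth))

rng = random.Random(0)
prior = [rng.gauss(0.0, 2.0) for _ in range(5000)]      # stand-in prior draws
posterior = [rng.gauss(0.1, 0.3) for _ in range(5000)]  # stand-in posterior draws
bf = savage_dickey_bf(prior, posterior)
```

The practical difficulty flagged on the slide is visible here: the estimate depends on density estimation at a single point, so it degrades when the prior is diffuse or the posterior mass sits far from zero.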
18 Posterior model probabilities for Table 1. [Table listing, for each model, its posterior probability and its posterior probability under the ordinal analysis; the probabilities themselves were not recovered in transcription.] Models: OH + AH, A + OH, AOH, OH + OA, O + AH, OA + AH, O + A + H, H + OA.
19 Ordinal probit models: multivariate (Chib and Greenberg, 1998). $z_i \sim N(\beta, \Sigma)$ is a latent continuous variable, and
$$y_{ij} = c \quad \text{if } \alpha_{j,c-1} < z_{ij} \le \alpha_{j,c}, \qquad \alpha_{j,0} = -\infty,\ \alpha_{j,m_j} = \infty.$$
The prior is: $\beta$ multivariate normal; $\alpha_{j,c}$ independently uniform, subject to the ordering constraint; and a distribution for $\Sigma$ consistent with any constraints. Identifiability constraints: $\sigma_{ii} = 1$ and $\alpha_{j,1} = 0$, $j = 1, \dots, p$.
20 An alternative parameterisation. Constrain $\alpha_{j,1} = -\alpha_{j,m_j-1} = \Phi^{-1}(1/m_j)$, $j = 1, \dots, p$. Then $\Sigma$ is unconstrained and can be given a (hyper-) inverse Wishart prior, and the conditionals are straightforward to sample. This is not possible if any $m_j = 2$ (binary variable). Instead, consider the Cholesky decomposition $\Sigma^{-1} = \Phi^T \Phi$ where $\Phi$ is upper triangular.
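The cutpoint mechanism is easy to sketch. Below, my reading of the slide's constraint fixes the two extreme interior cutpoints at $\pm\Phi^{-1}(1/m_j)$; the helper name and the three-category example are illustrative:

```python
from statistics import NormalDist  # Python 3.8+: exact Phi and its inverse

def ordinal_category(z, cutpoints):
    """Map a latent Gaussian value z to an ordinal category c in 1..m_j,
    where y = c iff alpha_{c-1} < z <= alpha_c, with alpha_0 = -inf and
    alpha_{m_j} = +inf implied (cutpoints holds only the interior ones)."""
    c = 1
    for a in cutpoints:
        if z > a:
            c += 1
    return c

# m_j = 3 categories: fix alpha_1 = -alpha_2 = Phi^{-1}(1/3) < 0.
m = 3
a1 = NormalDist().inv_cdf(1.0 / m)
cutpoints = [a1, -a1]
```

With $m_j = 2$ there is a single interior cutpoint and $\Phi^{-1}(1/2) = 0$, so this device imposes only one constraint and leaves the latent scale unidentified, which is why the slide switches to the Cholesky parameterisation for binary variables.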
21 The elements of $\Phi$ appear in the decomposition
$$z_{ip} \sim \beta_p + N(0, \phi_{pp}^{-2})$$
$$z_{i,p-1} \mid z_{ip} \sim \beta_{p-1} - \frac{\phi_{p-1,p}}{\phi_{p-1,p-1}}(z_{ip} - \beta_p) + N(0, \phi_{p-1,p-1}^{-2})$$
$$\vdots$$
$$z_{i1} \mid z_{i2}, \dots, z_{ip} \sim \beta_1 - \frac{\phi_{1p}}{\phi_{11}}(z_{ip} - \beta_p) - \cdots - \frac{\phi_{12}}{\phi_{11}}(z_{i2} - \beta_2) + N(0, \phi_{11}^{-2})$$
For binary (and other) variables we can constrain $\lambda_j \equiv \phi_{jj}^{-1} = 1$; the remaining $\lambda_j = \phi_{jj}^{-1} \sim$ Gamma, and $\psi_j \equiv (\phi_{j,j+1}, \dots, \phi_{jp})\,\phi_{jj}^{-1} \sim$ multivariate normal. [Equivalence with the (hyper-) inverse Wishart: Roverato (2002).]
22 Graphical models.
- Gaussian DAG models for $z$ ("graphical" ordinal probit models for $y$) can be specified by setting certain $\psi_{jk} = 0$, for an appropriate ordering.
- Undirected graphical models can be specified using an equivalent DAG.
- Conditional conjugacy allows straightforward MCMC computation.
- Model determination for DAG models, given an ordering, uses reversible jump MCMC with transitions between models which differ by a single edge (see also Fronk, 2002).
- Model determination for undirected graphical models requires order switching: propose to transpose two neighbouring variables in the current ordering, with an associated deterministic parameter transformation (RJMCMC allows this).
- The prior must compensate for the fact that not all models are available under the same number of orderings (order counting in the model-jump step; Chandran et al., 2003).
23 Table 2: Colouring of blackbirds (Anderson and Pemberton, 1985). [Table of counts cross-classifying Orbital Ring, Lower Mandible and Upper Mandible, each coded 1 = mostly black, 2 = intermediate, 3 = mostly yellow; the counts themselves were not recovered in transcription.]
24 Predictive logarithmic scores for Table 2. [Table of predictive logarithmic scores, for ordinal and non-ordinal models, under conditional independence structures: none, L O U, U O L, and model-averaged; the scores themselves were not recovered in transcription.] Here
$$S = \sum_{i=1}^{90} \log p(y_i \mid y_{\setminus i})$$
where $y_{\setminus i}$ represents the data $y$ with $y_i$ removed. The posterior model probabilities concentrate on (unstructured), (L O U) and (U O L); the numerical values were not recovered in transcription.
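The scoring rule itself is simple to illustrate. Below is a sketch for the easiest case of a single multinomial with a Dirichlet prior, where the leave-one-out predictive is available in closed form; the scores in the table are model-averaged over graphical models, so this toy version only shows the mechanics (all names and numbers are illustrative):

```python
import math

def loo_log_score(counts, alpha):
    """Leave-one-out predictive log score S = sum_i log p(y_i | y_{\\i})
    for multinomial counts with a Dirichlet(alpha) prior: an observation
    in cell k, with itself removed, has predictive probability
    (y_k - 1 + alpha_k) / (n - 1 + sum(alpha))."""
    n = sum(counts)
    a_plus = sum(alpha)
    s = 0.0
    for k, yk in enumerate(counts):
        if yk == 0:
            continue  # no observed units in this cell to score
        p = (yk - 1 + alpha[k]) / (n - 1 + a_plus)
        s += yk * math.log(p)
    return s

S = loo_log_score([40, 30, 20], [1.0, 1.0, 1.0])  # 90 units, as in Table 2
```

Higher (less negative) $S$ indicates better out-of-sample predictive performance, which is how the ordinal and non-ordinal models are compared on the slide.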
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More information2 : Directed GMs: Bayesian Networks
10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types
More informationBayesian Modeling of Conditional Distributions
Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationDeep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationDynamic Matrix-Variate Graphical Models A Synopsis 1
Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics Benidorm (Alicante, Spain), June 1st 6th, 2006 Dynamic Matrix-Variate Graphical Models A Synopsis 1 Carlos M. Carvalho & Mike West ISDS, Duke
More informationProbabilistic Graphical Models: Representation and Inference
Probabilistic Graphical Models: Representation and Inference Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Andrew Moore 1 Overview
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationThe joint posterior distribution of the unknown parameters and hidden variables, given the
DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior
More informationDirected Graphical Models
Directed Graphical Models Instructor: Alan Ritter Many Slides from Tom Mitchell Graphical Models Key Idea: Conditional independence assumptions useful but Naïve Bayes is extreme! Graphical models express
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2015 Julien Berestycki (University of Oxford) SB2a MT 2015 1 / 16 Lecture 16 : Bayesian analysis
More informationLearning the structure of discrete-variable graphical models with hidden variables
Chapter 6 Learning the structure of discrete-variable graphical models with hidden variables 6.1 Introduction One of the key problems in machine learning and statistics is how to learn the structure of
More informationTotal positivity in Markov structures
1 based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arxiv:1510.01290) Faculty of Science Total positivity in Markov structures Steffen Lauritzen
More information