Bayesian Graphical Models


Graphical Models and Inference, Lecture 16, Michaelmas Term 2009 (December 4, 2009)

Parameter θ, data X = x, likelihood L(θ | x) ∝ p(x | θ). Express knowledge about θ through a prior distribution π on θ. Inference about θ from x is then represented through the posterior distribution π*(θ) = p(θ | x). From Bayes' formula,

π*(θ) = p(x | θ)π(θ)/p(x) ∝ L(θ | x)π(θ),

so the likelihood function is equal to the density of the posterior w.r.t. the prior, modulo a constant.
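As a concrete numerical illustration of the relation π*(θ) ∝ L(θ | x)π(θ), the following Python sketch (not from the lecture; the data, the normal likelihood with known unit variance, and the N(0, 2²) prior are made-up choices for illustration) evaluates prior times likelihood on a grid and normalizes numerically:

    import numpy as np

    # Hypothetical observations with unknown mean mu and known unit variance
    y = np.array([1.2, 0.4, 2.1, 1.7, 0.9])

    mu = np.linspace(-5, 5, 2001)                      # grid over the parameter
    prior = np.exp(-mu**2 / (2 * 2.0**2))              # N(0, 2^2) prior, up to a constant
    loglik = np.array([-0.5 * np.sum((y - m)**2) for m in mu])
    unnorm = np.exp(loglik - loglik.max()) * prior     # L(mu | y) * pi(mu), unnormalized
    posterior = unnorm / (unnorm.sum() * (mu[1] - mu[0]))  # divide by p(y), approximated numerically

    print(mu[np.argmax(posterior)])                    # posterior mode

The normalizing constant p(x) never has to be known in closed form; here it is approximated by a simple Riemann sum over the grid.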

Represent statistical models as Bayesian networks with parameters included as nodes, i.e. for expressions such as p(x_v | x_pa(v), θ_v), include θ_v as an additional parent of v. In addition, represent data explicitly in the network using plates. Inference about θ can then in principle be calculated by probability propagation, as in general Bayesian networks. This is true for θ_v discrete; for θ continuous, we must develop other computational techniques.

Chest clinic

[Figure: the chest clinic network with nodes A_i, S_i, T_i, L_i, B_i, E_i, X_i, D_i inside a plate i = 1, ..., N, and parameter nodes θ_A, θ_S, θ_T, θ_L, θ_B, θ_X, θ_D attached as parents.] Chest clinic example with parameters and plate indicating repeated cases.

Standard repeated samples. As for a naive Bayes expert system, just let D = θ and let X_i = F_i represent data. Then π*(θ) = P(θ | X_1 = x_1, ..., X_m = x_m) is found by standard updating, using probability propagation if θ is discrete.

Bernoulli experiments. Data X_1 = x_1, ..., X_n = x_n independent and Bernoulli distributed with parameter θ, i.e. P(X_i = 1 | θ) = 1 − P(X_i = 0 | θ) = θ. Represent as a Bayesian network with θ as the only parent of all nodes x_i, i = 1, ..., n. Use a beta prior: π(θ | a, b) ∝ θ^(a−1)(1 − θ)^(b−1). If we let x = Σ_i x_i, we get the posterior:

π*(θ) ∝ θ^x(1 − θ)^(n−x) θ^(a−1)(1 − θ)^(b−1) = θ^(x+a−1)(1 − θ)^(n−x+b−1),

so the posterior is also beta, with parameters (a + x, b + n − x).
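A minimal check of this conjugate update in Python (not from the lecture; the data vector and the uniform Beta(1, 1) prior are made-up choices for illustration), using scipy's beta distribution:

    import numpy as np
    from scipy import stats

    xs = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical Bernoulli observations
    a, b = 1.0, 1.0                            # Beta(a, b) prior; uniform here

    x, n = xs.sum(), len(xs)
    posterior = stats.beta(a + x, b + n - x)   # posterior is Beta(a + x, b + n - x)

    print(posterior.mean())                    # (a + x) / (a + b + n)
    print(posterior.interval(0.95))            # central 95% posterior interval

No sampling is needed here; the point of the MCMC material below is that such closed-form posteriors are the exception rather than the rule.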

Linear regression

[Figure: DAG with nodes alpha, beta, tau, sigma, mu[i] and Y[i], with a plate for(i IN 1 : N).]

model {
    for (i in 1 : N) {
        Y[i] ~ dnorm(mu[i], tau)
        mu[i] <- alpha + beta * (x[i] - xbar)
    }
    tau ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(tau)
    alpha ~ dnorm(0.0, 1.0e-6)
    beta ~ dnorm(0.0, 1.0e-6)
}

Gamma model for pump data

[Figure: DAG with nodes alpha, beta, theta[i], t[i], lambda[i] and x[i], with a plate for(i IN 1 : N).] Failure of 10 power plant pumps.

Data and BUGS model for pumps. The number of failures X_i is assumed to follow a Poisson distribution with parameter θ_i t_i, i = 1, ..., 10, where θ_i is the failure rate for pump i and t_i is the length of operation time of the pump (in 1000s of hours). The data are shown below.

Pump   1      2      3      4      5      6      7      8      9      10
t_i    94.5   15.7   62.9   126    5.24   31.4   1.05   1.05   2.01   10.5
x_i    5      1      5      14     3      19     1      1      4      22

A gamma prior distribution is adopted for the failure rates: θ_i ~ Γ(α, β), i = 1, ..., 10.

BUGS program for pumps. With suitable priors the program becomes

model {
    for (i in 1 : N) {
        theta[i] ~ dgamma(alpha, beta)
        lambda[i] <- theta[i] * t[i]
        x[i] ~ dpois(lambda[i])
    }
    alpha ~ dexp(1)
    beta ~ dgamma(0.1, 1.0)
}
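For fixed α and β the gamma prior is conjugate to the Poisson likelihood, so each failure rate has the closed-form posterior θ_i | x_i ~ Γ(α + x_i, β + t_i). A small Python check of this update on the data above (fixing α = 1 and β = 1 is an illustrative simplification; in the full model they have priors and are sampled as well):

    import numpy as np

    t = np.array([94.5, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.01, 10.5])
    x = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])

    alpha, beta = 1.0, 1.0           # illustrative fixed hyperparameters
    post_shape = alpha + x           # shape of the Gamma posterior for theta[i]
    post_rate = beta + t             # rate of the Gamma posterior for theta[i]

    print(post_shape / post_rate)    # posterior mean failure rates (per 1000 hours)

It is exactly this conjugate structure that makes the model well suited to the Gibbs sampling methods described below.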

Growth of rats

[Figure: DAG with nodes alpha.c, alpha.tau, alpha0, beta.c, beta.tau, tau.c, sigma, alpha[i], beta[i], mu[i, j], x[j] and Y[i, j], with plates for(j IN 1 : T) and for(i IN 1 : N).] Growth of 30 young rats.

Description of rat data. 30 young rats have weights measured weekly for five weeks. The observations Y_ij are the weights of rat i measured at age x_j. The model is essentially a random-effects linear growth curve:

Y_ij ~ N(α_i + β_i(x_j − x̄), τ_c^(-1))

and

α_i ~ N(α_c, τ_α^(-1)), β_i ~ N(β_c, τ_β^(-1)),

where x̄ = 22 and τ represents the precision (inverse variance) of a normal distribution. Interest particularly focuses on the intercept at zero time (birth), denoted α_0 = α_c − β_c x̄.

Basic setup. When exact computation is infeasible, Markov chain Monte Carlo (MCMC) methods are used. An MCMC method for the target distribution π* on X = X_V constructs a Markov chain X^0, X^1, ..., X^k, ... with π* as equilibrium distribution. For the method to be useful, π* must be the unique equilibrium, and the Markov chain must be ergodic, so that for all relevant A

π*(A) = lim_{n→∞} π_n(A) = lim_{n→∞} (1/n) Σ_{i=m+1}^{m+n} χ_A(X^i),

where χ_A is the indicator function of the set A.

The standard Gibbs sampler. A simple MCMC method is made as follows.

1. Enumerate V = {1, 2, ..., |V|}.
2. Choose a starting value x^0 = (x^0_1, ..., x^0_|V|).
3. Update x^0 to x^1 by replacing x^0_i with x^1_i for i = 1, ..., |V|, where x^1_i is chosen from the full conditional π*(X_i | x^1_1, ..., x^1_{i-1}, x^0_{i+1}, ..., x^0_|V|).
4. Continue similarly to update x^k to x^{k+1}, and so on.
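A minimal Python sketch of this systematic scan (not from the lecture; the bivariate normal target with correlation ρ = 0.8 is chosen purely because its two full conditionals are available in closed form):

    import numpy as np

    rng = np.random.default_rng(1)
    rho = 0.8                  # correlation of the illustrative bivariate normal target
    x = np.zeros(2)            # starting value x^0
    chain = []

    for k in range(10000):
        # site 1: full conditional X_1 | X_2 = x_2 is N(rho * x_2, 1 - rho^2)
        x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho**2))
        # site 2: full conditional X_2 | X_1 = x_1 is N(rho * x_1, 1 - rho^2)
        x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho**2))
        chain.append(x.copy())

    chain = np.array(chain)
    # ergodic average approximating pi*(A) for A = {x_1 > 1}; the exact value is about 0.159
    print(np.mean(chain[1000:, 0] > 1))

Each update uses the most recent value of every other coordinate, exactly as in step 3; the first 1000 iterations are discarded as burn-in, corresponding to the offset m in the ergodic average above.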

Properties of the Gibbs sampler. With a positive joint target density π*(x) > 0, the Gibbs sampler is ergodic with π* as the unique equilibrium. In this case the distribution of X^n converges to π* as n tends to infinity. Note that if the target is the conditional distribution π*(x_A) = f(x_A | X_{V\A} = x_{V\A}), only the sites in A should be updated: the full conditionals of the conditional distribution are unchanged for unobserved sites.

Finding full conditionals. For a directed graphical model, the densities of the full conditional distributions are

f(x_i | x_{V\i}) ∝ ∏_{v ∈ V} f(x_v | x_{pa(v)}) ∝ f(x_i | x_{pa(i)}) ∏_{v ∈ ch(i)} f(x_v | x_{pa(v)}) = f(x_i | x_{bl(i)}),

where bl(i) is the Markov blanket of node i:

bl(i) = pa(i) ∪ ch(i) ∪ { ∪_{v ∈ ch(i)} pa(v) \ {i} }.

Note that the Markov blanket is just the set of neighbours of i in the moral graph: bl(i) = ne_m(i).
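A small Python sketch of this construction (illustrative only; the DAG is encoded as a dictionary mapping each node to its set of parents, a representation chosen here purely for convenience):

    def markov_blanket(i, parents):
        # bl(i) = pa(i) ∪ ch(i) ∪ (parents of children of i, excluding i itself)
        children = {v for v, pa in parents.items() if i in pa}
        co_parents = {w for v in children for w in parents[v]} - {i}
        return parents[i] | children | co_parents

    # Toy DAG (hypothetical): A -> C, B -> C, C -> D
    parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
    print(markov_blanket("A", parents))   # blanket of A: its child C and the co-parent B

In a Gibbs sampler for a directed graphical model, only the factors involving the Markov blanket of the site being updated need to be evaluated.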

Envelope sampling. There are many ways of sampling from a density f which is known only up to normalization, i.e. f(x) ∝ h(x). One uses an envelope g(x) ≥ M h(x), where g is a known density, and then proceeds as follows.

1. Choose X = x from the distribution with density g.
2. Choose U = u uniform on the unit interval.
3. If u > M h(x)/g(x), then reject x and repeat from step 1; else return x.

The value returned will have density f.
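A minimal Python sketch of this rejection scheme (not from the lecture; the target h(x) = x(1 − x) on (0, 1), i.e. f = Beta(2, 2), the uniform proposal g, and the constant M = 4 are made-up choices that satisfy g(x) ≥ M h(x) everywhere):

    import numpy as np

    rng = np.random.default_rng(2)

    def h(x):                      # target known up to normalization: f = Beta(2, 2)
        return x * (1 - x)

    M = 4.0                        # g(x) = 1 on (0, 1) dominates M * h(x), since max h = 1/4

    def sample_one():
        while True:
            x = rng.uniform()              # step 1: draw from g = Uniform(0, 1)
            u = rng.uniform()              # step 2: u uniform on the unit interval
            if u <= M * h(x) / 1.0:        # step 3: accept with probability M h(x)/g(x)
                return x

    draws = np.array([sample_one() for _ in range(20000)])
    print(draws.mean(), draws.var())       # Beta(2, 2) has mean 0.5 and variance 0.05

With this tight envelope roughly two thirds of the proposals are accepted; a poor envelope makes the method slow, which is one motivation for the MCMC methods above.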