Statistics 352: Spatial statistics. Models for discrete data. Jonathan Taylor, Department of Statistics, Stanford University. April 28, 2009.


Outline: Dependent discrete data. Image data (binary). Ising models. Simulation: Gibbs sampling. Denoising.

SIDS (sudden infant death syndrome) in North Carolina. [Figure: maps of North Carolina counties showing SIDS counts for 1974 and 1979.]

Discrete data: Description. Observations: incidence of SIDS, $\{Z_i : i \text{ a county in North Carolina}\}$, and births per county, $\{n_i : i \text{ a county in North Carolina}\}$. Natural models: $Z_i \sim \text{Poisson}(\lambda_i)$ with $\lambda_i = \exp(x_i^\top \beta)$ (spatial? other features?), or $Z_i \sim \text{Binomial}(n_i, p_i)$ with $\operatorname{logit}(p_i) = x_i^\top \beta$.
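
As a hypothetical illustration of these (so far non-spatial) models, a minimal numpy sketch that simulates county-level counts from the Poisson and Binomial specifications above; the covariates x, coefficients beta, and birth counts n_births are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    n_counties = 100
    x = np.column_stack([np.ones(n_counties), rng.normal(size=n_counties)])  # intercept + one covariate
    beta = np.array([1.0, 0.5])                                              # hypothetical coefficients

    lam = np.exp(x @ beta)        # lambda_i = exp(x_i' beta)
    Z_pois = rng.poisson(lam)     # Z_i ~ Poisson(lambda_i), independent across counties (no spatial dependence yet)

    n_births = rng.integers(200, 5000, size=n_counties)   # hypothetical birth counts n_i
    p = 1 / (1 + np.exp(-(x @ beta - 6)))                  # logit(p_i) = x_i' beta - 6 (shifted so p_i is small)
    Z_binom = rng.binomial(n_births, p)                    # Z_i ~ Binomial(n_i, p_i)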

Discrete data: Models. How do we introduce spatial dependence in the $Z_i$'s? If the $Z_i$'s are Gaussian, all we need is a covariance function... it is more complicated for Poisson and Binomial. One approach: Markov random fields (a.k.a. graphical models). Instead of defining a general MRF, we'll start with image models.

Simulation of an Ising model. [Figure.]

Binary images: Ising model. Let $L = \{(i, j) : 1 \le i \le n_1,\ 1 \le j \le n_2\}$ be our lattice of pixels. Binary images $Z$ are elements of $\{-1, 1\}^L$. Let $i \sim j$ denote nearest neighbours (with periodic boundary conditions). Given an inverse temperature $\beta = 1/T$,
$$P(Z = z) \propto \exp\Big(\beta \sum_{(i,j):\, i \sim j} z_i z_j\Big) = \frac{e^{\beta \sum_{(i,j):\, i \sim j} z_i z_j}}{\sum_{w \in \{-1,1\}^L} e^{\beta \sum_{(i,j):\, i \sim j} w_i w_j}}.$$
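
A minimal numpy sketch (not from the slides) of this unnormalized log-probability on a periodic lattice; np.roll implements the wrap-around neighbour relation, and summing only the "right" and "down" neighbours counts each edge once.

    import numpy as np

    def ising_logp_unnorm(z, beta):
        # z: (n1, n2) array with entries in {-1, +1}; periodic boundary conditions.
        # Sum z_i * z_j over each nearest-neighbour pair once (right and down neighbours).
        interaction = np.sum(z * np.roll(z, -1, axis=0)) + np.sum(z * np.roll(z, -1, axis=1))
        return beta * interaction

    rng = np.random.default_rng(0)
    z = rng.choice([-1, 1], size=(16, 16))
    print(ising_logp_unnorm(z, beta=0.4))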

Binary images: Ising model. To a physicist,
$$H(z) = -\beta \sum_{(i,j):\, i \sim j} z_i z_j$$
is the potential energy of the system (so that $P(Z = z) \propto e^{-H(z)}$). Since $(z_i - z_j)^2 = 2 - 2 z_i z_j$ for $z_i, z_j \in \{-1, 1\}$, another representation is
$$H(z) = \frac{\beta}{2} \sum_{(i,j):\, i \sim j} \big((z_i - z_j)^2 - 2\big) = \frac{\beta}{2}\, z^\top L z + C,$$
where $L$ is the graph Laplacian of the lattice.
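
A small numerical check of that identity (a sketch, assuming the $\beta/2$ normalization used above): the edge sum of $z_i z_j$ agrees with the quadratic form in the graph Laplacian up to the constant $C = -\beta |E|$. The lattice_adjacency helper is introduced here only for illustration.

    import numpy as np

    def lattice_adjacency(n1, n2):
        # Adjacency matrix of the n1 x n2 periodic (torus) lattice.
        N = n1 * n2
        A = np.zeros((N, N))
        idx = lambda i, j: i * n2 + j
        for i in range(n1):
            for j in range(n2):
                for di, dj in [(1, 0), (0, 1)]:      # right and down neighbours: each edge added once
                    k, m = idx(i, j), idx((i + di) % n1, (j + dj) % n2)
                    A[k, m] = A[m, k] = 1
        return A

    rng = np.random.default_rng(1)
    n1 = n2 = 6
    beta = 0.4
    z = rng.choice([-1.0, 1.0], size=n1 * n2)

    A = lattice_adjacency(n1, n2)
    Lap = np.diag(A.sum(axis=1)) - A                 # graph Laplacian L = D - A
    n_edges = A.sum() / 2                            # 2 * n1 * n2 edges on the torus

    H1 = -beta * 0.5 * z @ A @ z                     # -beta * sum over edges of z_i z_j
    H2 = 0.5 * beta * z @ Lap @ z - beta * n_edges   # (beta/2) z' L z + C,  with C = -beta |E|
    print(np.allclose(H1, H2))                       # True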

Binary images: Ising model. Yet another representation: for each pixel $i$ (which has 4 neighbours under the periodic boundary),
$$\sum_{j:\, i \sim j} z_i z_j = \#\{j : i \sim j,\ z_j = z_i\} - \#\{j : i \sim j,\ z_j \ne z_i\} = 2\,\#\{j : i \sim j,\ z_j = z_i\} - 4.$$
The set $\{i : \exists\, j \sim i,\ z_j \ne z_i\}$ can be thought of as the boundary of the black/white interface of $z$. This leads to an interpretation: $\sum_i \#\{j : i \sim j,\ z_j \ne z_i\}$ measures the boundary length of $z$.

Binary images: Ising model. Conditional distributions are simple:
$$P(z_i = 1 \mid z_j,\ j \ne i) = \frac{e^{\beta \sum_{j:\, i \sim j} z_j}}{e^{\beta \sum_{j:\, i \sim j} z_j} + e^{-\beta \sum_{j:\, i \sim j} z_j}} = P(z_i = 1 \mid z_j,\ j \sim i).$$
The full joint distribution requires the partition function
$$\sum_{w \in \{-1,1\}^L} e^{\beta \sum_{(i,j):\, i \sim j} w_i w_j},$$
which is complicated... Simulation of the Ising model is (relatively) easy.
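
To see why the partition function is "complicated", a brute-force sketch (illustration only) that enumerates all $2^{n_1 n_2}$ configurations of a tiny lattice; this is feasible at 3x3 but hopeless for image-sized lattices.

    import itertools
    import numpy as np

    def partition_function(n1, n2, beta):
        # Exact partition function by brute force: sum over all 2^(n1*n2) configurations.
        total = 0.0
        for bits in itertools.product([-1, 1], repeat=n1 * n2):
            z = np.array(bits).reshape(n1, n2)
            interaction = np.sum(z * np.roll(z, -1, axis=0)) + np.sum(z * np.roll(z, -1, axis=1))
            total += np.exp(beta * interaction)
        return total

    print(partition_function(3, 3, beta=0.4))   # 2^9 = 512 terms; a 100x100 image would need 2^10000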

Gibbs sampler (Geman & Geman (1984)): Algorithm

    def simulate(initial, beta, niter=1000):
        Z = initial.copy()
        for k in range(niter):
            for i in L:
                # sum of the neighbouring spins of pixel i
                s = sum([Z[j] for j in nbrs(i, L)])
                odds = exp(2*beta*s)          # P(Z_i = 1 | rest) / P(Z_i = -1 | rest)
                p = odds / (1 + odds)
                Z[i] = bernoulli(p)           # +1 with probability p, -1 otherwise
        return Z

Gibbs sampler: Convergence. For whatever initial configuration initial, as niter $\to \infty$, $Z_{\text{niter}} \overset{D}{\to} Z_\beta$, where $Z_\beta$ is a realization of the Ising model. In fact, as a process, $(Z^{(k)})_{1 \le k \le \text{niter}}$ is a Markov chain whose stationary distribution is that of $Z_\beta$. This is the basis of most of the MCMC literature... We'll see the Gibbs sampler again for more general MRFs.

Ising models: Adding an external field. We can also add a mean to the Ising model through an external field:
$$P(Z = z) \propto \exp\Big(\beta \sum_{(i,j):\, i \sim j} z_i z_j + \sum_i \alpha_i z_i\Big).$$

Gibbs sampler with an external field: Algorithm

    def simulate(initial, beta, alpha, niter=1000):
        Z = initial.copy()
        for k in range(niter):
            for i in L:
                s = sum([Z[j] for j in nbrs(i, L)])
                odds = exp(2*beta*s + 2*alpha[i])   # the field shifts the conditional odds
                p = odds / (1 + odds)
                Z[i] = bernoulli(p)                 # +1 with probability p, -1 otherwise
        return Z
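
A self-contained, runnable version of the two sampler slides (a sketch, assuming numpy and a 2-D ±1 array on a periodic lattice); it fills in the helpers nbrs and bernoulli that the pseudocode leaves implicit, and alpha=None recovers the plain Ising sampler.

    import numpy as np

    def gibbs_ising(initial, beta, alpha=None, niter=1000, rng=None):
        # Single-site Gibbs sampler for the Ising model with optional external field alpha.
        # initial: (n1, n2) array in {-1, +1}; alpha: array of the same shape (or None for no field).
        rng = np.random.default_rng() if rng is None else rng
        Z = initial.copy()
        n1, n2 = Z.shape
        alpha = np.zeros_like(Z, dtype=float) if alpha is None else alpha
        for _ in range(niter):
            for i in range(n1):
                for j in range(n2):
                    # sum of the four nearest neighbours, with periodic (wrap-around) boundaries
                    s = (Z[(i - 1) % n1, j] + Z[(i + 1) % n1, j] +
                         Z[i, (j - 1) % n2] + Z[i, (j + 1) % n2])
                    odds = np.exp(2 * beta * s + 2 * alpha[i, j])   # P(z_ij = 1 | rest) / P(z_ij = -1 | rest)
                    p = odds / (1 + odds)
                    Z[i, j] = 1 if rng.random() < p else -1
        return Z

    rng = np.random.default_rng(0)
    init = rng.choice([-1, 1], size=(32, 32))
    sample = gibbs_ising(init, beta=0.4, niter=200, rng=rng)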

Denoising: Model. Suppose we observe a noisy image $Y \in \{-1, 1\}^L$ based on a noise-free image $Z \in \{-1, 1\}^L$. A plausible choice for the noise is independent bit-flips:
$$P(Y_i = y_i \mid Z = z) = P(Y_i = y_i \mid Z_i = z_i) = \begin{cases} q & y_i = z_i \\ 1 - q & y_i \ne z_i. \end{cases}$$
Goal: recover $Z$, the noise-free image.
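
A small sketch of this noise model: each pixel is kept with probability q and flipped with probability 1 - q (Z here is just a random placeholder image).

    import numpy as np

    rng = np.random.default_rng(0)
    q = 0.7

    Z = rng.choice([-1, 1], size=(32, 32))   # placeholder "noise-free" image
    keep = rng.random(Z.shape) < q           # keep each pixel with probability q
    Y = np.where(keep, Z, -Z)                # otherwise flip its sign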

Noisy image, q = 0.7. [Figure.]

Denoising: Model. If $Z$ were continuous, we might put some smoothness penalty on $Z$... Recall the Laplacian interpretation of the Ising potential, $H(z) = \tfrac{\beta}{2} z^\top L z + C$. This suggests the following estimator:
$$\hat{Z}_\beta = \operatorname*{argmin}_{Z \in \{-1,1\}^L} \Big\{ -\log L(Z \mid Y) - \beta \sum_{(i,j):\, i \sim j} Z_i Z_j \Big\} = \operatorname*{argmin}_{Z \in \{-1,1\}^L} \Big\{ -\operatorname{logit}(q) \sum_{i \in L} Z_i Y_i - \beta \sum_{(i,j):\, i \sim j} Z_i Z_j \Big\}.$$

Denoising: Model. Our suggested estimator is the mode of an Ising model with parameter $\beta$ and field $\alpha_i = \operatorname{logit}(q)\, y_i$... In Bayesian terms, if our prior for $Z$ is Ising with parameter $\beta$, the posterior is Ising with a field that depends on the observations $Y$. Since it's Bayesian, we can sample from the posterior using the Gibbs sampler.

Denoising: Estimating Z. Based on the output of the Gibbs sampler (samples from the posterior), we can compute
$$\hat{Z}_i = 1_{\bar{Z}_i > 0}, \qquad \text{where } \bar{Z} = \frac{1}{\text{niter} - l + 1} \sum_{j=l}^{\text{niter}} Z^{(j)}$$
is the posterior mean after throwing away the first $l$ samples as burn-in.
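
Putting the pieces together, a sketch of the denoising pipeline (assumptions: numpy, the bit-flip likelihood above, and a compact single-site sweep defined here as gibbs_sweep): the field is $\alpha_i = \operatorname{logit}(q)\, Y_i$, Gibbs sweeps are run with a burn-in, and the posterior mean is thresholded at zero.

    import numpy as np

    def gibbs_sweep(Z, beta, alpha, rng):
        # One full single-site Gibbs sweep over a periodic {-1,+1} lattice.
        n1, n2 = Z.shape
        for i in range(n1):
            for j in range(n2):
                s = (Z[(i - 1) % n1, j] + Z[(i + 1) % n1, j] +
                     Z[i, (j - 1) % n2] + Z[i, (j + 1) % n2])
                odds = np.exp(2 * beta * s + 2 * alpha[i, j])
                Z[i, j] = 1 if rng.random() < odds / (1 + odds) else -1
        return Z

    def denoise(Y, beta, q, niter=500, burnin=100, rng=None):
        # Posterior-mean denoising: Ising prior with parameter beta,
        # field alpha_i = logit(q) * Y_i from the bit-flip likelihood.
        rng = np.random.default_rng() if rng is None else rng
        alpha = np.log(q / (1 - q)) * Y
        Z = Y.copy()                          # start the chain at the observed image
        mean = np.zeros(Y.shape)
        for k in range(niter):
            Z = gibbs_sweep(Z, beta, alpha, rng)
            if k >= burnin:
                mean += Z
        mean /= (niter - burnin)
        return np.where(mean > 0, 1, -1)      # threshold the posterior mean at zero

    # usage (with Y, q from the bit-flip sketch above): Z_hat = denoise(Y, beta=0.8, q=0.7)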

Posterior mean, q = 0.7. [Figure.]

Denoising: Computing the MAP. Our original estimator was the MAP for this prior. Finding the MAP is a combinatorial optimization problem: there are $2^{|L|}$ configurations in $\{-1, 1\}^L$ to search through. A general approach is based on simulated annealing (Geman & Geman, 1984).

Denoising: Simulated annealing. Basic observation: $\hat{Z}_\beta(Y)$ is also the mode of
$$P_{T,\beta,Y}(z) \propto \exp\Big\{\frac{1}{T}\Big(\beta \sum_{(i,j):\, i \sim j} z_i z_j + \operatorname{logit}(q) \sum_{i \in L} Y_i z_i\Big)\Big\},$$
BUT, for $T < 1$, $P_{T,\beta,Y}$ is more sharply peaked than $P_{1,\beta,Y}$. The algorithm depends on a choice of temperature schedule... To prove things, one often has to assume a slow schedule like $T_k = c/\log(k)$.

Simulated annealing: Algorithm

    def simulate(initial, beta, alpha, temps):
        Z = initial.copy()
        for t in temps:                               # temps: decreasing temperature schedule
            for i in L:
                s = sum([Z[j] for j in nbrs(i, L)])
                odds = exp((2*beta*s + 2*alpha[i])/t)
                p = odds / (1 + odds)
                Z[i] = bernoulli(p)                   # +1 with probability p, -1 otherwise
        return Z
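
One possible temperature schedule to pass as temps (an illustrative choice, not from the slides): slow logarithmic cooling $T_k = c/\log(k + 2)$ with an arbitrary constant c.

    import numpy as np

    c = 4.0                                            # arbitrary scale for the schedule
    temps = [c / np.log(k + 2) for k in range(2000)]   # starts near 5.8 and decreases slowly toward 0
    # Z_map = simulate(Y.copy(), beta, alpha, temps)   # alpha as in the denoising slides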

Denoising: Additive Gaussian noise. Let the new noisy data be $Y_i = Z_i + \epsilon_i$ with $\epsilon_i$ IID $N(0, \sigma^2)$. The field is now $\alpha_i = Y_i/\sigma^2$ (the coefficient of $z_i$ in $\log P(Y_i \mid z_i)$), so each observation $Y_i$ effectively enters as a new term in the neighbourhood relation of pixel $i$.
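
A short sketch of that field computation (assuming the $Y_i/\sigma^2$ form above, which follows from expanding $-(Y_i - z_i)^2/(2\sigma^2)$ with $z_i^2 = 1$); the resulting alpha can be passed to the same Gibbs sampler as before.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.5

    Z = rng.choice([-1, 1], size=(32, 32))     # placeholder noise-free image
    Y = Z + sigma * rng.normal(size=Z.shape)   # additive Gaussian noise
    alpha = Y / sigma**2                       # external field from the Gaussian likelihood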

Additive noise. [Figures.]

Denoising: Mixture model. The additive noise example can be thought of as a two-class mixture model. This suggests using LDA (or QDA if the variances are unequal) to classify pixels. Problem: the mixing proportion is 0.03...

Mixture density. [Figure.]

FDR curve. [Figure.]

Thresholded. [Figure.]

Where this leaves us: Markov random fields. MRFs generally have distributions like the Ising model's. The Gibbs sampler can be used to simulate MRFs. Simulated annealing can generally be used to find the MAP (i.e. modes of an MRF). Bayesian image analysis is a huge field.