When one of the things that you don't know is the number of things that you don't know

When one of the things that you don't know is the number of things that you don't know. Malcolm Sambridge, Research School of Earth Sciences, Australian National University, Canberra, Australia. Sambridge, M., Gallagher, K., Jackson, A. & Rickwood, P. (2006), Geophys. J. Int., 167, 528-542, doi: 10.1111/j.1365-246X.2006.03155.x. http://rses.anu.edu.au/~malcolm/papers 1/31

Trans-dimensional inverse problems, model comparison and the evidence. Malcolm Sambridge, Research School of Earth Sciences, Australian National University, Canberra, Australia. Sambridge, M., Gallagher, K., Jackson, A. & Rickwood, P. (2006), Geophys. J. Int., 167, 528-542, doi: 10.1111/j.1365-246X.2006.03155.x. http://rses.anu.edu.au/~malcolm/papers 2/31

This talk is about two things: trans-dimensional inverse problems, those with a variable number of unknowns, and the evidence, a quantity from probability theory that allows us to quantitatively compare independent studies performed by different researchers in different institutes at different times. 3/31

Irregular grids in seismology Debayle & Sambridge (2005) 4/31

Self Adaptive seismic tomography Sambridge & Rawlinson (2005) 5/31

Self Adaptive seismic tomography From Sambridge & Faletic (2003) Sambridge & Faletic (2002) 6/31

Adaptive grids in seismic tomography Generating optimized grids from resolution functions Nolet & Montelli (2005) 7/31

Spatially variable grids in tomography. The use of spatially variable parameterizations in seismic tomography is not new. Some papers on 'static' and 'dynamic' parameterizations: Chou & Booker (1979); Tarantola & Nercessian (1984); Abers & Roecker (1991); Fukao et al. (1992); Zelt & Smith (1992); Michelini (1995); Vesnaver (1996); Curtis & Snieder (1997); Widiyantoro & van der Hilst (1998); Bijwaard et al. (1998); Böhm et al. (2000); Sambridge & Faletic (2003). For a recent review see "Seismic Tomography with Irregular Meshes", Sambridge & Rawlinson (Seismic Earth, AGU monograph, Levander & Nolet, 2005). 8/31

What is a trans-dimensional inverse problem? "As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know." Department of Defense news briefing, Feb. 12, 2002, Donald Rumsfeld. When one of the things you don't know is the number of things you don't know. 9/31

How many variables required? Which curve produced that data? 10/31

How many parameters should I use to fit my data? How many components? How many layers? This is a Trans-dimensional data fitting problem 11/31

What is a trans-dimensional inverse problem? $m(x) = \sum_{i=1}^{k} \alpha_i B_i(x)$, where $m(x)$ is the Earth model (that we want to recover), $B_i(x)$ are basis functions (that are chosen), $\alpha_i$ are coefficients (unknowns), and $k$ is the number of unknowns (unknown). This is a hierarchical parametrization. 12/31
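As a small illustration of this parametrization (a sketch only, with hypothetical polynomial basis functions and arbitrary coefficients, not the bases used in the talk's examples):

import numpy as np

# m(x) = sum_{i=1}^{k} alpha_i * B_i(x); k itself is treated as unknown
def model(x, alpha, basis):
    return sum(a * B(x) for a, B in zip(alpha, basis))

basis = [lambda x, p=p: x**p for p in range(4)]   # chosen basis B_i(x) = x^(i-1)
alpha = [1.0, -0.5, 0.2, 0.05]                    # unknown coefficients, here k = 4
x = np.linspace(0.0, 1.0, 5)
print(model(x, alpha, basis))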

Probabilistic inference (Thomas Bayes, 1702-1761). All information is expressed in terms of probability density functions. Bayes' rule for the conditional PDF: $p(m \mid d, H) \propto p(d \mid m, H)\, p(m \mid H)$, where $H$ carries all additional assumptions. A posteriori probability density is proportional to likelihood times a priori probability density. 13/31

The evidence. The evidence $p(d \mid H)$ is also known as the marginal likelihood: $p(d \mid H) = \int p(d \mid m, H)\, p(m \mid H)\, dm$, so that $p(m \mid d, H) = \frac{p(d \mid m, H)\, p(m \mid H)}{p(d \mid H)}$, i.e. Posterior = Likelihood x prior / Evidence. Geophysicists have often overlooked it because it is not a function of the model parameters. It measures the fit of the theory! J. Skilling, however, describes it as the 'single most important quantity in all of Bayesian inference'. This is because it is a transferable quantity between independent studies. If we all published evidence values of our model fit then studies could be quantitatively compared! 14/31

Model choice and Occam's razor. 'A theory with mathematical beauty is more likely to be correct than an ugly one that fits some experimental data' (Paul Dirac). Occam's razor suggests we should prefer the simpler theory which explains the data. Suppose we have two different theories H1 and H2. The plausibility ratio is $\frac{p(H_1 \mid d)}{p(H_2 \mid d)} = \frac{p(d \mid H_1)}{p(d \mid H_2)}\,\frac{p(H_1)}{p(H_2)}$: the ratio of model predictions (the Bayes factor, or ratio of evidences) times the prior preferences. This tells us how well the data support each theory. 15/31

Model choice: An example. What are the next two numbers in the sequence -1, 3, 7, 11, ?, ? H1: 15, 19, because $x_{i+1} = x_i + 4$. But what about the alternate theory? H2: -19.9, 1043.8, because $x_{i+1} = -\frac{x_i^3}{11} + \frac{9 x_i^2}{11} + \frac{23}{11}$. In $\frac{p(H_1 \mid d)}{p(H_2 \mid d)} = \frac{p(d \mid H_1)}{p(d \mid H_2)}\,\frac{p(H_1)}{p(H_2)}$, let us assume we have no prior preference for either theory: $p(H_1) = p(H_2)$. 16/31

Model comparison: An example. H1: an arithmetic progression fits the data, $x_{i+1} = x_i + a$ (2 parameters). H2: a cubic sequence fits the data, $x_{i+1} = c x_i^3 + d x_i^2 + e$ (4 parameters). We must find the evidence ratio $\frac{p(d \mid H_1)}{p(d \mid H_2)}$. $p(d \mid H_1)$ is called the evidence for model 1 and is obtained by specifying the probability distribution each model assigns to its parameters. If $a$ and $x_0$ are equally likely in the range [-50, 50] then $p(d \mid H_1) = \frac{1}{101} \times \frac{1}{101} = 0.0001$. If $c$, $d$ and $e$ have numerators in [-50, 50] and denominators in [1, 50], then $p(d \mid H_2) = \frac{1}{101}\left(\frac{4}{101}\,\frac{1}{50}\right)\left(\frac{4}{101}\,\frac{1}{50}\right)\left(\frac{2}{101}\,\frac{1}{50}\right) \approx 2.5 \times 10^{-12}$. That is 40 million to 1 in favour of the simpler theory! See MacKay (2003). 17/31
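A quick numerical check of this worked example in Python (the rules and fraction ranges are as quoted above from MacKay; the script itself is only an illustration):

from fractions import Fraction

# H1: arithmetic progression x_{i+1} = x_i + 4
seq = [-1]
for _ in range(5):
    seq.append(seq[-1] + 4)
print(seq)                      # [-1, 3, 7, 11, 15, 19]

# H2: cubic rule x_{i+1} = -x^3/11 + 9x^2/11 + 23/11
c, d, e = Fraction(-1, 11), Fraction(9, 11), Fraction(23, 11)
seq = [Fraction(-1)]
for _ in range(5):
    x = seq[-1]
    seq.append(c * x**3 + d * x**2 + e)
print([float(v) for v in seq])  # [-1, 3, 7, 11, -19.9..., 1043.8...]

# Evidence values quoted on the slide
p_d_H1 = (1 / 101) ** 2
p_d_H2 = (1 / 101) * (4 / 101 / 50) * (4 / 101 / 50) * (2 / 101 / 50)
print(p_d_H1 / p_d_H2)          # roughly 4e7, i.e. about 40 million to 1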

The natural parsimony of Bayesian inference. Without a prior preference expressed for the simpler model, Bayesian inference automatically favours the simpler theory with fewer unknowns (provided it fits the data). The evidence $p(d \mid H) = \int p(d \mid m, H)\, p(m \mid H)\, dm$ measures how well the theory fits the data. Bayesian inference rewards models in proportion to how well they predict the data. Complex models are able to produce a broad range of predictions, while simple models have a narrower range. If both a simple and a complex model fit the data then the simple model will predict the data more strongly in the overlap region. From MacKay (2003). 18/31

Natural parsimony of Bayesian inference: Bayesians prefer simpler models. (Figure: likelihood against model complexity; a more complex model gives a better data fit.) From a thermochronology study of Stephenson, Gallagher & Holmes (EPSL, 2006). 19/31

Trans-dimensional inverse problems. Consider an inverse problem with k unknowns. Bayes' rule for the model parameters: $p(m \mid d, k) = \frac{p(d \mid m, k)\, p(m \mid k)}{p(d \mid k)}$. Bayes' rule for the number of parameters: $p(k \mid d) = \frac{p(d \mid k)\, p(k)}{p(d)}$. By combining we get Bayes' rule for both the parameters and the number of parameters: $p(m, k \mid d) = \frac{p(d \mid m, k)\, p(m \mid k)\, p(k)}{p(d)}$. Can we sample from trans-dimensional posteriors? 20/31

The reversible jump algorithm. A breakthrough was the reversible jump algorithm of Green (1995), which can simulate from arbitrary trans-dimensional PDFs $p(m, k \mid d) = \frac{p(d \mid m, k)\, p(m \mid k)\, p(k)}{p(d)}$. This is effectively an extension of the well known Metropolis algorithm, with acceptance probability $\alpha = \min\left[1,\ \frac{p(d \mid m_2, k_2)\, p(m_2 \mid k_2)\, p(k_2)\, q(m_1 \mid m_2)}{p(d \mid m_1, k_1)\, p(m_1 \mid k_1)\, p(k_1)\, q(m_2 \mid m_1)}\, |J|\right]$, which satisfies detailed balance. The Jacobian J is often, but not always, 1. The automatic (Jacobian) implementation of Green (2003) is convenient. Research focus is on efficient (automatic) implementations for higher dimensions. See Denison et al. (2002); Malinverno (2002); Sambridge et al. (2006). 21/31
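A minimal birth/death sketch of this acceptance rule, applied to the polynomial regression problem used later in the talk. It assumes Gaussian priors on the coefficients, a uniform prior on k, and a birth proposal drawn from the coefficient prior, so the proposal density cancels the prior of the new coefficient and the Jacobian is 1; the synthetic data, noise level and tuning values are illustrative only:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data from a straight line
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, x.size)
sigma = 0.3                       # assumed known data noise
k_min, k_max = 1, 4               # number of polynomial coefficients
prior_sd = 5.0                    # N(0, prior_sd^2) prior on each coefficient

def log_like(a):
    r = y - np.polyval(a[::-1], x)        # model: sum_i a_i x^(i-1)
    return -0.5 * np.sum((r / sigma) ** 2)

a = np.array([0.0])                       # start in the k = 1 state
samples_k = []
for it in range(50000):
    u = rng.random()
    if u < 1/3:                           # fixed-k perturbation (Metropolis)
        a_new = a + rng.normal(0, 0.1, a.size)
        if np.log(rng.random()) < log_like(a_new) - log_like(a) \
           + 0.5 * (a @ a - a_new @ a_new) / prior_sd**2:
            a = a_new
    elif u < 2/3 and a.size < k_max:      # birth: new coefficient from its prior
        a_new = np.append(a, rng.normal(0, prior_sd))
        # prior of the new coefficient cancels the proposal density; |J| = 1
        if np.log(rng.random()) < log_like(a_new) - log_like(a):
            a = a_new
    elif u >= 2/3 and a.size > k_min:     # death: drop the last coefficient
        a_new = a[:-1]
        if np.log(rng.random()) < log_like(a_new) - log_like(a):
            a = a_new
    samples_k.append(a.size)

# Empirical p(k | d) from the chain
print(np.bincount(samples_k, minlength=k_max + 1)[1:] / len(samples_k))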

Trans-dimensional sampling. The reversible jump algorithm produces samples from the variable dimension posterior PDF; hence the samples are models with different numbers of parameters. (Figure: posterior samples from standard fixed dimension MCMC versus reversible jump MCMC.) 22/31

Trans-dimensional sampling: A regression example Standard point estimates give differing answers 23/31

Reversible jump applied to the regression example. Use the reversible jump algorithm to sample from the trans-dimensional prior and posterior $p(m, k \mid d) = \frac{p(d \mid m, k)\, p(m \mid k)\, p(k)}{p(d)}$. For the regression problem k = 1,...,4, and RJ-MCMC produces: State 1: $y = a_{1,1}$, with samples $(a_{1,1})_j$, j = 1,...,$N_1$. State 2: $y = a_{1,2} + a_{2,2} x$, with samples $(a_{1,2}, a_{2,2})_j$, j = 1,...,$N_2$. State 3: $y = a_{1,3} + a_{2,3} x + a_{3,3} x^2$, with samples $(a_{1,3}, a_{2,3}, a_{3,3})_j$, j = 1,...,$N_3$. State 4: $y = a_{1,4} + a_{2,4} x + a_{3,4} x^2 + a_{4,4} x^3$, with samples $(a_{1,4}, a_{2,4}, a_{3,4}, a_{4,4})_j$, j = 1,...,$N_4$. 24/31

Reversible jump results: regression example. The reversible jump algorithm can be used to generate samples from the trans-dimensional prior and posterior PDFs. By tabulating the values of k we recover the prior and the posterior for k. Linear is the winner! (Figures: simulation of the prior; simulation of the posterior.) 25/31

Trans-dimensional sampling from a fixed dimensional sampler. Can the reversible jump algorithm be replicated with more familiar fixed dimensional MCMC? Answer: yes! 1. First generate fixed dimensional posteriors. 2. Then combine them with weights $P_k = \frac{p(d \mid k)\, p(k)}{\sum_{k'=1}^{k_{\max}} p(d \mid k')\, p(k')}$, as sketched below. This only requires relative evidence values! From Sambridge et al. (2006). 26/31
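A small sketch of this combination step, assuming the relative evidences p(d | k) and the fixed-k posterior sample sets are already in hand (all numbers below are placeholders, not values from the paper):

import numpy as np

rng = np.random.default_rng(1)

# Placeholder relative evidences p(d|k) and a uniform prior p(k), k = 1..4
evidence = np.array([2.1e-5, 8.4e-4, 3.0e-4, 9.0e-5])
prior_k = np.full(4, 0.25)

P_k = evidence * prior_k
P_k /= P_k.sum()                  # P_k = p(d|k) p(k) / sum_k' p(d|k') p(k')
print("posterior on k:", P_k)

# Placeholder fixed-dimension posterior sample sets, one array per k
fixed_k_samples = [rng.normal(size=(1000, k + 1)) for k in range(4)]

# Resample models across dimensions in proportion to P_k
k = rng.choice(4, p=P_k)
m = fixed_k_samples[k][rng.integers(1000)]
print("one trans-dimensional draw: k =", k + 1, ", m =", m)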

Trans-dimensional sampling: A mixture modelling example How many Gaussian components? From Sambridge et al. (2006) 27/31

Trans-dimensional sampling: A mixture modelling example. Posterior simulation using the reversible jump algorithm, and fixed-k sampling with evidence weights: $p(k \mid d) \propto p(d \mid k)\, p(k)$, where $p(d \mid k) = \int p(d \mid m, k)\, p(m \mid k)\, dm \approx \frac{1}{N_k} \sum_{i=1}^{N_k} p(d \mid m_i, k)$, with the $m_i$ drawn from the prior. 28/31

Trans-dimensional inversion: recent applications. Climate histories: inferring ground surface temperature histories from high precision borehole temperatures (Hopcroft et al. 2008). Thermochronology: inferring cooling or erosion histories from spatially distributed fission track data (Gallagher et al., 2005, 2006; Stephenson et al. 2006). Stratigraphic modelling: using borehole data on grain size and sediment thickness to infer sea-level change and sediment flux in a reservoir, and hence uncertainty on oil production rates (Charvin et al. 2008). Geochronology: inferring the number and ages of distinct geological events from ... 29/31

Calculating the evidence. But how easy is the evidence to calculate? For small discrete problems it can be easy, as in the previous example. For continuous problems with k < 50, numerical integration is possible using samples $x_i$ generated from the prior $p(m \mid H)$: $p(d \mid H) \approx \frac{1}{N_s} \sum_{i=1}^{N_s} p(d \mid x_i, H)$, or, using posterior samples, the harmonic mean estimate $p(d \mid H) \approx \left[ \frac{1}{N_s} \sum_{i=1}^{N_s} [p(d \mid x_i, H)]^{-1} \right]^{-1}$. For some linear (least squares) problems analytical expressions can be found, e.g. $p(d \mid H) = (2\pi)^{-(N_d - k)/2} \left( \frac{|C_{post}|}{|C_{data}|\, |C_{prior}|} \right)^{1/2} \rho(d, \hat{x}, x_0)$ (Malinverno, 2002). For large k and highly nonlinear problems, forget it! See Sambridge et al. (2006) for details. 30/31
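A toy numerical check of the two Monte Carlo estimators above, using a one-dimensional linear-Gaussian problem where the exact evidence is known in closed form (all values are illustrative):

import numpy as np

rng = np.random.default_rng(2)

def gauss_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Toy problem: prior m ~ N(0, 2^2), one datum d = 1.5 with noise sd 0.5
d, noise_sd, prior_sd = 1.5, 0.5, 2.0
like = lambda m: gauss_pdf(d, m, noise_sd)

# Exact evidence for this linear-Gaussian case: d ~ N(0, sqrt(prior_sd^2 + noise_sd^2))
print("exact         ", gauss_pdf(d, 0.0, np.hypot(prior_sd, noise_sd)))

# Arithmetic mean of the likelihood over samples from the prior
m_prior = rng.normal(0.0, prior_sd, 200000)
print("prior average ", like(m_prior).mean())

# Harmonic mean of the likelihood over samples from the posterior
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)
m_post = rng.normal(post_var * d / noise_sd**2, np.sqrt(post_var), 200000)
print("harmonic mean ", 1.0 / np.mean(1.0 / like(m_post)))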

Conclusions. Trans-dimensional inverse problems are a natural extension of the fixed dimension Bayesian inference familiar to geoscientists. The ratio of the evidences $p(d \mid k)$ for fixed dimensions is the key quantity that allows fixed dimensional MCMC tools to be used in variable dimensions. (Conversely, RJ-MCMC can be used to calculate the evidence.) The evidence is a transferable quantity that allows us to compare apples with oranges. Evidence calculations are possible in a range of problem classes. Trans-dimensional inverse problems are beginning to find applications in the Earth Sciences. Even astronomers are calculating the evidence, so why don't we! 31/31

Trans-dimensional sampling: A mixture modelling example Fixed dimension MCMC simulations Reversible jump and weighted fixed k sampling 32/31

An electrical resistivity example. (Figure: true model, synthetic data, and posterior density; 50 profiles found with the posterior sampler.) From Malinverno (2002). 33/31

Elements of an inverse problem. What you get out depends on what you put in: the data you have (and its noise), the physics of the forward problem, your choice of parameterization, and your definition of a solution. A way of asking questions of data. 34/31

What is an inverse problem? Forward problem: true model m gives data d. Estimation problem: data d give an estimated model m~. Appraisal problem: relate the estimated model m~ back to the true model m. From Snieder & Trampert (2000). 35/31

Irregular grids in seismology Case example From Sambridge, Braun & McQueen (1995) 36/31

Probability density functions. Mathematical preliminaries: joint and conditional PDFs, $p(x, y) = p(x \mid y)\, p(y)$; marginal PDFs, $p(y) = \int p(x, y)\, dx$ and $p(x, y) = \int p(x, y, z)\, dz$. 37/31

What do we get from Bayesian inference? Generate samples whose density follows the posterior $\rho(m) \propto p(d \mid m)\, p(m)$. The workhorse technique is Markov chain Monte Carlo, e.g. Metropolis, Gibbs sampling. 38/31
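A minimal Metropolis sketch of that workhorse, with a placeholder one-dimensional Gaussian target standing in for the posterior (the target and step size are illustrative only):

import numpy as np

rng = np.random.default_rng(3)

def log_posterior(m):
    # Placeholder target: log of p(d|m) p(m) up to a constant (a unit Gaussian)
    return -0.5 * m**2

m, step = 0.0, 1.0
chain = []
for _ in range(20000):
    m_prop = m + rng.normal(0, step)                 # symmetric random-walk proposal
    if np.log(rng.random()) < log_posterior(m_prop) - log_posterior(m):
        m = m_prop                                   # accept
    chain.append(m)                                  # otherwise keep the current model

print(np.mean(chain), np.std(chain))                 # should approach 0 and 1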

Self Adaptive seismic tomography From Sambridge & Gudmundsson (1998) From Sambridge & Faletic (2003) 39/31

Example: Measuring the mass of an object. Suppose we have an object whose mass, m, we wish to determine. Before we collect any data we believe that its mass is approximately 10.0 +/- 1 µg. In probabilistic terms we could represent this as a Gaussian prior distribution. Suppose a measurement is taken and a value of 11.2 µg is obtained, and the measuring device is believed to give Gaussian errors with mean 0 and σ = 0.5 µg. Then the likelihood function can be written down, and the posterior PDF becomes a Gaussian centred at the value 10.96 µg with standard deviation σ = (1/5)^{1/2} ≈ 0.45 µg. 40/31
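The arithmetic behind those posterior numbers is the standard Gaussian conjugate update, shown here with the values quoted on the slide:

# Gaussian prior N(10.0, 1.0^2) combined with one measurement 11.2 of sd 0.5
prior_mean, prior_sd = 10.0, 1.0
d, noise_sd = 11.2, 0.5

post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)            # = 1/5
post_mean = post_var * (prior_mean / prior_sd**2 + d / noise_sd**2)

print(post_mean, post_var**0.5)   # 10.96 and about 0.447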

Example: Measuring the mass of an object. The more accurate new data has changed the estimate of m and decreased its uncertainty. This is the one data point problem. Bayes' rule (1763; Thomas Bayes, 1702-1761) relates conditional PDFs of the data and the model parameters under the stated assumptions: posterior probability density is proportional to likelihood times prior probability density, i.e. what is known after the data are collected is proportional to the measure of fit to the data times what is known before the data are collected. 41/31

Trans-dimensional sampling: A regression example Which curve produced that data? From Sambridge et al. (2006) 42/31

Irregular grids in seismology Gudmundsson & Sambridge (1998) 43/31