Wavelet Extraction: an essay in model selection and uncertainty. James Gunning, CSIRO Petroleum, Australia



Outline
- Model selection, inference and prediction: generalities
- Wavelet extraction
- Model formulation
- Generic inversion/inference features for single models
- Model choice scenarios: 3 examples
- Morals & conclusions

Model selection
- Classic statistics problem: line? or parabola? or sextic?
- Foundations of science: Newton or Ptolemy? ($F = ma$)
[Figure: data points, y against x]

Generalities
- Geostatistics is only one branch of multivariate statistics.
- Every result of significance can be derived from conventional multivariate estimation theory.
- A Bayesian spin is helpful.
- The rest of the world understands us better if we write things this way.
- As statisticians, we have the duties of:
  - Model inference
  - Significance testing
  - Model refinement/testing
  - Making predictions with error estimates

Model selection & Bayes factors
- Suppose there are $k = 1 \ldots N$ models to explain data $D$, with parameters $m_k$.
- Model $k$ has prior $P(m_k|k)\,P(k)$ and likelihood $L(D|m_k,k)$.
- Here are three very useful things:
  - Marginal model likelihood: $P(D|k) = \int L(D|m_k,k)\, P(m_k|k)\, dm_k$
  - Bayes factor: $B_{ij} = P(D|i) / P(D|j)$
  - Posterior probability of model $k$: $P(k|D) = \dfrac{P(D|k)\,P(k)}{\sum_{j=1}^{N} P(D|j)\,P(j)}$
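As a minimal sketch of how these three quantities fit together numerically, the snippet below turns log marginal likelihoods into posterior model probabilities. The function names and the evidence values are hypothetical, and equal prior weights $P(k)$ are assumed when none are supplied.

```python
import math

def model_probabilities(log_evidences, log_priors=None):
    """Posterior model probabilities P(k|D) from log P(D|k) and log P(k)."""
    n = len(log_evidences)
    if log_priors is None:
        # Assumption: equal prior weight on every model
        log_priors = [-math.log(n)] * n
    log_joint = [le + lp for le, lp in zip(log_evidences, log_priors)]
    shift = max(log_joint)  # log-sum-exp shift avoids underflow
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

def log_bayes_factor(log_evidences, i, j):
    """log B_ij = log P(D|i) - log P(D|j)."""
    return log_evidences[i] - log_evidences[j]

# Hypothetical log marginal likelihoods for three candidate models
log_ev = [-1790.2, -1787.1, -1788.4]
probs = model_probabilities(log_ev)
for k, p in enumerate(probs):
    print(f"model {k}: P(k|D) = {p:.3f}")
```

Working in logs matters in practice: evidences of this size underflow double precision if exponentiated directly.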

Predictive distributions
- Bayesian model averaging: prediction for new $y$:
  $P(y|D) = \sum_k P(y|M_k, D)\, P(k|D)$
- Mixture distributions
  - Models/rock-types/properties
  - Most helpful way to manage heavy tails
- Often: uncertainties within a model < uncertainty between models
[Figure: predictive density $P(y|D)$ against $y$]
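The remark about within-model and between-model uncertainty can be illustrated with the mean and variance of a model-averaged mixture. This is a sketch only: the per-model predictive means, variances and model probabilities are hypothetical numbers, and the function name is not from the talk.

```python
def bma_mean_and_variance(means, variances, model_probs):
    """Mean of a model-averaged predictive mixture, plus its variance split
    into a within-model part and a between-model part."""
    mean = sum(p * mu for p, mu in zip(model_probs, means))
    within = sum(p * v for p, v in zip(model_probs, variances))
    between = sum(p * (mu - mean) ** 2 for p, mu in zip(model_probs, means))
    return mean, within, between

# Two hypothetical models with tight individual predictions that disagree
mean, within, between = bma_mean_and_variance(
    means=[2.0, 3.0], variances=[0.01, 0.04], model_probs=[0.6, 0.4]
)
print(mean, within, between, within + between)
```

The total predictive variance is `within + between`; here the disagreement between the models dominates.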

Noise estimation
- Noise $\varepsilon \equiv (D - f(m))$, with $\varepsilon = \varepsilon_{meas} + \varepsilon_{model}$
- Often $\varepsilon_{model} > \varepsilon_{meas}$: we're prepared to live with approximate models
- $\varepsilon_{meas}$ usually stationary. Maybe biased. Roughly known.
- $\varepsilon_{model}$ often nonstationary. Correlated. Often biased. Usually unknown.
- We blaze away: $\{\varepsilon, \sigma^2\} \sim N(0, \sigma^2)\, P(\sigma^2)$
- Inference of $p(\varepsilon)$ is the chief issue
  - $p(\varepsilon)$ strongly coupled to model dimensionality
  - Demands sensible priors in Bayesian formulation
  - Simplest form adds pieces like $-\log(\Pi) \sim \|D - f(m)\|^2/(2\sigma^2) + n_D \log(\sigma^2)/2$
  - Absolute statements of model significance no longer possible
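A small sketch of the "simplest form" noise terms: for a fixed misfit, the expression is minimised at $\sigma^2 = \|D - f(m)\|^2 / n_D$, so the inferred noise level falls whenever a richer model reduces the misfit. The misfit values below are hypothetical and the function names are illustrative.

```python
import math

def neg_log_noise_terms(rss, n_data, sigma2):
    """||D - f(m)||^2 / (2 sigma^2) + n_D log(sigma^2) / 2."""
    return rss / (2.0 * sigma2) + 0.5 * n_data * math.log(sigma2)

def best_sigma2(rss, n_data):
    # Setting the derivative with respect to sigma^2 to zero gives RSS / n_D
    return rss / n_data

n_data = 200
for rss in (8.0, 4.0):  # hypothetical misfits: simpler model, then richer model
    s2 = best_sigma2(rss, n_data)
    print(f"misfit {rss}: sigma^2 = {s2:.3f}, "
          f"-log terms = {neg_log_noise_terms(rss, n_data, s2):.2f}")
```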

Marginal model likelihood estimators
- Target: $P(D|k) = \int L(D|m_k,k)\, P(m_k|k)\, dm_k$   (1)
- Full linearisation + conjugate priors: analytical
- Quadratures: too hard as $d = \dim(m)$ gets large
- MCMC harmonic mean averages: $\hat{P}(D|k) = \left\langle 1/L(D|m_k,k) \right\rangle^{-1}_{\Pi(m_k|D)}$ (average over MCMC samples). Unstable as $N_{samples} \to \infty$
- Laplace-Metropolis approx.:
  $\int e^{-f(m)}\, dm \approx (2\pi)^{d/2} \det\left\{ \frac{\partial^2 f}{\partial m_i \partial m_j} \Big|_{\hat m} \right\}^{-1/2} e^{-f(\hat m)} = (2\pi)^{d/2}\, |H|^{-1/2}\, L(D|\hat m)\, P(\hat m)$
- BIC asymptotics (eqn (1) does the overfit penalty automatically):
  $-\log P(D|k) \sim -\log\left( L(D|\hat m)\, P(\hat m) \right) + \frac{d}{2} \log(n_D) + O(n^{-1/2})$
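To make the Laplace formula concrete, here is a sketch on a toy problem that is not from the talk: data $y_i \sim N(\mu, \sigma^2)$ with known $\sigma$ and prior $\mu \sim N(0, \tau^2)$. Because the joint density is exactly Gaussian in $\mu$, the Laplace estimate should agree with the closed-form evidence, which gives a convenient check. All names and numbers are illustrative.

```python
import math

def neg_log_joint(mu, y, sigma, tau):
    """f(mu) = -log( L(D|mu) P(mu) ) for the toy Gaussian model."""
    n = len(y)
    log_like = (-0.5 * n * math.log(2 * math.pi * sigma**2)
                - sum((v - mu) ** 2 for v in y) / (2 * sigma**2))
    log_prior = -0.5 * math.log(2 * math.pi * tau**2) - mu**2 / (2 * tau**2)
    return -(log_like + log_prior)

def laplace_log_evidence(y, sigma, tau):
    """log[(2 pi)^(d/2) |H|^(-1/2) exp(-f(mu_hat))] with d = 1."""
    n = len(y)
    hessian = n / sigma**2 + 1.0 / tau**2      # second derivative of f
    mu_hat = (sum(y) / sigma**2) / hessian     # minimiser of f (the MAP point)
    return (0.5 * math.log(2 * math.pi) - 0.5 * math.log(hessian)
            - neg_log_joint(mu_hat, y, sigma, tau))

def exact_log_evidence(y, sigma, tau):
    """Closed form: y ~ N(0, sigma^2 I + tau^2 1 1^T)."""
    n = len(y)
    s1, s2 = sum(y), sum(v * v for v in y)
    log_det = (n - 1) * math.log(sigma**2) + math.log(sigma**2 + n * tau**2)
    quad = (s2 - tau**2 * s1**2 / (sigma**2 + n * tau**2)) / sigma**2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)

y = [0.8, 1.3, 0.9, 1.6, 1.1]  # hypothetical data
print(laplace_log_evidence(y, 0.5, 2.0), exact_log_evidence(y, 0.5, 2.0))
```

For non-Gaussian posteriors the two numbers would differ, and the size of that gap is what the approximation trades for speed.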

Issues in prior formulation
- Lindley paradox (1957)
- Models $M_1$ and $M_2$, with $d_2 > d_1$. Suppose they share (nested) parameters $m$, with priors
  - $m_1 \sim N(0, \sigma^2 C_1)$
  - $m_2 \sim N(\{0,0\}, \sigma^2\, \mathrm{diag}\{C_1, C_2\})$
  - $P(\sigma) \sim 1/\sigma$ (Jeffreys prior)
- Then the Bayes factor is (Smith & Spiegelhalter 1980):
  $B_{12} \sim |C_2|^{1/2}\, |X_2^T X_2|^{1/2} \left( 1 + \frac{d_2 - d_1}{n - d_1} F \right)^{-n/2}$
- Always favours simplest model (1) as $C_2 \to \infty$ (even for proper prior on $m_2$!)
- Setting of reasonable prior variances very important in comparing different dimensional models
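The paradox is easy to reproduce on a deliberately simpler problem than the one on the slide. The sketch below assumes unit-variance Gaussian data, $M_1$: $\mu = 0$ and $M_2$: $\mu \sim N(0, \tau^2)$, for which the Bayes factor has a closed form. The data summary (a sample mean three standard errors from zero) is hypothetical.

```python
import math

def log_bayes_factor_12(ybar, n, tau):
    """log B_12 for M1: mu = 0 against M2: mu ~ N(0, tau^2), y_i ~ N(mu, 1).

    Under M1 the sample mean is N(0, 1/n); under M2 it is N(0, tau^2 + 1/n).
    """
    shrink = n * tau**2 / (1.0 + n * tau**2)
    return 0.5 * math.log(1.0 + n * tau**2) - 0.5 * n * ybar**2 * shrink

n, ybar = 100, 0.3  # sample mean sits 3 standard errors away from zero
for tau in (0.1, 1.0, 10.0, 100.0, 1.0e4):
    print(f"tau = {tau:8.1f}   log B_12 = {log_bayes_factor_12(ybar, n, tau):7.3f}")
```

With a moderate prior width the data favour the richer model, but widening the prior on the extra parameter eventually hands the decision to the simpler model, whatever the data say.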

Segmentation/Blocking algorithms
- Changepoints: a classic model-choice problem
- Maximum likelihood configurations for $k$ changepoints $\tau_j$ accessible from dynamic programming, $O(kn^2)$ (D.M. Hawkins, Comp. Stat. & Data Analysis 37(3), 2001)
- Uses:
  - Thinning reflectivity sequence
  - Optimal checkshot choices
- $-\log L(y|m,\tau) = \frac{1}{2} \sum_{j=1}^{k} \sum_{i=\tau_{j-1}+1}^{\tau_j} (y_i - m_j)^2/\sigma^2 + \frac{n}{2} \log(\sigma^2) + \mathrm{BIC}$
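A minimal sketch of the dynamic-programming idea, written for piecewise-constant blocks with a least-squares cost. It is an illustration of the $O(kn^2)$ recursion rather than the algorithm of the cited paper; here `k` counts blocks (so `k - 1` interior changepoints), and the test series is made up.

```python
def segment(y, k):
    """Best least-squares split of y into k constant blocks.

    Returns (cost, ends), where ends are the exclusive end indices of the blocks.
    """
    n = len(y)
    # Prefix sums give the residual sum of squares of any block in O(1)
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def rss(i, j):  # block y[i:j] fitted by its mean, j > i
        tot = s1[j] - s1[i]
        return (s2[j] - s2[i]) - tot * tot / (j - i)

    inf = float("inf")
    # best[b][j]: minimal cost of fitting y[:j] with b blocks
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                cost = best[b - 1][i] + rss(i, j)
                if cost < best[b][j]:
                    best[b][j] = cost
                    back[b][j] = i

    # Walk the back-pointers from the end of the series to recover the blocks
    ends, j = [], n
    for b in range(k, 0, -1):
        ends.append(j)
        j = back[b][j]
    ends.reverse()
    return best[k][n], ends

series = [1, 1, 1, 5, 5, 5, 5, 2, 2]  # hypothetical blocky log
cost, ends = segment(series, 3)
print(cost, ends)
```

Choosing `k` itself is then the model-selection step: run the recursion for several `k` and compare the penalised costs.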

Wavelet extraction problem
[Figure: true-amplitude imaged seismic in time t, modelled as the CONVOLUTION of a wavelet with reflections computed from well logs in depth Z, plus noise. Annotations: systematic registration error? noise level]
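The convolutional model in this picture can be sketched in a few lines. Everything specific here is an assumption for illustration: a zero-phase Ricker wavelet with a 30 Hz peak frequency, 4 ms sampling, three hypothetical reflectors and a small white-noise level.

```python
import numpy as np

def ricker(peak_freq_hz, dt, half_length):
    """Zero-phase Ricker wavelet sampled at interval dt (seconds)."""
    t = np.arange(-half_length, half_length + 1) * dt
    a = (np.pi * peak_freq_hz * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

rng = np.random.default_rng(0)

# Hypothetical sparse reflectivity series, as might be computed from blocked logs
reflectivity = np.zeros(200)
reflectivity[[40, 90, 130]] = [0.10, -0.08, 0.05]

wavelet = ricker(peak_freq_hz=30.0, dt=0.004, half_length=25)

# Synthetic trace = wavelet * reflectivity, then additive noise
synthetic = np.convolve(reflectivity, wavelet, mode="same")
seismic = synthetic + 0.005 * rng.standard_normal(synthetic.size)
```

Wavelet extraction runs this forward model in reverse: given `seismic` and a log-derived `reflectivity`, infer `wavelet` and the noise level.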

Real wavelets and effective wavelets: Ricker 1953 (dynamite)

Motivations
- Wavelet extraction is a critical process in inversion studies
- Commercial codes don't (or won't) tell you:
  - Noise estimates
  - Non-normal-incidence wavelets
  - Time-to-depth or positioning errors
- May struggle with:
  - Multiple wells
  - Deviated wells
  - Wavelet uncertainties

Wishlist
- Wavelet coefficient extraction (plus errors)
- Wavelet length estimation (plus errors)
- Noise estimation (plus errors)
- Petrophysics model choices
- Time-to-depth adjustments (plus errors)
  - Integrates checkshots and sonic logs
- Positioning adjustments (plus errors)
- Multi-well
- Multi-stack

Wavelet Parameterization (1)
- Wavelet coefficients $m_w$
  - Nyquist-rate knots on cubic splines with zero endpoints & derivatives
  - Ensures decent tapering & correct bandwidth
  - Span? (see later)
- Prior $P(m_w) = N(0, s^2 I)\; N(t_p, \sigma_p^2)\; N(\Phi(\omega), \sigma_\phi^2 I)$
  - Second factor: optional timing prior
  - Third factor: optional phase prior in passband

Checkshot Parameterization (2)
- Time-to-depth parameters $t$
  - Markers (geological identifications): $N(t_{m,i}, \sigma_{m,i}^2)$
  - Checkshots (e.g. from VSP tools): $N(t_{c,i}, \sigma_{c,i}^2)$
- Global (well-specific) time shift $t_g \pm \sigma_{t_g}$
- Interval velocities: $v_{int}(z_{i+1} - z_i,\, t_{i+1} - t_i) \sim N(v_{log\text{-}average}, \sigma_v^2)$
[Figure: time-depth curve, axes t and z]

Petrophysics parameterizations: Multi stack
[Figure panels: Near stack, Far stack]

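Petrophysics Parameterizations (4): AVO effects (Linearized Zoeppritz)
- General approximate p-p reflection coefficient:
  $R = \tfrac{1}{2}\left( \frac{\Delta\rho}{\rho} + \frac{\Delta v_p}{v_p} \right) + B\left( \theta^2 \left( \tfrac{1}{2}\frac{\Delta v_p}{v_p} - \frac{2 v_s^2}{v_p^2}\left( \frac{\Delta\rho}{\rho} + \frac{2\Delta v_s}{v_s} \right) \right) + \theta^2 \frac{\Delta\delta}{2} \right)$
  - First term: normal incidence
  - $\theta$: angle (unknown?)
  - $v_s$ terms: shear dependent
  - $\delta$: Thomsen anisotropy parameters
- Stack angle: $\theta^2 = v_p^2 / \left( V_{stack}^4\, T_{stack}^2 / \langle X_{stack}^2 \rangle \right)$
- $R$ computed at block boundaries: $v_p, v_s, \rho$ are Backus-upscaled segment properties (i.e. error-free log data)
- Anisotropy: $\delta \sim N(\delta_f, \sigma_f^2)$, $f$ = lumped facies parameter label (from classification of segment)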

Anisotropy issues
- Processing issues for AVO
- Many potential amplitude effects at wider offset
- Absorption/scattering/spreading/acquisition footprints, etc.
- Ch. 4.3, Quantitative Seismic Interpretation, Avseth, Mukerji, Mavko.

Preprocessing: Impedance Blocking
- Based on segmentation of p-wave impedance ($\rho v_p$)
- Max. lik. methods $O(N^2)$, or
- Blended hierarchical stepwise segment/aggregate method ($O(N \log(k))$); ref: D.M. Hawkins, Comp. Stat. & Data Analysis 37(3), 2001
- Increases speed
[Figure: blocked impedance log]

Optimisation of Bayesian Posterior
- $m = \{m_{wavelet}, m_{checkshot}, m_{registration}, m_{petrophysics}\}$
- Maximise $\exp(-\chi^2/2)$, where $\chi^2$ is the sum of:
  - $\sum_{t,\,stacks,\,wells} (w(m) * R_{eff} - S)^2 / \sigma_s^2$ ("good synthetic")
  - $(d+1) \sum_{stacks} \log(\sigma_s)$ ("noise normalisation")
  - $\sum_{intervals,\,wells} (v_{int}(m) - \langle v_{int} \rangle)^2 / \sigma_v^2$ (prior: interval velocities c.f. sonic logs)
  - $(m - \langle m \rangle)^T C_p^{-1} (m - \langle m \rangle)$ (prior on checkshots, wavelet coeffs)
  - wavelet timing/phase prior (e.g. constant phase)

Optimisation, Hessians, Approximate uncertainties
- Maximum a posteriori estimates from a Gauss-Newton optimiser
- Quadratic approximation to uncertainties:
  - (1) direct from F.D. Hessian, or
  - (2) from Bayes' update formula $C = (C_p^{-1} + X^T C_D^{-1} X)^{-1}$
- Gauss-Newton, BFGS methods: see e.g. Nocedal & Wright, "Numerical Optimization"
- Cost: $O(d^2) \times O(\text{forward model cost}) \times N(\text{models})$
- Typical dimensionalities: d = 10-100, n = 100-1000
[Figure: log(posterior) surface over parameter 1 and parameter 2, with the Gauss-Newton optimiser path]
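The Bayes update formula for the posterior covariance can be sketched for a linear (or linearised) forward model. The sensitivity matrix `X`, the noise covariance `C_D` and the prior covariance `C_p` below are random or arbitrary stand-ins, not values from any real extraction.

```python
import numpy as np

def posterior_covariance(X, C_D, C_p):
    """C = (C_p^-1 + X^T C_D^-1 X)^-1 for a linearised forward model."""
    precision = np.linalg.inv(C_p) + X.T @ np.linalg.solve(C_D, X)
    return np.linalg.inv(precision)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))   # hypothetical sensitivities: 50 data, 3 parameters
C_D = 0.25 * np.eye(50)            # assumed data-noise covariance
C_p = 4.0 * np.eye(3)              # assumed prior covariance

C = posterior_covariance(X, C_D, C_p)
print("prior std dev:    ", np.sqrt(np.diag(C_p)))
print("posterior std dev:", np.sqrt(np.diag(C)))
```

The square roots of the diagonal are the approximate parameter uncertainties; off-diagonal terms carry the trade-offs between parameters.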

Optimisation outputs
- Maximum a posteriori parameters
  - SU wavelets, ascii time-to-depth files, visualization dumps etc.
- Covariance diagonals (parameter uncertainties)
- Posterior model probabilities (wavelet span, checkshot choice etc.)
- Stochastic wavelets from the posterior

Generic aspects of well-ties
[Figure: wavelet panels labelled "statistically insignificant realisations", "statistically insignificant", "typical commercial extraction" and "Bayes ML wavelet"]

Joint straight-hole & sidetrack
- No logs: checkshot uncertainties go to prior
- Looser checkshots with weaker amplitudes
- Tight checkshot uncertainties at strong amplitudes
[Figure panels: Sidetrack (tvd), Straight (tvd)]

Straight hole detail with logs
- Interval sonic velocities constrained to within 5% of log average

Parameter uncertainty cross sections
[Figure: two cross sections of -log(posterior), one against wavelet coefficients and one against checkshot knots, with one standard deviation marked]
- Repulsion to keep interval velocities sane

General effect of allowing more freedom
- Wavelets sharpen, noise reduces

Cross well issues: Wavelets and Ties
[Figure panels: synthetics/traces from joint extraction; wavelets from various extractions]

Relations to Cross-validation
- Leave-one-out pseudo Bayes factor:
  $B_{12}^{(cross\,val)} = \prod_{i=1}^{n_D} \dfrac{p(d_i \,|\, d_{\{j \ne i\}}, M_1)}{p(d_i \,|\, d_{\{j \ne i\}}, M_2)}$
- Has asymptotics of Akaike information criterion: $-\log P(D|k)$ += $d$ (c.f. Bayes info. crit. penalty $(d/2)\log(n_D)$)
- Less severe on richer models
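The difference between the two penalties can be seen on the line/parabola/sextic question from the start of the talk. This sketch assumes Gaussian noise with the variance set to its maximum-likelihood value, counts `d` as the number of polynomial coefficients, and uses synthetic parabola data; none of these choices come from the slides.

```python
import numpy as np

def penalised_scores(x, y, degree):
    """(AIC-style, BIC-style) estimates of -log P(D|k), up to a shared constant."""
    n = len(y)
    d = degree + 1  # number of fitted coefficients
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    neg_log_like = 0.5 * n * np.log(rss / n)  # sigma^2 at its ML value RSS/n
    return neg_log_like + d, neg_log_like + 0.5 * d * np.log(n)

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + x + 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)  # hypothetical data

scores = {deg: penalised_scores(x, y, deg) for deg in (1, 2, 6)}
for deg, (aic_like, bic_like) in scores.items():
    print(f"degree {deg}: AIC-style {aic_like:8.2f}   BIC-style {bic_like:8.2f}")
```

Lower is better for both columns. The gap between the columns grows with `d`, which is the sense in which the cross-validation asymptotics are less severe on richer models.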

3 Model Selection examples

(1) Simple Wavelet-length choice

Absolute significance tests
- MAP wavelet length not shortest model
- MAP noise level exceptional on null-hypothesis test
- Perform ensemble of extractions on fake seismic data with matched bandwidth and univariate statistics
[Figure: distribution $P(\sigma_{null})$ against $\sigma_{null}$, with $\sigma_{true}$ marked]

(2) Checkshot choice (25 models) Changepoint cascade

Richest model

Max. a posteriori model

Model probabilities and noise

(3) Naïve facies-driven anisotropy choice δ~0.15

Conclusions
- Model selection is desirable in nearly all inverse problems
- Marginal model likelihoods can be tricky:
  - Careful construction of prior
  - Efficient (differentiable) forward models
  - Efficient optimisers
- Bayesian model selection is quite opinionated. Very Occamist: severe on overparameterised models
- Selection is a doable problem for wavelet extraction. Produces:
  - Often compact wavelets
  - Simplified checkshots
- Significant potential for missing-data (petrophysics) problems

"O me, the word 'choose!' I may neither choose whom I would nor refuse whom I dislike." Portia, The Merchant of Venice.

More info: www.google.com/search?q=james+gunning+csiro
- Wavelet extractor open source code (java), demos, papers.
- Part of Delivery suite (Bayesian seismic inversion), Computers and Geosciences 32 (2006)

Data-positioning uncertainty (1)
- $y(m) = f(m)$: an unusual regression problem
- Simple toy problem; fitting a line $y = a + bx$: $y \sim X \cdot m + e$, with $e \sim N(0, \sigma^2)$
- Priors:
  - $P(\sigma) \sim 1/\sigma$ (Jeffreys prior)
  - $P(m|\sigma) \sim N(0,\, g \sigma^2 (X^T X)^{-1})$ ("Zellner")
- Posterior marginal $= \int L(y|m,\sigma)\, P(m|\sigma)\, P(\sigma)\, dm\, d\sigma \sim (1 + g(1 - R^2))^{-(n-1)/2}$
  - where $R^2$ is the usual coefficient of determination
- For two samples of random $y$: $B_{12} \sim \left( \dfrac{1 + g(1 - R_1^2)}{1 + g(1 - R_2^2)} \right)^{-(n-1)/2}$
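A short sketch of how sensitive this Bayes factor is to small changes in $R^2$. It uses only the proportionality stated on the slide (constants common to both samples cancel in the ratio); the values of $n$, $g$ and the two $R^2$ figures are hypothetical.

```python
import math

def log_marginal_gprior(r2, n, g):
    """log of (1 + g (1 - R^2))^(-(n-1)/2), dropping constants shared by both fits."""
    return -0.5 * (n - 1) * math.log(1.0 + g * (1.0 - r2))

def log10_bayes_factor(r2_1, r2_2, n, g):
    """log10 B_12 for two data samples fitted with the same design and g."""
    diff = log_marginal_gprior(r2_1, n, g) - log_marginal_gprior(r2_2, n, g)
    return diff / math.log(10.0)

# Hypothetical case: 100 points, g = 100, and R^2 differing in the second decimal
print(log10_bayes_factor(0.95, 0.90, n=100, g=100.0))
```

Because the exponent scales with `n`, modest fluctuations in fit quality translate into swings of many orders of magnitude in the Bayes factor.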

Data-positioning uncertainty (2)
- Fluctuation in $B_{12}$ for problem with y = 1 + x + (25% fixed noise) + (5% fluctuations)
[Figure: sample data and fit; histogram of frequency against $\log_{10}(BF_{12})$]