Uncertainty Quantification for Inverse Problems. November 7, 2011


Outline
- UQ and inverse problems
- Review: least-squares
- Review: Gaussian Bayesian linear model
- Parametric reductions for IP
- Bias, variance and MSE for Tikhonov
- Resampling methods
- Bayesian inversion
- Exploratory data analysis (by example)

Introduction
Inverse problem: recover an unknown object from indirect, noisy observations.
Example: deblurring of a signal $f$
Forward problem: $\mu(x) = \int_0^1 A(x - t)\, f(t)\, dt$
Noisy data: $y_i = \mu(x_i) + \varepsilon_i$, $i = 1, \dots, n$
Inverse problem: recover $f(t)$ for $t \in [0, 1]$
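
A minimal sketch of this deblurring setup, assuming a Gaussian blurring kernel, a uniform grid and an illustrative test signal (none of these choices come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = m = 100
x = np.linspace(0.0, 1.0, n)                         # observation points x_i
t = np.linspace(0.0, 1.0, m)                         # quadrature nodes t_j
kernel = lambda s: np.exp(-0.5 * (s / 0.05) ** 2)    # assumed blurring kernel A(.)
A = kernel(x[:, None] - t[None, :]) / m              # discretized forward operator
f_true = np.sin(2 * np.pi * t) + (t > 0.5)           # assumed test signal
sigma = 0.01
y = A @ f_true + sigma * rng.standard_normal(n)      # noisy data y_i
```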

General Model
$A = (A_i)$, $A_i : H \to \mathbb{R}$, $f \in H$
$y_i = A_i[f] + \varepsilon_i$, $i = 1, \dots, n$
The recovery of $f$ from $A[f]$ is ill-posed
Errors $\varepsilon_i$ are modeled as random
The estimate $\hat f$ is an $H$-valued random variable
Basic Question (UQ): How good is the estimate $\hat f$? What is the distribution of $\hat f$? Summarize characteristics of the distribution of $\hat f$; e.g., how likely is it that $\hat f$ will be far from $f$?

Review: least-squares
$y = A\beta + \varepsilon$, where $A$ is $n \times m$ with $n > m$ and $A^t A$ has a stable inverse
$E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2 I$
Least-squares estimate of $\beta$: $\hat\beta = \arg\min_b \|y - Ab\|^2 = (A^t A)^{-1} A^t y$
$\hat\beta$ is an unbiased estimator of $\beta$: $E_\beta(\hat\beta) = \beta$ for all $\beta$

Covariance matrix of $\hat\beta$: $\mathrm{Var}_\beta(\hat\beta) = \sigma^2 (A^t A)^{-1}$
If $\varepsilon$ is Gaussian $N(0, \sigma^2 I)$, then $\hat\beta$ is the MLE and $\hat\beta \sim N(\beta, \sigma^2 (A^t A)^{-1})$
Unbiased estimator of $\sigma^2$: $\hat\sigma^2 = \|y - A\hat\beta\|^2/(n - m) = \|y - \hat y\|^2/(n - \mathrm{dof}(\hat y))$, where $\hat y = A\hat\beta = Hy$ and $H = A(A^t A)^{-1} A^t$ is the hat matrix
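
A minimal sketch of these least-squares quantities, assuming a generic full-column-rank design matrix A and data vector y:

```python
import numpy as np

def least_squares_fit(A, y):
    n, m = A.shape
    AtA_inv = np.linalg.inv(A.T @ A)
    beta_hat = AtA_inv @ A.T @ y                        # (A^t A)^{-1} A^t y
    H = A @ AtA_inv @ A.T                               # hat matrix
    y_hat = H @ y                                       # fitted values
    sigma2_hat = np.sum((y - y_hat) ** 2) / (n - m)     # unbiased estimate of sigma^2
    cov_beta = sigma2_hat * AtA_inv                     # estimated Var(beta_hat)
    return beta_hat, y_hat, sigma2_hat, cov_beta, H
```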

Confidence regions
$1 - \alpha$ confidence interval for $\beta_i$: $I_i(\alpha): \hat\beta_i \pm t_{\alpha/2,\, n-m}\, \hat\sigma \sqrt{\big( (A^t A)^{-1} \big)_{i,i}}$
C.I.'s are pre-data
Pointwise vs simultaneous $1 - \alpha$ coverage:
pointwise: $P[\, \beta_i \in I_i(\alpha)\, ] = 1 - \alpha$
simultaneous: $P[\, \beta_i \in I_i(\alpha)\ \forall i\, ] = 1 - \alpha$
It is easy to find a $1 - \alpha$ joint confidence region $R_\alpha$ for $\beta$ which gives simultaneous coverage for all $i$
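
A minimal, self-contained sketch of the pointwise t-intervals above (the helper name is illustrative):

```python
import numpy as np
from scipy.stats import t as t_dist

def pointwise_t_intervals(A, y, alpha=0.05):
    n, m = A.shape
    AtA_inv = np.linalg.inv(A.T @ A)
    beta_hat = AtA_inv @ A.T @ y
    sigma_hat = np.sqrt(np.sum((y - A @ beta_hat) ** 2) / (n - m))
    half = t_dist.ppf(1 - alpha / 2, df=n - m) * sigma_hat * np.sqrt(np.diag(AtA_inv))
    return np.column_stack([beta_hat - half, beta_hat + half])   # one row per beta_i
```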

Residuals
$r = y - \hat y = (I - H)y$
$E(r) = 0$, $\mathrm{Var}(r) = \sigma^2 (I - H)$
If $\varepsilon \sim N(0, \sigma^2 I)$, then $r \sim N(0, \sigma^2 (I - H))$; residuals do not behave like the original errors $\varepsilon$
Corrected residuals: $\tilde r_i = r_i / \sqrt{1 - H_{ii}}$
Corrected residuals may be used for model validation
Other types of residuals (e.g., recursive)

Review: Gaussian Bayesian linear model
$y \mid \beta \sim N(A\beta, \sigma^2 I)$, $\beta \sim N(\mu, \tau^2 \Sigma)$, with $\mu$, $\sigma$, $\tau$, $\Sigma$ known
Goal: update the prior distribution of $\beta$ in light of the data $y$, i.e., compute the posterior distribution of $\beta$ given $y$:
$\beta \mid y \sim N(\mu_y, \Sigma_y)$
$\Sigma_y = \sigma^2 \big( A^t A + (\sigma^2/\tau^2)\, \Sigma^{-1} \big)^{-1}$
$\mu_y = \big( A^t A + (\sigma^2/\tau^2)\, \Sigma^{-1} \big)^{-1} \big( A^t y + (\sigma^2/\tau^2)\, \Sigma^{-1} \mu \big)$ = a weighted average of $\hat\beta_{\mathrm{ls}}$ and $\mu$
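
A minimal sketch of the posterior update above, assuming A, y and the prior/noise parameters are given:

```python
import numpy as np

def gaussian_posterior(A, y, mu, Sigma, sigma2, tau2):
    ratio = sigma2 / tau2
    Sigma_inv = np.linalg.inv(Sigma)
    M = np.linalg.inv(A.T @ A + ratio * Sigma_inv)
    Sigma_y = sigma2 * M                                # posterior covariance
    mu_y = M @ (A.T @ y + ratio * Sigma_inv @ mu)       # posterior mean (MAP)
    return mu_y, Sigma_y
```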

Summarizing the posterior:
$\mu_y = E(\beta \mid y)$ = mean and mode (MAP) of the posterior
$\Sigma_y = \mathrm{Var}(\beta \mid y)$ = covariance matrix of the posterior
$1 - \alpha$ credible interval for $\beta_i$: $(\mu_y)_i \pm z_{\alpha/2} \sqrt{(\Sigma_y)_{i,i}}$
Can also construct a $1 - \alpha$ joint credible region for $\beta$

Some properties
$\mu_y$ minimizes $E\,\|\delta(y) - \beta\|^2$ among all estimators $\delta(y)$
$\mu_y \to \hat\beta_{\mathrm{ls}}$ as $\sigma^2/\tau^2 \to 0$; $\mu_y \to \mu$ as $\sigma^2/\tau^2 \to \infty$
$\mu_y$ can be used as a frequentist estimate of $\beta$, but: $\mu_y$ is biased, and its variance is NOT obtained by sampling from the posterior
Warning: a $1 - \alpha$ credible region MAY NOT have $1 - \alpha$ frequentist coverage
If $\mu$, $\sigma$, $\tau$ or $\Sigma$ are unknown, use hierarchical or empirical Bayesian methods and MCMC

Parametric reductions for IP
Transform the infinite-dimensional problem to a finite-dimensional one, i.e., $A[f] \approx Aa$
Assumptions: $y = A[f] + \varepsilon = Aa + \delta + \varepsilon$
Function representation: $f(x) = \phi(x)^t a + \eta(x)$, $A[\eta] = 0$
Penalized least-squares estimate:
$\hat a = \arg\min_b \|y - Ab\|^2 + \lambda^2 b^t S b = (A^t A + \lambda^2 S)^{-1} A^t y$
$\hat f(x) = \phi(x)^t \hat a$

Example: $y_i = \int_0^1 A(x_i - t) f(t)\, dt + \varepsilon_i$
Quadrature approximation: $\int_0^1 A(x_i - t) f(t)\, dt \approx \frac{1}{m} \sum_{j=1}^m A(x_i - t_j) f(t_j)$
Discretized system: $y = Af + \delta + \varepsilon$, with $A_{ij} = A(x_i - t_j)$ and $f_j = f(t_j)$
Regularized estimate:
$\hat f = \arg\min_{g \in \mathbb{R}^m} \|y - Ag\|^2 + \lambda^2 \|Dg\|^2 = (A^t A + \lambda^2 D^t D)^{-1} A^t y \equiv Ly$
$\hat f(x_i) = \hat f_i = e_i^t L y$
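
A minimal sketch of this regularized estimate; the second-difference penalty D and the value of lambda are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def tikhonov_estimate(A, y, lam, D=None):
    """Penalized LS estimate (A^t A + lam^2 D^t D)^{-1} A^t y; also returns L."""
    m = A.shape[1]
    if D is None:
        D = np.diff(np.eye(m), n=2, axis=0)     # second-difference roughness penalty
    L = np.linalg.solve(A.T @ A + lam ** 2 * D.T @ D, A.T)
    return L @ y, L

# e.g. f_hat, L = tikhonov_estimate(A, y, lam=1e-3), with A, y from the first sketch
```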

Example: $A_i$ continuous on a Hilbert space $H$. Then there exist orthonormal $\phi_k \in H$, orthonormal $v_k \in \mathbb{R}^n$ and non-decreasing $\lambda_k \ge 0$ such that $A[\phi_k] = \lambda_k v_k$
$f = f_0 + f_1$, with $f_0 \in \mathrm{Null}(A)$, $f_1 \perp \mathrm{Null}(A)$
$f_0$ is not constrained by the data; $f_1 = \sum_{k=1}^n a_k \phi_k$
Discretized system: $y = Va + \varepsilon$, $V = (\lambda_1 v_1 \cdots \lambda_n v_n)$
Regularized estimate: $\hat a = \arg\min_{b \in \mathbb{R}^n} \|y - Vb\|^2 + \lambda^2 \|b\|^2 \equiv Ly$
$\hat f(x) = \phi(x)^t Ly$, $\phi(x) = (\phi_i(x))$

Example: $A_i$ continuous on the RKHS $H = W^m([0,1])$. Then $H = N_{m-1} \oplus H_m$, and there are $\phi_j \in H$ and representers $\kappa_i \in H$ such that
$\|y - A[f]\|^2 + \lambda^2 \int_0^1 (f^{(m)})^2 = \sum_i (y_i - \langle \kappa_i, f \rangle)^2 + \lambda^2 \|P_{H_m} f\|^2$
$f = \sum_j a_j \phi_j + \eta$, $\eta \perp \mathrm{span}\{\phi_k\}$
Discretized system: $y = Xa + \delta + \varepsilon$
Regularized estimate: $\hat a = \arg\min_{b \in \mathbb{R}^n} \|y - Xb\|^2 + \lambda^2 b^t P b \equiv Ly$
$\hat f(x) = \phi(x)^t Ly$, $\phi(x) = (\phi_i(x))$

To summarize:
$y = Aa + \delta + \varepsilon$
$f(x) = \phi(x)^t a + \eta(x)$, $A[\eta] = 0$
$\hat a = \arg\min_{b \in \mathbb{R}^n} \|y - Ab\|^2 + \lambda^2 b^t P b = (A^t A + \lambda^2 P)^{-1} A^t y$
$\hat f(x) = \phi(x)^t \hat a$

Bias, variance and MSE of $\hat f$
Is $\hat f(x)$ close to $f(x)$ on average?
$\mathrm{Bias}(\hat f(x)) = E\hat f(x) - f(x) = -\phi(x)^t B_\lambda a + \phi(x)^t G_\lambda A^t \delta - \eta(x)$
Pointwise sampling variability: $\mathrm{Var}(\hat f(x)) = \sigma^2 \|A G_\lambda \phi(x)\|^2$
Pointwise MSE: $\mathrm{MSE}(\hat f(x)) = \mathrm{Bias}(\hat f(x))^2 + \mathrm{Var}(\hat f(x))$
Integrated MSE: $\mathrm{IMSE}(\hat f) = \int E\,|\hat f(x) - f(x)|^2\, dx = \int \mathrm{MSE}(\hat f(x))\, dx$
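
A minimal sketch of these pointwise quantities in the discretized setting, under the simplifying assumptions delta = 0 and eta = 0, so that f_hat = L y and E f_hat = L A f:

```python
import numpy as np

def bias_var_mse(A, L, f_true, sigma):
    bias = L @ (A @ f_true) - f_true              # E f_hat - f  (delta = 0, eta = 0)
    var = sigma ** 2 * np.sum(L ** 2, axis=1)     # diagonal of sigma^2 L L^t
    mse = bias ** 2 + var
    return bias, var, mse
```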

What about the bias of $\hat f$?
Upper bounds are sometimes possible; geometric information can be obtained from bounds
Optimization approach: find the max/min of the bias subject to $f \in S$
Average bias: choose a prior for $f$ and compute the mean bias over the prior
Example: $f$ in an RKHS $H$, $f(x) = \langle \rho_x, f \rangle$
$\mathrm{Bias}(\hat f(x)) = \langle A_x - \rho_x, f \rangle$, so $|\mathrm{Bias}(\hat f(x))| \le \|A_x - \rho_x\|\, \|f\|$

Confidence intervals
If $\varepsilon$ is Gaussian and $|\mathrm{Bias}(\hat f(x))| \le B(x)$, then
$\hat f(x) \pm \big( z_{\alpha/2}\, \sigma \|A G_\lambda \phi(x)\| + B(x) \big)$
is a $1 - \alpha$ C.I. for $f(x)$
Simultaneous $1 - \alpha$ C.I.'s for $E(\hat f(x))$, $x \in S$: find $\beta$ such that
$P\big[\, \sup_{x \in S} |Z^t V(x)| \ge \beta \,\big] \le \alpha$
with $Z_i$ i.i.d. $N(0,1)$ and $V(x) = K G_\lambda \phi(x) / \|K G_\lambda \phi(x)\|$, and use results from the theory of maxima of Gaussian processes

Residuals
$r = y - A\hat a = y - \hat y$
Moments: $E(r) = -A\,\mathrm{Bias}(\hat a) + \delta$, $\mathrm{Var}(r) = \sigma^2 (I - H_\lambda)^2$
Note: $\mathrm{Bias}(A\hat a) = A\,\mathrm{Bias}(\hat a)$ may be small even if $\mathrm{Bias}(\hat a)$ is significant
Corrected residuals: $\tilde r_i = r_i / (1 - (H_\lambda)_{ii})$

Estimating $\sigma$ and $\lambda$
$\lambda$ can be chosen using the same data $y$ (with GCV, the L-curve, the discrepancy principle, etc.) or independent training data; this selection introduces an additional error
If $\delta \approx 0$, $\sigma^2$ can be estimated as in least-squares: $\hat\sigma^2 = \|y - \hat y\|^2 / (n - \mathrm{dof}(\hat y))$
Otherwise one may consider $y = \mu + \varepsilon$, $\mu = A[f]$, and use methods from nonparametric regression
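
A minimal sketch of one of these selection rules, generalized cross-validation, using GCV(lambda) = n ||(I - H_lambda) y||^2 / (n - tr H_lambda)^2 over a grid of candidate values (the grid and penalty D are assumptions):

```python
import numpy as np

def gcv_lambda(A, y, D, lambdas):
    n = len(y)
    scores = []
    for lam in lambdas:
        L = np.linalg.solve(A.T @ A + lam ** 2 * D.T @ D, A.T)
        H_lam = A @ L
        rss = np.sum((y - H_lam @ y) ** 2)
        dof = np.trace(H_lam)                    # effective degrees of freedom
        scores.append(n * rss / (n - dof) ** 2)  # GCV score
    return lambdas[int(np.argmin(scores))]
```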

Resampling methods
Idea: if $\hat f$ and $\hat\sigma$ are good estimates, then one should be able to generate synthetic data consistent with the actual observations
If $\hat f$ is a good estimate, make synthetic data $y_1^* = A[\hat f] + \varepsilon_1^*, \dots, y_b^* = A[\hat f] + \varepsilon_b^*$ with $\varepsilon_i^*$ from the same distribution as $\varepsilon$
Use the same estimation procedure to get $\hat f_i^*$ from $y_i^*$: $y_1^* \to \hat f_1^*, \dots, y_b^* \to \hat f_b^*$
Approximate the distribution of $\hat f$ with that of $\hat f_1^*, \dots, \hat f_b^*$

Generating $\varepsilon^*$ similar to $\varepsilon$
Parametric resampling: $\varepsilon^* \sim N(0, \hat\sigma^2 I)$
Nonparametric resampling: sample $\varepsilon^*$ with replacement from the corrected residuals
Problems: possibly badly biased and computationally intensive; residuals may need to be corrected also for correlation structure; a bad $\hat f$ may lead to misleading results
Training sets: let $\{f_1, \dots, f_k\}$ be a training set (e.g., historical data); for each $f_j$ use resampling to estimate $\mathrm{MSE}(\hat f_j)$; study the variability of $\mathrm{MSE}(\hat f_j)/\|f_j\|$
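
A minimal sketch of the parametric and nonparametric (residual) resampling schemes for the Tikhonov estimate; it assumes A, y, L, f_hat and sigma_hat have already been computed as in the earlier sketches:

```python
import numpy as np

def bootstrap_estimates(A, y, L, f_hat, sigma_hat, B=200, parametric=True, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    H = A @ L
    r = y - H @ y
    r_corr = r / np.sqrt(1.0 - np.clip(np.diag(H), 0.0, 0.999))   # corrected residuals
    boot = np.empty((B, len(f_hat)))
    for b in range(B):
        if parametric:
            eps = sigma_hat * rng.standard_normal(n)
        else:
            eps = rng.choice(r_corr - r_corr.mean(), size=n, replace=True)
        y_star = A @ f_hat + eps           # synthetic data from the fitted model
        boot[b] = L @ y_star               # re-estimate with the same procedure
    return boot                            # rows approximate the distribution of f_hat
```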

Bayesian inversion
Hierarchical model (simple Gaussian case):
$y \mid a, \theta \sim N(Aa + \delta, \sigma^2 I)$, $a \mid \theta \sim F_a(\cdot \mid \theta)$, $\theta \sim F_\theta$, with $\theta = (\delta, \sigma, \tau)$
It all reduces to sampling from the posterior $F(a \mid y)$: use MCMC methods
Possible problems: selection of priors, improper priors; convergence of MCMC; computational cost of MCMC in high dimensions; interpretation of results
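
A minimal sketch of such posterior sampling for an illustrative conjugate Gaussian hierarchy (delta = 0, a | tau^2 ~ N(0, tau^2 I), inverse-gamma priors on sigma^2 and tau^2); this particular model and its hyperparameters are assumptions, not the slides' specification:

```python
import numpy as np

def gibbs_sampler(A, y, n_iter=2000, a0=1e-2, b0=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    n, m = A.shape
    sigma2, tau2 = 1.0, 1.0
    samples = np.empty((n_iter, m))
    for it in range(n_iter):
        # a | y, sigma2, tau2 ~ N(mu_post, Sigma_post)
        Sigma_post = np.linalg.inv(A.T @ A / sigma2 + np.eye(m) / tau2)
        mu_post = Sigma_post @ (A.T @ y) / sigma2
        a = rng.multivariate_normal(mu_post, Sigma_post)
        # sigma2 | y, a ~ InvGamma(a0 + n/2, b0 + ||y - A a||^2 / 2)
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * np.sum((y - A @ a) ** 2)))
        # tau2 | a ~ InvGamma(a0 + m/2, b0 + ||a||^2 / 2)
        tau2 = 1.0 / rng.gamma(a0 + m / 2, 1.0 / (b0 + 0.5 * np.sum(a ** 2)))
        samples[it] = a
    return samples   # draws from the posterior of a given y (discard burn-in)
```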

Warning
Parameters can be tuned so that the Tikhonov estimate $\hat f$ equals the posterior mean $E(f \mid y)$, but this does not mean that the uncertainty of $\hat f$ is obtained from the posterior of $f$ given $y$

Warning
Bayesian or frequentist inversions can be used for uncertainty quantification, but:
Uncertainties in each have different interpretations that should be understood
Both make assumptions (sometimes stringent) that should be revealed
The validity of the assumptions should be explored: exploratory analysis and model validation

Exploratory data analysis (by example)
Example: $\ell_2$ and $\ell_1$ estimates
Data: $y = Af + \varepsilon = AW\beta + \varepsilon$, with $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2 I$, $\hat\sigma$ = smoothing-spline estimate
$\hat f_{\ell_2} = \arg\min_g \|y - Ag\|_2^2 + \lambda^2 \|Dg\|_2^2$, with $\lambda$ chosen by GCV
$\hat f_{\ell_1} = W\hat\beta$, $\hat\beta = \arg\min_b \|y - AWb\|_2^2 + \lambda \|b\|_1$, with $\lambda$ chosen by the discrepancy principle

Example: y, Af & f

Tools: robust simulation summaries
Sample mean and variance of $x_1, \dots, x_m$: $\bar X = (x_1 + \cdots + x_m)/m$, $S^2 = \frac{1}{m}\sum_i (x_i - \bar X)^2$
Recursive formulas:
$\bar X_{n+1} = \bar X_n + \frac{1}{n+1}(x_{n+1} - \bar X_n)$
$T_{n+1} = T_n + \frac{n}{n+1}(x_{n+1} - \bar X_n)^2$, $S_n^2 = T_n/n$
Not robust
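
A minimal sketch of these recursive updates, streaming one simulation draw at a time (the helper name and the toy stream are illustrative):

```python
import numpy as np

def update_mean_var(x_new, mean, T, n):
    """One recursive update; after n draws, the sample variance is T / n."""
    delta = x_new - mean
    mean = mean + delta / (n + 1)
    T = T + n / (n + 1) * delta ** 2
    return mean, T, n + 1

mean, T, n = 0.0, 0.0, 0
for x in np.random.default_rng(0).standard_normal(1000):   # toy stream of draws
    mean, T, n = update_mean_var(x, mean, T, n)
sample_var = T / n
```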

Robust measures
Median and median absolute deviation from the median (MAD):
$\tilde X_m = \mathrm{median}\{x_1, \dots, x_m\}$
$\mathrm{MAD} = \mathrm{median}\{\, |x_1 - \tilde X_m|, \dots, |x_m - \tilde X_m|\, \}$
Approximate recursive formulas with good asymptotics exist
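
A minimal sketch of these robust summaries:

```python
import numpy as np

def median_mad(x):
    med = np.median(x)
    mad = np.median(np.abs(x - med))   # median absolute deviation from the median
    return med, mad
```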

Stability of $\hat f_{\ell_2}$

Stability of $\hat f_{\ell_1}$

CI for the bias of $A\hat f_{\ell_2}$
A good estimate of $f$ gives a good estimate of $Af$; $y$ is a direct observation of $Af$ and is less sensitive to $\lambda$
$1 - \alpha$ CIs for the bias of $A\hat f$ (with $\hat y = Hy$):
$\dfrac{\hat y_i - y_i}{1 - H_{ii}} \pm z_{\alpha/2}\, \sigma \sqrt{1 + \dfrac{(H^2)_{ii} - (H_{ii})^2}{(1 - H_{ii})^2}}$
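
A minimal sketch of these intervals, assuming the hat matrix H = A L of the $\ell_2$ fit, the data y and a noise-level estimate sigma:

```python
import numpy as np
from scipy.stats import norm

def bias_intervals(H, y, sigma, alpha=0.05):
    y_hat = H @ y
    h = np.diag(H)
    center = (y_hat - y) / (1.0 - h)
    half = norm.ppf(1 - alpha / 2) * sigma * np.sqrt(
        1.0 + (np.diag(H @ H) - h ** 2) / (1.0 - h) ** 2)
    return center - half, center + half   # lower and upper CI endpoints
```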

95% CIs for bias

Coverage of 95% CIs for bias

Bias bounds for $\hat f_{\ell_2}$
For fixed $\lambda$: $\mathrm{Var}(\hat f_{\ell_2}) = \sigma^2\, G(\lambda)^{-1} A^t A\, G(\lambda)^{-1}$
$\mathrm{Bias}(\hat f_{\ell_2}) = E(\hat f_{\ell_2}) - f = -B(\lambda)$, where $G(\lambda) = A^t A + \lambda^2 D^t D$ and $B(\lambda) = \lambda^2 G(\lambda)^{-1} D^t D f$
Median bias: $\mathrm{Bias}_M(\hat f_{\ell_2}) = \mathrm{median}(\hat f_{\ell_2}) - f$
For $\lambda$ estimated: $\mathrm{Bias}_M(\hat f_{\ell_2}) \approx -\mathrm{median}[\, B(\hat\lambda)\, ]$

Hölder's inequality:
$|B(\hat\lambda)_i| \le \hat\lambda^2\, U_{p,i}(\hat\lambda)\, \|Df\|_q$ or $|B(\hat\lambda)_i| \le \hat\lambda^2\, U^w_{p,i}(\hat\lambda)\, \|\beta\|_q$
with $U_{p,i}(\hat\lambda) = \|D\, G(\hat\lambda)^{-1} e_i\|_p$ or $U^w_{p,i}(\hat\lambda) = \|W D^t D\, G(\hat\lambda)^{-1} e_i\|_p$
Plots of $U_{p,i}(\hat\lambda)$ and $U^w_{p,i}(\hat\lambda)$ vs $x_i$

Assessing fit
Compare $y$ to $A\hat f + \varepsilon$
Parametric/nonparametric bootstrap samples $y^*$:
(i) draw $\varepsilon_i^* \sim N(0, \hat\sigma^2 I)$ (parametric) or sample $\varepsilon_i^*$ from the corrected residuals $r_c$ (nonparametric)
(ii) set $y_i^* = A\hat f + \varepsilon_i^*$
Compare statistics of $y$ and $\{y_1^*, \dots, y_B^*\}$ (blue, red)
Compare statistics of $r_c$ and $N(0, \hat\sigma^2 I)$ (black)
Compare parametric vs nonparametric results
Example: simulations for $A\hat f_{\ell_2}$ with $y^{(1)}$ (good), $y^{(2)}$ (with skewed noise), $y^{(3)}$ (biased)

Example test statistics:
$T_1(y) = \min(y)$, $T_2(y) = \max(y)$
$T_k(y) = \frac{1}{n} \sum_{i=1}^n \big( y_i - \mathrm{mean}(y) \big)^k / \mathrm{std}(y)^k$, $k = 3, 4$
$T_5(y)$ = sample median of $y$, $T_6(y) = \mathrm{MAD}(y)$
$T_7(y)$ = 1st sample quartile of $y$, $T_8(y)$ = 3rd sample quartile of $y$
$T_9(y)$ = number of runs above/below the median of $y$
$P_j = P\big( T_j(y^*) \ge T_j^o \big)$, $T_j^o = T_j(y)$
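
A minimal sketch computing a few of these statistics (the runs statistic $T_9$ is omitted) and their bootstrap tail probabilities, given the observed y and bootstrap replicates y_star (one per row):

```python
import numpy as np

def fit_statistics(y):
    z = (y - y.mean()) / y.std()
    return np.array([
        y.min(), y.max(),
        np.mean(z ** 3), np.mean(z ** 4),                 # skewness- and kurtosis-type
        np.median(y), np.median(np.abs(y - np.median(y))),
        np.quantile(y, 0.25), np.quantile(y, 0.75),
    ])

def bootstrap_p_values(y, y_star):
    T_obs = fit_statistics(y)
    T_boot = np.array([fit_statistics(ys) for ys in y_star])
    return (T_boot >= T_obs).mean(axis=0)                 # tail probabilities P_j
```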

Assessing fit: Bayesian case
Hierarchical model:
$y \mid f, \sigma, \gamma \sim N(Af, \sigma^2 I)$
$f \mid \gamma \propto \exp\big( -f^t D^t D f / 2\gamma^2 \big)$
$(\sigma, \gamma) \sim \pi$
with $E(f \mid y) = \big( A^t A + \frac{\sigma^2}{\gamma^2} D^t D \big)^{-1} A^t y$
Posterior mean = $\hat f_{\ell_2}$ for $\gamma = \sigma/\lambda$
Sample $(f^*, \sigma^*)$ from the posterior of $(f, \sigma)$ given $y$, and simulate data $y^* = Af^* + \sigma^* \varepsilon^*$ with $\varepsilon^* \sim N(0, I)$
Explore the consistency of $y$ with the simulated $y^*$

Simulating $f$
Modify the simulations from $y_i^* = A\hat f + \varepsilon_i^*$ to $y_i^* = A f_i^* + \varepsilon_i^*$, with $f_i^*$ similar to $\hat f$
Frequentist simulation of $f$? One option: take the $f_i^*$ from historical data (a training set)
Check the relative MSE for the different $f_i^*$

Bayesian or frequentist?
Results have different interpretations
Frequentists can use Bayesian methods to derive procedures; Bayesians can use frequentist methods to evaluate procedures
The two approaches give consistent results for small problems with lots of data, but possibly very different results for complex, large-scale problems
Theorems do not address the relation of theory to reality

Readings
Basic Bayesian methods:
Carlin, JB & Louis, TA, 2008, Bayesian Methods for Data Analysis (3rd Edn.), Chapman & Hall
Gelman, A, Carlin, JB, Stern, HS & Rubin, DB, 2003, Bayesian Data Analysis (2nd Edn.), Chapman & Hall
Bayesian methods for inverse problems:
Calvetti, D & Somersalo, E, 2007, Introduction to Bayesian Scientific Computing: Ten Lectures on Subjective Computing, Springer
Kaipio, J & Somersalo, E, 2004, Statistical and Computational Inverse Problems, Springer
Stuart, AM, 2010, Inverse Problems: A Bayesian Perspective, Acta Numerica, DOI:10.1017/S0962492904
Discussion of frequentist vs Bayesian statistics:
Freedman, D, 1995, Some issues in the foundation of statistics, Foundations of Science, 1, 19-39
Tutorial on frequentist and Bayesian methods for inverse problems:
Stark, PB & Tenorio, L, 2011, A primer of frequentist and Bayesian inference in inverse problems. In Computational Methods for Large-Scale Inverse Problems and Quantification of Uncertainty, pp. 9-32, Wiley
Statistics and inverse problems:
Evans, SN & Stark, PB, 2003, Inverse problems as statistics, Inverse Problems, 18, R55
Marzouk, Y & Tenorio, L, 2012, Uncertainty Quantification for Inverse Problems
O'Sullivan, F, 1986, A statistical perspective on ill-posed inverse problems, Statistical Science, 1, 502-527
Tenorio, L, Andersson, F, de Hoop, M & Ma, P, 2011, Data analysis tools for uncertainty quantification of inverse problems, Inverse Problems, 27, 045001
Wahba, G, 1990, Spline Models for Observational Data, SIAM
Special section in Inverse Problems: Volume 24, Number 3, June 2008
Mathematical statistics:
Casella, G & Berger, RL, 2001, Statistical Inference, Duxbury Press
Schervish, ML, 1996, Theory of Statistics, Springer