Inverse problems and uncertainty quantification in remote sensing

1 / 38 Inverse problems and uncertainty quantification in remote sensing. Johanna Tamminen, Finnish Meteorological Institute, johanna.tamminen@fmi.fi. ESA Earth Observation Summer School on Earth System Monitoring and Modeling, July 30 - Aug 11, 2012, Frascati. 30.7.2012

2 / 38 Contents of the three lectures. Monday: Introduction to inverse problems and uncertainty quantification in atmospheric remote sensing. Tuesday: Introduction to the Markov chain Monte Carlo method for estimating uncertainties. Wednesday: More examples of how to characterize uncertainties in remote sensing (including modelling uncertainties).

3 / 38 Contents of this lecture. Introduction to inverse problems in remote sensing. Uncertainties and random variables. Bayesian approach to solving inverse problems. Example in atmospheric remote sensing: GOMOS.

4 / 38 Satellite remote sensing. Satellite remote sensing has become an important way to monitor and study our environment: land, vegetation, oceans, snow, ice, atmosphere, ... Global observations from pole to pole. Continuous observations. Operational need for data: monitoring, forecasting, support in emergency situations, ... Support for monitoring the effects of international regulations and for decision making. Research: continuous time series, combination of measurements, comparison with models, ...

5 / 38 Satellite remote sensing and inverse problems. Remote sensing measurements are typically indirect measurements. Data processing involves solving inverse problems (data retrieval). Physical modelling of the measurements is often complicated: typically nonlinear.

6 / 38 Retrieval in practice. Operational algorithms need to be fast, robust and reliable. Often simplifications and assumptions are needed. Typically additional information is also needed to solve the problem (ill-posed problem). It is important to characterize the uncertainties and the validity of the simplifications and assumptions: Uncertainty Quantification (UQ).

7 / 38 Uncertainty Quantification (UQ). Uncertainty quantification: characterizing the errors and uncertainties, and reducing the uncertainties if possible. UQ is becoming more and more important in environmental sciences. In remote sensing UQ is particularly important for: combining data from different sources, assimilation, comparison with models, model discrepancy, time series analysis and trends, supporting decision making, forecasting. UQ is a growing area of research: theory, computational methods, simulations.

8 / 38 This lecture series: UQ using the Bayesian approach. The Bayesian formulation gives natural tools to characterize the impact of different error sources. It allows including additional information in the retrieval in a natural way. In atmospheric remote sensing the Optimal Estimation algorithm by C. Rodgers, which is based on the Bayesian formulation, is used extensively.

9 / 38 Forward problem: $y = f(x)$, where $y \in \mathbb{R}^m$ is the unknown variable, $x \in \mathbb{R}^n$ are the measurements / known parameters, and $f$ is the function describing the relationship between the unknown variable and the known parameters.

10 / 38 Inverse problem: $y = f(x, b)$, where $y \in \mathbb{R}^m$ are the measurements (observations, data), $x \in \mathbb{R}^n$ are the unknown parameters (the unknown state) in which we are interested, $f$ is the function describing the relationship between the measurements and the unknown parameters (the forward model), and $b$ are known model parameters.

11 / 38 Linear case - example. Inverse problem: find $x$ when $y$ is measured and $y = f(x)$. When the dependence $f$ is linear, we can describe the problem in matrix form: $y = Ax$, where $A$ is an $m \times n$ matrix. A simple solution would now be $x = A^{[-1]} y$, where $A^{[-1]}$ is, in some sense, the inverse of the matrix $A$. In practice it is often more complicated, e.g. because $A$ is not square or is badly conditioned.
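
As a hedged illustration of why the naive inverse can fail, the following Python sketch (all numbers invented) applies a direct solve to a nearly singular forward matrix:

```python
import numpy as np

# Hypothetical illustration (not from the lecture): a nearly singular
# forward matrix A makes the naive "A^[-1] y" solution useless.
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])                    # almost singular forward model
x_true = np.array([2.0, 3.0])
y = A @ x_true + 1e-3 * rng.standard_normal(2)   # noisy measurement

x_naive = np.linalg.solve(A, y)    # direct inverse: amplifies the noise
print(x_naive)                     # far from x_true
print(np.linalg.cond(A))           # huge condition number explains why
```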

12 / 38 Well-posed and ill-posed problems. Hadamard's definition (1902) of a well-posed problem: (i) The solution exists. (ii) The solution is unique. (iii) The solution hardly changes if the parameters (or initial conditions) are slightly changed. If at least one of the criteria above is not fulfilled, the problem is said to be ill-posed. In remote sensing the problems are typically ill-posed, as there is not enough data to yield unique solutions. Therefore we need additional information to solve the problem.

13 / 38 Solving an ill-posed problem. Typical ways of introducing additional information into inverse problems: decreasing the size of the problem (discretization); regularization methods (e.g. Tikhonov regularization: assuming smoothness); the Bayesian approach: describing previous knowledge as a priori information. A Tikhonov-regularized version of the linear example is sketched below.
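
A minimal sketch of Tikhonov regularization applied to the same toy linear problem; the regularization parameter alpha is an arbitrary choice for illustration, not a value from the lecture:

```python
import numpy as np

def tikhonov_solve(A, y, alpha):
    """Tikhonov-regularized solution: argmin ||Ax - y||^2 + alpha ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

# Reusing the ill-conditioned toy example: a small amount of regularization
# trades a little bias for a large reduction in noise amplification.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
y = A @ np.array([2.0, 3.0]) + 1e-3 * np.random.default_rng(0).standard_normal(2)
print(tikhonov_solve(A, y, alpha=1e-4))
```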

14 / 38 Measurement error, noise. In practice measurements always include noise ($\varepsilon$): $y = f(x) + \varepsilon$ (assuming that the noise is additive). By repeating the measurements we get different answers.

15 / 38 Measurements as random variables. It is natural to consider measurements as random variables. Intuitively: random variables take probable values often and less probable values only occasionally. These realizations form a distribution. Let $X$ be a random variable. The distribution $\pi_X$ of its realisations can be described as an integral
$$\pi_X(A) = P\{X \in A\} = \int_A p(x)\,dx,$$
where $p(x)$ is the probability density function (pdf) of the random variable $X$.

16 / 38 Gaussian distributions. The 1-D Gaussian (normal) distribution $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ has pdf
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
Multivariate Gaussian distribution $N(\mu, C)$: the $n$-dimensional multivariate Gaussian distribution with mean $\mu$ and covariance matrix $C$ has pdf
$$p(x) = (2\pi)^{-n/2}\, |C|^{-1/2}\, e^{-\frac{1}{2}(x-\mu)^T C^{-1} (x-\mu)},$$
where $C$ is an $n \times n$ positive definite symmetric matrix. If the components are independent, with $C = \sigma^2 I$, the density simplifies to
$$p(x) = (2\pi)^{-n/2}\, \sigma^{-n}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu_i)^2}.$$
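
As a quick sanity check of the multivariate Gaussian density formula, the following sketch (made-up mean and covariance) compares a direct evaluation with scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Values and dimensions are made up for illustration only.
mu = np.array([0.0, 1.0])
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric positive definite covariance
x = np.array([0.3, 0.8])

n = len(mu)
d = x - mu
pdf_manual = (2 * np.pi) ** (-n / 2) * np.linalg.det(C) ** (-0.5) * \
    np.exp(-0.5 * d @ np.linalg.solve(C, d))
print(pdf_manual, multivariate_normal(mu, C).pdf(x))   # the two agree
```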

17 / 38 Statistical inverse problem. Because the measurements are random variables, it is natural to consider also the unknown $X$ as a random variable. Now the inverse problem is to find the conditional distribution of $X$ given the measurement $Y = y$:
$$\pi_{X|Y}(A) = P\{X \in A \mid Y = y\} = \int_A p(x \mid y)\,dx,$$
where $p(x \mid y)$ is the pdf of the conditional distribution. The pdf $p(x \mid y)$ describes the probability that the unknown $X = x$ given the measurement $Y = y$. This is what we are looking for!

18 / 38 Conditional probability. From elementary probability: the joint probability that both events $A$ and $B$ take place is $P(A \cap B) = P(A \mid B)\,P(B)$. The conditional probability is therefore
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)},$$
assuming that $P(B) \neq 0$.

19 / 38 Bayesian solution to an inverse problem. The same idea of elementary probability is used in the Bayesian solution to an inverse problem. Bayes' formula:
$$p(x \mid y) = \frac{p_{lh}(y \mid x)\, p_{pr}(x)}{p(y)},$$
where $p(x \mid y)$ is the pdf of the a posteriori distribution; $p_{pr}(x)$ is the pdf of the a priori distribution and describes prior knowledge of $x$; $p_{lh}(y \mid x)$ is the pdf of the likelihood distribution and characterizes the dependence of the measurements on the unknown; and $p(y) \neq 0$ is a scaling factor (constant). A small numerical illustration follows below.
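
A minimal, hypothetical scalar example of the Bayes formula: one noisy measurement of an unknown x with a Gaussian prior and Gaussian likelihood, with the posterior evaluated on a grid; all numbers are illustrative only:

```python
import numpy as np

# Toy scalar inverse problem: f(x) = x, one measurement y of the unknown x.
x = np.linspace(-5.0, 10.0, 2001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * (x - 0.0) ** 2 / 2.0 ** 2)            # p_pr(x) = N(0, 2^2)
y_meas, sigma = 3.0, 1.0
likelihood = np.exp(-0.5 * (y_meas - x) ** 2 / sigma ** 2)  # p_lh(y|x)

posterior = prior * likelihood
posterior /= posterior.sum() * dx          # normalize: divide by p(y)
print("posterior mean:", (x * posterior).sum() * dx)
```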

20 / 38 The scaling factor is obtained by integrating over the state space:
$$p(y) = \int p_{lh}(y \mid x)\, p_{pr}(x)\,dx.$$
It is typically not considered explicitly, but we will come back to it later in the lectures.

21 / 38 A posteriori distribution. The Bayes formula
$$p(x \mid y) = \frac{p_{lh}(y \mid x)\, p_{pr}(x)}{p(y)}$$
describes the solution of an inverse problem. It is a natural way of using all available information: it tells how to combine new measurements with our old prior knowledge ($p(x) \rightarrow p(x \mid y)$). The solution is a distribution: $p(x \mid y) \propto p(y \mid x)\, p(x)$.

22 / 38 Toy examples of 2-dimensional posterior distributions (figures).

23 / 38 How to characterize a solution which is a distribution? Maximum a posteriori (MAP) estimate - the most probable value:
$$\hat{x}_{MAP} = \arg\max_x\, p(x \mid y).$$
Expectation:
$$\bar{x} = \mathbb{E}_{X|Y}[x] = \int_{\mathbb{R}^n} x\, p(x \mid y)\,dx.$$
The uncertainty of the estimates is described by the width of the distribution (figure: $\mathbb{E}[x]$ and MAP marked on the posterior pdf $p(x \mid y)$). Shape of the distribution, $k$th moment: $\mathbb{E}_{X|Y}[(x - \bar{x})^k]$; $k = 2$ variance, $k = 3$ skewness, $k = 4$ kurtosis. A numerical sketch of these summaries is given below.
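
A sketch of how these summaries can be computed from a posterior tabulated on a grid; the skewed toy posterior below is made up for illustration:

```python
import numpy as np

# Made-up, deliberately skewed 1-D posterior on a grid.
x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]
post = x ** 2 * np.exp(-x)            # unnormalized posterior p(x|y)
post /= post.sum() * dx               # normalize so it integrates to 1

x_map = x[np.argmax(post)]                        # maximum a posteriori
x_mean = (x * post).sum() * dx                    # expectation E[x|y]
var = ((x - x_mean) ** 2 * post).sum() * dx       # 2nd central moment (variance)
skew = ((x - x_mean) ** 3 * post).sum() * dx / var ** 1.5
print(x_map, x_mean, np.sqrt(var), skew)          # MAP != mean for a skewed pdf
```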

24 / 38 Linear inverse problem. Linear problem: $y = Ax + \varepsilon$. Assume Gaussian noise $\varepsilon \sim N(0, C_y)$ and prior information $p_{pr}(x) = N(x_0, C_{x_0})$. In this special case the posterior distribution is also normally distributed,
$$p(x \mid y) \propto e^{-\frac{1}{2}(x - \bar{x})^T Q (x - \bar{x})},$$
where the expectation and the covariance matrix are
$$\bar{x} = Q^{-1}\left(C_{x_0}^{-1} x_0 + A^T C_y^{-1} y\right), \qquad Q^{-1} = \left(C_{x_0}^{-1} + A^T C_y^{-1} A\right)^{-1}.$$
In this case the posterior estimate $\hat{x} = \bar{x}$ and the posterior covariance matrix $C_{\hat{x}} = Q^{-1}$ fully describe the posterior distribution.
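
A small sketch of the linear Gaussian formulas above in Python; the forward matrix, prior and noise levels are all invented:

```python
import numpy as np

# Toy linear Gaussian inverse problem: y = A x + eps.
rng = np.random.default_rng(1)
m, n = 20, 3
A = rng.standard_normal((m, n))                # forward model
x_true = np.array([1.0, -2.0, 0.5])
C_y = 0.1 ** 2 * np.eye(m)                     # noise covariance
x0, C_x0 = np.zeros(n), 4.0 * np.eye(n)        # prior mean and covariance
y = A @ x_true + 0.1 * rng.standard_normal(m)

Q = np.linalg.inv(C_x0) + A.T @ np.linalg.solve(C_y, A)   # posterior precision
C_post = np.linalg.inv(Q)                                  # posterior covariance
x_post = C_post @ (np.linalg.solve(C_x0, x0) + A.T @ np.linalg.solve(C_y, y))
print(x_post)                    # posterior mean, close to x_true
print(np.sqrt(np.diag(C_post)))  # 1-sigma posterior uncertainties
```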

25 / 38 MAP and ML estimate. Since $p(x \mid y) \propto p(y \mid x)\, p(x)$, assume a non-informative prior distribution $p(x) = c$. Now the maximum a posteriori (MAP) estimate is the same as the maximum likelihood (ML) estimate:
$$\hat{x}_{MAP} = \arg\max_x\, p(x \mid y) = \arg\max_x\, p(y \mid x) = \hat{x}_{ML}.$$
In natural sciences a non-informative prior is often attractive, as it allows solutions that are based purely on the measurements. However, such solutions may be non-physical, so constraints like positivity often need to be imposed.

26 / 38 Special case: Gaussian noise. Assume additive, normally distributed measurement noise $\varepsilon \sim N(0, C_y)$. The likelihood function is
$$p_{lh}(y \mid x) = \frac{1}{(2\pi)^{m/2} |C_y|^{1/2}} \exp\left(-\frac{1}{2}(f(x) - y)^T C_y^{-1} (f(x) - y)\right).$$
Assuming a non-informative prior density, the posterior distribution is proportional to the likelihood function only, $p(x \mid y) \propto p(y \mid x)$, and the MAP estimate $\hat{x}$ equals the ML estimate, which in turn equals the minimizer of the sum of squared residuals function
$$SSR(x) = (f(x) - y)^T C_y^{-1} (f(x) - y).$$
This formula has been the basis of traditional parameter estimation, which concentrates simply on minimizing the SSR function (the least squares solution).
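
For a linear forward model, minimizing the SSR function is an ordinary least squares problem after whitening by the noise standard deviations; a sketch with made-up values:

```python
import numpy as np

# Toy linear model f(x) = A x with diagonal noise covariance C_y = diag(sigma^2).
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 2))
x_true = np.array([0.7, -1.3])
sigma = 0.05 * np.ones(30)
y = A @ x_true + sigma * rng.standard_normal(30)

Aw, yw = A / sigma[:, None], y / sigma          # whitened system
x_ls, *_ = np.linalg.lstsq(Aw, yw, rcond=None)  # minimizes the SSR
print(x_ls)                                     # ML = weighted least squares estimate
```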

27 / 38 In practice... Prior: typically, prior information is defined as Gaussian to simplify computations. In the linear Gaussian case the problem reduces to simply solving a weighted least squares problem. For non-linear and/or non-Gaussian problems it is typically assumed that the solution is close to Gaussian.

28 / 38 Different techniques to solve nonlinear/non-Gaussian problems. Linearization and Gaussian assumption: linearize the problem. If the noise after linearization is close to Gaussian, then linear Gaussian theory and simple matrix inversions can be applied (assuming that the prior is also Gaussian).

29 / 38 Non-linear optimization - searching for the maximum a posteriori (MAP) or maximum likelihood (ML) estimate: iterative methods search for the MAP (ML) estimate and assume linearity around it; the uncertainty is approximated with a covariance matrix. Some commonly used methods: the Levenberg-Marquardt iterative algorithm (see e.g. Press et al., Numerical Recipes: The Art of Scientific Computing), a combination of steepest descent (gradient) and Newton iteration (approximation with a quadratic function); ready-made algorithms are available. The MAP (ML) estimate $\hat{x}$ is computed, the posterior is assumed to be Gaussian close to the estimate $\hat{x}$, and the posterior covariance matrix $C_{\hat{x}}$ is computed. In atmospheric research an iterative method called Optimal Estimation, based on Bayes' theorem, is typically used (see Rodgers 2000: Inverse Methods for Atmospheric Sounding). A sketch with a generic least-squares solver is given below.
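
Below is a hedged sketch of such an iterative fit using SciPy's generic least_squares solver (its "lm" method is a Levenberg-Marquardt implementation). The exponential-decay forward model and all values are invented, and the Gaussian posterior approximation around the estimate is formed from the Jacobian of the whitened residuals:

```python
import numpy as np
from scipy.optimize import least_squares

def forward(x, t):
    """Toy nonlinear forward model f(x): exponential decay with two parameters."""
    return x[0] * np.exp(-x[1] * t)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 40)
x_true = np.array([2.0, 0.8])
sigma = 0.05
y = forward(x_true, t) + sigma * rng.standard_normal(t.size)

residuals = lambda x: (forward(x, t) - y) / sigma        # whitened residuals
fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")

# Gaussian approximation of the posterior around the estimate:
# covariance ~ (J^T J)^(-1) for whitened residuals.
J = fit.jac
C_post = np.linalg.inv(J.T @ J)
print(fit.x, np.sqrt(np.diag(C_post)))
```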

30 / 38 GOMOS/Envisat: stellar occultation instrument. GOMOS - Global Ozone Monitoring by Occultation of Stars. One of the three atmospheric instruments on board ESA's Envisat satellite. Launched in 2002; measurements until April 2012, when contact with Envisat was lost. Measurement principle: fingerprints of atmospheric gases in transmission spectra; 20-40 stellar occultations/orbit; 50-100 ray-path measurements/star from 100 km down to 10 km. Self-calibrating: $T_{\lambda,l} = I_{\lambda,l} / I^{star}_{\lambda}$.

31 / 38 Cross sections of gases relevant to GOMOS (UV-VIS) (figure).

32 / 38 Absorption by gases. Transmission spectra measured by GOMOS at descending altitudes from 100 km down to 5 km.

33 / 38 Refraction (figure).

34 / 38 Modelling. The transmission at wavelength $\lambda$, along the ray path $l$, includes $T^{abs}_{\lambda,l}$ due to absorption and scattering by gases and $T^{ref}_{\lambda,l}$ due to refractive attenuation and scintillation. $T^{abs}_{\lambda,l}$ is given by Beer's law,
$$T^{abs}_{\lambda,l} = \exp\left(-\int_l \sum_{gas} \alpha^{gas}_{\lambda}(T(z(s)))\, \rho_{gas}(z(s))\,ds\right),$$
where the integral is over the ray path $l$. The temperature-dependent cross sections $\alpha^{gas}_{\lambda}$ are assumed to be known from laboratory measurements. The inversion problem is to estimate the gas profiles $\rho_{gas}(z)$ from the measurements
$$y_{\lambda,l} = T^{abs}_{\lambda,l}\, T^{ref}_{\lambda,l} + \epsilon_{\lambda,l}.$$
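
A minimal sketch of evaluating the Beer's law transmission for a single hypothetical gas along a discretized ray path; the geometry, density profile and cross section are toy values, not GOMOS quantities:

```python
import numpy as np

def transmission(alpha, rho, z_of_s, ds):
    """T = exp(-integral of alpha * rho(z(s)) ds) along the discretized path."""
    tau = np.sum(alpha * rho(z_of_s) * ds)       # optical depth for one gas
    return np.exp(-tau)

s = np.linspace(0.0, 400e3, 2000)                # path coordinate [m]
ds = s[1] - s[0]
z_of_s = 30e3 + (s - 200e3) ** 2 / 1.5e9         # toy tangent-height geometry [m]
rho = lambda z: 5e18 * np.exp(-z / 7e3)          # toy density profile [1/m^3]
alpha = 1e-25                                    # toy cross section [m^2]
print(transmission(alpha, rho, z_of_s, ds))
```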

35 / 38 Operational algorithm: assumptions. Model:
$$T^{abs}_{\lambda,l}(\rho) = \exp\left(-\int_l \sum_{gas} \alpha^{gas}_{\lambda}\, \rho_{gas}(z(s))\,ds\right),$$
where $\lambda$ runs over the measured wavelengths and $l = l_1, \ldots, l_M$. From now on we consider the data to be simply $y_{\lambda,l} = T^{abs}_{\lambda,l} + \epsilon_{\lambda,l}$. Noise is Gaussian and uncorrelated between different altitudes and wavelengths. $T^{ref}_{\lambda,l}$ (scintillation and dilution) is obtained from separate measurements (scintillation/turbulence effects not fully corrected). The temperature dependence of the cross sections can be modelled with effective cross sections (wavelength dependence only). Temperature is obtained from ECMWF.

36 / 38 Operational two-step algorithm. Since the cross sections are assumed constant on each ray path and the noise uncorrelated, the inversion separates into
$$T^{abs}_{\lambda,l} = \exp\left(-\sum_{gas} \alpha^{gas}_{\lambda,l}\, N^{gas}_l\right), \quad \text{with} \quad N^{gas}_l = \int_l \rho_{gas}(z(s))\,ds, \quad l = l_1, \ldots, l_M,$$
where $\lambda$ runs over the measured wavelengths. Two-step approach: spectral inversion - retrieval of horizontally integrated densities of several constituents, but each altitude separately; vertical inversion - retrieval of the full profile for each constituent separately.

37 / 38 Spectral inversion. The line density vector $N_l = (N^{gas}_l)$, $gas = 1, \ldots, n_{gas}$, with the posterior density
$$P(N_l \mid y_l) \propto e^{-\frac{1}{2}(T_l(N_l) - y_l)^T C_l^{-1} (T_l(N_l) - y_l)}\, p(N_l),$$
is fitted to the spectral data $y_l = (y_{\lambda,l})$, with $C_l = \mathrm{diag}(\sigma^2_{\lambda,l})$ over the measured wavelengths, separately for each ray path $l$. Prior: a fixed prior for the neutral density from ECMWF; for gases and aerosols a non-informative prior. The non-linear problem is solved iteratively using the Levenberg-Marquardt algorithm. The MAP (= ML) estimate $N_l$ is obtained, with its uncertainty described by its covariance matrix $C_{N_l}$, for all ray paths (altitudes) $l$. A toy version of this fit is sketched below.
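
A toy version of the spectral inversion for a single ray path, fitting the line densities of two hypothetical absorbers with a Levenberg-Marquardt least-squares solver; the cross sections, noise level and wavelength grid are all made up:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
n_lambda, n_gas = 200, 2
alpha = np.abs(rng.standard_normal((n_lambda, n_gas))) * 1e-19  # toy cross sections [cm^2]
N_true = np.array([3.0, 0.5])                                   # line densities in units of 1e18 cm^-2

def T_model(N_scaled):
    """Transmission T_l(N_l) for all wavelengths; N in units of 1e18 cm^-2."""
    return np.exp(-(alpha @ (N_scaled * 1e18)))

sigma = 0.01
y = T_model(N_true) + sigma * rng.standard_normal(n_lambda)     # noisy spectrum y_l

fit = least_squares(lambda N: (T_model(N) - y) / sigma,
                    x0=np.array([1.0, 0.1]), method="lm")
J = fit.jac
C_N = np.linalg.inv(J.T @ J)                 # covariance of the fitted line densities
print(fit.x, np.sqrt(np.diag(C_N)))          # MAP (=ML) estimate and 1-sigma uncertainties
```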

38 / 38 Point estimate demo (figure). BLUE - GOMOS measurement, RED - GOMOS iterative fit.