Parameter estimation: A new approach to weighting a priori information

J.L. Mead
Department of Mathematics, Boise State University, Boise, ID 83725-1555
E-mail: jmead@boisestate.edu

Abstract. We propose a new approach to weighting initial parameter misfits in a least squares optimization problem for linear parameter estimation. Parameter misfit weights are found by solving an optimization problem which ensures the penalty function has the properties of a χ² random variable with n degrees of freedom, where n is the number of data. This approach differs from others in that the weights found by the proposed algorithm vary along a diagonal matrix rather than remain constant. In addition, it is assumed that data and parameters are random, but not necessarily normally distributed. The proposed algorithm successfully solved three benchmark problems, one with discontinuous solutions. Solutions from a more idealized discontinuous problem show that the algorithm can successfully weight initial parameter misfits even though the two-norm typically smooths solutions. For all test problems, sample solutions show that results from the proposed algorithm can be better than those found using the L-curve and generalized cross-validation. In the cases where the parameter estimates are not as accurate, their corresponding standard deviations or error bounds correctly identify their uncertainty.

AMS classification scheme numbers: 65F22, 93E24, 62M40

Submitted to: Inverse Problems

1. Introduction

Parameter estimation is an element of inverse modeling in which measurements or data are used to infer parameters in a mathematical model. Parameter estimation is necessary in many applications such as biology, astronomy, engineering, Earth science, finance, and medical and geophysical imaging. Inversion techniques for parameter estimation are often classified in two groups: deterministic or stochastic.
Both deterministic and stochastic approaches must incorporate the fact that there are uncertainties or errors associated with parameter estimation [3], [13]. For example, in a deterministic approach such as Tikhonov regularization [7], or in stochastic approaches which use frequentist or Bayesian probability theory, it is assumed that data contain noise. The difference between deterministic and stochastic approaches is that in the

former it is assumed there exist true parameter values for a given set of data, while in the latter the data, the parameter values or both are random variables [4].

Parameter estimation can be viewed as an optimization problem in which an objective function representing data misfit is minimized in a given norm [1]. From a deterministic point of view the two-norm, i.e. a quadratic objective function, is the most attractive mathematically because the minimum can be written explicitly in closed form. From the stochastic point of view this choice of two-norm is statistically the most likely solution if the data are normally distributed. However, this estimate is typically less accurate if the data are not normally distributed or there are outliers in the data [6].

A more complete optimization problem will include a statement about the parameter misfit, in addition to the data misfit. This statement could be a deterministic bound such as a positivity constraint on the parameters, or a regularization term which ensures that the first or second derivative of the parameters is smooth. When parameter misfits are included in stochastic approaches, their corresponding a priori probability distributions must be specified. The advantage and disadvantage of the stochastic viewpoint is that prior information about the probability distribution of data or parameters must be specified. A priori information for the distribution of data is tractable because data can (theoretically) be collected repeatedly in order to obtain a sample from which one can infer its probability distribution. A priori inference of the parameter probability distribution is less reliable than that for the data because it must rely on information from the uncertain data [3]. Whichever way one views the problem (positivity constraints, regularization, or probability distributions), typically a weighted objective function is minimized.
A significant difference between methods and their solutions lies in how the weights are chosen. An experimental study in [5] compares deterministic and stochastic approaches to seismic inversion for characterization of a thin-bed reservoir. Their conclusion is that deterministic approaches are computationally cheaper but the results are only good enough for identifying general trends and large features. They state that stochastic inversion is more advantageous because results have superior resolution and offer uncertainty estimates. The experimental results in [5] which suggest that the stochastic approach is more accurate than the deterministic approach may occur because the stochastic approach better weights the data and parameter misfits. Most deterministic approaches such as positivity constraints or regularization use only constant or simple weights on the parameter misfits. On the other hand, stochastic approaches which specify prior normal or exponential probability distributions weight the parameter misfit with an inverse covariance matrix. Weighting with accurate non-constant, dense matrices is desirable, but it implies that there is good a priori information. How do we obtain this information, i.e. how do we find accurate weights on the data and parameter misfits? In this work we use the following piece of a priori information to better weight the parameter misfit: The minimum value of a quadratic cost function representing the data

and parameter misfit is a χ² random variable with n degrees of freedom, where n is the number of data. For large n this is true regardless of the prior distributions of the data or parameters. For the linear problem, an explicit expression for the minimum value of the cost function is given as a function of the weight on the parameter misfit. Since the cost function follows a χ² distribution, this minimum value has known mean and variance. To calculate a weight on the parameter misfit, a new optimization problem is solved which ensures that the minimum of the cost function lies within the expected range.

In Section 2 we describe current approaches to solving linear discrete ill-posed problems. In Section 3 we describe the new approach and the corresponding algorithm, in Section 4 we show some numerical results, and in Section 5 we give conclusions and future work.

2. Linear Discrete Ill-posed Problems

In this work discrete ill-posed inverse problems of the form

d = Gm    (1)

are considered. Here d is an n-dimensional vector containing measured data, G is a forward modeling operator written as an n × m matrix, and m is an m-dimensional vector of unknown parameter values.

2.1. Deterministic approaches

Frequently it is the case that there is no value of m that satisfies (1) exactly. Simple and useful approximations may be found by solving

min_m ||d - Gm||_p^p.    (2)

The most common choices for p are p = 1, 2. If p = 2 this is least squares optimization, which is the simplest approach to analyze and statistically results in the most likely solution if the data are normally distributed. However, the least squares solution is typically not accurate if one datum is far from the trend. If p = 1, accurate solutions can still be found when there are a few data far from the trend. In addition, it is statistically the most likely solution if the data are exponentially distributed. As p increases from 2, the largest element of d - Gm is given successively larger weight [1].
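The robustness claim for p = 1 versus p = 2 can be seen in a minimal toy example (illustrative numbers, not from the paper): when G is a column of ones, the p-norm fit reduces to a scalar location estimate, with p = 2 giving the mean and p = 1 the median.

```python
import numpy as np

# Toy data with one outlier; G is a column of ones, so fitting
# min ||d - G m||_p reduces to a scalar location estimate.
d = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])

m_p2 = d.mean()      # p = 2 minimizer: pulled to 2.5 by the outlier
m_p1 = np.median(d)  # p = 1 minimizer: stays at 1.025, near the trend

print(m_p2, m_p1)
```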
Least squares solutions are the simplest to analyze mathematically because the value at which the minimum occurs can be stated explicitly. In other words,

min_m ||d - Gm||_2^2 = min_m (d - Gm)^T (d - Gm)    (3)

has a unique minimum occurring at

m̂_ls = (G^T G)^{-1} G^T d.    (4)
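A minimal numerical sketch of the closed-form solution (4); the operator G and data d below are illustrative stand-ins, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative forward operator and noisy data: d = G m_true + noise.
G = rng.standard_normal((20, 5))        # n = 20 data, m = 5 parameters
m_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
d = G @ m_true + 0.01 * rng.standard_normal(20)

# Equation (4): m_ls = (G^T G)^{-1} G^T d, via the normal equations.
m_ls = np.linalg.solve(G.T @ G, G.T @ d)

# In practice np.linalg.lstsq is preferred numerically;
# both give the same minimizer for this well-conditioned G.
m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)
assert np.allclose(m_ls, m_lstsq)
```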

However, the inverse solution is not that simple because typically G^T G is not invertible and the problem must be constrained or regularized. One common way to do this is Tikhonov regularization in the two-norm, where m is found by solving

min_m { ||d - Gm||_2^2 + λ ||L(m - m_0)||_2^2 }    (5)

with m_0 an initial parameter estimate (often taken to be 0), λ a yet to be determined regularization parameter, and L a smoothing operator possibly chosen to represent the first or second derivative. The optimization problem (5) can be written equivalently as a constrained minimization problem:

min_m ||d - Gm||_2 subject to ||L(m - m_0)||_2 ≤ δ.    (6)

In either formulation, (5) or (6), the optimization problem can be written as

min_m { (d - Gm)^T (d - Gm) + (m - m_0)^T λ L^T L (m - m_0) }.    (7)

When the optimization problem is written this way we see that the objective function is the sum of a data misfit and a parameter misfit. The function is normalized so that the data misfit has weight equal to one while the parameter misfit has weight λ L^T L. Thus λ L^T L represents an a priori ratio of weights on the data and parameter misfits. Typically, L is taken to be the identity or a first or second derivative operator. There are numerous approaches for choosing λ, including the L-curve [8], Morozov's discrepancy principle [2] and generalized cross-validation [9]. The minimum of (7) occurs at

m̂_rls = m_0 + (G^T G + λ L^T L)^{-1} G^T (d - G m_0).    (8)

This deterministic parameter estimate (8) from Tikhonov regularization does not use a priori knowledge other than specification of the form of L.

2.2. Stochastic approaches

Some stochastic formulations lead to an optimization problem similar to (5). The difference between these stochastic approaches and the corresponding deterministic ones is the way in which the weights on the data and parameter misfits are chosen.
For example, assume the data are random, following a normal distribution with probability density function

ρ(d) = const · exp{ -(1/2) (d - Gm)^T C_d^{-1} (d - Gm) },    (9)

with Gm the expected value of d and C_d the corresponding covariance matrix. In order to maximize the probability that the data were in fact observed, we find the m where the probability density is maximum. This is the maximum likelihood estimate, and it is the minimum of the argument in (9), i.e. the optimal parameters m are found by solving

min_m (d - Gm)^T C_d^{-1} (d - Gm).    (10)

This is the weighted least squares problem, and the minimum occurs at

m̂_wls = (G^T C_d^{-1} G)^{-1} G^T C_d^{-1} d.    (11)

Similar to (4), G^T C_d^{-1} G is typically not invertible. In this case the stochastic problem can be constrained or regularized by adding more a priori information. For example, assume the parameter values m are also random, following a normal distribution with probability density function

ρ(m) = const · exp{ -(1/2) (m - m_0)^T C_m^{-1} (m - m_0) },    (12)

with m_0 the expected value of m and C_m the corresponding covariance matrix. If the data and parameters are independent, then their joint distribution is ρ(d, m) = ρ(d) ρ(m). The maximum likelihood estimate of the parameters occurs when the joint probability density function is maximum, i.e. optimal parameter values are found by solving

min_m { (d - Gm)^T C_d^{-1} (d - Gm) + (m - m_0)^T C_m^{-1} (m - m_0) }.    (13)

The minimum occurs at

m̂ = m_0 + (G^T C_d^{-1} G + C_m^{-1})^{-1} G^T C_d^{-1} (d - G m_0).    (14)

The stochastic parameter estimate (14) has been found under the assumption that the data and parameters follow normal distributions and are independent and identically distributed.

2.3. Comparison between Deterministic and Stochastic approaches

Now we are in a position to point out similarities between Tikhonov regularization in the two-norm and a stochastic approach for normally distributed data and parameters. The two equations (8) and (14) are equivalent if C_d = I and C_m^{-1} = λ L^T L. Even though the two-norm smooths parameter estimates and assumes normal probability distributions, we can see under these simplifying assumptions how a stochastic approach would give better results. In the stochastic approach, dense a priori covariance matrices weight the data and parameter misfits better than the weights λ L^T L of Tikhonov regularization. As further explanation of the advantage of a stochastic approach over a deterministic one, consider the deterministic constraint ||m - m_0|| < λ.
When this constraint is applied, each element of m - m_0 is equally weighted, which implies that the error in the initial guess m_0 is the same for each element. Weighting in this manner will not be the best approach if a large temperature change or other such anomaly is sought. On the other hand, non-constant weights such as prior covariances C_m may vary along a diagonal matrix and hence give different weight to each element

Parameter estimation: A new approach to weighting a priori information 6 of m m. The weights can be further improve if the prior is non-iagonal because then correlation between initial estimate errors can be ientifie. Regarless of the norm in which the objective function is minimize or how the problem is formulate, the over-riing question is: How shoul weights on the terms in the objective function be chosen? In Section 3 we will show that if a quaratic cost function is use there is one more piece of a priori information that can be use to fin weights on parameter misfits. In Section 4 we will show that when using weights chosen in this manner the parameter estimates are not smoothe an it nee not be assume that the ata or parameters are normally istribute. 3. New approach Rather than choosing a eterministic approach which uses no a priori information or a stochastic approach which may use incorrect a priori information we focus on fining the best way to weight ata an parameter misfits in the two-norm using available a priori information. Consier parameter estimation reformulate in the following manner. Given ata, accurate mapping G an initial estimate m, fin m such that = Gm + ɛ (5) m = m + f (6) where ɛ an f are unknown errors in the ata an initial parameter estimates, respectively. We can view m an as ranom variables or alternatively, m as the true parameter estimate an as ata with error. In either case, parameter estimates are foun by minimizing the errors in the ata (ɛ) an initial estimates (f) in a weighte least squares sense, i.e. solve the following optimization problem min m { ( Gm) T W ( Gm) + (m m ) T W m (m m ) } (7) with W an W m weights (yet to be etermine) on the error in the ata an initial parameter estimates m, respectively. 3.. Choice of weights The weights on the ata misfit will be taken to be the inverse of the covariance of the ata, i.e. W = C. 
If the statistical properties of the data are not known, this weight can be estimated by collecting the same data repeatedly and calculating the sample standard deviation. We do assume, however, that the data are good and that Gm is the mean of d. To find the weights on the parameter misfit, W_m, we use the following theorem.

Theorem 1. Define J(m) by

J(m) = (d - Gm)^T C_d^{-1} (d - Gm) + (m - m_0)^T C_m^{-1} (m - m_0)    (18)

with d and m_0 stochastic. In addition, assume the errors in the data d and initial guess m_0 are not necessarily normally distributed but have mean zero and covariances C_d and C_m, respectively. Then as the number of data n approaches infinity, the minimum value of (18) is a random variable and its limiting distribution is the χ² distribution with n degrees of freedom.

Proof. Case 1: Elements of d and m_0 are independent and identically normally distributed. It is well known that under normality assumptions the first and second terms on the right hand side of (18) are χ² random variables with n - m and m degrees of freedom, respectively.

Case 2: Elements of d and m_0 are independent and identically distributed but not normally distributed. The minimum value of J(m) occurs at

m̂ = m_0 + (G^T C_d^{-1} G + C_m^{-1})^{-1} G^T C_d^{-1} (d - G m_0).    (19)

Rewrite the matrix in (19) by noting that

G^T C_d^{-1} (G C_m G^T + C_d) = G^T C_d^{-1} G C_m G^T + G^T = (G^T C_d^{-1} G + C_m^{-1}) C_m G^T,

thus

(G^T C_d^{-1} G + C_m^{-1})^{-1} G^T C_d^{-1} = C_m G^T (G C_m G^T + C_d)^{-1}.    (20)

Let h = d - G m_0 and P = G C_m G^T + C_d; then

m̂ = m_0 + C_m G^T P^{-1} h.    (21)

The minimum value of J(m) is

J(m̂) = (h - G C_m G^T P^{-1} h)^T C_d^{-1} (h - G C_m G^T P^{-1} h)
      + (C_m G^T P^{-1} h)^T C_m^{-1} (C_m G^T P^{-1} h).    (22)

Since C_d and C_m are covariance matrices, they are symmetric positive definite, and we can simplify (22) to

J(m̂) = h^T P^{-1} h.    (23)

In addition, since G is full rank, P and hence P^{-1} are symmetric positive definite, and we can define

h = P^{1/2} k,    (24)

where

k_j = Σ_{i=1}^n (P^{-1/2})_{ji} h_i.    (25)

If the errors in the data d and initial guess m_0 are normally distributed, then the h_j are normal and hence the k_j are normal by linearity. On the other hand, if the errors in the data d and initial guess m_0 are not normally distributed, then the central limit theorem states that

as n approaches infinity, k_j defined by (25) is a normally distributed random variable with zero mean and unit variance. Now writing (23) in terms of k we have

J(m̂) = k^T P^{1/2} P^{-1} P^{1/2} k    (26)
      = k^T k    (27)
      = k_1^2 + ... + k_n^2.    (28)

For large n the k_j are normally distributed random variables regardless of the distribution of the errors in d and m_0. Thus as n approaches infinity, J(m̂) is a χ² random variable with n degrees of freedom. This is described more generally in [1].

We have shown that the objective function in (17) is a χ² random variable with n degrees of freedom regardless of the prior distributions of the data and parameter misfits. Thus the weights on the parameter misfits will be found via an optimization problem which ensures that the objective function in (17) lies within a critical region of the χ² distribution with n degrees of freedom.

4. Algorithm

We can determine, within specified confidence intervals, values of W_m = C_m^{-1} in (17) that ensure J(m̂) given by (22) is a χ² random variable with n degrees of freedom when there is a large amount of data. The larger the confidence interval, the bigger the set of possible values of W_m. These values of W_m will be for a specified covariance matrix C_d for the errors in the data, and for a specified initial guess m_0. Thus for each matrix W_m there is an optimal parameter value m̂ uniquely defined by (21). If W_m = λ I then this algorithm is similar to approaches such as the L-curve for finding the regularization parameter λ in Tikhonov regularization. However, the advantage of this new approach arises when W_m is not a constant matrix and hence the weights on the parameter misfits vary. Moreover, when W_m has off-diagonal elements, correlation in initial parameter estimate errors can be modeled.
One advantage of viewing the optimization stochastically is that once the optimal parameter estimate is found, the corresponding uncertainty or covariance matrix for m̂ is given by [6]

cov(m̂) = W_m^{-1} - W_m^{-1} G^T P^{-1} G W_m^{-1},    (29)

with P = G W_m^{-1} G^T + C_d. In Section 5 the numerical estimates of m̂ are plotted with error bars which represent the standard deviations of these estimates. These standard deviations are the square roots of the diagonal elements of (29).
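A sketch of the error-bar computation via (29); the matrices below are illustrative, with W_m^{-1} taken diagonal as in the numerical tests:

```python
import numpy as np

rng = np.random.default_rng(3)

n, m = 40, 10
G = rng.standard_normal((n, m))
C_d = 0.01 * np.eye(n)            # illustrative data error covariance
W_m_inv = 0.25 * np.eye(m)        # W_m^{-1} = C_m, assumed diagonal here

# Equation (29): cov(m_hat) = W_m^{-1} - W_m^{-1} G^T P^{-1} G W_m^{-1},
# with P = G W_m^{-1} G^T + C_d.
P = G @ W_m_inv @ G.T + C_d
cov_m = W_m_inv - W_m_inv @ G.T @ np.linalg.solve(P, G @ W_m_inv)

# Standard deviations for the error bars: square roots of the diagonal.
std_m = np.sqrt(np.diag(cov_m))
```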

4.1. Confidence Intervals

For large n, J(m̂) has mean n and variance 2n. The (1 - α)·100% confidence interval for the mean is

P( -z_{α/2} < (J(m̂) - n)/√(2n) < z_{α/2} ) = 1 - α,    (30)

or

P( n - √(2n) z_{α/2} < J(m̂) < n + √(2n) z_{α/2} ) = 1 - α,    (31)

where z_{α/2} is the z-value on the normal curve above which we find an area of α/2. Thus for a given (1 - α) confidence interval we find values of W_m that ensure

n - √(2n) z_{α/2} < J(m̂) < n + √(2n) z_{α/2},

or

n - √(2n) z_{α/2} < h^T (G W_m^{-1} G^T + C_d)^{-1} h < n + √(2n) z_{α/2}.    (32)

By choosing a value of α = 0.05, for example, we are stating that we are 95% confident that the mean of J(m̂) is n. In this case the interval in which the cost function lies is [n - 2.77√n, n + 2.77√n], while for α = 0.01 it is [n - 3.64√n, n + 3.64√n]. For large n this is a small interval; thus in our experiments the optimization problem is to find a W_m such that

h^T (G W_m^{-1} G^T + C_d)^{-1} h = n.    (33)

4.2. Optimization

There is actually a set of feasible solutions W_m that ensure (33) holds. This boundary surface is well behaved as long as P is well-conditioned. The solution we seek is the one in which ||W_m^{-1}|| is minimized, because this will most likely result in the strongest regularization, i.e. a well-conditioned matrix to invert in (14). The norm we choose is the Frobenius norm, i.e.

||A||_F^2 = Σ_{i=1}^m Σ_{j=1}^n |a_ij|^2,

because it is continuously differentiable and it is equivalent to the 2-norm via

(1/√n) ||A||_F ≤ ||A||_2 ≤ ||A||_F.

If W_m is to represent the inverse covariance matrix C_m^{-1}, it must be symmetric positive definite; thus we define C_m = L L^T with L lower triangular. We also assume that G is full rank and n is large. However, these assumptions may be dropped if P is symmetric positive definite and the data are normally distributed. The corresponding algorithm is given in Table 1, and results from it are given in Section 5.
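In the special case W_m^{-1} = σ_m² I, condition (33) reduces to finding the root of the monotone scalar function h^T (σ_m² G G^T + C_d)^{-1} h - n. The sketch below is a simplified stand-in for the matrix optimization of Table 1 (not the paper's algorithm itself), using the idealized G = I setting of Section 5.1 and assumed noise scales; it recovers the prior variance by bisection:

```python
import numpy as np

rng = np.random.default_rng(4)

# Idealized setting: G = I, so h = d - G m_0 directly.
n = 200
G = np.eye(n)
sd_d, sd_m = 0.05, 0.5                 # assumed data / prior error scales
C_d = sd_d**2 * np.eye(n)
h = rng.normal(0.0, np.sqrt(sd_d**2 + sd_m**2), n)

def F(s2):
    """h^T (s2 * G G^T + C_d)^{-1} h - n; monotone decreasing in s2."""
    P = s2 * G @ G.T + C_d
    return h @ np.linalg.solve(P, h) - n

# Bisection for the root of (33); the bracket [0, 10] makes F change sign here.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if F(mid) > 0:
        lo = mid
    else:
        hi = mid
s2_opt = 0.5 * (lo + hi)   # recovered prior variance, near sd_m**2
```

For G = I and C_d = sd_d² I the root is available analytically, s2 = h^T h / n - sd_d², which makes the sketch easy to verify.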

Table 1. Algorithm for weights:
  Minimize ||L L^T||_F^2
  subject to n - √(2n) z_{α/2} < h^T (G L L^T G^T + C_d)^{-1} h < n + √(2n) z_{α/2},
  with G L L^T G^T + C_d well conditioned.

[Figure 1. Parameter misfits and their corresponding weights found with the proposed algorithm in Table 1, for normally distributed data. Clockwise from top left: m_0 = (1,...,1), (0,...,0), (0.5,...,0.5), (5,...,5).]

5. Numerical Tests

5.1. Discontinuous parameters in an idealized model

In the first test we sought to determine whether a diagonal W_m could be found which accurately weights the error in the initial parameter misfit when the parameters are discontinuous. The parameter values are (1,...,1, 0,...,0, 1,...,1) with m = n = 70. The matrix G is taken to be the identity so that the initial parameter errors are known and the accuracy of the weights is easy to identify. A more realistic test with a discontinuous solution is given in Section 5.2.2. The weight on the data misfit is a diagonal matrix which is the inverse of the data variances, i.e. W_d = C_d^{-1} = diag(σ_i)^{-2}, while the weights on the parameter misfit are calculated with the algorithm given in Table 1. More accurate representations of the error covariance from sample data take considerably more work; see for example [7].

[Figure 2. Parameter misfits and their corresponding weights found with the proposed algorithm in Table 1, for exponentially distributed data. Clockwise from top left: m_0 = (1,...,1), (0,...,0), (0.5,...,0.5), (5,...,5).]

Results from the algorithm are plotted in Figures 1 and 2 when the data and parameters are taken from normal and exponential distributions, respectively, with a standard deviation of 5 × 10^{-2}. The parameter misfit m - m_0 is plotted along with the diagonal entries of W_m^{-1} found by the proposed algorithm. Each of the four plots in Figures 1 and 2 represents a different initial estimate. In three of the four plots, i.e. for (m_0)_i = 1, 0, 5, i = 1,...,70, the diagonal elements of W_m^{-1} found by the proposed algorithm do indeed jump between smaller and larger values, accurately reflecting the error in the initial parameter estimate (m - m_0)_i. In the fourth plot the calculated (W_m^{-1})_ii still accurately weight the parameter misfit, but in this case the misfit is constant at 0.5. Recall that a large diagonal element (W_m^{-1})_ii gives small weight to (m - m_0)_i, which is desired when (m - m_0)_i is large. That is, if we have a bad initial guess (m_0)_i, we don't want to find an (m)_i near it but instead give small weight to minimizing (m - m_0)_i. The difficulty in weighting parameter misfits in (17) is that typically we do not know the accuracy of m_0 a priori. However, in this simple example, the weights found by the proposed algorithm do appropriately weight the parameter misfits without a priori knowledge.

5.2. Benchmark Problems from [5]

Both analysis routines and test problems from [5] were used to compare and test the algorithm in Table 1. Results from the proposed algorithm were compared to those found from Tikhonov regularization with the L-curve and from generalized cross-validation. The L-curve approach plots the parameter misfit (with m_0 = 0 and weighted by λ L^T L) versus the data misfit (with weight I) to display the compromise between minimizing these two quantities. When plotted on a log-log scale the curve is typically in the shape of an L, and the solution is the set of parameter values at the corner. These parameter values are optimal in the sense that the errors in the weighted parameter misfit and the data misfit are balanced. Generalized cross-validation is based on the theory that if some data are left out, the resulting choice of parameters should accurately predict the omitted data.

There are a total of 12 test problems in [5], and here we solve three of them: Phillips, Wing and Shaw. They are all derived from approximating a Fredholm integral equation of the first kind:

∫_a^b K(s, t) f(t) dt = g(s).    (34)

The Phillips and Wing problems use Galerkin methods with particular basis functions to approximate the Fredholm integral, while the Shaw problem uses a weighted sum quadrature method. All approaches or problems lead to a system of linear algebraic equations Gm = d; however, in the Phillips and Wing test problems Gm is different from d. Noise is added to d from normal or exponential distributions. The standard deviation varies with each element of d, but is of the order of 10^{-2}. The initial estimate m_0 is found similarly, i.e. by adding noise to m from normal or exponential distributions with a varying standard deviation of the order of 10^{-2}. The weights on the data misfit and the initial estimate m_0 are the same for all three solution methods (L-curve, GCV and the algorithm in Table 1): W_d = C_d^{-1} = diag(σ_i)^{-2}, as in the first numerical example.
The algorithm in Table 1 is used to find a diagonal weight W_m for the parameter misfit. Future work involves finding weights W_m with more structure. Since we assume in the proposed algorithm that the data and parameters are random, but from arbitrary distributions, we can assign posterior uncertainties via (29). These are represented as error bars in Figures 3-9.

5.2.1. Phillips test problem  This problem was presented by D.L. Phillips [6] and for (34) uses

K(s, t) = ψ(s - t),  f(t) = ψ(t),
g(s) = (6 - |s|) (1 + (1/2) cos(πs/3)) + (9/(2π)) sin(π|s|/3)

with

ψ(x) = 1 + cos(πx/3), |x| < 3;  0, |x| ≥ 3.

Figure 3. Sample parameter estimates for the Phillips test problem (Sample A, normal distribution). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from a normal distribution.

Sample solutions m̂ of Gm = d derived from the approximation of (34) are plotted in Figures 3-5. The reference solution is the value of m given by the test problem. In these samples almost every parameter estimate found by the new algorithm is better than those found by the other two methods. The error bars associated with the parameter estimates found by the new algorithm are all small in Figures 3 and 4 because the estimates are good. However, there are estimates for which the error bars do not reach the reference solution. In Figure 5 another normally distributed sample solution set is plotted on the left. The randomly generated data in this sample have the same standard deviation as the samples in Figures 3 and 4; however, in this run the data were more noisy. The plot on the right is of the absolute error and standard deviation of each parameter estimate found by the new algorithm. The absolute error is the difference between the parameter estimate and the reference solution. This plot shows that standard deviation estimates from the new algorithm are of the same order of magnitude as the absolute error, or distance from the reference solution. By looking more closely at Figures 3-5 we see that in fact error bars on parameter estimates from the new algorithm often, but not always, reach the reference solution.

5.2.2. Wing test problem

The solution of this test problem contains discontinuous parameters, which is a good test of the proposed algorithm since it uses the two-norm. In

Figure 4. Sample parameter estimates for the Phillips test problem (Sample B, exponential distribution). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from an exponential distribution.

Figure 5. Sample parameter estimates for the Phillips test problem (Sample C, normal distribution; right panel: absolute error and standard deviation of each estimate from the new algorithm). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from a normal distribution.

this problem

K(s, t) = t e^{−st²},
g(s) = (e^{−s/9} − e^{−4s/9}) / (2s),
f(t) = 1, 1/3 < t < 2/3;  0, otherwise.

Figure 6. Sample parameter estimates for the Wing test problem (Sample A, normal distribution). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from a normal distribution.

Sample parameter estimates for all three methods are given in Figures 6 and 7. Figure 6 shows one sample when the data are normally distributed, while Figure 7 contains two sample solutions when the data are exponentially distributed. Generalized cross-validation did not perform well in all cases, while the results from the L-curve and the proposed algorithm are good. All estimates from the new algorithm are as good as or better than those from the L-curve. Since the estimates are good, the corresponding error bars are all small. There are instances for which the error bars do not reach the reference solution; however, the standard deviation still represents the absolute error well.

5.2.3. Shaw test problem

This test problem is a one-dimensional image restoration model. In this problem

K(s, t) = (cos(s) + cos(t))² (sin(u)/u)²,
u = π (sin(s) + sin(t)),
f(t) = 2 e^{−6(t−0.8)²} + e^{−2(t+0.5)²}.
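For the Wing problem, the kernel, data and solution above satisfy (34) exactly, so the triple can be checked directly by quadrature. A minimal sketch (midpoint rule; the function names are ours for illustration, not from [5]):

```python
import math

# Sketch: verify the Wing problem's identity from (34) by quadrature.
def wing_K(s, t):
    return t * math.exp(-s * t * t)

def wing_f(t):
    # Discontinuous solution: 1 on (1/3, 2/3), 0 elsewhere.
    return 1.0 if 1.0 / 3.0 < t < 2.0 / 3.0 else 0.0

def wing_g(s):
    return (math.exp(-s / 9.0) - math.exp(-4.0 * s / 9.0)) / (2.0 * s)

def quad_Kf(s, n=20000):
    # Midpoint rule for the Fredholm integral of K(s, .) * f over t in [0, 1].
    h = 1.0 / n
    return h * sum(wing_K(s, (i + 0.5) * h) * wing_f((i + 0.5) * h)
                   for i in range(n))

# The quadrature reproduces g(s) up to discretization error at any s > 0.
for s in (0.5, 1.0, 5.0):
    assert abs(quad_Kf(s) - wing_g(s)) < 1e-4
```

The closed form follows because ∫ t e^{−st²} dt = −e^{−st²}/(2s), evaluated between 1/3 and 2/3.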

Figure 7. Sample parameter estimates for the Wing test problem (Samples B and C, exponential distribution). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from exponential distributions.

This is discretized by collocation to produce Gm, while d is found by multiplying G and m. Two sample results from normal distributions are shown in Figure 8. The left plot shows a sample where all methods performed well, while the parameter estimates found by the new algorithm were slightly closer to the reference solution. Correspondingly, the error bars on the parameter estimates correctly identify small uncertainty. The right plot is a sample for which the majority of the parameter estimates found by the new algorithm are better than those found by the other two methods. However, there are a few estimates which are worse. The new algorithm is still useful in these instances because, as is typically the case, the error bars reach or come near the reference solution in every estimate.

Figure 9 shows one sample result when the data are taken from an exponential distribution. Here we see that the L-curve and GCV estimates are much worse than those found with the new algorithm. The results from this sample are typical when data are taken from an exponential distribution. Since the data are not normally distributed, least squares estimation is statistically not the best approach. However, we see that by appropriately weighting the errors in the data and parameter misfits with the proposed algorithm, the two-norm is still a useful estimator.
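A rough sketch of a collocation discretization of the Shaw problem as described above (a midpoint grid over [−π/2, π/2] is assumed here for illustration; the actual scheme in [5] may differ in detail):

```python
import math

# Rough sketch: collocate the Shaw problem on a midpoint grid (an assumption
# for illustration; [5] may use a different quadrature).
def shaw_K(s, t):
    u = math.pi * (math.sin(s) + math.sin(t))
    sinc = 1.0 if abs(u) < 1e-12 else math.sin(u) / u   # sin(u)/u -> 1 at u = 0
    return (math.cos(s) + math.cos(t)) ** 2 * sinc ** 2

def shaw_f(t):
    return 2.0 * math.exp(-6.0 * (t - 0.8) ** 2) + math.exp(-2.0 * (t + 0.5) ** 2)

def shaw_system(n=40):
    # G[i][j] ~ h * K(s_i, t_j); d is then found by multiplying G and m,
    # as in the text.
    h = math.pi / n
    grid = [-0.5 * math.pi + (j + 0.5) * h for j in range(n)]
    G = [[h * shaw_K(si, tj) for tj in grid] for si in grid]
    m = [shaw_f(tj) for tj in grid]
    d = [sum(G[i][j] * m[j] for j in range(n)) for i in range(n)]
    return G, m, d

G, m, d = shaw_system()
```

Because K(s, t) = K(t, s) and the same points are used for s and t, the resulting matrix is symmetric, a known property of the Shaw test matrix.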
6. Conclusions

We propose a new algorithm which combines ideas from deterministic and stochastic parameter estimation. From a deterministic point of view the new approach is an improvement because it effectively expands Tikhonov regularization in the two-norm in

Figure 8. Sample parameter estimates for the Shaw test problem (Samples A and B, normal distribution). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from normal distributions.

Figure 9. Sample parameter estimates for the Shaw test problem (Sample C, exponential distribution). Estimates are found by (i) the L-curve, (ii) generalized cross-validation and (iii) the proposed algorithm in Table 1, which also has error bars. The data noise is from an exponential distribution.

such a way that the regularization parameter can vary along a diagonal to accurately weight initial parameter misfits. The benefits from a stochastic point of view are that with this approach, a priori information about the parameters is not needed, nor is it necessary to assume normally distributed data or parameters. Rather than identify a priori distributions of the parameters, the parameter misfit weight is found by ensuring that the cost function, a sum of weighted data and parameter misfits, is a χ² random variable with n degrees of freedom. Optimization is done in a least squares sense; however, we find that if the misfits are accurately weighted, the parameter estimates are not smoothed. This was shown both by solving benchmark problems in parameter estimation, and by investigating the calculated weights on the initial parameter estimates in a simpler, idealized problem. In the benchmark problems the proposed algorithm typically gave better parameter estimates than those found from the L-curve and generalized cross-validation. In the cases for which the proposed algorithm did not perform better, the corresponding error bars or uncertainty estimates correctly identified the error.

The goal of the proposed algorithm is to accurately weight initial parameter misfits in a least squares minimization problem. Optimal weights will be dense weighting matrices which appropriately identify initial parameter misfit errors and their correlations. Thus future work involves finding dense weighting matrices, rather than diagonal matrices, in addition to improving the optimization routine. Limitations of the algorithm include the need for good initial parameter estimates, and the computational time of the optimization problem in Table 1.
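The χ² criterion above can be illustrated with a small simulation (illustrative σ values, not the paper's experiments): when residuals r_i ~ N(0, σ_i²) are weighted by σ_i^{-2}, the weighted sum of squares is a χ² variable with n degrees of freedom, so its sample mean should be close to n.

```python
import random

random.seed(0)   # reproducible illustration

n = 5                                   # number of residuals = degrees of freedom
sigmas = [0.1, 0.5, 1.0, 2.0, 5.0]      # illustrative standard deviations

def weighted_misfit():
    # Residuals r_i ~ N(0, sigma_i^2), weighted by 1/sigma_i^2: the sum of
    # (r_i / sigma_i)^2 is chi-squared with n degrees of freedom.
    return sum((random.gauss(0.0, s) / s) ** 2 for s in sigmas)

samples = [weighted_misfit() for _ in range(4000)]
mean = sum(samples) / len(samples)      # should be close to n
```

Mis-specified weights shift this mean away from n, which is what the algorithm exploits when tuning the misfit weights so that the full cost function behaves like a χ² variable with n degrees of freedom.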
7. References

[1] Bennett A 2005 Inverse Modeling of the Ocean and Atmosphere (Cambridge University Press) p 234
[2] Casella G and Berger R 2001 Statistical Inference (California: Duxbury) p 688
[3] Chandrasekaran S, Golub G, Gu M and Sayed A 1998 Parameter estimation in the presence of bounded data uncertainties SIMAX 19 235-52
[4] Golub G, Hansen P and O'Leary D 1999 Tikhonov regularization and total least squares SIAM J. Matrix Anal. Appl. 21 185-94
[5] Hansen P 1994 Regularization Tools: a Matlab package for analysis and solution of discrete ill-posed problems Numerical Algorithms 6 1-35
[6] Phillips D 1962 A technique for the numerical solution of certain integral equations of the first kind J. ACM 9 84-97
[7] Huang J, Liu N, Pourahmadi M and Liu L 2006 Covariance matrix selection and estimation via penalised normal likelihood Biometrika 93 85-98
[8] Hansen P 1992 Analysis of discrete ill-posed problems by means of the L-curve SIAM Review 34 561-80
[9] Hansen P 1998 Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (SIAM Monographs on Mathematical Modeling and Computation 4) p 247
[10] Hansen P 1994 Regularization Tools: a Matlab package for analysis and solution of discrete ill-posed problems Numerical Algorithms 6 1-35
[11] Menke W 1989 Geophysical Data Analysis: Discrete Inverse Theory (San Diego: Academic Press) p 289

[12] Morozov V 1984 Methods for Solving Incorrectly Posed Problems (New York: Springer-Verlag)
[13] Scales J and Tenorio L 2001 Prior information and uncertainty in inverse problems Geophysics 66 389-97
[14] Scales J and Snieder R 1997 To Bayes or not to Bayes? Geophysics 62 1045-46
[15] Sancevero S, Remacre A and Portugal R 2005 Comparing deterministic and stochastic seismic inversion for thin-bed reservoir characterization in a turbidite synthetic reference model of Campos Basin, Brazil The Leading Edge 68-72
[16] Tarantola A 2005 Inverse Problem Theory and Methods for Model Parameter Estimation (SIAM) p 342
[17] Tikhonov A and Arsenin V 1977 Solutions of Ill-Posed Problems (New York: Wiley) p 272