The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems

© Henri P. Gavin
Department of Civil and Environmental Engineering, Duke University
March 9, 2016

Abstract

The Levenberg-Marquardt method is a standard technique used to solve nonlinear least squares problems. Least squares problems arise when fitting a parameterized function to a set of measured data points by minimizing the sum of the squares of the errors between the data points and the function. Nonlinear least squares problems arise when the function is not linear in the parameters. Nonlinear least squares methods iteratively improve the parameter values in order to reduce the sum of the squares of the errors between the function and the measured data points. The Levenberg-Marquardt curve-fitting method is actually a combination of two minimization methods: the gradient descent method and the Gauss-Newton method. In the gradient descent method, the sum of the squared errors is reduced by updating the parameters in the steepest-descent direction. In the Gauss-Newton method, the sum of the squared errors is reduced by assuming the least squares function is locally quadratic and finding the minimum of that quadratic. The Levenberg-Marquardt method acts more like a gradient-descent method when the parameters are far from their optimal values, and acts more like the Gauss-Newton method when the parameters are close to their optimal values. This document describes these methods and illustrates the use of software to solve nonlinear least squares curve-fitting problems.

1 Introduction

In fitting a function ŷ(t; p) of an independent variable t and a vector of n parameters p to a set of m data points (t_i, y_i), it is customary and convenient to minimize the sum of the weighted squares of the errors (or weighted residuals) between the measured data y(t_i) and the curve-fit function ŷ(t_i; p). This scalar-valued goodness-of-fit measure is called the chi-squared error criterion,

    χ²(p) = Σ_{i=1}^{m} [ ( y(t_i) − ŷ(t_i; p) ) / w_i ]²          (1)
          = (y − ŷ(p))ᵀ W (y − ŷ(p))                               (2)
          = yᵀWy − 2 yᵀWŷ + ŷᵀWŷ .                                 (3)

The value w_i is a measure of the error in measurement y(t_i). The weighting matrix W is diagonal, with W_ii = 1/w_i². If the function ŷ is nonlinear in the model parameters p, then the minimization of χ² with respect to the parameters must be carried out iteratively. The goal of each iteration is to find a perturbation h to the parameters p that reduces χ².
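To make equations (1)-(3) concrete, the short Matlab sketch below evaluates the weighted χ² both in the summation form (1) and in the matrix form (2); the two forms give the same value. The function handle y_hat_func and the variable names are illustrative assumptions, not part of lm.m.

% Sketch: evaluating the chi-squared error criterion of equations (1)-(3).
% y_hat_func, p, t, y, and w are illustrative placeholders (column vectors).
y_hat = y_hat_func(t, p);            % model curve at the current parameters
r     = y - y_hat;                   % residuals, y(t_i) - y_hat(t_i;p)
chi2_sum    = sum( (r ./ w).^2 );    % summation form, equation (1)
W           = diag( 1 ./ w.^2 );     % diagonal weighting matrix, W_ii = 1/w_i^2
chi2_matrix = r' * W * r;            % matrix form, equation (2); equals chi2_sum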

2 The Gradient Descent Method

The steepest descent method is a general minimization method which updates parameter values in the downhill direction: the direction opposite to the gradient of the objective function. The gradient descent method converges well for problems with simple objective functions [3, 4]. For problems with thousands of parameters, gradient descent methods are sometimes the only viable method. The gradient of the chi-squared objective function with respect to the parameters is

    ∂χ²/∂p = 2 (y − ŷ(p))ᵀ W ∂(y − ŷ(p))/∂p                        (4)
           = −2 (y − ŷ(p))ᵀ W [ ∂ŷ(p)/∂p ]                         (5)
           = −2 (y − ŷ)ᵀ W J ,                                     (6)

where the m × n Jacobian matrix [∂ŷ/∂p] represents the local sensitivity of the function ŷ to variation in the parameters p. For notational simplicity, J will be used for [∂ŷ/∂p]. The parameter update h that moves the parameters in the direction of steepest descent is given by

    h_gd = α Jᵀ W (y − ŷ) ,                                        (7)

where the positive scalar α determines the length of the step in the steepest-descent direction.

3 The Gauss-Newton Method

The Gauss-Newton method is a method for minimizing a sum-of-squares objective function. It presumes that the objective function is approximately quadratic in the parameters near the optimal solution [1]. For moderately sized problems the Gauss-Newton method typically converges much faster than gradient-descent methods [5]. The function evaluated with perturbed model parameters may be locally approximated through a first-order Taylor series expansion,

    ŷ(p + h) ≈ ŷ(p) + [ ∂ŷ/∂p ] h = ŷ + J h .                      (8)

Substituting the approximation for the perturbed function, ŷ + Jh, for ŷ in equation (3),

    χ²(p + h) ≈ yᵀWy + ŷᵀWŷ − 2 yᵀWŷ − 2 (y − ŷ)ᵀ W J h + hᵀ Jᵀ W J h .   (9)

This shows that χ² is approximately quadratic in the perturbation h, and that the Hessian of the chi-squared fit criterion is approximately JᵀWJ. The parameter update h that minimizes χ² is found from ∂χ²/∂h = 0:

    ∂χ²(p + h)/∂h ≈ −2 (y − ŷ)ᵀ W J + 2 hᵀ Jᵀ W J ,                (10)

and the resulting normal equations for the Gauss-Newton update are

    [ Jᵀ W J ] h_gn = Jᵀ W (y − ŷ) .                               (11)
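A minimal Matlab sketch of one step of each method may help fix the notation: the steepest-descent update of equation (7) scales the gradient direction, while the Gauss-Newton update of equation (11) solves the normal equations. The variable names (J, W, r, alpha, and so on) are illustrative placeholders, not lm.m internals.

% Sketch of the updates in equations (7) and (11); names are illustrative.
r    = y - y_hat;              % current residual vector, y - y_hat(p)
JtWJ = J' * W * J;             % approximate Hessian of chi^2
JtWr = J' * W * r;             % J'W(y - y_hat), the negative half-gradient
h_gd = alpha * JtWr;           % gradient-descent update, equation (7)
h_gn = JtWJ \ JtWr;            % Gauss-Newton update from the normal equations (11)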

4 The Levenberg-Marquardt Method

The Levenberg-Marquardt algorithm adaptively varies the parameter updates between the gradient descent update and the Gauss-Newton update,

    [ Jᵀ W J + λ I ] h_lm = Jᵀ W (y − ŷ) ,                         (12)

where small values of the algorithmic parameter λ result in a Gauss-Newton update and large values of λ result in a gradient descent update. The parameter λ is initialized to be large so that the first updates are small steps in the steepest-descent direction. If an iteration happens to result in a worse approximation, λ is increased. As the solution improves, λ is decreased, the Levenberg-Marquardt method approaches the Gauss-Newton method, and the solution typically accelerates to the local minimum [3, 4, 5].

Marquardt's suggested update relationship [5],

    [ Jᵀ W J + λ diag(Jᵀ W J) ] h_lm = Jᵀ W (y − ŷ) ,              (13)

makes the effect of particular values of λ less problem-specific, and is used in the Levenberg-Marquardt algorithm implemented in the Matlab function lm.m.

4.1 Numerical Implementation

Many variations of the Levenberg-Marquardt method have been published in papers and in code. This document borrows from some of these, including the enhancement of a rank-1 Jacobian update. In iteration i, the step h is evaluated by comparing χ²(p) to χ²(p + h). The step is accepted if the metric ρ_i [6] is greater than a user-specified threshold, ε_4 > 0. This metric is a measure of the actual improvement in χ² as compared to the improvement of an LM update assuming the approximation (8) were exact:

    ρ_i(h_lm) = [ χ²(p) − χ²(p + h_lm) ] / [ (y − ŷ)ᵀ W (y − ŷ) − (y − ŷ − J h_lm)ᵀ W (y − ŷ − J h_lm) ]       (14)
              = [ χ²(p) − χ²(p + h_lm) ] / [ h_lmᵀ ( λ_i h_lm + Jᵀ W (y − ŷ(p)) ) ]               if using eqn (12) for h_lm   (15)
              = [ χ²(p) − χ²(p + h_lm) ] / [ h_lmᵀ ( λ_i diag(Jᵀ W J) h_lm + Jᵀ W (y − ŷ(p)) ) ]  if using eqn (13) for h_lm   (16)

If in an iteration ρ_i(h) > ε_4, then p + h is sufficiently better than p, p is replaced by p + h, and λ is reduced by a factor. Otherwise λ is increased by a factor, and the algorithm proceeds to the next iteration.

4.1.1 Initialization and update of the L-M parameter, λ, and the parameters p

In lm.m users may select one of three methods for initializing and updating λ and p.

1. λ_0 = λ_o; λ_o is user-specified [5].
   Use eqn (13) for h_lm and eqn (16) for ρ.
   If ρ_i(h) > ε_4:  p ← p + h;  λ_{i+1} = max[ λ_i / L_↓, 10⁻⁷ ];
   otherwise:        λ_{i+1} = min[ λ_i L_↑, 10⁷ ];

2. λ_0 = λ_o max[ diag[Jᵀ W J] ]; λ_o is user-specified.
   Use eqn (12) for h_lm and eqn (15) for ρ.
   α = ( (Jᵀ W (y − ŷ(p)))ᵀ h ) / ( (χ²(p + h) − χ²(p))/2 + 2 (Jᵀ W (y − ŷ(p)))ᵀ h );
   If ρ_i(αh) > ε_4:  p ← p + αh;  λ_{i+1} = max[ λ_i / (1 + α), 10⁻⁷ ];
   otherwise:         λ_{i+1} = λ_i + |χ²(p + αh) − χ²(p)| / (2α);

3. λ_0 = λ_o max[ diag[Jᵀ W J] ]; λ_o is user-specified [6].
   Use eqn (12) for h_lm and eqn (15) for ρ.
   If ρ_i(h) > ε_4:  p ← p + h;  λ_{i+1} = λ_i max[ 1/3, 1 − (2ρ_i − 1)³ ];  ν_i = 2;
   otherwise:        λ_{i+1} = λ_i ν_i;  ν_{i+1} = 2 ν_i;

For the examples in section 4.4, method 1 [5] with L_↑ = 11 and L_↓ = 9 has good convergence properties.

4.1.2 Computation and rank-1 update of the Jacobian, [∂ŷ/∂p]

In the first iteration, in every 2n iterations, and in iterations where χ²(p + h) > χ²(p), the Jacobian (J ∈ R^{m×n}) is numerically approximated using forward differences,

    J_ij = ∂ŷ_i/∂p_j = [ ŷ(t_i; p + δp_j) − ŷ(t_i; p) ] / ||δp_j|| ,               (17)

or central differences (the default),

    J_ij = ∂ŷ_i/∂p_j = [ ŷ(t_i; p + δp_j) − ŷ(t_i; p − δp_j) ] / ( 2 ||δp_j|| ) ,   (18)

where the j-th element of δp_j is the only non-zero element and is set to Δp_j (1 + |p_j|). In all other iterations, the Jacobian is updated using the Broyden rank-1 update formula,

    J = J + ( ( ŷ(p + h) − ŷ(p) − J h ) hᵀ ) / ( hᵀ h ) .                           (19)

For problems with many parameters, a finite-differences Jacobian is computationally expensive. Convergence can be achieved with fewer function evaluations if the Jacobian is re-computed using finite differences only occasionally. The rank-1 Jacobian update equation (19) requires no additional function evaluations.

4.1.3 Convergence criteria

Convergence is achieved when one of the following three criteria is satisfied: convergence in the gradient, max |Jᵀ W (y − ŷ)| < ε_1; convergence in parameters, max |h_i / p_i| < ε_2; or convergence in χ², χ²/(m − n + 1) < ε_3. Otherwise, iterations terminate when the iteration count exceeds a pre-specified limit.
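The loop below is a compact Matlab sketch of the procedure just described: a forward-difference Jacobian (equation (17)), the Marquardt-form step of equation (13), the acceptance metric of equation (16), and the λ update of method 1. It is an illustration only, not the lm.m source; convergence tests and the Broyden update of equation (19) are omitted, and func, p, t, y, W, dp, lambda, L_up, L_dn, eps4, and max_iter are assumed placeholder names (p and y are column vectors).

% Illustrative sketch of the core Levenberg-Marquardt iteration (update method 1).
y_hat = func(t, p);
X2    = (y - y_hat)' * W * (y - y_hat);                 % chi-squared, equation (2)
for iter = 1:max_iter
  J = zeros(length(y), length(p));                      % forward-difference Jacobian, eqn (17)
  for j = 1:length(p)
    dpj     = zeros(size(p));
    dpj(j)  = dp * (1 + abs(p(j)));
    J(:, j) = (func(t, p + dpj) - y_hat) / dpj(j);
  end
  JtWJ = J' * W * J;
  JtWr = J' * W * (y - y_hat);
  h    = (JtWJ + lambda * diag(diag(JtWJ))) \ JtWr;     % LM step, equation (13)
  y_try  = func(t, p + h);
  X2_try = (y - y_try)' * W * (y - y_try);
  rho = (X2 - X2_try) / (h' * (lambda * diag(JtWJ) .* h + JtWr));   % equation (16)
  if rho > eps4                                         % accept: keep p+h, decrease lambda
    p = p + h;  y_hat = y_try;  X2 = X2_try;
    lambda = max(lambda / L_dn, 1e-7);
  else                                                  % reject: increase lambda
    lambda = min(lambda * L_up, 1e7);
  end
end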

4.2 Error Analysis

Once the optimal curve-fit parameters p_fit are determined, parameter statistics are computed for the converged solution using the same weight value w for every data point. This weight value is equal to the mean square measurement error, σ_y²:

    w² = σ_y² = ( 1 / (m − n + 1) ) (y − ŷ(p_fit))ᵀ (y − ŷ(p_fit)) .          (20)

The parameter covariance matrix is then computed from

    V_p = [ Jᵀ W J ]⁻¹ .                                                       (21)

The asymptotic standard parameter errors are given by

    σ_p = √( diag( [ Jᵀ W J ]⁻¹ ) ) .                                          (22)

The asymptotic standard parameter error is a measure of how unexplained variability in the data propagates to variability in the parameters, and is essentially an error measure for the parameters. The standard error of the fit is given by

    σ_ŷ = √( diag( J [ Jᵀ W J ]⁻¹ Jᵀ ) ) .                                     (23)

The standard error of the fit indicates how variability in the parameters affects the variability in the curve-fit. The asymptotic standard prediction error reflects the standard error of the fit as well as the mean square measurement error,

    σ_ŷp = √( σ_y² + diag( J [ Jᵀ W J ]⁻¹ Jᵀ ) ) .                             (24)

4.3 Matlab code: lm.m

The Matlab function lm.m implements the Levenberg-Marquardt method for curve-fitting problems. The code and examples are available here: hpgavin/lm.m, hpgavin/lm_examp.m, hpgavin/lm_func.m, hpgavin/lm_plots.m.
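The error measures of equations (20)-(24) can be evaluated directly from the Jacobian at the converged solution. The sketch below is illustrative only, not lm.m internals; it assumes J is the Jacobian evaluated at p_fit, y_hat_fit is the converged model curve, and that the uniform weight of equation (20) is used.

% Sketch of the error analysis of equations (20)-(24); names are illustrative.
m  = length(y);                            % number of data points
n  = length(p_fit);                        % number of parameters
r  = y - y_hat_fit;                        % residuals at the converged fit
w2 = (r' * r) / (m - n + 1);               % mean square measurement error, eqn (20)
W  = eye(m) / w2;                          % uniform weighting implied by eqn (20)
Vp = inv(J' * W * J);                      % parameter covariance matrix, eqn (21)
sigma_p  = sqrt(diag(Vp));                 % asymptotic standard parameter errors, eqn (22)
sigma_y  = sqrt(diag(J * Vp * J'));        % standard error of the fit, eqn (23)
sigma_yp = sqrt(w2 + diag(J * Vp * J'));   % asymptotic standard prediction error, eqn (24)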

function [p,X2,sigma_p,sigma_y,corr_p,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
% [p,X2,sigma_p,sigma_y,corr_p,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
%
% Levenberg-Marquardt curve-fitting: minimize sum of weighted squared residuals
%
% INPUT VARIABLES
% func   = function of n independent variables, t, and m parameters, p,
%          returning the simulated model: y_hat = func(t,p,c)
% p      = n-vector of initial guess of parameter values
% t      = m-vectors or matrix of independent variables (used as arg to func)
% y_dat  = m-vectors or matrix of data to be fit by func(t,p)
% weight = weighting vector for least squares fit ( weight >= 0 ) ...
%          inverse of the standard measurement errors
%          Default: sqrt( d.o.f. / ( y_dat' * y_dat ) )
% dp     = fractional increment of p for numerical derivatives
%          dp(j) > 0 central differences calculated
%          dp(j) < 0 one-sided, backwards differences calculated
%          dp(j) = 0 sets corresponding partials to zero, i.e., holds p(j) fixed
%          Default: 0.001
% p_min  = n-vector of lower bounds for parameter values
% p_max  = n-vector of upper bounds for parameter values
% c      = an optional matrix of values passed to func(t,p,c)
% opts   = vector of algorithmic parameters
%               parameter        defaults   meaning
% opts(1)  =    prnt             3          >1 intermediate results; >2 plots
% opts(2)  =    MaxIter          Npar       maximum number of iterations
% opts(3)  =    epsilon_1        1e-3       convergence tolerance for gradient
% opts(4)  =    epsilon_2        1e-3       convergence tolerance for parameters
% opts(5)  =    epsilon_3        1e-3       convergence tolerance for Chi-square
% opts(6)  =    epsilon_4        1e-1       determines acceptance of a L-M step
% opts(7)  =    lambda_0         1e-2       initial value of L-M parameter
% opts(8)  =    lambda_UP_fac    11         factor for increasing lambda
% opts(9)  =    lambda_DN_fac    9          factor for decreasing lambda
% opts(10) =    Update_Type      1          1: Levenberg-Marquardt lambda update
%                                           2: Quadratic update
%                                           3: Nielsen's lambda update equations
%
% OUTPUT VARIABLES
% p       = least-squares optimal estimate of the parameter values
% X2      = Chi-squared criterion
% sigma_p = asymptotic standard error of the parameters
% sigma_y = asymptotic standard error of the curve-fit
% corr_p  = correlation matrix of the parameters
% R_sq    = R-squared coefficient of multiple determination
% cvg_hst = convergence history

The m-file to solve a least-squares curve-fit problem with lm.m can be as simple as:

my_data = load('my_data_file');    % load the data
t       = my_data(:,1);            % if the independent variable is in column 1
y_dat   = my_data(:,2);            % if the dependent variable is in column 2

p_min  = [ ];                      % minimum expected parameter values
p_max  = [ ];                      % maximum expected parameter values
p_init = [ ];                      % initial guess for parameter values

[ p_fit, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst ] = ...
    lm ( 'lm_func', p_init, t, y_dat, 1, -0.1, p_min, p_max )

where the user-supplied function lm_func.m could be, for example,

function y_hat = lm_func(t,p,c)
y_hat = p(1) * t .* exp(-t/p(2)) .* cos( 2*pi*( p(3)*t - p(4) ) );

It is common and desirable to repeat the same experiment two or more times and to estimate a single set of curve-fit parameters from all the experiments. In such cases the data file may be arranged as follows:

% t-variable    y (1st experiment)    y (2nd experiment)    y (3rd experiment)
     :                  :                     :                     :
    etc.               etc.                  etc.                  etc.

If your data is arranged as above, you may prepare the data for lm.m using the following lines:

my_data = load('my_data_file');          % load the data
t_column  = 1;                           % column of the independent variable
y_columns = [ 2 3 4 ];                   % columns of the measured dependent variables
y_dat = my_data(:, y_columns);           % the measured data
y_dat = y_dat(:);                        % a single column vector

t = my_data(:, t_column);                % the independent variable
t = t * ones(1, length(y_columns));      % a column of t for each column of y
t = t(:);                                % a single column vector

Note that the arguments t and y_dat to lm.m may be matrices as long as the dimensions of t match the dimensions of y_dat. The columns of t need not be identical. Results may be plotted with lm_plots.m:

function lm_plots ( t, y_dat, y_fit, sigma_y, cvg_hst, filename )
% lm_plots ( t, y_dat, y_fit, sigma_y, cvg_hst, filename )
% Plot statistics of the results of a Levenberg-Marquardt least squares
% analysis with lm.m

y_dat = y_dat(:);
y_fit = y_fit(:);

[max_it,n] = size(cvg_hst); n = n-3;

figure(1);                   % plot convergence history of parameters, chi^2, lambda
clf
subplot(211)
plot( cvg_hst(:,1), cvg_hst(:,2:n+1), '-o', 'linewidth',4);
for i = 1:n
    text( 1.2*cvg_hst(max_it,1), cvg_hst(max_it,1+i), sprintf('%d',i) );
end
ylabel('parameter values')
subplot(212)
semilogy( cvg_hst(:,1), [ cvg_hst(:,n+2) cvg_hst(:,n+3) ], '-o', 'linewidth',4)
text( 0.8*cvg_hst(max_it,1), 0.2*cvg_hst(max_it,n+2), '\chi^2/(m-n+1)' );
text( 0.8*cvg_hst(max_it,1), 0.5*cvg_hst(max_it,n+3), '\lambda' );
%legend('\chi^2/(m-n+1)', '\lambda', 3);
ylabel('\chi^2/(m-n+1) and \lambda')
xlabel('function calls')

figure(2);                   % plot data, fit, and confidence interval of fit
clf
plot( t,y_dat,'og', t,y_fit,'-b', ...
      t,y_fit+1.96*sigma_y,'.k', t,y_fit-1.96*sigma_y,'.k' );
legend('y_{data}','y_{fit}','95% c.i.');
ylabel('y(t)')
xlabel('t')
%subplot(212)
% semilogy(t,sigma_y,'-r','linewidth',4);
% ylabel('\sigma_y(t)')

figure(3);                   % plot histogram of residuals; are they Gaussian?
clf
hist( real(y_dat - y_fit) )
title('histogram of residuals')
axis('tight')
xlabel('y_{data} - y_{fit}')
ylabel('count')
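One possible way to generate these plots after a fit is sketched below. The call to lm and the evaluation of y_fit follow the earlier example; the final filename string is an illustrative placeholder.

% Sketch: plotting the results of a fit; follows the earlier lm_func example.
[ p_fit, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst ] = ...
    lm ( 'lm_func', p_init, t, y_dat, 1, -0.1, p_min, p_max );
y_fit = lm_func ( t, p_fit, [] );       % fitted curve at the measurement points
lm_plots ( t, y_dat, y_fit, sigma_y, cvg_hst, 'lm_fit' );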

4.4 Numerical Examples

The robustness of lm.m is tested in three numerical examples by curve-fitting simulated experimental measurements. Noisy experimental measurements y are simulated by adding random measurement noise to the curve-fit function evaluated with a set of true parameter values, ŷ(t; p_true). The random measurement noise is normally distributed with a mean of zero and a standard deviation of 0.2:

    y_i = ŷ(t_i; p_true) + N(0, 0.2) .                                         (25)

The convergence of the parameters from an erroneous initial guess p_initial to values closer to p_true is then examined.

Each numerical example below has four parameters (n = 4) and one-hundred measurements (m = 100). Each numerical example has a different curve-fit function ŷ(t; p), a different true parameter vector p_true, and a different vector of initial parameters p_initial.

For several values of p_2 and p_4, the χ² error criterion is calculated and plotted as a surface over the p_2, p_4 plane. The bowl-shaped nature of the objective function is clearly evident in each example. The objective function, however, need not be quadratic in the parameters, and it may have multiple minima. The presence of measurement noise does not affect the smoothness of the objective function.

The gradient descent method endeavors to move parameter values in a downhill direction to minimize χ²(p). This often requires small step sizes, but it is the approach required when the objective function is not quadratic. The Gauss-Newton method approximates the bowl shape as a quadratic and endeavors to move parameter values to the minimum in a small number of steps. This method works well when the parameters are close to their optimal values. The Levenberg-Marquardt method retains the best features of both the gradient-descent method and the Gauss-Newton method.

The evolution of the parameter values, the evolution of χ², and the evolution of λ from iteration to iteration are plotted for each example. The simulated experimental data, the curve fit, and the 95-percent confidence interval of the fit are plotted; the standard error of the fit and a histogram of the fit errors are also plotted. The initial parameter values p_initial, the true parameter values p_true, the fit parameter values p_fit, the standard error of the fit parameters σ_p, and the correlation matrix of the fit parameters are tabulated. The true parameter values lie within the confidence interval p_fit − 1.96 σ_p < p_true < p_fit + 1.96 σ_p with a confidence level of 95 percent.
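The simulated measurements of equation (25) can be generated with a few lines of Matlab; the time vector, p_true, and the use of lm_func below are illustrative placeholders for whichever example is being run.

% Sketch: generating simulated noisy measurements as in equation (25).
% The time vector, p_true, and the model function are illustrative placeholders.
m     = 100;                                        % number of measurements
t     = (1:m)';                                     % independent variable
y_dat = lm_func(t, p_true, []) + 0.2*randn(m,1);    % true curve plus zero-mean Gaussian noise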

4.4.1 Example 1

Consider fitting the following function to a set of measured data:

    ŷ(t; p) = p_1 exp(−t/p_2) + p_3 t exp(−t/p_4) .                            (26)

The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
y_hat = p(1)*exp(-t/p(2)) + p(3)*t.*exp(-t/p(4));

The true parameter values p_true, the initial parameter values p_initial, the resulting curve-fit parameter values p_fit, and the standard errors of the fit parameters σ_p are shown in Table 1. The R² fit criterion is 98 percent. The standard parameter errors are less than one percent of the parameter values, except for the standard error for p_2, which is 1.5 percent of p_2. The parameter correlation matrix is given in Table 2. Parameters p_3 and p_4 are the most correlated, at -96 percent. Parameters p_1 and p_4 are the least correlated, at -3 percent.

The bowl-shaped nature of the χ² objective function is shown in Figure 1(a). This shape is nearly quadratic and has a single minimum.

The convergence of the parameters and the evolution of χ² and λ are shown in Figure 1(b). The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 1(c). Note that the standard error of the fit is smaller near the center of the fit domain and larger at the edges of the domain.

A histogram of the difference between the data values and the curve-fit is shown in Figure 1(d). Ideally these curve-fit errors should be normally distributed.

Table 1. Parameter values and standard errors.
          p_initial    p_true    p_fit    σ_p    σ_p / p_fit (%)

Table 2. Parameter correlation matrix.
          p_1    p_2    p_3    p_4

Figure 1. (a) The sum of the squared errors as a function of p_2 and p_4. (b) Top: the convergence of the parameters with each iteration; Bottom: values of χ² and λ at each iteration. (c) Top: data y, curve-fit ŷ(t; p_fit), curve-fit + error, and curve-fit − error; Bottom: standard error of the fit, σ_ŷ(t). (d) Histogram of the errors between the data and the fit.

4.4.2 Example 2

Consider fitting the following function to a set of measured data:

    ŷ(t; p) = p_1 (t/max(t)) + p_2 (t/max(t))² + p_3 (t/max(t))³ + p_4 (t/max(t))⁴ .     (27)

This function is linear in the parameters and may be fit using methods of linear least squares. The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
mt = max(t);
y_hat = p(1)*(t/mt) + p(2)*(t/mt).^2 + p(3)*(t/mt).^3 + p(4)*(t/mt).^4;

The true parameter values p_true, the initial parameter values p_initial, the resulting curve-fit parameter values p_fit, and the standard errors of the fit parameters σ_p are shown in Table 3. The R² fit criterion is 99.9 percent. In this example, the standard parameter errors are larger than in example 1. The standard error for p_2 is 12 percent and the standard error of p_3 is 17 percent. Note that a very high value of the R² coefficient of determination does not necessarily mean that parameter values have been found with great accuracy. The parameter correlation matrix is given in Table 4. These parameters are highly correlated with one another, meaning that a change in one parameter will almost certainly result in changes in the other parameters.

The bowl-shaped nature of the χ² objective function is shown in Figure 2(a). This shape is nearly quadratic and has a single minimum. The correlation of parameters p_2 and p_4, for example, is easily seen from this figure.

The convergence of the parameters and the evolution of χ² and λ are shown in Figure 2(b). The parameters converge monotonically to their final values.

The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 2(c). Note that the standard error of the fit approaches zero at t = 0 and is largest at the largest value of t. This is because ŷ(0; p) = 0, regardless of the values in p.

A histogram of the difference between the data values and the curve-fit is shown in Figure 2(d). Ideally these curve-fit errors should be normally distributed, and they appear to be so in this example.

Table 3. Parameter values and standard errors.
          p_initial    p_true    p_fit    σ_p    σ_p / p_fit (%)

Table 4. Parameter correlation matrix.
          p_1    p_2    p_3    p_4

Figure 2. (a) The sum of the squared errors as a function of p_2 and p_4. (b) Top: the convergence of the parameters with each iteration; Bottom: values of χ² and λ at each iteration. (c) Top: data y, curve-fit ŷ(t; p_fit), curve-fit + error, and curve-fit − error; Bottom: standard error of the fit, σ_ŷ(t). (d) Histogram of the errors between the data and the fit.

4.4.3 Example 3

Consider fitting the following function to a set of measured data:

    ŷ(t; p) = p_1 exp(−t/p_2) + p_3 sin(t/p_4) .                               (28)

This function is nonlinear in the parameters p_2 and p_4. The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
y_hat = p(1)*exp(-t/p(2)) + p(3)*sin(t/p(4));

The true parameter values p_true, the initial parameter values p_initial, the resulting curve-fit parameter values p_fit, and the standard errors of the fit parameters σ_p are shown in Table 5. The R² fit criterion is 99.8 percent. In this example, the standard parameter errors are all less than one percent. The parameter correlation matrix is given in Table 6. Parameter p_4 is not correlated with the other parameters. Parameters p_1 and p_2 are the most correlated, at 73 percent.

The bowl-shaped nature of the χ² objective function is shown in Figure 3(a). This shape is clearly not quadratic and has multiple minima. In this example, the initial guess for parameter p_4, the period of the oscillatory component, has to be within ten percent of the true value; otherwise the algorithm in lm.m will converge to a very small value of the amplitude of oscillation p_3 and an erroneous value for p_4. When such an occurrence arises, the standard errors σ_p of the fit parameters p_3 and p_4 are quite large and the histogram of curve-fit errors (Figure 3(d)) is not normally distributed.

The convergence of the parameters and the evolution of χ² and λ are shown in Figure 3(b). The parameters converge monotonically to their final values.

The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 3(c). A histogram of the difference between the data values and the curve-fit is shown in Figure 3(d). Ideally these curve-fit errors should be normally distributed, and they appear to be so in this example.

15 Table. Parameter values and standard errors. p initial p true p fit σ p σ p /p fit (%) Table 6. Parameter correlation matrix. p 1 p 2 p 3 p 4 p p p p (a) log (χ 2 ) 1.8 y(t) p p 2 y data y fit y fit +1.96σ y y fit -1.96σ y (b) parameter values χ 2 and λ count χ 2 λ function calls histogram of residuals σ y (t) (c) t (d) y data - y fit Figure 3. (a) The sum of the squared errors as a function of p 2 and p 4. (b) Top: the convergence of the parameters with each iteration, (b) Bottom: values of χ 2 and λ each iteration. (c) Top: data y, curve-fit ŷ(t; p fit ), curve-fit+error, and curve-fit-error; (c) Bottom: standard error of the fit, σŷ(t). (d) Histogram of the errors between the data and the fit. 1

4.5 Fitting in Multiple Dimensions

The code lm.m can carry out fitting in multiple dimensions. For example, the function

    ẑ(x, y) = ( p_1 x^{p_2} + (1 − p_1) y^{p_2} )^{1/p_2}

may be fit to data points z_i(x_i, y_i), (i = 1, ..., m), using lm.m with an m-file such as

my_data = load('my_data_file');    % load the data
x_dat = my_data(:,1);              % if the independent variable x is in column 1
y_dat = my_data(:,2);              % if the independent variable y is in column 2
z_dat = my_data(:,3);              % if the dependent variable z is in column 3

p_min  = [ 0.1  0.1 ];             % minimum expected parameter values
p_max  = [ 0.9  2.0 ];             % maximum expected parameter values
p_init = [ 0.5  1.0 ];             % initial guess for parameter values

t = [ x_dat y_dat ];               % x and y are column vectors of independent variables

[ p_fit, Chi_sq, sigma_p, sigma_y, corr, R2, cvg_hst ] = ...
    lm ( 'lm_func2d', p_init, t, z_dat, weight, 0.1, p_min, p_max );

with the m-function lm_func2d.m

function z_hat = lm_func2d(t,p)
% example function used for nonlinear least squares curve-fitting
% to demonstrate the Levenberg-Marquardt function, lm.m,
% in two fitting dimensions

x_dat = t(:,1);
y_dat = t(:,2);
z_hat = ( p(1)*x_dat.^p(2) + (1-p(1))*y_dat.^p(2) ).^(1/p(2));

Remarks

This text and lm.m were written in an attempt to understand and explain methods of nonlinear least squares for curve-fitting applications.

It is important to remember that the purpose of applying statistical methods for data analysis is primarily to estimate parameter statistics... not only the parameter values themselves. Reasonable parameter values can often be found using non-statistical minimization algorithms, such as random search methods, the Nelder-Mead simplex method, or simply gridding the parameter space and finding the best combination of parameter values.

Nonlinear least squares problems can have objective functions with multiple local minima. Fitting algorithms will converge to different local minima depending upon values of the initial guess, the measurement noise, and algorithmic parameters. It is perfectly appropriate and good to use the best available estimate of the desired parameters as the initial guess. In the absence of physical insight into a curve-fitting problem, a reasonable initial guess may be found by coarsely gridding the parameter space and finding the best combination of parameter values. There is no sense in forcing any statistical curve-fitting algorithm to work too hard by starting it with a poor initial guess. In most applications, parameters identified from neighboring initial guesses (±5%) should converge to similar parameter estimates (±0.1%). The fit statistics of these converged parameters should be the same, within two or three significant figures.

References

[1] A. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
[2] K. Levenberg, "A Method for the Solution of Certain Non-Linear Problems in Least Squares," The Quarterly of Applied Mathematics, 2:164-168, 1944.
[3] M.I.A. Lourakis, "A brief description of the Levenberg-Marquardt algorithm implemented by levmar," Technical Report, Institute of Computer Science, Foundation for Research and Technology - Hellas, 2005.
[4] K. Madsen, H.B. Nielsen, and O. Tingleff, Methods for Nonlinear Least Squares Problems, Technical Report, Informatics and Mathematical Modeling, Technical University of Denmark, 2004.
[5] D.W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, 1963.
[6] H.B. Nielsen, "Damping Parameter in Marquardt's Method," Technical Report IMM-REP-1999-05, Dept. of Mathematical Modeling, Technical University of Denmark, 1999.
[7] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, Cambridge University Press, second edition, 1992.
[8] M.K. Transtrum, B.B. Machta, and J.P. Sethna, "Why are nonlinear fits to data so challenging?," Phys. Rev. Lett. 104, 060201 (2010).
[9] M.K. Transtrum and J.P. Sethna, "Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization," preprint submitted to Journal of Computational Physics, January 30, 2012.


More information

VHF Dipole effects on P-Band Beam Characteristics

VHF Dipole effects on P-Band Beam Characteristics VHF Dipole effects on P-Band Beam Characteristics D. A. Mitchell, L. J. Greenhill, C. Carilli, R. A. Perley January 7, 1 Overview To investigate any adverse effects on VLA P-band performance due the presence

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee J. Korean Math. Soc. 0 (0), No. 0, pp. 1 0 https://doi.org/10.4134/jkms.j160152 pissn: 0304-9914 / eissn: 2234-3008 DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM

More information

Neural Networks Lecture 3:Multi-Layer Perceptron

Neural Networks Lecture 3:Multi-Layer Perceptron Neural Networks Lecture 3:Multi-Layer Perceptron H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural

More information

Filtering via Rank-Reduced Hankel Matrix CEE 690, ME 555 System Identification Fall, 2013 c H.P. Gavin, September 24, 2013

Filtering via Rank-Reduced Hankel Matrix CEE 690, ME 555 System Identification Fall, 2013 c H.P. Gavin, September 24, 2013 Filtering via Rank-Reduced Hankel Matrix CEE 690, ME 555 System Identification Fall, 2013 c H.P. Gavin, September 24, 2013 This method goes by the terms Structured Total Least Squares and Singular Spectrum

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Matrix Notation Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them at end of class, pick them up end of next class. I need

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Exploring the energy landscape

Exploring the energy landscape Exploring the energy landscape ChE210D Today's lecture: what are general features of the potential energy surface and how can we locate and characterize minima on it Derivatives of the potential energy

More information

A Sobolev trust-region method for numerical solution of the Ginz

A Sobolev trust-region method for numerical solution of the Ginz A Sobolev trust-region method for numerical solution of the Ginzburg-Landau equations Robert J. Renka Parimah Kazemi Department of Computer Science & Engineering University of North Texas June 6, 2012

More information

Homework #2 Due Monday, April 18, 2012

Homework #2 Due Monday, April 18, 2012 12.540 Homework #2 Due Monday, April 18, 2012 Matlab solution codes are given in HW02_2012.m This code uses cells and contains the solutions to all the questions. Question 1: Non-linear estimation problem

More information

Nonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp

Nonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp Nonlinear Regression Summary... 1 Analysis Summary... 4 Plot of Fitted Model... 6 Response Surface Plots... 7 Analysis Options... 10 Reports... 11 Correlation Matrix... 12 Observed versus Predicted...

More information

SECTION 7: CURVE FITTING. MAE 4020/5020 Numerical Methods with MATLAB

SECTION 7: CURVE FITTING. MAE 4020/5020 Numerical Methods with MATLAB SECTION 7: CURVE FITTING MAE 4020/5020 Numerical Methods with MATLAB 2 Introduction Curve Fitting 3 Often have data,, that is a function of some independent variable,, but the underlying relationship is

More information

Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation

The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation Rosemary Renaut Collaborators: Jodi Mead and Iveta Hnetynkova DEPARTMENT OF MATHEMATICS

More information

Lecture 17: Numerical Optimization October 2014

Lecture 17: Numerical Optimization October 2014 Lecture 17: Numerical Optimization 36-350 22 October 2014 Agenda Basics of optimization Gradient descent Newton s method Curve-fitting R: optim, nls Reading: Recipes 13.1 and 13.2 in The R Cookbook Optional

More information