The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems

© Henri P. Gavin
Department of Civil and Environmental Engineering
Duke University
March 9, 2016

Abstract

The Levenberg-Marquardt method is a standard technique used to solve nonlinear least squares problems. Least squares problems arise when fitting a parameterized function to a set of measured data points by minimizing the sum of the squares of the errors between the data points and the function. Nonlinear least squares problems arise when the function is not linear in the parameters. Nonlinear least squares methods involve an iterative improvement to parameter values in order to reduce the sum of the squares of the errors between the function and the measured data points. The Levenberg-Marquardt curve-fitting method is actually a combination of two minimization methods: the gradient descent method and the Gauss-Newton method. In the gradient descent method, the sum of the squared errors is reduced by updating the parameters in the steepest-descent direction. In the Gauss-Newton method, the sum of the squared errors is reduced by assuming the least squares function is locally quadratic, and finding the minimum of the quadratic. The Levenberg-Marquardt method acts more like a gradient-descent method when the parameters are far from their optimal value, and acts more like the Gauss-Newton method when the parameters are close to their optimal value. This document describes these methods and illustrates the use of software to solve nonlinear least squares curve-fitting problems.

1 Introduction

In fitting a function $\hat{y}(t; p)$ of an independent variable $t$ and a vector of $n$ parameters $p$ to a set of $m$ data points $(t_i, y_i)$, it is customary and convenient to minimize the sum of the weighted squares of the errors (or weighted residuals) between the measured data $y(t_i)$ and the curve-fit function $\hat{y}(t_i; p)$. This scalar-valued goodness-of-fit measure is called the chi-squared error criterion.

$$\chi^2(p) = \sum_{i=1}^{m} \left[ \frac{y(t_i) - \hat{y}(t_i; p)}{w_i} \right]^2 \tag{1}$$

$$= (y - \hat{y}(p))^T W (y - \hat{y}(p)) \tag{2}$$

$$= y^T W y - 2 y^T W \hat{y} + \hat{y}^T W \hat{y} \tag{3}$$

The value $w_i$ is a measure of the error in measurement $y(t_i)$. The weighting matrix $W$ is diagonal with $W_{ii} = 1/w_i^2$. If the function $\hat{y}$ is nonlinear in the model parameters $p$, then the minimization of $\chi^2$ with respect to the parameters must be carried out iteratively. The goal of each iteration is to find a perturbation $h$ to the parameters $p$ that reduces $\chi^2$.
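As a small numerical check of equations (1) and (2), the following Python sketch (independent of lm.m, standard library only) evaluates the chi-squared criterion both as a sum over data points and in the diagonal-matrix form. The three data values, model values, and measurement errors are hypothetical numbers chosen for illustration.

```python
def chi_squared_sum(y, y_hat, w):
    """Equation (1): sum of squared residuals, each divided by its error w_i."""
    return sum(((yi - yhi) / wi) ** 2 for yi, yhi, wi in zip(y, y_hat, w))

def chi_squared_matrix(y, y_hat, w):
    """Equation (2) with a diagonal W, W_ii = 1 / w_i^2."""
    W = [1.0 / wi ** 2 for wi in w]
    r = [yi - yhi for yi, yhi in zip(y, y_hat)]
    return sum(ri * Wi * ri for ri, Wi in zip(r, W))

# Hypothetical measurements, model values, and measurement errors.
y     = [1.0, 2.0, 4.0]
y_hat = [1.5, 2.0, 3.0]
w     = [0.5, 1.0, 2.0]

chi2_a = chi_squared_sum(y, y_hat, w)     # (-0.5/0.5)^2 + 0 + (1/2)^2 = 1.25
chi2_b = chi_squared_matrix(y, y_hat, w)  # same value from the matrix form
```

Both forms give the same number; the matrix form is the one used in the derivations that follow.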
2 The Gradient Descent Method

The steepest descent method is a general minimization method which updates parameter values in the "downhill" direction: the direction opposite to the gradient of the objective function. The gradient descent method converges well for problems with simple objective functions [3, 4]. For problems with thousands of parameters, gradient descent methods are sometimes the only viable method.

The gradient of the chi-squared objective function with respect to the parameters is

$$\frac{\partial}{\partial p} \chi^2 = 2 (y - \hat{y}(p))^T W \frac{\partial}{\partial p} (y - \hat{y}(p)) \tag{4}$$

$$= -2 (y - \hat{y}(p))^T W \left[ \frac{\partial \hat{y}(p)}{\partial p} \right] \tag{5}$$

$$= -2 (y - \hat{y})^T W J \tag{6}$$

where the $m \times n$ Jacobian matrix $[\partial \hat{y} / \partial p]$ represents the local sensitivity of the function $\hat{y}$ to variation in the parameters $p$. For notational simplicity $J$ will be used for $[\partial \hat{y} / \partial p]$. The parameter update $h$ that moves the parameters in the direction of steepest descent is given by

$$h_{gd} = \alpha J^T W (y - \hat{y}), \tag{7}$$

where the positive scalar $\alpha$ determines the length of the step in the steepest-descent direction. Any constant factor in the gradient is absorbed into $\alpha$.

3 The Gauss-Newton Method

The Gauss-Newton method is a method for minimizing a sum-of-squares objective function. It presumes that the objective function is approximately quadratic in the parameters near the optimal solution [1]. For moderately-sized problems the Gauss-Newton method typically converges much faster than gradient-descent methods [5].

The function evaluated with perturbed model parameters may be locally approximated through a first-order Taylor series expansion,

$$\hat{y}(p + h) \approx \hat{y}(p) + \left[ \frac{\partial \hat{y}}{\partial p} \right] h = \hat{y} + J h. \tag{8}$$

Substituting the approximation for the perturbed function, $\hat{y} + Jh$, for $\hat{y}$ in equation (3),

$$\chi^2(p + h) \approx y^T W y + \hat{y}^T W \hat{y} - 2 y^T W \hat{y} - 2 (y - \hat{y})^T W J h + h^T J^T W J h. \tag{9}$$

This shows that $\chi^2$ is approximately quadratic in the perturbation $h$, and that the Hessian of the chi-squared fit criterion is approximately $J^T W J$.

The parameter update $h$ that minimizes $\chi^2$ is found from $\partial \chi^2 / \partial h = 0$:

$$\frac{\partial}{\partial h} \chi^2(p + h) \approx -2 (y - \hat{y})^T W J + 2 h^T J^T W J, \tag{10}$$

and the resulting normal equations for the Gauss-Newton update are

$$\left[ J^T W J \right] h_{gn} = J^T W (y - \hat{y}). \tag{11}$$
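To make the two updates concrete, here is a minimal Python sketch (standard library only, not taken from lm.m) that computes one gradient descent step, equation (7), and one Gauss-Newton step, equation (11), for a straight-line model. The model, the data, the helper names `solve2`, `chi2`, and `steps`, and the step length `alpha = 0.01` are all assumptions made for illustration. Because the straight line is linear in its parameters, the quadratic approximation (9) is exact and a single Gauss-Newton step lands on the minimum, while the gradient descent step only reduces $\chi^2$.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def chi2(t, y, p, w):
    """Weighted sum of squared residuals for the line y_hat = p[0] + p[1]*t."""
    return sum(((yi - (p[0] + p[1] * ti)) / wi) ** 2 for ti, yi, wi in zip(t, y, w))

def steps(t, y, p, w, alpha):
    """Return (h_gd, h_gn) from equations (7) and (11)."""
    m = len(t)
    r = [yi - (p[0] + p[1] * ti) for ti, yi in zip(t, y)]   # y - y_hat
    J = [[1.0, ti] for ti in t]                              # d y_hat_i / d p_j
    W = [1.0 / wi ** 2 for wi in w]                          # diagonal of W
    JtWr = [sum(J[i][a] * W[i] * r[i] for i in range(m)) for a in range(2)]
    JtWJ = [[sum(J[i][a] * W[i] * J[i][b] for i in range(m)) for b in range(2)]
            for a in range(2)]
    h_gd = [alpha * g for g in JtWr]   # equation (7)
    h_gn = solve2(JtWJ, JtWr)          # equation (11)
    return h_gd, h_gn

t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]   # hypothetical noise-free data on the line 1 + 2t
w = [1.0, 1.0, 1.0, 1.0]
p0 = [0.0, 0.0]

h_gd, h_gn = steps(t, y, p0, w, alpha=0.01)
p_gd = [p0[0] + h_gd[0], p0[1] + h_gd[1]]   # small downhill move
p_gn = [p0[0] + h_gn[0], p0[1] + h_gn[1]]   # reaches [1, 2] in one step
```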
4 The Levenberg-Marquardt Method

The Levenberg-Marquardt algorithm adaptively varies the parameter updates between the gradient descent update and the Gauss-Newton update,

$$\left[ J^T W J + \lambda I \right] h_{lm} = J^T W (y - \hat{y}), \tag{12}$$

where small values of the algorithmic parameter $\lambda$ result in a Gauss-Newton update and large values of $\lambda$ result in a gradient descent update. The parameter $\lambda$ is initialized to be large so that first updates are small steps in the steepest-descent direction. If an iteration happens to result in a worse approximation, $\lambda$ is increased. As the solution improves, $\lambda$ is decreased, the Levenberg-Marquardt method approaches the Gauss-Newton method, and the solution typically accelerates to the local minimum [3, 4, 5].

Marquardt's suggested update relationship [5],

$$\left[ J^T W J + \lambda \, \mathrm{diag}(J^T W J) \right] h_{lm} = J^T W (y - \hat{y}), \tag{13}$$

makes the effect of particular values of $\lambda$ less problem-specific, and is used in the Levenberg-Marquardt algorithm implemented in the Matlab function lm.m.

4.1 Numerical Implementation

Many variations of the Levenberg-Marquardt method have been published in papers and in code. This document borrows from some of these, including the enhancement of a rank-1 Jacobian update. In iteration $i$, the step $h$ is evaluated by comparing $\chi^2(p)$ to $\chi^2(p + h)$. The step is accepted if the metric $\rho_i$ [6] is greater than a user-specified threshold, $\epsilon_4 > 0$. This metric is a measure of the actual improvement in $\chi^2$ as compared to the improvement of an LM update assuming the approximation (8) were exact.

$$\rho_i(h_{lm}) = \frac{\chi^2(p) - \chi^2(p + h_{lm})}{(y - \hat{y})^T W (y - \hat{y}) - (y - \hat{y} - J h_{lm})^T W (y - \hat{y} - J h_{lm})} \tag{14}$$

$$= \frac{\chi^2(p) - \chi^2(p + h_{lm})}{h_{lm}^T \left( \lambda_i h_{lm} + J^T W (y - \hat{y}(p)) \right)} \quad \text{if using equation (12) for } h_{lm} \tag{15}$$

$$= \frac{\chi^2(p) - \chi^2(p + h_{lm})}{h_{lm}^T \left( \lambda_i \, \mathrm{diag}(J^T W J) \, h_{lm} + J^T W (y - \hat{y}(p)) \right)} \quad \text{if using equation (13) for } h_{lm} \tag{16}$$

If in an iteration $\rho_i(h) > \epsilon_4$ then $p + h$ is sufficiently better than $p$, $p$ is replaced by $p + h$, and $\lambda$ is reduced by a factor. Otherwise $\lambda$ is increased by a factor, and the algorithm proceeds to the next iteration.

4.1.1 Initialization and update of the L-M parameter, λ, and the parameters p

In lm.m users may select one of three methods for initializing and updating $\lambda$ and $p$.

1. $\lambda_0 = \lambda_o$; $\lambda_o$ is user-specified [5].
   Use equation (13) for $h_{lm}$ and equation (16) for $\rho$.
   If $\rho_i(h) > \epsilon_4$: $p \leftarrow p + h$; $\lambda_{i+1} = \max[\lambda_i / L_\downarrow, 10^{-7}]$;
   otherwise: $\lambda_{i+1} = \min[\lambda_i L_\uparrow, 10^{7}]$.

2. $\lambda_0 = \lambda_o \max[\mathrm{diag}[J^T W J]]$; $\lambda_o$ is user-specified.
   Use equation (12) for $h_{lm}$ and equation (15) for $\rho$.
   $\alpha = \dfrac{\left( J^T W (y - \hat{y}(p)) \right)^T h}{\left( \chi^2(p + h) - \chi^2(p) \right)/2 + 2 \left( J^T W (y - \hat{y}(p)) \right)^T h}$;
   If $\rho_i(\alpha h) > \epsilon_4$: $p \leftarrow p + \alpha h$; $\lambda_{i+1} = \max[\lambda_i / (1 + \alpha), 10^{-7}]$;
   otherwise: $\lambda_{i+1} = \lambda_i + |\chi^2(p + \alpha h) - \chi^2(p)| / (2\alpha)$.

3. $\lambda_0 = \lambda_o \max[\mathrm{diag}[J^T W J]]$; $\lambda_o$ is user-specified [6].
   Use equation (12) for $h_{lm}$ and equation (15) for $\rho$.
   If $\rho_i(h) > \epsilon_4$: $p \leftarrow p + h$; $\lambda_{i+1} = \lambda_i \max[1/3, 1 - (2\rho_i - 1)^3]$; $\nu_i = 2$;
   otherwise: $\lambda_{i+1} = \lambda_i \nu_i$; $\nu_{i+1} = 2 \nu_i$.

For the examples in section 4.4, method 1 [5] with $L_\uparrow \approx 11$ and $L_\downarrow \approx 9$ has good convergence properties.

4.1.2 Computation and rank-1 update of the Jacobian, [∂ŷ/∂p]

In the first iteration, in every 2n iterations, and in iterations where $\chi^2(p + h) > \chi^2(p)$, the Jacobian ($J \in \mathbb{R}^{m \times n}$) is numerically approximated using forward differences,

$$J_{ij} = \frac{\partial \hat{y}_i}{\partial p_j} = \frac{\hat{y}(t_i; p + \delta p_j) - \hat{y}(t_i; p)}{\| \delta p_j \|}, \tag{17}$$

or central differences (default),

$$J_{ij} = \frac{\partial \hat{y}_i}{\partial p_j} = \frac{\hat{y}(t_i; p + \delta p_j) - \hat{y}(t_i; p - \delta p_j)}{2 \| \delta p_j \|}, \tag{18}$$

where the j-th element of $\delta p_j$ is the only non-zero element and is set to $\Delta_j (1 + |p_j|)$. In all other iterations, the Jacobian is updated using the Broyden rank-1 update formula,

$$J = J + \left( (\hat{y}(p + h) - \hat{y}(p) - J h) \, h^T \right) / \left( h^T h \right). \tag{19}$$

For problems with many parameters, a finite differences Jacobian is computationally expensive. Convergence can be achieved with fewer function evaluations if the Jacobian is re-computed using finite differences only occasionally. The rank-1 Jacobian update equation (19) requires no additional function evaluations.

4.1.3 Convergence criteria

Convergence is achieved when one of the following three criteria is satisfied:

- Convergence in the gradient, $\max |J^T W (y - \hat{y})| < \epsilon_1$;
- Convergence in parameters, $\max |h_i / p_i| < \epsilon_2$; or
- Convergence in $\chi^2$, $\chi^2 / (m - n + 1) < \epsilon_3$.

Otherwise, iterations terminate when the iteration count exceeds a pre-specified limit.
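The pieces of section 4.1 can be assembled into a short loop. The Python sketch below (standard library only) is not lm.m: it is a stripped-down illustration under stated assumptions. It fits a hypothetical two-parameter decay model with unit weights ($W = I$), uses the central-difference Jacobian of equation (18) at every iteration (no Broyden update), Marquardt's scaled update (13), the acceptance metric (16), and update method 1 with factors 11 and 9. The function names, the data, the tolerance `eps1`, and the initial guess are illustrative choices, and only the gradient convergence test is applied.

```python
import math

def model(t, p):
    """Hypothetical two-parameter decay model y_hat = p1 * exp(-t / p2)."""
    return [p[0] * math.exp(-ti / p[1]) for ti in t]

def chi2(y, y_hat):
    """Sum of squared residuals (unit weights, W = I)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def jacobian(t, p, delta=1e-3):
    """Central differences, eq. (18), with step delta * (1 + |p_j|)."""
    cols = []
    for j in range(len(p)):
        dp = delta * (1.0 + abs(p[j]))
        hi = list(p); hi[j] += dp
        lo = list(p); lo[j] -= dp
        cols.append([(a - b) / (2.0 * dp)
                     for a, b in zip(model(t, hi), model(t, lo))])
    return cols   # cols[j][i] = d y_hat_i / d p_j

def lm_fit(t, y, p, lam=1e-2, L_up=11.0, L_dn=9.0,
           eps1=1e-10, eps4=1e-1, max_iter=100):
    p = list(p)
    for _ in range(max_iter):
        r = [a - b for a, b in zip(y, model(t, p))]            # y - y_hat
        Jc = jacobian(t, p)
        g = [sum(c * ri for c, ri in zip(Jc[a], r)) for a in range(2)]   # J^T W r
        if max(abs(gi) for gi in g) < eps1:                    # gradient converged
            break
        A = [[sum(u * v for u, v in zip(Jc[a], Jc[b])) for b in range(2)]
             for a in range(2)]                                # J^T W J
        d = [A[0][0], A[1][1]]                                 # diag(J^T W J)
        # eq. (13): [J^T W J + lam*diag(J^T W J)] h = J^T W r (Cramer's rule)
        a00, a11 = A[0][0] + lam * d[0], A[1][1] + lam * d[1]
        det = a00 * a11 - A[0][1] * A[1][0]
        h = [(g[0] * a11 - A[0][1] * g[1]) / det,
             (a00 * g[1] - A[1][0] * g[0]) / det]
        p_try = [p[0] + h[0], p[1] + h[1]]
        # eq. (16): actual reduction over the reduction predicted by (8)
        rho = (chi2(y, model(t, p)) - chi2(y, model(t, p_try))) / \
              sum(h[a] * (lam * d[a] * h[a] + g[a]) for a in range(2))
        if rho > eps4:                                         # accept the step
            p, lam = p_try, max(lam / L_dn, 1e-7)
        else:                                                  # reject, damp more
            lam = min(lam * L_up, 1e7)
    return p

t = [float(k) for k in range(10)]
y = model(t, [2.0, 3.0])            # noise-free data from hypothetical true values
p_fit = lm_fit(t, y, [1.0, 1.0])    # start from a deliberately poor guess
```

With noise-free data the residual vanishes at the true parameters, so the loop should recover them closely even though the Jacobian is only a finite-difference approximation.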
4.2 Error Analysis

Once the optimal curve-fit parameters $p_{fit}$ are determined, parameter statistics are computed for the converged solution using the same weight value $w$ for every data point. The square of this weight value is equal to the mean square measurement error, $\sigma_y^2$,

$$w^2 = \sigma_y^2 = \frac{1}{m - n + 1} (y - \hat{y}(p_{fit}))^T (y - \hat{y}(p_{fit})). \tag{20}$$

The parameter covariance matrix is then computed from

$$V_p = [J^T W J]^{-1}, \tag{21}$$

with $W = I / w^2$. If a data covariance matrix $V_y$ is provided then $W$ should be set to $V_y^{-1}$.

The asymptotic standard parameter errors are given by

$$\sigma_p = \sqrt{\mathrm{diag}([J^T W J]^{-1})}. \tag{22}$$

The asymptotic standard parameter error is a measure of how unexplained variability in the data propagates to variability in the parameters, and is essentially an error measure for the parameters.

The standard error of the fit is given by

$$\sigma_{\hat{y}} = \sqrt{\mathrm{diag}(J [J^T W J]^{-1} J^T)}. \tag{23}$$

The standard error of the fit indicates how variability in the parameters affects the variability in the curve-fit.

The asymptotic standard prediction error reflects the standard error of the fit as well as the mean square measurement error,

$$\sigma_{\hat{y}p} = \sqrt{\sigma_y^2 + \mathrm{diag}(J [J^T W J]^{-1} J^T)}. \tag{24}$$

4.3 Matlab code: lm.m

The Matlab function lm.m implements the Levenberg-Marquardt method for curve-fitting problems. The code, with examples, is available here:

hpgavin/lm.m
hpgavin/lm_examp.m
hpgavin/lm_func.m
hpgavin/lm_plots.m
    function [p,X2,sigma_p,sigma_y,corr_p,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
    % [p,X2,sigma_p,sigma_y,corr_p,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
    %
    % Levenberg-Marquardt curve-fitting: minimize sum of weighted squared residuals
    % INPUT VARIABLES
    % func   = function of the independent variables, t, and the n parameters, p,
    %          returning the simulated model: y_hat = func(t,p,c)
    % p      = n-vector of initial guess of parameter values
    % t      = m-vectors or matrix of independent variables (used as arg to func)
    % y_dat  = m-vectors or matrix of data to be fit by func(t,p)
    % weight = weighting vector for least squares fit ( weight >= 0 )
    %          inverse of the standard measurement errors
    %          Default: sqrt(d.o.f. / ( y_dat' * y_dat ))
    % dp     = fractional increment of p for numerical derivatives
    %          dp(j)>0 central differences calculated
    %          dp(j)<0 one sided backwards differences calculated
    %          dp(j)=0 sets corresponding partials to zero; i.e. holds p(j) fixed
    %          Default: 0.001;
    % p_min  = n-vector of lower bounds for parameter values
    % p_max  = n-vector of upper bounds for parameter values
    % c      = an optional matrix of values passed to func(t,p,c)
    % opts   = vector of algorithmic parameters
    %             parameter      defaults    meaning
    % opts(1)  =  prnt             3         >1 intermediate results; >2 plots
    % opts(2)  =  MaxIter          Npar      maximum number of iterations
    % opts(3)  =  epsilon_1        1e-3      convergence tolerance for gradient
    % opts(4)  =  epsilon_2        1e-3      convergence tolerance for parameters
    % opts(5)  =  epsilon_3        1e-3      convergence tolerance for Chi-square
    % opts(6)  =  epsilon_4        1e-1      determines acceptance of a L-M step
    % opts(7)  =  lambda_0         1e-2      initial value of L-M parameter
    % opts(8)  =  lambda_UP_fac    11        factor for increasing lambda
    % opts(9)  =  lambda_DN_fac    9         factor for decreasing lambda
    % opts(10) =  Update_Type      1         1: Levenberg-Marquardt lambda update
    %                                        2: Quadratic update
    %                                        3: Nielsen's lambda update equations
    %
    % OUTPUT VARIABLES
    % p       = least-squares optimal estimate of the parameter values
    % X2      = Chi squared criteria
    % sigma_p = asymptotic standard error of the parameters
    % sigma_y = asymptotic standard error of the curve-fit
    % corr_p  = correlation matrix of the parameters
    % R_sq    = R-squared coefficient of multiple determination
    % cvg_hst = convergence history

The m-file to solve a least-squares curve-fit problem with lm.m can be as simple as:

    my_data = load('my_data_file');   % load the data
    t       = my_data(:,1);           % if the independent variable is in column 1
    y_dat   = my_data(:,2);           % if the dependent variable is in column 2

    p_min  = [ ... ];                 % minimum expected parameter values
    p_max  = [ ... ];                 % maximum expected parameter values
    p_init = [ ... ];                 % initial guess for parameter values

    [ p_fit, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst ] = ...
        lm ( 'lm_func', p_init, t, y_dat, 1, -.1, p_min, p_max )

where the user-supplied function lm_func.m could be, for example,

    function y_hat = lm_func(t,p,c)
    y_hat = p(1) * t .* exp(-t/p(2)) .* cos(2*pi*( p(3)*t - p(4) ));
It is common and desirable to repeat the same experiment two or more times and to estimate a single set of curve-fit parameters from all the experiments. In such cases the data file may be arranged as follows:

    % t-variable    y (1st experiment)    y (2nd experiment)    y (3rd experiment)
         :                  :                     :                     :
        etc.               etc.                  etc.                  etc.

If your data is arranged as above you may prepare the data for lm.m using the following lines.

    my_data   = load('my_data_file');   % load the data
    t_column  = 1;                      % column of the independent variable
    y_columns = [ 2 3 4 ];              % columns of the measured dependent variables

    y_dat = my_data(:,y_columns);       % the measured data
    y_dat = y_dat(:);                   % a single column vector

    t = my_data(:,t_column);            % the independent variable
    t = t*ones(1,length(y_columns));    % a column of t for each column of y
    t = t(:);                           % a single column vector
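The same column-stacking idea can be sketched outside Matlab. The Python fragment below (standard library only) uses a small hypothetical table with one time column and three repeated experiments; note that Python columns are numbered from 0, unlike the Matlab lines above. The variable names mirror the Matlab ones for readability but are otherwise illustrative.

```python
# Hypothetical data table: one row per time sample,
# column 0 = t, columns 1..3 = y from three repeated experiments.
my_data = [
    [0.0, 1.0, 1.1, 0.9],
    [1.0, 2.0, 2.1, 1.9],
    [2.0, 3.0, 3.1, 2.9],
]
t_column = 0
y_columns = [1, 2, 3]

# Stack the y columns one after another (column-major order, like y_dat(:)).
y_dat = [row[c] for c in y_columns for row in my_data]

# Repeat the t column once per experiment so t and y_dat line up element by element.
t = [row[t_column] for _ in y_columns for row in my_data]
```

After stacking, `t[k]` is the time at which `y_dat[k]` was measured, which is all a residual-based fit needs.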
Note that the arguments t and y_dat to lm.m may be matrices as long as the dimensions of t match the dimensions of y_dat. The columns of t need not be identical.

Results may be plotted with lm_plots.m:

    function lm_plots ( t, y_dat, y_fit, sigma_y, cvg_hst, filename )
    % lm_plots ( t, y_dat, y_fit, sigma_y, cvg_hst, filename )
    % Plot statistics of the results of a Levenberg-Marquardt least squares
    % analysis with lm.m

    y_dat = y_dat(:);
    y_fit = y_fit(:);

    [max_it,n] = size(cvg_hst); n = n-3;

    figure(1);  % plot convergence history of parameters, chi^2, lambda
     clf
     subplot(211)
      plot( cvg_hst(:,1), cvg_hst(:,2:n+1), '-o', 'linewidth',4);
      for i=1:n
       text(1.2*cvg_hst(max_it,1), cvg_hst(max_it,1+i), sprintf('%d',i) );
      end
      ylabel('parameter values')
     subplot(212)
      semilogy( cvg_hst(:,1), [ cvg_hst(:,n+2) cvg_hst(:,n+3) ], '-o', 'linewidth',4)
      text(0.8*cvg_hst(max_it,1), 0.2*cvg_hst(max_it,n+2), '\chi^2/(m-n+1)' );
      text(0.8*cvg_hst(max_it,1), 0.5*cvg_hst(max_it,n+3), '\lambda' );
    % legend('\chi^2/(m-n+1)','\lambda',3);
      ylabel('\chi^2/(m-n+1) and \lambda')
      xlabel('function calls')

    figure(2);  % plot data, fit, and confidence interval of fit
     clf
      plot(t,y_dat,'og', t,y_fit,'-b', ...
           t,y_fit+1.96*sigma_y,'.k', t,y_fit-1.96*sigma_y,'.k');
      legend('y_{data}','y_{fit}','95% c.i.','');
      ylabel('y(t)')
      xlabel('t')
    % subplot(212)
    %  semilogy(t,sigma_y,'-r','linewidth',4);
    %  ylabel('\sigma_y(t)')

    figure(3);  % plot histogram of residuals, are they Gaussian?
     clf
      hist(real(y_dat - y_fit))
      title('histogram of residuals')
      axis('tight')
      xlabel('y_{data} - y_{fit}')
      ylabel('count')
4.4 Numerical Examples

The robustness of lm.m is tested in three numerical examples by curve-fitting simulated experimental measurements. Noisy experimental measurements $y$ are simulated by adding random measurement noise to the curve-fit function evaluated with a set of "true" parameter values, $\hat{y}(t; p_{true})$. The random measurement noise is normally distributed with a mean of zero and a standard deviation of 0.2.

$$y_i = \hat{y}(t_i; p_{true}) + N(0, 0.2). \tag{25}$$

The convergence of the parameters from an erroneous initial guess $p_{initial}$ to values closer to $p_{true}$ is then examined.

Each numerical example below has four parameters (n = 4) and one-hundred measurements (m = 100). Each numerical example has a different curve-fit function $\hat{y}(t; p)$, a different "true" parameter vector $p_{true}$, and a different vector of initial parameters $p_{initial}$.

For several values of $p_2$ and $p_4$, the $\chi^2$ error criterion is calculated and is plotted as a surface over the $p_2$-$p_4$ plane. The bowl-shaped nature of the objective function is clearly evident in each example. The objective function may not appear quadratic in the parameters and the objective function may have multiple minima. The presence of measurement noise does not affect the smoothness of the objective function.

The gradient descent method endeavors to move parameter values in a down-hill direction to minimize $\chi^2(p)$. This often requires small step sizes but is required when the objective function is not quadratic. The Gauss-Newton method approximates the bowl shape as a quadratic and endeavors to move parameter values to the minimum in a small number of steps. This method works well when the parameters are close to their optimal values. The Levenberg-Marquardt method retains the best features of both the gradient-descent method and the Gauss-Newton method.

The evolution of the parameter values, the evolution of $\chi^2$, and the evolution of $\lambda$ from iteration to iteration are plotted for each example.

The simulated experimental data, the curve fit, and the 95-percent confidence interval of the fit are plotted. The standard error of the fit and a histogram of the fit errors are also plotted.

The initial parameter values $p_{initial}$, the true parameter values $p_{true}$, the fit parameter values $p_{fit}$, the standard error of the fit parameters $\sigma_p$, and the correlation matrix of the fit parameters are tabulated. The true parameter values lie within the confidence interval $p_{fit} - 1.96 \sigma_p < p_{true} < p_{fit} + 1.96 \sigma_p$ with a confidence level of 95 percent.
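The confidence interval quoted above follows from the error analysis of section 4.2. As a hedged illustration, the Python sketch below (standard library only, not part of lm.m) works through equations (20) to (22) for a straight-line fit to four hypothetical data points, where the Jacobian rows are simply $[1, t_i]$ and the 2-by-2 inverse can be written out by hand. It uses the document's $m - n + 1$ divisor; all variable names and data values are assumptions for illustration.

```python
import math

t = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 0.9, 2.1, 2.9]            # hypothetical measurements
m, n = len(t), 2

# Least-squares straight line y_hat = p1 + p2*t; Jacobian rows are [1, t_i].
S0, S1, S2 = float(m), sum(t), sum(ti * ti for ti in t)   # entries of J^T J
b0, b1 = sum(y), sum(ti * yi for ti, yi in zip(t, y))     # entries of J^T y
det = S0 * S2 - S1 * S1
p_fit = [(b0 * S2 - S1 * b1) / det, (S0 * b1 - S1 * b0) / det]

# Mean square measurement error, eq. (20), with the m - n + 1 divisor.
r = [yi - (p_fit[0] + p_fit[1] * ti) for ti, yi in zip(t, y)]
sigma_y2 = sum(ri * ri for ri in r) / (m - n + 1)

# Parameter covariance, eq. (21), with W = I / sigma_y^2:
# V_p = sigma_y^2 * (J^T J)^-1, whose diagonal is sigma_y^2 * [S2, S0] / det.
V_diag = [sigma_y2 * S2 / det, sigma_y2 * S0 / det]
sigma_p = [math.sqrt(v) for v in V_diag]                  # eq. (22)

# 95-percent intervals p_fit +/- 1.96 * sigma_p
intervals = [(pf - 1.96 * sp, pf + 1.96 * sp) for pf, sp in zip(p_fit, sigma_p)]
```

For these numbers the fit is `p_fit = [0.06, 0.96]`, and the intercept has the larger standard error because the data constrain the slope more tightly than the offset.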
4.4.1 Example 1

Consider fitting the following function to a set of measured data.

$$\hat{y}(t; p) = p_1 \exp(-t/p_2) + p_3 \, t \exp(-t/p_4) \tag{26}$$

The m-function to be used with lm.m is simply:

    function y_hat = lm_func(t,p,c)
    y_hat = p(1)*exp(-t/p(2)) + p(3)*t.*exp(-t/p(4));

The "true" parameter values $p_{true}$, the initial parameter values $p_{initial}$, resulting curve-fit parameter values $p_{fit}$ and standard errors of the fit parameters $\sigma_p$ are shown in Table 1. The $R^2$ fit criterion is 98 percent. The standard parameter errors are less than one percent of the parameter values except for the standard error for $p_2$, which is 1. percent of $p_2$.

The parameter correlation matrix is given in Table 2. Parameters $p_3$ and $p_4$ are the most correlated at -96 percent. Parameters $p_1$ and $p_4$ are the least correlated at -3 percent.

The bowl-shaped nature of the $\chi^2$ objective function is shown in Figure 1(a). This shape is nearly quadratic and has a single minimum.

The convergence of the parameters and the evolution of $\chi^2$ and $\lambda$ are shown in Figure 1(b).

The data points, the curve fit, and the curve fit confidence band are plotted in Figure 1(c). Note that the standard error of the fit is smaller near the center of the fit domain and is larger at the edges of the domain.

A histogram of the difference between the data values and the curve-fit is shown in Figure 1(d). Ideally these curve-fit errors should be normally distributed.
Table 1. Parameter values and standard errors. Columns: $p_{initial}$, $p_{true}$, $p_{fit}$, $\sigma_p$, $\sigma_p / p_{fit}$ (%).

Table 2. Parameter correlation matrix. Rows and columns: $p_1$, $p_2$, $p_3$, $p_4$.

Figure 1. (a) The sum of the squared errors as a function of $p_2$ and $p_4$. (b) Top: the convergence of the parameters with each iteration; (b) Bottom: values of $\chi^2$ and $\lambda$ each iteration. (c) Top: data $y$, curve-fit $\hat{y}(t; p_{fit})$, curve-fit+error, and curve-fit-error; (c) Bottom: standard error of the fit, $\sigma_{\hat{y}}(t)$. (d) Histogram of the errors between the data and the fit.
4.4.2 Example 2

Consider fitting the following function to a set of measured data.

$$\hat{y}(t; p) = p_1 (t/\max(t)) + p_2 (t/\max(t))^2 + p_3 (t/\max(t))^3 + p_4 (t/\max(t))^4 \tag{27}$$

This function is linear in the parameters and may be fit using methods of linear least squares. The m-function to be used with lm.m is simply:

    function y_hat = lm_func(t,p,c)
    mt = max(t);
    y_hat = p(1)*(t/mt) + p(2)*(t/mt).^2 + p(3)*(t/mt).^3 + p(4)*(t/mt).^4;

The "true" parameter values $p_{true}$, the initial parameter values $p_{initial}$, resulting curve-fit parameter values $p_{fit}$ and standard errors of the fit parameters $\sigma_p$ are shown in Table 3. The $R^2$ fit criterion is 99.9 percent. In this example, the standard parameter errors are larger than in example 1. The standard error for $p_2$ is 12 percent and the standard error of $p_3$ is 17 percent. Note that a very high value of the $R^2$ coefficient of determination does not necessarily mean that parameter values have been found with great accuracy.

The parameter correlation matrix is given in Table 4. These parameters are highly correlated with one another, meaning that a change in one parameter will almost certainly result in changes in the other parameters.

The bowl-shaped nature of the $\chi^2$ objective function is shown in Figure 2(a). This shape is nearly quadratic and has a single minimum. The correlation of parameters $p_2$ and $p_4$, for example, is easily seen from this figure.

The convergence of the parameters and the evolution of $\chi^2$ and $\lambda$ are shown in Figure 2(b). The parameters converge monotonically to their final values.

The data points, the curve fit, and the curve fit confidence band are plotted in Figure 2(c). Note that the standard error of the fit approaches zero at $t = 0$ and is largest at the upper end of the range of $t$. This is because $\hat{y}(0; p) = 0$, regardless of the values in $p$.

A histogram of the difference between the data values and the curve-fit is shown in Figure 2(d). Ideally these curve-fit errors should be normally distributed, and they appear to be so in this example.
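Because equation (27) is linear in the parameters, the fit can also be written in closed form. The short derivation below is a sketch of that linear least squares route; the symbols $\tau_i$ and $X$ are notation introduced here for convenience and do not appear elsewhere in this document.

```latex
% Example 2 as a linear least squares problem.
% \tau_i and X are notation introduced for this sketch only.
\begin{align*}
\tau_i &= t_i / \max(t), \qquad X_{ij} = \tau_i^{\,j}, \quad j = 1, \dots, 4 \\
\hat{y}(p) &= X p \\
\chi^2(p) &= (y - X p)^T W (y - X p) \\
\frac{\partial \chi^2}{\partial p} &= -2 (y - X p)^T W X = 0
\quad \Longrightarrow \quad
\left[ X^T W X \right] p = X^T W y
\end{align*}
```

This is the Gauss-Newton normal equation (11) with $J = X$ and a starting point of $p = 0$, which is why an iterative method applied to this example should need very few Gauss-Newton-like steps once $\lambda$ is small.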
Table 3. Parameter values and standard errors. Columns: $p_{initial}$, $p_{true}$, $p_{fit}$, $\sigma_p$, $\sigma_p / p_{fit}$ (%).

Table 4. Parameter correlation matrix. Rows and columns: $p_1$, $p_2$, $p_3$, $p_4$.

Figure 2. (a) The sum of the squared errors as a function of $p_2$ and $p_4$. (b) Top: the convergence of the parameters with each iteration; (b) Bottom: values of $\chi^2$ and $\lambda$ each iteration. (c) Top: data $y$, curve-fit $\hat{y}(t; p_{fit})$, curve-fit+error, and curve-fit-error; (c) Bottom: standard error of the fit, $\sigma_{\hat{y}}(t)$. (d) Histogram of the errors between the data and the fit.
4.4.3 Example 3

Consider fitting the following function to a set of measured data.

$$\hat{y}(t; p) = p_1 \exp(-t/p_2) + p_3 \sin(t/p_4) \tag{28}$$

This function is nonlinear in the parameters $p_2$ and $p_4$, so it cannot be fit directly using methods of linear least squares. The m-function to be used with lm.m is simply:

    function y_hat = lm_func(t,p,c)
    y_hat = p(1)*exp(-t/p(2)) + p(3)*sin(t/p(4));

The "true" parameter values $p_{true}$, the initial parameter values $p_{initial}$, resulting curve-fit parameter values $p_{fit}$ and standard errors of the fit parameters $\sigma_p$ are shown in Table 5. The $R^2$ fit criterion is 99.8 percent. In this example, the standard parameter errors are all less than one percent.

The parameter correlation matrix is given in Table 6. Parameter $p_4$ is not correlated with the other parameters. Parameters $p_1$ and $p_2$ are most correlated at 73 percent.

The bowl-shaped nature of the $\chi^2$ objective function is shown in Figure 3(a). This shape is clearly not quadratic and has multiple minima. In this example, the initial guess for parameter $p_4$, the period of the oscillatory component, has to be within ten percent of the true value, otherwise the algorithm in lm.m will converge to a very small value of the amplitude of oscillation $p_3$ and an erroneous value for $p_4$. When such an occurrence arises, the standard errors $\sigma_p$ of the fit parameters $p_3$ and $p_4$ are quite large and the histogram of curve-fit errors (Figure 3(d)) is not normally distributed.

The convergence of the parameters and the evolution of $\chi^2$ and $\lambda$ are shown in Figure 3(b). The parameters converge monotonically to their final values.

The data points, the curve fit, and the curve fit confidence band are plotted in Figure 3(c).

A histogram of the difference between the data values and the curve-fit is shown in Figure 3(d). Ideally these curve-fit errors should be normally distributed, and they appear to be so in this example.
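One practical response to this sensitivity is to scan the troublesome parameter on a coarse grid before starting the iterative fit. The Python sketch below (standard library only) illustrates the idea on a deliberately simplified, hypothetical one-parameter model, $\hat{y} = \sin(t/p_4)$, rather than the full four-parameter function (28); the time grid, the assumed true value, and the candidate values are all illustrative.

```python
import math

def model(t, p4):
    """Hypothetical one-parameter oscillation y_hat = sin(t / p4)."""
    return [math.sin(ti / p4) for ti in t]

def chi2(y, y_hat):
    """Sum of squared residuals (unit weights)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

t = [0.5 * k for k in range(41)]            # t = 0, 0.5, ..., 20
p4_true = 2.0                                # hypothetical true value
y = model(t, p4_true)                        # noise-free "measurements"

grid = [0.5 + 0.25 * k for k in range(19)]   # candidate p4 values 0.5, 0.75, ..., 5.0
scan = [(chi2(y, model(t, p4)), p4) for p4 in grid]
best_chi2, p4_start = min(scan)              # grid point with the smallest chi-squared
```

The value `p4_start` would then serve as the initial guess for the oscillation parameter; with noisy data the grid minimum is only approximate, but it is usually enough to start the iteration in the right basin.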
Table 5. Parameter values and standard errors. Columns: $p_{initial}$, $p_{true}$, $p_{fit}$, $\sigma_p$, $\sigma_p / p_{fit}$ (%).

Table 6. Parameter correlation matrix. Rows and columns: $p_1$, $p_2$, $p_3$, $p_4$.

Figure 3. (a) The sum of the squared errors as a function of $p_2$ and $p_4$. (b) Top: the convergence of the parameters with each iteration; (b) Bottom: values of $\chi^2$ and $\lambda$ each iteration. (c) Top: data $y$, curve-fit $\hat{y}(t; p_{fit})$, curve-fit+error, and curve-fit-error; (c) Bottom: standard error of the fit, $\sigma_{\hat{y}}(t)$. (d) Histogram of the errors between the data and the fit.
4.5 Fitting in Multiple Dimensions

The code lm.m can carry out fitting in multiple dimensions. For example, the function

$$\hat{z}(x, y) = \left( p_1 x^{p_2} + (1 - p_1) y^{p_2} \right)^{1/p_2}$$

may be fit to data points $z_i(x_i, y_i)$, $(i = 1, \dots, m)$, with lm.m using an m-file such as

    my_data = load('my_data_file');   % load the data
    x_dat = my_data(:,1);             % if the independent variable x is in column 1
    y_dat = my_data(:,2);             % if the independent variable y is in column 2
    z_dat = my_data(:,3);             % if the dependent variable z is in column 3

    p_min  = [ 0.1 0.1 ];             % minimum expected parameter values
    p_max  = [ 0.9 2.0 ];             % maximum expected parameter values
    p_init = [ 0.5 1.0 ];             % initial guess for parameter values

    t = [ x_dat y_dat ];              % x and y are column vectors of independent variables

    [p_fit,Chi_sq,sigma_p,sigma_y,corr,R2,cvg_hst] = ...
        lm('lm_func2d',p_init,t,z_dat,weight,.1,p_min,p_max);

with the m-function lm_func2d.m

    function z_hat = lm_func2d(t,p)
    % example function used for nonlinear least squares curve-fitting
    % to demonstrate the Levenberg-Marquardt function, lm.m,
    % in two fitting dimensions

    x_dat = t(:,1);
    y_dat = t(:,2);
    z_hat = ( p(1)*x_dat.^p(2) + (1-p(1))*y_dat.^p(2) ).^(1/p(2));
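For readers who want to check the two-dimensional model outside Matlab, the following Python sketch (standard library only) evaluates the same formula point by point. The function name `z_hat` and the sample inputs are illustrative assumptions. With $p_2 = 1$ the model reduces to a weighted average of $x$ and $y$, and with $p_2 = 2$ to a weighted root-mean-square, which gives two easy sanity checks.

```python
def z_hat(x, y, p):
    """Evaluate z = (p1*x^p2 + (1 - p1)*y^p2)^(1/p2) for paired lists x and y."""
    return [(p[0] * xi ** p[1] + (1.0 - p[0]) * yi ** p[1]) ** (1.0 / p[1])
            for xi, yi in zip(x, y)]

# p2 = 1: weighted average of x and y.
z_avg = z_hat([2.0], [4.0], [0.5, 1.0])   # 0.5*2 + 0.5*4 = 3

# p2 = 2: weighted root-mean-square of x and y.
z_rms = z_hat([1.0], [7.0], [0.5, 2.0])   # sqrt(0.5*1 + 0.5*49) = 5
```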
5 Remarks

This text and lm.m were written in an attempt to understand and explain methods of nonlinear least squares for curve-fitting applications.

It is important to remember that the purpose of applying statistical methods for data analysis is primarily to estimate parameter statistics, not only the parameter values themselves. Reasonable parameter values can often be found using non-statistical minimization algorithms, such as random search methods, the Nelder-Mead simplex method, or simply gridding the parameter space and finding the best combination of parameter values.

Nonlinear least squares problems can have objective functions with multiple local minima. Fitting algorithms will converge to different local minima depending upon values of the initial guess, the measurement noise, and algorithmic parameters. It is perfectly appropriate and good to use the best available estimate of the desired parameters as the initial guess. In the absence of physical insight into a curve-fitting problem, a reasonable initial guess may be found by coarsely gridding the parameter space and finding the best combination of parameter values. There is no sense in forcing any statistical curve-fitting algorithm to work too hard by starting it with a poor initial guess. In most applications, parameters identified from neighboring initial guesses (±5%) should converge to similar parameter estimates (±0.1%). The fit statistics of these converged parameters should be the same, within two or three significant figures.

References

[1] A. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[2] K. Levenberg. A Method for the Solution of Certain Non-Linear Problems in Least Squares. The Quarterly of Applied Mathematics, 2:164-168, 1944.
[3] M.I.A. Lourakis. A brief description of the Levenberg-Marquardt algorithm implemented by levmar. Technical Report, Institute of Computer Science, Foundation for Research and Technology - Hellas, 2005.
[4] K. Madsen, H.B. Nielsen, and O. Tingleff. Methods for nonlinear least squares problems. Technical Report, Informatics and Mathematical Modeling, Technical University of Denmark, 2004.
[5] D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, 1963.
[6] H.B. Nielsen. Damping Parameter In Marquardt's Method. Technical Report IMM-REP-1999-05, Dept. of Mathematical Modeling, Technical University of Denmark, 1999.
[7] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C. Cambridge University Press, second edition, 1992.
[8] Mark K. Transtrum, Benjamin B. Machta, and James P. Sethna. Why are nonlinear fits to data so challenging? Phys. Rev. Lett. 104, 060201 (2010).
[9] Mark K. Transtrum and James P. Sethna. Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization. Preprint submitted to Journal of Computational Physics, January 30, 2012.
Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign
More informationLeast-Squares Fitting of Model Parameters to Experimental Data
Least-Squares Fitting of Model Parameters to Experimental Data Div. of Mathematical Sciences, Dept of Engineering Sciences and Mathematics, LTU, room E193 Outline of talk What about Science and Scientific
More informationMath 411 Preliminaries
Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector
More informationChapter 3 Numerical Methods
Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2
More informationNumerical solutions of nonlinear systems of equations
Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points
More informationMethods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent
Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More informationSparse Levenberg-Marquardt algorithm.
Sparse Levenberg-Marquardt algorithm. R. I. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004. Appendix 6 was used in part. The Levenberg-Marquardt
More information13. Nonlinear least squares
L. Vandenberghe ECE133A (Fall 2018) 13. Nonlinear least squares definition and examples derivatives and optimality condition Gauss Newton method Levenberg Marquardt method 13.1 Nonlinear least squares
More information2 Nonlinear least squares algorithms
1 Introduction Notes for 2017-05-01 We briefly discussed nonlinear least squares problems in a previous lecture, when we described the historical path leading to trust region methods starting from the
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted
More informationScientific Computing: Optimization
Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture
More informationIntelligent Control. Module I- Neural Networks Lecture 7 Adaptive Learning Rate. Laxmidhar Behera
Intelligent Control Module I- Neural Networks Lecture 7 Adaptive Learning Rate Laxmidhar Behera Department of Electrical Engineering Indian Institute of Technology, Kanpur Recurrent Networks p.1/40 Subjects
More informationnonrobust estimation The n measurement vectors taken together give the vector X R N. The unknown parameter vector is P R M.
Introduction to nonlinear LS estimation R. I. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision. Cambridge University Press, 2ed., 2004. After Chapter 5 and Appendix 6. We will use x
More informationComputational Methods. Least Squares Approximation/Optimization
Computational Methods Least Squares Approximation/Optimization Manfred Huber 2011 1 Least Squares Least squares methods are aimed at finding approximate solutions when no precise solution exists Find the
More informationAugmented Reality VU numerical optimization algorithms. Prof. Vincent Lepetit
Augmented Reality VU numerical optimization algorithms Prof. Vincent Lepetit P3P: can exploit only 3 correspondences; DLT: difficult to exploit the knowledge of the internal parameters correctly, and the
More informationVisual SLAM Tutorial: Bundle Adjustment
Visual SLAM Tutorial: Bundle Adjustment Frank Dellaert June 27, 2014 1 Minimizing Re-projection Error in Two Views In a two-view setting, we are interested in finding the most likely camera poses T1 w
More informationPhysics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester
Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation
More informationNumerical Methods I Solving Nonlinear Equations
Numerical Methods I Solving Nonlinear Equations Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 16th, 2014 A. Donev (Courant Institute)
More informationUnconstrained Multivariate Optimization
Unconstrained Multivariate Optimization Multivariate optimization means optimization of a scalar function of a several variables: and has the general form: y = () min ( ) where () is a nonlinear scalar-valued
More informationLocal Strong Convexity of Maximum-Likelihood TDOA-Based Source Localization and Its Algorithmic Implications
Local Strong Convexity of Maximum-Likelihood TDOA-Based Source Localization and Its Algorithmic Implications Huikang Liu, Yuen-Man Pun, and Anthony Man-Cho So Dept of Syst Eng & Eng Manag, The Chinese
More information17 Solution of Nonlinear Systems
17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m
More informationj=1 r 1 x 1 x n. r m r j (x) r j r j (x) r j (x). r j x k
Maria Cameron Nonlinear Least Squares Problem The nonlinear least squares problem arises when one needs to find optimal set of parameters for a nonlinear model given a large set of data The variables x,,
More informationConjugate Directions for Stochastic Gradient Descent
Conjugate Directions for Stochastic Gradient Descent Nicol N Schraudolph Thore Graepel Institute of Computational Science ETH Zürich, Switzerland {schraudo,graepel}@infethzch Abstract The method of conjugate
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More informationHale Collage. Spectropolarimetric Diagnostic Techniques!!!!!!!! Rebecca Centeno
Hale Collage Spectropolarimetric Diagnostic Techniques Rebecca Centeno March 1-8, 2016 Today Simple solutions to the RTE Milne-Eddington approximation LTE solution The general inversion problem Spectral
More informationPackage alabama. March 6, 2015
Package alabama March 6, 2015 Type Package Title Constrained Nonlinear Optimization Description Augmented Lagrangian Adaptive Barrier Minimization Algorithm for optimizing smooth nonlinear objective functions
More informationVasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,
More informationNon-linear least squares
Non-linear least squares Concept of non-linear least squares We have extensively studied linear least squares or linear regression. We see that there is a unique regression line that can be determined
More informationOutline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems
Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationLevenberg-Marquardt Method for Solving the Inverse Heat Transfer Problems
Journal of mathematics and computer science 13 (2014), 300-310 Levenberg-Marquardt Method for Solving the Inverse Heat Transfer Problems Nasibeh Asa Golsorkhi, Hojat Ahsani Tehrani Department of mathematics,
More informationStable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems
Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems Thore Graepel and Nicol N. Schraudolph Institute of Computational Science ETH Zürich, Switzerland {graepel,schraudo}@inf.ethz.ch
More informationTrust Regions. Charles J. Geyer. March 27, 2013
Trust Regions Charles J. Geyer March 27, 2013 1 Trust Region Theory We follow Nocedal and Wright (1999, Chapter 4), using their notation. Fletcher (1987, Section 5.1) discusses the same algorithm, but
More informationS(c)/ c = 2 J T r = 2v, (2)
M. Balda LMFnlsq Nonlinear least squares 2008-01-11 Let us have a measured values (column vector) y depent on operational variables x. We try to make a regression of y(x) by some function f(x, c), where
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationLecture Notes: Geometric Considerations in Unconstrained Optimization
Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections
More informationA Trust-region-based Sequential Quadratic Programming Algorithm
Downloaded from orbit.dtu.dk on: Oct 19, 2018 A Trust-region-based Sequential Quadratic Programming Algorithm Henriksen, Lars Christian; Poulsen, Niels Kjølstad Publication date: 2010 Document Version
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models Mixed effects models - Part IV Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationNon-polynomial Least-squares fitting
Applied Math 205 Last time: piecewise polynomial interpolation, least-squares fitting Today: underdetermined least squares, nonlinear least squares Homework 1 (and subsequent homeworks) have several parts
More informationPose Tracking II! Gordon Wetzstein! Stanford University! EE 267 Virtual Reality! Lecture 12! stanford.edu/class/ee267/!
Pose Tracking II! Gordon Wetzstein! Stanford University! EE 267 Virtual Reality! Lecture 12! stanford.edu/class/ee267/!! WARNING! this class will be dense! will learn how to use nonlinear optimization
More informationSIO223A A Brief Introduction to Geophysical Modeling and Inverse Theory.
SIO223A A Brief Introduction to Geophysical Modeling and Inverse Theory. Forward modeling: data model space space ˆd = f(x, m) m =(m 1,m 2,..., m N ) x =(x 1,x 2,x 3,..., x km ) ˆd =(ˆd 1, ˆd 2, ˆd 3,...,
More information26. Filtering. ECE 830, Spring 2014
26. Filtering ECE 830, Spring 2014 1 / 26 Wiener Filtering Wiener filtering is the application of LMMSE estimation to recovery of a signal in additive noise under wide sense sationarity assumptions. Problem
More informationMinimization of Static! Cost Functions!
Minimization of Static Cost Functions Robert Stengel Optimal Control and Estimation, MAE 546, Princeton University, 2017 J = Static cost function with constant control parameter vector, u Conditions for
More informationNeuro-Fuzzy Comp. Ch. 4 March 24, R p
4 Feedforward Multilayer Neural Networks part I Feedforward multilayer neural networks (introduced in sec 17) with supervised error correcting learning are used to approximate (synthesise) a non-linear
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationLecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 5. Nonlinear Equations
Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter 5 Nonlinear Equations Copyright c 2001 Reproduction permitted only for noncommercial, educational
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationChapter 6: Derivative-Based. optimization 1
Chapter 6: Derivative-Based Optimization Introduction (6. Descent Methods (6. he Method of Steepest Descent (6.3 Newton s Methods (NM (6.4 Step Size Determination (6.5 Nonlinear Least-Squares Problems
More informationUniversity of Houston, Department of Mathematics Numerical Analysis, Fall 2005
3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider
More informationMath 273a: Optimization Netwon s methods
Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives
More informationConjugate Gradient (CG) Method
Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous
More informationFitting Curves to Data, Generalized Linear Least Squares, and Error Analysis
Fitting Curves to Data, Generalized Linear Least Squares, and Error Analysis CEE 629. System Identification Duke University, Fall 2017 It is sometimes of use to fit a curve to measured data, to determine
More informationDeep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C.
Chapter 4: Numerical Computation Deep Learning Authors: I. Goodfellow, Y. Bengio, A. Courville Lecture slides edited by 1 Chapter 4: Numerical Computation 4.1 Overflow and Underflow 4.2 Poor Conditioning
More informationCLASS NOTES Computational Methods for Engineering Applications I Spring 2015
CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 Petros Koumoutsakos Gerardo Tauriello (Last update: July 27, 2015) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material
More informationLine Search Algorithms
Lab 1 Line Search Algorithms Investigate various Line-Search algorithms for numerical opti- Lab Objective: mization. Overview of Line Search Algorithms Imagine you are out hiking on a mountain, and you
More informationModeling of Data. Massimo Ricotti. University of Maryland. Modeling of Data p. 1/14
Modeling of Data Massimo Ricotti ricotti@astro.umd.edu University of Maryland Modeling of Data p. 1/14 NRiC 15. Model depends on adjustable parameters. Can be used for constrained interpolation. Basic
More informationOptimization Methods
Optimization Methods Decision making Examples: determining which ingredients and in what quantities to add to a mixture being made so that it will meet specifications on its composition allocating available
More informationNumerical Optimization: Basic Concepts and Algorithms
May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some
More information10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review
10.34 Numerical Methods Applied to Chemical Engineering Fall 2015 Quiz #1 Review Study guide based on notes developed by J.A. Paulson, modified by K. Severson Linear Algebra We ve covered three major topics
More informationNumerical solution of Least Squares Problems 1/32
Numerical solution of Least Squares Problems 1/32 Linear Least Squares Problems Suppose that we have a matrix A of the size m n and the vector b of the size m 1. The linear least square problem is to find
More informationFinite Elements for Nonlinear Problems
Finite Elements for Nonlinear Problems Computer Lab 2 In this computer lab we apply finite element method to nonlinear model problems and study two of the most common techniques for solving the resulting
More informationCS 450 Numerical Analysis. Chapter 5: Nonlinear Equations
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
More informationMotivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes)
AMSC/CMSC 460 Computational Methods, Fall 2007 UNIT 5: Nonlinear Equations Dianne P. O Leary c 2001, 2002, 2007 Solving Nonlinear Equations and Optimization Problems Read Chapter 8. Skip Section 8.1.1.
More informationON NUMERICAL RECOVERY METHODS FOR THE INVERSE PROBLEM
ON NUMERICAL RECOVERY METHODS FOR THE INVERSE PROBLEM GEORGE TUCKER, SAM WHITTLE, AND TING-YOU WANG Abstract. In this paper, we present the approaches we took to recover the conductances of an electrical
More informationx k+1 = x k + α k p k (13.1)
13 Gradient Descent Methods Lab Objective: Iterative optimization methods choose a search direction and a step size at each iteration One simple choice for the search direction is the negative gradient,
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationweightchanges alleviates many ofthe above problems. Momentum is an example of an improvement on our simple rst order method that keeps it rst order bu
Levenberg-Marquardt Optimization Sam Roweis Abstract Levenberg-Marquardt Optimization is a virtual standard in nonlinear optimization which signicantly outperforms gradient descent and conjugate gradient
More informationStatistics 580 Optimization Methods
Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationMATH 3795 Lecture 13. Numerical Solution of Nonlinear Equations in R N.
MATH 3795 Lecture 13. Numerical Solution of Nonlinear Equations in R N. Dmitriy Leykekhman Fall 2008 Goals Learn about different methods for the solution of F (x) = 0, their advantages and disadvantages.
More informationMath 409/509 (Spring 2011)
Math 409/509 (Spring 2011) Instructor: Emre Mengi Study Guide for Homework 2 This homework concerns the root-finding problem and line-search algorithms for unconstrained optimization. Please don t hesitate
More informationVHF Dipole effects on P-Band Beam Characteristics
VHF Dipole effects on P-Band Beam Characteristics D. A. Mitchell, L. J. Greenhill, C. Carilli, R. A. Perley January 7, 1 Overview To investigate any adverse effects on VLA P-band performance due the presence
More information(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)
Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not
More informationDUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee
J. Korean Math. Soc. 0 (0), No. 0, pp. 1 0 https://doi.org/10.4134/jkms.j160152 pissn: 0304-9914 / eissn: 2234-3008 DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM
More informationNeural Networks Lecture 3:Multi-Layer Perceptron
Neural Networks Lecture 3:Multi-Layer Perceptron H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural
More informationFiltering via Rank-Reduced Hankel Matrix CEE 690, ME 555 System Identification Fall, 2013 c H.P. Gavin, September 24, 2013
Filtering via Rank-Reduced Hankel Matrix CEE 690, ME 555 System Identification Fall, 2013 c H.P. Gavin, September 24, 2013 This method goes by the terms Structured Total Least Squares and Singular Spectrum
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Matrix Notation Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them at end of class, pick them up end of next class. I need
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationExploring the energy landscape
Exploring the energy landscape ChE210D Today's lecture: what are general features of the potential energy surface and how can we locate and characterize minima on it Derivatives of the potential energy
More informationA Sobolev trust-region method for numerical solution of the Ginz
A Sobolev trust-region method for numerical solution of the Ginzburg-Landau equations Robert J. Renka Parimah Kazemi Department of Computer Science & Engineering University of North Texas June 6, 2012
More informationHomework #2 Due Monday, April 18, 2012
12.540 Homework #2 Due Monday, April 18, 2012 Matlab solution codes are given in HW02_2012.m This code uses cells and contains the solutions to all the questions. Question 1: Non-linear estimation problem
More informationNonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp
Nonlinear Regression Summary... 1 Analysis Summary... 4 Plot of Fitted Model... 6 Response Surface Plots... 7 Analysis Options... 10 Reports... 11 Correlation Matrix... 12 Observed versus Predicted...
More informationSECTION 7: CURVE FITTING. MAE 4020/5020 Numerical Methods with MATLAB
SECTION 7: CURVE FITTING MAE 4020/5020 Numerical Methods with MATLAB 2 Introduction Curve Fitting 3 Often have data,, that is a function of some independent variable,, but the underlying relationship is
More informationLecture 7 Unconstrained nonlinear programming
Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationThe Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation
The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation Rosemary Renaut Collaborators: Jodi Mead and Iveta Hnetynkova DEPARTMENT OF MATHEMATICS
More informationLecture 17: Numerical Optimization October 2014
Lecture 17: Numerical Optimization 36-350 22 October 2014 Agenda Basics of optimization Gradient descent Newton s method Curve-fitting R: optim, nls Reading: Recipes 13.1 and 13.2 in The R Cookbook Optional
More information