Mathematical modelling
Chapter 2: Nonlinear models and geometric models
Faculty of Computer and Information Science, University of Ljubljana, 2018/2019

3. Nonlinear models

Given is a sample of points $\{(x_1, y_1), \dots, (x_m, y_m)\}$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$. The mathematical model is nonlinear if the function $y = F(x, a_1, \dots, a_p)$ is a nonlinear function of the parameters $a_i$. Each data point gives a nonlinear equation for the parameters $a_1, \dots, a_p$:
$$y_i = F(x_i, a_1, \dots, a_p), \quad i = 1, \dots, m.$$

Examples of nonlinear models:
1. Exponential decay $F(x, a, k) = a e^{-kx}$ or growth $F(x, a, k) = a(1 - e^{-kx})$, $k > 0$,
2. Gaussian model: $F(x, a, b, c) = a e^{-\left(\frac{x - c}{b}\right)^2}$,
3. Logistic model: $F(x, a, b, k) = \frac{a}{1 + b e^{-kx}}$, $k > 0$.
Given the data points $\{(x_1, y_1), \dots, (x_m, y_m)\}$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, we obtain a system of nonlinear equations for the parameters $a_i$:
$$f_i(a_1, \dots, a_p) = y_i - F(x_i, a_1, \dots, a_p) = 0, \quad i = 1, \dots, m.$$
Solutions are zeroes of a nonlinear vector function $f : \mathbb{R}^p \to \mathbb{R}^m$,
$$f(a_1, \dots, a_p) = (f_1(a_1, \dots, a_p), \dots, f_m(a_1, \dots, a_p)).$$
Solving a system of nonlinear equations is a hard problem (even for $n = m = 1$). One possible strategy is to approximate the zeroes by zeroes of suitable linear approximations.

Example of a nonlinear model: In the area around a radio telescope the use of microwave ovens is forbidden, since their radiation interferes with the telescope. We are looking for the location $(a, b)$ of a microwave oven that is causing problems. The radiation intensity decreases with the distance $r$ from the source according to
$$u(r) = \frac{\alpha}{1 + r}.$$
The measured values of the signal at three locations are $z(0, 0) = 0.27$, $z(1, 1) = 0.36$ and $z(0, 2) = 0.3$. This gives the following system of equations for the parameters $\alpha, a, b$:
$$\frac{\alpha}{1 + \sqrt{a^2 + b^2}} = 0.27, \quad \frac{\alpha}{1 + \sqrt{(1-a)^2 + (1-b)^2}} = 0.36, \quad \frac{\alpha}{1 + \sqrt{a^2 + (2-b)^2}} = 0.3.$$
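The three equations above can be encoded as a residual vector function $f(\alpha, a, b)$ whose zero we will look for in the rest of this section. A minimal Python sketch (only the measurement data come from the example; the function names are illustrative):

```python
import numpy as np

# Measurement locations and measured signal values from the example above
points = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
z = np.array([0.27, 0.36, 0.30])

def residual(alpha, a, b):
    """Residuals f_i = u(r_i) - z_i, where r_i is the distance
    from the unknown source (a, b) to the i-th measurement point."""
    r = np.sqrt((points[:, 0] - a) ** 2 + (points[:, 1] - b) ** 2)
    return alpha / (1.0 + r) - z

# Sanity check: with alpha = 0 the model signal vanishes everywhere,
# so the residuals are exactly -z.
print(residual(0.0, 0.0, 0.0))
```

A parameter triple $(\alpha, a, b)$ solves the system exactly when `residual(alpha, a, b)` is the zero vector.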
3.1 Vector functions of a vector variable

Let $f$ be a function from $D \subseteq \mathbb{R}^n$ to $\mathbb{R}^m$: $f$ maps a vector $x = (x_1, \dots, x_n)^T \in D$ to a vector
$$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{bmatrix}.$$

Examples:
1. A linear vector function $f : \mathbb{R}^n \to \mathbb{R}^m$ is given by $f : x \mapsto Ax + b$, where $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$.
2. A nonlinear vector function $f : \mathbb{R}^3 \to \mathbb{R}^2$ could, for example, be given by
$$f : \begin{bmatrix} x \\ y \\ z \end{bmatrix} \mapsto \begin{bmatrix} x^2 + y^2 + z^2 - 1 \\ x + y + z \end{bmatrix}.$$
The derivative of a vector function $f$ is given by the Jacobian matrix:
$$J = Df = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}.$$
If $m = 1$ and $n = 1$, $f$ is a function from $D \subseteq \mathbb{R}$ to $\mathbb{R}$ and $Df(x) = f'(x)$. For $m = 1$ and general $n$, $f$ is a function of $n$ variables and $Df(x) = \operatorname{grad} f(x)$. For general $n$ and $m$,
$$Df(x) = \begin{bmatrix} \operatorname{grad} f_1(x) \\ \vdots \\ \operatorname{grad} f_m(x) \end{bmatrix}.$$

Examples:
1. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is the linear function $x \mapsto Ax + b$, then $Df(x) = A$.
2. If $f : \mathbb{R}^3 \to \mathbb{R}^2$ is given by
$$f : \begin{bmatrix} x \\ y \\ z \end{bmatrix} \mapsto \begin{bmatrix} x^2 + y^2 + z^2 - 1 \\ x + y + z \end{bmatrix},$$
then
$$Df(x) = \begin{bmatrix} 2x & 2y & 2z \\ 1 & 1 & 1 \end{bmatrix}.$$
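The analytic Jacobian in example 2 can be verified numerically with forward differences. A sketch (the step size `h` and the helper names are arbitrary choices, not part of the lecture):

```python
import numpy as np

def f(v):
    """The example function f : R^3 -> R^2 from above."""
    x, y, z = v
    return np.array([x**2 + y**2 + z**2 - 1.0, x + y + z])

def analytic_jacobian(v):
    x, y, z = v
    return np.array([[2*x, 2*y, 2*z], [1.0, 1.0, 1.0]])

def numeric_jacobian(f, v, h=1e-6):
    """Approximate Df(v) column by column with forward differences."""
    v = np.asarray(v, dtype=float)
    f0 = f(v)
    J = np.zeros((f0.size, v.size))
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = h
        J[:, j] = (f(v + step) - f0) / h
    return J

v0 = np.array([1.0, -1.0, 1.0])
# The two Jacobians agree up to O(h)
print(np.max(np.abs(numeric_jacobian(f, v0) - analytic_jacobian(v0))))
```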
The linear approximation of $f$ at the point $a$ is the linear function that has the same value and the same derivative as $f$ at $a$:
$$L_a(x) = f(a) + Df(a)(x - a).$$

For $n = 1$, $m = 1$: $L_a(x) = f(a) + f'(a)(x - a)$ is the linear approximation of a function of one variable (which you know from Calculus); its graph $y = L_a(x)$ is the tangent to the graph $y = f(x)$ at the point $a$.

For $n = 2$, $m = 1$, i.e. $f(x, y)$ a function of two variables:
$$L_{(a,b)}(x, y) = f(a, b) + \operatorname{grad} f(a, b) \begin{bmatrix} x - a \\ y - b \end{bmatrix};$$
the graph $z = L_{(a,b)}(x, y)$ is the tangent plane to the surface $z = f(x, y)$ at the point $(a, b)$.

Example: The linear approximation to the function
$$f : \begin{bmatrix} x \\ y \\ z \end{bmatrix} \mapsto \begin{bmatrix} x^2 + y^2 + z^2 - 1 \\ x + y + z \end{bmatrix}$$
at $a = (1, -1, 1)$ is the linear function
$$L_a(x, y, z) = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 & -2 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x - 1 \\ y + 1 \\ z - 1 \end{bmatrix} = \begin{bmatrix} 2 + 2(x-1) - 2(y+1) + 2(z-1) \\ 1 + (x-1) + (y+1) + (z-1) \end{bmatrix} = \begin{bmatrix} 2 & -2 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} - \begin{bmatrix} 4 \\ 0 \end{bmatrix}.$$
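The quality of this linear approximation can be checked numerically: the error $\|f(x) - L_a(x)\|$ is second order in $\|x - a\|$. A sketch (the displacement `h` is an arbitrary test value):

```python
import numpy as np

def f(v):
    x, y, z = v
    return np.array([x**2 + y**2 + z**2 - 1.0, x + y + z])

a = np.array([1.0, -1.0, 1.0])
Df_a = np.array([[2.0, -2.0, 2.0], [1.0, 1.0, 1.0]])  # Df(a)

def L(v):
    """Linear approximation L_a(x) = f(a) + Df(a)(x - a)."""
    return f(a) + Df_a @ (v - a)

# Evaluate both at a nearby point; the first component of f is quadratic,
# so the error is exactly the sum of squares of the displacement.
h = np.array([0.01, 0.01, 0.01])
err = np.linalg.norm(f(a + h) - L(a + h))
print(err)  # second order in h: here 3 * 0.01**2 = 3e-4
```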
Geometric picture: Given a vector function $f : \mathbb{R}^3 \to \mathbb{R}^2$, every point $(x_0, y_0, z_0)$ lies in the intersection of the level surfaces $f_1(x, y, z) = c_1$ and $f_2(x, y, z) = c_2$, where $c_1 = f_1(x_0, y_0, z_0)$ and $c_2 = f_2(x_0, y_0, z_0)$. The intersection of two surfaces in $\mathbb{R}^3$ determines an implicit curve in $\mathbb{R}^3$. If they are nonzero, the vectors $\operatorname{grad} f_1(x_0, y_0, z_0)$ and $\operatorname{grad} f_2(x_0, y_0, z_0)$ are normal vectors of the two level surfaces, and their cross product $\operatorname{grad} f_1(x_0, y_0, z_0) \times \operatorname{grad} f_2(x_0, y_0, z_0)$ is tangent to the implicit curve.

For the function
$$f : \begin{bmatrix} x \\ y \\ z \end{bmatrix} \mapsto \begin{bmatrix} x^2 + y^2 + z^2 - 1 \\ x + y + z \end{bmatrix}$$
the implicit curve through $(1, -1, 1)$ is given by $x^2 + y^2 + z^2 - 1 = 2$ and $x + y + z = 1$, and the tangent vector is
$$\operatorname{grad} f_1(1, -1, 1) \times \operatorname{grad} f_2(1, -1, 1) = \begin{bmatrix} 2 \\ -2 \\ 2 \end{bmatrix} \times \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -4 \\ 0 \\ 4 \end{bmatrix}.$$
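The tangent-vector computation above can be reproduced directly with `numpy.cross`; a short sketch:

```python
import numpy as np

grad_f1 = np.array([2.0, -2.0, 2.0])  # grad f1 at (1, -1, 1)
grad_f2 = np.array([1.0, 1.0, 1.0])   # grad f2 at (1, -1, 1)

# The cross product is perpendicular to both surface normals,
# hence tangent to the implicit curve of intersection.
tangent = np.cross(grad_f1, grad_f2)
print(tangent)  # -> [-4.  0.  4.]
```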
3.2 Solving systems of nonlinear equations

Let $f : D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$. We are looking for solutions of
$$f(x) = \begin{bmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.$$
In many cases an analytic solution does not exist. A number of numerical methods for approximate solutions are available. We will look at one, based on linear approximations.

$n = 1$, $m = 1$: solving an equation $f(x) = 0$, $x \in \mathbb{R}$.

Newton's (tangent) method: We construct a recursive sequence with
- $x_0$ an initial approximation,
- $x_{k+1}$ the solution of $L_{x_k}(x) = f(x_k) + f'(x_k)(x - x_k) = 0$, so
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.$$
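The one-dimensional iteration can be sketched in a few lines of Python; here it is applied to the illustrative equation $x^2 - 2 = 0$, whose positive root is $\sqrt{2}$ (the tolerance and iteration cap are arbitrary choices):

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Solve x^2 - 2 = 0 starting from x0 = 1
root = newton(lambda x: x*x - 2.0, lambda x: 2.0*x, x0=1.0)
print(root)  # approximately 1.41421356...
```

Note how few iterations are needed: quadratic convergence roughly doubles the number of correct digits at each step.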
The sequence $x_i$ converges to a solution $\alpha$, $f(\alpha) = 0$, if:
1. $f'(x) \neq 0$ for all $x \in I$, where $I$ is an interval $[\alpha - r, \alpha + r]$ for some $r \geq |\alpha - x_0|$,
2. $f''(x)$ is continuous for all $x \in I$,
3. $x_0$ is close enough to the solution $\alpha$.

Under these assumptions the convergence is quadratic: if $\varepsilon_i = |x_i - \alpha|$, then $\varepsilon_{i+1} \leq M \varepsilon_i^2$, where $M$ is a constant bounded by $\max_I |f''(x)| / (2 \min_I |f'(x)|)$.

$m = n > 1$: Newton's method generalizes to systems of $n$ nonlinear equations in $n$ unknowns:
- $x_0$ an initial approximation,
- $x_{k+1}$ the solution of $L_{x_k}(x) = f(x_k) + Df(x_k)(x - x_k) = 0$, so
$$x_{k+1} = x_k - Df(x_k)^{-1} f(x_k).$$
In practice the inverse is not computed; instead, a linear system for $x_{k+1}$ is solved at each step:
$$Df(x_k)\, x_{k+1} = Df(x_k)\, x_k - f(x_k).$$
The sequence converges to a solution $\alpha$ if for some $r > 0$ the matrix $Df(x)$ is nonsingular for all $x$ with $\|x - \alpha\| < r$, and $\|x_0 - \alpha\| < r$.
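The multivariate iteration, with the linear system solved at each step rather than the Jacobian inverted, can be sketched as follows. The $2 \times 2$ system $x^2 + y^2 = 4$, $xy = 1$ and the starting point are illustrative assumptions, not from the lecture:

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x*y - 1.0])

def Df(v):
    x, y = v
    return np.array([[2*x, 2*y],
                     [y,   x]])

x = np.array([2.0, 0.5])  # initial approximation
for _ in range(20):
    # Newton step: solve Df(x_k) * delta = -f(x_k), then x_{k+1} = x_k + delta
    delta = np.linalg.solve(Df(x), -f(x))
    x = x + delta
    if np.linalg.norm(delta) < 1e-12:
        break

print(x, np.linalg.norm(f(x)))
```

Solving the linear system with `np.linalg.solve` is both cheaper and numerically more stable than forming $Df(x_k)^{-1}$ explicitly.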
Application to optimization: the Newton optimization method. Let $F : \mathbb{R}^n \to \mathbb{R}$; we are looking for the minimum (or maximum) of $F$. The first step is to find the critical points, i.e. the solutions of
$$f = \operatorname{grad} F = \begin{bmatrix} F_{x_1} \\ \vdots \\ F_{x_n} \end{bmatrix} = 0.$$
This is a system of $n$ equations in $n$ unknowns; the Jacobian of the vector function $f$ is the Hessian of $F$:
$$Df(x) = H(x) = \begin{bmatrix} F_{x_1 x_1} & \cdots & F_{x_1 x_n} \\ \vdots & \ddots & \vdots \\ F_{x_n x_1} & \cdots & F_{x_n x_n} \end{bmatrix}.$$
If the sequence of iterates $x_0$, $x_{k+1} = x_k - H(x_k)^{-1} \operatorname{grad} F(x_k)$ converges, the limit is a critical point of $F$, i.e. a candidate for the minimum (or maximum).

$m > n > 0$: We have an overdetermined system $f(x) = 0$ of $m$ nonlinear equations in $n$ unknowns. The system $f(x) = 0$ generally does not have a solution. We are looking for a best fit to a solution, that is, for $\alpha$ such that the distance of $f(\alpha)$ from $0$ is the smallest possible:
$$\|f(\alpha)\|^2 = \min_x \|f(x)\|^2.$$
The Gauss-Newton method is a generalization of Newton's method in which, instead of the inverse of the Jacobian, its Moore-Penrose (MP) inverse is used at each step:
- $x_0$ an initial approximation,
- $x_{k+1} = x_k - Df(x_k)^+ f(x_k)$, where $Df(x_k)^+$ is the MP inverse of $Df(x_k)$.
If the matrix $Df(x_k)^T Df(x_k)$ is nonsingular at each step $k$, then
$$x_{k+1} = x_k - (Df(x_k)^T Df(x_k))^{-1} Df(x_k)^T f(x_k).$$
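A sketch of the Gauss-Newton iteration on an overdetermined problem: fitting the exponential-decay model $F(x, a, k) = a e^{-kx}$ from the start of this chapter to five data points. The data are generated synthetically with $a = 2$, $k = 0.5$, and the starting point $(1.8, 0.4)$ is an illustrative assumption; the pseudoinverse step is applied via a least-squares solve:

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * np.exp(-0.5 * xs)   # synthetic exact data for a = 2, k = 0.5

def f(p):
    """Residuals f_i(a, k) = y_i - a * exp(-k * x_i)."""
    a, k = p
    return ys - a * np.exp(-k * xs)

def Df(p):
    a, k = p
    e = np.exp(-k * xs)
    # Columns: df_i/da = -e^{-k x_i},  df_i/dk = a x_i e^{-k x_i}
    return np.column_stack([-e, a * xs * e])

p = np.array([1.8, 0.4])       # initial approximation (assumed)
for _ in range(50):
    # Gauss-Newton step: p_{k+1} = p_k - Df(p_k)^+ f(p_k),
    # computed as the least-squares solution of Df(p_k) * step = -f(p_k)
    step = np.linalg.lstsq(Df(p), -f(p), rcond=None)[0]
    p = p + step
    if np.linalg.norm(step) < 1e-12:
        break

print(p)  # converges back to (a, k) = (2, 0.5)
```

Because the synthetic data are exact, the minimum of $\|f\|^2$ is zero and Gauss-Newton recovers the generating parameters.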
At each step $x_{k+1}$ is the least squares approximation to the solution of the overdetermined linear system $L_{x_k}(x) = 0$, that is,
$$\|L_{x_k}(x_{k+1})\|^2 = \min\{\|L_{x_k}(x)\|^2 : x \in \mathbb{R}^n\}.$$
Convergence is not guaranteed, but if the sequence $x_k$ converges, the limit $x = \lim_{k \to \infty} x_k$ is a local (though not necessarily global) minimum of $\|f(x)\|^2$. It follows that the Gauss-Newton method is an algorithm for finding a local minimum of $\|f(x)\|^2$.