A brief introduction to ordinary differential equations


Chapter 1. A brief introduction to ordinary differential equations

1.1 Introduction

An ordinary differential equation (ode) is an equation that relates a function of one variable, $y(t)$, with its derivative(s) with respect to this variable. A general ode is a relation of the form
$$\Phi(t, y(t), y'(t), y''(t), \dots) = 0,$$
supplemented with the specification of $y$ and/or its derivatives at certain points (e.g., $y(5) = 17$ and $y'(1) = 3$). The order of an ode is the order of the highest derivative that appears in the equation.

A first-order equation. The following equation is an explicit 1st-order ode:
$$y' = f(t, y), \tag{1.1}$$
where $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is a given function. A solution of this equation is a function, $y(t)$, that satisfies the relation
$$y'(t) = f(t, y(t)).$$
The solution to this equation is in general not unique; a unique solution exists if, in addition to the ode, we fix an initial condition, $y(t_0) = y_0$ (provided that $f$ satisfies certain conditions).

Comments:
(1) As a matter of convenience, we will view odes as evolution equations, hence the independent variable $t$ will be called time.
(2) A general first-order equation is of the form $f(t, y, y') = 0$.
(3) The conditions under which a solution exists and is unique will be stated later on.

Example 1.1 The unique solution of the equation
$$y' = 5y, \qquad y(0) = 2,$$
is $y(t) = 2\,e^{5t}$.

A second-order equation. The following equation is an explicit 2nd-order ode:
$$y'' = f(t, y, y'). \tag{1.2}$$
Here we need to provide two additional pieces of data in order to guarantee a unique solution. If the two conditions are given at the same point $t_0$, i.e., $y(t_0) = y_0$ and $y'(t_0) = y_0'$, then the problem is called an initial-value problem. Otherwise, if the two conditions are prescribed at different times, it is called a boundary-value problem.

Representation as a first-order system. An explicit second-order ode can be represented as a system of two first-order equations by defining $y_1(t) = y(t)$ and $y_2(t) = y'(t)$. Then,
$$\begin{aligned} y_1' &= y_2, & y_1(t_0) &= y_0, \\ y_2' &= f(t, y_1, y_2), & y_2(t_0) &= y_0'. \end{aligned} \tag{1.3}$$
Similarly, any explicit $n$th-order ode can be represented as a system of $n$ first-order equations.

For systems of odes it is convenient to adopt a vector notation. Take for example the system (1.3); let
$$Y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix}, \qquad Y_0 = \begin{pmatrix} y_0 \\ y_0' \end{pmatrix}, \qquad F(t, Y) = \begin{pmatrix} y_2 \\ f(t, y_1, y_2) \end{pmatrix}.$$
Then, (1.3) can be written in vector form:
$$Y' = F(t, Y), \qquad Y(t_0) = Y_0.$$

Exercise 1.1 Represent the fourth-order equation
$$y^{(4)} = \sin(y + t) + y$$
as a system of first-order equations.
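As an illustration of the reduction above, here is a minimal Python sketch (ours, not part of the original notes; the names `F` and the test equation are our choices): the second-order equation $y'' = -y$ is rewritten as a first-order system and handed to a library integrator.

```python
# A minimal sketch (ours) of the reduction above: y'' = -y, y(0) = 1,
# y'(0) = 0 rewritten as the first-order system Y' = F(t, Y), Y = (y, y').
import numpy as np
from scipy.integrate import solve_ivp

def F(t, Y):
    y1, y2 = Y          # y1 = y, y2 = y'
    return [y2, -y1]    # (y1', y2') = (y2, f(t, y1, y2)) with f = -y1

sol = solve_ivp(F, (0.0, 10.0), [1.0, 0.0], rtol=1e-8)
print(np.max(np.abs(sol.y[0] - np.cos(sol.t))))  # exact solution is cos(t)
```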

[Figure 1.1: Vector field and 5 different solutions for the ode $y' = 1 - 3t + y + t^2 + ty$.]

Vector fields. Consider the scalar equation $y' = f(t, y)$, leaving the initial condition $y(t_0)$ momentarily unspecified. The solution $y(t)$ that passes, in the $(t, y)$ plane, through the point $(t_0, y_0)$ has at this point the slope $f(t_0, y_0)$; i.e., at every point $(t, y)$ we can draw a line tangent to the solution $y(t)$ that passes through that point. The mapping $(t, y) \mapsto$ tangent line is called a vector field. A visualization of the vector field that corresponds to the ode $y' = 1 - 3t + y + t^2 + ty$ for $(t, y) \in [-1, 2] \times [-2, 1]$ is shown in Figure 1.1. A solution $y(t)$ is a curve tangent to the vector field at all points.

Exercise 1.2 Write a computer program that draws the vector field for the ode $y' = -y + t$ for $(t, y) \in [0, 2] \times [-1, 2]$. Plot on top of it the solution $y(t) = y_0 e^{-t} + t - 1 + e^{-t}$ for four different values of $y_0 = y(0) \in [-1, 1]$.
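For Exercise 1.2, a possible starting point is the following sketch (ours; the grid resolution and initial values are our choices): a quiver plot of unit vectors with slope $f(t, y)$, with the exact solutions drawn on top.

```python
# A possible sketch for Exercise 1.2 (ours): direction field of
# y' = -y + t on [0,2] x [-1,2], with exact solutions on top.
import numpy as np
import matplotlib.pyplot as plt

f = lambda t, y: -y + t
T, Y = np.meshgrid(np.linspace(0, 2, 20), np.linspace(-1, 2, 20))
norm = np.hypot(1.0, f(T, Y))                 # normalize arrow lengths
plt.quiver(T, Y, 1.0 / norm, f(T, Y) / norm, angles='xy')

t = np.linspace(0, 2, 200)
for y0 in [-1.0, -0.5, 0.5, 1.0]:             # four initial values in [-1, 1]
    plt.plot(t, y0 * np.exp(-t) + t - 1 + np.exp(-t))
plt.xlabel('t'); plt.ylabel('y'); plt.show()
```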

1.2 Exactly solvable equations

An equation is called exactly solvable if we can express its solution in terms of elementary functions. In this section we consider a few examples of exactly solvable equations. Most of the equations arising in real-life applications cannot be solved by analytical means, which is why we need to develop computational algorithms to approximate solutions.

1.2.1 Separable equations

Consider a first-order equation of the form
$$y'(t) = f(t)\, g(y(t)).$$
Then,
$$\int_0^t \frac{y'(s)}{g(y(s))}\, ds = \int_0^t f(s)\, ds.$$
On the left-hand side we change variables, $z = y(s)$, $dz = y'(s)\, ds$, thus
$$\int_{y(0)}^{y(t)} \frac{dz}{g(z)} = \int_0^t f(s)\, ds.$$

Example 1.2 Solve the equation $y' = 6 \tan y$. We first have
$$\int_0^t \frac{y'(s)}{\tan y(s)}\, ds = \int_0^t 6\, ds = 6t.$$
Changing variables, $z = y(s)$,
$$\int_{y(0)}^{y(t)} \frac{dz}{\tan z} = \log \sin z \,\Big|_{y(0)}^{y(t)} = 6t,$$
and
$$y(t) = \sin^{-1} \exp\left[\log \sin y(0) + 6t\right].$$

Exercise 1.3 Solve the equation $y' = y^{3/2}$, $y(0) = 1$.

A first-order linear equation. As a special case of separable equations, consider the first-order linear equation
$$y' = f(t)\, y.$$
The solution is
$$y(t) = C \exp\left(\int_0^t f(s)\, ds\right).$$

Inhomogeneous linear equation. Consider now the inhomogeneous linear equation
$$y' = f(t)\, y + g(t). \tag{1.4}$$
We solve it by a method called the variation of constants, seeking a solution of the form
$$y(t) = c(t) \exp\left(\int_0^t f(s)\, ds\right).$$
Substituting this ansatz into the differential equation (1.4), we find that $c(t)$ satisfies
$$c'(t) = g(t) \exp\left(-\int_0^t f(s)\, ds\right).$$
Thus, solving the differential equation has been reduced to standard integration.

Example 1.3 Find the general solution of the equation $y' = ty + 5$. Solution:
$$y(t) = \left[ 5 \int_0^t e^{-\frac{1}{2}s^2}\, ds + C \right] e^{\frac{1}{2}t^2}.$$
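The variation-of-constants computation in Example 1.3 can be checked symbolically; a small sketch (ours, using sympy; the exact printed form of the constant and of the integral may differ between sympy versions).

```python
# A small check of Example 1.3 (our sketch): sympy's dsolve should recover
# the general solution of y' = t*y + 5 in an equivalent form, with the
# integral of exp(-s^2/2) expressed through the error function erf.
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')
sol = sp.dsolve(sp.Eq(y(t).diff(t), t*y(t) + 5), y(t))
print(sol)  # expected equivalent to (C1 + 5*sqrt(pi/2)*erf(t/sqrt(2)))*exp(t**2/2)
```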

1.2.2 Total differential equations

Consider a first-order equation of the form
$$P(t, y) + Q(t, y)\, y' = 0,$$
where $P$ and $Q$ satisfy the relation
$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial t}.$$
Then there exists a function $F(t, y)$ such that
$$P = \frac{\partial F}{\partial t} \qquad \text{and} \qquad Q = \frac{\partial F}{\partial y}.$$
Consider now the function $F(t, y(t))$. Its time derivative is given by
$$\frac{dF}{dt} = P + Q\, y' = 0,$$
hence
$$F(t, y(t)) = C$$
is the (implicit) solution to the equation.

Example 1.4 Solve the equation
$$\underbrace{2t \sin y}_{P} + \underbrace{t^2 \cos y}_{Q}\, y' = 0.$$
Solution: $t^2 \sin y = C$. The constant $C$ is determined by the initial conditions.

Exercise 1.4 Solve the total differential equation
$$3t^2 - 2at + ay - 3y^2 y' + aty' = 0.$$

1.2.3 Second-order equations

With the exception of linear equations with constant coefficients and a number of classical cases, second-order equations are usually not exactly solvable.

Comment: Euler collected all the equations whose analytical solution was known; his collection ran over 800 pages. In 1841, Liouville was the first to prove that there exist equations, like $y' = t^2 + y^2$, whose solutions cannot be expressed in terms of elementary functions.

1.2.4 Linear equations

A linear equation of order $n$ is of the form
$$a_0(t)\, y^{(n)} + a_1(t)\, y^{(n-1)} + \dots + a_n(t)\, y = 0. \tag{1.5}$$
It can be proved that, under certain regularity conditions on the $a_j(t)$, such an equation always has $n$ solutions that are linearly independent. A set of solutions $y_j(t)$, $j = 1, 2, \dots, n$, is called independent if there do not exist nontrivial coefficients $c_j$ for which
$$\sum_{j=1}^n c_j\, y_j(t) \equiv 0.$$
If there exist $n$ functions, $(u_1(t), \dots, u_n(t))$, that are all solutions to (1.5), then any linear combination of them is also a solution. Which linear combination to choose is determined by the initial conditions.

1.2.5 Linear equations with constant coefficients

We consider now a particular case of (1.5) in which all the functions $a_i(t)$ are constant:
$$y^{(n)} + a_1 y^{(n-1)} + \dots + a_n y = 0. \tag{1.6}$$
We are looking for a basis of $n$ independent solutions. Trying a solution of the form $y = \exp(pt)$, one obtains an algebraic equation for $p$,
$$p^n + a_1 p^{n-1} + \dots + a_n = 0.$$
If all the roots $p_i$ are distinct, then we have found the basis of functions, and the general solution is of the form
$$y(t) = c_1 e^{p_1 t} + \dots + c_n e^{p_n t}.$$
If one of the roots is complex, $p = \alpha + i\beta$, then its complex conjugate, $\alpha - i\beta$, is also a root. From the two one can create two real-valued solutions:
$$y_1(t) = e^{\alpha t} \sin(\beta t) \qquad \text{and} \qquad y_2(t) = e^{\alpha t} \cos(\beta t).$$

Multiple roots. Consider for example the following second-order equation:
$$y'' - 2q\, y' + q^2 y = 0.$$
If we search for a solution of the form $y(t) = e^{pt}$ we find that $p = q$ is a double root:
$$p^2 - 2qp + q^2 = 0 \quad \Longrightarrow \quad p_{1,2} = q,$$
so that $y(t) = e^{qt}$ is a solution, but we still do not have a complete basis. When $q$ is a double root, we look for solutions of the form $y(t) = c(t)\, e^{qt}$, from which we find
$$c''(t) = 0,$$
hence
$$y(t) = [a + b\,t]\, e^{qt}.$$
More generally, if $p$ is a root of multiplicity $k$, then
$$y(t) = \left[a_0 + a_1 t + \dots + a_{k-1} t^{k-1}\right] e^{pt}$$
forms a $k$-dimensional basis.
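To make the recipe concrete, a small numerical sketch (ours; the test equation is our choice): the characteristic roots of a constant-coefficient equation come straight out of a polynomial root finder.

```python
# A small sketch (ours) of the recipe above: the characteristic roots of
# y'' + 2y' + 5y = 0 are the roots of p^2 + 2p + 5 = 0.
import numpy as np

roots = np.roots([1, 2, 5])
print(roots)  # [-1.+2.j, -1.-2.j]: a complex pair alpha +/- i*beta
# With alpha = -1, beta = 2, the general real solution is
#   y(t) = e^{-t} (c1 cos(2t) + c2 sin(2t)).
```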

1.3 Euler's polygon approximation

Consider a first-order scalar equation,
$$y' = f(t, y), \qquad y(t_0) = y_0.$$
The goal is to determine the solution, $y(t)$, at some point $T > t_0$ (we could equally well consider $T < t_0$). Euler's approximation scheme (1768) goes as follows: partition the interval $[t_0, T]$ into sub-intervals with breakpoints, or nodes,
$$t_0 < t_1 < \dots < t_n = T.$$
Then in each interval $(t_i, t_{i+1}]$ approximate the solution by the first term of the Taylor series, namely,
$$\begin{aligned}
y_1 - y_0 &= f(t_0, y_0)(t_1 - t_0) \\
y_2 - y_1 &= f(t_1, y_1)(t_2 - t_1) \\
&\;\;\vdots \\
y_n - y_{n-1} &= f(t_{n-1}, y_{n-1})(t_n - t_{n-1}).
\end{aligned}$$
Thus we generate a polygon that approximates $y(t)$. In each sub-interval, $y(t)$ is approximated by a linear function which is tangent to the vector field at the left endpoint. The question is how accurately $y_n$ approximates $y(T)$, and whether the accuracy improves as we refine the partition. This method for approximating solutions to odes is the most elementary one, and is known as the forward Euler method.

[Figure: an Euler polygon through the nodes $t_0, t_1, \dots, T$.]
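The scheme is only a few lines of code; here is a minimal forward Euler sketch (ours; a uniform partition is assumed for simplicity), applied to Example 1.1.

```python
# A minimal forward Euler sketch (ours; uniform step for simplicity).
import numpy as np

def forward_euler(f, t0, y0, T, n):
    """Return the nodes t_i and the Euler values y_i on a uniform partition."""
    t = np.linspace(t0, T, n + 1)
    y = np.empty(n + 1)
    y[0] = y0
    for i in range(n):
        y[i + 1] = y[i] + f(t[i], y[i]) * (t[i + 1] - t[i])
    return t, y

# Example 1.1: y' = 5y, y(0) = 2, whose exact solution is 2 e^{5t}.
t, y = forward_euler(lambda t, y: 5 * y, 0.0, 2.0, 1.0, 1000)
print(y[-1], 2 * np.exp(5.0))  # agreement to about 1% for n = 1000
```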

1.4 Cauchy's existence and uniqueness theorem

In 1824 Cauchy proved that Euler's polygon method converges, under quite general conditions, as the discretization size tends to zero. We follow his proof step by step. Cauchy's proof shows, on the one hand, that we study a sensible (well-posed) problem, which has a unique solution. Moreover, it shows that the forward Euler scheme provides a systematic way to approximate this solution to any required accuracy (at least in principle).

Lemma 1.1 Suppose that $|f(t, y)| \le A$ on a rectangle
$$D \equiv \{(t, y) : t \in [t_0, T],\ |y - y_0| \le b\}.$$
Then, for all $t - t_0 \le b/A$:

➀ Euler's polygon solution, $y_h(t)$, remains in $D$ for any partition of $[t_0, T]$.

➁ The distance between $y_h(t)$ and $y_0$ is bounded by $|y_h(t) - y_0| \le A\,(t - t_0)$.

➂ If, in addition, $|f(t, y) - f(t_0, y_0)| \le M$ for all $(t, y) \in D$, then
$$|y_h(t) - y_0 - (t - t_0)\, f(t_0, y_0)| \le M\,(t - t_0).$$

[Figure: the polygon $y_h(t)$ confined between the lines $y_0 \pm A(t - t_0)$, together with the tangent line $y_0 + f(t_0, y_0)(t - t_0)$.]

Proof:

➀ We start by proving that the solution remains in $D$. Since $y_h(t)$ is piecewise linear it can be represented as
$$y_h(t) = y_0 + \int_{t_0}^t y_h'(s)\, ds.$$
Thus, as long as the solution remains in $D$ we have
$$|y_h(t) - y_0| \le \int_{t_0}^t \max_{(\tau, z) \in D} |f(\tau, z)|\, ds \le A\,(t - t_0).$$
Suppose that the Euler polygon exits $D$ for the first time at a point $\bar t < \min(t_0 + b/A,\, T)$, i.e., $|y_h(\bar t) - y_0| = b$; then $b \le A(\bar t - t_0)$, or $\bar t \ge t_0 + b/A$, which is a contradiction.

➁ Having established that the Euler polygon remains in $D$, we have
$$|y_h(t) - y_0| \le A\,(t - t_0).$$

More specifically, assume that $t \in (t_i, t_{i+1}]$. Then, $y_h(t)$ is obtained as follows:
$$\begin{aligned}
y_h(t) &= y_i + f(t_i, y_i)(t - t_i) \\
y_i &= y_{i-1} + f(t_{i-1}, y_{i-1})(t_i - t_{i-1}) \\
&\;\;\vdots \\
y_1 &= y_0 + f(t_0, y_0)(t_1 - t_0).
\end{aligned}$$
Combining together all these equations,
$$y_h(t) - y_0 = f(t_0, y_0)(t_1 - t_0) + \dots + f(t_i, y_i)(t - t_i). \tag{1.7}$$
Taking absolute values and using (i) the triangle inequality, and (ii) the bound $A$ on $f$, we obtain
$$|y_h(t) - y_0| \le A\,(t_1 - t_0 + t_2 - t_1 + \dots + t - t_i) = A\,(t - t_0).$$

➂ From (1.7) we have
$$y_h(t) = y_0 + f(t_0, y_0)(t - t_0) + \left[f(t_1, y_1) - f(t_0, y_0)\right](t_2 - t_1) + \dots + \left[f(t_i, y_i) - f(t_0, y_0)\right](t - t_i).$$
Taking absolute values and using this time the bound $M$ on the variation of $f$, we find
$$|y_h(t) - y_0 - f(t_0, y_0)(t - t_0)| \le M\,(t - t_0).$$

Lemma 1.2 Given a partition of the interval $[t_0, T]$, let $y_h(t)$ and $z_h(t)$ be Euler polygons that correspond to two different initial conditions, $y_0$ and $z_0$, at time $t_0$. If the function $f(t, y)$ satisfies the Lipschitz condition
$$|f(t, z) - f(t, y)| \le L\,|z - y|,$$
for some constant $L$ and for all $y$ and $z$ in a convex region that contains the polygons $y_h(t)$ and $z_h(t)$, then
$$|z_h(t) - y_h(t)| \le e^{L(t - t_0)}\,|z_0 - y_0|. \tag{1.8}$$

Comment: Cauchy originally assumed a bound on $\partial f/\partial y$; Lipschitz continuity is a weaker restriction.

Proof: Consider the two polygons at the point $t_1$:
$$\begin{aligned}
z_1 &= z_0 + f(t_0, z_0)(t_1 - t_0) \\
y_1 &= y_0 + f(t_0, y_0)(t_1 - t_0).
\end{aligned}$$
Subtract and use the triangle inequality:
$$|z_1 - y_1| \le |z_0 - y_0| + (t_1 - t_0)\,|f(t_0, z_0) - f(t_0, y_0)| \le \left[1 + L(t_1 - t_0)\right]|z_0 - y_0| \le e^{L(t_1 - t_0)}\,|z_0 - y_0|,$$
where we have used the inequality $1 + x \le e^x$, $x \ge 0$. Similarly,
$$\begin{aligned}
|z_2 - y_2| &\le e^{L(t_2 - t_1)}\,|z_1 - y_1| \\
|z_3 - y_3| &\le e^{L(t_3 - t_2)}\,|z_2 - y_2| \\
&\;\;\vdots \\
|z_h(t) - y_h(t)| &\le e^{L(t - t_i)}\,|z_i - y_i|,
\end{aligned}$$
where as before we assume $t \in (t_i, t_{i+1}]$. Combining all the inequalities, we find
$$|z_h(t) - y_h(t)| \le e^{L(t - t_0)}\,|z_0 - y_0|.$$

Equipped with these two lemmas we can prove Cauchy's convergence theorem (1824):

Theorem 1.1 (Cauchy) Let $f(t, y)$ be continuous in time and Lipschitz in $y$ with constant $L$, and let $|f| \le A$ on the rectangle
$$D = \{(t, y) : t \in [t_0, T],\ |y - y_0| \le b\}, \qquad T - t_0 < b/A.$$
Then:

➀ As $h \equiv \max_i (t_{i+1} - t_i) \to 0$, Euler's polygons converge uniformly to a continuous function, $\phi(t)$.

➁ $\phi(t)$ is continuously differentiable, and satisfies the differential equation
$$\phi'(t) = f(t, \phi(t)),$$
with initial condition $\phi(t_0) = y_0$.

➂ There is no other solution of the differential equation
$$y' = f(t, y), \qquad y(t_0) = y_0,$$
on $[t_0, T]$.

Proof:

➀ The proof resembles the convergence proof of Riemann sums. Let $\epsilon > 0$; since $f$ is continuous on a compact set, it is uniformly continuous, implying that there exists a $\delta > 0$ such that
$$|t_m - t_n| \le \delta \quad \text{and} \quad |y_m - y_n| \le A\delta \qquad \Longrightarrow \qquad |f(t_m, y_m) - f(t_n, y_n)| \le \epsilon.$$
Take an initial discretization of $[t_0, T]$ with $h < \delta$. The corresponding polygon solution is denoted by $y^{(0)}(t)$. Consider now a refinement of this discretization that consists of adding a finite number of new points, $(s_1, \dots, s_k)$, between $t_0$ and $t_1$; the corresponding polygon is denoted by $y^{(1)}(t)$.

[Figure: the polygons $y^{(0)}(t)$ and $y^{(1)}(t)$ on $[t_0, t_1]$, separating at the refinement points.]

We now compare the two polygons at the first node, $t_1$:
$$\begin{aligned}
y^{(0)}(t_1) &= y_0 + f(t_0, y_0)(t_1 - t_0) \\
y^{(1)}(t_1) &= y_0 + f(t_0, y_0)(s_1 - t_0) + \dots + f(s_k, y^{(1)}(s_k))(t_1 - s_k).
\end{aligned}$$
Since for all $j = 1, \dots, k$ we have $|s_j - t_0| < \delta$ and $|y^{(1)}(s_j) - y_0| < A\delta$, we can use the bound on the variation of $f$:
$$|y^{(1)}(t_1) - y^{(0)}(t_1)| = \left| (s_2 - s_1)\left[f(s_1, y^{(1)}(s_1)) - f(t_0, y_0)\right] + \dots + (t_1 - s_k)\left[f(s_k, y^{(1)}(s_k)) - f(t_0, y_0)\right] \right| \le \epsilon\,(t_1 - t_0).$$
At the remaining points the two discretizations are identical, hence we can use Lemma 1.2, which bounds the deviation of two polygons that start from different initial values:
$$|y^{(1)}(T) - y^{(0)}(T)| \le e^{L(T - t_1)}\,\epsilon\,(t_1 - t_0).$$
We next introduce another refinement by adding new points between $t_1$ and $t_2$; the resulting polygon is denoted by $y^{(2)}(t)$. In a similar way we show that
$$|y^{(2)}(t_2) - y^{(1)}(t_2)| \le \epsilon\,(t_2 - t_1),$$

hence
$$|y^{(2)}(T) - y^{(1)}(T)| \le e^{L(T - t_2)}\,\epsilon\,(t_2 - t_1).$$
We keep on adding points between every two breakpoints until we have a fully refined polygon, $y^{(n)}(t)$, that satisfies $h^{(n)} < h^{(0)}$. The difference between the refined and original polygons is bounded by
$$|y^{(n)}(T) - y^{(0)}(T)| \le \epsilon \left[ (t_1 - t_0)\, e^{L(T - t_1)} + \dots + (T - t_i)\, e^{L(T - T)} \right].$$
This finite sum is a Riemann sum of the monotonically decreasing function $e^{L(T - t)}$, based on a function evaluation at the right end of each interval. Thus, the Riemann sum can be bounded by the integral, and
$$|y^{(n)}(T) - y^{(0)}(T)| \le \epsilon \int_{t_0}^T e^{L(T - s)}\, ds = \frac{\epsilon}{L} \left[ e^{L(T - t_0)} - 1 \right].$$
By our construction it is clear that the difference, in the supremum norm, between the Euler polygon $y^{(0)}(t)$ and an arbitrary refinement $y^{(n)}(t)$ is
$$\sup_{t_0 \le t \le T} |y^{(n)}(t) - y^{(0)}(t)| \le \frac{\epsilon}{L} \left[ e^{L(T - t_0)} - 1 \right].$$
Consider now two Euler polygons, $y_h(t)$ and $z_h(t)$, with $\delta$-fine discretizations. Comparing them to a polygon that is a refinement of both, we get by the triangle inequality
$$\sup_{t_0 \le t \le T} |y_h(t) - z_h(t)| \le \frac{2\epsilon}{L} \left[ e^{L(T - t_0)} - 1 \right].$$
Thus, for every $\epsilon > 0$ there exists a $\delta > 0$ such that any two discretizations that are finer than $\delta$ differ by less than a constant times $\epsilon$. This is Cauchy's condition for uniform convergence!

➁ Let $\epsilon(\delta)$ be the modulus of continuity of $f$,
$$\epsilon(\delta) \equiv \sup \left\{ |f(t_m, y_m) - f(t_n, y_n)| : |t_m - t_n| \le \delta,\ |y_m - y_n| \le A\delta \right\}.$$
The continuity of $f$ implies that $\epsilon(\delta) \to 0$ as $\delta \to 0$. From Lemma 1.1 we have
$$|y_h(t + \delta) - y_h(t) - \delta\, f(t, y_h(t))| \le \epsilon(\delta)\,\delta,$$
or
$$\left| \frac{y_h(t + \delta) - y_h(t)}{\delta} - f(t, y_h(t)) \right| \le \epsilon(\delta).$$
Take the limit $h \to 0$ (so that $y_h(t) \to \phi(t)$), and then take $\delta \to 0$. This proves that $\phi(t)$ is differentiable and satisfies the differential equation.

➂ To prove uniqueness we show that any solution to the initial-value problem must be equal to the limit of the polygon solutions. Let $\psi(t)$ be a solution to
$$\psi'(t) = f(t, \psi(t)), \qquad \psi(t_0) = y_0,$$
i.e., for any $t_i$ and $t > t_i$,
$$\psi(t) = \psi(t_i) + \int_{t_i}^t f(s, \psi(s))\, ds.$$
Given $\epsilon > 0$ we assume a partition that satisfies the condition on the variation of $f$. Denote now by $y^{(i)}(t)$ the Euler polygon whose initial condition is $y^{(i)}(t_i) = \psi(t_i)$; then
$$|\psi(t) - y^{(i)}(t)| \le \epsilon\,(t - t_i) \qquad \text{for } t \in [t_i, t_{i+1}].$$
Construct now the telescopic sum and use Lemma 1.2:
$$\begin{aligned}
|\psi(t) - y_h(t)| &= \left| \psi(t) - y^{(i)}(t) + y^{(i)}(t) - \dots + y^{(1)}(t) - y_h(t) \right| \\
&\le \epsilon\,(t - t_i) + e^{L(t - t_{i-1})}\,\epsilon\,(t_i - t_{i-1}) + e^{L(t - t_{i-2})}\,\epsilon\,(t_{i-1} - t_{i-2}) + \dots + e^{L(t - t_1)}\,\epsilon\,(t_1 - t_0) \\
&\le \frac{\epsilon}{L} \left[ e^{L(t - t_0)} - 1 \right].
\end{aligned}$$
In the limit $h \to 0$ and $\epsilon \to 0$, $|\psi(t) - \phi(t)| = 0$.

The above existence and uniqueness theorem is local; it applies only for $T$ not too far from the initial time $t_0$. It can be extended to a global existence and uniqueness theorem by considering the end-point as a new initial point and continuing the solution.

Theorem 1.2 Let $U$ be an open set in $\mathbb{R}^2$ and let $f(t, y)$ be Lipschitz with constant $L$ on $U$. Then, for every $(t_0, y_0) \in U$, there exists a unique solution of the differential equation, which can be continued up to the boundary of $U$.

Exercise 1.5 Show that $y = -t^2/4$ and $y = 1 - t$ are both solutions of the initial-value problem
$$2y' = \sqrt{t^2 + 4y} - t, \qquad y(2) = -1.$$
Why doesn't this contradict the uniqueness theorem?

1.4.1 Error estimate for Euler's polygons

In the course of the proof of Cauchy's theorem we showed that for any $\epsilon > 0$ and a discretization fine enough in terms of $\epsilon$, the difference between the Euler polygon, $y_h(t)$, and any refined Euler polygon, $y_{\hat h}(t)$, is bounded by
$$|y_h(t) - y_{\hat h}(t)| \le \frac{\epsilon}{L} \left[ e^{L(t - t_0)} - 1 \right].$$
In particular, this estimate is also a bound on the error with respect to the true solution, $y(t)$:
$$|y_h(t) - y(t)| \le \frac{\epsilon}{L} \left[ e^{L(t - t_0)} - 1 \right].$$
This estimate can be refined for the case where $f(t, y)$ is differentiable. Recall that $L$ in the above bound is the Lipschitz constant of $f(t, y)$, i.e., $|f(t, y) - f(t, z)| \le L\,|y - z|$ in a neighborhood of the solution; $\epsilon$ is arbitrary, and $h = \max_i (t_{i+1} - t_i)$ has the property that for every $t, s$ such that $|t - s| \le h$, and $y, z$ such that $|y - z| \le Ah$,
$$|f(t, y) - f(s, z)| \le \epsilon,$$
where $A$ is a bound on $|f|$. This leads to the following refinement in the case where $f(t, y)$ is differentiable:

Theorem 1.3 Suppose that in a neighborhood of the solution
$$|f| \le A, \qquad \left| \frac{\partial f}{\partial t} \right| \le M, \qquad \left| \frac{\partial f}{\partial y} \right| \le L;$$
then
$$|y_h(t) - y(t)| \le \frac{M + AL}{L} \left[ e^{L(t - t_0)} - 1 \right] h.$$
That is, Euler's polygon is a first-order approximation to the solution $y(t)$: the error scales with the mesh size $h$.

Proof: Let $|t - s| \le h$ and $|y - z| \le Ah$; then by the mean value theorem,
$$|f(t, y) - f(s, z)| \le \left| \frac{\partial f}{\partial t} \right| h + \left| \frac{\partial f}{\partial y} \right| A h \le (M + AL)\, h.$$
This establishes the relation between $\epsilon$ and $h$.

In Figure 1.2 we display approximate solutions of the differential equation $y' = 1 - 3t + y + t^2 + ty$ with initial value $y(-0.85) = -2$, for four different (uniform) step sizes. As predicted, the smaller $h$ is, the closer the Euler solution is to the exact solution (solid line).

[Figure 1.2: Arrows: the vector field of $y' = 1 - 3t + y + t^2 + ty$. Solid line: solution for the initial value $y(-0.85) = -2$. The dotted lines correspond to four Euler approximations with uniform step sizes $h = 0.2$, $h = 0.1$, $h = 0.05$, and $h = 0.025$.]

Exercise 1.6 Apply Euler's method with constant step size, $h$, to the differential equation
$$y' = y, \qquad y(0) = 1.$$
Obtain an approximation to the solution at $t = 1$, calculate its deviation from the exact solution, and compare the deviation to the error estimate that was derived above. Establish the dependence of the error on $h$ by repeating the integration for various step sizes.

Exercise 1.7 Approximate the solution of the equation
$$y' = 1 - 3t + y + t^2 + ty, \qquad y(-0.85) = -2,$$
at time $t = 1$ using the Euler method with fixed step size $h$. Use the values $h = \{0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001\}$. Treat the solution with $h = 0.0001$ as the truth, so that you can estimate the error in the evaluation of $y(1)$ as a function of $h$. Plot the graph $\mathrm{error}(h)$ on a log-log coordinate system (equivalently, plot the logarithm of the error versus the logarithm of $h$). Explain your observations, and relate them to the analytical error estimate. This exercise should teach you to estimate the order of convergence from numerical experiments.
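A possible scaffold for Exercise 1.6 (ours; here the known exact solution $e^t$ is used as the reference), demonstrating how the observed error reveals first-order convergence:

```python
# A possible scaffold for Exercise 1.6 (our sketch): error of forward Euler
# at t = 1 for y' = y, y(0) = 1, versus the step size h. The exact value
# at t = 1 is e, so the error can be computed directly.
import numpy as np

def euler_at_1(h):
    n = round(1.0 / h)
    y = 1.0
    for _ in range(n):
        y += h * y          # one Euler step for y' = y
    return y

for h in [0.1, 0.05, 0.01, 0.005, 0.001]:
    err = abs(euler_at_1(h) - np.e)
    print(f"h = {h:7.3f}   error = {err:.3e}   error/h = {err/h:.3f}")
# error/h approaches a constant (about e/2), confirming first-order
# convergence, in line with Theorem 1.3.
```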

1.5 Vector and matrix norms

Before extending Cauchy's convergence theorem to systems of equations, we need to define vector and matrix norms, as generalizations of the absolute value.

1.5.1 Definitions

Definition 1.1 A vector norm, $\|\cdot\|$, is a function $\mathbb{R}^n \to \mathbb{R}$ (or $\mathbb{C}^n \to \mathbb{R}$) that satisfies the following properties:

➀ $\|x\| \ge 0$, $\forall x \in \mathbb{R}^n$.

➁ $\|x\| = 0$ iff $x = 0$.

➂ $\|\lambda x\| = |\lambda|\,\|x\|$.

➃ $\|x + y\| \le \|x\| + \|y\|$, $\forall x, y \in \mathbb{R}^n$.

Example 1.5 The $L_p$ norm, denoted by $\|\cdot\|_p$, is defined by
$$\|y\|_p \equiv \left( \sum_{i=1}^n |y_i|^p \right)^{1/p},$$
for $p \ge 1$.

Definition 1.2 Let $\|\cdot\|$ be a vector norm. Its subordinate matrix norm (the operator norm) is defined by
$$\|A\| \equiv \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}.$$

Exercise 1.8 Prove that for any subordinate matrix norm
$$\|AB\| \le \|A\|\,\|B\|,$$
and that from this follows $\|e^A\| \le e^{\|A\|}$.

1.5.2 The Euclidean norm

The Euclidean norm is the $L_2$ norm,
$$\|y\| \equiv \left( \sum_{i=1}^n |y_i|^2 \right)^{1/2},$$
where we consider the more general case of $y \in \mathbb{C}^n$. The Euclidean norm is related to the Euclidean inner product,
$$\|y\|^2 = (y, y),$$

where
$$(u, v) \equiv \sum_{i=1}^n u_i \bar v_i.$$
What is the matrix norm subordinate to the $L_2$ vector norm?
$$\|A\|_2^2 = \sup_{x \ne 0} \frac{(Ax, Ax)}{(x, x)} = \sup_{x \ne 0} \frac{(A^* A x, x)}{(x, x)},$$
where the superscript $*$ denotes the Hermitian transpose.

Lemma 1.3 The eigenvalues of $A^* A$ are all real and non-negative.

Proof: Let $(\lambda, u)$ be an eigenvalue of $A^* A$ with its corresponding eigenvector. Then,
$$A^* A u = \lambda u,$$
and
$$(u, A^* A u) = \lambda\, (u, u),$$
from which immediately follows that
$$\|Au\|^2 = \lambda\, \|u\|^2,$$
hence $\lambda$ is real and non-negative.

Lemma 1.4 The eigenvectors of $A^* A$ form an orthonormal basis in $\mathbb{C}^n$.

Proof: This is a consequence of the spectral theorem for normal matrices, whereby every normal matrix $B$ (i.e., $BB^* = B^* B$) has an orthonormal set of $n$ eigenvectors (see any linear algebra textbook for details).

Let now $x = \sum_i \alpha_i u_i$, where the $u_i$ are the orthonormal eigenvectors of $A^* A$ with corresponding eigenvalues $\lambda_i$. Then,
$$\frac{(A^* A x, x)}{(x, x)} = \frac{\sum_{i,j} \alpha_i \bar\alpha_j \lambda_i\, (u_i, u_j)}{\sum_{i,j} \alpha_i \bar\alpha_j\, (u_i, u_j)} = \frac{\sum_i |\alpha_i|^2 \lambda_i}{\sum_i |\alpha_i|^2}.$$
This ratio is maximal when $x$ is the eigenvector that corresponds to the largest eigenvalue. Hence,
$$\|A\|_2 = \left( \max_i \lambda_i \right)^{1/2} = \sqrt{\rho(A^* A)}, \tag{1.9}$$
where $\rho(\cdot)$ denotes the spectral radius.
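Formula (1.9) is easy to check numerically; a small sketch (ours) comparing it with a library's operator 2-norm on a random complex matrix:

```python
# A small numerical check of (1.9) (our sketch): ||A||_2 equals the square
# root of the largest eigenvalue of A* A, i.e. sqrt(rho(A* A)).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

norm_direct = np.linalg.norm(A, 2)          # operator 2-norm
eigs = np.linalg.eigvalsh(A.conj().T @ A)   # real and non-negative (Lemma 1.3)
print(norm_direct, np.sqrt(eigs.max()))     # the two values agree
```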

1.5.3 The logarithmic norm

Let $\|\cdot\|$ be a subordinate matrix norm. We define the corresponding logarithmic norm (or log-norm) of a matrix:
$$\mu(A) \equiv \lim_{h \to 0^+} \frac{\|I + hA\| - 1}{h}.$$

Example 1.6 What is the log-norm that corresponds to $L_2$?
$$\mu(A) = \lim_{h \to 0^+} \frac{\rho\left( (I + hA)^*(I + hA) \right)^{1/2} - 1}{h} = \lim_{h \to 0^+} \frac{\rho\left( I + h(A + A^*) \right)^{1/2} - 1}{h}.$$
But since $\rho\left( I + h(A + A^*) \right) = 1 + h\,\rho(A + A^*)$, then
$$\mu(A) = \lim_{h \to 0^+} \frac{\left[ 1 + h\,\rho(A + A^*) \right]^{1/2} - 1}{h} = \frac{1}{2}\,\rho(A + A^*).$$

Note that the log-norm is not a norm! It is not necessarily positive. Its importance stems from the following fact:

Proposition 1.1 Let $\mu(A)$ be the log-norm associated with a subordinate matrix norm, $\|\cdot\|$. Then
$$\|e^{At}\| \le e^{\mu(A)t},$$
and the bound is sharp; $\mu(A)$ is the smallest constant for which this inequality holds.

Proof: Since
$$e^{A(t+h)} = e^{At}(I + hA) + e^{At} \sum_{k=0}^\infty \frac{h^{k+2} A^{k+2}}{(k+2)!},$$
then
$$\|e^{A(t+h)}\| \le \|e^{At}\|\,\|I + hA\| + h^2\,\|e^{At}\|\,\|A\|^2 \sum_{k=0}^\infty \frac{h^k \|A\|^k}{(k+2)!},$$
and
$$\frac{\|e^{A(t+h)}\| - \|e^{At}\|}{h} \le \frac{\|I + hA\| - 1}{h}\,\|e^{At}\| + h\,\|e^{At}\|\,\|A\|^2\, e^{h\|A\|}.$$
Taking the limit $h \to 0^+$:
$$\frac{d}{dt} \|e^{At}\| \le \mu(A)\,\|e^{At}\|,$$
with initial condition $\|e^{A \cdot 0}\| = 1$. From this differential inequality (see below) it follows that
$$\|e^{At}\| \le e^{\mu(A)t}.$$
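A quick numerical illustration of Proposition 1.1 in the $L_2$ norm (our sketch; the test matrix is our choice, picked so that $\rho(A + A^*)$ is attained by the largest eigenvalue):

```python
# An illustration of Proposition 1.1 in the L2 norm (our sketch). Following
# Example 1.6, mu(A) = (1/2) rho(A + A*); we check ||exp(At)||_2 <= exp(mu t).
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0],
              [0.0, -0.5]])
mu = 0.5 * np.abs(np.linalg.eigvalsh(A + A.T)).max()   # (1/2) rho(A + A*) = 1.5

for t in [0.5, 1.0, 2.0, 4.0]:
    lhs = np.linalg.norm(expm(A * t), 2)
    print(t, lhs, np.exp(mu * t), lhs <= np.exp(mu * t))  # always True
```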

Exercise 1.9 Express explicitly the log-norm corresponding to $L_\infty$.

Exercise 1.10 Prove the following properties of the logarithmic norm:

➀ $\mu(\alpha A) = \alpha\, \mu(A)$ for $\alpha \ge 0$.

➁ $-\|A\| \le \mu(A) \le \|A\|$.

➂ $\mu(A + B) \le \mu(A) + \mu(B)$.

➃ $|\mu(A) - \mu(B)| \le \|A - B\|$.

1.6 Existence and uniqueness for systems of equations

Cauchy's existence and uniqueness theorem can be extended to systems of equations. We first formulate the system of equations in vector notation. A system of $n$ first-order equations can be written in the following form:
$$y' = f(t, y), \qquad y(t_0) = y_0, \tag{1.10}$$
where
$$y(t) = (y_1(t), \dots, y_n(t))^T, \qquad y(t_0) = y_0 = (y_{0,1}, \dots, y_{0,n})^T, \qquad f(t, y) = (f_1(t, y), \dots, f_n(t, y))^T.$$
The goal is to find the value of the vector of functions, $y(t)$, at a prescribed value $T \ge t_0$. The first step is to generalize Euler's scheme. Define a partition, $t_0 < t_1 < \dots < t_n = T$, and construct the piecewise-linear function
$$y_h(t) = y_i + (t - t_i)\, f(t_i, y_i), \qquad t \in [t_i, t_{i+1}]$$
(note that the $y_i$ are vectors, $y_i = (y_{i,1}, \dots, y_{i,n})^T$). Now, everything that was done for the scalar equation can be applied, almost verbatim, to the vector case. Since all finite-dimensional vector norms are equivalent, $\|\cdot\|$ will refer to any vector norm.

Lemma 1.5 Let $y' = f(t, y)$ be a system of $n$ equations, with $y(t_0) = y_0$. If $\|f(t, y)\| \le A$, then:

1. $\|y_h(t) - y_0\| \le A\,(t - t_0)$.

2. If $\|f(t, y) - f(t_0, y_0)\| \le \epsilon$ in a convex region, $U$, that contains $y_h(t)$, then
$$\|y_h(t) - y_0 - (t - t_0)\, f(t_0, y_0)\| \le \epsilon\,(t - t_0).$$

Lemma 1.6 Given a partition of the interval $[t_0, T]$, let $y_h(t)$ and $z_h(t)$ be the Euler polygons that correspond to different initial conditions, $y_0$ and $z_0$. If the function $f$ is Lipschitz, i.e.,
$$\|f(t, y) - f(t, z)\| \le L\, \|y - z\|,$$
for all points in a convex region that contains the two polygons, then
$$\|z_h(t) - y_h(t)\| \le e^{L(t - t_0)}\, \|z_0 - y_0\|.$$

Theorem 1.4 Let $f(t, y)$ be Lipschitz with constant $L$, and let $\|f\| \le A$. Then Euler's polygons converge uniformly to a continuously differentiable vector-valued function that satisfies the differential system, and is its unique solution.

1.7 Differential inequalities

1.7.1 Motivation

We have approximated the solution $y(t)$ of the system of differential equations $y' = f(t, y)$ by an Euler polygon, $y_h(t)$. One is interested in analyzing the error,
$$e(t) \equiv \|y_h(t) - y(t)\|,$$
as a function of $t$ and $h$. Can we write a differential equation for $e(t)$? Note that this function is generally not differentiable: there are corners both due to the polygon and due to the norm. Instead, we introduce the following analogs of one-sided derivatives:

Definition 1.3 Let $u(t)$ be a real-valued function. Its Dini derivatives are defined by
$$\begin{aligned}
D^+ u(t) &\equiv \limsup_{h \to 0^+} \frac{u(t + h) - u(t)}{h}, &\qquad D^- u(t) &\equiv \limsup_{h \to 0^-} \frac{u(t + h) - u(t)}{h}, \\
D_+ u(t) &\equiv \liminf_{h \to 0^+} \frac{u(t + h) - u(t)}{h}, &\qquad D_- u(t) &\equiv \liminf_{h \to 0^-} \frac{u(t + h) - u(t)}{h}.
\end{aligned}$$

Let now $w(t)$ be a vector of functions that have right-derivatives. From the triangle inequality,
$$\big| \|w(t + h)\| - \|w(t)\| \big| \le \|w(t + h) - w(t)\|.$$
Dividing by $h$ and taking $h \to 0^+$ we find
$$D^+ \|w(t)\| \le \|w'(t)\| \qquad \text{and} \qquad D_+ \|w(t)\| \ge -\|w'(t)\|,$$
where $w'(t)$ here denotes the right derivative. If we apply this inequality to the error, $e(t) \equiv \|y_h(t) - y(t)\|$, then for $t \in (t_i, t_{i+1}]$,
$$\begin{aligned}
D^+ e(t) &\le \|y_h'(t) - y'(t)\| = \|f(t_i, y_h(t_i)) - f(t, y(t))\| \\
&\le \|f(t_i, y_h(t_i)) - f(t, y_h(t))\| + \|f(t, y_h(t)) - f(t, y(t))\| \\
&\le \epsilon + L\, e(t),
\end{aligned}$$
where $L$ is the Lipschitz constant and $\epsilon = \epsilon(\delta)$ is the bound on the variation of $f(t, y)$ when the variations of $t$ and $y$ are bounded by $\delta$ and $A\delta$, respectively. This almost looks like a differential inequality. It also has an initial condition, $e(t_0) = 0$. Imagine that we replaced the Dini derivative by a standard derivative, and the inequality by an equality. The solution of
$$e'(t) = L\, e(t) + \epsilon, \qquad e(t_0) = 0,$$
is
$$e(t) = \frac{\epsilon}{L} \left[ e^{L(t - t_0)} - 1 \right].$$
If we reverted now to an inequality we would obtain a bound on the error, which is indeed the correct one. In the following subsection we show how such a procedure can be made rigorous.

1.7.2 Theorems on differential inequalities

Theorem 1.5 Suppose that the functions $u(t)$ and $v(t)$ are continuous, and that for all $t \in [t_0, T]$ there exists a function $g(t, y)$ such that
$$D^+ v(t) \le g(t, v(t)), \qquad D^+ u(t) > g(t, u(t)), \qquad v(t_0) \le u(t_0).$$
Then,
$$v(t) \le u(t) \qquad \forall t \in [t_0, T].$$
(The same holds if $D^+$ is replaced by $D_+$.)

Proof: Initially, $v(t_0) \le u(t_0)$. Since the two functions are continuous, for the theorem not to hold, there must be a point at which the curves cross. By contradiction, assume a point $t_2$ for which $v(t_2) > u(t_2)$, and define $t_1$ to be the first point to the left of $t_2$ at which the two curves intersect ($t_1$ could coincide with $t_0$). For all $0 < h < t_2 - t_1$:
$$\frac{v(t_1 + h) - v(t_1)}{h} > \frac{u(t_1 + h) - u(t_1)}{h},$$
and taking the limit $h \to 0^+$:
$$D^+ v(t_1) \ge D^+ u(t_1).$$
This contradicts the assumptions, by which
$$D^+ v(t_1) \le g(t_1, v(t_1)) = g(t_1, u(t_1)) < D^+ u(t_1).$$

We are now in measure to apply differential inequalities to bound the error of general approximation schemes:

Theorem 1.6 Suppose that $v(t)$ approximates the solution of a system of differential equations, $y' = f(t, y)$, $y(t_0) = y_0$, and satisfies
$$\|v(t_0) - y(t_0)\| \le e_0, \qquad \|v'(t) - f(t, v(t))\| \le \epsilon, \qquad \|f(t, v) - f(t, y)\| \le L\, \|v - y\|,$$
where $v'(t)$ can be a right-sided derivative. Then for $t \ge t_0$ we have the error estimate
$$\|y(t) - v(t)\| \le e_0\, e^{L(t - t_0)} + \frac{\epsilon}{L} \left[ e^{L(t - t_0)} - 1 \right].$$

Comments: (1) The first condition bounds the error in the initial conditions; so far, we have always taken it to be zero. (2) The second condition is a bound on the defect of the approximation: how well it satisfies the differential equation. (3) The third condition is the usual Lipschitz bound. (4) The error estimate has two contributions: one from the error in the initial condition, and one from the fact that $v(t)$ does not satisfy the differential equation exactly.

Proof: Let $e(t) \equiv \|y(t) - v(t)\|$. As we have seen,
$$D^+ e(t) \le L\, e(t) + \epsilon \equiv g(t, e(t)), \qquad e(t_0) \le e_0.$$
To apply the above theorem, we need to compare $e(t)$ to another function, $u(t)$, for which $u' > g(t, u(t))$, $u(t_0) = e_0$.

For any $\eta > 0$,
$$u' = L\, u + \epsilon + \eta, \qquad u(t_0) = e_0,$$
has the necessary properties, hence
$$e(t) \le u(t) = e_0\, e^{L(t - t_0)} + \frac{\epsilon + \eta}{L} \left[ e^{L(t - t_0)} - 1 \right].$$
In particular, as $\eta \to 0^+$, we obtain the required bound.

Exercise 1.11 Prove a variant of the above theorem: the conditions
$$\|v(t_0) - y(t_0)\| \le e_0, \qquad \|v'(t) - f(t, v(t))\| \le \delta(t), \qquad \|f(t, v) - f(t, y)\| \le l(t)\, \|v - y\|,$$
imply for $t \ge t_0$:
$$\|y(t) - v(t)\| \le e^{L(t)} \left[ e_0 + \int_{t_0}^t e^{-L(s)}\, \delta(s)\, ds \right], \qquad L(t) = \int_{t_0}^t l(s)\, ds.$$

1.8 Linear systems

Consider a linear ode. If the coefficient of the highest derivative, $y^{(n)}$, does not vanish (the equation has no singular points), we can write a general $n$th-order linear equation as
$$y^{(n)} + a_1(t)\, y^{(n-1)} + \dots + a_n(t)\, y = f(t). \tag{1.11}$$
If we define $y_1(t) = y(t)$, $y_2(t) = y'(t)$, up to $y_n(t) = y^{(n-1)}(t)$, then we obtain the system of equations
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}' =
\begin{pmatrix}
0 & 1 & & \\
& 0 & 1 & \\
& & \ddots & 1 \\
-a_n(t) & -a_{n-1}(t) & \cdots & -a_1(t)
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ f(t) \end{pmatrix}, \tag{1.12}$$
which we write as a system
$$y' = A(t)\, y + f(t)$$
(with a slight abuse of notation for $f(t)$). The right-hand side is obviously Lipschitz in $y$, and therefore, if $A(t)$ and $f(t)$ are bounded, then the conditions for existence and uniqueness are satisfied. In particular, the continuity of $A(t)$ and $f(t)$ ensures boundedness on any compact set, $t \in [t_0, T]$.

Theorem 1.7 For a homogeneous system, $f(t) = 0$, the solution $y(t)$ depends linearly on the initial values. That is, there exists an $n \times n$ matrix, $R(t, t_0)$, such that
$$y(t) = R(t, t_0)\, y(t_0).$$

Proof: The time evolution operator, $T(t, t_0)$, maps a vector in $\mathbb{R}^n$ to another vector in $\mathbb{R}^n$; namely, $y(t) = T(t, t_0)\, y(t_0)$. Take now two solutions, $u_1(t)$ and $u_2(t)$. By the linearity of the equation, $a\, u_1(t) + b\, u_2(t)$ is also a solution, hence
$$a\, u_1(t) + b\, u_2(t) = T(t, t_0) \left[ a\, u_1(t_0) + b\, u_2(t_0) \right].$$
But on the other hand,
$$a\, u_1(t) + b\, u_2(t) = a\, T(t, t_0)\, u_1(t_0) + b\, T(t, t_0)\, u_2(t_0).$$
This proves that $T$ is a linear mapping, i.e., it can be represented as a matrix.

By uniqueness, a solution that starts at $(t_0, y(t_0))$ and a solution that starts at $(t_1, R(t_1, t_0)\, y(t_0))$ must be the same, i.e.,
$$y(t_2) = R(t_2, t_0)\, y(t_0) = R(t_2, t_1)\, R(t_1, t_0)\, y(t_0),$$
hence
$$R(t_2, t_0) = R(t_2, t_1)\, R(t_1, t_0), \qquad t_0 < t_1 < t_2.$$
By integrating backwards from $t_1$ to $t_0$ we must, by uniqueness, arrive back at the starting point, hence
$$R(t_0, t_1) = \left[ R(t_1, t_0) \right]^{-1}.$$

1.8.1 The Wronskian

Assume a homogeneous system $y' = A(t)\, y$, or component-wise,
$$y_j' = \sum_{k=1}^n a_{jk}(t)\, y_k.$$
Suppose that we know $n$ independent solutions, $y^i(t)$, with components $y^i_j(t)$, that satisfy the equation,
$$(y^i_j)' = \sum_{k=1}^n a_{jk}(t)\, y^i_k(t), \qquad i = 1, 2, \dots, n.$$

We define the Wronskian matrix
$$W(t) = \begin{pmatrix}
y^1_1(t) & y^2_1(t) & \cdots & y^n_1(t) \\
y^1_2(t) & y^2_2(t) & \cdots & y^n_2(t) \\
\vdots & \vdots & & \vdots \\
y^1_n(t) & y^2_n(t) & \cdots & y^n_n(t)
\end{pmatrix}.$$
Each column of the Wronskian is a solution to the equation, hence $W(t)$ satisfies the matrix equation
$$W'(t) = A(t)\, W(t).$$
The solutions of the system are spanned by the $n$ columns of the Wronskian. In other words, any solution, $y(t)$, must be of the form
$$\begin{pmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{pmatrix} =
\begin{pmatrix}
y^1_1(t) & y^2_1(t) & \cdots & y^n_1(t) \\
y^1_2(t) & y^2_2(t) & \cdots & y^n_2(t) \\
\vdots & \vdots & & \vdots \\
y^1_n(t) & y^2_n(t) & \cdots & y^n_n(t)
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix},$$
or in matrix notation,
$$y(t) = W(t)\, c.$$
If $y(t_0)$ is given, then the vector $c$ can be found by inverting the Wronskian, hence
$$c = W^{-1}(t_0)\, y(t_0), \qquad y(t) = W(t)\, W^{-1}(t_0)\, y(t_0) \quad \Longrightarrow \quad R(t, t_0) = W(t)\, W^{-1}(t_0).$$
All solutions are therefore known if one finds $n$ independent solutions.

1.8.2 Inhomogeneous linear systems

Suppose we knew how to solve the homogeneous system, $y' = A(t)\, y$. As noted, one only needs to know any set of $n$ independent solutions to construct a Wronskian matrix. Any solution is then represented by the matrix $R(t, t_0) = W(t)\, W^{-1}(t_0)$. Consider now the inhomogeneous system,
$$y' = A(t)\, y + f(t).$$
Liouville (1838) proposed a solution based on looking for a solution of the homogeneous equation with the constant vector $c$ replaced by a function of $t$ (variation of constants):
$$y(t) = W(t)\, c(t).$$

Substitute this ansatz into the equation:
$$y' = A\, W c + f = W' c + W c'.$$
But $W' = AW$, hence
$$c'(t) = W^{-1}(t)\, f(t),$$
or
$$c(t) = \int_{t_0}^t W^{-1}(s)\, f(s)\, ds + C.$$
The solution is then
$$y(t) = W(t) \int_{t_0}^t W^{-1}(s)\, f(s)\, ds + W(t)\, C,$$
and the constant vector $C$ is obtained from the initial conditions, $C = W^{-1}(t_0)\, y(t_0)$. From that we deduce the following theorem:

Theorem 1.8 Let $A(t)$ and $f(t)$ be continuous functions. The solution of
$$y' = A(t)\, y + f(t)$$
with initial condition $y(t_0)$ is given by
$$w(t) = R(t, t_0)\, y(t_0) + \int_{t_0}^t R(t, s)\, f(s)\, ds,$$
where $R(t, s)$ is the solution operator of the homogeneous system.

This formula is known as Duhamel's principle; it gives the solution of a linear inhomogeneous system in terms of the solution of the homogeneous part.

Exercise 1.12 Verify explicitly that
$$w(t) = R(t, t_0)\, y(t_0) + \int_{t_0}^t R(t, s)\, f(s)\, ds$$
is a solution of the system $y' = A\, y + f$, where $R(t, t_0) = W(t)\, W^{-1}(t_0)$.
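As a numerical companion to Exercise 1.12, Theorem 1.8 can be sanity-checked when $A$ is constant, in which case $R(t, s) = e^{A(t-s)}$ (see the next subsection); a sketch (ours; the matrix, forcing term, and quadrature are our choices):

```python
# A numerical sanity check of Duhamel's principle (our sketch) for constant
# A, where R(t, s) = exp(A (t - s)). The integral is evaluated by the
# trapezoidal rule and compared against a direct ODE solve.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
f = lambda t: np.array([np.sin(t), 1.0])
y0 = np.array([1.0, 0.0])
T = 2.0

# Duhamel: y(T) = e^{AT} y0 + int_0^T e^{A(T-s)} f(s) ds
s = np.linspace(0.0, T, 2001)
vals = np.stack([expm(A * (T - si)) @ f(si) for si in s])
duhamel = expm(A * T) @ y0 + np.trapz(vals, s, axis=0)

direct = solve_ivp(lambda t, y: A @ y + f(t), (0, T), y0, rtol=1e-10).y[:, -1]
print(duhamel, direct)  # the two should agree to quadrature accuracy
```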

1.8.3 Systems with constant coefficients

A class of linear systems that can be solved exactly, i.e., for which the matrix $R(t, t_0)$ can be calculated, is the case where all the coefficients are constant, i.e., $A$ does not depend on $t$.

Motivation. The interest in such systems is not only because of their restricted use, but mainly because of their application to the local analysis of nonlinear systems. Consider a system of nonlinear and autonomous equations,
$$y_i' = f_i(y_1, \dots, y_n),$$
and imagine that there is a vector $y$, which with no loss of generality will be taken to be the zero vector, for which all the $f_i$ vanish. That is, $y(t) = 0$ is a stationary solution (or a fixed point) of the system. One is often interested in the behavior of the solution in the vicinity of stationary points. We assume that $y(t_0)$ is very close to this point, and analyze the solution for $t > t_0$ as long as it remains close to the stationary point. The function $f$ can then be approximated by its first-order Taylor expansion,
$$f_i(y_1, \dots, y_n) \approx \sum_{j=1}^n \frac{\partial f_i}{\partial y_j}(0)\, y_j,$$
so that the system is approximated by a linear system with constant coefficients:
$$y_i' \approx \sum_{j=1}^n a_{ij}\, y_j, \qquad a_{ij} = \frac{\partial f_i}{\partial y_j}(0).$$

Autonomous systems. Autonomous systems are invariant under a shift of $t$. It doesn't matter if we integrate the initial data from $t_0$ to $t_1$, or from $0$ to $t_1 - t_0$; i.e., the matrix $R(t, t_0)$ satisfies the following symmetry:
$$R(t, t_0) = R(t - t_0, 0).$$

Solution by diagonalization. Consider a (homogeneous) linear system with constant coefficients,
$$y' = A\, y,$$
and look for a solution of the form
$$y(t) = v\, e^{\lambda t},$$
where $v$ is a constant vector. Substitution yields
$$\lambda v = A\, v,$$
i.e., $\lambda$ is an eigenvalue of the matrix $A$, and $v$ is the corresponding eigenvector. Suppose now that there exist $n$ eigenvectors that are linearly independent; let $v^j_i$ denote the $i$th component of the $j$th eigenvector. Then,
$$A \underbrace{\begin{pmatrix} v^1_1 & \cdots & v^n_1 \\ \vdots & & \vdots \\ v^1_n & \cdots & v^n_n \end{pmatrix}}_{T} = \underbrace{\begin{pmatrix} v^1_1 & \cdots & v^n_1 \\ \vdots & & \vdots \\ v^1_n & \cdots & v^n_n \end{pmatrix}}_{T} \underbrace{\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}}_{\Lambda}$$

and $\Lambda = T^{-1} A T$. Consider now the coordinate transformation
$$z(t) = T^{-1} y(t).$$
The equation satisfied by $z(t)$ is
$$z' = T^{-1} y' = T^{-1} A\, y = T^{-1} A T\, z = \Lambda\, z.$$
Hence, every component of the vector $z(t)$ satisfies an independent equation,
$$z_i' = \lambda_i z_i,$$
whose solution is $z_i(t) = e^{\lambda_i t} z_i(0)$. Back in vector notation,
$$z(t) = \begin{pmatrix} z_1(t) \\ \vdots \\ z_n(t) \end{pmatrix} =
\underbrace{\begin{pmatrix} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{pmatrix}}_{e^{\Lambda t}}
\begin{pmatrix} z_1(0) \\ \vdots \\ z_n(0) \end{pmatrix}.$$
Reverting back to the original variables, we find
$$y(t) = T\, e^{\Lambda t}\, T^{-1}\, y(0),$$
which means that
$$R(t, 0) = T\, e^{\Lambda t}\, T^{-1}.$$
It is left as an (easy) exercise to show that $T e^{\Lambda t} T^{-1} = e^{At}$.

Exercise 1.13 Show that for linear systems with constant coefficients, $y' = A y$, where $A$ is diagonalizable,
$$R(t, 0) = e^{At}.$$

Exercise 1.14 Compute the matrix $R(t, t_0)$ for the system
$$y_1' = y_2, \qquad y_2' = -y_1,$$
and verify that it satisfies
$$R(t_2, t_0) = R(t_2, t_1)\, R(t_1, t_0) \qquad \text{and} \qquad R(t_0, t_1) = \left[ R(t_1, t_0) \right]^{-1}.$$
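A numerical check of $R(t, 0) = T e^{\Lambda t} T^{-1} = e^{At}$ (our sketch), using the matrix of Exercise 1.14:

```python
# A numerical check (our sketch) that T e^{Lambda t} T^{-1} equals exp(At),
# for the rotation system of Exercise 1.14: y1' = y2, y2' = -y1.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
lam, T = np.linalg.eig(A)           # complex eigenpairs +/- i

t = 0.7
R_diag = (T @ np.diag(np.exp(lam * t)) @ np.linalg.inv(T)).real
R_expm = expm(A * t)                # [[cos t, sin t], [-sin t, cos t]]
print(np.allclose(R_diag, R_expm))  # True
```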

Exercise 1.15 Find the solution of
$$y'' - 3y' - 4y = g(t), \qquad g(t) = \begin{cases} \cos t & 0 \le t \le \frac{\pi}{2}, \\ 0 & \frac{\pi}{2} \le t, \end{cases}$$
with initial conditions $y(0) = y'(0) = 0$; use the variation of constants.

Non-diagonalizable case. What if there is no similarity transformation that diagonalizes $A$? There is always one that transforms $A$ to a Jordan canonical form, $J = T^{-1} A T$, such that
$$J = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_r \end{pmatrix},$$
where each block $B_i$ is of the form
$$B_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & 1 & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}.$$
As $z' = J z$, the vector $z$ can be partitioned into blocks, and each block can be solved independently. For the $i$th block (of size $k$), one finds:
$$\begin{pmatrix} z_1(t) \\ z_2(t) \\ \vdots \\ z_k(t) \end{pmatrix} =
\begin{pmatrix}
e^{\lambda_i t} & t\, e^{\lambda_i t} & \cdots & \frac{t^{k-1}}{(k-1)!}\, e^{\lambda_i t} \\
& e^{\lambda_i t} & \cdots & \frac{t^{k-2}}{(k-2)!}\, e^{\lambda_i t} \\
& & \ddots & \vdots \\
& & & e^{\lambda_i t}
\end{pmatrix}
\begin{pmatrix} z_1(0) \\ z_2(0) \\ \vdots \\ z_k(0) \end{pmatrix}.$$

1.9 Local analysis

We conclude this section with a few examples that show possible behaviors of solutions in the vicinity of fixed points. We start with 2-by-2 linear systems, which are easy to visualize, and conclude with a nonlinear example which demonstrates the power of linearization and local analysis.

Example 1 Consider the following system of two linear equations with constant coefficients:
$$y_1' = -3y_1 - y_2, \qquad y_2' = -y_1 - 3y_2. \tag{1.13}$$

[Figure 1.3: Phase space flow field for system (1.13).]

In matrix notation,
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}' = \begin{pmatrix} -3 & -1 \\ -1 & -3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.$$
The eigenvalues of the matrix $A$ are $\lambda_{1,2} = -2, -4$. Hence, there exists a linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$, so that
$$z_1(t) = e^{-2t} z_1(0), \qquad z_2(t) = e^{-4t} z_2(0).$$
This means that no matter where we start, the solution decays exponentially towards the origin, which is a stationary point. This results from the fact that all the eigenvalues are negative! How does the solution look when we map back to the original variables, $y$? Because $T$ maps the origin onto itself, the fundamental behavior is the same. Figure 1.3 shows the phase-space flow; indeed, all trajectories are attracted by the fixed point at the origin.

Example 2 Consider the following system:
$$y_1' = y_2, \qquad y_2' = -y_1 - y_2. \tag{1.14}$$
The matrix of coefficients is
$$A = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix},$$
and its eigenvalues are the complex pair $\lambda = \frac{-1 \pm \sqrt{3}\, i}{2}$. Because the real part of the two eigenvalues is negative, the solution (in the $z$ plane) will approach

the origin from all initial points. The imaginary part will induce a rotation. This behavior is again preserved by the linear transformation back to $y$. The trajectory flow is depicted in Figure 1.4.

[Figure 1.4: Phase space flow field for system (1.14).]

Example 3 Consider the system:
$$y_1' = y_2, \qquad y_2' = y_1 - y_2. \tag{1.15}$$
The matrix of coefficients is
$$A = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix},$$
and its eigenvalues are $\lambda = \frac{-1 \pm \sqrt{5}}{2}$. This time there is one negative and one positive eigenvalue. In the $z$ plane one component grows exponentially while the other decays. This means that the solution will grow exponentially, unless the coefficient of the growing solution is zero. The phase-space flow in the original $y$ coordinates is depicted in Figure 1.5.

Example 4 Consider a nonlinear system:
$$y_1' = \tfrac{1}{3}\,(y_1 - y_2)(1 - y_1 - y_2), \qquad y_2' = y_1\,(2 - y_2), \tag{1.16}$$
which has four stationary points, $(0, 0)$, $(0, 1)$, $(-1, 2)$, and $(2, 2)$. Figure 1.6 shows the flow lines of this system of equations; the solid lines are trajectories.

[Figure 1.5: Phase space flow field for system (1.15).]

[Figure 1.6: Phase space flow field for system (1.16) and trajectories.]

Exercise 1.16 Analyze the behavior by local analysis near each stationary point. That is, linearize the system near every fixed point, and determine its stability by calculating the corresponding eigenvalues. Use your results to determine the orientation of the trajectories in Figure 1.6.
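As a starting point for Exercise 1.16, a sketch (ours) that evaluates the Jacobian of (1.16) at each stationary point and prints its eigenvalues; the classification of each point (node, spiral, saddle) is left to the reader.

```python
# A starting point for Exercise 1.16 (our sketch): eigenvalues of the
# Jacobian of system (1.16) at each of its four stationary points.
import numpy as np

def jacobian(y1, y2):
    # f1 = (1/3)(y1 - y2)(1 - y1 - y2) = (1/3)(y1 - y2 - y1^2 + y2^2)
    # f2 = y1 (2 - y2)
    return np.array([
        [(1 - 2*y1) / 3.0, (-1 + 2*y2) / 3.0],   # df1/dy1, df1/dy2
        [2 - y2,           -y1              ],   # df2/dy1, df2/dy2
    ])

for p in [(0, 0), (0, 1), (-1, 2), (2, 2)]:
    print(p, np.linalg.eigvals(jacobian(*p)))
```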