Solving non-linear systems of equations

Solving non-linear systems of equations Felix Kubler 1 1 DBF, University of Zurich and Swiss Finance Institute October 7, 2017 Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 1 / 38

The problem We consider a system of equations f (x) = 0, f : U R n R n We will usually assume that f is C 2 although often/sometimes the methods go through with f being continuous. In the economic applications, we will make assumptions that ensure that there is some x with f ( x) = 0 and with D x f ( x) being invertible. If x satisfies the latter condition, we call the solution locally unique. Sometimes methods work without this, but if Jacobian is ill-conditioned, we re typically in trouble Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 2 / 38

Economic application t = 0,..., T. Uncertainty driven by a Markov chain s t {1,..., S} with transition π. Denote by s t a history of shocks (node of the event tree) H agents with utility U h (c) = E 0 T β t u(c t ), Agents have stationary endowments e h (s t ) that depend on the shock, s t. J long lived assets, Lucas trees, that pay dividends d(s t ) 0. It is useful to normalize the price of the consumption good to 1 at each node and we denote by q(s t ) R J the asset prices at node s t. t=0 Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 3 / 38

Economic application (cont) A financial markets equilibrium consists of prices (q(s t )) T 1 t=0 and agents choices (c h (s t ), θ h (s t )) h=1,...,h such that markets clear, i.e. for all agents h, (θ h, c h ) arg max U h (c) s.t. c(s t ) = e h (s t ) + θ(s t 1 )(q(s t ) + d(s t )) θ(s t )q(s t ) θ(s T ) = 0 for all s T θ(s 1 ) = 0 and for all s t, t < T H θ h (s t ) = 0. h=1 Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 4 / 38

Economic application (cont) Can write down system of Euler equations and market clearing. If d j (.) are close to collinear, this will be very ill-conditioned For the case of complete markets better to solve with Negishi s method Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 5 / 38

Newton s method Suppose that f : R n R n is continuously differentiable. Let z satisfy f (z ) = 0. Suppose D z f (z) is non-singular, define N f (z) = z (D z f (z)) 1 f (z) Given some z 0 define a sequence (z i ) by z i+1 = N f (z i ), i = 0,..., assuming that D z f is non-singular on the entire sequence. If z 0 sufficiently close to z then z i converges to z. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 6 / 38

Newton Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 7 / 38

Newton Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 8 / 38

Newton Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 9 / 38

Newton Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 10 / 38

Newton s method We say that a point x is in the convergence radius of Newton s method (sometimes people say x is an approximate zero) if the sequence x 0 = x and x i+1 = N f (x i ) is defined for all i = 0, 1, 2,..., and if there is a z such that f (z) = 0 and x i z 1 z x. 2 2i 1 Two obvious questions: Can we determine if x 0 is an approximate zero? What if we do have a good starting point? Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 11 / 38

Smale s alpha Let D R n be open and let f : D R n be analytic. For z D, define f (k) (z) to be the k th derivative of f at z. This is a multi-linear operator which maps k-tuples of vectors in D into R n. Define the norm of an operator A to be Ax A = sup x 0 x. Suppose that the Jacobian of f at z, f (1) (z) is invertible and define γ(z) = sup k 2 (f (1) (z)) 1 f (k) (z) k! 1 (k 1) and β(z) = (f (1) (z)) 1 f (z). Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 12 / 38

Smale s alpha theorem Theorem Given a z D, suppose the ball of radius (1 contained in D and that Then there exists a z D with β( z)γ( z) < 0.157. f ( z) = 0 and z z 2β( z). 2 2 )/γ( z) around z is Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 13 / 38

Better convergence for NM - univariate case Note that for the one-variable case if we know that f (a) 0 and f (b) 0 for some a < b it is trivial to find a zero in [a, b] via bisection: If f ( a+b a+b a+b 2 ) 0 let a = 2, otherwise let b = 2 and repeat. T his can be combined with Newton s method by discarding a Newton-iterate whenever it lies outside of the interval in which it is known a root lies. Eventually, Newton s method will make the bisection step unnecessary and we get the quadratic convergence (which is much faster than the linear convergence of bisection). Unfortunately this does not generalize to several dimensions Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 14 / 38

Better convergence for NM - univariate case (cont.) One simple idea that generalizes to several dimensions is to fix some small δ > 0 and to replace the Newton iteration by requiring x i+1 = x i f (x i) δβ, where β = sgn(f (x i )) if f (x i ) < δ and β = f (x i ) otherwise. This alone will imply that, depending on δ NM does not explode, but it will not ensure convergence. One needs to ensure that in each iterate we get closer to a zero. One way to do this is to ensure that f (x) 2 becomes smaller: x i+1 = x i α if (x i ) δβ. where α i is chosen to ensure that f (x i+1 ) 2 < f (x i ) 2. It is true that it is not obvious how to find an α i like this but there are efficient methods to do this and this idea generalizes to several dimensions. Does not always work! Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 15 / 38

Bad Newton Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 16 / 38

Bad Newton Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 17 / 38

Better convergence for NM Obviously one can view the quest of finding an x such that f (x ) = 0 as 1 min f (x) = x 2 f (x)t f (x) Denote the gradient of f by g i.e. g(x) = D x f (x) = (Dx f (x)) T f (x). We write NM somewhat differently: x i+1 = x i + α i p i where α i is a scalar that gives us the step-length and p i gives the direction. Above we obviously have D x f (x i )p i = f (x i ) Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 18 / 38

Better convergence for NM (cont.) As in the one-dimensional case, we want to ensure that the Newton-step decreases f, i.e. we impose for some ɛ > 0. g(x i ) T p i g(x i ) p i ɛ We also want to ensure that the step-length α i is neither too large (and takes us away from the solution) or too small. One way to do this is to impose g(x i + αp i ) T p i ηg(x i ) T p i It can be shown that (under some additional mild conditions) this method gives a sequence of x i such that g(x i ) 0. If D f (x i ) is invertible this implies that f (x) = 0 and we are done. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 19 / 38

Better convergence for NM (concl.) It is strong assumption to impose invertibility of D x f everywhere (it is not enough to have it at x, obviously, since we might not reach x ). The idea out is as follows: ione can obtain the Newton step, p i by solving (D x f (x i )) T D x f (x i )p i = (D x f (x i )) T f (x i ) if the Jacobian is non-singular. If it is singular, one can solve instead ( ) (D x f (x i )) T D x f (x i ) + λ i I p i = (D x f (x i )) T f (x i ). For λ i > 0 this always has a unique solution and if on picks the λ i cleverly this can lead to x. Homotopy methods... Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 20 / 38

NM - Practical issues Always important to have a good starting point. It pays to first solve simple problems and use solution as starting point to more complicated ones Evaluation of the Jacobian can be very costly. Even if analytic derivatives are available Alternative is to use quasi-newton methods (e.g. Broyden) that do not need the Jacobian at every step Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 21 / 38

Non-linear equations - Solvers Important to have a good method to solve non-linear equations! fsolve in Matlab is getting better, used to be horrible (check matlab version) KNITRO, NAG, IPOPT are alternatives which should be taken seriously. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 22 / 38

Very quick outline of homotopy methods 1 Let X, Y R k. Two mappings f, g : X Y are called homotopic if there exists a smooth map H : X [0, 1] Y with H(x, 0) = f (x) and H(x, 1) = g(x) for all x X. The map H is called a (smooth) homotopy. Sometimes one the solution set is a smooth path that connects a solution to f (x) = 0 to a solution to g(x) = 0 and f (x) = 0 is easy to solve. Can follow that path numerically Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 23 / 38

Very quick outline of homotopy methods 2 Given a homotopy function H : R n [0, 1] R n, if D (x,t) H has rank n for all y H 1 (0), then H is called a regular homotopy By the preimage theorem, H 1 (0) only consists of nice paths. Putting it differently, at any t [0, 1) if H(x, t) = 0 and if D x H(x, t) has full rank n we can apply the implicit function theorem and D t x = ( D x H(x, t)) 1 D t H(x, t). If H is regular, D (x,t) H will have rank n and therefore some partial Jacobian (D (x,t) H) i (i.e. taking out one column i of the Jacobian of D (x,t) H) will have full rank and we can imply the implicit function theorem to this. In this case the path might not be strictly increasing in t but that s OK. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 24 / 38

Very quick outline of homotopy methods 3 Of course very few of these paths are good paths if we want to follow the path to compute a solution to H(x, 1) = 0. So in addition to making sure that H 1 (0) is a smooth one-dimensional manifolds, we need some more requirements on H(.,.). 1 Unique starting point: There should be a unique solution to H(x, 0) = 0 2 Boundary-free path: For all t [0, 1] the solution set {x : H(x, t) = 0} should be bounded. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 25 / 38

Very quick outline of homotopy methods 4 To use this numerically, it is useful to describe the path using an additional parameter, arc-length, which measures the distance moved along the path. Along the solutions to H(x, t) = 0 which originate at H(x, 0) = 0, let y = (x, t) be a function of the arc-length r R +, i.e. y(r) H 1 (0) for all r 0. If ẏ i = dy i (r) dr, since H(y(r)) = 0, by the chain rule n+1 H ẏ i = 0. y i i=1 The solution y(r) also solves the following differential equation, ẏ i = ( 1) i det((d y H(y)) i ) for all i = 1,..., n + 1. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 26 / 38

Very quick outline of homotopy methods 5 We use this differential equation to follow the path. Suppose on iteration k we are at y k close to some y with H(y) = 0. For some step-length d k define For some b R n, define ( Q(y) = y k+1 = y k + d k ẏ k H(y) b(y y k ) Apply Newton s method to this function, taking b = ẏ k. ) Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 27 / 38

Very quick outline of Gröbner bases Although not the main focus of this class, sometimes algbraic methods can be useful Can compute roots with arbitrary precision Can find all solutions See Kubler and Schmedders (2010, OR) or Renner, Kubler and Schmedders in Handbook of Computational Economics III Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 28 / 38

Polynomials Monomial in x 1, x 2,..., x n : x α x α 1 1 x α 2 2... x n αn. Exponents α = (α 1, α 2,..., α n ) Z N + Polynomial f in the n variables x 1, x 2,..., x n is a linear combination of finitely many monomials with coefficients in a field K f (x) = α S a α x α, a α K, S Z N + finite Examples of K: Q, R, C Write f K[x 1,..., x n ] Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 29 / 38

Polynomial Equation Solving Objective: find all (positive) real solutions to f (x) = 0 Study of polynomial equations on algebraically closed fields Field R is not algebraically closed, so interpret f : C n C n Define solution set V (f 1, f 2,..., f n ) = {x C n : f 1 (x) = f 2 (x) =... = f n (x) = 0} Polynomial ideal is f 1,..., f n is one basis of I I = f 1,..., f n = { n h i f i : h i K[x]} i=1 Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 30 / 38

Polynomial Equation Solving (cont.) Obviously a given ideal can have different bases, i.e. for polynomials g 1,..., g k and f 1,..., f n g 1,..., g k = f 1,..., f n One basis with nice properties is the so called lexicographic Gröbner basis Why do we care? If g 1,..., g k = f 1,..., f n then V (f 1,..., f n ) = V (g 1,..., g k ) To solve a system of equations, find another basis which has nice properties, e.g. is triangular Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 31 / 38

Shape Lemma and Gröbner basis Let I = f 1, f 2,..., f n be a radical zero dimensional polynomial ideal. Let G denote the (lexicographic) Gröbner basis of I. Suppose all roots have distinct value for last coordinate x n THEN G = {x 1 v 1 (x n ), x 2 v 2 (x n ),..., x n 1 v n 1 (x n ), r(x n )} Polynomial r has degree d, polynomials v i have degrees less than d Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 32 / 38

Buchberger s Algorithm Buchberger s algorithm allows calculation of Gröbner bases If all coefficients of f 1,..., f n are rational then the polynomials r, v 1, v 2,..., v n 1 have rational coefficients and can be computed exactly! Solving f 1 (x) = f 2 (x) =... = f n (x) = 0 reduces to solving r(x n ) = 0 Software SINGULAR computes Gröbner basis and numerically approximates all solutions Available free of charge at www.singular.uni-kl.de Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 33 / 38

Shape lemma in SINGULAR Consider the following simple example: x yz 3 2z 3 + 1 = x + yz 3z + 4 = x + yz 9 = 0 ring R = 0, (x, y, z), lp; ideal I = ( x y z 3 2 z 3 + 1, x + y z 3 z + 4, x + y z 9); groebner(i); Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 34 / 38

Shape lemma in SINGULAR Consider the following simple example: x yz 3 2z 3 + 1 = x + yz 3z + 4 = x + yz 9 = 0 ring R = 0, (x, y, z), lp; ideal I = ( x y z 3 2 z 3 + 1, x + y z 3 z + 4, x + y z 9); groebner(i); [1] = 2z11 + 3z9 5z8 + 5z3 4z2 1 [2] = 2y +18z10+25z8 45z7 5z6+5z5 5z4+5z3+40z2 31z 6 [3] = 2x 2z9 5z7 + 5z6 5z5 + 5z4 5z3 + 5z2 + 1 Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 34 / 38

Parameter Shape Lemma - Assumptions What if last equation reads ex + yz 9 = 0 with e being a parameter? Let E R m, be an open set of parameters, (x 1,..., x n ) C n a set of variables and let f 1,..., f n K[e 1,..., e m ; x 1,..., x n ], K R. Assume that for each ē = (ē 1,..., ē m ) E, the ideal I(ē) = f 1 (ē; ),..., f n (ē; ) is zero-dimensional and radical. Furthermore, assume that there exist u 1,..., u n K, such that for all ē any solutions x x satisfying f 1 (ē; x) =... = f n (ē; x) = 0 = f 1 (ē; x ) =... = f n (ē; x ) also satisfy i u i x i i u i x i. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 35 / 38

Parameter Shape Lemma - Result There exist r, v 1,..., v n K[e; y] and ρ 1,..., ρ n K[e] such that for y = i u ix i and for all ē outside a closed lower-dimensional subset E 0 of E, {x C n : f 1 (ē, x) =... = f n (ē, x) = 0} {x C n : ρ 1 (ē)x 1 = v 1 (ē; y),..., ρ n (ē)x n = v n (ē; y) for r(ē; y) = 0}. Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 36 / 38

Shape lemma with parameters in SINGULAR ring R= (0, e), (x, y, z), lp; ideal I = ( x y z 3 2 z 3 + 1, x + y z 3 z + 4, e x + y z 9); groebner(i); Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 37 / 38

Shape lemma with parameters in SINGULAR ring R= (0, e), (x, y, z), lp; ideal I = ( x y z 3 2 z 3 + 1, x + y z 3 z + 4, e x + y z 9); groebner(i); [1] = 2 z11 + 3 z9 5 z8 + (5e) z3 + ( 4e) z2 + ( e) [2] = ( e2 e) y + ( 8e 10) z10 + ( 10e 15) z8 + (20e + 25) z7 + (5e) z6 + ( 5e) z5 + (5e) z4 + ( 5e) z3 + ( 20e2 20e) z2 + (16e2 + 15e) z + (3e2 + 3e) [3] = ( e 1) x +2 z9+5 z7 5 z6+5 z5 5 z4+5 z3 5 z2 1 Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 37 / 38

An example in matlab (and Singular) Matlab-command fsolve can be used to solve non-linear equations numerically Might need a pretty good guess :) Example: x 2 + 2y 2 5x + 7y = 40, 3x 2 y 2 + 4x + 2y = 28 Write as function in matlab... Call with guess = [2, 3] [result, fval, exitflag, output] = fsolve(@eqns, guess) Felix Kubler Comp.Econ. Gerzensee, Ch2 October 7, 2017 38 / 38