Solving Nonlinear Equations & Optimization in One Dimension

Problem: for a function f, find x_0 such that f(x_0) = 0.
One Root: The Bisection Method

This one is guaranteed to converge at least to a singularity, if not an actual root.
1. Start with a and b such that f(a) and f(b) have opposite signs.
2. Choose the midpoint c = a + (b - a)/2.
3. If f(c) has a sign opposite to f(a), then set b = c. Otherwise, set a = c.
4. Repeat until the desired tolerance is attained.
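As an illustration (not from the original slides), a minimal R sketch of these four steps might look like the following; the function name bisect, the tolerance, and the test problem are arbitrary choices:

# Minimal bisection sketch: assumes f(a) and f(b) have opposite signs
bisect <- function(f, a, b, tol = 1e-8, max_iter = 100) {
  if (f(a) * f(b) > 0) stop("f(a) and f(b) must have opposite signs")
  for (i in seq_len(max_iter)) {
    c <- a + (b - a) / 2                       # midpoint
    if (f(c) == 0 || (b - a) / 2 < tol) return(c)
    if (sign(f(c)) != sign(f(a))) b <- c else a <- c
  }
  c
}

# Illustrative example: root of cos(x) - x on [0, 1]
bisect(function(x) cos(x) - x, 0, 1)           # approx 0.7391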
One Root: Brent's Method

Brackets the root, with local quadratic interpolation of three points. At a given iteration, if the next computed point falls outside of the bracketing interval, a bisection step is used instead.
This is the method underlying uniroot in R. More details in Press et al. (1992).
Brent's is the method most highly recommended by NR for single nonlinear root-finding.
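For example, the uniroot routine mentioned above can be called directly; the target function and bracketing interval below are illustrative:

# uniroot() uses Brent-style bracketing; function and interval are illustrative
uniroot(function(x) cos(x) - x, interval = c(0, 1), tol = 1e-8)$root
# approx 0.7390851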
One Root: Newton's Method

Local linear approximation using f'. Steps:
With first guess x_0, compute f(x_0) and the slope of the approximating line, f'(x_0).
The next guess x_1 is the root of the tangent line extending from x_0:
x_1 = x_0 - f(x_0)/f'(x_0).
Iterate until convergence.
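A minimal R sketch of this iteration, assuming the user supplies the derivative fprime (all names here are illustrative):

# Minimal Newton's method sketch: requires the derivative fprime
newton <- function(f, fprime, x0, tol = 1e-10, max_iter = 50) {
  x <- x0
  for (i in seq_len(max_iter)) {
    step <- f(x) / fprime(x)    # x_{k+1} = x_k - f(x_k)/f'(x_k)
    x <- x - step
    if (abs(step) < tol) break
  }
  x
}

# Same illustrative root as above, now with quadratic convergence near the solution
newton(function(x) cos(x) - x, function(x) -sin(x) - 1, x0 = 1)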
A Comparison

Method      Requires f'?   Guaranteed?   Convergence
Bisection   No             Yes           Linear
Brent's     No             Almost        Superlinear
Newton's    Yes            No            Quadratic*

* If close. These same relative trade-offs exist for higher-dimensional procedures.
Optimization in One Dimension

Problem: for a function f, find x_m such that f(x_m) < f(x) (a minimum) or f(x_m) > f(x) (a maximum) for all x ≠ x_m.
We'll focus on minima, since finding a max for f is equivalent to finding a min for -f.
Global versus local: multiple extrema; boundaries.
One-Dimensional: Golden Section Search

An analogue to the bisection method for finding roots. Proceeds as follows:
Begin with 3 points x_1 < x_2 < x_3 that are thought to contain a local minimum.
Choose a new point x_0 such that x_1 < x_0 < x_3.
Form a new bracketing interval based on the relative values of f(x_0) and f(x_2). For example, if x_0 < x_2, then the new interval is (x_0, x_3) if f(x_0) > f(x_2), or it's (x_1, x_2) if f(x_0) < f(x_2).
Iterate until convergence.
What does "Golden" mean?

The question is: following the steps on the previous slide, how do we select x_0? The answer is: we make a choice that guarantees a proportional reduction in the width of the interval at each step.
For example, if x_0 < x_2, then for this to happen regardless of the value of f(x_0) we need to satisfy x_0 - x_1 = x_3 - x_2 = α(x_3 - x_1), where α represents the proportion of the interval eliminated at each step.
To get the same reduction at the next iteration, the points also must satisfy x_2 - x_0 = α[α(x_3 - x_1) + (x_2 - x_0)], so x_2 - x_0 = (x_3 - x_1) α^2/(1 - α).
Since (x_0 - x_1) + (x_2 - x_0) + (x_3 - x_2) = x_3 - x_1, it follows that 2α + α^2/(1 - α) = 1, a quadratic whose only solution satisfying 0 < α < 1 is α = (3 - √5)/2.
Hence, the proportion of the interval remaining after each iteration is 1 - α = (√5 - 1)/2 ≈ 0.618, which is known as the Golden Mean.
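A quick numerical check of this algebra in R (illustrative, not part of the original derivation):

alpha <- (3 - sqrt(5)) / 2
2 * alpha + alpha^2 / (1 - alpha)   # equals 1, confirming the quadratic is satisfied
1 - alpha                           # 0.618..., the golden mean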
How do we use the value α?

Start with an interval [x_1, x_3] thought to contain the min.
Select the interior points x_0 = x_1 + α(x_3 - x_1) and x_2 = x_3 - α(x_3 - x_1).
Evaluate f(x_0) and f(x_2).
If f(x_0) < f(x_2), the new interval is [x_1, x_2] and the next point selected is x_1 + α(x_2 - x_1).
If f(x_0) > f(x_2), the new interval is [x_0, x_3] and the next point selected is x_3 - α(x_3 - x_0).
Iterate.
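Putting these steps together, a minimal R sketch of the golden section search might look like this (the function name golden and the test problem are illustrative):

# Golden section search sketch, following the update rule above
golden <- function(f, x1, x3, tol = 1e-8) {
  alpha <- (3 - sqrt(5)) / 2
  x0 <- x1 + alpha * (x3 - x1)
  x2 <- x3 - alpha * (x3 - x1)
  while (x3 - x1 > tol) {
    if (f(x0) < f(x2)) {            # min bracketed in [x1, x2]
      x3 <- x2; x2 <- x0
      x0 <- x1 + alpha * (x3 - x1)
    } else {                        # min bracketed in [x0, x3]
      x1 <- x0; x0 <- x2
      x2 <- x3 - alpha * (x3 - x1)
    }
  }
  (x1 + x3) / 2
}

# Illustrative example: minimum of (x - 2)^2 + 1 on [0, 5]
golden(function(x) (x - 2)^2 + 1, 0, 5)   # approx 2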
Brent's Method

Works in a manner analogous to Brent's method for root-finding: local quadratic interpolation, with a safety net in case new points fall outside of the bracket.
Too complicated to describe here (a lot of housekeeping computations), although you can find out more in NR.
This is the method used by R's optimize function.
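For example, optimize can be applied to an illustrative one-dimensional problem:

# optimize() uses Brent's method; the function and interval here are illustrative
optimize(function(x) (x - 2)^2 + 1, interval = c(0, 5))
# $minimum approx 2, $objective approx 1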
Solving Several Nonlinear Equations

The problem is to find solutions for a system of the form
f_1(x_1, x_2, ..., x_p) = 0
f_2(x_1, x_2, ..., x_p) = 0
...
f_p(x_1, x_2, ..., x_p) = 0
Options

Multivariate Newton's method, or Newton-Raphson (NR).
Modified NR: line searches and backtracking.
Multivariate secant method: Broyden's method.
Similar trade-offs apply as we discussed with one equation, in terms of convergence and knowledge of the Jacobian.
Why is finding several roots such a problem?

There are no good, general methods for solving systems of more than one nonlinear equation (per NR).
Often, the functions f_1, f_2, ..., f_p have nothing to do with each other.
Finding solutions of f means identifying where the p zero contours (the (p - 1)-dimensional zero hypersurfaces) simultaneously intersect. These can be difficult to home in on without some insight into how the p functions relate to one another.
See the example on the following slide, with p = 2.
Reproduced from Numerical Recipes:
Developing a multivariate linear approximation:

Let f denote the entire vector of p functions, and let x = (x_1, ..., x_p) denote the entire vector of values x_i, for i = 1, ..., p.
Taylor series expansion of f_i in a neighborhood of x:
f_i(x + δ) = f_i(x) + Σ_{j=1}^{p} (∂f_i/∂x_j) δ_j + O(δ^2).
Note that the partial derivatives in this equation arise from the Jacobian matrix J of f. So in matrix notation we have:
f(x + δ) = f(x) + Jδ + O(δ^2).
Newton-Raphson

From the expansion on the previous slide, neglecting terms of order δ^2 and higher, and setting f(x + δ) = 0, we obtain a set of linear equations for the corrections δ that move each function simultaneously closer to zero:
Jδ = -f(x),
which can be solved using LU decomposition.
This gives us an iterative approach for correcting and updating a solution:
x_new = x_old + δ,
which we iterate to convergence (i.e., until either the 1-norm or the ∞-norm of δ is close to zero).
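In R, one such correction step is a single linear solve; here fx and J are assumed to hold the vector f(x_old) and the Jacobian at x_old (solve uses an LU-based factorization):

delta <- solve(J, -fx)      # solve J delta = -f(x_old) via LU decomposition
x_new <- x_old + delta      # Newton-Raphson update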
Evaluating the Jacobian

As we often cannot easily evaluate the Jacobian analytically, a conventional option is numerical differentiation. Numerical evaluation of the Jacobian relies on finite differences. The approximate value of the (i, j)th element of J is given by:
J_ij ≈ [f_i(x + h_j e_j) - f_i(x)] / h_j,
where h_j is some very small number and e_j represents a vector with 1 in the jth position and zeroes everywhere else.
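A minimal R sketch of this finite-difference approximation; the name num_jacobian and the common step size for all coordinates are illustrative simplifications:

# Finite-difference approximation to the Jacobian of a vector-valued f at x
num_jacobian <- function(f, x, h = 1e-7) {
  p <- length(x)
  fx <- f(x)
  J <- matrix(0, nrow = length(fx), ncol = p)
  for (j in seq_len(p)) {
    e_j <- numeric(p); e_j[j] <- 1          # unit vector in the jth direction
    J[, j] <- (f(x + h * e_j) - fx) / h
  }
  J
}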
Modified Newton-Raphson

Note that a full Newton step can be represented as δ = -J^{-1} f(x). When we are not close enough to the solution, this step is not guaranteed to decrease the values of the functions.
How do we know if we should take the full step? One strategy is to require that the step decrease the inner product f·f, which is the same requirement as trying to minimize F = (f·f)/2.
Another is to note that the Newton step is a descent direction for F:
∇F · δ = (f·J)(-J^{-1} f) = -f·f < 0.
Strategy: Modified Newton-Raphson (continued)

i. Define p = δ, and a Newton iteration as x_new = x_old + λp, where a full Newton step specifies λ = 1.
ii. If F is reduced, then go to the next iteration.
iii. If F is not reduced, then backtrack, selecting some λ < 1.
The value of λ for a conventional backtrack is selected to ensure that the average rate of decrease of F is at least some fraction of the initial rate of decrease, and that the rate of decrease of F at the new point is some fraction of the rate at the old point.
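A rough R sketch of this idea, using simple step-halving rather than the more careful polynomial backtrack described in NR; it reuses the illustrative num_jacobian from the earlier sketch:

# Modified Newton-Raphson with a crude backtracking line search
modified_newton <- function(f, x, tol = 1e-8, max_iter = 100) {
  F_val <- function(x) 0.5 * sum(f(x)^2)     # F = (f . f)/2
  for (i in seq_len(max_iter)) {
    fx <- f(x)
    J  <- num_jacobian(f, x)                 # finite-difference Jacobian
    p  <- solve(J, -fx)                      # full Newton direction
    lambda <- 1
    while (F_val(x + lambda * p) > F_val(x) && lambda > 1e-4) {
      lambda <- lambda / 2                   # backtrack if F is not reduced
    }
    x <- x + lambda * p
    if (max(abs(lambda * p)) < tol) break    # infinity-norm convergence check
  }
  x
}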
Multidimensional Optimization

The problem: find a minimum for the function f(x_1, ..., x_p).
Note that in many statistical applications the functions we wish to optimize (e.g., negative log-likelihoods) are convex, and hence fairly well behaved.
Also, in terms of the various approaches, the options involve trade-offs between rate of convergence and information required about the gradient and Hessian. The latter two can often be numerically evaluated.
Strategies

1. Newton-Raphson applied to the gradient.
2. Nelder-Mead Simplex Method (no gradient required).
3. Powell's Method.
4. Conjugate Gradient Methods.
5. Variable Metric Methods.
Nelder-Mead Simplex Approach

A simplex is a figure with p + 1 vertices in p dimensions: a triangle in two dimensions, or a tetrahedron in three dimensions.
Start with a set of p + 1 points that define a finite simplex (i.e., one having finite volume).
The simplex method then takes a series of reflective steps, moving the highest point (where f is largest) through the opposite face of the simplex to a lower point.
Steps are designed to preserve the volume, but the simplex may expand (lengthen) where feasible to facilitate convergence. When the simplex reaches a valley floor, it takes contractive steps.
The NR implementation descriptively refers to this routine as "amoeba".
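For example, Nelder-Mead is the default method in R's optim; the Rosenbrock test function below is an illustrative choice, not from the slides:

# Rosenbrock function, a standard two-dimensional test problem
rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# Nelder-Mead is optim()'s default method; no gradient is needed
optim(par = c(-1.2, 1), fn = rosenbrock, method = "Nelder-Mead")$par
# approx c(1, 1)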
Possible simplex moves:
Powell's Method (aka Direction Set Methods)

We know how to minimize a nonlinear function of a single variable. Given a one-dimensional approach, a direction set method proceeds as follows:
Start at a point x_0 = (x_1, ..., x_p).
Consider a set of vector directions n_1, n_2, ..., n_p (e.g., these might arise from the gradient of f).
In the direction n_1, find the scalar λ that minimizes f(x_0 + λ n_1) using a one-dimensional method. Replace x_0 with x_0 + λ n_1.
Iterate through n_2, ..., n_p, and continue iterating until convergence.
Note that you can use whatever one-dimensional optimization routine you want (say, Brent's or the golden section search).
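A rough R sketch of one sweep of line minimizations, using optimize as the one-dimensional routine and the coordinate axes as a fixed direction set; Powell's method would additionally update the directions, which is not shown here, and the search interval for λ is an arbitrary choice:

# One cycle of line minimizations along a fixed direction set (coordinate axes here)
direction_set_sweep <- function(f, x, directions = diag(length(x))) {
  for (j in seq_len(ncol(directions))) {
    n_j <- directions[, j]
    # one-dimensional minimization of f(x + lambda * n_j) over lambda
    lambda <- optimize(function(l) f(x + l * n_j), interval = c(-10, 10))$minimum
    x <- x + lambda * n_j
  }
  x
}

# Repeated sweeps on the illustrative Rosenbrock function defined earlier
x <- c(-1.2, 1)
for (k in 1:50) x <- direction_set_sweep(rosenbrock, x)
x   # moves toward c(1, 1), though slowly -- one reason Powell's method updates the directions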
Conjugate Gradient Methods

If you can compute the gradient, it turns out that you can enjoy substantial computational savings over a direction set method.
The idea is to choose directions based on the gradient, but it turns out that the path of steepest descent (i.e., given a current guess x_i for the minimum, the negative gradient evaluated at x_i) is not a good direction. See the figure on the following slide.
Instead, a set of conjugate directions is derived so that we do not just proceed down the new gradient, but in a direction that is conjugate to the old gradient and to all previous directions traversed.
Note: given the symmetric Hessian H, two vectors n_i and n_j are said to be conjugate if n_i' H n_j = 0.
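For example, optim provides a conjugate gradient option via method = "CG"; the analytical gradient below belongs to the illustrative Rosenbrock function defined earlier, and supplying it avoids numerical differencing:

# Analytical gradient of the illustrative Rosenbrock function
rosen_grad <- function(x) c(
  -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
   200 * (x[2] - x[1]^2)
)

# Conjugate gradient method; CG can need many iterations on this badly scaled problem
optim(par = c(-1.2, 1), fn = rosenbrock, gr = rosen_grad, method = "CG",
      control = list(maxit = 1000))$par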
Problems with Steepest Descent

(a) In a long, narrow valley, steepest descent takes many steps to reach the valley floor.
(b) For a single magnified step, the direction begins perpendicular to the contours, but winds up parallel to the local contours where the minimum along the step is reached.
Quasi-Newton Methods

Similar to conjugate gradient methods, in the sense that we are accumulating information from p successive line minimizations, using gradient information to find the minimum of a quadratic form.
Quasi-Newton methods can be thought of as a means of applying Newton-Raphson to the gradient, without the need for the Hessian. Using N-R with the gradient, given a current guess x_i the next guess is given by:
x_{i+1} = x_i - H^{-1} ∇f(x_i).
Note that with quasi-Newton, we start out with a positive-definite matrix used as an approximation to the Hessian. Successive iterations update this approximation, which converges to the actual Hessian.
The most common implementations of this approach are the so-called Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms.
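For example, optim's method = "BFGS" implements a quasi-Newton update; this call reuses the illustrative Rosenbrock function and gradient from the earlier sketches:

# BFGS quasi-Newton method: builds up an approximation to the Hessian from gradients
optim(par = c(-1.2, 1), fn = rosenbrock, gr = rosen_grad, method = "BFGS")$par
# approx c(1, 1)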
Newton-Raphson in R:
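As one illustration (not the slide's own code), a complete Newton-Raphson solver can be assembled from the numerical Jacobian and linear solve sketched earlier; the two-equation test system here is made up:

# A complete Newton-Raphson iteration, combining num_jacobian() and solve()
newton_raphson <- function(f, x0, tol = 1e-10, max_iter = 100) {
  x <- x0
  for (i in seq_len(max_iter)) {
    fx <- f(x)
    delta <- solve(num_jacobian(f, x), -fx)   # correction step
    x <- x + delta
    if (max(abs(delta)) < tol) break          # infinity-norm convergence check
  }
  x
}

# Illustrative system: x^2 + y^2 = 2 and x - y = 0, with solution (1, 1)
f_sys <- function(x) c(x[1]^2 + x[2]^2 - 2, x[1] - x[2])
newton_raphson(f_sys, c(2, 0.5))   # converges to approximately c(1, 1)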
Simplex and Quasi-Newton Methods in R: