Maria Cameron

1. Trust Region Methods

At every iteration, trust region methods generate a model $m_k(p)$, choose a trust region, and solve the constrained optimization problem of minimizing $m_k(p)$ within the trust region. Typically the trust region is chosen to be a ball of radius $\Delta_k$ around $x_k$ that is updated every iteration. For poorly scaled problems, ellipsoidal trust regions can be chosen. The model $m_k(p)$ is typically quadratic and given by
$$m_k(p) = f_k + \nabla f_k^T p + \tfrac{1}{2} p^T B_k p,$$
where $f_k := f(x_k)$, $\nabla f_k := \nabla f(x_k)$, and $B_k$ is some symmetric matrix. When $B_k = \nabla^2 f_k$, we have a trust region Newton method. In the rest of this section, we will discuss the outline of the trust region algorithm and its convergence, and exact and approximate techniques for solving the constrained optimization problem
(1) $\quad \min_p \; m_k(p) = f_k + \nabla f_k^T p + \tfrac{1}{2} p^T B_k p, \qquad \|p\| \le \Delta_k.$

1.1. Outline of the algorithm and convergence. The agreement between the model $m_k$ and the objective function within the trust region is quantified by the ratio
(2) $\quad \rho_k := \dfrac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$
The numerator is called the actual reduction, and the denominator is called the predicted reduction. The predicted reduction is always nonnegative. If $\rho_k$ is close to 1, the model is quite accurate, and the trust region can be increased. If $\rho_k \le 0$, the model makes a poor prediction; then the trust region needs to be decreased and the step needs to be rejected. The algorithm implementing these ideas is given below.

Algorithm Trust Region
Input: $\Delta_{\max} > 0$, $\Delta_0 \in (0, \Delta_{\max}]$, $\eta \in [0, \tfrac14)$.
for $k = 0, 1, 2, \ldots$
    Obtain $p_k$ by solving Eq. (1) exactly or approximately;
    Calculate $\rho_k$ from Eq. (2);
    if $\rho_k < \tfrac14$, set $\Delta_{k+1} = \tfrac14 \|p_k\|$;
    else if $\rho_k > \tfrac34$ and $\|p_k\| = \Delta_k$, set $\Delta_{k+1} = \min\{2\Delta_k, \Delta_{\max}\}$;
    else set $\Delta_{k+1} = \Delta_k$;
    if $\rho_k > \eta$, accept the step: $x_{k+1} = x_k + p_k$;
    else reject the step: $x_{k+1} = x_k$;
end

The convergence properties of this algorithm depend on the parameter $\eta$ and on whether some sufficient decrease is achieved at every iteration.
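The trust region loop above can be sketched in NumPy. This is a minimal illustration, not code from the text: the function name `trust_region` and its defaults are my own, and the subproblem (1) is solved approximately with the Cauchy point (discussed in Section 1.4.1), which is the simplest choice that guarantees sufficient decrease.

```python
import numpy as np

def trust_region(f, grad, hess, x0, delta_max=2.0, delta0=1.0, eta=0.15,
                 tol=1e-6, max_iter=1000):
    """Basic trust region loop following the algorithm Trust Region.

    The subproblem (1) is solved approximately with the Cauchy point;
    the radius update and accept/reject test follow the text exactly.
    """
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        gn = np.linalg.norm(g)
        if gn < tol:
            break
        # Cauchy point: minimizer of the model along -g within the ball
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(gn**3 / (delta * gBg), 1.0)
        p = -tau * (delta / gn) * g
        # ratio of actual to predicted reduction, Eq. (2)
        pred = -(g @ p + 0.5 * p @ B @ p)
        rho = (f(x) - f(x + p)) / pred
        if rho < 0.25:
            delta = 0.25 * np.linalg.norm(p)           # shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_max)          # expand the region
        if rho > eta:
            x = x + p                                  # accept the step
    return x
```

On a convex quadratic with the exact Hessian, the model is exact, so every step is accepted and the radius grows to $\Delta_{\max}$.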
The sufficient decrease condition is given by the inequality
(3) $\quad m_k(0) - m_k(p_k) \ge c_1 \|\nabla f_k\| \min\left\{\Delta_k, \dfrac{\|\nabla f_k\|}{\|B_k\|}\right\}, \qquad c_1 \in (0, 1].$

Theorem 1. Suppose $\|B_k\| \le \beta$ for some constant $\beta$, and $f$ is continuously differentiable and bounded from below on the set $\{x \mid f(x) \le f(x_0)\}$. Suppose that the approximate solutions of Eq. (1) satisfy condition (3). Then
(1) if $\eta = 0$ in the algorithm Trust Region, then
$\liminf_{k\to\infty} \|\nabla f_k\| := \lim_{k\to\infty}\left(\inf_{m > k} \|\nabla f_m\|\right) = 0,$
i.e., one can extract a subsequence from $\{\|\nabla f_k\|\}$ converging to zero;
(2) if $\eta \in (0, \tfrac14)$ in the algorithm Trust Region and $f$ is, in addition, Lipschitz continuously differentiable on the set $\{x \mid f(x) \le f(x_0)\}$, then $\lim_{k\to\infty} \|\nabla f_k\| = 0$.

1.2. Characterization of the exact solution of the trust region problem.

Theorem 2. The vector $p^*$ is a global solution of the trust-region problem
(4) $\quad \min_{p \in \mathbb{R}^n} m(p) = f + g^T p + \tfrac12 p^T B p, \qquad \|p\| \le \Delta,$
if and only if there is a scalar $\lambda \ge 0$ such that the following conditions are satisfied:
(5) $\quad (B + \lambda I)p^* = -g,$
(6) $\quad \lambda(\Delta - \|p^*\|) = 0,$
(7) $\quad (B + \lambda I)$ is positive semidefinite.

Condition (6) shows that at least one of the following holds: $\lambda = 0$ or $\|p^*\| = \Delta$. This means that either $p^*$ is a global minimizer of $m(p)$ or, if not, $\|p^*\| = \Delta$, i.e., if $p^*$ is not a global minimizer of $m(p)$, the constrained minimum is achieved on the boundary of the region. Condition (5) implies that if $\lambda > 0$, then $\lambda p^* = -Bp^* - g = -\nabla m(p^*)$, i.e., $p^*$ is orthogonal to the level sets of $m(p)$. Condition (7) tells us that if $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ are the eigenvalues of $B$, then $\lambda \in [-\lambda_1, \infty)$.

The proof of this theorem relies on the following lemma.

Lemma 1. Let $m$ be the quadratic function defined by
$$m(p) = g^T p + \tfrac12 p^T B p,$$
where $B$ is any symmetric matrix. Then
(1) $m$ attains a minimum if and only if $B$ is positive semidefinite and $g$ is in the range of $B$;
(2) $m$ has a unique minimizer if and only if $B$ is positive definite;
(3) if $B$ is positive semidefinite, then every $p$ satisfying $Bp = -g$ is a global minimizer of $m$.

Note that if $g$ is not in the range of $B$, then $m(p)$ does not attain a minimum. For example, let $m(x, y) = \tfrac12 x^2 + y$. Here
$B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, while $g = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is not in the range of $B$. Obviously, $\inf m(x, y) = -\infty$.

Proof. (1) ($\Leftarrow$): Since $g$ is in the range of $B$, one can find $p$ such that $Bp = -g$. Then for all $w \in \mathbb{R}^n$ we have
$m(p + w) = g^T (p + w) + \tfrac12 (p + w)^T B (p + w)$
$= \left(g^T p + \tfrac12 p^T B p\right) + g^T w + (Bp)^T w + \tfrac12 w^T B w$
$= m(p) + \tfrac12 w^T B w \ge m(p),$
since $(Bp)^T w = -g^T w$ and $B$ is positive semidefinite.
($\Rightarrow$): Let $p$ be a minimizer of $m$. Since $\nabla m(p) = Bp + g = 0$, $g$ is in the range of $B$. Also, $\nabla^2 m(p) = B$ is positive semidefinite.
(2) ($\Leftarrow$): Since $B$ is positive definite and hence invertible, one can find $p$ such that $Bp = -g$. Repeating the calculation from the previous item and taking into account that $\tfrac12 w^T B w > 0$ for all nonzero $w$, we obtain that the minimizer is unique.
($\Rightarrow$): Let $p$ be a minimizer of $m$. From the proof of the previous item, $B$ must be positive semidefinite. If $B$ is not positive definite, one can find $w \ne 0$ such that $Bw = 0$. Then $m(p) = m(p + w)$; hence the minimizer is not unique, a contradiction.
(3) The proof of the last item follows from the proof of the first item. $\square$

Now we will prove Theorem 2.

Proof. ($\Leftarrow$): Suppose there is $\lambda \ge 0$ such that Eqs. (5)-(7) are satisfied. Lemma 1 (3) implies that $p^*$ is a global minimizer of the quadratic function
$$\hat m(p) = g^T p + \tfrac12 p^T (B + \lambda I) p = m(p) + \frac{\lambda}{2} p^T p.$$
Since $\hat m(p) \ge \hat m(p^*)$, we have
$$m(p) \ge m(p^*) + \frac{\lambda}{2}\left(p^{*T} p^* - p^T p\right).$$
Since $\lambda(\Delta - \|p^*\|) = 0$ and therefore $\lambda(\Delta^2 - p^{*T} p^*) = 0$, we have
$$m(p) \ge m(p^*) + \frac{\lambda}{2}\left(\Delta^2 - p^T p\right).$$
Since $\lambda \ge 0$ and $\Delta^2 - p^T p \ge 0$ for all $p$ with $\|p\| \le \Delta$, we have $m(p) \ge m(p^*)$ for all such $p$. Therefore, $p^*$ is a global solution of Eq. (4).

($\Rightarrow$): Suppose $p^*$ is a global solution of Eq. (4). First consider the case $\|p^*\| < \Delta$. Then $p^*$ is an unconstrained minimizer of $m(p)$. Hence
$$\nabla m(p^*) = g + Bp^* = 0, \qquad \nabla^2 m(p^*) = B \text{ is positive semidefinite.}$$
Hence Eqs. (5)-(7) hold for $\lambda = 0$.

Now assume that $\|p^*\| = \Delta$. Then Eq. (6) is satisfied. Since $p^*$ is the minimizer of $m$ subject to the constraint $\|p\| = \Delta$, the Lagrangian function
$$L(p, \lambda) = m(p) + \frac{\lambda}{2}\left(p^T p - \Delta^2\right)$$
has a stationary point at $p^*$ satisfying
$$\nabla_p L(p^*, \lambda) = Bp^* + g + \lambda p^* = (B + \lambda I)p^* + g = 0.$$
Hence Eq. (5) holds. Since $m(p) \ge m(p^*)$ for all $p$ such that $\|p\| = \Delta$, we have
$$m(p) \ge m(p^*) + \frac{\lambda}{2}\left(p^{*T} p^* - p^T p\right).$$
Substituting the expression $g = -(B + \lambda I)p^*$ into the last inequality, we get
$$\tfrac12 (p - p^*)^T (B + \lambda I)(p - p^*) \ge 0.$$
Since the set of directions
$$\left\{ w : w = \pm \frac{p - p^*}{\|p - p^*\|}, \; \|p\| = \Delta \right\}$$
is dense in the unit sphere, we conclude that $(B + \lambda I)$ is positive semidefinite.

It remains to show that $\lambda \ge 0$. Since we have proven that $(B + \lambda I)p^* = -g$ and $B + \lambda I$ is positive semidefinite, $p^*$ is a minimizer of
$$\hat m(p) = g^T p + \tfrac12 p^T (B + \lambda I) p.$$
Hence $\hat m(p) \ge \hat m(p^*)$, i.e.,
$$m(p) \ge m(p^*) + \frac{\lambda}{2}\left(p^{*T} p^* - p^T p\right).$$
Now suppose that only some negative $\lambda$ satisfies Eqs. (5)-(7). Then, since $\lambda < 0$, the last inequality implies that $m(p) \ge m(p^*)$ whenever $\|p\| \ge \|p^*\| = \Delta$. Since $p^*$ also minimizes $m$ over all $p$ with $\|p\| \le \Delta$, we conclude that $p^*$ is an unconstrained global minimizer of $m$. From Lemma 1 (1) it then follows that $Bp^* = -g$ and $B$ is positive semidefinite. Hence Eqs. (5)-(7) are satisfied by $\lambda = 0$. This contradicts the assumption that every $\lambda$ satisfying these conditions is negative. Thus, there exists $\lambda \ge 0$ satisfying Eqs. (5)-(7). $\square$
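The conditions (5)-(7) of Theorem 2 are easy to check numerically. The following sketch (my own illustration, not from the text) builds a small indefinite example where the optimal $\lambda$ is known by construction, verifies the three conditions, and spot-checks that $p^*$ is indeed a global solution by sampling feasible points.

```python
import numpy as np

# Small indefinite example: the numbers are chosen by hand so that the
# optimal lambda of Theorem 2 is known exactly.
B = np.diag([-1.0, 3.0])        # indefinite, so the solution lies on the boundary
delta = 1.0
lam = 2.0                       # lam >= -lambda_1 = 1, so Eq. (7) will hold
p_star = np.array([1.0, 0.0])   # ||p_star|| = delta, so Eq. (6) holds
g = -(B + lam * np.eye(2)) @ p_star   # choose g so that Eq. (5) holds

m = lambda p: g @ p + 0.5 * p @ B @ p

# Check conditions (5), (6), (7):
assert np.allclose((B + lam * np.eye(2)) @ p_star, -g)
assert np.isclose(lam * (delta - np.linalg.norm(p_star)), 0.0)
assert np.all(np.linalg.eigvalsh(B + lam * np.eye(2)) >= 0)

# Theorem 2 then guarantees p_star is a global solution; spot-check it:
rng = np.random.default_rng(1)
pts = rng.normal(size=(20000, 2))
pts = pts[np.linalg.norm(pts, axis=1) <= delta]   # keep only feasible samples
assert all(m(p) >= m(p_star) - 1e-12 for p in pts)
```

Here $m(p^*) = -\tfrac32$, and no sampled feasible point does better, consistent with the theorem.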
1.3. Calculation of a nearly exact solution. We start solving the trust region problem (4) by checking whether $B$ is positive definite, and if it is, checking whether $p = -B^{-1}g$ satisfies $\|p\| \le \Delta$. If $B$ is positive semidefinite and $g$ is in the range of $B$, one can find the minimum norm solution $p$ of the underdetermined system $Bp = -g$ and check whether $\|p\| \le \Delta$.

Now suppose that either $B$ is not positive semidefinite or the global minimizer of $m$ satisfies $\|p\| > \Delta$. Then we define
$$p(\lambda) := -(B + \lambda I)^{-1} g, \qquad \lambda > \max\{0, -\lambda_1\},$$
where $\lambda_1$ is the smallest eigenvalue of $B$, and look for $\lambda$ such that $\|p(\lambda)\| = \Delta$. Let $B = Q \Lambda Q^T$, where
$$\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}, \quad \lambda_1 \le \ldots \le \lambda_n, \quad Q = [q_1 \ldots q_n], \quad q_j^T q_k = \delta_{jk}.$$
Then
$$p(\lambda) = -Q(\Lambda + \lambda I)^{-1} Q^T g = -\sum_{j=1}^n \frac{q_j^T g}{\lambda_j + \lambda}\, q_j.$$
Then
(8) $\quad \|p(\lambda)\|^2 = \displaystyle\sum_{j=1}^n \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^2} = \Delta^2.$
Therefore, the problem of solving (4) is reduced to the 1D root-finding problem (8). Note that if $B$ is positive definite and $\|B^{-1}g\| > \Delta$, then there is exactly one solution $\lambda$ of Eq. (8) on the interval $[0, \infty)$, since $\|p(\lambda)\|$ decreases monotonically with $\lim_{\lambda \to \infty} \|p(\lambda)\| = 0$. Read [1] for details.

1.4. Approximate solution of the trust region problem. Three approaches for the approximate solution of the trust region problem (4) are considered in [1]: the dogleg approach; the 2D subspace approach, $p \in \mathrm{span}\{g, B^{-1}g\}$; and Steihaug's approach, which is good for large and sparse $B = \nabla^2 f_k$ and is based on the Conjugate Gradient method. We will consider the dogleg approach and the 2D subspace minimization approach. We will start with the concept of the Cauchy point, which is used as a reference: an acceptable approximate solution must reduce the model at least as much as the Cauchy point does.
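Before turning to the approximate methods, here is a sketch of the nearly exact approach of Section 1.3. It is my own illustration: it uses an eigendecomposition plus simple bisection on $\lambda$ for the root-finding problem (8) (the text does not prescribe a particular root finder; [1] uses a safeguarded Newton iteration), and it assumes the "hard case," where $g$ is orthogonal to the eigenvector of $\lambda_1$, does not occur.

```python
import numpy as np

def exact_tr_step(B, g, delta, tol=1e-10):
    """Nearly exact solution of min m(p) s.t. ||p|| <= delta via Eq. (8).

    Bisection on lambda > max(0, -lambda_1) until ||p(lambda)|| = delta.
    Assumes the hard case (q_1^T g = 0 with lambda_1 <= 0) does not occur.
    """
    lam_eig, Q = np.linalg.eigh(B)          # eigenvalues in ascending order
    c = Q.T @ g                             # coefficients q_j^T g
    def norm_p(lam):
        return np.sqrt(np.sum(c**2 / (lam_eig + lam)**2))
    # interior solution: B positive definite and ||B^{-1} g|| <= delta
    if lam_eig[0] > 0 and norm_p(0.0) <= delta:
        return -np.linalg.solve(B, g)
    # otherwise bisect: ||p(lambda)|| decreases monotonically to 0
    lo = max(0.0, -lam_eig[0]) + 1e-12
    hi = lo + 1.0
    while norm_p(hi) > delta:               # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if norm_p(mid) > delta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -np.linalg.solve(B + lam * np.eye(len(g)), g)
```

For $B = \mathrm{diag}(1, 2)$, $g = (4, 0)^T$, $\Delta = 1$, Eq. (8) gives $4/(1+\lambda) = 1$, i.e., $\lambda = 3$ and $p = (-1, 0)^T$, which the sketch reproduces.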
1.4.1. Cauchy point. The Cauchy point is the minimizer of
$$m_k(p) = f_k + \nabla f_k^T p + \tfrac12 p^T B_k p, \qquad \|p\| \le \Delta_k,$$
along the steepest descent direction $-\nabla f_k$. It is readily found in explicit form. A vector of length $\Delta_k$ in the steepest descent direction is
$$p_k^s := -\frac{\nabla f_k}{\|\nabla f_k\|} \Delta_k.$$
We will look for the Cauchy point in the form $p_k^c = \tau_k p_k^s$. We need to consider two cases: $\nabla f_k^T B_k \nabla f_k \le 0$ and $\nabla f_k^T B_k \nabla f_k > 0$.

If $\nabla f_k^T B_k \nabla f_k \le 0$, the function
$$M(\tau) := m_k(\tau p_k^s) = f_k - \tau \Delta_k \|\nabla f_k\| + \frac{\tau^2}{2}\, \frac{\nabla f_k^T B_k \nabla f_k}{\|\nabla f_k\|^2}\, \Delta_k^2, \qquad \tau \in [0, 1],$$
decreases monotonically as $\tau$ grows whenever $\nabla f_k \ne 0$. Hence we need to pick the largest admissible $\tau_k$, i.e., $\tau_k = 1$.

If $\nabla f_k^T B_k \nabla f_k > 0$, the global minimum of $M(\tau)$ is achieved at
$$\tau_{\min} = \frac{\|\nabla f_k\|^3}{\Delta_k\, \nabla f_k^T B_k \nabla f_k}.$$
Hence if $\tau_{\min} \le 1$, the global minimum of $M$ is achieved within the interval $[0, 1]$. Otherwise we need to pick the largest admissible $\tau$ toward the minimum, i.e., $\tau_k = 1$.

To summarize, we have found that the Cauchy point is given by
$$p_k^c = -\tau_k \frac{\nabla f_k}{\|\nabla f_k\|} \Delta_k, \qquad \text{where} \qquad
\tau_k = \begin{cases} 1, & \text{if } \nabla f_k^T B_k \nabla f_k \le 0, \\[4pt] \min\left\{ \dfrac{\|\nabla f_k\|^3}{\Delta_k\, \nabla f_k^T B_k \nabla f_k},\ 1 \right\}, & \text{otherwise.} \end{cases}$$

The Cauchy point provides a sufficient reduction of the model to guarantee global convergence. However, by implementing the Cauchy point at every step, we are simply using the steepest descent algorithm with a particular choice of step length. It is well known that steepest descent performs poorly even if the optimal step length is chosen at every iteration. This consideration motivates us to find a better approximate solution of the trust region problem than the Cauchy point.
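The closed-form expression for the Cauchy point translates directly into code. A minimal sketch (the function name `cauchy_point` is my own):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point for m(p) = f + g^T p + (1/2) p^T B p, ||p|| <= delta."""
    gn = np.linalg.norm(g)
    gBg = g @ B @ g
    # tau = 1 under nonpositive curvature along g; otherwise the capped minimizer
    tau = 1.0 if gBg <= 0 else min(gn**3 / (delta * gBg), 1.0)
    return -tau * (delta / gn) * g
```

When the trust region is inactive ($\tau_{\min} < 1$), the step reduces to the exact line minimizer $-(\|g\|^2 / g^T B g)\, g$ along the steepest descent direction.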
1.4.2. The Dogleg Method. The dogleg method is suitable for the case where $B_k$ is positive definite. Its name is motivated by the fact that the solution of the trust region problem is sought along a path consisting of two line segments: from $x_k$ to the unconstrained minimizer of $m_k(p)$ along the steepest descent direction, and from there to the unconstrained minimizer of the quadratic model. We observe that if $\Delta$ is small, the quadratic term in $m_k$ has little influence on the direction of the step: the direction is approximately $-\nabla f_k$; while if $\Delta$ is large, the solution of the trust region problem is the global minimizer of the quadratic model.

The unconstrained minimizer of $m_k$ along the steepest descent direction is given by
$$p^U = -\frac{g^T g}{g^T B_k g}\, g, \qquad g := \nabla f_k.$$
The global minimizer of the quadratic model is given by
$$p^B = -B_k^{-1} g.$$
The dogleg path $p(\tau)$, $\tau \in [0, 2]$, is defined by
$$p(\tau) = \begin{cases} \tau p^U, & 0 \le \tau \le 1, \\ p^U + (\tau - 1)(p^B - p^U), & 1 \le \tau \le 2. \end{cases}$$
The following lemma shows that the dogleg path intersects the trust region boundary at most once, and the intersection point can be computed analytically!

Lemma 2. Let $B_k$ be positive definite. Then
(1) $\|p(\tau)\|$ is an increasing function of $\tau$;
(2) $m(p(\tau))$ is a decreasing function of $\tau$.

The proof can be found in [1]. The solution is calculated as follows. If $\|p^B\| \le \Delta_k$, then $p = p^B$. If $\|p^B\| > \Delta_k$ while $\|p^U\| < \Delta_k$, we find $\tau$ by solving the quadratic equation
$$\|p^U + (\tau - 1)(p^B - p^U)\|^2 = \Delta_k^2.$$
If $\|p^U\| \ge \Delta_k$, we set
$$p = \frac{\Delta_k}{\|p^U\|}\, p^U.$$

1.4.3. Two-dimensional subspace minimization. This approach is an extension of the dogleg approach. Suppose $B$ is positive definite. Then we solve the following constrained minimization problem:
(9) $\quad \min_p m(p) = f + g^T p + \tfrac12 p^T B p, \qquad \|p\| \le \Delta, \qquad p \in \mathrm{span}\{g, B^{-1}g\}.$
If $B$ has negative eigenvalues, we look for $p$ in another subspace, defined by
$$p \in \mathrm{span}\{g, (B + \alpha I)^{-1} g\}, \qquad \alpha \in (-\lambda_1, -2\lambda_1],$$
where $\lambda_1$ is the most negative eigenvalue of $B$. If $B$ has zero eigenvalues but no negative eigenvalues, we use the Cauchy point as the approximate solution.
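The dogleg step of Section 1.4.2 can be sketched as follows (my own illustration; `dogleg_step` is an assumed name). The boundary intersection in the second segment is found by solving the quadratic equation from the text for $t = \tau - 1 \in [0, 1]$, keeping the positive root.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximate solution of the trust region subproblem.

    Assumes B is positive definite, as required by the method.
    """
    pU = -(g @ g) / (g @ B @ g) * g          # unconstrained minimizer along -g
    pB = -np.linalg.solve(B, g)              # global minimizer of the model
    if np.linalg.norm(pB) <= delta:
        return pB                            # full Newton-like step fits
    if np.linalg.norm(pU) >= delta:
        return delta * pU / np.linalg.norm(pU)   # truncate the first segment
    # second segment: find t in [0, 1] with ||pU + t (pB - pU)|| = delta
    d = pB - pU
    a, b, c = d @ d, 2 * pU @ d, pU @ pU - delta**2
    t = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)   # positive root
    return pU + t * d
```

By Lemma 2, $\|p(\tau)\|$ is increasing along the path, so this intersection point is unique.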
References

[1] J. Nocedal, S. Wright, Numerical Optimization, Springer, 1999.