Derivative Free Optimization and Average Curvature Information
Trond Steihaug, Lennart Frimannslund
Eighth US-Mexico Workshop on Optimization and its Applications, January 2007

Compass search in R². The current point is shown in black.

Outline
Part 1: Generating set search; reusing old function values.
Part 2 (work in progress): exploiting separability; numerical results; summary.

Search east, compute the function value and check for decrease. The new point is accepted (grey).

GSS and unconstrained optimization
Consider the unconstrained optimization problem
  min f(x), x ∈ R^n,
where only function values are available. One applicable class of methods is generating set search (GSS). These methods were widely studied in the 1950s and 1960s, to a large extent abandoned once gradient-based methods became tractable, and interest revived in the 1990s with the development of convergence theory. We illustrate the work with the method called compass search.

Search north, compute the function value and check for decrease. The new point is not accepted (white).
Search south, compute the function value and check for decrease. The new point is accepted.
Search north; don't step.
Search west, compute the function value and check for decrease. The new point is not accepted.
Search south, step to the new point.
Start a new sweep through the directions in the same order: search east, step to the new point.
If no reduction can be found, decrease the step sizes.
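To make the sweep just described concrete, here is a minimal compass-search sketch in Python with NumPy; the function and helper names are illustrative, not the authors' code. It polls the directions ±e_i in a fixed order, accepts simple decrease, and halves the step when a whole sweep fails:

```python
import numpy as np

def compass_search(f, x, delta=1.0, tol=1e-6, max_sweeps=10_000):
    """Minimal compass search: poll the directions +/- e_i in a fixed order,
    accept simple decrease, and halve the step when a whole sweep fails."""
    n = len(x)
    directions = np.vstack([np.eye(n), -np.eye(n)])  # the 2n compass directions
    fx = f(x)
    for _ in range(max_sweeps):
        improved = False
        for d in directions:                  # one sweep through the directions
            trial = x + delta * d
            f_trial = f(trial)
            if f_trial < fx:                  # decrease found: step to the new point
                x, fx, improved = trial, f_trial, True
        if not improved:                      # no reduction found: decrease the step size
            delta *= 0.5
            if delta < tol:
                break
    return x, fx

# Example: minimize a simple quadratic in R^2 starting from the origin.
x_star, f_star = compass_search(lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2,
                                np.array([0.0, 0.0]))
```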
Points in a rectangle spanned by e_i and e_j can be used in the finite difference formula
  [f(x + δ_i e_i + δ_j e_j) − f(x + δ_i e_i) − f(x + δ_j e_j) + f(x)] / (δ_i δ_j),
and thus provide an approximate Hessian element, or average curvature information.

Suppose we are searching in R² along an orthogonal basis q_1, q_2. The same kind of differences along the search directions fill in a 2×2 matrix C_Q entry by entry,
  (C_Q)_ij = [f(x + δ_i q_i + δ_j q_j) − f(x + δ_i q_i) − f(x + δ_j q_j) + f(x)] / (δ_i δ_j),
with the diagonal entries given by the same formula with i = j. C_Q now contains average curvature information with respect to the search directions Q = [q_1 q_2]. We call C_Q a curvature information matrix.
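The average curvature element is just this mixed second difference evaluated along two directions; a small illustrative sketch (the helper name `curvature_element` is assumed, not from the source):

```python
import numpy as np

def curvature_element(f, x, qi, qj, di, dj):
    """Average curvature of f at x along the directions qi and qj:
    [f(x + di*qi + dj*qj) - f(x + di*qi) - f(x + dj*qj) + f(x)] / (di*dj)."""
    return (f(x + di * qi + dj * qj) - f(x + di * qi)
            - f(x + dj * qj) + f(x)) / (di * dj)

# For a quadratic f(x) = 0.5 x^T H x the element equals qi^T H qj exactly.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
f = lambda x: 0.5 * x @ H @ x
q1, q2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(curvature_element(f, np.zeros(2), q1, q2, 0.1, 0.1))  # ~1.0 = H[0, 1]
```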
Recall that
  (C_Q)_ij = [f(x + δ_i q_i + δ_j q_j) − f(x + δ_i q_i) − f(x + δ_j q_j) + f(x)] / (δ_i δ_j).
If the function is sufficiently smooth then
  (C_Q)_ij = q_i^T ∇²f(x^k) q_j,
where the vector x^k ∈ R^n satisfies x^k = x + τ_i q_i + τ_j q_j with |τ_i| ≤ δ_i, |τ_j| ≤ δ_j. If the function f is quadratic with Hessian matrix C, then (C_Q)_ij = q_i^T C q_j for all i, j, or C_Q = Q^T C Q. In the general case we can always construct the matrix C = Q C_Q Q^T, where Q is the matrix with the search direction vectors as its columns.

Using the Hessian approximation
The eigenvectors of the matrix C turn out to be useful search directions. [Figures: compass search in a narrow valley versus our method in a narrow valley.]

Theorem. If f is in C², and ‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖, then
  ‖∇²f(x̄) − C‖ ≤ n L δ,
where x̄ is in the neighborhood of the points x^k, k = 1, 2, ..., r, and δ is O(max_{i,j} ‖x^i − x^j‖). If f is quadratic, then L = 0 and we recover the exact Hessian. Note that r = n(n+1)/2 (!).

Proof. Let δ be the diameter of N, the smallest ball containing x^k, k = 1, ..., r. Consider the matrix C_Q − Q^T ∇²f(x̄) Q, where x̄ ∈ N. Element (i, j) can be written q_i^T (∇²f(x^k) − ∇²f(x̄)) q_j, so that |q_i^T (∇²f(x^k) − ∇²f(x̄)) q_j| ≤ L δ. Since ‖A‖ ≤ n max_{i,j} |a_ij|, and multiplication by orthogonal matrices does not alter norms, the result follows.

Extensions
The method is able to rotate its orthogonal search directions based on average curvature information. It can rotate once every O(n) while-loop iterations, since it has to compute O(n²) average curvature elements. Now to work in progress: so far we have not made use of any knowledge of the representation of the function, such as partial separability.

Separability for differentiable functions
If a function can be written as a sum of element functions,
  f = Σ_{i=1}^m f_i,  f_i : R^{n_i} → R,
where each element function depends on relatively few (n_i) of the n components of x, then f is partially separable. For instance
  f(x_1, x_2, x_3) = f_1(x_1, x_2) + f_2(x_2, x_3)
is partially separable, and has a tridiagonal Hessian (if it exists).
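A hedged sketch of the basis-rotation step described above: assemble C_Q from mixed differences along the current directions, map it back to C = Q C_Q Q^T, and take the eigenvectors of C as the next orthogonal basis. The helper names are hypothetical, and the element-by-element loop uses fresh evaluations rather than reusing old function values as the actual method does:

```python
import numpy as np

def curvature_matrix(f, x, Q, deltas):
    """Fill (C_Q)[i, j] with the mixed second difference of f along
    columns q_i, q_j of Q, using step lengths deltas[i], deltas[j]."""
    n = Q.shape[1]
    CQ = np.empty((n, n))
    fx = f(x)
    for i in range(n):
        for j in range(n):
            qi, qj, di, dj = Q[:, i], Q[:, j], deltas[i], deltas[j]
            CQ[i, j] = (f(x + di * qi + dj * qj) - f(x + di * qi)
                        - f(x + dj * qj) + fx) / (di * dj)
    return CQ

def rotated_directions(f, x, Q, deltas):
    """Map C_Q back to C = Q C_Q Q^T and return the eigenvectors of C,
    which serve as the new orthogonal search directions."""
    CQ = curvature_matrix(f, x, Q, deltas)
    C = Q @ CQ @ Q.T
    _, eigvecs = np.linalg.eigh(0.5 * (C + C.T))  # symmetrize, then eigendecompose
    return eigvecs

# On a quadratic with Hessian H, C equals H and the new directions are its eigenvectors.
H = np.array([[10.0, 3.0], [3.0, 2.0]])
f = lambda x: 0.5 * x @ H @ x
Q_new = rotated_directions(f, np.zeros(2), np.eye(2), np.array([0.1, 0.1]))
```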
Noisy separable functions
Suppose the element functions f_i(x) are expensive to compute accurately, but can be computed inexactly at a much lower cost, say
  f̃_i = f_i + ɛ_i,  ɛ_i : R^{n_i} → R.
If the function f is partially separable, the computed function f̃ is partially separable but may not be differentiable. Returning to the tridiagonal case:
  f̃(x_1, x_2, x_3) = f_1(x_1, x_2) + ɛ_1(x_1, x_2) + f_2(x_2, x_3) + ɛ_2(x_2, x_3).

Alternative approach: covariation graph
Define a covariation graph G(V, E) with n nodes and no edge between i and j if and only if, for δ_i, δ_j > 0 and for all x,
  [f(x + δ_i e_i + δ_j e_j) − f(x + δ_i e_i) − f(x + δ_j e_j) + f(x)] / (δ_i δ_j) = 0.

Observation. If the covariation graph is not complete, then the function f is partially separable. For (i, j) ∉ E choose δ_i = x_i, δ_j = x_j and observe that f(x) can be written as a sum of three element functions (with n_1 = n − 2, n_2 = n_3 = n − 1).

Covariation graph (2)
We define a covariation graph G(V, E) with n nodes and an edge between i and j if and only if x_i and x_j appear in the same element function, and an edge (a loop) from each i to itself. Let E_i be the set of element functions for which variable x_i appears in the domain. The intersection graph of E_i, i = 1, ..., n, is the covariation graph, i.e. there is an edge (i, j) ∈ E iff E_i ∩ E_j ≠ ∅. This graph has an adjacency matrix, A_G. If F^T = (f_1, ..., f_m) then f = F^T e (e the vector of all ones), and if F is in C¹ then A_G and F′(x)^T F′(x) have the same structure. In the example above, A_G has a tridiagonal structure.

Application to the method
Thus, in the context of our optimization method we can impose the sparsity structure of the adjacency matrix of the covariation graph onto C. Convert the matrix equation C = Q C_Q Q^T into the equivalent formulation
  (Q^T ⊗ Q^T) vec(C) = vec(C_Q).
Here ⊗ denotes the Kronecker product and vec(·) stacks the columns of a matrix in a vector.

Covariation graph (3)
Observation. If (i, j) ∉ E, then for all x and δ_i, δ_j > 0,
  [f(x + δ_i e_i + δ_j e_j) − f(x + δ_i e_i) − f(x + δ_j e_j) + f(x)] / (δ_i δ_j) = 0.
In other words: if f is in C², then the adjacency graph of the Hessian matrix is isomorphic to the covariation graph G(V, E) (or a subgraph of it), so A_G and the Hessian have the same sparsity structure. The elements in C_Q are defined for both differentiable and non-differentiable functions, and many of these differences will be identically zero because of the (partial) separability of the function, regardless of differentiability.

Application to the method (2)
Since we know that many of the elements of C are required to be zero, and by in addition requiring C to be symmetric, we get a reduced equation system
  (Q^T ⊗ Q^T) P_c vec(C) = vec(C_Q),    (1)
where vec(C) contains the r nonzero elements of, say, the lower triangle of C. The coefficient matrix (Q^T ⊗ Q^T) P_c is n(n+1)/2 × r. In order to avoid computing all n(n+1)/2 elements of vec(C_Q), this equation system should be reduced to an r × r system
  A vec(C) = c_γ
by selecting rows from (1).
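The Kronecker/vec identity underlying (1) can be checked numerically; a small NumPy sketch, with vec taken column-wise as defined above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
C = rng.standard_normal((n, n)); C = 0.5 * (C + C.T)   # symmetric C
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))        # orthogonal Q

vec = lambda M: M.flatten(order='F')                    # stack the columns
CQ = Q.T @ C @ Q                                        # so that C = Q C_Q Q^T
lhs = np.kron(Q.T, Q.T) @ vec(C)                        # (Q^T kron Q^T) vec(C)
print(np.allclose(lhs, vec(CQ)))                        # True: the formulations agree
```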
At this point we encounter the following subproblem. Given the overdetermined equation system
  (Q^T ⊗ Q^T) P_c vec(C) = vec(C_Q),
how do we pick r rows from the n(n+1)/2 × r matrix (Q^T ⊗ Q^T) P_c such that the resulting (square) matrix A is invertible, well-conditioned and easy to compute?

We use an initial ordering of the pairs (i, j) in which the rows are chosen based on the magnitude of the components of the search directions q_r, q_s ∈ R^n. This heuristic ordering almost always produces an invertible, and in many cases well-conditioned, matrix.

We construct A by building up the QR factorization of A^T one candidate column at a time, A^T_unfinished = QR: choose a candidate column, update the QR factorization, and keep the column if it is linearly independent of the columns already accepted; if it is linearly dependent, reject the column and down-date the QR factorization. The cost of such a rejection is a down-dating of Q as well as some housekeeping.
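A simplified sketch of this row-selection idea (illustrative only; the authors update and down-date a single QR factorization, whereas this sketch refactorizes each candidate set for clarity):

```python
import numpy as np

def select_rows(B, r, order, tol=1e-10):
    """Pick r rows of the n(n+1)/2-by-r coefficient matrix B, following the
    given candidate order, so that the resulting r-by-r matrix A is invertible.
    A production version would update and down-date one QR factorization of A^T
    instead of refactorizing from scratch."""
    chosen = []
    for idx in order:
        trial = chosen + [idx]
        # Keep the candidate row only if the selected rows remain linearly independent.
        R = np.linalg.qr(B[trial, :].T, mode='r')
        if np.min(np.abs(np.diag(R))) > tol:
            chosen = trial
        if len(chosen) == r:
            break
    return np.array(chosen)
```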
A^T_final = Q_r R_r. This procedure always gives an invertible matrix. The initial ordering (based on the heuristic of looking at q_r^T ⊗ q_s^T, which mimics the coordinate vectors) seldom needs to reject a candidate column of A^T.

Pseudocode
 1: Given f, x, search directions, step lengths and structure
 2: While not converged
 3:   Choose the order of the C_Q elements to be computed in c_γ
 4:   For each search direction q_i
 5:     If f(x + δ_i q_i) + ρ(δ_i) < f(x)
 6:       x ← x + δ_i q_i
 7:     End if
 8:     Compute c_γ element if applicable
 9:   End for
10:   If r elements in c_γ have been computed
11:     Solve for C and update search directions
12:   End if
13:   Update step lengths δ_i
14: End while

Numerical results
[Table: function evaluations needed to reduce the function value below a given tolerance, from the recommended starting point, for the test problems DECONVU, discrete boundary value, extended Rosenbrock, extended Powell singular and TRIDIA, comparing the sparse approach, the regular method and compass search; the numerical entries are not legible in this transcription.] The sparse approach rarely performs worse than compass search.

What do we lose when r < n(n+1)/2?
Theorem. If f ∈ C², and ‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖, then for all x̄ in the neighborhood of the points x^k, k = 1, 2, ..., r,
  ‖∇²f(x̄) − C‖ ≤ n L δ ‖A⁻¹‖,
where δ is O(max_{i,j} ‖x^i − x^j‖). If f is quadratic, then L = 0 and we recover the exact Hessian also in the case when r < n(n+1)/2.

Conclusions and observations
Conclusions:
- Periodically rotating the search directions can significantly reduce the number of function evaluations needed to reach the optimal solution.
- Eigenvectors of matrices with curvature information make good search directions.
- Separability can be exploited also for noisy functions.
- While the full method can rotate once every O(n) while-loop iterations, the new method can, for curvature matrices with O(n) elements, rotate every O(1) iterations. This is useful for functions with a topography that warrants frequent basis rotation. However, numerical testing indicates that rotations should not be done too often (i.e. not every iteration).
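Under stated assumptions (forcing function ρ(δ) = δ², one extra curvature element computed per sweep rather than reused from old values, rotation once all n(n+1)/2 elements are available), a rough Python rendering of the pseudocode above could look like this; it is a sketch, not the authors' implementation:

```python
import numpy as np
from itertools import combinations_with_replacement

def gss_with_rotation(f, x, delta=1.0, tol=1e-6, max_sweeps=5000):
    """Rough rendering of the pseudocode: compass-type sweeps with the
    sufficient-decrease test f(x + d*q) + rho(d) < f(x); one curvature
    element is computed per sweep, and once all n(n+1)/2 elements are
    known the basis is rotated to the eigenvectors of C = Q C_Q Q^T."""
    n = len(x)
    Q = np.eye(n)
    deltas = np.full(n, float(delta))
    rho = lambda d: d * d                         # forcing function (assumed)
    pairs = list(combinations_with_replacement(range(n), 2))
    CQ, filled, next_pair = np.zeros((n, n)), 0, 0
    fx = f(x)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            for sign in (1.0, -1.0):
                trial = x + sign * deltas[i] * Q[:, i]
                ft = f(trial)
                if ft + rho(deltas[i]) < fx:      # sufficient decrease: accept
                    x, fx, improved = trial, ft, True
        i, j = pairs[next_pair]                   # compute one curvature element
        di, dj = deltas[i], deltas[j]
        CQ[i, j] = CQ[j, i] = (f(x + di * Q[:, i] + dj * Q[:, j])
                               - f(x + di * Q[:, i]) - f(x + dj * Q[:, j]) + fx) / (di * dj)
        filled += 1
        next_pair = (next_pair + 1) % len(pairs)
        if filled == len(pairs):                  # all elements known: rotate the basis
            C = Q @ CQ @ Q.T
            _, Q = np.linalg.eigh(0.5 * (C + C.T))
            CQ[:], filled = 0.0, 0
        if not improved:                          # unsuccessful sweep: shrink the steps
            deltas *= 0.5
            if deltas.max() < tol:
                break
    return x, fx
```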
Future work
Improvements:
- Develop adaptive rules for rotating the search directions.
- Develop schemes for the situation when the noise can be controlled.
- Integrate existing work for reducing the number of unsuccessful function evaluations.
- Reduce the number of function evaluations spent on computing the elements of the curvature information matrix.
- Try to solve the matrix row selection subproblem.
- Hybrid method based on trust regions and GSS.

More details
[1] Lennart Frimannslund and Trond Steihaug. A generating set search method using curvature information. To appear in Computational Optimization and Applications, 2007.
[2] Lennart Frimannslund and Trond Steihaug. A new generating set search algorithm for separable functions. Submitted to the SIAM Journal on Optimization.