Nonlinear Programming: Concepts and Algorithms for Process Optimization


Nonlinear Programming: Concepts and Algorithms for Process Optimization
L. T. Biegler, Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, PA

Nonlinear Programming and Process Optimization
- Introduction
- Unconstrained Optimization Algorithms: Newton Methods; Quasi-Newton Methods
- Constrained Optimization: Karush-Kuhn-Tucker Conditions; Reduced Gradient Methods (GRG, CONOPT, MINOS); Successive Quadratic Programming (SQP); Interior Point Methods (IPOPT)
- Process Optimization: Black Box Optimization; Modular Flowsheet Optimization - Infeasible Path; The Role of Exact Derivatives
- Large-Scale Nonlinear Programming: rSQP - Process Optimization; IPOPT - Blending and Data Reconciliation
- Summary and Conclusions

Introduction
Optimization: given a system or process, find the best solution to this process within constraints. Objective function: an indicator of the "goodness" of a solution, e.g., cost, yield, profit, etc. Decision variables: variables that influence process behavior and can be adjusted for optimization. In many cases this task is done by trial and error (through case study). Here we are interested in a systematic approach to this task, and in making it as efficient as possible. Some related areas: mathematical programming, operations research. There are currently over 30 journals devoted to optimization, with new papers appearing every month - a fast moving field!

Optimization Viewpoints
Mathematician - characterization of the theoretical properties of optimization: convergence, existence, local convergence rates.
Numerical Analyst - implementation of the optimization method for efficient and "practical" use; concerned with ease of computation, numerical stability, performance.
Engineer - applies the optimization method to real problems; concerned with reliability, robustness, efficiency, diagnosis, and recovery from failure.

Optimization Literature
Engineering: Edgar, T.F., D.M. Himmelblau, and L.S. Lasdon, Optimization of Chemical Processes, McGraw-Hill. Papalambros, P. and D. Wilde, Principles of Optimal Design, Cambridge Press. Reklaitis, G., A. Ravindran, and K. Ragsdell, Engineering Optimization, Wiley. Biegler, L.T., I.E. Grossmann and A. Westerberg, Systematic Methods of Chemical Process Design, Prentice Hall. Biegler, L.T., Nonlinear Programming: Concepts, Algorithms and Applications to Chemical Engineering, SIAM.
Numerical Analysis: Dennis, J.E. and R. Schnabel, Numerical Methods of Unconstrained Optimization, Prentice-Hall (1983), SIAM (1995). Fletcher, R., Practical Methods of Optimization, Wiley. Gill, P.E., W. Murray and M. Wright, Practical Optimization, Academic Press. Nocedal, J. and S. Wright, Numerical Optimization, Springer.

Example: Optimal Vessel Dimensions
What is the optimal L/D ratio for a cylindrical vessel? Constrained problem:
Min  C_T (pi D^2 / 2) + C_S (pi D L) = cost
s.t.  V - pi D^2 L / 4 = 0
Convert to an unconstrained problem by eliminating L = 4V/(pi D^2):
Min  cost(D) = C_T pi D^2 / 2 + 4 V C_S / D
d(cost)/dD = C_T pi D - 4 V C_S / D^2 = 0
=> D = (4 V C_S / (pi C_T))^{1/3},  L = (4V/pi)^{1/3} (C_T/C_S)^{2/3}  ==>  L/D = C_T/C_S
Note: What if L cannot be eliminated explicitly (strange shape)? What if D cannot be extracted from the volume constraint (cost correlation implicit)?
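To make the vessel example concrete, here is a minimal numerical check (Python with NumPy/SciPy; the cost coefficients and required volume are arbitrary illustrative values, not data from the notes). It minimizes the cost after L has been eliminated and confirms that the optimal L/D ratio approaches C_T/C_S.

import numpy as np
from scipy.optimize import minimize_scalar

C_T, C_S, V = 2.0, 1.0, 10.0              # assumed cost coefficients and required volume

def cost(D):
    L = 4.0 * V / (np.pi * D**2)          # eliminate L with the volume constraint
    return C_T * np.pi * D**2 / 2.0 + C_S * np.pi * D * L

res = minimize_scalar(cost, bounds=(0.1, 10.0), method='bounded')
D_opt = res.x
L_opt = 4.0 * V / (np.pi * D_opt**2)
print(D_opt, L_opt, L_opt / D_opt)        # L/D approaches C_T / C_S = 2

When L or D cannot be eliminated explicitly, the constraint must be kept and handled by one of the constrained methods discussed later.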

Unconstrained Multivariable Optimization
Problem: Min f(x) (n variables). Equivalent to: Max -f(x), x in R^n.
Nonsmooth functions: direct search methods; statistical/random methods.
Smooth functions: 1st order methods; Newton-type methods; conjugate gradients.

Two Dimensional Contours of F(x)
[Contour sketches: convex function; nonconvex function; multimodal, nonconvex; discontinuous; nondifferentiable (convex).]

Local vs. Global Solutions
Convexity definitions: a set (region) X is convex if and only if alpha*y + (1-alpha)*z is in X for all 0 <= alpha <= 1 and all points y, z in X. f(x) is convex in the domain X if and only if f(alpha*y + (1-alpha)*z) <= alpha*f(y) + (1-alpha)*f(z) for any 0 <= alpha <= 1 and all points y, z in X.
Find a local minimum point x* for f(x) in the feasible region defined by the constraint functions: f(x*) <= f(x) for all x satisfying the constraints in some neighborhood around x* (not for all x in X). A sufficient condition for a local solution of the NLP to be global is that f(x) is convex for x in X. Finding and verifying global solutions will not be considered here; it requires a more expensive search (e.g. spatial branch and bound).

Linear Algebra - Background
Some definitions: Scalars - Greek letters alpha, beta, gamma. Vectors - Roman letters, lower case. Matrices - Roman letters, upper case.
Matrix multiplication: C = A B if A is n x m, B is m x p and C is n x p, with C_ij = Sum_k A_ik B_kj.
Transpose: if A is n x m, interchange rows and columns --> A^T is m x n. Symmetric matrix: A is n x n (square) with A = A^T. Identity matrix I: square matrix with ones on the diagonal and zeros elsewhere.
Determinant ("inverse volume" measure of a square matrix): det(A) = Sum_i (-1)^(i+j) A_ij M_ij for any j, or det(A) = Sum_j (-1)^(i+j) A_ij M_ij for any i, where M_ij is the determinant of the order n-1 matrix with row i and column j removed; det(I) = 1. Singular matrix: det(A) = 0.

Linear Algebra - Background
Gradient vector: grad f(x) = [df/dx_1, df/dx_2, ..., df/dx_n]^T.
Hessian matrix (symmetric): the (i, j) element of the Hessian of f(x) is d^2 f / dx_i dx_j; note d^2 f / dx_i dx_j = d^2 f / dx_j dx_i.

Linear Algebra - Background
Some identities for the determinant: det(A B) = det(A) det(B); det(A) = det(A^T); det(alpha*A) = alpha^n det(A); det(A) = Prod_i lambda_i(A).
Eigenvalues: det(A - lambda*I) = 0; eigenvector: A v = lambda v - the characteristic values and directions of a matrix. For nonsymmetric matrices the eigenvalues can be complex, so we often use the singular values, sigma = lambda(A^T A)^(1/2) >= 0.
Vector norms: ||x||_p = {Sum_i |x_i|^p}^(1/p) (most common are p = 1, p = 2 (Euclidean) and p = infinity (max norm = max_i |x_i|)).
Matrix norms: ||A|| = max ||Ax||/||x|| over x (for p-norms); ||A||_1 = maximum column sum of A, max_j (Sum_i |A_ij|); ||A||_inf = maximum row sum of A, max_i (Sum_j |A_ij|); ||A||_2 = sigma_max(A) (spectral norm); ||A||_F = [Sum_i Sum_j (A_ij)^2]^(1/2) (Frobenius norm); kappa(A) = ||A|| ||A^(-1)|| (condition number) = sigma_max/sigma_min (using the 2-norm).

Linear Algebra - Eigenvalues
Find v and lambda where A v_i = lambda_i v_i, i = 1, ..., n. Note: A v - lambda v = (A - lambda I) v = 0, or det(A - lambda I) = 0. For this relation lambda is an eigenvalue and v is an eigenvector of A. If A is symmetric, all lambda_i are real:
lambda_i > 0, i = 1, ..., n: A is positive definite;
lambda_i < 0, i = 1, ..., n: A is negative definite;
lambda_i = 0 for some i: A is singular.
A quadratic form can be expressed in canonical (eigenvalue/eigenvector) form: A V = V Lambda, where V is the eigenvector matrix (n x n) and Lambda = diag(lambda_i) is the (diagonal) eigenvalue matrix. If A is symmetric, all lambda_i are real and V can be chosen orthonormal (V^(-1) = V^T); thus A = V Lambda V^(-1) = V Lambda V^T. For the quadratic function Q(x) = a^T x + 1/2 x^T A x, define z = V^T x; then Q(Vz) = (a^T V) z + 1/2 z^T (V^T A V) z = (a^T V) z + 1/2 z^T Lambda z. The minimum occurs at (if lambda_i > 0) x = -A^(-1) a, or x = V z = -V (Lambda^(-1) V^T a).

Positive (Negative) Curvature - Positive (Negative) Definite Hessian
Both eigenvalues are strictly positive (negative); A is positive (negative) definite; stationary points are minima (maxima). [Contour figure: the principal axes of the ellipses scale as (lambda_1)^(-1/2) and (lambda_2)^(-1/2) along z_1 and z_2.]
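The canonical form above is easy to verify numerically. The following NumPy sketch (with an assumed symmetric positive definite A and linear term a, not taken from the notes) diagonalizes A, checks positive definiteness, and recovers the minimizer of Q(x) = a^T x + 1/2 x^T A x both through the eigenvector transformation and directly.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # assumed symmetric positive definite matrix
a = np.array([1.0, -1.0])                  # assumed linear term

lam, V = np.linalg.eigh(A)                 # eigenvalues and orthonormal eigenvectors
assert np.all(lam > 0)                     # positive definite -> unique minimizer

x_star = -V @ np.diag(1.0 / lam) @ V.T @ a # x* = -V Lambda^{-1} V^T a
print(x_star, np.allclose(x_star, -np.linalg.solve(A, a)))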

Zero Curvature - Singular Hessian
One eigenvalue is zero, the other is strictly positive or negative; A is positive semidefinite or negative semidefinite; there is a ridge of stationary points (minima or maxima).

Indefinite Curvature - Indefinite Hessian
One eigenvalue is positive, the other is negative; the stationary point is a saddle point; A is indefinite. Note: these can also be viewed as two-dimensional projections of higher dimensional problems.

Eigenvalue Example
Min Q(x) = [1 1] x + 1/2 x^T [[2, 1], [1, 2]] x
A V = V Lambda with A = [[2, 1], [1, 2]], V^T A V = Lambda = [[1, 0], [0, 3]], V = [[1/sqrt(2), 1/sqrt(2)], [-1/sqrt(2), 1/sqrt(2)]].
All eigenvalues are positive. With z = V^T x, i.e. z_1 = (x_1 - x_2)/sqrt(2), z_2 = (x_1 + x_2)/sqrt(2), the minimum occurs at z* = -Lambda^(-1) V^T a = [0, -sqrt(2)/3]^T, so x* = V z* = [-1/3, -1/3]^T.

What conditions characterize an optimal solution?
Unconstrained local minimum, necessary conditions: grad f(x*) = 0, p^T (grad^2 f(x*)) p >= 0 for all p in R^n (positive semi-definite).
Unconstrained local minimum, sufficient conditions: grad f(x*) = 0, p^T (grad^2 f(x*)) p > 0 for all p in R^n (positive definite).
For smooth functions, why are the contours around the optimum elliptical? Taylor series in n dimensions about x*: f(x) = f(x*) + grad f(x*)^T (x - x*) + 1/2 (x - x*)^T grad^2 f(x*) (x - x*) + O(||x - x*||^3). Since grad f(x*) = 0, f(x) is purely quadratic for x close to x*.

Newton's Method
Take the Taylor series for f(x) about x_k, differentiate with respect to x and set the left-hand side to zero:
grad f(x) = grad f(x_k) + grad^2 f(x_k) (x - x_k) + O(||x - x_k||^2) = 0
=> d = (x - x_k) = -(grad^2 f(x_k))^(-1) grad f(x_k)
f(x) is convex (concave) if, for all x in R^n, grad^2 f(x) is positive (negative) semidefinite, i.e. min_j lambda_j >= 0 (max_j lambda_j <= 0). The method can fail if: x_0 is far from the optimum; grad^2 f is singular at any point; f(x) is not smooth. The search direction d requires the solution of a set of linear equations. Near the solution: ||x_(k+1) - x*|| = O(||x_k - x*||^2).

Basic Newton Algorithm - Line Search
0. Guess x_0, evaluate f(x_0).
1. At x_k, evaluate grad f(x_k).
2. Evaluate B_k = grad^2 f(x_k) or an approximation.
3. Solve B_k d = -grad f(x_k). If the convergence error is less than tolerance, e.g. ||grad f(x_k)|| <= eps and ||d|| <= eps, STOP; else go to 4.
4. Find alpha with 0 < alpha <= 1 so that f(x_k + alpha d) < f(x_k) sufficiently (each trial requires an evaluation of f(x)).
5. Set x_(k+1) = x_k + alpha d, k = k + 1. Go to 1.
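The algorithm above can be sketched in a few lines of Python. The backtracking (Armijo) line search plays the role of step 4, and a steepest descent fallback is added when the Newton direction is not a descent direction (an implementation safeguard, not part of the algorithm statement). The Rosenbrock function used to exercise it is an assumed test problem, not one from these notes.

import numpy as np

def newton_line_search(f, grad, hess, x0, tol=1e-8, max_iter=100):
    # Damped Newton method with Armijo backtracking (steps 0-5 above)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(hess(x), -g)       # step 3: B d = -grad f
        if g @ d >= 0:
            d = -g                             # safeguard: fall back to steepest descent
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5                       # step 4: sufficient decrease
        x = x + alpha * d                      # step 5
    return x

f    = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2      # Rosenbrock (assumed example)
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
hess = lambda x: np.array([[2 - 400*x[1] + 1200*x[0]**2, -400*x[0]],
                           [-400*x[0],                    200.0]])
print(newton_line_search(f, grad, hess, [-1.2, 1.0]))           # converges to [1, 1]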

Comparison of Optimization Methods
1. Convergence theory: global convergence - will the method converge to a local optimum (or stationary point) from a poor starting point? Local convergence rate - how fast will it converge close to this point?
2. Benchmarks on a large class of test problems.
Representative problem (Hughes, 1981): Min f(x_1, x_2) = alpha exp(-beta), where alpha and beta are given nonlinear functions of u = x_1 - 0.8 and of v (itself a nonlinear expression in x_2 and u), built from fixed constant vectors a, b, c; the solution is x* = [0.7395, 0.344].

Newton's Method - Convergence Path
Starting point [0.8, 0.2]: needs steepest descent steps with line search up to the point 'O'; takes 7 iterations to reach ||grad f(x*)|| <= 10^(-6). Starting point [0.35, 0.65]: converges in four iterations with full steps to ||grad f(x*)|| <= 10^(-6).

Newton's Method - Notes
The choice of B_k determines the method: steepest descent, B_k = gamma I; Newton, B_k = grad^2 f(x_k). With a suitable B_k, performance may be good enough if f(x_k + alpha d) is sufficiently decreased (instead of minimized along the line search direction). Trust region extensions to Newton's method provide very strong global convergence properties and very reliable algorithms. The local rate of convergence depends on the choice of B_k:
Newton - quadratic rate: lim_(k->inf) ||x_(k+1) - x*|| / ||x_k - x*||^2 = K
Steepest descent - linear rate: lim_(k->inf) ||x_(k+1) - x*|| / ||x_k - x*|| < 1
Desired - superlinear rate: lim_(k->inf) ||x_(k+1) - x*|| / ||x_k - x*|| = 0

Quasi-Newton Methods
Motivation: need B_k to be positive definite; avoid calculation of grad^2 f; avoid the solution of a linear system for d = -(B_k)^(-1) grad f(x_k).
Strategy: define matrix updating formulas that keep B_k symmetric and positive definite and satisfy the secant relation (B_(k+1))(x_(k+1) - x_k) = (grad f_(k+1) - grad f_k).
DFP formula (Davidon, Fletcher, Powell, 1958, 1964), written for B_k and for the inverse H_k = (B_k)^(-1):
B_(k+1) = B_k + [(y - B_k s) y^T + y (y - B_k s)^T] / (s^T y) - [(y - B_k s)^T s] y y^T / (s^T y)^2
H_(k+1) = H_k + s s^T / (s^T y) - H_k y y^T H_k / (y^T H_k y)
where s = x_(k+1) - x_k and y = grad f(x_(k+1)) - grad f(x_k).

Quasi-Newton Methods
BFGS formula (Broyden, Fletcher, Goldfarb, Shanno, 1970-71), written for B_k and for the inverse H_k = (B_k)^(-1):
B_(k+1) = B_k + y y^T / (s^T y) - B_k s s^T B_k / (s^T B_k s)
H_(k+1) = H_k + [(s - H_k y) s^T + s (s - H_k y)^T] / (y^T s) - [(s - H_k y)^T y] s s^T / (y^T s)^2
Notes: 1) Both formulas are derived under similar assumptions and preserve symmetry. 2) Both have superlinear convergence and terminate in n steps on quadratic functions; they are identical if alpha is minimized exactly in the line search. 3) BFGS is more stable and, in general, performs better than DFP. 4) For small to moderate n, these are the best methods for general purpose problems if second derivatives are not available.

Quasi-Newton Method - BFGS Convergence Path
Starting from B_0 = I, the method converges in 9 iterations to ||grad f(x*)|| <= 10^(-6).
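A minimal sketch of the BFGS update of B_k as written above. Skipping the update when the curvature condition s^T y > 0 fails is a common safeguard (an implementation choice, not stated on the slide) that keeps B_k symmetric positive definite.

import numpy as np

def bfgs_update(B, s, y):
    # s = x_{k+1} - x_k,  y = grad f(x_{k+1}) - grad f(x_k)
    sy = s @ y
    if sy <= 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
        return B                          # skip update to preserve positive definiteness
    Bs = B @ s
    return B + np.outer(y, y) / sy - np.outer(Bs, Bs) / (s @ Bs)

Inside a quasi-Newton loop this replaces step 2 of the Newton algorithm: B_(k+1) = bfgs_update(B_k, x_(k+1) - x_k, grad_(k+1) - grad_k), with the direction still obtained from B_k d = -grad f(x_k).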

Constrained Optimization (Nonlinear Programming)
Problem: Min f(x) s.t. g(x) <= 0, h(x) = 0, where f(x) is a scalar objective function, x is an n-vector of variables, g(x) is an m-vector of inequality constraints and h(x) is an meq-vector of equality constraints.
Sufficient conditions for a global optimum: f(x) must be convex and the feasible region must be convex, i.e. the g(x) are all convex and the h(x) are all linear. Except in special cases, there is no guarantee that a local optimum is global if these sufficient conditions are violated.

Example: Minimize Packing Dimensions
What is the smallest box for three round objects? Variables: A, B, (x_1, y_1), (x_2, y_2), (x_3, y_3). Fixed parameters: R_1, R_2, R_3. Objective: minimize perimeter = 2(A + B). Constraints: circles remain in the box and cannot overlap. Decisions: sides of the box, centers of the circles.
In the box: x_i, y_i >= R_i; x_i <= B - R_i; y_i <= A - R_i (i = 1, 2, 3); x_1, x_2, x_3, y_1, y_2, y_3, A, B >= 0.
No overlaps: (x_1 - x_2)^2 + (y_1 - y_2)^2 >= (R_1 + R_2)^2; (x_1 - x_3)^2 + (y_1 - y_3)^2 >= (R_1 + R_3)^2; (x_2 - x_3)^2 + (y_2 - y_3)^2 >= (R_2 + R_3)^2.

Characterization of Constrained Optima
[Contour sketches: linear program; linear program with alternate optima; convex objective with linear constraints (unique minimum); nonconvex feasible region - multiple optima; nonconvex objective - multiple optima.]

What conditions characterize an optimal solution?
Unconstrained local minimum, necessary conditions: grad f(x*) = 0, p^T (grad^2 f(x*)) p >= 0 for all p in R^n (positive semi-definite).
Unconstrained local minimum, sufficient conditions: grad f(x*) = 0, p^T (grad^2 f(x*)) p > 0 for all p in R^n (positive definite).

Optimal solution for an inequality constrained problem
Min f(x) s.t. g(x) <= 0. Analogy: a ball rolling down a valley, pinned by a fence. Note: balance of forces (grad f, grad g).

Optimal solution for the general constrained problem
Problem: Min f(x) s.t. g(x) <= 0, h(x) = 0. Analogy: a ball rolling on a rail, pinned by fences. Balance of forces: grad f, grad g, grad h.

Optimality conditions for a local optimum
Necessary first order Karush-Kuhn-Tucker conditions:
grad L(x*, u, v) = grad f(x*) + grad g(x*) u + grad h(x*) v = 0   (balance of forces)
u >= 0   (inequalities act in only one direction)
g(x*) <= 0, h(x*) = 0   (feasibility)
u_j g_j(x*) = 0   (complementarity: either g_j(x*) = 0 or u_j = 0)
u, v are "weights" for the "forces," known as KKT multipliers, shadow prices or dual variables.
To guarantee that a local NLP solution satisfies the KKT conditions, a constraint qualification is required. E.g., the Linear Independence Constraint Qualification (LICQ) requires the active constraint gradients, [grad g_A(x*) grad h(x*)], to be linearly independent. Also, under LICQ, the KKT multipliers are uniquely determined.
Necessary (sufficient) second order conditions: positive curvature in the "constraint" directions, p^T grad^2 L(x*) p >= 0 (p^T grad^2 L(x*) p > 0), where p are the constrained directions: grad h(x*)^T p = 0, and for active g_i(x*) = 0: grad g_i(x*)^T p = 0 if u_i > 0, grad g_i(x*)^T p <= 0 if u_i = 0.

Single Variable Example of KKT Conditions
Min x^2 s.t. -a <= x <= a, a > 0; x* = 0 is seen by inspection.
Lagrange function: L(x, u) = x^2 + u_1(x - a) + u_2(-a - x).
First order KKT conditions: grad L(x, u) = 2x + u_1 - u_2 = 0; u_1(x - a) = 0; u_2(-a - x) = 0; -a <= x <= a; u_1, u_2 >= 0.
Consider three cases: u_1 >= 0, u_2 = 0 - the upper bound is active, x = a, u_1 = -2a < 0 (not allowed); u_1 = 0, u_2 >= 0 - the lower bound is active, x = -a, u_2 = -2a < 0 (not allowed); u_1 = u_2 = 0 - neither bound is active, x = 0.
Second order conditions (x* = 0, u_1 = u_2 = 0): grad^2 L(x*, u*) = 2, p^T grad^2 L(x*, u*) p = 2 (Delta x)^2 > 0.

Single Variable Example of KKT Conditions - Revisited
Min -x^2 s.t. -a <= x <= a, a > 0; x* = +/- a is seen by inspection.
Lagrange function: L(x, u) = -x^2 + u_1(x - a) + u_2(-a - x).
First order KKT conditions: grad L(x, u) = -2x + u_1 - u_2 = 0; u_1(x - a) = 0; u_2(-a - x) = 0; -a <= x <= a; u_1, u_2 >= 0.
Consider three cases: u_1 >= 0, u_2 = 0 - the upper bound is active, x = a, u_1 = 2a, u_2 = 0; u_1 = 0, u_2 >= 0 - the lower bound is active, x = -a, u_2 = 2a, u_1 = 0; u_1 = u_2 = 0 - neither bound is active, x = 0.
Second order conditions (x* = +/- a, u_1 or u_2 = 2a): grad^2 L(x*, u*) = -2, p^T grad^2 L(x*, u*) p = -2 (Delta x)^2 < 0.

Interpretation of Second Order Conditions
For x = a or x = -a, we require any allowable direction to satisfy the active constraints exactly; here, any point along an allowable direction from x* must remain at its bound. For this problem, however, there are no nonzero allowable directions that satisfy this condition. Consequently the solution x* is defined entirely by the active constraint. The condition p^T grad^2 L(x*, u*, v*) p > 0 for the allowable directions is vacuously satisfied, because there are no allowable directions that satisfy grad g_A(x*)^T p = 0. Hence, the sufficient second order conditions are satisfied. As we will see, sufficient second order conditions are satisfied by linear programs as well.

Role of KKT Multipliers
Also known as: shadow prices, dual variables, Lagrange multipliers. Consider again Min -x^2 s.t. -a <= x <= a, and suppose the bound a in the constraint is increased to a + Delta a. Then f(x*) = -(a + Delta a)^2 and [f(x*, a + Delta a) - f(x*, a)]/Delta a = -2a - Delta a, so df(x*)/da = -2a = -u_1: the multiplier gives the sensitivity of the optimal objective to the constraint.

Another Example: Constraint Qualifications
Min x_1 s.t. x_2 <= (x_1)^3, x_2 >= 0. The solution is x* = [0, 0], but the stationarity condition grad f(x*) + grad g(x*) u = 0 with u >= 0 cannot be satisfied there: the KKT conditions do not hold at the NLP solution because no constraint qualification (e.g., LICQ) is satisfied at this point.

Algorithms for Constrained Problems
Motivation: build on unconstrained methods wherever possible. Classification of methods:
- Reduced gradient methods (with restoration): GRG2, CONOPT
- Reduced gradient methods (without restoration): MINOS
- Successive quadratic programming: generic implementations
- Penalty functions: popular in the 1970s, but fell into disfavor; barrier methods have been developed recently and are again popular
- Successive linear programming: only useful for "mostly linear" problems
We will concentrate on algorithms for the first four classes. Evaluation: compare performance on a "typical problem," and cite experience on process problems.

Representative Constrained Problem (Hughes, 1981)
Min f(x_1, x_2) = alpha exp(-beta) (the same objective as the unconstrained benchmark above), subject to two nonlinear inequality constraints g_1(x) <= 0 and g_2(x) <= 0 (the second centered near (0.3, 0.3)); the solution is x* = [0.6335, 0.3465].

Reduced Gradient Method with Restoration (GRG2/CONOPT)
Min f(x) s.t. g(x) + s = 0 (add slack variables s >= 0), h(x) = 0, a <= x <= b   becomes   Min f(z) s.t. c(z) = 0, a <= z <= b.
Partition the variables into: z_B - dependent or basic variables; z_N - nonbasic variables, fixed at a bound; z_S - independent or superbasic variables.
Modified KKT conditions:
grad f(z) + grad c(z) lambda - nu_L + nu_U = 0
c(z) = 0
z(i) = z_U(i) or z(i) = z_L(i), with nu_U(i), nu_L(i) >= 0 for i in N, and nu_U(i) = nu_L(i) = 0 for i not in N.

Reduced Gradient Method with Restoration (GRG2/CONOPT)
(a) grad_S f(z) + grad_S c(z) lambda = 0
(b) grad_B f(z) + grad_B c(z) lambda = 0
(c) grad_N f(z) + grad_N c(z) lambda - nu_L + nu_U = 0
(d) z(i) = z_U(i) or z(i) = z_L(i), i in N
(e) c(z) = 0  =>  z_B = z_B(z_S)
Strategy: solve the bound constrained problem in the space of the superbasic variables (apply a gradient projection algorithm); solve (e) to eliminate z_B; use (a) and (b) to calculate the reduced gradient with respect to z_S; the nonbasic variables z_N are (temporarily) fixed through (d). Repartition based on the signs of nu if some z_S remain at bounds or if some z_B violate bounds.

Definition of Reduced Gradient
df/dz_S = df/dz_S (explicit part) + (dz_B/dz_S)^T df/dz_B
Because c(z) = 0, we have dc = (dc/dz_S)^T dz_S + (dc/dz_B)^T dz_B = 0, so
dz_B/dz_S = -grad_{z_S} c [grad_{z_B} c]^(-1)
This leads to:
df/dz_S = grad_S f(z) - grad_S c [grad_B c]^(-1) grad_B f(z) = grad_S f(z) + grad_S c(z) lambda
By remaining feasible always (c(z) = 0, a <= z <= b), one can apply an unconstrained algorithm (quasi-Newton) using df/dz_S, and solve the problem in the reduced space of the z_S variables.

Example of Reduced Gradient
Min x_1^2 - 2 x_2  s.t.  3 x_1 + 4 x_2 = 24, so grad c = [3, 4]^T and grad f = [2 x_1, -2]^T. Let z_S = x_1 and z_B = x_2. Then
df/dz_S = grad_S f - grad_S c [grad_B c]^(-1) grad_B f = 2 x_1 - (3/4)(-2) = 2 x_1 + 3/2.
If grad c is (m x n), grad_{z_S} c is m x (n - m) and grad_{z_B} c is (m x m). (df/dz_S) is the change in f along the constraint direction per unit change in z_S.
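The reduced gradient formula is easy to check numerically. The sketch below (NumPy) evaluates df/dz_S for the small example above, as reconstructed here (f = x_1^2 - 2 x_2, 3 x_1 + 4 x_2 = 24, z_S = x_1, z_B = x_2); at any feasible point it should return 2 x_1 + 3/2.

import numpy as np

def reduced_gradient(grad_f_S, grad_f_B, jac_c_S, jac_c_B):
    # df/dz_S = grad_S f - (dc/dz_S)^T (dc/dz_B)^{-T} grad_B f, with c(z) = 0 held
    lam = np.linalg.solve(jac_c_B.T, grad_f_B)     # first order multiplier estimate
    return grad_f_S - jac_c_S.T @ lam

x = np.array([0.0, 6.0])                           # feasible: 3*0 + 4*6 = 24
gS, gB = np.array([2.0 * x[0]]), np.array([-2.0])  # gradient of f split into (z_S, z_B)
AS, AB = np.array([[3.0]]), np.array([[4.0]])      # Jacobian of c split the same way
print(reduced_gradient(gS, gB, AS, AB))            # -> [1.5] = 2*x_1 + 3/2 at x_1 = 0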

Gradient Projection Method (superbasic - nonbasic variable partition)
Define the projection of an arbitrary point onto the box feasible region; the ith component is given by P(x)_i = x_L,i if x_i < x_L,i; x_i if x_L,i <= x_i <= x_U,i; x_U,i if x_i > x_U,i. A piecewise linear path z(alpha), starting at the reference point z_0, is obtained by projecting the steepest descent (or any search) direction at z_0 onto the box region.

Sketch of GRG Algorithm
1. Initialize the problem and obtain a feasible point z_0.
2. At the feasible point z_k, partition the variables z into z_N, z_B, z_S.
3. Calculate the reduced gradient (df/dz_S).
4. Evaluate the gradient projection search direction for z_S, with a quasi-Newton extension.
5. Perform a line search: find alpha in (0, 1] with z_S(alpha); solve for c(z_S(alpha), z_B, z_N) = 0; if f(z_S(alpha), z_B, z_N) < f(z_S^k, z_B, z_N), set z_S^(k+1) = z_S(alpha), k := k + 1.
6. If ||(df/dz_S)|| < eps, stop; else, go to 2.

Reduced Gradient Method with Restoration
[Figures: iterates in the (z_S, z_B) plane; each iteration moves in z_S and then restores feasibility in z_B. The method fails if the basis matrix (dc/dz_B) becomes singular at an iterate.]

Reduced Gradient Method with Restoration
[Figure] Possible remedy: repartition the basic and superbasic variables to create a nonsingular basis matrix (dc/dz_B).

GRG Algorithm Properties
1. GRG2 has been implemented on PCs as GINO and is very reliable and robust; it is also the optimization solver in MS Excel.
2. CONOPT is implemented in GAMS, AIMMS and AMPL.
3. GRG2 uses quasi-Newton updates for small problems but can switch to conjugate gradients if the problem gets large; CONOPT uses exact second derivatives.
4. Convergence of c(z_S, z_B, z_N) = 0 can get very expensive because c(z) is calculated repeatedly.
5. Safeguards can be added so that restoration (step 5) can be dropped and efficiency increases.
Representative Constrained Problem - starting point [0.8, 0.2]: GINO results - 4 iterations to ||grad f(x*)|| <= 10^(-6); CONOPT results - 7 iterations to ||grad f(x*)|| <= 10^(-6) from a feasible point.

Reduced Gradient Method without Restoration
[Figure: iterates in the (z_S, z_B) plane; feasibility of the nonlinear constraints is not restored at every step.]

Reduced Gradient Method without Restoration (MINOS/Augmented)
Motivation: efficient algorithms are available that solve linearly constrained optimization problems (MINOS): Min f(x) s.t. A x <= b, C x = d. Extend to nonlinear problems through successive linearization, with major iterations (linearizations) and minor iterations (GRG solutions).
Strategy (Robinson; Murtagh & Saunders):
1. Partition the variables into basic, nonbasic and superbasic variables.
2. Linearize the active constraints at z_k: D z = r.
3. Let psi = f(z) + lambda^T c(z) + (beta/2) (c(z) - cbar(z))^T (c(z) - cbar(z)) (augmented Lagrangian), where cbar(z) denotes the current constraint linearization.
4. Solve the linearly constrained problem Min psi(z) s.t. D z = r, a <= z <= b using reduced gradients to get z_(k+1).
5. Set k = k + 1, go to 2.
6. The algorithm terminates when there is no movement between steps 2) and 4).

MINOS/Augmented Notes
1. MINOS has been implemented very efficiently to take care of linearity; it becomes the LP simplex method if the problem is totally linear; it also uses very efficient matrix routines.
2. No restoration takes place; the nonlinear constraints are reflected in psi(z) during step 3); MINOS is more efficient than GRG.
3. The major iterations (steps 3-4) converge at a quadratic rate.
4. Reduced gradient methods are complicated, monolithic codes: hard to integrate efficiently into modeling software.
Representative Constrained Problem - starting point [0.8, 0.2]: MINOS results - 4 major iterations (and few function calls) to ||grad f(x*)|| <= 10^(-6).

Successive Quadratic Programming (SQP)
Motivation: take the KKT conditions, expand them in a Taylor series about the current point, and take a Newton step (a QP) to determine the next point.
Derivation - KKT conditions: grad L(x*, u*, v*) = grad f(x*) + grad g_A(x*) u* + grad h(x*) v* = 0, h(x*) = 0, g_A(x*) = 0, where g_A are the active constraints.
Newton step:
[ grad^2 L     grad g_A   grad h ] [ Delta x ]     [ grad L(x_k, u_k, v_k) ]
[ grad g_A^T   0          0      ] [ Delta u ] = - [ g_A(x_k)              ]
[ grad h^T     0          0      ] [ Delta v ]     [ h(x_k)                ]
Requirements: grad^2 L must be calculated and should be regular; the correct active set g_A; good estimates of u_k, v_k.

SQP Chronology
1. Wilson (1963): the active set can be determined by solving the QP:
Min grad f(x_k)^T d + 1/2 d^T grad^2 L(x_k, u_k, v_k) d
s.t. g(x_k) + grad g(x_k)^T d <= 0,  h(x_k) + grad h(x_k)^T d = 0
2. Han (1976, 1977), Powell (1977, 1978): approximate grad^2 L using a positive definite quasi-Newton update (BFGS); use a line search to converge from poor starting points.
Notes: similar methods were derived using penalty (not Lagrange) functions; the method converges quickly, with very few function evaluations; it is not well suited to large problems (a full space update is used) - for large n, use reduced space methods (e.g. MINOS).

Elements of SQP - Hessian Approximation
What about grad^2 L? We would need second derivatives for f(x), g(x), h(x) and estimates of the multipliers u_k, v_k, and grad^2 L may not be positive semidefinite. Instead, approximate grad^2 L(x_k, u_k, v_k) by B_k, a symmetric positive definite matrix, updated by the BFGS formula:
B_(k+1) = B_k + y y^T / (s^T y) - B_k s s^T B_k / (s^T B_k s)
with s = x_(k+1) - x_k and y = grad L(x_(k+1), u_(k+1), v_(k+1)) - grad L(x_k, u_(k+1), v_(k+1)).
Second derivatives are approximated by the change in gradients, and a positive definite B_k ensures a unique QP solution.

Elements of SQP - Search Directions
How do we obtain search directions? Form a QP and let the QP determine the constraint activity. At each iteration k, solve:
Min grad f(x_k)^T d + 1/2 d^T B_k d
s.t. g(x_k) + grad g(x_k)^T d <= 0,  h(x_k) + grad h(x_k)^T d = 0
Convergence from poor starting points: as with Newton's method, choose alpha (stepsize) to ensure progress toward the optimum, x_(k+1) = x_k + alpha d; alpha is chosen by making sure a merit function is decreased at each iteration.
Exact penalty function: psi(x) = f(x) + mu [Sum max(0, g_j(x)) + Sum |h_j(x)|], with mu > max_j {|u_j|, |v_j|}.
Augmented Lagrange function: psi(x) = f(x) + u^T g(x) + v^T h(x) + (eta/2) {Sum (h_j(x))^2 + Sum max(0, g_j(x))^2}.

Newton-Like Properties for SQP
Fast local convergence: B = grad^2 L - quadratic rate; grad^2 L p.d. and B from quasi-Newton updates - superlinear steps; B from quasi-Newton updates with grad^2 L not p.d. - superlinear steps (in a weaker sense).
Enforce global convergence: ensure decrease of the merit function by choosing alpha; trust region adaptations provide a stronger guarantee of global convergence, but are harder to implement.

Basic SQP Algorithm
0. Guess x_0, set B_0 = I (identity); evaluate f(x_0), g(x_0) and h(x_0).
1. At x_k, evaluate grad f(x_k), grad g(x_k), grad h(x_k).
2. If k > 0, update B_k using the BFGS formula.
3. Solve: Min_d grad f(x_k)^T d + 1/2 d^T B_k d s.t. g(x_k) + grad g(x_k)^T d <= 0, h(x_k) + grad h(x_k)^T d = 0. If the KKT error is less than the tolerance (||grad L(x*)|| <= eps, ||h(x*)|| <= eps, ||g(x*)_+|| <= eps), STOP; else go to 4.
4. Find alpha so that 0 < alpha <= 1 and psi(x_k + alpha d) < psi(x_k) sufficiently (each trial requires an evaluation of f(x), g(x) and h(x)).
5. x_(k+1) = x_k + alpha d; set k = k + 1; go to 1.

Problems with SQP
Nonsmooth functions - reformulate. Ill-conditioning - proper scaling. Poor starting points - trust regions can help. Inconsistent constraint linearizations can lead to infeasible QPs, e.g.:
Min x_2 s.t. 1 + x_1 - (x_2)^2 <= 0, 1 - x_1 - (x_2)^2 <= 0, x_2 >= -1/2.
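To see an SQP-type method at work without writing the QP subproblem by hand, the sketch below calls SciPy's SLSQP solver (a readily available SQP variant, not one of the specific implementations discussed in these notes) on a small assumed test problem with a quadratic objective, linear inequality constraints and bounds.

import numpy as np
from scipy.optimize import minimize

obj  = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2            # assumed quadratic objective
cons = [{'type': 'ineq', 'fun': lambda x:  x[0] - 2*x[1] + 2},
        {'type': 'ineq', 'fun': lambda x: -x[0] - 2*x[1] + 6},
        {'type': 'ineq', 'fun': lambda x: -x[0] + 2*x[1] + 2}]
res = minimize(obj, x0=[2.0, 0.0], method='SLSQP',
               bounds=[(0, None), (0, None)], constraints=cons)
print(res.x, res.fun)                                       # solution near [1.4, 1.7]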

SQP Test Problem
Min x_2 subject to two smooth inequality constraints in (x_1, x_2) whose intersection defines the feasible region sketched in the figure; the solution is x* = [0.5, 0.375].

SQP Test Problem - First Iteration
Start from the origin (x_0 = [0, 0]) with B_0 = I and form the QP with objective d_2 + 1/2 (d_1^2 + d_2^2) and the constraints linearized at x_0; its solution gives the step d and multipliers mu_1, mu_2, leading to x_1 = [0.5, 0].

SQP Test Problem - Second Iteration
From x_1 = [0.5, 0] with B_1 = I (no update from BFGS is possible), form the QP
Min d_2 + 1/2 (d_1^2 + d_2^2) s.t. the constraints linearized at x_1,
whose solution gives d = [0, 0.375] with multipliers mu_1 = 0.5 and mu_2 = 0.5; x_2 = x_1 + d = [0.5, 0.375] is optimal.

Representative Constrained Problem - SQP Convergence Path
Starting point [0.8, 0.2]: starting from B_0 = I and staying within the bounds and linearized constraints, SQP converges in 8 iterations to ||grad f(x*)|| <= 10^(-6).

Barrier Methods for Large-Scale Nonlinear Programming
Original formulation: min f(x) s.t. c(x) = 0, x >= 0 (can generalize to a <= x <= b).
Barrier approach: min phi_mu(x) = f(x) - mu Sum_{i=1..n} ln(x_i) s.t. c(x) = 0.
As mu -> 0, x*(mu) -> x*. Fiacco and McCormick (1968).

Solution of the Barrier Problem
Newton directions (KKT system): solve
grad f(x) + A(x) lambda - v = 0
X v - mu e = 0
c(x) = 0
where e = [1, 1, ..., 1]^T, X = diag(x), V = diag(v), A = grad c(x), W = grad^2 L(x, lambda, v). The Newton step for this system is
[ W    A    -I ] [ dx      ]     [ grad f + A lambda - v ]
[ A^T  0     0 ] [ dlambda ] = - [ c                     ]
[ V    0     X ] [ dv      ]     [ X v - mu e            ]
Eliminating dv = -X^(-1) V dx + X^(-1)(mu e - X v) reduces this to
[ W + Sigma   A ] [ dx      ]     [ grad phi_mu(x) + A lambda ]
[ A^T         0 ] [ dlambda ] = - [ c                         ]     with Sigma = X^(-1) V
=> IPOPT code.
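A minimal sketch of the barrier idea for a bound-constrained toy problem (assumed objective, bounds x >= 0 only, no equality constraints): for each barrier parameter mu, damped Newton steps are taken on phi_mu(x) = f(x) - mu * sum(ln x_i) using the fraction-to-the-boundary rule of IPOPT-type codes (tau about 0.99, see the next slide), and mu is then reduced. A full interior point code such as IPOPT also carries the equality constraints through the KKT system shown above.

import numpy as np

f_grad = lambda x: np.array([2*(x[0] - 0.5), 2*(x[1] + 1.0)])   # f = (x1-0.5)^2 + (x2+1)^2
f_hess = lambda x: np.diag([2.0, 2.0])

x, tau = np.array([1.0, 1.0]), 0.99
for mu in [1.0, 1e-1, 1e-2, 1e-3, 1e-5]:
    for _ in range(20):
        g = f_grad(x) - mu / x                    # gradient of the barrier function
        H = f_hess(x) + np.diag(mu / x**2)        # Hessian of the barrier function
        d = np.linalg.solve(H, -g)
        alpha = 1.0
        for i in range(len(x)):                   # fraction-to-the-boundary rule
            if d[i] < 0:
                alpha = min(alpha, tau * (-x[i] / d[i]))
        x = x + alpha * d
    print(mu, x)                                  # x(mu) -> [0.5, 0] as mu -> 0

The printed iterates trace the central path x*(mu) and approach the solution of the bound-constrained problem as mu -> 0, mirroring the statement x*(mu) -> x* above.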

Global Convergence of Newton-based Barrier Solvers
Merit functions - exact penalty: P(x, eta) = f(x) + eta ||c(x)||; augmented Lagrangian: L*(x, lambda, eta) = f(x) + lambda^T c(x) + eta ||c(x)||^2.
Assess the search direction (e.g., from IPOPT) with a line search: choose the stepsize alpha to give sufficient decrease of the merit function, using a step-to-the-boundary rule with tau about 0.99: for alpha in (0, alpha_max],
x_(k+1) = x_k + alpha dx >= (1 - tau) x_k > 0
v_(k+1) = v_k + alpha dv >= (1 - tau) v_k > 0
lambda_(k+1) = lambda_k + alpha (lambda_+ - lambda_k)
How do we balance phi(x) and c(x) with eta? Is this approach globally convergent? Will it still be fast?

Global Convergence Failure (Waechter and Biegler, 2000)
A small example of the form Min f(x_1, x_2, x_3) s.t. two nonlinear equality constraints with bounds x_2, x_3 >= 0 shows that a Newton-type line search method can stall at an infeasible point even though descent directions exist: with the step restricted by x_k + alpha dx >= 0, the linearized constraints grad c(x_k)^T d + c(x_k) = 0 cannot be satisfied.
Remedies: composite step trust region methods (Byrd et al.); filter line search methods.

Line Search Filter Method
Store (phi_k, theta_k) at allowed iterates, where phi is the (barrier) objective and theta(x) = ||c(x)|| is the infeasibility. Allow progress if the trial point is acceptable to the filter with a theta margin. If the switching condition
alpha [-grad phi^T d]^a >= delta [theta]^b,   a > b > 0,
is satisfied, only an Armijo line search is required on phi. If there is insufficient progress in the stepsize, invoke the restoration phase to reduce theta. Global convergence and superlinear local convergence have been proved (with a second order correction). [Figure: filter envelope in the (theta, phi) plane.]

Implementation Details
Modify the KKT (full space) matrix if it is singular:
[ W + Sigma + delta_1 I     A          ]
[ A^T                      -delta_2 I  ]
delta_1 corrects the inertia to guarantee a descent direction; delta_2 deals with a rank deficient A. The KKT matrix is factored by MA27.
Feasibility restoration phase: Min ||c(x)||_1 plus a regularization term over l <= x <= u, using an exact penalty formulation; exploit the same structure/algorithm to reduce the infeasibility.

IPOPT Algorithm Features
Line search strategies for globalization: l_1 exact penalty merit function; augmented Lagrangian merit function; filter method (adapted and extended from Fletcher and Leyffer).
Hessian calculation: BFGS (full/LM and reduced space); SR1 (full/LM and reduced space); exact full Hessian (direct); exact reduced Hessian (direct); preconditioned CG.
Algorithmic properties: globally and superlinearly convergent (Waechter and Biegler, 2005); easily tailored to different problem structures.
Freely available: CPL license and COIN-OR distribution; IPOPT 3 recently rewritten in C++. Solved on thousands of test problems and applications.

IPOPT Comparison on 954 Test Problems
[Dolan-More performance profile: percent of problems solved within S times the minimum CPU time.]

Recommendations for Constrained Optimization
1. The best current algorithms: GRG2/CONOPT, MINOS, SQP, IPOPT.
2. GRG (or CONOPT) is generally slower, but is robust; use it with highly nonlinear functions; it is the solver in Excel.
3. For small problems with nonlinear constraints, use SQP: it solves QPs rather than linearly constrained NLPs, so it needs fewer function evaluations - best when function calls are expensive.
4. For large problems with mostly linear constraints, use MINOS: it performs sparse constraint decomposition with tailored linear algebra and works efficiently in the reduced space if function calls are cheap, but it can have difficulty with many nonlinearities.
IPOPT exploits both features: it takes advantage of few function evaluations and large-scale linear algebra, but requires exact second derivatives.

Available Software for Constrained Optimization
SQP routines: HSL, NAG and IMSL (NLPQL) routines; NPSOL - Stanford Systems Optimization Lab; SNOPT - Stanford Systems Optimization Lab (rSQP, discussed later); IPOPT.
GAMS programs: CONOPT - generalized reduced gradient method with restoration; MINOS - generalized reduced gradient method without restoration; NPSOL; SNOPT (rSQP, discussed later); IPOPT - barrier NLP, COIN-OR, open source; KNITRO - barrier NLP.
The MS Excel Solver uses the generalized reduced gradient method with restoration.

Rules for Formulating Nonlinear Programs
1) Avoid overflows and undefined terms (do not divide, take logs, etc. of expressions that can vanish or go negative); e.g. replace x + y - ln z = 0 by x + y - u = 0, exp(u) - z = 0.
2) If constraints must always be enforced, make sure they are linear or bounds; e.g. replace v (y - z)^(1/2) = 3 by v u = 3, u^2 = y - z, u >= 0 (see the short modeling sketch below).
3) Exploit linear constraints as much as possible; e.g. write the mass balance x_i L + y_i V = F z_i as l_i + v_i = f_i with L - Sum l_i = 0.
4) Use bounds and constraints to enforce characteristic solutions; e.g. a <= x <= b, g(x) <= 0, to isolate the correct root of h(x) = 0.
5) Exploit global properties when the possibility exists: convex (linear equations?), linear program? quadratic program? geometric program?
6) Exploit problem structure when possible; e.g. for a bilinear problem such as Min [T x - 3y] with constraints that become linear once T is fixed, solve the inner LP for fixed T and put T in an outer optimization loop.

Process Optimization: Problem Definition and Formulation
State of nature and problem premises; restrictions: physical, legal, economic, political, etc.; desired objective: yield, economics, capacity, etc.; decisions. Mathematical modeling and algorithmic solution: process model equations, constraints, objective function, additional variables.
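As a small illustration of rule 2, the following sketch states the reformulated pair v*u = 3, u^2 = y - z (instead of v (y - z)^(1/2) = 3) as a Pyomo model and hands it to IPOPT. The objective, the bounds and the Pyomo/IPOPT toolchain are assumptions made only for this example; the notes themselves use GAMS/AMPL-style modeling and in-line FORTRAN.

import pyomo.environ as pyo

m = pyo.ConcreteModel()
m.v = pyo.Var(bounds=(0.1, 10), initialize=1.0)
m.y = pyo.Var(bounds=(0, 10), initialize=5.0)
m.z = pyo.Var(bounds=(0, 10), initialize=1.0)
m.u = pyo.Var(bounds=(0, None), initialize=2.0)      # u stands for (y - z)**0.5, kept >= 0

m.spec = pyo.Constraint(expr=m.v * m.u == 3)         # enforced form of v*(y-z)**0.5 == 3
m.defu = pyo.Constraint(expr=m.u**2 == m.y - m.z)    # definition of u, always well defined
m.obj  = pyo.Objective(expr=(m.v - 1)**2 + (m.y - 4)**2 + m.z**2)   # assumed objective

pyo.SolverFactory('ipopt').solve(m)
print(pyo.value(m.v), pyo.value(m.u), pyo.value(m.y) - pyo.value(m.z))

The reformulation never evaluates the square root of a possibly negative argument, so the model stays defined even when an intermediate iterate has y < z (rule 1), and the enforced constraint v*u = 3 stays smooth (rule 2).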

Hierarchy of Nonlinear Programming Formulations and Model Intrusion
[Diagram: from closed, black box formulations with a few decision variables, through tailored direct sensitivities and adjoint sensitivities (SAND - simultaneous analysis and design), to open, full space SAND formulations; compute efficiency and the level of model intrusion increase along this spectrum, and multi-level parallelism becomes available.]

Large Scale NLP Algorithms
Motivation: improvement of successive quadratic programming as the cornerstone algorithm for process optimization in design, control and operations.
Evolution of NLP solvers: SQP (1980s) - simultaneous static flowsheet optimization; rSQP (1990s) - static real-time optimization; IPOPT and rSQP++ (2000s) - real-time dynamic optimization; each generation handling problems with far more variables and constraints than the last. Currently: tailor structure, architecture and problems.

Flowsheet Optimization Problems - Introduction
Modular simulation mode - physical relation to the process [figure: unit blocks connected by In/Out streams]. Intuitive to the process engineer; unit equations are solved internally with tailor-made procedures.
Convergence procedures: for simple flowsheets, often identified from the flowsheet structure; convergence "mimics" startup; debugging flowsheets on "physical" grounds.

Flowsheet Optimization Problems - Features
Design specifications: we can specify the number of trays and the reflux ratio, but would like to specify the overhead composition ==> a control loop, solved iteratively.
Nested recycles are hard to handle: what is the best convergence procedure?
Frequent block evaluation can be expensive: slow algorithms applied to flowsheet loops; NLP methods are good at breaking loops.

Chronology in Process Optimization (simulation time equivalents)
1. Early work - black box approaches: Friedman and Pinder (1971); Gaddy and co-workers (1977).
2. Transition - more accurate gradients: Parker and Hughes (1981); Biegler and Hughes (1981).
3. Infeasible path strategy for modular simulators: Biegler and Hughes (1982); Chen and Stadtherr (1985); Kaijaluoto et al. (1985); and many more.
4. Equation based process optimization: Westerberg et al. (1983); Shewchuk (1985); DMO, NOVA, RTOPT, etc. (1990s).
The cost has dropped from the equivalent of hundreds of simulations to only a few: process optimization should be as cheap and easy as process simulation.

Simulation and Optimization of Flowsheets
The simulator converges the recycle (tear) equations y = w(y, x), h(x, y) = 0; the optimization problem is Min f(x, y(x)) s.t. g(x, y(x)) <= 0. For a single degree of freedom, a feasible path method works in the space defined by the converged curve y(x) [figure], which requires repeated (expensive) recycle convergence to evaluate f(x, y(x)).

Expanded Region with Feasible Path
[Figure: the feasible path region around the converged curve y(x).]

"Black Box" Optimization Approach
Vertical steps are expensive (flowsheet convergence). Generally there is no connection between x and y. The derivatives available for gradient-based optimization can be "noisy". [Figure]

SQP - Infeasible Path Approach
Solve and optimize simultaneously in x and y - an extended Newton method. [Figure]

Optimization Capability for Modular Simulators (FLOWTRAN, Aspen/Plus, Pro/II, HySys)
Architecture: replace the convergence block with an optimization block; a problem definition is needed (in-line FORTRAN); the executive, preprocessor and modules remain intact.
Examples: 1. Single unit and acyclic optimization - distillation columns and sequences. 2. "Conventional" process optimization - monochlorobenzene process, NH3 synthesis. 3. Complicated recycles and control loops - Cavett problem, variations of the above.

Optimization of Monochlorobenzene Process
Physical property options: Cavett vapor pressure; Redlich-Kwong vapor fugacity; corrected liquid fugacity; ideal solution activity coefficient. SCOPT optimizer: the optimal solution is found after 4 iterations; the Kuhn-Tucker error is driven well below the allowable tolerance, and the tear variable errors (calculated minus assumed) go essentially to zero along with the optimization variables.
[Flowsheet: feed to an absorber, flash, treater and distillation column with steam and cooling water utilities; benzene and MCB products; objective: maximize profit.]
Results of infeasible path optimization: simultaneous optimization and convergence of the tear streams.

Ammonia Process Optimization
Feeds: the hydrogen feed is mostly H2 (about 94%) with some N2 (about 5%) and a small amount of CH4 (0.79%); the nitrogen feed is 99.8% N2 with traces of CH4 and Ar.
Hydrogen and nitrogen feeds are mixed, compressed, and combined with a recycle stream and heated to reactor temperature. Reaction occurs in a multibed reactor (modeled here as an equilibrium reactor) to partially convert the stream to ammonia. The reactor effluent is cooled and the product is separated using two flash tanks with intercooling. Liquid from the second stage is flashed at low pressure to yield high purity NH3 product. Vapor from the two-stage flash forms the recycle and is recompressed.

Ammonia Process Optimization
Optimization problem: maximize the total profit (at 15% over five years) subject to a required NH3 production rate, a pressure balance, no liquid in the compressors, a lower limit of about 3.5 on the H2/N2 ratio in the reactor, a reactor temperature limit, an NH3 purge limit of 4.5 lb mol/hr, an NH3 product purity of 99.9%, and the tear equations.
Performance characteristics: about 5 SQP iterations, costing roughly the equivalent of two base point simulations; the objective function improves substantially from a starting point at which the flowsheet is difficult to converge.
Decision variables (optimum vs. starting point tabulated on the slide): reactor inlet temperature, inlet temperatures of the first and second flashes and of the recycle compressor, purge fraction, reactor pressure, and the two feed flow rates.

How accurate should gradients be for optimization?
Recognizing the true solution: the KKT conditions and reduced gradients determine the true solution; derivative errors will lead to wrong solutions!
Performance of algorithms: constrained NLP algorithms are gradient based (SQP, CONOPT, GRG2, MINOS, etc.); global and superlinear convergence theory assumes accurate gradients.
Worst case example (Carter, 1991): Newton's method generates an ascent direction and fails for any eps:
f(x) = x^T A x,  A = [ eps + 1/eps   eps - 1/eps ;  eps - 1/eps   eps + 1/eps ],  x_0 = [1, 1]^T
grad f(x_0) is O(eps), the perturbed gradient is g(x_0) = grad f(x_0) + O(eps^2), and kappa(A) = (1/eps)^2, so the computed direction d = -A^(-1) g(x_0) can be an ascent direction [figure: ideal direction -g vs. actual d].

Implementation of Analytic Derivatives
[Block diagram: a module with inputs x and parameters p, internal equations c(v, x, s, p, y) = 0, exit variables s and outputs y; the sensitivity equations provide dy/dx, ds/dx, dy/dp, ds/dp.]
Automatic differentiation tools: JAKE-F, limited to a subset of FORTRAN (Hillstrom, 1982); DAPRE, developed for use with the NAG library (Pryce, Davis, 1987); ADOL-C, implemented using the operator overloading features of C++ (Griewank, 1990); ADIFOR (Bischof et al., 1992), which uses a source transformation approach for FORTRAN code; TAPENADE, web-based source transformation for FORTRAN code. The relative effort needed to calculate gradients is not n+1 but about 3 to 5 (Wolfe, Griewank).

Flash Recycle Optimization (2 decisions + 7 tear variables) and Ammonia Process Optimization (9 decisions and 6 tear variables)
[Bar charts: CPU seconds for GRG, SQP and rSQP with numerical versus exact derivatives; exact derivatives reduce the computation substantially, and rSQP with exact derivatives is fastest.]
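The same automatic differentiation idea can be tried directly in Python with JAX (a modern AD tool, not one of the FORTRAN/C++ tools listed above): exact gradients and Hessians of a coded function are produced at a small constant multiple of the cost of one function evaluation, consistent with the "about 3 to 5" remark.

import jax
import jax.numpy as jnp

def f(x):                                   # assumed smooth test function
    return jnp.sum((x[1:] - x[:-1]**2)**2) + jnp.sum((1.0 - x[:-1])**2)

x = jnp.array([0.5, 1.2, -0.3])
print(jax.grad(f)(x))                       # exact gradient via reverse-mode AD
print(jax.jacfwd(jax.grad(f))(x))           # exact Hessian via forward-over-reverse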

Large-Scale SQP
Min f(z) s.t. c(z) = 0, z_L <= z <= z_U, with the QP subproblem
Min grad f(z_k)^T d + 1/2 d^T W_k d  s.t.  c(z_k) + (A_k)^T d = 0, z_L <= z_k + d <= z_U.
Characteristics: many equations and variables; many bounds and inequalities. Few degrees of freedom: steady state flowsheet optimization; real-time optimization; parameter estimation. Many degrees of freedom: dynamic optimization (optimal control, MPC); state estimation and data reconciliation.

Few degrees of freedom => reduced space SQP (rSQP)
Take advantage of the sparsity of A = grad c(x); project W into the space of the active (or equality) constraints; curvature (second derivative) information is only needed in the space of the degrees of freedom; second derivatives can be applied or approximated with positive curvature (e.g., BFGS); use dual space QP solvers.
+ easy to implement with existing sparse solvers, QP methods and line search techniques
+ exploits the 'natural assignment' of dependent and decision variables (some decomposition steps are 'free')
+ does not require second derivatives
- reduced space matrices are dense
- may be dependent on the variable partitioning
- can be very expensive for many degrees of freedom
- can be expensive if there are many QP bounds

Reduced space SQP (rSQP): Range and Null Space Decomposition
Assuming no active bounds, the QP subproblem with n variables and m constraints becomes the KKT system
[ W_k     A_k ] [ d        ]     [ grad f(x_k) ]
[ A_k^T   0   ] [ lambda_+ ] = - [ c(x_k)      ]
Define a reduced space basis Z_k in R^(n x (n-m)) with (A_k)^T Z_k = 0, and a basis for the remaining space Y_k in R^(n x m), so that [Y_k Z_k] in R^(n x n) is nonsingular. Let d = Y_k d_Y + Z_k d_Z and rewrite the system as
[ Y^T W Y   Y^T W Z   Y^T A ] [ d_Y      ]     [ Y^T grad f ]
[ Z^T W Y   Z^T W Z   0     ] [ d_Z      ] = - [ Z^T grad f ]
[ A^T Y     0         0     ] [ lambda_+ ]     [ c          ]
(A^T Y) d_Y = -c(x_k) is square, so d_Y is determined from the bottom row. Cancel Y^T W Y and Y^T W Z (unimportant, since d_Z, d_Y --> 0); (Y^T A) lambda_+ = -Y^T grad f(x_k), so lambda can be determined by a first order estimate. Calculate or approximate w = Z^T W Y d_Y, then solve Z^T W Z d_Z = -Z^T grad f(x_k) - w. Compute the total step d = Y d_Y + Z d_Z.

Reduced space SQP (rSQP) Interpretation
[Figure: the range space step d_Y moves onto the linearized constraint; the null space step d_Z moves within it.] The SQP step d operates in the full space: satisfy the constraints using the range space to get d_Y, then solve a small QP in the null space to get d_Z. In general, this has the same convergence properties as SQP.

Choice of Decomposition Bases
1. Apply a QR factorization to A: this leads to dense but well-conditioned Y and Z; A = Q [R; 0] = [Y Z] [R; 0].
2. Partition the variables into decisions u and dependents v, and write A^T = [grad_u c^T  grad_v c^T] = [N  C]. Create orthogonal Y and Z with embedded identity matrices (A^T Z = 0, Y^T Z = 0): Z^T = [I  -N^T C^(-T)], Y^T = [C^(-1) N  I].
3. Coordinate basis: the same Z as above, with Y^T = [0  I].
Notes: the bases use gradient information that is already calculated; adapt the decomposition to the QP step; theoretically the same rate of convergence as the original SQP. The coordinate basis can be sensitive to the choice of u and v; the orthogonal basis is not. A consistent initial point and a nonsingular C are needed; automatic generation is possible.

rSQP Algorithm
1. Choose a starting point x_0.
2. At iteration k, evaluate the functions f(x_k), c(x_k) and their gradients.
3. Calculate the bases Y and Z.
4. Solve for the step d_Y in the range space from (A^T Y) d_Y = -c(x_k).
5. Update the projected Hessian B_k ~ Z^T W Z (e.g. with BFGS) and w_k (e.g., set it to zero).
6. Solve the small QP for the step d_Z in the null space:
Min (Z^T grad f(x_k) + w_k)^T d_Z + 1/2 d_Z^T B_k d_Z  s.t.  x_L <= x_k + Y d_Y + Z d_Z <= x_U.
7. If the error is less than the tolerance, stop; else
8. solve for the multipliers using (Y^T A) lambda = -Y^T grad f(x_k),
9. calculate the total step d = Y d_Y + Z d_Z,
10. find the step size alpha and calculate the new point x_(k+1) = x_k + alpha d, and
11. continue from step 2 with k = k + 1.

rSQP Results: Computational Results for General Nonlinear Problems
Vasantharajan et al. (1990). [Results table omitted.]
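The core of steps 3-9 can be sketched in a few lines of NumPy for a single equality-constrained step (no active bounds): build orthonormal Y and Z from a QR factorization of A, take the range space step for feasibility and the small null space step for optimality. All data below are assumed toy values, with W standing in for the (approximate) Hessian of the Lagrangian.

import numpy as np

W = np.array([[6.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 4.0]])            # assumed Hessian (or B_k) information
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]]).T          # A = grad c(x_k), size n x m
g = np.array([-8.0, -3.0, -3.0])           # grad f(x_k)
c = np.array([3.0, 0.0])                   # c(x_k)

m = A.shape[1]
Q, _ = np.linalg.qr(A, mode='complete')    # orthonormal bases: Y = Q[:, :m], Z = Q[:, m:]
Y, Z = Q[:, :m], Q[:, m:]

dY = np.linalg.solve(A.T @ Y, -c)          # step 4: range space step, (A^T Y) dY = -c
w = Z.T @ W @ Y @ dY                       # cross term (step 5)
dZ = np.linalg.solve(Z.T @ W @ Z, -(Z.T @ g + w))   # step 6 without bounds
d = Y @ dY + Z @ dZ                        # step 9
print(d, A.T @ d + c)                      # linearized constraints satisfied: A^T d + c = 0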

rSQP Results: Computational Results for Process Problems
Vasantharajan et al. (1990). [Results table omitted.]

Comparison of SQP and rSQP
Coupled distillation example with several thousand equations; decision variables: boilup rate and reflux ratio.
1. SQP*: about an hour of CPU time, negligible savings (base case).
2. rSQP: minutes of CPU time, significant annual savings for the same base case.
3. rSQP: similar CPU time, roughly double the savings with a higher feed tray location.
4. rSQP: similar CPU time and savings with the column overhead sent to storage.
5. rSQP: cases 3 and 4 together give the largest annual savings.

Nonlinear Optimization Engines
Evolution of NLP solvers for process optimization (design, control and operations): SQP (1980s) - flowsheet optimization; rSQP (1990s) - static real-time optimization (RTO); IPOPT (2000s) - simultaneous dynamic optimization; each generation handling far more variables and constraints than the last.

Many degrees of freedom => full space IPOPT
Solve, at each iteration,
[ W + Sigma   A ] [ dx       ]     [ grad phi_mu(x_k) ]
[ A^T         0 ] [ lambda_+ ] = - [ c(x_k)           ]
Work in the full space of all variables; second derivatives are useful for the objective and constraints; use a specialized large-scale Newton solver.
+ W = grad^2 L(x, lambda) and A = grad c(x) are sparse, often structured
+ fast if many degrees of freedom are present
+ no variable partitioning required
- second derivatives are strongly desired
- W is indefinite and requires complex stabilization
- requires specialized large-scale linear algebra

Pipelines: Gasoline Blending
[Diagram: oil tanks and supply tanks feed intermediate tanks, which feed final product tanks, gas stations and trucks.]

Blending Problem & Model Formulation
Index sets: supply tanks i, intermediate tanks j, final product tanks k, time periods t. Variables: flowrates f, tank volumes v, tank qualities q.
max  Sum_t ( Sum_k c_k f_{t,k} - Sum_i c_i f_{t,i} )
s.t. f_{t,k} = Sum_j f_{t,jk}                                                     (product flow)
     q_{t,k} f_{t,k} = Sum_j q_{t,j} f_{t,jk}                                     (product quality)
     v_{t+1,j} = v_{t,j} + Sum_i f_{t,ij} - Sum_k f_{t,jk}                        (tank volume balance)
     q_{t+1,j} v_{t+1,j} = q_{t,j} v_{t,j} + Sum_i q_{t,i} f_{t,ij} - q_{t,j} Sum_k f_{t,jk}   (tank quality balance)
     q_min <= q_{t,k} <= q_max,   v_j,min <= v_{t,j} <= v_j,max
Model formulated in AMPL.

Small Multi-day Blending Models, Single Qualities
Haverly, C., 1978 (HM); Audet & Hansen, 1998 (AHM). [Network diagrams: feeds F1-F3 blended through one or two intermediate (pool) tanks into products P1, P2.]

Honeywell Blending Model - Multiple Days, 48 Qualities
[Network diagram of the industrial blending system.]

Summary of Results - Dolan-More plot
Performance profile (iteration count), phi versus the performance ratio tau, comparing IPOPT, LOQO, KNITRO, SNOPT, MINOS and LANCELOT.

Comparison of NLP Solvers: Data Reconciliation
[Plots: iteration counts and normalized CPU time versus the number of degrees of freedom for LANCELOT, MINOS, SNOPT, KNITRO, LOQO and IPOPT.]

Comparison of NLP solvers (latest Mittelmann study)
Mittelmann NLP benchmark: fraction of problems solved within a given multiple of the minimum CPU time for IPOPT, KNITRO, LOQO, SNOPT and CONOPT on large-scale test problems, together with counts of limit terminations and failures for each solver.

Typical NLP algorithms and software
SQP: NPSOL, VF02AD, NLPQL, fmincon. Reduced SQP: SNOPT, rSQP, MUSCOD, DMO, LSSOL. Reduced gradient with restoration: GRG2, GINO, SOLVER, CONOPT. Reduced gradient without restoration: MINOS. Second derivatives and barrier: IPOPT, KNITRO, LOQO. Interesting hybrids: FSQP/cFSQP - SQP and constraint elimination; LANCELOT - augmented Lagrangian with gradient projection.

Summary and Conclusions
Optimization algorithms: unconstrained Newton and quasi-Newton methods; KKT conditions and specialized methods; reduced gradient methods (GRG2, MINOS); successive quadratic programming (SQP); reduced Hessian SQP (rSQP); interior point NLP (IPOPT).
Process optimization applications: modular flowsheet optimization; equation oriented models and optimization; real-time process optimization; blending with many degrees of freedom.
Further applications: sensitivity analysis for NLP solutions; multi-scenario optimization problems.


More information

Inexact Newton Methods and Nonlinear Constrained Optimization

Inexact Newton Methods and Nonlinear Constrained Optimization Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton

More information

Systems of Nonlinear Equations

Systems of Nonlinear Equations Systems of Nonlinear Equations L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA 15213 Biegler@cmu.edu http://dynopt.cheme.cmu.edu Obtaining Derivatives Broyden vs.

More information

1. Introduction. We consider nonlinear optimization problems of the form. f(x) ce (x) = 0 c I (x) 0,

1. Introduction. We consider nonlinear optimization problems of the form. f(x) ce (x) = 0 c I (x) 0, AN INTERIOR-POINT ALGORITHM FOR LARGE-SCALE NONLINEAR OPTIMIZATION WITH INEXACT STEP COMPUTATIONS FRANK E. CURTIS, OLAF SCHENK, AND ANDREAS WÄCHTER Abstract. We present a line-search algorithm for large-scale

More information

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)

(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x) Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not

More information

3E4: Modelling Choice. Introduction to nonlinear programming. Announcements

3E4: Modelling Choice. Introduction to nonlinear programming. Announcements 3E4: Modelling Choice Lecture 7 Introduction to nonlinear programming 1 Announcements Solutions to Lecture 4-6 Homework will be available from http://www.eng.cam.ac.uk/~dr241/3e4 Looking ahead to Lecture

More information

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION TAMARA G. KOLDA, ROBERT MICHAEL LEWIS, AND VIRGINIA TORCZON Abstract. We present a new generating set search (GSS) approach

More information

An Inexact Newton Method for Optimization

An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

Minimization of Static! Cost Functions!

Minimization of Static! Cost Functions! Minimization of Static Cost Functions Robert Stengel Optimal Control and Estimation, MAE 546, Princeton University, 2017 J = Static cost function with constant control parameter vector, u Conditions for

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Quasi-Newton Methods. Javier Peña Convex Optimization /36-725

Quasi-Newton Methods. Javier Peña Convex Optimization /36-725 Quasi-Newton Methods Javier Peña Convex Optimization 10-725/36-725 Last time: primal-dual interior-point methods Consider the problem min x subject to f(x) Ax = b h(x) 0 Assume f, h 1,..., h m are convex

More information

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science

EAD 115. Numerical Solution of Engineering and Scientific Problems. David M. Rocke Department of Applied Science EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science Multidimensional Unconstrained Optimization Suppose we have a function f() of more than one

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions

CE 191: Civil and Environmental Engineering Systems Analysis. LEC 05 : Optimality Conditions CE 191: Civil and Environmental Engineering Systems Analysis LEC : Optimality Conditions Professor Scott Moura Civil & Environmental Engineering University of California, Berkeley Fall 214 Prof. Moura

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global and local convergence results

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

An Inexact Newton Method for Nonlinear Constrained Optimization

An Inexact Newton Method for Nonlinear Constrained Optimization An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results

More information

1 Kernel methods & optimization

1 Kernel methods & optimization Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions

More information

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence Chapter 6 Nonlinear Equations 6. The Problem of Nonlinear Root-finding In this module we consider the problem of using numerical techniques to find the roots of nonlinear equations, f () =. Initially we

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Cubic regularization in symmetric rank-1 quasi-newton methods

Cubic regularization in symmetric rank-1 quasi-newton methods Math. Prog. Comp. (2018) 10:457 486 https://doi.org/10.1007/s12532-018-0136-7 FULL LENGTH PAPER Cubic regularization in symmetric rank-1 quasi-newton methods Hande Y. Benson 1 David F. Shanno 2 Received:

More information

5.5 Quadratic programming

5.5 Quadratic programming 5.5 Quadratic programming Minimize a quadratic function subject to linear constraints: 1 min x t Qx + c t x 2 s.t. a t i x b i i I (P a t i x = b i i E x R n, where Q is an n n matrix, I and E are the

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Statistics 580 Optimization Methods

Statistics 580 Optimization Methods Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

LINEAR AND NONLINEAR PROGRAMMING

LINEAR AND NONLINEAR PROGRAMMING LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico

More information

NONLINEAR. (Hillier & Lieberman Introduction to Operations Research, 8 th edition)

NONLINEAR. (Hillier & Lieberman Introduction to Operations Research, 8 th edition) NONLINEAR PROGRAMMING (Hillier & Lieberman Introduction to Operations Research, 8 th edition) Nonlinear Programming g Linear programming has a fundamental role in OR. In linear programming all its functions

More information

Appendix A Taylor Approximations and Definite Matrices

Appendix A Taylor Approximations and Definite Matrices Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

More information

Convex Optimization Overview (cnt d)

Convex Optimization Overview (cnt d) Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize

More information

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Journal name manuscript No. (will be inserted by the editor) A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Rene Kuhlmann Christof Büsens Received: date / Accepted:

More information

Modification of Non-Linear Constrained Optimization. Ban A. Metras Dep. of Statistics College of Computer Sciences and Mathematics

Modification of Non-Linear Constrained Optimization. Ban A. Metras Dep. of Statistics College of Computer Sciences and Mathematics Raf. J. Of Comp. & Math s., Vol., No., 004 Modification of Non-Linear Constrained Optimization Ban A. Metras Dep. of Statistics College of Computer Sciences and Mathematics Received: 3/5/00 Accepted: /0/00

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

A Trust-region-based Sequential Quadratic Programming Algorithm

A Trust-region-based Sequential Quadratic Programming Algorithm Downloaded from orbit.dtu.dk on: Oct 19, 2018 A Trust-region-based Sequential Quadratic Programming Algorithm Henriksen, Lars Christian; Poulsen, Niels Kjølstad Publication date: 2010 Document Version

More information

Geometry optimization

Geometry optimization Geometry optimization Trygve Helgaker Centre for Theoretical and Computational Chemistry Department of Chemistry, University of Oslo, Norway European Summer School in Quantum Chemistry (ESQC) 211 Torre

More information

ALGORITHM XXX: SC-SR1: MATLAB SOFTWARE FOR SOLVING SHAPE-CHANGING L-SR1 TRUST-REGION SUBPROBLEMS

ALGORITHM XXX: SC-SR1: MATLAB SOFTWARE FOR SOLVING SHAPE-CHANGING L-SR1 TRUST-REGION SUBPROBLEMS ALGORITHM XXX: SC-SR1: MATLAB SOFTWARE FOR SOLVING SHAPE-CHANGING L-SR1 TRUST-REGION SUBPROBLEMS JOHANNES BRUST, OLEG BURDAKOV, JENNIFER B. ERWAY, ROUMMEL F. MARCIA, AND YA-XIANG YUAN Abstract. We present

More information

0.1. Linear transformations

0.1. Linear transformations Suggestions for midterm review #3 The repetitoria are usually not complete; I am merely bringing up the points that many people didn t now on the recitations Linear transformations The following mostly

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Lecture 7: Weak Duality

Lecture 7: Weak Duality EE 227A: Conve Optimization and Applications February 7, 2012 Lecture 7: Weak Duality Lecturer: Laurent El Ghaoui 7.1 Lagrange Dual problem 7.1.1 Primal problem In this section, we consider a possibly

More information

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

Quasi-Newton methods for minimization

Quasi-Newton methods for minimization Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

g(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to

g(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to 1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information