A Primal-Dual Interior-Point Method for an Optimization Problem Related to the Modeling of Atmospheric Organic Aerosols 1

N. R. Amundson 2, A. Caboussat 3, J.-W. He 4, J. H. Seinfeld 5

1 This work was supported by U.S. Environmental Protection Agency grant X. The second author was partially supported by Swiss National Science Foundation grant PBEL.
2 Cullen Professor of Chemical Engineering and Professor of Mathematics, Department of Mathematics, University of Houston, Houston, Texas.
3 Assistant Professor, Department of Mathematics, University of Houston, Houston, Texas.
4 Associate Professor, Department of Mathematics, University of Houston, Houston, Texas.
5 Louis E. Nohl Professor and Professor of Chemical Engineering, Department of Chemical Engineering, California Institute of Technology, Pasadena, California.

Abstract. A mathematical model for the computation of the phase equilibrium related to atmospheric organic aerosols is presented. The phase equilibrium is given by the global minimum of the Gibbs free energy for a system that involves water and organic components. This minimization problem is equivalent to the determination of the convex hull of the corresponding molar Gibbs free energy function. A geometrical notion of phase simplex related to the convex hull is introduced to characterize mathematically the phases at equilibrium. A primal-dual interior-point algorithm for the efficient solution of the phase equilibrium problem is presented. A novel initialization of the algorithm, based on the properties of the phase simplex, is proposed to ensure convergence to a global minimum of the Gibbs free energy. For a finite termination of the interior-point method, an active phase identification procedure is incorporated. Numerical results show the robustness and efficiency of the approach for the prediction of liquid-liquid equilibrium in multicomponent mixtures.

Key Words. Phase equilibrium problem, minimization of Gibbs free energy, convex hull, phase simplex, primal-dual formulation, interior-point method.

1 Introduction

Over the last two decades, a series of thermodynamic modules has been developed in the atmospheric modeling community to predict the phase transition and multistage growth phenomena of inorganic aerosols (Refs. 1-3). However, the prediction of phase separation in liquid-liquid equilibria for organic aerosols has attracted much less attention. The phase equilibrium is characterized by the global minimum of the Gibbs free energy of the system. Hence, to solve phase equilibrium problems, the minimization of the Gibbs free energy constrained by mass balances is a standard approach. Essentially two categories of minimization methods have been proposed to determine whether an equilibrium solution corresponds to a global minimum of the Gibbs free energy: direct minimization of the Gibbs free energy (Refs. 4-6) and minimization of the tangent-plane distance function (Refs. 7-9). However, such global minimization methods are computationally intensive, making their use in 3D air quality models infeasible.

In this paper, a primal-dual interior-point algorithm for the efficient solution of the phase equilibrium problem is proposed. A geometrical concept of phase simplex is introduced to characterize an equilibrium solution that corresponds to a global minimum of the Gibbs free energy. To ensure convergence to a global minimum, a novel initialization of the algorithm, based on the properties of the phase simplex, is presented, in which the algorithm is started from an initial solution involving all possible phases in the system. The algorithm applies at each step a Newton method to the Karush-Kuhn-Tucker (KKT) system of equations, perturbed by a log-barrier penalty term, to find the next primal-dual approximation of the solution. To ensure that the algorithm converges to a stable equilibrium rather than to any other first-order optimality point such as a maximum, a saddle point, or an unstable local minimum, a deflation technique is implemented that keeps the reduced Hessian matrix of the molar Gibbs free energy of each phase positive definite. For a finite termination of the interior-point method, an active phase identification procedure is incorporated for the correct detection of phases whose total number of substances reaches the lower bound, i.e., zero, at equilibrium. By removing the vanishing phases from the system, it permits the prediction of the correct number of phases at equilibrium, and of their compositions, precisely.

The structure of this paper is the following: in Section 2, the mathematical modeling and analysis of the phase equilibrium problem for organic aerosols are presented. In Section 3, a primal-dual interior-point method is proposed for solving the phase equilibrium problem. Numerical results are presented in Section 4 to illustrate the efficiency of the method. Section 5 consists of the conclusions.

2 Description of Phase Equilibrium Problem

2.1 Formulation of Phase Equilibrium Problem

The phase equilibrium for organic aerosols is given by the global minimum of the Gibbs free energy of the system. This minimization problem can be formulated under three different forms, which can be chosen on the basis of analysis or computational convenience. Let $\mathbb{R}_+$ denote the set of nonnegative real numbers and $\mathbb{R}_{++}$ the set of positive real numbers.

Consider a chemical system composed of $n_s$ substances at a specified temperature $T$ and pressure $P$. For a given substance-abundance feed vector in units of moles $b \in \mathbb{R}^{n_s}_+$, we are interested in determining the state of the system at thermodynamic equilibrium, i.e., how many phases there are and what their compositions are. The phase equilibrium is given by the solution of the following constrained minimization problem:

$$\min\ G(\pi, n_1, \ldots, n_\pi) = \sum_{\alpha=1}^{\pi} g(n_\alpha), \qquad (1a)$$

$$\text{s.t.}\quad \sum_{\alpha=1}^{\pi} n_\alpha = b, \qquad (1b)$$

$$n_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (1c)$$

where $\pi$ is the number of phases, $n_\alpha \in \mathbb{R}^{n_s}_+$ is the concentration vector of moles in phase $\alpha$, for $\alpha = 1, \ldots, \pi$, and $g : \mathbb{R}^{n_s}_+ \to \mathbb{R}$ is the Gibbs free energy (GFE). Problem (1) is called the phase equilibrium problem (PEP). In (1), relation (1a) gives the total Gibbs free energy of the system, (1b) is the mass balance equation ensuring that the total quantity of the substances in the system equals the feed vector $b$, and (1c) enforces the nonnegativity of the concentrations. In the formulation of (1), we assume that all the phases in the system belong to the same phase class, so that the GFE function $g$ is the same for all phases; we also assume that all substances can partition into all phases and that no reactions are possible between the different substances.

The GFE function $g$ is the relevant thermodynamic function for the PEP and is usually defined by $g(n_\alpha) = n_\alpha^T \mu(n_\alpha)$, where $\mu(n_\alpha) \in \mathbb{R}^{n_s}$ is the chemical potential vector, a homogeneous function of degree zero. The GFE function $g$ is thus homogeneous of degree one. For a given vector $n_\alpha \in \mathbb{R}^{n_s}_+$, let $y_\alpha = e^T n_\alpha$, $x_\alpha = n_\alpha / y_\alpha$, where $e$ is the vector whose components are all equal to one. Then, $g(n_\alpha) = y_\alpha g(x_\alpha)$, with $g(x_\alpha) := x_\alpha^T \mu(x_\alpha)$.

Thus, problem (1) is equivalent to

$$\min\ G(\pi, y_1, \ldots, y_\pi, x_1, \ldots, x_\pi) = \sum_{\alpha=1}^{\pi} y_\alpha g(x_\alpha), \qquad (2a)$$

$$\text{s.t.}\quad e^T x_\alpha = 1, \quad x_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (2b)$$

$$\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha = b, \qquad (2c)$$

$$y_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (2d)$$

where $y_\alpha \in \mathbb{R}_+$ is the total number of moles in phase $\alpha$, $x_\alpha \in \mathbb{R}^{n_s}_+$ is the mole-fraction concentration vector of phase $\alpha$, and $g$ is the molar Gibbs free energy function. From the definition $g(x) := x^T \mu(x)$, the molar GFE function $g$ is a homogeneous function of degree one such that $\nabla g(x) = \mu(x)$, $\nabla^2 g(x) = \nabla \mu(x)$, and the pair $(0, x)$ is an eigenpair of the matrix $\nabla^2 g(x)$, i.e.,

$$\nabla^2 g(x)\, x = 0. \qquad (3)$$

Relation (3) is the so-called Gibbs-Duhem relation. For the common GFE functions used to model the PEP, $g$ is continuous on $\mathbb{R}^{n_s}_+$, $C^\infty$ on $\mathbb{R}^{n_s}_{++}$, and such that

$$\lim_{x_i \to 0} \frac{\partial g}{\partial x_i} = -\infty, \quad i = 1, \ldots, n_s;$$

that is, the values of $g$ approach finite limits as any given mole fraction tends to zero, and these limiting values are approached with negatively infinite slope.

Remark 2.1. Since $g$ is homogeneous of degree one, it is easily seen that, if $\{y_\alpha, x_\alpha\}_{\alpha=1,\ldots,\pi}$ is the solution of (2) for the feed vector $b$, then for any $q > 0$, $\{q y_\alpha, x_\alpha\}_{\alpha=1,\ldots,\pi}$ is the solution of (2) for the feed vector $q b$. Therefore, without loss of generality, it is assumed that $e^T b = 1$.
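The degree-one homogeneity of $g$ and the Gibbs-Duhem relation (3) are easy to check numerically. The following sketch does so for a hypothetical ideal-mixing GFE in units of $RT$; the model and the standard potentials `MU0` are illustrative assumptions, not a model used in the paper.

```python
import numpy as np

# Hypothetical ideal-mixing GFE in units of RT (an illustrative assumption,
# not one of the activity-coefficient models used in the paper).
MU0 = np.array([-1.0, -0.5, -2.0])  # assumed standard chemical potentials

def gfe(n):
    """g(n) = n^T mu(n), with mu_i(n) = MU0_i + ln(n_i / e^T n).

    mu is homogeneous of degree zero, so g is homogeneous of degree one."""
    return n @ (MU0 + np.log(n / n.sum()))

def hessian(g, n, h=1e-5):
    """Central finite-difference Hessian of g at n."""
    m = len(n)
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei, ej = h * np.eye(m)[i], h * np.eye(m)[j]
            H[i, j] = (g(n + ei + ej) - g(n + ei - ej)
                       - g(n - ei + ej) + g(n - ei - ej)) / (4.0 * h * h)
    return H

n = np.array([0.2, 0.3, 0.5])
print(gfe(3.7 * n) - 3.7 * gfe(n))   # ~ 0: degree-one homogeneity
print(hessian(gfe, n) @ n)            # ~ 0: Gibbs-Duhem relation (3)
```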

Let us now define some notation that will be used throughout the paper. Let $\Delta_{n_s}$ denote the unit simplex of $\mathbb{R}^{n_s}$, i.e.,

$$\Delta_{n_s} = \{x \in \mathbb{R}^{n_s} : e^T x = 1,\ x \geq 0\}.$$

Let $n = n_s - 1$, define

$$\Delta_n = \{z \in \mathbb{R}^n : e^T z \leq 1,\ z \geq 0\},$$

and denote by $\mathrm{int}\,\Delta_n$ its interior. Note that $\Delta_n = \mathrm{conv}(e_0, e_1, \ldots, e_n)$, with $e_0 = 0$ and $\{e_1, \ldots, e_n\}$ being the canonical basis of $\mathbb{R}^n$. The simplex $\Delta_n$ can be identified with $\Delta_{n_s}$ via the mapping

$$\Pi : \Delta_n \ni z \mapsto x = e_{n_s} + Z_e z \in \Delta_{n_s},$$

where $e_{n_s}$ is the $n_s$-th canonical basis vector of $\mathbb{R}^{n_s}$ and $Z_e^T = (I, -e)$, with $I$ the identity matrix. Let $f = g \circ \Pi$. Then $f$ is $C^0$ on $\Delta_n$, $C^\infty$ on $\mathrm{int}\,\Delta_n$, and has the subdifferential $\partial f(z) = \emptyset$ for $z \in \partial\Delta_n$. Let $P$ denote the projection from $\mathbb{R}^{n_s}$ to $\mathbb{R}^n$ such that $P(x_1, \ldots, x_n, x_{n_s}) = (x_1, \ldots, x_n)$. Let $z_\alpha = P x_\alpha$ denote the reduced composition vector composed of the first $n$ components of $x_\alpha$, for $\alpha = 1, \ldots, \pi$, and $d = P b$ (for $b \in \Delta_{n_s}$) the reduced feed vector composed of the first $n$ components of $b$. Problem (2) is thus equivalent to

$$\min\ G(y_1, \ldots, y_\pi, z_1, \ldots, z_\pi) = \sum_{\alpha=1}^{\pi} y_\alpha f(z_\alpha), \qquad (4a)$$

$$\text{s.t.}\quad y_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (4b)$$

$$\sum_{\alpha=1}^{\pi} y_\alpha z_\alpha = d, \qquad (4c)$$

$$\sum_{\alpha=1}^{\pi} y_\alpha = 1. \qquad (4d)$$

Remark 2.2. Since the domain of $f$ is $\Delta_n$, we do not need to include $z_\alpha \in \Delta_n$ among the constraints in the feasibility conditions of (4).

An application of Carathéodory's theorem (Ref. 10) implies that the PEP is equivalent to the determination of the convex hull of $f$ on $\Delta_n$. Recall that the convex hull of $f$ is the largest convex extended-real-valued function majorized by $f$ on $\Delta_n$. We then have the following theorem:

Theorem 2.1. For every $d \in \Delta_n$, the minimum of (4) is $\mathrm{conv}f(d)$, the value of the convex hull of $f$ at $d$. Moreover, one has

$$\mathrm{conv}f(d) = \sum_{\alpha=1}^{\pi} y_\alpha f(z_\alpha) \qquad (5)$$

for some convex combination

$$d = \sum_{\alpha=1}^{\pi} y_\alpha z_\alpha, \quad \sum_{\alpha=1}^{\pi} y_\alpha = 1, \quad y_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (6)$$

with $\pi \leq n + 1$. The point $(y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi} \in \mathbb{R}^{(n+1)\pi}_+$ is called a phase splitting of $d$.

Remark 2.3. A phase splitting $(y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ of $d$ ($d \in \Delta_n$) can be quite improper. Some of the phases can be inactive, namely those for which $y_\alpha = 0$, and the phases $z_\alpha$ need not be distinct. It is easy to remedy this by eliminating all the indices $\alpha$ such that $y_\alpha = 0$ and adding all the $y_\alpha > 0$ corresponding to the same phase. Therefore, we need only consider a phase splitting $(y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ of $d$ that is stable in the sense that all the phases are distinct and active.

It is now clear why we have transformed (1) into (4). Knowing that we are actually looking for $\mathrm{conv}f$ while solving (1) leads us to introduce the notion of phase simplex in Section 2.2, a central notion that allows us to discuss the phase equilibrium problem in an abstract framework.

2.2 Geometrical Aspects of Phase Equilibrium Problem: Convex Hull and Phase Simplex

Note that, for a given $d$ in $\Delta_n$, it is possible that several stable phase splittings achieve the minimum of (4). However, it is also possible to prove that uniqueness holds in most cases. More precisely, for generic GFE functions, i.e., for GFE functions in a residual set $\mathcal{R}$ of

$$E = \{f \in C^\infty(\mathrm{int}\,\Delta_n) : f \in C^0(\Delta_n),\ \partial f(z) = \emptyset \text{ for } z \in \partial\Delta_n\}, \qquad (7)$$

there is a unique stable phase splitting, which is a simplex.
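For a two-component system ($n = 1$), the convex hull of $f$ can be computed directly on a grid, which makes Theorem 2.1 easy to visualize. The sketch below uses an assumed regular-solution form of $f$ with interaction parameter `A` (an illustrative stand-in, not a GFE model from the paper) and extracts the lower convex envelope from a planar convex hull.

```python
import numpy as np
from scipy.spatial import ConvexHull

A = 2.5  # assumed interaction parameter; A > 2 produces a miscibility gap

def f(z):  # toy reduced molar GFE on (0, 1), in units of RT
    return z * np.log(z) + (1 - z) * np.log(1 - z) + A * z * (1 - z)

z = np.linspace(1e-4, 1 - 1e-4, 2001)
pts = np.column_stack([z, f(z)])
hull = ConvexHull(pts)

# The hull vertices of the graph of f lie on its lower convex envelope;
# sorting them by z and interpolating yields conv f on the grid.
v = pts[hull.vertices]
v = v[np.argsort(v[:, 0])]
convf = np.interp(z, v[:, 0], v[:, 1])

gap = z[f(z) - convf > 1e-6]   # where conv f < f: the two-phase region
print(gap[0], gap[-1])          # approximate phase compositions z_1, z_2
```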

The following theorem, characterizing the geometrical structure of $\mathrm{conv}f(d)$ in terms of the phase simplex, is taken from (Ref. 11), in which the interested reader will find the proof. Let us just say that the proof involves multijet theory, and in particular Thom's multijet transversality theorem.

Theorem 2.2. There is a residual set $\mathcal{R}$ of $E$ such that, for any GFE function $f$ in $\mathcal{R}$, every $d \in \mathrm{int}\,\Delta_n$ has a unique stable phase splitting. More precisely, for every $d \in \mathrm{int}\,\Delta_n$, there exists a unique $(\pi-1)$-simplex $\Sigma(d) = \mathrm{conv}(z_1, \ldots, z_\pi)$ with $\pi \leq n + 1$ such that $\mathrm{conv}f(d) = \sum_{\alpha=1}^{\pi} y_\alpha f(z_\alpha)$, with the barycentric representation $d = \sum_{\alpha=1}^{\pi} y_\alpha z_\alpha$ and $\sum_{\alpha=1}^{\pi} y_\alpha = 1$.

From now on, we assume that the GFE function $f$ belongs to $\mathcal{R}$. For a given $d \in \mathrm{int}\,\Delta_n$, the unique stable phase splitting of $d$, $\Sigma(d) = \mathrm{conv}(z_1, \ldots, z_\pi)$, is called the phase simplex of $d$. Phase simplexes have some interesting properties, which we state in the following theorems, whose proofs are given elsewhere (Refs. ).

Theorem 2.3. Consider $d \in \mathrm{int}\,\Delta_n$ and let $\Sigma(d) = \mathrm{conv}(z_1, \ldots, z_\pi)$ be the phase simplex of $d$. The following properties hold:

1. For all $\alpha = 1, \ldots, \pi$, $z_\alpha \in \mathrm{int}\,\Delta_n$ and $\mathrm{conv}f(z_\alpha) = f(z_\alpha)$.
2. $d \in \mathrm{rint}\,\Sigma(d)$ and $y_\alpha > 0$ for $\alpha = 1, \ldots, \pi$.
3. $\mathrm{conv}f$ is affine on $\Sigma(d)$.
4. For $d' \in \mathrm{rint}\,\Sigma(d)$, $\Sigma(d') = \Sigma(d)$.

The following theorem is related to the so-called Gibbs tangent plane criterion (Refs. 7-9), which states that the affine hyperplane tangent to the graph of $f$ at $(z_\alpha, f(z_\alpha))$, $\alpha = 1, \ldots, \pi$, lies entirely below the graph.

Theorem 2.4. A $(\pi-1)$-simplex $\Sigma = \mathrm{conv}(z_1, \ldots, z_\pi)$ is a phase simplex if and only if there exist multipliers $\eta \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$ such that

$$\nabla f(z_\alpha) + \eta = 0, \quad \alpha = 1, \ldots, \pi, \qquad (8a)$$

$$f(z_\alpha) + \eta^T z_\alpha + \gamma = 0, \quad \alpha = 1, \ldots, \pi, \qquad (8b)$$

$$f(z) + \eta^T z + \gamma \geq 0, \quad \forall z \in \Delta_n. \qquad (8c)$$
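Conditions (8a)-(8b) determine the common tangent, and (8c) can then be checked on a grid. A sketch for the same assumed two-component model as in the previous sketch (again an illustration, not a model from the paper):

```python
import numpy as np
from scipy.optimize import fsolve

A = 2.5  # assumed interaction parameter, as in the previous sketch

def f(z):
    return z * np.log(z) + (1 - z) * np.log(1 - z) + A * z * (1 - z)

def df(z):
    return np.log(z) - np.log(1 - z) + A * (1 - 2 * z)

def residual(v):
    """Common-tangent conditions: equal slopes and equal intercepts, i.e.,
    (8a)-(8b) with eta = -f'(z_a) and gamma = -f(z_a) - eta * z_a."""
    z1, z2 = v
    return [df(z1) - df(z2),
            (f(z1) - df(z1) * z1) - (f(z2) - df(z2) * z2)]

z1, z2 = fsolve(residual, [0.1, 0.9])
eta = -df(z1)
gamma = -f(z1) - eta * z1

zs = np.linspace(1e-3, 1 - 1e-3, 2000)
ok = np.all(f(zs) + eta * zs + gamma >= -1e-9)  # tangent plane criterion (8c)
print(z1, z2, ok)  # True: conv(z1, z2) is a phase simplex
```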

Let $\Omega_0 = \{d \in \mathrm{int}\,\Delta_n : \dim(\Sigma(d)) = 0\}$ be the set of single-phase points in $\mathrm{int}\,\Delta_n$. A point $d \in \mathrm{int}\,\Delta_n$ is a single-phase point if and only if $\mathrm{conv}f(d) = f(d)$. Note that, from Theorem 2.3, the vertices of a phase simplex are single-phase points. Corollary 2.5 shows how the Gibbs tangent plane criterion (8c) can be used to determine whether a tangent simplex is a phase simplex.

Corollary 2.5. Let $\Sigma = \mathrm{conv}(z_1, \ldots, z_\pi)$ be a $(\pi-1)$-simplex with $z_\alpha \in \Omega_0$. If there exist multipliers $\eta \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$ satisfying conditions (8a) and (8b), then $\Sigma$ is a phase simplex.

2.3 KKT System and Mathematical Characterizations

In the previous section we studied the convex hull of $f$ to obtain information about the global minimizer of (4). In this section, we present results that characterize the local minima of the problem and distinguish them from the global minimum; the proofs of these results are given elsewhere (Refs. ). In order to study the local minima, we use the Kuhn-Tucker theory. The following lemma, which is an adaptation of Lemma 4 in (Ref. 14), gives a convenient sufficient condition on $d$ ensuring that the active phases involved at local minima are in $\mathrm{int}\,\Delta_n$, the domain on which the GFE functions are differentiable.

Lemma 2.1. Let $(y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ be a local minimizer of (4) for $d \in \mathrm{int}\,\Delta_n$. If $y_\alpha > 0$, then $z_\alpha \in \mathrm{int}\,\Delta_n$.

The following lemma, which is an adaptation of Lemma 1 in (Ref. 15), ensures that a constraint qualification holds at any feasible point.

Lemma 2.2. The linear independence constraint qualification (LICQ) holds for (4) at any feasible point.

From Lemmas 2.1 and 2.2, it is valid to use the Kuhn-Tucker theory. We then have the following theorem, which gives the Karush-Kuhn-Tucker (KKT) system of first-order necessary optimality conditions for problem (4).

Theorem 2.6. Let $Y = (y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ be a local minimizer of (4) with $d \in \mathrm{int}\,\Delta_n$. Then there exist unique Lagrange multipliers $\eta \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$ such that

$$y_\alpha (\nabla f(z_\alpha) + \eta) = 0, \quad \alpha = 1, \ldots, \pi, \qquad (9a)$$

$$f(z_\alpha) + \eta^T z_\alpha + \gamma - \theta_\alpha = 0, \quad \alpha = 1, \ldots, \pi, \qquad (9b)$$

$$d = \sum_{\alpha=1}^{\pi} y_\alpha z_\alpha, \qquad (9c)$$

$$\theta_\alpha y_\alpha = 0, \quad \theta_\alpha \geq 0, \quad y_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (9d)$$

where the $\theta_\alpha \in \mathbb{R}$ are the Lagrange multipliers relative to the nonnegativity constraints $y_\alpha \geq 0$, $\alpha = 1, \ldots, \pi$.

A solution of the KKT system (9), in general nonunique, is called a KKT point of (4). The hyperplane associated with the KKT point $Y$ is defined by $H(z) = -\eta^T z - \gamma$.

Remark 2.4. Similarly to Remark 2.3, a KKT point $Y = (y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ of (4) can be quite improper. Some of the phases $z_\alpha$ can be inactive, namely those for which $y_\alpha = 0$, and the phases $z_\alpha$ need not be distinct. It is easy to remedy this by eliminating all the indices $\alpha$ such that $y_\alpha = 0$ and adding all the $y_\alpha > 0$ corresponding to the same phase. Therefore, we need only consider a KKT point $(y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ of (4) that is stable in the sense that all the phases are active and distinct.

Let $\mathcal{A}$ denote the set of indices of active (and distinct) phases. For any $\alpha \in \mathcal{A}$, we have $\nabla f(z_\alpha) = -\eta$ and $\theta_\alpha = 0$, so that $f(z_\alpha) = -\eta^T z_\alpha - \gamma$. The hyperplane $H(z) = -\eta^T z - \gamma$ associated with $Y$ is therefore tangent to $f$ at all the active phases. The assumption that the GFE function $f$ belongs to the residual set $\mathcal{R}$ implies the following corollary:

Corollary 2.7. Let $Y = (y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ be a stable KKT point of (4) with $d \in \mathrm{int}\,\Delta_n$, such that $y_\alpha > 0$ and the $z_\alpha$ are distinct. Then the set $\Sigma = \mathrm{conv}(z_1, \ldots, z_\pi)$ is a $(\pi-1)$-simplex with $\pi \leq n + 1$.

From Theorem 2.2, we know that for a global minimizer of problem (4), denoted $Y^* = (y^*_\alpha, z^*_\alpha)_{\alpha=1,\ldots,\pi^*}$, that is a stable phase splitting of $d \in \mathrm{int}\,\Delta_n$, the set $\Sigma^* = \mathrm{conv}(z^*_1, \ldots, z^*_{\pi^*})$ is the phase simplex of $d$. From Corollaries 2.5 and 2.7, the position of the simplex associated with a KKT point of problem (4) might be a criterion to determine whether this KKT point is a global minimum of (4). The aim of the following theorem is to make this statement precise.

Theorem 2.8. Consider $Y^* = (y^*_\alpha, z^*_\alpha)_{\alpha=1,\ldots,\pi}$ a feasible point of (4). The point $Y^*$ is a global minimum of (4) if and only if $Y^*$ is a KKT point of (4) and $z^*_\alpha \in \Omega_0$ whenever $y^*_\alpha > 0$, i.e., any active phase is a single-phase point.

Therefore we have a global criterion to determine whether a KKT point is a global minimum. This criterion is equivalent to requiring that the hyperplane associated with the KKT point lie below the graph of the GFE $f$ on $\Delta_n$.

Theorem 2.9. Consider $Y^* = (y^*_\alpha, z^*_\alpha)_{\alpha=1,\ldots,\pi}$ a feasible point of (4). The point $Y^*$ is a global minimum of (4) if and only if $Y^*$ is a KKT point of (4) and

$$f(z) \geq H^*(z), \quad \forall z \in \Delta_n, \qquad (10)$$

where $H^*(z)$ is the hyperplane associated with the KKT point $Y^*$.

Remark 2.5. The implications of Lemmas 2.1 and 2.2 and of the theorems above for problem (2) are obvious. Indeed, $(y_\alpha, x_\alpha)_{\alpha=1,\ldots,\pi}$ is a local minimizer of (2) if and only if $(y_\alpha, z_\alpha = P x_\alpha)_{\alpha=1,\ldots,\pi}$ is a local minimizer of (4), and $d$ is in $\mathrm{int}\,\Delta_n$ if and only if $b$ is in $\mathrm{rint}\,\Delta_{n_s}$. Then, if $(y_\alpha, x_\alpha)_{\alpha=1,\ldots,\pi}$ is a local minimizer of (2) and $b$ is in $\mathrm{rint}\,\Delta_{n_s}$, $y_\alpha > 0$ implies that $x_\alpha$ belongs to $\mathrm{rint}\,\Delta_{n_s}$, so that $g$ is differentiable at $x_\alpha$. Since the LICQ holds at $(y_\alpha, x_\alpha)_{\alpha=1,\ldots,\pi}$, we then have

$$y_\alpha (\nabla g(x_\alpha) + \lambda) + \zeta_\alpha e = 0, \quad \alpha = 1, \ldots, \pi, \qquad (11a)$$

$$g(x_\alpha) + \lambda^T x_\alpha - \theta_\alpha = 0, \quad \alpha = 1, \ldots, \pi, \qquad (11b)$$

$$e^T x_\alpha = 1, \quad x_\alpha > 0, \quad \alpha = 1, \ldots, \pi, \qquad (11c)$$

$$\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha = b, \qquad (11d)$$

$$\theta_\alpha y_\alpha = 0, \quad \theta_\alpha \geq 0, \quad y_\alpha \geq 0, \quad \alpha = 1, \ldots, \pi, \qquad (11e)$$

where $\lambda \in \mathbb{R}^{n_s}$ is the multiplier relative to the mass balance equation (2c), i.e., (11d), which is related to the multipliers $\eta$ and $\gamma$ in (9) via

$$\eta = Z_e^T \lambda, \qquad \gamma = \lambda_{n_s}, \qquad (12)$$

with $\lambda_{n_s}$ the $n_s$-th component of $\lambda$, and the $\zeta_\alpha \in \mathbb{R}$ are the multipliers relative to the normalization constraints $e^T x_\alpha = 1$, $\alpha = 1, \ldots, \pi$.

Note that the tangent plane criterion (10) stated in Theorem 2.9 is a global condition. There is no rigorous approach to determine whether the tangent plane arising from a KKT point lies below the molar Gibbs free energy surface for all feasible compositions $z$ in $\Delta_n$. Therefore, one has to rely on local criteria, even if this increases the odds of finding local minima. One such local criterion is related to a local phase stability test, which states that, if a postulated KKT point $Y = (y_\alpha, z_\alpha)_{\alpha=1,\ldots,\pi}$ is thermodynamically stable with respect to perturbations in any or all of the phases, then

$$f(z) \geq H^*(z) \iff (z - z_\alpha)^T \nabla^2 f(z_\alpha)(z - z_\alpha) \geq 0, \quad \forall z \in B_\epsilon(z_\alpha), \qquad (13)$$

where $B_\epsilon(z_\alpha)$ is a neighborhood of $z_\alpha$ in $\mathrm{int}\,\Delta_n$. Relation (13) is equivalent to $\nabla^2 f(z_\alpha) \geq 0$. From now on, we assume that the Hessian matrix of $f$, $\nabla^2 f$, is positive definite at the phases $z_\alpha$ of $Y$, i.e.,

$$\nabla^2 f(z_\alpha) > 0. \qquad (14)$$

Relation (14) is also called the meta-stability condition for problem (4). Similarly, we have the following meta-stability condition for problem (2):

$$Z_e^T \nabla^2 g(x_\alpha) Z_e > 0, \qquad (15)$$

where $x_\alpha = \Pi z_\alpha$. Therefore, for a globally or meta-stable equilibrium, the reduced Hessian matrices $Z_e^T \nabla^2 g(x_\alpha) Z_e$ must be positive definite. It is also important to recall that the Hessian matrix $\nabla^2 g(x)$ is singular for any feasible composition $x$, due to the Gibbs-Duhem relation (3).

The above mathematical characterizations are, however, not directly applicable for computation, because finding a solution that satisfies the KKT system (9), or equivalently (11), is a difficult problem. The difficulty is mainly caused by the combinatorial aspect of the KKT system (11), or more precisely by the complementary slackness conditions $\theta_\alpha \geq 0$, $y_\alpha \geq 0$, and $\theta_\alpha y_\alpha = 0$. Indeed, one could attempt to guess the optimal active set of phases $\mathcal{A} = \{\alpha : y_\alpha > 0\}$, i.e., the set of phases that actually exist at the equilibrium. Based on this guess, one could transform (11) into a system of nonlinear equations, which is much more computationally tractable. Unfortunately, the set of all possible active sets grows exponentially with $\pi$, the number of phases considered. Moreover, not all the solutions of the KKT system (11) are solutions of (2); some of them could be, for example, maximizers, saddle points, or unstable local minimizers. Therefore, this type of approach can only be practical if initiated by a correct guess of the active set. This question is addressed in Section 2.4.

2.4 Active Phase Identification Procedure

The accurate identification of active phases is important (Ref. 16). Such an identification, by removing the difficult combinatorial aspect of (2), reduces the inequality-constrained minimization problem to an equality-constrained problem, which is much easier to deal with. An active phase identification procedure is presented here that correctly detects active phases in a neighborhood of a KKT point. In order to identify the active phases accurately, one needs a pair of primal and dual variables, e.g., $(y_\alpha, x_\alpha; \zeta_\alpha, \lambda, \theta_\alpha)$, that is close to a KKT point. The primal-dual interior-point algorithm presented later produces such a sequence of primal and dual variables. Let us first define an identification function

$$\rho(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha) := \|\phi(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha)\|_2, \qquad (16)$$

where the vector-valued function $\phi$ is given by

$$\phi(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha) := \begin{pmatrix} \nabla_{(y_\alpha, x_\alpha, \lambda, \zeta_\alpha)} L(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha) \\ \theta_\alpha y_\alpha \end{pmatrix},$$

where $L$ is the Lagrangian function of (2), defined by

$$L(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha) = \sum_{\alpha=1}^{\pi} y_\alpha g(x_\alpha) + \sum_{\alpha=1}^{\pi} \zeta_\alpha (e^T x_\alpha - 1) + \lambda^T \Bigl(\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha - b\Bigr) - \sum_{\alpha=1}^{\pi} \theta_\alpha y_\alpha.$$

Note that the identification function $\rho$ is continuous and takes the value zero at any KKT point, since the first four sets of equations of the KKT system (11) can be written as

$$\nabla_{(y_\alpha, x_\alpha, \zeta_\alpha, \lambda)} L(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha) = 0,$$

while $\theta_\alpha y_\alpha = 0$ holds by (11e). Then, the index set defined by

$$E(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha) := \{\alpha : y_\alpha \leq c\, \rho(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha)\},$$

with $c > 0$, can be used to detect the vanishing phases whose total number of substances reaches the lower bound, e.g., zero, at the solution, if the pair $(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha)$ is sufficiently close to a KKT point (Ref. 16).

Therefore, the set $\mathcal{A}$ of active phases actually present at the equilibrium can be obtained by removing the vanishing phases from the system:

$$\mathcal{A} := \{1, \ldots, \pi\} \setminus E(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha). \qquad (17)$$

The exact solution of (2) can then be computed, based on the active phases, from the following reduced KKT system of equations:

$$y_\alpha (\nabla g(x_\alpha) + \lambda) + \zeta_\alpha e = 0, \quad \alpha \in \mathcal{A},$$
$$g(x_\alpha) + \lambda^T x_\alpha = 0, \quad \alpha \in \mathcal{A},$$
$$e^T x_\alpha = 1, \quad x_\alpha > 0, \quad y_\alpha > 0, \quad \alpha \in \mathcal{A},$$
$$\sum_{\alpha \in \mathcal{A}} y_\alpha x_\alpha = b. \qquad (18)$$

This identification procedure permits the coupling of the interior-point method with an active set method for the activation/deactivation of the phases. The procedure is the following: the active set procedure begins with the initial guess $\mathcal{A} = \{1, \ldots, \pi\}$. At each iteration of the Newton method, the active set of phases $\mathcal{A}$ is given by (17), and problem (18) is solved with a Newton method described in Section 3 to give a new iterate $(y^+_\alpha, x^+_\alpha, \zeta^+_\alpha, \lambda^+, \theta^+_\alpha)$. Let $P^a$ denote the index set of phases $\alpha \in \mathcal{A}$ that satisfy

$$y^+_\alpha \leq c\, \rho(y^+_\alpha, x^+_\alpha, \zeta^+_\alpha, \lambda^+, \theta^+_\alpha) \quad \text{or} \quad 0 < y^+_\alpha < \epsilon_y,$$

where $\epsilon_y$ is a given small threshold. The set $P^a$ is the set of phases that are to be removed from the set of active phases at the next iteration. Let $P^d$ denote the index set of phases $\alpha \notin \mathcal{A}$ that satisfy $\theta^+_\alpha = g(x^+_\alpha) + (x^+_\alpha)^T \lambda^+ < 0$ and that have to be added to the active set. The new active set $\mathcal{A}^+$ is then given at the next iteration by

$$\mathcal{A}^+ = (\mathcal{A} \cup P^d) \setminus P^a. \qquad (19)$$

The KKT equations (18) are then updated and another Newton iteration is carried out. This active set procedure is coupled with an interior-point method for fast convergence to the phase equilibrium, as described in Section 3.
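The update (17)/(19) is simple to express in code. The sketch below is a minimal illustration; the constants `c` and `eps_y` are assumed placeholder values, since the paper leaves the identification constant and the threshold $\epsilon_y$ as parameters.

```python
import numpy as np

def update_active_set(A, y_new, theta_new, rho_new, c=1.0, eps_y=1e-8):
    """One active-set update following (17) and (19).

    A         : current set of active phase indices
    y_new     : y^+, total number of moles per phase at the new iterate
    theta_new : theta^+, multipliers of the constraints y >= 0
    rho_new   : value of the identification function rho at the new iterate
    """
    # P^a: active phases whose total number of moles is vanishing
    P_a = {a for a in A if y_new[a] <= c * rho_new or 0.0 < y_new[a] < eps_y}
    # P^d: inactive phases with negative multiplier theta^+, to be reactivated
    P_d = {a for a in range(len(y_new)) if a not in A and theta_new[a] < 0.0}
    return (A | P_d) - P_a

A = {0, 1, 2}
y_plus = np.array([0.55, 0.45, 1.0e-12])
theta_plus = np.array([0.0, 0.0, 0.3])
print(update_active_set(A, y_plus, theta_plus, rho_new=1.0e-6))  # {0, 1}
```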

3 A Primal-Dual Interior-Point Method

3.1 A Log-Barrier Penalty Problem

The fact that it is impractical to solve the KKT system (11) directly, and that a convergent sequence of primal and dual variables is needed in order to accurately identify the active phases via (17), gives rise to the primal-dual interior-point method presented here (Refs. ). First, a basic description of the algorithm is given, since it is required to understand the analysis that follows. Let us first soften the nonnegativity constraints $y_\alpha \geq 0$ by adding slack variables $s_\alpha$, $\alpha = 1, \ldots, \pi$, and incorporating them into a logarithmic barrier term in the objective function. Problem (2) is transformed into the following barrier problem:

$$\text{minimize}\quad B_\nu(y_\alpha, x_\alpha) = \sum_{\alpha=1}^{\pi} y_\alpha g(x_\alpha) - \nu \sum_{\alpha=1}^{\pi} \ln s_\alpha,$$
$$\text{subject to}\quad e^T x_\alpha = 1, \quad x_\alpha > 0, \quad \alpha = 1, \ldots, \pi,$$
$$\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha = b,$$
$$y_\alpha - s_\alpha = 0, \quad s_\alpha > 0, \quad \alpha = 1, \ldots, \pi, \qquad (20)$$

where $\nu$ is a positive parameter. Problem (20) is not equivalent to (2), but it contains only equality constraints and is much simpler to solve than (2). In our primal-dual interior-point algorithm, (20) is approximately solved by applying one Newton iteration to its KKT system of equations, then decreasing $\nu$, and repeating the process. This leads to a sequence of iterates that converges to a solution of (2) as $\nu \to 0$ under certain assumptions, as stated in the next lemma, which is an application of Theorem 8 in (Ref. 20) with box constraints $y_\alpha \geq 0$.

Lemma 3.1. Since the objective function and constraints of problem (2) are continuous, the solution of the penalized problem (20) converges to the solution of the initial problem (2) as the penalty parameter $\nu$ tends to zero.

This convergent sequence is used for a finite termination of the algorithm by applying the active phase identification procedure outlined in Section 2.4. Once the vanishing phases are identified and removed from the iterations, the exact solution of the phase equilibrium problem can be obtained by setting $\nu = 0$ and computing, in a final step, an equilibrium point only on the active phases.
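Since the slacks satisfy $s_\alpha = y_\alpha$, the barrier objective of (20) is easy to evaluate. A minimal sketch, with an assumed ideal-mixing molar GFE standing in for the models used in the paper:

```python
import numpy as np

def g_ideal(x):
    """Toy molar GFE in units of RT (an assumption, not a model from the paper)."""
    return float(np.sum(x * np.log(x)))

def barrier_objective(y, X, g, nu):
    """B_nu of (20), with the slacks eliminated via s_alpha = y_alpha.

    y : (pi,) total moles per phase, all > 0
    X : (pi, n_s) mole-fraction vectors, one row per phase
    """
    return sum(ya * g(xa) for ya, xa in zip(y, X)) - nu * np.log(y).sum()

y = np.array([0.6, 0.4])
X = np.array([[0.7, 0.3],
              [0.2, 0.8]])
for nu in (1e-1, 1e-2, 1e-3):
    print(nu, barrier_objective(y, X, g_ideal, nu))  # barrier term fades as nu -> 0
```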

Let us now consider the problem of finding an approximate solution of problem (20) for a fixed value of the parameter $\nu$. Denoting the Lagrange multipliers for $y_\alpha - s_\alpha = 0$ again by $\theta_\alpha$, $\alpha = 1, \ldots, \pi$, the KKT conditions for the barrier problem take the form

$$y_\alpha (\nabla g(x_\alpha) + \lambda) + \zeta_\alpha e = 0, \quad \alpha = 1, \ldots, \pi,$$
$$g(x_\alpha) + \lambda^T x_\alpha - \theta_\alpha = 0, \quad \alpha = 1, \ldots, \pi,$$
$$e^T x_\alpha = 1, \quad x_\alpha > 0, \quad \alpha = 1, \ldots, \pi,$$
$$\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha = b,$$
$$y_\alpha - s_\alpha = 0, \quad s_\alpha > 0, \quad \alpha = 1, \ldots, \pi,$$
$$s_\alpha \theta_\alpha - \nu = 0, \quad \theta_\alpha > 0, \quad \alpha = 1, \ldots, \pi,$$

where the last two sets of equations can be combined by eliminating the slacks $s_\alpha$, yielding the reduced system

$$y_\alpha (\nabla g(x_\alpha) + \lambda) + \zeta_\alpha e = 0, \quad \alpha = 1, \ldots, \pi, \qquad (21a)$$
$$g(x_\alpha) + \lambda^T x_\alpha - \theta_\alpha = 0, \quad \alpha = 1, \ldots, \pi, \qquad (21b)$$
$$e^T x_\alpha = 1, \quad x_\alpha > 0, \quad \alpha = 1, \ldots, \pi, \qquad (21c)$$
$$\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha = b, \qquad (21d)$$
$$y_\alpha \theta_\alpha - \nu = 0, \quad y_\alpha > 0, \quad \theta_\alpha > 0, \quad \alpha = 1, \ldots, \pi. \qquad (21e)$$

Note that the above KKT system (21) contains only equations and can be viewed as a perturbation of the original KKT system (11), in which the complementary slackness conditions are approximated by a set of equations controlled by $\nu$. Note also that the KKT system (21) produces a sequence of primal and dual variables $(y_\alpha, x_\alpha; \lambda, \zeta_\alpha, \theta_\alpha)$ that converges to a solution of (11) as $\nu \to 0$; this convergent sequence is used in the active phase identification procedure (17) for a finite termination of the algorithm. For this reason, the KKT system is not further simplified by eliminating the dual variables $\theta_\alpha$ from the second set of equations with the relations $\theta_\alpha = \nu / y_\alpha$.

Let us ignore (for the moment) the fact that $y_\alpha$ and $\theta_\alpha$ must be positive, and simply apply Newton's method to (21) to compute a displacement in $(y_\alpha, x_\alpha, \zeta_\alpha, \lambda, \theta_\alpha)$, denoted by $(p_{y_\alpha}, p_{x_\alpha}, p_{\zeta_\alpha}, p_\lambda, p_{\theta_\alpha})$.

This gives the following symmetric indefinite system:

$$y_\alpha \nabla^2 g(x_\alpha) p_{x_\alpha} + (\nabla g(x_\alpha) + \lambda) p_{y_\alpha} + y_\alpha p_\lambda + p_{\zeta_\alpha} e = -y_\alpha \nabla g(x_\alpha) - y_\alpha \lambda - \zeta_\alpha e, \quad \alpha = 1, \ldots, \pi,$$
$$(\nabla g(x_\alpha) + \lambda)^T p_{x_\alpha} + x_\alpha^T p_\lambda - p_{\theta_\alpha} = -g(x_\alpha) - x_\alpha^T \lambda + \theta_\alpha, \quad \alpha = 1, \ldots, \pi,$$
$$e^T p_{x_\alpha} = 1 - e^T x_\alpha, \quad \alpha = 1, \ldots, \pi,$$
$$\sum_{\alpha=1}^{\pi} (y_\alpha p_{x_\alpha} + p_{y_\alpha} x_\alpha) = b - \sum_{\alpha=1}^{\pi} y_\alpha x_\alpha,$$
$$\theta_\alpha y_\alpha^{-1} p_{y_\alpha} + p_{\theta_\alpha} = \nu y_\alpha^{-1} - \theta_\alpha, \quad \alpha = 1, \ldots, \pi,$$

which is further simplified by eliminating $p_{\theta_\alpha}$ from the second set of equations via the relations given by the last set,

$$p_{\theta_\alpha} = \nu y_\alpha^{-1} - \theta_\alpha - \theta_\alpha y_\alpha^{-1} p_{y_\alpha},$$

giving

$$y_\alpha \nabla^2 g(x_\alpha) p_{x_\alpha} + (\nabla g(x_\alpha) + \lambda) p_{y_\alpha} + y_\alpha p_\lambda + p_{\zeta_\alpha} e = -y_\alpha \nabla g(x_\alpha) - y_\alpha \lambda - \zeta_\alpha e, \quad \alpha = 1, \ldots, \pi, \qquad (22)$$

$$(\nabla g(x_\alpha) + \lambda)^T p_{x_\alpha} + x_\alpha^T p_\lambda + \theta_\alpha y_\alpha^{-1} p_{y_\alpha} = -g(x_\alpha) - x_\alpha^T \lambda + \nu y_\alpha^{-1}, \quad \alpha = 1, \ldots, \pi, \qquad (23)$$

$$e^T p_{x_\alpha} = 1 - e^T x_\alpha, \quad \alpha = 1, \ldots, \pi, \qquad (24)$$

$$\sum_{\alpha=1}^{\pi} (y_\alpha p_{x_\alpha} + x_\alpha p_{y_\alpha}) = b - \sum_{\alpha=1}^{\pi} y_\alpha x_\alpha. \qquad (25)$$

A new estimate of the solution of the KKT system (21) is then obtained by

$$y^+_\alpha = y_\alpha + \tau p_{y_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (26a)$$
$$x^+_\alpha = x_\alpha + \tau p_{x_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (26b)$$
$$\zeta^+_\alpha = \zeta_\alpha + \tau p_{\zeta_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (26c)$$
$$\lambda^+ = \lambda + \tau p_\lambda, \qquad (26d)$$
$$\theta^+_\alpha = \theta_\alpha + \tau p_{\theta_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (26e)$$

where the step size $\tau$ is chosen to ensure that $y^+_\alpha > 0$ and $\theta^+_\alpha > 0$, and that the merit function defined in Section 3.2 below is sufficiently reduced. This active set/Newton algorithm is summarized in Table 1.

Table 1. Primal-dual interior-point method: summary of the algorithm.

Initialization of $y^0_\alpha$, $x^0_\alpha$, $\eta^0$, $\theta^0$ and $\nu = \nu^0$. Set $\mathcal{A} = \{1, \ldots, \pi\}$.
For $j = 0, 1, 2, \ldots$:
(1) Compute the reduced Newton direction $(p_{x_\alpha}, p_{y_\alpha}, p_\lambda, p_{\zeta_\alpha})$ by solving one step (22)-(25) of the Newton method associated with the active set $\mathcal{A}$;
(2) Compute the step length $\tau$ in (26) to ensure that $y^+_\alpha > 0$ and $\theta^+_\alpha > 0$;
(3) Update $y^+_\alpha$, $x^+_\alpha$, $\lambda^+$, $\zeta^+_\alpha$ and $\theta^+_\alpha$;
(4) Update the set of active phases $\mathcal{A}^+$ with (19);
(5) Compute the new parameter $\nu^{j+1}$;
until some stopping criterion is satisfied.
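Step (2) of Table 1 is the classical fraction-to-boundary rule. A minimal sketch (the safeguard factor 0.995 is an assumed, customary choice; the paper does not specify one):

```python
import numpy as np

def step_length(y, p_y, theta, p_theta, frac=0.995):
    """Largest tau <= 1 with y + tau*p_y > 0 and theta + tau*p_theta > 0."""
    tau = 1.0
    for v, dv in ((y, p_y), (theta, p_theta)):
        neg = dv < 0.0
        if np.any(neg):
            tau = min(tau, frac * np.min(v[neg] / -dv[neg]))
    return tau

y = np.array([0.6, 0.4, 0.1])
p_y = np.array([0.1, -0.5, -0.2])
theta = np.array([1e-3, 1e-3, 0.5])
p_theta = np.array([-5e-4, 2e-3, -0.1])
tau = step_length(y, p_y, theta, p_theta)
print(tau, y + tau * p_y, theta + tau * p_theta)  # all components stay positive
```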

3.2 Forcing Global Convergence

An important ingredient of the primal-dual method is to ensure convergence from a general starting point by requiring, through a line search, a sufficient decrease in a merit function that encourages the early iterates to move toward the solution of the barrier problem (20). Let $\|\cdot\|_2$ denote the Euclidean norm in $\mathbb{R}^m$ (where the dimension $m$ is implicitly given). Our line-search method is based on the merit function defined by

$$M_{\nu,\sigma}(y_\alpha, x_\alpha, \theta_\alpha) = B_\nu(y_\alpha, x_\alpha) + \frac{\sigma}{2} \Bigl( \bigl\|(\theta_\alpha y_\alpha - \nu)_{\alpha=1,\ldots,\pi}\bigr\|_2^2 + \bigl\|(e^T x_\alpha - 1)_{\alpha=1,\ldots,\pi}\bigr\|_2^2 + \Bigl\|\sum_{\alpha=1}^{\pi} y_\alpha x_\alpha - b\Bigr\|_2^2 \Bigr),$$

which includes both primal and dual variables (Ref. 21). The function $M_{\nu,\sigma}(y_\alpha, x_\alpha, \theta_\alpha)$ is the barrier function $B_\nu(y_\alpha, x_\alpha)$ defined in (20), augmented by a penalty term that measures the proximity of $(y_\alpha, x_\alpha, \theta_\alpha)$ to the true trajectory of minimizers. Our line-search method uses the solution of the KKT system (22)-(25) as a search direction. A sufficient decrease in $M_{\nu,\sigma}(y_\alpha, x_\alpha, \theta_\alpha)$ is used to encourage progress toward a minimizer of $B_\nu(y_\alpha, x_\alpha)$.

To conclude the description of the primal-dual interior-point method, the updates of the barrier parameter $\nu$ and of the penalty parameter $\sigma$ are described. The primal-dual interior-point method has a two-level structure of inner and outer iterations, the inner iterations corresponding to the iterations of Newton's method for a given value of $\nu$. If $\nu$ is reduced at an appropriate rate (Ref. 17), the inner iterations can be terminated after one iteration, so that the combined sequence of inner iterates ultimately converges superlinearly to a minimizer. The reduction of $\nu$ in the algorithm is based on

$$\nu = \delta\, \frac{\bigl\|(\theta_\alpha y_\alpha)_{\alpha=1,\ldots,\pi}\bigr\|_2^2}{\pi},$$

where $\delta$ is a fixed real number between 0 and 1, typically $\delta = 1/10$; see, e.g., (Ref. 17). For the penalty parameter $\sigma$, it can be shown (Ref. 22) that at each iteration there is a $\sigma$ for which the Newton direction is a descent direction for the merit function $M_{\nu,\sigma}(y_\alpha, x_\alpha, \theta_\alpha)$, under certain assumptions.

3.3 Initialization Procedure

Let $Y^* = (y^*_\alpha, z^*_\alpha)_{\alpha=1,\ldots,\pi^*}$ be a global minimizer of (4). To use a common term from the field of interior methods, we would like to have a central path $\{y^\nu_i, z^\nu_i\}_{i=1,\ldots,\pi}$, generated by the interior-point method as described in Table 1, that converges to $Y^*$. From Theorem 2.8, $z^*_\alpha \in \Omega_0$ whenever $y^*_\alpha > 0$, i.e., any active phase is a single-phase point. Then, for every $\alpha \in \{1, \ldots, \pi^*\}$ such that $y^*_\alpha > 0$, there would be an $i \in \{1, \ldots, \pi\}$ such that $z^*_\alpha$ is connected in $\Omega_0$ to the points $z^\nu_i$ as $\nu$ tends to zero. From Theorem 2.2, we know that the number of active phases at equilibrium is less than or equal to $n + 1$, the number of substances in the system.

Therefore, we set $\pi = n + 1$ and would like the sequence $S^\nu = \mathrm{conv}(z^\nu_1, \ldots, z^\nu_\pi)$ to be initialized by an $n$-simplex $S^0 = \mathrm{conv}(z^0_0, \ldots, z^0_n)$ whose vertices $z^0_i \in \Omega_0$ are the points from which such a central path starts. This initialization is actually related to the identification of the connected regions of $\Omega_0$ to which the phases $z^*_\alpha$ belong. Thus, in order not to miss any connected region of $\Omega_0$, one can track them with a grid method. On the other hand, note that each vertex $e_i$, $i = 0, \ldots, n$, of the unit simplex $\Delta_n$ corresponds to the single-phase point of a pure-substance system. For a neighborhood $V_i \in \mathcal{N}(e_i)$, let us define $V^0_i = V_i \cap \Omega_0$ ($\neq \emptyset$). Chemical arguments lead us to the assumption that for every $\alpha \in \{1, \ldots, \pi^*\}$ such that $y^*_\alpha > 0$, there exists a set $V^0_i$, $i \in \{0, \ldots, n\}$, such that $z^*_\alpha$ is connected to a point $z^0_i \in V^0_i$ by a continuous path in $\Omega_0$. Then, the initialization of $S^\nu$ in the interior-point algorithm is given by $S^0 = \mathrm{conv}(z^0_0, \ldots, z^0_n)$ with $z^0_i \in V^0_i$. We have the following convergence theorem for the primal-dual interior-point algorithm, whose proof is given elsewhere (Refs. ).

Theorem 3.1. Let $\{y^*_\alpha, z^*_\alpha\}_{\alpha=1,\ldots,\pi^*}$ be the global minimizer of (4) for a point $d \in \mathrm{int}\,\Delta_n$, with $d = \sum_{\alpha=1}^{\pi^*} y^*_\alpha z^*_\alpha$. Assume that $y^*_\alpha > 0$ and that the $z^*_\alpha$ are distinct. Let $\{y^\nu_i, z^\nu_i\}_{i=1,\ldots,\pi}$ be the sequence of iterates generated by the interior-point method as described in Table 1, and define $d^\nu = \sum_{i=1}^{\pi} y^\nu_i z^\nu_i$. Assume that $\pi = n + 1$ and that the sequence $S^\nu = \mathrm{conv}(z^\nu_1, \ldots, z^\nu_\pi)$ is initialized by $S^0$. If the sets $S^\nu$ remain $(\pi-1)$-simplexes with $z^\nu_i \in \Omega_0$, then, as $\nu \to 0$, $d^\nu \to d$. Furthermore, for $\alpha \in \{1, \ldots, \pi^*\}$, there is a unique $i(\alpha) \in \{1, \ldots, \pi\}$ such that $y^\nu_{i(\alpha)} \to y^*_\alpha$ and $z^\nu_{i(\alpha)} \to z^*_\alpha$; for $j \in \{1, \ldots, \pi\} \setminus \{i(\alpha)\}_{\alpha=1,\ldots,\pi^*}$, $y^\nu_j \to 0$.

The convergent sequence $\{y^\nu_i, z^\nu_i\}_{i=1,\ldots,\pi}$ is used for a finite termination of the interior-point algorithm by identifying the vanishing phases as $\nu \to 0$, i.e., the phases $i \in \{1, \ldots, \pi\}$ such that $y_i \leq \epsilon_y$, where $\epsilon_y$ is an a priori given tolerance. Then, starting from the maximum allowable number of phases, the algorithm identifies the number of active phases at the equilibrium, even if this number is much smaller than the number of substances ($\pi^* \ll n_s$). Once the initial simplex $S^0$ is set, we have $x^0_i = \Pi z^0_i$, and the $y^0_i$ are initialized as the barycentric coordinates of $d$ in $S^0$: $d = \sum_{i=0}^{n} y^0_i z^0_i$. The dual variables $\theta^0_i$ are initialized by $\nu / y^0_i$, while $\lambda^0$ is set to minimize the initial residuals $\nabla g(x^0_i) + \lambda^0$, $i = 0, \ldots, n$, in the least-squares sense.
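A sketch of this initialization: the vertices of $\Delta_n$ are pulled slightly inside the simplex (a crude stand-in for choosing $z^0_i \in V^0_i$, since locating $\Omega_0$ requires the GFE model), the $y^0_i$ are obtained as barycentric coordinates, and $\lambda^0$ minimizes $\sum_i \|\nabla g(x^0_i) + \lambda^0\|_2^2$, whose least-squares solution is minus the mean gradient. The parameter `eps` and the model `grad_g` are assumptions.

```python
import numpy as np

def initialize(d, grad_g, eps=1e-3, nu0=1e-3):
    """Initial point for the interior-point method of Table 1 (a sketch).

    d : reduced feed vector in int Delta_n (length n, with n_s = n + 1)."""
    n = len(d)
    # vertices e_0 = 0, e_1, ..., e_n of Delta_n, pulled toward the centroid
    verts = np.vstack([np.zeros(n), np.eye(n)])
    Z0 = (1.0 - eps) * verts + eps * verts.mean(axis=0)

    # y^0: barycentric coordinates of d in S^0 (sum y_i z_i = d, sum y_i = 1)
    A = np.vstack([Z0.T, np.ones(n + 1)])
    y0 = np.linalg.solve(A, np.append(d, 1.0))

    theta0 = nu0 / y0                                  # theta_i^0 = nu / y_i^0
    X0 = np.column_stack([Z0, 1.0 - Z0.sum(axis=1)])   # x_i^0 = Pi(z_i^0)
    lam0 = -np.mean([grad_g(x) for x in X0], axis=0)   # least-squares lambda^0
    return Z0, y0, theta0, lam0

grad_g = lambda x: 1.0 + np.log(x)   # gradient of the toy ideal-mixing GFE
Z0, y0, theta0, lam0 = initialize(np.array([0.3, 0.2]), grad_g)
print(y0, Z0.T @ y0)                 # y^0 > 0 and sum_i y_i^0 z_i^0 = d
```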

3.4 Solution Method for KKT System

The linear system (22)-(25) for the Newton iteration is a complex form of linear KKT system. A large variety of methods can be found in the literature for solving linear KKT systems of the form

$$\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} c \\ d \end{pmatrix}. \qquad (27)$$

Among them may be mentioned null-space methods (Refs. ) and range-space methods based on the Schur complement $S = A H^{-1} A^T$ (Refs. ). The null-space method is based on the following result: if $A$ has full row rank, $Z$ is a null-space matrix of $A$, and the reduced Hessian $Z^T H Z$ is positive definite, then the KKT system (27) has a unique solution. On the other hand, the importance of the Schur complement is evident from the block factorization

$$\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ A H^{-1} & I \end{pmatrix} \begin{pmatrix} H & 0 \\ 0 & -A H^{-1} A^T \end{pmatrix} \begin{pmatrix} I & H^{-1} A^T \\ 0 & I \end{pmatrix},$$

provided that $H$ is not singular and that $A$ has full row rank. The matrix of the KKT system related to (22)-(25) may be written in the form (27), with the block decompositions

$$H = \begin{pmatrix} L & \Lambda & Y \\ \Lambda^T & S & X \\ Y^T & X^T & 0 \end{pmatrix}, \qquad A = (e^T, 0, 0) \in \mathbb{R}^{1 \times (n_s \pi + \pi + n_s)},$$

where

$$L = \mathrm{diag}(y_\alpha \nabla^2 g(x_\alpha)) \in \mathbb{R}^{n_s \pi \times n_s \pi}, \quad \Lambda = \mathrm{diag}(\nabla g(x_\alpha) + \lambda) \in \mathbb{R}^{n_s \pi \times \pi}, \quad Y = \mathrm{diag}(y_\alpha I_{n_s}) \in \mathbb{R}^{n_s \pi \times n_s},$$

$$X = (x_1, \ldots, x_\pi) \in \mathbb{R}^{\pi \times n_s}, \qquad S = \mathrm{diag}(\nu / y_\alpha) \in \mathbb{R}^{\pi \times \pi},$$

and $I_{n_s}$ is the identity matrix of size $n_s \times n_s$. Because of the singularity relation (3) of $\nabla^2 g(x_\alpha)$, techniques based on directly computing the Schur complement or its inverse cannot be applied to the solution of the linear system (22)-(25).
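For reference, the generic range-space approach applied to (27) looks as follows, under the assumption that $H$ is symmetric positive definite and $A$ has full row rank (which, as just noted, fails here until the system is deflated). This is a minimal sketch, not the solver developed in this section.

```python
import numpy as np

def solve_kkt(H, A, c, d):
    """Range-space solution of (27) via the Schur complement S = A H^{-1} A^T."""
    L = np.linalg.cholesky(H)
    Hsolve = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    S = A @ Hsolve(A.T)
    q = np.linalg.solve(S, A @ Hsolve(c) - d)  # second block row of (27)
    p = Hsolve(c - A.T @ q)                    # first block row of (27)
    return p, q

# consistency check on random data
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
H = M @ M.T + 4.0 * np.eye(4)                  # symmetric positive definite
A = rng.standard_normal((2, 4))                # full row rank
c, d = rng.standard_normal(4), rng.standard_normal(2)
p, q = solve_kkt(H, A, c, d)
print(np.allclose(H @ p + A.T @ q, c), np.allclose(A @ p, d))  # True True
```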

Hence, a technique of deflating $\nabla^2 g(x_\alpha)$ is applied to transform the linear system (22)-(25) so that the singularity no longer poses a difficulty. More precisely, the idea is to project the system (22)-(25) onto the null space of $e^T$, so that the corresponding reduced Hessian $Z_e^T \nabla^2 g(x_\alpha) Z_e$ is not singular. The reduced Hessian $Z_e^T \nabla^2 g(x_\alpha) Z_e$ is positive definite in a neighborhood of a stable equilibrium, thanks to (15). The reduced system with positive definite Hessian then allows us to apply the Schur complement for its solution. The procedure for the reduction of the system (22)-(25) is described in the sequel.

Lemma 3.2. The linear system (22)-(25) is equivalent to

$$y_\alpha \nabla^2_{z_\alpha} f\, p_{z_\alpha} + (\nabla_{z_\alpha} f + \eta)\, p_{y_\alpha} + y_\alpha p_\eta = b_{z_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (28)$$

$$(\nabla_{z_\alpha} f + \eta)^T p_{z_\alpha} + z_\alpha^T p_\eta + (e^T x_\alpha) p_\gamma + \theta_\alpha y_\alpha^{-1} p_{y_\alpha} = b_{y_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (29)$$

$$\sum_{\alpha=1}^{\pi} (y_\alpha p_{z_\alpha} + z_\alpha p_{y_\alpha}) = b_\eta, \qquad (30)$$

$$\sum_{\alpha=1}^{\pi} (e^T x_\alpha) p_{y_\alpha} = b_\gamma, \qquad (31)$$

where

$$b_{z_\alpha} = -y_\alpha \nabla_{z_\alpha} f - y_\alpha \eta - y_\alpha Z_e^T \nabla^2 g(x_\alpha) p^0_{x_\alpha} = -y_\alpha \nabla_{z_\alpha} f - y_\alpha \eta - y_\alpha (1 - e^T x_\alpha) \bigl( \nabla^2_{1:n,n_s} g(x_\alpha) - \nabla^2_{n_s,n_s} g(x_\alpha)\, e \bigr),$$

$$b_{y_\alpha} = -g(x_\alpha) - x_\alpha^T \lambda + \nu y_\alpha^{-1} - (\nabla g(x_\alpha) + \lambda)^T p^0_{x_\alpha} = -g(x_\alpha) - x_\alpha^T \lambda + \nu y_\alpha^{-1} - (1 - e^T x_\alpha) \bigl( \partial_{n_s} g(x_\alpha) + \lambda_{n_s} \bigr),$$

$$b_\eta = c - \sum_{\alpha=1}^{\pi} y_\alpha z_\alpha, \qquad b_\gamma = e^T b - \sum_{\alpha=1}^{\pi} y_\alpha.$$

Proof. Let us first eliminate one component of $p_{x_\alpha}$, say the $n_s$-th one, from each phase $\alpha$, by using (24) to express $p_{x_\alpha}$ in terms of the reduced variable $p_{z_\alpha} \in \mathbb{R}^n$ (with $n = n_s - 1$):

$$p_{x_\alpha} = p^0_{x_\alpha} + Z_e\, p_{z_\alpha}, \qquad (32)$$

where $p^0_{x_\alpha}$ is a particular solution of (24) and $Z_e$ is a null-space matrix of $e^T$, defined by

$$p^0_{x_\alpha} = \begin{pmatrix} 0 \\ 1 - e^T x_\alpha \end{pmatrix}, \qquad Z_e = \begin{pmatrix} I_n \\ -e^T \end{pmatrix}.$$

Let us define

$$\nabla_{z_\alpha} f = Z_e^T \nabla g(x_\alpha), \qquad \nabla^2_{z_\alpha} f = Z_e^T \nabla^2 g(x_\alpha) Z_e, \qquad \eta = Z_e^T \lambda,$$

where $\nabla_{z_\alpha} f$ and $\nabla^2_{z_\alpha} f$ are the reduced gradient and reduced Hessian of $g$, respectively. The stability criterion (15) implies the positive definiteness of the reduced Hessian $\nabla^2_{z_\alpha} f$ (at least in a neighborhood of a stable equilibrium). Let us define

$$T = \begin{pmatrix} I_n & 0 \\ e^T & 1 \end{pmatrix}.$$

Hence, by definition of $T$,

$$T x_\alpha = \begin{pmatrix} z_\alpha \\ e^T x_\alpha \end{pmatrix}, \qquad T b = \begin{pmatrix} c \\ e^T b \end{pmatrix}, \qquad T Z_e = \begin{pmatrix} I_n \\ 0 \end{pmatrix}, \qquad T^{-T} p_\lambda = \begin{pmatrix} p_\eta \\ p_\gamma \end{pmatrix},$$

so that

$$x_\alpha^T p_\lambda = (T x_\alpha)^T T^{-T} p_\lambda = z_\alpha^T p_\eta + (e^T x_\alpha) p_\gamma,$$

where $z_\alpha$ and $c$ are the vectors consisting of the first $n$ elements of $x_\alpha$ and $b$, respectively, $p_\eta = Z_e^T p_\lambda$, and $p_\gamma$ is the $n_s$-th element of $p_\lambda$, i.e., $p_\lambda = (p_\eta + p_\gamma e, p_\gamma)^T$. The linear KKT system (22)-(25) is then modified by (i) replacing $p_{x_\alpha}$ in terms of $p_{z_\alpha}$ using (32), (ii) multiplying (22) by $Z_e^T$, and (iii) multiplying (25) by $T$, which yields (28)-(31).

In order to perform the following solution method, let us make or recall the following assumptions:

(H2) The reduced Hessian $y_\alpha \nabla^2_{z_\alpha} f$ is positive definite, for all $\alpha = 1, \ldots, \pi$;

(H3) The concentration vectors $z_\alpha$, $\alpha = 1, \ldots, \pi$, are affinely independent.

Assumptions (H2)-(H3) are required by the algorithm to ensure that the primal-dual algorithm converges to a stable equilibrium rather than to any other first-order optimality point such as a maximum, a saddle point, or an unstable local minimum (Ref. 8). Thus, if (H2) is not satisfied, it is enforced by convexifying the reduced Hessian: i.e., when $H_\alpha = \nabla^2_{z_\alpha} f$ is not sufficiently positive definite, $H_\alpha$ is replaced by a modification $\bar{H}_\alpha$ that is sufficiently positive definite with bounded condition number. A straightforward way to determine $\bar{H}_\alpha$ is to form a modified Cholesky factorization of $\nabla^2_{z_\alpha} f$, as described in (Ref. 24). If (H3) is not satisfied, the affinely dependent mole-fraction concentration vectors $z_\alpha$, which represent the same mixture of components, are deleted in order to keep an affinely independent set of vectors $z_\alpha$.

Theorem 3.2. Under assumptions (H2) and (H3), the linear system (28)-(31) is solvable in a neighborhood of a KKT point.

Proof. The above reduced system (28)-(31) can now be solved by range-space methods based on the Schur complement. The following proof is constructive and permits an efficient solution of (28)-(31). Under assumption (H2), the direction $p_{z_\alpha}$ is first eliminated from the system (28)-(31) by using equations (28). It follows that

$$p_{z_\alpha} = y_\alpha^{-1} H_\alpha^{-1} \bigl( b_{z_\alpha} - (\nabla_{z_\alpha} f + \eta)\, p_{y_\alpha} - y_\alpha p_\eta \bigr). \qquad (33)$$

The resulting Schur complement system is

$$S_\eta\, p_\eta + \sum_{\alpha=1}^{\pi} v_\alpha\, p_{y_\alpha} = d_\eta, \qquad (34)$$

$$v_\alpha^T p_\eta - (e^T x_\alpha)\, p_\gamma - y_\alpha^{-1} (\theta_\alpha - w_\alpha)\, p_{y_\alpha} = d_{y_\alpha}, \quad \alpha = 1, \ldots, \pi, \qquad (35)$$

$$\sum_{\alpha=1}^{\pi} (e^T x_\alpha)\, p_{y_\alpha} = b_\gamma, \qquad (36)$$

where

$$S_\eta = \sum_{\alpha=1}^{\pi} y_\alpha H_\alpha^{-1} > 0, \qquad v_\alpha = H_\alpha^{-1} (\nabla_{z_\alpha} f + \eta) - z_\alpha, \qquad w_\alpha = (\nabla_{z_\alpha} f + \eta)^T H_\alpha^{-1} (\nabla_{z_\alpha} f + \eta),$$

$$d_\eta = \sum_{\alpha=1}^{\pi} H_\alpha^{-1} b_{z_\alpha} - b_\eta, \qquad d_{y_\alpha} = y_\alpha^{-1} (\nabla_{z_\alpha} f + \eta)^T H_\alpha^{-1} b_{z_\alpha} - b_{y_\alpha}.$$

Under assumption (H2) again, the direction $p_\eta$ is now eliminated from the above linear system (34)-(36) by solving for the Schur complement $S_\eta$ in equation (34):

$$p_\eta = S_\eta^{-1} \Bigl( d_\eta - \sum_{\alpha=1}^{\pi} v_\alpha\, p_{y_\alpha} \Bigr), \qquad (37)$$

which, in turn, gives the resulting system

$$S_y\, p_y + u\, p_\gamma = h_y, \qquad (38)$$

$$u^T p_y = b_\gamma, \qquad (39)$$

where

$$S_y = V^T S_\eta^{-1} V + \mathrm{diag}\bigl( y_\alpha^{-1} (\theta_\alpha - w_\alpha) \bigr), \quad \text{with } V = (v_\alpha) \in \mathbb{R}^{n \times \pi},$$

$$u = X^T e, \quad \text{with } X = (x_\alpha) \in \mathbb{R}^{n_s \times \pi},$$

$$h_y = (h_{y_\alpha}) \in \mathbb{R}^\pi, \quad \text{with } h_{y_\alpha} = v_\alpha^T S_\eta^{-1} d_\eta - d_{y_\alpha}, \qquad p_y = (p_{y_\alpha}) \in \mathbb{R}^\pi.$$

In a neighborhood of a KKT point, where $\nabla_{z_\alpha} f + \eta \approx 0$, $w_\alpha \approx 0$ and $V \approx (-z_\alpha) =: -Z$, the Schur complement $S_y$ is approximately equal to

$$S_y \approx Z^T S_\eta^{-1} Z + \mathrm{diag}\bigl( y_\alpha^{-1} \theta_\alpha \bigr).$$

Assumption (H3) implies that $Z = (z_\alpha) \in \mathbb{R}^{n \times \pi}$ has full column rank if $\pi < n_s$, which implies that $S_y$ is positive definite. But the Schur complement $S_y$ is singular if $\pi = n_s$, i.e., if the number of phases considered is equal to the number of substances in the system. In that case, the range-space method cannot be used for the solution of (38)-(39), and hence a null-space method is performed. One component of $p_y$, say the $\pi$-th one, is first eliminated by using equation (39) to express $p_y$ in terms of the reduced variable $p_{\tilde{y}} \in \mathbb{R}^{\pi-1}$:

$$p_y = p^0_y + Z_u\, p_{\tilde{y}}, \qquad (40)$$

where $p^0_y$ is a particular solution of (39) and $Z_u$ is a null-space matrix of $u^T$, defined here by

$$p^0_y = \begin{pmatrix} 0 \\ u_\pi^{-1} b_\gamma \end{pmatrix}, \qquad Z_u = \begin{pmatrix} I_{\pi-1} \\ -u_\pi^{-1} u_{1:\pi-1}^T \end{pmatrix}.$$

Then, $p_y$ is replaced by $p_{\tilde{y}}$ in equation (38) and (38) is multiplied by $Z_u^T$, giving the following reduced system:

$$Z_u^T S_y Z_u\, p_{\tilde{y}} = Z_u^T h_y - Z_u^T S_y\, p^0_y. \qquad (41)$$

The reduced Schur complement $Z_u^T S_y Z_u$ is positive definite, at least in a neighborhood of a KKT point where $e^T x_\alpha = 1$ (so that $u = X^T e \in \mathbb{R}^\pi$ is equal to $e \in \mathbb{R}^\pi$ and $Z_u = Z_e$). It follows that

$$Z_u^T S_y Z_u \approx Z_e^T Z^T S_\eta^{-1} Z Z_e + Z_e^T \mathrm{diag}\bigl( y_\alpha^{-1} \theta_\alpha \bigr) Z_e,$$

which is positive definite if the matrix $Z Z_e = (z_1 - z_\pi, \ldots, z_{\pi-1} - z_\pi)$ has full column rank, i.e., under assumption (H3). Then $p_{\tilde{y}}$ is obtained by solving (41),

$$p_{\tilde{y}} = \bigl[ Z_u^T S_y Z_u \bigr]^{-1} \bigl( Z_u^T h_y - Z_u^T S_y\, p^0_y \bigr), \qquad (42)$$

and the system (22)-(25) is solvable.

In summary, the linear system (22)-(25) is solved efficiently by obtaining successively $p_{\tilde{y}}$ by solving (42) with a Cholesky factorization, which gives $p_y$ with (40). Then the direction $p_\gamma$ is obtained by solving (38)-(39) by computing a left inverse $u^{-1}$ of $u$. The variables $p_\eta$, $p_{z_\alpha}$ and $p_{x_\alpha}$ are then obtained with (37), (33) and (32) successively, (33) being solved again with a Cholesky factorization.
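The Cholesky factorizations above presuppose (H2). The sketch below illustrates the kind of modification used to enforce it; eigenvalue flooring is an assumed stand-in for the modified Cholesky factorization of (Ref. 24), with `floor` an illustrative constant.

```python
import numpy as np

def convexify(H, floor=1e-6):
    """Return a sufficiently positive definite modification of H, as in (H2).

    Symmetrize, then raise all eigenvalues below `floor` to `floor`, so that
    a Cholesky factorization of the result exists."""
    w, V = np.linalg.eigh(0.5 * (H + H.T))
    return (V * np.maximum(w, floor)) @ V.T

H = np.array([[2.0, 0.0],
              [0.0, -1.0]])            # indefinite reduced Hessian
H_bar = convexify(H)
print(np.linalg.eigvalsh(H_bar))       # all eigenvalues >= floor
np.linalg.cholesky(H_bar)              # factorization now succeeds
```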

The new iterate is given by

$$y^+_\alpha = y_\alpha + \tau p_{y_\alpha}, \quad \alpha \in \mathcal{A}, \qquad (43a)$$
$$z^+_\alpha = z_\alpha + \tau p_{z_\alpha}, \quad \alpha \in \mathcal{A}, \qquad (43b)$$
$$\zeta^+_\alpha = \zeta_\alpha + \tau p_{\zeta_\alpha}, \quad \alpha \in \mathcal{A}, \qquad (43c)$$
$$\eta^+ = \eta + \tau p_\eta, \qquad (43d)$$
$$\gamma^+ = \gamma + \tau p_\gamma, \qquad (43e)$$

where $\tau$ is a step length, bounded by one and given by

$$\tau = \max\{ \tau \leq 1 : y_\alpha + \tau p_{y_\alpha} \geq 0, \ \alpha \in \mathcal{A} \}. \qquad (44)$$

For the phases $\alpha \notin \mathcal{A}$ (inactive phases), the multiplier $\theta^+_\alpha$ is computed by $\theta^+_\alpha = g(x^+_\alpha) + (x^+_\alpha)^T \lambda^+$, and the inactive phase $\alpha$ is set to be active for the next iteration if $\theta^+_\alpha < 0$.

4 Numerical Results

The primal-dual interior-point algorithm has been implemented in UHAERO, a general thermodynamic model designed to predict efficiently and accurately the phase and composition of atmospheric aerosols under a wide range of atmospheric conditions. Various numerical examples of phase equilibrium problems are considered here to illustrate the efficiency of the algorithm.

The first example is widely studied in the literature as the n-Butyl Acetate-Water system; see for instance (Ref. 5). This system involves two components (C$_6$H$_{12}$O$_2$ and H$_2$O), thus a maximum of two (liquid) phases at equilibrium ($n_s = 2$, $\pi \leq 2$). The temperature is 298 K, while the pressure equals 1 atm. The molar Gibbs free energy is obtained using the UNIFAC model (Ref. 27). The reduced molar Gibbs free energy $f(\cdot)$ is illustrated as a function of one variable in Fig. 1. It can be seen that $f$ has two local minima corresponding to the two regions in which $f$ is convex. Note that the convex region associated with the pure component C$_6$H$_{12}$O$_2$ is very small, which renders the computation very difficult.

For this example, the penalty parameter $\nu$ is set to zero, while the Newton method is performed with modifications of the active set $\mathcal{A}$ until convergence is obtained. The method is thus effectively an active set method.

Figure 1. n-Butyl Acetate-Water System. Left: equilibrium solution for $b = (0.02, 0.98)^T$ (one phase); right: equilibrium solution for $b = (0.4, 0.6)^T$ (two phases). Both panels plot the Gibbs free energy against composition, from pure H$_2$O to pure C$_6$H$_{12}$O$_2$. Active phases are marked with a circle symbol, while a cross symbol indicates an inactive phase. The position of the feed vector is given by the arrow.

The initial guess is $x_1 = (\epsilon, 1 - \epsilon)$, $x_2 = (1 - \epsilon, \epsilon)$, for a given small $\epsilon > 0$. This corresponds to the nearly pure components. The algorithm converges if the relative discrepancy between two consecutive iterates is smaller than a given tolerance. The active phase identification procedure consists in deactivating the phase $\alpha$ if $y_\alpha$ falls below a given threshold. For this example, a phase that is deactivated is never reactivated. At $b = (0.02, 0.98)^T$, the equilibrium state involves only one phase. At $b = (0.4, 0.6)^T$, the equilibrium state involves two phases. In Fig. 1, the converging hyperplane associated with the KKT points is depicted: in the first case, it is not activated (dotted line), while it is activated when two phases are involved at the equilibrium (plain line).

Table 2 gives the equilibrium solutions for two different cases, which can be compared with (Ref. 5). In the first case ($b = (0.02, 0.98)^T$), only one phase exists at the equilibrium and the equilibrium composition is given by $x_1 = b$; the algorithm converges in 9 iterations. In the second case ($b = (0.4, 0.6)^T$), two phases appear at the equilibrium and the algorithm converges in 10 iterations.
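What this example computes can be mimicked by a direct minimization of (4) with $\pi = 2$. The sketch below uses the same assumed regular-solution GFE as in the earlier sketches as a stand-in for the UNIFAC model of the example, and recovers a one-phase or two-phase split depending on the feed:

```python
import numpy as np
from scipy.optimize import minimize

A = 2.5  # assumed interaction parameter (illustrative stand-in for UNIFAC)

def f(z):
    return z * np.log(z) + (1 - z) * np.log(1 - z) + A * z * (1 - z)

def total_gfe(v):            # v = (y1, z1, z2), with y2 = 1 - y1
    y1, z1, z2 = v
    return y1 * f(z1) + (1.0 - y1) * f(z2)

def equilibrium(d):
    cons = ({"type": "eq",   # mass balance (4c): y1*z1 + y2*z2 = d
             "fun": lambda v: v[0] * v[1] + (1.0 - v[0]) * v[2] - d},)
    bounds = [(0.0, 1.0), (1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6)]
    res = minimize(total_gfe, x0=[0.5, 0.05, 0.95],
                   bounds=bounds, constraints=cons)
    return res.x

print(equilibrium(0.4))   # two distinct phases sharing the feed
print(equilibrium(0.02))  # single phase: the split degenerates
                          # (y1 -> 0 or 1, or z1 = z2 = d)
```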

Table 2. n-Butyl Acetate-Water System: equilibrium solutions (mole fractions of C$_6$H$_{12}$O$_2$ and H$_2$O in the liquid phases I and II, together with the total numbers of moles $y_\alpha$) for the two feed vectors $b = (0.02, 0.98)^T$ and $b = (0.4, 0.6)^T$.

The second example, extracted from (Ref. 27), is a liquid-liquid equilibrium calculation, namely the Water-n-Propanol-n-Hexane mixture. The system involves three components (H$_2$O, C$_3$H$_7$OH and C$_6$H$_{14}$), thus a maximum of three phases at equilibrium. The temperature in the system is 211 K and the pressure is 1 atm. The Gibbs free energy $g$ is computed with the UNIFAC model, and the reduced Gibbs free energy $f$ shows three convex regions on $\Delta_2$, as illustrated in Fig. 2, where the level lines of the determinant of the Hessian $\nabla^2 f$ are depicted.

The penalty parameter $\nu$ is initialized here by a value $\nu^0$ of the order of $10^{-3}$. One iteration of the Newton method is performed for each value $\nu_k$. The penalty parameter is updated with the rule $\nu_{k+1} = 0.7\, \nu_k$. Let $\epsilon > 0$ be a given small positive parameter. The initial guess is $x_1 = (\epsilon, 1 - 2\epsilon, \epsilon)$, $x_2 = (1 - 2\epsilon, \epsilon, \epsilon)$, $x_3 = (\epsilon, \epsilon, 1 - 2\epsilon)$, which are close to the vertices of $\Delta_2$; see Fig. 2. The method converges if the relative discrepancy between two consecutive iterates, corresponding to $\nu_k$ and $\nu_{k+1}$, is smaller than a tolerance of the order of $10^{-3}$. The active phase identification procedure consists in deactivating the phase $\alpha$ if $y_\alpha$ falls below a given threshold. Again, a phase that is deactivated is never reactivated for this example.

To illustrate the sensitivity of the phase equilibrium calculation to the UNIFAC parametrization, two different sets of parameters are taken from (Refs. 27-28). Table 3 and Fig. 3 show the optimal solution for $b = (0.5, 0.1, 0.25)^T$ (Ref. 5) for these two parameterizations. Note that the topology of the reduced Gibbs free energy changes between the two parameterizations. The main feature of the algorithm is the fast and accurate recognition of the active phases at the equilibrium. Figure 4 illustrates the rate of convergence of the total number of moles $y_\alpha$, $\alpha = 1, 2, 3$, for the two examples of Fig. 3. Results show that the detection of inactive phases ($y_\alpha = 0$) is fast.

Figure 2. Water-n-Propanol-n-Hexane System. Contour plot of the determinant of the Hessian of the Gibbs free energy over the composition simplex (vertices H$_2$O, C$_3$H$_7$OH, C$_6$H$_{14}$). There are three convex regions, each of them containing one vertex of $\Delta_2$.

Table 3. Water-n-Propanol-n-Hexane System. Equilibrium phases (feed and compositions of the liquid phases I, II and III, with the total numbers of moles $y_\alpha$) and value of the Gibbs free energy (GFE) of the optimal solution for a given feed vector $b = (0.5, 0.1, 0.25)^T$ (normalized), using two different sets of parameters for the UNIFAC model (extracted from (Ref. 27) and (Ref. 28), respectively). With the parameters of (Ref. 28), the third phase is inactive ($y_3 = 0$).


More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Interior Point Methods in Mathematical Programming

Interior Point Methods in Mathematical Programming Interior Point Methods in Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Brazil Journées en l honneur de Pierre Huard Paris, novembre 2008 01 00 11 00 000 000 000 000

More information

Interior Point Algorithms for Constrained Convex Optimization

Interior Point Algorithms for Constrained Convex Optimization Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Interior Point Methods for Mathematical Programming

Interior Point Methods for Mathematical Programming Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained

More information

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Computational Finance

Computational Finance Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

An Inexact Newton Method for Optimization

An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III

GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III CONVEX ANALYSIS NONLINEAR PROGRAMMING THEORY NONLINEAR PROGRAMMING ALGORITHMS

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

PATTERN SEARCH METHODS FOR LINEARLY CONSTRAINED MINIMIZATION

PATTERN SEARCH METHODS FOR LINEARLY CONSTRAINED MINIMIZATION PATTERN SEARCH METHODS FOR LINEARLY CONSTRAINED MINIMIZATION ROBERT MICHAEL LEWIS AND VIRGINIA TORCZON Abstract. We extend pattern search methods to linearly constrained minimization. We develop a general

More information

First-order optimality conditions for mathematical programs with second-order cone complementarity constraints

First-order optimality conditions for mathematical programs with second-order cone complementarity constraints First-order optimality conditions for mathematical programs with second-order cone complementarity constraints Jane J. Ye Jinchuan Zhou Abstract In this paper we consider a mathematical program with second-order

More information

Support Vector Machines for Regression

Support Vector Machines for Regression COMP-566 Rohan Shah (1) Support Vector Machines for Regression Provided with n training data points {(x 1, y 1 ), (x 2, y 2 ),, (x n, y n )} R s R we seek a function f for a fixed ɛ > 0 such that: f(x

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Largest dual ellipsoids inscribed in dual cones

Largest dual ellipsoids inscribed in dual cones Largest dual ellipsoids inscribed in dual cones M. J. Todd June 23, 2005 Abstract Suppose x and s lie in the interiors of a cone K and its dual K respectively. We seek dual ellipsoidal norms such that

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information

Chap 2. Optimality conditions

Chap 2. Optimality conditions Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

Operations Research Lecture 4: Linear Programming Interior Point Method

Operations Research Lecture 4: Linear Programming Interior Point Method Operations Research Lecture 4: Linear Programg Interior Point Method Notes taen by Kaiquan Xu@Business School, Nanjing University April 14th 2016 1 The affine scaling algorithm one of the most efficient

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

A PRIMAL-DUAL ACTIVE-SET METHOD AND ALGORITHM FOR CHEMICAL EQUILIBRIUM PROBLEM RELATED TO MODELING OF ATMOSPHERIC INORGANIC AEROSOLS

A PRIMAL-DUAL ACTIVE-SET METHOD AND ALGORITHM FOR CHEMICAL EQUILIBRIUM PROBLEM RELATED TO MODELING OF ATMOSPHERIC INORGANIC AEROSOLS A PRIMAL-DUAL ACTIVE-SET METHOD AND ALGORITHM FOR CHEMICAL EQUILIBRIUM PROBLEM RELATED TO MODELING OF ATMOSPHERIC INORGANIC AEROSOLS A Dissertation Presented to the Faculty of the Department of Department

More information

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Appendix A Taylor Approximations and Definite Matrices

Appendix A Taylor Approximations and Definite Matrices Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

More information

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lectures 9 and 10: Constrained optimization problems and their optimality conditions Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Nonlinear optimization

Nonlinear optimization Nonlinear optimization Anders Forsgren Optimization and Systems Theory Department of Mathematics Royal Institute of Technology (KTH) Stockholm, Sweden evita Winter School 2009 Geilo, Norway January 11

More information

A ten page introduction to conic optimization

A ten page introduction to conic optimization CHAPTER 1 A ten page introduction to conic optimization This background chapter gives an introduction to conic optimization. We do not give proofs, but focus on important (for this thesis) tools and concepts.

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global and local convergence results

More information

A Brief Review on Convex Optimization

A Brief Review on Convex Optimization A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

An Inexact Newton Method for Nonlinear Constrained Optimization

An Inexact Newton Method for Nonlinear Constrained Optimization An Inexact Newton Method for Nonlinear Constrained Optimization Frank E. Curtis Numerical Analysis Seminar, January 23, 2009 Outline Motivation and background Algorithm development and theoretical results

More information

LINEAR AND NONLINEAR PROGRAMMING

LINEAR AND NONLINEAR PROGRAMMING LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

Math 5593 Linear Programming Week 1

Math 5593 Linear Programming Week 1 University of Colorado Denver, Fall 2013, Prof. Engau 1 Problem-Solving in Operations Research 2 Brief History of Linear Programming 3 Review of Basic Linear Algebra Linear Programming - The Story About

More information

AN INTERIOR-POINT METHOD FOR NONLINEAR OPTIMIZATION PROBLEMS WITH LOCATABLE AND SEPARABLE NONSMOOTHNESS

AN INTERIOR-POINT METHOD FOR NONLINEAR OPTIMIZATION PROBLEMS WITH LOCATABLE AND SEPARABLE NONSMOOTHNESS AN INTERIOR-POINT METHOD FOR NONLINEAR OPTIMIZATION PROBLEMS WITH LOCATABLE AND SEPARABLE NONSMOOTHNESS MARTIN SCHMIDT Abstract. Many real-world optimization models comse nonconvex and nonlinear as well

More information

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Ryan M. Rifkin Google, Inc. 2008 Plan Regularization derivation of SVMs Geometric derivation of SVMs Optimality, Duality and Large Scale SVMs The Regularization Setting (Again)

More information

POWER SYSTEMS in general are currently operating

POWER SYSTEMS in general are currently operating TO APPEAR IN IEEE TRANSACTIONS ON POWER SYSTEMS 1 Robust Optimal Power Flow Solution Using Trust Region and Interior-Point Methods Andréa A. Sousa, Geraldo L. Torres, Member IEEE, Claudio A. Cañizares,

More information

A Second Full-Newton Step O(n) Infeasible Interior-Point Algorithm for Linear Optimization

A Second Full-Newton Step O(n) Infeasible Interior-Point Algorithm for Linear Optimization A Second Full-Newton Step On Infeasible Interior-Point Algorithm for Linear Optimization H. Mansouri C. Roos August 1, 005 July 1, 005 Department of Electrical Engineering, Mathematics and Computer Science,

More information

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization Iranian Journal of Operations Research Vol. 4, No. 1, 2013, pp. 88-107 Research Note A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization B. Kheirfam We

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization

An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization An Infeasible Interior-Point Algorithm with full-newton Step for Linear Optimization H. Mansouri M. Zangiabadi Y. Bai C. Roos Department of Mathematical Science, Shahrekord University, P.O. Box 115, Shahrekord,

More information

The Karush-Kuhn-Tucker (KKT) conditions

The Karush-Kuhn-Tucker (KKT) conditions The Karush-Kuhn-Tucker (KKT) conditions In this section, we will give a set of sufficient (and at most times necessary) conditions for a x to be the solution of a given convex optimization problem. These

More information

Differentiable exact penalty functions for nonlinear optimization with easy constraints. Takuma NISHIMURA

Differentiable exact penalty functions for nonlinear optimization with easy constraints. Takuma NISHIMURA Master s Thesis Differentiable exact penalty functions for nonlinear optimization with easy constraints Guidance Assistant Professor Ellen Hidemi FUKUDA Takuma NISHIMURA Department of Applied Mathematics

More information