Optimality Functions and Lopsided Convergence

Similar documents
Lopsided Convergence: an Extension and its Quantification

Lopsided Convergence: an Extension and its Quantification

VARIATIONAL THEORY FOR OPTIMIZATION UNDER STOCHASTIC AMBIGUITY

Optimization and Optimal Control in Banach Spaces

Stability of efficient solutions for semi-infinite vector optimization problems

MULTIVARIATE EPI-SPLINES AND EVOLVING FUNCTION IDENTIFICATION PROBLEMS

New Class of duality models in discrete minmax fractional programming based on second-order univexities

AW -Convergence and Well-Posedness of Non Convex Functions

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

Optimality Conditions for Constrained Optimization

Epiconvergence and ε-subgradients of Convex Functions

Subdifferential representation of convex functions: refinements and applications

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

WEAK LOWER SEMI-CONTINUITY OF THE OPTIMAL VALUE FUNCTION AND APPLICATIONS TO WORST-CASE ROBUST OPTIMAL CONTROL PROBLEMS

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

DUALIZATION OF SUBGRADIENT CONDITIONS FOR OPTIMALITY

ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION

Rate of Convergence Analysis of Discretization and. Smoothing Algorithms for Semi-Infinite Minimax. Problems

Joint work with Nguyen Hoang (Univ. Concepción, Chile) Padova, Italy, May 2018

A Brøndsted-Rockafellar Theorem for Diagonal Subdifferential Operators

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

Extremal Solutions of Differential Inclusions via Baire Category: a Dual Approach

A quasisecant method for minimizing nonsmooth functions

NONSMOOTH ANALYSIS AND PARAMETRIC OPTIMIZATION. R. T. Rockafellar*

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

A convergence result for an Outer Approximation Scheme

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

REPORT DOCUMENTATION PAGE

On duality theory of conic linear problems

Constrained Optimization and Lagrangian Duality

Finite Convergence for Feasible Solution Sequence of Variational Inequality Problems

On Solving Large-Scale Finite Minimax Problems. using Exponential Smoothing

Journal of Inequalities in Pure and Applied Mathematics

Solving generalized semi-infinite programs by reduction to simpler problems.

Implications of the Constant Rank Constraint Qualification

Keywords. 1. Introduction.

On Nonconvex Subdifferential Calculus in Banach Spaces 1

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

Introduction to Correspondences

arxiv: v2 [cs.gt] 1 Dec 2018

Dedicated to Michel Théra in honor of his 70th birthday

Douglas-Rachford splitting for nonconvex feasibility problems

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints

SOME STABILITY RESULTS FOR THE SEMI-AFFINE VARIATIONAL INEQUALITY PROBLEM. 1. Introduction

ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS

Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces

Relationships between upper exhausters and the basic subdifferential in variational analysis

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis

On smoothness properties of optimal value functions at the boundary of their domain under complete convexity

arxiv: v2 [math.st] 21 Jan 2018

Metric Spaces and Topology

Algorithms for Nonsmooth Optimization

Preprint Stephan Dempe and Patrick Mehlitz Lipschitz continuity of the optimal value function in parametric optimization ISSN

Near-Potential Games: Geometry and Dynamics

LINEAR-CONVEX CONTROL AND DUALITY

Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions. Marc Lassonde Université des Antilles et de la Guyane

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE

COSMIC CONVERGENCE. R.T. Rockafellar Departments of Mathematics and Applied Mathematics University of Washington, Seattle

Stationarity and Regularity of Infinite Collections of Sets. Applications to

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

A Proximal Method for Identifying Active Manifolds

Preprint ISSN

Constructive Proof of the Fan-Glicksberg Fixed Point Theorem for Sequentially Locally Non-constant Multi-functions in a Locally Convex Space

Asymptotics of minimax stochastic programs

Constraint qualifications for convex inequality systems with applications in constrained optimization

On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q)

Yuqing Chen, Yeol Je Cho, and Li Yang

On the Midpoint Method for Solving Generalized Equations

MEAN CURVATURE FLOW OF ENTIRE GRAPHS EVOLVING AWAY FROM THE HEAT FLOW

Convex Functions. Pontus Giselsson

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints.

Neighboring feasible trajectories in infinite dimension

Maximal Monotone Inclusions and Fitzpatrick Functions

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen O, Denmark.

Mathematical Programming Involving (α, ρ)-right upper-dini-derivative Functions

VARIATIONAL CONVERGENCE OF BIFUNCTIONS: MOTIVATING APPLICATIONS

Convex Analysis and Economic Theory Winter 2018

RANDOM APPROXIMATIONS IN STOCHASTIC PROGRAMMING - A SURVEY

Empirical Processes: General Weak Convergence Theory

Chapter 2 Convex Analysis

An inexact subgradient algorithm for Equilibrium Problems

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control

Asymptotic Convergence of the Steepest Descent Method for the Exponential Penalty in Linear Programming

Semi-infinite programming, duality, discretization and optimality conditions

Downloaded 09/27/13 to Redistribution subject to SIAM license or copyright; see

FULL STABILITY IN FINITE-DIMENSIONAL OPTIMIZATION

An introduction to Mathematical Theory of Control

SOME PROPERTIES ON THE CLOSED SUBSETS IN BANACH SPACES

Optimality, identifiability, and sensitivity

General Theory of Large Deviations

Statistical Estimation: Data & Non-data Information

THE UNIQUE MINIMAL DUAL REPRESENTATION OF A CONVEX FUNCTION

Transcription:

Optimality Functions and Lopsided Convergence Johannes O. Royset Operations Research Department Naval Postgraduate School joroyset@nps.edu Roger J-B Wets Department of Mathematics University of California, Davis rjbwets@ucdavis.edu Abstract. Optimality functions pioneered by E. Polak characterize stationary points, quantify the degree with which a point fails to be stationary, and play central roles in algorithm development. For optimization problems requiring approximations, optimality functions can be used to ensure consistency in approximations, with the consequence that optimal and stationary points of the approximate problems indeed are approximately optimal and stationary for an original problem. In this paper, we review the framework and apply it to nonlinear programming. This results in a convergence result for a primal interior point method without constraint qualifications or convexity assumptions. Moreover, we introduce lopsided convergence of bifunctions on metric spaces and show that this notion of convergence is instrumental in establishing consistency of approximations. Lopsided convergence also leads to further characterizations of stationary points under perturbations and approximations. Dedication. We dedicate this paper to our long-time friend, colleague, collaborator, and advisor Elijah (Lucien) Polak in honor of his 85th birthday. We wish him fair weather and following snow conditions. Keywords: epi-convergence, lopsided convergence, consistent approximations, optimality functions, optimality conditions, interior point method AMS Classification: Date: March 16, 015 This material is based upon work supported in part by the U. S. Army Research Laboratory and the U. S. Army Research Office under grant numbers 00101-80683, W911NF-10-1-046 and W911NF-1-1-073. 1

1 Introduction It is well-known that optimality conditions are central to both theoretical and computational advances in optimization. They were developed over centuries starting with the pioneering works of Bishop N. Oresme (14th century) and P. de Fermat (17th century), and brought to their modern form by Karush, John, Kuhn, Tucker, Polak, Mangasarian, Fromowitz, and many others. In this paper, we discuss quantification of first-order necessary optimality conditions in terms of optimality functions as developed by E. Polak and co-authors; see [1] for numerous examples in nonlinear programming, semiinfinte optimization, and optimal control as well as [14, 17, 5, 9] for recent applications in stochastic and semiinfinte programming, nonsmooth optimization, and control of uncertain systems. It is apparent that how far a set of equalities, inequalities, and inclusions are from being satisfied can be quantified in numerous ways. The framework of optimality functions, as laid out in [1, Section 3.3] and references therein, stipulates axiomatic requirements that such quantifications should satisfy to facilitate the study and computation of approximate stationary points. Specifically, for an optimization problem that can only be solved through the solution of an approximating problem, one seeks to determine whether a near-stationary point of the approximating problem is an approximate stationary point of the original problem. The requirements on optimality functions exactly ensure this property. Moreover, there is ample empirical indications and some theoretical evidence (see for example [17, 16, 15, 8]) that computational benefits accrue from approximately solving a sequence of approximating problems with increasing fidelity, each warm-started with the previously obtained point. Optimality functions are tools to carry out such a scheme and give rise to adaptive rules for determining the timing of switches to higher-fidelity approximations. Consequently, the framework of optimality functions provides a pathway to constructing implementable algorithms consisting only of a finite number of arithmetic operations and function evaluations. In this paper, we review the notion of optimality functions and illustrate the vast number of possibilities through several examples. In a novel application to nonlinear programming, we establish the convergence of a primal interior point method in the absence of constraint qualifications and convexity assumptions. We show that lopsided convergence of bifunctions [6, 7] is a useful tool for analyzing optimality functions and the associated stationary points. In particular, we prove that lopsided convergence of certain bifunctions, defining optimality functions of approximating problems, to a bifunction associated with an optimality function of the original problem, guarantees the axiomatic requirements on optimality functions. Moreover, we provide characterizations of stationary points under perturbations and approximations using lopsided convergence. In the process, we extend the primary definitions and results on lopsided convergence in [6, 7] from finite dimensions to any metric space. The paper is organized as follows. Section defines optimality functions and gives several examples. Section 3 introduces approximating optimization problems, epi-convergence, and consistent approximations as defined by corresponding optimality functions, and demonstrates the implication for algorithmic development. Section 4 develops lopsided convergence in the context of metric spaces. The paper ends by utilizing lopsided convergence in the context of optimality functions. The distinction between implementable and conceptual algorithms appears to be due to E. Polak [11, 10].

Optimality Functions: Definitions and Examples We consider optimization problems defined on a metric space (X, ρ X ), where C X is a nonempty feasible set and f : C IR an objective function, i.e., problems of the form minimize f(x) subject to x C X. The function f might be defined and finite-valued outside C, but that will be immaterial to the following treatment. Therefore, the notation f : C IR specifies the components f and C of optimization problems of this form, without implying that f is necessarily finite only on C. We denote by inf C f [, ) and argmin C f C the corresponding optimal value and set of optimal points, respectively, the latter possibly being empty. For ε 0, the set of ε-optimal solutions is denoted by ε- argmin C f = {x C f(x) inf C f + ε}. As usually, we say that x X is locally optimal (for f : C IR) if there exists a δ > 0 such that f(x ) f(x) for all x C with ρ X (x, x ) δ. Throughout the paper, we have that C is a nonempty subset of X and IR = [, 0]. We characterize stationary points in terms of optimality functions as defined next..1 Definition (optimality function) An upper semicontinuous function θ : X IR is an optimality function for f : C IR if C X X and x C locally optimal for f : C IR = θ(x) = 0. The corresponding sets of stationary points and quasi-stationary points are S C,θ = {x C θ(x) = 0} and Q θ = {x X θ(x) = 0}, respectively. A series of examples help illustrate the concept; see also 5 and [1, 14, 17, 5, 9]. Example 1: Constrained Optimization over Convex Set. Consider the case X = IR n, C X closed and convex, and f : C IR continuously differentiable. Then, the function θ(x) = min { f(x), y x + 1 } y y C x, x X = C, satisfies the requirements of Definition.1 and is therefore an optimality function for f : C IR. If C = IR n, then the expression simplifies to which, of course, corresponds to the classical condition f(x) = 0. θ(x) = 1 f(x), (1) Example : Nonlinear Programming. Consider the case X = IR n, C = {x IR n f j (x) 0, j = 1,..., q}, and f, f 1,..., f q real-valued and continuously differentiable on IR n. Let ψ(x) = max,...,q f j (x) 3

and ψ + (x) = max{0, ψ(x)}. Then, the function { θ(x) = min max ψ + (x) + f(x), y x + 1 y IR n y x, max {f j(x) ψ(x) + + f j (x), y x } + 1 y x,...,q }, x X = IR n, () satisfies the requirements of Definition.1 and is therefore an optimality function for f : C IR. The condition θ(x) = 0 is equivalent to the Fritz-John conditions in the sense that when x C, θ(x) = 0 there exist µ 0, µ 1,..., µ q 0, with such that µ 0 f(x) + q µ j = 1, j=0 q µ j f j (x) = 0 q µ j f j (x) = 0. However, since θ is defined beyond C, it might also be associated with quasi-stationary points outside C. We refer to [1, Theorem..8] for proofs and further discussion. Example 3: Minimax Problem. Consider the case X = C = IR n and f(x) = max z Z φ(x, z), x IR n, where φ : IR n IR p IR is continuous, the gradient x φ : IR n IR p IR n with respect to the first argument exists and is continuous in both arguments, and Z is a compact subset of IR p. Then, the function θ(x) = min max y IR n z Z {φ(x, z) f(x) + x φ(x, z), y x + 1 y x }, x X = IR n, (3) satisfies the requirements of Definition.1 and is therefore an optimality function for f : IR n IR. Moreover, θ(x) = 0 if and only if 0 f(x) (the subdifferential of f); see [1, Theorem 3.1.6] for details. We note that the upper semicontinuity of optimality functions ensures the computationally significant property that if a sequence x ν x and θ(x ν ) 0, for example with {x ν } obtained as approximate solutions of a corresponding optimization problem with gradually smaller tolerance, then θ(x) = 0 and x Q θ, i.e., x is quasi-stationary. Although not discussed further here, the optimality functions in Examples 1-3, and others, are also instrumental in constructing descent directions for the respective optimization problems; see [1] for details. 3 Approximations and Implementable Algorithms Problems involving functions defined in terms of integrals or optimization problems (as the maximization in Example 3), functions defined on infinite-dimensional spaces, and/or feasible sets defined by an infinite number of constraints almost always require approximations. For example, one might 4

resort to an approximating space X ν X with points characterized by a finite number of parameters. Here, the superscript ν indicates that we might consider a family of such approximating spaces, ν IN = {1,,..., }, with usually ν IN X ν dense in X. A feasible set C ν X ν may be an approximation of C or simply C ν = C X ν ; see 5 for a concrete illustration in the area of optimal control. A function f ν : C ν IR could be a tractable approximation of f : C IR. An example helps illustrate the situation. Example 3: Minimax Problem (cont.). Suppose that f ν (x) = max z Z ν φ(x, z), x IR n, with Z ν Z consisting of a finite number of points. Clearly, f ν is a (lower bounding) approximation of f = max z Z φ(, z) as defined above. The function f ν : IR n IR can be associated with the optimality function θ ν (x) = min max y IR n z Z ν {φ(x, z) f ν (x) + x φ(x, z), y x + 1 y x }, x X = IR n, which, as formalized in 5, approximates the optimality function θ in (3). We note that θ ν can be evaluated in finite time by solving a convex quadratic program with linear constraints; see [1, Theorem.1.6]. We next examine approximating functions f ν : C ν IR and review the notion of epi-convergence, which provides a path to establishing that optimal points of the corresponding approximating problems indeed approximate optimal points of an original problem. To establish the analogous results for stationary points, we turn to optimality functions and slightly extend the approach in [1, Section 3.3] by considering arbitrary metric spaces and other minor generalizations. The section ends with a result that facilitates the development of implementable algorithms for the minimization of f : C IR, which is then illustrated with the construction of an interior point method. Throughout the paper, we have that C ν is a nonempty subset of X. 3.1 Epi-Convergence We recall that epi-convergence, as defined next, is the key property when examining approximations of optimization problems; see [1, 3, 13] for more comprehensive treatments. 3.1 Definition (epi-convergence) The functions {f ν : C ν IR} ν IN epi-converge to f : C IR if (i) for every sequence x ν x X, with x ν C ν, we have that liminf f ν (x ν ) f(x) if x C and f ν (x ν ) otherwise; (ii) for every x C, there exists a sequence {x ν } ν IN, with x ν C ν, such that x ν x and limsup f ν (x ν ) f(x). A main consequence of epi-convergence is the following well-known result. 3. Theorem (convergence of minimizers) Suppose that the functions {f ν : C ν IR} ν IN epi-converge to f : C IR. Then, limsup (inf C ν f ν ) inf C f. Moreover, if x k argmax C ν k f ν k and x k x for some increasing subsequence {ν 1, ν,...} IN, then x argmax C f and lim k inf C ν k f ν k = inf C f. 5

Proof. The second part is essentially in [, Theorem.5], but not for the finite-valued setting. The first and second parts are in [6, Theorem.6] for the IR n case. The proof carries over essentially verbatim. A strengthening of the notion of epi-convergence ensures the convergence of infima. 3.3 Definition (tight epi-convergence) The functions {f ν : C ν IR} ν IN epi-converge tightly to f : C IR if f ν epi-converge to f and for all ε > 0, there exists a compact set B ε X and an integer ν ε such that inf Bε C ν f ν inf C ν f ν + ε for all ν ν ε. 3.4 Theorem (convergence of infima) Suppose that the functions {f ν : C ν IR} ν IN epi-converge to f : C IR and inf C f is finite. Then, they epi-converge tightly (i) if and only if inf C ν f ν inf C f. (ii) if and only if there exists a sequence ε ν 0 such that ε ν - argmin C ν f ν set-converges to argmin f. C Proof. Again, the proof in [6, Theorem.8] can be immediately translated to the present setting. 3. Consistent Approximations The convergence of optimal points as stipulated above is fundamental, but an analogous result for stationary points is also important, especially for nonconvex problems. Optimality functions play a central role in the development of such results. Combining epi-convergence with a limiting property for optimality functions lead to consistent approximations in the sense of E. Polak as defined next. We note that our definition of consistent approximations is an extension from that in [1, Section 3.3] as we consider arbitrary metric spaces and not only normed linear spaces. 3.5 Definition (consistent approximations) The pairs {(f ν : C ν IR, θ ν : X ν IR )} ν IN of functions and corresponding optimality functions are weakly consistent approximations of the function and optimality-function pair (f : C IR, θ : X IR ) if (i) {f ν : C ν IR} ν IN epi-converge to f : C IR and (ii) for every x ν x X, with x ν X ν, limsup θ ν (x ν ) θ(x) if x X, and θ ν (x ν ) otherwise. If in addition θ ν (x) < 0 for all x X ν \C ν and ν, then {(f ν : C ν IR, consistent approximations of (f : C IR, θ : X IR ). θ ν : X ν IR )} ν IN are We recall that the epigraph of f : C IR is defined by epi f = {(x, x 0 ) X IR x C, f(x) x 0 }. We recall that the outer limit of a sequence of sets {A ν } ν IN, denoted by limsup A ν, is the collection of points y to which a subsequence of {y ν } ν IN, with y ν A ν, converges. The inner limit, denoted by liminf A ν, is the points to which a sequence of {y ν } ν IN, with y ν A ν, converges. If both limits exist and are identical, we say that the set is the Painlevé-Kuratowski limit of {A ν } ν IN and that A ν set-converges to this set; see [4, 13]. 6

Since epi-convergence is equivalent to the set-convergence of the corresponding epigraphs, we have that Definition 3.5(i) amounts to epi f ν set-converges to f. Similarly, the hypograph of f : C IR is defined by hypo f = {(x, x 0 ) X IR x C, f(x) x 0 }. In view of the definition of set-convergence, we therefore have that Definition 3.5(ii) amounts to limsup hypo θ ν hypo θ. The additional condition in Definition 3.5 removing weakly can be viewed as a constraint qualification as it eliminates the possibility of quasi-stationary points that are not stationary point for f ν : C ν IR, which might occur if the domain of θ ν is not restricted to C ν or other conditions are included. The main consequence of consistency is given next. 3.6 Theorem (convergence of stationary points) Suppose that the pairs {(f ν : C ν IR, θ ν : X ν IR )} ν IN are weakly consistent approximations of (f : C IR, θ : X IR ) and {x ν } ν IN, x ν X ν, is a sequence satisfying θ ν (x ν ) ε ν for all ν, with ε ν 0 and ε ν 0. Then, every cluster point x of {x ν } ν IN satisfies x Q θ, i.e., θ(x) = 0. If in addition the pairs are consistent approximations, ε ν = 0 for sufficiently large ν, and {f ν (x ν )} ν IN is bounded from above, then x C, i.e., x S C,θ. Proof. Suppose that x ν x. Since ε ν θ ν (x ν ) for all ν, x X. Moreover, 0 limsup θ ν (x ν ) θ(x) 0 and the first conclusion follows. In view of the definition of consistent approximations, we find that θ ν (x ν ) = 0 for sufficiently large ν and therefore x ν C ν for such ν. The epi-convergence of f ν : C ν IR to f : C IR implies that liminf f ν (x ν ) f(x) if x C and f ν (x ν ) if x C. The latter possibility is ruled about by assumption and therefore x C. 3.3 Algorithms Theorem 3.6 provides a direct path to the construction of an implementable algorithm for minimizing f : C IR. Specifically, construct a family of approximations {f ν : C ν IR} and corresponding optimality functions {θ ν : X ν IR }, and then implement the following algorithm. Algorithm. 1. Set ε ν 0, with ε ν 0, and ν = 1.. Obtain an approximate (quasi-)stationary point x ν for f ν : C ν IR that satisfies θ ν (x ν ) ε ν. Here, we consider set-convergence of subsets of X IR, which is equipped with the metric ρ((x, x 0 ), (x, x 0)) = max{ρ X (x, x ), x 0 x 0 } for x, x X and x 0, x 0 IR. 7

3. Replace ν by ν + 1 and go to Step. If the pairs {(f ν : C ν IR, θ ν : X ν IR )} ν IN are weakly consistent approximations of (f : C IR, θ : X IR ), then every cluster point of the constructed sequence {x ν } will be quasi-stationary for f : C IR by Theorem 3.6. The algorithm is fully implementable under the practically reasonable assumption that one can obtain an approximate quasi-stationary point of f ν : C ν IR in finite time. Example : Nonlinear Programming (cont.). Consider the standard logarithmic barrier approximation q f ν (x) = f(x) t ν log[ f j (x)], x C ν = {x IR n f j (x) < 0, j = 1,..., q}, where t ν 0. We first establish epi-convergence of f ν : C ν IR to f : C IR. Suppose that x ν x, with x ν C ν. Since C ν C and C is closed, x C. Let ε > 0. There exists a ν ε such that t ν log[ f j (x ν )] > ε/q for all j with log[ f j (x)] 0 and ν ν ε. Hence, f ν (x ν ) f(x ν ) ε for all ν ν ε. In view of the continuity of f and the fact that ε is arbitrary, we conclude that Definition 3.1(i) is satisfied. Next, let x C. There exists a sequence {x ν } ν IN such that x ν C ν tends to x sufficiently slowly such that t ν q log[ f j(x ν )] 0. Consequently, f ν (x ν ) f(x), which satisfies Definition 3.1(ii). Thus, f ν : C ν IR epi-converge to f : C IR. We next analyze optimality functions. Using a minmax theorem, one can show that () is equivalently expressed as q θ(x) = min µ M µ 0ψ + (x) + µ j [ψ + (x) f j (x)] + 1 q µ 0 f(x) + µ j f j (x), x X = IRn (4) where M = {(µ 0, µ 1,..., µ q ) µ j 0, j = 0, 1,..., q, q j=0 = 1}; see [1, Theorem..8]. By (1) and direct differentiation of f ν, we obtain an approximating optimality function θ ν (x) = 1 q f(x) + m ν j (x) f j (x), x C ν, where m ν j (x) = tν f j (x). Suppose that x ν x IR n, with x ν C ν. Since x ν C ν C and C is closed, x C. Let c ν = 1 + q m ν j (x ν ), µ ν 0 = 1 c ν, and µν j = mν j (xν ) c ν, j = 1,..., q. Consequently, µ ν = (µ ν 0, µν 1,..., µν q ) M for all ν. Since M is compact, {µ ν } has at least one convergent subsequence. Suppose that µ ν N µ, with N an infinite subsequence of IN. If j is such that f j (x) < 0, 8

then µ ν j N 0 and consequently µ j = 0 necessarily. In view of the continuity of the gradients, we then have that θ ν (x ν ) (c ν ) = 1 1 q m ν c ν f(xν j ) + (xν ) c ν f j (x ν ) N 1 q µ 0 f(x) + µ j f j (x). Since x C, ψ + (x) = 0. Therefore we also have that θ ν (x ν ) (c ν ) N µ 0 ψ + (x) q µ j [ψ + (x) f j (x)] 1 µ 0 f(x) + q µ j f j (x) θ(x), where the inequality follows from the fact that µ M furnishes a possibly suboptimal solution in (4). Because θ ν (x ν ) 0 and (c ν ) 1, the inequality remains valid when we drop the denominator on the left-hand side. Hence, we have shown that limsup θ ν (x ν ) θ(x). This establishes the consistency of {f ν : C ν IR, θ ν : C ν IR )}. Consequently, the above algorithm, which can then be viewed as a primal interior point method, generates cluster points that are stationary for f : C IR in the sense of Fritz-John. We observe that this is achieved without any constraint qualifications and convexity assumptions. In this case, Step of the algorithm can be achieved by any of the standard unconstrained optimization methods in finite time. The key technical challenge associate with the above scheme is to establish (weak) consistency. In the next section, we provide tools for this purpose that rely on lopsided convergence. 4 Lopsided Convergence In view of the definition of optimality functions, it is apparent that if Q θ, then Q θ = argmax X θ. Moreover, Examples 1-3 indicate that many optimality functions take the form θ(x) = inf F (x, y), with Y Y (5) y Y for some metric space (Y, ρ Y ) and function F. In fact, in our examples, Y = IR n and F involves gradients and other quantities; 5 provides an example in infinite dimensions. From these observations it is apparent that the consideration of maxinf-problems of the form max inf F (x, y) x X y Y for bifunction F : X Y IR will provide direct insight about (quasi-)stationary points of optimization problems. We therefore set out to describe the fundamental tool for examining the convergence of such maxinf-problems, which is lopsided convergence. In the process, we extend some of the results in [6, 7] to general metric spaces. 9

Suppose that (X, ρ X ) and (Y, ρ Y ) are metric spaces, X X and Y F : X Y IR is a bifunction. We say that x is a maxinf-point of F if { } x argmax inf F (x, y). x X y Y Y are nonempty, and The study of such functions is facilitated by the notion of lopsided convergence as defined next. 4.1 Definition (lopsided convergence) The bifunctions {F ν : X ν Y ν IR} ν IN lop-converge to F : X Y IR if (i) for all y Y and x ν x X, with x ν X ν, there exists y ν y, with y ν Y ν, such that limsup F ν (x ν, y ν ) F (x, y) if x X and F ν (x ν, y ν ) otherwise. (ii) for all x X, there exists x ν x, with x ν X ν, such that for all y ν y Y, with y ν Y ν, liminf F ν (x ν, y ν ) F (x, y) if y Y and F ν (x ν, y ν ) otherwise. We assume throughout that the sets X ν X and Y ν Y are nonempty. We start with a preliminary result. 4. Proposition (epi-convergence of slices) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge to F : X Y IR. Then, for all x X, there exists x ν x, with x ν X ν such that the functions F ν (x ν, ) epi-converge to F (x, ). Proof. We follow the same arguments as in [6, Proposition 3.], where X = IR n is considered. From Definition 4.1(ii) there exists x ν x, with x ν X ν, such that the functions {F ν (x ν, )} ν IN and F (x, ) satisfy Definition 3.1(i). From Definition 4.1(i), for any y Y and x ν x, with x ν X ν, one can find y ν y, with y ν Y ν, such that Definition 3.1(ii) is also satisfied. We recall that the inf-projections of the bifunctions F ν : X ν Y ν IR and F : X Y IR are defined as the functions h(x) = inf y Y F (x, y), for x X, and hν (x) = inf y Y ν F ν (x, y), for x X. In addition to their overall interest, inf-projections of bifunctions are central to the study of optimality functions as clearly highlighted by (5). 4.3 Theorem (containment of inf-projections) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge to F : X Y IR and < inf Y F (x, ) for some x X. Then, the inf-projections h ν : X ν [, ) and h : X [, ) satisfy limsup hypo h ν hypo h. Proof. Suppose that (x, x 0 ) limsup hypo h ν. Then there exists a sequence {(x ν, x ν 0 )} ν N, with N an infinite subsequence of IN, x ν X ν, h ν (x ν ) x ν 0, xν N x, and x ν 0 N x 0. If x X, then take y Y and construct a sequence y ν y, with y ν Y ν, such that F ν (x ν, y ν ) N, which exists by Definition 4.1(i). However, x ν 0 h ν (x ν ) F ν (x ν, y ν ), ν N, 10

imply to a contradiction since x ν 0 N x 0 IR. Thus, x X. If h(x) =, then there exists y Y such that F (x, y) x 0 1. Definition 4.1(i) ensures that there exists a sequence y ν y, with y ν Y ν, such that limsup F ν (x ν, y ν ) F (x, y). Consequently, x 0 = limsup ν N x ν 0 limsup ν N h ν (x ν ) limsup ν N F ν (x ν, y ν ) F (x, y) x 0 1, which is a contradiction. Hence, it suffices to consider the case with h(x) finite. Given any ε > 0 arbitrarily small, pick y ε Y such that F (x, y ε ) ε h(x). Then Definition 4.1(i) again yields y ν y ε, with y ν Y ν, such that limsup ν N h ν (x ν ) limsup ν N F ν (x ν, y ν ) F (x, y ε ) h(x) + ε, implying limsup ν N h ν (x ν ) h(x). Since the conclusion follows. x 0 = limsup ν N x ν 0 limsup ν N h ν (x ν ) h(x), Additional results can be obtained under a strengthening of the lopsided convergence analogous to tight epi-convergence. 4.4 Definition (ancillary-tight lop-convergence) The lop-convergence of bifunctions {F ν : X ν Y ν IR} ν IN to F : X Y IR is ancilliary-tight if Definition 4.1 holds and for any ε > 0 one can find a compact set B ε Y, depending possibly on the sequence x ν x selected in Definition 4.1(ii), such that inf F ν (x ν, y) inf F ν (x ν, y) + ε for sufficiently large ν. y Y ν B ε y Y ν Under ancillary-tight lop-convergence, we can strengthen the conclusion of Theorem 4.3 as follows. 4.5 Theorem (hypo-convergence of inf-projections) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge ancillary-tightly to F : X Y IR and < inf Y F (x, ) for some x X. Then, the corresponding inf-projections h ν : X ν [, ) hypo-converges to the inf-projection h : X [, ), i.e., hypo h ν set-converges to hypo h. Proof. Since it is a very short proof, we include it for completeness sake. It is verbatim the same as that of [6, Theorem 3.4]. Let x X be such that h(x) is finite. Now, choose x ν x, with x ν X ν, such that F ν (x ν, ) epi-converge to F (x, ), cf. Proposition 4.. In fact, they epi-converge tightly as an immediate consequence of ancillary-tightness. Thus, via Theorem 3.4. h ν (x ν ) = inf F ν (x ν, y ν ) inf F (x, y) = h(x), y Y ν y Y Further strengthening of the notion is also beneficial. 4.6 Definition (tight lop-convergence) The lop-convergence of bifunctions {F ν : X ν Y ν IR} ν IN to F : X Y IR is tight if Definition 4.4 holds and for any ε > 0 one can find a compact set A ε X such that sup inf F ν (x, y) sup inf F ν (x, y) ε for sufficiently large ν. y Y ν y Y ν x X ν A ε x X ν 11

Under tight lop-convergence, we can strengthen the conclusion of Theorem 4.5 as follows. 4.7 Theorem (approximating maxinf-points) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge tightly to F : X Y IR and sup X inf Y F is finite. Then, sup x X ν inf F ν (x, y) sup y Y ν x X y Y inf F (x, y). Moreover, for every x argmax x X inf y Y F (x, y), there exist an infinite subsequence N of IN, {ε ν } ν N, with ε ν 0, and {x ν } ν N, with x ν ε ν - argmax x X ν inf y Y ν F ν (x, y), such that x ν N x. Conversely, if such sequences exists, then sup x X ν inf y Y ν F ν (x, y) N inf y Y F (x, ) Proof. We refer to the arguments of the proof of [6, Theorem 3.7]. 5 Applications and Further Examples We now return to the context of optimality functions of the form (5). We start with providing a sufficient condition for the required upper semicontinuity of an optimality function; see Definition.1. We state the result for general inf-projections. 5.1 Theorem (upper semicontinuity of inf-projection) If the bifunction F : X Y IR is lower semicontinuous, F (, y) is upper semicontinuous for all y Y, and Y is closed, then the corresponding inf-projection h(x) = inf Y F (x, ), x X, is upper semicontinuous. Proof. Let x ν x X, with x ν X. We show that the functions F (x ν, ) epi-converge to F (x, ), which are all defined on Y. Suppose that y ν y, with y ν Y. By the closedness of Y, y Y. Moreover, liminf F (x ν, y ν ) F (x, y) by the lower semicontinuity of F at (x, y). Consequently, part (i) of Definition 3.1 is satisfied. Next, let y ν = y Y. Then, limsup F (x ν, y ν ) = limsup F (x ν, y) F (x, y) by the upper semicontinuity of F (, y) and part (ii) of Definition 3.1 is satisfied. Since F (x ν, ) epiconverge to F (x, ), Theorem 3. demonstrates that limsup h(x ν ) h(x). Applications of this theorem to Examples 1-3 establish the upper semicontinuity of the corresponding optimality functions. We next turn to the requirement for consistency in Definition 3.5(ii). 5. Theorem (sufficient condition for consistency, optimality function part) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge to F : X Y IR and that the bifunctions define the optimality functions θ ν = inf y Y ν F ν (, y) and θ = inf y Y F (, y), with < θ(x) for some x X. Then, for every x ν x X, with x ν X ν, limsup θ ν (x ν ) θ(x) if x X, and θ ν (x ν ) otherwise. Proof. The result is a direct consequence of Theorem 4.3. In view of this result, it is clear that (weak) consistency will be ensured by epi-convergence of the approximating objective functions and feasible sets as well as lopsided convergence of the approximating bifunctions defining the corresponding optimality functions. 1

We illustrate Theorem 5. in the context of Example 3. Example 3: Minimax Problem (cont.). Suppose that for every z Z, there exists a sequence z ν Z ν such that z ν z. Let F ν (x, y) = max {φ(x, z) f ν (x) + x φ(x, z), y x + 1 } y z Z x, x, y IR n, ν and F be defined similarly with the superscripts removed. We next show lopsided convergence of F ν to F. First consider part (i) of Definition 4.1. Let y IR n and x ν x IR n. Set y ν = y for all ν. Clearly, limsup F ν (x ν, y ν ) limsup F (x ν, y) = F (x, y) by the continuity of F and part (i) holds. Second, we consider part (ii). Let x IR n and y ν y IR n. Set x ν = x for all ν. Let { z x argmax z Z φ(x, z) f(x) + x φ(x, z), y x + 1 } y x. Let ε > 0. By assumption on Z ν and the continuity of φ(x, ) and x φ(x, ), there exists z ν Z ν and ν 0 such that φ(x, z ν ) φ(x, z x ) > ε for all ν ν 0. Consequently, ν ν 0, x φ(x, z ν ) x φ(x, z x ) < min { ε, } ε y x F ν (x ν, y ν ) = F ν (x, y ν ) = max {φ(x, z) f ν (x) + x φ(x, z), y ν x + 1 } z Z yν x ν φ(x, z ν ) f(x) + x φ(x, z ν ), y ν x + 1 yν x = φ(x, z x ) f(x) + x φ(x, z x ), y x + 1 y x + φ(x, z ν ) φ(x, z x ) + x φ(x, z ν ) x φ(x, z x ), y x + x φ(x, z ν ), y ν y + 1 yν x 1 y x > φ(x, z x ) f(x) + x φ(x, z x ), y x + 1 y x ε ε + x φ(x, z ν ), y ν y + 1 yν x 1 y x = F (x, y) ε + x φ(x, z ν ), y ν y + 1 yν x 1 y x Since y ν y, {z ν } is bounded, and x φ is continuous, it follows that liminf F ν (x ν, y ν ) F (x, y) ε. Since ε was arbitrary, part (ii) of Definition 4.1 holds and F ν therefore lop-converge to F. In view of Theorem 5. and the fact that epi-convergence is also easily established, we have that {(f ν : IR n IR, θ ν : IR n IR )} are consistent approximations of {(f : IR n IR, θ : IR n IR )} in this case. The above algorithm therefore is implementable for the solution of the semiinfinite minimax problem min x IR n max z Z φ(x, z). 13

As we see next, under slightly stronger assumptions, the approximating bifunctions do not need to be associated with an optimality function to achieve convergence to quasi-stationary points. 5.3 Theorem (convergence to quasi-stationary points) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge ancillary-tightly to F : X Y IR and θ = inf y Y F (, y) is an optimality function for f : C IR with Q θ. If x ν argmax x X ν inf y Y ν F ν (x, y) for all ν, then every cluster point x of {x ν } ν IN is quasi-stationary for f : C IR, i.e., x Q θ. Proof. By Theorem 4.5, h ν = inf y Y ν F ν (, y) hypo-converge to h = inf y Y F (, y). By translating Theorem 3. from the stated minimization framework to a maximization framework we obtain that x argmax X h. Since Q θ, Q θ = argmax X h and the conclusion follows. Further characterization of (quasi-)stationary points is available under tight lopsided convergence. 5.4 Theorem (characterization of quasi-stationary points) Suppose that the bifunctions {F ν : X ν Y ν IR} ν IN lop-converge tightly to F : X Y IR and θ = inf y Y F (, y) is an optimality function for f : C IR with Q θ. For every x Q θ there exist an infinite subsequence N of IN, {ε ν } ν N, with ε ν 0, and {x ν } ν N, with x ν ε ν - argmax x X ν inf y Y ν F ν (x, y), such that x ν N x. Proof. The result is a direct consequence of Theorem 4.7. We end the paper with an example from the area of optimal control and adjust the notation accordingly. Example 4: Optimal Control. We here follow the set-up in Section 5.6 and Chapter 4 of [1], which contain further details. For g : IR n IR m IR n, we consider the dynamical system ẋ(t) = g(x(t), u(t)), for t [0, 1], with x(0) = ξ IR n, where the control u L m = {u : [0, 1] IR m measurable, essentially bounded}. Since such controls are contained in the space of square-integrable functions from [0, 1] to IR m, the usual L -norm applies; see [1, p.709] for a motivation for this hybrid set-up. Let H = IR n L m. For initial condition and control pairs η = (ξ, u) H and η = ( ξ, ū) H, we equip H with the inner product and norm η, η H = ξ, ξ + 1 0 u(t), ū(t) dt and η H = η, η H. We consider control constraints of the form u(t) C, for almost every t [0, 1] for some given convex and compact set C IR m. By imposing the constraints for almost every t instead of every t, we deviate slightly from [1] and follow [9]. We therefore also define the feasible set U = L m {u u(t) C, for almost every t [0, 1]} and H = IR n U. Under standard assumptions, a solution of the differential equation, for a given η H, denoted by x η is unique, Lipschitz continuous, and Gateaux differentiable in η. Consequently, for a given φ : IR n IR n IR, Lipschitz continuously differentiable on bounded sets, the function f : H IR defined by f(η) = φ(ξ, x η (1)), for η = (ξ, u) H, 14

has a Gateaux differential of the form f(η), η η H for some Lipschitz continuous gradient f(η) given in [1, Corollary 5.6.9]. The optimal control problem minimize f(η) subject to η H, analogous to Example 1, has an optimality function θ(η) = min F (η, η), for η H, η H where F (η, η) = f(η), η η H + 1 η η H, for η, η H. We next consider approximations. Let U ν U, ν IN, consist of the piecewise constant functions that are constant on each of the intervals [(k 1)/ν, k/ν), k = 1,..., ν. Set H ν = IR n U ν. Moreover, let x ν η be the (unique) solution of the forward Euler approximation of the differential equation, using time-step 1/ν, given input η = (ξ, u) H. An approximate problem then takes the form minimize f ν (η) subject to η H ν, where One can show that where f ν (η) = φ(ξ, x ν η(1)). θ ν (η) = min η H ν F ν (η, η), for η H ν, F ν (η, η) = f ν (η), η η H + 1 η η H, for η, η H ν, is an optimality function of f ν : H ν IR, where the Lipschitz continuous gradient f ν (η) is given in [1, Theorem 5.6.19]. By [1, Theorem 4.3.], for every bounded set S H, there exists a C S < such that f(η) f ν (η) C S /ν and f(η) f ν (η) H C S /ν for all η S. Moreover, ν IN H ν is dense in H. Consequently, it is easily established that f ν : H ν IR epi-converge to f : H IR. We next consider the optimality functions. Let η H and η ν η H, with η ν H ν. Necessarily, η H. Due to the density result, there exists η ν η, with η ν H ν. Hence, F ν (η ν, η ν ) F (η, η) f ν (η ν ) f(η) H η ν η ν H + f(η) H η ν η ν η + η H + 1 ην η ν H 1 ην η ν H 0 and we have shown Definition 4.1(i). Using similar arguments, we also establish part (ii) and the lopsided convergence of F ν to F. Consequently, {(f ν : H ν IR, θ ν : H ν IR )} ν IN are consistent approximations of (f : H IR, θ : H IR ). Since the minimization of f ν : H ν IR is equivalent to an optimization problem on a Euclidean space, the above algorithm is implementable for the infinitedimensional problem f : H IR. 15

References [1] H. Attouch. Variational Convergence for Functions and Operators. Applicable Mathematics Sciences. Pitman, 1984. [] H. Attouch and R. Wets. A convergence theory for saddle functions. Transactions of the American Mathematical Society, 80(1):1 41, 1983. [3] J.-P. Aubin and H. Frankowska. Set-Valued Analysis. Birkhäuser, 1990. [4] G. Beer. Topologies on Closed and Closed Convex Sets, volume 68 of Mathematics and its Applications. Kluwer, 199. [5] J.C. Foraker, J.O. Royset, and I. Kaminer. Search-trajectory optimization: Part 1, formulation and theory. in review. [6] A. Jofre and R. J-B Wets. Variational convergence of bivariate functions: Lopsided convergence. Mathematical Programming B, 116:75 95, 009. [7] A. Jofre and R. J-B Wets. Variational convergence of bifunctions: Motivating applications. SIAM J. Optimization, 4(4):195 1979, 014. [8] R. Pasupathy. On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization. Operations Research, 58(4):889 901, 010. [9] C. Phelps, J.O. Royset, and Q. Gong. Optimal control of uncertain systems using sample average approximations. in review. [10] E. Polak. Computational Methods in Optimization. A Unified Approach. Academic Press, 1971. [11] E. Polak. On the use of models in the synthesis of optimization algorithms. In H. Kuhn and G. Szego, editors, Differential Games and Related Topics, pages 63 79. North Holland, Amsterdam, 1971. [1] E. Polak. Optimization. Algorithms and Consistent Approximations, volume 14 of Applied Mathematical Sciences. Springer, 1997. [13] R.T. Rockafellar and R. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaft. Springer, 3rd printing-009 edition, 1998. [14] J. O. Royset. Optimality functions in stochastic programming. Mathematical Programming, 135(1):93 31, 01. [15] J. O. Royset. On sample size control in sample average approximations for solving smooth stochastic programs. Computational Optimization and Applications, 55():65 309, 013. [16] J. O. Royset and R. Szechtman. Optimal budget allocation for sample average approximation. Operations Research, 61:76 776, 013. 16

[17] J.O. Royset and E.Y. Pee. Rate of convergence analysis of discretization and smoothing algorithms for semi-infinite minimax problems. Journal of Optimization Theory and Applications, 155(3):855 88, 01. 17