On the Convergence of Adaptive Stochastic Search Methods for Constrained and Multi-Objective Black-Box Global Optimization


On the Convergence of Adaptive Stochastic Search Methods for Constrained and Multi-Objective Black-Box Global Optimization

Rommel G. Regis

Citation: R. G. Regis. On the convergence of adaptive stochastic search methods for constrained and multi-objective black-box optimization. Journal of Optimization Theory and Applications, 170(3), 2016.

The Version of Record of this manuscript has been published and is available in the Journal of Optimization Theory and Applications (2016).

JOTA manuscript No. (will be inserted by the editor)

On the Convergence of Adaptive Stochastic Search Methods for Constrained and Multi-Objective Black-Box Global Optimization

Rommel G. Regis

June 30, 2016

Abstract Stochastic search methods for global optimization and multi-objective optimization are widely used in practice, especially on problems with black-box objective and constraint functions. Although there are many theoretical results on the convergence of stochastic search methods, relatively few deal with black-box constraints and multiple black-box objectives, and previous convergence analyses require feasible iterates. Moreover, some of the convergence conditions are difficult to verify for practical stochastic algorithms, and some of the theoretical results apply only to specific algorithms. First, this article presents some technical conditions that guarantee the convergence of a general class of adaptive stochastic algorithms for constrained black-box global optimization that do not require the iterates to be always feasible, and applies them to practical algorithms, including an evolutionary algorithm. The conditions are required only for a subsequence of the iterations and provide a recipe for making any algorithm converge to the global minimum in a probabilistic sense. Second, it uses the results for constrained optimization to derive convergence results for stochastic search methods for constrained multi-objective optimization.

Keywords Constrained optimization · Multi-objective optimization · Random search · Convergence · Evolutionary programming

Mathematics Subject Classification (2000) 65K05 · 90C29

Rommel G. Regis
Department of Mathematics, Saint Joseph's University, Philadelphia, Pennsylvania 19131, USA
rregis@sju.edu

1 Introduction

Stochastic search methods have been widely used to solve constrained global optimization and multi-objective optimization problems in situations where the objective or constraint functions are black-box. Here, black-box means that the mathematical expression defining the function is not provided; instead, the function values are obtained via a computer simulation. Stochastic search methods for constrained black-box global optimization include random search techniques (e.g., [1, 2]), evolutionary algorithms and swarm algorithms. Moreover, the most popular stochastic search methods for multi-objective optimization are multi-objective evolutionary algorithms (MOEAs) (e.g., [3, 4]).

Many papers have been written on the convergence of stochastic search methods to the global minimum in a probabilistic sense (e.g., [5-10]). However, relatively few papers have dealt with black-box constraints and infeasible start points. For example, Baba [1] and Pinter [11] proved convergence results for some stochastic search algorithms for constrained black-box global optimization. However, their algorithms need a feasible starting point and they require the iterates to remain feasible. Unfortunately, finding feasible starting points and maintaining feasibility can be challenging for highly constrained black-box problems, so this requirement can be very restrictive in practice. Moreover, their convergence conditions are not easy to verify for some practical stochastic search algorithms. Also, relatively few papers have addressed the convergence of stochastic search methods to the Pareto set. For example, Baba et al. [12] proposed a random search algorithm for multi-objective constrained black-box optimization and proved that it converges with probability 1 to an approximate Pareto-optimal solution under certain conditions. However, the algorithmic framework they used again requires feasible iterates. Moreover, other papers (e.g., [13-15]) analyzed the convergence of MOEAs. In addition, Schütze et al. [16] proved the convergence of stochastic search algorithms to finite size Pareto set approximations using the concept of $\epsilon$-dominance. For a review of convergence results for MOEAs, see [17].

The first part of this paper presents some technical conditions that guarantee the convergence of a general class of adaptive stochastic algorithms for constrained black-box global optimization and applies them to some practical algorithms, including an Evolutionary Programming (EP) algorithm. Although the algorithms are stochastic, this paper focuses on problems with deterministic objective and constraint functions. These results extend the ones given in [18] and [8]. The convergence results provided here do not require iterates to be always feasible, and some of the convergence conditions are stated mainly in terms of the conditional densities of the random vector iterates.

In addition, the algorithmic framework is flexible in that the convergence conditions are required only for a subsequence of the iterations. In fact, it provides a recipe for making any algorithm converge to the global minimum in a probabilistic sense by simply inserting iterations that satisfy some global search conditions.

The second part of this paper uses the results for constrained black-box optimization to derive convergence results for stochastic search methods in a multi-objective setting. It applies the convergence results for the constrained case to single-objective reformulations of the multi-objective problem. Moreover, it provides some analysis of stochastic algorithms that work directly on convex constrained multi-objective problems. Some of the main differences between the multi-objective results in this paper and previous work in [12] are that the algorithmic framework presented does not require feasible iterates, the type of convergence result presented is different, and the convergence conditions are again required only on a subsequence of iterations. This paper is among the few to prove the convergence of random search methods for constrained and multi-objective optimization in a probabilistic sense.

This paper is organized as follows. Section 2 deals with convergence results for constrained black-box optimization. Section 3 provides the convergence results for adaptive random search for multi-objective optimization. Finally, Section 4 provides a summary and conclusions.

2 Constrained Black-Box Global Optimization

2.1 Preliminaries and Notations

This section focuses on the convergence of adaptive stochastic search methods for the following constrained black-box global optimization problem (CBOP):

$$\min f(x) \quad \text{subject to:} \quad G(x) := (g_1(x), \ldots, g_m(x)) \le 0 \ \text{ and } \ l \le x \le u \qquad (1)$$

Here, $f: \mathbb{R}^d \to \mathbb{R}$ and $g_j: \mathbb{R}^d \to \mathbb{R}$, $j = 1, \ldots, m$, are deterministic black-box measurable functions and $l, u \in \mathbb{R}^d$ are the bounds on the decision variables. In many practical applications, the values of the objective and constraint functions are obtained by running computer simulations and their derivatives are unavailable.

Hence, assume that one simulation at a point in $[l, u]$ yields the values of $f, g_1, \ldots, g_m$ at that point. For convenience, problem (1) is denoted by CBOP$(f, G, [l, u])$.

Let $D := \{x \in \mathbb{R}^d : l \le x \le u, \ G(x) \le 0\}$ be the feasible region of problem (1). If $D$ is compact and $f$ is continuous over $D$, then problem (1) is guaranteed to have an optimal solution over $D$. Note that $D$ is compact if each constraint function $g_j$ is also continuous. As mentioned earlier, this paper focuses on stochastic algorithms for problem (1), but the objective and constraint functions are all assumed to be deterministic. For problems where the objective or constraint functions are stochastic, see [8].

Since the constraint functions in (1) are black-box, the feasible region $D$ is unknown. Hence, the algorithms for finding the global minimum of $f$ over $D$ will be searching throughout the box-shaped region $[l, u]$ defined by the bound constraints. We refer to the region $[l, u]$ as the search space of problem (1).

Definition 2.1 Let $S$ be a proper subset of $\mathbb{R}^d$. A mapping $\rho_S : \mathbb{R}^d \to S$ such that $\rho_S(x) = x$ for all $x \in S$ is said to be an absorbing transformation.

Absorbing transformations are necessary for stochastic search methods because most search spaces in practical problems are compact subsets of some Euclidean space and some probability distributions yield iterates that are outside the search space. For example, Gaussian iterates might fall outside the search space even if the center of the distribution is well inside the search space. An absorbing transformation is meant to absorb runaway iterates into the search space. Commonly used absorbing transformations are projections onto the search space or successive reflections about a boundary.

Definition 2.2 Consider a CBOP$(f, G, [l, u])$ and let $D := \{x \in \mathbb{R}^d : l \le x \le u, \ G(x) \le 0\}$. A constraint violation (CV) function for $G$ over $[l, u]$ is a function $V_G : [l, u] \to \mathbb{R}_+$ with the following properties: (i) $V_G(x) = 0$ for all $x \in D$; (ii) $V_G(x) > 0$ for all $x \notin D$; and (iii) if $G(x) \le G(y)$, then $V_G(x) \le V_G(y)$.

A CV function is a measure of the degree of constraint violation of a point in the search space $[l, u]$. Two examples are $V_G(x) = \sum_{j=1}^m [\max\{g_j(x), 0\}]$ and $V_G(x) = \sum_{j=1}^m [\max\{g_j(x), 0\}]^2$.

Before defining the concept of convergence of a stochastic algorithm to the global minimum, we first recall the definitions of convergence in probability and almost sure convergence (e.g., see [19]). In the definitions below, $\{X_n\}_{n \ge 1}$ is a sequence of random vectors and $X$ is another random vector defined on the same probability space $(\Omega, \mathcal{B}, P)$, where $\Omega$ is the sample space, $\mathcal{B}$ is a $\sigma$-field of subsets of $\Omega$, and $P$ is the probability measure.
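For concreteness, here is a minimal Python sketch (assuming NumPy; the constraint in the usage example is hypothetical) of the second CV function above and of the projection onto $[l, u]$, which is among the most common absorbing transformations:

```python
import numpy as np

def cv_quadratic(g_values):
    """Quadratic CV function V_G(x) = sum_j [max{g_j(x), 0}]^2.

    g_values: array of constraint values G(x) = (g_1(x), ..., g_m(x)).
    Returns 0 exactly when G(x) <= 0, i.e., when x is feasible.
    """
    return float(np.sum(np.maximum(g_values, 0.0) ** 2))

def project_to_box(y, l, u):
    """Projection onto [l, u]: an absorbing transformation, since points
    already inside the box are left fixed."""
    return np.minimum(np.maximum(y, l), u)

# Usage: a runaway trial iterate is absorbed back into [0, 1]^2.
l, u = np.zeros(2), np.ones(2)
x = project_to_box(np.array([1.7, -0.3]), l, u)      # -> [1.0, 0.0]
print(cv_quadratic(np.array([x[0] + x[1] - 0.5])))   # g(x) = x1 + x2 - 0.5 -> 0.25
```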

That is, the $X_n$'s and $X$ are mappings $X_n : (\Omega, \mathcal{B}) \to (\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ and $X : (\Omega, \mathcal{B}) \to (\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$, where $\mathcal{B}(\mathbb{R}^d)$ consists of the Borel subsets of $\mathbb{R}^d$. The definitions below also apply to the special case where $\{X_n\}_{n \ge 1}$ and $X$ are random variables instead of random vectors. Throughout this paper, $\|\cdot\|$ denotes the 2-norm. Moreover, if $E$ is a collection of random elements defined on a probability space, then $\sigma(E)$ is the $\sigma$-field generated by $E$. We can think of $\sigma(E)$ as representing all the information that can be derived from the random elements in $E$.

Definition 2.3 Let $\{X_n\}_{n \ge 1}$ and $X$ be random vectors defined on the same probability space $(\Omega, \mathcal{B}, P)$. The sequence $\{X_n\}_{n \ge 1}$ converges in probability to $X$ iff for any $\epsilon > 0$, $\lim_{n \to \infty} P[\|X_n - X\| > \epsilon] = 0$. Moreover, $\{X_n\}_{n \ge 1}$ converges almost surely (a.s.) to $X$, written $X_n \to X$ a.s., iff there exists an event $N \in \mathcal{B}$ such that $P(N) = 0$ and $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for all $\omega \in \Omega \setminus N$.

It is well-known that convergence a.s. implies convergence in probability but the converse is false.

One of the goals of this paper is to provide simple conditions that guarantee the convergence of a general class of adaptive stochastic algorithms for problem (1) that is described in the framework in Section 2.3. Because the algorithms that follow this framework are stochastic, the iterates are treated as $d$-dimensional random vectors. Consider a stochastic algorithm whose iterates are given by $\{Y_n\}_{n \ge 1}$ defined on a probability space $(\Omega, \mathcal{B}, P)$. Here, the random vector $Y_n : (\Omega, \mathcal{B}) \to (\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ represents the $n$th trial iterate generated by some probability distribution such as the uniform distribution over $[l, u]$ or a multivariate Normal distribution centered at a current best solution. As mentioned above, the trial iterate $Y_n$ could fall outside of $[l, u]$, so it needs to be brought back to $[l, u]$. Hence, define $\{X_n\}_{n \ge 1}$ to be the sequence of actual random vector iterates, where $X_n := \rho_{[l,u]}(Y_n)$ for some absorbing transformation $\rho_{[l,u]}$. Note that the objective and constraint function evaluations are carried out on the sequence $\{X_n\}_{n \ge 1}$ rather than the trial sequence $\{Y_n\}_{n \ge 1}$. We refer to $\{Y_n\}_{n \ge 1}$ as the trial random vector iterates while $\{X_n\}_{n \ge 1}$ are the actual random vector iterates.

Next, we clarify what it means for a stochastic algorithm to converge to the global minimum in a probabilistic sense. Given the sequence $\{X_n\}_{n \ge 1}$ of actual random vector iterates, we define the sequence $\{X_n^*\}_{n \ge 1}$ of the best points visited by the algorithm with respect to the objective function $f(x)$ and a constraint violation function $V_G(x)$ as follows: set $X_1^* := X_1$, and for any $n > 1$, set

$$X_n^* := \begin{cases} X_n & \text{if } X_n \text{ and } X_{n-1}^* \text{ are feasible and } f(X_n) < f(X_{n-1}^*), \\ X_n & \text{if } X_{n-1}^* \text{ is infeasible and } V_G(X_n) < V_G(X_{n-1}^*), \\ X_{n-1}^* & \text{otherwise.} \end{cases} \qquad (2)$$
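Update (2) can be transcribed directly into code. The sketch below is a hypothetical helper (not from the paper): it represents feasibility by a zero constraint violation and tracks the triple $(X_{n-1}^*, f(X_{n-1}^*), V_G(X_{n-1}^*))$:

```python
def update_best(x_best, f_best, v_best, x_new, f_new, v_new):
    """One step of the best-point update (2); v == 0 means feasible.

    Returns the new triple (X_n^*, f(X_n^*), V_G(X_n^*)).
    """
    if v_best == 0.0:
        # Current best is feasible: accept only a better feasible point.
        if v_new == 0.0 and f_new < f_best:
            return x_new, f_new, v_new
    elif v_new < v_best:
        # Current best is infeasible: accept any point with smaller violation
        # (in particular, any feasible point, since then v_new == 0).
        return x_new, f_new, v_new
    return x_best, f_best, v_best
```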

Definition 2.4 Let $f$ be a real-valued measurable function defined on a measurable set $D \subseteq \mathbb{R}^d$ such that the global minimum value $f^* := \min_{x \in D} f(x)$ exists. A stochastic algorithm whose actual iterates are denoted by the sequence of random vectors $\{X_n\}_{n \ge 1}$ is said to converge to the global minimum of $f$ over $D$ almost surely (or in probability) iff the sequence of random variables $\{f(X_n^*)\}_{n \ge 1}$, where $X_n^*$ is defined in (2), converges to $f^*$ almost surely (or in probability).

2.2 Simple Stochastic Search Algorithms for Constrained Optimization

Before presenting a general framework for adaptive stochastic search methods for constrained black-box optimization, we first present two simple stochastic algorithms for constrained optimization that are easy to use in practice. The first one uses uniform random search over the search space while the second one uses Gaussian steps centered at the current best solution.

Algorithm A1. Uniform Random Search for Constrained Optimization

Inputs: (i) CBOP$(f, G, [l, u])$; (ii) CV function $V_G : [l, u] \to \mathbb{R}_+$.

Step 0. Set $n = 1$.
Step 1. Generate a realization of $X_n \sim U[l, u]$.
Step 2. Evaluate $f(X_n)$ and $G(X_n)$. (This is equivalent to one simulation.)
Step 3. Let $X_n^*$ be the best point found so far with respect to $f(x)$ and $V_G(x)$ as defined in (2).
Step 4. Increment $n \leftarrow n + 1$ and go back to Step 1.

Algorithm A1 samples uniformly at random over the search space $[l, u]$. Then it updates the current best solution $X_n^*$ after the objective and constraint function values of $X_n$ are obtained. This is the simplest conceivable stochastic search algorithm for problem (1). Since the $X_n$'s are independent and identically distributed (iid), this algorithm is not really adaptive, but it can be shown to converge to the global minimum under some simple assumptions, as will be seen in the next section.
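A compact Python sketch of Algorithm A1, reusing `cv_quadratic` and `update_best` from the sketches above (`objective` and `constraints` stand in for the two outputs of one black-box simulation):

```python
import numpy as np

def uniform_random_search(objective, constraints, l, u, n_sim, seed=None):
    """Algorithm A1: uniform random search over [l, u] for a CBOP."""
    rng = np.random.default_rng(seed)
    x_best, f_best, v_best = None, np.inf, np.inf          # no point found yet
    for _ in range(n_sim):
        x = rng.uniform(l, u)                              # Step 1: X_n ~ U[l, u]
        f, v = objective(x), cv_quadratic(constraints(x))  # Step 2: one simulation
        x_best, f_best, v_best = update_best(              # Step 3: update X_n^*
            x_best, f_best, v_best, x, f, v)
    return x_best, f_best, v_best
```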

To make the algorithm somewhat adaptive, one can use Gaussian distributions centered at the current best solution. By localizing the search in this manner, the chances of finding better iterates are improved.

Algorithm A2. Localized Random Search for Constrained Optimization

Inputs: (i) CBOP$(f, G, [l, u])$; (ii) CV function $V_G : [l, u] \to \mathbb{R}_+$; (iii) Initial covariance matrix for the Gaussian steps $C_1$; (iv) Absorbing transformation $\rho_{[l,u]} : \mathbb{R}^d \to [l, u]$; (v) Initial solution $X_0 \in [l, u]$.

Step 0. Initialize $X_0^* := X_0$ and set $n = 1$.
Step 1. Generate a realization of $Z_n$ such that $Z_n \mid \sigma(\{Z_1, \ldots, Z_{n-1}\}) \sim N(0_{d \times 1}, C_n)$, and set $Y_n := X_{n-1}^* + Z_n$.
Step 2. Set $X_n := \rho_{[l,u]}(Y_n)$.
Step 3. Evaluate $f(X_n)$ and $G(X_n)$. (This is equivalent to one simulation.)
Step 4. Let $X_n^*$ be the best point found so far with respect to $f(x)$ and $V_G(x)$ as defined in (2).
Step 5. Determine the next covariance matrix $C_{n+1}$, possibly using information obtained so far.
Step 6. Increment $n \leftarrow n + 1$ and go back to Step 1.

In Step 5 of Algorithm A2, the covariance matrix of the Gaussian random vector iterates may be kept constant, but the step also allows for the possibility of adjusting this covariance matrix based on information obtained so far. For example, such an adjustment strategy is commonly employed by evolution strategies (ES) such as CMA-ES [20] and its variants. In the next section, we provide a general framework that includes Algorithms A1 and A2 as special cases and we prove the convergence of algorithms that follow this framework.
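A sketch of Algorithm A2 with an isotropic covariance $C_n = \sigma_n^2 I_d$ and, as one possible Step 5 rule, a shrink factor after non-improving iterations. The floor `sigma_min` is an addition not in the algorithm statement, included because the convergence guarantee discussed later (Proposition 2.6) requires the covariances to stay bounded away from zero:

```python
import numpy as np

def localized_random_search(objective, constraints, l, u, x0, n_sim,
                            sigma=0.1, shrink=0.9, sigma_min=1e-3, seed=None):
    """Algorithm A2 sketch: Gaussian steps centered at the current best."""
    rng = np.random.default_rng(seed)
    x_best = np.asarray(x0, dtype=float)
    f_best, v_best = objective(x_best), cv_quadratic(constraints(x_best))
    for _ in range(n_sim):
        z = rng.normal(0.0, sigma, size=len(x_best))        # Step 1: Z_n ~ N(0, sigma^2 I)
        x = project_to_box(x_best + z, l, u)                # Step 2: absorb Y_n into [l, u]
        f, v = objective(x), cv_quadratic(constraints(x))   # Step 3: one simulation
        new = update_best(x_best, f_best, v_best, x, f, v)  # Step 4: update X_n^*
        if new[0] is x_best:                                # Step 5: e.g. C_{n+1} := alpha*C_n
            sigma = max(shrink * sigma, sigma_min)          # keep inf_n sigma_n > 0
        x_best, f_best, v_best = new
    return x_best, f_best, v_best
```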

2.3 General Framework for Adaptive Stochastic Search for Constrained Optimization

In Algorithm A1, the actual random vector iterates $\{X_n\}_{n \ge 1}$ are iid and $X_n \sim U[l, u]$ for all $n \ge 1$. This paper focuses on the more general case of adaptive algorithms where each (trial or actual) random vector iterate possibly depends on previous random vector iterates, i.e., the trial random vector iterates $\{Y_n\}_{n \ge 1}$ are dependent, and so, $\{X_n\}_{n \ge 1}$ are also dependent. For example, in Algorithm A2, the trial iterate $Y_n$ is obtained by adding random perturbations to the current best solution that are Normally distributed with zero mean. Since the current best solution $X_{n-1}^*$ depends on the previous trial iterates $Y_1, \ldots, Y_{n-1}$ and other associated random vectors, so does $Y_n$. However, the results in this paper also apply to the case where the actual random vectors are iid, as in the case of Algorithm A1. Moreover, as in [18], the framework below allows for the possibility that the trial iterate $Y_n$ is a deterministic function of current and previous intermediate random elements, denoted by $\{\Lambda_{i,j} : i = 1, \ldots, n,\ j = 1, \ldots, k_i\}$, in order to encompass many practical stochastic algorithms. These $\Lambda_{i,j}$'s could be random variables, random vectors or other types of random elements defined on the same probability space $(\Omega, \mathcal{B}, P)$, and they are meant to capture all random decisions prior to generating a trial iterate.

Below is the Generalized Adaptive Random Search for Constrained Optimization (GARSCO) framework for solving problem (1). This framework is an extension of the GARS framework in [18] that can handle black-box inequality constraints. In the notation below, $n$ is the number of simulations, where one simulation yields the values of $f, g_1, \ldots, g_m$ at a given point in the search space $[l, u]$.

Algorithm A3. Generalized Adaptive Random Search for Constrained Optimization (GARSCO)

Inputs: (i) CBOP$(f, G, [l, u])$; (ii) CV function $V_G : [l, u] \to \mathbb{R}_+$; (iii) Collection of intermediate random elements $\{\Lambda_{i,j} : (\Omega, \mathcal{B}) \to (\Omega_{i,j}, \mathcal{B}_{i,j}) : i \ge 1 \text{ and } j = 1, 2, \ldots, k_i\}$ used to determine the trial random vector iterates; (iv) Absorbing transformation $\rho_{[l,u]} : \mathbb{R}^d \to [l, u]$.

Step 0. Set $n = 1$.
Step 1. Generate a realization of the random vector $Y_n : (\Omega, \mathcal{B}) \to (\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ as follows:
Step 1.1. For each $j = 1, \ldots, k_n$, generate a realization of the intermediate random element $\Lambda_{n,j} : (\Omega, \mathcal{B}) \to (\Omega_{n,j}, \mathcal{B}_{n,j})$ according to some probability distribution.
Step 1.2. Set $Y_n := \Theta_n(E_n)$ for some deterministic function $\Theta_n$, where $E_n := \{\Lambda_{i,j} : i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k_i\}$.
Step 2. Set $X_n := \rho_{[l,u]}(Y_n)$.
Step 3. Evaluate $f(X_n)$ and $G(X_n)$. (This is equivalent to one simulation.)
Step 4. Let $X_n^*$ be the best point found so far with respect to $f(x)$ and $V_G(x)$ as defined in (2).
Step 5. Increment $n \leftarrow n + 1$ and go back to Step 1.

To see how Algorithm A1 follows the GARSCO framework, set $k_n := 1$ and $Y_n := \Lambda_{n,1} \sim U[l, u]$ for all $n \ge 1$. Moreover, the choice of absorbing transformation does not matter since $X_n \equiv Y_n$ for all $n \ge 1$. For Algorithm A2, the intermediate random elements depend on how the covariance matrix $C_n$ is updated. The simplest case is when the covariance update is deterministic given a fixed setting of all previous random elements.

This includes the case where $C_n$ is constant for all $n \ge 1$. It also includes the case where $C_{n+1}$ possibly depends only on $C_1, \ldots, C_n$ and the realizations of the random vectors in $\bigcup_{i=1}^n \{Z_i, Y_i, X_i, f(X_i), G(X_i)\}$, which represents the history of the algorithm, and not on some other independent random elements. For example, if the current iterate $X_n$ is not an improvement over $X_{n-1}^*$ in terms of $f(x)$ and $V_G(x)$, one can set $C_{n+1} := \alpha C_n$, where $0 < \alpha < 1$, in the hopes of generating an iterate closer to $X_{n-1}^*$. In this case, $k_n := 1$, $\Lambda_{n,1} := Z_n$, where $Z_n \mid \sigma(E_{n-1}) \sim N(0_{d \times 1}, C_n)$, and $Y_n := X_{n-1}^* + Z_n$ is a function of $E_n := \{Z_1, \ldots, Z_n\}$ for all $n \ge 1$. In the more complex case, the covariance matrix update involves additional random elements beyond the $Z_n$'s, such as in the case of the Evolutionary Programming algorithm in Section 2.4.

The following convergence result for a GARSCO algorithm for problem (1) is similar to Theorem 1 in [18], which is a generalization of the theorem on page 40 of [8]. Below, $\sigma(E_{(n_k)-1})$ is the $\sigma$-field generated by the random elements in $E_{(n_k)-1}$ for all $k \ge 1$. One big difference between the following result and the ones in [1, 11, 18] is that, in these earlier papers, the actual iterates $X_n$ at which $f$ and $G$ are evaluated are all feasible. However, this is difficult, if not impossible, to ensure when the constraint functions are black-box since $D$ is unknown. The GARSCO framework is more flexible in that it allows the iterates to be infeasible at the beginning. In particular, $X_n^*$ is the best solution so far according to the given objective function $f(x)$ and the measure of constraint violation $V_G(x)$. That is, if $X_n^*$ is feasible, then it has the best value of $f(x)$ among all feasible realizations of $X_1, \ldots, X_n$. On the other hand, if $X_n^*$ is infeasible, then $X_1, \ldots, X_n$ are all infeasible and $X_n^*$ has the best value of $V_G(x)$ among the realizations of these random vector iterates. The proof for the proposition below is given in the Appendix.

Proposition 2.1 Consider a CBOP$(f, G, [l, u])$ with feasible region $D$ and assume that $f^* := \inf_{x \in D} f(x) > -\infty$. Suppose that a GARSCO algorithm applied to this problem satisfies the condition: For every $\epsilon > 0$, there exists $0 < L(\epsilon) < 1$ such that

$$P[Y_{n_k} \in D : f(X_{n_k}) < f^* + \epsilon \mid \sigma(E_{(n_k)-1})] \ge L(\epsilon) \qquad (3)$$

for some subsequence $\{n_k\}_{k \ge 1}$. If $V_G(x)$ is a constraint violation function for $G$ over $[l, u]$ and $\{X_n^*\}_{n \ge 1}$ is defined by (2), then $f(X_n^*) \to f^*$ almost surely (a.s.).

Note that the condition (3) above is expressed in terms of the trial random vectors $\{Y_{n_k}\}_{k \ge 1}$, which are random vectors generated by the GARSCO algorithm. The sequence $\{X_{n_k}\}_{k \ge 1}$ are the images of $\{Y_{n_k}\}_{k \ge 1}$ under the chosen absorbing transformation.

The next proposition deals with the case where the objective function has a unique global minimizer over the feasible region. It is similar to Theorem 2 in [18], but the meaning of $X_n^*$ in this proposition is different. The proof of this proposition is also given in the Appendix.

Proposition 2.2 Consider a GARSCO algorithm applied to problem (1) and suppose the assumptions in Proposition 2.1 hold. Moreover, suppose $x^*$ is the unique global minimizer of $f$ over $D$ in the sense that $f(x^*) := \inf_{x \in D} f(x) > -\infty$ and $\inf_{x \in D,\, \|x - x^*\| \ge \eta} f(x) > f(x^*)$ for all $\eta > 0$. Then $X_n^* \to x^*$ a.s.

Verifying condition (3) in Proposition 2.1 is difficult, if not impossible, in a practical setting where the constraint functions are black-box because the feasible region $D$ is not known. A much simpler condition to verify is given by the next proposition. Before we proceed, we make the following assumptions.

Assumptions A. Consider a CBOP$(f, G, [l, u])$ with a compact feasible region $D := \{x \in \mathbb{R}^d : l \le x \le u, \ G(x) \le 0\}$. Suppose $f^* := \inf_{x \in D} f(x) > -\infty$ and that $f$ is continuous at a global minimizer of the problem. Moreover, suppose $D$ has a nonempty interior and that every neighborhood of a boundary point of $D$ intersects the interior of $D$.

Throughout this paper, a neighborhood of a point $z \in \mathbb{R}^d$ is a closed ball of some radius $\delta > 0$ centered at $z$, denoted by $B(z, \delta) := \{x \in \mathbb{R}^d : \|x - z\| \le \delta\}$.

Proposition 2.3 Suppose Assumptions A hold for a CBOP$(f, G, [l, u])$ and suppose that a GARSCO algorithm applied to the problem satisfies the condition: $\forall z \in [l, u]$ and $\delta > 0$, $\exists\, 0 < \nu(z, \delta) < 1$ such that $P[Y_{n_k} \in B(z, \delta) \cap [l, u] \mid \sigma(E_{(n_k)-1})] \ge \nu(z, \delta)$ for some subsequence $\{n_k\}_{k \ge 1}$. If $V_G(x)$ is a constraint violation function for $G$ over $[l, u]$ and $\{X_n^*\}_{n \ge 1}$ is defined by (2), then $f(X_n^*) \to f^*$ a.s. Moreover, if $x^*$ is the unique global minimizer of $f$ over $D$ in the sense of Proposition 2.2, then $X_n^* \to x^*$ a.s.

Proof We show that the conditions of Proposition 2.1 are satisfied. Suppose $f$ is continuous at a global minimizer $x^*$ of the CBOP. Fix $\epsilon > 0$ and an integer $k \ge 1$. Since $f$ is continuous at $x^*$, $\exists\, \delta(\epsilon) > 0$ such that $|f(x) - f(x^*)| < \epsilon$ whenever $\|x - x^*\| < \delta(\epsilon)$.

Hence,

$$P[X_{n_k} \in D : f(X_{n_k}) < f(x^*) + \epsilon \mid \sigma(E_{(n_k)-1})] \ge P[X_{n_k} \in D : \|X_{n_k} - x^*\| < \delta(\epsilon) \mid \sigma(E_{(n_k)-1})]$$
$$= P[X_{n_k} \in B(x^*, \delta(\epsilon)) \cap D \mid \sigma(E_{(n_k)-1})] \ge P[Y_{n_k} \in B(x^*, \delta(\epsilon)) \cap D \mid \sigma(E_{(n_k)-1})].$$

Next, since $D$ is closed, it follows that $D = \mathrm{int}(D) \cup \mathrm{bd}(D)$, where $\mathrm{int}(D)$ is the interior of $D$ and $\mathrm{bd}(D)$ is the boundary of $D$. There are two cases to consider: (1) $x^* \in \mathrm{int}(D)$; and (2) $x^* \in \mathrm{bd}(D)$.

First, suppose $x^* \in \mathrm{int}(D)$. Then $\exists\, 0 < \eta_1 \le \delta(\epsilon)$, where $\eta_1$ depends on $\delta(\epsilon)$, such that $B(x^*, \eta_1) \subseteq D$. Now observe that

$$P[Y_{n_k} \in B(x^*, \delta(\epsilon)) \cap D \mid \sigma(E_{(n_k)-1})] \ge P[Y_{n_k} \in B(x^*, \eta_1) \cap [l, u] \mid \sigma(E_{(n_k)-1})] \ge \nu(x^*, \eta_1) =: L_1(\epsilon) > 0$$

(since $\eta_1$ depends only on $\epsilon$). Next, suppose $x^* \in \mathrm{bd}(D)$. By assumption, $\exists\, z \in \mathrm{int}(D) \cap B(x^*, \delta(\epsilon))$, where $z$ depends on $\delta(\epsilon)$. Since $z \in \mathrm{int}(D)$, $\exists\, 0 < \eta_2 \le \delta(\epsilon)$, where $\eta_2$ also depends on $z$ and $\delta(\epsilon)$, such that $B(z, \eta_2) \subseteq D$. Again,

$$P[Y_{n_k} \in B(x^*, \delta(\epsilon)) \cap D \mid \sigma(E_{(n_k)-1})] \ge P[Y_{n_k} \in B(z, \eta_2) \cap [l, u] \mid \sigma(E_{(n_k)-1})] \ge \nu(z, \eta_2) =: L_2(\epsilon) > 0$$

(since $z$ and $\eta_2$ depend only on $\epsilon$). Define $L(\epsilon) := L_1(\epsilon)$ if $x^* \in \mathrm{int}(D)$ and $L(\epsilon) := L_2(\epsilon)$ if $x^* \in \mathrm{bd}(D)$. Clearly, $0 < L(\epsilon) < 1$ and

$$P[X_{n_k} \in D : f(X_{n_k}) < f(x^*) + \epsilon \mid \sigma(E_{(n_k)-1})] \ge P[Y_{n_k} \in B(x^*, \delta(\epsilon)) \cap D \mid \sigma(E_{(n_k)-1})] \ge L(\epsilon) > 0.$$

By Proposition 2.1, $f(X_n^*) \to f^*$ a.s.

Proposition 2.3 says that a GARSCO algorithm converges to the global minimum almost surely if, in addition to the conditions in Assumptions A, it has a subsequence of iterations $\{n_k\}_{k \ge 1}$ where the random trial iterate $Y_{n_k}$ hits any ball of positive radius centered at a point in the search space $[l, u]$ with positive probability. These conditions are satisfied in the special case where the random vector $Y_{n_k}$ has a uniform distribution over the search space $[l, u]$, as shown in Corollary 2.1. Consequently, any search method for problem (1) can be made to converge to the global minimum almost surely if one interjects a uniform random sampling step over the search space $[l, u]$ for a subsequence of the iterations, regardless of what the method is doing in its actual iterations. In practice though, it would be more efficient if the subsequence of iterations responsible for convergence somehow interacts with the rest of the search method. For example, one can use Gaussian iterations where the covariance matrix is chosen so that the trial random vector iterates are more likely to be generated in promising regions based on the history of points visited by the algorithm.
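This recipe is easy to implement: wrap any proposal mechanism and overwrite every, say, tenth trial iterate with a uniform draw. The sketch below is schematic (the `inner_step` proposal rule is hypothetical, and the simulation and best-point bookkeeping of Algorithms A1-A2 are omitted):

```python
import numpy as np

def trial_iterates_with_uniform_steps(inner_step, l, u, n_sim, every=10, seed=None):
    """Generate trial iterates Y_1, Y_2, ... where every `every`-th iterate is
    drawn uniformly from [l, u]. By Corollary 2.1 (via Proposition 2.3), the
    uniform subsequence alone guarantees f(X_n^*) -> f^* a.s. under
    Assumptions A, regardless of what `inner_step` proposes in between.
    """
    rng = np.random.default_rng(seed)
    history = []
    for n in range(1, n_sim + 1):
        if n % every == 0:
            y = rng.uniform(l, u)      # the convergence-guaranteeing subsequence n_k
        else:
            y = inner_step(history)    # arbitrary heuristic proposal
        history.append(y)
        yield y
```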

The condition in Assumptions A that $f$ is continuous at a global minimizer over $D$ is important. Suppose the function $h$ has a unique global minimizer $x^*$ over $D$ and that $h$ is continuous at $x^*$. Define the function $f$ over $D$ as follows: $f(x) := h(x)$ for all $x \in D$, $x \ne x^*$, and $f(x^*) := h(x^*) - 1$. Clearly, $f$ has a unique global minimizer at $x^*$, but it is discontinuous at that point. Now even if the rest of Assumptions A and the conditions in Proposition 2.3 hold, $f(X_n^*)$ cannot converge to $f(x^*)$ a.s.

The condition in Assumptions A that every neighborhood around every boundary point of $D$ intersects $\mathrm{int}(D)$ is not difficult to satisfy. For example, the following proposition shows that this condition is satisfied under some mild assumptions. The proof is also in the Appendix.

Proposition 2.4 If $S$ is a nonempty set in $\mathbb{R}^d$ such that $\mathrm{cl}(\mathrm{int}(S)) = \mathrm{cl}(S)$, then every neighborhood of a boundary point of $S$ intersects the interior of $S$. In particular, if $C$ is a closed convex set in $\mathbb{R}^d$ with a nonempty interior, then every neighborhood of a boundary point of $C$ intersects the interior of $C$.

Although the condition in Proposition 2.4 is satisfied by convex sets $D$ with a nonempty interior, convexity of $D$ is not necessary. It is easy to create examples where $\mathrm{cl}(\mathrm{int}(D)) = \mathrm{cl}(D)$ but $D$ is nonconvex.

The next proposition is a modification of Theorem 4 from [18] applied to problem (1). Since $D$ is not known, some of the conditions of the proposition use the search space $[l, u]$ instead of $D$.

Proposition 2.5 Suppose Assumptions A hold for problem (1). Moreover, suppose a GARSCO algorithm applied to the problem has the property that there is a subsequence $\{n_k\}_{k \ge 1}$ such that for each $k \ge 1$, $Y_{n_k}$ has a conditional density $g_{n_k}(y \mid \sigma(E_{(n_k)-1}))$ satisfying the condition: $\mu(\{y \in [l, u] : h(y) = 0\}) = 0$, where $h(y) := \inf_{k \ge 1} g_{n_k}(y \mid \sigma(E_{(n_k)-1}))$ and $\mu$ is the Lebesgue measure on $\mathbb{R}^d$. Then $f(X_n^*) \to f^*$ a.s.

Proof Fix $\delta > 0$ and $z \in [l, u]$. For all $k \ge 1$,

$$P[Y_{n_k} \in B(z, \delta) \cap [l, u] \mid \sigma(E_{(n_k)-1})] = \int_{B(z,\delta) \cap [l,u]} g_{n_k}(y \mid \sigma(E_{(n_k)-1}))\, dy \ge \int_{B(z,\delta) \cap [l,u]} h(y)\, dy =: \nu(z, \delta).$$

Since $h(y)$ is a nonnegative function on $[l, u]$, $\mu(\{y \in [l, u] : h(y) = 0\}) = 0$ and $\mu(B(z, \delta) \cap [l, u]) > 0$, it follows that $\nu(z, \delta) > 0$. By Proposition 2.3, $f(X_n^*) \to f^*$ a.s.

Next, we apply the previous propositions to distributions commonly used in practice, such as the uniform distribution, the Gaussian distribution, or the more general class of elliptical distributions.

Corollary 2.1 Suppose Assumptions A hold for problem (1). Moreover, suppose a GARSCO algorithm applied to the problem has the property that there is a subsequence $\{n_k\}_{k \ge 1}$ such that for each $k \ge 1$, $Y_{n_k}$ has a uniform distribution on $[l, u]$. Then $f(X_n^*) \to f^*$ a.s. In particular, Algorithm A1 converges to the global minimum of $f$ over $D$ a.s.

Next, we consider GARSCO algorithms that use elliptical distributions to generate their iterates. The class of elliptical distributions includes the multivariate Normal and Cauchy distributions. Let $Z : (\Omega, \mathcal{B}) \to (\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ be a random vector that has an elliptical distribution. If $Z$ has a density, then it has the form ([21])

$$g(z) = \gamma\, [\det(C)]^{-1/2}\, \Psi((z - u)^T C^{-1} (z - u)), \quad z \in \mathbb{R}^d,$$

where $u \in \mathbb{R}^d$, $C$ is a symmetric and positive definite matrix, $\Psi$ is a nonnegative function over the positive reals such that $\int_0^\infty z^{(d/2)-1} \Psi(z)\, dz < \infty$, and $\gamma$ is the normalizing constant given by

$$\gamma := \left( \frac{2 \pi^{d/2}}{\Gamma(d/2)} \int_0^\infty z^{d-1} \Psi(z^2)\, dz \right)^{-1}. \qquad (4)$$

Elliptical distributions generalize widely used probability distributions, including the multivariate Gaussian distribution ($\Psi(y) = e^{-y/2}$). The following result states that GARSCO algorithms that use elliptical distributions, where $\Psi$ is monotonically nonincreasing and the eigenvalues of the symmetric and positive definite matrix $C$ are bounded away from 0, converge to the global minimum almost surely. In the notation below, $\lambda_{\min}(C)$ and $\lambda_{\max}(C)$ denote the smallest and largest eigenvalues of $C$. Moreover, $U_k = \Phi_k(E_{(n_k)-1})$ represents a deterministic function of the random elements in $E_{(n_k)-1}$. A typical setting in practice is $U_k = X_{(n_k)-1}^*$, which is the best solution after $(n_k) - 1$ simulations. However, $U_k$ could also be any random vector whose realization is in the search space $[l, u]$.
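Before stating the result, note that its eigenvalue condition (property (P2) below) is easy to enforce in an implementation by flooring the spectrum of each covariance matrix. A small sketch, assuming NumPy, for the Gaussian member of the family ($\Psi(y) = e^{-y/2}$):

```python
import numpy as np

def floor_eigenvalues(C, lam_min=1e-6):
    """Return a symmetric positive definite matrix with all eigenvalues
    >= lam_min, so that inf_k lambda_min(C_k) > 0 (property (P2) below)."""
    w, V = np.linalg.eigh(C)                      # C is symmetric
    return (V * np.maximum(w, lam_min)) @ V.T     # V diag(max(w, lam_min)) V^T

def elliptical_gaussian_trial(u_center, C, seed=None):
    """Draw Y = U + Z with Z ~ N(0, C): the elliptical case with
    Psi(y) = exp(-y/2), which is monotonically nonincreasing (property (P1))."""
    rng = np.random.default_rng(seed)
    return np.asarray(u_center) + rng.multivariate_normal(np.zeros(C.shape[0]), C)
```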

Proposition 2.6 Suppose Assumptions A hold for problem (1). Moreover, suppose a GARSCO algorithm applied to the problem satisfies the condition: There is a subsequence $\{n_k\}_{k \ge 1}$ such that for each $k \ge 1$, $Y_{n_k} = U_k + Z_k$, where $U_k = \Phi_k(E_{(n_k)-1})$ for some deterministic function $\Phi_k$ and $Z_k$ is a random vector whose conditional distribution given $\sigma(E_{(n_k)-1})$ is an elliptical distribution with conditional density

$$q_k(z \mid \sigma(E_{(n_k)-1})) = \gamma\, [\det(C_k)]^{-1/2}\, \Psi(z^T C_k^{-1} z), \quad z \in \mathbb{R}^d,$$

where $\gamma$ is defined in (4). Furthermore, suppose the following properties hold: (P1) $\Psi$ is monotonically nonincreasing; and (P2) $\inf_{k \ge 1} \lambda_{\min}(C_k) > 0$. Then $f(X_n^*) \to f^*$ a.s. In particular, Algorithm A2 converges to the global minimum of $f$ over $D$ a.s. provided $\inf_{k \ge 1} \lambda_{\min}(C_k) > 0$.

2.4 An Evolutionary Algorithm for Constrained Optimization

In this section, one of the convergence results from the previous section is applied to an evolutionary programming (EP) algorithm for constrained optimization. Below is a pseudo-code of the $(\mu + \mu)$-EP for constrained black-box optimization described in [22]. This evolutionary algorithm solves a problem (1) by using only a Gaussian mutation operator, and the mutations on the components of a parent solution vector are independent (i.e., the covariance matrix of the random vector of Gaussian mutations is a diagonal matrix). The notation in Section 2.1 is used in the algorithm below.

Each individual in the EP below is a pair $(X_n, C_n)$, where $X_n$ is the $n$th point where $f$ and $G$ are evaluated and $C_n$ is the diagonal covariance matrix associated with $X_n$ that is used to generate an offspring. The initial parent population (generation 0) is denoted by $\mathcal{P}(0) := \{P_1(0), \ldots, P_\mu(0)\} := \{(X_1, C_1), \ldots, (X_\mu, C_\mu)\}$. Moreover, for $t \ge 1$, the offspring population of generation $t$ is denoted by $\mathcal{M}(t) := \{(X_{t\mu+1}, C_{t\mu+1}), \ldots, (X_{t\mu+\mu}, C_{t\mu+\mu})\}$ and the parent population at the end of generation $t$ is denoted by $\mathcal{P}(t) := \{P_1(t), \ldots, P_\mu(t)\}$. The offspring population $\mathcal{M}(t)$ is generated by applying a mutation operator to each of the $\mu$ parents of the previous generation $\mathcal{P}(t-1)$, and the parent population $\mathcal{P}(t)$ is obtained from the offspring of the current generation $\mathcal{M}(t)$ and the parent population at the end of the previous generation $\mathcal{P}(t-1)$. For convenience, the individuals in the parent population $\mathcal{P}(t)$ are denoted by $\mathcal{P}(t) = \{P_1(t), \ldots, P_\mu(t)\} := \{(\widetilde{X}_{t\mu+1}, \widetilde{C}_{t\mu+1}), \ldots, (\widetilde{X}_{t\mu+\mu}, \widetilde{C}_{t\mu+\mu})\}$. In addition, the solution vector and the covariance matrix associated with the $i$th parent at the end of generation $t$ are denoted by $X(P_i(t)) := \widetilde{X}_{t\mu+i}$ and $C(P_i(t)) := \widetilde{C}_{t\mu+i}$.

Algorithm A4: Evolutionary Programming for Constrained Black-Box Optimization

Inputs: (i) CBOP$(f, G, [l, u])$; (ii) CV function $V_G : [l, u] \to \mathbb{R}_+$; (iii) Number of offspring (also the number of parents) in every generation, denoted by $\mu$; (iv) Initial and minimum standard deviations of the Gaussian mutations, denoted by $\sigma_{\mathrm{init}} > 0$ and $\sigma_{\min} > 0$, respectively; (v) Absorbing transformation $\rho_{[l,u]} : \mathbb{R}^d \to [l, u]$.

Step 1. (Initialize Parent Population) Set $t = 0$ and for each $i = 1, 2, \ldots, \mu$, generate $Y_i$ according to some probability distribution whose realizations are on $\mathbb{R}^d$, where $Y_i$ possibly depends on $Y_1, \ldots, Y_{i-1}$, and set $X_i := \rho_{[l,u]}(Y_i)$. Moreover, for each $i = 1, 2, \ldots, \mu$, set $P_i(0) := (X_i, C_i)$, where $C_i := \sigma_{\mathrm{init}}^2 I_d$.
Step 2. (Evaluate the Initial Parent Population) For each $i = 1, 2, \ldots, \mu$, evaluate $f(X_i)$ and $G(X_i)$.
Step 3. (Iterate) While the termination condition is not satisfied do
Step 3.1. (Update Generation Counter) Reset $t := t + 1$.
Step 3.2. (Generate Offspring by Mutation) For each $i = 1, 2, \ldots, \mu$, set $(Y_{t\mu+i}, C_{t\mu+i}) := \mathrm{Mut}(P_i(t-1))$ and $X_{t\mu+i} := \rho_{[l,u]}(Y_{t\mu+i})$.
Step 3.3. (Evaluate the Offspring) For each $i = 1, 2, \ldots, \mu$, evaluate $f(X_{t\mu+i})$ and $G(X_{t\mu+i})$.
Step 3.4. (Select New Parent Population) $\mathcal{P}(t) := \mathrm{Sel}(\mathcal{P}(t-1) \cup \mathcal{M}(t))$ (see below for explanation).
End.

Recall that for $t \ge 1$, $\mathcal{P}(t) = \{P_1(t), \ldots, P_\mu(t)\} = \{(\widetilde{X}_{t\mu+1}, \widetilde{C}_{t\mu+1}), \ldots, (\widetilde{X}_{t\mu+\mu}, \widetilde{C}_{t\mu+\mu})\}$. Now in Step 3.2, the mutation operator is defined as follows: For each $t \ge 1$ and $i = 1, 2, \ldots, \mu$,

$$Y_{t\mu+i} := X(P_i(t-1)) + Z_{t\mu+i} = \widetilde{X}_{(t-1)\mu+i} + Z_{t\mu+i},$$

where $Z_{t\mu+i}$ is a random vector whose conditional distribution given $\sigma(E_{t\mu+i-1})$ (defined below) is a Gaussian distribution with mean vector $0$ and diagonal covariance matrix

$$\mathrm{Cov}(Z_{t\mu+i}) = C(P_i(t-1)) = \widetilde{C}_{(t-1)\mu+i} = \mathrm{diag}\left( \big(\sigma^{(1)}_{t\mu+i}\big)^2, \big(\sigma^{(2)}_{t\mu+i}\big)^2, \ldots, \big(\sigma^{(d)}_{t\mu+i}\big)^2 \right).$$

Note that $\sigma^{(1)}_{t\mu+i}, \ldots, \sigma^{(d)}_{t\mu+i}$ are the standard deviations of the Gaussian mutations for the $d$ components of the solution vector $X(P_i(t-1))$. Moreover, for $t \ge 1$ and $i = 1, \ldots, \mu$,

$$C_{t\mu+i} := C(P_i(t-1)) \cdot \mathrm{diag}\left( \exp(\tau' \xi^{(0)}_{t,i} + \tau \xi^{(1)}_{t,i}),\ \exp(\tau' \xi^{(0)}_{t,i} + \tau \xi^{(2)}_{t,i}),\ \ldots,\ \exp(\tau' \xi^{(0)}_{t,i} + \tau \xi^{(d)}_{t,i}) \right)$$
$$= \widetilde{C}_{(t-1)\mu+i} \cdot \exp(\tau' \xi^{(0)}_{t,i}) \cdot \mathrm{diag}\left( \exp(\tau \xi^{(1)}_{t,i}),\ \exp(\tau \xi^{(2)}_{t,i}),\ \ldots,\ \exp(\tau \xi^{(d)}_{t,i}) \right),$$

where $\tau' = 1/\sqrt{2\sqrt{d}}$ and $\tau = 1/\sqrt{2d}$ (Bäck 1993), and $\xi^{(0)}_{t,i}, \xi^{(1)}_{t,i}, \ldots, \xi^{(d)}_{t,i}$ are iid standard Normal random variables. For convenience, define the random vector $\Xi_{t,i} := [\xi^{(0)}_{t,i}, \xi^{(1)}_{t,i}, \ldots, \xi^{(d)}_{t,i}]$ for all $t \ge 1$, $i = 1, \ldots, \mu$. In addition, to prevent the standard deviations of the Gaussian mutations from becoming too small, a minimum standard deviation $\sigma_{\min}$ is used. That is, for $t \ge 1$, $i = 1, \ldots, \mu$ and $k = 1, \ldots, d$, $C_{t\mu+i}(k, k) = \max(C_{t\mu+i}(k, k), \sigma_{\min}^2)$. It will be shown later (see proof of Proposition 2.7) that

$$E_{t\mu+i-1} = \{Y_1, \ldots, Y_\mu\} \cup \left( \bigcup_{s=1}^{t-1} \bigcup_{j=1}^{\mu} \{Z_{s\mu+j}, \Xi_{s,j}\} \right) \cup \left( \bigcup_{j=1}^{i-1} \{Z_{t\mu+j}, \Xi_{t,j}\} \right).$$

In Step 3.4, the selection of the parent solutions for the next generation is usually accomplished by probabilistic $q$-tournament selection as described in Bäck (1993). As $q$ increases, this $q$-tournament selection procedure becomes more and more greedy. For simplicity, we assume that the selection of parent solutions proceeds in a completely greedy manner. That is, $\mathcal{P}(t)$ is simply the collection of the $\mu$ best solutions from $\mathcal{P}(t-1) \cup \mathcal{M}(t)$ in terms of the objective function $f(x)$ and the constraint violation function $V_G(x)$.

The next result, whose proof is in the Appendix, shows that the above EP follows the GARSCO framework, and it also shows almost sure convergence to the global minimum.

Proposition 2.7 The EP in Algorithm A4 follows the GARSCO framework. Moreover, if this EP is applied to a CBOP$(f, G, [l, u])$ such that Assumptions A hold, then $f(X_n^*) \to f^*$ a.s.
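A Python sketch of the mutation operator Mut above, with diagonal covariances stored as variance vectors (the helper name and this representation are illustrative, not from [22]):

```python
import numpy as np

def ep_mutate(x_parent, c_diag, l, u, sigma_min, seed=None):
    """One application of Mut (Step 3.2 of Algorithm A4), sketched.

    c_diag is the diagonal of the parent's covariance matrix C(P_i(t-1)),
    i.e., the current mutation variances. Z is drawn with these variances,
    and the offspring's variances are self-adapted log-normally with
    tau' = 1/sqrt(2*sqrt(d)) and tau = 1/sqrt(2*d), then floored at
    sigma_min^2 so the Gaussian mutations cannot collapse.
    """
    rng = np.random.default_rng(seed)
    d = len(x_parent)
    tau_p = 1.0 / np.sqrt(2.0 * np.sqrt(d))
    tau = 1.0 / np.sqrt(2.0 * d)
    z = np.sqrt(c_diag) * rng.standard_normal(d)     # Z ~ N(0, C(P_i(t-1)))
    x = np.minimum(np.maximum(x_parent + z, l), u)   # absorb Y = X + Z into [l, u]
    xi0 = rng.standard_normal()                      # common factor xi^(0)_{t,i}
    xi = rng.standard_normal(d)                      # xi^(1..d)_{t,i}
    c_new = np.maximum(c_diag * np.exp(tau_p * xi0 + tau * xi),
                       sigma_min ** 2)               # offspring covariance diagonal
    return x, c_new
```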

3 Multi-Objective Constrained Black-Box Optimization

3.1 Problem Statement and Preliminaries

The goal of this section is to prove some convergence results for adaptive stochastic search methods for solving the following multi-objective constrained black-box optimization problem (MCBOP):

$$\min F(x) := (f_1(x), \ldots, f_k(x)) \quad \text{s.t.} \quad G(x) := (g_1(x), \ldots, g_m(x)) \le 0 \ \text{ and } \ l \le x \le u \qquad (5)$$

Here, $f_i : \mathbb{R}^d \to \mathbb{R}$, $i = 1, \ldots, k$, and $g_j : \mathbb{R}^d \to \mathbb{R}$, $j = 1, \ldots, m$, are again deterministic black-box measurable functions and $l, u \in \mathbb{R}^d$. As before, let $D$ be the feasible region and assume that one simulation yields the values of $F$ and $G$ at a given input. This problem is denoted by MCBOP$(F, G, [l, u])$. In the event that the objective and constraint functions are not black-box and their mathematical forms are actually known, it might be more efficient to take advantage of the mathematical structures of these functions and possibly their gradients. The reader is referred to a standard textbook on nonlinear multi-objective optimization (e.g., [23]) for more suitable methods that can be used.

Consider an MCBOP$(F, G, [l, u])$ of the form (5). We employ some terminology found in standard texts on multi-objective optimization (e.g., [23]). Below are some basic terms.

Definition 3.1 Given $x, y \in D$, we say that $x$ dominates $y$, written $x \prec y$, iff $f_i(x) \le f_i(y)$ for all $i = 1, \ldots, k$ and $f_j(x) < f_j(y)$ for some $j$.

Definition 3.2 A point $x^* \in D$ is said to be a (global) Pareto minimizer of $F$ over $D$ iff there is no $y \in D$ s.t. $y \prec x^*$. The Pareto set of $F$ over $D$ is the set of all global Pareto minimizers of $F$ over $D$. The Pareto front of $F$ over $D$ is the image of the Pareto set under the mapping $F$, i.e., it is the set of objective vectors $\{F(x^*) : x^* \text{ is a global Pareto minimizer of } F \text{ over } D\}$.

Ideally, we wish to determine, or at least characterize, the entire Pareto set and Pareto front of $F$ over $D$. However, for many practical problems, the Pareto front is an infinite set, and so, one can only hope to find a finite representative subset of the Pareto set and Pareto front. In practice, many stochastic algorithms strive to find a non-dominated subset of objective vectors, sometimes with no guarantee of Pareto optimality. The solutions found are then presented to a decision maker who selects one or a few non-dominated solutions for implementation. In this paper, we would like to develop stochastic algorithms that are guaranteed to converge to Pareto optimal solutions. Even better would be to design algorithms that can, in principle, find every Pareto optimal point. In some situations, it is convenient to focus on convex MCBOPs, which are defined next.

Definition 3.3 The above MCBOP is convex iff all the objective functions $f_1, \ldots, f_k$ and the feasible region $D := \{x \in \mathbb{R}^d : l \le x \le u, \ G(x) \le 0\}$ are convex.

For example, the feasible region $D$ is convex if the inequality constraint functions $g_1, \ldots, g_m$ are all quasi-convex. A special case of a convex MCBOP occurs when $F : \mathbb{R}^d \to \mathbb{R}^k$ is a linear function ($F(x) = Mx$ for some matrix $M \in \mathbb{R}^{k \times d}$) and $G : \mathbb{R}^d \to \mathbb{R}^m$ is an affine function (i.e., $G(x) = Ax + v$ for some matrix $A \in \mathbb{R}^{m \times d}$ and vector $v \in \mathbb{R}^m$). This is equivalent to the case where each of the objective functions $f_1, \ldots, f_k$ and each of the inequality constraint functions $g_1, \ldots, g_m$ is linear.

Since this paper assumes that the objective and constraint functions in (5) are black-box, it is not really possible to know whether the given MCBOP is convex. However, it would still be valuable to explore convergence results for the convex case since it is more mathematically tractable. Moreover, any convergence result proved for the convex case will hold in the event that the black-box MCBOP happens to be convex (even if this is unknown to the user of the algorithm).

3.2 Weighted Combination of Objective Functions

A basic method for finding Pareto optimal solutions is to convert problem (5) into a single-objective optimization problem where the objective function is a weighted combination of $f_1, \ldots, f_k$. More precisely, consider an MCBOP$(F, G, [l, u])$ and for each $\lambda \in \mathbb{R}^k$, $\lambda \ge 0$, define problem MCBOP$'(\lambda)$ as follows:

$$\text{(MCBOP}'(\lambda))\qquad \min\ \lambda^T F(x) = \sum_{i=1}^k \lambda_i f_i(x) \quad \text{s.t.} \quad x \in D := \{x \in \mathbb{R}^d : l \le x \le u, \ G(x) \le 0\}$$

The following result, from Part II of [23], has been well-known since the time of Geoffrion [24]. It says that by choosing weights that are strictly positive, an optimal solution to the above problem with the weighted combination of objectives will always yield a Pareto optimal solution.

Proposition 3.1 For any $\lambda \in \mathbb{R}^k$, $\lambda > 0$, an optimal solution to the problem MCBOP$'(\lambda)$ is a Pareto minimizer for the original MCBOP.

Unfortunately, it is also well-known that the previous result is not guaranteed to yield all possible Pareto optimal points unless the MCBOP is convex. Below is a result from Part II of [23].

Proposition 3.2 Suppose an MCBOP$(F, G, [l, u])$ is convex. For any Pareto optimal solution $x^*$ to this MCBOP, $\exists\, \lambda \in \mathbb{R}^k$, $\lambda > 0$, such that $x^*$ is an optimal solution of MCBOP$'(\lambda)$.

However, Soland [25] proved that by adding upper bounds on the objective functions, the resulting problem will be able to generate the entire Pareto set. More precisely, consider an MCBOP$(F, G, [l, u])$. For each $\lambda, b \in \mathbb{R}^k$, $\lambda > 0$, define problem MCBOP$'(\lambda, b)$ as follows:

$$\text{(MCBOP}'(\lambda, b))\qquad \min\ \lambda^T F(x) = \sum_{i=1}^k \lambda_i f_i(x) \quad \text{s.t.} \quad x \in D, \ F(x) \le b$$

The following result characterizes all Pareto optimal points of the given MCBOP$(F, G, [l, u])$.

Proposition 3.3 (Soland [25]) Fix $\lambda \in \mathbb{R}^k$, $\lambda > 0$. Then $x^*$ is a Pareto optimal solution to the MCBOP$(F, G, [l, u])$ problem if and only if $x^*$ is optimal in problem MCBOP$'(\lambda, b)$ for some $b \in \mathbb{R}^k$.

Because of Propositions 3.2 and 3.3, we can obtain Pareto optimal solutions by fixing $\lambda, b \in \mathbb{R}^k$, $\lambda > 0$, and then applying a GARSCO algorithm from the previous section. In particular, the following procedure can be used to generate Pareto optimal solutions for the MCBOP in (5):

Algorithm B0. Generalized Adaptive Random Search on Weighted Objectives with Constraints

Inputs: (i) MCBOP$(F, G, [l, u])$; (ii) A GARSCO algorithm for finding the global minimum of a weighted combination of the objective functions over $[l, u]$ subject to the given inequality constraints $G(x) \le 0$; (iii) Number of iterations (or maximum number of Pareto optimal solutions to obtain), denoted by $T$.

Step 0. Initialize the Pareto set $\mathcal{X}_0 := \emptyset$ and the Pareto front $\mathcal{F}_0 := \emptyset$. Set the iteration counter $t = 1$.
Step 1. Select a particular $\lambda_t \in \mathbb{R}^k$, $\lambda_t > 0$.
Step 2. Randomly generate points uniformly over $[l, u]$ until a feasible point $\bar{x}_t$ is obtained.
Step 3. Calculate the objective vector $b_t$ of the feasible point found in Step 2, i.e., $b_t := F(\bar{x}_t)$.
Step 4. Run the GARSCO algorithm to solve MCBOP$'(\lambda_t, b_t)$ and let $x_t^*$ be the solution found.
Step 5. Update $\mathcal{X}_t := \mathcal{X}_{t-1} \cup \{x_t^*\}$ and $\mathcal{F}_t := \mathcal{F}_{t-1} \cup \{F(x_t^*)\}$.
Step 6. If $t \ge T$, then stop and return $\mathcal{X}_t$ and $\mathcal{F}_t$; else, set $t \leftarrow t + 1$ and go back to Step 1.
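A sketch of Algorithm B0 in Python, where `garsco_solve` is a placeholder for any GARSCO algorithm from Section 2; it is assumed to return a global minimizer of its first argument subject to its second argument being $\le 0$ and the bounds:

```python
import numpy as np

def algorithm_b0(F, G, l, u, garsco_solve, T, seed=None):
    """Generate up to T Pareto optimal points by solving MCBOP'(lambda_t, b_t)."""
    rng = np.random.default_rng(seed)
    k = len(F(l))                                  # number of objectives
    pareto_set, pareto_front = [], []              # Step 0
    for _ in range(T):
        lam = rng.random(k) + 1e-12                # Step 1: weights lambda_t > 0
        while True:                                # Step 2: feasible point by
            x_bar = rng.uniform(l, u)              #         uniform sampling
            if np.all(G(x_bar) <= 0):
                break
        b = F(x_bar)                               # Step 3: upper bounds b_t
        # Step 4: the bounds F(x) <= b are appended to G(x) <= 0 as extra
        # inequality constraints of the single-objective subproblem.
        x_t = garsco_solve(lambda x: float(lam @ F(x)),
                           lambda x: np.concatenate([G(x), F(x) - b]),
                           l, u)
        pareto_set.append(x_t)                     # Step 5
        pareto_front.append(F(x_t))
    return pareto_set, pareto_front                # Step 6
```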

In Step 0, the Pareto set, Pareto front and iteration counter are initialized. Then in Step 1, the vector of weights for the objective functions (denoted by $\lambda_t$) is chosen. Proposition 3.3 says that any choice of $\lambda_t > 0$ should be able to generate all Pareto optimal solutions using suitable upper bounds. These weights could be the same for all iterations, they may be chosen uniformly at random over the unit simplex $\{\lambda \in \mathbb{R}^k : \lambda > 0, \ \lambda^T 1_k = 1\}$, or they may incorporate some information about a decision maker's preferences. In practice, it makes sense to set the weights according to the relative importance of the objective functions if such information is available. Next, Step 2 repeatedly generates a point uniformly at random over the search space $[l, u]$ until a feasible point is obtained. Since it was assumed that the feasible region has a nonempty interior, a feasible point will be obtained in a finite number of trials with probability 1. The purpose of this step is to find suitable upper bounds for the objective functions so that the resulting single-objective optimization problem in Step 4 will be feasible. In Step 3, we calculate the objective vector $b_t$ corresponding to the feasible point obtained in Step 2. Then, in Step 4, a GARSCO algorithm is used to solve the MCBOP$'(\lambda_t, b_t)$ problem. Proposition 3.3 says that any optimal solution $x_t^*$ to MCBOP$'(\lambda_t, b_t)$ is guaranteed to be Pareto optimal. Next, the Pareto set and Pareto front are updated in Step 5. Finally, in Step 6, the Pareto set and Pareto front are returned if the number of iterations has reached $T$; otherwise, the iteration counter is incremented and the algorithm goes back to Step 1.

If the given MCBOP is known to be convex (though perhaps the exact mathematical forms of its objective and constraint functions are unknown), Step 3 of Algorithm B0 can be removed. Moreover, Step 4 then uses the chosen GARSCO algorithm to solve the MCBOP$'(\lambda_t)$ problem. This applies to the special case where the objective and constraint functions are all linear. However, if this is known in advance, then there are more suitable and efficient approaches that can be used (e.g., see [23]).

3.3 Adaptive Stochastic Search Algorithms for Constrained Multi-Objective Optimization

In the previous section, we applied a GARSCO algorithm to the weighted combination of the objectives subject to additional constraints giving upper bounds on the objectives. However, we would also like to analyze stochastic algorithms that work directly on the above MCBOP problem without scalarization. To do this, we extend the notion of domination to infeasible points, as given in the next definition.

Definition 3.4 Consider an MCBOP$(F, G, [l, u])$ with feasible region $D$ and let $V_G(x)$ be a constraint violation function for $G$ over $[l, u]$. Given $x, y \in [l, u]$, we say that $x$ dominates $y$, written $x \prec y$, iff any one of the following conditions holds: (i) $x, y \in D$ and $f_i(x) \le f_i(y)$ for all $i = 1, \ldots, k$ and $f_j(x) < f_j(y)$ for some $j$; (ii) $x \in D$ but $y \notin D$; or (iii) $x, y \notin D$ and $V_G(x) < V_G(y)$.
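The extended domination of Definition 3.4 translates directly into code (a sketch; feasibility is represented by a zero CV value):

```python
def dominates(fx, vx, fy, vy):
    """Extended domination of Definition 3.4: does x dominate y?

    fx, fy: objective vectors F(x), F(y); vx, vy: violations V_G(x), V_G(y),
    where v == 0 means the point is feasible.
    """
    if vx == 0.0 and vy == 0.0:      # (i) both feasible: usual Pareto domination
        return (all(a <= b for a, b in zip(fx, fy))
                and any(a < b for a, b in zip(fx, fy)))
    if vx == 0.0:                    # (ii) x feasible, y infeasible
        return True
    if vy == 0.0:                    # y feasible, x infeasible: x cannot dominate
        return False
    return vx < vy                   # (iii) both infeasible: smaller violation
```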

The next proposition says that the sequence of best solutions for any of the single-objective weighted combination problems obtained from (5), for any choice of positive weights, is always captured by the set of all nondominated points at any given iteration.

Proposition 3.4 Consider any algorithm for an MCBOP$(F, G, [l, u])$ that evaluates $F(x)$ and $G(x)$ at a sequence of points $\{x_n\}_{n \ge 1}$. Let $D$ be the feasible region and let $V_G(x)$ be the constraint violation function for $G$ over $[l, u]$ used by the algorithm. Moreover, for any $n \ge 1$, let $A_n := \{x_1, \ldots, x_n\}$ be the set of all points that have been evaluated after $n$ simulations and let $A_n^*$ be the set of nondominated points in $A_n$. Fix $\lambda \in \mathbb{R}^k$, $\lambda > 0$. For $n \ge 1$, let $\bar{x}_n \in A_n$ be the best point among all points in $A_n$ in terms of the objective function $\lambda^T F(x)$ and constraint violation function $V_G(x)$ in the following sense: If $A_n$ has a feasible point, then $\bar{x}_n$ is feasible and $\lambda^T F(\bar{x}_n) \le \lambda^T F(x)$ for all $x \in A_n \cap D$; else $V_G(\bar{x}_n) \le V_G(x)$ for all $x \in A_n$. Then $\bar{x}_n \in A_n^*$ for any $n \ge 1$.

Proof Fix $n \ge 1$. Consider two cases. First, suppose $\bar{x}_n \in A_n$ is infeasible. In this case, all points in $A_n$ are infeasible (if one of them were feasible, then $\bar{x}_n$ would not be the best) and $\bar{x}_n$ has the best value of $V_G(x)$ among all points in $A_n$. This means $\bar{x}_n$ is nondominated by any other point in $A_n$. Next, suppose $\bar{x}_n \in A_n$ is feasible. If $\bar{x}_n$ is the only feasible point in $A_n$, then $A_n^* = \{\bar{x}_n\}$. So assume $A_n$ has at least two feasible points. To show that $\bar{x}_n \in A_n^*$, argue by contradiction. Suppose $\bar{x}_n \notin A_n^*$. Then there exists $x \in A_n$ that dominates $\bar{x}_n$; since $\bar{x}_n$ is feasible, $x$ must also be feasible, i.e., $f_i(x) \le f_i(\bar{x}_n)$ for all $i$ and $f_j(x) < f_j(\bar{x}_n)$ for some index $j$. Note that this implies that $\lambda^T F(x) < \lambda^T F(\bar{x}_n) \le \lambda^T F(y)$ for all $y \in A_n \cap D$. Taking $y = x$ yields $\lambda^T F(x) < \lambda^T F(x)$, which is a contradiction since $x \in A_n \cap D$.

Again, before presenting a general framework for adaptive stochastic search methods for constrained multi-objective optimization, we first present two simple stochastic algorithms that are easy to use in practice. Algorithm B1 uses uniform random search over the region defined by the bound constraints while Algorithm B2 uses Gaussian steps centered at a current non-dominated point. Below, $A_n$ is the set of all previously evaluated points and $A_n^*$ is the set of all nondominated points after $n$ simulations that yield the objective and constraint function values.

Algorithm B1. Uniform Random Search for Constrained Multi-Objective Optimization

Inputs: (i) MCBOP$(F, G, [l, u])$; (ii) CV function $V_G : [l, u] \to \mathbb{R}_+$.

Step 0. Initialize $A_0 := \emptyset$ and $A_0^* := \emptyset$. Set $n = 1$.
Step 1. Generate a realization of $X_n \sim U([l, u])$.
Step 2. Evaluate $F(X_n)$ and $G(X_n)$. (This is equivalent to one simulation.)
Step 3. Set $A_n := A_{n-1} \cup \{X_n\}$ and $A_n^* := \mathrm{Update}(A_{n-1}^*, \{X_n\})$.
Step 4. Increment $n \leftarrow n + 1$ and go back to Step 1.

To make the previous algorithm somewhat adaptive, we again use Gaussian distributions centered at one of the current non-dominated points. The selection of the non-dominated point to be the center of the Gaussian random vector iterates can be done uniformly at random or in some deterministic fashion (e.g., choose the most isolated non-dominated point). By localizing the search in the neighborhoods of current non-dominated points, the chances of finding better non-dominated solutions are improved.

Algorithm B2. Localized Random Search for Constrained Multi-Objective Optimization

Inputs: (i) MCBOP$(F, G, [l, u])$; (ii) CV function $V_G : [l, u] \to \mathbb{R}_+$; (iii) Initial covariance matrix for the Gaussian steps $C_1$; (iv) Absorbing transformation $\rho_{[l,u]} : \mathbb{R}^d \to [l, u]$; (v) Initial solution $X_0 \in [l, u]$.

Step 0. Initialize $X_0^* := X_0$, $A_0 := \emptyset$ and $A_0^* := \emptyset$. Set $n = 1$.
Step 1. Generate a realization of $Z_n$ such that $Z_n \mid \sigma(\{Z_1, \ldots, Z_{n-1}\}) \sim N(0_{d \times 1}, C_n)$, and set $Y_n := X_{n-1}^* + Z_n$.
Step 2. Set $X_n := \rho_{[l,u]}(Y_n)$.
Step 3. Evaluate $F(X_n)$ and $G(X_n)$. (This is equivalent to one simulation.)
Step 4. Set $A_n := A_{n-1} \cup \{X_n\}$ and $A_n^* := \mathrm{Update}(A_{n-1}^*, \{X_n\})$.
Step 5. Select $X_n^*$ from $A_n^*$ (uniformly at random or in some deterministic fashion).
Step 6. Determine the next covariance matrix $C_{n+1}$, possibly using information obtained so far.
Step 7. Increment $n \leftarrow n + 1$ and go back to Step 1.

Algorithms B1 and B2 are special cases of a more general class of adaptive stochastic search methods for problem (5) that we refer to as Generalized Adaptive Random Search for CONstrained Multi-Objective Optimization (GARSCOM) and that is given in the framework below. This framework is also a modification
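For concreteness, the Update operation used in Step 3 of Algorithm B1 and Step 4 of Algorithm B2 maintains the archive of nondominated points under Definition 3.4; a sketch, reusing `dominates` from above:

```python
def update_archive(archive, x_new, fx_new, v_new):
    """Update(A*_{n-1}, {X_n}): return the nondominated subset of the archive
    together with the new point, under the extended domination.

    archive: list of (x, F(x), V_G(x)) triples that are mutually nondominated.
    """
    for _, fx, v in archive:
        if dominates(fx, v, fx_new, v_new):       # X_n is dominated: archive unchanged
            return archive
    kept = [(x, fx, v) for (x, fx, v) in archive
            if not dominates(fx_new, v_new, fx, v)]
    kept.append((x_new, fx_new, v_new))           # add X_n, drop newly dominated points
    return kept
```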


More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence

Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Convex relaxations of chance constrained optimization problems

Convex relaxations of chance constrained optimization problems Convex relaxations of chance constrained optimization problems Shabbir Ahmed School of Industrial & Systems Engineering, Georgia Institute of Technology, 765 Ferst Drive, Atlanta, GA 30332. May 12, 2011

More information

Lecture 19 L 2 -Stochastic integration

Lecture 19 L 2 -Stochastic integration Lecture 19: L 2 -Stochastic integration 1 of 12 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 19 L 2 -Stochastic integration The stochastic integral for processes

More information

Quick Tour of the Topology of R. Steven Hurder, Dave Marker, & John Wood 1

Quick Tour of the Topology of R. Steven Hurder, Dave Marker, & John Wood 1 Quick Tour of the Topology of R Steven Hurder, Dave Marker, & John Wood 1 1 Department of Mathematics, University of Illinois at Chicago April 17, 2003 Preface i Chapter 1. The Topology of R 1 1. Open

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

Existence and Uniqueness

Existence and Uniqueness Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect

More information

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints An Li and Jane J. Ye Abstract. In this paper we study an optimal control problem with nonsmooth

More information

4. Convex optimization problems

4. Convex optimization problems Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

7 Complete metric spaces and function spaces

7 Complete metric spaces and function spaces 7 Complete metric spaces and function spaces 7.1 Completeness Let (X, d) be a metric space. Definition 7.1. A sequence (x n ) n N in X is a Cauchy sequence if for any ɛ > 0, there is N N such that n, m

More information

Zangwill s Global Convergence Theorem

Zangwill s Global Convergence Theorem Zangwill s Global Convergence Theorem A theory of global convergence has been given by Zangwill 1. This theory involves the notion of a set-valued mapping, or point-to-set mapping. Definition 1.1 Given

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Lecture 1. Stochastic Optimization: Introduction. January 8, 2018

Lecture 1. Stochastic Optimization: Introduction. January 8, 2018 Lecture 1 Stochastic Optimization: Introduction January 8, 2018 Optimization Concerned with mininmization/maximization of mathematical functions Often subject to constraints Euler (1707-1783): Nothing

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

On the Properties of Positive Spanning Sets and Positive Bases

On the Properties of Positive Spanning Sets and Positive Bases Noname manuscript No. (will be inserted by the editor) On the Properties of Positive Spanning Sets and Positive Bases Rommel G. Regis Received: May 30, 2015 / Accepted: date Abstract The concepts of positive

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

ERRATA: Probabilistic Techniques in Analysis

ERRATA: Probabilistic Techniques in Analysis ERRATA: Probabilistic Techniques in Analysis ERRATA 1 Updated April 25, 26 Page 3, line 13. A 1,..., A n are independent if P(A i1 A ij ) = P(A 1 ) P(A ij ) for every subset {i 1,..., i j } of {1,...,

More information

Topology, Math 581, Fall 2017 last updated: November 24, Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski

Topology, Math 581, Fall 2017 last updated: November 24, Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski Topology, Math 581, Fall 2017 last updated: November 24, 2017 1 Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski Class of August 17: Course and syllabus overview. Topology

More information

Appendix B Convex analysis

Appendix B Convex analysis This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance

More information

Math 117: Topology of the Real Numbers

Math 117: Topology of the Real Numbers Math 117: Topology of the Real Numbers John Douglas Moore November 10, 2008 The goal of these notes is to highlight the most important topics presented in Chapter 3 of the text [1] and to provide a few

More information

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4 Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:

More information

APPLICATIONS OF DIFFERENTIABILITY IN R n.

APPLICATIONS OF DIFFERENTIABILITY IN R n. APPLICATIONS OF DIFFERENTIABILITY IN R n. MATANIA BEN-ARTZI April 2015 Functions here are defined on a subset T R n and take values in R m, where m can be smaller, equal or greater than n. The (open) ball

More information

JUSTIN HARTMANN. F n Σ.

JUSTIN HARTMANN. F n Σ. BROWNIAN MOTION JUSTIN HARTMANN Abstract. This paper begins to explore a rigorous introduction to probability theory using ideas from algebra, measure theory, and other areas. We start with a basic explanation

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

2 Sequences, Continuity, and Limits

2 Sequences, Continuity, and Limits 2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs

More information

REVIEW OF ESSENTIAL MATH 346 TOPICS

REVIEW OF ESSENTIAL MATH 346 TOPICS REVIEW OF ESSENTIAL MATH 346 TOPICS 1. AXIOMATIC STRUCTURE OF R Doğan Çömez The real number system is a complete ordered field, i.e., it is a set R which is endowed with addition and multiplication operations

More information

Value and Policy Iteration

Value and Policy Iteration Chapter 7 Value and Policy Iteration 1 For infinite horizon problems, we need to replace our basic computational tool, the DP algorithm, which we used to compute the optimal cost and policy for finite

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1 Lectures - Week 11 General First Order ODEs & Numerical Methods for IVPs In general, nonlinear problems are much more difficult to solve than linear ones. Unfortunately many phenomena exhibit nonlinear

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

Linear Programming Methods

Linear Programming Methods Chapter 11 Linear Programming Methods 1 In this chapter we consider the linear programming approach to dynamic programming. First, Bellman s equation can be reformulated as a linear program whose solution

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Mathematics 530. Practice Problems. n + 1 }

Mathematics 530. Practice Problems. n + 1 } Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined

More information

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

Convex optimization problems. Optimization problem in standard form

Convex optimization problems. Optimization problem in standard form Convex optimization problems optimization problem in standard form convex optimization problems linear optimization quadratic optimization geometric programming quasiconvex optimization generalized inequality

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

MATH 202B - Problem Set 5

MATH 202B - Problem Set 5 MATH 202B - Problem Set 5 Walid Krichene (23265217) March 6, 2013 (5.1) Show that there exists a continuous function F : [0, 1] R which is monotonic on no interval of positive length. proof We know there

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.

More information

Math 341: Convex Geometry. Xi Chen

Math 341: Convex Geometry. Xi Chen Math 341: Convex Geometry Xi Chen 479 Central Academic Building, University of Alberta, Edmonton, Alberta T6G 2G1, CANADA E-mail address: xichen@math.ualberta.ca CHAPTER 1 Basics 1. Euclidean Geometry

More information

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3 Index Page 1 Topology 2 1.1 Definition of a topology 2 1.2 Basis (Base) of a topology 2 1.3 The subspace topology & the product topology on X Y 3 1.4 Basic topology concepts: limit points, closed sets,

More information

Lecture: Convex Optimization Problems

Lecture: Convex Optimization Problems 1/36 Lecture: Convex Optimization Problems http://bicmr.pku.edu.cn/~wenzw/opt-2015-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/36 optimization

More information

CHAPTER 6. Limits of Functions. 1. Basic Definitions

CHAPTER 6. Limits of Functions. 1. Basic Definitions CHAPTER 6 Limits of Functions 1. Basic Definitions DEFINITION 6.1. Let D Ω R, x 0 be a limit point of D and f : D! R. The limit of f (x) at x 0 is L, if for each " > 0 there is a ± > 0 such that when x

More information

Introduction to Proofs in Analysis. updated December 5, By Edoh Y. Amiran Following the outline of notes by Donald Chalice INTRODUCTION

Introduction to Proofs in Analysis. updated December 5, By Edoh Y. Amiran Following the outline of notes by Donald Chalice INTRODUCTION Introduction to Proofs in Analysis updated December 5, 2016 By Edoh Y. Amiran Following the outline of notes by Donald Chalice INTRODUCTION Purpose. These notes intend to introduce four main notions from

More information

An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace

An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace An Alternative Proof of Primitivity of Indecomposable Nonnegative Matrices with a Positive Trace Takao Fujimoto Abstract. This research memorandum is aimed at presenting an alternative proof to a well

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Modulation of symmetric densities

Modulation of symmetric densities 1 Modulation of symmetric densities 1.1 Motivation This book deals with a formulation for the construction of continuous probability distributions and connected statistical aspects. Before we begin, a

More information

Lecture 4 Lebesgue spaces and inequalities

Lecture 4 Lebesgue spaces and inequalities Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

ECE 275B Homework # 1 Solutions Winter 2018

ECE 275B Homework # 1 Solutions Winter 2018 ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,

More information

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1.

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1. Chapter 1 Metric spaces 1.1 Metric and convergence We will begin with some basic concepts. Definition 1.1. (Metric space) Metric space is a set X, with a metric satisfying: 1. d(x, y) 0, d(x, y) = 0 x

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Continuity. Chapter 4

Continuity. Chapter 4 Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of

More information