
McMaster University, Advanced Optimization Laboratory
AdvOL-Report No. 2004/14
S-lemma: a survey
Imre Pólik, Tamás Terlaky
September 2004; July 2005 (revised)
Hamilton, Ontario, Canada

S-LEMMA: A SURVEY
IMRE PÓLIK AND TAMÁS TERLAKY

Abstract. In this paper we review the many facets of the S-lemma, a result about the correctness of the S-procedure. The basic idea of this widely used method came from control theory, but it has important consequences in quadratic and semidefinite optimization, convex geometry and linear algebra as well. These areas have been subject to active research, but since there was little or no cooperation between researchers from these areas, their results remained mainly isolated. Here we give a unified analysis of the theory by providing several proofs for the S-lemma and revealing hidden connections with various areas of mathematics. We prove some new duality results and present applications from control theory, error estimation and computational geometry.

Key words. S-lemma, S-procedure, control theory, theorem of alternatives, numerical range, relaxation theory, semidefinite optimization, generalized convexity

AMS subject classifications. 90C20, 90C22, 49M20

This review is divided into two parts, both of which are self-contained. The first part gives a general overview of the S-lemma, starting from the basics, providing four different proofs of the fundamental result and showing some applications from engineering and control theory. This part is written in a way that enables the majority of the SIAM community, including those who are not experts on this topic, to understand the concepts and proofs. Some illustrative applications from control theory and error estimation are also discussed. The second part, starting with §5, shows how the basic theory is related to various fields of mathematics. This part goes beyond the proofs and demonstrates how the same result was discovered independently throughout the 60-year history of the S-procedure.

PART I

1. Introduction. In this section we expose the basic question of the S-lemma and provide some historical background.

1.1. Motivation.
McMaster University, Department of Mathematics and Statistics, Hamilton (ON), L8S 4K1, Canada (poliki@mcmaster.ca)
McMaster University, Department of Computing and Software, Hamilton (ON), L8S 4K1, Canada (terlaky@mcmaster.ca)

The fundamental question of the theory of the S-lemma is the following: when is a quadratic (in)equality a consequence of other quadratic (in)equalities? If we ask the same question for linear or general convex (in)equalities, then the Farkas Lemma ([10], Theorem 1.3.1.) and the Farkas Theorem (Theorem 2.1 in this review, or [73], 6.10) give the answer, respectively. These results essentially state that a convex inequality is a (logical) consequence of other convex inequalities if and only if it is a nonnegative linear combination of them. This notion is not guaranteed

to work in a general setting. Consider the inequalities^1

(1.1a)  $u^2 \ge 1$
(1.1b)  $v^2 \ge 1$
(1.1c)  $u \ge 0$
(1.1d)  $v \ge 0$;

then the inequality

(1.1e)  $uv \ge 1$

is clearly a consequence of those four inequalities. However, taking those four and the trivially true inequalities (such as $0 > -1$, $u^2 \pm 2uv + v^2 \ge 0$, $u^2 + v^2 \ge 0$, etc.) we cannot combine them in a linear way to obtain this consequence. Thus only a fraction of the logical consequences can be reached using linear combinations. The importance of this is that it is relatively easy to check whether an inequality is a linear combination of some other inequalities. In this survey we discuss when we can apply this argument to quadratic systems. As the simple example above shows, it does not work in general, but there are special cases in which we can answer this question using various methods.

Quadratic inequalities arise naturally in many areas of theoretical and applied mathematics. Consider the following examples.

Quadratic intersections: The following problem arises in computer graphics; see [47] for a standard reference. Let us take two quadratic surfaces represented by the equations $x^T A_i x + b_i^T x + c_i = 0$, $i = 1, 2$. How can we decide whether the two surfaces intersect without actually computing the intersections? How can we do the same for the solid bodies bounded by the surfaces? Can we compute the intersection efficiently?

Noise estimation: Measurements in $\mathbb{R}^n$ with Gaussian noise on them can be viewed as ellipsoids. Given two such measurements, we need to find a Gaussian estimate of the error of their sum. In other words, we are looking for the smallest ellipsoid that contains the sum of the two ellipsoids.

Trust Region Problem: A broad range of functions can be efficiently approximated locally with quadratic functions. The Trust Region Problem is to minimize such a function, i.e., to solve

(1.2a)  $\min\; x^T A x + b^T x + c$
(1.2b)  $\text{s.t.}\; \|x\|^2 \le \alpha,$

where $\alpha, c \in \mathbb{R}$, $b, x \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$.
These problems are easily solvable, and the method based on these approximations is widely applied in nonlinear optimization. The most comprehensive book on these methods is [17].

Quadratically Constrained Quadratic Minimization: In a more general setting we can replace the norm constraint (1.2b) with a set of general quadratic constraints:

(1.3a)  $\min\; x^T A x + b^T x + c$
(1.3b)  $\text{s.t.}\; x^T A_i x + b_i^T x + c_i \le 0, \quad i = 1, \ldots, m.$

^1 This example is from [10].
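To make the Trust Region Problem (1.2) concrete, the following sketch (not part of the survey; the data $A$, $b$, $c$, $\alpha$ are arbitrary illustrative choices) solves a tiny instance with an off-the-shelf nonlinear programming routine:

```python
import numpy as np
from scipy.optimize import minimize

# A small instance of (1.2):  min x^T A x + b^T x + c  s.t.  ||x||^2 <= alpha.
# The data below are arbitrary illustrative choices.
A = np.eye(2)
b = np.array([-4.0, 0.0])
c = 0.0
alpha = 1.0

obj = lambda x: x @ A @ x + b @ x + c
ball = {"type": "ineq", "fun": lambda x: alpha - x @ x}  # alpha - ||x||^2 >= 0

res = minimize(obj, x0=np.zeros(2), method="SLSQP", constraints=[ball])

# The unconstrained minimizer (2, 0) lies outside the ball, so the optimum
# is pushed onto the boundary: x* = (1, 0) with value -3.
print(res.x, res.fun)
```

This instance is convex, so the local solver finds the global optimum; general quadratic objectives on a ball need the dedicated trust region machinery discussed in the survey.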

Besides the solvability of this problem, it is often necessary to decide whether the problem is feasible at all (i.e., whether the constraints (1.3b) are solvable). If the problem is not feasible, then one might require a certificate of that fact. Since this last problem includes integer programming, we cannot hope for very general results.

The S-procedure is a frequently used technique originally arising from the stability analysis of nonlinear systems. Despite being widely used in practice, its theoretical background is often mystified. On the other hand, the concept of the S-procedure is well known in the optimization community, although not under this name. In short, the S-procedure is a relaxation method: it tries to solve a system of quadratic inequalities via a Linear Matrix Inequality (LMI) relaxation. Yakubovich [81] was the first to give sufficient conditions for this relaxation to be exact, i.e., for when it is possible to obtain a solution of the original system from the solution of the LMI; this result is the S-lemma. What we gain is computational cost: while solving a general quadratic system takes an exponential amount of work, LMIs can be solved more efficiently.

1.2. Historical background. The earliest result of this kind is due to Finsler [25], and it was later generalized by Hestenes and McShane [34]. In 1937 Finsler proved that if $A$ and $B$ are two symmetric matrices such that $x^T B x = 0$ ($x \ne 0$) implies $x^T A x > 0$, then there exists a $y \in \mathbb{R}$ such that $A + yB$ is positive definite. On the practical side, this idea was probably first used by Lur'e and Postnikov [52] in the 1940s, but at that time there was no well-founded theory behind the method. The theoretical background was developed some 30 years later by Yakubovich: in the early 1970s he proved a theorem known as the S-lemma [81, 83], using an old theoretical result of Dines [21] on the convexity of homogeneous quadratic mappings.
The simplicity of the method allowed it to spread quickly in control theory. Applications grew, but it was not until the 1990s that new theoretical results appeared in the area. It was Megretski and Treil [55] who extended the results to infinite dimensional spaces, giving rise to more general applications. Articles written since then mainly discuss new applications, while the theory itself has seen few new contributions. Yakubovich himself showed some applications [82], which were followed by many others [13], including contemporary ones [29, 49], spanning a broad range of engineering, financial mathematics and abstract dynamical systems. We will discuss various applications in §4.

Although the result emerged mainly from practice, Yakubovich himself was aware of the theoretical implications [26] of the S-lemma. The theoretical line was then continued by others (see, e.g., [13], or recently [19, 20, 50, 51, 74, 75]), but apart from a few exceptions such as [10, 13, 49] or [42], the results did not reach the control community. Moreover, to the best of our knowledge no thorough study presenting all these approaches has been written so far. The collection of lecture notes by Ben-Tal and Nemirovski [10] contains many of the ideas explained here, and several aspects of this theory have been presented there in the context of the LMI relaxation of systems of quadratic inequalities.

The term S-method was coined by Aizerman and Gantmacher in their book [1], but it later changed to S-procedure.^2 The S-method tries to construct a Lyapunov matrix justifying the stability of a linear system of differential equations by introducing

^2 This might be due to the fact that these works are translated from Russian.

an auxiliary matrix $S$. This construction leads to a system of quadratic equations (the Lur'e resolving equations, [52]). If that quadratic system can be solved, then a suitable Lyapunov function can be constructed. The term S-lemma refers to results stating that such a system can be solved under certain conditions; the first such result is due to Yakubovich [81]. In this survey our main interest is the S-lemma, but we will present a control theoretical example in §4.

1.3. About this survey. In this paper we aim to show how the S-lemma relates to well-known concepts in optimization, relaxation methods and functional analysis. This survey is structured as follows. After the introduction we show three independent proofs for the S-lemma in §2, illustrating how the original result is connected to more general theories. In §3.3 we show some examples and counterexamples, and present other variants of the S-lemma. Applications from control theory and computational geometry are discussed in §4. In the second part we go deeper and investigate three major topics. First, in §5 we discuss a classical topic, the convexity of the numerical range of a linear operator. Following that, in §6 we present a seemingly different field, rank-constrained optimization. In §6.5 we merge the results of these two fields and show the equivalence of the theories. Finally, in §7 we put the problem in a more general context and show that the S-lemma is a special case of a duality theorem due to Illés and Kassay [39, 40, 41]. Some miscellaneous topics are then discussed in §8. These topics (trust region problems, simultaneous diagonalization, algebraic geometric connections and complexity issues) are each the subject of several books and papers, so we cannot present all of them in full detail for lack of space. Instead, we briefly summarize their connection to the S-lemma and indicate directions for further research.
Possible future directions and some open questions are discussed in §9. We made every effort to make the major sections (§5-§7) self-contained, i.e., any one of them can be skipped depending on the reader's area of interest. Each of them starts with a motivation part, where we describe how the S-lemma is related to the selected area; then we summarize the major theoretical results of the field, and finally we apply the theory to the problem and conclude the results.

1.4. Notation. Matrices are denoted by capital letters ($A, B, \ldots$); the $j$th element of the $i$th row is $A_{ij}$. Vectors are denoted by lowercase Roman letters ($u, v, \ldots$); the $i$th component of a vector is of the form $u_i$, while $u^i$ is the $i$th vector in a list. Vectors are considered to be column vectors, implying that $u^T u$ is the scalar or dot product, while $uu^T$ is the so-called outer product of two vectors. Sometimes we will break this convention for typographic reasons, e.g., we will simply use $(1, 4)$ to denote a column vector, and accordingly $(1, 4)^T (2, 3)$ will denote the scalar product of two such vectors. We try to avoid any ambiguities and use the notation in a clear and concise way. Borrowing the usual Matlab notation, we will use $u_{2:n}$ to denote the vector $(u_2, \ldots, u_n)^T$ and $A_{:,1}$ to denote the full first column of $A$.

Matrices in this review are usually symmetric and positive (semi)definite. Let $\mathcal{S}^n$ be the space of $n \times n$ real symmetric matrices and $\mathcal{PS}^n \subseteq \mathcal{S}^n$ the convex cone of positive semidefinite matrices. In the following, $A \succ 0$ denotes that $A$ is symmetric and positive definite, while $A \succeq 0$ denotes that $A$ is symmetric and positive semidefinite. The notations $A \succ B$ and $A \succeq B$ are interpreted as $A - B \succ 0$ and $A - B \succeq 0$, respectively. Considering the $n \times n$ matrices as vectors in $\mathbb{R}^{n \times n}$, we can use the scalar product for vectors, i.e., the sum of the products of the corresponding elements. Denoting this scalar product of two matrices $A$ and $B$ by $A \bullet B$, we have the following properties; see, e.g., [38]:

1. $A \bullet B = \mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.
2. If $B = bb^T$ is a rank-1 matrix, then $A \bullet B = b^T A b$.
3. If $A$ and $B$ are positive semidefinite matrices, then $A \bullet B \ge 0$.

We use the standard symbols $\mathbb{R}^n$ and $\mathbb{C}^n$ to denote the $n$-dimensional linear spaces of real and complex vectors, respectively. In addition, we use $\mathbb{R}^n_+$ to denote the set of $n$-dimensional vectors with nonnegative coordinates, i.e., the nonnegative orthant. The symbol $\mathbb{R}^n_-$ is interpreted accordingly. To avoid confusion with the indices, we will use $\mathbf{i}$ to denote the imaginary unit, i.e., $\mathbf{i}^2 = -1$.

2. Proofs for the basic S-lemma. In this section we present four proofs for the basic S-lemma. We start with the original proof of Yakubovich; then we present a modern proof based on LMIs. The third proof is more elementary, using Helly's selection theorem, and finally, the fourth proof is as elementary as possible. The key concepts of these proofs will be further investigated and generalized in the remaining sections.

2.1. The two faces of the S-lemma. Now we present the central result of the theory, starting from the very basics and showing the main directions of the rest of the survey. The theorems we are going to discuss can be viewed in two ways. As shown by the example in §4, the original application of the S-lemma is to decide whether a (quadratic) (in)equality is satisfied over a domain. As these domains are usually defined by one or more quadratic inequalities, the question we are investigating is when a quadratic inequality is a consequence of other quadratic inequalities. This idea can be formalized as follows:

(2.1)  $\forall x:\; g_j(x) \le 0 \;(j = 1, \ldots, m) \;\overset{?}{\Longrightarrow}\; h(x) \ge 0,$

where $x \in \mathbb{R}^n$ and $h, g_j \;(j = 1, \ldots, m) : \mathbb{R}^n \to \mathbb{R}$ are quadratic functions. The alternative approach views this as the feasibility of the complementary problem: is there any point of the domain where the inequality in question does not hold? This problem is of the same form as the Farkas Theorem, a fundamental theorem of alternatives in convex analysis; see, e.g.,
[69], Theorem 21.1, [73], 6.10, or [18]:

Theorem 2.1 (Farkas Theorem). Let $f, g_1, \ldots, g_m : \mathbb{R}^n \to \mathbb{R}$ be convex functions, $C \subseteq \mathbb{R}^n$ a convex set, and let us assume that the Slater condition holds for $g_1, \ldots, g_m$, i.e., there exists an $\bar{x} \in \mathrm{relint}\, C$ such that $g_j(\bar{x}) < 0$, $j = 1, \ldots, m$. The following two statements are equivalent:
(i) The system

(2.2a)  $f(x) < 0$
(2.2b)  $g_j(x) \le 0, \quad j = 1, \ldots, m$
(2.2c)  $x \in C$

is not solvable.
(ii) There are $y_1, \ldots, y_m \ge 0$ such that

(2.3)  $f(x) + \sum_{j=1}^m y_j g_j(x) \ge 0 \quad \text{for all } x \in C.$

The proof is based on a separation argument, using the fact that the set

(2.4)  $\left\{ (u, v_1, \ldots, v_m) \in \mathbb{R}^{m+1} : \exists x \in \mathbb{R}^n,\; f(x) < u,\; g_i(x) \le v_i,\; i = 1, \ldots, m \right\}$

is convex and that system (2.2) is not solvable if and only if the origin is not in this convex set. The convexity of this set is trivial if all the functions are convex, but in other cases it typically fails to hold.

When writing this review we had to decide which form to use. Since in our opinion the latter one is easier to write and more popular in the optimization community, we will present all the results in the form of the Farkas Theorem. The theorem we present here was first proved by Yakubovich [81, 83] in 1971.

Theorem 2.2 (S-lemma, Yakubovich, 1971). Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be quadratic functions and suppose that there is an $\bar{x} \in \mathbb{R}^n$ such that $g(\bar{x}) < 0$. Then the following two statements are equivalent.
(i) There is no $x \in \mathbb{R}^n$ such that

(2.5a)  $f(x) < 0$
(2.5b)  $g(x) \le 0.$

(ii) There is a nonnegative number $y \ge 0$ such that

(2.6)  $f(x) + y g(x) \ge 0, \quad \forall x \in \mathbb{R}^n.$

2.2. The traditional approach. Yakubovich used the following convexity result of Dines [21] to prove the S-lemma:

Proposition 2.3 (Dines, 1941, [21]). If $f, g : \mathbb{R}^n \to \mathbb{R}$ are homogeneous quadratic functions, then the set $M = \{(f(x), g(x)) : x \in \mathbb{R}^n\} \subseteq \mathbb{R}^2$ is convex.

Proof. We will verify the definition of convexity directly. Let us take two points of $M$, namely $u = (u_1, u_2)^T$ and $v = (v_1, v_2)^T$. If these two points and the origin are collinear, then the line segment between $u$ and $v$ obviously belongs to $M$, since the functions are homogeneous. From now on we will assume that these points are not collinear with the origin. Since they belong to $M$, there are points $x_u, x_v \in \mathbb{R}^n$ such that

(2.7a)  $u_1 = f(x_u), \quad u_2 = g(x_u),$
(2.7b)  $v_1 = f(x_v), \quad v_2 = g(x_v).$

We will further assume, without loss of generality, that

(2.8)  $v_1 u_2 - u_1 v_2 = d^2 > 0.$

Let $\lambda \in (0, 1)$ be a constant. We try to show that there exists an $x_\lambda \in \mathbb{R}^n$ such that

(2.9)  $(f(x_\lambda), g(x_\lambda)) = (1 - \lambda)u + \lambda v.$

Let us look for $x_\lambda$ in the form

(2.10)  $x_\lambda = \rho(x_u \cos\theta + x_v \sin\theta),$

where $\rho$ and $\theta$ are real variables.
Substituting this into the defining equation (2.9) of $x_\lambda$ we get:

(2.11a)  $\rho^2 f(x_u \cos\theta + x_v \sin\theta) = (1 - \lambda)u_1 + \lambda v_1$
(2.11b)  $\rho^2 g(x_u \cos\theta + x_v \sin\theta) = (1 - \lambda)u_2 + \lambda v_2.$

Eliminating $\rho^2$ from these equations and expressing $\lambda$ as a function of $\theta$ we get

(2.12)  $\lambda(\theta) = \dfrac{u_2 f(x_u \cos\theta + x_v \sin\theta) - u_1 g(x_u \cos\theta + x_v \sin\theta)}{(u_2 - v_2) f(x_u \cos\theta + x_v \sin\theta) - (u_1 - v_1) g(x_u \cos\theta + x_v \sin\theta)}.$

Here the denominator of $\lambda(\theta)$ is a quadratic function of $\cos\theta$ and $\sin\theta$. It is easy to check that the denominator is positive for $\theta = -\frac{\pi}{2}, 0, \frac{\pi}{2}$. Since it is a quadratic function, it can have at most two zeros in the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$, implying that it is positive on one of the intervals $[-\frac{\pi}{2}, 0]$ and $[0, \frac{\pi}{2}]$. We assume without loss of generality that it is the latter, the other case being similar. Then $\lambda(\theta)$ is defined on the whole interval $[0, \frac{\pi}{2}]$, and it is continuous as well. Since $\lambda(0) = 0$ and $\lambda(\frac{\pi}{2}) = 1$, we can find a value $\theta_\lambda \in [0, \frac{\pi}{2}]$ such that $\lambda(\theta_\lambda) = \lambda$. Using this $\theta_\lambda$ we get $\rho$ from (2.11) and the desired vector $x_\lambda$ from (2.10). This completes the proof.

Yakubovich used Dines' result to prove Theorem 2.2.

Proof. (Yakubovich, 1971, [81]) It is obvious that (ii) implies (i). On the other hand, let us assume (i) and try to prove (ii). First let $f$ and $g$ be homogeneous functions. Then by Proposition 2.3 the two-dimensional image of $\mathbb{R}^n$ under the mapping $(f, g)$ is convex, and by (i) this image does not intersect the convex cone $C = \{(u_1, u_2) : u_1 < 0,\ u_2 \le 0\} \subseteq \mathbb{R}^2$; thus they can be separated by a line. This means that there are real numbers $y_1$ and $y_2$, not both zero, such that

(2.13a)  $y_1 u_1 + y_2 u_2 \le 0, \quad \forall (u_1, u_2) \in C,$
(2.13b)  $y_1 f(x) + y_2 g(x) \ge 0, \quad \forall x \in \mathbb{R}^n.$

Taking $(-1, 0) \in C$ we have $y_1 \ge 0$, and setting $(-\varepsilon, -1) \in C$, where $\varepsilon > 0$ is arbitrarily small, gives $y_2 \ge 0$. The case $y_1 = 0$ can be ruled out by substituting $\bar{x}$ into the second inequality, so we have $y_1 > 0$. Letting $y = y_2/y_1 \ge 0$ then satisfies (ii).

Now let $f$ and $g$ be general, not necessarily homogeneous quadratic functions satisfying (i). First let us notice that we can assume $\bar{x} = 0$; if this is not the case, then let $\bar{f}(x) = f(x + \bar{x})$ and $\bar{g}(x) = g(x + \bar{x})$ be our new functions. Let the functions be defined as

(2.14a)  $f(x) = x^T A_f x + b_f^T x + c_f,$
(2.14b)  $g(x) = x^T A_g x + b_g^T x + c_g;$

then the Slater condition is equivalent to $g(0) = c_g < 0$.
Let us introduce the homogeneous versions of our functions:

(2.15a)  $\tilde{f} : \mathbb{R}^{n+1} \to \mathbb{R}, \quad \tilde{f}(x, \tau) = x^T A_f x + \tau b_f^T x + \tau^2 c_f,$
(2.15b)  $\tilde{g} : \mathbb{R}^{n+1} \to \mathbb{R}, \quad \tilde{g}(x, \tau) = x^T A_g x + \tau b_g^T x + \tau^2 c_g.$

Now we prove that the new functions satisfy (i), i.e., there is no $(x, \tau) \in \mathbb{R}^{n+1}$ such that

(2.16a)  $\tilde{f}(x, \tau) < 0$
(2.16b)  $\tilde{g}(x, \tau) \le 0.$

Let us assume on the contrary that there is an $(x, \tau) \in \mathbb{R}^{n+1}$ with these properties. If $\tau \ne 0$ then

(2.17a)  $f(x/\tau) = \tilde{f}(x, \tau)/\tau^2 < 0,$
(2.17b)  $g(x/\tau) = \tilde{g}(x, \tau)/\tau^2 \le 0,$

which contradicts (i). If $\tau = 0$ then $x^T A_f x < 0$ and $x^T A_g x \le 0$; therefore

(2.18a)  $\underbrace{(\lambda x)^T A_f (\lambda x)}_{<0} + \lambda b_f^T x + c_f < 0$, if $|\lambda|$ is large enough, and
(2.18b)  $\underbrace{(\lambda x)^T A_g (\lambda x)}_{\le 0} + \underbrace{\lambda b_g^T x}_{\le 0} + \underbrace{c_g}_{<0} < 0$, if $\lambda$ has the proper sign.

Thus $\lambda x$ would solve (2.5), again contradicting (i). This implies that the new system (2.16) is not solvable. Further, taking $(0, 1)$ gives

(2.19)  $\tilde{g}(0, 1) = g(0) < 0;$

therefore the Slater condition is satisfied, so we can apply the already proved homogeneous version of the theorem. We get that there exists a $y \ge 0$ such that

(2.20)  $\tilde{f}(x, \tau) + y \tilde{g}(x, \tau) \ge 0, \quad \forall (x, \tau) \in \mathbb{R}^{n+1},$

and with $\tau = 1$ we get (ii).

Problems similar to Dines' theorem were first investigated by Hausdorff [33] and Toeplitz [76] in the late 1910s in a more general context: the joint numerical range of Hermitian operators. The importance of this simple fact becomes more obvious if we recall that the S-lemma is actually a nonconvex theorem of alternatives, an extended version of the Farkas Theorem. The key step of the proofs of such theorems is usually a separation idea. If system (2.5) is not solvable, then the set

(2.21)  $\left\{ (u, v) \in \mathbb{R}^2 : \exists x \in \mathbb{R}^n,\; f(x) < u,\; g(x) \le v \right\}$

does not contain the origin. If this set is convex, then it can be separated from the origin with a hyperplane, i.e., we can find nonnegative numbers $y_1$ and $y_2$ such that

(2.22)  $y_1 f(x) + y_2 g(x) \ge 0.$

If both $f$ and $g$ are convex functions, then the set in (2.21) is convex, but in other cases it is not easy to guarantee the convexity of this set. We will prove that for a pair of quadratic functions this convexity still holds. The final step in the classical proofs uses the Slater condition to ensure that $y_1$ can be chosen to be positive, so that it can be eliminated.

2.3. A modern approach. This proof is similar to the one found in [10], but extends it to the nonhomogeneous case. The following lemmas from [74, 75] play a crucial role in this theory:

Lemma 2.4. Let $G, X \in \mathbb{R}^{n \times n}$ be symmetric matrices, $X$ being positive semidefinite and of rank $r$. Then $G \bullet X \le 0$ if and only if there exist $p^1, \ldots, p^r \in \mathbb{R}^n$ such that

(2.23a)  $X = \sum_{i=1}^r p^i p^{iT} \quad \text{and} \quad p^{iT} G p^i \le 0, \quad i = 1, \ldots, r.$

Lemma 2.5. Let $G, X \in \mathbb{R}^{n \times n}$ be symmetric matrices, $X$ being positive semidefinite and of rank $r$.
Then $G \bullet X = 0$ if and only if there exist $p^1, \ldots, p^r \in \mathbb{R}^n$ such that

(2.23b)  $X = \sum_{i=1}^r p^i p^{iT} \quad \text{and} \quad p^{iT} G p^i = 0, \quad i = 1, \ldots, r.$

Proof. We only prove the first lemma, since the proof of the second one is very similar. The proof is based on [74, 75] and is constructive. Consider the following procedure:

Input: $X, G \in \mathbb{R}^{n \times n}$ such that $X \succeq 0$, $G \bullet X \le 0$ and $\mathrm{rank}(X) = r$.

Step 1: Compute the eigenvalue decomposition of $X$, i.e.,

(2.24)  $X = \sum_{i=1}^r p^i p^{iT}.$

Step 2: If $(p^{1T} G p^1)(p^{iT} G p^i) \ge 0$ for all $i = 2, \ldots, r$, then return $y = p^1$. Otherwise we have a $j$ such that $(p^{1T} G p^1)(p^{jT} G p^j) < 0$.

Step 3: Since $p^{1T} G p^1$ and $p^{jT} G p^j$ have opposite signs, we must have an $\alpha \in \mathbb{R}$ such that

(2.25)  $(p^1 + \alpha p^j)^T G (p^1 + \alpha p^j) = 0.$

In this case we return

(2.26)  $y = \dfrac{p^1 + \alpha p^j}{\sqrt{1 + \alpha^2}}.$

Output: $y \in \mathbb{R}^n$ such that $0 \ge y^T G y \ge G \bullet X$, $X - yy^T$ is positive semidefinite, and $\mathrm{rank}(X - yy^T) = r - 1$.

If the procedure stops in Step 2, then the terms $p^{iT} G p^i$ have the same sign for all $i = 1, \ldots, r$. Since the sum of these terms is nonpositive, we have $y^T G y = p^{1T} G p^1 \le 0$ and $X - yy^T = \sum_{i=2}^r p^i p^{iT}$, implying that the remaining matrix is positive semidefinite and has rank $r - 1$. If the procedure does not stop in Step 2, then the quadratic equation (2.25) for $\alpha$ must have two distinct roots, and by definition $0 = y^T G y \ge G \bullet X$. Finally, we can see that

(2.27a)  $X - yy^T = uu^T + \sum_{i \in \{2, 3, \ldots, r\} \setminus \{j\}} p^i p^{iT},$

where

(2.27b)  $u = \dfrac{p^j - \alpha p^1}{\sqrt{1 + \alpha^2}}.$

This decomposition ensures that $X - yy^T$ has rank $r - 1$ and the procedure is correct. Applying the procedure $r$ times we get the statement of the lemma.

Remark 2.6. The lemma can also be proved using the rank-1 approach presented in the proof of Theorem 6.1.

Now we can finish the proof of Theorem 2.2.

Proof. (Theorem 2.2, S-lemma) It is obvious that if either of the two systems has a solution then the other one cannot have one, so what we have to prove is that at least one system has a solution. Let the functions be given as

(2.28a)  $f(x) = x^T A_f x + b_f^T x + c_f,$
(2.28b)  $g(x) = x^T A_g x + b_g^T x + c_g,$

and let us consider the following notation:

(2.29)  $H_f = \begin{bmatrix} c_f & b_f^T/2 \\ b_f/2 & A_f \end{bmatrix}, \quad H_g = \begin{bmatrix} c_g & b_g^T/2 \\ b_g/2 & A_g \end{bmatrix}.$
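The homogenizing embedding (2.29) can be checked numerically; the sketch below (illustrative only, with randomly generated data) verifies that, with the $b/2$ off-diagonal blocks, $[1;\, x]^T H_f [1;\, x]$ reproduces $f(x)$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random nonhomogeneous quadratic f(x) = x^T A x + b^T x + c.
n = 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # symmetrize
b = rng.standard_normal(n)
c = rng.standard_normal()

# The (n+1) x (n+1) embedding of f used in (2.29); the b/2 blocks
# are what make the lifted quadratic form agree with f.
H = np.block([[np.array([[c]]), b[None, :] / 2],
              [b[:, None] / 2,  A]])

x = rng.standard_normal(n)
z = np.concatenate(([1.0], x))  # the lifted vector (1, x)

f_direct = x @ A @ x + b @ x + c
f_lifted = z @ H @ z
print(np.isclose(f_direct, f_lifted))  # → True
```

The same identity is what turns the quadratic system into the matrix form (2.30) below, since $z^T H z = H \bullet zz^T$.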

Using this notation the first system can be rewritten as:

(2.30)  $H_f \bullet \begin{bmatrix} 1 & x^T \\ x & xx^T \end{bmatrix} < 0, \quad H_g \bullet \begin{bmatrix} 1 & x^T \\ x & xx^T \end{bmatrix} \le 0, \quad x \in \mathbb{R}^n.$

Here the rank-1 matrices

(2.31)  $\begin{bmatrix} 1 & x^T \\ x & xx^T \end{bmatrix}$

are positive semidefinite and symmetric. This can inspire us to look at the following relaxation of (2.30):

(2.32)  $H_f \bullet Z < 0, \quad H_g \bullet Z \le 0, \quad Z \succeq 0.$

This system is an LMI and is the subject of active research. The key idea of our proof is to show that this relaxation is exact, in the sense that problem (2.30) is solvable if and only if the relaxed problem (2.32) is solvable. More specifically, we will prove the following lemma.

Lemma 2.7. Using the notation introduced in (2.29)-(2.32), if the relaxed system (2.32) has a solution, then it has a rank-1 solution of the form $Z = zz^T$, where the first coordinate of $z$ is 1. This gives a solution for (2.30). Moreover, (2.32) has a solution that strictly satisfies all the inequalities, including the semidefinite constraint.

Proof. Let $Z$ be a solution of (2.32). Then, since $Z$ is positive semidefinite, it can be written as

(2.33)  $Z = \sum_{j=1}^r q^j q^{jT},$

where $q^j \in \mathbb{R}^{n+1}$ and $r = \mathrm{rank}(Z)$. Applying Lemma 2.4, we see that it is possible to choose these vectors such that

(2.34)  $H_g \bullet q^j q^{jT} = q^{jT} H_g q^j \le 0, \quad j = 1, \ldots, r.$

Now, from the strict inequality of (2.32) we can conclude that there is a vector $q = q^j$ for some $1 \le j \le r$ such that

(2.35)  $H_f \bullet qq^T < 0,$

otherwise the sum of these terms could not be negative. This means that $Z = qq^T$ is a rank-1 solution of (2.32). Observe that this result was obtained without using the Slater condition. If the first coordinate of $q$ is nonzero, then $x = q_{2:n+1}/q_1$ gives a solution for (2.30). If this is not the case, then let us introduce

(2.36)  $\tilde{q} = q + \alpha \begin{bmatrix} 1 \\ \bar{x} \end{bmatrix},$

where $\bar{x}$ is the point satisfying the Slater condition. Notice that

(2.37a)  $H_f \bullet \tilde{q}\tilde{q}^T = \underbrace{H_f \bullet qq^T}_{<0} + 2\alpha \begin{bmatrix} 1 \\ \bar{x} \end{bmatrix}^T H_f\, q + \alpha^2 \begin{bmatrix} 1 & \bar{x}^T \\ \bar{x} & \bar{x}\bar{x}^T \end{bmatrix} \bullet H_f < 0$

if $\alpha$ is small, and

(2.37b) $H_g \bullet \tilde q \tilde q^T = \underbrace{H_g \bullet \bar q \bar q^T}_{\leq 0} + 2\alpha\, H_g \bullet \begin{bmatrix} 1 \\ \bar x \end{bmatrix} \bar q^T + \alpha^2 \underbrace{H_g \bullet \begin{bmatrix} 1 & \bar x^T \\ \bar x & \bar x \bar x^T \end{bmatrix}}_{= g(\bar x) < 0} < 0$

if the sign of $\alpha$ is chosen to make the middle term nonpositive. Further, since the first coordinate of $\bar q$ is zero, the first coordinate of $\tilde q$ is $\alpha \neq 0$. It is obvious that these conditions can be satisfied simultaneously, i.e., we have a matrix $\tilde q \tilde q^T$ that solves (2.32), and $x = \tilde q_{2:n+1} / \tilde q_1$ gives a solution for (2.30). Finally, letting

(2.38) $\tilde Z = \tilde q \tilde q^T + \beta I$,

where $\beta \in \mathbb{R}$ and $I \in \mathbb{R}^{(n+1) \times (n+1)}$ is the identity matrix, provides $H_f \bullet \tilde Z < 0$ and $H_g \bullet \tilde Z < 0$ if $\beta > 0$ is small enough, and $\tilde Z \succ 0$. In other words, $\tilde Z$ satisfies the strict version of all the inequalities. This completes the proof of the lemma.

Now we can easily finish the proof of Theorem 2.2. It follows directly from the Farkas theorem (see Theorem 2.1) that system (2.32) is solvable if and only if the dual system

(2.39a) $H_f + y H_g \succeq 0$
(2.39b) $y \geq 0$

is not solvable. Now, by Lemma 2.7 the solvability of the original quadratic system (2.30) is equivalent to the solvability of its LMI relaxation (2.32), which by duality is equivalent to the non-solvability of the dual system (2.39). This means exactly that there is a $y \geq 0$ such that $f(x) + y g(x) \geq 0$ for all $x \in \mathbb{R}^n$. This completes the proof of the S-lemma.

Quadratic systems can always be relaxed using linear matrix inequalities, so the key question is when this relaxation is exact. This topic is discussed in more detail in §6.

2.4. A third approach. This proof is outlined in [10]. We will use the following special result:³

Proposition 2.8. Let $A$ and $B$ be $2 \times 2$ symmetric matrices. The quadratic system

(2.40a) $x^T A x < 0$
(2.40b) $x^T B x \leq 0$

is not solvable if and only if there is a nonnegative multiplier $y$ such that

(2.41a) $x^T A x + y\, x^T B x \geq 0, \qquad \forall x \in \mathbb{R}^2$,

or in other words,

(2.41b) $A + yB \succeq 0$.

³A similar result, although not explicitly stated, is used in [10], but the proof presented there is not complete: it misses the case when $B$ is indefinite.
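For concrete $2 \times 2$ data both alternatives in Proposition 2.8 are easy to check numerically. The sketch below (plain Python, with illustrative matrices that are not from the text) uses the fact that a symmetric $2 \times 2$ matrix is positive semidefinite exactly when its diagonal entries and its determinant are nonnegative:

```python
def quad2(M, x):
    """Quadratic form of a symmetric 2x2 matrix."""
    return (M[0][0] * x[0] * x[0] + 2.0 * M[0][1] * x[0] * x[1]
            + M[1][1] * x[1] * x[1])

def psd_2x2(M, tol=1e-9):
    """Symmetric 2x2 PSD test: nonnegative diagonal and determinant."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= -tol and M[1][1] >= -tol and det >= -tol

A = [[2.0, 0.0], [0.0, -1.0]]   # x^T A x = 2 x1^2 - x2^2
B = [[-1.0, 0.0], [0.0, 1.0]]   # x^T B x = -x1^2 + x2^2
# Here x^T B x <= 0 forces x2^2 <= x1^2, whence x^T A x >= x1^2 >= 0, so the
# system (2.40) is not solvable and Proposition 2.8 promises a multiplier y.
certificates = [0.1 * k for k in range(51) if psd_2x2(
    [[A[i][j] + 0.1 * k * B[i][j] for j in range(2)] for i in range(2)])]
```

For this pair $A + yB = \operatorname{diag}(2 - y,\, y - 1)$, so every $y \in [1, 2]$ is a valid certificate and the scan finds that interval.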

Remark 2.9. Notice that we do not assume the Slater condition. This result does not carry over to higher dimensions.

Proof. It is trivial that if system (2.40) is solvable then system (2.41) is not. Assume now that the quadratic system (2.40) is not solvable. Let us introduce a new basis in which $B$ is diagonal and assume that the matrices are of the form

(2.42) $A = \begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix}, \qquad B = \begin{pmatrix} \mu & 0 \\ 0 & \nu \end{pmatrix}$.

There are five possible cases.

$\mu, \nu \leq 0$: In this case any $x$ satisfies the second inequality, so from the non-solvability of (2.40) we get that $A$ is positive semidefinite; thus (2.41) is satisfied with $y = 0$.

$\mu \geq 0, \nu > 0$: If $\mu > 0$ then $B$ is positive definite, therefore some sufficiently large $y$ will satisfy (2.41). If $\mu = 0$ then $x = (1, 0)^T$ solves the second inequality, so by the non-solvability we get $\alpha \geq 0$, and again a sufficiently large $y$ satisfies (2.41).

$\mu > 0, \nu \geq 0$: This case can be treated similarly to the previous one.

$\mu < 0, \nu > 0$: We can assume without loss of generality that $\mu = -1$ and $\nu = 1$. Taking $x = (1, 0)^T$ certifies that $\alpha \geq 0$, otherwise (2.40) would be solvable. Now, if $\det(A) \geq 0$ then $A$ is positive semidefinite, thus $y = 0$ satisfies (2.41); so let us assume that $\det(A) < 0$. If $A + yB$ is positive semidefinite then necessarily $\det(A + yB) \geq 0$, i.e.,

(2.43) $\det(A + yB) = -y^2 + (\alpha - \gamma) y + \underbrace{\alpha\gamma - \beta^2}_{= \det(A) < 0} \geq 0$.

Our goal is to show that this inequality has a nonnegative solution. Since $(1, \pm 1)^T B (1, \pm 1) = \mu + \nu = 0$, the non-solvability of (2.40) gives

(2.44) $(1, \pm 1)^T A (1, \pm 1) = \alpha + \gamma \pm 2\beta \geq 0$.

Thus for the discriminant of (2.43) we have

(2.45) $D = (\alpha - \gamma)^2 + 4(\alpha\gamma - \beta^2) = (\alpha + \gamma + 2\beta)(\alpha + \gamma - 2\beta) \geq 0$,

where both factors are nonnegative by (2.44); therefore the quadratic equation has real roots, and from the coefficients we can deduce that the product of the two roots is positive. If $\alpha - \gamma \geq 0$ then the sum of the two roots is also nonnegative, so both roots are actually positive and we have a $y$ satisfying (2.41). Assume on the contrary that this is not the case, i.e.,
there exist $\alpha, \beta, \gamma$ such that

(2.46a) $\gamma > \alpha$,
(2.46b) $\alpha\gamma - \beta^2 < 0$ \quad ($\det A < 0$),
(2.46c) $\alpha \geq 0$,
(2.46d) $\alpha u^2 + 2\beta uv + \gamma v^2 \geq 0 \quad \forall u, v : u^2 \geq v^2$ \quad (non-solvability of (2.40)).

Since $\beta^2 > \alpha\gamma \geq 0$ we must have $\beta \neq 0$. Moreover, since in the last inequality we can set the signs of $u$ and $v$ arbitrarily, we can assume that $\beta > 0$, whence we

can introduce $\bar\alpha = \alpha/\beta$ and $\bar\gamma = \gamma/\beta$ and rewrite the system as

(2.47a) $\bar\gamma > \bar\alpha$,
(2.47b) $\bar\alpha\bar\gamma < 1$,
(2.47c) $\bar\alpha \geq 0$,
(2.47d) $\bar\alpha u^2 - 2uv + \bar\gamma v^2 \geq 0 \quad \forall u, v > 0 : u^2 \geq v^2$,

where in (2.47d) the sign of $v$ has been flipped, and we may restrict attention to positive $u$ and $v$, since if $uv \leq 0$ then the inequality holds automatically. These constraints are shown in Fig. 2.1.

Fig. 2.1. The constraints in system (2.47).

Now if $\bar\alpha$ and $\bar\gamma$ satisfy the first three constraints, then setting $u = \sqrt{\bar\gamma}$ and $v = \sqrt{\bar\alpha}$ in the fourth constraint is valid ($\bar\gamma > \bar\alpha$), and it gives

(2.48) $0 \leq (u \;\; v) \begin{pmatrix} \bar\alpha & -1 \\ -1 & \bar\gamma \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = 2\bar\alpha\bar\gamma - 2\sqrt{\bar\alpha\bar\gamma}$.

Since $\bar\alpha$ and $\bar\gamma$ are positive, this implies that $\bar\alpha\bar\gamma \geq 1$, contradicting (2.47b). (If $\bar\alpha = 0$ then (2.47d) fails directly: fix $v = 1$ and let $u$ grow.) This justifies that $\gamma > \alpha$ is not possible.

$\mu > 0, \nu < 0$: This case can be proved similarly to the previous one.

This completes the proof.

We can now finish the proof of Theorem 2.2 easily. Let us assume that the system

(2.49a) $x^T A x < 0$
(2.49b) $x^T B x \leq 0$

is not solvable, but there exists a Slater point $\bar x \in \mathbb{R}^n$ such that $\bar x^T B \bar x < 0$. Let us define the following sets for each $x \in \mathbb{R}^n$:

(2.50) $Q_x = \{\, y \geq 0 : x^T A x + y\, x^T B x \geq 0 \,\}$.
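Since each $Q_x$ is determined by the two scalars $a = x^T A x$ and $b = x^T B x$, its interval form can be written down explicitly; a minimal sketch in plain Python:

```python
import math

def q_interval(a, b):
    """The set Q_x = {y >= 0 : a + y*b >= 0} of (2.50), where a = x^T A x and
    b = x^T B x.  Returns (lo, hi), with hi possibly math.inf, or None if empty."""
    if b > 0:
        return (max(0.0, -a / b), math.inf)
    if b == 0:
        return (0.0, math.inf) if a >= 0 else None
    # b < 0: y <= a / (-b) is required, feasible only when a >= 0
    return (0.0, a / -b) if a >= 0 else None
```

At a Slater point $b < 0$, so the interval is the bounded set $[0, a/(-b)]$ appearing in (2.51) below, while an empty $Q_x$ would exhibit a solution of (2.49).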

We will apply the infinite version of the Helly theorem (see, e.g., [69]).

Theorem 2.10 (Helly theorem). Given convex sets $C_0, C_1, \dots \subseteq \mathbb{R}^d$, where $C_0$ is compact, if the intersection of any $d+1$ of the sets is non-empty then the intersection of all the sets is non-empty as well.

In our case ($d = 1$) we have to prove the following proposition.

Proposition 2.11. The following assertions hold for the sets $Q_x$:
(i) Each set is a non-empty, closed and convex subset of the real line.
(ii) There is a set which is bounded (thus compact).
(iii) The intersection of any two sets is non-empty.

Proof. If there were an $x$ such that $Q_x$ is empty then $x^T B x \leq 0$ and $x^T A x < 0$, but this contradicts the non-solvability of (2.49). Taking a sequence $(y_n) \subseteq Q_x$ converging to $y$ we can easily see that $y \in Q_x$, thus $Q_x$ is closed. Since $Q_x$ is an interval, it is convex. The set $Q_{\bar x}$ corresponding to the Slater point is bounded, since

(2.51) $y \leq \dfrac{\bar x^T A \bar x}{-\bar x^T B \bar x}$

and the denominator is nonzero by assumption. Finally, let $x, z \in \mathbb{R}^n$ and consider the following system:

(2.52a) $(\alpha x + \beta z)^T A (\alpha x + \beta z) < 0$
(2.52b) $(\alpha x + \beta z)^T B (\alpha x + \beta z) \leq 0$,

where $\alpha$ and $\beta$ are real numbers. This system is quadratic in $\alpha$ and $\beta$, and it is not solvable, otherwise our original system would also be solvable. Therefore we can use Proposition 2.8 proved above and conclude that there exists a nonnegative $y$ such that

(2.53) $(\alpha x + \beta z)^T A (\alpha x + \beta z) + y (\alpha x + \beta z)^T B (\alpha x + \beta z) \geq 0, \qquad \forall \alpha, \beta \in \mathbb{R}$.

Now substituting $\alpha = 1, \beta = 0$ and $\alpha = 0, \beta = 1$ gives

(2.54a) $x^T A x + y\, x^T B x \geq 0$
(2.54b) $z^T A z + y\, z^T B z \geq 0$,

respectively; thus $y \in Q_x \cap Q_z$. This completes the proof.

Applying the Helly theorem we get a $y \geq 0$ such that $x^T A x + y\, x^T B x \geq 0$ for every $x \in \mathbb{R}^n$, so we have proved the non-trivial direction of the basic S-lemma. The key element in this proof was the result about $2 \times 2$ systems (Proposition 2.8).

2.5. An elementary proof.
This proof is based on Lemma 2.3 in [86],⁴ and it is the most elementary of the proofs presented in this section. We will use the following lemma.

Lemma 2.12 (Yuan, [86], 1990). Let $A, B \in \mathbb{R}^{n \times n}$ be two symmetric matrices and let $F, G \subseteq \mathbb{R}^n$ be closed sets such that $F \cup G = \mathbb{R}^n$. If

(2.55a) $x^T A x \geq 0, \quad \forall x \in F$,
(2.55b) $x^T B x \geq 0, \quad \forall x \in G$,

⁴We thank one of the unknown referees for this reference.

then there is a $t \in [0, 1]$ such that $tA + (1-t)B$ is positive semidefinite.

Proof. The lemma is trivially true if either of the two sets is empty, so we can assume that both are non-empty. Further, we can assume that both sets are symmetric about the origin, i.e., $F = -F$ and $G = -G$. Let $v(t)$ be the smallest eigenvalue of $tA + (1-t)B$. If $v(t) \geq 0$ for some $t \in [0, 1]$ then the lemma is true. Let us assume now that $v(t) < 0$ for all $t \in [0, 1]$. Define the following set:

(2.56) $S(t) = \{\, x : x^T (tA + (1-t)B) x = v(t),\ \|x\| \leq 1 \,\}$.

Since $v(t)$ is an eigenvalue, $S(t)$ is not empty, and it is closed by continuity, so

(2.57) $S(t) \supseteq \{\, x : x = \lim_{k \to \infty} x_k,\ x_k \in S(t_k),\ t = \lim_{k \to \infty} t_k \,\}$.

Since every $x \in S(0)$ satisfies $x^T B x = v(0) < 0$, assumption (2.55b) gives $S(0) \subseteq F$. Let $t_{\max}$ be the largest number in $[0, 1]$ such that $S(t_{\max}) \cap F \neq \emptyset$; this number exists due to (2.57). If $t_{\max} = 1$ then, since $x^T A x = v(1) < 0$ for every $x \in S(1)$, assumption (2.55a) yields $S(1) \subseteq G$, so $S(t_{\max})$ intersects $G$. If $t_{\max} < 1$ then for every $t \in (t_{\max}, 1]$ we have

(2.58) $S(t) \cap G = (S(t) \cap G) \cup \underbrace{(S(t) \cap F)}_{= \emptyset} = S(t)$,

where we used the assumption that $F \cup G = \mathbb{R}^n$. Again using (2.57) we get that

(2.59) $S(t_{\max}) \cap G \neq \emptyset$.

Now let us examine the structure of $S(t)$. Write $M = tA + (1-t)B$ and $\lambda = v(t)$. If $x \in S(t)$ and $\|x\| < 1$ then

(2.60) $\left(\dfrac{x}{\|x\|}\right)^T M \left(\dfrac{x}{\|x\|}\right) = \dfrac{x^T M x}{\|x\|^2} = \dfrac{\lambda}{\|x\|^2} < \lambda$,

since $\lambda$ is negative; this would contradict the fact that $\lambda$ is the minimum eigenvalue. This shows that $\|x\| = 1$ for every $x \in S(t)$. Let $x$ be a unit vector in the subspace spanned by the set $\{\, x : x^T M x = \lambda,\ \|x\| = 1 \,\}$; then it can be written as

(2.61) $x = \sum_{i=1}^{k} \alpha_i u_i$, where $u_i^T M u_i = \lambda$, $\|u_i\| = 1$ for all $i = 1, \dots, k$,

and we can assume without loss of generality that the $u_i$ form an orthogonal system. Since $x$ is a unit vector, this implies that

(2.62) $\sum_{i=1}^{k} \alpha_i^2 = 1$.

On the other hand,

(2.63) $x^T M x = \sum_{i,j=1}^{k} \alpha_i \alpha_j u_i^T M u_j = \sum_{i=1}^{k} \alpha_i^2 u_i^T M u_i = \sum_{i=1}^{k} \alpha_i^2 \lambda = \lambda$.

This shows that $x \in S(t)$, and thus $S(t)$ is the intersection of a subspace and the unit sphere. Such an intersection is a unit sphere in some dimension, therefore $S(t)$ is either connected, or it is the union of two symmetric points.

If $S(t_{\max})$ is connected, then any path connecting a point in $S(t_{\max}) \cap F$ with a point in $S(t_{\max}) \cap G$ contains a point in $S(t_{\max}) \cap F \cap G$, since both $F$ and $G$ are closed and they cover $\mathbb{R}^n$. This shows that there exists an $x \in F \cap G$ such that $x^T (t_{\max} A + (1-t_{\max})B) x = v(t_{\max}) < 0$, but then either $x^T A x < 0$ or $x^T B x < 0$, contradicting (2.55). If, on the other hand, $S(t_{\max})$ consists of two symmetric points then, since $F = -F$ and $G = -G$, we have $S(t_{\max}) \subseteq F$ and $S(t_{\max}) \subseteq G$, and we reach the same conclusion. This completes the proof of Lemma 2.12.

Now the S-lemma (Theorem 2.2) can be proved easily. Let $A$ and $B$ be symmetric matrices and assume that the system

(2.64a) $x^T A x < 0$
(2.64b) $x^T B x \leq 0$

is not solvable, but the Slater condition is satisfied, i.e., $\exists \bar x \in \mathbb{R}^n : \bar x^T B \bar x < 0$. Defining

$F = \{\, x : x^T B x \leq 0 \,\}, \qquad G = \{\, x : x^T B x \geq 0 \,\}$

satisfies $F \cup G = \mathbb{R}^n$, and both are closed sets. By the assumption of non-solvability we have $x^T A x \geq 0$ for all $x \in F$, and trivially $x^T B x \geq 0$ for all $x \in G$, thus all the conditions of Lemma 2.12 are satisfied and we can conclude that there is a $t \in [0, 1]$ such that $tA + (1-t)B$ is positive semidefinite. Now $t$ cannot be 0, otherwise $B$ would be positive semidefinite and the Slater condition could not be satisfied. Dividing by $t$ we get that $A + \frac{1-t}{t} B$ is positive semidefinite.

Remark 2.13. Lemma 2.12 deserves further attention, since it is very simple in form yet powerful.

3. Special results and counterexamples. In this section we present some related results and counterexamples.

3.1. Other variants. For the sake of completeness we enumerate other useful forms of the basic S-lemma. One can get these results by modifying the original proof slightly. For references see [42, 51].

Proposition 3.1 (S-lemma with equality). Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be quadratic functions, where $g$ is assumed to be strictly concave (or strictly convex), and let us assume a stronger form of the Slater condition, namely that $g$ takes both positive and negative values.
Then the following two statements are equivalent:
(i) The system

(3.1a) $f(x) < 0$
(3.1b) $g(x) = 0$

is not solvable.
(ii) There exists a multiplier $y \in \mathbb{R}$ such that

(3.2) $f(x) + y g(x) \geq 0, \qquad \forall x \in \mathbb{R}^n$.

In the presence of two non-strict inequality constraints we have the following result.

Proposition 3.2 (S-lemma with non-strict inequalities). Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be homogeneous quadratic functions. The following two statements are equivalent:
(i) The system

(3.3a) $f(x) \leq 0$
(3.3b) $g(x) \leq 0, \qquad x \neq 0$

is not solvable.

(ii) There exist nonnegative multipliers $y_1, y_2 \geq 0$ such that

(3.4) $y_1 f(x) + y_2 g(x) > 0, \qquad \forall x \in \mathbb{R}^n \setminus \{0\}$.

If we assume the Slater condition for one of the functions then we can make the corresponding multiplier positive.

3.2. General results. In this section we present some known results on how the solvability of the system

(3.5a) $f(x) < 0$
(3.5b) $g_i(x) \leq 0, \qquad i = 1, \dots, m$
(3.5c) $x \in \mathbb{R}^n$

and the existence of a dual vector $y = (y_1, \dots, y_m) \geq 0$ such that

(3.6) $f(x) + \sum_{i=1}^{m} y_i g_i(x) \geq 0, \qquad \forall x \in \mathbb{R}^n$

are related to each other. We will assume that

(3.7a) $f(x) = x^T A x + b^T x + c$,
(3.7b) $g_i(x) = x^T B_i x + p_i^T x + q_i, \qquad i = 1, \dots, m$,

where $A$ and the $B_i$ are symmetric but not necessarily positive semidefinite matrices. Unfortunately, the S-lemma is not true in this general setting. It fails to hold even if we restrict ourselves to the $m = 2$ case and assume the Slater condition. This does not prevent us from studying the idea of the S-procedure even when the procedure is theoretically not exact. It is trivial that the two systems in the S-lemma cannot be solved simultaneously, so if we are lucky enough to find a solution for the second system then we can be sure that the first system is not solvable. However, the non-solvability of the second system does not always guarantee the solvability of the first one.

First let us present what can be found in the literature about this general setting. Our main sources are [10] and [74, 75]. We will outline the proofs as necessary. Let us start with a general result that contains the S-lemma as a special case.

Theorem 3.3. Consider the systems (3.5)-(3.6) and assume that the functions $f$ and $g_i$, $i = 1, \dots, m$, are all homogeneous and $m \leq n$. If system (3.5) is not solvable then there exist a nonnegative vector $y = (y_1, \dots, y_m) \geq 0$ and an $(n-m+1)$-dimensional subspace $V_{n-m+1} \subseteq \mathbb{R}^n$ such that

(3.8) $f(x) + \sum_{i=1}^{m} y_i g_i(x) \geq 0, \qquad \forall x \in V_{n-m+1}$.

In other words, the matrix

(3.9)
$A + \sum_{i=1}^{m} y_i B_i$

has at least $n - m + 1$ nonnegative eigenvalues (counted with multiplicities), and the above subspace is spanned by the corresponding eigenvectors.

Remark 3.4. In the case $m = 1$ this gives the usual S-lemma.

The proof of this theorem is based on some differential geometric arguments. The key idea is simple, but it takes some time to work out the details; see [10], Chapter 4.10. Despite the relative strength of the theorem, it is not straightforward to apply it in practice. One way is to exploit the possible structure of the linear combination and rule out a certain number of eigenvalues. Besides this general result we have only very special ones.

Proposition 3.5. If $A$ and the $B_i$, $i = 1, \dots, m$, are all diagonal matrices, i.e., $f(x)$ and the $g_i(x)$ are all weighted sums of squares, or if they are all linear combinations of two fixed matrices, i.e., $\operatorname{rank}\{A, B_1, \dots, B_m\} \leq 2$, then the system

(3.10a) $f(x) < 0$
(3.10b) $g_i(x) \leq 0, \qquad i = 1, \dots, m$
(3.10c) $x \in \mathbb{R}^n$

is not solvable if and only if there exists a nonnegative vector $y = (y_1, \dots, y_m) \geq 0$ such that

(3.10d) $f(x) + \sum_{i=1}^{m} y_i g_i(x) \geq 0, \qquad \forall x \in \mathbb{R}^n$.

In other words, the matrix

(3.10e) $A + \sum_{i=1}^{m} y_i B_i$

is positive semidefinite.

The first part of this proposition can be proved easily using the substitution $z_i = x_i^2$ and applying the Farkas lemma. This simple fact leads us to the concept of simultaneous diagonalizability, which will be discussed in §8.2. The second part is a new result; we will prove it in §5.4, see Theorem 5.23.

If we want to incorporate more quadratic constraints, we need extra conditions to guarantee the validity of the S-lemma.

Proposition 3.6 ($m = 2$, $n \geq 2$). Let $f, g_1$ and $g_2$ be homogeneous quadratic functions and assume that there is an $\bar x \in \mathbb{R}^n$ such that $g_1(\bar x) < 0$ and $g_2(\bar x) < 0$ (Slater condition). If either $m = 2$, $n \geq 3$ and there is a positive definite linear combination of $A$, $B_1$ and $B_2$, or $m = n = 2$ and there is a positive definite linear combination of $B_1$ and $B_2$, then the following two statements are equivalent:
(i) The system

(3.11a) $f(x) < 0$
(3.11b) $g_i(x) \leq 0, \qquad i = 1, 2$
(3.11c) $x \in \mathbb{R}^n$

is not solvable.
(ii) There are nonnegative numbers $y_1$ and $y_2$ such that

(3.12) $f(x) + y_1 g_1(x) + y_2 g_2(x) \geq 0, \qquad \forall x \in \mathbb{R}^n$.
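Whether two symmetric $2 \times 2$ matrices admit a positive definite linear combination, the extra hypothesis of Proposition 3.6 in the $m = n = 2$ case, can be probed by scanning normalized combinations. A rough numerical sketch (plain Python, illustrative matrices; a grid scan can only certify the positive answer up to its resolution):

```python
import math

def pd_2x2(M, tol=1e-9):
    """Strict positive definiteness of a symmetric 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] > tol and det > tol

def has_pd_combination(B1, B2, steps=3600):
    """Scan normalized combinations cos(t)*B1 + sin(t)*B2 for a PD one."""
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        c, s = math.cos(t), math.sin(t)
        M = [[c * B1[i][j] + s * B2[i][j] for j in range(2)] for i in range(2)]
        if pd_2x2(M):
            return True
    return False

B1 = [[1.0, 0.0], [0.0, -1.0]]
B2 = [[-1.0, 0.0], [0.0, 2.0]]       # 3*B1 + 2*B2 = I, so a PD combination exists
B_hyper = [[0.0, 1.0], [1.0, 0.0]]   # B1 and B_hyper are both trace-free, so no
                                     # combination of them can be positive definite
```

The trace-free pair illustrates the obstruction mentioned in Remark 3.8 below in its simplest form: any combination of trace-free matrices is trace-free, hence never positive definite.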

Remark 3.7. The condition $n \geq 3$ in the first part is necessary; see the counterexample for $n = 2$ in §3.3.2.

Remark 3.8. An equivalent condition on when some matrices have a positive definite linear combination is given in [22]. If $n \geq 3$ then the property that two symmetric matrices have a positive definite linear combination is equivalent to the two quadratic forms having no common root other than the origin; see [25]. Further, in this case the matrices are simultaneously diagonalizable by a real congruence, see [5, 62, 79]. For a complete review of simultaneous diagonalizability see [38], Theorem 7.6.4, or Uhlig's survey [78]. In the general case, symmetric matrices $A_1, \dots, A_m$ have a positive definite linear combination if and only if $A_i \bullet S = 0$, $i = 1, \dots, m$, implies that the symmetric matrix $S$ is zero or indefinite. This result is a trivial corollary of the duality theory of convex optimization, but it was only rediscovered around the middle of the 20th century, see [22]. These ideas will be further examined in §8.2.

One additional step is to include linear constraints. First some equalities:

Proposition 3.9. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be quadratic functions, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Assume that there exists an $\bar x \in \mathbb{R}^n$ such that $A\bar x = b$ and $g(\bar x) < 0$. The following two statements are equivalent:
(i) The system

(3.13a) $f(x) < 0$
(3.13b) $g(x) \leq 0$
(3.13c) $Ax = b$
(3.13d) $x \in \mathbb{R}^n$

is not solvable.
(ii) There is a nonnegative number $y$ such that

(3.13e) $f(x) + y g(x) \geq 0, \qquad \forall x \in \mathbb{R}^n : Ax = b$.

With some convexity assumption we can include a linear inequality:

Proposition 3.10. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be quadratic functions, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Assume that $g(x)$ is convex and there exists an $\bar x \in \mathbb{R}^n$ such that $c^T \bar x < b$ and $g(\bar x) < 0$. The following two statements are equivalent:
(i) The system

(3.14a) $f(x) < 0$
(3.14b) $g(x) \leq 0$
(3.14c) $c^T x \leq b$
(3.14d) $x \in \mathbb{R}^n$

is not solvable.
(ii) There are a nonnegative number $y \geq 0$ and a vector $(u_0, u) \in \mathbb{R}^{n+1}$ such that

(3.15a) $f(x) + y g(x) + (u^T x - u_0)(c^T x - b) \geq 0, \qquad \forall x \in \mathbb{R}^n$,
(3.15b) $u^T x \geq 0, \qquad \forall x \in \mathbb{R}^n : x^T A_g x \leq 0,\ b_g^T x \leq 0$,
(3.15c) $u^T x - u_0 \geq 0, \qquad \forall x \in \mathbb{R}^n : g(x) \leq 0,\ c_g + b_g^T x \leq 0$,

where $g(x) = x^T A_g x + b_g^T x + c_g$. For a proof see [74, 75]. We can see that including even one linear inequality is difficult, and the dual problem (3.15) is not any easier than the primal problem (3.14).
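Even when exactness fails, the easy direction of the S-procedure remains useful: any multiplier vector $y \geq 0$ satisfying (3.6) certifies that system (3.5) has no solution. In the diagonal case of Proposition 3.5 such a certificate can be checked entry by entry; a sketch with illustrative diagonal data:

```python
def s_procedure_certificate_ok(a, bs, y):
    """Diagonal case of Proposition 3.5: a and each entry of bs hold the
    diagonals of A and of the B_i.  Checks y >= 0 and that the diagonal of
    A + sum_i y_i * B_i is nonnegative, i.e. that f + sum_i y_i g_i is a
    nonnegative weighted sum of squares -- a certificate that f(x) < 0,
    g_i(x) <= 0 has no solution."""
    if any(yi < 0.0 for yi in y):
        return False
    return all(a[k] + sum(yi * b[k] for yi, b in zip(y, bs)) >= 0.0
               for k in range(len(a)))

# Illustrative data: f(x) = -x1^2 + 2 x2^2, g1(x) = x1^2 - x2^2, g2(x) = x1^2
a = [-1.0, 2.0]
bs = [[1.0, -1.0], [1.0, 0.0]]
y = [1.0, 0.0]   # A + 1*B1 = diag(0, 1) >= 0, so the system is infeasible
```

Here $f + g_1 = x_2^2 \geq 0$, so wherever $g_1 \leq 0$ we get $f \geq -g_1 \geq 0$, which is exactly what the certificate encodes.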

3.3. Counterexamples. In this section we present some counterexamples.

3.3.1. More inequalities. The generalization to the case $m \geq 3$ is practically hopeless, as illustrated by the following example taken from [10]. Consider the matrices

(3.16a) $A = \begin{pmatrix} 1 & 1.1 & 1.1 \\ 1.1 & 1 & 1.1 \\ 1.1 & 1.1 & 1 \end{pmatrix}$,
(3.16b) $B_1 = \begin{pmatrix} -2.1 & & \\ & 1 & \\ & & 1 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 & & \\ & -2.1 & \\ & & 1 \end{pmatrix}, \quad B_3 = \begin{pmatrix} 1 & & \\ & 1 & \\ & & -2.1 \end{pmatrix}$.

Lemma 3.11. There is an $\bar x$ such that $\bar x^T B_i \bar x < 0$ for $i = 1, 2, 3$ (Slater). Further, the system

(3.17a) $x^T A x < 0$
(3.17b) $x^T B_i x \leq 0, \qquad i = 1, 2, 3$

is not solvable in $\mathbb{R}^3$.

Proof. Let us look for a solution of the form $x = (x_1, x_2, x_3)$. If $x_3 = 0$ then from the last three inequalities we can conclude that $x_1 = x_2 = 0$, which does not satisfy the first inequality. Since all the functions are homogeneous, and since $x$ and $-x$ are essentially the same solution, we can assume that $x_3 = 1$, thus reducing the dimension of the problem. Now all four inequalities define quadratic regions in the plane. Instead of a formal and tedious proof we simply plot these regions in Fig. 3.1. The grey area represents the set of points $(x_1, x_2)$ for which $x = (x_1, x_2, 1)$ satisfies $x^T A x < 0$, while the four black corners form the feasible set of the remaining three inequalities. Intuitively, the last three inequalities are satisfied only for values close to $\pm 1$. However, it is easy to see that such values cannot satisfy the first inequality.

Fig. 3.1. The quadratic regions defined by system (3.16) with $x_3 = 1$.
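The Slater point and the plot-based argument of Lemma 3.11 can be replayed numerically. The sketch below uses our reading of (3.16), in which $A$ has unit diagonal and $1.1$ off-diagonal entries and each $B_i$ is diagonal with $-2.1$ in position $i$ and $1$ elsewhere (the printed text loses the signs), and scans the slice $x_3 = 1$:

```python
def quad3(M, x):
    """Quadratic form of a 3x3 matrix."""
    return sum(M[i][j] * x[i] * x[j] for i in range(3) for j in range(3))

# Our reading of (3.16):
A = [[1.0, 1.1, 1.1],
     [1.1, 1.0, 1.1],
     [1.1, 1.1, 1.0]]
Bs = [[[(-2.1 if i == k else 1.0) if i == j else 0.0 for j in range(3)]
       for i in range(3)] for k in range(3)]

# Slater point: x = (1,1,1) gives x^T B_i x = -2.1 + 1 + 1 = -0.1 < 0 for each i.
slater_ok = all(quad3(B, [1.0, 1.0, 1.0]) < 0.0 for B in Bs)

# Homogeneity lets us fix x3 = 1 (x3 = 0 forces x = 0).  A coarse grid search
# over the slice then finds no point satisfying all four inequalities of (3.17).
bad = []
for p in range(-100, 101):
    for q in range(-100, 101):
        x = [0.02 * p, 0.02 * q, 1.0]
        if quad3(A, x) < 0.0 and all(quad3(B, x) <= 0.0 for B in Bs):
            bad.append(x)
```

The feasible set of the three $B_i$-constraints is confined to small regions around $(\pm 1, \pm 1)$, and on those regions the form $x^T A x$ stays positive, so the search comes back empty.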

Lemma 3.12. There are no nonnegative multipliers $y_1, y_2, y_3$ for which $A + y_1 B_1 + y_2 B_2 + y_3 B_3 \succeq 0$.

Proof. Consider the matrix

(3.18) $X = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}$.

Then $X$ is positive semidefinite with eigenvalues 0 and 3. Moreover,

(3.19a) $A \bullet X = -0.6 < 0$,
(3.19b) $B_i \bullet X = -0.2 \leq 0, \qquad i = 1, 2, 3$.

Now for any nonnegative linear combination of the matrices we have

(3.20) $(A + y_1 B_1 + y_2 B_2 + y_3 B_3) \bullet X = -0.6 - 0.2 (y_1 + y_2 + y_3) < 0$,

therefore $A + y_1 B_1 + y_2 B_2 + y_3 B_3$ can never be positive semidefinite, since the scalar product of positive semidefinite matrices is nonnegative.

These two lemmas show that neither of the alternatives holds in the general theorem.

3.3.2. The $n = 2$ case. Discussing the $m = 2$ case we noted that the result fails to hold if $n = 2$, so we need an extra condition. Here is a counterexample taken from [10] to demonstrate this. Let us consider the following three matrices:

(3.21a) $A = \begin{pmatrix} \lambda\mu & 0.5(\mu - \lambda) \\ 0.5(\mu - \lambda) & -1 \end{pmatrix}$
(3.21b) $B = \begin{pmatrix} -\mu\nu & 0.5(\nu - \mu) \\ 0.5(\nu - \mu) & 1 \end{pmatrix}$
(3.21c) $C = \begin{pmatrix} -\lambda^2 & 0 \\ 0 & 1 \end{pmatrix}$.

We will verify the following claims.

Proposition 3.13. Let $\lambda = 1.1$, $\mu = 0.818$ and $\nu = 1.344$.
(i) There is a positive definite linear combination of $A$, $B$ and $C$.
(ii) There is a vector $\bar x$ such that $\bar x^T B \bar x < 0$ and $\bar x^T C \bar x < 0$.
(iii) The quadratic system

(3.22a) $x^T A x < 0$
(3.22b) $x^T B x \leq 0$
(3.22c) $x^T C x \leq 0$

is not solvable.
(iv) There are no $y_1, y_2 \geq 0$ such that

(3.23) $A + y_1 B + y_2 C \succeq 0$.

Proof.

(i) The linear combination

(3.24) $-1.15 A - 0.005 B - C = \begin{pmatrix} 0.18072696 & 0.160835 \\ 0.160835 & 0.1450 \end{pmatrix}$

is positive definite, as is seen from the diagonal elements and the determinant.
(ii) Let $\bar x = (1, 0)^T$; then clearly $\bar x^T B \bar x = -\mu\nu < 0$ and $\bar x^T C \bar x = -\lambda^2 < 0$.
(iii) Let us exploit the special structure of the matrices. If we are looking for a solution $x = (x_1, x_2) \in \mathbb{R}^2$ then we get

(3.25a) $x^T A x = \lambda\mu x_1^2 - x_2^2 + (\mu - \lambda) x_1 x_2 = (\lambda x_1 + x_2)(\mu x_1 - x_2)$
(3.25b) $x^T B x = -\mu\nu x_1^2 + x_2^2 + (\nu - \mu) x_1 x_2 = (x_2 + \nu x_1)(x_2 - \mu x_1)$
(3.25c) $x^T C x = -\lambda^2 x_1^2 + x_2^2 = (x_2 - \lambda x_1)(x_2 + \lambda x_1)$.

Now, in order to satisfy (3.22) we need to solve one of the following two systems, according to which factors are negative and which are positive:

(3.26a) $\lambda x_1 + x_2 > 0$
(3.26b) $\mu x_1 - x_2 < 0$
(3.26c) $\nu x_1 + x_2 \leq 0$
(3.26d) $-\lambda x_1 + x_2 \leq 0$,

or

(3.27a) $\lambda x_1 + x_2 < 0$
(3.27b) $\mu x_1 - x_2 > 0$
(3.27c) $\nu x_1 + x_2 \geq 0$
(3.27d) $-\lambda x_1 + x_2 \geq 0$.

It is easy to check that with the values specified in the statement both of these systems are inconsistent, therefore (3.22) is not solvable.
(iv) The proof of this part is similar to the proof of the previous lemma. The matrix

(3.28) $X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

clearly satisfies

(3.29a) $A \bullet X = \lambda\mu - 1 < 0$
(3.29b) $B \bullet X = 1 - \mu\nu \leq 0$
(3.29c) $C \bullet X = 1 - \lambda^2 \leq 0$,

thus no nonnegative linear combination $A + y_1 B + y_2 C$ of $A$, $B$ and $C$ is positive semidefinite. This completes the proof.

Let us examine briefly why the S-lemma fails to hold for this example. We have already shown in Proposition 3.6 that if $n = m = 2$ then, in order for the result to hold, we need to assume that a certain linear combination of $B$ and $C$ is positive definite. However, taking

(3.30) $X = \begin{pmatrix} 1 & \frac{\lambda^2 - \mu\nu}{\mu - \nu} \\ \frac{\lambda^2 - \mu\nu}{\mu - \nu} & \lambda^2 \end{pmatrix}$

yields

(3.31a) $B \bullet X = 0$
(3.31b) $C \bullet X = 0$,

and $X$ is positive semidefinite and nonzero, therefore no linear combination of $B$ and $C$ can be positive definite.

4. Practical applications.

4.1. Stability analysis. The first example is based on the one presented in [42]. Let us consider the dynamical system

(4.1a) $\dot x = Ax + Bw, \qquad x(0) = x_0$
(4.1b) $v = Cx$

with a so-called sector constraint

(4.1c) $\sigma(v, w) = (\beta v - w)^T (w - \alpha v) \geq 0$,

where $\alpha < \beta$ are real numbers. We would like to use the basic tool of Lyapunov functions [13]: for the quadratic stability of the system it is necessary and sufficient to have a symmetric matrix $P$ such that $V(x) = x^T P x$ is a Lyapunov function, i.e.,

(4.2) $\dot V(x) = 2 x^T P (Ax + Bw) < 0, \qquad \forall (x, w) \neq 0 \text{ s.t. } \sigma(Cx, w) \geq 0$.

Introducing the quadratic forms

(4.3a) $\sigma_0(x, w) = \begin{bmatrix} x \\ w \end{bmatrix}^T \begin{bmatrix} A^T P + P A & P B \\ B^T P & 0 \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix}$,

(4.3b) $\sigma_1(x, w) = 2 \sigma(Cx, w) = \begin{bmatrix} x \\ w \end{bmatrix}^T \begin{bmatrix} -2\beta\alpha\, C^T C & (\beta + \alpha) C^T \\ (\beta + \alpha) C & -2I \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix}$,

the Lyapunov condition can be written as

(4.4) $\sigma_0(x, w) < 0, \qquad \forall (x, w) \neq 0 \text{ s.t. } \sigma_1(x, w) \geq 0$,

or in other words, we have to decide the solvability of the quadratic system

(4.5a) $\sigma_0(x, w) \geq 0$
(4.5b) $\sigma_1(x, w) \geq 0, \qquad (x, w) \neq 0$.

Using $\alpha < \beta$ we can see that the strict version of the second inequality can be satisfied. Based on a suitable form of the S-lemma (Proposition 3.2) we get that the non-solvability of this system is equivalent to the existence of a $y \geq 0$ such that

(4.6) $\sigma_0(x, w) + y \sigma_1(x, w) < 0, \qquad \forall (x, w) \neq (0, 0)$.

Now $y = 0$ would imply that $\sigma_0$ is negative definite, which is impossible since $\sigma_0(0, w) = 0$ for every $w$. Thus we can divide by $y$ and write $P$ in place of $P/y$. Finally, we can state the criterion as an LMI. We have proved the following theorem:

Theorem 4.1 (Circle criterion, [13]). A necessary and sufficient condition for the quadratic stability of the system

(4.7a) $\dot x = Ax + Bw, \qquad x(0) = x_0$
(4.7b) $v = Cx$
(4.7c) $\sigma(v, w) = (\beta v - w)^T (w - \alpha v) \geq 0$