Chapter 2 - Introduction to Vector Spaces

Justin Leduc

These lecture notes are meant to be used by students entering the University of Mannheim Master program in Economics. They form the basis for a pre-course in mathematics; that is, they summarize elementary concepts with which all of our economics graduate students must be familiar. More advanced concepts will be introduced later in the regular coursework, and a thorough knowledge of these basic notions will be assumed there. No prerequisites beyond high school mathematics are required. Although the wording is my own, the definitions of concepts and the ways of approaching them are strongly inspired by various sources, which are mentioned explicitly in the text or at the end of the chapter.

This version: September 9th, 2015. Center for Doctoral Studies in Economic and Social Sciences. Contact: justin.leduc@uni-mannheim.de

Contents

1 Introduction
2 From Classical Vectors to Vector Spaces
3 Subspaces, Linear Combinations, and Linear Dependence
4 Affine and Convex Sets
  4.1 Affine Sets
  4.2 Convex Sets
  4.3 Important Examples: Hyperplanes and Half Spaces
5 Normed Vector Spaces and Continuity
  5.1 Distance and Norm in a Vector Space
  5.2 Open Sets, Closed Sets, Compact Sets
  5.3 Continuity
A Appendix 1: The Separating and Supporting Hyperplane Theorems

1 Introduction

In the previous chapter, I devoted a whole section to vectors. The reason is that they constitute an extremely important benchmark for getting into the topic of vector spaces, as the name of the topic itself indicates. The main objective of the theory of vector spaces is sometimes¹ described as follows: geometrical insights at hand with 2- or 3-dimensional real vectors are really helpful; can we, in some way, generalize these insights to other mathematical objects, for which a geometric picture is not available? By insights, we here mean our ability to identify low-dimensional real vectors with geometrical entities and, thereby, to guide our thoughts. For instance, various vector operations are easily plotted:

Figure 1: 2D Geometrical Interpretation of Vector Operations

In turn, these geometrical representations give us simple and fundamental principles that help us solve various problems. Think, for instance, of the projection theorem in two dimensions, which states that the shortest path from a point x to a line l is the one which lies on the perpendicular to l. While geometrically very intuitive, this result would probably have been very hard to discover had we not had a geometrical representation for vectors. And yet, it is a fundamental result for optimization problems such as the least squares problems you (may) have studied in your undergrad. Its generalization to n-dimensional real vectors allows, among other things, the application of the least squares method in situations where no picture can guide the thoughts. What this chapter will try to convince you of is that the generalization applies to many more mathematical objects than n-tuples of real numbers.² IN THIS CHAPTER, BY VECTOR WE NEED NOT MEAN AN n-TUPLE OF REAL NUMBERS, BUT MAY REFER TO MANY MORE OBJECTS (FUNCTIONS, SEQUENCES, MATRICES, ...)!

¹ See e.g. Luenberger [4] or Gross [3].
² For instance, functions and matrices!

Now, generalizing something need not be easy. Indeed, one has to pin down the exact features of the simpler concept which guarantee the validity of the insights it brings us. In our case, these features are precisely the algebraic structure which we defined for matrices (and hence for vectors) in the previous chapter. Therefore, in the sequel, we find it important to distinguish simple sets from algebraically structured sets, i.e. sets equipped with an algebraic structure. In order to mark the change in emphasis when we manipulate such structured sets, we give them a different name: we call them spaces. A vector space, then, is a set of mathematical objects³ equipped with the same algebraic structure as that of vectors of real numbers.

2 From Classical Vectors to Vector Spaces

Let us try to summarize the algebraic structure we introduced in the last chapter. Remember that, by vector, unless otherwise stated, we meant a column vector, i.e. an n-tuple of real numbers. This notion of a vector will now be referred to as the classical notion of a vector.

Definition: (The Vector Space of Classical Vectors V_n)
Let V_n denote the set collecting all n-tuples of real numbers. Associate to this set two operations: vector addition, which associates to any x and y in V_n the n × 1 vector x + y, and scalar multiplication, which associates to any scalar λ and any x in V_n the n × 1 vector λx. Denote by V_n the space (V_n, +, ·).
The following properties hold:
(i) Vector addition is commutative: ∀x, y ∈ V_n, x + y = y + x
(ii) Vector addition is associative: ∀x, y, z ∈ V_n, x + (y + z) = (x + y) + z
(iii) There exists a null vector 0_n in V_n such that: ∀x ∈ V_n, x + 0_n = x
(iv) Scalar multiplication is associative: ∀λ, µ ∈ R, ∀x ∈ V_n, λ(µx) = (λµ)x
(v) Scalar multiplication is distributive over vector and scalar addition: ∀λ ∈ R, ∀x, y ∈ V_n, λ(x + y) = λx + λy, and ∀λ, µ ∈ R, ∀x ∈ V_n, (λ + µ)x = λx + µx
(vi) If 1 denotes the scalar multiplicative identity and 0 the scalar zero, then: ∀x ∈ V_n, 1x = x and 0x = 0_n

³ For instance, vectors in the classical sense of the term (i.e. n-tuples of real numbers), but not necessarily! Functions, matrices, sequences, ... all can fit this new, more general concept of a vector! Beware: do not understand this to mean that it is possible to identify those mathematical objects with n-tuples of real numbers. Rather, it means that our concept of what a vector is now includes the concept of vectors you studied in high school, while not being itself included in that concept.
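To make the axiom list concrete, here is a small Python spot-check of properties (i)-(vi) for n = 3. This is an illustration only (checking a handful of sample vectors proves nothing in general), and the helper names `add` and `scale` are mine:

```python
# Spot-check the V_n axioms for n = 3, representing classical vectors as tuples.

def add(x, y):
    """Vector addition in V_n (componentwise)."""
    return tuple(a + b for a, b in zip(x, y))

def scale(lam, x):
    """Scalar multiplication in V_n (componentwise)."""
    return tuple(lam * a for a in x)

x, y, z = (1.0, 2.0, 3.0), (4.0, -1.0, 0.5), (0.0, 2.5, -2.0)
zero = (0.0, 0.0, 0.0)
lam, mu = 2.0, -3.0

assert add(x, y) == add(y, x)                                        # (i)
assert add(x, add(y, z)) == add(add(x, y), z)                        # (ii)
assert add(x, zero) == x                                             # (iii)
assert scale(lam, scale(mu, x)) == scale(lam * mu, x)                # (iv)
assert scale(lam, add(x, y)) == add(scale(lam, x), scale(lam, y))    # (v)
assert add(scale(lam, x), scale(mu, x)) == scale(lam + mu, x)        # (v)
assert scale(1.0, x) == x and scale(0.0, x) == zero                  # (vi)
print("all V_3 axioms hold on the sample vectors")
```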

Remark: I did not call my set of length-n classical vectors R^n. Any idea why? Observe that I have been silent about another vector operation: the scalar product! It is conventional to forget about the scalar product when defining vector spaces. By doing so, one includes in the definition of vector spaces all those sets which can be equipped with the above algebraic structure but may not be equipped with a scalar product. The absence of a scalar product is sometimes emphasized by calling the vector space under consideration a linear space. R^n, however, is itself a notation for a space of n-tuples of real numbers, but a space with a richer algebraic structure than V_n: one that includes the scalar product! A vector space that possesses a well-defined scalar product is called a pre-Hilbert space.⁴

As mentioned in the introduction, in order to define general vector spaces, we take all these algebraic properties and use them as defining axioms. Note the change of emphasis here. A property is the consequence of some set of axioms, i.e., initial assumptions. Using the algebraic properties of classical vectors as defining axioms means that we now understand as vectors not only n-tuples of real numbers, but all mathematical objects which are members of a space endowed with these algebraic laws.

Definition: (Real Vector Space)
Let X := (X, +, ·) be a set of elements, which we will from now on call vectors, together with two operations: vector addition, which associates to any x and y in X the vector x + y in X, and scalar multiplication, which associates to any scalar λ and any x in X the vector λx in X.
X is called a vector space if the following properties hold:
(i) Vector addition is commutative: ∀x, y ∈ X, x + y = y + x
(ii) Vector addition is associative: ∀x, y, z ∈ X, x + (y + z) = (x + y) + z
(iii) There exists a null element 0 in X such that: ∀x ∈ X, x + 0 = x
(iv) Scalar multiplication is associative: ∀λ, µ ∈ R, ∀x ∈ X, λ(µx) = (λµ)x
(v) Scalar multiplication is distributive over vector and scalar addition: ∀λ ∈ R, ∀x, y ∈ X, λ(x + y) = λx + λy, and ∀λ, µ ∈ R, ∀x ∈ X, (λ + µ)x = λx + µx
(vi) If 1 denotes the scalar multiplicative identity and 0 the scalar zero, then: ∀x ∈ X, 1x = x and 0x = 0

⁴ Actually, R^n possesses a further important, yet non-algebraic, property: that of completeness. It therefore qualifies for the name of Hilbert space!
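To see that vectors really need not be n-tuples, here is a sketch treating real-valued functions as vectors, with pointwise addition and scalar multiplication. The example functions and helper names are mine, and checking axioms at a few sample points is only an illustration:

```python
# Functions on [0, 1] as "vectors": pointwise operations satisfy the axioms.

def f_add(f, g):
    """Pointwise addition of two functions."""
    return lambda t: f(t) + g(t)

def f_scale(lam, f):
    """Pointwise scalar multiplication."""
    return lambda t: lam * f(t)

f = lambda t: t ** 2
g = lambda t: 3.0 * t
zero = lambda t: 0.0          # the null element is the zero function

# Spot-check three axioms at sample points of [0, 1]:
for t in (0.0, 0.25, 0.5, 1.0):
    assert f_add(f, g)(t) == f_add(g, f)(t)                # commutativity
    assert f_add(f, zero)(t) == f(t)                       # null element
    assert f_scale(2.0, f_add(f, g))(t) == \
           f_add(f_scale(2.0, f), f_scale(2.0, g))(t)      # distributivity
print("function-space axioms hold at the sample points")
```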

Remark: the attribute "real" in our definition simply expresses the fact that our scalars are elements of the real line. The concept of a vector space can naturally be extended to that of a vector space over a different set of numbers, such as, for instance, C, the set of complex numbers. Yet we will focus on real vector spaces in what follows.

It is a good exercise to verify that the following sets, endowed with appropriate operations, can now also be considered vector spaces:
V = {f : dom(f) = [a, b]}, the set of real-valued functions on [a, b],
S = {x : x = (ζ_k)_{k=1}^∞}, the set of infinite sequences of real numbers,
C_n(X), the set of all bounded and continuous real-valued functions with domain X ⊆ R^n,
M_{m×n}, the space of m × n matrices,
and many others!

Elementary but important properties of vector spaces are the cancellation laws.

Proposition: (Cancellation Laws)
Let X := (X, +, ·) be a real vector space, let x, y, and z belong to X, and let λ and γ belong to R. Then we have the following cancellation laws:
(i) If x + y = x + z, then y = z
(ii) If λx = λy and λ ≠ 0, then x = y
(iii) If λx = γx and x ≠ 0, then λ = γ
Proof: See Exercise 1

To conclude this section, let us generalize another point with which you became familiar when working with classical vector spaces: the relation between R, R², R³, and so on. This relation is made formal through the concept of the Cartesian product:

Definition: (Cartesian Product)
Let X := (X, +, ·) and Y := (Y, +, ·) be two real vector spaces. We define the Cartesian product of X and Y, denoted X × Y, as the collection of ordered pairs (x, y), with x an element of X and y an element of Y, together with two operations, addition and scalar multiplication, defined respectively by (x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂) and λ(x, y) = (λx, λy).

Remark: You may want to verify that the Cartesian product of two real vector spaces is itself a vector space.

3 Subspaces, Linear Combinations, and Linear Dependence

Considering subsets of a universal, or ambient, set often proves useful in mathematics. For instance, we may like to consider only the integers and not the whole real line. It is possible to generalize the concept of a subset to the context of spaces (i.e. algebraically structured sets). If we are to do so, however, we do not want to lose the very structure we sought when moving from the notion of a set to that of a space.⁵ I first introduce a concept that will help us along this line, and then proceed with the notion of a vector subspace.

Definition: (Closure Under an Operation)
Let X := (X, +, ·) be a real vector space. We say that Y ⊆ X is closed under addition if and only if, for any two elements y₁ and y₂ in Y, the sum y₁ + y₂ belongs to Y. Similarly, we can define closure under scalar multiplication.

Definition: (Vector Subspace)
Let X := (X, +, ·) be a real vector space and Y a non-empty subset of X. We say that Y is a subspace of X if and only if Y is closed under vector addition and scalar multiplication.

Notation: In what follows, I will generally denote a real vector space (X, +, ·) by X.

Remark: If you properly understood the idea of closure, you should be able to conclude that a simple way to check whether Y is a subspace of X is to verify or falsify the following statement:

∀y₁, y₂ ∈ Y, ∀λ, µ ∈ R: λy₁ + µy₂ ∈ Y

This means that a subspace of a real vector space is a subset that contains every linear combination of two of its elements! Keep that in mind!

Any subspace of a vector space is itself a vector space. Further, note that the entire space X is a subspace of X, as X is by definition a subset of itself. In the same way that we call a subset a proper subset whenever the inclusion is strict, a subspace not equal to the entire space is called a proper subspace.
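The closure test from the remark is easy to run numerically. Here is a sketch comparing two candidate subsets of R³ (the example sets, tolerance, and helper names are mine): a plane through the origin passes the test; a shifted plane fails it.

```python
# Subspace test: does lam*y1 + mu*y2 stay in the set?

def combo(lam, y1, mu, y2):
    """Linear combination lam*y1 + mu*y2 of two vectors in R^3."""
    return tuple(lam * a + mu * b for a, b in zip(y1, y2))

# Y1: plane through the origin {x : x1 + x2 + x3 = 0} -> a subspace
in_Y1 = lambda x: abs(x[0] + x[1] + x[2]) < 1e-12
# Y2: shifted plane {x : x1 + x2 + x3 = 1}            -> not a subspace
in_Y2 = lambda x: abs(x[0] + x[1] + x[2] - 1.0) < 1e-12

y1, y2 = (1.0, -1.0, 0.0), (0.0, 2.0, -2.0)        # both lie in Y1
assert in_Y1(y1) and in_Y1(y2)
assert in_Y1(combo(2.0, y1, -3.5, y2))             # closure holds

z1, z2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)          # both lie in Y2
assert in_Y2(z1) and in_Y2(z2)
assert not in_Y2(combo(2.0, z1, -3.5, z2))         # closure fails
print("Y1 passes the closure test, Y2 does not")
```

Of course, passing the test on sample vectors does not prove closure; a proof must quantify over all y₁, y₂, λ, µ.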
For instance, we may think of the space of convergent infinite real sequences as a proper subspace of the space of infinite real sequences mentioned above.

⁵ Remember, the structure will guarantee the valid extension of our geometrical insights!

Proposition: (Intersection and Addition of Subspaces)
Let M and N be subspaces of a real vector space X. Then:
(i) their intersection, M ∩ N, is a subspace of X;
(ii) their sum, M + N, is a subspace of X.
Proof: See Exercise 2

Remark: Note that nothing is said about the union! (Counterexample?)

Summing up, we have defined vector spaces and vector subspaces, and argued that any linear combination of vectors in a vector (sub)space also lies in that (sub)space. The next result establishes a converse proposition: linear combinations can be used to construct a subspace from an arbitrary set of vectors in a vector space.

Proposition: (Generated Subspace (a.k.a. Span))
Let Y be a subset of a real vector space X. Then the set Span(Y), which consists of all vectors in X that can be expressed as linear combinations of vectors in Y, is a subspace of X. It is called the subspace generated by Y, or the span of Y, and it is the smallest subspace which contains Y.

Example: In chapter 1, section 7.4, we saw that the equation Ax = b could be understood as a linear combination of A's columns. We argued that, if the columns are linearly independent, then, for all b in R^n, the equation Ax = b has a unique solution. The reason is that n linearly independent vectors span the whole of R^n! Had only m < n columns been independent, the space spanned by the columns of A, a.k.a. the column space of A, would have been a proper subspace of dimension m. The b's in R^n outside the column space of A would have been such that the equation Ax = b has no solution.

Actually, the vector space terminology allows us to define more precisely the concept of linear independence that we have only informally mentioned until now.

Definition: (Linear Dependence, Linear Independence)
Let x be an element of a real vector space X. We say x is linearly dependent upon a set S of vectors of X if it can be expressed as a linear combination of vectors from S.
Equivalently, x is linearly dependent upon S if and only if x ∈ Span(S). If that is not the case, the vector x is said to be linearly independent of the set S. Finally, a set of vectors is said to be a linearly independent set if each vector of the set is linearly independent of the remainder of the set.

Thus, two vectors are linearly independent if they do not lie on a common line through the origin; three vectors are linearly independent if they do not lie in a common plane through the origin; etc.

Remark: Clearly, 0 is dependent on any given vector x (why?). By convention, the set consisting of 0 only is understood to be a dependent set, and a set consisting of a single nonzero vector an independent set.

Theorem: (Testing Linear Independence)
A necessary and sufficient condition for the set of vectors x₁, x₂, ..., x_n to be linearly independent is that:

if Σ_{k=1}^{n} λ_k x_k = 0, then λ_k = 0 for all k = 1, 2, ..., n.

Proof: (Necessity; by contradiction) Let the set of vectors x₁, x₂, ..., x_n be linearly independent and assume there is a λ_k different from zero in the above sum. For simplicity, relabel the vectors so that this λ_k is the one corresponding to the nth vector. Then,

Σ_{k=1}^{n} λ_k x_k = 0  ⟹  Σ_{k=1}^{n-1} λ_k x_k = −λ_n x_n

Therefore, the following holds:

x_n = −Σ_{k=1}^{n-1} (λ_k / λ_n) x_k

which contradicts our initial assumption of independence.

Finally, the following definition will come in handy if you are to use the tools of advanced calculus.

Definition: (Basis and Space Dimension)
A finite set S of linearly independent vectors is said to be a basis for the space X if S generates X. A vector space having a finite basis is said to be finite dimensional. All other vector spaces are said to be infinite dimensional.

Example: Consider R³ with coordinate axes x, y, and z. Any 3-tuple of independent 3-dimensional vectors spans the whole space R³. A particular example, known as the canonical basis, is the following set of three vectors: {(1, 0, 0); (0, 1, 0); (0, 0, 1)}
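The theorem gives a computational test: x₁, ..., x_n are independent exactly when the matrix having them as rows has rank n. Here is a minimal sketch with a small Gaussian-elimination rank routine (the routine and examples are mine; in practice one would use a library such as NumPy):

```python
# Linear independence via matrix rank.

def rank(rows, tol=1e-10):
    """Rank of a matrix given as a list of row lists (Gaussian elimination)."""
    m = [list(row) for row in rows]
    r = 0
    for col in range(len(m[0])):
        # find a pivot row for this column among the remaining rows
        pivot = next((i for i in range(r, len(m)) if abs(m[i][col]) > tol), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def independent(vectors):
    """True iff the vectors form a linearly independent set."""
    return rank(vectors) == len(vectors)

# The canonical basis of R^3 is an independent set:
assert independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
# (1,1,0) = (1,0,0) + (0,1,0), so this set is dependent:
assert not independent([(1, 0, 0), (0, 1, 0), (1, 1, 0)])
print("independence test agrees with the theorem")
```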

Theorem: (Uniqueness of the Dimension)
Any two bases for a finite dimensional vector space contain the same number of elements.

Proof: (The idea only) The result is quite intuitive. Assume it does not hold, i.e., one basis, say basis 1, has n elements and another basis, say basis 2, has m elements, with m ≠ n. Without loss of generality, assume m < n. Because basis 1 is a basis, you may express all elements of basis 2 as linear combinations of the elements of basis 1. And because the elements of basis 2 are by definition independent, you can one by one substitute the first m elements of basis 1 by the m elements of basis 2 so expressed. But then m elements would suffice to generate the whole space, contradicting the independence of the n elements of basis 1; hence it must be that n = m.

Example: Any pair of independent vectors in R^n generates a plane, which is thus a finite dimensional space of dimension two. No finite collection of vectors suffices to generate C_n(X), the set of all bounded and continuous real-valued functions with domain X ⊆ R^n. It is thus an infinite dimensional space.

4 Affine and Convex Sets

The concept of a subspace is fundamental. Yet, in many applications, it remains, geometrically speaking, too restrictive a concept, and we need to make a trade-off. While, on the one hand, we would be happy to keep as much as possible of the algebraic structure we introduced, on the other hand, we must frequently work with subsets of vector spaces which are not themselves subspaces and which we are not willing to replace with their span. For instance, in optimization, we most often face constraints on our vectors, constraints which define the subset of the ambient space over which we are maximizing. Of course, because these constraints arise from our real environment, there is a priori no reason that the region they define should constitute a subspace, i.e. that it be closed under arbitrary linear combinations.
And yet, substituting the subspace spanned by the constraint set for the constraint set itself would precisely nullify our attempt to model the constraints imposed by the environment. In this section, we present two types of subsets which provide, in many a situation, a satisfactory answer to this trade-off. Note that, because we lose on the algebraic dimension, we drop the label of "space" and come back to that of "set".

I hope you noticed that, in the previous section, we established a strong relation between linear combinations and subspaces. I will now present the concepts of affine combination,

which is fundamental to understanding what affine sets are,⁶ and that of convex combination, which relates to convex sets. Both impose algebraic restrictions on the already discussed concept of linear combination. Each restriction is, of course, a loss on the algebraic dimension. But keep in mind that it is also the only way to gain on the geometrical dimension, that is, to enlarge our collection of workable sets. Indeed, instead of requiring from our sets that they contain all linear combinations of their elements, we will ask them to contain only a specific subset of the linear combinations of their elements. Such an algebraic requirement being less demanding, our class of satisfying sets will be larger.

4.1 Affine Sets

It is sometimes suggested⁷ that a second advantage of vector spaces, stemming from the focus on the algebraic structure, is to free ourselves from the coordinate system. Instead of living in a specific coordinate system, vectors exist for themselves and become the main object of study. The next concept, which is sometimes described as an attempt to forget about the origin of the vector space, pushes the idea even further, as it simply drops the requirement that our subset contain the origin of the ambient space (requirement (iii) in our definition of vector spaces).

Definition: (Affine Sets)
The translation of a subspace is called an affine set. (A translation consists of moving an object from one point to another; the direction, sense, and magnitude of this move are usually specified by means of a vector.)

This is a clear definition for exposing the concept (see Figure 2).
However, the following characterization via affine combinations proves more useful in applications:

Definition: (Affine Combination)
An affine combination of the vectors x₁, x₂, ..., x_n is a linear combination of the vectors, i.e., a sum Σ_{i=1}^{n} λ_i x_i, λ_i ∈ R, such that the following additional requirement holds:

Σ_{i=1}^{n} λ_i = 1

⁶ Also often referred to as affine subspaces or linear varieties. Yet, as explained above, these sets are not, strictly speaking, spaces anymore, and we should therefore be wary of the first alternative denomination. The second alternative denomination could be misleading as well, as it uses the term "linear" while we will, in fact, be talking of sets which preserve affine (and not linear) combinations.
⁷ See e.g. Gross [3].
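A quick numerical illustration of the difference between affine and linear closure, on an example of my own choosing: the line {x ∈ R² : x₁ + x₂ = 1} contains every affine combination of its points, yet it is not a subspace (it misses the origin).

```python
# The line x1 + x2 = 1: closed under affine, but not linear, combinations.

on_line = lambda x: abs(x[0] + x[1] - 1.0) < 1e-12

p, q = (1.0, 0.0), (-2.0, 3.0)
assert on_line(p) and on_line(q)

lam = 0.7                                   # affine weights: lam + (1 - lam) = 1
aff = (lam * p[0] + (1 - lam) * q[0], lam * p[1] + (1 - lam) * q[1])
assert on_line(aff)                         # affine combinations stay on the line

assert not on_line((0.0, 0.0))              # the origin is not in the set...
lin = (2.0 * p[0] + 3.0 * q[0], 2.0 * p[1] + 3.0 * q[1])
assert not on_line(lin)                     # ...and generic linear combos leave it
print("the line is affine but not a subspace")
```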

Proposition: (Characterization of Affine Sets)
Let Y be a subset of a vector space X. Y is affine if and only if it contains every affine combination of its elements.

Proof: (I just give the idea) Let Y be a set that contains every affine combination of its elements. Consider any element y₀ of Y. It can be shown that Ỹ := Y − y₀ = {y − y₀ : y ∈ Y} is a subspace. Further, it can be shown that the subspace thus defined does not depend on which y₀ you picked! Said otherwise, a set closed under affine combinations is the translation of a subspace.

Figure 2: To every affine set is associated a unique subspace

The other direction is straightforward: any affine set is a translated subspace, so the only property of vector spaces that may fail is that of containing the origin.

Remark: Hence, to verify whether a set is affine, one has to make sure that any line going through two different points of the set is fully contained in it!

Clearly, affine sets conserve most of the properties of subspaces, and this probably explains why the terminology "affine spaces" is popular. Further, many interesting sets of constraints actually define an affine set as our optimization region. Linear equality constraints constitute such an example: the solution set of a system of linear equations, X_A = {x ∈ X : Ax = b}, where X ⊆ R^n, A ∈ M_{m×n}, and b ∈ R^m, is an affine set. Technically, linear equality constraints are even more than a simple example. Rather, they constitute a third characterization of affine sets, as it can be shown that every affine set can be expressed as the solution set of a system of linear equations.
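The claim that {x : Ax = b} is affine is easy to witness numerically. A sketch with a small system of my own choosing: affine combinations of solutions are solutions, and differences of solutions solve the homogeneous system (the associated subspace).

```python
# The solution set of Ax = b is affine.

def matvec(A, x):
    """Matrix-vector product, with A a list of rows."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
b = (2.0, 3.0)

x1 = (1.0, 1.0, 2.0)       # check: 1 + 1 = 2 and 1 + 2 = 3
x2 = (2.0, 0.0, 3.0)       # check: 2 + 0 = 2 and 0 + 3 = 3
assert matvec(A, x1) == b and matvec(A, x2) == b

lam = -1.5                 # affine weights lam and 1 - lam sum to one
x3 = tuple(lam * a + (1 - lam) * c for a, c in zip(x1, x2))
assert matvec(A, x3) == b  # still a solution: {x : Ax = b} is affine

# ...and the difference of two solutions solves the homogeneous system Ax = 0:
d = tuple(a - c for a, c in zip(x1, x2))
assert matvec(A, d) == (0.0, 0.0)
print("solution set of Ax = b behaves like a translated subspace")
```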

4.2 Convex Sets

The strong relation between affine sets and sets of linear constraints directly points towards some limitations. For instance, constraints are often non-linear. But, as we will see in the next chapter, a very important class of non-linear constraints can be locally approximated by linear constraints, so that non-linearity need not be too big a problem. However, even when constraints are linear, a more serious issue is the frequent presence of inequality (rather than, or in addition to, equality) constraints. The concept of convex sets helps us deal with that issue, at least in many relevant cases. It is associated with the concept of convex combination and arises naturally when the constraints of our optimization problem are convex functions.⁸

Definition: (Convex Combination)
A convex combination of the vectors x₁, x₂, ..., x_n is a linear combination of the vectors, i.e., a sum Σ_{i=1}^{n} λ_i x_i, λ_i ∈ R, such that the following additional requirements hold:

Σ_{i=1}^{n} λ_i = 1 and λ_i ∈ [0, 1] for all i

Proposition: (Characterization of Convex Sets)
Let Y be a subset of a vector space X. Y is convex if and only if every convex combination of any two of its elements is contained in Y.

Remark: Hence, to verify whether a set is convex, one has to make sure that any line segment joining two different points of the set is fully contained in it!

Figure 3: Convex and non-convex sets

⁸ We'll define what that means in the next chapter ;)
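The segment test in the remark can be sampled numerically. A sketch with two sets of my own choosing: the unit disc (convex) and an annulus (not convex, since a chord can pass through the hole). Sampling λ only gives evidence, not a proof:

```python
# Sampled segment test for convexity: does lam*x + (1-lam)*y stay in the set?

def segment_stays_inside(member, x, y, steps=11):
    """Check membership of points along the segment from y to x."""
    for k in range(steps):
        lam = k / (steps - 1)
        p = tuple(lam * a + (1 - lam) * b for a, b in zip(x, y))
        if not member(p):
            return False
    return True

disc = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0                 # convex
annulus = lambda p: 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0      # not convex

assert segment_stays_inside(disc, (1.0, 0.0), (-1.0, 0.0))
# the segment between (1,0) and (-1,0) passes through the hole of the annulus:
assert not segment_stays_inside(annulus, (1.0, 0.0), (-1.0, 0.0))
print("disc passes the segment test, annulus fails it")
```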

If a set is not convex, we may "convexify" it by adding the smallest possible number of elements such that the resulting set is convex. This idea is very close to that of the span. When a subset is not a subspace, the minimal set of points I have to add for it to become a subspace consists of those which guarantee the closure of the set under linear combinations. To make a non-convex subset convex, I simply add those elements which guarantee the closure of the set under convex combinations.

Definition: (Convex Hull)
Let Y be a subset of a vector space X. The convex hull of Y, denoted Co(Y), is the smallest convex set containing Y. The convex hull of Y may also be expressed as the set of all possible convex combinations of the elements of Y:

Co(Y) = { x ∈ X : ∃ y₁, y₂, ..., y_n ∈ Y and λ ∈ [0, 1]^n s.t. Σ_{i=1}^{n} λ_i = 1 and x = Σ_{i=1}^{n} λ_i y_i }

Figure 4: Convex Hulls

To conclude the subsection, here are some important properties that will, in some cases, help you decide on the convexity of a given set:

Proposition: (Operations Which Preserve Convexity)
Let C be the collection of all convex sets in the ambient space X, and let C_a ⊆ C be an arbitrary collection of convex sets. Then:
(i) If K ∈ C and α ∈ R, then αK := {αk : k ∈ K} ∈ C.
(ii) If K, G ∈ C, then K + G ∈ C.
(iii) The intersection of the sets in C_a, ∩_{K ∈ C_a} K, belongs to C.
Proof: See Exercise 4.
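The convex-combination formula for Co(Y) becomes very concrete for a triangle in R²: a point lies in the hull of three vertices iff it is a convex combination of them, i.e. its barycentric coordinates are non-negative and sum to one. A sketch (vertices, tolerance, and helper names are mine):

```python
# Hull membership for a triangle via barycentric coordinates.

def barycentric(p, v1, v2, v3):
    """Solve p = l1*v1 + l2*v2 + l3*v3 with l1 + l2 + l3 = 1 (Cramer's rule)."""
    det = (v2[1] - v3[1]) * (v1[0] - v3[0]) + (v3[0] - v2[0]) * (v1[1] - v3[1])
    l1 = ((v2[1] - v3[1]) * (p[0] - v3[0]) + (v3[0] - v2[0]) * (p[1] - v3[1])) / det
    l2 = ((v3[1] - v1[1]) * (p[0] - v3[0]) + (v1[0] - v3[0]) * (p[1] - v3[1])) / det
    return l1, l2, 1.0 - l1 - l2

def in_hull(p, v1, v2, v3, tol=1e-12):
    """p is in Co({v1, v2, v3}) iff all barycentric weights are >= 0."""
    return all(l >= -tol for l in barycentric(p, v1, v2, v3))

v1, v2, v3 = (0.0, 0.0), (2.0, 0.0), (0.0, 2.0)
assert in_hull((0.5, 0.5), v1, v2, v3)        # interior point: in the hull
assert in_hull((1.0, 1.0), v1, v2, v3)        # boundary point (edge v2-v3)
assert not in_hull((2.0, 2.0), v1, v2, v3)    # outside the triangle
print("hull membership matches the convex-combination formula")
```

For hulls of larger point sets, library routines such as `scipy.spatial.ConvexHull` do this in general dimension.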

4.3 Important Examples: Hyperplanes and Half Spaces

Please keep in mind that every subspace is an affine set and that every affine set is a convex set, but that the converses of these two statements are not true! In this section we shall display two families of sets which play an important role in convex optimization theory. The first type of set, called hyperplanes, is a family of affine sets and is thus also a family of convex sets. The second type, called halfspaces, constitutes only a family of convex sets.

Formally, a hyperplane should be defined as a maximal proper affine set, that is, an affine set Y that is a strict subset of the ambient space X and is such that the only affine set strictly containing Y is X itself. However, economists generally use a specific characterization of hyperplanes to define them, one which is valid for classical vector spaces (i.e., Euclidean spaces). This characterization is very important in its own right, and I use it too to define hyperplanes:

Definition: (Hyperplane)
Let X be a subspace of R^n. Then a hyperplane of X is a set of the form:

H_a^b := {x ∈ X : a · x = b}

where a is an element of R^n different from 0 and b is an element of R.

Figure 5: Hyperplanes in R and R²

Geometrically, the hyperplane H_a^b := {x ∈ X : a · x = b} can be interpreted as the set of points whose inner product with a given vector a is constant, or, put otherwise, as a plane

with normal vector a.⁹ The constant b determines the offset of the hyperplane from the origin. A hyperplane of a space of dimension n is necessarily a set of dimension (n − 1), as its characterizing equation restricts only one degree of freedom: that spanned by the vector a. Further, one may wish to define an "above" and a "below" along the dimension spanned by a. This is the job of halfspaces.

Definition: (Halfspace) Let X be a subspace of $\mathbb{R}^n$. Then, a halfspace of X is a set of the form:

$$H_a^{b-} := \{x \in X \mid a \cdot x \le b\} \quad \text{or} \quad H_a^{b+} := \{x \in X \mid a \cdot x \ge b\}$$

where a is an element of $\mathbb{R}^n$ that is different from 0 and b is an element of $\mathbb{R}$. Put differently, a hyperplane incidentally defines two halfspaces. The halfspace determined by $a \cdot x \ge b$ is the halfspace extending in the direction of a, and the halfspace determined by $a \cdot x \le b$ is the halfspace extending in the direction of $-a$.

Figure 6: Halfspaces in $\mathbb{R}^2$

⁹ $a \cdot x = b \iff a \cdot x - b = 0 \iff a \cdot (x - y) = 0$, where y is such that $a \cdot y = b$ and thus also belongs to the hyperplane!! Reminder: a zero inner product indicates orthogonality!
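The algebra above can be checked on a small numerical sketch (numpy, with a hypothetical a and b): points of the hyperplane satisfy $a \cdot x = b$, directions lying inside the hyperplane are orthogonal to a, and moving along $\pm a$ from the hyperplane enters the corresponding halfspace.

```python
import numpy as np

# Hypothetical hyperplane H = {x : a . x = b} in R^3.
a = np.array([1.0, 2.0, 3.0])
b = 6.0

def side(x):
    """0 if x lies on H, +1 in the halfspace a.x >= b, -1 in a.x <= b."""
    return int(np.sign(a @ x - b))

x_on = np.array([1.0, 1.0, 1.0])   # 1 + 2 + 3 = 6, lies on H
y_on = np.array([6.0, 0.0, 0.0])   # 1 * 6 = 6, also on H
assert side(x_on) == 0 and side(y_on) == 0
# The normal a is orthogonal to any direction lying inside H:
assert np.isclose(a @ (x_on - y_on), 0.0)
# Moving from a point of H along a (resp. -a) enters H^{b+} (resp. H^{b-}):
assert side(x_on + 0.1 * a) == 1
assert side(x_on - 0.1 * a) == -1
```

The last two assertions are exactly the "above/below" reading of the two halfspaces.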

5 Normed Vector Spaces and Continuity

You may remember from your previous classes, if they involved some optimization problems, that you were usually considering continuous and, actually, differentiable functions. The reason for that is as follows. Assume one wants to find the maximum of a given function. An obvious, yet inconvenient, way to proceed is to take every single element of the function's domain, plug it into the function, look at the output, and compare it to that associated with the other elements. Needless to say, no one seriously considers this procedure! So we have to find some kind of tools or tricks that will spare us all that tedious work. But, of course, a tool is generally designed for a specific kind of object. Certainly, you would not claim to efficiently open a bottle of wine with a beer opener. Well, the same applies to our optimization tools. They will work very well, but only on a specific class of functions: namely, those which are continuous and differentiable. Therefore, it is important for us to know what continuity precisely means. If you remember the preliminary chapter, we associated continuity with the requirement that the images of two nearby points should not stand too far apart from one another. The purpose of this section is to make this statement formal, within the context of vector spaces. As the algebraic concepts we just discussed do not say anything specific about how close two objects are in the space, we introduce the concepts of distance and norm in a vector space.

5.1 Distance and Norm in a Vector Space

Many basic mathematical concepts are very intuitive. For sure, the concept of a distance function is one of the most intuitive there could be. Consider two objects that stand nearby you, and ask yourself what properties you would like a distance function to have if it were to give you the distance between these two objects.
One may wish to set a conventional minimal distance for cases where the two objects considered lie at the same place (i.e., are the same object). Thus, a first wish could be that the function always yields a non-negative output, with the minimal output reserved for identical objects. Second, it seems natural to consider only functions whose output is robust to a change in the direction of measurement: whether one starts from object 1 and measures all the way to object 2, or proceeds the other way around, the output should be the same. Finally, a third natural requirement is the following: when asked (i) to measure the distance between objects 1 and 2, and (ii) to measure the distance between objects 1 and 2 while being required to pass by object 3, one should expect the outcome of (i) to be, in some sense, smaller than the outcome of (ii). As the following formal definition shows, these three properties are exactly what defines, in the eyes of mathematicians, a distance function.

Definition: (Metric Space) Let X be a vector space. If we can define a real-valued function $d(\cdot,\cdot)$ which maps any two elements x and y in X into a real number d(x, y), and if that function is such that:

(i) $\forall x, y \in X$, $d(x, y) \ge 0$, with $d(x, y) = 0$ if and only if $x = y$, (non-negativity)
(ii) $\forall x, y \in X$, $d(x, y) = d(y, x)$, (symmetry)
(iii) $\forall x, y, z \in X$, $d(x, y) \le d(x, z) + d(z, y)$, (triangle inequality)

then $d(\cdot,\cdot)$ is called a distance functionᵃ for X and $(X, d(\cdot,\cdot))$ a metric space.

ᵃ a.k.a. metric.

Remark: Note that the possibility to define such a function isn't guaranteed under all circumstances, which implies a loss of generality. But do not worry too much about that: most economic applications can make use of metric spaces, and if not, then generalizations exist!

Example: An important example of a distance function in $\mathbb{R}$ is the absolute value of the difference: $d(x, y) = |x - y|$.

The attentive reader may claim to have been fooled. Indeed, I consciously passed over another important property which we would like distance functions to possess. Assume one picks the two objects you investigated earlier and translates them both by 1 meter in the same direction, i.e., moves them by 1 meter in a parallel fashion. Would you expect the distance between the two objects to have changed? Certainly not. Well, this requirement is not imposed in the above definition, and there is thus no reason that it be fulfilled. (We will see an example at the end of this section!) One strategy to impose this further requirement is to define the distance via a norm. Let us detail that strategy.

Definition: (Normed Space) Let X be a vector space. If we can define a real-valued function $\|\cdot\|$ which maps each element x in X into a real number $\|x\|$, and if that function is such that:

(i) $\forall x \in X$, $\|x\| \ge 0$, with $\|x\| = 0$ if and only if $x = 0$, (non-negativity)
(ii) $\forall x, y \in X$, $\|x + y\| \le \|x\| + \|y\|$, (triangle inequality)
(iii) $\forall x \in X,\ \forall \lambda \in \mathbb{R}$, $\|\lambda x\| = |\lambda| \, \|x\|$, (absolute homogeneityᵃ)

then $\|\cdot\|$ is called a norm for X and $(X, \|\cdot\|)$ a normed space.
ᵃ You'll understand in chapter 3 why it is called this way. Keep the name in mind! ;)

Remark: From point (ii) it is easy to derive the following fact (do it!): $\|x\| - \|y\| \le \|x - y\|$
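The fact in the remark can at least be sanity-checked numerically (a sketch with numpy and random hypothetical vectors, using the Euclidean norm introduced below):

```python
import numpy as np

# Check ||x|| - ||y|| <= ||x - y|| on many random vectors in R^5.
# By symmetry the same bound holds with x and y swapped, so we check
# the absolute value of the difference of norms.
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    assert abs(np.linalg.norm(x) - np.linalg.norm(y)) <= np.linalg.norm(x - y) + 1e-12
```

Of course, a finite number of random checks is no proof; the derivation from the triangle inequality ($\|x\| = \|x - y + y\| \le \|x - y\| + \|y\|$) is the exercise.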

An important example for us is the Euclidean norm on $\mathbb{R}^n$:

$$x = (x_1, \dots, x_n) \in \mathbb{R}^n \implies \|x\| := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} = (x \cdot x)^{1/2}$$

But this is certainly not the only one. We can define, for instance, the norm of a matrix as follows:

$$A \in M_{m \times n} \implies \|A\| := \max \{ \|Ax\| \mid x \in \mathbb{R}^n, \|x\| = 1 \}$$

Norms relate to distance functions as follows: on a given vector space, every norm defines a distance function! In other words, the existence of a norm is slightly more demanding than that of a distance function and, as a consequence, any normed vector space necessarily is a metric space (while the converse need not be true!). To get a bit of intuition, let us consider a distance function between any x in X and the zero element of X (i.e., the origin). By the above definition of a distance, we have:

(i) $\forall x \in X$, $d(x, 0) \ge 0$, with $d(x, 0) = 0$ if and only if $x = 0$,
(ii) $\forall x \in X$, $d(x, 0) = d(0, x)$,
(iii) $\forall x, y \in X$, $d(x, y) \le d(x, 0) + d(0, y)$.

Now the two concepts look even more similar! It is thus worth asking the following question: does $d(\cdot, 0)$ define a norm on X? Clearly, the triangle inequality is fulfilled and so is the non-negativity property. Yet, nothing here guarantees the absolute homogeneity property and, thus, unless we add some more requirements, $d(\cdot, 0)$ need not define a norm on X. However, the following result can actually be shown:

Proposition: (Norm vs. Distance) Let $(X, \|\cdot\|)$ be a normed vector space. Then, the function $d(x, y) = \|x - y\|$ for all $x, y \in X$ is a distance function on X. Further, $d(\cdot,\cdot)$ exhibits the following extra properties:

(i) $\forall x, y \in X,\ \forall \lambda \in \mathbb{R}$, $d(\lambda x, \lambda y) = |\lambda| \, d(x, y)$, (absolute homogeneity)
(ii) $\forall x, y, z \in X$, $d(x + z, y + z) = d(x, y)$. (translation invariance)

Conversely, a metric $d(\cdot,\cdot)$ that exhibits the extra properties of absolute homogeneity and translation invariance defines a norm, by defining the norm of an element x as its distance from the origin.

Example: (The French Railway Metric is not translation invariant) Consider the following example in $\mathbb{R}^2$:

$$d(x, y) = \begin{cases} \|x\| + \|y\| & \text{if } x \text{ and } y \text{ are independent} \\ \|x - y\| & \text{if } x \text{ and } y \text{ are colinear} \end{cases}$$

with the bars denoting the Euclidean norm. You should verify that it satisfies the three properties defining a distance function. Yet, it is not translation invariant: if we take, for instance, a strictly positive vector z in $\mathbb{R}^2$ and two independent strictly positive vectors x, y, then

$$d(x + z, y + z) = \|x + z\| + \|y + z\| > \|x\| + \|y\| = d(x, y)$$

For those who want a bit of intuition here, that distance function simply imposes passing through the origin when measuring the distance between two points that are not contained in a single ray from the origin. It is called the French Railway Metric because it used to be almost true that, in France, if you were to travel between two cities not contained in a single ray from Paris, then you were forced to travel through Paris. See, for instance, the figure below, where to go from Toulouse (T) to Barcelona you can proceed without going through Paris, while to go from Toulouse to Bordeaux, you need to go through Paris. Then, if one translates the origin to, say, Mannheim, the distance between Toulouse and Barcelona, or between Toulouse and Bordeaux, as measured by the French Railway Metric, will change!

Figure 7: The French Railway Metric is not Translation Invariant

5.2 Open sets, Closed sets, Compact sets

All the statements of this subsection assume the ambient space to be a metric space, so of course they also hold in normed vector spaces! In our quest for a formal definition of continuity, the concepts of open and closed sets are the next necessary stop. So let us proceed.
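Before moving to open and closed sets, it is worth returning to the French Railway Metric for a moment: the failure of translation invariance is easy to verify numerically (a sketch with numpy and hypothetical sample points).

```python
import numpy as np

def french_railway(x, y):
    """French Railway metric on R^2: direct trip if colinear, via the origin otherwise."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    colinear = np.isclose(x[0] * y[1] - x[1] * y[0], 0.0)  # 2x2 determinant
    if colinear:
        return np.linalg.norm(x - y)
    return np.linalg.norm(x) + np.linalg.norm(y)

x, y, z = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])
assert french_railway(x, y) == 2.0        # independent points: ||x|| + ||y||
assert french_railway(x, 3 * x) == 2.0    # colinear points: ||x - 3x||
# Translating both points by z strictly increases the distance here,
# so the metric is not translation invariant:
assert french_railway(x + z, y + z) > french_railway(x, y)
```

The translated points (2, 1) and (1, 2) are still independent, so the measured trip still detours through the (old) origin and is strictly longer.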

Definition: (ε-open Ball) Let $(X, d(\cdot,\cdot))$ be a metric space, $x_0$ be an element of X, and ε be a strictly positive real number. The ε-open ball $B_\varepsilon(x_0)$ centered at $x_0$ is the set of points whose distance from $x_0$ is strictly smaller than ε, that is:

$$B_\varepsilon(x_0) = \{x \in X \mid d(x, x_0) < \varepsilon\}$$

Definition: (ε-closed Ball) Let $(X, d(\cdot,\cdot))$ be a metric space, $x_0$ be an element of X, and ε be a strictly positive real number. The ε-closed ball $B_\varepsilon[x_0]$ centered at $x_0$ is the set of points whose distance from $x_0$ is smaller than or equal to ε, that is:

$$B_\varepsilon[x_0] = \{x \in X \mid d(x, x_0) \le \varepsilon\}$$

Example: You can see any open interval (a, b) with $a, b \in \mathbb{R}$ and $a < b$ as a $(b - a)/2$-open ball centered at $(a + b)/2$ in $\mathbb{R}$. Similarly, any closed interval [a, b] with $a, b \in \mathbb{R}$ and $a < b$ can be seen as a closed ball in $\mathbb{R}$. Note that neither can be seen as a closed or open ball if the universal space has a dimension higher than that of $\mathbb{R}$!

Coming back to geometrical intuition, wherein a set corresponds to a geographical area in the universal space, we can informally grasp the concepts of open and closed sets. Namely, an area may or may not have a boundary. Assume it has one. If that boundary is considered to belong to the area, then we call the area a closed area. If, on the contrary, the boundary is not considered to belong to the area, we call the area an open area. If part of the boundary belongs to the area while some other part does not, then the area is neither closed nor open! Finally, by convention, if no boundary exists, as for the empty set or the universal set, the area is said to be both closed and open. The next definitions use the concepts of open and closed balls to formalize these ideas.

Definition: (Interior Point, Interior) Let A be a subset of a metric space X. The point a in A is said to be an interior point of A if and only if there exists ε > 0 such that the ε-open ball centered at a lies entirely inside A.
The collection of all interior points of A is called the interior of A, denoted Int(A) or Å.

Definition: (Open Set) Let A be a subset of a metric space X. A is said to be an open set if and only if A = Int(A).

Hence, any set A contains its interior, but the converse is true if and only if A is open.
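The interval example above can be made concrete (a sketch with hypothetical endpoints): membership in the open ball is exactly the strict inequality $d(x, x_0) < \varepsilon$, and the endpoints of the interval are excluded.

```python
# The open interval (a, b) in R as the open ball of radius (b - a)/2
# centered at (a + b)/2, with the metric d(x, y) = |x - y|.
a, b = 2.0, 5.0                       # hypothetical endpoints
center, radius = (a + b) / 2, (b - a) / 2

def in_open_ball(x):
    return abs(x - center) < radius   # d(x, x0) < eps

assert in_open_ball(2.1) and in_open_ball(4.9)          # points of (2, 5)
assert not in_open_ball(2.0) and not in_open_ball(5.0)  # endpoints excluded
```

Replacing the strict inequality with `<=` would give the closed ball, i.e., the closed interval [a, b].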

Definition: (Closure Point, Closure) Let A be a subset of a metric space X. The point x in X is said to be a closure point of A if and only if, for every ε > 0, the ε-open ball centered at x contains at least one point a that belongs to A. The collection of all closure points of A is called the closure of A, denoted Ā.

Definition: (Closed Set) Let A be a subset of a metric space X. A is said to be a closed set if and only if A = Ā.

Hence, any set A is included in its closure, but the converse is true if and only if A is closed. We can now characterize the boundary as the set of elements such that, if they all belong to A, then A is closed, and, if none of them belongs to A, then A is open.

Definition: (Boundary Point, Boundary) Let A be a subset of a metric space X. The point x in X is said to be a boundary point of A if and only if, for every ε > 0, the ε-open ball centered at x contains at least one point a that belongs to A and at least one point $a^c$ that belongs to the complement of A, $A^C$. The collection of all boundary points of A is called the boundary of A, denoted $\partial A$.

We may now rephrase our concepts of open and closed sets as follows:

Let A be a subset of a metric space X. Then A is open if and only if none of the boundary points of A lies in A: $A \cap \partial A = \emptyset$.

Let A be a subset of a metric space X. Then A is closed if and only if all the boundary points of A lie in A: $A \cap \partial A = \partial A$.

Figure 8: Closed sets, open sets, boundary

Finally, here are three last results which will help you decide whether a given set is open or closed. Some intuition will be given for them via examples in the exercises.

Theorem: (Properties of Open Sets) Let $(X, d(\cdot,\cdot))$ be a metric space. Then:

(i) $\emptyset$ and X are open in X.
(ii) A set A is open if and only if its complement is closed.
(iii) The union of an arbitrary (possibly infinite) collection of open sets is open.
(iv) The intersection of a finite collection of open sets is open.

Theorem: (Properties of Closed Sets) Let $(X, d(\cdot,\cdot))$ be a metric space. Then:

(i) $\emptyset$ and X are closed in X.
(ii) A set A is closed if and only if its complement is open.
(iii) The union of a finite collection of closed sets is closed.
(iv) The intersection of an arbitrary (possibly infinite) collection of closed sets is closed.

Fact: (Interior and Closure of Convex Sets) Let A be a subset of a metric space X. If A is a convex set, then so are Ā and Int(A).

5.3 Continuity

We are finally ready to formally define continuity, one of the most fundamental requirements for the application of our optimization toolbox.

Definition: (Continuous Function) A function f mapping from a metric space $(X, d_X(\cdot,\cdot))$ to a metric space $(Y, d_Y(\cdot,\cdot))$ is continuous at $x_0 \in X$ if and only if, for every ε > 0, there is a δ > 0 such that if $d_X(x, x_0) < \delta$, then $d_Y(f(x), f(x_0)) < \varepsilon$. A function that is continuous at every point of its domain is said to be continuous.

According to this definition, a function is continuous at some point $x_0$ if, for any element contained in a δ-open ball around $x_0$, we can be sure that the images of both this element and $x_0$ are (i) well defined, and (ii) contained in an ε-open ball around $f(x_0)$, where ε can be chosen as small as we wish by just selecting a small enough δ. In fact, one can also generalize the useful characterization of continuity that we have seen in the preliminary chapter, i.e., the characterization in terms of converging sequences.
This is done by first generalizing the notion of convergence to the context of metric spaces, which is achieved rather easily by noting that, independently of our space's dimension, the distance function maps into the real line!

Definition: (Convergence) Let X be a metric space. The infinite sequence of vectors $\{x_n\}_{n \in \mathbb{N}}$ is said to converge to a vector x if the sequence $\{d(x_n, x)\}_{n \in \mathbb{N}}$ of real numbers converges to 0. That is:

$$\forall \varepsilon > 0,\ \exists N \in \mathbb{N} \text{ s.t. } \forall n > N,\ d(x_n, x) < \varepsilon$$

In this case, we write $x_n \to x$.

The fact that the limit of a converging sequence is unique in a metric space is easily shown using the non-negativity and triangle inequality properties of a metric. Suppose $x_n \to x$ and $x_n \to y$. Then, for all $n \in \mathbb{N}$:

$$0 \le d(x, y) \le d(x, x_n) + d(x_n, y)$$

Both terms on the right-hand side converge to 0, so the squeeze theorem (a.k.a. sandwich theorem¹⁰) yields $d(x, y) = 0$ and hence $x = y$.

Proposition: (Characterization of Continuity) A function f mapping from a metric space $(X, d_X(\cdot,\cdot))$ to a metric space $(Y, d_Y(\cdot,\cdot))$ is continuous at $x_0 \in X$ if and only if $x_n \to x_0$ implies $f(x_n) \to f(x_0)$.

¹⁰ If you have a sandwich inequality $a \le x \le b$ and a and b converge to the same limit, then x converges to that same limit too. Please note that I approve much more of the French name of that theorem: the cops theorem!
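The sequential characterization can be illustrated numerically (a sketch with one hypothetical continuous function and one hypothetical discontinuous one, both at $x_0 = 0$):

```python
f = lambda x: x ** 2 + 1.0              # continuous at 0
g = lambda x: 1.0 if x > 0 else 0.0     # step function, discontinuous at 0

# The sequence x_n = 1/n converges to 0 ...
x_n = [1.0 / n for n in range(1, 10_001)]

# ... and for the continuous f, the image sequence approaches f(0) = 1:
assert abs(f(x_n[-1]) - f(0.0)) < 1e-6
# ... while for the discontinuous g, the images stay a fixed distance
# away from g(0) = 0 along the whole sequence:
assert abs(g(x_n[-1]) - g(0.0)) == 1.0
```

This is precisely the proposition at work: continuity at $x_0$ forces $f(x_n) \to f(x_0)$ along every sequence $x_n \to x_0$, and a single sequence whose images do not converge to $g(x_0)$ certifies discontinuity.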

A Appendix 1: The Separating and Supporting Hyperplane Theorems

With the concepts of hyperplane, halfspace, and convex set are associated two extremely important results for convex optimization. Both of these results are geometrically very intuitive, and their proofs are way beyond the scope of this lecture. Therefore, I only illustrate them.

Theorem: (Separating Hyperplane Theorem V.1) Let C be a convex set in a metric space X, with non-empty interior, and let x be an element of X not in Int(C). Then, there is a hyperplane in X containing x but no interior point of C.

Figure 9: Separating Hyperplane Theorem

Definition: (Supporting Hyperplane) Let C be a subset of a metric space X and $x_0$ be a point in its boundary $\partial C$. If $a \ne 0$ is an element of X that satisfies $a \cdot x \le a \cdot x_0$ for all x in C, then the hyperplane $\{x \mid a \cdot x = a \cdot x_0\}$ is called a supporting hyperplane to C at $x_0$.

Figure 10: Supporting Hyperplane Theorem

Theorem: (Supporting Hyperplane Theorem) Let C be a convex subset of a metric space X. If Int(C) is non-empty and $x_0$ is a point in $\partial C$, then there exists a supporting hyperplane to C at $x_0$.

Remark: There is also a partial converse to the supporting hyperplane theorem. Namely, if a set is closed, has a non-empty interior, and has a supporting hyperplane at every point of its boundary, then it is convex.

Figure 11: A nonempty closed set with non-empty interior is convex if and only if it possesses a supporting hyperplane at every point of its boundary!

The supporting hyperplane theorem allows us to state a new version of the separating hyperplane theorem, which will actually prove most useful.

Theorem: (Separating Hyperplane Theorem V.2) Let C and D be two convex sets in a metric space X, and assume $C \cap D = \emptyset$. Then, there exist $a \ne 0$ in $\mathbb{R}^n$ and b in $\mathbb{R}$ such that $a \cdot x \le b$ for all x in C and $a \cdot x \ge b$ for all x in D. In other words, the affine function $a \cdot x - b$ is nonpositive on C and nonnegative on D. The hyperplane $\{x \in X \mid a \cdot x = b\}$ is called a separating hyperplane for the sets C and D.

Figure 12: Separating Hyperplane Theorem
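Theorem V.2 can be illustrated numerically (a sketch with numpy on hypothetical data; the separating a and b are chosen by hand here, not computed by any algorithm):

```python
import numpy as np

# C and D are disjoint convex sets: the unit discs centered at (-2, 0)
# and (+2, 0) in R^2.  The hyperplane with a = (1, 0), b = 0 (the
# vertical axis) separates them.
rng = np.random.default_rng(2)
a, b = np.array([1.0, 0.0]), 0.0

def sample_disc(center, n=500):
    """Draw n points of the unit disc around `center` (by radial clipping)."""
    pts = rng.normal(size=(n, 2))
    pts /= np.maximum(np.linalg.norm(pts, axis=1, keepdims=True), 1.0)
    return center + pts

C = sample_disc(np.array([-2.0, 0.0]))
D = sample_disc(np.array([+2.0, 0.0]))
assert np.all(C @ a <= b)   # a . x <= b for every sampled x in C
assert np.all(D @ a >= b)   # a . x >= b for every sampled x in D
```

Every sampled point of C lands in the nonpositive halfspace and every sampled point of D in the nonnegative one, exactly as the theorem asserts.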

Remark: Please note that the converse of this theorem is not true unless some further requirements are added! That is, the existence of a separating hyperplane between two convex sets C and D does not imply that C and D do not intersect. (Consider, for instance, the degenerate case C = D = {0}.)
