FALL 2018 MATH 4211/6211 Optimization Homework 1

This homework assignment is open to textbooks, reference books, slides, and online resources, excluding any direct solution to the problems (such as a solution manual). Copying others' solutions or programs is strictly prohibited and will result in a grade of 0 for all involved students. Please type your answers in LaTeX and submit a single PDF file on iCollege before the due time. Please do not include your name anywhere in the submitted PDF file. Instead, name your PDF file as hw1 123456789.pdf (replace 123456789 by your own Panther ID number). It is recommended to use notation that is consistent with the lectures. By default all vectors are treated as column vectors.
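Problem 1. (1 point) Let $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$. The $p$-norm ($p \ge 1$) of $x$ is defined by
\[
\|x\|_p = \Big( \sum_{i=1}^n |x_i|^p \Big)^{1/p}.
\]
The standard Euclidean norm is $\|x\|_2$ (often denoted by $\|x\|$ without the subscript 2). Prove the following statements.

(a) $\|x\|^2 = x^T x$ for all $x \in \mathbb{R}^n$;

(b) $\|x\| \le \|x\|_1$ for all $x \in \mathbb{R}^n$. For MATH 6211, also prove $\|x\|_1 \le \sqrt{n}\,\|x\|$ for all $x \in \mathbb{R}^n$.

Proof of (a). By the definition of the standard Euclidean norm,
\[
\|x\|^2 = \|x\|_2^2 = \sum_{i=1}^n |x_i|^2.
\]
On the other hand, the inner product of $x$ with itself is
\[
x^T x = (x_1)^2 + (x_2)^2 + \dots + (x_n)^2 = \sum_{i=1}^n |x_i|^2.
\]
Hence $\|x\|^2 = x^T x$.

Proof of (b). Since $\|x\|_1 = \sum_{i=1}^n |x_i|$, we have
\[
\|x\|_1^2 = \Big( \sum_{i=1}^n |x_i| \Big)^2 = \sum_{i=1}^n |x_i|^2 + \sum_{i \ne j} |x_i|\,|x_j| \ge \sum_{i=1}^n |x_i|^2 = \|x\|^2.
\]
Therefore $\|x\|_1 \ge \|x\|$.

Moreover, $2|x_i|\,|x_j| \le |x_i|^2 + |x_j|^2$ for any $x_i, x_j \in \mathbb{R}$, so
\[
\sum_{i \ne j} |x_i|\,|x_j| = 2 \sum_{i<j} |x_i|\,|x_j| \le \sum_{i<j} \big( |x_i|^2 + |x_j|^2 \big) = (n-1) \sum_{i=1}^n |x_i|^2,
\]
where the last equality holds because each $|x_i|^2$ appears exactly $n-1$ times in the sum over pairs $i<j$. Therefore
\[
\|x\|_1^2 = \sum_{i=1}^n |x_i|^2 + \sum_{i \ne j} |x_i|\,|x_j| \le n \sum_{i=1}^n |x_i|^2 = n \|x\|^2,
\]
from which it follows that $\|x\|_1 \le \sqrt{n}\,\|x\|$.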
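The chain $\|x\| \le \|x\|_1 \le \sqrt{n}\,\|x\|$ from Problem 1 can be spot-checked numerically. The following is a minimal sketch, not part of the original solution: the function names, the sampling range, and the tolerance `tol` are all illustrative assumptions, and a passing check on random samples is only a sanity test, not a proof.

```python
import math
import random


def norm1(x):
    """1-norm: sum of absolute values of the entries."""
    return sum(abs(xi) for xi in x)


def norm2(x):
    """Euclidean norm: square root of the sum of squared entries."""
    return math.sqrt(sum(xi * xi for xi in x))


def check_norm_inequalities(n, trials=1000, seed=0, tol=1e-9):
    """Return True if ||x|| <= ||x||_1 <= sqrt(n) * ||x|| held on every sample.

    The samples are random vectors in [-10, 10]^n (an arbitrary choice);
    tol absorbs floating-point rounding.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        e, o = norm2(x), norm1(x)
        if not (e <= o + tol and o <= math.sqrt(n) * e + tol):
            return False
    return True


print(check_norm_inequalities(5))
```

The upper bound is attained when all entries have equal magnitude, for example $x = (1, 1, 1, 1)^T$, where $\|x\|_1 = 4 = \sqrt{4}\cdot 2$.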
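Problem 2. (1 point) Let $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$. Recall the following definitions.

The gradient of $f$ at $x$ is defined by
\[
\nabla f(x) = \Big( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \Big)^T \in \mathbb{R}^n.
\]
The Hessian of $f$ at $x$ is
\[
\nabla^2 f(x) =
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\in \mathbb{R}^{n \times n}.
\]
The Taylor expansion of $f(y)$ at a given point $x$ up to the second order term is
\[
f(y) = f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) + o(\|y - x\|^2).
\]
Find the gradient and the Hessian of the function $f$ defined below at the point $x = (0, 1)^T \in \mathbb{R}^2$:
\[
f(x) = (x_1 - x_2)^4 + x_1^2 - x_2^2 - 2x_1 + 2x_2 + 1.
\]
In addition, find the Taylor expansion of $f$ at $x = (0, 1)^T$ up to the second order term.

Solution. We first compute the partial derivatives:
\[
\frac{\partial f}{\partial x_1}(x) = 4(x_1 - x_2)^3 + 2x_1 - 2, \qquad
\frac{\partial f}{\partial x_2}(x) = -4(x_1 - x_2)^3 - 2x_2 + 2.
\]
Therefore the gradient is
\[
\nabla f(x) = \Big( \frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x) \Big)^T =
\begin{bmatrix}
4(x_1 - x_2)^3 + 2x_1 - 2 \\
-4(x_1 - x_2)^3 - 2x_2 + 2
\end{bmatrix}.
\]
Plugging in $x = (x_1, x_2)^T = (0, 1)^T$, we obtain $\nabla f((0, 1)^T) = (-6, 4)^T$.

We compute the second-order partial derivatives $\partial^2 f / \partial x_i \partial x_j$ to obtain the Hessian matrix
\[
\nabla^2 f(x) =
\begin{bmatrix}
12(x_1 - x_2)^2 + 2 & -12(x_1 - x_2)^2 \\
-12(x_1 - x_2)^2 & 12(x_1 - x_2)^2 - 2
\end{bmatrix}.
\]
Plugging in $x = (0, 1)^T$, we obtain the Hessian of $f$ at $x = (0, 1)^T$ as
\[
\nabla^2 f((0, 1)^T) =
\begin{bmatrix}
14 & -12 \\
-12 & 10
\end{bmatrix}.
\]
Note that the value of $f$ at $x = (0, 1)^T$ is $f((0, 1)^T) = 3$. The Taylor expansion of $f(y)$ at $x$ is then
\begin{align*}
f(y) &= f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) + o(\|y - x\|^2) \\
&= 3 + (-6, 4) \begin{pmatrix} y_1 \\ y_2 - 1 \end{pmatrix}
+ \tfrac{1}{2} (y_1, y_2 - 1) \begin{bmatrix} 14 & -12 \\ -12 & 10 \end{bmatrix} \begin{pmatrix} y_1 \\ y_2 - 1 \end{pmatrix}
+ o(|y_1|^2 + |y_2 - 1|^2) \\
&= 7y_1^2 - 12 y_1 y_2 + 5 y_2^2 + 6 y_1 - 6 y_2 + 4 + o(|y_1|^2 + |y_2 - 1|^2).
\end{align*}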
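The values $\nabla f((0,1)^T) = (-6, 4)^T$, the Hessian with rows $(14, -12)$ and $(-12, 10)$, and the quadratic Taylor polynomial of Problem 2 can be cross-checked with finite differences. This is a rough sketch under stated assumptions: the helper names (`grad_fd`, `hess_fd`, `taylor2`) and the step sizes `h` are illustrative choices, and finite differences only approximate the derivatives.

```python
def f(x1, x2):
    """The function from Problem 2."""
    return (x1 - x2) ** 4 + x1 ** 2 - x2 ** 2 - 2 * x1 + 2 * x2 + 1


def grad_fd(x1, x2, h=1e-5):
    """Central-difference approximation of the gradient of f."""
    g1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    g2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return [g1, g2]


def hess_fd(x1, x2, h=1e-4):
    """Central-difference approximation of the (symmetric) Hessian of f."""
    f0 = f(x1, x2)
    h11 = (f(x1 + h, x2) - 2 * f0 + f(x1 - h, x2)) / h ** 2
    h22 = (f(x1, x2 + h) - 2 * f0 + f(x1, x2 - h)) / h ** 2
    h12 = (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
           - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h ** 2)
    return [[h11, h12], [h12, h22]]


def taylor2(y1, y2):
    """Second-order Taylor polynomial of f at (0, 1), without the o(.) term."""
    return 7 * y1 ** 2 - 12 * y1 * y2 + 5 * y2 ** 2 + 6 * y1 - 6 * y2 + 4


print(grad_fd(0.0, 1.0))   # should be close to [-6, 4]
print(hess_fd(0.0, 1.0))   # should be close to [[14, -12], [-12, 10]]
print(f(0.01, 0.99) - taylor2(0.01, 0.99))  # small near (0, 1)
```

Because $f$ is a quartic polynomial, the gap between `f` and `taylor2` shrinks like the cube of the distance to $(0, 1)^T$, which is consistent with the $o(\|y - x\|^2)$ remainder.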
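Problem 3. (1 point) Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ be given. Define the function $f : \mathbb{R}^n \to \mathbb{R}$ by $f(x) = \|Ax - b\|^2$. Find expressions for the quantities below using $A$, $x$, and $b$.

(a) $\nabla f(x)$;

(b) $\nabla^2 f(x)$.

Solution. Denote the matrix $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ (i.e., an $m$-by-$n$ matrix with $a_{ij}$ as the $(i, j)$th entry) and $b = (b_1, \dots, b_m)^T \in \mathbb{R}^m$. Also denote $y := Ax - b \in \mathbb{R}^m$, where $y_i = \sum_{l=1}^n a_{il} x_l - b_i$ for $i = 1, \dots, m$. Note that
\[
f(x) = \|Ax - b\|^2 = \|y\|^2 = \sum_{i=1}^m \Big( \sum_{l=1}^n a_{il} x_l - b_i \Big)^2.
\]
Hence we have
\[
\frac{\partial f}{\partial x_j} = \sum_{i=1}^m 2 a_{ij} \Big( \sum_{l=1}^n a_{il} x_l - b_i \Big) = \sum_{i=1}^m 2 a_{ij} y_i,
\]
which is the inner product of $2(a_{1j}, \dots, a_{mj})^T$ (2 times the $j$th column of $A$) and $y$. By stacking the partial derivatives $\partial f / \partial x_j$ for $j = 1, \dots, n$, we obtain the gradient as
\[
\nabla f(x) = 2A^T y = 2A^T (Ax - b).
\]
We compute the second-order partial derivatives to obtain the $(k, j)$th entry of the Hessian as
\[
\frac{\partial^2 f}{\partial x_k \partial x_j} = \sum_{i=1}^m 2 a_{ij} a_{ik} \quad \text{for } k, j = 1, \dots, n.
\]
Therefore the Hessian matrix is
\[
\nabla^2 f(x) = \Big[ \frac{\partial^2 f}{\partial x_k \partial x_j} \Big] = 2 \Big[ \sum_{i=1}^m a_{ij} a_{ik} \Big] = 2A^T A.
\]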
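The formulas $\nabla f(x) = 2A^T(Ax - b)$ and $\nabla^2 f(x) = 2A^TA$ from Problem 3 can be illustrated on a small instance. The sketch below uses plain Python lists; the matrix `A`, the vectors `b` and `x`, and every function name are made up for illustration and do not come from the assignment.

```python
def residual(A, b, x):
    """Compute y = Ax - b entry by entry."""
    return [sum(a * xl for a, xl in zip(row, x)) - bi for row, bi in zip(A, b)]


def f_ls(A, b, x):
    """f(x) = ||Ax - b||^2."""
    return sum(yi * yi for yi in residual(A, b, x))


def grad_ls(A, b, x):
    """Closed-form gradient 2 A^T (Ax - b)."""
    y = residual(A, b, x)
    n = len(x)
    return [2 * sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]


def hess_ls(A):
    """Closed-form Hessian 2 A^T A (independent of x and b)."""
    m, n = len(A), len(A[0])
    return [[2 * sum(A[i][j] * A[i][k] for i in range(m)) for k in range(n)]
            for j in range(n)]


def grad_fd(A, b, x, h=1e-6):
    """Central-difference gradient, used only as an independent check."""
    g = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        g.append((f_ls(A, b, xp) - f_ls(A, b, xm)) / (2 * h))
    return g


# Hypothetical example data.
A = [[1, 2], [3, 4], [5, 6]]
b = [1, 0, -1]
x = [0.5, -0.25]

print(grad_ls(A, b, x))   # closed form
print(grad_fd(A, b, x))   # finite-difference approximation
print(hess_ls(A))
```

For this example data the residual is $y = (-1, 0.5, 2)^T$, so the closed-form gradient is $2A^Ty = (21, 24)^T$.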
Problem 4. (1 point) Show that for any matrix $A \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$, the set $\{x \in \mathbb{R}^n : Ax = b\}$ is convex.

Proof. Denote $C = \{x \in \mathbb{R}^n : Ax = b\}$. For any $x, y \in C$, we have $Ax = Ay = b$. For any $\theta \in [0, 1]$, we hence have
\[
A(\theta x + (1 - \theta) y) = \theta Ax + (1 - \theta) Ay = \theta b + (1 - \theta) b = b,
\]
which means $\theta x + (1 - \theta) y \in C$. By the definition of convex sets, $C$ is convex.
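As a concrete instance of the argument in Problem 4, one can take two solutions of a small linear system and confirm that every convex combination is again a solution. The system and the two points below are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def convex_combination(theta, x, y):
    """Return theta * x + (1 - theta) * y for vectors given as lists."""
    return [theta * xi + (1.0 - theta) * yi for xi, yi in zip(x, y)]


def satisfies(A, b, x, tol=1e-12):
    """Check Ax = b row by row, up to a small floating-point tolerance."""
    return all(abs(sum(a * xj for a, xj in zip(row, x)) - bi) <= tol
               for row, bi in zip(A, b))


# Hypothetical system: x1 + x2 + x3 = 3 and x1 - x2 = 0.
A = [[1, 1, 1], [1, -1, 0]]
b = [3, 0]
x = [1, 1, 1]   # one solution
y = [0, 0, 3]   # another solution

thetas = [i / 10 for i in range(11)]
print(all(satisfies(A, b, convex_combination(t, x, y)) for t in thetas))
```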
Problem 5. (1 point) Show that the set $\{x \in \mathbb{R}^n : \|x\| \le r\}$ is convex, where $r > 0$ is a given real number.

Proof. Denote $C = \{x \in \mathbb{R}^n : \|x\| \le r\}$. For any $x, y \in C$, we have $\|x\|, \|y\| \le r$. For any $\theta \in [0, 1]$, we hence have
\[
\|\theta x + (1 - \theta) y\| \le \|\theta x\| + \|(1 - \theta) y\| = \theta \|x\| + (1 - \theta) \|y\| \le \theta r + (1 - \theta) r = r,
\]
where we used the triangle inequality of norms to obtain the first inequality. The result above implies $\theta x + (1 - \theta) y \in C$, and hence $C$ is convex.
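The chain of inequalities in Problem 5 can also be observed on random samples. The sketch below is only a sanity check under assumed parameters (dimension `n`, radius `r`, number of trials, tolerance `tol`); the sampling scheme is an arbitrary way to produce points with norm at most `r`, not a uniform sampler.

```python
import math
import random


def norm2(x):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(xi * xi for xi in x))


def random_point_in_ball(rng, n, r):
    """Return a point with norm at most r by rescaling a random direction."""
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    length = norm2(v)
    if length == 0.0:
        return v  # the origin lies in the ball
    scale = r * rng.random() / length
    return [scale * vi for vi in v]


def check_ball_convexity(n=4, r=2.0, trials=1000, seed=0, tol=1e-9):
    """Check ||theta x + (1-theta) y|| <= theta||x|| + (1-theta)||y|| <= r."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = random_point_in_ball(rng, n, r)
        y = random_point_in_ball(rng, n, r)
        theta = rng.random()
        z = [theta * xi + (1.0 - theta) * yi for xi, yi in zip(x, y)]
        bound = theta * norm2(x) + (1.0 - theta) * norm2(y)
        if not (norm2(z) <= bound + tol and bound <= r + tol):
            return False
    return True


print(check_ball_convexity())
```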
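Problem 6. (1 point) Let $C \subseteq \mathbb{R}^n$ be a convex set, and $f : C \to \mathbb{R}$ be a convex function. Prove that the following statements hold for any $k \ge 2$, $x_1, \dots, x_k \in C$, $\theta_1, \dots, \theta_k \ge 0$, and $\theta_1 + \theta_2 + \dots + \theta_k = 1$:

(a) $\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k \in C$;

(b) $f(\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k) \le \theta_1 f(x_1) + \theta_2 f(x_2) + \dots + \theta_k f(x_k)$.

Hint: use induction on $k$.

Proof of (a). If $k = 2$, the statement holds since $C$ is a convex set. Assume the statement holds for $k$ (induction hypothesis). Then we consider
\[
z := \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k + \theta_{k+1} x_{k+1},
\]
where $\sum_{i=1}^{k+1} \theta_i = 1$ and $\theta_i \ge 0$ for $i = 1, \dots, k+1$. If $\theta_{k+1} = 0$, then $z = \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k \in C$ due to the induction hypothesis. If $\theta_{k+1} = 1$, then $\theta_1 = \dots = \theta_k = 0$ and $z = x_{k+1} \in C$. If $\theta_{k+1} \in (0, 1)$, then we have
\[
z = \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k + \theta_{k+1} x_{k+1}
= (1 - \theta_{k+1}) \Big( \frac{\theta_1}{1 - \theta_{k+1}} x_1 + \dots + \frac{\theta_k}{1 - \theta_{k+1}} x_k \Big) + \theta_{k+1} x_{k+1}.
\]
Note that $\frac{\theta_i}{1 - \theta_{k+1}} \ge 0$ for all $i = 1, \dots, k$ and
\[
\frac{\theta_1}{1 - \theta_{k+1}} + \dots + \frac{\theta_k}{1 - \theta_{k+1}} = \frac{\theta_1 + \dots + \theta_k}{1 - \theta_{k+1}} = \frac{1 - \theta_{k+1}}{1 - \theta_{k+1}} = 1,
\]
so $\frac{\theta_1}{1 - \theta_{k+1}} x_1 + \dots + \frac{\theta_k}{1 - \theta_{k+1}} x_k \in C$ due to the induction hypothesis. Then it follows that $z \in C$, since it is a convex combination of $\frac{\theta_1}{1 - \theta_{k+1}} x_1 + \dots + \frac{\theta_k}{1 - \theta_{k+1}} x_k$ and $x_{k+1}$. Therefore the statement holds for $k + 1$. By induction the statement holds for all $k \ge 2$.

Proof of (b). If $k = 2$, the statement holds since $f$ is a convex function. Assume the statement holds for $k$ (induction hypothesis). Then we again consider $z := \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k + \theta_{k+1} x_{k+1}$, where $\sum_{i=1}^{k+1} \theta_i = 1$ and $\theta_i \ge 0$ for $i = 1, \dots, k+1$. If $\theta_{k+1} = 0$, then the statement holds due to the induction hypothesis. If $\theta_{k+1} = 1$, then $z = x_{k+1}$ and both sides equal $f(x_{k+1})$. If $\theta_{k+1} \in (0, 1)$, then we have
\begin{align*}
f(z) &= f(\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k + \theta_{k+1} x_{k+1}) \\
&= f\Big( (1 - \theta_{k+1}) \Big( \frac{\theta_1}{1 - \theta_{k+1}} x_1 + \dots + \frac{\theta_k}{1 - \theta_{k+1}} x_k \Big) + \theta_{k+1} x_{k+1} \Big) \\
&\le (1 - \theta_{k+1}) f\Big( \frac{\theta_1}{1 - \theta_{k+1}} x_1 + \dots + \frac{\theta_k}{1 - \theta_{k+1}} x_k \Big) + \theta_{k+1} f(x_{k+1}) \\
&\le (1 - \theta_{k+1}) \sum_{i=1}^{k} \frac{\theta_i}{1 - \theta_{k+1}} f(x_i) + \theta_{k+1} f(x_{k+1}) \\
&= \sum_{i=1}^{k+1} \theta_i f(x_i),
\end{align*}
where the first inequality is due to the convexity of $f$ (the inner combination lies in $C$ by part (a)), and the second inequality is due to the induction hypothesis. This means the statement holds for $k + 1$ as well. Therefore by induction the statement holds for all $k \ge 2$.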
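The induction step in Problem 6 peels off the last point and renormalizes the remaining weights by $1 - \theta_{k+1}$. The sketch below mirrors that splitting as a recursion on scalars and then checks the resulting inequality for one convex function. Everything here is an illustrative assumption: the function name `combine`, the choice $f(x) = x^2$ on $C = \mathbb{R}$, and the sample weights and points.

```python
def combine(thetas, xs):
    """Evaluate sum_i thetas[i] * xs[i] by the splitting used in the induction.

    The last point is separated out and the remaining weights are divided
    by (1 - theta_last), so each step is a two-point convex combination.
    Assumes thetas are nonnegative and sum to 1.
    """
    if len(xs) == 1:
        return xs[0]
    last = thetas[-1]
    if last == 1.0:
        # All other weights are zero; dividing by 1 - last would fail.
        return xs[-1]
    rest = [t / (1.0 - last) for t in thetas[:-1]]
    inner = combine(rest, xs[:-1])
    return (1.0 - last) * inner + last * xs[-1]


def f(x):
    """A convex function on the real line, chosen for illustration."""
    return x * x


# Hypothetical weights and points.
thetas = [0.2, 0.3, 0.5]
xs = [1.0, 2.0, 4.0]

z = combine(thetas, xs)
direct = sum(t * x for t, x in zip(thetas, xs))
lhs = f(z)
rhs = sum(t * f(x) for t, x in zip(thetas, xs))
print(z, direct)   # the recursion agrees with the direct sum up to rounding
print(lhs <= rhs)  # the inequality of part (b) on this example
```

With these sample values the combination is $z = 2.8$, so $f(z) = 7.84$ while the weighted average of the function values is $9.4$.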