Lecture: Quadratic optimization

1. Positive definite and semidefinite matrices
2. $LDL^T$ factorization
3. Quadratic optimization without constraints
4. Quadratic optimization with constraints
5. Least-squares problems
Quadratic optimization without constraints

minimize $\frac{1}{2}x^T H x + c^T x + c_0$

where $H = H^T \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$, $c_0 \in \mathbb{R}$. Common in applications, e.g.,
- linear regression (fitting of models),
- minimization of physically motivated objective functions, such as energy or variance,
- quadratic approximation of nonlinear optimization problems.
The quadratic term

Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$ where $H = H^T \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$, $c_0 \in \mathbb{R}$.

We can assume that the matrix $H$ is symmetric. If $H$ is not symmetric, it holds that

$x^T H x = \frac{1}{2}x^T H x + \frac{1}{2}(x^T H x)^T = \frac{1}{2}x^T (H + H^T) x$,

i.e., $x^T H x = x^T \bar{H} x$, where $\bar{H} = \frac{1}{2}(H + H^T)$ is a symmetric matrix.
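To make the symmetrization concrete, here is a small numerical check (a sketch, not from the book; the matrix and vector are randomly generated for illustration): the quadratic form takes the same value for $H$ and for its symmetric part $\bar{H}$.

```python
import numpy as np

# Check that x^T H x = x^T Hbar x where Hbar = (H + H^T)/2.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))      # a non-symmetric matrix
H_bar = 0.5 * (H + H.T)              # its symmetric part

x = rng.standard_normal(4)
print(x @ H @ x)                     # quadratic form with H
print(x @ H_bar @ x)                 # same value with the symmetric part
```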
Positive definite and positive semidefinite matrices

Definition 1 (Positive semidefinite). A symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is called positive semidefinite if $x^T H x \ge 0$ for all $x \in \mathbb{R}^n$.

Definition 2 (Positive definite). A symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is called positive definite if $x^T H x > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$.
How do you determine whether a symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is positive (semi)definite?

- $LDL^T$ factorization (suitable for calculations by hand); see Chapter 27.6 in the book.
- Use the spectral factorization of the matrix, $H = Q \Lambda Q^T$, where $Q^T Q = I$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_k$ are the eigenvalues of the matrix. From this it follows that
  - $H$ is positive semidefinite if and only if $\lambda_k \ge 0$, $k = 1, \ldots, n$;
  - $H$ is positive definite if and only if $\lambda_k > 0$, $k = 1, \ldots, n$.

The eigenvalue test is easy to carry out numerically, as sketched below.
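A minimal sketch of the spectral test (my own helper, not from the book; the tolerance guards against rounding errors):

```python
import numpy as np

def definiteness(H, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(H)              # eigenvalues, ascending order
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "not positive semidefinite"

print(definiteness(np.array([[2., 0.], [0., 1.]])))   # positive definite
print(definiteness(np.array([[1., 1.], [1., 1.]])))   # positive semidefinite
```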
$LDL^T$ factorization

Theorem 1. A symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is positive semidefinite if and only if there is a factorization $H = LDL^T$, where

$L = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & 1 & 0 & \cdots & 0 \\ l_{31} & l_{32} & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \cdots & 1 \end{pmatrix}, \quad D = \begin{pmatrix} d_1 & 0 & 0 & \cdots & 0 \\ 0 & d_2 & 0 & \cdots & 0 \\ 0 & 0 & d_3 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & d_n \end{pmatrix}$

and $d_k \ge 0$, $k = 1, \ldots, n$. $H$ is positive definite if and only if $d_k > 0$, $k = 1, \ldots, n$, in the $LDL^T$ factorization.

Proof: See Chapter 27.9 in the book.
The factorization $LDL^T$ is determined by one of the following methods:
- elementary row and column operations,
- completion of squares.

A minimal numerical version of the row/column-operation method is sketched below.
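The elimination method translates directly into a short program. A sketch (my own, assuming no pivoting is needed, i.e., all pivots $d_k$ are nonzero, which holds for positive definite $H$):

```python
import numpy as np

def ldl_factor(H):
    """LDL^T by symmetric elimination, assuming nonzero pivots (no pivoting)."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for k in range(n):
        d[k] = H[k, k]                         # pivot d_k
        L[k+1:, k] = H[k+1:, k] / d[k]         # multipliers l_ik
        # symmetric elimination of row and column k:
        H[k+1:, k+1:] -= d[k] * np.outer(L[k+1:, k], L[k+1:, k])
    return L, d
```

For indefinite or singular matrices, scipy.linalg.ldl (which pivots) is the robust alternative.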
$LDL^T$ factorization - Example

Determine the $LDL^T$ factorization of

$H = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 4 & 9 & 17 & 22 \\ 6 & 17 & 44 & 61 \\ 8 & 22 & 61 & 118 \end{pmatrix}$

The idea is to perform symmetric row and column operations from the left and right in order to form the diagonal matrix $D$. The row operations can be performed with lower triangular invertible matrices; the product of lower triangular matrices is lower triangular, and so is the inverse of a lower triangular matrix.
$LDL^T$ factorization - Example

Perform row operations to get zeros in the first column:

$l_1 H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ -4 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 4 & 6 & 8 \\ 4 & 9 & 17 & 22 \\ 6 & 17 & 44 & 61 \\ 8 & 22 & 61 & 118 \end{pmatrix} = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix}$

and the corresponding column operations:

$l_1 H l_1^T = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix} \begin{pmatrix} 1 & -2 & -3 & -4 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix}$
$LDL^T$ factorization - Example

Perform row operations to get zeros in the second column:

$l_2 l_1 H l_1^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -5 & 1 & 0 \\ 0 & -6 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix}$

and the corresponding column operations:

$l_2 l_1 H l_1^T l_2^T = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -5 & -6 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix}$
$LDL^T$ factorization - Example

Perform row operations to get zeros in the third column:

$l_3 l_2 l_1 H l_1^T l_2^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -7 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

and the corresponding column operations:

$l_3 l_2 l_1 H l_1^T l_2^T l_3^T = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = D$
$LDL^T$ factorization - Example

We finally obtained the diagonal matrix

$D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Since all elements on the diagonal of $D$ are positive, the matrix $H$ is positive definite. The values on the diagonal are not the eigenvalues of $H$, but they have the same signs as the eigenvalues, and that is all we need to know.
$LDL^T$ factorization - Example

How do we determine the $L$ matrix? We have $l_3 l_2 l_1 H l_1^T l_2^T l_3^T = D$, i.e., $H = LDL^T$ if

$L = (l_3 l_2 l_1)^{-1} = l_1^{-1} l_2^{-1} l_3^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 0 & 1 & 0 \\ 4 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 5 & 1 & 0 \\ 0 & 6 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 7 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 5 & 1 & 0 \\ 4 & 6 & 7 & 1 \end{pmatrix}$

Note that each inverse $l_k^{-1}$ is obtained simply by flipping the signs of the off-diagonal entries of $l_k$.
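A quick machine check of the worked example (a sketch; the matrices are exactly those computed above):

```python
import numpy as np

H = np.array([[2, 4, 6, 8], [4, 9, 17, 22], [6, 17, 44, 61], [8, 22, 61, 118]])
L = np.array([[1, 0, 0, 0], [2, 1, 0, 0], [3, 5, 1, 0], [4, 6, 7, 1]])
D = np.diag([2, 1, 1, 1])
print(np.allclose(L @ D @ L.T, H))     # True: the factorization is correct
print(np.linalg.eigvalsh(H))           # all positive, but not equal to diag(D)
```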
Unbounded problems

Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$. Assume that $H$ is not positive semidefinite. Then there is a $d \in \mathbb{R}^n$ such that $d^T H d < 0$. It follows that

$f(td) = \frac{1}{2}t^2 d^T H d + t c^T d + c_0 \to -\infty$ when $t \to \infty$.

Conclusion: a necessary condition for $f(x)$ to have a minimum is that $H$ is positive semidefinite.
Assume that $H$ is positive semidefinite. Then we can use the factorization $H = LDL^T$, which gives

minimize $\frac{1}{2}x^T H x + c^T x + c_0$
= minimize $\frac{1}{2}x^T L D L^T x + c^T (L^T)^{-1} L^T x + c_0$
= minimize $\sum_{k=1}^{n} \left( \frac{1}{2} d_k y_k^2 + \bar{c}_k y_k \right) + c_0$
= $c_0 + \sum_{k=1}^{n}\ \underset{y_k}{\mathrm{minimize}}\ \left( \frac{1}{2} d_k y_k^2 + \bar{c}_k y_k \right)$

where we have introduced the new coordinates $y = L^T x$ and $\bar{c} = L^{-1} c$. The optimization problem separates into $n$ independent scalar quadratic optimization problems, which are easy to solve.
Scalar quadratic optimization without constraints

The scalar quadratic optimization problem

minimize over $x \in \mathbb{R}$: $\frac{1}{2}hx^2 + cx + c_0$

has a finite solution if and only if $h = 0$ and $c = 0$, or $h > 0$. In the first case there is no unique optimum. In the second case

$\frac{1}{2}hx^2 + cx + c_0 = \frac{1}{2}h(x + c/h)^2 + c_0 - c^2/(2h)$,

and the optimum is attained at $x = -c/h$.
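The case split is easy to mechanize. A minimal sketch (my own helper, not from the book):

```python
def scalar_qp_min(h, c, c0):
    """Minimize (1/2)h x^2 + c x + c0 over the real line."""
    if h > 0:
        return -c / h, c0 - c**2 / (2 * h)     # unique minimizer and value
    if h == 0 and c == 0:
        return 0.0, c0                         # every x is optimal; pick x = 0
    return None, float("-inf")                 # unbounded from below

print(scalar_qp_min(2.0, -4.0, 1.0))           # (2.0, -3.0)
```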
Solving the minimization problems separately shows that $\hat{y} = (\hat{y}_1\ \ldots\ \hat{y}_n)^T$ is a minimizer if

(i) $d_k \ge 0$, $k = 1, \ldots, n$, i.e., $H$ is positive semidefinite;
(ii) $d_k \hat{y}_k = -\bar{c}_k$, i.e., $D L^T \hat{x} = -L^{-1} c$.

Condition (ii) can be replaced with $L D L^T \hat{x} = -c$, i.e., $H\hat{x} = -c$.

Furthermore, $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$ is unbounded from below if

(i) some $d_k < 0$, i.e., $H$ is not positive semidefinite; or
(ii) one of the equations $d_k \hat{y}_k = -\bar{c}_k$ does not have a solution (i.e., $d_k = 0$ and $\bar{c}_k \neq 0$), i.e., the equation $H\hat{x} = -c$ does not have a solution.
The above arguments can be formalized as the following theorem.

Theorem 2. Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$. Then $\hat{x}$ is a minimizer if

(i) $H$ is positive semidefinite,
(ii) $H\hat{x} = -c$.

If $H$ is positive definite, then $\hat{x} = -H^{-1}c$. If $H$ is not positive semidefinite, or $Hx = -c$ does not have a solution, then $f$ is not bounded from below.

Comment 1. The condition that $Hx = -c$ has a solution is equivalent to $c \in \mathcal{R}(H)$. Therefore, $f$ has a minimizer if and only if $H$ is positive semidefinite and $c \in \mathcal{R}(H)$.
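Theorem 2 in code form, as a sketch (my own helper; it uses the eigenvalue test for semidefiniteness and lstsq to attempt $Hx = -c$, checking afterwards that the system is consistent):

```python
import numpy as np

def solve_unconstrained_qp(H, c, tol=1e-10):
    """Minimize (1/2)x^T H x + c^T x, or report unboundedness."""
    c = np.asarray(c, dtype=float)
    if np.linalg.eigvalsh(H).min() < -tol:
        raise ValueError("H is not positive semidefinite: f unbounded below")
    x = np.linalg.lstsq(H, -c, rcond=None)[0]  # tries to solve H x = -c
    if not np.allclose(H @ x, -c):
        raise ValueError("-c is not in R(H): f unbounded below")
    return x

print(solve_unconstrained_qp(np.array([[2., 0.], [0., 1.]]), [1., 1.]))
# [-0.5 -1. ]
```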
Quadratic optimization with linear constraints

minimize $\frac{1}{2}x^T H x + c^T x + c_0$
s.t. $Ax = b$    (1)

where $H = H^T \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $c_0 \in \mathbb{R}$. We assume that $b \in \mathcal{R}(A)$ and $n > m$, i.e., $\mathcal{N}(A) \neq \{0\}$.

Assume that $\mathcal{N}(A) = \mathrm{span}\{z_1, \ldots, z_k\}$ and define the nullspace matrix $Z = [z_1\ \cdots\ z_k]$. If $A\bar{x} = b$, then an arbitrary solution of the linear constraint has the form $x = \bar{x} + Zv$, for some $v \in \mathbb{R}^k$. This follows since

$A(\bar{x} + Zv) = A\bar{x} + AZv = b + 0 = b$.
With x = x+zv inserted in f(x) = 1 2 xt Hx+c T x+c 0 we get f(x) = 1 2 vt Z T HZv+ ( Z T (H x+c) ) T v+f( x). The minimization problem (1) is thus equivalent with minimize 1 2 vt Z T HZv+ ( Z T (H x+c) ) T v+f( x). We can now apply Theorem 2 to obtain the following result: Theorem 3. ˆx is an optimal solution to (1) if (i) Z T HZ is positive semidefinite. (ii) ˆx = x+zˆv where Z T HZˆv = Z T (H x+c) Lecture 20 Quadratic optimization
Comment 2. The second condition in the theorem is equivalent to the existence of a $\hat{u} \in \mathbb{R}^m$ such that

$\begin{pmatrix} H & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \hat{x} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}$

Proof: The second condition in the theorem can be written

$Z^T \big( H(\bar{x} + Z\hat{v}) + c \big) = 0$.

Since $\mathcal{R}(Z) = \mathcal{N}(A)$ and $\mathcal{N}(A)^\perp = \mathcal{R}(A^T)$, this is equivalent to

$H \underbrace{(\bar{x} + Z\hat{v})}_{\hat{x}} + c \in \mathcal{N}(A)^\perp = \mathcal{R}(A^T)$,

i.e., $H\hat{x} + c = A^T \hat{u}$ for some $\hat{u} \in \mathbb{R}^m$.
Linear constrained QP: Example

Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$, where

$H = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}, \quad c = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad c_0 = 0.$

Consider minimization of $f$ over the feasible region $F = \{x \mid Ax = b\}$, where $A = \begin{pmatrix} 2 & 1 \end{pmatrix}$, $b = 2$, i.e., the minimization problem

minimize $-x_1^2 + \frac{1}{2}x_2^2 + x_1 + x_2$
s.t. $2x_1 + x_2 = 2$
Linear constrained QP: Example - Nullspace method

The nullspace of $A$ is spanned by $Z = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$, and the reduced Hessian is given by

$\bar{H} = Z^T H Z = 2.$

Since the reduced Hessian is positive definite, the problem restricted to the feasible set is strictly convex, even though $H$ itself is indefinite. The unique solution is then given by taking an arbitrary $\bar{x} \in F$, e.g. $\bar{x} = (1\ 0)^T$, and solving

$\bar{H}\hat{v} = \bar{c} = -Z^T (H\bar{x} + c) = 3 \quad \Rightarrow \quad \hat{v} = \frac{3}{2}.$

Then

$\hat{x} = \bar{x} + Z\hat{v} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \end{pmatrix} \cdot \frac{3}{2} = \begin{pmatrix} 5/2 \\ -3 \end{pmatrix}$
Linear constrained QP: Example - Geometric illustration

[Figure: the feasible line $F$, the nullspace direction $Z$, the point $\bar{x}$, and the optimal point $\hat{x}$ in the $(x_1, x_2)$-plane.]
Linear constrained QP: Example - Lagrange method

Assume that we know the problem is convex on the feasible set. Then

$\begin{pmatrix} H & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}, \qquad \begin{pmatrix} -2 & 0 & -2 \\ 0 & 1 & -1 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ u \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ 2 \end{pmatrix}$

Solving: $-2x_1 - 2u = -1$ and $x_2 - u = -1$ give $x_1 = \frac{1}{2} - u$ and $x_2 = -1 + u$. Then $2x_1 + x_2 = 2$ implies $2(\frac{1}{2} - u) + (-1 + u) = -u = 2$, so $u = -2$, and then $x_1 = 5/2$ and $x_2 = -3$. The same as for the nullspace method.
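The hand calculation can be checked by solving the KKT system numerically (a sketch; the numbers are those of the example above):

```python
import numpy as np

K = np.array([[-2., 0., -2.],
              [ 0., 1., -1.],
              [ 2., 1.,  0.]])
rhs = np.array([-1., -1., 2.])
print(np.linalg.solve(K, rhs))   # [ 2.5 -3.  -2. ], i.e. x = (5/2, -3), u = -2
```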
Linear constrained QP: Example - Sensitivity analysis

From the Lagrange method we know that $u = -2$; to first order, it tells us how the objective function changes due to variations in the right-hand side $b$ of the linear constraint.

Assume that $b := b + \delta$. Then, from the previous calculations, $u = -(2 + \delta)$, and then $x_1 = 5/2 + \delta$ and $x_2 = -3 - \delta$. Then

$f(x) = \frac{1}{2} \begin{pmatrix} 5/2 + \delta \\ -3 - \delta \end{pmatrix}^T \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 5/2 + \delta \\ -3 - \delta \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix}^T \begin{pmatrix} 5/2 + \delta \\ -3 - \delta \end{pmatrix} = -\frac{9}{4} - 2\delta - \frac{1}{2}\delta^2$
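The first-order interpretation of $u$ can also be checked numerically: a finite-difference estimate of $df/db$ at $b = 2$ should be close to $u = -2$ (a sketch reusing the KKT system above):

```python
import numpy as np

def f_opt(b):
    """Optimal objective value as a function of the right-hand side b."""
    K = np.array([[-2., 0., -2.], [0., 1., -1.], [2., 1., 0.]])
    x1, x2, u = np.linalg.solve(K, np.array([-1., -1., b]))
    return -x1**2 + 0.5 * x2**2 + x1 + x2

delta = 1e-6
print((f_opt(2 + delta) - f_opt(2)) / delta)   # approximately -2
```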
Example 2: Analysis of a resistor circuit

[Figure: a resistor network with nodes 1-5, node potentials $u_j$, edge currents $x_{ij}$, resistances $r_{ij}$, and voltage drops $v_{ij}$ over the edges (1,2), (1,3), (2,3), (2,4), (3,4), (3,5), (4,5). An external current $I$ enters at node 1 and leaves at node 5.]

Let a current $I$ go through the circuit from node 1 to node 5. What is the solution to the circuit equations?
The current $x_{ij}$ goes from node $i$ to node $j$ if $x_{ij} > 0$. Kirchhoff's current law gives

$x_{12} + x_{13} = I$
$-x_{12} + x_{23} + x_{24} = 0$
$-x_{13} - x_{23} + x_{34} + x_{35} = 0$
$-x_{24} - x_{34} + x_{45} = 0$
$-x_{35} - x_{45} = -I$

This can be written $Ax = b$, where $x = \begin{pmatrix} x_{12} & x_{13} & x_{23} & x_{24} & x_{34} & x_{35} & x_{45} \end{pmatrix}^T$ and

$A = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & -1 & -1 \end{pmatrix}, \quad b = \begin{pmatrix} I \\ 0 \\ 0 \\ 0 \\ -I \end{pmatrix}$
The node potentials are denoted $u_j$, $j = 1, \ldots, 5$. The voltage drop over the resistor $r_{ij}$ is denoted $v_{ij}$ and is given by the equations

$v_{ij} = u_i - u_j$, $\qquad v_{ij} = r_{ij} x_{ij}$ (Ohm's law).

This can also be written

$v = A^T u$, $\qquad v = Dx$,

where

$u = \begin{pmatrix} u_1 & \ldots & u_5 \end{pmatrix}^T$, $\quad v = \begin{pmatrix} v_{12} & v_{13} & v_{23} & v_{24} & v_{34} & v_{35} & v_{45} \end{pmatrix}^T$, $\quad D = \mathrm{diag}(r_{12}, r_{13}, r_{23}, r_{24}, r_{34}, r_{35}, r_{45})$.
The circuit equations:

$Ax = b$, $\quad v = A^T u$, $\quad v = Dx$ $\qquad \Leftrightarrow \qquad \begin{pmatrix} D & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix} = \begin{pmatrix} 0 \\ b \end{pmatrix}$

Since $D$ is positive definite ($r_{ij} > 0$), this is (see Comment 2) equivalent to the optimality conditions for the following optimization problem:

minimize $\frac{1}{2}x^T Dx$
s.t. $Ax = b$
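As a sketch, the circuit can be solved numerically from the KKT system. The resistances and the current are not specified in the lecture, so unit values $r_{ij} = 1$ and $I = 1$ are assumed here; since the node equations are linearly dependent, the KKT matrix is singular and lstsq is used to pick a solution (the potentials are only determined up to a common constant):

```python
import numpy as np

A = np.array([[ 1,  1,  0,  0,  0,  0,  0],
              [-1,  0,  1,  1,  0,  0,  0],
              [ 0, -1, -1,  0,  1,  1,  0],
              [ 0,  0,  0, -1, -1,  0,  1],
              [ 0,  0,  0,  0,  0, -1, -1]], dtype=float)
b = np.array([1., 0., 0., 0., -1.])    # I = 1: in at node 1, out at node 5
D = np.eye(7)                          # assumed unit resistances

K = np.block([[D, -A.T], [A, np.zeros((5, 5))]])
sol = np.linalg.lstsq(K, np.concatenate([np.zeros(7), b]), rcond=None)[0]
x, u = sol[:7], sol[7:]
print(np.allclose(A @ x, b))           # True: Kirchhoff's current law holds
```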
Comment. In the lecture on network optimization we encountered the matrix $A$ (there the matrix was called $\tilde{A}$), and we saw that the nullspace corresponds to cycles in the circuit. The laws of electricity (Ohm's law + Kirchhoff's law) determine the current distribution that minimizes the power loss. This follows since the objective function can be written

$\frac{1}{2}x^T Dx = \frac{1}{2} \sum_{ij} r_{ij} x_{ij}^2$
Comment 3. It is common to connect one of the nodes to earth. Let us, e.g., connect node 5 to earth; then $u_5 = 0$.

[Figure: the same resistor network, now with node 5 grounded so that $u_5 = 0$.]
With the potential in node 5 fixed at $u_5 = 0$, the last column in the voltage equation $v = A^T u$ is eliminated. The voltage equation and the current balance can be replaced with $v = \bar{A}^T \bar{u}$ and $\bar{A}x = \bar{b}$, where

$\bar{A} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 0 & 1 \end{pmatrix}, \quad \bar{b} = \begin{pmatrix} I \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \bar{u} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix}$

The advantage of this is that the rows of $\bar{A}$ are linearly independent. This facilitates the solution of the optimization problem.
Least-Squares problems

Application: Linear regression (model fitting)

[Figure: block diagram of the system: the output $y(t)$ is corrupted by measurement noise $e(t)$, producing the observations $s(t)$.]

Problem: Fit a linear regression model to measured data.

Regression model: $y(t) = \sum_{j=1}^{n} \alpha_j \psi_j(t)$

- $\psi_k(t)$, $k = 1, \ldots, n$, are the regressors (known functions),
- $\alpha_k$, $k = 1, \ldots, n$, are the model parameters (to be determined),
- $e(t)$ is the measurement noise (not known),
- $s(t)$ are the observations.
Idea for fitting the model: minimize the sum of squared prediction errors

minimize $\frac{1}{2} \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} \alpha_j \psi_j(t_i) - s(t_i) \Big)^2$
= minimize $\frac{1}{2} \sum_{i=1}^{m} \big( \psi(t_i)^T x - s(t_i) \big)^2$
= minimize $\frac{1}{2} (Ax - b)^T (Ax - b)$    (2)

where

$\psi(t) = \begin{pmatrix} \psi_1(t) \\ \vdots \\ \psi_n(t) \end{pmatrix}, \quad x = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \quad A = \begin{pmatrix} \psi(t_1)^T \\ \vdots \\ \psi(t_m)^T \end{pmatrix}, \quad b = \begin{pmatrix} s(t_1) \\ \vdots \\ s(t_m) \end{pmatrix}$
The solution of the Least-Squares problem (LSQ)

The Least-Squares problem (2) can equivalently be written

minimize $\frac{1}{2}x^T Hx + c^T x + c_0$

where $H = A^T A$, $c = -A^T b$, $c_0 = \frac{1}{2}b^T b$. We note that

- $H = A^T A$ is positive semidefinite, since $x^T A^T Ax = \|Ax\|^2 \ge 0$;
- $c \in \mathcal{R}(H)$, since $c = -A^T b \in \mathcal{R}(A^T) = \mathcal{R}(A^T A) = \mathcal{R}(H)$.

The conditions in Theorem 2 and Comment 1 are satisfied, and it follows that the LSQ estimate is given by

$H\hat{x} = -c \quad \Leftrightarrow \quad A^T A\hat{x} = A^T b$ (the normal equations)
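A small fitting example as a sketch (the data are made up for illustration): fit the model $y(t) = \alpha_1 + \alpha_2 t$, i.e., regressors $\psi_1(t) = 1$ and $\psi_2(t) = t$, by solving the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0., 1., 20)
s = 2.0 + 3.0 * t + 0.05 * rng.standard_normal(t.size)   # noisy observations

A = np.column_stack([np.ones_like(t), t])    # rows are psi(t_i)^T
x_hat = np.linalg.solve(A.T @ A, A.T @ s)    # normal equations
print(x_hat)                                 # approximately [2. 3.]
```

In practice, np.linalg.lstsq(A, s, rcond=None) solves the same problem with better numerical stability.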
Linear algebra interpretation

[Figure: the four fundamental subspaces: $\mathcal{R}(A^T)$ and $\mathcal{N}(A)$ in $\mathbb{R}^n$, $\mathcal{R}(A)$ and $\mathcal{N}(A^T)$ in $\mathbb{R}^m$; $b$ is decomposed as $A\hat{x} + (b - A\hat{x})$ with $A\hat{x} \in \mathcal{R}(A)$, and $\hat{x} = A^T u \in \mathcal{R}(A^T)$.]

The normal equations can be written

$A^T (b - A\hat{x}) = 0 \quad \Leftrightarrow \quad b - A\hat{x} \in \mathcal{N}(A^T)$

This orthogonality property has a geometric interpretation, which is illustrated in the figure above. The fundamental theorem of linear algebra explains the LSQ solution geometrically.
Uniqueness of the LSQ solution

Case 1: If the columns of $A$ are linearly independent, i.e., $\mathcal{N}(A) = \{0\}$, it holds that $H = A^T A$ is positive definite and hence invertible. The normal equations then have the unique solution

$\hat{x} = (A^T A)^{-1} A^T b$

Case 2: If $A$ has linearly dependent columns, i.e., $\mathcal{N}(A) \neq \{0\}$, then it is natural to choose the solution of smallest length. From the figure on the previous slide it is clear that we should let

$\hat{x} = A^T \hat{u}$, where $AA^T \hat{u} = A\bar{x}$,

and where $\bar{x}$ is an arbitrary solution to the LSQ problem.
Our choice of $\hat{x}$ in Case 2 can equivalently be interpreted as the solution to the quadratic optimization problem

minimize $\frac{1}{2}x^T x$
s.t. $Ax = A\bar{x}$

According to Comment 2, the solution to this optimization problem is given by the solution to

$\begin{pmatrix} I & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \hat{x} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} 0 \\ A\bar{x} \end{pmatrix}$

i.e., $\hat{x} = A^T \hat{u}$, where $AA^T \hat{u} = A\bar{x}$.
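A sketch of Case 2 in code, with a small made-up matrix whose columns are dependent; the formula $\hat{x} = A^T\hat{u}$, $AA^T\hat{u} = A\bar{x}$ agrees with the pseudoinverse solution:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [0., 1., 1.]])        # dependent columns: col3 = col1 + col2
b = np.array([6., 2.])

x_bar = np.linalg.lstsq(A, b, rcond=None)[0]     # some LSQ solution
u_hat = np.linalg.solve(A @ A.T, A @ x_bar)      # solve A A^T u = A x_bar
x_hat = A.T @ u_hat                              # minimum-norm LSQ solution
print(np.allclose(x_hat, np.linalg.pinv(A) @ b)) # True
```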
Reading instructions

- Quadratic optimization without constraints: Chapters 9 and 27 in the book.
- Quadratic optimization with constraints: Chapter 10 in the book.
- Least-Squares method: Chapter 11 in the book.