Lecture: Quadratic optimization

1. Positive definite and semidefinite matrices
2. $LDL^T$ factorization
3. Quadratic optimization without constraints
4. Quadratic optimization with constraints
5. Least-squares problems
Quadratic optimization without constraints

minimize $\frac{1}{2}x^T H x + c^T x + c_0$

where $H = H^T \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$, $c_0 \in \mathbb{R}$. Common in applications, e.g.,
- linear regression (fitting of models),
- minimization of physically motivated objective functions, such as energy or variance,
- quadratic approximation of nonlinear optimization problems.
The quadratic term

Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$ where $H = H^T \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$, $c_0 \in \mathbb{R}$.

We can assume that the matrix $H$ is symmetric. If $H$ is not symmetric, it holds that

$x^T H x = \frac{1}{2}x^T H x + \frac{1}{2}(x^T H x)^T = \frac{1}{2}x^T (H + H^T) x$,

i.e., $x^T H x = x^T \bar{H} x$, where $\bar{H} = \frac{1}{2}(H + H^T)$ is a symmetric matrix.
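To make the symmetrization concrete, here is a small numerical check (a sketch, not from the book; the matrix and vector are randomly generated for illustration): the quadratic form takes the same value for $H$ and for its symmetric part $\bar{H}$.

```python
import numpy as np

# Check that x^T H x = x^T Hbar x where Hbar = (H + H^T)/2.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))      # a non-symmetric matrix
H_bar = 0.5 * (H + H.T)              # its symmetric part

x = rng.standard_normal(4)
print(x @ H @ x)                     # quadratic form with H
print(x @ H_bar @ x)                 # same value with the symmetric part
```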
Positive definite and positive semidefinite matrices

Definition 1 (Positive semidefinite). A symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is called positive semidefinite if $x^T H x \ge 0$ for all $x \in \mathbb{R}^n$.

Definition 2 (Positive definite). A symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is called positive definite if $x^T H x > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$.
How do you determine whether a symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is positive (semi)definite?

- $LDL^T$ factorization (suitable for calculations by hand); see Chapter 27.6 in the book.
- Use the spectral factorization of the matrix, $H = Q \Lambda Q^T$, where $Q^T Q = I$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_k$ are the eigenvalues of the matrix. From this it follows that
  - $H$ is positive semidefinite if and only if $\lambda_k \ge 0$, $k = 1, \ldots, n$;
  - $H$ is positive definite if and only if $\lambda_k > 0$, $k = 1, \ldots, n$.

The eigenvalue test is easy to carry out numerically, as sketched below.
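A minimal sketch of the spectral test (my own helper, not from the book; the tolerance guards against rounding errors):

```python
import numpy as np

def definiteness(H, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(H)              # eigenvalues, ascending order
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "not positive semidefinite"

print(definiteness(np.array([[2., 0.], [0., 1.]])))   # positive definite
print(definiteness(np.array([[1., 1.], [1., 1.]])))   # positive semidefinite
```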
$LDL^T$ factorization

Theorem 1. A symmetric matrix $H = H^T \in \mathbb{R}^{n \times n}$ is positive semidefinite if and only if there is a factorization $H = LDL^T$, where

$L = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & 1 & 0 & \cdots & 0 \\ l_{31} & l_{32} & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \cdots & 1 \end{pmatrix}, \quad D = \begin{pmatrix} d_1 & 0 & 0 & \cdots & 0 \\ 0 & d_2 & 0 & \cdots & 0 \\ 0 & 0 & d_3 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & d_n \end{pmatrix}$

and $d_k \ge 0$, $k = 1, \ldots, n$. $H$ is positive definite if and only if $d_k > 0$, $k = 1, \ldots, n$, in the $LDL^T$ factorization.

Proof: See Chapter 27.9 in the book.
The factorization $LDL^T$ is determined by one of the following methods:
- elementary row and column operations,
- completion of squares.

A minimal numerical version of the row/column-operation method is sketched below.
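The elimination method translates directly into a short program. A sketch (my own, assuming no pivoting is needed, i.e., all pivots $d_k$ are nonzero, which holds for positive definite $H$):

```python
import numpy as np

def ldl_factor(H):
    """LDL^T by symmetric elimination, assuming nonzero pivots (no pivoting)."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for k in range(n):
        d[k] = H[k, k]                         # pivot d_k
        L[k+1:, k] = H[k+1:, k] / d[k]         # multipliers l_ik
        # symmetric elimination of row and column k:
        H[k+1:, k+1:] -= d[k] * np.outer(L[k+1:, k], L[k+1:, k])
    return L, d
```

For indefinite or singular matrices, scipy.linalg.ldl (which pivots) is the robust alternative.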
$LDL^T$ factorization - Example

Determine the $LDL^T$ factorization of

$H = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 4 & 9 & 17 & 22 \\ 6 & 17 & 44 & 61 \\ 8 & 22 & 61 & 118 \end{pmatrix}$

The idea is to perform symmetric row and column operations from the left and right in order to form the diagonal matrix $D$. The row operations can be performed with lower triangular invertible matrices; the product of lower triangular matrices is lower triangular, and so is the inverse of a lower triangular matrix.
$LDL^T$ factorization - Example

Perform row operations to get zeros in the first column:

$l_1 H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ -4 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 4 & 6 & 8 \\ 4 & 9 & 17 & 22 \\ 6 & 17 & 44 & 61 \\ 8 & 22 & 61 & 118 \end{pmatrix} = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix}$

and the corresponding column operations:

$l_1 H l_1^T = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix} \begin{pmatrix} 1 & -2 & -3 & -4 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix}$
$LDL^T$ factorization - Example

Perform row operations to get zeros in the second column:

$l_2 l_1 H l_1^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -5 & 1 & 0 \\ 0 & -6 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 5 & 26 & 37 \\ 0 & 6 & 37 & 86 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix}$

and the corresponding column operations:

$l_2 l_1 H l_1^T l_2^T = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -5 & -6 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix}$
$LDL^T$ factorization - Example

Perform row operations to get zeros in the third column:

$l_3 l_2 l_1 H l_1^T l_2^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -7 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 7 & 50 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

and the corresponding column operations:

$l_3 l_2 l_1 H l_1^T l_2^T l_3^T = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = D$
$LDL^T$ factorization - Example

We finally obtained the diagonal matrix

$D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Since all elements on the diagonal of $D$ are positive, the matrix $H$ is positive definite. The values on the diagonal are not the eigenvalues of $H$, but they have the same signs as the eigenvalues, and that is all we need to know.
$LDL^T$ factorization - Example

How do we determine the $L$ matrix? We have $l_3 l_2 l_1 H l_1^T l_2^T l_3^T = D$, i.e., $H = LDL^T$ if

$L = (l_3 l_2 l_1)^{-1} = l_1^{-1} l_2^{-1} l_3^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 0 & 1 & 0 \\ 4 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 5 & 1 & 0 \\ 0 & 6 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 7 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 5 & 1 & 0 \\ 4 & 6 & 7 & 1 \end{pmatrix}$

Note that each inverse $l_k^{-1}$ is obtained simply by flipping the signs of the off-diagonal entries of $l_k$.
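A quick machine check of the worked example (a sketch; the matrices are exactly those computed above):

```python
import numpy as np

H = np.array([[2, 4, 6, 8], [4, 9, 17, 22], [6, 17, 44, 61], [8, 22, 61, 118]])
L = np.array([[1, 0, 0, 0], [2, 1, 0, 0], [3, 5, 1, 0], [4, 6, 7, 1]])
D = np.diag([2, 1, 1, 1])
print(np.allclose(L @ D @ L.T, H))     # True: the factorization is correct
print(np.linalg.eigvalsh(H))           # all positive, but not equal to diag(D)
```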
Unbounded problems

Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$. Assume that $H$ is not positive semidefinite. Then there is a $d \in \mathbb{R}^n$ such that $d^T H d < 0$. It follows that

$f(td) = \frac{1}{2}t^2 d^T H d + t c^T d + c_0 \to -\infty$ when $t \to \infty$.

Conclusion: a necessary condition for $f(x)$ to have a minimum is that $H$ is positive semidefinite.
Assume that $H$ is positive semidefinite. Then we can use the factorization $H = LDL^T$, which gives

minimize $\frac{1}{2}x^T H x + c^T x + c_0$
= minimize $\frac{1}{2}x^T L D L^T x + c^T (L^T)^{-1} L^T x + c_0$
= minimize $\sum_{k=1}^{n} \left( \frac{1}{2} d_k y_k^2 + \bar{c}_k y_k \right) + c_0$
= $c_0 + \sum_{k=1}^{n}\ \underset{y_k}{\mathrm{minimize}}\ \left( \frac{1}{2} d_k y_k^2 + \bar{c}_k y_k \right)$

where we have introduced the new coordinates $y = L^T x$ and $\bar{c} = L^{-1} c$. The optimization problem separates into $n$ independent scalar quadratic optimization problems, which are easy to solve.
Scalar quadratic optimization without constraints

The scalar quadratic optimization problem

minimize over $x \in \mathbb{R}$: $\frac{1}{2}hx^2 + cx + c_0$

has a finite solution if and only if $h = 0$ and $c = 0$, or $h > 0$. In the first case there is no unique optimum. In the second case

$\frac{1}{2}hx^2 + cx + c_0 = \frac{1}{2}h(x + c/h)^2 + c_0 - c^2/(2h)$,

and the optimum is attained at $x = -c/h$.
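The case split is easy to mechanize. A minimal sketch (my own helper, not from the book):

```python
def scalar_qp_min(h, c, c0):
    """Minimize (1/2)h x^2 + c x + c0 over the real line."""
    if h > 0:
        return -c / h, c0 - c**2 / (2 * h)     # unique minimizer and value
    if h == 0 and c == 0:
        return 0.0, c0                         # every x is optimal; pick x = 0
    return None, float("-inf")                 # unbounded from below

print(scalar_qp_min(2.0, -4.0, 1.0))           # (2.0, -3.0)
```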
Solving the minimization problems separately shows that $\hat{y} = (\hat{y}_1\ \ldots\ \hat{y}_n)^T$ is a minimizer if

(i) $d_k \ge 0$, $k = 1, \ldots, n$, i.e., $H$ is positive semidefinite;
(ii) $d_k \hat{y}_k = -\bar{c}_k$, i.e., $D L^T \hat{x} = -L^{-1} c$.

Condition (ii) can be replaced with $L D L^T \hat{x} = -c$, i.e., $H\hat{x} = -c$.

Furthermore, $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$ is unbounded from below if

(i) some $d_k < 0$, i.e., $H$ is not positive semidefinite; or
(ii) one of the equations $d_k \hat{y}_k = -\bar{c}_k$ does not have a solution (i.e., $d_k = 0$ and $\bar{c}_k \neq 0$), i.e., the equation $H\hat{x} = -c$ does not have a solution.
The above arguments can be formalized as the following theorem.

Theorem 2. Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$. Then $\hat{x}$ is a minimizer if

(i) $H$ is positive semidefinite,
(ii) $H\hat{x} = -c$.

If $H$ is positive definite, then $\hat{x} = -H^{-1}c$. If $H$ is not positive semidefinite, or $Hx = -c$ does not have a solution, then $f$ is not bounded from below.

Comment 1. The condition that $Hx = -c$ has a solution is equivalent to $c \in \mathcal{R}(H)$. Therefore, $f$ has a minimizer if and only if $H$ is positive semidefinite and $c \in \mathcal{R}(H)$.
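Theorem 2 in code form, as a sketch (my own helper; it uses the eigenvalue test for semidefiniteness and lstsq to attempt $Hx = -c$, checking afterwards that the system is consistent):

```python
import numpy as np

def solve_unconstrained_qp(H, c, tol=1e-10):
    """Minimize (1/2)x^T H x + c^T x, or report unboundedness."""
    c = np.asarray(c, dtype=float)
    if np.linalg.eigvalsh(H).min() < -tol:
        raise ValueError("H is not positive semidefinite: f unbounded below")
    x = np.linalg.lstsq(H, -c, rcond=None)[0]  # tries to solve H x = -c
    if not np.allclose(H @ x, -c):
        raise ValueError("-c is not in R(H): f unbounded below")
    return x

print(solve_unconstrained_qp(np.array([[2., 0.], [0., 1.]]), [1., 1.]))
# [-0.5 -1. ]
```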
Quadratic optimization with linear constraints

minimize $\frac{1}{2}x^T H x + c^T x + c_0$
s.t. $Ax = b$    (1)

where $H = H^T \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $c_0 \in \mathbb{R}$. We assume that $b \in \mathcal{R}(A)$ and $n > m$, i.e., $\mathcal{N}(A) \neq \{0\}$.

Assume that $\mathcal{N}(A) = \mathrm{span}\{z_1, \ldots, z_k\}$ and define the nullspace matrix $Z = [z_1\ \cdots\ z_k]$. If $A\bar{x} = b$, then an arbitrary solution of the linear constraint has the form $x = \bar{x} + Zv$, for some $v \in \mathbb{R}^k$. This follows since

$A(\bar{x} + Zv) = A\bar{x} + AZv = b + 0 = b$.
With x = x+zv inserted in f(x) = 1 2 xt Hx+c T x+c 0 we get f(x) = 1 2 vt Z T HZv+ ( Z T (H x+c) ) T v+f( x). The minimization problem (1) is thus equivalent with minimize 1 2 vt Z T HZv+ ( Z T (H x+c) ) T v+f( x). We can now apply Theorem 2 to obtain the following result: Theorem 3. ˆx is an optimal solution to (1) if (i) Z T HZ is positive semidefinite. (ii) ˆx = x+zˆv where Z T HZˆv = Z T (H x+c) Lecture 20 Quadratic optimization
Comment 2. The second condition in the theorem is equivalent to the existence of a $\hat{u} \in \mathbb{R}^m$ such that

$\begin{pmatrix} H & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \hat{x} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}$

Proof: The second condition in the theorem can be written

$Z^T \big( H(\bar{x} + Z\hat{v}) + c \big) = 0$.

Since $\mathcal{R}(Z) = \mathcal{N}(A)$ and $\mathcal{N}(A)^\perp = \mathcal{R}(A^T)$, this is equivalent to

$H \underbrace{(\bar{x} + Z\hat{v})}_{\hat{x}} + c \in \mathcal{N}(A)^\perp = \mathcal{R}(A^T)$,

i.e., $H\hat{x} + c = A^T \hat{u}$ for some $\hat{u} \in \mathbb{R}^m$.
Linear constrained QP: Example

Let $f(x) = \frac{1}{2}x^T H x + c^T x + c_0$, where

$H = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}, \quad c = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad c_0 = 0.$

Consider minimization of $f$ over the feasible region $F = \{x \mid Ax = b\}$, where $A = \begin{pmatrix} 2 & 1 \end{pmatrix}$, $b = 2$, i.e., the minimization problem

minimize $-x_1^2 + \frac{1}{2}x_2^2 + x_1 + x_2$
s.t. $2x_1 + x_2 = 2$
Linear constrained QP: Example - Nullspace method

The nullspace of $A$ is spanned by $Z = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$, and the reduced Hessian is given by

$\bar{H} = Z^T H Z = 2.$

Since the reduced Hessian is positive definite, the problem restricted to the feasible set is strictly convex, even though $H$ itself is indefinite. The unique solution is then given by taking an arbitrary $\bar{x} \in F$, e.g. $\bar{x} = (1\ 0)^T$, and solving

$\bar{H}\hat{v} = \bar{c} = -Z^T (H\bar{x} + c) = 3 \quad \Rightarrow \quad \hat{v} = \frac{3}{2}.$

Then

$\hat{x} = \bar{x} + Z\hat{v} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \end{pmatrix} \cdot \frac{3}{2} = \begin{pmatrix} 5/2 \\ -3 \end{pmatrix}$
Linear constrained QP: Example - Geometric illustration

[Figure: the feasible line $F$, the nullspace direction $Z$, the point $\bar{x}$, and the optimal point $\hat{x}$ in the $(x_1, x_2)$-plane.]
Linear constrained QP: Example - Lagrange method

Assume that we know the problem is convex on the feasible set. Then

$\begin{pmatrix} H & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}, \qquad \begin{pmatrix} -2 & 0 & -2 \\ 0 & 1 & -1 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ u \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ 2 \end{pmatrix}$

Solving: $-2x_1 - 2u = -1$ and $x_2 - u = -1$ give $x_1 = \frac{1}{2} - u$ and $x_2 = -1 + u$. Then $2x_1 + x_2 = 2$ implies $2(\frac{1}{2} - u) + (-1 + u) = -u = 2$, so $u = -2$, and then $x_1 = 5/2$ and $x_2 = -3$. The same as for the nullspace method.
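The hand calculation can be checked by solving the KKT system numerically (a sketch; the numbers are those of the example above):

```python
import numpy as np

K = np.array([[-2., 0., -2.],
              [ 0., 1., -1.],
              [ 2., 1.,  0.]])
rhs = np.array([-1., -1., 2.])
print(np.linalg.solve(K, rhs))   # [ 2.5 -3.  -2. ], i.e. x = (5/2, -3), u = -2
```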
Linear constrained QP: Example - Sensitivity analysis

From the Lagrange method we know that $u = -2$; to first order, it tells us how the objective function changes due to variations in the right-hand side $b$ of the linear constraint.

Assume that $b := b + \delta$. Then, from the previous calculations, $u = -(2 + \delta)$, and then $x_1 = 5/2 + \delta$ and $x_2 = -3 - \delta$. Then

$f(x) = \frac{1}{2} \begin{pmatrix} 5/2 + \delta \\ -3 - \delta \end{pmatrix}^T \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 5/2 + \delta \\ -3 - \delta \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix}^T \begin{pmatrix} 5/2 + \delta \\ -3 - \delta \end{pmatrix} = -\frac{9}{4} - 2\delta - \frac{1}{2}\delta^2$
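The first-order interpretation of $u$ can also be checked numerically: a finite-difference estimate of $df/db$ at $b = 2$ should be close to $u = -2$ (a sketch reusing the KKT system above):

```python
import numpy as np

def f_opt(b):
    """Optimal objective value as a function of the right-hand side b."""
    K = np.array([[-2., 0., -2.], [0., 1., -1.], [2., 1., 0.]])
    x1, x2, u = np.linalg.solve(K, np.array([-1., -1., b]))
    return -x1**2 + 0.5 * x2**2 + x1 + x2

delta = 1e-6
print((f_opt(2 + delta) - f_opt(2)) / delta)   # approximately -2
```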
Example 2: Analysis of a resistor circuit

[Figure: a resistor network with nodes 1-5, node potentials $u_j$, edge currents $x_{ij}$, resistances $r_{ij}$, and voltage drops $v_{ij}$ over the edges (1,2), (1,3), (2,3), (2,4), (3,4), (3,5), (4,5). An external current $I$ enters at node 1 and leaves at node 5.]

Let a current $I$ go through the circuit from node 1 to node 5. What is the solution to the circuit equations?
The current $x_{ij}$ goes from node $i$ to node $j$ if $x_{ij} > 0$. Kirchhoff's current law gives

$x_{12} + x_{13} = I$
$-x_{12} + x_{23} + x_{24} = 0$
$-x_{13} - x_{23} + x_{34} + x_{35} = 0$
$-x_{24} - x_{34} + x_{45} = 0$
$-x_{35} - x_{45} = -I$

This can be written $Ax = b$, where $x = \begin{pmatrix} x_{12} & x_{13} & x_{23} & x_{24} & x_{34} & x_{35} & x_{45} \end{pmatrix}^T$ and

$A = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & -1 & -1 \end{pmatrix}, \quad b = \begin{pmatrix} I \\ 0 \\ 0 \\ 0 \\ -I \end{pmatrix}$
The node potentials are denoted $u_j$, $j = 1, \ldots, 5$. The voltage drop over the resistor $r_{ij}$ is denoted $v_{ij}$ and is given by the equations

$v_{ij} = u_i - u_j$, $\qquad v_{ij} = r_{ij} x_{ij}$ (Ohm's law).

This can also be written

$v = A^T u$, $\qquad v = Dx$,

where

$u = \begin{pmatrix} u_1 & \ldots & u_5 \end{pmatrix}^T$, $\quad v = \begin{pmatrix} v_{12} & v_{13} & v_{23} & v_{24} & v_{34} & v_{35} & v_{45} \end{pmatrix}^T$, $\quad D = \mathrm{diag}(r_{12}, r_{13}, r_{23}, r_{24}, r_{34}, r_{35}, r_{45})$.
The circuit equations:

$Ax = b$, $\quad v = A^T u$, $\quad v = Dx$ $\qquad \Leftrightarrow \qquad \begin{pmatrix} D & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix} = \begin{pmatrix} 0 \\ b \end{pmatrix}$

Since $D$ is positive definite ($r_{ij} > 0$), this is (see Comment 2) equivalent to the optimality conditions for the following optimization problem:

minimize $\frac{1}{2}x^T Dx$
s.t. $Ax = b$
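As a sketch, the circuit can be solved numerically from the KKT system. The resistances and the current are not specified in the lecture, so unit values $r_{ij} = 1$ and $I = 1$ are assumed here; since the node equations are linearly dependent, the KKT matrix is singular and lstsq is used to pick a solution (the potentials are only determined up to a common constant):

```python
import numpy as np

A = np.array([[ 1,  1,  0,  0,  0,  0,  0],
              [-1,  0,  1,  1,  0,  0,  0],
              [ 0, -1, -1,  0,  1,  1,  0],
              [ 0,  0,  0, -1, -1,  0,  1],
              [ 0,  0,  0,  0,  0, -1, -1]], dtype=float)
b = np.array([1., 0., 0., 0., -1.])    # I = 1: in at node 1, out at node 5
D = np.eye(7)                          # assumed unit resistances

K = np.block([[D, -A.T], [A, np.zeros((5, 5))]])
sol = np.linalg.lstsq(K, np.concatenate([np.zeros(7), b]), rcond=None)[0]
x, u = sol[:7], sol[7:]
print(np.allclose(A @ x, b))           # True: Kirchhoff's current law holds
```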
Comment. In the lecture on network optimization we encountered the matrix $A$ (there the matrix was called $\tilde{A}$), and we saw that the nullspace corresponds to cycles in the circuit. The laws of electricity (Ohm's law + Kirchhoff's law) determine the current distribution that minimizes the power loss. This follows since the objective function can be written

$\frac{1}{2}x^T Dx = \frac{1}{2} \sum_{ij} r_{ij} x_{ij}^2$
Comment 3. It is common to connect one of the nodes to earth. Let us, e.g., connect node 5 to earth; then $u_5 = 0$.

[Figure: the same resistor network, now with node 5 grounded so that $u_5 = 0$.]
With the potential in node 5 fixed at $u_5 = 0$, the last column in the voltage equation $v = A^T u$ is eliminated. The voltage equation and the current balance can be replaced with $v = \bar{A}^T \bar{u}$ and $\bar{A}x = \bar{b}$, where

$\bar{A} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 0 & 1 \end{pmatrix}, \quad \bar{b} = \begin{pmatrix} I \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \bar{u} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix}$

The advantage of this is that the rows of $\bar{A}$ are linearly independent. This facilitates the solution of the optimization problem.
Least-Squares problems

Application: Linear regression (model fitting)

[Figure: block diagram of the system: the output $y(t)$ is corrupted by measurement noise $e(t)$, producing the observations $s(t)$.]

Problem: Fit a linear regression model to measured data.

Regression model: $y(t) = \sum_{j=1}^{n} \alpha_j \psi_j(t)$

- $\psi_k(t)$, $k = 1, \ldots, n$, are the regressors (known functions),
- $\alpha_k$, $k = 1, \ldots, n$, are the model parameters (to be determined),
- $e(t)$ is the measurement noise (not known),
- $s(t)$ are the observations.
Idea for fitting the model: minimize the sum of squared prediction errors

minimize $\frac{1}{2} \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} \alpha_j \psi_j(t_i) - s(t_i) \Big)^2$
= minimize $\frac{1}{2} \sum_{i=1}^{m} \big( \psi(t_i)^T x - s(t_i) \big)^2$
= minimize $\frac{1}{2} (Ax - b)^T (Ax - b)$    (2)

where

$\psi(t) = \begin{pmatrix} \psi_1(t) \\ \vdots \\ \psi_n(t) \end{pmatrix}, \quad x = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \quad A = \begin{pmatrix} \psi(t_1)^T \\ \vdots \\ \psi(t_m)^T \end{pmatrix}, \quad b = \begin{pmatrix} s(t_1) \\ \vdots \\ s(t_m) \end{pmatrix}$
The solution of the Least-Squares problem (LSQ)

The Least-Squares problem (2) can equivalently be written

minimize $\frac{1}{2}x^T Hx + c^T x + c_0$

where $H = A^T A$, $c = -A^T b$, $c_0 = \frac{1}{2}b^T b$. We note that

- $H = A^T A$ is positive semidefinite, since $x^T A^T Ax = \|Ax\|^2 \ge 0$;
- $c \in \mathcal{R}(H)$, since $c = -A^T b \in \mathcal{R}(A^T) = \mathcal{R}(A^T A) = \mathcal{R}(H)$.

The conditions in Theorem 2 and Comment 1 are satisfied, and it follows that the LSQ estimate is given by

$H\hat{x} = -c \quad \Leftrightarrow \quad A^T A\hat{x} = A^T b$ (the normal equations)
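A small fitting example as a sketch (the data are made up for illustration): fit the model $y(t) = \alpha_1 + \alpha_2 t$, i.e., regressors $\psi_1(t) = 1$ and $\psi_2(t) = t$, by solving the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0., 1., 20)
s = 2.0 + 3.0 * t + 0.05 * rng.standard_normal(t.size)   # noisy observations

A = np.column_stack([np.ones_like(t), t])    # rows are psi(t_i)^T
x_hat = np.linalg.solve(A.T @ A, A.T @ s)    # normal equations
print(x_hat)                                 # approximately [2. 3.]
```

In practice, np.linalg.lstsq(A, s, rcond=None) solves the same problem with better numerical stability.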
Linear algebra interpretation

[Figure: the four fundamental subspaces: $\mathcal{R}(A^T)$ and $\mathcal{N}(A)$ in $\mathbb{R}^n$, $\mathcal{R}(A)$ and $\mathcal{N}(A^T)$ in $\mathbb{R}^m$; $b$ is decomposed as $A\hat{x} + (b - A\hat{x})$ with $A\hat{x} \in \mathcal{R}(A)$, and $\hat{x} = A^T u \in \mathcal{R}(A^T)$.]

The normal equations can be written

$A^T (b - A\hat{x}) = 0 \quad \Leftrightarrow \quad b - A\hat{x} \in \mathcal{N}(A^T)$

This orthogonality property has a geometric interpretation, which is illustrated in the figure above. The fundamental theorem of linear algebra explains the LSQ solution geometrically.
Uniqueness of the LSQ solution

Case 1: If the columns of $A$ are linearly independent, i.e., $\mathcal{N}(A) = \{0\}$, it holds that $H = A^T A$ is positive definite and hence invertible. The normal equations then have the unique solution

$\hat{x} = (A^T A)^{-1} A^T b$

Case 2: If $A$ has linearly dependent columns, i.e., $\mathcal{N}(A) \neq \{0\}$, then it is natural to choose the solution of smallest length. From the figure on the previous slide it is clear that we should let

$\hat{x} = A^T \hat{u}$, where $AA^T \hat{u} = A\bar{x}$,

and where $\bar{x}$ is an arbitrary solution to the LSQ problem.
Our choice of $\hat{x}$ in Case 2 can equivalently be interpreted as the solution to the quadratic optimization problem

minimize $\frac{1}{2}x^T x$
s.t. $Ax = A\bar{x}$

According to Comment 2, the solution to this optimization problem is given by the solution to

$\begin{pmatrix} I & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \hat{x} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} 0 \\ A\bar{x} \end{pmatrix}$

i.e., $\hat{x} = A^T \hat{u}$, where $AA^T \hat{u} = A\bar{x}$.
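A sketch of Case 2 in code, with a small made-up matrix whose columns are dependent; the formula $\hat{x} = A^T\hat{u}$, $AA^T\hat{u} = A\bar{x}$ agrees with the pseudoinverse solution:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [0., 1., 1.]])        # dependent columns: col3 = col1 + col2
b = np.array([6., 2.])

x_bar = np.linalg.lstsq(A, b, rcond=None)[0]     # some LSQ solution
u_hat = np.linalg.solve(A @ A.T, A @ x_bar)      # solve A A^T u = A x_bar
x_hat = A.T @ u_hat                              # minimum-norm LSQ solution
print(np.allclose(x_hat, np.linalg.pinv(A) @ b)) # True
```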
Reading instructions

- Quadratic optimization without constraints: Chapters 9 and 27 in the book.
- Quadratic optimization with constraints: Chapter 10 in the book.
- Least-Squares method: Chapter 11 in the book.