Continuous methods for numerical linear algebra problems

Continuous methods for numerical linear algebra problems. Li-Zhi Liao (http://www.math.hkbu.edu.hk/~liliao), Department of Mathematics, Hong Kong Baptist University. The First International Summer School on Numerical Linear Algebra, Guangzhou and Hong Kong, July 17 - August 5, 2006.

Roadmap of my talk: I. What is the continuous method? II. Applications in numerical linear algebra (NLA). III. Case study in symmetric eigenvalue problems.

I.- a) What is the continuous method? Target problem: min_{x ∈ Ω ⊆ R^n} f(x). (1) Conventional methods are iterative: they generate a sequence {x_k} with x_k → x*, where x* is a local minimizer of (1). Original idea of the continuous method: form a continuous path (trajectory) from x_0 to x*.

I.- a) What is the continuous method (cont.) A) Mathematician's route. Started in the 1950s: K. J. Arrow, L. Hurwicz and H. Uzawa, Studies in Linear and Nonlinear Programming, Stanford University Press, Stanford, CA, 1958. Bears many names: ODE method, dynamical method, etc. Mathematical model: dx(t)/dt = a descent and feasible direction of f(x). +: easy to construct the ODE; −: difficult to prove the convergence of x(t); −: difficult to solve constrained problems.

I.- a) What is the continuous method (cont.) B) Engineer's route (Hopfield neural network). Mathematical model: an energy (or merit) function E(x) together with a dynamical system dx(t)/dt = p(x(t)) satisfying dE(x(t))/dt < 0. +: introduces an energy function; +: hardware implementation; −: the energy function E(x) must be a Lyapunov function; −: E(x) and p(x) must be simple functions. Ref.: L.-Z. Liao, H.-D. Qi and L. Qi, Neurodynamical optimization, J. Global Optim., 28, pp. 175-195, 2004.

I.- a) What is the continuous method (cont.) Continuous method idea: take all the +'s and overcome all the −'s. Geometric meaning: if we view a conventional method as placing a person somewhere on a mountain with both eyes covered and asking him to find the bottom of the mountain, then we can view the continuous method as the person finding a large metal ball, squeezing himself into it, and letting the ball fall freely.

I.- b) Framework of the continuous method. A mathematical framework of the continuous method for (1): i) define an energy function E(x); ii) construct an ODE of the form dx(t)/dt = p(x(t)) such that dE(x(t))/dt < 0 and dE(x(t))/dt = 0 ⟺ p(x) = 0; iii) any local minimizer of (1) is an equilibrium point of the ODE.
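To make the framework concrete, here is a minimal sketch (not part of the talk) for an unconstrained quadratic f, taking the energy function E(x) = f(x) and p(x) = -grad f(x), so that the ODE is plain gradient flow; the matrix Q, vector b, starting point, and time horizon are arbitrary illustrative choices.

import numpy as np
from scipy.integrate import solve_ivp

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])

def f(x):                                 # energy function E(x) = f(x)
    return 0.5 * x @ Q @ x - b @ x

def p(t, x):                              # p(x) = -grad f(x), so dE/dt <= 0 along x(t)
    return -(Q @ x - b)

sol = solve_ivp(p, [0.0, 20.0], np.array([5.0, -5.0]), rtol=1e-8)
x_star = np.linalg.solve(Q, b)            # the unique minimizer, for comparison
print(f(sol.y[:, 0]), f(sol.y[:, -1]))    # the energy decreases along the trajectory
print(np.linalg.norm(sol.y[:, -1] - x_star))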

I.- c) Why is the continuous method attractive? Theoretical aspect: i) strong convergence results; ii) weak conditions or assumptions; iii) suitable for many kinds of problems. Computational aspect: i) simple and neat ODE systems; ii) relaxed requirements on the ODE solver; iii) large-scale problems; iv) parallelizable.

II. Applications in NLA. a) Symmetric eigenvalue problems. Let A ∈ R^{n×n} with A^T = A. From the real Schur decomposition, we have A = U Λ U^T, where Λ = diag(λ_1, λ_2, ..., λ_n) with λ_1 ≤ λ_2 ≤ ... ≤ λ_n, and U = (u_1, u_2, ..., u_n) is orthogonal. Extreme eigenvalue problem: find λ_1 and u_1. Interior eigenvalue problem: find a λ_i and u_i such that λ_i ∈ [a, b], where a and b are predefined constants.
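For reference (not part of the talk), the two problems for a small dense symmetric A, solved directly with a standard eigensolver just to fix the notation; the random matrix and the interval [a, b] are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                      # A = A^T

lam, U = np.linalg.eigh(A)             # A = U diag(lam) U^T, lam in ascending order
lam1, u1 = lam[0], U[:, 0]             # extreme eigenpair (lambda_1, u_1)

a, b = -1.0, 1.0                       # interior problem: eigenpairs with lambda_i in [a, b]
inside = [(lam[i], U[:, i]) for i in range(len(lam)) if a <= lam[i] <= b]
print(lam1, len(inside))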

II. - b) Least squares problems. Let A be a given m × n matrix of rank r and let b be a given vector. Linear least squares: find x so that ‖b − Ax‖_2 = min. (Trivial) Least squares with linear constraints: find x so that ‖b − Ax‖_2 = min subject to C^T x = 0. (Trivial)
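A brief sketch (my own illustration, not from the talk) of the two "trivial" problems above; A, b, and C are random placeholders, and the constrained problem is handled by parametrizing the null space of C^T.

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
C = rng.standard_normal((5, 2))

# Plain linear least squares: min ||b - A x||_2.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Linearly constrained least squares: min ||b - A x||_2 s.t. C^T x = 0.
# Write x = Z y with the columns of Z spanning null(C^T) and solve for y.
Z = null_space(C.T)
y, *_ = np.linalg.lstsq(A @ Z, b, rcond=None)
x_con = Z @ y
print(np.linalg.norm(C.T @ x_con))     # ~0: the constraint is satisfied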

II. - b) Least squares problems (Cont.) Least squares with linear and quadratic constraints: find x so that ‖b − Ax‖_2 = min subject to C^T x = 0 and ‖x‖_2^2 ≤ α^2. (Trivial) Total least squares: find x, a matrix Ê, and a residual r̂ so that ‖Ê‖_F^2 + ‖r̂‖_2^2 = min subject to (A + Ê)x = b + r̂.
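The total least squares problem has a classical closed-form solution via the SVD of the augmented matrix [A, b]; the sketch below (not from the talk, with synthetic data) assumes the last component of the relevant right singular vector is nonzero, which is the generic case.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
x_true = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ x_true + 0.01 * rng.standard_normal(30)

_, _, Vt = np.linalg.svd(np.column_stack([A, b]))
v = Vt[-1]                       # right singular vector of the smallest singular value
x_tls = -v[:-1] / v[-1]          # assumes v[-1] != 0 (generic case)
print(np.linalg.norm(x_tls - x_true))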

II. - b) Least squares problems (Cont.) Regularized total least squares: need to solve min_x ‖b − Ax‖_2^2 / (1 + x^T V x) subject to x^T V x = α^2, where V is a given symmetric positive definite matrix.

II. - c) Nonnegative matrix factorization. Let A ∈ R^{m×n} be nonnegative, i.e. A_{ij} ≥ 0 for all i, j. For any given k ≤ min(m, n), find nonnegative matrices W ∈ R^{m×k} and H ∈ R^{k×n} such that ‖A − WH‖_F^2 = min. The extension of the above problem is nonnegative tensor factorization.
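One standard way to attack this problem (not the talk's continuous method) is alternating projected gradient descent: a gradient step in W with H fixed, then one in H with W fixed, each followed by projection onto the nonnegative orthant; the 1/L step sizes below come from the Lipschitz constants of the partial gradients, and the data are synthetic.

import numpy as np

rng = np.random.default_rng(3)
m, n, k = 40, 30, 5
A = rng.random((m, k)) @ rng.random((k, n))     # an exactly factorable test matrix
W, H = rng.random((m, k)), rng.random((k, n))

for _ in range(500):
    L_W = np.linalg.norm(H @ H.T, 2)            # Lipschitz constant of the W-gradient
    W = np.maximum(W - ((W @ H - A) @ H.T) / L_W, 0.0)
    L_H = np.linalg.norm(W.T @ W, 2)            # Lipschitz constant of the H-gradient
    H = np.maximum(H - (W.T @ (W @ H - A)) / L_H, 0.0)

print(np.linalg.norm(A - W @ H, 'fro'))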

III. Case study in symmetric eigenvalue problems. a) Conversion to optimization problems. Extreme eigenvalue problem: min_{(λ,x)} λ s.t. Ax = λx, x^T x = 1; equivalently, min_{x ∈ R^n} x^T A x s.t. x^T x = 1; equivalently, min_{x ∈ R^n} x^T A x − c x^T x s.t. x^T x ≤ 1, (2) where c ≥ λ_n + 1.

III. - a) Conversion... (Cont.) Problem (2) has: a concave objective function; and a convex constraint set.

III. - a) Conversion... (Cont.) Properties of problem (2): x* is a local minimizer of (2) ⟺ x* is a global minimizer of (2) ⟺ x* satisfies Ax* = λ_1 x*, (x*)^T x* = 1. See G. Golub and L.-Z. Liao, Continuous methods for extreme and interior eigenvalue problems, Linear Algebra Appl., 415, pp. 31-51, 2006.

III. - a) Conversion... (Cont.) Interior eigenvalue problem: min_{(λ,x)} 1 s.t. Ax = λx, x^T x = 1, a ≤ λ ≤ b; equivalently, min_{x ∈ R^n} x^T (A − aI_n)(A − bI_n) x s.t. x^T x = 1; equivalently, min_{x ∈ R^n} x^T (A − aI_n)(A − bI_n) x − c x^T x (3) s.t. x^T x ≤ 1, where c ≥ max_{1≤i≤n} (λ_i − a)(λ_i − b) + 1. Problem (3) has: a concave objective function; and a convex constraint set.

III. - a) Conversion... (Cont.) Properties of problem (3): 1) x* is a local minimizer of (3) ⟺ x* is a global minimizer of (3). 2) Let x* be a global minimizer of (3) and η = (x*)^T (A − aI_n)(A − bI_n) x*; then 2a) if η > 0, there exists no eigenvalue of A in [a, b]; 2b) if η ≤ 0, there exists at least one eigenvalue of A in [a, b]; 2c) if η = 0, one of the eigenvalues of A must be a or b.

III. - a) Conversion... (Cont.) If we combine problems (2) and (3), we have min_{x ∈ R^n} x^T H x − c x^T x (4) s.t. x^T x ≤ 1, where H = A for (2), H = (A − aI_n)(A − bI_n) for (3), and c ≥ λ_max(H) + 1. Note: the objective function is concave, so the solution always lies on the boundary x^T x = 1.

III. - b) Continuous models for eigenvalue problems. Dynamic Model 1. Merit function: f(x) = x^T H x − c x^T x. (5) Dynamical system: dx(t)/dt = −{ x − P_Ω[x − ∇f(x)] } =: −e(x), (6) where Ω = {x ∈ R^n | x^T x ≤ 1} and P_Ω(·) is the projection operator defined by P_Ω(y) = arg min_{x ∈ Ω} ‖x − y‖_2, y ∈ R^n.
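A minimal sketch of Model 1 for the extreme eigenvalue problem (H = A), integrating dx/dt = -e(x) with an off-the-shelf ODE solver; the small random test matrix, the time horizon, and the solver tolerances are my own choices, while c = ||H||_1 + 1 follows the choice reported later in the talk.

import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(4)
B = rng.standard_normal((8, 8))
A = (B + B.T) / 2
H = A
c = np.linalg.norm(H, 1) + 1.0            # c = ||H||_1 + 1

def proj(y):                              # P_Omega: projection onto the unit ball
    nrm = np.linalg.norm(y)
    return y if nrm <= 1.0 else y / nrm

def rhs(t, x):                            # -e(x) = -(x - P_Omega[x - grad f(x)])
    grad = 2.0 * (H @ x) - 2.0 * c * x    # grad f(x) for f(x) = x^T H x - c x^T x
    return -(x - proj(x - grad))

x0 = proj(np.ones(8))
sol = solve_ivp(rhs, [0.0, 200.0], x0, rtol=1e-6, atol=1e-9)
x_star = sol.y[:, -1]
print(x_star @ A @ x_star, np.linalg.eigvalsh(A)[0])   # computed vs. exact lambda_1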

III. - b) Continuous models...(cont.) Properties of Model 1: 1) For any x_0 ∈ R^n, there exists a unique solution x(t) of the dynamical system (6) with x(t_0) = x_0 on [t_0, +∞). 2) If e(x_0) = 0, then x(t) ≡ x_0 for all t ≥ t_0; if e(x_0) ≠ 0, then lim_{t→+∞} e(x(t)) = 0. 3) e(x) = 0 with x ≠ 0 ⟹ x is an eigenvector of H with ‖x‖_2 = 1. 4) If ‖x_0‖_2 > 1, ‖x(t)‖_2 is monotonically decreasing to 1; if ‖x_0‖_2 < 1, ‖x(t)‖_2 is monotonically increasing to 1; if ‖x_0‖_2 = 1, ‖x(t)‖_2 ≡ 1.

Figure 1: Dynamical trajectory of (6) (trajectories x(t) from different starting points x_0 approaching the unit sphere of radius 1).

III. - b) Continuous models...(cont.) 5) If x_0 ≠ 0, there exists x* such that lim_{t→+∞} x(t) = x* and ‖x*‖_2 = 1. 5.1) For H = A, we have lim_{t→+∞} x^T(t) A x(t) = λ_k, where k = min{ i | x_0^T u_i ≠ 0, i = 1, ..., n }. Note: (λ_1, x*) is what we want! 5.2) For H = (A − aI_n)(A − bI_n), we have lim_{t→+∞} x^T(t)(A − aI_n)(A − bI_n)x(t) = θ_k, where k = min{ i | x_0^T v_i ≠ 0, i = 1, ..., n }, H = V Θ V^T, Θ = diag(θ_1, θ_2, ..., θ_n), and V^T V = I_n with V = (v_1, v_2, ..., v_n). Note: this θ_k may not be an eigenvalue of A.

III. - b) Continuous models...(cont.) Steps to obtain an eigenvalue of A in [a, b]: 1) If θ_k = 0, one of a or b is an eigenvalue of A. This can be verified by checking the values of ‖Ax* − ax*‖_2 or ‖Ax* − bx*‖_2. 2) If θ_k < 0, solve (λ_k − a)(λ_k − b) = θ_k. Two λ_k values can be obtained; pick the one such that ‖Ax* − λ_k x*‖_2 is very small. An eigenvalue of A in [a, b] is found. 3) If θ_k > 0, a new starting point has to be selected to start over. If after several tries all θ_k's are positive, we may conclude that there is no eigenvalue of A in [a, b].
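Step 2 amounts to solving a scalar quadratic; a small helper (hypothetical, for illustration only) that returns the root with the smaller eigen-residual could look like this.

import numpy as np

def interior_eigenvalue(A, x_star, a, b, theta_k):
    # Roots of (lambda - a)(lambda - b) = theta_k,
    # i.e. lambda = (a + b)/2 +/- sqrt((a - b)^2/4 + theta_k).
    mid = 0.5 * (a + b)
    disc = 0.25 * (a - b) ** 2 + theta_k
    roots = [mid - np.sqrt(disc), mid + np.sqrt(disc)]
    # Keep the root with the smaller residual ||A x* - lambda x*||_2.
    return min(roots, key=lambda lam: np.linalg.norm(A @ x_star - lam * x_star))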

III. - b) Continuous models...(cont.) In numerical computation, we take c = ‖H‖_1 + 1. But the numerical results seem to be sensitive to the value of c: the larger c is, the worse the performance. Reason: let θ_1 be the smallest eigenvalue of H and θ_s (> θ_1) be the second smallest eigenvalue of H. Then we have ‖x(t) − x*‖_2 ≤ ε e^{2(θ_1 − θ_s)t / (1 + 2(c − θ_1))} ‖x_0 − x*‖_2, so the decay rate in the exponent deteriorates as c grows. Can we improve this?

III. - b) Continuous models...(cont.) Dynamic Model 2. Recall e(x) = x − P_Ω[x − ∇f(x)]. Now define e_γ(x) = x − P_Ω[x − γ∇f(x)], 0 < γ ≤ 1. Merit function (no change): f(x) = x^T H x − c x^T x. Dynamical system: dx(t)/dt = −α(x)∇f(x) − e_γ(x), (7) where α(x) = η x^T e_γ(x) / (x^T ∇f(x)) for x ≠ 0 and α(x) = γη for x = 0. It can be shown that α(x) is locally Lipschitz continuous.
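A sketch of Model 2's right-hand side as reconstructed above; the constants eta and gamma are assumptions (the slide does not fix them here), so this only illustrates the structure dx/dt = -alpha(x) grad f(x) - e_gamma(x).

import numpy as np

def make_model2_rhs(H, c, gamma=0.5, eta=1.0):
    def proj(y):                                  # projection onto the unit ball
        nrm = np.linalg.norm(y)
        return y if nrm <= 1.0 else y / nrm

    def rhs(t, x):
        grad = 2.0 * (H @ x) - 2.0 * c * x        # grad f(x)
        e_gamma = x - proj(x - gamma * grad)
        if np.allclose(x, 0.0):
            alpha = gamma * eta
        else:
            # x^T grad f(x) < 0 whenever x != 0 and c > lambda_max(H),
            # so the quotient below is well defined.
            alpha = eta * (x @ e_gamma) / (x @ grad)
        return -alpha * grad - e_gamma

    return rhs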

III. - b) Continuous models...(cont.) All the theoretical results of Model 1 also hold for Model 2. In addition, Model 2 enjoys df(x(t))/dt ≤ −α(x)‖∇f(x)‖_2^2 − (1/γ)‖e_γ(x)‖_2^2.

III. - c) Numerical results for eigenvalue problems. Example 1: we construct Example 1 in the following steps: 1. Select Λ = diag(−1e−4, 1e−4, 0, 0, 1, ..., 1) ∈ R^{n×n}. 2. Let B = rand(n, n) and [Q, R] = qr(B). 3. Define A = Q^T Λ Q. Example 2: Example 2 is similar to Example 1 except Λ = diag(−1, 1, 0, 0, 1, ..., 1) ∈ R^{n×n}. The two starting points used are x_0 = (1, ..., 1)^T and −x_0.
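The same construction in NumPy (a sketch, not the original MATLAB script; the seed and the smaller n are for illustration):

import numpy as np

def make_example1(n, seed=0):
    lam = np.ones(n)
    lam[:4] = [-1e-4, 1e-4, 0.0, 0.0]          # prescribed extreme eigenvalues
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.random((n, n)))    # B = rand(n, n); [Q, R] = qr(B)
    return Q.T @ np.diag(lam) @ Q              # A = Q^T Lambda Q

A = make_example1(500)
print(np.linalg.eigvalsh(A)[:4])               # approx. -1e-4, 0, 0, 1e-4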

III. - c) Numerical... (Cont.) Stopping criterion: ‖dx(t)/dt‖ ≤ 10^-6. Dynamical system solver: Matlab ODE45 with RelTol = 10^-6 and AbsTol = 10^-9.
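An analogous setup in SciPy (an assumption, not the talk's MATLAB code): the explicit Runge-Kutta 4(5) solver with the same tolerances, plus a terminal event that stops the integration once ||dx/dt|| <= 1e-6.

import numpy as np
from scipy.integrate import solve_ivp

def integrate_until_stationary(rhs, x0, t_max=1e4):
    def small_velocity(t, x):
        return np.linalg.norm(rhs(t, x)) - 1e-6   # stopping criterion ||dx/dt|| <= 1e-6
    small_velocity.terminal = True                # stop the integration at the event
    small_velocity.direction = -1                 # trigger when crossing from above
    return solve_ivp(rhs, [0.0, t_max], x0, method='RK45',
                     rtol=1e-6, atol=1e-9, events=small_velocity)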

III. - c) Numerical... (Cont.) Extreme eigenvalue problem: Model 1. Test 1: sensitivity to the initial point. We fix n = 5,000 and c = ‖A‖_1 + 1 (= 5.32 for Example 1 and = 8.04 for Example 2).
Table 1 (Model 1):
Starting point | Example 1 CPU(s) | Example 1 λ* + 10^-4 | Example 2 CPU(s) | Example 2 λ* + 1
x_0            | 469.1            | 3.0e-5               | 475.6            | 6.5e-6
P_Ω(x_0)       | 298.6            | 3.0e-5               | 306.3            | 5.8e-6
−x_0           | 469.0            | 3.0e-5               | 474.9            | 6.5e-6
P_Ω(−x_0)      | 301.3            | 3.0e-5               | 305.6            | 5.8e-6
(Here λ* = (x*)^T A x* is the computed eigenvalue.)

III. - c) Numerical... (Cont.) Test 2: sensitivity to c.
Table 2 (Model 1):
Starting point | c       | Example 1 CPU(s) | Example 1 λ* + 10^-4 | Example 2 CPU(s) | Example 2 λ* + 1
x_0/‖x_0‖_2    | 10      | 423.5            | 3.0e-5               | 361.8            | 3.8e-7
x_0/‖x_0‖_2    | default | 298.6            | 3.0e-5               | 306.3            | 5.8e-6
x_0/‖x_0‖_2    | 2       | 253.5            | 3.0e-5               | 258.3            | 4.0e-9
−x_0/‖x_0‖_2   | 10      | 422.4            | 3.0e-5               | 362.0            | 3.8e-7
−x_0/‖x_0‖_2   | default | 301.3            | 3.0e-5               | 305.6            | 5.8e-6
−x_0/‖x_0‖_2   | 2       | 252.6            | 3.0e-5               | 258.7            | 4.0e-9
(Here λ* = (x*)^T A x* is the computed eigenvalue.)

III. - c) Numerical... (Cont.) Test 3: computational cost (we fix c = 2).
Table 3 (Model 1):
n     | Starting point | Example 1 CPU(s) | Example 1 λ* + 10^-4 | Example 2 CPU(s) | Example 2 λ* + 1
1,000 | x_0/‖x_0‖_2    | 13253            | 7.4e-6               | 12.4             | 1.8e-9
1,000 | −x_0/‖x_0‖_2   | 13290            | 7.4e-6               | 12.3             | 1.8e-9
2,500 | x_0/‖x_0‖_2    | 10694            | 1.7e-5               | 68.4             | 1.7e-9
2,500 | −x_0/‖x_0‖_2   | 10729            | 1.7e-5               | 69.3             | 1.7e-9
5,000 | x_0/‖x_0‖_2    | 253.5            | 3.0e-5               | 258.3            | 4.0e-9
5,000 | −x_0/‖x_0‖_2   | 252.6            | 3.0e-5               | 258.7            | 4.0e-9
7,500 | x_0/‖x_0‖_2    | 857.5            | 7.1e-5               | 951.5            | 6.7e-9
7,500 | −x_0/‖x_0‖_2   | 861.8            | 7.1e-5               | 953.8            | 6.7e-9
(Here λ* = (x*)^T A x* is the computed eigenvalue.) The CPU time grows at a rate of n^{2+ε}, where ε > 0.

III. - c) Numerical... (Cont.) Interior eigenvalue problem: Model 1. Test 4: no eigenvalue in the defined interval. We select [a, b] = [−3×10^-4, −2×10^-4] and fix n = 5,000.
Table 4 (Model 1):
Quantity                    | P_Ω(x_0), c: def. | P_Ω(x_0), c = 2 | P_Ω(−x_0), c: def. | P_Ω(−x_0), c = 2
Example 1: CPU(s)           | 280.6             | 105.2           | 279.5              | 107.7
Example 1: λ = (x*)^T A x*  | -7e-5             | -7e-5           | -7e-5              | -7e-5
Example 1: ‖Ax* − λx*‖      | 2e-5              | 4e-6            | 2e-5               | 4e-6
Example 1: θ_k              | 1e-6              | 5e-8            | 1e-6               | 5e-8
Example 2: CPU(s)           | 538.3             | 111.0           | 534.4              | 106.0
Example 2: λ = (x*)^T A x*  | 1e-5              | 2e-8            | 1e-5               | 2e-8
Example 2: ‖Ax* − λx*‖      | 6e-5              | 2e-6            | 6e-5               | 2e-6
Example 2: θ_k              | 1e-5              | 8e-8            | 1e-5               | 8e-8

III. - c) Numerical... (Cont.) Test 5: one eigenvalue in the defined interval. We select [a, b] = [0.9, 1.1] and fix n = 5,000.
Table 5 (Model 1):
Quantity                    | P_Ω(x_0), c: def. | P_Ω(x_0), c = 2 | P_Ω(−x_0), c: def. | P_Ω(−x_0), c = 2
Example 1: CPU(s)           | 85.5              | 45.6            | 105.6              | 30.6
Example 1: λ = (x*)^T A x*  | 1-1e-6            | 1+4e-9          | 1-1e-6             | 1+4e-9
Example 1: ‖Ax* − λx*‖      | 2.9e-5            | 2.4e-6          | 2.9e-5             | 2.4e-6
Example 1: θ_k              | -0.01             | -0.01           | -0.01              | -0.01
Example 2: CPU(s)           | 147.0             | 65.7            | 142.4              | 66.3
Example 2: λ = (x*)^T A x*  | 1-2e-6            | 1+6e-8          | 1-2e-6             | 1+6e-8
Example 2: ‖Ax* − λx*‖      | 6.4e-5            | 2.3e-6          | 6.4e-5             | 2.3e-6
Example 2: θ_k              | -0.01             | -0.01           | -0.01              | -0.01

III. - c) Numerical... (Cont.) Extreme eigenvalue problem: Model 2. We fix the initial value at P_Ω(x_0) and select c_i = ‖A‖_1 + 10^i, i = 0, 1, 2.
Table 6: CPU(s) for Model 1 (Model 2) of Example 1:
c   | n = 1,000     | n = 3,000     | n = 5,000
c_0 | 11.69 (8.172) | 102.0 (63.73) | 281.1 (175.5)
c_1 | 18.08 (7.828) | 155.1 (61.34) | 458.4 (173.1)
c_2 | 61.33 (24.89) | 518.6 (98.13) | 1515 (666.1)
The CPU time grows at a rate of n^2 for Model 2.

III. - c) Numerical... (Cont.) Interior eigenvalue problem: Model 2. We select [a, b]^(1) = [−3×10^-4, −2×10^-4], [a, b]^(2) = [0.9, 1.1], and [a, b]^(3) = [0, 2]. In addition, we fix n = 5,000, the initial value at P_Ω(x_0), and select c_i = ‖A‖_1 + 10^i, i = 0, 1, 2.
Table 7: CPU(s) for Model 1 (Model 2) of Example 1:
c   | [a, b]^(1)    | [a, b]^(2)    | [a, b]^(3)
c_0 | 505.6 (230.6) | 91.03 (34.08) | 139.9 (30.52)
c_1 | 603.2 (293.8) | 104.5 (32.30) | 157.9 (28.86)
c_2 | 1343 (577.6)  | 207.6 (26.34) | 313.2 (22.92)

Why is Model 2 much better than Model 1? Answer: we don't know yet; this is work in progress.

Concluding remarks. The continuous method is powerful, attractive, and still under development.