A top nine list: Most popular induced matrix norms

Andrew D. Lewis
Professor, Department of Mathematics and Statistics, Queen's University, Kingston, ON K7L 3N6, Canada
Email: andrew.lewis@queensu.ca, URL: http://www.mast.queensu.ca/~andrew/
2010/03/20

Abstract. Explicit formulae are given for the nine possible induced matrix norms corresponding to the 1-, 2-, and $\infty$-norms for Euclidean space. The complexity of computing these norms is investigated.

Keywords. Induced norm.

AMS Subject Classifications (2010). 15A60

1. Introduction

Arguably the most commonly used norms for real Euclidean space $\mathbb{R}^n$ are the norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ defined by
$$\|x\|_1 = \sum_{j=1}^n |x_j|, \qquad \|x\|_2 = \Big(\sum_{j=1}^n |x_j|^2\Big)^{1/2}, \qquad \|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\},$$
respectively, for $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Let $L(\mathbb{R}^n; \mathbb{R}^m)$ be the set of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$, which we identify with the set of $m \times n$ matrices in the usual way. If $A \in L(\mathbb{R}^n; \mathbb{R}^m)$ and if $p, q \in \{1, 2, \infty\}$, then the norm of $A$ induced by the $p$-norm on $\mathbb{R}^n$ and the $q$-norm on $\mathbb{R}^m$ is
$$\|A\|_{p,q} = \sup\{\|A(x)\|_q \mid \|x\|_p = 1\}.$$
This is well known to define a norm on $L(\mathbb{R}^n; \mathbb{R}^m)$. There are other equivalent characterisations of the induced norm, but the one given above is the only one we will need. We refer to [Horn and Johnson 2013] for a general discussion of induced matrix norms.

For certain combinations of $(p, q)$, explicit expressions for $\|\cdot\|_{p,q}$ are known. For example, in [Horn and Johnson 2013] expressions are given in the cases $(1,1)$ (in §5.6.4), $(2,2)$ (§5.6.6), and $(\infty,\infty)$ (§5.6.5). In [Rohn 2000] the case $(\infty,1)$ is studied, and its computation is shown to be NP-hard. The case $(2,1)$ is given by Drakakis and Pearlmutter [2009], although the details of the degenerate case given there are a little sketchy. Drakakis and Pearlmutter also list all of the other combinations except $(2,\infty)$, for which no expression seems to be available, and which we give here, apparently for the first time. The formula given by Drakakis and Pearlmutter for $(\infty,2)$ is presented without reference or proof, and is incorrect, probably a typographical error.

Here we present the correct formulae for all nine of the induced norms. Although most of these formulae are known in the literature, we give proofs in all nine cases so that, for the first time, all proofs for all cases are given in one place. We also analyse the computational complexity of computing these various norms.
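Before the formulae are stated in the next section, it may help to see the definition in action numerically. The following sketch (ours, not part of the paper; the name `induced_norm_bruteforce` is made up) estimates $\|A\|_{p,q}$ directly from the definition by sampling random directions normalised in the $p$-norm. It yields a lower bound that approaches the true value as the number of samples grows, and can be checked against NumPy's built-in induced norms in the three classical cases.

```python
import numpy as np

def induced_norm_bruteforce(A, p, q, samples=200_000, seed=0):
    """Monte Carlo lower bound for ||A||_{p,q} = sup{ ||A x||_q : ||x||_p = 1 }."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x, ord=p)          # normalise in the p-norm
        best = max(best, np.linalg.norm(A @ x, ord=q))
    return best

A = np.array([[1.0, -2.0], [3.0, 0.5]])
print(induced_norm_bruteforce(A, 1, 1), np.linalg.norm(A, 1))                 # max column 1-norm
print(induced_norm_bruteforce(A, 2, 2), np.linalg.norm(A, 2))                 # largest singular value
print(induced_norm_bruteforce(A, np.inf, np.inf), np.linalg.norm(A, np.inf))  # max row 1-norm
```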

Here is the notation we use. By $\{e_1, \ldots, e_n\}$ we denote the standard basis for $\mathbb{R}^n$. For a matrix $A \in L(\mathbb{R}^n; \mathbb{R}^m)$, $r(A, a) \in \mathbb{R}^n$ denotes the $a$th row and $c(A, j) \in \mathbb{R}^m$ denotes the $j$th column. The components of $A$ are denoted by $A_{aj}$, $a \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$. The transpose of $A$ is denoted by $A^T$. The Euclidean inner product is denoted by $\langle\cdot,\cdot\rangle$. For a differentiable map $f \colon \mathbb{R}^n \to \mathbb{R}^m$, $Df(x) \in L(\mathbb{R}^n; \mathbb{R}^m)$ denotes the derivative of $f$ at $x$. For a set $X$, $2^X$ denotes the power set of $X$.

2. Formulae for induced norms

1 Theorem: Let $p, q \in \{1, 2, \infty\}$ and let $A \in L(\mathbb{R}^n; \mathbb{R}^m)$. The induced norm $\|\cdot\|_{p,q}$ satisfies the following formulae:

(i) $\|A\|_{1,1} = \max\{\|c(A,j)\|_1 \mid j \in \{1,\ldots,n\}\}$;

(ii) $\|A\|_{1,2} = \max\{\|c(A,j)\|_2 \mid j \in \{1,\ldots,n\}\}$;

(iii) $\|A\|_{1,\infty} = \max\{|A_{aj}| \mid a \in \{1,\ldots,m\},\ j \in \{1,\ldots,n\}\} = \max\{\|c(A,j)\|_\infty \mid j \in \{1,\ldots,n\}\} = \max\{\|r(A,a)\|_\infty \mid a \in \{1,\ldots,m\}\}$;

(iv) $\|A\|_{2,1} = \max\{\|A^T(u)\|_2 \mid u \in \{-1,1\}^m\}$;

(v) $\|A\|_{2,2} = \max\{\sqrt{\lambda} \mid \lambda \text{ is an eigenvalue of } A^T A\}$;

(vi) $\|A\|_{2,\infty} = \max\{\|r(A,a)\|_2 \mid a \in \{1,\ldots,m\}\}$;

(vii) $\|A\|_{\infty,1} = \max\{\|A(u)\|_1 \mid u \in \{-1,1\}^n\}$;

(viii) $\|A\|_{\infty,2} = \max\{\|A(u)\|_2 \mid u \in \{-1,1\}^n\}$;

(ix) $\|A\|_{\infty,\infty} = \max\{\|r(A,a)\|_1 \mid a \in \{1,\ldots,m\}\}$.
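For concreteness, here is a direct NumPy transcription of these nine formulae (our sketch, not part of the paper; the function name `induced_norm` is ours). Cases (iv), (vii) and (viii) enumerate the sign vectors in $\{-1,1\}^m$ or $\{-1,1\}^n$ exactly as stated, so they are exponential in the relevant dimension and only practical for small matrices; the remaining cases reduce to column norms, row norms, or an eigenvalue computation.

```python
import itertools
import numpy as np

def induced_norm(A, p, q):
    """Evaluate ||A||_{p,q} for p, q in {1, 2, np.inf} via the formulae of Theorem 1."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if p == 1:                         # (i)-(iii): maximum column q-norm
        return max(np.linalg.norm(A[:, j], ord=q) for j in range(n))
    if p == 2 and q == 1:              # (iv): enumerate sign vectors in R^m
        return max(np.linalg.norm(A.T @ np.array(u), 2)
                   for u in itertools.product((-1.0, 1.0), repeat=m))
    if p == 2 and q == 2:              # (v): square root of largest eigenvalue of A^T A
        return float(np.sqrt(np.linalg.eigvalsh(A.T @ A)[-1]))
    if p == 2 and q == np.inf:         # (vi): maximum row 2-norm
        return max(np.linalg.norm(A[a, :], 2) for a in range(m))
    if p == np.inf and q in (1, 2):    # (vii), (viii): enumerate sign vectors in R^n
        return max(np.linalg.norm(A @ np.array(u), ord=q)
                   for u in itertools.product((-1.0, 1.0), repeat=n))
    if p == np.inf and q == np.inf:    # (ix): maximum row 1-norm
        return max(np.linalg.norm(A[a, :], 1) for a in range(m))
    raise ValueError("p and q must lie in {1, 2, inf}")
```

Comparing `induced_norm` against the brute-force estimator above, or against `numpy.linalg.norm` in the $(1,1)$, $(2,2)$ and $(\infty,\infty)$ cases, is a convenient check on the formulae.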

Proof: (i) We compute
$$\begin{aligned}
\|A\|_{1,1} &= \sup\{\|A(x)\|_1 \mid \|x\|_1 = 1\} = \sup\Big\{\sum_{a=1}^m |\langle r(A,a), x\rangle| \;\Big|\; \|x\|_1 = 1\Big\}\\
&\le \sup\Big\{\sum_{a=1}^m \sum_{j=1}^n |A_{aj}|\,|x_j| \;\Big|\; \|x\|_1 = 1\Big\} = \sup\Big\{\sum_{j=1}^n |x_j|\Big(\sum_{a=1}^m |A_{aj}|\Big) \;\Big|\; \|x\|_1 = 1\Big\}\\
&\le \max\Big\{\sum_{a=1}^m |A_{aj}| \;\Big|\; j \in \{1,\ldots,n\}\Big\} = \max\{\|c(A,j)\|_1 \mid j \in \{1,\ldots,n\}\}.
\end{aligned}$$
To establish the opposite inequality, suppose that $k \in \{1,\ldots,n\}$ is such that
$$\|c(A,k)\|_1 = \max\{\|c(A,j)\|_1 \mid j \in \{1,\ldots,n\}\}.$$
Then
$$\|A(e_k)\|_1 = \sum_{a=1}^m \Big|\sum_{j=1}^n A_{aj}\,e_{k,j}\Big| = \sum_{a=1}^m |A_{ak}| = \|c(A,k)\|_1.$$
Thus
$$\|A\|_{1,1} \ge \max\{\|c(A,j)\|_1 \mid j \in \{1,\ldots,n\}\},$$
since $\|e_k\|_1 = 1$.

(ii) Writing $A(x) = \sum_{j=1}^n x_j\, c(A,j)$ and using the triangle inequality for $\|\cdot\|_2$, we compute
$$\begin{aligned}
\|A\|_{1,2} &= \sup\{\|A(x)\|_2 \mid \|x\|_1 = 1\} = \sup\Big\{\Big\|\sum_{j=1}^n x_j\, c(A,j)\Big\|_2 \;\Big|\; \|x\|_1 = 1\Big\}\\
&\le \sup\Big\{\sum_{j=1}^n |x_j|\,\|c(A,j)\|_2 \;\Big|\; \|x\|_1 = 1\Big\} \le \max\{\|c(A,j)\|_2 \mid j \in \{1,\ldots,n\}\}.
\end{aligned}$$
To establish the other inequality, note that if we take $k \in \{1,\ldots,n\}$ such that
$$\|c(A,k)\|_2 = \max\{\|c(A,j)\|_2 \mid j \in \{1,\ldots,n\}\},$$
then we have
$$\|A(e_k)\|_2 = \Big(\sum_{a=1}^m \Big(\sum_{j=1}^n A_{aj}\,e_{k,j}\Big)^2\Big)^{1/2} = \Big(\sum_{a=1}^m A_{ak}^2\Big)^{1/2} = \|c(A,k)\|_2.$$
Thus
$$\|A\|_{1,2} \ge \max\{\|c(A,j)\|_2 \mid j \in \{1,\ldots,n\}\},$$
since $\|e_k\|_1 = 1$.

(iii) Here we compute
$$\begin{aligned}
\|A\|_{1,\infty} &= \sup\{\|A(x)\|_\infty \mid \|x\|_1 = 1\} = \sup\Big\{\max\Big\{\Big|\sum_{j=1}^n A_{aj}x_j\Big| \;\Big|\; a \in \{1,\ldots,m\}\Big\} \;\Big|\; \|x\|_1 = 1\Big\}\\
&\le \sup\Big\{\max\{|A_{aj}| \mid j \in \{1,\ldots,n\},\ a \in \{1,\ldots,m\}\}\Big(\sum_{j=1}^n |x_j|\Big) \;\Big|\; \|x\|_1 = 1\Big\}\\
&= \max\{|A_{aj}| \mid j \in \{1,\ldots,n\},\ a \in \{1,\ldots,m\}\}.
\end{aligned}$$

For the converse inequality, let $k \in \{1,\ldots,n\}$ be such that
$$\max\{|A_{ak}| \mid a \in \{1,\ldots,m\}\} = \max\{|A_{aj}| \mid j \in \{1,\ldots,n\},\ a \in \{1,\ldots,m\}\}.$$
Then
$$\|A(e_k)\|_\infty = \max\Big\{\Big|\sum_{j=1}^n A_{aj}\,e_{k,j}\Big| \;\Big|\; a \in \{1,\ldots,m\}\Big\} = \max\{|A_{ak}| \mid a \in \{1,\ldots,m\}\}.$$
Thus
$$\|A\|_{1,\infty} \ge \max\{|A_{aj}| \mid j \in \{1,\ldots,n\},\ a \in \{1,\ldots,m\}\},$$
since $\|e_k\|_1 = 1$.

(iv) In this case we maximise the function $x \mapsto \|A(x)\|_1$ subject to the constraint that $\|x\|_2^2 = 1$. We shall do this using the Lagrange Multiplier Theorem [e.g., Edwards 1973, §II.5], defining
$$f(x) = \|A(x)\|_1, \qquad g(x) = \|x\|_2^2 - 1.$$
Let us first assume that none of the rows of $A$ are zero. We must exercise some care because $f$ is not differentiable on $\mathbb{R}^n$. However, $f$ is differentiable at points off the set
$$B_A = \{x \in \mathbb{R}^n \mid \text{there exists } a \in \{1,\ldots,m\} \text{ such that } \langle r(A,a), x\rangle = 0\}.$$
To facilitate computations, let us define $u_A \colon \mathbb{R}^n \to \mathbb{R}^m$ by asking that $u_{A,a}(x) = \mathrm{sign}(\langle r(A,a), x\rangle)$. Note that $B_A = \bigcup_{a=1}^m u_{A,a}^{-1}(0)$ and that on $\mathbb{R}^n \setminus B_A$ the function $u_A$ is locally constant. Moreover, it is clear that $f(x) = \langle u_A(x), A(x)\rangle$.

Now let $x_0 \in \mathbb{R}^n \setminus B_A$ be a maximum of $f$ subject to the constraint that $g(x) = 0$. One easily verifies that $Dg$ has rank 1 at points that satisfy the constraint. Thus, by the Lagrange Multiplier Theorem, there exists $\lambda \in \mathbb{R}$ such that $D(f - \lambda g)(x_0) = 0$. We compute
$$Df(x_0)\cdot v = \langle u_A(x_0), A(v)\rangle, \qquad Dg(x)\cdot v = 2\langle x, v\rangle.$$
Thus $D(f - \lambda g)(x_0) = 0$ if and only if
$$A^T(u_A(x_0)) = 2\lambda x_0,$$
which gives $\lambda = \tfrac{1}{2}\|A^T(u_A(x_0))\|_2$, since $\|x_0\|_2 = 1$. Thus $\lambda = 0$ if and only if $A^T(u_A(x_0)) = 0$. Therefore, if $\lambda = 0$, then $f(x_0) = 0$. We can disregard this possibility since $f$ cannot have a maximum of zero, as we are assuming that $A$ has no zero rows. As $\lambda \neq 0$ we have
$$f(x_0) = \langle A^T(u_A(x_0)), x_0\rangle = \tfrac{1}{2\lambda}\|A^T(u_A(x_0))\|_2^2 = 2\lambda.$$

We conclude that, at solutions of the constrained maximisation problem, we must have $f(x_0) = \|A^T(u)\|_2$, where $u$ varies over the nonzero points in the image of $u_A$, i.e., over points from $\{-1,1\}^m$. This would conclude the proof of this part of the theorem in the case that $A$ has no zero rows, but for the fact that it is possible that $f$ attains its maximum on $B_A$. We now show that this does not happen.

Let $x_0 \in B_A$ satisfy $\|x_0\|_2 = 1$ and denote
$$A_0 = \{a \in \{1,\ldots,m\} \mid u_{A,a}(x_0) = 0\}.$$
Let $A_1 = \{1,\ldots,m\} \setminus A_0$. Let $a_0 \in A_0$. For $\epsilon \in \mathbb{R}$ define
$$x_\epsilon = \frac{x_0 + \epsilon\, r(A,a_0)}{\sqrt{1 + \epsilon^2\|r(A,a_0)\|_2^2}}.$$
Note that $x_\epsilon$ satisfies the constraint $\|x_\epsilon\|_2^2 = 1$. Now let $\epsilon_0 \in \mathbb{R}_{>0}$ be sufficiently small that $\langle r(A,a), x_\epsilon\rangle \neq 0$ for all $a \in A_1$ and $\epsilon \in [-\epsilon_0, \epsilon_0]$. Then we compute
$$\|A(x_\epsilon)\|_1 = \sum_{a=1}^m |\langle r(A,a), x_0\rangle + \epsilon\langle r(A,a), r(A,a_0)\rangle| + O(\epsilon^2) = \sum_{a \in A_0} |\epsilon|\,|\langle r(A,a), r(A,a_0)\rangle| + \sum_{a \in A_1} |\langle r(A,a), x_0\rangle + \epsilon\langle r(A,a), r(A,a_0)\rangle| + O(\epsilon^2). \quad (1)$$
Since we are assuming that none of the rows of $A$ are zero,
$$\sum_{a \in A_0} |\epsilon|\,|\langle r(A,a), r(A,a_0)\rangle| > 0 \quad (2)$$
for nonzero $\epsilon \in [-\epsilon_0, \epsilon_0]$, as long as $\epsilon_0$ is sufficiently small. Now take $a \in A_1$. If $\epsilon$ is sufficiently small we can write
$$|\langle r(A,a), x_0\rangle + \epsilon\langle r(A,a), r(A,a_0)\rangle| = |\langle r(A,a), x_0\rangle| + \epsilon C_a$$
for some $C_a \in \mathbb{R}$. As a result, and using (1), we have
$$\|A(x_\epsilon)\|_1 = \|A(x_0)\|_1 + \sum_{a \in A_0} |\epsilon|\,|\langle r(A,a), r(A,a_0)\rangle| + \epsilon\sum_{a \in A_1} C_a + O(\epsilon^2).$$
It therefore follows, by choosing $\epsilon_0$ to be sufficiently small, that we have $\|A(x_\epsilon)\|_1 > \|A(x_0)\|_1$ either for all $\epsilon \in [-\epsilon_0, 0)$ or for all $\epsilon \in (0, \epsilon_0]$, taking (2) into account. Thus if $x_0 \in B_A$ then $x_0$ is not a local maximum for $f$ subject to the constraint $g^{-1}(0)$.

Finally, suppose that $A$ has some rows that are zero. Let
$$A_0 = \{a \in \{1,\ldots,m\} \mid r(A,a) = 0\}$$
and let $A_1 = \{1,\ldots,m\} \setminus A_0$. Let $A_1 = \{a_1, \ldots, a_k\}$ with $a_1 < \cdots < a_k$, and define $\hat{A} \in L(\mathbb{R}^n; \mathbb{R}^k)$ by
$$\hat{A}(x) = \sum_{r=1}^k \langle r(A, a_r), x\rangle\, e_r,$$
and note that $\|A(x)\|_1 = \|\hat{A}(x)\|_1$ for every $x \in \mathbb{R}^n$. If $y \in \mathbb{R}^m$, define $\hat{y} \in \mathbb{R}^k$ by removing from $y$ the elements corresponding to the zero rows of $A$: $\hat{y} = (y_{a_1}, \ldots, y_{a_k})$. Then we easily determine that $A^T(y) = \hat{A}^T(\hat{y})$. Therefore,
$$\|A\|_{2,1} = \sup\{\|A(x)\|_1 \mid \|x\|_2 = 1\} = \sup\{\|\hat{A}(x)\|_1 \mid \|x\|_2 = 1\} = \|\hat{A}\|_{2,1} = \max\{\|\hat{A}^T(\hat{u})\|_2 \mid \hat{u} \in \{-1,1\}^k\} = \max\{\|A^T(u)\|_2 \mid u \in \{-1,1\}^m\},$$
and this finally gives the result.

(v) Note that, in this case, we wish to maximise the function $x \mapsto \|A(x)\|_2^2$ subject to the constraint that $\|x\|_2^2 = 1$. In this case, the function we are maximising and the function defining the constraint are infinitely differentiable. Therefore, we can use the Lagrange Multiplier Theorem to determine the character of the maxima. Thus we define
$$f(x) = \|A(x)\|_2^2, \qquad g(x) = \|x\|_2^2 - 1.$$
As $Dg$ has rank 1 at points satisfying the constraint, if a point $x_0 \in \mathbb{R}^n$ solves the constrained maximisation problem, then there exists $\lambda \in \mathbb{R}$ such that
$$D(f - \lambda g)(x_0) = 0.$$
Since $f(x) = \langle A^T A(x), x\rangle$, we compute
$$Df(x)\cdot v = 2\langle A^T A(x), v\rangle.$$
As above, $Dg(x)\cdot v = 2\langle x, v\rangle$. Thus $D(f - \lambda g)(x_0) = 0$ implies that $A^T A(x_0) = \lambda x_0$. Thus it must be the case that $\lambda$ is an eigenvalue of $A^T A$ with eigenvector $x_0$. Since $A^T A$ is symmetric and positive-semidefinite, all eigenvalues are real and nonnegative. Thus there exist $\lambda_1, \ldots, \lambda_n \in \mathbb{R}_{\ge 0}$ and vectors $x_1, \ldots, x_n$ such that $\lambda_1 \ge \cdots \ge \lambda_n$, such that $A^T A(x_j) = \lambda_j x_j$, $j \in \{1,\ldots,n\}$, and such that a solution to the problem of maximising $f$ subject to the constraint $g^{-1}(0)$ is obtained by evaluating $f$ at one of the points $x_1, \ldots, x_n$.

Thus the problem can be solved by evaluating $f$ at this finite collection of points, and determining at which of these $f$ has its largest value. A computation gives $f(x_j) = \lambda_j$, and this part of the result follows.

(vi) First of all, we note that this part of the theorem certainly holds when $A = 0$. Thus we shall freely assume that $A$ is nonzero when convenient. We maximise the function $x \mapsto \|A(x)\|_\infty$ subject to the constraint that $\|x\|_2^2 = 1$. We shall again use the Lagrange Multiplier Theorem, defining
$$f(x) = \|A(x)\|_\infty, \qquad g(x) = \|x\|_2^2 - 1.$$
Note that $f$ is not differentiable on all of $\mathbb{R}^n$, so we first restrict to a subset where $f$ is differentiable. Let us define
$$S_A \colon \mathbb{R}^n \to 2^{\{1,\ldots,m\}}, \qquad x \mapsto \{a \in \{1,\ldots,m\} \mid |\langle r(A,a), x\rangle| = \|A(x)\|_\infty\}.$$
Then denote
$$B_A = \{x \in \mathbb{R}^n \mid \mathrm{card}(S_A(x)) > 1\}.$$
We easily see that $f$ is differentiable at points that are not in the set $B_A$.

Let us first suppose that $x_0 \in \mathbb{R}^n \setminus B_A$ is a maximum of $f$ subject to the constraint that $g(x) = 0$. Then there exists a unique $a_0 \in \{1,\ldots,m\}$ such that $f(x_0) = |\langle r(A,a_0), x_0\rangle|$. Since we are assuming that $A$ is nonzero, it must be that $r(A,a_0)$ is nonzero. Moreover, there exists a neighbourhood $U$ of $x_0$ such that $\mathrm{sign}(\langle r(A,a_0), x\rangle) = \mathrm{sign}(\langle r(A,a_0), x_0\rangle)$ and $f(x) = |\langle r(A,a_0), x\rangle|$ for each $x \in U$. Abbreviating
$$u_{A,a_0}(x) = \mathrm{sign}(\langle r(A,a_0), x\rangle),$$
we have $f(x) = u_{A,a_0}(x_0)\langle r(A,a_0), x\rangle$ for every $x \in U$. Note that, as in the proofs of parts (iv) and (v) above, $Dg(x)$ has rank 1 for $x \neq 0$. Therefore there exists $\lambda \in \mathbb{R}$ such that $D(f - \lambda g)(x_0) = 0$. We compute
$$D(f - \lambda g)(x_0)\cdot v = u_{A,a_0}(x_0)\langle r(A,a_0), v\rangle - 2\lambda\langle x_0, v\rangle$$
for every $v \in \mathbb{R}^n$. Thus we must have $2\lambda x_0 = u_{A,a_0}(x_0)\, r(A,a_0)$. This implies that $x_0$ and $r(A,a_0)$ are collinear and that $\lambda = \tfrac{1}{2}\|r(A,a_0)\|_2$, since $\|x_0\|_2 = 1$.

Therefore,
$$f(x_0) = u_{A,a_0}(x_0)\Big\langle r(A,a_0),\ \tfrac{1}{2\lambda}\,u_{A,a_0}(x_0)\, r(A,a_0)\Big\rangle = 2\lambda.$$
Since $\lambda = \tfrac{1}{2}\|r(A,a_0)\|_2$ it follows that $f(x_0) = \|r(A,a_0)\|_2$. This completes the proof, but for the fact that maxima of $f$ may occur at points in $B_A$.

Thus let $x_0 \in B_A$ be such that $\|x_0\|_2 = 1$. For $a \in S_A(x_0)$ let us write $r(A,a) = \rho_a x_0 + y_a$, where $\langle x_0, y_a\rangle = 0$. Therefore, $\langle r(A,a), x_0\rangle = \rho_a$. We claim that if there exists $a_0 \in S_A(x_0)$ for which $y_{a_0} \neq 0$, then $x_0$ cannot be a maximum of $f$ subject to the constraint $g^{-1}(0)$. Indeed, if $y_{a_0} \neq 0$ then define
$$x_\epsilon = \frac{x_0 + \epsilon\, y_{a_0}}{\sqrt{1 + \epsilon^2\|y_{a_0}\|_2^2}}.$$
As in the proof of part (iv) above, one shows that $x_\epsilon$ satisfies the constraint for every $\epsilon \in \mathbb{R}$. Also as in the proof of part (iv), we have
$$x_\epsilon = x_0 + \epsilon\, y_{a_0} + O(\epsilon^2).$$
Thus, for $\epsilon$ sufficiently small,
$$|\langle r(A,a_0), x_\epsilon\rangle| = |\langle r(A,a_0), x_0\rangle| + \epsilon\, C_{a_0} + O(\epsilon^2),$$
where $C_{a_0}$ is nonzero. Therefore, there exists $\epsilon_0 \in \mathbb{R}_{>0}$ such that $|\langle r(A,a_0), x_\epsilon\rangle| > |\langle r(A,a_0), x_0\rangle|$ either for all $\epsilon \in [-\epsilon_0, 0)$ or for all $\epsilon \in (0, \epsilon_0]$. In either case, $x_0$ cannot be a maximum for $f$ subject to the constraint $g^{-1}(0)$.

Finally, suppose that $x_0 \in B_A$ is a maximum for $f$ subject to the constraint $g^{-1}(0)$. Then, as we saw in the preceding paragraph, for each $a \in S_A(x_0)$ we must have $r(A,a) = \langle r(A,a), x_0\rangle\, x_0$. It follows that $\|r(A,a)\|_2^2 = \langle r(A,a), x_0\rangle^2$. Moreover, by definition of $S_A(x_0)$, and since we are supposing that $x_0$ is a maximum for $f$ subject to the constraint $g^{-1}(0)$, we have
$$\|r(A,a)\|_2 = \|A\|_{2,\infty}. \quad (3)$$
Now, if $a \in \{1,\ldots,m\}$, we claim that
$$\|r(A,a)\|_2 \le \|A\|_{2,\infty}. \quad (4)$$
Indeed, suppose that $a \in \{1,\ldots,m\}$ satisfies $\|r(A,a)\|_2 > \|A\|_{2,\infty}$.

Define $x = \frac{r(A,a)}{\|r(A,a)\|_2}$, so that $x$ satisfies the constraint $g(x) = 0$. Moreover,
$$f(x) \ge |\langle r(A,a), x\rangle| = \|r(A,a)\|_2 > \|A\|_{2,\infty},$$
contradicting the assumption that $x_0$ is a maximum for $f$. Thus, given that (3) holds for every $a \in S_A(x_0)$ and (4) holds for every $a \in \{1,\ldots,m\}$, we have
$$\|A\|_{2,\infty} = \max\{\|r(A,a)\|_2 \mid a \in \{1,\ldots,m\}\},$$
as desired.

For the last three parts of the theorem, the following result is useful.

Lemma: Let $\|\cdot\|$ be a norm on $\mathbb{R}^m$ and let $\|\cdot\|_{\infty,\|\cdot\|}$ be the norm induced on $L(\mathbb{R}^n; \mathbb{R}^m)$ by the $\infty$-norm on $\mathbb{R}^n$ and the norm $\|\cdot\|$ on $\mathbb{R}^m$. Then
$$\|A\|_{\infty,\|\cdot\|} = \max\{\|A(u)\| \mid u \in \{-1,1\}^n\}.$$

Proof: Note that the set $\{x \in \mathbb{R}^n \mid \|x\|_\infty \le 1\}$ is a convex polytope. Therefore, this set is the convex hull of the vertices $\{-1,1\}^n$; see [Webster 1994, Theorem 2.6.16]. Thus, if $\|x\|_\infty = 1$ we can write
$$x = \sum_{u \in \{-1,1\}^n} \lambda_u u,$$
where $\lambda_u \in [0,1]$ for each $u \in \{-1,1\}^n$ and $\sum_{u \in \{-1,1\}^n} \lambda_u = 1$. Therefore,
$$\|A(x)\| = \Big\|\sum_{u \in \{-1,1\}^n} \lambda_u A(u)\Big\| \le \sum_{u \in \{-1,1\}^n} \lambda_u\|A(u)\| \le \max\{\|A(u)\| \mid u \in \{-1,1\}^n\}\sum_{u \in \{-1,1\}^n} \lambda_u = \max\{\|A(u)\| \mid u \in \{-1,1\}^n\}.$$
Therefore,
$$\sup\{\|A(x)\| \mid \|x\|_\infty = 1\} \le \max\{\|A(u)\| \mid u \in \{-1,1\}^n\} \le \sup\{\|A(x)\| \mid \|x\|_\infty = 1\},$$
the last inequality holding since if $u \in \{-1,1\}^n$ then $\|u\|_\infty = 1$. The result follows since the previous inequalities must be equalities.

(vii) This follows immediately from the preceding lemma.

(viii) This too follows immediately from the preceding lemma.

(ix) Note that for $u \in \{-1,1\}^n$ we have
$$|\langle r(A,a), u\rangle| = \Big|\sum_{j=1}^n A_{aj}u_j\Big| \le \sum_{j=1}^n |A_{aj}| = \|r(A,a)\|_1.$$
Therefore, using the previous lemma,
$$\|A\|_{\infty,\infty} = \max\{\|A(u)\|_\infty \mid u \in \{-1,1\}^n\} = \max\{\max\{|\langle r(A,a), u\rangle| \mid a \in \{1,\ldots,m\}\} \mid u \in \{-1,1\}^n\} \le \max\{\|r(A,a)\|_1 \mid a \in \{1,\ldots,m\}\}.$$
To establish the other inequality, for $a \in \{1,\ldots,m\}$ define $u_a \in \{-1,1\}^n$ by
$$u_{a,j} = \begin{cases} 1, & A_{aj} \ge 0,\\ -1, & A_{aj} < 0,\end{cases}$$
and note that a direct computation gives the $a$th component of $A(u_a)$ as $\|r(A,a)\|_1$. Therefore,
$$\max\{\|r(A,a)\|_1 \mid a \in \{1,\ldots,m\}\} = \max\{(A(u_a))_a \mid a \in \{1,\ldots,m\}\} \le \max\{\|A(u)\|_\infty \mid u \in \{-1,1\}^n\} = \|A\|_{\infty,\infty},$$
giving this part of the theorem and completing the proof.
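The lemma above says that, when the domain carries the $\infty$-norm, the supremum defining the induced norm is attained at one of the $2^n$ sign vertices of the unit cube. The short sketch below (ours, not from the paper) illustrates this numerically: no point on the boundary of the cube beats the vertex maximum. It is also a direct, if exponential, way of computing $\|A\|_{\infty,1}$ and $\|A\|_{\infty,2}$, which is the theme of the complexity discussion in the next section.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, q = 3, 4, 2                       # q is the codomain norm; the lemma allows any norm there
A = rng.standard_normal((m, n))

# Maximum of ||A u||_q over the 2^n vertices u in {-1, 1}^n.
vertex_max = max(np.linalg.norm(A @ np.array(u), ord=q)
                 for u in itertools.product((-1.0, 1.0), repeat=n))

# Random points with ||x||_inf = 1 never exceed the vertex maximum.
for _ in range(10_000):
    x = rng.uniform(-1.0, 1.0, size=n)
    x[rng.integers(n)] = rng.choice([-1.0, 1.0])   # force one coordinate to +-1 so ||x||_inf = 1
    assert np.linalg.norm(A @ x, ord=q) <= vertex_max + 1e-12

print("||A||_{inf,2} by vertex enumeration:", vertex_max)
```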

3. Complexity of induced norm computations

Let us consider a comparison of the nine induced matrix norms in terms of the computational effort required. One would like to know how many operations are required to compute any of the norms. We shall do this making the following assumptions on our computational model. Floating point operations are carried out to an accuracy of $\epsilon = 2^{-N}$ for some fixed $N \in \mathbb{Z}_{>0}$. By $M(N)$ we denote the number of operations required to multiply integers $j_1$ and $j_2$ satisfying $0 \le j_1, j_2 \le 2^N$. We assume that addition and multiplication of floating point numbers can be performed with a relative error of $O(2^{-N})$ using $O(M(N))$ operations. With this assumption, we can deduce the computational complexity of the basic operations we will need.

1. Computing a square root takes $O(M(N))$ operations; see [Brent 1976].

2. Computing the absolute value of a number is 1 operation (a bit flip).

3. Comparing two numbers takes $O(N)$ operations.

4. Finding the maximum number in a list of $k$ numbers takes $O(kN)$ operations; see [Blum, Floyd, Pratt, Rivest, and Tarjan 1973].

5. If $A \in L(\mathbb{R}^n; \mathbb{R}^m)$ and $B \in L(\mathbb{R}^m; \mathbb{R}^p)$ then the matrix multiplication $BA$ takes $O(mnp\,M(N))$ operations. Faster matrix multiplication algorithms are possible than the direct one whose complexity we describe here [e.g., Coppersmith and Winograd 1990], but we are mainly interested in the fact that matrix multiplication has polynomial complexity in the size of the matrices.

6. Computation of the QR-decomposition of a $k \times k$ matrix $A$ has computational complexity $O(k^3 M(N))$. Note that the QR-decomposition can be used to determine the Gram–Schmidt orthogonalisation of a finite number of vectors. We refer to [Golub and Van Loan 1996, §5.2] for details.

7. Let us describe deterministic bounds for the operations needed to compute the eigenvalues and eigenvectors of a $k \times k$ matrix $A$, following Pan and Chen [1999]. Let us fix some norm $\|\cdot\|$ on $L(\mathbb{R}^k; \mathbb{R}^k)$. Given $\epsilon = 2^{-N}$ as above, let $\beta \in \mathbb{R}_{>0}$ be such that $2^{-\beta}\|A\| \le \epsilon$. Then Pan and Chen show that the eigenvalues and eigenvectors of $A$ can be computed using an algorithm of complexity
$$O(k^3 M(N)) + O((k\log^2 k)(\log\beta + \log^2 k)M(N)). \quad (5)$$
There are stochastic, iterative, or gradient flow algorithms that will generically perform computations with fewer operations than predicted by this bound. However, the complexity of such algorithms is difficult to understand, or they require unbounded numbers of operations in the worst case. In any event, here we only care that the complexity of the eigenproblem is polynomial.

8. The previous two computational complexity results can be combined to show that finding the square root of a symmetric positive-definite $k \times k$ matrix $A$ has computational complexity given by (5). This is no doubt known, but let us see how this works, since it is simple. First compute the eigenvalues and eigenvectors of $A$ using an algorithm with complexity given by (5). The eigenvectors can be made into an orthonormal basis of eigenvectors using the Gram–Schmidt procedure; this decomposition can be performed using an algorithm of complexity $O(k^3 M(N))$. Assembling the orthonormal eigenvectors into the columns of a matrix gives an orthogonal matrix $U \in L(\mathbb{R}^k; \mathbb{R}^k)$ and a diagonal matrix $D \in L(\mathbb{R}^k; \mathbb{R}^k)$ with positive diagonal entries such that $A = UDU^T$. Then the matrix $D^{1/2}$, with diagonal entries equal to the square roots of the diagonal entries of $D$, can be constructed with complexity $O(kM(N))$. Finally, $A^{1/2} = UD^{1/2}U^T$ is computed using matrix multiplication with complexity $O(k^3 M(N))$.

Using these known computational complexity results, it is relatively straightforward to assess the complexity of the computations of the various norms in Theorem 1. In Table 1 we display this data, recording only the dependency of the computations on the number of rows $m$ and columns $n$ of the matrix.

Table 1: Complexity of computing the norms $\|\cdot\|_{p,q}$

              q = 1          q = 2          q = ∞
  p = 1       O(mn)          O(mn)          O(mn)
  p = 2       O(mn 2^m)      O(n^3)         O(mn)
  p = ∞       O(mn 2^n)      O(mn 2^n)      O(mn)
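Item 8 above builds the matrix square root from an eigendecomposition. A minimal NumPy sketch of exactly that construction (ours; the name `sqrtm_sym` is made up) is shown below; the same construction reappears in the proof of Theorem 2, where the square root of an MC-matrix is needed.

```python
import numpy as np

def sqrtm_sym(A):
    """Square root of a symmetric positive-(semi)definite matrix via A = U D U^T."""
    w, U = np.linalg.eigh(A)              # eigenvalues (ascending) and orthonormal eigenvectors
    w = np.clip(w, 0.0, None)             # guard against tiny negative round-off
    return U @ np.diag(np.sqrt(w)) @ U.T  # A^{1/2} = U D^{1/2} U^T

B = np.array([[4.0, 1.0],
              [1.0, 3.0]])
R = sqrtm_sym(B)
assert np.allclose(R @ R, B)              # R is symmetric and squares back to B
```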

Note that the cases of $(p,q) \in \{(2,1), (\infty,1), (\infty,2)\}$ are exceptional in that the required operations grow exponentially with the size of $A$. One must exercise some care in drawing conclusions here. For example, as we show in the proof of Theorem 1,
$$\|A\|_{\infty,\infty} = \max\{\|A(u)\|_\infty \mid u \in \{-1,1\}^n\}, \quad (6)$$
and this computation has complexity $O(mn2^n)$. However, it turns out that the norm can be determined with a formula that is actually less complex. Indeed, our proof of the formula for $\|\cdot\|_{\infty,\infty}$, which is not the usual proof, starts with the formula (6) and produces a result with complexity $O(mn)$, as stated in Table 1. One is then led to ask: are there similar simplifications of the norms corresponding to the cases $(p,q) \in \{(2,1), (\infty,1), (\infty,2)\}$? Rohn [2000] shows that the computation of $\|\cdot\|_{\infty,1}$ is NP-hard. We shall show here, using his ideas, that the computations of the norms $\|\cdot\|_{2,1}$ and $\|\cdot\|_{\infty,2}$ are likewise difficult, perhaps impossible, to reduce to algorithms with polynomial complexity.

2 Theorem: If there exists an algorithm to compute $\|A\|_{2,1}$ or $\|A\|_{\infty,2}$ whose computational complexity is polynomial in the number of rows and the number of columns of $A$, then P=NP.

Proof: First note that $\|A\|_{2,1} = \|A^T\|_{\infty,2}$, so it suffices to prove the theorem only for $(p,q) = (\infty,2)$. Following Rohn [2000] we introduce the notion of an MC-matrix ("MC" stands for "max-cut", since these matrices are related to the max-cut problem in graph theory) as a symmetric matrix $A \in L(\mathbb{R}^n; \mathbb{R}^n)$ with the property that the diagonal elements are equal to $n$ and the off-diagonal elements are either $0$ or $-1$. [Rohn 1994] shows that MC-matrices are positive-definite. Poljak and Rohn [1993] also prove that the following decision problem is NP-complete: given an $n \times n$ MC-matrix $A$ and $M \in \mathbb{Z}_{>0}$, is $\langle A(u), u\rangle \ge M$ for some $u \in \{-1,1\}^n$? We will use this fact crucially in our proof.

Let us call a symmetric matrix $A \in L(\mathbb{R}^n; \mathbb{R}^n)$ a $\sqrt{\mathrm{MC}}$-matrix if $A \cdot A$ is an MC-matrix. Note that the map $A \mapsto A \cdot A$ from the set of $\sqrt{\mathrm{MC}}$-matrices to the set of MC-matrices is surjective, since MC-matrices have symmetric positive-definite square roots by virtue of their being themselves symmetric and positive-definite.

Now suppose that there exists an algorithm for determining the $(\infty,2)$-norm of a matrix, the computational complexity of which is of polynomial order in the number of rows and columns of the matrix. Let $A$ be an $n \times n$ MC-matrix and let $M \in \mathbb{Z}_{>0}$. As we pointed out prior to stating the theorem, one can determine the $\sqrt{\mathrm{MC}}$-matrix $A^{1/2}$ using an algorithm with computational complexity that is polynomial in $n$. Then, by assumption, we can compute
$$\|A^{1/2}\|_{\infty,2}^2 = \max\{\|A^{1/2}(u)\|_2^2 \mid u \in \{-1,1\}^n\} = \max\{\langle A(u), u\rangle \mid u \in \{-1,1\}^n\}$$
in polynomial time. In particular, we can determine whether $\langle A(u), u\rangle \ge M$ for some $u \in \{-1,1\}^n$ in polynomial time. As we stated above, this latter decision problem is NP-complete, and so we must have P=NP.
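To see the reduction concretely, the sketch below (ours; the example graph is hypothetical and chosen only for illustration) assembles the MC-matrix of a small graph, takes its positive-definite square root as in item 8 of Section 3, and checks by explicit enumeration that $\|A^{1/2}\|_{\infty,2}^2 = \max\{\langle A(u), u\rangle \mid u \in \{-1,1\}^n\}$, the quantity whose thresholding is NP-complete.

```python
import itertools
import numpy as np

def mc_matrix(adjacency):
    """MC-matrix of a graph: diagonal entries n, off-diagonal -1 on edges and 0 otherwise."""
    E = np.asarray(adjacency, dtype=float)
    n = E.shape[0]
    return n * np.eye(n) - E

# A 4-cycle, as an illustrative graph.
E = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
A = mc_matrix(E)

# Positive-definite square root via eigendecomposition.
w, U = np.linalg.eigh(A)
A_half = U @ np.diag(np.sqrt(w)) @ U.T

vertices = [np.array(u) for u in itertools.product((-1.0, 1.0), repeat=4)]
lhs = max(np.linalg.norm(A_half @ u, 2) ** 2 for u in vertices)
rhs = max(u @ A @ u for u in vertices)
print(lhs, rhs)   # the two maxima agree up to round-off
```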

References

Blum, M., Floyd, R. W., Pratt, V., Rivest, R. L., and Tarjan, R. E. [1973] Time bounds for selection, Journal of Computer and System Sciences, 7(4), pages 448-461, issn: 0022-0000, doi: 10.1016/S0022-0000(73)80033-9.

Brent, R. P. [1976] Fast multiple-precision evaluation of elementary functions, Journal of the Association for Computing Machinery, 23(2), pages 242-251, issn: 0004-5411, doi: 10.1145/321941.321944.

Coppersmith, D. and Winograd, S. [1990] Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation, 9(3), pages 251-280, issn: 0747-7171, doi: 10.1016/S0747-7171(08)80013-2.

Drakakis, K. and Pearlmutter, B. A. [2009] On the calculation of the $\ell_2 \to \ell_1$ induced matrix norm, International Journal of Algebra, 3(5), pages 231-240, issn: 1312-8868, url: http://www.m-hikari.com/ija/ija-password-2009/ija-password5-8-2009/drakakisIJA5-8-2009.pdf.

Edwards, C. H. [1973] Advanced Calculus of Several Variables, Harcourt Brace & Company: New York, NY. Reprint: [Edwards 1995].

Edwards, C. H. [1995] Advanced Calculus of Several Variables, Dover Publications, Inc.: New York, NY, isbn: 978-0-486-68336-2. Original: [Edwards 1973].

Golub, G. H. and Van Loan, C. F. [1996] Matrix Computations, 3rd edition, Johns Hopkins Studies in the Mathematical Sciences, The Johns Hopkins University Press: Baltimore, MD, isbn: 978-0-8018-5414-9.

Horn, R. A. and Johnson, C. R. [2013] Matrix Analysis, 2nd edition, Cambridge University Press: New York/Port Chester/Melbourne/Sydney, isbn: 978-0-521-83940-2.

Pan, V. Y. and Chen, Z. Q. [1999] The complexity of the matrix eigenproblem, in Conference Record of the 31st Annual ACM Symposium on Theory of Computing, (Atlanta, GA, May 1999), Association for Computing Machinery, pages 507-516.

Poljak, S. and Rohn, J. [1993] Checking robust nonsingularity is NP-hard, Mathematics of Control, Signals, and Systems, 6, pages 1-9, issn: 0932-4194, doi: 10.1007/BF01213466.

Rohn, J. [1994] Checking positive-definiteness or stability of symmetric interval matrices is NP-hard, Commentationes Mathematicae Universitatis Carolinae, 35(4), pages 795-797, issn: 0010-2628, url: http://hdl.handle.net/10338.dmlcz/118721.

Rohn, J. [2000] Computing the norm $\|A\|_{\infty,1}$ is NP-hard, Linear and Multilinear Algebra, 47(3), issn: 0308-1087, doi: 10.1080/03081080008818644.

Webster, R. J. [1994] Convexity, Oxford Science Publications, Oxford University Press: Oxford, isbn: 978-0-19-853147-0.