L. Vandenberghe EE133A (Spring 2017) 3. Matrices. notation and terminology. matrix operations. linear and affine functions.

L Vandenberghe EE133A (Spring 2017) 3 Matrices notation and terminology matrix operations linear and affine functions complexity 3-1

Matrix a rectangular array of numbers, for example A = 0 1 23 01 13 4 01 0 41 1 0 17 numbers in array are the elements (entries, coefficients, components) A ij is the i, j element of A; i is its row index, j the column index size (dimensions) of the matrix is specified as (#rows) (#columns) for example, A is a 3 4 matrix set of m n matrices with real elements is written R m n set of m n matrices with complex elements is written C m n Matrices 3-2

Other conventions many authors use parentheses as delimiters: A = 0 1 23 01 13 4 01 0 41 1 0 17 often a ij is used to denote the i, j element of A Matrices 3-3

Matrix shapes Scalar: we don t distinguish between a 1 1 matrix and a scalar Vector: we don t distinguish between an n 1 matrix and an n-vector Row and column vectors a 1 n matrix is called a row vector an n 1 matrix is called a column vector (or just vector) Tall, wide, square matrices: an m n matrix is tall if m > n wide if m < n square if m = n Matrices 3-4

Block matrix a block matrix is a rectangular array of matrices elements in the array are the blocks or submatrices of the block matrix Example A = [ B C D E ] is a 2 2 block matrix; if the blocks are B = [ 2 1 ], C = [ 0 2 3 5 4 7 ], D = [ 1 ], E = [ 1 6 0 ] then A = 2 0 2 3 1 5 4 7 1 1 6 0 Note: dimensions of the blocks must be compatible! Matrices 3-5

Rows and columns a matrix can be viewed as a block matrix with row/column vector blocks m n matrix A as 1 n block matrix A = [ a 1 a 2 a n ] each a j is an m-vector (the jth column of A) m n matrix A as m 1 block matrix A = b 1 b 2 b m each b i is a 1 n row vector (the ith row of A) Matrices 3-6

Special matrices Zero matrix matrix with A ij = 0 for all i, j notation: 0 (usually) or 0 m n (if dimension is not clear from context) Identity matrix square matrix with A ij = 1 if i = j and A ij = 0 if i j notation: I (usually) or I n (if dimension is not clear from context) columns of I n are unit vectors e 1, e 2,, e n ; for example, I 3 = 1 0 0 0 1 0 0 0 1 = [ e 1 e 2 e 3 ] Matrices 3-7

Symmetric and Hermitian matrices Symmetric matrix: square with A ij = A ji 3 1 5 4 3 2 2 5 0, 4 + 3j 3 2j 0 3 2j j 2j 0 2j 3 Hermitian matrix: square with A ij = Āji (complex conjugate of A ij ) 4 3 2j 1 + j 3 + 2j 1 2j 1 j 2j 3 note: diagonal elements are real (since A ii = Āii) Matrices 3-8

Structured matrices matrices with special patterns or structure arise in many applications diagonal matrix: square with A ij = 0 for i j 1 0 0 0 2 0 0 0 5, 1 0 0 0 0 0 0 0 5 lower triangular matrix: square with A ij = 0 for i < j 3 1 0 4 0 0 1 5 2, 0 1 0 4 0 0 1 0 2 upper triangular matrix: square with A ij = 0 for i > j Matrices 3-9

Sparse matrices a matrix is sparse if most (almost all) of its elements are zero sparse matrix storage formats and algorithms exploit sparsity efficiency depends on number of nonzeros and their positions positions of nonzeros are visualized in a spy plot Example 2,987,012 rows and columns 26,621,983 nonzeros (Freescale/FullChip matrix from Univ of Florida Sparse Matrix Collection) Matrices 3-10

Outline notation and terminology matrix operations linear and affine functions complexity

Scalar-matrix multiplication and addition Scalar-matrix multiplication: scalar-matrix product of m n matrix A with scalar β βa = β A 11 β A 12 β A 1n β A 21 β A 22 β A 2n β A m1 β A m2 β A mn A and β can be real or complex Addition: sum of two m n matrices A and B (real or complex) A + B = A 11 + B 11 A 12 + B 12 A 1n + B 1n A 21 + B 21 A 22 + B 22 A 2n + B 2n A m1 + B m1 A m2 + B m2 A mn + B mn Matrices 3-11

Transpose the transpose of an m n matrix A is the n m matrix A T = A 11 A 21 A m1 A 12 A 22 A m2 A 1n A 2n A mn (A T ) T = A a symmetric matrix satisfies A = A T A may be complex, but transpose of complex matrices is rarely needed transpose of matrix-scalar product and matrix sum (βa) T = βa T, (A + B) T = A T + B T Matrices 3-12

Conjugate transpose the conjugate transpose of an m n matrix A is the n m matrix A H = Ā 11 Ā 21 Ā m1 Ā 12 Ā 22 Ā m2 Ā 1n Ā 2n Ā mn (Āij is complex conjugate of A ij ) A H = A T if A is a real matrix a Hermitian matrix satisfies A = A H conjugate transpose of matrix-scalar product and matrix sum (βa) H = βa H, (A + B) H = A H + B H Matrices 3-13

Matrix-matrix product product of m n matrix A and n p matrix B (A, B are real or complex) C = AB is the m p matrix with i, j element C ij = A i1 B 1j + A i2 B 2j + + A in B nj dimensions must be compatible: #columns in A = #rows in B Matrices 3-14

Exercise: paths in directed graph directed graph with n = 5 vertices 2 3 5 A = matrix representation 0 1 0 0 1 1 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 1 4 A ij = 1 indicates an edge j i Question: give a graph interpretation of A 2 = AA, A 3 = AAA, A 2 = 1 0 1 1 0 0 1 0 1 2 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0, A3 = 1 1 0 1 2 2 0 1 2 0 1 1 0 0 1 1 0 1 1 0 0 1 0 0 1, Matrices 3-15

Properties of matrix-matrix product associative: (AB)C = A(BC) so we write ABC associative with scalar-matrix multiplication: (γa)b = γ(ab) = γab distributes with sum: A(B + C) = AB + AC, (A + B)C = AC + BC transpose and conjugate transpose of product: (AB) T = B T A T, (AB) H = B H A H not commutative: AB BA in general; for example, [ ] [ ] [ ] [ 1 0 0 1 0 1 1 0 0 1 1 0 1 0 0 1 ] there are exceptions, eg, AI = IA for square A Matrices 3-16

Notation for vector inner product inner product of a, b R n (see page 1-17): b T a = b 1 a 1 + b 2 a 2 + + b n a n = b 1 b 2 b n T a 1 a 2 a n product of the transpose of the column vector b and the column vector a inner product of a, b C n (see page 1-24): b H a = b 1 a 1 + b 2 a 2 + + b n a n = b 1 b 2 b n H a 1 a 2 a n product of conjugate transpose of the column vector b and the column vector a Matrices 3-17

Matrix-matrix product and block matrices block-matrices can be multiplied as regular matrices Example: product of two 2 2 block matrices [ ] [ ] [ A B W Y AW + BX AY + BZ = C D X Z CW + DX CY + DZ ] if the dimensions of the blocks are compatible Matrices 3-18

Outline notation and terminology matrix operations linear and affine functions complexity

Matrix-vector product product of m n matrix A with n-vector (or n 1 matrix) x Ax = A 11 x 1 + A 12 x 2 + + A 1n x n A 21 x 1 + A 22 x 2 + + A 2n x n A m1 x 1 + A m2 x 2 + + A mn x n dimensions must be compatible: number of columns of A equals the size of x Ax is a linear combination of the columns of A: Ax = [ ] a 1 a 2 a n x 1 x 2 x n each a i is an m-vector (ith column of A) = x 1a 1 + x 2 a 2 + + x n a n Matrices 3-19

Linear function a function f : R n R m is linear if superposition holds: for all n-vectors x, y and all scalars α, β f(αx + βy) = αf(x) + βf(y) Extension: if f is linear, superposition holds for any linear combination: f(α 1 u 1 + α 2 u 2 + + α p u p ) = α 1 f(u 1 ) + α 2 f(u 2 ) + + α p f(u p ) for all scalars, α 1,, α p and all n-vectors u 1,, u p Matrices 3-20

Matrix-vector product function for fixed A R m n, define a function f : R n R m as f(x) = Ax any function of this type is linear: A(αx + βy) = α(ax) + β(ay) every linear function can be written as a matrix-vector product function: f(x) = f(x 1 e 1 + x 2 e 2 + + x n e n ) = x 1 f(e 1 ) + x 2 f(e 2 ) + + x n f(e n ) = [ f(e 1 ) f(e n ) ] x 1 x n hence, f(x) = Ax with A = [ f(e 1 ) f(e 2 ) f(e n ) ] Matrices 3-21

Input-output (operator) interpretation think of a function f : R n R m in terms of its effect on x x A y = f(x) = Ax signal processing/control interpretation: n inputs x i, m outputs y i f is linear if we can represent its action on x as a product f(x) = Ax Matrices 3-22

Examples (f : R 3 R 3 ) f reverses the order of the components of x a linear function: f(x) = Ax with A = 0 0 1 0 1 0 1 0 0 f sorts the components of x in decreasing order: not linear f scales x 1 by a given number d 1, x 2 by d 2, x 3 by d 3 a linear function: f(x) = Ax with A = d 1 0 0 0 d 2 0 0 0 d 3 f replaces each x i by its absolute value x i : not linear Matrices 3-23

Operator interpretation of matrix-matrix product explains why in general AB BA x B A y = ABx x A B y = BAx Example A = [ 1 0 0 1 ], B = [ 0 1 1 0 ] f(x) = ABx reverses order of elements; then changes sign of first element f(x) = BAx changes sign of 1st element; then reverses order Matrices 3-24

Reverser and circular shift Reverser matrix A = 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0, Ax = x n x n 1 x 2 x 1 Circular shift matrix A = 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0, Ax = x n x 1 x 2 x n 1 Matrices 3-25

Permutation Permutation matrix a square 0-1 matrix with one element 1 per row and one element 1 per column equivalently, an identity matrix with columns reordered equivalently, an identity matrix with rows reordered Ax is a permutation of the elements of x Example A = 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0, Ax = x 2 x 4 x 1 x 3 Matrices 3-26

Rotation in a plane A = [ cos θ sin θ sin θ cos θ ] Ax is x rotated counterclockwise over an angle θ Ax θ x Matrices 3-27

Projection on line and reflection x a y z projection on line through a (see page 2-12): y = at x 1 a 2a = Ax with A = a 2aaT reflection with respect to line through a z = x + 2(y x) = Bx, with B = 2 a 2aaT I Matrices 3-28

Node-arc incidence matrix directed graph (network) with m vertices, n arcs (directed edges) incidence matrix is m n matrix A with A ij = 1 if arc j enters node i 1 if arc j leaves node i 0 otherwise 3 2 3 1 4 5 1 2 4 A = 1 1 0 1 0 1 0 1 0 0 0 0 1 1 1 0 1 0 0 1 Matrices 3-29

Kirchhoff s current law n-vector x = (x 1, x 2,, x n ) with x j the current through arc j (Ax) i = arc j enters node i x j arc j leaves node i x j = total current arriving at node i x 3 2 3 x 1 x 4 x 5 Ax = x 1 x 2 + x 4 x 1 x 3 x 3 x 4 x 5 x 2 + x 5 1 x 2 4 Matrices 3-30

Kirchhoff s voltage law m-vector y = (y 1, y 2,, y m ) with y i the potential at node i (A T y) j = y k y l if edge j goes from node l to k = negative of voltage across arc j y 2 3 y 3 2 3 y 1 1 4 5 1 2 4 y 4 A T y = y 2 y 1 y 4 y 1 y 3 y 2 y 1 y 3 y 4 y 3 Matrices 3-31

Convolution the convolution of an n-vector a and an m-vector b is the (n + m 1)-vector c notation: c = a b Example: n = 4, m = 3 c k = all i and j with i + j = k + 1 a i b j c 1 = a 1 b 1 c 2 = a 1 b 2 + a 2 b 1 c 3 = a 1 b 3 + a 2 b 2 + a 3 b 1 c 4 = a 2 b 3 + a 3 b 2 + a 4 b 1 c 5 = a 3 b 3 + a 4 b 2 c 6 = a 4 b 3 Matrices 3-32

Properties Interpretation: if a and b are the coefficients of polynomials p(x) = a 1 + a 2 x + + a n x n 1, q(x) = b 1 + b 2 x + + b m x m 1 then c = a b gives the coefficients of the product polynomial p(x)q(x) = c 1 + c 2 x + c 3 x 2 + + c n+m 1 x n+m 2 Properties symmetric: a b = b a associative: (a b) c = a (b c) if a b = 0 then a = 0 or b = 0 these properties follow directly from the polynomial product interpretation Matrices 3-33

Example: moving average of a time series n-vector x represents a time series the 3-period moving average of the time series is the time series y k = 1 3 (x k + x k 1 + x k 2 ), k = 1, 2,, n + 2 (with x k interpreted as zero for k < 1 and k > n) this can be expressed as a convolution y = a x with a = (1/3, 1/3, 1/3) 3 3 xk 2 1 (a x)k 2 1 0 0 20 40 60 80 100 k 0 0 20 40 60 80 100 k Matrices 3-34

Convolution and Toeplitz matrices c = a b is a linear function of b if we fix a c = a b is a linear function of a if we fix b Example: convolution c = a b of a 4-vector a and a 3-vector b c 1 c 2 c 3 c 4 c 5 c 6 = a 1 0 0 a 2 a 1 0 a 3 a 2 a 1 a 4 a 3 a 2 0 a 4 a 3 0 0 a 4 b 1 b 2 b 3 = b 1 0 0 0 b 2 b 1 0 0 b 3 b 2 b 1 0 0 b 3 b 2 b 1 0 0 b 3 b 2 0 0 0 b 3 a 1 a 2 a 3 a 4 the matrices in these matrix-vector products are called Toeplitz matrices Matrices 3-35

Vandermonde matrix polynomial of degree n 1 or less with coefficients x 1, x 2,, x n : p(t) = x 1 + x 2 t + x 3 t 2 + + x n t n 1 values of p(t) at m points t 1,, t m : p(t 1 ) p(t 2 ) p(t m ) = = Ax 1 t 1 t1 n 1 1 t 2 t2 n 1 1 t m tm n 1 x 1 x 2 x n the matrix A is called a Vandermonde matrix f(x) = Ax maps coefficients of polynomial to function values Matrices 3-36

Discrete Fourier transform the DFT maps a complex n-vector (x 1, x 2,, x n ) to the complex n-vector y 1 y 2 y 3 y n = 1 1 1 1 1 ω 1 ω 2 ω (n 1) 1 ω 2 ω 4 ω 2(n 1) 1 ω (n 1) ω 2(n 1) ω (n 1)(n 1) x 1 x 2 x 3 x n = W x where ω = e 2πj/n (and j = 1) DFT matrix W C n n has k, l element W kl = ω (k 1)(l 1) a Vandermonde matrix with m = n and t 1 = 1, t 2 = ω 1, t 3 = ω 2,, t n = ω (n 1) Matrices 3-37

Affine function a function f : R n R m is affine if it satisfies f(αx + βy) = αf(x) + βf(y) for all n-vectors x, y and all scalars α, β with α + β = 1 Extension: if f is affine, then f(α 1 u 1 + α 2 u 2 + + α m u m ) = α 1 f(u 1 ) + α 2 f(u 2 ) + + α m f(u m ) for all n-vectors u 1,, u m and all scalars α 1,, α m with α 1 + α 2 + + α m = 1 Matrices 3-38

Affine functions and matrix-vector product for fixed A R m n, b R m, define a function f : R n R m by f(x) = Ax + b ie, a matrix-vector product plus a constant any function of this type is affine: if α + β = 1 then A(αx + βy) + b = α(ax + b) + β(ax + b) every affine function can be written as f(x) = Ax + b with: and b = f(0) A = [ f(e 1 ) f(0) f(e 2 ) f(0) f(e n ) f(0) ] Matrices 3-39

Affine approximation first-order Taylor approximation of differentiable f : R n R m around z: f i (x) = f i (z) + f i x 1 (z)(x 1 z 1 ) + + f i x n (z)(x n z n ), i = 1,, m in matrix-vector notation: f(x) = f(z) + Df(z)(x z) where Df(z) = f 1 x 1 (z) f 2 x 1 (z) f m x 1 (z) f 1 x 2 (z) f 2 x 2 (z) f m x 2 (z) f 1 x n (z) f 2 x n (z) f m x n (z) = f 1 (z) T f 2 (z) T f m (z) T Df(z) is called the derivative matrix or Jacobian matrix of f at z f is a local affine approximation of f around z Matrices 3-40

Example f(x) = [ f1 (x) f 2 (x) ] = [ ] e 2x 1 +x 2 x 1 x 2 1 x 2 derivative matrix [ 2e 2x 1 +x 2 1 e 2x 1+x 2 ] Df(x) = 2x 1 1 first order approximation of f around z = 0: f(x) = [ f1 (x) f 2 (x) ] = [ 1 0 ] + [ 1 1 0 1 ] [ x1 x 2 ] Matrices 3-41

Outline notation and terminology matrix operations linear and affine functions complexity

Matrix-vector product matrix-vector multiplication of m n matrix A and n-vector x: y = Ax requires (2n 1)m flops m elements in y; each element requires an inner product of length n approximately 2mn for large n Special cases: flop count is lower for structured matrices A diagonal: n flops A lower triangular: n 2 flops A sparse: #flops 2mn Matrices 3-42

Matrix-matrix product product of m n matrix A and n p matrix B: C = AB requires mp(2n 1) flops mp elements in C; each element requires an inner product of length n approximately 2mnp for large n Matrices 3-43

Exercises 1 evaluate y = ABx two ways (A and B are n n, x is a vector) y = (AB)x (first make product C = AB, then multiply C with x) y = A(Bx) (first make product y = Bx, then multiply A with y) both methods give the same answer, but which method is faster? 2 evaluate y = (I + uv T )x where u, v, x are n-vectors A = I + uv T followed by y = Ax in MATLAB: y = (eye(n) + u*v ) * x w = (v T x)u followed by y = x + w in MATLAB: y = x + (v *x) * u Matrices 3-44