September 9, 2018
Introduction
This review focuses on linear algebra (LA) in the context of COMP6714.
Key take-away point: matrices as linear mappings/functions.
Note
You have probably learned LA via matrices, systems of linear equations, etc. We will review the key concepts of LA from the perspective of linear transformations (think of them as functions for now). This perspective provides semantics and intuition for most of the ML models and operations. Here we emphasize intuition; we deliberately skip many concepts and present some content informally. It is a great exercise for you to view related maths and ML models/operations from this perspective throughout this course!
A Common Trick in Maths I
Question: Calculate 2^10, 2^{-1}, 2^{ln 5}, and 2^{4-3i}.
Properties:
f_a(n) = f_a(n-1) * a, for n > 1; f_a(1) = a.
f(u) * f(v) = f(u + v).
f(x) = y <=> ln(y) = x ln(a) ==> f(x) = exp{x ln a}.
e^{ix} = cos(x) + i sin(x).
The trick: extend a definition from simple cases to general ones while preserving its key properties. The same trick applies in linear algebra.
Objects and Their Representations
Goal: We need to study the objects.
On one side: a good representation helps (a lot)!
On the other side: properties of the objects should be independent of the representation!
Basic Concepts I
Algebra:
a set of objects;
two operations and their identity objects (aka identity elements):
  addition (+); its identity is 0.
  scalar multiplication (*); its identity is 1.
constraints: closed under both operations.
Some nice properties of these operations:
Commutative: a + b = b + a.
Associative: (a + b) + c = a + (b + c).
Distributive: λ(a + b) = λa + λb.
Basic Concepts II
Think: What about subtraction and division?
Tips
Always use the analogy with the algebra on integers (Z) and the algebra on polynomials (P). Why are these constraints natural and useful?
Basic Concepts III
Representation matters? Consider even simple geometric vectors: c = a + b. What if we represent vectors by a column of their Cartesian coordinates? What if by their polar coordinates?
Notes
Informally, the objects we are concerned with in this course are (column) vectors. The set of all n-dimensional real vectors is called R^n.
(Column) Vector
Vector: An n-dimensional vector, v, is an n x 1 matrix. We can emphasize its shape by calling it a column vector. A row vector is a transposed column vector: v^T.
Operations:
Addition: v_1 + v_2 is component-wise.
(Scalar) multiplication: λv_1 scales every component of v_1 by λ.
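A small NumPy sketch of these vector operations (the numbers are illustrative):

```python
import numpy as np

# A 3-dimensional column vector is a 3x1 matrix.
v1 = np.array([[1.0], [2.0], [3.0]])
v2 = np.array([[4.0], [5.0], [6.0]])

# Addition is component-wise.
s = v1 + v2            # [[5], [7], [9]]

# Scalar multiplication scales every component.
m = 2.0 * v1           # [[2], [4], [6]]

# A row vector is the transpose of a column vector.
row = v1.T             # shape (1, 3)
```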
Linearity I
Linear combination (generalization of univariate linear functions): Let λ_i in R. Given a set of k vectors v_i (i in [k]), a linear combination of them is
λ_1 v_1 + λ_2 v_2 + ... + λ_k v_k = Σ_{i in [k]} λ_i v_i.
Later, this is just Vλ, where V = [v_1 v_2 ... v_k] and λ = (λ_1, λ_2, ..., λ_k)^T.
Span: the set of all linear combinations of a set of vectors is their span.
Basis: a minimal set of vectors whose span is exactly the whole R^n.
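A minimal sketch of "a linear combination is just Vλ" (vectors and coefficients are illustrative):

```python
import numpy as np

# Stack k vectors as the columns of V; a linear combination is then V @ lam.
v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])
v3 = np.array([2.0, 3.0])
V = np.column_stack([v1, v2, v3])   # shape (2, 3)

lam = np.array([2.0, -1.0, 0.5])    # the coefficients lambda_i

combo = V @ lam                     # 2*v1 - 1*v2 + 0.5*v3
loop = lam[0]*v1 + lam[1]*v2 + lam[2]*v3   # the same, term by term
```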
Linearity II
Benefit: every vector has a unique decomposition over a basis. Think: Why is uniqueness desirable?
Examples
The span of [1, 0]^T and [0, 1]^T is R^2. They also form a basis.
The span of [1, 0]^T, [0, 1]^T, [2, 3]^T is R^2, but one of them is redundant. Think: Which one?
Decompose [4, 6]^T.
Linearity III
Exercises
What is the (natural) basis of all (univariate) polynomials of degree up to d?
Decompose 3x^2 + 4x - 7 into a linear combination of 2, 2x - 3, and x^2 + 1:
3x^2 + 4x - 7 = 3(x^2 + 1) + 2(2x - 3) + (-2)(2).
The same polynomial is mapped to two different vectors under two different bases. Think: Any analogy?
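Finding the coefficients of such a decomposition is just solving a small linear system once each polynomial is written in standard monomial coordinates; a sketch:

```python
import numpy as np

# Basis polynomials as coordinate vectors over (1, x, x^2), one per column:
# 2 -> (2, 0, 0), 2x - 3 -> (-3, 2, 0), x^2 + 1 -> (1, 0, 1)
B = np.array([[2.0, -3.0, 1.0],
              [0.0,  2.0, 0.0],
              [0.0,  0.0, 1.0]])

# Target polynomial: 3x^2 + 4x - 7 -> (-7, 4, 3)
p = np.array([-7.0, 4.0, 3.0])

# Coefficients of the decomposition: (-2)*2 + 2*(2x - 3) + 3*(x^2 + 1)
coeffs = np.linalg.solve(B, p)
```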
Matrix I
A linear transformation is a nice linear function that maps a vector in R^n to another vector in R^m, e.g., f : [x_1, x_2]^T -> [y_1, y_2, y_3]^T. The general form:
y_1 = M_11 x_1 + M_12 x_2
y_2 = M_21 x_1 + M_22 x_2
y_3 = M_31 x_1 + M_32 x_2
Matrix II
Nonexample: f : [x_1, x_2]^T -> [y_1, y_2, y_3]^T where
y_1 = α x_1^2 + β x_2
y_2 = γ x_1^2 + θ x_1 + τ x_2
y_3 = cos(x_1) + e^{x_2}
Matrix III
Why only linear transformations?
Simple and nice properties:
(f_1 + f_2)(x) = f_1(x) + f_2(x)
(λf)(x) = λ f(x)
What about f(g(x))?
Useful.
Matrix I
Definition: An m x n matrix corresponds to a linear transformation from R^n to R^m:
f(x) = Mx = y, where matrix-vector multiplication is defined as y_i = Σ_k M_ik x_k.
Mnemonic: M is outdim x indim.
"Transformation" or "mapping" emphasizes the mapping between two sets, rather than the detailed specification of the mapping; the latter is more or less the elementary understanding of a function. These are all specific instances of morphisms in category theory.
Semantic interpretation:
Matrix II
Linear combination of the columns of M:
Mx = [M_1 M_2 ... M_n] [x_1, x_2, ..., x_n]^T = x_1 M_1 + x_2 M_2 + ... + x_n M_n = y,
where M_j denotes the j-th column of M.
Matrix III
Example:
[[1, 2], [-4, 9], [25, 1]] [1, 10]^T = 1 [1, -4, 25]^T + 10 [2, 9, 1]^T = [21, 86, 35]^T
[[1, 2], [-4, 9]] [1, 10]^T = 1 [1, -4]^T + 10 [2, 9]^T = [21, 86]^T
Think: What does M do for the last example? Rotation and scaling.
When x is also a matrix:
[[1, 2], [-4, 9], [25, 1]] [[1, 2], [10, 20]] = [[21, 42], [86, 172], [35, 70]]
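The example can be checked numerically; here the second row of M is taken to be [-4, 9], an assumption (a minus sign appears to have been lost in the slides) that makes the printed results 21, 86, 35 consistent:

```python
import numpy as np

# Assumed example matrix (second-row sign reconstructed).
M = np.array([[ 1.0, 2.0],
              [-4.0, 9.0],
              [25.0, 1.0]])
x = np.array([1.0, 10.0])

y = M @ x                                   # matrix-vector product
col_view = 1.0 * M[:, 0] + 10.0 * M[:, 1]   # same result, as a combination of columns

# When x is also a matrix, M is applied to each column of X at once.
X = np.array([[ 1.0,  2.0],
              [10.0, 20.0]])
Y = M @ X
```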
System of Linear Equations I
y_1 = M_11 x_1 + M_12 x_2
y_2 = M_21 x_1 + M_22 x_2
y_3 = M_31 x_1 + M_32 x_2
<=> [y_1, y_2, y_3]^T = [[M_11, M_12], [M_21, M_22], [M_31, M_32]] [x_1, x_2]^T, i.e., y = Mx.
Interpretation: find a vector x in R^2 such that its image (under M) is exactly the given vector y in R^3. How to solve it?
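A sketch of solving such a system (3 equations, 2 unknowns) numerically; the numbers below are made up, chosen so that y does lie in the column space of M:

```python
import numpy as np

# A 3x2 system y = Mx: more equations than unknowns.
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0])   # chosen to be in the column space of M

# lstsq finds the x whose image Mx is closest to y; here it matches y exactly.
x, residuals, rank, sv = np.linalg.lstsq(M, y, rcond=None)
```

If y were not in the column space of M (the transformation is not surjective), the same call would return the least-squares approximation instead of an exact solution.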
System of Linear Equations II x 1 y 1 x 2 y 2 x 3 y 3 Domain M Range The above transformation is injective, but not surjective. 20/29
A Matrix Also Specifies a (Generalized) Coordinate System I
Yet another interpretation: y = Mx can be read as Iy = Mx. The vector y wrt the standard coordinate system, I, is the same as x wrt the coordinate system defined by the column vectors of M. Think: Why the columns of M?
A Matrix Also Specifies a (Generalized) Coordinate System II
Example for polynomials:
I: columns [1, 0, 0]^T, [0, 1, 0]^T, [0, 0, 1]^T stand for 1, x, x^2.
M = [[3, 1, 4], [0, 1, 5], [0, 0, 2]]: columns stand for 3, x + 1, 2x^2 + 5x + 4.
Let x = [-1, -2, 3]^T. Then Mx = [7, 13, 6]^T = I [7, 13, 6]^T.
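The change-of-coordinates example can be verified numerically (the signs in x and in the basis polynomials are reconstructed assumptions, chosen so the arithmetic is consistent):

```python
import numpy as np

# Columns of M are the standard coordinates (over 1, x, x^2) of the
# basis polynomials 3, x + 1, and 2x^2 + 5x + 4.
M = np.array([[3.0, 1.0, 4.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 2.0]])

x = np.array([-1.0, -2.0, 3.0])   # coordinates wrt the M basis
y = M @ x                         # the same polynomial wrt the standard basis
# y = [7, 13, 6], i.e., the polynomial 7 + 13x + 6x^2.
```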
Exercise I
What if y is given in the above example?
What does the following mean?
[[3, 1, 4], [0, 1, 5], [0, 0, 2]] [[1, 0, 0], [0, 1, 0], [0, 0, 1]] = [[3, 1, 4], [0, 1, 5], [0, 0, 2]]
Think about representing polynomials using the basis: (x - 1)^2, x^2 - 1, x^2 + 1.
Inner Product
THE binary operator; some kind of similarity. Type signature: vector x vector -> scalar: <x, y>.
In R^n, usually called the dot product: x . y := x^T y = Σ_i x_i y_i.
For certain functions: <f, g> = ∫_a^b f(t) g(t) dt; this leads to a Hilbert space.
Properties / definitions (for R):
conjugate symmetry: <x, y> = <y, x>
linearity in the first argument: <ax + y, z> = a <x, z> + <y, z>
positive definiteness: <x, x> >= 0; <x, x> = 0 <=> x = 0.
Generalizes many geometric concepts to vector spaces: angle (orthogonality), projection, norm.
<sin nt, sin mt> = 0 on [-π, π] (m != n): they are orthogonal to each other.
C = AB: C_ij = <row i of A, column j of B>. Special case: A^T A.
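Two quick numerical illustrations: the dot product on R^n, and an approximation of the function inner product showing that sin(2t) and sin(3t) are orthogonal on [-π, π]:

```python
import numpy as np

# Dot product on R^n: orthogonal vectors have inner product 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -5.0, 2.0])
d = x @ y                      # 4 - 10 + 6 = 0, so x and y are orthogonal

# Function inner product <f, g> = integral of f(t) g(t) dt, approximated
# by a Riemann sum on a fine grid over [-pi, pi].
t = np.linspace(-np.pi, np.pi, 20001)
dt = t[1] - t[0]
ip = np.sum(np.sin(2 * t) * np.sin(3 * t)) * dt   # approximately 0
```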
Eigenvalues/vectors and Eigen Decomposition
"Eigen" means "characteristic of" (German). A (right) eigenvector of a square matrix A is a u such that Au = λu.
Not all matrices have (real) eigenvalues. Here, we only consider good matrices.
Not all eigenvalues need to be distinct.
Traditionally, we normalize u (such that u^T u = 1).
We can use all the eigenvectors of A to construct a matrix U (as columns). Then AU = UΛ, or equivalently, A = UΛU^{-1}. This is the eigen decomposition.
We can interpret U as a transformation between two coordinate systems. Note that the vectors in U are not necessarily orthogonal.
Λ is the scaling along each of the directions in the new coordinate system.
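A sketch of the eigen decomposition in NumPy, using a small symmetric matrix as the "good" matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eig(A)   # eigenvalues lam; eigenvectors as columns of U

# Each column u of U satisfies A u = lambda u.
for i in range(len(lam)):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])

# Reassembling U Λ U^{-1} recovers A.
A_re = U @ np.diag(lam) @ np.linalg.inv(U)
```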
Applications
Compute A^n: apply the eigen decomposition A = UΛU^{-1}. Then
A^n = (UΛU^{-1})(UΛU^{-1})... = UΛ^n U^{-1}.
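A sketch of computing A^n this way, checked against plain repeated multiplication:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, U = np.linalg.eig(A)

n = 5
# A^n = U Λ^n U^{-1}: only the eigenvalues are raised to the n-th power.
A_n = U @ np.diag(lam ** n) @ np.linalg.inv(U)

ref = np.linalg.matrix_power(A, n)   # repeated multiplication, for comparison
```

For large n this is much cheaper than n - 1 matrix multiplications: one decomposition, an element-wise power, and two products.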
Exercises I
Rewrite Σ_{i=1}^n a_i b_i in vector/matrix operations.
Write a matrix operation such that given x = [x_1, x_2, x_3]^T, it returns [x_1, 2x_2, 3x_3]^T.
Exercises II
Suppose we want to apply the linear mapping W to more than one vector x. Draw a schematic diagram to show how this can be done using matrix operations (rather than a loop).
In machine learning, we usually store training data as a data matrix; if it is n x m, then it has n training samples (as rows) and each sample is characterized by its m feature values. How can we apply the above linear mapping to all the training samples?
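One common answer to the second exercise, as a sketch: with the samples stacked as rows of X, the loop over samples becomes a single matrix product (shapes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # linear mapping R^2 -> R^3
X = rng.normal(size=(5, 2))   # data matrix: 5 samples as rows

# Row i of X @ W^T is W applied to sample x_i.
Y = X @ W.T                   # shape (5, 3)

# The same thing with an explicit loop, for comparison.
Y_loop = np.stack([W @ x for x in X])
```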
References and Further Reading I
Gaussian Quadrature: https://www.youtube.com/watch?v=k-yudqrxijo
Linear Algebra Review and Reference: http://cs229.stanford.edu/section/cs229-linalg.pdf
SciPy LA tutorial: https://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html
We Recommend a Singular Value Decomposition: http://www.ams.org/samplings/feature-column/fcarc-svd