Large Scale Data Analysis Using Deep Learning


Large Scale Data Analysis Using Deep Learning: Linear Algebra. U Kang, Seoul National University

In This Lecture Overview of linear algebra (but not a comprehensive survey), focused on the subset most relevant to deep learning. For details on linear algebra, refer to Linear Algebra and Its Applications by Gilbert Strang.

Scalar A scalar is a single number: an integer, a real number, a rational number, etc. We denote scalars with an italic font: a, n, x.

Vectors A vector is a 1-D array of numbers. The entries can be real, binary, integer, etc. Notation for type and size: x ∈ R^n denotes a vector of n real numbers.

Matrices A matrix is a 2-D array of numbers. Example notation for type and shape: A ∈ R^{m×n} is a matrix with m rows and n columns.

Tensors A tensor is an array of numbers that may have: zero dimensions (a scalar); one dimension (a vector); two dimensions (a matrix); or three or more dimensions, e.g., a matrix over time, or a knowledge base of (subject, verb, object) triples.

Matrix Transpose (A^T)_{i,j} = A_{j,i}, and (AB)^T = B^T A^T.
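A minimal NumPy check of both identities (the example matrices are arbitrary choices of mine, not from the slides):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
B = np.array([[1., 0., 2.], [0., 1., 3.]])     # 2 x 3

# (A^T)_{i,j} = A_{j,i}: transposing swaps the row and column indices
assert A.T[0, 1] == A[1, 0]

# (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)
```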

Matrix Product C = AB, where C_{i,j} = Σ_k A_{i,k} B_{k,j}.

Matrix Product Matrix product as a sum of outer products: AB = Σ_k A_{:,k} B_{k,:}. Product with a diagonal matrix: multiplying by a diagonal matrix scales rows (from the left) or columns (from the right).
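To make the outer-product view concrete, a small NumPy sketch (random matrices, my own example):

```python
import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)

# AB as a sum of outer products: sum_k (k-th column of A)(k-th row of B)
C_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(A @ B, C_outer)

# Multiplying by diag(d) on the left scales the rows of B by the entries of d
d = np.array([1., 2., 3.])
assert np.allclose(np.diag(d) @ B, d[:, None] * B)
```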

Identity Matrix Example identity matrix: I_3 = [1 0 0; 0 1 0; 0 0 1].

Systems of Equations Ax = b expands to A_{1,1}x_1 + … + A_{1,n}x_n = b_1, …, A_{m,1}x_1 + … + A_{m,n}x_n = b_m.

Solving Systems of Equations A linear system of equations can have: no solution (3x + 2y = 6, 3x + 2y = 12); many solutions (3x + 2y = 6, 6x + 4y = 12); or exactly one solution, which means multiplication by the matrix is an invertible function: Ax = b implies x = A^{-1} b.

Matrix Inversion Matrix inverse: an inverse A^{-1} of an n×n matrix A satisfies A A^{-1} = A^{-1} A = I_n. Solving a system using an inverse: x = A^{-1} b.
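A short NumPy illustration (the system is an example of my own); note that np.linalg.solve is the usual way to solve a system in practice:

```python
import numpy as np

A = np.array([[3., 2.], [1., -1.]])
b = np.array([6., 1.])

# x = A^{-1} b via an explicit inverse (fine for illustration)
x = np.linalg.inv(A) @ b

# In practice np.linalg.solve is preferred: it factorizes A instead of inverting it
assert np.allclose(x, np.linalg.solve(A, b))
assert np.allclose(A @ x, b)
```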

Invertibility Matrix A cannot be inverted if it has: more rows than columns; more columns than rows; redundant rows/columns (linearly dependent; "low rank" < full rank); or the number 0 as an eigenvalue of A. A non-invertible matrix is called a singular matrix; an invertible matrix is called non-singular.

Linear Dependence and Span A linear combination of vectors {v^(1), …, v^(n)} is Σ_i c_i v^(i). The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors. The matrix-vector product Ax can be viewed as a linear combination of the column vectors of A: Ax = Σ_i x_i A_{:,i}. The span of the columns of A is called the column space or range of A. A set of vectors is linearly independent if no vector in the set is a linear combination of the others; otherwise the set is linearly dependent.

Rank of a Matrix Rank of a matrix A: the number of linearly independent columns (or rows) of A. For example: what is the rank of a given matrix A, and why?
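A NumPy sketch with an example matrix of my own:

```python
import numpy as np

# The second column is 2x the first, so only two columns are linearly independent
A = np.array([[1., 2., 0.],
              [2., 4., 1.],
              [3., 6., 5.]])
print(np.linalg.matrix_rank(A))  # 2
```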

Norms Functions that measure how large a vector is, similar to a distance between zero and the point represented by the vector. Formally, a norm is any function f that satisfies: f(x) = 0 implies x = 0; f(x + y) ≤ f(x) + f(y) (the triangle inequality); and f(αx) = |α| f(x) for any scalar α.

Norms L^p norm: ||x||_p = (Σ_i |x_i|^p)^{1/p}, for p ≥ 1. Most popular norm: the L^2 norm (p = 2), i.e., the Euclidean distance ||x||_2 = (Σ_i x_i^2)^{1/2}. For a matrix or tensor, the Frobenius norm ||A||_F plays the same role. L^1 norm (p = 1): ||x||_1 = Σ_i |x_i|, called the Manhattan distance. Max norm (p → ∞): ||x||_∞ = max_i |x_i|. The dot product can be written in terms of norms: x^T y = ||x||_2 ||y||_2 cos θ, where θ is the angle between x and y.
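These norms in NumPy (the vector and matrix are arbitrary examples):

```python
import numpy as np

x = np.array([3., -4.])

print(np.linalg.norm(x, 1))       # L1 (Manhattan) norm: |3| + |-4| = 7
print(np.linalg.norm(x, 2))       # L2 (Euclidean) norm: sqrt(9 + 16) = 5
print(np.linalg.norm(x, np.inf))  # max norm: max(|3|, |-4|) = 4

A = np.array([[1., 2.], [3., 4.]])
print(np.linalg.norm(A, 'fro'))   # Frobenius norm: sqrt(1 + 4 + 9 + 16)
```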

Special Matrices and Vectors Diagonal matrix, 2-by-2 example: [a 0; 0 b]. We use diag(v) to denote the square diagonal matrix where v is the vector containing the diagonal elements. The inverse of a diagonal matrix is computed easily: diag(v)^{-1} = diag([1/v_1, …, 1/v_n]^T).
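A quick NumPy check of the diagonal-inverse fact (example vector of my own):

```python
import numpy as np

v = np.array([2., 4., 5.])
D = np.diag(v)  # diag(v): square matrix with v on the diagonal

# diag(v)^{-1} = diag(1/v); no general matrix inversion needed
assert np.allclose(np.linalg.inv(D), np.diag(1.0 / v))
```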

Special Matrices and Vectors Unit vector: a vector with unit norm, ||x||_2 = 1. Symmetric matrix: A = A^T. Orthogonal matrix: a square matrix whose rows and columns are mutually orthonormal, so A^T A = A A^T = I.

Eigenvector and Eigenvalue Eigenvector and eigenvalue of A: a nonzero vector v and a scalar λ such that Av = λv. Eigenvalue/eigenvector pairs are defined only for a square matrix A ∈ R^{n×n}. The number of eigenvalues is n (there may be duplicates), and eigenvalues and eigenvectors can contain real or imaginary numbers.

Intuition A as a vector transformation: for A = [2 1; 1 3] and x = [1 0]^T, Ax = [2 1]^T.

Intuition By definition, eigenvectors remain parallel to themselves ("fixed points" of the direction): A v_1 = λ_1 v_1, e.g., [2 1; 1 3] [0.52 0.85]^T ≈ 3.62 · [0.52 0.85]^T.

Eigendecomposition Eigendecomposition of a matrix: let A be a square (n×n) matrix with n linearly independent eigenvectors (i.e., A is diagonalizable). Then A can be factorized as A = V diag(λ) V^{-1}, where V is the (n×n) matrix whose i-th column is the i-th eigenvector of A, and λ is the vector of corresponding eigenvalues.
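A NumPy sketch using the 2-by-2 matrix from the intuition slides:

```python
import numpy as np

A = np.array([[2., 1.], [1., 3.]])  # the 2x2 matrix from the intuition slides

# Columns of V are eigenvectors; lam holds the matching eigenvalues
lam, V = np.linalg.eig(A)
print(lam)  # approx 1.38 and 3.62 (order not guaranteed)

# A v = lambda v for each eigenpair
for i in range(len(lam)):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])

# A = V diag(lambda) V^{-1} when A has n independent eigenvectors
assert np.allclose(A, V @ np.diag(lam) @ np.linalg.inv(V))
```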

Eigendecomposition Every real symmetric matrix has a real, orthogonal eigendecomposition A = Q Λ Q^T. A real orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (orthonormal vectors): Q^T Q = Q Q^T = I, i.e., Q^T = Q^{-1}.

Eigendecomposition Interpreting matrix-vector multiplication Ax using eigendecomposition: Ax = V diag(λ) V^{-1} x, i.e., express x in the eigenvector basis, scale each coordinate by the corresponding eigenvalue, and map the result back.

Eigendecomposition Understanding optimization of f(x) = x^T A x, subject to ||x||_2 = 1, using eigenvalues and eigenvectors: for a real symmetric A, the maximum of f is the largest eigenvalue, attained when x is the corresponding eigenvector (and the minimum is the smallest eigenvalue).

Eigendecomposition Positive definite matrix: a real symmetric matrix whose eigenvalues are all positive. Positive semidefinite matrix: all eigenvalues ≥ 0. Negative definite matrix: all eigenvalues negative. Negative semidefinite matrix: all eigenvalues ≤ 0. For a positive semidefinite matrix A, ∀x, x^T A x ≥ 0. For a positive definite matrix A, ∀x ≠ 0, x^T A x > 0.
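One way to classify definiteness numerically is to inspect the signs of the eigenvalues; a sketch (the helper definiteness and its tolerance are my own, not from the lecture):

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)  # eigvalsh: eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2., 1.], [1., 3.]])))   # positive definite
print(definiteness(np.array([[1., 0.], [0., -1.]])))  # indefinite
```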

Singular Value Decomposition (SVD) Similar to eigendecomposition, but more general; the matrix need not be square: A = U Λ V^T.

SVD - Definition A [n×m] = U [n×r] Λ [r×r] (V [m×r])^T. A: n×m matrix (e.g., n documents, m terms). U: n×r matrix (n documents, r concepts); its columns are the left singular vectors. Λ: r×r diagonal matrix (the strength of each "concept"), where r is the rank of the matrix. V: m×r matrix (m terms, r concepts); its columns are the right singular vectors.

SVD - Definition A [n×m] = U [n×r] Λ [r×r] (V [m×r])^T, i.e., the n×m matrix A factors into an n×r matrix U, times an r×r diagonal matrix Λ, times an r×m matrix V^T.

SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where: U, Λ, V are unique (*); U and V are column orthonormal (i.e., their columns are unit vectors, orthogonal to each other: U^T U = I, V^T V = I, with I the identity matrix); and the singular values in Λ are positive and sorted in decreasing order.

SVD - Properties A = U Λ V^T = Σ_i σ_i u_i v_i^T, a sum of rank-1 matrices u_i v_i^T weighted by the singular values σ_i. Truncating the sum to the largest k singular values gives the best rank-k approximation to A in the Frobenius norm.
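A NumPy sketch of the compact SVD and the rank-k truncation (random matrix, my own example):

```python
import numpy as np

A = np.random.rand(6, 4)

# Compact SVD: U is n x p, s holds singular values in decreasing order, Vt is p x m
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Keep only the top-k singular triplets: by Eckart-Young this is the
# best rank-k approximation to A in Frobenius norm
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(A - A_k, 'fro'))  # approximation error
```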

SVD and eigendecomposition The left-singular vectors of A are the eigenvectors of A A^T. The right-singular vectors of A are the eigenvectors of A^T A. The non-zero singular values of A are the square roots of the eigenvalues of A^T A.
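These relations are easy to verify numerically (random matrix of my own):

```python
import numpy as np

A = np.random.rand(5, 3)
s = np.linalg.svd(A, compute_uv=False)            # singular values, decreasing

lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A, decreasing
assert np.allclose(s, np.sqrt(lam))               # sigma_i = sqrt(lambda_i)
```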

Moore-Penrose Pseudoinverse Assume we want to solve Ax = y for x. What if A is not invertible? What if A is not square? Still, we can find the best x by using the pseudoinverse: for an (n×m) matrix A, the pseudoinverse A^+ is an (m×n) matrix.

Moore-Penrose Pseudoinverse If the equation has: exactly one solution, the pseudoinverse gives the same result as the inverse; no solution, it gives the solution with the smallest error ||Ax - y||_2 (the over-specified case); many solutions, it gives the solution with the smallest norm ||x||_2 (the under-specified case).

Pseudoinverse: Over-specified case No solution: the pseudoinverse gives the solution with the smallest error ||Ax - y||_2. Example: [3 2]^T [x] = [1 2]^T (i.e., 3x = 1, 2x = 2). The reachable points (3x, 2x) form a line that cannot hit the desirable point y = (1, 2), so we pick the closest reachable point: ([3 2]^T)^+ [1 2]^T = [3/13 2/13] [1 2]^T = 7/13. This method is called least squares.

Pseudoinverse: Under-specified case Many solutions: the pseudoinverse gives the solution with the smallest norm ||x||_2. Example: [1 2] [w z]^T = 4 (i.e., 1·w + 2·z = 4). Among all possible solutions on this line, the pseudoinverse picks the shortest-length one: [1 2]^+ · 4 = [1/5 2/5]^T · 4 = [4/5 8/5]^T.

Computing the Pseudoinverse The SVD allows the computation of the pseudoinverse: A^+ = V Λ^+ U^T, where Λ^+ is obtained by taking the reciprocal of each non-zero singular value in Λ and transposing the result.
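A sketch of this computation in NumPy, checked against np.linalg.pinv and reusing the slide's over-specified example:

```python
import numpy as np

A = np.array([[3.], [2.]])   # the over-specified example: 3x = 1, 2x = 2
y = np.array([1., 2.])

# A+ = V diag(1/sigma) U^T, using reciprocals of the non-zero singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A))  # matches NumPy's built-in pinv
print(A_pinv @ y)                              # [0.53846...] = 7/13
```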

Trace Tr(A) = Σ_i A_{i,i}. Tr(ABC) = Tr(CAB) = Tr(BCA) (invariance to cyclic permutation). Tr(A) = Tr(A^T). ||A||_F = sqrt(Tr(A A^T)). For a scalar a, Tr(a) = a. The trace of a matrix also equals the sum of its eigenvalues.
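A NumPy check of these trace identities (random matrices, my own example):

```python
import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
C = np.random.rand(3, 3)

assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))  # cyclic invariance
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T)))

# Trace equals the sum of the eigenvalues
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)
```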

What you need to know Linear algebra is crucial for understanding the notation and mechanisms of many ML/deep learning methods. Important concepts: matrix product, identity, invertibility, norms, eigendecomposition, singular value decomposition, pseudoinverse, and trace.

Questions?