Linear Algebra. Lecture slides for Chapter 2 of Deep Learning Ian Goodfellow


Linear Algebra. Lecture slides for Chapter 2 of Deep Learning. Ian Goodfellow. 2016-06-24

About this chapter. Not a comprehensive survey of all of linear algebra. Focused on the subset most relevant to deep learning. For a larger subset see, e.g., Linear Algebra by Georgi Shilov.

Scalars. A scalar is a single number: integers, real numbers, rational numbers, etc. We denote it with italic font: $a$, $n$, $x$.

Vectors. A vector is a 1-D array of numbers:

$x = [x_1, x_2, \ldots, x_n]^\top$ (2.1)

Can be real, binary, integer, etc. Example notation for type and size: $x \in \mathbb{R}^n$.

We can index a vector with a set of indices and write the set as a subscript: defining $S = \{1, 3, 6\}$, $x_S$ contains the elements $x_1$, $x_3$ and $x_6$. We use the $-$ sign to index the complement of a set: $x_{-1}$ is the vector containing all elements of $x$ except for $x_1$, and $x_{-S}$ contains all elements of $x$ except $x_1$, $x_3$ and $x_6$.

Matrices. A matrix is a 2-D array of numbers, so each element is identified by two indices: the first gives the row, the second the column. We usually give matrices upper-case variable names, such as $A$. When we need to explicitly identify the elements of a matrix, we write them as an array enclosed in square brackets:

$A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}$ (2.2)

Example notation for type and shape: if a real-valued matrix $A$ has $m$ rows and $n$ columns, we say that $A \in \mathbb{R}^{m \times n}$. Sometimes we need to index matrix-valued expressions that are not just a single letter; in this case we use subscripts after the expression without converting anything to lower case, e.g. $f(A)_{i,j}$ gives element $(i, j)$ of the matrix computed by applying the function $f$ to $A$.

Tensors. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. A tensor is an array of numbers that may have: zero dimensions, and be a scalar; one dimension, and be a vector; two dimensions, and be a matrix; or more dimensions. We identify the element of a tensor $\mathsf{A}$ at coordinates $(i, j, k)$ by writing $\mathsf{A}_{i,j,k}$.
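To make these definitions concrete, here is a minimal NumPy sketch (not part of the original slides) showing a scalar, a vector, a matrix, and a 3-axis tensor as arrays of increasing dimensionality:

```python
import numpy as np

s = np.array(5.0)                       # scalar: zero dimensions
v = np.array([1.0, 2.0, 3.0])           # vector: one dimension
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])              # matrix: two dimensions
T = np.zeros((2, 3, 4))                 # tensor: three dimensions

for name, a in [("scalar", s), ("vector", v), ("matrix", M), ("tensor", T)]:
    print(name, "ndim =", a.ndim, "shape =", a.shape)
```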

Matrix Transpose. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal, running down and to the right starting from its upper left corner (Fig. 2.1). We denote the transpose of $A$ as $A^\top$, and it is defined such that

$(A^\top)_{i,j} = A_{j,i}$ (2.3)

Matrix operations have many useful properties that make mathematical analysis more convenient. Matrix multiplication is distributive and associative:

$A(B + C) = AB + AC$ (2.6)
$A(BC) = (AB)C$ (2.7)

It is not commutative (the condition $AB = BA$ does not always hold), unlike scalar multiplication. However, the dot product between two vectors is commutative:

$x^\top y = y^\top x$ (2.8)

The transpose of a matrix product has a simple form:

$(AB)^\top = B^\top A^\top$ (2.9)

Vectors can be thought of as matrices that contain only one column; the transpose of a vector is therefore a matrix with only one row.

Matrix (Dot) Product.

$C = AB$ (2.4)

is defined by

$C_{i,j} = \sum_k A_{i,k} B_{k,j}$ (2.5)

The inner shapes must match: an $m \times n$ matrix times an $n \times p$ matrix yields an $m \times p$ matrix.
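As a sanity check, a minimal NumPy sketch (not from the slides) that implements Eq. 2.5 directly with explicit summation and compares it against the built-in matrix product:

```python
import numpy as np

def matmul_from_definition(A, B):
    """Matrix product via Eq. 2.5: C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.random.rand(3, 4)   # m x n
B = np.random.rand(4, 2)   # n x p
assert np.allclose(matmul_from_definition(A, B), A @ B)   # result is m x p
```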

Identity Matrix. Matrix inversion is a powerful tool that allows us to analytically solve $Ax = b$ for many values of $A$. To describe matrix inversion, we first need to define the concept of an identity matrix: a matrix that does not change any vector when we multiply that vector by it. We denote the identity matrix that preserves $n$-dimensional vectors as $I_n \in \mathbb{R}^{n \times n}$. Formally,

$\forall x \in \mathbb{R}^n, \; I_n x = x$ (2.20)

The structure of the identity matrix is simple: all of the entries along the main diagonal are 1, while all of the other entries are zero, e.g. $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ (Fig. 2.2). Matrix-vector product notation provides a more compact representation for systems of equations of the form

$A_{1,1} x_1 + A_{1,2} x_2 + \cdots + A_{1,n} x_n = b_1$
$\vdots$
$A_{m,1} x_1 + A_{m,2} x_2 + \cdots + A_{m,n} x_n = b_m$

Systems of Equations.

$Ax = b$ (2.11)

where $A \in \mathbb{R}^{m \times n}$ is a known matrix, $b \in \mathbb{R}^m$ is a known vector, and $x \in \mathbb{R}^n$ is a vector of unknown variables. This expands to

$A_{1,:} x = b_1$ (2.12)
$A_{2,:} x = b_2$ (2.13)
$\cdots$ (2.14)
$A_{m,:} x = b_m$ (2.15)

Solving Systems of Equations. A linear system of equations can have: no solution; many solutions; or exactly one solution, which means multiplication by the matrix is an invertible function.

Matrix Inversion. The matrix inverse of $A$ is denoted as $A^{-1}$, and it is defined as the matrix such that

$A^{-1} A = I_n$ (2.21)

Solving a system using an inverse, we can solve Eq. 2.11 by the following steps:

$Ax = b$ (2.22)
$A^{-1} A x = A^{-1} b$ (2.23)
$I_n x = A^{-1} b$ (2.24)
$x = A^{-1} b$

Numerically unstable, but useful for abstract analysis.
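A minimal NumPy sketch (not from the slides) contrasting the two routes; `np.linalg.solve` factorizes $A$ rather than forming $A^{-1}$ explicitly, which is why it is preferred in numerical practice:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Textbook route: form the inverse explicitly (fine for abstract
# analysis, but less stable and more expensive numerically).
x_via_inverse = np.linalg.inv(A) @ b

# Preferred route: solve the system directly via factorization.
x_via_solve = np.linalg.solve(A, b)

assert np.allclose(x_via_inverse, x_via_solve)
assert np.allclose(A @ x_via_solve, b)   # check that Ax = b holds
```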

Invertibility. A matrix can't be inverted if it has: more rows than columns; more columns than rows; or redundant rows/columns ("linearly dependent", "low rank").
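A small illustration, assuming NumPy, of detecting a rank-deficient (and therefore non-invertible) matrix before attempting inversion:

```python
import numpy as np

# Square but rank-deficient: the second row is twice the first,
# so the rows are linearly dependent and no inverse exists.
A_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])
print(np.linalg.matrix_rank(A_singular))   # 1 < 2, i.e. low rank

try:
    np.linalg.inv(A_singular)
except np.linalg.LinAlgError as e:
    print("not invertible:", e)

# Non-square matrices (more rows than columns, or vice versa)
# have no two-sided inverse at all.
```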

Norms. Functions that measure how large a vector is. Similar to a distance between zero and the point represented by the vector. A norm is any function $f$ satisfying:

$f(x) = 0 \Rightarrow x = 0$
$f(x + y) \le f(x) + f(y)$ (the triangle inequality)
$\forall \alpha \in \mathbb{R}, \; f(\alpha x) = |\alpha| f(x)$

Norms. The $L^p$ norm is given by

$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$ (2.30)

for $p \in \mathbb{R}$, $p \ge 1$. Most popular norm: the $L^2$ norm, $p = 2$. The $L^1$ norm, $p = 1$, simplifies to

$\|x\|_1 = \sum_i |x_i|$ (2.31)

The $L^1$ norm is commonly used in machine learning when the difference between zero and nonzero elements is very important: every time an element of $x$ moves away from 0 by $\epsilon$, the $L^1$ norm increases by $\epsilon$. The max norm, infinite $p$, simplifies to the absolute value of the element with the largest magnitude in the vector:

$\|x\|_\infty = \max_i |x_i|$ (2.32)

We sometimes measure the size of a vector by counting its number of nonzero elements. Some authors refer to this function as the "$L^0$ norm", but this is incorrect terminology: the number of nonzero entries is not a norm, because scaling the vector by $\alpha$ does not change the number of nonzero entries. The $L^1$ norm is often used as a substitute for the number of nonzero entries. Sometimes we may also wish to measure the size of a matrix; in the context of deep learning, the most common way to do this is with the Frobenius norm, $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$.
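A minimal NumPy sketch (not from the slides) computing these norms for a concrete vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.sum(np.abs(x))            # L1 norm: 7.0
l2 = np.sqrt(np.sum(x ** 2))      # L2 norm: 5.0
max_norm = np.max(np.abs(x))      # max norm: 4.0

# Same values via np.linalg.norm:
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(max_norm, np.linalg.norm(x, np.inf))

# "L0" (not actually a norm): number of nonzero entries.
print(np.count_nonzero(x))        # 2
```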

Special Matrices and Vectors. A unit vector is a vector with unit norm:

$\|x\|_2 = 1$ (2.36)

A vector $x$ and a vector $y$ are orthogonal to each other if $x^\top y = 0$. If both vectors have nonzero norm, this means they are at a 90 degree angle to each other. In $\mathbb{R}^n$, at most $n$ vectors may be mutually orthogonal with nonzero norm. If the vectors are not only orthogonal but also have unit norm, we call them orthonormal. A symmetric matrix is any matrix that is equal to its own transpose:

$A = A^\top$ (2.35)

Symmetric matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments. For example, in a matrix of distance measurements, with $A_{i,j}$ giving the distance from point $i$ to point $j$, we have $A_{i,j} = A_{j,i}$ because distance functions are symmetric. An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

$A^\top A = A A^\top = I$ (2.37)

This implies that

$A^{-1} = A^\top$ (2.38)

so orthogonal matrices are of interest because their inverse is very cheap to compute. Non-square diagonal matrices do not have inverses, but it is still possible to multiply by them cheaply: for a non-square diagonal matrix $D$, computing $Dx$ involves scaling each element of $x$, and either concatenating some zeros to the result if $D$ is taller than it is wide, or discarding some of the last elements if $D$ is wider than it is tall.
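A small NumPy check (not from the slides) of these definitions, using a rotation matrix as the example orthogonal matrix:

```python
import numpy as np

theta = np.pi / 4
# 2-D rotation matrices are a classic example of orthogonal matrices.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(Q.T @ Q, np.eye(2))     # columns are orthonormal
assert np.allclose(Q @ Q.T, np.eye(2))     # rows are orthonormal
assert np.allclose(np.linalg.inv(Q), Q.T)  # the inverse is just the transpose

# A symmetric matrix equals its own transpose.
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert np.allclose(S, S.T)
```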

Trace.

$\mathrm{Tr}(A) = \sum_i A_{i,i}$ (2.48)

The trace is invariant under cyclic permutation of the factors:

$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$ (2.51)
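A quick numerical check of the cyclic property, assuming NumPy; note that the three products have different shapes, yet the traces agree:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)
C = np.random.rand(4, 2)

# Cyclic permutations leave the trace unchanged even though the
# intermediate shapes differ: ABC is 2x2, CAB is 4x4, BCA is 3x3.
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
assert np.isclose(t1, t2) and np.isclose(t2, t3)
```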

Learning linear algebra. Do a lot of practice problems. Start out with lots of summation signs and indexing into individual entries. Eventually you will be able to mostly use matrix and vector product notation quickly and easily.