Mathematics Review. Chapter Preliminaries: Numbers and Sets


Mathematics Review

In this chapter we review notions from linear algebra and multivariable calculus that will figure into our discussion of computational techniques. It is intended as a review of background material with a bias toward ideas and interpretations commonly encountered in practice; the chapter can safely be skipped or used as a reference by students with a stronger background in mathematics.

1 Preliminaries: Numbers and Sets

Rather than considering algebraic (and at times philosophical) discussions like "What is a number?", we will rely on intuition and mathematical common sense to define a few sets:

- The natural numbers N = {1, 2, 3, ...}
- The integers Z = {..., -2, -1, 0, 1, 2, ...}
- The rational numbers Q = {a/b : a, b ∈ Z, b ≠ 0}¹
- The real numbers R, encompassing Q as well as irrational numbers like π and √2
- The complex numbers C = {a + bi : a, b ∈ R}, where we think of i as satisfying i² = -1

It is worth acknowledging that our definition of R is far from rigorous. The construction of the real numbers can be an important topic for practitioners of cryptography techniques that make use of alternative number systems, but these intricacies are irrelevant for the discussion at hand.

As with any other sets, N, Z, Q, R, and C can be manipulated using generic operations to generate new sets of numbers. In particular, recall that we can define the Euclidean product of two sets A and B as

    A × B = {(a, b) : a ∈ A and b ∈ B}.

We can take powers of sets by writing

    A^n = A × A × ... × A   (n times).

¹ This is the first of many times that we will use the notation {A : B}; the braces denote a set, and the colon can be read as "such that." For instance, the definition of Q can be read as "the set of fractions a/b such that a and b are integers." As a second example, we could write N = {n ∈ Z : n > 0}.
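These set constructions are easy to experiment with concretely. The following minimal sketch (in Python; the language choice and the variable names are ours, not the text's) builds a Euclidean product and a set power:

    from itertools import product

    A = {1, 2, 3}
    B = {'x', 'y'}

    # Euclidean product A x B = {(a, b) : a in A and b in B}
    AxB = set(product(A, B))
    print(len(AxB))                 # 6 pairs

    # A^n as the n-fold product A x A x ... x A
    A3 = set(product(A, repeat=3))
    print(len(A3))                  # 27 triples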

This construction yields what will become our favorite set of numbers in chapters to come:

    R^n = {(a₁, a₂, ..., aₙ) : aᵢ ∈ R for all i}.

2 Vector Spaces

Introductory linear algebra courses easily could be titled "Introduction to Finite-Dimensional Vector Spaces." Although the definition of a vector space might appear abstract, we will find many concrete applications that all satisfy the formal aspects and thus can benefit from the machinery we will develop.

2.1 Defining Vector Spaces

We begin by defining a vector space and providing a number of examples.

Definition 1 (Vector space). A vector space is a set V that is closed under scalar multiplication and addition.

For our purposes, a scalar is a number in R, and the addition and multiplication operations satisfy the usual axioms (commutativity, associativity, and so on). It is usually straightforward to spot vector spaces in the wild, including the following examples:

Example 1 (R^n as a vector space). The most common example of a vector space is R^n. Here, addition and scalar multiplication happen component-by-component:

    (1, 2) + (-3, 4) = (1 - 3, 2 + 4) = (-2, 6)
    10 · (-1, 1) = (10 · -1, 10 · 1) = (-10, 10)

Example 2 (Polynomials). A second important example of a vector space is the ring of polynomials with real number inputs, denoted R[x]. A polynomial p ∈ R[x] is a function p : R → R taking the form²

    p(x) = ∑ₖ aₖ xᵏ.

Addition and scalar multiplication are carried out in the usual way; e.g., if p(x) = x² + 2x - 1 and q(x) = x³, then 3p(x) + 5q(x) = 5x³ + 3x² + 6x - 3, which is another polynomial. As an aside, for future examples note that functions like p(x) = (x - 1)(x + 1) + x²(x³ - 5) are still polynomials even though they are not explicitly written in the form above.

Elements v ∈ V of a vector space V are called vectors, and a weighted sum of the form ∑ᵢ aᵢvᵢ, where aᵢ ∈ R and vᵢ ∈ V, is known as a linear combination of the vᵢ's. In our second example, the vectors are functions, although we do not normally use this language to discuss R[x]. One way to link these two viewpoints would be to identify the polynomial ∑ₖ aₖxᵏ with the sequence (a₀, a₁, a₂, ...); remember that polynomials have finitely many terms, so the sequence eventually ends in a string of zeros.

² The notation f : A → B means f is a function that takes as input an element of set A and outputs an element of set B. For instance, f : R → Z takes as input a real number in R and outputs an integer in Z, as might be the case for f(x) = ⌊x⌋, the "round down" function.
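As a quick concrete companion to Examples 1 and 2, here is a minimal sketch (Python with NumPy; the tools and names are our own choice, not the text's) that reproduces the arithmetic above and treats polynomials as coefficient sequences:

    import numpy as np

    # Component-wise addition and scalar multiplication in R^2 (Example 1).
    a = np.array([1.0, 2.0])
    b = np.array([-3.0, 4.0])
    print(a + b)                          # [-2.  6.]
    print(10 * np.array([-1.0, 1.0]))     # [-10.  10.]

    # Polynomials as coefficient sequences: 3(x^2 + 2x - 1) + 5x^3 (Example 2).
    p = np.array([-1.0, 2.0, 3.0, 0.0])   # coefficients of 1, x, x^2, x^3
    q = np.array([0.0, 0.0, 0.0, 1.0])
    print(3 * p + 5 * q)                  # [-3.  6.  3.  5.]  i.e. 5x^3 + 3x^2 + 6x - 3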

2.2 Span, Linear Independence, and Bases

Suppose we start with vectors v₁, ..., vₖ ∈ V for a vector space V. By Definition 1, we have two ways to start with these vectors and construct new elements of V: addition and scalar multiplication. The idea of span is that it describes all of the vectors you can reach via these two operations:

Definition 2 (Span). The span of a set S ⊆ V of vectors is the set

    span S ≡ {a₁v₁ + ... + aₖvₖ : k ≥ 0, vᵢ ∈ S for all i, and aᵢ ∈ R for all i}.

Notice that span S is a subspace of V, that is, a subset of V that is itself a vector space. We can provide a few examples:

Example 3 (Mixology). The typical well at a cocktail bar contains at least four ingredients at the bartender's disposal: vodka, tequila, orange juice, and grenadine. Assuming we have this simple well, we can represent drinks as points in R^4, with one slot for each ingredient. For instance, a typical tequila sunrise can be represented using the point (0, 1.5, 6, 0.75), giving the amounts of vodka, tequila, orange juice, and grenadine (in ounces), respectively. The set of drinks that can be made with the typical well is contained in

    span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)},

that is, all combinations of the four basic ingredients. A bartender looking to save time, however, might notice that many drinks have the same orange juice to grenadine ratio and mix those two bottles. The new simplified well may be easier for pouring but can make fundamentally fewer drinks:

    span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 6, 0.75)}.

Example 4 (Polynomials). Define pₖ(x) ≡ xᵏ. Then, it is easy to see that R[x] = span {pₖ : k ≥ 0}. Make sure you understand the notation well enough to see why this is the case.

Adding another item to a set of vectors does not always increase the size of its span. For instance, in R^2 it is clearly the case that

    span {(1, 0), (0, 1)} = span {(1, 0), (0, 1), (1, 1)}.

In this case, we say that the set {(1, 0), (0, 1), (1, 1)} is linearly dependent:

Definition 3 (Linear dependence). We provide three equivalent definitions. A set S ⊆ V of vectors is linearly dependent if:

1. One of the elements of S can be written as a linear combination of the other elements, or S contains zero.
2. There exists a non-empty linear combination of elements vₖ ∈ S yielding ∑ₖ₌₁ᵐ cₖvₖ = 0 where cₖ ≠ 0 for all k.
3. There exists v ∈ S such that span S = span S\{v}. That is, we can remove a vector from S without affecting its span.

If S is not linearly dependent, then we say it is linearly independent.
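A practical way to test claims like these numerically is to stack the vectors into a matrix and compare ranks. A minimal sketch (NumPy; the rank-based test is our own illustration, not part of the text):

    import numpy as np

    # span{(1,0),(0,1)} versus span{(1,0),(0,1),(1,1)}: the rank does not grow,
    # so the third vector adds nothing and the larger set is linearly dependent.
    S  = np.array([[1.0, 0.0], [0.0, 1.0]])
    S2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(np.linalg.matrix_rank(S), np.linalg.matrix_rank(S2))   # 2 2

    # The simplified well from Example 3 spans only a 3-dimensional subspace of R^4,
    # so some drinks become unreachable.
    well = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 6, 0.75]], dtype=float)
    print(np.linalg.matrix_rank(well))                           # 3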

Providing proof or informal evidence that each definition is equivalent to its counterparts (in an "if and only if" fashion) is a worthwhile exercise for students less comfortable with notation and abstract mathematics.

The concept of linear dependence leads to an idea of redundancy in a set of vectors. In this sense, it is natural to ask how large a set we can choose before adding another vector cannot possibly increase the span. In particular, suppose we have a linearly independent set S ⊆ V, and now we choose an additional vector v ∈ V. Adding v to S leads to one of two possible outcomes:

1. The span of S ∪ {v} is larger than the span of S.
2. Adding v to S has no effect on the span.

The dimension of V is nothing more than the maximal number of times we can get outcome 1, add v to S, and repeat.

Definition 4 (Dimension and basis). The dimension of V is the maximal size |S| of a linearly independent set S ⊆ V such that span S = V. Any set S satisfying this property is called a basis for V.

Example 5 (R^n). The standard basis for R^n is the set of vectors of the form

    eₖ ≡ (0, ..., 0, 1, 0, ..., 0),

with k - 1 zeros before the 1 and n - k zeros after it; that is, eₖ has all zeros except for a single one in the k-th slot. It is clear that these vectors are linearly independent and form a basis; for example, in R^3 any vector (a, b, c) can be written as a e₁ + b e₂ + c e₃. Thus, the dimension of R^n is n, as we would expect.

Example 6 (Polynomials). It is clear that the set {1, x, x², x³, ...} is a linearly independent set of polynomials spanning R[x]. Notice that this set is infinitely large, and thus the dimension of R[x] is ∞.

2.3 Our Focus: R^n

Of particular importance for our purposes is the vector space R^n, the so-called n-dimensional Euclidean space. This is nothing more than the set of coordinate axes encountered in high school math classes:

- R^1 ≡ R is the number line
- R^2 is the two-dimensional plane with coordinates (x, y)
- R^3 represents three-dimensional space with coordinates (x, y, z)

Nearly all methods in this course will deal with transformations and functions on R^n.
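A small sketch making the standard-basis expansion of Example 5 concrete (NumPy; our own choice of tools):

    import numpy as np

    n = 3
    E = np.eye(n)                   # rows are the standard basis e_1, ..., e_n
    v = np.array([4.0, -2.0, 7.0])

    # Any vector (a, b, c) in R^3 equals a*e_1 + b*e_2 + c*e_3.
    recombined = sum(v[k] * E[k] for k in range(n))
    print(np.allclose(recombined, v))   # True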

For convenience, we usually write vectors in R^n in column form, as follows:

    (a₁, ..., aₙ) ≡ [ a₁ ]
                    [ a₂ ]
                    [ ⋮  ]
                    [ aₙ ]

This notation will let us include vectors as special cases of the matrices discussed below. Unlike some vector spaces, R^n has not only a vector space structure but also one additional construction that makes all the difference: the dot product.

Definition 5 (Dot product). The dot product of two vectors a = (a₁, ..., aₙ) and b = (b₁, ..., bₙ) in R^n is given by

    a · b = ∑ₖ₌₁ⁿ aₖbₖ.

Example 7 (R^2). The dot product of (1, 2) and (-2, 6) is 1 · -2 + 2 · 6 = -2 + 12 = 10.

The dot product is an example of an inner product, and its existence gives a notion of geometry to R^n. For instance, we can use the Pythagorean theorem to define the norm, or length, of a vector a as

    ‖a‖ ≡ √(a₁² + ... + aₙ²) = √(a · a).

Then, the distance between two points a, b ∈ R^n is simply ‖b - a‖.

Dot products yield not only notions of lengths and distances but also of angles. Recall the following identity from trigonometry for a, b ∈ R^3:

    a · b = ‖a‖ ‖b‖ cos θ,

where θ is the angle between a and b. For n ≥ 4, however, the notion of "angle" is much harder to visualize in R^n. We might define the angle θ between a and b to be the value

    θ ≡ arccos( a · b / (‖a‖ ‖b‖) ).

We must do our homework before making such a definition! In particular, recall that cosine outputs values in the interval [-1, 1], so we must check that the input to arc cosine (also notated cos⁻¹) is in this interval; thankfully, the well-known Cauchy-Schwarz inequality |a · b| ≤ ‖a‖ ‖b‖ guarantees exactly this property.

When a = cb for some c > 0, we have θ = arccos 1 = 0, as we would expect: the angle between parallel vectors is zero. What does it mean for vectors to be perpendicular? Let's substitute θ = 90°. Then, we have

    0 = cos 90° = a · b / (‖a‖ ‖b‖).

Multiplying both sides by ‖a‖ ‖b‖ motivates the definition:

Definition 6 (Orthogonality). Two vectors a, b ∈ R^n are perpendicular, or orthogonal, when a · b = 0.

This definition is somewhat surprising from a geometric standpoint. In particular, we have managed to define what it means to be perpendicular without any explicit use of angles. This construction will make it easier to solve certain problems for which the nonlinearity of sine and cosine might have created headaches in simpler settings.
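The quantities just defined translate directly into code. A minimal sketch (NumPy; the clipping of the arccos argument is a floating-point safeguard we add, not part of the text):

    import numpy as np

    a = np.array([1.0, 2.0])
    b = np.array([-2.0, 6.0])

    print(np.dot(a, b))                  # 10.0, as in Example 7
    print(np.linalg.norm(a))             # sqrt(1^2 + 2^2)

    # Angle via arccos(a.b / (|a||b|)); Cauchy-Schwarz keeps the argument in [-1, 1].
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

    # Orthogonality test: a.b == 0
    print(np.isclose(np.dot(np.array([1.0, 0.0]), np.array([0.0, 3.0])), 0.0))  # True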

Aside 1. There are many theoretical questions to ponder here, some of which we will address in future chapters when they are better motivated:

- Do all vector spaces admit dot products or similar structures?
- Do all finite-dimensional vector spaces admit dot products?
- What might be a reasonable dot product between elements of R[x]?

Intrigued students can consult texts on real and functional analysis.

3 Linearity

A function between vector spaces that preserves structure is known as a linear function:

Definition 7 (Linearity). Suppose V and V′ are vector spaces. Then, L : V → V′ is linear if it satisfies the following two criteria for all v, v₁, v₂ ∈ V and c ∈ R:

- L preserves sums: L[v₁ + v₂] = L[v₁] + L[v₂]
- L preserves scalar products: L[cv] = cL[v]

It is easy to generate linear maps between vector spaces, as we can see in the following examples:

Example 8 (Linearity in R^n). The following map f : R^2 → R^3 is linear:

    f(x, y) = (3x, 2x + y, -y).

We can check linearity as follows:

- Sum preservation:
      f(x₁ + x₂, y₁ + y₂) = (3(x₁ + x₂), 2(x₁ + x₂) + (y₁ + y₂), -(y₁ + y₂))
                          = (3x₁, 2x₁ + y₁, -y₁) + (3x₂, 2x₂ + y₂, -y₂)
                          = f(x₁, y₁) + f(x₂, y₂)
- Scalar product preservation:
      f(cx, cy) = (3cx, 2cx + cy, -cy) = c(3x, 2x + y, -y) = c f(x, y)

Contrastingly, g(x, y) ≡ xy² is not linear. For instance, g(1, 1) = 1 but g(2, 2) = 8 ≠ 2g(1, 1), so this form does not preserve scalar products.
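Checking the two linearity criteria at a few sample points is a quick sanity test (not a proof). A minimal sketch using the maps from Example 8 (NumPy; the names and random seed are ours):

    import numpy as np

    def f(v):                      # the linear map from Example 8
        x, y = v
        return np.array([3*x, 2*x + y, -y])

    def g(v):                      # the nonlinear map g(x, y) = x*y^2
        x, y = v
        return x * y**2

    rng = np.random.default_rng(0)
    u, w = rng.standard_normal(2), rng.standard_normal(2)
    c = 2.0

    print(np.allclose(f(u + w), f(u) + f(w)))   # True: sum preservation
    print(np.allclose(f(c*u), c*f(u)))          # True: scalar preservation
    print(np.isclose(g(c*u), c*g(u)))           # False in general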

Example 9 (Integration). The following functional L from R[x] to R is linear:

    L[p(x)] ≡ ∫₀¹ p(x) dx.

This somewhat more abstract example maps polynomials p(x) to real numbers L[p(x)]. For example, we can write

    L[3x² + 2x - 1] = ∫₀¹ (3x² + 2x - 1) dx = 1.

Linearity comes from the following well-known facts from calculus:

    ∫₀¹ c f(x) dx = c ∫₀¹ f(x) dx
    ∫₀¹ [f(x) + g(x)] dx = ∫₀¹ f(x) dx + ∫₀¹ g(x) dx

We can write a particularly nice form for linear maps on R^n. Recall that the vector a = (a₁, ..., aₙ) is equal to the sum ∑ₖ aₖeₖ, where eₖ is the k-th standard basis vector. Then, if L is linear we know:

    L[a] = L[∑ₖ aₖeₖ]     for the standard basis eₖ
         = ∑ₖ L[aₖeₖ]     by sum preservation
         = ∑ₖ aₖ L[eₖ]    by scalar product preservation

This derivation shows the following important fact: L is completely determined by its action on the standard basis vectors eₖ. That is, for any vector a ∈ R^n, we can use the sum above to determine L[a] by linearly combining L[e₁], ..., L[eₙ].

Example 10 (Expanding a linear map). Recall the map in Example 8 given by f(x, y) = (3x, 2x + y, -y). We have f(e₁) = f(1, 0) = (3, 2, 0) and f(e₂) = f(0, 1) = (0, 1, -1). Thus, the formula above shows:

    f(x, y) = x f(e₁) + y f(e₂) = x (3, 2, 0) + y (0, 1, -1).
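The fact that a linear map is determined by its action on the standard basis translates directly into code: the columns of its matrix are f(e₁), ..., f(eₙ). A minimal sketch for the map of Example 8 (NumPy; our own illustration):

    import numpy as np

    def f(v):
        x, y = v
        return np.array([3*x, 2*x + y, -y])

    # Build the matrix of f column-by-column from its action on e_1, e_2.
    E = np.eye(2)
    A = np.column_stack([f(E[:, k]) for k in range(2)])
    print(A)                          # [[ 3.  0.] [ 2.  1.] [ 0. -1.]]

    v = np.array([5.0, -4.0])
    print(np.allclose(A @ v, f(v)))   # True: the matrix reproduces f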

3.1 Matrices

The expansion of linear maps above suggests one of many contexts in which it is useful to store multiple vectors in the same structure. More generally, say we have n vectors v₁, ..., vₙ ∈ R^m. We can write each as a column vector:

    v₁ = [ v₁₁ ]    v₂ = [ v₁₂ ]    ...    vₙ = [ v₁ₙ ]
         [ v₂₁ ]         [ v₂₂ ]                [ v₂ₙ ]
         [  ⋮  ]         [  ⋮  ]                [  ⋮  ]
         [ vₘ₁ ]         [ vₘ₂ ]                [ vₘₙ ]

Carrying these around separately can be cumbersome notationally, so to simplify matters we combine them into a single m × n matrix:

    [ v₁ v₂ ... vₙ ] = [ v₁₁ v₁₂ ... v₁ₙ ]
                       [ v₂₁ v₂₂ ... v₂ₙ ]
                       [  ⋮   ⋮       ⋮  ]
                       [ vₘ₁ vₘ₂ ... vₘₙ ]

We will call the space of such matrices R^{m×n}.

Example 11 (Identity matrix). We can store the standard basis for R^n in the n × n identity matrix I_{n×n} given by:

    I_{n×n} ≡ [ e₁ e₂ ... eₙ ] = [ 1 0 ... 0 ]
                                 [ 0 1 ... 0 ]
                                 [ ⋮ ⋮     ⋮ ]
                                 [ 0 0 ... 1 ]

Since we constructed matrices as convenient ways to store sets of vectors, we can use multiplication to express how they can be combined linearly. In particular, a matrix in R^{m×n} can be multiplied by a column vector in R^n as follows:

    [ v₁ v₂ ... vₙ ] [ c₁ ]  ≡  c₁v₁ + c₂v₂ + ... + cₙvₙ.
                     [ c₂ ]
                     [ ⋮  ]
                     [ cₙ ]

Expanding this sum yields the following explicit formula for matrix-vector products:

    [ v₁₁ v₁₂ ... v₁ₙ ] [ c₁ ]   [ c₁v₁₁ + c₂v₁₂ + ... + cₙv₁ₙ ]
    [ v₂₁ v₂₂ ... v₂ₙ ] [ c₂ ] = [ c₁v₂₁ + c₂v₂₂ + ... + cₙv₂ₙ ]
    [  ⋮   ⋮       ⋮  ] [ ⋮  ]   [              ⋮              ]
    [ vₘ₁ vₘ₂ ... vₘₙ ] [ cₙ ]   [ c₁vₘ₁ + c₂vₘ₂ + ... + cₙvₘₙ ]

Example 12 (Identity matrix multiplication). It is clearly true that for any x ∈ R^n, we can write x = I_{n×n} x, where I_{n×n} is the identity matrix from Example 11.

Example 13 (Linear map). We return once again to the map from Example 8 to show one more alternative form:

    f(x, y) = [ 3  0 ] [ x ]
              [ 2  1 ] [ y ]
              [ 0 -1 ]

We similarly define a product between a matrix M ∈ R^{m×n} and another matrix in R^{n×p} by concatenating individual matrix-vector products:

    M [ c₁ c₂ ... cₚ ] ≡ [ Mc₁ Mc₂ ... Mcₚ ].
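A short sketch verifying both product definitions numerically (NumPy; random matrices of our own choosing):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 3))
    c = rng.standard_normal(3)

    # A @ c is exactly the linear combination c_1*A[:,0] + c_2*A[:,1] + c_3*A[:,2].
    combo = sum(c[k] * A[:, k] for k in range(3))
    print(np.allclose(A @ c, combo))      # True

    # Matrix-matrix product = matrix-vector products applied column by column.
    C = rng.standard_normal((3, 2))
    cols = np.column_stack([A @ C[:, j] for j in range(2)])
    print(np.allclose(A @ C, cols))       # True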

Example 14 (Mixology). Continuing Example 3, suppose we make a tequila sunrise as well as a second concoction containing equal parts of the two liquors in our simplified well. To find out how much of the basic ingredients are contained in each order, we can combine the recipes for each drink column-wise and use matrix multiplication. The wells (columns: Well 1, Well 2, Well 3; rows: vodka, tequila, OJ, grenadine) are

    W = [ 1    0    0    ]
        [ 0    1    0    ]
        [ 0    0    6    ]
        [ 0    0    0.75 ]

and the drinks expressed in "well units" (columns: Drink 1 = tequila sunrise, Drink 2; rows: Well 1, Well 2, Well 3) are

    D = [ 0    0.75 ]
        [ 1.5  0.75 ]
        [ 1    1    ]

so the ingredient amounts (in ounces) per drink are

    W D = [ 0     0.75 ]   (vodka)
          [ 1.5   0.75 ]   (tequila)
          [ 6     6    ]   (OJ)
          [ 0.75  0.75 ]   (grenadine)

In general, we will use capital letters to represent matrices, like A ∈ R^{m×n}. We will use the notation Aᵢⱼ ∈ R to denote the element of A at row i and column j.

3.2 Scalars, Vectors, and Matrices

It comes as no surprise that we can write a scalar as a 1 × 1 matrix c ∈ R^{1×1}. Similarly, as we already suggested in §2.3, if we write vectors in R^n in column form, they can be considered n × 1 matrices v ∈ R^{n×1}. Notice that matrix-vector products can be interpreted easily in this context; for example, if A ∈ R^{m×n}, x ∈ R^n, and b ∈ R^m, then we can write expressions like Ax = b, where A is m × n, x is n × 1, and b is m × 1, so the dimensions are compatible.

We will introduce one additional operator on matrices that is useful in this context:

Definition 8 (Transpose). The transpose of a matrix A ∈ R^{m×n} is a matrix Aᵀ ∈ R^{n×m} with elements (Aᵀ)ᵢⱼ = Aⱼᵢ.

Example 15 (Transposition). The transpose of the matrix

    A = [ 1  2 ]
        [ 3  4 ]
        [ 5  6 ]

is given by

    Aᵀ = [ 1  3  5 ]
         [ 2  4  6 ].

Geometrically, we can think of transposition as flipping a matrix about its diagonal.
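The mixology product is easy to reproduce (NumPy; a sketch of our own, following Example 14):

    import numpy as np

    # Wells as columns (rows: vodka, tequila, OJ, grenadine).
    W = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 6],
                  [0, 0, 0.75]])

    # Drinks as columns, expressed in "well units".
    D = np.array([[0,   0.75],
                  [1.5, 0.75],
                  [1,   1   ]])

    print(W @ D)      # ingredient amounts (oz) per drink, column by column
    print((W @ D).T)  # transpose: one drink per row instead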

This unified treatment of scalars, vectors, and matrices, combined with operations like transposition and multiplication, can lead to slick derivations of well-known identities. For instance, we can compute the dot product of vectors a, b ∈ R^n via the following series of steps:

    a · b = ∑ₖ₌₁ⁿ aₖbₖ = [ a₁ a₂ ... aₙ ] [ b₁ ]  = aᵀb.
                                          [ b₂ ]
                                          [ ⋮  ]
                                          [ bₙ ]

Many important identities from linear algebra can be derived by chaining together these operations with a few rules:

    (Aᵀ)ᵀ = A
    (A + B)ᵀ = Aᵀ + Bᵀ
    (AB)ᵀ = BᵀAᵀ

Example 16 (Residual norm). Suppose we have a matrix A and two vectors x and b. If we wish to know how well Ax approximates b, we might define a residual r ≡ b - Ax; this residual is zero exactly when Ax = b. Otherwise, we might use the norm ‖r‖ as a proxy for the relationship between Ax and b. We can use the identities above to simplify:

    ‖r‖² = ‖b - Ax‖²
         = (b - Ax) · (b - Ax)            by the definition of the norm (§2.3)
         = (b - Ax)ᵀ(b - Ax)              by our expression for the dot product above
         = (bᵀ - xᵀAᵀ)(b - Ax)            by properties of transposition
         = bᵀb - bᵀAx - xᵀAᵀb + xᵀAᵀAx    after multiplication

All four terms on the right-hand side are scalars, or equivalently 1 × 1 matrices. Scalars thought of as matrices trivially enjoy one additional nice property, c = cᵀ, since there is nothing to transpose! Thus, we can write

    xᵀAᵀb = (xᵀAᵀb)ᵀ = bᵀAx.

This allows us to simplify our expression even more:

    ‖r‖² = bᵀb - 2bᵀAx + xᵀAᵀAx
         = ‖Ax‖² - 2bᵀAx + ‖b‖².

We could have derived this expression using dot product identities, but the intermediate steps above will prove useful in our later discussion.
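A quick numerical confirmation of the expanded residual norm (NumPy; random data of our own choosing):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))
    x = rng.standard_normal(3)
    b = rng.standard_normal(5)

    r = b - A @ x
    lhs = np.dot(r, r)                                   # ||r||^2
    rhs = b @ b - 2 * b @ (A @ x) + x @ (A.T @ A) @ x    # expanded form
    print(np.allclose(lhs, rhs))                         # True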

3.3 Model Problem: Ax = b

In introductory algebra class, students spend considerable time solving linear systems such as the following for triplets (x, y, z):

    3x + 2y + 5z = 0
    -4x + 9y - 3z = -7
    2x - 3y - 3z = 1

Our constructions in §3.1 allow us to encode such systems in a cleaner fashion:

    [  3   2   5 ] [ x ]   [  0 ]
    [ -4   9  -3 ] [ y ] = [ -7 ]
    [  2  -3  -3 ] [ z ]   [  1 ]

More generally, we can write linear systems of equations in the form Ax = b by following the same pattern; here, the vector x is unknown while A and b are known. Such a system of equations is not always guaranteed to have a solution. For instance, if A contains only zeros, then clearly no x will satisfy Ax = b whenever b ≠ 0. We will defer a general consideration of when a solution exists to our discussion of linear solvers in future chapters.

A key interpretation of the system Ax = b is that it addresses the task: "Write b as a linear combination of the columns of A." Why? Recall from §3.1 that the product Ax encodes a linear combination of the columns of A with weights contained in the elements of x. So, the equation Ax = b asks that the linear combination Ax equal the given vector b. Given this interpretation, we define the column space of A to be the set of right-hand sides b for which the system has a solution:

Definition 9 (Column space). The column space of a matrix A ∈ R^{m×n} is the span of the columns of A. It can be written as

    col A ≡ {Ax : x ∈ R^n}.

One important case is somewhat easier to consider. Suppose A is square, so we can write A ∈ R^{n×n}. Furthermore, suppose that the system Ax = b has a solution for all choices of b. The only condition on b is that it is a member of R^n, so by our interpretation above of Ax = b we can conclude that the columns of A span R^n. In this case, since the linear system is always solvable, we can plug in the standard basis vectors e₁, ..., eₙ to yield vectors x₁, ..., xₙ satisfying Axₖ = eₖ for each k. Then, we can stack these expressions to show:

    A [ x₁ x₂ ... xₙ ] = [ Ax₁ Ax₂ ... Axₙ ] = [ e₁ e₂ ... eₙ ] = I_{n×n},

where I_{n×n} is the identity matrix from Example 11. We will call the matrix with columns xₖ the inverse A⁻¹, which satisfies AA⁻¹ = A⁻¹A = I_{n×n}.
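In code we rarely form A⁻¹ explicitly; solving the system directly is preferred. A minimal sketch for the 3 × 3 system above (NumPy; the "prefer solve over inv" remark is standard practice, not a claim from the text):

    import numpy as np

    A = np.array([[ 3.0,  2.0,  5.0],
                  [-4.0,  9.0, -3.0],
                  [ 2.0, -3.0, -3.0]])
    b = np.array([0.0, -7.0, 1.0])

    x = np.linalg.solve(A, b)        # solves A x = b without forming A^{-1}
    print(x)
    print(np.allclose(A @ x, b))     # True

    # The inverse exists here, and A^{-1} b gives the same answer.
    print(np.allclose(np.linalg.inv(A) @ b, x))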

It is also easy to check that (A⁻¹)⁻¹ = A. When such an inverse exists, it is easy to solve the system Ax = b. In particular, we find:

    x = I_{n×n} x = (A⁻¹A)x = A⁻¹(Ax) = A⁻¹b.

4 Non-Linearity: Differential Calculus

While the beauty and applicability of linear algebra make it a key target of study, nonlinearities abound in nature, and we often must design computational systems that can deal with this fact of life. After all, at the most basic level, the square in the famous relationship E = mc² makes it less than amenable to linear analysis.

4.1 Differentiation

While many functions are globally nonlinear, locally they exhibit linear behavior. This idea of "local linearity" is one of the main motivators behind differential calculus. For instance, Figure 1 shows that if you zoom in close enough to a smooth function, eventually it looks like a line.

[Figure 1: The closer we zoom in to f(x) = x³ + x² - 8x + 4, the more it looks like a line.]

The derivative f′(x) of a function f(x) : R → R is nothing more than the slope of the approximating line, computed by finding the slope of lines through closer and closer points to x:

    f′(x) = lim_{y→x} ( f(y) - f(x) ) / ( y - x ).

We can express local linearity by writing

    f(x + Δx) = f(x) + Δx f′(x) + O(Δx²).

If the function f takes multiple inputs, then it can be written f(x) : R^n → R for x ∈ R^n; in other words, to each point x = (x₁, ..., xₙ) in n-dimensional space, f assigns a single number f(x₁, ..., xₙ). Our idea of local linearity breaks down somewhat here, because lines are one-dimensional objects. However, fixing all but one variable reduces the situation back to single-variable calculus. For instance, we could write g(t) = f(t, x₂, ..., xₙ), where we simply fix the constants x₂, ..., xₙ. Then, g(t) is a differentiable function of a single variable.
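Local linearity is easy to see numerically: secant slopes converge to the derivative as the step shrinks. A minimal sketch in Python (the cubic follows our reconstruction of Figure 1's caption):

    def f(x):
        return x**3 + x**2 - 8*x + 4     # the function plotted in Figure 1

    def fprime(x):
        return 3*x**2 + 2*x - 8          # its exact derivative

    x0 = 1.0
    for h in [1e-1, 1e-3, 1e-5]:
        print((f(x0 + h) - f(x0)) / h)   # secant slopes approaching fprime(x0)
    print(fprime(x0))                    # -3.0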

Of course, we could have put t in any of the input slots of f, so in general we make the following definition of the partial derivative of f:

Definition 10 (Partial derivative). The k-th partial derivative of f, notated ∂f/∂xₖ, is given by differentiating f in its k-th input variable:

    ∂f/∂xₖ (x₁, ..., xₙ) ≡ d/dt f(x₁, ..., xₖ₋₁, t, xₖ₊₁, ..., xₙ) |_{t = xₖ}.

The notation |_{t = xₖ} should be read as "evaluated at t = xₖ."

Example 17 (Relativity). The relationship E = mc² can be thought of as a function from m and c to E. Thus, we could write E(m, c) = mc², yielding the partial derivatives

    ∂E/∂m = c²        ∂E/∂c = 2mc.

Using single-variable calculus, we can write:

    f(x + Δx) = f(x₁ + Δx₁, x₂ + Δx₂, ..., xₙ + Δxₙ)
              = f(x₁, x₂ + Δx₂, ..., xₙ + Δxₙ) + Δx₁ ∂f/∂x₁ + O(Δx₁²)     by single-variable calculus
              = f(x₁, ..., xₙ) + ∑ₖ₌₁ⁿ [ Δxₖ ∂f/∂xₖ + O(Δxₖ²) ]           by repeating this n times
              = f(x) + ∇f(x) · Δx + O(‖Δx‖²),

where we define the gradient of f as

    ∇f ≡ ( ∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ ) ∈ R^n.

From this relationship, it is easy to see that f can be differentiated in any direction v; we can evaluate this derivative D_v f as follows:

    D_v f(x) ≡ d/dt f(x + tv) |_{t=0} = ∇f(x) · v.

Example 18 (R^2). Take f(x, y) = x²y³. Then,

    ∂f/∂x = 2xy³        ∂f/∂y = 3x²y².

Thus, we can write ∇f(x, y) = (2xy³, 3x²y²). The derivative of f at (1, 2) in the direction (-1, 4) is given by (-1, 4) · ∇f(1, 2) = (-1, 4) · (16, 12) = 32.

Example 19 (Linear functions). It is obvious but worth noting that the gradient of f(x) ≡ a · x + c = a₁x₁ + ... + aₙxₙ + c is a.
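A numerical check of Example 18 (NumPy; the finite-difference step is our own illustration):

    import numpy as np

    def f(v):
        x, y = v
        return x**2 * y**3                 # Example 18

    def grad_f(v):
        x, y = v
        return np.array([2*x*y**3, 3*x**2*y**2])

    p = np.array([1.0, 2.0])
    d = np.array([-1.0, 4.0])
    print(grad_f(p))                        # [16. 12.]
    print(np.dot(d, grad_f(p)))             # 32.0, the directional derivative

    # Finite-difference check of D_v f = d/dt f(p + t d) at t = 0.
    t = 1e-6
    print((f(p + t*d) - f(p)) / t)          # approximately 32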

Example 20 (Quadratic forms). Take any matrix A ∈ R^{n×n}, and define f(x) ≡ xᵀAx. Expanding this function element-by-element shows f(x) = ∑ᵢⱼ Aᵢⱼxᵢxⱼ; expanding out f and checking this relationship explicitly is worthwhile. Take some k ∈ {1, ..., n}. Then, we can separate out all terms containing xₖ:

    f(x) = Aₖₖxₖ² + xₖ ( ∑_{i≠k} Aᵢₖxᵢ + ∑_{j≠k} Aₖⱼxⱼ ) + ∑_{i,j≠k} Aᵢⱼxᵢxⱼ.

With this factorization, it is easy to see that

    ∂f/∂xₖ = 2Aₖₖxₖ + ∑_{i≠k} Aᵢₖxᵢ + ∑_{j≠k} Aₖⱼxⱼ = ∑ᵢ₌₁ⁿ (Aᵢₖ + Aₖᵢ)xᵢ.

This sum is nothing more than the definition of matrix-vector multiplication! Thus, we can write ∇f(x) = Ax + Aᵀx.

We have generalized from f : R → R to f : R^n → R. To reach full generality, we would like to consider f : R^n → R^m. In other words, f takes in n numbers and outputs m numbers. Thankfully, this extension is straightforward, because we can think of f as a collection of single-valued functions f₁, ..., fₘ : R^n → R smashed together into a single vector. That is, we write:

    f(x) = ( f₁(x), f₂(x), ..., fₘ(x) ).

Each fₖ can be differentiated as before, so in the end we get a matrix of partial derivatives called the Jacobian of f:

Definition 11 (Jacobian). The Jacobian of f : R^n → R^m is the matrix Df ∈ R^{m×n} with entries

    (Df)ᵢⱼ ≡ ∂fᵢ/∂xⱼ.

Example 21 (Simple function). Suppose f(x, y) = (3x, xy², x + y). Then,

    Df(x, y) = [ 3    0   ]
               [ y²   2xy ]
               [ 1    1   ]

Make sure you can derive this computation by hand.
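A sketch verifying the gradient formula of Example 20 against central finite differences (NumPy; random A and x of our own choosing):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    x = rng.standard_normal(4)

    def f(x):
        return x @ A @ x                 # quadratic form from Example 20

    grad = (A + A.T) @ x                 # analytic gradient
    h = 1e-6
    fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(4)])
    print(np.allclose(grad, fd, atol=1e-4))   # True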

Example 22 (Matrix multiplication). Unsurprisingly, the Jacobian of f(x) = Ax for a matrix A is given by Df(x) = A.

Here we encounter a common point of confusion. Suppose a function has vector input and scalar output, that is, f : R^n → R. We defined the gradient ∇f as a column vector, so to align this definition with that of the Jacobian we must write Df = ∇fᵀ.

4.2 Optimization

Recall from single-variable calculus that minima and maxima of f : R → R must occur at points x satisfying f′(x) = 0. Of course, this condition is necessary rather than sufficient: there may exist points x with f′(x) = 0 that are not maxima or minima. That said, finding such critical points of f can be a step of a function-minimization algorithm, so long as a subsequent step ensures that the resulting x actually is a minimum or maximum.

If f : R^n → R is minimized or maximized at x, we have to ensure that there does not exist a single direction Δx from x in which f decreases or increases, respectively. By the discussion in §4.1, this means we must find points for which ∇f = 0.

Example 23 (Simple function). Suppose f(x, y) = x² + 2xy + 4y². Then, ∂f/∂x = 2x + 2y and ∂f/∂y = 2x + 8y. Thus, critical points of f satisfy:

    2x + 2y = 0
    2x + 8y = 0

Clearly this system is solved by (x, y) = (0, 0). Indeed, this is the minimum of f, as can be seen more clearly by writing f(x, y) = (x + y)² + 3y².

Example 24 (Quadratic functions). Suppose f(x) = xᵀAx + bᵀx + c. Then, from the examples in the previous section we can write ∇f(x) = (A + Aᵀ)x + b. Thus, critical points x of f satisfy (A + Aᵀ)x + b = 0.

Unlike single-variable calculus, when we do calculus on R^n we can add constraints to our optimization. The most general form of such a problem looks like:

    minimize f(x)
    such that g(x) = 0

Example 25 (Rectangle areas). Suppose a rectangle has width w and height h. A classic geometry problem is to maximize area with a fixed perimeter 1:

    maximize wh
    such that 2w + 2h - 1 = 0

When we add this constraint, we can no longer expect that critical points satisfy ∇f(x) = 0, since these points might not satisfy g(x) = 0.
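Before turning to constraints, a quick check connecting Examples 23 and 24: one way to encode the quadratic of Example 23 is A = [[1, 1], [1, 4]] with b = 0, so the critical-point system (A + Aᵀ)x + b = 0 should recover (0, 0). A minimal sketch (NumPy):

    import numpy as np

    # f(x, y) = x^2 + 2xy + 4y^2 corresponds to x^T A x with this symmetric A.
    A = np.array([[1.0, 1.0],
                  [1.0, 4.0]])
    b = np.zeros(2)

    x_crit = np.linalg.solve(A + A.T, -b)
    print(x_crit)                     # [0. 0.], matching Example 23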

For now, suppose g : R^n → R. Consider the set of points S ≡ {x : g(x) = 0}. Obviously, any two x, y ∈ S satisfy the relationship g(y) - g(x) = 0 - 0 = 0. Suppose y = x + Δx for small Δx. Then,

    g(y) - g(x) = ∇g(x) · Δx + O(‖Δx‖²).

In other words, if we start at x satisfying g(x) = 0 and displace in the Δx direction, we need ∇g(x) · Δx ≈ 0 to continue to satisfy this relationship.

Now, recall that the derivative of f in the direction v at x is given by ∇f(x) · v. If x is a minimum of the constrained optimization problem above, then any small displacement from x to x + v should cause an increase from f(x) to f(x + v). Since we only care about displacements v preserving the constraint g(x + v) = 0, from our argument above we want ∇f · v = 0 for all v satisfying ∇g(x) · v = 0. In other words, ∇f and ∇g must be parallel, a condition we can write as ∇f = λ∇g for some λ ∈ R.

Define Λ(x, λ) ≡ f(x) - λg(x). Then, critical points of Λ without constraints satisfy:

    0 = ∂Λ/∂λ = -g(x)
    0 = ∇ₓΛ = ∇f(x) - λ∇g(x)

In other words, critical points of Λ satisfy g(x) = 0 and ∇f(x) = λ∇g(x), exactly the optimality conditions we derived! Extending to multivariate constraints yields the following:

Theorem 1 (Method of Lagrange multipliers). Critical points of the constrained optimization problem above are unconstrained critical points of the Lagrange multiplier function

    Λ(x, λ) ≡ f(x) - λg(x)

with respect to both x and λ.

Example 26 (Maximizing area). Continuing Example 25, we define the Lagrange multiplier function Λ(w, h, λ) = wh - λ(2w + 2h - 1). Differentiating, we find:

    0 = ∂Λ/∂w = h - 2λ
    0 = ∂Λ/∂h = w - 2λ
    0 = ∂Λ/∂λ = 1 - 2w - 2h

So, critical points of the system satisfy

    [ 0  1  -2 ] [ w ]   [ 0 ]
    [ 1  0  -2 ] [ h ] = [ 0 ]
    [ 2  2   0 ] [ λ ]   [ 1 ]

Solving this system shows w = h = 1/4 and λ = 1/8. In other words, for a fixed amount of perimeter, the rectangle with maximal area is a square.
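The linear system in Example 26 can be handed to a solver directly. A minimal sketch (NumPy):

    import numpy as np

    # Unknowns (w, h, lambda), in the order of the system above.
    M = np.array([[0.0, 1.0, -2.0],
                  [1.0, 0.0, -2.0],
                  [2.0, 2.0,  0.0]])
    rhs = np.array([0.0, 0.0, 1.0])

    w, h, lam = np.linalg.solve(M, rhs)
    print(w, h, lam)      # 0.25 0.25 0.125, i.e. w = h = 1/4 and lambda = 1/8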

Example 27 (Eigenproblems). Suppose that A is a symmetric positive definite matrix, meaning Aᵀ = A (symmetry) and xᵀAx > 0 for all x ∈ R^n \ {0} (positive definiteness). Often we wish to minimize xᵀAx subject to ‖x‖ = 1 for a given matrix A ∈ R^{n×n}; notice that without the constraint the minimum trivially takes place at x = 0. We define the Lagrange multiplier function

    Λ(x, λ) = xᵀAx - λ(‖x‖² - 1) = xᵀAx - λ(xᵀx - 1).

Differentiating with respect to x, we find

    0 = ∇ₓΛ = 2Ax - 2λx.

In other words, x is an eigenvector of the matrix A:

    Ax = λx.

5 Problems

Problem 1. Take C¹(R) to be the set of functions f : R → R that admit a first derivative f′(x). Why is C¹(R) a vector space? Prove that C¹(R) has dimension ∞.

Problem 2. Suppose the rows of A ∈ R^{m×n} are given by the transposes of r₁, ..., rₘ ∈ R^n and the columns of A are given by c₁, ..., cₙ ∈ R^m. That is,

    A = [ r₁ᵀ ]
        [ r₂ᵀ ] = [ c₁ c₂ ... cₙ ].
        [ ⋮  ]
        [ rₘᵀ ]

Give expressions for the elements of AᵀA and AAᵀ in terms of these vectors.

Problem 3. Give a linear system of equations satisfied by minima of the energy f(x) = ‖Ax - b‖² with respect to x, for x ∈ R^n, A ∈ R^{m×n}, and b ∈ R^m. This system is called the "normal equations" and will appear elsewhere in these notes; even so, it is worth working through and fully understanding the derivation.

Problem 4. Suppose A, B ∈ R^{n×n}. Formulate a condition for vectors x ∈ R^n to be critical points of ‖Ax‖ subject to ‖Bx‖ = 1. Also, give an alternative form for the optimal values of ‖Ax‖.

Problem 5. Fix some vector a ∈ R^n \ {0} and define f(x) = a · x. Give an expression for the maximum of f(x) subject to ‖x‖ = 1.
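Returning to Example 27, a short numerical sketch (NumPy; the symmetric positive definite matrix is an arbitrary example of our own choosing) showing that the minimizer of xᵀAx over unit vectors is the eigenvector with the smallest eigenvalue:

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4))
    A = B.T @ B + 4 * np.eye(4)           # symmetric positive definite

    vals, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order
    x = vecs[:, 0]                        # unit eigenvector for the smallest eigenvalue

    print(np.isclose(x @ A @ x, vals[0]))     # True: x^T A x equals lambda_min
    print(np.allclose(A @ x, vals[0] * x))    # True: A x = lambda x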