18.700 FALL 2011, LECTURE 1 (9/8/11)

TRAVIS SCHEDLER

This is subject to revision. Current version: Thu, Sep 8, 1:00 PM.

Abstract. In this lecture, I will broadly sketch what linear algebra is and why we should study it. I will also give some picture of the organization of the course.

1. What is linear algebra?

Linear algebra is the mathematical study of linear relationships. Recall that a linear function f is one satisfying

(1.0.1)  f(x + y) = f(x) + f(y),
(1.0.2)  f(ax) = af(x),  for all a ∈ R.

Here R denotes the real numbers. For instance, f could be a function f : R → R whose graph is a line. More generally, f could be a function of the plane, f : R^2 → R^2, for instance the 90-degree counterclockwise rotation function, f(x, y) = (−y, x). Note that this function also satisfies (1.0.1) and (1.0.2). In contrast, let us consider examples that are not linear:

(1.0.3)  f(x) = x^2,
(1.0.4)  f(x) = 2 + x.
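As a quick computational sanity check (a sketch in Python; the sample points are arbitrary choices), the rotation above satisfies both defining properties (1.0.1) and (1.0.2), while f(x) = x^2 fails additivity:

```python
# Spot-check linearity on sample inputs: the 90-degree rotation
# f(x, y) = (-y, x) passes, while f(x) = x**2 fails.

def rotate90(v):
    """90-degree counterclockwise rotation of a point (x, y) in the plane."""
    x, y = v
    return (-y, x)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(a, v):
    return (a * v[0], a * v[1])

u, v, a = (1.0, 2.0), (-3.0, 0.5), 2.5

# Additivity (1.0.1) and homogeneity (1.0.2) hold for the rotation:
assert rotate90(add(u, v)) == add(rotate90(u), rotate90(v))
assert rotate90(scale(a, u)) == scale(a, rotate90(u))

# But f(x) = x^2 is not additive: (1 + 2)^2 = 9, while 1^2 + 2^2 = 5.
f = lambda x: x ** 2
assert f(1 + 2) != f(1) + f(2)
```

Of course, a few sample points do not prove linearity; they only illustrate the definitions.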
As the semester progresses, we will consider both more sophisticated examples of functions f, and deeper ways of analyzing and understanding them. (So don't think that things will always be this simple!)

1.1. Matrix multiplication. A general example of a linear function is a function R^m → R^n given by multiplying by an n-by-m matrix A with entries A = (a_{ij}), for 1 ≤ i ≤ n and 1 ≤ j ≤ m (and a_{ij} ∈ R):

(1.1.1)
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}
=
\begin{pmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m
\end{pmatrix}
\]

As we will explain later, this is actually the most general type of linear function R^m → R^n. Systems of linear equations can be neatly expressed in this way:

(1.1.2)  A x = b,  where b = (b_1, b_2, …, b_n)^T,

which is equivalent to the system

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m &= b_2, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m &= b_n.
\end{aligned}
\]

Such systems are ubiquitous in math and applied math, as well as engineering, computer science, and many practical fields. So it is important to know how to solve them, or approximate their solutions, or understand them better!

2. Why study linear algebra?

Linear algebra is one of the cornerstones of mathematics, and it is also extremely useful in real life and other disciplines.

2.1. Calculus, derivatives, and first-order approximations. In terms of calculus, linear algebra is the study of the first-order approximation of a function, encoded in its derivative. In other words, given a function f : R^n → R and a point x_0 ∈ R^n, we want to approximate f near x_0 by a linear function. So, we say

(2.1.1)  f(x) ≈ f(x_0) + f′(x_0) · (x − x_0),

where f′ is the derivative of f. In the case n = 1, i.e., one-variable calculus, this is just the tangent line to f.
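To make (2.1.1) concrete in the one-variable case, here is a small numeric sketch (the function f(x) = x^2 and the base point x_0 = 1 are illustrative choices):

```python
# Numeric sketch of the first-order approximation (2.1.1) in one variable:
# f(x) ~ f(x0) + f'(x0) * (x - x0), with the illustrative choice f(x) = x**2.

def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x  # derivative of x**2

def tangent_approx(x, x0):
    """Value of the tangent line to f at x0, evaluated at x."""
    return f(x0) + f_prime(x0) * (x - x0)

x0 = 1.0
for h in (0.1, 0.01, 0.001):
    error = f(x0 + h) - tangent_approx(x0 + h, x0)
    # The error of the tangent-line approximation is second order in h:
    # for f(x) = x**2 it equals h**2 exactly (up to floating-point noise).
    assert abs(error - h ** 2) < 1e-9
```

The point is that as x approaches x_0, the linear approximation improves quadratically, which is exactly what "first-order approximation" means.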
When n = 2, we can view the graph of f : R^2 → R as a surface in R^3, and then the above is the tangent plane to the graph at x_0. In higher dimensions, one obtains tangent hyperplanes. This may seem too simple, but in fact already the first-order approximations can be quite complicated to study! Here are two among many reasons:

(1) The number of variables can be very large: e.g., in economics, one can have thousands of variables;
(2) One may be studying differential equations based on the function f. For example, we seek functions φ satisfying

(2.1.2)  φ′(x) = f(φ(x)).

This is already complicated enough when f is linear (and more so when we extend to multiple variables). When f is nonlinear, we can often find a good approximate solution using the linear approximation above. Hence, in order to understand complicated nonlinear functions, it is both helpful and necessary to first understand linear ones.

2.2. Real-life applications. I only have time to very briefly sketch some among many tremendous applications of linear algebra.

2.2.1. Google PageRank. Before Google, it took a long time to wade through internet search results to find what you were looking for. Google came around and got rich off the idea that the search engine should rank websites based on relevance. In fact, many times you find what you want in the very first hit! The main technical innovation behind the above is called PageRank. Here is how it works:

(1) Google scours the web and draws a graph of websites and links (in our running example, four websites a, b, c, and d, with links between them);
(2) Google scores the websites based on how highly linked they are.
At first glance, this means simply that b is the highest-ranked website, having three links pointing to it. This is followed by a and d, with two links pointing to each, and then c with only one. However, this ignores several important subtleties:

(a) It makes sense that if a website is ranked higher, then links from it should be more valuable. It should matter whether my website is linked to from The New York Times (nytimes.com) or from conservapedia.org.
(b) It should matter how many links a website has. If an article on wikipedia.org links to a thousand websites, no matter how prestigious Wikipedia is, each link should count less than a link from a site with only ten links total.

To fix this, we seek a rank function,

(2.2.1)  ρ : Websites → R_{≥0},

so that, if my website x is linked to by websites y_1, …, y_m, each with a_1, …, a_m links total, then my ranking is:

(2.2.2)  ρ(x) = (1/a_1) ρ(y_1) + (1/a_2) ρ(y_2) + ⋯ + (1/a_m) ρ(y_m).

In the case of the above example, we seek ρ : {a, b, c, d} → R_{>0} satisfying:

(2.2.3)
\[
\begin{pmatrix} \rho(a) \\ \rho(b) \\ \rho(c) \\ \rho(d) \end{pmatrix}
=
\begin{pmatrix}
0 & 1 & \tfrac{1}{3} & 0 \\
\tfrac{1}{2} & 0 & \tfrac{1}{3} & \tfrac{1}{2} \\
0 & 0 & 0 & \tfrac{1}{2} \\
\tfrac{1}{2} & 0 & \tfrac{1}{3} & 0
\end{pmatrix}
\begin{pmatrix} \rho(a) \\ \rho(b) \\ \rho(c) \\ \rho(d) \end{pmatrix}.
\]

It turns out that there is a unique solution up to scaling (note that we can multiply all of the values of ρ by the same overall positive number without affecting the validity or ordering of importance). This solution is:

(2.2.4)  (ρ(a), ρ(b), ρ(c), ρ(d)) = (10, 9, 3, 6).

So in fact, a is the most relevant, followed by b, then d, and then c. Our estimate was not that far from the truth (it had a and b reversed); but here we also get more information: precise numerical scores.

It is not so obvious that, in general, one can solve such a system. However, we will learn in this course techniques to prove not only that it can be solved, but how to solve it quickly. More generally, for every matrix A, we will learn techniques to solve for all column vectors v and all numbers λ ∈ R such that

(2.2.5)  A v = λ v.

This is called the eigenvalue problem. It has great theoretical importance to all of mathematics, in addition to great practical importance. One of the main goals of this course will be to prove:

Theorem 2.2.6. For every n-by-n matrix A, there exist a complex number λ and a nonzero column vector v satisfying (2.2.5). Moreover, there can be at most n possible values of λ.

In the situation at hand, we can prove:

Theorem 2.2.7. If the entries of each column of A sum to one, then there exists v such that Av = v. If the entries of A are nonnegative, then we can take the entries of v also to be nonnegative.

We will also demonstrate how to effectively compute this v.
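To see Theorem 2.2.7 in action on the four-site example, here is a power-iteration sketch in Python. The matrix below is my reconstruction of (2.2.3) from the link graph (each column sums to one), so treat it as illustrative rather than canonical:

```python
# Power-iteration sketch for the PageRank system (2.2.3). The matrix encodes
# the (reconstructed) links a->b, a->d, b->a, c->a, c->b, c->d, d->b, d->c;
# each column sums to 1, so Theorem 2.2.7 guarantees a fixed vector.

M = [
    [0.0, 1.0, 1/3, 0.0],   # rho(a) = rho(b) + rho(c)/3
    [0.5, 0.0, 1/3, 0.5],   # rho(b) = rho(a)/2 + rho(c)/3 + rho(d)/2
    [0.0, 0.0, 0.0, 0.5],   # rho(c) = rho(d)/2
    [0.5, 0.0, 1/3, 0.0],   # rho(d) = rho(a)/2 + rho(c)/3
]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Start from the uniform distribution and repeatedly apply M.
v = [0.25, 0.25, 0.25, 0.25]
for _ in range(200):
    v = mat_vec(M, v)

# The iteration converges to the ranking of (2.2.4), normalized to sum to 1:
# (10, 9, 3, 6) / 28.
expected = [10/28, 9/28, 3/28, 6/28]
assert all(abs(x - e) < 1e-9 for x, e in zip(v, expected))
```

Repeated application of M is one way to "effectively compute this v"; why (and how fast) such iteration converges is part of the eigenvalue theory developed later in the course.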
2.2.2. Signal processing and Fourier transforms. In communications, one sends and receives an oscillating signal (if electronic, then voltage is oscillating; if sound, air pressure is oscillating; if radio, then it is electromagnetic potential that is oscillating). A basic problem is to determine the frequency of oscillation of a signal. The applications of this are huge; they cannot be overstated! In music, frequency is pitch; devices like Auto-Tune work using the Fourier transform to find the approximate pitch of singing and make it precisely the right pitch. More generally, cell phone transmissions, DVDs, MP3s, JPEGs, etc., all use this and/or other linear algebra. They all work by decomposing raw data into special bases (pure frequencies, in the case of the Fourier transform). To achieve better compression, one throws out most of the data except for exactly what your eyes or ears can detect: the relevant basis elements.

In the case of the Fourier transform, one wishes to rewrite a function f : R → R of time (to potential) as a function of frequency. For instance, if f is a continuous periodic function with period 2π, then there exist unique real numbers a_k, b_k for k ≥ 0 such that

(2.2.8)
\[
f(x) = \sum_{k \ge 0} \bigl(a_k \cos(kx) + b_k \sin(kx)\bigr).
\]

Then, given f, the question is: how do we compute a_k and b_k?
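Before giving the general answer, here is a preview in the sampled setting: a direct-summation sketch in Python. The test signal and the 1/n normalization convention are assumptions of this sketch:

```python
# Sketch: recovering the coefficients of (2.2.8) from samples. We sample
# f(x) = cos(2x) + 3*sin(5x) at x_j = j*pi/n for j = 0, ..., 2n-1, and sum
# against sampled cosines and sines (1/n normalization is one convention).
import math

n = 8
samples = [math.cos(2 * j * math.pi / n) + 3 * math.sin(5 * j * math.pi / n)
           for j in range(2 * n)]

def coeffs(k):
    """Discrete a_k, b_k by direct summation over all 2n samples."""
    a_k = sum(samples[j] * math.cos(j * k * math.pi / n) for j in range(2 * n)) / n
    b_k = sum(samples[j] * math.sin(j * k * math.pi / n) for j in range(2 * n)) / n
    return a_k, b_k

# Only the frequencies actually present in the signal survive:
assert math.isclose(coeffs(2)[0], 1.0, abs_tol=1e-9)   # a_2 = 1
assert math.isclose(coeffs(5)[1], 3.0, abs_tol=1e-9)   # b_5 = 3
assert math.isclose(coeffs(3)[0], 0.0, abs_tol=1e-9)   # absent frequency
assert math.isclose(coeffs(3)[1], 0.0, abs_tol=1e-9)
```

The reason this works is the orthogonality of the sampled sines and cosines, a linear-algebraic fact we will meet again when we study inner product spaces.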
The answer is (a version of) the Fourier transform,

(2.2.9)
\[
a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos(kx)\,dx, \qquad
b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin(kx)\,dx.
\]

In real life, one deals not with functions f : R → R but rather with functions on finite sets of samples. Say, for example, we sampled only at times jπ/n for j = 0, 1, …, 2n − 1 (we will restrict to an even number of samples for simplicity). Then, it turns out we only need a_k and b_k for k = 0, 1, …, n − 1, and we can write

(2.2.10)
\[
a_k = \frac{1}{n}\sum_{j=0}^{2n-1} f\Bigl(\frac{j\pi}{n}\Bigr)\cos\Bigl(\frac{jk\pi}{n}\Bigr), \qquad
b_k = \frac{1}{n}\sum_{j=0}^{2n-1} f\Bigl(\frac{j\pi}{n}\Bigr)\sin\Bigl(\frac{jk\pi}{n}\Bigr), \qquad 0 \le k \le n-1.
\]

Problem 2.2.11. Computing the a_k and b_k by the above formulas takes about 2n^2 operations. That is a long time for large n. Similarly, computing f from the a_k, b_k also takes about 2n^2 operations.

It turns out that linear algebra affords a solution, the fast Fourier transform and its inverse, that each take only about 2n log n operations! If we have time, we will explain how this works.

2.2.3. Multiplying large numbers. Given two large numbers, say with n digits, it takes n^2 individual digit multiplications to compute their product. Indeed, if we write

(2.2.12)
\[
x = x_0 + 10 x_1 + 10^2 x_2 + \cdots + 10^{n-1} x_{n-1}, \qquad
y = y_0 + 10 y_1 + 10^2 y_2 + \cdots + 10^{n-1} y_{n-1},
\]

where the x_j, y_j are the digits of x and y, then the product is given by

(2.2.13)
\[
xy = (xy)_0 + 10\,(xy)_1 + \cdots + 10^{2n-2}\,(xy)_{2n-2}, \qquad
(xy)_m = \sum_{j=0}^{m} x_j y_{m-j},
\]

where here we set x_j = y_j = 0 for j ≥ n. We multiply each digit from x with each digit from y one time, resulting in n^2 individual multiplications.

It turns out that the Fourier transform gives a faster way to do this. Let us use 2n-digit numbers for simplicity. Consider the functions f, g, h : {0, 1, …, 2n − 1} → R given by f(j) = x_j, g(j) = y_j, h(j) = (xy)_j. Let a_k(f), b_k(f), a_k(g), b_k(g), a_k(h), and b_k(h) be their Fourier coefficients as before. Then, one can prove the formula:

(2.2.14)
\[
a_k(h) = a_k(f)\,a_k(g) - b_k(f)\,b_k(g), \qquad
b_k(h) = a_k(f)\,b_k(g) + b_k(f)\,a_k(g).
\]
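For comparison, the straightforward n^2-operation method of (2.2.12) and (2.2.13) can be sketched as follows (a toy implementation; real big-number libraries are far more careful):

```python
# Schoolbook multiplication via digit convolution, per (2.2.12)-(2.2.13):
# one digit-by-digit multiplication for every pair (i, j), i.e. n^2 in all.

def digits(x):
    """Base-10 digits of x, least significant first, as in (2.2.12)."""
    ds = []
    while x:
        ds.append(x % 10)
        x //= 10
    return ds

def multiply(x, y):
    xs, ys = digits(x), digits(y)
    # (xy)_m = sum_j x_j * y_{m-j}: the convolution of the digit sequences.
    conv = [0] * (len(xs) + len(ys) - 1)
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            conv[i + j] += xi * yj
    # Reassemble as in (2.2.13); summing against powers of 10 performs the carries.
    return sum(c * 10 ** m for m, c in enumerate(conv))

assert multiply(1234, 5678) == 1234 * 5678
```

The fast method described next replaces the double loop (the convolution) with pointwise products of Fourier coefficients.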
Formula (2.2.14) means that each a_k(h) and b_k(h) can be computed in only 2 operations: there is no large summation! So the entire Fourier transform, i.e., a_k(h) and b_k(h) for all k, can be computed in a total of only 2n steps. This is a big savings: h itself (i.e., xy) required n^2 operations to compute! Thus, using the fast Fourier transform of the previous section, we can multiply x and y in only about n log n operations (up to a constant multiple):

(1) First, take the Fourier transforms a(f), b(f), a(g), and b(g) of x and y, each taking about n log n steps;
(2) Then, compute a(h) and b(h) from these using (2.2.14), taking only about 4n steps;
(3) Finally, convert a(h) and b(h) back to h, and hence xy, using the inverse fast Fourier transform, again taking about n log n steps.

We managed to multiply the numbers x and y much faster than expected! This can be very important in numerical applications.

2.2.4. Constrained minimization/maximization. Finally, another very important application of linear algebra is to minimization or maximization. For example, a manufacturer could be interested in maximizing profits based on how many of each type of product it manufactures. A cell phone company could be interested in positioning its cell phone towers so as to minimize the delays and dropped calls of customers, or to maximize the minimal data transfer rate. These problems become difficult because there are typically many constraints to satisfy. Solving them requires linear algebra.

We give just one example: suppose we have just three variables, x, y, and z. We want to minimize the function

(2.2.15)  x^2 + (y − 2)^2 + (z + 4)^2,

subject to the constraint

(2.2.16)  2x + 3y + z = 16.

That is, we want to minimize the (squared) distance from the point (0, 2, −4), subject to lying on the plane (2.2.16). Note that (0, 2, −4) is not on that plane, so we cannot just obtain zero for (2.2.15). The solution linear algebra provides is as follows: the perpendicular direction to the plane (2.2.16) is parallel to the line (2t, 3t, t), t ∈ R.
So, wherever the minimum distance from the plane (2.2.16) to the point (0, 2, −4) is attained, it lies along the line through (0, 2, −4) perpendicular to the plane, i.e., the line (2t, 2 + 3t, t − 4); see the figure. This line intersects the plane when

(2.2.17)  2(2t) + 3(2 + 3t) + (t − 4) = 16,

i.e.,

(2.2.18)  14t + 2 = 16,
so t = 1. Thus, the minimum value of the function (2.2.15) is 14, attained at the point (2, 5, −3); the minimum distance itself is √14. The great thing is that the approach above generalizes to arbitrary dimensions: one can minimize distances from k-dimensional planes to l-dimensional planes in n-dimensional space. This has great utility. Of course, in real life, typically the functions and constraints are not linear, i.e., we are not just restricted to planes and minimizing distances between them. However, one solves the general problem by reduction to the linear case, using derivatives. In one variable, this is the minimization you may be used to: take the derivative of a function and set it equal to zero; this produces its local minima and maxima. In higher dimensions, as mentioned before, taking the derivative yields matrices and a complicated linear problem, which one must solve as above. The resulting technique is called Lagrange multipliers, which many of you may have heard of. They are ubiquitous in economics.

2.3. Ubiquity in pure mathematics. Finally, but perhaps most important to us, is the ubiquity of linear algebra in pure mathematics itself. We already mentioned that it is the foundation for calculus, and hence for the study of (differentiable) functions. Here we briefly list a few other essential uses:

(1) Differential equations: To solve an equation D(f) = 0 where, for example, D is a differential operator, one can sometimes reduce to the case where D(f) = a_0 f + a_1 f′ + ⋯ + f^{(n)}, and the a_j are real numbers. In the case n = 1, one has f′ = −a_0 f, which has the solution C e^{−a_0 x}. For more general n, linear algebra gives a solution: rewrite the problem as

(2.3.1)
\[
\begin{pmatrix} f \\ f' \\ \vdots \\ f^{(n-1)} \end{pmatrix}'
= A \begin{pmatrix} f \\ f' \\ \vdots \\ f^{(n-1)} \end{pmatrix},
\qquad
A := \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}
\end{pmatrix}.
\]

Now, one can reduce to the eigenvalue problem of (2.2.5): we seek column vectors v of real (or complex) numbers, together with real (or complex) numbers λ, such that Av = λv. Given such a pair (v, λ), one has the following solution of the original equation:

(2.3.2)  e^{λt} v.
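Here is a quick numeric sketch of this recipe for the concrete equation f′′ + f = 0 (so a_0 = 1, a_1 = 0; the eigenpair is computed by hand, and the matrix is the 2-by-2 case of the companion form above):

```python
# Numeric sketch of the eigenvalue method for f'' = -f: in (2.3.1) this gives
# A = [[0, 1], [-1, 0]], with eigenpair lambda = i, v = (1, i), hence the
# solution e^{it}(1, i), i.e. f(t) = e^{it} = cos t + i sin t.
import cmath

A = [[0, 1], [-1, 0]]
lam = 1j
v = (1, 1j)

def mat_vec(M, w):
    return tuple(sum(m * x for m, x in zip(row, w)) for row in M)

# Eigenpair check: A v = lambda v.
assert mat_vec(A, v) == (lam * v[0], lam * v[1])

# Solution check at a sample time t: the derivative of e^{lam t} v is
# lam * e^{lam t} v, which should equal A applied to e^{lam t} v.
t = 0.7
w = tuple(cmath.exp(lam * t) * x for x in v)
lhs = tuple(lam * x for x in w)
assert all(abs(p - q) < 1e-12 for p, q in zip(lhs, mat_vec(A, w)))
```

The first component of e^{it}(1, i) is f(t) = cos t + i sin t, whose real and imaginary parts are the familiar solutions of f′′ = −f.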
Indeed, one has

(2.3.3)  (e^{λt} v)′ = λ e^{λt} v = e^{λt} (λv) = e^{λt} (Av) = A (e^{λt} v).

(2) Functional analysis: the study of spaces of functions, e.g., continuous, smooth, etc. These can be viewed as infinite-dimensional linear spaces, since one can add functions and multiply them by real numbers. Thus, this study generalizes the linear algebra we study (mostly for finite-dimensional situations). Moreover, the linear operators one considers are often closely related to differential equations, for which one applies the linear algebra outlined above.

(3) Representation theory: the generalized study of symmetries of spaces. For example, one can classify all spaces which have n-fold rotational symmetry, or one can replace n by an arbitrary (finite or Lie) group. (In the news recently, this was finally classified for the most complicated Lie group, called E8! This was completed in part by mathematicians at MIT, such as David Vogan.) To do this, one looks at ways of mapping the group into matrices, and uses the full power of linear algebra.

(4) Algebraic geometry: the algebraic study of geometric spaces. This is also applicable to number theory, where the geometric spaces involve integers and prime numbers, and
their generalizations, rather than, e.g., real surfaces and points of them. To study this geometry, one considers everything as certain algebraic structures over the real or complex numbers (or integers) and applies the power of linear algebra to deduce facts about them. One also uses symmetries of these spaces, involving the representation theory above (and hence more linear algebra).

3. Main goals of the course

There are two main goals of this course. The most important is to help you all grow as thinkers and mathematicians. The second goal is to teach you important foundational mathematical material that you will need in future courses and endeavors.

3.1. Development of mathematical thinking and skills. For many of you, this course may be your first math major course, by which I mean 18.axy for a ≥ 1. This means you will be exposed to rigorous mathematical definitions, statements, and proofs. Our main goal is to help you become comfortable with this language and mode of thinking, and to learn how to read and write proper mathematics. As such:

(1) The homework will be carefully checked for both style and correctness. Write careful, correct, and complete answers to all questions! It is best to work first on scratch paper and to write your final answers on a clean sheet of paper once you understand all the details.
(2) The course material will emphasize theory over calculation, i.e., abstract mathematical concepts, logical connections, and general results. We will have some calculations, for your own practical benefit and concrete understanding, as well as to make the exams more reasonable. But the theory will be primary (this contrasts with 18.06!).
(3) Although we plan the material to be particularly relevant to future mathematical courses (and, to a lesser extent, useful and applicable more generally), our primary concern will be that you learn how to learn. So, it is not essential that all of the material be of immediate use, and you should treat it partly as an exercise for your own development.

3.2. Material and main theorems. The main mathematical theory and results we are aiming for are the following:

(1) The definition of linear spaces and linear operators;
(2) The concepts of subspaces, dimension, linear independence, span, and bases;
(3) Row reduction and Gaussian elimination for matrices;
(4) The eigenvalue problem and its solution (cf. (2.2.5));
(5) Inner product spaces (i.e., linear spaces equipped with distance functions), orthogonal (i.e., perpendicular) projections and operators, and minimization/maximization (cf. 2.2.4);
(6) The spectral theorem for operators on inner product spaces: roughly, this characterizes operators that have an orthogonal collection of eigenvectors;
(7) The polar decomposition for operators on inner product spaces: this decomposes them into a product of isometries (operators preserving the distance) and positive operators (operators with an orthogonal collection of eigenvectors with positive eigenvalues); also, the singular value decomposition, which is the decomposition of the aforementioned positive operator;
(8) Generalized eigenspaces, the characteristic polynomial, and Jordan canonical forms of matrices: this is a generalization of the eigenvalue problem which applies to matrices that don't have enough eigenvectors;
(9) Determinant, trace, and volume: important examples of the preceding invariants, that you may have heard of!
4. Homework for next time

Please read sections 1.1–1.3 of the book, and brush up on your complex numbers.
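For those brushing up, here is a minimal computational refresher on complex numbers (a Python sketch; the particular values are arbitrary):

```python
# Quick refresher on the complex-number facts used later in the course,
# via Python's built-in complex type.
import cmath, math

z = 3 + 4j
assert z.real == 3 and z.imag == 4
assert abs(z) == 5                      # modulus: sqrt(3^2 + 4^2)
assert z * z.conjugate() == 25          # z times its conjugate is |z|^2
assert 1j * 1j == -1                    # i^2 = -1

# Euler's formula: e^{i*theta} = cos(theta) + i*sin(theta).
theta = math.pi / 3
w = cmath.exp(1j * theta)
assert abs(w - (math.cos(theta) + 1j * math.sin(theta))) < 1e-12
```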