Common-Knowledge / Cheat Sheet


CSE 521: Design and Analysis of Algorithms I, Fall 2018

1 Randomized Algorithm

Expectation: For a random variable $X$ whose domain is a discrete set $S$,
$$E[X] = \sum_{s \in S} P[X = s] \cdot s.$$

Linearity of Expectation: For any two random variables $X, Y$,
$$E[X + Y] = E[X] + E[Y].$$

Variance: The variance of a random variable $X$ is defined as $\mathrm{Var}(X) = E\left[(X - E[X])^2\right]$. The following identity always holds:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2.$$
The standard deviation of $X$ is $\sigma(X) = \sqrt{\mathrm{Var}(X)}$.

k-wise Independence: For an integer $k \geq 2$, a set of random variables $X_1, \dots, X_n$ is said to be $k$-wise independent if for any set $S \subseteq \{1, \dots, n\}$ of size $k$,
$$E\Big[\prod_{i \in S} X_i\Big] = \prod_{i \in S} E[X_i].$$

Sum of Variances: Let $X_1, \dots, X_n$ be pairwise independent random variables. Then
$$\mathrm{Var}(X_1 + \dots + X_n) = \mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_n).$$

Markov's Inequality: Let $X$ be a nonnegative random variable. Then for any $k > 0$,
$$P[X \geq k] \leq \frac{E[X]}{k}.$$

Chebyshev's Inequality: For any random variable $X$ and any $\epsilon > 0$,
$$P\left[|X - E[X]| > \epsilon\right] \leq \frac{\mathrm{Var}(X)}{\epsilon^2}.$$
Equivalently,
$$P\left[|X - E[X]| > k \, \sigma(X)\right] \leq \frac{1}{k^2}.$$
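As a quick sanity check of the last two bounds, here is a small numerical sketch (my own illustration, not part of the original notes; it assumes NumPy is available, and the binomial distribution and all parameters are arbitrary choices):

```python
import numpy as np

# Empirical tails of X = X_1 + ... + X_n for independent Bernoulli(p) variables,
# compared against the Markov and Chebyshev bounds.
rng = np.random.default_rng(0)
n, p, trials = 100, 0.3, 200_000
X = rng.binomial(n, p, size=trials)

mean = n * p              # E[X]
var = n * p * (1 - p)     # Var(X), by the sum-of-variances rule

k = 40
print("Markov:    P[X >= k] ~", (X >= k).mean(), " bound:", mean / k)

eps = 10
print("Chebyshev: P[|X - E[X]| > eps] ~", (np.abs(X - mean) > eps).mean(),
      " bound:", var / eps**2)
```

Both empirical frequencies should come out well below the corresponding bounds.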

Hoeffding's Inequality: Let $X_1, \dots, X_n$ be independent random variables where for all $i$, $X_i \in [a_i, b_i]$. Then, for any $\epsilon > 0$,
$$P\left[\Big|\sum_{i=1}^n X_i - E\Big[\sum_{i=1}^n X_i\Big]\Big| > \epsilon\right] \leq 2\exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^n (a_i - b_i)^2}\right).$$

Multiplicative Chernoff Bound: Let $X_1, \dots, X_n$ be independent Bernoulli random variables, i.e., for all $i$, $X_i \in \{0, 1\}$, and let $X = X_1 + \dots + X_n$ and $\mu = E[X]$. Then, for any $\epsilon > 0$,
$$P[X > (1 + \epsilon)\mu] \leq \left(\frac{e^{\epsilon}}{(1 + \epsilon)^{1 + \epsilon}}\right)^{\mu} \leq e^{-\epsilon^2 \mu / (2 + \epsilon)}, \qquad \text{and} \qquad P[X < (1 - \epsilon)\mu] \leq e^{-\epsilon^2 \mu / 2}.$$

McDiarmid's Inequality: Let $X_1, \dots, X_n \in \mathcal{X}$ be independent random variables, and let $f : \mathcal{X}^n \to \mathbb{R}$. If for all $1 \leq i \leq n$, all $x_1, \dots, x_n$, and all $x_i'$,
$$\left|f(x_1, \dots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n)\right| \leq c_i,$$
then
$$P\left[\left|f(X_1, \dots, X_n) - E f(X_1, \dots, X_n)\right| > \epsilon\right] \leq 2\exp\left(\frac{-2\epsilon^2}{\sum_i c_i^2}\right).$$

Concentration of Gaussians: Let $X_1, \dots, X_n$ be independent standard normal random variables, i.e., for all $i$, $X_i \sim \mathcal{N}(0, 1)$. Then, for any $\epsilon > 0$,
$$P\left[\Big|\sum_{i=1}^n X_i^2 - n\Big| > \epsilon n\right] \leq 2\exp\left(\frac{-\epsilon^2 n}{8}\right).$$

Gaussian Density Function: The density function of a 1-dimensional normal random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ is
$$\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / 2\sigma^2}.$$
More generally, we say $X_1, \dots, X_n$ form a multivariate normal random variable when they have the following density function:
$$\det(2\pi\Sigma)^{-1/2}\, e^{-(x - \mu)^T \Sigma^{-1} (x - \mu)/2},$$
where $\Sigma$ is the covariance matrix of $X_1, \dots, X_n$. In particular, for all $i, j$,
$$\Sigma_{i,j} = \mathrm{Cov}(X_i, X_j) = E\left[(X_i - E[X_i])(X_j - E[X_j])\right] = E[X_i X_j] - E[X_i] E[X_j].$$
As a special case, if $X_1, \dots, X_n$ are standard normals chosen independently, then $\Sigma$ is just the identity matrix.

2 Spectral Algorithms

Determinant: Let $A \in \mathbb{R}^{n \times n}$. The determinant of $A$ can be written as follows:
$$\det(A) = \sum_{\sigma} \mathrm{sgn}(\sigma) \prod_{i=1}^n A_{i, \sigma(i)},$$
where the sum is over all permutations $\sigma$ of the numbers $1, \dots, n$, and $\mathrm{sgn}(\sigma) \in \{+1, -1\}$. The sign $\mathrm{sgn}(\sigma)$ is determined by the parity of the number of swaps needed to transform $\sigma$ into the identity permutation: $+1$ for an even number of swaps and $-1$ for an odd number. For example, for $n = 4$, $\mathrm{sgn}(1, 2, 3, 4) = +1$ because we need no swaps, $\mathrm{sgn}(2, 1, 3, 4) = -1$ because we can transform it to the identity just by swapping $1$ and $2$, and $\mathrm{sgn}(3, 1, 2, 4) = +1$.
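A brute-force implementation of this permutation expansion can be compared against a library routine. The sketch below is my own illustration (the helper name is made up, NumPy is assumed, and the expansion is only feasible for small $n$ since it sums over $n!$ terms):

```python
import numpy as np
from itertools import permutations

def det_permutation_expansion(A):
    """Determinant via the sum over all permutations; O(n! * n), small n only."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # sgn(perm): parity of the number of inversions (= parity of swaps).
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        total += sign * np.prod([A[i, perm[i]] for i in range(n)])
    return total

A = np.random.default_rng(1).normal(size=(4, 4))
print(det_permutation_expansion(A), np.linalg.det(A))  # should agree up to rounding
```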

Properties of Determinant: For a matrix $A \in \mathbb{R}^{n \times n}$, $\det(A) \neq 0$ if and only if the columns of $A$ are linearly independent. Recall that vectors $v_1, \dots, v_n \in \mathbb{R}^n$ are linearly independent if
$$c_1 v_1 + c_2 v_2 + \dots + c_n v_n = 0$$
only when $c_1 = c_2 = \dots = c_n = 0$. In other words, $v_1, \dots, v_n$ are linearly independent if no $v_i$ can be written as a linear combination of the rest of the vectors.

For any matrix $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1, \dots, \lambda_n$,
$$\det(A) = \prod_{i=1}^n \lambda_i.$$
So, $\det(A) = 0$ iff $A$ has at least one zero eigenvalue. Combined with the previous fact, $A$ has a zero eigenvalue iff the columns of $A$ are linearly dependent.

For any two square matrices $A, B \in \mathbb{R}^{n \times n}$,
$$\det(AB) = \det(A) \det(B).$$

Characteristic Polynomial: For a matrix $A \in \mathbb{R}^{n \times n}$, the polynomial $\det(xI - A)$, in an indeterminate (variable) $x$, is called the characteristic polynomial of $A$. The roots of this polynomial are the eigenvalues of $A$. In particular,
$$\det(xI - A) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n),$$
where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$. It follows from the above identity that for $x = 0$, $\det(-A) = \prod_i (-\lambda_i)$, or equivalently, $\det(A) = \prod_{i=1}^n \lambda_i$.

Rank: The rank of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is the number of nonzero eigenvalues of $A$. More generally, the rank of a matrix $A \in \mathbb{R}^{m \times n}$ is the number of nonzero singular values of $A$; in other words, it is the number of nonzero eigenvalues of $A A^T$.

PSD Matrices: We discuss several equivalent definitions of PSD matrices. A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD) iff
- All eigenvalues of $A$ are nonnegative.
- $A$ can be written as $B B^T$ for some matrix $B \in \mathbb{R}^{n \times m}$.
- $x^T A x \geq 0$ for all vectors $x \in \mathbb{R}^n$.
- $\det(A_{S,S}) \geq 0$ for all $S \subseteq \{1, \dots, n\}$, where $A_{S,S}$ denotes the square submatrix of $A$ with rows and columns indexed by $S$.
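To make these equivalent characterizations concrete, here is a small numerical sketch of mine (assuming NumPy; the matrix sizes and index set are arbitrary) that builds a PSD matrix as $B B^T$ and observes the other conditions holding:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(5, 3))
A = B @ B.T                       # PSD by construction: A = B B^T

eigvals = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
print("min eigenvalue >= 0:", eigvals.min() >= -1e-10)

x = rng.normal(size=5)
print("x^T A x >= 0:", x @ A @ x >= -1e-10)

S = [0, 2, 4]                     # a principal submatrix indexed by S
print("det(A_{S,S}) >= 0:", np.linalg.det(A[np.ix_(S, S)]) >= -1e-10)
```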

The following fact about PSD matrices is immediate. If $A \succeq 0$ is an $n \times n$ matrix, then for any matrix $C \in \mathbb{R}^{k \times n}$, $C A C^T \succeq 0$. This is because for any vector $x \in \mathbb{R}^k$,
$$x^T C A C^T x = (C^T x)^T A (C^T x) = y^T A y \geq 0,$$
where $y = C^T x$.

For two symmetric matrices $A, B \in \mathbb{R}^{n \times n}$ we write $A \preceq B$ if and only if $B - A \succeq 0$. In other words, $A \preceq B$ if and only if for any vector $x \in \mathbb{R}^n$, $x^T A x \leq x^T B x$. Let $\lambda_1 \geq \dots \geq \lambda_n$ be the eigenvalues of $A$, and $\mu_1 \geq \dots \geq \mu_n$ be the eigenvalues of $B$. If $A \preceq B$, then for all $i$, $\lambda_i \leq \mu_i$.

Nonsymmetric Matrices: Any matrix $A \in \mathbb{R}^{m \times n}$ (for $m \leq n$) can be written as
$$A = \sum_{i=1}^m \sigma_i u_i v_i^T,$$
where
- $u_1, \dots, u_m \in \mathbb{R}^m$ form an orthonormal set of vectors. These are called the left singular vectors of $A$, and they have the property $u_i^T A = \sigma_i v_i^T$. These vectors are the eigenvectors of the matrix $A A^T$.
- $v_1, \dots, v_m \in \mathbb{R}^n$ form an orthonormal set of vectors. Note that these vectors do not necessarily span the space. These vectors are eigenvectors of the matrix $A^T A$.
- $\sigma_1, \dots, \sigma_m$ are called the singular values of $A$. They are always real and nonnegative. In fact, their squares $\sigma_i^2$ are the eigenvalues of the PSD matrix $A A^T$.

Rotation Matrix: A matrix $R \in \mathbb{R}^{n \times n}$ is a rotation matrix iff $\|Rx\|_2 = \|x\|_2$ for all vectors $x \in \mathbb{R}^n$. In other words, $R$ as an operator preserves the norm of all vectors. Equivalently, $R$ is a rotation matrix iff
- $R R^T = I$.
- All singular values of $R$ are $1$.
- The columns of $R$ form an orthonormal set of vectors in $\mathbb{R}^n$.

Projection Matrix: A symmetric matrix $P \in \mathbb{R}^{n \times n}$ is a projection matrix iff
- It can be written as $P = \sum_{i=1}^k v_i v_i^T$ for an orthonormal set of vectors $v_1, \dots, v_k$ and some $1 \leq k \leq n$.
- All eigenvalues of $P$ are $0$ or $1$.
- $P^2 = P$.

It follows from the spectral theorem that the only projection matrix of rank $n$ is the identity matrix. In general, a projection matrix projects any given vector $x$ onto the linear subspace spanned by the vectors $v_1, \dots, v_k$.
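Here is a small sketch of mine (assuming NumPy; the dimensions are arbitrary) that builds a projection onto a 2-dimensional subspace of $\mathbb{R}^5$ and checks the listed properties:

```python
import numpy as np

rng = np.random.default_rng(3)
# Orthonormal basis of a random 2-dimensional subspace (via reduced QR).
V, _ = np.linalg.qr(rng.normal(size=(5, 2)))
P = V @ V.T                                    # P = sum_i v_i v_i^T

print("P^2 = P:", np.allclose(P @ P, P))
print("eigenvalues:", np.round(np.linalg.eigvalsh(P), 6))   # zeros and ones

x = rng.normal(size=5)
print("P x lies in span(V):", np.allclose(V @ (V.T @ x), P @ x))
```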

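The singular value decomposition described above can be spot-checked the same way (again my own sketch assuming NumPy; note that numpy.linalg.svd returns $V^T$ as its third output):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 5))                    # m = 3 <= n = 5
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# u_i^T A = sigma_i v_i^T for every left/right singular vector pair.
print(np.allclose(U.T @ A, np.diag(s) @ Vt))

# Squared singular values are the eigenvalues of the PSD matrix A A^T.
print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A @ A.T))))
```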
Trace: For a square matrix $A \in \mathbb{R}^{n \times n}$ we write
$$\mathrm{Tr}(A) = \sum_{i=1}^n A_{i,i}$$
to denote the sum of the entries on the diagonal of $A$. Next, we discuss several properties of the trace.
- The trace of $A$ is equal to the sum of all eigenvalues of $A$.
- Trace is a linear operator: for any two square matrices $A, B \in \mathbb{R}^{n \times n}$ and any $t \in \mathbb{R}$,
$$\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B), \qquad \mathrm{Tr}(tA) = t \, \mathrm{Tr}(A).$$
- It follows from the previous fact that for a random matrix $X$, $E[\mathrm{Tr}(X)] = \mathrm{Tr}(E[X])$.
- For any pair of matrices $A \in \mathbb{R}^{n \times k}$ and $B \in \mathbb{R}^{k \times n}$ such that $AB$ is a square matrix, we have $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$. So, in particular, for any vector $v \in \mathbb{R}^n$, $\mathrm{Tr}(v v^T) = \mathrm{Tr}(v^T v) = \|v\|^2$.
- For any matrix $A \in \mathbb{R}^{m \times n}$,
$$\|A\|_F^2 = \sum_{i=1}^m \sum_{j=1}^n A_{i,j}^2 = \mathrm{Tr}(A A^T).$$

Matrix Chernoff Bound: Let $X$ be a random $n \times n$ PSD matrix. Suppose that $X \preceq \alpha E[X]$ with probability $1$, for some $\alpha \geq 0$. Let $X_1, \dots, X_k$ be independent copies of $X$. Then, for any $0 < \epsilon < 1$,
$$P\left[(1 - \epsilon) E[X] \preceq \frac{1}{k}(X_1 + \dots + X_k) \preceq (1 + \epsilon) E[X]\right] \geq 1 - 2n e^{-\epsilon^2 k / 4\alpha}.$$
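The trace identities above are easy to confirm numerically; a short sketch of mine (assuming NumPy, with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

# Trace equals the sum of eigenvalues (real up to rounding for real matrices).
print(np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real))

# Linearity and the cyclic property Tr(AB) = Tr(BA).
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))

# Frobenius norm: ||A||_F^2 = Tr(A A^T).
print(np.isclose((A**2).sum(), np.trace(A @ A.T)))
```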

3 Optimization

Convex Functions: A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex on a set $S \subseteq \mathbb{R}^n$ if for any two points $x, y \in S$ and any $0 \leq \alpha \leq 1$,
$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y);$$
in particular, taking $\alpha = 1/2$,
$$f\left(\frac{x + y}{2}\right) \leq \frac{1}{2}\left(f(x) + f(y)\right).$$
We say $f$ is concave if for any such $x, y \in S$ and any $0 \leq \alpha \leq 1$, we have $f(\alpha x + (1 - \alpha) y) \geq \alpha f(x) + (1 - \alpha) f(y)$.

There is an equivalent characterization of convexity for twice-differentiable functions. For a function $f : \mathbb{R}^n \to \mathbb{R}$, the Hessian of $f$, $\nabla^2 f$, is an $n \times n$ matrix defined by
$$(\nabla^2 f)_{i,j} = \partial_{x_i} \partial_{x_j} f$$
for all $1 \leq i, j \leq n$. One can show that $f$ is convex over $S$ if and only if for all $a \in S$, $\nabla^2 f \big|_{x=a} \succeq 0$.

For example, consider the function $f(x) = x^T A x$ for $x \in \mathbb{R}^n$ and a symmetric matrix $A \in \mathbb{R}^{n \times n}$. Then $\nabla^2 f = 2A$. So, $f$ is convex (over $\mathbb{R}^n$) if and only if $A \succeq 0$.

For another example, let $f : \mathbb{R} \to \mathbb{R}$ be $f(x) = x^k$ for some integer $k \geq 2$. Then, $f''(x) = k(k-1) x^{k-2}$. If $k$ is an even integer, $f''(x) \geq 0$ for all $x \in \mathbb{R}$, so $f$ is convex over all real numbers. On the other hand, if $k$ is an odd integer, then $f''(x) \geq 0$ if and only if $x \geq 0$. So, in this case $f$ is convex only over the nonnegative reals.

Similarly, $f$ is concave over $S$ if $\nabla^2 f \big|_{x=a} \preceq 0$ for all $a \in S$. For example, $\log x$ is concave over all positive reals.

Convex Set: We say a set $S \subseteq \mathbb{R}^n$ is convex if for any pair of points $x, y \in S$, the line segment connecting $x$ to $y$ is in $S$. For example, let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function, let $t \in \mathbb{R}$, and define $T = \{x \in \mathbb{R}^n : f(x) \leq t\}$. Then $T$ is convex. This is because if $x, y \in T$, then for any $0 \leq \alpha \leq 1$,
$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y) \leq \alpha t + (1 - \alpha) t = t,$$
where the first inequality follows by convexity of $f$. So, $\alpha x + (1 - \alpha) y \in T$ and $T$ is convex.

Norms are Convex Functions: A norm is a function $\|\cdot\|$ that maps $\mathbb{R}^n$ to $\mathbb{R}$ and satisfies the following three properties:
i) $\|x\| \geq 0$ for all $x \in \mathbb{R}^n$,
ii) $\|\alpha x\| = |\alpha| \, \|x\|$ for all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$,
iii) Triangle inequality: $\|x + y\| \leq \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$.

It is easy to see that any norm is a convex function: for any $x, y \in \mathbb{R}^n$ and $0 \leq \alpha \leq 1$,
$$\|\alpha x + (1 - \alpha) y\| \leq \|\alpha x\| + \|(1 - \alpha) y\| = \alpha \|x\| + (1 - \alpha) \|y\|.$$
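As a tiny numerical illustration (my own sketch assuming NumPy; the vectors and dimensions are arbitrary), one can spot-check this convexity inequality for the Euclidean norm on random inputs:

```python
import numpy as np

rng = np.random.default_rng(6)
for _ in range(5):
    x, y = rng.normal(size=4), rng.normal(size=4)
    a = rng.uniform()                      # alpha in [0, 1]
    lhs = np.linalg.norm(a * x + (1 - a) * y)
    rhs = a * np.linalg.norm(x) + (1 - a) * np.linalg.norm(y)
    print(lhs <= rhs + 1e-12)              # convexity of the norm
```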

4 Useful Inequalities

For real numbers $a_1, \dots, a_n$ and positive reals $b_1, \dots, b_n$,
$$\min_i \frac{a_i}{b_i} \leq \frac{a_1 + \dots + a_n}{b_1 + \dots + b_n} \leq \max_i \frac{a_i}{b_i}.$$

Cauchy-Schwarz inequality: For real numbers $a_1, \dots, a_n, b_1, \dots, b_n$,
$$\sum_{i=1}^n a_i b_i \leq \sqrt{\sum_i a_i^2} \sqrt{\sum_i b_i^2}.$$
There is an equivalent vector version of the above inequality. For any two vectors $u, v \in \mathbb{R}^n$,
$$\sum_{i=1}^n u_i v_i = \langle u, v \rangle \leq \|u\| \, \|v\|.$$
Equality holds only when $u, v$ are parallel.

AM-GM inequality: For any $n$ nonnegative real numbers $a_1, \dots, a_n$,
$$\frac{a_1 + \dots + a_n}{n} \geq (a_1 a_2 \cdots a_n)^{1/n}.$$

Relation between norms: For any vector $a \in \mathbb{R}^n$,
$$\|a\|_2 \leq \|a\|_1 \leq \sqrt{n} \, \|a\|_2.$$
The right inequality is just the Cauchy-Schwarz inequality. For any real numbers $a_1, \dots, a_n$,
$$(|a_1| + \dots + |a_n|)^2 \leq n (a_1^2 + \dots + a_n^2).$$
This is indeed a special case of the Cauchy-Schwarz inequality.

For any real number $x$, $1 + x \leq e^x$; equivalently, $1 - x \leq e^{-x}$. We often use this inequality in this course to simplify calculations.
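A final sketch of mine (assuming NumPy; the random data are arbitrary) spot-checks several of these inequalities:

```python
import numpy as np

rng = np.random.default_rng(7)
a, b = rng.normal(size=6), rng.normal(size=6)
n = len(a)

# Cauchy-Schwarz: <a, b> <= ||a||_2 ||b||_2.
print(a @ b <= np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# AM-GM on nonnegative numbers.
c = np.abs(a)
print(c.mean() >= np.prod(c) ** (1 / n) - 1e-12)

# Relation between norms: ||a||_2 <= ||a||_1 <= sqrt(n) ||a||_2.
print(np.linalg.norm(a) <= np.abs(a).sum() <= np.sqrt(n) * np.linalg.norm(a) + 1e-12)

# 1 + x <= e^x for any real x.
x = rng.normal()
print(1 + x <= np.exp(x))
```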