
1 PCA with random noise. Van Ha Vu, Department of Mathematics, Yale University.

2 An important problem that appears in various areas of applied mathematics (in particular statistics, computer science, and numerical analysis) is to compute the first few singular vectors of a large matrix. Among others, this problem lies at the heart of PCA (Principal Component Analysis), which has a very wide range of applications. Problem. For a matrix $A$ of size $n \times n$ with singular values $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$, let $v_1, \dots, v_n$ be the corresponding (unit) singular vectors. Compute $v_1, \dots, v_k$, for some $k \le n$.
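
(An illustrative sketch, not from the slides: in practice the first $k$ singular vectors are read off from an SVD routine; the matrix and parameter values below are placeholders.)

```python
import numpy as np

def top_k_singular_vectors(A, k):
    """First k unit (right) singular vectors of A, ordered by decreasing singular value."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T  # columns are v_1, ..., v_k

# Example: project data onto the top k = 2 directions, as in visualization.
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 300))
V = top_k_singular_vectors(A, 2)
coords = A @ V  # coordinates of the rows in the top-2 directions
```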

3 Typically $n$ is large and $k$ is relatively small. As a matter of fact, in many applications $k$ is a constant independent of $n$. For example, to obtain a visualization of a large set of data, one often sets $k = 2$ or $3$. The assumption that $A$ is a square matrix is for convenience; our analysis carries over with nominal modification to rectangular matrices. Asymptotic notation: $\Theta, \Omega, O$ are used under the assumption that $n \to \infty$. For a vector $v$, $\|v\|$ denotes its $L_2$ norm. For a matrix $A$, $\|A\| = \sigma_1(A)$ denotes its spectral norm.

4 A model. The matrix $A$, which represents the data, is often perturbed by noise. Thus, one works with $A + E$, where $E$ represents the noise. A natural and important problem is to estimate the influence of the noise on the vectors $v_1, \dots, v_k$. We denote by $v_1', \dots, v_k'$ the first $k$ singular vectors of $A + E$. Question. When is $v_1'$ a good approximation of $v_1$? In other words, how much does the noise change $v_1$? For singular values, Weyl's bound gives $|\sigma_1(A + E) - \sigma_1(A)| \le \sigma_1(E)$. If $\|E\| \to 0$, then $\sigma_1(A + E) \to \sigma_1(A)$. In other words, $\sigma_1$ is continuous.

5 On the other hand, the singular vectors are not continuous. Let $A$ be the matrix
$$\begin{pmatrix} 1 + \epsilon & 0 \\ 0 & 1 - \epsilon \end{pmatrix}.$$
Apparently, the singular values of $A$ are $1 + \epsilon$ and $1 - \epsilon$, with corresponding singular vectors $(1, 0)$ and $(0, 1)$. Let $E$ be
$$\begin{pmatrix} -\epsilon & \epsilon \\ \epsilon & \epsilon \end{pmatrix},$$
where $\epsilon$ is a small positive number. The perturbed matrix $A + E$ has the form
$$\begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}.$$
Obviously, the singular values of $A + E$ are also $1 + \epsilon$ and $1 - \epsilon$. However, the corresponding singular vectors now are $(\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2})$ and $(\frac{1}{\sqrt 2}, -\frac{1}{\sqrt 2})$, no matter how small $\epsilon$ is.
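
(A quick numerical check of this example, my addition: up to sign, the top singular vector of $A + E$ stays at $(1/\sqrt 2, 1/\sqrt 2)$ however small $\epsilon$ is.)

```python
import numpy as np

for eps in (1e-2, 1e-6, 1e-10):
    A = np.array([[1 + eps, 0.0], [0.0, 1 - eps]])
    E = np.array([[-eps, eps], [eps, eps]])
    _, _, Vt = np.linalg.svd(A)           # top row ~ (1, 0)
    _, _, Vt_pert = np.linalg.svd(A + E)  # top row ~ (1, 1)/sqrt(2), up to sign
    print(eps, Vt[0], Vt_pert[0])
```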

6 A traditional way to measure the distance between two unit vectors $v$ and $v'$ is to look at $\sin \angle(v, v')$, where $\angle(v, v')$ is the angle between the vectors, taken in $[0, \pi/2]$. Let us fix a small parameter $\epsilon > 0$, which represents a desired accuracy. We want to find a sufficient condition on the matrix $A$ which guarantees that $\sin \angle(v_1, v_1') \le \epsilon$. The key parameter to look at is the gap (or separation) $\delta := \sigma_1 - \sigma_2$ between the first and second singular values of $A$. Theorem (Wedin $\sin \theta$ theorem). There is a positive constant $C$ such that
$$\sin \angle(v_1, v_1') \le C \frac{\|E\|}{\delta}.$$
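
(In code, the distance $\sin \angle(v, v')$ and an empirical comparison with Wedin's bound might look as follows; a sketch, my addition, with the constant $C$ simply dropped.)

```python
import numpy as np

def sin_angle(v, w):
    """sin of the angle between unit vectors v and w, taken in [0, pi/2]."""
    c = min(1.0, abs(float(np.dot(v, w))))  # |cos|, clipped for numerical safety
    return np.sqrt(1.0 - c * c)

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
E = 0.01 * rng.standard_normal((n, n))
_, s, Vt = np.linalg.svd(A)
_, _, Vt_pert = np.linalg.svd(A + E)
delta = s[0] - s[1]
print(sin_angle(Vt[0], Vt_pert[0]), "<~", np.linalg.norm(E, 2) / delta)
```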

7 Corollary. For any given $\epsilon > 0$, there is $C = C(\epsilon) > 0$ such that if $\delta \ge C \|E\|$, then $\sin \angle(v_1, v_1') \le \epsilon$. In the case when $A$ and $A + E$ are Hermitian, this statement is a special case of the Davis-Kahan $\sin \theta$ theorem. Wedin extended the Davis-Kahan theorem to non-Hermitian matrices.

8 Random perturbation. Noise (or perturbation) represents errors that come from various sources and are frequently of entirely different natures: errors occurring in measurements, errors occurring in recording and transmitting data, errors caused by rounding, etc. It is usually too complicated to model noise deterministically, so in practice one often assumes that it is random. In particular, a popular model is that the entries of $E$ are independent random variables with mean 0 and variance 1 (the value 1 is, of course, just a matter of normalization).

9 For simplicity, we restrict ourselves to the representative case when all entries of $E$ are iid Bernoulli random variables, taking values $\pm 1$ with probability $1/2$ each. We prefer the Bernoulli model over the Gaussian one for two reasons. First, in many real-life applications noise has a discrete nature (after all, data are finite), so it seems reasonable to use random variables with discrete support to model noise, and Bernoulli is the simplest such variable. Second, the analysis for the Bernoulli model easily extends to many other models of random matrices (including the Gaussian one). On the other hand, the analysis for Gaussian matrices often relies on special properties of the Gaussian measure which are not available in other cases.

10 It is well known that a random matrix of size $n$ has norm $\|E\| \approx 2\sqrt n$ with high probability. Corollary. For any given $\epsilon > 0$, there is $C = C(\epsilon) > 0$ such that if $\delta \ge C \sqrt n$, then with probability $1 - o(1)$, $\sin \angle(v_1, v_1') \le \epsilon$.
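
(One can see the $2\sqrt n$ scaling, and hence the scale of the condition $\delta \ge C\sqrt n$, in a quick simulation; a sketch, not from the slides.)

```python
import numpy as np

rng = np.random.default_rng(2)
for n in (100, 400, 1600):
    E = rng.choice([-1.0, 1.0], size=(n, n))     # iid Bernoulli +-1 noise
    print(n, np.linalg.norm(E, 2) / np.sqrt(n))  # ratio concentrates near 2
```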

11 [Figure: empirical CDFs for a random perturbation of a matrix of rank 2, with gaps 1 and 8, respectively; the gap that suffices in practice is much smaller than Wedin's bound predicts.]

12 [Figure: empirical CDFs for a random perturbation of a matrix of rank 2, with gaps 1 and 10, respectively.]

13 [Figure: further empirical CDF plots.]

14 Low-dimensional data and improved bounds. In a large variety of problems, the data is of small dimension, namely $r := \operatorname{rank} A \ll n$. In this setting, we discovered that the results can be significantly improved. The improvement reflects the real dimension $r$, rather than the size $n$ of the matrix. Corollary. For any positive constant $\epsilon$ there is a positive constant $C = C(\epsilon)$ such that the following holds. Assume that $A$ has rank $r \le n^{.99}$, $\sigma_1 \ge \frac{n}{\sqrt{r \log n}}$, and $\delta \ge C \sqrt{r \log n}$. Then with probability $1 - o(1)$,
$$\sin \angle(v_1, v_1') \le \epsilon. \tag{1}$$

15 Theorem (Probabilistic $\sin$ theorem). For any positive constants $\alpha_1, \alpha_2$ there is a positive constant $C$ such that the following holds. Assume that $A$ has rank $r \le n^{1 - \alpha_1}$ and $\sigma_1 := \sigma_1(A) \ge n^{\alpha_2}$. Let $E$ be a random Bernoulli matrix. Then with probability $1 - o(1)$,
$$\sin^2 \angle(v_1, v_1') \le C \max\left( \frac{\sqrt{r \log n}}{\delta}, \frac{n}{\delta \sigma_1} \right). \tag{2}$$
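
(A simulation sketch of this low-rank setting, my addition: all parameter values are placeholders, and the right-hand side of (2) is evaluated with $C = 1$, so the comparison is only illustrative.)

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, sigma1 = 1000, 5, 300.0

# Rank-r matrix A with prescribed singular values sigma_1 > ... > sigma_r.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
svals = np.linspace(sigma1, sigma1 / 2, r)
A = (U * svals) @ V.T

E = rng.choice([-1.0, 1.0], size=(n, n))     # Bernoulli noise
_, _, Vt = np.linalg.svd(A + E)
sin2 = 1.0 - np.dot(Vt[0], V[:, 0]) ** 2     # sin^2 of the angle (sign-free)
delta = svals[0] - svals[1]
bound = max(np.sqrt(r * np.log(n)) / delta, n / (delta * sigma1))
print(sin2, "<~", bound)
```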

16 Let us now consider the general case, where we try to approximate the first $k$ singular vectors. Set $\epsilon_k := \sin \angle(v_k, v_k')$ and $s_k := (\epsilon_1^2 + \cdots + \epsilon_k^2)^{1/2}$. We can bound $\epsilon_k$ recursively as follows. Theorem. For any positive constants $\alpha_1, \alpha_2$ and $k$ there is a positive constant $C$ such that the following holds. Assume that $A$ has rank $r \le n^{1 - \alpha_1}$ and $\sigma_1 := \sigma_1(A) \ge n^{\alpha_2}$. Let $E$ be a random Bernoulli matrix and set $\delta_k := \sigma_k - \sigma_{k+1}$. Then with probability $1 - o(1)$,
$$\epsilon_k^2 \le C \max\left( \frac{\sqrt{r \log n}}{\delta_k},\ \frac{n}{\delta_k \sigma_k},\ \frac{\sqrt n}{\sigma_k},\ \frac{\sigma_1^2 s_{k-1}^2}{\sigma_k \delta_k},\ \frac{(\sigma_1 + \sqrt n)(\sigma_k + \sqrt n)\, s_{k-1}^2}{\sigma_k \delta_k} \right). \tag{3}$$

17 Take $A$ such that $r = n^{o(1)}$, $\sigma_1 = 2n^{\alpha}$, $\sigma_2 = n^{\alpha}$, $\delta_2 = n^{\beta}$, where $\alpha > 1/2 > \beta > 1 - \alpha$ are positive constants. Then $\delta_1 = n^{\alpha}$, and (2) gives
$$\epsilon_1^2 \le \max\left( n^{-\alpha + o(1)}, n^{1 - 2\alpha + o(1)} \right)$$
almost surely. Assume that we want to bound $\sin \angle(v_2, v_2')$. The gap $\delta_2 = n^{\beta} = o(n^{1/2})$ is smaller than $\|E\| \approx 2\sqrt n$, so Wedin's theorem does not apply. On the other hand, our theorem implies that almost surely
$$\epsilon_2^2 \le \max\left( n^{-\beta + o(1)}, n^{1/2 - \alpha + o(1)}, n^{-\alpha - \beta + 1} \right).$$
Since all three exponents are negative, we have almost surely $\sin \angle(v_2, v_2') = n^{-\Omega(1)} = o(1)$.

18 Proof strategy. Bound the difference $\sigma_1' - \sigma_1$ from both above and below. Show that if $v_1'$ is far from $v_1$, then $\sigma_1'$ is far from $\sigma_1$. The second step relies on the formula
$$\sigma_1' := \sup_{\|v\| = 1} \|(A + E)v\|.$$
It suffices to consider $v$ in an $\epsilon$-net of the unit sphere. Critical step: it suffices to restrict to a subspace of dimension roughly $\operatorname{rank} A$!

19 Fix a system $v_1, \dots, v_n$ of unit singular vectors of $A$. It is well known that $v_1, \dots, v_n$ form an orthonormal basis. (If $A$ has rank $r$, the choice of $v_{r+1}, \dots, v_n$ will turn out to be irrelevant.) For a vector $v$, if we decompose it as
$$v := \alpha_1 v_1 + \cdots + \alpha_n v_n,$$
then
$$\|Av\|^2 = v^T A^T A v = \sum_{i=1}^n \alpha_i^2 \sigma_i^2. \tag{4}$$
Courant-Fischer minimax principle for singular values:
$$\sigma_k(M) = \max_{\dim H = k}\ \min_{v \in H,\ \|v\| = 1} \|Mv\|, \tag{5}$$
where $\sigma_k(M)$ is the $k$th largest singular value of $M$.
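
(Identity (4) is easy to confirm numerically; a sketch, my addition.)

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
A = rng.standard_normal((n, n))
_, s, Vt = np.linalg.svd(A)     # rows of Vt are v_1, ..., v_n

v = rng.standard_normal(n)
alpha = Vt @ v                   # coefficients alpha_i = v . v_i
lhs = np.linalg.norm(A @ v) ** 2
rhs = np.sum(alpha ** 2 * s ** 2)
print(np.isclose(lhs, rhs))      # True: ||Av||^2 = sum_i alpha_i^2 sigma_i^2
```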

20 Let $\epsilon$ be a positive number. A set $X$ is an $\epsilon$-net of a set $Y$ if for any $y \in Y$ there is $x \in X$ such that $\|x - y\| \le \epsilon$. Lemma [$\epsilon$-approximation lemma]. Let $H$ be a subspace and $S := \{v \mid \|v\| = 1, v \in H\}$. Let $0 < \epsilon \le 1$ be a number and $M$ a linear map. Let $N \subset S$ be an $\epsilon$-net of $S$. Then there is a vector $w \in N$ such that
$$\|Mw\| \ge (1 - \epsilon) \max_{v \in S} \|Mv\|.$$
Proof: let $v$ be the vector where the maximum is attained and let $w$ be a vector in the net closest to $v$ (ties are broken arbitrarily). Then by the triangle inequality
$$\|Mw\| \ge \|Mv\| - \|M(v - w)\|.$$
As $\|v - w\| \le \epsilon$, we have $\|M(v - w)\| \le \epsilon \max_{v \in S} \|Mv\|$.

21 Lemma [Net size]. The unit sphere in $d$ dimensions admits an $\epsilon$-net of size at most $(3\epsilon^{-1})^d$. Proof: let $S$ be the sphere in question, centered at $O$, and let $N \subset S$ be a finite subset of $S$ such that the distance between any two points is at least $\epsilon$. If $N$ is maximal with respect to this property, then $N$ is an $\epsilon$-net. On the other hand, the balls of radius $\epsilon/2$ centered at the points of $N$ are disjoint subsets of the ball of radius $1 + \epsilon/2$ centered at $O$. Since
$$\frac{1 + \epsilon/2}{\epsilon/2} \le 3\epsilon^{-1},$$
the claim follows by a volume argument.

22 Lemma [Spectral norm; Alon-Krivelevich-Vu]. There is a constant $C_0 > 0$ such that the following holds. Let $E$ be a random Bernoulli matrix of size $n$. Then
$$P(\|E\| \ge 3\sqrt n) \le \exp(-C_0 n).$$
Next, we present a lemma which roughly asserts that for any two given vectors $u$ and $v$, the vectors $u$ and $Ev$ are, with high probability, almost orthogonal. Lemma [Orthogonality lemma]. Let $E$ be a random Bernoulli matrix of size $n$. For any fixed unit vectors $u, v$ and positive number $t$,
$$P(|u^T E v| \ge t) \le 2 \exp(-t^2/16).$$
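
(A Monte Carlo sanity check of the orthogonality lemma; a sketch, my addition, with arbitrarily chosen fixed vectors and parameters.)

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials, t = 200, 1000, 4.0
u = np.ones(n) / np.sqrt(n)   # a fixed unit vector
v = np.zeros(n); v[0] = 1.0   # another fixed unit vector

hits = sum(abs(u @ rng.choice([-1.0, 1.0], size=(n, n)) @ v) >= t
           for _ in range(trials))
# Empirical tail frequency vs. the stated bound 2 exp(-t^2/16).
print(hits / trials, "<=", 2 * np.exp(-t * t / 16))
```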

23 Lemma [Main lemma]. For any constant $0 < \beta \le 1$ there is a constant $C$ such that the following holds. Assume that $A$ is such that $\sigma_1 \ge n^{\beta}$, and let $V := \operatorname{span}\{v_1, \dots, v_d\}$ for some $d = o(n/\log n)$. Then the following holds almost surely: for any unit vector $v \in V$,
$$\|(A + E)v\|^2 \le \sum_{i=1}^n (v \cdot v_i)^2 \sigma_i^2 + C(n + \sigma_1 \sqrt{d \log n}).$$
It is important that the statement holds for all unit $v \in V$ simultaneously.

24 It suffices to prove the bound for $v$ belonging to an $\epsilon$-net $N$ of the unit sphere $S$ in $V$, with $\epsilon := \frac{1}{n + \sigma_1}$. With such a small $\epsilon$, the error coming from the factor $(1 - \epsilon)$ is swallowed by the error term $O(n + \sigma_1 \sqrt{d \log n})$. Thanks to the upper bound on the net size, it suffices to show that if $C$ is large enough, then
$$P\left( \|(A + E)v\|^2 \ge \sum_{i=1}^n (v \cdot v_i)^2 \sigma_i^2 + C(n + \sigma_1 \sqrt{d \log n}) \right) \le \exp(-2C_1 d \log n)$$
for any fixed $v \in N$. Fix $v \in N$. Then
$$\|(A + E)v\|^2 = \|Av\|^2 + \|Ev\|^2 + 2(Av)^T (Ev) = \sum_{i=1}^n (v \cdot v_i)^2 \sigma_i^2 + \|Ev\|^2 + 2(Av)^T (Ev).$$
Now use the spectral norm lemma and the orthogonality lemma.

25 Let $u_i$ ($1 \le i \le n$) be the left singular vectors of the matrix $A$. First, we give a lower bound for $\sigma_1' := \|A + E\|$. By the minimax principle, we have
$$\sigma_1' = \|A + E\| \ge u_1^T (A + E) v_1 = \sigma_1 + u_1^T E v_1.$$
By the orthogonality lemma, with probability $1 - o(1)$, $|u_1^T E v_1| \le \log \log n$. (The choice of $\log \log n$ is not important. One can replace it by any function that tends slowly to infinity with $n$.) Thus we have, with probability $1 - o(1)$, that
$$\|A + E\| \ge \sigma_1 - \log \log n. \tag{6}$$
Our main observation is that, with high probability, any $v$ that is far from $v_1$ would yield $\|(A + E)v\| < \sigma_1 - \log \log n$. Therefore, the first singular vector $v_1'$ of $A + E$ must be close to $v_1$.

26 Consider a unit vector $v$ and write it as
$$v = c_1 v_1 + c_2 v_2 + \cdots + c_r v_r + c_0 u, \tag{7}$$
where $u$ is a unit vector orthogonal to $H := \operatorname{span}\{v_1, \dots, v_r\}$ and $c_1^2 + \cdots + c_r^2 + c_0^2 = 1$. Recall that $r$ is the rank of $A$, so $Au = 0$. Setting $w := c_1 v_1 + \cdots + c_r v_r$ and using Cauchy-Schwarz, we have
$$\|(A + E)v\|^2 = \|(A + E)w + c_0 Eu\|^2 \le \|(A + E)w\|^2 + 2c_0 \|(A + E)w\| \|Eu\| + c_0^2 \|Eu\|^2 \le \left(1 + \frac{c_0^2}{4}\right) \|(A + E)w\|^2 + (4 + c_0^2) \|Eu\|^2.$$

27 By the spectral norm lemma, we have, with probability $1 - o(1)$, that $\|Eu\| \le 3\sqrt n$ for every unit vector $u$. Furthermore, by the main lemma, we have, with probability $1 - o(1)$,
$$\|(A + E)w\|^2 \le \sum_{i=1}^r (w \cdot v_i)^2 \sigma_i^2 + O(\sigma_1 \sqrt{r \log n} + n)$$
for every vector $w \in H$ of length at most 1. Since
$$\sum_{i=1}^r (w \cdot v_i)^2 \sigma_i^2 = \sum_{i=1}^r c_i^2 \sigma_i^2 \le (1 - c_0^2)\sigma_1^2 - (1 - c_0^2 - c_1^2)(\sigma_1^2 - \sigma_2^2),$$
we can conclude that with probability $1 - o(1)$ the following holds: any unit vector $v$ written in the form above satisfies
$$\|(A + E)v\|^2 \le \left(1 + \frac{c_0^2}{4}\right)\left[(1 - c_0^2)\sigma_1^2 - (1 - c_0^2 - c_1^2)(\sigma_1^2 - \sigma_2^2)\right] + O(\sigma_1 \sqrt{r \log n} + n).$$

28 Set $v$ to be the first singular vector of $A + E$. By the lower bound on $\|(A + E)v\|$,
$$\|(A + E)v\|^2 \ge (\sigma_1 - \log \log n)^2.$$
Combining this with the previous inequality, we get
$$(1 - c_1^2)\,\sigma_1 \delta + \frac{c_0^2}{4}\sigma_1^2 \le C(\sigma_1 \sqrt{r \log n} + n).$$
From here we can get an upper bound on $1 - c_1^2$ (and hence on $\sin^2 \angle(v_1, v_1')$) after some manipulation.

29 Further directions of research. Improve bounds. Other models of random matrices. Limiting distributions. Data in low dimension.

Reference: S. O'Rourke, V. Vu, and K. Wang, Random perturbation of low rank matrices: improving classical bounds, arXiv:1311.2657v5 [math.NA], 16 Nov 2017.
