PCA with random noise. Van Ha Vu, Department of Mathematics, Yale University
2 An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical analysis) is to compute the first few singular vectors of a large matrix. Among others, this problem lies at the heart of PCA (Principal Component Analysis), which has a very wide range of applications.

Problem. For a matrix $A$ of size $n \times n$ with singular values $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$, let $v_1, \dots, v_n$ be the corresponding (unit) singular vectors. Compute $v_1, \dots, v_k$, for some $k \le n$.
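As a concrete illustration of the problem statement (a sketch added here, not part of the original slides), the computation can be done with a standard SVD routine; the example matrix and the choice of `np.linalg.svd` are illustrative, not the method analyzed in the talk.

```python
import numpy as np

def top_k_singular_vectors(A, k):
    """Return v_1, ..., v_k, the first k (unit) right singular vectors of A.

    np.linalg.svd returns singular values in non-increasing order,
    so the first k rows of Vt are exactly v_1, ..., v_k.
    """
    _, _, Vt = np.linalg.svd(A)
    return Vt[:k]

# Example: k = 2 (as in visualization) on a random 100 x 100 matrix.
A = np.random.randn(100, 100)
v1, v2 = top_k_singular_vectors(A, 2)
print(np.linalg.norm(v1), abs(v1 @ v2))  # ~1.0 (unit) and ~0.0 (orthogonal)
```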
3 Typically $n$ is large and $k$ is relatively small. As a matter of fact, in many applications $k$ is a constant independent of $n$. For example, to obtain a visualization of a large set of data, one often sets $k = 2$ or $3$. The assumption that $A$ is a square matrix is for convenience; our analysis carries over with nominal modification to rectangular matrices.

Asymptotic notation: $\Theta, \Omega, O$ are used under the assumption that $n \to \infty$. For a vector $v$, $\|v\|$ denotes its $L_2$ norm. For a matrix $A$, $\|A\| = \sigma_1(A)$ denotes its spectral norm.
4 A model. The matrix $A$, which represents the data, is often perturbed by noise. Thus, one works with $A + E$, where $E$ represents the noise. A natural and important problem is to estimate the influence of the noise on the vectors $v_1, \dots, v_k$. We denote by $v_1', \dots, v_k'$ the first $k$ singular vectors of $A + E$.

Question. When is $v_1'$ a good approximation of $v_1$? In other words, how much does the noise change $v_1$?

For singular values, Weyl's bound gives $|\sigma_1(A + E) - \sigma_1(A)| \le \sigma_1(E)$. If $\|E\| \to 0$, then $\sigma_1(A + E) \to \sigma_1(A)$. In other words, $\sigma_1$ is continuous.
5 On the other hand, the singular vectors are not continuous. Let $A$ be the matrix
$$A = \begin{pmatrix} 1 + \epsilon & 0 \\ 0 & 1 - \epsilon \end{pmatrix}.$$
Apparently, the singular values of $A$ are $1 + \epsilon$ and $1 - \epsilon$, with corresponding singular vectors $(1, 0)$ and $(0, 1)$. Let $E$ be
$$E = \begin{pmatrix} -\epsilon & \epsilon \\ \epsilon & \epsilon \end{pmatrix},$$
where $\epsilon$ is a small positive number. The perturbed matrix $A + E$ has the form
$$A + E = \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}.$$
Obviously, the singular values of $A + E$ are also $1 + \epsilon$ and $1 - \epsilon$. However, the corresponding singular vectors now are $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ and $\left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$, no matter how small $\epsilon$ is.
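A quick numerical check of this $2 \times 2$ example (a sketch added here, not from the slides; singular vectors are only defined up to sign, so the printed vectors may differ by a factor of $-1$):

```python
import numpy as np

eps = 1e-6
A = np.array([[1 + eps, 0.0],
              [0.0, 1 - eps]])
E = np.array([[-eps, eps],
              [eps, eps]])  # ||E|| = sqrt(2) * eps, arbitrarily small

# Top singular vector of A is (1, 0); for A + E it jumps to (1, 1)/sqrt(2).
print(np.linalg.svd(A)[2][0])      # ~ [1, 0]
print(np.linalg.svd(A + E)[2][0])  # ~ [0.707, 0.707] (up to sign)
```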
6 A traditional way to measure the distance between two unit vectors $v$ and $v'$ is to look at $\sin \angle(v, v')$, where $\angle(v, v')$ is the angle between the vectors, taken in $[0, \pi/2]$.

Let us fix a small parameter $\epsilon > 0$, which represents a desired accuracy. We want to find a sufficient condition on the matrix $A$ which guarantees that $\sin \angle(v_1, v_1') \le \epsilon$. The key parameter to look at is the gap (or separation)
$$\delta := \sigma_1 - \sigma_2$$
between the first and second singular values of $A$.

Theorem (Wedin sin theorem). There is a positive constant $C$ such that
$$\sin \angle(v_1, v_1') \le C \frac{\|E\|}{\delta}.$$
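In code, the distance $\sin \angle(v, v')$ and the ratio $\|E\|/\delta$ in Wedin's bound can be compared directly. The sketch below is an added illustration with made-up parameters (not the slides' experiment); it already hints at the phenomenon discussed later, namely that the observed angle is often much smaller than the bound.

```python
import numpy as np

def sin_angle(u, v):
    """sin of the angle in [0, pi/2] between unit vectors u and v."""
    c = min(1.0, abs(u @ v))  # |cos(angle)|, clipped against rounding
    return np.sqrt(1.0 - c ** 2)

n = 200
A = np.diag([50.0, 30.0] + [0.0] * (n - 2))      # sigma_1 = 50, delta = 20
E = np.random.choice([-1.0, 1.0], size=(n, n))   # Bernoulli noise
v1 = np.linalg.svd(A)[2][0]
v1_pert = np.linalg.svd(A + E)[2][0]
print(sin_angle(v1, v1_pert))        # observed perturbation of v_1
print(np.linalg.norm(E, 2) / 20.0)   # Wedin's ratio ||E|| / delta (up to C)
```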
7 Corollary. For any given $\epsilon > 0$, there is $C = C(\epsilon) > 0$ such that if $\delta \ge C \|E\|$, then $\sin \angle(v_1, v_1') \le \epsilon$.

In the case when $A$ and $A + E$ are Hermitian, this statement is a special case of the Davis-Kahan $\sin \theta$ theorem. Wedin extended the Davis-Kahan theorem to non-Hermitian matrices.
8 Random perturbation. Noise (or perturbation) represents errors that come from various sources and are frequently of entirely different natures: errors occurring in measurements, errors occurring in recording and transmitting data, rounding errors, etc. It is usually too complicated to model noise deterministically, so in practice one often assumes that it is random. In particular, a popular model is that the entries of $E$ are independent random variables with mean 0 and variance 1 (the value 1 is, of course, just a matter of normalization).
9 For simplicity, we restrict ourselves to a representative case when all entries of $E$ are iid Bernoulli random variables, taking values $\pm 1$ with probability $1/2$ each. We prefer the Bernoulli model over the Gaussian one for two reasons:

- In many real-life applications, noise must have a discrete nature (after all, data are finite). So it seems reasonable to use random variables with discrete support to model noise, and Bernoulli is the simplest such variable.
- The analysis for the Bernoulli model easily extends to many other models of random matrices (including the Gaussian one). On the other hand, the analysis for Gaussian matrices often relies on special properties of the Gaussian measure which are not available in other cases.
10 It is well known that a random matrix of size $n$ has norm $\|E\| \approx 2\sqrt{n}$ with high probability.

Corollary. For any given $\epsilon > 0$, there is $C = C(\epsilon) > 0$ such that if $\delta \ge C \sqrt{n}$, then with probability $1 - o(1)$,
$$\sin \angle(v_1, v_1') \le \epsilon.$$
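The $\|E\| \approx 2\sqrt{n}$ fact is easy to observe empirically; a small simulation (added here as an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    E = rng.choice([-1.0, 1.0], size=(n, n))
    # The ratio ||E|| / sqrt(n) should approach 2 as n grows.
    print(n, np.linalg.norm(E, 2) / np.sqrt(n))
```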
11 [Figure: two empirical CDF plots for a matrix of rank 2, with gaps being 1 and 8, respectively; the effective gap is much less than predicted by Wedin's bound.]
12 [Figure: two empirical CDF plots for a matrix of rank 2, with gaps being 1 and 10, respectively.]
13 [Figure: two further empirical CDF plots.]
14 Low dimensional data and improved bounds. In a large variety of problems, the data is of small dimension, namely $r := \operatorname{rank} A \ll n$. In this setting, we discovered that the results can be significantly improved. This improvement reflects the real dimension $r$, rather than the size $n$ of the matrix.

Corollary. For any positive constant $\epsilon$ there is a positive constant $C = C(\epsilon)$ such that the following holds. Assume that $A$ has rank $r \le n^{.99}$, $\sigma_1 \ge \frac{n}{r \log n}$, and $\delta \ge C r \log n$. Then with probability $1 - o(1)$,
$$\sin \angle(v_1, v_1') \le \epsilon. \quad (1)$$
15 Theorem (Probabilistic sin theorem). For any positive constants $\alpha_1, \alpha_2$ there is a positive constant $C$ such that the following holds. Assume that $A$ has rank $r \le n^{1 - \alpha_1}$ and $\sigma_1 := \sigma_1(A) \ge n^{\alpha_2}$. Let $E$ be a random Bernoulli matrix. Then with probability $1 - o(1)$,
$$\sin^2 \angle(v_1, v_1') \le C \max\left( \frac{r \log n}{\delta}, \frac{n}{\delta \sigma_1} \right). \quad (2)$$
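The theorem can be sanity-checked by simulation. The sketch below (an added illustration, not the slides' experiment) builds a random rank-$r$ matrix with prescribed singular values; all parameter choices are arbitrary, and the constant $C$ is left unspecified, so only the order of magnitude of the comparison is meaningful.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 1000, 5
sigmas = np.array([400.0, 300.0, 200.0, 100.0, 50.0])  # sigma_1 .. sigma_r

# Random rank-r matrix A = U diag(sigmas) V^T with orthonormal columns U, V.
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
A = U @ np.diag(sigmas) @ V.T
E = rng.choice([-1.0, 1.0], size=(n, n))  # Bernoulli noise

v1 = np.linalg.svd(A)[2][0]
v1_pert = np.linalg.svd(A + E)[2][0]
sin2 = 1.0 - min(1.0, abs(v1 @ v1_pert)) ** 2

delta, sigma1 = sigmas[0] - sigmas[1], sigmas[0]
bound = max(r * np.log(n) / delta, n / (delta * sigma1))  # RHS of (2), C = 1
print(sin2, bound)  # observed sin^2 vs the bound's order of magnitude
```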
16 Let us now consider the general case when we try to approximate the first $k$ singular vectors. Set $\epsilon_k := \sin \angle(v_k, v_k')$ and $s_k := (\epsilon_1^2 + \cdots + \epsilon_k^2)^{1/2}$. We can bound $\epsilon_k$ recursively as follows.

Theorem. For any positive constants $\alpha_1, \alpha_2, k$ there is a positive constant $C$ such that the following holds. Assume that $A$ has rank $r \le n^{1 - \alpha_1}$ and $\sigma_1 := \sigma_1(A) \ge n^{\alpha_2}$. Let $E$ be a random Bernoulli matrix. Then with probability $1 - o(1)$,
$$\epsilon_k^2 \le C \max\left( \frac{r \log n}{\delta_k},\ \frac{n}{\sigma_k \delta_k},\ \frac{\sqrt{n}}{\sigma_k},\ \frac{\sigma_1^2 s_{k-1}^2}{\sigma_k \delta_k},\ \frac{(\sigma_1 + \sqrt{n})(\sigma_k + \sqrt{n})\, s_{k-1}^2}{\sigma_k \delta_k} \right). \quad (3)$$
17 Take $A$ such that $r = n^{o(1)}$, $\sigma_1 = 2n^{\alpha}$, $\sigma_2 = n^{\alpha}$, $\delta_2 = n^{\beta}$, where $\alpha > 1/2 > \beta > 1 - \alpha$ are positive constants. Then $\delta_1 = n^{\alpha}$ and
$$\epsilon_1^2 \le \max\left( n^{-\alpha + o(1)}, n^{1 - 2\alpha + o(1)} \right)$$
almost surely.

Assume that we want to bound $\sin \angle(v_2, v_2')$. The gap $\delta_2 = n^{\beta} = o(n^{1/2})$, so Wedin's theorem does not apply. On the other hand, our theorem implies that almost surely
$$\epsilon_2^2 \le \max\left( n^{-\beta + o(1)}, n^{1/2 - \alpha + o(1)}, n^{1 - \alpha - \beta} \right).$$
Thus, we have almost surely $\sin \angle(v_2, v_2') = n^{-\Omega(1)} = o(1)$.
18 Proof strategy. Bound the difference $|\sigma_1' - \sigma_1|$ from both above and below. Show that if $v_1'$ is far from $v_1$, then $\sigma_1'$ is far from $\sigma_1$. The second step relies on the formula
$$\sigma_1' := \sup_{\|v\| = 1} \|(A + E)v\|.$$
It suffices to consider $v$ in an $\epsilon$-net of the unit sphere.

Critical step: it suffices to restrict to a subset of dimension roughly $\operatorname{rank} A$!
19 Fix a system $v_1, \dots, v_n$ of unit singular vectors of $A$. It is well known that $v_1, \dots, v_n$ form an orthonormal basis. (If $A$ has rank $r$, the choice of $v_{r+1}, \dots, v_n$ will turn out to be irrelevant.) For a vector $v$, if we decompose it as
$$v := \alpha_1 v_1 + \cdots + \alpha_n v_n,$$
then
$$\|Av\|^2 = v^* A^* A v = \sum_{i=1}^{n} \alpha_i^2 \sigma_i^2. \quad (4)$$

Courant-Fischer minimax principle for singular values:
$$\sigma_k(M) = \max_{\dim H = k} \ \min_{v \in H,\ \|v\| = 1} \|Mv\|, \quad (5)$$
where $\sigma_k(M)$ is the $k$th largest singular value of $M$.
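Identity (4) is easy to verify numerically (a small check added here for concreteness):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
_, s, Vt = np.linalg.svd(A)

v = rng.standard_normal(n)
alpha = Vt @ v                     # alpha_i = v . v_i, coordinates in the basis
lhs = np.linalg.norm(A @ v) ** 2
rhs = np.sum(alpha ** 2 * s ** 2)  # right-hand side of formula (4)
print(np.isclose(lhs, rhs))        # True
```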
20 Let $\epsilon$ be a positive number. A set $X$ is an $\epsilon$-net of a set $Y$ if for any $y \in Y$ there is $x \in X$ such that $\|x - y\| \le \epsilon$.

Lemma [$\epsilon$-approximation lemma]. Let $H$ be a subspace and $S := \{v \mid \|v\| = 1,\ v \in H\}$. Let $0 < \epsilon \le 1$ be a number and $M$ a linear map. Let $N \subset S$ be an $\epsilon$-net of $S$. Then there is a vector $w \in N$ such that
$$\|Mw\| \ge (1 - \epsilon) \max_{v \in S} \|Mv\|.$$

Proof. Let $v$ be the vector where the maximum is attained and let $w$ be a vector in the net closest to $v$ (ties are broken arbitrarily). Then by the triangle inequality,
$$\|Mw\| \ge \|Mv\| - \|M(v - w)\|.$$
As $\|v - w\| \le \epsilon$, we have $\|M(v - w)\| \le \epsilon \max_{v \in S} \|Mv\|$.
21 Lemma [Net size]. The unit sphere in $d$ dimensions admits an $\epsilon$-net of size at most $(3\epsilon^{-1})^d$.

Proof. Let $S$ be the sphere in question, centered at $O$, and let $N \subset S$ be a finite subset of $S$ such that the distance between any two points is at least $\epsilon$. If $N$ is maximal with respect to this property, then $N$ is an $\epsilon$-net. On the other hand, the balls of radius $\epsilon/2$ centered at the points of $N$ are disjoint subsets of the ball of radius $1 + \epsilon/2$ centered at $O$. Since
$$\frac{1 + \epsilon/2}{\epsilon/2} \le 3\epsilon^{-1},$$
the claim follows by a volume argument.
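The proof is constructive: greedily keeping points that are pairwise $\epsilon$-separated yields a maximal separated set, hence an $\epsilon$-net. A small sketch over a finite sample of the sphere (an added illustration; a dense random sample stands in for the whole sphere):

```python
import numpy as np

def greedy_eps_net(points, eps):
    """Keep points pairwise >= eps apart; a maximal such set is an eps-net."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) >= eps for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(2)
d, eps = 3, 0.5
pts = rng.standard_normal((20000, d))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # points on the unit sphere

net = greedy_eps_net(pts, eps)
print(len(net), int((3 / eps) ** d))  # observed net size vs the (3/eps)^d bound
```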
22 Lemma [Spectral norm; Alon-Krivelevich-Vu]. There is a constant $C_0 > 0$ such that the following holds. Let $E$ be a random Bernoulli matrix of size $n$. Then
$$\mathbf{P}(\|E\| \ge 3\sqrt{n}) \le \exp(-C_0 n).$$

Next, we present a lemma which roughly asserts that for any two given vectors $u$ and $v$, the vectors $u$ and $Ev$ are, with high probability, almost orthogonal.

Lemma [Orthogonality lemma]. Let $E$ be a random Bernoulli matrix of size $n$. For any fixed unit vectors $u, v$ and positive number $t$,
$$\mathbf{P}(|u^T E v| \ge t) \le 2 \exp(-t^2 / 16).$$
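The orthogonality lemma, simulated (an added illustration; the fixed vectors `u`, `v` below are arbitrary choices, and the lemma's bound is far from tight for these parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, t = 400, 300, 6.0
u = np.ones(n) / np.sqrt(n)   # a fixed unit vector
v = np.zeros(n); v[0] = 1.0   # another fixed unit vector

hits = 0
for _ in range(trials):
    E = rng.choice([-1.0, 1.0], size=(n, n))
    if abs(u @ E @ v) >= t:   # u^T E v is a normalized sum of n Bernoullis
        hits += 1
print(hits / trials, 2 * np.exp(-t ** 2 / 16))  # empirical freq <= lemma bound
```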
23 Lemma [Main lemma]. For any constant $0 < \beta \le 1$ there is a constant $C$ such that the following holds. Assume that $A$ is such that $\sigma_1 \le n^{\beta^{-1}}$, and let $V := \operatorname{span}\{v_1, \dots, v_d\}$ for some $d = o(n / \log n)$. Then the following holds almost surely: for any unit vector $v \in V$,
$$\|(A + E)v\|^2 \le \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + C(n + \sigma_1 d \log n).$$
It is important that the statement holds for all unit $v \in V$ simultaneously.
24 It suffices to prove the bound for $v$ belonging to an $\epsilon$-net $N$ of the unit sphere $S$ in $V$, with $\epsilon := \frac{1}{n + \sigma_1}$. With such a small $\epsilon$, the error coming from the factor $(1 - \epsilon)$ is swallowed by the error term $O(n + \sigma_1 d \log n)$. Thanks to the upper bound on the net size, it suffices to show that if $C$ is large enough, then for any fixed $v \in N$,
$$\mathbf{P}\left( \|(A + E)v\|^2 \ge \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + C(n + \sigma_1 d \log n) \right) \le \exp(-2 C_1 d \log n).$$

Fix $v \in N$. We have
$$\|(A + E)v\|^2 = \|Av\|^2 + \|Ev\|^2 + 2(Av)^T (Ev) = \sum_{i=1}^{n} (v \cdot v_i)^2 \sigma_i^2 + \|Ev\|^2 + 2(Av)^T (Ev).$$
Now use the spectral norm lemma and the orthogonality lemma.
25 Let $u_i$ ($1 \le i \le n$) be the left singular vectors of the matrix $A$. First, we give a lower bound for $\sigma_1' := \|A + E\|$. By the minimax principle, we have
$$\sigma_1' = \|A + E\| \ge u_1^T (A + E) v_1 = \sigma_1 + u_1^T E v_1.$$
By the orthogonality lemma, with probability $1 - o(1)$,
$$|u_1^T E v_1| \le \log \log n.$$
(The choice of $\log \log n$ is not important. One can replace it by any function that tends slowly to infinity with $n$.) Thus, we have, with probability $1 - o(1)$, that
$$\|A + E\| \ge \sigma_1 - \log \log n. \quad (6)$$

Our main observation is that, with high probability, any $v$ that is far from $v_1$ would yield $\|(A + E)v\| < \sigma_1 - \log \log n$. Therefore, the first singular vector $v_1'$ of $A + E$ must be close to $v_1$.
26 Consider a unit vector $v$ and write it as
$$v = c_1 v_1 + c_2 v_2 + \cdots + c_r v_r + c_0 u, \quad (7)$$
where $u$ is a unit vector orthogonal to $H := \operatorname{span}\{v_1, \dots, v_r\}$ and $c_1^2 + \cdots + c_r^2 + c_0^2 = 1$. Recall that $r$ is the rank of $A$, so $Au = 0$. Setting $w := c_1 v_1 + \cdots + c_r v_r$ and using Cauchy-Schwarz, we have
$$\|(A + E)v\|^2 = \|(A + E)w + c_0 E u\|^2 \le \|(A + E)w\|^2 + 2 c_0 \|(A + E)w\| \, \|Eu\| + c_0^2 \|Eu\|^2 \le \left(1 + \frac{c_0^2}{4}\right) \|(A + E)w\|^2 + (4 + c_0^2) \|Eu\|^2.$$
27 By the spectral norm lemma, we have, with probability $1 - o(1)$, that $\|Eu\| \le 3\sqrt{n}$ for every unit vector $u$. Furthermore, by the main lemma, we have, with probability $1 - o(1)$,
$$\|(A + E)w\|^2 \le \sum_{i=1}^{r} (w \cdot v_i)^2 \sigma_i^2 + O(\sigma_1 r \log n + n)$$
for every vector $w \in H$ of length at most 1. Since
$$\sum_{i=1}^{r} (w \cdot v_i)^2 \sigma_i^2 = \sum_{i=1}^{r} c_i^2 \sigma_i^2 \le (1 - c_0^2) \sigma_1^2 - (1 - c_0^2 - c_1^2)(\sigma_1^2 - \sigma_2^2),$$
we can conclude that with probability $1 - o(1)$ the following holds: any unit vector $v$ written in the form above satisfies
$$\|(A + E)v\|^2 \le \left(1 + \frac{c_0^2}{4}\right) \left[ (1 - c_0^2) \sigma_1^2 - (1 - c_0^2 - c_1^2)(\sigma_1^2 - \sigma_2^2) \right] + O(\sigma_1 r \log n + n).$$
28 Set $v$ to be the first singular vector of $A + E$. By the lower bound (6) on $\|(A + E)v\|$,
$$\|(A + E)v\|^2 \ge (\sigma_1 - \log \log n)^2.$$
Combining this with the previous inequality, we get
$$(1 - c_1^2) \sigma_1 \delta + \frac{c_0^2}{4} \sigma_1^2 \le C(\sigma_1 r \log n + n).$$
From here we can get an upper bound on $1 - c_1^2$ after some manipulation.
29 Further directions of research. Improve bounds. Other models of random matrices. Limiting distributions. Data in low dimension.