Supremum of simple stochastic processes
- Nathan Powers
- 5 years ago
Transcription
Subspace embeddings
Daniel Hsu, COMS

Supremum of simple stochastic processes
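
Recap: JL lemma

JL lemma. For any $\varepsilon \in (0, 1/2)$, point set $S \subset \mathbb{R}^d$ of cardinality $|S| = n$, and $k \in \mathbb{N}$ such that $k \ge \frac{16 \ln n}{\varepsilon^2}$, there exists a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ such that
$$(1-\varepsilon)\,\|x-y\|_2^2 \;\le\; \|f(x)-f(y)\|_2^2 \;\le\; (1+\varepsilon)\,\|x-y\|_2^2 \quad \text{for all } x, y \in S.$$

Main probabilistic lemma. There exists a random linear map $M : \mathbb{R}^d \to \mathbb{R}^k$ such that, for any $u \in S^{d-1}$,
$$\mathbb{P}\Big( \big|\, \|Mu\|_2^2 - 1 \,\big| > \varepsilon \Big) \le 2\exp\big(-\Omega(k\varepsilon^2)\big).$$

The JL lemma is a consequence of the main probabilistic lemma as applied to a collection $T \subseteq S^{d-1}$ of $|T| = \binom{n}{2}$ unit vectors (plus a union bound):
$$\mathbb{P}\Big( \max_{u \in T} \big|\, \|Mu\|_2^2 - 1 \,\big| > \varepsilon \Big) \le |T| \cdot 2\exp\big(-\Omega(k\varepsilon^2)\big).$$

Related question

For $T \subseteq S^{d-1}$, what is the expected maximum deviation $\mathbb{E} \max_{u \in T} \big|\, \|Mu\|_2^2 - 1 \,\big|$?

General questions. For an arbitrary collection of zero-mean random variables $\{X_t : t \in T\}$: what is $\mathbb{E} \max_{t \in T} X_t$? What is $\mathbb{E} \max_{t \in T} |X_t|$?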
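
Finite collections

Let $\{X_t : t \in T\}$ be a finite collection of $v$-subgaussian, mean-zero random variables. Then
$$\mathbb{E} \max_{t \in T} X_t \le \sqrt{2v \ln |T|}.$$

This doesn't assume independence of $\{X_t : t \in T\}$. (The independent case is the worst.)

A bound on $\mathbb{E} \max_{t \in T} |X_t|$ follows as a corollary: apply the result to the collection $\{X_t : t \in T\} \cup \{-X_t : t \in T\}$.

Proof

The starting point is an identity from two invertible operations ($\lambda > 0$):
$$\mathbb{E} \max_{t \in T} X_t = \frac{1}{\lambda} \ln \exp\Big( \mathbb{E} \max_{t \in T} \lambda X_t \Big).$$

Apply Jensen's inequality:
$$\le \frac{1}{\lambda} \ln \mathbb{E} \exp\Big( \max_{t \in T} \lambda X_t \Big) = \frac{1}{\lambda} \ln \mathbb{E} \max_{t \in T} \exp(\lambda X_t).$$

Bound the max with a sum, and use linearity of expectation:
$$\le \frac{1}{\lambda} \ln \sum_{t \in T} \mathbb{E} \exp(\lambda X_t).$$

Exploit the $v$-subgaussian property:
$$\le \frac{1}{\lambda} \ln \sum_{t \in T} \exp\big( v\lambda^2/2 \big) = \frac{\ln |T|}{\lambda} + \frac{v\lambda}{2}.$$

Choose an appropriate $\lambda$ to conclude: $\lambda = \sqrt{2 \ln |T| / v}$ makes the two terms equal and gives $\sqrt{2v \ln |T|}$.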
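
As a rough numerical illustration of the finite-collection bound, the sketch below estimates $\mathbb{E} \max_{t \in T} X_t$ by Monte Carlo for independent $N(0, v)$ variables (which are $v$-subgaussian) and compares it with $\sqrt{2v \ln |T|}$. The function names, the trial count, and the choice of Gaussian variables are illustrative assumptions, not part of the slides.

```python
import math
import random

def expected_max_gaussian(num_vars, v, trials, rng):
    """Monte Carlo estimate of E max_t X_t for independent X_t ~ N(0, v)."""
    sigma = math.sqrt(v)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, sigma) for _ in range(num_vars))
    return total / trials

def finite_max_bound(num_vars, v):
    """The bound sqrt(2 v ln|T|) for a collection of size |T| = num_vars."""
    return math.sqrt(2.0 * v * math.log(num_vars))

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
v = 1.0
for size in (10, 100, 1000):
    estimate = expected_max_gaussian(size, v, trials=500, rng=rng)
    # Columns: |T|, Monte Carlo estimate, theoretical upper bound.
    print(size, round(estimate, 3), round(finite_max_bound(size, v), 3))
```

The estimate should sit below the bound for each size, and the gap should stay moderate, which is consistent with the remark that the independent case is the worst.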

Alternative proof

Integrate the tail bound: for any non-negative random variable $Y$,
$$\mathbb{E}(Y) = \int_0^\infty \mathbb{P}(Y \ge y)\, dy.$$

For $Y := \max_{t \in T} |X_t|$, this gives the same result up to constants.

Infinite collections

For an infinite collection of zero-mean random variables $\{X_t : t \in T\}$: what is $\mathbb{E} \sup_{t \in T} X_t$?

In general, it can be infinite. To bound it, we must exploit correlations among the $X_t$.

E.g., in $\{ \|Mu\|_2^2 - 1 : u \in T \}$ for $T \subseteq S^{d-1}$, the random variables for $u$ and $u + \delta$, for small $\delta$, are highly correlated.
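
The tail-integration identity used in the alternative proof can be checked numerically. The sketch below is an illustration under assumptions of my own choosing: $Y = |g|$ for a standard Gaussian $g$, an empirical tail in place of the true one, and a left Riemann sum with a hypothetical step size.

```python
import math
import random

def empirical_mean(samples):
    return sum(samples) / len(samples)

def integrated_tail(samples, step):
    """Left Riemann sum of y -> P(Y >= y), using the empirical tail of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    total, y, idx = 0.0, 0.0, 0
    while idx < n:
        # Skip samples smaller than y; the remaining n - idx satisfy Y >= y.
        while idx < n and ordered[idx] < y:
            idx += 1
        total += (n - idx) / n * step
        y += step
    return total

rng = random.Random(0)
samples = [abs(rng.gauss(0.0, 1.0)) for _ in range(20000)]
# The two numbers should agree up to the discretization error (at most `step`).
print(round(empirical_mean(samples), 3), round(integrated_tail(samples, step=0.01), 3))
```

Both values should also be close to $\sqrt{2/\pi} \approx 0.798$, the mean of $|g|$.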
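
Convex hulls of linear functionals

Let $T \subset \mathbb{R}^d$ be a finite set of vectors, and let $X$ be a random vector in $\mathbb{R}^d$ such that $\langle w, X \rangle$ is $v$-subgaussian for every $w \in T$. Then
$$\mathbb{E} \max_{w \in \operatorname{conv}(T)} \langle w, X \rangle \le \sqrt{2v \ln |T|}.$$

Proof: Write $w \in \operatorname{conv}(T)$ as $w = \sum_{w' \in T} p_{w'} w'$ for some $p_{w'} \ge 0$ that sum to one. Observe that
$$\langle w, x \rangle = \sum_{w' \in T} p_{w'} \langle w', x \rangle \le \max_{w' \in T} \langle w', x \rangle.$$
So the max over $w \in \operatorname{conv}(T)$ is at most the max over $w \in T$. Conclude by applying the previous result for finite collections.

Euclidean norm

Let $X$ be a random vector such that $\langle u, X \rangle$ is $v$-subgaussian for every $u \in S^{d-1}$. Then
$$\mathbb{E} \|X\|_2 = \mathbb{E} \max_{u \in S^{d-1}} \langle u, X \rangle \le 2\sqrt{2v \ln 5^d} = O\big(\sqrt{vd}\big).$$

Key step of proof: For any $\varepsilon > 0$, there is a finite subset $N \subseteq S^{d-1}$ of cardinality $|N| \le (1 + 2/\varepsilon)^d$ such that, for every $u \in S^{d-1}$, there exists $u_0 \in N$ with $\|u - u_0\|_2 \le \varepsilon$.

Such a set $N$ is called an $\varepsilon$-net for $S^{d-1}$. We need a $1/2$-net, of cardinality at most $5^d$.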
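
Proof

Write $u \in S^{d-1}$ as $u = u_0 + \delta q$, where $u_0 \in N$, $q \in S^{d-1}$, $\delta \in [0, 1/2]$, so
$$\langle u, X \rangle = \langle u_0, X \rangle + \delta \langle q, X \rangle.$$

Observe that
$$\max_{u \in S^{d-1}} \langle u, X \rangle \le \max_{u_0 \in N} \langle u_0, X \rangle + \max_{\delta \in [0,1/2]} \max_{q \in S^{d-1}} \delta \langle q, X \rangle \le \max_{u_0 \in N} \langle u_0, X \rangle + \frac{1}{2} \max_{q \in S^{d-1}} \langle q, X \rangle.$$

So the max over $S^{d-1}$ is at most twice the max over $N$. Conclude by applying the previous result for finite collections.

ε-nets for unit sphere

There is an $\varepsilon$-net for $S^{d-1}$ of cardinality at most $(1 + 2/\varepsilon)^d$.

Proof: Repeatedly select points from $S^{d-1}$ so that each selected point has distance more than $\varepsilon$ from all previously selected points. Equivalently: repeatedly select points from $S^{d-1}$ as long as the balls of radius $\varepsilon/2$, centered at the selected points, are disjoint. (The process must eventually stop.)

When the process stops, every $u \in S^{d-1}$ is at distance at most $\varepsilon$ from the selected points. I.e., the selected points form an $\varepsilon$-net for $S^{d-1}$.

If we select $N$ points, then the $N$ balls of radius $\varepsilon/2$ are disjoint, and they are contained in a ball of radius $1 + \varepsilon/2$. So
$$N \cdot \operatorname{vol}\big((\varepsilon/2) B^d\big) \le \operatorname{vol}\big((1 + \varepsilon/2) B^d\big).$$
This implies $N \le (1 + 2/\varepsilon)^d$.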
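
The greedy selection in the ε-net proof can be imitated on a finite sample of the sphere. This is only a sketch under stated assumptions: the candidates are random points rather than all of $S^{d-1}$, so the output is an $\varepsilon$-separated set that covers the sample, not a certified net for the whole sphere. The cardinality bound $(1 + 2/\varepsilon)^d$ still applies to it, since the volume argument only uses separation. The function names, `d = 3`, and the sample size are illustrative choices.

```python
import math
import random

def random_unit_vector(d, rng):
    """Uniform point on S^{d-1}: normalize a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def greedy_separated_set(candidates, eps):
    """Keep a candidate only if it is more than eps from every kept point."""
    selected = []
    for u in candidates:
        if all(math.dist(u, s) > eps for s in selected):
            selected.append(u)
    return selected

rng = random.Random(0)
d, eps = 3, 0.5
candidates = [random_unit_vector(d, rng) for _ in range(2000)]
net = greedy_separated_set(candidates, eps)

# Net property restricted to the sample: every candidate is within eps of a kept point.
covered = all(min(math.dist(u, s) for s in net) <= eps for u in candidates)
print(len(net), (1 + 2 / eps) ** d, covered)
```

The first printed number should not exceed the second, which is the bound $5^3 = 125$ for a $1/2$-net in dimension 3.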

Remarks

All previous results also hold when the random variables are $(v, c)$-subexponential (possibly with $c > 0$), with a slightly different bound: e.g.,
$$\mathbb{E} \max_{t \in T} X_t \le \max\Big\{ \sqrt{2v \ln |T|},\; 2c \ln |T| \Big\}.$$

It is also easy to get probability tail bounds (rather than expectation bounds).

Subspace embeddings
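
Subspace JL lemma

Consider a $k \times d$ random matrix $M$ whose entries are iid $N(0, 1/k)$. For a subspace $W \subseteq \mathbb{R}^d$ of dimension $r$,
$$\mathbb{E} \max_{u \in S^{d-1} \cap W} \big|\, \|Mu\|_2^2 - 1 \,\big| \le O\Big( \sqrt{\frac{r}{k}} + \frac{r}{k} \Big).$$

The bound is at most $\varepsilon$ when $k \ge O\big(r/\varepsilon^2\big)$.

This implies the existence of a mapping $M : \mathbb{R}^d \to \mathbb{R}^k$ that approximately preserves all distances between points in $W$.

Proof of subspace JL lemma

Let the columns of $Q$ be an orthonormal basis (ONB) for $W$. Then
$$\max_{u \in S^{d-1} \cap W} \big|\, \|Mu\|_2^2 - 1 \,\big| = \max_{u \in S^{r-1}} \big| u^\top Q^\top (M^\top M - I) Q u \big| = \max_{u, v \in S^{r-1}} u^\top Q^\top (M^\top M - I) Q v.$$

Lemma. For any $u, v \in S^{r-1}$,
$$X_{u,v} := u^\top Q^\top (M^\top M - I) Q v$$
is $(O(1/k), O(1/k))$-subexponential.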

Proof of subspace JL lemma (continued)

For $u, v \in S^{r-1}$, $X_{u,v} := u^\top Q^\top (M^\top M - I) Q v$.

Let $N$ be a $1/4$-net for $S^{r-1}$. Write $u, v \in S^{r-1}$ as $u = u_0 + \varepsilon p$ and $v = v_0 + \delta q$, where $u_0, v_0 \in N$, $p, q \in S^{r-1}$, and $\varepsilon, \delta \in [0, 1/4]$, so
$$X_{u,v} = X_{u_0,v_0} + \varepsilon X_{p,v} + \delta X_{u_0,q}.$$

Therefore
$$\max_{u,v \in S^{r-1}} X_{u,v} \le \max_{u_0,v_0 \in N} X_{u_0,v_0} + \frac{1}{2} \max_{p,q \in S^{r-1}} X_{p,q},$$
which implies
$$\max_{u,v \in S^{r-1}} X_{u,v} \le 2 \max_{u_0,v_0 \in N} X_{u_0,v_0}.$$

Conclude by applying the previous result for finite collections.

Application to least squares

Big data least squares

Input: matrix $A \in \mathbb{R}^{n \times d}$, vector $b \in \mathbb{R}^n$ ($n \gg d$).

Goal: find $x \in \mathbb{R}^d$ so as to (approximately) minimize $\|Ax - b\|_2^2$.

Computation time: $O(nd^2)$. Can we speed this up?

Simple approach

Pick $m \ll n$. Let $M$ be a random $m \times n$ matrix (e.g., entries iid $N(0, 1/m)$, or a Fast JL Transform).

Let $\tilde{A} := MA$ and $\tilde{b} := Mb$.

Obtain the solution $\hat{x}$ to the least squares problem on $(\tilde{A}, \tilde{b})$.
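
The simple approach above can be sketched in a few lines. This is a toy illustration, not a fast implementation: it uses a dense Gaussian $M$ (not the Fast JL Transform), pure-Python loops, and the normal equations, which is only reasonable for small $d$. The problem sizes, the synthetic data, and the names `x_true`, `solve_least_squares`, and `gaussian_sketch` are assumptions made for the example.

```python
import math
import random

def solve_least_squares(A, b):
    """Solve min_x ||Ax - b||_2^2 via the normal equations (fine for small d)."""
    d = len(A[0])
    # Augmented system [A^T A | A^T b].
    G = [[sum(row[i] * row[j] for row in A) for j in range(d)]
         + [sum(row[i] * bi for row, bi in zip(A, b))] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(G[r][c]))
        G[c], G[p] = G[p], G[c]
        for r in range(d):
            if r != c:
                f = G[r][c] / G[c][c]
                G[r] = [gr - f * gc for gr, gc in zip(G[r], G[c])]
    return [G[i][d] / G[i][i] for i in range(d)]

def residual_sq(A, b, x):
    """Squared residual ||Ax - b||_2^2."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))

def gaussian_sketch(A, b, m, rng):
    """Return (MA, Mb) for an m x n matrix M with iid N(0, 1/m) entries."""
    n, d = len(A), len(A[0])
    sigma = 1.0 / math.sqrt(m)
    MA, Mb = [], []
    for _ in range(m):
        row = [rng.gauss(0.0, sigma) for _ in range(n)]  # one row of M
        MA.append([sum(row[i] * A[i][j] for i in range(n)) for j in range(d)])
        Mb.append(sum(row[i] * b[i] for i in range(n)))
    return MA, Mb

rng = random.Random(0)
n, d, m = 1000, 3, 100
x_true = [1.0, -2.0, 0.5]  # hypothetical ground truth for the synthetic data
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
b = [sum(a * xi for a, xi in zip(row, x_true)) + rng.gauss(0.0, 1.0) for row in A]

x_star = solve_least_squares(A, b)                            # full problem
x_hat = solve_least_squares(*gaussian_sketch(A, b, m, rng))   # sketched problem
ratio = residual_sq(A, b, x_hat) / residual_sq(A, b, x_star)
print(round(ratio, 3))
```

The printed ratio compares the residual of the sketched solution with the optimal residual on the original problem; it is at least 1 by optimality of the full solution and should be close to 1 when $m$ is large relative to $d$.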

Simple (somewhat loose) analysis

Let $W$ be the subspace spanned by the columns of $A$ and $b$. Its dimension is at most $d + 1$.

If $m \ge O(d/\varepsilon^2)$, then $M$ is a subspace embedding for $W$:
$$(1-\varepsilon)\,\|x\|_2^2 \le \|Mx\|_2^2 \le (1+\varepsilon)\,\|x\|_2^2 \quad \text{for all } x \in W.$$

Let $x^\star := \arg\min_{x \in \mathbb{R}^d} \|Ax - b\|_2^2$. Then
$$\|A\hat{x} - b\|_2^2 \le \frac{1}{1-\varepsilon} \|M(A\hat{x} - b)\|_2^2 \le \frac{1}{1-\varepsilon} \|M(Ax^\star - b)\|_2^2 \le \frac{1+\varepsilon}{1-\varepsilon} \|Ax^\star - b\|_2^2.$$

Running time (using FJLT): $O\big((m + n)\, d \log n + md^2\big)$.

Another perspective: random sampling

Pick a random sample of $m \ll n$ rows of $(A, b)$; obtain the solution $\hat{x}$ for the least squares problem on the sample. Hope $\hat{x}$ is also good for the original problem.

In statistics, this is the random design setting for regression. The sample consists of covariates $\tilde{A} \in \mathbb{R}^{m \times d}$ and responses $\tilde{b} \in \mathbb{R}^m$ drawn from the full population $(A, b)$.

The least squares solution $\hat{x}$ on $(\tilde{A}, \tilde{b})$ is the MLE for the linear regression coefficients under a linear model with Gaussian noise.

One can also regard $\hat{x}$ as the empirical risk minimizer among all linear predictors under squared loss.
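
Simple random design analysis

Let $x^\star := \arg\min_{x \in \mathbb{R}^d} \|Ax - b\|_2^2$. With high probability over the choice of random sample,
$$\|A\hat{x} - b\|_2^2 \le \Big( 1 + O\Big(\frac{\kappa}{m}\Big) \Big) \|Ax^\star - b\|_2^2$$
(up to lower-order terms), where
$$\kappa := n \max_{i \in [n]} \big\| (A^\top A)^{-1/2} A^\top e_i \big\|_2^2$$
and $e_i$ is the $i$-th coordinate basis vector.

Write the thin SVD of $A$ as $A = USV^\top$, where $U \in \mathbb{R}^{n \times d}$. Then
$$(A^\top A)^{-1/2} A^\top = (V S^2 V^\top)^{-1/2}\, V S U^\top = V U^\top.$$
So $\kappa = n \max_{i \in [n]} \|U^\top e_i\|_2^2$.

$\|U^\top e_i\|_2^2$ is the statistical leverage score for the $i$-th row of $A$: it measures how much influence the $i$-th row has on the least squares solution.

Statistical leverage

$i$-th statistical leverage score: $\ell_i := \|U^\top e_i\|_2^2$, where $U \in \mathbb{R}^{n \times d}$ is the matrix of left singular vectors of $A$.

Two extreme cases:
$$U = \begin{bmatrix} I_{d \times d} \\ 0_{(n-d) \times d} \end{bmatrix} \qquad\qquad U = \frac{1}{\sqrt{n}} \begin{bmatrix} H_n e_1 & H_n e_2 & \cdots & H_n e_d \end{bmatrix}$$
where $H_n$ is the $n \times n$ Hadamard matrix.

First case: the first $d$ rows are the only rows that matter; $n \max_{i \in [n]} \ell_i = n$.

Second case: all $n$ rows are equally important; $n \max_{i \in [n]} \ell_i = d$.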
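
The two extreme cases can be checked directly by computing $n \max_i \ell_i$ from the rows of $U$. The sketch below assumes the Sylvester construction for $H_n$ (so $n$ must be a power of 2) and uses small illustrative sizes; the helper names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix (n a power of 2)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def scaled_max_leverage(U):
    """n * max_i ||U^T e_i||_2^2, i.e. n times the largest squared row norm of U."""
    n = len(U)
    return n * max(sum(x * x for x in row) for row in U)

n, d = 16, 4
# Case 1: U = [I_d; 0], so only the first d rows carry any weight.
U_identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(n)]
# Case 2: first d columns of the Hadamard matrix, scaled by 1/sqrt(n).
H = hadamard(n)
U_hadamard = [[H[i][j] / n ** 0.5 for j in range(d)] for i in range(n)]

print(scaled_max_leverage(U_identity), scaled_max_leverage(U_hadamard))
```

Up to floating-point rounding, the two printed values equal $n$ and $d$ respectively, matching the two cases on the slide.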

Ensuring small statistical leverage

To ensure the situation is more like the second case, apply a random rotation (e.g., a randomized Hadamard transform) to $A$ and $b$.

This randomly mixes up the rows of $(A, b)$ so that no single row is (much) more important than another.

We get $n \max_{i \in [n]} \ell_i = O(d + \log n)$ with high probability.

To get a $1 + \varepsilon$ approximation ratio, i.e.,
$$\|A\hat{x} - b\|_2^2 \le (1 + \varepsilon)\, \|Ax^\star - b\|_2^2,$$
it suffices to have
$$m \ge O\Big( \frac{d + \log n}{\varepsilon} \Big).$$

Application to compressed sensing

Under-determined least squares

Input: matrix $A \in \mathbb{R}^{n \times d}$, vector $b \in \mathbb{R}^n$ ($n \ll d$).

Goal: find the sparsest $x \in \mathbb{R}^d$ so as to minimize $\|Ax - b\|_2^2$. NP-hard in general.

Suppose $b = A\bar{x}$ for some $\bar{x} \in \mathbb{R}^d$ with $\operatorname{nnz}(\bar{x}) \le k$. I.e., $\bar{x}$ is $k$-sparse.

Is $\bar{x}$ the (unique) sparsest solution? If so, how do we find it?

Null space property

Lemma. The null space of $A$ does not contain any non-zero $2k$-sparse vectors $\iff$ every $k$-sparse vector $\bar{x} \in \mathbb{R}^d$ is the unique $k$-sparse solution to $Ax = A\bar{x}$.

Proof.

($\Rightarrow$) Take any $k$-sparse vectors $x$ and $y$ with $Ax = Ay$. We want to show $x = y$. The vector $x - y$ is $2k$-sparse, and $A(x - y) = 0$. By assumption, the null space of $A$ does not contain any non-zero $2k$-sparse vectors. So $x - y = 0$, i.e., $x = y$.

($\Leftarrow$) Take any $2k$-sparse vector $z$ in the null space of $A$. We want to show $z = 0$. Write it as $z = x - y$ for some $k$-sparse vectors $x$ and $y$ with disjoint supports. Then $A(x - y) = 0$, and hence $x = y$ by assumption. But $x$ and $y$ have disjoint supports, so it must be that $x = y = 0$, so $z = 0$.

Null space property from subspace embeddings

If $A$ is an $n \times d$ random matrix with iid $N(0, 1)$ entries, then under what conditions is there no non-zero $2k$-sparse vector in its null space?

Want: for any non-zero $2k$-sparse vector $z$, $Az \ne 0$, i.e., $\|Az\|_2^2 > 0$.

Consider a particular choice $I \subseteq [d]$ of $|I| = 2k$ coordinates, and the corresponding subspace $W_I$ spanned by $\{e_i : i \in I\}$. Every $2k$-sparse $z$ is in $W_I$ for some $I$.

It is sufficient for $A$ to be a $1/2$-subspace embedding for $W_I$ for all $I$:
$$\frac{1}{2} \|z\|_2^2 \le \|Az\|_2^2 \le \frac{3}{2} \|z\|_2^2 \quad \text{for all } 2k\text{-sparse } z.$$

Null space property from subspace embeddings (continued)

Say $A$ fails for $I$ if it is not a $1/2$-subspace embedding for $W_I$.

Subspace JL lemma: $\mathbb{P}(A \text{ fails for } I) \le 2^{O(k)} \exp\big(-\Omega(n)\big)$.

Union bound over all choices of $I$ with $|I| = 2k$:
$$\mathbb{P}(A \text{ fails for some } I) \le \binom{d}{2k}\, 2^{O(k)} \exp\big(-\Omega(n)\big).$$

To ensure this is, say, at most $1/2$, we just need
$$n \ge O\Big( k + \log \binom{d}{2k} \Big) = O\big( k + k \log(d/k) \big).$$
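
The last step of the union bound replaces $\log \binom{d}{2k}$ by $O(k \log(d/k))$. A quick numerical check of that step, using the standard estimate $\binom{d}{2k} \le (ed/(2k))^{2k}$, might look as follows. The specific $(d, k)$ pairs and function names are illustrative assumptions.

```python
import math

def log_num_supports(d, k):
    """Natural log of the number of index sets I in [d] with |I| = 2k."""
    return math.log(math.comb(d, 2 * k))

def log_binomial_upper(d, k):
    """Standard estimate: ln C(d, 2k) <= 2k * ln(e * d / (2k))."""
    return 2 * k * math.log(math.e * d / (2 * k))

for d, k in [(1000, 5), (10**6, 50)]:
    # Columns: d, k, exact ln C(d, 2k), upper estimate of order k log(d/k).
    print(d, k, round(log_num_supports(d, k), 1), round(log_binomial_upper(d, k), 1))
```

In both rows the exact value should be below the estimate, which is what makes $n \ge O(k + k \log(d/k))$ sufficient for the union bound.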

Restricted isometry property

$(\ell, \delta)$-restricted isometry property (RIP):
$$(1-\delta)\,\|z\|_2^2 \le \|Az\|_2^2 \le (1+\delta)\,\|z\|_2^2 \quad \text{for all } \ell\text{-sparse } z.$$

Many algorithms can recover the unique sparsest solution under RIP (with $\ell = O(k)$ and $\delta = \Omega(1)$). E.g., basis pursuit, Lasso, orthogonal matching pursuit.
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationarxiv: v2 [math.pr] 15 Dec 2010
HOW CLOSE IS THE SAMPLE COVARIANCE MATRIX TO THE ACTUAL COVARIANCE MATRIX? arxiv:1004.3484v2 [math.pr] 15 Dec 2010 ROMAN VERSHYNIN Abstract. GivenaprobabilitydistributioninR n withgeneral(non-white) covariance,
More informationFast Dimension Reduction
Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationCompressed Sensing and Robust Recovery of Low Rank Matrices
Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech
More informationNotes on Gaussian processes and majorizing measures
Notes on Gaussian processes and majorizing measures James R. Lee 1 Gaussian processes Consider a Gaussian process {X t } for some index set T. This is a collection of jointly Gaussian random variables,
More informationSPARSE signal representations have gained popularity in recent
6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying
More informationFast Random Projections
Fast Random Projections Edo Liberty 1 September 18, 2007 1 Yale University, New Haven CT, supported by AFOSR and NGA (www.edoliberty.com) Advised by Steven Zucker. About This talk will survey a few random
More informationLinear Algebra- Final Exam Review
Linear Algebra- Final Exam Review. Let A be invertible. Show that, if v, v, v 3 are linearly independent vectors, so are Av, Av, Av 3. NOTE: It should be clear from your answer that you know the definition.
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More information25 Minimum bandwidth: Approximation via volume respecting embeddings
25 Minimum bandwidth: Approximation via volume respecting embeddings We continue the study of Volume respecting embeddings. In the last lecture, we motivated the use of volume respecting embeddings by
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works
CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The
More information