Improved Bounds on the Dot Product under Random Projection and Random Sign Projection


1 Improved Bounds on the Dot Product under Random Projection and Random Sign Projection
Ata Kabán
School of Computer Science, The University of Birmingham, Birmingham B15 2TT, UK
http://www.cs.bham.ac.uk/~axk
KDD 2015, Sydney, August 2015.

2 Outline
- Introduction & motivation
- A Johnson-Lindenstrauss lemma (JLL) for the dot product without the union bound
- Corollaries & connections with previous results
- Numerical validation
- Application to bounding the generalisation error of compressive linear classifiers
- Conclusions and future work

3 Introduction
- The dot product is a key building block in data mining: classification, regression, retrieval, correlation-clustering, etc.
- Random projection (RP) is a universal dimensionality reduction method: independent of the data, computationally cheap, with low-distortion guarantees.
- The Johnson-Lindenstrauss lemma (JLL) for Euclidean distances is optimal, but for the dot product the guarantees have been looser; some suggested that obtuse angles may not be preserved.

4 Background: JLL for Euclidean distance
Theorem [Johnson-Lindenstrauss lemma]. Let $x, y \in \mathbb{R}^d$. Let $R \in M_{k \times d}$, $k < d$, be a random projection matrix with entries drawn i.i.d. from a 0-mean subgaussian distribution with parameter $\sigma^2$, and let $Rx, Ry \in \mathbb{R}^k$ be the images of $x, y$ under $R$. Then, $\forall \epsilon \in (0, 1)$:
$\Pr\{\|Rx - Ry\|^2 < (1 - \epsilon)\,\|x - y\|^2 k\sigma^2\} < \exp(-k\epsilon^2/8)$   (1)
$\Pr\{\|Rx - Ry\|^2 > (1 + \epsilon)\,\|x - y\|^2 k\sigma^2\} < \exp(-k\epsilon^2/8)$   (2)
An elementary constructive proof is in [Dasgupta & Gupta, 2002]. These bounds are known to be optimal [Larsen & Nelson, 2014].
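As a quick sanity check of (1)-(2), the following minimal numpy sketch (not part of the talk; the dimensions, the tolerance and the Gaussian choice of $R$ with $\sigma^2 = 1/k$ are illustrative assumptions) estimates both tail probabilities over repeated draws of $R$ and compares them with $\exp(-k\epsilon^2/8)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, eps, trials = 300, 50, 0.3, 2000
sigma2 = 1.0 / k                      # subgaussian parameter of each entry of R

x, y = rng.normal(size=d), rng.normal(size=d)
dist2 = np.sum((x - y) ** 2)

lower = upper = 0
for _ in range(trials):
    R = rng.normal(scale=np.sqrt(sigma2), size=(k, d))   # Gaussian RP matrix
    proj2 = np.sum((R @ (x - y)) ** 2)
    lower += proj2 < (1 - eps) * dist2 * k * sigma2
    upper += proj2 > (1 + eps) * dist2 * k * sigma2

bound = np.exp(-k * eps ** 2 / 8)
print(lower / trials, upper / trials, "<= bound:", bound)
```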

5 The quick & loose JLL for dot product
By the polarisation identity,
$(Rx)^T Ry = \frac{1}{4}\left(\|R(x + y)\|^2 - \|R(x - y)\|^2\right)$
Now, applying the JLL on both terms separately and applying the union bound yields:
$\Pr\{(Rx)^T Ry < x^T y\, k\sigma^2 - \epsilon k\sigma^2 \|x\|\,\|y\|\} < 2\exp(-k\epsilon^2/8)$
$\Pr\{(Rx)^T Ry > x^T y\, k\sigma^2 + \epsilon k\sigma^2 \|x\|\,\|y\|\} < 2\exp(-k\epsilon^2/8)$
Alternatively, writing $(Rx)^T Ry = \frac{1}{2}\left(\|Rx\|^2 + \|Ry\|^2 - \|R(x - y)\|^2\right)$ and taking a union bound over the three terms, we get factors of 3 in front of the exponential.
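The first identity is exact for any fixed $R$; a short numpy check (illustrative, with an arbitrary Gaussian $R$) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 300, 50
x, y = rng.normal(size=d), rng.normal(size=d)
R = rng.normal(scale=1 / np.sqrt(k), size=(k, d))  # any fixed matrix works here

lhs = (R @ x) @ (R @ y)
rhs = 0.25 * (np.sum((R @ (x + y)) ** 2) - np.sum((R @ (x - y)) ** 2))
assert np.isclose(lhs, rhs)  # the polarisation identity is exact, not probabilistic
```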

6 Can we improve the JLL for dot products?
The problems:
- Technical issue: the union bound.
- More fundamental issue: the ratio of the standard deviation of the projected dot product to the original dot product (the "coefficient of variation") is unbounded [Li et al. 2006].
- Other issue: some previous proofs were only applicable to acute angles [Shi et al, 2012]; obtuse angles have only been investigated empirically, which is inevitably based on limited numerical tests.

7 Results: Improved bounds for dot product
Theorem [Dot Product under Random Projection]. Let $x, y \in \mathbb{R}^d$. Let $R \in M_{k \times d}$, $k < d$, be a random projection matrix having i.i.d. 0-mean subgaussian entries with parameter $\sigma^2$, and let $Rx, Ry \in \mathbb{R}^k$ be the images of $x, y$ under $R$. Then, $\forall \epsilon \in (0, 1)$:
$\Pr\{(Rx)^T Ry < x^T y\, k\sigma^2 - \epsilon k\sigma^2 \|x\|\,\|y\|\} < \exp(-k\epsilon^2/8)$   (3)
$\Pr\{(Rx)^T Ry > x^T y\, k\sigma^2 + \epsilon k\sigma^2 \|x\|\,\|y\|\} < \exp(-k\epsilon^2/8)$   (4)
The proof uses elementary techniques: a standard Chernoff bound argument, but exploiting the convexity of the exponential function. The union bound is eliminated. (Details in the paper.)
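The following sketch (with the same illustrative assumptions as before: a Gaussian $R$ with $\sigma^2 = 1/k$ and arbitrary test vectors) estimates the two tail probabilities in (3)-(4); empirically they stay below $\exp(-k\epsilon^2/8)$, without the factor of 2 that the union-bound argument incurs.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, eps, trials = 300, 50, 0.3, 2000
sigma2 = 1.0 / k

x, y = rng.normal(size=d), rng.normal(size=d)
dot = x @ y
norms = np.linalg.norm(x) * np.linalg.norm(y)

low = high = 0
for _ in range(trials):
    R = rng.normal(scale=np.sqrt(sigma2), size=(k, d))
    p = (R @ x) @ (R @ y)                        # projected dot product
    low += p < dot * k * sigma2 - eps * k * sigma2 * norms
    high += p > dot * k * sigma2 + eps * k * sigma2 * norms

print(low / trials, high / trials, "bound:", np.exp(-k * eps ** 2 / 8))
```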

8 Corollaries (1): Clarifying the role of the angle
Corollary [Relative distortion bounds]. Denote by $\theta$ the angle between the vectors $x, y \in \mathbb{R}^d$. Then we have the following:
1. Relative distortion bound: Assume $x^T y \neq 0$. Then,
$\Pr\left\{\left|\frac{x^T R^T Ry}{x^T y} - k\sigma^2\right| > \epsilon\right\} < 2\exp\left(-\frac{k\,\epsilon^2 \cos^2(\theta)}{8(k\sigma^2)^2}\right)$   (5)
2. Multiplicative form of the relative distortion bound:
$\Pr\{x^T R^T Ry < x^T y\,(1 - \epsilon)k\sigma^2\} < \exp\left(-\frac{k}{8}\epsilon^2 \cos^2(\theta)\right)$   (6)
$\Pr\{x^T R^T Ry > x^T y\,(1 + \epsilon)k\sigma^2\} < \exp\left(-\frac{k}{8}\epsilon^2 \cos^2(\theta)\right)$   (7)

9 Observations from the Corollary
- The guarantees are the same for both obtuse and acute angles! Symmetric around orthogonal angles.
- Relation to the coefficient of variation [Li et al.]:
$\frac{\sqrt{\mathrm{Var}(x^T R^T Ry)}}{x^T y}$ (unbounded)   (8)
Computing this (case of Gaussian $R$),
$\frac{\sqrt{\mathrm{Var}(x^T R^T Ry)}}{x^T y\, k\sigma^2} = \sqrt{\frac{1}{k}\left(1 + \frac{1}{\cos^2(\theta)}\right)}$   (9)
we see that an unbounded coefficient of variation occurs only when $x$ and $y$ are perpendicular. Again, symmetric around orthogonal angles.
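A quick way to see (9) numerically: the sketch below (my own check, not from the paper; it assumes a Gaussian $R$ with $\sigma^2 = 1/k$ and builds two unit vectors at a prescribed angle) compares the empirical coefficient of variation of $x^T R^T Ry$ with the closed form $\sqrt{(1/k)(1 + 1/\cos^2\theta)}$, which follows from the standard Gaussian fourth-moment calculation. Angles near $\pi/2$ make both blow up.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, trials = 300, 50, 5000
theta = 2.0  # an obtuse angle in radians; values near pi/2 make the CV blow up

# two unit vectors in R^d at the prescribed angle theta
x = np.eye(d)[0]
y = np.cos(theta) * x + np.sin(theta) * np.eye(d)[1]

sigma = 1 / np.sqrt(k)  # sigma^2 = 1/k, so k*sigma^2 = 1 and E[x^T R^T R y] = x^T y
dots = np.empty(trials)
for t in range(trials):
    R = rng.normal(scale=sigma, size=(k, d))
    dots[t] = (R @ x) @ (R @ y)

cv_empirical = dots.std() / abs(x @ y)
cv_formula = np.sqrt((1 + 1 / np.cos(theta) ** 2) / k)
print(cv_empirical, cv_formula)
```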

10 Corollaries (2)
Corollary [Margin-type bounds and random sign projection]. Denote by $\theta$ the angle between the vectors $x, y \in \mathbb{R}^d$. Then,
1. Margin bound: Assume $x^T y \geq 0$. Then, for all $\rho$ s.t. $\rho < x^T y\, k\sigma^2$ and $\rho > (\cos(\theta) - 1)\,\|x\|\,\|y\|\, k\sigma^2$,
$\Pr\{x^T R^T Ry < \rho\} < \exp\left(-\frac{k}{8}\left(\cos(\theta) - \frac{\rho}{\|x\|\,\|y\|\, k\sigma^2}\right)^2\right)$   (10)
and for all $\rho$ s.t. $\rho > x^T y\, k\sigma^2$ and $\rho < (\cos(\theta) + 1)\,\|x\|\,\|y\|\, k\sigma^2$,
$\Pr\{x^T R^T Ry > \rho\} < \exp\left(-\frac{k}{8}\left(\frac{\rho}{\|x\|\,\|y\|\, k\sigma^2} - \cos(\theta)\right)^2\right)$   (11)

11
2. Dot product under random sign projection: Assume $x^T y \neq 0$. Then,
$\Pr\left\{\frac{x^T R^T Ry}{x^T y} < 0\right\} < \exp\left(-\frac{k}{8}\cos^2(\theta)\right)$   (12)
These forms of the bound, with $\rho > 0$, are useful for instance to bound the margin loss of compressive classifiers. Details to follow shortly.
The random sign projection bound was used before to bound the error of compressive classifiers under the 0-1 loss [Durrant & Kabán, ICML 13] in the case of Gaussian RP; here subgaussian RP is allowed.

12 Numerical validation
We compute empirical estimates of the following probabilities from 2000 independently drawn instances of the RP matrix. The target dimension varies from 1 to the original dimension $d = 300$.
Rejection probability for dot product preservation = probability that the relative distortion of the dot product after RP falls outside the allowed error tolerance $\epsilon$:
$1 - \Pr\left\{(1 - \epsilon) < \frac{(Rx)^T Ry}{x^T y} < (1 + \epsilon)\right\}$   (13)
The sign flipping probability:
$\Pr\left\{\frac{(Rx)^T Ry}{x^T y} < 0\right\}$   (14)
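A sketch of how such estimates can be produced (my own illustration, not the authors' code; it assumes Gaussian entries with $\sigma^2 = 1/k$, so $k\sigma^2 = 1$, and unit-norm vectors at a prescribed angle):

```python
import numpy as np

def rejection_and_flip(theta, k, d=300, eps=0.3, trials=2000, seed=0):
    """Empirical estimates of (13) and (14) for two unit vectors at angle theta."""
    rng = np.random.default_rng(seed)
    x = np.eye(d)[0]
    y = np.cos(theta) * x + np.sin(theta) * np.eye(d)[1]
    dot = x @ y
    ratios = np.empty(trials)
    for t in range(trials):
        R = rng.normal(scale=1 / np.sqrt(k), size=(k, d))   # sigma^2 = 1/k
        ratios[t] = ((R @ x) @ (R @ y)) / dot
    reject = np.mean((ratios <= 1 - eps) | (ratios >= 1 + eps))   # eq. (13)
    flip = np.mean(ratios < 0)                                    # eq. (14)
    return reject, flip

for theta in (np.pi / 3, 2 * np.pi / 3):   # an acute and an obtuse angle
    print(theta, rejection_and_flip(theta, k=50))
```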

13 Replicating the results in [Shi et al, ICML 12]. Left: two acute angles; Right: two obtuse angles.
Preservation of these obtuse angles does indeed look worse... but not because they are obtuse (see next slide!).

14 Now take angles symmetrical around $\pi/2$ and observe the opposite behaviour. This is why the previous result in [Shi et al, ICML 12] has been misleading. Left: two acute angles; Right: two obtuse angles.

15 Numerical validation: full picture. Left: empirical estimates of the rejection probability for dot product preservation; Right: our analytic upper bound. The error tolerance was set to $\epsilon = 0.3$. Darker means higher probability.

16 The same with $\epsilon = 0.1$. The bound matches the true behaviour: all of these probabilities are symmetric around the angles $\pi/2$ and $3\pi/2$ (i.e. orthogonal vectors before RP). Thus, the preservation of the dot product is identical for acute and obtuse angles at equal deviation from orthogonality.

17 Empirical estimates of the sign flipping probability vs. our analytic upper bound. Darker means higher probability.

18 An application in machine learning: Margin bound on compressive linear classification
Consider the hypothesis class of linear classifiers defined by a unit-length parameter vector:
$\mathcal{H} = \{x \mapsto h(x) = w^T x : w \in \mathbb{R}^d, \|w\|_2 = 1\}$   (15)
The parameters $w$ are estimated from a training set of size $N$: $\mathcal{T}^N = \{(x_n, y_n)\}_{n=1}^N$, where $(x_n, y_n) \sim_{\mathrm{i.i.d.}} \mathcal{D}$ over $\mathcal{X} \times \{-1, 1\}$, $\mathcal{X} \subseteq \mathbb{R}^d$.
We will work with the margin loss:
$\ell_\rho(u) = \begin{cases} 0 & \text{if } \rho \leq u \\ 1 - u/\rho & \text{if } u \in [0, \rho] \\ 1 & \text{if } u \leq 0 \end{cases}$   (16)
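For concreteness, here is a direct numpy transcription of the margin loss (16) (an illustrative helper, not code from the paper):

```python
import numpy as np

def margin_loss(u, rho):
    """Margin loss l_rho(u): 0 above the margin rho, 1 below 0, linear in between."""
    u = np.asarray(u, dtype=float)
    return np.clip(1.0 - u / rho, 0.0, 1.0)

print(margin_loss([-0.2, 0.0, 0.025, 0.05, 0.3], rho=0.05))
# -> [1.   1.   0.5  0.   0. ]
```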

19 We are interested in the case when $d$ is large and $N$ not proportionately so. Use a RP matrix $R \in M_{k \times d}$, $k < d$, with entries $R_{ij}$ drawn i.i.d. from a subgaussian distribution with parameter $1/k$.
Analogous definitions hold in the reduced $k$-dimensional space. The hypothesis class:
$\mathcal{H}_R = \{x \mapsto h_R(Rx) = w_R^T Rx : w_R \in \mathbb{R}^k, \|w_R\|_2 = 1\}$   (17)
where the parameters $w_R \in \mathbb{R}^k$ are estimated from $\mathcal{T}_R^N = \{(Rx_n, y_n)\}_{n=1}^N$ by minimising the empirical margin error:
$\hat{h}_R = \arg\min_{h_R \in \mathcal{H}_R} \frac{1}{N}\sum_{n=1}^N \ell_\rho\big(y_n\, h_R(Rx_n)\big)$   (18)
The quantity of our interest is the generalisation error of $\hat{h}_R$, as a random function of both $\mathcal{T}^N$ and $R$:
$\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathbf{1}\big(\hat{h}_R(Rx) \neq y\big)\big]$   (19)
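A minimal end-to-end sketch of the compressive pipeline (my own illustration under stated assumptions: synthetic Gaussian data, a Gaussian $R$ with entry variance $1/k$, and scikit-learn's LinearSVC as a convex hinge-loss surrogate for the empirical margin minimiser in (18), not the exact minimiser):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
d, k, N = 1000, 50, 200

# synthetic, roughly linearly separable data in high dimension
w_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = np.sign(X @ w_true + 0.1 * rng.normal(size=N))

R = rng.normal(scale=1 / np.sqrt(k), size=(k, d))   # subgaussian (Gaussian) RP, variance 1/k
X_compressed = X @ R.T                              # each row is R x_n

clf = LinearSVC()                                   # hinge-loss surrogate, trained in R^k
clf.fit(X_compressed, y)

X_test = rng.normal(size=(2000, d))
y_test = np.sign(X_test @ w_true)
print("compressive test accuracy:", clf.score(X_test @ R.T, y_test))
```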

20 Theorem. Let $R$ be a $k \times d$, $k < d$, matrix having i.i.d. 0-mean subgaussian entries with parameter $1/k$, and let $\mathcal{T}_R^N = \{(Rx_n, y_n)\}_{n=1}^N$ be a compressed training set, where the $(x_n, y_n)$ are drawn i.i.d. from some distribution $\mathcal{D}$. For any $\delta \in (0, 1)$, the following holds with probability at least $1 - 3\delta$ for the empirical minimiser of the margin loss in the RP space, $\hat{h}_R$, uniformly for any margin parameter $\rho \in (0, 1)$:
$\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathbf{1}\big(\hat{h}_R(Rx) \neq y\big)\big] \leq \min_{h \in \mathcal{H}}\left\{\frac{1}{N}\sum_{n=1}^N \mathbf{1}\big(h(x_n)y_n < \rho\big) + S_k + \sqrt{\frac{3\log(1/\delta)\, S_k}{N}}\right\} + \frac{4}{\rho N}\sqrt{\left(1 + \sqrt{\frac{8\log(1/\delta)}{k}}\right)\mathrm{Tr}(XX^T)} + \sqrt{\frac{\log\log_2(2/\rho)}{N}} + 3\sqrt{\frac{\log(4/\delta)}{2N}}$
where $\theta_n$ is the angle between the parameter vector of $h$ and the vector $x_n y_n$, the function $\mathbf{1}(\cdot)$ takes value 1 if its argument is true and 0 otherwise, $X$ is the $N \times d$ matrix that holds the input points, and
$S_k = \frac{1}{N}\sum_{n=1}^N \mathbf{1}\big(h(x_n)y_n \geq \rho\big) \exp\left(-\frac{k}{8}\left(\cos(\theta_n) - \frac{\rho}{\|x_n\|\left(1 + \sqrt{\frac{8\log(1/\delta)}{k}}\right)}\right)^2\right)$

21 Illustration of the bound
Illustration of the predictive behaviour of the bound ($\delta = 0.1$ and $\rho = 0.05$) on the Advert classification data set from the UCI repository ($d = 1554$ features and $N = 3279$ points). The empirical error was estimated on holdout sets using SVM with default settings and 30 random splits of the data (in proportion 2/3 training & 1/3 testing). We standardised the data first, and scaled it so that $\max_{n \in \{1,\dots,N\}} \|x_n\| = 1$.

22 Conclusions & future work
- We proved new bounds on the dot product under random projection that take the same form as the optimal bounds on the Euclidean distance in the Johnson-Lindenstrauss lemma. The dot product is ubiquitous in data mining, and the use of RP on this operation is now better justified.
- We cleared up the controversy about the preservation of obtuse angles and clarified the precise role of the angle in the relative distortion of the dot product under random projection.
- We further discussed connections with the notion of margin in generalisation theory, and our connections with random sign projections generalise earlier results.
- Our proof technique applies to any subgaussian RP matrix with i.i.d. entries. In future work it would be of interest to see whether it can be adapted to fast JL transforms, whose entries are not i.i.d.

23 Selected References
[Achlioptas] D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4).
[Balcan & Blum] M.F. Balcan, A. Blum, S. Vempala. Kernels as features: On kernels, margins, and low-dimensional mappings. Machine Learning, 65(1), 79-94.
[Bingham & Mannila] E. Bingham and H. Mannila. Random projection in dimensionality reduction: Applications to image and text data. In Knowledge Discovery and Data Mining (KDD), ACM Press.
[Buldygin & Kozachenko] V.V. Buldygin, Y.V. Kozachenko. Metric Characterization of Random Variables and Random Processes. American Mathematical Society.
[Dasgupta & Gupta] S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Random Structures & Algorithms, 22:60-65.
[Durrant & Kabán] R.J. Durrant, A. Kabán. Sharp generalization error bounds for randomly-projected classifiers. ICML '13, Journal of Machine Learning Research - Proceedings Track 28(3).
[Larsen & Nelson] K.G. Larsen, J. Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint.
[Li et al.] P. Li, T. Hastie, K. Church. Improving random projections using marginal information. In Proc. Conference on Learning Theory (COLT) 4005.
[Shi et al.] Q. Shi, C. Shen, R. Hill, A. van den Hengel. Is margin preserved after random projection? Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
