Improved Bounds on the Dot Product under Random Projection and Random Sign Projection
1 Improved Bounds on the Dot Product under Random Projection and Random Sign Projection. Ata Kabán, School of Computer Science, The University of Birmingham, Birmingham B15 2TT, UK (axk). KDD 2015, Sydney, August 2015.
2 Outline
- Introduction & motivation
- A Johnson-Lindenstrauss lemma (JLL) for the dot product, without the union bound
- Corollaries & connections with previous results
- Numerical validation
- Application to bounding the generalisation error of compressive linear classifiers
- Conclusions and future work
3 Introduction
- The dot product is a key building block in data mining: classification, regression, retrieval, correlation clustering, etc.
- Random projection (RP) is a universal dimensionality reduction method: independent of the data, computationally cheap, with low-distortion guarantees.
- The Johnson-Lindenstrauss lemma (JLL) for Euclidean distances is optimal, but for the dot product the guarantees have been looser; some have suggested that obtuse angles may not be preserved.
4 Background: JLL for Euclidean distance

Theorem [Johnson-Lindenstrauss lemma]. Let $x, y \in \mathbb{R}^d$. Let $R \in M_{k\times d}$, $k < d$, be a random projection matrix with entries drawn i.i.d. from a 0-mean subgaussian distribution with parameter $\sigma^2$, and let $Rx, Ry \in \mathbb{R}^k$ be the images of $x, y$ under $R$. Then, $\forall \epsilon \in (0, 1)$:
$$\Pr\{\|Rx - Ry\|^2 < (1-\epsilon)\,\|x-y\|^2 k\sigma^2\} < \exp\left(-\frac{k\epsilon^2}{8}\right) \quad (1)$$
$$\Pr\{\|Rx - Ry\|^2 > (1+\epsilon)\,\|x-y\|^2 k\sigma^2\} < \exp\left(-\frac{k\epsilon^2}{8}\right) \quad (2)$$

An elementary constructive proof is in [Dasgupta & Gupta, 2002]. These bounds are known to be optimal [Larsen & Nelson, 2014].
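As an illustrative aside, the statement above is easy to probe numerically. The following minimal numpy sketch (the values of d, k, ε and the number of trials are arbitrary choices) draws subgaussian RP matrices — Gaussian or random-sign entries, both with entry variance σ² = 1/k so that kσ² = 1 — and estimates how often the squared distance falls outside the (1 ± ε)‖x − y‖²kσ² band, to compare with the exp(−kε²/8) bound.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, eps, trials = 300, 50, 0.3, 2000
x, y = rng.standard_normal(d), rng.standard_normal(d)
dist2 = np.sum((x - y) ** 2)      # ||x - y||^2
sigma2 = 1.0 / k                  # entry variance, so k * sigma2 = 1

def rp_matrix(kind):
    """Draw a k x d matrix with i.i.d. 0-mean subgaussian entries of variance 1/k."""
    if kind == "gaussian":
        return rng.standard_normal((k, d)) / np.sqrt(k)
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)   # random signs

for kind in ("gaussian", "sign"):
    proj2 = np.array([np.sum((rp_matrix(kind) @ (x - y)) ** 2) for _ in range(trials)])
    fail = np.mean((proj2 < (1 - eps) * dist2 * k * sigma2) |
                   (proj2 > (1 + eps) * dist2 * k * sigma2))
    print(kind, "empirical failure rate:", fail,
          " one-sided JLL bound:", np.exp(-k * eps ** 2 / 8))
```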
5 The quick & loose JLL for dot product

$$(Rx)^T Ry = \frac{1}{4}\left(\|R(x+y)\|^2 - \|R(x-y)\|^2\right)$$

Now, applying the JLL to both terms separately and applying the union bound yields:
$$\Pr\{(Rx)^T Ry < x^T y\, k\sigma^2 - \epsilon k\sigma^2 \|x\|\,\|y\|\} < 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$
$$\Pr\{(Rx)^T Ry > x^T y\, k\sigma^2 + \epsilon k\sigma^2 \|x\|\,\|y\|\} < 2\exp\left(-\frac{k\epsilon^2}{8}\right)$$

Or, $(Rx)^T Ry = \frac{1}{2}\left(\|Rx\|^2 + \|Ry\|^2 - \|R(x-y)\|^2\right)$ ... then we get factors of 3 in front of the exponential.
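A quick sanity check (illustrative; arbitrary d, k) of the polarization identity used above: it holds exactly for every matrix R, so the looseness in the resulting bound comes only from the union bound, not from the decomposition itself.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 300, 50
x, y = rng.standard_normal(d), rng.standard_normal(d)
R = rng.standard_normal((k, d)) / np.sqrt(k)

lhs = (R @ x) @ (R @ y)
rhs = 0.25 * (np.sum((R @ (x + y)) ** 2) - np.sum((R @ (x - y)) ** 2))
print(np.allclose(lhs, rhs))   # True: the identity is exact for any R, x, y
```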
6 Can we improve the JLL for dot products?

The problems:
- Technical issue: the union bound.
- More fundamental issue: the ratio of the standard deviation of the projected dot product to the original dot product (the "coefficient of variation") is unbounded [Li et al. 2006].
- Other issue: some previous proofs were only applicable to acute angles [Shi et al, 2012]; obtuse angles were investigated only empirically, which is inevitably based on limited numerical tests.
7 Results: Improved bounds for dot product

Theorem [Dot Product under Random Projection]. Let $x, y \in \mathbb{R}^d$. Let $R \in M_{k\times d}$, $k < d$, be a random projection matrix having i.i.d. 0-mean subgaussian entries with parameter $\sigma^2$, and let $Rx, Ry \in \mathbb{R}^k$ be the images of $x, y$ under $R$. Then, $\forall \epsilon \in (0, 1)$:
$$\Pr\{(Rx)^T Ry < x^T y\, k\sigma^2 - \epsilon k\sigma^2 \|x\|\,\|y\|\} < \exp\left(-\frac{k\epsilon^2}{8}\right) \quad (3)$$
$$\Pr\{(Rx)^T Ry > x^T y\, k\sigma^2 + \epsilon k\sigma^2 \|x\|\,\|y\|\} < \exp\left(-\frac{k\epsilon^2}{8}\right) \quad (4)$$

The proof uses elementary techniques: a standard Chernoff bounding argument, but exploiting the convexity of the exponential function. The union bound is eliminated. (Details in the paper.)
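A minimal Monte Carlo check of the lower-tail bound (3) — a sketch, not the authors' code; the pair x, y and the values of d, k, ε and the trial count are arbitrary — estimated over many independent Gaussian draws of R:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, eps, trials = 300, 100, 0.3, 5000
x, y = rng.standard_normal(d), rng.standard_normal(d)
sigma2 = 1.0 / k                                 # so that k * sigma2 = 1
# Threshold from Eq. (3): x^T y * k*sigma^2 - eps * k*sigma^2 * ||x|| ||y||
thresh = x @ y * k * sigma2 - eps * k * sigma2 * np.linalg.norm(x) * np.linalg.norm(y)

hits = 0
for _ in range(trials):
    R = rng.standard_normal((k, d)) * np.sqrt(sigma2)
    if (R @ x) @ (R @ y) < thresh:
        hits += 1
print("empirical lower-tail probability:", hits / trials,
      " bound exp(-k*eps^2/8):", np.exp(-k * eps ** 2 / 8))
```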
8 Corollaries (1): Clarifying the role of the angle

Corollary [Relative distortion bounds]. Denote by $\theta$ the angle between the vectors $x, y \in \mathbb{R}^d$. Then we have the following:

1. Relative distortion bound: Assume $x^T y \neq 0$. Then,
$$\Pr\left\{\left|\frac{x^T R^T R y}{x^T y} - k\sigma^2\right| > \epsilon\right\} < 2\exp\left(-\frac{k\,\epsilon^2\cos^2(\theta)}{8\,(k\sigma^2)^2}\right) \quad (5)$$

2. Multiplicative form of the relative distortion bound:
$$\Pr\{x^T R^T R y < x^T y\,(1-\epsilon)\,k\sigma^2\} < \exp\left(-\frac{k}{8}\,\epsilon^2\cos^2(\theta)\right) \quad (6)$$
$$\Pr\{x^T R^T R y > x^T y\,(1+\epsilon)\,k\sigma^2\} < \exp\left(-\frac{k}{8}\,\epsilon^2\cos^2(\theta)\right) \quad (7)$$
9 Observations from the Corollary

- Guarantees are the same for both obtuse and acute angles! Symmetric around orthogonal angles.
- Relation to the coefficient of variation [Li et al.]:
$$\frac{\sqrt{\mathrm{Var}(x^T R^T R y)}}{|x^T y|\,k\sigma^2} \quad \text{(unbounded)} \quad (8)$$
Computing this (case of Gaussian $R$),
$$\frac{\sqrt{\mathrm{Var}(x^T R^T R y)}}{|x^T y|\,k\sigma^2} = \sqrt{\frac{1}{k}\left(1 + \frac{1}{\cos^2(\theta)}\right)}, \quad (9)$$
we see that an unbounded coefficient of variation occurs only when $x$ and $y$ are perpendicular. Again, symmetric around orthogonal angles.
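Evaluating the Gaussian-R expression (9) as reconstructed above (a small illustrative sketch; k and the sample angles are arbitrary) shows the blow-up concentrated near orthogonality, symmetrically on both sides of π/2:

```python
import numpy as np

k = 50
angles = [0.1, 0.5, np.pi / 2 - 0.05, np.pi / 2 + 0.05, np.pi - 0.5, np.pi - 0.1]
for theta in angles:
    cv = np.sqrt((1.0 + 1.0 / np.cos(theta) ** 2) / k)   # Eq. (9), Gaussian R
    print(f"theta = {theta:.3f} rad   coefficient of variation = {cv:.3f}")
```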
10 Corollaries (2)

Corollary [Margin-type bounds and random sign projection]. Denote by $\theta$ the angle between the vectors $x, y \in \mathbb{R}^d$. Then,

1. Margin bound: Assume $x^T y \neq 0$. Then, for all $\rho$ s.t. $\rho < x^T y\,k\sigma^2$ and $\rho > (\cos(\theta) - 1)\,\|x\|\,\|y\|\,k\sigma^2$,
$$\Pr\{x^T R^T R y < \rho\} < \exp\left(-\frac{k}{8}\left(\cos(\theta) - \frac{\rho}{\|x\|\,\|y\|\,k\sigma^2}\right)^2\right) \quad (10)$$
and for all $\rho$ s.t. $\rho > x^T y\,k\sigma^2$ and $\rho < (\cos(\theta) + 1)\,\|x\|\,\|y\|\,k\sigma^2$,
$$\Pr\{x^T R^T R y > \rho\} < \exp\left(-\frac{k}{8}\left(\frac{\rho}{\|x\|\,\|y\|\,k\sigma^2} - \cos(\theta)\right)^2\right) \quad (11)$$
11
2. Dot product under random sign projection: Assume $x^T y \neq 0$. Then,
$$\Pr\left\{\frac{x^T R^T R y}{x^T y} < 0\right\} < \exp\left(-\frac{k}{8}\cos^2(\theta)\right) \quad (12)$$

These forms of the bound, with $\rho > 0$, are useful for instance to bound the margin loss of compressive classifiers. Details to follow shortly. The random sign projection bound was used before to bound the error of compressive classifiers under the 0-1 loss [Durrant & Kabán, ICML 13] in the case of Gaussian RP; here subgaussian RP is allowed.
12 Numerical validation

We will compute empirical estimates of the following probabilities, from 2000 independently drawn instances of the RP. The target dimension varies from 1 to the original dimension d = 300.

Rejection probability for dot product preservation = probability that the relative distortion of the dot product after RP falls outside the allowed error tolerance $\epsilon$:
$$1 - \Pr\left\{(1-\epsilon) < \frac{(Rx)^T Ry}{x^T y} < (1+\epsilon)\right\} \quad (13)$$

The sign-flipping probability:
$$\Pr\left\{\frac{(Rx)^T Ry}{x^T y} < 0\right\} \quad (14)$$
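A sketch of this estimation protocol (an illustrative re-implementation, not the authors' code; the fixed pair x, y, the Gaussian choice of R, and the grid of target dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, eps, n_rp = 300, 0.3, 2000
x, y = rng.standard_normal(d), rng.standard_normal(d)
xty = x @ y

for k in (1, 5, 25, 100, 300):
    ratios = np.empty(n_rp)
    for i in range(n_rp):                            # 2000 independent RP draws
        R = rng.standard_normal((k, d)) / np.sqrt(k)
        ratios[i] = ((R @ x) @ (R @ y)) / xty
    reject = np.mean((ratios < 1 - eps) | (ratios > 1 + eps))   # Eq. (13)
    flip = np.mean(ratios < 0)                                   # Eq. (14)
    print(f"k = {k:3d}   rejection prob = {reject:.3f}   sign-flip prob = {flip:.3f}")
```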
13 Replicating the results in [Shi et al, ICML 12]. Left: two acute angles; Right: two obtuse angles. Preservation of these obtuse angles does indeed look worse... but not because they are obtuse (see the next slide!).
14 Now take angles symmetrical around π/2 and observe the opposite behaviour. This is why the previous result in [Shi et al, ICML 12] has been misleading. Left: two acute angles; Right: two obtuse angles.
15 Numerical validation: the full picture. Left: empirical estimates of the rejection probability for dot product preservation; Right: our analytic upper bound. The error tolerance was set to $\epsilon = 0.3$. Darker means higher probability.
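The right-hand panel can be tabulated directly from the corollary: the analytic rejection-probability bound $2\exp(-k\epsilon^2\cos^2(\theta)/8)$ (taking $k\sigma^2 = 1$) over a grid of angles and target dimensions. A minimal sketch with arbitrary grid values:

```python
import numpy as np

eps = 0.3
thetas = np.linspace(0.0, np.pi, 7)          # angles from 0 to pi
ks = np.array([1, 25, 50, 100, 200, 300])    # target dimensions
# Bound on the two-sided rejection probability, Eq. (5) with k*sigma^2 = 1.
bound = 2.0 * np.exp(-np.outer(ks, eps ** 2 * np.cos(thetas) ** 2) / 8.0)
print(np.round(bound, 3))   # rows: k, columns: theta; symmetric about pi/2
```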
16 The same with $\epsilon = 0.1$. The bound matches the true behaviour: all of these probabilities are symmetric around the angles π/2 and 3π/2 (i.e. orthogonal vectors before RP). Thus, the preservation of the dot product is identical for acute and obtuse angles, symmetrically around orthogonality.
17 Empirical estimates of the sign-flipping probability vs. our analytic upper bound. Darker means higher probability.
18 An application in machine learning: Margin bound for compressive linear classification

Consider the hypothesis class of linear classifiers defined by a unit-length parameter vector:
$$\mathcal{H} = \{x \mapsto h(x) = w^T x : w \in \mathbb{R}^d, \|w\|_2 = 1\} \quad (15)$$

The parameters $w$ are estimated from a training set of size $N$: $T^N = \{(x_n, y_n)\}_{n=1}^N$, where $(x_n, y_n) \sim_{\text{i.i.d.}} \mathcal{D}$ over $\mathcal{X} \times \{-1, 1\}$, $\mathcal{X} \subseteq \mathbb{R}^d$.

We will work with the margin loss:
$$l_\rho(u) = \begin{cases} 0 & \text{if } \rho \le u \\ 1 - u/\rho & \text{if } u \in [0, \rho] \\ 1 & \text{if } u \le 0 \end{cases} \quad (16)$$
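For concreteness, the margin loss (16) can be written in a couple of lines (an illustrative sketch; the sample inputs and ρ are arbitrary):

```python
import numpy as np

def margin_loss(u, rho):
    """Ramp / margin loss l_rho(u) from Eq. (16), applied elementwise."""
    u = np.asarray(u, dtype=float)
    # 1 - u/rho clipped to [0, 1]: gives 0 for u >= rho, 1 for u <= 0, linear in between.
    return np.clip(1.0 - u / rho, 0.0, 1.0)

print(margin_loss([-0.2, 0.0, 0.025, 0.05, 1.0], rho=0.05))
```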
19 We are interested in the case when d is large and N not proportionately so.

Use an RP matrix $R \in M_{k\times d}$, $k < d$, with entries $R_{ij}$ drawn i.i.d. from a subgaussian distribution with parameter $1/k$. Analogous definitions hold in the reduced k-dimensional space. The hypothesis class:
$$\mathcal{H}_R = \{x \mapsto h_R(Rx) = w_R^T R x : w_R \in \mathbb{R}^k, \|w_R\|_2 = 1\} \quad (17)$$
where the parameters $w_R \in \mathbb{R}^k$ are estimated from $T_R^N = \{(Rx_n, y_n)\}_{n=1}^N$ by minimising the empirical margin error:
$$\hat{h}_R = \arg\min_{h_R \in \mathcal{H}_R} \frac{1}{N}\sum_{n=1}^N l_\rho\left(h_R(Rx_n), y_n\right) \quad (18)$$

The quantity of our interest is the generalisation error of $\hat{h}_R$, as a random function of both $T^N$ and $R$:
$$E_{(x,y)\sim\mathcal{D}}\left[\hat{h}_R(Rx) \neq y\right] \quad (19)$$
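A sketch of the compressive learning pipeline just described, on synthetic data with arbitrary d, k, N. A hinge-loss linear SVM stands in for the exact empirical margin-error minimiser (18) — that substitution, and the use of scikit-learn's LinearSVC, are assumptions made for illustration only.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
d, k, N = 1000, 50, 500

# Synthetic, roughly linearly separable data in the high-dimensional space.
w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)
X = rng.standard_normal((N, d))
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(N))

# Random projection with i.i.d. subgaussian entries of variance 1/k.
R = rng.standard_normal((k, d)) / np.sqrt(k)
X_compressed = X @ R.T                     # rows are R x_n

clf = LinearSVC().fit(X_compressed, y)     # surrogate for the minimiser in (18)
print("training accuracy in the k-dimensional RP space:",
      clf.score(X_compressed, y))
```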
20 Theorem. Let $R$ be a $k \times d$, $k < d$ matrix having i.i.d. 0-mean subgaussian entries with parameter $1/k$, and let $T_R^N = \{(Rx_n, y_n)\}_{n=1}^N$ be a compressed training set, where $(x_n, y_n)$ are drawn i.i.d. from some distribution $\mathcal{D}$. For any $\delta \in (0, 1)$, the following holds with probability at least $1 - 3\delta$ for the empirical minimiser of the margin loss in the RP space, $\hat{h}_R$, uniformly for any margin parameter $\rho \in (0, 1)$:
$$E_{(x,y)\sim\mathcal{D}}\left[\hat{h}_R(Rx) \neq y\right] \le \min_{h \in \mathcal{H}}\left\{\frac{1}{N}\sum_{n=1}^N \mathbf{1}\left(h(x_n)y_n < \rho\right) + S_k\right\} + \frac{4}{\rho}\sqrt{\frac{1}{N}}\left(1 + \sqrt{\frac{8\log(1/\delta)}{k}}\right)\sqrt{\frac{\mathrm{Tr}(XX^T)}{N}} + \sqrt{\frac{\log\log_2(2/\rho)}{N}} + 3\sqrt{\frac{\log(4/\delta)}{2N}}$$
where $\theta_n$ is the angle between the parameter vector of $h$ and the vector $x_n y_n$, the function $\mathbf{1}(\cdot)$ takes value 1 if its argument is true and 0 otherwise, $X$ is an $N \times d$ matrix that holds the input points, and
$$S_k = \frac{1}{N}\sum_{n=1}^N \mathbf{1}\left(h(x_n)y_n \ge \rho\right)\exp\left(-\frac{k}{8}\left(\cos(\theta_n) - \frac{\rho}{\|x_n\|\left(1 + \sqrt{\frac{8\log(1/\delta)}{k}}\right)}\right)^2\right).$$
21 Illustration of the bound

Illustration of the predictive behaviour of the bound (δ = 0.1 and ρ = 0.05) on the Advert classification data set from the UCI repository (d = 1554 features and N = 3279 points). The empirical error was estimated on holdout sets using an SVM with default settings and 30 random splits (in proportion 2/3 training & 1/3 testing) of the data. We standardised the data first, and scaled it so that $\max_{n\in\{1,\dots,N\}}\|x_n\| = 1$.
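The preprocessing and holdout protocol described above can be sketched as follows. The use of scikit-learn's SVC with default settings mirrors the slide's "SVM with default settings", but the exact toolchain is an assumption; the placeholder data is synthetic and would be replaced by the Advert set, which is not bundled here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def preprocess(X):
    """Standardise features, then scale so that the largest sample norm equals 1."""
    X = StandardScaler().fit_transform(X)
    return X / np.max(np.linalg.norm(X, axis=1))

def holdout_error(X, y, seed):
    """One 2/3 train / 1/3 test split, SVM with default settings."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1 / 3, random_state=seed)
    return 1.0 - SVC().fit(Xtr, ytr).score(Xte, yte)

# Placeholder data; substitute the Advert set (d = 1554, N = 3279) here.
rng = np.random.default_rng(5)
X = rng.standard_normal((200, 50))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(200))

Xp = preprocess(X)
errors = [holdout_error(Xp, y, s) for s in range(30)]   # 30 random splits
print("mean holdout error:", np.mean(errors))
```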
22 Conclusions & future work

- We proved new bounds on the dot product under random projection that take the same form as the optimal bounds on the Euclidean distance in the Johnson-Lindenstrauss lemma. The dot product is ubiquitous in data mining, and the use of RP for this operation is now better justified.
- We cleared up the controversy about the preservation of obtuse angles and clarified the precise role of the angle in the relative distortion of the dot product under random projection.
- We further discussed connections with the notion of margin in generalisation theory, and our connections with random sign projections generalise earlier results.
- Our proof technique applies to any subgaussian RP matrix with i.i.d. entries. In future work it would be of interest to see whether it could be adapted to Fast JL transforms, whose entries are not i.i.d.
23 Selected References

[Achlioptas] D. Achlioptas. Database-friendly Random Projections: Johnson-Lindenstrauss with Binary Coins. Journal of Computer and System Sciences, 66(4).
[Balcan & Blum] M.F. Balcan, A. Blum, S. Vempala. Kernels as features: On kernels, margins, and low-dimensional mappings. Machine Learning 65(1), 79-94.
[Bingham & Mannila] E. Bingham and H. Mannila. Random projection in dimensionality reduction: Applications to image and text data. In Knowledge Discovery and Data Mining (KDD), ACM Press.
[Buldygin & Kozachenko] V.V. Buldygin, Y.V. Kozachenko. Metric characterization of random variables and random processes. American Mathematical Society.
[Dasgupta & Gupta] S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss Lemma. Random Structures & Algorithms, 22:60-65.
[Durrant & Kabán] R.J. Durrant, A. Kabán. Sharp generalization error bounds for randomly-projected classifiers. ICML'13, Journal of Machine Learning Research - Proceedings Track 28(3).
[Larsen & Nelson] K.G. Larsen, J. Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint, 2014.
[Li et al.] P. Li, T. Hastie, K. Church. Improving random projections using marginal information. In Proc. Conference on Learning Theory (COLT), LNCS 4005.
[Shi et al.] Q. Shi, C. Shen, R. Hill, A. van den Hengel. Is margin preserved after random projection? Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.