Sharp Generalization Error Bounds for Randomly-projected Classifiers

Sharp Generalization Error Bounds for Randomly-projected Classifiers. R.J. Durrant and A. Kabán, School of Computer Science, The University of Birmingham, Birmingham B15 2TT, UK. International Conference on Machine Learning (ICML 2013), Atlanta, June 2013.

Outline
- Introduction; Preliminaries
- Main result: generalisation bound for a generic linear classifier trained on randomly-projected data
- Main ingredient of the proof: flipping probability
- Geometric interpretation
- Corollaries
- Summary and future work

Introduction (1)
- High dimensional data is everywhere. Random projection (RP) is fast becoming a workhorse in high dimensional learning: it is computationally cheap and amenable to theoretical analysis.
- Little is known about its effect on the generalisation performance of a classifier, except for a few specific cases. Results so far cover only specific families of classifiers, and hold under constraints of some form on the data:
- The seminal paper (Arriaga & Vempala, 1999) on RP-perceptron relies on the Johnson-Lindenstrauss lemma, so the bound unnaturally loosens as the sample size increases.
- (Calderbank et al., 2009) on RP-SVM uses techniques from compressed sensing; the bound only holds under an assumption of sparse data.
- (Davenport et al., 2010) assumes spherical classes; (Durrant & Kabán, 2011) on RP-FLD assumes subgaussian classes.

Introduction (2)
- Our aim is to obtain a generalisation bound for a generic linear classifier trained by ERM on randomly projected data, making no restrictive assumptions other than that the original data are drawn i.i.d. from some unknown distribution $\mathcal{D}$.
- A nice idea appears in (Garg et al., 2002), where RP is used as an analytic tool to bound the error of classifiers trained in the original data space. The idea is to quantify the effect of RP by how it changes class label predictions for projected points relative to the predictions of the data space classifier. However, (Garg et al., 2002) derived a rather loose bound on this flipping probability, yielding a numerically trivial bound.
- Our main result is to derive the exact expression for the flipping probability, and to turn around the high-level approach of (Garg et al., 2002) to obtain nontrivial bounds for RP-classifiers. The ingredients of our proof then also feed back to improve the bound of (Garg et al., 2002).

Preliminaries
2-class classification. Training set $T_N = \{(x_i, y_i)\}_{i=1}^N$, with $(x_i, y_i) \overset{\text{i.i.d.}}{\sim} \mathcal{D}$ over $\mathbb{R}^d \times \{0,1\}$.
Let $\hat{h} \in \mathcal{H}$ be the ERM linear classifier, so $\hat{h} \in \mathbb{R}^d$. W.l.o.g. we take it to pass through the origin, and we can take all data to lie on $S^{d-1} \subset \mathbb{R}^d$ with $\|\hat{h}\| = 1$.
For an unlabelled query point $x_q$ the label returned by $\hat{h}$ is then $\hat{h}(x_q) = \mathbf{1}\{\hat{h}^T x_q > 0\}$, where $\mathbf{1}\{\cdot\}$ is the indicator function.
The generalisation error of $\hat{h}$ is defined as $E_{(x_q, y_q) \sim \mathcal{D}}[L(\hat{h}(x_q), y_q)]$, and we use the (0,1)-loss: $L_{(0,1)}(\hat{h}(x_q), y_q) = 0$ if $\hat{h}(x_q) = y_q$, and $1$ otherwise.
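To make the setup concrete, here is a minimal Python sketch (not part of the original slides) of the decision rule $\hat{h}(x_q) = \mathbf{1}\{\hat{h}^T x_q > 0\}$ and the empirical (0,1)-loss; the classifier h_hat is assumed to come from any ERM procedure, and the data here are purely synthetic.

```python
import numpy as np

def predict(h_hat, X):
    """Linear decision rule through the origin: label 1 iff h_hat^T x > 0."""
    return (X @ h_hat > 0).astype(int)

def empirical_error(h_hat, X, y):
    """Empirical (0,1)-loss of h_hat on the labelled sample (X, y)."""
    return np.mean(predict(h_hat, X) != y)

# Toy usage with arbitrary unit-norm data on S^{d-1} and an arbitrary unit-norm h_hat.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.integers(0, 2, size=100)
h_hat = rng.standard_normal(20)
h_hat /= np.linalg.norm(h_hat)
print(empirical_error(h_hat, X, y))
```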

Problem setting
- Consider the case when d is too large. There are many dimensionality reduction methods; RP is computationally cheap and non-adaptive.
- Random projection: take a random matrix $R \in M_{k \times d}$, $k < d$, with entries $r_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, and pre-multiply the data points with it: $T_N^R = \{(Rx_i, y_i)\}_{i=1}^N$. A minimal sketch of this projection step is given after this list.
- Side note: the $r_{ij}$ can be subgaussian in general, but in this work we exploit the rotation-invariance of the Gaussian.
- We are interested in the generalisation error of the ERM linear classifier trained on $T_N^R$ rather than $T_N$.
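A minimal NumPy sketch of the Gaussian random projection step described above (not part of the original slides; the choice sigma = 1 and the example shapes are arbitrary):

```python
import numpy as np

def random_project(X, k, sigma=1.0, rng=None):
    """Gaussian random projection: rows of X (d-dimensional) mapped to k dimensions.

    R is in M_{k x d} with i.i.d. N(0, sigma^2) entries; the projected points are the rows of X @ R.T.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    R = sigma * rng.standard_normal((k, d))
    return X @ R.T, R

# Toy usage: project a 1000-dimensional training set down to k = 25 dimensions.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 1000))
y = rng.integers(0, 2, size=200)
X_proj, R = random_project(X, k=25, rng=rng)
print(X_proj.shape)   # (200, 25): the projected training set T_N^R = {(R x_i, y_i)} keeps the labels y
```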

Denote the trained classifier by $\hat{h}_R \in \mathbb{R}^k$ (possibly not passing through the origin, but translation does not affect our proof technique). The label returned by $\hat{h}_R$ is therefore $\hat{h}_R(Rx_q) = \mathbf{1}\{\hat{h}_R^T Rx_q + b > 0\}$, where $b \in \mathbb{R}$. We want to estimate
$$E_{(x_q, y_q) \sim \mathcal{D}}\left[L_{(0,1)}(\hat{h}_R(Rx_q), y_q)\right] = \Pr_{(x_q, y_q) \sim \mathcal{D}}\{\hat{h}_R(Rx_q) \neq y_q\}$$
with high probability w.r.t. the random choice of $T_N$ and $R$.

Results

Theorem [Generalization error bound]
Let $T_N = \{(x_i, y_i) : x_i \in \mathbb{R}^d, y_i \in \{0,1\}\}_{i=1}^N$ be a training set, and let $\hat{h}$ be the linear ERM classifier estimated from $T_N$. Let $R \in M_{k \times d}$, $k < d$, be a RP matrix with entries $r_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. Denote by $T_N^R = \{(Rx_i, y_i)\}_{i=1}^N$ the RP of $T_N$, and let $\hat{h}_R$ be the linear classifier estimated from $T_N^R$. Then, for all $\delta \in (0,1]$, with probability at least $1 - 2\delta$:
$$\Pr_{x_q, y_q}\{\hat{h}_R(Rx_q) \neq y_q\} \leq \hat{E}(T_N, \hat{h}) + \frac{1}{N}\sum_{i=1}^N f_k(\theta_i) + \min\left\{\sqrt{\frac{3\log\frac{1}{\delta}}{N} \cdot \frac{1}{N}\sum_{i=1}^N f_k(\theta_i)},\; \frac{1-\delta}{\delta} \cdot \frac{1}{N}\sum_{i=1}^N f_k(\theta_i)\right\} + 2\sqrt{\frac{(k+1)\log\frac{2eN}{k+1} + \log\frac{1}{\delta}}{N}}$$
where $f_k(\theta_i) := \Pr_R\{\mathrm{sign}((R\hat{h})^T Rx_i) \neq \mathrm{sign}(\hat{h}^T x_i)\}$ is the flipping probability for $x_i$, with $\theta_i$ the principal angle between $\hat{h}$ and $x_i$, and $\hat{E}(T_N, \hat{h})$ is the empirical risk of the data space classifier.
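The following Python sketch (not part of the original slides) shows one way to evaluate the right-hand side of the bound as reconstructed above. The flipping probabilities $f_k(\theta_i)$ are estimated by Monte Carlo directly from their definition, over fresh draws of R; the exact closed form is given later in the talk. The data, the classifier h_hat and the placeholder empirical risk are all synthetic.

```python
import numpy as np

def flip_prob_mc(h_hat, x, k, n_trials=200, rng=None):
    """Monte Carlo estimate of f_k(theta_i) = Pr_R{ sign((R h)^T R x) != sign(h^T x) }."""
    rng = np.random.default_rng(rng)
    R = rng.standard_normal((n_trials, k, h_hat.shape[0]))   # sigma cancels inside sign(), so sigma = 1
    Rh, Rx = R @ h_hat, R @ x                                # shapes (n_trials, k)
    return np.mean(np.sign(np.sum(Rh * Rx, axis=1)) != np.sign(h_hat @ x))

def rp_generalisation_bound(emp_risk, mean_flip, k, N, delta):
    """RHS of the bound as reconstructed above: empirical risk of the data-space classifier
    + average flip probability + min(Chernoff-type, Markov-type deviation term) + VC term."""
    chernoff = np.sqrt(3 * np.log(1 / delta) / N * mean_flip)
    markov = (1 - delta) / delta * mean_flip
    vc = 2 * np.sqrt(((k + 1) * np.log(2 * np.e * N / (k + 1)) + np.log(1 / delta)) / N)
    return emp_risk + mean_flip + min(chernoff, markov) + vc

# Purely illustrative usage: synthetic unit-norm points and an arbitrary unit-norm h_hat.
# (For generic random data the angles theta_i are close to pi/2, so the flip term is large;
#  data with a margin w.r.t. h_hat makes it exponentially small, cf. Corollary 1 later.)
rng = np.random.default_rng(2)
d, k, N, delta = 100, 10, 2000, 0.05
X = rng.standard_normal((N, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
h_hat = rng.standard_normal(d)
h_hat /= np.linalg.norm(h_hat)
sub = X[rng.choice(N, size=200, replace=False)]              # subsample of points, for speed
mean_flip = np.mean([flip_prob_mc(h_hat, x, k, rng=rng) for x in sub])
print(rp_generalisation_bound(emp_risk=0.10, mean_flip=mean_flip, k=k, N=N, delta=delta))
```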

Proof (sketch)
For a fixed instance of R, from classical VC theory we have, for all $\delta \in (0,1)$, w.p. $1 - \delta$ over $T_N$:
$$\Pr_{x_q, y_q}\{\hat{h}_R(Rx_q) \neq y_q\} \leq \hat{E}(T_N^R, \hat{h}_R) + 2\sqrt{\frac{(k+1)\log(2eN/(k+1)) + \log(1/\delta)}{N}}$$
where $\hat{E}(T_N^R, \hat{h}_R) = \frac{1}{N}\sum_{i=1}^N \mathbf{1}\{\hat{h}_R(Rx_i) \neq y_i\}$ is the empirical error. We see RP reduces the complexity term but will increase the empirical error. We bound the latter further:
$$\hat{E}(T_N^R, \hat{h}_R) \leq \underbrace{\frac{1}{N}\sum_{i=1}^N \mathbf{1}\{\mathrm{sign}((R\hat{h})^T Rx_i) \neq \mathrm{sign}(\hat{h}^T x_i)\}}_{S} + \hat{E}(T_N, \hat{h})$$
Now, bound S from $E_R[S]$ w.h.p. w.r.t. the random choice of R. This is a dependent sum: each term depends on the same R. One can use the Markov inequality, or a Chernoff bound for dependent variables (Siegel, 1995) which is numerically tighter for small $\delta$, and take the min of these. The terms of $E_R[S]$ are what we call the flipping probabilities; we derive the exact form of these, which is the main technical ingredient.

Theorem [Flipping Probability]
Let $h, x \in \mathbb{R}^d$ be two vectors with angle $\theta \in [0, \pi/2]$ between them. Without loss of generality take $\|h\| = \|x\| = 1$. Let $R \in M_{k \times d}$, $k < d$, be a random projection matrix with entries $r_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, and let $Rh, Rx \in \mathbb{R}^k$ be the images of $h, x$ under R, with angular separation $\theta_R$.
1. Denote by $f_k(\theta)$ the flipping probability $f_k(\theta) := \Pr\{(Rh)^T Rx < 0 \mid h^T x > 0\}$. Then:
$$f_k(\theta) = \frac{\Gamma(k)}{(\Gamma(k/2))^2} \int_0^{\psi} \frac{z^{(k-2)/2}}{(1+z)^k}\, dz \qquad (1)$$
where $\psi = (1 - \cos(\theta))/(1 + \cos(\theta))$.

2. The expression above can be rewritten as the quotient of the surface area of a hyperspherical cap with an angle of 2θ by the surface area of the corresponding hypersphere, namely:
$$f_k(\theta) = \frac{\int_0^{\theta} \sin^{k-1}(\phi)\, d\phi}{\int_0^{\pi} \sin^{k-1}(\phi)\, d\phi} \qquad (2)$$
3. The flipping probability is monotonic decreasing as a function of k: fix $\theta \in [0, \pi/2]$; then $f_k(\theta) \geq f_{k+1}(\theta)$.
Proof is in the paper.
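A small numerical sketch (not part of the original slides) of the two formulas above: it evaluates Eq. (1) and Eq. (2) by quadrature, checks that they agree, and checks the monotone decrease in k.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def flip_prob_eq1(theta, k):
    """Eq. (1): Gamma(k)/Gamma(k/2)^2 * integral_0^psi z^{(k-2)/2} / (1+z)^k dz."""
    psi = (1 - np.cos(theta)) / (1 + np.cos(theta))
    const = np.exp(gammaln(k) - 2 * gammaln(k / 2))
    integral, _ = quad(lambda z: z ** ((k - 2) / 2) / (1 + z) ** k, 0, psi)
    return const * integral

def flip_prob_eq2(theta, k):
    """Eq. (2): ratio of the cap integral to the full-sphere integral of sin^{k-1}."""
    num, _ = quad(lambda p: np.sin(p) ** (k - 1), 0, theta)
    den, _ = quad(lambda p: np.sin(p) ** (k - 1), 0, np.pi)
    return num / den

theta = np.pi / 6
for k in (2, 3, 5, 10, 25):
    print(k, round(flip_prob_eq1(theta, k), 6), round(flip_prob_eq2(theta, k), 6))  # the two forms agree
# Monotone decrease in k: f_k(theta) >= f_{k+1}(theta).
vals = [flip_prob_eq2(theta, k) for k in range(1, 30)]
assert all(a >= b for a, b in zip(vals, vals[1:]))
```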

Flip Probability - Illustration
[Figure: theoretical flip probability Pr[flip] as a function of the angle θ (radians), plotted for several values of k.]

Flip Probability - d-invariance
[Figure: Monte Carlo trials showing flip probability invariance w.r.t. d. Here k = 5 and d ∈ {500, 450, ..., 250}.]
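A short Monte Carlo sketch (not part of the original slides) reproducing the kind of experiment the figure describes: for fixed k and a fixed angle θ, the empirical flip rate is essentially the same across different ambient dimensions d. The values of θ and the number of trials are arbitrary choices.

```python
import numpy as np

def mc_flip_rate(theta, k, d, n_trials=4000, rng=None):
    """Empirical flip rate for two unit vectors at angle theta under Gaussian RP to k dimensions."""
    rng = np.random.default_rng(rng)
    h = np.zeros(d); h[0] = 1.0                                 # unit vector h
    x = np.zeros(d); x[0], x[1] = np.cos(theta), np.sin(theta)  # unit vector at angle theta to h
    R = rng.standard_normal((n_trials, k, d))                   # n_trials independent RP matrices
    Rh, Rx = R @ h, R @ x                                       # shapes (n_trials, k)
    return np.mean(np.sum(Rh * Rx, axis=1) < 0)                 # fraction of sign flips

theta, k = np.pi / 3, 5
for d in (250, 300, 350, 400, 450, 500):
    print(d, mc_flip_rate(theta, k, d, rng=d))   # estimates are roughly equal for every d
```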

Geometric interpretation of Flipping Probability
The flip probability $f_k(\theta)$ recovers the known result for k = 1, namely $\theta/\pi$, as given in (Goemans & Williamson, 1995, Lemma 3.2). Geometrically, when k = 1 the flipping probability is the quotient of the length of the arc with angle 2θ by the circumference of the unit circle, which is 2π.
[Figure: the projection line and its normal, the vectors h and x, and the flip region on the unit circle.]

For generic k, our flip probability equals the ratio of the surface area in $\mathbb{R}^{k+1}$ of a hyperspherical cap with angle 2θ, namely $2\pi \prod_{i=1}^{k-2} \int_0^{\pi} \sin^i(\phi)\, d\phi \cdot \int_0^{\theta} \sin^{k-1}(\phi)\, d\phi$, to the surface area of the unit hypersphere, $2\pi \prod_{i=1}^{k-1} \int_0^{\pi} \sin^i(\phi)\, d\phi$. Hence our result gives a natural generalisation of the result in (Goemans & Williamson, 1995). Also, since this ratio is known to be upper bounded by $\exp(-\frac{1}{2} k \cos^2(\theta))$ (Ball, 1997), we have that the cost of working with randomly-projected data decays exponentially with k.
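A quick numerical check (not part of the original slides) of the two facts just mentioned: f_1(θ) = θ/π, and f_k(θ) ≤ exp(-½ k cos²(θ)); f_k is evaluated here by quadrature of Eq. (2).

```python
import numpy as np
from scipy.integrate import quad

def flip_prob(theta, k):
    """f_k(theta) via Eq. (2): ratio of the two sin^{k-1} integrals."""
    num, _ = quad(lambda p: np.sin(p) ** (k - 1), 0, theta)
    den, _ = quad(lambda p: np.sin(p) ** (k - 1), 0, np.pi)
    return num / den

thetas = np.linspace(0.05, np.pi / 2 - 0.05, 8)
# k = 1 recovers the Goemans-Williamson quantity theta / pi.
assert np.allclose([flip_prob(t, 1) for t in thetas], thetas / np.pi)
# Ball's bound: f_k(theta) <= exp(-k cos^2(theta) / 2), so the flip term decays exponentially in k.
for k in (1, 5, 10, 25):
    assert all(flip_prob(t, k) <= np.exp(-0.5 * k * np.cos(t) ** 2) for t in thetas)
print("checks passed")
```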

Two straightforward Corollaries
Corollary 1 [Upper Bound on Generalization Error for Data Separable with a Margin]
Ingredients: $f_k(\theta) \leq \exp(-\frac{1}{2} k \cos^2(\theta))$ (Ball, 1997, Lemma 2.2), and separability with margin m gives $\cos(\theta_i) \geq m$.

Corollary 1 [Upper Bound on Generalisation Error for Data Separable with a Margin]
If the conditions of the Theorem hold and the data classes are also separable with a margin, m, in the data space, then for all $\delta \in (0,1)$, with probability at least $1 - 2\delta$, we have:
$$\Pr_{x_q, y_q}\{\hat{h}_R(Rx_q) \neq y_q\} \leq \hat{E}(T_N, \hat{h}) + \exp\left(-\frac{1}{2}km^2\right) + \min\left\{\sqrt{\frac{3\log\frac{1}{\delta}}{N}}\,\exp\left(-\frac{1}{4}km^2\right),\; \frac{1-\delta}{\delta}\,\exp\left(-\frac{1}{2}km^2\right)\right\} + 2\sqrt{\frac{(k+1)\log\frac{2eN}{k+1} + \log\frac{1}{\delta}}{N}}$$
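For a feel of the numbers, a tiny script (not part of the original slides; the values of the empirical risk, m, N and δ are arbitrary) evaluating the margin-based bound as reconstructed above, as a function of the projection dimension k:

```python
import numpy as np

def margin_bound(emp_risk, m, k, N, delta):
    """Corollary 1 (as reconstructed above): data separable with margin m in the data space."""
    flip = np.exp(-0.5 * k * m ** 2)
    chernoff = np.sqrt(3 * np.log(1 / delta) / N) * np.exp(-0.25 * k * m ** 2)
    markov = (1 - delta) / delta * flip
    vc = 2 * np.sqrt(((k + 1) * np.log(2 * np.e * N / (k + 1)) + np.log(1 / delta)) / N)
    return emp_risk + flip + min(chernoff, markov) + vc

# emp_risk, m, N, delta below are arbitrary illustrative values.
for k in (10, 50, 100, 500):
    print(k, round(margin_bound(emp_risk=0.05, m=0.3, k=k, N=100000, delta=0.05), 4))
# The flip-related terms decay exponentially in k while the VC term grows only like sqrt(k log N / N),
# so the bound exhibits a trade-off in the choice of k.
```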

Corollary 2 [Upper Bound on Generalisation Error in Data Space]
Let $T_{2N} = \{(x_i, y_i)\}_{i=1}^{2N}$ be a set of d-dimensional labelled training examples drawn i.i.d. from some data distribution $\mathcal{D}$, and let $\hat{h}$ be a linear classifier estimated from $T_{2N}$ by ERM. Let $k \in \{1, 2, \ldots, d\}$ be an integer and let $R \in M_{k \times d}$ be a random projection matrix with entries $r_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. Then for all $\delta \in (0,1]$, with probability at least $1 - 4\delta$ w.r.t. the random draws of $T_{2N}$ and $R$, the generalisation error of $\hat{h}$ w.r.t. the (0,1)-loss is bounded above by:
$$\Pr_{x_q, y_q}\{\hat{h}(x_q) \neq y_q\} \leq \hat{E}(T_{2N}, \hat{h}) + 2\min_k\left\{\frac{1}{N}\sum_{i=1}^{2N} f_k(\theta_i) + \min\left\{\sqrt{\frac{3\log\frac{1}{\delta}}{N} \cdot \frac{1}{N}\sum_{i=1}^{2N} f_k(\theta_i)},\; \frac{1-\delta}{\delta} \cdot \frac{1}{N}\sum_{i=1}^{2N} f_k(\theta_i)\right\} + \sqrt{\frac{(k+1)\log\frac{2eN}{k+1} + \log\frac{1}{\delta}}{2N}}\right\}$$

Conclusions & Future work
- We derived the exact probability of label flipping as a result of Gaussian random projection, and used this to give sharp upper bounds on the generalisation error of a randomly-projected classifier.
- We require neither a large margin nor data sparsity for our bounds to hold, and we make no assumption on the data distribution.
- We saw that a margin implies low flipping probabilities; identifying other structures that imply low flipping probabilities is subject to future work.
- Our proof makes use of the orthogonal invariance of the standard Gaussian distribution, so it cannot be applied to other random matrices whose entry distribution is not orthogonally invariant: extension to other random matrices is subject to future work.
- Further tightening is possible if we replace the VC complexity term in our bounds with the better guarantees given in (Bartlett & Mendelson, 2002).

Selected References
Arriaga, R.I. and Vempala, S. An Algorithmic Theory of Learning: Robust Concepts and Random Projection. In 40th Annual Symposium on Foundations of Computer Science (FOCS 1999). IEEE, 1999.
Ball, K. An Elementary Introduction to Modern Convex Geometry. Flavors of Geometry, 31:1-58, 1997.
Bartlett, P.L. and Mendelson, S. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research, 3:463-482, 2002.
Calderbank, R., Jafarpour, S., and Schapire, R. Compressed Learning: Universal Sparse Dimensionality Reduction and Learning in the Measurement Domain. Technical Report, Rice University, 2009.
Durrant, R.J. and Kabán, A. Compressed Fisher Linear Discriminant Analysis: Classification of Randomly Projected Data. In Proc. 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2010), 2010.
Garg, A. and Roth, D. Margin Distribution and Learning Algorithms. In Proc. 20th International Conference on Machine Learning (ICML 2003), 2003.
Garg, A., Har-Peled, S., and Roth, D. On Generalization Bounds, Projection Profile, and Margin Distribution. In Proc. 19th International Conference on Machine Learning (ICML 2002), 2002.
Goemans, M.X. and Williamson, D.P. Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems using Semidefinite Programming. Journal of the ACM, 42(6):1115-1145, 1995.
