E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)


Covering Numbers, Pseudo-Dimension, and Fat-Shattering Dimension

Lecturer: Shivani Agarwal    Scribe: Shivani Agarwal

1 Introduction

So far we have seen how to obtain high-confidence bounds on the generalization error $\mathrm{er}^{0\text{-}1}_D[h_S]$ of a binary classifier $h_S$ learned by an algorithm from a function class $H \subseteq \{\pm 1\}^X$ of limited capacity, using the ideas of uniform convergence. We saw the use of the growth function $\Pi_H(m)$ to measure the capacity of the class $H$, as well as the VC-dimension $\mathrm{VCdim}(H)$, which provides a one-number summary of the capacity of $H$.

In this lecture we will consider the problem of regression, or learning of real-valued functions. Here we are given a training sample $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times Y)^m$ for some $Y \subseteq \mathbb{R}$, assumed now to be drawn from $D^m$ for some distribution $D$ on $X \times Y$, and the goal is to learn from this sample a real-valued function $f_S : X \to \mathbb{R}$ that has low generalization error $\mathrm{er}^\ell_D[f_S]$ w.r.t. some appropriate loss function $\ell : Y \times \mathbb{R} \to [0, \infty)$, such as the squared loss $\ell_{\mathrm{sq}}$ given by $\ell_{\mathrm{sq}}(y, \hat{y}) = (\hat{y} - y)^2$. As before, we will be interested in obtaining high-confidence bounds on the generalization error of the learned function, $\mathrm{er}^\ell_D[f_S]$. Again, we will consider learning algorithms that learn $f_S$ from a function class $F \subseteq \mathbb{R}^X$ of limited capacity, and obtain generalization error bounds for such algorithms via a uniform convergence result that upper bounds the probability $P_{S \sim D^m}\big( \sup_{f \in F} |\mathrm{er}^\ell_D[f] - \mathrm{er}^\ell_S[f]| \ge \epsilon \big)$. However, in order to derive such a result, we will need a different notion of capacity for a class of real-valued functions $F$. In particular, we will use the covering numbers of $F$, which will play a role analogous to that played by the growth function in the case of binary-valued function classes.

2 Covering Numbers

We start by considering covering numbers of subsets of a general metric space. We will then specialize this to subsets of Euclidean space, and use this to define covering numbers for a real-valued function class.

2.1 Covering Numbers in a General Metric Space

Let $(A, d)$ be a metric space.^1 Let $W \subseteq A$ and let $\epsilon > 0$. A set $C \subseteq W$ is said to be a proper $\epsilon$-cover of $W$ w.r.t. $d$ if for every $w \in W$, $\exists c \in C$ such that $d(w, c) < \epsilon$.^2 In other words, $C \subseteq W$ is an $\epsilon$-cover of $W$ w.r.t. $d$ if the union of open $d$-balls of radius $\epsilon$ centered at points in $C$ contains $W$:^3

$$W \subseteq \bigcup_{c \in C} B_d(c, \epsilon). \qquad (1)$$

If $W$ has a finite $\epsilon$-cover w.r.t. $d$, then we define the $\epsilon$-covering number of $W$ w.r.t. $d$ to be the cardinality of the smallest $\epsilon$-cover of $W$:

$$N(\epsilon, W, d) = \min\big\{ |C| : C \text{ is an } \epsilon\text{-cover of } W \text{ w.r.t. } d \big\}. \qquad (2)$$

If $W$ does not have a finite $\epsilon$-cover w.r.t. $d$, we take $N(\epsilon, W, d) = \infty$. Thus the covering numbers of $W$ can be viewed as measuring the extent of $W$ in $(A, d)$ at granularity or scale $\epsilon$.

^1 Recall that a metric space $(A, d)$ consists of a set $A$ together with a metric $d : A \times A \to [0, \infty)$ that satisfies the following for all $x, y, z \in A$: (1) $d(x, y) = 0 \iff x = y$; (2) $d(x, y) = d(y, x)$; and (3) $d(x, z) \le d(x, y) + d(y, z)$.
^2 Sometimes it is also convenient to consider improper covers $C \subseteq A$ which need not be contained in $W$.
^3 Recall that the open $d$-ball centered at $x \in A$ is defined as $B_d(x, \epsilon) = \{ y \in A : d(x, y) < \epsilon \}$.
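As a concrete illustration of these definitions (a sketch of mine, not part of the original notes; it assumes only NumPy), the following Python snippet greedily constructs a proper $\epsilon$-cover of a finite set $W \subseteq \mathbb{R}^2$ under the Euclidean metric; since the result is itself an $\epsilon$-cover, its size upper-bounds $N(\epsilon, W, d)$.

    import numpy as np

    def greedy_proper_cover(W, eps, d):
        """Greedily pick centers from W until every point of W lies in an
        open d-ball of radius eps around some chosen center. The result C
        is a proper eps-cover of W, so N(eps, W, d) <= len(C)."""
        C = []
        uncovered = list(range(len(W)))
        while uncovered:
            c = W[uncovered[0]]           # any uncovered point becomes a center
            C.append(c)
            uncovered = [j for j in uncovered if d(W[j], c) >= eps]
        return C

    rng = np.random.default_rng(0)
    W = rng.uniform(-1.0, 1.0, size=(500, 2))        # a finite subset of R^2
    d2 = lambda u, v: float(np.linalg.norm(u - v))   # Euclidean metric
    for eps in [0.5, 0.25, 0.1]:
        C = greedy_proper_cover(W, eps, d2)
        print(f"eps = {eps}: greedy cover of size {len(C)}")

The greedy centers are pairwise at distance at least $\epsilon$, so the same set is also an $\epsilon$-packing: an open ball of radius $\epsilon/2$ can contain at most one center, so any $(\epsilon/2)$-cover needs a separate ball per center, giving the matching lower bound $N(\epsilon/2, W, d) \ge |C|$.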

2.2 Covering Numbers in Euclidean Space

Consider now $A = \mathbb{R}^n$. We can define a number of different metrics on $\mathbb{R}^n$, including in particular the following:

$$d_1(x, x') = \frac{1}{n} \sum_{i=1}^n |x_i - x'_i| \qquad (3)$$

$$d_2(x, x') = \left( \frac{1}{n} \sum_{i=1}^n (x_i - x'_i)^2 \right)^{1/2} \qquad (4)$$

$$d_\infty(x, x') = \max_i |x_i - x'_i|. \qquad (5)$$

Accordingly, for any $W \subseteq \mathbb{R}^n$, we can define the corresponding covering numbers $N(\epsilon, W, d_p)$ for $p = 1, 2, \infty$. It is easy to see that $d_1(x, x') \le d_2(x, x')$ (by Jensen's inequality) and that $d_2(x, x') \le d_\infty(x, x')$, from which it follows that the corresponding covering numbers satisfy the relation

$$N(\epsilon, W, d_1) \le N(\epsilon, W, d_2) \le N(\epsilon, W, d_\infty). \qquad (6)$$
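The ordering behind (6) is easy to verify numerically. Here is a quick sanity check (again an illustration of mine, assuming NumPy) that $d_1 \le d_2 \le d_\infty$ on random pairs:

    import numpy as np

    # The three normalized metrics on R^n from equations (3)-(5).
    def d1(x, xp):    return np.mean(np.abs(x - xp))
    def d2(x, xp):    return np.sqrt(np.mean((x - xp) ** 2))
    def dinf(x, xp):  return np.max(np.abs(x - xp))

    rng = np.random.default_rng(1)
    for _ in range(10_000):
        x, xp = rng.normal(size=10), rng.normal(size=10)
        assert d1(x, xp) <= d2(x, xp) + 1e-12     # Jensen's inequality
        assert d2(x, xp) <= dinf(x, xp) + 1e-12   # mean of squares <= max square
    print("d1 <= d2 <= dinf held on 10,000 random pairs in R^10")

Since $d_2(x, x') < \epsilon$ implies $d_1(x, x') < \epsilon$, every open $d_2$-ball sits inside the $d_1$-ball with the same center and radius, so a $d_2$-cover is automatically a $d_1$-cover (and likewise for $d_\infty$ vs. $d_2$); this containment is exactly what relation (6) records.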

2.3 Uniform Covering Numbers for a Real-Valued Function Class

Now let $F$ be a class of real-valued functions on $X$, and let $x = (x_1, \ldots, x_m) \in X^m$. Then $F|_x = \{ (f(x_1), \ldots, f(x_m)) : f \in F \} \subseteq \mathbb{R}^m$. For any $\epsilon > 0$ and $m \in \mathbb{N}$, the uniform $d_p$ covering numbers of $F$ (for $p = 1, 2, \infty$) are defined as

$$N_p(\epsilon, F, m) = \max_{x \in X^m} N(\epsilon, F|_x, d_p) \qquad (7)$$

if $N(\epsilon, F|_x, d_p)$ is finite for all $x \in X^m$, and $N_p(\epsilon, F, m) = \infty$ otherwise. This should be compared with the definition of the growth function for a class of binary-valued functions $H$, which also involved a maximum over $x \in X^m$: in that case, $H|_x$ was finite, and the maximum was over the cardinality of $H|_x$; here, $F|_x$ may in general be infinite, and the maximum is over the extent of $F|_x$ in $\mathbb{R}^m$ at scale $\epsilon$, as measured using the metric $d_p$. In particular, for $H \subseteq \{\pm 1\}^X$, we have that for any $\epsilon < 1$, $N(\epsilon, H|_x, d_\infty) = |H|_x|$, and therefore $N_\infty(\epsilon, H, m) = \Pi_H(m)$. Thus the uniform covering numbers can be viewed as generalizing the notion of growth function to classes of real-valued functions.

Note that the term "uniform" here refers to the maximum over all $x \in X^m$ of the covering numbers of $F|_x$ in $\mathbb{R}^m$, and is unrelated to the use of the term "uniform" in uniform convergence, which refers to the supremum over functions $f \in F$. Unless otherwise stated, in what follows we will refer to the uniform covering numbers of a function class $F$ as simply the covering numbers of $F$.
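To see the growth-function connection concretely, here is a small sketch (my illustration, assuming NumPy) for the class $H$ of one-dimensional threshold classifiers $h_t(x) = \mathrm{sign}(x - t)$: any two distinct $\pm 1$ patterns are at $d_\infty$ distance 2, so for $\epsilon < 1$ no ball can cover two of them, and $N(\epsilon, H|_x, d_\infty) = |H|_x|$, which for thresholds on $m$ distinct points is $m + 1 = \Pi_H(m)$.

    import numpy as np

    def threshold_patterns(x):
        """Distinct +/-1 patterns realized on sample x by h_t(x) = sign(x - t),
        i.e. the restriction H|_x. One threshold below all points, one above,
        and one between each consecutive pair realize every pattern."""
        x = np.sort(x)
        ts = np.concatenate(([x[0] - 1.0], (x[:-1] + x[1:]) / 2, [x[-1] + 1.0]))
        return {tuple(np.where(x > t, 1, -1)) for t in ts}

    rng = np.random.default_rng(2)
    for m in [1, 5, 20]:
        Hx = threshold_patterns(rng.normal(size=m))
        # Distinct binary patterns lie at d_inf distance 2, so for eps < 1
        # the smallest eps-cover of H|_x is H|_x itself:
        print(f"m = {m}: N(eps, H|_x, d_inf) = |H|_x| = {len(Hx)} = m + 1")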

3 Uniform Convergence in a Real-Valued Function Class $F$

We can assume that functions in $F$ take values in some set $\hat{Y} \subseteq \mathbb{R}$, so that $F \subseteq \hat{Y}^X$ ($Y$ can be thought of as the space of true labels, and $\hat{Y}$ the space of predicted labels). We will require the loss function $\ell$ to be bounded, i.e. we will assume $\exists B > 0$ such that $0 \le \ell(y, \hat{y}) \le B$ $\forall y \in Y, \hat{y} \in \hat{Y}$. We will find it useful to define, for any function class $F \subseteq \hat{Y}^X$ and loss $\ell : Y \times \hat{Y} \to [0, B]$, the loss function class $\ell_F \subseteq [0, B]^{X \times Y}$ given by

$$\ell_F = \big\{ \ell_f : X \times Y \to [0, B] \;\big|\; \ell_f(x, y) = \ell(y, f(x)) \text{ for some } f \in F \big\}. \qquad (8)$$

We will first prove a uniform convergence result for general losses $\ell$ as above in terms of the $d_1$ covering numbers of the loss function class $\ell_F$, and will then show that for many losses $\ell$, including the squared loss when $Y$ and $\hat{Y}$ are bounded, the $d_1$ covering numbers of $\ell_F$ can further be bounded in terms of the $d_1$ covering numbers of $F$.

Theorem 3.1. Let $Y, \hat{Y} \subseteq \mathbb{R}$. Let $F \subseteq \hat{Y}^X$, and let $\ell : Y \times \hat{Y} \to [0, B]$. Let $D$ be any distribution on $X \times Y$. For any $\epsilon > 0$,

$$P_{S \sim D^m}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_D[f] - \mathrm{er}^\ell_S[f] \big| \ge \epsilon \Big) \le 4\, N_1(\epsilon/8, \ell_F, 2m)\, e^{-m\epsilon^2/(32B^2)}. \qquad (9)$$

Proof. The proof uses similar techniques as in the proof of uniform convergence for the 0-1 loss in the binary case that we saw in Lecture 3, and has the same broad steps. The key difference is in the reduction to a finite class (Step 3).

Step 1: Symmetrization. Following the same steps as in Lecture 3, we can show that for $m \ge 8B^2/\epsilon^2$,

$$P_{S \sim D^m}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_D[f] - \mathrm{er}^\ell_S[f] \big| \ge \epsilon \Big) \le 2\, P_{(S, \tilde{S}) \sim D^{2m}}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_S[f] - \mathrm{er}^\ell_{\tilde{S}}[f] \big| \ge \epsilon/2 \Big). \qquad (10)$$

Step 2: Swapping permutations. Again using the same argument as in Lecture 3, we can show that

$$P_{(S, \tilde{S}) \sim D^{2m}}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_S[f] - \mathrm{er}^\ell_{\tilde{S}}[f] \big| \ge \epsilon/2 \Big) \le \sup_{(S, \tilde{S}) \in (X \times Y)^{2m}} P_{\sigma \sim \Gamma_m}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_{\sigma(S)}[f] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[f] \big| \ge \epsilon/2 \Big), \qquad (11)$$

where, as in Lecture 3, $\Gamma_m$ denotes the uniform distribution over the permutations of $[2m]$ that swap some subset of the pairs $(i, m+i)$.

Step 3: Reduction to a finite class. Fix any $(S, \tilde{S}) \in (X \times Y)^{2m}$, and for simplicity, define $(x_{m+i}, y_{m+i}) = (\tilde{x}_i, \tilde{y}_i)$ $\forall i \in [m]$. Now consider $\ell_F|_{(S, \tilde{S})} \subseteq [0, B]^{2m}$. Let $G \subseteq F$ be such that $\ell_G|_{(S, \tilde{S})}$ is an $(\epsilon/8)$-cover of $\ell_F|_{(S, \tilde{S})}$ w.r.t. $d_1$. Clearly, we can take $|G| = N(\epsilon/8, \ell_F|_{(S, \tilde{S})}, d_1) \le N_1(\epsilon/8, \ell_F, 2m)$; since $\ell_F$ maps to a bounded interval, this is a finite number. Now consider any $\sigma \in \Gamma_m$. We claim that if $\exists f \in F$ such that

$$\big| \mathrm{er}^\ell_{\sigma(S)}[f] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[f] \big| \ge \epsilon/2, \qquad (12)$$

then $\exists g \in G$ such that

$$\big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| \ge \epsilon/4. \qquad (13)$$

To see this, let $f \in F$ be such that (12) holds. Take any $g \in G$ for which

$$\frac{1}{2m} \sum_{i=1}^{2m} \big| \ell_f(x_i, y_i) - \ell_g(x_i, y_i) \big| < \epsilon/8. \qquad (14)$$

Such a $g$ exists since $\ell_G|_{(S, \tilde{S})}$ is an $(\epsilon/8)$-cover of $\ell_F|_{(S, \tilde{S})}$ w.r.t. $d_1$. We will show that $g$ satisfies (13). In particular, we have

$$\epsilon/2 \le \big| \mathrm{er}^\ell_{\sigma(S)}[f] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[f] \big| \qquad (15)$$
$$\le \big| \mathrm{er}^\ell_{\sigma(S)}[f] - \mathrm{er}^\ell_{\sigma(S)}[g] \big| + \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| + \big| \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[f] \big| \qquad (16\text{-}17)$$
$$\le \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| + \frac{1}{m} \sum_{i=1}^{m} \big| \ell_f(x_{\sigma(i)}, y_{\sigma(i)}) - \ell_g(x_{\sigma(i)}, y_{\sigma(i)}) \big| + \frac{1}{m} \sum_{i=m+1}^{2m} \big| \ell_f(x_{\sigma(i)}, y_{\sigma(i)}) - \ell_g(x_{\sigma(i)}, y_{\sigma(i)}) \big| \qquad (18)$$
$$= \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| + \frac{2}{2m} \sum_{i=1}^{2m} \big| \ell_f(x_i, y_i) - \ell_g(x_i, y_i) \big| \qquad (19)$$
$$< \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| + \epsilon/4 \qquad \text{by (14)}. \qquad (20)$$

The claim follows. Thus we have

$$P_{\sigma \sim \Gamma_m}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_{\sigma(S)}[f] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[f] \big| \ge \epsilon/2 \Big) \le P_{\sigma \sim \Gamma_m}\Big( \max_{g \in G} \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| \ge \epsilon/4 \Big) \qquad (21)$$
$$\le N_1(\epsilon/8, \ell_F, 2m) \max_{g \in G} P_{\sigma \sim \Gamma_m}\Big( \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| \ge \epsilon/4 \Big) \qquad \text{(by a union bound)}. \qquad (22)$$

Step 4: Hoeffding's inequality. As in Lecture 3, Hoeffding's inequality can now be used to show that for any $g \in G$,

$$P_{\sigma \sim \Gamma_m}\Big( \big| \mathrm{er}^\ell_{\sigma(S)}[g] - \mathrm{er}^\ell_{\sigma(\tilde{S})}[g] \big| \ge \epsilon/4 \Big) \qquad (23)$$
$$= P_{r}\Big( \frac{1}{m} \Big| \sum_{i=1}^{m} r_i \big( \ell_g(x_i, y_i) - \ell_g(\tilde{x}_i, \tilde{y}_i) \big) \Big| \ge \epsilon/4 \Big), \quad r_1, \ldots, r_m \text{ i.i.d. uniform } \{\pm 1\}, \qquad (24)$$
$$\le 2 e^{-m\epsilon^2/(32B^2)}. \qquad (25)$$

Putting everything together yields the desired result for $m \ge 8B^2/\epsilon^2$; for $m < 8B^2/\epsilon^2$, the result holds trivially, since the right-hand side of (9) then exceeds 1.
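The Hoeffding step is easy to probe by simulation. The sketch below (an illustration with assumed values $B = 1$, $m = 200$, $\epsilon = 0.5$, not from the original notes) fixes loss differences $a_i = \ell_g(x_i, y_i) - \ell_g(\tilde{x}_i, \tilde{y}_i) \in [-B, B]$, draws random signs $r$ as in (24), and compares the empirical tail probability with the bound $2 e^{-m\epsilon^2/(32B^2)}$ from (25).

    import numpy as np

    rng = np.random.default_rng(3)
    B, m, eps, trials = 1.0, 200, 0.5, 100_000

    a = rng.uniform(-B, B, size=m)                  # fixed loss differences
    r = rng.choice([-1.0, 1.0], size=(trials, m))   # i.i.d. uniform signs
    deviations = np.abs(r @ a) / m                  # |(1/m) sum_i r_i a_i|

    empirical = np.mean(deviations >= eps / 4)
    bound = 2 * np.exp(-m * eps**2 / (32 * B**2))
    print(f"empirical tail {empirical:.5f} <= Hoeffding bound {bound:.5f}")

The empirical tail is typically far below the bound, since Hoeffding's inequality uses only the range of each term, not its variance.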

The above result yields a high-confidence bound on the generalization error of a function learned from $F$ in terms of covering numbers of $\ell_F$. For well-behaved loss functions $\ell$, these can be further bounded in terms of covering numbers of $F$:

Lemma 3.2. Let $Y, \hat{Y} \subseteq \mathbb{R}$. Let $F \subseteq \hat{Y}^X$, and let $\ell : Y \times \hat{Y} \to [0, B]$. If $\ell$ is Lipschitz in its second argument with Lipschitz constant $L > 0$, i.e.

$$|\ell(y, \hat{y}) - \ell(y, \hat{y}')| \le L |\hat{y} - \hat{y}'| \quad \forall y \in Y,\ \hat{y}, \hat{y}' \in \hat{Y},$$

then for any $\epsilon > 0$ and $m \in \mathbb{N}$,

$$N_1(\epsilon, \ell_F, m) \le N_1(\epsilon/L, F, m).$$

Proof. Let $S = ((x_1, y_1), \ldots, (x_m, y_m)) \in (X \times Y)^m$, and let $f, g \in F$. Then

$$\frac{1}{m} \sum_{i=1}^m |\ell_f(x_i, y_i) - \ell_g(x_i, y_i)| = \frac{1}{m} \sum_{i=1}^m |\ell(y_i, f(x_i)) - \ell(y_i, g(x_i))| \qquad (26)$$
$$\le L \cdot \frac{1}{m} \sum_{i=1}^m |f(x_i) - g(x_i)|. \qquad (27)$$

Thus any $d_1$ $(\epsilon/L)$-cover for $F|_x$ yields a $d_1$ $\epsilon$-cover for $\ell_F|_S$, which implies the result.

This yields the following corollary:

Corollary 3.3. Let $Y, \hat{Y} \subseteq \mathbb{R}$, $F \subseteq \hat{Y}^X$, and $\ell : Y \times \hat{Y} \to [0, B]$ such that $\ell$ is Lipschitz in its second argument with Lipschitz constant $L > 0$. Let $D$ be any distribution on $X \times Y$. Then for any $\epsilon > 0$:

$$P_{S \sim D^m}\Big( \sup_{f \in F} \big| \mathrm{er}^\ell_D[f] - \mathrm{er}^\ell_S[f] \big| \ge \epsilon \Big) \le 4\, N_1(\epsilon/(8L), F, 2m)\, e^{-m\epsilon^2/(32B^2)}. \qquad (28)$$

As an example, consider the squared loss $\ell_{\mathrm{sq}}(y, \hat{y}) = (\hat{y} - y)^2$. It can be shown that if $Y, \hat{Y}$ are bounded, then $\ell_{\mathrm{sq}}$ is bounded and Lipschitz. In particular, for $Y = \hat{Y} = [-1, 1]$, we have $0 \le \ell_{\mathrm{sq}}(y, \hat{y}) \le 4$ $\forall y \in Y, \hat{y} \in \hat{Y}$, and $\ell_{\mathrm{sq}}$ is Lipschitz with Lipschitz constant $L = 4$:

$$|\ell_{\mathrm{sq}}(y, \hat{y}) - \ell_{\mathrm{sq}}(y, \hat{y}')| = \big| (\hat{y} - y)^2 - (\hat{y}' - y)^2 \big| \qquad (29)$$
$$= \big| \hat{y}^2 - \hat{y}'^2 - 2y(\hat{y} - \hat{y}') \big| \qquad (30)$$
$$= \big| (\hat{y} + \hat{y}')(\hat{y} - \hat{y}') - 2y(\hat{y} - \hat{y}') \big| \qquad (31)$$
$$\le 4 |\hat{y} - \hat{y}'| \quad \text{since } y, \hat{y}, \hat{y}' \in [-1, 1]. \qquad (32)$$

Thus when both labels and predictions are in $[-1, 1]$, one gets for the squared loss:

Corollary 3.4. Let $Y = \hat{Y} = [-1, 1]$, $F \subseteq \hat{Y}^X$, and $\ell_{\mathrm{sq}} : Y \times \hat{Y} \to [0, 4]$ be given by $\ell_{\mathrm{sq}}(y, \hat{y}) = (\hat{y} - y)^2$. Let $D$ be any distribution on $X \times Y$. Then for any $\epsilon > 0$:

$$P_{S \sim D^m}\Big( \sup_{f \in F} \big| \mathrm{er}^{\mathrm{sq}}_D[f] - \mathrm{er}^{\mathrm{sq}}_S[f] \big| \ge \epsilon \Big) \le 4\, N_1(\epsilon/32, F, 2m)\, e^{-m\epsilon^2/512}. \qquad (33)$$

Exercise. Show that for $Y = \hat{Y} = [-1, 1]$, the absolute loss given by $\ell_{\mathrm{abs}}(y, \hat{y}) = |\hat{y} - y|$ $\forall y \in Y, \hat{y} \in \hat{Y}$ is bounded and is Lipschitz in its second argument with Lipschitz constant $L = 1$.
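Both Lipschitz constants are easy to check numerically. The following sketch (my illustration, assuming NumPy) samples random triples $(y, \hat{y}, \hat{y}')$ from $[-1, 1]$ and records the worst observed ratio $|\ell(y, \hat{y}) - \ell(y, \hat{y}')| / |\hat{y} - \hat{y}'|$, for the squared loss (bounded by $L = 4$, per (29)-(32)) and for the absolute loss of the exercise (bounded by $L = 1$).

    import numpy as np

    l_sq  = lambda y, yhat: (yhat - y) ** 2
    l_abs = lambda y, yhat: abs(yhat - y)

    rng = np.random.default_rng(4)
    worst_sq = worst_abs = 0.0
    for _ in range(100_000):
        y, yh1, yh2 = rng.uniform(-1.0, 1.0, size=3)
        gap = abs(yh1 - yh2)
        if gap < 1e-12:
            continue   # skip near-coincident predictions
        worst_sq  = max(worst_sq,  abs(l_sq(y, yh1)  - l_sq(y, yh2))  / gap)
        worst_abs = max(worst_abs, abs(l_abs(y, yh1) - l_abs(y, yh2)) / gap)
    print(f"worst ratios: squared {worst_sq:.3f} (<= 4), "
          f"absolute {worst_abs:.3f} (<= 1)")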

4 Pseudo-Dimension and Fat-Shattering Dimension

Just as the growth function $\Pi_H(m)$ needed to be sub-exponential in $m$ for the uniform convergence result in the binary classification case to be meaningful, the covering numbers $N_1(\epsilon/8, \ell_F, 2m)$ (or $N_1(\epsilon/(8L), F, 2m)$) need to be sub-exponential in $m$ for the above result to be meaningful. In the binary case, we saw that if the VC-dimension of $H$ is finite, then the growth function of $H$ grows polynomially in $m$. Analogous results can be shown to hold for the covering numbers. We provide some basic definitions and results here; further details can be found for example in [1].

Definition (Pseudo-dimension). Let $F \subseteq \mathbb{R}^X$ and let $x = (x_1, \ldots, x_m) \in X^m$. We say $x$ is pseudo-shattered by $F$ if $\exists r = (r_1, \ldots, r_m) \in \mathbb{R}^m$ such that $\forall b = (b_1, \ldots, b_m) \in \{-1, +1\}^m$, $\exists f_b \in F$ such that $\mathrm{sign}(f_b(x_i) - r_i) = b_i$ $\forall i \in [m]$. The pseudo-dimension of $F$ is the cardinality of the largest set of points in $X$ that can be pseudo-shattered by $F$:

$$\mathrm{Pdim}(F) = \max\big\{ m \in \mathbb{N} : \exists x \in X^m \text{ such that } x \text{ is pseudo-shattered by } F \big\}.$$

If $F$ pseudo-shatters arbitrarily large sets of points in $X$, we say $\mathrm{Pdim}(F) = \infty$.

Fact. If $F$ is a vector space of real-valued functions, then $\mathrm{Pdim}(F) = \dim(F)$.

For example, for the class of all affine functions over $\mathbb{R}^n$ given by $F = \{ f : \mathbb{R}^n \to \mathbb{R} \mid f(x) = w \cdot x + b \text{ for some } w \in \mathbb{R}^n, b \in \mathbb{R} \}$, we have $\mathrm{Pdim}(F) = n + 1$ (a small numerical illustration appears after the references). Clearly, if $F \subseteq F'$, then $\mathrm{Pdim}(F) \le \mathrm{Pdim}(F')$.

Definition (Fat-shattering dimension). Let $F \subseteq \mathbb{R}^X$ and let $x = (x_1, \ldots, x_m) \in X^m$. Let $\gamma > 0$. We say $x$ is $\gamma$-shattered by $F$ if $\exists r = (r_1, \ldots, r_m) \in \mathbb{R}^m$ such that $\forall b = (b_1, \ldots, b_m) \in \{-1, +1\}^m$, $\exists f_b \in F$ such that $b_i (f_b(x_i) - r_i) \ge \gamma$ $\forall i \in [m]$. The $\gamma$-dimension of $F$ (or the fat-shattering dimension of $F$ at scale $\gamma$) is the cardinality of the largest set of points in $X$ that can be $\gamma$-shattered by $F$:

$$\mathrm{fat}_F(\gamma) = \max\big\{ m \in \mathbb{N} : \exists x \in X^m \text{ such that } x \text{ is } \gamma\text{-shattered by } F \big\}.$$

If $F$ $\gamma$-shatters arbitrarily large sets of points in $X$, we say $\mathrm{fat}_F(\gamma) = \infty$.

Clearly, $\mathrm{fat}_F(\gamma) \le \mathrm{Pdim}(F)$ $\forall \gamma > 0$. The fat-shattering dimension is often called a scale-sensitive dimension since it depends on the scale $\gamma$. Both quantities, when finite, can be used to bound the covering numbers of a function class $F$ whose functions take values in a bounded range:

Theorem 4.1. Let $F \subseteq [a, b]^X$ for some $a \le b$. Let $0 < \epsilon \le b - a$, and let $\mathrm{fat}_F(\epsilon/8) = d < \infty$. Then for $m \ge d$,

$$N_1(\epsilon, F, m) = O\!\left( \left( \frac{m (b - a)^2}{\epsilon^2} \right)^{d \log\left( e m (b - a) / (d \epsilon) \right)} \right).$$

Theorem 4.2. Let $F \subseteq [a, b]^X$ for some $a \le b$. Let $\mathrm{Pdim}(F) = d < \infty$. Then for all $0 < \epsilon \le b - a$ and $m \ge d$,

$$N_1(\epsilon, F, m) = O\!\left( \left( \frac{b - a}{\epsilon} \right)^d \right).$$

5 Next Lecture

In the next lecture, we will return to binary classification, but will focus on learning functions of the form $h(x) = \mathrm{sign}(f(x))$ for some real-valued function $f$, and will see how the quantities considered in this lecture can be useful in such situations.

References

[1] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
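Illustration of the affine example from Section 4 (referenced there; a sketch of mine, assuming NumPy, not part of the original notes). For affine functions on $\mathbb{R}^2$ we have $\mathrm{Pdim}(F) = 3$, and indeed three generic points are pseudo-shattered with witnesses $r = 0$: for each sign vector $b$ we can solve $w \cdot x_i + c = b_i$ exactly, since the $3 \times 3$ matrix with rows $(x_i, 1)$ is invertible for points in general position.

    from itertools import product
    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(3, 2))             # three generic points in R^2
    A = np.hstack([X, np.ones((3, 1))])     # rows (x_i, 1); invertible a.s.

    for b in product([-1.0, 1.0], repeat=3):
        wc = np.linalg.solve(A, np.array(b))   # (w_1, w_2, c) with f(x_i) = b_i
        assert np.all(np.sign(A @ wc) == np.array(b))
    print("all 8 sign patterns realized: 3 points are pseudo-shattered")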
