1 Generalization bounds based on Rademacher complexity

COS 511: Theoretical Machine Learning    Lecturer: Rob Schapire
Lecture #10    Scribe: Suqi Liu    March 07, 2018

Last time we started proving this very general result about how quickly the empirical average converges to its expected value, not just for a single function or single hypothesis but for an entire family of functions or hypotheses. Our goal is to bound the generalization error in terms of the training error, showing that the training error converges to the generalization error, but we are proving it in a more general form, and we are deriving bounds in terms of the Rademacher complexity that we talked about last time. Today we are going to finish proving the general theorem and then go back to look at how to actually bound the Rademacher complexity and apply it to the setting we care about. We will look at various techniques for doing that and relate it to all the other complexity measures we talked about in class. Then we will move on to something completely different; that will be the end of the first part of this course.

1 Generalization bounds based on Rademacher complexity

Theorem 1. Let $\mathcal{F}$ be a family of functions $f : Z \to [0,1]$, and let $S = \langle z_1, \ldots, z_m \rangle$ be a random sample with the $z_i \sim \mathcal{D}$ i.i.d. Then with probability at least $1 - \delta$, for all $f \in \mathcal{F}$,
$$ E[f] \;\le\; \hat{E}_S[f] \;+\; 2\,\hat{R}_S(\mathcal{F}) \;+\; O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right), $$
and the same bound holds with the expected Rademacher complexity $R_m(\mathcal{F})$ in place of $\hat{R}_S(\mathcal{F})$. Here
$$ \hat{R}_S(\mathcal{F}) = E_\sigma\Big[\sup_{f \in \mathcal{F}} \frac{1}{m}\sum_{i=1}^m \sigma_i f(z_i)\Big] $$
is the empirical Rademacher complexity of the function class $\mathcal{F}$ on the sample $S$, and $R_m(\mathcal{F}) = E_S\big[\hat{R}_S(\mathcal{F})\big]$ is its expected value.

Proof. We first look at the function
$$ \Phi(S) = \sup_{f \in \mathcal{F}}\big( E[f] - \hat{E}_S[f] \big). $$

Step 1: $\Phi(S) \le E_S[\Phi(S)] + O\big(\sqrt{\ln(1/\delta)/m}\big)$ with probability at least $1 - \delta$.

Next we introduce the double-sample trick. Let $S = \langle z_1, \ldots, z_m \rangle$ be the real sample and $S' = \langle z'_1, \ldots, z'_m \rangle$ be the ghost sample, which has the same distribution as the original sample.

Step 2: $E_S[\Phi(S)] \le E_{S,S'}\big[\sup_{f \in \mathcal{F}}\big(\hat{E}_{S'}[f] - \hat{E}_S[f]\big)\big]$.

We replace the expected value with the empirical average on another, imaginary sample and end up with this nice symmetric expression for the difference between two empirical averages. The other trick that we used before is the idea of taking the data and randomly permuting it. We go through each pair and flip a coin: if the coin comes up heads, we swap the pair; otherwise we leave it alone. Written algorithmically: for $i = 1, \ldots, m$, with probability $1/2$, swap $z_i \leftrightarrow z'_i$; else do nothing.
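Before continuing with the swapping argument, here is a short justification of Step 2, which the notes state without proof (a sketch filling in the reasoning): since the ghost sample $S'$ is an independent copy of $S$, we have $E[f] = E_{S'}\big[\hat{E}_{S'}[f]\big]$ for every fixed $f$, and the supremum of an expectation is at most the expectation of the supremum, so
$$ \Phi(S) \;=\; \sup_{f\in\mathcal{F}} E_{S'}\Big[\hat{E}_{S'}[f] - \hat{E}_S[f]\Big] \;\le\; E_{S'}\Big[\sup_{f\in\mathcal{F}}\big(\hat{E}_{S'}[f] - \hat{E}_S[f]\big)\Big]; $$
taking the expectation over $S$ on both sides gives exactly the inequality of Step 2.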

Denote by $T$ and $T'$ the resulting samples after this random permuting, where $T$ is the new real sample and $T'$ is the new ghost sample. As before, we can argue that $T$ and $T'$ have exactly the same distributions as the original samples $S$ and $S'$. They just have the additional randomness that comes from swapping, but that additional randomness does not change the distributions at all. We can take the expression
$$ \hat{E}_{S'}[f] - \hat{E}_S[f] \;=\; \frac{1}{m}\sum_{i=1}^m f(z'_i) - \frac{1}{m}\sum_{i=1}^m f(z_i) \;=\; \frac{1}{m}\sum_{i=1}^m \big(f(z'_i) - f(z_i)\big) $$
and replace $S$ and $S'$ with $T$ and $T'$, and get exactly the same expression. Writing it out explicitly, we have
$$ \hat{E}_{T'}[f] - \hat{E}_T[f] \;=\; \frac{1}{m}\sum_{i=1}^m \begin{cases} f(z_i) - f(z'_i) & \text{if } z_i \leftrightarrow z'_i \text{ swapped} \\ f(z'_i) - f(z_i) & \text{if no swap} \end{cases} \;=\; \frac{1}{m}\sum_{i=1}^m \sigma_i\big(f(z'_i) - f(z_i)\big), $$
where
$$ \sigma_i = \begin{cases} -1 & \text{if swapped} \\ +1 & \text{if no swap} \end{cases} $$
indicates whether the pair was swapped or not. We wrap this up in the next step.

Step 3: $\displaystyle E_{S,S'}\Big[\sup_{f\in\mathcal{F}}\big(\hat{E}_{S'}[f] - \hat{E}_S[f]\big)\Big] = E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^m \sigma_i\big(f(z'_i) - f(z_i)\big)\Big]$.

The expected value is taken with respect to $T$ and $T'$, which are generated by the original samples $S$ and $S'$ together with the sequence of $\sigma_i$'s indicating swap or no-swap for every pair.

Step 4: $\displaystyle E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^m \sigma_i\big(f(z'_i) - f(z_i)\big)\Big] \le 2 R_m(\mathcal{F})$.

We can take the sup and break it apart:
$$ E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^m \sigma_i\big(f(z'_i) - f(z_i)\big)\Big] = E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\Big(\frac{1}{m}\sum_{i=1}^m \sigma_i f(z'_i) + \frac{1}{m}\sum_{i=1}^m (-\sigma_i) f(z_i)\Big)\Big] $$
$$ \le\; E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^m \sigma_i f(z'_i)\Big] + E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^m (-\sigma_i) f(z_i)\Big]. $$
The inequality holds because taking the suprema of the two expressions separately can only give a larger number. The first term in the last line is the Rademacher complexity $R_m(\mathcal{F})$, and the second term is also the Rademacher complexity, since the $(-\sigma_i)$'s have exactly the same distribution as the $\sigma_i$'s. Therefore,
$$ E_{S,S',\sigma}\Big[\sup_{f\in\mathcal{F}}\frac{1}{m}\sum_{i=1}^m \sigma_i\big(f(z'_i) - f(z_i)\big)\Big] \;\le\; 2 R_m(\mathcal{F}). $$
What we have done is really just a chain of equalities and inequalities all the way down. Putting these steps together, we end up with the first bound, the one stated in terms of the expected Rademacher complexity.
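Both Step 1 and the passage from $R_m(\mathcal{F})$ to the empirical quantity $\hat{R}_S(\mathcal{F})$ described next rest on McDiarmid's inequality. As a quick check of the bounded-differences condition (a step the lecture leaves to the reader): if $S^{(i)}$ denotes $S$ with $z_i$ replaced by some other point $z''_i$, then, because every $f$ takes values in $[0,1]$,
$$ \big|\Phi(S) - \Phi(S^{(i)})\big| \;\le\; \sup_{f\in\mathcal{F}}\big|\hat{E}_{S^{(i)}}[f] - \hat{E}_S[f]\big| \;=\; \sup_{f\in\mathcal{F}}\frac{\big|f(z''_i) - f(z_i)\big|}{m} \;\le\; \frac{1}{m}, $$
and the same argument gives $\big|\hat{R}_S(\mathcal{F}) - \hat{R}_{S^{(i)}}(\mathcal{F})\big| \le 1/m$. McDiarmid's inequality then says that each of these random variables is within $\sqrt{\ln(1/\delta)/(2m)}$ of its expectation with probability at least $1-\delta$, which is the $O\big(\sqrt{\ln(1/\delta)/m}\big)$ term appearing in the bounds.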

To go from the expected Rademacher complexity to the empirical Rademacher complexity, which is its value on the actual sample, we can use McDiarmid's inequality, which we already used in the first step to show that this complicated random variable is close to its expected value. We just have to check that the conditions of McDiarmid's inequality are satisfied, which completes the proof.

This, then, is the uniform convergence result. The key part is that, with high probability on a single sample, for every function in the family, the empirical average will be close to its true expected value. That is what is so hard to get, but it is what we need for learning, because we want to be able to show that what happens on a sample is a good proxy for what is happening in general on the entire universe.

We proved this very general result, but we are interested in the more specific case of classification error. We want to show that in the usual learning setting we have been studying, with probability at least $1-\delta$, for all $h \in \mathcal{H}$, $\mathrm{err}(h) \le \widehat{\mathrm{err}}(h) + \text{something small}$. We already talked about how to express the true error and the training error in terms of indicator variables:
$$ \mathrm{err}(h) = E_{(x,y)\sim\mathcal{D}}\big[\mathbf{1}\{h(x) \neq y\}\big], \qquad \widehat{\mathrm{err}}(h) = \frac{1}{m}\sum_{i=1}^m \mathbf{1}\{h(x_i) \neq y_i\}. $$
We want to take the theorem and apply it to these indicator variables. Starting with the hypothesis space, we can set up functions which behave like the indicator variables. First, let the space $Z = X \times \{+1, -1\}$ be the Cartesian product of the set of all examples $X$ in our domain and the set of possible labels $\{+1, -1\}$. Then, for each $h \in \mathcal{H}$, define
$$ f_h(x, y) = \mathbf{1}\{h(x) \neq y\}, $$
which maps the input $(x, y)$ to $1$ or $0$, depending on whether or not $h$ misclassifies the example $(x, y)$. With this notation, the training error can be rewritten simply as the empirical average of $f_h$ on the sample $S$, namely $\widehat{\mathrm{err}}(h) = \hat{E}_S[f_h]$, and likewise the true error can be rewritten as the expected value of $f_h$, namely $\mathrm{err}(h) = E[f_h]$. The family of functions we are working with is then
$$ \mathcal{F}_\mathcal{H} = \{ f_h : h \in \mathcal{H} \}, $$
the set of all these functions $f_h$ as $h$ ranges over the hypothesis space $\mathcal{H}$. Basically we are just making some definitions so that what we are calling generalization error and training error match what we were calling true expectation and empirical average.

In particular, we take the theorem and plug in these definitions. The result is that with high probability, for all $h \in \mathcal{H}$ (equivalently, all $f_h \in \mathcal{F}_\mathcal{H}$),
$$ \mathrm{err}(h) = E[f_h] \;\le\; \hat{E}_S[f_h] + 2\,\hat{R}_S(\mathcal{F}_\mathcal{H}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \;=\; \widehat{\mathrm{err}}(h) + 2\,\hat{R}_S(\mathcal{F}_\mathcal{H}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right), $$
and likewise with $R_m(\mathcal{F}_\mathcal{H})$ in place of $\hat{R}_S(\mathcal{F}_\mathcal{H})$. The point is that the expected value is the true error of $h$ and the empirical average is the training error. We are left with the additional term involving the Rademacher complexity. We want to be claiming convergence of the empirical average to the true expectation, but right now it is not even clear how this term depends on $m$; we have to show that it in fact goes to zero as $m$ gets large. So next we would like to simplify the Rademacher complexity for this particular function class.
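As a concrete and entirely hypothetical illustration of these definitions, here is a small Python sketch: it builds a toy sample, a small finite class of one-dimensional threshold hypotheses, the associated loss functions $f_h$, the training error $\widehat{\mathrm{err}}(h) = \hat{E}_S[f_h]$, and a Monte Carlo estimate of $\hat{R}_S(\mathcal{F}_\mathcal{H})$ straight from its definition. The data, the hypothesis class, and all parameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled sample S = ((x_1, y_1), ..., (x_m, y_m)); everything here is made up.
m = 200
x = rng.uniform(-1.0, 1.0, size=m)
y = np.where(x + 0.1 * rng.standard_normal(m) > 0, 1, -1)      # noisy labels in {+1, -1}

# A small finite hypothesis space H of threshold classifiers h_t(x) = sign(x - t).
thresholds = np.linspace(-1.0, 1.0, 21)
H = [lambda z, t=t: np.where(z >= t, 1, -1) for t in thresholds]

def f_h(h, x, y):
    """f_h(x, y) = 1{h(x) != y}: the 0/1-loss function associated with h."""
    return (h(x) != y).astype(float)

def training_error(h, x, y):
    """err_hat(h) = empirical average of f_h on the sample."""
    return f_h(h, x, y).mean()

def rademacher_FH(H, x, y, n_sigma=2000):
    """Monte Carlo estimate of R_hat_S(F_H) = E_sigma[ sup_h (1/m) sum_i sigma_i f_h(x_i, y_i) ]."""
    losses = np.stack([f_h(h, x, y) for h in H])                # shape (|H|, m)
    sigma = rng.choice([-1.0, 1.0], size=(n_sigma, len(y)))     # Rademacher draws
    return (sigma @ losses.T / len(y)).max(axis=1).mean()

print("best training error over H:", min(training_error(h, x, y) for h in H))
print("Monte Carlo R_hat_S(F_H)  :", rademacher_FH(H, x, y))
```

The next page shows that this quantity is exactly half the Rademacher complexity of $\mathcal{H}$ itself, so the same estimate could equally be computed from the $\pm 1$ predictions alone, without ever touching the labels.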

Start by writing out the empirical Rademacher complexity,
$$ \hat{R}_S(\mathcal{F}_\mathcal{H}) = E_\sigma\Big[\sup_{f_h \in \mathcal{F}_\mathcal{H}} \frac{1}{m}\sum_{i=1}^m \sigma_i f_h(z_i)\Big] = E_\sigma\Big[\sup_{h \in \mathcal{H}} \frac{1}{m}\sum_{i=1}^m \sigma_i f_h(x_i, y_i)\Big]. $$
We could take $f_h$ and plug in what it is, which is an indicator variable; but by using the same trick as last time, we can instead use the algebraic form of the indicator function,
$$ f_h(x_i, y_i) = \mathbf{1}\{h(x_i) \neq y_i\} = \frac{1 - y_i h(x_i)}{2}, $$
which gives
$$ \hat{R}_S(\mathcal{F}_\mathcal{H}) = E_\sigma\Big[\sup_{h \in \mathcal{H}} \frac{1}{m}\sum_{i=1}^m \sigma_i \cdot \frac{1 - y_i h(x_i)}{2}\Big]. $$
By pulling out the first term, which does not depend on $h$, we have
$$ \hat{R}_S(\mathcal{F}_\mathcal{H}) = E_\sigma\Big[\frac{1}{2m}\sum_{i=1}^m \sigma_i\Big] + E_\sigma\Big[\sup_{h \in \mathcal{H}} \frac{1}{2m}\sum_{i=1}^m (-y_i\sigma_i)\, h(x_i)\Big], $$
where the expectation of the first term is just zero. We are left with the expectation of the second term, which is almost the Rademacher complexity, except for the $(-y_i\sigma_i)$'s. Since the $y_i$'s, which are $-1$'s and $+1$'s, are fixed, we are taking $-1$'s and $+1$'s and multiplying them by a random variable that is $+1$ or $-1$ with equal probability. Thus the whole thing $(-y_i\sigma_i)$ is itself a Rademacher variable with exactly the same distribution as $\sigma_i$. Therefore,
$$ \hat{R}_S(\mathcal{F}_\mathcal{H}) = \frac{1}{2}\, E_\sigma\Big[\sup_{h \in \mathcal{H}} \frac{1}{m}\sum_{i=1}^m \sigma_i h(x_i)\Big] = \frac{1}{2}\,\hat{R}_S(\mathcal{H}). $$
Plugging this into the theorem, we have
$$ \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) + \hat{R}_S(\mathcal{H}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right), $$
and likewise with $R_m(\mathcal{H})$ in place of $\hat{R}_S(\mathcal{H})$. What we are left with is the Rademacher complexity of the hypothesis space $\mathcal{H}$ itself. That is nice because, rather than working with the more complicated set of functions $\mathcal{F}_\mathcal{H}$, we can work with the hypotheses themselves. Also, notice that the result really only depends on the $x_i$'s, since the $y_i$'s have disappeared: the $S$ in $\hat{R}_S(\mathcal{H})$ is only the unlabeled part, the $x_i$'s, not the labels $y_i$'s. The upshot is that we just have to be able to compute the Rademacher complexity of the hypothesis space itself. If we can do that, we get exactly the kind of bounds we want.
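Two small facts used in this simplification, spelled out (a quick check, not in the original notes): for $h(x), y \in \{+1,-1\}$,
$$ \frac{1 - y\,h(x)}{2} \;=\; \begin{cases} \tfrac{1-1}{2} = 0 & \text{if } h(x) = y,\\[3pt] \tfrac{1-(-1)}{2} = 1 & \text{if } h(x) \neq y,\end{cases} $$
so $\frac{1 - y\,h(x)}{2} = \mathbf{1}\{h(x)\neq y\}$; and, for each fixed $y_i \in \{+1,-1\}$, the variable $-y_i\sigma_i$ takes the values $+1$ and $-1$ with probability $1/2$ each, so $(-y_1\sigma_1, \ldots, -y_m\sigma_m)$ is again a vector of independent Rademacher variables.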

2 Rademacher complexity bounds for binary classification

Let's finally prove some bounds on the Rademacher complexity. We get our pick of working either with the empirical version or with the expected Rademacher complexity. It is usually nicer to work with the empirical version because it is defined on a particular sample, that is, a particular set of points; we are going to take advantage of that in a minute. It means we only have to consider what the hypotheses in the class are doing on a particular fixed sample. Starting with the easiest case, where the hypothesis space is finite, we have the following theorem.

Theorem 2. If $|\mathcal{H}| < \infty$, then
$$ \hat{R}_S(\mathcal{H}) \;\le\; \sqrt{\frac{2\ln|\mathcal{H}|}{m}}, $$
where $S = \langle x_1, \ldots, x_m \rangle$ consists of $m$ unlabeled points.

This says that the Rademacher complexity is upper bounded by something like the log of the size of the hypothesis space, our usual complexity measure for finite hypothesis spaces. Finally we have a result saying that, at least in this case, the Rademacher complexity goes to zero at the rate $1/\sqrt{m}$. Taking this bound and plugging it back into the inequality, we get a bound on the generalization error in terms of the training error plus some additional term, which is a bound of the same kind we had before, proved using Hoeffding's inequality and the union bound. Here we get slightly different constants, but we have seen that it falls out of this more general result. The Mohri et al. book gives a proof of this, which they call Massart's lemma. We are not going to prove it now, but we will reach a point later in the course where it follows as a corollary of other results.

We already have a way of dealing with finite hypothesis spaces; the whole point of doing all this work was to be able to deal with infinite hypothesis spaces. So suppose we have a hypothesis space $\mathcal{H}$, possibly infinite. Remember that we are working on the fixed sample $S$ consisting of the examples $x_i$. All that we care about is how the hypotheses behave on that fixed sample. So if we have another set of hypotheses that happen to behave in exactly the same way on that sample, then we get exactly the same value. Said differently, all we care about are the behaviors of the hypotheses in this class on this fixed sample. In particular, we can imagine forming a much smaller hypothesis space $\mathcal{H}' \subseteq \mathcal{H}$ consisting of one representative hypothesis from $\mathcal{H}$ for each behavior on $S$. If we now take the definition and replace $\mathcal{H}$ by $\mathcal{H}'$, we get exactly the same value, because we have not changed the hypotheses in terms of how they behave on the sample: if we take the sup over $\mathcal{H}$ and replace it with a sup over $\mathcal{H}'$, it works out the same. The result is that the empirical Rademacher complexity of $\mathcal{H}$ on the sample $S$ equals that of the smaller class $\mathcal{H}'$,
$$ \hat{R}_S(\mathcal{H}) = E_\sigma\Big[\sup_{h \in \mathcal{H}} \frac{1}{m}\sum_{i=1}^m \sigma_i h(x_i)\Big] = E_\sigma\Big[\sup_{h \in \mathcal{H}'} \frac{1}{m}\sum_{i=1}^m \sigma_i h(x_i)\Big] = \hat{R}_S(\mathcal{H}'). $$
Whereas $\mathcal{H}$ may be infinite, $\mathcal{H}'$ is finite: the size of $\mathcal{H}'$ is exactly the number of behaviors of $\mathcal{H}$ on $S$. Therefore,
$$ |\mathcal{H}'| = \Pi_\mathcal{H}(S) \le \Pi_\mathcal{H}(m), $$
where $\Pi_\mathcal{H}(m)$ is the growth function. Now we have the Rademacher complexity of a finite class, and applying the theorem to $\mathcal{H}'$, which is finite, gives
$$ \hat{R}_S(\mathcal{H}) \;\le\; \sqrt{\frac{2\ln\Pi_\mathcal{H}(S)}{m}} \;\le\; \sqrt{\frac{2\ln\Pi_\mathcal{H}(m)}{m}}. $$
This is basically the same kind of bound we proved before, but with the square root, which is kind of inevitable when we are not getting consistency. That takes care of two of the complexity measures we talked about: the log cardinality of the space and the log of the growth function. The last one was good old VC dimension, which we can also get easily. By Sauer's lemma, the growth function is upper bounded by
$$ \Pi_\mathcal{H}(m) \le \left(\frac{em}{d}\right)^d \quad\text{for } m \ge d, $$
where $d$ is the VC dimension of $\mathcal{H}$.
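To see the reduction from an infinite class to its finitely many behaviors in action, here is a small hypothetical Python check for one-dimensional threshold classifiers $h_t(x) = \mathrm{sign}(x - t)$, for which there are exactly $m+1$ distinct behaviors on $m$ distinct points (so $\Pi_\mathcal{H}(S) = m+1$). It estimates $\hat{R}_S(\mathcal{H})$ by Monte Carlo over those behavior vectors and compares it with the $\sqrt{2\ln\Pi_\mathcal{H}(S)/m}$ bound; the sample size and all other numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500

# For 1-D thresholds h_t(x) = sign(x - t), only the distinct behavior vectors on the
# m (sorted) sample points matter: there are exactly m + 1 of them, namely
# (+1, ..., +1), (-1, +1, ..., +1), ..., (-1, ..., -1).  This is the reduced class H'.
behaviors = np.ones((m + 1, m))
for j in range(1, m + 1):
    behaviors[j, :j] = -1.0

def empirical_rademacher(behaviors, n_sigma=5000):
    """Monte Carlo estimate of E_sigma[ max over behaviors b of (1/m) sum_i sigma_i * b_i ]."""
    n_points = behaviors.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_sigma, n_points))
    return (sigma @ behaviors.T / n_points).max(axis=1).mean()

estimate = empirical_rademacher(behaviors)
bound = np.sqrt(2 * np.log(m + 1) / m)      # sqrt(2 ln Pi_H(S) / m), with Pi_H(S) = m + 1
print(f"Monte Carlo estimate of R_hat_S(H): {estimate:.4f}")
print(f"sqrt(2 ln Pi_H(S) / m)            : {bound:.4f}")
```

For this class the Monte Carlo estimate should come out below the bound, and both quantities shrink toward zero as $m$ grows, which is exactly the behavior the generalization bound needs.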

Plugging this in, we have
$$ \hat{R}_S(\mathcal{H}) \;\le\; \sqrt{\frac{2\, d \ln\frac{em}{d}}{m}}. $$
So we get a bound going to zero, now stated in terms of the VC dimension. This is in fact the VC dimension on the sample, not the VC dimension in general: all of these quantities can be measured on the sample $S$.
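To get a feel for the rate, here is a quick numeric evaluation of this bound; the values $d = 10$, $m = 10{,}000$ and $m = 100{,}000$ are purely illustrative and not from the lecture:
$$ \sqrt{\frac{2\cdot 10\cdot\ln\frac{e\cdot 10{,}000}{10}}{10{,}000}} \;\approx\; \sqrt{\frac{158.2}{10{,}000}} \;\approx\; 0.13, \qquad \sqrt{\frac{2\cdot 10\cdot\ln\frac{e\cdot 100{,}000}{10}}{100{,}000}} \;\approx\; \sqrt{\frac{204.2}{100{,}000}} \;\approx\; 0.045. $$
Because of the extra $\ln m$ factor, the term shrinks slightly more slowly than $1/\sqrt{m}$, but it does go to zero, which is all the bound requires.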

3 Introduction to boosting

We have reached the end of the first part of this course. We started out with a lot of idealized and restrictive conditions on learning, assuming a known class of functions labeling the examples, no noise, and so on, and bit by bit we eliminated those assumptions one by one. At this point the data can be anything, although we are still assuming it is random and i.i.d. We started out giving very tedious, specialized, ad hoc arguments to show that an algorithm works, meaning that it outputs a highly accurate hypothesis. Now we can get rid of all that: we have very general and powerful tools that say that if you have an algorithm and want to show it works, you just need to show that it finds a hypothesis that is either consistent with the data or has low training error. Additionally, if the hypothesis has low complexity, and we now have various ways of saying precisely and quantitatively what that means, then with enough data we just plug into these general theorems and get an automatic, immediate proof that the hypothesis will be highly accurate. These theorems also tell us something qualitative about learning; they can tell us something about the cases where learning does not work or where learning is impossible.

Now that we have developed these tools, we can start looking at more specific algorithms, not just for finding rectangles and intervals, but the kinds of algorithms that actually get used in practice. We are going to talk about algorithms, but the first topic actually begins with a theoretical question, which eventually leads to an algorithm that turns out to be practical.

Go back to the PAC model. We have a concept class and assume the examples are labeled according to a concept in that class. We require that it be possible for the learning algorithm to come up with a hypothesis of arbitrarily high accuracy, that is, to drive the error down as close to zero as we want, provided we give the algorithm enough data. But what if we have an algorithm that cannot do that? Say a crummy algorithm can only get the error down to 40%, that is, it reliably gets accuracy 60%. Can we somehow use that algorithm to come up with a hypothesis that gives 99% accuracy? We have this thing that is doing something really mediocre, not a whole lot better than guessing randomly, and somehow we want to use it to come up with a hypothesis that gives almost perfect accuracy. We can take this to an extreme: rather than 60%, suppose the algorithm only gives accuracy 51%. We know we can always trivially get 50% accuracy just by flipping a coin. So at the extreme we can ask: if we can get accuracy just a little bit better than random guessing, can we somehow boost the accuracy up to 99%?

We can ask the question in the PAC model. A class $\mathcal{C}$ is learnable if there is an algorithm $A$ such that for every target concept $c$ in that class and for any distribution $\mathcal{D}$, for all $\epsilon > 0$ and $\delta > 0$, $A$ is given a number of examples labeled by $c$ that is polynomial in $1/\epsilon$ and $1/\delta$, and outputs a hypothesis $h$ such that
$$ \Pr\big[\mathrm{err}_\mathcal{D}(h) \le \epsilon\big] \ge 1 - \delta. $$
To make the distinction, this is called strongly learnable. One thing to notice is that we are allowing the hypothesis $h$ to come from any class. The other model is almost the same, except that we get rid of the $\epsilon$ part. We say a class $\mathcal{C}$ is weakly learnable if there exist $\gamma > 0$ and an algorithm $A$ such that for all $\delta > 0$, with a number of examples $m$ polynomial in $1/\delta$,
$$ \Pr\Big[\mathrm{err}_\mathcal{D}(h) \le \frac{1}{2} - \gamma\Big] \ge 1 - \delta. $$
We can summarize the comparison in the following table.

    strongly learnable                            weakly learnable
    ∃ algorithm A                                 ∃ algorithm A and ∃ γ > 0
    ∀ concepts c ∈ C                              ∀ concepts c ∈ C
    ∀ distributions D                             ∀ distributions D
    ∀ ε > 0, ∀ δ > 0                              ∀ δ > 0
    given m = poly(1/ε, 1/δ) examples             given m = poly(1/δ) examples
    outputs h ∈ H                                 outputs h ∈ H
    Pr[err_D(h) ≤ ε] ≥ 1 − δ                      Pr[err_D(h) ≤ 1/2 − γ] ≥ 1 − δ

The question now is: if we have a weak learning algorithm that does only a little bit better than random, can we convert it into a strong learning algorithm? In more abstract terms, are these two notions of learnability equivalent? The answer turns out to be yes: the two models are equivalent. If we are given a weak learning algorithm that satisfies this criterion, we can convert it into a strong learning algorithm. We will talk about this in detail in the next couple of lectures.
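The following tiny simulation is not the boosting construction the course will develop (that comes in the next lectures); it is only meant to make the question plausible. It assumes, unrealistically, that we can obtain many independent predictions that are each correct with probability 0.6, and checks how accurate their majority vote is; real weak hypotheses trained on the same data are not independent, which is exactly why a more careful construction is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
p_correct = 0.6        # a mediocre predictor: 60% accuracy
n_voters = 101         # number of (assumed independent) copies in the majority vote
n_trials = 100_000     # Monte Carlo trials

# In each trial, each voter is independently correct with probability p_correct;
# the majority vote is correct when more than half the voters are.
correct = rng.random((n_trials, n_voters)) < p_correct
majority_correct = correct.sum(axis=1) > n_voters / 2

print("single-predictor accuracy:", p_correct)
print("majority-vote accuracy   :", majority_correct.mean())
```

Under these assumptions the majority vote comes out around 98% accurate, which at least suggests that "slightly better than random" can be amplified; the next lectures show how to achieve this without the independence assumption.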
