6.883: Online Methods in Machine Learning Alexander Rakhlin

LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS

All the algorithms presented so far hallucinate the future values as random draws and then perform two evaluations of φ. In many situations, an easier method is available, one that does not require drawing the random variables. In fact, most of the methods encountered in the online learning literature can be seen as doing precisely this: getting rid of the random variables for future rounds.

One of the most famous scenarios in online learning is that of prediction with expert advice. There are several roughly equivalent formulations, but once you've seen one, you'll be able to modify the proof for the other. Consider the situation where on each round t = 1, ..., n, we observe advice of N experts. Suppose the advice comes in the form of a vector x_t ∈ [−1, 1]^N, and we think of x_t(i) as, say, the buy/sell advice by expert i. We treat x_t as side information for making our own decision. After seeing the advice, we decide on a mixed strategy ŷ_t ∈ Δ(N) (a distribution over the N experts) and make a prediction ⟨ŷ_t, x_t⟩ ∈ [−1, 1] by mixing the opinions according to ŷ_t. The outcome y_t ∈ {±1} is then revealed. Once we have the mean ⟨ŷ_t, x_t⟩ for the mixed strategy, we may either draw the actual binary-valued prediction from this distribution, or we may simply think of (1/2)|y_t − ⟨ŷ_t, x_t⟩| = (1/2)(1 − ⟨ŷ_t, y_t x_t⟩) as the expected indicator loss of our strategy (see the collaborative filtering example). What is different here from previous lectures is that our decision variable ŷ_t is not a real number, but a distribution. The goal of the learner is to incur small average loss

    (1/n) ∑_{t=1}^n |y_t − ⟨ŷ_t, x_t⟩|.     (1)

As it turns out, a simple algorithm allows the learner to keep this average loss not much worse than the loss of the best expert, without knowing who the best is until the end. In particular, we prove that

Lemma 1. There is an algorithm (in fact, several distinct methods) that guarantees

    (1/n) ∑_{t=1}^n |y_t − ⟨ŷ_t, x_t⟩| ≤ min_{j∈[N]} (1/n) ∑_{t=1}^n |y_t − x_t(j)| + c √(log N / n)     (2)

for any sequence (x_1, y_1), ..., (x_n, y_n). As an example, this bound (with c = √8) is attained by the exponential weights algorithm

    ŷ_t(j) = exp{η ∑_{s=1}^{t−1} y_s x_s(j)} / ∑_{j'=1}^N exp{η ∑_{s=1}^{t−1} y_s x_s(j')}     (3)

with a step size η = √(log N / (2n)).
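To make the protocol and the update (3) concrete, here is a minimal Python sketch of the exponential weights forecaster for this setting. The function name and the synthetic data are mine (illustrative only, not from the notes); the update follows (3) with the step size η = √(log N / (2n)) above.

```python
import numpy as np

def exponential_weights(x_seq, y_seq, eta):
    """Prediction with expert advice via exponential weights, cf. (3).

    x_seq: (n, N) array of expert advice in [-1, 1]
    y_seq: (n,) array of outcomes in {-1, +1}
    Returns the average absolute loss of the forecaster and of the best expert.
    """
    n, N = x_seq.shape
    cum_gain = np.zeros(N)            # running sums  sum_{s<t} y_s x_s(j)
    learner_loss = 0.0
    for t in range(n):
        # weights proportional to exp(eta * cumulative gain); subtract max for numerical stability
        logits = eta * cum_gain
        w = np.exp(logits - logits.max())
        y_hat = w / w.sum()                       # mixed strategy ŷ_t in the simplex Δ(N)
        pred = float(y_hat @ x_seq[t])            # ⟨ŷ_t, x_t⟩ ∈ [-1, 1]
        learner_loss += abs(y_seq[t] - pred)      # absolute loss, as in (1)
        cum_gain += y_seq[t] * x_seq[t]           # update only after y_t is revealed
    best_expert_loss = np.min(np.abs(y_seq[:, None] - x_seq).sum(axis=0))
    return learner_loss / n, best_expert_loss / n

# illustrative run on synthetic data
rng = np.random.default_rng(0)
n, N = 1000, 20
x = rng.choice([-1.0, 1.0], size=(n, N))
y = np.sign(x[:, 0] + 0.5 * rng.standard_normal(n))   # expert 0 is informative
eta = np.sqrt(np.log(N) / (2 * n))
avg_loss, best_loss = exponential_weights(x, y, eta)
print(avg_loss, best_loss, best_loss + np.sqrt(8 * np.log(N) / n))
```

By Lemma 1, the first printed number should exceed the second by no more than √(8 log N / n), on any sequence and not just this random one.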

We will present two very similar proofs. After proving the Lemma, we will re-do the proof in the slightly simpler transductive setting through the lens of Cover's statement. It's instructive to look at both proofs and see the few small differences.

Both proofs will utilize the following inequalities. First is the soft-max bound. Choose a parameter η > 0 and let A_1, ..., A_N be real numbers. We then have

    max_{j∈[N]} A_j = (1/η) max_j η A_j = (1/η) log exp{max_j η A_j} = (1/η) log max_j exp{η A_j} ≤ (1/η) log ∑_{j=1}^N exp{η A_j}.

There is only one inequality: between the maximum over j and the soft-max function. Suppose all A_j are equal. Then the right-hand side is larger than the left-hand side by an additive (log N)/η factor (verify this!). As η increases, the gap between the two sides vanishes. The same can be argued for the case when the values are not equal. In fact, the last upper bound becomes an equality if we are allowed to take η → ∞.

The second inequality we use is

    (1/2)(e^x + e^{−x}) ≤ e^{x²/2},

which you can prove via Taylor expansions. The inequality implies

    E e^{λε} ≤ e^{λ²/2}     (4)

for the Rademacher random variable ε and a constant λ ∈ R. The same bound holds for any zero-mean random variable Z with values in [−1, 1]:

    E e^{λZ} ≤ e^{λ²/2},     (5)

and it is immediate that the bound becomes e^{b²λ²/2} for [−b, b]-valued Z.

1.1 Proof of Lemma 1

Thanks to the identity |a − b| = 1 − ab, which holds for a ∈ {±1} and b ∈ [−1, 1], we may rewrite the difference between the loss of the algorithm and the benchmark in (2) as

    (1/n) [ ∑_{t=1}^n (−y_t ⟨ŷ_t, x_t⟩) − min_{j∈[N]} ∑_{t=1}^n (−y_t x_t(j)) ].     (6)

Let us omit the fraction 1/n and bring it back at the very end. When comparing to the proof in the next section, just insert this fraction throughout.
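Before continuing with the proof, here is a quick numerical sanity check of the two inequalities above (the soft-max bound and (4)/(5)); it is purely illustrative and not part of the argument, with data and parameter values chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)

# Soft-max bound: max_j A_j <= (1/eta) log sum_j exp(eta * A_j),
# with the gap equal to (log N)/eta when all A_j coincide.
A = rng.standard_normal(10)
for eta in [0.5, 2.0, 10.0, 50.0]:
    softmax = np.log(np.sum(np.exp(eta * A))) / eta
    assert A.max() <= softmax + 1e-12
    print(eta, softmax - A.max())          # the gap shrinks as eta grows

# Inequality (5): E exp(lambda * Z) <= exp(lambda^2 / 2) for zero-mean Z in [-1, 1].
lam = 1.3
Z = rng.uniform(-1, 1, size=2_000_000)     # zero-mean, [-1, 1]-valued
print(np.mean(np.exp(lam * Z)), np.exp(lam**2 / 2))
```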

Consider the last step t = n. In the first sum, all the terms except the last one are fixed, and so we need to solve

    min_{ŷ_n ∈ Δ(N)} max_{y_n ∈ {±1}} { −y_n ⟨ŷ_n, x_n⟩ + Rel(x_{1:n}, y_{1:n}) }     (7)

with

    Rel(x_{1:n}, y_{1:n}) ≥ − min_{j∈[N]} ∑_{t=1}^n (−y_t x_t(j)) = max_{j∈[N]} ∑_{t=1}^n y_t x_t(j).     (8)

Here Δ(N) is the probability simplex on N experts. We could choose Rel to be equal to the right-hand side in (8). However, for computational purposes, we slightly modify this function. Instead of max we shall work with soft-max. That is, take

    Rel(x_{1:n}, y_{1:n}) = (1/η) log ∑_{j=1}^N exp{η ∑_{t=1}^n y_t x_t(j)}

for some η, to be determined. The algorithm (3), in the form (6), can be written as

    ŷ_t(j) ∝ exp{η ∑_{s=1}^{t−1} y_s x_s(j)},     (9)

while the loss on round t is (trivially)

    −y_t ⟨ŷ_t, x_t⟩ = −E_{j∼ŷ_t}[y_t x_t(j)] = (1/η) log exp{−η E_{j∼ŷ_t}[y_t x_t(j)]}.

The key observation now is that

    ∑_{j=1}^N exp{η ∑_{t=1}^n y_t x_t(j)} = ∑_{j=1}^N exp{η y_n x_n(j)} exp{η ∑_{t=1}^{n−1} y_t x_t(j)}     (10)
                                          = ( ∑_{j=1}^N exp{η ∑_{t=1}^{n−1} y_t x_t(j)} ) · E_{j∼ŷ_n}[exp{η y_n x_n(j)}]     (11)

because E_{j∼ŷ_n}[A(j)] = ∑_{j=1}^N ŷ_n(j) A(j) with

    ŷ_n(j) = exp{η ∑_{t=1}^{n−1} y_t x_t(j)} / ∑_{j'=1}^N exp{η ∑_{t=1}^{n−1} y_t x_t(j')}.     (12)

Putting everything together, (7) is upper bounded by the particular choice of the infimum strategy ŷ_n as

    (7) ≤ max_{y_n∈{±1}} { −y_n ⟨ŷ_n, x_n⟩ + Rel(x_{1:n}, y_{1:n}) }     (13)
        = max_{y_n∈{±1}} { (1/η) log exp{−η E_{j∼ŷ_n}[y_n x_n(j)]} + (1/η) log E_{j∼ŷ_n}[exp{η y_n x_n(j)}] }     (14)
          + (1/η) log ∑_{j=1}^N exp{η ∑_{t=1}^{n−1} y_t x_t(j)}.     (15)

We now focus on the two terms in (14):

    (1/η) log exp{−η E_{j∼ŷ_n}[y_n x_n(j)]} + (1/η) log E_{j∼ŷ_n}[exp{η y_n x_n(j)}]     (16)
    = (1/η) log E_{j∼ŷ_n}[exp{η (y_n x_n(j) − E_{j∼ŷ_n}[y_n x_n(j)])}]     (17)
    ≤ (1/η) · (2η)²/2 = 2η     (18)

by (5).
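Both the factorization (10)-(11) and the per-step bound (16)-(18) are easy to verify numerically. The short check below is illustrative only (random data, my own variable names), not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, eta = 50, 8, 0.3
x = rng.uniform(-1, 1, size=(n, N))
y = rng.choice([-1, 1], size=n)
S = (y[:, None] * x).cumsum(axis=0)        # S[t, j] = sum_{s<=t} y_s x_s(j)

# ŷ_n from (12), built on rounds 1..n-1
w = np.exp(eta * S[-2])
y_hat_n = w / w.sum()

# Factorization (11): sum_j exp(eta S_n(j)) = (sum_j exp(eta S_{n-1}(j))) * E_{j~ŷ_n} exp(eta y_n x_n(j))
lhs = np.sum(np.exp(eta * S[-1]))
rhs = np.sum(np.exp(eta * S[-2])) * np.sum(y_hat_n * np.exp(eta * y[-1] * x[-1]))
print(lhs, rhs)                             # equal up to floating-point error

# Bound (16)-(18): (1/eta) log E exp(eta (Z - E Z)) <= 2*eta, where Z(j) = y_n x_n(j) ∈ [-1, 1]
Z = y[-1] * x[-1]
centered = Z - np.sum(y_hat_n * Z)
lhs18 = np.log(np.sum(y_hat_n * np.exp(eta * centered))) / eta
print(lhs18, 2 * eta)
```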

Note that the range of the zero-mean random variable is [−2, 2], and so an additional factor of 2² appears from the application of (5). Observe that this last step of peeling off the zero-mean term makes the expression independent of y_n and x_n! In particular, it does not matter whether the sequence of x's is generated i.i.d. or in an arbitrary manner. Unlike Cover's approach of solving the max over the two alternatives, we presented a particular ŷ_t that allows (through an upper bound) to make the choice y_n irrelevant. While the two approaches give slightly different algorithms, the upper bounds they enjoy are the same. Now, we simply define

    Rel(x_{1:n−1}, y_{1:n−1}) = (1/η) log ∑_{j=1}^N exp{η ∑_{t=1}^{n−1} y_t x_t(j)} + 2η     (19)

and

    Rel(x_{1:t}, y_{1:t}) = (1/η) log ∑_{j=1}^N exp{η ∑_{s=1}^t y_s x_s(j)} + 2(n − t)η.     (20)

Of course,

    Rel(∅) = (log N)/η + 2ηn = 2√(2 n log N)     (21)

by choosing η = √(log N / (2n)). Now, since we initially omitted the fraction 1/n, dividing through by n gives the bound of the lemma.

1.2 (Slightly easier) transductive setting through the lens of Cover's algorithm

Consider the simplified setting where the expert advice x_1, ..., x_n ∈ {±1}^N is fixed and known a priori, and let F = {x ↦ x(j) : j ∈ [N]} be the set of N functions that simply output a coordinate of x. As discussed earlier, F induces a subset F ⊆ {±1}^n of finite cardinality at most N,

    F = {y_{1:n} : y_{1:n} = (x_1(j), ..., x_n(j)), j ∈ [N]},

and

    φ(y_{1:n}) = (1/n) d_H(y_{1:n}, F) + C = min_{f∈F} (1/n) ∑_{t=1}^n 1{f_t ≠ y_t} + C     (22)

for some appropriate C which we will define later. In this section, we will directly solve for the real-valued prediction q_t ∈ [−1, 1], which can be viewed as the mean of the mixed strategy for predicting ŷ_t ∈ {±1}. This is slightly different from what was described earlier, where the prediction is calculated by mixing the advice as ⟨ŷ_t, x_t⟩. We will solve the latter directly in the next section. Recall that the choice of relaxation Rel defines the algorithm. In this section we give the derivation using the very basic technique that goes back to Lecture 1. The algorithm that arises from Cover's lemma is not exponential weights, but it gives the same guarantee on performance as the exponential weights method.
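To make the transductive setup concrete, the snippet below builds the induced set F ⊆ {±1}^n from fixed advice and evaluates the potential φ of (22). It is an illustration with made-up data; the constant C is kept as a free parameter since it is only pinned down later in the derivation.

```python
import numpy as np

def induced_class(x_seq):
    """Columns of the advice matrix, viewed as the induced set F: one ±1 vector per expert."""
    return [x_seq[:, j].copy() for j in range(x_seq.shape[1])]

def phi(y, F, C):
    """Potential (22): normalized Hamming distance from y to the class, plus a constant C."""
    n = len(y)
    return min(np.sum(f != y) for f in F) / n + C

# illustrative data: n rounds, N experts with fixed ±1 advice
rng = np.random.default_rng(3)
n, N = 100, 16
x_seq = rng.choice([-1, 1], size=(n, N))
F = induced_class(x_seq)
y = rng.choice([-1, 1], size=n)
# one plausible choice of C, matching the scale of (31); illustration only
print(phi(y, F, C=np.sqrt(np.log(N) / (2 * n))))
```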

Let us take Rel(y_{1:n}) to be any upper bound on the maximum appearing in the benchmark term,

    min_{f∈F} (1/n) ∑_{t=1}^n 1{f_t ≠ y_t} = 1/2 − (1/(2n)) max_{f∈F} ⟨f, y_{1:n}⟩.     (23)

We will use the soft-max upper bound (verify that it holds):

    Rel(y_{1:n}) = (1/(2ηn)) log ∑_{f∈F} exp{η ⟨f, y_{1:n}⟩} ≥ (1/(2n)) max_{f∈F} ⟨f, y_{1:n}⟩.     (24)

Check that this function does not change by more than 1/n when flipping one bit. Now, as before,

    min_{q_n ∈ [−1,1]} max_{y_n ∈ {±1}} { (1/n) E[1{ŷ_n ≠ y_n}] + Rel(y_{1:n}) } = E_{ε_n} Rel(y_{1:n−1}, ε_n) + 1/(2n).     (25)

By Jensen's inequality (E log ≤ log E),

    E_{ε_n} Rel(y_{1:n−1}, ε_n) ≤ (1/(2ηn)) log E_{ε_n} ∑_{f∈F} exp{η ⟨f, ỹ⟩}     (26)

with ỹ = (y_{1:n−1}, ε_n). The only randomness in the above expression is the ε_n on the last coordinate of ỹ. Let us abuse the notation and write ⟨f, ỹ⟩ = ⟨f, y_{1:n−1}⟩ + ε_n f_n. Our aim is to get rid of ε_n. If we succeed, we do not have to draw the random coin flips for random playout at the intermediate steps. By (4),

    E_{ε_n} exp{η ⟨f, ỹ⟩} ≤ exp{η ⟨f, y_{1:n−1}⟩} exp{η²/2}     (27)

and, therefore, (26) is upper bounded by

    (1/(2ηn)) log ∑_{f∈F} exp{η ⟨f, y_{1:n−1}⟩} + η/(4n).     (28)

In view of (25), we can now define

    Rel(y_{1:n−1}) = (1/(2ηn)) log ∑_{f∈F} exp{η ⟨f, y_{1:n−1}⟩} + η/(4n).     (29)

That's it! There is no ε_n in the relaxation at time n−1. We peeled it off. One can check that at the intermediate step t,

    Rel(y_{1:t}) = (1/(2ηn)) log ∑_{f∈F} exp{η ⟨f_{1:t}, y_{1:t}⟩} + (n − t) η/(4n)     (30)

with ⟨f_{1:t}, y_{1:t}⟩ being defined as ∑_{s=1}^t f_s y_s. We also see that

    Rel(∅) = (1/(2ηn)) log ∑_{f∈F} exp{0} + η/4 = (log N)/(2ηn) + η/4 = √(log N / (2n))     (31)

by choosing η = √(2 log N / n). This is a non-algorithmic derivation, and the algorithm is given in Lecture 1. We leave it as a homework exercise to write it explicitly (hint: it does not become exponential weights). We also note that the difference in the constant c comes from scaling of the indicator loss vs the absolute value loss.
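The stability property invoked around (24)-(25) (the relaxation moves by at most 1/n when a single bit of y is flipped) is easy to check numerically. The snippet below is an illustration with random data and my own helper names.

```python
import numpy as np

def rel(y, F, eta, n):
    """Soft-max relaxation (1/(2*eta*n)) * log sum_f exp(eta * <f, y>), cf. (24) and (30)."""
    vals = np.array([eta * float(np.dot(f, y)) for f in F])
    m = vals.max()
    return (m + np.log(np.sum(np.exp(vals - m)))) / (2 * eta * n)   # stable log-sum-exp

rng = np.random.default_rng(4)
n, N = 60, 12
F = [rng.choice([-1, 1], size=n) for _ in range(N)]   # induced class: one sign vector per expert
eta = np.sqrt(2 * np.log(N) / n)

y = rng.choice([-1, 1], size=n).astype(float)
base = rel(y, F, eta, n)
max_change = 0.0
for t in range(n):
    y_flip = y.copy()
    y_flip[t] = -y_flip[t]
    max_change = max(max_change, abs(rel(y_flip, F, eta, n) - base))
print(max_change, 1.0 / n)    # the observed change never exceeds 1/n
```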

1.3 Discussion

The two proofs are essentially the same. Both start by relaxing the max to a soft-max, and taking this as Rel. Then, the second approach explicitly solves for the optimal real-valued prediction (the mean of the mixed strategy), while the first approach guesses a (potentially suboptimal) strategy of exponential weights. Once the strategy for the infimum is plugged in, one obtains an expression with a zero-mean random variable. This zero-mean variable is eliminated using a probabilistic inequality (Eq. (17) and (27), respectively). To reiterate, the salient features of the proofs are: (1) passing to a relaxation Rel, (2) solving for the best strategy or guessing a near-best strategy, and (3) using probabilistic inequalities to remove the random variable that arises from plugging in the strategy. These steps can be taken as a rough prescription for the development of online methods. We will illustrate the steps again in the subsequent lectures.

The next note is on the nature of the sequence x_1, ..., x_n. Essentially, both approaches make it irrelevant how the x_t's are generated. That is not to say that the method does not take the side information into account (of course it does, through the losses of the experts). Rather, the point is that we can successfully deal with adversarially generated x's, a strength of the experts approach. Another strength of the experts bound is its mild (logarithmic) dependence on the number of experts. One may take a large number of experts and still have an average error that is o(1) from the average error of the best expert.

Finally, we remark that the experts approach can be seen as a union bound or an aggregation procedure. Suppose one has N algorithms making predictions. Then one can predict as well as any of these algorithms by paying O(√(log N / n)). Such a black-box technique is very useful (see the version of linearized experts below for the general black-box statement). For instance, suppose one does not know how to choose a parameter θ ∈ [0, 1] of the algorithm optimally. One can then run (at least in principle) N = 1/ε algorithms corresponding to an ε-discretization of the parameter. If the output is in some sense Lipschitz with respect to the parameter choice, one can claim that the resulting aggregating procedure does as well as the best choice, plus an ε-precision term, plus an O(√(log(1/ε) / n)) penalty.

1.4 Linearized Experts

Recall that in the experts setting introduced in the beginning of the lecture, we observe predictions x_t ∈ [−1, 1]^N of the experts, choose a distribution ŷ_t ∈ Δ(N) for the weighted vote, and then observe the outcome y_t ∈ {±1}. Observe that the exponential weights algorithm at time t does not use x_t to calculate the distribution over experts. Hence, we may think of a setting where we choose a distribution ŷ_t ∈ Δ(N) and then observe both the predictions x_t and the true outcome y_t. Rather than mixing the advice of the experts to produce our own, we may instead choose an expert at random from the distribution ŷ_t and go with her advice. Then the expected cost for period t is ⟨ŷ_t, z_t⟩, where z_t ∈ [−1, 1]^N is the vector of losses of the experts: z_t(j) = −y_t x_t(j). In fact, the loss function does not matter anymore, and it does not matter that the data comes in the form of (x_t, y_t) pairs.

Instead, we may just think of each expert incurring some cost: we choose an expert at random and incur the same cost as that expert. In expectation, we pay ⟨ŷ_t, z_t⟩. Let us state the protocol explicitly:

    For t = 1, ..., n:
        Predict ŷ_t ∈ Δ(N)
        Observe costs z_t ∈ [−1, 1]^N

Alternatively, we may choose a random expert j according to ŷ_t and pay z_t(j). The goal here is to have small expected cost, relative to the cost of the best expert:

    (1/n) ∑_{t=1}^n ⟨ŷ_t, z_t⟩ ≤ min_{j∈[N]} (1/n) ∑_{t=1}^n ⟨e_j, z_t⟩ + c √(log N / n)     (32)

for any sequence z_1, ..., z_n of costs. The cost z_t may be chosen even with the knowledge of our decision ŷ_t. Let us quickly prove that the exponential weights algorithm

    ŷ_t(j) ∝ exp{−η ∑_{s=1}^{t−1} z_s(j)}

achieves the above guarantee. We write the last step of the problem (removing the 1/n normalization term) as

    min_{ŷ_n ∈ Δ(N)} max_{z_n ∈ [−1,1]^N} { ⟨ŷ_n, z_n⟩ + Rel(z_{1:n}) }     (33)

with

    − min_{j∈[N]} ∑_{t=1}^n z_t(j) = max_{j∈[N]} ∑_{t=1}^n (−z_t(j)) ≤ (1/η) log ∑_{j=1}^N exp{−η ∑_{t=1}^n z_t(j)} ≜ Rel(z_{1:n}).     (34)

The rest of the proof of (32) is essentially identical to that of the proof of Lemma 1.
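To close the loop with the aggregation discussion in Section 1.3, here is a minimal sketch of the linearized experts update ŷ_t(j) ∝ exp{−η ∑_{s<t} z_s(j)} used as a black box over an ε-grid of a tuning parameter θ. The base algorithm, its cost function, and all names here are hypothetical placeholders, not anything from the notes.

```python
import numpy as np

def hedge_over_grid(cost_fn, thetas, n, eta):
    """Linearized experts: one 'expert' per candidate value of θ, costs z_t(j) ∈ [-1, 1].

    cost_fn(t, theta) returns the round-t cost of running the (hypothetical) base
    algorithm with parameter theta; it is revealed only after ŷ_t is committed.
    """
    N = len(thetas)
    cum_cost = np.zeros(N)
    total_expected_cost = 0.0
    for t in range(n):
        logits = -eta * cum_cost
        w = np.exp(logits - logits.max())
        y_hat = w / w.sum()                                  # distribution ŷ_t over the grid
        z_t = np.array([cost_fn(t, th) for th in thetas])    # observed cost vector
        total_expected_cost += float(y_hat @ z_t)
        cum_cost += z_t
    return total_expected_cost / n, cum_cost.min() / n       # learner vs best fixed θ on the grid

# illustrative use: ε-discretization of θ ∈ [0, 1], synthetic costs in [-1, 1]
eps, n = 0.05, 2000
thetas = np.arange(0.0, 1.0 + 1e-12, eps)
eta = np.sqrt(np.log(len(thetas)) / (2 * n))
rng = np.random.default_rng(5)
noise = rng.uniform(-0.2, 0.2, size=n)
cost = lambda t, th: np.clip((th - 0.3) ** 2 - 0.5 + noise[t], -1, 1)   # made-up cost, Lipschitz in θ
print(hedge_over_grid(cost, thetas, n, eta))
```

By (32), the learner's average cost exceeds that of the best grid point by at most about c√(log(1/ε)/n), which is exactly the aggregation penalty mentioned in Section 1.3.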
