6.883: Online Methods in Machine Learning Alexander Rakhlin
LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS

All the algorithms presented so far hallucinate the future values as random draws and then perform two evaluations of φ. In many situations, an easier method is available, one that does not require drawing the random variables. In fact, most of the methods encountered in the online learning literature can be seen as doing precisely this: getting rid of the random variables for future rounds.

One of the most famous scenarios in online learning is that of prediction with expert advice. There are several roughly equivalent formulations, but once you've seen one, you'll be able to modify the proof for the others. Consider the situation where on each round $t = 1, \ldots, n$ we observe the advice of $N$ experts. Suppose the advice comes in the form of a vector $x_t \in [-1,1]^N$, and we think of $x_t(i)$ as, say, the buy/sell advice of expert $i$. We treat $x_t$ as side information for making our own decision. After seeing the advice, we decide on a mixed strategy $\hat{y}_t \in \Delta(N)$ (a distribution over the $N$ experts) and make a prediction $\langle \hat{y}_t, x_t \rangle \in [-1,1]$ by mixing the opinions according to $\hat{y}_t$. The outcome $y_t \in \{\pm 1\}$ is then revealed. Once we have the mean $\langle \hat{y}_t, x_t \rangle$ for the mixed strategy, we may either draw the actual binary-valued prediction from this distribution, or we may simply think of $|y_t - \langle \hat{y}_t, x_t \rangle| = 1 - \langle \hat{y}_t, y_t x_t \rangle$ as the expected loss of our strategy (see the collaborative filtering example). What is different here from previous lectures is that our decision variable $\hat{y}_t$ is not a real number, but a distribution. The goal of the learner is to incur small average loss
\[
\frac{1}{n} \sum_{t=1}^{n} \big| y_t - \langle \hat{y}_t, x_t \rangle \big|. \tag{1}
\]
As it turns out, a simple algorithm allows the learner to keep this average loss not much worse than the loss of the best expert, without knowing who the best one is until the end. In particular, we prove

Lemma 1. There is an algorithm (in fact, several distinct methods) that guarantees
\[
\frac{1}{n} \sum_{t=1}^{n} \big| y_t - \langle \hat{y}_t, x_t \rangle \big| \;\le\; \min_{j \in [N]} \frac{1}{n} \sum_{t=1}^{n} \big| y_t - x_t(j) \big| + c\sqrt{\frac{\log N}{n}}
\]
for any sequence $(x_1, y_1), \ldots, (x_n, y_n)$.
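The claim above, that the absolute loss of the mixed prediction equals the expected loss of the randomized binary prediction, follows from a one-line computation. Let $\tilde{y}_t \in \{\pm 1\}$ be drawn so that $\mathbb{E}\,\tilde{y}_t = \langle \hat{y}_t, x_t \rangle$; then

```latex
\[
\mathbb{E}\,\bigl|y_t - \tilde{y}_t\bigr|
  = 2\,\Pr\bigl[\tilde{y}_t \neq y_t\bigr]
  = 1 - y_t\,\mathbb{E}\,\tilde{y}_t
  = 1 - y_t \langle \hat{y}_t, x_t \rangle
  = \bigl|y_t - \langle \hat{y}_t, x_t \rangle\bigr|,
\]
```

where the last equality is the identity $|a - b| = 1 - ab$ for $a \in \{\pm 1\}$ and $b \in [-1,1]$, an identity used again in the proof below.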
As an example, this bound (with $c = \sqrt{8}$) is attained by the exponential weights algorithm
\[
\hat{y}_t(j) = \frac{\exp\big\{-\eta \sum_{s=1}^{t-1} |y_s - x_s(j)|\big\}}{\sum_{j'=1}^{N} \exp\big\{-\eta \sum_{s=1}^{t-1} |y_s - x_s(j')|\big\}} \tag{2}
\]
with a step size $\eta = \sqrt{\frac{\log N}{2n}}$. We will present two very similar proofs. After proving the Lemma, we will re-do the proof in the slightly simpler transductive setting through the lens of Cover's statement. It is instructive to look at both proofs and see the few small differences. Both proofs will utilize the following inequalities.

First is the soft-max bound. Choose a parameter $\eta > 0$ and let $A_1, \ldots, A_N$ be real numbers. We then have
\[
\max_{j \in [N]} A_j = \frac{1}{\eta} \max_{j} \eta A_j = \frac{1}{\eta} \log \exp\Big\{\max_j \eta A_j\Big\} = \frac{1}{\eta} \log \max_j \exp\{\eta A_j\} \le \frac{1}{\eta} \log \sum_{j=1}^{N} \exp\{\eta A_j\}. \tag{3}
\]
There is only one inequality between the maximum over $j$ and the soft-max function. Suppose all $A_j$ are equal. Then the right-hand side is larger than the left-hand side by an additive $\frac{\log N}{\eta}$ factor (verify this!). As $\eta$ increases, the gap between the two sides vanishes. The same can be argued for the case when the values are not equal. In fact, the last upper bound becomes an equality in the limit $\eta \to \infty$.

The second inequality we use is $\frac{1}{2}(e^x + e^{-x}) \le e^{x^2/2}$, which you can prove via Taylor expansions. The inequality implies
\[
\mathbb{E}\, e^{\lambda \epsilon} \le e^{\lambda^2/2} \tag{4}
\]
for a Rademacher random variable $\epsilon$ and a constant $\lambda \in \mathbb{R}$. The same bound holds for any zero-mean random variable $Z$ with values in $[-1, 1]$:
\[
\mathbb{E}\, e^{\lambda Z} \le e^{\lambda^2/2}, \tag{5}
\]
and it is immediate that the bound becomes $e^{b^2 \lambda^2 / 2}$ for $[-b, b]$-valued $Z$.

1.1 Proof of Lemma 1

Thanks to the identity $|a - b| = 1 - ab$, which holds for $a \in \{\pm 1\}$ and $b \in [-1, 1]$, we may rewrite the difference between the loss of the algorithm and the benchmark in Lemma 1 as
\[
-\sum_{t=1}^{n} \langle \hat{y}_t, y_t x_t \rangle + \max_{j \in [N]} \sum_{t=1}^{n} y_t x_t(j) \tag{6}
\]
(the additive constants produced by the identity cancel). Let us omit the fraction $\frac{1}{n}$ and bring it back at the very end. When comparing to the proof in the next section, just insert this fraction throughout. Consider the last step $t = n$. In the first sum, all the terms except the last one are fixed, and so we need to solve
\[
\min_{\hat{y}_n \in \Delta(N)} \max_{y_n \in \{\pm 1\}} \Big\{ -\langle \hat{y}_n, y_n x_n \rangle + \mathrm{Rel}(x_{1:n}, y_{1:n}) \Big\} \tag{7}
\]
with
\[
\mathrm{Rel}(x_{1:n}, y_{1:n}) \;\ge\; \max_{j \in [N]} \sum_{t=1}^{n} y_t x_t(j). \tag{8}
\]
Here $\Delta(N)$ is the probability simplex on $N$ experts. We could choose Rel to be equal to the right-hand side in (8). However, for computational purposes, we slightly modify this function. Instead of max we shall work with soft-max. That is, take
\[
\mathrm{Rel}(x_{1:n}, y_{1:n}) = \frac{1}{\eta} \log \sum_{j=1}^{N} \exp\Big\{\eta \sum_{t=1}^{n} y_t x_t(j)\Big\}
\]
for some $\eta$, to be determined. The algorithm (2), in the form (6), can be written as
\[
\hat{y}_t(j) \propto \exp\Big\{\eta \sum_{s=1}^{t-1} y_s x_s(j)\Big\}, \tag{9}
\]
while the loss on round $t$ is (trivially)
\[
\langle \hat{y}_t, y_t x_t \rangle = \mathbb{E}_{j \sim \hat{y}_t}\big[y_t x_t(j)\big] = -\frac{1}{\eta} \log \exp\Big\{-\eta\, \mathbb{E}_{j \sim \hat{y}_t}\big[y_t x_t(j)\big]\Big\}.
\]
The key observation now is that
\[
\sum_{j=1}^{N} \exp\Big\{\eta \sum_{t=1}^{n} y_t x_t(j)\Big\} = \mathbb{E}_{j \sim \hat{y}_n}\big[\exp\{\eta y_n x_n(j)\}\big] \cdot \sum_{j=1}^{N} \exp\Big\{\eta \sum_{t=1}^{n-1} y_t x_t(j)\Big\} \tag{10}
\]
because $\mathbb{E}_{j \sim \hat{y}_n}[A(j)] = \sum_{j=1}^{N} \hat{y}_n(j) A(j)$ with
\[
\hat{y}_n(j) = \frac{\exp\big\{\eta \sum_{t=1}^{n-1} y_t x_t(j)\big\}}{\sum_{j'=1}^{N} \exp\big\{\eta \sum_{t=1}^{n-1} y_t x_t(j')\big\}}. \tag{11}
\]
Putting everything together, (7) is upper bounded by the particular choice $\hat{y}_n$ of the infimum strategy as
\[
(7) \le \max_{y_n \in \{\pm 1\}} \Big\{ -\langle \hat{y}_n, y_n x_n \rangle + \mathrm{Rel}(x_{1:n}, y_{1:n}) \Big\} \tag{12}
\]
\[
= \max_{y_n \in \{\pm 1\}} \Big\{ \frac{1}{\eta} \log \exp\big\{-\eta\, \mathbb{E}_{j \sim \hat{y}_n}[y_n x_n(j)]\big\} + \frac{1}{\eta} \log \mathbb{E}_{j \sim \hat{y}_n}\big[\exp\{\eta y_n x_n(j)\}\big] \Big\} + \frac{1}{\eta} \log \sum_{j=1}^{N} \exp\Big\{\eta \sum_{t=1}^{n-1} y_t x_t(j)\Big\}. \tag{13}
\]
We now focus on the two terms in the braces in (13):
\[
\frac{1}{\eta} \log \exp\big\{-\eta\, \mathbb{E}_{j \sim \hat{y}_n}[y_n x_n(j)]\big\} + \frac{1}{\eta} \log \mathbb{E}_{j \sim \hat{y}_n}\big[\exp\{\eta y_n x_n(j)\}\big] = \frac{1}{\eta} \log \mathbb{E}_{j \sim \hat{y}_n}\Big[\exp\big\{\eta\big(y_n x_n(j) - \mathbb{E}_{j \sim \hat{y}_n}[y_n x_n(j)]\big)\big\}\Big] \le \frac{1}{\eta} \cdot \frac{2^2 \eta^2}{2} = 2\eta \tag{14}
\]
by (5). Note that the range of the zero-mean random variable is $[-2, 2]$, and so an additional factor of $2^2$ appears in the exponent from the application of (5). Observe that this last step of peeling off the zero-mean term makes the expression independent of $y_n$ and $x_n$! In particular, it does not matter whether the sequence of $x$'s is generated i.i.d. or in an arbitrary manner. Unlike Cover's approach of solving the max over the two alternatives, we presented a particular $\hat{y}_t$ that allows (through an upper bound) to make the choice $y_n$ irrelevant. While the two approaches give slightly different algorithms, the upper bounds they enjoy are the same. Now, we simply define
\[
\mathrm{Rel}(x_{1:n-1}, y_{1:n-1}) = \frac{1}{\eta} \log \sum_{j=1}^{N} \exp\Big\{\eta \sum_{t=1}^{n-1} y_t x_t(j)\Big\} + 2\eta \tag{15}
\]
and
\[
\mathrm{Rel}(x_{1:t}, y_{1:t}) = \frac{1}{\eta} \log \sum_{j=1}^{N} \exp\Big\{\eta \sum_{s=1}^{t} y_s x_s(j)\Big\} + 2(n - t)\eta. \tag{16}
\]
Of course,
\[
\mathrm{Rel}(\emptyset) = \frac{\log N}{\eta} + 2\eta n \le 2\sqrt{2 n \log N} \tag{17}
\]
by choosing $\eta = \sqrt{\frac{\log N}{2n}}$. Now, since we initially divided throughout by $n$, the bound of the lemma is $2\sqrt{\frac{2 \log N}{n}} = \sqrt{8}\sqrt{\frac{\log N}{n}}$, as claimed.

1.2 (Slightly easier) transductive setting through the lens of Cover's algorithm

Consider the simplified setting where the expert advice $x_1, \ldots, x_n \in \{\pm 1\}^N$ is fixed and known a priori, and let $\mathcal{F} = \{x \mapsto x(j) : j \in [N]\}$ be the set of $N$ functions that simply output a coordinate of $x$. As discussed earlier, $\mathcal{F}$ induces a subset $F \subseteq \{\pm 1\}^n$ of finite cardinality at most $N$,
\[
F = \big\{ f : f = (x_1(j), \ldots, x_n(j)),\; j \in [N] \big\},
\]
and
\[
\phi(y_{1:n}) = \frac{1}{n} d_H(y_{1:n}, F) + C = \min_{f \in F} \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}\{f_t \ne y_t\} + C \tag{18}
\]
for some appropriate $C$ which we will define later. In this section, we will directly solve for the real-valued prediction $q_t \in [-1, 1]$, which can be viewed as the mean of the mixed strategy for predicting $\hat{y}_t \in \{\pm 1\}$. This is slightly different from what was described earlier, where the prediction is calculated by mixing the advice as $\langle \hat{y}_t, x_t \rangle$. We will solve the latter directly in the next section. Recall that the choice of relaxation Rel defines the algorithm. In this section we give the derivation using the very basic technique that goes back to Lecture 1. The algorithm that arises from Cover's lemma is not exponential weights, but it gives the same guarantee on performance as the exponential weights method.
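For concreteness, the exponential weights method just analyzed can be simulated in a few lines. The sketch below (function and variable names are ours) runs the weight update on a random sequence and checks the guarantee of Lemma 1 with $c = \sqrt{8}$:

```python
import math
import random

def exponential_weights(xs, ys, eta):
    """Exponential weights: expert weights decay in cumulative absolute loss.

    xs[t] is the advice vector in [-1,1]^N, ys[t] the outcome in {-1,+1}.
    Returns (average loss of the mixed strategy, average loss of best expert).
    """
    N, n = len(xs[0]), len(xs)
    cum = [0.0] * N           # cumulative loss sum_s |y_s - x_s(j)| of expert j
    alg = 0.0                 # cumulative loss of the algorithm
    for t in range(n):
        m = min(cum)          # shift by min(cum) for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        Z = sum(w)
        pred = sum((wi / Z) * xi for wi, xi in zip(w, xs[t]))  # <y_hat_t, x_t>
        alg += abs(ys[t] - pred)
        for j in range(N):
            cum[j] += abs(ys[t] - xs[t][j])
    return alg / n, min(cum) / n

# Random (not adversarial) data; the bound holds for any sequence.
random.seed(0)
n, N = 400, 8
xs = [[random.choice([-1.0, 1.0]) for _ in range(N)] for _ in range(n)]
ys = [random.choice([-1, 1]) for _ in range(n)]
avg, best = exponential_weights(xs, ys, math.sqrt(math.log(N) / (2 * n)))
```

On random data the actual regret is typically far below the worst-case bound $\sqrt{8 \log N / n}$, but the bound itself requires no assumption on the sequence.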
Let us take $\mathrm{Rel}(y_{1:n})$ to be any upper bound on the benchmark term. Since $\mathbb{1}\{f_t \ne y_t\} = \frac{1 - f_t y_t}{2}$, up to an additive constant the benchmark is
\[
-\min_{f \in F} \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}\{f_t \ne y_t\} = -\frac{1}{2} + \max_{f \in F} \frac{1}{2n} \langle f, y_{1:n} \rangle. \tag{19}
\]
We will use the soft-max upper bound (verify that it holds):
\[
\mathrm{Rel}(y_{1:n}) = \frac{1}{2n\eta} \log \sum_{f \in F} \exp\{\eta \langle f, y_{1:n} \rangle\} \;\ge\; \max_{f \in F} \frac{1}{2n} \langle f, y_{1:n} \rangle. \tag{20}
\]
Check that this function does not change by more than $\frac{1}{n}$ when flipping one bit. Now, as before,
\[
\min_{q_n} \max_{y_n \in \{\pm 1\}} \Big\{ \frac{1}{n}\, \mathbb{E}\,\mathbb{1}\{\hat{y}_n \ne y_n\} + \mathrm{Rel}(y_{1:n}) \Big\} = \mathbb{E}_{\epsilon}\,\mathrm{Rel}(y_{1:n-1}, \epsilon) + \frac{1}{2n}, \tag{21}
\]
where $\hat{y}_n \in \{\pm 1\}$ is drawn with mean $q_n$, so that $\mathbb{E}\,\mathbb{1}\{\hat{y}_n \ne y_n\} = \frac{1 - q_n y_n}{2}$. By Jensen's inequality ($\mathbb{E} \log \le \log \mathbb{E}$),
\[
\mathbb{E}_{\epsilon}\,\mathrm{Rel}(y_{1:n-1}, \epsilon) \le \frac{1}{2n\eta} \log \mathbb{E}_{\epsilon} \sum_{f \in F} \exp\{\eta \langle f, \tilde{y} \rangle\} \tag{22}
\]
with $\tilde{y} = (y_{1:n-1}, \epsilon)$. The only randomness in the above expression is the $\epsilon$ on the last coordinate of $\tilde{y}$. Let us abuse the notation and write $\langle f, \tilde{y} \rangle = \langle f, y_{1:n-1} \rangle + \epsilon f_n$. Our aim is to get rid of $\epsilon$. If we succeed, we do not have to draw the random coin flips for random playout at the intermediate steps. By (4),
\[
\mathbb{E}_{\epsilon} \exp\{\eta \langle f, \tilde{y} \rangle\} \le \exp\{\eta \langle f, y_{1:n-1} \rangle\} \exp\{\eta^2/2\} \tag{23}
\]
and, therefore, (22) is upper bounded by
\[
\frac{1}{2n\eta} \log \sum_{f \in F} \exp\{\eta \langle f, y_{1:n-1} \rangle\} + \frac{\eta}{4n}. \tag{24}
\]
In view of (21), we can now define
\[
\mathrm{Rel}(y_{1:n-1}) = \frac{1}{2n\eta} \log \sum_{f \in F} \exp\{\eta \langle f, y_{1:n-1} \rangle\} + \frac{\eta}{4n} \tag{25}
\]
(the leftover $\frac{1}{2n}$ terms in (21) accumulate to $\frac{1}{2}$ over the $n$ rounds and are offset by the $-\frac{1}{2}$ in (19); this bookkeeping is the constant $C$ introduced with φ above). That's it! There is no $\epsilon$ in the relaxation at time $n$. We peeled it off. One can check that at the intermediate step $t$,
\[
\mathrm{Rel}(y_{1:t}) = \frac{1}{2n\eta} \log \sum_{f \in F} \exp\{\eta \langle f_{1:t}, y_{1:t} \rangle\} + \frac{(n - t)\eta}{4n} \tag{26}
\]
with $\langle f_{1:t}, y_{1:t} \rangle$ defined as $\sum_{s=1}^{t} f_s y_s$. We also see that
\[
\mathrm{Rel}(\emptyset) = \frac{1}{2n\eta} \log \sum_{f \in F} \exp\{0\} + \frac{\eta}{4} = \frac{\log N}{2n\eta} + \frac{\eta}{4} = \sqrt{\frac{\log N}{2n}} \tag{27}
\]
by choosing $\eta = \sqrt{\frac{2 \log N}{n}}$. This is a non-algorithmic derivation, and the algorithm is given in Lecture 1. We leave it as a homework exercise to write it explicitly (hint: it does not become exponential weights). We also note that the difference in the constant $c$ comes from the scaling of the indicator loss vs. the absolute value loss.
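One way to make the derivation algorithmic (a sketch of our own construction, not necessarily the intended homework solution) is to output the $q_t$ that balances the two possible outcomes of round $t$, namely $q_t = n\,\big(\mathrm{Rel}(y_{1:t-1}, +1) - \mathrm{Rel}(y_{1:t-1}, -1)\big)$; this lies in $[-1,1]$ because the relaxation changes by at most $1/n$ per flipped bit, and the additive term counting future rounds cancels in the difference. All names below are ours:

```python
import math
import random

def relaxation_predictions(F, ys, eta):
    """Predict q_t = n * (Rel(y_{1:t-1}, +1) - Rel(y_{1:t-1}, -1)),
    where Rel is the soft-max part of the intermediate relaxation.
    F is a list of expert sign sequences f in {-1,+1}^n; ys are outcomes."""
    n = len(ys)

    def rel(prefix):
        # (1 / (2 n eta)) * log sum_f exp(eta * <f, prefix>)
        vals = [eta * sum(f[s] * prefix[s] for s in range(len(prefix))) for f in F]
        m = max(vals)  # shift by the max for numerical stability
        return (m + math.log(sum(math.exp(v - m) for v in vals))) / (2 * n * eta)

    return [n * (rel(ys[:t] + [1]) - rel(ys[:t] + [-1])) for t in range(n)]

random.seed(1)
n, N = 64, 6
F = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(N)]
ys = [random.choice([-1, 1]) for _ in range(n)]
qs = relaxation_predictions(F, ys, math.sqrt(2 * math.log(N) / n))
# expected indicator loss of the randomized prediction with mean q_t
avg_loss = sum((1 - q * y) / 2 for q, y in zip(qs, ys)) / n
best = min(sum(f[t] != ys[t] for t in range(n)) for f in F) / n
```

Note that this rule is not exponential weights: the prediction depends on the soft-max at two hypothetical continuations, not on a fixed distribution over experts.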
1.3 Discussion

The two proofs are essentially the same. Both start by relaxing the max to a soft-max, and taking this as Rel. Then, the second approach explicitly solves for the optimal real-valued prediction (the mean of the mixed strategy), while the first approach guesses a (potentially suboptimal) strategy of exponential weights. Once the strategy for the infimum is plugged in, one obtains an expression with a zero-mean random variable. This zero-mean variable is eliminated using a probabilistic inequality ((5) and (4), respectively). To reiterate, the salient features of the proofs are: (1) passing to a relaxation Rel, (2) solving for the best strategy or guessing a near-best strategy, and (3) using probabilistic inequalities to remove the random variable that arises from plugging in the strategy. These steps can be taken as a rough prescription for the development of online methods. We will illustrate the steps again in the subsequent lectures.

The next note is on the nature of the sequence $x_1, \ldots, x_n$. Essentially, both approaches make it irrelevant how the $x_t$'s are generated. That is not to say that the method does not take the side information into account (of course it does, through the losses of the experts). Rather, the point is that we can successfully deal with adversarially generated $x$'s, a strength of the experts approach. Another strength of the experts bound is its mild (logarithmic) dependence on the number of experts. One may take a large number of experts and still have an average error that is $o(1)$ away from the average error of the best expert.

Finally, we remark that the experts approach can be seen as a union bound, or an aggregation procedure. Suppose one has $N$ algorithms making predictions. Then one can predict as well as any of these algorithms by paying $O\big(\sqrt{\log N / n}\big)$. Such a black-box technique is very useful (see the version of linearized experts below for the general black-box statement). For instance, suppose one does not know how to choose a parameter $\theta \in [0, 1]$ of an algorithm optimally.
One can then run (at least in principle) $N = 1/\epsilon$ algorithms corresponding to an $\epsilon$-discretization of the parameter. If the output is in some sense Lipschitz with respect to the parameter choice, one can claim that the resulting aggregation procedure does as well as the best parameter choice, plus an $\epsilon$-precision term, plus an $O\big(\sqrt{\log(1/\epsilon) / n}\big)$ penalty.

1.4 Linearized Experts

Recall that in the experts setting introduced in the beginning of the lecture, we observe predictions $x_t \in [-1,1]^N$ of the experts, choose a distribution $\hat{y}_t \in \Delta(N)$ for the weighted vote, and then observe the outcome $y_t \in \{\pm 1\}$. Observe that the exponential weights algorithm at time $t$ does not use $x_t$ to calculate the distribution over experts. Hence, we may think of a setting where we choose a distribution $\hat{y}_t \in \Delta(N)$ and then observe both the predictions $x_t$ and the true outcome $y_t$. Rather than mixing the advice of the experts to produce our own, we may instead choose an expert at random from the distribution $\hat{y}_t$ and go with her advice. Then the expected cost for the period $t$ is $\langle \hat{y}_t, z_t \rangle$, where $z_t \in [-1,1]^N$ is the vector of losses of the experts: $z_t(j) = |y_t - x_t(j)| - 1 = -y_t x_t(j)$, shifted so that it lies in $[-1,1]$ (a common shift of all coordinates does not affect the regret). In fact, the loss function does not matter anymore, and it does not matter that the data comes in the form of $(x_t, y_t)$ pairs. Instead, we may just think of each expert incurring some cost, while we are
choosing an expert at random and incurring the same cost as that expert. In expectation, we pay $\langle \hat{y}_t, z_t \rangle$. Let us state the protocol explicitly:

For $t = 1, \ldots, n$:
    Predict $\hat{y}_t \in \Delta(N)$
    Observe costs $z_t \in [-1, 1]^N$

Alternatively, we may choose a random expert $j$ according to $\hat{y}_t$ and pay $z_t(j)$. The goal here is to have small expected cost, relative to the cost of the best expert:
\[
\frac{1}{n} \sum_{t=1}^{n} \langle \hat{y}_t, z_t \rangle \le \min_{j \in [N]} \frac{1}{n} \sum_{t=1}^{n} \langle e_j, z_t \rangle + c\sqrt{\frac{\log N}{n}} \tag{28}
\]
for any sequence $z_1, \ldots, z_n$ of costs. The cost $z_t$ may be chosen even with the knowledge of our decision $\hat{y}_t$. Let us quickly prove that the exponential weights algorithm
\[
\hat{y}_t(j) \propto \exp\Big\{-\eta \sum_{s=1}^{t-1} z_s(j)\Big\}
\]
achieves the above guarantee. We write the last step of the problem (removing the $\frac{1}{n}$ normalization) as
\[
\min_{\hat{y}_n \in \Delta(N)} \max_{z_n \in [-1,1]^N} \Big\{ \langle \hat{y}_n, z_n \rangle + \mathrm{Rel}(z_{1:n}) \Big\} \tag{29}
\]
with
\[
-\min_{j \in [N]} \sum_{t=1}^{n} z_t(j) = \max_{j \in [N]} \sum_{t=1}^{n} \big(-z_t(j)\big) \le \frac{1}{\eta} \log \sum_{j=1}^{N} \exp\Big\{-\eta \sum_{t=1}^{n} z_t(j)\Big\} = \mathrm{Rel}(z_{1:n}). \tag{30}
\]
The rest of the proof of (28) is essentially identical to the proof of Lemma 1.
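The protocol above is easy to exercise in code. The sketch below (names ours) runs the cost-based exponential weights update on arbitrary cost vectors in $[-1,1]^N$ and checks the guarantee with the conservative constant $c = \sqrt{8}$:

```python
import math
import random

def hedge(zs, eta):
    """Exponential weights over cost vectors zs[t] in [-1,1]^N.

    Returns (total expected cost sum_t <y_hat_t, z_t>,
             total cost of the single best expert)."""
    N = len(zs[0])
    cum = [0.0] * N           # cumulative cost of each expert
    total = 0.0
    for z in zs:
        m = min(cum)          # shift by min(cum) for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum]
        Z = sum(w)
        total += sum((wi / Z) * zi for wi, zi in zip(w, z))
        cum = [c + zi for c, zi in zip(cum, z)]
    return total, min(cum)

random.seed(2)
n, N = 500, 10
zs = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(n)]
total, best = hedge(zs, math.sqrt(math.log(N) / (2 * n)))
```

Note that the distribution on round $t$ depends only on costs up to $t - 1$, which is exactly why the costs may be chosen by an adversary who sees $\hat{y}_t$.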
More informationCS / MCS 401 Homework 3 grader solutions
CS / MCS 401 Homework 3 grader solutios assigmet due July 6, 016 writte by Jāis Lazovskis maximum poits: 33 Some questios from CLRS. Questios marked with a asterisk were ot graded. 1 Use the defiitio of
More informationExponential Families and Bayesian Inference
Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where
More informationRandom Models. Tusheng Zhang. February 14, 2013
Radom Models Tusheg Zhag February 14, 013 1 Radom Walks Let me describe the model. Radom walks are used to describe the motio of a movig particle (object). Suppose that a particle (object) moves alog the
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationGlivenko-Cantelli Classes
CS28B/Stat24B (Sprig 2008 Statistical Learig Theory Lecture: 4 Gliveko-Catelli Classes Lecturer: Peter Bartlett Scribe: Michelle Besi Itroductio This lecture will cover Gliveko-Catelli (GC classes ad itroduce
More informationSequences I. Chapter Introduction
Chapter 2 Sequeces I 2. Itroductio A sequece is a list of umbers i a defiite order so that we kow which umber is i the first place, which umber is i the secod place ad, for ay atural umber, we kow which
More informationLecture 2 February 8, 2016
MIT 6.854/8.45: Advaced Algorithms Sprig 206 Prof. Akur Moitra Lecture 2 February 8, 206 Scribe: Calvi Huag, Lih V. Nguye I this lecture, we aalyze the problem of schedulig equal size tasks arrivig olie
More informationOutput Analysis (2, Chapters 10 &11 Law)
B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe
More informationDiscrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions
CS 70 Discrete Mathematics for CS Sprig 2005 Clacy/Wager Notes 21 Some Importat Distributios Questio: A biased coi with Heads probability p is tossed repeatedly util the first Head appears. What is the
More informationWHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT
WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still
More informationMath 234 Test 1, Tuesday 27 September 2005, 4 pages, 30 points, 75 minutes.
Math 34 Test 1, Tuesday 7 September 5, 4 pages, 3 poits, 75 miutes. The high score was 9 poits out of 3, achieved by two studets. The class average is 3.5 poits out of 3, or 77.5%, which ordiarily would
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationLecture 12: November 13, 2018
Mathematical Toolkit Autum 2018 Lecturer: Madhur Tulsiai Lecture 12: November 13, 2018 1 Radomized polyomial idetity testig We will use our kowledge of coditioal probability to prove the followig lemma,
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURE 0. Preamble This course will focus o olie methods i machie learig. Roughly speakig, olie methods are those that process oe datum at a time. This
More informationLesson 10: Limits and Continuity
www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals
More informationECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002
ECE 330:541, Stochastic Sigals ad Systems Lecture Notes o Limit Theorems from robability Fall 00 I practice, there are two ways we ca costruct a ew sequece of radom variables from a old sequece of radom
More informationDiscrete Mathematics and Probability Theory Spring 2012 Alistair Sinclair Note 15
CS 70 Discrete Mathematics ad Probability Theory Sprig 2012 Alistair Siclair Note 15 Some Importat Distributios The first importat distributio we leared about i the last Lecture Note is the biomial distributio
More information