6.883: Online Methods in Machine Learning Alexander Rakhlin
LECTURE 23. SOME CONSEQUENCES OF ONLINE NO-REGRET METHODS

In this lecture, we explore some consequences of the developed techniques.

1.1 Convex optimization

Whenever we have an online linear optimization guarantee

    \frac{1}{n} \sum_{t=1}^n \langle \hat{y}_t, z_t \rangle - \min_{v \in K} \frac{1}{n} \sum_{t=1}^n \langle v, z_t \rangle \le \frac{Reg_n}{n}        (1)

that holds for all sequences z_1, \ldots, z_n \in K', then for any convex function f : K \to \mathbb{R} with gradients \{\nabla f(v) : v \in K\} contained in K', it holds that

    f(\bar{y}_n) - \min_{v \in K} f(v) \le \frac{Reg_n}{n}        (2)

for \bar{y}_n = \frac{1}{n} \sum_{t=1}^n \hat{y}_t. In other words, by running an online linear optimization method and averaging the trajectory (called Polyak averaging), we obtain a point in K with suboptimality bounded by the normalized regret Reg_n / n. For instance, if K = B_2 and the gradients of f are also bounded by 1 in Euclidean norm, the methods developed earlier give Reg_n = \sqrt{n}, and thus the average of the trajectory satisfies

    f(\bar{y}_n) - \min_{v \in K} f(v) \le \frac{1}{\sqrt{n}}.

Recall that this is a bound we already derived in the beginning of the course.

Let us prove the claim (2). By convexity, for any v \in K,

    f(\bar{y}_n) - f(v) \le \frac{1}{n} \sum_{t=1}^n f(\hat{y}_t) - f(v) \le \frac{1}{n} \sum_{t=1}^n \langle \hat{y}_t - v, \nabla f(\hat{y}_t) \rangle.        (3)

Then (2) follows from (1) by taking z_t = \nabla f(\hat{y}_t). This step is valid because z_t can be chosen based on \hat{y}_t (as per the online linear optimization protocol). We remark that (2) holds with Reg_n = Reg_n(z_1, \ldots, z_n). Indeed, adaptive data-dependent bounds for optimization have been derived via this approach.
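As a concrete illustration of this averaging scheme, here is a minimal sketch (our own, not from the lecture): projected online gradient descent on the unit Euclidean ball K = B_2, returning the Polyak average of the trajectory. The objective f, its gradient, and the step size \eta = 1/\sqrt{n} are choices invented for the example.

```python
import numpy as np

def ogd_polyak(grad_f, d, n, eta):
    """Projected online gradient descent on the unit Euclidean ball B_2.

    On round t we play y_t, receive z_t = grad_f(y_t), take a gradient
    step, and project back onto B_2. Returns the Polyak average of the
    played points y_1, ..., y_n.
    """
    y = np.zeros(d)
    y_bar = np.zeros(d)
    for _ in range(n):
        y_bar += y / n             # accumulate the average of played points
        z = grad_f(y)              # z_t may be chosen based on y_t
        y = y - eta * z            # gradient step
        norm = np.linalg.norm(y)
        if norm > 1.0:             # projection onto B_2
            y = y / norm
    return y_bar

# Example: f(v) = 0.5 ||v - v_star||^2 with a minimizer inside B_2 (hypothetical).
v_star = np.array([0.3, -0.4, 0.2])
n = 10000
y_bar = ogd_polyak(lambda y: y - v_star, d=3, n=n, eta=1.0 / np.sqrt(n))
suboptimality = 0.5 * np.linalg.norm(y_bar - v_star) ** 2
```

Per (2), the suboptimality of y_bar is bounded by the normalized regret, i.e., of order 1/\sqrt{n} in this setting.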
Finally, observe that a version of (3),

    \frac{1}{n} \sum_{t=1}^n f_t(\hat{y}_t) - \frac{1}{n} \sum_{t=1}^n f_t(v) \le \frac{1}{n} \sum_{t=1}^n \langle \hat{y}_t - v, \nabla f_t(\hat{y}_t) \rangle,        (4)

holds with time-varying convex functions f_t. Hence, a regret guarantee for online linear optimization translates into a regret guarantee for this online convex optimization scenario.

1.2 Zero-sum games

Let M be a d_1 \times d_2 real matrix with bounded entries. We think of M_{i,j} (resp., -M_{i,j}) as the cost for Player I (resp., Player II) when the two players choose their moves as i and j. As discussed a few lectures ago, the minimax value is

    \min_{q \in \Delta_{d_1}} \max_{j \in [d_2]} q^T M e_j = \max_{p \in \Delta_{d_2}} \min_{i \in [d_1]} e_i^T M p = \min_{q \in \Delta_{d_1}} \max_{p \in \Delta_{d_2}} q^T M p.        (5)

It turns out that both players can find a near-minimax-optimal strategy simply by employing an online linear optimization method with a non-trivial regret guarantee. More precisely, we assume that on round t, Player I picks a probability distribution q_t \in \Delta_{d_1} on the rows, while Player II picks a probability distribution p_t \in \Delta_{d_2} on the columns of M. The first player then receives M p_t (or a random draw M e_{J_t}, J_t \sim p_t), while the second player receives q_t^T M (or the corresponding random draw). Let \bar{q} = \frac{1}{n} \sum_{t=1}^n q_t and \bar{p} = \frac{1}{n} \sum_{t=1}^n p_t be the averages of the trajectories. Since each player employs a no-regret strategy,

    \frac{1}{n} \sum_{t=1}^n q_t^T M p_t - \min_{q \in \Delta_{d_1}} \frac{1}{n} \sum_{t=1}^n q^T M p_t \le \frac{Reg_n^{(1)}}{n}        (6)

and

    \frac{1}{n} \sum_{t=1}^n (-q_t^T M) p_t - \min_{p \in \Delta_{d_2}} \frac{1}{n} \sum_{t=1}^n (-q_t^T M) p \le \frac{Reg_n^{(2)}}{n}.        (7)

We put the minuses in the second expression because the second player can pretend she is the minimizing player in the game with losses given according to -M. Adding the two expressions and using the definitions of \bar{q} and \bar{p},

    \max_{p \in \Delta_{d_2}} \bar{q}^T M p - \min_{q \in \Delta_{d_1}} q^T M \bar{p} \le \frac{1}{n} \left( Reg_n^{(1)} + Reg_n^{(2)} \right).        (8)

However, we know that

    \min_{q \in \Delta_{d_1}} q^T M \bar{p} \le \max_{p \in \Delta_{d_2}} \min_{q \in \Delta_{d_1}} q^T M p = \min_{q \in \Delta_{d_1}} \max_{p \in \Delta_{d_2}} q^T M p \le \max_{p \in \Delta_{d_2}} \bar{q}^T M p.        (9)

In view of (8) and (9), the pair (\bar{q}, \bar{p}) attains a value within \frac{1}{n}(Reg_n^{(1)} + Reg_n^{(2)}) of minimax. In other words, the averages of the trajectories of both players converge (in the sense of the minimax value) to the set of minimax optima (Nash equilibria). This technique extends to multi-player games, whereby each decision-maker runs a no-regret algorithm against the (complicated) set of moves of the other players.
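The self-play dynamic can be sketched as follows (a hypothetical example, not from the lecture): both players run Exponential Weights, a no-regret method from earlier lectures, on a small matrix game invented for the demonstration; the horizon and learning rate are our own choices.

```python
import numpy as np

def selfplay(M, n, eta):
    """Both players run Exponential Weights on the game matrix M.

    Player I suffers costs M p_t; Player II suffers costs -q_t^T M.
    Returns the averaged strategies (q_bar, p_bar).
    """
    d1, d2 = M.shape
    loss_q = np.zeros(d1)              # cumulative costs for Player I
    loss_p = np.zeros(d2)              # cumulative costs for Player II
    q_bar, p_bar = np.zeros(d1), np.zeros(d2)
    for _ in range(n):
        q = np.exp(-eta * loss_q); q /= q.sum()
        p = np.exp(-eta * loss_p); p /= p.sum()
        q_bar += q / n
        p_bar += p / n
        loss_q += M @ p                # Player I observes M p_t
        loss_p += -(q @ M)             # Player II observes -q_t^T M
    return q_bar, p_bar

# A small game (invented) with minimax value 2/3 at q* = (1/3, 2/3), p* = (1/3, 2/3).
M = np.array([[0.0, 1.0], [1.0, 0.5]])
n = 5000
q_bar, p_bar = selfplay(M, n=n, eta=np.sqrt(np.log(2) / n))
gap = np.max(q_bar @ M) - np.min(M @ p_bar)    # duality gap, as in (8)
```

By (8), the gap is bounded by the sum of the normalized regrets, which for Exponential Weights is of order \sqrt{\log d / n}.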
1.2.1 What if Player II best-responds?

As an alternative (which will be used in the next section), suppose Player I employs a regret-minimization strategy, while Player II best-responds to the observed q_t^T M:

    p_t = \arg\max_{j \in [d_2]} q_t^T M e_j

(equivalently, the minimizer of her cost -q_t^T M e_j). This behavior also leads to the minimax strategy, since

    \max_{p \in \Delta_{d_2}} \bar{q}^T M p \le \frac{1}{n} \sum_{t=1}^n \max_{p \in \Delta_{d_2}} q_t^T M p = \frac{1}{n} \sum_{t=1}^n q_t^T M p_t        (10)

    \le \min_{q \in \Delta_{d_1}} \frac{1}{n} \sum_{t=1}^n q^T M p_t + \frac{Reg_n^{(1)}}{n} = \min_{q \in \Delta_{d_1}} q^T M \bar{p} + \frac{Reg_n^{(1)}}{n}.        (11)

In view of (9), the pair of strategies (\bar{q}, \bar{p}) is within Reg_n^{(1)}/n of minimax optimal. In particular, note that d_2 does not affect the rate of convergence; this will be important in the next section.

1.3 Application: Boosting

We briefly describe a variant of Boosting, a procedure that aggregates weak classifiers into a strong one. This is a batch procedure, and the online aspect will appear as an iterative improvement of the aggregate. More precisely, let (x_1, y_1), \ldots, (x_m, y_m) \in X \times \{\pm 1\} be a dataset, and F a class of [-1, 1]-valued functions on X. Each function f \in F induces a binary prediction sign(f(x_t)) at x_t. For the given pair (x_t, y_t), a mistake occurs if y_t f(x_t) < 0. Suppose the class F has finite cardinality k (though this assumption can be lifted at the end, since the final bound does not depend on this value) and enumerate F = \{f_1, \ldots, f_k\}. Define a matrix M \in [-1, 1]^{m \times k} with M_{i,j} = y_i f_j(x_i). Consider the minimax value of the game given by M, where q is a distribution on examples and p is a distribution on functions. We have

    \min_{q \in \Delta_m} \max_{j \in [k]} q^T M e_j = \max_{p \in \Delta_k} \min_{i \in [m]} e_i^T M p,        (12)

and we call this common value \gamma. Suppose \gamma > 0. The two sides of (12) have interesting and distinct interpretations. The left-hand side says: no matter what distribution q on the dataset we choose, there is some function f_j such that E_{i \sim q}\, y_i f_j(x_i) \ge \gamma. If you have encountered the literature on boosting (search for it on the internet if you haven't), the above requirement is known as the weak learnability condition [FS95, SF12]. To be able to boost weak learners, one must be able to find, for any discrete distribution on the m datapoints, a hypothesis that beats a random guess.

The right-hand side of (12) says: there exists a distribution p over the k functions such that on any data point (x_i, y_i), one has

    y_i \bar{f}(x_i) \ge \gamma,        (13)
for \bar{f} = \sum_{i=1}^k p_i f_i. This can be recognized as a separability assumption with margin \gamma (see Lecture 2). Thus, separability with a margin is equivalent to the weak learnability condition. It is no surprise that weak learnability means one can create a mixture of functions that attains zero mistakes.

Equivalence aside, the goal of boosting is to furnish a distribution p that nearly achieves (13). According to the previous subsection and, in particular, Eq. (11), one may achieve this by running exponential weighting (or some other regret minimization procedure) over the m choices for Player I, and best response for Player II. Best response exactly corresponds to the usual weak learnability assumption, whereby a choice f_j (a pure strategy) is returned to Player I for the given q_t, a key step in any boosting procedure.

It remains to check the number of iterations required. If the margin is \gamma, we can aim for achieving a margin of \gamma/2. Then, equating the Exponential Weights regret bound to the target accuracy \gamma/2,

    \frac{Reg_n^{(1)}}{n} = C \sqrt{\frac{\log m}{n}} = \frac{\gamma}{2}, \quad \text{we find} \quad n = O\!\left( \frac{\log m}{\gamma^2} \right).        (14)

The typical analysis of AdaBoost claims exponentially decaying error, which is expressed as

    \frac{1}{m} \sum_{i=1}^m 1\{ y_i \bar{f}(x_i) \le 0 \} \le \exp\{ -c \gamma^2 n \}        (15)

after n iterations. The setting of (14) ensures that the upper bound is less than 1/m, and hence no errors are made (convince yourself of this fact!). That is precisely what we found by ensuring a margin of \gamma/2. Observe that k = |F| never enters the picture, since we are using the best response strategy for Player II.

1.4 From Online to Statistical Learning (individual sequence to i.i.d.)

Consider an abstract scenario where the learner chooses \hat{y}_t, observes z_t, and incurs a cost \ell(\hat{y}_t, z_t). We assume \ell is convex in the first argument and that we are able to prove

    \frac{1}{n} \sum_{t=1}^n \ell(\hat{y}_t, z_t) \le \min_v \frac{1}{n} \sum_{t=1}^n \ell(v, z_t) + \frac{Reg_n}{n}        (16)

for any sequence z_1, \ldots, z_n. We are not specifying the sets where the decisions and outcomes take values, as the technique we are about to describe is quite general. Now, suppose the data are actually i.i.d. with some distribution P. Then

    E\left[ \frac{1}{n} \sum_{t=1}^n \ell(\hat{y}_t, Z_t) \right] \le E\left[ \min_v \frac{1}{n} \sum_{t=1}^n \ell(v, Z_t) \right] + \frac{Reg_n}{n}        (17)

since the bound holds for each realization (Z_1, \ldots, Z_n). Let \bar{y} = \frac{1}{n} \sum_{t=1}^n \hat{y}_t.
Observe that

    E\,\ell(\bar{y}, Z) \le E\left[ \frac{1}{n} \sum_{t=1}^n \ell(\hat{y}_t, Z) \right]        (18)

by convexity, where the expectation is over both Z and Z_1, \ldots, Z_n. The expression in (18) can be written as

    \frac{1}{n} \sum_{t=1}^n E\left[ E\left[ \ell(\hat{y}_t(Z^{t-1}), Z) \mid Z^{t-1} \right] \right]        (19)

which is the same as

    \frac{1}{n} \sum_{t=1}^n E\left[ E\left[ \ell(\hat{y}_t(Z^{t-1}), Z_t) \mid Z^{t-1} \right] \right] = E\left[ \frac{1}{n} \sum_{t=1}^n \ell(\hat{y}_t, Z_t) \right].        (20)

On the other hand,

    E\left[ \min_v \frac{1}{n} \sum_{t=1}^n \ell(v, Z_t) \right] \le \min_v \frac{1}{n} \sum_{t=1}^n E\,\ell(v, Z_t) = \min_v E\,\ell(v, Z).

Putting everything together,

    E\,\ell(\bar{y}, Z) \le \min_v E\,\ell(v, Z) + \frac{Reg_n}{n}.        (21)

Such a bound is a goal of Statistical Learning Theory. Moreover, if \ell is the square loss, the above bound also translates into an exact oracle inequality, a subject of interest in Statistics. We see that a regret guarantee gives rise to a statistical guarantee for the average of the trajectory if we assume that the data are i.i.d.

In particular, consider the supervised learning scenario, where at each step we observe side information x_t, predict a real value \hat{y}_t, and observe y_t. We can think of \hat{y}_t as a function of x_t, since for different x_t values we would have made a different prediction. In other words, an equivalent formulation would be: on round t, predict a function \hat{y}_t : X \to \mathbb{R} and observe (x_t, y_t). Applying the earlier argument, if (X_1, Y_1), \ldots, (X_n, Y_n) are i.i.d., then the function

    \bar{y}(\cdot) = \frac{1}{n} \sum_{t=1}^n \hat{y}_t(\cdot)

has the guarantee

    E\,\ell(\bar{y}(X), Y) \le \min_{f \in F} E\,\ell(f(X), Y) + \frac{Reg_n}{n}        (22)

whenever \ell(\bar{y}(X), Y) is convex in \bar{y}(\cdot). While further development of this technique is beyond the scope of this course, let us mention that under certain assumptions, this way of constructing an estimator \bar{y}(\cdot) is optimal for a wide range of problems and loss functions, both in Nonparametric Statistics and Statistical Learning Theory. (Ask me if you would like to know more.)

Those who know the analysis of ERM (empirical risk minimization) in Statistical Learning may ask whether we have miraculously circumvented the issue of uniform convergence of means to expectations, a necessary step in the analysis of ERM. The answer is that we haven't: it is hidden in the regret bound Reg_n = Reg_n(F). More precisely, it can be shown that this term is controlled by a martingale version of uniform convergence [RST15]. In many cases of interest, this convergence is equivalent to i.i.d. convergence. Hence, by passing through online methods, we might be gaining in terms of algorithmic understanding, and (under certain conditions) not losing in terms of accuracy for i.i.d. data.
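To make the online-to-batch conversion concrete, here is a small sketch of our own (the distribution, loss, and parameters are invented for the example): projected online gradient descent runs once over an i.i.d. sample with the squared loss \ell(v, z) = \frac{1}{2}\|v - z\|^2, and the averaged iterate \bar{y} is returned as the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 5000
mu = np.array([0.3, -0.2, 0.1, 0.25])            # E[Z]; risk minimizer over B_2
Z = mu + 0.3 * rng.uniform(-1, 1, size=(n, d))   # i.i.d. sample (hypothetical)

eta = 1.0 / np.sqrt(n)
y = np.zeros(d)
y_bar = np.zeros(d)
for z in Z:                        # single online pass over the data
    y_bar += y / n                 # average of the trajectory
    y = y - eta * (y - z)          # gradient step on l(., z) = 0.5||. - z||^2
    norm = np.linalg.norm(y)
    if norm > 1.0:                 # projection onto B_2
        y = y / norm

# Excess risk: E l(v, Z) = 0.5||v - mu||^2 + const, minimized at v = mu.
excess_risk = 0.5 * np.linalg.norm(y_bar - mu) ** 2
```

By (21), the excess risk of the averaged iterate is bounded by the normalized regret Reg_n / n, of order 1/\sqrt{n} here.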
1.5 Equivalence of deterministic regret bounds and martingale inequalities

The connections between online regret bounds and Statistics and Probability are even more intriguing than outlined above. Let us sketch one example. Let z_1, \ldots, z_n \in B_2 be an arbitrary sequence in the unit Euclidean ball. We construct the sequence of predictions by \hat{y}_1 = 0 and

    \hat{y}_{t+1} = \Pi_{B_2}(\hat{y}_t - \eta z_t),        (23)

which we can all recognize as gradient descent with projections \Pi_{B_2} onto B_2 and step size \eta = 1/\sqrt{n}. Earlier in the course we proved a regret bound of

    \sum_{t=1}^n \langle \hat{y}_t - v, z_t \rangle \le \sqrt{n}        (24)

for any z_1, \ldots, z_n \in B_2 and v \in B_2. Let us choose the optimal v and rearrange terms:

    \forall z_1, \ldots, z_n \in B_2, \quad \left\| \sum_{t=1}^n z_t \right\| \le \sqrt{n} - \sum_{t=1}^n \langle \hat{y}_t, z_t \rangle.        (25)

Now, apply this inequality to a realization of a martingale difference sequence Z_1, \ldots, Z_n (that is, E[Z_{t+1} \mid Z_1, \ldots, Z_t] = 0). Since the inequality holds for any realization,

    P\left( \left\| \sum_{t=1}^n Z_t \right\| \ge \sqrt{n} + u \right) \le P\left( -\sum_{t=1}^n \langle \hat{y}_t, Z_t \rangle \ge u \right)        (26)

for any u > 0. Since \hat{y}_t = \hat{y}_t(Z_1, \ldots, Z_{t-1}), the right-hand side concerns a sum of real-valued martingale differences with values in [-1, 1], and so it is upper bounded by \exp\{-u^2/(2n)\} via the Hoeffding-Azuma inequality. Therefore,

    P\left( \left\| \sum_{t=1}^n Z_t \right\| \ge \sqrt{n} + u \right) \le \exp\{ -u^2/(2n) \}.        (27)

One may pass from this statement in probability to an expected-value bound

    E\left\| \sum_{t=1}^n Z_t \right\| \le c\sqrt{n}        (28)

by integrating the tails in (27). Note that the inequality holds for all martingale difference sequences with values in B_2. Using the minimax theorem, one can show (we basically have done this already) that there exists a strategy with (25) for any sequence because (28) holds. That is, (25) implies (27), which implies (28), which implies (25) (with a worse constant). Hence, the statements are, in a certain sense, equivalent. One can, in fact (see the relevant paper [RS15]), improve the tail bounds (27) by taking a smarter gradient descent method with an adaptive step size, and, conversely, it is possible to derive new optimization methods by proving tail bounds for martingales.

This lecture was meant to give a brief overview of some interesting connections between online methods and the nearby areas of game theory, statistics, probability, and optimization.
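As a quick numerical illustration (our own sketch, not part of the lecture), one can check the deterministic inequality (25) on arbitrary sequences by running the projected gradient descent (23) and comparing the two sides; the random sequence below is invented for the example.

```python
import numpy as np

def check_inequality(zs):
    """Run gradient descent (23) on B_2 with step 1/sqrt(n).

    Returns (||sum_t z_t||, sqrt(n) - sum_t <y_t, z_t>); by the
    deterministic regret bound (24)-(25), the first is at most the second.
    """
    n, d = zs.shape
    eta = 1.0 / np.sqrt(n)
    y = np.zeros(d)                # y_1 = 0
    inner = 0.0
    for z in zs:
        inner += y @ z             # accumulate <y_t, z_t>
        y = y - eta * z            # gradient step
        norm = np.linalg.norm(y)
        if norm > 1.0:             # projection onto B_2
            y = y / norm
    return np.linalg.norm(zs.sum(axis=0)), np.sqrt(n) - inner

rng = np.random.default_rng(1)
n, d = 1000, 3
zs = rng.uniform(-1, 1, size=(n, d))
zs /= np.maximum(np.linalg.norm(zs, axis=1, keepdims=True), 1.0)  # force z_t into B_2
lhs, rhs = check_inequality(zs)
```

The inequality holds for every sequence in B_2, not just random ones; applied to martingale difference realizations, it yields the tail bound (27).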
References

[FS95] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory. Springer, 1995.

[RS15] Alexander Rakhlin and Karthik Sridharan. On equivalence of martingale tail bounds and deterministic regret inequalities. arXiv preprint arXiv:1510.03925, 2015.

[RST15] Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 161(1-2):111-153, 2015.

[SF12] Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2012.
More information4. Partial Sums and the Central Limit Theorem
1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems
More informationRecurrence Relations
Recurrece Relatios Aalysis of recursive algorithms, such as: it factorial (it ) { if (==0) retur ; else retur ( * factorial(-)); } Let t be the umber of multiplicatios eeded to calculate factorial(). The
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationNYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)
NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationBertrand s Postulate
Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationMarkov Decision Processes
Markov Decisio Processes Defiitios; Statioary policies; Value improvemet algorithm, Policy improvemet algorithm, ad liear programmig for discouted cost ad average cost criteria. Markov Decisio Processes
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationNotes 5 : More on the a.s. convergence of sums
Notes 5 : More o the a.s. covergece of sums Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: Dur0, Sectios.5; Wil9, Sectio 4.7, Shi96, Sectio IV.4, Dur0, Sectio.. Radom series. Three-series
More informationREAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS
REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS 18th Feb, 016 Defiitio (Lipschitz fuctio). A fuctio f : R R is said to be Lipschitz if there exists a positive real umber c such that for ay x, y i the domai
More informationA Proof of Birkhoff s Ergodic Theorem
A Proof of Birkhoff s Ergodic Theorem Joseph Hora September 2, 205 Itroductio I Fall 203, I was learig the basics of ergodic theory, ad I came across this theorem. Oe of my supervisors, Athoy Quas, showed
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationApplication to Random Graphs
A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let
More information