Optimally Sparse SVMs
A. Proof of Lemma 3.1

We here prove a lower bound on the number of support vectors needed to achieve generalization bounds of the form which we consider. Importantly, this result holds not only for linear classifiers, but also for randomized classifiers of a more general form than those produced by our sparsification procedure (Section 4).

Lemma 3.1. Let $R, L^*, \epsilon \ge 0$ be given, with $L^* + \epsilon \le 1/4$ and with $R^2$ being an integer. There exists a data distribution $D$ and a reference vector $u$ such that $\|u\| = R$, $L^{\mathrm{hinge}}(g_u) = L^*$, and any $w$ which satisfies
\[ L^{0/1}(g_w) \le L^* + \epsilon \]
must necessarily be supported on at least $R^2/2$ vectors. Furthermore, the claim also holds for randomized classification rules that predict $1$ with probability $\psi(g_w(x))$ for some $\psi : \mathbb{R} \to [0, 1]$.

Proof. We define $D$ such that $i$ is sampled uniformly at random from the set $\{1, \dots, d\}$, with $d = R^2$, and the feature vector is taken to be $x = e_i$ (the $i$th standard unit basis vector) with corresponding label distributed according to $\Pr\{y = z\} = 1 - L^*/2$. The value of $z \in \{\pm 1\}$ will be specified later. Choose $u_i = z$ for all $i$, so that $\|u\| = R$ and $L^{\mathrm{hinge}}(g_u) = L^*$.

Take $w$ to be a linear combination of $k < d/2 = R^2/2$ vectors. Then $g_w(x) = 0$ on any $x$ which is not in its support set. Suppose that whenever $g_w(x_i) = 0$ the algorithm predicts the label $1$ with probability $p \in [0, 1]$ ($p = \psi(0)$ for a randomized classifier). If $p \ge 1/2$ we'll set $z = -1$, and if $p < 1/2$ we'll set $z = 1$. With this choice, the prediction on any unsupported $x$ is wrong with probability at least $1/2$, which implies that:
\[ L^{0/1}(g_w) \ge \frac{d - k}{2d} > \frac{1}{4} \ge L^* + \epsilon \]
which concludes the proof.

B. Compression Bound

We rely on the following compression bound (Theorem 2 of Shalev-Shwartz 2010).

Theorem B.1. Let $k$ and $n$ be fixed, with $n \ge 2k$, and let $A : (\mathbb{R}^d \times \{\pm 1\})^k \to \mathcal{H}$ be a mapping which receives a list of $k$ labeled training examples, and returns a classification vector $w \in \mathcal{H}$. Use $S \in [n]^k$ to denote a list of $k$ training indices, and let $w_S$ be the result of applying $A$ to the training elements indexed by $S$. Finally, let $\ell : \mathbb{R} \to [0, 1]$ be a loss function bounded below by $0$ and above by $1$, with $L(g_w)$ and $\hat{L}(g_w)$ the expected loss, and empirical loss on the training set, respectively.
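Then, with probability $1 - \delta$, for all $S$:
\[ L(g_{w_S}) \le \hat{L}(g_{w_S}) + \sqrt{\frac{32\, \hat{L}(g_{w_S}) \left( k \log n + \log \frac{1}{\delta} \right)}{n}} + \frac{8 \left( k \log n + \log \frac{1}{\delta} \right)}{n} \]

Proof. Consider, for some fixed $\delta'$, the probability that there exists an $S \subseteq \{1, \dots, n\}$ of size $k$ such that:
\[ L(g_{w_S}) > \hat{L}_{\mathrm{test}}(g_{w_S}) + \sqrt{\frac{2\, \hat{L}_{\mathrm{test}}(g_{w_S}) \log \frac{1}{\delta'}}{n - k}} + \frac{4 \log \frac{1}{\delta'}}{n - k} \]
where $\hat{L}_{\mathrm{test}}(g_w) = \frac{1}{n - k} \sum_{i \notin S} \ell(y_i g_w(x_i))$ is the empirical loss on the complement of $S$. It follows from Bernstein's inequality that, for a particular $S$, the above holds with probability at most $\delta'$. By the union bound:
\[ n^k \delta' \ge \Pr\left\{ \exists S \in [n]^k : L(g_{w_S}) > \hat{L}_{\mathrm{test}}(g_{w_S}) + \sqrt{\frac{2\, \hat{L}_{\mathrm{test}}(g_{w_S}) \log \frac{1}{\delta'}}{n - k}} + \frac{4 \log \frac{1}{\delta'}}{n - k} \right\} \]
Let $\delta = n^k \delta'$. Notice that $(n - k)\, \hat{L}_{\mathrm{test}}(g_{w_S}) \le n\, \hat{L}(g_{w_S})$, so:
\[ \delta \ge \Pr\left\{ \exists S \in [n]^k : L(g_{w_S}) > \frac{n}{n - k} \hat{L}(g_{w_S}) + \sqrt{\frac{2 n\, \hat{L}(g_{w_S}) \log \frac{n^k}{\delta}}{(n - k)^2}} + \frac{4 \log \frac{n^k}{\delta}}{n - k} \right\} \]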
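Because $\hat{L}(g_{w_S}) \le 1$ and $k \le n$, it follows that $k\, \hat{L}(g_{w_S}) \le 2 \log \frac{n^k}{\delta}$, and therefore that
\[ \frac{k}{n - k} \hat{L}(g_{w_S}) \le \sqrt{\frac{2 k\, \hat{L}(g_{w_S}) \log \frac{n^k}{\delta}}{(n - k)^2}} \le \sqrt{\frac{2 n\, \hat{L}(g_{w_S}) \log \frac{n^k}{\delta}}{(n - k)^2}}. \]
Hence:
\[ \delta \ge \Pr\left\{ \exists S \in [n]^k : L(g_{w_S}) > \hat{L}(g_{w_S}) + \sqrt{\frac{8 n\, \hat{L}(g_{w_S}) \log \frac{n^k}{\delta}}{(n - k)^2}} + \frac{4 \log \frac{n^k}{\delta}}{n - k} \right\} \]
Using the assumption that $n \ge 2k$ (so that $n - k \ge n/2$), together with $\log \frac{n^k}{\delta} = k \log n + \log \frac{1}{\delta}$, completes the proof.

C. Concentration-based Analysis

In this section, we will prove a bound comparable to that of Theorem 4.4, but using a proof technique based on a smooth loss, rather than a compression bound. In order to accomplish this, we must first modify the objective of Problem 4.1 by adding a norm-constraint:
\[ \begin{aligned} \text{minimize:} \quad & f(\tilde{w}) = \max_{i : y_i \langle w, \Phi(x_i) \rangle > 0} \left( h_i - y_i \langle \tilde{w}, \Phi(x_i) \rangle \right) \\ \text{subject to:} \quad & \|\tilde{w}\| \le \|w\| \end{aligned} \tag{C.1} \]
Here, as before, $h_i = \min(1, y_i \langle w, \Phi(x_i) \rangle)$. Like Problem 4.1, this objective can be optimized using subgradient descent, although one must add a step in which the current iterate is projected onto the ball of radius $\|w\|$ after every iteration. Despite this change, an $\epsilon$-suboptimal solution can still be found in $\|w\|^2 / \epsilon^2$ iterations.

The concentration-based version of our main theorem follows:

Theorem C.1. Let $R \in \mathbb{R}_+$ be fixed. With probability $1 - \delta$ over the training sample, uniformly over all pairs $w, \tilde{w} \in \mathcal{H}$ such that $\|w\| \le R$ and $\tilde{w}$ has objective function $f(\tilde{w}) \le 1/3$ in Problem C.1:
\[ L^{0/1}(\tilde{g}_{\tilde{w}}) \le \hat{L}^{\mathrm{hinge}}(g_w) + O\left( \sqrt{\frac{\hat{L}^{\mathrm{hinge}}(g_w)\, R^2 \log^3 n}{n}} + \sqrt{\frac{\hat{L}^{\mathrm{hinge}}(g_w) \log \frac{1}{\delta}}{n}} + \frac{R^2 \log^3 n}{n} + \frac{\log \frac{1}{\delta}}{n} \right) \]

Proof. Because our bound is based on a smooth loss, we begin by defining the bounded 4-smooth loss $\ell_{\mathrm{smooth}}(z)$ to be $1$ if $z < -1/2$, $0$ if $z > 2/3$, and $\frac{1}{2}\left( 1 + \cos\left( \frac{6\pi}{7} \left( z + \frac{1}{2} \right) \right) \right)$ otherwise. This function is illustrated in Figure 3. Notice that it upper-bounds the slant-loss, and lower-bounds the hinge loss even when shifted by $1/3$.

Figure 3. Illustration of how our smooth loss relates to the slant and hinge losses. Our smooth loss (green) upper bounds the slant-loss, and lower bounds the slant-loss when shifted by 1/6, and the hinge-loss when shifted by 1/3.

Applying Theorem 1 of Srebro et al. 2010 to this smooth loss yields that, with probability $1 - \delta$, uniformly over all $\tilde{w}$ such that $\|\tilde{w}\| \le R$:
\[ L^{\mathrm{smooth}}(g_{\tilde{w}}) \le \hat{L}^{\mathrm{smooth}}(g_{\tilde{w}}) + O\left( \sqrt{\frac{\hat{L}^{\mathrm{smooth}}(g_{\tilde{w}})\, R^2 \log^3 n}{n}} + \sqrt{\frac{\hat{L}^{\mathrm{smooth}}(g_{\tilde{w}}) \log \frac{1}{\delta}}{n}} + \frac{R^2 \log^3 n}{n} + \frac{\log \frac{1}{\delta}}{n} \right) \]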
Just as the empirical slant-loss of a $\tilde{w}$ with $f(\tilde{w}) \le 1/2$ is upper bounded by the empirical hinge loss of $w$, the empirical smooth loss of a $\tilde{w}$ with $f(\tilde{w}) \le 1/3$ is upper-bounded by the same quantity. As was argued in the proof of Lemma 4.1, this follows directly from Problem C.1, and the definition of the smooth loss. Combining this with the facts that the slant-loss lower bounds the smooth loss, and that $L^{\mathrm{slant}}(g_{\tilde{w}}) = L^{0/1}(\tilde{g}_{\tilde{w}})$, completes the proof.

It's worth pointing out that the addition of a norm-constraint to the objective function (Problem C.1) is only necessary because we want the theorem to apply to any $\tilde{w}$ with $f(\tilde{w}) \le 1/3$. If we restrict ourselves to $\tilde{w}$ which are found using subgradient descent with the suggested step size and iteration count, then applying the triangle inequality to the sequence of steps yields that $\|\tilde{w}\| \le O(\|w\|)$, and the above bound still holds (albeit with a different constant hidden inside the big-Oh notation).

D. Unregularized Bias Alternative

In Section 4.6, we discussed a simple extension of our algorithm to an SVM problem with an unregularized bias term, in which we took our sparse classifier $(\tilde{w}, \tilde{b})$ to have the same bias as our target classifier $(w, b)$ (i.e. $\tilde{b} = b$). In this section, we discuss an alternative, in which we optimize over $\tilde{b}$ during our subgradient descent procedure. The relevant optimization problem (analogous to Problem 4.1) is:
\[ \begin{aligned} \text{minimize:} \quad & f(\tilde{w}, \tilde{b}) = \max_{i : y_i (\langle w, \Phi(x_i) \rangle + b) > 0} \left( h_i - y_i \left( \langle \tilde{w}, \Phi(x_i) \rangle + \tilde{b} \right) \right) \\ \text{with:} \quad & h_i = \min\left( 1, y_i \left( \langle w, \Phi(x_i) \rangle + b \right) \right) \end{aligned} \tag{D.1} \]
A $1/2$-approximation may once more be found using subgradient descent. The difference is that, before finding a subgradient, we will implicitly optimize over $\tilde{b}$. It can be easily observed that the optimal $\tilde{b}$ will ensure that:
\[ \max_{i : y_i > 0 \,\wedge\, \langle w, \Phi(x_i) \rangle + b > 0} \left( h_i - \left( \langle \tilde{w}, \Phi(x_i) \rangle + \tilde{b} \right) \right) = \max_{i : y_i < 0 \,\wedge\, \langle w, \Phi(x_i) \rangle + b < 0} \left( h_i + \left( \langle \tilde{w}, \Phi(x_i) \rangle + \tilde{b} \right) \right) \tag{D.2} \]
In other words, $\tilde{b}$ will be chosen such that the maximal violation among the set of positive examples will equal that among the negative examples. Hence, during optimization, we may find the most violating pair of one positive and one negative example, and then take a step on both elements. The resulting subgradient descent algorithm is:
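1. Find the training indices $i_+ : y_{i_+} > 0 \wedge \langle w, \Phi(x_{i_+}) \rangle + b > 0$ and $i_- : y_{i_-} < 0 \wedge \langle w, \Phi(x_{i_-}) \rangle + b < 0$ which maximize $h_i - y_i \langle \tilde{w}^{(t-1)}, \Phi(x_i) \rangle$.
2. Take the subgradient step $\tilde{w}^{(t)} \leftarrow \tilde{w}^{(t-1)} + \eta \left( \Phi(x_{i_+}) - \Phi(x_{i_-}) \right)$.

Once optimization has completed, $\tilde{b}$ may be computed from Equation D.2. As before, this algorithm will find a $1/2$-approximation in $4 \|w\|^2$ iterations.

E. Sample Complexity of SVM

In this appendix, we provide a brief proof of a claim based on Lemma D.1 of the appendix of Cotter et al. 2012b, which is the long version of Cotter et al. 2012a. This result, which follows almost immediately from Theorem 1 of Srebro et al. 2010, establishes the sample complexity bound claimed in Section 2.

Lemma E.1. (See Lemma D.1 of Cotter et al. 2012b) Let $u$ be an arbitrary linear classifier, and suppose that we sample a training set of size $n$, with $n$ given by the following equation, for parameters $\epsilon > 0$ and $\delta \in (0, 1)$:
\[ n = \tilde{O}\left( \frac{L^{\mathrm{hinge}}(g_u) + \epsilon}{\epsilon} \cdot \frac{\|u\|^2 \log \frac{1}{\delta}}{\epsilon} \right) \tag{E.1} \]
Let $\hat{w} = \operatorname{argmin}_{w : \|w\| \le \|u\|} \hat{L}^{\mathrm{hinge}}(g_w)$. Then, with probability $1 - 2\delta$ over the i.i.d. training sample $\{(x_i, y_i) : i \in \{1, \dots, n\}\}$, we have that $L^{0/1}(g_{\hat{w}}) \le L^{\mathrm{hinge}}(g_u) + 2\epsilon$.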
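The most-violating-pair procedure of Appendix D (steps 1 and 2, followed by recovering the bias from Equation D.2) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: it uses a linear kernel ($\Phi(x) = x$), starts from $\tilde{w} = 0$, treats the step size `eta` and the iteration count as free parameters, and returns the best iterate seen. The function name `sparsify_with_bias` and all variable names are hypothetical and do not come from the paper.

```python
import numpy as np

def sparsify_with_bias(X, y, w, b, eta, num_iters):
    """Sketch of most-violating-pair subgradient descent for Problem D.1.

    Assumes a linear kernel (Phi(x) = x). Returns (objective, w_tilde, b_tilde)
    for the best iterate seen; keeping the best iterate is a choice made here.
    """
    margins = y * (X @ w + b)
    active = margins > 0                       # examples the target (w, b) gets right
    h = np.minimum(1.0, margins)               # h_i = min(1, y_i(<w, x_i> + b))
    pos = np.flatnonzero(active & (y > 0))
    neg = np.flatnonzero(active & (y < 0))
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one active example of each class")

    w_tilde = np.zeros(X.shape[1])
    best = None
    for t in range(num_iters + 1):
        s = X @ w_tilde
        viol_pos = h[pos] - s[pos]             # positive-class violation, bias excluded
        viol_neg = h[neg] + s[neg]             # negative-class violation, bias excluded
        jp, jn = np.argmax(viol_pos), np.argmax(viol_neg)
        a, c = viol_pos[jp], viol_neg[jn]
        # Equation D.2: a - b_tilde = c + b_tilde, so b_tilde = (a - c) / 2,
        # and the objective at that bias is (a + c) / 2.
        f = 0.5 * (a + c)
        if best is None or f < best[0]:
            best = (f, w_tilde.copy(), 0.5 * (a - c))
        if t < num_iters:
            # Step 2: move toward the worst positive and away from the worst negative.
            w_tilde = w_tilde + eta * (X[pos[jp]] - X[neg[jn]])
    return best
```

Each step adds at most two training points to the expansion of $\tilde{w}$, so the number of iterations bounds the number of support vectors, which is the point of the procedure.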
Figure 4. Plot of a smooth and bounded function (red) which upper bounds the 0/1 loss and lower bounds the hinge loss.

To prove this, we will first prove two helper lemmas: Lemma E.2 is a direct application of Theorem 1 of Srebro et al. 2010 to a smooth function which is intermediate between the 0/1 and hinge losses (this is similar to Theorem 5 of Srebro et al. 2010); Lemma E.3 analyzes the empirical error of a single hypothesis by a direct application of Bernstein's inequality. Combining these two lemmas (Section E.2) then gives the claimed result.

E.1. Helper Lemmas

Lemma E.2. Suppose that we sample a training set of size $n$, with $n$ given by the following equation, for parameters $L^*, B, \epsilon > 0$ and $\delta \in (0, 1)$:
\[ n = \tilde{O}\left( \frac{L^* + \epsilon}{\epsilon} \cdot \frac{B^2 \log \frac{1}{\delta}}{\epsilon} \right) \tag{E.2} \]
Then, with probability $1 - \delta$ over the i.i.d. training sample $\{(x_i, y_i) : i \in \{1, \dots, n\}\}$, uniformly for all linear classifiers $w$ satisfying:
\[ \|w\| \le B, \qquad \hat{L}^{\mathrm{hinge}}(g_w) \le L^* \tag{E.3} \]
we have that $L^{0/1}(g_w) \le L^* + \epsilon$.

Proof. For a smooth loss function, Theorem 1 of Srebro et al. 2010 bounds the expected loss in terms of the empirical loss, plus a factor depending on (among other things) the sample size. Neither the 0/1 nor the hinge loss is smooth, so we will define a bounded and smooth loss function which upper bounds the 0/1 loss and lower-bounds the hinge loss. The particular function which we use doesn't matter, since its smoothness parameter and upper bound will ultimately be absorbed into the big-Oh notation; all that is needed is the existence of such a function. One such is:
\[ \phi(x) = \begin{cases} 5/4 & x < -1/2 \\ -x^2 - x + 1 & -1/2 \le x < 0 \\ x^3 - x^2 - x + 1 & 0 \le x < 1 \\ 0 & 1 \le x \end{cases} \]
This function, illustrated in Figure 4, is 4-smooth and 5/4-bounded. If we define $L_\phi(g_w)$ and $\hat{L}_\phi(g_w)$ as the expected and empirical $\phi$-losses, respectively, then the aforementioned theorem gives that, with probability $1 - \delta$ uniformly over all $w$ such that $\|w\| \le B$:
\[ L_\phi(g_w) \le \hat{L}_\phi(g_w) + O\left( \frac{B^2 \log^3 n}{n} + \frac{\log \frac{1}{\delta}}{n} + \sqrt{\hat{L}_\phi(g_w) \left( \frac{B^2 \log^3 n}{n} + \frac{\log \frac{1}{\delta}}{n} \right)} \right) \]
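Because $\phi$ is lower-bounded by the 0/1 loss and upper-bounded by the hinge loss, we may replace $L_\phi(g_w)$ with $L^{0/1}(g_w)$ on the LHS of the above bound, and $\hat{L}_\phi(g_w)$ with $L^*$ on the RHS. Setting the big-Oh expression to $\epsilon$ and solving for $n$ then gives the desired result.

Lemma E.3. Let $u$ be an arbitrary linear classifier, and suppose that we sample a training set of size $n$, with $n$ given by the following equation, for parameters $\epsilon > 0$ and $\delta \in (0, 1)$:
\[ n = 2 \cdot \frac{L^{\mathrm{hinge}}(g_u) + \epsilon}{\epsilon} \cdot \frac{\|u\| \log \frac{1}{\delta}}{\epsilon} \tag{E.4} \]
Then, with probability $1 - \delta$ over the i.i.d. training sample $\{(x_i, y_i) : i \in \{1, \dots, n\}\}$, we have that $\hat{L}^{\mathrm{hinge}}(g_u) \le L^{\mathrm{hinge}}(g_u) + \epsilon$.

Proof. The hinge loss is upper-bounded by $\|u\|$ (by assumption, $\|x\| \le 1$ with probability 1), from which it follows that $\mathrm{Var}_{x,y}\left( \ell(y \langle u, x \rangle) \right) \le \|u\|\, L^{\mathrm{hinge}}(g_u)$. Hence, by Bernstein's inequality:
\[ \Pr\left\{ \hat{L}^{\mathrm{hinge}}(g_u) > L^{\mathrm{hinge}}(g_u) + \epsilon \right\} \le \exp\left( - \frac{n \epsilon^2 / 2}{\|u\| \left( L^{\mathrm{hinge}}(g_u) + \epsilon/3 \right)} \right) \le \exp\left( - \frac{n \epsilon^2}{2 \|u\| \left( L^{\mathrm{hinge}}(g_u) + \epsilon \right)} \right) \]
Setting the LHS to $\delta$ and solving for $n$ gives the desired result.

E.2. Proof of Lemma E.1

Proof. Lemma E.3 gives that $\hat{L}^{\mathrm{hinge}}(g_u) \le L^{\mathrm{hinge}}(g_u) + \epsilon$ provided that Equation E.4 is satisfied. Take $L^* = L^{\mathrm{hinge}}(g_u) + \epsilon$ and $B = \|u\|$, and observe that $\hat{w}$ satisfies Equation E.3 because $\hat{L}^{\mathrm{hinge}}(g_{\hat{w}}) \le \hat{L}^{\mathrm{hinge}}(g_u) \le L^{\mathrm{hinge}}(g_u) + \epsilon = L^*$ and $\|\hat{w}\| \le \|u\| = B$. Therefore, Lemma E.2 gives that $L^{0/1}(g_{\hat{w}}) \le L^{\mathrm{hinge}}(g_u) + 2\epsilon$, provided that Equation E.2 is also satisfied. Equation E.1 is what results from combining these two bounds and simplifying. Each lemma holds with probability $1 - \delta$, so this result holds with probability $1 - 2\delta$.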
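The properties of the piecewise function $\phi$ used in the proof of Lemma E.2 (5/4-bounded, 4-smooth, above the 0/1 loss, below the hinge loss) can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the grid, the tolerances, and the use of a second finite difference as a stand-in for the second derivative are all choices made here for illustration.

```python
import numpy as np

def phi(x):
    """Piecewise loss from the proof of Lemma E.2."""
    x = np.asarray(x, dtype=float)
    return np.select(
        [x < -0.5, x < 0.0, x < 1.0],          # first matching condition wins
        [np.full_like(x, 1.25), -x**2 - x + 1.0, x**3 - x**2 - x + 1.0],
        default=0.0,
    )

xs = np.linspace(-3.0, 3.0, 6001)              # grid with spacing 1e-3
vals = phi(xs)
zero_one = (xs <= 0).astype(float)             # 0/1 loss as a function of the margin
hinge = np.maximum(0.0, 1.0 - xs)              # hinge loss as a function of the margin

# Second finite difference approximates the second derivative; 4-smoothness
# means the derivative is 4-Lipschitz, i.e. this stays within [-4, 4].
second_diff = np.diff(vals, 2) / (xs[1] - xs[0]) ** 2

print("max of phi:", vals.max())
print("phi >= 0/1 loss on grid:", bool(np.all(vals >= zero_one - 1e-12)))
print("phi <= hinge loss on grid:", bool(np.all(vals <= hinge + 1e-12)))
print("max |second difference|:", np.abs(second_diff).max())
```

A grid check like this cannot replace the proof, but it catches transcription errors in the piecewise definition, since a wrong sign or boundary breaks at least one of the four properties.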
Regression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More information1 Review and Overview
CS9T/STATS3: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #6 Scribe: Jay Whag ad Patrick Cho October 0, 08 Review ad Overview Recall i the last lecture that for ay family of scalar fuctios F, we
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationIntroduction to Machine Learning DIS10
CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationHomework Set #3 - Solutions
EE 15 - Applicatios of Covex Optimizatio i Sigal Processig ad Commuicatios Dr. Adre Tkaceko JPL Third Term 11-1 Homework Set #3 - Solutios 1. a) Note that x is closer to x tha to x l i the Euclidea orm
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More informationBinary classification, Part 1
Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More information1 Duality revisited. AM 221: Advanced Optimization Spring 2016
AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R
More informationDifferentiable Convex Functions
Differetiable Covex Fuctios The followig picture motivates Theorem 11. f ( x) f ( x) f '( x)( x x) ˆx x 1 Theorem 11 : Let f : R R be differetiable. The, f is covex o the covex set C R if, ad oly if for
More informationRecursive Algorithms. Recurrences. Recursive Algorithms Analysis
Recursive Algorithms Recurreces Computer Sciece & Egieerig 35: Discrete Mathematics Christopher M Bourke cbourke@cseuledu A recursive algorithm is oe i which objects are defied i terms of other objects
More informationThe Maximum-Likelihood Decoding Performance of Error-Correcting Codes
The Maximum-Lielihood Decodig Performace of Error-Correctig Codes Hery D. Pfister ECE Departmet Texas A&M Uiversity August 27th, 2007 (rev. 0) November 2st, 203 (rev. ) Performace of Codes. Notatio X,
More informationLinear Regression Demystified
Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationTest One (Answer Key)
CS395/Ma395 (Sprig 2005) Test Oe Name: Page 1 Test Oe (Aswer Key) CS395/Ma395: Aalysis of Algorithms This is a closed book, closed otes, 70 miute examiatio. It is worth 100 poits. There are twelve (12)
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationSupplemental Material: Proofs
Proof to Theorem Supplemetal Material: Proofs Proof. Let be the miimal umber of traiig items to esure a uique solutio θ. First cosider the case. It happes if ad oly if θ ad Rak(A) d, which is a special
More informationCS / MCS 401 Homework 3 grader solutions
CS / MCS 401 Homework 3 grader solutios assigmet due July 6, 016 writte by Jāis Lazovskis maximum poits: 33 Some questios from CLRS. Questios marked with a asterisk were ot graded. 1 Use the defiitio of
More informationLecture #20. n ( x p i )1/p = max
COMPSCI 632: Approximatio Algorithms November 8, 2017 Lecturer: Debmalya Paigrahi Lecture #20 Scribe: Yua Deg 1 Overview Today, we cotiue to discuss about metric embeddigs techique. Specifically, we apply
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationb i u x i U a i j u x i u x j
M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here
More informationSequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018
CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical
More informationRecurrence Relations
Recurrece Relatios Aalysis of recursive algorithms, such as: it factorial (it ) { if (==0) retur ; else retur ( * factorial(-)); } Let t be the umber of multiplicatios eeded to calculate factorial(). The
More informationRandomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)
Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black
More information1 Generating functions for balls in boxes
Math 566 Fall 05 Some otes o geeratig fuctios Give a sequece a 0, a, a,..., a,..., a geeratig fuctio some way of represetig the sequece as a fuctio. There are may ways to do this, with the most commo ways
More informationAlgebra of Least Squares
October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationIP Reference guide for integer programming formulations.
IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more
More informationAssignment 5: Solutions
McGill Uiversity Departmet of Mathematics ad Statistics MATH 54 Aalysis, Fall 05 Assigmet 5: Solutios. Let y be a ubouded sequece of positive umbers satisfyig y + > y for all N. Let x be aother sequece
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationSignal Processing. Lecture 02: Discrete Time Signals and Systems. Ahmet Taha Koru, Ph. D. Yildiz Technical University.
Sigal Processig Lecture 02: Discrete Time Sigals ad Systems Ahmet Taha Koru, Ph. D. Yildiz Techical Uiversity 2017-2018 Fall ATK (YTU) Sigal Processig 2017-2018 Fall 1 / 51 Discrete Time Sigals Discrete
More informationSeunghee Ye Ma 8: Week 5 Oct 28
Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationThe Random Walk For Dummies
The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More informationSupport Vector Machines and Kernel Methods
Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,
More informationLearning Bounds for Support Vector Machines with Learned Kernels
Learig Bouds for Support Vector Machies with Leared Kerels Nati Srebro TTI-Chicago Shai Be-David Uiversity of Waterloo Mostly based o a paper preseted at COLT 06 Kerelized Large-Margi Liear Classificatio
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationGeometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT
OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationChapter 6 Infinite Series
Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat
More informationACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory
1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.
More informationSection 1.1. Calculus: Areas And Tangents. Difference Equations to Differential Equations
Differece Equatios to Differetial Equatios Sectio. Calculus: Areas Ad Tagets The study of calculus begis with questios about chage. What happes to the velocity of a swigig pedulum as its positio chages?
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationPolynomial Functions and Their Graphs
Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively
More informationNICK DUFRESNE. 1 1 p(x). To determine some formulas for the generating function of the Schröder numbers, r(x) = a(x) =
AN INTRODUCTION TO SCHRÖDER AND UNKNOWN NUMBERS NICK DUFRESNE Abstract. I this article we will itroduce two types of lattice paths, Schröder paths ad Ukow paths. We will examie differet properties of each,
More informationZeros of Polynomials
Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree
More informationMachine Learning for Data Science (CS 4786)
Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm
More informationSummary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector
Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short
More informationBertrand s Postulate
Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a
More informationLecture 2: April 3, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,
More informationINEQUALITIES BJORN POONEN
INEQUALITIES BJORN POONEN 1 The AM-GM iequality The most basic arithmetic mea-geometric mea (AM-GM) iequality states simply that if x ad y are oegative real umbers, the (x + y)/2 xy, with equality if ad
More informationRandom Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices
Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio
More informationStochastic Matrices in a Finite Field
Stochastic Matrices i a Fiite Field Abstract: I this project we will explore the properties of stochastic matrices i both the real ad the fiite fields. We first explore what properties 2 2 stochastic matrices
More informationProperties and Tests of Zeros of Polynomial Functions
Properties ad Tests of Zeros of Polyomial Fuctios The Remaider ad Factor Theorems: Sythetic divisio ca be used to fid the values of polyomials i a sometimes easier way tha substitutio. This is show by
More informationLecture 2. The Lovász Local Lemma
Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio
More informationTopics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion
.87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses
More informationTechnical Proofs for Homogeneity Pursuit
Techical Proofs for Homogeeity Pursuit bstract This is the supplemetal material for the article Homogeeity Pursuit, submitted for publicatio i Joural of the merica Statistical ssociatio. B Proofs B. Proof
More informationOptimization Methods MIT 2.098/6.255/ Final exam
Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short
More informationSNAP Centre Workshop. Basic Algebraic Manipulation
SNAP Cetre Workshop Basic Algebraic Maipulatio 8 Simplifyig Algebraic Expressios Whe a expressio is writte i the most compact maer possible, it is cosidered to be simplified. Not Simplified: x(x + 4x)
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
Design and Analysis of Algorithms
Design and Analysis of Algorithms Probabilistic analysis and Randomized algorithms Reference: CLRS Chapter 5 Topics: Hiring problem Indicator random variables Randomized algorithms Huo Hongwei 1 The hiring problem
It is often useful to approximate complicated functions using simpler ones. We consider the task of approximating a function by a polynomial.
Taylor Polynomials and Taylor Series It is often useful to approximate complicated functions using simpler ones. We consider the task of approximating a function by a polynomial. If f is at least n-times differentiable
Introductory Analysis I Fall 2014 Homework #7 Solutions
Introductory Analysis I Fall 2014 Homework #7 Solutions Note: There were a couple of typos/omissions in the formulation of this homework. Some of them were, I believe, quite obvious. The fact that the statement
McGill University Math 354: Honors Analysis 3 Fall 2012 Solutions to selected problems
McGill University Math 354: Honors Analysis 3 Fall 2012 Assignment 3 Solutions to selected problems Problem 1. Lipschitz functions. Let $\mathrm{Lip}_K$ be the set of all continuous functions on [0, 1] satisfying a Lipschitz
Definition 4.2. (a) A sequence $\{x_n\}$ in a Banach space $X$ is a basis for $X$ if for every $x \in X$ there exist unique scalars $a_n(x)$ such that $x = \sum_n a_n(x)\, x_n$. (4.
4. BASES IN BANACH SPACES Since a Banach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset $\{x_\gamma\}_{\gamma \in \Gamma}$ whose finite linear span is all of X and
Lecture 19: Convergence
Lecture 19: Convergence Asymptotic approach In statistical analysis or inference, a key to the success of finding a good procedure is being able to find some moments and/or distributions of various statistics. In many
Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Tolstikhin Ilya Abstract We will introduce the notion of reproducing kernels and associated Reproducing Kernel Hilbert Spaces (RKHS). We will consider couple
Linear Elliptic PDE's Elliptic partial differential equations frequently arise out of conservation statements of the form
Linear Elliptic PDE's Elliptic partial differential equations frequently arise out of conservation statements of the form $\int_{\partial B} F \cdot n \, d\sigma = \int_B S \, dx$ for every $B$ contained in a bounded open set $U \subset \mathbb{R}^n$. Here F, S denote respectively, the flux density
Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: Ada Boost, Risk Bounds, Concentration Inequalities Lecturer: Peter Bartlett Scribe: Subhransu Maji 1 AdaBoost and Estimates of Conditional Probabilities We
6.867 Machine learning, lecture 7 (Jaakkola)
6.867 Machine learning, lecture 7 (Jaakkola) Lecture topics: Kernel form of linear regression Kernels, examples, construction, properties Linear regression and kernels Consider a slightly simpler model where we omit
Maximum Likelihood Estimation and Complexity Regularization
ECE901 Spring 2004 Statistical Regularization and Learning Theory Lecture: 4 Maximum Likelihood Estimation and Complexity Regularization Lecturer: Rob Nowak Scribe: Pam Limpiti Review: Maximum Likelihood Estimation
Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution
EEL5: Discrete-Time Signals and Systems. Introduction In this set of notes, we begin our mathematical treatment of discrete-time systems. As shown in Figure, a discrete-time system operates on or transforms some input sequence x [
A remark on p-summing norms of operators
A remark on p-summing norms of operators Artem Zvavitch Abstract. In this paper we improve a result of W. B. Johnson and G. Schechtman by proving that the p-summing norm of any operator with n-dimensional domain can be
MATH 304: MIDTERM EXAM SOLUTIONS
MATH 304: MIDTERM EXAM SOLUTIONS [The problems are each worth five points, except for problem 8, which is worth 8 points. Thus there are 43 possible points.] 1. Use the Euclidean algorithm to find the greatest
Chapter 7 Isoperimetric problem
Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the introduction for its connection with Dido's problem) is one of the most classical problems of shape optimization. It can be formulated
Lecture 15: Learning Theory: Concentration Inequalities
STAT 425: Introduction to Nonparametric Statistics Winter 2018 Lecture 15: Learning Theory: Concentration Inequalities Instructor: Yen-Chi Chen 15.1 Introduction Recall that in the lecture on classification, we have seen that
Polynomial identity testing and global minimum cut
CHAPTER 6 Polynomial identity testing and global minimum cut In this lecture we will consider two further problems that can be solved using probabilistic algorithms. In the first half, we will consider the problem
Math 220A Fall 2007 Homework #2. Will Garner A
Math 220A Fall 2007 Homework #2 Will Garner Pg 3 #: Show that $\{\operatorname{cis} k : k \text{ a non-negative integer}\}$ is dense in $T = \{z \in \mathbb{C} : |z| = 1\}$. For which values of $\theta$ is $\{\operatorname{cis}(k\theta) : k \text{ a non-negative integer}\}$ dense in $T$? To show that {cis k : k a non-negative
Introduction to Optimization Techniques. How to Solve Equations
Introduction to Optimization Techniques How to Solve Equations Iterative Methods of Optimization Iterative methods of optimization Solution of the nonlinear equations resulting from an optimization problem is usually
Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Mingrui Liu Xiaoxuan Zhang Zaiyi Chen Xiaoyu Wang 3 Tianbao Yang Technical Lemmas ized version of Hoeffding's inequality, note that We