6.883: Online Methods in Machine Learning Alexander Rakhlin
LECTURE 4

This lecture is partly based on chapters 14-15 in [SSBD14]. Let us now give a variant of SGD for strongly convex functions.

Algorithm 1 SGD for strongly convex functions
Input: σ > 0 (strong convexity parameter)
Init: w_1 = 0
for t = 1, ..., T do
    w_{t+1} = w_t − (1/(σt)) ∇_t, where E_t[∇_t] ∈ ∂f(w_t)
end for

Lemma 1. If f is strongly convex with parameter σ and E‖∇_i‖² ≤ G² for all i ∈ [T], then the average of the trajectory ŵ = (1/T) Σ_{t=1}^T w_t satisfies

    E[f(ŵ)] − f(w*) ≤ (G²/(2σT)) (1 + log T).

The proof is a small modification of the gradient descent lemma from the previous lecture. We prove the result for non-stochastic gradients, and leave the stochastic version as an exercise.

Proof. Following the proof of the gradient descent lemma, but with time-varying η_t, we get

    ⟨∇f(w_t), w_t − w*⟩ = (1/(2η_t)) [‖w_t − w*‖² − ‖w_{t+1} − w*‖²] + (η_t/2) ‖∇f(w_t)‖².    (1)

However, there is now an additional negative term coming from strong convexity:

    f(w_t) − f(w*) ≤ ⟨∇f(w_t), w_t − w*⟩ − (σ/2) ‖w_t − w*‖².

This term will give us the faster 1/T convergence rate. Averaging,

    f(ŵ) − f(w*) ≤ (1/T) Σ_{t=1}^T [f(w_t) − f(w*)],

which is upper bounded by

    (1/T) Σ_{t=1}^T { (1/(2η_t)) [‖w_t − w*‖² − ‖w_{t+1} − w*‖²] − (σ/2) ‖w_t − w*‖² + (η_t/2) ‖∇f(w_t)‖² }.    (2)

With η_t = 1/(σt), the bracketed differences telescope against the strong convexity terms, and this upper bound is at most (1/T) Σ_{t=1}^T (1/(2σt)) ‖∇f(w_t)‖² ≤ (G²/(2σT)) Σ_{t=1}^T 1/t ≤ (G²/(2σT)) (1 + log T).
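To make Algorithm 1 concrete, here is a minimal NumPy sketch (not part of the original notes); the subgradient oracle `grad`, the dimension `d`, and all names are illustrative assumptions.

```python
import numpy as np

def sgd_strongly_convex(grad, sigma, T, d, rng=None):
    """Algorithm 1: w_{t+1} = w_t - (1/(sigma*t)) * g_t, where
    grad(w, rng) returns a stochastic subgradient whose expectation
    lies in the subdifferential of f at w. Returns the trajectory
    average w_hat = (1/T) * sum_t w_t, as analyzed in Lemma 1."""
    rng = rng if rng is not None else np.random.default_rng()
    w = np.zeros(d)        # init: w_1 = 0
    w_sum = np.zeros(d)    # running sum for the trajectory average
    for t in range(1, T + 1):
        w_sum += w
        g = grad(w, rng)             # stochastic subgradient at w_t
        w = w - g / (sigma * t)      # step size eta_t = 1/(sigma*t)
    return w_sum / T
```

For instance, for f(w) = (σ/2)‖w − c‖² with some target vector c, one may take `grad = lambda w, rng: sigma * (w - c) + rng.normal(size=d)`: the noise is mean-zero, so the oracle is unbiased and the averaged iterate approaches c at the rate of Lemma 1.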
A few remarks:

- The logarithmic factor can be removed by averaging only the second half of the trajectory, or by putting some other nonuniform weights on the trajectory.
- In practice, the last iterate is quite good, and possibly better than the average. Analysis for this last iterate was done in [SZ12].
- Once again, we may incorporate a Euclidean projection step onto a convex set after each update. In this case, the guarantee is with respect to w* in that set.

0.1 Full gradient for empirical objectives

Many offline (or "batch") problems in machine learning can be written as an empirical objective

    (1/n) Σ_{t=1}^n l(w, (x_t, y_t))    (3)

or a regularized version of it

    (1/n) Σ_{t=1}^n l(w, (x_t, y_t)) + λ R(w)    (4)

for some penalty R, tradeoff parameter λ, and a function l that measures how well w explains the relationship between x and y. For instance, for finding a low-error linear separator in the non-separable case, we may try to perform gradient descent on

    f(w) = (1/n) Σ_{t=1}^n max{0, 1 − y_t ⟨w, x_t⟩}.    (5)

This would be a non-stochastic gradient descent, but each iteration requires one to compute an element of ∂f(w_t). We may take ∇ = (1/n) Σ_{t=1}^n ∇_t, where ∇_t is a subgradient of the t-th loss. For the hinge loss case, a subgradient (with respect to example i) can be written as

    −y_i x_i 1{y_i ⟨w, x_i⟩ < 1}.    (6)

The procedure amounts to running through the whole dataset to calculate the full gradient, and then making one step. Time complexity, in terms of gradient evaluations, to obtain an ɛ-accurate solution is then

    n R² ‖w*‖² / ɛ²

where R = max_i ‖x_i‖. Check that R comes from the bound on the gradient of the loss. Unfortunately, the bound scales with the size n of our dataset, and one opts for the SGD procedure. Technically speaking, there are better analyses of SGD that may even get the log(1/ɛ) dependence on target accuracy under additional assumptions on the functions. However, it has been argued in the last decade (both empirically and theoretically) that the ability to process a larger n is more important than attaining high accuracy for a limited n. That is, if the constraint is computation time, rather than the amount of data, one should opt for stochastic gradient descent [BB08, SSSSC11].
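To make the O(n) per-step cost of the full-gradient alternative concrete, here is a sketch (our own illustration, not from the notes) of one full-subgradient evaluation for the hinge-loss objective (5), built from the per-example subgradient (6):

```python
import numpy as np

def full_hinge_subgradient(w, X, y):
    """Subgradient of f(w) = (1/n) sum_t max(0, 1 - y_t <w, x_t>).
    Each example contributes -y_i x_i when y_i <w, x_i> < 1 and zero
    otherwise; a single call touches all n examples, i.e. O(n d) work."""
    margins = y * (X @ w)          # y_i <w, x_i> for all i at once
    active = margins < 1.0         # examples with a margin violation
    n = X.shape[0]
    return -(X[active] * y[active, None]).sum(axis=0) / n
```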
0.2 SGD for empirical objectives

In applying SGD to batch learning problems, one views the objective

    f(w) = (1/n) Σ_{t=1}^n l(w, (x_t, y_t))    (7)

(or the regularized version) as an expectation under the empirical distribution. If an index I is sampled uniformly at random from [n], then any ∇_I ∈ ∂l(w, (x_I, y_I)) has the property that

    E_{I∼Unif[n]}[∇_I] ∈ ∂f(w).    (8)

The time complexity of SGD for attaining an ɛ-minimizer of the objective is independent of n (!!), and the dependence on ɛ, of course, varies according to the properties of l. We remark that we proved convergence of SGD for general random unbiased subgradients, but we are applying it to a distribution of a very specific form. This has been exploited to improve the analysis and the dependence on ɛ. Instead of sampling from [n], it is common to permute the data and run over it in order, possibly several times. There have been several works trying to understand how different the random sampling is from cycling through the data.

0.3 SGD for Support Vector Machines

Recall that the hinge loss penalizes data points close to the boundary, and thus pushes the hyperplane to have a large margin. This is not an entirely precise statement, since the very notion of a margin of size 1 was tied to the fact that w is a minimal-norm vector. Hence, the objective (5) is not what we want to minimize. Instead, we need a bi-criterion form

    min_w ( (1/n) Σ_{t=1}^n max{0, 1 − y_t ⟨w, x_t⟩}, ‖w‖² ).    (9)

That is, we want to minimize the loss and the norm at the same time. There are several ways to combine the bi-criteria into a single one. Here is one:

    min_w (1/n) Σ_{t=1}^n max{0, 1 − y_t ⟨w, x_t⟩} + (λ/2) ‖w‖².    (10)

This is known as the Support Vector Machine (a fancy name is a must in machine learning!) for the case of linear kernels (more on this later). One more caveat is that the SVM does not penalize the scalar shift of the nonhomogeneous hyperplane; in the formulation (10), however, this shift is absorbed in w and the norm penalizes it. Before the large-scale problems came about, SVMs were solved as a constrained quadratic programming problem. Pegasos, the SGD solution to this objective, proposed by [SSSSC11] (see also [Zha04]), has been very influential in practical applications with large datasets.

For the randomly chosen example i, the subgradient of the SVM objective is

    ∇_t = −y_i x_i 1{y_i ⟨w_t, x_i⟩ < 1} + λ w_t.    (11)

To apply SGD it remains to decide on the step size. Since the objective is λ-strongly convex due to the regularization term, we choose the step size

    η_t = 1/(λt)    (12)

and apply SGD for strongly convex objectives.
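In code, one such stochastic step might look as follows (a sketch under our own naming conventions; the update is just (11) with step size (12)):

```python
import numpy as np

def pegasos_step(w, t, X, y, lam, rng):
    """One stochastic step on the SVM objective (10): sample i ~ Unif[n],
    form the subgradient (11), and move with eta_t = 1/(lam*t)."""
    i = rng.integers(X.shape[0])                      # i ~ Unif[n]
    violated = y[i] * (X[i] @ w) < 1.0                # margin condition
    g = lam * w - (y[i] * X[i] if violated else 0.0)  # subgradient (11)
    return w - g / (lam * t)                          # eta_t = 1/(lam*t)
```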
Suppose we substitute the potentially suboptimal choice w = 0 in (10). The value of the objective is then at most 1. Hence, the optimal solution w* should give an objective value no greater than that. That implies

    (λ/2) ‖w*‖² ≤ 1,    (13)

or ‖w*‖ ≤ √(2/λ) (with a bit more work, 2 can be replaced by 1). The SGD algorithm may add a projection step onto a ball of this radius to help guide the search with this extra information about the location of w*. [SSSSC11] reports that the projection step makes little difference in the experiments they performed. We summarize the Pegasos algorithm below.

Algorithm 2 SGD for SVM objective (Pegasos)
Input: λ > 0 (regularization parameter)
Init: w_1 = 0
for t = 1, ..., T do
    Set η_t = 1/(λt)
    Sample i ∼ Unif[n]
    if y_i ⟨w_t, x_i⟩ < 1 then
        w_{t+1} = (1 − η_t λ) w_t + η_t y_i x_i
    else
        w_{t+1} = (1 − η_t λ) w_t
    end if
    Optionally, rescale w_{t+1} to have norm at most √(2/λ)
end for

To apply Lemma 1 on convergence of SGD for strongly convex functions, we need to calculate bounds on the gradients ∇_t. Observe that the hinge loss is R-Lipschitz, where R = max_i ‖x_i‖. Furthermore, the update of SGD can be written succinctly as

    w_{t+1} = (1/(λt)) Σ_{s=1}^t y_{i_s} x_{i_s} 1{y_{i_s} ⟨w_s, x_{i_s}⟩ < 1}    (14)

(prove this by unwinding the recursion), where i_s is the index chosen at step s. In particular, this implies that ‖w_{t+1}‖ ≤ R/λ for all iterates. The Lipschitz constant of the overall function is then upper bounded by 2R, and the convergence guarantee of Pegasos for the average ŵ of the trajectory is

    E f(ŵ) − f(w*) ≤ (4R²/(λT)) (1 + log T)    (15)

where f is the SVM objective in (10).

0.4 Mini-batching

A common practice (including in SGD for deep neural nets) is to take a small batch of data, evaluate the average gradient with respect to these data, and then update the parameter. Mini-batching presents a natural interpolation between the full gradient (all gradients at once) and the single-gradient SGD as stated above. This has the effect of reducing the variance of the gradients while still being computationally cheap (and independent of n).
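Putting Algorithm 2 and the mini-batching idea together, here is a minimal sketch; the batch-size parameter `k` and the rest of the interface are our own illustration (k = 1 recovers the single-example Pegasos above):

```python
import numpy as np

def pegasos(X, y, lam, T, k=1, project=True, rng=None):
    """Pegasos (Algorithm 2) with an optional mini-batch of size k.
    Returns the trajectory average, as analyzed in Lemma 1."""
    rng = rng if rng is not None else np.random.default_rng()
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    radius = np.sqrt(2.0 / lam)          # ||w*|| <= sqrt(2/lam), see (13)
    for t in range(1, T + 1):
        w_sum += w
        eta = 1.0 / (lam * t)            # step size (12)
        idx = rng.integers(n, size=k)    # mini-batch of uniform indices
        viol = y[idx] * (X[idx] @ w) < 1.0           # margin violations
        g_loss = -(X[idx][viol] * y[idx][viol, None]).sum(axis=0) / k
        w = (1.0 - eta * lam) * w - eta * g_loss     # Pegasos update
        if project:                      # optional rescaling step
            nrm = np.linalg.norm(w)
            if nrm > radius:
                w *= radius / nrm
    return w_sum / T
```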
0.5 Sparse updates

Examining (14), we see that we only need to keep track of the sum of the y_{i_s} x_{i_s} that led to a correction of the hyperplane. Hence, if the x's are s-sparse, the update can be implemented in time O(s) rather than O(d). This becomes handy, for instance, in document classification with the bag-of-words (or related) sparse representation.

0.6 Equivalent form of SVM objective

A form that one may encounter in the literature is

    min_{w,b,ξ} (1/n) Σ_{t=1}^n ξ_t + (λ/2) ‖w‖²    (16)
    subject to  y_t (⟨w, x_t⟩ + b) ≥ 1 − ξ_t    (17)
                ξ_t ≥ 0    (18)

References

[BB08] Olivier Bousquet and Léon Bottou. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, pages 161-168, 2008.

[SSBD14] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[SSSSC11] Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1):3-30, 2011.

[SZ12] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. arXiv preprint arXiv:1212.1824, 2012.

[Zha04] Tong Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, page 116. ACM, 2004.