6.883: Online Methods in Machine Learning Alexander Rakhlin


LECTURE 4

This lecture is partly based on chapters 14-15 in [SSBD14].

Let us now give a variant of SGD for strongly convex functions.

Algorithm 1 SGD for strongly convex functions
  Input: σ > 0 (strong convexity parameter)
  Init: w_1 = 0
  for t = 1, ..., T do
    w_{t+1} = w_t − (1/(σt)) ∇_t,  where E_t[∇_t] ∈ ∂f(w_t)
  end for

Lemma 1. If f is strongly convex with parameter σ and E‖∇_t‖² ≤ G² for all t ∈ [T], then the average of the trajectory ŵ = (1/T) Σ_{t=1}^T w_t satisfies

    E[f(ŵ)] − f(w*) ≤ (G²/(2σT)) (1 + log T).

The proof is a small modification of the gradient descent lemma from the previous lecture. We prove the result for the non-stochastic gradient, and leave the stochastic version as an exercise.

Proof. Following the proof of the gradient descent lemma, but with time-varying η_t, we get

    ⟨∇f(w_t), w_t − w*⟩ ≤ (1/(2η_t)) [‖w_t − w*‖² − ‖w_{t+1} − w*‖²] + (η_t/2) ‖∇f(w_t)‖².    (1)

However, there is now an additional negative term coming from strong convexity,

    f(w_t) − f(w*) ≤ ⟨∇f(w_t), w_t − w*⟩ − (σ/2) ‖w_t − w*‖²,

and this term will give us the faster 1/T convergence rate. Combining the two inequalities and averaging over t,

    f(ŵ) − f(w*) ≤ (1/T) Σ_{t=1}^T [f(w_t) − f(w*)],

which is upper bounded by

    (1/T) Σ_{t=1}^T ( (1/(2η_t)) [‖w_t − w*‖² − ‖w_{t+1} − w*‖²] − (σ/2) ‖w_t − w*‖² + (η_t/2) ‖∇f(w_t)‖² ).    (2)

With η_t = 1/(σt), the first two terms inside the sum telescope to a nonpositive quantity, and each remaining term is at most G²/(2σt); summing over t ≤ T gives the stated bound (G²/(2σT))(1 + log T).
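To fix ideas, here is a minimal NumPy sketch of Algorithm 1 together with the trajectory averaging from Lemma 1. The function name sgd_strongly_convex and the grad_fn interface are my own conventions, not part of the lecture notes.

import numpy as np

def sgd_strongly_convex(grad_fn, dim, sigma, T, rng=None):
    """SGD with step size 1/(sigma*t) (Algorithm 1) plus trajectory averaging (Lemma 1)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(dim)              # w_1 = 0
    w_sum = np.zeros(dim)
    for t in range(1, T + 1):
        g = grad_fn(w, rng)        # stochastic subgradient: E[g] lies in the subdifferential of f at w
        w = w - g / (sigma * t)    # w_{t+1} = w_t - (1/(sigma*t)) * g
        w_sum += w
    return w_sum / T               # the average iterate w_hat

For instance, for f(w) = (σ/2)‖w‖² one could pass grad_fn = lambda w, rng: sigma * w + rng.normal(size=w.shape), whose conditional mean is the true gradient.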

A few remarks:

- The logarithmic factor can be removed by averaging only the second half of the trajectory, or by putting some other nonuniform weights on the trajectory. In practice, the last iterate is quite good, and possibly better than the average. An analysis for the last iterate was done in [SZ12].
- Once again, we may incorporate a Euclidean projection step onto a convex set after each update. In this case, the guarantee is with respect to w* in that set.

0.1 Full gradient for empirical objectives

Many offline (or "batch") problems in machine learning can be written as an empirical objective

    (1/n) Σ_{t=1}^n ℓ(w, (x_t, y_t))    (3)

or a regularized version of it

    (1/n) Σ_{t=1}^n ℓ(w, (x_t, y_t)) + λ R(w)    (4)

for some penalty R, tradeoff parameter λ, and a function ℓ that measures how well w explains the relationship between x and y. For instance, for finding a low-error linear separator in the non-separable case, we may try to perform gradient descent on

    f(w) = (1/n) Σ_{t=1}^n max{0, 1 − y_t⟨w, x_t⟩}.    (5)

This would be a non-stochastic gradient descent, but each iteration requires one to compute an element of ∂f(w_t). We may take ∇ = (1/n) Σ_{t=1}^n ∇_t, where ∇_t is a subgradient of the t-th loss. For the hinge loss case, a subgradient (with respect to example i) can be written as

    −y_i x_i 1{y_i⟨w, x_i⟩ < 1}.    (6)

The procedure amounts to running through the whole dataset to calculate the full gradient, and then making one step. Time complexity, in terms of gradient evaluations, to obtain an ε-accurate solution is then

    n R² ‖w*‖² / ε²,

where R = max_i ‖x_i‖. Check that R comes from the bound on the gradient of the loss. Unfortunately, the bound scales with the size n of our dataset, and one opts for the SGD procedure instead. Technically speaking, there are better analyses of SGD that may even get the log(1/ε) dependence on target accuracy under additional assumptions on the functions. However, it has been argued in the last decade (both empirically and theoretically) that the ability to process larger n is more important than attaining high accuracy for a limited n. That is, if the constraint is computation time, rather than the amount of data, one should opt for stochastic gradient descent [BB08, SSSSC11].
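To make the per-step cost of the full-gradient procedure explicit, here is a small sketch; the names full_hinge_subgradient and full_gradient_descent are mine, and a constant step size is used only for simplicity. Note that every step touches all n examples, which is exactly the factor of n in the complexity bound above.

import numpy as np

def full_hinge_subgradient(w, X, y):
    """Subgradient of f(w) = (1/n) sum_t max{0, 1 - y_t <w, x_t>}, eqs. (5)-(6).

    X has shape (n, d); y has entries in {-1, +1}.
    """
    active = (y * (X @ w) < 1).astype(float)            # indicator {y_i <w, x_i> < 1}
    return -(X * (y * active)[:, None]).mean(axis=0)    # average of per-example subgradients

def full_gradient_descent(X, y, eta, T):
    """Non-stochastic subgradient descent: each step evaluates all n subgradients."""
    w = np.zeros(X.shape[1])
    for _ in range(T):
        w = w - eta * full_hinge_subgradient(w, X, y)
    return w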

0.2 SGD for empirical objectives

In applying SGD to batch learning problems, one views the objective

    f(w) = (1/n) Σ_{t=1}^n ℓ(w, (x_t, y_t))    (7)

(or the regularized version) as an expectation under an empirical distribution. If an index I is sampled uniformly at random from [n], then any ∇_I ∈ ∂ℓ(w, (x_I, y_I)) has the property that

    E_{I ~ Unif[n]} [∇_I] ∈ ∂f(w).    (8)

The time complexity of SGD for attaining an ε-minimizer of the objective is independent of n (!!), and the dependence on ε, of course, varies according to the properties of ℓ. We remark that we proved convergence of SGD for general random unbiased subgradients, but we are applying it to a distribution of a very specific form. This has been exploited to improve the analysis and the dependence on ε. Instead of sampling from [n], it is common to permute the data and run over it in order, possibly several times. There have been several works trying to understand how different the random sampling is from cycling through the data.

0.3 SGD for Support Vector Machines

Recall that the hinge loss penalizes data points close to the boundary, and thus pushes the hyperplane to have a large margin. This is not an entirely precise statement, since the very notion of a margin of size 1 was tied to the fact that w is a minimal-norm vector. Hence, the objective (5) is not what we want to minimize. Instead, we need a bi-criterion form

    min_w ( (1/n) Σ_{t=1}^n max{0, 1 − y_t⟨w, x_t⟩},  ‖w‖² ).    (9)

That is, we want to minimize the loss and the norm at the same time. There are several ways to combine the bi-criteria into a single one. Here is one:

    min_w (1/n) Σ_{t=1}^n max{0, 1 − y_t⟨w, x_t⟩} + (λ/2) ‖w‖².    (10)

This is known as the Support Vector Machine (a fancy name is a must in machine learning!) for the case of linear kernels (more on this later). One more caveat is that the SVM does not penalize the scalar shift of the nonhomogeneous hyperplane; in the formulation (10), however, this shift is absorbed in w and the norm penalizes it. Before large-scale problems came about, SVMs were solved as a constrained quadratic programming problem. Pegasos, the SGD solution to this objective, proposed by [SSSSC11] (see also [Zha04]), has been very influential in practical applications with large datasets. For the randomly chosen example i, the subgradient of the SVM objective is

    ∇_t = −y_i x_i 1{y_i⟨w_t, x_i⟩ < 1} + λ w_t.    (11)

To apply SGD it remains to decide on the step size. Since the objective is λ-strongly convex due to the regularization term, we choose the step size

    η_t = 1/(λt).    (12)
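Before continuing, here is a minimal sketch of the update implied by (11) and (12). The names svm_subgradient and pegasos_step are my own and not from [SSSSC11]; the sketch only illustrates a single stochastic step.

import numpy as np

def svm_subgradient(w, x_i, y_i, lam):
    """Subgradient (11) of the SVM objective at w, using a single example (x_i, y_i)."""
    hinge_part = -y_i * x_i if y_i * (x_i @ w) < 1 else np.zeros_like(w)
    return hinge_part + lam * w

def pegasos_step(w, x_i, y_i, lam, t):
    """One stochastic step with the strongly convex step size eta_t = 1/(lam*t), eq. (12)."""
    eta_t = 1.0 / (lam * t)
    return w - eta_t * svm_subgradient(w, x_i, y_i, lam)

Expanding the step shows it equals (1 − η_t λ) w_t + η_t y_i x_i when the margin is violated, and (1 − η_t λ) w_t otherwise, which is exactly the case split in Algorithm 2 below.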

We then apply SGD for strongly convex objectives. Suppose we substitute the potentially suboptimal choice w = 0 in (10). The value of the objective is then at most 1. Hence, the optimal solution w* should give an objective value no greater than that. This implies

    (λ/2) ‖w*‖² ≤ 1,    (13)

or ‖w*‖ ≤ √(2/λ) (with a bit more work, 2 can be replaced by 1). The SGD algorithm may add a projection step onto a ball of this radius to help guide the search with the extra information about the location of w*. [SSSSC11] reports that the projection step makes little difference in the experiments they performed. We summarize the Pegasos algorithm below.

Algorithm 2 SGD for SVM objective (Pegasos)
  Input: λ > 0 (regularization parameter)
  Init: w_1 = 0
  for t = 1, ..., T do
    Set η_t = 1/(λt)
    Sample i ~ Unif[n]
    if y_i⟨w_t, x_i⟩ < 1 then
      w_{t+1} = (1 − η_t λ) w_t + η_t y_i x_i
    else
      w_{t+1} = (1 − η_t λ) w_t
    end if
    Optionally, rescale w_{t+1} to have norm at most √(2/λ)
  end for

To apply Lemma 1 on convergence of SGD for strongly convex functions, we need to calculate bounds on the gradients and on ‖w_t‖. Observe that the hinge loss is R-Lipschitz, where R = max_i ‖x_i‖. Furthermore, the update of SGD can be written succinctly as

    w_{t+1} = (1/(λt)) Σ_{s=1}^t y_{i_s} x_{i_s} 1{y_{i_s}⟨w_s, x_{i_s}⟩ < 1}    (14)

(prove this by unwinding the recursion), where i_s is the index chosen at step s. In particular, this implies that ‖w_{t+1}‖ ≤ R/λ for all iterates. The Lipschitz constant of the overall function is then upper bounded by 2R, and the convergence guarantee of Pegasos for the average ŵ of the trajectory is

    E f(ŵ) − f(w*) ≤ (4R²/(λT)) (1 + log T),    (15)

where f is the SVM objective in (10).

0.4 Mini-batching

A common practice (including in SGD for deep neural nets) is to take a small batch of data, evaluate the average gradient with respect to these data, and then update the parameter. Mini-batching presents a natural interpolation between the full gradient (all gradients at once) and the single-gradient SGD as stated above. This has the effect of reducing the variance of the gradients while still being computationally cheap (and independent of n).
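As a rough illustration of the mini-batching idea applied to the SVM objective, here is a small sketch; the function names, the default batch size, and sampling with replacement are my own choices, not prescriptions from the lecture. Each step averages the per-example subgradients (11) over a sampled batch before moving.

import numpy as np

def minibatch_subgradient(w, X, y, lam, batch_size, rng):
    """Average of the per-example SVM subgradients (11) over a random mini-batch."""
    idx = rng.integers(0, X.shape[0], size=batch_size)
    Xb, yb = X[idx], y[idx]
    active = (yb * (Xb @ w) < 1).astype(float)           # which sampled examples violate the margin
    hinge_part = -(Xb * (yb * active)[:, None]).mean(axis=0)
    return hinge_part + lam * w

def minibatch_pegasos(X, y, lam, T, batch_size=32, seed=0):
    """Pegasos-style SGD where each step uses a mini-batch instead of a single example."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    w_sum = np.zeros_like(w)
    for t in range(1, T + 1):
        w -= (1.0 / (lam * t)) * minibatch_subgradient(w, X, y, lam, batch_size, rng)
        w_sum += w
    return w_sum / T    # average of the trajectory

Setting batch_size = 1 recovers Algorithm 2 (without the optional projection), while batch_size = n recovers the full-gradient method of Section 0.1.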

0.5 Sparse updates

Examining (14), we only need to keep track of the sum of the y_{i_s} x_{i_s} that led to a correction of the hyperplane. Hence, if the x's are s-sparse, the update can be implemented in time O(s) rather than O(d). This becomes handy, for instance, in document classification with the bag-of-words (or related) sparse representation.

0.6 Equivalent form of SVM objective

A form that one may encounter in the literature is

    min_{w,b,ξ} (1/n) Σ_{t=1}^n ξ_t + (λ/2) ‖w‖²    (16)
    subject to  y_t(⟨w, x_t⟩ + b) ≥ 1 − ξ_t    (17)
                ξ_t ≥ 0    (18)

References

[BB08] Olivier Bousquet and Léon Bottou. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, pages 161-168, 2008.

[SSBD14] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[SSSSC11] Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1):3-30, 2011.

[SZ12] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. arXiv preprint arXiv:1212.1824, 2012.

[Zha04] Tong Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, page 116. ACM, 2004.
