princeton univ. F'13 cos 521: Advanced Algorithm Design

Lecture 19: Going with the slope: offline, online, and randomly

Lecturer: Sanjeev Arora    Scribe:

This lecture is about gradient descent, a popular method for continuous optimization (especially nonlinear optimization). We start by recalling that allowing nonlinear constraints in optimization leads to NP-hard problems in general. For instance, the following single constraint can be used to force all variables to be 0/1:

    x_i(1 - x_i) = 0.

Notice that this constraint is nonconvex. We saw in earlier lectures that the Ellipsoid method can solve convex optimization problems efficiently under fairly general conditions, but it is slow in practice. Gradient descent is a popular alternative because it is simple and it gives some kind of meaningful result for both convex and nonconvex optimization. It tries to improve the function value by moving in a direction related to the gradient (i.e., the first derivative). For convex optimization it gives the global optimum under fairly general conditions; for nonconvex optimization it arrives at a local optimum.

Figure 1: For nonconvex functions, a local optimum may be different from the global optimum.

We will first study unconstrained gradient descent, where we simply optimize a function f(·). Recall that f is convex if f(λx + (1 - λ)y) ≤ λf(x) + (1 - λ)f(y) for all x, y and all λ ∈ [0, 1].

1  Gradient descent for convex functions: univariate case

As a warm-up, let us describe gradient descent in the univariate case, which will be familiar from freshman calculus. Recall the Taylor expansion of a function all of whose derivatives exist at x:
    f(x + η) = f(x) + ηf'(x) + (η²/2) f''(x) + (η³/3!) f'''(x) + ···    (1)

The function is convex if f''(x) ≥ 0 for all x. This means that f'(x) is an increasing function of x. The minimum is attained where f'(x) = 0, since f' keeps increasing to the left and right of that point; thus the global minimum is unique. The function is concave if f''(x) ≤ 0 for all x; such functions have a unique maximum.

Examples of convex functions: ax + b for any a, b ∈ ℝ; exp(ax) for any a ∈ ℝ; x^α for x ≥ 0 and α ≥ 1 or α ≤ 0. Another interesting example is the negative entropy: x log x for x ≥ 0.

Examples of concave functions: ax + b for any a, b ∈ ℝ; x^α for α ∈ [0, 1] and x ≥ 0; log x for x ≥ 0.

Figure 2: Concave and convex functions.

To minimize a convex function by gradient descent we start at some x_0 and at step i update x_i to x_{i+1} = x_i + ηf'(x_i) for some small η < 0. In other words, we move in the direction in which f decreases. If we ignore terms that involve η³ or higher, then

    f(x_{i+1}) = f(x_i) + ηf'(x_i) + (η²/2) f''(x_i),

and the best value for η (the one giving the most reduction in one step) is η = -f'(x_i)/f''(x_i), which gives

    f(x_{i+1}) = f(x_i) - (f'(x_i))² / (2f''(x_i)).

Thus the algorithm makes progress so long as f''(x_i) > 0. Convex functions that satisfy f''(x) > 0 for all x are called strongly convex. The above calculation is the main idea in Newton's method, which you may have seen in calculus. Proving convergence requires further assumptions.

2  Convex multivariate functions

A convex function on ℝⁿ, if it is differentiable, satisfies the following basic inequality, which says that the function lies above the tangent plane at any point:

    f(x + z) ≥ f(x) + ∇f(x)·z    for all x, z.    (2)
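As a quick numerical sanity check (a sketch added here, not part of the original notes), inequality (2) can be verified for a simple convex function such as f(x) = x_1² + e^{x_2}:

```python
import math

# Check the tangent-plane inequality (2): f(x + z) >= f(x) + grad f(x) . z
# for the convex function f(x) = x1^2 + exp(x2) at a few sample points.
def f(x):
    return x[0] ** 2 + math.exp(x[1])

def grad_f(x):
    return [2 * x[0], math.exp(x[1])]

def tangent_value(x, z):
    # first-order Taylor approximation of f at x, evaluated at x + z
    return f(x) + sum(g * zi for g, zi in zip(grad_f(x), z))

x = [1.0, 0.5]
for z in ([0.3, -0.2], [-1.0, 1.0], [2.0, 0.0]):
    xz = [xi + zi for xi, zi in zip(x, z)]
    assert f(xz) >= tangent_value(x, z)  # the tangent plane is a global underestimator
```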
Here ∇f(x) is the vector of first-order derivatives, whose i-th coordinate is ∂f/∂x_i; it is called the gradient. Sometimes we restate (2) equivalently as

    ∇f(x)·(y - x) ≤ f(y) - f(x)    for all x, y.    (3)

Figure 3: A differentiable convex function lies above the tangent plane f(x) + ∇f(x)·(y - x). (Figure taken from Boyd and Vandenberghe, Convex Optimization.)

If higher derivatives also exist, the multivariate Taylor expansion for an n-variate function f is

    f(x + y) = f(x) + ∇f(x)·y + (1/2) yᵀ ∇²f(x) y + ···.    (4)

Here ∇²f(x) denotes the n × n matrix whose (i, j) entry is ∂²f/∂x_i∂x_j; it is called the Hessian. It can be checked that f is convex iff the Hessian is positive semidefinite; this means yᵀ ∇²f(x) y ≥ 0 for all y.

Figure 4: The Hessian.

Example 1  The following are some examples of convex functions.

- Norms. Every ℓ_p norm is convex on ℝⁿ. The reason is that a norm satisfies the triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y.

- f(x) = log(e^{x_1} + e^{x_2} + ··· + e^{x_n}) is convex on ℝⁿ. This fact is used in practice as an analytic approximation of the max function, since

    max{x_1, ..., x_n} ≤ f(x) ≤ max{x_1, ..., x_n} + log n.
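The sandwich bound above is easy to verify numerically; here is a small sketch (not from the notes), using the standard max-subtraction trick for numerical stability:

```python
import math

# log-sum-exp: a smooth, convex approximation of max{x_1, ..., x_n}
def lse(xs):
    m = max(xs)  # subtract the max before exponentiating, for stability
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [1.0, 2.5, -0.3, 2.4]
# max{x_i} <= lse(xs) <= max{x_i} + log n
assert max(xs) <= lse(xs) <= max(xs) + math.log(len(xs))
```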
It turns out this fact is at the root of the multiplicative weight update method; the algorithm for approximately solving LPs that we saw in Lecture 10 can be seen as doing a gradient descent on this function, where the x_i's are the slacks of the linear constraints. (For a linear constraint a·z ≤ b the slack is a·z - b.)

- f(x) = xᵀAx where A is positive semidefinite. Its Hessian is 2A.

Some important examples of concave functions are the geometric mean (∏_{i=1}^n x_i)^{1/n} and the log-determinant (defined for X ∈ ℝ^{n²} as log det(X), where X is interpreted as an n × n matrix). Many famous inequalities in mathematics (such as Cauchy-Schwarz) are derived using convex functions.

Example 2 (Linear equations with PSD constraint matrix)  In linear algebra you learnt that the method of choice for solving a system of equations Ax = b is Gaussian elimination. In many practical settings its O(n³) running time may be too high. Instead one does gradient descent on the function (1/2) xᵀAx - b·x, whose local optimum satisfies Ax = b. If A is positive semidefinite this function is also convex, since its Hessian is A. Actually, in real life these are optimized using more advanced methods such as conjugate gradient. Also, if A is diagonally dominant, a stronger condition than PSD, then Spielman and Teng (2003) have shown how to solve this problem in time that is near-linear in the number of nonzero entries.

Example 3 (Least squares)  In some settings we are given a set of points a_1, a_2, ..., a_m ∈ ℝⁿ and some data values b_1, b_2, ..., b_m taken at these points by some function of interest. We suspect that the unknown function is a line, except that the data values have a little error in them. One standard technique is to find a least-squares fit: a line that minimizes the sum of squared distances of the data points to the line. The objective is

    min ‖Ax - b‖²,

where A ∈ ℝ^{m×n} is the matrix whose rows are the a_i's. (We saw in an earlier lecture that the solution is also the first singular vector.) This objective is just xᵀAᵀAx - 2(Ax)·b + b·b, which is convex.

In the univariate case, gradient descent has a choice of only two directions to move in: right or left.
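Examples 2 and 3 both amount to gradient descent on a convex quadratic. Here is a minimal sketch of Example 2's approach (the matrix, step size, and iteration count below are made-up toy values): plain gradient descent on F(x) = (1/2) xᵀAx - b·x, whose gradient is Ax - b, so the minimizer solves Ax = b.

```python
# Gradient descent on F(x) = (1/2) x^T A x - b.x for a small
# symmetric positive definite A; grad F(x) = Ax - b.
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def solve_psd(A, b, eta=0.1, steps=2000):
    x = [0.0] * len(b)
    for _ in range(steps):
        grad = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        x = [xi - eta * g for xi, g in zip(x, grad)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
b = [1.0, 2.0]
x = solve_psd(A, b)           # Ax should now be very close to b
```

As the notes say, in practice conjugate gradient or Spielman-Teng style solvers are preferred; this toy loop only illustrates the objective.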
In n dimensions, it can move in any direction in ℝⁿ. The most direct analog of the univariate method is to move diametrically opposite to the gradient. The most direct analogue of our univariate analysis would be to assume a lower bound on yᵀ ∇²f(x) y for all y (in other words, a lower bound on the eigenvalues of ∇²f). This will be explored in the homework. In the rest of the lecture we will only assume (2).

3  Gradient Descent for Constrained Optimization

As studied in previous lectures, constrained optimization consists of solving the following, where K is a convex set and f(·) is a convex function:

    min f(x)    s.t.  x ∈ K.
Example 4 (Spam classification via SVMs)  This example will run through the entire lecture. "Support Vector Machine" is the name in machine learning for a linear classifier; we saw these before in Lecture 5. Suppose we wish to train the classifier to classify emails as spam/nonspam. Each email is represented by a vector giving the frequencies of the various words in it (the "bag of words" model). Say X_1, X_2, ..., X_N are the emails, and for each i there is a corresponding bit y_i ∈ {-1, 1}, where y_i = 1 means X_i is spam. SVMs use a linear classifier to separate spam from nonspam. If spam were perfectly identifiable by a linear classifier, there would be a vector W such that W·X_i ≥ 1 if X_i is spam, and W·X_i ≤ -1 if not. In other words,

    1 - y_i W·X_i ≤ 0.    (5)

Of course, in practice a linear classifier makes errors, so we have to allow for the possibility that (5) is violated by some X_i's. Thus a more robust version of this problem is

    min_W  Σ_i Loss(1 - y_i W·X_i) + λ‖W‖²,    (6)

where Loss(·) is a function that penalizes unsatisfied constraints according to the amount by which they are unsatisfied, and λ > 0 is a scaling constant. (Note that W is the vector of variables.) The most obvious loss function would count the number of unsatisfied constraints, but that is nonconvex. For this lecture we focus on convex loss functions; the simplest is the hinge loss: Loss(t) = max{0, t}. Then the function in (6) is convex.

If x ∈ K is the current point and we use the gradient to step to x + Δx, then in general this new point will not be in K. Thus one needs to do a projection.

Definition 1  The projection of a point y on K is the x ∈ K that minimizes ‖y - x‖₂. (It is also possible to use norms other than ℓ₂ to define projections.) A projection oracle for the convex body K is a black box that, for every point y, returns its projection on K.

Often the convex sets used in applications are simple to project onto.

Example 5  If K is the unit ball, then the projection of a point y outside K is y/‖y‖.

Here is a simple algorithm for solving the constrained optimization problem. The algorithm only needs to access f via a gradient oracle and K via a projection oracle.
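To make objective (6) concrete, here is a small sketch; the emails, labels, weights, and λ below are made-up toy values, not from the notes.

```python
# Objective (6) with hinge loss: sum_i max(0, 1 - y_i * W.X_i) + lambda * ||W||^2
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def svm_objective(W, X, y, lam):
    hinge = sum(max(0.0, 1.0 - yi * dot(W, Xi)) for Xi, yi in zip(X, y))
    return hinge + lam * dot(W, W)

X = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]  # toy "bag of words" vectors
y = [1, -1, 1]                            # +1 = spam, -1 = nonspam
W = [2.0, -2.0]                           # satisfies (5) on all three examples
obj = svm_objective(W, X, y, lam=0.1)     # hinge terms vanish; only 0.1*||W||^2 = 0.8 remains
```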
Definition 2 (Gradient Oracle)  A gradient oracle for a function f is a black box that, for every point z, returns ∇f(z), the gradient evaluated at the point z. (Notice that this defines a linear function of the form g·x, where g is the vector of partial derivatives evaluated at z.)

The same value of η will be used throughout.

Gradient Descent for Constrained Optimization

    Let η = D/(G√T).
    Repeat for i = 0, 1, ..., T - 1:
        y^{(i+1)} ← x^{(i)} - η∇f(x^{(i)})
        x^{(i+1)} ← projection of y^{(i+1)} on K
    At the end output z = (1/T) Σ_i x^{(i)}.
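The loop above can be sketched in code. This is an assumed illustration, not the notes' own implementation: K is the unit ball (Example 5) and f(x) = ‖x - c‖² for a made-up target c chosen outside K, so the constraint actually binds.

```python
import math

def project_unit_ball(y):
    # projection oracle for K = unit ball: rescale points outside K
    n = math.sqrt(sum(v * v for v in y))
    return [v / n for v in y] if n > 1.0 else list(y)

def projected_gd(grad, x0, eta, T):
    x = list(x0)
    avg = [0.0] * len(x0)
    for _ in range(T):
        y = [xi - eta * g for xi, g in zip(x, grad(x))]  # gradient step
        x = project_unit_ball(y)                          # projection step
        avg = [a + xi / T for a, xi in zip(avg, x)]
    return avg  # output the average of the iterates, as in the algorithm

c = [2.0, 0.0]  # unconstrained minimizer lies outside the unit ball
grad = lambda x: [2 * (xi - ci) for xi, ci in zip(x, c)]
z = projected_gd(grad, [0.0, 0.0], eta=0.05, T=500)
# z should be close to [1, 0], the constrained minimizer
```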
Let us analyse this algorithm. Let x* be the point in K where the optimum is attained. Let G denote an upper bound on ‖∇f(x)‖ for any x ∈ K, and let D = max_{x,y∈K} ‖x - y‖ be the so-called diameter of K. To ensure that the output z satisfies f(z) ≤ f(x*) + ε, we will use T = 4D²G²/ε².

Since x^{(i+1)} is the projection of y^{(i+1)} on K, we have

    ‖x^{(i+1)} - x*‖² ≤ ‖y^{(i+1)} - x*‖²
                      = ‖x^{(i)} - x* - η∇f(x^{(i)})‖²
                      = ‖x^{(i)} - x*‖² + η²‖∇f(x^{(i)})‖² - 2η ∇f(x^{(i)})·(x^{(i)} - x*).

Reorganizing and using the definition of G, we obtain

    ∇f(x^{(i)})·(x^{(i)} - x*) ≤ (1/(2η)) (‖x^{(i)} - x*‖² - ‖x^{(i+1)} - x*‖²) + (η/2) G².

Using (3), we can lower bound the left-hand side by f(x^{(i)}) - f(x*). We conclude that

    f(x^{(i)}) - f(x*) ≤ (1/(2η)) (‖x^{(i)} - x*‖² - ‖x^{(i+1)} - x*‖²) + (η/2) G².    (7)

Now sum this inequality over the T steps and use the telescoping cancellations to obtain

    (1/T) Σ_i (f(x^{(i)}) - f(x*)) ≤ (1/(2ηT)) (‖x^{(0)} - x*‖² - ‖x^{(T)} - x*‖²) + (η/2) G².

Finally, by convexity f((1/T) Σ_i x^{(i)}) ≤ (1/T) Σ_i f(x^{(i)}), so we conclude that the point z = (1/T) Σ_i x^{(i)} satisfies

    f(z) ≤ f(x*) + D²/(2ηT) + (η/2) G².

Now set η = D/(G√T) to get an upper bound of DG/√T on the right-hand side. Since T = 4D²G²/ε², we see that f(z) ≤ f(x*) + ε.

4  Online Gradient Descent

Zinkevich noticed that the analysis of gradient descent applies to a much more general scenario. There is a convex set K given via a projection oracle. For i = 1, 2, ..., T, we are presented at step i a convex function f_i. At step i we have to put forth our guess solution x^{(i)} ∈ K; the catch is that we do not know the functions that will be presented in the future. So our online decisions have to be made so that, if x* is the point w that minimizes Σ_i f_i(w) (i.e., the point we would have chosen in hindsight after all the functions were revealed), then the following quantity (called the regret) stays small:

    Σ_i f_i(x^{(i)}) - Σ_i f_i(x*).
The above gradient descent algorithm can be adapted to this problem by replacing ∇f(x^{(i)}) with ∇f_i(x^{(i)}). This algorithm is called Online Gradient Descent. The earlier analysis works essentially unchanged, once we realize that the left-hand side of (7) is the regret for trial i. Summing over i gives the total regret on the left side, and the right-hand side is analysed and upper bounded as before. Thus we have shown:

Theorem 1 (Zinkevich 2003)  If D is the diameter of K and G is an upper bound on the norm of the gradient of any of the presented functions, and η is set to D/(G√T), then the regret per step after T steps is at most DG/√T.

Example 6 (Spam classification against adaptive adversaries)  We return to the spam classification problem of Example 4, with the new twist that the classifier must change over time, as spammers learn to evade the current classifier. Thus there is no fixed distribution of spam emails, and it is fruitless to train the classifier in one go. It is better to have it improve and adapt itself as new examples come up. At step t the optimum classifier may not be known and is presented via a gradient oracle; the function f_t just corresponds to the term in (6) for the latest email that was classified as spam/nonspam. Zinkevich's theorem implies that our sequence of classifiers has low regret.

5  Stochastic Gradient Descent

Stochastic gradient descent is a variant of the algorithm in Section 3 that works with convex functions presented using an even weaker notion: an expected gradient oracle. Given a point z, this oracle returns a linear function g·x + c drawn from a probability distribution D_z such that the expectation E_{(g,c)∼D_z}[g·x + c] is exactly the gradient of f at z.

Example 7 (Spam classification using SGD)  Returning to the spam classification problem of Example 4, we see that the function in (6) is a sum of many similar terms. If we randomly pick a single term and compute just its gradient (which is very quick to do!), then by linearity of expectation the expectation of this gradient is just the true gradient, up to scaling by the number of terms.
Thus we see that the expected gradient oracle may be a much faster computation than the gradient oracle (a million times faster if the number of email examples is a million!). In fact this setting is not atypical; often the convex function of interest is a sum of many similar terms.

Stochastic gradient descent just runs Online Gradient Descent (OGD). Let g_i·x be the gradient returned by the oracle at step i. Then we use this function, which is linear and hence convex, as f_i in the i-th step of OGD. Let z = (1/T) Σ_{i=1}^T x^{(i)}, and let x* be the point in K where f attains its minimum value.

Theorem 2  E[f(z)] ≤ f(x*) + DG/√T, where D is the diameter as before and G is an upper bound on the norm of any gradient vector ever output by the oracle.
Proof:

    f(z) - f(x*) ≤ (1/T) Σ_i (f(x^{(i)}) - f(x*))          (by convexity of f)
                 ≤ (1/T) Σ_i ∇f(x^{(i)})·(x^{(i)} - x*)    (using (2))
                 = (1/T) Σ_i E[g_i·(x^{(i)} - x*)]          (since the expected gradient is the true gradient)
                 = (1/T) E[Σ_i (f_i(x^{(i)}) - f_i(x*))]    (definition of f_i)

and the theorem now follows since the expression inside E[·] is just the regret, which is always upper bounded by the quantity given in Zinkevich's theorem; hence the same upper bound also holds for the expectation.

6  Hint of more advanced ideas

We covered some of these in the Friday special section. Gradient descent algorithms come in dozens of flavors. (The Boyd-Vandenberghe book is a good resource; Nesterov's lecture notes are terser but still have a lot of intuition.) Surprisingly, just going along the gradient (more precisely, in the diametrically opposite direction from the gradient) is not always the best strategy. The steepest descent direction is defined by quantifying the best decrease in the objective function obtainable via a step of unit length. The catch is that different norms can be used to define unit length. For example, if distance is measured using the ℓ₁ norm, then the best reduction happens by picking the largest coordinate of the gradient vector and reducing the corresponding coordinate of x (coordinate descent). The classical Newton method is the subcase where distance is measured using the ellipsoidal norm defined by the Hessian.

Gradient descent ideas underlie recent advances in algorithms for problems like Spielman-Teng style solvers for Laplacian systems, near-linear time approximation algorithms for maximum flow in undirected graphs, and Madry's faster algorithm for maximum weight matching.

Bibliography

1. Convex Optimization. S. Boyd and L. Vandenberghe. Cambridge University Press. (PDF available online.)
2. Introductory Lectures on Convex Optimization: A Basic Course. Y. Nesterov. Springer.
3. Lecture notes on online optimization. E. Hazan.
4. Lecture notes on online optimization. S. Bubeck.
More informationReport on Image warping
Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.
More informationLecture 3. Ax x i a i. i i
18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We
More informationBezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0
Bezer curves Mchael S. Floater August 25, 211 These notes provde an ntroducton to Bezer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of the
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationFormulas for the Determinant
page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use
More informationSpectral Graph Theory and its Applications September 16, Lecture 5
Spectral Graph Theory and ts Applcatons September 16, 2004 Lecturer: Danel A. Spelman Lecture 5 5.1 Introducton In ths lecture, we wll prove the followng theorem: Theorem 5.1.1. Let G be a planar graph
More informationCSE 252C: Computer Vision III
CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel
More informationAdditional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty
Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,
More informationSELECTED SOLUTIONS, SECTION (Weak duality) Prove that the primal and dual values p and d defined by equations (4.3.2) and (4.3.3) satisfy p d.
SELECTED SOLUTIONS, SECTION 4.3 1. Weak dualty Prove that the prmal and dual values p and d defned by equatons 4.3. and 4.3.3 satsfy p d. We consder an optmzaton problem of the form The Lagrangan for ths
More informationLecture 17. Solving LPs/SDPs using Multiplicative Weights Multiplicative Weights
Lecture 7 Solvng LPs/SDPs usng Multplcatve Weghts In the last lecture we saw the Multplcatve Weghts (MW) algorthm and how t could be used to effectvely solve the experts problem n whch we have many experts
More informationNP-Completeness : Proofs
NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationCS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016
CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng
More information14 Lagrange Multipliers
Lagrange Multplers 14 Lagrange Multplers The Method of Lagrange Multplers s a powerful technque for constraned optmzaton. Whle t has applcatons far beyond machne learnng t was orgnally developed to solve
More informationLecture 10: May 6, 2013
TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationMATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationPerfect Competition and the Nash Bargaining Solution
Perfect Competton and the Nash Barganng Soluton Renhard John Department of Economcs Unversty of Bonn Adenauerallee 24-42 53113 Bonn, Germany emal: rohn@un-bonn.de May 2005 Abstract For a lnear exchange
More informationCS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016
CS 294-128: Algorthms and Uncertanty Lecture 14 Date: October 17, 2016 Instructor: Nkhl Bansal Scrbe: Antares Chen 1 Introducton In ths lecture, we revew results regardng follow the regularzed leader (FTRL.
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts
More informationStanford University Graph Partitioning and Expanders Handout 3 Luca Trevisan May 8, 2013
Stanford Unversty Graph Parttonng and Expanders Handout 3 Luca Trevsan May 8, 03 Lecture 3 In whch we analyze the power method to approxmate egenvalues and egenvectors, and we descrbe some more algorthmc
More informationLecture 11. minimize. c j x j. j=1. 1 x j 0 j. +, b R m + and c R n +
Topcs n Theoretcal Computer Scence May 4, 2015 Lecturer: Ola Svensson Lecture 11 Scrbes: Vncent Eggerlng, Smon Rodrguez 1 Introducton In the last lecture we covered the ellpsod method and ts applcaton
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationP exp(tx) = 1 + t 2k M 2k. k N
1. Subgaussan tals Defnton. Say that a random varable X has a subgaussan dstrbuton wth scale factor σ< f P exp(tx) exp(σ 2 t 2 /2) for all real t. For example, f X s dstrbuted N(,σ 2 ) then t s subgaussan.
More informationEigenvalues of Random Graphs
Spectral Graph Theory Lecture 2 Egenvalues of Random Graphs Danel A. Spelman November 4, 202 2. Introducton In ths lecture, we consder a random graph on n vertces n whch each edge s chosen to be n the
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationFall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede
Fall 0 Analyss of Expermental easurements B. Esensten/rev. S. Errede We now reformulate the lnear Least Squares ethod n more general terms, sutable for (eventually extendng to the non-lnear case, and also
More information1 The Mistake Bound Model
5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationMath 217 Fall 2013 Homework 2 Solutions
Math 17 Fall 013 Homework Solutons Due Thursday Sept. 6, 013 5pm Ths homework conssts of 6 problems of 5 ponts each. The total s 30. You need to fully justfy your answer prove that your functon ndeed has
More informationLagrange Multipliers Kernel Trick
Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More information