princeton univ. F'13 cos 521: Advanced Algorithm Design

Lecture 19: Going with the slope: offline, online, and randomly

Lecturer: Sanjeev Arora    Scribe:

This lecture is about gradient descent, a popular method for continuous optimization (especially nonlinear optimization). We start by recalling that allowing nonlinear constraints in optimization leads to NP-hard problems in general. For instance, the following single constraint can be used to force all variables to be 0/1:

    Σ_i x_i (1 − x_i) = 0.

Notice that this constraint is nonconvex. We saw in earlier lectures that the Ellipsoid method can solve convex optimization problems efficiently under fairly general conditions, but it is slow in practice. Gradient descent is a popular alternative because it is simple and it gives some kind of meaningful result for both convex and nonconvex optimization. It tries to improve the function value by moving in a direction related to the gradient (i.e., the first derivative). For convex optimization it gives the global optimum under fairly general conditions. For nonconvex optimization it arrives at a local optimum.

Figure 1: For nonconvex functions, a local optimum may be different from the global optimum

We will first study unconstrained gradient descent, where we are simply optimizing a function f(·). Recall that the function is convex if f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y) for all x, y and all λ ∈ [0, 1].

1 Gradient descent for convex functions: univariate case

For warm-up let's describe gradient descent in the univariate case, which will be familiar from freshman calculus. Recall the Taylor expansion of a function all of whose derivatives exist at x:

    f(x + η) = f(x) + η f'(x) + (η²/2) f''(x) + (η³/3!) f'''(x) + ···        (1)

The function is convex if f''(x) ≥ 0 for all x. This means that f'(x) is an increasing function of x. The minimum is attained where f'(x) = 0, since f'(x) keeps increasing to the left and right of that point; thus the global minimum is unique. The function is concave if f''(x) ≤ 0 for all x; such functions have a unique maximum.

Examples of convex functions: ax + b for any a, b ∈ R; exp(ax) for any a ∈ R; x^α for x ≥ 0 and α ≥ 1 or α ≤ 0. Another interesting example is the negative entropy: x log x for x ≥ 0.

Examples of concave functions: ax + b for any a, b ∈ R; x^α for α ∈ [0, 1] and x ≥ 0; log x for x > 0.

Figure 2: Concave and convex functions

To minimize a convex function by gradient descent we start at some x_0 and at step t take a small step η whose sign is opposite to that of f'(x_t), i.e., x_{t+1} = x_t + η. In other words, move in the direction in which f decreases. If we ignore terms that involve η³ or higher, then

    f(x_{t+1}) = f(x_t) + η f'(x_t) + (η²/2) f''(x_t),

and the best value for η (which gives the most reduction in one step) is η = −f'(x_t)/f''(x_t), which gives

    f(x_{t+1}) = f(x_t) − (f'(x_t))² / (2 f''(x_t)).

Thus the algorithm makes progress so long as f''(x_t) > 0. Convex functions that satisfy f''(x) > 0 for all x are called strongly convex. The above calculation is the main idea in Newton's method, which you may have seen in calculus. Proving convergence requires further assumptions.
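(The following small Python sketch is our addition, not part of the original notes.) It applies the update above to one particular strongly convex function, using the locally best step η = −f'(x)/f''(x); iterating this step is exactly Newton's method.

import math

# Minimize f(x) = exp(x) - 2x, which is strongly convex (f''(x) = exp(x) > 0)
# and has its unique minimum at x = ln 2.
def f(x):   return math.exp(x) - 2 * x
def df(x):  return math.exp(x) - 2        # f'(x)
def d2f(x): return math.exp(x)            # f''(x)

x = 3.0                                   # arbitrary starting point
for t in range(10):
    x = x - df(x) / d2f(x)                # step of size eta = -f'(x)/f''(x)

print(x, math.log(2))                     # both approximately 0.693147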

2 Convex multivariate functions

A convex function on R^n, if it is differentiable, satisfies the following basic inequality, which says that the function lies above its tangent plane at every point:

    f(x + z) ≥ f(x) + ∇f(x)·z    for all x, z.        (2)

Here ∇f(x) is the vector of first-order derivatives, whose i-th coordinate is ∂f/∂x_i; it is called the gradient. Sometimes we restate (2) equivalently as

    ∇f(x)·(y − x) ≤ f(y) − f(x)    for all x, y.        (3)

Figure 3: A differentiable convex function lies above the tangent plane f(x) + ∇f(x)ᵀ(y − x)

If higher derivatives also exist, the multivariate Taylor expansion for an n-variate function f is

    f(x + y) = f(x) + ∇f(x)·y + (1/2) yᵀ ∇²f(x) y + ···        (4)

Here ∇²f(x) denotes the n × n matrix whose (i, j) entry is ∂²f/∂x_i ∂x_j; it is called the Hessian. It can be checked that f is convex if and only if the Hessian is positive semidefinite; this means yᵀ ∇²f(x) y ≥ 0 for all y.

Figure 4: The Hessian

Example 1 The following are some examples of convex functions.

Norms. Every ℓ_p norm is convex on R^n. The reason is that a norm satisfies the triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y.

f(x) = log(e^{x_1} + e^{x_2} + ··· + e^{x_n}) is convex on R^n. This fact is used in practice as an analytic approximation of the max function, since

    max{x_1, ..., x_n} ≤ f(x) ≤ max{x_1, ..., x_n} + log n.
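(A quick numerical check, added by us and not in the original notes.) The two-sided bound on f(x) = log Σ_i e^{x_i} is easy to verify:

import numpy as np

x = np.random.randn(5) * 3                        # an arbitrary vector with n = 5 entries
f = np.log(np.sum(np.exp(x)))                     # the soft-max approximation of max
print(x.max() <= f <= x.max() + np.log(len(x)))   # True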

It turns out that this fact is at the root of the multiplicative weight update method; the algorithm for approximately solving LPs that we saw in Lecture 10 can be seen as doing gradient descent on this function, where the x_i's are the slacks of the linear constraints. (For a linear constraint aᵀz ≥ b the slack is aᵀz − b.)

f(x) = xᵀAx where A is positive semidefinite. Its Hessian is 2A.

Some important examples of concave functions are: the geometric mean (∏_{i=1}^n x_i)^{1/n} and the log-determinant (defined for X ∈ R^{n×n} as log det(X)). Many famous inequalities in mathematics (such as Cauchy-Schwarz) are derived using convex functions.

Example 2 (Linear equations with PSD constraint matrix) In linear algebra you learnt that the method of choice for solving a system of equations Ax = b is Gaussian elimination. In many practical settings its O(n³) running time may be too high. Instead one does gradient descent on the function ½ xᵀAx − bᵀx, whose local optimum satisfies Ax = b. If A is positive semidefinite this function is also convex, since its Hessian is A. Actually, in real life these systems are solved using more advanced methods such as conjugate gradient. Also, if A is diagonally dominant, a stronger condition than PSD, then Spielman and Teng (2003) have shown how to solve this problem in time that is near-linear in the number of nonzero entries.

Example 3 (Least squares) In some settings we are given a set of points a_1, a_2, ..., a_m ∈ R^n and some data values b_1, b_2, ..., b_m taken at these points by some function of interest. We suspect that the unknown function is linear, except the data values have a little error in them. One standard technique is to find a least-squares fit: a linear function that minimizes the sum of squared errors on the data points. The objective function is min_x ‖Ax − b‖², where A ∈ R^{m×n} is the matrix whose rows are the a_i's. (We saw in an earlier lecture that the solution can be expressed using the singular value decomposition of A.) This objective is just xᵀAᵀAx − 2(Ax)ᵀb + bᵀb, which is convex.

In the univariate case, gradient descent has a choice of only two directions to move in: right or left. In n dimensions it can move in any direction in R^n. The most direct analog of the univariate method is to move in the direction diametrically opposite to the gradient. The most direct analogue of our univariate analysis would be to assume a lower bound on yᵀ ∇²f y over all unit vectors y (in other words, a lower bound on the eigenvalues of ∇²f); this will be explored in the homework. In the rest of the lecture we will only assume (2).
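(The following numpy sketch is our illustration, not part of the notes.) It runs plain gradient descent on the least-squares objective of Example 3, ‖Ax − b‖², whose gradient is 2Aᵀ(Ax − b), and compares the result with a standard least-squares solver.

import numpy as np

np.random.seed(0)
A = np.random.randn(50, 5)                               # data points as rows (m = 50, n = 5)
b = A @ np.random.randn(5) + 0.01 * np.random.randn(50)  # noisy linear measurements

x = np.zeros(5)
eta = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)              # a safe fixed step size for this objective
for t in range(2000):
    x = x - eta * 2 * A.T @ (A @ x - b)                  # move opposite to the gradient

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True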

3 Gradient Descent for Constrained Optimization

As studied in previous lectures, constrained optimization consists of solving the following, where K is a convex set and f(·) is a convex function:

    min f(x)  s.t.  x ∈ K.

Example 4 (Spam classification via SVMs) This example will run through the entire lecture. Support Vector Machine is the name in machine learning for a linear classifier; we saw these before in Lecture 5. Suppose we wish to train the classifier to classify emails as spam/nonspam. Each email is represented using a vector that gives the frequencies of various words in it (the "bag of words" model). Say X_1, X_2, ..., X_N are the emails, and for each i there is a corresponding bit y_i ∈ {−1, 1}, where y_i = 1 means X_i is spam. SVMs use a linear classifier to separate spam from nonspam. If spam were perfectly identifiable by a linear classifier, there would be a vector W such that Wᵀ X_i ≥ 1 if X_i is spam, and Wᵀ X_i ≤ −1 if not. In other words,

    1 − y_i Wᵀ X_i ≤ 0.        (5)

Of course, in practice a linear classifier makes errors, so we have to allow for the possibility that (5) is violated by some X_i's. Thus a more robust version of this problem is

    min_W  Σ_i Loss(1 − y_i Wᵀ X_i) + λ ‖W‖²,        (6)

where Loss(·) is a function that penalizes unsatisfied constraints according to the amount by which they are unsatisfied, and λ > 0 is a scaling constant. (Note that W is the vector of variables.) The most obvious loss function would be to count the number of unsatisfied constraints, but that is nonconvex. For this lecture we focus on convex loss functions; the simplest is the hinge loss: Loss(t) = max{0, t}. Then the function in (6) is convex.

If x ∈ K is the current point and we use the gradient to step to x + Δx, then in general this new point will not be in K. Thus one needs to do a projection.

Definition 1 The projection of a point y on K is the point x ∈ K that minimizes ‖y − x‖₂. (It is also possible to use norms other than ℓ₂ to define projections.) A projection oracle for the convex body K is a black box that, for every point y, returns its projection on K.

Often the convex sets used in applications are simple to project to.

Example 5 If K is the unit sphere, then the projection of y is y/‖y‖.

Here is a simple algorithm for solving the constrained optimization problem. The algorithm only needs to access f via a gradient oracle and K via a projection oracle.

Definition 2 (Gradient Oracle) A gradient oracle for a function f is a black box that, for every point z, returns ∇f(z), the gradient evaluated at the point z. (Notice, this defines a linear function of the form gᵀx, where g is the vector of partial derivatives evaluated at z.)

The same value of η will be used throughout.

Gradient Descent for Constrained Optimization
    Let η = D/(G√T). Repeat for t = 0, 1, ..., T:
        y^(t+1) ← x^(t) − η ∇f(x^(t))
        x^(t+1) ← projection of y^(t+1) on K.
    At the end output z = (1/T) Σ_{t=1}^T x^(t).
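(A minimal Python sketch of the algorithm above; the particular f and K are our choices, not the notes'.) We minimize f(x) = ‖x − c‖² over the unit ball K = {x : ‖x‖ ≤ 1}, whose projection oracle simply rescales any point of norm greater than 1. Since c lies outside K, the constrained optimum is c/‖c‖.

import numpy as np

c = np.array([3.0, 4.0])                     # unconstrained minimizer, lies outside K
grad = lambda x: 2 * (x - c)                 # gradient oracle for f(x) = ||x - c||^2

def project(y):                              # projection oracle for K = {x : ||x|| <= 1}
    n = np.linalg.norm(y)
    return y / n if n > 1 else y

D = 2.0                                      # diameter of K
G = 2 * (np.linalg.norm(c) + 1)              # upper bound on ||grad f(x)|| over K
T = 10000
eta = D / (G * np.sqrt(T))                   # the step size used in the analysis

x = np.zeros(2)
total = np.zeros(2)
for t in range(T):
    y = x - eta * grad(x)                    # gradient step
    x = project(y)                           # pull the iterate back into K
    total += x

z = total / T                                # output the average iterate
print(np.round(z, 2))                        # [0.6 0.8], i.e., c/||c||, the constrained optimum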

Let us analyse this algorithm as follows. Let x* be the point where the optimum is attained. Let G denote an upper bound on ‖∇f(x)‖ for any x ∈ K, and let D = max_{x,y ∈ K} ‖x − y‖ be the so-called diameter of K. To ensure that the output z satisfies f(z) ≤ f(x*) + ε we will use T = 4D²G²/ε².

Since x^(t+1) is the projection of y^(t+1) on K, and projecting onto a convex set cannot increase the distance to any point of K, we have

    ‖x^(t+1) − x*‖² ≤ ‖y^(t+1) − x*‖² = ‖x^(t) − x* − η ∇f(x^(t))‖²
                    = ‖x^(t) − x*‖² + η² ‖∇f(x^(t))‖² − 2η ∇f(x^(t))·(x^(t) − x*).

Rearranging and using the definition of G we obtain:

    ∇f(x^(t))·(x^(t) − x*) ≤ (1/(2η)) (‖x^(t) − x*‖² − ‖x^(t+1) − x*‖²) + (η/2) G².

Using (3), we can lower-bound the left-hand side by f(x^(t)) − f(x*). We conclude that

    f(x^(t)) − f(x*) ≤ (1/(2η)) (‖x^(t) − x*‖² − ‖x^(t+1) − x*‖²) + (η/2) G².        (7)

Now sum the previous inequality over t = 1, 2, ..., T and use the telescoping cancellations to obtain

    (1/T) Σ_{t=1}^T (f(x^(t)) − f(x*)) ≤ (1/(2ηT)) (‖x^(1) − x*‖² − ‖x^(T+1) − x*‖²) + (η/2) G².

Finally, by convexity f((1/T) Σ_t x^(t)) ≤ (1/T) Σ_t f(x^(t)), so we conclude that the point z = (1/T) Σ_t x^(t) satisfies

    f(z) − f(x*) ≤ D²/(2ηT) + (η/2) G².

Now set η = D/(G√T) to get an upper bound of DG/√T on the right-hand side. Since T = 4D²G²/ε², we see that f(z) ≤ f(x*) + ε.
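(For completeness; this last bit of arithmetic is left implicit in the notes.) With η = D/(G√T) the two terms balance:

    D²/(2ηT) + (η/2) G² = DG/(2√T) + DG/(2√T) = DG/√T,

and substituting T = 4D²G²/ε² gives DG/√T = ε/2 ≤ ε.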

4 Online Gradient Descent

Zinkevich noticed that the analysis of gradient descent applies to a much more general scenario. There is a convex set K given via a projection oracle. For t = 1, 2, ..., T we are presented at step t with a convex function f_t. At step t we have to put forth our guess solution x^(t) ∈ K, but the catch is that we do not know the functions that will be presented in the future. So our online decisions have to be made such that if x* is the point w that minimizes Σ_t f_t(w) (i.e., the point that we would have chosen in hindsight after all the functions were revealed), then the following quantity (called the regret) stays small:

    Σ_t f_t(x^(t)) − Σ_t f_t(x*).

The above gradient descent algorithm can be adapted for this problem by replacing ∇f(x^(t)) with ∇f_t(x^(t)). This algorithm is called Online Gradient Descent. The earlier analysis works essentially unchanged, once we realize that the left-hand side of (7) is the regret for trial t. Summing over t gives the total regret on the left side, and the right-hand side is analysed and upper-bounded as before. Thus we have shown:

Theorem 1 (Zinkevich 2003) If D is the diameter of K and G is an upper bound on the norm of the gradient of any of the presented functions, and η is set to D/(G√T), then the regret per step after T steps is at most DG/√T.

Example 6 (Spam classification against adaptive adversaries) We return to the spam classification problem of Example 4, with the new twist that the classifier changes over time, as spammers learn to evade the current classifier. Thus there is no fixed distribution of spam emails, and it is fruitless to train the classifier in one go. It is better to have it improve and adapt itself as new examples come up. At step t the function f_t is presented using a gradient oracle; this function just corresponds to the term in (6) for the latest email that was classified as spam/nonspam. Zinkevich's theorem implies that our sequence of classifiers has low regret.

5 Stochastic Gradient Descent

Stochastic gradient descent is a variant of the algorithm in Section 3 that works with convex functions presented using an even weaker notion: an expected gradient oracle. Given a point z, this oracle returns a random linear function gᵀx + c drawn from a probability distribution D_z whose expectation E_{(g,c)∼D_z}[gᵀx + c] is exactly the linear function defined by the gradient of f at z (in particular, E[g] = ∇f(z)).

Example 7 (Spam classification using SGD) Returning to the spam classification problem of Example 4, we see that the function in (6) is a sum of many similar terms. If we randomly pick a single term and compute just its gradient (which is very quick to do!), then by linearity of expectation, the expectation of this gradient (scaled by the number of terms) is just the true gradient. Thus the expected gradient oracle may be a much faster computation than the gradient oracle (a million times faster if the number of email examples is a million!). In fact this setting is not atypical; often the convex function of interest is a sum of many similar terms.

Stochastic gradient descent uses Online Gradient Descent (OGD). Let g_tᵀx be the linear function returned by the oracle at step t. We use this function, which is linear and hence convex, as f_t in the t-th step of OGD. Let z = (1/T) Σ_{t=1}^T x^(t), and let x* be the point in K where f attains its minimum value.

Theorem 2 E[f(z)] ≤ f(x*) + DG/√T, where D is the diameter as before and G is an upper bound on the norm of any gradient vector ever output by the oracle.

Proof:

    E[f(z)] − f(x*) ≤ (1/T) Σ_t E[f(x^(t)) − f(x*)]              (by convexity of f)
                    ≤ (1/T) Σ_t E[∇f(x^(t))·(x^(t) − x*)]        (using (2))
                    = (1/T) Σ_t E[g_t·(x^(t) − x*)]              (since the expected gradient is the true gradient)
                    = (1/T) Σ_t E[f_t(x^(t)) − f_t(x*)]          (definition of f_t)
                    = (1/T) E[Σ_t (f_t(x^(t)) − f_t(x*))],

and the theorem now follows since the expression inside E[·] is just the regret, which is always upper-bounded by the quantity given in Zinkevich's theorem, so the same upper bound holds also for the expectation.
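(A minimal Python sketch of this scheme; the particular f and K below are our choices, not the notes'.) We minimize the average squared error f(W) = (1/m) Σ_i (a_iᵀW − b_i)² over the unit ball: each step samples a single term i and feeds its gradient, whose expectation is exactly ∇f(W), to projected online gradient descent.

import numpy as np

np.random.seed(1)
m, n = 1000, 3
A = np.random.randn(m, n)                    # one row a_i per example
b = A @ np.array([0.5, -0.2, 0.3])           # data generated by a vector lying inside K

project = lambda w: w if np.linalg.norm(w) <= 1 else w / np.linalg.norm(w)  # K = unit ball

T, eta = 20000, 0.002                        # a small fixed step (the analysis would use D/(G*sqrt(T)))
W, total = np.zeros(n), np.zeros(n)
for t in range(T):
    i = np.random.randint(m)                 # sample a single term of the sum
    g = 2 * (A[i] @ W - b[i]) * A[i]         # its gradient; in expectation this equals grad f(W)
    W = project(W - eta * g)                 # one step of projected online gradient descent
    total += W

print(total / T)                             # approximately [0.5, -0.2, 0.3]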

6 Hint of more advanced ideas

We covered some of these in the Friday special section. Gradient descent algorithms come in dozens of flavors. (The Boyd-Vandenberghe book is a good resource, and Nesterov's lecture notes are terser but still have a lot of intuition.) Surprisingly, just going along the gradient (more precisely, in the direction diametrically opposite to the gradient) is not always the best strategy. The steepest descent direction is defined by quantifying the best decrease in the objective function obtainable via a step of unit length. The catch is that different norms can be used to define unit length. For example, if distance is measured using the ℓ₁ norm, then the best reduction happens by picking the largest coordinate of the gradient vector and reducing the corresponding coordinate of x (coordinate descent). The classical Newton method is a subcase where distance is measured using the ellipsoidal norm defined by the Hessian.

Gradient descent ideas underlie recent advances in algorithms for problems like Spielman-Teng style solvers for Laplacian systems, near-linear-time approximation algorithms for maximum flow in undirected graphs, and Madry's faster algorithm for maximum weight matching.

Bibliography

1. Convex Optimization. S. Boyd and L. Vandenberghe. Cambridge University Press. (PDF available online.)
2. Introductory Lectures on Convex Optimization: A Basic Course. Y. Nesterov. Springer.
3. Lecture notes on online optimization. Elad Hazan.
4. Lecture notes on online optimization. S. Bubeck.
