15 Lagrange Multipliers
The Method of Lagrange Multipliers is a powerful technique for constrained optimization. While it has applications far beyond machine learning (it was originally developed to solve physics equations), it is used for several key derivations in machine learning.

The problem set-up is as follows. We wish to find extrema (i.e., maxima or minima) of a differentiable objective function

    E(x) = E(x_1, x_2, \ldots, x_D).    (1)

If we have no constraints on the problem, then the extrema must necessarily satisfy the following system of equations in terms of the gradient of $E$:

    \nabla E = 0.    (2)

This is equivalent to requiring that $\partial E / \partial x_i = 0$ for all $i$. This equation says that there is no way to infinitesimally perturb $x$ to get a different value for $E$; the objective function is locally flat.

Now, however, our goal will be to find extrema subject to a constraint:

    g(x) = 0.    (3)

In other words, we want to find the extrema among the set of points $x$, all of which satisfy $g(x) = 0$. It is sometimes possible to reparameterize the problem to eliminate the constraint (i.e., so that the new parameterization includes all possible solutions to $g(x) = 0$), but this can be awkward in some cases, and impossible in others.

Given the constraint $g(x) = 0$, we are no longer looking for a point where no perturbation in any direction changes $E$. Instead, we need to find a point at which perturbations that satisfy the constraint do not change $E$. This can be expressed by the following condition:

    \nabla E + \lambda \nabla g = 0,    (4)

for some arbitrary scalar value $\lambda$. First note that, for points on the contour $g(x) = 0$, the gradient $\nabla g$ is always perpendicular to the contour (this is a great exercise if you don't remember how to prove it). Hence the expression $\nabla E = -\lambda \nabla g$ says that the gradient of $E$ must be parallel to the gradient of the constraint contour at a possible solution point. In other words, any perturbation to $x$ that changes $E$ also makes the constraint become violated. Perturbations that do not change $g$, and hence still lie on the contour $g(x) = 0$, do not change $E$ either.
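To make condition (4) concrete, consider a small example that is not from the text: minimize $E(x) = x_1^2 + x_2^2$ subject to $g(x) = x_1 + x_2 - 1 = 0$. The constrained minimum is at $(1/2, 1/2)$, where $\nabla E = (1, 1)$ is parallel to $\nabla g = (1, 1)$, so condition (4) holds with $\lambda = -1$. A minimal numpy sketch of this check:

```python
import numpy as np

# Hypothetical example: minimize E(x) = x1^2 + x2^2
# subject to g(x) = x1 + x2 - 1 = 0.
# The constrained minimum is x = (1/2, 1/2).
x = np.array([0.5, 0.5])

grad_E = 2 * x                    # gradient of E at x: (1, 1)
grad_g = np.array([1.0, 1.0])     # gradient of g (constant)

# The gradients are parallel, so grad_E + lam * grad_g = 0 with lam = -1.
lam = -1.0
residual = grad_E + lam * grad_g

print(np.allclose(residual, 0.0))  # True: the stationarity condition holds
```

Note that perturbing $x$ along the constraint (in the direction $(1, -1)$) leaves $E$ unchanged to first order, exactly as the discussion above describes.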
Hence, our goal is to find a point $x$ that satisfies this gradient condition and also $g(x) = 0$.

In the method of Lagrange multipliers, we change the constrained optimization above into an unconstrained optimization with a new objective function, called the Lagrangian:

    L(x, \lambda) = E(x) + \lambda g(x).    (5)

Now, our goal is to find extrema of $L$ with respect to both $x$ and $\lambda$. The key fact is that extrema of the unconstrained objective $L$ are the extrema of the original constrained problem.

Copyright © 2018 Aaron Hertzmann and David J. Fleet
[Figure 1: The set of solutions to $g(x) = 0$ visualized as a curve. The gradient $\nabla g$ is always normal to the curve. At an extremal point, $\nabla E$ is parallel to $\nabla g$. Figure from Pattern Recognition and Machine Learning by Chris Bishop.]

So we have eliminated the nasty constraint by changing the objective function and also introducing new unknowns. To see why this works, let's look at the extrema of $L$. Because $L$ depends on two sets of parameters, its extrema must necessarily satisfy two gradient conditions, i.e.,

    \frac{\partial L}{\partial \lambda} = g(x) = 0    (6)
    \nabla_x L = \nabla E + \lambda \nabla g = 0.    (7)

One can immediately see that these gradient conditions are exactly the conditions given above: the first equation ensures that $g(x)$ is zero, and the second is our requirement that the gradients of $E$ and $g$ must be parallel. Using the Lagrangian is a convenient way to combine these two conditions into one unconstrained optimization.

15.1 Examples

Minimizing on a circle. We begin with a simple geometric example. We have the following constrained optimization problem:

    \operatorname{argmin}_{x,y} \; x + y    (8)
    subject to x^2 + y^2 = 1    (9)

In words, we want to find the point on the unit circle that minimizes $x + y$. The problem is depicted in Fig. 2. Here, $E(x, y) = x + y$ and $g(x, y) = x^2 + y^2 - 1$. The Lagrangian for this problem is given by

    L(x, y, \lambda) = x + y + \lambda (x^2 + y^2 - 1).    (10)
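The stationary points of this Lagrangian can also be found numerically by solving the gradient system $\nabla L = 0$ directly. A sketch, assuming scipy is available (the starting point is chosen to land on the minimizing branch):

```python
import numpy as np
from scipy.optimize import fsolve

# Stationarity of L(x, y, lam) = x + y + lam*(x^2 + y^2 - 1):
#   dL/dx   = 1 + 2*lam*x   = 0
#   dL/dy   = 1 + 2*lam*y   = 0
#   dL/dlam = x^2 + y^2 - 1 = 0
def grad_L(v):
    x, y, lam = v
    return [1 + 2 * lam * x,
            1 + 2 * lam * y,
            x ** 2 + y ** 2 - 1]

# Root-finding on the gradient system; this start converges to the minimum.
x, y, lam = fsolve(grad_L, [-1.0, -1.0, 1.0])
print(x, y)  # both approximately -1/sqrt(2)
```

The other root of the system, $x = y = +1/\sqrt{2}$, is the constrained maximum; which stationary point a numerical solver finds depends on the starting guess.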
[Figure 2: Illustration of the maximization on a circle problem. Image from Wikipedia.]

Setting the gradient of $L$ to zero with respect to $x$, $y$, and $\lambda$ gives us the following system of equations:

    \frac{\partial L}{\partial x} = 1 + 2 \lambda x = 0    (11)
    \frac{\partial L}{\partial y} = 1 + 2 \lambda y = 0    (12)
    \frac{\partial L}{\partial \lambda} = x^2 + y^2 - 1 = 0    (13)

The first two equations imply that $x = y$. Substituting this into the constraint and solving gives two solutions, $x = y = \pm 1/\sqrt{2}$. Substituting these two solutions into the objective, we find that the minimum occurs at $x = y = -1/\sqrt{2}$.

Estimating a Categorical distribution. A Categorical distribution is a distribution over a random variable $c$ with $K$ possible discrete (i.e., disjoint) states or outcomes. Accordingly, it is specified by $K$ probabilities, denoted here by $p_k$:

    P(c = k) \equiv p_k,    (14)

for $k = 1 \ldots K$, and let $p = (p_1, \ldots, p_K)$. For example, in coin-flipping the outcome of a coin flip follows a Bernoulli distribution, which is the special case of a Categorical distribution with $K = 2$, and $c = 1$ might indicate that the coin lands heads side up.

Suppose we observe $N$ independent draws from such a random process, i.e., we observe the sequence $c_{1:N}$. The likelihood of the observed data is therefore the product of the individual likelihoods:

    P(c_{1:N} \mid p) = \prod_{i=1}^{N} P(c_i \mid p) = \prod_{k=1}^{K} p_k^{N_k},    (15)
where $N_k$ is the number of times that $c_i = k$, i.e., the number of occurrences of the $k$-th state.

To estimate this Categorical distribution, we minimize the negative log-likelihood of the observed data:

    \min \; -\sum_k N_k \ln p_k    (16)
    subject to \sum_k p_k = 1, and p_k \ge 0 for all k.    (17)

The constraints here are required to ensure that the $p_k$'s form a valid probability distribution. One way to optimize this problem is to reparameterize the probabilities, i.e., to replace $p_K$ in the likelihood by $1 - \sum_{k=1}^{K-1} p_k$, and then optimize the unconstrained problem in closed form. While this method does work in this case, it breaks the natural symmetry of the problem, resulting in some messy calculations. (Moreover, this method often cannot be generalized to other problems.)

The Lagrangian for this problem is

    L(p, \lambda) = -\sum_k N_k \ln p_k + \lambda \left( \sum_k p_k - 1 \right).    (18)

Here, we omit the constraint that $p_k \ge 0$ and hope that this constraint will be satisfied by the solution (it will). Setting the gradient to zero gives

    \frac{\partial L}{\partial p_k} = -\frac{N_k}{p_k} + \lambda = 0, for all k    (19)
    \frac{\partial L}{\partial \lambda} = \sum_k p_k - 1 = 0    (20)

Multiplying $\partial L / \partial p_k = 0$ by $p_k$ and summing over $k$ yields

    0 = \sum_{k=1}^{K} \left( -N_k + \lambda p_k \right) = -N + \lambda,    (21)

since $\sum_k N_k = N$ and $\sum_k p_k = 1$. Hence, the optimal $\lambda = N$. Substituting this into $\partial L / \partial p_k = 0$ and solving yields the estimated probabilities

    p_k = \frac{N_k}{N},    (22)

which is the familiar maximum-likelihood estimator for a Categorical distribution.

Maximum variance PCA. In the original formulation of PCA, the goal is to find a low-dimensional projection of data points $y_i$. Here, suppose we just want to find a one-dimensional subspace, spanned by a vector $w$. In that case the subspace projection is given by

    x = w^T (y - b).    (23)
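Returning to the Categorical estimate for a moment: the closed-form result $p_k = N_k / N$ in Eqn. (22) is easy to check numerically. A sketch on made-up data (the sequence below is illustrative only), verifying that the closed-form estimate achieves a negative log-likelihood no larger than that of randomly drawn valid distributions:

```python
import numpy as np

# Toy observed sequence of N = 10 draws over K = 3 states (made up here).
c = np.array([0, 0, 1, 2, 2, 2, 0, 1, 0, 2])
K = 3

N_k = np.bincount(c, minlength=K)  # N_k: occurrences of state k
p_hat = N_k / N_k.sum()            # closed-form estimate p_k = N_k / N

def nll(p):
    # negative log-likelihood of the observed counts under distribution p
    return -np.sum(N_k * np.log(p))

# The closed-form estimate minimizes the NLL over the probability simplex,
# so it should beat any other valid distribution.
rng = np.random.default_rng(0)
others = rng.dirichlet(np.ones(K), size=100)
print(p_hat)                                       # [0.4 0.2 0.4]
print(all(nll(p_hat) <= nll(q) for q in others))   # True
```

This is only a spot check against random competitors, not a proof, but it matches the derivation: the constrained stationary point is the global minimizer of the negative log-likelihood on the simplex.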
One way to formulate PCA is as an optimization that finds the direction $w$ maximizing the variance of the projection, subject to the constraint $w^T w = 1$. The Lagrangian can be expressed as

    L(w, b, \lambda) = \frac{1}{N} \sum_i \left( x_i - \frac{1}{N} \sum_j x_j \right)^2 + \lambda (w^T w - 1)
                     = \frac{1}{N} \sum_i \left( w^T (y_i - b) - \frac{1}{N} \sum_j w^T (y_j - b) \right)^2 + \lambda (w^T w - 1)
                     = \frac{1}{N} \sum_i \left( w^T \left( y_i - b - \frac{1}{N} \sum_j (y_j - b) \right) \right)^2 + \lambda (w^T w - 1)
                     = \frac{1}{N} \sum_i \left( w^T (y_i - \bar{y}) \right)^2 + \lambda (w^T w - 1)
                     = \frac{1}{N} \sum_i w^T (y_i - \bar{y}) (y_i - \bar{y})^T w + \lambda (w^T w - 1)
                     = w^T \left( \frac{1}{N} \sum_i (y_i - \bar{y}) (y_i - \bar{y})^T \right) w + \lambda (w^T w - 1),    (24)

where $\bar{y} = \sum_i y_i / N$. Solving $\partial L / \partial w = 0$ yields

    \left( \frac{1}{N} \sum_i (y_i - \bar{y}) (y_i - \bar{y})^T \right) w = \lambda w,    (25)

absorbing the arbitrary sign of $\lambda$. This is the eigenvector equation. That is, $w$ must be an eigenvector of the sample covariance matrix of the $y_i$'s, and $\lambda$ must be the corresponding eigenvalue. In order to determine which one, we can substitute this equality into the Lagrangian to obtain

    L = w^T \lambda w + \lambda (w^T w - 1) = \lambda,    (26)

since $w^T w = 1$. Since our goal is to maximize the variance, we choose the eigenvector $w$ which has the largest eigenvalue $\lambda$.

We have not yet selected $b$, but it is clear that the value of the objective function does not depend on $b$, so we might as well set it to be the mean of the data, $b = \sum_i y_i / N$, which results in the $x_i$'s having zero mean, i.e., $\sum_i x_i / N = 0$.

15.2 Least-Squares PCA in 1D

Let's now consider a different way to formulate PCA. Instead of finding the direction of maximum variance, let's find the one-dimensional projection which minimizes the squared error of the subspace approximation. Specifically, we are given a collection of data vectors $y_{1:N}$, and wish to find
a bias $b$, a single unit vector $w$, and one-dimensional coordinates $x_{1:N}$, to minimize:

    \operatorname{argmin}_{w, x_{1:N}, b} \sum_i \| y_i - (w x_i + b) \|^2    (27)
    subject to w^T w = 1    (28)

Here, $x_i$ specifies a position along a line with direction $w$ and offset $b$ from the origin. The total error is the sum of squared Euclidean distances between the data points $y_i$ and their corresponding points on the model line.¹ The vector $w$ is called the first principal component.

The Lagrangian is:

    L(w, x_{1:N}, b, \lambda) = \sum_i \| y_i - (w x_i + b) \|^2 + \lambda (\|w\|^2 - 1)    (29)

There are several sets of unknowns, and we derive their optimal values each in turn.

Projections ($x_i$). We first derive the projections:

    \frac{\partial L}{\partial x_i} = -2 w^T (y_i - (w x_i + b)) = 0    (30)

Using $w^T w = 1$ and solving for $x_i$ gives:

    x_i = w^T (y_i - b)    (31)

Bias ($b$). We begin by differentiating:

    \frac{\partial L}{\partial b} = -2 \sum_i (y_i - (w x_i + b))    (32)

Substituting in Equation 31 gives

    \frac{\partial L}{\partial b} = -2 \sum_i \left( y_i - (w w^T (y_i - b) + b) \right)
                                  = -2 \sum_i y_i + 2 w w^T \sum_i y_i - 2 N w w^T b + 2 N b
                                  = -2 (I - w w^T) \sum_i y_i + 2 N (I - w w^T) b = 0    (33)

Factoring out $2 (I - w w^T)$ from both terms, one can see that we obtain

    b = \frac{1}{N} \sum_i y_i    (34)

¹ It is important to note that this optimization problem differs in subtle ways from the linear regression earlier in the notes. With linear regression we had multi-dimensional inputs and a scalar output. Here we have vector-valued data $y_i$ and we are trying to find a scalar input $x_i$. In linear regression we minimized the error in the predicted $y$ (i.e., the vertical distance of each point to the curve), while here the error is the Euclidean distance from each 2D data point $y_i$ to a location on the model line.
Basis vector ($w$). To make things simpler, we will define $\tilde{y}_i = y_i - b$ as the mean-centered data points; the reconstructions are then $x_i = w^T \tilde{y}_i$, and the objective function is:

    L = \sum_i \| \tilde{y}_i - w x_i \|^2 + \lambda (w^T w - 1)
      = \sum_i \| \tilde{y}_i - w w^T \tilde{y}_i \|^2 + \lambda (w^T w - 1)
      = \sum_i (\tilde{y}_i - w w^T \tilde{y}_i)^T (\tilde{y}_i - w w^T \tilde{y}_i) + \lambda (w^T w - 1)
      = \sum_i \left( \tilde{y}_i^T \tilde{y}_i - 2 \tilde{y}_i^T w w^T \tilde{y}_i + \tilde{y}_i^T w w^T w w^T \tilde{y}_i \right) + \lambda (w^T w - 1)
      = \sum_i \left( \tilde{y}_i^T \tilde{y}_i - (\tilde{y}_i^T w)^2 \right) + \lambda (w^T w - 1)    (35)

where we have used $w^T w = 1$. We then differentiate and simplify:

    \frac{\partial L}{\partial w} = -2 \sum_i \tilde{y}_i \tilde{y}_i^T w + 2 \lambda w = 0    (36)

We can rearrange this to get:

    \left( \sum_i \tilde{y}_i \tilde{y}_i^T \right) w = \lambda w    (37)

This is exactly the eigenvector equation, meaning that extrema of $L$ occur when $w$ is an eigenvector of the matrix $\sum_i \tilde{y}_i \tilde{y}_i^T$, and $\lambda$ is the corresponding eigenvalue. Multiplying both sides by $1/N$, we see that this matrix has the same eigenvectors as the data covariance:

    \left( \frac{1}{N} \sum_i (y_i - b) (y_i - b)^T \right) w = \frac{\lambda}{N} w    (38)

Now we must determine which eigenvector to use. To this end, we rewrite Eqn. (35) as

    L = \sum_i \tilde{y}_i^T \tilde{y}_i - w^T \left( \sum_i \tilde{y}_i \tilde{y}_i^T \right) w + \lambda (w^T w - 1),    (39)

and substitute in Eqn. (37):

    L = \sum_i \tilde{y}_i^T \tilde{y}_i - \lambda w^T w + \lambda (w^T w - 1)
      = \sum_i \tilde{y}_i^T \tilde{y}_i - \lambda,    (40)

again using $w^T w = 1$. We must pick the eigenvalue $\lambda$ that gives the smallest value of $L$. Hence, we pick the largest eigenvalue, and set $w$ to be the corresponding eigenvector.
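The derivation above can be verified numerically: by Eqn. (40), the reconstruction error for an eigenvector $w$ equals $\sum_i \tilde{y}_i^T \tilde{y}_i - \lambda$, so the eigenvector with the largest eigenvalue gives the smallest error. A sketch on synthetic 2-D data (the data itself is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])  # elongated 2-D cloud

b = Y.mean(axis=0)          # optimal bias: the data mean (Eqn. 34)
Yc = Y - b                  # mean-centered data, y~_i = y_i - b
S = Yc.T @ Yc               # the matrix sum_i y~_i y~_i^T

evals, evecs = np.linalg.eigh(S)  # symmetric eigendecomposition, ascending order

def recon_error(w):
    # sum_i || y~_i - w w^T y~_i ||^2  for a unit vector w
    return np.sum((Yc - np.outer(Yc @ w, w)) ** 2)

w_best = evecs[:, -1]  # eigenvector with the largest eigenvalue

# Eqn. (40): error = sum_i y~_i^T y~_i - lambda, so the top eigenvector wins.
print(np.isclose(recon_error(w_best), np.sum(Yc ** 2) - evals[-1]))  # True
print(recon_error(w_best) < recon_error(evecs[:, 0]))                # True
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the last column of `evecs` is the first principal component.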
15.3 Multiple Constraints

Suppose we wish to optimize with respect to multiple constraints $\{g_k(x)\}$, i.e.,

    \operatorname{argmin}_x E(x)    (41)
    subject to g_k(x) = 0, for k = 1 \ldots K.    (42)

Extrema then occur when

    \nabla E + \sum_k \lambda_k \nabla g_k = 0,    (43)

where we have introduced $K$ Lagrange multipliers $\lambda_k$. The constraints can be combined into a single Lagrangian:

    L(x, \lambda_{1:K}) = E(x) + \sum_k \lambda_k g_k(x).    (44)

15.4 Inequality Constraints

The method can be extended to inequality constraints of the form $g(x) \ge 0$. For a solution to be valid and maximal, there are two possible cases:

- The optimal solution is inside the constraint region, and hence $\nabla E = 0$ and $g(x) > 0$. In this case, the constraint is inactive, meaning that $\lambda$ can be set to zero.
- The optimal solution lies on the boundary $g(x) = 0$. In this case, the gradient $\nabla E$ must point in the opposite direction of the gradient of $g$; otherwise, following the gradient of $E$ would cause $g$ to become positive while also modifying $E$. Hence, we must have $\nabla E = -\lambda \nabla g$ for $\lambda \ge 0$.

Note that, in both cases, $\lambda g(x) = 0$. Hence, we can enforce that one of these cases holds with the following optimization problem:

    \max_{x, \lambda} \; E(x) + \lambda g(x)    (45)
    such that g(x) \ge 0    (46)
              \lambda \ge 0    (47)
              \lambda g(x) = 0    (48)

These are called the Karush-Kuhn-Tucker (KKT) conditions, which generalize the Method of Lagrange Multipliers. When minimizing, we want $\nabla E$ to point in the same direction as $\nabla g$ when on the boundary, and so we minimize $E - \lambda g$ instead of $E + \lambda g$.
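The active-constraint case can be seen in a small numerical example, not from the text, using scipy's SLSQP solver (assumed available). We minimize $E(x) = (x_1 - 2)^2 + (x_2 - 2)^2$ subject to $g(x) = 1 - x_1 - x_2 \ge 0$; the unconstrained minimum $(2, 2)$ violates the constraint, so the solution lies on the boundary, where the constraint is active and its multiplier is nonzero:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize (x1 - 2)^2 + (x2 - 2)^2  subject to  1 - x1 - x2 >= 0.
E = lambda x: (x[0] - 2) ** 2 + (x[1] - 2) ** 2
cons = [{"type": "ineq", "fun": lambda x: 1 - x[0] - x[1]}]

res = minimize(E, x0=[0.0, 0.0], constraints=cons, method="SLSQP")

# The solution sits on the boundary x1 + x2 = 1 (active constraint), at the
# projection of (2, 2) onto that line: (1/2, 1/2).
print(res.x)  # approximately [0.5, 0.5]
```

Had the constraint been, say, $5 - x_1 - x_2 \ge 0$ instead, the unconstrained minimum $(2, 2)$ would satisfy it strictly; the constraint would then be inactive and, per the KKT conditions, its multiplier would be zero.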
[Figure 3: Illustration of the condition for inequality constraints: the solution may lie on the boundary of the constraint region, or in the interior. Figure from Pattern Recognition and Machine Learning by Chris Bishop.]
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More informationFisher Linear Discriminant Analysis
Fsher Lnear Dscrmnant Analyss Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan Fsher lnear
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationMechanics Physics 151
Mechancs Physcs 5 Lecture 0 Canoncal Transformatons (Chapter 9) What We Dd Last Tme Hamlton s Prncple n the Hamltonan formalsm Dervaton was smple δi δ p H(, p, t) = 0 Adonal end-pont constrants δ t ( )
More informationAPPENDIX A Some Linear Algebra
APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,
More information12. The Hamilton-Jacobi Equation Michael Fowler
1. The Hamlton-Jacob Equaton Mchael Fowler Back to Confguraton Space We ve establshed that the acton, regarded as a functon of ts coordnate endponts and tme, satsfes ( ) ( ) S q, t / t+ H qpt,, = 0, and
More informationCS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016
CS 294-128: Algorthms and Uncertanty Lecture 14 Date: October 17, 2016 Instructor: Nkhl Bansal Scrbe: Antares Chen 1 Introducton In ths lecture, we revew results regardng follow the regularzed leader (FTRL.
More informationCHAPTER 7 CONSTRAINED OPTIMIZATION 2: SQP AND GRG
Chapter 7: Constraned Optmzaton CHAPER 7 CONSRAINED OPIMIZAION : SQP AND GRG Introducton In the prevous chapter we eamned the necessary and suffcent condtons for a constraned optmum. We dd not, however,
More informationRadar Trackers. Study Guide. All chapters, problems, examples and page numbers refer to Applied Optimal Estimation, A. Gelb, Ed.
Radar rackers Study Gude All chapters, problems, examples and page numbers refer to Appled Optmal Estmaton, A. Gelb, Ed. Chapter Example.0- Problem Statement wo sensors Each has a sngle nose measurement
More informationSome Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)
Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998
More informationMATH Sensitivity of Eigenvalue Problems
MATH 537- Senstvty of Egenvalue Problems Prelmnares Let A be an n n matrx, and let λ be an egenvalue of A, correspondngly there are vectors x and y such that Ax = λx and y H A = λy H Then x s called A
More informationThe Feynman path integral
The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space
More informationLaboratory 1c: Method of Least Squares
Lab 1c, Least Squares Laboratory 1c: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly
More informationEconomics 101. Lecture 4 - Equilibrium and Efficiency
Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of
More informationECE559VV Project Report
ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate
More informationInner Product. Euclidean Space. Orthonormal Basis. Orthogonal
Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationLecture 6: Support Vector Machines
Lecture 6: Support Vector Machnes Marna Melă mmp@stat.washngton.edu Department of Statstcs Unversty of Washngton November, 2018 Lnear SVM s The margn and the expected classfcaton error Maxmum Margn Lnear
More informationCHAPTER 14 GENERAL PERTURBATION THEORY
CHAPTER 4 GENERAL PERTURBATION THEORY 4 Introducton A partcle n orbt around a pont mass or a sphercally symmetrc mass dstrbuton s movng n a gravtatonal potental of the form GM / r In ths potental t moves
More informationTransfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system
Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng
More information8.6 The Complex Number System
8.6 The Complex Number System Earler n the chapter, we mentoned that we cannot have a negatve under a square root, snce the square of any postve or negatve number s always postve. In ths secton we want
More information