Statistical inference with probabilistic graphical models


Angélique Drémeau, Christophe Schülke, Yingying Xu, Devavrat Shah

September 18, 2014

arXiv: v1 [cs.LG] 17 Sep 2014

These are notes from the lecture of Devavrat Shah given at the autumn school "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", which took place in Les Houches, France, from Monday September 30th, 2013, till Friday October 11th, 2013. The school was organized by Florent Krzakala from UPMC & ENS Paris, Federico Ricci-Tersenghi from La Sapienza Roma, Lenka Zdeborová from CEA Saclay & CNRS, and Riccardo Zecchina from Politecnico Torino.

École Normale Supérieure, France
Université Paris Diderot, France
Tokyo Institute of Technology, Japan
Massachusetts Institute of Technology, USA

Contents

1 Introduction to Graphical Models
  1.1 Inference
  1.2 Graphical models
    1.2.1 Directed GMs
    1.2.2 Undirected GMs
    1.2.3 Cliques
  1.3 Factor graphs
    1.3.1 Image processing
    1.3.2 Crowd-sourcing
  1.4 MAP and MARG
2 Inference Algorithms: Elimination, Junction Tree and Belief Propagation
  2.1 The elimination algorithm
  2.2 Junction Tree property and chordal graphs
    2.2.1 Junction Tree (JCT) property
    2.2.2 Chordal graph
    2.2.3 Procedure to find a JCT
    2.2.4 Tree width
  2.3 Belief propagation (BP) algorithms
    2.3.1 Factor graphs
3 Understanding Belief Propagation
  3.1 Existence of a fixed point
  3.2 Nature of the fixed points
    3.2.1 Background on Nonlinear Optimization
    3.2.2 Belief Propagation as a variational problem
  3.3 Can the fixed points be reached?
    3.3.1 The hardcore model
4 Learning Graphical Models
  4.1 Parameter learning
    4.1.1 Single parameter learning
    4.1.2 Directed graphs
    4.1.3 Undirected graphs
  4.2 Graphical model learning
    4.2.1 Directed graphs
    4.2.2 Undirected graphs
  4.3 Latent Graphical Model learning: the Expectation-maximization algorithm
References

1 Introduction to Graphical Models

1.1 Inference

Consider two random variables A and B with a joint probability distribution $P_{A,B}$. From the observation of the realization of one of those variables, say $B = b$, we want to infer the one that we did not observe. To that end, we compute the conditional probability distribution $P_{A|B}$ and use it to obtain an estimate $\hat{a}(b)$ of $a$. To quantify how good this estimate is, we introduce the error probability

$$P_{\rm error} \triangleq P(A \neq \hat{a}(b) \mid B = b) = 1 - P(A = \hat{a}(b) \mid B = b), \qquad (1)$$

and we can see from the second equality that minimizing this error probability is equivalent to the following maximization problem, called the maximum a posteriori (MAP) problem:

$$\hat{a}(b) = \arg\max_a P_{A|B}(a \mid b). \qquad (2)$$

The problem of computing $P_{A|B}(a \mid b)$ for all $a$ given $b$ is called the marginal (MARG) problem. When the number of random variables increases, the MARG problem becomes difficult, because an exponential number of combinations has to be calculated. Fano's inequality provides an information-theoretical way of gaining insight into how much information about $a$ the knowledge of $b$ can give us:

$$P_{\rm error} \geq \frac{H(A|B) - 1}{\log |\mathcal{A}|}, \qquad (3)$$

with

$$H(A|B) = \sum_b P_B(b) \, H(A|B=b), \qquad H(A|B=b) = \sum_a P_{A|B}(a \mid b) \log \frac{1}{P_{A|B}(a \mid b)}.$$

Fano's inequality only formalizes a theoretical bound; it does not tell us how to actually make an estimation. From a practical point of view, graphical models (GM) constitute a powerful tool allowing us to write algorithms that solve inference problems.
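As a concrete illustration of the MARG and MAP problems (1)-(2), here is a minimal Python sketch that computes them by brute force from a small joint probability table; the table values and alphabet sizes are made up for the example.

```python
import numpy as np

# Hypothetical joint distribution P_{A,B}: rows are values of A, columns values of B.
P_AB = np.array([[0.10, 0.20],
                 [0.30, 0.05],
                 [0.15, 0.20]])

b = 1                                  # observed realization of B
P_b = P_AB[:, b].sum()                 # marginal P_B(b)
P_A_given_b = P_AB[:, b] / P_b         # MARG: P_{A|B}(a|b) for every a

a_hat = int(np.argmax(P_A_given_b))    # MAP estimate, Eq. (2)
p_error = 1.0 - P_A_given_b[a_hat]     # error probability, Eq. (1)

print(P_A_given_b, a_hat, p_error)
```

For N variables on an alphabet of size |X| this brute-force approach needs a table of size |X|^N, which is exactly the exponential blow-up that graphical models help to avoid.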

1.2 Graphical models

1.2.1 Directed GMs

Consider $N$ random variables $X_1, \ldots, X_N$ on a discrete alphabet $\mathcal{X}$, and their joint probability distribution $P_{X_1 \cdots X_N}$. We can always factorize this joint distribution in the following way:

$$P_{X_1 \cdots X_N} = P_{X_1} \, P_{X_2|X_1} \cdots P_{X_N|X_1 \cdots X_{N-1}}, \qquad (4)$$

and represent this factorized form by the following directed graphical model:

Figure 1: A directed graphical model representing the factorized form (4).

In this graphical model, each node is associated with a random variable, and each directed edge represents a conditioning. With the factorization chosen above, we obtain a complicated graphical model, in the sense that it has many edges. A much simpler graphical model would be:

Figure 2: A simpler graphical model representing the factorized form (5).

The latter graphical model corresponds to a factorization in which each of the probability distributions in the product is conditioned on only one variable:

$$P_{X_1 \cdots X_N} = P_{X_1} \, P_{X_2|X_1} \cdots P_{X_N|X_{N-1}}. \qquad (5)$$

In the most general case, we can write a distribution represented by a directed graphical model in the factorized form

$$P_{X_1 \cdots X_N} = \prod_i P_{X_i | X_{\Pi_i}}, \qquad (6)$$

where $X_{\Pi_i}$ is the set containing the parents of $X_i$ (the vertices from which an edge points to $i$). The following notations will hold for the rest of this chapter:

- random variables are capitalized: $X_i$,
- realizations of random variables are lower case: $x_i$,
- a set of random variables $\{X_1, \ldots, X_N\}$ is noted $X$,
- a set of realizations of $X$ is noted $x$,
- the subset of random variables with indices in $S$ is noted $X_S$.
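To make the chain factorization (5) concrete, the short sketch below draws samples from it by ancestral sampling: each variable is drawn from its conditional distribution given its single parent. The binary alphabet, the initial distribution and the transition table are made-up toy values, not anything from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5                                   # number of variables X_1 ... X_N
P_X1 = np.array([0.6, 0.4])             # P_{X_1} on the alphabet {0, 1}
P_next = np.array([[0.9, 0.1],          # P_{X_{i+1} | X_i = 0}
                   [0.2, 0.8]])         # P_{X_{i+1} | X_i = 1}

def sample_chain():
    """Ancestral sampling of (x_1, ..., x_N) from the factorization (5)."""
    x = [rng.choice(2, p=P_X1)]
    for _ in range(N - 1):
        x.append(rng.choice(2, p=P_next[x[-1]]))
    return x

print([sample_chain() for _ in range(3)])
```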

1.2.2 Undirected GMs

Another type of graphical model is the undirected graphical model. In that case, we define the graphical model not through a factorization, but through independence. Let $G(V, E)$ be an undirected graphical model, where $V = \{1, \ldots, N\}$ is the set of vertices and $E \subseteq V \times V$ is the set of edges. Each vertex $i \in V$ of this GM represents one random variable $X_i$, and each edge $(i, j) \in E$ represents a conditional dependence. As the GM is undirected, we identify $(i, j)$ and $(j, i)$. We define

$$N(i) \triangleq \{j \in V \mid (i, j) \in E\}, \qquad (7)$$

the set containing the neighbours of $i$. An undirected graphical model captures the following dependence structure:

$$P_{X_i | X_{V \setminus \{i\}}} = P_{X_i | X_{N(i)}}, \qquad (8)$$

meaning that only variables connected by edges have a conditional dependence.

Let $A \subset V$, $B \subset V$, $C \subset V$. We write $X_A \perp X_B \mid X_C$ if $A$ and $B$ are disjoint and if all paths leading from an element of $A$ to an element of $B$ pass through an element of $C$, as illustrated in Fig. 3. In other words, if we remove $C$, then $A$ and $B$ are disconnected (Fig. 4).

Figure 3: Schematic view of a graphical model in which $X_A \perp X_B \mid X_C$. All paths leading from A to B go through C.

Figure 4: Simple view showing the independence of A and B conditioned on C.

Undirected GMs are also called Markov random fields (MRF).

1.2.3 Cliques

(Definition) A clique is a subgraph of a graph in which all possible pairs of vertices are linked by an edge. A maximal clique is a clique that is contained in no other clique.

Figure 5: In this graphical model, the maximal cliques are $\{1, 2, 4\}$, $\{2, 3, 4\}$ and $\{4, 5\}$.

Theorem 1 ([4]) Given an MRF $G$ and a probability distribution $P_X(x) > 0$. Then

$$P_X(x) \propto \prod_{C \in \mathcal{C}} \phi_C(x_C), \qquad (9)$$

where $\mathcal{C}$ is the set of cliques of $G$.

Proof 1 ([3], for $\mathcal{X} = \{0, 1\}$). We will show the following, equivalent formulation:

$$P_X(x) \propto e^{\sum_{C \in \mathcal{C}} V_C(x_C)}, \qquad (10)$$

by exhibiting the solution

$$V_C(x_C) = \begin{cases} Q(C) & \text{if } x_i = 1 \ \forall i \in C, \\ 0 & \text{otherwise}, \end{cases} \qquad (11)$$

with

$$Q(C) = \sum_{A \subseteq C} (-1)^{|C \setminus A|} \underbrace{\ln P_X\big(x_i = 1 \ \forall i \in A, \ x_{V \setminus A} = 0\big)}_{G(A)}. \qquad (12)$$

Suppose we have an assignment $x$ and write $N(x) \triangleq \{i \mid x_i = 1\}$ for its support. We want to prove that

$$G(N(x)) \triangleq \ln P_X(x) = \sum_{C \in \mathcal{C}} V_C(x_C) = \sum_{C \subseteq N(x)} Q(C). \qquad (13)$$

This is equivalent to proving the two claims:

- C1: for all $S \subseteq V$, $G(S) = \sum_{A \subseteq S} Q(A)$;
- C2: if $A$ is not a clique, then $Q(A) = 0$.

Let us begin by proving C1:

$$\sum_{A \subseteq S} Q(A) = \sum_{A \subseteq S} \sum_{B \subseteq A} (-1)^{|A \setminus B|} G(B) = \sum_{B \subseteq S} G(B) \left[ \sum_{A:\, B \subseteq A \subseteq S} (-1)^{|A \setminus B|} \right], \qquad (14)$$

where we note that the term in brackets is zero except when $B = S$, because we can rewrite it as

$$\sum_{0 \leq l \leq k} \binom{k}{l} (-1)^l = (-1 + 1)^k = 0. \qquad (15)$$

Therefore $G(S) = \sum_{A \subseteq S} Q(A)$.

For C2, suppose that $A$ is not a clique, which allows us to choose $i, j \in A$ with $(i, j) \notin E$. Then

$$Q(A) = \sum_{B \subseteq A \setminus \{i, j\}} (-1)^{|A \setminus B|} \big[ G(B) - G(B + i) + G(B + i + j) - G(B + j) \big].$$

Let us show that the term in brackets is zero by showing

$$G(B + i + j) - G(B + j) = G(B + i) - G(B),$$

or equivalently

$$\ln \frac{P_X(x_B = 1 \,\forall B,\, x_i = 1,\, x_j = 1,\, x_{V \setminus \{i, j, B\}} = 0)}{P_X(x_B = 1 \,\forall B,\, x_i = 0,\, x_j = 1,\, x_{V \setminus \{i, j, B\}} = 0)} = \ln \frac{P_X(x_B = 1 \,\forall B,\, x_i = 1,\, x_j = 0,\, x_{V \setminus \{i, j, B\}} = 0)}{P_X(x_B = 1 \,\forall B,\, x_i = 0,\, x_j = 0,\, x_{V \setminus \{i, j, B\}} = 0)},$$

where $V \setminus \{i, j, B\}$ stands for the set of all vertices except $i$, $j$ and those in $B$. We see that the only difference between the left-hand side and the right-hand side is the value taken by $x_j$. Using Bayes' rule, we can rewrite both the left-hand side and the right-hand side in the form

$$\ln \frac{P_X(X_i = 1 \mid X_j,\, X_B = 1 \,\forall B,\, X_{V \setminus \{i, j, B\}} = 0)}{P_X(X_i = 0 \mid X_j,\, X_B = 1 \,\forall B,\, X_{V \setminus \{i, j, B\}} = 0)},$$

with $X_j$ fixed to 1 on one side and to 0 on the other. As $(i, j) \notin E$, the conditional probabilities on $X_i$ do not depend on the value taken by $X_j$, and therefore the right-hand side equals the left-hand side, $Q(A) = 0$, and C2 is proved.

1.3 Factor graphs

Thanks to the Hammersley-Clifford theorem, we know that we can write a probability distribution corresponding to an MRF $G$ in the following way:

$$P_X(x) \propto \prod_{C \in \mathcal{C}} \phi_C(x_C), \qquad (16)$$

where $\mathcal{C}$ is the set of maximal cliques of $G$. In a more general definition, we can also write

$$P_X(x) \propto \prod_{F \in \mathcal{F}} \phi_F(x_F), \qquad (17)$$

where the collection of factors $\mathcal{F} \subseteq 2^V$ has nothing to do with any underlying graph. In what follows, we give two examples in which introducing factor graphs is a natural approach to an inference problem.

1.3.1 Image processing

We consider an image with binary pixels ($\mathcal{X} = \{-1, 1\}$), and a probability distribution

$$p(x) \propto e^{\sum_{i \in V} \theta_i x_i + \sum_{(i,j) \in E} \theta_{ij} x_i x_j}. \qquad (18)$$

Figure 6: Graphical model representing a 2D image. The fat circles correspond to the pixels of the image $x_k$, and each one is linked to a noisy measurement $y_k$. Adjacent pixels are linked by edges that allow modelling the assumed smoothness of the image.

For each pixel $x_k$, we record a noisy version $y_k$. We consider natural images, in which big jumps in intensity between two neighbouring pixels are unlikely. This can be modelled with

$$a \sum_i x_i y_i + b \sum_{(i,j) \in E} x_i x_j. \qquad (19)$$

This way, the first term pushes $x_k$ to match the measured value $y_k$, while the second term favours piecewise constant images. We can identify $\theta_i \triangleq a y_i$ and $\theta_{ij} \triangleq b$.
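As a small illustration of the model (18)-(19), the sketch below builds a piecewise-constant toy image, adds pixel-flip noise, and runs a few greedy single-pixel updates (iterated conditional modes) that increase the potential $a \sum_i x_i y_i + b \sum_{(i,j)} x_i x_j$. This greedy scheme is only a stand-in for illustration, not one of the inference algorithms developed in these notes, and the grid size, a, b and noise level are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, a, b = 8, 8, 1.0, 0.7                  # toy grid size and coupling strengths (assumed)

x_true = np.ones((H, W))                     # a piecewise constant +/-1 image
x_true[2:6, 2:6] = -1.0
y = np.where(rng.random((H, W)) < 0.15, -x_true, x_true)   # noisy observation of each pixel

def local_field(x, i, j):
    """Sum of the neighbouring pixels of (i, j), i.e. the b-term seen by one pixel."""
    s = 0.0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            s += x[ni, nj]
    return s

x = y.copy()
for _ in range(5):                           # a few full sweeps of greedy updates
    for i in range(H):
        for j in range(W):
            # pick the sign maximizing a*x*y + b*x*(sum of neighbours) for this pixel
            x[i, j] = 1.0 if a * y[i, j] + b * local_field(x, i, j) >= 0 else -1.0

print("pixels recovered:", np.mean(x == x_true))
```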

1.3.2 Crowd-sourcing

Crowd-sourcing is used for tasks that are easy for humans but difficult for machines, and that are as hard to verify as to evaluate. Crowd-sourcing then consists in assigning to each of $M$ human workers a subset of $N$ tasks to evaluate, and in collecting their answers $A$. Each worker is characterized by a parameter $p_i \in \{\frac{1}{2}, 1\}$: either he gives random answers ($p_i = \frac{1}{2}$), or he is fully reliable ($p_i = 1$). The goal is to infer both the correct value $t_j$ of each task and the parameter $p_i$ of each worker. The factor graph corresponding to this problem is represented in Fig. 7.

Figure 7: Graphical model illustrating crowd-sourcing. Each worker is assigned a subset of the tasks for evaluation, and for each of those tasks $a$, his answer $A_a$ is collected.

The conditional probability distribution of $t$ and $p$ knowing the answers $A$ reads

$$P_{t,p|A} \propto P_{A|t,p} \, P_{t,p} \propto P_{A|t,p}, \qquad (20)$$

where we assumed a uniform distribution on the joint probability $P_{t,p}$. Then

$$P_{A|t,p} = \prod_e P_{A_e|t_e, p_e}, \qquad (21)$$

with

$$P_{A_e|t_e, p_e} = \left( \left( \frac{p_e}{1 - p_e} \right)^{A_e t_e} (1 - p_e) \, p_e \right)^{\frac{1}{2}}, \qquad (22)$$

where the product runs over the collected answers $e$, and $t_e$ and $p_e$ denote the task value and the worker parameter involved in answer $e$.
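To check the form of Eq. (22), the following snippet evaluates it for answers and task values in {-1, +1}: it returns $p$ when the answer matches the true value and $1 - p$ otherwise. The example reliabilities are arbitrary (and kept strictly below 1 so that the ratio in (22) stays finite).

```python
import numpy as np

def p_answer(A_e, t_e, p_e):
    """Eq. (22): probability of answer A_e in {-1,+1} given task value t_e and parameter p_e."""
    return np.sqrt((p_e / (1.0 - p_e)) ** (A_e * t_e) * (1.0 - p_e) * p_e)

# A correct answer has probability p_e, an incorrect one 1 - p_e.
print(p_answer(+1, +1, 0.8), p_answer(-1, +1, 0.8))   # -> 0.8, 0.2 (up to rounding)

def log_likelihood(answers, t, p):
    """log P_{A|t,p} of Eq. (21); answers is a list of (worker i, task j, value A_ij)."""
    return sum(np.log(p_answer(A, t[j], p[i])) for i, j, A in answers)
```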

1.4 MAP and MARG

MAP. The MAP problem consists in solving

$$\max_{x \in \{0,1\}^N} \ \sum_i \theta_i x_i + \sum_{(i,j) \in E} \theta_{ij} x_i x_j. \qquad (23)$$

When $\theta_{ij} \to -\infty$, neighbouring nodes can no longer be simultaneously in state 1. This is the hard-core model, which is very hard to solve.

MARG. The MARG problem focuses on the evaluation of marginal probabilities, depending on only one random variable, for instance

$$P_{X_1}(0) = \frac{Z(X_1 = 0)}{Z}, \qquad (24)$$

as well as conditional marginal probabilities:

$$P_{X_2|X_1}(X_2 = 0 \mid X_1 = 0) = \frac{Z(X_1 = 0, X_2 = 0)}{Z(X_1 = 0)}, \qquad (25)$$

$$P_{X_N|X_1 \cdots X_{N-1}}(X_N = 0 \mid X_1 = \cdots = X_{N-1} = 0) = \frac{Z(\text{all } 0)}{Z(\text{all but } X_N \text{ are } 0)}, \qquad (26)$$

$$P_{X_1}(0) \cdots P_{X_N|X_1 \cdots X_{N-1}}(0 \mid 0, \ldots, 0) = \frac{1}{Z}. \qquad (27)$$

Both of these problems are computationally hard. Can we design efficient algorithms to solve them?

2 Inference Algorithms: Elimination, Junction Tree and Belief Propagation

In the MAP and MARG problems described previously, the hardness comes from the fact that, with growing instance size, the number of combinations of variables over which to maximize or marginalize quickly becomes intractable. But when dealing with GMs, one can exploit the structure of the GM in order to reduce the number of combinations that have to be taken into account. Intuitively, the smaller the connectivity of the variables in the GM, the smaller this number of combinations becomes. We will formalize this by introducing the elimination algorithm, which gives us a systematic way of making fewer maximizations/marginalizations on a given graph. We will see how substantially the number of operations is reduced on a graph that is not completely connected.

2.1 The elimination algorithm

We consider the GM in Fig. 8, which is not fully connected. The colored subgraphs represent the maximal cliques.

Figure 8: A GM and its maximal cliques.

Using decomposition (16), we can write

$$P_X(x) \propto \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \, \phi_{245}(x_2, x_4, x_5). \qquad (28)$$

We want to solve the MARG problem on this GM, for example calculating the marginal probability of $x_1$:

$$P_{X_1}(x_1) = \sum_{x_2, x_3, x_4, x_5} P_X(x). \qquad (29)$$

A priori, this requires evaluating $|\mathcal{X}|^4$ terms (each a product of the 3 potentials), for each of the $|\mathcal{X}|$ values of $x_1$: in the end, about $3\,|\mathcal{X}| \times |\mathcal{X}|^4$ operations are needed to calculate this marginal naively. But if we take advantage of the factorized form (28), we can eliminate some of the variables. The elimination process goes along these lines:

$$P_{X_1}(x_1) \propto \sum_{x_2, x_3, x_4, x_5} \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \, \phi_{245}(x_2, x_4, x_5) \qquad (30)$$
$$\propto \sum_{x_2, x_3, x_4} \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \sum_{x_5} \phi_{245}(x_2, x_4, x_5) \qquad (31)$$
$$\propto \sum_{x_2, x_3, x_4} \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \, m_5(x_2, x_4) \qquad (32)$$
$$\propto \sum_{x_2, x_3} \phi_{123}(x_1, x_2, x_3) \sum_{x_4} \phi_{234}(x_2, x_3, x_4) \, m_5(x_2, x_4) \qquad (33)$$
$$\propto \sum_{x_2, x_3} \phi_{123}(x_1, x_2, x_3) \, m_4(x_2, x_3) \qquad (34)$$
$$\propto \sum_{x_2} \left( \sum_{x_3} \phi_{123}(x_1, x_2, x_3) \, m_4(x_2, x_3) \right) \qquad (35)$$
$$\propto \sum_{x_2} m_3(x_1, x_2) \qquad (36)$$
$$\propto m_2(x_1). \qquad (37)$$

With this elimination process, the number of operations necessary to compute the marginal scales as $|\mathcal{X}|^3$ instead of $|\mathcal{X}|^5$, thereby greatly reducing the complexity of the problem by using the structure of the GM. Similarly, we can rewrite the MAP problem as follows:

$$\max_{x_1, x_2, x_3, x_4, x_5} \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \, \phi_{245}(x_2, x_4, x_5) \qquad (38)$$
$$= \max_{x_1, x_2, x_3, x_4} \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \max_{x_5} \phi_{245}(x_2, x_4, x_5) \qquad (39)$$
$$= \max_{x_1, x_2, x_3, x_4} \phi_{123}(x_1, x_2, x_3) \, \phi_{234}(x_2, x_3, x_4) \, m'_5(x_2, x_4) \qquad (40)$$
$$= \max_{x_1, x_2, x_3} \phi_{123}(x_1, x_2, x_3) \max_{x_4} \phi_{234}(x_2, x_3, x_4) \, m'_5(x_2, x_4) \qquad (41)$$
$$= \max_{x_1, x_2, x_3} \phi_{123}(x_1, x_2, x_3) \, m'_4(x_2, x_3) \qquad (42)$$
$$= \max_{x_1, x_2} \left( \max_{x_3} \phi_{123}(x_1, x_2, x_3) \, m'_4(x_2, x_3) \right) \qquad (43)$$
$$= \max_{x_1, x_2} m'_3(x_1, x_2) \qquad (44)$$
$$= \max_{x_1} \left( \max_{x_2} m'_3(x_1, x_2) \right), \qquad (45)$$

leading to

$$x_1^* = \arg\max_{x_1} m'_2(x_1). \qquad (46)$$

Just like for the MARG problem, the complexity is reduced from $|\mathcal{X}|^5$ (a priori) to $|\mathcal{X}|^3$.
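The elimination steps (30)-(37) can be written generically as "sum out one variable, store the result as a new factor". The sketch below does this with numpy for the three-clique example (28), using random tables as potentials and checking the result against the naive sum; the elimination order (5, 4, 3, 2) matches the derivation above, and the alphabet size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # alphabet size |X| (arbitrary)
phi123 = rng.random((K, K, K))               # potentials of the maximal cliques in (28)
phi234 = rng.random((K, K, K))
phi245 = rng.random((K, K, K))

# Naive marginal of x1: sum the full joint over x2..x5 (cost ~ |X|^5).
joint = np.einsum('abc,bcd,bde->abcde', phi123, phi234, phi245)
naive = joint.sum(axis=(1, 2, 3, 4))

# Elimination following (30)-(37): each step sums out a single variable (cost ~ |X|^3).
m5 = phi245.sum(axis=2)                      # m5(x2, x4), Eq. (32)
m4 = np.einsum('bcd,bd->bc', phi234, m5)     # m4(x2, x3), Eq. (34)
m3 = np.einsum('abc,bc->ab', phi123, m4)     # m3(x1, x2), Eq. (36)
m2 = m3.sum(axis=1)                          # m2(x1),     Eq. (37)

assert np.allclose(naive, m2)
print(m2 / m2.sum())                         # normalized marginal P_{X1}
```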

We would like to further reduce the complexity of the marginalizations (in $|\mathcal{X}|^3$). One simple idea would be to reduce the GM to a linear graph, as in Fig. 9.

Figure 9: A linear graph. Each marginalization is computed in $|\mathcal{X}|^2$ operations.

Figure 10: Linear GM obtained by grouping variables.

By grouping variables in the GM (Fig. 8), it is in fact possible to obtain a linear graph, as shown in Fig. 10, with the associated potentials $\phi_{123}(Y_{123})$, $\phi_{234}(Y_{234})$ and $\phi_{245}(Y_{245})$, and consistency constraints imposing that the variables shared by $Y_{123}$ and $Y_{234}$, and by $Y_{234}$ and $Y_{245}$, take the same values. For other GMs, the simplest graph achievable by grouping variables might be a tree instead of a simple chain. But not all groupings of variables will lead to a tree graph that correctly represents the problem. In order for the grouping of variables to be correct, we need to build the tree attached to the maximal cliques, and we have to resort to the Junction Tree property.

2.2 Junction Tree property and chordal graphs

The Junction Tree property allows us to find groupings of variables under which the GM becomes a tree (if such groupings exist). On this tree, the elimination algorithm will need a lower number of maximizations/marginalizations than on the initial GM. However, there is a remaining problem: running the algorithm on the junction tree does not give a straightforward solution to the initial problem, as the variables on the junction tree are groupings of variables of the original problem. This means that further maximizations/marginalizations are then required to obtain a solution in terms of the variables of the initial problem.

2.2.1 Junction Tree (JCT) property

(Definition) A graph $G = (V, E)$ is said to possess the JCT property if it has a Junction Tree $T$, which is defined as follows:

- it is a tree graph such that its nodes are maximal cliques of $G$,
- an edge between nodes of $T$ is allowed only if the corresponding cliques share at least one vertex,
- for any vertex $v$ of $G$, let $C_v$ denote the set of all cliques containing $v$; then $C_v$ forms a connected sub-tree of $T$.

Two questions then arise:

- Do all graphs have a JCT?
- If a graph has a JCT, how can we find it?

2.2.2 Chordal graph

(Definition) A graph is chordal if all of its loops have chords. Fig. 11 gives an illustration of the concept.

Figure 11: The graph on the left is not chordal, the one on the right is.

Proposition 1 $G$ has a junction tree $\Leftrightarrow$ $G$ is a chordal graph.

Proof 2 (of the implication "$G$ chordal $\Rightarrow$ $G$ has a junction tree"). Let us take a chordal graph $G = (V, E)$ that is not complete, as represented in Fig. 12.

Figure 12: On a chordal graph that is not complete, two vertices $a$ and $b$ that are not connected, separated by a subgraph $S$ that is fully connected.

We will use the two following lemmas, which can be shown to be true:

1. If $G$ is chordal, has at least three nodes and is not fully connected, then $V = A \cup B \cup S$, where all three sets are disjoint and $S$ is a fully connected subgraph that separates $A$ from $B$.

2. If $G$ is chordal and has at least two nodes, then $G$ has at least two nodes each with all neighbors connected. Furthermore, if $G$ is not fully connected, then there exist two nonadjacent nodes each with all its neighbors connected.

The property "If $G$ is a chordal graph with $N$ vertices, then it has a junction tree" can be shown by induction on $N$. For $N = 2$, the property is trivial. Now, suppose that the property is true for all integers up to $N$, and consider a chordal graph with $N + 1$ nodes. By the second lemma, $G$ has a node $a$ with all its neighbors connected. Removing it creates a graph $G'$ which is chordal, and therefore has a JCT $T'$. Let $C$ be the maximal clique that $a$ participates in. Either $C \setminus a$ is a maximal-clique node in $T'$, and in this case adding $a$ to this clique node results in a junction tree $T$ for $G$. Or $C \setminus a$ is not a maximal-clique node in $T'$; then $C \setminus a$ must be a subset of a maximal-clique node $D$ of $T'$, and we add $C$ as a new maximal-clique node to $T'$, which we connect to $D$, to obtain a junction tree $T$ for $G$.

2.2.3 Procedure to find a JCT

Let $G$ be the initial GM, and $\mathcal{G}(\mathcal{V}, \mathcal{E})$ be the GM in which $\mathcal{V}$ is the set of maximal cliques of $G$ and $(c_1, c_2) \in \mathcal{E}$ if the maximal cliques $c_1$ and $c_2$ share a vertex. Let us take $e = (c_1, c_2)$ with $c_1, c_2 \in \mathcal{V}$ and define the weight of $e$ as $w_e = |c_1 \cap c_2|$. Then, finding a junction tree of $G$ is equivalent to finding the maximum weight spanning tree of $\mathcal{G}$. Denoting by $T$ the set of edges in a tree, we define the weight of the tree as

$$W(T) = \sum_{e \in T} w_e = \sum_{e = (c_1, c_2) \in T} |c_1 \cap c_2| = \sum_{v \in V} \sum_{e \in T} \mathbb{1}_{\{v \in e\}}, \qquad (47)$$

and we claim that $W(T)$ is maximal when $T$ is a JCT.

Procedure to get the maximum weight spanning tree:

- list all edges in decreasing order of weight,
- include $e$ in $E_1$ if you can (that is, if it does not create a loop);

what we are left with at the end of the algorithm is the maximum weight spanning tree.
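The greedy procedure above is Kruskal's algorithm run on the clique graph with weights $w_e = |c_1 \cap c_2|$. Below is a short sketch of it for the cliques of Fig. 8; the union-find helper and the variable names are ours, not from the notes.

```python
# Maximal cliques of the GM of Fig. 8, used as nodes of the clique graph.
cliques = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({2, 4, 5})]

# Candidate edges: pairs of cliques sharing at least one vertex, weighted by the overlap size.
edges = [(len(a & b), i, j)
         for i, a in enumerate(cliques) for j, b in enumerate(cliques)
         if i < j and a & b]

parent = list(range(len(cliques)))          # simple union-find to detect loops
def find(i):
    while parent[i] != i:
        i = parent[i]
    return i

tree = []
for w, i, j in sorted(edges, reverse=True): # edges in decreasing order of weight
    ri, rj = find(i), find(j)
    if ri != rj:                            # include the edge "if you can"
        parent[ri] = rj
        tree.append((cliques[i], cliques[j], w))

print(tree)   # the max-weight spanning tree, here a junction tree of Fig. 8
```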

2.2.4 Tree width

(Definition) The width of a tree decomposition is the size of its maximal clique minus one.

Toy examples:

Figure 13: tree width = 2 (left), tree width = N (right).

2.3 Belief propagation (BP) algorithms

Until now, everything we have done was exact: the elimination algorithm is an exact algorithm. But as we are interested in efficient algorithms, as opposed to exact algorithms with complexities too high to actually finish in reasonable time, we will from now on introduce approximations.

Figure 14: Message passing on a graph.

Coming back to the elimination algorithm (30)-(37), we can generalize the notations used as

$$m_i(x_j) \propto \sum_{x_i} \phi_i(x_i) \, \phi_{i,j}(x_i, x_j) \prod_k m_k(x_i). \qquad (48)$$

Considering now the same but oriented GM (arrows on the figure above), we get

$$m_{i \to j}(x_j) \propto \sum_{x_i} \phi_i(x_i) \, \phi_{i,j}(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{k \to i}(x_i), \qquad (49)$$

where $N(i)$ is the neighbourhood of $x_i$.

The MARG problem can then be solved using the sum-product procedure.

Sum-product BP

- At $t = 0$, $\forall (i,j) \in E$, $\forall (x_i, x_j) \in \mathcal{X}^2$:
$$m^0_{i \to j}(x_j) = m^0_{j \to i}(x_i) = 1. \qquad (50)$$
- At $t > 0$:
$$m^{t+1}_{i \to j}(x_j) \propto \sum_{x_i} \phi_i(x_i) \, \phi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus j} m^t_{k \to i}(x_i), \qquad (51)$$
$$P^{t+1}_{X_i}(x_i) \propto \phi_i(x_i) \prod_{k \in N(i)} m^{t+1}_{k \to i}(x_i). \qquad (52)$$

While, for the MAP problem, the max-sum procedure is considered.

Max-sum BP

- At $t = 0$:
$$m^0_{i \to j}(x_j) = m^0_{j \to i}(x_i) = 1. \qquad (53)$$
- At $t > 0$:
$$m^{t+1}_{i \to j}(x_j) \propto \max_{x_i} \phi_i(x_i) \, \phi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus j} m^t_{k \to i}(x_i), \qquad (54)$$
$$x^{t+1}_i = \arg\max_{x_i} \phi_i(x_i) \prod_{k \in N(i)} m^{t+1}_{k \to i}(x_i). \qquad (55)$$

Note: here, we use only pairwise potentials. But in the case of larger cliques, we have to consider the JCT and iterate on it. To understand this point, let us apply the sum-product algorithm to factor graphs.

2.3.1 Factor graphs

Figure 15: A simple factor graph.

Considering the general notations in Fig. 15, the sum-product BP algorithm is particularized such that

$$m^{t+1}_{i \to f}(x_i) = \prod_{f' \in N(i) \setminus f} m^t_{f' \to i}(x_i), \qquad (56)$$

$$m^{t+1}_{f \to i}(x_i) = \sum_{x_j,\, j \in N(f) \setminus i} f(x_i, x_{N(f) \setminus i}) \prod_{j \in N(f) \setminus i} m^t_{j \to f}(x_j). \qquad (57)$$
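As a minimal working example of the pairwise sum-product updates (50)-(52), the sketch below runs them on a small tree-structured binary model with randomly chosen potentials, and checks the resulting marginals against brute-force enumeration (BP is exact on trees, as argued next). The graph, potentials and number of iterations are made up for the example.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
K = 2                                              # binary variables
edges = [(0, 1), (1, 2), (1, 3)]                   # a small tree on 4 nodes
phi = [rng.random(K) for _ in range(4)]            # node potentials phi_i
psi = {e: rng.random((K, K)) for e in edges}       # pairwise potentials phi_ij

def pair_pot(i, j, xi, xj):
    return psi[(i, j)][xi, xj] if (i, j) in psi else psi[(j, i)][xj, xi]

neigh = {i: [j for e in edges for j in e if i in e and j != i] for i in range(4)}
msg = {(i, j): np.ones(K) for e in edges for i, j in (e, e[::-1])}   # Eq. (50)

for _ in range(10):                                # Eq. (51): parallel message updates
    new = {}
    for (i, j) in msg:
        m = np.zeros(K)
        for xj in range(K):
            for xi in range(K):
                prod = np.prod([msg[(k, i)][xi] for k in neigh[i] if k != j])
                m[xj] += phi[i][xi] * pair_pot(i, j, xi, xj) * prod
        new[(i, j)] = m / m.sum()
    msg = new

marg = np.array([phi[i] * np.prod([msg[(k, i)] for k in neigh[i]], axis=0) for i in range(4)])
marg /= marg.sum(axis=1, keepdims=True)            # Eq. (52)

# Brute-force check: enumerate all configurations of the pairwise factorization.
brute = np.zeros((4, K))
for x in product(range(K), repeat=4):
    w = np.prod([phi[i][x[i]] for i in range(4)]) * np.prod([pair_pot(i, j, x[i], x[j]) for i, j in edges])
    for i in range(4):
        brute[i, x[i]] += w
brute /= brute.sum(axis=1, keepdims=True)
print(np.allclose(marg, brute))                    # True: BP is exact on this tree
```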

On a tree, the leaves are already sending the right messages at time 1, and after a number of time steps proportional to the tree diameter (the eccentricity of a vertex $v$ in a graph is the maximum distance from $v$ to any other vertex; the diameter of a graph is the maximum eccentricity over all vertices), all messages are correct: the fixed point is reached and the algorithm is exact. Therefore, BP is exact on trees. The JCT property discussed before is therefore useful, and can in certain cases allow us to construct graphs on which we know that BP is exact. However, the problem mentioned before remains: if BP is run on the JCT of a GM, subsequent maximizations/marginalizations will be necessary to recover the solution in terms of the initial problem's variables.

3 Understanding Belief Propagation

We have seen how to use the (exact) elimination algorithm in order to design the BP algorithms max-product and sum-product, which are exact only on trees. The JCT property has taught us how to group variables of an initial loopy GM such that the resulting GM is a tree (when it is possible), on which we can then run BP with a guarantee of an exact result. However, the subsequent operations that are necessary to obtain the solution in terms of the initial problem's variables can be a new source of intractability. Therefore, we would like to know what happens if we use BP on the initial (loopy) graph anyway. The advantage is that BP remains tractable because of the low number of operations per iteration. The danger is that BP is not exact anymore, and therefore we need to ask ourselves the following 3 questions:

1. Does the algorithm have fixed points?
2. What are those fixed points?
3. Are they reached?

The analysis will be made with the sum-product BP algorithm, but could be carried out similarly for the max-product version.

3.1 Existence of a fixed point

The algorithm is of the type

$$m^{t+1} = F(m^t) \quad \text{with} \quad m^t \in [0, 1]^{2|E||\mathcal{X}|}, \qquad (58)$$

and the existence of a fixed point is guaranteed by a fixed-point theorem (Brouwer's theorem applies, since $F$ is a continuous map of this compact convex set into itself).

3.2 Nature of the fixed points

Let us recall that we had factorized $P_X(x)$ in this way:

$$P_X(x) \propto \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j) = \frac{1}{Z} e^{Q(x)}. \qquad (59)$$

The fixed points are solutions of the following variational problem, where

$$P_X = \arg\max_{\mu \in \mathcal{M}(\mathcal{X}^N)} \big\{ \mathbb{E}_\mu[Q(X)] + H(\mu) \big\}, \qquad (60)$$

with

$$\mathbb{E}_\mu[Q(X)] + H(\mu) = \sum_x \mu(x) Q(x) - \sum_x \mu(x) \log \mu(x) \triangleq F(\mu). \qquad (61)$$

Let us find a bound for this quantity. From (59), we get $Q(x) = \log P_X(x) + \log Z$. Then

$$F(\mu) = \sum_x \mu(x) \log Z + \sum_x \mu(x) \log \frac{P_X(x)}{\mu(x)} = \log Z + \mathbb{E}_\mu\left[\log \frac{P_X}{\mu}\right] \leq \log Z + \log \mathbb{E}_\mu\left[\frac{P_X}{\mu}\right] \leq \log Z, \qquad (62)$$

using Jensen's inequality, and the equality is reached when the distributions $\mu$ and $P_X$ are equal. The maximization in equation (60) is made over the space of all possible distributions, which is a far too big search space. But if we restrict ourselves to trees, we know that $\mu$ has the form

$$\mu = \prod_i \mu_i \prod_{(i,j)} \frac{\mu_{ij}}{\mu_i \mu_j}. \qquad (63)$$

BP has taught us that:

$$\mu_i \propto \phi_i \prod_{k \in N(i)} m_{k \to i}, \qquad (64)$$

$$\mu_{ij} \propto \phi_i \, \phi_j \, \psi_{ij} \prod_{k \in N(i) \setminus j} m_{k \to i} \prod_{l \in N(j) \setminus i} m_{l \to j}. \qquad (65)$$

If we marginalize $\mu_{ij}$ with respect to $x_j$, we should obtain $\mu_i$: $\sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i)$. Writing this out, we obtain

$$\phi_i \prod_{k \in N(i) \setminus j} m_{k \to i} \sum_{x_j} \psi_{ij} \, \phi_j \prod_{l \in N(j) \setminus i} m_{l \to j} = \phi_i \prod_{k \in N(i)} m_{k \to i}, \qquad (66)$$

and this should lead us to what we believe the fixed points of BP are. Let us make a recharacterization in terms of the fixed points. In order to lighten notations, we will write $\phi$ instead of $\log \phi$ and $\psi$ instead of $\log \psi$:

$$F_{\rm Bethe}(\mu) = \mathbb{E}_\mu\Big[\sum_i \phi_i + \sum_{i,j} \psi_{ij}\Big] - \mathbb{E}_\mu[\log \mu]. \qquad (67)$$

We now use the following factorization:

$$\mathbb{E}_\mu[\log \mu] = \sum_i \mathbb{E}_{\mu_i}[\log \mu_i] + \sum_{i,j} \Big( \mathbb{E}_{\mu_{ij}}[\log \mu_{ij}] - \mathbb{E}_{\mu_i}[\log \mu_i] - \mathbb{E}_{\mu_j}[\log \mu_j] \Big), \qquad (68)$$

and obtain a new expression for the Bethe free energy:

$$F_{\rm Bethe} = \sum_i (1 - d_i) \Big( H(\mu_i) + \mathbb{E}_{\mu_i}[\phi_i] \Big) + \sum_{i,j} \Big( H(\mu_{ij}) + \mathbb{E}_{\mu_{ij}}[\psi_{ij} + \phi_i + \phi_j] \Big), \qquad (69)$$

where $d_i$ is the degree of node $i$.

3.2.1 Background on Nonlinear Optimization

The problem

$$\max_q G(q) \quad \text{s.t.} \quad Aq = b \qquad (70)$$

can be expressed in a different form by using Lagrange multipliers $\lambda$ and maximizing

$$L(q, \lambda) = G(q) + \lambda^T (Aq - b). \qquad (71)$$

We have $\max_q L(q, \lambda) = M(\lambda) \geq G(q^*)$, and thus $\inf_\lambda M(\lambda) \geq G(q^*)$. Let us look at all $\lambda$ such that $\nabla_q L(q) = 0$. In a sense, BP is finding stationary points of this Lagrangian.

3.2.2 Belief Propagation as a variational problem

In our case, here are the conditions we will enforce with Lagrange multipliers:

$$\mu_{ij}(x_i, x_j) \geq 0, \qquad (72)$$
$$\sum_{x_i} \mu_i(x_i) = 1 \qquad (\text{multiplier } \lambda_i), \qquad (73)$$
$$\sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) \qquad (\text{multiplier } \lambda_{ij}(x_i)), \qquad (74)$$
$$\sum_{x_i} \mu_{ij}(x_i, x_j) = \mu_j(x_j) \qquad (\text{multiplier } \lambda_{ji}(x_j)). \qquad (75)$$

The complete Lagrangian reads

$$L = F_{\rm Bethe}(\mu) + \sum_i \lambda_i \Big( \sum_{x_i} \mu_i(x_i) - 1 \Big) + \sum_{i,j} \sum_{x_i} \Big( \sum_{x_j} \mu_{ij}(x_i, x_j) - \mu_i(x_i) \Big) \lambda_{ij}(x_i) + \sum_{i,j} \sum_{x_j} \Big( \sum_{x_i} \mu_{ij}(x_i, x_j) - \mu_j(x_j) \Big) \lambda_{ji}(x_j). \qquad (76)$$

We need to find the stationary points of this Lagrangian with respect to all variables, which we obtain by setting the partial derivatives to zero:

$$\frac{\partial L}{\partial \mu_i(x_i)} = 0 = (1 - d_i)(1 + \log \mu_i(x_i)) + (1 - d_i) \phi_i(x_i) + \lambda_i - \sum_{j \in N(i)} \lambda_{ij}(x_i), \qquad (77)$$

which imposes the following equality for the distribution $\mu_i$:

$$\mu_i(x_i) \propto e^{\phi_i(x_i) + \frac{1}{d_i - 1} \sum_{j \in N(i)} \lambda_{ij}(x_i)}. \qquad (78)$$

Let us now use the transformation $\lambda_{ij}(x_i) = \sum_{k \in N(i) \setminus j} \log m_{k \to i}(x_i)$, and we obtain

$$\sum_{j \in N(i)} \lambda_{ij}(x_i) = (d_i - 1) \sum_{j \in N(i)} \log m_{j \to i}(x_i). \qquad (79)$$

In the same way, we can show that

$$\frac{\partial L}{\partial \mu_{ij}(x_i, x_j)} = 0 \quad \Rightarrow \quad \mu_{ij}(x_i, x_j) \propto e^{\phi_i(x_i) + \phi_j(x_j) + \psi_{ij}(x_i, x_j) + \lambda_{ij}(x_i) + \lambda_{ji}(x_j)}.$$

This way, we found the distributions $\mu_i$ and $\mu_{ij}$ that are the fixed points of BP.

3.3 Can the fixed points be reached?

We will now try to analyze whether the algorithm can actually reach those fixed points that we have exhibited in the previous section. Let us look at the simple (but loopy) graph in Fig. 16.

Figure 16: A simple loopy graph.

At time $t = 1$, we have

$$m^1_{2 \to 1}(x_1) \propto \sum_{x_2} \phi_2(x_2) \, \phi_{12}(x_1, x_2) \underbrace{m^0_{3 \to 2}(x_2)}_{=1} \qquad (80)$$

and

$$m^1_{3 \to 1}(x_1) \propto \sum_{x_3} \phi_3(x_3) \, \phi_{13}(x_1, x_3), \qquad (81)$$

which also corresponds to the messages of the modified graph in Fig. 17.

Figure 17: Graph seen by BP at time t = 1.

Figure 18: Graph seen by BP at time t = 2.

At time $t = 2$, the messages will be

$$m^2_{2 \to 1}(x_1) \propto \sum_{x_2} \phi_2(x_2) \, \phi_{12}(x_1, x_2) \, m^1_{3 \to 2}(x_2), \qquad (82)$$

corresponding to the messages on the modified graph in Fig. 18. If we increase $t$, the corresponding non-loopy graph gets longer at each time step. Another way of seeing this is by looking at the recursion equation:

$$F_{i \to j}(m^*) = m^*_{i \to j}, \qquad m^{t+1}_{i \to j} = F_{i \to j}(m^t), \qquad (83)$$

$$m^{t+1}_{i \to j} - m^*_{i \to j} = F_{i \to j}(m^t) - F_{i \to j}(m^*) = \nabla F_{i \to j}(\theta)^T (m^t - m^*) \quad \text{(mean value theorem)},$$

$$\|m^{t+1} - m^*\|_\infty \leq \|\nabla F_{i \to j}(\theta)\|_1 \, \|m^t - m^*\|_\infty. \qquad (84)$$

From this last inequality, it is clear that if we can prove that $\|\nabla F_{i \to j}\|_1$ is bounded by some constant $\rho < 1$, the convergence is proved. Unfortunately, it is not often easy to prove such a thing.

3.3.1 The hardcore model

In the hardcore model, we have

$$\phi_i(x_i) = 1 \quad \text{for all } x_i \in \{0, 1\}, \qquad (85)$$

$$\psi_{ij}(x_i, x_j) = 1 - x_i x_j. \qquad (86)$$

Instead of using BP, let us use the following gradient-descent-like algorithm:

$$y(t + 1) = \left[ y(t) + \alpha(t) \frac{\partial F}{\partial y} \right], \qquad (87)$$

where the operator $[\,\cdot\,]$ is a clipping function that ensures that the result stays in the interval $(0, 1)$. This is a projected version of a gradient algorithm with variable step size $\alpha(t)$. Choosing this step size with the rule

$$\alpha(t) = \frac{1}{t^{1/2} \, 2^d}, \qquad (88)$$

we can show that in a time $T \sim n^2 \, 2^d \, \frac{1}{\epsilon^4}$ we will find $F_{\rm Bethe}$ up to $\epsilon$, and convergence is proved.
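The update (87)-(88) is a projected (clipped) gradient step with a decaying step size. The sketch below illustrates only that mechanism on a stand-in concave toy objective, since implementing the actual Bethe free energy of the hardcore model takes more than a few lines; the objective, dimension and constants are invented for the illustration.

```python
import numpy as np

d = 3                                        # stand-in for the degree appearing in (88)
y_star = np.array([0.2, 0.7, 0.9])           # maximizer of the toy objective (assumed known here)

def grad_F(y):
    """Gradient of the toy concave objective F(y) = -||y - y_star||^2 / 2."""
    return y_star - y

y = np.full(3, 0.5)                          # start in the middle of (0, 1)^3
for t in range(1, 2001):
    alpha = 1.0 / (np.sqrt(t) * 2 ** d)      # step size rule of Eq. (88)
    y = np.clip(y + alpha * grad_F(y), 1e-6, 1 - 1e-6)   # clipped update of Eq. (87)

print(y)                                     # close to y_star after enough iterations
```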

4 Learning Graphical Models

In this final section, we focus on the learning problem. In particular, we consider three different cases:

- Parameter learning: given a graph, the parameters are learned from the observation of the entire set of realizations of all random variables.
- Graphical model learning: both the parameters and the graph are learned from the observations of the entire set of realizations of all random variables.
- Latent graphical model learning: the parameters and the graph are learned from partial observations: some of the random variables are assumed to be hidden.

4.1 Parameter learning

4.1.1 Single parameter learning

We consider the following simple setting, where $x$ is a Bernoulli random variable with parameter $\theta$:

$$P_X(x, \theta) = \begin{cases} \theta & \text{if } x = 1, \\ 1 - \theta & \text{if } x = 0. \end{cases} \qquad (89)$$

Given observations $\{x_1, \ldots, x_S\}$, we are interested in the MAP estimation of the parameter $\theta$:

$$\hat{\theta}_{\rm MAP} = \arg\max_{\theta \in [0,1]} P(\theta \mid x_1, \ldots, x_S) = \arg\max_{\theta \in [0,1]} P(x_1, \ldots, x_S \mid \theta) \, p(\theta), \qquad (90)$$

where maximizing $P(x_1, \ldots, x_S \mid \theta)$ leads to the maximum likelihood (ML) estimator $\hat{\theta}_{\rm ML}$ of $\theta$. Denoting $\mathcal{D} \triangleq \{x_1, \ldots, x_S\}$ the observed set of realizations, we define the empirical likelihood as follows:

$$l(\mathcal{D}; \theta) = \frac{1}{S} \log P(x_1, \ldots, x_S \mid \theta) = \frac{1}{S} \sum_i \log P(x_i \mid \theta) = \hat{P}(1) \log \theta + \hat{P}(0) \log(1 - \theta), \qquad (91)$$

with $\hat{P}(1) = \frac{1}{S} \sum_{i=1}^S \mathbb{1}_{\{x_i = 1\}}$. Differentiating (91) and setting the result to zero, we obtain the maximum likelihood estimator $\hat{\theta}_{\rm ML}$:

$$\frac{\partial}{\partial \theta} l(\mathcal{D}; \theta) = \frac{\hat{P}(1)}{\theta} - \frac{\hat{P}(0)}{1 - \theta} = 0 \quad \Rightarrow \quad \hat{\theta}_{\rm ML} = \hat{P}(1). \qquad (92)$$

What is the number of samples $S$ needed to achieve $\hat{\theta}_{\rm ML}(S) \in (1 \pm \epsilon)\theta$? Considering the binomial variable $B(S, \theta)$ (which is the sum of $S$ independently drawn Bernoulli variables from (89)), we can write

$$P\big( |B(S, \theta) - S\theta| > \epsilon S \theta \big) \leq \exp(-\epsilon^2 S \theta) \leq \delta \quad \Rightarrow \quad S \geq \frac{1}{\theta} \frac{1}{\epsilon^2} \log \frac{1}{\delta}. \qquad (93)$$
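A quick numerical illustration of (92)-(93): estimate $\hat{\theta}_{\rm ML} = \hat{P}(1)$ from $S$ Bernoulli samples and observe how the relative error shrinks as $S$ grows; the value of $\theta$ and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3                                        # true Bernoulli parameter (arbitrary)

for S in (100, 10_000, 1_000_000):
    x = rng.random(S) < theta                      # S samples from (89)
    theta_ml = x.mean()                            # ML estimator, Eq. (92): empirical frequency of 1
    print(S, theta_ml, abs(theta_ml - theta) / theta)   # relative error decays roughly like 1/sqrt(S*theta)
```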

4.1.2 Directed graphs

We consider the following setting, in which we have not one but many random variables to learn on a directed graph:

$$P_X(x) = \prod_i P_{X_i|X_{\Pi_i}}(x_i \mid x_{\Pi_i}), \qquad (94)$$

where $\Pi_i$ stands for the parents of node $i$, and $P_{X_i|X_{\Pi_i}}(x_i \mid x_{\Pi_i}) \triangleq \theta_{x_i, x_{\Pi_i}}$. Again, we look at the empirical likelihood

$$l(\mathcal{D}; \theta) = \sum_i \sum_{x_i, x_{\Pi_i}} \hat{P}(x_i, x_{\Pi_i}) \log \theta_{x_i, x_{\Pi_i}} = \sum_i \sum_{x_i, x_{\Pi_i}} \hat{P}(x_i \mid x_{\Pi_i}) \, \hat{P}(x_{\Pi_i}) \left[ \log \frac{\theta_{x_i, x_{\Pi_i}}}{\hat{P}(x_i \mid x_{\Pi_i})} + \log \hat{P}(x_i \mid x_{\Pi_i}) \right], \qquad (95)$$

and set the derivative to zero in order to obtain the ML estimation of $\theta$. The only $\theta$-dependent part is $\mathbb{E}_{\hat{P}}\big[\log \frac{\theta_{x_i, x_{\Pi_i}}}{\hat{P}(x_i | x_{\Pi_i})}\big]$, a negative Kullback-Leibler divergence, which is maximal when

$$\hat{\theta}^{\rm ML}_{x_i, x_{\Pi_i}} = \hat{P}(x_i \mid x_{\Pi_i}). \qquad (96)$$

4.1.3 Undirected graphs

Let us now consider the case of undirected graphs. To reduce the amount of indices, we will write $i$ instead of $x_i$ in the following.

- On a tree: $P_X = \prod_i P_i \prod_{(i,j)} \frac{P_{ij}}{P_i P_j}$; possible estimator: $\prod_i \hat{P}_i \prod_{(i,j)} \frac{\hat{P}_{ij}}{\hat{P}_i \hat{P}_j}$.
- On a chordal graph: $P_X \propto \prod_{C \in \mathcal{C}} \phi_C(x_C)$; possible estimator: $\frac{\prod_C \hat{P}_C(x_C)}{\prod_S \hat{P}_S(x_S)}$, with cliques $C$ and separators $S$.
- On a triangle-free graph: $P_X \propto \prod_i \phi_i \prod_{(i,j)} \psi_{ij}$.

For the last case, let us use the Hammersley-Clifford theorem. Let $\mathcal{X} = \{0, 1\}$. On a triangle-free graph, the maximal clique size is 2, and therefore we can write

$$P_X(x) \propto \exp\left( \sum_i U_i(x_i) + \sum_{i,j} V_{ij}(x_i, x_j) \right). \qquad (97)$$

Using the fact that we have an MRF, we get

$$\frac{P(X_i = 1, X_{\rm rest} = 0)}{P(X_i = 0, X_{\rm rest} = 0)} = \exp(Q(i)). \qquad (98)$$

Also, because on an MRF a variable conditioned on its neighbours is independent of all the others, we can write

$$\frac{P(X_i = 1, X_{\rm rest} = 0)}{P(X_i = 0, X_{\rm rest} = 0)} = \frac{P(X_i = 1, X_{N(i)} = 0)}{P(X_i = 0, X_{N(i)} = 0)}, \qquad (99)$$

and therefore this quantity can be calculated with $2^{|N(i)|+1}$ operations.

4.2 Graphical model learning

What can we learn from a set of realizations of variables when the underlying graph is not known? We now focus on the following maximization:

$$\max_{G, \theta_G} l(\mathcal{D}; G, \theta_G) = \max_G \underbrace{\max_{\theta_G} l(\mathcal{D}; G, \theta_G)}_{\hat{l}(\mathcal{D}; G) \,\triangleq\, l(\mathcal{D}; G, \hat{\theta}_G^{\rm ML})}. \qquad (100)$$

From the previous subsection, we have $\hat{\theta}_G^{\rm ML}$, and therefore we only need to find a way to evaluate the maximization over the possible graphs.

4.2.1 Directed graphs

On a directed graph $G = (i, \Pi_i)$, the empirical likelihood reads

$$\hat{l}(\mathcal{D}; G) = \sum_i \sum_{x_i, x_{\Pi_i}} \hat{P}(x_i, x_{\Pi_i}) \log \hat{P}(x_i \mid x_{\Pi_i}) = \sum_i \sum_{x_i, x_{\Pi_i}} \hat{P}(x_i, x_{\Pi_i}) \log \left[ \frac{\hat{P}(x_i, x_{\Pi_i})}{\hat{P}(x_i) \hat{P}(x_{\Pi_i})} \hat{P}(x_i) \right]$$
$$= \sum_i \sum_{x_i, x_{\Pi_i}} \hat{P}(x_i, x_{\Pi_i}) \log \frac{\hat{P}(x_i, x_{\Pi_i})}{\hat{P}(x_i) \hat{P}(x_{\Pi_i})} + \sum_i \sum_{x_i} \hat{P}(x_i) \log \hat{P}(x_i) = \sum_i I(\hat{X}_i; \hat{X}_{\Pi_i}) - \sum_i H(\hat{X}_i). \qquad (101)$$

Looking for the graph maximizing the empirical likelihood thus consists in maximizing the mutual information: $\max_G \sum_i I(\hat{X}_i; \hat{X}_{\Pi_i})$. In a general setting, this is not easy. Reducing the search space to trees, however, some methods exist, like the Chow-Liu algorithm [1], which relies on the procedure used to get the maximum weight spanning tree (cf. section 2).

4.2.2 Undirected graphs

What can we do in the case of undirected graphs? Let us restrict ourselves to the binary case $x \in \{0, 1\}^N$ and to exponential families:

$$P_X(x) = \exp\left( \sum_i \theta_i x_i + \sum_{i,j} \theta_{ij} x_i x_j - \log Z(\theta) \right). \qquad (102)$$

Again, we denote $\mathcal{D} = \{x_1, \ldots, x_S\}$ the observed dataset, and the log-likelihood can be written as

$$l(\mathcal{D}; \theta) = \underbrace{\sum_i \theta_i \mu_i + \sum_{i,j} \theta_{ij} \mu_{ij}}_{\langle \theta, \mu \rangle} - \log Z(\theta), \qquad (103)$$

where $\mu_i$ and $\mu_{ij}$ are the empirical moments $\frac{1}{S}\sum_s x_i^{(s)}$ and $\frac{1}{S}\sum_s x_i^{(s)} x_j^{(s)}$. As $l(\mathcal{D}; \theta)$ is a concave function of $\theta$, it can be efficiently maximized using a gradient ascent algorithm of the form

$$\theta^{t+1} = \theta^t + \alpha(t) \, \nabla_\theta l(\mathcal{D}; \theta) \big|_{\theta = \theta^t}. \qquad (104)$$

The difficulty in this formula is the evaluation of the gradient:

$$\nabla_{\theta_i} l(\mathcal{D}; \theta) = \mu_i - \mathbb{E}_\theta(X_i), \qquad (105)$$

whose second term is an expectation that has to be calculated, using the sum-product algorithm or with a Markov chain Monte Carlo method for instance.

Another question is whether we will be learning interesting graphs at all. Graph-learning algorithms tend to link variables that are not linked in the real underlying graph. To avoid this, complicated graphs should be penalized by introducing a regularizer. Unfortunately, this is a highly non-trivial problem, and graphical model learning algorithms do not always perform well to this day.
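The sketch below illustrates the gradient ascent (104)-(105) on a tiny fully observed binary model of the form (102), where $N$ is small enough that $\mathbb{E}_\theta[X_i]$ and $\mathbb{E}_\theta[X_i X_j]$ can be computed by exhaustive enumeration instead of sum-product or MCMC; all sizes, ground-truth couplings and step sizes are invented for the example.

```python
import numpy as np
from itertools import product, combinations

rng = np.random.default_rng(0)
N = 4
pairs = list(combinations(range(N), 2))
states = np.array(list(product((0, 1), repeat=N)), dtype=float)   # all 2^N configurations

def moments(theta_i, theta_ij):
    """Exact E_theta[X_i] and E_theta[X_i X_j] by enumerating the 2^N states of (102)."""
    energy = states @ theta_i + sum(theta_ij[k] * states[:, i] * states[:, j]
                                    for k, (i, j) in enumerate(pairs))
    p = np.exp(energy - energy.max())
    p /= p.sum()
    m1 = p @ states
    m2 = np.array([p @ (states[:, i] * states[:, j]) for i, j in pairs])
    return m1, m2

# "Data": empirical moments generated from some ground-truth parameters.
mu1, mu2 = moments(0.5 * rng.normal(size=N), 0.5 * rng.normal(size=len(pairs)))

theta_i, theta_ij = np.zeros(N), np.zeros(len(pairs))
for t in range(3000):                       # gradient ascent (104) with moment-matching gradient (105)
    m1, m2 = moments(theta_i, theta_ij)
    theta_i += 0.2 * (mu1 - m1)
    theta_ij += 0.2 * (mu2 - m2)

print(np.max(np.abs(mu1 - m1)), np.max(np.abs(mu2 - m2)))   # the gradient shrinks toward 0 at the maximum
```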

4.3 Latent Graphical Model learning: the Expectation-Maximization algorithm

In this last case, we distinguish two different kinds of variables: $Y$ stands for the observed variables, $X$ denotes the hidden variables. The parameter $\theta$ is estimated from the observations, namely

$$\hat{\theta}_{\rm ML} = \arg\max_\theta \log P_Y(y; \theta). \qquad (106)$$

The log-likelihood is derived by marginalizing over the hidden variables:

$$l(y; \theta) = \log P_Y(y; \theta) = \log \sum_x P_{X,Y}(x, y; \theta) \qquad (107)$$

$$= \log \sum_x q(x \mid y) \frac{P_{X,Y}(x, y; \theta)}{q(x \mid y)} \qquad (108)$$

$$= \log \mathbb{E}_q\left[ \frac{P}{q} \right] \geq \mathbb{E}_q\left[ \log \frac{P}{q} \right] \triangleq L(q; \theta). \qquad (109)$$

This gives rise to the Expectation-Maximization (EM) algorithm [2].

EM algorithm. Until convergence, iterate between:

- E-step: estimation of the distribution $q$ given $\theta^t$: $q^{t+1} = \arg\max_q L(q; \theta^t)$.
- M-step: estimation of the parameter $\theta$ given $q^{t+1}$: $\theta^{t+1} = \arg\max_\theta L(q^{t+1}; \theta)$.
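As a concrete instance of the E- and M-steps above, here is EM for a toy latent-variable model: each observation is the number of heads in $n$ flips of one of two coins, the coin identity being hidden. The model, its parameters and the data generation are all made up for the illustration; the E-step computes $q(x \mid y)$ (the responsibilities) and the M-step re-estimates $\theta = (\pi, \theta_1, \theta_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, S = 10, 500                                   # flips per trial, number of trials
true_pi, true_th = 0.4, np.array([0.3, 0.8])     # hidden mixing weight and coin biases

z = rng.random(S) < true_pi                      # hidden coin choices
heads = rng.binomial(n, np.where(z, true_th[0], true_th[1]))   # observed head counts

pi, th = 0.5, np.array([0.4, 0.6])               # initial guess for theta = (pi, th1, th2)
for _ in range(200):
    # E-step: posterior responsibility q(x = coin 1 | y) for each trial
    lik1 = pi * th[0] ** heads * (1 - th[0]) ** (n - heads)
    lik2 = (1 - pi) * th[1] ** heads * (1 - th[1]) ** (n - heads)
    r = lik1 / (lik1 + lik2)
    # M-step: re-estimate the parameters from the expected sufficient statistics
    pi = r.mean()
    th = np.array([(r * heads).sum() / (n * r.sum()),
                   ((1 - r) * heads).sum() / (n * (1 - r).sum())])

print(pi, th)   # close to the generating values (up to label switching)
```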

References

[1] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462-467, 1968.

[2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.

[3] G. R. Grimmett. A theorem about random fields. Bulletin of the London Mathematical Society, 5(1):81-84, 1973.

[4] J. M. Hammersley and P. Clifford. Markov fields on finite graphs and lattices. Unpublished manuscript, 1971. Available online: hammfest/hamm-cliff.pdf.


More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Probability Theory (revisited)

Probability Theory (revisited) Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][

More information

Entropy of Markov Information Sources and Capacity of Discrete Input Constrained Channels (from Immink, Coding Techniques for Digital Recorders)

Entropy of Markov Information Sources and Capacity of Discrete Input Constrained Channels (from Immink, Coding Techniques for Digital Recorders) Entropy of Marov Informaton Sources and Capacty of Dscrete Input Constraned Channels (from Immn, Codng Technques for Dgtal Recorders). Entropy of Marov Chans We have already ntroduced the noton of entropy

More information

Excess Error, Approximation Error, and Estimation Error

Excess Error, Approximation Error, and Estimation Error E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple

More information

Lecture 10: May 6, 2013

Lecture 10: May 6, 2013 TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

Integrals and Invariants of Euler-Lagrange Equations

Integrals and Invariants of Euler-Lagrange Equations Lecture 16 Integrals and Invarants of Euler-Lagrange Equatons ME 256 at the Indan Insttute of Scence, Bengaluru Varatonal Methods and Structural Optmzaton G. K. Ananthasuresh Professor, Mechancal Engneerng,

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

Lecture 4: Universal Hash Functions/Streaming Cont d

Lecture 4: Universal Hash Functions/Streaming Cont d CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

PHYS 705: Classical Mechanics. Calculus of Variations II

PHYS 705: Classical Mechanics. Calculus of Variations II 1 PHYS 705: Classcal Mechancs Calculus of Varatons II 2 Calculus of Varatons: Generalzaton (no constrant yet) Suppose now that F depends on several dependent varables : We need to fnd such that has a statonary

More information