Probabilistic & Unsupervised Learning
1 Probabilistic & Unsupervised Learning: Convex Algorithms in Approximate Inference
Yee Whye Teh, ywteh@gatsby.ucl.ac.uk
Gatsby Computational Neuroscience Unit, University College London
Term 1, Autumn 2008
2 Convexity

A convex function $f : X \to \mathbb{R}$ is one where for any $x, y \in X$ and $0 \le \alpha \le 1$,

    $f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y)$

[Figure: the chord from $(x, f(x))$ to $(y, f(y))$ lies above the graph of $f$, illustrating the inequality.]

Convex functions have a global minimum (unless not bounded below), and there are efficient algorithms to optimize them subject to convex constraints.

Examples: linear programs (LP), quadratic programs (QP), second-order cone programs (SOCP), semidefinite programs (SDP), geometric programs.
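A quick numerical sanity check of the definition (a sketch in Python; the softplus function chosen here is just an illustrative convex function, not something from the slides — it reappears later as the Bernoulli log partition function):

```python
import math
import random

# A hypothetical convex function: the softplus f(x) = log(1 + exp(x)).
def f(x):
    return math.log1p(math.exp(x))

random.seed(0)
for _ in range(1000):
    x, y = random.gauss(0, 3), random.gauss(0, 3)
    a = random.random()
    # Convexity: f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
print("convexity inequality holds on all sampled points")
```

Any violation of the inequality would trip the assertion, so running this is a (non-exhaustive) check that softplus is convex.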
3 Convexity and Approximate Inference

There has been much recent effort on using convex programming techniques to solve inference problems, both exactly and approximately:

- Linear programming relaxation as an approximate method to find the MAP assignment in Markov random fields.
- Attractive Markov random fields: the binary case is exact and related to a maximum flow-minimum cut problem in graph theory (a linear program); approximate otherwise.
- Tree-structured convex upper bounds on the log partition function (convexified belief propagation).
- A unified view of approximate inference as optimization on the marginal polytope.
- Learning graphical models using maximum margin principles and convex approximate inference. ...
4 LP Relaxation for Markov Random Fields

Discrete Markov random fields (MRFs) with pairwise interactions:

    $p(X) = \frac{1}{Z} \prod_{(ij)} f_{ij}(X_i, X_j) \prod_i f_i(X_i) = \frac{1}{Z} \exp\Big( \sum_{(ij)} E_{ij}(X_i, X_j) + \sum_i E_i(X_i) \Big)$

The problem is to find the MAP assignment $X^{MAP}$:

    $X^{MAP} = \operatorname{argmax}_X \sum_{(ij)} E_{ij}(X_i, X_j) + \sum_i E_i(X_i)$

Reformulate in terms of slightly different variables:

    $b_i(x_i) = \delta(X_i = x_i) \qquad b_{ij}(x_i, x_j) = \delta(X_i = x_i)\,\delta(X_j = x_j)$

where $\delta(\cdot) = 1$ if its argument is true, 0 otherwise. Each $b_i(x_i)$ is an indicator for whether variable $X_i$ takes on value $x_i$. The indicator variables need to satisfy certain constraints:

- $b_i(x_i), b_{ij}(x_i, x_j) \in \{0, 1\}$: indicator variables are binary.
- $\sum_{x_i} b_i(x_i) = 1$: $X_i$ takes on exactly one value.
- $\sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i)$: pairwise indicators are consistent with single-site indicators.
5 LP Relaxation for Markov Random Fields

The MAP assignment problem is equivalent to the integer program:

    $\operatorname{argmax}_{\{b_i, b_{ij}\}} \sum_{(ij)} \sum_{x_i, x_j} b_{ij}(x_i, x_j)\, E_{ij}(x_i, x_j) + \sum_i \sum_{x_i} b_i(x_i)\, E_i(x_i)$

with constraints, for all $i, j, x_i, x_j$:

    $b_i(x_i), b_{ij}(x_i, x_j) \in \{0, 1\} \qquad \sum_{x_i} b_i(x_i) = 1 \qquad \sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i)$

The linear programming relaxation for MRFs is the same objective with the integrality constraint relaxed:

    $b_i(x_i), b_{ij}(x_i, x_j) \in [0, 1] \qquad \sum_{x_i} b_i(x_i) = 1 \qquad \sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i)$
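The indicator reformulation can be checked by brute force on a toy problem (a sketch with made-up random energies: at integral $b$ the linear objective equals the original energy, so the integer program and direct MAP enumeration agree):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, K = 4, 3                      # 4 variables, 3 states each (toy sizes)
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
E_pair = {e: rng.normal(size=(K, K)) for e in edges}   # E_ij(x_i, x_j)
E_single = rng.normal(size=(n, K))                     # E_i(x_i)

def energy(x):
    return (sum(E_pair[i, j][x[i], x[j]] for (i, j) in edges)
            + sum(E_single[i, x[i]] for i in range(n)))

def ilp_objective(x):
    # Indicator encoding: b_i(x_i) = 1 iff variable i takes value x_i, and
    # b_ij(x_i, x_j) = b_i(x_i) * b_j(x_j).  The linear objective in b then
    # equals the energy of the assignment.
    b = np.zeros((n, K))
    for i in range(n):
        b[i, x[i]] = 1.0
    obj = sum(E_single[i] @ b[i] for i in range(n))
    for (i, j) in edges:
        obj += b[i] @ E_pair[i, j] @ b[j]
    return obj

for x in itertools.product(range(K), repeat=n):
    assert abs(energy(x) - ilp_objective(x)) < 1e-9

x_map = max(itertools.product(range(K), repeat=n), key=energy)
print("MAP assignment:", x_map)
```

Replacing the $\{0,1\}$ constraint with $[0,1]$ and handing the same objective to an LP solver gives the relaxation above; when the solver returns an integral $b$, decoding it recovers `x_map`.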
6 LP Relaxation for Markov Random Fields

The LP relaxation is a linear program, which can be solved efficiently. If the solution is integral, i.e. each $b_i(x_i), b_{ij}(x_i, x_j) \in \{0, 1\}$, then the solution corresponds to the MAP solution $X^{MAP}$.

The LP relaxation is a zero-temperature version of the Bethe free energy formulation of loopy BP, where the Bethe entropy term can be ignored.

If the MRF is binary and attractive, then a slightly different reformulation of the LP relaxation will always give the MAP solution.

Next: we show how to find the MAP solution directly for binary attractive MRFs using network flow.
7 Attractive Binary MRFs and Max Flow-Min Cut

Binary MRFs:

    $p(X) = \frac{1}{Z} \exp\Big( \sum_{(ij)} W_{ij}\,\delta(X_i = X_j) + \sum_i c_i X_i \Big)$

The binary MRF is attractive if $W_{ij} \ge 0$ for all $i, j$. Neighbouring variables prefer to be in the same state in such MRFs. There is no loss of generality; these can be equivalently expressed as Boltzmann machines with positive interactions.

Many practical MRFs are attractive, e.g. image segmentation, webpage classification.

The MAP assignment $X^{MAP}$ can be found efficiently by converting the problem into a maximum flow-minimum cut program.
8 Attractive Binary MRFs and Max Flow-Min Cut

The MAP problem:

    $\operatorname{argmax}_x \sum_{(ij)} W_{ij}\,\delta(x_i = x_j) + \sum_i c_i x_i$

Construct a network as follows:

1. Edges $(ij)$ are undirected with weight $\lambda_{ij} = W_{ij}$;
2. Add a source node $s$ and a sink node $t$;
3. If $c_i > 0$: connect the source node to variable $i$ with weight $\lambda_{si} = c_i$;
4. If $c_j < 0$: connect variable $j$ to the sink node with weight $\lambda_{jt} = -c_j$.

A cut is a partition of the nodes into $S$ and $T$ with $s \in S$ and $t \in T$. The weight of the cut is

    $\Lambda(S, T) = \sum_{i \in S,\, j \in T} \lambda_{ij}$

The minimum cut problem is to find the cut with minimum weight.

[Figure: the constructed network, with source edges of weight $c_i$ for positive biases, sink edges of weight $-c_j$ for negative biases, and internal edges of weight $W_{ij}$.]
9 Attractive Binary MRFs and Max Flow-Min Cut

Identify an assignment $X = x$ with a cut:

    $S = \{s\} \cup \{i : x_i = 1\} \qquad T = \{t\} \cup \{j : x_j = 0\}$

The weight of the cut is:

    $\Lambda(S, T) = \sum_{(ij)} W_{ij}\,\delta(x_i \ne x_j) + \sum_i (1 - x_i) \max(0, c_i) + \sum_j x_j \max(0, -c_j)$
                  $= -\sum_{(ij)} W_{ij}\,\delta(x_i = x_j) - \sum_i x_i c_i + \text{constant}$

So finding the minimum cut corresponds to finding the MAP assignment.

How do we find the minimum cut? The minimum cut problem is dual to the maximum flow problem, i.e. find the maximum flow allowable from the source to the sink through the network. This can be solved extremely efficiently (see the Wikipedia entry on maximum flow).

The framework can be generalized to general attractive MRFs, but will no longer be exact.
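The cut-weight identity above can be verified numerically on a tiny example (a sketch; the weights and biases are made up for illustration):

```python
import itertools

# Toy attractive binary MRF (illustrative numbers only).
W = {(0, 1): 1.0, (1, 2): 0.5, (0, 2): 0.8}    # W_ij >= 0
c = [0.7, -1.2, 0.4]                            # per-variable biases
n = 3

def objective(x):
    # The MAP objective: sum of agreeing-edge weights plus biases.
    return (sum(w for (i, j), w in W.items() if x[i] == x[j])
            + sum(c[i] * x[i] for i in range(n)))

def cut_weight(x):
    # Cut induced by x: S = {s} + {i : x_i = 1}, T = {t} + {j : x_j = 0}.
    lam = sum(w for (i, j), w in W.items() if x[i] != x[j])    # severed W edges
    lam += sum(max(0.0, c[i]) for i in range(n) if x[i] == 0)  # severed s->i
    lam += sum(max(0.0, -c[i]) for i in range(n) if x[i] == 1) # severed i->t
    return lam

const = sum(W.values()) + sum(max(0.0, ci) for ci in c)
for x in itertools.product((0, 1), repeat=n):
    # Cut weight = constant - objective, so min cut <=> MAP.
    assert abs(cut_weight(x) - (const - objective(x))) < 1e-9
print("cut weight = const - objective for all 2^n assignments")
```

Since every cut weight differs from the negated objective only by the same constant, minimizing the cut (e.g. via any max-flow solver) maximizes the objective.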
10 Convexity and Exponential Families

An exponential family distribution is parametrized by a natural parameter vector $\theta$ and, equivalently, by its mean parameter vector $\mu$:

    $p(x|\theta) = \exp\big( \theta^\top s(x) - \Phi(\theta) \big)$

where $\Phi(\theta)$ is the log partition function

    $\Phi(\theta) = \log \sum_x \exp\big( \theta^\top s(x) \big)$

$\Phi(\theta)$ plays an important role in the characterization of the exponential family. For example, it is the cumulant generating function of the sufficient statistics:

    $\nabla_\theta \Phi(\theta) = E_\theta[s(x)] = \mu(\theta)$
    $\nabla^2_\theta \Phi(\theta) = V_\theta[s(x)]$

The second derivative is positive semidefinite, so $\Phi(\theta)$ is convex in $\theta$.
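For the Bernoulli distribution, taken here as a worked example (not spelled out on the slide): $s(x) = x$ with $x \in \{0,1\}$ and $\Phi(\theta) = \log(1 + e^\theta)$, and finite differences confirm that the first two derivatives of $\Phi$ are the mean and the (nonnegative) variance:

```python
import math

# Bernoulli as an exponential family: s(x) = x, x in {0, 1}.
def phi(theta):
    return math.log1p(math.exp(theta))      # Phi(theta) = log(1 + e^theta)

def mean(theta):
    # mu(theta) = E_theta[s(x)] = sigmoid(theta)
    return 1.0 / (1.0 + math.exp(-theta))

theta = 0.3
# First derivative of Phi is the mean parameter ...
h1 = 1e-6
d1 = (phi(theta + h1) - phi(theta - h1)) / (2 * h1)
assert abs(d1 - mean(theta)) < 1e-8
# ... and the second derivative is the variance mu(1 - mu) >= 0,
# which is why Phi is convex.
h2 = 1e-4
d2 = (phi(theta + h2) - 2 * phi(theta) + phi(theta - h2)) / h2**2
mu = mean(theta)
assert abs(d2 - mu * (1 - mu)) < 1e-5
print("dPhi/dtheta = mu and d2Phi/dtheta2 = Var(s) >= 0")
```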
11 Convexity and Exponential Families

The log partition function and the negative entropy are intimately related. We express the negative entropy as a function of the mean parameter:

    $\Psi(\mu) = E_\theta[\log p(x|\theta)] = \theta^\top \mu - \Phi(\theta) \qquad \Leftrightarrow \qquad \theta^\top \mu = \Phi(\theta) + \Psi(\mu)$

The KL divergence between two exponential family distributions $p(x|\theta')$ and $p(x|\theta)$ is:

    $KL(p(X|\theta') \,\|\, p(X|\theta)) = KL(\theta' \| \theta) = E_{\theta'}[\log p(x|\theta') - \log p(x|\theta)] = \Psi(\mu') - \theta^\top \mu' + \Phi(\theta) \ge 0$

so that

    $\Psi(\mu') \ge \theta^\top \mu' - \Phi(\theta)$

for any pair of mean and natural parameter vectors. Because the minimum of the KL divergence is zero, and attained at $\theta = \theta'$, we have:

    $\Psi(\mu) = \sup_\theta\; \theta^\top \mu - \Phi(\theta)$

The construction on the RHS is called the convex dual of $\Phi(\theta)$. For continuous convex functions, the dual of the dual is the original function, thus:

    $\Phi(\theta) = \sup_\mu\; \theta^\top \mu - \Psi(\mu)$
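As a concrete instance of this duality (worked out here for the Bernoulli family; this example is not on the slide): with $s(x) = x$ and $\Phi(\theta) = \log(1 + e^\theta)$,

```latex
\Psi(\mu) = \sup_\theta\; \theta\mu - \log(1 + e^\theta)
% The maximizing theta satisfies mu = e^theta / (1 + e^theta),
% i.e. theta = log( mu / (1 - mu) ).  Substituting back:
\Psi(\mu) = \mu \log\frac{\mu}{1-\mu} - \log\frac{1}{1-\mu}
          = \mu \log \mu + (1 - \mu) \log(1 - \mu)
```

which is exactly the negative entropy of a Bernoulli($\mu$) variable, as the definition of $\Psi$ requires.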
12 Convexity and Undirected Trees

Pairwise MRFs can be parametrized as follows:

    $p(X) = \frac{1}{Z} \prod_i f_i(X_i) \prod_{(ij)} f_{ij}(X_i, X_j)$
         $= \exp\Big( \sum_i \sum_{x_i} \theta_i(x_i)\,\delta(X_i = x_i) + \sum_{(ij)} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\,\delta(X_i = x_i)\,\delta(X_j = x_j) - \Phi(\theta) \Big)$

So MRFs form an exponential family, with natural and mean parameters:

    $\theta = [\,\theta_i(x_i), \theta_{ij}(x_i, x_j) : i, (ij), x_i, x_j\,]$
    $\mu = [\,p(X_i = x_i), p(X_i = x_i, X_j = x_j) : i, (ij), x_i, x_j\,]$

If the MRF has tree structure $T$, the negative entropy is composed of single-site entropies and mutual informations on edges:

    $\Psi(\mu_T) = E_{\theta_T}\Big[ \log \prod_i p(X_i) \prod_{(ij) \in T} \frac{p(X_i, X_j)}{p(X_i)\,p(X_j)} \Big] = -\sum_i H(X_i) + \sum_{(ij) \in T} I(X_i, X_j)$
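The decomposition can be checked numerically on a small tree (a sketch on a 3-node binary chain with made-up random potentials; the negative entropy computed directly matches $-\sum_i H(X_i) + \sum_{(ij)\in T} I(X_i, X_j)$):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(2)
# A 3-node chain (a tree): 0 - 1 - 2, binary variables, random potentials.
th1 = rng.normal(size=(3, 2))                  # theta_i(x_i)
th2 = {(0, 1): rng.normal(size=(2, 2)),        # theta_ij(x_i, x_j)
       (1, 2): rng.normal(size=(2, 2))}

# Exact joint by enumeration (fine at this size).
p = np.zeros((2, 2, 2))
for x in itertools.product((0, 1), repeat=3):
    e = sum(th1[i, x[i]] for i in range(3))
    e += sum(th2[i, j][x[i], x[j]] for (i, j) in th2)
    p[x] = math.exp(e)
p /= p.sum()

def H(q):
    # Entropy of a marginal table.
    return -sum(qi * math.log(qi) for qi in q.ravel() if qi > 0)

# Negative entropy computed directly ...
psi_direct = -H(p)
# ... and via the tree decomposition -sum_i H(X_i) + sum_(ij) I(X_i, X_j).
marg1 = [p.sum(axis=tuple(k for k in range(3) if k != i)) for i in range(3)]
psi_tree = -sum(H(m) for m in marg1)
for (i, j) in th2:
    pij = p.sum(axis=tuple(k for k in range(3) if k not in (i, j)))
    I = H(marg1[i]) + H(marg1[j]) - H(pij)     # mutual information on edge
    psi_tree += I
assert abs(psi_direct - psi_tree) < 1e-10
print("tree decomposition of the negative entropy verified")
```

The identity holds exactly here because the joint factorizes over the chain; on a loopy graph the same expression would only be an approximation.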
13 Convex Upper Bounds on the Log Partition Function

Let us try to upper bound $\Phi(\theta)$. Imagine a set of spanning trees $T$ for the MRF, each with its own parameters $\theta_T, \mu_T$. By padding the entries of off-tree edges with zeros, we can assume that $\theta_T$ has the same dimensionality as $\theta$.

Suppose also that we have a distribution $\beta$ over the spanning trees such that $E_\beta[\theta_T] = \theta$. Then by the convexity of $\Phi(\theta)$,

    $\Phi(\theta) = \Phi(E_\beta[\theta_T]) \le E_\beta[\Phi(\theta_T)]$

Optimizing over all $\theta_T$, we get:

    $\Phi(\theta) \le \inf_{\theta_T : E_\beta[\theta_T] = \theta} E_\beta[\Phi(\theta_T)]$
14 Convex Upper Bounds on the Log Partition Function

    $\Phi(\theta) \le \inf_{\theta_T : E_\beta[\theta_T] = \theta} E_\beta[\Phi(\theta_T)]$

We solve this constrained optimization problem using Lagrange multipliers:

    $L = E_\beta[\Phi(\theta_T)] - \mu^\top (E_\beta[\theta_T] - \theta)$

Setting the derivatives with respect to $\theta_T$ to zero, we get:

    $\beta(T)\,\mu_T - \beta(T)\,\mu(T) = 0 \qquad \Rightarrow \qquad \mu_T = \mu(T)$

where $\mu(T)$ are the Lagrange multipliers corresponding to the vertices and edges on tree $T$. Although there can be many $\theta_T$ parameters, at the optimum they are all constrained: their corresponding mean parameters are all consistent with each other and with $\mu$.
15 Convex Upper Bounds on the Log Partition Function

    $\Phi(\theta) \le \sup_\mu \inf_{\theta_T}\; E_\beta[\Phi(\theta_T)] - \mu^\top (E_\beta[\theta_T] - \theta)$
           $= \sup_\mu\; \mu^\top \theta + E_\beta\big[\Phi(\theta_T) - \theta_T^\top \mu(T)\big]$
           $= \sup_\mu\; \mu^\top \theta + E_\beta\big[-\Psi(\mu(T))\big]$
           $= \sup_\mu\; \mu^\top \theta + E_\beta\Big[ \sum_i H_\mu(X_i) - \sum_{(ij) \in T} I_\mu(X_i, X_j) \Big]$
           $= \sup_\mu\; \mu^\top \theta + \sum_i H_\mu(X_i) - \sum_{(ij)} \beta_{ij}\, I_\mu(X_i, X_j)$

where $\beta_{ij}$ is the probability that edge $(ij)$ appears in a tree drawn from $\beta$. This is a convexified Bethe free energy.
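A minimal numerical check of the spanning-tree bound (a sketch, assuming a binary 3-cycle with made-up potentials and a uniform $\beta$ over its three spanning trees, so each edge appears with probability $2/3$ and kept edge parameters are rescaled by $3/2$ to satisfy $E_\beta[\theta_T] = \theta$):

```python
import itertools
import math
import random

random.seed(3)
# Binary MRF on a 3-cycle (toy potentials, made up for illustration).
edges = [(0, 1), (1, 2), (0, 2)]
th1 = [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)]
th2 = {e: [[random.gauss(0, 1) for _ in range(2)] for _ in range(2)]
       for e in edges}

def log_Z(active_edges, scale):
    # Brute-force log partition function, with the pairwise parameters of
    # the kept edges multiplied by `scale`.
    total = 0.0
    for x in itertools.product((0, 1), repeat=3):
        e = sum(th1[i][x[i]] for i in range(3))
        e += scale * sum(th2[i, j][x[i]][x[j]] for (i, j) in active_edges)
        total += math.exp(e)
    return math.log(total)

phi_exact = log_Z(edges, 1.0)
# Uniform beta over the three spanning trees (drop one edge each): every
# edge appears with probability 2/3, so scaling kept edges by 3/2 gives
# E_beta[theta_T] = theta.
trees = [[e for e in edges if e != drop] for drop in edges]
bound = sum(log_Z(t, 1.5) for t in trees) / len(trees)
assert phi_exact <= bound + 1e-12
print(f"exact log Z = {phi_exact:.4f} <= tree bound = {bound:.4f}")
```

This is the bound $\Phi(\theta) \le E_\beta[\Phi(\theta_T)]$ before the optimization over $\theta_T$; tightening it is exactly what the Lagrangian derivation above does.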
16 References

- Exact Maximum A Posteriori Estimation for Binary Images. Greig, Porteous and Seheult. Journal of the Royal Statistical Society B, 51(2).
- Fast Approximate Energy Minimization via Graph Cuts. Boykov, Veksler and Zabih. International Conference on Computer Vision.
- MAP Estimation via Agreement on (Hyper)trees: Message-Passing and Linear-Programming Approaches. Wainwright, Jaakkola and Willsky. IEEE Transactions on Information Theory, 51(11), 2005.
- Learning Associative Markov Networks. Taskar, Chatalbashev and Koller. International Conference on Machine Learning.
- A New Class of Upper Bounds on the Log Partition Function. Wainwright, Jaakkola and Willsky. IEEE Transactions on Information Theory, 51(7), 2005.
- Graphical Models, Exponential Families, and Variational Inference. Wainwright and Jordan. UC Berkeley Dept. of Statistics, Technical Report 649.
- MAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies. Weiss, Yanover and Meltzer. Uncertainty in Artificial Intelligence, 2007.
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationLecture 17: Lee-Sidford Barrier
CSE 599: Interplay between Convex Optmzaton and Geometry Wnter 2018 Lecturer: Yn Tat Lee Lecture 17: Lee-Sdford Barrer Dsclamer: Please tell me any mstake you notced. In ths lecture, we talk about the
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationLecture 3: Dual problems and Kernels
Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationFuzzy Boundaries of Sample Selection Model
Proceedngs of the 9th WSES Internatonal Conference on ppled Mathematcs, Istanbul, Turkey, May 7-9, 006 (pp309-34) Fuzzy Boundares of Sample Selecton Model L. MUHMD SFIIH, NTON BDULBSH KMIL, M. T. BU OSMN
More informationSolutions Homework 4 March 5, 2018
1 Solutons Homework 4 March 5, 018 Soluton to Exercse 5.1.8: Let a IR be a translaton and c > 0 be a re-scalng. ˆb1 (cx + a) cx n + a (cx 1 + a) c x n x 1 cˆb 1 (x), whch shows ˆb 1 s locaton nvarant and
More informationarxiv: v1 [math.oc] 3 Aug 2010
arxv:1008.0549v1 math.oc] 3 Aug 2010 Test Problems n Optmzaton Xn-She Yang Department of Engneerng, Unversty of Cambrdge, Cambrdge CB2 1PZ, UK Abstract Test functons are mportant to valdate new optmzaton
More informationConvex Optimization. Optimality conditions. (EE227BT: UC Berkeley) Lecture 9 (Optimality; Conic duality) 9/25/14. Laurent El Ghaoui.
Convex Optmzaton (EE227BT: UC Berkeley) Lecture 9 (Optmalty; Conc dualty) 9/25/14 Laurent El Ghaou Organsatonal Mdterm: 10/7/14 (1.5 hours, n class, double-sded cheat sheet allowed) Project: Intal proposal
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More information
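The indicator-variable reformulation above can be checked concretely: for any integer assignment, the b variables are 0/1, satisfy the normalization and pairwise-consistency constraints, and make the objective a linear function whose value equals the original sum of energies. The sketch below does this for a tiny 3-variable binary chain MRF; the energy tables are hypothetical, chosen only for illustration, and `energy`, `linear_objective` are not part of the lecture's notation.

```python
import itertools

# A tiny pairwise MRF on a chain of 3 binary variables (hypothetical energies).
# E_single[i][x_i] plays the role of E_i(X_i); E_pair[(i,j)][x_i][x_j] of E_ij(X_i, X_j).
E_single = [
    [0.0, 1.0],   # E_1
    [0.5, 0.0],   # E_2
    [0.0, 0.3],   # E_3
]
E_pair = {
    (0, 1): [[1.0, 0.0], [0.0, 1.0]],  # attractive coupling between X_1, X_2
    (1, 2): [[0.8, 0.0], [0.0, 0.8]],  # attractive coupling between X_2, X_3
}

def energy(x):
    """Total objective: sum_ij E_ij(x_i, x_j) + sum_i E_i(x_i)."""
    s = sum(E_single[i][xi] for i, xi in enumerate(x))
    s += sum(E_pair[(i, j)][x[i]][x[j]] for (i, j) in E_pair)
    return s

# Exhaustive MAP: argmax of the energy over all 2^3 assignments.
x_map = max(itertools.product([0, 1], repeat=3), key=energy)

def linear_objective(x):
    """Evaluate the objective via the 0/1 indicator variables b_i, b_ij,
    checking the consistency constraint sum_{x_j} b_ij(x_i, x_j) = b_i(x_i)."""
    b_single = {(i, v): int(x[i] == v) for i in range(3) for v in (0, 1)}
    b_pair = {(i, j, u, v): int(x[i] == u and x[j] == v)
              for (i, j) in E_pair for u in (0, 1) for v in (0, 1)}
    for (i, j) in E_pair:
        for u in (0, 1):
            assert sum(b_pair[(i, j, u, v)] for v in (0, 1)) == b_single[(i, u)]
    obj = sum(b_single[(i, v)] * E_single[i][v]
              for i in range(3) for v in (0, 1))
    obj += sum(b_pair[(i, j, u, v)] * E_pair[(i, j)][u][v]
               for (i, j) in E_pair for u in (0, 1) for v in (0, 1))
    return obj

# On integer assignments, the linear objective in b agrees with the energy.
assert abs(linear_objective(x_map) - energy(x_map)) < 1e-9
print(x_map, energy(x_map))
```

Relaxing the integrality constraint b ∈ {0, 1} to b ∈ [0, 1], while keeping the normalization and consistency constraints, turns this integer program into the linear-programming relaxation; its optimum upper-bounds the MAP value and is attained at an integer vertex in the exact cases (e.g. binary attractive MRFs) mentioned earlier.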