Probabilistic & Unsupervised Learning

Size: px
Start display at page:

Download "Probabilistic & Unsupervised Learning"

Transcription

1 Probablstc & Unsupervsed Learnng Convex Algorthms n Approxmate Inference Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computatonal Neuroscence Unt Unversty College London Term 1, Autumn 2008

2 Convexty A convex functon f : X R s one where for any x, y X and 0 α 1. f(αx + (1 α)y) αf(x) + (1 α)f(y) f(x) af(x)+(1-a)f(y) f(y) f(ax+(1-a)y) x y Convex functons have global mnmum (unless not bounded below) and there are effcent algorthms to optmze them subject to convex constrants. Examples: lnear programs (LP), quadratc programs (QP), second-order cone programs (SOCP), sem-defnte programs (SDP), geometrc programs.

3 Convexty and Approxmate Inference There has been much recent efforts usng convex programmng technques to solve nference problems both exactly and approxmately. Lnear programmng relaxaton as approxmate method to fnd MAP assgnment n Markov random felds. Attractve Markov random felds: bnary case exact and related to a maxmum flowmnmum cut problem n graph theory (a lnear program). Approxmate otherwse. Tree-structured convex upper bounds on the log partton functon (convexfed belef propagaton). Unfed vew of approxmate nference as optmzaton on the margnal polytope. Learnng graphcal models usng maxmum margn prncples and convex approxmate nference....

4 LP Relaxaton for Markov Random Felds Dscrete Markov random felds (MRFs) wth parwse nteractons: p(x) = 1 f j (X, X j ) f (X ) = 1 Z Z exp E j (X, X j ) + (j) (j) E (X ) The problem s to fnd the MAP assgnment X MAP : X MAP = argmax E j (X, X j ) + X (j) E (X ) Reformulate n terms of slghtly dfferent varables: b (x ) = δ(x = x ) b j (x, x j ) = δ(x = x )δ(x j = x j ) wher δ( ) = 1 f argument s true, 0 otherwse. Each b (x ) s an ndcator for whether varable X takes on value x. The ndcator varables need to satsfy certan constrants: b (x ), b j (x, x j ) {0, 1} Indcator varables are bnary varables. x b (x ) = 1 X takes on exactly one value. x j b j (x, x j ) = b (x ) Parwse ndcators are consstent wth sngle-ste ndcators.

5 LP Relaxaton for Markov Random Felds MAP assgnment problem s equvalent to: argmax b j (x, x j )E j (x, x j ) + b (x )E (x ) {b,b j } x,x j x (j) wth constrants:, j, x, x j : b (x ), b j (x, x j ) {0, 1} x b (x ) = 1 x j b j (x, x j ) = b (x ) The lnear programmng relaxaton for MRFs s: argmax b j (x, x j )E j (x, x j ) + {b,b j } x,x j (j) x b (x )E (x ) wth constrants:, j, x, x j : b (x ), b j (x, x j ) [0, 1] x b (x ) = 1 x j b j (x, x j ) = b (x )

6 LP Relaxaton for Markov Random Felds The LP relaxaton s a lnear program whch can be solved effcently. If the soluton s ntegral,.e. each b (x ), b j (x, x j ) {0, 1}, then the soluton corresponds to the MAP soluton X MAP. LP relaxaton s a zero-temperature verson of the Bethe free energy formulaton of loopy BP, where the Bethe entropy term can be gnored. If the MRF s bnary and attractve, then a slghtly dfferent reformulaton of LP relaxaton wll always gve the MAP soluton. Next: we show how to fnd the MAP soluton drectly for bnary attractve MRFs usng network flow.

7 Attractve Bnary MRFs and Max Flow-Mn Cut Bnary MRFs: p(x) = 1 Z exp W j δ(x = X j ) + c X (j) The bnary MRF s attractve f W j 0 for all, j. Neghbourng varables prefer to be n the same state n such MRFs. No loss of generalty; can be equvalently expressed as Boltzmann machnes wth postve nteractons. Many practcal MRFs are attractve, e.g. mage segmentaton, webpage classfcaton. MAP X can be found effcently by convertng problem nto a maxmum flow-mnmum cut program.

8 Attractve Bnary MRFs and Max Flow-Mn Cut The MAP problem: argmax W j δ(x = x j ) + x (j) Construct a network as follows: c x 1. Edges (j) are undrected wth weght λ j = W j ; 2. Add a source s and a snk t node; 3. c >0: Connect the source node to varable wth weght λ s = c ; 4. c j < 0: Connect varable j to the snk node wth weght λ jt = c j A cut s a partton of the nodes nto S and T wth s S and t T. The weght of the cut s Λ(S, T ) = S,j T The mnmum cut problem s to fnd the cut wth mnmum weght. λ j +c - Wj + -cj - j - + +

9 Attractve Bnary MRFs and Max Flow-Mn Cut Identfy an assgnment X = x wth a cut: S= {s} { : x = 1} T = {t} {j : x j = 0} - The weght of the cut s: Λ(S, T ) = (j) W j δ(x x j ) - -cj - + (1 x ) max(0, c ) + + Wj - j + x j max(0, c j ) +c + + j = (j) W j δ(x = x j ) x c + constant + So fndng the mnmum cut corresponds to fndng the MAP assgnment. How do we fnd the mnmum cut? The mnmum cut problem s dual to the maxmum flow problem,.e. fnd the maxmum flow allowable from the source to the snk through the network. Ths can be solved extremely effcently (see wkpeda entry). The framework can be generalzed to general attractve MRFs, but wll not be exact anymore.

10 Convexty and Exponental Famles An exponental famly dstrbuton s parametrzed by a natural parameter vector θ and equvalent by ts mean parameter vector µ. p(x θ) = exp ( θ s(x) Φ(θ) ) where Φ(θ) s the log partton functon Φ(θ) = log x exp ( θ s(x) ) Φ(θ) plays an mportant role n the characterzaton of the exponental famly. For example, t s a moment generatng functon for the dstrbuton: θ Φ(θ) = E θ[s(x)] = µ(θ) = µ 2 θ 2Φ(θ) = V θ[s(x)] The second dervatve s postve sem-defnte, so Φ(θ) s convex n θ.

11 Convexty and Exponental Famles The log partton functon and the negatve entropy are ntmately related. We express the negatve entropy as a functon of the mean parameter: θ µ = Φ(θ) + Ψ(µ) Ψ(µ) = E θ [log p(x θ)] = θ µ Φ(θ) The KL dvergence between two exponental famly dstrbutons p(x θ) and p(x θ ) s: KL(p(X θ ) p(x θ)) =KL(θ θ) = E θ [log p(x θ ) log p(x θ)] = θ µ + Φ(θ) + Ψ(µ ) 0 Ψ(µ ) θ µ Φ(θ) For any par of mean and natural parameter vectors. Because the mnmum of the KL dvergence s zero, and attaned at θ = θ, we have: Ψ(µ) = sup θ θ µ Φ(θ) The constructon on the RHS s called the convex dual of Φ(θ). functons, the dual of the dual s the orgnal functon, thus: For contnuous convex Φ(θ) = sup µ θ µ Ψ(µ)

12 Convexty and Undrected Trees Par-wse MRFs can be parametrzed as follows: p(x) = 1 f (X) f j (X, X j ) Z (j) = exp θ (x )δ(x = x ) + θ j (x, x j )δ(x = x )δ(x j = x j ) Φ(θ) x x,x j So MRFs form an exponental famly, wth natural and mean parameters: θ = [ θ (x ), θ j (x, x j ), j, x, x j ] µ = [ p(x = x ), p(x = x, X j = x j ), j, x, x j ] (j) If the MRF has tree structure T, the negatve entropy s composed of sngle-ste entropes and mutual nformatons on edges: Ψ(µ T ) = E θt log p(x ) p(x, X j ) p(x )p(x j ) (j) T = H(X ) + I(X, X j ) (j) T

13 Convex Upper Bounds on the Log Partton Functon Let us try to upper bound Φ(θ). Imagne a set of spannng trees T for the MRF, each wth ts own parameters θ T, µ T. By paddng entres of off-tree edges wth zero, we can assume that θ T has the same dmensonalty as θ. Suppose also that we have a dstrbuton β over the spannng trees so that E β [θ T ] = θ. Then by the convexty of Φ(θ), Optmzng over all θ T, we get: Φ(θ) = Φ(E β [θ T ]) E β [Φ(θ T )] Φ(θ) nf E β[φ(θ T )] θ T :E β [θ T ]=θ

14 Convex Upper Bounds on the Log Partton Functon Φ(θ) nf E β[φ(θ T )] θ T :E β [θ T ]=θ We solve ths constraned optmzaton problem usng Lagrange multplers: Settng the dervatves wrt θ T to zero, we get: L = E β [Φ(θ T )] µ (E β [θ T ] θ) β(t )µ T β(t )µ(t ) = 0 µ T = µ(t ) where µ(t ) are the Lagrange multplers correspondng to vertces and edges on the tree T. Although there can be many θ T parameters, at optmum they are all constraned: ther correspondng mean parameters are all consstent wth each other and wth µ.

15 Convex Upper Bounds on the Log Partton Functon Φ(θ) sup µ = sup µ = sup µ = sup µ = sup µ nf θ T E β [Φ(θ T )] µ (E β [θ T ] θ) µ θ + E β [Φ(θ T ) θ T µ(t )] µ θ + E β [ Ψ(µ(T ))] µ θ + E β µ θ + Ths s a convexfed Bethe free energy. H µ (X ) H µ (X ) (j) (j) T I µ (X, X j ) β j I µ (X, X j )

16 References Exact Maxmum A Posteror Estmaton for Bnary Images. Greg, Porteous and Seheult, Journal of the Royal Statstcal Socety B, 51(2): , Fast Approxmate Energy Mnmzaton va Graph Cuts. Boykov, Veksler and Zabh, Internatonal Conference on Computer Vson MAP estmaton va agreement on (hyper)trees: Message-passng and lnear-programmng approaches. Wanwrght, Jaakkola and Wllsky, IEEE Transactons on Informaton Theory, 2005, 51(11): Learnng Assocatve Markov Networks. Taskar, Chatalbashev and Koller, Internatonal Conference on Machne Learnng, A New Class of Upper Bounds on the Log Partton Functon. Wanwrght, Jaakkola and Wllsky. IEEE Transactons on Informaton Theory, 2005, 51(7): Graphcal Models, Exponental Famles, and Varatonal Inference. Wanwrght and Jordan. UC Berkeley Dept. of Statstcs, Techncal Report 649, MAP Estmaton, Lnear Programmng and Belef Propagaton wth Convex Free Energes. Wess, Yanover and Meltzer, Uncertanty n Artfcal Intellgence, 2007.

17 References

EM and Structure Learning

EM and Structure Learning EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder

More information

Why BP Works STAT 232B

Why BP Works STAT 232B Why BP Works STAT 232B Free Energes Helmholz & Gbbs Free Energes 1 Dstance between Probablstc Models - K-L dvergence b{ KL b{ p{ = b{ ln { } p{ Here, p{ s the eact ont prob. b{ s the appromaton, called

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Expectation propagation

Expectation propagation Expectaton propagaton Lloyd Ellott May 17, 2011 Suppose p(x) s a pdf and we have a factorzaton p(x) = 1 Z n f (x). (1) =1 Expectaton propagaton s an nference algorthm desgned to approxmate the factors

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far

More information

Density Propagation and Improved Bounds on the Partition Function

Density Propagation and Improved Bounds on the Partition Function Densty Propagaton and Improved Bounds on the Partton Functon Stefano Ermon, Carla P. Gomes Dept. of Computer Scence Cornell Unversty Ithaca NY 14853, U.S.A. Ashsh Sabharwal IBM Watson esearch Ctr. Yorktown

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

Convergent Propagation Algorithms via Oriented Trees

Convergent Propagation Algorithms via Oriented Trees Convergent Propagaton Algorthms va Orented Trees Amr Globerson CSAIL Massachusetts Insttute of Technology Cambrdge, MA 02139 Tomm Jaakkola CSAIL Massachusetts Insttute of Technology Cambrdge, MA 02139

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Approximate inference using conditional entropy decompositions

Approximate inference using conditional entropy decompositions Approxmate nference usng condtonal entropy decompostons Amr Globerson, Tomm Jaakkola Computer Scence and Artfcal Intellgence Laboratory MIT Cambrdge, MA 239 Abstract We ntroduce a novel method for estmatng

More information

Probability-Theoretic Junction Trees

Probability-Theoretic Junction Trees Probablty-Theoretc Juncton Trees Payam Pakzad, (wth Venkat Anantharam, EECS Dept, U.C. Berkeley EPFL, ALGO/LMA Semnar 2/2/2004 Margnalzaton Problem Gven an arbtrary functon of many varables, fnd (some

More information

8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF

8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF 10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

Outline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline

Outline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number

More information

Lecture 10 Support Vector Machines. Oct

Lecture 10 Support Vector Machines. Oct Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?

More information

Gaussian process classification: a message-passing viewpoint

Gaussian process classification: a message-passing viewpoint Gaussan process classfcaton: a message-passng vewpont Flpe Rodrgues fmpr@de.uc.pt November 014 Abstract The goal of ths short paper s to provde a message-passng vewpont of the Expectaton Propagaton EP

More information

Tree Block Coordinate Descent for MAP in Graphical Models

Tree Block Coordinate Descent for MAP in Graphical Models ree Block Coordnate Descent for MAP n Graphcal Models Davd Sontag omm Jaakkola Computer Scence and Artfcal Intellgence Laboratory Massachusetts Insttute of echnology Cambrdge, MA 02139 Abstract A number

More information

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution.

Solutions HW #2. minimize. Ax = b. Give the dual problem, and make the implicit equality constraints explicit. Solution. Solutons HW #2 Dual of general LP. Fnd the dual functon of the LP mnmze subject to c T x Gx h Ax = b. Gve the dual problem, and make the mplct equalty constrants explct. Soluton. 1. The Lagrangan s L(x,

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Neural networks. Nuno Vasconcelos ECE Department, UCSD

Neural networks. Nuno Vasconcelos ECE Department, UCSD Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

error in mean TAP 0.25

error in mean TAP 0.25 Belef Optmzaton for Bnary Networks: A Stable Alternatve to Loopy Belef Propagaton Max Wellng Gatsby Computatonal Neuroscence Unt Unversty College London Queen Square London, WCN AR, U.K. Abstract We present

More information

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012 MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Probabilistic image processing and Bayesian network

Probabilistic image processing and Bayesian network Computatonal Intellgence Semnar (8 November, 2005, Waseda Unversty, Tokyo, Japan) Probablstc mage processng and Bayesan network Kazuyuk Tanaka 1 Graduate School of Informaton Scences, Tohoku Unversty,

More information

Density Propagation and Improved Bounds on the Partition Function

Density Propagation and Improved Bounds on the Partition Function Densty Propagaton and Improved Bounds on the Partton Functon Stefano Ermon, Carla P. Gomes Dept. of Computer Scence Cornell Unversty Ithaca NY 1853, U.S.A. Ashsh Sabharwal IBM Watson esearch Ctr. Yorktown

More information

Lagrange Multipliers Kernel Trick

Lagrange Multipliers Kernel Trick Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Probabilistic Classification: Bayes Classifiers. Lecture 6:

Probabilistic Classification: Bayes Classifiers. Lecture 6: Probablstc Classfcaton: Bayes Classfers Lecture : Classfcaton Models Sam Rowes January, Generatve model: p(x, y) = p(y)p(x y). p(y) are called class prors. p(x y) are called class condtonal feature dstrbutons.

More information

Physical Fluctuomatics Applied Stochastic Process 9th Belief propagation

Physical Fluctuomatics Applied Stochastic Process 9th Belief propagation Physcal luctuomatcs ppled Stochastc Process 9th elef propagaton Kazuyuk Tanaka Graduate School of Informaton Scences Tohoku Unversty kazu@smapp.s.tohoku.ac.jp http://www.smapp.s.tohoku.ac.jp/~kazu/ Stochastc

More information

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014)

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014) 0-80: Advanced Optmzaton and Randomzed Methods Lecture : Convex functons (Jan 5, 04) Lecturer: Suvrt Sra Addr: Carnege Mellon Unversty, Sprng 04 Scrbes: Avnava Dubey, Ahmed Hefny Dsclamer: These notes

More information

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][

More information

An Alternating Direction Method for Dual MAP LP Relaxation

An Alternating Direction Method for Dual MAP LP Relaxation An Alternatng Drecton Method for Dual MAP LP Relaxaton Ofer Mesh and Amr Globerson The School of Computer Scence and Engneerng, The Hebrew Unversty of Jerusalem, Jerusalem, Israel {mesh,gamr}@cs.huj.ac.l

More information

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y

More information

Kristin P. Bennett. Rensselaer Polytechnic Institute

Kristin P. Bennett. Rensselaer Polytechnic Institute Support Vector Machnes and Other Kernel Methods Krstn P. Bennett Mathematcal Scences Department Rensselaer Polytechnc Insttute Support Vector Machnes (SVM) A methodology for nference based on Statstcal

More information

Learning undirected Models. Instructor: Su-In Lee University of Washington, Seattle. Mean Field Approximation

Learning undirected Models. Instructor: Su-In Lee University of Washington, Seattle. Mean Field Approximation Readngs: K&F 0.3, 0.4, 0.6, 0.7 Learnng undrected Models Lecture 8 June, 0 CSE 55, Statstcal Methods, Sprng 0 Instructor: Su-In Lee Unversty of Washngton, Seattle Mean Feld Approxmaton Is the energy functonal

More information

Maximal Margin Classifier

Maximal Margin Classifier CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org

More information

PHYS 705: Classical Mechanics. Calculus of Variations II

PHYS 705: Classical Mechanics. Calculus of Variations II 1 PHYS 705: Classcal Mechancs Calculus of Varatons II 2 Calculus of Varatons: Generalzaton (no constrant yet) Suppose now that F depends on several dependent varables : We need to fnd such that has a statonary

More information

Support Vector Machines. Vibhav Gogate The University of Texas at dallas

Support Vector Machines. Vibhav Gogate The University of Texas at dallas Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

Probabilistic image processing and Bayesian network

Probabilistic image processing and Bayesian network Randomness and Computaton Jont Workshop New Horzons n Computng and Statstcal echancal Approach to Probablstc Informaton Processng (18-21 July, 2005, Senda, Japan) Lecture Note n Tutoral Sessons Probablstc

More information

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 ) Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often

More information

Gaussian Conditional Random Field Network for Semantic Segmentation - Supplementary Material

Gaussian Conditional Random Field Network for Semantic Segmentation - Supplementary Material Gaussan Condtonal Random Feld Networ for Semantc Segmentaton - Supplementary Materal Ravtea Vemulapall, Oncel Tuzel *, Mng-Yu Lu *, and Rama Chellappa Center for Automaton Research, UMIACS, Unversty of

More information

3.1 ML and Empirical Distribution

3.1 ML and Empirical Distribution 67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

Combining Constraint Programming and Integer Programming

Combining Constraint Programming and Integer Programming Combnng Constrant Programmng and Integer Programmng GLOBAL CONSTRAINT OPTIMIZATION COMPONENT Specal Purpose Algorthm mn c T x +(x- 0 ) x( + ()) =1 x( - ()) =1 FILTERING ALGORITHM COST-BASED FILTERING ALGORITHM

More information

Natural Images, Gaussian Mixtures and Dead Leaves Supplementary Material

Natural Images, Gaussian Mixtures and Dead Leaves Supplementary Material Natural Images, Gaussan Mxtures and Dead Leaves Supplementary Materal Danel Zoran Interdscplnary Center for Neural Computaton Hebrew Unversty of Jerusalem Israel http://www.cs.huj.ac.l/ danez Yar Wess

More information

Thermodynamics and statistical mechanics in materials modelling II

Thermodynamics and statistical mechanics in materials modelling II Course MP3 Lecture 8/11/006 (JAE) Course MP3 Lecture 8/11/006 Thermodynamcs and statstcal mechancs n materals modellng II A bref résumé of the physcal concepts used n materals modellng Dr James Ellott.1

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference

More information

CSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing

CSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES SVANTE JANSON Abstract. We gve explct bounds for the tal probabltes for sums of ndependent geometrc or exponental varables, possbly wth dfferent

More information

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/

More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

Hidden Markov Models

Hidden Markov Models CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Strong Markov property: Same assertion holds for stopping times τ.

Strong Markov property: Same assertion holds for stopping times τ. Brownan moton Let X ={X t : t R + } be a real-valued stochastc process: a famlty of real random varables all defned on the same probablty space. Defne F t = nformaton avalable by observng the process up

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

Hopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen

Hopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen Hopfeld networks and Boltzmann machnes Geoffrey Hnton et al. Presented by Tambet Matsen 18.11.2014 Hopfeld network Bnary unts Symmetrcal connectons http://www.nnwj.de/hopfeld-net.html Energy functon The

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

How Strong Are Weak Patents? Joseph Farrell and Carl Shapiro. Supplementary Material Licensing Probabilistic Patents to Cournot Oligopolists *

How Strong Are Weak Patents? Joseph Farrell and Carl Shapiro. Supplementary Material Licensing Probabilistic Patents to Cournot Oligopolists * How Strong Are Weak Patents? Joseph Farrell and Carl Shapro Supplementary Materal Lcensng Probablstc Patents to Cournot Olgopolsts * September 007 We study here the specal case n whch downstream competton

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

Linear Classification, SVMs and Nearest Neighbors

Linear Classification, SVMs and Nearest Neighbors 1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

A quantum-statistical-mechanical extension of Gaussian mixture model

A quantum-statistical-mechanical extension of Gaussian mixture model A quantum-statstcal-mechancal extenson of Gaussan mxture model Kazuyuk Tanaka, and Koj Tsuda 2 Graduate School of Informaton Scences, Tohoku Unversty, 6-3-09 Aramak-aza-aoba, Aoba-ku, Senda 980-8579, Japan

More information

Introduction to Hidden Markov Models

Introduction to Hidden Markov Models Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Lecture 17: Lee-Sidford Barrier

Lecture 17: Lee-Sidford Barrier CSE 599: Interplay between Convex Optmzaton and Geometry Wnter 2018 Lecturer: Yn Tat Lee Lecture 17: Lee-Sdford Barrer Dsclamer: Please tell me any mstake you notced. In ths lecture, we talk about the

More information

Lecture 20: November 7

Lecture 20: November 7 0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

Fuzzy Boundaries of Sample Selection Model

Fuzzy Boundaries of Sample Selection Model Proceedngs of the 9th WSES Internatonal Conference on ppled Mathematcs, Istanbul, Turkey, May 7-9, 006 (pp309-34) Fuzzy Boundares of Sample Selecton Model L. MUHMD SFIIH, NTON BDULBSH KMIL, M. T. BU OSMN

More information

Solutions Homework 4 March 5, 2018

Solutions Homework 4 March 5, 2018 1 Solutons Homework 4 March 5, 018 Soluton to Exercse 5.1.8: Let a IR be a translaton and c > 0 be a re-scalng. ˆb1 (cx + a) cx n + a (cx 1 + a) c x n x 1 cˆb 1 (x), whch shows ˆb 1 s locaton nvarant and

More information

arxiv: v1 [math.oc] 3 Aug 2010

arxiv: v1 [math.oc] 3 Aug 2010 arxv:1008.0549v1 math.oc] 3 Aug 2010 Test Problems n Optmzaton Xn-She Yang Department of Engneerng, Unversty of Cambrdge, Cambrdge CB2 1PZ, UK Abstract Test functons are mportant to valdate new optmzaton

More information

Convex Optimization. Optimality conditions. (EE227BT: UC Berkeley) Lecture 9 (Optimality; Conic duality) 9/25/14. Laurent El Ghaoui.

Convex Optimization. Optimality conditions. (EE227BT: UC Berkeley) Lecture 9 (Optimality; Conic duality) 9/25/14. Laurent El Ghaoui. Convex Optmzaton (EE227BT: UC Berkeley) Lecture 9 (Optmalty; Conc dualty) 9/25/14 Laurent El Ghaou Organsatonal Mdterm: 10/7/14 (1.5 hours, n class, double-sded cheat sheet allowed) Project: Intal proposal

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information