Information Geometry of Gibbs Sampler
Kazuya Takabatake
Neuroscience Research Institute
AIST Central 2, Umezono 1-1-1, Tsukuba, JAPAN

Abstract: - This paper shows some information geometrical properties of the Gibbs sampler, which is one of the Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler belongs to the class of single-component-update MCMC, in which two or more components are never updated simultaneously. When a component is updated in a single-component-update MCMC, the chain's distribution moves along an m-flat manifold. In the case of the Gibbs sampler, the distribution moves to the point on the m-flat manifold which minimizes the KL divergence to the target distribution. From this viewpoint, the Gibbs sampler is interpreted as a greedy algorithm which minimizes the KL divergence in each update.

Key-Words: - Gibbs sampler, Markov chain Monte Carlo, information geometry, KL-divergence, greedy algorithms, convergence

1 Introduction

The most straightforward way to generate random numbers according to a given target distribution is to use the table of the target distribution, which consists of the probabilities of all values the random variable takes. However, the size of this table is proportional to the exponential of the dimension of the random variable, so this straightforward approach is not practical for generating high-dimensional random variables. Markov chain Monte Carlo (MCMC) [7] is a class of random number generators which use Markov chains whose distributions converge to the target distribution.

Let {X^(t)} (t = 0, 1, ...) be a Markov chain, V be the range of values X^(t) takes¹, p^(t) be the distribution of X^(t), and D_V be the set of distributions on V. Any distribution q ∈ D_V can be represented in a vector form²

    q = (q(x)) (x ∈ V).    (1)

Let W^(t) be the transition³ from X^(t-1) to X^(t) (from p^(t-1) to p^(t)). W^(t) can be represented in a matrix form

¹ For the sake of simplicity, all random variables in this paper take finite discrete values.
² In this paper, symbols for distributions or transitions also denote their vector or matrix forms.
³ Any transition is a linear operator D_V → D_V.
    W^(t) = (W^(t)_xy) (x, y ∈ V),    (2)

where

    W^(t)_xy := Pr(X^(t) = y | X^(t-1) = x).    (3)

Then

    p^(t) = p^(t-1) W^(t)    (4)

and therefore

    p^(t) = p^(0) W^(1) ⋯ W^(t)    (5)

holds. The mission of MCMC is to generate X^(t) for the given target variable X^(∞) and its distribution π. In designing an MCMC, the sequence {W^(t)} (t = 1, 2, ...) is designed so that p^(t) converges to π when t → ∞. Under some conditions, we can make such a {W^(t)} without knowing the complete table of the target distribution π.

In this paper, we first review the Metropolis-Hastings algorithm [7] and its special case, the Gibbs sampler [3, 7]. Then we show some information geometrical properties of single-component-update MCMC and of its special case, the Gibbs sampler.
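As a concrete illustration of eqs.(1)-(5), a distribution can be stored as a row vector and a transition as a row-stochastic matrix; iterating eq.(4) then realizes eq.(5). The following is a minimal sketch in plain Python; the two-state chain and its matrix W are made-up examples, not taken from the paper:

```python
# A distribution on V = {0, 1} as a row vector, a transition as a 2x2 matrix.
def apply_transition(p, W):
    """Row-vector / matrix product: p^(t) = p^(t-1) W^(t)  (eq. 4)."""
    n = len(p)
    return [sum(p[x] * W[x][y] for x in range(n)) for y in range(n)]

p0 = [0.9, 0.1]                      # initial distribution p^(0)
W = [[0.5, 0.5],                     # rows sum to 1: W[x][y] = Pr(next = y | now = x)
     [0.2, 0.8]]

# Iterating eq. (4) gives eq. (5): p^(t) = p^(0) W^(1) ... W^(t).
p = p0
for _ in range(50):
    p = apply_transition(p, W)
print(p)  # approaches the stationary distribution of W
```

With a time-homogeneous W as here, eq.(5) is simply the matrix power p^(0) W^t; the general MCMC case allows a different W^(t) at each step.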
2 Metropolis-Hastings Algorithm and Gibbs Sampler

2.1 Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm [7] is the following algorithm.

step0 Prepare an arbitrary sequence of conditional distributions {q^(t)(X′ | X)} (t = 1, 2, ...), called proposal distributions, where X′ is a candidate variable for the next time step. Set an arbitrary value to x. Set t = 0.
step1 Generate a random number x′ according to q^(t)(x′ | x).
step2 Set x = x′ with probability

    α^(t)(x, x′) = min(1, [π(x′) q^(t)(x | x′)] / [π(x) q^(t)(x′ | x)]),    (6)

which is called the acceptance probability; otherwise keep x as it is.
step3 Set t = t + 1 and go to step1.

This algorithm simulates the Markov chain whose transition matrix is

    W^(t)_xy = q^(t)(y | x) α^(t)(x, y)                      (x ≠ y)
             = 1 − Σ_{y′ ≠ x} q^(t)(y′ | x) α^(t)(x, y′)     (x = y).    (7)

This transition matrix satisfies the following so-called detailed balance equation [7] for all x, y, t:

    π_x W^(t)_xy = π_y W^(t)_yx.    (8)

Summing eq.(8) over x, we get

    π W^(t) = π,    (9)

i.e. the transition by W^(t) does not move π. It is known that if the Markov chain is weakly ergodic [2, 5], then the distribution p^(t) converges to π when t → ∞. To perform this algorithm, we do not need to know the complete table of the target distribution π but only the probability ratio π(x′)/π(x) in eq.(6). This is the major merit of the Metropolis-Hastings algorithm.

2.2 Single-component Metropolis-Hastings

Single-component Metropolis-Hastings [7] is a special case of the Metropolis-Hastings algorithm. It is used in cases where X is multi-dimensional, i.e. X = (X_0, ..., X_{N-1}). In single-component Metropolis-Hastings, only one component is updated in each transition, so the candidate x′ differs from x in only one component. Assume it is X_i, and let X_ī be the joint variable of the other components. Then for all x′_ī ≠ x_ī,

    q^(t)(x′_i, x′_ī | x_i, x_ī) = 0.    (10)

There are several ways to select the component updated at time t [7]. Let i(t) be the index of the component updated at time t. In this paper, we adopt the sequential update:

    i(t) = t mod N.    (11)

2.3 Gibbs sampler

The Gibbs sampler [3, 7] is a special case of single-component Metropolis-Hastings.
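The four steps of the Metropolis-Hastings algorithm above can be sketched for a small discrete target. This is a hypothetical example, not code from the paper: the three-state target π and the uniform proposal are made up, and since the proposal is symmetric, q^(t) cancels in eq.(6):

```python
import random

random.seed(0)

# Hypothetical target on V = {0, 1, 2}; only ratios pi(x')/pi(x) enter eq. (6).
pi = [0.2, 0.3, 0.5]

def proposal(x):
    """Uniform proposal q(x'|x): symmetric, so q cancels in eq. (6)."""
    return random.randrange(3)

def mh_step(x):
    x_new = proposal(x)
    alpha = min(1.0, pi[x_new] / pi[x])      # acceptance probability, eq. (6)
    return x_new if random.random() < alpha else x

x, counts = 0, [0, 0, 0]
for t in range(200_000):
    x = mh_step(x)
    counts[x] += 1
freq = [c / 200_000 for c in counts]
print(freq)  # empirical frequencies; should lie near pi = [0.2, 0.3, 0.5]
```

Note that the sketch never normalizes π inside the loop: as the text says, only the ratio π(x′)/π(x) is needed, which is what makes the algorithm usable when the full table of π is unavailable.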
Its proposal distribution is⁴

    q^(t)(x′_i, x′_ī | x_i, x_ī) = π(x′_i | x_ī)    (x′_ī = x_ī)
                                 = 0                (x′_ī ≠ x_ī),    (12)

therefore the acceptance probability is

    α^(t)((x_i, x_ī), (x′_i, x_ī))
      = min(1, [π(x′_i, x_ī) π(x_i | x_ī)] / [π(x_i, x_ī) π(x′_i | x_ī)])
      = min(1, [π(x′_i, x_ī) π(x_i, x_ī) π(x_ī)] / [π(x_i, x_ī) π(x′_i, x_ī) π(x_ī)])
      = 1.    (13)

It shows that the candidate x′ is always accepted in step2 of section 2.1. Substituting eqs.(12) and (13) into eq.(7), we get

    W^(t)_(x_i, x_ī)(y_i, y_ī) = π(y_i | x_ī)    (x_ī = y_ī)
                                = 0              (x_ī ≠ y_ī),    (14)

⁴ In this paper, readers are expected to interpret symbols for distributions with x_i or x_ī as the appropriate conditional or marginal distributions. For example, π(x_i | x_ī) = Pr(X^(∞)_i = x_i | X^(∞)_ī = x_ī).
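Eq.(13) can be checked numerically: for any joint target, the Metropolis-Hastings ratio built from the proposal (12) is identically 1. Below, the 2 x 3 joint table π is a randomly generated stand-in, not data from the paper:

```python
import random

random.seed(1)

# Hypothetical joint target pi(x0, x1) on {0,1} x {0,1,2}.
raw = [[random.random() for _ in range(3)] for _ in range(2)]
Z = sum(sum(row) for row in raw)
pi = [[v / Z for v in row] for row in raw]

def cond_x0(x0, x1):
    """Full conditional pi(x0 | x1) = pi(x0, x1) / pi(x1)."""
    return pi[x0][x1] / (pi[0][x1] + pi[1][x1])

# Acceptance ratio inside eq. (13) when updating component 0 with proposal (12):
# pi(x0', x1) pi(x0 | x1) / (pi(x0, x1) pi(x0' | x1)).
max_dev = 0.0
for x0 in range(2):
    for x0_new in range(2):
        for x1 in range(3):
            ratio = (pi[x0_new][x1] * cond_x0(x0, x1)) / (pi[x0][x1] * cond_x0(x0_new, x1))
            max_dev = max(max_dev, abs(ratio - 1.0))
print(max_dev)  # 0 up to rounding: the candidate is always accepted
```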
and substituting this into eq.(4), we get

    p^(t)(y) = p^(t)(y_i, y_ī)
             = Σ_{x_i, x_ī} p^(t-1)(x_i, x_ī) W^(t)_(x_i, x_ī)(y_i, y_ī)
             = Σ_{x_i} p^(t-1)(x_i, y_ī) π(y_i | y_ī)
             = p^(t-1)(y_ī) π(y_i | y_ī)
             = p^(t)(y_ī) π(y_i | y_ī).    (15)

The information about the target distribution π required to perform the Gibbs sampler is the full conditional distribution [7] π(x_i | x_ī) in eq.(12).

3 Information Geometry of MCMC

As we described in the introduction, we can treat a distribution as a point in a vector space. {p^(t)} is a series of points which converges to π. In this section, we show some information geometrical properties of MCMC methods which update only one component in each transition, and some special properties the Gibbs sampler has.

3.1 Single-component-update MCMC

When the i-th component is updated, the other components keep their values; therefore the marginal distribution of X_ī is inherited:

    p^(t)(x_ī) = p^(t-1)(x_ī).    (16)

Let M(p, i) be the manifold of distributions defined by

    M(p, i) := {q ∈ D_V | ∀x_ī, q(x_ī) = p(x_ī)}.    (17)

This manifold is m-flat (see appendix). The important point is that updating the i-th component can move p only along the manifold M(p, i).

Fig. 1: This figure illustrates the movement of p^(t) in the distribution space D_V in the case of a single-component-update MCMC. Dots represent distributions and lines represent m-flat manifolds.

3.2 Gibbs sampler

For quick convergence of p^(t) → π, it is a natural idea to choose the point closest to π on M(p^(t-1), i). Some distance measure is required to determine the meaning of "closest"; we adopt the KL-divergence KL(p ‖ π) as the distance measure.

In any MCMC, if we choose W^(t) so that it satisfies eq.(9), we get

    KL(p^(t) ‖ π) = KL(p^(t-1) W^(t) ‖ π W^(t)) ≤ KL(p^(t-1) ‖ π)    (18)

from the data processing inequality (see appendix). It shows that KL(p^(t) ‖ π) decreases⁵ as time goes on. Now we consider the following minimization problem:

    min_{p^(t) ∈ M(p^(t-1), i)} KL(p^(t) ‖ π).    (19)

The minimizer p^(t) is the e-projection (see appendix) of π onto the m-flat manifold M(p^(t-1), i).
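The monotone decrease (18) can be observed directly for a single Gibbs update. The sketch below uses a randomly generated target and initial distribution on a 2 x 2 state space (hypothetical data, with the joint table flattened as x = 2*x0 + x1); the update implements eq.(15):

```python
import math
import random

random.seed(2)

def kl(p, q):
    """KL(p || q) over flattened probability tables."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def normalize(raw):
    Z = sum(raw)
    return [v / Z for v in raw]

# Hypothetical target pi and current distribution p on {0,1} x {0,1}.
pi = normalize([random.random() for _ in range(4)])
p  = normalize([random.random() for _ in range(4)])

def gibbs_update_x0(p, pi):
    """One Gibbs update of component x0, eq. (15): p'(x0, x1) = p(x1) pi(x0 | x1)."""
    p_x1  = [p[0] + p[2], p[1] + p[3]]          # marginal of x1 (kept fixed, eq. 16)
    pi_x1 = [pi[0] + pi[2], pi[1] + pi[3]]
    return [p_x1[x1] * pi[2 * x0 + x1] / pi_x1[x1]
            for x0 in range(2) for x1 in range(2)]

before = kl(p, pi)
p_new = gibbs_update_x0(p, pi)
after = kl(p_new, pi)
print(before, after)  # KL to the target never increases (eq. 18)
```

The x1-marginal of p_new equals that of p, so p_new indeed stays on the manifold M(p, 0) of eq.(17).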
Using the chain rule of KL-divergence (see appendix), we get

    KL(p^(t) ‖ π) = KL(X^(t) ‖ X^(∞))
                  = KL(X^(t)_ī ‖ X^(∞)_ī) + KL(X^(t)_i ‖ X^(∞)_i | X^(t)_ī)
                  = KL(X^(t-1)_ī ‖ X^(∞)_ī) + KL(X^(t)_i ‖ X^(∞)_i | X^(t-1)_ī).    (20)

The minimization of eq.(19) is equivalent to

    min_{p^(t) ∈ M(p^(t-1), i)} KL(X^(t)_i ‖ X^(∞)_i | X^(t-1)_ī)    (21)

⁵ Here "decreases" means "at least never increases".
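The claim that the Gibbs update solves the minimization (19)/(21) can be spot-checked numerically: among distributions on M(p, i) (same x1-marginal, arbitrary conditional for x0), the one carrying π's conditional attains the smallest KL to π. The 2 x 2 tables below are randomly generated stand-ins (flattened as x = 2*x0 + x1), not data from the paper:

```python
import math
import random

random.seed(3)

def kl(p, q):
    """KL(p || q) over flattened probability tables."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def normalize(raw):
    Z = sum(raw)
    return [v / Z for v in raw]

pi = normalize([random.random() for _ in range(4)])   # hypothetical target
p  = normalize([random.random() for _ in range(4)])   # hypothetical current p
p_x1  = [p[0] + p[2], p[1] + p[3]]                    # x1-marginal, fixed on M(p, 0)
pi_x1 = [pi[0] + pi[2], pi[1] + pi[3]]

# Candidate minimizer: keep p's x1-marginal, impose pi's conditional for x0.
proj = [p_x1[x1] * pi[2 * x0 + x1] / pi_x1[x1] for x0 in range(2) for x1 in range(2)]
best = kl(proj, pi)

# Other points of M(p, 0): same x1-marginal, random conditionals q(x0 = 0 | x1).
others = []
for _ in range(200):
    c0, c1 = random.random(), random.random()
    q = [p_x1[0] * c0, p_x1[1] * c1, p_x1[0] * (1 - c0), p_x1[1] * (1 - c1)]
    others.append(kl(q, pi))
print(best, min(others))  # no sampled point of M(p, 0) beats the projection
```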
because the transition by W^(t) does not move X^(t)_ī. From eq.(34), it is clear that the minimization is achieved when and only when

    ∀x_ī, KL(X^(t)_i ‖ X^(∞)_i | x_ī) = 0.    (22)

It is equivalent to

    ∀x_i, x_ī, p^(t)(x_i | x_ī) = π(x_i | x_ī).    (23)

Multiplying by p^(t)(x_ī), we get the same equation as eq.(15). It implies that the Gibbs sampler's transition matrix W^(t) moves p^(t-1) to the p^(t) which is the e-projection of π onto M(p^(t-1), i). The important point is that we need no information about p^(t-1) to design the transition matrix W^(t). In other words, by using the Gibbs sampler's transition matrix, we can move any distribution p towards the point closest to π (its e-projection) without knowing where p is.

Fig. 2: This figure illustrates the movement of p^(t) in the distribution space D_V in the case of the Gibbs sampler. Dots represent distributions, lines represent m-flat manifolds and dashed lines represent e-geodesics.

From the property described above, we can interpret the Gibbs sampler as the following algorithm.

step0 Set an arbitrary distribution as the initial p.
step1 Move p to the e-projection of π onto M(p, i).
step2 Go to step1.

This algorithm is a kind of greedy algorithm, because p is moved to the minimizer of the cost KL(p ‖ π) in each step1. In other words, p is not moved to the point that is optimal over two or more movements, but to the one optimal for each single movement. Imagine the simplified example shown in Fig. 3. There are a point p and a target point π on a two-dimensional plane. We can move p towards the east or the west in the first movement, and towards the north-east or the south-west in the second movement. In this example, the lines from west to east and the lines from south-west to north-east correspond to the manifolds M(p, i). If we move p to the point closest to π in the first movement, we cannot make p reach π in the second movement. It is clear that the path drawn in dashed lines is the optimal path to approach π.

Fig. 3: This figure illustrates the movement of p.
The path drawn in solid lines represents the movement of the greedy algorithm, and the path drawn in dashed lines is the optimal movement to approach π.

Another interesting property of the Gibbs sampler is

    KL(p^(t-1) ‖ π) = KL(p^(t-1) ‖ p^(t)) + KL(p^(t) ‖ π),    (24)

which is derived from the Pythagorean theorem (see appendix). It means that, in each transition, the divergence by which p^(t) moves is equal to the divergence by which p^(t) approaches π. Let TD(n) be the travelling divergence defined by

    TD(n) := Σ_{t=1}^{n} KL(p^(t-1) ‖ p^(t)).    (25)

Then we get

    TD(n) + KL(p^(n) ‖ π) = KL(p^(0) ‖ π).    (26)

It implies that TD(n) has the upper bound KL(p^(0) ‖ π), and that if p^(t) converges to π when t → ∞, TD(t) converges to KL(p^(0) ‖ π).
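Eqs.(24)-(26) can be verified numerically along a sequential-update Gibbs run. The target and the starting distribution below are randomly generated stand-ins on a 2 x 2 state space (flattened as x = 2*x0 + x1), not data from the paper:

```python
import math
import random

random.seed(4)

def kl(p, q):
    """KL(p || q) over flattened probability tables."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def normalize(raw):
    Z = sum(raw)
    return [v / Z for v in raw]

pi = normalize([random.random() for _ in range(4)])   # hypothetical target
p  = normalize([random.random() for _ in range(4)])   # hypothetical p^(0)

def gibbs_update(p, i):
    """E-projection of pi onto M(p, i): eq. (15) for component i."""
    if i == 0:   # keep the x1-marginal, impose pi(x0 | x1)
        pm, tm = [p[0] + p[2], p[1] + p[3]], [pi[0] + pi[2], pi[1] + pi[3]]
        return [pm[x1] * pi[2 * x0 + x1] / tm[x1] for x0 in range(2) for x1 in range(2)]
    pm, tm = [p[0] + p[1], p[2] + p[3]], [pi[0] + pi[1], pi[2] + pi[3]]
    return [pm[x0] * pi[2 * x0 + x1] / tm[x0] for x0 in range(2) for x1 in range(2)]

start, td, pyth_dev = kl(p, pi), 0.0, 0.0
for t in range(1, 41):
    p_next = gibbs_update(p, t % 2)       # sequential update, eq. (11)
    # Pythagorean relation, eq. (24): KL(p||pi) = KL(p||p_next) + KL(p_next||pi).
    pyth_dev = max(pyth_dev, abs(kl(p, pi) - (kl(p, p_next) + kl(p_next, pi))))
    td += kl(p, p_next)                   # travelling divergence, eq. (25)
    p = p_next
print(td + kl(p, pi) - start, pyth_dev)   # both 0 up to rounding (eqs. 24, 26)
```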
4 Conclusion

Markov chain Monte Carlo (MCMC) is a class of random number generators which use a Markov chain whose distribution p^(t) converges to the target distribution π when t → ∞. In single-component-update MCMC, updating the i-th component of the multi-dimensional variable X moves X's distribution p along the m-flat manifold M(p, i). The Gibbs sampler is one of the single-component-update MCMC methods. From the viewpoint of information geometry, the Gibbs sampler is interpreted as the following algorithm.

step0 Set an arbitrary distribution as the initial p.
step1 Move p to the e-projection of π onto M(p, i).
step2 Go to step1.

This algorithm is a kind of greedy algorithm, because p is moved to the minimizer of the cost KL(p ‖ π) in each step1. In the Gibbs sampler, p^(t) does not travel an infinite divergence in the distribution space: the divergence p^(t) travels is less than or equal to KL(p^(0) ‖ π). If p^(t) converges to π when t → ∞, the travelling divergence converges to KL(p^(0) ‖ π).

References:
[1] Csiszár, I., Körner, J., Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, 1981.
[2] Seneta, E., Non-negative Matrices and Markov Chains, Second Edition, Springer-Verlag, 1981.
[3] Geman, S., Geman, D., Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-6, No. 6, 1984.
[4] Amari, S., Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, Vol. 28, Springer-Verlag, 1985.
[5] van Laarhoven, P. J. M., Aarts, E. H. L., Simulated Annealing: Theory and Applications, Kluwer Academic Publishers, 1987.
[6] Amari, S., Information Geometry of the EM and em Algorithms for Neural Networks, Neural Networks, Vol. 8, No. 9, 1995.
[7] Gilks, W. R., Richardson, S., Spiegelhalter, D. J., Introducing Markov chain Monte Carlo, In: Gilks, W. R., Richardson, S., Spiegelhalter, D. J. (eds.), Markov Chain Monte Carlo in Practice, Chapman & Hall, 1996, pp. 1-19.
[8] Gilks, W. R., Full conditional distributions, In: Gilks, W. R., Richardson, S., Spiegelhalter, D.
J. (eds.), Markov Chain Monte Carlo in Practice, Chapman & Hall, 1996, pp. 1-19.

Appendix: KL-divergence

Let R(X) be the range of values X takes. For two random variables X, Y which have the same range, the KL-divergence from X to Y, or from their distributions p_X to p_Y, is defined by

    KL(X ‖ Y) = KL(p_X ‖ p_Y) := Σ_{x ∈ R(X)} p_X(x) log [p_X(x) / p_Y(x)].    (27)

KL-divergence is always non-negative, and

    KL(p_X ‖ p_Y) = 0 ⟺ p_X = p_Y.    (28)

Let q be the distribution of a stochastic source, p be the distribution of the data, and N be the number of samples in the data. The log-likelihood that the data comes from the stochastic source is

    L(p ‖ q) = Σ_x N p(x) log q(x),    (29)

and it takes its maximum value −N H(p) when and only when p = q, where H(p) is Shannon's entropy:

    H(p) = −Σ_x p(x) log p(x).    (30)

The KL-divergence KL(p ‖ q) is

    KL(p ‖ q) = −(1/N) L(p ‖ q) − H(p).    (31)

Therefore, the meaning of KL(p ‖ q) is a biased (negated, per-sample) log-likelihood that data whose distribution is p comes out of the distribution q. The bias is taken so that KL(p ‖ q) = 0 when p = q.
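The relation (31) between KL-divergence, log-likelihood and entropy is easy to check numerically. The two distributions and the sample count N below are made-up values:

```python
import math

def kl(p, q):
    """Eq. (27): KL(p || q) = sum_x p(x) log(p(x)/q(x))."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def entropy(p):
    """Eq. (30): Shannon entropy H(p)."""
    return -sum(a * math.log(a) for a in p if a > 0)

def log_likelihood(p, q, N):
    """Eq. (29): log-likelihood of N samples with empirical distribution p under q."""
    return sum(N * a * math.log(b) for a, b in zip(p, q) if a > 0)

# Made-up example distributions and sample count.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
N = 1000

# Eq. (31): KL(p || q) = -(1/N) L(p || q) - H(p).
lhs = kl(p, q)
rhs = -log_likelihood(p, q, N) / N - entropy(p)
print(lhs, rhs)  # the two sides agree
```

The same functions also exhibit the maximum-likelihood fact stated above: L(p ‖ q) is largest at q = p, where it equals −N H(p).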
For any distribution vectors p, q and any transition matrix W,

    KL(pW ‖ qW) ≤ KL(p ‖ q)    (32)

holds. This inequality is called the data processing inequality.

Let X, Z be random variables which have the same range, and Y, W be random variables which have the same range. The following equation holds for the joint variables XY and ZW:

    KL(XY ‖ ZW) = KL(X ‖ Z) + KL(Y ‖ W | X),    (33)

where

    KL(Y ‖ W | X) := Σ_{x ∈ R(X)} Pr(X = x) KL(Y ‖ W | x),    (34)

    KL(Y ‖ W | x) := Σ_{y ∈ R(Y)} Pr(Y = y | X = x) log [Pr(Y = y | X = x) / Pr(W = y | Z = x)].    (35)

Eq.(33) is called the chain rule of KL-divergence, and the left side of eq.(34) is called the conditional KL-divergence.

Let q be a distribution and M be a manifold of distributions. Then

    arg min_{p ∈ M} KL(p ‖ q)    (36)

is called the e-projection of q onto M [6]. If M has the following property:

    ∀p, q ∈ M, ∀λ ∈ [0, 1], λp + (1 − λ)q ∈ M,    (37)

we call M m-flat. It is known that if M is m-flat, the e-projection of q onto M is unique for any distribution q.

The following curve is called the e-geodesic from p_0 to p_1 [6]:

    {q | log q(x) = (1 − λ) log p_0(x) + λ log p_1(x) − log φ(λ), 0 ≤ λ ≤ 1},    (38)

where φ(λ) is the term for the normalizing condition Σ_x q(x) = 1:

    φ(λ) = Σ_x p_0(x)^{1−λ} p_1(x)^λ.    (39)

Let q be a distribution, M be a manifold of distributions, and p be the e-projection of q onto M. It is known that the e-geodesic from q to p and M are orthogonal at p, and for any distribution r ∈ M, the following equation holds:

    KL(r ‖ q) = KL(r ‖ p) + KL(p ‖ q).    (40)

This is called the Pythagorean theorem [1].

Fig. 4: This figure illustrates the Pythagorean theorem: KL(r ‖ q) = KL(r ‖ p) + KL(p ‖ q). M is an m-flat manifold of distributions and p is the e-projection of q onto M. The dashed line is the e-geodesic from q to p.
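The e-geodesic (38)-(39) is a normalized geometric interpolation between two distributions, which can be sketched directly. The endpoint distributions below are made-up examples:

```python
def e_geodesic_point(p0, p1, lam):
    """Eqs. (38)-(39): log q = (1 - lam) log p0 + lam log p1 - log phi(lam)."""
    phi = sum(a ** (1 - lam) * b ** lam for a, b in zip(p0, p1))   # eq. (39)
    return [a ** (1 - lam) * b ** lam / phi for a, b in zip(p0, p1)]

# Made-up endpoint distributions.
p0 = [0.7, 0.2, 0.1]
p1 = [0.1, 0.3, 0.6]

mid = e_geodesic_point(p0, p1, 0.5)
print(mid)  # normalized geometric interpolation of p0 and p1
```

In log-space, eq.(38) is literally a straight line between log p_0 and log p_1 (up to the normalization term), which is why such curves are the geodesics of the exponential ("e") connection.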
More informationComparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method
Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method
More informationLecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.
prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove
More informationTransfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system
Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationFinding Dense Subgraphs in G(n, 1/2)
Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng
More information( ) ( ) ( ) ( ) STOCHASTIC SIMULATION FOR BLOCKED DATA. Monte Carlo simulation Rejection sampling Importance sampling Markov chain Monte Carlo
SOCHASIC SIMULAIO FOR BLOCKED DAA Stochastc System Analyss and Bayesan Model Updatng Monte Carlo smulaton Rejecton samplng Importance samplng Markov chan Monte Carlo Monte Carlo smulaton Introducton: If
More informationAPPENDIX A Some Linear Algebra
APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,
More informationCSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing
CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationEfficient, General Point Cloud Registration with Kernel Feature Maps
Effcent, General Pont Cloud Regstraton wth Kernel Feature Maps Hanchen Xong, Sandor Szedmak, Justus Pater Insttute of Computer Scence Unversty of Innsbruck 30 May 2013 Hanchen Xong (Un.Innsbruck) 3D Regstraton
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationAbsorbing Markov Chain Models to Determine Optimum Process Target Levels in Production Systems with Rework and Scrapping
Archve o SID Journal o Industral Engneerng 6(00) -6 Absorbng Markov Chan Models to Determne Optmum Process Target evels n Producton Systems wth Rework and Scrappng Mohammad Saber Fallah Nezhad a, Seyed
More informationHowever, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values
Fall 007 Soluton to Mdterm Examnaton STAT 7 Dr. Goel. [0 ponts] For the general lnear model = X + ε, wth uncorrelated errors havng mean zero and varance σ, suppose that the desgn matrx X s not necessarly
More informationAppendix B. The Finite Difference Scheme
140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationECE559VV Project Report
ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate
More informationCase A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.
THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2017 Instructor: Victor Aguirregabiria
ECOOMETRICS II ECO 40S Unversty of Toronto Department of Economcs Wnter 07 Instructor: Vctor Agurregabra SOLUTIO TO FIAL EXAM Tuesday, Aprl 8, 07 From :00pm-5:00pm 3 hours ISTRUCTIOS: - Ths s a closed-book
More informationChapter 7 Channel Capacity and Coding
Chapter 7 Channel Capacty and Codng Contents 7. Channel models and channel capacty 7.. Channel models Bnary symmetrc channel Dscrete memoryless channels Dscrete-nput, contnuous-output channel Waveform
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More information6 Supplementary Materials
6 Supplementar Materals 61 Proof of Theorem 31 Proof Let m Xt z 1:T : l m Xt X,z 1:t Wethenhave mxt z1:t ˆm HX Xt z 1:T mxt z1:t m HX Xt z 1:T + mxt z 1:T HX We consder each of the two terms n equaton
More information10.34 Fall 2015 Metropolis Monte Carlo Algorithm
10.34 Fall 2015 Metropols Monte Carlo Algorthm The Metropols Monte Carlo method s very useful for calculatng manydmensonal ntegraton. For e.g. n statstcal mechancs n order to calculate the prospertes of
More informationThe Minimum Universal Cost Flow in an Infeasible Flow Network
Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran
More informationRandom Walks on Digraphs
Random Walks on Dgraphs J. J. P. Veerman October 23, 27 Introducton Let V = {, n} be a vertex set and S a non-negatve row-stochastc matrx (.e. rows sum to ). V and S defne a dgraph G = G(V, S) and a drected
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationPower law and dimension of the maximum value for belief distribution with the max Deng entropy
Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationLecture 17 : Stochastic Processes II
: Stochastc Processes II 1 Contnuous-tme stochastc process So far we have studed dscrete-tme stochastc processes. We studed the concept of Makov chans and martngales, tme seres analyss, and regresson analyss
More information