Exponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that
|
|
- Irene Crawford
- 5 years ago
- Views:
Transcription
1 1 More examples 1.1 Exponential families under conditioning Exponential families also behave nicely under conditioning. Specifically, suppose we write η = η 1, η 2 R k R p k so that dp η dm 0 = e ηt 1 t 1x+η T 2 t 2x Λη 1,η 2. Then, we might ask about the family of conditional distributions P η t 1 X A t 2 X = s 2. If we suppose that tx = t 1 X, t 2 X has a density f T1,T 2 on R p then we see the conditional density of T 1 T 2 = s 2 has the form f T1,T 2 t 1, s 2 R k f T1,T 2 s 1, s 2 ds 1 = where m 0 is the push-forward of m 0 under t : Ω R 2. This is a k-parameter exponential family: e ηt 1 t 1+η T 2 s2 m 0 t 1, s 2 R e k ηt 1 s 1+η2 T s 2 m 0 s 1, s 2 ds 1 e ηt 1 t1 m 0 t 1, s 2 = R e k ηt 1 s 1 m 0 s 1, s 2 ds 1 1. The reference measure has density m 0 t 1, s 2 with respect to dt 1, Lebesgue measure on R k. 2. The sufficient statistic is t The CGF is log e ηt 1 s1 m 0 s 1, s 2 ds 1. R k Example: the Poisson trick Suppose we observe independent X i Poissonµ i, 1 i k. This is a k-parameter exponential family of distributions on R k dp ηµ dx = k e x i log µ i µ i = exp x i η i e η i. i The reference measure has density 1 k x i! with respect to counting measure on Z k restricted to the non-negative orthant. From this family, we can form a new family of distributions on R k+1 by considering the push forward under k fx 1,..., x k = x 1,..., x k, x i. 1
2 The push forward of m 0 under f will be counting measure restricted to { } k x 1,..., x k+1 : x i 0, x i = x k+1 with the same density as above. From the general picture for conditioning, we see that X 1,..., X k k X i is a k-parameter exponential family with sufficient statistic x 1,..., x k. The marginal distributon of k X k i is of course Poisson µ k i = Poisson log η i. Hence, for x 1,..., x k A n,k we see P X 1,..., X k = x 1,..., x k k k n X i = n = exp η i x i x 1,..., x k This is just the multinomial p.m.f. No surprise here Example: the Dirichlet from independent Gammas Suppose we observe independent X i Gamma1, α i, 1 i k i.e. scale 1 but shape parameter α i. This is an exponential family of distributions on R k dp η k dx = e α i logx i log Γα i e x i 1 [0, x i dx i. x i From this family, we can form a family of distributions on R k+1 by considering k fx 1,..., x k = x 1,..., x k, x i. This is again an exponential family see exercise. The marginal density of the sum of independent Gamma random variables of fixed scale is another Gamma with scale 1 and shape parameter k α i. This leads to the conclusion that X 1,..., X k k Dirichletα Exercise: behaviour under affine transformation Suppose we observe independent X i Gamma1, α i, 1 i k i.e. scale 1 but shape parameter α i. Set 1. Set fx 1,..., x k = x 1,..., x k, k x i. Show that the push-forward of the the original exponential family of distributions on R k is an exponential family of distributions on R k+1. What is the sufficient statistic? 2
3 2. What is the dimension of the natural parameter space i.e. how many parameters are there? 3. What is the reference measure? 4. Suppose gx = Ax + b. Give a sufficient condition on A, b so that the push forward of an exponential family is still an exponential family. Give an example of A, b for which the push forward fails to be an exponential family Exercise: conditioning in the general case 1. Give a general formula for Show that it is an exponential family. 2. What is its sufficient statistic? P η t1 X A t 2 X. 3. What is its reference measure? Be formal about it: what is the sample space? What is the measure? 1.2 Ising Models The Ising model is an extensively studied object in statistical physics. In statistical settings, it has applications to image analysis. For us, it is an example. The sample space of the Ising model is based on a graph G = V, E specified by a set of vertices V and a set of undirected edges E. The sample space is Ω V = { 1, 1} V Z V with reference measure counting measure restricted to Ω V. The density with respect to this reference is exp Q 1 i x i + Q 2 ijx i x j. i V i,j E The natural parameters are therefore Q 1, Q 2 R V R V V and the sufficient statistics are x, xx T Z V Z V V. We see that the CGF is ΛQ 1, Q 2 = Q 1 i x i + Q 2 ijx i x j. x { 1,1} V exp i V i,j E In general, this is quite a complicated function so we see that not all exponential families have tractable CGF not that we didn t know this already. While we won t touch on it too much right now, what makes things possible for this model, and other models with intractable CGFs is that it is often possible to simulate from the distribution P Q 1,Q 2 which makes it possible to compute an unbiased estimate of Λ Q 1,Q 2. This is the basis of stochastic optimization. 1. Given a procedure that takes arguments Q 1, Q 2 and produces a random vector DQ 1, Q 2 with E Q 1,Q 2 DQ 1, Q 2 = Λ Q 1,Q 2 3
4 2. A stochastic optimization procedure has the form ] Q 1 k+1, Q 2 k+1 = Q 1 k, Q 2 k α k [DQ 1 k, Q 2 k tx where the α k satisfy some growth assumptions, usually roughly of the form 1.3 Markov random fields k j=1 k j=1 α k k 0 k α j α 2 j k < The Ising model is an example of a something called a Markov random field. The descriptor Markov relates to a certain type of Markov property, that is a type of conditional independence. In the Ising model, suppose we consider the distribution of x i for some fixed i V, conditional on x V \i = x i. As above, this is an exponential family, but what are its parameters? First of all, note that the sample space depends on x i Ω x i = { y { 1, 1} V : y i = x i } which is sort of equivalent to { 1, 1} i but not quite the same. There are two points in Ω x i, and we can take the reference measure to be counting measure on these two points. With that, we see that the CGF is log Q 1 jy j + Q 2 ijy i y j. As a function on Ω x i, j,k E y Ω x i exp j V Q 1 jy j = Q 1 i y i + C 1 j V Q 2 ijy i y j = i,k E Q 2 ik y iy j + j,k E j,i E Q 2 jiy j y i + C 2 where C 1, C 2 are constant on Ω x i. If we assume G 2 ij is symmetric which we might as well then Q 2 jk y jy k = 2y i Q 2 ik y k + C. j,k E i,k E Finally, for the Ising model we see that the CGF of x i under this measure is log e Q1 i +2 i,k E Q2 ik x k + e Q1 i 2 i,k E Q2 ik x k + C 4
5 Let s write this CGF as Λ Q 1, Q 2 x i. This is the CGF of a {1, 1} valued random variable with natural parameter ηx i, Q 1, Q 2 = Q 1 i + 2 and counting measure on { 1, 1} as reference measure. i,k E Exercise: natural parameter under conditioning Q 2 ik x k. The notation ηx i, Q 1, Q 2 suggests that the natural parameter corresponding to sufficient statistic x i, when conditioning on x i, has changed. Does this conflict with what we saw earlier about conditioning? What we can take away from this picture is that conditioning on x i yielded a new exponential family is a function of the original natural parameters Q 1, Q 2 and the value of x i. Also, the natural parameter depends on x i only through the value at neighbours of i in G. This is the afore-mentioned Markov property Definition of a Markov random field A Markov random field is a generalization of the Ising to more general sample spaces, and more complicated interactions. We will not dwell on them too much here but, in their most natural form, they are exponential families of distributions on something like R V for some set of vertices though they could take on values other than R. The sufficient statistics are specified by subsets A V f A x = g A x i, i A i.e. f A is measurable with respect to σx i, i A. The general form of the density is dp η = exp η A f A x Λη dm 0 A V with η = η A A V where m 0 is some reference measure on R V. This is a huge natural parameter space, so obviously some restrictions are made by constraining many the η A to be 0. Let s call this a model M, i.e. a set of A such that η A is not constrained to be 0. By default, we assume M is monotone, i.e. B M, A B = A M Exercise: Ising model as random field Write the Ising model above as a Markov random field field above. Be specific as possible. 1. What are the f A s? 2. What are the η A s? 3. What is the reference measure? 5
6 The Markov property of these random fields can be expressed in terms of the Markov neighbourhood of i V. We define this assuming monotonicity of M as Then, the Markov property can be stated as Ni, M = {j : {i, j} M}. ηx i, η σ x j, j Ni, M. Or, conditioning on x i yields an exponential family whose random natural parameter depends only on the neighbours of i in the model M and the true underlying η. This property has some very useful consequences, particularly when the resulting exponential family has a simple form Pseudo-likelihood In the Ising model, the full CGF is very complicated if V is of any reasonable size. conditional CGFs of x i x i are particularly simple. This is the basis of the pseudo-loglikelihood for Q 1, Q 2 in the Ising model But, the l pseudo Q 1, Q 2 = i V [ ηx i, Q 1, Q 2 x i ΛQ 1, Q 2 x i ] Exercise: Ising model pseudolikelihood 1. Write out the pseudolikelihood for the Ising model as explicitly as possible. 2. Is it convex in Q 1, Q 2? 3. Describe a Newton-Raphson algorithm to estimate Q 1, Q 2 based on maximizing the pseudolikelihood. Be as specific as possible, i.e. compute gradients and Hessians as explicitly as possible Gibbs sampler This simple form of the conditional distributions is also the basis of one of the most natural forms of MCMC algorithms, the Gibbs sampler. The Gibbs sampler for the Ising model, say is a Markov chain on { 1, 1} V whose stationary distribution is P Q 1,Q 2. The algorithm continuously cycles through the coordinates of V in some possibly random order and each step consists of drawing from P Q 1,Q 2 X i X i. That is, if at the k-th step we are updating coordinate i, we set X k+1 i P Q 1,Q 2 X i X i X k+1 i = X k i. 6
7 1.3.7 Exercise: sampling from an Ising model Consider an Ising model on L, the lattice in Z 2 with Q 1 = α 1, Q 2 = β 11 T. 1. For β = 0, α = 1. initialize the Gibbs sampler at some random initial condition. Run the Gibbs sampler Markov chain on { 1, 1} L for some time. What do you expect the binary image to look like? 2. Repeat for β > 0 and β < 0. Note: I am not asking for an exhaustive simulation, the goal is to just get the basic mechanics of a Gibbs sampler. 7
Probabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationA canonical application of the EM algorithm is its use in fitting a mixture model where we assume we observe an IID sample of (X i ) 1 i n from
1 The EM algorithm In this set of notes, we discuss the EM (Expectation-Maximization) algorithm, which is a common algorithm used in statistical estimation to try and find the MLE. It is often used in
More informationApproximate inference, Sampling & Variational inference Fall Cours 9 November 25
Approimate inference, Sampling & Variational inference Fall 2015 Cours 9 November 25 Enseignant: Guillaume Obozinski Scribe: Basile Clément, Nathan de Lara 9.1 Approimate inference with MCMC 9.1.1 Gibbs
More informationPropp-Wilson Algorithm (and sampling the Ising model)
Propp-Wilson Algorithm (and sampling the Ising model) Danny Leshem, Nov 2009 References: Haggstrom, O. (2002) Finite Markov Chains and Algorithmic Applications, ch. 10-11 Propp, J. & Wilson, D. (1996)
More informationMinicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics
Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationA = {(x, u) : 0 u f(x)},
Draw x uniformly from the region {x : f(x) u }. Markov Chain Monte Carlo Lecture 5 Slice sampler: Suppose that one is interested in sampling from a density f(x), x X. Recall that sampling x f(x) is equivalent
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationExponential Families
Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationIntroduction to Graphical Models
Introduction to Graphical Models STA 345: Multivariate Analysis Department of Statistical Science Duke University, Durham, NC, USA Robert L. Wolpert 1 Conditional Dependence Two real-valued or vector-valued
More information16 : Markov Chain Monte Carlo (MCMC)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions
More informationLikelihood Inference for Lattice Spatial Processes
Likelihood Inference for Lattice Spatial Processes Donghoh Kim November 30, 2004 Donghoh Kim 1/24 Go to 1234567891011121314151617 FULL Lattice Processes Model : The Ising Model (1925), The Potts Model
More informationText Mining for Economics and Finance Latent Dirichlet Allocation
Text Mining for Economics and Finance Latent Dirichlet Allocation Stephen Hansen Text Mining Lecture 5 1 / 45 Introduction Recall we are interested in mixed-membership modeling, but that the plsi model
More informationCS6220 Data Mining Techniques Hidden Markov Models, Exponential Families, and the Forward-backward Algorithm
CS6220 Data Mining Techniques Hidden Markov Models, Exponential Families, and the Forward-backward Algorithm Jan-Willem van de Meent, 19 November 2016 1 Hidden Markov Models A hidden Markov model (HMM)
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning More Approximate Inference Mark Schmidt University of British Columbia Winter 2018 Last Time: Approximate Inference We ve been discussing graphical models for density estimation,
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationMarkov Chains and MCMC
Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationStatistics 352: Spatial statistics. Jonathan Taylor. Department of Statistics. Models for discrete data. Stanford University.
352: 352: Models for discrete data April 28, 2009 1 / 33 Models for discrete data 352: Outline Dependent discrete data. Image data (binary). Ising models. Simulation: Gibbs sampling. Denoising. 2 / 33
More informationThe Ising model and Markov chain Monte Carlo
The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte
More informationCalculus 221 worksheet
Calculus 221 worksheet Graphing A function has a global maximum at some a in its domain if f(x) f(a) for all other x in the domain of f. Global maxima are sometimes also called absolute maxima. A function
More informationLecture 1: August 28
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html
More informationGeneralized Exponential Random Graph Models: Inference for Weighted Graphs
Generalized Exponential Random Graph Models: Inference for Weighted Graphs James D. Wilson University of North Carolina at Chapel Hill June 18th, 2015 Political Networks, 2015 James D. Wilson GERGMs for
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More information18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)
More informationNeural networks COMS 4771
Neural networks COMS 4771 1. Logistic regression Logistic regression Suppose X = R d and Y = {0, 1}. A logistic regression model is a statistical model where the conditional probability function has a
More informationTopic Modelling and Latent Dirichlet Allocation
Topic Modelling and Latent Dirichlet Allocation Stephen Clark (with thanks to Mark Gales for some of the slides) Lent 2013 Machine Learning for Language Processing: Lecture 7 MPhil in Advanced Computer
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationUNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS
UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More information1 Undirected Graphical Models. 2 Markov Random Fields (MRFs)
Machine Learning (ML, F16) Lecture#07 (Thursday Nov. 3rd) Lecturer: Byron Boots Undirected Graphical Models 1 Undirected Graphical Models In the previous lecture, we discussed directed graphical models.
More informationLINK scheduling algorithms based on CSMA have received
Efficient CSMA using Regional Free Energy Approximations Peruru Subrahmanya Swamy, Venkata Pavan Kumar Bellam, Radha Krishna Ganti, and Krishna Jagannathan arxiv:.v [cs.ni] Feb Abstract CSMA Carrier Sense
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationQualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama
Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationBayesian Methods with Monte Carlo Markov Chains II
Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3
More informationMCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham
MCMC 2: Lecture 2 Coding and output Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham Contents 1. General (Markov) epidemic model 2. Non-Markov epidemic model 3. Debugging
More informationBasic Definitions: Indexed Collections and Random Functions
Chapter 1 Basic Definitions: Indexed Collections and Random Functions Section 1.1 introduces stochastic processes as indexed collections of random variables. Section 1.2 builds the necessary machinery
More informationGeneralized Linear Models I
Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationSample Spaces, Random Variables
Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted
More informationTraining an RBM: Contrastive Divergence. Sargur N. Srihari
Training an RBM: Contrastive Divergence Sargur N. srihari@cedar.buffalo.edu Topics in Partition Function Definition of Partition Function 1. The log-likelihood gradient 2. Stochastic axiu likelihood and
More information7.1 Coupling from the Past
Georgia Tech Fall 2006 Markov Chain Monte Carlo Methods Lecture 7: September 12, 2006 Coupling from the Past Eric Vigoda 7.1 Coupling from the Past 7.1.1 Introduction We saw in the last lecture how Markov
More informationBayesian Estimation with Sparse Grids
Bayesian Estimation with Sparse Grids Kenneth L. Judd and Thomas M. Mertens Institute on Computational Economics August 7, 27 / 48 Outline Introduction 2 Sparse grids Construction Integration with sparse
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More information1.1 Review of Probability Theory
1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,
More informationThe Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1
The Exciting Guide To Probability Distributions Part 2 Jamie Frost v. Contents Part 2 A revisit of the multinomial distribution The Dirichlet Distribution The Beta Distribution Conjugate Priors The Gamma
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationSC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.
More informationTheory of Stochastic Processes 8. Markov chain Monte Carlo
Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html
More informationREVIEW OF DIFFERENTIAL CALCULUS
REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be
More informationNonparametric Bayesian Matrix Factorization for Assortative Networks
Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin
More informationNotes on pseudo-marginal methods, variational Bayes and ABC
Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution
More informationChapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued
Chapter 3 sections Chapter 3 - continued 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions
More informationState-Space Methods for Inferring Spike Trains from Calcium Imaging
State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline
More informationBayes Model Selection with Path Sampling: Factor Models
with Path Sampling: Factor Models Ritabrata Dutta and Jayanta K Ghosh Purdue University 07/02/11 Factor Models in Applications Factor Models in Applications Factor Models Factor Models and Factor analysis
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationConditional probabilities and graphical models
Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within
More informationSeptember Math Course: First Order Derivative
September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationHW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.
HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationMetric spaces and metrizability
1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively
More informationy = 7x 2 + 2x 7 ( x, f (x)) y = 3x + 6 f (x) = 3( x 3) 2 dy dx = 3 dy dx =14x + 2 dy dy dx = 2x = 6x 18 dx dx = 2ax + b
Rates of hange III Differentiation Workbook Limits For question, 1., draw up a artesian plane and plot your point [( x + h), f ( x + h) ] ( x, f (x)), and your point and visualise how the limit from first
More informationMarkov random fields. The Markov property
Markov random fields The Markov property Discrete time: (X k X k!1,x k!2,... = (X k X k!1 A time symmetric version: (X k! X!k = (X k X k!1,x k+1 A more general version: Let A be a set of indices >k, B
More information